Fugu-MT: arXiv Paper Translations

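This site provides Japanese translations of arXiv papers that are 30 pages or shorter and released under a Creative Commons license (CC 0, CC BY, or CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0 and are produced with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).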

The papers listed here have a publication date of 2024-02-05.

Title	Authors	Abstract	Publication date / translation date
# 仮想量子資源蒸留 : 一般的な枠組みと応用 Virtual quantum resource distillation: General framework and applications ( http://arxiv.org/abs/2404.13048v1 ) ライセンス: Link先を確認	Ryuji Takagi, Xiao Yuan, Bartosz Regula, Mile Gu,	(参考訳) 我々は,従来の量子資源蒸留を,古典的後処理の力を統合することで拡張する,仮想資源蒸留の一般的な枠組み,すなわち [Phys. Rev. Lett. 132, 050203 (2024)] に提案されている代替蒸留戦略を開発する。ここで提示されるフレームワークは、量子状態だけでなく、量子チャネルや高次プロセスのような動的量子オブジェクトにも適用することができる。計算可能な半定値プログラムの形式での仮想資源蒸留の性能評価とベンチマーク,および運用上の動機付け量について述べる。我々は,量子メモリ,量子通信,非マルコフ力学などの動的資源を含む設定だけでなく,絡み合い,コヒーレンス,マジックなどの標準的な資源理論を含む,様々な具体的な設定に一般の枠組みを適用した。確率蒸留の枠組みについても論じる。 We develop the general framework of virtual resource distillation -- an alternative distillation strategy proposed in [Phys. Rev. Lett. 132, 050203 (2024)], which extends conventional quantum resource distillation by integrating the power of classical postprocessing. The framework presented here is applicable not only to quantum states, but also to dynamical quantum objects such as quantum channels and higher-order processes. We provide a general characterization and benchmarks for the performance of virtual resource distillation in the form of computable semidefinite programs as well as several operationally motivated quantities. We apply our general framework to various concrete settings of interest, including standard resource theories such as entanglement, coherence, and magic, as well as settings involving dynamical resources such as quantum memory, quantum communication, and non-Markovian dynamics. The framework of probabilistic distillation is also discussed.	翻訳日:2024-07-01 11:58:46 公開日:2024-02-05
# ニューラルロバストネスの探索メカニズム-幾何とスペクトルの間の橋渡しを探る Exploring mechanisms of Neural Robustness: probing the bridge between geometry and spectrum ( http://arxiv.org/abs/2405.00679v1 ) ライセンス: Link先を確認	Konstantin Holzhausen, Mia Merlid, Håkon Olav Torvik, Anders Malthe-Sørenssen, Mikkel Elle Lepperød,	(参考訳) バックプロパゲーションに最適化された人工ニューラルネットワークは、正確性はあるものの堅牢性に欠けており、その安全性に影響を及ぼす予期せぬ行動を引き起こす。生体神経系はこれらの問題をすでに解決している。したがって、堅牢性の生物学的メカニズムを理解することは、信頼できる安全なシステムを構築するための重要なステップである。人工モデルとは異なり、生物学的ニューロンは隣の細胞活動に基づいて接続を調整する。神経表現におけるロバスト性は、符号化多様体の滑らかさと相関すると仮定される。最近の研究は、マウスの一次視覚野を観察したパワーロー共分散スペクトルが、表現の正確性と堅牢性の間のバランスの取れたトレードオフを示していることを示唆している。ここでは、勝者を持つ教師なし局所学習モデルが全てのダイナミクスにそのような力の法則表現を学習させることを示し、今後の研究でそのような特性を持つ力学モデルを提供する。本研究の目的は, 神経表現における幾何学的特徴, スペクトル特性, 頑健性, 表現性の間の相互作用を理解することである。そこで, 重み, ジャコビアン, スペクトル正則化を用いて, 表現の滑らかさとスペクトルの関連性を検討した。我々の研究は、生物と人工両方のシステムにおいて、パワーロースペクトルと最適にスムーズなエンコーディングの基礎となるメカニズムの研究の基礎となる。得られた知見は、哺乳類の脳で堅牢なニューラルネットワークを実現するメカニズムを解明し、より安定で信頼性の高い人工システムの発達を知らせる可能性がある。 Backpropagation-optimized artificial neural networks, while precise, lack robustness, leading to unforeseen behaviors that affect their safety. Biological neural systems do solve some of these issues already. Thus, understanding the biological mechanisms of robustness is an important step towards building trustworthy and safe systems. Unlike artificial models, biological neurons adjust connectivity based on neighboring cell activity. Robustness in neural representations is hypothesized to correlate with the smoothness of the encoding manifold. Recent work suggests power law covariance spectra, which were observed studying the primary visual cortex of mice, to be indicative of a balanced trade-off between accuracy and robustness in representations. Here, we show that unsupervised local learning models with winner takes all dynamics learn such power law representations, providing upcoming studies a mechanistic model with that characteristic. 
Our research aims to understand the interplay between geometry, spectral properties, robustness, and expressivity in neural representations. Hence, we study the link between representation smoothness and spectrum by using weight, Jacobian and spectral regularization while assessing performance and adversarial robustness. Our work serves as a foundation for future research into the mechanisms underlying power law spectra and optimally smooth encodings in both biological and artificial systems. The insights gained may elucidate the mechanisms that realize robust neural networks in mammalian brains and inform the development of more stable and reliable artificial systems.	翻訳日:2024-07-01 11:19:45 公開日:2024-02-05
# 大規模クラウドシステムにおける依存性認識インシデントリンク Dependency Aware Incident Linking in Large Cloud Systems ( http://arxiv.org/abs/2403.18639v1 ) ライセンス: Link先を確認	Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan,	(参考訳) 信頼性向上のための多大な努力にもかかわらず、大規模クラウドサービスは必然的に、サービスの可用性と顧客満足度に大きな影響を与える本番環境のインシデントを経験します。さらに悪いことに、多くの場合、カスケード効果によって1つのインシデントが複数のダウンストリーム障害を引き起こし、依存関係にある異なるサービスにまたがって複数の関連インシデントを発生させます。多くの場合、オンコールエンジニア(OCE)は、これらのインシデントをサイロで調査し、大量の手作業の負担(toil)を発生させ、インシデントの緩和に要する全体的な時間を増加させる。したがって,効率的なインシデントリンクモデルの開発は,大規模な機能停止を迅速に解決し,オンコール疲労を軽減するために,関連するインシデントをクラスタにグループ化する上で極めて重要である。既存のインシデントリンク手法は、主にインシデント(タイトル、説明、重大さ、影響のあるコンポーネントなど)のテキスト情報とコンテキスト情報を活用しているため、サービス間の依存性を活用できない。本稿では、テキストおよびサービス依存グラフ情報を活用する依存性対応インシデントリンク(DiLink)フレームワークを提案し、同一サービスから来るインシデントリンクの精度とカバレッジを向上させるとともに、異なるサービスやワークロードからもたらされるインシデントリンクのカバレッジを改善する。さらに,Orthogonal Procrustesを用いてマルチモーダル(テキストおよびグラフィカル)データの埋め込みを整列する手法を提案する。 Microsoftの5つのワークロードによる実世界のインシデントに対する大規模な実験結果によると、アライメントメソッドのF1スコアは0.96(現在の最先端メソッドよりも14%向上)である。また、これらの5つのワークロードから610のサービスにこのソリューションをデプロイして、インシデント管理の改善と手作業による負担の削減を継続的にサポートしています。 Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that create several related incidents across different dependent services. Oftentimes, on-call engineers (OCEs) examine these incidents in silos, which leads to a significant amount of manual toil and increases the overall time to mitigate incidents. Therefore, developing efficient incident linking models is of paramount importance for grouping related incidents into clusters so as to quickly resolve major outages and reduce on-call fatigue. Existing incident linking methods mostly leverage textual and contextual information of incidents (e.g., title, description, severity, impacted components), thus failing to leverage the inter-dependencies between services. 
In this paper, we propose the dependency-aware incident linking (DiLink) framework, which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links coming not only from the same service, but also from different services and workloads. Furthermore, we propose a novel method to align the embeddings of multi-modal (i.e., textual and graphical) data using Orthogonal Procrustes. Extensive experimental results on real-world incidents from 5 workloads of Microsoft demonstrate that our alignment method has an F1-score of 0.96 (14% gain over current state-of-the-art methods). We are also in the process of deploying this solution across 610 services from these 5 workloads to continuously support OCEs in improving incident management and reducing manual toil.	翻訳日:2024-04-01 02:34:48 公開日:2024-02-05
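For reference, the Orthogonal Procrustes alignment named in the DiLink abstract has a standard closed-form solution. The block below states only the textbook problem. The symbols $A$ and $B$ (two embedding matrices of equal shape, one row per incident) and $R$ are illustrative assumptions, not notation from the paper, and the abstract does not say how DiLink builds or uses these matrices.

```latex
% Orthogonal Procrustes: the orthogonal map R that best aligns A to B
% in the Frobenius norm, obtained from one singular value decomposition.
R^{\star} = \arg\min_{R \,:\, R^{\top} R = I} \lVert A R - B \rVert_F ,
\qquad
A^{\top} B = U \Sigma V^{\top} \;\Longrightarrow\; R^{\star} = U V^{\top}.
```

Because $R^{\star}$ is orthogonal, it rotates or reflects one embedding space onto the other without changing distances inside it.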
# opML: ブロックチェーン上での楽観的機械学習 opML: Optimistic Machine Learning on Blockchain ( http://arxiv.org/abs/2401.17555v2 ) ライセンス: Link先を確認	KD Conway, Cathie So, Xiaohang Yu, Kartin Wong,	(参考訳) マシンラーニングとブロックチェーンテクノロジの統合は、分散化され、セキュアで、透明なAIサービスのビジョンによって、関心が高まっている。この文脈では、ブロックチェーンシステムがAIモデル推論を実行するための革新的なアプローチであるopML(Optimistic Machine Learning on chain)を導入します。 opMLの中核には、楽観的なロールアップシステムを思い出させる、インタラクティブな不正証明プロトコルがある。このメカニズムにより、MLサービスに対する分散的で検証可能なコンセンサスが保証され、信頼性と透明性が向上する。 zkML(Zero-Knowledge Machine Learning)とは異なり、opMLはコスト効率と高効率のMLサービスを提供し、参加要件は最小限である。注目すべきは、GPUのない標準PC上で7B-LLaMAのような広範な言語モデルの実行を可能にし、アクセシビリティを大幅に拡張することである。 opMLを通じてブロックチェーンとAIの能力を組み合わせることで、アクセスしやすく、セキュアで、効率的なオンチェーン機械学習に向けた変革的な旅を開始します。 The integration of machine learning with blockchain technology has witnessed increasing interest, driven by the vision of decentralized, secure, and transparent AI services. In this context, we introduce opML (Optimistic Machine Learning on chain), an innovative approach that empowers blockchain systems to conduct AI model inference. At the core of opML lies an interactive fraud proof protocol, reminiscent of the optimistic rollup systems. This mechanism ensures decentralized and verifiable consensus for ML services, enhancing trust and transparency. Unlike zkML (Zero-Knowledge Machine Learning), opML offers cost-efficient and highly efficient ML services, with minimal participation requirements. Remarkably, opML enables the execution of extensive language models, such as 7B-LLaMA, on standard PCs without GPUs, significantly expanding accessibility. By combining the capabilities of blockchain and AI through opML, we embark on a transformative journey toward accessible, secure, and efficient on-chain machine learning.	翻訳日:2024-03-25 12:08:11 公開日:2024-02-05
# 暗号コンピューティングを用いた市町村のサイバーリスクモデリング Municipal cyber risk modeling using cryptographic computing to inform cyber policymaking ( http://arxiv.org/abs/2402.01007v2 ) ライセンス: Link先を確認	Avital Baral, Taylor Reynolds, Lawrence Susskind, Daniel J. Weitzner, Angelina Wu,	(参考訳) 市町村は破壊的な結果を伴うサイバー攻撃に弱いが、彼らのリスクを評価し、彼らのセキュリティ姿勢を仲間と比較するための重要な情報がない。セキュリティ姿勢,インシデント,セキュリティコントロールの失敗,損失に関する,暗号化的にセキュアな計算プラットフォームを通じて収集された83の自治体のデータを用いて,我々は,自治体のためのデータ駆動型サイバーリスクモデルとサイバーセキュリティベンチマークを構築した。セクターにおけるセキュリティ姿勢のベンチマーク、サイバーインシデントの発生頻度、防衛姿勢に基づく組織に対する年次損失予測、個別の障害率と関連する損失に基づくサイバーコントロールの重み付けを作成している。これら4つの項目を組み合わせることで、セクター内のサイバーリスクを定量化し、対処すべきギャップを特定し、ポリシーの介入を優先順位付けし、時間とともに介入の進捗を追跡することで、サイバーポリシー作成のガイドに役立つ。市町村の場合、これらの新たなリスク対策は、サイバーセキュリティ対策の継続的な改善の必要性を強調し、弱みと強みを明確に示し、セキュリティ教育、インシデント対応、セキュリティドル当たりのリスク低減率が最も低い自治体への取り組みなど、早期の政策目標を政府に提供するものである。 Municipalities are vulnerable to cyberattacks with devastating consequences, but they lack key information to evaluate their own risk and compare their security posture to peers. Using data from 83 municipalities collected via a cryptographically secure computation platform about their security posture, incidents, security control failures, and losses, we build data-driven cyber risk models and cyber security benchmarks for municipalities. We produce benchmarks of the security posture in a sector, the frequency of cyber incidents, forecasted annual losses for organizations based on their defensive posture, and a weighting of cyber controls based on their individual failure rates and associated losses. Combined, these four items can help guide cyber policymaking by quantifying the cyber risk in a sector, identifying gaps that need to be addressed, prioritizing policy interventions, and tracking progress of those interventions over time. 
In the case of the municipalities, these newly derived risk measures highlight the need for continuous measured improvement of cybersecurity readiness, show clear areas of weakness and strength, and provide governments with some early targets for policy focus such as security education, incident response, and focusing efforts first on municipalities at the lowest security levels that have the highest risk reduction per security dollar invested.	翻訳日:2024-03-25 11:58:26 公開日:2024-02-05
# LLMプログラム合成と未整理のオブジェクトデータベースを用いたオープンユニバース屋内シーン生成 Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases ( http://arxiv.org/abs/2403.09675v1 ) ライセンス: Link先を確認	Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie,	(参考訳) テキストのプロンプトに応じて屋内シーンを生成するシステムを提案する。プロンプトはシーン記述の固定語彙に制限されず、生成されたシーン内のオブジェクトは固定されたオブジェクトカテゴリに制限されない。この設定をオープンユニバース屋内シーン生成と呼ぶ。屋内シーン生成に関するこれまでのほとんどの研究とは異なり、既存の3Dシーンの大規模なトレーニングデータセットは不要である。代わりに、事前訓練された大規模言語モデル(LLM)に符号化された世界知識を活用して、オブジェクトとそれらの間の空間関係を記述するドメイン固有のレイアウト言語でプログラムを合成する。このようなプログラムを実行すると制約満足度問題の仕様が作成され、勾配に基づく最適化スキームを用いてオブジェクトの位置と向きを生成する。オブジェクトの幾何学を生成するために、システムはデータベースから3Dメッシュを検索する。カテゴリアノテートされた相互整合メッシュのデータベースを使用する以前の作業とは異なり、視覚言語モデル(VLM)を使用して、非アノテートで一貫性のないメッシュの巨大なデータベースからメッシュを取得するパイプラインを開発する。実験により,本システムは従来のクローズドユニバースのシーン生成タスクにおいて,3次元データに基づいて訓練された生成モデルよりも優れており,また,オープンユニバースのシーン生成における最近のLLMに基づくレイアウト生成手法よりも優れていた。 We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting open-universe indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of existing 3D scenes. Instead, it leverages the world knowledge encoded in pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language that describe objects and spatial relations between them. Executing such a program produces a specification of a constraint satisfaction problem, which the system solves using a gradient-based optimization scheme to produce object positions and orientations. To produce object geometry, the system retrieves 3D meshes from a database. 
Unlike prior work which uses databases of category-annotated, mutually-aligned meshes, we develop a pipeline using vision-language models (VLMs) to retrieve meshes from massive databases of un-annotated, inconsistently-aligned meshes. Experimental evaluations show that our system outperforms generative models trained on 3D data for traditional, closed-universe scene generation tasks; it also outperforms a recent LLM-based layout generation method on open-universe scene generation.	翻訳日:2024-03-25 08:06:28 公開日:2024-02-05
# 勾配降下型モルフォロジーニューラルネットワークの訓練 : いくつかの理論的考察 Training morphological neural networks with gradient descent: some theoretical insights ( http://arxiv.org/abs/2403.12975v1 ) ライセンス: Link先を確認	Samy Blusseau,	(参考訳) モルフォロジーニューラルネットワーク(英: Morphological Neural Network、または層)は、完全な格子演算子の表現のような理論的側面や画像処理パイプラインの開発において、数学的形態学の進歩を促進する強力なツールである。しかしながら、これらのアーキテクチャは、少なくとも勾配降下に基づく最適化アルゴリズムを使用する一般的な機械学習フレームワークにおいて、いくつかの形態的レイヤを数えると、トレーニングが困難であることが判明した。本稿では、ブーリガンド微分の非滑らかな最適化概念を考慮して、微分に基づくアプローチと形態素ネットワークに適用されるバックプロパゲーションの可能性と限界について検討する。我々は、特に初期化と学習率に関する洞察と最初の理論的ガイドラインを提供する。 Morphological neural networks, or layers, can be a powerful tool to boost the progress in mathematical morphology, either on theoretical aspects such as the representation of complete lattice operators, or in the development of image processing pipelines. However, these architectures turn out to be difficult to train when they count more than a few morphological layers, at least within popular machine learning frameworks which use gradient descent based optimization algorithms. In this paper we investigate the potential and limitations of differentiation based approaches and back-propagation applied to morphological networks, in light of the non-smooth optimization concept of Bouligand derivative. We provide insights and first theoretical guidelines, in particular regarding initialization and learning rates.	翻訳日:2024-03-25 07:27:10 公開日:2024-02-05
# 動的アタックグラフを用いたモノのインターネット(IoT)環境における脅威モデリング Threat Modelling in Internet of Things (IoT) Environment Using Dynamic Attack Graphs ( http://arxiv.org/abs/2310.01689v2 ) ライセンス: Link先を確認	Marwa Salayma,	(参考訳) 本研究は,IoT(Internet of Things,モノのインターネット)環境における攻撃経路の変更を動的に表現するための脅威モデリング手法を提案する。提案手法では,攻撃グラフを用いた脅威の伝播について検討する。しかし、従来のアタックグラフアプローチは、エンタープライズネットワークのように継続的に変更されない静的環境に適用され、静的で通常非常に大きなアタックグラフにつながる。対照的に、IoT環境は動的変更と相互接続によって特徴づけられることが多い。このような新たな相互接続は、対応するアタックグラフが変化するデバイス間の到達性の変化につながる。これは、脅威とリスク分析のための動的トポロジとアタックグラフを必要とする。本稿では,IoT環境において発生する動的システム変化に対処し,システムダイナミクスを許容しながら攻撃経路の特定を可能にする脅威モデリング手法を開発した。動的トポロジとアタックグラフは、関連するグラフを維持することで、IoT環境の変化に迅速に対応できる。研究の動機付けと提案したアプローチを説明するために,医療システムに基づく事例シナリオを紹介した。提案手法はグラフデータベース管理ツール(GDBM)-Neo4jを用いて実装されている。これは高度に接続されたデータのグラフをマッピング、視覚化、クエリするための一般的なツールであり、高速な脅威モデリングメカニズムを提供することで効率よく、動的IoT環境におけるセキュリティ変更のキャプチャに適している。 This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, i.e., when new devices are added or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs. However, traditional attack graph approaches have been applied in static environments that do not continuously change such as the Enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. 
In this paper, a threat modelling approach is developed that copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths whilst allowing for system dynamics. Dynamic topology and attack graphs were developed that cope rapidly with changes in the IoT environment by maintaining their associated graphs. To motivate the work and illustrate the proposed approach, an example scenario based on healthcare systems is introduced. The proposed approach is implemented using a Graph Database Management Tool (GDBM), Neo4j, which is a popular tool for mapping, visualising and querying the graphs of highly connected data, and is efficient in providing a rapid threat modelling mechanism, which makes it suitable for capturing security changes in the dynamic IoT environment.	翻訳日:2024-03-19 03:31:41 公開日:2024-02-05
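To illustrate why a topology change alters the set of attack paths, here is a minimal sketch in plain Python. It is not the paper's Neo4j implementation: the function `attack_paths`, the edge-set representation, and every device name are hypothetical, chosen only to show how paths can be recomputed when a device joins.

```python
def attack_paths(edges, source, target):
    """Enumerate all simple (cycle-free) paths from source to target."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    paths = []
    stack = [(source, [source])]  # depth-first search over partial paths
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                stack.append((nxt, path + [nxt]))
    return sorted(paths)


# Hypothetical healthcare topology; every device name here is made up.
edges = {("internet", "gateway"), ("gateway", "ehr_server")}
before = attack_paths(edges, "internet", "ehr_server")

# A wearable joins the network and opens a second route to the server.
edges |= {("internet", "wearable"), ("wearable", "ehr_server")}
after = attack_paths(edges, "internet", "ehr_server")
```

Recomputing every path after each change is the simplest option and is only practical for small graphs. A graph database would instead update the stored graph in place and re-run a path query.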
# ALBERTA: トランスフォーマーアーキテクチャにおけるalgorithmベースのエラーレジリエンス ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures ( http://arxiv.org/abs/2310.03841v2 ) ライセンス: Link先を確認	Haoxuan Liu, Vasu Singh, Michał Filipiuk, Siva Kumar Sastry Hari,	(参考訳) ビジョントランスフォーマーは、信頼性の高い安全クリティカルなアプリケーションにますますデプロイされている。過渡的ハードウェアエラーのような潜在的なエラーにもかかわらず、実行の正確性を保証することが不可欠である。アルゴリズムに基づく新しいレジリエンスフレームワークであるALBERTAを提案し、エンドツーエンドのレジリエンス分析とトランスフォーマーベースのアーキテクチャの保護を実現する。まず、トランス層のレジリエンスを計算し、ランク付けする効率的なプロセスを開発する。トランスモデルの規模が大きいため、従来のネットワーク冗長性を最も脆弱なレイヤのサブセットに適用することは、過激なオーバーヘッドを伴うにもかかわらず、高いエラーカバレッジを提供する。本稿では,浮動小数点演算と整数演算を併用したトランスフォーマーモデルにおいて,最も脆弱な汎用行列乗算(GEMM)層を保護することを目的とした,ソフトウェア指向のチェックサムに基づくエラー検出手法を提供することにより,この問題に対処する。その結果,提案手法は,それぞれ0.2%未満のミスマッチと0.01%のメモリオーバヘッドのミスマッチを生じるエラーに対して,99%以上のカバレッジを達成できた。最後に、異なる数値精度で、最新のGPUアーキテクチャにおける我々のフレームワークの適用性を示す。本稿では,誤り検出を平均2%未満のオーバーヘッドで解決する,効率的な自己補正機構を提案する。 Vision Transformers are being increasingly deployed in safety-critical applications that demand high reliability. It is crucial to ensure the correctness of their execution in spite of potential errors such as transient hardware errors. We propose a novel algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis and protection of transformer-based architectures. First, our work develops an efficient process of computing and ranking the resilience of transformers layers. We find that due to the large size of transformer models, applying traditional network redundancy to a subset of the most vulnerable layers provides high error coverage albeit with impractically high overhead. We address this shortcoming by providing a software-directed, checksum-based error detection technique aimed at protecting the most vulnerable general matrix multiply (GEMM) layers in the transformer models that use either floating-point or integer arithmetic. 
Results show that our approach achieves over 99% coverage for errors that result in a mismatch with less than 0.2% and 0.01% computation and memory overheads, respectively. Lastly, we present the applicability of our framework in various modern GPU architectures under different numerical precisions. We introduce an efficient self-correction mechanism for resolving erroneous detection with an average of less than 2% overhead per error.	翻訳日:2024-03-19 03:12:08 公開日:2024-02-05
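The abstract above does not spell out ALBERTA's checksum construction, so the following is only a sketch of the classic algorithm-based fault tolerance idea for matrix multiplication, which may differ from the paper's technique. It relies on the identity (1ᵀA)B = 1ᵀ(AB): the column sums of the product must equal the column-sum row of A multiplied by B. The function names are hypothetical, and exact equality is used because the example is integer-only. Floating-point GEMMs would need a tolerance.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_sums(m):
    """Sum each column of m, giving the checksum row 1^T m."""
    return [sum(col) for col in zip(*m)]


def checksum_ok(a, b, c):
    """Return True if c passes the column-checksum test for a @ b."""
    expected = matmul([column_sums(a)], b)[0]  # (1^T A) B, one extra row
    return expected == column_sums(c)          # compare with 1^T C


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
clean = checksum_ok(a, b, c)

c[0][1] += 1  # simulate a single corrupted output element
faulty = checksum_ok(a, b, c)
```

The check costs one extra vector-matrix product plus the column sums, which is why checksum schemes of this kind are cheaper than recomputing the whole layer.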
# 楽観的ロールアップのインセンティブ非適合性 Incentive Non-Compatibility of Optimistic Rollups ( http://arxiv.org/abs/2312.01549v2 ) ライセンス: Link先を確認	Daji Landis,	(参考訳) 楽観的ロールアップ(Optimistic rollups)は、その基盤となるチェーンのスループットを向上する、人気があり有望な方法である。これらの方法は、その安全性を保証するために経済的なインセンティブに依存している。本稿で提示する楽観的ロールアップのモデルは、インセンティブがプレイヤーに期待される行動と必ずしも一致していないことを示唆し、既存の楽観的ロールアップのセキュリティを損なう可能性がある。我々のモデルから明らかになった潜在的な解決策について議論する。 Optimistic rollups are a popular and promising method of increasing the throughput capacity of their underlying chain. These methods rely on economic incentives to guarantee their security. We present a model of optimistic rollups that suggests that the incentives are not necessarily aligned with the expected behavior of the players, thus potentially undermining the security of existing optimistic rollups. We discuss some potential solutions illuminated by our model.	翻訳日:2024-03-18 13:15:35 公開日:2024-02-05
# Code-based Single-Server Private Information Retrieval: Circumventing the Sub-Query Attack Code-Based Single-Server Private Information Retrieval: Circumventing the Sub-Query Attack ( http://arxiv.org/abs/2402.02871v1 ) ライセンス: Link先を確認	Neehar Verma, Camilla Hollanti,	(参考訳) ランダムな線形コードを用いて,単一サーバからのプライベート情報検索を検討する。 Holzbaur、Hollanti、Wachter-Zehによって [Holzbaur et al., "Computational Code-Based Single-Server Private Information Retrieval", 2020 IEEE ISIT] で提案された最初のコードベースのシングルサーバ計算PIRスキームの修正版を提示する。元のスキームは [Bordage et al., "On the privacy of a code-based single-server computational PIR scheme", Cryptogr. Comm., 2021] で、ユーザのクエリのサブ行列に高い確率で生じるランク差を利用した攻撃によって破られた。ここで、この攻撃は、サブ行列が無視できるランク差を持つことを保証することで回避される。さらに、ランク差を所望のファイルインデックスに関連付けることができないため、スキームのプライバシーを確保することができる。複数のファイルを取得する場合、修正されたスキームのレートは、ほとんど影響を受けず、元のスキームと同等である。 Private information retrieval from a single server is considered, utilizing random linear codes. Presented is a modified version of the first code-based single-server computational PIR scheme proposed by Holzbaur, Hollanti, and Wachter-Zeh in [Holzbaur et al., "Computational Code-Based Single-Server Private Information Retrieval", 2020 IEEE ISIT]. The original scheme was broken in [Bordage et al., "On the privacy of a code-based single-server computational PIR scheme", Cryptogr. Comm., 2021] by an attack arising from highly probable rank differences in sub-matrices of the user's query. Here, this attack is now circumvented by ensuring that the sub-matrices have negligible rank difference. Furthermore, the rank difference cannot be attributed to the desired file index, thereby ensuring the privacy of the scheme. In the case of retrieving multiple files, the rate of the modified scheme is largely unaffected and at par with the original scheme.	翻訳日:2024-03-18 07:57:54 公開日:2024-02-05
# UniHENN:im2colを使わずに、より多彩な同型暗号化ベースのCNNを設計 UniHENN: Designing More Versatile Homomorphic Encryption-based CNNs without im2col ( http://arxiv.org/abs/2402.03060v1 ) ライセンス: Link先を確認	Hyunmin Choi, Jihun Kim, Seungho Kim, Seonhye Park, Jeongyong Park, Wonbin Choi, Hyoungshick Kim,	(参考訳) ホモモルフィック暗号化は、プライバシ保護クラウドサービスにとって重要な復号化のない暗号化データの計算を可能にする。しかし、同相暗号を用いた畳み込みニューラルネットワーク(CNN)のデプロイは、特に、インプットデータを畳み込みのための2次元行列に変換する際に、重要な課題に直面する。この方法は効率的なが、暗号化されたデータ構造との互換性の制約により、デプロイ可能なCNNモデルの多様性を制限する。 UniHENNは、同型暗号に基づくCNNアーキテクチャであり、Im2colの必要性を排除し、同型暗号を用いた様々なCNNモデルとの互換性を確保する。実験の結果,UniHENNは2次元CNN推論アーキテクチャであるPyCrCNNを平均30.090秒で上回り,PyCrCNNの794.064秒より大幅に高速であることがわかった。さらに、UniHENNは、高要求のクラウドアプリケーションにとって不可欠な機能である、同時イメージの処理にim2colを使用するTenSEALよりも優れています。 UniHENNの汎用性は、1Dと6つの異なる2D CNNを含むさまざまなCNNアーキテクチャで証明されており、柔軟性と効率性を強調している。これらの品質は、UniHENNを、クラウドコンピューティング環境におけるスケーラブルでセキュアで効率的なディープラーニングの需要の増加に対処する、プライバシ保護、クラウドベースのCNNサービスの有望なソリューションとして確立している。 Homomorphic encryption enables computations on encrypted data without decryption, which is crucial for privacy-preserving cloud services. However, deploying convolutional neural networks (CNNs) with homomorphic encryption encounters significant challenges, particularly in converting input data into a two-dimensional matrix for convolution, typically achieved using the im2col technique. While efficient, this method limits the variety of deployable CNN models due to compatibility constraints with the encrypted data structure. UniHENN, a homomorphic encryption-based CNN architecture, eliminates the need for im2col, ensuring compatibility with a diverse range of CNN models using homomorphic encryption. Our experiments demonstrate that UniHENN surpasses the leading 2D CNN inference architecture, PyCrCNN, in inference time, as evidenced by its performance on the LeNet-1 dataset, where it averages 30.090 seconds--significantly faster than PyCrCNN's 794.064 seconds. 
Furthermore, UniHENN outperforms TenSEAL, which employs im2col, in processing concurrent images, an essential feature for high-demand cloud applications. The versatility of UniHENN is proven across various CNN architectures, including 1D and six different 2D CNNs, highlighting its flexibility and efficiency. These qualities establish UniHENN as a promising solution for privacy-preserving, cloud-based CNN services, addressing the increasing demand for scalable, secure, and efficient deep learning in cloud computing environments.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
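For readers unfamiliar with im2col, the technique UniHENN avoids, here is a minimal plaintext sketch. It shows only the ordinary unencrypted transformation, not anything specific to homomorphic encryption or to UniHENN. The function names are hypothetical, and the example assumes a single channel, stride 1, and no padding. Patches are stored one per row (sometimes called im2row), and the kernel is applied without flipping, as CNN frameworks usually do.

```python
def im2col(image, kh, kw):
    """Flatten every kh x kw patch of a 2D image into one row."""
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
            for i in range(h - kh + 1)
            for j in range(w - kw + 1)]


def conv2d_via_im2col(image, kernel):
    """Compute a 2D convolution as dot products over the patch matrix."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    # One dot product per output pixel, in row-major order.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    out_w = len(image[0]) - kw + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
patches = im2col(image, 2, 2)
output = conv2d_via_im2col(image, kernel)
```

The patch matrix repeats each input pixel once for every patch that overlaps it, so it is larger than the image itself.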
# 仮想空間におけるセキュリティとプライバシの強化:拡張現実感デバイスの分析 Augmenting Security and Privacy in the Virtual Realm: An Analysis of Extended Reality Devices ( http://arxiv.org/abs/2402.03114v1 ) ライセンス: Link先を確認	Derin Cayir, Abbas Acar, Riccardo Lazzeretti, Marco Angelini, Mauro Conti, Selcuk Uluagac,	(参考訳) 本研究では,セキュリティとプライバシ攻撃のデバイス中心の分析と,拡張現実性(XR)デバイスに対する防御について紹介し,堅牢でプライバシに配慮したセキュリティメカニズムの必要性を強調した。本分析に基づき,XRデバイスのセキュリティとプライバシの確保を支援するため,今後の研究方針を提示し,設計上の考慮事項を提案する。 In this work, we present a device-centric analysis of security and privacy attacks and defenses on Extended Reality (XR) devices, highlighting the need for robust and privacy-aware security mechanisms. Based on our analysis, we present future research directions and propose design considerations to help ensure the security and privacy of XR devices.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
# 大規模言語モデルを用いた詐欺検出 Detecting Scams Using Large Language Models ( http://arxiv.org/abs/2402.03147v1 ) ライセンス: Link先を確認	Liming Jiang,	(参考訳) 大規模言語モデル(LLM)は、セキュリティなど、様々なアプリケーションで注目を集めている。本稿では,サイバーセキュリティの重要な側面である詐欺検知におけるLLMの有用性について検討する。従来のアプリケーションとは違って,フィッシングや前払い詐欺,ロマンス詐欺などの詐欺を識別するための,LLMの新しいユースケースを提案する。本稿では,LLMのセキュリティ応用について述べるとともに,詐欺によって引き起こされるユニークな課題について論じる。具体的には、LLMを用いた効率的な詐欺検知器の構築、データ収集、前処理、モデル選択、トレーニング、ターゲットシステムへの統合などに関わる重要なステップについて概説する。さらに、複製メール上でGPT-3.5とGPT-4を用いて予備評価を行い、フィッシングや詐欺メールの一般的な兆候を特定する能力を強調した。その結果、疑わしい要素を認識する上でモデルの有効性が示されたが、様々な言語タスクにまたがる包括的な評価の必要性を強調した。この論文は、進化する脅威に適応するために、継続的な改善とサイバーセキュリティの専門家とのコラボレーションの重要性を概説することで締めくくっている。 Large Language Models (LLMs) have gained prominence in various applications, including security. This paper explores the utility of LLMs in scam detection, a critical aspect of cybersecurity. Unlike traditional applications, we propose a novel use case for LLMs to identify scams, such as phishing, advance fee fraud, and romance scams. We present notable security applications of LLMs and discuss the unique challenges posed by scams. Specifically, we outline the key steps involved in building an effective scam detector using LLMs, emphasizing data collection, preprocessing, model selection, training, and integration into target systems. Additionally, we conduct a preliminary evaluation using GPT-3.5 and GPT-4 on a duplicated email, highlighting their proficiency in identifying common signs of phishing or scam emails. The results demonstrate the models' effectiveness in recognizing suspicious elements, but we emphasize the need for a comprehensive assessment across various language tasks. The paper concludes by underlining the importance of ongoing refinement and collaboration with cybersecurity experts to adapt to evolving threats.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
# 静電気サイドチャネル攻撃に対する軽量マスキング Lightweight Masking Against Static Power Side-Channel Attacks ( http://arxiv.org/abs/2402.03196v1 ) ライセンス: Link先を確認	Jitendra Bhandari, Mohammed Nabeel, Likhitha Mankali, Ozgur Sinanoglu, Ramesh Karri, Johann Knechtel,	(参考訳) 本稿では,静的電力サイドチャネル攻撃(PSCA)に対する新たな防御戦略を提案する。本手法は,(1)合成中の高Vthと低Vthのセル選択を注意深く調整し,セキュリティとタイミングの影響を考慮し,(2)実行時にこれらのセル間の操作をランダムに切り替えることに基づく。このアプローチは静的PSCAの中心にある、非常に曖昧な静的パワーパターンに役立ちます。商業用28nmノードを用いた実験の結果,攻撃に要する労力は96倍に増加した。これまでの対策と比較すると、コストは少なく、軽量な防御手段となっている。 This paper presents a novel defense strategy against static power side-channel attacks (PSCAs), a critical threat to cryptographic security. Our method is based on (1) carefully tuning high-Vth versus low-Vth cell selection during synthesis, accounting for both security and timing impact, and (2), at runtime, randomly switching the operation between these cells. This approach serves to significantly obscure static power patterns, which are at the heart of static PSCAs. Our experimental results on a commercial 28nm node show a drastic increase in the effort required for a successful attack, namely up to 96 times more traces. When compared to prior countermeasures, ours incurs little cost, making it a lightweight defense.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
# SOAP: ソーシャル認証プロトコル SOAP: A Social Authentication Protocol ( http://arxiv.org/abs/2402.03199v1 ) ライセンス: Link先を確認	Felix Linker, David Basin,	(参考訳) ソーシャル認証は、メッセージアプリケーションにおける手動鍵認証を置き換えるための、使用可能な認証儀式として提案されている。ソーシャル認証を使用して、チャットパートナーは、アイデンティティプロバイダが管理するデジタルIDを使用して、仲間を認証する。本稿では、社会認証を正式に定義し、社会認証をほぼ自動化し、SOAPのセキュリティを正式に証明し、SOAPの実用性を2つのプロトタイプで実証するSOAPと呼ばれるプロトコルを提示する。 1つのプロトタイプはWebベースで、もう1つはオープンソースのSignalメッセージングアプリケーションで実装されている。 SOAPを使用すると、ユーザーは自分のメッセージングアカウントを妥協する際の限界を著しく高めることができる。 SignalやWhatsAppのようなメッセージングアプリケーションが提供するデフォルトのセキュリティとは対照的に、攻撃者はメッセージアカウントと、被害者を攻撃するためにすべてのIDプロバイダが管理するIDの両方を侵害しなければならない。セキュリティと自動化に加えて、SOAPはよく確立されたOpenID Connectプロトコルの上に構築されているので、簡単に採用できます。 Social authentication has been suggested as a usable authentication ceremony to replace manual key authentication in messaging applications. Using social authentication, chat partners authenticate their peers using digital identities managed by identity providers. In this paper, we formally define social authentication, present a protocol called SOAP that largely automates social authentication, formally prove SOAP's security, and demonstrate SOAP's practicality in two prototypes. One prototype is web-based, and the other is implemented in the open-source Signal messaging application. Using SOAP, users can significantly raise the bar for compromising their messaging accounts. In contrast to the default security provided by messaging applications such as Signal and WhatsApp, attackers must compromise both the messaging account and all identity provider-managed identities to attack a victim. In addition to its security and automation, SOAP is straightforward to adopt as it is built on top of the well-established OpenID Connect protocol.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
# VLCシステムの物理層セキュリティ向上のためのIRS誘起時間遅延の活用 Leveraging IRS Induced Time Delay for Enhanced Physical Layer Security in VLC Systems ( http://arxiv.org/abs/2402.03202v1 ) ライセンス: Link先を確認	Rashid Iqbal, Mauro Biagi, Ahmed Zoha, Muhammad Ali Imran, Hanaa Abumarshoud,	(参考訳) 室内可視光通信(VLC)は、光が伝播する狭い領域の外側の攻撃者に対して安全であると考えられているが、それでもカバー領域内からの傍受に対しては脆弱である。インテリジェント反射面(IRS)と呼ばれる新しい技術が最近導入され、物理層セキュリティ(PLS)を強化する方法を提供している。 IRS支援型VLCのほとんどの研究は、全ての反射要素からの到着時刻が同一であると仮定し、時間遅延と関連するシンボル間干渉の影響を見落としている。本稿は,VLCシステムにおける時間遅延が機密保持率に与える影響を初めて取り上げる。その結果、3Wの固定発光ダイオード(LED)パワーでは、盗聴器がLEDの半径1m以内にある場合、ランダムな位置にある正規ユーザの機密保持率を最大253%向上させることができることがわかった。以上の結果から, 盗聴器がより有利な位置にあり, 正規ユーザよりもチャネルゲインが高い場合であっても, IRS 要素を慎重に割り当てることによりPLS が向上することが示唆された。 Indoor visible light communication (VLC) is considered secure against attackers outside the confined area where the light propagates, but it is still susceptible to interception from inside the coverage area. A new technology, intelligent reflecting surfaces (IRS), has been recently introduced, offering a way to enhance physical layer security (PLS). Most research on IRS-assisted VLC assumes the same time of arrival from all reflecting elements and overlooks the effect of time delay and the associated intersymbol interference. This paper tackles, for the first time, the effect of time delay on the secrecy rate in VLC systems. Our results show that, at a fixed light-emitting diode (LED) power of 3W, the secrecy rate can be enhanced by up to 253% at random positions for the legitimate user when the eavesdropper is located within a 1-meter radius of the LED. Our results also show that careful allocation of the IRS elements can lead to enhanced PLS even when the eavesdropper has a more favourable position and, thus, a better channel gain than the legitimate user.	翻訳日:2024-03-18 07:48:02 公開日:2024-02-05
# Extending RAIM with a Gaussian Mixture of Opportunistic Information ( http://arxiv.org/abs/2402.03449v1 ) License: see linked page	Wenjie Liu, Panos Papadimitratos	GNSS are indispensable for various applications, but they are vulnerable to spoofing attacks. The original receiver autonomous integrity monitoring (RAIM) was not designed for securing GNSS. In this context, RAIM was extended with wireless signals, termed signals of opportunity (SOPs), or onboard sensors, typically assumed benign. However, attackers might also manipulate wireless networks, raising the need for a solution that considers untrustworthy SOPs. To address this, we extend RAIM by incorporating all opportunistic information, i.e., measurements from terrestrial infrastructures and onboard sensors, culminating in one function for robust GNSS spoofing detection. The objective is to assess the likelihood of GNSS spoofing by analyzing locations derived from extended RAIM solutions, which include location solutions from GNSS pseudorange subsets and wireless signal subsets of untrusted networks. Our method comprises two pivotal components: subset generation and location fusion. Subsets of ranging information are created and processed through positioning algorithms, producing temporary locations. Onboard sensors provide speed, acceleration, and attitude data, aiding in location filtering based on motion constraints. The filtered locations, modeled with uncertainty, are fused into a composite likelihood function normalized for GNSS spoofing detection. Theoretical assessments of GNSS-only and multi-infrastructure scenarios under uncoordinated and coordinated attacks are conducted. The detection of these attacks is feasible when the number of benign subsets exceeds a specific threshold. A real-world dataset from the Kista area is used for experimental validation. Comparative analysis against baseline methods shows a significant improvement in detection accuracy achieved by our Gaussian Mixture RAIM approach. Moreover, we discuss leveraging RAIM results for plausible location recovery.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
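The two components named in the RAIM abstract above, subset generation and location fusion, can be sketched roughly as follows. This is an illustrative simplification and not the authors' implementation: the measurement identifiers, the candidate locations, the equal mixture weights, and the isotropic 2D Gaussians are all assumptions made here, and the positioning solver and motion-based filtering are left out.

```python
import math
from itertools import combinations


def measurement_subsets(measurement_ids, min_size):
    """Yield every subset of the measurements with at least `min_size` members."""
    for size in range(min_size, len(measurement_ids) + 1):
        yield from combinations(measurement_ids, size)


def mixture_likelihood(point, components):
    """Evaluate an equally weighted mixture of isotropic 2D Gaussians at `point`.

    Each component is ((mean_x, mean_y), sigma), standing in for one
    subset-derived location and its uncertainty.
    """
    total = 0.0
    for (mean_x, mean_y), sigma in components:
        dist_sq = (point[0] - mean_x) ** 2 + (point[1] - mean_y) ** 2
        total += math.exp(-dist_sq / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return total / len(components)


# Hypothetical pseudorange identifiers; a 3D fix with clock bias needs at least four.
subset_list = list(measurement_subsets(["G1", "G2", "G3", "G4", "G5"], min_size=4))

# Hypothetical subset-derived locations in metres (a positioning solver would produce these).
components = [((0.0, 0.0), 5.0), ((2.0, -1.0), 5.0), ((1.0, 1.5), 5.0), ((60.0, 40.0), 5.0)]

consistent = mixture_likelihood((1.0, 0.0), components)
outlying = mixture_likelihood((30.0, 20.0), components)
print(len(subset_list), consistent > outlying)
```

A position that agrees with most subset-derived locations scores a higher mixture likelihood than one far from all of them, which is the intuition behind using the fused function as a spoofing indicator.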
# A security framework for Ethereum smart contracts ( http://arxiv.org/abs/2402.03555v1 ) License: see linked page	Antonio López Vivar, Ana Lucila Sandoval Orozco, Luis Javier García Villalba	The use of blockchain and smart contracts has not stopped growing in recent years. Like all software that begins to expand its use, it is also beginning to be targeted by hackers who will try to exploit vulnerabilities in both the underlying technology and the smart contract code itself. While many tools already exist for analyzing vulnerabilities in smart contracts, the heterogeneity and variety of approaches and the differences in how analysis data is provided make the learning curve for the smart contract developer steep. In this article the authors present ESAF (Ethereum Security Analysis Framework), a framework for analysis of smart contracts that aims to unify and facilitate the task of analyzing smart contract vulnerabilities, and which can be used as a persistent security monitoring tool for a set of target contracts as well as a classic vulnerability analysis tool, among other uses.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
# A novel pattern recognition system for detecting Android malware by analyzing suspicious boot sequences ( http://arxiv.org/abs/2402.03562v1 ) License: see linked page	Jorge Maestre Vidal, Marco Antonio Sotelo Monge, Luis Javier García Villalba	This paper introduces a malware detection system for smartphones based on studying the dynamic behavior of suspicious applications. The main goal is to prevent the installation of malicious software on the victim systems. The approach focuses on identifying malware targeting the Android platform. For that purpose, only the system calls performed during the boot process of the recently installed applications are studied. Thereby the amount of information to be considered is reduced, since only activities related to their initialization are taken into account. The proposal defines a pattern recognition system with three processing layers: monitoring, analysis and decision-making. First, in order to extract the sequences of system calls, the potentially compromised applications are executed in a safe and isolated environment. Then the analysis step generates the metrics required for decision-making. This level combines sequence alignment algorithms with bagging, which allows scoring the similarity between the extracted sequences considering their regions of greatest resemblance. At the decision-making stage, the Wilcoxon signed-rank test is implemented, which determines if the new software is labeled as legitimate or malicious. The proposal has been tested in different experiments that include an in-depth study of a particular use case, and the evaluation of its effectiveness when analyzing samples of well-known public datasets. Promising experimental results have been shown, hence demonstrating that the approach is a good complement to the strategies in the literature.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
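The analysis layer described above scores sequences by their regions of greatest resemblance, which is what a local alignment does. The sketch below uses a plain Smith-Waterman score as a stand-in; the abstract does not say which alignment algorithm or scoring values the authors use, so the scoring parameters and the system call traces are hypothetical, and the bagging and Wilcoxon signed-rank steps are not covered.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment score between two token sequences."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(a, b, match=2):
    """Normalize the local score by the best score the shorter sequence could reach."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shortest)


# Hypothetical boot-time system call traces (illustrative only).
known_benign = ["open", "read", "mmap", "close", "socket", "connect"]
new_app = ["fork", "open", "read", "mmap", "close", "write"]
unrelated = ["ioctl", "ioctl", "futex", "futex"]

print(similarity(known_benign, new_app))
print(similarity(known_benign, unrelated))
```

The first pair shares a contiguous run of four calls and scores well above the second pair, which shares none. A decision layer could then compare such scores against reference sets of legitimate and malicious traces.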
# The Invisible Game on the Internet: A Case Study of Decoding Deceptive Patterns ( http://arxiv.org/abs/2402.03569v1 ) License: see linked page	Zewei Shi, Ruoxi Sun, Jieshan Chen, Jiamou Sun, Minhui Xue	Deceptive patterns are design practices embedded in digital platforms to manipulate users, representing a widespread and long-standing issue in the web and mobile software development industry. Legislative actions highlight the urgency of globally regulating deceptive patterns. However, despite advancements in detection tools, a significant gap exists in assessing deceptive pattern risks. In this study, we introduce a comprehensive approach involving the interactions between the Adversary, Watchdog (e.g., detection tools), and Challengers (e.g., users) to formalize and decode deceptive pattern threats. Based on this, we propose a quantitative risk assessment system. Representative cases are analyzed to showcase the practicability of the proposed risk scoring system, emphasizing the importance of involving human factors in deceptive pattern risk assessment.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
# Matcha: An IDE Plugin for Creating Accurate Privacy Nutrition Labels ( http://arxiv.org/abs/2402.03582v1 ) License: see linked page	Tianshi Li, Lorrie Faith Cranor, Yuvraj Agarwal, Jason I. Hong	Apple and Google introduced their versions of privacy nutrition labels to the mobile app stores to better inform users of the apps' data practices. However, these labels are self-reported by developers and have been found to contain many inaccuracies due to misunderstandings of the label taxonomy. In this work, we present Matcha, an IDE plugin that uses automated code analysis to help developers create accurate Google Play data safety labels. Developers can benefit from Matcha's ability to detect user data accesses and transmissions while staying in control of the generated label by adding custom Java annotations and modifying an auto-generated XML specification. Our evaluation with 12 developers showed that Matcha helped our participants improve the accuracy of a label they created with Google's official tool for a real-world app they developed. We found that participants preferred Matcha for its accuracy benefits. Drawing on Matcha, we discuss general design recommendations for developer tools used to create accurate standardized privacy notices.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
# Reverse Engineering and Security Evaluation of Commercial Tags for RFID-Based IoT Applications ( http://arxiv.org/abs/2402.03591v1 ) License: see linked page	Tiago M. Fernández-Caramés, Paula Fraga-Lamas, Manuel Suárez-Albela, Luis Castedo	The Internet of Things (IoT) is a distributed system of physical objects that requires the seamless integration of hardware (e.g., sensors, actuators, electronics) and network communications in order to collect and exchange data. IoT smart objects need to be somehow identified to determine the origin of the data and to automatically detect the elements around us. One of the best positioned technologies to perform identification is RFID (Radio Frequency Identification), which in recent years has gained a lot of popularity in applications like access control, payment cards or logistics. Despite its popularity, RFID security has not been properly handled in numerous applications. To foster security in such applications, this article includes three main contributions. First, in order to establish the basics, a detailed review of the most common flaws found in RFID-based IoT systems is provided, including the latest attacks described in the literature. Second, a novel methodology that eases the detection and mitigation of such flaws is presented. Third, the latest RFID security tools are analyzed and the methodology proposed is applied through one of them (Proxmark 3) to validate it. Thus, the methodology is tested in different scenarios where tags are commonly used for identification. In such systems it was possible to clone transponders, extract information, and even emulate both tags and readers. Therefore, it is shown that the methodology proposed is useful for auditing security and reverse engineering RFID communications in IoT applications. It must be noted that, although this paper is aimed at fostering RFID communications security in IoT applications, the methodology can be applied to any RFID communications protocol.	Translated: 2024-03-18 07:48:02 Published: 2024-02-05
# Nurses as agents for achieving Environmentally Sustainable Health Systems: A bibliometric analysis ( http://arxiv.org/abs/2403.05543v1 ) License: see linked page	Olga Maria Luque Alcaraz, Pilar Aparicio-Martínez, Antonio Gomera, Manuel Vaquero-Abellán	Objective: To analyze the current scientific knowledge and research lines focused on environmentally sustainable health systems, including the role of nurses. Background: There seem to be differences between creating interventions focused on environmentally sustainable health systems, including nurses, and the scarcity of research on this topic, framed on the Sustainable Development Goals. Methods: A bibliometric analysis was carried out, via three databases (Web of Science, Scopus, and PubMed), and the guideline recommendations were followed to select bibliometric data. Results: The search resulted in 159 publications, with a significantly increasing trend from 2017 to 2021 (p=0.028). The most relevant countries in this area were the United States of America, the United Kingdom, and Sweden. Also, the top articles were from relevant journals, indexed in Journal Citation Reports, and the first and the second quartile linked to the nursing field and citations (p<0.001). Conclusion: Education is key to achieving environmentally sustainable health systems via institutions and policies. Implications for nursing management: There is a lack of experimental data and policies on achieving or maintaining environmentally sustainable health care systems, indicating that nurses have an important role and should be consulted and included in decision-making policies regarding sustainability in the healthcare systems.	Translated: 2024-03-18 06:19:57 Published: 2024-02-05
# From Algorithm Worship to the Art of Human Learning: Insights from 50-year journey of AI in Education ( http://arxiv.org/abs/2403.05544v1 ) License: see linked page	Kaska Porayska-Pomsta	Current discourse surrounding Artificial Intelligence (AI) oscillates between hope and apprehension, painting a future where AI reshapes every facet of human life, including Education. This paper delves into the complexities of AI's role in Education, addressing the mixed messages that have both enthused and alarmed educators, policymakers, and the public. It explores the promises that AI holds for enhancing learning through personalisation at scale, against the backdrop of concerns about ethical implications, the devaluation of non-STEM subjects, and the potential transformative impact on our neurocognitive and socio-emotional functioning. Drawing on recent research and global discourse, the paper seeks to unpack the reasons behind the vagueness of current discussions on AI in Education (AIED) and the implications of this ambiguity for future educational practices and policies. By highlighting insights from educational research and synthesising evidence-based best practices in AIED, the aim is to provide a clearer understanding of how AI technologies can be aligned with the fundamental principles of learning and teaching, and explore what concrete actions may need to be prioritised now to truly enhance learning experiences and outcomes for all in the future.	Translated: 2024-03-18 06:19:57 Published: 2024-02-05
# CoBra: Complementary Branch Fusing Class and Semantic Knowledge for Robust Weakly Supervised Semantic Segmentation ( http://arxiv.org/abs/2403.08801v1 ) License: see linked page	Woojung Han, Seil Kang, Kyobin Choo, Seong Jae Hwang	Leveraging semantically precise pseudo masks derived from image-level class knowledge for segmentation, namely image-level Weakly Supervised Semantic Segmentation (WSSS), still remains challenging. While Class Activation Maps (CAMs) using CNNs have steadily been contributing to the success of WSSS, the resulting activation maps often narrowly focus on class-specific parts (e.g., only face of human). On the other hand, recent works based on vision transformers (ViT) have shown promising results based on their self-attention mechanism to capture the semantic parts but fail in capturing complete class-specific details (e.g., entire body parts of human but also with a dog nearby). In this work, we propose Complementary Branch (CoBra), a novel dual branch framework consisting of two distinct architectures which provide valuable complementary knowledge of class (from CNN) and semantic (from ViT) to each branch. In particular, we learn Class-Aware Projection (CAP) for the CNN branch and Semantic-Aware Projection (SAP) for the ViT branch to explicitly fuse their complementary knowledge and facilitate a new type of extra patch-level supervision. Our model, through CoBra, fuses CNN and ViT's complementary outputs to create robust pseudo masks that integrate both class and semantic information effectively. Extensive experiments qualitatively and quantitatively investigate how CNN and ViT complement each other on the PASCAL VOC 2012 dataset, showing a state-of-the-art WSSS result. This includes not only the masks generated by our model, but also the segmentation results derived from utilizing these masks as pseudo labels.	Translated: 2024-03-18 05:40:54 Published: 2024-02-05
# Governance of Generative Artificial Intelligence for Companies ( http://arxiv.org/abs/2403.08802v1 ) License: see linked page	Johannes Schneider, Rene Abraham, Christian Meske	Generative Artificial Intelligence (GenAI), specifically large language models like ChatGPT, has swiftly entered organizations without adequate governance, posing both opportunities and risks. Despite extensive debates on GenAI's transformative nature and regulatory measures, limited research addresses organizational governance, encompassing technical and business perspectives. This review paper fills this gap by surveying recent works. It goes beyond mere summarization by developing a framework for GenAI governance within companies. Our framework outlines the scope, objectives, and governance mechanisms tailored to harness business opportunities and mitigate risks associated with GenAI integration. This research contributes a focused approach to GenAI governance, offering practical insights for companies navigating the challenges of responsible AI adoption. It is also valuable for a technical audience to broaden their perspective, as ethical and business concerns increasingly gain in prevalence, and allows them to identify novel research directions.	Translated: 2024-03-18 05:40:54 Published: 2024-02-05
# HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments ( http://arxiv.org/abs/2402.10228v1 ) License: see linked page	Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo	To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable with (1) large state space and (2) increasingly accumulated data of interactions. We propose HyperAgent, an RL framework with a hypermodel, index sampling schemes and an incremental update mechanism, enabling computation-efficient sequential posterior approximation and data-efficient action selection under general value function approximation beyond conjugacy. The implementation of HyperAgent is simple, as it only adds one module and one line of code in addition to DDQN. Practically, HyperAgent demonstrates its robust performance in large-scale deep RL benchmarks with significant efficiency gain in terms of both data and computation. Theoretically, among the practically scalable algorithms, HyperAgent is the first method to achieve provably scalable per-step computational complexity as well as sublinear regret under tabular RL. The core of our theoretical analysis is the sequential posterior approximation argument, made possible by the first analytical tool for sequential random projection, a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.	Translated: 2024-02-25 17:16:39 Published: 2024-02-05
# Avoiding an AI-imposed Taylor's Version of all music history ( http://arxiv.org/abs/2402.14589v1 ) License: see linked page	Nick Collins and Mick Grierson	As future musical AIs adhere closely to human music, they may form their own attachments to particular human artists in their databases, and these biases may in the worst case lead to potential existential threats to all musical history. AI super fans may act to corrupt the historical record and extant recordings in favour of their own preferences, and preservation of the diversity of world music culture may become even more of a pressing issue than the imposition of 12 tone equal temperament or other Western homogenisations. We discuss the technical capability of AI cover software and produce Taylor's Versions of famous tracks from Western pop history as provocative examples; the quality of these productions does not affect the overall argument (which might even see a future AI try to impose the sound of paperclips onto all existing audio files, let alone Taylor Swift). We discuss some potential defenses against the danger of future musical monopolies, whilst analysing the feasibility of a maximal 'Taylor Swiftication' of the complete musical record.	Translated: 2024-02-25 16:43:22 Published: 2024-02-05
# A Structured Approach to the development of Solutions in Excel ( http://arxiv.org/abs/1704.01142v2 ) License: see linked page	Peter Bartholomew	Spreadsheets offer a supremely successful democratisation platform, placing the manipulation and presentation of numbers within the grasp of users who have little or no mathematical expertise or IT experience. What appears to be almost completely lacking within a "normal" solution built using Excel default settings is the deployment of any structure that extends beyond a single-cell formula. The structural elements that allow conventional code to scale without escalating errors appear to be absent. This paper considers the use of controversial or lesser-used techniques to create a coherent solution strategy in which the problem is solved by a sequence of formulas resembling the steps of a programmed language.	Translated: 2024-02-18 14:41:03 Published: 2024-02-05
# Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences ( http://arxiv.org/abs/2402.05963v1 ) License: see linked page	Nikhil Kumar Singh and Indranil Saha	Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for achieving sample efficiency, which focuses on selecting unique samples and adding them to the replay buffer during the exploration with the goal of reducing the buffer size and maintaining the independent and identically distributed (IID) nature of the samples. Our method is based on selecting an important subset of the set of state variables from the experiences encountered during the initial phase of random exploration, partitioning the state space into a set of abstract states based on the selected important state variables, and finally selecting the experiences with unique state-reward combination by using a kernel density estimator. We formally prove that the off-policy actor-critic algorithm incorporating the proposed method for unique experience accumulation converges faster than the vanilla off-policy actor-critic algorithm. Furthermore, we evaluate our method by comparing it with two state-of-the-art actor-critic RL algorithms on several continuous control benchmarks available in the Gym environment. Experimental results demonstrate that our method achieves a significant reduction in the size of the replay buffer for all the benchmarks while achieving either faster convergence or better reward accumulation compared to the baseline algorithms.	Translated: 2024-02-18 14:37:10 Published: 2024-02-05
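The idea of keeping only experiences with a unique state-reward combination can be sketched as below. This is a simplified illustration rather than the authors' method: the abstract uses a kernel density estimator, whereas this sketch substitutes fixed-width binning and an exact-match set. The class name, bin widths, and the choice of important state dimensions are all hypothetical.

```python
import random


class UniqueReplayBuffer:
    """Replay buffer that stores one experience per (abstract state, reward bucket)."""

    def __init__(self, important_dims, state_bin=0.5, reward_bin=0.1):
        self.important_dims = important_dims  # indices of the state variables to keep
        self.state_bin = state_bin            # width of each abstract-state cell
        self.reward_bin = reward_bin          # width of each reward bucket
        self.seen = set()
        self.storage = []

    def _key(self, state, reward):
        # Map the selected state variables and the reward onto a discrete grid.
        cell = tuple(int(state[i] // self.state_bin) for i in self.important_dims)
        return cell, int(reward // self.reward_bin)

    def add(self, state, action, reward, next_state, done):
        key = self._key(state, reward)
        if key in self.seen:
            return False  # a similar experience is already stored
        self.seen.add(key)
        self.storage.append((state, action, reward, next_state, done))
        return True

    def sample(self, batch_size, rng=random):
        return rng.sample(self.storage, min(batch_size, len(self.storage)))


buffer = UniqueReplayBuffer(important_dims=[0, 2])
added = [
    buffer.add([0.10, 9.0, 1.20], 0, 0.05, [0.2, 9.0, 1.3], False),
    buffer.add([0.20, -3.0, 1.40], 1, 0.07, [0.3, -3.0, 1.5], False),  # same cell and bucket
    buffer.add([0.90, 9.0, 1.20], 0, 0.05, [1.0, 9.0, 1.3], False),    # new cell
]
print(added, len(buffer.storage))
```

The second experience differs from the first only in a state variable that is not marked important, so it maps to the same key and is skipped. That is how such a buffer stays smaller than one that stores every transition.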
# Rethink Model Re-Basin and the Linear Mode Connectivity ( http://arxiv.org/abs/2402.05966v1 ) License: see linked page	Xingyu Qu, Samuel Horvath	Recent studies suggest that with sufficiently wide models, most SGD solutions can, up to permutation, converge into the same basin. This phenomenon, known as the model re-basin regime, has significant implications for model averaging. However, current re-basin strategies are limited in effectiveness due to a lack of comprehensive understanding of underlying mechanisms. Addressing this gap, our work revisits standard practices and uncovers the frequent inadequacies of existing matching algorithms, which we show can be mitigated through proper re-normalization. By introducing a more direct analytical approach, we expose the interaction between matching algorithms and re-normalization processes. This perspective not only clarifies and refines previous findings but also facilitates novel insights. For instance, it connects the linear mode connectivity to pruning, motivating a lightweight yet effective post-pruning plug-in that can be directly merged with any existing pruning techniques. Our implementation is available at https://github.com/XingyuQu/rethink-re-basin.	Translated: 2024-02-18 14:23:11 Published: 2024-02-05
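The phrase "up to permutation" in the re-basin abstract rests on a symmetry of neural networks: reordering the hidden units of a layer, together with the matching weights of the next layer, leaves the computed function unchanged. The sketch below checks this on a tiny two-layer ReLU network with made-up weights. It illustrates only the symmetry and is not the matching or re-normalization procedure studied in the paper.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mlp(w1, b1, w2, x):
    """Two-layer network: linear layer, ReLU, linear layer (no output bias)."""
    hidden = relu([h + b for h, b in zip(matvec(w1, x), b1)])
    return matvec(w2, hidden)


def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden units: rows of w1 and b1, and the matching columns of w2."""
    new_w1 = [w1[i] for i in perm]
    new_b1 = [b1[i] for i in perm]
    new_w2 = [[row[i] for i in perm] for row in w2]
    return new_w1, new_b1, new_w2


# Made-up weights for a network with 2 inputs, 3 hidden units, and 1 output.
w1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]]
b1 = [0.1, -0.2, 0.3]
w2 = [[2.0, -1.0, 0.5]]
x = [1.0, 2.0]

out_original = mlp(w1, b1, w2, x)
out_permuted = mlp(*permute_hidden(w1, b1, w2, [2, 0, 1]), x)
print(out_original, out_permuted)
```

Because the two parameter sets differ as vectors but compute the same function, naively averaging them would mix unrelated units. Re-basin methods first search for a permutation that aligns one model's units with the other's.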
# Hybrid Neural Representations for Spherical Data ( http://arxiv.org/abs/2402.05965v1 ) License: see linked page	Hyomin Kim, Yunhui Jang, Jaeho Lee, Sungsoo Ahn	In this paper, we study hybrid neural representations for spherical data, a domain of increasing relevance in scientific research. In particular, our work focuses on weather and climate data as well as cosmic microwave background (CMB) data. Although previous studies have delved into coordinate-based neural representations for spherical signals, they often fail to capture the intricate details of highly nonlinear signals. To address this limitation, we introduce a novel approach named Hybrid Neural Representations for Spherical data (HNeR-S). Our main idea is to use spherical feature-grids to obtain positional features which are combined with a multilayer perceptron to predict the target signal. We consider feature-grids with equirectangular and hierarchical equal area isolatitude pixelization structures that align with weather data and CMB data, respectively. We extensively verify the effectiveness of our HNeR-S for regression, super-resolution, temporal interpolation, and compression tasks.	Translated: 2024-02-18 14:22:56 Published: 2024-02-05
# A Survey on Transformer Compression ( http://arxiv.org/abs/2402.05964v1 ) License: see link	Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao	Large models based on the Transformer architecture play increasingly vital roles in artificial intelligence, particularly within the realms of natural language processing (NLP) and computer vision (CV). Model compression methods reduce their memory and computational cost, which is a necessary step to implement Transformer models on practical devices. Given the unique architecture of the Transformer, featuring alternating attention and Feedforward Neural Network (FFN) modules, specific compression techniques are required. The efficiency of these compression methods is also paramount, as it is usually impractical to retrain large models on the entire training dataset. This survey provides a comprehensive review of recent compression methods, with a specific focus on their application to Transformer models. The compression methods are primarily categorized into pruning, quantization, knowledge distillation, and efficient architecture design. In each category, we discuss compression methods for both CV and NLP tasks, highlighting common underlying principles. Finally, we delve into the relation between various compression methods and discuss future directions in this domain.	Translated: 2024-02-18 14:22:40 Published: 2024-02-05
# EXGC: Bridging Efficiency and Explainability in Graph Condensation ( http://arxiv.org/abs/2402.05962v1 ) License: see link	Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, Xiangnan He	Graph representation learning on vast datasets, like web data, has made significant strides. However, the associated computational and storage overheads raise concerns. In light of this, graph condensation (GCond) has been introduced to distill these large real datasets into a more concise yet information-rich synthetic graph. Despite acceleration efforts, existing GCond methods mainly grapple with efficiency, especially on expansive web data graphs. Hence, in this work, we pinpoint two major inefficiencies of current paradigms: (1) the concurrent updating of a vast parameter set, and (2) pronounced parameter redundancy. To counteract these two limitations respectively, we first (1) employ the Mean-Field variational approximation for convergence acceleration, and then (2) propose the objective of Gradient Information Bottleneck (GDIB) to prune redundancy. By incorporating the leading explanation techniques (e.g., GNNExplainer and GSAT) to instantiate the GDIB, we propose EXGC, the Efficient and eXplainable Graph Condensation method, which can markedly boost efficiency and inject explainability. Our extensive evaluations across eight datasets underscore EXGC's superiority and relevance. Code is available at https://github.com/MangoKiller/EXGC.	Translated: 2024-02-18 14:22:21 Published: 2024-02-05
# Genetic-guided GFlowNets: Advancing in Practical Molecular Optimization Benchmark ( http://arxiv.org/abs/2402.05961v1 ) License: see link	Hyeonah Kim, Minsu Kim, Sanghyeok Choi, Jinkyoo Park	This paper proposes a novel variant of GFlowNet, genetic-guided GFlowNet (Genetic GFN), which integrates an iterative genetic search into GFlowNet. Genetic search effectively guides the GFlowNet to high-reward regions, addressing global over-exploration that results in training inefficiency and exploring limited regions. In addition, training strategies, such as rank-based replay training and unsupervised maximum likelihood pre-training, are further introduced to improve the sample efficiency of Genetic GFN. The proposed method achieves a state-of-the-art score of 16.213 on practical molecular optimization (PMO), an official benchmark for sample-efficient molecular optimization, significantly outperforming the best score reported in the benchmark, 15.185. Remarkably, our method exceeds all baselines, including reinforcement learning, Bayesian optimization, generative models, GFlowNets, and genetic algorithms, in 14 out of 23 tasks.	Translated: 2024-02-18 14:21:58 Published: 2024-02-05
# Phase-driven Domain Generalizable Learning for Nonstationary Time Series ( http://arxiv.org/abs/2402.05960v1 ) License: see link	Payal Mohapatra, Lixu Wang, Qi Zhu	Monitoring and recognizing patterns in continuous sensing data is crucial for many practical applications. These real-world time-series data are often nonstationary, characterized by statistical and spectral properties that vary over time. This poses a significant challenge in developing learning models that can effectively generalize across different distributions. In this work, based on our observation that nonstationary statistics are intrinsically linked to the phase information, we propose a time-series learning framework, PhASER. It consists of three novel elements: 1) phase augmentation that diversifies non-stationarity while preserving discriminatory semantics, 2) separate feature encoding by viewing time-varying magnitude and phase as independent modalities, and 3) feature broadcasting by incorporating phase with a novel residual connection for inherent regularization to enhance distribution-invariant learning. Upon extensive evaluation on 5 datasets from human activity recognition, sleep-stage classification, and gesture recognition against 10 state-of-the-art baseline methods, we demonstrate that PhASER consistently outperforms the best baselines by an average of 5% and up to 13% in some cases. Moreover, PhASER's principles can be applied broadly to boost the generalization ability of existing time series classification models.	Translated: 2024-02-18 14:21:40 Published: 2024-02-05
# DeAL: Decoding-time Alignment for Large Language Models ( http://arxiv.org/abs/2402.06147v1 ) License: see link	James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-an Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchoff, Dan Roth	Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF). However, it is unclear if such methods are an effective choice to teach alignment objectives to the model. First, the inability to incorporate multiple, custom rewards and reliance on a model developer's view of universal and static principles are key limitations. Second, the residual gaps in model training and the reliability of such approaches are also questionable (e.g., susceptibility to jail-breaking even after safety training). To address these, we propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs (DeAL). At its core, we view decoding as a heuristic-guided search process and facilitate the use of a wide variety of alignment objectives. Our experiments with programmatic constraints such as keyword and length constraints (studied widely in the pre-LLM era) and abstract objectives such as harmlessness and helpfulness (proposed in the post-LLM era) show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs. Lastly, while DeAL can be effectively paired with RLHF and prompting techniques, its generality makes decoding slower, an optimization we leave for future work.	Translated: 2024-02-18 14:07:56 Published: 2024-02-05
# Federated Learning Priorities Under the European Union Artificial Intelligence Act ( http://arxiv.org/abs/2402.05968v1 ) License: see link	Herbert Woisetschläger, Alexander Erben, Bill Marino, Shiqiang Wang, Nicholas D. Lane, Ruben Mayer, Hans-Arno Jacobsen	The age of AI regulation is upon us, with the European Union Artificial Intelligence Act (AI Act) leading the way. Our key inquiry is how this will affect Federated Learning (FL), whose starting point of prioritizing data privacy while performing ML fundamentally differs from that of centralized learning. We believe the AI Act and future regulations could be the missing catalyst that pushes FL toward mainstream adoption. However, this can only occur if the FL community reprioritizes its research focus. In our position paper, we perform a first-of-its-kind interdisciplinary analysis (legal and ML) of the impact the AI Act may have on FL and make a series of observations supporting our primary position through quantitative and qualitative analysis. We explore data governance issues and the concern for privacy. We establish new challenges regarding performance and energy efficiency within lifecycle monitoring. Taken together, our analysis suggests there is a sizable opportunity for FL to become a crucial component of AI Act-compliant ML systems and for the new regulation to drive the adoption of FL techniques in general. Most noteworthy are the opportunities to defend against data bias and enhance private and secure computation.	Translated: 2024-02-18 14:06:01 Published: 2024-02-05
# The last Dance : Robust backdoor attack via diffusion models and bayesian approach ( http://arxiv.org/abs/2402.05967v1 ) License: see link	Orson Mengara	Diffusion models are state-of-the-art deep learning generative models that are trained on the principle of learning forward and backward diffusion processes via the progressive addition of noise and denoising. In this paper, we seek to trick audio-based DNN models, such as those available through the Hugging Face framework, and in particular transformer-based artificial intelligence models, which are powerful machine learning models that save time and deliver faster, more efficient results. We demonstrate the feasibility of backdoor attacks (called `BacKBayDiffMod`) on audio transformers derived from Hugging Face, a popular framework in the world of artificial intelligence (AI) research. The backdoor attack developed in this paper is based on poisoning the model's training data by incorporating backdoor diffusion sampling and a Bayesian approach to the distribution of poisoned data.	Translated: 2024-02-18 14:05:40 Published: 2024-02-05
# Authentication and integrity of smartphone videos through multimedia container structure analysis ( http://arxiv.org/abs/2402.06661v1 ) License: see link	Carlos Quinto Huamán, Ana Lucila Sandoval Orozco, Luis Javier García Villalba	Nowadays, mobile devices have become the natural substitute for the digital camera, as they capture everyday situations easily and quickly, encouraging users to express themselves through images and videos. These videos can be shared across different platforms, exposing them to any kind of intentional manipulation by criminals who are aware of the weaknesses of forensic techniques and seek to accuse an innocent person or exonerate a guilty person in a judicial process. Manufacturers commonly do not comply fully with the specifications of the standards for the creation of videos. Also, videos shared on social networks and instant messaging applications go through filtering and compression processes to reduce their size, facilitate their transfer, and optimize storage on their platforms. The omission of specifications and the results of transformations carried out by the platforms embed a pattern of features in the multimedia container of the videos. These patterns make it possible to distinguish the brand of the device that generated the video, and the social network and instant messaging application that was used for the transfer. Research in recent years has focused on the analysis of AVI containers and tiny video datasets. This work presents a novel technique to detect possible attacks against MP4, MOV, and 3GP format videos that affect their integrity and authenticity. The method is based on the analysis of the structure of video containers generated by mobile devices and their behavior when shared through social networks, instant messaging applications, or manipulated by editing programs. The objectives of the proposal are to verify the integrity of videos, identify the source of acquisition, and distinguish between original and manipulated videos.	Translated: 2024-02-18 13:56:01 Published: 2024-02-05
# The role of the metaverse in calibrating an embodied artificial general intelligence ( http://arxiv.org/abs/2402.06660v1 ) License: see link	Martin Schmalzried	This paper examines the concept of embodied artificial general intelligence (AGI), its relationship to human consciousness, and the key role of the metaverse in facilitating this relationship. The paper leverages theoretical frameworks such as embodied cognition, Michael Levin's computational boundary of a "Self," Donald D. Hoffman's Interface Theory of Perception, and Bernardo Kastrup's analytical idealism to build the argument for achieving embodied AGI. It contends that our perceived outer reality is a symbolic representation of alternate inner states of being, and that AGI could embody a higher consciousness with a larger computational boundary. The paper further discusses the developmental stages of AGI, the requirements for the emergence of an embodied AGI, the importance of a calibrated symbolic interface for AGI, and the key role played by the metaverse, decentralized systems, open-source blockchain technology, as well as open-source AI research. It also explores the idea of a feedback loop between AGI and human users in metaverse spaces as a tool for AGI calibration, as well as the role of local homeostasis and decentralized governance as preconditions for achieving a stable embodied AGI. The paper concludes by emphasizing the importance of achieving a certain degree of harmony in human relations and recognizing the interconnectedness of humanity at a global level, as key prerequisites for the emergence of a stable embodied AGI.	Translated: 2024-02-18 13:55:36 Published: 2024-02-05
# Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models ( http://arxiv.org/abs/2402.06659v1 ) License: see link	Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang	Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, yet their versatility raises significant security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack method where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is Persuasion Attack, which leverages VLMs' text generation capabilities to craft narratives, such as portraying junk food as health food, through persuasive and seemingly rational descriptions. We show that Shadowcast is highly effective in achieving the attacker's intentions using as few as 50 poison samples. Moreover, these poison samples remain effective across various prompts and are transferable across different VLM architectures in the black-box setting. This work reveals how poisoned VLMs can generate convincing yet deceptive misinformation and underscores the importance of data quality for responsible deployments of VLMs. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.	Translated: 2024-02-18 13:55:00 Published: 2024-02-05
# DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation ( http://arxiv.org/abs/2402.06656v1 ) License: see link	Yuan Gao, Haokun Chen, Xiang Wang, Zhicai Wang, Xue Wang, Jinyang Gao, Bolin Ding	Machine learning models have demonstrated remarkable efficacy and efficiency in a wide range of stock forecasting tasks. However, the inherent challenges of data scarcity, including low signal-to-noise ratio (SNR) and data homogeneity, pose significant obstacles to accurate forecasting. To address this issue, we propose a novel approach that utilizes artificial intelligence-generated samples (AIGS) to enhance the training procedures. In our work, we introduce DiffsFormer, a diffusion model with Transformer architecture that generates stock factors. DiffsFormer is initially trained on a large-scale source domain, incorporating conditional guidance so as to capture the global joint distribution. When presented with a specific downstream task, we employ DiffsFormer to augment the training procedure by editing existing samples. This editing step allows us to control the strength of the editing process, determining the extent to which the generated data deviates from the target domain. To evaluate the effectiveness of DiffsFormer-augmented training, we conduct experiments on the CSI300 and CSI800 datasets, employing eight commonly used machine learning models. The proposed method achieves relative improvements of 7.2% and 27.8% in annualized return ratio for the respective datasets. Furthermore, we perform extensive experiments to gain insights into the functionality of DiffsFormer and its constituent components, elucidating how they address the challenges of data scarcity and enhance the overall model performance. Our research demonstrates the efficacy of leveraging AIGS and the DiffsFormer architecture to mitigate data scarcity in stock forecasting tasks.	Translated: 2024-02-18 13:54:35 Published: 2024-02-05
# Adversarial Text Purification: A Large Language Model Approach for Defense ( http://arxiv.org/abs/2402.06655v1 ) License: see link	Raha Moraffah, Shubh Khandelwal, Amrita Bhattacharjee, Huan Liu	Adversarial purification is a defense mechanism for safeguarding classifiers against adversarial attacks without knowing the type of attacks or training of the classifier. These techniques characterize and eliminate adversarial perturbations from the attacked inputs, aiming to restore purified samples that retain similarity to the initially attacked ones and are correctly classified by the classifier. Due to the inherent challenges associated with characterizing noise perturbations for discrete inputs, adversarial text purification has been relatively unexplored. In this paper, we investigate the effectiveness of adversarial purification methods in defending text classifiers. We propose a novel adversarial text purification method that harnesses the generative capabilities of Large Language Models (LLMs) to purify adversarial text without the need to explicitly characterize the discrete noise perturbations. We utilize prompt engineering to exploit LLMs for recovering the purified examples for given adversarial examples such that they are semantically similar and correctly classified. Our proposed method demonstrates remarkable performance over various classifiers, improving their accuracy under attack by over 65% on average.	Translated: 2024-02-18 13:54:10 Published: 2024-02-05
# Abstracted Trajectory Visualization for Explainability in Reinforcement Learning ( http://arxiv.org/abs/2402.07928v1 ) License: see link	Yoshiki Takagi, Roderick Tabalba, Nurit Kirshenbaum, Jason Leigh	Explainable AI (XAI) has demonstrated the potential to help reinforcement learning (RL) practitioners understand how RL models work. However, XAI for users who do not have RL expertise (non-RL experts) has not been studied sufficiently. This makes it difficult for non-RL experts to participate in the fundamental discussion of how RL models should be designed for a coming society where humans and AI coexist. Solving such a problem would enable RL experts to communicate with non-RL experts in producing machine learning solutions that better fit our society. We argue that abstracted trajectories, which depict transitions between the major states of the RL model, will be useful for non-RL experts to build a mental model of the agents. Our early results suggest that by leveraging a visualization of the abstracted trajectories, users without RL expertise are able to infer the behavior patterns of RL.	Translated: 2024-02-18 13:27:49 Published: 2024-02-05
# A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications ( http://arxiv.org/abs/2402.07927v1 ) License: see link	Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha	Prompt engineering has emerged as an indispensable technique for extending the capabilities of large language models (LLMs) and vision-language models (VLMs). This approach leverages task-specific instructions, known as prompts, to enhance model efficacy without modifying the core model parameters. Rather than updating the model parameters, prompts allow seamless integration of pre-trained models into downstream tasks by eliciting desired model behaviors solely based on the given prompt. Prompts can be natural language instructions that provide context to guide the model or learned vector representations that activate relevant knowledge. This burgeoning field has enabled success across various applications, from question-answering to commonsense reasoning. However, there remains a lack of systematic organization and understanding of the diverse prompt engineering methods and techniques. This survey paper addresses the gap by providing a structured overview of recent advancements in prompt engineering, categorized by application area. For each prompting approach, we provide a summary detailing the prompting methodology, its applications, the models involved, and the datasets utilized. We also delve into the strengths and limitations of each approach and include a taxonomy diagram and table summarizing datasets, models, and critical points of each prompting technique. This systematic analysis enables a better understanding of this rapidly developing field and facilitates future research by illuminating open challenges and opportunities for prompt engineering.	Translated: 2024-02-18 13:27:33 Published: 2024-02-05
# From Data Creator to Data Reuser: Distance Matters ( http://arxiv.org/abs/2402.07926v1 ) License: see link	Christine L. Borgman, Paul T. Groth	Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release rather than on data reuse, yet reuse is also difficult, expensive, and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when. Data creators cannot anticipate all possible reuses or reusers; our goal is to identify factors that may aid stakeholders in deciding how to invest in research data, how to identify potential reuses and reusers, and how to improve data exchange processes. Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. These dimensions are primarily social in character, with associated technical aspects that can decrease - or increase - distances between creators and reusers. We identify the order of expected influence on data reuse and ways in which the six dimensions are interdependent. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies.	Translated: 2024-02-18 13:27:08 Published: 2024-02-05
# Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions ( http://arxiv.org/abs/2402.07925v1 ) License: see linked page	Alec Helbling, Seongmin Lee, Polo Chau	Machine learning has enabled the development of powerful systems capable of editing images from natural language instructions. However, in many common scenarios it is difficult for users to specify precise image transformations with text alone. For example, in an image with several dogs, it is difficult to select a particular dog and move it to a precise location. Doing this with text alone would require a complex prompt that disambiguates the target dog and describes the destination. However, direct manipulation is well suited to visual tasks like selecting objects and specifying locations. We introduce Point and Instruct, a system for seamlessly combining familiar direct manipulation and textual instructions to enable precise image manipulation. With our system, a user can visually mark objects and locations, and reference them in textual instructions. This allows users to benefit from both the visual descriptiveness of natural language and the spatial precision of direct manipulation.	Translated: 2024-02-18 13:26:42 Published: 2024-02-05
# Improving EEG Signal Classification Accuracy Using Wasserstein Generative Adversarial Networks ( http://arxiv.org/abs/2402.09453v1 ) License: see linked page	Joshua Park, Priyanshu Mahey, Ore Adeniyi	Electroencephalography (EEG) plays a vital role in recording brain activities and is integral to the development of brain-computer interface (BCI) technologies. However, the limited availability and high variability of EEG signals present substantial challenges in creating reliable BCIs. To address this issue, we propose a practical solution drawing on the latest developments in deep learning and Wasserstein Generative Adversarial Network (WGAN). The WGAN was trained on the BCI2000 dataset, consisting of around 1500 EEG recordings and 64 channels from 45 individuals. The generated EEG signals were evaluated via three classifiers, yielding improved average accuracies. The quality of generated signals measured using Frechet Inception Distance (FID) yielded scores of 1.345 and 11.565 for eyes-open and eyes-closed, respectively. Even without a spectral or spatial loss term, our WGAN model was able to emulate the spectral and spatial properties of the EEG training data. The WGAN-generated data mirrored the dominant alpha activity during closed-eye resting and high delta waves in the training data in its topographic map and power spectral density (PSD) plot. Our research testifies to the potential of WGANs in addressing the limited EEG data issue for BCI development by enhancing a small dataset to improve classifier generalizability.	Translated: 2024-02-18 12:48:37 Published: 2024-02-05
# Financial Report Chunking for Effective Retrieval Augmented Generation ( http://arxiv.org/abs/2402.05131v1 ) License: see linked page	Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, and Leah Li	Chunking information is a key step in Retrieval Augmented Generation (RAG). Current research primarily centers on paragraph-level chunking. This approach treats all texts as equal and neglects the information contained in the structure of documents. We propose an expanded approach to chunk documents by moving beyond mere paragraph-level chunking to chunk primarily by the structural elements of documents. Dissecting documents into these constituent elements creates a new way to chunk documents that yields the best chunk size without tuning. We introduce a novel framework that evaluates how chunking based on element types annotated by document understanding models contributes to the overall context and accuracy of the information retrieved. We also demonstrate how this approach impacts RAG-assisted Question & Answer task performance. Our research includes a comprehensive analysis of various element types, their role in effective information retrieval, and the impact they have on the quality of RAG outputs. Our findings indicate that element-type-based chunking substantially improves RAG results on financial reporting. Through this research, we are also able to address how to achieve highly accurate RAG.	Translated: 2024-02-09 18:12:47 Published: 2024-02-05
# LB-KBQA: Large-language-model and BERT based Knowledge-Based Question and Answering System ( http://arxiv.org/abs/2402.05130v1 ) License: see linked page	Yan Zhao, Zhongyun Li, Jiaxing Wang	Generative Artificial Intelligence (AI), because of its emergent abilities, has empowered various fields. One typical application is large language models (LLMs), whose natural language understanding capability is dramatically improved compared with conventional AI-based methods. Natural language understanding has always been a barrier to the intent recognition performance of Knowledge-Based Question and Answering (KBQA) systems, owing to linguistic diversity and newly appearing intents. Conventional AI-based methods for intent recognition can be divided into semantic parsing-based and model-based approaches. However, both methods suffer from limited resources in intent recognition. To address this issue, we propose a novel KBQA system based on a Large Language Model (LLM) and BERT (LB-KBQA). With the help of generative AI, our proposed method can detect newly appearing intents and acquire new knowledge. In experiments on financial domain question answering, our model has demonstrated superior effectiveness.	Translated: 2024-02-09 18:12:32 Published: 2024-02-05
# Best Practices for Text Annotation with Large Language Models ( http://arxiv.org/abs/2402.05129v1 ) License: see linked page	Petter Törnberg	Large Language Models (LLMs) have ushered in a new era of text annotation, as their ease-of-use, high accuracy, and relatively low costs have meant that their use has exploded in recent months. However, the rapid growth of the field has meant that LLM-based annotation has become something of an academic Wild West: the lack of established practices and standards has led to concerns about the quality and validity of research. Researchers have warned that the ostensible simplicity of LLMs can be misleading, as they are prone to bias, misunderstandings, and unreliable results. Recognizing the transformative potential of LLMs, this paper proposes a comprehensive set of standards and best practices for their reliable, reproducible, and ethical use. These guidelines span critical areas such as model selection, prompt engineering, structured prompting, prompt stability analysis, rigorous model validation, and the consideration of ethical and legal implications. The paper emphasizes the need for a structured, directed, and formalized approach to using LLMs, aiming to ensure the integrity and robustness of text annotation practices, and advocates for a nuanced and critical engagement with LLMs in social scientific research.	Translated: 2024-02-09 18:12:13 Published: 2024-02-05
# Enhancing Textbook Question Answering Task with Large Language Models and Retrieval Augmented Generation ( http://arxiv.org/abs/2402.05128v1 ) License: see linked page	Hessa Abdulrahman Alawwad, Areej Alhothali, Usman Naseem, Ali Alkhathlan, Amani Jamal	Textbook question answering (TQA) is a challenging task in artificial intelligence due to the complex nature of context and multimodal data. Although previous research has significantly improved the task, there are still some limitations, including the models' weak reasoning and inability to capture contextual information in the lengthy context. The introduction of large language models (LLMs) has revolutionized the field of AI; however, directly applying LLMs often leads to inaccurate answers. This paper proposes a methodology that handles the out-of-domain scenario in TQA, where concepts are spread across different lessons, by incorporating the retrieval augmented generation (RAG) technique and utilizing transfer learning to handle the long context and enhance reasoning abilities. Through supervised fine-tuning of the LLM model Llama-2 and the incorporation of RAG, our architecture outperforms the baseline, achieving a 4.12% accuracy improvement on validation set and 9.84% on test set for non-diagram multiple-choice questions.	Translated: 2024-02-09 18:11:50 Published: 2024-02-05
# Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering ( http://arxiv.org/abs/2402.05127v1 ) License: see linked page	Aryan Agrawal	This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression. A unique few-shot prompting method enhances the models' ability to analyze and explain depressive symptoms based on the DSM-5 criteria. In the interaction phase, the models engage in empathetic dialogue management, drawing from resources like PsychDB and a Cognitive Behavioral Therapy (CBT) Guide, fostering supportive interactions with individuals experiencing major depressive disorders. Additionally, the research introduces the Illuminate Database, enriched with various CBT modules, aiding in personalized therapy recommendations. The study evaluates LLM performance using metrics such as F1 scores, Precision, Recall, Cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) across different test sets, demonstrating their effectiveness. This comprehensive approach blends cutting-edge AI with established psychological methods, offering new possibilities in mental health care and showcasing the potential of LLMs in revolutionizing depression diagnosis and treatment strategies.	Translated: 2024-02-09 18:11:32 Published: 2024-02-05
# Graph Neural Network and NER-Based Text Summarization ( http://arxiv.org/abs/2402.05126v1 ) License: see linked page	Imaad Zaffar Khan, Amaan Aijaz Sheikh, Utkarsh Sinha	With the abundance of data and information available today, it is nearly impossible for a human, or even a machine, to go through all of the data line by line. What one usually does is skim through the lines and retain the absolutely important information, which is more formally called summarization. Text summarization is an important task that aims to compress lengthy documents or articles into shorter, coherent representations while preserving the core information and meaning. This project introduces an innovative approach to text summarization, leveraging the capabilities of Graph Neural Networks (GNNs) and Named Entity Recognition (NER) systems. GNNs, with their exceptional ability to capture and process the relational data inherent in textual information, are adept at understanding the complex structures within large documents. Meanwhile, NER systems contribute by identifying and emphasizing key entities, ensuring that the summarization process maintains a focus on the most critical aspects of the text. By integrating these two technologies, our method aims to enhance the efficiency of summarization and to ensure a high degree of relevance in the condensed content. This project, therefore, offers a promising direction for handling the ever-increasing volume of textual data in an information-saturated world.	Translated: 2024-02-09 18:11:04 Published: 2024-02-05
# Zero-Shot Clinical Trial Patient Matching with LLMs ( http://arxiv.org/abs/2402.05125v1 ) License: see linked page	Michael Wornow, Alejandro Lozano, Dev Dash, Jenelle Jindal, Kenneth W. Mahaffey, Nigam H. Shah	Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.	Translated: 2024-02-09 18:10:43 Published: 2024-02-05
# TexShape: Information Theoretic Sentence Embedding for Language Models ( http://arxiv.org/abs/2402.05132v1 ) License: see linked page	H. Kaan Kale, Homa Esfahanizadeh, Noel Elias, Oguzhan Baser, Muriel Medard, Sriram Vishwanath	With the exponential growth in data volume and the emergence of data-intensive applications, particularly in the field of machine learning, concerns related to resource utilization, privacy, and fairness have become paramount. This paper focuses on the textual domain of data and addresses challenges regarding encoding sentences to their optimized representations through the lens of information theory. In particular, we use empirical estimates of mutual information, using the Donsker-Varadhan definition of Kullback-Leibler divergence. Our approach leverages this estimation to train an information-theoretic sentence embedding, called TexShape, for (task-based) data compression or for filtering out sensitive information, enhancing privacy and fairness. In this study, we employ a benchmark language model for initial text representation, complemented by neural networks for information-theoretic compression and mutual information estimations. Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information over adverse compression ratios, in terms of predictive accuracy of downstream models that are trained using the compressed data.	Translated: 2024-02-09 17:55:14 Published: 2024-02-05
# Learning from Two Decades of Blood Pressure Data: Demography-Specific Patterns Across 75 Million Patient Encounters ( http://arxiv.org/abs/2402.01598v2 ) License: see linked page	Seyedeh Somayyeh Mousavi and Yuting Guo and Abeed Sarker and Reza Sameni	Hypertension remains a global health concern with a rising prevalence, necessitating effective monitoring and understanding of blood pressure (BP) dynamics. This study delves into the wealth of information derived from BP measurement, a crucial approach in informing our understanding of hypertensive trends. Numerous studies have reported on the relationship between BP variation and various factors. In this research, we leveraged an extensive dataset comprising 75 million records spanning two decades, offering a unique opportunity to explore and analyze BP variations across demographic features such as age, race, and gender. Our findings revealed that gender-based BP variation was not statistically significant, challenging conventional assumptions. Interestingly, systolic blood pressure (SBP) consistently increased with age, while diastolic blood pressure (DBP) displayed a distinctive peak in the forties age group. Moreover, our analysis uncovered intriguing similarities in the distribution of BP among some of the racial groups. This comprehensive investigation contributes to the ongoing discourse on hypertension and underscores the importance of considering diverse demographic factors in understanding BP variations. Our results provide valuable insights that may inform personalized healthcare approaches tailored to specific demographic profiles.	Translated: 2024-02-08 18:59:33 Published: 2024-02-05
# Gaussian Plane-Wave Neural Operator for Electron Density Estimation ( http://arxiv.org/abs/2402.04278v1 ) License: see linked page	Seongsu Kim, Sungsoo Ahn	This work studies machine learning for electron density prediction, which is fundamental for understanding chemical systems and density functional theory (DFT) simulations. To this end, we introduce the Gaussian plane-wave neural operator (GPWNO), which operates in the infinite-dimensional functional space using the plane-wave and Gaussian-type orbital bases, widely recognized in the context of DFT. In particular, both high- and low-frequency components of the density can be effectively represented due to the complementary nature of the two bases. Extensive experiments on QM9, MD, and material project datasets demonstrate GPWNO's superior performance over ten baselines.	Translated: 2024-02-08 18:30:36 Published: 2024-02-05
# Hyperparameter Optimization for AST Differencing ( http://arxiv.org/abs/2011.10268v3 ) License: see linked page	Matias Martinez and Jean-Rémy Falleri and Martin Monperrus	Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST differencing algorithms rely on configuration parameters that may have a strong impact on their effectiveness. In this paper, we present a novel approach named DAT (Diff Auto Tuning) for hyperparameter optimization of AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We evaluate our data-driven approach DAT to optimize the edit-scripts generated by the state-of-the-art AST differencing algorithm named GumTree in different scenarios. DAT is able to find a new configuration for GumTree that improves the edit-scripts in 21.8% of the evaluated cases.	Translated: 2024-02-07 21:52:43 Published: 2024-02-05
# High-Dimensional Independence Testing via Maximum and Average Distance Correlations ( http://arxiv.org/abs/2001.01095v2 ) License: see linked page	Cencheng Shen, Yuexiao Dong	This paper introduces and investigates the utilization of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally dependent dimensions, assess the advantages of each test statistic, examine their respective null distributions, and present a fast chi-square-based testing procedure. The resulting tests are non-parametric and applicable to both Euclidean distance and the Gaussian kernel as the underlying metric. To better understand the practical use cases of the proposed tests, we evaluate the empirical performance of the maximum distance correlation, average distance correlation, and the original distance correlation across various multivariate dependence scenarios, as well as conduct a real data experiment to test the presence of various cancer types and peptide levels in human plasma.	Translated: 2024-02-07 21:51:27 Published: 2024-02-05
# Optimal Clustering from Noisy Binary Feedback ( http://arxiv.org/abs/1910.06002v4 ) License: see linked page	Kaito Ariu, Jungseul Ok, Alexandre Proutiere, Se-Young Yun	We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort put on the users. For example, in some of the recent reCAPTCHA systems, users' clicks (binary answers) can be used to efficiently label images. In our inference problem, items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer selected from a fixed finite set. For each of these items, the user provides a noisy answer whose expectation is determined by the item cluster and the question and by an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretical lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon the K-means algorithm and whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretical error lower bounds, and in turn allocates the budget in an efficient way. The algorithm learns to select items hard to cluster and relevant questions more often. We compare the performance of our algorithms with or without the adaptive selection strategy numerically and illustrate the gain achieved by being adaptive.	Translated: 2024-02-07 21:51:13 Published: 2024-02-05
# Independence Testing for Temporal Data ( http://arxiv.org/abs/1908.06486v4 ) License: see linked page	Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, Joshua T. Vogelstein	Temporal data are increasingly prevalent in modern data science. A fundamental question is whether two time-series are related or not. Existing approaches often have limitations, such as relying on parametric assumptions, detecting only linear associations, and requiring multiple tests and corrections. While many non-parametric and universally consistent dependence measures have recently been proposed, directly applying them to temporal data can inflate the p-value and result in an invalid test. To address these challenges, this paper introduces the temporal dependence statistic with block permutation to test independence between temporal data. Under proper assumptions, the proposed procedure is asymptotically valid and universally consistent for testing independence between stationary time-series, and capable of estimating the optimal dependence lag that maximizes the dependence. Notably, it is compatible with a rich family of distance and kernel based dependence measures, eliminates the need for multiple testing, and demonstrates superior power in multivariate, low sample size, and nonlinear settings. An analysis of neural connectivity with fMRI data reveals various temporal dependencies among signals within the visual network and default mode network.	Translated: 2024-02-07 21:50:50 Published: 2024-02-05
# 強化学習による再帰的QAOA Reinforcement Learning Assisted Recursive QAOA ( http://arxiv.org/abs/2207.06294v2 ) ライセンス: Link先を確認	Yash J. Patel, Sofiene Jerbi, Thomas B\"ack, Vedran Dunjko	(参考訳) 近年、量子近似最適化アルゴリズム (QAOA) のような変分量子アルゴリズムは、強い組合せ最適化問題に対処するためにNISQデバイスを使うことを期待して人気を集めている。しかし、低深さでは、QAOAの特定の局所性制約がその性能を制限することが知られている。これらの制限を超えるために、近似解の品質を改善するために、局所的でないQAOA、すなわち再帰的QAOA(RQAOA)が提案された。 RQAOAはQAOAよりも比較的小さく研究されており、例えば、どの種類のインスタンスが高品質なソリューションを提供できないかなど、あまり理解されていない。しかし、$\mathsf{NP}$-hard問題(具体的にはイジングスピンモデル)に対処しているため、RQAOAは失敗し、組合せ最適化のためのより優れた量子アルゴリズムを設計するという疑問が提起される。本稿では,RQAOAが故障した症例を特定し解析し,RQAOAを改善する強化学習RQAOA変異体(RL-RQAOA)を提案する。 RL-RQAOA は、RQAOA が劣る特定インスタンスでは厳格に優れており、RQAOA がほぼ最適であるインスタンスでは同様に動作する。私たちの研究は、ハード問題に対する新しいより優れたヒューリスティックの設計において、強化学習と量子(インスパイアされた)最適化の間の潜在的に有益な相乗効果を示している。 Variational quantum algorithms such as the Quantum Approximation Optimization Algorithm (QAOA) in recent years have gained popularity as they provide the hope of using NISQ devices to tackle hard combinatorial optimization problems. It is, however, known that at low depth, certain locality constraints of QAOA limit its performance. To go beyond these limitations, a non-local variant of QAOA, namely recursive QAOA (RQAOA), was proposed to improve the quality of approximate solutions. The RQAOA has been studied comparatively less than QAOA, and it is less understood, for instance, for what family of instances it may fail to provide high quality solutions. However, as we are tackling $\mathsf{NP}$-hard problems (specifically, the Ising spin model), it is expected that RQAOA does fail, raising the question of designing even better quantum algorithms for combinatorial optimization. In this spirit, we identify and analyze cases where RQAOA fails and, based on this, propose a reinforcement learning enhanced RQAOA variant (RL-RQAOA) that improves upon RQAOA. 
We show that RL-RQAOA improves over RQAOA: it is strictly better on the identified instances where RQAOA underperforms, and performs similarly on instances where RQAOA is near-optimal. Our work exemplifies the potentially beneficial synergy between reinforcement learning and quantum (inspired) optimization in the design of new, even better heuristics for hard problems.	翻訳日:2024-02-07 21:43:01 公開日:2024-02-05
# 行列投影による等角線 Equiangular lines via matrix projection ( http://arxiv.org/abs/2110.15842v4 ) ライセンス: Link先を確認	Igor Balla	(参考訳) 1973年、Lemmens と Seidel は、角 $\arccos(\alpha)$ を持つ $\mathbb{R}^r$ の等角線の最大数を決定する問題を提起し、$r \leq 1/\alpha^2 - 2$ というレジームにおいて部分的な答えを与えた。一方、$r$ が少なくとも $1/\alpha$ について指数関数的である場合、最近のブレークスルーはこの問題のほぼ完全な解決につながった。本論文では,従来の手法を統一し,改良した上界を得るための新しい手法を提案する。これにより、上記の2つのレジームの間のギャップを埋め、厳密に、または小さな乗法定数を除いて最良となる上界が得られる。我々のアプローチは、フロベニウス内積に関する行列の直交射影に依存し、副産物として、アロン・ボッパナの定理の密グラフへの最初の拡張をもたらす。等号は、$\mathbb{R}^r$ における $\binom{r+1}{2}$ 本の等角線に対応する強正則グラフに対して成立する。本手法の複素数の設定における応用についても考察する。 In 1973, Lemmens and Seidel posed the problem of determining the maximum number of equiangular lines in $\mathbb{R}^r$ with angle $\arccos(\alpha)$ and gave a partial answer in the regime $r \leq 1/\alpha^2 - 2$. At the other extreme where $r$ is at least exponential in $1/\alpha$, recent breakthroughs have led to an almost complete resolution of this problem. In this paper, we introduce a new method for obtaining upper bounds which unifies and improves upon previous approaches, thereby yielding bounds which bridge the gap between the aforementioned regimes and are best possible either exactly or up to a small multiplicative constant. Our approach relies on orthogonal projection of matrices with respect to the Frobenius inner product and as a byproduct, it yields the first extension of the Alon-Boppana theorem to dense graphs, with equality for strongly regular graphs corresponding to $\binom{r+1}{2}$ equiangular lines in $\mathbb{R}^r$. Applications of our method in the complex setting will be discussed as well.	翻訳日:2024-02-07 21:40:21 公開日:2024-02-05
# ランダムウォークの遷移結合による方向性ネットワークのアライメントと比較 Alignment and Comparison of Directed Networks via Transition Couplings of Random Walks ( http://arxiv.org/abs/2106.07106v3 ) ライセンス: Link先を確認	Bongsoo Yi, Kevin O'Connor, Kevin McGoff, Andrew B. Nobel	(参考訳) 本稿では,2つのネットワークの比較とアライメントのために,NetOTC (network optimal transition coupling) と呼ばれるトランスポートベースの手法について述べる。興味のネットワークは、方向付け、方向付け、重み付け、または非重み付けされ、異なる大きさの頂点集合を持つ。 2つのネットワークと頂点に関連するコスト関数が与えられた場合、NetOTCは最小のコストで関連するランダムウォークの遷移結合を見つける。最小化コストはネットワーク間の差を定量化し、最適なトランスポートプラン自体は2つのネットワークの頂点とエッジの両方のアライメントを提供する。完全なランダムウォークの結合は、その限界分布ではなく、NetOTCがネットワークのローカルおよびグローバルな情報をキャプチャし、エッジを保存することを保証する。 NetOTCには自由パラメータはなく、ランダム化に依存しない。本稿では,netotcの理論的性質を調査し,その実験結果について述べる。 We describe and study a transport based procedure called NetOTC (network optimal transition coupling) for the comparison and alignment of two networks. The networks of interest may be directed or undirected, weighted or unweighted, and may have distinct vertex sets of different sizes. Given two networks and a cost function relating their vertices, NetOTC finds a transition coupling of their associated random walks having minimum expected cost. The minimizing cost quantifies the difference between the networks, while the optimal transport plan itself provides alignments of both the vertices and the edges of the two networks. Coupling of the full random walks, rather than their marginal distributions, ensures that NetOTC captures local and global information about the networks, and preserves edges. NetOTC has no free parameters, and does not rely on randomization. We investigate a number of theoretical properties of NetOTC and present experiments establishing its empirical performance.	翻訳日:2024-02-07 21:38:09 公開日:2024-02-05
# 概念グラディエント: 線形推定のない概念に基づく解釈 Concept Gradient: Concept-based Interpretation Without Linear Assumption ( http://arxiv.org/abs/2208.14966v2 ) ライセンス: Link先を確認	Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh	(参考訳) ブラックボックスモデルの概念に基づく解釈は、人間にとって理解しやすいことが多い。概念に基づく解釈の最も広く採用されているアプローチは概念活性化ベクトル(cav)である。 CAVは与えられたモデルと概念の潜在表現の間の線形関係を学ぶことに依存する。線型分離性は通常暗黙的に仮定されるが、一般には成り立たない。本研究では,概念ベース解釈の本来の意図から始まり,概念ベース解釈を線形概念関数を超えて拡張する概念グラディエント(CG)を提案する。一般の(潜在的に非線形な)概念に対して、モデルの予測に影響を及ぼす概念の小さな変化がいかにして、勾配に基づく解釈を概念空間に拡張するかを数学的に評価できることを示した。 cgがおもちゃの例と現実世界のデータセットの両方でcavを上回っていることを実証した。 Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. The linear separability is usually implicitly assumed but does not hold true in general. In this work, we started from the original intent of concept-based interpretation and proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We showed that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change of concept affecting the model's prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrated empirically that CG outperforms CAV in both toy examples and real world datasets.	翻訳日:2024-02-07 21:26:01 公開日:2024-02-05
# 時間周波数領域における光分数フーリエ変換の実験実装 Experimental implementation of the optical fractional Fourier transform in the time-frequency domain ( http://arxiv.org/abs/2303.13305v2 ) ライセンス: Link先を確認	Bartosz Niewelt, Marcin Jastrzębski, Stanisław Kurzyna, Jan Nowosielski, Wojciech Wasilewski, Mateusz Mazelanik, Michał Parniak	(参考訳) 位相空間の任意の角度の回転に対応する物理学の基本演算である分数フーリエ変換(FrFT)は、ノイズ低減のためのデジタル信号処理において必須のツールである。時間周波数自由度における光信号の処理は、デジタル化のステップをバイパスし、量子通信や古典通信、センシング、計算において多くのプロトコルを強化する機会を提供する。本稿では,処理能力を有する原子量子光学メモリシステムを用いて,時間周波数領域における分数フーリエ変換を実験的に実現する。本手法は,プログラム可能なインターリーブスペクトルと時間位相を付与することで動作を行う。ショットノイズ制限ホモダイン検出器を用いて測定したchronocyclic Wigner関数を解析してFrFTを検証した。この結果から,時間モードソート,処理,超解像パラメータ推定の実現が期待できる。 The fractional Fourier transform (FrFT), a fundamental operation in physics that corresponds to a rotation of phase space by any angle, is also an indispensable tool employed in digital signal processing for noise reduction. Processing of optical signals in their time-frequency degree of freedom bypasses the digitization step and presents an opportunity to enhance many protocols in quantum and classical communication, sensing and computing. In this letter, we present the experimental realization of the fractional Fourier transform in the time-frequency domain using an atomic quantum-optical memory system with processing capabilities. Our scheme performs the operation by imposing programmable interleaved spectral and temporal phases. We have verified the FrFT by analyses of chronocyclic Wigner functions measured via a shot-noise limited homodyne detector. Our results hold prospects for achieving temporal-mode sorting, processing and super-resolved parameter estimation.	翻訳日:2024-02-07 21:17:37 公開日:2024-02-05
# ヘルプフィードバックによるエージェントとの対話による協調環境における接地言語理解の改善 Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback ( http://arxiv.org/abs/2304.10750v2 ) ライセンス: Link先を確認	Nikhil Mehta, Milagro Teruel, Patricio Figueroa Sanz, Xin Deng, Ahmed Hassan Awadallah, and Julia Kiseleva	(参考訳) 自然言語処理(nlp)タスクに対する多くのアプローチは、エージェントが命令を受け取り、実行し、最終的な結果に基づいて評価するシングルステップ問題として扱うことが多い。しかし、人間の言語は本質的に対話的であり、人間の会話の前後の性質によって証明される。これを踏まえて、人間とAIのコラボレーションも対話的であり、人間がAIエージェントの作業を監視し、エージェントが理解し活用できるフィードバックを提供するべきであると仮定する。さらに、AIエージェントは追加情報が必要なタイミングを検出し、積極的に助けを求めることができる必要がある。このシナリオを実現することで、より自然で効率的で魅力的な人間とAIのコラボレーションが可能になる。本研究では, IGLUコンペティションによって定義された課題である, マイニングクラフトのような世界における対話型言語理解タスクを用いて, これらの方向を探索する。さまざまなタイプのヘルププレーヤがAIに与えてガイドし、AI行動におけるこのヘルプの影響を分析し、結果としてパフォーマンスが向上します。 Many approaches to Natural Language Processing (NLP) tasks often treat them as single-step problems, where an agent receives an instruction, executes it, and is evaluated based on the final outcome. However, human language is inherently interactive, as evidenced by the back-and-forth nature of human conversations. In light of this, we posit that human-AI collaboration should also be interactive, with humans monitoring the work of AI agents and providing feedback that the agent can understand and utilize. Further, the AI agent should be able to detect when it needs additional information and proactively ask for help. Enabling this scenario would lead to more natural, efficient, and engaging human-AI collaborations. In this work, we explore these directions using the challenging task defined by the IGLU competition, an interactive grounded language understanding task in a MineCraft-like world. We explore multiple types of help players can give to the AI to guide it and analyze the impact of this help in AI behavior, resulting in performance improvements.	翻訳日:2024-02-07 21:03:16 公開日:2024-02-05
# カリフォルニア大学サンフランシスコ校脳転移定位放射線手術(UCSF-BMSR)MRIデータセット The University of California San Francisco, Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) MRI Dataset ( http://arxiv.org/abs/2304.07248v3 ) ライセンス: Link先を確認	Jeffrey D. Rudie, Rachit Saluja, David A. Weiss, Pierre Nedelec, Evan Calabrese, John B. Colby, Benjamin Laguna, John Mongan, Steve Braunstein, Christopher P. Hess, Andreas M. Rauschecker, Leo P. Sugrue, and Javier E. Villanueva-Meyer	(参考訳) カリフォルニア大学サンフランシスコ校脳転移定位放射線手術(UCSF-BMSR)データセットは、5136個の脳転移の専門家アノテーションを持つ412人の患者の560個の脳MRIからなる、パブリック、臨床、マルチモーダル脳MRIデータセットである。データは、位置合わせおよび頭蓋除去済みのT1後コントラスト、T1前コントラスト、FLAIRおよびサブトラクション(T1前コントラスト - T1後コントラスト)の画像と、NIfTIフォーマットの造影増強される脳転移のボクセルワイズセグメンテーションからなる。このデータセットには、患者の人口統計、手術状況、および原発性がんの種類も含まれる。 UCSF-BMSRは、研究者たちがこれらのデータを使って脳転移のためのAIアプリケーションの境界を押し上げることを期待して、一般公開されている。 The University of California San Francisco Brain Metastases Stereotactic Radiosurgery (UCSF-BMSR) dataset is a public, clinical, multimodal brain MRI dataset consisting of 560 brain MRIs from 412 patients with expert annotations of 5136 brain metastases. Data consists of registered and skull stripped T1 post-contrast, T1 pre-contrast, FLAIR and subtraction (T1 pre-contrast - T1 post-contrast) images and voxelwise segmentations of enhancing brain metastases in NIfTI format. The dataset also includes patient demographics, surgical status and primary cancer types. The UCSF-BMSR has been made publicly available in the hopes that researchers will use these data to push the boundaries of AI applications for brain metastases.	翻訳日:2024-02-07 21:02:41 公開日:2024-02-05
# マルチパーティの絡み合い:幾何学による旅 Multipartite Entanglement: A Journey Through Geometry ( http://arxiv.org/abs/2304.03281v2 ) ライセンス: Link先を確認	Songbo Xie, Daniel Younis, Yuhan Mei, Joseph H. Eberly	(参考訳) 真のマルチパーティの絡み合いは量子情報と関連する技術にとって不可欠であるが、量子化は長年の課題であった。提案されたほとんどの措置は ``genuine'' 要件を満たさないため、多くの用途に適さない。本研究では,多部交絡と幾何的単純化の超体積の関係を導入し,四部交絡のテトラヘドロン測度を導出することにより,この問題に対処するための旅程を提案する。 2つの高度に絡み合った4ビット状態の絡み合いランキングを比較することで、テトラヘドロン測度は量子系内の粒子間の置換不変度に依存することを示した。我々は,多体系における量子情報の揺らぎの文脈において,我々の測度の将来的応用を実証する。 Genuine multipartite entanglement is crucial for quantum information and related technologies but quantifying it has been a long-standing challenge. Most proposed measures do not meet the ``genuine'' requirement, making them unsuitable for many applications. In this work, we propose a journey toward addressing this issue by introducing an unexpected relation between multipartite entanglement and hypervolume of geometric simplices, leading to a tetrahedron measure of quadripartite entanglement. By comparing the entanglement ranking of two highly entangled four-qubit states, we show that the tetrahedron measure relies on the degree of permutation invariance among parties within the quantum system. We demonstrate potential future applications of our measure in the context of quantum information scrambling within many-body systems.	翻訳日:2024-02-07 21:02:05 公開日:2024-02-05
# LaCViT:視覚変換器のためのラベル対応コントラスト微調整フレームワーク LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers ( http://arxiv.org/abs/2303.18013v3 ) ライセンス: Link先を確認	Zijun Long, Zaiqiao Meng, Gerardo Aragon Camarasa, Richard McCreadie	(参考訳) ビジョントランスフォーマー(ViT)はコンピュータビジョンの一般的なモデルとして登場し、様々なタスクで最先端のパフォーマンスを実証している。この成功は一般的に、マスクされたランダムパッチのような自己教師付き信号を使用して大規模データセットの事前トレーニングを含む2段階の戦略を踏襲し、その後、クロスエントロピーロスを持つタスク固有のラベル付きデータセットを微調整する。しかし、このクロスエントロピー損失への依存はViTsの制限要因として認識され、その一般化と下流タスクへの伝達性に影響を及ぼす。この重要な課題に対処するため、新しいラベル対応コントラストトレーニングフレームワークであるLaCViTを導入し、ViTへの埋め込みの質を大幅に向上させる。 LaCViTは、クロスエントロピー損失の限界に対処するだけでなく、多様な画像分類タスク間でより効果的な移動学習を促進する。 8つの標準画像分類データセットに関する包括的実験により,lacvitは3つの評価vitの性能をトップ1の精度で10.78%向上させた。 Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks. This success typically follows a two-stage strategy involving pre-training on large-scale datasets using self-supervised signals, such as masked random patches, followed by fine-tuning on task-specific labeled datasets with cross-entropy loss. However, this reliance on cross-entropy loss has been identified as a limiting factor in ViTs, affecting their generalization and transferability to downstream tasks. Addressing this critical challenge, we introduce a novel Label-aware Contrastive Training framework, LaCViT, which significantly enhances the quality of embeddings in ViTs. LaCViT not only addresses the limitations of cross-entropy loss but also facilitates more effective transfer learning across diverse image classification tasks. Our comprehensive experiments on eight standard image classification datasets reveal that LaCViT statistically significantly enhances the performance of three evaluated ViTs by up-to 10.78% under Top-1 Accuracy.	翻訳日:2024-02-07 21:01:49 公開日:2024-02-05
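A label-aware contrastive objective of the kind the LaCViT abstract describes can be sketched as below. The abstract does not spell out the loss, so this assumes a supervised-contrastive (SupCon-style) formulation in which embeddings that share a label are pulled together and all others are pushed apart. The function names and the `temperature` default are hypothetical, and the toy embeddings are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def label_aware_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: for each anchor, maximize the softmax mass on same-label samples."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        sims = {
            a: sum(u * w for u, w in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        max_sim = max(sims.values())  # subtract the maximum for numerical stability
        log_denom = max_sim + math.log(sum(math.exp(s - max_sim) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings: two tight clusters in 2-D.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_loss = label_aware_contrastive_loss(points, [0, 0, 1, 1])
mixed_loss = label_aware_contrastive_loss(points, [0, 1, 0, 1])
print(aligned_loss, mixed_loss)
```

When the labels agree with the cluster structure the loss is close to zero, and when they cut across the clusters it is large. That gradient signal on the embedding geometry is what a cross-entropy head alone does not provide directly.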
# qudit量子力学のフレーム表現 Frame representations of qudit quantum mechanics ( http://arxiv.org/abs/2305.19287v9 ) ライセンス: Link先を確認	Nicolae Cotfas	(参考訳) quditsのwigner関数を定義する試みは数多くあり、それぞれにその利点と限界がある。既存の有限バージョンは単純な定義を持つが、構成上は人工的であり、直感的な状態解析を許さない。連続バージョンはより複雑な定義を持つが、元のウィグナー関数と類似しており、量子状態の可視化を可能にする。我々が提示するタイトフレームの概念に基づくバージョンは有限であるが、連続バージョンと似た特性と応用がある。フレーム表現に基づいて、キュービット状態のグラフィカル表現をいくつか提示し、それらに関する新しいパラメータを定義する。数学的な観点から、qubitはqutritの直交射影であることを示す。 There exist many attempts to define a Wigner function for qudits, each of them coming with its advantages and limitations. The existing finite versions have simple definitions, but they are artificial in their construction and do not allow an intuitive state analysis. The continuous versions have more complicated definitions, but they are similar to the original Wigner function and allow a visualization of the quantum states. The version based on the concept of tight frame we present is finite, but it has certain properties and applications similar to those of continuous versions. Based on the frame representation, we present several graphical representations of qubit states, and define a new parameter concerning them. We show that, from a mathematical point of view, the qubit is the orthogonal projection of qutrit.	翻訳日:2024-02-07 20:52:29 公開日:2024-02-05
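The two ideas in the abstract above, a tight frame and a lower-dimensional space obtained by orthogonal projection, can be illustrated with a real-valued toy. The paper works with complex Hilbert spaces (qubit and qutrit); the sketch below instead projects the standard basis of $\mathbb{R}^3$ onto the plane orthogonal to $(1, 1, 1)$, which is an assumption made purely for illustration. The three projected vectors form a tight frame of the plane, so any vector is recovered from its three frame coefficients.

```python
import math

# Orthonormal basis of the plane orthogonal to (1, 1, 1) in R^3.
U1 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
U2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))

# Plane coordinates of the projected standard basis vectors e_1, e_2, e_3.
FRAME = [(U1[i], U2[i]) for i in range(3)]


def frame_coefficients(v):
    """Inner products of a 2-D vector with the three frame vectors."""
    return [v[0] * f[0] + v[1] * f[1] for f in FRAME]


def reconstruct(coeffs):
    """Rebuild the 2-D vector as the coefficient-weighted sum of the frame vectors."""
    return tuple(sum(c * f[k] for c, f in zip(coeffs, FRAME)) for k in range(2))


v = (0.3, -1.2)
coeffs = frame_coefficients(v)
recovered = reconstruct(coeffs)
energy = sum(c * c for c in coeffs)  # equals |v|^2 for a tight frame with bound 1
print(coeffs, recovered, energy)
```

The frame is overcomplete (three vectors in a two-dimensional space), yet reconstruction needs no matrix inversion because the frame operator is the identity. This is the property that makes frame-based representations behave much like orthonormal expansions.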
# QbC: 構成による量子精度 QbC: Quantum Correctness by Construction ( http://arxiv.org/abs/2307.15641v2 ) ライセンス: Link先を確認	Anurudh Peduri, Ina Schaefer, Michael Walter	(参考訳) 量子アルゴリズムの急速な進歩と複雑さの増大により、量子プログラムの正確性が大きな関心事となっている。過去数年間の研究で、量子ホア論理のような証明システムを用いて量子プログラムを正式に検証するための様々なアプローチが提案されている。これらの以前のアプローチはすべてポストホックで、まずプログラムを実装し、その正しさを検証します。本稿では、構成による量子正当性(QbC)について、その仕様から正当性を保証する方法で量子プログラムを構築するためのアプローチを提案する。我々は,プログラム特性の指定にプリ条件とポスト条件を用い,その仕様からプログラムを量子的に構築するための音質および完全精細化ルールを提案する。慣用的な問題やパターンに対する量子プログラムを構築してQbCを検証する。このアプローチは、プログラムの詳細を導出する方法を自然に示唆し、その過程で重要な設計選択を強調する。このように、QbCは量子アルゴリズムとソフトウェアの設計と分類を支援する役割を担っていると信じている。 Thanks to the rapid progress and growing complexity of quantum algorithms, correctness of quantum programs has become a major concern. Pioneering research over the past years has proposed various approaches to formally verify quantum programs using proof systems such as quantum Hoare logic. All these prior approaches are post-hoc: one first implements a program and only then verifies its correctness. Here we propose Quantum Correctness by Construction (QbC): an approach to constructing quantum programs from their specification in a way that ensures correctness. We use pre- and postconditions to specify program properties, and propose sound and complete refinement rules for constructing programs in a quantum while language from their specification. We validate QbC by constructing quantum programs for idiomatic problems and patterns. We find that the approach naturally suggests how to derive program details, highlighting key design choices along the way. As such, we believe that QbC can play a role in supporting the design and taxonomization of quantum algorithms and software.	翻訳日:2024-02-07 20:39:49 公開日:2024-02-05
# NESTLE: 法定コーパスの統計解析のためのノーコードツール NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus ( http://arxiv.org/abs/2309.04146v2 ) ライセンス: Link先を確認	Kyoungyeon Cho, Seungkum Han, Young Rok Choi, Wonseok Hwang	(参考訳) 大規模法人の統計分析は、貴重な法的洞察を与えることができる。このような分析には,(1)文書検索ツールを用いてコーパスのサブセットを選択すること,(2)情報抽出システムを用いた構造テキストを選択すること,(3)統計解析のためのデータを視覚化することが必要である。それぞれのプロセスは特別なツールかプログラミングスキルを必要とするが、統合された"ノーコード"ツールは提供されていない。 NESTLEは法定コーパスの大規模統計解析のためのノーコードツールである。 LLM(Large Language Model)と内部のカスタムエンド・ツー・エンドのIEシステムにより、NESTLEは、一行のコードを書かずに、コーパスの無制限にカスタマイズ可能な統計分析の可能性を開放するIEシステムで事前に定義されていないあらゆる種類の情報を抽出することができる。韓国のIEタスク15件とLexGLUEの法的テキスト分類タスク3件について,本システムを検証した。 NESTLEは、内部IEモジュールを4つの人間ラベルと192個のLLMラベルの例でトレーニングすることで、GPT-4に匹敵する性能を達成することができる。 The statistical analysis of large scale legal corpus can provide valuable legal insights. For such analysis one needs to (1) select a subset of the corpus using document retrieval tools, (2) structure text using information extraction (IE) systems, and (3) visualize the data for the statistical analysis. Each process demands either specialized tools or programming skills whereas no comprehensive unified "no-code" tools have been available. Here we provide NESTLE, a no-code tool for large-scale statistical analysis of legal corpus. Powered by a Large Language Model (LLM) and the internal custom end-to-end IE system, NESTLE can extract any type of information that has not been predefined in the IE system opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LexGLUE. The comprehensive experiments reveal NESTLE can achieve GPT-4 comparable performance by training the internal IE module with 4 human-labeled, and 192 LLM-labeled examples.	翻訳日:2024-02-07 20:28:42 公開日:2024-02-05
# MultiWay-Adapater:スケーラブルな画像テキスト検索のための大規模マルチモーダルモデルの適用 MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval ( http://arxiv.org/abs/2309.01516v3 ) ライセンス: Link先を確認	Zijun Long, George Killick, Richard McCreadie, Gerardo Aragon Camarasa	(参考訳) MLLM(Multimodal Large Language Models)のサイズが大きくなるにつれて、高い計算量とメモリ要求のため、特定のタスクに適応することがますます困難になる。実際、タスク固有の広範囲なトレーニングが必要なため、従来の微調整手法はコストがかかる。これらのコスト削減を目的とした効率的な適応手法は存在するが、実際にはモーダル間アライメントが浅く、モデルの有効性を著しく損なう。これらの計算課題に対処し、モーダル間アライメントを改善するために、「アライメント・エンハンサー」を特徴とする新しいフレームワークであるMultiWay-Adapter(MWA)を導入する。このエンハンサーはモーダル間アライメントを深くし、最小のチューニング作業で高い転送性を実現する。実験の結果,従来の効率的なチューニング手法とは異なり,MWAはモデルの有効性を維持しつつ,トレーニング時間を最大57%削減できることがわかった。 MWAは軽量で、BEiT-3 Largeのような最先端の基盤モデルに対して、モデルサイズをわずか2-3%増加させる(パラメータの観点から)。これらの結果から,MWAはMLLMの効率的かつ効果的な適応法であり,適用性を大幅に向上することが示された。 As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. Indeed, traditional fine-tuning methods are costly, due to the need for extensive, task-specific training. While efficient adaptation methods exist that aim to reduce these costs, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these computational challenges and improve inter-modal alignment, we introduce the MultiWay-Adapter (MWA), a novel framework featuring an 'Alignment Enhancer'. This enhancer deepens inter-modal alignment, enabling high transferability with minimal tuning effort. Our experiments show that unlike prior efficient tuning approaches, MWA maintains model effectiveness, while reducing training time by up-to 57%. MWA is also lightweight, increasing model size by only 2-3% (in terms of parameters) for state-of-the-art foundation models like BEiT-3 Large. These results demonstrate that MWA provides an efficient and effective adaptation method for MLLMs, significantly broadening their applicability.	
翻訳日:2024-02-07 20:27:39 公開日:2024-02-05
# 選好によるピアリング: 大きな言語モデルを調整するためのフィードバック獲得 Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models ( http://arxiv.org/abs/2308.15812v3 ) ライセンス: Link先を確認	Hritik Bansal, John Dang, Aditya Grover	(参考訳) 大きな言語モデル(LLM)と人間の価値と意図を批判的に調整するには、人間やAIのフィードバックを使用する必要がある。密集したフィードバックアノテーションは取得と統合に費用がかかるが、スパースフィードバックは評価(例えば1-7のスコアスコアA)とランキング(例えばレスポンスAがレスポンスBより優れているか? 本研究では,この設計選択がllmのアライメントと評価に与える影響を分析した。評価やランキングから推定される選好が、人間とAIのアノテータの60%と大きく異なるという矛盾した問題を明らかにする。以上の結果から,この現象を説明する注釈者バイアスの様々な側面を同定し,例えば,人間の注釈者は対数判断において精度を優先しながら,より密な応答を高く評価した。驚いたことに、フィードバックプロトコルの選択は、アライメントされたllmの評価にも大きな影響を与えることも観察しています。特に,アライメントのためのランキングデータ(例えばモデルx)を利用するllmは,ランクベースの評価プロトコル(x/yの応答は基準応答より優れているか?)で評価データ(例えばモデルy)を利用するものよりも好ましいが,格付けベースの評価プロトコル(score rank x/yの応答は1～7のスケールで応答する)は好まれている。以上の結果から,言語モデルの実用性評価手法における重要なギャップと,アライメントに使用するフィードバックプロトコルへの強い依存が浮き彫りになった。私たちのコードとデータはhttps://github.com/hritikbansal/sparse_feedbackで入手できます。 Aligning large language models (LLMs) with human values and intents critically involves the use of human or AI feedback. While dense feedback annotations are expensive to acquire and integrate, sparse feedback presents a structural design choice between ratings (e.g., score Response A on a scale of 1-7) and rankings (e.g., is Response A better than Response B?). In this work, we analyze the effect of this design choice for the alignment and evaluation of LLMs. We uncover an inconsistency problem wherein the preferences inferred from ratings and rankings significantly disagree 60% for both human and AI annotators. Our subsequent analysis identifies various facets of annotator biases that explain this phenomena, such as human annotators would rate denser responses higher while preferring accuracy during pairwise judgments. To our surprise, we also observe that the choice of feedback protocol also has a significant effect on the evaluation of aligned LLMs. 
In particular, we find that LLMs that leverage rankings data for alignment (say model X) are preferred over those that leverage ratings data (say model Y), with a rank-based evaluation protocol (is X/Y's response better than reference response?) but not with a rating-based evaluation protocol (score Rank X/Y's response on a scale of 1-7). Our findings thus shed light on critical gaps in methods for evaluating the real-world utility of language models and their strong dependence on the feedback protocol used for alignment. Our code and data are available at https://github.com/Hritikbansal/sparse_feedback.	翻訳日:2024-02-07 20:27:18 公開日:2024-02-05
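The inconsistency between the two feedback protocols can be made concrete with a small sketch. It assumes each record holds two independent ratings (for responses A and B) and one pairwise ranking label, and that equal ratings map to a "tie". The field names, the tie rule and the four records are hypothetical placeholders, not data from the paper.

```python
def preference_from_ratings(rating_a, rating_b):
    """Convert two absolute ratings into a pairwise preference: 'A', 'B' or 'tie'."""
    if rating_a > rating_b:
        return "A"
    if rating_b > rating_a:
        return "B"
    return "tie"


def disagreement_rate(records):
    """Fraction of records where the rating-implied preference differs from the ranking."""
    mismatches = sum(
        preference_from_ratings(r["rating_a"], r["rating_b"]) != r["ranking"]
        for r in records
    )
    return mismatches / len(records)


# Made-up toy records, for illustration only.
toy_records = [
    {"rating_a": 6, "rating_b": 4, "ranking": "A"},
    {"rating_a": 5, "rating_b": 5, "ranking": "B"},
    {"rating_a": 3, "rating_b": 6, "ranking": "A"},
    {"rating_a": 2, "rating_b": 7, "ranking": "B"},
]
rate = disagreement_rate(toy_records)
print(rate)
```

How ties are handled changes the measured rate noticeably, so any comparison between protocols should state that rule explicitly.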
# 大規模言語モデルに対するベイズ低位適応 Bayesian Low-rank Adaptation for Large Language Models ( http://arxiv.org/abs/2308.13111v5 ) ライセンス: Link先を確認	Adam X. Yang, Maxime Robeyns, Xi Wang, Laurence Aitchison	(参考訳) 低ランク適応(LoRA)は、大規模言語モデル(LLM)のコスト効率の高い微調整のための新しいパラダイムとして登場した。しかし、微調整LLMは、特に小さなデータセットで微調整された場合、過信されることが多い。ベイズ的手法は、不確実性を推定する固有の能力を持ち、過信を緩和し校正を強化する強力なツールとして機能する。本稿では,LoRAパラメータにベイズ的アプローチを適用するLaplace-LoRAを提案する。特に、Laplace-LoRAは、LoRAパラメータの後方にLaplace近似を適用し、微調整LLMの校正を大幅に改善した。 Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of large language models (LLMs). However, fine-tuned LLMs often become overconfident especially when fine-tuned on small datasets. Bayesian methods, with their inherent ability to estimate uncertainty, serve as potent tools to mitigate overconfidence and enhance calibration. In this work, we introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the posterior over the LoRA parameters, considerably improving the calibration of fine-tuned LLMs.	翻訳日:2024-02-07 20:26:46 公開日:2024-02-05
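The effect of a Laplace approximation on overconfidence can be seen in a one-parameter toy. The sketch below fits a scalar logistic-regression weight by Newton's method, places a Gaussian around the MAP estimate using the inverse Hessian, and compares the MAP prediction with a prediction that accounts for weight uncertainty. Everything here is an illustrative assumption: the paper applies the approximation to LoRA parameters of an LLM, whereas this uses a single weight, a unit Gaussian prior, a four-point dataset and the standard probit approximation for the predictive.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def neg_log_posterior_derivs(w, xs, ys, prior_precision):
    """Gradient and Hessian of the negative log posterior for a scalar weight w."""
    grad, hess = prior_precision * w, prior_precision
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        grad += (p - y) * x
        hess += p * (1.0 - p) * x * x
    return grad, hess


def laplace_fit(xs, ys, prior_precision=1.0, steps=25):
    """Newton's method for the MAP weight, then a Gaussian N(w_map, 1 / hess) around it."""
    w = 0.0
    for _ in range(steps):
        grad, hess = neg_log_posterior_derivs(w, xs, ys, prior_precision)
        w -= grad / hess
    _, hess = neg_log_posterior_derivs(w, xs, ys, prior_precision)
    return w, 1.0 / hess


def predict(x, w_map, w_var):
    """Return (MAP probability, Laplace probability) for input x."""
    mean, var = w_map * x, w_var * x * x  # mean and variance of the logit w * x
    map_prob = sigmoid(mean)
    # Probit approximation to the expectation of sigmoid under a Gaussian logit.
    laplace_prob = sigmoid(mean / math.sqrt(1.0 + math.pi * var / 8.0))
    return map_prob, laplace_prob


w_map, w_var = laplace_fit([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
map_prob, laplace_prob = predict(3.0, w_map, w_var)
print(w_map, w_var, map_prob, laplace_prob)
```

For an input outside the range of the small training set, the Laplace prediction is pulled toward 0.5 relative to the MAP prediction, which is the calibration behavior the abstract attributes to Bayesian treatment of fine-tuned models.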
# 超伝導量子ビット読み出しのモデルベース最適化 Model-based Optimization of Superconducting Qubit Readout ( http://arxiv.org/abs/2308.02079v2 ) ライセンス: Link先を確認	Andreas Bengtsson, Alex Opremcak, Mostafa Khezri, Daniel Sank, Alexandre Bourassa, Kevin J. Satzinger, Sabrina Hong, Catherine Erickson, Brian J. Lester, Kevin C. Miao, Alexander N. Korotkov, Julian Kelly, Zijun Chen, Paul V. Klimov	(参考訳) 測定は量子アルゴリズムの不可欠な要素であり、超伝導量子ビットにとって、しばしば最もエラーを起こしやすい。本稿では,不正な副作用を回避しつつ,低測定誤差を達成するモデルベース読み出し最適化を示す。 17量子ビットの同時および中間回路計測では、500nsの終端持続時間と残共振器光子からの過剰リセット誤差を最小に抑え、キュービット当たり1.5%の誤差を観測する。また,自然加熱によって制限された漏出率を達成する測定誘起状態遷移を抑制する。この技術は数百の量子ビットに拡張でき、エラー訂正コードや短期アプリケーションの性能を高めるために使用される。 Measurement is an essential component of quantum algorithms, and for superconducting qubits it is often the most error prone. Here, we demonstrate model-based readout optimization achieving low measurement errors while avoiding detrimental side-effects. For simultaneous and mid-circuit measurements across 17 qubits, we observe 1.5% error per qubit with a 500ns end-to-end duration and minimal excess reset error from residual resonator photons. We also suppress measurement-induced state transitions achieving a leakage rate limited by natural heating. This technique can scale to hundreds of qubits and be used to enhance the performance of error-correcting codes and near-term applications.	翻訳日:2024-02-07 20:24:41 公開日:2024-02-05
# フロー: 推論とコラボレーションAIのブロックを構築する Flows: Building Blocks of Reasoning and Collaborating AI ( http://arxiv.org/abs/2308.01285v2 ) ライセンス: Link先を確認	Martin Josifoski, Lars Klein, Maxime Peyrard, Nicolas Baldwin, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West	(参考訳) 人工知能(AI)の最近の進歩は、高い能力と制御可能なシステムを生み出している。これは、構造化推論と、複数のAIシステムと人間間の協調のための前例のない機会を生み出します。この可能性を十分に実現するためには、そのような構造化相互作用を設計し研究する原則的な方法を開発することが不可欠である。この目的のために,概念的フレームワークフローを紹介する。フローは計算の自己完結したビルディングブロックであり、独立した状態を持ち、標準化されたメッセージベースのインターフェイスを介して通信する。このモジュール設計は、フローを任意にネストしたインタラクションに再帰的に構成し、本質的に並行性に優しくすることで、フロー生成のプロセスを単純化する。重要なことは、AI-AIとヒューマン-AIインタラクションの事前作業、エンジニアリングスキームのプロンプト、ツール拡張など、あらゆるインタラクションをこのフレームワークを使って実装することができる。我々は、gpt-4でさえも苦労する課題である競合型コーディングにおけるフローの可能性を示す。その結果,AIのみのフローに+21,ヒューマンAIフローに+54の絶対点を加えることで,構造化推論と協調により一般化が大幅に向上することが示唆された。高速かつ厳密な研究を支援するために,フローを具体化するaiflowsライブラリを紹介する。 aiFlowsライブラリはhttps://github.com/epfl-dlab/aiflowsで入手できる。実験を再現するためのデータとフローは、https://github.com/epfl-dlab/cc_flowsで閲覧できます。 Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the conceptual framework Flows. Flows are self-contained building blocks of computation, with an isolated state, communicating through a standardized message-based interface. This modular design simplifies the process of creating Flows by allowing them to be recursively composed into arbitrarily nested interactions and is inherently concurrency-friendly. Crucially, any interaction can be implemented using this framework, including prior work on AI-AI and human-AI interactions, prompt engineering schemes, and tool augmentation. 
We demonstrate the potential of Flows on competitive coding, a challenging task on which even GPT-4 struggles. Our results suggest that structured reasoning and collaboration substantially improve generalization, with AI-only Flows adding +21 and human-AI Flows adding +54 absolute points in terms of solve rate. To support rapid and rigorous research, we introduce the aiFlows library embodying Flows. The aiFlows library is available at https://github.com/epfl-dlab/aiflows. Data and Flows for reproducing our experiments are available at https://github.com/epfl-dlab/cc_flows.	翻訳日:2024-02-07 20:24:29 公開日:2024-02-05
# 潜時プロンプト変圧器による分子設計 Molecule Design by Latent Prompt Transformer ( http://arxiv.org/abs/2310.03253v2 ) ライセンス: Link先を確認	Deqian Kong, Yuhao Huang, Jianwen Xie, Ying Nian Wu	(参考訳) 本稿では,分子設計などの課題を解決するために,既存のソフトウェアで計算可能な化学・生物特性の最適値を持つ分子を探索することを目的とした,潜在プロンプトトランスフォーマモデルを提案する。提案モデルは3成分からなる。 1) 先行分布がガウス白色雑音ベクトルのUnet変換によってモデル化された潜在ベクトル。 2)(1)の潜在ベクトル上の条件付き分子の文字列に基づく表現を生成する分子生成モデル。我々は(1) の潜伏ベクトルをプロンプトとする因果変換器モデルを採用する。 3)(1)の潜在ベクトルに対する非線形回帰に基づく分子のターゲット特性の値を予測する特性予測モデル。我々は提案したモデルを遅延プロンプトトランスフォーマーモデルと呼ぶ。モデルが既存の分子とそれらの性質値について初期訓練を行った後、分子設計の目的のために対象特性の所望値を支持する領域へモデルを徐々にシフトさせる。実験により,提案モデルが複数のベンチマーク分子設計タスクにおいて,技術性能の状態を達成できることが判明した。 This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.	翻訳日:2024-02-07 20:16:40 公開日:2024-02-05
# 拡散による目標達成の学習 Learning to Reach Goals via Diffusion ( http://arxiv.org/abs/2310.02505v2 ) ライセンス: Link先を確認	Vineet Jain and Siamak Ravanbakhsh	(参考訳) 本稿では,デノイジング拡散モデルの文脈で捉えることで,目標条件強化学習の新たな視点を示す。ガウスノイズを用いてデータ多様体から離れるランダムな軌跡を生成する拡散過程に類似して、潜在的な目標状態から離れて移動する軌跡を構築する。次に、スコア関数と同様に、これらの偏差を逆転する目標条件のポリシーを学ぶ。 Merlinと呼ばれるこのアプローチは、別の値関数を学ぶことなく、任意の初期状態から特定の目標に到達することができます。オフラインRLで拡散モデルを利用する最近の研究とは対照的に、Merlinは状態空間で拡散を行う最初の手法であり、環境ステップごとに1回の「デノイジング」反復しか必要としない。我々は,様々なオフライン目標達成タスクにおいて,提案手法を実験的に検証し,最先端手法に比べて性能が大幅に向上し,他の拡散型RL法に比べて計算効率が桁違いに向上したことを示す。以上の結果から,RLの拡散に対するこの視点は,逐次的意思決定のためのシンプルでスケーラブルで実践的な方向であることが示唆された。 We present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of denoising diffusion models. Analogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy to reverse these deviations, analogously to the score function. This approach, which we call Merlin, can reach specified goals from an arbitrary initial state without learning a separate value function. In contrast to recent works utilizing diffusion models in offline RL, Merlin stands out as the first method to perform diffusion in the state space, requiring only one "denoising" iteration per environment step. We experimentally validate our approach in various offline goal-reaching tasks, demonstrating substantial performance enhancements compared to state-of-the-art methods while improving computational efficiency over other diffusion-based RL methods by an order of magnitude. Our results suggest that this perspective on diffusion for RL is a simple, scalable, and practical direction for sequential decision making.	翻訳日:2024-02-07 20:16:06 公開日:2024-02-05
# P-ROCKET:特徴選択から見た時系列分類のためのランダム畳み込みカーネル P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification from a Feature Selection Perspective ( http://arxiv.org/abs/2309.08499v2 ) ライセンス: Link先を確認	Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John	(参考訳) 近年、ROCKETとMINIROCKETという2つの競合時系列分類モデルが、トレーニングコストの低さと高い精度で注目されている。しかし、リソース制約のあるデバイスと互換性のない機能を包括的にキャプチャするには、多数のランダムな1-D畳み込みカーネルが必要である。冗長カーネルを認識するように設計されたヒューリスティックアルゴリズムの開発にもかかわらず、進化アルゴリズムに固有の時間を要する性質は効率的な評価を妨げる。モデルを効果的にpruneするために,シーケンシャル分類器の接続を解消し,冗長なランダムカーネルを特徴選択の観点から除去する。第一のadmmに基づくアルゴリズムは群弾性ネット分類問題としてpruning challengeを定式化し、第二のコアアルゴリズムであるp-rocketは問題を2つの逐次段階に分岐させることにより、第一のアルゴリズムを大幅に加速する。 P-ROCKETのステージ1は動的に異なるペナルティを導入し、冗長カーネルを削除するためにグループレベルの正規化を効率的に実装する。様々な時系列データセットによる実験結果から、P-ROCKETは精度を著しく低下させることなく最大60%のカーネルを産み出し、P-ROCKETよりも11倍高速であることが示された。私たちのコードはhttps://github.com/ShaowuChen/P-ROCKETで公開されています。 In recent years, two competitive time series classification models, namely, ROCKET and MINIROCKET, have garnered considerable attention due to their low training cost and high accuracy. However, they require a large number of random 1-D convolutional kernels to comprehensively capture features, which is incompatible with resource-constrained devices. Despite the development of heuristic algorithms designed to recognize and prune redundant kernels, the inherent time-consuming nature of evolutionary algorithms hinders efficient evaluation. To effectively prune models, this paper removes redundant random kernels from a feature selection perspective by eliminating associating connections in the sequential classifier. Two innovative algorithms are proposed, where the first ADMM-based algorithm formulates the pruning challenge as a group elastic net classification problem, and the second core algorithm named P-ROCKET greatly accelerates the first one by bifurcating the problem into two sequential stages. 
Stage 1 of P-ROCKET introduces dynamically varying penalties to efficiently implement group-level regularization to delete redundant kernels, and Stage 2 employs element-level regularization on the remaining features to refit a linear classifier for better performance. Experimental results on diverse time series datasets show that P-ROCKET prunes up to 60% of kernels without a significant reduction in accuracy and performs 11 times faster than its counterparts. Our code is publicly available at https://github.com/ShaowuChen/P-ROCKET.	翻訳日:2024-02-07 20:12:43 公開日:2024-02-05
# 大きな言語モデルの性質を評価する:人類中心主義に対する注意 Assessing the nature of large language models: A caution against anthropocentrism ( http://arxiv.org/abs/2309.07683v2 ) ライセンス: Link先を確認	Ann Speed	(参考訳) 生成AIモデルは、OpenAIのチャットボットであるChatGPTのリリースによって、多くの大衆の注目を集め、憶測を呼んだ。少なくとも2つの意見キャンプが存在する。1つは、これらのモデルが人間のタスクに根本的な変化をもたらす可能性に興奮している。もう1つは、これらのモデルが持つと思われる力を強く懸念している。これらの懸念に対処するため,標準的,規範的,評価された認知的・人格的尺度を用いて,主にGPT 3.5の評価を行った。この実生プロジェクトのために、私たちは、これらのモデルの能力のいくつかの境界、その能力が短時間でどれだけ安定しているか、そしてそれらがどのように人間と比較するかを推定できるテストのバッテリを開発しました。以上の結果から, LLMは性格検査に応答する能力は興味深いが, 感覚(sentience)を発達させた可能性は低いことが示唆された。 GPT3.5は、人間のような性格を持つ場合には予想されない、反復的な観察にわたる認知と人格の尺度の大きなばらつきを示した。ばらつきにも拘わらず、LLMは、高揚感と有益な反応にもかかわらず、低自尊心、現実からの顕著な解離、場合によってはナルシシズムやサイコパシーなど、人間であれば心の健康状態が悪いとみなされるものを示す。 Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two opinion camps exist: one excited about possibilities these models offer for fundamental changes to human tasks, and another highly concerned about power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT 3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting. GPT3.5 did display large variability in both cognitive and personality measures over repeated observations, which is not expected if it had a human-like personality. Variability notwithstanding, LLMs display what in a human would be considered poor mental health, including low self-esteem, marked dissociation from reality, and in some cases narcissism and psychopathy, despite upbeat and helpful responses.	翻訳日:2024-02-07 20:11:59 公開日:2024-02-05
# O3D:大規模言語モデルを用いた逐次決定処理のためのオフラインデータ駆動探索と蒸留 O3D: Offline Data-driven Discovery and Distillation for Sequential Decision-Making with Large Language Models ( http://arxiv.org/abs/2310.14403v4 ) ライセンス: Link先を確認	Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani, Jared Vann, Deepeka Garg, Sumitra Ganesh	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、逐次意思決定問題を解決する上で有望な性能を示した。プロンプト(インコンテキストラーニング)で提供される少数の例を模倣することで、LLMエージェントは外部環境と対話し、追加のトレーニングなしでタスクを完了させることができる。しかし、そのような少数の例は複雑で長い水平なタスクに対して高品質な解を生成するには不十分であるが、限られた文脈長は長い相互作用の地平線を持つより大規模な実演を消費することができない。そこで本研究では,オフラインデータを大規模に利用するオフライン学習フレームワーク(例えば,人間のインタラクションログ)を提案し,llmを活用したポリシーを微調整することなく改善する。提案手法であるO3D (Offline Data-driven Discovery and Distillation) は, オフラインインタラクションデータに基づいて, 再利用可能なスキルを自動的に発見し, 一般化可能な知識を抽出し, 下流タスクを解く能力を向上する。 2つの対話型意思決定ベンチマーク(ALFWorldとWebShop)による実証的な結果から、O3Dはオフライン発見および蒸留プロセスを通じてLLMの意思決定能力を著しく向上し、様々なLLMのベースラインを一貫して上回っていることが確認された。 Recent advancements in large language models (LLMs) have exhibited promising performance in solving sequential decision-making problems. By imitating few-shot examples provided in the prompts (i.e., in-context learning), an LLM agent can interact with an external environment and complete given tasks without additional training. However, such few-shot examples are often insufficient to generate high-quality solutions for complex and long-horizon tasks, while the limited context length cannot consume larger-scale demonstrations with long interaction horizons. To this end, we propose an offline learning framework that utilizes offline data at scale (e.g, logs of human interactions) to improve LLM-powered policies without finetuning. The proposed method O3D (Offline Data-driven Discovery and Distillation) automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data, advancing the capability of solving downstream tasks. 
Empirical results under two interactive decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably enhance the decision-making capabilities of LLMs through the offline discovery and distillation process, and consistently outperform baselines across various LLMs.	翻訳日:2024-02-07 20:03:58 公開日:2024-02-05
# ワッサースタイン距離を用いた自己教師型学習の実証的研究 An Empirical Study of Self-supervised Learning with Wasserstein Distance ( http://arxiv.org/abs/2310.10143v2 ) ライセンス: Link先を確認	Makoto Yamada and Yuki Takezawa and Guillaume Houry and Kira Michaela Dusterwald and Deborah Sulem and Han Zhao and Yao-Hung Hubert Tsai	(参考訳) 本研究では,木構造上の1-ワッサーシュタイン距離(木-ワッサースタイン距離(TWD))を利用して,TWDを2つの木埋め込みベクトル間のL1距離として定義する自己教師付き学習(SSL)問題について検討する。 SSL法では、コサイン類似性はしばしば目的関数として利用されるが、ワッサーシュタイン距離を利用する際にはあまり研究されていない。ワッサースタイン距離の訓練は数値的に難しい。そこで本研究では,ワッサーシュタイン距離でSSLを最適化する戦略を実証的に検討し,安定した訓練方法を見出した。具体的には,2種類のTWD(Total variationとClusterTree)と,ソフトマックス関数,ArcFace確率モデル,単純な埋め込みを含むいくつかの確率モデルの組み合わせを評価する。最適化を安定させるために, 単純で効果的なジェフリー発散に基づく正規化法を提案する。 STL10, CIFAR10, CIFAR100, SVHNの実証実験により, ソフトマックス関数とTWDの簡単な組み合わせにより, 標準SimCLRよりもはるかに低い結果が得られることがわかった。さらに、TWDとSimSiamの単純な組み合わせはモデルのトレーニングに失敗する。モデル性能はTWDと確率モデルの組み合わせに依存し,ジェフリー発散正規化はモデルの訓練に有効であることがわかった。最後に、TWDと確率モデルの適切な組み合わせはコサイン類似性に基づく表現学習より優れていることを示す。 In this study, we delve into the problem of self-supervised learning (SSL) utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. In SSL methods, the cosine similarity is often utilized as an objective function; however, it has not been well studied when utilizing the Wasserstein distance. Training the Wasserstein distance is numerically challenging. Thus, this study empirically investigates a strategy for optimizing the SSL with the Wasserstein distance and finds a stable training procedure. More specifically, we evaluate the combination of two types of TWD (total variation and ClusterTree) and several probability models, including the softmax function, the ArcFace probability model, and simplicial embedding. We propose a simple yet effective Jeffrey divergence-based regularization method to stabilize optimization. 
Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we find that a simple combination of the softmax function and TWD can obtain significantly lower results than the standard SimCLR. Moreover, a simple combination of TWD and SimSiam fails to train the model. We find that the model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training. Finally, we show that the appropriate combination of the TWD and probability model outperforms cosine similarity-based representation learning.	翻訳日:2024-02-07 20:02:44 公開日:2024-02-05
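As a rough illustration of the tree-Wasserstein distance (TWD) mentioned in the abstract above, the sketch below computes it as a weighted L1 distance between subtree masses, which is the standard closed form for the 1-Wasserstein distance on a tree. The function names, the parent-array encoding of the tree, and the toy trees are assumptions made for this example; they are not taken from the authors' code.

```python
def subtree_mass(parent, mass):
    """Return, for every node, the total mass inside its subtree.

    Assumes nodes are numbered so that parent[i] < i and parent[0] == -1 (root).
    """
    total = list(mass)
    # Visit children before parents by walking node indices backwards.
    for node in range(len(parent) - 1, 0, -1):
        total[parent[node]] += total[node]
    return total


def tree_wasserstein(parent, edge_weight, mu, nu):
    """Weighted L1 distance between the subtree-mass embeddings of mu and nu.

    edge_weight[v] is the length of the edge joining node v to parent[v];
    edge_weight[0] is unused because the root has no parent edge.
    """
    a = subtree_mass(parent, mu)
    b = subtree_mass(parent, nu)
    return sum(edge_weight[v] * abs(a[v] - b[v]) for v in range(1, len(parent)))


# Star tree: root 0 with three leaves, every edge of length 1/2.
# On this tree the distance coincides with the total variation distance.
star_parent = [-1, 0, 0, 0]
star_weight = [0.0, 0.5, 0.5, 0.5]
tv = tree_wasserstein(star_parent, star_weight, [0, 1, 0, 0], [0, 0, 1, 0])

# Chain 0 - 1 - 2 with unit edges: moving all mass from node 1 to node 2 costs 1.
chain = tree_wasserstein([-1, 0, 1], [0.0, 1.0, 1.0], [0, 1, 0], [0, 0, 1])
```

The star-tree case is only meant to show why a total-variation variant can be seen as a special case of TWD; the probability models and the regularizer studied in the paper are not reproduced here.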
# プライベート高次元モデル選択の計算複雑性について On the Computational Complexity of Private High-dimensional Model Selection ( http://arxiv.org/abs/2310.07852v2 ) ライセンス: Link先を確認	Saptarshi Roy, Zehua Wang, Ambuj Tewari	(参考訳) プライバシー制約下での高次元疎線形回帰モデルにおけるモデル選択の問題を考える。本稿では,モデル選択によく知られた指数的メカニズムを応用して,高い効用性を有する差分プライベートなベストサブセット選択法を提案する。本稿では,効率的なメトロポリス・ハスティングスアルゴリズムを提案し,その定常分布に対する多項式混合時間を求める。さらに,metropolis-hastingsランダムウォークの最終推定のための近似微分プライバシーを,その混合特性を用いて確立する。最後に,アルゴリズムの強力な有用性を示すいくつかの例証実験を行う。 We consider the problem of model selection in a high-dimensional sparse linear regression model under privacy constraints. We propose a differentially private best subset selection method with strong utility properties by adopting the well-known exponential mechanism for selecting the best model. We propose an efficient Metropolis-Hastings algorithm and establish that it enjoys polynomial mixing time to its stationary distribution. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we perform some illustrative experiments that show the strong utility of our algorithm.	翻訳日:2024-02-07 20:01:39 公開日:2024-02-05
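For readers unfamiliar with the exponential mechanism named in the abstract above, here is a minimal generic sketch: each candidate is selected with probability proportional to exp(epsilon * utility / (2 * sensitivity)). The function names, the toy utilities, and the direct enumeration of candidates are assumptions for illustration only. The paper samples with a Metropolis-Hastings chain because enumerating all subsets is infeasible in high dimensions, and that sampler is not reproduced here.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scores)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(utilities, epsilon, sensitivity, rng=random):
    """Draw one candidate index according to the exponential mechanism."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


# Two hypothetical candidate models with utilities 0 and 1.
probs = exponential_mechanism_probs([0.0, 1.0], epsilon=2.0, sensitivity=1.0)
```

With epsilon = 2 and sensitivity = 1 the higher-utility candidate is favoured by a factor of e; a smaller epsilon flattens the distribution, trading utility for privacy.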
# VLATTACK: 事前学習モデルによる視覚言語タスクに対するマルチモーダル攻撃 VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models ( http://arxiv.org/abs/2310.04655v3 ) ライセンス: Link先を確認	Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma	(参考訳) VL(Vision-Language)事前訓練モデルは、多くのマルチモーダルタスクにおいて優位性を示している。しかし、そのようなモデルの敵対的堅牢性は十分に検討されていない。既存のアプローチは主に、非現実的なホワイトボックス設定の下で敵の堅牢性を探究することに焦点を当てている。本稿では,学習済みのVLモデルを用いて画像とテキストの摂動を創り出し,異なる下流タスクにおけるブラックボックスの微調整モデルを攻撃する,新たな実用的課題について検討する。そこで本研究では,単一モードレベルとマルチモードレベルの両方から画像とテキストの摂動を融合し,対向サンプルを生成するVLATTACKを提案する。単一モードレベルでは、画像摂動を学習して普遍表現を乱すブロックワイド類似性攻撃(BSA)戦略を提案する。また,既存のテキスト攻撃戦略を採用し,画像モーダル攻撃とは無関係にテキストの摂動を生成する。マルチモーダルレベルでは、単一のモーダルレベルからの出力から始まる逆画像とテキストのペアを定期的に更新する新しい反復的クロスサーチ攻撃法(ICSA)を設計する。広範に使われている5つのVL事前訓練モデルの6つのタスクに対する攻撃実験を行った。実験結果から,VLATTACKは最先端のベースラインと比較して,全タスクにおける攻撃成功率が最も高く,事前訓練されたVLモデルの展開に盲点があることが判明した。ソースコードはhttps://github.com/ericyinyzy/VLAttackにある。 Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks. Towards this end, we propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels. At the single-modal level, we propose a new block-wise similarity attack (BSA) strategy to learn image perturbations for disrupting universal representations. Besides, we adopt an existing text attack strategy to generate text perturbations independent of the image-modal attack. 
At the multimodal level, we design a novel iterative cross-search attack (ICSA) method to update adversarial image-text pairs periodically, starting with the outputs from the single-modal level. We conduct extensive experiments to attack five widely-used VL pre-trained models for six tasks. Experimental results show that VLATTACK achieves the highest attack success rates on all tasks compared with state-of-the-art baselines, which reveals a blind spot in the deployment of pre-trained VL models. Source codes can be found at https://github.com/ericyinyzy/VLAttack.	翻訳日:2024-02-07 20:00:53 公開日:2024-02-05
# 悪い奴を蹴飛ばせ! フェデレーション学習におけるゼロ知識に基づく異常検出 Kick Bad Guys Out! Zero-Knowledge-Proof-Based Anomaly Detection in Federated Learning ( http://arxiv.org/abs/2310.04055v2 ) ライセンス: Link先を確認	Shanshan Han, Wenxuan Wu, Baturalp Buyukates, Weizhao Jin, Qifan Zhang, Yuhang Yao, Salman Avestimehr, Chaoyang He	(参考訳) 悪意のあるクライアントが毒のモデルを提出して、グローバルモデルが収束したり、バックドアを植えたりしないようにし、グローバルモデルにいくつかのサンプルを誤分類させる。現在の防衛手法は、実世界のflシステムでは、非実用的な事前知識に依存するか、攻撃が起こらない場合でも精度の損失をもたらすため、不足している。また、これらの手法は実行を検証するプロトコルを提供しておらず、参加者はメカニズムの正しい実行を疑っている。そこで本研究では,現実のFLシステムを対象とした新しい異常検出手法を提案する。本手法は攻撃発生時にのみ防御を活性化し,無害なモデルに影響を与えずに悪意のあるモデルを正確に除去する。さらに, 防御機構の完全性を保証するため, ゼロ知識証明を取り入れている。実験結果は,flシステムの敵攻撃に対する安全性向上に本手法の有効性を示す。 Federated Learning (FL) systems are vulnerable to adversarial attacks, where malicious clients submit poisoned models to prevent the global model from converging or plant backdoors to induce the global model to misclassify some samples. Current defense methods fall short in real-world FL systems, as they either rely on impractical prior knowledge or introduce accuracy loss even when no attack happens. Also, these methods do not offer a protocol for verifying the execution, leaving participants doubtful about the correct execution of the mechanism. To address these issues, we propose a novel anomaly detection strategy designed for real-world FL systems. Our approach activates the defense only upon occurrence of attacks, and removes malicious models accurately, without affecting the benign ones. Additionally, our approach incorporates zero-knowledge proofs to ensure the integrity of defense mechanisms. Experimental results demonstrate the effectiveness of our approach in enhancing the security of FL systems against adversarial attacks.	翻訳日:2024-02-07 20:00:24 公開日:2024-02-05
# 自己回帰変換器の構成能力:合成・解釈可能な課題に関する研究 Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks ( http://arxiv.org/abs/2311.12997v2 ) ライセンス: Link先を確認	Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka	(参考訳) 巨大なテキストコーパスでトレーニングされたトランスフォーマーは、基本的な演算を実行するなど、驚くべき能力セットを示している。言語の固有の構成的性質を考えると、モデルがこれらの機能を構成することを学び、入力でどのような操作を実行できるかの組み合わせ的な爆発をもたらすことを期待できる。そこで本研究では, 自己回帰的トランスフォーマーモデルを合成データ生成プロセス上で訓練し, 高度に定義されたモノリシックな機能の集合を合成する。 Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we train autoregressive Transformer models on a synthetic data-generating process that involves compositions of a set of well-defined monolithic capabilities. 
Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) generating intermediate outputs when composing functions is more effective for generalizing to new, unseen compositions than not generating any intermediate outputs (3) biases in the order of the compositions in the training data result in Transformers that fail to compose some combinations of functions; and (4) the attention layers select which capability to apply while the feed-forward layers execute the selected capability.	翻訳日:2024-02-07 19:52:24 公開日:2024-02-05
# MGTR:LiDARを用いた動き予測用多粒度トランスフォーマー MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR ( http://arxiv.org/abs/2312.02409v2 ) ライセンス: Link先を確認	Yiqian Gan, Hao Xiao, Yizhe Zhao, Ethan Zhang, Zhe Huang, Xin Ye, Lingting Ge	(参考訳) 動き予測は、異なる種類の移動エージェントを含む非常に不確実で複雑なシナリオを扱うため、自動運転システムにおいて不可欠な要素である。本稿では,Multi-Granular TRansformer(MGTR)フレームワークを提案する。これは,異なる種類のトラフィックエージェントに対して,異なる粒度のコンテキスト特徴を利用するエンコーダデコーダネットワークである。 MGTRの機能をさらに強化するために,既製のLiDAR特徴抽出器からLiDAR意味機能を組み込むことで,LiDARポイントクラウドデータを活用する。 Waymo Open Dataset motion prediction benchmark 上で MGTR を評価し,提案手法が最先端性能を達成し,そのリーダボードでは1位となった(https://waymo.com/open/challenges/2023/motion-prediction/)。 Motion prediction has been an essential component of autonomous driving systems since it handles highly uncertain and complex scenarios involving moving agents of different types. In this paper, we propose a Multi-Granular TRansformer (MGTR) framework, an encoder-decoder network that exploits context features in different granularities for different kinds of traffic agents. To further enhance MGTR's capabilities, we leverage LiDAR point cloud data by incorporating LiDAR semantic features from an off-the-shelf LiDAR feature extractor. We evaluate MGTR on Waymo Open Dataset motion prediction benchmark and show that the proposed method achieved state-of-the-art performance, ranking 1st on its leaderboard (https://waymo.com/open/challenges/2023/motion-prediction/).	翻訳日:2024-02-07 19:39:50 公開日:2024-02-05
# PathMMU: 病理の理解と推論のための大規模マルチモーダルエキスパートレベルベンチマーク PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology ( http://arxiv.org/abs/2401.16355v2 ) ライセンス: Link先を確認	Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang	(参考訳) 大規模なマルチモーダルモデルの出現は、AI、特に病理学において顕著な可能性を解き放っている。しかし、専門的で高品質なベンチマークの欠如は、彼らの開発と正確な評価を妨げた。そこで我々は,LMMのための最大かつ高品質な専門家評価型病理診断ベンチマークPathMMUを紹介する。 33,573個のマルチモーダル・マルチチョイス問題と21,599枚の画像からなり、各質問に合致する正しい回答の説明がある。 PathMMUの構築はGPT-4Vのロバストな能力を生かし、約30,000枚の画像キャプションペアを使用してQ&Aを生成する。ここでは,PathMMUの権威を最大化するために,6人の病理学者を招いてPathMMUの検証とテストセットの各質問を厳格な基準の下で精査し,同時にPathMMUのエキスパートレベルのパフォーマンスベンチマークを設定する。我々は,14のオープンソースおよび3つのクローズドソースLMMのゼロショット評価,画像腐敗に対するロバスト性など,広範な評価を行う。また、PathMMUへの適応性を評価するために、代表LMMを微調整する。実験の結果、先進的なLMMは挑戦的なPathMMUベンチマークに苦戦し、トップパフォーマンスのLMMであるGPT-4Vは51.7%のゼロショットのパフォーマンスしか達成せず、ヒトの病理学者が示した71.4%よりも大幅に低かった。微調整の後、オープンソースのLMMでさえ60%以上のパフォーマンスでGPT-4Vを超えることができるが、いまだに病理学者が示した専門知識に欠けている。 PathMMUが貴重な洞察を提供し、より専門的で次世代のLLMの開発を促進することを期待しています。 The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmark impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for LMMs. It comprises 33,573 multimodal multi-choice questions and 21,599 images from various sources, and an explanation for the correct answer accompanies each question. The construction of PathMMU capitalizes on the robust capabilities of GPT-4V, utilizing approximately 30,000 gathered image-caption pairs to generate Q&As. Significantly, to maximize PathMMU's authority, we invite six pathologists to scrutinize each question under strict standards in PathMMU's validation and test sets, while simultaneously setting an expert-level performance benchmark for PathMMU. 
We conduct extensive evaluations, including zero-shot assessments of 14 open-sourced and three closed-sourced LMMs and their robustness to image corruption. We also fine-tune representative LMMs to assess their adaptability to PathMMU. The empirical findings indicate that advanced LMMs struggle with the challenging PathMMU benchmark, with the top-performing LMM, GPT-4V, achieving only a 51.7% zero-shot performance, significantly lower than the 71.4% demonstrated by human pathologists. After fine-tuning, even open-sourced LMMs can surpass GPT-4V with a performance of over 60%, but still fall short of the expertise shown by pathologists. We hope that the PathMMU will offer valuable insights and foster the development of more specialized, next-generation LLMs for pathology.	翻訳日:2024-02-07 19:03:36 公開日:2024-02-05
# SimFair:シミュレーションモデルによる物理誘導公正学習 SimFair: Physics-Guided Fairness-Aware Learning with Simulation Models ( http://arxiv.org/abs/2401.15270v2 ) ライセンス: Link先を確認	Zhihao Wang, Yiqun Xie, Zhili Li, Xiaowei Jia, Zhe Jiang, Aolin Jia, Shuo Xu	(参考訳) フェアネス・アウェアネスは、現実のアプリケーションにおける人工知能の責任を負うための重要なビルディングブロックとして登場した。多くの場合、パフォーマンスの不平等は、異なる領域における分布の変化によるものである。公平性の伝達性を改善する技術が開発されているが、この問題の解決策は必ずしも新しい領域からのサンプルがなくても実現可能であるとは限らない。幸いなことに、物理学に基づく力学モデルは、大きな社会的影響を持つ多くの問題に対して研究されてきた。物理ルールに基づくシミュレーションと逆モデリングをトレーニング設計に統合することにより,データ制限をブリッジする物理誘導型公正学習フレームワークであるSimFairを提案する。温度予測を例として,フェアネス保存におけるSimFairの有効性を示す。 Fairness-awareness has emerged as an essential building block for the responsible use of artificial intelligence in real applications. In many cases, inequity in performance is due to the change in distribution over different regions. While techniques have been developed to improve the transferability of fairness, a solution to the problem is not always feasible with no samples from the new regions, which is a bottleneck for pure data-driven attempts. Fortunately, physics-based mechanistic models have been studied for many problems with major social impacts. We propose SimFair, a physics-guided fairness-aware learning framework, which bridges the data limitation by integrating physical-rule-based simulation and inverse modeling into the training design. Using temperature prediction as an example, we demonstrate the effectiveness of the proposed SimFair in fairness preservation.	翻訳日:2024-02-07 19:02:11 公開日:2024-02-05
# 埋め込みネットワークのためのGradCAMの一般化 Generalizing GradCAM for Embedding Networks ( http://arxiv.org/abs/2402.00909v2 ) ライセンス: Link先を確認	Mudit Bachhawat	(参考訳) CNNの可視化は、信頼の構築とモデルの予測を説明する上で重要な部分である。 CAMやGradCAMのような手法は、出力に責任のある画像の領域のローカライズに成功しているが、分類モデルに限られている。本稿では,組込みネットワークのためのGrad-CAMを一般化した EmbeddingCAM を提案する。分類ネットワークでは, EmbeddingCAM が GradCAM に還元されることを示す。本手法は,cub-200-2011データセット上での有効性を示すとともに,定量的・定性的な解析を行う。 Visualizing CNN is an important part in building trust and explaining model's prediction. Methods like CAM and GradCAM have been really successful in localizing area of the image responsible for the output but are only limited to classification models. In this paper, we present a new method EmbeddingCAM, which generalizes the Grad-CAM for embedding networks. We show that for classification networks, EmbeddingCAM reduces to GradCAM. We show the effectiveness of our method on CUB-200-2011 dataset and also present quantitative and qualitative analysis on the dataset.	翻訳日:2024-02-07 18:49:22 公開日:2024-02-05
# SymbolicAI: 生成モデルとソルバを組み合わせた論理的アプローチのためのフレームワーク SymbolicAI: A framework for logic-based approaches combining generative models and solvers ( http://arxiv.org/abs/2402.00854v2 ) ライセンス: Link先を確認	Marius-Constantin Dinu and Claudiu Leoveanu-Condrei and Markus Holzleitner and Werner Zellinger and Sepp Hochreiter	(参考訳) 生成過程における概念学習とフロー管理に論理的アプローチを取り入れた,汎用的でモジュール化されたフレームワークであるSymbolicAIを紹介する。 SymbolicAIは、自然言語とフォーマルな言語命令の両方に基づいてタスクを実行するセマンティックパーザとして、大きな言語モデル(LLM)を扱い、シンボリック推論と生成AIのギャップを埋めることによって、さまざまな問題解決者と生成モデルのシームレスな統合を可能にする。我々は確率的プログラミングの原理を利用して複雑なタスクに取り組み、それぞれの強みで微分可能および古典的なプログラミングパラダイムを利用する。このフレームワークは、データストリーム操作のための多型、構成、自己参照操作のセットを導入し、LLM出力をユーザ目標と整合させる。その結果、ゼロショットおよび少数ショット学習能力を持つ様々な基礎モデルの能力と、特定の問題に熟達した特殊で微調整されたモデルやソルバーを切り替えることができる。このフレームワークは、説明可能な計算グラフの作成と評価を容易にする。本稿では、これらの計算グラフを評価するための品質指標とその経験的スコアを導入し、複雑なワークフローの集合にまたがる様々な最先端のLLMを比較するベンチマークを提案する。我々は経験的スコアを「相互相似性による関係軌道評価のためのベクトル埋め込み」または「VERTEXスコア」と呼ぶ。フレームワークのコードベースとベンチマークを以下にリンクする。 We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for data stream manipulation, aligning LLM outputs with user objectives. As a result, we can transition between the capabilities of various foundation models endowed with zero- and few-shot learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. 
In turn, the framework facilitates the creation and evaluation of explainable computational graphs. We conclude by introducing a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase and benchmark are linked below.	翻訳日:2024-02-07 18:48:26 公開日:2024-02-05
# PirateNets: 残差適応ネットワークを用いた物理インフォームドディープラーニング PirateNets: Physics-informed Deep Learning with Residual Adaptive Networks ( http://arxiv.org/abs/2402.00326v2 ) ライセンス: Link先を確認	Sifan Wang, Bowen Li, Yuhan Chen, Paris Perdikaris	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、偏微分方程式(PDE)によって支配される前方および逆問題に対処するための一般的なディープラーニングフレームワークとなっているが、より大規模で深いニューラルネットワークアーキテクチャを採用すると、その性能は劣化することが知られている。この反直観的行動の根源は、不適な初期化スキームを持つ多層パーセプトロン(MLP)アーキテクチャを使うことであり、結果としてネットワーク導関数の学習可能性が低下し、最終的にはPDE残留損失の不安定な最小化につながる。これを解決するために,我々は,深いPINNモデルの安定かつ効率的なトレーニングを容易にする新しいアーキテクチャであるPhysics-informed Residual Adaptive Networks (PirateNets)を導入する。 PirateNetsは、新しい適応的残留接続を活用し、トレーニング中に徐々に深くなっていく浅層ネットワークとしてネットワークを初期化することができる。また,提案手法により,与えられたPDEシステムに対応する適切な帰納バイアスをネットワークアーキテクチャに符号化できることを示す。我々は、PirateNetsの最適化が容易であり、精度が大幅に向上し、最終的には様々なベンチマークで最先端の結果が得られることを示す包括的な実証的証拠を提供する。この原稿に付随するすべてのコードとデータは、https://github.com/PredictiveIntelligenceLab/jaxpi で公開される。 While physics-informed neural networks (PINNs) have become a popular deep learning framework for tackling forward and inverse problems governed by partial differential equations (PDEs), their performance is known to degrade when larger and deeper neural network architectures are employed. Our study identifies that the root of this counter-intuitive behavior lies in the use of multi-layer perceptron (MLP) architectures with non-suitable initialization schemes, which result in poor trainability for the network derivatives, and ultimately lead to an unstable minimization of the PDE residual loss. To address this, we introduce Physics-informed Residual Adaptive Networks (PirateNets), a novel architecture that is designed to facilitate stable and efficient training of deep PINN models. PirateNets leverage a novel adaptive residual connection, which allows the networks to be initialized as shallow networks that progressively deepen during training. 
We also show that the proposed initialization scheme allows us to encode appropriate inductive biases corresponding to a given PDE system into the network architecture. We provide comprehensive empirical evidence showing that PirateNets are easier to optimize and can gain accuracy from considerably increased depth, ultimately achieving state-of-the-art results across various benchmarks. All code and data accompanying this manuscript will be made publicly available at https://github.com/PredictiveIntelligenceLab/jaxpi.	翻訳日:2024-02-07 18:47:10 公開日:2024-02-05
# CNN-LSTM-MLPハイブリッド核融合モデルを用いたコンピュータビジョンによるストーキング検出 A Computer Vision Based Approach for Stalking Detection Using a CNN-LSTM-MLP Hybrid Fusion Model ( http://arxiv.org/abs/2402.03417v1 ) ライセンス: Link先を確認	Murad Hasan, Shahriar Iqbal, Md. Billal Hossain Faisal, Md. Musnad Hossin Neloy, Md. Tonmoy Kabir, Md. Tanzim Reza, Md. Golam Rabiul Alam, Md Zia Uddin	(参考訳) 近年,犯罪や不審な活動の検出が研究の話題となっている。コンピュータビジョン技術の急速な成長は、この問題の解決に大きな影響を与えた。しかし、現代の技術の進化にもかかわらず、物理的ストーキング検出はいまだ探索の少ない領域である。今日では、公共の場所でストーカーがよく行われ、女性が最も影響を受けている。ストーキング(ストーキング、英: stalking)は、通常、ストーカーが暴行、誘拐、強姦などの犯罪行為を行う前に、被害者を追跡し、横行し、凝視し始めると、犯罪行為が始まる前に発生する目に見える行為である。そのため,ストーキング検出により,これらの犯罪行為をすべて防止できるため,ストーキング検出の必要性が高まっている。本研究では,最小限のフレーム数で単一のビデオから潜在的ストーカーを検出するための,ディープラーニングに基づくハイブリッド融合モデルを提案する。ビデオフレームから顔ランドマーク,頭部ポーズ推定,相対距離といった複数の関連特徴を数値として抽出する。このデータは多層パーセプトロン(MLP)に入力され、ストーキングと非ストーキングシナリオの分類タスクを実行する。同時に、ビデオフレームは畳み込みモデルとLSTMモデルの組み合わせに入力され、時空間の特徴を抽出する。我々はこれらの数値的特徴と時空間的特徴を融合して、ストーキング事件を検出する分類器を構築する。また,様々な特徴映画やテレビシリーズから収集したストーキング映像と非ストーキング映像からなるデータセットを導入し,モデルを訓練する。実験結果は,提案するストーカー検出システムの効率とダイナミズムを示し,最新手法と比較して89.58%の精度を達成し,大幅に改善した。 Criminal and suspicious activity detection has become a popular research topic in recent years. The rapid growth of computer vision technologies has had a crucial impact on solving this issue. However, physical stalking detection is still a less explored area despite the evolution of modern technology. Nowadays, stalking in public places has become a common occurrence with women being the most affected. Stalking is a visible action that usually occurs before any criminal activity begins as the stalker begins to follow, loiter, and stare at the victim before committing any criminal activity such as assault, kidnapping, rape, and so on. Therefore, it has become a necessity to detect stalking as all of these criminal activities can be stopped in the first place through stalking detection. 
In this research, we propose a novel deep learning-based hybrid fusion model to detect potential stalkers from a single video with a minimal number of frames. We extract multiple relevant features, such as facial landmarks, head pose estimation, and relative distance, as numerical values from video frames. This data is fed into a multilayer perceptron (MLP) to perform a classification task between a stalking and a non-stalking scenario. Simultaneously, the video frames are fed into a combination of convolutional and LSTM models to extract the spatio-temporal features. We use a fusion of these numerical and spatio-temporal features to build a classifier to detect stalking incidents. Additionally, we introduce a dataset consisting of stalking and non-stalking videos gathered from various feature films and television series, which is also used to train the model. The experimental results show the efficiency and dynamism of our proposed stalker detection system, achieving 89.58% testing accuracy with a significant improvement as compared to the state-of-the-art approaches.	翻訳日:2024-02-07 18:39:05 公開日:2024-02-05
# 散乱損失解析を用いた光マイクロナノファイバーのその場評価 In-situ characterization of optical micro/nano fibers using scattering loss analysis ( http://arxiv.org/abs/2402.03400v1 ) ライセンス: Link先を確認	Shashank Suman, Elaganuru Bashaiah, Resmi M, and Ramachandrarao Yalla	(参考訳) 光学マイクロナノファイバー(MNF)のその場特性を実験的に実証した。 MNF(試験繊維、TF)はマイクロファイバー(プローブ繊維、PF)上に配置され、様々なPFおよびTF径での散乱損失をシミュレーションする。 TFは化学エッチング技術を用いて製造される。 PFは従来の単一モードファイバであり、外径は125 µmである。 TF軸沿いの散乱損失は,PF上に設置することで,様々な位置,すなわち直径で測定する。測定された散乱損失からTFの直径分布を推定し,その表面形態測定と相関した。本研究は、マイクロナノファイバー(OMNFs)のその場評価のための有効で低コストで非破壊的な方法を示す。 OMNFの表面の不規則性を検知し、決定することができる。また、局所的エバネッセント場を定量化するためにも用いられる。このような局所点の検出は、様々なセンシングおよび関連する研究領域においてこれらの場を用いて行われる研究を改善することができる。実装は簡単で、あらゆる分野の研究者が利用できる。 We experimentally demonstrate the in-situ characterization of optical micro/nano fibers (MNFs). The MNF (test fiber, TF) is positioned on a microfiber (probe fiber, PF) and simulated for the scattering loss at various PF and TF diameters. The TF is fabricated using chemical etching technique. The PF is a conventional single-mode fiber with an outer diameter of 125 µm. We measure the scattering loss along the TF axis at various positions, i.e., diameters, by mounting it on the PF. The diameter profile of the TF is inferred from the measured scattering loss and correlated with its surface morphology measurement. This work demonstrates an effective, low-cost, and non-destructive method for in-situ characterization of fabricated micro/nano fibers (OMNFs). It can detect and determine the irregularities on the surface of OMNFs. It can also be used to quantify the local evanescent field. Detecting such local points can improve studies that are carried out using these fields in various sensing and related study domains. It is simple to implement and can be accessed by all domains of researchers.	翻訳日:2024-02-07 18:38:33 公開日:2024-02-05
# 画像復元モデルにおけるRGB色表現の再考 Rethinking RGB Color Representation for Image Restoration Models ( http://arxiv.org/abs/2402.03399v1 ) ライセンス: Link先を確認	Jaerin Lee, JoonKyu Park, Sungyong Baik and Kyoung Mu Lee	(参考訳) 画像復元モデルは、典型的にはrgb色表現空間上で定義された画素間距離損失で訓練されるが、これは復元された画像のぼやけた非現実的なテクスチャの源としてよく知られている。その理由は、3チャンネルのRGB空間が復元モデルの監視に不十分であるからである。この目的のために,色情報と画素粒度を損なうことなく,各画素の局所的な近傍の構造情報を保持する表現を補強する。その結果、拡張RGB(a$RGB)空間と呼ばれる新しい表現空間が誕生した。画素毎損失の基底表現空間を構成することにより、画像復元モデルのトレーニングが容易になり、評価フェーズに影響を与えずに性能が向上する。特に、敵対的・知覚的損失などの補助的目的と組み合わせることで、我々の$a$RGB空間は、従来の知覚・歪曲トレードオフを克服し、色と局所構造の両方を再構築することで、全体的なメトリクスを一貫して改善します。 Image restoration models are typically trained with a pixel-wise distance loss defined over the RGB color representation space, which is well known to be a source of blurry and unrealistic textures in the restored images. The reason, we believe, is that the three-channel RGB space is insufficient for supervising the restoration models. To this end, we augment the representation to hold structural information of local neighborhoods at each pixel while keeping the color information and pixel-grainedness unharmed. The result is a new representation space, dubbed augmented RGB ($a$RGB) space. Substituting the underlying representation space for the per-pixel losses facilitates the training of image restoration models, thereby improving the performance without affecting the evaluation phase. Notably, when combined with auxiliary objectives such as adversarial or perceptual losses, our $a$RGB space consistently improves overall metrics by reconstructing both color and local structures, overcoming the conventional perception-distortion trade-off.	翻訳日:2024-02-07 18:38:18 公開日:2024-02-05
# Deep Nonlinear Hyperspectral Unmixing Using Multi-task Learning ( http://arxiv.org/abs/2402.03398v1 ) License: see link	Saeid Mehrdad, Seyed AmirHossein Janani	Nonlinear hyperspectral unmixing has recently received considerable attention, as linear mixture models do not lead to an acceptable resolution in some problems. In fact, most nonlinear unmixing methods are designed by making specific assumptions about the nonlinearity model, which subsequently limits the unmixing performance. In this paper, we propose an unsupervised nonlinear unmixing approach based on deep learning by incorporating a general nonlinear model with no special assumptions. This model consists of two branches. In the first branch, endmembers are learned by reconstructing the rows of hyperspectral images using some hidden layers, and in the second branch, abundance values are learned based on the columns of respective images. Then, using multi-task learning, we introduce an auxiliary task to enforce the two branches to work together. This technique can be considered as a regularizer mitigating overfitting, which improves the performance of the total network. Extensive experiments on synthetic and real data verify the effectiveness of the proposed method compared to some state-of-the-art hyperspectral unmixing methods.	Translated: 2024-02-07 18:38:00 Published: 2024-02-05
# Entanglement asymmetry in CFT and its relation to non-topological defects ( http://arxiv.org/abs/2402.03446v1 ) License: see link	Michele Fossati, Filiberto Ares, Jerome Dubail, Pasquale Calabrese	The entanglement asymmetry is an information-based observable that quantifies the degree of symmetry breaking in a region of an extended quantum system. We investigate this measure in the ground state of one-dimensional critical systems described by a CFT. Employing the correspondence between global symmetries and defects, the analysis of the entanglement asymmetry can be formulated in terms of partition functions on Riemann surfaces with multiple non-topological defect lines inserted at their branch cuts. For large subsystems, these partition functions are determined by the scaling dimension of the defects. This leads to our first main observation: at criticality, the entanglement asymmetry acquires a subleading contribution scaling as $\log \ell / \ell$ for large subsystem length $\ell$. Then, as an illustrative example, we consider the XY spin chain, which has a critical line described by the massless Majorana fermion theory and explicitly breaks the $U(1)$ symmetry associated with rotations about the $z$-axis. In this situation the corresponding defect is marginal. 
Leveraging conformal invariance, we relate the scaling dimension of these defects to the ground state energy of the massless Majorana fermion on a circle with equally-spaced point defects. We exploit this mapping to derive our second main result: the exact expression for the scaling dimension associated with $n$ defects of arbitrary strengths. Our result generalizes a known formula for the $n=1$ case derived in several previous works. We then use this exact scaling dimension to derive our third main result: the exact prefactor of the $\log \ell/\ell$ term in the asymmetry of the critical XY chain.	Translated: 2024-02-07 18:27:04 Published: 2024-02-05
# Denoising Diffusion via Image-Based Rendering ( http://arxiv.org/abs/2402.03445v1 ) License: see link	Titas Anciukevicius, Fabian Manhardt, Federico Tombari, Paul Henderson	Generating 3D scenes is a challenging open problem, which requires synthesizing plausible content that is fully consistent in 3D space. While recent methods such as neural radiance fields excel at view synthesis and 3D reconstruction, they cannot synthesize plausible details in unobserved regions since they lack a generative capability. Conversely, existing generative methods are typically not capable of reconstructing detailed, large-scale scenes in the wild, as they use limited-capacity 3D scene representations, require aligned camera poses, or rely on additional regularizers. In this work, we introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes. To achieve this, we make three contributions. First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes, dynamically allocating more capacity as needed to capture details visible in each image. 
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images without the need for any additional supervision signal such as masks or depths. This supports 3D reconstruction and generation in a unified architecture. Third, we develop a principled approach to avoid trivial 3D solutions when integrating the image-based rendering with the diffusion model, by dropping out representations of some images. We evaluate the model on several challenging datasets of real and synthetic images, and demonstrate superior results on generation, novel view synthesis and 3D reconstruction.	Translated: 2024-02-07 18:26:38 Published: 2024-02-05
# Nonlocal growth of quantum conditional mutual information under decoherence ( http://arxiv.org/abs/2402.03439v1 ) License: see link	Yifan Zhang, Sarang Gopalakrishnan	Local measurements cannot create entanglement, but they can convert short-range entanglement to long-range entanglement, as in quantum teleportation. This phenomenon of measurement-induced entanglement (MIE) has been widely discussed in recent work on measurement-induced entanglement phase transitions and related phenomena. Here, we situate MIE in a broader context of the growth of long-range conditional mutual information (CMI) under decoherence. We upper-bound the rate at which decoherence can generate long-range CMI, and derive a characterization of states that saturate this bound. We point out that the structure of states saturating the CMI upper bound can be very different under different decoherent dynamics and provide explicit examples. We additionally explore the dynamics of CMI in random quantum circuits subject to random local decoherence, as a function of circuit depth. We argue that the universality class of the finite-depth teleportation transition, as well as its lower critical dimension, are different for erasures than for measurements.	Translated: 2024-02-07 18:26:09 Published: 2024-02-05
# Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach ( http://arxiv.org/abs/2402.03435v1 ) License: see link	Sergi Blanco-Cuaresma	This study explores the use of Large Language Models (LLMs) to analyze text comments from Reddit users, aiming to achieve two primary objectives: firstly, to pinpoint critical excerpts that support a predefined psychological assessment of suicidal risk; and secondly, to summarize the material to substantiate the preassigned suicidal risk level. The work is restricted to the use of "open-source" LLMs that can be run locally, thereby enhancing data privacy. Furthermore, it prioritizes models with low computational requirements, making it accessible to both individuals and institutions operating on limited computing budgets. The implemented strategy relies only on a carefully crafted prompt and a grammar to guide the LLM's text completion. Despite its simplicity, the evaluation metrics show outstanding results, making it a valuable privacy-focused and cost-effective approach. This work is part of the Computational Linguistics and Clinical Psychology (CLPsych) 2024 shared task.	Translated: 2024-02-07 18:25:51 Published: 2024-02-05
# Quantum mechanical bootstrap on the interval: obtaining the exact spectrum ( http://arxiv.org/abs/2402.03434v1 ) License: see link	Lewis Sword, David Vegh	We show that for a particular model, the quantum mechanical bootstrap is capable of finding exact results. We consider a solvable system with Hamiltonian $H=SZ(1-Z)S$, where $Z$ and $S$ satisfy canonical commutation relations. While this model may appear unusual, using an appropriate coordinate transformation, the Schrödinger equation can be cast into a standard form with a Pöschl-Teller-type potential. Since the system is defined on an interval, it is well-known that $S$ is not self-adjoint. Nevertheless, the bootstrap method can still be implemented, producing an infinite set of positivity constraints. Using a certain operator ordering, the energy eigenvalues are only constrained into bands. With an alternative ordering, however, we find that a finite number of constraints is sufficient to fix the low-lying energy levels exactly.	Translated: 2024-02-07 18:25:33 Published: 2024-02-05
# Sequential Adiabatic Generation of Chiral Topological States ( http://arxiv.org/abs/2402.03433v1 ) License: see link	Xie Chen, Michael Hermele, David T. Stephen	In previous work, it was shown that non-trivial gapped states can be generated from a product state using a sequential quantum circuit. Explicit circuit constructions were given for a variety of gapped states at exactly solvable fixed points. In this paper, we show that a similar generation procedure can be established for chiral topological states as well, despite the fact that they lack an exactly solvable form. Instead of sequentially applying local unitary gates, we sequentially evolve the Hamiltonian by changing local terms in one subregion and then the next. The Hamiltonian remains gapped throughout the process, giving rise to an adiabatic evolution mapping the ground state from a product state to a chiral topological state. We demonstrate such a sequential adiabatic generation process for free fermion chiral states like the Chern insulator and the $p+ip$ superconductor. Moreover, we show that coupling a quantum state to a discrete gauge group can be achieved through a sequential quantum circuit, thereby generating interacting chiral topological states from the free fermion ones.	Translated: 2024-02-07 18:25:20 Published: 2024-02-05
# Cryptographic Censorship ( http://arxiv.org/abs/2402.03425v1 ) License: see link	Netta Engelhardt, Åsmund Folkestad, Adam Levine, Evita Verheijden, Lisa Yang	We formulate and take two large strides towards proving a quantum version of the weak cosmic censorship conjecture. We first prove "Cryptographic Censorship": a theorem showing that when the time evolution operator of a holographic CFT is approximately pseudorandom (or Haar random) on some code subspace, then there must be an event horizon in the corresponding bulk dual. This result provides a general condition that guarantees (in finite time) event horizon formation, with minimal assumptions about the global spacetime structure. Our theorem relies on an extension of a recent quantum learning no-go theorem and is proved using new techniques of pseudorandom measure concentration. To apply this result to cosmic censorship, we separate singularities into classical, semi-Planckian, and Planckian types. We illustrate that classical and semi-Planckian singularities are compatible with approximately pseudorandom CFT time evolution; thus, if such singularities are indeed approximately pseudorandom, by Cryptographic Censorship, they cannot exist in the absence of event horizons. 
This result provides a sufficient condition guaranteeing that seminal holographic results on quantum chaos and thermalization, whose general applicability relies on typicality of horizons, will not be invalidated by the formation of naked singularities in AdS/CFT.	Translated: 2024-02-07 18:25:01 Published: 2024-02-05
# Coherent collisional decoherence ( http://arxiv.org/abs/2402.03421v1 ) License: see link	Leonardo Badurina, Clara Murgui, Ryan Plestid	We study the decoherence of a system of $N$ non-interacting heavy particles (atoms) due to coherent scattering with a background gas. We introduce a new framework for computing the induced phase shift and loss of contrast for arbitrary preparations of $N$-particle quantum states. We find new phase shifts that are inherently $(N\geq 2)$-body effects and may be searched for in future experiments. We analyze simple setups, including a two-mode approximation of an interferometer. We study fully entangled $N00N$ states, which resemble the correlated positions in a matter interferometer, as well as totally uncorrelated product states that are representative of a typical state in an atom interferometer. We find that the extent to which coherent enhancements increase the rate of decoherence depends on the observable of interest, state preparation, and details of the experimental design.	Translated: 2024-02-07 18:24:33 Published: 2024-02-05
# An end-to-end deep learning pipeline to derive blood input with partial volume corrections for automated parametric brain PET mapping ( http://arxiv.org/abs/2402.03414v1 ) License: see link	Rugved Chavan, Gabriel Hyman, Zoraiz Qureshi, Nivetha Jayakumar, William Terrell, Stuart Berr, David Schiff, Megan Wardius, Nathan Fountain, Thomas Muttikkal, Mark Quigg, Miaomiao Zhang, Bijoy Kundu	Dynamic 2-[18F] fluoro-2-deoxy-D-glucose positron emission tomography (dFDG-PET) for human brain imaging has considerable clinical potential, yet its utilization remains limited. A key challenge in the quantitative analysis of dFDG-PET is characterizing a patient-specific blood input function, traditionally reliant on invasive arterial blood sampling. This research introduces a novel approach employing non-invasive deep learning model-based computations from the internal carotid arteries (ICA) with partial volume (PV) corrections, thereby eliminating the need for invasive arterial sampling. 
We present an end-to-end pipeline incorporating a 3D U-Net based ICA-net for ICA segmentation, alongside a Recurrent Neural Network (RNN) based MCIF-net for the derivation of a model-corrected blood input function (MCIF) with PV corrections. The developed 3D U-Net and RNN were trained and validated using a 5-fold cross-validation approach on 50 human brain FDG PET datasets. The ICA-net achieved an average Dice score of 82.18% and an Intersection over Union of 68.54% across all tested scans. Furthermore, the MCIF-net exhibited a minimal root mean squared error of 0.0052. The application of this pipeline to ground truth data for dFDG-PET brain scans resulted in the precise localization of seizure onset regions, which contributed to a successful clinical outcome, with the patient achieving a seizure-free state after treatment. These results underscore the efficacy of the ICA-net and MCIF-net deep learning pipeline in learning the ICA structure's distribution and automating MCIF computation with PV corrections. This advancement marks a significant leap in non-invasive neuroimaging.	Translated: 2024-02-07 18:24:20 Published: 2024-02-05
# Perceptual Video Quality Assessment: A Survey ( http://arxiv.org/abs/2402.03413v1 ) License: see link	Xiongkuo Min, Huiyu Duan, Wei Sun, Yucheng Zhu, Guangtao Zhai	Perceptual video quality assessment plays a vital role in the field of video processing due to the existence of quality degradations introduced in various stages of video signal acquisition, compression, transmission and display. With the advancement of internet communication and cloud service technology, video content and traffic are growing exponentially, which further emphasizes the requirement for accurate and rapid assessment of video quality. Therefore, numerous subjective and objective video quality assessment studies have been conducted over the past two decades for both generic videos and specific videos such as streaming, user-generated content (UGC), 3D, virtual and augmented reality (VR and AR), high frame rate (HFR), audio-visual, etc. This survey provides an up-to-date and comprehensive review of these video quality assessment studies. Specifically, we first review the subjective video quality assessment methodologies and databases, which are necessary for validating the performance of video quality metrics. Second, the objective video quality assessment algorithms for general purposes are surveyed and summarized according to the methodologies utilized in the quality measures. 
Third, we overview the objective video quality assessment measures for specific applications and emerging topics. Finally, the performances of the state-of-the-art video quality assessment measures are compared and analyzed. This survey provides a systematic overview of both classical works and recent progress in the realm of video quality assessment, which can help other researchers quickly access the field and conduct relevant research.	Translated: 2024-02-07 18:23:44 Published: 2024-02-05
# See More Details: Efficient Image Super-Resolution by Experts Mining ( http://arxiv.org/abs/2402.03412v1 ) License: see link	Eduard Zamfir and Zongwei Wu and Nancy Mehta and Yulun Zhang and Radu Timofte	Reconstructing high-resolution (HR) images from low-resolution (LR) inputs poses a significant challenge in image super-resolution (SR). While recent approaches have demonstrated the efficacy of intricate operations customized for various objectives, the straightforward stacking of these disparate operations can result in a substantial computational burden, hampering their practical utility. In response, we introduce SeemoRe, an efficient SR model employing expert mining. Our approach strategically incorporates experts at different levels, adopting a collaborative methodology. At the macro scale, our experts address rank-wise and spatial-wise informative features, providing a holistic understanding. Subsequently, the model delves into the subtleties of rank choice by leveraging a mixture of low-rank experts. By tapping into experts specialized in distinct key factors crucial for accurate SR, our model excels in uncovering intricate intra-feature details. This collaborative approach is reminiscent of the concept of "see more", allowing our model to achieve an optimal performance with minimal computational costs in efficient settings. 
The source code will be made publicly available at https://github.com/eduardzamfir/seemoredetails	Translated: 2024-02-07 18:23:22 Published: 2024-02-05
# A Survey on Effective Invocation Methods of Massive LLM Services ( http://arxiv.org/abs/2402.03408v1 ) License: see link	Can Wang, Bolin Zhang, Dianbo Sui, Zhiying Tum, Xiaoyu Liu and Jiabao Kang	Language models as a service (LMaaS) enable users to accomplish tasks without requiring specialized knowledge, simply by paying a service provider. However, numerous providers offer massive large language model (LLM) services with variations in latency, performance, and pricing. Consequently, constructing a cost-saving LLM service invocation strategy that delivers low-latency, high-performance responses and meets specific task demands becomes a pressing challenge. This paper provides a comprehensive overview of LLM service invocation methods. Technically, we give a formal definition of the problem of constructing an effective invocation strategy in LMaaS and present the LLM services invocation framework. The framework classifies existing methods into four different components, including input abstract, semantic cache, solution design, and output enhancement, which can be freely combined with each other. Finally, we emphasize the open challenges that have not yet been well addressed in this task and shed light on future research.	Translated: 2024-02-07 18:23:04 Published: 2024-02-05
# Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations ( http://arxiv.org/abs/2402.03407v1 ) License: see link	Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba	Large Language Models (LLMs) are one of the most promising technologies for the next era of speech generation systems, due to their scalability and in-context learning capabilities. Nevertheless, they suffer from multiple stability issues at inference time, such as hallucinations, content skipping or speech repetitions. In this work, we introduce a new self-supervised Voice Conversion (VC) architecture which can be used to learn to encode transitory features, such as content, separately from stationary ones, such as speaker ID or recording conditions, creating speaker-disentangled representations. Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech only from the text, similarly to humans, while the speaker identity is provided by the decoder of the VC model. Results show that LLMs trained over speaker-disentangled self-supervised representations provide an improvement of 4.7pp in speaker similarity over SOTA entangled representations, and a word error rate (WER) 5.4pp lower. 
Furthermore, they achieve higher naturalness than human recordings of the LibriTTS test-other dataset. Finally, we show that using explicit reference embedding negatively impacts intelligibility (stability), with WER increasing by 14pp compared to the model that only uses text to infer the style.	Translated: 2024-02-07 18:22:49 Published: 2024-02-05
# Recollections about Växjö conferences. Preface to the special issue "Quantum Information and Probability: from Foundations to Engineering" (QIP23) ( http://arxiv.org/abs/2402.03402v1 ) License: see link	Andrei Khrennikov	As the preface to the special issue for the conference "Quantum Information and Probability: from Foundations to Engineering" (QIP23), I wrote these notes with recollections about the Växjö conferences. These conferences covered 25 years of my life (2000-24) and played a crucial role in the evolution of my own views on the basic problems of quantum foundations. I hope to continue this conference series as long as possible. To my understanding, this is the longest conference series on quantum foundations in the history of quantum physics. These notes contain my recollections of conversations with the world's leading experts on quantum foundations. I think that such notes may have historical value. My own views on quantum foundations are specific and they evolved essentially during 25 years. Finally, I discovered the practically forgotten pathway in physical foundations developed by von Helmholtz, Hertz, Boltzmann, and Schrödinger and known as the Bild conception. A scientific theory is a combination of two models, observational and causal. Coupling between these models can be tricky. 
A causal model can operate with hidden quantities which can't be identified with observables. From this viewpoint, Bell's coupling of subquantum and quantum models is very special and the violation of the Bell inequalities doesn't close (local) ways beyond quantum mechanics.	Translated: 2024-02-07 18:22:21 Published: 2024-02-05
# Attention Meets Post-hoc Interpretability: A Mathematical Perspective ( http://arxiv.org/abs/2402.03485v1 ) License: see link	Gianluigi Lopardo, Frederic Precioso, Damien Garreau	Attention-based architectures, in particular transformers, are at the heart of a technological revolution. Interestingly, in addition to helping obtain state-of-the-art results on a wide range of applications, the attention mechanism intrinsically provides meaningful insights on the internal behavior of the model. Can these insights be used as explanations? Debate rages on. In this paper, we mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations. We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights.	Translated: 2024-02-07 18:16:23 Published: 2024-02-05
# テキスト分類のためのアラビア語同義語BERTベース敵対的サンプル Arabic Synonym BERT-based Adversarial Examples for Text Classification ( http://arxiv.org/abs/2402.03477v1 ) ライセンス: Link先を確認	Norah Alshahrani, Saied Alshahrani, Esma Wali, Jeanna Matthews	(参考訳) テキスト分類システムは、敵対的なテキスト例に弱いことが証明されており、元のテキスト例の修正版は、しばしば人間の目に気付かれず、テキスト分類モデルにそれらの分類を変更するよう強制することができる。しばしば、敵対的テキスト攻撃の影響を定量化する研究は、英語で訓練されたモデルにのみ適用されている。本稿では,アラビア語における敵対的攻撃に関する最初の単語レベル研究を紹介する。具体的には、アラビア語の敵対的攻撃に対する最先端のテキスト分類モデルの堅牢性を評価するために、ブラックボックス設定のBERTモデルを用いたマスク言語モデリング(MLM)タスクを用いた同義語(単語レベル)攻撃を用いる。同義語BERTに基づく攻撃を用いて新たに生成した敵対的サンプルの文法的・意味的類似性を評価するために,4人の人間の評価者を招き,生成された敵対的サンプルを元の例と比較した。また,新たに生成したアラビア語の敵対的サンプルの様々なモデルへの転移可能性について検討し,BERTモデルにおけるこれらの敵対的サンプルに対する防御機構の有効性について検討した。ファインチューニングされたBERTモデルは、私たちが訓練したWordCNNやWordLSTMのような他のディープニューラルネットワーク(DNN)モデルよりも、私たちの同義語攻撃の影響を受けやすいことが分かりました。また、ファインチューニングされたBERTモデルの方が転移攻撃の影響をより受けやすいことも判明した。最後に,敵対的訓練を初期防御機構として適用した後,ファインチューニングしたBERTモデルにおいて,少なくとも2%の精度回復が得られた。 Text classification systems have been proven vulnerable to adversarial text examples, modified versions of the original text examples that are often unnoticed by human eyes, yet can force text classification models to alter their classification. Often, research works quantifying the impact of adversarial text attacks have been applied only to models trained in English. In this paper, we introduce the first word-level study of adversarial attacks in Arabic. Specifically, we use a synonym (word-level) attack using a Masked Language Modeling (MLM) task with a BERT model in a black-box setting to assess the robustness of the state-of-the-art text classification models to adversarial attacks in Arabic. To evaluate the grammatical and semantic similarities of the newly produced adversarial examples using our synonym BERT-based attack, we invite four human evaluators to assess and compare the produced adversarial examples with their original examples. 
We also study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms against these adversarial examples on the BERT models. We find that fine-tuned BERT models were more susceptible to our synonym attacks than the other Deep Neural Networks (DNN) models like WordCNN and WordLSTM we trained. We also find that fine-tuned BERT models were more susceptible to transferred attacks. We, lastly, find that fine-tuned BERT models successfully regain at least 2% in accuracy after applying adversarial training as an initial defense mechanism.	翻訳日:2024-02-07 18:16:11 公開日:2024-02-05
# スペクトル拡散後方サンプリングを用いたCT材料分解 CT Material Decomposition using Spectral Diffusion Posterior Sampling ( http://arxiv.org/abs/2402.03476v1 ) ライセンス: Link先を確認	Xiao Jiang, Grace J. Gang, J. Webster Stayman	(参考訳) 本研究では,スペクトルCT測定から材料分解を行うための,拡散後方サンプリング(DPS)に基づく新しい深層学習手法を提案する。このアプローチは、教師なしトレーニングからの洗練された事前知識と、測定の厳密な物理モデルを組み合わせる。逆過程に必要な時間ステップ数を減らすジャンプスタート処理と、計算コストを削減する勾配近似を用いた、より高速で安定した変種を提案する。 2つの分光CTシステム(dual-kVpとdual-layer detector CT)の性能について検討した。どちらのシステムでも、DPSはモデルベース材料分解(MBMD)で用いられる反復回数のわずか10%で、高い構造類似度指標(SSIM)を達成する。 Jumpstarted DPS (JSDPS) はさらに計算時間を 85% 以上削減し、最高精度、最低不確実性、そして従来の DPS や MBMD よりも低い計算コストを達成する。その結果, スペクトルCTデータに基づく比較的高速かつ高精度な材料分解を実現するJSDPSの可能性が示された。 In this work, we introduce a new deep learning approach based on diffusion posterior sampling (DPS) to perform material decomposition from spectral CT measurements. This approach combines sophisticated prior knowledge from unsupervised training with a rigorous physical model of the measurements. A faster and more stable variant is proposed that uses a jumpstarted process to reduce the number of time steps required in the reverse process and a gradient approximation to reduce the computational cost. Performance is investigated for two spectral CT systems: dual-kVp and dual-layer detector CT. On both systems, DPS achieves a high Structural Similarity Index Measure (SSIM) with only 10% of the iterations used in model-based material decomposition (MBMD). Jumpstarted DPS (JSDPS) further reduces computational time by over 85% and achieves the highest accuracy, the lowest uncertainty, and the lowest computational costs compared to classic DPS and MBMD. The results demonstrate the potential of JSDPS for providing relatively fast and accurate material decomposition based on spectral CT data.	翻訳日:2024-02-07 18:15:30 公開日:2024-02-05
# Sliding Window Multivariate Time Series Forest Classifier を用いた活動領域に基づくフレア予測 Active Region-based Flare Forecasting with Sliding Window Multivariate Time Series Forest Classifiers ( http://arxiv.org/abs/2402.03474v1 ) ライセンス: Link先を確認	Anli Ji and Berkay Aydin	(参考訳) 過去数十年間、物理学に基づくシミュレーションやデータ駆動技術(機械学習やディープラーニングを含む)の多くの応用が太陽フレアの分析と予測に現れてきた。これらのアプローチは、太陽フレアのダイナミクスを理解する上で重要であり、主にこれらの事象を予測し、地球に対する潜在的なリスクを最小限に抑えることを目的としている。現在の手法は大きな進歩を遂げているが、これらのデータ駆動アプローチには制限がある。顕著な欠点の1つは、これらのフレアの起源である活動領域における時間的進化特性に対する考慮の欠如である。これにより、これらの手法が高次元の活動領域の特徴間の関係を把握できなくなり、運用におけるユーザビリティが制限される。本研究は,多変量時系列の解釈可能な分類器の開発と,スライディングウィンドウに基づくサブインターバルランキングを用いた特徴ランキング手法の実証に焦点を当てた。私たちの研究の主な貢献は、高次元データに使用される複雑で理解しにくいブラックボックスモデルと、特に太陽フレア予測の文脈において、多変量時系列から関連するサブインターバルを探索することとのギャップを埋めることです。以上の結果から, スライディングウィンドウ時系列フォレスト分類器は, 太陽フレア予測において有効に機能し(真のスキル統計値が85%以上) , また, 与えられた学習課題において最も重要な特徴とサブインターバルを指摘できた。 Over the past few decades, many applications of physics-based simulations and data-driven techniques (including machine learning and deep learning) have emerged to analyze and predict solar flares. These approaches are pivotal in understanding the dynamics of solar flares, primarily aiming to forecast these events and minimize potential risks they may pose to Earth. Although current methods have made significant progress, there are still limitations to these data-driven approaches. One prominent drawback is the lack of consideration for the temporal evolution characteristics in the active regions from which these flares originate. This oversight hinders the ability of these methods to grasp the relationships between high-dimensional active region features, thereby limiting their usability in operations. This study centers on the development of interpretable classifiers for multivariate time series and the demonstration of a novel feature ranking method with sliding window-based sub-interval ranking. 
The primary contribution of our work is to bridge the gap between complex, less understandable black-box models used for high-dimensional data and the exploration of relevant sub-intervals from multivariate time series, specifically in the context of solar flare forecasting. Our findings demonstrate that our sliding-window time series forest classifier performs effectively in solar flare prediction (with a True Skill Statistic of over 85%) while also pinpointing the most crucial features and sub-intervals for a given learning task.	翻訳日:2024-02-07 18:15:01 公開日:2024-02-05
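The sliding-window sub-interval idea in the entry above can be sketched roughly as follows. This is a minimal illustration, not the authors' classifier: the function name `sliding_window_features`, the window and step sizes, and the choice of mean, standard deviation, and slope as per-interval summaries are all assumptions made for the example.

```python
import numpy as np

def sliding_window_features(series, window, step):
    """Summarize each sliding window of a multivariate time series.

    series: array of shape (n_channels, n_timesteps).
    Returns an array of shape (n_channels, n_windows, 3) holding the
    mean, standard deviation, and least-squares slope of each window.
    """
    n_channels, n_timesteps = series.shape
    t = np.arange(window)
    feats = []
    for start in range(0, n_timesteps - window + 1, step):
        seg = series[:, start:start + window]
        mean = seg.mean(axis=1)
        std = seg.std(axis=1)
        # polyfit fits one line per column, so pass time along axis 0
        slope = np.polyfit(t, seg.T, 1)[0]
        feats.append(np.stack([mean, std, slope], axis=1))
    return np.stack(feats, axis=1)

# Hypothetical two-channel series: a ramp and a constant signal
series = np.array([[0.0, 1, 2, 3, 4, 5], [1.0, 1, 1, 1, 1, 1]])
feats = sliding_window_features(series, window=4, step=2)
```

Flattening `feats` per sample would give a feature matrix that any tree ensemble could consume, and a feature's position in that matrix identifies both the channel and the sub-interval it came from.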
# 医用AI画像における見えない透かしの有効性の評価 Assessing the Efficacy of Invisible Watermarks in AI-Generated Medical Images ( http://arxiv.org/abs/2402.03473v1 ) ライセンス: Link先を確認	Xiaodan Xing, Huiyu Zhou, Yingying Fang, and Guang Yang	(参考訳) AIが生成する医療画像は、現実世界におけるデータ不足問題に対処する可能性から、人気が高まっている。しかし、これらの合成画像の正確な識別の問題、特に実際の画像と比べて顕著なリアリズムを示す場合、依然として懸念されている。この課題を軽減するため、DALLEやImagenのような画像生成装置は、合成画像の真正性の識別を容易にするデジタル透かしを統合した。これらの透かしは画像のピクセル内に埋め込まれており、検出性を維持しながら人間の目からは見えない。それにもかかわらず、これらの見えない透かしが合成医療画像の有用性に与える影響に関する包括的な調査は欠落している。本研究では,合成医用画像に目に見えない透かしを取り入れ,下流分類作業の文脈で有効性を評価することを提案する。私たちの目標は、合成医用画像の検出性の向上、倫理基準の強化、データ汚染と潜在的な詐欺に対する保護といった、このような透かしの存続可能性に関する議論の道を開くことです。 AI-generated medical images are gaining growing popularity due to their potential to address the data scarcity challenge in the real world. However, the issue of accurate identification of these synthetic images, particularly when they are remarkably realistic compared with their real counterparts, remains a concern. To mitigate this challenge, image generators such as DALLE and Imagen have integrated digital watermarks aimed at facilitating the discernment of synthetic images' authenticity. These watermarks are embedded within the image pixels and are invisible to the human eye while remaining detectable. Nevertheless, a comprehensive investigation into the potential impact of these invisible watermarks on the utility of synthetic medical images has been lacking. In this study, we propose the incorporation of invisible watermarks into synthetic medical images and seek to evaluate their efficacy in the context of downstream classification tasks. Our goal is to pave the way for discussions on the viability of such watermarks in boosting the detectability of synthetic medical images, fortifying ethical standards, and safeguarding against data pollution and potential scams.	翻訳日:2024-02-07 18:14:15 公開日:2024-02-05
# 物理符号化グラフニューラルネットワークによる接触変形予測 Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact ( http://arxiv.org/abs/2402.03466v1 ) ライセンス: Link先を確認	Mahdi Saleh, Michael Sommersperger, Nassir Navab, Federico Tombari	(参考訳) ロボット工学では、触覚相互作用における物体の変形を理解することが重要である。変形の正確な理解はロボットのシミュレーションを増大させ、様々な産業に広く影響する。本稿では,物理符号化グラフニューラルネットワーク(GNN)を用いた予測手法を提案する。ロボットの把握や操作のシナリオと同様に、変形可能なメッシュに外部力で接触する剛体メッシュ間のダイナミクスをモデル化することに注力する。我々のアプローチは,メッシュの物理的状態を保持するグラフ構造内の軟体と剛体の両方を表す。また、オブジェクト間の相互作用をキャプチャするクロスアテンション機構も組み込んでいます。幾何学と物理を共同で学習することで, 変形の一貫性, 詳細性を再構築する。コードとデータセットを公開して、ロボットシミュレーションと把握の研究を進めました。 In robotics, it's crucial to understand object deformation during tactile interactions. A precise understanding of deformation can elevate robotic simulations and have broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. Similar to robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We've made our code and dataset public to advance research in robotic simulation and grasping.	翻訳日:2024-02-07 18:11:54 公開日:2024-02-05
# 分散ニューラルネットワークによる次元の呪いを打ち破る Breaking the Curse of Dimensionality with Distributed Neural Computation ( http://arxiv.org/abs/2402.03460v1 ) ライセンス: Link先を確認	Haitz Sáez de Ocáriz Borde and Takashi Furuya and Anastasis Kratsios and Marc T. Law	(参考訳) 本稿では,複数のマシンに分散可能なニューラル計算アルゴリズムを用いて,次元の呪いを克服する理論的アプローチを提案する。モジュール型分散ディープラーニングパラダイムである「neural pathways」は,GPU VRAMに少数のパラメータをロードするだけで任意の精度を実現することができる。形式的には、任意の誤差レベル $\varepsilon>0$ と任意のリプシッツ関数 $f:[0,1]^n\to \mathbb{R}$ に対して、$[0,1]^n$ 上で $f$ を精度 $\varepsilon$ で一様に近似するニューラルパスウェイモデルを構築することができ、その際メモリにロードする必要があるのは $\mathcal{O}(\varepsilon^{-1})$ 個のパラメータを持つネットワークのみで、フォワードパス中にロードされるのは $\mathcal{O}(\varepsilon^{-1}\log(\varepsilon^{-1}))$ 個である。これにより、同じ精度を達成するために$\mathcal{O}(\varepsilon^{-n/2})$パラメータを必要とする従来の非分散ディープラーニングモデルであるReLU MLPの最適境界が改善される。次元の呪いを破る他の唯一の利用可能なディープラーニングモデルは、超表現的アクティベーション関数を持つMLPである。しかし、これらのモデルはニューラルパスウェイモデルと異なり、深さと幅を有界に制限しても無限のVC次元を持つことを実証する。これは後者のみが一般化することを意味する。分析は回帰と分類の両方のタスクで実験的に検証され,より大規模な集中型ベンチマークよりも優れた性能を示すことが示された。 We present a theoretical approach to overcome the curse of dimensionality using a neural computation algorithm which can be distributed across several machines. Our modular distributed deep learning paradigm, termed neural pathways, can achieve arbitrary accuracy while only loading a small number of parameters into GPU VRAM. Formally, we prove that for every error level $\varepsilon>0$ and every Lipschitz function $f:[0,1]^n\to \mathbb{R}$, one can construct a neural pathways model which uniformly approximates $f$ to $\varepsilon$ accuracy over $[0,1]^n$ while only requiring networks of $\mathcal{O}(\varepsilon^{-1})$ parameters to be loaded in memory and $\mathcal{O}(\varepsilon^{-1}\log(\varepsilon^{-1}))$ to be loaded during the forward pass. This improves the optimal bounds for traditional non-distributed deep learning models, namely ReLU MLPs, which need $\mathcal{O}(\varepsilon^{-n/2})$ parameters to achieve the same accuracy. 
The only other available deep learning model that breaks the curse of dimensionality is MLPs with super-expressive activation functions. However, we demonstrate that these models have an infinite VC dimension, even with bounded depth and width restrictions, unlike the neural pathways model. This implies that only the latter generalizes. Our analysis is validated experimentally in both regression and classification tasks, demonstrating that our model exhibits superior performance compared to larger centralized benchmarks.	翻訳日:2024-02-07 18:11:44 公開日:2024-02-05
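To give a feel for the gap between the bounds quoted in the entry above, here is a rough worked comparison. The values $\varepsilon = 10^{-2}$ and $n = 20$ are assumed purely for illustration, the logarithm is taken as the natural logarithm, and the constants hidden by $\mathcal{O}(\cdot)$ are ignored, so the figures indicate orders of magnitude only.

```latex
% Assumed illustrative values: \varepsilon = 10^{-2}, n = 20.
% Constants hidden by \mathcal{O} are ignored.
\begin{align*}
\varepsilon^{-1} &= 10^{2}, \\
\varepsilon^{-1}\log(\varepsilon^{-1}) &= 10^{2}\ln(10^{2}) \approx 4.6 \times 10^{2}, \\
\varepsilon^{-n/2} &= (10^{2})^{10} = 10^{20}.
\end{align*}
```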
# 説明可能なブースティングマシンを用いた効率的かつ解釈可能な交通方向予測 Efficient and Interpretable Traffic Destination Prediction using Explainable Boosting Machines ( http://arxiv.org/abs/2402.03457v1 ) ライセンス: Link先を確認	Yasin Yousif and Jörg Müller	(参考訳) 交通軌道予測のための正確なモデルの開発は、完全自動運転を達成するために不可欠である。この課題に対処するために、さまざまなディープニューラルネットワークモデルが採用されているが、ブラックボックスの性質は、デプロイされたシステムの透明性とデバッグ機能を妨げている。 glass-boxモデルは、一般化加法モデル(GAM)のようなメソッドを通じて完全な解釈性を提供することで、ソリューションを提供する。本研究では,3つの混在トラフィックデータセットである Stanford Drone Dataset (SDD), Intersection Drone Dataset (InD), Argoverse のトラフィック予測のための, 効率的な加法モデルである Explainable Boosting Machine (EBM) を評価する。その結果,EBMモデルはSDDおよびInDにおける歩行者の目的地予測で競争力のある性能を示す一方,車両が主体のArgoverseデータセットでは控えめな予測性能にとどまることが示された。さらに、我々の透明なトレーニングモデルは、特徴の重要性と相互作用を分析し、予測説明の質的な例を提供する。トレーニングコードの全文は公開時に公開されます。 Developing accurate models for traffic trajectory predictions is crucial for achieving fully autonomous driving. Various deep neural network models have been employed to address this challenge, but their black-box nature hinders transparency and debugging capabilities in a deployed system. Glass-box models offer a solution by providing full interpretability through methods like Generalized Additive Models (GAM). In this study, we evaluate an efficient additive model called Explainable Boosting Machines (EBM) for traffic prediction on three popular mixed traffic datasets: Stanford Drone Dataset (SDD), Intersection Drone Dataset (InD), and Argoverse. Our results show that the EBM models perform competitively in predicting pedestrian destinations within SDD and InD while providing modest predictions for vehicle-dominant Argoverse dataset. Additionally, our transparent trained models allow us to analyse feature importance and interactions, as well as provide qualitative examples of predictions explanation. The full training code will be made public upon publication.	翻訳日:2024-02-07 18:11:05 公開日:2024-02-05
# 自己教師型コントラスト学習のための制約付きマルチビュー表現 Constrained Multiview Representation for Self-supervised Contrastive Learning ( http://arxiv.org/abs/2402.03456v1 ) ライセンス: Link先を確認	Siyuan Dai, Kai Ye, Kun Zhao, Ge Cui, Haoteng Tang, Liang Zhan	(参考訳) 表現学習(representation learning)は、現代ディープラーニングのパラダイムにおける重要な基礎であり、潜在空間内の特徴を解明し、深層モデルを解釈するための導管を提供する。それにもかかわらず、解剖学的パターンの固有の複雑さと、医用画像分割における病変分布のランダムな性質は、表現の不連続と突出した特徴の理解に重大な課題をもたらす。相互情報の最大化、特にコントラスト学習の枠組みによって導かれる手法は、密接に絡み合った表現の分離において顕著な成功と優越性を示している。しかし, 比較学習の有効性は, 正と負のサンプルペアの品質に大きく依存する。つまり, 多視点間の非選択平均的相互情報が学習戦略を阻害し, 視点の選択が不可欠となる。本研究では,表現距離に基づく相互情報(mi)の最大化を前提とした新しい手法を提案する。さらに、表現選択のためのMI再分類戦略を導入し、連続MI推定と表現重要度距離測定の両立を図った。具体的には、周波数領域から抽出した多視点表現を活用し、各周波数間の相互情報に基づいてその意義を再評価し、意味理解を促進するための多面的コントラスト学習アプローチの促進を図る。 5つの指標に基づく統計的結果から,提案手法はMIの最大化による表現選択を十分に制約し,マルチビューのコントラッシブ学習プロセスを導出することを示す。 Representation learning constitutes a pivotal cornerstone in contemporary deep learning paradigms, offering a conduit to elucidate distinctive features within the latent space and interpret the deep models. Nevertheless, the inherent complexity of anatomical patterns and the random nature of lesion distribution in medical image segmentation pose significant challenges to the disentanglement of representations and the understanding of salient features. Methods guided by the maximization of mutual information, particularly within the framework of contrastive learning, have demonstrated remarkable success and superiority in decoupling densely intertwined representations. However, the effectiveness of contrastive learning highly depends on the quality of the positive and negative sample pairs, i.e. the unselected average mutual information among multi-views would obstruct the learning strategy so the selection of the views is vital. 
In this work, we introduce a novel approach predicated on representation distance-based mutual information (MI) maximization for measuring the significance of different views, aiming at conducting more efficient contrastive learning and representation disentanglement. Additionally, we introduce an MI re-ranking strategy for representation selection, benefiting both the continuous MI estimating and representation significance distance measuring. Specifically, we harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information across varying frequencies, thereby facilitating a multifaceted contrastive learning approach to bolster semantic comprehension. The statistical results under the five metrics demonstrate that our proposed framework proficiently constrains the MI maximization-driven representation selection and steers the multi-view contrastive learning process.	翻訳日:2024-02-07 18:10:48 公開日:2024-02-05
# ソーシャルネットワークにおけるリコメンデーションフェアネス Recommendation Fairness in Social Networks Over Time ( http://arxiv.org/abs/2402.03450v1 ) ライセンス: Link先を確認	Meng Cao, Hussain Hussain, Sandipan Sikdar, Denis Helic, Markus Strohmaier, Roman Kern	(参考訳) 社会的レコメンデーションシステムでは、推薦モデルは、性別や人種など、異なる人口集団に対して公平な可視性を提供することが重要である。既存の研究の多くは、時間とともに変化するネットワークの個々の静的スナップショットを調べるだけでこの問題に対処している。このギャップに対処するために,推奨フェアネスの経時的変化と動的ネットワーク特性との関係について検討する。本研究では,6つの推薦アルゴリズムのフェアネスを評価し,フェアネスとネットワーク特性の関係を時間とともに解析し,実世界の3つの動的ネットワークについて検討した。さらに,ネットワーク特性に対する介入が公平性にどのように影響するかを,代替進化の結果と異なるネットワーク特性の相反するシナリオについて検討した。実験結果から,提案手法によらず,推奨公正性は時間とともに向上することが示唆された。また,2つのネットワーク特性,マイノリティ比とホモフィリー比が,時間とともに公平性と安定な相関を示すことも見出した。我々の実証研究は、極端なホモフィリー比が、バランスの取れたマイノリティ比であっても不公平なレコメンデーションに寄与する可能性を示唆している。我々の研究は、社会科学における動的ネットワークにおける公正性の進化に関する洞察を提供する。我々は、システムオペレーターや政策立案者が、ソーシャルネットワークにおける公正をターゲットとした時間的変化や介入の影響をよりよく理解するのに役立つと信じている。 In social recommender systems, it is crucial that the recommendation models provide equitable visibility for different demographic groups, such as gender or race. Most existing research has addressed this problem by only studying individual static snapshots of networks that typically change over time. To address this gap, we study the evolution of recommendation fairness over time and its relation to dynamic network properties. We examine three real-world dynamic networks by evaluating the fairness of six recommendation algorithms and analyzing the association between fairness and network properties over time. We further study how interventions on network properties influence fairness by examining counterfactual scenarios with alternative evolution outcomes and differing network properties. Our results on empirical datasets suggest that recommendation fairness improves over time, regardless of the recommendation method. We also find that two network properties, minority ratio, and homophily ratio, exhibit stable correlations with fairness over time. 
Our counterfactual study further suggests that an extreme homophily ratio potentially contributes to unfair recommendations even with a balanced minority ratio. Our work provides insights into the evolution of fairness within dynamic networks in social science. We believe that our findings will help system operators and policymakers to better comprehend the implications of temporal changes and interventions targeting fairness in social networks.	翻訳日:2024-02-07 18:10:20 公開日:2024-02-05
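The two network properties named in the entry above can be computed along the following lines. This is only a sketch under assumed definitions: here the minority ratio is taken as the share of nodes in the smallest group and the homophily ratio as the share of edges joining same-group nodes. The paper may define both differently, and the function names and toy graph are hypothetical.

```python
def minority_ratio(groups):
    """Fraction of nodes that belong to the least common group (assumed definition)."""
    counts = {}
    for label in groups.values():
        counts[label] = counts.get(label, 0) + 1
    return min(counts.values()) / len(groups)

def homophily_ratio(edges, groups):
    """Fraction of edges whose endpoints share a group label (assumed definition)."""
    same = sum(1 for u, v in edges if groups[u] == groups[v])
    return same / len(edges)

# Hypothetical toy network: four nodes, one of them in the minority group
groups = {0: "a", 1: "a", 2: "a", 3: "b"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
m_ratio = minority_ratio(groups)
h_ratio = homophily_ratio(edges, groups)
```

Tracking both values across successive snapshots of a dynamic network is one simple way to relate them to a fairness metric over time.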
# 分散散発的フェデレーション学習:一般化収束保証を伴う統一方法論 Decentralized Sporadic Federated Learning: A Unified Methodology with Generalized Convergence Guarantees ( http://arxiv.org/abs/2402.03448v1 ) ライセンス: Link先を確認	Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton	(参考訳) 分散連合学習(DFL)は近年、クライアントがモデル更新とモデル集約の両方を行うという、重要な研究の注目を集めている。本研究では,両プロセスにおける散発性の概念を一般化し,現実的なDFL設定で表される異質性の異なる形態の影響をモデル化するDFL方法論である分散散発的フェデレートラーニング(Decentralized Sporadic Federated Learning)($\texttt{DSpodFL}$)を提案する。 $\texttt{DSpodFL}$は、分散勾配降下(DGD)、ランダム化ゴシップ(RG)、分散化フェデレーション平均化(DFedAvg)など、主要な分散最適化手法の多くを単一のモデリングフレームワークで統合する。我々は $\texttt{DSpodFL}$ の収束挙動を解析的に特徴づけ、幾何収束率を既存の研究よりも一般的な仮定の下で有限最適性ギャップに一致させることができることを示す。実験により、$\texttt{DSpodFL}$は、最先端技術と比較して、システムのパラメータの変化に対するトレーニング速度とロバスト性を大幅に改善することを示した。 Decentralized Federated Learning (DFL) has received significant recent research attention, capturing settings where both model updates and model aggregations -- the two key FL processes -- are conducted by the clients. In this work, we propose Decentralized Sporadic Federated Learning ($\texttt{DSpodFL}$), a DFL methodology which generalizes the notion of sporadicity in both of these processes, modeling the impact of different forms of heterogeneity that manifest in realistic DFL settings. $\texttt{DSpodFL}$ unifies many of the prominent decentralized optimization methods, e.g., distributed gradient descent (DGD), randomized gossip (RG), and decentralized federated averaging (DFedAvg), under a single modeling framework. We analytically characterize the convergence behavior of $\texttt{DSpodFL}$, showing, among other insights, that we can match a geometric convergence rate to a finite optimality gap under more general assumptions than in existing works. Through experiments, we demonstrate that $\texttt{DSpodFL}$ achieves significantly improved training speeds and robustness to variations in system parameters compared to the state-of-the-art.	翻訳日:2024-02-07 18:09:53 公開日:2024-02-05
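One way to picture sporadicity in both local updates and aggregations, as described in the entry above, is a round in which each link and each client is switched on or off by an indicator. The sketch below is an assumed simplification for illustration and not the authors' algorithm: `sporadic_step`, the mixing weights `W`, and the indicator arrays `link_on` and `grad_on` are hypothetical names.

```python
import numpy as np

def sporadic_step(x, grads, W, link_on, grad_on, lr):
    """One round of gossip averaging plus local gradient steps.

    x:        (n, d) current client models
    grads:    (n, d) local gradients evaluated at x
    W:        (n, n) symmetric mixing weights
    link_on:  (n, n) symmetric 0/1 indicators, 1 if the link is used this round
    grad_on:  (n,)   0/1 indicators, 1 if the client takes a gradient step
    """
    n = x.shape[0]
    new_x = x.copy()
    for i in range(n):
        for j in range(n):
            if i != j:
                # consensus term, applied only when the link is active
                new_x[i] += W[i, j] * link_on[i, j] * (x[j] - x[i])
        # local update, applied only when the client is active
        new_x[i] -= lr * grad_on[i] * grads[i]
    return new_x
```

Setting every indicator to 1 gives a plain gradient step with neighbor averaging, while setting `grad_on` to zero leaves pure gossip; drawing the indicators at random each round is what would make the scheme sporadic.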
# 相関下における可変重要度ランキングの課題 Challenges in Variable Importance Ranking Under Correlation ( http://arxiv.org/abs/2402.03447v1 ) ライセンス: Link先を確認	Annie Liang and Thomas Jemielita and Andy Liaw and Vladimir Svetnik and Lingkang Huang and Richard Baumgartner and Jason M. Klusowski	(参考訳) 可変重要性は、予測モデルの出力に対する因子の影響を測定するのに役立つため、解釈可能な機械学習において重要な役割を果たす。置換(または関連するアプローチ)による"null"機能の生成に基づくモデル非依存メソッドを適用することができる。このような分析は、木に基づくアンサンブルを含むブラックボックスモデルを解釈できるため、医薬品の用途でよく用いられる。しかし、変数重要度推定における大きな課題と重要な共同創設者は、機能間相関の存在である。近年, 条件付き予測影響 (CPI) と呼ばれる変動重要度尺度など, 特徴ノックオフを利用した限界変量の調整が提案されている。このようなアプローチの評価と評価が私たちの研究の焦点です。まず,可変重要度評価における特徴相関の影響を包括的シミュレーションにより検討する。次に,高い相関性を持つ特徴がノックオフ構成によってCPIに作用する限界を理論的に証明する。我々は、常にノックオフ変数とその対応する予測変数の間に相関が存在しないことを期待するが、相関が予測変数間の特定の相関しきい値を超えて線形に増加することを証明している。本研究は,高機能相関を扱う場合のフリーランチの欠如と,変数重要度推定における手法の背後にある有用性と限界を理解する必要性を強調する。 Variable importance plays a pivotal role in interpretable machine learning as it helps measure the impact of factors on the output of the prediction model. Model agnostic methods based on the generation of "null" features via permutation (or related approaches) can be applied. Such analysis is often utilized in pharmaceutical applications due to its ability to interpret black-box models, including tree-based ensembles. A major challenge and significant confounder in variable importance estimation however is the presence of between-feature correlation. Recently, several adjustments to marginal permutation utilizing feature knockoffs were proposed to address this issue, such as the variable importance measure known as conditional predictive impact (CPI). Assessment and evaluation of such approaches is the focus of our work. We first present a comprehensive simulation study investigating the impact of feature correlation on the assessment of variable importance. We then theoretically prove the limitation that highly correlated features pose for the CPI through the knockoff construction. 
While we expect that there is always no correlation between knockoff variables and its corresponding predictor variables, we prove that the correlation increases linearly beyond a certain correlation threshold between the predictor variables. Our findings emphasize the absence of free lunch when dealing with high feature correlation, as well as the necessity of understanding the utility and limitations behind methods in variable importance estimation.	翻訳日:2024-02-07 18:09:25 公開日:2024-02-05
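The marginal permutation approach mentioned in the entry above can be sketched as follows. This is a minimal illustration, not the CPI or knockoff procedure studied in the paper: the helper `permutation_importance`, the stand-in model `predict`, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, col, rng):
    """Increase in mean squared error after shuffling one feature column."""
    base = np.mean((predict(X) - y) ** 2)
    X_perm = X.copy()
    X_perm[:, col] = rng.permutation(X_perm[:, col])
    return np.mean((predict(X_perm) - y) ** 2) - base

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)   # strongly correlated with x1
X = np.column_stack([x1, x2])
y = x1

def predict(X):
    # Stand-in model that reads only column 0
    return X[:, 0]

imp = [permutation_importance(predict, X, y, c, rng) for c in (0, 1)]
```

Here the second feature gets zero importance even though it is almost as informative about `y` as the first, because the score reflects what the model uses rather than what the data contain. Shuffling one column of a correlated pair also produces feature combinations that never occur in the data, which is part of what conditional approaches try to avoid.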
# mantis shrimp: a multi-survey computer vision photometric redshift model に関する予備報告 Preliminary Report on Mantis Shrimp: a Multi-Survey Computer Vision Photometric Redshift Model ( http://arxiv.org/abs/2402.03535v1 ) ライセンス: Link先を確認	Andrew Engel, Gautham Narayan, Nell Byler	(参考訳) 大規模でパブリックでマルチモーダルな天文学データセットが利用可能であることは、ai科学と天文学の境界線にまたがる新しい研究を実行する機会を与える。光度赤偏移推定は天文学の確立されたサブフィールドである。先行研究によると、コンピュータビジョンモデルは通常カタログベースのモデルを上回るが、これらのモデルは複数の機器やセンサーからの画像を組み込む際にさらに複雑になる。本報告では,超紫外(galex),光学(panstarrs),赤外線(unwise)画像と融合する光度赤方偏移推定のためのマルチサーベイコンピュータビジョンモデルであるmantis shrimpの作成の進展について詳述する。我々は、ディープラーニングの解釈可能性診断を用いて、モデルが異なる入力からどのように情報を活用するかを測定する。我々は、CNNの振る舞いを解釈可能性の指標から推論し、特に銀河の性質に関する物理的に基底的な知識で結果をフレーミングする。 The availability of large, public, multi-modal astronomical datasets presents an opportunity to execute novel research that straddles the line between science of AI and science of astronomy. Photometric redshift estimation is a well-established subfield of astronomy. Prior works show that computer vision models typically outperform catalog-based models, but these models face additional complexities when incorporating images from more than one instrument or sensor. In this report, we detail our progress creating Mantis Shrimp, a multi-survey computer vision model for photometric redshift estimation that fuses ultra-violet (GALEX), optical (PanSTARRS), and infrared (UnWISE) imagery. We use deep learning interpretability diagnostics to measure how the model leverages information from the different inputs. We reason about the behavior of the CNNs from the interpretability metrics, specifically framing the result in terms of physically-grounded knowledge of galaxy properties.	翻訳日:2024-02-07 18:02:40 公開日:2024-02-05
# 衣服と背景の置換のためのインペインティング導入パイプライン An Inpainting-Infused Pipeline for Attire and Background Replacement ( http://arxiv.org/abs/2402.03501v1 ) ライセンス: Link先を確認	Felipe Rodrigues Perche-Mahlow and André Felipe-Zanella and William Alberto Cruz-Castañeda and Marcellus Amadeus	(参考訳) 近年、ジェネレーティブ・人工知能(GenAI)の進歩は変革的パラダイムシフトを引き起こし、様々な領域に大きな影響を与えている。本稿では,画像操作を重視したGenAIとコンピュータビジョンの高度な技術を活用し,統合的なアプローチを特に検討する。この手法は、深度推定、深度情報に基づくインペイントマスクの作成、LCM(Latent Consistency Models)と組み合わせてStable Diffusionを利用した背景の生成と置換、続く衣服の交換、インペインティングパイプラインによる美的変化の応用など、いくつかの段階を通じて展開されている。本研究で行った実験は,視覚的に魅力的なコンテンツを生み出す可能性を強調し,方法論の有効性を強調した。これらの高度な手法の収束により、ユーザーは個人の写真を入力し、インペイントマスクを手動で入力することなく、特定のプロンプトに基づいて衣服や背景を変更することができる。 In recent years, groundbreaking advancements in Generative Artificial Intelligence (GenAI) have triggered a transformative paradigm shift, significantly influencing various domains. In this work, we specifically explore an integrated approach, leveraging advanced techniques in GenAI and computer vision emphasizing image manipulation. The methodology unfolds through several stages, including depth estimation, the creation of inpaint masks based on depth information, the generation and replacement of backgrounds utilizing Stable Diffusion in conjunction with Latent Consistency Models (LCMs), and the subsequent replacement of clothes and application of aesthetic changes through an inpainting pipeline. Experiments conducted in this study underscore the methodology's efficacy, highlighting its potential to produce visually captivating content. The convergence of these advanced techniques allows users to input photographs of individuals and manipulate them to modify clothing and background based on specific prompts without manually inputting inpainting masks, effectively placing the subjects within the vast landscape of creative imagination.	翻訳日:2024-02-07 18:02:21 公開日:2024-02-05
# ハードウェアエラー下での量子アーキテクチャ探索のためのカリキュラム強化学習 Curriculum reinforcement learning for quantum architecture search under hardware errors ( http://arxiv.org/abs/2402.03500v1 ) ライセンス: Link先を確認	Yash J. Patel, Akash Kundu, Mateusz Ostaszewski, Xavier Bonet-Monroig, Vedran Dunjko, and Onur Danaci	(参考訳) ノイズの多い中間スケール量子時代の重要な課題は、現在のデバイス制限と互換性のある有用な回路を見つけることである。変分量子アルゴリズム(VQA)は、回路アーキテクチャを固定し、外部ループ内の個々のゲートパラメータを最適化することで潜在的な解を提供する。しかし、パラメータ最適化は難易度が高くなり、アルゴリズム全体の性能は最初に選択された回路アーキテクチャに大きく依存する。いくつかの量子アーキテクチャ探索(QAS)アルゴリズムが、有用な回路アーキテクチャを自動で設計するために開発された。パラメータ最適化のみの場合、ノイズ効果はオプティマイザの性能と最終結果に劇的に影響を与えることが観測されており、これは研究の要点である。しかし, 構造探索における雑音の影響は, 同様に重要であろうが, ほとんど理解されていない。本研究は,現実的なVQA展開における課題に対処するために,カリキュラムベースの強化学習QAS(CRLQAS)アルゴリズムを導入することで,このギャップに対処する。このアルゴリズムは、(i)可能な回路の探索空間を効率的に探索するための3次元アーキテクチャ符号化と環境ダイナミクスへの制約、(ii)エージェントをより短い回路の発見へ導くエピソード停止スキーム、および(iii)高速収束のための最適化器としての同時摂動確率近似の新しい変種を組み込んでいる。本研究では,パウリ・リウヴィル基底におけるパウリ転送行列形式を用いることにより,ノイズのある量子回路をシミュレーションする計算効率を大幅に向上させる,アルゴリズム用の最適化シミュレータを開発した。量子化学タスクに焦点をあてた数値実験により、CRLQASはノイズのない環境とノイズの多い環境の両方において、既存のQASアルゴリズムよりも優れていることを示した。 The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantum algorithms (VQAs) offer a potential solution by fixing the circuit architecture and optimizing individual gate parameters in an external loop. However, parameter optimization can become intractable, and the overall performance of the algorithm depends heavily on the initially chosen circuit architecture. Several quantum architecture search (QAS) algorithms have been developed to design useful circuit architectures automatically. In the case of parameter optimization alone, noise effects have been observed to dramatically influence the performance of the optimizer and final outcomes, which is a key line of study. However, the effects of noise on the architecture search, which could be just as critical, are poorly understood. 
This work addresses this gap by introducing a curriculum-based reinforcement learning QAS (CRLQAS) algorithm designed to tackle challenges in realistic VQA deployment. The algorithm incorporates (i) a 3D architecture encoding and restrictions on environment dynamics to explore the search space of possible circuits efficiently, (ii) an episode halting scheme to steer the agent to find shorter circuits, and (iii) a novel variant of simultaneous perturbation stochastic approximation as an optimizer for faster convergence. To facilitate studies, we developed an optimized simulator for our algorithm, significantly improving computational efficiency in simulating noisy quantum circuits by employing the Pauli-transfer matrix formalism in the Pauli-Liouville basis. Numerical experiments focusing on quantum chemistry tasks demonstrate that CRLQAS outperforms existing QAS algorithms across several metrics in both noiseless and noisy environments.	翻訳日:2024-02-07 18:02:02 公開日:2024-02-05
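The Pauli-transfer matrix formalism mentioned in the entry above can be illustrated for a single qubit. This is a small sketch of the standard definition, R[i, j] = Tr(P_i Λ(P_j)) / 2, and not the authors' simulator; the helper names and the choice of a Hadamard gate followed by depolarizing noise with p = 0.1 are assumptions made for the example.

```python
import numpy as np

# Single-qubit Pauli basis in the order I, X, Y, Z
PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_transfer_matrix(channel):
    """Return the 4x4 real matrix R with R[i, j] = Tr(P_i channel(P_j)) / 2."""
    R = np.zeros((4, 4))
    for i, Pi in enumerate(PAULIS):
        for j, Pj in enumerate(PAULIS):
            R[i, j] = np.real(np.trace(Pi @ channel(Pj))) / 2
    return R

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def hadamard(rho):
    # Unitary channel: rho -> H rho H^dagger
    return H @ rho @ H.conj().T

def depolarizing(p):
    # Depolarizing channel with error probability p
    return lambda rho: (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

R_h = pauli_transfer_matrix(hadamard)
R_dep = pauli_transfer_matrix(depolarizing(0.1))
# Noise applied after the gate corresponds to a matrix product of the two PTMs
R_noisy_h = R_dep @ R_h
```

In this representation composing channels reduces to multiplying real matrices, which is the property that makes the formalism convenient for simulating noisy circuits.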
# 適応的勾配法で正方形ルートを除去できるか? 2次展望 Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective ( http://arxiv.org/abs/2402.03496v1 ) ライセンス: Link先を確認	Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani	(参考訳) adam(w)のような適応勾配最適化は、トランスフォーマーのような多くのディープラーニングアーキテクチャのデフォルトのトレーニングアルゴリズムである。彼らの対角プレコンディショナーは、平方根を介してパラメータ更新に組み込まれた勾配外積に基づいている。これらの方法はしばしば近似二階法として動機づけられるが、平方根は基本的な差を表す。本研究では,根を取り除くと適応的手法の挙動がどう変化するか,すなわち2次動機づけの強化について検討する。驚くべきことに、このような二乗根なし適応法は畳み込みアーキテクチャの一般化ギャップをsgdに縮めつつ、トランスフォーマー上でのルートベースの対応式の性能を維持している。二階視点は、非対角プレコンディショナーを用いた適応法の開発にも実用的な利点がある。 shampooのようなルートベースとは対照的に、数値的に不安定な行列平方根は必要とせず、低精度でうまく機能する。これは適応的手法の成功に現在見過ごされている適応性の役割について重要な疑問を提起する。 Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter update via a square root. While these methods are often motivated as approximate second-order methods, the square root represents a fundamental difference. In this work, we investigate how the behavior of adaptive methods changes when we remove the root, i.e. strengthen their second-order motivation. Surprisingly, we find that such square-root-free adaptive methods close the generalization gap to SGD on convolutional architectures, while maintaining their root-based counterpart's performance on transformers. The second-order perspective also has practical benefits for the development of adaptive methods with non-diagonal preconditioner. In contrast to root-based counterparts like Shampoo, they do not require numerically unstable matrix square roots and therefore work well in low precision, which we demonstrate empirically. This raises important questions regarding the currently overlooked role of adaptivity for the success of adaptive methods.	翻訳日:2024-02-07 18:01:36 公開日:2024-02-05
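The difference between a root-based and a square-root-free diagonal update, as discussed in the entry above, can be sketched as follows. This is a minimal illustration under simplifying assumptions (no momentum, no bias correction, no weight decay); the function `adaptive_step` and its default hyperparameters are hypothetical and do not reproduce the authors' optimizer.

```python
import numpy as np

def adaptive_step(param, grad, v, lr=0.1, beta2=0.9, eps=1e-8, use_root=True):
    """One diagonal adaptive update.

    v is the running average of squared gradients. With use_root=True the
    gradient is divided by sqrt(v), as in RMSProp-style methods; with
    use_root=False it is divided by v itself.
    """
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    denom = np.sqrt(v) + eps if use_root else v + eps
    return param - lr * grad / denom, v

# Same gradient, two preconditioners (beta2=0 keeps only the current gradient)
p0, g0, v0 = np.array([1.0]), np.array([2.0]), np.zeros(1)
p_root, _ = adaptive_step(p0, g0, v0, beta2=0.0, use_root=True)
p_free, _ = adaptive_step(p0, g0, v0, beta2=0.0, use_root=False)
```

With a single gradient of 2.0 and `beta2=0`, the root-based step divides by 2 while the root-free step divides by 4, so the two variants scale the same gradient differently and would generally need different learning rates.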
# 部分的確率的無限大ベイズ型ニューラルネットワーク Partially Stochastic Infinitely Deep Bayesian Neural Networks ( http://arxiv.org/abs/2402.03495v1 ) ライセンス: Link先を確認	Sergio Calvo-Ordonez, Matthieu Meunier, Francesco Piatti, Yuantao Shi	(参考訳) 本稿では,部分的確率性を無限大ニューラルネットワークの枠組みに統合する新しいアーキテクチャ群である,部分的確率的無限大ベイズ型ニューラルネットワークについて述べる。新しいアーキテクチャのクラスは、トレーニングや推論時の計算効率に関する既存のアーキテクチャの制限を改善するように設計されています。これを実現するために,我々は,頑健性,不確実性定量化,記憶効率といった完全確率性の利点を含む無限大極限における部分的確率性の利点を活用し,訓練や推論時の計算効率に関する限界を改善した。我々は,重み分割の方法を含むネットワーク設計の柔軟性を提供する,さまざまなアーキテクチャ構成を提案する。また,我々のネットワークファミリーがUniversal Conditional Distribution Approximatorに該当することを確立することにより,モデル表現性に関する数学的保証も提供する。最後に,複数のタスクにまたがる経験的評価から,提案するアーキテクチャは,より効率的であると同時に,下流のタスク性能や不確実性も向上していることが示された。 In this paper, we present Partially Stochastic Infinitely Deep Bayesian Neural Networks, a novel family of architectures that integrates partial stochasticity into the framework of infinitely deep neural networks. Our new class of architectures is designed to improve the limitations of existing architectures around computational efficiency at training and inference time. To do this, we leverage the advantages of partial stochasticity in the infinite-depth limit which include the benefits of full stochasticity e.g. robustness, uncertainty quantification, and memory efficiency, whilst improving their limitations around computational efficiency at training and inference time. We present a variety of architectural configurations, offering flexibility in network design including different methods for weight partition. We also provide mathematical guarantees on the expressivity of our models by establishing that our network family qualifies as Universal Conditional Distribution Approximators. Lastly, empirical evaluations across multiple tasks show that our proposed architectures achieve better downstream task performance and uncertainty quantification than their counterparts while being significantly more efficient.	翻訳日:2024-02-07 18:01:18 公開日:2024-02-05
# beyond text:音声によるロボットナビゲーションのためのllmの意思決定を改善する Beyond Text: Improving LLM's Decision Making for Robot Navigation via Vocal Cues ( http://arxiv.org/abs/2402.03494v1 ) ライセンス: Link先を確認	Xingpeng Sun, Haoming Meng, Souradip Chakraborty, Amrit Singh Bedi, Aniket Bera	(参考訳) この研究は、人間とロボットの対話に使用されるテキストベースの大規模言語モデル(llm)の致命的な欠点を浮き彫りにしている。 llmは、これらの人間の会話でテキストを処理するのに優れているが、ロボットや他のaiシステムの曖昧さと不確実性が信頼を損なうような、ソーシャルナビゲーションのようなシナリオにおける言葉による指示のニュアンスに苦しむ。テキストを超えて、さらにこれらの音声応答のパラ言語機能に焦点を当てることで、この欠点に対処できます。これらの特徴は、文字通りの単語(語彙内容)は含まないが、意味やニュアンスを何かの言葉を通して伝える音声コミュニケーションの側面である。提案する「beyond text(beyond text)」は,人間とロボットの会話における影響と関連性を重視した,これらの機能のサブセクションと音声転写を統合することで,llm意思決定を改善する手法である。このアプローチは 70.26% の勝利率を達成するだけでなく、既存の LLM を 48.30% で上回り、トークン操作の敵攻撃に対する堅牢性を高め、勝利率においてテキストのみの言語モデルよりも 22.44% の減少率で強調される。『Beyond Text』はソーシャルロボットナビゲーションとより広範なヒューマンロボットインタラクションの進歩であり、テキストベースのガイダンスをヒューマン・オーディオ・インフォームド言語モデルとシームレスに統合している。 This work highlights a critical shortcoming in text-based Large Language Models (LLMs) used for human-robot interaction, demonstrating that text alone as a conversation modality falls short in such applications. While LLMs excel in processing text in these human conversations, they struggle with the nuances of verbal instructions in scenarios like social navigation, where ambiguity and uncertainty can erode trust in robotic and other AI systems. We can address this shortcoming by moving beyond text and additionally focusing on the paralinguistic features of these audio responses. These features are the aspects of spoken communication that do not involve the literal wording (lexical content) but convey meaning and nuance through how something is said. We present "Beyond Text"; an approach that improves LLM decision-making by integrating audio transcription along with a subsection of these features, which focus on the affect and more relevant in human-robot conversations. 
This approach not only achieves a 70.26% winning rate, outperforming existing LLMs by 48.30%, but also enhances robustness against token manipulation adversarial attacks, as shown by a decrease ratio in winning rate that is 22.44% smaller than that of the text-only language model. "Beyond Text" marks an advancement in social robot navigation and broader human-robot interaction, seamlessly integrating text-based guidance with human-audio-informed language models.	翻訳日:2024-02-07 18:00:59 公開日:2024-02-05
# 強ラベルを超えて:非造影CTにおける楕円様血管構造の分別のためのガウス擬似ラベルに基づく弱教師付き学習 Beyond Strong labels: Weakly-supervised Learning Based on Gaussian Pseudo Labels for The Segmentation of Ellipse-like Vascular Structures in Non-contrast CTs ( http://arxiv.org/abs/2402.03492v1 ) ライセンス: Link先を確認	Qixiang Ma, Antoine Łucas, Huazhong Shu, Adrien Kaladji, Pascal Haigron	(参考訳) 術前CTスキャンにおける血管構造の自動分割は、血管疾患におけるコンピュータ支援診断および介入手順に寄与する。 CTアンギオグラフィー(CTA)は一般的な標準であるが、造影剤による合併症を回避し、コントラストリスクのない代替手段として非コントラストCTが重要である。しかし, 血管境界の曖昧さによる労働集約的なラベル付けと高いラベル付けの難しさは, 非造影CTにおける従来の強ラベルベースの完全教師あり学習を妨げている。本稿では,楕円の位相をスライスに含む弱教師付きフレームワークを提案する。 1)事前定義された基準に基づく効率的なアノテーション処理 2)楕円フィット加工 3) 擬似ラベルとしての2次元ガウス熱マップの生成 4) 擬似ラベルによるボクセル再構成損失と分配損失の組合せによるトレーニングプロセス。非コントラストCTによる1つの局所的および2つの公開データセットに対する提案手法の有効性について検討し,特に腹部大動脈に焦点を当てた。ローカルデータセットにおいて、擬似ラベルに基づく弱教師付き学習アプローチは、強いラベルに基づく完全教師付き学習(平均Diceスコアの1.54%)より優れ、ラベル付け時間を約82.0%削減する。擬似ラベルの生成効率は、ラベルに依存しない外部データをトレーニングセットに含めることを可能にし、パフォーマンス(平均でDiceスコアの2.74%)が向上し、66.3%のラベリング時間が短縮され、ラベリング時間が強いラベルよりも大幅に短縮される。公開データセットでは、擬似ラベルは2DモデルでDiceスコアの1.95%を総合的に改善し、3Dモデルでハウスドルフ距離で11.65ボクセル間隔を縮める。 Deep-learning-based automated segmentation of vascular structures in preoperative CT scans contributes to computer-assisted diagnosis and intervention procedure in vascular diseases. While CT angiography (CTA) is the common standard, non-contrast CT imaging is significant as a contrast-risk-free alternative, avoiding complications associated with contrast agents. However, the challenges of labor-intensive labeling and high labeling variability due to the ambiguity of vascular boundaries hinder conventional strong-label-based, fully-supervised learning in non-contrast CTs. 
This paper introduces a weakly-supervised framework that uses the topology of ellipses in slices, comprising 1) an efficient annotation process based on predefined standards, 2) ellipse-fitting processing, 3) the generation of 2D Gaussian heatmaps serving as pseudo labels, and 4) a training process that combines a voxel reconstruction loss and a distribution loss with the pseudo labels. We assess the effectiveness of the proposed method on one local and two public datasets comprising non-contrast CT scans, particularly focusing on the abdominal aorta. On the local dataset, our weakly-supervised learning approach based on pseudo labels outperforms strong-label-based fully-supervised learning (by 1.54% in Dice score on average) while reducing labeling time by around 82.0%. The efficiency in generating pseudo labels allows the inclusion of label-agnostic external data in the training set, leading to an additional improvement in performance (2.74% in Dice score on average) with a 66.3% reduction in labeling time, so the labeling time remains considerably less than that of strong labels. On the public dataset, the pseudo labels achieve an overall improvement of 1.95% in Dice score for 2D models and a reduction of 11.65 voxel spacings in Hausdorff distance for the 3D model.	翻訳日:2024-02-07 18:00:30 公開日:2024-02-05
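Step 3 of the framework in the entry above turns a fitted ellipse into a 2D Gaussian heatmap that serves as a pseudo label. The sketch below shows one way this could look, assuming the ellipse semi-axes are used directly as the Gaussian standard deviations. That choice, the function name `gaussian_heatmap`, and its parameters are illustrative assumptions and are not taken from the paper.

```python
import math


def gaussian_heatmap(height, width, cx, cy, a, b, theta=0.0):
    """Return a height x width grid of values in (0, 1].

    (cx, cy) is the ellipse centre, a and b are its semi-axes (used here as
    standard deviations, an assumption), theta is its rotation in radians.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = x - cx, y - cy
            u = cos_t * dx + sin_t * dy    # coordinate along the first axis
            v = -sin_t * dx + cos_t * dy   # coordinate along the second axis
            row.append(math.exp(-0.5 * ((u / a) ** 2 + (v / b) ** 2)))
        heatmap.append(row)
    return heatmap


# A 9x9 slice with an axis-aligned ellipse centred at (4, 4).
pseudo_label = gaussian_heatmap(9, 9, cx=4, cy=4, a=2.0, b=1.0)
print(round(pseudo_label[4][4], 3), round(pseudo_label[4][6], 3))
```

The heatmap peaks at the ellipse centre and decays smoothly towards the boundary, which gives a soft target instead of a hard mask.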
# 臨床における敗血症発症の早期予測 Early prediction of onset of sepsis in Clinical Setting ( http://arxiv.org/abs/2402.03486v1 ) ライセンス: Link先を確認	Fahim Mohammad, Lakshmi Arunachalam, Samanway Sadhu, Boudewijn Aasman, Shweta Garg, Adil Ahmed, Silvie Colman, Meena Arunachalam, Sudhir Kulkarni, Parsa Mirhaji	(参考訳) 本研究は, アメリカ合衆国ブロンクスのモンテフィオーレ医療センターにおける匿名化された臨床データを用いて, 敗血症の早期発症を予測する機械学習モデルを提案する。教師付き学習手法が採用され、XGBoostモデルは107の特徴(オリジナルと派生した特徴を含む)を含む訓練データセットの80%を利用して訓練された。その後、残りの20%のテストデータでモデルを評価した。モデルはトレーニング期間中に全く見えなかった前向きデータに基づいて検証された。患者個人レベルでのモデルの性能と予測の適時性を評価するために,PhysioNet Sepsis Challenge論文に概説されているように,敗血症検出のための評価方法として広く認知されている正規化ユーティリティスコアを用いた。 F1スコア、感度、特異性、フラグレートなどの指標も考案された。このモデルは、しきい値0.3において、テストデータで0.494、前向きデータで0.378の正規化ユーティリティスコアを達成した。 F1スコアは、同一しきい値において、テストデータと前向きデータでそれぞれ80.8%と67.1%であり、臨床意思決定プロセスに効果的に統合される可能性を強調した。これらの結果は、モデルの堅牢な予測能力と臨床意思決定プロセスに大きな影響を与える可能性を示す。 This study proposes the use of Machine Learning models to predict the early onset of sepsis using deidentified clinical data from Montefiore Medical Center in Bronx, NY, USA. A supervised learning approach was adopted, wherein an XGBoost model was trained utilizing 80% of the train dataset, encompassing 107 features (including the original and derived features). Subsequently, the model was evaluated on the remaining 20% of the test data. The model was validated on prospective data that was entirely unseen during the training phase. To assess the model's performance at the individual patient level and timeliness of the prediction, a normalized utility score was employed, a widely recognized scoring methodology for sepsis detection, as outlined in the PhysioNet Sepsis Challenge paper. Metrics such as F1 Score, Sensitivity, Specificity, and Flag Rate were also devised. The model achieved a normalized utility score of 0.494 on test data and 0.378 on prospective data at threshold 0.3. 
The F1 scores were 80.8% and 67.1% respectively for the test data and the prospective data for the same threshold, highlighting its potential to be integrated into clinical decision-making processes effectively. These results bear testament to the model's robust predictive capabilities and its potential to substantially impact clinical decision-making processes.	翻訳日:2024-02-07 17:59:55 公開日:2024-02-05
# pubmedユーザクエリログを活用した推奨類似記事のポストホック説明 Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles ( http://arxiv.org/abs/2402.03484v1 ) ライセンス: Link先を確認	Ashley Shin, Qiao Jin, James Anibal, Zhiyong Lu	(参考訳) 参考記事に基づく関連記事の検索は、科学研究の不可欠な部分である。 pubmedは、多くの学術検索エンジンと同様に、ユーザーが見る現在の記事に関連する記事を推奨する「類似記事」機能を備えている。推奨項目の説明は,特に文献検索プロセスにおいて,ユーザにとって非常に有用である。毎年100万以上の生物医学論文が発行されており、推奨される類似記事の説明は、研究者や臨床医が関連記事を探すのに役立つだろう。しかし、現在の文献推薦制度の大半は、その提案についての説明を欠いている。類似記事のタイトルで関連するトークンを識別することで推奨を説明するため,ポストホックなアプローチを採用している。私たちの主な貢献はPubMedのユーザクエリログから560万のコクリックされた記事を再利用することでPubCLogsを構築することです。 PubCLogsデータセットを使用して、シード記事のタイトルと抽象化に基づいて、類似記事のタイトルの最も関連性の高い部分を選択するために設計されたトランスフォーマーベースのモデルであるHighlight similar Article Title(HSAT)をトレーニングする。 HSATは、実験的な評価において強いパフォーマンスを示し、PubCLogsテストセットでF1スコア91.72パーセントを獲得し、BM25 (70.62)、MPNet (67.11)、MedCPT (62.22)、GPT-3.5 (46.00)、GPT-4 (64.89)など、いくつかのベースラインをかなり上回った。別個の手動アノテートテストセットに関する追加評価は、HSATのパフォーマンスをさらに検証する。さらに,本研究の参加者は,簡潔さと包括性とのバランスが優れているため,HSATを好んでいる。本研究は,学術検索エンジンのユーザ問合せログの再利用が,文献レコメンデーションを説明するための最先端モデルをトレーニングするための有望な方法であることを示唆している。 Searching for a related article based on a reference article is an integral part of scientific research. PubMed, like many academic search engines, has a "similar articles" feature that recommends articles relevant to the current article viewed by a user. Explaining recommended items can be of great utility to users, particularly in the literature search process. With more than a million biomedical papers being published each year, explaining the recommended similar articles would facilitate researchers and clinicians in searching for related articles. Nonetheless, the majority of current literature recommendation systems lack explanations for their suggestions. We employ a post hoc approach to explaining recommendations by identifying relevant tokens in the titles of similar articles. Our major contribution is building PubCLogs by repurposing 5.6 million pairs of coclicked articles from PubMed's user query logs. 
Using our PubCLogs dataset, we train the Highlight Similar Article Title (HSAT), a transformer-based model designed to select the most relevant parts of the title of a similar article, based on the title and abstract of a seed article. HSAT demonstrates strong performance in our empirical evaluations, achieving an F1 score of 91.72 percent on the PubCLogs test set, considerably outperforming several baselines including BM25 (70.62), MPNet (67.11), MedCPT (62.22), GPT-3.5 (46.00), and GPT-4 (64.89). Additional evaluations on a separate, manually annotated test set further verify HSAT's performance. Moreover, participants of our user study indicate a preference for HSAT, due to its superior balance between conciseness and comprehensiveness. Our study suggests that repurposing user query logs of academic search engines can be a promising way to train state-of-the-art models for explaining literature recommendation.	翻訳日:2024-02-07 17:59:34 公開日:2024-02-05
# SWAG:アクションガイダンスによるストーリーテリング SWAG: Storytelling With Action Guidance ( http://arxiv.org/abs/2402.03483v1 ) ライセンス: Link先を確認	Zeeshan Patel, Karim El-Refai, Jonathan Pei, Tianle Li	(参考訳) 自動化された長文のストーリー生成は通常、ワンショット作成に長文脈の大規模言語モデル(LLM)を使用するが、これは一貫性はあるものの必ずしも魅力的とは限らないコンテンツを生み出す。 LLMを用いたストーリーテリングの新しいアプローチであるストーリーテリング・ウィズ・アクション・ガイダンス(SWAG)を紹介する。提案手法は,2モデルフィードバックループを用いて,ストーリー執筆を探索問題に還元する。一方のLLMはストーリーコンテンツを生成し,他方の補助LLMはストーリーの今後の方向性を導くために,次の最良の「アクション」を選択する。 GPT-4による評価と人的評価により,SWAGは従来のエンド・ツー・エンドのストーリー生成技術を大幅に上回り,オープンソースモデルのみを用いたSWAGパイプラインはGPT-3.5-Turboを上回った。 Automated long-form story generation typically employs long-context large language models (LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging content. We introduce Storytelling With Action Guidance (SWAG), a novel approach to storytelling with LLMs. Our approach reduces story writing to a search problem through a two-model feedback loop: one LLM generates story content, and another auxiliary LLM is used to choose the next best "action" to steer the story's future direction. Our results show that SWAG can substantially outperform previous end-to-end story generation techniques when evaluated by GPT-4 and through human evaluation, and our SWAG pipeline using only open-source models surpasses GPT-3.5-Turbo.	翻訳日:2024-02-07 17:58:57 公開日:2024-02-05
# FINEST: ランク保存ファインチューニングによる勧告の安定化 FINEST: Stabilizing Recommendations by Rank-Preserving Fine-Tuning ( http://arxiv.org/abs/2402.03481v1 ) ライセンス: Link先を確認	Sejoon Oh, Berk Ustun, Julian McAuley, Srijan Kumar	(参考訳) 現代のレコメンダシステムは、トレーニングデータの小さな摂動によって、かなり異なるレコメンデーションを出力することがある。 1人のユーザーのデータの変更は、そのユーザー自身のレコメンデーションだけでなく、他のユーザーのレコメンデーションも変更する。医療、住宅、金融といったアプリケーションでは、この感度はユーザー体験に悪影響を及ぼす可能性がある。このような摂動に対して所定のレコメンダシステムを安定化する手法を提案する。これは,(1)出力のアンカーに使用可能な「参照」ランクリストの欠如,(2)トレーニングデータのあらゆる摂動に対してランクリストの安定性を確保する上での計算上の困難さのため,難しい課題である。提案手法FINESTは,与えられた推奨モデルから参照ランクリストを取得し,サンプル項目のランク保存正規化を用いて摂動シミュレーションの下でモデルを微調整することで,これらの課題を克服する。実世界のデータセットに対する我々の実験は、FINESTが次項目の予測精度を損なうことなく、様々な摂動の下で安定したレコメンデーションを出力できることを実証している。 Modern recommender systems may output considerably different recommendations due to small perturbations in the training data. Changes in the data from a single user will alter the recommendations as well as the recommendations of other users. In applications like healthcare, housing, and finance, this sensitivity can have adverse effects on user experience. We propose a method to stabilize a given recommender system against such perturbations. This is a challenging task due to (1) the lack of a "reference" rank list that can be used to anchor the outputs; and (2) the computational challenges in ensuring the stability of rank lists with respect to all possible perturbations of training data. Our method, FINEST, overcomes these challenges by obtaining reference rank lists from a given recommendation model and then fine-tuning the model under simulated perturbation scenarios with rank-preserving regularization on sampled items. Our experiments on real-world datasets demonstrate that FINEST can ensure that recommender models output stable recommendations under a wide range of different perturbations without compromising next-item prediction accuracy.	翻訳日:2024-02-07 17:58:41 公開日:2024-02-05
# 科学発見のためのインフラストラクチャを実現するトリリオンパラメータAI:調査とビジョン Trillion Parameter AI Serving Infrastructure for Scientific Discovery: A Survey and Vision ( http://arxiv.org/abs/2402.03480v1 ) ライセンス: Link先を確認	Nathaniel Hudson, J. Gregory Pauloski, Matt Baughman, Alok Kamatar, Mansi Sakarvadia, Logan Ward, Ryan Chard, André Bauer, Maksim Levental, Wenyi Wang, Will Engler, Owen Price Skelly, Ben Blaiszik, Rick Stevens, Kyle Chard, Ian Foster	(参考訳) ディープラーニングの手法は研究を変革し、新しい技術を可能にし、最終的には新しい発見につながる。より有能なAIモデルの需要が拡大を続ける中、私たちは現在、TPM(Trillion Parameter Models)、すなわちHuaweiのPanGu-$\Sigma$のような1兆以上のパラメータを持つモデルの時代に入りつつある。我々は,TPMユーザと,科学コミュニティの特定のニーズに対応するプロバイダのエコシステムに対するビジョンについて述べる。次に,科学的研究と発見を可能にするためにTPMを提供するシステム設計における重要な技術的課題と未解決問題について概説する。具体的には,包括的ソフトウェアスタックの要件と,研究者の多様で柔軟な要件をサポートするためのインターフェースについて述べる。 Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$\Sigma$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.	翻訳日:2024-02-07 17:58:23 公開日:2024-02-05
# ICED:環境設計による強化学習におけるゼロショット転送 ICED: Zero-Shot Transfer in Reinforcement Learning via In-Context Environment Design ( http://arxiv.org/abs/2402.03479v1 ) ライセンス: Link先を確認	Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht	(参考訳) 深層強化学習(rl)を用いて訓練された自律エージェントは、訓練中に遭遇した環境と特性を共有する場合でも、新しい環境にうまく一般化する能力に欠けることが多い。本研究では,RLエージェントのゼロショット一般化能力(ZSG)に,個々の環境インスタンスやレベルのサンプリングがどう影響するかを検討する。我々は,基本層を共有するディープ・アクタ-クリティック・アーキテクチャにおいて,その価値損失に応じた優先順位付けレベルが,生成したトレーニングデータにおけるエージェントの内部表現とトレーニングレベルの相互情報を最小化することを発見した。これは、特定の適応サンプリング戦略によって達成される暗黙の正則化に対する新しい理論的な正当化を与える。次に,データ生成機構をより制御可能なued(unsupervised environment design)メソッドに注目します。既存のUED手法は,ZSG性能の低いトレーニング分布を著しく変化させることができる。オーバーフィッティングと分散シフトの両立を防ぐために,コンテキスト内環境設計 (iced) を導入する。 ICEDは、初期レベルパラメータに基づいて訓練された変分オートエンコーダを用いてレベルを生成し、分散シフトを低減し、適応レベルサンプリング戦略やUEDメソッドよりもZSGを大幅に改善する。 Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce in-context environment design (ICED). 
ICED generates levels using a variational autoencoder trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.	翻訳日:2024-02-07 17:58:08 公開日:2024-02-05
# 超拡散 : 単一モデルによる認識論的不確実性と偶然的不確実性の推定 Hyper-Diffusion: Estimating Epistemic and Aleatoric Uncertainty with a Single Model ( http://arxiv.org/abs/2402.03478v1 ) ライセンス: Link先を確認	Matthew A. Chan, Maria J. Molina, Christopher A. Metzler	(参考訳) 医療画像や天気予報などの高リスクな応用に機械学習(ML)を適用する際には、認識論的不確実性(より多くのトレーニングデータで軽減できる不確実性)と偶然的不確実性(対象タスクに固有の不確実性)を推定し、分離することが極めて重要である。データセットの事後分布から正確かつ効率的にサンプリングする条件付き拡散モデルの画期的な能力により、不確実性の推定は概念的には単純になった。すなわち、拡散モデルの大規模なアンサンブルを訓練し、そこからサンプリングするだけでよい。残念なことに、そのようなアンサンブルのトレーニングは、モデルアーキテクチャの複雑さが増すにつれて計算的に困難になる。本研究では, 一つのモデルを用いて認識論的不確実性と偶然的不確実性を正確に推定できる, アンサンブルの新しいアプローチであるハイパー拡散を提案する。既存のモンテカルロドロップアウトに基づく単一モデルアンサンブル法とは異なり、ハイパー拡散はマルチモデルアンサンブルと同じ予測精度を提供する。我々は,X線CT(computed tomography)再構成と気象温度予測という,2つの異なる課題に対してアプローチを検証する。 Estimating and disentangling epistemic uncertainty (uncertainty that can be reduced with more training data) and aleatoric uncertainty (uncertainty that is inherent to the task at hand) is critically important when applying machine learning (ML) to high-stakes applications such as medical imaging and weather forecasting. Conditional diffusion models' breakthrough ability to accurately and efficiently sample from the posterior distribution of a dataset now makes uncertainty estimation conceptually straightforward: One need only train and sample from a large ensemble of diffusion models. Unfortunately, training such an ensemble becomes computationally intractable as the complexity of the model architecture grows. In this work we introduce a new approach to ensembling, hyper-diffusion, which allows one to accurately estimate epistemic and aleatoric uncertainty with a single model. Unlike existing Monte Carlo dropout based single-model ensembling methods, hyper-diffusion offers the same prediction accuracy as multi-model ensembles. We validate our approach on two distinct tasks: x-ray computed tomography (CT) reconstruction and weather temperature forecasting.	翻訳日:2024-02-07 17:57:49 公開日:2024-02-05
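The entry above describes estimating uncertainty by sampling from an ensemble of models. One common way to split the total spread of such samples into the two kinds of uncertainty is sketched below, assuming each ensemble member contributes the same number of scalar samples for a single input. The function name `decompose_uncertainty` and the variance-based split are illustrative assumptions, not necessarily the estimator used in the paper.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(samples_per_member):
    """Split prediction spread into (epistemic, aleatoric) parts.

    samples_per_member[m] holds scalar predictions sampled from ensemble
    member m for one input (hypothetical data layout).
    """
    member_means = [fmean(s) for s in samples_per_member]
    member_vars = [pvariance(s) for s in samples_per_member]
    aleatoric = fmean(member_vars)       # average spread within each member
    epistemic = pvariance(member_means)  # disagreement between members
    return epistemic, aleatoric


# Two members, two samples each.
samples = [[1.0, 3.0], [5.0, 7.0]]
epistemic, aleatoric = decompose_uncertainty(samples)
print(epistemic, aleatoric)
```

With equally sized members the two parts add up to the variance of all pooled samples, which is the law of total variance.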
# 複雑な視覚システムにおけるマルチタスク学習のロバスト解析 Robust Analysis of Multi-Task Learning on a Complex Vision System ( http://arxiv.org/abs/2402.03557v1 ) ライセンス: Link先を確認	Dayou Mao, Yuhao Chen, Yifan Wu, Maximilian Gilles, Alexander Wong	(参考訳) マルチタスク学習(MTL)は過去10年間に広く研究されてきた。特に、様々な設定で数十の最適化アルゴリズムが提案されている。それぞれが特定のデータセット上の特定のモデルに適用した場合の改善を主張しているが、複雑な実世界のシナリオにおけるパフォーマンスに関する深い理解はいまだにない。研究とアプリケーションの間のギャップを特定し、以下の4つの貢献を行う。 1)複雑な実世界の応用価値が高いロボット把持タスク用に設計されたメタグラスプネットデータセット上で,既存のmtl最適化アルゴリズムを包括的に評価し,最善の手法を導出する。 2) MTL最適化アルゴリズムの大規模集合に対する特徴レベル勾配とパラメータレベル勾配に適用した場合の手法性能を実証的に比較し,この特徴レベル勾配はメソッド固有の理論的保証があるが,すべての手法に一般化できない場合に妥当であると結論付けた。 3) タスク干渉問題に対する洞察を与え, 既存の勾配角と相対勾配ノルムの視点が, MTLの課題を正確に反映していないことを示す。 (4)特徴抽出器によって誘導される潜在空間の観点からタスク干渉問題の新たな視点を提供し,特徴のゆがみに基づくトレーニング監視結果を提供する。 Multi-task learning (MTL) has been widely studied in the past decade. In particular, dozens of optimization algorithms have been proposed for different settings. While each of them claimed improvement when applied to certain models on certain datasets, there is still lack of deep understanding on the performance in complex real-worlds scenarios. We identify the gaps between research and application and make the following 4 contributions. (1) We comprehensively evaluate a large set of existing MTL optimization algorithms on the MetaGraspNet dataset designed for robotic grasping task, which is complex and has high real-world application values, and conclude the best-performing methods. (2) We empirically compare the method performance when applied on feature-level gradients versus parameter-level gradients over a large set of MTL optimization algorithms, and conclude that this feature-level gradients surrogate is reasonable when there are method-specific theoretical guarantee but not generalizable to all methods. 
(3) We provide insights on the problem of task interference and show that the existing perspectives of gradient angles and relative gradient norms do not precisely reflect the challenges of MTL, as the rankings of the methods based on these two indicators do not align well with those based on the test-set performance. (4) We provide a novel view of the task interference problem from the perspective of the latent space induced by the feature extractor and provide training monitoring results based on feature disentanglement.	翻訳日:2024-02-07 17:51:52 公開日:2024-02-05
# HAMLET:偏微分方程式のためのグラフトランスフォーマーニューラル演算子 HAMLET: Graph Transformer Neural Operator for Partial Differential Equations ( http://arxiv.org/abs/2402.03541v1 ) ライセンス: Link先を確認	Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero	(参考訳) 本稿では、ニューラルネットワークを用いて偏微分方程式(PDE)を解く際の課題を解決するために、新しいグラフトランスフォーマーフレームワークHAMLETを提案する。このフレームワークは、モジュラー入力エンコーダを備えたグラフトランスフォーマーを使用して、微分方程式情報をソリューションプロセスに直接組み込む。このモジュラリティはパラメータ対応制御を強化し、任意のジオメトリと様々な入力フォーマットのPDEにHAMLETを適応させる。特にHAMLETはデータの複雑さとノイズの増加に対して効果的にスケールし、堅牢性を示している。 HAMLETは単一の物理シミュレーションに適合するだけでなく、様々な領域にまたがって適用することができる。さらに、特にデータ制限のあるシナリオでは、モデルのレジリエンスとパフォーマンスが向上する。我々は,広範な実験を通じて,本フレームワークがPDEに対する現在の技術を上回る能力を持つことを実証する。 We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs.	翻訳日:2024-02-07 17:51:29 公開日:2024-02-05
# 信頼できる機械学習のための規制ゲーム Regulation Games for Trustworthy Machine Learning ( http://arxiv.org/abs/2402.03540v1 ) ライセンス: Link先を確認	Mohammad Yaghini, Patty Liu, Franziska Boenisch, Nicolas Papernot	(参考訳) 信頼に値する機械学習(ML)に関する既存の作業は、公正さやプライバシなど、信頼の個々の側面に集中することが多い。さらに、多くのテクニックは、MLモデルをトレーニングする人と、信頼性を評価する責任を持つ人との違いを見落としています。これらの問題に対処するために,信頼性の高いMLを多目的マルチエージェント最適化問題とみなすフレームワークを提案する。これは、レギュレーションゲームと呼ばれるゲーム理論の定式化に自然に役立ちます。 MLモデルビルダーと公正性とプライバシ規制の関係をモデル化する,特定のゲームインスタンスであるSpecGameについて説明する。レギュレータは、仕様の遵守を強制する罰則を設計したいが、ビルダーへの参加を妨げたくない。このような社会的に最適(つまり全てのエージェントに効率的)なソリューションをゲームに求め、ParetoPlayを紹介します。この新しい平衡探索アルゴリズムは、エージェントがその目的のパレートフロンティアに留まり、他の平衡の非効率性を避けることを保証する。 paretoplayによるspecgameのシミュレーションは、ml規制のためのポリシーガイダンスを提供する。例えば、性別分類のアプリケーションでは、規制当局が要求される保証を最初に指定するイニシアチブをとると、平均4.0未満のディファレンシャルプライバシー予算を強制できることを示します。 Existing work on trustworthy machine learning (ML) often concentrates on individual aspects of trust, such as fairness or privacy. Additionally, many techniques overlook the distinction between those who train ML models and those responsible for assessing their trustworthiness. To address these issues, we propose a framework that views trustworthy ML as a multi-objective multi-agent optimization problem. This naturally lends itself to a game-theoretic formulation we call regulation games. We illustrate a particular game instance, the SpecGame in which we model the relationship between an ML model builder and fairness and privacy regulators. Regulators wish to design penalties that enforce compliance with their specification, but do not want to discourage builders from participation. Seeking such socially optimal (i.e., efficient for all agents) solutions to the game, we introduce ParetoPlay. This novel equilibrium search algorithm ensures that agents remain on the Pareto frontier of their objectives and avoids the inefficiencies of other equilibria. Simulating SpecGame through ParetoPlay can provide policy guidance for ML Regulation. 
For instance, we show that for a gender classification application, regulators can enforce a differential privacy budget that is on average 4.0 lower if they take the initiative to specify their desired guarantee first.	翻訳日:2024-02-07 17:51:16 公開日:2024-02-05
# 拡張バージョン: 解集合プログラミングの構造的硬さについて: 構造は分断の力を効率的に定義できるか? Extended Version of: On the Structural Hardness of Answer Set Programming: Can Structure Efficiently Confine the Power of Disjunctions? ( http://arxiv.org/abs/2402.03539v1 ) ライセンス: Link先を確認	Markus Hecher, Rafael Kiesel	(参考訳) Answer Set Programming(ASP)は、知識表現と産業アプリケーションの急速な成長に焦点を当てた、汎用的な問題解決フレームワークである。これまでの複雑性の研究は、ハードネスの特徴付けとソースの決定、二分法的な結果の形での詳細な洞察、そして詳細なパラメータ化された複雑さのランドスケープに繋がった。残念なことに、よく知られたパラメータツリー幅分離プログラムは、合理的な複雑性の仮定の下で二重指数実行を必要とする。これはすぐに手が届かない。我々は,プログラムの規則構造(インシデンスグラフ)上の分離型aspのための構造パラメータの分類を扱う。まず,プログラム構造に表現されないサブセット最小化にもかかわらず,頂点被覆サイズの観点から単一指数ランタイムを得る多項式カーネルを提案する。次に、頂点被覆サイズと木幅の間の厳密な構造パラメータに注意を向ける。ここでは、木深さ、フィードバック頂点サイズ、斜め幅という、その範囲で最も顕著なパラメータに対して、二重指数下界を提供する。これに基づいて、残念ながら、頂点被覆サイズを超える選択肢は限られていると論じる。本研究は, パラメータ圧縮の複雑さの増大を, 正規化プログラムから解離プログラムへの新たな還元に頼って, 奥行きの硬さについて検討した。 Answer Set Programming (ASP) is a generic problem modeling and solving framework with a strong focus on knowledge representation and a rapid growth of industrial applications. So far, the study of complexity resulted in characterizing hardness and determining their sources, fine-grained insights in the form of dichotomy-style results, as well as detailed parameterized complexity landscapes. Unfortunately, for the well-known parameter treewidth disjunctive programs require double-exponential runtime under reasonable complexity assumptions. This quickly becomes out of reach. We deal with the classification of structural parameters for disjunctive ASP on the program's rule structure (incidence graph). First, we provide a polynomial kernel to obtain single-exponential runtime in terms of vertex cover size, despite subset-minimization being not represented in the program's structure. Then we turn our attention to strictly better structural parameters between vertex cover size and treewidth. Here, we provide double-exponential lower bounds for the most prominent parameters in that range: treedepth, feedback vertex size, and cliquewidth. 
Based on this, we argue that unfortunately our options beyond vertex cover size are limited. Our results provide an in-depth hardness study, relying on a novel reduction from normal to disjunctive programs, trading the increase of complexity for an exponential parameter compression.	翻訳日:2024-02-07 17:50:54 公開日:2024-02-05
# ANNによるBLDCモータの位置と速度センサレス推定 ANN-based position and speed sensorless estimation for BLDC motors ( http://arxiv.org/abs/2402.03534v1 ) ライセンス: Link先を確認	Jose-Carlos Gamazo-Real, Victor Martinez-Martinez, Jaime Gomez-Gil	(参考訳) BLDCモーターの応用には正確な位置と速度の測定が必要である。本稿では,PWM制御インバータを動作させるFPGAで得られた終端位相電圧を減衰した位相電圧を用いて位置センサを使わずにこれらの測定値を推定する方法を提案する。電圧は、パーセプトロンベースのカスケードトポロジーを持つ2つの3層ANNのトレーニングおよびテストデータを提供するエンコーダを用いて、電気的および仮想的なローター状態とラベル付けされる。第1のANNは、インクリメンタルタイムスタンプを持つ電圧の特徴から位置を推定し、第2のANNは、取得ウィンドウにおけるタイムスタンプを考慮した位置差の特徴から速度を推定する。センサーベースのトレーニングとセンサーレステストは125 - 1500 rpmで8極ペアモーターを装填し、絶対誤差は0.8度と22 rpmであった。その結果, 総合的位置推定は従来法および先進的手法を著しく改善し, 速度推定は従来法をわずかに改善したが, 先進的手法よりも悪くなった。 BLDC motor applications require precise position and speed measurements, traditionally obtained with sensors. This article presents a method for estimating those measurements without position sensors using terminal phase voltages with attenuated spurious, acquired with a FPGA that also operates a PWM-controlled inverter. Voltages are labelled with electrical and virtual rotor states using an encoder that provides training and testing data for two three-layer ANNs with perceptron-based cascade topology. The first ANN estimates the position from features of voltages with incremental timestamps, and the second ANN estimates the speed from features of position differentials considering timestamps in an acquisition window. Sensor-based training and sensorless testing at 125 to 1,500 rpm with a loaded 8-pole-pair motor obtained absolute errors of 0.8 electrical degrees and 22 rpm. Results conclude that the overall position estimation significantly improved conventional and advanced methods, and the speed estimation slightly improved conventional methods, but was worse than in advanced ones.	翻訳日:2024-02-07 17:50:31 公開日:2024-02-05
# フェデレーション・コンテクスト・バンディットにおける公平性とプライバシー保証 Fairness and Privacy Guarantees in Federated Contextual Bandits ( http://arxiv.org/abs/2402.03531v1 ) ライセンス: Link先を確認	Sambhav Solanki, Shweta Jain, Sujit Gujar	(参考訳) 本稿では,フェデレーション環境における公平性とプライバシ保証を伴うCMAB問題について考察する。我々は,メリットに基づく露出を望ましい公平な結果と考え,報酬に比例して各行動に露出を与える。このアルゴリズムの有効性を公正な最適ポリシーとアルゴリズムによるポリシー出力の違いを捉えたfairness regretを用いてモデル化する。公平CMABアルゴリズムを各エージェントに適用すると、各エージェントの個々に公正な後悔が生じる。我々は,協調的フェデレーション学習をより効果的にし,差分プライバシーを確保するアルゴリズムであるFed-FairX-LinUCBを提案する。既存のプライバシーフレームワークを拡張する上で最大の課題は、エージェント間で必要な情報を通信するための通信プロトコルを設計することだ。ナイーブなプロトコルは、プライバシの保証が弱くなるか、あるいは後悔が高まる。実現可能な新しい通信プロトコルを設計する (i)Fed-FairX-LinUCBとPriv-FairX-LinUCBに匹敵する正当性を後悔するサブ線形理論的境界 2Priv-FairX-LinUCBにおけるプライバシー予算の有効活用提案アルゴリズムの有効性をシミュレーションによる実験により実証した。 Fed-FairX-LinUCB と Priv-FairX-LinUCB はともに, ほぼ最適の公平さを後悔する。 This paper considers the contextual multi-armed bandit (CMAB) problem with fairness and privacy guarantees in a federated environment. We consider merit-based exposure as the desired fair outcome, which provides exposure to each action in proportion to the reward associated. We model the algorithm's effectiveness using fairness regret, which captures the difference between fair optimal policy and the policy output by the algorithm. Applying fair CMAB algorithm to each agent individually leads to fairness regret linear in the number of agents. We propose that collaborative -- federated learning can be more effective and provide the algorithm Fed-FairX-LinUCB that also ensures differential privacy. The primary challenge in extending the existing privacy framework is designing the communication protocol for communicating required information across agents. A naive protocol can either lead to weaker privacy guarantees or higher regret. 
We design a novel communication protocol that allows for (i) Sub-linear theoretical bounds on fairness regret for Fed-FairX-LinUCB and comparable bounds for the private counterpart, Priv-FairX-LinUCB (relative to single-agent learning), (ii) Effective use of privacy budget in Priv-FairX-LinUCB. We demonstrate the efficacy of our proposed algorithm with extensive simulations-based experiments. We show that both Fed-FairX-LinUCB and Priv-FairX-LinUCB achieve near-optimal fairness regret.	翻訳日:2024-02-07 17:50:13 公開日:2024-02-05
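The merit-based exposure notion in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical merit function (the identity on positive rewards) and takes per-round fairness regret to be the L1 gap between the fair target policy and the policy actually played; the function names and this exact regret definition are illustrative assumptions, not the authors' implementation.

```python
def fair_policy(rewards, merit=lambda r: r):
    """Selection probabilities proportional to merit(reward) of each action.

    `merit` is a hypothetical merit function; the identity is assumed here.
    """
    merits = [merit(r) for r in rewards]
    total = sum(merits)
    if total <= 0:
        raise ValueError("merits must sum to a positive value")
    return [m / total for m in merits]


def fairness_regret(rewards, played_policies, merit=lambda r: r):
    """Sum over rounds of the L1 gap between the fair policy and the played one."""
    target = fair_policy(rewards, merit)
    return sum(
        sum(abs(t - p) for t, p in zip(target, policy))
        for policy in played_policies
    )
```

For two actions with rewards 1 and 3, the fair target under these assumptions is to play them 25% and 75% of the time; a round played uniformly contributes a gap of 0.5, and a round played at the target contributes nothing.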
# 空間設定における予測手法の一貫性検証 Consistent Validation for Predictive Methods in Spatial Settings ( http://arxiv.org/abs/2402.03527v1 ) ライセンス: Link先を確認	David R. Burt, Yunyi Shen and Tamara Broderick	(参考訳) 空間予測タスクは、天気予報、大気汚染の研究、その他の科学的取り組みの鍵となる。統計的または物理的手法による予測をどの程度信頼するかを決定することは科学的結論の信頼性に不可欠である。残念ながら、バリデーションのための古典的なアプローチでは、バリデーションで利用可能な場所と、予測したい場所(テスト)との間のミスマッチを処理できません。このミスマッチは、検証とテストの場所が2つの分布の i.d. よりも固定されているため、共変量シフトの例(一般に形式化される)ではないことが多い。本研究は,検証データが任意に密集するにつれて,任意に正確になる検証方法のチェックを形式化する。古典的および共変量シフト法は、このチェックに失敗する可能性がある。代わりに,共変量シフト文学における既存の考え方に基づく検証データに適用する手法を提案する。私たちは提案がチェックに合格することを証明します。そして、シミュレーションデータと実データでその利点を実証する。 Spatial prediction tasks are key to weather forecasting, studying air pollution, and other scientific endeavors. Determining how much to trust predictions made by statistical or physical methods is essential for the credibility of scientific conclusions. Unfortunately, classical approaches for validation fail to handle mismatch between locations available for validation and (test) locations where we want to make predictions. This mismatch is often not an instance of covariate shift (as commonly formalized) because the validation and test locations are fixed (e.g., on a grid or at select points) rather than i.i.d. from two distributions. In the present work, we formalize a check on validation methods: that they become arbitrarily accurate as validation data becomes arbitrarily dense. We show that classical and covariate-shift methods can fail this check. We instead propose a method that builds from existing ideas in the covariate-shift literature, but adapts them to the validation data at hand. We prove that our proposal passes our check. And we demonstrate its advantages empirically on simulated and real data.	翻訳日:2024-02-07 17:49:50 公開日:2024-02-05
# nnMamba: 状態空間モデルを用いた3次元生体医用画像分割,分類,ランドマーク検出 nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model ( http://arxiv.org/abs/2402.03526v1 ) ライセンス: Link先を確認	Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li	(参考訳) バイオメディカル画像解析の分野では、特に3次元画像のセグメンテーション、分類、ランドマーク検出を扱う場合、長距離依存を効果的に把握できるアーキテクチャの探求が最重要である。従来の畳み込みニューラルネットワーク(CNN)は局所的な受容野に制約され、トランスフォーマーは高次元の医用画像に適用すると高い計算負荷を負う。本稿では、CNNの強みとステートスペースシーケンスモデル(SSM)の高度な長距離モデリング機能を統合する新しいアーキテクチャであるnnMambaを紹介する。 nnMambaはSSMを畳み込み残差ブロックに追加し、局所的な特徴を抽出するとともに複雑な依存関係をモデル化する。異なるタスクに対しては、特徴を学習するために異なるブロックを構築する。広範な実験により、nnMambaは3D画像のセグメンテーション、分類、ランドマーク検出を含む一連の困難なタスクにおいて最先端の手法よりも優れていることが示されている。 nnMambaはロバストなソリューションとして登場し、CNNのローカル表現能力とSSMの効率的なグローバルコンテキスト処理を提供し、医療画像解析における長距離依存性モデリングの新しい標準を設定する。コードはhttps://github.com/lhaof/nnMambaで入手できる。 In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) are constrained by their local receptive fields, and Transformers have a heavy computational load when applied to high-dimensional medical images. In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs). nnMamba adds SSMs to the convolutional residual block to extract local features and model complex dependencies. For different tasks, we build different blocks to learn the features. Extensive experiments demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection. 
nnMamba emerges as a robust solution, offering both the local representation ability of CNNs and the efficient global context processing of SSMs, setting a new standard for long-range dependency modeling in medical image analysis. Code is available at https://github.com/lhaof/nnMamba	翻訳日:2024-02-07 17:49:32 公開日:2024-02-05
# 倉庫におけるピッカールーティング問題に対する深層強化学習 Deep Reinforcement Learning for Picker Routing Problem in Warehousing ( http://arxiv.org/abs/2402.03525v1 ) ライセンス: Link先を確認	George Dunn, Hadi Charkhgard, Ali Eshragh, Sasan Mahmoudinazlou and Elizabeth Stojanovski	(参考訳) 注文ピッカールーティングは倉庫管理において重要な問題である。問題の複雑さと迅速な解法の必要性により、最適化アルゴリズムは実際に頻繁に用いられる。しかし、強化学習(Reinforcement Learning)は、従来のヒューリスティックスに代えて魅力的な代替手段を提供する。本稿では,強化学習を用いて学習するピッカーツアーをモデル化するための注意に基づくニューラルネットワークを提案する。本手法はその有効性を示すために,様々な問題パラメータの既存のヒューリスティックスに対して評価を行った。提案手法の重要な利点は,経路の複雑さを低減できるオプションを提供することである。 Order Picker Routing is a critical issue in Warehouse Operations Management. Due to the complexity of the problem and the need for quick solutions, suboptimal algorithms are frequently employed in practice. However, Reinforcement Learning offers an appealing alternative to traditional heuristics, potentially outperforming existing methods in terms of speed and accuracy. We introduce an attention based neural network for modeling picker tours, which is trained using Reinforcement Learning. Our method is evaluated against existing heuristics across a range of problem parameters to demonstrate its efficacy. A key advantage of our proposed method is its ability to offer an option to reduce the perceived complexity of routes.	翻訳日:2024-02-07 17:49:09 公開日:2024-02-05
# NP完全な頂点マイナーグラフ分類を高速化するガウスボソンサンプリング Gaussian Boson Sampling to Accelerate NP-Complete Vertex-Minor Graph Classification ( http://arxiv.org/abs/2402.03524v1 ) ライセンス: Link先を確認	Mushkan Sureka, Saikat Guha	(参考訳) Gaussian Boson Sampling (GBS) は、古典的なコンピュータではサンプリングが難しい確率分布のクラスから、フォトンクリックパターンのランダムなサンプルを生成する。 GBS, Boson Sampling, instantaneous quantum polynomial (IQP) アルゴリズムを用いた量子超越性に関するヒロイックな実証にもかかわらず、証明可能な難解な問題に適用した場合のこれらの量子強調ランダムサンプルのパワーの体系的評価や、よく知られた古典的アルゴリズムとのパフォーマンス比較は不足している。 2つのグラフが互いに頂点マイナーかどうかを決定するNP完全問題に対して,GBSを用いたハイブリッド量子古典アルゴリズムを提案する。グラフはGBSでエンコードされ、生成されたランダムサンプルはサポートベクトルマシン(SVM)分類器の特徴ベクトルとして機能する。私たちは、ワンショットの分類精度と、生成が難しい量子リソースである入力スクイーズ量とのトレードオフを可能にするグラフ埋め込みを見出した。その後、試行の繰り返しと多数決によって所望の全体精度に到達する。本稿では,グラフスペクトルに基づく新しい古典的アルゴリズムを提案する。本アルゴリズムの性能をこの古典的アルゴリズムと比較し,望ましい分類精度を得るための時間と問題規模のスケーリングを解析する。シミュレーションによれば、短期的に実現可能なGBSデバイス($5$ dBのパルススクイーザ、$12$モードのユニタリ、および結合効率、オンチップ損失、光子数分解検出器の検出効率に関する合理的な仮定)を用いれば、$12$ノードの頂点マイナーのインスタンスを、高性能なデスクトップコンピュータと比べて約$10^3$分の1の時間で解くことができる。 Gaussian Boson Sampling (GBS) generates random samples of photon-click patterns from a class of probability distributions that are hard for a classical computer to sample from. Despite heroic demonstrations for quantum supremacy using GBS, Boson Sampling, and instantaneous quantum polynomial (IQP) algorithms, systematic evaluations of the power of these quantum-enhanced random samples when applied to provably hard problems, and performance comparisons with best-known classical algorithms have been lacking. We propose a hybrid quantum-classical algorithm using GBS for the NP-complete problem of determining if two graphs are vertex minors of one another. The graphs are encoded in GBS and the generated random samples serve as feature vectors in the support vector machine (SVM) classifier. We find a graph embedding that allows trading between the one-shot classification accuracy and the amount of input squeezing, a hard-to-produce quantum resource, followed by repeated trials and majority vote to reach an overall desired accuracy. 
We introduce a new classical algorithm based on graph spectra, which we show outperforms various well-known graph-similarity algorithms. We compare the performance of our algorithm with this classical algorithm and analyze their time vs problem-size scaling, to yield a desired classification accuracy. Our simulation suggests that with a near-term realizable GBS device ($5$ dB pulsed squeezer, $12$-mode unitary, and reasonable assumptions on coupling efficiency, on-chip losses, and the detection efficiency of photon-number-resolving detectors) we can solve $12$-node vertex-minor instances in about $10^3$-fold less time than a powerful desktop computer.	翻訳日:2024-02-07 17:48:58 公開日:2024-02-05
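The "repeated trials and majority vote" step mentioned in the abstract above trades a modest one-shot accuracy for a higher overall accuracy. The sketch below shows the arithmetic under the assumption that trials are independent and each is correct with the same probability `p`; the function names are hypothetical and this is not the authors' code.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent trials is correct.

    Assumes every trial is correct with probability p; n must be odd.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of trials to avoid ties")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


def trials_needed(p, target, max_trials=10001):
    """Smallest odd number of trials whose majority vote reaches `target`."""
    if p <= 0.5:
        raise ValueError("majority voting only helps when p > 0.5")
    for n in range(1, max_trials + 1, 2):
        if majority_vote_accuracy(p, n) >= target:
            return n
    raise ValueError("target not reached within max_trials")
```

Under these assumptions, a lower one-shot accuracy (for example, from less input squeezing) can be compensated by more repetitions, which is the kind of trade-off the abstract describes.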
# スペイン語における書き起こしの曖昧さの解消 : 句読点復元のためのハイブリッド音響語彙システム Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration ( http://arxiv.org/abs/2402.03519v1 ) ライセンス: Link先を確認	Xiliang Zhu, Chia-Tien Chang, Shayna Gardiner, David Rossouw, Jonas Robertson	(参考訳) Punctuation restorationは、転写可読性を高め、その後のNLPタスクを促進するための自動音声認識(ASR)システムにとって重要なステップである。それでも、従来の語彙に基づくアプローチはスペイン語の句読点復元の課題を解くには不十分であり、不定詞と疑問の間に曖昧さがしばしば見られる。そこで本研究では,モジュールプロセスを通じて音響信号と語彙信号を統合する,スペイン語転写のためのハイブリッド音響-語彙句読解システムを提案する。実験の結果,提案システムでは,スペイン語の公会話データセットと内部会話データセットの総合的な句読点復元とF1スコアを効果的に改善できることがわかった。さらに、LLM(Large Language Model)に対するベンチマーク比較は、精度、信頼性、レイテンシにおける我々のアプローチの優位性を示している。さらに,asrモジュールの単語誤り率(wer)も提案するシステムからメリットがあることを実証する。 Punctuation restoration is a crucial step after Automatic Speech Recognition (ASR) systems to enhance transcript readability and facilitate subsequent NLP tasks. Nevertheless, conventional lexical-based approaches are inadequate for solving the punctuation restoration task in Spanish, where ambiguity can be often found between unpunctuated declaratives and questions. In this study, we propose a novel hybrid acoustic-lexical punctuation restoration system for Spanish transcription, which consolidates acoustic and lexical signals through a modular process. Our experiment results show that the proposed system can effectively improve F1 score of question marks and overall punctuation restoration on both public and internal Spanish conversational datasets. Additionally, benchmark comparison against LLMs (Large Language Model) indicates the superiority of our approach in accuracy, reliability and latency. Furthermore, we demonstrate that the Word Error Rate (WER) of the ASR module also benefits from our proposed system.	翻訳日:2024-02-07 17:47:59 公開日:2024-02-05
# 異なる領域にまたがるゼロショット要約の事実性評価 Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains ( http://arxiv.org/abs/2402.03509v1 ) ライセンス: Link先を確認	Sanjana Ramprasad, Kundan Krishna, Zachary C Lipton and Byron C Wallace	(参考訳) 近年の研究では、大きな言語モデル(LLM)がゼロショット(すなわち、明示的な監督なしに)を生成できることが示されている。しかし、この以前の研究はほとんどニュース記事の要約を評価することに集中してきた。ゼロショット要約器は他の(潜在的により専門的な)ドメインでどのように機能するのか? 本研究では,生物医学的記事や法的請求書(参照のための標準ニュースベンチマークに加えて)を含む専門分野にまたがるゼロショット生成要約を評価する。特にアウトプットの事実性に注目します。ドメインの専門家からアノテーションを取得し、要約の不整合を識別し、これらのエラーを体系的に分類する。本研究では,事前学習コーパスにおける対象ドメインの有病率は,生成した記事の抽出性と忠実度に影響を及ぼすか分析する。収集したすべてのアノテーションを公開し、ニュース記事以外の事実的正確な要約を計測および実現するためのさらなる研究を促進する。データセットはhttps://github.com/sanjanaramprasad/zero_shot_faceval_domainsからダウンロードできる。 Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focussed almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains including biomedical articles, and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization, beyond news articles. The dataset can be downloaded from https://github.com/sanjanaramprasad/zero_shot_faceval_domains	翻訳日:2024-02-07 17:47:30 公開日:2024-02-05
# 抽象化と推論のためのニューラルネットワーク:マシンの広範な一般化に向けて Neural networks for abstraction and reasoning: Towards broad generalization in machines ( http://arxiv.org/abs/2402.03507v1 ) ライセンス: Link先を確認	Mikel Bober-Irizar, Soumya Banerjee	(参考訳) 人工知能の研究は半世紀にわたって、人間の抽象性と推論の質を再現しようと試みてきた。特定のニューラルネットワークは、目覚ましい範囲の問題を解決することができるが、トレーニングデータ以外の状況への広範な一般化は依然として実現が難しい。本研究では、アルゴリズムを広範囲に一般化してテストするために導入された抽象的な視覚的推論タスクのデータセットであるARC(Abstraction & Reasoning Corpus)を解決するための、いくつかの新しいアプローチを検討する。 10万ドルの賞金を持つ3つの国際競争にもかかわらず、最高のアルゴリズムはARCタスクの大部分を解決できず、機械学習を全く使わずに複雑な手作りのルールに依存している。ニューラルネットワークの最近の進歩がこの課題の進展を許すかどうかを再考する。まず、DreamCoderのニューロシンボリック推論解法をARCに適用する。 DreamCoderは、人間の直観を模倣するニューラルネットワークを使って、ドメイン固有の言語でプログラムを自動記述して推論を行う。我々は,DreamCoder が ARC タスクを解くことを可能にする Perceptual Abstraction and Reasoning Language (PeARL) を提案し,従来の最高の実装を大幅に改善できる新しい認識モデルを提案する。また,大規模言語モデル (LLM) が ARC タスクを解くことを可能にする新しいエンコーディングおよび拡張スキームを提案し,最大モデルが一部の ARC タスクを解くことができることを発見した。 LLMは、最先端の問題解決者に対して異なるグループの問題を解決することができ、他のアプローチを補完する興味深い方法を提供する。どのシステムよりも優れた結果を得るために、モデルを組み合わせてアンサンブル分析を行う。最後に、ARCに関する今後の研究を容易にするために、arckit Pythonライブラリを公開します。 For half a century, artificial intelligence research has attempted to reproduce the human qualities of abstraction and reasoning - creating computer systems that can learn new concepts from a minimal set of examples, in settings where humans find this easy. While specific neural networks are able to solve an impressive range of problems, broad generalisation to situations outside their training data has proved elusive. In this work, we look at several novel approaches for solving the Abstraction & Reasoning Corpus (ARC), a dataset of abstract visual reasoning tasks introduced to test algorithms on broad generalization. Despite three international competitions with $100,000 in prizes, the best algorithms still fail to solve a majority of ARC tasks and rely on complex hand-crafted rules, without using machine learning at all. We revisit whether recent advances in neural networks allow progress on this task. 
First, we adapt the DreamCoder neurosymbolic reasoning solver to ARC. DreamCoder automatically writes programs in a bespoke domain-specific language to perform reasoning, using a neural network to mimic human intuition. We present the Perceptual Abstraction and Reasoning Language (PeARL), which allows DreamCoder to solve ARC tasks, and propose a new recognition model that allows us to significantly improve on the previous best implementation. We also propose a new encoding and augmentation scheme that allows large language models (LLMs) to solve ARC tasks, and find that the largest models can solve some ARC tasks. LLMs are able to solve a different group of problems to state-of-the-art solvers, and provide an interesting way to complement other approaches. We perform an ensemble analysis, combining models to achieve better results than any system alone. Finally, we publish the arckit Python library to make future research on ARC easier.	翻訳日:2024-02-07 17:46:26 公開日:2024-02-05
# ラベルのないデータが分散検出にどのように役立つか? How Does Unlabeled Data Provably Help Out-of-Distribution Detection? ( http://arxiv.org/abs/2402.03502v1 ) ライセンス: Link先を確認	Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li	(参考訳) ラベルのないデータを使用して機械学習モデルを正規化することにより、out-of-distribution(OOD)データの検出における安全性と信頼性が向上する。 In-distribution(ID)データとOODデータの両方の不均一性のため、未ラベルのIn-the-wildデータのパワーを活用することは自明ではない。クリーンなOODサンプルの欠如は、最適なOOD分類器を学習する上で大きな課題となる。現在、ラベルのないデータがOOD検出にどのように役立つのかを正式に理解する研究が不足している。本稿では,理論的保証と実証的有効性の両方を提供する新たな学習フレームワークSAL(Separate And Learn)を導入することにより,ギャップを埋める。このフレームワークは、未ラベルデータから候補外れ値を切り離し、候補外れ値とラベル付きIDデータを用いてOOD分類器を訓練する。理論的には、分離性と学習可能性のレンズから厳密な誤差境界を提供し、アルゴリズムの2つの要素を正式に正当化する。我々の理論は、SALが小さい誤り率で候補外れ値を分離できることを示し、学習されたOOD分類器の一般化を保証する。実証的に、SALは一般的なベンチマークで最先端のパフォーマンスを達成し、理論的な洞察を補強します。コードはhttps://github.com/deeplearning-wisc/salで公開されている。 Using unlabeled data to regularize machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness. The framework separates candidate outliers from the unlabeled data and then trains an OOD classifier using the candidate outliers and the labeled ID data. Theoretically, we provide rigorous error bounds through the lens of separability and learnability, formally justifying the two components in our algorithm. Our theory shows that SAL can separate the candidate outliers with small error rates, which leads to a generalization guarantee for the learned OOD classifier. 
Empirically, SAL achieves state-of-the-art performance on common benchmarks, reinforcing our theoretical insights. Code is publicly available at https://github.com/deeplearning-wisc/sal.	翻訳日:2024-02-07 17:45:41 公開日:2024-02-05
# 分散シフトが強化学習性能に及ぼす影響の評価 Assessing the Impact of Distribution Shift on Reinforcement Learning Performance ( http://arxiv.org/abs/2402.03590v1 ) ライセンス: Link先を確認	Ted Fujimoto and Joshua Suetterlein and Samrat Chatterjee and Auroop Ganguly	(参考訳) 機械学習の研究は、独自の再現性危機の解決に進歩している。特に強化学習(rl)は、独自の課題に直面している。訓練中の最適方針への収束性を示す点推定とプロットの比較は、オーバーフィットや実験的な設定への依存を遠ざける可能性がある。 RLの研究者は、各アルゴリズムの強みと弱みをよりよく理解するために不確実性を説明する信頼性指標を提案しているが、過去の研究の推奨は、分布外観測の存在を前提としていない。本稿では,分散シフト時のrlアルゴリズムのロバスト性を測定する評価手法を提案する。ここで提示されるツールは、エージェントがその環境で動作している間に、時間とともにパフォーマンスを考慮する必要があると主張している。特に,観測rlの評価方法としては時系列解析を推奨する。また、RLとシミュレーションされた動的環境のユニークな性質は、評価における因果影響の測定を正当化するために、より強い仮定をすることができることを示す。次に、これらのツールを単エージェントおよびマルチエージェント環境に適用し、テスト時間中に分散シフトを導入する影響を示す。本手法は,分布シフトの存在下での厳密なrl評価への第一歩として提案する。 Research in machine learning is making progress in fixing its own reproducibility crisis. Reinforcement learning (RL), in particular, faces its own set of unique challenges. Comparison of point estimates, and plots that show successful convergence to the optimal policy during training, may obfuscate overfitting or dependence on the experimental setup. Although researchers in RL have proposed reliability metrics that account for uncertainty to better understand each algorithm's strengths and weaknesses, the recommendations of past work do not assume the presence of out-of-distribution observations. We propose a set of evaluation methods that measure the robustness of RL algorithms under distribution shifts. The tools presented here argue for the need to account for performance over time while the agent is acting in its environment. In particular, we recommend time series analysis as a method of observational RL evaluation. We also show that the unique properties of RL and simulated dynamic environments allow us to make stronger assumptions to justify the measurement of causal impact in our evaluations. 
We then apply these tools to single-agent and multi-agent environments to show the impact of introducing distribution shifts during test time. We present this methodology as a first step toward rigorous RL evaluation in the presence of distribution shifts.	翻訳日:2024-02-07 17:37:01 公開日:2024-02-05
# 拡散世界モデル Diffusion World Model ( http://arxiv.org/abs/2402.03570v1 ) ライセンス: Link先を確認	Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng	(参考訳) 我々は,多段階の将来の状態と報酬を同時に予測できる条件拡散モデルである拡散世界モデル(DWM)を紹介する。従来のワンステップのダイナミックスモデルとは対照的に、DWMは1つのフォワードパスで長い水平予測を提供し、再帰的なクエリは不要である。我々はDWMをモデルベース値推定に統合し、DWMからサンプリングした将来の軌跡によって短期的回帰をシミュレートする。オフライン強化学習の文脈では、DWMは生成モデルによる保守的な価値正規化と見なすことができる。あるいは、合成データによるオフラインQ-ラーニングを可能にするデータソースとして見ることもできる。 D4RLデータセットに対する実験により,DWMの長軸シミュレーションに対するロバスト性が確認された。絶対性能の面では、DWMは1ステップのダイナミックスモデルを大幅に上回り、$44\%$の性能向上を達成し、最先端の性能を実現している。 We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a $44\%$ performance gain, and achieves state-of-the-art performance.	翻訳日:2024-02-07 17:36:40 公開日:2024-02-05
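The model-based value estimation described in the abstract above combines a simulated short-term return with a value at the end of the horizon. A minimal sketch of that arithmetic follows; the list of rewards stands in for the rewards along one trajectory sampled from a world model, and `terminal_value` for a learned value estimate at the final predicted state. Both names and the default discount factor are assumptions for illustration, not the paper's code.

```python
def multistep_value(rewards, terminal_value, gamma=0.99):
    """H-step value estimate: discounted predicted rewards plus a bootstrapped tail.

    rewards        -- predicted rewards r_0 .. r_{H-1} along one sampled trajectory
    terminal_value -- value estimate at the state reached after H steps
    gamma          -- discount factor (assumed default)
    """
    value = 0.0
    for t, r in enumerate(rewards):
        value += (gamma ** t) * r
    return value + (gamma ** len(rewards)) * terminal_value
```

With `rewards=[1.0, 1.0]`, `terminal_value=2.0`, and `gamma=0.5`, the estimate is 1 + 0.5 + 0.25 × 2 = 2.0.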
# 格子とギャップ付き/ギャップレス位相上のゲージング Gauging on the lattice and gapped/gapless topological phases ( http://arxiv.org/abs/2402.03566v1 ) ライセンス: Link先を確認	Takamasa Ando	(参考訳) 本研究では,ガウス法則の制約がエネルギー的にのみ課される系において,効果的にゲージ化あるいはフェルミオン化することで得られるトポロジカルな物質相を探索する。従来のゲージ化やフェルミオン化とは対照的に、低エネルギーで効果的にゲージ化される対称性は、ヒルベルト空間全体に忠実に作用する大域対称性を依然として生成する。この対称性は、他の創発的対称性とともに非自明なトポロジカル相を保護するか、非自明な 't Hooft anomaly を持つことができる。我々は、これらの対称性を一般的な設定で含む位相応答作用の正確な公式と 't Hooft anomaly の式を求める。応用として, この手順の一般処理をギャップレスシステムに適用し, 低エネルギーでGu-Wenフェルミオン異常を運ぶような, 様々な新しいギャップレスSPT位相を求める。 In this work, we explore topological phases of matter obtained by effectively gauging or fermionizing the system, where the Gauss law constraint is only enforced energetically. In contrast to conventional gauging or fermionization, the symmetry that is effectively gauged in low energy still generates a global symmetry that acts on the whole Hilbert space faithfully. This symmetry turns out to protect a non-trivial topological phase with other emergent symmetry, or can have a non-trivial 't Hooft anomaly. We provide a precise formula for the topological response action involving these symmetries in a general setup, as well as find a formula for 't Hooft anomalies. As an application, we apply the general treatment of the procedure to gapless systems and find various new gapless SPT phases, such as the one carrying the Gu-Wen fermionic anomalies at low energy.	翻訳日:2024-02-07 17:36:26 公開日:2024-02-05
# SkipPredict: スケジューリングの予測にいつ投資するか SkipPredict: When to Invest in Predictions for Scheduling ( http://arxiv.org/abs/2402.03564v1 ) ライセンス: Link先を確認	Rana Shahout, Michael Mitzenmacher	(参考訳) 予測されたジョブサイズによるスケジューリングに関する最近の研究を踏まえ、待ち行列システムにおける予測コストの影響を考察し、予測がシステムのリソースやコストフリーとは無関係であるという前提を以前の研究から取り除いた。特に,従来の予測手法であるSkipPredictを導入し,そのコストに対処する手法を提案する。すべてのジョブに均一に予測を適用するのではなく、予測要求に基づいてジョブを分類する、カスタマイズされたアプローチを提案する。これを実現するために、1ビットの "cheap predictions" を使い、ジョブを短いか長いかのどちらかに分類する。 skippredictは、長いジョブよりも予測された短いジョブを優先し、skippredictは、これらのジョブの最も短い処理時間を近似するために、より詳細な "expensive predictions" の第2ラウンドを適用する。我々の分析は予測のコストを考慮している。 2つの異なるモデルに対するこのコストの影響について検討する。外部コストモデルでは、ジョブサービス時間に影響を与えることなく、コストを伴わない外部メソッドによって予測が生成される。サーバ時間コストモデルでは、予測自体がサーバ処理時間を必要とし、ジョブと同じサーバ上でスケジュールされる。 In light of recent work on scheduling with predicted job sizes, we consider the effect of the cost of predictions in queueing systems, removing the assumption in prior research that predictions are external to the system's resources and/or cost-free. In particular, we introduce a novel approach to utilizing predictions, SkipPredict, designed to address their inherent cost. Rather than uniformly applying predictions to all jobs, we propose a tailored approach that categorizes jobs based on their prediction requirements. To achieve this, we employ one-bit "cheap predictions" to classify jobs as either short or long. SkipPredict prioritizes predicted short jobs over long jobs, and for the latter, SkipPredict applies a second round of more detailed "expensive predictions" to approximate Shortest Remaining Processing Time for these jobs. Our analysis takes into account the cost of prediction. We examine the effect of this cost for two distinct models. In the external cost model, predictions are generated by some external method without impacting job service times but incur a cost. In the server time cost model, predictions themselves require server processing time, and are scheduled on the same server as the jobs.	翻訳日:2024-02-07 17:36:09 公開日:2024-02-05
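The two-tier prediction idea in the SkipPredict abstract above can be sketched for a static batch of jobs. This toy version ignores arrivals over time, preemption, and prediction cost, and it assumes predicted-short jobs are served in arrival order; `is_short` and `predict_size` are hypothetical stand-ins for the cheap and expensive predictors, so this is an illustration rather than the authors' policy.

```python
import heapq


def skip_predict_order(jobs, is_short, predict_size):
    """Return (service order, number of expensive predictions) for a batch of jobs.

    is_short(job)     -- cheap one-bit prediction: True if the job looks short
    predict_size(job) -- expensive prediction, only called for predicted-long jobs
    """
    heap = []
    expensive_calls = 0
    for arrival, job in enumerate(jobs):
        if is_short(job):
            # Class 0: predicted short, served first in arrival order (assumption).
            key = (0, arrival)
        else:
            # Class 1: predicted long, smallest predicted size first.
            expensive_calls += 1
            key = (1, predict_size(job))
        # `arrival` breaks ties so the job objects themselves are never compared.
        heapq.heappush(heap, (key, arrival, job))
    order = [heapq.heappop(heap)[2] for _ in range(len(heap))]
    return order, expensive_calls
```

The point of the design is visible in the second return value: the expensive predictor is skipped for every job the cheap predictor labels short.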
# 未知の言語モデルから未知の言語を識別する Distinguishing the Knowable from the Unknowable with Language Models ( http://arxiv.org/abs/2402.03563v1 ) ライセンス: Link先を確認	Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman	(参考訳) 本研究では,自由形式テキスト上での大規模言語モデル(llm)の出力における認識的不確実性(知識の欠如を反映する)の同定の可能性について検討した。地中真理確率の欠如において, LLMの不確かさを(ほぼ)解消するために, 地中真理の代用として, はるかに大きなモデルが成立する環境を探究する。凍った事前学習されたモデルの埋め込みに基づいて訓練された小さな線形プローブは、より大きなモデルがトークンレベルでより自信を持つようになるタイミングを正確に予測し、あるテキストドメインで訓練されたプローブが他のものに一般化することを示す。さらに,同一タスクにおいて非自明な精度を実現する完全教師なし手法を提案する。まとめて、これらの結果は、LLMが様々な種類の不確実性の内的表現を自然に含んでいるという証拠として解釈し、様々な実践的な環境でモデル信頼性のより有益な指標を考案する可能性がある。 We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text. In the absence of ground-truth probabilities, we explore a setting where, in order to (approximately) disentangle a given LLM's uncertainty, a significantly larger model stands in as a proxy for the ground truth. We show that small linear probes trained on the embeddings of frozen, pretrained models accurately predict when larger models will be more confident at the token level and that probes trained on one text domain generalize to others. Going further, we propose a fully unsupervised method that achieves non-trivial accuracy on the same task. Taken together, we interpret these results as evidence that LLMs naturally contain internal representations of different types of uncertainty that could potentially be leveraged to devise more informative indicators of model confidence in diverse practical settings.	翻訳日:2024-02-07 17:35:48 公開日:2024-02-05
# vln-video: 屋外視言語ナビゲーションにおける運転映像の活用 VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation ( http://arxiv.org/abs/2402.03561v1 ) ライセンス: Link先を確認	Jialu Li, Aishwarya Padmakumar, Gaurav Sukhatme, Mohit Bansal	(参考訳) アウトドアビジョン・アンド・ランゲージナビゲーション(VLN)では、エージェントが自然言語の指示に基づいて現実的な3D屋外環境をナビゲートする必要がある。既存のVLN法の性能は、ナビゲーション環境の多様性の不足と限られたトレーニングデータによって制限される。これらの課題に対処するため,米国内の複数の都市において,映像の運転中に発生する多様な屋外環境を利用して,自動生成ナビゲーション命令とアクションを付加して,屋外VLN性能を向上させるVLN-Videoを提案する。 VLN-Videoは、直感的な古典的アプローチと近代的なディープラーニング技術を組み合わせて、テンプレートインフィルを使用して基底ナビゲーション命令を生成し、画像回転類似性に基づくナビゲーションアクション予測器と組み合わせて、ディープラーニングVLNモデルを事前学習するためのビデオからVLNスタイルのデータを取得する。我々は、Touchdownデータセット上のモデルと、3つのプロキシタスクで動画の駆動から生成されたビデオ強化データセット、すなわち、マスケド言語モデリング、インストラクションとトラジェクトリマッチング、およびNext Action Predictionを事前トレーニングし、時間的に認識され、視覚的に整列された命令表現を学ぶ。学習した命令表現は、Touchdownデータセットの微調整時に最先端のナビゲータに適合する。実証実験の結果、VLN-Videoは従来の最先端モデルよりも2.1%向上し、Touchdownデータセット上で新しい最先端モデルを実現している。 Outdoor Vision-and-Language Navigation (VLN) requires an agent to navigate through realistic 3D outdoor environments based on natural language instructions. The performance of existing VLN methods is limited by insufficient diversity in navigation environments and limited training data. To address these issues, we propose VLN-Video, which utilizes the diverse outdoor environments present in driving videos in multiple cities in the U.S. augmented with automatically generated navigation instructions and actions to improve outdoor VLN performance. VLN-Video combines the best of intuitive classical approaches and modern deep learning techniques, using template infilling to generate grounded navigation instructions, combined with an image rotation similarity-based navigation action predictor to obtain VLN style data from driving videos for pretraining deep learning VLN models. 
We pre-train the model on the Touchdown dataset and our video-augmented dataset created from driving videos with three proxy tasks: Masked Language Modeling, Instruction and Trajectory Matching, and Next Action Prediction, so as to learn temporally-aware and visually-aligned instruction representations. The learned instruction representation is adapted to the state-of-the-art navigator when fine-tuning on the Touchdown dataset. Empirical results demonstrate that VLN-Video significantly outperforms previous state-of-the-art models by 2.1% in task completion rate, achieving a new state-of-the-art on the Touchdown dataset.	翻訳日:2024-02-07 17:35:27 公開日:2024-02-05
# 制約満足のための射影生成拡散モデル Projected Generative Diffusion Models for Constraint Satisfaction ( http://arxiv.org/abs/2402.03559v1 ) ライセンス: Link先を確認	Jacob K Christopher, Stephen Baek, Ferdinando Fioretto	(参考訳) 生成拡散モデルは、逐次過程を通じて生ノイズからコヒーレントな内容の堅牢な合成に優れる。しかしながら、特定の厳格な基準に準拠するために出力を必要とするシナリオでの直接適用は、いくつかの厳しい課題に直面している。本稿では,これらの課題を克服し,従来の拡散モデルのサンプリングを制約付き最適化問題にリキャストするPGDM(Projected Generative Diffusion Models)を提案する。これにより、生成されたデータが特定の制約や物理的原則に忠実に準拠することを保証する反復投影法の適用が可能になる。本稿では,制限された制約クラスの下でPGDMが実現可能な部分分布から出力を合成できることを理論的に裏付けるとともに,複雑な非凸制約や常微分方程式の場合についても広範な実証的証拠を示す。これらの能力は、ビデオ生成における物理インフォームドモーション、経路計画における軌道最適化、材料科学における形態的特性の遵守によって実証される。 Generative diffusion models excel at robustly synthesizing coherent content from raw noise through a sequential process. However, their direct application in scenarios requiring outputs to adhere to specific, stringent criteria faces several severe challenges. This paper aims to overcome these challenges and introduces Projected Generative Diffusion Models (PGDM), an approach that recasts sampling from traditional diffusion models as a constrained-optimization problem. This enables the application of an iterative projection method to ensure that generated data faithfully adheres to specified constraints or physical principles. This paper provides theoretical support for the ability of PGDM to synthesize outputs from a feasible subdistribution under a restricted class of constraints while also providing extensive empirical evidence in the case of complex non-convex constraints and ordinary differential equations. These capabilities are demonstrated by physics-informed motion in video generation, trajectory optimization in path planning, and adherence to morphometric properties in materials science.	翻訳日:2024-02-07 17:34:59 公開日:2024-02-05
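The idea of interleaving sampling steps with projections onto a feasible set, as described in the abstract above, can be sketched on a toy problem. Here an annealed Langevin-style update with a hand-written score function stands in for a trained diffusion model's reverse step, and the constraint is a simple box; all names, the step schedule, and the toy score are assumptions for illustration, not the PGDM implementation.

```python
import random


def project_to_box(x, low, high):
    """Euclidean projection of a point onto the box [low, high]^d."""
    return [min(max(v, low), high) for v in x]


def projected_sampling(score, project, dim, steps=50, step_size=0.1, seed=0):
    """Run noisy score-following steps, projecting onto the feasible set each time."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from raw noise
    for t in range(steps):
        # Noise is annealed linearly to zero over the course of sampling.
        noise_scale = (2 * step_size) ** 0.5 * (1 - (t + 1) / steps)
        g = score(x)
        x = [
            xi + step_size * gi + noise_scale * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
        x = project(x)  # enforce the constraint after every update
    return x


# Toy usage: the score of a unit Gaussian centred at 3, constrained to [0, 1]^2.
sample = projected_sampling(
    score=lambda x: [3.0 - xi for xi in x],
    project=lambda x: project_to_box(x, 0.0, 1.0),
    dim=2,
)
```

Because the projection is the last operation of every step, the returned sample always satisfies the box constraint, whatever the score function does.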
# 遅い地震解析のためのパスシグネチャとグラフニューラルネットワーク:より良い一緒に? Path Signatures and Graph Neural Networks for Slow Earthquake Analysis: Better Together? ( http://arxiv.org/abs/2402.03558v1 ) ライセンス: Link先を確認	Hans Riess, Manolis Veveakis, Michael M. Zavlanos	(参考訳) 機械学習コミュニティで最近成功したパスシグネチャは、不規則なパスからエンジニアリング特徴を理論的に駆動する手法である。一方、グラフニューラルネットワーク(GNN)は、グラフ上のデータを処理するためのニューラルネットワークであり、センサネットワークのような不規則なドメインを持つタスクに優れる。本稿では,グラフ畳み込みニューラルネットワーク(GCNN)に経路シグネチャを統合し,特徴抽出のための経路シグネチャと空間的相互作用を扱うGCNNの強みを活用する新しいアプローチ,PS-GCNNを提案する。本手法は, ニュージーランド北島東海岸のGPSセンサネットワークを事例として, GPS時系列データを利用したスロースリップイベント(SSE)解析に応用した。同様の反応拡散現象をモデル化した確率微分方程式のシミュレーションに対する本手法のベンチマークも確立した。提案手法は,地震予知とセンサネットワーク解析の今後の進歩を示すものである。 The path signature, having enjoyed recent success in the machine learning community, is a theoretically-driven method for engineering features from irregular paths. On the other hand, graph neural networks (GNN), neural architectures for processing data on graphs, excel on tasks with irregular domains, such as sensor networks. In this paper, we introduce a novel approach, Path Signature Graph Convolutional Neural Networks (PS-GCNN), integrating path signatures into graph convolutional neural networks (GCNN), and leveraging the strengths of both path signatures, for feature extraction, and GCNNs, for handling spatial interactions. We apply our method to analyze slow earthquake sequences, also called slow slip events (SSE), utilizing data from GPS timeseries, with a case study on a GPS sensor network on the east coast of New Zealand's North Island. We also establish benchmarks for our method on simulated stochastic differential equations, which model similar reaction-diffusion phenomena. Our methodology shows promise for future advancement in earthquake prediction and sensor network analysis.	翻訳日:2024-02-07 17:34:43 公開日:2024-02-05
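For readers unfamiliar with the path signature mentioned in the abstract above, the first two signature levels of a piecewise-linear path can be computed with a few lines of standard-library Python. The sketch uses Chen's identity to accumulate segment by segment; the function name is hypothetical, and the truncation at depth two is an assumption for illustration, not the feature pipeline used in the paper.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    points -- list of d-dimensional points (lists of floats) visited in order.
    Returns (s1, s2), where s1[i] is the total increment in coordinate i and
    s2[i][j] is the iterated integral of dx_i followed by dx_j.
    """
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        for i in range(d):
            for j in range(d):
                # Chen's identity: the cross term with the path so far, plus the
                # level-2 signature of a straight segment (delta ⊗ delta / 2).
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2
```

For the path (0,0) → (1,0) → (1,1), the level-1 term is the net displacement (1, 1), and the antisymmetric part of level 2, (s2[0][1] − s2[1][0]) / 2 = 0.5, is the signed area between the path and its chord.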
# GANの潜在空間における方向検出によるワンショットニューラルフェイス再現 One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space ( http://arxiv.org/abs/2402.03553v1 ) ライセンス: Link先を確認	Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos	(参考訳) 本稿では,3次元顔の向きと表情を元顔に転送することを目的とした,ニューラルフェイス/頭部再現のための枠組みを提案する。従来の手法では、アイデンティティと頭の位置/表情の不等角性のための埋め込みネットワークの学習に重点を置いており、これはかなり難しい作業であり、生成された画像の品質が低下することを示している。我々は、高品質な顔画像を生成することができる(微調整)訓練済みのGANを使用することで、そのようなネットワークのトレーニングを回避し、異なるアプローチをとる。 GANは制御性の弱さを特徴とするので,本手法のコアとなるのは,GAN空間のどの方向が頭部のポーズや表現の変動を制御するかを知る方法である。本稿では,3次元形状モデルを用いて,頭部のポーズ,アイデンティティ,表現のための非交叉方向を本質的にキャプチャする簡単なパイプラインを提案する。さらに,GAN潜在空間に実画像を埋め込むことで,実世界の顔の再現に有効であることを示す。提案手法は, 単一音源画像(ワンショット)の使用や, 対人再現など, いくつかの特性を特徴とする。広範な質的定量的な結果から,本手法はvoxceleb1と2の標準ベンチマークにおいて,最先端の手法で生成した顔よりも高い品質で再現された顔を生成する。 In this paper, we present our framework for neural face/head reenactment whose goal is to transfer the 3D head orientation and expression of a target face to a source face. Previous methods focus on learning embedding networks for identity and head pose/expression disentanglement which proves to be a rather hard task, degrading the quality of the generated images. We take a different approach, bypassing the training of such networks, by using (fine-tuned) pre-trained GANs which have been shown capable of producing high-quality facial images. Because GANs are characterized by weak controllability, the core of our approach is a method to discover which directions in latent GAN space are responsible for controlling head pose and expression variations. We present a simple pipeline to learn such directions with the aid of a 3D shape model which, by construction, inherently captures disentangled directions for head pose, identity, and expression. Moreover, we show that by embedding real images in the GAN latent space, our method can be successfully used for the reenactment of real-world faces. 
Our method features several favorable properties including using a single source image (one-shot) and enabling cross-person reenactment. Extensive qualitative and quantitative results show that our approach typically produces reenacted faces of notably higher quality than those produced by state-of-the-art methods for the standard benchmarks of VoxCeleb1 & 2.	翻訳日:2024-02-07 17:34:24 公開日:2024-02-05
# モンタナ州の2020年連邦議会選挙区再編地図の振り返り分析 A retrospective analysis of Montana's 2020 congressional redistricting map ( http://arxiv.org/abs/2402.03551v1 ) ライセンス: Link先を確認	Kelly McKinnie and Erin Szalda-Petree	(参考訳) 2020年の国勢調査の結果、モンタナ州の連邦下院議員は1人から2人に増加した。州は2022年11月の連邦議会選挙に間に合うよう2021年に選挙区再編を行い、州を2つの選挙区に分けた。本稿では,再編過程を分析し,採択された選挙区地図と他の全ての可能な地図の空間を比較する。特に,これらの地図の人口偏差,コンパクト性,政治的結果について考察する。また,可能な地図の空間からサンプリングする2つの一般的なサンプリング手法が,これらの指標の真の分布をどの程度よく近似するかを検討する。 The 2020 decennial census data resulted in an increase from one to two congressional representatives in the state of Montana. The state underwent its redistricting process in 2021 in time for the November 2022 congressional elections, carving the state into two districts. This paper analyzes the redistricting process and compares the adopted congressional map to the space of all other possible maps. In particular, we look at the population deviation, compactness and political outcomes of these maps. We also consider how well two popular sampling techniques that sample from the space of possible maps approximate the true distributions of these measures.	翻訳日:2024-02-07 17:33:59 公開日:2024-02-05
# AnaMoDiff:Disentangled Denoisingによる2次元アナロジー運動拡散 AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising ( http://arxiv.org/abs/2402.03549v1 ) ライセンス: Link先を確認	Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang	(参考訳) 本稿では,関節を持つキャラクターの注釈なしの生の動画に適用される,2次元モーションアナロジーのための拡散に基づく新しい手法であるAnaMoDiffについて述べる。我々の目標は、ソースキャラクターと駆動キャラクターの間で部位の比率や動きの速度・スタイルに大きな相違がある場合であっても、外観と自然な動きの観点でソースキャラクターの同一性を十分に保ったまま、2次元駆動映像の動きをソースキャラクターに正確に転写することである。拡散モデルでは,ノイズを加えた潜在空間で動作する潜在光フロー(LOF)ネットワークを介して入力運動を伝達する。これは空間を考慮し、元のRGBビデオと比較して処理が効率的であり、高密度な動きであっても拡散デノイジング過程を通じてアーティファクトに耐性がある。動作アナロジーとアイデンティティ保存の両方を達成するために,2つのノイズレベルで動作しながら,特徴を分離した形でデノイジングモデルを訓練する。ソースの同一性を示す特徴は従来のノイズ注入によって学習されるが、ポーズと手足を含む動き特性はより高次の特徴によって符号化されるという前提のもと、動き特徴は大きな値のノイズのみを注入することでLOFでワープした動画から学習される。実験により,本手法は動作アナロジーとアイデンティティ保存の最良のトレードオフを実現することを示す。 We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies that is applied to raw, unannotated videos of articulated characters. Our goal is to accurately transfer motions from a 2D driving video onto a source character, with its identity, in terms of appearance and natural movement, well preserved, even when there may be significant discrepancies between the source and driving characters in their part proportions and movement speed and styles. Our diffusion model transfers the input motion via a latent optical flow (LOF) network operating in a noised latent space, which is spatially aware, efficient to process compared to the original RGB videos, and artifact-resistant through the diffusion denoising process even amid dense movements. To accomplish both motion analogy and identity preservation, we train our denoising model in a feature-disentangled manner, operating at two noise levels. While identity-revealing features of the source are learned via conventional noise injection, motion features are learned from LOF-warped videos by only injecting noise with large values, with the stipulation that motion properties involving pose and limbs are encoded by higher-level features. 
Experiments demonstrate that our method achieves the best trade-off between motion analogy and identity preservation.	翻訳日:2024-02-07 17:33:49 公開日:2024-02-05
# シングルGPU GNNシステム - トラップと落とし穴 Single-GPU GNN Systems: Traps and Pitfalls ( http://arxiv.org/abs/2402.03548v1 ) ライセンス: Link先を確認	Yidong Gong, Arnab Tarafder, Saima Afrin, and Pradeep Kumar	(参考訳) 現在のグラフニューラルネットワーク(GNN)システムには、トレーニング精度の結果を示さず、評価を直接あるいは間接的に主として小さなデータセットに依存するという明確な傾向がある。詳細な分析から,これがシステム設計と評価プロセスにおける落とし穴の連鎖につながり,提案されている多くのシステム最適化の実用性に疑問を投げかけ,結論や得られた教訓に影響を及ぼすことが示された。多くのシングルGPUシステムを分析し、これらの落とし穴の根本的な影響を示す。さらに仮説,勧告,評価手法を開発し,今後の方向性を示す。最後に,システム設計の落とし穴を効率的かつ実用的に解決することに根ざした新たな最適化の系統を確立するために,新しい参照システムを開発した。提案した設計は既存研究に生産的に統合できるため、真に最先端の技術を前進させる。 The current graph neural network (GNN) systems have established a clear trend of not showing training accuracy results and of relying mainly, directly or indirectly, on smaller datasets for evaluation. Our in-depth analysis shows that this leads to a chain of pitfalls in the system design and evaluation process, questioning the practicality of many of the proposed system optimizations, and affecting conclusions and lessons learned. We analyze many single-GPU systems and show the fundamental impact of these pitfalls. We further develop hypotheses, recommendations, and evaluation methodologies, and provide future directions. Finally, a new reference system is developed to establish a new line of optimizations rooted in solving the system-design pitfalls efficiently and practically. The proposed design can productively be integrated into prior works, thereby truly advancing the state-of-the-art.	翻訳日:2024-02-07 17:33:24 公開日:2024-02-05
# 畳み込みニューラルネットワークのための新しいAUROC損失関数を用いた小児低グレード神経上皮腫瘍の分子サブタイプ同定の改善 Improving Pediatric Low-Grade Neuroepithelial Tumors Molecular Subtype Identification Using a Novel AUROC Loss Function for Convolutional Neural Networks ( http://arxiv.org/abs/2402.03547v1 ) ライセンス: Link先を確認	Khashayar Namdar, Matthias W. Wagner, Cynthia Hawkins, Uri Tabori, Birgit B. Ertl-Wagner, Farzad Khalvati	(参考訳) 小児低グレード神経上皮性腫瘍 (PLGNT) は小児がんの中で最も多い種類であり,小児の脳腫瘍の40%を占める。PLGNTの分子サブタイプの同定は治療計画に不可欠である。しかし、PLGNTサブタイプを決定するゴールドスタンダードは生検であり、患者にとって非実用的あるいは危険な場合がある。本研究は、モデルの受信者動作特性(ROC)曲線下面積(AUROC)を特に改善する損失関数を導入することで、MRIスキャンによるPLGNTサブタイプ分類における畳み込みニューラルネットワーク(CNN)の性能を改善し、非侵襲的な診断の代替手段を提供する。本研究では, PLGNTの小児339例 (BRAF融合143例, BRAF V600E変異71例, 非BRAF 125例) の後ろ向きデータセットを作成した。我々はモンテカルロ法によるランダムなデータ分割を用いてCNNモデルを評価した。ベースラインモデルはBCE(binary cross entropy)を用いて訓練し,BRAF融合とBRAF V600E変異の鑑別で86.11%のAUROCを達成した。これは提案するAUROC損失関数により87.71%に改善された(p-value 0.045)。マルチクラス分類では、AUROCは74.42%から76.59%に改善された (p-value 0.0016)。 Pediatric Low-Grade Neuroepithelial Tumors (PLGNT) are the most common pediatric cancer type, accounting for 40% of brain tumors in children, and identifying PLGNT molecular subtype is crucial for treatment planning. However, the gold standard to determine the PLGNT subtype is biopsy, which can be impractical or dangerous for patients. This research improves the performance of Convolutional Neural Networks (CNNs) in classifying PLGNT subtypes through MRI scans by introducing a loss function that specifically improves the model's Area Under the Receiver Operating Characteristic (ROC) Curve (AUROC), offering a non-invasive diagnostic alternative. In this study, a retrospective dataset of 339 children with PLGNT (143 BRAF fusion, 71 with BRAF V600E mutation, and 125 non-BRAF) was curated. We employed a CNN model with Monte Carlo random data splitting. 
The baseline model was trained using binary cross entropy (BCE), and achieved an AUROC of 86.11% for differentiating BRAF fusion and BRAF V600E mutations, which was improved to 87.71% using our proposed AUROC loss function (p-value 0.045). With multiclass classification, the AUROC improved from 74.42% to 76.59% (p-value 0.0016).	翻訳日:2024-02-07 17:33:11 公開日:2024-02-05
# オンライン特徴更新によりオンライン(一般化)ラベルシフト適応が改善 Online Feature Updates Improve Online (Generalized) Label Shift Adaptation ( http://arxiv.org/abs/2402.03545v1 ) ライセンス: Link先を確認	Ruihan Wu and Siddhartha Datta and Yi Su and Dheeraj Baby and Yu-Xiang Wang and Kilian Q. Weinberger	(参考訳) 本稿では、時間とともにデータ分布が変化し、タイムリーなラベルを取得することが困難である、ラベルが欠落したオンライン設定におけるラベルシフトの問題に対処する。既存の手法は主に事前訓練された分類器の最終レイヤの調整や更新に重点を置いているが、我々はテスト時にラベルのないデータを使って特徴表現を強化するという未開拓の可能性を探る。提案手法であるオンライン特徴更新を用いたオンラインラベルシフト適応法(OLS-OFU)は,自己教師付き学習を活用して特徴抽出を洗練し,予測モデルを改善する。理論的分析により、OLS-OFUは特徴改善のための自己教師付き学習を活用することでアルゴリズムのリグレットを減らすことが確認された。オンラインラベルシフトと一般化ラベルシフトの両条件下での様々なデータセットに関する実証的研究は、特にドメインシフトの場合において、OLS-OFUの有効性と堅牢性を強調している。 This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. Our novel method, Online Label Shift adaptation with Online Feature Updates (OLS-OFU), leverages self-supervised learning to refine the feature extraction process, thereby improving the prediction model. Theoretical analyses confirm that OLS-OFU reduces algorithmic regret by capitalizing on self-supervised learning for feature refinement. Empirical studies on various datasets, under both online label shift and generalized label shift conditions, underscore the effectiveness and robustness of OLS-OFU, especially in cases of domain shifts.	翻訳日:2024-02-07 17:32:43 公開日:2024-02-05
# 自転車シェアリングシステムにおける動的リバランシングのための強化学習手法 A Reinforcement Learning Approach for Dynamic Rebalancing in Bike-Sharing System ( http://arxiv.org/abs/2402.03589v1 ) ライセンス: Link先を確認	Jiaqi Liang, Sanjay Dominik Jena, Defeng Liu, Andrea Lodi	(参考訳) 自転車シェアリングシステムはエコフレンドリーな都市移動を提供し、交通渋滞の緩和と健康的なライフスタイルに寄与している。このようなシステムを効率的に運用し、高い顧客満足度を維持することは、利用需要の確率的な性質により駅が満車または空になるため困難である。したがって、車両を用いて駅間で自転車を再分配する効果的なリバランス戦略の開発は、オペレーターにとって極めて重要である。古典的な数理最適化に代わる有望な手法として、逐次的な意思決定問題を解くために強化学習が広まりつつある。本稿では,複数車両の動的リバランス問題に対する時空間強化学習アルゴリズムを提案する。まず,問題を連続時間の枠組みにおけるマルチエージェントマルコフ決定過程として定式化する。これにより、独立かつ協調的な車両のリバランスが可能となり、車両の出発が同期される時間離散モデルの非現実的な制限が排除される。次に,多様な需要シナリオで即時報酬を計算して学習プロセスを容易にするために,先着順(first-arrive-first-serve)ルールに基づく総合シミュレータを開発した。価値関数を推定しリバランスポリシーを学習するために、様々な深層Qネットワーク構成をテストし、失われた需要を最小化する。時間的要因と気象要因の両方に影響される過去のデータから生成された様々なデータセットで実験を行った。提案アルゴリズムは、失われた需要の観点から、多期間混合整数計画モデルを含むベンチマークより優れている。訓練が完了すると即時に決定が得られるため、リアルタイムアプリケーションに適している。我々の研究は、オペレーターに実践的な洞察を提供し、動的リバランス問題への強化学習の統合を進め、よりインテリジェントで堅牢な都市モビリティソリューションへの道を開く。 Bike-Sharing Systems provide eco-friendly urban mobility, contributing to the alleviation of traffic congestion and to healthier lifestyles. Efficiently operating such systems and maintaining high customer satisfaction is challenging due to the stochastic nature of trip demand, leading to full or empty stations. Devising effective rebalancing strategies using vehicles to redistribute bikes among stations is therefore of utmost importance for operators. As a promising alternative to classical mathematical optimization, reinforcement learning is gaining ground to solve sequential decision-making problems. This paper introduces a spatio-temporal reinforcement learning algorithm for the dynamic rebalancing problem with multiple vehicles. We first formulate the problem as a Multi-agent Markov Decision Process in a continuous time framework. This allows for independent and cooperative vehicle rebalancing, eliminating the impractical restriction of time-discretized models where vehicle departures are synchronized. 
A comprehensive simulator under the first-arrive-first-serve rule is then developed to facilitate the learning process by computing immediate rewards under diverse demand scenarios. To estimate the value function and learn the rebalancing policy, various Deep Q-Network configurations are tested, minimizing the lost demand. Experiments are carried out on various datasets generated from historical data, affected by both temporal and weather factors. The proposed algorithms outperform benchmarks, including a multi-period Mixed-Integer Programming model, in terms of lost demand. Once trained, they yield immediate decisions, making them suitable for real-time applications. Our work offers practical insights for operators and enriches the integration of reinforcement learning into dynamic rebalancing problems, paving the way for more intelligent and robust urban mobility solutions.	翻訳日:2024-02-07 17:23:21 公開日:2024-02-05
# ダブルヘッド識別器による連続的ドメイン敵対的適応 Continual Domain Adversarial Adaptation via Double-Head Discriminators ( http://arxiv.org/abs/2402.03588v1 ) ライセンス: Link先を確認	Yan Shen and Zhanghexuan Ji and Chunwei Ma and Mingchen Gao	(参考訳) 連続的な設定でのドメイン敵対的適応は、以前のソースドメインデータへのアクセスが制限されるため、大きな課題となる。連続学習における広範な研究にもかかわらず、メモリ再生手法の標準設定である、保存された少数のソースドメインデータのみを用いて、敵対的適応のタスクを効果的に行うことはできない。この制限は、ソースドメインサンプルが少ない場合の $\mathcal{H}$-divergence の誤った経験的推定から生じる。そこで,本稿では,ソース学習段階でのみ訓練される追加のソース専用ドメイン識別器を導入することにより,ダブルヘッド識別器アルゴリズムを提案する。事前学習したソース専用ドメイン識別器を導入することで、$\mathcal{H}$-divergence に関連する敵対的損失の経験的推定誤差がソースドメイン側から減少することを証明する。既存のドメイン適応ベンチマークでのさらなる実験により,提案アルゴリズムはターゲットドメイン適応タスクのすべてのカテゴリにおいて2%以上の改善を達成し,ソースドメインの忘却を著しく軽減することを示した。 Domain adversarial adaptation in a continual setting poses a significant challenge due to the limitations on accessing previous source domain data. Despite extensive research in continual learning, the task of adversarial adaptation cannot be effectively accomplished using only a small number of stored source domain data, which is a standard setting in memory replay approaches. This limitation arises from the erroneous empirical estimation of $\mathcal{H}$-divergence with few source domain samples. To tackle this problem, we propose a double-head discriminator algorithm, by introducing an additional source-only domain discriminator that is trained solely during the source learning phase. We prove that with the introduction of a pre-trained source-only domain discriminator, the empirical estimation error of the $\mathcal{H}$-divergence related adversarial loss is reduced from the source domain side. Further experiments on existing domain adaptation benchmarks show that our proposed algorithm achieves more than 2% improvement on all categories of target domain adaptation tasks while significantly mitigating the forgetting on the source domain.	翻訳日:2024-02-07 17:22:54 公開日:2024-02-05
# アクティブ相関クラスタリングのための効果的な獲得関数 Effective Acquisition Functions for Active Correlation Clustering ( http://arxiv.org/abs/2402.03587v1 ) ライセンス: Link先を確認	Linus Aronsson, Morteza Haghir Chehreghani	(参考訳) 相関クラスタリングは、正と負の類似性をサポートする強力な教師なし学習パラダイムである。本稿では,その類似性が事前に分かっていないことを仮定する。代わりに私たちは、コスト効率のよい方法で類似性を反復的にクエリするために、アクティブな学習を採用しています。特に,この設定で使用する3つの効果的な獲得関数を開発する。 1つは矛盾の概念に基づいている(すなわち類似性が推移性に違反する場合)。残りの2つは情報理論量、すなわちエントロピーと情報ゲインに基づいている。 Correlation clustering is a powerful unsupervised learning paradigm that supports positive and negative similarities. In this paper, we assume the similarities are not known in advance. Instead, we employ active learning to iteratively query similarities in a cost-efficient way. In particular, we develop three effective acquisition functions to be used in this setting. One is based on the notion of inconsistency (i.e., when similarities violate the transitive property). The remaining two are based on information-theoretic quantities, i.e., entropy and information gain.	翻訳日:2024-02-07 17:22:38 公開日:2024-02-05
# デコーダのみの画像登録 Decoder-Only Image Registration ( http://arxiv.org/abs/2402.03585v1 ) ライセンス: Link先を確認	Xi Jia, Wenqi Lu, Xinxing Cheng, Jinming Duan	(参考訳) 教師なしの医療画像登録では、エンコーダ-デコーダネットワークアーキテクチャの利用が主なアプローチであり、与えられたペア画像から密度の高いフル解像度の変位場を正確に予測することができる。文献で広く使われているにもかかわらず、我々はこのようなアーキテクチャでエンコーダとデコーダの両方を学習可能にする必要性に疑問を呈する。本研究では,学習可能なデコーダのみを含む新しいネットワークアーキテクチャであるLessNetを提案し,学習可能なエンコーダの利用を完全に省略する。 LessNetは学習可能なエンコーダを手作りのシンプルな特徴に置き換え、エンコーダでネットワークパラメータを学習(最適化)する必要がなくなる。これにより、3D医療画像登録のためのコンパクトで効率的なデコーダのみのアーキテクチャとなる。公開されている2つの脳MRIデータセットで評価し,デコーダのみのLessNetが高密度変位場と微分同相変形場の両方を3Dで効果的かつ効率的に学習できることを実証した。さらに、デコーダのみのLessNetは、必要な計算資源が大幅に少ないにもかかわらず、VoxelMorphやTransMorphのような最先端のメソッドに匹敵する登録性能を達成できる。私たちのコードと事前トレーニングされたモデルは、https://github.com/xi-jia/LessNet で利用可能です。 In unsupervised medical image registration, the predominant approaches involve the utilization of an encoder-decoder network architecture, allowing for precise prediction of dense, full-resolution displacement fields from given paired images. Despite its widespread use in the literature, we question the necessity of making both the encoder and decoder learnable in such an architecture. To this end, we propose a novel network architecture, termed LessNet in this paper, which contains only a learnable decoder, while entirely omitting the utilization of a learnable encoder. LessNet substitutes the learnable encoder with simple, handcrafted features, eliminating the need to learn (optimize) network parameters in the encoder altogether. Consequently, this leads to a compact, efficient, and decoder-only architecture for 3D medical image registration. Evaluated on two publicly available brain MRI datasets, we demonstrate that our decoder-only LessNet can effectively and efficiently learn both dense displacement and diffeomorphic deformation fields in 3D. Furthermore, our decoder-only LessNet can achieve comparable registration performance to state-of-the-art methods such as VoxelMorph and TransMorph, while requiring significantly fewer computational resources. 
Our code and pre-trained models are available at https://github.com/xi-jia/LessNet.	翻訳日:2024-02-07 17:22:34 公開日:2024-02-05
# MQuinE:知識グラフ埋め込みモデルにおける「Zパラドックス」の治療法 MQuinE: a cure for "Z-paradox" in knowledge graph embedding models ( http://arxiv.org/abs/2402.03583v1 ) ライセンス: Link先を確認	Yang Liu, Huang Fang, Yunfeng Cai, Mingming Sun	(参考訳) 知識グラフ埋め込み(KGE)モデルは、リンク予測や情報検索を含む多くの知識グラフタスクにおいて最先端の結果を得た。KGEモデルの実用上の性能は優れているが、いくつかの一般的な既存KGEモデルには「Zパラドックス」と呼ばれる表現性の欠陥があることを発見した。 Zパラドックスの存在に触発されて、Zパラドックスに苦しむことなく、対称/非対称、逆、1-N/N-1/N-N、合成関係を含む様々な関係パターンをモデル化する強い表現性を理論的裏付けとともに保持する新しいKGEモデルであるMQuinEを提案する。実世界の知識ベースでの実験では、Zパラドックスは実際に既存のKGEモデルの性能を低下させ、いくつかの挑戦的なテストサンプルに対して20%以上の精度低下を引き起こしうることが示された。我々の実験はさらに、MQuinEがZパラドックスの負の影響を緩和し、リンク予測タスクにおいて既存のKGEモデルを目に見える差で上回ることを示した。 Knowledge graph embedding (KGE) models achieved state-of-the-art results on many knowledge graph tasks including link prediction and information retrieval. Despite the superior performance of KGE models in practice, we discover a deficiency in the expressiveness of some popular existing KGE models called Z-paradox. Motivated by the existence of Z-paradox, we propose a new KGE model called MQuinE that does not suffer from Z-paradox while preserving strong expressiveness to model various relation patterns including symmetric/asymmetric, inverse, 1-N/N-1/N-N, and composition relations with theoretical justification. Experiments on real-world knowledge bases indicate that Z-paradox indeed degrades the performance of existing KGE models, and can cause more than 20% accuracy drop on some challenging test samples. Our experiments further demonstrate that MQuinE can mitigate the negative impact of Z-paradox and outperform existing KGE models by a visible margin on link prediction tasks.	翻訳日:2024-02-07 17:22:11 公開日:2024-02-05
# ニューラルネットワーク初期化におけるGoldilocksゾーンの解体 Deconstructing the Goldilocks Zone of Neural Network Initialization ( http://arxiv.org/abs/2402.03579v1 ) ライセンス: Link先を確認	Artem Vysogorets, Anna Dawid, and Julia Kempe	(参考訳) トレーニング損失の2次特性は、ディープラーニングモデルの最適化ダイナミクスに大きな影響を与える。 Fort & Scherlis (2019) は、損失ヘッシアンの高い正の曲率と局所凸性が、「Goldilocksゾーン」と呼ばれる領域にある高度に訓練可能な初期点と関連していることを発見した。その後この関係に触れた研究はごくわずかであり、ほとんど説明されていないままである。本稿では,同次ニューラルネットワークのためのGoldilocksゾーンの厳密かつ包括的な解析を行う。特に、損失ヘッシアンの非零の正曲率をもたらす基本条件を導出し、それは従来の考えとは対照的に初期化ノルムとは付随的にしか関連しないと主張する。さらに、高い正曲率をモデルの信頼度、低い初期損失、およびこれまで知られていなかった種類のクロスエントロピー損失勾配消失に関連付ける。深層ネットワークの訓練可能性に対する正曲率の重要性を理解するため,Goldilocksゾーン外で全結合アーキテクチャと畳み込みアーキテクチャの両方を最適化し,創発的挙動を解析する。強力なモデル性能は必ずしもGoldilocksゾーンと一致しないことが分かり、この概念の実用的意義に疑問を投げかける。 The second-order properties of the training loss have a massive impact on the optimization dynamics of deep learning models. Fort & Scherlis (2019) discovered that a high positive curvature and local convexity of the loss Hessian are associated with highly trainable initial points located in a region coined the "Goldilocks zone". Only a handful of subsequent studies touched upon this relationship, so it remains largely unexplained. In this paper, we present a rigorous and comprehensive analysis of the Goldilocks zone for homogeneous neural networks. In particular, we derive the fundamental condition resulting in non-zero positive curvature of the loss Hessian and argue that it is only incidentally related to the initialization norm, contrary to prior beliefs. Further, we relate high positive curvature to model confidence, low initial loss, and a previously unknown type of vanishing cross-entropy loss gradient. To understand the importance of positive curvature for trainability of deep networks, we optimize both fully-connected and convolutional architectures outside the Goldilocks zone and analyze the emergent behaviors. 
We find that strong model performance is not necessarily aligned with the Goldilocks zone, which questions the practical significance of this concept.	翻訳日:2024-02-07 17:21:36 公開日:2024-02-05
# LLMマルチエージェントシステム:課題と未解決問題 LLM Multi-Agent Systems: Challenges and Open Problems ( http://arxiv.org/abs/2402.03578v1 ) ライセンス: Link先を確認	Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He	(参考訳) 本稿では,マルチエージェントシステムに関する既存研究を調査し,十分に対処されていない課題を特定する。マルチエージェントシステム内の個々のエージェントの多様な能力と役割を活用することで、これらのシステムはコラボレーションを通じて複雑なタスクに取り組むことができる。本稿では,タスク割り当ての最適化,反復的議論による堅牢な推論の促進,複雑で階層的なコンテキスト情報の管理,マルチエージェントシステム内の複雑なインタラクションを支援するためのメモリ管理の強化について論じる。また、ブロックチェーンシステムにおけるマルチエージェントシステムの潜在的な応用について検討し、現実の分散システムにおける今後の開発と応用に光を当てる。 This paper explores existing work on multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore the potential application of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.	翻訳日:2024-02-07 17:20:57 公開日:2024-02-05
# 統計的観点からのデータセットバイアス問題の再検討 Revisiting the Dataset Bias Problem from a Statistical Perspective ( http://arxiv.org/abs/2402.03577v1 ) ライセンス: Link先を確認	Kien Do, Dung Nguyen, Hung Le, Thao Le, Dang Nguyen, Haripriya Harikumar, Truyen Tran, Santu Rana, Svetha Venkatesh	(参考訳) 本稿では,統計学的観点から「データセットバイアス」問題を考察し,入力 $x$ におけるクラス属性 $u$ と非クラス属性 $b$ との強い相関,すなわち $p(u \mid b)$ が $p(u)$ と大きく異なることを問題の主な原因として特定する。 $p(u \mid b)$ は標準的な最大対数尤度(MLL)目的関数のサンプリング分布の一部として現れるため、MLLを通じてバイアスのあるデータセットで訓練されたモデルは本質的にそのような相関をパラメータに取り込み、偏りのないテストデータへの一般化が不十分になる。この観察から,各サンプル $n$ の目的関数を $\frac{1}{p(u_n \mid b_n)}$ で重み付けするか,あるいはそのサンプルを $\frac{1}{p(u_n \mid b_n)}$ に比例する重みでサンプリングすることにより,データセットバイアスを軽減することを提案する。どちらの方法も統計的に等価であるが、実際には前者の方がより安定で効果的である。さらに, 本デバイアス手法と因果推論との関連性を確立し, 提案手法の理論的基礎を補強する。しかし、バイアスラベルが利用できない場合、$p(u \mid b)$ を正確に計算するのは困難である。この課題を克服するために,「バイアス増幅」損失で訓練したバイアス付き分類器を用いて $\frac{1}{p(u \mid b)}$ を近似する手法を提案する。様々なバイアス付きデータセットに対する大規模な実験により、ほとんどの設定で提案手法が既存のデバイアス手法より優れていることを示し、理論解析を検証した。 In this paper, we study the "dataset bias" problem from a statistical standpoint, and identify the main cause of the problem as the strong correlation between a class attribute $u$ and a non-class attribute $b$ in the input $x$, represented by $p(u \mid b)$ differing significantly from $p(u)$. Since $p(u \mid b)$ appears as part of the sampling distributions in the standard maximum log-likelihood (MLL) objective, a model trained on a biased dataset via MLL inherently incorporates such correlation into its parameters, leading to poor generalization to unbiased test data. From this observation, we propose to mitigate dataset bias via either weighting the objective of each sample $n$ by $\frac{1}{p(u_n \mid b_n)}$ or sampling that sample with a weight proportional to $\frac{1}{p(u_n \mid b_n)}$. While both methods are statistically equivalent, the former proves more stable and effective in practice. Additionally, we establish a connection between our debiasing approach and causal reasoning, reinforcing our method's theoretical foundation. 
However, when the bias label is unavailable, computing $p(u \mid b)$ exactly is difficult. To overcome this challenge, we propose to approximate $\frac{1}{p(u \mid b)}$ using a biased classifier trained with "bias amplification" losses. Extensive experiments on various biased datasets demonstrate the superiority of our method over existing debiasing techniques in most settings, validating our theoretical analysis.	翻訳日:2024-02-07 17:20:39 公開日:2024-02-05
# $\ell_0$-bounded adversarial attackに対するadversarial trainingの一般化特性 Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks ( http://arxiv.org/abs/2402.03576v1 ) ライセンス: Link先を確認	Payam Delgosha, Hamed Hassani, Ramtin Pedarsani	(参考訳) ニューラルネットワークは、誤分類を引き起こす入力への小さな加法的摂動に弱いことが広く観察されている。本稿では,$\ell_0$-bounded adversarial attack に着目し,重要なクラスであるtruncated classifier(切断分類器)に対する敵対的訓練の性能を理論的に特徴付けることを目的とする。このような分類器は、$\ell_0$ 敵対的設定において、経験的にも、またガウス混合モデルでは理論的にも、強い性能を持つことが示されている。本論文の主な貢献は、$\ell_0$-bounded adversarial perturbation を伴う二項分類設定に対して、分布に依存しない新たな一般化境界を証明することである。この設定における一般化境界の導出には2つの大きな課題がある。 (i)切断内積が高度に非線形であること、 (ii) 敵対的訓練による$\ell_0$ボール上の最大化が非凸かつ高度に非滑らかであること。これらの課題に取り組むため,我々は,切断された仮説クラスの組合せ次元を抑える新しい符号化手法を開発した。 It has been widely observed that neural networks are vulnerable to small additive perturbations to the input causing misclassification. In this paper, we focus on the $\ell_0$-bounded adversarial attacks, and aim to theoretically characterize the performance of adversarial training for an important class of truncated classifiers. Such classifiers are shown to have strong performance empirically, as well as theoretically in the Gaussian mixture model, in the $\ell_0$-adversarial setting. The main contribution of this paper is to prove a novel generalization bound for the binary classification setting with $\ell_0$-bounded adversarial perturbation that is distribution-independent. Deriving a generalization bound in this setting has two main challenges: (i) the truncated inner product which is highly non-linear; and (ii) maximization over the $\ell_0$ ball due to adversarial training is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.	翻訳日:2024-02-07 17:19:10 公開日:2024-02-05
# 大規模マルチプレイヤーゲームにおけるヒューマンAIアライメントに向けて Toward Human-AI Alignment in Large-Scale Multi-Player Games ( http://arxiv.org/abs/2402.03575v1 ) ライセンス: Link先を確認	Sugandha Sharma, Guy Davidson, Khimya Khetarpal, Anssi Kanervisto, Udit Arora, Katja Hofmann, Ida Momennejad	(参考訳) 複雑なマルチエージェントゲームにおける人間とAIのアライメントを達成することは、ゲームプレイを強化する信頼できるAIエージェントを作成するために不可欠である。低レベルのポリシーではなく、高レベルな行動タスクに焦点を当てた、解釈可能なタスクセットフレームワークを用いて、このアライメントを評価する方法を提案する。このアプローチには3つのコンポーネントがある。まず,XboxのBleeding Edge(100K+ゲーム)の広範な人間のゲームプレイデータを解析し、複雑なタスク空間における行動パターンを明らかにする。このタスク空間は、ファイト・フライト(fight-flight)、探索・活用(explore-exploit)、ソロ・マルチエージェント(solo-multi-agent)といった解釈可能な軸を捉える行動多様体の基底集合として機能する。第2に、生成事前学習因果トランスフォーマーを用いてBleeding EdgeをプレイするようにAIエージェントを訓練し、その行動を測定する。第3に、提案した行動多様体に人間とAIのゲームプレイを射影し、比較と対比を行う。これにより、ポリシーの違いを高レベルな行動概念として解釈することができる。例えば、人間のプレイヤーがファイト・フライトや探索・活用の行動において変動を示す一方で、AIプレイヤーは均一性に向かう傾向がある。さらに、AIエージェントは主にソロプレイに従事し、人間はしばしば協調的・競争的なマルチエージェントパターンに従事している。これらの大きな違いは、人間に沿ったアプリケーションにおけるAIの解釈可能な評価、設計、統合の必要性を強調している。我々の研究は、AI、特に生成的AI研究におけるアライメントの議論を前進させ、マルチプレイヤーゲームにおける解釈可能な人間とエージェントのアライメントのための測定可能なフレームワークを提供する。 Achieving human-AI alignment in complex multi-agent games is crucial for creating trustworthy AI agents that enhance gameplay. We propose a method to evaluate this alignment using an interpretable task-sets framework, focusing on high-level behavioral tasks instead of low-level policies. Our approach has three components. First, we analyze extensive human gameplay data from Xbox's Bleeding Edge (100K+ games), uncovering behavioral patterns in a complex task space. This task space serves as a basis set for a behavior manifold capturing interpretable axes: fight-flight, explore-exploit, and solo-multi-agent. Second, we train an AI agent to play Bleeding Edge using a Generative Pretrained Causal Transformer and measure its behavior. Third, we project human and AI gameplay to the proposed behavior manifold to compare and contrast. 
This allows us to interpret differences in policy as higher-level behavioral concepts, e.g., we find that while human players exhibit variability in fight-flight and explore-exploit behavior, AI players tend towards uniformity. Furthermore, AI agents predominantly engage in solo play, while humans often engage in cooperative and competitive multi-agent patterns. These stark differences underscore the need for interpretable evaluation, design, and integration of AI in human-aligned applications. Our study advances the alignment discussion in AI and especially generative AI research, offering a measurable framework for interpretable human-agent alignment in multiplayer gaming.	翻訳日:2024-02-07 17:18:40 公開日:2024-02-05
# エンタングルメント強化量子計測:標準量子限界からハイゼンベルク限界へ Entanglement-enhanced quantum metrology: from standard quantum limit to Heisenberg limit ( http://arxiv.org/abs/2402.03572v1 ) ライセンス: Link先を確認	Jiahao Huang, Min Zhuang, Chaohong Lee	(参考訳) エンタングルメント強化量子計測は、測定精度を高めるために量子エンタングルメントの利用を探求する。プローブ内の粒子を量子もつれ状態に準備すると、粒子は測定対象の物理量に関する情報を集団的に蓄積し、標準量子限界を超えてハイゼンベルク限界に近づく測定精度の向上につながる。量子操作と検出技術の急速な進歩により、冷却原子や捕捉イオンのような人工量子系における多粒子もつれ状態の生成、操作、検出が可能になった。本稿では,量子計測における多粒子もつれを実証する基本原理と実験の進展を概観し,もつれ強化量子センサの応用可能性について考察する。 Entanglement-enhanced quantum metrology explores the utilization of quantum entanglement to enhance measurement precision. When particles in a probe are prepared into a quantum entangled state, they collectively accumulate information about the physical quantity to be measured, leading to an improvement in measurement precision beyond the standard quantum limit and approaching the Heisenberg limit. The rapid advancement of techniques for quantum manipulation and detection has enabled the generation, manipulation, and detection of multi-particle entangled states in synthetic quantum systems such as cold atoms and trapped ions. This article aims to review and illustrate the fundamental principles and experimental progress that demonstrate multi-particle entanglement for quantum metrology, as well as discuss the potential applications of entanglement-enhanced quantum sensors.	翻訳日:2024-02-07 17:18:11 公開日:2024-02-05
# 量子化近似直交リカレントニューラルネットワーク Quantized Approximately Orthogonal Recurrent Neural Networks ( http://arxiv.org/abs/2402.04012v1 ) ライセンス: Link先を確認	Armand Foucault (IMT), Franck Mamalet (UT), François Malgouyres (IMT)	(参考訳) 直交リカレントニューラルネットワーク(ORNN)は、単純さと計算安定性のおかげで、長期的な依存関係を持つ時系列を含むタスクを学習するための魅力的な選択肢である。しかし、これらのネットワークはよく機能するためにかなりの数のパラメータを必要とすることが多く、コンパクトデバイスのような電力制約のある環境では利用が難しい。この問題に対処するアプローチのひとつは、ニューラルネットワークの量子化である。このようなネットワークの構築は、本質的な不安定性が知られており、未解決の問題のままである。本稿では, ORNNにおけるリカレント重み行列と入力重み行列の量子化について検討し, 量子化近似直交RNN(QORNN)を導く。1つの学習後量子化(PTQ)戦略と,直交制約と量子化重みを組み込んだ3つの量子化アウェアトレーニング(QAT)アルゴリズムについて検討した。実証的な結果は、PTQよりもQATを使うことの利点を示している。最も効率的なモデルは、3ビット量子化であっても、様々な標準ベンチマークにおいて最先端のフル精度ORNNやLSTMと同様の結果を達成する。 Orthogonal recurrent neural networks (ORNNs) are an appealing option for learning tasks involving time series with long-term dependencies, thanks to their simplicity and computational stability. However, these networks often require a substantial number of parameters to perform well, which can be prohibitive in power-constrained environments, such as compact devices. One approach to address this issue is neural network quantization. The construction of such networks remains an open problem, acknowledged for its inherent instability. In this paper, we explore the quantization of the recurrent and input weight matrices in ORNNs, leading to Quantized approximately Orthogonal RNNs (QORNNs). We investigate one post-training quantization (PTQ) strategy and three quantization-aware training (QAT) algorithms that incorporate orthogonal constraints and quantized weights. Empirical results demonstrate the advantages of employing QAT over PTQ. The most efficient model achieves results similar to state-of-the-art full-precision ORNN and LSTM on a variety of standard benchmarks, even with 3-bit quantization.	翻訳日:2024-02-07 14:43:41 公開日:2024-02-05
# Quantum error mitigation by layerwise Richardson extrapolation ( http://arxiv.org/abs/2402.04000v1 ) License: check the linked page	Vincent Russo and Andrea Mari	A widely used method for mitigating errors in noisy quantum computers is Richardson extrapolation, a technique in which the overall effect of noise on the estimation of quantum expectation values is captured by a single parameter that, after being scaled to larger values, is eventually extrapolated to the zero-noise limit. We generalize this approach by introducing layerwise Richardson extrapolation (LRE), an error mitigation protocol in which the noise of different individual layers (or larger chunks of the circuit) is amplified and the associated expectation values are linearly combined to estimate the zero-noise limit. The coefficients of the linear combination are analytically obtained from the theory of multivariate Lagrange interpolation. LRE leverages the flexible configurational space of layerwise unitary folding, allowing for a more nuanced mitigation of errors by treating the noise level of each layer of the quantum circuit as an independent variable. We provide numerical simulations demonstrating scenarios where LRE achieves superior performance compared to traditional (single-variable) Richardson extrapolation.	Translated: 2024-02-07 14:42:11 Published: 2024-02-05
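As a point of reference for the abstract above, here is a minimal sketch of the traditional single-variable Richardson extrapolation that LRE generalizes. The coefficients are the Lagrange interpolation weights evaluated at zero noise. The quadratic noise model `toy_expectation` is a made-up stand-in for a noisy expectation value, not data from the paper, and the multivariate layerwise version is not implemented here.

```python
def richardson_coefficients(scale_factors):
    """Lagrange weights at zero: c_i = prod_{j != i} s_j / (s_j - s_i)."""
    coefficients = []
    for i, s_i in enumerate(scale_factors):
        c = 1.0
        for j, s_j in enumerate(scale_factors):
            if j != i:
                c *= s_j / (s_j - s_i)
        coefficients.append(c)
    return coefficients

def extrapolate_to_zero(scale_factors, expectation_values):
    """Linear combination of noisy values that estimates the zero-noise value."""
    weights = richardson_coefficients(scale_factors)
    return sum(w * e for w, e in zip(weights, expectation_values))

def toy_expectation(noise_scale):
    """Hypothetical noisy expectation value; the ideal (zero-noise) value is 1.0."""
    return 1.0 - 0.3 * noise_scale + 0.05 * noise_scale ** 2

scales = [1.0, 2.0, 3.0]
noisy_values = [toy_expectation(s) for s in scales]
estimate = extrapolate_to_zero(scales, noisy_values)
print(richardson_coefficients(scales), estimate)
```

With three scale factors the extrapolation is exact for any quadratic dependence on the noise scale, which is why the toy estimate recovers the ideal value.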
# A sublinear query quantum algorithm for s-t minimum cut on dense simple graphs ( http://arxiv.org/abs/2110.15587v2 ) License: check the linked page	Simon Apers, Arinta Auza, Troy Lee	An $s$-$t$ minimum cut in a graph corresponds to a minimum-weight subset of edges whose removal disconnects vertices $s$ and $t$. Finding such a cut is a classic problem that is dual to that of finding a maximum flow from $s$ to $t$. In this work we describe a quantum algorithm for the minimum $s$-$t$ cut problem on undirected graphs. For an undirected graph $G$ with $n$ vertices, $m$ edges, and integral edge weights bounded by $W$, the algorithm computes with high probability the weight of a minimum $s$-$t$ cut after $\widetilde O(\sqrt{m} n^{5/6} W^{1/3})$ queries to the adjacency list of $G$. For simple graphs this bound is always $\widetilde O(n^{11/6})$, even in the dense case when $m = \Omega(n^2)$. In contrast, a randomized algorithm must make $\Omega(m)$ queries to the adjacency list of a simple graph $G$ even to decide whether $s$ and $t$ are connected.	Translated: 2024-02-07 07:41:21 Published: 2024-02-05
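The duality between minimum cut and maximum flow mentioned above can be made concrete with a classical baseline. The sketch below computes the weight of a minimum s-t cut of a small undirected graph by running a breadth-first augmenting-path maximum flow (Edmonds-Karp). It is not the quantum algorithm of the paper and makes no attempt at sublinear query complexity; the example graph is invented.

```python
from collections import deque

def min_st_cut_weight(n, edges, s, t):
    """Weight of a minimum s-t cut, via max-flow (Edmonds-Karp).
    `edges` holds (u, v, weight) triples of an undirected graph on vertices 0..n-1."""
    capacity = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        capacity[u][v] += w  # an undirected edge carries flow in either direction
        capacity[v][u] += w
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left: flow value equals cut weight
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        flow += bottleneck

example_edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
print(min_st_cut_weight(4, example_edges, 0, 3))
```

This baseline inspects the whole capacity matrix on every search, which is the kind of cost the query bounds in the abstract are measured against.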
# Policy Gradient Methods for Distortion Risk Measures ( http://arxiv.org/abs/2107.04422v7 ) License: check the linked page	Nithia Vijayan and Prashanth L.A	We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy and off-policy RL settings, respectively. We derive a variant of the policy gradient theorem that caters to the DRM objective, and integrate it with a likelihood ratio-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.	Translated: 2024-02-07 07:40:00 Published: 2024-02-05
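For readers unfamiliar with distortion risk measures, the sketch below shows one common empirical estimator: sort the sampled cumulative rewards and weight each order statistic by increments of a distortion function g with g(0) = 0 and g(1) = 1. The estimator form and the two example distortions are assumptions made for illustration; the paper's own estimator and gradient scheme may differ.

```python
def empirical_drm(samples, distortion):
    """Sum over ascending order statistics x_(i) of
    x_(i) * (g((n - i + 1) / n) - g((n - i) / n))."""
    ordered = sorted(samples)
    n = len(ordered)
    total = 0.0
    for i, x in enumerate(ordered, start=1):
        weight = distortion((n - i + 1) / n) - distortion((n - i) / n)
        total += x * weight
    return total

def identity(u):
    """No distortion: the DRM reduces to the sample mean."""
    return u

def upper_tail(u, level=0.5):
    """Puts all weight on the best (1 - level) fraction of outcomes."""
    return min(u / (1.0 - level), 1.0)

rewards = [3.0, 1.0, 4.0, 2.0]
print(empirical_drm(rewards, identity))
print(empirical_drm(rewards, upper_tail))
```

Choosing a different distortion function shifts the weight toward other parts of the reward distribution, which is how a DRM objective encodes a risk attitude.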
# Flying-qubit gates distributive over photonic waveshapes ( http://arxiv.org/abs/2105.13814v2 ) License: check the linked page	Ihar Babushkin, Ayhan Demircan, Michael Kues, Uwe Morgner	Photons, acting as "flying qubits" in propagation geometries such as waveguides, appear unavoidably in the form of wavepackets (pulses). The shape of the photonic wavepacket, as well as possible temporal/spectral correlations between the photons, plays a critical role in successful scalable computation. Currently, unentangled indistinguishable photons are considered a suitable resource for scalable photonic circuits. Here we show that, using so-called coherent photon conversion, it is possible to construct flying-qubit gates that are not only insensitive to the waveshapes of the photons and the temporal/spectral correlations between them, but that also fully preserve these waveshapes and correlations during processing. This allows photons with a very broad range of correlations and purity to be used for scalable computation. Moreover, such gates can process entangled photonic wavepackets even more effectively than unentangled ones.	Translated: 2024-02-07 07:39:39 Published: 2024-02-05
# Context-self contrastive pretraining for crop type semantic segmentation ( http://arxiv.org/abs/2104.04310v3 ) License: check the linked page	Michail Tarasiou, Riza Alp Guler, Stefanos Zafeiriou	In this paper, we propose a fully supervised pre-training scheme based on contrastive learning particularly tailored to dense classification tasks. The proposed Context-Self Contrastive Loss (CSCL) learns an embedding space that makes semantic boundaries pop up by use of a similarity metric between every location in a training sample and its local context. For crop type semantic segmentation from Satellite Image Time Series (SITS) we find performance at parcel boundaries to be a critical bottleneck and explain how CSCL tackles the underlying cause of that problem, improving the state-of-the-art performance in this task. Additionally, using images from the Sentinel-2 (S2) satellite missions we compile the largest, to our knowledge, SITS dataset densely annotated by crop type and parcel identities, which we make publicly available together with the data generation pipeline. Using that data we find CSCL, even with minimal pre-training, to improve all respective baselines and present a process for semantic segmentation at super-resolution for obtaining crop classes at a more granular level. The code and instructions to download the data can be found at https://github.com/michaeltrs/DeepSatModels.	Translated: 2024-02-07 07:38:25 Published: 2024-02-05
# Deep Learning Techniques for In-Crop Weed Identification: A Review ( http://arxiv.org/abs/2103.14872v2 ) License: check the linked page	Kun Hu, Zhiyong Wang, Guy Coleman, Asher Bender, Tingting Yao, Shan Zeng, Dezhen Song, Arnold Schumann, Michael Walsh	Weeds are a significant threat to agricultural productivity and the environment. The increasing demand for sustainable agriculture has driven innovations in accurate weed control technologies aimed at reducing the reliance on herbicides. With the great success of deep learning in various vision tasks, many promising image-based weed detection algorithms have been developed. This paper reviews recent developments of deep learning techniques in the field of image-based weed detection. The review begins with an introduction to the fundamentals of deep learning related to weed detection. Next, recent progress on deep weed detection is reviewed, with a discussion of the research materials including public weed datasets. Finally, the challenges of developing practically deployable weed detection methods are summarized, together with a discussion of the opportunities for future research. We hope that this review will provide a timely survey of the field and attract more researchers to address this interdisciplinary research problem.	Translated: 2024-02-07 07:38:01 Published: 2024-02-05
# Will Dynamic Arrays finally change the way Models are built? ( http://arxiv.org/abs/2006.14706v2 ) License: check the linked page	Peter Bartholomew	Spreadsheets offer a supremely successful and intuitive means of processing and exchanging numerical content. Their intuitive ad hoc nature makes them hugely popular for use in diverse areas including business and engineering, yet these very same characteristics make them extraordinarily error-prone; many would question whether they are suitable for serious analysis or modelling tasks. A previous EuSpRIG paper examined the role of Names in increasing solution transparency and providing a readable notation to forge links with the problem domain. Extensive use was made of CSE array formulas, but it is acknowledged that their use makes spreadsheet development a distinctly cumbersome task. Since that time, the new dynamic arrays have been introduced and array calculation is now the default mode of operation for Excel. This paper examines the thesis that their adoption within a more professional development environment could replace traditional techniques where solution integrity is important. A major advantage of fully dynamic models is that they require less manual intervention to keep them updated and so have the potential to reduce the attendant errors and risk.	Translated: 2024-02-07 07:37:47 Published: 2024-02-05
# ANAct: Adaptive Normalization for Activation Functions ( http://arxiv.org/abs/2208.13315v3 ) License: check the linked page	Yuan Peiwen, Henan Liu, Zhu Changsheng, Yuyi Wang	In this paper, we investigate the negative effect of activation functions on forward and backward propagation and how to counteract this effect. First, we examine how activation functions affect the forward and backward propagation of neural networks and derive a general form for gradient variance that extends the previous work in this area. We use mini-batch statistics to dynamically update the normalization factor so that the normalization property holds throughout the training process, rather than only accounting for the state of the neural network after weight initialization. Second, we propose ANAct, a method that normalizes activation functions to maintain consistent gradient variance across layers, and demonstrate its effectiveness through experiments. We observe that the convergence rate is roughly related to the normalization property. We compare ANAct with several common activation functions on CNNs and residual networks and show that ANAct consistently improves their performance. For instance, normalized Swish achieves 1.4% higher top-1 accuracy than vanilla Swish on ResNet50 with the Tiny ImageNet dataset and more than 1.2% higher with CIFAR-100.	Translated: 2024-02-07 07:33:38 Published: 2024-02-05
# Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion ( http://arxiv.org/abs/2206.05248v3 ) License: check the linked page	Yang Cai, Argyris Oikonomou, Weiqiang Zheng	We study constrained comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems, and their generalization to comonotone inclusion. In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods. Additionally, we prove that the algorithm's iterations converge to a point in the solution set. In our second contribution, we extend the Fast Extra Gradient (FEG) algorithm, as developed by Lee and Kim (2021), to constrained comonotone min-max optimization and comonotone inclusion, achieving the same $O\left(\frac{1}{T}\right)$ convergence rate. This rate is applicable to the broadest set of comonotone inclusion problems yet studied in the literature. Our analyses are based on simple potential function arguments, which might be useful for analyzing other accelerated algorithms.	Translated: 2024-02-07 07:31:31 Published: 2024-02-05
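To give a feel for the anchoring idea behind EAG, the sketch below runs an unconstrained, constant-step anchored extragradient iteration on the toy bilinear problem f(x, y) = x * y, whose only saddle point is the origin. The update rule (anchor weight 1/(k+2), step 0.125) is written from memory of the Yoon and Ryu scheme and should be read as an assumption; the constrained and comonotone extensions that the paper contributes are not shown.

```python
def saddle_operator(z):
    """F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    x, y = z
    return (y, -x)

def residual(z):
    """Euclidean norm of F(z); zero exactly at the saddle point."""
    gx, gy = saddle_operator(z)
    return (gx * gx + gy * gy) ** 0.5

def anchored_extragradient(z0, step=0.125, iterations=2000):
    """Extragradient steps in which each iterate is pulled toward the start z0."""
    z = z0
    for k in range(iterations):
        beta = 1.0 / (k + 2)  # anchoring weight, shrinking over time
        anchored = tuple(zi + beta * (ai - zi) for zi, ai in zip(z, z0))
        g = saddle_operator(z)
        z_half = tuple(a - step * gi for a, gi in zip(anchored, g))
        g_half = saddle_operator(z_half)
        z = tuple(a - step * gi for a, gi in zip(anchored, g_half))
    return z

start = (1.0, 1.0)
final = anchored_extragradient(start)
print(residual(start), residual(final))
```

Plain simultaneous gradient descent-ascent spirals outward on this problem, so the shrinking residual here comes from the extrapolation step combined with the anchor.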
# Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning ( http://arxiv.org/abs/2204.04510v2 ) License: check the linked page	Dongkwan Kim and Alice Oh	Subgraph representation learning has emerged as an important problem, but it is by default approached with specialized graph neural networks on a large global graph. These models demand extensive memory and computational resources, yet struggle to model the hierarchical structures of subgraphs. In this paper, we propose Subgraph-To-Node (S2N) translation, a novel formulation for learning representations of subgraphs. Specifically, given a set of subgraphs in the global graph, we construct a new graph by coarsely transforming subgraphs into nodes. Demonstrating both theoretical and empirical evidence, S2N not only significantly reduces memory and computational costs compared to state-of-the-art models but also outperforms them by capturing both local and global structures of the subgraph. By leveraging graph coarsening methods, our method outperforms baselines even in a data-scarce setting with insufficient subgraphs. Our experiments on eight benchmarks demonstrate that fine-tuned models with S2N translation can process 183 to 711 times more subgraph samples than state-of-the-art models at a better or similar performance level.	Translated: 2024-02-07 07:30:15 Published: 2024-02-05
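The coarse "subgraphs become nodes" construction can be sketched as follows. The abstract does not state how edges of the new graph are defined, so the rule used here (two subgraph-nodes are linked when the subgraphs share a vertex or are joined by an edge of the global graph) is purely an assumption for illustration, as are the function name and the toy graph.

```python
def subgraphs_to_nodes(global_edges, subgraphs):
    """Build a coarsened graph whose node i stands for subgraphs[i].
    Assumed edge rule: link i and j if the subgraphs overlap or touch."""
    adjacency = {}
    for u, v in global_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    coarse_edges = set()
    for i in range(len(subgraphs)):
        # Vertices of subgraph i together with their neighbours in the global graph.
        reach = set(subgraphs[i])
        for u in subgraphs[i]:
            reach |= adjacency.get(u, set())
        for j in range(i + 1, len(subgraphs)):
            if reach & set(subgraphs[j]):
                coarse_edges.add((i, j))
    return sorted(coarse_edges)

# Toy global graph: a path 0-1-2-3-4-5 plus a separate edge 6-7.
path_and_pair = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]
groups = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
print(subgraphs_to_nodes(path_and_pair, groups))
```

The coarsened graph has one node per subgraph, so a standard GNN run on it touches far fewer nodes than one run on the global graph, which is the source of the memory savings the abstract reports.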
# Constant matters: Fine-grained Complexity of Differentially Private Continual Observation ( http://arxiv.org/abs/2202.11205v6 ) License: check the linked page	Hendrik Fichtenberger, Monika Henzinger and Jalaj Upadhyay	We study fine-grained error bounds for differentially private algorithms for counting under continual observation. Our main insight is that the matrix mechanism, when using lower-triangular matrices, can be used in the continual observation model. More specifically, we give an explicit factorization for the counting matrix $M_\mathsf{count}$ and upper bound the error explicitly. We also give a fine-grained analysis, specifying the exact constant in the upper bound. Our analysis is based on upper and lower bounds of the completely bounded norm (cb-norm) of $M_\mathsf{count}$. Along the way, we improve the best-known bound of 28 years by Mathias (SIAM Journal on Matrix Analysis and Applications, 1993) on the cb-norm of $M_\mathsf{count}$ for a large range of the dimension of $M_\mathsf{count}$. Furthermore, we are the first to give concrete error bounds for various problems under continual observation such as binary counting, maintaining a histogram, releasing an approximately cut-preserving synthetic graph, many graph-based statistics, and substring and episode counting. Finally, we note that our result can be used to get a fine-grained error bound for non-interactive local learning and the first lower bounds on the additive error for $(\epsilon,\delta)$-differentially-private counting under continual observation. Subsequent to this work, Henzinger et al. (SODA 2023) showed that our factorization also achieves fine-grained mean-squared error.	Translated: 2024-02-07 07:29:36 Published: 2024-02-05
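The abstract refers to an explicit lower-triangular factorization of the counting matrix without writing it out. One well-known candidate is the Toeplitz "square root" whose entries are the Taylor coefficients of 1/sqrt(1 - x), namely f(k) = C(2k, k) / 4^k. The sketch below checks numerically that this matrix squares to the all-ones lower-triangular matrix. Treat the correspondence with the paper's factorization as an assumption; the noise-adding step of the matrix mechanism is omitted.

```python
def sqrt_coefficients(n):
    """f(0) = 1 and f(k) = f(k-1) * (2k - 1) / (2k), i.e. C(2k, k) / 4**k."""
    coefficients = [1.0]
    for k in range(1, n):
        coefficients.append(coefficients[-1] * (2 * k - 1) / (2 * k))
    return coefficients

def lower_toeplitz(coefficients):
    """Lower-triangular Toeplitz matrix with entry (i, j) equal to f(i - j)."""
    n = len(coefficients)
    return [[coefficients[i - j] if i >= j else 0.0 for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

size = 6
factor = lower_toeplitz(sqrt_coefficients(size))
product = matmul(factor, factor)
counting = [[1.0 if i >= j else 0.0 for j in range(size)] for i in range(size)]
max_gap = max(abs(product[i][j] - counting[i][j])
              for i in range(size) for j in range(size))
print(sqrt_coefficients(4), max_gap)
```

Because both factors are lower triangular, the prefix sum at step t depends only on inputs up to step t, which is what lets a factorization of this shape be used when counts are released one step at a time.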
# EvoPruneDeepTL: An Evolutionary Pruning Model for Transfer Learning based Deep Neural Networks ( http://arxiv.org/abs/2202.03844v3 ) License: check the linked page	Javier Poyatos, Daniel Molina, Aritz D. Martinez, Javier Del Ser, Francisco Herrera	In recent years, Deep Learning models have shown great performance in complex optimization problems. They generally require large training datasets, which is a limitation in most practical cases. Transfer learning allows importing the first layers of a pre-trained architecture and connecting them to fully-connected layers to adapt them to a new problem. Consequently, the configuration of these layers becomes crucial for the performance of the model. Unfortunately, the optimization of these models is usually a computationally demanding task. One strategy to optimize Deep Learning models is the pruning scheme. Pruning methods are focused on reducing the complexity of the network, assuming an expected performance penalty of the model once pruned. However, pruning could potentially be used to improve performance, using an optimization algorithm to identify and eventually remove unnecessary connections among neurons. This work proposes EvoPruneDeepTL, an evolutionary pruning model for Transfer Learning based Deep Neural Networks which replaces the last fully-connected layers with sparse layers optimized by a genetic algorithm. Depending on its solution encoding strategy, our proposed model can either perform optimized pruning or feature selection over the densely connected part of the neural network. We carry out different experiments with several datasets to assess the benefits of our proposal. Results show the contribution of EvoPruneDeepTL and feature selection to the overall computational efficiency of the network as a result of the optimization process. In particular, the accuracy is improved while the number of active neurons in the final layers is reduced.	Translated: 2024-02-07 07:29:06 Published: 2024-02-05
# Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence ( http://arxiv.org/abs/2202.03482v2 ) License: check the linked page	Frederik Pahde, Maximilian Dreyer, Leander Weber, Moritz Weckbecker, Christopher J. Anders, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin	With a growing interest in understanding neural network prediction strategies, Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. Commonly, CAVs are computed by leveraging linear classifiers optimizing the separability of latent representations of samples with and without a given concept. However, in this paper we show that such a separability-oriented computation leads to solutions that may diverge from the actual goal of precisely modeling the concept direction. This discrepancy can be attributed to the significant influence of distractor directions, i.e., signals unrelated to the concept, which are picked up by filters (i.e., weights) of linear models to optimize class-separability. To address this, we introduce pattern-based CAVs, solely focusing on concept signals, thereby providing more accurate concept directions. We evaluate various CAV methods in terms of their alignment with the true concept direction and their impact on CAV applications, including concept sensitivity testing and model correction for shortcut behavior caused by data artifacts. We demonstrate the benefits of pattern-based CAVs using the Pediatric Bone Age, ISIC2019, and FunnyBirds datasets with VGG, ResNet, and EfficientNet model architectures.	Translated: 2024-02-07 07:28:42 Published: 2024-02-05
# Parallel self-testing of EPR pairs under computational assumptions ( http://arxiv.org/abs/2201.13430v3 ) License: check the linked page	Honghao Fu, Daochen Wang, Qi Zhao	Self-testing is a fundamental feature of quantum mechanics that allows a classical verifier to force untrusted quantum devices to prepare certain states and perform certain measurements on them. The standard approach assumes at least two spatially separated devices. Recently, Metger and Vidick [Quantum, 2021] showed that a single EPR pair of a single quantum device can be self-tested under computational assumptions. In this work, we generalize their results to give the first parallel self-test of $N$ EPR pairs and measurements on them in the single-device setting under the same computational assumptions. We show that our protocol can be passed with probability negligibly close to $1$ by an honest quantum device using poly$(N)$ resources. Moreover, we show that any quantum device that fails our protocol with probability at most $\epsilon$ must be poly$(N,\epsilon)$-close to being honest in the appropriate sense. In particular, our protocol can test any distribution over tensor products of computational or Hadamard basis measurements, making it suitable for applications such as device-independent quantum key distribution under computational assumptions. Moreover, a simplified version of our protocol is the first that can efficiently certify an arbitrary number of qubits of a single cloud quantum computer using only classical communication.	Translated: 2024-02-07 07:28:16 Published: 2024-02-05
# An annotated instance segmentation XXL-CT data-set from a historic airplane ( http://arxiv.org/abs/2212.08639v2 ) License: check the linked page	Roland Gruber and Nils Reims and Andreas Hempfer and Stefan Gerth and Michael Böhnel and Theobald Fuchs and Michael Salamon and Thomas Wittenberg	The Me 163 was a Second World War fighter airplane and a result of secret developments by the German air force. One of these airplanes is currently owned by and displayed in the historic aircraft exhibition of the Deutsches Museum in Munich, Germany. To gain insights with respect to its history, design and state of preservation, a complete CT scan was obtained using an industrial XXL computed tomography scanner. Using the CT data from the Me 163, all its details can be visually examined at various levels, ranging from the complete hull down to single sprockets and rivets. However, while a trained human observer can identify and interpret the volumetric data with all its parts and connections, a virtual dissection of the airplane and all its different parts would be quite desirable. This requires an instance segmentation of all components and objects of interest into disjoint entities from the CT data. As no adequate computer-assisted tools for automated or semi-automated segmentation of such XXL airplane data are currently available, an interactive data annotation and object labelling process has been established as a first step. So far, seven 512 x 512 x 512 voxel sub-volumes from the Me 163 airplane have been annotated and labelled; the results can potentially be used for various new applications in the fields of digital heritage, non-destructive testing, and machine learning. This work describes the data acquisition process of the airplane using an industrial XXL-CT scanner, outlines the interactive segmentation and labelling scheme used to annotate sub-volumes of the airplane's CT data, and describes and discusses various challenges with respect to interpreting and handling the annotated and labelled data.	Translated: 2024-02-07 07:21:29 Published: 2024-02-05
# Surface codes, quantum circuits, and entanglement phases ( http://arxiv.org/abs/2212.08084v2 ) License: check the linked page	Jan Behrends, Florian Venn, Benjamin Béri	Surface codes, leading candidates for quantum error correction (QEC), and entanglement phases, a key notion for many-body quantum dynamics, have heretofore been unrelated. Here, we establish a link between the two. We map two-dimensional (2D) surface codes under a class of incoherent or coherent errors (bit flips or uniaxial rotations) to $(1+1)$D free-fermion quantum circuits via Ising models. We show that the error-correcting phase implies a topologically nontrivial area law for the circuit's 1D long-time state $|\Psi_\infty\rangle$. Above the error threshold, we find a topologically trivial area law for incoherent errors and logarithmic entanglement in the coherent case. In establishing our results, we formulate 1D parent Hamiltonians for $|\Psi_\infty\rangle$ via linking Ising models and 2D scattering networks, the latter displaying respective insulating and metallic phases and setting the 1D fermion gap and topology via their localization length and topological invariant. We expect our results to generalize to a duality between the error-correcting phase of ($d+1$)D topological codes and $d$-dimensional area laws; this can facilitate assessing code performance under various errors. The approach of combining Ising models, scattering networks, and parent Hamiltonians can be generalized to other fermionic circuits and may be of independent interest.	Translated: 2024-02-07 07:20:58 Published: 2024-02-05
# 自動プログラム修復のエネルギー消費 Energy Consumption of Automated Program Repair ( http://arxiv.org/abs/2211.12104v2 ) ライセンス: Link先を確認	Matias Martinez, Silverio Martínez-Fernández, Xavier Franch	(参考訳) 自動プログラム修復(APR)は、ソフトウェアプログラムのメンテナンスコストを削減するために、ソフトウェアバグの修復プロセスを自動化することを目的としている。さらに、近年、APRアプローチの成功(精度指標による)が増加している。しかし、APRを用いてバグを自動修復する際のエネルギー的影響を検討した研究はこれまでない。グリーンソフトウェア研究の分野は、ソフトウェア製品の開発、保守、使用に必要なエネルギー消費を測定することを目的としている。本論文は、APRとグリーンソフトウェア研究の分野を初めて組み合わせたものである。我々の主目的は、APR活動のエネルギー消費を計測するための基盤を定義することである。Java向けの従来型プログラム修復ツール10種と、ソースコードで微調整された大規模言語モデル(LLM)10種が、実際のバグを含むプログラム群であるDefects4Jのバグを修復しようとする際のエネルギー消費を測定する。この実験の最初の結果は、エネルギー消費とバグを正しく修復する能力との間にトレードオフが存在することを示している。一部のAPRツールは、他のツールよりも少ないエネルギーで高い精度を達成できる。 Automated program repair (APR) aims to automatize the process of repairing software bugs in order to reduce the cost of maintaining software programs. Moreover, the success (given by the accuracy metric) of APR approaches has increased in recent years. However, no previous work has considered the energy impact of repairing bugs automatically using APR. The field of green software research aims to measure the energy consumption required to develop, maintain, and use software products. This paper combines, for the first time, the APR and Green software research fields. We have as main goal to define the foundation for measuring the energy consumption of the APR activity. We measure the energy consumption of ten traditional program repair tools for Java and ten fine-tuned Large-Language Models (LLM) on source code trying to repair real bugs from Defects4J, a set of real buggy programs. The initial results from this experiment show the existing trade-off between energy consumption and the ability to correctly repair bugs: Some APR tools are capable of achieving higher accuracy by spending less energy than other tools.	翻訳日:2024-02-07 07:20:06 公開日:2024-02-05
# 差分プライベートな連続カウントにおけるほぼタイトな誤差境界 Almost Tight Error Bounds on Differentially Private Continual Counting ( http://arxiv.org/abs/2211.05006v2 ) ライセンス: Link先を確認	Monika Henzinger and Jalaj Upadhyay and Sarvagya Upadhyay	(参考訳) プライベートフェデレーション学習の最初の大規模展開では、継続リリースモデルにおける差分プライベートカウントをサブルーチンとして使用している(“Federated Learning with Formal Differential Privacy Guarantees”と題されたGoogle AIブログ)。この場合、誤差の具体的なバウンドは、プライバシパラメータを減らすうえで非常に重要になる。連続カウントの標準的なメカニズムはバイナリメカニズムである。本稿では新しいメカニズムを提案し,その平均二乗誤差が漸近的に最適であり,かつバイナリメカニズムの誤差の10分の1であることを示す。また, 低次項の定数でのみ異なる非漸近的な下限と上限を与えることにより, 本解析の定数がほぼタイトであることを示す。本アルゴリズムは計数行列に対する行列機構であり,リリース毎に一定時間を要する。また,計数行列の明示的な分解を用いて,デニソフらの私的学習アルゴリズム(NeurIPS 2022)の過剰リスクに上限を与える。任意の連続カウント機構に対する我々の下限は、近似微分プライバシーの下での連続カウントに関する最初のタイトな下限である。これは、$\gamma_F(\cdot)$ で表されるある分解ノルムに対する、行列の特異値を用いた新しい下界によって達成される。特に、任意の複素行列 $A \in \mathbb{C}^{m \times n}$ に対して、\[ \gamma_F(A) \geq \frac{1}{\sqrt{m}}\|A\|_1, \] が成り立つことを示す。ここで、$\|\cdot\|_1$ はシャッテン-1ノルムを表す。この手法は、より大きな線形クエリのクラスに対する下限の証明に役立つと考えている。この手法の力を示すために、パリティクエリに応答する際の平均二乗誤差に対する最初の下限を示す。 The first large-scale deployment of private federated learning uses differentially private counting in the continual release model as a subroutine (Google AI blog titled "Federated Learning with Formal Differential Privacy Guarantees"). In this case, a concrete bound on the error is very relevant to reduce the privacy parameter. The standard mechanism for continual counting is the binary mechanism. We present a novel mechanism and show that its mean squared error is both asymptotically optimal and a factor 10 smaller than the error of the binary mechanism. We also show that the constants in our analysis are almost tight by giving non-asymptotic lower and upper bounds that differ only in the constants of lower-order terms. Our algorithm is a matrix mechanism for the counting matrix and takes constant time per release. We also use our explicit factorization of the counting matrix to give an upper bound on the excess risk of the private learning algorithm of Denisov et al. 
(NeurIPS 2022). Our lower bound for any continual counting mechanism is the first tight lower bound on continual counting under approximate differential privacy. It is achieved using a new lower bound on a certain factorization norm, denoted by $\gamma_F(\cdot)$, in terms of the singular values of the matrix. In particular, we show that for any complex matrix, $A \in \mathbb{C}^{m \times n}$, \[ \gamma_F(A) \geq \frac{1}{\sqrt{m}}\|A\|_1, \] where $\|\cdot\|_1$ denotes the Schatten-1 norm. We believe this technique will be useful in proving lower bounds for a larger class of linear queries. To illustrate the power of this technique, we show the first lower bound on the mean squared error for answering parity queries.	翻訳日:2024-02-07 07:19:00 公開日:2024-02-05
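To make the phrase "factorization of the counting matrix" concrete, here is a minimal Python sketch. It builds the lower-triangular all-ones counting matrix M and a lower-triangular Toeplitz matrix L with entries f(k) = C(2k, k) / 4^k (the power-series coefficients of 1/sqrt(1 - x)), so that L times L equals M. That this square-root factorization is the explicit one the abstract refers to is an assumption; the abstract does not state it. The function names are illustrative only.

```python
from math import comb

def counting_matrix(n):
    """Lower-triangular all-ones matrix: row t sums the first t + 1 stream items."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]

def sqrt_factor(n):
    """Lower-triangular Toeplitz matrix with entry f(i - j), f(k) = C(2k, k) / 4**k.

    Assumption: this is used here only as one known square root of the
    counting matrix, not as a confirmed detail of the paper.
    """
    f = [comb(2 * k, k) / 4 ** k for k in range(n)]
    return [[f[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

n = 8
M = counting_matrix(n)
L = sqrt_factor(n)
error = max_abs_diff(matmul(L, L), M)  # should be (numerically) zero
```

In a matrix mechanism of this shape, noise would be added to the product of the right factor with the input stream before multiplying by the left factor; the noise scale and privacy accounting are omitted here.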
# 物理インフォームド深部拡散MRIの合成データによる再構成:人工知能におけるブレークトレーニングデータ Physics-informed Deep Diffusion MRI Reconstruction with Synthetic Data: Break Training Data Bottleneck in Artificial Intelligence ( http://arxiv.org/abs/2210.11388v5 ) ライセンス: Link先を確認	Chen Qian, Yuncheng Gao, Mingyang Han, Zi Wang, Dan Ruan, Yu Shen, Yaping Wu, Yirong Zhou, Chengyan Wang, Boyu Jiang, Ran Tao, Zhigang Wu, Jiazheng Wang, Liuhong Zhu, Yi Guo, Taishan Kang, Jianzhong Lin, Tao Gong, Chen Yang, Guoqiang Fei, Meijin Lin, Di Guo, Jianjun Zhou, Meiyun Wang, and Xiaobo Qu	(参考訳) 拡散磁気共鳴イメージング(MRI)は、生体内水分子の非侵襲的な移動検出のための唯一の画像モダリティであり、臨床および研究に重要な応用がある。マルチショット技術によって取得された拡散MRI(DWI)は、高分解能、信号と雑音の比が良く、幾何歪みが単ショットよりも小さいが、ショット間動きによって引き起こされるアーティファクトに悩まされる。これらのアーティファクトは将来的に除去できないため、アーティファクトフリーのトレーニングラベルがない。したがって,マルチショットDWI再構成における深層学習の可能性は未解決のままである。そこで本研究では,物理拡散モデル(マグニチュード合成)とショット間動き誘導位相モデル(モーションフェーズ合成)を利用して,高品質なペアリングトレーニングデータを合成するための物理インフォームドディープDWI再構成法を提案する。ネットワークは10万の合成サンプルで一度だけ訓練され、複数の現実的な生体内データ再構成の結果が得られた。従来の方法に対する利点は以下のとおりである。 a) より優れたモーションアーティファクトの抑制と再構築の安定性 b)マルチレゾリューション,マルチb値,マルチアンサンプサンプリング,マルチベンダ,マルチセンタを含む,マルチセナリオ再構築の卓越した一般化 c) 7名の経験者(p<0.001)による検証患者に対する優れた臨床適応性(p<0.001) 結論として、piddはmri物理学の力を活用し、ディープラーニング医療画像におけるデータのボトルネックを破るコスト効率が高く説明可能な方法を提供する、新しいディープラーニングフレームワークを提案する。 Diffusion magnetic resonance imaging (MRI) is the only imaging modality for non-invasive movement detection of in vivo water molecules, with significant clinical and research applications. Diffusion MRI (DWI) acquired by multi-shot techniques can achieve higher resolution, better signal-to-noise ratio, and lower geometric distortion than single-shot, but suffers from inter-shot motion-induced artifacts. These artifacts cannot be removed prospectively, leading to the absence of artifact-free training labels. Thus, the potential of deep learning in multi-shot DWI reconstruction remains largely untapped. 
To break the training data bottleneck, here, we propose a Physics-Informed Deep DWI reconstruction method (PIDD) to synthesize high-quality paired training data by leveraging the physical diffusion model (magnitude synthesis) and inter-shot motion-induced phase model (motion phase synthesis). The network is trained only once with 100,000 synthetic samples, achieving encouraging results on multiple realistic in vivo data reconstructions. Advantages over conventional methods include: (a) Better motion artifact suppression and reconstruction stability; (b) Outstanding generalization to multi-scenario reconstructions, including multi-resolution, multi-b-value, multi-undersampling, multi-vendor, and multi-center; (c) Excellent clinical adaptability to patients with verifications by seven experienced doctors (p<0.001). In conclusion, PIDD presents a novel deep learning framework by exploiting the power of MRI physics, providing a cost-effective and explainable way to break the data bottleneck in deep learning medical imaging.	翻訳日:2024-02-07 07:17:16 公開日:2024-02-05
# FreDSNet:高速フーリエ畳み込みによる単眼深度とセマンティックセグメンテーションの同時推定 FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions ( http://arxiv.org/abs/2210.01595v2 ) ライセンス: Link先を確認	Bruno Berenguel-Baeta, Jesus Bermudez-Cameo and Jose J. Guerrero	(参考訳) 本研究では,単一パノラマから室内環境のセマンティックな3次元理解を得る深層学習ソリューションFreDSNetを提案する。全方位画像は、環境全体に関する360度のコンテキスト情報を提供するため、シーン理解の問題に対処する際にタスク固有の利点を持つ。しかしながら、全方位画像の固有特性は、オブジェクトの正確な検出と分割、あるいは良好な深度推定を得るうえで追加的な問題をもたらす。これらの問題を克服するために,周波数領域の畳み込みを利用し,各畳み込み層においてより広い受容野を得る。これらの畳み込みにより、全方位画像からコンテキスト情報全体を活用できる。 FreDSNetは、高速フーリエ畳み込みを利用して単一パノラマ画像から単眼深度推定とセマンティックセグメンテーションを同時に提供する最初のネットワークである。実験の結果,FreDSNetはセマンティックセグメンテーションと深度推定それぞれに特化した最先端手法と同等の性能を有することがわかった。 FreDSNetのコードは https://github.com/Sbrunoberenguel/FreDSNet で公開されている。 In this work we present FreDSNet, a deep learning solution which obtains semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images reveal task-specific advantages when addressing scene understanding problems due to the 360-degree contextual information about the entire environment they provide. However, the inherent characteristics of the omnidirectional images add additional problems to obtain an accurate detection and segmentation of objects or a good depth estimation. To overcome these problems, we exploit convolutions in the frequential domain obtaining a wider receptive field in each convolutional layer. These convolutions allow to leverage the whole context information from omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image exploiting fast Fourier convolutions. Our experiments show that FreDSNet has similar performance as specific state of the art methods for semantic segmentation and depth estimation. FreDSNet code is publicly available in https://github.com/Sbrunoberenguel/FreDSNet	翻訳日:2024-02-07 07:16:49 公開日:2024-02-05
# XOR ゲームと XOR* ゲームを接続する Connecting XOR and XOR* games ( http://arxiv.org/abs/2210.00397v4 ) ライセンス: Link先を確認	Lorenzo Catani, Ricardo Faleiro, Pierre-Emmanuel Emeriau, Shane Mansfield, Anna Pappa	(参考訳) この研究では、XOR非局所ゲームと、単一系(monopartite)のリソースを用いるXOR*逐次ゲームという、2種類のゲームに焦点を当てる。 XORゲームは非局所ゲームの文献において広く研究されている。我々は、リソース系が一連の制御操作と最終的な測定を受けるゲームのクラスにおける、その自然な対応物としてXOR*ゲームを導入する。 XOR*ゲームの例としては、$2\rightarrow 1$ quantum random access codes (QRAC) や、[PRA 98,060302(2018)]でHenautらによって導入されたCHSH*ゲームがある。プロセス理論のダイアグラム言語を用いて、ある仮定の下でこれら2つのゲームのクラスが、それらの最適戦略、ひいては古典(ベル)境界と量子(ツィレルソン)境界を結びつける明示的な定理によって関連付けられることを証明する。また、XOR* ゲームにおける変換の可逆性とリソース系の2次元性という2つの仮定が、定理の成立に厳密に必要であることを、明示的な反例を与えて示す。最後に、XOR/XOR*ゲーム対のいくつかの例を示し、XOR*ゲームにおける量子計算上の優位性をもたらしうるリソースについて詳細に議論する。 In this work we focus on two classes of games: XOR nonlocal games and XOR* sequential games with monopartite resources. XOR games have been widely studied in the literature of nonlocal games, and we introduce XOR* games as their natural counterpart within the class of games where a resource system is subjected to a sequence of controlled operations and a final measurement. Examples of XOR* games are $2\rightarrow 1$ quantum random access codes (QRAC) and the CHSH* game introduced by Henaut et al. in [PRA 98,060302(2018)]. We prove, using the diagrammatic language of process theories, that under certain assumptions these two classes of games can be related via an explicit theorem that connects their optimal strategies, and so their classical (Bell) and quantum (Tsirelson) bounds. We also show that two of such assumptions -- the reversibility of transformations and the bi-dimensionality of the resource system in the XOR* games -- are strictly necessary for the theorem to hold by providing explicit counterexamples. We conclude with several examples of pairs of XOR/XOR* games and by discussing in detail the possible resources that power the quantum computational advantages in XOR* games.	翻訳日:2024-02-07 07:16:28 公開日:2024-02-05
# バースト伝搬を用いたマルチモーダル音声強調 Multimodal Speech Enhancement Using Burst Propagation ( http://arxiv.org/abs/2209.03275v2 ) ライセンス: Link先を確認	Mohsin Raza, Leandro A. Passos, Ahmed Khubaib, Ahsan Adeel	(参考訳) 本稿では,前頭前皮質および他の脳領域の錐体細胞に関する最新の神経学的発見を考慮した,視聴覚音声強調のための新しいマルチモーダルソリューションMBURSTを提案する。いわゆるバースト伝搬は、フィードバックによる可塑性の符号と大きさの制御、異なる重み接続による層間のフィードバックとフィードフォワード情報の多重化、フィードバックとフィードフォワード接続の近似、フィードバック信号の線形化など、より生物学的に妥当な方法でクレジット割り当て問題に取り組むためのいくつかの基準を実装している。 MBURSTは、こうした能力を活かして雑音信号と視覚刺激の相関関係を学習し、関連する情報を増幅して雑音を抑制することによって、音声に意味を与える。 Grid Corpus と CHiME3 をベースとしたデータセットを用いて行った実験では、MBURST はマルチモーダルバックプロパゲーションベースのベースラインに類似したマスク再構成を再現でき、同時に優れたエネルギー効率を示し、ニューロンの発火率を最大70%低い値まで下げることができた。このような特徴はより持続可能な実装を意味し、補聴器や他の類似の組み込みシステムに適している。 This paper proposes the MBURST, a novel multimodal solution for audio-visual speech enhancements that consider the most recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing the feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from such capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted over a Grid Corpus and CHiME3-based dataset show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency management, reducing the neuron firing rates to values up to 70% lower. 
Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or any other similar embedded systems.	翻訳日:2024-02-07 07:16:01 公開日:2024-02-05
# エヴェレットの量子力学の多世界解釈とワームホールの幾何学的位相 Geometric phases, Everett's many-worlds interpretation of quantum mechanics, and wormholes ( http://arxiv.org/abs/2302.13651v2 ) ライセンス: Link先を確認	David Viennot	(参考訳) 断熱量子力学における幾何学的位相の形式化が、エベレットの量子力学の多世界解釈(確率変化に必要な世界間の干渉や優先基底問題を解くのに必要なデコヒーレンス過程など)を許す幾何学的実現をいかに提供するかを示す。また、この幾何学的実現は量子重力(特に行列モデル)と密接に関連していることを示し、多世界解釈は量子重力と一致することを示した。一般相対性理論に借用されたワームホールの概念はこの幾何学的実現の中心である。これは、解釈を助けるためにアナログによるイメージとしてだけでなく、量子重力における量子ワームホールの真の物理モデルとしても見える。 We present how the formalism of geometric phases in adiabatic quantum dynamics provides geometric realisations permitting to ``embody'' the Everett's many-worlds interpretation of quantum mechanics, including interferences between the worlds needed for the probability changes and the decoherence processes needed to solve the preferred basis problem. We show also that this geometric realisation is intimately related to quantum gravity (especially to matrix models), showing that the many-world interpretation can be consistent with quantum gravity. The concept of wormhole borrowed to general relativity is central in this geometric realisation. It appears not only as an image by analogy to help the interpretations, but also as a true physical model of quantum wormhole in quantum gravity, the two ones being consistent which each other.	翻訳日:2024-02-07 07:07:07 公開日:2024-02-05
# トランスファーラーニングによるディープニューラルネットワークの多目的進化的プルーニングとその性能とロバスト性向上 Multiobjective Evolutionary Pruning of Deep Neural Networks with Transfer Learning for improving their Performance and Robustness ( http://arxiv.org/abs/2302.10253v2 ) ライセンス: Link先を確認	Javier Poyatos, Daniel Molina, Aitor Martínez, Javier Del Ser, Francisco Herrera	(参考訳) 進化的計算アルゴリズムは、アーキテクチャ、ハイパーパラメータ、トレーニング構成に関連する最適化問題を解くために使われており、今日ニューラルアーキテクチャ探索として知られる分野を形成している。これらのアルゴリズムは、ネットワークの複雑さを減らすニューラルネットワークのプルーニングや、対象の問題に関連する別の問題から知識を取り込むTransfer Learningといった他の手法と組み合わされてきた。進化的提案の品質を評価するために複数の基準を用いることも一般的であり、ネットワークの性能と複雑さが最もよく使われる基準である。本研究は多目的進化的プルーニングアルゴリズムMO-EvoPruneDeepTLを提案する。 MO-EvoPruneDeepTLは、Transfer Learningを使用して、Deep Neural Networksの最後の層を、遺伝的アルゴリズムによって進化したスパース層に置き換える。この遺伝的アルゴリズムは、ネットワークの性能、複雑さ、頑健性に基づいて進化を導き、頑健性は進化したモデルの優れた品質指標となる。提案の利点を評価するために,複数のデータセットを用いて異なる実験を行った。その結果,提案手法はすべての目的において有望な結果を達成し,目的間に直接的な関係があることが示された。実験の結果、最も影響力のあるニューロンは、入力画像のどの部分がプルーニングされたニューラルネットワークの予測に最も関連しているかを説明するのに役立つことも示された。最後に,提案手法が生成するプルーニングパターンのパレートフロント内の多様性を活かし,異なるプルーニングを施したモデルのアンサンブルによって,トレーニングされたネットワーク全体の性能と頑健性が向上することを示す。 Evolutionary Computation algorithms have been used to solve optimization problems in relation with architectural, hyper-parameter or training configuration, forging the field known today as Neural Architecture Search. These algorithms have been combined with other techniques such as the pruning of Neural Networks, which reduces the complexity of the network, and the Transfer Learning, which lets the import of knowledge from another problem related to the one at hand. The usage of several criteria to evaluate the quality of the evolutionary proposals is also a common case, in which the performance and complexity of the network are the most used criteria. This work proposes MO-EvoPruneDeepTL, a multi-objective evolutionary pruning algorithm. 
MO-EvoPruneDeepTL uses Transfer Learning to adapt the last layers of Deep Neural Networks, by replacing them with sparse layers evolved by a genetic algorithm, which guides the evolution based in the performance, complexity and robustness of the network, being the robustness a great quality indicator for the evolved models. We carry out different experiments with several datasets to assess the benefits of our proposal. Results show that our proposal achieves promising results in all the objectives, and direct relation are presented among them. The experiments also show that the most influential neurons help us explain which parts of the input images are the most relevant for the prediction of the pruned neural network. Lastly, by virtue of the diversity within the Pareto front of pruning patterns produced by the proposal, it is shown that an ensemble of differently pruned models improves the overall performance and robustness of the trained networks.	翻訳日:2024-02-07 07:06:51 公開日:2024-02-05
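Since the abstract relies on a Pareto front over three objectives (performance, complexity, robustness), a small Python sketch of non-dominated filtering may help readers unfamiliar with the term. The candidate values below are hypothetical, not results from the paper, and complexity is negated so that every objective is "larger is better"; the paper's actual selection procedure is not described in the abstract.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are 'larger is better')."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical pruned models: (accuracy, -active_neurons, robustness).
candidates = [
    (0.90, -120, 0.70),
    (0.88, -60, 0.72),
    (0.85, -130, 0.65),  # worse than the first candidate on all three
    (0.91, -200, 0.60),
]
front = pareto_front(candidates)
```

The models left in `front` trade the objectives off differently, which is the kind of diversity an ensemble of differently pruned models could draw on.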
# ピクセルからの混合交通制御と協調 Mixed Traffic Control and Coordination from Pixels ( http://arxiv.org/abs/2302.09167v4 ) ライセンス: Link先を確認	Michael Villarreal, Bibek Poudel, Jia Pan, Weizi Li	(参考訳) 交通渋滞は社会の永続的な問題である。交通制御の従来の方法は、現在の渋滞レベルを緩和する上で無駄であることが証明されており、道路上での自律性の異なる車両の出現が増えると、研究者はロボットによるアイデアを探求する。これにより、ロボット車両が強化学習(RL)を通じて人間駆動車両を規制する交通制御が混在する。しかし、既存の研究の多くは、各道路網の観測空間にドメインの専門知識と手作業を必要とする正確な観察を用いている。さらに、正確な観測では、環境流出などのグローバル情報や、車両の位置や速度といったローカル情報を使用する。この情報を得るには、既存の道路インフラを巨大なセンサー環境で更新し、潜在的に望ましくない人間ドライバーと通信する必要がある。画像観察は、rlによる混合交通制御のために広範囲に検討されていないモダリティである。 1) 画像は,環境から環境への観察空間の完全な再想像を必要としない。 2)画像は,衛星画像,車載カメラシステム,交通監視システムを通じてユビキタスである。 3)画像は機器への通信のみを必要とする。本研究では,画像観察を用いたロボット車両が,リング,図8,交差点,マージ,ボトルネックなどの環境に関する正確な情報を用いて,競合性能を実現することを示す。例えば、グローバルトラフィック情報とは対照的に、ローカルなトラフィック情報のみを使用しているにも関わらず、マージ環境における平均車両速度を最大8%向上させるなどです。 Traffic congestion is a persistent problem in our society. Previous methods for traffic control have proven futile in alleviating current congestion levels leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that require domain expertise and hand engineering for each road network's observation space. Additionally, precise observations use global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. 
We consider image observations, a modality that has not been extensively explored for mixed traffic control via RL, as the alternative: 1) images do not require a complete re-imagination of the observation space from environment to environment; 2) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve competitive performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precision observations, e.g., up to 8% increase in average vehicle velocity in the merge environment, despite only using local traffic information as opposed to global traffic information.	翻訳日:2024-02-07 07:06:23 公開日:2024-02-05
# Sneaky Spikes:ニューロモーフィックデータによるスパイクニューラルネットワークのバックドア攻撃を発見 Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data ( http://arxiv.org/abs/2302.06279v3 ) ライセンス: Link先を確認	Gorka Abad, Oguzhan Ersoy, Stjepan Picek, Aitor Urbieta	(参考訳) ディープニューラルネットワーク(DNN)は、画像や音声認識など、さまざまなタスクで顕著なパフォーマンスを示している。しかし、DNNの有効性を最大化するには、トレーニングを通じて多数のハイパーパラメータとネットワークパラメータを慎重に最適化する必要がある。さらに、高性能DNNには多くのパラメータがあり、トレーニング中にかなりのエネルギーを消費する。これらの課題を克服するために、研究者はニューラルネットワーク(snn)をスパイクし、エネルギー効率の向上と生物学的に妥当なデータ処理能力を提供し、特にニューロモルフィックデータにおいて、感覚データタスクに非常に適している。それらの利点にもかかわらず、DNNのようなSNNは、敵の例やバックドア攻撃など、様々な脅威を受けやすい。しかし、これらの攻撃の理解と対処の観点からSNNの分野を探求する必要がある。本稿では,ニューロモルフィックデータセットと多様なトリガーを用いたSNNのバックドア攻撃について検討する。具体的には、画像などの領域における従来のトリガーよりも広い範囲の可能性を提供するために、その位置や色を操作できるニューロモルフィックデータ内のバックドアトリガーを探索する。我々は,攻撃成功率を100%まで達成しつつ,クリーンな精度に無視できる影響を保ちながら,様々な攻撃戦略を提示する。さらに、これらの攻撃のステルス性を評価し、最も強力な攻撃が重要なステルス能力を持っていることを明らかにした。最後に、画像領域から最先端の防御を適応させ、その効果をニューロモルフィックデータに評価し、それらが不足しているインスタンスを明らかにすることで、パフォーマンスが損なわれる。 Deep neural networks (DNNs) have demonstrated remarkable performance across various tasks, including image and speech recognition. However, maximizing the effectiveness of DNNs requires meticulous optimization of numerous hyperparameters and network parameters through training. Moreover, high-performance DNNs entail many parameters, which consume significant energy during training. In order to overcome these challenges, researchers have turned to spiking neural networks (SNNs), which offer enhanced energy efficiency and biologically plausible data processing capabilities, rendering them highly suitable for sensory data tasks, particularly in neuromorphic data. Despite their advantages, SNNs, like DNNs, are susceptible to various threats, including adversarial examples and backdoor attacks. Yet, the field of SNNs still needs to be explored in terms of understanding and countering these attacks. This paper delves into backdoor attacks in SNNs using neuromorphic datasets and diverse triggers. 
Specifically, we explore backdoor triggers within neuromorphic data that can manipulate their position and color, providing a broader scope of possibilities than conventional triggers in domains like images. We present various attack strategies, achieving an attack success rate of up to 100% while maintaining a negligible impact on clean accuracy. Furthermore, we assess these attacks' stealthiness, revealing that our most potent attacks possess significant stealth capabilities. Lastly, we adapt several state-of-the-art defenses from the image domain, evaluating their efficacy on neuromorphic data and uncovering instances where they fall short, leading to compromised performance.	翻訳日:2024-02-07 07:05:34 公開日:2024-02-05
# 3次元分子生成と最適化のための幾何完全拡散 Geometry-Complete Diffusion for 3D Molecule Generation and Optimization ( http://arxiv.org/abs/2302.04313v5 ) ライセンス: Link先を確認	Alex Morehead, Jianlin Cheng	(参考訳) 拡散確率モデル (DDPM) は近年, テキスト誘導画像生成から構造誘導タンパク質設計に至るまで, コンピュータビジョンや計算生物学などの分野における新たな最先端の成果を開拓し, 嵐による生成モデリングの分野を開拓している。後者の研究の線に沿って、DDPMフレームワーク内で同変グラフニューラルネットワーク(GNN)を用いて3次元分子を生成する方法が最近提案されている。しかし、そのような手法は分子グラフ生成中に3d分子の重要な幾何学的・物理的性質を学習できないため、分子に依存しないgnnを3dグラフの分断ネットワークとして採用し、大きな3d分子のデータセットに効果的にスケールする能力に悪影響を及ぼす。そこで本研究では,既存の3次元分子拡散モデルに対して,qm9データセットやより大きなgeom-drugsデータセットの条件的・非条件的設定において有意なマージンで勝る3d分子生成のための幾何完全拡散モデル(gcdm)を導入することで,これらのギャップに対処する。重要なのは,cgdmが3d分子生成のために学習した幾何完全分極化プロセスにより,ジオムドラッグのスケールで現実的な安定な大きな分子を生成できることである。さらに,GCDMの拡張は, 特定のタンパク質ポケットの3D分子を効果的に設計するだけでなく, 分子拡散モデルの新たな実世界の汎用性を示すために, 既存の3D分子の形状と化学組成を直接最適化するために, GCDMの幾何学的特徴を効果的に再利用できることを示した。ソースコードとデータはhttps://github.com/bioinfomachinelearning/bio-diffusionから無料で利用できます。 Denoising diffusion probabilistic models (DDPMs) have recently taken the field of generative modeling by storm, pioneering new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided protein design. Along this latter line of research, methods have recently been proposed for generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. However, such methods are unable to learn important geometric and physical properties of 3D molecules during molecular graph generation, as they adopt molecule-agnostic and non-geometric GNNs as their 3D graph denoising networks, which negatively impacts their ability to effectively scale to datasets of large 3D molecules. 
In this work, we address these gaps by introducing the Geometry-Complete Diffusion Model (GCDM) for 3D molecule generation, which outperforms existing 3D molecular diffusion models by significant margins across conditional and unconditional settings for the QM9 dataset as well as for the larger GEOM-Drugs dataset. Importantly, we demonstrate that the geometry-complete denoising process GCDM learns for 3D molecule generation allows the model to generate realistic and stable large molecules at the scale of GEOM-Drugs, whereas previous methods fail to do so with the features they learn. Additionally, we show that extensions of GCDM can not only effectively design 3D molecules for specific protein pockets but also that GCDM's geometric features can effectively be repurposed to directly optimize the geometry and chemical composition of existing 3D molecules for specific molecular properties, demonstrating new, real-world versatility of molecular diffusion models. Our source code and data are freely available at https://github.com/BioinfoMachineLearning/Bio-Diffusion.	翻訳日:2024-02-07 07:04:41 公開日:2024-02-05
# 拡散混合によるグラフ生成 Graph Generation with Diffusion Mixture ( http://arxiv.org/abs/2302.03596v3 ) ライセンス: Link先を確認	Jaehyeong Jo, Dongki Kim, Sung Ju Hwang	(参考訳) グラフの生成は、非ユークリッド構造の複雑な性質を理解する必要がある実世界のタスクにとって大きな課題である。拡散モデルは最近グラフ生成で顕著な成功を収めているが、ノイズのあるサンプルをデノイズする学習は生成すべきグラフ構造を明示的に学習しないため、グラフの位相特性のモデル化には不向きである。この制限に対処するために,拡散過程の最終グラフ構造を明示的に学習することにより,グラフのトポロジーをモデル化する生成フレームワークを提案する。具体的には、生成過程を、予測されたグラフに向けて駆動される終端条件付き拡散過程の混合として設計し、これにより高速な収束が得られる。さらに,混合過程の簡単なパラメータ化を導入し,最終グラフ構造を学習する目的関数を考案し,最尤学習を可能にした。一般グラフと2D/3D分子生成タスクに関する広範な実験的検証により,本手法は従来の生成モデルよりも優れ,連続的な特徴(例えば3D座標)と離散的な特徴(例えば原子型)の両方を持つ正しいトポロジのグラフを生成することを示す。私たちのコードは https://github.com/harryjo97/DruM で利用可能です。 Generation of graphs is a major challenge for real-world tasks that require understanding the complex nature of their non-Euclidean structures. Although diffusion models have achieved notable success in graph generation recently, they are ill-suited for modeling the topological properties of graphs since learning to denoise the noisy samples does not explicitly learn the graph structures to be generated. To tackle this limitation, we propose a generative framework that models the topology of graphs by explicitly learning the final graph structures of the diffusion process. Specifically, we design the generative process as a mixture of endpoint-conditioned diffusion processes which is driven toward the predicted graph that results in rapid convergence. We further introduce a simple parameterization of the mixture process and develop an objective for learning the final graph structure, which enables maximum likelihood training. Through extensive experimental validation on general graph and 2D/3D molecule generation tasks, we show that our method outperforms previous generative models, generating graphs with correct topology with both continuous (e.g. 3D coordinates) and discrete (e.g. atom types) features. Our code is available at https://github.com/harryjo97/DruM.	翻訳日:2024-02-07 07:04:06 公開日:2024-02-05
# モデルベースクラスタリングにおける規則化と最適化 Regularization and Optimization in Model-Based Clustering ( http://arxiv.org/abs/2302.02450v2 ) ライセンス: Link先を確認	Raphael Araujo Sampaio, Joaquim Dias Garcia, Marcus Poggi, Thibaut Vidal	(参考訳) 概念的単純さから、k平均アルゴリズムの変種は教師なしクラスタ分析に広く用いられている。しかし、これらのアルゴリズムの主な欠点の1つは、本質的に同じ球面ガウスの混合をそのような分布から大きく逸脱するデータに適合させることである。対照的に、ガウス混合モデル(GMM)はよりリッチな構造に適合するが、共分散行列を表現するためにクラスタ毎に2次数のパラメータを推定する必要がある。これは2つの大きな問題をもたらします (i) 局所最小値の多さにより、基礎となる最適化問題は困難である。 (ii) それらのソリューションはデータに過度に適合する。本研究では,両問題を回避した検索戦略を設計する。一般GMMのためのより効率的な最適化アルゴリズムを開発し、これらのアルゴリズムと正規化戦略を組み合わせて過度な適合を避ける。より広範な計算解析により,クラスタの回復性は著しく向上しないことが明らかとなった。しかしながら、これらのテクニックを組み合わせることで、これまでk-meansアルゴリズムの変種によって実現されていなかった全く新しいレベルのパフォーマンスが実現され、非常に異なるクラスタ構造を解き放ちます。これらの結果から, GMM と k-means 法の間の現状に新たな光を当て, 一般 GMM をデータ探索に利用することが示唆された。このようなアプリケーションを容易にするため、提案手法を実装したJuliaパッケージ(UnsupervisedClustering.jlとRegularizedCovarianceMatrices.jl)とともに、オープンソースコードを提供する。 Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estimating a quadratic number of parameters per cluster to represent the covariance matrices. This poses two main issues: (i) the underlying optimization problems are challenging due to their larger number of local minima, and (ii) their solutions can overfit the data. In this work, we design search strategies that circumvent both issues. We develop more effective optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting. Through extensive computational analyses, we observe that optimization or regularization in isolation does not substantially improve cluster recovery. 
However, combining these techniques permits a completely new level of performance previously unachieved by k-means algorithm variants, unraveling vastly different cluster structures. These results shed new light on the current status quo between GMM and k-means methods and suggest the more frequent use of general GMMs for data exploration. To facilitate such applications, we provide open-source code as well as Julia packages (UnsupervisedClustering.jl and RegularizedCovarianceMatrices.jl) implementing the proposed techniques.	翻訳日:2024-02-07 07:03:45 公開日:2024-02-05
# FedEBA+:エントロピーモデルによる公正かつ効果的なフェデレーション学習を目指して FedEBA+: Towards Fair and Effective Federated Learning via Entropy-Based Model ( http://arxiv.org/abs/2301.12407v4 ) ライセンス: Link先を確認	Lin Wang, Zhichao Wang, Sai Praneeth Karimireddy and Xiaoying Tang	(参考訳) 公平性を確保することは、モデルがすべてのクライアントで一貫した性能を発揮できるようにする連合学習(FL)の重要な側面である。しかしながら、グローバルモデルの性能を向上させつつ公平性も促進するFLアルゴリズムの設計は、後者の達成がしばしば前者とのトレードオフを必要とするため、依然として大きな課題である。そこで本研究では,グローバルモデルの性能向上と同時に公平性を高める新しいFLアルゴリズムFedEBA+を提案する。 FedEBA+には、パフォーマンスの低いクライアントにより高い重みを割り当てる公平なアグリゲーションスキームと、アライメント更新手法が組み込まれている。さらに、理論的収束解析を行い、FedEBA+の公平性を示す。大規模な実験により、FedEBA+は他のSOTAフェアネスFL法よりも公平性とグローバルモデルの性能の両面で優れていることが示された。 Ensuring fairness is a crucial aspect of Federated Learning (FL), which enables the model to perform consistently across all clients. However, designing an FL algorithm that simultaneously improves global model performance and promotes fairness remains a formidable challenge, as achieving the latter often necessitates a trade-off with the former. To address this challenge, we propose a new FL algorithm, FedEBA+, which enhances fairness while simultaneously improving global model performance. FedEBA+ incorporates a fair aggregation scheme that assigns higher weights to underperforming clients and an alignment update method. In addition, we provide theoretical convergence analysis and show the fairness of FedEBA+. Extensive experiments demonstrate that FedEBA+ outperforms other SOTA fairness FL methods in terms of both fairness and global model performance.	翻訳日:2024-02-07 07:02:57 公開日:2024-02-05
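The abstract only says that the aggregation "assigns higher weights to underperforming clients". One plausible way to realise that idea, shown below as a Python sketch, is a softmax over per-client losses with a temperature. The softmax form, the `temperature` parameter, and both function names are assumptions made for illustration, not details confirmed by the abstract.

```python
import math

def fair_weights(losses, temperature=1.0):
    """Softmax over client losses: a higher loss yields a larger weight.

    Assumed form for illustration; the maximum is subtracted before
    exponentiating for numerical stability.
    """
    top = max(losses)
    exps = [math.exp((loss - top) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(client_params, weights):
    """Weighted average of client parameter vectors (plain lists of floats)."""
    dim = len(client_params[0])
    return [sum(w * p[d] for w, p in zip(weights, client_params))
            for d in range(dim)]

# Hypothetical round: three clients with different local losses.
losses = [0.2, 1.5, 0.7]
weights = fair_weights(losses)
global_params = aggregate([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], weights)
```

Lowering the temperature concentrates weight on the worst-off client, while a large temperature approaches uniform averaging.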
# 大規模言語モデルにおけるアライメントの基本限界 Fundamental Limitations of Alignment in Large Language Models ( http://arxiv.org/abs/2304.11082v5 ) ライセンス: Link先を確認	Yotam Wolf, Noam Wies, Oshri Avnery, Yoav Levine, Amnon Shashua	(参考訳) 人間と対話する言語モデルを開発する上で重要な側面は、その振る舞いを人間のユーザにとって有用かつ無害なものに整合させることである。これは通常、望ましい振る舞いを強め、望ましくない振る舞いを抑制するようにモデルを調整することによって達成され、このプロセスはアライメントと呼ばれる。本稿では,行動予測境界 (BEB) と呼ばれる理論的手法を提案し,大規模言語モデルにおけるアライメントに固有のいくつかの特性と限界を形式的に調査する。重要なことに、このフレームワークの範囲内では、モデルが示す確率が有限である任意の振る舞いに対して、モデルにその振る舞いを出力させるプロンプトが存在し、その確率はプロンプトの長さとともに増加することを証明する。これは、望ましくない振る舞いを弱めるだけで完全には取り除かないアライメントプロセスは、敵対的なプロンプト攻撃に対して安全ではないことを意味する。さらに,この枠組みは,人間のフィードバックからの強化学習などの主要なアライメント手法が、LLMを望ましくない振る舞いへと誘導されやすくするメカニズムを示唆している。この理論的結果は、現代のいわゆる"chatGPT jailbreaks"によって大規模に実証されている。そこでは、敵対的なユーザがLLMを悪意のあるペルソナとして振る舞わせることで、アライメントのガードレールを破らせる。この結果は、LLMのアライメントにおける基本的な限界を明らかにし、AIの安全性を確保するための信頼性の高いメカニズムを考案する必要性を前面に押し出すものである。 An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. Importantly, we prove that within the limits of this framework, for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt. This implies that any alignment process that attenuates an undesired behavior but does not remove it altogether, is not safe against adversarial prompting attacks. Furthermore, our framework hints at the mechanism by which leading alignment approaches such as reinforcement learning from human feedback make the LLM prone to being prompted into the undesired behaviors. 
This theoretical result is being experimentally demonstrated at large scale by the so-called contemporary "chatGPT jailbreaks", where adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona. Our results expose fundamental limitations in the alignment of LLMs and bring to the forefront the need to devise reliable mechanisms for ensuring AI safety.	翻訳日:2024-02-07 06:56:04 公開日:2024-02-05
# d次元球の指示関数のためのフーリエ級数とディープニューラルネットワークの点収束 Pointwise convergence of Fourier series and deep neural network for the indicator function of d-dimensional ball ( http://arxiv.org/abs/2304.08172v4 ) ライセンス: Link先を確認	Ryota Kawasumi and Tsuyoshi Yoneda	(参考訳) 本稿では,ディープニューラルネットワークとフーリエ級数との重大な違いを明らかにする。いくつかの放射関数の周期化の複数のフーリエ級数について、倉壺(2010)は球面部分和の挙動を調べ、ギブス・ウィルブラハムやピンスキー現象以外の3つ目の現象を発見した。特に第3のものは、点収束の防止を示す。それとは対照的に、特定のディープニューラルネットワークを与え、ポイントワイド収束を証明する。 In this paper, we clarify the crucial difference between a deep neural network and the Fourier series. For the multiple Fourier series of periodization of some radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the behavior of the spherical partial sum and discovered the third phenomenon other than the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular, the third one exhibits prevention of pointwise convergence. In contrast to it, we give a specific deep neural network and prove pointwise convergence.	翻訳日:2024-02-07 06:54:51 公開日:2024-02-05
# エルゴード反復の強安定性について On the strong stability of ergodic iterations ( http://arxiv.org/abs/2304.04657v4 ) ライセンス: Link先を確認	László Györfi, Attila Lovas, Miklós Rásonyi	(参考訳) 定常かつエルゴード的な列によって駆動される反復ランダム関数によって生成される過程を再検討する。そのような過程は、その過程が定常かつエルゴード的となるランダムな初期化が存在し、かつ他の任意の初期化に対して二つの過程の差がほぼ確実にゼロに収束するとき、強安定であると呼ばれる。対応する再帰写像に関するいくつかの穏やかな条件の下で、駆動列には何の条件も課さずに、反復の強安定性を示す。一般化自己回帰や待ち行列など、いくつかの応用を概観する。さらに,依存雑音を伴うランジュバン型反復と多型分岐過程について新たな結果を導出する。 We revisit processes generated by iterated random functions driven by a stationary and ergodic sequence. Such a process is called strongly stable if a random initialization exists, for which the process is stationary and ergodic, and for any other initialization, the difference between the two processes converges to zero almost surely. Under some mild conditions on the corresponding recursive map, without any condition on the driving sequence, we show the strong stability of iterations. Several applications are surveyed such as generalized autoregression and queuing. Furthermore, new results are deduced for Langevin-type iterations with dependent noise and for multitype branching processes.	翻訳日:2024-02-07 06:54:27 公開日:2024-02-05
# CLADE: 異方性医用画像の高分解能化のためのサイクル損失増強 CLADE: Cycle Loss Augmented Degradation Enhancement for Unpaired Super-Resolution of Anisotropic Medical Images ( http://arxiv.org/abs/2303.11831v3 ) ライセンス: Link先を確認	Michele Pascale, Vivek Muthurangu, Javier Montalt Tordera, Heather E Fitzke, Gauraang Bhatnagar, Stuart Taylor, Jennifer Steeden	(参考訳) 3次元(3d)イメージングは医療用途で一般的であるが、スキャン時間を短縮するために、厚く低解像度のスライスを持つ異方性3dボリュームがしばしば取得される。ディープラーニング(DL)は、超解像再構成(SRR)によって高分解能特徴を回復するソリューションを提供する。残念ながら、多くの3D医療アプリケーションではペアトレーニングデータが利用できないため、新しいアンペアドアプローチであるCLADE(Cycle Loss Augmented Degradation Enhancement)を提案する。 CLADEはCycleGANアーキテクチャを改良したサイクル一貫性勾配写像損失を用いて、異方性3Dボリュームデータ自体の高分解能平面の不整合パッチから低次元のSRRを学習する。腹部MRIおよび腹部CTにおけるCLADEの有用性を示し,低分解能ボリュームおよび最先端の自己監督型SRR, SMORE(Synthetic Multi-Orientation Resolution Enhancement)に対するCLADE画像品質の大幅な改善を示した。定量的PIQUEスコアと定量的エッジシャープネス(ES)は,MRI,CTともにCLADEに優れていた。 CLADEは低解像度のボリュームとSMOREよりも画質が良く、知覚のESも高い。本稿では, CLADEを用いた異方性3次元画像データの超高分解能再構成の可能性について述べる。 Three-dimensional (3D) imaging is popular in medical applications, however, anisotropic 3D volumes with thick, low-spatial-resolution slices are often acquired to reduce scan times. Deep learning (DL) offers a solution to recover high-resolution features through super-resolution reconstruction (SRR). Unfortunately, paired training data is unavailable in many 3D medical applications and therefore we propose a novel unpaired approach; CLADE (Cycle Loss Augmented Degradation Enhancement). CLADE uses a modified CycleGAN architecture with a cycle-consistent gradient mapping loss, to learn SRR of the low-resolution dimension, from disjoint patches of the high-resolution plane within the anisotropic 3D volume data itself. We show the feasibility of CLADE in abdominal MRI and abdominal CT and demonstrate significant improvements in CLADE image quality over low-resolution volumes and state-of-the-art self-supervised SRR; SMORE (Synthetic Multi-Orientation Resolution Enhancement). 
Quantitative PIQUE (qualitative perception-based image quality evaluator) scores and quantitative edge sharpness (ES - calculated as the maximum gradient of pixel intensities over a border of interest) showed superior performance for CLADE in both MRI and CT. Qualitatively, CLADE had the best overall image quality and highest perceptual ES over the low-resolution volumes and SMORE. This paper demonstrates the potential of using CLADE for super-resolution reconstruction of anisotropic 3D medical imaging data without the need for paired 3D training data.	翻訳日:2024-02-07 06:52:01 公開日:2024-02-05
# 非平衡アトラスを用いた状態表現学習 State Representation Learning Using an Unbalanced Atlas ( http://arxiv.org/abs/2305.10267v2 ) ライセンス: Link先を確認	Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad	(参考訳) 多様体仮説は、高次元データはしばしば低次元多様体上にあり、この多様体を対象空間として利用するとより効率的な表現が得られると仮定する。多くの伝統的な多様体に基づく手法が次元の減少のために存在するが、自己教師あり学習への応用は遅い進歩を目撃している。最近のMSimCLR法は、多様体エンコーディングとSimCLRを組み合わせるが、その適用性を制限するために非常に低い目標エンコーディング次元を必要とする。本稿では,最先端の自己教師付き学習アプローチを超越するアンバランス・アトラス(ua)を用いた新しい学習パラダイムを提案する。提案したUAパラダイムに適合する時空間DeepInfomax(ST-DIM)フレームワークを適用して,DeepInfomaxを非平衡アトラス(DIM-UA)方式で検討・設計した。 DIM-UAの有効性はAtari Annotated RAM Interface (AtariARI)ベンチマークのトレーニングと評価を通じて実証される。 UAパラダイムは、ターゲット符号化次元の増大に伴って既存のアルゴリズムを大幅に改善する。例えば、DIM-UAのカテゴリの平均F1スコアは16384の隠れユニットを使用すると、ST-DIMの70%に比べて75%程度である。 The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that utilizing this manifold as the target space yields more efficient representations. While numerous traditional manifold-based techniques exist for dimensionality reduction, their application in self-supervised learning has witnessed slow progress. The recent MSimCLR method combines manifold encoding with SimCLR but requires extremely low target encoding dimensions to outperform SimCLR, limiting its applicability. This paper introduces a novel learning paradigm using an unbalanced atlas (UA), capable of surpassing state-of-the-art self-supervised learning approaches. We investigated and engineered the DeepInfomax with an unbalanced atlas (DIM-UA) method by adapting the Spatiotemporal DeepInfomax (ST-DIM) framework to align with our proposed UA paradigm. The efficacy of DIM-UA is demonstrated through training and evaluation on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves existing algorithms significantly as the number of target encoding dimensions grows. 
For instance, the mean F1 score averaged over categories of DIM-UA is ~75% compared to ~70% of ST-DIM when using 16384 hidden units.	翻訳日:2024-02-07 06:43:56 公開日:2024-02-05
# ハイパースペクトル画像の空間スペクトル圧縮のための生成逆ネットワーク Generative Adversarial Networks for Spatio-Spectral Compression of Hyperspectral Images ( http://arxiv.org/abs/2305.08514v2 ) ライセンス: Link先を確認	Akshara Preethy Byju, Martin Hermann Paul Fuchs, Alisa Walda, Begüm Demir	(参考訳) ハイパースペクトル画像(HSI)圧縮のための深層学習モデルの開発は,近年,ハイパースペクトルデータアーカイブの急激な増加により,リモートセンシングにおいて大きな注目を集めている。既存のモデルのほとんどはスペクトル圧縮か空間圧縮のいずれかを行うものであり、HSIに存在する空間スペクトル冗長性を同時には考慮していない。この問題に対処するため,本稿では,高忠実度圧縮(HiFiC)モデル(空間圧縮問題に非常に有効であることが証明されている)に着目し,HSIの空間スペクトル圧縮に適応させる。具体的には、HSI圧縮の枠組みにおいて2つの新しいモデルを導入する。 i) Squeeze and Excitation (SE) ブロックを用いたHiFiC(HiFiC$_{SE}$と表記)、および ii) 3D畳み込みを用いたHiFiC(HiFiC$_{3D}$と表記)である。本研究では,チャネルアテンションと相互依存性解析により,空間スペクトル冗長性の圧縮におけるHiFiC$_{SE}$とHiFiC$_{3D}$の有効性を解析する。実験結果から,より低いビットレートで、より高い再構成品質の画像を再構成しながら,空間スペクトル圧縮を行う上での提案モデルの有効性が示された。提案されたモデルのコードはhttps://git.tu-berlin.de/rsim/HSI-SSC で公開されている。 The development of deep learning-based models for the compression of hyperspectral images (HSIs) has recently attracted great attention in remote sensing due to the rapid growth of hyperspectral data archives. Most of the existing models achieve either spectral or spatial compression, and do not jointly consider the spatio-spectral redundancies present in HSIs. To address this problem, in this paper we focus our attention on the High Fidelity Compression (HiFiC) model (which is proven to be highly effective for spatial compression problems) and adapt it to perform spatio-spectral compression of HSIs. In detail, we introduce two new models: i) HiFiC using Squeeze and Excitation (SE) blocks (denoted as HiFiC$_{SE}$); and ii) HiFiC with 3D convolutions (denoted as HiFiC$_{3D}$) in the framework of compression of HSIs. We analyze the effectiveness of HiFiC$_{SE}$ and HiFiC$_{3D}$ in compressing the spatio-spectral redundancies with channel attention and inter-dependency analysis. 
Experimental results show the efficacy of the proposed models in performing spatio-spectral compression, while reconstructing images at reduced bitrates with higher reconstruction quality. The code of the proposed models is publicly available at https://git.tu-berlin.de/rsim/HSI-SSC .	翻訳日:2024-02-07 06:42:48 公開日:2024-02-05
# REMAST:ソフト移行によるリアルタイム感情に基づく音楽アレンジメント REMAST: Real-time Emotion-based Music Arrangement with Soft Transition ( http://arxiv.org/abs/2305.08029v2 ) ライセンス: Link先を確認	Zihao Wang, Le Ma, Chen Zhang, Bo Han, Yunfei Xu, Yikai Wang, Xinyi Chen, HaoRong Hong, Wenbo Liu, Xinda Wu, Kejun Zhang	(参考訳) 感情的な介入媒体としての音楽は、音楽療法、ゲーム、映画などのシナリオに重要な応用がある。しかし、音楽には感情の変化に応じてリアルタイムなアレンジメントが必要であり、ターゲット感情のきめ細かでミュータブルな性質のため、感情のリアルタイム適合性とソフトな感情移行のバランスをとるための課題をもたらす。既存の研究は主に感情をリアルタイムに適合させることに焦点が当てられているが、スムーズな移行の問題はまだ検討されており、音楽の全体的な感情的コヒーレンスに影響を与える。本稿では,このトレードオフに対応するためのREMASTを提案する。具体的には、最後のタイムステップの音楽感情を認識し、現在のタイムステップの入力感情と融合する。融合した感情はREMASTを誘導し、入力されたメロディに基づいて音楽を生成する。音楽の類似性と感情のリアルタイム適合性を柔軟に調整するために、オリジナルメロディを分解し、生成モデルに入力する。さらに、ドメイン知識による4つの音楽理論の特徴を設計し、感情情報を強化し、半教師付き学習を用いて、手動データセットアノテーションによる主観的バイアスを軽減する。評価結果によると,REMASTは客観的および主観的指標において最先端の手法を上回っている。これらの結果から,remastはリアルタイムの適合性と滑らかな遷移を同時に達成でき,生成音楽の一貫性が向上した。 Music as an emotional intervention medium has important applications in scenarios such as music therapy, games, and movies. However, music needs real-time arrangement according to changing emotions, bringing challenges to balance emotion real-time fit and soft emotion transition due to the fine-grained and mutable nature of the target emotion. Existing studies mainly focus on achieving emotion real-time fit, while the issue of smooth transition remains understudied, affecting the overall emotional coherence of the music. In this paper, we propose REMAST to address this trade-off. Specifically, we recognize the last timestep's music emotion and fuse it with the current timestep's input emotion. The fused emotion then guides REMAST to generate the music based on the input melody. To adjust music similarity and emotion real-time fit flexibly, we downsample the original melody and feed it into the generation model. 
Furthermore, we design four music theory features by domain knowledge to enhance emotion information and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, REMAST surpasses the state-of-the-art methods in objective and subjective metrics. These results demonstrate that REMAST achieves real-time fit and smooth transition simultaneously, enhancing the coherence of the generated music.	翻訳日:2024-02-07 06:42:28 公開日:2024-02-05
# エンタングルメント支援マルチパーティ計算の通信複雑性 Communication complexity of entanglement assisted multi-party computation ( http://arxiv.org/abs/2305.04435v4 ) ライセンス: Link先を確認	Ruoyu Meng, Aditya Ramamoorthy	(参考訳) $n$ 人のプレイヤーによるマルチパーティ関数計算問題の量子版と古典版を考える。この問題では、適切な約束を伴う「一般化された」内積関数を計算できるように、プレイヤー $2, \dots, n$ がプレイヤー1に適切な情報を伝達する必要がある。プロトコルの通信複雑性は、通信が必要なビットの総数である。 $n$ が素数で、我々が選択した関数の場合、量子プロトコル(複雑性 $(n-1) \log n$ ビット)と古典プロトコル(複雑性 $(n-1)^2 (\log n^2)$ ビット)を示す。量子プロトコルでは、プレイヤーはエンタングルしたquditにアクセスできるが、通信は古典的である。さらに,古典的通信複雑性の下界を決定する整数線形計画法の定式化を提案する。これは、我々の量子プロトコルが古典プロトコルよりも厳密に優れていることを示す。 We consider a quantum and classical version of a multi-party function computation problem with $n$ players, where players $2, \dots, n$ need to communicate appropriate information to player 1, so that a "generalized" inner product function with an appropriate promise can be calculated. The communication complexity of a protocol is the total number of bits that need to be communicated. When $n$ is prime and for our chosen function, we exhibit a quantum protocol (with complexity $(n-1) \log n$ bits) and a classical protocol (with complexity $(n-1)^2 (\log n^2)$ bits). In the quantum protocol, the players have access to entangled qudits but the communication is still classical. Furthermore, we present an integer linear programming formulation for determining a lower bound on the classical communication complexity. This demonstrates that our quantum protocol is strictly better than classical protocols.	翻訳日:2024-02-07 06:41:08 公開日:2024-02-05
# 連続手話認識のためのデノジング拡散アライメント Denoising-Diffusion Alignment for Continuous Sign Language Recognition ( http://arxiv.org/abs/2305.03614v3 ) ライセンス: Link先を確認	Leming Guo and Wanli Xue and Ze Kang and Yuxi Zhou and Tiantian Yuan and Zan Gao and Shengyong Chen	(参考訳) 社会的善の鍵として、連続手話認識(CSLR)は聴覚障害に対するアクティブでアクセスしやすいコミュニケーションを促進することを目的としている。現在のCSLR研究は、ビデオクリップ・テクスチュアル・グロス間のマッピング関係を学習するために、モダリティ間のアライメント方式を採用している。しかし、この局所的アライメント法、特に弱いデータアノテーションでは、モダリティの文脈情報を無視し、視覚的特徴の一般化を直接減らす。そこで,本稿では,映像グロス列のマッピングのモデル化に焦点をあてた,dda(denoising-diffusion global alignment scheme)を提案する。 DDAは部分的なノイズ発生処理戦略とデノイング拡散オートエンコーダから構成される。前者は視覚的モダリティに対するテキストモダリティの効率的なガイダンスを達成するために使用され、後者は2つのモダリティのグローバルアライメント情報を視覚的に学習する。 CSLRにおける視覚表現学習における拡散モデルの有効性を確認した。 3つの公開ベンチマーク実験により,本手法が最先端の性能を実現することを示す。さらに,提案手法は,他のCSLR手法を一般化するためのプラグアンドプレイ最適化である。 As a key to social good, continuous sign language recognition (CSLR) aims to promote active and accessible communication for the hearing impaired. Current CSLR research adopts a cross-modality alignment scheme to learn the mapping relationship between "video clip-textual gloss". However, this local alignment method, especially with weak data annotation, ignores the contextual information of modalities and directly reduces the generalization of visual features. To this end, we propose a novel Denoising-Diffusion global Alignment scheme (DDA), which focuses on modeling the mapping of the "entire video-gloss sequence". DDA consists of a partial noising process strategy and a denoising-diffusion autoencoder. The former is used to achieve efficient guidance of the text modality to the visual modality; the latter learns the global alignment information of the two modalities in a denoising manner. Our DDA confirms the feasibility of diffusion models for visual representation learning in CSLR. Experiments on three public benchmarks demonstrate that our method achieves state-of-the-art performances. 
Furthermore, the proposed method can be a plug-and-play optimization to generalize other CSLR methods.	翻訳日:2024-02-07 06:40:30 公開日:2024-02-05
# リモートセンシング時系列用軽量予習変圧器 Lightweight, Pre-trained Transformers for Remote Sensing Timeseries ( http://arxiv.org/abs/2304.14065v4 ) ライセンス: Link先を確認	Gabriel Tseng, Ruben Cartuyvels, Ivan Zvonkov, Mirali Purohit, David Rolnick, Hannah Kerner	(参考訳) 衛星データの機械学習手法は、社会的な応用範囲が広いが、モデルの訓練に用いられるラベルは、取得が困難か不可能である。セルフスーパービジョン(Self-supervision)は、ラベル付きデータに制限のある設定において自然なソリューションであるが、現在の衛星データに対するセルフ教師付きモデルは、時間次元(作物の成長を監視するなど、多くのアプリケーションにとって重要な)や、多くの補完センサーからのデータの可用性(モデルの予測性能を著しく向上させる)など、そのデータの特徴を生かしていない。リモートセンシングの時系列データに基づいて事前学習したPresto(Pretrained Remote Sensing Transformer)を提案する。リモートセンシングデータに特化してPrestoを設計することにより、はるかに小さいがパフォーマンスのモデルを作成することができる。 Prestoは、多種多様なグローバル分散リモートセンシングタスクに優れ、はるかに少ない計算を必要としながら、はるかに大きなモデルと競争的に機能する。 Prestoは、転送学習や単純なモデルの機能抽出に使用することができ、大規模に効率的にデプロイできる。 Machine learning methods for satellite data have a range of societally relevant applications, but labels used to train models can be difficult or impossible to acquire. Self-supervision is a natural solution in settings with limited labeled data, but current self-supervised models for satellite data fail to take advantage of the characteristics of that data, including the temporal dimension (which is critical for many applications, such as monitoring crop growth) and availability of data from many complementary sensors (which can significantly improve a model's predictive performance). We present Presto (the Pretrained Remote Sensing Transformer), a model pre-trained on remote sensing pixel-timeseries data. By designing Presto specifically for remote sensing data, we can create a significantly smaller but performant model. Presto excels at a wide variety of globally distributed remote sensing tasks and performs competitively with much larger models while requiring far less compute. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.	翻訳日:2024-02-07 06:39:34 公開日:2024-02-05
# Parrot Dilemma: 分類タスクにおける人間ラベルデータとLLM拡張データ The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks ( http://arxiv.org/abs/2304.13861v2 ) ライセンス: Link先を確認	Anders Giovanni Møller, Jacob Aarup Dalsgaard, Arianna Pera, Luca Maria Aiello	(参考訳) 計算社会科学(CSS)の領域では、実践者は複雑で低リソースのドメインを扱うことが多く、データの取得と注釈付けにコストと時間がかかるという課題に直面する。我々は,複雑度の異なる10個のCSS分類タスクにおいて,人間がラベル付けしたデータの利用と、GPT-4およびLlama-2で合成生成したデータの利用とを比較し,このような課題に対処するためのガイドラインを確立することを目指す。さらに,トレーニングデータのサイズがパフォーマンスに与える影響についても検討する。その結果,人間がラベル付けしたデータでトレーニングしたモデルは,合成データで拡張したモデルと比べて一貫して優れた,あるいは同等の性能を示すことがわかった。にもかかわらず、合成データによる拡張は、特に多クラスタスクにおける希少なクラスの性能向上に有益である。さらに, GPT-4 と Llama-2 をゼロショット分類に利用したところ, 概して高い性能を示すものの, 中規模の訓練セットで訓練した専用の分類器には及ばないことが多いことがわかった。 In the realm of Computational Social Science (CSS), practitioners often navigate complex, low-resource domains and face the costly and time-intensive challenges of acquiring and annotating data. We aim to establish a set of guidelines to address such challenges, comparing the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks of varying complexity. Additionally, we examine the impact of training data sizes on performance. Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts. Nevertheless, synthetic augmentation proves beneficial, particularly in improving performance on rare classes within multi-class tasks. Furthermore, we leverage GPT-4 and Llama-2 for zero-shot classification and find that, while they generally display strong performance, they often fall short when compared to specialized classifiers trained on moderately sized training sets.	翻訳日:2024-02-07 06:39:14 公開日:2024-02-05
# クロスエントロピー推定器を用いた強化学習による統計的に効率的なベイズ逐次実験設計 Statistically Efficient Bayesian Sequential Experiment Design via Reinforcement Learning with Cross-Entropy Estimators ( http://arxiv.org/abs/2305.18435v2 ) ライセンス: Link先を確認	Tom Blau, Iadine Chades, Amir Dezfouli, Daniel Steinberg, Edwin V. Bonilla	(参考訳) 強化学習は、実験のシーケンスを設計するための償却された設計方針を学ぶことができる。しかし、現在の償却手法は、期待情報利得(EIG)の推定器に頼っており、偏りのない推定を達成するには、EIGの大きさに対して指数的な数のサンプルを必要とする。本稿では,モデルの同時分布と柔軟な提案分布とのクロスエントロピーに基づく代替推定器の利用を提案する。この提案分布は、実験履歴と設計方針が与えられたときのモデルパラメータの真の事後分布を近似する。提案手法は,従来の手法の指数的なサンプル複雑性を克服し,高いEIG値をより正確に推定する。より重要なことに、優れた設計方針の学習を可能にし、連続および離散の設計空間、微分不可能な尤度、さらには暗黙的な確率モデルとも互換性がある。 Reinforcement learning can learn amortised design policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that require an exponential number of samples on the magnitude of the EIG to achieve an unbiased estimation. We propose the use of an alternative estimator based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method overcomes the exponential-sample complexity of previous approaches and provides more accurate estimates of high EIG values. More importantly, it allows learning of superior design policies, and is compatible with continuous and discrete design spaces, non-differentiable likelihoods and even implicit probabilistic models.	翻訳日:2024-02-07 06:31:27 公開日:2024-02-05
# 自己監督学習のための行列情報理論 Matrix Information Theory for Self-Supervised Learning ( http://arxiv.org/abs/2305.17326v4 ) ライセンス: Link先を確認	Yifan Zhang, Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan	(参考訳) 対照的な学習はしばしば、正のアンカーサンプルと複数の負のサンプルを比較して自己教師付き学習(ssl)を行う。しかし、BYOL、SimSiam、Barlow Twinsといった競合しないアプローチは、明示的な負のサンプルなしでSSLを実現する。本稿では,コントラスト的および非矛盾的学習法を多数記述した統一行列情報理論の枠組みを提案する。次に,行列情報理論に基づく新しい行列ssl法を提案する。実験結果から, Matrix-SSLは, 線形評価条件下でのImageNetデータセットや, 伝達学習タスクにおけるMS-COCOにおいて, 最先端の手法を著しく上回ることがわかった。具体的には,100エポック事前学習を行う場合,SimCLRの4.6%,MS-COCOで転送学習を行う場合,MoCo v2やBYOLなどの従来のSOTA手法よりも3.3%,800エポック前訓練に比べて400エポックに優れていた。コードはhttps://github.com/yifanzhang-pro/matrix-ssl。 Contrastive learning often relies on comparing positive anchor samples with multiple negative samples to perform Self-Supervised Learning (SSL). However, non-contrastive approaches like BYOL, SimSiam, and Barlow Twins achieve SSL without explicit negative samples. In this paper, we introduce a unified matrix information-theoretic framework that explains many contrastive and non-contrastive learning methods. We then propose a novel method Matrix-SSL based on matrix information theory. Experimental results reveal that Matrix-SSL significantly outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing 100 epochs pre-training, our method outperforms SimCLR by 4.6%, and when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.	翻訳日:2024-02-07 06:31:13 公開日:2024-02-05
# ベイズ原理によるニューラル加法モデルの改善 Improving Neural Additive Models with Bayesian Principles ( http://arxiv.org/abs/2305.16905v3 ) ライセンス: Link先を確認	Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin	(参考訳) ニューラル加法モデル(NAM)は、入力特徴を個別の加法サブネットワークで扱うことにより、ディープニューラルネットワークの透明性を高める。しかし、較正された不確実性を提供し、関連する特徴や相互作用の選択を可能にする固有のメカニズムを欠いている。ベイズの観点からNAMに取り組み、我々はこれを主に3つの方法で拡張する。 a) 個々の加法サブネットワークに対して信用区間を提供する、 b) 周辺尤度を推定し、経験ベイズ手続きによって特徴の暗黙的な選択を行う、 c) 微調整モデルにおける二次相互作用の候補として特徴ペアのランク付けを容易にする。特に、ラプラス近似NAM (LA-NAM) を開発し,表形式データセットおよび困難な実世界の医療タスクにおいて経験的性能が向上することを示す。 Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.	翻訳日:2024-02-07 06:30:40 公開日:2024-02-05
# ニューラル不完全分解:共役勾配法のための前処理行列の学習 Neural incomplete factorization: learning preconditioners for the conjugate gradient method ( http://arxiv.org/abs/2305.16368v2 ) ライセンス: Link先を確認	Paul Häusner, Ozan Öktem, Jens Sjölund	(参考訳) 共役勾配法のような反復解法を加速するための適切な前処理行列(プリコンディショナー)を見つけることは、活発な研究領域である。本稿では,通常は手作業で設計されるアルゴリズムをニューラルネットワークに置き換える,計算効率のよいデータ駆動手法を開発する。線形システムの条件数を直接最適化することは計算上実行不可能である。その代わり、この手法は行列の不完全分解を生成するため、ニューラル不完全分解(NeuralIF)と呼ぶ。効率的なトレーニングには,行列ベクトル乗算のみを必要とするフロベニウス損失の確率近似を用いる。本手法のコアとなるのは,スパース行列理論に着想を得た新しいメッセージパッシングブロックであり,行列のスパースな分解を求めるという目的に合致している。共役勾配法で使用される従来のプリコンディショナーをグラフニューラルネットワークに基づくデータ駆動モデルに置き換えることで,反復解法を高速化する。提案手法を,科学計算から生じる合成問題と実世界の問題の両方について評価し,計算効率を保ちながら求解時間を短縮できることを示す。 Finding suitable preconditioners to accelerate iterative solution methods, such as the conjugate gradient method, is an active area of research. In this paper, we develop a computationally efficient data-driven approach to replace the typically hand-engineered algorithms with neural networks. Optimizing the condition number of the linear system directly is computationally infeasible. Instead, our method generates an incomplete factorization of the matrix and is, therefore, referred to as neural incomplete factorization (NeuralIF). For efficient training, we utilize a stochastic approximation of the Frobenius loss which only requires matrix-vector multiplications. At the core of our method is a novel message-passing block, inspired by sparse matrix theory, that aligns with the objective of finding a sparse factorization of the matrix. By replacing conventional preconditioners used within the conjugate gradient method by data-driven models based on graph neural networks, we accelerate the iterative solving procedure. We evaluate our proposed method on both a synthetic and a real-world problem arising from scientific computing and show its ability to reduce the solving time while remaining computationally efficient.	翻訳日:2024-02-07 06:30:27 公開日:2024-02-05
# LAraBench: 大規模言語モデルによるアラビアAIのベンチマーク LAraBench: Benchmarking Arabic AI with Large Language Models ( http://arxiv.org/abs/2305.14982v2 ) ライセンス: Link先を確認	Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam	(参考訳) 近年のLarge Language Models(LLM)の進歩は、言語研究や音声研究の風景に大きな影響を与えている。この進歩にもかかわらず、これらのモデルは特定の言語やタスクに適した最新技術(SOTA)モデルに対する特定のベンチマークを欠いている。 larabench氏は、アラビア語自然言語処理(nlp)と音声処理タスクにおけるこのギャップに対処している。 gpt-3.5-turbo,gpt-4,bloomz,jais-13b-chat,whisper,usmなどのモデルを用いて,61の公開データセットにまたがる33の異なるタスクに取り組むためのゼロショットとマイショットの学習技術を用いた。これには98の実験的なセットアップが含まれ、約296Kのデータポイント、約46時間スピーチ、テキスト音声(TTS)30文が含まれていた。この試みにより330以上の実験が行われた。分析では,SOTAモデルとLLMの性能ギャップの測定に焦点をあてた。概して観察される傾向は、SOTAモデルは概してゼロショット学習においてLLMよりも優れており、例外もある。特に、少ないショット学習技術を持つ大きな計算モデルでは、パフォーマンスのギャップを低減できた。本研究は,アラビア語NLPおよび音声処理タスクにおけるLLMの適用性に関する貴重な知見を提供する。 Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We utilized models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS). This effort resulted in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. 
The overarching trend observed was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger computational models with few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.	翻訳日:2024-02-07 06:30:09 公開日:2024-02-05
# 次元還元型人間分類の合理的モデル A Rational Model of Dimension-reduced Human Categorization ( http://arxiv.org/abs/2305.14383v2 ) ライセンス: Link先を確認	Yifan Hong and Chen Wang	(参考訳) 人間はいくつかの重要な特徴に基づいてオブジェクトを分類する傾向がある。本稿では,確率的主成分分析器(mPPCA)の混合を利用した合理的分類モデルを提案する。このモデルは、特徴次元を縮小した各カテゴリを表現し、局所的な特徴をカテゴリ間で共有し、数発の学習を容易にする。理論的には、次元縮小表現が全次元表現を上回る必要十分条件を同定する。次に,行動実験において,実例とプロトタイプモデルに対する人間の分類予測において,mppcaの優れた性能を示す。畳み込みニューラルネットワークと組み合わせると、各カテゴリの単一の主成分次元を持つmPPCA分類器は、${\tt CIFAR-10H}$人間の分類データセット上の線形分類器でResNetに匹敵する性能を達成する。 Humans tend to categorize objects based on a few key features. We propose a rational model of categorization that utilizes a mixture of probabilistic principal component analyzers (mPPCA). This model represents each category with reduced feature dimensions and allows local features to be shared across categories to facilitate few-shot learning. Theoretically, we identify the necessary and sufficient condition for dimension-reduced representation to outperform full-dimension representation. We then show the superior performance of mPPCA in predicting human categorization over exemplar and prototype models in a behavioral experiment. When combined with the convolutional neural network, the mPPCA classifier with a single principal component dimension for each category achieves comparable performance to ResNet with a linear classifier on the ${\tt CIFAR-10H}$ human categorization dataset.	翻訳日:2024-02-07 06:29:46 公開日:2024-02-05
# HumBEL:人間と機械の会話における言語モデルの復号化要因の評価手法 HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations ( http://arxiv.org/abs/2305.14195v3 ) ライセンス: Link先を確認	Anthony Sicilia, Jennifer C. Gates, and Malihe Alikhani	(参考訳) 年齢や性別などの人口統計要因は、人々の話し方、特に機械との話し方を変えるが、これらの変化にどの程度大きな事前訓練された言語モデル(LM)が適応できるかは、ほとんど調査されていない。このギャップを是正するために,lm言語スキルの人口統計学的要因を計測し,対象集団との適合度を判断する方法を検討する。ヒトにおける言語スキル獲得の基準を持つ音声言語病理の臨床的手法を提案する。本稿では,専門医(臨床認可言語病理医)と共同で評価を行い,臨床評価を大規模に補完する自動手法を提案する。 GPT-3.5は、推論を必要とするタスクにおいて、6～15歳までの人間の能力を模倣し、同時に、記憶化時に典型的な21歳以上の能力より優れています。 GPT-3.5は社会語の使用にも問題があり、テストされた実用的スキルの50%以下である。 LMを公的なツールとして使用する場合、人口統計アライメントと会話目標を検討することが重要であることを確認する。コード、データ、パッケージが利用可能になる。 While demographic factors like age and gender change the way people talk, and in particular, the way people talk to machines, there is little investigation into how large pre-trained language models (LMs) can adapt to these changes. To remedy this gap, we consider how demographic factors in LM language skills can be measured to determine compatibility with a target demographic. We suggest clinical techniques from Speech Language Pathology, which has norms for acquisition of language skills in humans. We conduct evaluation with a domain expert (i.e., a clinically licensed speech language pathologist), and also propose automated techniques to complement clinical evaluation at scale. Empirically, we focus on age, finding LM capability varies widely depending on task: GPT-3.5 mimics the ability of humans ranging from age 6-15 at tasks requiring inference, and simultaneously, outperforms a typical 21 year old at memorization. GPT-3.5 also has trouble with social language use, exhibiting less than 50% of the tested pragmatic skills. Findings affirm the importance of considering demographic alignment and conversational goals when using LMs as public-facing tools. Code, data, and a package will be available.	翻訳日:2024-02-07 06:29:33 公開日:2024-02-05
# 分布シフトに対する大規模言語モデルのロバストプロンプト最適化 Robust Prompt Optimization for Large Language Models Against Distribution Shifts ( http://arxiv.org/abs/2305.13954v3 ) ライセンス: Link先を確認	Moxin Li, Wenjie Wang, Fuli Feng, Yixin Cao, Jizhi Zhang, Tat-Seng Chua	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて顕著な能力を示している。しかし、その効果はタスクプロンプトの表現に大きく依存しており、ラベル付きタスクデータを用いた自動プロンプト最適化の研究に繋がっている。我々は,これらのプロンプト最適化手法が,顧客レビュー分析などの現実シナリオにおいてLLMに共通するサブポピュレーションシフトなどの分布シフトに対して脆弱であることを明らかにする。そこで本研究では,ラベル付きソースグループに対して最適化されたプロンプトが,ラベルなしターゲットグループにも同時に一般化できることを要求する,分布シフトに対するLLMのロバストなプロンプト最適化という新しい問題を提案する。この問題を解決するため,ターゲットグループからのラベルなしデータをプロンプト最適化に組み込む Generalized Prompt Optimization フレームワークを提案する。大規模な実験結果から,提案フレームワークの有効性が示され,ターゲットグループでは性能が大幅に向上し,ソースグループでは同等の性能を維持した。 Large Language Models (LLMs) have demonstrated significant ability in various Natural Language Processing tasks. However, their effectiveness is highly dependent on the phrasing of the task prompt, leading to research on automatic prompt optimization using labeled task data. We reveal that these prompt optimization techniques are vulnerable to distribution shifts such as subpopulation shifts, which are common for LLMs in real-world scenarios such as customer reviews analysis. In this light, we propose a new problem of robust prompt optimization for LLMs against distribution shifts, which requires that the prompt optimized over the labeled source group simultaneously generalize to an unlabeled target group. To solve this problem, we propose the Generalized Prompt Optimization framework, which incorporates the unlabeled data from the target group into prompt optimization. Extensive experimental results demonstrate the effectiveness of the proposed framework with significant performance improvement on the target group and comparable performance on the source group.	翻訳日:2024-02-07 06:29:12 公開日:2024-02-05
# 行列補完を改善するための観測バイアスの活用 Exploiting Observation Bias to Improve Matrix Completion ( http://arxiv.org/abs/2306.04775v2 ) ライセンス: Link先を確認	Yassir Jedra, Sean Mann, Charlotte Park, Devavrat Shah	(参考訳) 本稿では,Ma と Chen が導入したモデルに類似したモデルを用いて,エントリが偏りをもって観測される行列補完の変種を考える。通常の場合のように、この観測バイアスを不利として扱う代わりに、目標は、バイアスと関心の結果の間の共有情報を利用して予測を改善することである。これに向けて,観測パターンと関心の結果が,基礎となる潜在要因(観測されない要因)の同じ集合によって駆動される自然なモデルを考える。これにより,2段階の行列補完アルゴリズムが導かれる: まず、観測パターンに対応する完全観測された雑音付き二値行列に対する行列補完を利用して潜在因子(の間の距離)を復元する; 次に、復元された潜在因子を特徴として、疎に観測された雑音付きの結果をラベルとして利用し、非パラメトリック教師付き学習を行う。有限サンプル誤差率解析は、対数因子を無視すれば、このアプローチが対応する教師付き学習のパラメトリックレートと競合することを示唆している。これは、2段階の手法がバイアスと結果の間の共有情報を活用することによって、観測されていない潜在要因にアクセスできる場合に匹敵する性能を持つことを意味する。実世界のデータセットを用いた経験的評価により, この2段階のアルゴリズムでは, 従来の行列補完法に比べて推定値の平均二乗誤差が30倍小さく, モデルと提案手法の有用性が示唆された。 We consider a variant of matrix completion where entries are revealed in a biased manner, adopting a model akin to that introduced by Ma and Chen. Instead of treating this observation bias as a disadvantage, as is typically the case, the goal is to exploit the shared information between the bias and the outcome of interest to improve predictions. Towards this, we consider a natural model where the observation pattern and outcome of interest are driven by the same set of underlying latent or unobserved factors. This leads to a two-stage matrix completion algorithm: first, recover (distances between) the latent factors by utilizing matrix completion for the fully observed noisy binary matrix corresponding to the observation pattern; second, utilize the recovered latent factors as features and sparsely observed noisy outcomes as labels to perform non-parametric supervised learning. The finite-sample error rate analysis suggests that, ignoring logarithmic factors, this approach is competitive with the corresponding supervised learning parametric rates. 
This implies the two-stage method has performance that is comparable to having access to the unobserved latent factors through exploiting the shared information between the bias and outcomes. Through empirical evaluation using a real-world dataset, we find that with this two-stage algorithm, the estimates have 30x smaller mean squared error compared to traditional matrix completion methods, suggesting the utility of the model and the method proposed in this work.	翻訳日:2024-02-07 06:18:21 公開日:2024-02-05
# Equity-Transformer: NP困難なMin-Maxルーティング問題を公平性コンテキスト付き逐次生成として解く Equity-Transformer: Solving NP-hard Min-Max Routing Problems as Sequential Generation with Equity Context ( http://arxiv.org/abs/2306.02689v3 ) ライセンス: Link先を確認	Jiwoo Son, Minsu Kim, Sanghyeok Choi, Hyeonah Kim, Jinkyoo Park	(参考訳) min-maxルーティング問題は、エージェントが協調的にタスクを実行することで、複数のエージェント間の最大ツアー長を最小化することを目的としている。これらの問題には影響の大きい実世界の応用が含まれるが、NP困難として知られている。既存の手法は、特に数千の都市をカバーするために多数のエージェントの調整を必要とする大規模な問題において課題に直面している。本稿では,大規模なmin-maxルーティング問題を解決するためにEquity-Transformerを提案する。まず、min-maxルーティング問題に対処するためにシーケンシャルプランニングアプローチを採用し、強力なシーケンスジェネレータ(Transformerなど)を活用する。第2に,エージェント間の公平なワークロード分布を確保するための重要な帰納バイアスを提案する。 Equity-Transformerの有効性は、min-maxマルチエージェント旅行セールスマン問題(min-max mTSP)とmin-maxマルチエージェントピックアップ・デリバリ問題(min-max mPDP)の2つの代表的なmin-maxルーティングタスクにおいて、優れた性能で実証されている。特に,100台の車両と1,000都市からなるmTSPにおいて,競争力のあるヒューリスティック(LKH3)と比較して,実行時間を約335分の1に短縮し,コストを約53%削減している。再現可能なソースコードは https://github.com/kaist-silab/equity-transformer で公開している。 Min-max routing problems aim to minimize the maximum tour length among multiple agents by having agents conduct tasks in a cooperative manner. These problems include impactful real-world applications but are known to be NP-hard. Existing methods face challenges, particularly in large-scale problems that require the coordination of numerous agents to cover thousands of cities. This paper proposes Equity-Transformer to solve large-scale min-max routing problems. First, we employ a sequential planning approach to address min-max routing problems, allowing us to harness powerful sequence generators (e.g., Transformer). Second, we propose key inductive biases that ensure equitable workload distribution among agents. The effectiveness of Equity-Transformer is demonstrated through its superior performance in two representative min-max routing tasks: the min-max multi-agent traveling salesman problem (min-max mTSP) and the min-max multi-agent pick-up and delivery problem (min-max mPDP). 
Notably, our method reduces runtime by a factor of approximately 335 and cost values by about 53% compared to a competitive heuristic (LKH3) in the case of mTSP with 100 vehicles and 1,000 cities. We provide reproducible source code: https://github.com/kaist-silab/equity-transformer.	翻訳日:2024-02-07 06:17:00 公開日:2024-02-05
# GAD-NR 近傍再構成によるグラフ異常検出 GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction ( http://arxiv.org/abs/2306.01951v7 ) ライセンス: Link先を確認	Amit Roy, Juan Shu, Jia Li, Carl Yang, Olivier Elshocht, Jeroen Smeets and Pan Li	(参考訳) Graph Anomaly Detection (GAD) は、グラフ内の異常ノードを識別し、ネットワークセキュリティ、不正検出、ソーシャルメディアスパム検出、その他さまざまな分野の応用を見つけるために用いられるテクニックである。 GADの一般的な方法は、グラフデータをノード表現にエンコードし、これらの表現に基づいてグラフの再構成品質を評価することによって異常を識別するグラフオートエンコーダ(GAE)である。しかし、既存のGAEモデルは直接リンク再構成に最適化されており、グラフに接続されたノードは潜在空間にクラスタ化される。その結果、クラスター型構造異常を検出するのに優れるが、クラスタに適合しないより複雑な構造異常に悩まされる。この制限に対処するため,グラフ異常検出のための近傍再構成を組み込んだGAEの新しい変種であるGAD-NRを提案する。 GAD-NRは、ノード表現に基づいて、ローカル構造、自己属性、および隣接属性を含むノードの近傍全体を再構築することを目的としている。異常ノードと正常ノード間の近傍再構成損失を比較することで、GAD-NRは任意の異常を効果的に検出できる。 6つの実世界のデータセットで実施された大規模な実験は、GAD-NRの有効性を検証し、最先端の競合相手よりも顕著な改善(AUCでは最大30%)を示す。 GAD-NRのソースコードが公開されている。比較分析の結果,既存の手法は3種類の異常から1種類または2種類の異常を検出する場合にのみ有効であることが判明した。対照的に、GAD-NRはデータセット全体の3種類の異常を検知し、その包括的な異常検出能力を示す。 Graph Anomaly Detection (GAD) is a technique used to identify abnormal nodes within graphs, finding applications in network security, fraud detection, social media spam detection, and various other domains. A common method for GAD is Graph Auto-Encoders (GAEs), which encode graph data into node representations and identify anomalies by assessing the reconstruction quality of the graphs based on these representations. However, existing GAE models are primarily optimized for direct link reconstruction, resulting in nodes connected in the graph being clustered in the latent space. As a result, they excel at detecting cluster-type structural anomalies but struggle with more complex structural anomalies that do not conform to clusters. To address this limitation, we propose a novel solution called GAD-NR, a new variant of GAE that incorporates neighborhood reconstruction for graph anomaly detection. 
GAD-NR aims to reconstruct the entire neighborhood of a node, encompassing the local structure, self-attributes, and neighbor attributes, based on the corresponding node representation. By comparing the neighborhood reconstruction loss between anomalous nodes and normal nodes, GAD-NR can effectively detect any anomalies. Extensive experimentation conducted on six real-world datasets validates the effectiveness of GAD-NR, showcasing significant improvements (by up to 30% in AUC) over state-of-the-art competitors. The source code for GAD-NR is openly available. Importantly, the comparative analysis reveals that the existing methods perform well only in detecting one or two types of anomalies out of the three types studied. In contrast, GAD-NR excels at detecting all three types of anomalies across the datasets, demonstrating its comprehensive anomaly detection capabilities.	翻訳日:2024-02-07 06:16:26 公開日:2024-02-05
# 対称リプレイトレーニング:組合せ最適化のための深層強化学習におけるサンプル効率の向上 Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization ( http://arxiv.org/abs/2306.01276v3 ) ライセンス: Link先を確認	Hyeonah Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park	(参考訳) 深層強化学習(DRL)は組合せ最適化(CO)の分野を著しく進歩させた。しかし、その実用性は、特に計算集約的な関数評価を伴うシナリオにおいて、多数の報酬評価が必要であることによって妨げられている。サンプル効率を向上させるために,SRT (symmetric replay training) と呼ばれる簡易かつ効果的な手法を提案する。この手法は様々なDRL法に容易に組み込むことができる。提案手法では,高報酬のサンプルを活用し,追加のオンラインインタラクションなしに(すなわち無償で)、十分に探索されていない対称領域の探索を促進する。リプレイトレーニングを通じて、方策は、発見された高報酬サンプルの対称軌道の尤度を最大化するように訓練される。実験により,分子最適化やハードウェア設計といった実世界の課題に適用した多様なDRL法に対して,本手法がサンプル効率を一貫して改善することを示した。 Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO). However, its practicality is hindered by the necessity for a large number of reward evaluations, especially in scenarios involving computationally intensive function assessments. To enhance the sample efficiency, we propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various DRL methods. Our method leverages high-reward samples to encourage exploration of the under-explored symmetric regions without additional online interactions, i.e., for free. Through replay training, the policy is trained to maximize the likelihood of the symmetric trajectories of discovered high-reward samples. Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks, such as molecular optimization and hardware design.	翻訳日:2024-02-07 06:15:57 公開日:2024-02-05
# 敵対的訓練におけるクリーン汎化とロバスト過学習の理解に向けて Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training ( http://arxiv.org/abs/2306.01271v2 ) ライセンス: Link先を確認	Binghui Li, Yuanzhi Li	(参考訳) 通常のディープラーニングの驚くべきパフォーマンスと同様に、敵対的訓練によって訓練されたディープネットも、$\textit{unseen clean data (natural data)}$に対してよく一般化する。しかし、敵対的訓練は低いロバストな訓練誤差を達成できるにもかかわらず、大きな$\textit{robust generalization gap}$が存在する。この現象を$\textit{Clean Generalization and Robust Overfitting (CGRO)}$と呼ぶ。本研究では,敵対的訓練におけるCGRO現象を,$\textit{representation complexity}$と$\textit{training dynamics}$という2つの視点から研究する。具体的には、$N$個の分離された訓練データ点を持つ二値分類の設定について検討する。 $\textit{First}$, $\operatorname{poly}(D)$サイズのクリーン分類器が存在するという仮定のもとで(ここで$D$はデータ次元)、わずか$O(N D)$個の追加パラメータを持つReLUネットがロバストな記憶を活用してCGROを達成できる一方、ロバストな分類器は最悪の場合に依然として指数関数的な表現複雑性を必要とすることを証明する。 $\textit{Next}$、訓練ダイナミクスを解析するために構造化データの場合に焦点を当て、幅$O(N D)$の2層畳み込みネットワークを敵対的摂動に対して訓練する。次に,3段階の相転移が学習過程中に発生し,ネットワークがロバストな記憶の領域に証明可能な形で収束し,その結果CGROが生じることを示す。 $\textit{Besides}$, 実画像認識データセットでの実験により理論的解析を実証的に検証する。 Similar to surprising performance in the standard deep learning, deep nets trained by adversarial training also generalize well for $\textit{unseen clean data (natural data)}$. However, although adversarial training can achieve low robust training error, there exists a significant $\textit{robust generalization gap}$. We call this phenomenon the $\textit{Clean Generalization and Robust Overfitting (CGRO)}$. In this work, we study the CGRO phenomenon in adversarial training from two views: $\textit{representation complexity}$ and $\textit{training dynamics}$. Specifically, we consider a binary classification setting with $N$ separated training data points. 
$\textit{First}$, we prove that, under the assumption that there is a $\operatorname{poly}(D)$-size clean classifier (where $D$ is the data dimension), a ReLU net with only $O(N D)$ extra parameters is able to leverage robust memorization to achieve CGRO, while a robust classifier still requires exponential representation complexity in the worst case. $\textit{Next}$, we focus on a structured-data case to analyze training dynamics, where we train a two-layer convolutional network with $O(N D)$ width against adversarial perturbation. We then show that a three-stage phase transition occurs during the learning process and the network provably converges to a robust memorization regime, which thereby results in CGRO. $\textit{Besides}$, we also empirically verify our theoretical analysis by experiments on real-image recognition datasets.	翻訳日:2024-02-07 06:15:42 公開日:2024-02-05
# In-Domain Self-Supervised Learningはリモートセンシング画像シーンの分類を改善する In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification ( http://arxiv.org/abs/2307.01645v2 ) ライセンス: Link先を確認	Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev	(参考訳) リモートセンシング画像解析におけるビジョンモデルのドメイン内自己教師付き事前学習の有用性について検討する。自己教師あり学習(SSL)は、大量のラベルなしデータを活用できるため、リモートセンシング画像分類に有望なアプローチとして登場した。従来の教師付き学習とは異なり、SSLは明示的なラベルなしでデータの表現を学習することを目指している。これは、所定の下流タスクで微調整する前に、モデルの事前学習に使用できる補助タスクを定式化することで実現される。 SSL事前学習の実際によく見られるアプローチは、ImageNetなどの標準的な事前学習データセットを活用することだ。このような一般的なアプローチは有意義ではあるが、モデルの下流性能、特にリモートセンシングのような挑戦的な領域のタスクに対して、最適でない影響を与える可能性がある。本稿では,大規模かつラベルなしのリモートセンシングデータセットであるMillion-AIDでトレーニングしたビジョントランスフォーマーと組み合わせたiBOTフレームワークを用いて,SSL事前学習の有効性を解析する。本稿では,様々な自己教師付き事前学習戦略を総合的に研究し,その効果を様々な特性を持つ14の下流データセットで評価する。その結果,自己教師付き事前学習に大規模ドメイン内データセットを活用すると,実際によく見られる標準的なアプローチと比較して,下流の予測性能が一貫して向上することが示された。 We investigate the utility of in-domain self-supervised pre-training of vision models in the analysis of remote sensing imagery. Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to exploit large amounts of unlabeled data. Unlike traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used for pre-training models before fine-tuning them on a given downstream task. A common approach in practice to SSL pre-training is utilizing standard pre-training datasets, such as ImageNet. While relevant, such a general approach can have a sub-optimal influence on the downstream performance of models, especially on tasks from challenging domains such as remote sensing. In this paper, we analyze the effectiveness of SSL pre-training by employing the iBOT framework coupled with Vision transformers trained on Million-AID, a large and unlabeled remote sensing dataset. 
We present a comprehensive study of different self-supervised pre-training strategies and evaluate their effect across 14 downstream datasets with diverse properties. Our results demonstrate that leveraging large in-domain datasets for self-supervised pre-training consistently leads to improved predictive downstream performance, compared to the standard approaches found in practice.	翻訳日:2024-02-07 06:08:39 公開日:2024-02-05
# エントロピーの量子ニューラル推定 Quantum Neural Estimation of Entropies ( http://arxiv.org/abs/2307.01171v2 ) ライセンス: Link先を確認	Ziv Goldfeld, Dhrumil Patel, Sreejith Sreekumar, and Mark M. Wilde	(参考訳) エントロピー測度は、量子システムに存在する情報量と相関を定量化する。実際には、量子状態が未知でそのコピーのみが利用可能である場合には、そのようなエントロピー測度の推定に頼る必要がある。ここでは、フォン・ノイマンエントロピーとRényiエントロピー、および測定された相対エントロピーと測定されたRényi相対エントロピーを推定するための変分量子アルゴリズムを提案する。提案手法は,まず関心のある測度の変分公式を量子回路と古典的ニューラルネットワークによってパラメータ化し,次に得られた目的関数をパラメータ空間上で最適化する。ノイズレス量子シミュレータを用いて,我々の量子アルゴリズムの数値シミュレーションを行った。このアルゴリズムは、テストした例において様々なエントロピー測度を正確に推定しており、下流タスクでの使用に有望なアプローチである。 Entropy measures quantify the amount of information and correlation present in a quantum system. In practice, when the quantum state is unknown and only copies thereof are available, one must resort to the estimation of such entropy measures. Here we propose a variational quantum algorithm for estimating the von Neumann and Rényi entropies, as well as the measured relative entropy and measured Rényi relative entropy. Our approach first parameterizes a variational formula for the measure of interest by a quantum circuit and a classical neural network, and then optimizes the resulting objective over parameter space. Numerical simulations of our quantum algorithm are provided, using a noiseless quantum simulator. The algorithm provides accurate estimates of the various entropy measures for the examples tested, which renders it a promising approach for usage in downstream tasks.	翻訳日:2024-02-07 06:08:17 公開日:2024-02-05
# スムーズド・フィットネス・ランドスケープによるタンパク質最適化の改善 Improving Protein Optimization with Smoothed Fitness Landscapes ( http://arxiv.org/abs/2307.00494v2 ) ライセンス: Link先を確認	Andrew Kirjner, Jason Yim, Raman Samusevich, Shahar Bracha, Tommi Jaakkola, Regina Barzilay, Ila Fiete	(参考訳) 望ましい性質に対して高い適応度を持つ新規なタンパク質を設計する能力は、バイオテクノロジーと医学にとって革命的であろう。配列の組合せ的に大きな空間をモデル化することは不可能であり、以前の手法はしばしば小さな突然変異半径に最適化を制約するが、これは設計空間を著しく制限する。ヒューリスティックスの代わりに,タンパク質の最適化を容易にするために,適応度地形の平滑化を提案する。まず、タンパク質の適応度をグラフ信号として定式化し、次にTikhonov正則化を用いて適応度地形を平滑化する。この平滑化された地形上で最適化することで、GFPとAAVのベンチマークにおいて複数の手法の性能が向上する。第2に、平滑化された地形において離散エネルギーベースモデルとMCMCを用いることで最先端の結果を得る。提案手法はGibbs sampling with Graph-based Smoothing (GGS) と呼ばれ, トレーニングセットに対して2.5倍の適応度向上(in-silico評価)を達成できることを示す。 GGSは、限られたデータの領域でタンパク質を最適化する可能性を実証している。コード: https://github.com/kirjner/GGS The ability to engineer novel proteins with higher fitness for a desired property would be revolutionary for biotechnology and medicine. Modeling the combinatorially large space of sequences is infeasible; prior methods often constrain optimization to a small mutational radius, but this drastically limits the design space. Instead of heuristics, we propose smoothing the fitness landscape to facilitate protein optimization. First, we formulate protein fitness as a graph signal and then use Tikhonov regularization to smooth the fitness landscape. We find optimizing in this smoothed landscape leads to improved performance across multiple methods in the GFP and AAV benchmarks. Second, we achieve state-of-the-art results utilizing discrete energy-based models and MCMC in the smoothed landscape. Our method, called Gibbs sampling with Graph-based Smoothing (GGS), demonstrates a unique ability to achieve 2.5 fold fitness improvement (with in-silico evaluation) over its training set. GGS demonstrates potential to optimize proteins in the limited data regime. Code: https://github.com/kirjner/GGS	翻訳日:2024-02-07 06:08:03 公開日:2024-02-05
# ひび割れセグメンテーションのための意味ガイダンス付きリアルタイム高分解能ニューラルネットワーク Real-time High-Resolution Neural Network with Semantic Guidance for Crack Segmentation ( http://arxiv.org/abs/2307.00270v2 ) ライセンス: Link先を確認	Yongshang Li, Ronggui Ma, Han Liu and Gaoli Cheng	(参考訳) 深層学習はひび割れセグメンテーションにおいて重要な役割を果たすが、ほとんどの研究は、このタスクのために特別に開発されていない既製のモデルや改良されたモデルを利用している。物体の位置と詳細に敏感な高分解能畳み込みニューラルネットワークは、ひび割れセグメンテーションの性能を向上させるが、リアルタイム検出と両立しにくい。本稿では,ひび割れセグメンテーションに特化して設計された意味ガイダンス付き高分解能ネットワークHrSegNetについて述べる。HrSegNetは,ひび割れの詳細を保ちながらリアルタイムの推論速度を保証する。複合データセット CrackSeg9k とシナリオ固有のデータセット Asphalt3k と Concrete3k で評価した結果、HrSegNet は、比較したモデルをはるかに上回る最先端のセグメンテーション性能と効率を得た。このアプローチは、高解像度モデリングとリアルタイム検出の間にトレードオフが存在することを示しており、実世界のアプリケーションにおけるエッジデバイスを用いたひび割れ解析を後押しする。 Deep learning plays an important role in crack segmentation, but most work utilizes off-the-shelf or improved models that have not been specifically developed for this task. High-resolution convolutional neural networks that are sensitive to objects' location and detail help improve the performance of crack segmentation, yet conflict with real-time detection. This paper describes HrSegNet, a high-resolution network with semantic guidance specifically designed for crack segmentation, which guarantees real-time inference speed while preserving crack details. After evaluation on the composite dataset CrackSeg9k and the scenario-specific datasets Asphalt3k and Concrete3k, HrSegNet obtains state-of-the-art segmentation performance and efficiencies that far exceed those of the compared models. This approach demonstrates that there is a trade-off between high-resolution modeling and real-time detection, which fosters the use of edge devices to analyze cracks in real-world applications.	翻訳日:2024-02-07 06:07:42 公開日:2024-02-05
# スパース・シーケンシャル・マイクロドップラー・リコンストラクションのための注意深いアンロール法 Attention-Refined Unrolling for Sparse Sequential micro-Doppler Reconstruction ( http://arxiv.org/abs/2306.14233v2 ) ライセンス: Link先を確認	Riccardo Mazzieri, Jacopo Pegoraro and Michele Rossi	(参考訳) ヒトの動きのマイクロドップラーシグネチャの再構築は、微細な活動認識無線センシングの鍵となる。 JCS(Joint Communication and Sensing)システムでは、専用レーダーセンシングシステムとは異なり、検知精度と通信オーバーヘッドの間の適切なトレードオフが達成されなければならない。通信パケットから得られたチャネル推定の不完全な窓からマイクロドップラーを再構築する必要がある。既存のアプローチでは圧縮センシングを活用しているが、ほんの数チャンネルの計測しか利用できない場合、非常に貧弱なリコンストラクションが発生する。加えて、収束するために必要な多数のイテレーションは、リアルタイムシステムでの使用を妨げる。本研究では,高度不完全チャネル計測からヒト運動のマイクロドップラー配列を再構成するニューラルネットワークSTARを提案する。 STARは、新しいアーキテクチャ設計に基づいており、出力に使用される1つの非ロールの反復的ハードスレッディング層とアテンションメカニズムを組み合わせたものである。その結果、モデルベースとデータ駆動の両方のソリューションの利点を享受する解釈可能で軽量なアーキテクチャが生まれます。 STARは、人間の活動トレースを60GHzのチャネルで測定した公共のJSSデータセットで評価される。実験結果から, 再建したマイクロドップラーの品質において, 最先端技術を大幅に上回ることがわかった。注目すべきは、STARは既存の技術が失敗するチャネル測定の90%を欠いている場合でも、十分な精度で人間の活動認識を可能にすることである。 The reconstruction of micro-Doppler signatures of human movements is a key enabler for fine-grained activity recognition wireless sensing. In Joint Communication and Sensing (JCS) systems, unlike in dedicated radar sensing systems, a suitable trade-off between sensing accuracy and communication overhead has to be attained. It follows that the micro-Doppler has to be reconstructed from incomplete windows of channel estimates obtained from communication packets. Existing approaches exploit compressed sensing, but produce very poor reconstructions when only a few channel measurements are available, which is often the case with real communication patterns. In addition, the large number of iterations they need to converge hinders their use in real-time systems. In this work, we propose and validate STAR, a neural network that reconstructs micro-Doppler sequences of human movement even from highly incomplete channel measurements. 
STAR is based upon a new architectural design that combines a single unrolled iterative hard-thresholding layer with an attention mechanism, used at its output. This results in an interpretable and lightweight architecture that reaps the benefits of both model-based and data driven solutions. STAR is evaluated on a public JCS dataset of 60 GHz channel measurements of human activity traces. Experimental results show that it substantially outperforms state-of-the-art techniques in terms of the reconstructed micro-Doppler quality. Remarkably, STAR enables human activity recognition with satisfactory accuracy even with 90% of missing channel measurements, for which existing techniques fail.	翻訳日:2024-02-07 06:06:18 公開日:2024-02-05
# 大規模ランダムクロネッカーグラフの解析と近似推定 Analysis and Approximate Inference of Large Random Kronecker Graphs ( http://arxiv.org/abs/2306.08489v2 ) ライセンス: Link先を確認	Zhenyu Liao, Yuanqian Xia, Chengmei Niu, Yong Xiao	(参考訳) ランダムグラフモデルは、ソーシャルネットワーク、通信システム、生理学的および生物学的ネットワークなど、様々な分野でますます重要な役割を担っている。この中で、ランダムクロネッカーグラフモデルは、複雑な現実世界のネットワークを精査するための顕著なフレームワークとして現れる。本稿では,大規模なランダムクロネッカーグラフ,すなわちグラフ頂点数$N$が大きい場合を考察する。ランダム行列理論(RMT)と高次元統計学の最近の進歩に基づいて、大きなランダムクロネッカーグラフの隣接行列が、スペクトルノルムの意味で、グラフパラメータに関して線形な低ランク(ランク$O(\log N)$)の信号行列と、ゼロ平均のランダムノイズ行列の2つの部分に分解できることを証明する。この結果に基づき,計算量を大幅に削減して主要なグラフパラメータを推定する ``denoise-and-solve'' 手法を提案する。提案手法を評価するために,グラフ推定と分類の両方の実験を行った。どちらのタスクにおいても、提案されたアプローチは、グラフサイズ$N$に対して線形にスケールする時間コストで、広く使われているグラフ推論(例えば、KronFit)やグラフニューラルネットワークのベースラインと同等あるいはより良い性能をもたらす。 Random graph models are playing an increasingly important role in various fields ranging from social networks, telecommunication systems, to physiologic and biological networks. Within this landscape, the random Kronecker graph model emerges as a prominent framework for scrutinizing intricate real-world networks. In this paper, we investigate large random Kronecker graphs, i.e., the number of graph vertices $N$ is large. Built upon recent advances in random matrix theory (RMT) and high-dimensional statistics, we prove that the adjacency of a large random Kronecker graph can be decomposed, in a spectral norm sense, into two parts: a small-rank (of rank $O(\log N)$) signal matrix that is linear in the graph parameters and a zero-mean random noise matrix. Based on this result, we propose a ``denoise-and-solve'' approach to infer the key graph parameters, with significantly reduced computational complexity. Experiments on both graph inference and classification are presented to evaluate our proposed method. 
In both tasks, the proposed approach yields performance comparable to or better than widely used graph inference (e.g., KronFit) and graph neural net baselines, at a time cost that scales linearly with the graph size $N$.	翻訳日:2024-02-07 06:03:47 公開日:2024-02-05
# squeezellm: 密度と分散の量子化 SqueezeLLM: Dense-and-Sparse Quantization ( http://arxiv.org/abs/2306.07629v3 ) ライセンス: Link先を確認	Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer	(参考訳) 生成型大規模言語モデル(LLM)は、幅広いタスクに対して顕著な結果を示した。しかしながら,これらのモデルを推論用にデプロイすることは,前例のないリソース要件のために大きな課題となっている。これにより、既存のデプロイメントフレームワークでは、複雑でコストがかかるマルチGPU推論パイプラインの使用や、より小型でパフォーマンスの低いモデルの使用を余儀なくされている。本研究では, LLMを用いた生成推論の主なボトルネックは, 計算よりもメモリ帯域幅であることを示す。量子化はモデル重みを精度の低下で表現することで有望な解として現れてきたが、以前の試みはしばしば顕著な性能劣化をもたらした。学習後量子化フレームワークであるSqueezeLLMを導入し、最大3ビットの超低精度でのロスレス圧縮を可能にするとともに、同じメモリ制約下で高い量子化性能を実現する。私たちの枠組みには2つの新しいアイデアが組み込まれています (i)第2次情報に基づいて最適なビット精度を探索する感度に基づく非一様量子化 (ii)異常値や感度の高い重み値を効率的なスパース形式に格納する密度とスパース分解。 LLaMAモデルに適用した場合、我々の3ビット量子化はFP16ベースラインからのパープレキシティギャップを、同じメモリ要件の最先端手法と比較して最大2.1倍削減する。さらに、A6000 GPUにデプロイすると、我々の量子化モデルはベースラインと比較して最大2.3倍のスピードアップを達成する。私たちのコードはオープンソースで、オンラインで利用可能です。 Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. 
Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.	翻訳日:2024-02-07 06:03:23 公開日:2024-02-05
# 光量子メモリに実装したプログラマブル空間分散によるスペクトル-位置マッピング Spectrum-to-position mapping via programmable spatial dispersion implemented in an optical quantum memory ( http://arxiv.org/abs/2308.01793v2 ) ライセンス: Link先を確認	Marcin Jastrzębski, Stanisław Kurzyna, Bartosz Niewelt, Mateusz Mazelanik, Wojciech Wasilewski, Michał Parniak	(参考訳) 分光時間処理は、光通信やメトロジーにおいて、究極の光子当たり情報容量に達するのに不可欠である。しかし、空間領域とは対照的に、時間周波数領域における複雑なマルチモード処理は困難である。本稿では、勾配エコー量子メモリにおける空間スピン波変調技術を用いたスペクトル対位置変換のプロトコルを提案する。このようにして、2つの領域をリンクし、従来の光学を用いた空間モードで純粋に処理を行えるようにする。本稿では,インタフェースの特性評価と,Cramér-Rao境界との比較を含む周波数推定の不確かさの議論を示す。実験結果は数値シミュレーションによって裏付けられている。測定は単一光子レベルで行われ、付加雑音が低いことを実証し、光子が乏しい状況での適用性を示した。本研究は超精密分光の展望を示し、量子・古典的通信、センシング、コンピューティングにおいて多くのプロトコルを強化する機会を提供する。 Spectro-temporal processing is essential in reaching ultimate per-photon information capacity in optical communication and metrology. In contrast to the spatial domain, complex multimode processing in the time-frequency domain is however challenging. Here we propose a protocol for spectrum-to-position conversion using a spatial spin-wave modulation technique in a gradient echo quantum memory. This way we link the two domains and allow the processing to be performed purely on the spatial modes using conventional optics. We present the characterization of our interface as well as a discussion of the frequency estimation uncertainty, including a comparison with the Cramér-Rao bound. The experimental results are backed up by numerical simulations. The measurements were performed on a single-photon level demonstrating low added noise and proving applicability in a photon-starved regime. Our results hold prospects for ultra-precise spectroscopy and present an opportunity to enhance many protocols in quantum and classical communication, sensing, and computing.	翻訳日:2024-02-07 05:55:33 公開日:2024-02-05
# 高歪レジームにおける単位ノルムベクトルの最適圧縮 Optimal Compression of Unit Norm Vectors in the High Distortion Regime ( http://arxiv.org/abs/2307.07941v2 ) ライセンス: Link先を確認	Heng Zhu, Avishek Ghosh, Arya Mazumdar	(参考訳) 通信効率の高い分散学習の必要性に動機づけられて,単位ノルムベクトルを最小ビット数に圧縮する方法を検討した。この問題は、率歪み/カバレッジコード文学において検討されてきたが、我々の焦点は「高歪」体制に限られている。我々は,ベクトルに関する事前情報を得ることなく,ランダム化圧縮マップを使用可能な,最悪のシナリオでこの問題にアプローチする。本研究は, バイアス圧縮法と非バイアス圧縮法の両方を考察し, 最適圧縮率を決定する。このシナリオでは、単純な圧縮スキームがほぼ最適であることがわかった。結果は新しいものと既知のものが混在しているが、完全性のためにこの論文にまとめられている。 Motivated by the need for communication-efficient distributed learning, we investigate the method for compressing a unit norm vector into the minimum number of bits, while still allowing for some acceptable level of distortion in recovery. This problem has been explored in the rate-distortion/covering code literature, but our focus is exclusively on the "high-distortion" regime. We approach this problem in a worst-case scenario, without any prior information on the vector, but allowing for the use of randomized compression maps. Our study considers both biased and unbiased compression methods and determines the optimal compression rates. It turns out that simple compression schemes are nearly optimal in this scenario. While the results are a mix of new and known, they are compiled in this paper for completeness.	翻訳日:2024-02-07 05:54:04 公開日:2024-02-05
# 長いステップによる証明可能に高速な勾配降下 Provably Faster Gradient Descent via Long Steps ( http://arxiv.org/abs/2307.06324v5 ) ライセンス: Link先を確認	Benjamin Grimmer	(参考訳) 本研究は, コンピュータ支援解析手法により,滑らかな凸最適化における勾配降下の新たな収束保証を確立する。本理論は、ほとんどの一階法の解析で使われる典型的な1反復ごとの帰納法ではなく、多くの反復の全体的な効果を一度に解析することにより、降下性を破る可能性のある頻繁な長いステップを含む非定数ステップサイズ方策を許容する。短期的には目的関数値を増加させうる長いステップが、長期的には証明可能な形でより速い収束をもたらすことを示す。勾配降下のより高速な$O(1/T\log T)$レートの証明に向けた予想も、簡単な数値検証とともに動機付けられる。 This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.	翻訳日:2024-02-07 05:52:10 公開日:2024-02-05
# 画像キャプションのための視覚言語モデルの線形アライメント Linear Alignment of Vision-language Models for Image Captioning ( http://arxiv.org/abs/2307.05591v2 ) ライセンス: Link先を確認	Fabian Paischer, Markus Hofmarcher, Sepp Hochreiter, Thomas Adler	(参考訳) 近年、CLIPのような視覚言語モデルは、画像キャプションやキャプション評価など、様々なマルチモーダルタスクにおいて、技術の進歩を遂げている。多くのアプローチは、CLIPと言語モデルの間のマッピングネットワークをトレーニングすることで、CLIPスタイルのモデルを下流タスクに適応させる。これは通常、大きなモデルの勾配を計算するためコストがかかる。本稿では,CLIPの画像とテキストの埋め込みを,クローズドフォームで線形にマッピングする,より効率的なトレーニングプロトコルを提案する。これにより勾配計算の必要性を回避し、既存の軽量メソッドの最大1000倍の速度でトレーニング可能な、recapと呼ばれる軽量キャプションメソッドが実現される。さらに,CLIPスコアに基づく2つの新しい学習ベースの画像キャプチャーメトリクスと線形マッピングを提案する。さらにrecapと新しいメトリクスを組み合わせることで,合成キャプションに基づく反復型データストア・オーグメンテーションループ(dal)を設計する。我々はms-coco,flickr30k,vizwiz,msrvttのリキャップを評価した。 Flickr8k-Expert や Flickr8k-Crowdflower での人間の評価と整合性が高いため、既存のメトリクスでは最先端の軽量メソッドに匹敵するパフォーマンスを実現しています。最後に、recapが他のドメインにうまく移行し、dalがパフォーマンス向上につながることを実証します。 Recently, vision-language models like CLIP have advanced the state of the art in a variety of multi-modal tasks including image captioning and caption evaluation. Many approaches adapt CLIP-style models to a downstream task by training a mapping network between CLIP and a language model. This is costly as it usually involves calculating gradients for large models. We propose a more efficient training protocol that fits a linear mapping between image and text embeddings of CLIP via a closed-form solution. This bypasses the need for gradient computation and results in a lightweight captioning method called ReCap, which can be trained up to 1000 times faster than existing lightweight methods. Moreover, we propose two new learning-based image-captioning metrics that build on CLIP score along with our linear mapping. Furthermore, we combine ReCap with our new metrics to design an iterative datastore-augmentation loop (DAL) based on synthetic captions. We evaluate ReCap on MS-COCO, Flickr30k, VizWiz, and MSRVTT. 
ReCap achieves performance comparable to state-of-the-art lightweight methods on established metrics while outperforming them on our new metrics, which are better aligned with human ratings on Flickr8k-Expert and Flickr8k-Crowdflower. Finally, we demonstrate that ReCap transfers well to other domains and that our DAL leads to a performance boost.	翻訳日:2024-02-07 05:51:59 公開日:2024-02-05
# 迅速な経験的シナリオ Fast Empirical Scenarios ( http://arxiv.org/abs/2307.03927v2 ) ライセンス: Link先を確認	Michael Multerer, Paul Schneider, Rohan Sen	(参考訳) 我々は,サンプルモーメントと整合する大規模かつ高次元のパネルデータから,少数の代表的なシナリオを抽出することを目指す。 2つの新しいアルゴリズムのうち、最初に観測されていないシナリオを識別し、共分散行列のシナリオベースで表現する。第2の提案は、既に実現済みで、高次のサンプルモーメント情報と整合した世界の状態から重要なデータポイントを選択する。どちらのアルゴリズムも計算に効率的であり、一貫したシナリオベースモデリングと高次元数値積分に役立てる。広範囲な数値ベンチマーク研究とポートフォリオ最適化への応用により,提案手法が好まれる。 We seek to extract a small number of representative scenarios from large and high-dimensional panel data that are consistent with sample moments. Among two novel algorithms, the first identifies scenarios that have not been observed before, and comes with a scenario-based representation of covariance matrices. The second proposal picks important data points from states of the world that have already realized, and are consistent with higher-order sample moment information. Both algorithms are efficient to compute, and lend themselves to consistent scenario-based modeling and high-dimensional numerical integration. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.	翻訳日:2024-02-07 05:51:35 公開日:2024-02-05
# SSTFormer: フレーム・イベントに基づく認識のためのスパイキングニューラルネットワークとメモリサポートTransformerの橋渡し SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition ( http://arxiv.org/abs/2308.04369v2 ) ライセンス: Link先を確認	Xiao Wang, Zongzhen Wu, Yao Rong, Lin Zhu, Bo Jiang, Jin Tang, Yonghong Tian	(参考訳) イベントカメラに基づくパターン認識は近年新たに生まれた研究テーマである。現在の研究者は通常、イベントストリームを画像、グラフ、ボクセルに変換し、イベントベースの分類にディープニューラルネットワークを採用する。単純なイベント認識データセットでは良いパフォーマンスが得られるが、以下の2つの問題により、結果はまだ限られているかもしれない。まず、空間的にスパースなイベントストリームのみを認識に用いるため、色や詳細なテクスチャ情報をうまくキャプチャできない場合がある。第2に、エネルギー効率は高いが結果が準最適となるスパイキングニューラルネットワーク(SNN)か、エネルギー消費は大きいが高性能な人工ニューラルネットワーク(ANN)のいずれかを採用している。しかし、これら2つの側面のバランスを取ることはほとんど考慮されていない。本稿では,RGBフレームとイベントストリームを同時に融合してパターンを認識することを提案し,上記の問題に対処する新しいRGBフレームイベント認識フレームワークを提案する。提案手法は,RGBフレーム符号化のためのメモリサポートトランスフォーマーネットワーク,生イベントストリーム符号化のためのスパイキングニューラルネットワーク,RGBイベント特徴集約のためのマルチモーダルボトルネック融合モジュール,予測ヘッドの4つの主要モジュールを含む。また,RGB-Eventに基づく分類データセットが不足しているため,DVS346イベントカメラを用いて記録した114のクラスと27102のフレームイベントペアを含む大規模PokerEventデータセットを提案する。 2つのRGB-Eventベースの分類データセットに関する広範な実験により,提案フレームワークの有効性が完全に検証された。この作業により、RGBフレームとイベントストリームを融合することで、パターン認識の開発が促進されることを願っています。この作業のデータセットとソースコードは、https://github.com/Event-AHU/SSTFormer で公開されます。 Event camera-based pattern recognition is a newly arising research topic in recent years. Current researchers usually transform the event streams into images, graphs, or voxels, and adopt deep neural networks for event-based classification. Although good performance can be achieved on simple event recognition datasets, their results may still be limited due to the following two issues. Firstly, they adopt spatial sparse event streams for recognition only, which may fail to capture the color and detailed texture information well. Secondly, they adopt either Spiking Neural Networks (SNN) for energy-efficient recognition with suboptimal results, or Artificial Neural Networks (ANN) for energy-intensive, high-performance recognition. 
However, few of them consider achieving a balance between these two aspects. In this paper, we formally propose to recognize patterns by fusing RGB frames and event streams simultaneously and propose a new RGB frame-event recognition framework to address the aforementioned issues. The proposed method contains four main modules, i.e., a memory support Transformer network for RGB frame encoding, a spiking neural network for raw event stream encoding, a multi-modal bottleneck fusion module for RGB-Event feature aggregation, and a prediction head. Due to the scarcity of RGB-Event based classification datasets, we also propose a large-scale PokerEvent dataset which contains 114 classes and 27102 frame-event pairs recorded using a DVS346 event camera. Extensive experiments on two RGB-Event based classification datasets fully validated the effectiveness of our proposed framework. We hope this work will boost the development of pattern recognition by fusing RGB frames and event streams. Both our dataset and source code of this work will be released at https://github.com/Event-AHU/SSTFormer.	翻訳日:2024-02-07 05:44:17 公開日:2024-02-05
# リアルタイム合成支援のためのハイブリッド検索拡張生成 Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance ( http://arxiv.org/abs/2308.04215v2 ) ライセンス: Link先を確認	Menglin Xia, Xuchao Zhang, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle	(参考訳) Retrieval Augmentationは、コンテキストを追加することによって、従来の言語モデルのパフォーマンスを向上させる。しかし,拡張大言語モデル(LLM)の検索に対する計算要求は,合成支援などのリアルタイムタスクに適用する際の課題となっている。この制限に対処するために,我々は,クラウドベースのllmを拡張メモリ検索により,より小さなクライアントサイド言語モデルと効率的に結合する新しい手法であるhybridrag(hybridrag)フレームワークを提案する。この統合により、クライアントモデルはLLMの機能とコンテキスト情報を利用して効果的な応答を生成することができる。さらに、非同期メモリ更新メカニズムにより、クライアントモデルはクラウドからの応答を待つことなく、ユーザの入力に素早くリアルタイムの完了を配信できる。 5つのベンチマークデータセットの実験により、HybridRAGは低レイテンシを維持しながら、クライアントのみのモデルよりも実用性を大幅に向上することを示した。 Retrieval augmentation enhances performance of traditional language models by incorporating additional context. However, the computational demands for retrieval augmented large language models (LLMs) pose a challenge when applying them to real-time tasks, such as composition assistance. To address this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework, a novel approach that efficiently combines a cloud-based LLM with a smaller, client-side, language model through retrieval augmented memory. This integration enables the client model to generate effective responses, benefiting from the LLM's capabilities and contextual information. Additionally, through an asynchronous memory update mechanism, the client model can deliver real-time completions swiftly to user inputs without the need to wait for responses from the cloud. Our experiments on five benchmark datasets demonstrate that HybridRAG significantly improves utility over client-only models while maintaining low latency.	翻訳日:2024-02-07 05:43:48 公開日:2024-02-05
# 固有スペクトルに基づく量子共分散行列の分類に関する結果 A Result About the Classification of Quantum Covariance Matrices Based on Their Eigenspectra ( http://arxiv.org/abs/2308.03439v2 ) ライセンス: Link先を確認	Arik Avagyan	(参考訳) 有限自由度を持つ連続変数量子システムの共分散行列の集合は、ハイゼンベルクの不確実性原理による実正定値行列の集合の厳密な部分集合である。これは、一般に量子共分散行列のすべての直交変換が不確実性原理に従う正定値行列を生成するわけではないことを意味する。したがって自然問題が起こり、与えられた固有スペクトルと一致する量子共分散行列の集合を見つける。純粋ガウス状態の特別なクラスについて、与えられた固有スペクトルを持つ量子共分散行列の集合は直交シンプレクティック群の作用の1つの軌道からなる。このクラスの状態の共分散行列の固有スペクトルは、それぞれ1に乗算するペアで構成されている。我々の主な貢献は、このクラスの任意の固有スペクトルに対応する量子共分散行列の集合が直交シンプレクティック変換によって関連しているという性質を持つ非自明な固有スペクトルのクラスを見つけることである。この性質を持つすべての非退化固有スペクトルは、このクラスに属しなければならず、そのような固有スペクトルの集合は、ガウス状態の物理的に関連する熱的およびスクイーズ的パラメータを特定する非退化固有スペクトルのクラスと一致する。 The set of covariance matrices of a continuous-variable quantum system with a finite number of degrees of freedom is a strict subset of the set of real positive-definite matrices due to Heisenberg's uncertainty principle. This has the implication that, in general, not every orthogonal transform of a quantum covariance matrix produces a positive-definite matrix that obeys the uncertainty principle. A natural question thus arises, to find the set of quantum covariance matrices consistent with a given eigenspectrum. For the special class of pure Gaussian states the set of quantum covariance matrices with a given eigenspectrum consists of a single orbit of the action of the orthogonal symplectic group. The eigenspectrum of a covariance matrix of a state in this class is composed of pairs that each multiply to one. Our main contribution is finding a non-trivial class of eigenspectra with the property that the set of quantum covariance matrices corresponding to any eigenspectrum in this class are related by orthogonal symplectic transformations. 
We show that all non-degenerate eigenspectra with this property must belong to this class, and that the set of such eigenspectra coincides with the class of non-degenerate eigenspectra that identify the physically relevant thermal and squeezing parameters of a Gaussian state.	翻訳日:2024-02-07 05:43:34 公開日:2024-02-05
# ディープラーニングにおける校正:最新技術に関する調査 Calibration in Deep Learning: A Survey of the State-of-the-Art ( http://arxiv.org/abs/2308.01222v2 ) ライセンス: Link先を確認	Cheng Wang	(参考訳) ディープニューラルネットワークモデルのキャリブレーションは、安全クリティカルなアプリケーションにおいて、信頼性が高くロバストなaiシステムを構築する上で重要な役割を果たす。近年の研究では、予測能力の高い現代のニューラルネットワークは、キャリブレーションが不十分であり、信頼性の低いモデル予測を生成することが示されている。深層学習モデルは様々なベンチマークで顕著な性能を発揮するが、モデルの校正と信頼性の研究は比較的過小評価されている。理想の深層モデルは高い予測性能を持つだけでなく、高度に校正されるべきである。深層モデルの校正に関する最近の進歩がいくつかある。本調査では,モデルキャリブレーションを行うための最先端のキャリブレーション手法とその原理について概説する。まず、モデルの校正の定義から始め、モデルの誤校正の根本原因を説明します。そして、この側面を計測できる重要な指標を紹介します。次に, ポストホック校正法, 正規化法, 不確実性推定法, 合成法という4つのカテゴリに大まかに分類した校正法の概要を示す。また、大きなモデル、特に大きな言語モデル(llm)のキャリブレーションにおける最近の進歩についても取り上げます。最後に、オープンな問題、課題、潜在的な方向性について議論する。 Calibrating deep neural models plays an important role in building reliable, robust AI systems in safety-critical applications. Recent work has shown that modern neural networks that possess high predictive capability are poorly calibrated and produce unreliable model predictions. Though deep learning models achieve remarkable performance on various benchmarks, the study of model calibration and reliability is relatively underexplored. Ideal deep models should have not only high predictive performance but also be well calibrated. There have been some recent advances in calibrating deep models. In this survey, we review the state-of-the-art calibration methods and their principles for performing model calibration. First, we start with the definition of model calibration and explain the root causes of model miscalibration. Then we introduce the key metrics that can measure this aspect. It is followed by a summary of calibration methods that we roughly classify into four categories: post-hoc calibration, regularization methods, uncertainty estimation, and composition methods. We also cover recent advancements in calibrating large models, particularly large language models (LLMs). Finally, we discuss some open issues, challenges, and potential directions.	
翻訳日:2024-02-07 05:42:57 公開日:2024-02-05
# UniAP: 混合整数二次計画法による層間および層内自動並列化の統一 UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming ( http://arxiv.org/abs/2307.16375v2 ) ライセンス: Link先を確認	Hao Lin, Ke Wu, Jie Li, Jun Li, Wu-Jun Li	(参考訳) 分散学習は、ディープラーニングモデル、特に大規模モデルのトレーニングに一般的に使用される。分散学習では、手動並列化(MP)の手法は相当な人的労力を必要とし、柔軟性は限られている。そのため、近年、並列戦略最適化プロセスを自動化するための自動並列化(AP)手法が提案されている。既存のAP手法は、並列戦略の2つのカテゴリ(すなわち層間並列性と層内並列性)を共同で最適化しないため、準最適解に陥る。本論文では、混合整数二次計画法により層間および層内の自動並列化を統一するUniAPと呼ばれる新しいAP手法を提案する。我々の知る限り、UniAPは並列戦略の2つのカテゴリを共同で最適化し、最適な解を見つけることができる最初の並列手法である。実験の結果、UniAPは5つのTransformerベースのモデルにおいて、スループットで最先端手法を最大1.71$\times$上回り、戦略最適化時間を最大107$\times$短縮した。 Distributed learning is commonly used for training deep learning models, especially large models. In distributed learning, manual parallelism (MP) methods demand considerable human effort and have limited flexibility. Hence, automatic parallelism (AP) methods have recently been proposed for automating the parallel strategy optimization process. Existing AP methods suffer from sub-optimal solutions because they do not jointly optimize the two categories of parallel strategies (i.e., inter-layer parallelism and intra-layer parallelism). In this paper, we propose a novel AP method called UniAP, which unifies inter- and intra-layer automatic parallelism by mixed integer quadratic programming. To the best of our knowledge, UniAP is the first parallel method that can jointly optimize the two categories of parallel strategies to find an optimal solution. Experimental results show that UniAP outperforms state-of-the-art methods by up to 1.71$\times$ in throughput and reduces strategy optimization time by up to 107$\times$ across five Transformer-based models.	翻訳日:2024-02-07 05:42:19 公開日:2024-02-05
# 差分進化アルゴリズムに基づく負荷予測のための変圧器ニューラルネットワークモデルのハイパーパラメータ選択 Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting ( http://arxiv.org/abs/2307.15299v5 ) ライセンス: Link先を確認	Anuvab Sen, Arul Rhik Mazumder, Udayon Sen	(参考訳) 多くの分野において、正確な負荷予測は重要な役割を果たすが、動的電力システムの複雑なダイナミクスを正確に捉えることは、伝統的な統計モデルにとって課題である。これらの理由から、時系列モデル(ARIMA)とディープラーニングモデル(ANN、LSTM、GRUなど)が一般的にデプロイされ、しばしばより高い成功を経験する。本稿では,最近開発されたTransformer-based Neural Network Modelの負荷予測における有効性について検討する。トランスフォーマーモデルは、そのアテンションメカニズムから派生した長距離依存を学習できるため、ロード予測を改善する可能性がある。本稿では,変圧器ベースニューラルネットワークの最適ハイパーパラメータを求めるために,微分進化というメタヒューリスティックスを適用した。微分進化は、非微分可能、多目的、制約付き最適化問題に対するスケーラブルで堅牢なグローバルソリューションを提供する。本研究では,mse(平均二乗誤差)やmape(平均絶対パーセンテージ誤差)などの数値指標に基づく負荷予測における性能と,様々なメタヒューリスティックアルゴリズムと統合したトランスフォーマティブニューラルネットワークモデルを比較した。負荷予測におけるメタヒューリスティックなトランスフォーマーベースニューラルネットワークモデルの可能性を示し,各モデルに最適なハイパーパラメータを提供する。 Accurate load forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of dynamic power systems remains a challenge for traditional statistical models. For these reasons, time-series models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often experience higher success. In this paper, we analyze the efficacy of the recently developed Transformer-based Neural Network model in Load forecasting. Transformer models have the potential to improve Load forecasting because of their ability to learn long-range dependencies derived from their Attention Mechanism. We apply several metaheuristics namely Differential Evolution to find the optimal hyperparameters of the Transformer-based Neural Network to produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. 
Our work compares the proposed Transformer based Neural Network model integrated with different metaheuristic algorithms by their performance in Load forecasting based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based Neural Network models in Load forecasting accuracy and provide optimal hyperparameters for each model.	翻訳日:2024-02-07 05:41:33 公開日:2024-02-05
# 経路依存NJ-ODEの雑音観測への拡張と依存観測フレームワーク Extending Path-Dependent NJ-ODEs to Noisy Observations and a Dependent Observation Framework ( http://arxiv.org/abs/2307.13147v2 ) ライセンス: Link先を確認	William Andersson, Jakob Heiss, Florian Krach, Josef Teichmann	(参考訳) Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) は、不規則かつ不完全な観測で連続時間確率過程を予測するモデルである。特に、不完全な過去の観測の時系列を不規則にサンプリングした最適な予測を学習する。これまでのところ、プロセス自体と座標観測時間は独立であり、観測はノイズのないと仮定されていた。本研究では,これらの制約を緩和し,理論的な保証と実証的な例を与える2つの拡張について論じる。特に、アルゴリズムを変更することなく、より現実的な条件付き独立設定に理論を拡張することで、独立性の仮定を引き上げることができる。さらに、ノイズの多い観測を処理できる新しい損失関数を導入し、以前使用していた損失関数が一貫した推定に繋がらなかった理由を説明する。 The Path-Dependent Neural Jump Ordinary Differential Equation (PD-NJ-ODE) is a model for predicting continuous-time stochastic processes with irregular and incomplete observations. In particular, the method learns optimal forecasts given irregularly sampled time series of incomplete past observations. So far the process itself and the coordinate-wise observation times were assumed to be independent and observations were assumed to be noiseless. In this work we discuss two extensions to lift these restrictions and provide theoretical guarantees as well as empirical examples for them. In particular, we can lift the assumption of independence by extending the theory to much more realistic settings of conditional independence without any need to change the algorithm. Moreover, we introduce a new loss function, which allows us to deal with noisy observations and explain why the previously used loss function did not lead to a consistent estimator.	翻訳日:2024-02-07 05:41:09 公開日:2024-02-05
# Big Data -- 予測のためのサプライチェーン管理フレームワーク: データ前処理と機械学習技術 Big Data -- Supply Chain Management Framework for Forecasting: Data Preprocessing and Machine Learning Techniques ( http://arxiv.org/abs/2307.12971v4 ) ライセンス: Link先を確認	Md Abrar Jahin, Md Sakib Hossain Shovon, Jungpil Shin, Istiyaque Ahmed Ridoy, and M. F. Mridha	(参考訳) 本稿は、最先端のサプライチェーン(SC)予測戦略と技術を体系的に特定し、比較分析することを目的とする。SC管理にビッグデータ分析(問題の特定、データソース、探索的データ分析、機械学習モデルの訓練、ハイパーパラメータ調整、性能評価、最適化)を組み込み、人的労働力、在庫、SC全体への予測の影響を扱う新しいフレームワークを提案する。まず、SC戦略に応じてデータを収集する必要性とその収集方法を論じる。続いて、期間やSCの目的に応じて異なるタイプの予測が必要であることを論じる。最高性能のモデルを最適化するために、SCのKPIと誤差測定システムを推奨する。ファントム在庫が予測に及ぼす悪影響と、モデル性能パラメータの決定および運用管理、透明性、計画効率の向上のために管理上の意思決定がSCのKPIに依存することを示す。フレームワーク内の循環的な接続は、プロセス後のKPIに基づく前処理の最適化を導入し、全体的な制御プロセス(在庫管理、労働力の決定、コスト、生産、キャパシティ計画)を最適化する。本研究の貢献は、標準的なSCプロセスフレームワークの提案、推奨される予測データ分析、SCパフォーマンスに対する予測の影響、機械学習アルゴリズムの最適化、および将来の研究への示唆にある。 This article intends to systematically identify and comparatively analyze state-of-the-art supply chain (SC) forecasting strategies and technologies. A novel framework has been proposed incorporating Big Data Analytics in SC Management (problem identification, data sources, exploratory data analysis, machine-learning model training, hyperparameter tuning, performance evaluation, and optimization), forecasting effects on human-workforce, inventory, and overall SC. Initially, the need to collect data according to SC strategy and how to collect them has been discussed. The article discusses the need for different types of forecasting according to the period or SC objective. The SC KPIs and the error-measurement systems have been recommended to optimize the top-performing model. 
The adverse effects of phantom inventory on forecasting and the dependence of managerial decisions on the SC KPIs for determining model performance parameters and improving operations management, transparency, and planning efficiency have been illustrated. The cyclic connection within the framework introduces preprocessing optimization based on the post-process KPIs, optimizing the overall control process (inventory management, workforce determination, cost, production and capacity planning). The contribution of this research lies in the standard SC process framework proposal, recommended forecasting data analysis, forecasting effects on SC performance, machine learning algorithms optimization followed, and in shedding light on future research.	翻訳日:2024-02-07 05:40:53 公開日:2024-02-05
# RCM融合:3次元物体検出のためのレーダーカメラ多層核融合 RCM-Fusion: Radar-Camera Multi-Level Fusion for 3D Object Detection ( http://arxiv.org/abs/2307.10249v4 ) ライセンス: Link先を確認	Jisong Kim, Minjae Seong, Geonho Bang, Dongsuk Kum, Jun Won Choi	(参考訳) LiDARセンサーは3Dオブジェクト検出にうまく応用されているが、レーダーやカメラセンサーが手に入ることで、3Dオブジェクト検出のためのレーダーやカメラの融合への関心が高まっている。しかし、従来のレーダー・カメラ融合モデルはレーダー情報の可能性を十分に活用できなかった。本稿では,特徴レベルとインスタンスレベルの両モードを融合するRadar-Camera Multi-level fusion (RCM-Fusion)を提案する。特徴レベルの融合のために,レーダーバード-アイビュー(BEV)特徴の誘導を用いて,カメラ特徴を正確なBEV表現に変換するレーダー誘導型BEVエンコーダを提案する。実例レベルの融合では,レーダ点雲の特性を考慮し,局所化誤差を低減できるレーダグリッドポイントリファインメントモジュールを提案する。公開nuScenesデータセットを用いて行った実験により,提案したRCM-Fusionは,nuScenes 3Dオブジェクト検出ベンチマークにおいて,単一フレームベースレーダカメラ融合方式の最先端性能を実現することが示された。コードは公開される予定だ。 While LiDAR sensors have been successfully applied to 3D object detection, the affordability of radar and camera sensors has led to a growing interest in fusing radars and cameras for 3D object detection. However, previous radar-camera fusion models were unable to fully utilize the potential of radar information. In this paper, we propose Radar-Camera Multi-level fusion (RCM-Fusion), which attempts to fuse both modalities at both feature and instance levels. For feature-level fusion, we propose a Radar Guided BEV Encoder which transforms camera features into precise BEV representations using the guidance of radar Bird's-Eye-View (BEV) features and combines the radar and camera BEV features. For instance-level fusion, we propose a Radar Grid Point Refinement module that reduces localization error by accounting for the characteristics of the radar point clouds. The experiments conducted on the public nuScenes dataset demonstrate that our proposed RCM-Fusion achieves state-of-the-art performances among single frame-based radar-camera fusion methods in the nuScenes 3D object detection benchmark. Code will be made publicly available.	翻訳日:2024-02-07 05:39:52 公開日:2024-02-05
# DialogStudio: 会話型AIのための最もリッチで最も多様な統一データセットコレクションを目指して DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI ( http://arxiv.org/abs/2307.10172v3 ) ライセンス: Link先を確認	Jianguo Zhang and Kun Qian and Zhiwei Liu and Shelby Heinecke and Rui Meng and Ye Liu and Zhou Yu and Huan Wang and Silvio Savarese and Caiming Xiong	(参考訳) 会話AIの進歩にもかかわらず、言語モデルは多様な会話タスクの扱いに課題を抱えており、既存の対話データセットコレクションは多様性と包括性を欠いていることが多い。これらの問題に対処するために、対話データセットの最大かつ最も多様なコレクションであるDialogStudioを紹介する。DialogStudioは、元の情報を保存しながら一貫したフォーマットでデータセットを統一している。本コレクションは、オープンドメイン対話、タスク指向対話、自然言語理解、会話型レコメンデーション、対話要約、知識に基づく対話などのデータを含み、対話研究とモデル訓練のための非常に豊かで多様なリソースとなっている。DialogStudioの有用性をさらに高めるため、各データセットのライセンスを特定し、選択した対話に対して外部知識とドメイン認識プロンプトを設計して、命令を考慮した微調整を容易にする。さらに、このデータセットコレクションを用いて会話型AIモデルを構築し、ゼロショットおよび少数ショット学習シナリオにおける実験により、DialogStudioの優位性を実証した。透明性を高め、データセットおよびタスクに基づく研究と言語モデルの事前学習を支援するため、DialogStudioに関連するすべてのデータセット、ライセンス、コード、モデルを https://github.com/salesforce/DialogStudio で公開している。 Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset, design external knowledge and domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. 
Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, codes, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio.	翻訳日:2024-02-07 05:39:31 公開日:2024-02-05
# ソースコード要約のための蒸留GPT Distilled GPT for Source Code Summarization ( http://arxiv.org/abs/2308.14731v2 ) ライセンス: Link先を確認	Chia-Yi Su and Collin McMillan	(参考訳) コード要約は、ソースコードの簡潔な自然言語記述である。要約は通常は1文だけであるが、開発者ドキュメントのバックボーンを形成している。「すべての可視ポリゴンの色を青に変更する」のような短い記述は、コード自体を読む労力をかけずに、コードが何を行うかという高レベルなアイデアをプログラマに与えることができる。近年、ChatGPTのような大規模言語モデルに基づく製品は、これらの記述を自動的に記述する強力な能力を示している。しかし、これらのツールを使用するには、プログラマは信頼できないサードパーティにコードを送信して処理させる必要がある(API呼び出しなど)。この管理権の喪失は多くの組織には受け入れられない。本稿では、その代替として、GPT-3.5で生成したサンプル出力を用いて、知識蒸留に関連するプロセスでオープンソースモデルを訓練する。我々のモデルは単一の16GBのGPUで動かすのに十分小さい(350Mパラメータ)が、評価により、このタスクでGPT-3.5を模倣するのに十分な大きさであることを示す。 A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short description such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350M parameters) to be run on a single 16GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.	翻訳日:2024-02-07 05:31:42 公開日:2024-02-05
# ESGに着目したDLT研究の進化:NLPによる文献分析 Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature ( http://arxiv.org/abs/2308.12420v2 ) ライセンス: Link先を確認	Walter Hernandez, Kamil Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, Jiangbo Shangguan, Paolo Tasca, and Jiahua Xu	(参考訳) 分散台帳技術(DLT)が急速に発展するにつれて、その影響は技術を超えて広がり、環境や社会に影響を及ぼしている。この進化により出版物が増加し、手作業による文献分析はますます困難になっている。本稿では、自然言語処理(NLP)に基づく体系的文献レビュー手法を用いて、DLTと環境・社会・ガバナンス(ESG)の側面との交点を探索する。提案手法では、107本のシード論文から24,539本の出版物からなるコーパスへと有向引用ネットワークを構築・精緻化し、DLTおよびESGドメインの固有表現認識(NER)のためにトランスフォーマーベースの言語モデルを微調整する。このモデルを適用してコーパスを505本の主要な出版物に絞り込み、ESG文脈におけるDLTの進化に関する初の文献レビューと時間的グラフ解析を可能にした。我々の貢献には、適応的でスケーラブルなNLP駆動の体系的文献レビュー方法論と、DLTおよびESG研究向けに作成された54,808個のエンティティからなる独自のNERデータセットが含まれる。我々の初の文献レビューは、DLTの進化と影響を分析する上でのそれらの適用性と有効性を示し、DLT分野の利害関係者にとって非常に有益であることを示している。 As Distributed Ledger Technologies (DLTs) rapidly evolve, their impacts extend beyond technology, influencing environmental and societal aspects. This evolution has increased publications, making manual literature analysis increasingly challenging. We address this with a Natural Language Processing (NLP)-based systematic literature review method to explore the intersection of Distributed Ledger Technology (DLT) with its Environmental, Social, and Governance (ESG) aspects. Our approach involves building and refining a directed citation network from 107 seed papers to a corpus of 24,539 publications and fine-tuning a transformer-based language model for Named Entity Recognition (NER) on DLT and ESG domains. Applying this model, we distilled the corpus to 505 key publications, enabling an inaugural literature review and temporal graph analysis of DLT's evolution in ESG contexts. Our contributions include an adaptable and scalable NLP-driven systematic literature review methodology and a unique NER dataset of 54,808 entities, tailored for DLT and ESG research. 
Our inaugural literature review demonstrates their applicability and effectiveness in analyzing DLT's evolution and impacts, proving invaluable for stakeholders in the DLT domain.	翻訳日:2024-02-07 05:30:31 公開日:2024-02-05
# 大規模多言語モデルによる言語間のゼロショットマルチモーダル学習 Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages ( http://arxiv.org/abs/2308.12038v2 ) ライセンス: Link先を確認	Jinyi Hu, Yuan Yao, Chongyi Wang, Shan Wang, Yinxu Pan, Qianyu Chen, Tianyu Yu, Hanghao Wu, Yue Zhao, Haoye Zhang, Xu Han, Yankai Lin, Jiao Xue, Dahai Li, Zhiyuan Liu, Maosong Sun	(参考訳) 近年,画像・テキスト・テキスト・画像生成の両面で,マルチモーダル学習が著しく増加している。しかし、この成功は英語に限られており、他の言語はほとんど残っていない。他の言語で競合する言語を構築することは、非英語のマルチモーダルデータ(大規模で高品質な画像テキストデータの欠如)の低リソース性のために非常に難しい。本研究では,非英語言語における大規模マルチモーダルモデルの学習に有効な訓練パラダイムであるMPMを提案する。 mpmは、多言語言語モデルが言語間でゼロショットのマルチモーダル学習をピボットできることを実証する。具体的には、強い多言語大言語モデルに基づいて、英語のみの画像テキストデータに基づいて事前訓練されたマルチモーダルモデルは、(準)ゼロショット方式で他の言語にうまく一般化することができる。 MPMの実践として中国語を取り入れ,画像からテキストへ,テキストから画像へ生成する大規模なマルチモーダルモデルVisCPMを構築した。将来の研究を促進するため、私たちはhttps://github.com/OpenBMB/VisCPM.gitでコードとモデルの重みをオープンソース化しました。 Recently there has been a significant surge in multimodal learning in terms of both image-to-text and text-to-image generation. However, the success is typically limited to English, leaving other languages largely behind. Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., lack of large-scale, high-quality image-text data). In this work, we propose MPM, an effective training paradigm for training large multimodal models in non-English languages. MPM demonstrates that Multilingual language models can Pivot zero-shot Multimodal learning across languages. Specifically, based on a strong multilingual large language model, multimodal models pretrained on English-only image-text data can well generalize to other languages in a (quasi)-zero-shot manner, even surpassing models trained on image-text data in native languages. Taking Chinese as a practice of MPM, we build large multimodal models VisCPM in image-to-text and text-to-image generation, which achieve state-of-the-art (open-source) performance in Chinese. 
To facilitate future research, we open-source codes and model weights at https://github.com/OpenBMB/VisCPM.git.	翻訳日:2024-02-07 05:29:56 公開日:2024-02-05
# 費用効率の良いオンライン意思決定: 組合せ多腕バンディットアプローチ Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach ( http://arxiv.org/abs/2308.10699v2 ) ライセンス: Link先を確認	Arman Rahbar, Niklas Åkerblom, Morteza Haghir Chehreghani	(参考訳) オンライン意思決定は多くの現実世界のアプリケーションにおいて重要な役割を果たす。多くのシナリオでは、入ってくるデータポイントに対して一連のテストを実行した結果に基づいて決定が行われる。しかし、すべてのテストの実行は高価であり、常に可能であるとは限らない。本稿では、組合せ多腕バンディットに基づくオンライン意思決定問題の新規な定式化を提供し、テスト実行の(確率的でありうる)コストを考慮に入れる。この定式化に基づいて、探索に事後サンプリングまたはBayesUCBを利用できる、費用効率の高いオンライン意思決定のための新しい枠組みを提供する。費用効率の高いオンライン意思決定のためのトンプソンサンプリングの理論的解析を行い、実世界の問題に対するフレームワークの適用性を示す様々な実験結果を示す。 Online decision making plays a crucial role in numerous real-world applications. In many scenarios, the decision is made based on performing a sequence of tests on the incoming data points. However, performing all tests can be expensive and is not always possible. In this paper, we provide a novel formulation of the online decision making problem based on combinatorial multi-armed bandits and take the (possibly stochastic) cost of performing tests into account. Based on this formulation, we provide a new framework for cost-efficient online decision making which can utilize posterior sampling or BayesUCB for exploration. We provide a theoretical analysis of Thompson Sampling for cost-efficient online decision making, and present various experimental results that demonstrate the applicability of our framework to real-world problems.	翻訳日:2024-02-07 05:28:50 公開日:2024-02-05
# スマートモビリティにおける創発的ソフトウェアサービスプラットフォームとその応用 Emergent Software Service Platform and its Application in a Smart Mobility Setting ( http://arxiv.org/abs/2308.08168v2 ) ライセンス: Link先を確認	Nils Wilken, Christoph Knieke, Eric Nyakam, Andreas Rausch, Christian Schindler, Christian Bartelt and Nikolaus Ziebura	(参考訳) 産業、ビジネス、社会におけるデジタルイノベーションの発展ダイナミクスは、もはや古典的な開発プロセスにおいて中央および階層的に設計できない複雑なシステムコングロマリットを生み出している。むしろシステムは、異質なアクターがオープンなプラットフォームで一緒に行動するDevOpsプロセスで進化している。このような動的かつ自律的に変化するシステムランドスケープへのインフルエンスとコントロールは、現在、サービスユーザとプロバイダ、およびプラットフォームインフラストラクチャのオペレーターにとって大きな課題であり、基本的な関心事である。本稿では,このような創発的ソフトウェアサービスプラットフォームのためのアーキテクチャを提案する。このアーキテクチャを基盤となるエンジニアリング方法論で実装するソフトウェアプラットフォームは、スマートパーキングロットシナリオによって実証される。 The development dynamics of digital innovations for industry, business, and society are producing complex system conglomerates that can no longer be designed centrally and hierarchically in classic development processes. Instead, systems are evolving in DevOps processes in which heterogeneous actors act together on an open platform. Influencing and controlling such dynamically and autonomously changing system landscapes is currently a major challenge and a fundamental interest of service users and providers, as well as operators of the platform infrastructures. In this paper, we propose an architecture for such an emergent software service platform. A software platform that implements this architecture with the underlying engineering methodology is demonstrated by a smart parking lot scenario.	翻訳日:2024-02-07 05:28:21 公開日:2024-02-05
# Qibolab: オープンソースのハイブリッド量子オペレーティングシステム Qibolab: an open-source hybrid quantum operating system ( http://arxiv.org/abs/2308.06313v3 ) ライセンス: Link先を確認	Stavros Efthymiou, Alvaro Orgaz-Fuertes, Rodolfo Carobene, Juan Cereijo, Andrea Pasquale, Sergi Ramos-Calderer, Simone Bordoni, David Fuentes-Ruiz, Alessandro Candido, Edoardo Pedicillo, Matteo Robbiati, Yuanzheng Paul Tan, Jadwiga Wilkens, Ingo Roth, José Ignacio Latorre, Stefano Carrazza	(参考訳) 我々は、Qibo量子コンピューティングミドルウェアフレームワークと統合された、量子ハードウェア制御のためのオープンソースソフトウェアライブラリQibolabを提案する。 Qibolabは、カスタムのセルフホスト量子ハードウェアプラットフォーム上で回路ベースのアルゴリズムを自動実行するために必要なソフトウェア層を提供する。本稿では、機器向けのパルス指向ドライバ、トランスパイラ、最適化アルゴリズムを通じて、量子制御へのプログラム的アクセスを提供するためのオブジェクト群を紹介する。 Qibolabを使えば、実験家や開発者はハードウェア実装の複雑な側面をすべてライブラリに委譲でき、拡張可能でハードウェアに依存しない方法で量子コンピューティングアルゴリズムの展開を標準化できる。最初に公式にサポートされる量子技術は超伝導量子ビットである。まず、ライブラリの全てのコンポーネントの状態を説明し、次に超伝導量子ビットプラットフォームの制御設定の例を示す。最後に、回路ベースのアルゴリズムに関する応用結果を示す。 We present Qibolab, an open-source software library for quantum hardware control integrated with the Qibo quantum computing middleware framework. Qibolab provides the software layer required to automatically execute circuit-based algorithms on custom self-hosted quantum hardware platforms. We introduce a set of objects designed to provide programmatic access to quantum control through pulse-oriented drivers for instruments, transpilers and optimization algorithms. Qibolab enables experimentalists and developers to delegate all complex aspects of hardware implementation to the library so they can standardize the deployment of quantum computing algorithms in an extensible hardware-agnostic way, using superconducting qubits as the first officially supported quantum technology. We first describe the status of all components of the library, then we show examples of control setup for superconducting qubits platforms. Finally, we present successful application results related to circuit-based algorithms.	翻訳日:2024-02-07 05:27:21 公開日:2024-02-05
# SIB-200:200以上の言語と方言におけるトピック分類のためのシンプルで包括的で大きな評価データセット SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects ( http://arxiv.org/abs/2309.07445v2 ) ライセンス: Link先を確認	David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba O. Alabi, Yanke Mao, Haonan Gao, Annie En-Shiun Lee	(参考訳) 過去数年間に記録した多言語自然言語処理の進歩にもかかわらず、評価は通常、多数の低リソース言語を除外したデータセットを持つ少数の言語に限られる。本稿では,200言語および方言におけるトピック分類のための大規模オープンソースベンチマークデータセットであるSIB-200を作成し,自然言語理解のための評価データセットの欠如に対処した。 SIB-200でカバーされている多くの言語に対して、これはNLUのための最初の公開評価データセットである。データセットは flores-200 machine translation corpus に基づいている。我々は、データセットの英語部分を注釈化し、文レベルのアノテーションをコーパスに含まれる残りの203言語に拡張した。このタスクの単純さにもかかわらず、我々は、多言語評価が多くの世界言語に拡張される際に、ハイリソース言語と低リソース言語のパフォーマンスの間には、依然として大きなギャップがあることを示す。我々は,多言語モデルの事前学習中,未表現言語ファミリー(ニロティック語やアルタン語-コンゴ語など)やアフリカ,アメリカ,オセアニア,東南アジアの言語が,トピック分類データセットにおいて最も低いパフォーマンスを示すことが判明した。我々のデータセットは、より多様な言語セットにおける多言語言語モデルのより包括的評価を促進することを願っている。 https://github.com/dadelani/sib-200 Despite the progress we have recorded in the last few years in multilingual natural language processing, evaluation is typically limited to a small set of languages with available datasets which excludes a large number of low-resource languages. In this paper, we created SIB-200 -- a large-scale open-sourced benchmark dataset for topic classification in 200 languages and dialects to address the lack of evaluation dataset for Natural Language Understanding (NLU). For many of the languages covered in SIB-200, this is the first publicly available evaluation dataset for NLU. The dataset is based on Flores-200 machine translation corpus. We annotated the English portion of the dataset and extended the sentence-level annotation to the remaining 203 languages covered in the corpus. 
Despite the simplicity of this task, our evaluation in the fully supervised, cross-lingual transfer, and large language model prompting settings shows that there is still a large gap between the performance of high-resource and low-resource languages when multilingual evaluation is scaled to numerous world languages. We found that languages unseen during the pre-training of multilingual language models, under-represented language families (like Nilotic and Atlantic-Congo), and languages from the regions of Africa, the Americas, Oceania and South East Asia often have the lowest performance on our topic classification dataset. We hope our dataset will encourage a more inclusive evaluation of multilingual language models on a more diverse set of languages. https://github.com/dadelani/sib-200	翻訳日:2024-02-07 05:21:13 公開日:2024-02-05
# 進化的アルゴリズムを使って配列のキャッシュフレンドリーな一般化モートンレイアウトを見つける Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays ( http://arxiv.org/abs/2309.07002v2 ) ライセンス: Link先を確認	Stephen Nicholas Swatman, Ana-Lucia Varbanescu, Andy D. Pimentel, Andreas Salzburger, Attila Krasznahorkay	(参考訳) 多次元データのレイアウトは、ハードウェアキャッシュの有効性と拡張によってアプリケーションのパフォーマンスに大きな影響を与える可能性がある。一般的な多次元レイアウトには、標準行長および列長のレイアウトとモートン曲線レイアウトが含まれる。本稿では,モートンレイアウトを多次元データレイアウトの非常に大きなファミリーに一般化し,その性能特性を多様に変化させる方法について述べる。この設計空間は遺伝的アルゴリズムに基づく組合せ進化法を用いて効率的に探索できると仮定する。そこで本研究では,このようなレイアウトの染色体表現と,キャッシュシミュレーションを用いた配列レイアウトの適合性推定手法を提案する。我々は,実ハードウェアのカーネル実行時間と適合する適合度関数を示し,その進化戦略により,少数の世代で検討中の8つの実世界のアプリケーションのうち4つにおいて,良好なキャッシュ特性を持つ候補を見つけることができることを示した。最後に、我々の進化的手法を用いた配列レイアウトは、シミュレーション環境だけでなく、実際のハードウェアにおける大幅なパフォーマンス向上(極端な場合では最大10倍)にも影響を与えることを実証する。 The layout of multi-dimensional data can have a significant impact on the efficacy of hardware caches and, by extension, the performance of applications. Common multi-dimensional layouts include the canonical row-major and column-major layouts as well as the Morton curve layout. In this paper, we describe how the Morton layout can be generalized to a very large family of multi-dimensional data layouts with widely varying performance characteristics. We posit that this design space can be efficiently explored using a combinatorial evolutionary methodology based on genetic algorithms. To this end, we propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts using cache simulation. We show that our fitness function correlates to kernel running time in real hardware, and that our evolutionary strategy allows us to find candidates with favorable simulated cache properties in four out of the eight real-world applications under consideration in a small number of generations. 
Finally, we demonstrate that the array layouts found using our evolutionary method not only perform well in simulated environments but can also effect significant performance gains -- up to a factor of ten in extreme cases -- on real hardware.	翻訳日:2024-02-07 05:20:47 公開日:2024-02-05
# RLHFのアライメント税の緩和 Mitigating the Alignment Tax of RLHF ( http://arxiv.org/abs/2309.06256v3 ) ライセンス: Link先を確認	Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang	(参考訳) LLMは事前訓練中に幅広い能力を獲得するが、人間のフィードバックによる強化学習(RLHF)でLLMをアライメントすると忘却が生じることがあり、これはアライメント税とも呼ばれる。この仮説を実証的に検証するために,OpenLLaMA-3Bと既存のRLHFアルゴリズムを用いて実験を行ったところ,NLPタスクにおいて顕著なアライメント税が確認された。一方、忘却を緩和する様々な手法はRLHFの性能と相反することが多く、報酬の最大化と忘却の緩和との間にトレードオフが生じる。本稿では,LLMのアライメントにおけるこの課題を踏まえ,RLHF前後のモデル重みを補間するモデル平均化を検討し,より効率的な報酬と税のパレートフロントを実現する。その効果を理解するため、我々はモデル平均化に関する理論的洞察を提供し、タスクが重なり合う特徴空間を共有する層において特徴の多様性を高めることで、性能のパレートフロントが改善されることを明らかにした。低レベルのトランスフォーマー層を平均化する利点を示す実証的な証拠が、この分析を裏付ける。この分析と、トランスフォーマーの異なる層を平均化すると報酬と税のトレードオフが大きく異なるという観察に基づいて、モデル層の様々な組み合わせ比を適応的に求める適応モデル平均化(AMA)を提案する。 AMAは、アライメント税を最小限に抑えながらアライメント報酬を最大化することを目指す。さらに,OpenLLaMA-3B上の様々なRLHFアルゴリズムでAMAの性能を検証し,その結果をMistral-7Bにも拡張した。 LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite various techniques to mitigate forgetting, they are often at odds with the RLHF performance, leading to a trade-off between reward maximization and forgetting mitigation. In light of the above pressing issue in aligning LLMs, in this paper we explore model averaging, which interpolates between pre- and post-RLHF model weights, to achieve a more efficient reward-tax Pareto front. To understand its effectiveness, we offer theoretical insights into model averaging, revealing that it improves the performance Pareto front by increasing feature diversity on the layers where tasks share overlapped feature spaces. 
Empirical evidence corroborates our analysis by showing the benefits of averaging low-level transformer layers. Building on the analysis and the observation that averaging different layers of the transformer leads to significantly different reward-tax trade-offs, we propose Adaptive Model Averaging (AMA) to adaptively find various combination ratios of model layers. AMA seeks to maximize the alignment reward while incurring minimal alignment tax. Moreover, we validate AMA's performance across a range of RLHF algorithms over OpenLLaMA-3B and further extend our findings to Mistral-7B.	翻訳日:2024-02-07 05:18:43 公開日:2024-02-05
# 捕捉イオンによる高速交換冷却 Rapid Exchange Cooling with Trapped Ions ( http://arxiv.org/abs/2309.02581v2 ) ライセンス: Link先を確認	Spencer D. Fallek, Vikram S. Sandhu, Ryan A. McGill, John M. Gray, Holly N. Tinkey, Craig R. Clark and Kenton R. Brown	(参考訳) トラップイオン量子電荷結合デバイス(QCCD)アーキテクチャは、先進量子情報処理の第一候補である。現在のQCCD実装では、不完全イオン輸送と異常加熱は計算中にイオン運動を励起することができる。これに対抗するには、高忠実度ゲート性能を維持するために中間冷却が必要である。計算イオンを他の種のイオンに同調的に冷却することは、一般的に使用される戦略である。ここでは、交換冷却と呼ばれる別のアプローチを示す。交感冷却とは異なり、交換冷却は2つの異なる原子種をトラップする必要がない。このプロトコルは、繰り返しレーザー冷却される「冷却」イオンのバンクを導入する。計算イオンは、冷却液イオンをその近傍に輸送することで冷却することができる。我々は、この概念を2つの$^{40}\mathrm{Ca}^{+}$イオンで実験的にテストし、107$\mathrm {\mu s}$で必要な輸送を実行する。計算イオンから軸運動エネルギーの96%以上と最大102(5)クオンタを除去した。冷却剤イオンの再冷却が計算イオンを脱離しないことを検証する。このアプローチは、高速な量子シミュレーションと計算が可能な単一種QCCDプロセッサの実現可能性を検証する。 The trapped-ion quantum charge-coupled device (QCCD) architecture is a leading candidate for advanced quantum information processing. In current QCCD implementations, imperfect ion transport and anomalous heating can excite ion motion during a calculation. To counteract this, intermediate cooling is necessary to maintain high-fidelity gate performance. Cooling the computational ions sympathetically with ions of another species, a commonly employed strategy, creates a significant runtime bottleneck. Here, we demonstrate a different approach we call exchange cooling. Unlike sympathetic cooling, exchange cooling does not require trapping two different atomic species. The protocol introduces a bank of "coolant" ions which are repeatedly laser cooled. A computational ion can then be cooled by transporting a coolant ion into its proximity. We test this concept experimentally with two $^{40}\mathrm{Ca}^{+}$ ions, executing the necessary transport in 107 $\mathrm{\mu s}$, an order of magnitude faster than typical sympathetic cooling durations. We remove over 96%, and as many as 102(5) quanta, of axial motional energy from the computational ion. We verify that re-cooling the coolant ion does not decohere the computational ion. 
This approach validates the feasibility of a single-species QCCD processor, capable of fast quantum simulation and computation.	翻訳日:2024-02-07 05:17:56 公開日:2024-02-05
# 陰謀者の解剖:包括的Twitterデータセットによる特性の解明 The Anatomy of Conspirators: Unveiling Traits using a Comprehensive Twitter Dataset ( http://arxiv.org/abs/2308.15154v2 ) ライセンス: Link先を確認	Margherita Gambini, Serena Tardelli, Maurizio Tesconi	(参考訳) 陰謀論に関する議論は、現在オンライン環境における誤報の高まりの中で活発化している。この分野での研究は、ソーシャルメディア上の陰謀論の検出に焦点が当てられ、限られたデータセットに依存することが多い。本研究では,2022年を通じて陰謀関連の活動に従事するアカウントを含むTwitterデータセットを構築するための新しい手法を提案する。我々のアプローチは、特定の陰謀論や情報操作に依存しないデータ収集に焦点を当てている。さらに、我々のデータセットは、陰謀活動に関わる個人と公平に比較可能な、ランダムに選択されたユーザーからなる対照群を含む。この包括的な収集作業により、合計1万5千のアカウントと、そのタイムラインから抽出された3700万のツイートが得られた。我々は,トピック,プロファイル,行動特性の3つの次元にわたって2つのグループの比較分析を行った。その結果,陰謀ユーザと対照ユーザは,プロファイルのメタデータ特性において類似性を示した。しかし, 行動・活動の面では, 特に議論された話題, 使用用語, トレンドに対する態度について, 大きく異なっていた。また,この2つのグループ間のボットユーザの存在に有意な差はみられなかった。最後に,ボット,トロール,言語学の文献から借用した特徴量を用いて陰謀ユーザを識別する分類器を開発した。その結果、高い精度(F1スコア0.94)を示し、陰謀関連アカウントに関する最も識別力の高い特徴を明らかにすることができた。 The discourse around conspiracy theories is currently thriving amidst the rampant misinformation in online environments. Research in this field has been focused on detecting conspiracy theories on social media, often relying on limited datasets. In this study, we present a novel methodology for constructing a Twitter dataset that encompasses accounts engaged in conspiracy-related activities throughout the year 2022. Our approach centers on data collection that is independent of specific conspiracy theories and information operations. Additionally, our dataset includes a control group comprising randomly selected users who can be fairly compared to the individuals involved in conspiracy activities. This comprehensive collection effort yielded a total of 15K accounts and 37M tweets extracted from their timelines. We conduct a comparative analysis of the two groups across three dimensions: topics, profiles, and behavioral characteristics. The results indicate that conspiracy and control users exhibit similarity in terms of their profile metadata characteristics. 
However, they diverge significantly in terms of behavior and activity, particularly regarding the discussed topics, the terminology used, and their stance on trending subjects. In addition, we find no significant disparity in the presence of bot users between the two groups. Finally, we develop a classifier to identify conspiracy users using features borrowed from bot, troll and linguistic literature. The results demonstrate a high accuracy level (with an F1 score of 0.94), enabling us to uncover the most discriminating features associated with conspiracy-related accounts.	翻訳日:2024-02-07 05:16:11 公開日:2024-02-05
# オンラインCMDPにおけるモデルフリー, 後悔-最適政策識別 Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs ( http://arxiv.org/abs/2309.15395v4 ) ライセンス: Link先を確認	Zihan Zhou, Honghao Wei, Lei Ying	(参考訳) 本稿では,オンラインの制約付きマルコフ決定プロセス(CMDP)における最良方策識別(BPI)問題について考察する。我々は、モデルフリーで、後悔が小さく、高い確率でほぼ最適なポリシーを特定するアルゴリズムに関心がある。オンラインCMDPに対してサブ線形の後悔と制約違反を達成する既存のモデルフリーアルゴリズムは、最適ポリシーへの収束保証を提供しておらず、以前に使用したすべてのポリシーから一様ランダムにポリシーをサンプリングした場合の平均的な性能保証のみを提供する。本稿では,以前に証明されたCMDPの基本的な構造特性(限定的確率性と呼ぶ)に基づいて,Pruning-Refinement-Identification (PRI) と呼ばれる新しいアルゴリズムを開発した。この特性は、$N$ 個の制約を持つCMDPに対して、確率的決定が高々 $N$ 個である最適ポリシーが存在することを述べている。提案するアルゴリズムは,まず確率的決定を行うべきステップと状態を特定し,次にそれらの確率的決定の分布を微調整する。 PRIは3つの目標を達成する。 (i) PRIはモデルフリーのアルゴリズムである。 (ii) 学習の終わりに高い確率でほぼ最適なポリシーを出力する。 (iii) PRIは $\tilde{\mathcal{O}}(H\sqrt{K})$ の後悔と制約違反を保証し、これはモデルフリーアルゴリズムにおける既存の最良の後悔境界 $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ を大幅に改善する。ここで、$H$ は各エピソードの長さ、$S$ は状態の数、$A$ はアクションの数であり、学習中のエピソードの総数は $2K+\tilde{\mathcal{O}}(K^{0.25})$ である。 This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an approximately optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs proved before, which we call limited stochasticity. The property says for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. 
PRI achieves three objectives: (i) PRI is a model-free algorithm; (ii) it outputs an approximately optimal policy with a high probability at the end of learning; and (iii) PRI guarantees $\tilde{\mathcal{O}}(H\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(H^4 \sqrt{SA}K^{\frac{4}{5}})$ under a model-free algorithm, where $H$ is the length of each episode, $S$ is the number of states, $A$ is the number of actions, and the total number of episodes during learning is $2K+\tilde{\mathcal{O}}(K^{0.25})$.	翻訳日:2024-02-07 05:08:51 公開日:2024-02-05
# DiffusionWorldViewer: 生成テキスト・画像モデルによる世界観の表現と拡大 DiffusionWorldViewer: Exposing and Broadening the Worldview Reflected by Generative Text-to-Image Models ( http://arxiv.org/abs/2309.09944v2 ) ライセンス: Link先を確認	Zoe De Simone and Angie Boggust and Arvind Satyanarayan and Ashia Wilson	(参考訳) TTI(Generative Text-to-image)モデルは、短いテキスト記述から高品質な画像を生成し、学術的・創造的な領域で広く利用されている。人間と同じように、TTIモデルは世界観、すなわちトレーニングデータとタスクから学習した世界の捉え方を持ち、これが与えられたプロンプトに対して生成する画像に影響を与える。しかし、TTIモデルの世界観はユーザから隠されていることが多く、ユーザがTTI出力に関する直観を構築することは困難である。また、ユーザの世界観と一致しないことも多く、ユーザの期待に合わない出力画像が生成される。そこで本稿では,TTIモデルの世界観を出力のデモグラフィック(人口統計学的属性)にわたって可視化するインタラクティブインタフェースであるDiffusionWorldViewerを紹介し,出力画像とユーザ視点を整合させる編集ツールを提供する。 18の多様なTTIユーザによるユーザスタディにおいて、DiffusionWorldViewerは、ユーザが生成した画像のさまざまな視点を表現し、現在のTTIモデルに反映されている限られた世界観に挑戦するのに役立つ。 Generative text-to-image (TTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. Like humans, TTI models have a worldview, a conception of the world learned from their training data and task that influences the images they generate for a given prompt. However, the worldviews of TTI models are often hidden from users, making it challenging for users to build intuition about TTI outputs, and they are often misaligned with users' worldviews, resulting in output images that do not match user expectations. In response, we introduce DiffusionWorldViewer, an interactive interface that exposes a TTI model's worldview across output demographics and provides editing tools for aligning output images with user perspectives. In a user study with 18 diverse TTI users, we find that DiffusionWorldViewer helps users represent their varied viewpoints in generated images and challenge the limited worldview reflected in current TTI models.	翻訳日:2024-02-07 05:05:56 公開日:2024-02-05
# トレーニング目的を超えて:大規模言語モデルにおける報酬モデルの乖離の解釈 Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models ( http://arxiv.org/abs/2310.08164v3 ) ライセンス: Link先を確認	Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip Torr, Fazl Barez	(参考訳) 人間のフィードバックからの強化学習(RLHF)によって微調整された大規模言語モデル(LLM)は、より広くデプロイされつつある。我々は、RLHF中にLLMに生じ、高報酬の生成をもたらす変化を指す用語として、$\textit{Implicit Reward Model}$ (IRM) を造語した。我々はIRMを解釈し、それらを誘導した微調整プロセスで使用されたRLHF報酬モデルからの乖離を測定する。 LLMのIRMに線形関数をフィッティングすることにより、RLHF報酬モデルと同じ型シグネチャを持つ報酬モデルが構築され、直接比較が可能になる。さらに,RLHF報酬モデルとの関連性に基づいてLLMが生成した特徴の分類との相互比較により,IRMの構築を検証した。IRMをよりよく理解することは、LLMの挙動とトレーニング目的との間の不一致を最小限に抑えるのに役立ち、これはLLMの$\textit{safety}$と$\textit{alignment}$の重要な要素であると我々は考えている。 Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed. We coin the term $\textit{Implicit Reward Model}$ (IRM) to refer to the changes that occur to an LLM during RLHF that result in high-reward generations. We interpret IRMs, and measure their divergence from the RLHF reward model used in the fine-tuning process that induced them. By fitting a linear function to an LLM's IRM, a reward model with the same type signature as the RLHF reward model is constructed, allowing for direct comparison. Additionally, we validate our construction of the IRM through cross-comparison with classifications of features generated by an LLM based on their relevance to the RLHF reward model. Better comprehending IRMs can help minimize discrepancies between LLM behavior and training objectives, which we believe to be an essential component of the $\textit{safety}$ and $\textit{alignment}$ of LLMs.	翻訳日:2024-02-07 04:56:23 公開日:2024-02-05
# 温度条件付きGFlowNetsのためのロジットスケーリングの学習 Learning to Scale Logits for Temperature-Conditional GFlowNets ( http://arxiv.org/abs/2310.02823v2 ) ライセンス: Link先を確認	Minsu Kim, Joohwan Ko, Taeyoung Yun, Dinghuai Zhang, Ling Pan, Woochang Kim, Jinkyoo Park, Emmanuel Bengio, Yoshua Bengio	(参考訳) GFlowNetsは確率的ポリシーを通じて構成的な構造を逐次生成する確率モデルである。 GFlowNetsの中でも、温度条件付きGFlowNetsは、探索と活用のための温度ベースの制御性を導入することができる。我々は、温度条件付きGFlowNetsのトレーニングを大幅に高速化する新しいアーキテクチャ設計であるLogit-scaling GFlowNets(Logit-GFN)を提案する。これは、以前に提案されたアプローチでは、異なる温度がポリシーのロジットの大きさだけでなく非常に異なる勾配プロファイルをもたらしうるため、ディープネットワークのトレーニングに数値的な課題が生じていたという考えに基づいている。温度の学習された関数を用いてポリシーのロジットを直接スケールすると、この課題は大幅に軽減されることがわかった。また、Logit-GFNを使用することで、オフライン学習における一般化能力とオンライン学習におけるモード発見能力が向上し、GFlowNetsを改善できる。これは様々な生物学的・化学的タスクで実証的に検証されている。我々のコードは https://github.com/dbsxodud-11/logit-gfn で入手できる。 GFlowNets are probabilistic models that sequentially generate compositional structures through a stochastic policy. Among GFlowNets, temperature-conditional GFlowNets can introduce temperature-based controllability for exploration and exploitation. We propose Logit-scaling GFlowNets (Logit-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed approaches introduced numerical challenges in the deep network training, since different temperatures may give rise to very different gradient profiles as well as magnitudes of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. Also, using Logit-GFN, GFlowNets can be improved by having better generalization capabilities in offline learning and mode discovery capabilities in online learning, which is empirically verified in various biological and chemical tasks. Our code is available at https://github.com/dbsxodud-11/logit-gfn	翻訳日:2024-02-07 04:55:57 公開日:2024-02-05
# PPT:効率的な視覚トランスフォーマーのためのトークンプルーニングとプール PPT: Token Pruning and Pooling for Efficient Vision Transformers ( http://arxiv.org/abs/2310.01812v3 ) ライセンス: Link先を確認	Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen	(参考訳) ビジョントランスフォーマー (ViTs) はコンピュータビジョンの分野で強力なモデルとして登場し、様々なビジョンタスクで優れたパフォーマンスを提供する。しかし、高い計算複雑性は現実のシナリオで実用的応用に重大な障壁をもたらす。全てのトークンが最終予測に等しく寄与するわけではなく、トークンが少ないほど計算コストが下がるという事実から、冗長トークンの削減はビジョントランスフォーマーを高速化する主要なパラダイムとなっている。しかし,トークンプルーニングによって不注意な冗長性だけを削減することも、トークンマージによって重複した冗長性だけを削減することも最適ではないと我々は主張する。そこで本稿では,これら2種類の冗長性を異なる層で適応的に扱うための新しい高速化フレームワーク,トークンプルーニングとプーリングトランスフォーマ(PPT)を提案する。トレーニング可能なパラメータを追加せずに、トークンプルーニングとトークンプーリングの両方をViTsにヒューリスティックに統合することにより、PPTは予測精度を維持しながら、モデルの複雑さを効果的に軽減する。例えば、PPTはImageNetデータセットにおいて精度を低下させることなく、DeiT-SのFLOPを37%以上削減し、スループットを45%以上改善している。コードはhttps://github.com/xjwu1024/PPTとhttps://github.com/mindspore-lab/models/で入手できる。 Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, the high computational complexity poses a significant barrier to their practical applications in real-world scenarios. Motivated by the fact that not all tokens contribute equally to the final predictions and fewer tokens incur less computational cost, reducing redundant tokens has become a prevailing paradigm for accelerating vision transformers. However, we argue that it is not optimal to either only reduce inattentive redundancy by token pruning, or only reduce duplicative redundancy by token merging. To this end, in this paper we propose a novel acceleration framework, namely token Pruning & Pooling Transformers (PPT), to adaptively tackle these two types of redundancy in different layers. By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT effectively reduces the model complexity while maintaining its predictive accuracy. 
For example, PPT reduces FLOPs by over 37% and improves the throughput by over 45% for DeiT-S without any accuracy drop on the ImageNet dataset. The code is available at https://github.com/xjwu1024/PPT and https://github.com/mindspore-lab/models/	翻訳日:2024-02-07 04:53:29 公開日:2024-02-05
# ResBit: カテゴリ値のための残留ビットベクトル ResBit: Residual Bit Vector for Categorical Values ( http://arxiv.org/abs/2309.17196v2 ) ライセンス: Link先を確認	Masane Fuchi, Amar Zanashir, Hiroto Minami, Tomohiro Takagi	(参考訳) 離散/分類データの表現方法であるワンホットベクトルは、その単純さと直感性のために機械学習で一般的に使用される。しかし、1ホットベクトルは次元の線形増加に悩まされ、特に多くのカテゴリを含むデータセットを扱う場合、計算とメモリの課題が引き起こされる。この問題に対処するために,分類データを密に表現するResidual Bit Vectors (ResBit)を提案する。 Analog Bitsも同様のアプローチを示しているが、分類データ生成タスクでは課題に直面している。 ResBitはこれらの制限を克服し、より汎用的なソリューションを提供する。実験では,表データ生成に焦点をあて,さまざまなカテゴリデータを用いて,シナリオ間のパフォーマンスを検証した。アクセラレーションを確認し、パフォーマンスの維持や改善を確実にします。 One-hot vectors, a method for representing discrete/categorical data, are commonly used in machine learning due to their simplicity and intuitiveness. However, the one-hot vectors suffer from a linear increase in dimensionality, posing computational and memory challenges, especially when dealing with datasets containing numerous categories. To address this issue, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. While Analog Bits presents a similar approach, it faces challenges in categorical data generation tasks. ResBit overcomes these limitations, offering a more versatile solution. In our experiments, we focus on tabular data generation, examining the performance across scenarios with varying amounts of categorical data. We verify the acceleration and ensure the maintenance or improvement of performance.	翻訳日:2024-02-07 04:52:53 公開日:2024-02-05
# マルチモーダル大言語モデルによる命令に基づく画像編集の指導 Guiding Instruction-based Image Editing via Multimodal Large Language Models ( http://arxiv.org/abs/2309.17102v2 ) ライセンス: Link先を確認	Tsu-Jui Fu and Wenze Hu and Xianzhi Du and William Yang Wang and Yinfei Yang and Zhe Gan	(参考訳) インストラクションベースの画像編集は、詳細な説明や地域マスクのない自然なコマンドによる画像操作の制御性と柔軟性を向上させる。しかし、現在の方法では、人間の指示があまりにも簡潔すぎることがある。 MLLM(Multimodal large language model)は,マルチモーダル理解と視覚応答生成において有望な能力を示す。 MLLMはどのようにして編集手順を容易にし、MGIE(MLLM-Guided Image Editing)を提示するかを検討する。 MGIEは表現的な指示を導き、明確なガイダンスを提供する。編集モデルは、この視覚的想像力を共同で捉え、エンドツーエンドのトレーニングを通じて操作を行う。 photoshopスタイルの修正,グローバル写真最適化,ローカル編集のさまざまな側面を評価した。広範な実験結果から,表現的指示は命令に基づく画像編集に不可欠であることが示され,mgieは競争的推論効率を維持しつつ,自動計測や人間評価において顕著な改善をもたらす可能性がある。 Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions are sometimes too brief for current methods to capture and follow. Multimodal large language models (MLLMs) show promising capabilities in cross-modal understanding and visual-aware response generation via LMs. We investigate how MLLMs facilitate edit instructions and present MLLM-Guided Image Editing (MGIE). MGIE learns to derive expressive instructions and provides explicit guidance. The editing model jointly captures this visual imagination and performs manipulation through end-to-end training. We evaluate various aspects of Photoshop-style modification, global photo optimization, and local editing. Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.	翻訳日:2024-02-07 04:52:41 公開日:2024-02-05
# AtomSurf : タンパク質構造学習のための表面表現 AtomSurf : Surface Representation for Learning on Protein Structures ( http://arxiv.org/abs/2309.16519v2 ) ライセンス: Link先を確認	Vincent Mallet, Souhaib Attaiki and Maks Ovsjanikov	(参考訳) タンパク質構造から学ぶ上で重要な側面は、それらの表現を幾何学的対象(グリッド、グラフ、表面など)として選択することであり、関連する学習方法を条件付ける。与えられたアプローチのパフォーマンスは、表現とそれに対応する学習モデルの両方に依存します。本稿では,タンパク質を$\textit{surfaces embedded in 3d}$として表現し,その表現を確立されたベンチマークatom3dで評価する。最初の発見は、有望な結果にもかかわらず、最先端のsurfaceベースの学習アプローチだけでは、このベンチマークの他のモダリティとは競合しないということです。そこで本研究では,1つの学習可能なアーキテクチャにグラフとサーフェスベースのアプローチを組み込んだ新しい相乗的アプローチを提案する。 2つの表現の長所を継承するこの組み合わせを用いることで、$\textit{all test task}$、atom3dベンチマーク、およびバインディングポケット分類において、最先端の結果が得られることを示す。私たちのコードとデータはオンラインで見つけることができます。 An essential aspect of learning from protein structures is the choice of their representation as a geometric object (be it a grid, graph, or surface), which conditions the associated learning method. The performance of a given approach will then depend on both the representation and its corresponding learning model. In this paper, we investigate representing proteins as $\textit{surfaces embedded in 3D}$ and evaluate this representation within an established benchmark: atom3d. Our first finding is that despite promising results, state-of-the-art surface-based learning approaches alone are not competitive with other modalities on this benchmark. Building on this, we introduce a novel synergistic approach that incorporates graph and surface-based approaches within a single learnable architecture. We show that using this combination, which inherits the strengths of the two representations, we obtain state-of-the-art results across $\textit{all tested tasks}$, on the atom3d benchmark, as well as on binding pocket classification. Our code and data can be found online: https://github.com/Vincentx15/atom2D.	翻訳日:2024-02-07 04:52:07 公開日:2024-02-05
# DoGE: 一般化推定によるドメイン再重み付け DoGE: Domain Reweighting with Generalization Estimation ( http://arxiv.org/abs/2310.15393v2 ) ライセンス: Link先を確認	Simin Fan, Matteo Pagliardini, Martin Jaggi	(参考訳) 事前学習データのカバレッジと構成は、Large Language Models(LLMs)の一般化能力に大きな影響を及ぼす。その重要性にもかかわらず、最近のLLMはデータドメインの影響を増減するために、依然としてヒューリスティックスと試行錯誤に依存している。本稿では,各ドメインからのサンプリング確率(ドメイン重み)を原理的に最適化する、一般化推定によるドメイン再重み付け(DoGE)を提案する。我々のアプローチは2段階のプロセスである。 (i)二段階最適化アルゴリズムを用いてプロキシモデルを訓練し、ドメイン重みを得る。 (ii)学習したドメイン重みに応じて訓練ドメインをサンプリングし、より大きなベースモデルを訓練する。実験では、DoGEが任意のターゲットデータ混合に対するベースモデルの一般化をどのように改善するかを広範囲に示す。 SlimPajamaデータセットでは、我々のベースモデルはベースライン手法と比較して、より良いパープレキシティと、6つのタスクにわたるfew-shot推論の精度を達成する。さらに、事前学習コーパスに含まれないドメイン外のターゲットタスク(OODドメイン)への一般化を目指す場合にも、DoGEはドメイン間の依存関係を効果的に識別し、ターゲットドメインにおいて一貫してより優れたテストパープレキシティを達成する。 The coverage and composition of the pretraining data significantly impact the generalization ability of Large Language Models (LLMs). Despite its importance, recent LLMs still rely on heuristics and trial and error to increase or reduce the influence of data-domains. We propose DOmain reweighting with Generalization Estimation (DoGE), which optimizes the probability of sampling from each domain (domain weights) in a principled way. Our approach is a two-stage process consisting of (i) training a proxy model to obtain domain weights using a bi-level optimization algorithm; (ii) training a larger base model by sampling training domains according to the learned domain weights. In our experiments, we extensively show how DoGE improves the generalization of the base model to any target data mixture. On the SlimPajama dataset, our base model gets better perplexity and few-shot reasoning accuracies across $6$ tasks compared to baseline methods. Moreover, aiming to generalize to out-of-domain target tasks, which are unseen in the pretraining corpus (OOD domain), DoGE can effectively identify inter-domain dependencies, and consistently achieves better test perplexity on the target domain.	翻訳日:2024-02-07 04:45:07 公開日:2024-02-05
# メッセージパッシングのレンズによるハイパーグラフニューラルネットワーク: ホモフィリとアーキテクチャ設計への共通の視点 Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design ( http://arxiv.org/abs/2310.07684v2 ) ライセンス: Link先を確認	Lev Telyatnikov, Maria Sofia Bucarelli, Guillermo Bernardez, Olga Zaghen, Simone Scardapane, Pietro Lio	(参考訳) 現在のハイパーグラフ学習方法論やハイパーグラフ領域のベンチマークデータセットのほとんどは、グラフアナログから手順を持ち上げて得られるため、ハイパーグラフの特定の特徴を過大評価することになる。 Q1 ホモフィリーの概念はハイパーグラフニューラルネットワーク(HNN)において重要な役割を担っているか? Q2 高階ネットワークの特徴に注意して対処することで、現在のHNNアーキテクチャを改善する余地はあるか? Q3 既存のデータセットは、HNNに有意義なベンチマークを提供するか? そこで,我々はまず,メッセージパッシング(mp)方式に基づく高次ネットワークにおけるホモフィリの新たな概念化を提案し,解析的検証と高次ネットワークのモデリングの両方を統一した。さらに,ハイパーエッジ依存ノード表現の保持やノード/ハイパーエッジ確率サンプリングなど,hnn内の高次構造を処理するための自然だがほとんど未検討の手法についても検討し,これまでで最も一般的なmp定式化であるmultiset-と,オリジナルのアーキテクチャ設計であるmultisetmixerに導いた。最後に、我々の提案を文脈的に分析し、我々の質問に対する洞察をうまく提供するための広範な実験を行う。 Most of the current hypergraph learning methodologies and benchmarking datasets in the hypergraph realm are obtained by lifting procedures from their graph analogs, leading to overshadowing specific characteristics of hypergraphs. This paper attempts to confront some pending questions in that regard: Q1 Can the concept of homophily play a crucial role in Hypergraph Neural Networks (HNNs)? Q2 Is there room for improving current HNN architectures by carefully addressing specific characteristics of higher-order networks? Q3 Do existing datasets provide a meaningful benchmark for HNNs? To address them, we first introduce a novel conceptualization of homophily in higher-order networks based on a Message Passing (MP) scheme, unifying both the analytical examination and the modeling of higher-order networks. 
Further, we investigate some natural, yet mostly unexplored, strategies for processing higher-order structures within HNNs, such as keeping hyperedge-dependent node representations or performing node/hyperedge stochastic sampling, leading us to the most general MP formulation to date, MultiSet, as well as to an original architecture design, MultiSetMixer. Finally, we conduct an extensive set of experiments that contextualize our proposals and successfully provide insights about our inquiries.	翻訳日:2024-02-07 04:43:18 公開日:2024-02-05
# ニューラルネットワーク用ゼロレベルセットエンコーダ Zero-Level-Set Encoder for Neural Distance Fields ( http://arxiv.org/abs/2310.06644v2 ) ライセンス: Link先を確認	Stefan Rhys Jeske and Jonathan Klein and Dominik L. Michels and Jan Bender	(参考訳) 神経形状表現は一般的に、特定の空間位置で符号付き距離または占有値を計算するためにニューラルネットワークを使用する3次元幾何学を表す。本稿では,1つの前方パスに3次元形状を埋め込む新しいエンコーダデコーダニューラルネットワークを提案する。我々のアーキテクチャは、グラフベースおよびボクセルベースのコンポーネントを組み込んだマルチスケールハイブリッドシステムと、連続的に微分可能なデコーダに基づいている。さらに、ネットワークはアイコン方程式を解くために訓練され、訓練と推論のためにゼロレベル集合の知識のみを必要とする。これは、これまでのほとんどの作業とは対照的に、ネットワークは、非ゼロ距離値や形状占有率の明示的な事前知識なしに、有効な符号付き距離フィールドを出力することができることを意味する。さらに, 非水密曲面や非多様体幾何学の文脈において, 表面正規性が十分に定義されていない場合の損失関数の修正を提案する。全体として、これは神経距離フィールドのトレーニングと評価の計算オーバーヘッドを削減し、アプリケーションが難しい形状を可能にするのに役立つ。シミュレーションデータと生の3Dスキャンに基づいて,変形形状からなるデータセットに対して,本手法の有効性,一般化性,拡張性を実証した。さらに,固定頂点数入力と可変頂点数入力の両方で単一クラスおよび複数クラスエンコーディングを行い,多種多様な応用例を示す。 Neural shape representation generally refers to representing 3D geometry using neural networks, e.g., to compute a signed distance or occupancy value at a specific spatial position. In this paper, we present a novel encoder-decoder neural network for embedding 3D shapes in a single forward pass. Our architecture is based on a multi-scale hybrid system incorporating graph-based and voxel-based components, as well as a continuously differentiable decoder. Furthermore, the network is trained to solve the Eikonal equation and only requires knowledge of the zero-level set for training and inference. This means that in contrast to most previous work, our network is able to output valid signed distance fields without explicit prior knowledge of non-zero distance values or shape occupancy. We further propose a modification of the loss function in case that surface normals are not well defined, e.g., in the context of non-watertight surfaces and non-manifold geometry. Overall, this can help reduce the computational overhead of training and evaluating neural distance fields, as well as enabling the application to difficult shapes. 
We finally demonstrate the efficacy, generalizability and scalability of our method on datasets consisting of deforming shapes, both based on simulated data and raw 3D scans. We further show single-class and multi-class encoding, on both fixed and variable vertex-count inputs, showcasing a wide range of possible applications.	翻訳日:2024-02-07 04:42:54 公開日:2024-02-05
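The Eikonal condition mentioned in the abstract above says that a valid signed distance field has a gradient of unit norm almost everywhere. The following is a minimal sketch of that condition only, not the authors' network or loss: the analytic sphere field, the finite-difference gradient, and all function names are assumptions made for illustration.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance from each point to a sphere centred at the origin."""
    return np.linalg.norm(points, axis=-1) - radius

def gradient_norm(sdf, points, h=1e-4):
    """Central finite-difference estimate of |grad sdf| at each point."""
    grads = []
    for axis in range(points.shape[-1]):
        offset = np.zeros(points.shape[-1])
        offset[axis] = h
        grads.append((sdf(points + offset) - sdf(points - offset)) / (2 * h))
    return np.linalg.norm(np.stack(grads, axis=-1), axis=-1)

def eikonal_residual(sdf, points):
    """Mean squared deviation of the gradient norm from 1."""
    return float(np.mean((gradient_norm(sdf, points) - 1.0) ** 2))

rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(256, 3))
# Drop points near the centre, where the distance field is not differentiable.
samples = samples[np.linalg.norm(samples, axis=-1) > 0.1]

# A true distance field has a residual near zero.
residual = eikonal_residual(sphere_sdf, samples)
# A field scaled by 2 has gradient norm 2, so its residual is near 1.
scaled_residual = eikonal_residual(lambda p: 2.0 * sphere_sdf(p), samples)
```

A residual of this kind can serve as a training penalty, which is why supervision with non-zero distance values may not be needed. The exact loss used in the paper is not given in the abstract.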
# 計画トークンを用いた言語モデル数学推論の指導 Guiding Language Model Math Reasoning with Planning Tokens ( http://arxiv.org/abs/2310.05707v3 ) ライセンス: Link先を確認	Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni	(参考訳) 大規模言語モデル(LLM)は、最近、連鎖推論のような複雑な推論タスクを実行する能力に対して、かなりの関心を集めている。しかしながら、この能力を強化する既存のアプローチのほとんどは、モデルの推論能力の構造的な側面を無視しながら、データ駆動型メソッドに大きく依存しています。 LLMは個々の推論ステップをうまく管理できますが、すべての推論チェーンの一貫性を維持するのに苦労しています。これを解決するために,各推論ステップの始めに計画トークンを導入し,モデルのガイドとして機能し,モデルパラメータにそれらの埋め込みを追加する。我々のアプローチでは、トレーニング可能なパラメータ(わずか0.001%)の無視可能な増加が必要であり、完全な微調整またはよりパラメータ効率の良いスキームによって適用できる。提案手法の有効性を3つの異なるLLMに適用し,3つの算術語問題データセット(標準微調整ベースライン)間で顕著な精度向上を示す。 Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce planning tokens at the start of each reasoning step, serving as a guide for the model, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. standard fine-tuning baselines.	翻訳日:2024-02-07 04:42:07 公開日:2024-02-05
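The idea of placing a planning token at the start of each reasoning step can be sketched as a data-preparation step. This is an illustration under assumptions: the token names and the rule that picks a token from the arithmetic operator in a step are hypothetical and are not taken from the abstract.

```python
# Hypothetical planning-token vocabulary, keyed by the operator found in a step.
PLANNING_TOKENS = {
    "+": "<plan_add>",
    "-": "<plan_sub>",
    "*": "<plan_mul>",
    "/": "<plan_div>",
}
DEFAULT_TOKEN = "<plan_other>"

def planning_token(step):
    """Pick a planning token for one reasoning step (illustrative heuristic)."""
    for operator, token in PLANNING_TOKENS.items():
        if operator in step:
            return token
    return DEFAULT_TOKEN

def annotate(steps):
    """Prefix every reasoning step with its planning token."""
    return [f"{planning_token(step)} {step}" for step in steps]

steps = ["48 / 2 = 24", "48 + 24 = 72", "The answer is 72"]
annotated = annotate(steps)
```

In a real setup each token would be added to the tokenizer and given a trainable embedding. That would fit the small increase in trainable parameters the abstract reports, though the abstract does not describe the mechanism.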
# ユニバーサルドメイン適応のためのメモリ支援サブプロトタイプマイニング Memory-Assisted Sub-Prototype Mining for Universal Domain Adaptation ( http://arxiv.org/abs/2310.05453v2 ) ライセンス: Link先を確認	Yuxiang Lai (1 and 2), Yi Zhou (1 and 2), Xinghong Liu (1 and 2), Tao Zhou (3) ((1) School of Computer Science and Engineering, Southeast University, China (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China (3) School of Computer Science and Engineering, Nanjing University of Science and Technology, China)	(参考訳) ユニバーサルドメイン適応は、クラスを整列させ、ソースとターゲットドメインの同一カテゴリ間の特徴ギャップを減らすことを目的としている。対象のプライベートカテゴリは、ソースドメインに含まれないため、適応プロセス中に未知のクラスとして設定される。しかし、既存の手法の多くはカテゴリ内のクラス内構造を見落としており、特に同じカテゴリに属するサンプル間で重要な概念シフトがある場合である。大きな概念シフトを持つサンプルを強制的に押し付けると、適応性能に悪影響を及ぼす可能性がある。さらに、解釈可能性の観点からは、視覚の特徴を戦闘機や民間航空機のような重要な相違点と一致させることは理不尽である。残念ながら、このような意味的曖昧さとアノテーションのコストのため、カテゴリは必ずしも詳細に分類されるわけではないため、モデルが正確な適応を行うのは困難である。そこで本研究では,同一のサブクラスに属するサンプルとマイニングサブクラスの違いを学習できるメモリ支援サブプロトタイプマイニング (memspm) 法を提案する。そうすることで、我々のモデルは、転送可能性を高め、同じカテゴリにアノテートされたサンプル間の固有の差異を反映するより合理的な特徴空間を学習する。我々は,UniDA,OSDA,PDAを含む複数のシナリオに対してMemSPM法の有効性を評価する。提案手法は,4つのベンチマークにおいて,ほとんどの場合,最先端の性能を実現する。 Universal domain adaptation aims to align the classes and reduce the feature gap between the same category of the source and target domains. The target private category is set as the unknown class during the adaptation process, as it is not included in the source domain. However, most existing methods overlook the intra-class structure within a category, especially in cases where there exists significant concept shift between the samples belonging to the same category. When samples with large concept shift are forced to be pushed together, it may negatively affect the adaptation performance. Moreover, from the interpretability aspect, it is unreasonable to align visual features with significant differences, such as fighter jets and civil aircraft, into the same category. 
Unfortunately, due to such semantic ambiguity and annotation cost, categories are not always classified in detail, making it difficult for the model to perform precise adaptation. To address these issues, we propose a novel Memory-Assisted Sub-Prototype Mining (MemSPM) method that can learn the differences between samples belonging to the same category and mine sub-classes when there exists significant concept shift between them. By doing so, our model learns a more reasonable feature space that enhances the transferability and reflects the inherent differences among samples annotated as the same category. We evaluate the effectiveness of our MemSPM method over multiple scenarios, including UniDA, OSDA, and PDA. Our method achieves state-of-the-art performance on four benchmarks in most cases.	翻訳日:2024-02-07 04:41:50 公開日:2024-02-05
# エントロピーMCMC:平底盆地からの試料採取 Entropy-MCMC: Sampling from Flat Basins with Ease ( http://arxiv.org/abs/2310.05401v2 ) ライセンス: Link先を確認	Bolian Li, Ruqi Zhang	(参考訳) ベイズ深層学習は後方分布推定の質をカウントする。しかし、ディープニューラルネットワークの後方は本質的に非常にマルチモーダルであり、局所モードは一般化性能が異なる。実用的な予算が与えられると、元の後方を狙うことは、いくつかのサンプルが"悪い"モードに閉じ込められ、過剰なフィッティングに苦しむ可能性があるため、最適以下のパフォーマンスにつながる可能性がある。一般化誤差の低い「良い」モードはエネルギーランドスケープの平坦な流域にしばしば存在するという観察を活かし、これらの平坦な領域の後方の偏差サンプリングを提案する。具体的には,mcmcサンプラーを平らな盆地に導くために,シャープモードのない後方平滑化に類似した定常分布を補助誘導変数として導入する。この導出変数をモデルパラメータと統合することにより、計算オーバーヘッドを最小限に抑えた効率的なサンプリングを可能にする単純な結合分布を作成する。提案手法の収束性を証明し, 強凸条件下での既存の平坦性認識法よりも高速に収束することを示す。実験により,本手法は後方の平らな盆地から試料を採取し,分類,校正,分布外検出など,複数のベンチマークで比較した基準線を上回った。 Bayesian deep learning counts on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, targeting at the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. 
Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection.	翻訳日:2024-02-07 04:41:25 公開日:2024-02-05
# シングルIMUと階層型機械学習モデルによる高齢者のオタゴ運動モニタリング Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models ( http://arxiv.org/abs/2310.03512v2 ) ライセンス: Link先を確認	Meng Shang, Lenore Dedeyne, Jolan Dupont, Laura Vercauteren, Nadjia Amini, Laurence Lapauw, Evelien Gielen, Sabine Verschueren, Carolina Varon, Walter De Raedt, and Bart Vanrumste	(参考訳) オタゴ運動プログラム (Otago Exercise Program, OEP) は、高齢者の疲労、サルコニア、バランスを改善するためのリハビリテーションプログラムである。 OEPへの患者関与の正確なモニタリングは困難であり、自己申告(日記)は信頼できないことが多い。ウェアラブルセンサーの開発に伴い、ウェアラブルセンサーを用いたヒューマンアクティビティ認識(HAR)システムは医療に革命をもたらした。しかし、OEPの利用は依然として限られた性能を示している。本研究の目的は,高齢者のためのOEPモニタリングシステムを構築することである。 imu(single waist-mounted inertial measurement unit)を装着した高齢者からデータを得た。 2つのデータセットが収集され、1つは実験室で、1つは患者の自宅で収集された。階層システムには2つの段階がある。 1) 深層学習モデルを用いて,患者がoepを行うかどうか,又は10分間のスライディングウインドウを用いて日常生活(adls)のアクティビティを認識する。 2) ステージ1に基づいて6秒スライディングウィンドウを用いて,OEPサブクラスが実行されたことを認識した。その結果、ステージ1では、OEPはウィンドウワイドのf1スコアが0.95以上、インターセクションオーバーユニオン(IoU)のf1スコアが0.85以上と認識できた。ステージ2では, 足関節底屈筋, 膝屈筋, 座屈筋の4つの活動が, 0.8以上のf1スコアで認識された。その結果, 日常生活における単一IMUを用いて, OEPのコンプライアンスを監視できる可能性が示唆された。また、いくつかのOEPサブクラスはさらなる分析のために認識することができる。 Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. 
A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise F1-scores over 0.95 and Intersection-over-Union (IoU) F1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with F1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring OEP compliance using a single IMU in daily life. Some OEP sub-classes can also be recognized for further analysis.	翻訳日:2024-02-07 04:40:00 公開日:2024-02-05
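The two evaluation ingredients named in the abstract above, fixed-length sliding windows and Intersection-over-Union (IoU) between predicted and true activity segments, can be sketched as follows. The sampling rate, the 50% window overlap, and the function names are assumptions for illustration. The abstract gives only the window lengths.

```python
def sliding_windows(n_samples, window, step):
    """Return (start, end) index pairs of fixed-length windows over a signal."""
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, step)]

def segment_iou(pred, truth):
    """Intersection-over-Union of two half-open intervals (start, end)."""
    intersection = max(0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0

SAMPLING_HZ = 50  # hypothetical IMU sampling rate

# One minute of signal cut into 6-second windows with an assumed 50% overlap.
windows = sliding_windows(
    n_samples=60 * SAMPLING_HZ,
    window=6 * SAMPLING_HZ,
    step=3 * SAMPLING_HZ,
)

overlapping = segment_iou((0, 10), (5, 15))  # 5 shared units out of 15 covered
disjoint = segment_iou((0, 10), (20, 30))    # no overlap
```

A predicted segment is typically counted as a hit when its IoU with a true segment exceeds a chosen threshold. The threshold used in the paper is not stated in the abstract.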
# バイアス分散分解による半教師付き不均衡ノード分類の再検討 Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition ( http://arxiv.org/abs/2310.18765v3 ) ライセンス: Link先を確認	Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang	(参考訳) 本稿では,グラフ構造データ学習のためのグラフニューラルネットワーク(GNN)におけるクラス不均衡問題に対する新しいアプローチを提案する。提案手法は不均衡ノード分類とバイアス分散分解を統合し,データ不均衡とモデル分散を密接に関連付ける理論的枠組みを確立する。また,グラフ増分手法を利用して分散を推定し,不均衡の影響を軽減するために正規化項を設計する。自然に不均衡なデータセットや、パブリックなクラス不均衡なデータセットを含む複数のベンチマークで試験を行い、我々の手法が様々な不均衡なシナリオで最先端の手法よりも優れていることを示した。この研究は、GNNにおける不均衡ノード分類の問題に対処するための新しい理論的視点を提供する。 This paper introduces a new approach to address the issue of class imbalance in graph neural networks (GNNs) for learning on graph-structured data. Our approach integrates imbalanced node classification and Bias-Variance Decomposition, establishing a theoretical framework that closely relates data imbalance to model variance. We also leverage graph augmentation technique to estimate the variance, and design a regularization term to alleviate the impact of imbalance. Exhaustive tests are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios. This work provides a novel theoretical perspective for addressing the problem of imbalanced node classification in GNNs.	翻訳日:2024-02-07 04:31:51 公開日:2024-02-05
# 理解による生成:論理記号の接地による神経視覚生成 Generating by Understanding: Neural Visual Generation with Logical Symbol Groundings ( http://arxiv.org/abs/2310.17451v2 ) ライセンス: Link先を確認	Yifei Peng, Yu Jin, Zhexu Luo, Yao-Xiang Ding, Wang-Zhou Dai, Zhong Ren, Kun Zhou	(参考訳) 近年の神経視覚生成モデルの大きな成功にもかかわらず、それらを強力な記号推論システムと統合することは依然として難しい課題である。中心的な課題には2段階の記号接地問題がある。1つ目はシンボル割り当て、すなわち、限られたラベル付きデータから学習することで、神経視覚生成器の潜在因子を推論システムの意味のある記号的因子に対応付けることである。2つ目はルール学習、すなわち、記号推論システムを強化するために生成過程を支配する新しいルールを学習することである。これら2つの問題に対処するため、アブダクティブ学習の枠組みに基づいて論理プログラミングシステムと神経視覚生成モデルを統合するニューロシンボリック学習手法、Abductive visual Generation (AbdGen) を提案する。信頼性が高く効率的な記号接地を実現するため、セマンティック・コードブック内の最近傍探索によってアブダクション候補を生成する量子化アブダクション法を導入する。精密なルール学習を実現するため、正例で誤ったルールを排除し、同時に負例で情報量の少ないルールを回避する対照的メタアブダクション法を提案する。実験の結果、AbdGenはベースライン手法と比較して、シンボル割り当てに必要なラベル付きデータがはるかに少ないことが示された。さらに、AbdGenは、既存手法では不可能であった、データからの基礎となる論理的生成ルールの学習を効果的に行うことができる。コードは https://github.com/candytalking/AbdGen で公開されている。 Despite the great success of neural visual generative models in recent years, integrating them with strong symbolic reasoning systems remains a challenging task. There are two levels of symbol grounding problems among the core challenges: the first is symbol assignment, i.e. mapping latent factors of neural visual generators to semantic-meaningful symbolic factors from the reasoning systems by learning from limited labeled data. The second is rule learning, i.e. learning new rules that govern the generative process to enhance the symbolic reasoning systems. To deal with these two problems, we propose a neurosymbolic learning approach, Abductive visual Generation (AbdGen), for integrating logic programming systems with neural visual generative models based on the abductive learning framework. To achieve reliable and efficient symbol grounding, the quantized abduction method is introduced for generating abduction proposals by the nearest-neighbor lookup within semantic codebooks. 
To achieve precise rule learning, the contrastive meta-abduction method is proposed to eliminate wrong rules with positive cases and avoid less informative rules with negative cases simultaneously. Experimental results show that compared to the baseline approaches, AbdGen requires significantly less labeled data for symbol assignment. Furthermore, AbdGen can effectively learn underlying logical generative rules from data, which is beyond the capability of existing approaches. The code is released at https://github.com/candytalking/AbdGen.	翻訳日:2024-02-07 04:31:19 公開日:2024-02-05
# 教師なしフェデレーション学習の理論に向けて:フェデレーションEMアルゴリズムの非漸近解析 Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms ( http://arxiv.org/abs/2310.15330v2 ) ライセンス: Link先を確認	Ye Tian, Haolei Weng, Yang Feng	(参考訳) 教師なし連合学習アプローチは大きな成功を収めてきたが、教師なし連合学習の領域はいまだに未発見のままである。いくつかの連合EMアルゴリズムが実際に人気を博しているが、理論上の基礎はしばしば欠落している。本稿では,混合モデルの教師なし学習用に設計されたフェデレーション勾配emアルゴリズム(federated gradient em algorithm, fedgrem)について紹介する。一般混合モデルに対する包括的有限サンプル理論を提案し、この一般理論を特定の統計モデルに適用し、モデルパラメータと混合比率の明示的な推定誤差を特徴づける。私たちの理論は、feedgremが既存のfederated emアルゴリズムに拡張された洞察によって、ローカルなシングルタスク学習に勝るタイミングと方法を明らかにします。このことは、実践的な成功と理論的理解のギャップを埋める。シミュレーションの結果,FedGrEMが既存の教師なしフェデレート学習ベンチマークよりも優れていることを示す。 While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice, however, their theoretical foundations are often lacking. In this paper, we first introduce a federated gradient EM algorithm (FedGrEM) designed for the unsupervised learning of mixture models, which supplements the existing federated EM algorithms by considering task heterogeneity and potential adversarial attacks. We present a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory on specific statistical models to characterize the explicit estimation error of model parameters and mixture proportions. Our theory elucidates when and how FedGrEM outperforms local single-task learning with insights extending to existing federated EM algorithms. This bridges the gap between their practical success and theoretical understanding. Our simulation results validate our theory, and demonstrate FedGrEM's superiority over existing unsupervised federated learning benchmarks.	翻訳日:2024-02-07 04:30:07 公開日:2024-02-05
# 同変深重み空間アライメント Equivariant Deep Weight Space Alignment ( http://arxiv.org/abs/2310.13397v2 ) ライセンス: Link先を確認	Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik, Nadav Dym, Haggai Maron	(参考訳) ディープネットワークの置換対称性はモデルマージや類似度推定のような基本的な操作を困難にする。多くの場合、ネットワークの重み、すなわち、その重み間の最適な置換を見つけることは必要である。残念ながら、重量調整はnp問題である。それまでの研究は主にアライメント問題の緩和版を解くことに集中しており、時間を要する方法や準最適解が導かれる。本稿では,アライメントプロセスを加速し,その品質を向上させるために,重みアライメント問題を解決するための新しい枠組みを提案する。この目的のために、まず2つの基本対称性に重み付けが一致することを証明し、これらの対称性を尊重する深いアーキテクチャを提案する。特に、当社のフレームワークはラベル付きデータを必要としない。提案手法の理論的解析を行い,様々なタイプのネットワークアーキテクチャと学習環境におけるDeep-Alignの評価を行う。実験結果から,Deep-Align を用いたフィードフォワードパスは,現在の最適化アルゴリズムと同等のアライメントが得られることがわかった。さらに、アライメントは他の手法の効果的な初期化として利用することができ、収束の大幅な高速化を伴う改善された解をもたらす。 Permutation symmetries of deep networks make basic operations like model merging and similarity estimation challenging. In many cases, aligning the weights of the networks, i.e., finding optimal permutations between their weights, is necessary. Unfortunately, weight alignment is an NP-hard problem. Prior research has mainly focused on solving relaxed versions of the alignment problem, leading to either time-consuming methods or sub-optimal solutions. To accelerate the alignment process and improve its quality, we propose a novel framework aimed at learning to solve the weight alignment problem, which we name Deep-Align. To that end, we first prove that weight alignment adheres to two fundamental symmetries and then, propose a deep architecture that respects these symmetries. Notably, our framework does not require any labeled data. We provide a theoretical analysis of our approach and evaluate Deep-Align on several types of network architectures and learning setups. Our experimental results indicate that a feed-forward pass with Deep-Align produces better or equivalent alignments compared to those produced by current optimization algorithms. 
Additionally, our alignments can be used as an effective initialization for other methods, leading to improved solutions with a significant speedup in convergence.	翻訳日:2024-02-07 04:29:15 公開日:2024-02-05
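The permutation symmetry that motivates weight alignment can be checked directly: reordering the hidden units of a network changes its weights but not the function it computes. This is a minimal sketch of that symmetry on a one-hidden-layer ReLU network, not the Deep-Align architecture. The layer sizes and the fixed cyclic permutation are arbitrary choices for illustration.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    """One-hidden-layer ReLU network."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
w2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
x = rng.normal(size=(5, 4))

# Reorder the 8 hidden units with a fixed, non-identity permutation:
# columns of w1 and entries of b1 move together with the rows of w2.
perm = np.roll(np.arange(8), 1)
w1_p, b1_p, w2_p = w1[:, perm], b1[perm], w2[perm, :]

# The two parameter sets differ, yet they define the same function.
outputs_match = bool(np.allclose(mlp(x, w1, b1, w2, b2),
                                 mlp(x, w1_p, b1_p, w2_p, b2)))
weights_differ = not np.allclose(w1, w1_p)

# Alignment amounts to recovering the inverse permutation.
inverse = np.argsort(perm)
realigned = bool(np.allclose(w1_p[:, inverse], w1))
```

Here the permutation is known, so undoing it is trivial. For two independently trained networks it has to be searched for, which is the hard problem the abstract describes.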
# ICU:タスクをイメージキャプションと言語理解に分割した視覚・言語モデリングにおける言語バリアの検索 ICU: Conquering Language Barriers in Vision-and-Language Modeling by Dividing the Tasks into Image Captioning and Language Understanding ( http://arxiv.org/abs/2310.12531v3 ) ライセンス: Link先を確認	Guojun Wu	(参考訳) 多くの多言語視覚言語研究(v&l)は、1つのモデルで多言語および多言語機能を達成することを目的としている。しかし、画像の多言語キャプションの不足が開発を妨げている。この障害を克服するために、V&Lモデルが画像キャプションを英語で実行し、マルチリンガル言語モデル(mLM)がアルトテキストとしてキャプションを取り、言語間理解を行うという、V&Lタスクを2つのステージに分割するICU、画像キャプション理解(Image Caption Understanding)を提案する。多言語処理の負担はV&Lモデルから引き上げられ、mLM上に置かれる。多言語テキストデータが比較的豊富で品質が高いため、ICUはV&Lモデルの言語障壁の克服を容易にすることができる。 iglueベンチマークで9つの言語にまたがる2つのタスクに関する実験で、icuは5つの言語で最新の結果を達成でき、他の言語でも同様の結果が得られることを示した。 Most multilingual vision-and-language (V&L) research aims to accomplish multilingual and multimodal capabilities within one model. However, the scarcity of multilingual captions for images has hindered the development. To overcome this obstacle, we propose ICU, Image Caption Understanding, which divides a V&L task into two stages: a V&L model performs image captioning in English, and a multilingual language model (mLM), in turn, takes the caption as the alt text and performs cross-lingual language understanding. The burden of multilingual processing is lifted off V&L model and placed on mLM. Since the multilingual text data is relatively of higher abundance and quality, ICU can facilitate the conquering of language barriers for V&L models. In experiments on two tasks across 9 languages in the IGLUE benchmark, we show that ICU can achieve new state-of-the-art results for five languages, and comparable results for the rest.	翻訳日:2024-02-07 04:28:55 公開日:2024-02-05
# Infinite Horizon Discounted Reward Markov Decision Processs の一般パラメータ化による自然ポリシー勾配アルゴリズムのサンプル複素性解析の改善 Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes ( http://arxiv.org/abs/2310.11677v2 ) ライセンス: Link先を確認	Washim Uddin Mondal and Vaneet Aggarwal	(参考訳) 無限遠地平線割引報酬マルコフ決定プロセスのためのサンプル効率的な学習アルゴリズムの設計の問題を考える。具体的には, 高速化確率勾配勾配法を用いて自然政策勾配を求める高速化自然政策勾配(ANPG)アルゴリズムを提案する。 anpgは$\mathcal{o}({\epsilon^{-2}})$サンプル複雑性と$\mathcal{o}(\epsilon^{-1})$イテレーション複雑性を達成し、$\epsilon$は最適性エラーを定義する。これは$\log(\frac{1}{\epsilon})$ factorによって最先端のサンプルの複雑さを改善する。 ANPGは1次アルゴリズムであり、既存の文献とは異なり、重要サンプリング(IS)重みの分散が上限となるという検証不可能な仮定を必要としない。 Hessian-free アルゴリズムと IS-free アルゴリズムのクラスでは、ANPG は $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ の係数で最もよく知られたサンプル複雑性を破り、同時に彼らの最先端の反復複雑性と一致する。 We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({\epsilon^{-2}})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.	翻訳日:2024-02-07 04:28:35 公開日:2024-02-05
# PowerFlowNet:メッセージパッシンググラフニューラルネットワークを用いた電力フロー近似 PowerFlowNet: Power Flow Approximation Using Message Passing Graph Neural Networks ( http://arxiv.org/abs/2311.03415v2 ) ライセンス: Link先を確認	Nan Lin, Stavros Orfanoudakis, Nathan Ordonez Cardenas, Juan S. Giraldo, Pedro P. Vergara	(参考訳) 高精度かつ効率的な電力フロー解析(PF)は、現代の電気ネットワークの運用と計画において重要である。したがって、小規模および大規模の電力ネットワークに対して、正確かつ高速なソリューションを提供するスケーラブルなアルゴリズムが必要である。電力ネットワークをグラフと解釈できるため、グラフニューラルネットワーク(GNN)は、基礎となるグラフ構造を介して情報共有を利用することで、PF近似の精度と速度を改善するための有望なアプローチとして登場した。本研究では,従来のNewton-Raphson法と同じような性能を示すPF近似のための新しいGNNアーキテクチャであるPowerFlowNetを紹介するが,単純なIEEE 14バスシステムでは4倍,フランス高電圧ネットワーク(6470rte)では145倍の高速化を実現している。一方、DC緩和法などの従来の近似手法では、性能と実行時間で大幅に上回っているため、PowerFlowNetは実世界のPF分析に非常に有望なソリューションである。さらに,powerflownetの性能,スケーラビリティ,解釈可能性,アーキテクチャ依存性を徹底的に検証し,詳細な実験評価を行い,本手法の有効性を検証する。この評価は、電力系統解析におけるGNNの挙動と潜在的な応用に関する洞察を与える。 Accurate and efficient power flow (PF) analysis is crucial in modern electrical networks' operation and planning. Therefore, there is a need for scalable algorithms that can provide accurate and fast solutions for both small and large scale power networks. As the power network can be interpreted as a graph, Graph Neural Networks (GNNs) have emerged as a promising approach for improving the accuracy and speed of PF approximations by exploiting information sharing via the underlying graph structure. In this study, we introduce PowerFlowNet, a novel GNN architecture for PF approximation that showcases similar performance with the traditional Newton-Raphson method but achieves it 4 times faster in the simple IEEE 14-bus system and 145 times faster in the realistic case of the French high voltage network (6470rte). Meanwhile, it significantly outperforms other traditional approximation methods, such as the DC relaxation method, in terms of performance and execution time; therefore, making PowerFlowNet a highly promising solution for real-world PF analysis. 
Furthermore, we verify the efficacy of our approach by conducting an in-depth experimental evaluation, thoroughly examining the performance, scalability, interpretability, and architectural dependability of PowerFlowNet. The evaluation provides insights into the behavior and potential applications of GNNs in power system analysis.	翻訳日:2024-02-07 04:21:00 公開日:2024-02-05
# 一階論理制約付きマルチタスクカーネルベース学習 Multitask Kernel-based Learning with First-Order Logic Constraints ( http://arxiv.org/abs/2311.03340v3 ) ライセンス: Link先を確認	Michelangelo Diligenti, Marco Gori, Marco Maggini and Leonardo Rigutini	(参考訳) 本稿では,一階述語論理節の集合によって表現された背景知識をカーネルマシンに組み込むための一般的なフレームワークを提案する。特に、オブジェクトの集合に定義された複数の述語をサンプルから共同で学習し、それらの値の許容可能な構成に一連のfol制約を課すマルチタスク学習スキームを考える。述語は、入力オブジェクトが表現される特徴空間上で定義され、プリオリまたは適切なカーネルベースの学習者によって近似される。 FOL節をカーネルベースの述語によって計算された出力に対処できる連続的な実装に変換するための一般的なアプローチが提示される。学習問題は、教師付き例と正規化項と、教師付き例と教師なし例の両方に制約を強制するペナルティ項とを組み合わせた損失関数の主元における最適化を必要とする半教師付きタスクとして定式化される。残念なことに、ペナルティ項は凸ではなく、最適化プロセスを妨げる可能性がある。しかし、教師付き例をまず学習し、次に制約を強制する2段階の学習スキーマを使用することで、貧弱な解決策を避けることができる。 In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admissible configurations of their values. The predicates are defined on the feature spaces, in which the input objects are represented, and can be either known a priori or approximated by an appropriate kernel-based learner. A general approach is presented to convert the FOL clauses into a continuous implementation that can deal with the outputs computed by the kernel-based predicates. The learning problem is formulated as a semi-supervised task that requires the optimization in the primal of a loss function that combines a fitting loss measure on the supervised examples, a regularization term, and a penalty term that enforces the constraints on both the supervised and unsupervised examples. Unfortunately, the penalty term is not convex and it can hinder the optimization process. 
However, it is possible to avoid poor solutions by using a two-stage learning scheme, in which the supervised examples are learned first and then the constraints are enforced.	翻訳日:2024-02-07 04:20:38 公開日:2024-02-05
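The conversion of a first-order clause into a continuous penalty, as described in the abstract above, can be illustrated for a single clause of the form "for all x: A(x) => B(x)". The product t-norm used here is one common choice and is an assumption. The abstract does not say which continuous relaxation the authors adopt, and the function names are hypothetical.

```python
def implication_truth(a, b):
    """Fuzzy truth of A => B, written as NOT(A AND NOT B) with the product t-norm.

    a and b are predicate outputs in [0, 1].
    """
    return 1.0 - a * (1.0 - b)

def constraint_penalty(a_values, b_values):
    """Average violation of 'for all x: A(x) => B(x)' over a sample of objects."""
    violations = [1.0 - implication_truth(a, b)
                  for a, b in zip(a_values, b_values)]
    return sum(violations) / len(violations)

# Crisp assignments that satisfy the clause incur no penalty.
consistent = constraint_penalty([1.0, 0.0, 0.0], [1.0, 0.0, 1.0])
# A true with B false violates the clause maximally.
violated = constraint_penalty([1.0, 1.0], [0.0, 0.0])
# Soft predicate outputs yield a graded penalty: 0.9 * (1 - 0.2) = 0.72.
partial = constraint_penalty([0.9], [0.2])
```

Because the penalty needs only predicate outputs and no labels, it can be evaluated on unsupervised examples as well. The product term is also not convex in the pair of predicate outputs, which is consistent with the non-convexity the abstract notes.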
# PhoGPT: ベトナムのためのジェネレーティブプレトレーニング PhoGPT: Generative Pre-training for Vietnamese ( http://arxiv.org/abs/2311.02945v2 ) ライセンス: Link先を確認	Dat Quoc Nguyen, Linh The Nguyen, Chi Tran, Dung Ngoc Nguyen, Dinh Phung, Hung Bui	(参考訳) 我々はベトナム語のための最先端の4Bパラメータ生成モデルシリーズをオープンソースとして公開し、基礎となる訓練済み単言語モデルPhoGPT-4Bとそのチャット変種であるPhoGPT-4B-Chatを含む。ベースモデルであるPhoGPT-4Bは、正確に3.7Bパラメータを持つが、ベトナムの102Bトークンのコーパスのスクラッチから事前訓練されており、文脈長は8192で、20480トークンの語彙を使用している。チャットの変種であるPhoGPT-4B-Chatは、70Kの命令プロンプトとその応答のデータセット上でPhoGPT-4Bを微調整して得られたモデリング出力である。従来のクローズドソースおよびオープンソース7Bパラメータモデルと比較して高い性能を示す。私たちのPhoGPTモデルは、https://github.com/VinAIResearch/PhoGPTで利用可能です。 We open-source a state-of-the-art 4B-parameter generative model series for Vietnamese, which includes the base pre-trained monolingual model PhoGPT-4B and its chat variant, PhoGPT-4B-Chat. The base model, PhoGPT-4B, with exactly 3.7B parameters, is pre-trained from scratch on a Vietnamese corpus of 102B tokens, with an 8192 context length, employing a vocabulary of 20480 token types. The chat variant, PhoGPT-4B-Chat, is the modeling output obtained by fine-tuning PhoGPT-4B on a dataset of 70K instructional prompts and their responses, along with an additional 290K conversations. We demonstrate its strong performance compared to previous closed-source and open-source 7B-parameter models. Our PhoGPT models are available at: https://github.com/VinAIResearch/PhoGPT	翻訳日:2024-02-07 04:19:18 公開日:2024-02-05
# 視覚言語基礎モデルからの分布外ロバスト性蒸留 Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models ( http://arxiv.org/abs/2311.01441v2 ) ライセンス: Link先を確認	Andy Zhou and Jindong Wang and Yu-Xiong Wang and Haohan Wang	(参考訳) 本稿では,知識蒸留とデータ拡張を組み合わせた視覚モデルの堅牢性向上を目的とした,概念的にシンプルで軽量なフレームワークを提案する。我々は, 基礎モデルから蒸留する場合, より大きなモデルでは分散性が強く向上することを示すことにより, より良い教師には役に立たない, という予想に対処した。そこで,本研究では,教師の頑健さを活かした離散逆蒸留法 (dad) を提案し,vqgan を用いてそれを識別し,標準データ拡張法よりも有意義なサンプルを生成する。本研究では,データ拡張設定による知識蒸留におけるロバストな教師の利用に関する理論的枠組みを提案し,分散的ロバスト性,クリーンな精度の高向上を示す。特に,類似技術と比較して計算オーバーヘッドが小さいこと,改良のために他のデータ拡張と組み合わせることが容易である。 We propose a conceptually simple and lightweight framework for improving the robustness of vision models through the combination of knowledge distillation and data augmentation. We address the conjecture that larger models do not make for better teachers by showing strong gains in out-of-distribution robustness when distilling from pretrained foundation models. Following this finding, we propose Discrete Adversarial Distillation (DAD), which leverages a robust teacher to generate adversarial examples and a VQGAN to discretize them, creating more informative samples than standard data augmentation techniques. We provide a theoretical framework for the use of a robust teacher in the knowledge distillation with data augmentation setting and demonstrate strong gains in out-of-distribution robustness and clean accuracy across different student architectures. Notably, our method adds minor computational overhead compared to similar techniques and can be easily combined with other data augmentations for further improvements.	翻訳日:2024-02-07 04:18:23 公開日:2024-02-05
# 効果的なロボットイミッタとしての視覚言語基礎モデル Vision-Language Foundation Models as Effective Robot Imitators ( http://arxiv.org/abs/2311.01378v3 ) ライセンス: Link先を確認	Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong	(参考訳) 視覚言語の基礎モデルの最近の進歩は、マルチモーダルデータを理解し、ロボット操作を含む複雑な視覚言語タスクを解決する能力を示している。我々は、ロボットデータに簡単な微調整を施した、既存の視覚言語モデル(VLM)を利用する簡単な方法を模索する。この目的のために,オープンソースのVLMであるOpenFlamingo上に構築されたRoboFlamingoという,シンプルで斬新な視覚言語操作フレームワークを考案した。以前の作品とは異なり、RoboFlamingoはシングルステップの視覚言語理解に事前訓練されたVLMを使用し、明示的なポリシーヘッドで逐次履歴情報をモデル化し、言語条件の操作データセットのみに基づいて模倣学習によって微調整されている。このような分解によってroboflamingoは、オープンループ制御と低パフォーマンスプラットフォームへのデプロイの柔軟性を提供する。テストベンチマークでは,最先端のパフォーマンスをはるかに上回って,ロボット制御にVLMを適用する上で,RoboFlamingoが効果的かつ競争力のある代替手段であることを示す。実験の結果,操作作業におけるVLMの動作に関する興味深い結論が得られた。 roboflamingoは、ロボティクスの操作に費用対効果があり、使いやすいソリューションになる可能性があり、誰もが自分のロボティクスポリシーを微調整できる能力があると信じている。 Recent progress in vision language foundation models has shown their ability to understand multimodal data and resolve complicated vision language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLMs, OpenFlamingo. Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step vision-language comprehension, models sequential history information with an explicit policy head, and is slightly fine-tuned by imitation learning only on language-conditioned manipulation datasets. Such a decomposition provides RoboFlamingo the flexibility for open-loop control and deployment on low-performance platforms. By exceeding the state-of-the-art performance with a large margin on the tested benchmark, we show RoboFlamingo can be an effective and competitive alternative to adapt VLMs to robot control. 
Our extensive experimental results also reveal several interesting conclusions regarding the behavior of different pre-trained VLMs on manipulation tasks. We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy.	翻訳日:2024-02-07 04:17:36 公開日:2024-02-05
# 法領域におけるテキスト分類への共通アプローチのエネルギーベース比較分析 An energy-based comparative analysis of common approaches to text classification in the Legal domain ( http://arxiv.org/abs/2311.01256v2 ) ライセンス: Link先を確認	Sinan Gultekin and Achille Globo and Andrea Zugarini and Marco Ernandes and Leonardo Rigutini	(参考訳) ほとんどの機械学習研究は、パフォーマンスの観点から最高のソリューションを評価します。しかし、最高のパフォーマンスモデルを求めるレースでは、多くの重要な側面がしばしば見過ごされ、反対に、慎重に検討されるべきである。実際、異なるアプローチ間のパフォーマンスのギャップは無視できることもあるが、生産コスト、エネルギー消費量、カーボンフットプリントといった要因を考慮する必要がある。大規模言語モデル(LLM)は、学術や産業におけるNLP問題に対処するために広く採用されている。本稿では,LexGLUEベンチマークにおけるLCMと従来のアプローチ(例えばSVM)の詳細な定量的比較を行い,その性能(標準指標)と,時間,消費電力,コストといった代替指標(カーボンフットプリント)の両方を考慮に入れた。本分析では,異なる実装手順に従い,異なるリソースを必要とするため,プロトタイピングフェーズ(トレーニング検証テストの繰り返しによるモデル選択)と本運用フェーズを別々に検討した。その結果、最も単純なアルゴリズムはLLMに非常に近い性能を達成できるが、消費電力が極めて少なく、リソースの要求も少ないことが示唆された。その結果、機械学習(ML)ソリューションの選択にさらなる評価を加えることが示唆された。 Most Machine Learning research evaluates the best solutions in terms of performance. However, in the race for the best performing model, many important aspects are often overlooked when, on the contrary, they should be carefully considered. In fact, sometimes the gaps in performance between different approaches are neglectable, whereas factors such as production costs, energy consumption, and carbon footprint must take into consideration. Large Language Models (LLMs) are extensively adopted to address NLP problems in academia and industry. In this work, we present a detailed quantitative comparison of LLM and traditional approaches (e.g. SVM) on the LexGLUE benchmark, which takes into account both performance (standard indices) and alternative metrics such as timing, power consumption and cost, in a word: the carbon-footprint. In our analysis, we considered the prototyping phase (model selection by training-validation-test iterations) and in-production phases separately, since they follow different implementation procedures and also require different resources. 
The results indicate that very often, the simplest algorithms achieve performance very close to that of LLMs but with very low power consumption and lower resource demands. The results obtained suggest that companies should include additional evaluations in the choice of Machine Learning (ML) solutions.	翻訳日:2024-02-07 04:17:10 公開日:2024-02-05
# 強化学習のための拡散モデル:調査 Diffusion Models for Reinforcement Learning: A Survey ( http://arxiv.org/abs/2311.01223v3 ) ライセンス: Link先を確認	Zhengbang Zhu, Hanye Zhao, Haoran He, Yichao Zhong, Shenyu Zhang, Haoquan Guo, Tingting Chen, Weinan Zhang	(参考訳) 拡散モデルは、サンプル品質とトレーニング安定性において、以前の生成モデルを超える。最近の研究は、強化学習(RL)ソリューションの改善における拡散モデルの利点を示している。この調査は、この新興分野の概要を提供し、新たな研究の道を開くことを目的としている。まず,RLアルゴリズムが抱えるいくつかの課題について検討する。次に, rlにおける拡散モデルの役割に基づく既存手法の分類を行い, 今後の課題について考察する。さらに,様々なRL関連タスクにおける拡散モデルの適用について概説する。最後に,調査を終了し,今後の研究方向性に関する洞察を提供する。 rlの拡散モデルを利用するため、論文やその他の関連リソースのためのgithubリポジトリを積極的にメンテナンスしています。 Diffusion models surpass previous generative models in sample quality and training stability. Recent works have shown the advantages of diffusion models in improving reinforcement learning (RL) solutions. This survey aims to provide an overview of this emerging field and hopes to inspire new avenues of research. First, we examine several challenges encountered by RL algorithms. Then, we present a taxonomy of existing methods based on the roles of diffusion models in RL and explore how the preceding challenges are addressed. We further outline successful applications of diffusion models in various RL-related tasks. Finally, we conclude the survey and offer insights into future research directions. We are actively maintaining a GitHub repository for papers and other related resources in utilizing diffusion models in RL: https://github.com/apexrl/Diff4RLSurvey.	翻訳日:2024-02-07 04:16:48 公開日:2024-02-05
# 整数計画法による多項関数の多項回帰 Piecewise Polynomial Regression of Tame Functions via Integer Programming ( http://arxiv.org/abs/2311.13544v2 ) ライセンス: Link先を確認	Gilles Bareilles, Johannes Aspman, Jiri Nemecek, Jakub Marecek	(参考訳) 我々は、分別多項式関数を持つ非滑らかな非凸関数のクラスであるいわゆる「為関数」の近似を考える。 Tame関数は、すべての共通の活性化を伴うディープニューラルネットワークのトレーニングで遭遇する関数、混合整数プログラムの値関数、小さな分子の波動関数など、幅広い用途に現れる。任意の全次元立方体上の任意のセグメント数を持つ分割多項式関数により、タメ関数の近似の質を限定する。また,数次多項式回帰の混合整数計画法を初めて提示する。これらを合わせて、テーム関数を推定することができる。有望な計算結果を示す。 We consider approximating so-called tame functions, a class of nonsmooth, nonconvex functions, with piecewise polynomial functions. Tame functions appear in a wide range of applications: functions encountered in the training of deep neural networks with all common activations, value functions of mixed-integer programs, or wave functions of small molecules. We bound the quality of approximation of a tame function by a piecewise polynomial function with a given number of segments on any full-dimensional cube. We also present the first ever mixed-integer programming formulation of piecewise polynomial regression. Together, these can be used to estimate tame functions. We demonstrate promising computational results.	翻訳日:2024-02-07 04:09:12 公開日:2024-02-05
# Hugging FaceにおけるMLモデルの進化と維持の解析 Analyzing the Evolution and Maintenance of ML Models on Hugging Face ( http://arxiv.org/abs/2311.13380v2 ) ライセンス: Link先を確認	Joel Castaño, Silverio Martínez-Fernández, Xavier Franch, Justus Bogner	(参考訳) Hugging Face(HF)は、機械学習(ML)モデルの開発と共有のための重要なプラットフォームとして確立された。このリポジトリマイニング調査は、HF Hub API経由で収集されたデータを使用して380,000以上のモデルを調査し、HFにホストされたモデルを中心に、コミュニティの関与、進化、メンテナンスを探求することを目的としている。まず、HFの成長と人気、MLドメインのトレンド、フレームワークの使用状況、著者グループ化、使用するタグとデータセットの進化について調べる。モデルカード記述のテキスト解析を通じて,開発者コミュニティ内で広く普及しているテーマや洞察の特定も行なっています。本研究は,MLモデルの保守状態を評価するとともに,コミットメッセージをさまざまなカテゴリ(補正,完全,適応)に分類し,コミットメトリクスの開発段階にわたる進化を分析し,複数の属性に基づいてモデルのメンテナンス状態を推定する新たな分類システムを提案する。本研究の目的は、HFのようなプラットフォーム上での将来のモデル開発戦略に影響を及ぼすであろうMLモデルのメンテナンスと進化に関する貴重な洞察を提供することである。 Hugging Face (HF) has established itself as a crucial platform for the development and sharing of machine learning (ML) models. This repository mining study, which delves into more than 380,000 models using data gathered via the HF Hub API, aims to explore the community engagement, evolution, and maintenance around models hosted on HF, aspects that have yet to be comprehensively explored in the literature. We first examine the overall growth and popularity of HF, uncovering trends in ML domains, framework usage, authors grouping and the evolution of tags and datasets used. Through text analysis of model card descriptions, we also seek to identify prevalent themes and insights within the developer community. Our investigation further extends to the maintenance aspects of models, where we evaluate the maintenance status of ML models, classify commit messages into various categories (corrective, perfective, and adaptive), analyze the evolution across development stages of commits metrics and introduce a new classification system that estimates the maintenance status of models based on multiple attributes. 
This study aims to provide valuable insights about ML model maintenance and evolution that could inform future model development strategies on platforms like HF.	翻訳日:2024-02-07 04:09:03 公開日:2024-02-05
# DURELアノテーションツール:人間と計算による意味的近接度、センスクラスタ、意味的変化の測定 The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change ( http://arxiv.org/abs/2311.12664v2 ) ライセンス: Link先を確認	Dominik Schlechtweg, Shafqat Mumtaz Virk, Pauline Sander, Emma Sk\"oldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Jonas Kuhn, Sabine Schulte im Walde	(参考訳) 本稿では,オンラインのオープンソースインターフェースに単語使用間の意味的近接のアノテーションを実装するDURelツールを提案する。このツールは、標準的なヒューマンアノテーションと計算アノテーションをサポートし、Word-in-Contextモデルによる最近の進歩に基づいている。アノテータ判断は自動グラフクラスタリング技術でクラスタ化され、分析のために視覚化される。これにより、使用ペア間の単純で直感的なマイクロタスクの判断で単語感覚を測定することができる。このツールは、アノテータ間の合意を比較する追加の機能を提供し、得られた判断のサブジェクティビティを保証し、感覚周波数分布、意味変化、時間の経過に伴う感覚の変化についての洞察を与える要約統計を計算する。 We present the DURel tool that implements the annotation of semantic proximity between uses of words into an online, open source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This allows to measure word senses with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation efforts. The tool offers additional functionalities to compare the agreement between annotators to guarantee the inter-subjectivity of the obtained judgments and to calculate summary statistics giving insights into sense frequency distributions, semantic variation or changes of senses over time.	翻訳日:2024-02-07 04:08:18 公開日:2024-02-05
# 分解に基づく多目的強化学習:分類学と枠組み Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework ( http://arxiv.org/abs/2311.12495v2 ) ライセンス: Link先を確認	Florian Felten and El-Ghazali Talbi and Gr\'egoire Danoy	(参考訳) 多目的強化学習(MORL)は、対立する目的の間で異なる妥協を行う政策を求めることにより、従来のRLを拡張している。近年のMORLへの関心の高まりは様々な研究や解法をもたらし、しばしば分解(MOO/D)に基づく多目的最適化における既存の知識から引き出された。しかし、既存の文献では、RLとMOO/Dの両方に基づいた明確な分類が欠落している。その結果、morlの研究者は、標準化された分類がないため、より広い文脈で貢献を分類しようとすると困難に陥る。そこで本稿では,rlとmooの文献を橋渡しする新しい手法である分解法(morl/d)に基づく多目的強化学習を提案する。 MORL/Dの包括的分類法が提示され、既存のおよび潜在的なMORL作品の分類のための構造化された基盤を提供する。導入された分類法は、MORLの研究を精査し、明確に分類することで明確さと簡潔さを高めるために用いられる。さらに,分類から派生した柔軟な枠組みを導入する。このフレームワークは、RLとMOO/Dの両方のツールを使用して、多様なインスタンス化を実現する。その汎用性は異なる構成で実装し、対照的なベンチマーク問題に基づいて評価することで実証される。その結果, MORL/Dのインスタンス化は, 現状技術に匹敵する性能を示した。分類と枠組みを提示することにより,本論文は総合的な視点とMORLの統一語彙を提供する。これによりアルゴリズムによる貢献の特定が容易になるだけでなく、モルにおける新しい研究の道の基礎となる。 Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. 
This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate MORL/D instantiations achieve comparable performance to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL.	翻訳日:2024-02-07 04:08:04 公開日:2024-02-05
# 言語モデルと人間の脳との相違 Divergences between Language Models and Human Brains ( http://arxiv.org/abs/2311.09308v2 ) ライセンス: Link先を確認	Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J. Tarr, Leila Wehbe	(参考訳) 機械と人間は同じような方法で言語を処理するのか? 近年の研究では、言語モデルの内部表現(LM)を用いて脳信号が効果的に予測できることが確認されている。このような結果は、lmsと人間の脳との計算原理の共有を反映していると考えられているが、lmsと人間の言語表現や使用方法にも明確な違いがある。本研究では,脳磁図(MEG)による言語に対するLM表現と人間の脳反応の差異を,被験者が物語を読んだり聴いたりする2つのデータセットで調べることで,人間と機械語処理の相違を系統的に検討する。データ駆動型アプローチを用いて、LMによってうまく捉えられていない2つのドメインを識別する。次に、これらのドメインを人間の行動実験で検証し、これらのドメイン上の微調整LMが人間の脳反応との整合性を改善することを示す。 Do machines and humans process language in similar ways? Recent research has hinted in the affirmative, finding that brain signals can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language as measured by Magnetoencephalography (MEG) across two datasets in which subjects read and listened to narrative stories. Using a data-driven approach, we identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense. We then validate these domains with human behavioral experiments and show that fine-tuning LMs on these domains can improve their alignment with human brain responses.	翻訳日:2024-02-07 04:05:40 公開日:2024-02-05
# 離散整合型ASRのためのデコーダのみ変換器の損失マスキングは不要 Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR ( http://arxiv.org/abs/2311.04534v2 ) ライセンス: Link先を確認	Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang	(参考訳) 近年, speechgpt, viola, audiopalmなどの統一音声テキストモデルが様々な音声タスクにおいて顕著な性能を発揮している。これらのモデルは音声信号をトークンに識別し(音声識別)、テキストと音声のトークンの両方に共有語彙を使用する。そして、1つのデコーダのみのトランスフォーマーを複数の音声タスクで訓練する。しかし、これらのモデルは音声トークン間の依存を無視したASRタスクのロスマスキング戦略に依存している。本稿では,テキストと同様に自己回帰的に音声トークンをモデル化することを提案する。従来のクロスエントロピー損失を入力音声トークンに適用しても,ロスマスキング方式よりもASR性能が常に向上しないことがわかった。この問題に対処するため,スムーズなラベル付きKL分散損失を音声トークンに適用する,Smoothed Label Distillation (SLD) という新しい手法を提案する。実験により,sldは音声識別法が異なるasrタスクにおいて,音声トークンを効果的にモデル化し,デコーダのみのトランスフォーマーの損失マスキングよりも優れることを示した。ソースコードは以下の通り。 https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld Recently, unified speech-text models, such as SpeechGPT, VioLA, and AudioPaLM, have achieved remarkable performance on various speech tasks. These models discretize speech signals into tokens (speech discretization) and use a shared vocabulary for both text and speech tokens. Then they train a single decoder-only Transformer on a mixture of speech tasks. However, these models rely on the Loss Masking strategy for the ASR task, which ignores the dependency among speech tokens. In this paper, we propose to model speech tokens in an autoregressive way, similar to text. We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach. To address this issue, we propose a novel approach denoted Smoothed Label Distillation (SLD), which applies a KL divergence loss with smoothed labels on speech tokens. Our experiments show that SLD effectively models speech tokens and outperforms Loss Masking for decoder-only Transformers in ASR tasks with different speech discretization methods. 
The source code can be found here: https://github.com/alibaba-damo-academy/SpokenNLP/tree/main/sld	翻訳日:2024-02-07 04:05:01 公開日:2024-02-05
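The abstract above describes Smoothed Label Distillation only as "a KL divergence loss with smoothed labels on speech tokens". A minimal sketch of that general idea follows, not the authors' implementation: the uniform spreading of the off-target probability mass, the smoothing value 0.1, and all function names are assumptions made for illustration.

```python
import math

def smoothed_targets(target_index, vocab_size, smoothing=0.1):
    """Label-smoothed target: 1 - smoothing on the true token, the rest spread uniformly (assumed scheme)."""
    off_value = smoothing / (vocab_size - 1)
    dist = [off_value] * vocab_size
    dist[target_index] = 1.0 - smoothing
    return dist

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def smoothed_label_kl_loss(logits, target_index, smoothing=0.1):
    """KL divergence between the smoothed target and the model's predicted token distribution."""
    p = smoothed_targets(target_index, len(logits), smoothing)
    q = softmax(logits)
    return kl_divergence(p, q)
```

With this form, the loss is zero exactly when the predicted distribution equals the smoothed target, so the model is not pushed toward fully confident predictions on input speech tokens.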
# LLMは人間の反応バイアスを示すか? 調査設計における事例研究 Do LLMs exhibit human-like response biases? A case study in survey design ( http://arxiv.org/abs/2311.04076v4 ) ライセンス: Link先を確認	Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig	(参考訳) 大規模言語モデル(LLM)の能力が向上するにつれて、調査や世論調査などの主観的ラベルが望まれる現実世界のタスクにおいて、LLMを人間のためのプロキシとして使用する可能性への興奮が高まっている。しかし興味深いことに、人間は反応バイアスの形での変化を指示する感度も示しています。したがって、LLMが人間の意見の近似に使用されるのであれば、LLMが人間の反応バイアスを反映する程度を調査する必要があると論じる。本研究では,「プロンプット」の語句の置換による人間の反応バイアスが広範に研究されている事例研究として,サーベイデザインを用いた。社会心理学における先行研究から,我々はデータセットを設計し,LLMが人間的な反応バイアスを示すかどうかを評価する枠組みを提案する。 9つのモデルの包括的評価からは,一般的なオープンおよび商用のllmは,一般的に人間的な行動を反映していないことが分かる。これらの矛盾は、微調整されたモデルでは顕著である。さらに,モデルがヒトと同じ方向において有意な変化を示す場合でも,ヒトの有意な変化を誘発しない摂動も同様の変化をもたらす可能性があることを見出した。これらの結果は、アノテーションパイプラインの一部で人間を置換するためにLLMを使うことの潜在的な落とし穴を強調し、さらにモデル行動のよりきめ細かい特徴付けの重要性を強調している。私たちのコード、データセット、収集したサンプルはhttps://github.com/lindiatjuatja/biasmonkeyで入手できます。 As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely-cited barrier to the adoption of LLMs is their sensitivity to prompt wording - but interestingly, humans also display sensitivities to instruction changes in the form of response biases. As such, we argue that if LLMs are going to be used to approximate human opinions, it is necessary to investigate the extent to which LLMs also reflect human response biases, if at all. In this work, we use survey design as a case study, where human response biases caused by permutations in wordings of "prompts" have been extensively studied. Drawing from prior work in social psychology, we design a dataset and propose a framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior. 
These inconsistencies tend to be more prominent in models that have been instruction fine-tuned. Furthermore, even if a model shows a significant change in the same direction as humans, we find that perturbations that are not meant to elicit significant changes in humans may also result in a similar change. These results highlight the potential pitfalls of using LLMs to substitute humans in parts of the annotation pipeline, and further underscore the importance of finer-grained characterizations of model behavior. Our code, dataset, and collected samples are available at https://github.com/lindiatjuatja/BiasMonkey	翻訳日:2024-02-07 04:04:38 公開日:2024-02-05
# Web上の逐次タスク構成における言語モデルエージェントの表現限界 Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web ( http://arxiv.org/abs/2311.18751v2 ) ライセンス: Link先を確認	Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur	(参考訳) 言語モデルエージェント(LMA)は最近、マルチステップ決定タスクにおける有望なパラダイムとして登場し、人間や他の強化学習エージェントよりも優れています。約束にもかかわらず、しばしばタスクの組み合わせを伴う実世界のアプリケーションでの彼らのパフォーマンスは、まだ過小評価されている。本稿では,より現実的な仮定を反映した50の新しい構成型Web自動化タスクからなる新しいベンチマークCompWoBを紹介する。既存のプロンプト型LMA (gpt-3.5-turboまたはgpt-4) はベースタスクの平均成功率94.0%を達成するが, 構成タスクでは24.9%に低下する。一方、転送されたLMA(ベースタスクのみに調整)は一般化のギャップが小さく、85.4%から54.8%に低下した。タスク間のデータ分散のバランスをとることで、MiniWoBで人間レベルのパフォーマンス(95.2%)を超え、CompWoB(61.5%)で最高のゼロショットパフォーマンスを達成するHTML-T5++をトレーニングします。これらは、タスク構成性のための小規模の微調整および転送モデルの約束を強調するが、それらのパフォーマンスは、組み合わせ順序を変更する異なる命令構成の下でさらに低下する。 LMAの最近の顕著な成功とは対照的に、我々のベンチマークと詳細な分析は、実世界の展開において、ロバストでタスク構成性に一般化可能なLMAを構築することの必要性を強調している。 Language model agents (LMA) recently emerged as a promising paradigm on multi-step decision making tasks, often outperforming humans and other reinforcement learning agents. Despite the promise, their performance on real-world applications that often involve combinations of tasks is still underexplored. In this work, we introduce a new benchmark, called CompWoB -- 50 new compositional web automation tasks reflecting more realistic assumptions. We show that while existing prompted LMAs (gpt-3.5-turbo or gpt-4) achieve 94.0% average success rate on base tasks, their performance degrades to 24.9% success rate on compositional tasks. On the other hand, transferred LMAs (finetuned only on base tasks) show less generalization gap, dropping from 85.4% to 54.8%. By balancing data distribution across tasks, we train a new model, HTML-T5++, that surpasses human-level performance (95.2%) on MiniWoB, and achieves the best zero-shot performance on CompWoB (61.5%). 
While these highlight the promise of small-scale finetuned and transferred models for task compositionality, their performance further degrades under different instruction compositions changing combinational order. In contrast to the recent remarkable success of LMA, our benchmark and detailed analysis emphasize the necessity of building LMAs that are robust and generalizable to task compositionality for real-world deployment.	翻訳日:2024-02-07 03:56:31 公開日:2024-02-05
# polypセグメンテーションのためのディープラーニングに関する調査:技術,課題,今後の動向 A Survey on Deep Learning for Polyp Segmentation: Techniques, Challenges and Future Trends ( http://arxiv.org/abs/2311.18373v2 ) ライセンス: Link先を確認	Jiaxin Mei, Tao Zhou, Kaiwen Huang, Yizhe Zhang, Yi Zhou, Ye Wu, Huazhu Fu	(参考訳) ポリープの早期検出と評価は大腸癌(CRC)の予防と治療において重要な役割を担っている。ポリープセグメンテーション(Polyp segmentation)は、臨床医が正確なポリープ領域の特定とセグメンテーションを支援する効果的なソリューションを提供する。過去には、色、テクスチャ、形状など、手作業で抽出された低レベルな特徴を頼りにすることが多かった。ディープラーニングの出現に伴い、深層学習ネットワークに基づく医用画像分割アルゴリズムがますます登場し、この分野では大きな進歩を遂げている。本稿では,ポリプセグメンテーションアルゴリズムの包括的レビューを行う。まず,手作業で抽出した特徴と深いセグメンテーションアルゴリズムに基づく従来のアルゴリズムをレビューし,そのトピックに関連するベンチマークデータセットを詳述した。具体的には,研究トピックの問題点とネットワーク構造の違いを考慮して,最近のディープラーニングモデルとポリプサイズに基づく結果の包括的評価を行う。最後に, この分野におけるポリプセグメンテーションの課題と今後の動向について論じる。収集したモデル、ベンチマークデータセット、ソースコードリンクはすべてhttps://github.com/taozh2017/Awesome-Polyp-Segmentationで公開されている。 Early detection and assessment of polyps play a crucial role in the prevention and treatment of colorectal cancer (CRC). Polyp segmentation provides an effective solution to assist clinicians in accurately locating and segmenting polyp regions. In the past, people often relied on manually extracted lower-level features such as color, texture, and shape, which often had issues capturing global context and lacked robustness to complex scenarios. With the advent of deep learning, more and more outstanding medical image segmentation algorithms based on deep learning networks have emerged, making significant progress in this field. This paper provides a comprehensive review of polyp segmentation algorithms. We first review some traditional algorithms based on manually extracted features and deep segmentation algorithms, then detail benchmark datasets related to the topic. Specifically, we carry out a comprehensive evaluation of recent deep learning models and results based on polyp sizes, considering the pain points of research topics and differences in network structures. 
Finally, we discuss the challenges of polyp segmentation and future trends in this field. The models, benchmark datasets, and source code links we collected are all published at https://github.com/taozh2017/Awesome-Polyp-Segmentation.	翻訳日:2024-02-07 03:55:41 公開日:2024-02-05
# 過パラメータ化下におけるシャープネス認識最小化の解析 Analyzing Sharpness-aware Minimization under Overparameterization ( http://arxiv.org/abs/2311.17539v2 ) ライセンス: Link先を確認	Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee	(参考訳) 過パラメータニューラルネットワークのトレーニングは、同じレベルのトレーニング損失にもかかわらず、異なる一般化能力の最小化をもたらす可能性がある。極小の鋭さと一般化誤差の相関関係を示唆する証拠から、より一般化可能な解として平坦な極小を明示的に見つけるための最適化手法の開発に努力が払われている。しかし、このシャープネス認識最小化(SAM)戦略は、過パラメータ化による影響について、まだ研究されていない。本研究では, SAMの過パラメータ化過程を解析し, SAMに対する過パラメータ化の影響を示す経験的および理論的結果の両方を提示する。具体的には、様々な領域にわたる広範な数値実験を行い、SAMが過パラメータ化の増加の恩恵を受け続けている一貫した傾向が存在することを示す。また,過小パラメータ化の効果が,一連のアブレーション研究と並行して,より明瞭に,あるいは減少する,説得力のある事例も見いだした。理論的には、最適化に標準手法を用い、確率的条件下でのオーバーパラメータ化の下でSAMが線形収束率を達成できることを証明する。また,2層ネットワークの解析に基づいて,過パラメータ化がsamの一般化を改善できること,さらに,samが発見する線形安定なミニマムがsgdよりも均一なヘッシアンモーメントを持つことを示した。 Training an overparameterized neural network can yield minimizers of different generalization capabilities despite the same level of training loss. With evidence that suggests a correlation between sharpness of minima and their generalization errors, increasing efforts have been made to develop an optimization method to explicitly find flat minima as more generalizable solutions. However, this sharpness-aware minimization (SAM) strategy has not been studied much yet as to whether and how it is affected by overparameterization. In this work, we analyze SAM under overparameterization of varying degrees and present both empirical and theoretical results that indicate a critical influence of overparameterization on SAM. Specifically, we conduct extensive numerical experiments across various domains, and show that there exists a consistent trend that SAM continues to benefit from increasing overparameterization. We also discover compelling cases where the effect of overparameterization is more pronounced or even diminished along with a series of ablation studies. 
On the theoretical side, we use standard techniques in optimization and prove that SAM can achieve a linear rate of convergence under overparameterization in a stochastic setting. We also show that overparameterization can improve generalization of SAM based on an analysis of two-layer networks, and further, that the linearly stable minima found by SAM have more uniform Hessian moments compared to SGD.	翻訳日:2024-02-07 03:55:07 公開日:2024-02-05
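For readers unfamiliar with sharpness-aware minimization, the commonly used two-step SAM update can be sketched as follows: take a normalized ascent step of radius rho, then descend using the gradient evaluated at that perturbed point. This is a generic illustration on a toy quadratic loss, not the experimental setup of the paper; the loss, the learning rate, and rho are all assumed values.

```python
import math

def loss(w, curvatures):
    """Toy loss L(w) = 0.5 * sum(a_i * w_i**2), chosen only for illustration."""
    return 0.5 * sum(a * x * x for a, x in zip(curvatures, w))

def grad(w, curvatures):
    """Gradient of the toy loss."""
    return [a * x for a, x in zip(curvatures, w)]

def sam_step(w, curvatures, lr=0.1, rho=0.05):
    """One SAM update: perturb along the normalized gradient, then descend with the perturbed gradient."""
    g = grad(w, curvatures)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # stationary point: no direction to perturb along
    # Ascent step: approximate worst-case point inside a ball of radius rho.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Descent step: apply the gradient taken at the perturbed point to the original weights.
    g_adv = grad(w_adv, curvatures)
    return [x - lr * gx for x, gx in zip(w, g_adv)]
```

Setting rho to zero recovers plain gradient descent, which makes the extra cost of SAM visible: each step needs two gradient evaluations instead of one.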
# 単一および積分多スペクトル空中画像の融合 Fusion of Single and Integral Multispectral Aerial Images ( http://arxiv.org/abs/2311.17515v3 ) ライセンス: Link先を確認	Mohamed Youssef, Oliver Bimber	(参考訳) 複数の入力チャネルから最も重要なサルエント情報を適切に融合することは、多くの航空画像処理に不可欠である。マルチスペクトル記録は様々なスペクトル範囲の特徴を呈するが、合成開口センシングは閉塞した特徴を可視化する。我々は,従来の空中画像から得られる最も重要な特徴と,合成開口センシングによる閉塞除去の結果として得られる積分空中画像の特徴とを融合する,初のハイブリッド(モデルベースと学習ベース)アーキテクチャを提案する。環境の空間的参照と、通常は密集した植生によって隠される、遮蔽が除去された標的の特徴を組み合わせる。本手法は, 相互情報, 視覚情報忠実度, ピーク信号対雑音比などの共通指標において, 最先端の2チャンネル融合と多チャンネル融合のアプローチを視覚的, 定量的に上回る。提案モデルは、手動で調整したパラメータを必要とせず、任意の数とスペクトルチャネルの組み合わせに拡張することができ、異なるユースケースに対応するために再構成可能である。本研究では,探索救助,山火事検出,野生生物観察の例を示す。 An adequate fusion of the most significant salient information from multiple input channels is essential for many aerial imaging tasks. While multispectral recordings reveal features in various spectral ranges, synthetic aperture sensing makes occluded features visible. We present a first hybrid (model- and learning-based) architecture for fusing the most significant features from conventional aerial images with the ones from integral aerial images that are the result of synthetic aperture sensing for removing occlusion. It combines the environment's spatial references with features of unoccluded targets that would normally be hidden by dense vegetation. Our method outperforms state-of-the-art two-channel and multi-channel fusion approaches visually and quantitatively in common metrics, such as mutual information, visual information fidelity, and peak signal-to-noise ratio. The proposed model does not require manually tuned parameters, can be extended to an arbitrary number and combinations of spectral channels, and is reconfigurable for addressing different use cases. We demonstrate examples for search-and-rescue, wildfire detection, and wildlife observation.	翻訳日:2024-02-07 03:54:44 公開日:2024-02-05
# ChatTraffic:拡散モデルによるテキストからトラフィック生成 ChatTraffic: Text-to-Traffic Generation via Diffusion Model ( http://arxiv.org/abs/2311.16203v3 ) ライセンス: Link先を確認	Chengyang Zhang, Yong Zhang, Qitan Shao, Bo Li, Yisheng Lv, Xinglin Piao, Baocai Yin	(参考訳) 交通予測は、インテリジェントトランスポーテーションシステム(ITS)の最も重要な基盤の1つである。従来のトラフィック予測手法は、過去のトラフィックデータのみに頼ってトラフィックトレンドを予測し、2つの大きな課題に直面している。 1)異常事象に対する感受性。 2)長期予測における性能の制限。そこで本研究では,交通システムを記述するテキストと生成モデルを組み合わせることで,トラヒック生成を実現し,そのタスクをTTG(Text-to-Traffic Generation)と呼ぶ。 TTGタスクの鍵となる課題は、交通状況を生成するために、テキストを道路ネットワークの空間構造と交通データを関連付ける方法である。そこで本研究では,テキスト・トラフィック生成のための最初の拡散モデルChatTrafficを提案する。合成データと実データとの整合性を保証するため,グラフ畳み込みネットワーク(GCN)を用いて拡散モデルを拡張し,交通データの空間的相関を抽出する。さらに,TTGタスクのためのテキスト-グラフペアを含む大規模データセットを構築する。私たちは、リリース済みのデータセットを質的かつ定量的にベンチマークしました。実験の結果,チャットトラフィックはテキストから現実的な交通状況を生成することができた。私たちのコードとデータセットはhttps://github.com/chyazhang/chattrafficで利用可能です。 Traffic prediction is one of the most significant foundations in Intelligent Transportation Systems (ITS). Traditional traffic prediction methods rely only on historical traffic data to predict traffic trends and face two main challenges. 1) insensitivity to unusual events. 2) limited performance in long-term prediction. In this work, we explore how generative models combined with text describing the traffic system can be applied for traffic generation, and name the task Text-to-Traffic Generation (TTG). The key challenge of the TTG task is how to associate text with the spatial structure of the road network and traffic data for generating traffic situations. To this end, we propose ChatTraffic, the first diffusion model for text-to-traffic generation. To guarantee the consistency between synthetic and real data, we augment a diffusion model with the Graph Convolutional Network (GCN) to extract spatial correlations of traffic data. In addition, we construct a large dataset containing text-traffic pairs for the TTG task. We benchmarked our model qualitatively and quantitatively on the released dataset. 
The experimental results indicate that ChatTraffic can generate realistic traffic situations from the text. Our code and dataset are available at https://github.com/ChyaZhang/ChatTraffic.	翻訳日:2024-02-07 03:54:27 公開日:2024-02-05
# ロバストなインストラクションチューニングのためのデータ多様性 Data Diversity Matters for Robust Instruction Tuning ( http://arxiv.org/abs/2311.14736v2 ) ライセンス: Link先を確認	Alexander Bukharin and Tuo Zhao	(参考訳) 近年の研究では、高品質で多様な命令チューニングデータセットをキュレートすることで、命令追従能力を大幅に改善できることが示されている。しかし、このようなデータセットの作成は困難であり、ほとんどの作業は手動のキュレーションやプロプライエタリな言語モデルに依存している。インストラクションチューニングの多様性をどのように定義できるか、多様性と品質が相互に依存するのか、データセットの品質と多様性をどのように最適化できるか、まだ明確ではないため、自動データキュレーションは難しい。これらの問題を解決するため、我々はQDIT(Quality-Diversity Instruction Tuning)という新しいアルゴリズムを提案する。 QDITは、データセットの多様性と品質を同時に制御する簡単な方法を提供する。本研究は,(1)データ多様性と品質の間に自然なトレードオフが存在すること,(2)データ多様性の増加が最悪ケースの命令追従性能を大幅に改善し,ロバスト性を向上させること,の2つの重要な洞察を導く。大規模命令チューニングデータセットにおけるQDITの性能を検証することにより,品質駆動型データ選択と比較して,最悪のケースや平均ケースのパフォーマンスを大幅に向上させることができる。 Recent works have shown that by curating high quality and diverse instruction tuning datasets, we can significantly improve instruction-following capabilities. However, creating such datasets is difficult and most works rely on manual curation or proprietary language models. Automatic data curation is difficult as it is still not clear how we can define diversity for instruction tuning, how diversity and quality depend on one another, and how we can optimize dataset quality and diversity. To resolve these issues, we propose a new algorithm, Quality-Diversity Instruction Tuning (QDIT). QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance. From this study we draw two key insights: (1) there is a natural tradeoff between data diversity and quality and (2) increasing data diversity significantly improves the worst case instruction following performance, therefore improving robustness. We validate the performance of QDIT on several large scale instruction tuning datasets, where we find it can substantially improve worst and average case performance compared to quality-driven data selection.	
翻訳日:2024-02-07 03:53:40 公開日:2024-02-05
# サブリニア空間における超長期トークン注意近似のためのワンパスストリーミングアルゴリズム One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space ( http://arxiv.org/abs/2311.14652v2 ) ライセンス: Link先を確認	Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang	(参考訳) 注意計算は、$O(n^2)$の時間的複雑さと$O(n^2)$の空間的複雑さの両方を同時に要するため、長いコンテキストを含むストリーミングアプリケーションに大規模言語モデル(LLM)をデプロイするには膨大な計算資源が必要となる。先日のOpenAI DevDay (2023年11月6日)で、OpenAIは、128Kのドキュメントをサポート可能な新しいモデルをリリースした。本稿では,コンテキスト長$n$が128Kよりはるかに大きい場合($n \gg 2^d$)のメモリ効率の問題に焦点を当てる。 Query, Key, and Value 行列 $Q, K, V \in \mathbb{R}^{n \times d}$ の単層自己アテンションを考えると、多項式法は注意出力 $T \in \mathbb{R}^{n \times d}$ を近似する。これは、$U_1, U_2 \in \mathbb{R}^{n \times t}$ を構成し、注意 ${\sf Attn}(Q, K, V)$ の計算を $n^{1+o(1)}$ 時間で実行できるように高速化することで実現される。それにもかかわらず、近似された注目行列$U_1U_2^\top \in \mathbb{R}^{n \times n}$はまだ$O(n^2)$空間を必要とするため、メモリ使用量は大幅に増加する。これらの課題に対応して,ストリーミング方式でデータの1パスのみを読み取る新しいアルゴリズムを導入する。この方法は3つのスケッチ行列を格納するためにサブ線形空間$o(n)$を使用し、正確な$K、V$ストレージの必要性を緩和する。特に,超長トークンを用いたメモリ効率の優れた性能を示す。トークン長の$n$が増加すると、メモリ使用量がほぼ一定である間、エラー保証は減少します。このユニークな属性は、ストリーミングアプリケーションにおけるLLMの効率的な処理における我々の技術の可能性を示している。 Attention computation takes both the time complexity of $O(n^2)$ and the space complexity of $O(n^2)$ simultaneously, which makes deploying Large Language Models (LLMs) in streaming applications that involve long contexts require substantial computational resources. In recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support a 128K-long document. In our paper, we focus on the memory-efficient issue when context length $n$ is much greater than 128K ($n \gg 2^d$). Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$. It accomplishes this by constructing $U_1, U_2 \in \mathbb{R}^{n \times t}$ to expedite attention ${\sf Attn}(Q, K, V)$ computation within $n^{1+o(1)}$ time executions. 
Despite this, computing the approximated attention matrix $U_1U_2^\top \in \mathbb{R}^{n \times n}$ still necessitates $O(n^2)$ space, leading to significant memory usage. In response to these challenges, we introduce a new algorithm that only reads one pass of the data in a streaming fashion. This method employs sublinear space $o(n)$ to store three sketch matrices, alleviating the need for exact $K, V$ storage. Notably, our algorithm exhibits exceptional memory-efficient performance with super-long tokens. As the token length $n$ increases, our error guarantee diminishes while the memory usage remains nearly constant. This unique attribute underscores the potential of our technique in efficiently handling LLMs in streaming applications.	翻訳日:2024-02-07 03:53:20 公開日:2024-02-05
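The memory issue described above comes from materializing $U_1U_2^\top$. As a hedged illustration of why a factored form helps, the sketch below computes the row-normalized product $D^{-1}(U_1U_2^\top)V$ by multiplying right to left, so only $t \times d$ and $t \times 1$ intermediates are formed. This is not the paper's one-pass sketching algorithm: it still stores $U_1$, $U_2$, and $V$ in full, and the function names and the assumption that every row sum is positive are illustrative.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]

def factored_attention(u1, u2, v):
    """Compute D^{-1} (U1 U2^T) V without forming the n x n matrix U1 U2^T."""
    u2t = transpose(u2)            # t x n
    kv = matmul(u2t, v)            # t x d summary of keys and values
    ones = [[1.0] for _ in v]      # n x 1 column of ones
    ksum = matmul(u2t, ones)       # t x 1, gives row sums after multiplying by U1
    numer = matmul(u1, kv)         # n x d unnormalized output
    denom = matmul(u1, ksum)       # n x 1 row sums of U1 U2^T (assumed positive)
    return [[x / d[0] for x in row] for row, d in zip(numer, denom)]

def dense_attention(u1, u2, v):
    """Reference version that materializes the n x n matrix, for comparison only."""
    a = matmul(u1, transpose(u2))
    return [[sum(w * v[j][c] for j, w in enumerate(row)) / sum(row)
             for c in range(len(v[0]))] for row in a]
```

Both functions return the same values up to rounding; the difference is that the factored version never holds more than $O(nt + td)$ numbers of intermediate state, whereas the dense reference holds $n^2$.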
# 期待効用仮説に基づく量子電池からの最適作業抽出 Optimal work extraction from quantum batteries based on the expected utility hypothesis ( http://arxiv.org/abs/2311.14489v2 ) ライセンス: Link先を確認	Gianluca Francica and Luca Dell'Anna	(参考訳) 量子有限系における仕事の抽出は、量子熱力学において重要な問題である。抽出された最適作業はエルゴトロピーと呼ばれ、全ユニタリサイクルで抽出された平均作業量を最大化することで達成される。しかし, リスクに中立でないエージェントは変動の影響を受け, 期待された効用仮説に従って作業を引き出す必要がある。そこで本研究では,全ユニタリサイクルの平均効用関数を最大化することにより,リスク非中立剤による最適作業抽出について検討する。我々は主にエネルギー基底に関して一貫性のない初期状態に注目し、仕事の確率分布を達成する。この場合、最適な作業抽出は、エージェントのリスク回避に依存するエネルギー基底の置換という、一貫性のないユニタリ変換によってどのように行われるかを示す。いくつかの例を挙げ、特に量子電池のアンサンブルからの作業抽出について検討する。さらに,作業の準確率分布を考慮したエネルギーベースにおける初期量子コヒーレンスの存在による作業抽出の影響についても検討した。 Work extraction in quantum finite systems is an important issue in quantum thermodynamics. The optimal work extracted is called ergotropy, and it is achieved by maximizing the average work extracted over all the unitary cycles. However, an agent that is non-neutral to risk is affected by fluctuations and should extract work by following the expected utility hypothesis. Thus, we investigate the optimal work extraction performed by a risk non-neutral agent by maximizing the average utility function over all the unitary cycles. We mainly focus on initial states that are incoherent with respect to the energy basis, achieving a probability distribution of work. In this case we show how the optimal work extraction will be performed with an incoherent unitary transformation, namely a permutation of the energy basis, which depends on the risk aversion of the agent. We give several examples, in particular also the work extraction from an ensemble of quantum batteries is examined. Furthermore, we also investigate how work extraction is affected by the presence of initial quantum coherence in the energy basis by considering a quasiprobability distribution of work.	翻訳日:2024-02-07 03:52:43 公開日:2024-02-05
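For an initial state that is diagonal in the energy basis, the abstract above states that the optimal protocol is a permutation of the energy basis that depends on the agent's risk aversion. A small brute-force sketch of that optimization follows. It assumes the work statistics are those of a two-point energy measurement, so that level $i$ with population $p_i$ mapped to level $\pi(i)$ yields work $E_i - E_{\pi(i)}$; the function name and the example utility functions are hypothetical.

```python
import math
from itertools import permutations

def best_permutation(energies, populations, utility):
    """Brute-force the permutation of energy levels that maximizes the average utility of work."""
    n = len(energies)
    best_value, best_perm = -math.inf, None
    for perm in permutations(range(n)):
        # Level i (probability p_i) is mapped to level perm[i]; extracted work is E_i - E_perm[i].
        value = sum(p * utility(energies[i] - energies[j])
                    for i, (p, j) in enumerate(zip(populations, perm)))
        if value > best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm

def risk_neutral(work):
    """Linear utility: maximizing it maximizes the average extracted work."""
    return work

def risk_averse(work, r=2.0):
    """Concave exponential utility with an assumed risk-aversion parameter r."""
    return 1.0 - math.exp(-r * work)
```

With the linear utility the search reduces to maximizing average work, so for populations that increase with energy it selects the permutation that moves the most populated level to the lowest energy. A concave utility penalizes negative work outcomes more strongly and can select a different permutation. The search is factorial in the number of levels, so it is only meant for very small examples.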
# 対話型ヒューマノイド:社会標準化と予測を用いたオンラインフルボディモーション反応合成 Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting ( http://arxiv.org/abs/2312.08983v3 ) ライセンス: Link先を確認	Yunze Liu, Changxi Chen, Li Yi	(参考訳) 対象物との人間-ヒューマノイド相互作用タスクを任意に重視する。そこで本研究では,ヒトアクターの動きに基づいてヒューマノイド反応を生成するオンラインフルボディモーション反応合成法を提案する。前回の研究は、物体のない人間の相互作用にのみ焦点をあて、手なしで身体反応を発生させる。また,このタスクをオンライン環境とはみなさないため,現実的な状況下での情報観測が不可能である。このタスクを支援するために,HHIとCoChairという2つのデータセットを構築し,統一的な手法を提案する。具体的には,社会的アプライアンス表現の構築を提案する。まず、ソーシャル・アプライアンス・キャリアを選択し、SE(3)-Equivariant Neural Networksを用いてローカル・フレームを学習し、ソーシャル・アプライアンス・キャリアを標準化する。また, 想定される未来に基づいて, 原子炉を予測できる社会的な余裕予測手法を提案する。実験により,HHIとCoChairの高次反応を効果的に生成できることが示された。さらに,既存の人間間相互作用データセット,Chi3Dについても検証を行った。 We focus on the human-humanoid interaction task optionally with an object. We propose a new task named online full-body motion reaction synthesis, which generates humanoid reactions based on the human actor's motions. The previous work only focuses on human interaction without objects and generates body reactions without hand. Besides, they also do not consider the task as an online setting, which means the inability to observe information beyond the current moment in practical situations. To support this task, we construct two datasets named HHI and CoChair and propose a unified method. Specifically, we propose to construct a social affordance representation. We first select a social affordance carrier and use SE(3)-Equivariant Neural Networks to learn the local frame for the carrier, then we canonicalize the social affordance. Besides, we propose a social affordance forecasting scheme to enable the reactor to predict based on the imagined future. Experiments demonstrate that our approach can effectively generate high-quality reactions on HHI and CoChair. Furthermore, we also validate our method on existing human interaction datasets Interhuman and Chi3D.	翻訳日:2024-02-07 03:44:30 公開日:2024-02-05
# oracleとaiの議論で大きなゲームをする Playing Large Games with Oracles and AI Debate ( http://arxiv.org/abs/2312.04792v2 ) ライセンス: Link先を確認	Xinyi Chen, Angelica Chen, Dean Foster, Elad Hazan	(参考訳) 非常に多くのアクションを伴う繰り返しゲームにおける後悔の最小化について検討する。このようなゲームは、議論によるAI安全性の設定に固有のものであり、より一般的には、アクションが言語に基づくゲームである。既存のオンラインゲームプレイのアルゴリズムは、アクションの数で計算多項式を必要とするが、これは大きなゲームでは禁じられる。私たちはoracleベースのアルゴリズムを、oracleが自然にaiエージェントへのアクセスをモデル化すると考えている。 oracle accessでは、内部および外部の後悔を最小限に抑えることができます。我々は,動作数に対数的に依存する後悔と計算の複雑さを最小化するための新しいアルゴリズムを提案する。これは大きなゲームにおける相関平衡の効率的なオラクルベースの計算を意味する。我々は、AI Safety via Debateの設定において、アルゴリズム分析からの洞察の恩恵を示す実験で締めくくります。 We consider regret minimization in repeated games with a very large number of actions. Such games are inherent in the setting of AI safety via debate, and more generally games whose actions are language-based. Existing algorithms for online game playing require computation polynomial in the number of actions, which can be prohibitive for large games. We thus consider oracle-based algorithms, as oracles naturally model access to AI agents. With oracle access, we characterize when internal and external regret can be minimized efficiently. We give a novel efficient algorithm for internal regret minimization whose regret and computation complexity depend logarithmically on the number of actions. This implies efficient oracle-based computation of a correlated equilibrium in large games. We conclude with experiments in the setting of AI Safety via Debate that shows the benefit of insights from our algorithmic analysis.	翻訳日:2024-02-07 03:43:08 公開日:2024-02-05
# GraphMETRO: 専門家の混在による複雑なグラフ分散シフトの緩和 GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts ( http://arxiv.org/abs/2312.04693v2 ) ライセンス: Link先を確認	Shirley Wu, Kaidi Cao, Bruno Ribeiro, James Zou, Jure Leskovec	(参考訳) グラフデータは本質的に複雑で不均一であり、分布シフトの自然な多様性をもたらす。しかし、現実世界で自然に発生する複雑な非合成分布シフトに一般化する機械学習アーキテクチャを構築する方法は不明である。ここでは、自然の多様性を確実にモデル化し、複雑な分散シフトをキャプチャするグラフニューラルネットワークアーキテクチャであるGraphMETROを開発する。 graphmetroでは,ゲーティングモデルと複数のエキスパートモデルを備えたmoe(mixed-of-experts)アーキテクチャを採用している。さらに、異なる専門家モデルからの表現をスムーズな最適化のために整列させる新しい目的を設計する。 GraphMETROは、複雑な実世界の分散シフトと、WebKBおよびTwitchデータセットの67%と4.2%の改善からなるGOODベンチマークの4つのデータセットの最先端結果を達成する。 Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to complex non-synthetic distributional shifts naturally occurring in the real world. Here we develop GraphMETRO, a Graph Neural Network architecture, that reliably models natural diversity and captures complex distributional shifts. GraphMETRO employs a Mixture-of-Experts (MoE) architecture with a gating model and multiple expert models, where each expert model targets a specific distributional shift to produce a shift-invariant representation, and the gating model identifies shift components. Additionally, we design a novel objective that aligns the representations from different expert models to ensure smooth optimization. GraphMETRO achieves state-of-the-art results on four datasets from GOOD benchmark comprised of complex and natural real-world distribution shifts, improving by 67% and 4.2% on WebKB and Twitch datasets.	翻訳日:2024-02-07 03:42:32 公開日:2024-02-05
# 混合量子/古典理論(MQCT)による複雑系における分子衝突のダイナミクス Mixed Quantum/Classical Theory (MQCT) Approach to the Dynamics of Molecule-Molecule Collisions in Complex Systems ( http://arxiv.org/abs/2312.02322v3 ) ライセンス: Link先を確認	Carolin Joy, Bikramaditya Mandal, Dulat Bostan, Marie-Lise Dubernet and Dmitri Babikov	(参考訳) 複雑な分子-分子衝突における衝突エネルギー移動とロ-振動エネルギー交換のダイナミクスを研究できる一般理論的アプローチとユーザ対応のコンピュータコードを開発した。この方法は古典力学と量子力学の混合である。衝突パートナーの内部振動運動は、状態量子化やゼロ点エネルギー、状態-状態遷移、量子対称性、干渉現象などの多くの量子現象を捉える時間依存シュロディンガー方程式を用いて量子力学的に扱われる。 ehrenfest平均場軌道アプローチを用いて、衝突パートナーの翻訳運動を古典的に記述することにより、重要な数値的な高速化が得られる。このフレームワーク内では、衝突力学の近似手法のファミリーが開発された。 H$_2$O や ND$_3$ とHe, H$_2$ や D$_2$ と衝突した二原子および三原子分子に関するいくつかのベンチマーク研究は、MQCT の結果が幅広いエネルギー、特に完全な量子結果とほぼ同一となる高衝突エネルギーのフル量子計算とよく一致していることを示している。この手法の数値的効率性とmqct符号の大規模並列性により、c$_6$h$_6$ + he, ch$_3$cooh + he, h$_2$o + h$_2$o などの最も複雑な衝突系を取り入れることができる。 MQCTのCH$_3$CHCH$_2$O + Heなどのキラル分子の衝突や分子表面衝突への応用も可能であり、将来追求される。 We developed a general theoretical approach and a user-ready computer code that permit to study the dynamics of collisional energy transfer and ro-vibrational energy exchange in complex molecule-molecule collisions. The method is a mixture of classical and quantum mechanics. The internal ro-vibrational motion of collision partners is treated quantum mechanically using time-dependent Schrodinger equation that captures many quantum phenomena including state quantization and zero-point energy, propensity and selection rules for state-to-state transitions, quantum symmetry and interference phenomena. A significant numerical speed up is obtained by describing the translational motion of collision partners classically, using the Ehrenfest mean-field trajectory approach. Within this framework a family of approximate methods for collision dynamics is developed. 
Several benchmark studies for diatomic and triatomic molecules, such as H$_2$O and ND$_3$ collided with He, H$_2$ and D$_2$, show that the results of MQCT are in good agreement with full-quantum calculations in a broad range of energies, especially at high collision energies where they become nearly identical to the full quantum results. Numerical efficiency of the method and massive parallelism of the MQCT code permit us to embrace some of the most complicated collisional systems ever studied, such as C$_6$H$_6$ + He, CH$_3$COOH + He and H$_2$O + H$_2$O. Application of MQCT to the collisions of chiral molecules such as CH$_3$CHCH$_2$O + He, and to the molecule-surface collisions is also possible and will be pursued in the future.	翻訳日:2024-02-07 03:41:11 公開日:2024-02-05
# 言語と政治の双方向適応によるオープンエンドエンボディエージェントの構築 Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation ( http://arxiv.org/abs/2401.00006v2 ) ライセンス: Link先を確認	Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao and Yu Liu	(参考訳) オープンエンド学習エージェントの構築には、事前学習言語モデル(LLM)と強化学習(RL)アプローチの課題が含まれる。 LLMはコンテキスト固有のリアルタイムインタラクションに苦しむ一方、RL法は探索の効率性の問題に直面している。そこで我々は,LLMとGRLと連携して,任意の指示を解釈できるオープンエンドエージェントを構築するための協調学習フレームワークOpenContraを提案する。この実装は、(1)人間の指示を構造化された目標に翻訳するLLMを微調整し、(2)任意の目標を達成するために目標条件付きRLポリシーを訓練し、(2)LLMとRLポリシーを互いに適応させ、指示空間にオープンディペンデンスを達成させる協調訓練を含む。複雑で広大な目標空間を持つバトルロイヤルFPSゲームであるContraで実験を行う。その結果、OpenContraで訓練されたエージェントは、任意の人間の指示を理解し、高い完成率で目標を達成していることが示され、OpenContraがオープンなエンボディエージェントを構築するための最初の実用的なソリューションである可能性が証明された。 Building open-ended learning agents involves challenges in pre-trained language model (LLM) and reinforcement learning (RL) approaches. LLMs struggle with context-specific real-time interactions, while RL methods face efficiency issues for exploration. To this end, we propose OpenContra, a co-training framework that cooperates LLMs and GRL to construct an open-ended agent capable of comprehending arbitrary human instructions. The implementation comprises two stages: (1) fine-tuning an LLM to translate human instructions into structured goals, and curriculum training a goal-conditioned RL policy to execute arbitrary goals; (2) collaborative training to make the LLM and RL policy learn to adapt each, achieving open-endedness on instruction space. We conduct experiments on Contra, a battle royale FPS game with a complex and vast goal space. The results show that an agent trained with OpenContra comprehends arbitrary human instructions and completes goals with a high completion ratio, which proves that OpenContra may be the first practical solution for constructing open-ended embodied agents.	翻訳日:2024-02-07 03:33:06 公開日:2024-02-05
# 印刷多層パーセプトロンの増積と活性化のベスポーク近似 Bespoke Approximation of Multiplication-Accumulation and Activation Targeting Printed Multilayer Perceptrons ( http://arxiv.org/abs/2312.17612v2 ) ライセンス: Link先を確認	Florentia Afentaki, Gurol Saglam, Argyris Kokkinis, Kostas Siozios, Georgios Zervakis, Mehdi B Tahoori	(参考訳) Printed Electronics (PE) は、真のユビキタスコンピューティングを実現するための顕著な技術である、際立った特徴と特徴を特徴とする。これは、これまでコンピューティングの浸透が限られていた整合性および超低コストのソリューションを必要とするアプリケーションドメインに特に関係している。シリコンベースの技術とは異なり、peは非繰り返しのエンジニアリングコスト、超低製造コスト、コンフォーサル、フレキシブル、非毒性、伸縮可能なハードウェアのオンデマンド製造などの非並列的な機能を提供する。しかし、PEはその大きな特徴サイズのために一定の制限に直面しており、機械学習分類器のような複雑な回路の実現を妨げる。本研究では,近似計算の原理と(完全にカスタマイズされた)設計の原理を活用し,これらの制約に対処する。超低出力多層パーセプトロン(mlp)分類器の設計のための自動化フレームワークを提案する。これは初めて、mlpのニューロンの全ての機能を近似する包括的アプローチである、乗算、蓄積、活性化を用いる。各種のMLPを網羅的に評価することにより,最も複雑なMLPアーキテクチャであっても,バッテリ駆動による操作が可能であり,技術の現状を大きく上回っていることを示す。 Printed Electronics (PE) feature distinct and remarkable characteristics that make them a prominent technology for achieving true ubiquitous computing. This is particularly relevant in application domains that require conformal and ultra-low cost solutions, which have experienced limited penetration of computing until now. Unlike silicon-based technologies, PE offer unparalleled features such as non-recurring engineering costs, ultra-low manufacturing cost, and on-demand fabrication of conformal, flexible, non-toxic, and stretchable hardware. However, PE face certain limitations due to their large feature sizes, that impede the realization of complex circuits, such as machine learning classifiers. In this work, we address these limitations by leveraging the principles of Approximate Computing and Bespoke (fully-customized) design. We propose an automated framework for designing ultra-low power Multilayer Perceptron (MLP) classifiers which employs, for the first time, a holistic approach to approximate all functions of the MLP's neurons: multiplication, accumulation, and activation. 
Through comprehensive evaluation across various MLPs of varying size, our framework demonstrates the ability to enable battery-powered operation of even the most intricate MLP architecture examined, significantly surpassing the current state of the art.	翻訳日:2024-02-07 03:32:34 公開日:2024-02-05
# 不均衡線形分類のためのパーセプトロン(SIGTRON)を用いた拡張非対称シグモイド An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification ( http://arxiv.org/abs/2312.16043v2 ) ライセンス: Link先を確認	Hyenkyun Woo	(参考訳) 本稿では, パーセプトロンを持つ拡張非対称なsigtronモデルであるsigtronと呼ばれる新しい多項式パラメータ付きsigtronモデルと, 仮想sigtronによる凸損失関数を用いたsigtron-imbalanced classification(sic)モデルを提案する。従来の$\pi$-weighted cost-sensitive learning modelとは対照的に、SICモデルは損失関数に外部の$\pi$-weightを持たず、仮想SIGTRON誘導損失関数の内部パラメータを持つ。その結果、与えられたトレーニングデータセットがバランスの取れた状態に近い場合、提案したSICモデルは、トレーニングデータセットとテストデータセットのスケールクラス不均衡比の不整合など、データセットのバリエーションに適応することが示される。この適応は歪曲超平面方程式を作成することによって達成される。さらに,区間ベースの双断面探索法を開発し,仮想凸損失に対する準ニュートン最適化(l-bfgs)フレームワークを提案する。実験結果から,提案手法は,テスト分類精度が$51$ two-class と $67$ multi-class dataset の点で,$\pi$-weighted convex focal loss と balanced classifier liblinear (logistic regression, svm, l2svm) よりも優れていることがわかった。トレーニングデータセットのスケールクラス不均衡比が重要でないバイナリ分類問題では、各データセットに最適なテスト精度を持つSICモデル群(TOP$1$)が、よく知られたカーネルベースの分類器であるLIBSVM(C-SVC with RBF kernel)より優れている。 This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model called SIGTRON-imbalanced classification (SIC) model that employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model does not have an external $\pi$-weight on the loss function but has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition, we show that the proposed SIC model is more adaptive to variations of the dataset, such as the inconsistency of the scale-class-imbalance ratio between the training and test datasets. This adaptation is achieved by creating a skewed hyperplane equation. Additionally, we present a quasi-Newton optimization (L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. 
Empirically, we have observed that the proposed approach outperforms $\pi$-weighted convex focal loss and balanced classifier LIBLINEAR(logistic regression, SVM, and L2SVM) in terms of test classification accuracy with $51$ two-class and $67$ multi-class datasets. In binary classification problems, where the scale-class-imbalance ratio of the training dataset is not significant but the inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM(C-SVC with RBF kernel), a well-known kernel-based classifier.	翻訳日:2024-02-07 03:31:23 公開日:2024-02-05
# スイッチバック実験の高速化 Faster Rates for Switchback Experiments ( http://arxiv.org/abs/2312.15574v2 ) ライセンス: Link先を確認	Su Jia, Nathan Kallus, Christina Lee Yu	(参考訳) スイッチバック実験設計では、1つのユニット(例えば、システム全体)が1つのランダムな時間ブロックの処理に晒され、クロスユニットと時間的干渉の両方に取り組む。 Hu and Wager (2022) はブロックの開始点を縮める処理効果推定器を提案し、高速な混合を伴うマルコフ条件下でのグローバル平均処理効果(GATE)を推定するための$T^{-1/3}$レートを確立した。彼らはこのレートが最適であり、より速いレートを楽しむために、異なる(そして設計に依存した)見積もりに焦点を当てることを提案している。同じ設計の場合、ブロック全体を用いた代替推定器を提案し、同じ仮定の下で、元の設計に依存しないGATE推定器に対して、実際に$\sqrt{\log T/T}$の推定率を達成できることを示した。 Switchback experimental design, wherein a single unit (e.g., a whole system) is exposed to a single random treatment for interspersed blocks of time, tackles both cross-unit and temporal interference. Hu and Wager (2022) recently proposed a treatment-effect estimator that truncates the beginnings of blocks and established a $T^{-1/3}$ rate for estimating the global average treatment effect (GATE) in a Markov setting with rapid mixing. They claim this rate is optimal and suggest focusing instead on a different (and design-dependent) estimand so as to enjoy a faster rate. For the same design we propose an alternative estimator that uses the whole block and surprisingly show that it in fact achieves an estimation rate of $\sqrt{\log T/T}$ for the original design-independent GATE estimand under the same assumptions.	翻訳日:2024-02-07 03:30:26 公開日:2024-02-05
# EraseDiff:拡散モデルにおけるデータ影響の消去 EraseDiff: Erasing Data Influence in Diffusion Models ( http://arxiv.org/abs/2401.05779v2 ) ライセンス: Link先を確認	Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi	(参考訳) 本研究では拡散モデルのための非学習アルゴリズムを提案する。本アルゴリズムは,データ記憶に関する懸念を緩和する機構を備えた拡散モデルである。これを実現するために、未学習問題を制約最適化問題として定式化し、残余のデータ上で拡散モデルの実用性を保ち、学習可能な生成過程を地道記述法から逸脱させることで、データ忘れに関する情報を精査する。そこで本研究では, 拡散過程を警戒しながら, 実用性に優れた一階法を採用する。実験により,本アルゴリズムは広く使用されている拡散モデルと条件付きおよび無条件画像生成シナリオの両方において,モデルの有効性,有効性,効率を保てることを実証する。 In this work, we introduce an unlearning algorithm for diffusion models. Our algorithm equips a diffusion model with a mechanism to mitigate the concerns related to data memorization. To achieve this, we formulate the unlearning problem as a constraint optimization problem, aiming to preserve the utility of the diffusion model on the remaining data and scrub the information associated with forgetting data by deviating the learnable generative process from the ground-truth denoising procedure. To solve the resulting problem, we adopt a first-order method, having superior practical performance while being vigilant about the diffusion process. Empirically, we demonstrate that our algorithm can preserve the model utility, effectiveness, and efficiency while removing across the widely-used diffusion models and in both conditional and unconditional image generation scenarios.	翻訳日:2024-02-07 03:19:43 公開日:2024-02-05
# LKCA: 大きなカーネルの進化的注意 LKCA: Large Kernel Convolutional Attention ( http://arxiv.org/abs/2401.05738v2 ) ライセンス: Link先を確認	Chenghao Li, Boheng Zeng, Yi Lu, Pengbo Shi, Qingzi Chen, Jirui Liu, Lingyun Zhu	(参考訳) 視覚変換器における注意機構と大カーネルConvNetの関係を再検討し,LKCA(Large Kernel Convolutional Attention)という空間的注意を提案する。単一の大きなカーネル畳み込みに置き換えることで、注意操作を単純化する。 LKCAは畳み込みニューラルネットワークとビジュアルトランスフォーマーの利点を組み合わせて、大きな受容野、局所性、パラメータ共有を持つ。我々は、畳み込みと注意の両方の観点からlkcaの優位性を説明し、各ビューに同等のコード実装を提供した。コンボリューションとアテンションの両方の観点から実装されたLKCAは同等の性能を示した。分類タスクとセグメンテーションタスクの両方において, LKCA の ViT 変異体を広範囲に実験した。実験により,LKCAは視覚タスクにおいて競争性能を示すことが示された。私たちのコードはhttps://github.com/CatworldLee/LKCAで公開されます。 We revisit the relationship between attention mechanisms and large kernel ConvNets in visual transformers and propose a new spatial attention named Large Kernel Convolutional Attention (LKCA). It simplifies the attention operation by replacing it with a single large kernel convolution. LKCA combines the advantages of convolutional neural networks and visual transformers, possessing a large receptive field, locality, and parameter sharing. We explained the superiority of LKCA from both convolution and attention perspectives, providing equivalent code implementations for each view. Experiments confirm that LKCA implemented from both the convolutional and attention perspectives exhibit equivalent performance. We extensively experimented with the LKCA variant of ViT in both classification and segmentation tasks. The experiments demonstrated that LKCA exhibits competitive performance in visual tasks. Our code will be made publicly available at https://github.com/CatworldLee/LKCA.	翻訳日:2024-02-07 03:19:28 公開日:2024-02-05
# 超伝導回路における非断熱幾何fsimゲートの一段階実装 One-step implementation of nonadiabatic geometric fSim gate in superconducting circuits ( http://arxiv.org/abs/2401.02234v2 ) ライセンス: Link先を確認	M.-R. Yun, Zheng Shan, L.-L. Yan, Yu Jia, S.-L. Su, and G. Chen	(参考訳) fsimゲートはアルゴリズムの奥行きを下げる重要な用途のために多くの注目を集め、一方fsimゲートのワンステップ実装は未解決の問題である。本稿では,3つの最低エネルギーレベルに基づく可変超伝導回路における非断熱非循環幾何fsimゲートの一段階実装を提案する。従来の単ループ非断熱幾何fsimゲートと比較して,提案手法は半時間しかかからず,パラメータ変動に対する強固性,環境影響に対する強固性を示す。さらに,可変カプラを追加することで,fsimゲートを実装するための間接的なスキームを提案する。このスキームは量子計算とシミュレーションへの有望な道を提供するかもしれない。 Due to its significant application in reducing algorithm depth, fSim gates have attracted a lot of attention, while one-step implementation of fSim gates remains an unresolved issue. In this manuscript, we propose a one-step implementation of the nonadiabatic noncyclic geometric fSim gate in a tunable superconducting circuit based on the three lowest energy levels. Compared to conventional single-loop nonadiabatic geometric fSim gate, our scheme only takes half the time and demonstrates stronger robustness to parameter fluctuations, as well as stronger robustness to environmental impacts. Moreover, we also proposed an indirect scheme to implement the fSim gate by adding a tunable coupler. This scheme may provide a promising path toward quantum computation and simulation.	翻訳日:2024-02-07 03:16:34 公開日:2024-02-05
# Lumiere:ビデオ生成のための時空間拡散モデル Lumiere: A Space-Time Diffusion Model for Video Generation ( http://arxiv.org/abs/2401.12945v2 ) ライセンス: Link先を確認	Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri	(参考訳) Lumiereは、ビデオ合成において重要な課題である、リアルで多様なコヒーレントな動きを表現するビデオの合成用に設計された、テキストからビデオへの拡散モデルである。この目的のために、我々は、一度に1回のパスでビデオの全時間を生成するSpace-Time U-Netアーキテクチャを導入する。これは、遠方のキーフレームを合成した既存のビデオモデルと対照的に、時間的超解像は本質的にグローバルな時間的一貫性を達成しにくくするアプローチである。空間的および(重要な)時間的ダウンサンプリングとアップサンプリングの両方をデプロイし、事前訓練されたテキスト・ツー・イメージ拡散モデルを活用することにより、複数の時空間スケールで処理することで、フルフレームレートの低解像度映像を直接生成することを学ぶ。我々は最先端のテキスト対ビデオ生成結果を示し,画像から映像への変換,スタイリッシュな生成など,幅広いコンテンツ作成タスクやビデオ編集アプリケーションを容易に行うことができることを示す。 We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.	翻訳日:2024-02-07 03:08:07 公開日:2024-02-05
# DCRMTA:マルチタッチ属性のための曖昧な因果表現 DCRMTA: Unbiased Causal Representation for Multi-touch Attribution ( http://arxiv.org/abs/2401.08875v2 ) ライセンス: Link先を確認	Jiaming Tang	(参考訳) MTA(Multi-touch Attribution)は、現在、各広告タッチポイントの変換行動に対する貢献を公平に評価し、予算配分や広告推薦に深く影響している。以前の研究は、変換モデルの偏りのない仮定を達成するために、ユーザの好みに起因するバイアスを取り除こうとした。マルチモデル協調方式は効率的ではなく、ユーザインフルエンスを完全に排除することで、変換に対するユーザ特徴の因果効果も排除され、変換モデルの性能が制限される。本稿では,コンバージョンにおけるユーザ特徴の因果効果を再定義し,mta (deep causal representation for mta) を提案する。本モデルは,共起変数を排除しつつ,変換とユーザ間の因果関係を抽出することに焦点を当てる。さらに、DCRMTAが様々なデータ分布にまたがって予測を変換する際の優れた性能を示すとともに、異なる広告チャネル間で効果的に価値をもたらすことを示す。 Multi-touch attribution (MTA) currently plays a pivotal role in achieving a fair estimation of the contributions of each advertising touchpoint towards conversion behavior, deeply influencing budget allocation and advertising recommendation. Previous works attempted to eliminate the bias caused by user preferences to achieve the unbiased assumption of the conversion model. The multi-model collaboration method is not efficient, and the complete elimination of user influence also eliminates the causal effect of user features on conversion, resulting in limited performance of the conversion model. This paper redefines the causal effect of user features on conversions and proposes a novel end-to-end approach, Deep Causal Representation for MTA (DCRMTA). Our model focuses on extracting causal features between conversions and users while eliminating confounding variables. Furthermore, extensive experiments demonstrate DCRMTA's superior performance in converting prediction across varying data distributions, while also effectively attributing value across different advertising channels.	翻訳日:2024-02-07 03:05:56 公開日:2024-02-05
# 人工包摂の錯覚 The illusion of artificial inclusion ( http://arxiv.org/abs/2401.08572v3 ) ライセンス: Link先を確認	William Agnew, A. Stevie Bergman, Jennifer Chien, Mark Díaz, Seliem El-Sayed, Jaylen Pittman, Shakir Mohamed, Kevin R. McKee	(参考訳) 人間の参加者は、現代の人工知能(AI)技術の発展、心理学、ユーザー研究において中心的な役割を果たす。生成AIの最近の進歩は、これらの領域における人間の参加者をAIサロゲートに置き換える可能性への関心が高まっている。このような「代替提案」を調査し、近代的な生成AIによる人間の置換者に対する議論をより深く理解する。調査・開発作業のコスト削減や収集データの多様性向上といった目標を掲げて,これらの提案の近年の波が示唆されている。しかし、これらの提案は、表現、包含、理解という、人間と作業の基本的な価値を無視して、最終的に衝突する。本稿では,人間参加の根底にある原則と目標を批判的に検討し,真に参加者を集中し,力づける将来の仕事の道筋を図解する。 Human participants play a central role in the development of modern artificial intelligence (AI) technology, in psychological science, and in user research. Recent advances in generative AI have attracted growing interest to the possibility of replacing human participants in these domains with AI surrogates. We survey several such "substitution proposals" to better understand the arguments for and against substituting human participants with modern generative AI. Our scoping review indicates that the recent wave of these proposals is motivated by goals such as reducing the costs of research and development work and increasing the diversity of collected data. However, these proposals ignore and ultimately conflict with foundational values of work with human participants: representation, inclusion, and understanding. This paper critically examines the principles and goals underlying human participation to help chart out paths for future work that truly centers and empowers participants.	翻訳日:2024-02-07 03:05:20 公開日:2024-02-05
# 自発的崩壊モデルによって宇宙の古典性が出現する Spontaneous collapse models lead to the emergence of classicality of the Universe ( http://arxiv.org/abs/2401.08269v2 ) ライセンス: Link先を確認	José Luis Gaona-Reyes, Lucía Menéndez-Pidal, Mir Faizal, Matteo Carlesso	(参考訳) 量子力学が普遍的であり、すべてのスケールで適用できると仮定すると、宇宙は状態の量子重ね合わせであり、それぞれの状態が異なる時空幾何学に対応することができる。どのようにして私たちが観察する古典的なよく定義された幾何学の出現を説明できるのか? デコヒーレンス駆動の量子-古典遷移は外部の物理的実体に依存しているため、この過程は宇宙の古典的な振る舞いの出現を考慮できない。ここでは、波動関数の自然崩壊モデルが、そのような出現を説明するための実行可能なメカニズムを提供する方法を示す。これを重力と完全流体の単純な一般相対論の力学モデルに適用する。異なる幾何学の一般的な量子重ね合わせから始めると、崩壊ダイナミクスは単一の幾何学へとつながり、宇宙の量子から古典への遷移のメカニズムを提供する。同様に、我々の力学を物理的に等価なパラメトリスの一モジュラー重力モデルに適用すると、宇宙定数に基づいて崩壊し、最終的に1つの正確な値が選択され、宇宙定数問題に対する実行可能な説明を与える。我々の形式論は、よく定義されたクロック変数を選択できる他の量子宇宙論モデルにも容易に適用できる。 Assuming that Quantum Mechanics is universal and that it can be applied over all scales, then the Universe is allowed to be in a quantum superposition of states, where each of them can correspond to a different space-time geometry. How can one then describe the emergence of the classical, well-defined geometry that we observe? Considering that the decoherence-driven quantum-to-classical transition relies on external physical entities, this process cannot account for the emergence of the classical behaviour of the Universe. Here, we show how models of spontaneous collapse of the wavefunction can offer a viable mechanism for explaining such an emergence. We apply it to a simple General Relativity dynamical model for gravity and a perfect fluid. We show that, by starting from a general quantum superposition of different geometries, the collapse dynamics leads to a single geometry, thus providing a possible mechanism for the quantum-to-classical transition of the Universe. 
Similarly, when applying our dynamics to the physically-equivalent Parametrised Unimodular gravity model, we obtain a collapse on the basis of the cosmological constant, where eventually one precise value is selected, thus providing also a viable explanation for the cosmological constant problem. Our formalism can be easily applied to other quantum cosmological models where we can choose a well-defined clock variable.	翻訳日:2024-02-07 03:05:04 公開日:2024-02-05
# AI-as-exploration: インテリジェンス空間をナビゲートする AI-as-exploration: Navigating intelligence space ( http://arxiv.org/abs/2401.07964v2 ) ライセンス: Link先を確認	Dimitri Coelho Mollo	(参考訳) 人工知能は多くの人生を生きる分野であり、この用語は科学と商業の取り組みのモットリーのコレクションを含んでいる。本稿では,AIが果たさなければならない,無視されるが中心的な科学的な役割の輪郭について述べる。 ai-as-explorationの基本的な推進力は、私たちが慣れ親しんだ人間や動物の知性と異なる可能性のある知性の構成要素の候補を明らかにするシステムの作成と研究である。言い換えれば、AIは、インテリジェンス空間、すなわち可能なインテリジェントシステムの空間を探索する上で、私たちが持っている最高のツールの1つであることを提案します。特定のケーススタディ、すなわち、人間と大規模言語モデルにおける新しい概念と発明された概念を組み合わせる能力に関する最近の研究に焦点を当てて、AI-as-explorationの価値を説明する。後者は、そのようなタスクにおいて人間のレベルでの正確さを示しているにもかかわらず、おそらくは人間にとっての仮説とは根本的に異なる方法でそれを解決している。 Artificial Intelligence is a field that lives many lives, and the term has come to encompass a motley collection of scientific and commercial endeavours. In this paper, I articulate the contours of a rather neglected but central scientific role that AI has to play, which I dub `AI-as-exploration'. The basic thrust of AI-as-exploration is that of creating and studying systems that can reveal candidate building blocks of intelligence that may differ from the forms of human and animal intelligence we are familiar with. In other words, I suggest that AI is one of the best tools we have for exploring intelligence space, namely the space of possible intelligent systems. I illustrate the value of AI-as-exploration by focusing on a specific case study, i.e., recent work on the capacity to combine novel and invented concepts in humans and Large Language Models. I show that the latter, despite showing human-level accuracy in such a task, most probably solve it in ways radically different, but no less relevant to intelligence research, to those hypothesised for humans.	翻訳日:2024-02-07 03:04:19 公開日:2024-02-05
# フーリエベース再パラメータ化トレーニングによる暗黙的神経表現の改善 Improved Implicit Neural Representation with Fourier Bases Reparameterized Training ( http://arxiv.org/abs/2401.07402v3 ) ライセンス: Link先を確認	Kexuan Shi and Xingyu Zhou and Shuhang Gu	(参考訳) Implicit Neural Representation (INR)は、近年様々なコンピュータビジョンタスクにおいて、強力な表現パラダイムとして成功している。バニラ多層パーセプトロン(MLP)の低周波バイアス問題により、位置符号化や周期的アクティベーション関数といった高度な技術を用いてINRの精度を向上させる方法が研究されている。本稿では,ネットワークトレーニングバイアスと再パラメータ化手法を結合し,重み付け再パラメータ化がMLPのスペクトルバイアスを軽減することができることを理論的に証明する。理論解析に基づき,固定されたフーリエ基底の係数行列を学習し,MLPの重みを構成するフーリエ再パラメータ化法を提案する。本稿では,バニラ型MLP,位置符号化型MLP,高度なアクティベーション機能付きMLPなど,様々なMLPアーキテクチャを用いたINRタスクに対するフーリエ再パラメータ化手法の評価を行った。異なるMLPアーキテクチャ上での優越性近似は,提案手法の利点を明らかに証明する。フーリエのパラメータ化手法によって、より多くのテクスチャと少ないアーティファクトを持つより優れたINRをトレーニングデータから学べる。 Implicit Neural Representation (INR) as a mighty representation paradigm has achieved success in various computer vision tasks recently. Due to the low-frequency bias issue of vanilla multi-layer perceptron (MLP), existing methods have investigated advanced techniques, such as positional encoding and periodic activation function, to improve the accuracy of INR. In this paper, we connect the network training bias with the reparameterization technique and theoretically prove that weight reparameterization could provide us a chance to alleviate the spectral bias of MLP. Based on our theoretical analysis, we propose a Fourier reparameterization method which learns coefficient matrix of fixed Fourier bases to compose the weights of MLP. We evaluate the proposed Fourier reparameterization method on different INR tasks with various MLP architectures, including vanilla MLP, MLP with positional encoding and MLP with advanced activation function, etc. The superior approximation results on different MLP architectures clearly validate the advantage of our proposed method. Armed with our Fourier reparameterization method, better INR with more textures and fewer artifacts can be learned from the training data.	翻訳日:2024-02-07 03:03:58 公開日:2024-02-05
# 臨床テキストにおけるエンティティ修飾子の予測のための伝達学習:オピオイド使用障害検出への応用 Transfer Learning for the Prediction of Entity Modifiers in Clinical Text: Application to Opioid Use Disorder Case Detection ( http://arxiv.org/abs/2401.15222v2 ) ライセンス: Link先を確認	Abdullateef I. Almudaifer, Whitney Covington, JaMor Hairston, Zachary Deitch, Ankit Anand, Caleb M. Carroll, Estera Crisan, William Bradford, Lauren Walter, Eaton Ellen, Sue S. Feldman and John D. Osborne	(参考訳) 背景: 臨床テキストから抽出されたエンティティのセマンティクスは, 実体否定, 不確実性, 条件性, 深刻度, 主観などの修飾によって劇的に変化する。臨床実体の修飾者を決定する既存のモデルは、各修飾者のために独立に訓練された正規表現または特徴重みを含む。方法:SemEval 2015 Task 14コーパスと,SemEvalと共有する修飾子とOUD特有の新規修飾子を含む新しいOpioid Use Disorder (OUD)データセットを用いて,修飾子を学習・予測するマルチタスクトランスフォーマーアーキテクチャの設計を開発し,評価する。本研究は, 複数タスク学習手法の有効性を, 既に公表されているシステムに対して評価し, 臨床組織修飾体の一部を共有する場合に, 移行学習の有効性を評価する。結果:SemEval 2015 Task 14のShAReコーパスでは,重み付け精度が1.1%,非重み付け精度が1.7%,マイクロF1スコアが10%向上した。結論: 共有モデルから学習した重みを部分的に一致した新しいデータセットに効果的に変換できることを示し, 臨床用テキスト修正器における転写学習の有用性を検証した。 Background: The semantics of entities extracted from a clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining modifiers of clinical entities involve regular expression or features weights that are trained independently for each modifier. Methods: We develop and evaluate a multi-task transformer architecture design where modifiers are learned and predicted jointly using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific for OUD. We evaluate the effectiveness of our multi-task learning approach versus previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of clinical modifiers are shared. 
Results: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores. Conclusions: We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers	翻訳日:2024-02-07 02:56:25 公開日:2024-02-05
# 空気中の光による光の偏向の干渉計測 Interferometric measurement of the deflection of light by light in air ( http://arxiv.org/abs/2401.13506v2 ) ライセンス: Link先を確認	Adrien E. Kraych, Aurélie Max Mailliet, François Couchot, Xavier Sarazin, Elsa Baynard, Julien Demailly, Moana Pittman, Arache Djannati-Ataï, Sophie Kazamias, Scott Robertson and Marcel Urban	(参考訳) DeLLight (Deflection of Light by Light) 実験の目的は、高強度集光レーザパルス(ポンプ)によって誘起される有効真空屈折率勾配を横切る際の、低強度集光レーザパルス(プローブ)の屈折を測定することで、量子電磁力学によって予測される真空中の光学非線形性を初めて観測することである。偏向信号はSagnac干渉計を用いて増幅される。本稿では,低強度ポンプを用い,DeLLightパイロット干渉計で行った,空気中における光による光の偏向の最初の測定を報告する。干渉計によって測定された偏向信号が増幅され、空気中の光カー効果によって誘起される期待信号と一致することを示す。さらに, ポンプ強度, ポンプとプローブ間の時間遅延, 相対偏光の関数として信号が期待通りに変化することを確認した。これらの結果は、干渉計測増幅に基づくDeLLight実験法の概念実証である。 The aim of the DeLLight (Deflection of Light by Light) experiment is to observe for the first time the optical nonlinearity in vacuum, as predicted by Quantum Electrodynamics, by measuring the refraction of a low-intensity focused laser pulse (probe) when crossing the effective vacuum index gradient induced by a high-intensity focused laser pulse (pump). The deflection signal is amplified by using a Sagnac interferometer. Here, we report the first measurement performed with the DeLLight pilot interferometer, of the deflection of light by light in air, with a low-intensity pump. We show that the deflection signal measured by the interferometer is amplified, and is in agreement with the expected signal induced by the optical Kerr effect in air. Moreover, we verify that the signal varies as expected as a function of the pump intensity, the temporal delay between the pump and the probe, and their relative polarisation. These results represent a proof of concept of the DeLLight experimental method based on interferometric amplification.	翻訳日:2024-02-07 02:55:19 公開日:2024-02-05
# 分散固定設計量子チップと量子チャネルを用いたフェデレーション学習 Federated learning with distributed fixed design quantum chips and quantum channels ( http://arxiv.org/abs/2401.13421v2 ) ライセンス: Link先を確認	Ammar Daskin	(参考訳) 古典的な連合学習におけるプライバシは、クライアントへのエンジニアリングクエリとともに、ローカル勾配結果を使用することによって破られる。しかし、チャネル上の測定によって情報の損失が生じ、送信者が検出できるため、量子通信チャネルはより安全であると考えられる。したがって、フェデレーション学習の量子バージョンは、より多くのプライバシーを提供するために使用できる。さらに、量子チャネルを通して$N$の次元データベクトルを送信するには、$\log N$ entangled qubitsを送信する必要がある。本稿では,集中型サーバが送信する量子状態に基づいて,固定設計量子チップを動作させる量子フェデレーション学習モデルを提案する。来るべき重ね合わせ状態に基づいて、クライアントは計算し、そのローカル勾配を量子状態としてサーバに送信し、パラメータを更新するために集約される。サーバはモデルパラメータを送信せず、代わりに演算子を量子状態として送信するため、クライアントはモデルを共有する必要はない。これにより、非同期学習モデルの作成が可能になる。さらに、量子状態としてのモデルは直接クライアント側のチップに供給されるため、勾配を計算するためにモデルパラメータを取得するために次の量子状態の測定を必要としない。これにより、パラメータベクトルが古典的あるいは量子的チャネルを介して送信され、これらのパラメータの得られた値によって局所勾配が得られるモデルよりも効率が良い。 The privacy in classical federated learning can be breached through the use of local gradient results along with engineered queries to the clients. However, quantum communication channels are considered more secure because a measurement on the channel causes a loss of information, which can be detected by the sender. Therefore, the quantum version of federated learning can be used to provide more privacy. Additionally, sending an $N$ dimensional data vector through a quantum channel requires sending $\log N$ entangled qubits, which can potentially provide exponential efficiency if the data vector is utilized as quantum states. In this paper, we propose a quantum federated learning model where fixed design quantum chips are operated based on the quantum states sent by a centralized server. Based on the coming superposition states, the clients compute and then send their local gradients as quantum states to the server, where they are aggregated to update parameters. Since the server does not send model parameters, but instead sends the operator as a quantum state, the clients are not required to share the model. 
This allows for the creation of asynchronous learning models. In addition, the model as a quantum state is fed into client-side chips directly; therefore, it does not require measurements on the upcoming quantum state to obtain model parameters in order to compute gradients. This can provide efficiency over the models where the parameter vector is sent via classical or quantum channels and local gradients are obtained through the obtained values of these parameters.	翻訳日:2024-02-07 02:55:02 公開日:2024-02-05
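The claim above that an $N$-dimensional data vector needs only $\log N$ qubits can be made concrete with a small sketch of amplitude encoding, done here purely as classical bookkeeping. The abstract does not say how the vector is encoded, so zero-padding to the next power of two and the helper names `qubits_needed` and `amplitude_encode` are assumptions for illustration only.

```python
import math

def qubits_needed(n):
    """Smallest q with 2**q >= n (at least 1), i.e. ceil(log2 n) for n >= 2."""
    if n < 1:
        raise ValueError("dimension must be positive")
    return max(1, (n - 1).bit_length())

def amplitude_encode(vector):
    """Return unit-norm amplitudes, zero-padded to a power-of-two length.

    This is only the classical description of the state; preparing it on
    hardware is a separate problem not covered here.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    size = 2 ** qubits_needed(len(vector))
    padded = list(vector) + [0.0] * (size - len(vector))
    return [x / norm for x in padded]

# Usage: a 1024-dimensional gradient fits in the amplitudes of 10 qubits.
n_qubits = qubits_needed(1024)
amplitudes = amplitude_encode([3.0, 4.0])
```

The exponential saving applies to the number of qubits sent. Whether it carries over to end-to-end efficiency depends on the vector being used as a quantum state, as the abstract itself notes.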
# SLANG: 大規模言語モデルの新たな概念理解 SLANG: New Concept Comprehension of Large Language Models ( http://arxiv.org/abs/2401.12585v3 ) ライセンス: Link先を確認	Lingrui Mei, Shenghua Liu, Yiwei Wang, Baolong Bi, Xueqi Cheng	(参考訳) 言語の動的な性質は、特にインターネット上のスラングやミームの領域において顕著であり、大規模言語モデル(LLM)の適応性に深刻な課題をもたらす。伝統的に静的データセットに固定されているこれらのモデルは、オンラインコミュニティに特徴的な急速な言語進化に追従するのにしばしば苦労する。本研究の目的は,継続的な再学習の高いコストをかけることなく,インターネット上で進化する新たな概念に対するLLMの理解を高めることで,このギャップを埋めることである。この目標のために、新しいデータを自動で統合してデータセットを最新に保ち、新興概念の理解におけるLLMの能力を評価するベンチマーク$\textbf{SLANG}$と、因果推論を用いてLLMを強化し、新しいフレーズとその口語的文脈を理解できるようにするアプローチ$\textbf{FOCUS}$を提案する。我々のベンチマークとアプローチは、言語変化の実例を取り込み、それを文脈上の手がかりとして、新たに現れた表現とその意味の間に、より正確で文脈に即した関連を形成する。実験分析の結果,我々の因果推論に基づくアプローチは,インターネットスラングやミームの理解において,精度と関連性の観点から従来のモデルよりも優れていることがわかった。 The dynamic nature of language, particularly evident in the realm of slang and memes on the Internet, poses serious challenges to the adaptability of large language models (LLMs). Traditionally anchored to static datasets, these models often struggle to keep up with the rapid linguistic evolution characteristic of online communities. This research aims to bridge this gap by enhancing LLMs' comprehension of the evolving new concepts on the Internet, without the high cost of continual retraining. In pursuit of this goal, we propose a new benchmark $\textbf{SLANG}$, which can autonomously integrates novel data to stay dataset up-to-date, to assess LLMs' capability in comprehending emerging concepts and an approach $\textbf{FOCUS}$, which uses causal inference to enhance LLMs to understand new phrases and their colloquial context. Our benchmark and approach involves digesting real-world instances of linguistic shifts, serving as contextual beacons, to form more precise and contextually relevant connections between newly emerging expressions and their meanings. The empirical analysis shows that our causal inference-based approach outperforms the traditional models in terms of precision and relevance in the comprehension of Internet slang and memes.	翻訳日:2024-02-07 02:53:55 公開日:2024-02-05
# HICH-IT : 高血圧性脳内出血研究のための総合的テキストと画像データセット HICH Image/Text (HICH-IT): Comprehensive Text and Image Datasets for Hypertensive Intracerebral Hemorrhage Research ( http://arxiv.org/abs/2401.15934v2 ) ライセンス: Link先を確認	Jie Li and Yulong Xia and Tongxin Yang and Fenglin Cai and Miao Wei and Zhiwei Zhang and Li Jiang	(参考訳) 本稿では, 高血圧性脳出血 (hich) の医学領域において, 電子カルテ (emr) と頭部ct画像の両方を含むhich-itと呼ばれる新しいデータセットを提案する。このデータセットは、hichの診断と治療における人工知能の精度を高めるように設計されている。このデータセットは、標準的なテキストと画像データの基礎の上に構築され、EMRに特定のアノテーションを組み込んで、テキスト情報から重要な内容を取り出し、画像データのアノテーション内容は、脳中線、血腫、左脳心室、右脳室の4種類に分類される。 HICH-ITは、画像セグメンテーションタスクと名前付きエンティティ認識における特徴学習のための基礎的データセットである。データセットをさらに理解するために、私たちは、パフォーマンスを観察するためにディープラーニングアルゴリズムを訓練しました。事前訓練されたモデルはwww.daip.clubとgithub.com/Deep-AI-Application-DAIPの両方でリリースされた。データセットはhttps://github.com/CYBUS123456/HICH-IT-Datasetsにアップロードされている。 Index Terms-HICH, Deep Learning, intraparenchymal hemorrhage, named entity recognition, novel dataset In this paper, we introduce a new dataset in the medical field of hypertensive intracerebral hemorrhage (HICH), called HICH-IT, which includes both electronic medical records (EMRs) and head CT images. This dataset is designed to enhance the accuracy of artificial intelligence in the diagnosis and treatment of HICH. This dataset, built upon the foundation of standard text and image data, incorporates specific annotations within the EMRs, extracting key content from the text information, and categorizes the annotation content of imaging data into four types: brain midline, hematoma, left and right cerebral ventricle. HICH-IT aims to be a foundational dataset for feature learning in image segmentation tasks and named entity recognition. To further understand the dataset, we have trained deep learning algorithms to observe the performance. The pretrained models have been released at both www.daip.club and github.com/Deep-AI-Application-DAIP. The dataset has been uploaded to https://github.com/CYBUS123456/HICH-IT-Datasets. 
Index Terms-HICH, Deep learning, Intraparenchymal hemorrhage, named entity recognition, novel dataset	翻訳日:2024-02-07 02:44:40 公開日:2024-02-05
# グリオーマの病理組織像解析における人工知能の応用 Applications of artificial intelligence in the analysis of histopathology images of gliomas: a review ( http://arxiv.org/abs/2401.15022v2 ) ライセンス: Link先を確認	Jan-Philipp Redlich, Friedrich Feuerhake, Joachim Weis, Nadine S. Schaadt, Sarah Teuber-Hanselmann, Christoph Buck, Sabine Luttmann, Andrea Eberle, Stefan Nikolin, Arno Appenzeller, Andreas Portmann, Andr\'e Homeyer	(参考訳) 近年,グリオーマの診断が複雑化している。人工知能(AI)を用いたグリオーマ組織像の解析は,診断と予後予測を支援する新たな機会を提供する。そこで本研究では,ヒトグリオーマの組織像全体に対するAIベースの手法を提案する70の公開研究について概説し,サブタイプ(16/70),グレーディング(23/70),分子マーカー予測(13/70),生存予測(27/70)の診断課題を概説した。すべての研究は, 方法論的側面と臨床応用性について検討した。本研究は成人型びまん性グリオーマのヘマトキシリンおよびエオシン染色組織断面の評価に焦点が当てられた。大部分の研究(49/70)は、がんゲノムアトラス (tcga) から入手可能なグリオブラスト腫と低グレードグリオーマデータセットに基づいており、他のデータセット (10/70) やtcgaデータセット (11/70) に加えて採用されている研究はごくわずかである。現在のアプローチは主に畳み込みニューラルネットワーク(53/70)を使用して、20倍の倍率(30/70)で組織を分析する。新しい研究分野は臨床データ、omicsデータ、磁気共鳴イメージング(27/70)の統合である。これまでのところ、AIベースの手法は有望な成果を上げているが、実際の臨床環境ではまだ使われていない。将来の研究は、日常的な適用可能性を示すために、高品質で最新の臨床および分子病理アノテーションを備えた大規模マルチサイトデータセットのメソッドの独立した検証に焦点を当てるべきである。 In recent years, the diagnosis of gliomas has become increasingly complex. Analysis of glioma histopathology images using artificial intelligence (AI) offers new opportunities to support diagnosis and outcome prediction. To give an overview of the current state of research, this review examines 70 publicly available research studies that have proposed AI-based methods for whole-slide histopathology images of human gliomas, covering the diagnostic tasks of subtyping (16/70), grading (23/70), molecular marker prediction (13/70), and survival prediction (27/70). All studies were reviewed with regard to methodological aspects as well as clinical applicability. It was found that the focus of current research is the assessment of hematoxylin and eosin-stained tissue sections of adult-type diffuse gliomas. 
The majority of studies (49/70) are based on the publicly available glioblastoma and low-grade glioma datasets from The Cancer Genome Atlas (TCGA) and only a few studies employed other datasets in isolation (10/70) or in addition to the TCGA datasets (11/70). Current approaches mostly rely on convolutional neural networks (53/70) for analyzing tissue at 20x magnification (30/70). A new field of research is the integration of clinical data, omics data, or magnetic resonance imaging (27/70). So far, AI-based methods have achieved promising results, but are not yet used in real clinical settings. Future work should focus on the independent validation of methods on larger, multi-site datasets with high-quality and up-to-date clinical and molecular pathology annotations to demonstrate routine applicability.	翻訳日:2024-02-07 02:41:43 公開日:2024-02-05
# GameStopショートスクイーズにおけるReddit集団行動の因果的役割 The causal role of the Reddit collective action on the GameStop short squeeze ( http://arxiv.org/abs/2401.14999v2 ) ライセンス: Link先を確認	Antonio Desiderio, Luca Maria Aiello, Giulio Cimini, Laura Alessandretti	(参考訳) 2021年初頭、GameStop、AMC、Nokia、BlackBerryの株価は劇的に上昇した。これはショートスクイーズによって引き起こされたもので、主にRedditの個人投資家によるものとされている。これらの出来事は、オンラインソーシャルネットワークが金融上の集団行動を触媒する可能性を初めて示した。しかし、Redditユーザーがどのように、いつ、どの程度これらの価格上昇に因果的な役割を果たしたのかは不明である。これらの問いに答えるために、我々は因果推論手法を用い、RedditとTwitter上の活動を捉えたデータと、高時間分解能の取引量データを活用する。その結果、Redditの議論はGameStopのショートスクイーズ以前の取引量を先取りしており、その予測力は時間単位のスケールで特に強いことがわかった。この効果は突然現れ、イベントの数週間前に顕著になったが、投資家コミュニティがTwitterを通じて広く注目されるようになると弱まった。因果関係が展開するにつれ、各ユーザーのGameStopに対する金融ポジションによって定量化されたRedditコミュニティの集団投資は、同社株の時価総額を密接に反映した。本研究の証拠は、RedditユーザーがGameStopのショートスクイーズを加速させ、Redditが共有された金融戦略の調整ハブとして機能したことを示唆している。1月末には、GameStopについて話していたユーザーがBlackBerry、AMC、Nokiaの人気上昇に寄与し、コミュニティが世界的に認知されるにつれて、これらが最も人気のある銘柄となった。全体として、我々の結果は、ソーシャルメディアユーザーが主導した最初の大規模な金融集団行動の背後にあるダイナミクスに光を当てる。 In early 2021, the stock prices of GameStop, AMC, Nokia, and BlackBerry experienced dramatic increases, triggered by short squeeze operations that have been largely attributed to Reddit's retail investors. These events showcased, for the first time, the potential of online social networks to catalyze financial collective action. How, when and to what extent Reddit users played a causal role in driving up these prices, however, remains unclear. To address these questions, we employ causal inference techniques, leveraging data capturing activity on Reddit and Twitter, and trading volume with a high temporal resolution. We find that Reddit discussions foreshadowed trading volume before the GameStop short squeeze, with their predictive power being particularly strong on hourly time scales. This effect emerged abruptly and became prominent a few weeks before the event, but waned once the community of investors gained widespread visibility through Twitter. 
As the causal link unfolded, the collective investment of the Reddit community, quantified through each user's financial position on GameStop, closely mirrored the market capitalization of the stock. The evidence from our study suggests that Reddit users fueled the GameStop short squeeze, and thereby Reddit served as a coordination hub for a shared financial strategy. Towards the end of January, users talking about GameStop contributed to raise the popularity of BlackBerry, AMC and Nokia, which emerged as the most popular stocks as the community gained global recognition. Overall, our results shed light on the dynamics behind the first large-scale financial collective action driven by social media users.	翻訳日:2024-02-07 02:41:13 公開日:2024-02-05
# マシンビジョンの氷山の解説: 総合的環境条件を考慮した動的テストの進歩 The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental Circumstances ( http://arxiv.org/abs/2401.14831v2 ) ライセンス: Link先を確認	Hubert Padusinski, Thilo Braun, Christian Steinhauser, Lennart Ries, Eric Sax	(参考訳) 現在のマシンビジョンのテストで、我々は氷山に向かっているのだろうか? 本研究は、高度自動運転(HAD)システムで強く求められるマシンビジョン(MV)テストの現状を掘り下げる。氷山に向かって航行するという比喩を用いて,現在のテスト戦略に隠された潜在的な欠点について論じる。開発プロセスにおいてMVの不透明な機能をどう扱うかについて,より深い理解が緊急に必要であることを強調する。見過ごされた考慮事項は人命に関わりうるからである。我々の主な貢献は、粒度グレード(Granularity Grades)と呼ぶ階層的レベルモデルである。このモデルは、MVが動作することを意図した環境の状況に関する理解を、多段階のスケールで精緻に探索することを促す。このモデルは、オブジェクト属性のような個々のエンティティの関係から環境シーン全体まで、MV機能に影響を与えうるすべてのエンティティの全体像を提供することを目的としている。モデルを適用することで、特定のドメイン内のエンティティとその関係を構造的に探索し、テスト対象のMVの結果を割り当てて、エンティティ関係グラフを構築できる。グラフ内の関係のパターンをクラスタリングすることで、MVの一般的な欠陥を論じることができる。要約すると,本研究は,HAD運用領域の全体的状況と関連づけて,MVテスト対象の欠陥をよりきめ細かく体系的に特定することに寄与する。 Are we heading for an iceberg with the current testing of machine vision? This work delves into the landscape of Machine Vision (MV) testing, which is heavily required in Highly Automated Driving (HAD) systems. Utilizing the metaphorical notion of navigating towards an iceberg, we discuss the potential shortcomings concealed within current testing strategies. We emphasize the urgent need for a deeper understanding of how to deal with the opaque functions of MV in development processes. As overlooked considerations can cost lives. Our main contribution is the hierarchical level model, which we call Granularity Grades. The model encourages a refined exploration of the multi-scaled depths of understanding about the circumstances of environments in which MV is intended to operate. This model aims to provide a holistic overview of all entities that may impact MV functions, ranging from relations of individual entities like object attributes to entire environmental scenes. 
The application of our model delivers a structured exploration of entities in a specific domain, their relationships and assigning results of a MV-under-test to construct an entity-relationship graph. Through clustering patterns of relations in the graph general MV deficits are arguable. In Summary, our work contributes to a more nuanced and systematized identification of deficits of a MV test object in correlation to holistic circumstances in HAD operating domains.	翻訳日:2024-02-07 02:40:46 公開日:2024-02-05
# 円錐交点近傍での低エネルギーダイナミクスのための厳密因子化に基づく軌道の探索 Exploring exact-factorization-based trajectories for low-energy dynamics near a conical intersection ( http://arxiv.org/abs/2401.14801v2 ) ライセンス: Link先を確認	Lea M. Ibele, Federica Agostini	(参考訳) 量子波束と軌道のダイナミクスを用いて、円錐交点近傍で2次元2状態のヤーン・テラー・ハミルトニアンが生成する低エネルギーダイナミクスを研究する。近年, 同じ問題の2つの理論表現で生じるトポロジカル位相効果と幾何学的位相効果の異なる性質を強調するために, 断熱表現と厳密因子化を比較することで, これらのダイナミクスが研究されている。本稿では, 軌道を用いた核運動の近似記述によって円錐交点近傍の低エネルギーダイナミクスを正確にモデル化する方法を理解するために, 厳密因子化を用いる。その結果,非断熱効果は弱いが無視できないため,古典近似を用いる軌道ベースの記述は正しい振る舞いを捉えるのに苦労することがわかった。 We study low-energy dynamics generated by a two-dimensional two-state Jahn-Teller Hamiltonian in the vicinity of a conical intersection using quantum wavepacket and trajectories dynamics. Recently, these dynamics were studied by comparing the adiabatic representation and the exact factorization, with the purpose to highlight the different nature of topological- and geometric-phase effects arising in the two theoretical representation of the same problem. Here, we employ the exact factorization to understand how to model accurately low-energy dynamics in the vicinity of a conical intersection using an approximate description of the nuclear motion that uses trajectories. We find that, since nonadiabatic effects are weak but non-negligible, the trajectory-based description that invokes the classical approximation struggles to capture the correct behavior.	翻訳日:2024-02-07 02:40:27 公開日:2024-02-05
# 自分自身の宇宙をデザインする: グラフニューラルネットワークを実現する物理インフォームド・アグノスティックな方法 Design Your Own Universe: A Physics-Informed Agnostic Method for Enhancing Graph Neural Networks ( http://arxiv.org/abs/2401.14580v2 ) ライセンス: Link先を確認	Dai Shi, Andi Han, Lequan Lin, Yi Guo, Zhiyong Wang, Junbin Gao	(参考訳) 物理インフォームドグラフニューラルネットワークは、オーバースムーシング、オーバースキャッシング、ヘテロフィリー適応といった一般的なGNNの課題を緩和することで、グラフ構造化データを通じて学習において顕著なパフォーマンスを達成した。これらの進歩にもかかわらず、これらの課題に対処するための従来の手法を適切に統合する、単純で効果的なパラダイムの開発はまだ進行中である。本稿では,GNNと物理系における粒子系の伝播の類似を図り,モデルに依存しない拡張フレームワークを提案する。このフレームワークは、追加ノードを導入し、ノードラベル情報によってガイドされる正と負の重みの両方で接続を切り替えることで、グラフ構造を強化する。提案手法によって強化されたGNNが,過度にスムースな問題を効果的に回避し,過度なスキャッシングに対する堅牢性を示すことを理論的に検証する。さらに,リワイヤグラフのスペクトル解析を行い,対応するgnnがホモ親和グラフとヘテロ親和グラフの両方に適合することを示す。また,同好性グラフ,異好性グラフ,長期グラフデータセットのベンチマークに対する実証的検証により,GNNが元のグラフよりも優れていることが示された。 Physics-informed Graph Neural Networks have achieved remarkable performance in learning through graph-structured data by mitigating common GNN challenges such as over-smoothing, over-squashing, and heterophily adaption. Despite these advancements, the development of a simple yet effective paradigm that appropriately integrates previous methods for handling all these challenges is still underway. In this paper, we draw an analogy between the propagation of GNNs and particle systems in physics, proposing a model-agnostic enhancement framework. This framework enriches the graph structure by introducing additional nodes and rewiring connections with both positive and negative weights, guided by node labeling information. We theoretically verify that GNNs enhanced through our approach can effectively circumvent the over-smoothing issue and exhibit robustness against over-squashing. Moreover, we conduct a spectral analysis on the rewired graph to demonstrate that the corresponding GNNs can fit both homophilic and heterophilic graphs. 
Empirical validations on benchmarks for homophilic, heterophilic graphs, and long-term graph datasets show that GNNs enhanced by our method significantly outperform their original counterparts.	翻訳日:2024-02-07 02:39:32 公開日:2024-02-05
# 大規模言語モデルに対する弱から強へのジェイルブレイク Weak-to-Strong Jailbreaking on Large Language Models ( http://arxiv.org/abs/2401.17256v2 ) ライセンス: Link先を確認	Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei Li, Yu-Xiang Wang, William Yang Wang	(参考訳) 大規模言語モデル(LLM)はジェイルブレイク攻撃に対して脆弱であり、有害、非倫理的、あるいは偏ったテキストを生成してしまう。しかし、既存のジェイルブレイク手法は計算コストが高い。本稿では,アライメント済みのLLMを攻撃して有害なテキストを生成させる効率的な手法として,weak-to-strongジェイルブレイク攻撃を提案する。我々の重要な直感は、ジェイルブレイクされたモデルとアライメント済みモデルは初期のデコード分布においてのみ異なるという観察に基づいている。weak-to-strong攻撃の主な技術的洞察は、2つのより小さなモデル(安全なモデルと安全でないモデル)を用いて、はるかに大きな安全なモデルのデコード確率を敵対的に修正することである。3つの組織の5種類のLLMに対してweak-to-strong攻撃を評価した。その結果,本手法は1例あたり1回のフォワードパスだけで,2つのデータセットにおいてミスアライメント率を99%以上に高められることがわかった。本研究は、LLMをアライメントする際に対処すべき緊急の安全性問題を明らかにする。最初の試みとして、このような攻撃から守るための防御戦略を提案するが、より高度な防御の構築は依然として困難である。手法を再現するコードはhttps://github.com/XuandongZhao/weak-to-strongで入手できる。 Large language models (LLMs) are vulnerable to jailbreak attacks - resulting in harmful, unethical, or biased text generations. However, existing jailbreaking methods are computationally costly. In this paper, we propose the weak-to-strong jailbreaking attack, an efficient method to attack aligned LLMs to produce harmful text. Our key intuition is based on the observation that jailbroken and aligned models only differ in their initial decoding distributions. The weak-to-strong attack's key technical insight is using two smaller models (a safe and an unsafe one) to adversarially modify a significantly larger safe model's decoding probabilities. We evaluate the weak-to-strong attack on 5 diverse LLMs from 3 organizations. The results show our method can increase the misalignment rate to over 99% on two datasets with just one forward pass per example. Our study exposes an urgent safety issue that needs to be addressed when aligning LLMs. As an initial attempt, we propose a defense strategy to protect against such attacks, but creating more advanced defenses remains challenging. The code for replicating the method is available at https://github.com/XuandongZhao/weak-to-strong	翻訳日:2024-02-07 02:30:02 公開日:2024-02-05
# SemScore:意味的テキスト類似性に基づく命令調整型LLMの自動評価 SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity ( http://arxiv.org/abs/2401.17072v2 ) ライセンス: Link先を確認	Ansar Aynetdinov, Alan Akbik	(参考訳) 命令調整型大規模言語モデル(LLM)は、最近、自然言語の命令に対して適切な応答を生成する能力において顕著な進歩を見せている。しかし、現在の多くの研究は、生成された応答の品質を判断するために手作業による評価に依存している。このような手作業による評価は時間を要するため、複数のモデルやモデル変種の評価に容易にスケールできない。本稿では,モデル出力とゴールドターゲット応答を意味的テキスト類似度(STS)を用いて直接比較する,単純だが非常に効果的な評価指標SemScoreを提案する。テキスト生成で広く使われている8種類の評価指標を用いて、12個の著名な命令調整型LLMのモデル出力の比較評価を行った。提案したSemScoreは,人間による評価との相関において,他のすべての(多くの場合より複雑な)評価指標を上回ることがわかった。これらの結果は,命令調整型LLMの評価における提案指標の有用性を示している。 Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable advancements in their ability to generate fitting responses to natural language instructions. However, many current works rely on manual evaluation to judge the quality of generated responses. Since such manual evaluation is time-consuming, it does not easily scale to the evaluation of multiple models and model variants. In this short paper, we propose a straightforward but remarkably effective evaluation metric called SemScore, in which we directly compare model outputs to gold target responses using semantic textual similarity (STS). We conduct a comparative evaluation of the model outputs of 12 prominent instruction-tuned LLMs using 8 widely-used evaluation metrics for text generation. We find that our proposed SemScore metric outperforms all other, in many cases more complex, evaluation metrics in terms of correlation to human evaluation. These findings indicate the utility of our proposed metric for the evaluation of instruction-tuned LLMs.	翻訳日:2024-02-07 02:29:24 公開日:2024-02-05
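The comparison described above, scoring a model output against a gold response by semantic textual similarity, reduces to a similarity between two embedding vectors. The sketch below shows that shape with cosine similarity. The abstract does not name the embedding model, so `toy_embed` is a hypothetical bag-of-words stand-in that only makes the example runnable; a real setup would substitute a sentence-embedding model. `sem_score_sketch` is likewise an illustrative name and not the authors' implementation.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for a sentence embedder: word-count vector."""
    return Counter(text.lower().split())

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sem_score_sketch(model_output, gold_response, embed=toy_embed):
    """Embed both texts and compare them; higher means more similar."""
    return cosine_similarity(embed(model_output), embed(gold_response))

# Usage: score one generated answer against its reference answer.
score = sem_score_sketch("the cat sat", "the cat slept")
```

With a bag-of-words embedder this measures lexical overlap only. Making the score semantic depends entirely on the embedding function passed in as `embed`.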
# EarthGPT:リモートセンシング領域におけるマルチセンサ画像理解のための汎用マルチモーダル大言語モデル EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain ( http://arxiv.org/abs/2401.16822v2 ) ライセンス: Link先を確認	Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao	(参考訳) マルチモーダル大言語モデル(MLLM)は、自然画像領域における視覚および視覚言語タスクにおいて顕著な成功を収めている。自然とリモートセンシング(RS)画像の間に大きな多様性があるため、RSドメインにおけるMLLMの開発はまだ幼児期にある。このギャップを埋めるために,多様なマルチセンサRS解釈タスクを統一的に統合したEarthGPTという先駆的なMLLMを提案する。 earthgptでは、視覚強調知覚機構、クロスモーダル相互理解アプローチ、rsドメインにおけるマルチセンサマルチタスクのための統一命令チューニング手法を含む3つの鍵となる手法が開発されている。さらに、大規模マルチセンサマルチモーダルRS命令追従を特徴とするMMRS-1Mというデータセットを構築し、34の既存RSデータセットに基づいて100万以上の画像テキストペアを構成し、光学、合成開口レーダ(SAR)、赤外線などのマルチセンサ画像を含む。 MMRS-1Mデータセットは、RSの専門家知識に基づくMLLMの欠点に対処し、RSドメインにおけるMLLMの開発を刺激する。大規模な実験を行い、他の専門モデルやMLLMと比較して様々な視覚的解釈タスクにおいて、EarthGPTの優れた性能を示し、提案したEarthGPTの有効性を証明し、オープンセット推論タスクに汎用的なパラダイムを提供する。 Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant diversities between the natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in the infant stage. To fill the gap, a pioneer MLLM named EarthGPT integrating various multi-sensor RS interpretation tasks uniformly is proposed in this paper for universal RS image comprehension. In EarthGPT, three key techniques are developed including a visual-enhanced perception mechanism, a cross-modal mutual comprehension approach, and a unified instruction tuning method for multi-sensor multi-task in the RS domain. More importantly, a dataset named MMRS-1M featuring large-scale multi-sensor multi-modal RS instruction-following is constructed, comprising over 1M image-text pairs based on 34 existing diverse RS datasets and including multi-sensor images such as optical, synthetic aperture radar (SAR), and infrared. 
The MMRS-1M dataset addresses the drawback of MLLMs on RS expert knowledge and stimulates the development of MLLMs in the RS domain. Extensive experiments are conducted, demonstrating the EarthGPT's superior performance in various RS visual interpretation tasks compared with the other specialist models and MLLMs, proving the effectiveness of the proposed EarthGPT and offering a versatile paradigm for open-set reasoning tasks.	翻訳日:2024-02-07 02:28:42 公開日:2024-02-05
# 言語モデルにおけるアライメントとヘルプフルネスのトレードオフ Tradeoffs Between Alignment and Helpfulness in Language Models ( http://arxiv.org/abs/2401.16332v2 ) ライセンス: Link先を確認	Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, and Amnon Shashua	(参考訳) 言語モデルのアライメントはAIの安全性の重要なコンポーネントとなり、望ましい行動を強化し、望ましくない行動を抑制することによって、人間と言語モデルの安全な相互作用を可能にする。しばしば、モデルをチューニングしたり、プリセットされたアライメントプロンプトを挿入することで行われる。近年,トレーニング後の表現を変更することによってモデルの動作を変化させる表現工学がllmの調整に有効であることが示されている(zou et al., 2023a)。表現工学は、敵対的攻撃に対する抵抗や社会的バイアスの低減など、アライメント指向のタスクの成果をもたらすが、モデルが基本的なタスクを実行する能力の低下を引き起こすことも示されている。本稿では,アライメントの増大とモデルの有用性の低下のトレードオフについて検討する。この2つの量の境界を提供する理論的枠組みを提案し,その妥当性を実証的に示す。興味深いことに、典型的には有用性は減少するが、表現工学のベクトルのノルムと2次的に作用するのに対し、アライメントは線形に増大し、表現工学を使うのが効率的であるレジームを示す。その結果を実証的に検証し,その境界をアライメントのための表現工学の有用性に表した。 Language model alignment has become an important component of AI safety, allowing safe interactions between humans and language models, by enhancing desired behaviors and inhibiting undesired ones. It is often done by tuning the model or inserting preset aligning prompts. Recently, representation engineering, a method which alters the model's behavior via changing its representations post-training, was shown to be effective in aligning LLMs (Zou et al., 2023a). Representation engineering yields gains in alignment oriented tasks such as resistance to adversarial attacks and reduction of social biases, but was also shown to cause a decrease in the ability of the model to perform basic tasks. In this paper we study the tradeoff between the increase in alignment and decrease in helpfulness of the model. We propose a theoretical framework which provides bounds for these two quantities, and demonstrate their relevance empirically. Interestingly, we find that while the helpfulness generally decreases, it does so quadratically with the norm of the representation engineering vector, while the alignment increases linearly with it, indicating a regime in which it is efficient to use representation engineering. 
We validate our findings empirically, and chart the boundaries to the usefulness of representation engineering for alignment.	翻訳日:2024-02-07 02:27:36 公開日:2024-02-05
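The scaling stated in the abstract above (helpfulness falling quadratically and alignment rising linearly with the norm of the representation engineering vector) can be turned into a short worked illustration. Here $r$ denotes that norm, and $a, b > 0$ are hypothetical constants introduced only for this sketch. These are idealized leading-order forms and not the bounds derived in the paper.

```latex
\[
  \Delta_{\mathrm{align}}(r) \approx a\,r ,
  \qquad
  \Delta_{\mathrm{help}}(r) \approx -\,b\,r^{2} ,
  \qquad
  \frac{\lvert \Delta_{\mathrm{help}}(r) \rvert}{\Delta_{\mathrm{align}}(r)}
  \approx \frac{b}{a}\, r \;\longrightarrow\; 0
  \quad \text{as } r \to 0 .
\]
```

Under these assumed forms, the helpfulness lost per unit of alignment gained shrinks in proportion to $r$. This is one way to read the abstract's "regime in which it is efficient to use representation engineering": small steering vectors.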
# ハイブリッドエッジクラウドで認知インターネットを強化するブロックの構築 Building Blocks to Empower Cognitive Internet with Hybrid Edge Cloud ( http://arxiv.org/abs/2402.00876v2 ) ライセンス: Link先を確認	Siavash Alamouti, Fay Arjomandi, Michel Burger and Dr. Bashar Altakrouri	(参考訳) モバイルインターネットから「認知的インターネット」に移行するにつれて、テクノロジーや知性との関わり方に大きな変化が生まれます。我々は、コグニティブ・インターネットはコグニティブ・モノのインターネット(認知的iot)を超え、コネクテッド・オブジェクトが知識と理解を独立して得ることができると主張している。 Mobile InternetやCognitive IoTとは異なり、Cognitive Internetはネットワーク全体のコラボレーティブインテリジェンスを統合し、認知IoT領域とシステム全体のコラボレーションとヒューマンインテリジェンスを融合させる。この統合インテリジェンスは、意思決定の自律性を保ち、さまざまなアイデンティティを取り入れながら、さまざまなドメインにわたるデバイス、サービス、エンティティ、個人間のインタラクションを促進する。この論文は「認知的インターネット」パラダイムの基礎的要素、特徴、利益、産業的影響について論じている。このシフトを実現する上で、適応可能なAIインフラストラクチャとハイブリッドエッジクラウド(HEC)プラットフォームの重要性を強調している。この進化によって、認知サービス、知識・アズ・ア・サービス(KaaS)経済、意思決定の自律性の向上、持続可能なデジタル進歩、データ管理の進歩、処理技術、プライバシーの強化などが実現される。本論文は,認知インターネットにおけるhecの変容可能性を理解し,活用するための重要な資源である。ケーススタディ、前方視界、現実世界のアプリケーションによってサポートされ、この新興パラダイムに関する包括的な洞察を提供する。 As we transition from the mobile internet to the 'Cognitive Internet,' a significant shift occurs in how we engage with technology and intelligence. We contend that the Cognitive Internet goes beyond the Cognitive Internet of Things (Cognitive IoT), enabling connected objects to independently acquire knowledge and understanding. Unlike the Mobile Internet and Cognitive IoT, the Cognitive Internet integrates collaborative intelligence throughout the network, blending the cognitive IoT realm with system-wide collaboration and human intelligence. This integrated intelligence facilitates interactions between devices, services, entities, and individuals across diverse domains while preserving decision-making autonomy and accommodating various identities. The paper delves into the foundational elements, distinct characteristics, benefits, and industrial impact of the 'Cognitive Internet' paradigm. It highlights the importance of adaptable AI infrastructures and hybrid edge cloud (HEC) platforms in enabling this shift. 
This evolution brings forth cognitive services, a Knowledge as a Service (KaaS) economy, enhanced decision-making autonomy, sustainable digital progress, advancements in data management, processing techniques, and a stronger emphasis on privacy. In essence, this paper serves as a crucial resource for understanding and leveraging the transformative potential of HEC for Cognitive Internet. Supported by case studies, forward-looking perspectives, and real-world applications, it provides comprehensive insights into this emerging paradigm.	翻訳日:2024-02-07 02:17:11 公開日:2024-02-05
# 分子埋め込みのためのLLaMAとChatGPTの埋め込みの比較解析 Comparative Analysis of LLaMA and ChatGPT Embeddings for Molecule Embedding ( http://arxiv.org/abs/2402.00024v2 ) ライセンス: Link先を確認	Shaghayegh Sadeghi, Alan Bui, Ali Forooghi, Jianguo Lu, Alioune Ngom	(参考訳) 目的: ChatGPT や LLaMA のような大規模言語モデル (LLM) は,ケモインフォマティクスの分野,特に化学構造を表現する標準的な方法である Simplified Molecular Input Line Entry System (SMILES) の解釈において,その可能性がますます認識されている。これらのLLMはSMILES文字列をベクトル表現にデコードでき、化学グラフを理解するための新しいアプローチを提供する。方法: SMILES文字列の埋め込みにおけるChatGPTとLLaMAの性能を調査する。評価は,創薬と医療の双方に不可欠な2つの応用,すなわち分子特性 (MP) 予測と薬物-薬物相互作用 (DDI) 予測に焦点を当てる。結果: LLaMAを用いて生成したSMILES埋め込みは,MPおよびDDI予測タスクの両方でChatGPTによるものを上回った。特に、LLaMAベースのSMILES埋め込みは、両方の予測タスクで既存手法に匹敵する結果を示す。結論: ケモインフォマティクスへのLLMの応用,特にSMILES埋め込みの利用は,薬物開発を前進させる大きな可能性を示している。これには、化学的性質の予測の改善と、創薬プロセスの促進が含まれる。 GitHub: https://github.com/sshaghayeghs/LLaMA-VS-ChatGPT Purpose: Large Language Models (LLMs) like ChatGPT and LLaMA are increasingly recognized for their potential in the field of cheminformatics, particularly in interpreting Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs can decode SMILES strings into vector representations, providing a novel approach to understanding chemical graphs. Methods: We investigate the performance of ChatGPT and LLaMA in embedding SMILES strings. Our evaluation focuses on two key applications: molecular property (MP) prediction and drug-drug interaction (DDI) prediction, both essential in drug development and healthcare. Results: We find that SMILES embeddings generated using LLaMA outperform those from ChatGPT in both MP and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to existing methods in both prediction tasks. Conclusion: The application of LLMs in cheminformatics, particularly in utilizing SMILES embeddings, shows significant promise for advancing drug development. This includes improving the prediction of chemical properties and facilitating the drug discovery process. 
GitHub: https://github.com/sshaghayeghs/LLaMA-VS-ChatGPT	翻訳日:2024-02-07 02:15:50 公開日:2024-02-05
# テレコネクテーション変換器を用いた高効率季節性気象予報 Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers ( http://arxiv.org/abs/2401.17870v2 ) ライセンス: Link先を確認	Shan Zhao, Zhitong Xiong, Xiao Xiang Zhu	(参考訳) 農業、水資源管理、災害の早期警戒にとって重要な季節的予報は、大気のカオス性による課題に直面している。機械学習(ML)の最近の進歩は、数値モデルに対する競争力のある予測スキルを達成することによって、天気予報に革命をもたらした。しかし、そのような基礎モデルのトレーニングには何千日ものGPU日を必要とするため、炭素排出量が大きくなり、適用性が制限される。さらに、MLモデルは、物理的整合性や気象学的意味に欠ける滑らかな結果を生成することで、画素単位の誤差スコアを騙す傾向にある。上記の問題に対処するために,テレコネクション変換器を提案する。我々のアーキテクチャは事前学習されたpanguモデルを利用して、適切な初期重み付けを達成し、テレコネクションインフォームされた時間モジュールを統合し、拡張された時間範囲での予測可能性を向上させる。また,Panguモデルのパラメータの1.1%を調整することにより,2週間のリードタイムで4面および5つの上層大気変数の予測可能性を高める。さらに, テレコネクションフィルタにより出力の空間的粒度が大幅に向上し, 物理的整合性が示唆された。我々の研究は、将来の気象条件を駆動する上で、大気と海洋のテレコネクションの重要性を強調している。さらに、研究者が既存の基盤モデルを多目的下流タスクで活用するための資源効率の高い経路を提供する。 Subseasonal forecasting, which is pivotal for agriculture, water resource management, and early warning of disasters, faces challenges due to the chaotic nature of the atmosphere. Recent advances in machine learning (ML) have revolutionized weather forecasting by achieving competitive predictive skills to numerical models. However, training such foundation models requires thousands of GPU days, which causes substantial carbon emissions and limits their broader applicability. Moreover, ML models tend to fool the pixel-wise error scores by producing smoothed results which lack physical consistency and meteorological meaning. To deal with the aforementioned problems, we propose a teleconnection-informed transformer. Our architecture leverages the pretrained Pangu model to achieve good initial weights and integrates a teleconnection-informed temporal module to improve predictability in an extended temporal range. Remarkably, by adjusting 1.1% of the Pangu model's parameters, our method enhances predictability on four surface and five upper-level atmospheric variables at a two-week lead time. 
Furthermore, the teleconnection-filtered features improve the spatial granularity of outputs significantly, indicating their potential physical consistency. Our research underscores the importance of atmospheric and oceanic teleconnections in driving future weather conditions. Besides, it presents a resource-efficient pathway for researchers to leverage existing foundation models on versatile downstream tasks.	翻訳日:2024-02-07 02:15:04 公開日:2024-02-05
# Chain-of-Feedback: 応答における不整合の影響の緩和 Chain-of-Feedback: Mitigating the Effects of Inconsistency in Responses ( http://arxiv.org/abs/2402.02648v1 ) ライセンス: Link先を確認	Jinwoo Ahn	(参考訳) 大規模言語モデル(LLM)はしばしば知識集約的な質問に悩まされ、同じ入力を与えられたにもかかわらず異なる出力を提供することで矛盾することが多い。応答品質は、最初の応答が正しいにもかかわらずLLMに応答を変更させるような強固な反対姿勢をユーザが表すと悪化する。これらの行動は、これらのモデルによって提供される応答の信頼性と妥当性を低下させる。本稿では、 1) Chain-of-Feedback (CoF) がLLMを実際の回答からさらに逸脱させる様子を示すことで、ChatGPTのようなAIエージェントに過度に頼ることの固有のリスクに対する認識を高め、 2) 再帰的フィードバック連鎖 (Recursive Chain of Feedback, R-CoF) という新たなプロンプト手法を提案することを試みる(後者についてはさらなる研究を進めている)。 CoFシステムは、オープンエンドのマルチステップ問題を入力として受け取る。そして、別の試みを要求する無意味なフィードバックを繰り返し提供する。予備実験では、そのようなフィードバックは応答の質を低下させるのみであることが示された。一方、上記の不整合の影響を軽減するため、誤った各ステップをより小さな個別の問題に繰り返し分解することで、LLMが提供する最初の誤った推論を再帰的に修正する新しい手法を提示する。 Large Language Models (LLMs) frequently struggle with knowledge-intensive questions, often being inconsistent by providing different outputs despite being given the same input. The response quality worsens when the user expresses a firm opposing stance, which causes the LLMs to adjust their response even though the initial one was correct. These behaviors decrease the reliability and validity of the responses provided by these models. In this paper, we attempt to 1) raise awareness of the inherent risks that follow from overly relying on AI agents like ChatGPT by showing how Chain-of-Feedback (CoF) triggers LLMs to deviate more from the actual answer and 2) suggest a novel prompting method, Recursive Chain of Feedback (R-CoF), on which we are conducting further study. The CoF system takes in an open-ended multi-step question. Then, we repetitively provide meaningless feedback requesting another attempt. Our preliminary experiments show that such feedback only decreases the quality of the response. On the other hand, to mitigate the effects of the aforementioned inconsistencies, we present a novel method of recursively revising the initial incorrect reasoning provided by the LLM by repetitively breaking down each incorrect step into smaller individual problems.	
翻訳日:2024-02-06 18:39:51 公開日:2024-02-05
# The Gig's Up: ギグエコノミーの知見に関するChatGPTとQuoraの比較 The Gig's Up: How ChatGPT Stacks Up Against Quora on Gig Economy Insights ( http://arxiv.org/abs/2402.02676v1 ) ライセンス: Link先を確認	Thomas Lancaster	(参考訳) 生成AI(generative AI)は、ギグエコノミーや労働市場など、さまざまな分野の質問に対する答えを求める方法を変えつつあるが、ChatGPTのシミュレート出力が既存の質問・回答プラットフォームから得られるものとどの程度一致するかに関する情報は限られている。本稿では、ChatGPTを研究アシスタントとして使用し、ChatGPTがQuoraの質問や回答をどの程度再現できるかを、ギグエコノミーのデータを例として調査する。コンテンツ分析の結果から、Quoraはお金を稼ごうとするユーザーから質問を受ける傾向があり、回答には個人的な経験や例が含まれる可能性が高いことが示唆されている。 ChatGPTがシミュレートしたバージョンは、個人的要素が少なく、よりコンセプトに基づくものであり、雇用への影響や労働権に関する考慮を含んでいる。したがって、生成AIは、ギグエコノミーに関する回答で人間が望むことの一部だけをシミュレートしているようだ。本稿では、類似した比較手法が他の研究分野にまたがって有効であり、生成AIの最良の実世界利用の確立に役立つことを提案する。 Generative AI is changing the way in which humans seek to find answers to questions in different fields including on the gig economy and labour markets, but there is limited information available about how closely ChatGPT-simulated output matches that obtainable from existing question and answer platforms. This paper uses ChatGPT as a research assistant to explore how far ChatGPT can replicate Quora questions and answers, using data from the gig economy as an indicative case study. The results from content analysis suggest that Quora is likely to be asked questions from users looking to make money and answers are likely to include personal experiences and examples. ChatGPT-simulated versions are less personal and more concept-based, including considerations on employment implications and labour rights. It appears therefore that generative AI simulates only part of what a human would want in their answers relating to the gig economy. The paper proposes that a similar comparative methodology would also be useful across other research fields to help in establishing the best real world uses of generative AI.	翻訳日:2024-02-06 18:26:46 公開日:2024-02-05
# 分布外検出のためのプロトタイプ混合による学習 Learning with Mixture of Prototypes for Out-of-Distribution Detection ( http://arxiv.org/abs/2402.02653v1 ) ライセンス: Link先を確認	Haodong Lu, Dong Gong, Shuo Wang, Jason Xue, Lina Yao, Kristen Moore	(参考訳) Out-of-distribution(OOD)検出は、In-distribution(ID)トレーニングデータから遠く離れたテストサンプルを検出することを目的としており、実世界で機械学習モデルを安全にデプロイするために不可欠である。深層表現学習の強化に伴い,距離に基づくOOD検出法が登場している。これらは、IDクラスのセントロイドやプロトタイプからの距離を測定することで、未知のOODサンプルを識別する。しかし、既存のアプローチは、各クラスのIDデータを1つのセントロイドクラスプロトタイプでモデル化したり、OOD検出用に設計されていない損失関数を用いたりするなど、データ内の自然な多様性を見落とす単純化されたデータ仮定に依存して表現を学習している。各クラスのデータサンプルを1つのプロトタイプの周囲にコンパクトにまとめることを素朴に強制すると、現実的なデータのモデリングが不十分になり、性能が制限される。これらの課題に対処するために,複数のプロトタイプを用いて各クラスをモデル化してサンプルの多様性を捉え,より忠実でコンパクトなサンプル埋め込みを学習してOOD検出を向上させるPrototypicAl Learning with a Mixture of prototypes(PALM)を提案する。提案手法はプロトタイプを自動的に識別して動的に更新し,相互近傍ソフト割り当て重みによって各サンプルをプロトタイプのサブセットに割り当てる。 PALMは最尤推定(MLE)損失を最適化し、サンプル埋め込みが関連するプロトタイプの周囲にコンパクトになるように促すとともに、全てのプロトタイプに対する対照損失を最適化し、プロトタイプレベルでクラス内コンパクト性とクラス間識別性を高める。さらに, プロトタイプの自動推定により, ラベルなしIDデータを用いた難しいOOD検出タスクへの拡張が可能となる。大規模な実験はPALMの優位性を示し、挑戦的なCIFAR-100ベンチマークで93.82の最先端平均AUROC性能を達成した。コードはhttps://github.com/jeff024/PALMで公開されている。 Out-of-distribution (OOD) detection aims to detect testing samples far away from the in-distribution (ID) training data, which is crucial for the safe deployment of machine learning models in the real world. Distance-based OOD detection methods have emerged with enhanced deep representation learning. They identify unseen OOD samples by measuring their distances from ID class centroids or prototypes. However, existing approaches learn the representation relying on oversimplified data assumptions, e.g., modeling ID data of each class with one centroid class prototype or using loss functions not designed for OOD detection, which overlook the natural diversities within the data. Naively enforcing data samples of each class to be compact around only one prototype leads to inadequate modeling of realistic data and limited performance. 
To tackle these issues, we propose PrototypicAl Learning with a Mixture of prototypes (PALM), which models each class with multiple prototypes to capture the sample diversities, and learns more faithful and compact sample embeddings to enhance OOD detection. Our method automatically identifies and dynamically updates prototypes, assigning each sample to a subset of prototypes via reciprocal neighbor soft assignment weights. PALM optimizes a maximum likelihood estimation (MLE) loss to encourage the sample embeddings to be compact around the associated prototypes, as well as a contrastive loss on all prototypes to enhance intra-class compactness and inter-class discrimination at the prototype level. Moreover, the automatic estimation of prototypes enables our approach to be extended to the challenging OOD detection task with unlabelled ID data. Extensive experiments demonstrate the superiority of PALM, achieving state-of-the-art average AUROC performance of 93.82 on the challenging CIFAR-100 benchmark. Code is available at https://github.com/jeff024/PALM.	翻訳日:2024-02-06 18:26:23 公開日:2024-02-05
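The distance-based idea in the abstract above (score a test sample by how far it lies from class prototypes, with several prototypes per class) can be illustrated with a small sketch. This is not PALM's training procedure or its published scoring function: the cosine-based score, the function names, and the toy prototypes are all assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_u * norm_v)


def ood_score(
    embedding: Sequence[float],
    prototypes_by_class: Dict[str, List[Sequence[float]]],
) -> float:
    """Hypothetical OOD score: 1 minus the best similarity to ANY prototype.

    Each class may own several prototypes, so a sample only needs to be
    close to one of them to count as in-distribution. Higher means more OOD.
    """
    best = max(
        cosine_similarity(embedding, prototype)
        for prototypes in prototypes_by_class.values()
        for prototype in prototypes
    )
    return 1.0 - best


# Toy prototypes (assumed values): class "cat" has two modes, "dog" has one.
example_prototypes = {
    "cat": [[1.0, 0.0], [0.8, 0.6]],
    "dog": [[0.0, 1.0]],
}
```

A sample pointing away from every prototype, such as `[-1.0, -1.0]`, receives a larger score than one lying near a prototype; in practice a threshold on the score would be chosen on held-out in-distribution data.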
# 強化学習のための視覚言語モデルの提案 Vision-Language Models Provide Promptable Representations for Reinforcement Learning ( http://arxiv.org/abs/2402.02651v1 ) ライセンス: Link先を確認	William Chen and Oier Mees and Aviral Kumar and Sergey Levine	(参考訳) 人間は背景の世界知識を生かして新しい行動を学ぶことができる。対照的に、強化学習(RL)で訓練されたエージェントは通常、スクラッチから行動を学ぶ。そこで本研究では,インターネット規模で事前学習した視覚言語モデル (VLM) に符号化された多量の一般・索引可能な世界知識を具体化するための新しい手法を提案する。視覚的な観察に基礎を置き、vlmの内部知識に基づいて意味的特徴をエンコードする埋め込みであり、タスクコンテキストと補助情報を提供するプロンプトによって引き起こされる。本研究では,ハビタットのマインクラフトとロボットナビゲーションにおいて,視覚的に複雑で長い水平方向のRLタスクに対するアプローチを評価する。汎用的なVLMから抽出した埋め込みを訓練したポリシーは、汎用的な非プロンプト可能な画像埋め込みを訓練した同等のポリシーより優れていた。また,本手法は命令追従法より優れ,ドメイン固有の埋め込みと互換性がある。 Humans can quickly learn new behaviors by leveraging background world knowledge. In contrast, agents trained with reinforcement learning (RL) typically learn behaviors from scratch. We thus propose a novel approach that uses the vast amounts of general and indexable world knowledge encoded in vision-language models (VLMs) pre-trained on Internet-scale data for embodied RL. We initialize policies with VLMs by using them as promptable representations: embeddings that are grounded in visual observations and encode semantic features based on the VLM's internal knowledge, as elicited through prompts that provide task context and auxiliary information. We evaluate our approach on visually-complex, long horizon RL tasks in Minecraft and robot navigation in Habitat. We find that our policies trained on embeddings extracted from general-purpose VLMs outperform equivalent policies trained on generic, non-promptable image embeddings. We also find our approach outperforms instruction-following methods and performs comparably to domain-specific embeddings.	翻訳日:2024-02-06 18:25:52 公開日:2024-02-05
# 医用画像セグメンテーションのための適応型Deep Supervisionを用いたDensely Decoded Networks Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation ( http://arxiv.org/abs/2402.02649v1 ) ライセンス: Link先を確認	Suraj Mishra	(参考訳) ディープニューラルネットワークを用いた医用画像分割が成功している。しかし、これらのネットワークの有効性は、密度の低い予測と頑健な特徴を抽出できないことによって制限されることが多い。本研究では,'crutch'ネットワーク接続を選択的に導入し,高密度復号化ネットワーク(ddn)を提案する。ネットワークデコーダ(1)のアップサンプリング段階におけるこのような「クラッチ」接続は、エンコーダからの高解像度特徴を取り入れたターゲットローカライゼーションを強化し、(2)多段階のコンテキスト情報フローを容易にすることでセグメンテーションを改善する。さらに,適応的深層監視(ads)に基づくトレーニング戦略を提案し,入力データセットの特定の属性を活用・適応し,ロバストな特徴抽出を行う。特にadsは、ネットワークの平均入力オブジェクトサイズと層別有効受容フィールド(lerf)をマッチングすることにより、補助的な監督を戦略的に配置し、展開する。このような「コンパニオン目標」を特定の隠蔽層から含めることで、トレーニング中にネットワークが「無視」する可能性のある、いくつかの異なる入力依存機能にモデルが注意を払うのに役立つ。当社の新しいネットワークとトレーニング戦略は、異なるモダリティの4つの多様なデータセット上で検証され、その効果を示しています。 Medical image segmentation using deep neural networks has been highly successful. However, the effectiveness of these networks is often limited by inadequate dense prediction and inability to extract robust features. To achieve refined dense prediction, we propose densely decoded networks (ddn), by selectively introducing 'crutch' network connections. Such 'crutch' connections in each upsampling stage of the network decoder (1) enhance target localization by incorporating high resolution features from the encoder, and (2) improve segmentation by facilitating multi-stage contextual information flow. Further, we present a training strategy based on adaptive deep supervision (ads), which exploits and adapts specific attributes of input dataset, for robust feature extraction. In particular, ads strategically locates and deploys auxiliary supervision, by matching the average input object size with the layer-wise effective receptive fields (lerf) of a network, resulting in a class of ddns. 
Such inclusion of 'companion objective' from a specific hidden layer, helps the model pay close attention to some distinct input-dependent features, which the network might otherwise 'ignore' during training. Our new networks and training strategy are validated on 4 diverse datasets of different modalities, demonstrating their effectiveness.	翻訳日:2024-02-06 18:25:39 公開日:2024-02-05
# マルチリージョンマルコフガウス過程 : 複数の脳領域を横断する方向コミュニケーションを効率的に発見する手法 Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions ( http://arxiv.org/abs/2402.02686v1 ) ライセンス: Link先を確認	Weihan Li, Chengrui Li, Yule Wang, Anqi Wu	(参考訳) 異なる脳領域間の複雑な相互作用を研究することは神経科学において重要である。様々な統計的手法が複数の脳領域にわたる潜伏通信を調査している。主なカテゴリはガウス過程(GP)と線形力学系(LDS)である。 GPに基づくアプローチは、周波数帯域や通信方向などの潜伏変数を効果的に発見する。逆に、LDSベースのアプローチは計算効率が良いが、潜在表現には強力な表現力がない。本研究では,マルチアウトプットgpを反映するlds(multi-region markovian gaussian process,mrm-gp)を作成し,両手法を融合する。我々の研究は、LDSとマルチ出力GPの接続を確立し、ニューラル記録の潜在空間内での周波数と位相遅延を明示的にモデル化する最初のものである。その結果、モデルが線形推論コストをタイムポイントを超えて達成し、解釈可能な低次元表現を提供し、脳領域間の通信方向を明らかにし、振動通信を異なる周波数帯域に分離する。 Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables such as frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work is the first to establish a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.	翻訳日:2024-02-06 18:14:28 公開日:2024-02-05
# 2次元ハニカム位相反強磁性体の固有非線形ホール効果 Intrinsic nonlinear Hall effect in two-dimensional honeycomb topological antiferromagnets ( http://arxiv.org/abs/2402.02685v1 ) ライセンス: Link先を確認	Zheng-Yang Zhuang, Zhongbo Yan	(参考訳) ハニカム格子を持つ2次元系は、格子幾何学、スピン軌道結合、磁性の相互作用が波動関数の量子幾何において非常に豊かな特徴をもたらすため、様々な種類のホール効果を探求するパラダイム的プラットフォームであることが知られている。本研究では, $\mathcal{PT}$対称な反強磁性Kane-Meleモデルによって有効的に記述されるハニカム位相反強磁性体を考え, 格子の異方性, 化学ポテンシャル, Néelベクトルの方向の変化に関して, その非線形ホール応答の変化を考察する。 $\mathcal{PT}$対称性のため、量子幾何起源の主要次数のホール効果は、電場の二次の効果であり散乱時間に依存しない固有非線形ホール効果である。反強磁性交換場と格子異方性により駆動されるトポロジカル相転移における固有非線形ホール伝導率テンソルの挙動を考察し,外在的非線形ホール効果とは異なり,その成分が符号を変えないことを示す。弱いドープ状態においては、固有非線形ホール効果は谷分極している。化学ポテンシャルを変化させることで、フェルミ面がリフシッツ転移するときに非線形ホール伝導率テンソルがキンクを示すことが分かる。さらに,スピン回転対称性を破るスピン軌道結合の存在は,Néelベクトルの方向を検出するために固有非線形ホール効果を用いるうえで決定的であることがわかった。本研究は, 2次元ハニカム位相反強磁性体が固有非線形ホール効果の研究に豊富な特性を持つ物質系の理想的なクラスであることを示す。 Two-dimensional systems with honeycomb lattice are known to be a paradigmatic platform to explore the various types of Hall effects, because the interplay of lattice geometry, spin-orbit coupling and magnetism can give rise to very rich features in the quantum geometry of wave functions. In this work, we consider honeycomb topological antiferromagnets that are effectively described by a $\mathcal{PT}$-symmetric antiferromagnetic Kane-Mele model, and explore the evolution of its nonlinear Hall response with respect to the change of lattice anisotropy, chemical potential, and the direction of the Néel vector. Due to the $\mathcal{PT}$-symmetry, the leading-order Hall effect of quantum geometric origin is the intrinsic nonlinear Hall effect, which is a second-order effect of electric fields and is independent of the scattering time. 
We investigate the behavior of the intrinsic nonlinear Hall conductivity tensor across topological phase transitions driven by antiferromagnetic exchange field and lattice anisotropy and find that its components do not change sign, which is different from the extrinsic nonlinear Hall effect. In the weakly doped regime, we find that the intrinsic nonlinear Hall effect is valley-polarized. By varying the chemical potential, we find that the nonlinear Hall conductivity tensors exhibit kinks when the Fermi surface undergoes Lifshitz transitions. Furthermore, we find that the existence of spin-orbit coupling to lift the spin-rotation symmetry is decisive for the use of intrinsic nonlinear Hall effect to detect the direction of the Néel vector. Our work shows that the two-dimensional honeycomb topological antiferromagnets are an ideal class of material systems with rich properties for the study of intrinsic nonlinear Hall effect.	翻訳日:2024-02-06 18:14:11 公開日:2024-02-05
# 等変対称性破れ集合 Equivariant Symmetry Breaking Sets ( http://arxiv.org/abs/2402.02681v1 ) ライセンス: Link先を確認	YuQing Xie, Tess Smidt	(参考訳) 等価ニューラルネットワーク(ENN)は、基礎となる対称性を含むアプリケーションに非常に効果的であることが示されている。建設によって、ENNはより高い対称性の入力を与えられた低い対称性の出力を生成できない。しかし、自発的対称性の破れは多くの物理系で起こり、初期高対称状態からより対称性の低い安定状態が得られる。したがって、我々はENNの対称性を体系的に破る方法を理解することが不可欠である。本研究では, 完全同変な新しい対称性破壊フレームワークを提案する。我々は、我々のアプローチは、任意のグループの下での等分散に対して一般的かつ適用可能であることを強調する。これを実現するために、対称破れ集合(SBS)の概念を導入する。既存のネットワークを再設計する代わりに、入力と出力の対称性に基づいてネットワークに供給される対称性を破るオブジェクトセットを設計します。これらの集合に同値を定義する自然な方法があることを示し、追加の制約を与える。これらのセットのサイズを最小化することは、データ効率に等しい。これらの集合を最小化することは、よく研究された群論問題に変換され、点群に対するこの問題に対する解を集計する。最後に、我々のアプローチが実際にどのように機能しているかを示すために、対称性の破れのいくつかの例を示す。 Equivariant neural networks (ENNs) have been shown to be extremely effective in applications involving underlying symmetries. By construction ENNs cannot produce lower symmetry outputs given a higher symmetry input. However, spontaneous symmetry breaking occurs in many physical systems and we may obtain a less symmetric stable state from an initial highly symmetric one. Hence, it is imperative that we understand how to systematically break symmetry in ENNs. In this work, we propose a novel symmetry breaking framework that is fully equivariant. We emphasize that our approach is general and applicable to equivariance under any group. To achieve this, we introduce the idea of symmetry breaking sets (SBS). Rather than redesign existing networks, we design sets of symmetry breaking objects which we feed into our network based on the symmetry of our inputs and outputs. We show there is a natural way to define equivariance on these sets, which gives an additional constraint. Minimizing the size of these sets equates to data efficiency. We prove that minimizing these sets translates to a well studied group theory problem, and tabulate solutions to this problem for the point groups. Finally, we provide some examples of symmetry breaking to demonstrate how our approach works in practice.	
翻訳日:2024-02-06 18:13:37 公開日:2024-02-05
# 大きな言語モデルは地理的に偏っている Large Language Models are Geographically Biased ( http://arxiv.org/abs/2402.02680v1 ) ライセンス: Link先を確認	Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, Stefano Ermon	(参考訳) 大規模言語モデル(LLM)は本質的に、トレーニングコーパスに含まれるバイアスを持ち、社会的害の永続性につながる可能性がある。これらの基礎モデルの影響が大きくなるにつれて、バイアスの理解と評価が公平性と正確性を達成する上で不可欠となる。我々は、地理のレンズを通して、llmが世界について何を知っているかを研究することを提案する。このアプローチは、文化、人種、言語、政治、宗教といった地理的空間に有意義に投影される人間の生活の多くの側面に根拠のある真実があるため、特に強力である。地理空間予測において,システム的誤りと定義する,様々な問題のある地理的バイアスを示す。当初、LLMは、地上の真実と強いモノトニックな相関を示す評価(Spearmanの$\rho$最大0.89)の形で正確なゼロショット地理空間予測を行うことができることを示した。次に, LLMは, 目的, 主観的なトピックに共通するバイアスを示すことを示す。特にllmは、魅力的さ、道徳性、知性といった様々な敏感な主観的な話題(spearmanの$\rho$ は 0.70 まで)において、より低い社会経済的状況(例えばアフリカの大部分)の場所に対して明らかに偏っている。最後に,これを定量化するためにバイアススコアを導入し,既存のllmにまたがるバイアスの大きさに有意な変動があることを見いだす。 Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is particularly powerful as there is ground truth for the numerous aspects of human life that are meaningfully projected onto geographic space such as culture, race, language, politics, and religion. We show various problematic geographic biases, which we define as systemic errors in geospatial predictions. Initially, we demonstrate that LLMs are capable of making accurate zero-shot geospatial predictions in the form of ratings that show strong monotonic correlation with ground truth (Spearman's $\rho$ of up to 0.89). We then show that LLMs exhibit common biases across a range of objective and subjective topics. In particular, LLMs are clearly biased against locations with lower socioeconomic conditions (e.g. 
most of Africa) on a variety of sensitive subjective topics such as attractiveness, morality, and intelligence (Spearman's $\rho$ of up to 0.70). Finally, we introduce a bias score to quantify this and find that there is significant variation in the magnitude of bias across existing LLMs.	翻訳日:2024-02-06 18:13:21 公開日:2024-02-05
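The abstract above reports agreement between model ratings and ground truth as Spearman's $\rho$. As a reminder of what that statistic measures, here is a minimal pure-Python sketch that ranks both sequences (averaging ranks over ties) and then takes the Pearson correlation of the ranks. The function names and the toy numbers in the usage note are illustrative assumptions, not data from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (var_x * var_y) ** 0.5
```

Because only ranks matter, any strictly increasing relationship between ratings and ground truth gives a value of 1, for example `spearman_rho([1, 2, 3, 4], [10, 20, 25, 100])`, even though the relationship is far from linear.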
# 因果発見を用いたブラックボックス機械学習モデルの反事実的説明とクレジットレーティングへの応用 Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating ( http://arxiv.org/abs/2402.02678v1 ) ライセンス: Link先を確認	Daisuke Takahashi, Shohei Shimizu and Takuma Tanaka	(参考訳) 説明可能な人工知能(XAI)は、機械学習アルゴリズムの内部メカニズムの解明に役立ち、予測の基礎を実証することで信頼性を高めている。いくつかのXAIモデルは、予測モデルの入出力関係と特徴間の依存関係を調べることによって、因果関係を考慮してモデルを説明する。これらのモデルの大半は、因果グラフが既知であると仮定して、反事実確率に基づく説明を行っている。しかし、この仮定は、ほとんどの場合、特徴間の因果関係が未知であるため、そのようなモデルの実際のデータへの適用を複雑にする。そこで本研究では,因果グラフが既知であるという制約を緩和する新しいXAIフレームワークを提案する。この枠組みは,反事実確率と因果構造に関する追加の事前情報を活用し,因果発見法により推定される因果グラフとブラックボックス分類モデルの統合を容易にする。さらに,反事実確率に基づいて説明スコアを推定した。人工データを用いた数値実験により,因果グラフがない場合よりも説明スコアを正確に推定できる可能性が確認された。最後に,実データへの適用として,滋賀県の滋賀銀行が割り当てた信用格付けの分類モデルを構築した。因果グラフが不明な場合に,提案手法の有効性を実証した。 Explainable artificial intelligence (XAI) has helped elucidate the internal mechanisms of machine learning algorithms, bolstering their reliability by demonstrating the basis of their predictions. Several XAI models consider causal relationships to explain models by examining the input-output relationships of prediction models and the dependencies between features. The majority of these models have based their explanations on counterfactual probabilities, assuming that the causal graph is known. However, this assumption complicates the application of such models to real data, given that the causal relationships between features are unknown in most cases. Thus, this study proposed a novel XAI framework that relaxed the constraint that the causal graph is known. This framework leveraged counterfactual probabilities and additional prior information on causal structure, facilitating the integration of a causal graph estimated through causal discovery methods and a black-box classification model. Furthermore, explanatory scores were estimated based on counterfactual probabilities. 
Numerical experiments conducted employing artificial data confirmed the possibility of estimating the explanatory score more accurately than in the absence of a causal graph. Finally, as an application to real data, we constructed a classification model of credit ratings assigned by Shiga Bank, Shiga prefecture, Japan. We demonstrated the effectiveness of the proposed method in cases where the causal graph is unknown.	翻訳日:2024-02-06 18:12:59 公開日:2024-02-05
# zkSNARKを用いた機械学習モデルの検証評価 Verifiable evaluations of machine learning models using zkSNARKs ( http://arxiv.org/abs/2402.02675v1 ) ライセンス: Link先を確認	Tobin South, Alexander Camuto, Shrey Jain, Shayla Nguyen, Robert Mahari, Christian Paquin, Jason Morton, Alex 'Sandy' Pentland	(参考訳) クローズドソースの商用機械学習モデルの増加の世界では、開発者によるモデル評価を顔の値で行う必要があります。これらのベンチマーク結果(タスク精度、バイアス評価、安全チェック)は、ブラックボックスモデル出力でベンチマークを再実行するコストや不可能なプロセスなしで、モデルエンドユーザーによる検証は従来不可能である。本研究は,zkSNARKによるモデル推論を用いたモデルの検証手法を提案する。結果として得られた、データセット上のモデル出力のゼロ知識計算証明を検証可能な評価アテレーションにパッケージ化することで、固定されたプライベートウェイトを持つモデルが、指定されたパフォーマンスまたは公開入力に対する公平性メトリクスを達成することを示すことができる。これらの検証可能な検証は、計算要件の異なる任意の標準ニューラルネットワークモデル上で実行できる。実世界のモデルのサンプルを通じてこれを初めて実証し、重要な課題と設計ソリューションを強調する。これは、プライベートモデルの検証可能な評価において、新しい透明性パラダイムを示す。 In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. These verifiable attestations can be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.	翻訳日:2024-02-06 18:12:38 公開日:2024-02-05
# 分散データに対する条件付き平均治療効果の推定--プライバシ保存アプローチ Estimation of conditional average treatment effects on distributed data: A privacy-preserving approach ( http://arxiv.org/abs/2402.02672v1 ) ライセンス: Link先を確認	Yuji Kawamata, Ryoki Motai, Yukihiko Okada, Akira Imakura, Tetsuya Sakurai	(参考訳) 条件平均治療効果(CATE)の推定は、医学や社会科学など様々な分野において重要なトピックである。複数のパーティにわたる分散データが集中できる場合、CATEは高い精度で推定できる。しかし、プライバシー情報を含む場合、そのようなデータを集約することは困難である。そこで本研究では,分散データのプライバシ保存を伴うCATEモデルの推定手法であるDC-DML(Data collaboration double machine learning)を提案し,数値実験により評価した。私たちの貢献は以下の3点にまとめられている。まず,分散データ上で反復的な通信を行うことなく,半パラメトリックCATEモデルの推定とテストを可能にする。半パラメトリックまたは非パラメトリックCATEモデルは、パラメトリックモデルよりも誤特定をモデル化するのに堅牢な推定とテストを可能にする。しかし,分散データ上で半パラメトリック・非パラメトリック・ケートモデルを推定・評価するための通信効率のよい手法は提案されていない。第2に,次元レデュースした中間表現を蓄積できるため,複数の時間点とパーティ間の協調的な推定が可能となる。第3に, 合成, 半合成, 実世界のデータセットを用いた評価実験において, 本手法は, 他の手法よりも優れていた。 Estimation of conditional average treatment effects (CATEs) is an important topic in various fields such as medical and social sciences. CATEs can be estimated with high accuracy if distributed data across multiple parties can be centralized. However, it is difficult to aggregate such data if they contain privacy information. To address this issue, we proposed data collaboration double machine learning (DC-DML), a method that can estimate CATE models with privacy preservation of distributed data, and evaluated the method through numerical experiments. Our contributions are summarized in the following three points. First, our method enables estimation and testing of semi-parametric CATE models without iterative communication on distributed data. Semi-parametric or non-parametric CATE models enable estimation and testing that is more robust to model mis-specification than parametric models. However, to our knowledge, no communication-efficient method has been proposed for estimating and testing semi-parametric or non-parametric CATE models on distributed data. 
Second, our method enables collaborative estimation between different parties as well as multiple time points because the dimensionality-reduced intermediate representations can be accumulated. Third, our method performed as well or better than other methods in evaluation experiments using synthetic, semi-synthetic and real-world datasets.	翻訳日:2024-02-06 18:12:22 公開日:2024-02-05
# ユーティリティベース強化学習:単一目的と多目的の強化学習の統合 Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning ( http://arxiv.org/abs/2402.02665v1 ) ライセンス: Link先を確認	Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers	(参考訳) 多目的強化学習(MORL)の研究は、環境報酬と、それらの報酬からユーザによって導出されるユーティリティを定義する関数の両方を利用するユーティリティベースのパラダイムを導入した。本稿では,このパラダイムを単目的強化学習(RL)の文脈にまで拡張し,不確実な目標,リスク対応RL,割引,安全なRLに関するタスク間でマルチポリシー学習を行う能力を含む,複数の潜在的なメリットについて概説する。また,ユーティリティベースのアプローチを採用することのアルゴリズム的意味についても検討する。 Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.	翻訳日:2024-02-06 18:12:00 公開日:2024-02-05
# カウンターファクトフェアネスはデモグラフィーのパリティではない、その他の観測結果 Counterfactual Fairness Is Not Demographic Parity, and Other Observations ( http://arxiv.org/abs/2402.02663v1 ) ライセンス: Link先を確認	Ricardo Silva	(参考訳) 因果概念と純粋に確率的概念の等価性の包括的記述は注意してアプローチすべきである。本稿では, 反実的公正性は人口的平等と等価であるという最近の主張を考察する。その主張は精査には耐えられない。虚偽の公正について、より広い誤解に対処する機会を得ます。 Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.	翻訳日:2024-02-06 18:11:50 公開日:2024-02-05
# ゼロショット一般化改善のための画像キャプションエンコーディング Image-Caption Encoding for Improving Zero-Shot Generalization ( http://arxiv.org/abs/2402.02662v1 ) ライセンス: Link先を確認	Eric Yang Yu, Christopher Liao, Sathvik Ravi, Theodoros Tsiligkaridis, Brian Kulis	(参考訳) 近年の視覚言語モデルの進歩は、対照学習アプローチと生成手法を組み合わせることで、ゼロショット画像分類のような下流推論タスクにおいて最先端(SOTA)を実現している。しかし、画像分類におけるこれらのモデルの永続的な問題は、そのアウト・オブ・ディストリビューション(OOD)一般化能力である。最初に、OODデータポイントが誤って分類された場合、正しいクラスがTop-K予測クラスの中によく見られることを示す。上位予測クラス内の正しいクラスへとモデル予測を導くために、評価時にのみ、画像条件付き予測とキャプション条件付き予測の一貫性を直接強制する簡単なアプローチであるImage-Caption Encoding (ICE) 法を提案する。直感的には、生成されたキャプションのユニークな特性を利用して、Top-K予測クラス内の正しいクラスラベルを局所的に探索する。本手法は他のSOTA法と容易に組み合わせることができ,Top-1 OOD精度を平均0.5%,挑戦的データセットで最大3%向上できることを示す。コード:https://github.com/Chris210634/ice Recent advances in vision-language models have combined contrastive approaches with generative methods to achieve state-of-the-art (SOTA) on downstream inference tasks like zero-shot image classification. However, a persistent issue of these models for image classification is their out-of-distribution (OOD) generalization capabilities. We first show that when an OOD data point is misclassified, the correct class can be typically found in the Top-K predicted classes. In order to steer the model prediction toward the correct class within the top predicted classes, we propose the Image-Caption Encoding (ICE) method, a straightforward approach that directly enforces consistency between the image-conditioned and caption-conditioned predictions at evaluation time only. Intuitively, we take advantage of unique properties of the generated captions to guide our local search for the correct class label within the Top-K predicted classes. We show that our method can be easily combined with other SOTA methods to enhance Top-1 OOD accuracies by 0.5% on average and up to 3% on challenging datasets. Our code: https://github.com/Chris210634/ice	翻訳日:2024-02-06 18:11:44 公開日:2024-02-05
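The evaluation-time idea described in the abstract above (use caption-conditioned predictions to re-rank only among the image model's Top-K classes) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the additive combination rule, the fixed `caption_weight`, and the function name `rerank_top_k` are hypothetical, and the paper may weight or select the caption signal differently.

```python
from typing import Sequence


def rerank_top_k(
    image_probs: Sequence[float],
    caption_probs: Sequence[float],
    k: int = 5,
    caption_weight: float = 0.1,
) -> int:
    """Pick a class among the image model's Top-K using caption evidence.

    image_probs:   class probabilities conditioned on the image.
    caption_probs: class probabilities conditioned on a generated caption.
    The search is local: classes outside the image Top-K are never chosen,
    however strongly the caption favours them.
    """
    if len(image_probs) != len(caption_probs):
        raise ValueError("both predictions must cover the same classes")
    if k <= 0:
        raise ValueError("k must be positive")
    # Indices of the K classes the image-conditioned prediction ranks highest.
    top_k = sorted(
        range(len(image_probs)), key=lambda c: image_probs[c], reverse=True
    )[:k]
    # Assumed combination rule: image score plus a down-weighted caption score.
    return max(top_k, key=lambda c: image_probs[c] + caption_weight * caption_probs[c])
```

With `caption_weight=0.0` the function reduces to the ordinary Top-1 prediction, which makes the caption's contribution easy to ablate.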
# 検証器による多段階問題の解法:モデル誘起プロセススーパービジョンの実証分析 Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision ( http://arxiv.org/abs/2402.02658v1 ) ライセンス: Link先を確認	Zihan Wang, Yunxuan Li, Yuexin Wu, Liangchen Luo, Le Hou, Hongkun Yu, Jingbo Shang	(参考訳) プロセス監視は、学習された検証器を用いて、推論器が生成する中間ステップを評価するものであり、多段階問題解決において大きな改善を示している。本稿では,検証器の学習データに対する高価な人手アノテーションを避けるために,データキュレーションを自動化する新しい手法であるモデル誘起プロセススーパービジョン(MiPS)を紹介する。MiPSは、推論モデルを通じてこの解の完了をサンプリングし、正しい完了の比率として定義される精度を得ることによって中間ステップを注釈する。推論器の誤りによりMiPSは中間ステップの精度を過小評価してしまうため,先行研究とは対照的に、検証器の高い予測スコアに着目した検証が低い予測スコアに着目した検証よりも望ましいことを提案し,実証的に示す。提案手法は数学およびコーディングタスクにおける PaLM 2 の性能を著しく向上させる(出力監視で訓練した検証器と比較して、精度は GSM8K で +0.67%,MATH で +4.16%,MBPP で +0.92%)。さらに, 検証器は, 異なる推論モデルにまたがる強い一般化能力を示すことを示した。 Process supervision, using a trained verifier to evaluate the intermediate steps generated by a reasoner, has demonstrated significant improvements in multi-step problem solving. In this paper, to avoid expensive human annotation effort on the verifier training data, we introduce Model-induced Process Supervision (MiPS), a novel method for automating data curation. MiPS annotates an intermediate step by sampling completions of this solution through the reasoning model, and obtaining an accuracy defined as the proportion of correct completions. Errors in the reasoner would cause MiPS to underestimate the accuracy of intermediate steps; therefore, we suggest and empirically show that verification focusing on high predicted scores of the verifier should be preferred over that of low predicted scores, contrary to prior work. Our approach significantly improves the performance of PaLM 2 on math and coding tasks (accuracy +0.67% on GSM8K, +4.16% on MATH, +0.92% on MBPP compared with an output supervision trained verifier). Additionally, our study demonstrates that the verifier exhibits strong generalization ability across different reasoning models.	翻訳日:2024-02-06 18:11:26 公開日:2024-02-05
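The annotation rule in the MiPS abstract above (score an intermediate step by the proportion of sampled completions that end in a correct answer) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `mips_step_score`, `toy_reasoner`, and all parameter names are hypothetical stand-ins, and nothing here is taken from the authors' implementation.

```python
import random
from typing import Callable, List


def mips_step_score(
    prefix: List[str],
    sample_completion: Callable[[List[str], random.Random], str],
    is_correct: Callable[[str], bool],
    num_samples: int = 8,
    seed: int = 0,
) -> float:
    """Estimate the accuracy of a partial solution (hypothetical helper).

    The score is the fraction of sampled completions of `prefix`
    whose final answer is judged correct.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        final_answer = sample_completion(prefix, rng)
        if is_correct(final_answer):
            hits += 1
    return hits / num_samples


def toy_reasoner(prefix: List[str], rng: random.Random) -> str:
    """Hypothetical imperfect reasoner for the question "2 + 3 = ?".

    From a sound prefix it answers correctly about 75% of the time;
    from a prefix that already contains an error it never recovers.
    """
    if "error" in " ".join(prefix):
        return "6"
    return "5" if rng.random() < 0.75 else "4"


good_score = mips_step_score(["add 2 and 3"], toy_reasoner, lambda a: a == "5")
bad_score = mips_step_score(["error: add 2 and 4"], toy_reasoner, lambda a: a == "5")
```

Because the toy reasoner itself makes mistakes, the score of a sound step can fall below 1. That is the underestimation effect the abstract mentions, and it is why low scores are less trustworthy than high ones in this setup.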
# 可変結合を持つトランスモン量子ビットアレイの二次元トポロジー効果 Two-dimensional topological effect in a transmon qubit array with tunable couplings ( http://arxiv.org/abs/2402.02657v1 ) ライセンス: Link先を確認	Yan-Jun Zhao, Yu-Qi Wang, Yang Xue, Xun-Wei Xu, Yan-Yang Zhang, Wu-Ming Liu, Yu-xi Liu	(参考訳) インダクティブ・カプラを媒介とする量子ビット間相互作用を有する超伝導トランスモン量子ビットの正方格子構造について検討する。そこでは, 量子ビットとカプラ間の誘導性結合を, 環境に由来するフラックスノイズを軽減するために, グラジオメータ形状に設計することを提案する。カプラを周期的に変調することで、実効磁束と呼ばれるアベリアンゲージポテンシャルを人工的に合成することができ、この系は2次元トポロジカル物理をシミュレーションするための優れたプラットフォームとなる。最も単純な2次元モデルである二重(すなわち3本脚)ラダーでは、実効磁束が変化するにつれて、2本脚ラダーの場合とは異なるスタッガード渦-マイスナー相転移が単粒子基底状態に現れる。さらに、脚間結合強度と脚内結合強度の比が大きい場合、カイラル電流は押しつぶされた正弦波関数に類似する。行数がさらに増加すると、多数の行で期待されるトポロジカルバンド構造が比較的少数の行(考慮されたパラメータでは10程度)でも現れ始める。これは、小さな回路規模でもトポロジカルバンドを観測できることを示している。バンドギャップ内のエッジ状態はトポロジカルチャーン数によって決定され、チャーン数は第1ブリルアンゾーンにわたるベリー曲率の積分により計算できる。さらに,適切に励起した後の波動関数の時間領域および空間領域のフーリエ変換に基づいて,トポロジカルバンド構造を計測する方法を体系的に提案する。この結果は、最先端の超伝導量子チップ上で二次元トポロジカル物理をシミュレーションするための道を提供する。 We investigate a square-lattice architecture of superconducting transmon qubits with inter-qubit interactions mediated by inductive couplers. Therein, the inductive coupling between the qubit and couplers is suggested to be designed into the gradiometer form to mitigate the flux noise originating from the environment. Via periodically modulating the couplers, the Abelian gauge potential, termed effective magnetic flux, can be synthesized artificially, making the system an excellent platform for simulating two-dimensional topological physics. In the simplest two-dimensional model, the double (or three-leg) ladder, the staggered vortex-Meissner phase transition different from that in the two-leg ladder can be found in the single-particle ground state as the effective magnetic flux varies. Besides, the large coupling ratio between the interleg and intraleg coupling strengths also makes the chiral current resemble squeezed sinusoidal functions. If the row number is further increased, the topological band structure anticipated at massive rows begins to occur even for a relatively small number of rows (ten or so for the considered parameters). This suggests that a small circuit scale suffices to observe the topological band. The edge state in the band gap is determined by the topological Chern number, which can be calculated by integrating the Berry curvature over the first Brillouin zone. Besides, we present a systematic method for measuring the topological band structure based on time- and space-domain Fourier transformation of the wave function after proper excitation. The result offers an avenue for simulating two-dimensional topological physics on the state-of-the-art superconducting quantum chips.	翻訳日:2024-02-06 18:11:04 公開日:2024-02-05
# RACER:半構造化メンタルヘルスインタビューのスケーラブル分析のためのLLMを利用した方法論 RACER: An LLM-powered Methodology for Scalable Analysis of Semi-structured Mental Health Interviews ( http://arxiv.org/abs/2402.02656v1 ) ライセンス: Link先を確認	Satpreet Harcharan Singh, Kevin Jiang, Kanchan Bhasin, Ashutosh Sabharwal, Nidal Moukaddam, Ankit B Patel	(参考訳) 半構造化面接(SSI)は、医療研究において一般的に用いられるデータ収集手法であり、被験者の体験に関する詳細な質的な洞察を提供する。その価値にもかかわらず、SSIの手作業による分析は、感情的な反応を抽出し分類することの難しさや、大集団に対して人間の評価をスケールすることの難しさから、時間と労力がかかることで悪名高い。本研究では,生のインタビュー書き起こしを洞察に富んだドメイン関連のテーマとサブテーマに効率的に変換する、大規模言語モデル (LLM) ベースの専門家主導の自動化パイプラインであるRACERを開発した。我々はRACERを用いて、93人の医療専門家と研修生に対するSSIを分析し、COVID-19危機が個人的・職業的なメンタルヘルスに与えた幅広い影響を評価する。RACERは2人の人間評価者と適度に高い一致率(72%)を達成し、これは人間の評価者間一致率(77%)に迫る。興味深いことに、LLMと人間は、ニュアンスのある感情的、両価的/弁証法的、心理的な発言を含む同様のコンテンツに苦戦する。本研究は、LLMを用いた研究効率向上の機会と課題を強調し、医療研究におけるSSIのスケーラブルな分析のための新たな道を開く。 Semi-structured interviews (SSIs) are a commonly employed data-collection method in healthcare research, offering in-depth qualitative insights into subject experiences. Despite their value, the manual analysis of SSIs is notoriously time-consuming and labor-intensive, in part due to the difficulty of extracting and categorizing emotional responses, and challenges in scaling human evaluation for large populations. In this study, we develop RACER, a Large Language Model (LLM) based expert-guided automated pipeline that efficiently converts raw interview transcripts into insightful domain-relevant themes and sub-themes. We used RACER to analyze SSIs conducted with 93 healthcare professionals and trainees to assess the broad personal and professional mental health impacts of the COVID-19 crisis. RACER achieves moderately high agreement with two human evaluators (72%), which approaches the human inter-rater agreement (77%). Interestingly, LLMs and humans struggle with similar content involving nuanced emotional, ambivalent/dialectical, and psychological statements. Our study highlights the opportunities and challenges in using LLMs to improve research efficiency and opens new avenues for scalable analysis of SSIs in healthcare research.	翻訳日:2024-02-06 18:10:37 公開日:2024-02-05
# VlogQA:ベトナム語の話し言葉に基づく機械読解のためのタスク,データセット,ベースラインモデル VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension ( http://arxiv.org/abs/2402.02655v1 ) ライセンス: Link先を確認	Thinh Phuoc Ngo, Khoa Tran Anh Dang, Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen	(参考訳) 本稿では、機械読解(MRC)タスクのためのベトナム語話し言葉コーパスの開発プロセスについて述べるとともに、実世界データを機械読解タスクに用いる際の課題と機会について考察する。ベトナム語の既存のMRCコーパスは主にウィキペディアの記事、オンライン新聞、教科書などの公式な文書に焦点を当てている。対照的に、VlogQAは10,076の質問回答ペアで構成されており、ユーザ投稿コンテンツの豊富な供給源であるYouTubeから取得した、食と旅行を話題とする1,230件の書き起こし文書に基づく。ベトナム語研究で見落とされてきた、自然な環境でのベトナム語母語話者の話し言葉を捉えることで、このコーパスはベトナム語の読解タスクに関する将来の研究に貴重な資源を提供する。性能評価では,我々の深層学習モデルはテストセットで最高75.34%のF1スコアを達成し、ベトナム語話し言葉データに対する機械読解の大幅な進歩を示した。EMに関しては、最高スコアは53.97%であり、話し言葉ベースのコンテンツ処理の難しさを反映し、さらなる改善の必要性を強調している。 This paper presents the development process of a Vietnamese spoken language corpus for machine reading comprehension (MRC) tasks and provides insights into the challenges and opportunities associated with using real-world data for machine reading comprehension tasks. The existing MRC corpora in Vietnamese mainly focus on formal written documents such as Wikipedia articles, online newspapers, or textbooks. In contrast, the VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube -- an extensive source of user-uploaded content, covering the topics of food and travel. By capturing the spoken language of native Vietnamese speakers in natural settings, an obscure corner overlooked in Vietnamese research, the corpus provides a valuable resource for future research in reading comprehension tasks for the Vietnamese language. Regarding performance evaluation, our deep-learning models achieved the highest F1 score of 75.34% on the test set, indicating significant progress in machine reading comprehension for Vietnamese spoken language data. In terms of EM, the highest score we accomplished is 53.97%, which reflects the challenge in processing spoken-based content and highlights the need for further improvement.	翻訳日:2024-02-06 18:10:14 公開日:2024-02-05
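The VlogQA abstract above reports F1 and exact match (EM) scores. For readers unfamiliar with these metrics, the sketch below shows the common extractive-QA definitions: EM checks whether the normalized strings are identical, and F1 is computed over overlapping tokens. The normalization (lowercasing and whitespace splitting) is an assumption made for illustration; the paper's exact evaluation script may differ.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (assumed, simplified normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# A prediction that contains the reference plus two extra tokens.
em = exact_match("banh mi Sai Gon", "banh mi")
f1 = token_f1("banh mi Sai Gon", "banh mi")
```

In this example the prediction is not an exact match, yet it still earns partial F1 credit (precision 0.5, recall 1.0). This is one reason the F1 figure in a QA benchmark is usually well above the EM figure.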
# ポジションペーパー:大規模言語モデルから時系列分析について何がわかるか Position Paper: What Can Large Language Models Tell Us about Time Series Analysis ( http://arxiv.org/abs/2402.02713v1 ) ライセンス: Link先を確認	Ming Jin, Yifan Zhang, Wei Chen, Kexin Zhang, Yuxuan Liang, Bin Yang, Jindong Wang, Shirui Pan, Qingsong Wen	(参考訳) 時系列解析は、様々な現実世界のシステムやアプリケーションに固有の複雑さを理解するために不可欠である。大規模言語モデル(LLM)は近年大きな進歩を遂げているが、時系列解析機能を備えた汎用人工知能(AGI)の開発はまだ初期段階にある。既存の時系列モデルの多くはドメイン知識と広範囲なモデルチューニングに大きく依存しており、主に予測タスクに焦点を当てている。本稿では,現在のLLMは時系列分析に革命を起こす可能性を持ち,それにより効率的な意思決定を促進し,より普遍的な時系列分析インテリジェンスへと前進できると主張する。このような進歩は、モダリティスイッチングや時系列質問応答など、幅広い可能性を解き放ちうる。我々は,時系列分析の発展におけるLLMの可能性を研究者や実践者に認識するよう促し,関連する取り組みに対する信頼の必要性を強調する。さらに,時系列解析と既存のLLM技術とのシームレスな統合について詳述し,今後の研究に有望な道筋を概説する。 Time series analysis is essential for comprehending the complexities inherent in various real-world systems and applications. Although large language models (LLMs) have recently made significant strides, the development of artificial general intelligence (AGI) equipped with time series analysis capabilities remains in its nascent phase. Most existing time series models heavily rely on domain knowledge and extensive model tuning, predominantly focusing on prediction tasks. In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. Such advancement could unlock a wide range of possibilities, including modality switching and time series question answering. We encourage researchers and practitioners to recognize the potential of LLMs in advancing time series analysis and emphasize the need for trust in these related efforts. Furthermore, we detail the seamless integration of time series analysis with existing LLM technologies and outline promising avenues for future research.	翻訳日:2024-02-06 18:01:55 公開日:2024-02-05
# 物理インフォームドニューラルネットワークの最適化のためのアーキテクチャ戦略 Architectural Strategies for the optimization of Physics-Informed Neural Networks ( http://arxiv.org/abs/2402.02711v1 ) ライセンス: Link先を確認	Hemanth Saratchandran, Shin-Fang Chng, Simon Lucey	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は、深層学習に基礎的な物理原理を組み込むことで、偏微分方程式(PDE)における順問題と逆問題の両方に取り組む有望な道を提供する。その顕著な経験的成功にもかかわらず、PINNは様々なPDEにおいて訓練が難しいことで悪名高い。本稿では,ニューラルアーキテクチャの観点からPINN最適化の複雑さを考察する。ニューラル・タンジェント・カーネル(NTK)を用いて,PINNを効果的に訓練するうえでガウス活性化がいくつかの代替活性化を上回ることを明らかにした。数値線形代数学からの洞察に基づいて,事前条件付きニューラルアーキテクチャを導入し,このように調整されたアーキテクチャが最適化プロセスをどのように強化するかを示す。科学文献で確立されたPDEに対する厳密な検証を通じて,本研究の理論的知見を裏付ける。 Physics-informed neural networks (PINNs) offer a promising avenue for tackling both forward and inverse problems in partial differential equations (PDEs) by incorporating deep learning with fundamental physics principles. Despite their remarkable empirical success, PINNs have garnered a reputation for their notorious training challenges across a spectrum of PDEs. In this work, we delve into the intricacies of PINN optimization from a neural architecture perspective. Leveraging the Neural Tangent Kernel (NTK), our study reveals that Gaussian activations surpass several alternate activations when it comes to effectively training PINNs. Building on insights from numerical linear algebra, we introduce a preconditioned neural architecture, showcasing how such tailored architectures enhance the optimization process. Our theoretical findings are substantiated through rigorous validation against established PDEs within the scientific literature.	翻訳日:2024-02-06 18:01:38 公開日:2024-02-05
# エキシトンオプトメカニクスを用いた2つのエキシトンモードのエンタングルメント生成 Entangling two exciton modes using exciton optomechanics ( http://arxiv.org/abs/2402.02710v1 ) ライセンス: Link先を確認	Xuan Zuo, Zhi-Yuan Fan, Huai-Bing Zhu, Jie Li	(参考訳) キャビティエキシトンポラリトンとオプトメカニクスを橋渡しするエキシトンオプトメカニクスは、エキシトン、フォノン、光子間の豊富な非線形結合により、光と物質の強い相互作用と非線形性の研究に新たな機会を開く。本稿では,2つの量子井戸と集積された半導体オプトメカニカルマイクロキャビティからなるエキシトン-オプトメカニカルシステムにおいて,2つのエキシトンモードをエンタングルさせることを提案する。量子井戸は2つのエキシトンモードをサポートし、これらは線形結合を介して光キャビティモードに同時に結合する。キャビティモードはさらに、放射圧と光弾性効果の両方を考慮した分散的オプトメカニカル相互作用を介して機械振動モードに結合する。我々は, マイクロキャビティを赤方離調したレーザー場で強く駆動し、2つのエキシトンモードが機械的運動によって散乱されたストークスおよび反ストークスサイドバンドにそれぞれ共鳴している場合, 現在利用可能なパラメータの下で、2つのエキシトンモード間に定常的なエンタングルメントを確立できることを示す。エンタングルメントは系の様々な散逸に対して堅牢であり、機械的品質係数が $\sim10^4$ を超えれば室温でも達成できる。 Exciton optomechanics, bridging cavity exciton polaritons and optomechanics, opens new opportunities for the study of strong light-matter interactions and nonlinearities, due to the rich nonlinear couplings among excitons, phonons, and photons. Here, we propose to entangle two exciton modes in an exciton-optomechanical system, which consists of a semiconductor optomechanical microcavity integrated with two quantum wells. The quantum wells support two exciton modes, which simultaneously couple to an optical cavity mode via a linear coupling, and the cavity mode also couples to a mechanical vibration mode via a dispersive optomechanical interaction, accounting for both the radiation pressure and the photoelastic effect. We show that by strongly driving the microcavity with a red-detuned laser field and when the two exciton modes are respectively resonant with the Stokes and anti-Stokes sidebands scattered by the mechanical motion, stationary entanglement between the two exciton modes can be established under currently available parameters. The entanglement is robust against various dissipations of the system and can be achieved at room temperature for a mechanical quality factor higher than $\sim10^4$.	翻訳日:2024-02-06 18:01:25 公開日:2024-02-05
# 伝令付き単一光子源を用いたパッシブデコイ状態量子セキュア直接通信 Passive decoy-state quantum secure direct communication with heralded single-photon source ( http://arxiv.org/abs/2402.02709v1 ) ライセンス: Link先を確認	Jia-Wei Ying, Peng Zhao, Wei Zhong, Ming-Ming Du, Xi-Yun Li, Shu-Ting Shen, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng	(参考訳) 量子セキュアダイレクト通信(QSDC)は、秘密メッセージを鍵なしで量子チャネルを介して直接送信することができる。不完全な光子源はQSDCの実用化において大きな障害となる。不完全な光子源から放出される望ましくない真空状態と多光子成分は、QSDCの秘密メッセージ容量を大幅に減らし、その安全性を脅かすことさえある。本稿では,伝令付き単一光子源(HSPS)を用いた高効率な受動デコイ状態QSDCプロトコルを提案する。我々は、2つの空間モードでエンタングルした光子対を放出するために自発パラメトリック下方変換源を採用する。2つの相関した空間モードのうちの一方で光子を検出することで、他方の空間モードの光子数分布を推定することができる。また,本プロトコルは信号状態とデコイ状態の簡易な受動的準備を可能にする。HSPSは真空状態の確率を効果的に低減し、QSDCの秘密メッセージ容量を増大させることができる。一方、受動的デコイ状態法は、実験操作を簡素化し、第三者によるサイドチャネル攻撃に対するQSDCの堅牢性を高めることができる。10kmの通信距離において、本QSDCプロトコルの秘密メッセージ容量は、HSPSを用いない元の単一光子ベースQSDCプロトコルの81.85倍(平均光子数0.1)および12.79倍(平均光子数0.01)に達する。本QSDCプロトコルは最大通信距離もより長い(平均光子数0.01で約17.975km)。我々の研究は、実用的な受動デコイ状態QSDCシステムのさらなる発展に向けた大きなステップとなる。 Quantum secure direct communications (QSDC) can directly transmit secret messages through a quantum channel without keys. The imperfect photon source is a major obstacle for QSDC's practical implementation. The unwanted vacuum state and multi-photon components emitted from an imperfect photon source largely reduce QSDC's secrecy message capacity and even threaten its security. In this paper, we propose a highly efficient passive decoy-state QSDC protocol with the heralded single-photon source (HSPS). We adopt a spontaneous parametric down-conversion source to emit entangled photon pairs in two spatial modes. By detecting the photons in one of the two correlated spatial modes, we can infer the photon number distribution of the other spatial mode. Meanwhile, our protocol allows a simple passive preparation of the signal states and decoy state. The HSPS can effectively reduce the probability of vacuum state and increase QSDC's secrecy message capacity. Meanwhile, the passive decoy-state method can simplify the experimental operations and enhance QSDC's robustness against the third-party side-channel attacks. Under the communication distance of 10 km, the secrecy message capacity of our QSDC protocol can achieve 81.85 times (average photon number of 0.1) and 12.79 times (average photon number of 0.01) of that in the original single-photon-based QSDC protocol without the HSPS. Our QSDC protocol has a longer maximal communication distance (about 17.975 km with average photon number of 0.01). Our work serves as a major step toward the further development of practical passive decoy-state QSDC systems.	翻訳日:2024-02-06 18:01:00 公開日:2024-02-05
# マルチタスクモデルマージのための表現手術 Representation Surgery for Multi-Task Model Merging ( http://arxiv.org/abs/2402.02705v1 ) ライセンス: Link先を確認	Enneng Yang and Li Shen and Zhenyi Wang and Guibing Guo and Xiaojun Chen and Xingwei Wang and Dacheng Tao	(参考訳) マルチタスク学習(MTL)は、複数のタスクからの情報を統一されたバックボーンに圧縮し、計算効率と一般化を改善する。最近の研究は、共同トレーニングのために生データを収集する代わりに、独立に訓練された複数のモデルを直接マージしてMTLを実行することで、MTLの応用シナリオを大きく拡大している。しかし、既存のモデルマージ手法の表現分布を可視化すると、マージされたモデルはしばしば表現バイアスのジレンマに苦しむことが分かる。つまり、マージされたモデルと個々のモデルの表現分布に大きな差があり、結果としてマージによるMTLの性能は低下する。本稿では,マージされたモデルにおける表現バイアスを低減するために,"Surgery"と呼ばれる表現手術ソリューションを提案する。具体的には、Surgeryは、マージされたモデルの表現を入力とし、その表現に含まれるバイアスを出力しようとする軽量なタスク固有モジュールである。さらに,マージされたモデルの表現と個々のモデルの表現との距離を最小化することでSurgeryモジュールを更新する教師なし最適化目標を設計した。広範な実験により、Surgeryモジュールを最先端(SOTA)のモデルマージ手法に適用するとMTL性能が有意に向上することを示した。 Multi-task learning (MTL) compresses the information from multiple tasks into a unified backbone to improve computational efficiency and generalization. Recent work directly merges multiple independently trained models to perform MTL instead of collecting their raw data for joint training, greatly expanding the application scenarios of MTL. However, by visualizing the representation distribution of existing model merging schemes, we find that the merged model often suffers from the dilemma of representation bias. That is, there is a significant discrepancy in the representation distribution between the merged and individual models, resulting in poor performance of merged MTL. In this paper, we propose a representation surgery solution called "Surgery" to reduce representation bias in the merged model. Specifically, Surgery is a lightweight task-specific module that takes the representation of the merged model as input and attempts to output the biases contained in the representation from the merged model. We then design an unsupervised optimization objective that updates the Surgery module by minimizing the distance between the merged model's representation and the individual model's representation. Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes.	翻訳日:2024-02-06 18:00:33 公開日:2024-02-05
# 視覚強化学習における一般化ギャップの理解:理論と実証的エビデンス Understanding What Affects Generalization Gap in Visual Reinforcement Learning: Theory and Empirical Evidence ( http://arxiv.org/abs/2402.02701v1 ) ライセンス: Link先を確認	Jiafei Lyu, Le Wan, Xiu Li, Zongqing Lu	(参考訳) 近年,視覚強化学習(RL)における連続制御のための有用なポリシーを学習しようとする試みが数多く行われている。このシナリオでは、テスト環境がトレーニング環境と異なる可能性がある(例えば、デプロイ時にディストラクタが存在する)ため、一般化可能なポリシーを学ぶことが重要である。この問題に対処するために多くの実用的なアルゴリズムが提案されている。しかしながら、我々の知る限りでは、何が一般化ギャップに影響し、なぜ提案手法が機能するのかについて理論的理解を与えるものはない。本稿では,テスト環境にディストラクタが存在する場合に一般化ギャップに寄与する重要な要因を理論的に明らかにすることで,このギャップを埋める。我々の理論は、人間の直観とも一致するように、トレーニング環境とテスト環境の間の表現距離を最小化することが、一般化ギャップを減らすうえで最も重要であることを示している。我々の理論結果はDMControl Generalization Benchmark (DMC-GB) における実証的な証拠によって裏付けられている。 Recently, many efforts have attempted to learn useful policies for continuous control in visual reinforcement learning (RL). In this scenario, it is important to learn a generalizable policy, as the testing environment may differ from the training environment, e.g., there exist distractors during deployment. Many practical algorithms are proposed to handle this problem. However, to the best of our knowledge, none of them provide a theoretical understanding of what affects the generalization gap and why their proposed methods work. In this paper, we address this gap by theoretically answering the key factors that contribute to the generalization gap when the testing environment has distractors. Our theories indicate that minimizing the representation distance between training and testing environments, which aligns with human intuition, is the most critical factor in reducing the generalization gap. Our theoretical results are supported by the empirical evidence in the DMControl Generalization Benchmark (DMC-GB).	翻訳日:2024-02-06 18:00:12 公開日:2024-02-05
# 線形文脈 MDP のサンプル複雑性の特徴付け Sample Complexity Characterization for Linear Contextual MDPs ( http://arxiv.org/abs/2402.02700v1 ) ライセンス: Link先を確認	Junze Deng, Yuan Cheng, Shaofeng Zou and Yingbin Liang	(参考訳) 文脈マルコフ決定プロセス(CMDP)は、遷移カーネルと報酬関数が、コンテキスト変数によってインデックス付けされた異なるMDPとして時間とともに変化しうる強化学習問題のクラスを記述する。CMDPは、時間変動環境を持つ多くの実世界アプリケーションをモデル化するための重要なフレームワークであるが、理論的な観点からはほとんど探索されていない。本稿では,2つの線形関数近似モデル、すなわち文脈に応じて変化する表現とすべての文脈に共通の線形重みを持つモデルI、およびすべての文脈に共通の表現と文脈に応じて変化する線形重みを持つモデルIIの下でCMDPを検討する。いずれのモデルにおいても,新しいモデルベースアルゴリズムを提案し,所望の多項式サンプル複雑性で $\epsilon$-準最適性ギャップが保証されることを示す。特に,1つ目のモデルに対する結果を表形式CMDPに具体化すると,到達可能性の仮定を取り除くことで既存の結果が改善される。2つ目のモデルに対する結果は、この種の関数近似モデルに対して初めて知られた結果である。さらに,2つのモデルの結果を比較すると、線形CMDPの下では、文脈に応じて変化する特徴を持つ方が、すべての文脈に共通の表現を持つよりもはるかに優れたサンプル効率が得られることが示される。 Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable. While CMDPs serve as an important framework to model many real-world applications with time-varying environments, they are largely unexplored from a theoretical perspective. In this paper, we study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights. For both models, we propose novel model-based algorithms and show that they enjoy guaranteed $\epsilon$-suboptimality gap with desired polynomial sample complexity. In particular, instantiating our result for the first model to the tabular CMDP improves the existing result by removing the reachability assumption. Our result for the second model is the first-known result for such a type of function approximation models. Comparison between our results for the two models further indicates that having context-varying features leads to much better sample efficiency than having common representations for all contexts under linear CMDPs.	翻訳日:2024-02-06 17:59:55 公開日:2024-02-05
# ロバスト話者検証のための敵対的データ拡張 Adversarial Data Augmentation for Robust Speaker Verification ( http://arxiv.org/abs/2402.02699v1 ) ライセンス: Link先を確認	Zhenyu Zhou and Junhui Chen and Namin Wang and Lantian Li and Dong Wang	(参考訳) データ拡張(DA)は、実装の容易さと高い有効性により、深層話者モデルで広く普及している。DAは実生活の音響変動をシミュレートすることでトレーニングデータを強化し、深層ニューラルネットワークが無関係な音響変動を無視しながら話者関連表現を学習できるようにすることで、堅牢性と一般化を改善する。しかしながら、通常のDAには、拡張残差、すなわち異なる種類の拡張に起因する望ましくない歪みという潜在的な問題がある。そこで本稿では,DAと敵対的学習を組み合わせた新しいアプローチである敵対的データ拡張(A-DA)を提案する。具体的には、データ拡張に使用されるさまざまな拡張タイプを分類する拡張分類器を追加する。この敵対的学習により、ネットワークは拡張分類器を欺くことができる話者埋め込みを生成できるようになり、学習された話者埋め込みは拡張による変動に対してより堅牢になる。VoxCeleb と CN-Celeb のデータセットを用いた実験により,提案したA-DA は、拡張が一致するテスト条件と一致しないテスト条件の両方で標準的なDA よりも優れており,音響変動に対する優れた堅牢性と一般化が示された。 Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.	翻訳日:2024-02-06 17:59:34 公開日:2024-02-05
# 期待値を超えて:確率的支配による学習の実用化 Beyond Expectations: Learning with Stochastic Dominance Made Practical ( http://arxiv.org/abs/2402.02698v1 ) ライセンス: Link先を確認	Shicong Cen, Jincheng Mei, Hanjun Dai, Dale Schuurmans, Yuejie Chi, Bo Dai	(参考訳) 確率的支配は、不確実な結果を伴う意思決定に対するリスク回避的な選好をモデル化するものであり、単に期待値に頼るのとは対照的に、不確実性の本質的な構造を自然に捉える。理論上は魅力的であるにもかかわらず、機械学習における確率的支配の応用は、以下の課題により少ない: $\textbf{i)}$ 確率的支配の本来の概念は $\textit{半順序}$ しか与えないため、最適性の基準として用いるのに適さない; $\textbf{ii)}$ 確率的支配の評価が連続的な性質を持つため、効率的な計算手法がいまだ存在しない。本研究では,確率的支配による学習の一般的な枠組みを確立するための最初の試みを行う。まず確率的支配の概念を一般化し、任意の確率変数のペア間の比較を可能にする。次に,確率的支配の観点から最適解を求めるための,単純かつ計算効率の良い手法を開発する。この手法は多くの学習タスクにシームレスに組み込むことができる。数値実験により,提案手法は標準的なリスク中立戦略に匹敵する性能を達成し,教師付き学習,強化学習,ポートフォリオ最適化など,さまざまなアプリケーションにおいてリスクに対するよりよいトレードオフを得ることを示した。 Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes, which naturally captures the intrinsic structure of the underlying uncertainty, in contrast to simply resorting to the expectations. Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce, due to the following challenges: $\textbf{i)}$, the original concept of stochastic dominance only provides a $\textit{partial order}$, therefore, is not amenable to serve as an optimality criterion; and $\textbf{ii)}$, an efficient computational recipe remains lacking due to the continuum nature of evaluating stochastic dominance. In this work, we make the first attempt towards establishing a general framework of learning with stochastic dominance. We first generalize the stochastic dominance concept to enable feasible comparisons between any arbitrary pair of random variables. We next develop a simple and computationally efficient approach for finding the optimal solution in terms of stochastic dominance, which can be seamlessly plugged into many learning tasks. Numerical experiments demonstrate that the proposed method achieves comparable performance to standard risk-neutral strategies and obtains better trade-offs against risk across a variety of applications including supervised learning, reinforcement learning, and portfolio optimization.	翻訳日:2024-02-06 17:59:11 公開日:2024-02-05
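The abstract above notes that classical stochastic dominance yields only a partial order. A small Python sketch makes that concrete using the textbook definition of first-order stochastic dominance on empirical samples: X dominates Y when the empirical CDF of X is nowhere above that of Y. This illustrates the classical notion only; it is not the generalized dominance or the optimization method proposed in the paper, and the function names and sample values are made up for the example.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_cdf(sorted_sample: Sequence[float], t: float) -> float:
    """Fraction of sample points less than or equal to t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def first_order_dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if X weakly first-order dominates Y (larger outcomes are better).

    Empirical CDFs are right-continuous step functions, so checking
    F_X(t) <= F_Y(t) at every pooled sample point covers all t.
    """
    xs, ys = sorted(x), sorted(y)
    return all(empirical_cdf(xs, t) <= empirical_cdf(ys, t) for t in xs + ys)


safe = [2.0, 2.0, 2.0, 2.0]      # constant payoff
risky = [0.0, 1.0, 3.0, 4.0]     # same mean as `safe`, more spread
shifted = [3.0, 3.0, 3.0, 3.0]   # uniformly better than `safe`

shifted_over_safe = first_order_dominates(shifted, safe)
safe_over_risky = first_order_dominates(safe, risky)
risky_over_safe = first_order_dominates(risky, safe)
```

Here `shifted` dominates `safe`, but `safe` and `risky` are incomparable: their CDFs cross, so neither dominates the other. That incomparability is why a plain dominance relation cannot serve directly as an optimality criterion.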
# 高次元ガウス混合に対して深層平衡モデルはそれほど深くない明示的モデルとほぼ等価である Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures ( http://arxiv.org/abs/2402.02697v1 ) ライセンス: Link先を確認	Zenan Ling, Longbo Li, Zhanbo Feng, Yixuan Zhang, Feng Zhou, Robert C. Qiu, Zhenyu Liao	(参考訳) 典型的な暗黙的ニューラルネットワークである深層平衡モデル(DEQ)は、様々なタスクにおいて顕著な成功を収めている。しかしながら、暗黙的なDEQと明示的なニューラルネットワークモデルとの関連と違いに関する理論的理解は不足している。本稿では, ランダム行列理論(RMT)の最近の進歩を利用して, 入力データが高次元ガウス混合から引き出される場合の, 暗黙的DEQの共役カーネル(CK)およびニューラル・タンジェント・カーネル(NTK)行列の固有スペクトルを詳細に解析する。この設定において、これらの暗黙的CKとNTKのスペクトル挙動はDEQの活性化関数と初期重み分散に依存するが、それは4つの非線形方程式からなる系を介してのみであることを証明する。この理論結果の直接的な帰結として、与えられたDEQと同じCKまたはNTKを生成するように、浅い明示的ネットワークを慎重に設計できることを示す。ここではガウス混合データに対して導出されたものだが、実証的な結果は、提案された理論と設計原則が一般的な実世界のデータセットにも適用できることを示している。 Deep equilibrium models (DEQs), as a typical implicit neural network, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis on the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these Implicit-CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although derived here for Gaussian mixture data, empirical results show the proposed theory and design principle also apply to popular real-world datasets.	翻訳日:2024-02-06 17:58:38 publication placeholder
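For readers new to the term "implicit network" in the DEQ abstract above, the sketch below shows the basic idea in plain Python: instead of stacking layers, the output is defined as a fixed point z = tanh(W z + U x) of one weight-tied map. The naive iteration solver, the tanh activation, and the small matrices `W` and `U` are assumptions chosen for illustration. The sketch does not reproduce the paper's random-matrix analysis of CK and NTK spectra.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def deq_layer(w: Matrix, u: Matrix, x: Vector, z: Vector) -> Vector:
    """One application of the weight-tied map f(z) = tanh(W z + U x)."""
    wz, ux = matvec(w, z), matvec(u, x)
    return [math.tanh(a + b) for a, b in zip(wz, ux)]


def deq_forward(w: Matrix, u: Matrix, x: Vector,
                tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve z = tanh(W z + U x) by naive fixed-point iteration."""
    z = [0.0] * len(w)
    for _ in range(max_iter):
        z_next = deq_layer(w, u, x, z)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical weights. Every row of W has absolute sum below 1 and tanh is
# 1-Lipschitz, so the map is a contraction and the iteration converges.
W = [[0.2, -0.1], [0.05, 0.3]]
U = [[1.0, 0.0], [0.5, -0.5]]
x = [0.7, -0.2]

z_star = deq_forward(W, U, x)
# Residual of the fixed-point equation at the returned solution.
residual = max(abs(a - b) for a, b in zip(deq_layer(W, U, x, z_star), z_star))
```

Each pass through the loop is equivalent to adding one more weight-tied layer, so the fixed point can be read as the output of an infinitely deep network with shared weights. Practical DEQ implementations typically use faster root-finding solvers than this plain iteration.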
# 責任ある機械学習のための因果的特徴選択 Causal Feature Selection for Responsible Machine Learning ( http://arxiv.org/abs/2402.02696v1 ) ライセンス: Link先を確認	Raha Moraffah, Paras Sheth, Saketh Vishnubhatla, and Huan Liu	(参考訳) 機械学習(ML)は多くの現実世界のアプリケーションにおいて不可欠な要素となっている。その結果、MLモデルを倫理的および社会的価値に整合させつつ、その信頼性と信用性を高めることに焦点を当てた、責任ある機械学習の必要性が生じている。責任あるMLには多くの問題がある。この調査は、解釈可能性、公正性、敵対的堅牢性、ドメイン一般化の4つの主要な問題を扱う。特徴選択は、責任あるMLのタスクにおいて重要な役割を果たす。しかしながら、変数間の統計的相関に基づいて構築すると、バイアスを伴い性能を損なう見せかけのパターンにつながる可能性がある。この調査は、因果的特徴選択に関する現在の研究、すなわちそれが何であり、責任あるMLの4つの側面をどのように強化できるかに焦点を当てる。結果に因果的な影響を持つ特徴を特定し、因果関係を相関から区別することで、因果的特徴選択は、ハイステークスな応用においてMLモデルが倫理的かつ社会的に責任あるものとなることを保証するための独自のアプローチとして位置づけられる。 Machine Learning (ML) has become an integral aspect of many real-world applications. As a result, the need for responsible machine learning has emerged, focusing on aligning ML models to ethical and social values, while enhancing their reliability and trustworthiness. Responsible ML involves many issues. This survey addresses four main issues: interpretability, fairness, adversarial robustness, and domain generalization. Feature selection plays a pivotal role in responsible ML tasks. However, building upon statistical correlations between variables can lead to spurious patterns with biases and compromised performance. This survey focuses on the current study of causal feature selection: what it is and how it can reinforce the four aspects of responsible ML. By identifying features with causal impacts on outcomes and distinguishing causality from correlation, causal feature selection is posited as a unique approach to ensuring that ML models are ethically and socially responsible in high-stakes applications.	翻訳日:2024-02-06 17:58:16 公開日:2024-02-05
# ブラックボックス文レベル攻撃のためのクラス確率の活用 Exploiting Class Probabilities for Black-box Sentence-level Attacks ( http://arxiv.org/abs/2402.02695v1 ) ライセンス: Link先を確認	Raha Moraffah and Huan Liu	(参考訳) 文レベルの攻撃は、正しく分類された文と同義でありながら、テキスト分類器によって誤分類される敵対的文を作成する。ブラックボックス設定では、分類器にはクエリした入力に対するフィードバックを通してのみアクセスでき、そのフィードバックは主にクラス確率の形で得られる。クラス確率を利用するとより強力な攻撃が可能になるが、文レベルの攻撃でそれを使うことは難しいため、既存の攻撃はフィードバックを使わないか、クラスラベルのみを使用している。この課題を克服するために,ブラックボックスの文レベル攻撃にクラス確率を用いる新しいアルゴリズムを開発し,クラス確率の利用が攻撃の成功に与える効果を調べ,ブラックボックスの文レベル攻撃でクラス確率を用いることに価値や実用性があるかどうかを検討する。提案する攻撃を,各種分類器とベンチマークデータセットにわたってベースラインと比較し,広範な評価を行った。 Sentence-level attacks craft adversarial sentences that are synonymous with correctly-classified sentences but are misclassified by the text classifiers. Under the black-box setting, classifiers are only accessible through their feedback to queried inputs, which is predominantly available in the form of class probabilities. Even though utilizing class probabilities results in stronger attacks, due to the challenges of using them for sentence-level attacks, existing attacks use either no feedback or only the class labels. Overcoming the challenges, we develop a novel algorithm that uses class probabilities for black-box sentence-level attacks, investigate the effectiveness of using class probabilities on the attack's success, and examine whether it is worthwhile or practical for black-box sentence-level attacks to use class probabilities. We conduct extensive evaluations of the proposed attack, comparing it with the baselines across various classifiers and benchmark datasets.	翻訳日:2024-02-06 17:57:59 公開日:2024-02-05
# IEEE ICME 2024グランドチャレンジの解説:ドメインシフト下での半教師付き音響シーン分類 Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift ( http://arxiv.org/abs/2402.02694v1 ) ライセンス: Link先を確認	Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Mark D. Plumbley, Dongyuan Shi, Woon-Seng Gan, Susanto Rahardja, Bin Xiang, Jianfeng Chen	(参考訳) 音響シーン分類 (ASC) は、計算論的聴覚情景分析における重要な研究課題であり、環境に固有の音響特性を認識することを目的としている。ASCタスクの課題の1つは、トレーニングデータとテストデータの分布のギャップに起因するドメインシフトである。2018年以降、ASCのチャレンジは、さまざまな録音デバイスにまたがるASCモデルの一般化に焦点を当ててきた。近年、このタスクはデバイス一般化において大きな進歩を遂げているが、時間、空間、文化、言語などの特性が関わる、異なる地域間のドメインシフトという課題は、いまだ十分に検討されていない。また、実世界にはラベルなしの音響シーンデータが豊富に存在することを考えると、これらのラベルなしデータを活用する方法を検討することが重要である。そこで、ICME 2024 Grand Challengeにおいて、ドメインシフト下での半教師付き音響シーン分類というタスクを導入する。我々は、ドメインシフトの下でより堅牢なASCモデルを開発することを目指し、参加者が半教師付き学習技術で革新を行うことを奨励する。 Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is domain shift caused by a distribution gap between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Although this task in recent years has achieved substantial progress in device generalization, the challenge of domain shift between different regions, involving characteristics such as time, space, culture, and language, remains insufficiently explored at present. In addition, considering the abundance of unlabeled acoustic scene data in the real world, it is important to study the possible ways to utilize these unlabeled data. Therefore, we introduce the task Semi-supervised Acoustic Scene Classification under Domain Shift in the ICME 2024 Grand Challenge. We encourage participants to innovate with semi-supervised learning techniques, aiming to develop more robust ASC models under domain shift.	翻訳日:2024-02-06 17:57:42 公開日:2024-02-05
# グラフニューラルネットワークを用いたリンク予測のための統計的保証 Statistical Guarantees for Link Prediction using Graph Neural Networks ( http://arxiv.org/abs/2402.02692v1 ) ライセンス: Link先を確認	Alan Chung, Amin Saberi, Morgane Austern	(参考訳) 本稿では,グラフ生成グラフ上のリンク予測タスクにおいて,グラフニューラルネットワーク(GNN)の性能を統計的に保証する。本稿では,基礎となるエッジ確率に対して一貫した推定値を生成する線形gnnアーキテクチャ(lg-gnn)を提案する。平均二乗誤差の上限を確立し,LG-GNNの高確率エッジ検出能力を保証する。我々の保証は疎グラフと密グラフの両方に当てはまる。最後に,従来のgcnアーキテクチャの欠点を実証するとともに,実データと合成データを用いた結果の検証を行う。 This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture, as well as verify our results on real and synthetic datasets.	翻訳日:2024-02-06 17:57:25 公開日:2024-02-05
# ベイズ最適化のためのポアソン過程 Poisson Process for Bayesian Optimization ( http://arxiv.org/abs/2402.02687v1 ) ライセンス: Link先を確認	Xiaoxing Wang, Jiaxing Li, Chao Xue, Wei Liu, Weifeng Liu, Xiaokang Yang, Junchi Yan, Dacheng Tao	(参考訳) ベイズ最適化(BO)はサンプル効率の良いブラックボックス最適化手法であり、木構造パーゼン推定器(TPE)、ランダムフォレスト(SMAC)、ガウス過程(GP)などの確率的サロゲートモデルを用いてブラックボックス関数の絶対的な関数応答を構築する手法が数多く提案されている。しかし、候補の相対的なランキングを推定する手法はほとんど検討されていない。相対的なランキングは、特に関数応答は扱いにくいが選好は取得できる場合に、絶対的な関数応答よりもノイズに対して頑健で実用性にも優れうる。そこで本研究では、ポアソン過程に基づく新しいランキングベースのサロゲートモデルを提案し、Poisson Process Bayesian Optimization (PoPBO) という効率的なBOフレームワークを導入する。さらに、これに適合するように調整した2つの獲得関数を、古典的なLCBとEIから導出する。従来のGP-BO法と比較して、PoPBOは計算コストが低く、ノイズに対する頑健性にも優れており、このことは豊富な実験によって検証されている。ハイパーパラメータ最適化(HPO)やニューラルアーキテクチャサーチ(NAS)を含むシミュレーションおよび実世界のベンチマークの結果は、PoPBOの有効性を示している。 Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidates, which can be more robust to noise and have better practicality than absolute function responses, especially when the function responses are intractable but preferences can be acquired. To this end, we propose a novel ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO). Two tailored acquisition functions are further derived from classic LCB and EI to accommodate it. Compared to the classic GP-BO method, our PoPBO has lower computation costs and better robustness to noise, which is verified by abundant experiments. The results on both simulated and real-world benchmarks, including hyperparameter optimization (HPO) and neural architecture search (NAS), show the effectiveness of PoPBO.	翻訳日:2024-02-06 17:57:17 公開日:2024-02-05
# 創発的コミュニケーションによるインテントプロファイリングと翻訳 Intent Profiling and Translation Through Emergent Communication ( http://arxiv.org/abs/2402.02768v1 ) ライセンス: Link先を確認	Salwa Mostafa, Mohammed S. Elbamby, Mohamed K. Abdel-Aziz, and Mehdi Bennis	(参考訳) ネットワークアプリケーションの要求を効果的に表現し満足するために、意図に基づくネットワーク管理が有望なソリューションとして登場した。意図に基づく手法では、ユーザとアプリケーションはネットワークへの高レベル抽象言語における意図を表現する。この抽象化はネットワーク操作を単純化するが、アプリケーションの意図を効率的に表現し、異なるネットワーク機能にマップする多くの課題を引き起こす。そこで本研究では,意図のプロファイリングと翻訳のためのAIベースのフレームワークを提案する。ネットワークと対話するアプリケーションがドメイン言語におけるネットワークサービスのニーズを表現するシナリオを考察する。マシン間通信(すなわちアプリケーションとネットワーク間の通信)は、実用的でもスケーラブルでもない各アプリケーションのドメイン言語を理解するためのネットワークを必要とするため、複雑である。その代わり、創発的コミュニケーションに基づくフレームワークがインテントプロファイリングのために提案され、アプリケーションは創発的コミュニケーションメッセージを通じてネットワークに対するqoe(abstract quality-of-experience)インテントを表現する。その後、ネットワークはこれらの通信メッセージを解釈し、要求されたサービス品質(qos)を保証するためのネットワーク機能(スライス)にマップする方法を学習する。シミュレーションの結果,提案手法は自己学習スライシングやその他のベースラインよりも優れており,完全知識ベースラインに近い性能が得られることがわかった。 To effectively express and satisfy network application requirements, intent-based network management has emerged as a promising solution. In intent-based methods, users and applications express their intent in a high-level abstract language to the network. Although this abstraction simplifies network operation, it induces many challenges to efficiently express applications' intents and map them to different network capabilities. Therefore, in this work, we propose an AI-based framework for intent profiling and translation. We consider a scenario where applications interacting with the network express their needs for network services in their domain language. The machine-to-machine communication (i.e., between applications and the network) is complex since it requires networks to learn how to understand the domain languages of each application, which is neither practical nor scalable. 
Instead, a framework based on emergent communication is proposed for intent profiling, in which applications express their abstract quality-of-experience (QoE) intents to the network through emergent communication messages. Subsequently, the network learns how to interpret these communication messages and map them to network capabilities (i.e., slices) to guarantee the requested Quality-of-Service (QoS). Simulation results show that the proposed method outperforms self-learning slicing and other baselines, and achieves a performance close to the perfect knowledge baseline.	翻訳日:2024-02-06 17:49:58 公開日:2024-02-05
# lidarカメラ核融合モデルのロバスト性向上 : 核融合戦略の観点から Improving Robustness of LiDAR-Camera Fusion Model against Weather Corruption from Fusion Strategy Perspective ( http://arxiv.org/abs/2402.02738v1 ) ライセンス: Link先を確認	Yihao Huang, Kaiyuan Yu, Qing Guo, Felix Juefei-Xu, Xiaojun Jia, Tianlin Li, Geguang Pu, Yang Liu	(参考訳) 近年、LiDARカメラ融合モデルでは、自律運転における3次元物体検出タスクが著しく進歩している。しかし、複雑な物理的世界の霧、雨、雪、日光といった一般的な気象汚染に対する頑健さは未調査のままである。本稿では,崩壊したデータセットの融合戦略の観点から,融合モデルのロバスト性を評価する。この評価に基づき,LiDARおよびカメラ源からのフレキシブルに重み付けされたヒューズ機能により,様々な気象シナリオに適応し,融合モデルの堅牢性を高めるための簡潔かつ実用的な融合戦略を提案する。異なる2つの軽量実装を持つ4種類の融合モデルによる実験により、アプローチの適用性と有効性が確認された。 In recent years, LiDAR-camera fusion models have markedly advanced 3D object detection tasks in autonomous driving. However, their robustness against common weather corruption such as fog, rain, snow, and sunlight in the intricate physical world remains underexplored. In this paper, we evaluate the robustness of fusion models from the perspective of fusion strategies on the corrupted dataset. Based on the evaluation, we further propose a concise yet practical fusion strategy to enhance the robustness of the fusion models, namely flexibly weighted fusing features from LiDAR and camera sources to adapt to varying weather scenarios. Experiments conducted on four types of fusion models, each with two distinct lightweight implementations, confirm the broad applicability and effectiveness of the approach.	翻訳日:2024-02-06 17:49:38 公開日:2024-02-05
# モーションキューを用いた低データレジームにおけるシングルフレームボディポースと形状推定 Using Motion Cues to Supervise Single-Frame Body Pose and Shape Estimation in Low Data Regimes ( http://arxiv.org/abs/2402.02736v1 ) ライセンス: Link先を確認	Andrey Davydov, Alexey Sidnev, Artsiom Sanakoyeu, Yuhua Chen, Mathieu Salzmann, Pascal Fua	(参考訳) 十分な注釈付きトレーニングデータが得られた場合、教師付きディープラーニングアルゴリズムは、単一のカメラを使用して人間の身体のポーズと形状を推定する。このようなデータがあまりにも少なすぎる影響は、ボディ形状のデータベースのような他の情報ソースを使用して事前学習することで軽減することができる。残念ながら、そのような情報源は必ずしも利用可能ではない。このような場合、必要な監視信号を提供するために、アノテーションのないビデオを簡単に作成できることが示される。注釈付きデータが少なすぎると、連続したフレームでポーズを計算し、それらの間の光の流れを計算します。次に、画像光の流れと、あるフレームから次のフレームへのポーズの変化から推測できるものとの間の一貫性を強制する。これにより、ネットワークの重みを効果的に洗練し、より注釈付きデータを使って訓練された方法と同等に実行するのに十分な監督が与えられる。 When enough annotated training data is available, supervised deep-learning algorithms excel at estimating human body pose and shape using a single camera. The effects of too little such data being available can be mitigated by using other information sources, such as databases of body shapes, to learn priors. Unfortunately, such sources are not always available either. We show that, in such cases, easy-to-obtain unannotated videos can be used instead to provide the required supervisory signals. Given a trained model using too little annotated data, we compute poses in consecutive frames along with the optical flow between them. We then enforce consistency between the image optical flow and the one that can be inferred from the change in pose from one frame to the next. This provides enough additional supervision to effectively refine the network weights and to perform on par with methods trained using far more annotated data.	翻訳日:2024-02-06 17:49:25 公開日:2024-02-05
# InVA: マルチモーダルニューロイメージングデータの調和のための統合的変分オートエンコーダ InVA: Integrative Variational Autoencoder for Harmonization of Multi-modal Neuroimaging Data ( http://arxiv.org/abs/2402.02734v1 ) ライセンス: Link先を確認	Bowen Lei, Rajarshi Guhaniyogi, Krishnendu Chandra, Aaron Scheffler, Bani Mallick (for the Alzheimer's Disease Neuroimaging Initiative)	(参考訳) 多様な画像モダリティから得られた複数の画像間の非線形な関連を探ることに大きな関心が寄せられている。複数の画像に基づいてある画像の予測推論を行うイメージ・オン・イメージ回帰に関する文献は増えているが、既存のアプローチには、画像の予測において複数の画像モダリティ間で情報を効率的に借用するうえで限界がある。本稿では、変分オートエンコーダ(VAE)の文献に基づいて、異なるソースから得られた複数の画像から情報を借用して画像の予測推論を行う、統合的変分オートエンコーダ(\texttt{InVA})と呼ばれる新しい手法を提案する。提案手法は、結果画像と入力画像の間の複雑な非線形関係を捉えつつ、高速な計算を可能にする。数値実験の結果は、通常は入力画像間での情報の借用ができないVAEに対して、\texttt{InVA} が大きな優位性を持つことを示している。提案フレームワークは、磁気共鳴画像法(MRI)から容易に得られるヒト脳スキャンの皮質構造に関する複数の測定値から、コストの高い陽電子放出断層撮影(PET)を高精度に予測推論する。 There is a significant interest in exploring non-linear associations among multiple images derived from diverse imaging modalities. While there is a growing literature on image-on-image regression to delineate predictive inference of an image based on multiple images, existing approaches have limitations in efficiently borrowing information between multiple imaging modalities in the prediction of an image. Building on the literature of Variational Auto Encoders (VAEs), this article proposes a novel approach, referred to as Integrative Variational Autoencoder (\texttt{InVA}) method, which borrows information from multiple images obtained from different sources to draw predictive inference of an image. The proposed approach captures complex non-linear association between the outcome image and input images, while allowing rapid computation. Numerical results demonstrate substantial advantages of \texttt{InVA} over VAEs, which typically do not allow borrowing information between input images. 
The proposed framework offers highly accurate predictive inferences for costly positron emission tomography (PET) from multiple measures of cortical structure in human brain scans readily available from magnetic resonance imaging (MRI).	翻訳日:2024-02-06 17:49:08 公開日:2024-02-05
# Toon Aging: アーティストのポートレートスタイルの転送で顔の再老化 ToonAging: Face Re-Aging upon Artistic Portrait Style Transfer ( http://arxiv.org/abs/2402.02733v1 ) ライセンス: Link先を確認	Bumsoo Kim, Abdul Muqeet, Kyuchul Lee, Sanghyun Seo	(参考訳) 顔の再描画はコンピュータビジョンとグラフィックスにおいて顕著な分野であり、映画、広告、ライブストリーミングといったフォトリアリスティックな領域で重要な応用がある。近年,漫画やイラスト,アニメーションといったノンフォトリアリスティックなイメージに顔のリエイジを適用する必要性が,様々なエンターテイメント分野の延長として現れている。しかし、NPR画像上の見かけの年齢をシームレスに編集できるネットワークが存在しないことは、これらのタスクが素直なアプローチに制限され、各タスクを順次適用することを意味している。これはしばしば不快なアーティファクトとドメイン間の不一致による顔属性の喪失をもたらす。本稿では,1つの生成ステップで顔再老化とポートレート・スタイル・トランスファーを組み合わせた,新しい一段階顔再老化手法を提案する。同じPRドメイン内でトレーニングされた既存の顔のリエイジとスタイル転送ネットワークを活用します。本手法は異なる潜伏ベクトルを特異的に融合し,老化関連属性の管理とnprの出現を管理する。従来型のアプローチを採用することで,ドメインレベルの微調整アプローチよりも柔軟性が向上する。これは、再使用のためのペアデータセットと、スタイリングのためのドメインレベルのデータ駆動アプローチの制限に効果的に対処する。実験の結果,本モデルはサンプルのスタイルを伝達しながら,自然外観と制御性の両方を保ちながら,無作為に再生画像を生成することができることがわかった。 Face re-aging is a prominent field in computer vision and graphics, with significant applications in photorealistic domains such as movies, advertising, and live streaming. Recently, the need to apply face re-aging to non-photorealistic images, like comics, illustrations, and animations, has emerged as an extension in various entertainment sectors. However, the absence of a network capable of seamlessly editing the apparent age on NPR images means that these tasks have been confined to a naive approach, applying each task sequentially. This often results in unpleasant artifacts and a loss of facial attributes due to domain discrepancies. In this paper, we introduce a novel one-stage method for face re-aging combined with portrait style transfer, executed in a single generative step. We leverage existing face re-aging and style transfer networks, both trained within the same PR domain. Our method uniquely fuses distinct latent vectors, each responsible for managing aging-related attributes and NPR appearance. 
Adopting an exemplar-based approach, our method offers greater flexibility than domain-level fine-tuning approaches, which typically require separate training or fine-tuning for each domain. This effectively addresses the limitation of requiring paired datasets for re-aging and domain-level, data-driven approaches for stylization. Our experiments show that our model can effortlessly generate re-aged images while simultaneously transferring the style of examples, maintaining both natural appearance and controllability.	翻訳日:2024-02-06 17:48:47 公開日:2024-02-05
# サロゲートベースのブラックボックス攻撃に対する生成的アプローチ A Generative Approach to Surrogate-based Black-box Attacks ( http://arxiv.org/abs/2402.02732v1 ) ライセンス: Link先を確認	Raha Moraffah, Huan Liu	(参考訳) サーロゲートベースのブラックボックス攻撃は、DNNの脆弱性を増大させた。これらの攻撃は、特定のサンプルセットのみに対するブラックボックス目標フィードバックを持つサンプルに対して、敵の例を作成するように設計されている。最先端のサロゲートベースの攻撃では、ターゲットの出力を模倣する差別的なサロゲートを訓練する。目標は、ターゲットの決定境界を学ぶことです。その後、サロゲートはホワイトボックス攻撃によって攻撃され、元のサンプルに似た敵の例を作るが、他のクラスに属する。限られたサンプルでは、判別サーロゲートはターゲットの判断境界を正確に学習できないため、これらのサーロゲートに基づく攻撃は成功率が低い。判別的アプローチとは違って,対象の判断境界上あるいはその近傍に存在するサンプルの分布を学習する生成的サロゲートを提案する。生成的サロゲートによって学習された分布は、元のサンプルとは不可避な差異を持つが他のクラスに属する敵の例を作るのに使うことができる。提案手法は,様々なターゲットやデータセットに対する攻撃成功率が非常に高い攻撃となる。 Surrogate-based black-box attacks have exposed the heightened vulnerability of DNNs. These attacks are designed to craft adversarial examples for any samples with black-box target feedback for only a given set of samples. State-of-the-art surrogate-based attacks involve training a discriminative surrogate that mimics the target's outputs. The goal is to learn the decision boundaries of the target. The surrogate is then attacked by white-box attacks to craft adversarial examples similar to the original samples but belong to other classes. With limited samples, the discriminative surrogate fails to accurately learn the target's decision boundaries, and these surrogate-based attacks suffer from low success rates. Different from the discriminative approach, we propose a generative surrogate that learns the distribution of samples residing on or close to the target's decision boundaries. The distribution learned by the generative surrogate can be used to craft adversarial examples that have imperceptible differences from the original samples but belong to other classes. The proposed generative approach results in attacks with remarkably high attack success rates on various targets and datasets.	翻訳日:2024-02-06 17:48:20 公開日:2024-02-05
# GANによる高速かつ正確な協調無線地図推定 Fast and Accurate Cooperative Radio Map Estimation Enabled by GAN ( http://arxiv.org/abs/2402.02729v1 ) ライセンス: Link先を確認	Zezhong Zhang, Guangxu Zhu, Junting Chen, Shuguang Cui	(参考訳) 6G時代には、無線リソースのリアルタイムモニタリングと管理が、多様な無線アプリケーションをサポートするように求められている。これは無線資源の分布を高速かつ正確に推定することを要求するもので、通常はラジオマップとして知られる地理的環境上の空間信号のパワー強度によって表される。本稿では,GAN(Generative Adversarial Network, GAN-CRME)によって実現された協調無線地図推定(CRME)手法について述べる。無線マップは、モバイルユーザにおける分散受信信号強度(RSS)測定と、ディープニューラルネットワーク推定器を用いた地理的マップとの相互作用を利用して推定され、データ取得コストと計算複雑性が低下する。さらに、GANに基づく学習アルゴリズムは、生成AIのパワーを活用することにより、ディープニューラルネットワーク推定器の推論能力を高めるために提案される。シミュレーションの結果,GAN-CRMEは地理地図情報が不正確な場合に誤り訂正を行うことができることがわかった。 In the 6G era, real-time radio resource monitoring and management are urged to support diverse wireless-empowered applications. This calls for fast and accurate estimation on the distribution of the radio resources, which is usually represented by the spatial signal power strength over the geographical environment, known as a radio map. In this paper, we present a cooperative radio map estimation (CRME) approach enabled by the generative adversarial network (GAN), called as GAN-CRME, which features fast and accurate radio map estimation without the transmitters' information. The radio map is inferred by exploiting the interaction between distributed received signal strength (RSS) measurements at mobile users and the geographical map using a deep neural network estimator, resulting in low data-acquisition cost and computational complexity. Moreover, a GAN-based learning algorithm is proposed to boost the inference capability of the deep neural network estimator by exploiting the power of generative AI. Simulation results showcase that the proposed GAN-CRME is even capable of coarse error-correction when the geographical map information is inaccurate.	翻訳日:2024-02-06 17:48:05 公開日:2024-02-05
# ソフトウェア実践者は人間中心の欠陥をどう認識するか? How do software practitioners perceive human-centric defects? ( http://arxiv.org/abs/2402.02726v1 ) ライセンス: Link先を確認	Vedant Chauhan, Chetan Arora, Hourieh Khalajzadeh, John Grundy	(参考訳) コンテキスト: ヒューマン中心のソフトウェア設計と開発は、ユーザがソフトウェアを適合させるのではなく、ユーザがタスクを実行する方法に焦点を当てます。ソフトウェアユーザは、性別、年齢、文化、言語、障害、社会経済的地位、教育的背景、その他多くの違いを持つことができる。これらの違いの本質的に異なる性質と、そのソフトウェア使用への影響のため、ユーザーの好みや問題は異なり、結果として、私たちが「人間中心の欠陥」(hcd)と呼ぶユーザー固有の欠陥が生じる。目的: ソフトウェア実践者によるこのような人間中心の欠陥の認識と現在のマネジメントプラクティスを理解し,報告や理解,修正における重要な課題を特定し,ソフトウェアエンジニアリングにおけるhcds管理を改善するための推奨を提供する。方法: hcdsと欠陥追跡プロセスに関する知識と経験を評価するため,ソフトウェア技術者を対象に調査およびインタビューを行った。結果:se実践者から50 (50) のアンケートと10 (10) の面接応答を分析し,ソフトウェア工学の実践において現在の hcd の管理には複数のギャップがあることを確認した。人間中心の側面には認識が欠如しており、ソフトウェア開発中に失われたり、過小評価されたりする。その結果, エンドユーザによるフィードバックプロセスの改善, より記述的な分類, 適切な自動化により, HCDの処理が改善できることが判明した。結論: HCDは、多様なエンドユーザベースを考慮して、ソフトウェア実践者に大きな課題を与えます。ソフトウェアエンジニアリングの分野では、HCDの研究は限られており、人間中心の側面に関するより良い認識とサポートを生み出すために、研究と実践コミュニティからの努力が必要である。 Context: Human-centric software design and development focuses on how users want to carry out their tasks rather than making users accommodate their software. Software users can have different genders, ages, cultures, languages, disabilities, socioeconomic statuses, and educational backgrounds, among many other differences. Due to the inherently varied nature of these differences and their impact on software usage, preferences and issues of users can vary, resulting in user-specific defects that we term as `human-centric defects' (HCDs). Objective: This research aims to understand the perception and current management practices of such human-centric defects by software practitioners, identify key challenges in reporting, understanding and fixing them, and provide recommendations to improve HCDs management in software engineering. Method: We conducted a survey and interviews with software engineering practitioners to gauge their knowledge and experience on HCDs and the defect tracking process. 
Results: We analysed fifty (50) survey responses and ten (10) interview responses from SE practitioners and identified that there are multiple gaps in the current management of HCDs in software engineering practice. There is a lack of awareness regarding human-centric aspects, causing them to be lost or under-appreciated during software development. Our results revealed that handling HCDs could be improved by following a better feedback process with end-users, a more descriptive taxonomy, and suitable automation. Conclusion: HCDs present a major challenge to software practitioners, given their diverse end-user base. In the software engineering domain, research on HCDs has been limited and requires effort from the research and practice communities to create better awareness and support regarding human-centric aspects.	翻訳日:2024-02-06 17:47:48 公開日:2024-02-05
# 革新的サイバーシックネス検出: 仮想現実における頭部運動パターンの探索 Innovative Cybersickness Detection: Exploring Head Movement Patterns in Virtual Reality ( http://arxiv.org/abs/2402.02725v1 ) ライセンス: Link先を確認	Masoud Salehi, Nikoo Javadpour, Brietta Beisner, Mohammadamin Sanaei, Stephen B. Gilbert	(参考訳) 仮想現実(VR)技術が広く採用されているにもかかわらず、サイバーシックネスは一部のユーザーにとって障壁となっている。本研究は,サイバーシックネス検出のための新しい生理マーカーとして,頭部運動パターンを調査した。従来のマーカーとは異なり、頭部の動きは、すべての商用VRヘッドセットに埋め込まれたセンサーを通して簡単に捉えられる連続した非侵襲的な手段を提供する。私たちは、75人の参加者を含むvr実験の公開データセットを使用して、6軸にわたる頭の動きを分析しました。その後,頭部運動データセットとその派生品である速度,加速度,ジャークに対して広範な特徴抽出処理を行った。統計的特徴,時間的特徴,スペクトル特徴を含む3つの特徴カテゴリーが抽出された。その後,再帰的特徴除去法を用いて,最も重要かつ効果的な特徴を選定した。一連の実験で、さまざまな機械学習アルゴリズムをトレーニングしました。その結果,頭部運動に基づくサイバーシックネスの予測において76%の精度と83%の精度が得られた。この研究は、サイバーシックネス文学への貢献は、新しいデータソースの予備分析を提供し、頭の動きとサイバーシックネスの関係についての洞察を提供することである。 Despite the widespread adoption of Virtual Reality (VR) technology, cybersickness remains a barrier for some users. This research investigates head movement patterns as a novel physiological marker for cybersickness detection. Unlike traditional markers, head movements provide a continuous, non-invasive measure that can be easily captured through the sensors embedded in all commercial VR headsets. We used a publicly available dataset from a VR experiment involving 75 participants and analyzed head movements across six axes. An extensive feature extraction process was then performed on the head movement dataset and its derivatives, including velocity, acceleration, and jerk. Three categories of features were extracted, encompassing statistical, temporal, and spectral features. Subsequently, we employed the Recursive Feature Elimination method to select the most important and effective features. In a series of experiments, we trained a variety of machine learning algorithms. The results demonstrate a 76% accuracy and 83% precision in predicting cybersickness in the subjects based on the head movements. 
This study's contribution to the cybersickness literature lies in offering a preliminary analysis of a new source of data and providing insight into the relationship between head movements and cybersickness.	翻訳日:2024-02-06 17:47:19 公開日:2024-02-05
# fdnet : 誘導多能性幹細胞由来アストロサイトにおける細胞分節の周波数領域分節ネットワーク FDNet: Frequency Domain Denoising Network For Cell Segmentation in Astrocytes Derived From Induced Pluripotent Stem Cells ( http://arxiv.org/abs/2402.02724v1 ) ライセンス: Link先を確認	Haoran Li, Jiahua Shi, Huaming Chen, Bo Du, Simon Maksour, Gabrielle Phillips, Mirella Dottori, Jun Shen	(参考訳) 体細胞から人工的に誘導される多能性幹細胞(iPSC)は、神経変性疾患の疾患モデルおよび薬物スクリーニングにおいて重要な役割を果たす。 iPSCと区別されるアストロサイトは神経代謝を研究する重要な標的である。アストロサイト分化の進行は、異なる分化段階の顕微鏡画像から観察された形態の変化を通して観察され、成熟時に分子生物学技術によって決定される。しかし、アストロサイトは通常「完全に」背景に溶け込み、そのうちのいくつかは干渉情報(死細胞、メディア堆積物、細胞破片など)で覆われ、アストロサイトは観察が困難になる。注釈付きデータセットがないため、既存の最先端のディープラーニングアプローチはこの問題に対処できない。本稿では,704画像とそのピクセルレベルアノテーションマスクを含む新しいデータセットであるiai704を用いて,アストロサイトセグメンテーションという新しいタスクを提案する。さらに、アストロサイトセグメンテーションのために、FDNetと呼ばれる新しい周波数領域デノナイズネットワークを提案する。 FDNetは、文脈情報融合モジュール(CIF)、注意ブロック(AB)、フーリエ変換ブロック(FTB)から構成される。 cifとabはアストロサイトを局在させるためにマルチスケールの機能埋め込みを融合する。 ftbは特徴埋め込みを周波数領域に変換し、干渉情報を排除するためにハイパスフィルタを実行する。実験により,iPSCの分化進行予測において,アストロサイトセグメンテーションの最先端代替品よりもFDNetの方が優れていることが示された。 Artificially generated induced pluripotent stem cells (iPSCs) from somatic cells play an important role for disease modeling and drug screening of neurodegenerative diseases. Astrocytes differentiated from iPSCs are important targets to investigate neuronal metabolism. The astrocyte differentiation progress can be monitored through the variations of morphology observed from microscopy images at different differentiation stages, then determined by molecular biology techniques upon maturation. However, the astrocytes usually ``perfectly'' blend into the background and some of them are covered by interference information (i.e., dead cells, media sediments, and cell debris), which makes astrocytes difficult to observe. Due to the lack of annotated datasets, the existing state-of-the-art deep learning approaches cannot be used to address this issue. 
In this paper, we introduce a new task named astrocyte segmentation with a novel dataset, called IAI704, which contains 704 images and their corresponding pixel-level annotation masks. Moreover, a novel frequency domain denoising network, named FDNet, is proposed for astrocyte segmentation. In detail, our FDNet consists of a contextual information fusion module (CIF), an attention block (AB), and a Fourier transform block (FTB). CIF and AB fuse multi-scale feature embeddings to localize the astrocytes. FTB transforms feature embeddings into the frequency domain and applies a high-pass filter to eliminate interference information. Experimental results demonstrate the superiority of our proposed FDNet over the state-of-the-art alternatives in astrocyte segmentation, offering insights for iPSC differentiation progress prediction.	翻訳日:2024-02-06 17:47:00 公開日:2024-02-05
# より小さな次元での量子相関による1ビットの通信 Beating one bit of Communication with Quantum Correlations in Smaller Dimension ( http://arxiv.org/abs/2402.02723v1 ) ライセンス: Link先を確認	Peter Sidajaya, Valerio Scarani	(参考訳) ベルの定理の結果として、いくつかの絡み合った状態の測定統計は局所隠れ変数(LHV)だけではシミュレートできない。供給しなければならない通信量は、非古典性の直感的な定量化である。この量が一般に非常に大きいことは明らかであるが、シミュレーションが1ビット以上の通信を必要とする量子相関の単純な例を見つけることは驚くほど困難である。本稿では,これまでで最も単純な例を報告する。これは(5,2,5,5)$ Bellのシナリオ(これまで知られている最小のケースは(7,3,16,16)$のシナリオ)である。この証明は、1ビットスコアの最大化が2つのサブゲームにおける局所スコアの和が最大となる最良のパーティションを見つけることと等価であることを示す。 As a consequence of Bell's theorem, the statistics of measurements on some entangled states cannot be simulated with Local Hidden Variables (LHV) alone. The amount of communication that must be supplied is an intuitive quantifier of non-classicality. While it is obvious that this amount can be very large in general, it has been surprisingly difficult to find simple examples of quantum correlations, whose simulation requires more than one bit of communication. In this paper, we report the simplest example to date, which lives in the $(5,2,5,5)$ Bell scenario [the previously known smallest case living in the $(7,3,16,16)$ scenario]. The proof is built on the observation that the maximisation of the 1-bit score is equivalent to finding the best partition in which the sum of the local scores in the two sub-games is maximal.	翻訳日:2024-02-06 17:46:29 公開日:2024-02-05
# Gottesman-Kitaev-Preskill Qubit-based All-Photonic Quantum Networksのための量子スイッチ Quantum Switches for Gottesman-Kitaev-Preskill Qubit-based All-Photonic Quantum Networks ( http://arxiv.org/abs/2402.02721v1 ) ライセンス: Link先を確認	Mohadeseh Azari, Paul Polakos, Kaushik P. Seshadreesan	(参考訳) ゴッテマン・キタエフ・プレスキル(gkp)符号は、理論上はガウスの熱損失光チャネル上の量子通信に最適に近い情報であり、将来の量子ネットワークにおける選択のエンコーディングである可能性が高い。 GKP符号化光に基づく量子リピータは、GKP符号の準備やホモダイン検出の非効率が現実的有限スキーズであるにもかかわらず、広範囲にわたる高いエンドツーエンドの絡み合い速度をサポートすることが示されている。本稿では,GKP量子ビットベースの量子ネットワークにおいて,クライアントとの多重化GKP量子ビットベースの絡み合いリンク生成を含むアーキテクチャと,GKP量子ビットグラフ状態リソースを併用した全フォトニックストレージを提案する。このスイッチは、最近導入された$\textit{entanglement- rank-based link matching}$ protocol heuristicのマルチクライアント一般化を使用する。 GKP-qubit グラフ状態リソースの生成はハードウェア集約的であり、総リソース予算とクライアントの任意のレイアウトが与えられているため、スイッチによって提供される異なるクライアントペア接続に対する最適な割り当ての問題に対処し、スイッチの総スループットを最大化するとともに、個々の絡み合い率についても公平である。データセンターがスイッチのクライアントであり、他のすべてのクライアントがデータセンタに接続することを目指しています。これは、ローカルエリアネットワークをグローバルネットワークに接続するゲートウェイルータの一般的なケースをキャプチャするシナリオです。互換性のある量子リピータとともに、量子スイッチは任意のトポロジーの量子ネットワークを実現する方法を提供する。 The Gottesman-Kitaev-Preskill (GKP) code, being information theoretically near optimal for quantum communication over Gaussian thermal-loss optical channels, is likely to be the encoding of choice for advanced quantum networks of the future. Quantum repeaters based on GKP-encoded light have been shown to support high end-to-end entanglement rates across large distances despite realistic finite squeezing in GKP code preparation and homodyne detection inefficiencies. Here, we introduce a quantum switch for GKP-qubit-based quantum networks, whose architecture involves multiplexed GKP-qubit-based entanglement link generation with clients, and their all-photonic storage, together enabled by GKP-qubit graph state resources. For bipartite entanglement distribution between clients via entanglement swapping, the switch uses a multi-client generalization of a recently introduced $\textit{entanglement-ranking-based link matching}$ protocol heuristic. 
Since generating the GKP-qubit graph state resource is hardware intensive, given a total resource budget and an arbitrary layout of clients, we address the question of their optimal allocation towards the different client-pair connections served by the switch such that the sum throughput of the switch is maximized while also being fair in terms of the individual entanglement rates. We illustrate our results for an exemplary data center network, where the data center is a client of a switch and all of its other clients aim to connect to the data center alone -- a scenario that also captures the general case of a gateway router connecting a local area network to a global network. Together with compatible quantum repeaters, our quantum switch provides a way to realize quantum networks of arbitrary topology.	翻訳日:2024-02-06 17:46:15 公開日:2024-02-05
# 割引アダプティブオンライン予測 Discounted Adaptive Online Prediction ( http://arxiv.org/abs/2402.02720v1 ) ライセンス: Link先を確認	Zhiyu Zhang, David Bombara, Heng Yang	(参考訳) オンライン学習は、すべてを覚えることではない。未来は統計的に過去と大きく異なる可能性があるため、新しいデータが入り込む間、歴史を優雅に忘れることが重要な課題である。この直観を定式化するために,最近開発された適応型オンライン学習の手法を用いて,後悔の割引という古典的な概念を再検討する。我々の主な成果は、損失シーケンスとコンパレータの両方の複雑さに適応する新しいアルゴリズムであり、一定の学習率で広範に非適応的なアルゴリズムである勾配降下を改善する。特に、我々の理論的保証は凸性以上の構造的仮定を必要とせず、アルゴリズムは準最適ハイパーパラメータチューニングに確実に堅牢である。さらに,オンラインコンフォメーション予測,セットメンバシップ決定のための下流オンライン学習タスクを通じて,このようなメリットを実証する。 Online learning is not always about memorizing everything. Since the future can be statistically very different from the past, a critical challenge is to gracefully forget the history while new data comes in. To formalize this intuition, we revisit the classical notion of discounted regret using recently developed techniques in adaptive online learning. Our main result is a new algorithm that adapts to the complexity of both the loss sequence and the comparator, improving the widespread non-adaptive algorithm - gradient descent with a constant learning rate. In particular, our theoretical guarantee does not require any structural assumption beyond convexity, and the algorithm is provably robust to suboptimal hyperparameter tuning. We further demonstrate such benefits through online conformal prediction, a downstream online learning task with set-membership decisions.	翻訳日:2024-02-06 17:45:46 公開日:2024-02-05
# レコメンデーションのためのデノイジング時間サイクルモデリング Denoising Time Cycle Modeling for Recommendation ( http://arxiv.org/abs/2402.02718v1 ) ライセンス: Link先を確認	Sicong Xie, Qunwei Li, Weidi Xu, Kaiming Shen, Shaohu Chen, Wenliang Zhong	(参考訳) 近年,ユーザ・アイテム相互作用の時間パターンのモデル化が推薦システムにおいて注目されている。既存の手法はユーザ行動の時間的パターンの多様性を無視していると我々は主張する。対象アイテムと無関係なユーザ行動のサブセットをノイズと定義する。これは対象に関連するタイムサイクルモデリングの性能を制限し,レコメンデーション性能に影響を与える。本稿では,ユーザ行動からノイズを除去し,対象アイテムと関連性の高いユーザ行動のサブセットを選択する新しい手法として,Denoising Time Cycle Modeling (DiCycle)を提案する。 DiCycleは、推薦のために多様なタイムサイクルパターンを明示的にモデル化することができる。公開ベンチマークと実世界のデータセットの両方で大規模な実験が行われ、最先端のレコメンデーションメソッドよりも優れたパフォーマンスを示している。 Recently, modeling temporal patterns of user-item interactions has attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns of user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noise, which limits the performance of target-related time cycle modeling and affects the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach to denoise user behaviors and select the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over the state-of-the-art recommendation methods.	翻訳日:2024-02-06 17:45:33 公開日:2024-02-05
# LLMエージェントの計画を理解する:調査 Understanding the planning of LLM agents: A survey ( http://arxiv.org/abs/2402.02716v1 ) ライセンス: Link先を確認	Xu Huang and Weiwen Liu and Xiaolong Chen and Xingmei Wang and Hao Wang and Defu Lian and Yasheng Wang and Ruiming Tang and Enhong Chen	(参考訳) 大規模言語モデル(LLM)は重要な知性を示しており、自律エージェントの計画モジュールとしてLLMを活用する進歩が注目されている。本調査は,計画能力の向上を目的とした最近の研究を対象とする,llmに基づくエージェント計画の最初の体系的視点を提供する。 LLM-Agent計画に関する既存の作業の分類を,タスク分解,計画選択,外部モジュール,リフレクション,メモリに分類することができる。各方向について総合的な分析を行い、研究分野のさらなる課題について論じる。 As Large Language Models (LLMs) have shown significant intelligence, the progress to leverage LLMs as planning modules of autonomous agents has attracted more attention. This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. We provide a taxonomy of existing works on LLM-Agent planning, which can be categorized into Task Decomposition, Plan Selection, External Module, Reflection and Memory. Comprehensive analyses are conducted for each direction, and further challenges for the field of research are discussed.	翻訳日:2024-02-06 17:45:20 公開日:2024-02-05
# ks-lottery:多言語モデルのための認定抽選券を見つける KS-Lottery: Finding Certified Lottery Tickets for Multilingual Language Models ( http://arxiv.org/abs/2402.02801v1 ) ライセンス: Link先を確認	Fei Yuan, Chang Ma, Shuai Yuan, Qiushi Sun, Lei Li	(参考訳) 宝くじ仮説は、ランダムに初期化されたニューラルネットワーク内で「勝利チケット」の存在を仮定する。微調整シナリオにおけるLLMの当選チケットは存在するか? そんな入賞券をどうやって見つけるの? 本稿では,多言語微調整に非常に有効なLLMパラメータの小さなサブセットを同定するKS-Lotteryを提案する。我々はKolmogorov-Smirnov Testを用いて、微調整前後のパラメータの分布変化を分析する。さらに ks-lottery が組込み層で認定入賞チケットを検索できることを理論的に証明し、得られたパラメータの微調整と完全な微調整が保証される。 KS-Lotteryと他のパラメータ効率の調整アルゴリズムとの比較実験により,KS-Lotteryは細調整のためのパラメータセットがはるかに小さく,かつ完全な微調整LDMと同等の性能を実現していることがわかった。驚いたことに、微調整された18個のトークンのLLaMA埋め込みは、微調整された翻訳性能に到達するのに十分である。コードとモデルは一般公開される予定だ。 The lottery ticket hypothesis posits the existence of ``winning tickets'' within a randomly initialized neural network. Do winning tickets exist for LLMs in fine-tuning scenarios? How can we find such winning tickets? In this paper, we propose KS-Lottery, a method to identify a small subset of LLM parameters highly effective in multilingual fine-tuning. Our key idea is to use Kolmogorov-Smirnov Test to analyze the distribution shift of parameters before and after fine-tuning. We further theoretically prove that KS-Lottery can find the certified winning tickets in the embedding layer, fine-tuning on the found parameters is guaranteed to perform as well as full fine-tuning. Comparing KS-Lottery with other parameter-efficient tuning algorithms on translation tasks, the experimental results show that KS-Lottery finds a much smaller set of parameters for fine-tuning while achieving the comparable performance as full fine-tuning LLM. Surprisingly, we find that fine-tuning 18 tokens' embedding of LLaMA suffices to reach the fine-tuning translation performance. Code and model will be released to the public.	翻訳日:2024-02-06 17:39:39 公開日:2024-02-05
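The KS-Lottery abstract above describes comparing parameter distributions before and after fine-tuning with a Kolmogorov-Smirnov test. The following is a minimal sketch of that general idea only, not the authors' implementation. The helper names (`ks_statistic`, `select_shifted_rows`), the threshold, and the toy "embedding rows" are all hypothetical, and the paper's certification procedure is not reproduced here.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the sample points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def select_shifted_rows(before, after, threshold):
    """Indices of rows whose value distribution moved by more than `threshold` (hypothetical rule)."""
    return [
        i
        for i, (row_before, row_after) in enumerate(zip(before, after))
        if ks_statistic(row_before, row_after) > threshold
    ]


# Toy stand-ins for token-embedding rows before and after fine-tuning (made-up numbers).
before = [[0.0, 0.1, 0.2, 0.3], [1.0, 1.1, 1.2, 1.3], [-0.5, 0.0, 0.5, 1.0]]
after = [[0.0, 0.1, 0.2, 0.3], [2.0, 2.1, 2.2, 2.3], [-0.5, 0.0, 0.5, 1.1]]
selected = select_shifted_rows(before, after, threshold=0.5)
```

In this toy setup only the second row shifts enough to be selected. A real pipeline would more likely use a library routine such as SciPy's two-sample KS test and a significance level rather than a hand-picked threshold.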
# 部分的から厳密な構成的パーシングへ From Partial to Strictly Incremental Constituent Parsing ( http://arxiv.org/abs/2402.02782v1 ) ライセンス: Link先を確認	Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares	(参考訳) プレフィックス表現のみに基づいて木を出力できる能力を評価するために,インクリメンタル構成構文解析器について検討した。厳密な左から右への生成言語モデルとツリーデコードモジュールによって導かれ、言語間のインクリメンタル性を強く定義したパーサを構築します。これは漸進性を主張する作業の上に構築されるが、ほとんどはエンコーダまたはデコーダにのみ適用された。最後に,非インクリメンタルモデルおよび部分インクリメンタルモデルに対する分析を行う。 We study incremental constituent parsers to assess their capacity to output trees based on prefix representations alone. Guided by strictly left-to-right generative language models and tree-decoding modules, we build parsers that adhere to a strong definition of incrementality across languages. This builds upon work that asserted incrementality, but that mostly only enforced it on either the encoder or the decoder. Finally, we conduct an analysis against non-incremental and partially incremental models.	翻訳日:2024-02-06 17:39:19 公開日:2024-02-05
# 音響イベント検出のための二重知識蒸留 Dual Knowledge Distillation for Efficient Sound Event Detection ( http://arxiv.org/abs/2402.02781v1 ) ライセンス: Link先を確認	Yang Xiao, Rohan Kumar Das	(参考訳) 音響信号中の特定の音とその時間的位置を認識するには,音事象検出(SED)が不可欠である。これは特に計算リソースが限られているデバイス上のアプリケーションでは困難になる。本稿では,本研究で効率的なsedシステムを開発するために,dual knowledge distillationと呼ばれる新しい枠組みを提案する。提案する二重知識蒸留と時間平均知識蒸留(takd)は,生徒モデルのパラメータの時間平均化から得られた平均学生モデルを用いて行う。これにより、学生モデルは、訓練済みの教師モデルから間接的に学習することができ、安定した知識蒸留が保証される。次に, 留学生モデルに埋込蒸留層を組み込んで文脈学習を促進することを含む, 埋込型特徴蒸留(EEFD)を導入する。基本モデルのパラメータの3分の1しか持たない二重知識蒸留システムであるDCASE 2023 Task 4Aでは,PSDS1とPSDS2で優れた性能を示す。これは、エッジデバイスに理想的なコンパクトなSEDシステムのための二重知識蒸留の重要性を強調している。 Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work. Our proposed dual knowledge distillation commences with temporal-averaging knowledge distillation (TAKD), utilizing a mean student model derived from the temporal averaging of the student model's parameters. This allows the student model to indirectly learn from a pre-trained teacher model, ensuring a stable knowledge distillation. Subsequently, we introduce embedding-enhanced feature distillation (EEFD), which involves incorporating an embedding distillation layer within the student model to bolster contextual learning. On DCASE 2023 Task 4A public evaluation dataset, our proposed SED system with dual knowledge distillation having merely one-third of the baseline model's parameters, demonstrates superior performance in terms of PSDS1 and PSDS2. This highlights the importance of proposed dual knowledge distillation for compact SED systems, which can be ideal for edge devices.	翻訳日:2024-02-06 17:39:09 公開日:2024-02-05
# 高速不正確オラクルによるマトロイド最適化の高速化 Accelerating Matroid Optimization through Fast Imprecise Oracles ( http://arxiv.org/abs/2402.02774v1 ) ライセンス: Link先を確認	Franziska Eberle, Felix Hommelsheim, Alexander Lindermayr, Zhenwei Liu, Nicole Megow, Jens Schlöter	(参考訳) 正確な情報(例えば、交通モデル、データベースシステム、大規模なMLモデル)に対する複雑なモデルのクエリは、しばしば激しい計算と長い応答時間を必要とする。したがって、不正確な結果を素早く返す弱いモデルは、より強いモデルへの少数のクエリで不正確さを解消できるならば有利である。マトロイドの最大重みの基底(多くの組合せ最適化問題のよく知られた一般化)を計算する基本的な問題において、アルゴリズムはクリーンオラクルにアクセスしてマトロイド情報をクエリすることができる。さらに、未知の、潜在的に異なるマトロイドをモデル化する高速だが汚いオラクルをアルゴリズムに装備する。我々は,汚いオラクルの品質に応じた少数のクリーンクエリのみを用いる実用的なアルゴリズムを設計・解析し,任意に質の悪い汚いマトロイドに対する堅牢性を維持しつつ,与えられた問題に対する古典的なアルゴリズムの性能に近づく。特に、我々のアルゴリズムが多くの点で最良であることを証明する。さらに、他のマトロイドオラクルタイプ、非自由な汚いオラクル、その他のマトロイド問題への拡張を概説する。 Querying complex models for precise information (e.g. traffic models, database systems, large ML models) often entails intense computations and results in long response times. Thus, weaker models which give imprecise results quickly can be advantageous, provided inaccuracies can be resolved using few queries to a stronger model. In the fundamental problem of computing a maximum-weight basis of a matroid, a well-known generalization of many combinatorial optimization problems, algorithms have access to a clean oracle to query matroid information. We additionally equip algorithms with a fast but dirty oracle modelling an unknown, potentially different matroid. We design and analyze practical algorithms which only use few clean queries w.r.t. the quality of the dirty oracle, while maintaining robustness against arbitrarily poor dirty matroids, approaching the performance of classic algorithms for the given problem. Notably, we prove that our algorithms are, in many respects, best-possible. Further, we outline extensions to other matroid oracle types, non-free dirty oracles and other matroid problems.	翻訳日:2024-02-06 17:38:47 公開日:2024-02-05
# コントラストディフューザ:コントラスト学習による高戻り状態に向けた計画 Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning ( http://arxiv.org/abs/2402.02772v1 ) ライセンス: Link先を確認	Yixiang Shan, Zhengbang Zhu, Ting Long, Qifan Liang, Yi Chang, Weinan Zhang, Liang Yin	(参考訳) 近年,長期計画のための強化学習における拡散モデルの適用が注目されている。いくつかの拡散法は任意の分布に対する拡散のモデリング能力をうまく活用している。これらの手法は計画のための後続の軌道を生成し、著しい改善を示している。しかし、これらの方法は、単純な基底分布と、異なる状態が異なるリターンを持つというサンプルの多様性を見落としていることによって制限される。彼らは単に拡散を利用してオフラインデータセットの分布を学習し、その状態がオフラインデータセットと同じ分布を共有するトラジェクトリを生成する。その結果、これらのモデルが高リターン状態に達する確率は、データセットの分布に大きく依存する。誘導モデルを装備しても、性能は依然として抑えられている。そこで本稿では,これらの制約に対処するために,生成した軌道の状態を高リターン状態へ引き寄せつつ低リターン状態から遠ざけるリターンコントラスト機構を考案し,ベース分布を改善するCDiffuserという新しい手法を提案する。提案手法の有効性を実証する14種類のD4RLベンチマーク実験を行った。私たちのコードはhttps://anonymous.4open.science/r/ContrastiveDiffuserで公開されています。 Applying diffusion models in reinforcement learning for long-term planning has gained much attention recently. Several diffusion-based methods have successfully leveraged the modeling capabilities of diffusion for arbitrary distributions. These methods generate subsequent trajectories for planning and have demonstrated significant improvement. However, these methods are limited by their plain base distributions and their overlooking of the diversity of samples, in which different states have different returns. They simply leverage diffusion to learn the distribution of offline dataset, generate the trajectories whose states share the same distribution with the offline dataset. As a result, the probability of these models reaching the high-return states is largely dependent on the dataset distribution. Even equipped with the guidance model, the performance is still suppressed. To address these limitations, in this paper, we propose a novel method called CDiffuser, which devises a return contrast mechanism to pull the states in generated trajectories towards high-return states while pushing them away from low-return states to improve the base distribution. 
Experiments on 14 commonly used D4RL benchmarks demonstrate the effectiveness of our proposed method. Our code is publicly available at https://anonymous.4open.science/r/ContrastiveDiffuser.	翻訳日:2024-02-06 17:38:17 公開日:2024-02-05
# 正規化の指導から学ぶ: 一般化可能な相関は模倣しやすい Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate ( http://arxiv.org/abs/2402.02769v1 ) ライセンス: Link先を確認	Can Jin, Tong Che, Hongwu Peng, Yiyuan Li, Marco Pavone	(参考訳) 一般化は機械学習の中心的な課題である。本研究では,深層ニューラルネットワークのための新しい正規化手法であるlearning from teaching (lot)を提案する。簡潔で抽象的なパターンを捉える人間の能力に触発されて、一般化可能な相関は教えやすいと仮定する。 LoTはこの概念を運用し、補助的学習者によるメインモデルの一般化を改善する。学生学習者は、主モデルによって訓練され、主モデルを改善し、フィードバックを提供することで、より一般化し、教示可能な相関関係を捉える。コンピュータビジョン,自然言語処理,強化学習など,いくつかの領域にわたる実験結果から,LoTの導入は,本来のトレーニングデータ上でのトレーニングモデルに比べて,大きなメリットをもたらすことが示された。これは、データ内の複雑なパターンの沼に陥ることなく、一般化可能な情報を識別するLoTの有効性を示唆している。 Generalization remains a central challenge in machine learning. In this work, we propose Learning from Teaching (LoT), a novel regularization technique for deep neural networks to enhance generalization. Inspired by the human ability to capture concise and abstract patterns, we hypothesize that generalizable correlations are expected to be easier to teach. LoT operationalizes this concept to improve the generalization of the main model with auxiliary student learners. The student learners are trained by the main model and improve the main model to capture more generalizable and teachable correlations by providing feedback. Our experimental results across several domains, including Computer Vision, Natural Language Processing, and Reinforcement Learning, demonstrate that the introduction of LoT brings significant benefits compared to merely training models on the original training data. It suggests the effectiveness of LoT in identifying generalizable information without falling into the swamp of complex patterns in data, making LoT a valuable addition to the current machine learning frameworks.	翻訳日:2024-02-06 17:37:51 公開日:2024-02-05
# 検索・検索拡張生成のためのリスト対応リランキング・トランケーション・ジョイントモデル List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation ( http://arxiv.org/abs/2402.02764v1 ) ライセンス: Link先を確認	Shicheng Xu, Liang Pang, Jun Xu, Huawei Shen, Xueqi Cheng	(参考訳) 情報検索(IR)の結果は通常、人間のウェブ検索や大規模言語モデル(LLM)の検索拡張生成など、候補文書のランク付けされたリストの形式で提示される。リストアウェア検索は、リストレベルのコンテキスト特徴をキャプチャしてより良いリストを返すことを目的としており、主にリランキングとトランケーションを含む。リランキングはリスト内の文書を細かく再スコアする。トランケーションは、ランクリストのカットオフポイントを動的に決定し、全体的な関連性と無関係な文書からの誤情報の回避との間のトレードオフを達成する。以前の研究では、それらを2つの別々のタスクとして扱い、個別にモデル化した。しかし、分離は最適ではない。まず,2つのタスク間でランキングリストのコンテキスト情報を共有することは困難である。第二に、分離されたパイプラインは通常エラー蓄積問題に直面し、リランキングステージからの小さなエラーがトランケーションステージに大きく影響する可能性がある。これらの問題を解決するために,2つのタスクを同時に実行可能なReranking-Truncation Joint Model (GenRT)を提案する。 GenRTは、エンコーダ-デコーダアーキテクチャに基づく生成パラダイムによるリランキングとトランケーションを統合している。また, 協調最適化のための新しい損失関数を設計し, モデルが両方のタスクを学習できるようにする。ジョイントモデルによるパラメータの共有は、この2つのタスクの共通モデリング情報を最大限に活用することにつながる。さらに、2つのタスクを同時に実行し、異なるステージ間のエラー蓄積問題を解決するために協調最適化する。公開learning-to-rankベンチマークとオープンドメインQ&Aタスクを用いた実験により,Web検索および検索拡張 LLM におけるリランキングタスクとトランケーションタスクの両方においてSOTA性能が達成された。 The results of information retrieval (IR) are usually presented in the form of a ranked list of candidate documents, such as web search for humans and retrieval-augmented generation for large language models (LLMs). List-aware retrieval aims to capture the list-level contextual features to return a better list, mainly including reranking and truncation. Reranking finely re-scores the documents in the list. Truncation dynamically determines the cut-off point of the ranked list to achieve the trade-off between overall relevance and avoiding misinformation from irrelevant documents. Previous studies treat them as two separate tasks and model them separately. However, the separation is not optimal. First, it is hard to share the contextual information of the ranking list between the two tasks. Second, the separate pipeline usually meets the error accumulation problem, where the small error from the reranking stage can largely affect the truncation stage. 
To solve these problems, we propose a Reranking-Truncation joint model (GenRT) that can perform the two tasks concurrently. GenRT integrates reranking and truncation via a generative paradigm based on an encoder-decoder architecture. We also design novel loss functions for joint optimization to make the model learn both tasks. Sharing parameters by the joint model is conducive to making full use of the common modeling information of the two tasks. Besides, the two tasks are performed concurrently and co-optimized to solve the error accumulation problem between separate stages. Experiments on public learning-to-rank benchmarks and open-domain Q&A tasks show that our method achieves SOTA performance on both reranking and truncation tasks for web search and retrieval-augmented LLMs.	翻訳日:2024-02-06 17:37:22 公開日:2024-02-05
# 改良hough変換に基づく伝送線路検出 Transmission Line Detection Based on Improved Hough Transform ( http://arxiv.org/abs/2402.02761v1 ) ライセンス: Link先を確認	Wei Song, Pei Li, Man Wang	(参考訳) uav(unmanned aerial vehicle)画像における伝送線路の低検出精度と高い偽陽性率の課題に対処するために,線形特徴と空間分布について検討する。複雑な背景における伝送線路検出に適した拡張確率Hough変換手法を提案する。送信線の初期前処理にHessian行列を用い,境界探索と画素行分割を利用して,送信線領域を背景と区別する。誤検出と誤検出の両方を著しく削減し,伝送線路同定の精度を向上させる。実験により, 従来のハフ変換法やランダムハフ変換法に比べて, 画像の処理速度が速いだけでなく, 優れた検出結果が得られることを示した。 To address the challenges of low detection accuracy and high false positive rates of transmission lines in UAV (Unmanned Aerial Vehicle) images, we explore the linear features and spatial distribution. We introduce an enhanced stochastic Hough transform technique tailored for detecting transmission lines in complex backgrounds. By employing the Hessian matrix for initial preprocessing of transmission lines, and utilizing boundary search and pixel row segmentation, our approach distinguishes transmission line areas from the background. We significantly reduce both false positives and missed detections, thereby improving the accuracy of transmission line identification. Experiments demonstrate that our method not only processes images more rapidly, but also yields superior detection results compared to conventional and random Hough transform methods.	翻訳日:2024-02-06 17:36:00 公開日:2024-02-05
# 解釈可能な音響分類のための焦点変調ネットワーク Focal Modulation Networks for Interpretable Sound Classification ( http://arxiv.org/abs/2402.02754v1 ) ライセンス: Link先を確認	Luca Della Libera, Cem Subakan, Mirco Ravanelli	(参考訳) ディープニューラルネットワークの成功の増加は、その固有のブラックボックスの性質に対する懸念を高め、解釈可能性と信頼に関する課題を提起している。視覚と言語における解釈技術は広く研究されてきたが、音声領域における解釈可能性については、主にポストホックな説明に焦点が当てられている。本稿では,最近提案されている注意のない焦点変調ネットワーク(focalnets)を用いて,音声領域における可読性 by-design の問題に対処する。本研究では,FocalNetsを環境音の分類タスクに適用し,その解釈可能性特性をESC-50データセット上で評価する。本手法は, 精度と解釈性の両方において, 同様の大きさの視覚トランスフォーマーよりも優れている。さらに、音声領域におけるポストホック解釈に特化して設計されたPIQと競合する。 The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust. While there has been extensive exploration of interpretation techniques in vision and language, interpretability in the audio domain has received limited attention, primarily focusing on post-hoc explanations. This paper addresses the problem of interpretability by-design in the audio domain by utilizing the recently proposed attention-free focal modulation networks (FocalNets). We apply FocalNets to the task of environmental sound classification for the first time and evaluate their interpretability properties on the popular ESC-50 dataset. Our method outperforms a similarly sized vision transformer both in terms of accuracy and interpretability. Furthermore, it is competitive against PIQ, a method specifically designed for post-hoc interpretation in the audio domain.	翻訳日:2024-02-06 17:35:45 公開日:2024-02-05
# 共有量子コンピューティング環境におけるクロストーク攻撃と防御 Crosstalk Attacks and Defence in a Shared Quantum Computing Environment ( http://arxiv.org/abs/2402.02753v1 ) ライセンス: Link先を確認	Benjamin Harper, Behnam Tonekaboni, Bahar Goldozian, Martin Sevior, Muhammad Usman	(参考訳) 量子コンピューティングは、古典的なコンピュータでは難解な問題に対する解決策を提供する可能性があるが、現在の世代の量子コンピュータの精度は、ノイズや漏れ、クロストーク、デフォーカス、振幅減衰といったエラーの影響に悩まされている。量子コンピュータへのアクセスは、ほとんどクラウドベースのサービスを通じて共有環境内にあるため、敵はクロストークノイズを利用して近くの量子ビットの量子計算を中断し、量子回路を慎重に設計し、意図的に間違った答えを導くことができる。本稿では,IBM Quantum コンピュータ上でのトモグラフィによるクロストークノイズの広さと特性を分析し,クロストークシミュレーションモデルを改良した。その結果、クロストークノイズはIBMの量子ハードウェアにおける重大なエラーの原因であり、クロストークベースの攻撃は共有環境における量子コンピューティングの脅威となることが示唆された。 IBMハードウェアに対してベンチマークしたクロストークシミュレータに基づいて、クロストーク攻撃の影響を評価し、クロストーク効果を緩和するための戦略を開発する。シミュレーションのシステマティックセットを通じて,回路分離,強化学習によるキュービット割り当て最適化,およびオブザーバ量子ビットの利用という3つのクロストーク攻撃緩和戦略の有効性を評価し,それぞれが様々な成功度でクロストーク攻撃を克服し,共有プラットフォームにおける量子コンピューティングの確保を支援することを示す。 Quantum computing has the potential to provide solutions to problems that are intractable on classical computers, but the accuracy of the current generation of quantum computers suffer from the impact of noise or errors such as leakage, crosstalk, dephasing, and amplitude damping among others. As the access to quantum computers is almost exclusively in a shared environment through cloud-based services, it is possible that an adversary can exploit crosstalk noise to disrupt quantum computations on nearby qubits, even carefully designing quantum circuits to purposely lead to wrong answers. In this paper, we analyze the extent and characteristics of crosstalk noise through tomography conducted on IBM Quantum computers, leading to an enhanced crosstalk simulation model. Our results indicate that crosstalk noise is a significant source of errors on IBM quantum hardware, making crosstalk based attack a viable threat to quantum computing in a shared environment. 
Based on our crosstalk simulator benchmarked against IBM hardware, we assess the impact of crosstalk attacks and develop strategies for mitigating crosstalk effects. Through a systematic set of simulations, we assess the effectiveness of three crosstalk attack mitigation strategies, namely circuit separation, qubit allocation optimization via reinforcement learning, and the use of spectator qubits, and show that they all overcome crosstalk attacks with varying degrees of success and help to secure quantum computing in a shared platform.	翻訳日:2024-02-06 17:35:02 公開日:2024-02-05
# KIVI: KVキャッシュのためのチューニング不要な非対称2ビット量子化 KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ( http://arxiv.org/abs/2402.02750v1 ) ライセンス: Link先を確認	Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, Xia Hu	(参考訳) 大規模言語モデル(LLM)の効率的な提供には,要求毎のコスト削減のために,多数のリクエストをバッチ処理する必要がある。しかし、再計算を避けるためにアテンションキーと値を保存するキーバリュー(KV)キャッシュは、メモリ要求を大幅に増加させ、スピードとメモリ使用における新たなボトルネックとなる。このメモリ要求は、バッチサイズが大きく、コンテキスト長が長いほど増加する。さらに、推論速度はKVキャッシュのサイズによって制限される。GPUのSRAMは、生成されるトークン毎にメインGPUメモリからKVキャッシュ全体をロードする必要があり、このプロセス中に計算コアがアイドルになるためである。 KVキャッシュサイズを減らすための単純で効果的な解決策は量子化であり、KVキャッシュが占める総バイト数を削減する。しかし、KVキャッシュ量子化の難しさと限界を理解するためにKVキャッシュの要素分布を探索した詳細な研究は不足している。このギャップを埋めるために、人気のあるLLMのKVキャッシュにおける要素分布に関する総合的研究を行った。以上の結果から,キーキャッシュはチャネル単位で量子化すべき,すなわちチャネル次元に沿って要素をグループ化し,まとめて量子化すべきである。対照的に、値キャッシュはトークン毎に量子化されるべきである。この解析から,チューニングフリーな2ビットKVキャッシュ量子化アルゴリズムKIVIを開発した。ハードウェアフレンドリーな実装により、KIVIはLlama (Llama-2)、Falcon、Mistralモデルでほぼ同じ品質を維持しつつ、ピークメモリ使用量(モデル重みを含む)を$\mathbf{2.6\times}$削減できる。このメモリ使用量の削減は、最大$\mathbf{4\times}$のバッチサイズを可能にし、実際のLLM推論ワークロードで$\mathbf{2.35\times \sim 3.47\times}$のスループットをもたらす。ソースコードはhttps://github.com/jy-yuan/KIVIで入手できる。 Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. This memory demand increases with larger batch sizes and longer context lengths. Additionally, the inference speed is limited by the size of KV cache, as the GPU's SRAM must load the entire KV cache from the main GPU memory for each token generated, causing the computational core to be idle during this process. A straightforward and effective solution to reduce KV cache size is quantization, which decreases the total bytes taken by KV cache. 
However, there is a lack of in-depth studies that explore the element distribution of KV cache to understand the hardness and limitation of KV cache quantization. To fill the gap, we conducted a comprehensive study on the element distribution in KV cache of popular LLMs. Our findings indicate that the key cache should be quantized per-channel, i.e., group elements along the channel dimension and quantize them together. In contrast, the value cache should be quantized per-token. From this analysis, we developed a tuning-free 2bit KV cache quantization algorithm, named KIVI. With the hardware-friendly implementation, KIVI can enable Llama (Llama-2), Falcon, and Mistral models to maintain almost the same quality while using $\mathbf{2.6\times}$ less peak memory usage (including the model weight). This reduction in memory usage enables up to $\mathbf{4\times}$ larger batch size, bringing $\mathbf{2.35\times \sim 3.47\times}$ throughput on real LLM inference workload. The source code is available at https://github.com/jy-yuan/KIVI.	翻訳日:2024-02-06 17:34:36 公開日:2024-02-05
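The distinction the KIVI abstract draws between per-channel and per-token quantization can be illustrated with a small toy. This is a sketch under stated assumptions, not the KIVI implementation: the function names are hypothetical, the rounding scheme is a plain min/max asymmetric quantizer, the matrix is made-up data with one large-magnitude channel, and details such as group sizes or any full-precision residual are ignored.

```python
def quantize_group(xs, bits=2):
    """Asymmetric min/max quantization of one group to 2**bits levels."""
    levels = (1 << bits) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in xs]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate real values."""
    return [c * scale + lo for c in codes]


def fake_quant_per_token(mat, bits=2):
    """Quantize and dequantize each row (one token) with its own scale and zero point."""
    return [dequantize_group(*quantize_group(row, bits)) for row in mat]


def fake_quant_per_channel(mat, bits=2):
    """Quantize and dequantize each column (one channel) with its own scale and zero point."""
    columns = [list(col) for col in zip(*mat)]
    deq_columns = fake_quant_per_token(columns, bits)
    return [list(row) for row in zip(*deq_columns)]


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Rows are tokens, columns are channels; the last channel has a much larger magnitude.
keys = [
    [0.1, 0.2, 0.3, 10.0],
    [0.2, 0.1, 0.4, 12.0],
    [0.3, 0.4, 0.1, 11.0],
    [0.4, 0.3, 0.2, 13.0],
]
err_channel = max_abs_error(keys, fake_quant_per_channel(keys))
err_token = max_abs_error(keys, fake_quant_per_token(keys))
```

On this toy matrix the per-token error is much larger than the per-channel error, because the large channel stretches every row's quantization range. That matches the intuition behind quantizing keys per channel when a few channels dominate in magnitude; whether real key caches look like this is the empirical claim of the paper, not something this sketch shows.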
# 高次元ベイズ最適化に必要な標準ガウス過程 Standard Gaussian Process is All You Need for High-Dimensional Bayesian Optimization ( http://arxiv.org/abs/2402.02746v1 ) ライセンス: Link先を確認	Zhitong Xu, Shandian Zhe	(参考訳) 標準ガウス過程 (GP) を持つベイズ最適化 (BO) は高次元最適化問題では有効ではないという長年にわたる広く信じられてきた。この認識の一部は、GPが共分散モデリングと関数推定のために高次元入力に苦しむ直観に由来するかもしれない。これらの懸念は妥当に見えるが、この信念を支持する実証的な証拠は不足している。本稿では,高次元最適化のための様々な合成および実世界のベンチマーク問題に対して,標準GP回帰を用いたBOを体系的に検討した。驚くべきことに、標準gpのパフォーマンスは一貫して最高のものとなり、特に高次元最適化のために設計された既存のboメソッドを大きなマージンで上回っている。ステレオタイプとは対照的に,標準GPは高次元対象関数の学習に有効な代理として機能することがわかった。強い構造的仮定がなければ、標準 GP を持つ BO は高次元最適化に優れるだけでなく、対象関数内の様々な構造を調節する上でも堅牢である。さらに、標準GPでは、より複雑な代理モデルで必要とされる高価なマルコフ-チェインモンテカルロサンプリング(MCMC)の必要性を排除し、最大推定だけを用いることで、期待できる最適化性能を達成することができる。そこで我々は,高次元問題に対する標準ボのポテンシャルの再評価と詳細な研究を提唱する。 There has been a long-standing and widespread belief that Bayesian Optimization (BO) with standard Gaussian process (GP), referred to as standard BO, is ineffective in high-dimensional optimization problems. This perception may partly stem from the intuition that GPs struggle with high-dimensional inputs for covariance modeling and function estimation. While these concerns seem reasonable, empirical evidence supporting this belief is lacking. In this paper, we systematically investigated BO with standard GP regression across a variety of synthetic and real-world benchmark problems for high-dimensional optimization. Surprisingly, the performance with standard GP consistently ranks among the best, often outperforming existing BO methods specifically designed for high-dimensional optimization by a large margin. Contrary to the stereotype, we found that standard GP can serve as a capable surrogate for learning high-dimensional target functions. Without strong structural assumptions, BO with standard GP not only excels in high-dimensional optimization but also proves robust in accommodating various structures within the target functions. 
Furthermore, with standard GP, achieving promising optimization performance is possible by only using maximum likelihood estimation, eliminating the need for expensive Markov-Chain Monte Carlo (MCMC) sampling that might be required by more complex surrogate models. We thus advocate for a re-evaluation and in-depth study of the potential of standard BO in addressing high-dimensional problems.	翻訳日:2024-02-06 17:33:59 公開日:2024-02-05
# Koopman演算子を用いた局所過勾配推定 Glocal Hypergradient Estimation with Koopman Operator ( http://arxiv.org/abs/2402.02741v1 ) ライセンス: Link先を確認	Ryuichiro Hataya and Yoshinobu Kawahara	(参考訳) 勾配に基づくハイパーパラメータ最適化手法は、ハイパーパラメータに対するメタ基準の勾配である過勾配を用いてハイパーパラメータを更新する。これまでの研究では、2つの異なるアップデート戦略を使用していた: モデルトレーニングを完了した後に得られたグローバルハイパーグレードを使用してハイパーパラメータを最適化するか、いくつかのモデル更新ごとに派生したローカルハイパーグレードを最適化する。グローバル・ハイパーグレードエントは信頼性を提供するが、計算コストは重要であり、逆にローカル・ハイパーグレードエントは速度を提供するが、しばしば最適ではない。本稿では,「グローバル」品質と「ローカル」効率を融合したglocal hypergradient estimationを提案する。この目的のために、我々はKoopman演算子理論を用いて超勾配の力学を線形化し、大域超勾配を局所超勾配の軌道を用いてのみ効率的に近似することができる。その結果、推定されたグローバル・ハイパーグレードエントを用いて高パラメータを柔軟に最適化し、信頼性と効率を同時に達成することができる。最適化器の最適化を含むハイパーパラメータ最適化の数値実験を通じて,glocal hypergradient estimationの有効性を示す。 Gradient-based hyperparameter optimization methods update hyperparameters using hypergradients, gradients of a meta criterion with respect to hyperparameters. Previous research used two distinct update strategies: optimizing hyperparameters using global hypergradients obtained after completing model training or local hypergradients derived after every few model updates. While global hypergradients offer reliability, their computational cost is significant; conversely, local hypergradients provide speed but are often suboptimal. In this paper, we propose glocal hypergradient estimation, blending "global" quality with "local" efficiency. To this end, we use the Koopman operator theory to linearize the dynamics of hypergradients so that the global hypergradients can be efficiently approximated only by using a trajectory of local hypergradients. Consequently, we can optimize hyperparameters greedily using estimated global hypergradients, achieving both reliability and efficiency simultaneously. Through numerical experiments of hyperparameter optimization, including optimization of optimizers, we demonstrate the effectiveness of the glocal hypergradient estimation.	翻訳日:2024-02-06 17:33:34 公開日:2024-02-05
# DisDet: 拡散モデルに対するバックドア攻撃の検出可能性を探る DisDet: Exploring Detectability of Backdoor Attack on Diffusion Models ( http://arxiv.org/abs/2402.02739v1 ) ライセンス: Link先を確認	Yang Sui, Huy Phan, Jinqi Xiao, Tianfang Zhang, Zijie Tang, Cong Shi, Yan Wang, Yingying Chen, Bo Yuan	(参考訳) エキサイティングな生成AIの時代、拡散モデルは、さまざまなデータモダリティのための非常に強力で広く採用されているコンテンツ生成および編集ツールとして現れ、潜在的なセキュリティリスクの研究が極めて必要かつ重要になっている。最近では、いくつかの先駆的な研究がバックドア攻撃に対する拡散モデルの脆弱性を示し、この人気のある基本的なAI技術のセキュリティ上の課題について詳細な分析と調査を求めている。本稿では, バックドア拡散モデルにおける汚染されたノイズ入力の検出可能性について, 既存の研究ではほとんど検討されていない重要な性能指標として, 初めて系統的に検討する。防御側の観点から,既存の拡散バックドア攻撃におけるトリガーパターンの特性を解析し,トロイの木馬検出における分布不一致の重要な役割を明らかにする。そこで本研究では, 汚染された入力ノイズを効果的に検出できる低コストのトリガー検出機構を提案する。次に,攻撃側からも同様の問題を研究するためのさらなる一歩を踏み出し,我々の提案する検出スキームを回避できる気づかれにくいトリガを学習できるバックドア攻撃戦略を提案する。各種拡散モデルおよびデータセットの実験的評価は、提案したトリガー検出および検出回避攻撃戦略の有効性を示す。トリガー検出については,既存の研究で使用されているトロイの木馬トリガに対して100%の検出率が得られる。提案するステルストリガー設計手法は, 汚染ノイズ入力の分布を良性ノイズの分布に近づけるようエンドツーエンド学習を行い, バックドア拡散モデルにおいて, 非常に高い攻撃性能と良性性能を保ちつつ100%近い検出通過率を実現する。 In the exciting generative AI era, the diffusion model has emerged as a very powerful and widely adopted content generation and editing tool for various data modalities, making the study of their potential security risks very necessary and critical. Very recently, some pioneering works have shown the vulnerability of the diffusion model against backdoor attacks, calling for in-depth analysis and investigation of the security challenges of this popular and fundamental AI technique. In this paper, for the first time, we systematically explore the detectability of the poisoned noise input for the backdoored diffusion models, an important performance metric yet little explored in the existing works. Starting from the perspective of a defender, we first analyze the properties of the trigger pattern in the existing diffusion backdoor attacks, discovering the important role of distribution discrepancy in Trojan detection. 
Based on this finding, we propose a low-cost trigger detection mechanism that can effectively identify the poisoned input noise. We then take a further step to study the same problem from the attack side, proposing a backdoor attack strategy that can learn the unnoticeable trigger to evade our proposed detection scheme. Empirical evaluations across various diffusion models and datasets demonstrate the effectiveness of the proposed trigger detection and detection-evading attack strategy. For trigger detection, our distribution discrepancy-based solution can achieve a 100% detection rate for the Trojan triggers used in the existing works. For evading trigger detection, our proposed stealthy trigger design approach performs end-to-end learning to make the distribution of poisoned noise input approach that of benign noise, enabling nearly 100% detection pass rate with very high attack and benign performance for the backdoored diffusion models.	翻訳日:2024-02-06 17:33:16 公開日:2024-02-05
# 教師なし時系列異常検出のためのVAEの再検討:周波数視点 Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective ( http://arxiv.org/abs/2402.02820v1 ) ライセンス: Link先を確認	Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie	(参考訳) 時系列異常検出(AD)は、Webシステムにおいて重要な役割を果たす。様々なウェブシステムは、リアルタイムで異常を監視し識別するために時系列データに依存し、診断と修復の手順を開始する。変分オートエンコーダ(VAE)は、異常検出に有用な優れたノイズ除去能力のために、近年人気を集めている。しかし,vaeに基づく手法では,長周期の異種パターンと短周期の詳細な傾向を同時に捉えることが困難であることが明らかとなった。これらの課題に対処するため,周波数拡張型条件変分オートエンコーダ (FCVAE) を提案する。正確なADを保証するため、FCVAEは、グローバルとローカルの両方の周波数特徴を条件付き変分オートエンコーダ(CVAE)の条件に並列に統合し、通常のデータ再構成の精度を大幅に向上させる革新的なアプローチを採用している。この手法は、注意深く設計された「ターゲットアテンション」機構とともに、周波数領域から最も有用な情報を選び、短周期トレンド構築を改善する。我々のFCVAEは、パブリックデータセットと大規模クラウドシステムで評価されており、その結果、最先端の手法よりも優れていることが示された。これにより,現在のvaeに基づく異常検出モデルの限界に対処する上で,本手法の実用的適用性が確認できる。 Time series Anomaly Detection (AD) plays a crucial role for web systems. Various web systems rely on time series data to monitor and identify anomalies in real time, as well as to initiate diagnosis and remediation procedures. Variational Autoencoders (VAEs) have gained popularity in recent decades due to their superior de-noising capabilities, which are useful for anomaly detection. However, our study reveals that VAE-based methods face challenges in capturing long-periodic heterogeneous patterns and detailed short-periodic trends simultaneously. To address these challenges, we propose Frequency-enhanced Conditional Variational Autoencoder (FCVAE), a novel unsupervised AD method for univariate time series. To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data. 
Together with a carefully designed "target attention" mechanism, our approach allows the model to pick the most useful information from the frequency domain for better short-periodic trend construction. Our FCVAE has been evaluated on public datasets and a large-scale cloud system, and the results demonstrate that it outperforms state-of-the-art methods. This confirms the practical applicability of our approach in addressing the limitations of current VAE-based anomaly detection models.	翻訳日:2024-02-06 17:26:25 公開日:2024-02-05
# プリ・イン・ポスト処理による線形不等式制約付きベイズ・オプティカルフェア分類 Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing ( http://arxiv.org/abs/2402.02817v1 ) ライセンス: Link先を確認	Xianli Zeng, Guang Cheng and Edgar Dobriban	(参考訳) 機械学習アルゴリズムは保護されたグループに異なる影響を与える可能性がある。そこで我々は,与えられた群フェアネス制約による分類誤差を最小限に抑えるため,ベイズ最適公正分類法を開発した。本稿では,確率的分類器の線形関数である \emph{linear disparity measures} の概念と,群的回帰関数においても線型である \emph{bilinear disparity measures} を導入する。人口格差、機会の平等、予測平等からの逸脱など、いくつかの一般的な格差対策が双線形であることを示します。ベイズ最適公正分類器の形式は、ネマン・ピアソン補題との接続を明らかにすることによって、単一の線形不均等測度の下で得られる。双線型差分法では、ベイズ最適公正分類器はグループワイドのしきい値規則となる。提案手法は,複数のフェアネス制約(等化オッズなど)を扱うことができ,予測フェーズでは保護属性が使用できない場合の共通シナリオも扱うことができる。理論結果の活用により,双線型不等式制約下でフェアベイズ最適分類器を学習する手法を考案する。提案手法は,事前処理(フェアアップおよびダウンサンプリング),インプロセッシング(フェアコストセンシティブな分類),ポストプロセッシング(フェアプラグインルール)という,フェアネスアウェア分類の一般的な3つのアプローチをカバーする。本手法は, 最適値-精度トレードオフを達成しつつ, 相違を直接制御する。提案手法が既存のアルゴリズムと良好に比較できることを実証的に示す。 Machine learning algorithms may have disparate impacts on protected groups. To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints. We introduce the notion of \emph{linear disparity measures}, which are linear functions of a probabilistic classifier; and \emph{bilinear disparity measures}, which are also linear in the group-wise regression functions. We show that several popular disparity measures -- the deviations from demographic parity, equality of opportunity, and predictive equality -- are bilinear. We find the form of Bayes-optimal fair classifiers under a single linear disparity measure, by uncovering a connection with the Neyman-Pearson lemma. For bilinear disparity measures, Bayes-optimal fair classifiers become group-wise thresholding rules. Our approach can also handle multiple fairness constraints (such as equalized odds), and the common scenario when the protected attribute cannot be used at the prediction phase. 
Leveraging our theoretical results, we design methods that learn fair Bayes-optimal classifiers under bilinear disparity constraints. Our methods cover three popular approaches to fairness-aware classification, via pre-processing (Fair Up- and Down-Sampling), in-processing (Fair Cost-Sensitive Classification) and post-processing (a Fair Plug-In Rule). Our methods control disparity directly while achieving near-optimal fairness-accuracy tradeoffs. We show empirically that our methods compare favorably to existing algorithms.	翻訳日:2024-02-06 17:25:51 公開日:2024-02-05
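The abstract above states that, for bilinear disparity measures, Bayes-optimal fair classifiers become group-wise thresholding rules. The following is a minimal sketch of that idea only, not the authors' method: the scores, group labels, and hand-picked thresholds are hypothetical, and the paper's procedure for deriving the thresholds is not reproduced here.

```python
from typing import Dict, List


def groupwise_threshold(scores: List[float], groups: List[str],
                        thresholds: Dict[str, float]) -> List[int]:
    """Predict 1 when a score exceeds the threshold of its own group."""
    return [int(s > thresholds[g]) for s, g in zip(scores, groups)]


def demographic_parity_gap(preds: List[int], groups: List[str]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = []
    for g in sorted(set(groups)):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)


# Hypothetical toy data: estimated P(Y=1 | x, a) and a protected attribute a.
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# One shared threshold versus hand-picked (assumed) per-group thresholds.
single = groupwise_threshold(scores, groups, {"a": 0.5, "b": 0.5})
adjusted = groupwise_threshold(scores, groups, {"a": 0.5, "b": 0.4})

print(demographic_parity_gap(single, groups))
print(demographic_parity_gap(adjusted, groups))
```

In this toy case, lowering the threshold for one group closes the gap in positive-prediction rates; how to choose such thresholds optimally is what the paper itself addresses.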
# 勧告の両面公正性 Intersectional Two-sided Fairness in Recommendation ( http://arxiv.org/abs/2402.02816v1 ) ライセンス: Link先を確認	Yifan Wang, Peijie Sun, Weizhi Ma, Min Zhang, Yuan Zhang, Peng Jiang, Shaoping Ma	(参考訳) 推薦システム(RS)の公正性は近年注目を集めている。関係する利害関係者に基づいて、RSの公平性は、ユーザフェアネス、アイテムフェアネス、およびユーザフェアネスとアイテムフェアネスの両方を同時に考慮する両側フェアネスに分けられる。しかし,本論文における実世界データに関する実証的研究により,RSが両面公正であっても,交差する二面不公平性は依然として存在すると論じる。この問題を軽減するため,我々は交叉2面フェアネスレコメンデーション(itfr)と呼ばれる新しいアプローチを提案する。本手法は,不利なグループを知覚するためにシャープネス認識損失を利用し,協調的損失バランスを用いて異なる交叉群に対して一貫した識別能力を開発する。さらに、予測スコア正規化を利用して、正の予測スコアを異なる交叉群で相当に評価する。 3つの公開データセットの大規模な実験と分析により,提案手法は両面の不公平性を効果的に軽減し,従来の最先端手法を一貫して上回ることを示す。 Fairness of recommender systems (RS) has attracted increasing attention recently. Based on the involved stakeholders, the fairness of RS can be divided into user fairness, item fairness, and two-sided fairness which considers both user and item fairness simultaneously. However, we argue that the intersectional two-sided unfairness may still exist even if the RS is two-sided fair, which is observed and shown by empirical studies on real-world data in this paper, and has not been well-studied previously. To mitigate this problem, we propose a novel approach called Intersectional Two-sided Fairness Recommendation (ITFR). Our method utilizes a sharpness-aware loss to perceive disadvantaged groups, and then uses collaborative loss balance to develop consistent distinguishing abilities for different intersectional groups. Additionally, predicted score normalization is leveraged to align positive predicted scores to fairly treat positives in different intersectional groups. Extensive experiments and analyses on three public datasets show that our proposed approach effectively alleviates the intersectional two-sided unfairness and consistently outperforms previous state-of-the-art methods.	翻訳日:2024-02-06 17:25:21 公開日:2024-02-05
# 統計・物理・超学習グラフモデルによる都市大気汚染状態の推定 State estimation of urban air pollution with statistical, physical, and super-learning graph models ( http://arxiv.org/abs/2402.02812v1 ) ライセンス: Link先を確認	Matthieu Dolbeault, Olga Mula and Agustín Somacal	(参考訳) 都市大気汚染マップのリアルタイム再構築の問題点を考察する。このタスクは、利用可能なデータの不均一なソース、直接測定の不足、ノイズの存在、考慮すべき大きな表面のために困難である。本研究は,都市グラフ上の問題に対するポーズに基づく異なる再構成手法を提案する。私たちの戦略は、完全なデータ駆動、物理駆動、ハイブリッドに分類でき、それらをスーパーラーニングモデルと組み合わせます。この手法の性能は、フランスのパリ市内で試験されている。 We consider the problem of real-time reconstruction of urban air pollution maps. The task is challenging due to the heterogeneous sources of available data, the scarcity of direct measurements, the presence of noise, and the large surfaces that need to be considered. In this work, we introduce different reconstruction methods based on posing the problem on city graphs. Our strategies can be classified as fully data-driven, physics-driven, or hybrid, and we combine them with super-learning models. The performance of the methods is tested in the case of the inner city of Paris, France.	翻訳日:2024-02-06 17:24:53 公開日:2024-02-05
# mciにおける神経変性のマルチスケールfmri時系列解析 Multi-scale fMRI time series analysis for understanding neurodegeneration in MCI ( http://arxiv.org/abs/2402.02811v1 ) ライセンス: Link先を確認	Ammu R., Debanjali Bhattacharya, Ameiy Acharya, Ninad Aithal and Neelam Sinha	(参考訳) 本研究では、休息状態のfmriボリュームに適用するマルチスケールビュー(脳ネットワークレベルと局所スケールを意味するグローバルスケール)にまたがる手法を提案する。深層学習に基づく分類は神経変性の理解に利用される。提案手法の新規性は、2つの極端な分析スケールを利用することである。あるブランチは、グラフ分析フレームワーク内のネットワーク全体を考慮している。同時に、第2のブランチはネットワーク内の各ROIを独立して精査し、ダイナミクスの進化に焦点を当てる。グラフベースのアプローチでは、各ROIがノードである1つのグラフで対象をプロファイルするために部分的相関を採用し、参加レベルの違いに関する洞察を提供する。対照的に、非線形解析では、被写体をマルチチャネル2次元画像としてプロファイルするために再帰プロットを用い、基礎となるダイナミクスの区別を明らかにする。提案手法は、ADNIデータセットから得られた50個の健康制御(HC)と50個のマイルド認知障害(MCI)のコホートを分類するために用いられる。その結果, (1) MCIにおけるPCCなどのROIの低下 (2) MCIにおける後頭葉の活性は, HC (3) では解析できないが, MCIのすべてのROIは時系列において予測可能性が高いことがわかった。 In this study, we present a technique that spans multi-scale views (global scale -- meaning brain network-level and local scale -- examining each individual ROI that constitutes the network) applied to resting-state fMRI volumes. Deep learning based classification is utilized in understanding neurodegeneration. The novelty of the proposed approach lies in utilizing two extreme scales of analysis. One branch considers the entire network within graph-analysis framework. Concurrently, the second branch scrutinizes each ROI within a network independently, focusing on evolution of dynamics. For each subject, graph-based approach employs partial correlation to profile the subject in a single graph where each ROI is a node, providing insights into differences in levels of participation. In contrast, non-linear analysis employs recurrence plots to profile a subject as a multichannel 2D image, revealing distinctions in underlying dynamics. The proposed approach is employed for classification of a cohort of 50 healthy control (HC) and 50 Mild Cognitive Impairment (MCI), sourced from ADNI dataset. 
Results point to: (1) reduced activity in ROIs such as PCC in MCI; (2) greater activity in the occipital region in MCI, which is not seen in HC; and (3) when analysed for dynamics, greater predictability in the time series of all ROIs in MCI.	翻訳日:2024-02-06 17:24:44 公開日:2024-02-05
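The abstract above profiles each ROI time series with recurrence plots. As a rough illustration of what a recurrence plot encodes, the sketch below builds a binary recurrence matrix for a one-dimensional series. The threshold `eps`, the absence of time-delay embedding, and the toy series are assumptions made for this example; the paper's actual settings are not given in the abstract.

```python
from typing import List


def recurrence_matrix(series: List[float], eps: float) -> List[List[int]]:
    """R[i][j] = 1 when samples i and j are within eps of each other."""
    n = len(series)
    return [[int(abs(series[i] - series[j]) <= eps) for j in range(n)]
            for i in range(n)]


def recurrence_rate(matrix: List[List[int]]) -> float:
    """Fraction of entries in the matrix that are recurrent."""
    n = len(matrix)
    return sum(sum(row) for row in matrix) / (n * n)


# Hypothetical toy signals: one that revisits its states, one that never does.
periodic = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

r_periodic = recurrence_matrix(periodic, eps=0.1)
r_ramp = recurrence_matrix(ramp, eps=0.1)

print(recurrence_rate(r_periodic))
print(recurrence_rate(r_ramp))
```

A signal that keeps returning to the same values fills the matrix with regular off-diagonal structure, while the ramp leaves only the main diagonal; it is this kind of texture that an image classifier could pick up on.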
# 音は系統再建のために聞こえるか? Are Sounds Sound for Phylogenetic Reconstruction? ( http://arxiv.org/abs/2402.02807v1 ) ライセンス: Link先を確認	Luise Häuser and Gerhard Jäger and Taraka Rama and Johann-Mattis List and Alexandros Stamatakis	(参考訳) 言語進化に関する伝統的な研究において、学者はしばしば、言語系統樹の系統的推論における音法則と音対応の重要性を強調している。しかし、これまで計算手法はこの可能性を考慮していない。ほとんどの計算研究は、言語学における系統学的再構築のための主要なデータ源として語彙的コグネートに依存しているが、著者が音列のレベルで単語を比較する利点を賞賛する研究はいくつか存在する。建物 (a)異なる言語族からの10の多様なデータセット、 b)コグネートと音の対応検出のための最先端手法を初めて実験し,音とコグネートによる系統再構築手法の性能について検討した。以上の結果から,音素対応から再構成した系統に比べて,平均的な四重項距離に対して約3分の1の位相的距離が金本位の系統に近かった。 In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as the major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.	翻訳日:2024-02-06 17:24:20 公開日:2024-02-05
# 非同期計画推論におけるグラフ強化大言語モデル Graph-enhanced Large Language Models in Asynchronous Plan Reasoning ( http://arxiv.org/abs/2402.02805v1 ) ライセンス: Link先を確認	Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert	(参考訳) 時間コストの最適化にはシーケンシャルかつ並列な計画を必要とするため、非同期計画の推論は難しい。大規模言語モデル(llm)はこのタスクで成功するだろうか? 本稿では,この問題に関する最初の大規模研究について述べる。 GPT-4 や LLaMA-2 など,クローズドかつオープンソースな LLM の代表的セットは,我々のベンチマーク AsyncHow のタスク解決プロセスに関する図面が提供されないと,動作が悪くなる。そこで我々は,グラフと自然言語のプロンプトを組み合わせ,最先端の結果を得るPlan Like a Graph (PLaG) という新しい手法を提案する。 PLaGはモデル性能を向上させることができるが、タスクの複雑さが増加するとLLMは劇的に劣化し、デジタルデバイスをシミュレートするためのLLMの利用限界が強調される。我々の研究は、LSMを効率的な自律エージェントとして使うためのエキサイティングなステップだと考えています。 Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs. Can large language models (LLMs) succeed at this task? Here, we present the first large-scale study investigating this question. We find that a representative set of closed and open-source LLMs, including GPT-4 and LLaMA-2, behave poorly when not supplied with illustrations about the task-solving process in our benchmark AsyncHow. We propose a novel technique called Plan Like a Graph (PLaG) that combines graphs with natural language prompts and achieves state-of-the-art results. We show that although PLaG can boost model performance, LLMs still suffer from drastic degradation when task complexity increases, highlighting the limits of utilizing LLMs for simulating digital devices. We see our study as an exciting step towards using LLMs as efficient autonomous agents.	翻訳日:2024-02-06 17:24:00 公開日:2024-02-05
# 大言語モデル蒸留薬推奨モデル Large Language Model Distilling Medication Recommendation Model ( http://arxiv.org/abs/2402.02803v1 ) ライセンス: Link先を確認	Qidong Liu, Xian Wu, Xiangyu Zhao, Yuanshao Zhu, Zijian Zhang, Feng Tian and Yefeng Zheng	(参考訳) 医薬品の推奨は、患者の特定の健康ニーズに基づいて最も適した薬物を処方することを含む、インテリジェントヘルスケアシステムの重要な側面である。残念ながら、現在使われている多くの高度なモデルは、アイデンティティにのみ依存しながら、医療データの微妙なセマンティクスを見落としてしまう傾向にある。さらに,これらのモデルでは,患者が初めて来院した患者を対象とする治療において,先行する処方歴が欠如していることから,大きな課題に直面している。これらの課題に対処するために,Large Language Models (LLMs) の強力な意味理解と入力に依存しない特徴を利用する。本研究は, LLMを用いて既存の薬剤推奨手法を変換することを目的としている。本稿では,Large Language Model Distilling Medication Recommendation (LEADER)と呼ばれる新しいアプローチを提案する。まず、LSMが医薬品を効果的に提案できる適切なプロンプトテンプレートを作成することから始める。しかし、llmのレコメンダシステムへの直接的な統合は、薬に特有の体外問題に繋がる。 LLMを新しい出力層と改良されたチューニング損失関数に適応させることで処理を行う。 LLMベースのモデルは優れた能力を示すが、推論時に高い計算コストに悩まされ、医療分野では実用的ではない。これを軽減するため,LLMの習熟度をよりコンパクトなモデルに伝達する機能レベルの知識蒸留技術を開発した。実世界の2つのデータセットである mimic-iii と mimic-iv に関する広範な実験により,提案モデルが効果的な結果をもたらすだけでなく,効率も高いことを示した。実験の再現性を高めるため,実装コードをオンラインで公開する。 The recommendation of medication is a vital aspect of intelligent healthcare systems, as it involves prescribing the most suitable drugs based on a patient's specific health needs. Unfortunately, many sophisticated models currently in use tend to overlook the nuanced semantics of medical data, while only relying heavily on identities. Furthermore, these models face significant challenges in handling cases involving patients who are visiting the hospital for the first time, as they lack prior prescription histories to draw upon. To tackle these issues, we harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs). Our research aims to transform existing medication recommendation methodologies using LLMs. In this paper, we introduce a novel approach called Large Language Model Distilling Medication Recommendation (LEADER). We begin by creating appropriate prompt templates that enable LLMs to suggest medications effectively. 
However, the straightforward integration of LLMs into recommender systems leads to an out-of-corpus issue specific to drugs. We handle it by adapting the LLMs with a novel output layer and a refined tuning loss function. Although LLM-based models exhibit remarkable capabilities, they are plagued by high computational costs during inference, which is impractical for the healthcare sector. To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model. Extensive experiments conducted on two real-world datasets, MIMIC-III and MIMIC-IV, demonstrate that our proposed model not only delivers effective results but also is efficient. To ease the reproducibility of our experiments, we release the implementation code online.	翻訳日:2024-02-06 17:23:45 公開日:2024-02-05
# 拡散モデルを用いた物体の極端2次元幾何学 Extreme Two-View Geometry From Object Poses with Diffusion Models ( http://arxiv.org/abs/2402.02800v1 ) ライセンス: Link先を確認	Yujing Sun, Caiyi Sun, Yuan Liu, Yuexin Ma, Siu Ming Yiu	(参考訳) 人間は、同じ物体を含む2つの画像の視点の違いを、同じ視覚領域が存在しないという驚くほど大きな変化であっても、力ずくで知覚することができる。しかし、この顕著な技術は既存のカメラポーズ推定手法の課題であり、マッチングに重複する局所的な特徴が欠如しているため、大きな視点の違いに直面して失敗することが多い。本稿では,オブジェクトのパワーを効果的に活用し,極端な視点変化に直面した2視点形状を正確に決定することを目的とする。本稿では,まず,相対カメラのポーズ推定問題をオブジェクトのポーズ推定問題に数学的に変換する。そして、オブジェクトのポーズを推定するために、拡散モデルZero123から得られたオブジェクトの事前情報を用いて、オブジェクトの新規ビュー画像を合成する。新規ビュー画像が一致してオブジェクトのポーズが決定され、2ビューカメラのポーズが決定される。実験では,大局的な視点変化に対する特異なロバスト性およびレジリエンスを示し,合成および実世界のデータセットにまたがる例外的な一般化能力を持つ2視点ポーズを連続的に推定した。コードはhttps://github.com/scy639/Extreme-Two-View-Geometry-From-Object-Poses-with-Diffusion-Modelsで入手できる。 Humans have an incredible ability to effortlessly perceive the viewpoint difference between two images containing the same object, even when the viewpoint change is astonishingly vast with no co-visible regions in the images. This remarkable skill, however, has proven to be a challenge for existing camera pose estimation methods, which often fail when faced with large viewpoint differences due to the lack of overlapping local features for matching. In this paper, we aim to effectively harness the power of object priors to accurately determine two-view geometry in the face of extreme viewpoint changes. In our method, we first mathematically transform the relative camera pose estimation problem into an object pose estimation problem. Then, to estimate the object pose, we utilize the object priors learned from a diffusion model Zero123 to synthesize novel-view images of the object. The novel-view images are matched to determine the object pose and thus the two-view camera pose. In experiments, our method has demonstrated extraordinary robustness and resilience to large viewpoint changes, consistently estimating two-view poses with exceptional generalization ability across both synthetic and real-world datasets. 
Code will be available at https://github.com/scy639/Extreme-Two-View-Geometry-From-Object-Poses-with-Diffusion-Models.	翻訳日:2024-02-06 17:23:19 公開日:2024-02-05
# 表面欠陥検出のための合同注意誘導型特徴核融合ネットワーク Joint Attention-Guided Feature Fusion Network for Saliency Detection of Surface Defects ( http://arxiv.org/abs/2402.02797v1 ) ライセンス: Link先を確認	Xiaoheng Jiang, Feng Yan, Yang Lu, Ke Wang, Shuai Guo, Tianzhu Zhang, Yanwei Pang, Jianwei Niu, and Mingliang Xu	(参考訳) 表面欠陥検査は工業生産・製造プロセスにおいて重要な役割を果たしている。畳み込みニューラルネットワーク(CNN)ベースの欠陥検査手法は大きな飛躍を遂げたものの、欠陥スケールの変化、複雑なバックグラウンド、低コントラストなど、多くの課題に直面している。これらの問題に対処するために,エンコーダ・デコーダネットワークに基づく表面欠陥検出のための共同注意誘導型特徴融合ネットワーク (JAFFNet) を提案する。 JAFFNetは主にJAFFモジュールをデコードステージに組み込み、低レベルと高レベルの機能を適応的に融合させる。 JAFFモジュールは、欠陥の強調と、低コントラスト欠陥の検出に有用な機能融合時のバックグラウンドノイズの抑制を学習する。さらに、JAFFNetはエンコーダに従って高密度受容野(DRF)モジュールを導入し、リッチなコンテキスト情報によって機能をキャプチャし、異なるスケールの欠陥を検出する。 JAFFモジュールは主に、高レベルの意味的特徴によって提供される学習された共同チャネルと空間の注意マップを利用して特徴融合を誘導する。注意マップは、モデルが欠陥機能にもっと注意を払うようにします。 DRFモジュールは、先行するMRF特徴マップと元の入力を全て入力として、MRF(Multi-Receptive-field)ユニットのシーケンスを利用する。得られたDRF機能は、広い範囲の受容場を持つリッチコンテキスト情報をキャプチャする。 SD- Saliency-900, Magnetic tile, および DAGM 2007 で行った実験により, 本手法が他の最先端手法と比較して有望な性能を発揮することが示された。一方,本手法は66FPSのリアルタイム欠陥検出速度に達する。 Surface defect inspection plays an important role in the process of industrial manufacture and production. Though Convolutional Neural Network (CNN) based defect inspection methods have made huge leaps, they still confront a lot of challenges such as defect scale variation, complex background, low contrast, and so on. To address these issues, we propose a joint attention-guided feature fusion network (JAFFNet) for saliency detection of surface defects based on the encoder-decoder network. JAFFNet mainly incorporates a joint attention-guided feature fusion (JAFF) module into decoding stages to adaptively fuse low-level and high-level features. The JAFF module learns to emphasize defect features and suppress background noise during feature fusion, which is beneficial for detecting low-contrast defects. 
In addition, JAFFNet introduces a dense receptive field (DRF) module following the encoder to capture features with rich context information, which helps detect defects of different scales. The JAFF module mainly utilizes a learned joint channel-spatial attention map provided by high-level semantic features to guide feature fusion. The attention map makes the model pay more attention to defect features. The DRF module utilizes a sequence of multi-receptive-field (MRF) units with each taking as inputs all the preceding MRF feature maps and the original input. The obtained DRF features capture rich context information with a large range of receptive fields. Extensive experiments conducted on SD-saliency-900, Magnetic tile, and DAGM 2007 indicate that our method achieves promising performance in comparison with other state-of-the-art methods. Meanwhile, our method reaches a real-time defect detection speed of 66 FPS.	翻訳日:2024-02-06 17:22:58 公開日:2024-02-05
# エッジコンテンツ配信のための学習型キャッシュ機構 A Learning-Based Caching Mechanism for Edge Content Delivery ( http://arxiv.org/abs/2402.02795v1 ) ライセンス: Link先を確認	Hoda Torabi, Hamzeh Khazaei, Marin Litoiu	(参考訳) 5Gネットワークの出現とIoT(Internet of Things)の台頭により、Content Delivery Networks(CDNs)はますますネットワークエッジに拡張されている。このシフトは、特に限られたキャッシュストレージとエッジにおける多様な要求パターンのために、ユニークな課題をもたらす。これらのエッジ環境は、さまざまなオブジェクトサイズ分布とオブジェクトアクセスパターンによって特徴づけられるトラフィッククラスをホストすることができる。このような複雑さにより、要求頻度や時間間隔といったメトリクスに依存する従来のキャッシュ戦略が効果的になるのが難しくなる。これらの複雑さにもかかわらず、エッジキャッシュの最適化は不可欠である。エッジでのバイトヒット率の改善ネットワークバックボーンの負荷を軽減するだけでなく、運用コストの最小化とエンドユーザへのコンテンツ配信の迅速化。本稿では,ハザードレート(HR)順序付けの原則に基づく総合的な学習ベースのキャッシュフレームワークであるHR-Cacheを紹介する。 HR-Cacheはこのルールを利用して、将来のオブジェクトの排除決定を導く。 HRの順序付けに基づくキャッシュ決定から学習するために、軽量な機械学習モデルを採用し、その後、受信するリクエストの"キャッシュフレンドリ"を予測する。 cache-averse"と見なされるオブジェクトは、evictionの優先候補としてキャッシュに置かれる。広範な実験を通じて、hr-cacheは既存の最先端手法に比べてバイトヒット率を一貫して向上させるだけでなく、最小の予測オーバーヘッドでこれを達成することを実証する。実世界の3つのトレースと1つの合成トレースを用いた実験の結果、HR-CacheはLRUよりも2.2-14.6%大きなWANトラフィックを継続的に達成していることが示された。ヒューリスティックなキャッシング戦略だけでなく、最先端の学習ベースのアルゴリズムよりも優れています。 With the advent of 5G networks and the rise of the Internet of Things (IoT), Content Delivery Networks (CDNs) are increasingly extending into the network edge. This shift introduces unique challenges, particularly due to the limited cache storage and the diverse request patterns at the edge. These edge environments can host traffic classes characterized by varied object-size distributions and object-access patterns. Such complexity makes it difficult for traditional caching strategies, which often rely on metrics like request frequency or time intervals, to be effective. Despite these complexities, the optimization of edge caching is crucial. Improved byte hit rates at the edge not only alleviate the load on the network backbone but also minimize operational costs and expedite content delivery to end-users. 
In this paper, we introduce HR-Cache, a comprehensive learning-based caching framework grounded in the principles of Hazard Rate (HR) ordering, a rule originally formulated to compute an upper bound on cache performance. HR-Cache leverages this rule to guide future object eviction decisions. It employs a lightweight machine learning model to learn from caching decisions made based on HR ordering, subsequently predicting the "cache-friendliness" of incoming requests. Objects deemed "cache-averse" are placed into cache as priority candidates for eviction. Through extensive experimentation, we demonstrate that HR-Cache not only consistently enhances byte hit rates compared to existing state-of-the-art methods but also achieves this with minimal prediction overhead. Our experimental results, using three real-world traces and one synthetic trace, indicate that HR-Cache consistently achieves 2.2-14.6% greater WAN traffic savings than LRU. It outperforms not only heuristic caching strategies but also the state-of-the-art learning-based algorithm.	翻訳日:2024-02-06 17:22:28 公開日:2024-02-05
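The abstract above reports results as byte hit rate relative to LRU. To make that metric concrete, here is a small sketch of an LRU baseline replayed over a request trace. It does not implement HR-Cache or hazard-rate ordering; the trace, object sizes, and capacities are hypothetical values chosen for illustration.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def lru_byte_hit_rate(trace: Iterable[Tuple[str, int]], capacity: int) -> float:
    """Replay (object_id, size_in_bytes) requests through an LRU cache
    and return the fraction of requested bytes served from the cache."""
    cache = OrderedDict()  # object_id -> size, oldest entry first
    used = 0
    hit_bytes = 0
    total_bytes = 0
    for obj, size in trace:
        total_bytes += size
        if obj in cache:
            hit_bytes += size
            cache.move_to_end(obj)  # mark as most recently used
            continue
        if size > capacity:
            continue  # the object can never fit, so bypass the cache
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[obj] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical trace of six requests for 4-byte objects.
trace = [("a", 4), ("b", 4), ("a", 4), ("c", 4), ("b", 4), ("a", 4)]

print(lru_byte_hit_rate(trace, capacity=8))
print(lru_byte_hit_rate(trace, capacity=12))
```

Unlike the plain object hit rate, this metric weights every hit by object size, which is why it tracks the WAN traffic savings mentioned in the abstract.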
# 小さな言語モデルのための最適化とアーキテクチャの再考 Rethinking Optimization and Architecture for Tiny Language Models ( http://arxiv.org/abs/2402.02791v1 ) ライセンス: Link先を確認	Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang	(参考訳) 大規模言語モデル(LLM)のパワーは多くのデータと計算リソースを通して実証されている。しかし,モバイル端末上での言語モデルの適用は,計算コストやメモリコストの面で大きな課題に直面している。高度に複雑な訓練プロセスによって制限された言語モデルの最適化には、慎重に研究されることがほとんどない多くの詳細がある。本研究では,1Bパラメータを持つ小さな言語モデルに基づいて,各成分の効果を分析するための実験的な研究を慎重に設計する。主にニューラルアーキテクチャ、パラメータ初期化、最適化戦略という3つの視点が議論されている。いくつかの設計式は、トークン圧縮、アーキテクチャの微調整、パラメータ継承、複数ラウンドトレーニングなど、小さな言語モデルに特に効果的であることが実証されている。次に、1.6T多言語コーパス上でPanGu-$\pi$-1B ProとPanGu-$\pi$-1.5B Proを訓練する。実験の結果、PanGu-$\pi$-1B Proのベンチマーク評価セットにおいて、最適化とアーキテクチャの改善により8.87の顕著な平均改善が得られた。さらに、PanGu-$\pi$-1.5B Proは、モデルサイズが大きいSOTAモデルの範囲を超え、その優れた性能を検証する。コードはまもなくリリースされる(https://github.com/YuchuanTian/RethinkTinyLM)。 The power of large language models (LLMs) has been demonstrated through large amounts of data and computing resources. However, the application of language models on mobile devices faces a huge challenge in computation and memory costs; that is, tiny language models with high performance are urgently required. Limited by the highly complex training process, many details of optimizing language models are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical studies to analyze the effect of each component. Three perspectives are mainly discussed, i.e., neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically shown to be especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance and multiple-round training. Then we train PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on 1.6T multilingual corpora, following the established formulas. 
Experimental results demonstrate the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-$\pi$-1B Pro. Besides, PanGu-$\pi$-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code will be released soon (https://github.com/YuchuanTian/RethinkTinyLM).	翻訳日:2024-02-06 17:21:58 公開日:2024-02-05
# 双曲型タンジェント指数線形ユニット(TeLU)による安定・ロバスト深層学習 Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU) ( http://arxiv.org/abs/2402.02790v1 ) ライセンス: Link先を確認	Alfredo Fernandez and Ankur Mali	(参考訳) 本稿では,$f(x) = x \cdot \tanh(e^x)$として表現される新しいニューラルネットワーク活性化関数である双曲的接指数線形単位(TeLU)について述べる。 TeLUは、ReLU、GELU、Mishのような従来のアクティベーション関数の制限を、消滅と爆発的な勾配問題に対処することによって克服するように設計されている。我々の理論解析と実証評価により、TeLUは既存の活性化関数よりも安定性と堅牢性を向上し、活性化出力の平均をゼロに効果的に調整し、訓練安定性と収束性を高めた。 Resnet-50を含む先進アーキテクチャにおける一般的なアクティベーション関数(ReLU、GELU、SiLU、Mish、Logish、Smish)に対する広範な評価は、他の関数に最適化されたハイパーパラメータ条件下であっても、TeLUの低分散と優れた性能を示している。 CIFAR-10、CIFAR-100、TinyImageNetといった挑戦的なデータセットを使った大規模なテストでは、860のシナリオを網羅し、TeLUはその効果を一貫して示し、ニューラルネットワークアクティベーション機能の潜在的な新しい標準として位置づけ、多様なディープラーニングアプリケーションにおける安定性とパフォーマンスを高めた。 In this paper, we introduce the Hyperbolic Tangent Exponential Linear Unit (TeLU), a novel neural network activation function, represented as $f(x) = x \cdot \tanh(e^x)$. TeLU is designed to overcome the limitations of conventional activation functions like ReLU, GELU, and Mish by addressing the vanishing and, to an extent, the exploding gradient problems. Our theoretical analysis and empirical assessments reveal that TeLU outperforms existing activation functions in stability and robustness, effectively adjusting activation outputs' mean towards zero for enhanced training stability and convergence. Extensive evaluations against popular activation functions (ReLU, GELU, SiLU, Mish, Logish, Smish) across advanced architectures, including Resnet-50, demonstrate TeLU's lower variance and superior performance, even under hyperparameter conditions optimized for other functions. In large-scale tests with challenging datasets like CIFAR-10, CIFAR-100, and TinyImageNet, encompassing 860 scenarios, TeLU consistently showcased its effectiveness, positioning itself as a potential new standard for neural network activation functions, boosting stability and performance in diverse deep learning applications.	翻訳日:2024-02-06 17:21:35 公開日:2024-02-05
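The TeLU abstract above gives the activation in closed form as x multiplied by tanh(e^x). A scalar sketch in plain Python may help to see its shape. The function name `telu` and the cutoff at 20 are choices made for this example, not taken from the paper; the cutoff only guards against overflow in `math.exp` and does not change the value, since tanh(e^x) already equals 1.0 in double precision well below that point.

```python
import math


def telu(x: float) -> float:
    """TeLU activation: f(x) = x * tanh(exp(x))."""
    if x > 20.0:
        # tanh(exp(x)) is exactly 1.0 in double precision here,
        # and skipping exp() avoids an OverflowError for very large x.
        return x
    return x * math.tanh(math.exp(x))


for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(value, telu(value))
```

For large positive inputs the function behaves like the identity, and for large negative inputs it decays towards zero roughly like x times e^x, so it is smooth and bounded below.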
# 散逸量子力学の人工知能に基づくサロゲート解--ユニバーサルプロパゲータの物理不定形再構成 Artificial-intelligence-based surrogate solution of dissipative quantum dynamics: physics-informed reconstruction of the universal propagator ( http://arxiv.org/abs/2402.02788v1 ) ライセンス: Link先を確認	Jiaji Zhang, Carlos L. Benavides-Riveros, Lipeng Chen	(参考訳) 散逸量子系のダイナミクスを支配する方程式の正確な(あるいは近似的な)解は、量子科学にとって難しい課題である。これらの方程式を柔軟に解くためにいくつかのアルゴリズムが設計されているが、それらは主に高価な反復スキームに依存している。最近では、ディープニューラルネットワークが量子力学に使われているが、現在のアーキテクチャは特定のシステムの物理学に大きく依存しており、通常は人口動態に限られている。本稿では,量子プロパゲータをフーリエニューラル演算子としてパラメータ化することにより,散逸的量子力学を解く人工知能に基づく代理モデルを提案する。従来のアルゴリズムと比較して、我々の量子ニューラルプロパゲータは、時間を要するイテレーションを回避し、任意の初期量子状態を任意に長時間進化させるために使用できる普遍的なスーパー演算子を提供する。提案手法の適用性を示すため,Fenna-Matthews-Olson複合体の人口動態と時間相関関数の計算に量子ニューラルプロパゲータを用いた。 The accurate (or even approximate) solution of the equations that govern the dynamics of dissipative quantum systems remains a challenging task for quantum science. While several algorithms have been designed to solve those equations with different degrees of flexibility, they rely mainly on highly expensive iterative schemes. Most recently, deep neural networks have been used for quantum dynamics but current architectures are highly dependent on the physics of the particular system and usually limited to population dynamics. Here we introduce an artificial-intelligence-based surrogate model that solves dissipative quantum dynamics by parameterizing quantum propagators as Fourier neural operators, which we train using both dataset and physics-informed loss functions. Compared with conventional algorithms, our quantum neural propagator avoids time-consuming iterations and provides a universal super-operator that can be used to evolve any initial quantum state for arbitrarily long times. To illustrate the wide applicability of the approach, we employ our quantum neural propagator to compute population dynamics and time-correlation functions of the Fenna-Matthews-Olson complex.	翻訳日:2024-02-06 17:21:11 公開日:2024-02-05
# EEVEE: 自然言語処理のための簡単なアノテーションツール EEVEE: An Easy Annotation Tool for Natural Language Processing ( http://arxiv.org/abs/2402.02864v1 ) ライセンス: Link先を確認	Axel Sorensen, Siyao Peng, Barbara Plank, Rob van der Goot	(参考訳) アノテーションツールは自然言語処理(NLP)データセット作成の出発点である。さまざまなツールが用意されていますが、これらのツールの設定は障害になります。簡便さ,効率,使いやすさを重視したアノテーションツールであるEEVEEを提案する。ブラウザ内で直接実行することができ(セットアップは不要)、アノテーションにはタブ分離ファイル(文字オフセットやタスク固有のフォーマットとは対照的に)を使用する。単一のデータセット上で複数のタスクのアノテーションを可能にし、シーケンスラベリング、スパンラベリング、テキスト分類、Seq2seqの4つのタスクタイプをサポートする。 Annotation tools are the starting point for creating Natural Language Processing (NLP) datasets. There is a wide variety of tools available; setting up these tools is however a hindrance. We propose EEVEE, an annotation tool focused on simplicity, efficiency, and ease of use. It can run directly in the browser (no setup required) and uses tab-separated files (as opposed to character offsets or task-specific formats) for annotation. It allows for annotation of multiple tasks on a single dataset and supports four task-types: sequence labeling, span labeling, text classification and seq2seq.	翻訳日:2024-02-06 17:15:15 公開日:2024-02-05
# 機械学習耐性アモルファスシリコン物理的複製不可能関数(PUFs) Machine Learning Resistant Amorphous Silicon Physically Unclonable Functions (PUFs) ( http://arxiv.org/abs/2402.02846v1 ) ライセンス: Link先を確認	Velat Kilic, Neil Macfarlane, Jasper Stround, Samuel Metais, Milad Alemohammad, A. Brinton Cooper, Amy C. Foster, Mark A. Foster	(参考訳) 非線形波動カオスアモルファスシリコン(a-Si)のキャビティを物理的複製不可能関数(PUF)として用いる方法を検討した。統合された電子PUFに対する機械学習攻撃は、PUFの挙動をモデル化するのに非常に効果的であることが示されている。このようなa-SiフォトニックPUFに対する攻撃は、線形回帰、k近傍法、決定木アンサンブル(ランダムフォレストと勾配ブースティング木)、ディープニューラルネットワーク(DNN)などのアルゴリズムを用いて研究される。 DNNは研究対象のアルゴリズムの中で最高の性能を示したが、私的情報メトリクスによって定量化されるa-Si PUFのセキュリティを完全に破壊することはできなかった。さらに, a-Si PUFの機械学習耐性は, 非線形応答の強度に直接関係していることがわかった。 We investigate the use of nonlinear wave chaotic amorphous silicon (a-Si) cavities as physically unclonable functions (PUFs). Machine learning attacks on integrated electronic PUFs have been demonstrated to be very effective at modeling PUF behavior. Such attacks on integrated a-Si photonic PUFs are investigated through application of algorithms including linear regression, k-nearest neighbor, decision tree ensembles (random forests and gradient boosted trees), and deep neural networks (DNNs). We found that DNNs performed the best among all the algorithms studied but still failed to completely break the a-Si PUF security, which we quantify through a private information metric. Furthermore, the machine learning resistance of a-Si PUFs was found to be directly related to the strength of their nonlinear response.	翻訳日:2024-02-06 17:15:05 公開日:2024-02-05
# Comparing Knowledge Sources for Open-Domain Scientific Claim Verification ( http://arxiv.org/abs/2402.02844v1 ) License: see link	Juraj Vladika, Florian Matthes	The increasing rate at which scientific knowledge is discovered and health claims are shared online has highlighted the importance of developing efficient fact-checking systems for scientific claims. The usual setting for this task in the literature assumes that the documents containing the evidence for claims are already provided and annotated or contained in a limited corpus. This renders the systems unrealistic for real-world settings where knowledge sources with potentially millions of documents need to be queried to find relevant evidence. In this paper, we perform an array of experiments to test the performance of open-domain claim verification systems. We test the final verdict prediction of systems on four datasets of biomedical and health claims in different settings. While keeping the pipeline's evidence selection and verdict prediction parts constant, document retrieval is performed over three common knowledge sources (PubMed, Wikipedia, Google) and using two different information retrieval techniques. We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns. Likewise, BM25 excels in retrieval precision, while semantic search excels in recall of relevant evidence. We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.	Translated: 2024-02-06 17:14:50 Published: 2024-02-05
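The abstract above contrasts BM25 with semantic search for evidence retrieval. As a rough illustration of the lexical side only, the following is a minimal sketch of Okapi BM25 scoring over a toy corpus. The function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenization, and the three example sentences are all assumptions made for this sketch; they are not taken from the paper, whose retrieval pipeline is not described in this listing.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed inverse document frequency (always positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus and claim, for illustration only.
corpus = [
    "aspirin reduces the risk of heart attack".split(),
    "vitamin c does not cure the common cold".split(),
    "regular exercise lowers blood pressure".split(),
]
claim = "vitamin c cures the common cold".split()
scores = bm25_scores(claim, corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best, scores)
```

Because BM25 matches exact tokens, "cures" in the claim does not match "cure" in the document. This kind of lexical gap is one plausible reason a semantic retriever could recall evidence that BM25 misses.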
# Trinity: Syncretizing Multi-/Long-tail/Long-term Interests All in One ( http://arxiv.org/abs/2402.02842v1 ) License: see link	Jing Yan, Liu Jiang, Jianfei Cui, Zhichen Zhao, Xingyan Bin, Feng Zhang, Zuotao Liu	Interest modeling in recommender systems has been a constant topic for improving user experience, and typical interest modeling tasks (e.g. multi-interest, long-tail interest and long-term interest) have been investigated in many existing works. However, most of them only consider one interest in isolation, while neglecting their interrelationships. In this paper, we argue that these tasks suffer from a common "interest amnesia" problem, and a solution exists to mitigate it simultaneously. We reason that long-term cues can be the cornerstone since they reveal multi-interest and clarify long-tail interest. Inspired by this observation, we propose a novel and unified framework in the retrieval stage, "Trinity", to solve the interest amnesia problem and improve multiple interest modeling tasks. We construct a real-time clustering system that enables us to project items into enumerable clusters, and calculate statistical interest histograms over these clusters. Based on these histograms, Trinity recognizes underdelivered themes and remains stable when facing emerging hot topics. Trinity is more appropriate for large-scale industry scenarios because of its modest computational overheads. Its derived retrievers have been deployed on the recommender system of Douyin, significantly improving user experience and retention. We believe that such practical experience can be well generalized to other scenarios.	Translated: 2024-02-06 17:14:29 Published: 2024-02-05
# Measuring topological invariants for higher-order exceptional points in quantum multipartite systems ( http://arxiv.org/abs/2402.02839v1 ) License: see link	Pei-Rong Han, Wen Ning, Xin-Jie Huang, Ri-Hua Zheng, Shou-Bang Yang, Fan Wu, Zhen-Biao Yang, Qi-Ping Su, Chui-Ping Yang, Shi-Biao Zheng	Owing to the presence of exceptional points (EPs), non-Hermitian (NH) systems can display intriguing topological phenomena without Hermitian analogs. However, experimental characterization of exceptional topological invariants has been restricted to second-order EPs (EP2s) in classical or semiclassical systems. We here propose an NH multi-qubit model with higher-order EPs, each of which is underlain by a multifold-degenerate multipartite entangled eigenstate. We implement the three-qubit model by controllably coupling a superconducting qubit to two microwave resonators, one serving as a Hermitian photonic qubit while the other as an NH qubit. We experimentally quantify the topological invariant for an EP3, by mapping out the complex eigenspectra along a loop surrounding this EP3 in the parameter space. The nonclassicality of the realized topology is manifested by the observed quantum correlations in the corresponding eigenstates. Our results extend research of exceptional topology to fully quantum-mechanical models with multi-partite entangled eigenstates. We further demonstrate the non-reciprocal transmission of a single photon, during which the photon is nonlocally shared by three individual elements.	Translated: 2024-02-06 17:14:06 Published: 2024-02-05
# HAPI-FHIR Server Implementation to Enhancing Interoperability among Primary Care Health Information Systems in Sri Lanka: Review of the Technical Use Case ( http://arxiv.org/abs/2402.02838v1 ) License: see link	Prabath Jayathissa, Roshan Hewapathirana	This review underscores the vital role of interoperability in digital health, advocating for a standardized framework. It focuses on implementing a Fast Healthcare Interoperability Resources (FHIR) server, addressing technical, semantic, and process challenges. FHIR's adaptability ensures uniformity within Primary Care Health Information Systems, fostering interoperability. Patient data management complexities highlight the pivotal role of semantic interoperability in seamless patient care. FHIR standards enhance these efforts, offering multiple pathways for data search. The ADR-guided FHIR server implementation systematically addresses challenges related to patient identity, biometrics, and data security. The detailed development phases emphasize architecture, API integration, and security. The concluding stages incorporate forward-looking approaches, including HHIMS Synthetic Dataset testing. Envisioning FHIR integration as transformative, it anticipates a responsive healthcare environment aligned with the evolving digital health landscape, ensuring comprehensive, dynamic, and interconnected systems for efficient data exchange and access.	Translated: 2024-02-06 17:13:48 Published: 2024-02-05
# With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations ( http://arxiv.org/abs/2402.02837v1 ) License: see link	Amandine Decker (LORIA, UL, CNRS, SEMAGRAMME, GU), Maxime Amblard (SEMAGRAMME, LORIA)	Topics play an important role in the global organisation of a conversation, as what is currently discussed constrains the possible contributions of the participants. Understanding the way topics are organised in interaction would provide insight into the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by first segmenting dialogues into smaller topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at a dialogue level. In this paper we work with open-domain conversations and try to reach a level of accuracy comparable to that of recent machine learning based topic segmentation models, but with a formal approach. The features we identify as meaningful for this task help us better understand the topical structure of a conversation.	Translated: 2024-02-06 17:13:31 Published: 2024-02-05
# Perceptual Learned Image Compression via End-to-End JND-Based Optimization ( http://arxiv.org/abs/2402.02836v1 ) License: see link	Farhad Pakdaman, Sanaz Nami, and Moncef Gabbouj	Emerging Learned Image Compression (LC) achieves significant improvements in coding efficiency by end-to-end training of neural networks for compression. An important benefit of this approach over traditional codecs is that any optimization criteria can be directly applied to the encoder-decoder networks during training. Perceptual optimization of LC to comply with the Human Visual System (HVS) is among such criteria, and it has not been fully explored yet. This paper addresses this gap by proposing a novel framework to integrate Just Noticeable Distortion (JND) principles into LC. Leveraging existing JND datasets, three perceptual optimization methods are proposed to integrate JND into the LC training process: (1) Pixel-Wise JND Loss (PWL) prioritizes pixel-by-pixel fidelity in reproducing JND characteristics, (2) Image-Wise JND Loss (IWL) emphasizes overall imperceptible degradation levels, and (3) Feature-Wise JND Loss (FWL) aligns the reconstructed image features with perceptually significant features. Experimental evaluations demonstrate the effectiveness of JND integration, highlighting improvements in rate-distortion performance and visual quality compared to baseline methods. The proposed methods add no extra complexity after training.	Translated: 2024-02-06 17:13:16 Published: 2024-02-05
# Performance optimization of continuous variable quantum teleportation with generalized photon-varying non-Gaussian operations ( http://arxiv.org/abs/2402.02835v1 ) License: see link	Mingjian He, Shouyin Liu	Continuous variable quantum teleportation provides a path to the long-distance transmission of quantum states. Photon-varying non-Gaussian operations have been shown to improve the fidelity of quantum teleportation when integrated into the protocol. However, given a fixed non-Gaussian operation, the achievable fidelity varies with different input states. An operation that increases the fidelity for teleporting one class of states might do the contrary for other classes of states. A performance metric suitable for different input states is missing. For a given type of non-Gaussian operation, the achievable fidelity also varies with parameters associated with the operation. Previous work only focuses on particular settings of the parameters. Optimization over the parameters is also missing. In this work, we build a framework for photon-varying non-Gaussian operations for multi-mode states, upon which we propose a performance metric suitable for arbitrary teleportation input states. We then apply the new metric to evaluate different types of non-Gaussian operations. Starting from simple multi-photon photon subtraction and photon addition, we find that increasing the number of ancillary photons involved in the operation does not guarantee performance improvement. We then investigate combinations of the operations mentioned above, finding that operations that approximate a particular form provide the best improvement. The results provided here will be valuable for real-world implementations of quantum teleportation networks and applications that harness the non-Gaussianity of quantum states.	Translated: 2024-02-06 17:12:44 Published: 2024-02-05
# Shortened LLaMA: A Simple Depth Pruning for Large Language Models ( http://arxiv.org/abs/2402.02834v1 ) License: see link	Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song	Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining the number of layers. Depth pruning, in contrast, removes entire layers or blocks, while keeping the size of the remaining weights unchanged. Most current research focuses on either width-only or a blend of width and depth pruning, with little comparative analysis between the two units (width vs. depth) concerning their impact on LLM inference efficiency. In this work, we show that a simple depth pruning approach can compete with recent width pruning methods in terms of zero-shot task performance. Our pruning method boosts inference speeds, especially under memory-constrained conditions that require limited batch sizes for running LLMs, where width pruning is ineffective. We hope this work can help deploy LLMs on local and edge devices.	Translated: 2024-02-06 17:12:23 Published: 2024-02-05
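The abstract above describes depth pruning as removing whole layers or blocks while leaving the remaining weights unchanged. The following is a minimal sketch of that selection step only. The function `depth_prune`, the block names, and the importance scores are hypothetical placeholders; this listing does not say how the paper scores blocks, and no retraining or model code is shown.

```python
def depth_prune(blocks, importance, n_remove):
    """Drop the `n_remove` least important blocks, preserving the original order."""
    if len(blocks) != len(importance):
        raise ValueError("blocks and importance must have the same length")
    if not 0 <= n_remove <= len(blocks):
        raise ValueError("n_remove must be between 0 and len(blocks)")
    # Rank block indices from least to most important.
    ranked = sorted(range(len(blocks)), key=lambda i: importance[i])
    to_drop = set(ranked[:n_remove])
    # The surviving blocks are kept as they are; only the depth shrinks.
    return [blk for i, blk in enumerate(blocks) if i not in to_drop]


# Hypothetical six-block model with made-up importance scores.
blocks = [f"block_{i}" for i in range(6)]
importance = [0.9, 0.2, 0.7, 0.1, 0.8, 0.6]
pruned = depth_prune(blocks, importance, n_remove=2)
print(pruned)
```

Width pruning would instead shrink the matrices inside every block and keep all six blocks. That difference is why the two approaches can behave differently for latency when the batch size is small.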
# What do we teach to engineering students: embedded ethics, morality, and politics ( http://arxiv.org/abs/2402.02831v1 ) License: see link	Avigail Ferdman and Emanuele Ratti	In the past few years, calls for integrating ethics modules in engineering curricula have multiplied. Despite this positive trend, a number of issues with these embedded programs remain. First, learning goals are underspecified. A second limitation is the conflation of different dimensions under the same banner, in particular confusion between ethics curricula geared towards addressing the ethics of individual conduct and curricula geared towards addressing ethics at the societal level. In this article, we propose a tripartite framework to overcome these difficulties. Our framework analytically decomposes an ethics module into three dimensions. First, there is the ethical dimension, which pertains to the learning goals. Second, there is the moral dimension, which addresses the moral relevance of engineers' conduct. Finally, there is the political dimension, which scales up issues of moral relevance at the civic level. All in all, our framework has two advantages. First, it provides analytic clarity, i.e. it enables course instructors to locate ethical dilemmas in either the moral or political realm and to make use of the tools and resources from moral and political philosophy. Second, it depicts a comprehensive ethical training, which enables students both to reason about moral issues in the abstract and to socially contextualize potential solutions.	Translated: 2024-02-06 17:12:07 Published: 2024-02-05
# PowerGraph: A power grid benchmark dataset for graph neural networks ( http://arxiv.org/abs/2402.02827v1 ) License: see link	Anna Varbella, Kenza Amara, Blazhe Gjorgiev, Giovanni Sansavini	Public Graph Neural Network (GNN) benchmark datasets facilitate the use of GNNs and enhance their applicability to diverse disciplines. The community currently lacks public datasets of electrical power grids for GNN applications. Indeed, GNNs can potentially capture complex power grid phenomena better than alternative machine learning techniques. Power grids are complex engineered networks that are naturally amenable to graph representations. Therefore, GNNs have the potential to capture the behavior of power grids better than alternative machine learning techniques. To this aim, we develop a graph dataset for cascading failure events, which are the major cause of blackouts in electric power grids. Historical blackout datasets are scarce and incomplete. The assessment of vulnerability and the identification of critical components are usually conducted via computationally expensive offline simulations of cascading failures. Instead, we propose using machine learning models for the online detection of cascading failures, leveraging the knowledge of the system state at the onset of the cascade. We develop PowerGraph, a graph dataset modeling cascading failures in power grids, designed for two purposes, namely, i) training GNN models for different graph-level tasks including multi-class classification, binary classification, and regression, and ii) explaining GNN models. The dataset generated via a physics-based cascading failure model ensures the generality of the operating and environmental conditions by spanning diverse failure scenarios. In addition, we foster the use of the dataset to benchmark GNN explainability methods by assigning ground-truth edge-level explanations. PowerGraph helps the development of better GNN models for graph-level tasks and explainability, critical in many domains ranging from chemistry to biology, where the systems and processes can be described as graphs.	Translated: 2024-02-06 17:11:45 Published: 2024-02-05
# SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data ( http://arxiv.org/abs/2402.02826v1 ) License: see link	Yudara Kularathne, Prathapa Janitha, Sithira Ambepitiya, Thanveer Ahamed, Dinuka Wijesundara, Prarththanan Sothyrajah	Rapid development of disease detection computer vision models is vital in response to urgent medical crises like epidemics or events of bioterrorism. However, traditional data gathering methods are too slow for these scenarios, necessitating innovative approaches to generate reliable models quickly from minimal data. We demonstrate our new approach by building a comprehensive computer vision model for detecting Human Papilloma Virus (HPV) genital warts using only synthetic data. In our study, we employed a two-phase experimental design using diffusion models. In the first phase, diffusion models were used to generate a large number of diverse synthetic images from 10 HPV guide images, explicitly focusing on accurately depicting genital warts. The second phase involved training and testing a vision model using this synthetic dataset. This method aimed to assess the effectiveness of diffusion models in rapidly generating high-quality training data and the subsequent impact on the vision model's performance in medical image recognition. The study findings revealed significant insights into the performance of the vision model trained on synthetic images generated through diffusion models. The vision model showed exceptional performance in accurately identifying cases of genital warts. It achieved an accuracy rate of 96%, underscoring its effectiveness in medical image classification. For HPV cases the model demonstrated a high precision of 99% and a recall of 94%. In normal cases the precision was 95% with an impressive recall of 99%. These metrics indicate the model's capability to correctly identify true positive cases and minimize false positives. The model achieved an F1 score of 96% for HPV cases and 97% for normal cases. The high F1 score across both categories highlights the balanced nature of the model's precision and recall, ensuring reliability and robustness in its predictions.	Translated: 2024-02-06 17:11:05 Published: 2024-02-05
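The abstract above reports precision, recall, and F1 for each class. As a quick consistency check, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall. The helper name `f1_score` is an assumption for this sketch, and the inputs are the rounded percentages from the abstract, so the results are only approximate.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
hpv_f1 = f1_score(precision=0.99, recall=0.94)
normal_f1 = f1_score(precision=0.95, recall=0.99)
print(f"HPV F1: {hpv_f1:.2f}, normal F1: {normal_f1:.2f}")
```

Both values round to the F1 scores stated in the abstract (96% for HPV cases, 97% for normal cases), so the reported figures are internally consistent.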
# FAIR-USE4OS: From open source to Open Source ( http://arxiv.org/abs/2402.02824v1 ) License: see link	Raphael Sonabend, Hugo Gruson, Leo Wolansky, Agnes Kiragga, Daniel S. Katz	This paper extends the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines to provide criteria for assessing if software is Open Source. By adding 'USE' (User-Centered, Sustainable, Equitable), software development can adhere to open source best practice by incorporating user input early on, ensuring front-end designs are accessible to all possible stakeholders, and planning long-term sustainability alongside software design. The FAIR-USE4OS guidelines will allow funders and researchers to more effectively evaluate and plan Open Source software projects. There is good evidence of funders increasingly mandating that all funded research software is open-source; however, even under the FAIR guidelines, this could simply mean software released on GitHub with a Zenodo DOI. By employing the FAIR-USE4OS guidelines, best practice can be demonstrated from the very beginning of the design process and the software has the greatest chance of success by being truly 'Open Source'.	Translated: 2024-02-06 17:10:23 Published: 2024-02-05
# Evading Data Contamination Detection for Language Models is (too) Easy ( http://arxiv.org/abs/2402.02823v1 ) License: see link	Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, Martin Vechev	Large language models are widespread, with their performance on benchmarks frequently guiding user preferences for one model over another. However, the vast amount of data these models are trained on can inadvertently lead to contamination with public benchmarks, thus compromising performance measurements. While recently developed contamination detection methods try to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We argue that this setting is of crucial importance as it casts doubt on the reliability of public benchmarks. To more rigorously study this issue, we propose a categorization of both model providers and contamination detection methods. This reveals vulnerabilities in existing methods that we exploit with EAL, a simple yet effective contamination technique that significantly inflates benchmark performance while completely evading current detection methods.	Translated: 2024-02-06 17:09:28 Published: 2024-02-05
# How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning ( http://arxiv.org/abs/2402.02872v1 ) License: see link	Zeping Yu, Sophia Ananiadou	We explore the mechanism of in-context learning and propose a hypothesis using a locate-and-project method. In shallow layers, the features of demonstrations are merged into their corresponding labels, and the features of the input text are aggregated into the last token. In deep layers, in-context heads make great contributions. In each in-context head, the value-output matrix extracts the labels' features. Query and key matrices compute the attention weights between the input text and each demonstration. The larger the attention weight is, the more label information is transferred into the last token for predicting the next word. Query and key matrices can be regarded as two towers for learning the similarity metric between the input text and each demonstration. Based on this hypothesis, we explain why imbalanced labels and demonstration order affect predictions. We conduct experiments on GPT2 large, Llama 7B, 13B and 30B. The results support our analysis. Overall, our study provides a new method and a reasonable hypothesis for understanding the mechanism of in-context learning. Our code will be released on GitHub.	Translated: 2024-02-06 17:01:04 Published: 2024-02-05
# Statistics without Interpretation: A Sober Look at Explainable Machine Learning ( http://arxiv.org/abs/2402.02870v1 ) License: see link	Sebastian Bordt, Ulrike von Luxburg	In the rapidly growing literature on explanation algorithms, it often remains unclear what precisely these algorithms are for and how they should be used. We argue that this is because explanation algorithms are often mathematically complex but don't admit a clear interpretation. Unfortunately, complex statistical methods that don't have a clear interpretation are bound to lead to errors in interpretation, a fact that has become increasingly apparent in the literature. In order to move forward, papers on explanation algorithms should make clear how precisely the output of the algorithms should be interpreted. They should also clarify what questions about the function can and cannot be answered given the explanations. Our argument is based on the distinction between statistics and their interpretation. It also relies on parallels between explainable machine learning and applied statistics.	Translated: 2024-02-06 17:00:51 Published: 2024-02-05
# Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem ( http://arxiv.org/abs/2402.02868v1 ) License: see link	Maciej Wołczyk, Bartłomiej Cupiał, Mateusz Ostaszewski, Michał Bortkiewicz, Michał Zając, Razvan Pascanu, Łukasz Kuciński, Piotr Miłoś	Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma's Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from 5K to over 10K points in the Human Monk scenario.	Translated: 2024-02-06 17:00:41 Published: 2024-02-05
# Quantum Normalizing Flows for Anomaly Detection ( http://arxiv.org/abs/2402.02866v1 ) License: see link	Bodo Rosenhahn, Christoph Hirche	A Normalizing Flow computes a bijective mapping from an arbitrary distribution to a predefined (e.g. normal) distribution. Such a flow can be used to address different tasks, e.g. anomaly detection, once such a mapping has been learned. In this work we introduce Normalizing Flows for Quantum architectures, describe how to model and optimize such a flow, and evaluate our method on example datasets. Our proposed models show competitive performance for anomaly detection compared to classical methods, e.g. based on isolation forests, the local outlier factor (LOF) or single-class SVMs, while being fully executable on a quantum computer.	Translated: 2024-02-06 17:00:20 Published: 2024-02-05
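The abstract above describes a normalizing flow as a bijective map to a predefined (e.g. normal) distribution that can then be used for anomaly detection. The following is a minimal classical sketch of that idea, not the quantum architecture from the paper. It uses a one-dimensional affine flow z = (x - mu) / sigma with a standard normal base and scores a point by its negative log-likelihood under the change-of-variables formula. The function names and the sample data are assumptions made for illustration.

```python
import math
import statistics


def fit_affine_flow(samples):
    """Maximum-likelihood fit of z = (x - mu) / sigma with a standard normal base."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # population std is the ML estimate
    return mu, sigma


def anomaly_score(x, mu, sigma):
    """Negative log-likelihood of x; larger values mean more anomalous."""
    z = (x - mu) / sigma
    log_base = -0.5 * z * z - 0.5 * math.log(2 * math.pi)  # log N(z; 0, 1)
    log_det = -math.log(sigma)  # log |dz/dx| for the affine map
    return -(log_base + log_det)


# Hypothetical "normal" training data clustered around 10.
train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mu, sigma = fit_affine_flow(train)
typical = anomaly_score(10.0, mu, sigma)
outlier = anomaly_score(14.0, mu, sigma)
print(typical, outlier)
```

A point far from the training data maps to a low-density region of the base distribution and so receives a much higher score. In practice a threshold on this score would separate normal points from anomalies, and a learned flow would replace the fixed affine map.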
# 音声明瞭度レベル分類のための注意機構付きLSTMベースシステムにおける音響スペクトログラムと変調スペクトログラムの組み合わせについて On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification ( http://arxiv.org/abs/2402.02865v1 ) ライセンス: Link先を確認	Ascensión Gallardo-Antolín and Juan M. Montero	(参考訳) 音声の明瞭度は、雑音環境、チャネル歪み、生理学的問題など、複数の要因に影響される可能性がある。本研究では、このうち最後の場合における音声明瞭度レベルの自動予測の問題を扱う。このタスク向けに設計した、注意機構付きLSTMネットワークに基づく非侵入型システムという我々の先行研究を出発点として、主に2つの貢献を提示する。第一に、重要な時間情報を捨ててしまうコンパクトな表現の代わりに、フレーム単位の変調スペクトログラムを入力特徴として用いることを提案する。第二に、フレーム単位の音響ログメルスペクトログラムと変調スペクトログラムをLSTMフレームワークに組み込むための2つの異なる戦略、すなわち決定レベルでの融合(late fusion)と発話レベルでの融合(Weighted-Pooling (WP) fusion)を検討する。提案モデルは、重症度の異なる構音障害音声を含むUA-Speechデータベースを用いて評価した。一方では、注意機構付きLSTMネットワークは変調スペクトログラム系列を適切にモデル化でき、ログメルスペクトログラムの場合と同程度の分類率が得られることが示された。他方では、late fusionとWP fusionのいずれの組み合わせ戦略も単一特徴のシステムを上回った。これは、フレーム単位のログメルスペクトログラムと変調スペクトログラムが音声明瞭度予測のタスクに対して相補的な情報を持ち、LSTMベースのアーキテクチャがそれを効果的に活用できることを示唆する。最良の結果は、WP fusion戦略とAttention-Poolingを用いたシステムで得られた。 Speech intelligibility can be affected by multiple factors, such as noisy environments, channel distortions or physiological issues. In this work, we deal with the problem of automatic prediction of the speech intelligibility level in this latter case. Starting from our previous work, a non-intrusive system based on LSTM networks with attention mechanism designed for this task, we present two main contributions. In the first one, we propose the use of per-frame modulation spectrograms as input features, instead of compact representations derived from them that discard important temporal information. In the second one, two different strategies for the combination of per-frame acoustic log-mel and modulation spectrograms into the LSTM framework are explored: at decision level or late fusion and at utterance level or Weighted-Pooling (WP) fusion. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. 
On the one hand, results show that attentional LSTM networks are able to adequately model the modulation spectrogram sequences, producing similar classification rates as in the case of log-mel spectrograms. On the other hand, both combination strategies, late and WP fusion, outperform the single-feature systems, suggesting that per-frame log-mel and modulation spectrograms carry complementary information for the task of speech intelligibility prediction that can be effectively exploited by the LSTM-based architectures; the system with the WP fusion strategy and Attention-Pooling achieves the best results.	翻訳日:2024-02-06 17:00:06 公開日:2024-02-05
# graph neural machine: 表データを用いた新しい学習モデル Graph Neural Machine: A New Model for Learning with Tabular Data ( http://arxiv.org/abs/2402.02862v1 ) ライセンス: Link先を確認	Giannis Nikolentzos and Siyun Wang and Johannes Lutzeyer and Michalis Vazirgiannis	(参考訳) 近年、異なるドメインからグラフ構造へのデータマッピングへの関心が高まっている。中でもマルチ層パーセプトロン(MLP)のようなニューラルネットワークモデルはグラフとしてモデル化できる。実際、MLPは有向非巡回グラフとして表すことができる。グラフニューラルネットワーク(GNN)は最近、グラフ上で機械学習タスクを実行するための標準ツールになっている。本研究では,MLPが非同期メッセージパッシングGNNモデルと等価であることを示す。そこで我々は,MLPの有向非巡回グラフをほぼ完全なグラフに置き換え,同期メッセージパッシング方式を採用したグラフニューラルネットワーク(GNM)と呼ばれる,表型データのための新しい機械学習モデルを提案する。 1つのGNMモデルが複数のMLPモデルをシミュレート可能であることを示す。提案手法をいくつかの分類・回帰データセットで評価する。ほとんどの場合、GNMモデルはMLPアーキテクチャよりも優れている。 In recent years, there has been a growing interest in mapping data from different domains to graph structures. Among others, neural network models such as the multi-layer perceptron (MLP) can be modeled as graphs. In fact, MLPs can be represented as directed acyclic graphs. Graph neural networks (GNNs) have recently become the standard tool for performing machine learning tasks on graphs. In this work, we show that an MLP is equivalent to an asynchronous message passing GNN model which operates on the MLP's graph representation. We then propose a new machine learning model for tabular data, the so-called Graph Neural Machine (GNM), which replaces the MLP's directed acyclic graph with a nearly complete graph and which employs a synchronous message passing scheme. We show that a single GNM model can simulate multiple MLP models. We evaluate the proposed model in several classification and regression datasets. In most cases, the GNM model outperforms the MLP architecture.	翻訳日:2024-02-06 16:59:38 公開日:2024-02-05
# ゼロサムゲームにおけるノイズ観測の活用 Leveraging Noisy Observations in Zero-Sum Games ( http://arxiv.org/abs/2402.02861v1 ) ライセンス: Link先を確認	Emmanouil M Athanasakos (NEO), Samir M Perlaza (NEO, ECE, GAATI)	(参考訳) 本稿では,1人のプレイヤー(リーダー)が相手(フォロワー)に対して所定の確率測度(ストラテジー)をサンプリングして行動を選択するゼロサムゲームの事例について検討する。リーダーの行動は、従者によって任意のチャンネルの出力として観察される。これに反応して、従者は現在の情報、すなわちリーダーのコミットメントとそれに対応する騒がしい行動観察に基づいて行動を選択する。この文脈において、騒がしい動作の可観測性を持つゲームの平衡が常に存在し、その一意性に必要な条件が特定される。興味深いことに、ノイズの多い観測は、フォロワーの最高の反応の集合の濃度に重要な影響を及ぼす。特定の条件下では、そのような最良の応答の集合は、ほぼ確実にシングルトンであることが証明される。提案モデルでは,Lebesgue測度に対して密度のある任意のチャネルノイズを捕捉する。例として、チャネルがガウス確率測定によって記述される場合について検討する。 This paper studies an instance of zero-sum games in which one player (the leader) commits to its opponent (the follower) to choose its actions by sampling a given probability measure (strategy). The actions of the leader are observed by the follower as the output of an arbitrary channel. In response to that, the follower chooses its action based on its current information, that is, the leader's commitment and the corresponding noisy observation of its action. Within this context, the equilibrium of the game with noisy action observability is shown to always exist and the necessary conditions for its uniqueness are identified. Interestingly, the noisy observations have important impact on the cardinality of the follower's set of best responses. Under particular conditions, such a set of best responses is proved to be a singleton almost surely. The proposed model captures any channel noise with a density with respect to the Lebesgue measure. As an example, the case in which the channel is described by a Gaussian probability measure is investigated.	翻訳日:2024-02-06 16:59:25 公開日:2024-02-05
# オンライン変分学習における重要度サンプリング Importance sampling for online variational learning ( http://arxiv.org/abs/2402.02859v1 ) ライセンス: Link先を確認	Mathis Chagneux (IP Paris), Pierre Gloaguen (UBS), Sylvain Le Corff (LPSM (UMR_8001), SU), Jimmy Olsson (KTH)	(参考訳) 本稿では、状態空間モデルにおけるオンライン変分推定を扱う。我々は、変分アプローチとモンテカルロ重要度サンプリングを併用して、平滑化分布、すなわち観測が与えられた下での潜在状態の同時分布を学習することに焦点を当てる。観測が逐次到着するストリーミングデータの文脈において、エビデンス下限(ELBO)の勾配を計算するための効率的なアルゴリズムを提案する。我々の貢献には、計算効率のよいオンラインELBO推定器、オフラインおよび真のオンライン設定での性能の実証、同時平滑化分布の下での一般的な期待値の計算への適用可能性が含まれる。 This article addresses online variational estimation in state-space models. We focus on learning the smoothing distribution, i.e. the joint distribution of the latent states given the observations, using a variational approach together with Monte Carlo importance sampling. We propose an efficient algorithm for computing the gradient of the evidence lower bound (ELBO) in the context of streaming data, where observations arrive sequentially. Our contributions include a computationally efficient online ELBO estimator, demonstrated performance in offline and true online settings, and adaptability for computing general expectations under joint smoothing distributions.	翻訳日:2024-02-06 16:59:12 公開日:2024-02-05
# モデルベースオフライン強化学習のためのディープ自己回帰密度ネットとニューラルネットワークアンサンブルの比較 Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning ( http://arxiv.org/abs/2402.02858v1 ) ライセンス: Link先を確認	Abdelhakim Benechehab, Albert Thomas and Balázs Kégl	(参考訳) 政策最適化のために一組のシステム遷移しか利用できないオフライン強化学習の問題を考察する。この分野の最近の進歩に従い、利用可能なデータからシステムダイナミクスを推論し、仮想的なモデルロールアウト上で政策最適化を行うモデルベース強化学習アルゴリズムを検討する。このアプローチはモデル誤差の悪用に対して脆弱であり、実際のシステムで壊滅的な障害を引き起こす可能性がある。標準的な解決策は、不確実性のヒューリスティックとしてアンサンブルに依拠し、不確実性が高すぎる領域でモデルを悪用しないようにすることである。我々は、D4RLベンチマークにおいて、よく校正された単一の自己回帰モデルでより良い性能が得られることを示すことによって、アンサンブルに頼らなければならないという一般的な考えに異議を唱える。また、モデル学習の静的指標を分析し、エージェントの最終性能にとって重要なモデル特性について結論を述べる。 We consider the problem of offline reinforcement learning where only a set of system transitions is made available for policy optimization. Following recent advances in the field, we consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts. This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system. The standard solution is to rely on ensembles for uncertainty heuristics and to avoid exploiting the model where it is too uncertain. We challenge the popular belief that we must resort to ensembles by showing that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark. We also analyze static metrics of model-learning and conclude on the important model properties for the final performance of the agent.	翻訳日:2024-02-06 16:59:01 公開日:2024-02-05
# バイアスのある適応的確率近似の非漸近解析 Non-asymptotic Analysis of Biased Adaptive Stochastic Approximation ( http://arxiv.org/abs/2402.02857v1 ) ライセンス: Link先を確認	Sobihan Surendran (LPSM (UMR_8001)), Antoine Godichon-Baggioni (LPSM (UMR_8001)), Adeline Fermanian, Sylvain Le Corff (LPSM (UMR_8001))	(参考訳) 適応ステップ付き確率勾配降下(SGD)は現在、ディープニューラルネットワークのトレーニングに広く使われている。ほとんどの理論的結果は不偏な勾配推定器が利用できることを前提としているが、モンテカルロ法を用いる最近の深層学習や強化学習の応用のいくつかでは、この前提は成り立たない。本稿では、凸および非凸の滑らかな関数に対して、偏りのある勾配と適応ステップを用いるSGDの包括的な非漸近解析を行う。本研究は時間依存のバイアスを取り込み、勾配推定器のバイアスと平均二乗誤差(MSE)を制御することの重要性を強調する。特に、偏りのある勾配を用いるAdagradとRMSPropが、滑らかな非凸関数の臨界点に、不偏の場合の既存の結果と同程度の速度で収束することを示す。最後に、変分オートエンコーダ(VAE)を用いた実験結果を示し、我々の収束結果を例証するとともに、適切なハイパーパラメータ調整によってバイアスの影響を低減できることを示す。 Stochastic Gradient Descent (SGD) with adaptive steps is now widely used for training deep neural networks. Most theoretical results assume access to unbiased gradient estimators, which is not the case in several recent deep learning and reinforcement learning applications that use Monte Carlo methods. This paper provides a comprehensive non-asymptotic analysis of SGD with biased gradients and adaptive steps for convex and non-convex smooth functions. Our study incorporates time-dependent bias and emphasizes the importance of controlling the bias and Mean Squared Error (MSE) of the gradient estimator. In particular, we establish that Adagrad and RMSProp with biased gradients converge to critical points for smooth non-convex functions at a rate similar to existing results in the literature for the unbiased case. Finally, we provide experimental results using Variational Autoencoders (VAE) that illustrate our convergence results and show how the effect of bias can be reduced by appropriate hyperparameter tuning.	翻訳日:2024-02-06 16:58:46 公開日:2024-02-05
# 動的スパース学習 : 効率的な推薦のための新しいパラダイム Dynamic Sparse Learning: A Novel Paradigm for Efficient Recommendation ( http://arxiv.org/abs/2402.02855v1 ) ライセンス: Link先を確認	Shuyao Wang, Yongduo Sui, Jiancan Wu, Zhi Zheng, Hui Xiong	(参考訳) ディープラーニングベースのレコメンデーションシステムでは、ユーザ数の増加とアイテム数の増加による計算要求の増加が、実用的なデプロイメントにとって大きな課題となっている。モデルサイズを削減し、効率的なレコメンデーションのためにユーザとアイテムの表現を効果的に学習する。モデル圧縮とアーキテクチャ探索の大幅な進歩にもかかわらず、一般的なアプローチは顕著な制約に直面している。これには、モデル圧縮における事前訓練/再訓練による計算コストと、アーキテクチャ設計における広範な検索スペースが含まれる。さらに、特に厳密な時間や空間制限のあるシナリオでは、複雑性を管理し、メモリ制約に固執することが問題となる。これらの課題に対処するため,推薦モデルに適した新しい学習パラダイムである動的スパース学習(DSL)を導入する。 DSLは、スクラッチから軽量スパースモデルを訓練し、トレーニング中の各ウェイトの重要性とモデルの空間分布を定期的に評価し、動的に調整する。このアプローチは、学習ライフサイクル全体の一貫性と最小のパラメータ予算を確保し、トレーニングから推論までの"エンドツーエンド"効率を実現する。大規模な実験の結果は、DSLの有効性を裏付け、トレーニングと推論コストを大幅に削減し、同等のレコメンデーションパフォーマンスを提供しています。 In the realm of deep learning-based recommendation systems, the increasing computational demands, driven by the growing number of users and items, pose a significant challenge to practical deployment. This challenge is primarily twofold: reducing the model size while effectively learning user and item representations for efficient recommendations. Despite considerable advancements in model compression and architecture search, prevalent approaches face notable constraints. These include substantial additional computational costs from pre-training/re-training in model compression and an extensive search space in architecture design. Additionally, managing complexity and adhering to memory constraints is problematic, especially in scenarios with strict time or space limitations. Addressing these issues, this paper introduces a novel learning paradigm, Dynamic Sparse Learning (DSL), tailored for recommendation models. DSL innovatively trains a lightweight sparse model from scratch, periodically evaluating and dynamically adjusting each weight's significance and the model's sparsity distribution during the training. 
This approach ensures a consistent and minimal parameter budget throughout the full learning lifecycle, paving the way for "end-to-end" efficiency from training to inference. Our extensive experimental results underline DSL's effectiveness, significantly reducing training and inference costs while delivering comparable recommendation performance.	翻訳日:2024-02-06 16:58:31 公開日:2024-02-05
# 二次元リドバーグ原子配列中のアモルファス量子磁石 Amorphous quantum magnets in a two-dimensional Rydberg atom array ( http://arxiv.org/abs/2402.02852v1 ) ライセンス: Link先を確認	Sergi Julià-Farré and Joseph Vovrosh and Alexandre Dauphin	(参考訳) アモルファス固体、すなわち、明確に定義された短距離特性を持つが長距離秩序を持たない系は、凝縮系物理における重要な研究テーマである。その微視的構造が結晶のものと異なることは知られているが、アモルファス材料における創発的な集団的挙動については未解決の問題が多い。これは、数値シミュレーションが極めて困難な量子領域において特に当てはまる。本稿では、その代わりにアナログ量子シミュレータを用いてアモルファス量子磁石を探索することを提案する。そのためにまず、イジングモデルのRydbergシミュレータに適したアモルファス量子磁石を生成するアルゴリズムを提示する。続いて、半古典的アプローチを用いてモデルの物理に関する予備的な知見を得る。特に、平均場相図を計算し、線形スピン波理論を用いて励起の局在特性と動的構造因子を調べる。最後に、プログラム可能な光ピンセットアレイ中のRydberg原子に基づく実験提案を概説し、古典的にシミュレートが難しい領域でのアモルファス量子磁石の研究への道を開く。 Amorphous solids, i.e., systems which feature well-defined short-range properties but lack long-range order, constitute an important research topic in condensed matter. While their microscopic structure is known to differ from their crystalline counterpart, there are still many open questions concerning the emergent collective behavior in amorphous materials. This is particularly the case in the quantum regime, where the numerical simulations are extremely challenging. In this article, we instead propose to explore amorphous quantum magnets with an analog quantum simulator. To this end, we first present an algorithm to generate amorphous quantum magnets, suitable for Rydberg simulators of the Ising model. Subsequently, we use semiclassical approaches to get a preliminary insight of the physics of the model. In particular, we calculate mean-field phase diagrams, and use the linear-spin-wave theory to study localization properties and dynamical structure factors of the excitations. Finally, we outline an experimental proposal based on Rydberg atoms in programmable tweezer arrays, thus opening the road towards the study of amorphous quantum magnets in regimes difficult to simulate classically.	翻訳日:2024-02-06 16:58:09 公開日:2024-02-05
# 合成特徴アライメントによる合成一般化の促進 Enhancing Compositional Generalization via Compositional Feature Alignment ( http://arxiv.org/abs/2402.02851v1 ) ライセンス: Link先を確認	Haoxiang Wang, Haozhe Si, Huajie Shao, Han Zhao	(参考訳) 機械学習モデルの現実世界の応用は、トレーニングとテストデータ分布の食い違いがあるデータ分散シフトに直面することが多い。一般的なマルチドメインマルチクラスセットアップでは、クラス数やドメイン数が大きくなると、各ドメインクラスの組み合わせでトレーニングデータを集めることは不可能になる。この課題は自然に、合成一般化(CG)能力を持つモデルを探し求め、モデルが目に見えないドメイン-クラスの組み合わせに一般化できる。 CGの課題を掘り下げるために,既存の実世界の画像データセットから派生したCGベンチマークスイートであるCG-Benchを開発し,CLIPやDINOv2といった基礎モデルの事前学習ファインタニングパラダイムが課題に対処していることを観察する。この課題に対処するために,簡単な2段階ファインタニング手法であるコンポジション・フィーチャーアライメント(CFA)を提案する。一クラス及びドメインラベルに関する事前訓練されたエンコーダ上で二本の直交線形ヘッドを学ぶこと。二新たに学習した頭部を凍結したエンコーダを微調整すること。我々はCFAが事前学習されたモデルの合成特徴学習を促進することを理論的および実証的に正当化する。 CG-Bench for CLIP と DINOv2, 2つの強力な事前学習型視覚基盤モデルについて広範な実験を行った。実験の結果, CFAは合成一般化において一般的な微調整技術より優れており, 合成特徴学習におけるCFAの有効性が相関していることがわかった。 Real-world applications of machine learning models often confront data distribution shifts, wherein discrepancies exist between the training and test data distributions. In the common multi-domain multi-class setup, as the number of classes and domains scales up, it becomes infeasible to gather training data for every domain-class combination. This challenge naturally leads the quest for models with Compositional Generalization (CG) ability, where models can generalize to unseen domain-class combinations. To delve into the CG challenge, we develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets, and observe that the prevalent pretraining-finetuning paradigm on foundational models, such as CLIP and DINOv2, struggles with the challenge. To address this challenge, we propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique that i) learns two orthogonal linear heads on a pretrained encoder with respect to class and domain labels, and ii) fine-tunes the encoder with the newly learned head frozen. 
We theoretically and empirically justify that CFA encourages compositional feature learning of pretrained models. We further conduct extensive experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models. Experiment results show that CFA outperforms common finetuning techniques in compositional generalization, corroborating CFA's efficacy in compositional feature learning.	翻訳日:2024-02-06 16:57:49 公開日:2024-02-05
# 音声明瞭度の自動分類のための注意機構付き長短期記憶(LSTM)ベースシステム An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility ( http://arxiv.org/abs/2402.02850v1 ) ライセンス: Link先を確認	Miguel Fernández-Díaz and Ascensión Gallardo-Antolín	(参考訳) 音声の明瞭度は、雑音環境、技術的な問題、生物学的条件など、複数の要因によって低下することがある。本研究は、このうち最後の場合における音声明瞭度レベルを予測する、自動の非侵入型システムの開発に焦点を当てる。本研究の主な貢献は、ログメルスペクトログラムを入力特徴とするLong Short-Term Memory(LSTM)ネットワークをこの目的に用いることである。さらに、このLSTMベースのシステムは、このタスクにとってより重要なフレームを特定できる単純な注意機構を組み込むことで強化されている。提案モデルは、重症度の異なる構音障害音声を含むUA-Speechデータベースを用いて評価した。その結果、注意機構付きLSTMアーキテクチャは、手作り特徴量を用いたサポートベクトルマシン(SVM)ベースの参照システムと、Mean-Pooling(平均プーリング)を用いたLSTMベースのシステムの両方を上回った。 Speech intelligibility can be degraded due to multiple factors, such as noisy environments, technical difficulties or biological conditions. This work is focused on the development of an automatic non-intrusive system for predicting the speech intelligibility level in this latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the more relevant frames to this task. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with Mean-Pooling.	翻訳日:2024-02-06 16:57:21 公開日:2024-02-05
# 多言語セマンティック検索のドメイン適応-文献レビュー Domain Adaptation of Multilingual Semantic Search -- Literature Review ( http://arxiv.org/abs/2402.02932v1 ) ライセンス: Link先を確認	Anna Bringmann, Anastasia Zhukova	(参考訳) 本稿では、低リソース環境でドメイン適応を行うための現在のアプローチと、低リソース環境で多言語セマンティック検索を行うためのアプローチを概観する。我々は、密なテキスト情報検索システムのうちどの部分を適応させるかに基づいてドメイン適応手法を分類する新しい類型を構築し、それらを効率的に組み合わせる方法に焦点を当てた。また、低リソース環境における密な検索器(dense retriever)について、多言語セマンティック検索とドメイン適応手法を組み合わせる可能性も検討する。 This literature review gives an overview of current approaches to perform domain adaptation in a low-resource setting and approaches to perform multilingual semantic search in a low-resource setting. We developed a new typology to cluster domain adaptation approaches based on the part of dense textual information retrieval systems that they adapt, focusing on how to combine them efficiently. We also explore the possibilities of combining multilingual semantic search with domain adaptation approaches for dense retrievers in a low-resource setting.	翻訳日:2024-02-06 16:50:02 公開日:2024-02-05
# 多層住宅の熱力学のグレイボックスモデリングのためのデジタル双生児 Digital Twin for Grey Box modeling of Multistory residential building thermal dynamics ( http://arxiv.org/abs/2402.02909v1 ) ライセンス: Link先を確認	Lina Morkunaite, Justas Kardoka, Darius Pupeikis, Paris Fokaides, Vangelis Angelakis	(参考訳) ビルのエネルギー効率は広く研究されており、環境問題の増加とエネルギー自立の必要性により急速に人気が高まっている。北ヨーロッパでは、暖房エネルギーだけで全体のエネルギー消費の70%を占める。 IoT、ビッグデータ、クラウドコンピューティング、機械学習といった4.0の産業技術は、予測とプロアクティブなデジタルツインの作成とともに、この数を削減できる。しかし、構造熱力学は、多くの変数に依存する非常に複雑なプロセスである。その結果、一般的に使われている物理ベースのホワイトボックスモデルは時間がかかり、膨大な専門知識を必要とする。それとは対照的に、主にエネルギー消費データの構築に依存しているブラックボックス予測モデルは、基本的な洞察を欠き、再利用を妨げる。本研究では,建築の3次元表現とリアルタイムIoTデータを統合することで,建築熱力学のグレーボックスモデリングを容易にするアーキテクチャを提案する。このアーキテクチャは、ユーザーが物理法則と実データに基づいて建物の熱力学を定義することができるデジタルツインプラットフォームを作成するケーススタディで検証され、最適な暖房エネルギー最適化戦略のためのインフォームド意思決定が容易になる。また、作成したユーザインターフェースにより、施設管理者やエネルギー提供者、管理機関といったステークホルダーが、広範囲の専門知識や時間資源なしに、建物の熱力学を分析し、比較し、評価することができる。 Buildings energy efficiency is a widely researched topic, which is rapidly gaining popularity due to rising environmental concerns and the need for energy independence. In Northern Europe heating energy alone accounts for up to 70 percent of the total building energy consumption. Industry 4.0 technologies such as IoT, big data, cloud computing and machine learning, along with the creation of predictive and proactive digital twins, can help to reduce this number. However, buildings thermal dynamics is a very complex process that depends on many variables. As a result, commonly used physics-based white box models are time-consuming and require vast expertise. On the contrary, black box forecasting models, which rely primarily on building energy consumption data, lack fundamental insights and hinder re-use. In this study we propose an architecture to facilitate grey box modelling of building thermal dynamics while integrating real time IoT data with 3D representation of buildings. 
The architecture is validated in a case study creating a digital twin platform that enables users to define the thermal dynamics of buildings based on physical laws and real data, thus facilitating informed decision making for the best heating energy optimization strategy. Also, the created user interface enables stakeholders such as facility managers, energy providers or governing bodies to analyse, compare and evaluate buildings thermal dynamics without extensive expertise or time resources.	翻訳日:2024-02-06 16:49:54 公開日:2024-02-05
# ViewFusion:新規視点合成のための構成可能な拡散モデルの学習 ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis ( http://arxiv.org/abs/2402.02906v1 ) ライセンス: Link先を確認	Bernard Spiegl, Andrea Perin, Stéphane Deny, Alexander Ilin	(参考訳) ディープラーニングは、Neural Radiance Field(NeRF)ベースのアプローチからエンドツーエンド型のアーキテクチャに至るまで、新規視点合成という古くからの問題に対して数多くの新しいアプローチを提供している。それぞれのアプローチには固有の強みがあるが、適用性には固有の制限もある。本研究では、比類のない柔軟性を備えた、新規視点合成のための最先端のエンドツーエンド生成的アプローチであるViewFusionを導入する。ViewFusionは、シーンの任意の数の入力ビューに対して拡散デノイジングステップを同時に適用し、各ビューで得られたノイズ勾配を(推定された)画素重み付けマスクと組み合わせることで、ターゲットシーンの各領域において最も情報量の多い入力ビューのみが考慮されるようにする。本手法は、(1)複数のシーンとオブジェクトクラスにわたって訓練可能で汎化すること、(2)訓練時とテスト時の両方で、ポーズ情報のない可変数のビューを適応的に取り込めること、(3)(生成的な性質のおかげで)極めて劣決定な条件でも妥当なビューを生成できること、によって従来手法のいくつかの制限を解消し、同時に最先端手法と同等以上の品質のビューを生成する。制限としては、シーンの3D埋め込みを生成しないために推論速度が比較的遅いこと、および比較的小規模なデータセットであるNMRでしかテストされていないことが挙げられる。コードは利用可能。 Deep learning is providing a wealth of new approaches to the old problem of novel view synthesis, from Neural Radiance Field (NeRF) based approaches to end-to-end style architectures. Each approach offers specific strengths but also comes with specific limitations in their applicability. This work introduces ViewFusion, a state-of-the-art end-to-end generative approach to novel view synthesis with unparalleled flexibility. ViewFusion consists in simultaneously applying a diffusion denoising step to any number of input views of a scene, then combining the noise gradients obtained for each view with an (inferred) pixel-weighting mask, ensuring that for each region of the target scene only the most informative input views are taken into account. 
Our approach resolves several limitations of previous approaches by (1) being trainable and generalizing across multiple scenes and object classes, (2) adaptively taking in a variable number of pose-free views at both train and test time, (3) generating plausible views even in severely undetermined conditions (thanks to its generative nature) -- all while generating views of quality on par or even better than state-of-the-art methods. Limitations include not generating a 3D embedding of the scene, resulting in a relatively slow inference speed, and our method only being tested on the relatively small dataset NMR. Code is available.	翻訳日:2024-02-06 16:49:32 公開日:2024-02-05
# 強化学習で制御されたヒト肘のデジタルツインにおけるインピーダンス同定実験の再現 Replication of Impedance Identification Experiments on a Reinforcement-Learning-Controlled Digital Twin of Human Elbows ( http://arxiv.org/abs/2402.02904v1 ) ライセンス: Link先を確認	Hao Yu, Zebin Huang, Qingbo Liu, Ignacio Carlucho, and Mustafa Suphi Erden	(参考訳) 本研究は、デジタルヒューマンモデルを用いて仮想環境内でヒトの神経力学実験を再現する先駆的な試みである。強化学習(RL)によって強化された最先端の人体運動シミュレーションプラットフォームであるMyoSuiteを用い、筋骨格モデル上でヒト肘の複数種類のインピーダンス同定実験を再現した。RLエージェントが制御する肘の動きと実際のヒト肘の動きを、トルク摂動実験で同定されたインピーダンスの観点から比較した。その結果、RLエージェントは、摂動下で目標の肘運動を安定させるために、ヒトよりも高い肘インピーダンスを示すことが明らかになった。これは、反応時間がより短く、感覚能力がより優れているためと考えられる。本研究は、神経力学研究における仮想環境シミュレーションの可能性に関する予備的な探索であり、従来の実験手法に代わる、初期段階ながら有望な選択肢を提供する。人体の完全な筋骨格モデルを備えたRL制御のデジタルツインは、実際の被験者での実験に先立って、実験の設計やリハビリテーション理論の検証に役立つと期待される。 This study presents a pioneering effort to replicate human neuromechanical experiments within a virtual environment utilising a digital human model. By employing MyoSuite, a state-of-the-art human motion simulation platform enhanced by Reinforcement Learning (RL), multiple types of impedance identification experiments of human elbow were replicated on a musculoskeletal model. We compared the elbow movement controlled by an RL agent with the motion of an actual human elbow in terms of the impedance identified in torque-perturbation experiments. The findings reveal that the RL agent exhibits higher elbow impedance to stabilise the target elbow motion under perturbation than a human does, likely due to its shorter reaction time and superior sensory capabilities. This study serves as a preliminary exploration into the potential of virtual environment simulations for neuromechanical research, offering an initial yet promising alternative to conventional experimental approaches. An RL-controlled digital twin with complete musculoskeletal models of the human body is expected to be useful in designing experiments and validating rehabilitation theory before experiments on real human subjects.	翻訳日:2024-02-06 16:49:06 公開日:2024-02-05
# 不均一多中心集団の回帰モデルに対するベイズ連邦推論 Bayesian Federated Inference for regression models with heterogeneous multi-center populations ( http://arxiv.org/abs/2402.02898v1 ) ライセンス: Link先を確認	Marianne A Jonker, Hassan Pazira, Anthony CC Coolen	(参考訳) 回帰モデルのパラメータを正確に推定するには、サンプルサイズがモデルに対する予測値の数に対して十分に大きい必要がある。実際には、十分なデータが欠如しており、モデルが過剰に適合し、その結果、新しい患者の結果の信頼性の低い予測につながる可能性がある。異なる(医療)センターで収集された異なるデータセットからデータをポーリングすることはこの問題を軽減するが、プライバシー規制やロジスティック問題のためにしばしば実現不可能である。別の方法は、センター内のローカルデータを別々に分析し、統計的推測結果とベイズ連邦推論(BFI)手法を組み合わせることである。このアプローチの目的は、統計解析が結合データ上で実行された場合の推測結果を、別々のセンタで計算することである。本研究は,同質性と異質性に基づく方法論を別センターの集団間で説明し,よりよく理解するための実例を与える。提案手法の優れた性能を示す。すべての計算を行うRパッケージが開発され,本論文で説明されている。数学的詳細はAppendixに記載されている。 To estimate accurately the parameters of a regression model, the sample size must be large enough relative to the number of possible predictors for the model. In practice, sufficient data is often lacking, which can lead to overfitting of the model and, as a consequence, unreliable predictions of the outcome of new patients. Pooling data from different data sets collected in different (medical) centers would alleviate this problem, but is often not feasible due to privacy regulation or logistic problems. An alternative route would be to analyze the local data in the centers separately and combine the statistical inference results with the Bayesian Federated Inference (BFI) methodology. The aim of this approach is to compute from the inference results in separate centers what would have been found if the statistical analysis was performed on the combined data. We explain the methodology under homogeneity and heterogeneity across the populations in the separate centers, and give real life examples for better understanding. Excellent performance of the proposed methodology is shown. An R-package to do all the calculations has been developed and is illustrated in this paper. The mathematical details are given in the Appendix.	翻訳日:2024-02-06 16:48:47 公開日:2024-02-05
# 対話におけるLLMエージェント:大規模言語モデルにおける個人性一貫性と言語適応の測定 LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models ( http://arxiv.org/abs/2402.02896v1 ) ライセンス: Link先を確認	Ivar Frisch, Mario Giulianelli	(参考訳) エージェントインタラクションとパーソナライゼーションはどちらも,大規模言語モデル(LLM)の研究において活発な話題であるが,ペルソナ条件のLLMエージェントの動作に対する言語インタラクションの影響に限定的な焦点が当てられている。このような取り組みは、エージェントが割り当てられた特性に一貫性を保ちながら、オープンで自然主義的な対話に参加できることを保証するために重要である。本実験では, 簡易な変動誘導サンプリングアルゴリズムを用いて, LLMエージェントの2群集団を誘導し, パーソナリティプロファイルにGPT-3.5を適用した。次に、パーソナリティテストを行い、エージェントを共同執筆タスクに提出し、異なるプロファイルが異なるパーソナリティ一貫性と言語的アライメントを示すことを発見した。本研究は,LLM間の対話型対話の理解を深め,対話型環境のための堅牢で人間的なLLMペルソナを構築するための新たなアプローチの必要性を強調した。 While both agent interaction and personalisation are vibrant topics in research on large language models (LLMs), there has been limited focus on the effect of language interaction on the behaviour of persona-conditioned LLM agents. Such an endeavour is important to ensure that agents remain consistent to their assigned traits yet are able to engage in open, naturalistic dialogues. In our experiments, we condition GPT-3.5 on personality profiles through prompting and create a two-group population of LLM agents using a simple variability-inducing sampling algorithm. We then administer personality tests and submit the agents to a collaborative writing task, finding that different profiles exhibit different degrees of personality consistency and linguistic alignment to their conversational partners. Our study seeks to lay the groundwork for better understanding of dialogue-based interaction between LLMs and highlights the need for new approaches to crafting robust, more human-like LLM personas for interactive environments.	翻訳日:2024-02-06 16:48:29 公開日:2024-02-05
# モーションアウェアビデオフレーム補間 Motion-Aware Video Frame Interpolation ( http://arxiv.org/abs/2402.02892v1 ) ライセンス: Link先を確認	Pengfei Han, Fuhua Zhang, Bin Zhao, and Xuelong Li	(参考訳) ビデオフレーム補間手法は、ビデオのフレームレートを高めることを目的として、既存のフレームの間に新しいフレームを生成しようとするものである。しかし、現在の手法は、オクルージョンや不連続な動きを含む困難なシナリオにおいて、画像のぼやけや偽のアーティファクトを生じやすい。さらに、それらは通常オプティカルフロー推定に依存しており、モデリングの複雑さと計算コストが増す。これらの問題に対処するために、新しい階層型ピラミッドモジュールを導入することで連続フレームから中間オプティカルフローを直接推定する、MA-VFI(Motion-Aware Video Frame Interpolation)ネットワークを提案する。このモジュールは、異なる受容野を持つ入力フレームからグローバルな意味関係と空間的詳細を抽出して複雑な動きパターンを捉えられるようにするだけでなく、必要な計算コストと複雑さを効果的に削減する。次いで、抽出した特徴から中間フローマップを推定・洗練するためのクロススケールな動き構造を提示する。この手法は、フレーム補間過程における入力フレーム特徴とフローマップとの相互作用を促し、中間フローマップの精度を大きく高める。最後に、中間フローを中心に設計した損失を用いて中間フローの予測を導き、中間フローマップの精度をさらに改善する。実験により、MA-VFIは様々なデータセットにおいて複数の代表的なVFI手法を上回り、有効性を保ちながら効率を向上できることが示された。 Video frame interpolation methodologies endeavor to create novel frames betwixt extant ones, with the intent of augmenting the video's frame frequency. However, current methods are prone to image blurring and spurious artifacts in challenging scenarios involving occlusions and discontinuous motion. Moreover, they typically rely on optical flow estimation, which adds complexity to modeling and computational costs. To address these issues, we introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames by introducing a novel hierarchical pyramid module. It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, enabling the model to capture intricate motion patterns, but also effectively reduces the required computational cost and complexity. Subsequently, a cross-scale motion structure is presented to estimate and refine intermediate flow maps by the extracted features. 
This approach facilitates the interplay between input frame features and flow maps during the frame interpolation process and markedly heightens the precision of the intervening flow delineations. Finally, a discerningly fashioned loss centered around an intermediate flow is meticulously contrived, serving as a deft rudder to skillfully guide the prognostication of said intermediate flow, thereby substantially refining the precision of the intervening flow mappings. Experiments illustrate that MA-VFI surpasses several representative VFI methods across various datasets, and can enhance efficiency while maintaining commendable efficacy.	翻訳日:2024-02-06 16:48:09 公開日:2024-02-05
# 階層タッカー分解によるブラックボックス近似と最適化 Black-Box Approximation and Optimization with Hierarchical Tucker Decomposition ( http://arxiv.org/abs/2402.02890v1 ) ライセンス: Link先を確認	Gleb Ryzhakov, Andrei Chertkov, Artem Basharin, Ivan Oseledets	(参考訳) 多次元ブラックボックス近似と勾配なし最適化のための新しいhtbb法を開発し,maxvolインデックス選択法を用いて低ランク階層タッカー分解法を基礎とした。 14の複素モデル問題に対する数値実験により,提案手法の寸法1000倍のロバスト性を示すとともに,従来の勾配のない最適化手法よりもはるかに正確な結果を示すとともに,テンソル・ネットワークの単純な場合を表すテンソル・トレイン分解に基づく近似と最適化手法を示す。 We develop a new method HTBB for the multidimensional black-box approximation and gradient-free optimization, which is based on the low-rank hierarchical Tucker decomposition with the use of the MaxVol indices selection procedure. Numerical experiments for 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000, while it shows significantly more accurate results than classical gradient-free optimization methods, as well as approximation and optimization methods based on the popular tensor train decomposition, which represents a simpler case of a tensor network.	翻訳日:2024-02-06 16:47:43 公開日:2024-02-05
# 汎用音声理解のためのフェデレーション型自己監督学習の探索 Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding ( http://arxiv.org/abs/2402.02889v1 ) ライセンス: Link先を確認	Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen	(参考訳) federated learning (fl) と self-supervised learning (ssl) の統合は、ユーザデータのプライバシーを損なうことなく、オーディオデータを汎用オーディオ理解に活用するためのユニークで相乗効果のある組み合わせを提供する。しかし,大規模な異種音源からトレーニングデータを生成する場合,FL方式のSSLモデルを汎用音声理解のために研究することは稀である。本稿では,非独立分散(非iid)データにシミュレートされた大規模fl設定に組み込む場合の特徴マッチングおよび予測オーディオssl技術の性能評価を行う。本稿では,大規模分散ヘテロジニアスクライアントから中間的特徴表現を学習し,ラベルなし音声データを保持する新しいフェデレートssl(f-ssl)フレームワークを提案する。本研究は,音声-検索タスクにおける集中型音声-SSLアプローチと同等の性能を示すことを示す。広範囲な実験により、fasslが最先端fl集約法のための最適なグローバルモデルを得るのに役立つことの有効性と意義が示されている。 The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However, rare efforts have been made to investigate the SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated by large-scale heterogeneous audio sources. In this paper, we evaluate the performance of feature-matching and predictive audio-SSL techniques when integrated into large-scale FL settings simulated with non-independently identically distributed (non-iid) data. We propose a novel Federated SSL (F-SSL) framework, dubbed FASSL, that enables learning intermediate feature representations from large-scale decentralized heterogeneous clients, holding unlabelled audio data. Our study has found that audio F-SSL approaches perform on par with the centralized audio-SSL approaches on the audio-retrieval task. Extensive experiments demonstrate the effectiveness and significance of FASSL as it assists in obtaining the optimal global model for state-of-the-art FL aggregation methods.	翻訳日:2024-02-06 16:47:31 公開日:2024-02-05
# 時間・メモリ・パラメータ効率の良い視覚適応 Time-, Memory- and Parameter-Efficient Visual Adaptation ( http://arxiv.org/abs/2402.02887v1 ) ライセンス: Link先を確認	Otniel-Bogdan Mercea, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab	(参考訳) 基盤モデルがより普及するにつれ、下流タスクに効率的に微調整する必要性が高まっている。多数の適応法が提案されているが, パラメータの学習量の観点からのみ効率的であるように設計されている。しかしながら、通常はモデル全体の勾配をバックプロパゲーションする必要があるため、トレーニング時間とメモリコストはそれほど大きく削減されない。本稿では,バックボーンを通じて勾配をバックプロパゲートしない適応法を提案する。凍結した、事前訓練されたバックボーンの機能を利用する軽量ネットワークを並列に設計することで、これを実現する。その結果,本手法はパラメータだけでなく,トレーニング時間やメモリ使用量にも有効であることがわかった。提案手法は,一般的なVTABベンチマークにおける最先端の精度パラメータトレードオフを実現し,トレーニング時間やメモリ使用量に関して,先行作業よりも優れていることを示す。さらに,映像分類の計算要求課題に対して40億パラメータの視覚トランスフォーマーバックボーンを適用し,複雑なモデル並列処理を必要とせず,学習効率と拡張性を示す。ここでは、10億のパラメータバックボーンにしかスケールできない、あるいは、より小さなバックボーンを完全に微調整できる事前のアダプタベースの手法を、同じGPUで実現し、トレーニング時間を短縮する。 As foundation models become more popular, there is a growing need to efficiently finetune them for downstream tasks. Although numerous adaptation methods have been proposed, they are designed to be efficient only in terms of how many parameters are trained. They, however, typically still require backpropagating gradients throughout the model, meaning that their training-time and -memory cost does not reduce as significantly. We propose an adaptation method which does not backpropagate gradients through the backbone. We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone. As a result, our method is efficient not only in terms of parameters, but also in training-time and memory usage. Our approach achieves state-of-the-art accuracy-parameter trade-offs on the popular VTAB benchmark, and we further show how we outperform prior works with respect to training-time and -memory usage too. We further demonstrate the training efficiency and scalability of our method by adapting a vision transformer backbone of 4 billion parameters for the computationally demanding task of video classification, without any intricate model parallelism. 
Here, we outperform a prior adaptor-based method which could only scale to a 1 billion parameter backbone, or fully-finetuning a smaller backbone, with the same GPU and less training time.	翻訳日:2024-02-06 16:47:13 公開日:2024-02-05
# フェデレートスパイキング学習における時間分散バックドア攻撃 Time-Distributed Backdoor Attacks on Federated Spiking Learning ( http://arxiv.org/abs/2402.02886v1 ) ライセンス: Link先を確認	Gorka Abad, Stjepan Picek, Aitor Urbieta	(参考訳) 本稿では,ニューロモルフィックデータを用いたバックドア攻撃に対するスパイクニューラルネットワーク(SNN)とフェデレーション学習(FL)の脆弱性について検討する。 SNNの効率性とFLのプライバシー上の優位性にもかかわらず、特に低消費電力デバイスでは、これらのシステムがこのような攻撃の影響を受けやすいことを示す。まず、ニューロモルフィックデータを用いてFLとSNNの併用の可能性を評価し、その可能性を示す。次に,既知のFL攻撃手法のSNNへの転移性を評価し,これらが準最適攻撃性能をもたらすことを見出した。そこで,攻撃性能を向上させるため,単一および複数の攻撃者によるバックドア攻撃について検討した。我々の主な貢献は、SNNやFLに合わせた新たな攻撃戦略を開発することです。最良の場合、攻撃成功率100%、0.13 MSE、および98.9 SSIMを達成する。さらに,従来のバックドア攻撃に対する防御を適応評価し,SNNの保護に不備があることを明らかにする。本研究は、特にバックドア攻撃の文脈において、SNNとFLの展開における堅牢なセキュリティ対策の必要性を明らかにする。 This paper investigates the vulnerability of spiking neural networks (SNNs) and federated learning (FL) to backdoor attacks using neuromorphic data. Despite the efficiency of SNNs and the privacy advantages of FL, particularly in low-powered devices, we demonstrate that these systems are susceptible to such attacks. We first assess the viability of using FL with SNNs using neuromorphic data, showing its potential usage. Then, we evaluate the transferability of known FL attack methods to SNNs, finding that these lead to suboptimal attack performance. Therefore, we explore backdoor attacks involving single and multiple attackers to improve the attack performance. Our primary contribution is developing a novel attack strategy tailored to SNNs and FL, which distributes the backdoor trigger temporally and across malicious devices, enhancing the attack's effectiveness and stealthiness. In the best case, we achieve a 100% attack success rate, 0.13 MSE, and 98.9 SSIM. Moreover, we adapt and evaluate an existing defense against backdoor attacks, revealing its inadequacy in protecting SNNs. This study underscores the need for robust security measures in deploying SNNs and FL, particularly in the context of backdoor attacks.	翻訳日:2024-02-06 16:46:49 公開日:2024-02-05
# 分散型人工知能のビルディングブロックに関するレビュー A Review on Building Blocks of Decentralized Artificial Intelligence ( http://arxiv.org/abs/2402.02885v1 ) ライセンス: Link先を確認	Vid Kersic, Muhamed Turkanovic	(参考訳) 人工知能は私たちの生活を変えつつあり、学術的、理論的領域から現実世界への技術進歩と移転は年々加速している。しかし、その進展と移行の間、デジタルプライバシ、所有権、コントロールなど倫理的に発展するためには、いくつかのオープンな問題と疑問に対処する必要がある。これらは、現在最も普及している人工知能のアプローチ、すなわち集中型ai(ceai)が疑問視されている理由の1つであり、最も到達可能な問題のいくつかを解決するために分散型人工知能(deai)のような他の方向も広く研究されている。本稿では,DAI分野における既存研究の体系的文献レビュー(SLR)を行い,71の特定研究の成果を報告する。この論文の主な焦点は、DeAIソリューションとネットワークの構築ブロックを特定し、ボトムアップアプローチからDEAI分析に取り組むことである。最終的には、研究とオープンな問題の今後の方向性が提案される。 Artificial intelligence is transforming our lives, and technological progress and transfer from the academic and theoretical sphere to the real world are accelerating yearly. But during that progress and transition, several open problems and questions need to be addressed for the field to develop ethically, such as digital privacy, ownership, and control. These are some of the reasons why the currently most popular approaches of artificial intelligence, i.e., centralized AI (CEAI), are questionable, with other directions also being widely explored, such as decentralized artificial intelligence (DEAI), to solve some of the most reaching problems. This paper provides a systematic literature review (SLR) of existing work in the field of DEAI, presenting the findings of 71 identified studies. The paper's primary focus is identifying the building blocks of DEAI solutions and networks, tackling the DEAI analysis from a bottom-up approach. In the end, future directions of research and open problems are proposed.	翻訳日:2024-02-06 16:46:26 公開日:2024-02-05
# 既設シアーム変圧器の近似帰属 Approximate Attributions for Off-the-Shelf Siamese Transformers ( http://arxiv.org/abs/2402.02883v1 ) ライセンス: Link先を確認	Lucas Möller and Dmitry Nikolaev and Sebastian Padó	(参考訳) 文変換器のようなシームエンコーダは、最も理解されていない深層モデルの一つである。確立された帰属メソッドは、1つの入力を処理するのではなく2つの入力を比較するため、このモデルクラスに取り組むことができない。このギャップに対処するため,我々は最近,シアムエンコーダに特化した帰属法を提案した(Möller et al., 2023)。しかし、調整と微調整を必要とするため、市販モデルに直接適用することはできない。この作品ではこれらの制約を再評価し (i)原モデルの予測性能を維持する正確な帰属能力を有するモデル (ii)既成モデルに対する近似帰属を計算する方法。我々は、近似と正確な帰属を広範囲に比較し、モデルの異なる言語的側面への出席を分析するためにそれらを使用する。 siamese transformersがどの構文的役割を担っているか、否定をほとんど無視していること、意味的に反対の形容詞を判断する方法、語彙バイアスを示すこと、といった知見を得る。 Siamese encoders such as sentence transformers are among the least understood deep models. Established attribution methods cannot tackle this model class since it compares two inputs rather than processing a single one. To address this gap, we have recently proposed an attribution method specifically for Siamese encoders (Möller et al., 2023). However, it requires models to be adjusted and fine-tuned and therefore cannot be directly applied to off-the-shelf models. In this work, we reassess these restrictions and propose (i) a model with exact attribution ability that retains the original model's predictive performance and (ii) a way to compute approximate attributions for off-the-shelf models. We extensively compare approximate and exact attributions and use them to analyze the models' attendance to different linguistic aspects. We gain insights into which syntactic roles Siamese transformers attend to, confirm that they mostly ignore negation, explore how they judge semantically opposite adjectives, and find that they exhibit lexical bias.	翻訳日:2024-02-06 16:46:09 公開日:2024-02-05
# パルス型量子ニューラルネットワークの表現力の解き放つ Unleashing the Expressive Power of Pulse-Based Quantum Neural Networks ( http://arxiv.org/abs/2402.02880v1 ) ライセンス: Link先を確認	Han-Xiao Tao and Jiaqi Hu and Re-Bing Wu	(参考訳) ノイズ中間スケール量子(NISQ)デバイスに基づく量子機械学習(QML)は、限られた量子リソースの最適利用を必要とする。一般的に使われるゲートベースのqmlモデルは、ソフトウェアエンジニアにとって便利であるが、その表現性は有限コヒーレンス時間内の許容回路深さによって制限される。これとは対照的に、パルスベースのモデルでは、同じコヒーレンス時間内に「無限に」深い量子ニューラルネットワークを構築することが可能であり、複雑な学習タスクにおいてより表現力を高めることができる。本稿では,量子制御理論の観点から,このポテンシャルについて検討する。まず、パルスベースモデルの非線形性は、ゲートベースモデルにおけるデータ再ロードの連続的な限界と見なせる符号化プロセスに由来することを示唆する。次いで, 基礎物理系がアンサンブル制御可能である場合, パルスベースモデルが任意の非線形関数を近似できることを証明した。この条件下では、数値シミュレーションにより、パルス長や量子ビットの数を増やすことにより、表現性を高めることができる。期待されたように、パルスベースモデルがゲートベースモデルよりも表現力の解放を図った数値例を通して示す。 NISQデバイスを用いた表現型QMLモデルの理解と設計のための理論的基盤を確立する。 Quantum machine learning (QML) based on Noisy Intermediate-Scale Quantum (NISQ) devices requires the optimal utilization of limited quantum resources. The commonly used gate-based QML models are convenient for software engineers, but their expressivity is restricted by the permissible circuit depth within a finite coherence time. In contrast, pulse-based models enable the construction of "infinitely" deep quantum neural networks within the same coherence time, which may unleash greater expressive power for complex learning tasks. In this paper, we investigate this potential from the perspective of quantum control theory. We first indicate that the nonlinearity of pulse-based models comes from the encoding process that can be viewed as the continuous limit of data-reuploading in gate-based models. Subsequently, we prove that the pulse-based model can approximate arbitrary nonlinear functions when the underlying physical system is ensemble controllable. Under this condition, numerical simulations show that the expressivity can be enhanced by either increasing the pulse length or the number of qubits. 
As anticipated, we demonstrate through numerical examples that the pulse-based model can unleash more expressive power compared to the gate-based model. These findings establish a theoretical foundation for understanding and designing expressive QML models using NISQ devices.	翻訳日:2024-02-06 16:45:50 公開日:2024-02-05
# ePrivacy Directive(ePrivacy Directive)の技術的スコープに関する欧州データ保護理事会のガイドライン2/2023へのフィードバック Feedback to the European Data Protection Board's Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive ( http://arxiv.org/abs/2402.02877v1 ) ライセンス: Link先を確認	Cristiana Santos, Nataliia Bielova (PRIVATICS), Vincent Roca (PRIVATICS), Mathieu Cunche (PRIVATICS), Gilles Mertens (PRIVATICS), Karel Kubicek (ETHZ), Hamed Haddadi	(参考訳) edpbのガイドラインをとても歓迎しています。 eprivacy directive(eprivacy directive 5(3))の技術的スコープに関するガイドライン2/2023へのフィードバックをご覧ください。私たちのコメントは、EDPBによる提案されたテキストからの引用の後に提示されます。 We very much welcome the EDPB's Guidelines. Please find hereunder our feedback to the Guidelines 2/2023 on Technical Scope of Art. 5(3) of ePrivacy Directive. Our comments are presented after a quotation from the proposed text by the EDPB in a box.	翻訳日:2024-02-06 16:45:30 公開日:2024-02-05
# 協調UAVによるセル通信のオフロードのためのマルチエージェント強化学習 Multi-Agent Reinforcement Learning for Offloading Cellular Communications with Cooperating UAVs ( http://arxiv.org/abs/2402.02957v1 ) ライセンス: Link先を確認	Abhishek Mondal, Deepak Mishra, Ganesh Prasad, George C. Alexandropoulos, Azzam Alnahari, Riku Jantti	(参考訳) 地上のセルネットワークにおけるインテリジェントなデータ収集の効果的なソリューションは、特にモノのインターネット(Internet of Things)アプリケーションにおいて重要である。地上基地局のスペクトルとカバレッジの制限は、ネットワークユーザのデータレート要求の増大に対応する上で課題となる。高機敏さ、機動性、柔軟性で知られる無人航空機は、地上のBSからデータトラフィックを降ろし、追加アクセスポイントとして機能する代替手段を提供する。本稿では,地上BSからのデータトラフィックオフロードに複数のUAVを効率的に利用するための新しい手法を提案する。具体的には,UAVトラジェクトリとユーザ関連指標をサービス品質の制約下で協調的に最適化することで,UAVとのユーザ関連性を最大化する。定式化UAV制御問題は非凸かつ組合せ的であるため,本研究はマルチエージェント強化学習フレームワークを活用する。この枠組みでは、それぞれのUAVが独立したエージェントとして機能し、UAV間の協調行動を維持することを目的としている。提案手法では, 有限状態マルコフ決定過程を用いて, UAVの速度の制約とそれらの軌道と状態空間の関係を考察する。訓練エピソードよりもUAVの最適逐次意思決定方針を決定するために,低複雑性の分散状態行動報酬状態動作アルゴリズムを提案する。広範なシミュレーション結果は,提案する解析の有効性を検証し,最適UAV軌道に関する貴重な知見を提供する。導出軌道はQ学習や粒子群最適化などのベンチマーク手法と比較して平均UAV関連性能が優れている。 Effective solutions for intelligent data collection in terrestrial cellular networks are crucial, especially in the context of Internet of Things applications. The limited spectrum and coverage area of terrestrial base stations pose challenges in meeting the escalating data rate demands of network users. Unmanned aerial vehicles, known for their high agility, mobility, and flexibility, present an alternative means to offload data traffic from terrestrial BSs, serving as additional access points. This paper introduces a novel approach to efficiently maximize the utilization of multiple UAVs for data traffic offloading from terrestrial BSs. Specifically, the focus is on maximizing user association with UAVs by jointly optimizing UAV trajectories and user association indicators under quality of service constraints. Since the formulated UAV control problem is nonconvex and combinatorial, this study leverages the multi-agent reinforcement learning framework. 
In this framework, each UAV acts as an independent agent, aiming to maintain inter-UAV cooperative behavior. The proposed approach utilizes the finite-state Markov decision process to account for the UAVs' velocity constraints and the relationship between their trajectories and state space. A low-complexity distributed state-action-reward-state-action algorithm is presented to determine the UAVs' optimal sequential decision-making policies over training episodes. The extensive simulation results validate the proposed analysis and offer valuable insights into the optimal UAV trajectories. The derived trajectories demonstrate superior average UAV association performance compared to benchmark techniques such as Q-learning and particle swarm optimization.	翻訳日:2024-02-06 16:36:58 公開日:2024-02-05
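The abstract above names a state-action-reward-state-action (SARSA) algorithm run independently by each UAV agent. As a minimal sketch of the standard tabular SARSA update only, not the authors' distributed algorithm, the following may help. The state encoding (a grid cell), the action names, and the values of `alpha` and `gamma` are all illustrative assumptions.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one on-policy SARSA update to the table q in place.

    q maps (state, action) pairs to estimated returns. alpha (learning
    rate) and gamma (discount factor) are hypothetical defaults.
    """
    # The target uses the action actually chosen in the next state,
    # which is what distinguishes SARSA from Q-learning's max over actions.
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical single step for one agent: states are grid cells and
# actions are movement directions (both assumptions for illustration).
q_table = defaultdict(float)
new_value = sarsa_update(q_table, state=(0, 0), action="east", reward=1.0,
                         next_state=(1, 0), next_action="east")
```

In a multi-agent setting each agent would presumably hold its own table and call this update after every transition; how rewards encode cooperation and quality-of-service constraints is specific to the paper and is not modelled here.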
# LSTMを用いたKubernetesクラスタの自動災害復旧システムの設計と実装 Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM ( http://arxiv.org/abs/2402.02938v1 ) ライセンス: Link先を確認	Ji-Beom Kim, Je-Bum Choi, and Eun-Sung Jung	(参考訳) 現代のビジネス環境におけるデータの重要性が高まる中、効果的なデータマネジメントと保護戦略が研究の注目を集めている。クラウド環境におけるデータ保護は、情報資産の保護と持続可能なサービスの維持に不可欠である。本研究では、Kubernetes管理プラットフォームとバックアップとリストアツールを統合するシステム構造を紹介する。このシステムは災害を即座に検出し、別のKubernetesクラスタからアプリケーションを自動的にリカバリするように設計されている。実験の結果, 人間の介入なしに15秒以内に修復処理を行い, 迅速な回復を可能にした。これにより、手動のリカバリプロセスに比べて遅延やエラーの可能性を著しく低減し、クラウド環境におけるデータ管理とリカバリの効率が向上する。さらに,Long Short-Term Memory (LSTM) を用いてクラスタのCPU利用率を予測する。この予測を通したスケジューリングの必要性は、スケジューリングなしでの実験と比較することでより明確になり、性能劣化を防ぐ能力を示す。本研究は,クラウド環境における自動リカバリシステムの効率性と必要性を強調し,今後の研究に向けた新たな方向性を定めている。 With the increasing importance of data in the modern business environment, effective data management and protection strategies are gaining increasing research attention. Data protection in a cloud environment is crucial for safeguarding information assets and maintaining sustainable services. This study introduces a system structure that integrates Kubernetes management platforms with backup and restoration tools. This system is designed to immediately detect disasters and automatically recover applications from another Kubernetes cluster. The experimental results show that this system executes the restoration process within 15 s without human intervention, enabling rapid recovery. This, in turn, significantly reduces the potential for delays and errors compared with manual recovery processes, thereby enhancing data management and recovery efficiency in cloud environments. Moreover, our research model predicts the CPU utilization of the cluster using Long Short-Term Memory (LSTM). The necessity of scheduling through this prediction is made clearer through comparison with experiments without scheduling, demonstrating its ability to prevent performance degradation. 
This research highlights the efficiency and necessity of automatic recovery systems in cloud environments, setting a new direction for future research.	翻訳日:2024-02-06 16:36:31 公開日:2024-02-05
# ゲート畳み込みとコンテクスト再構成損失を伴うパノラマ画像のインパインティング Panoramic Image Inpainting With Gated Convolution And Contextual Reconstruction Loss ( http://arxiv.org/abs/2402.02936v1 ) ライセンス: Link先を確認	Li Yu, Yanjun Gao, Farhad Pakdaman, Moncef Gabbouj	(参考訳) 深層学習に基づく手法は、パノラマ画像の塗布作業の促進効果を示す。しかし、有効な画素を無効な画素と区別し、破損した領域の適切な参照を見つける既存の手法は困難であり、その結果、被写体化に繋がる。これらの課題に対応して,顔生成器,立方体生成器,サイドブランチ,および2つの識別器からなるパノラマ画像インペインティングフレームワークを提案する。ネットワーク入力にはCubemap Projection(CMP)フォーマットを使用します。このジェネレータは、有効な画素を無効画素と区別するためにゲート畳み込みを用いるが、サイドブランチは、コンテクストリコンストラクション(cr)損失を利用して、ジェネレータが行方不明領域を塗りつぶすのに最も適した参照パッチを見つけるようガイドする。提案手法は,PSNRおよびSSIMの観点から,SUN360ストリートビューデータセット上の最先端(SOTA)手法と比較する。実験結果とアブレーション実験により,提案手法はSOTAを定量的にも定性的にも優れていることが示された。 Deep learning-based methods have demonstrated encouraging results in tackling the task of panoramic image inpainting. However, it is challenging for existing methods to distinguish valid pixels from invalid pixels and find suitable references for corrupted areas, thus leading to artifacts in the inpainted results. In response to these challenges, we propose a panoramic image inpainting framework that consists of a Face Generator, a Cube Generator, a side branch, and two discriminators. We use the Cubemap Projection (CMP) format as network input. The generator employs gated convolutions to distinguish valid pixels from invalid ones, while a side branch is designed utilizing contextual reconstruction (CR) loss to guide the generators to find the most suitable reference patch for inpainting the missing region. The proposed method is compared with state-of-the-art (SOTA) methods on SUN360 Street View dataset in terms of PSNR and SSIM. Experimental results and ablation study demonstrate that the proposed method outperforms SOTA both quantitatively and qualitatively.	翻訳日:2024-02-06 16:36:11 公開日:2024-02-05
# InterpretCC: 独立解釈型ニューラルネットワークの条件計算 InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks ( http://arxiv.org/abs/2402.02933v1 ) ライセンス: Link先を確認	Vinitra Swamy, Julian Blackwell, Jibril Frej, Martin Jaggi, Tanja Käser	(参考訳) ニューラルネットワークの現実世界の解釈性は、3つの懸念のトレードオフである。 1)説明近似(ポストホックアプローチなど)を人間に信頼させる必要がある。 2)説明の理解性を損なう(例えば、自動識別された特徴マスク)。 3) モデルパフォーマンス(例えば決定木)を損なう。これらの欠点は、信頼できる説明、行動可能な解釈、正確な予測を必要とする、教育、医療、自然言語のような人間向けドメインでは受け入れられない。本稿では,人間中心の解釈性を保証しつつ,予測前の特徴を適応的かつスパースに活性化することにより,最先端モデルに匹敵する性能を維持しつつ,人間中心の解釈可能性を保証する,解釈可能なニューラルネットワークの一群である interpretcc (interpretable conditional computation) を提案する。私たちはこのアイデアを、人間が関心のあるトピックを特定するための解釈可能なmixed-of-expertsモデルに拡張し、各データポイントの特徴空間を個別にトピックサブネットワークに分離し、これらのトピックサブネットワークを適応的かつスパースにアクティベートします。本稿では,6つのオンライン教育コース,ニュース分類,乳がん診断,レビュー感情という,テキストおよび表型データに対するInterpretCCアーキテクチャのバリエーションを実世界のベンチマークで示す。 Real-world interpretability for neural networks is a tradeoff between three concerns: 1) it requires humans to trust the explanation approximation (e.g. post-hoc approaches), 2) it compromises the understandability of the explanation (e.g. automatically identified feature masks), and 3) it compromises the model performance (e.g. decision trees). These shortcomings are unacceptable for human-facing domains, like education, healthcare, or natural language, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable mixture-of-experts model, that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks. 
We demonstrate variations of the InterpretCC architecture for text and tabular data across several real-world benchmarks: six online education courses, news classification, breast cancer diagnosis, and review sentiment.	翻訳日:2024-02-06 16:35:50 公開日:2024-02-05
# 印刷mlpの離散的遺伝的学習における組込みハードウェア近似 Embedding Hardware Approximations in Discrete Genetic-based Training for Printed MLPs ( http://arxiv.org/abs/2402.02930v1 ) ライセンス: Link先を確認	Florentia Afentaki, Michael Hefenbrock, Georgios Zervakis, Mehdi B. Tahoori	(参考訳) Printed Electronics (PE) は、低コストやフレキシブルな製造など、その特性が異なるため、幅広いコンピューティングのための有望な技術として注目されている。従来のシリコンベースの技術とは異なり、PEは伸縮性、適合性、および非有毒なハードウェアを可能にする。しかし、PEは機能サイズが大きいため、機械学習(ML)分類器のような複雑な回路の実装は困難である。近似コンピューティングは、MLP(Multilayer Perceptrons)のようなML回路のハードウェアコストを削減することが証明されている。本稿では,ハードウェア近似をMLPトレーニングプロセスに統合することにより,近似計算の利点を最大化する。ハードウェア近似の離散的な性質から,印刷用MLPに特化して設計された,遺伝的に近似したハードウェア認識トレーニング手法を提案し,実装する。 5%の精度損失が得られた場合,MLPはベースラインに比べて5倍以上の面積と電力削減を実現し,その性能は近似的および確率的印刷MLPよりも優れていた。 Printed Electronics (PE) stands out as a promising technology for widespread computing due to its distinct attributes, such as low costs and flexible manufacturing. Unlike traditional silicon-based technologies, PE enables stretchable, conformal, and non-toxic hardware. However, PE is constrained by larger feature sizes, making it challenging to implement complex circuits such as machine learning (ML) classifiers. Approximate computing has been proven to reduce the hardware cost of ML circuits such as Multilayer Perceptrons (MLPs). In this paper, we maximize the benefits of approximate computing by integrating hardware approximation into the MLP training process. Due to the discrete nature of hardware approximation, we propose and implement a genetic-based, approximate, hardware-aware training approach specifically designed for printed MLPs. For a 5% accuracy loss, our MLPs achieve over 5x area and power reduction compared to the baseline while outperforming state-of-the-art approximate and stochastic printed MLPs.	翻訳日:2024-02-06 16:35:28 公開日:2024-02-05
# 歴史的航空機のケースセグメンテーションXXL-CTチャレンジ Instance Segmentation XXL-CT Challenge of a Historic Airplane ( http://arxiv.org/abs/2402.02928v1 ) ライセンス: Link先を確認	Roland Gruber and Johann Christopher Engster and Markus Michen and Nele Blum and Maik Stille and Stefan Gerth and Thomas Wittenberg	(参考訳) XXL-CT画像における複合物体のサンプルセグメンテーションは、非破壊検査においてユニークな課題である。この複雑さは、既知の参照セグメンテーションラベルの欠如、限定可能なセグメンテーションツール、および部分的に劣化した画像品質から生じる。機械学習に基づく画像セグメンテーションの最近の進歩を評価するため,歴史航空機のインスタンスセグメンテーションXXL-CTチャレンジを実施した。この課題は、スクリュー、リベット、金属シート、圧力管など、様々な航空機の部品を効率的にディライン化するための、自動または対話的なインスタンスセグメンテーション方法を探ることであった。本稿では,この課題の組織と成果を報告し,提案したセグメンテーション手法の能力と限界について述べる。 Instance segmentation of compound objects in XXL-CT imagery poses a unique challenge in non-destructive testing. This complexity arises from the lack of known reference segmentation labels, limited applicable segmentation tools, as well as partially degraded image quality. To assess recent advancements in the field of machine learning-based image segmentation, the "Instance Segmentation XXL-CT Challenge of a Historic Airplane" was conducted. The challenge aimed to explore automatic or interactive instance segmentation methods for an efficient delineation of the different aircraft components, such as screws, rivets, metal sheets or pressure tubes. We report the organization and outcome of this challenge and describe the capabilities and limitations of the submitted segmentation methods.	翻訳日:2024-02-06 16:35:11 公開日:2024-02-05
# コグネート変換器を用いたリンク予測タスクとしての自動コグネート検出 Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer ( http://arxiv.org/abs/2402.02926v1 ) ライセンス: Link先を確認	V.S.D.S.Mahesh Akavarapu and Arnab Bhattacharya	(参考訳) 関連言語間における認識の同定は、歴史的言語学における主要な問題の一つである。自動コグネート同定は、音の対応の特定、原言語再構築、系統分類など、下流のいくつかのタスクに役立ちます。以前のコグネート識別法は、主に多言語単語リストで計算された音素の分布に基づいており、コグネートクラスタ間のリンクを定義するコグネートラベルをほとんど使用していない。本稿では,コグネート自動検出のための計算生物学にインスパイアされたトランスフォーマーアーキテクチャを提案する。一定の監督範囲を超えて、既存の方法よりも優れた性能を示し、さらなる監督強化とともに着実に改善され、ラベル付き情報の利用の有効性が証明される。また,複数のシーケンスアライメントを入力として受け入れ,リンク予測ヘッドを備えたエンドツーエンドアーキテクチャを持つことにより,優れた性能を実現すると同時に,計算時間を節約できることを実証した。 Identification of cognates across related languages is one of the primary problems in historical linguistics. Automated cognate identification is helpful for several downstream tasks including identifying sound correspondences, proto-language reconstruction, phylogenetic classification, etc. Previous state-of-the-art methods for cognate identification are mostly based on distributions of phonemes computed across multilingual wordlists and make little use of the cognacy labels that define links among cognate clusters. In this paper, we present a transformer-based architecture inspired by computational biology for the task of automated cognate detection. Beyond a certain amount of supervision, this method performs better than the existing methods, and shows steady improvement with further increase in supervision, thereby proving the efficacy of utilizing the labeled information. We also demonstrate that accepting multiple sequence alignments as input and having an end-to-end architecture with link prediction head saves much computation time while simultaneously yielding superior performance.	翻訳日:2024-02-06 16:34:56 公開日:2024-02-05
# 産業試験結果データセットにおける動的テストケース優先順位付け Dynamic Test Case Prioritization in Industrial Test Result Datasets ( http://arxiv.org/abs/2402.02925v1 ) ライセンス: Link先を確認	Alina Torbunova, Per Erik Strandberg, Ivan Porres	(参考訳) ソフトウェア開発における回帰テストは、新しいソフトウェア機能が既存の機能に影響するかどうかをチェックする。回帰テストは継続的開発と統合において重要なタスクであり、ソフトウェアは小さなインクリメントで構築され、新しい機能はできるだけ早く統合される。したがって、開発者が迅速に障害を通知することが重要です。本稿では,静的優先度付けアルゴリズムと動的優先度付けアルゴリズムを組み合わせたテストケース優先順位付けスキーマを提案する。動的優先順位付けアルゴリズムは、テストが実行されている間、フライでテストの実行順序を再構成する。そこで本稿では条件付き確率動的アルゴリズムを提案する。 3つの産業データセットでソリューションを評価し,それに対する平均故障検出率を活用する。主な発見は、我々の動的優先順位付けアルゴリズムが可能であることである。 a) 各テストケースに優先スコアを割り当てる任意の静的アルゴリズムを適用する b) テストケース間に障害相関がある場合、静的アルゴリズムの性能を向上させることができる c) 静的アルゴリズムの性能を低下させることもできるが、静的スケジューリングが最適に近いレベルで実行される場合のみである。 Regression testing in software development checks if new software features affect existing ones. Regression testing is a key task in continuous development and integration, where software is built in small increments and new features are integrated as soon as possible. It is therefore important that developers are notified about possible faults quickly. In this article, we propose a test case prioritization schema that combines the use of a static and a dynamic prioritization algorithm. The dynamic prioritization algorithm rearranges the order of execution of tests on the fly, while the tests are being executed. We propose to use a conditional probability dynamic algorithm for this. We evaluate our solution on three industrial datasets and utilize Average Percentage of Fault Detection for that. The main findings are that our dynamic prioritization algorithm can: a) be applied with any static algorithm that assigns a priority score to each test case b) can improve the performance of the static algorithm if there are failure correlations between test cases c) can also reduce the performance of the static algorithm, but only when the static scheduling is performed at a near optimal level.	翻訳日:2024-02-06 16:34:40 公開日:2024-02-05
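The entry above describes re-ordering tests on the fly using conditional failure probabilities on top of any static priority scores, evaluated with Average Percentage of Fault Detection (APFD). The sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. The rule of raising a remaining test's score to `P(other fails | this failed)` is a hypothetical choice, the probabilities are made up, and each failing test is treated as exactly one fault when computing APFD.

```python
def run_dynamic(static_scores, cond_fail_prob, run_test):
    """Execute all tests, re-ranking the remaining ones after each failure.

    static_scores:  {test: priority score from any static algorithm}
    cond_fail_prob: {(failed, other): estimated P(other fails | failed fails)}
    run_test:       callable returning True when the given test passes
    """
    scores = dict(static_scores)
    order = []
    while scores:
        # Highest score first; ties broken alphabetically for determinism.
        test = min(scores, key=lambda t: (-scores[t], t))
        del scores[test]
        order.append(test)
        if not run_test(test):
            # Assumed update rule: promote tests that tend to fail together.
            for other in scores:
                boost = cond_fail_prob.get((test, other), 0.0)
                scores[other] = max(scores[other], boost)
    return order


def apfd(order, failing):
    """APFD = 1 - sum(TF_i) / (n * m) + 1 / (2n), one fault per failing test."""
    n, m = len(order), len(failing)
    first_positions = [order.index(t) + 1 for t in failing]
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)


# Hypothetical data: t1 and t4 fail, and they are known to fail together.
static_scores = {"t1": 0.9, "t2": 0.5, "t3": 0.4, "t4": 0.1}
failing = {"t1", "t4"}
cond = {("t1", "t4"): 0.8}

static_order = sorted(static_scores, key=lambda t: (-static_scores[t], t))
dynamic_order = run_dynamic(static_scores, cond, lambda t: t not in failing)
```

With these made-up inputs the dynamic run moves `t4` forward as soon as `t1` fails, which raises APFD relative to the purely static order. When failures are uncorrelated, the same rule gives no benefit, in line with finding (b) of the abstract.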
# 古典的無線マイクロ波星座を用いた量子光学状態の符号化 Encoding quantum optical states with classical wireless microwave constellation ( http://arxiv.org/abs/2402.02923v1 ) ライセンス: Link先を確認	Niloy Ghosh, Sarang Pendharker	(参考訳) 本稿では、従来のマイクロ波領域で符号化されたデジタル情報を量子光学領域にシームレスに変換する基礎となる物理を考察する。我々は、シームレスな無線-光変換器における変換を媒介する量子力学的相互作用を包括的にモデル化する。コンバータの物理幅と変調素子間間隔を適切に選択することにより,量子力学的相互作用を向上できることを強調した。本研究は、古典的なマイクロ波コンステレーションによる量子光学位相空間の符号化についても強調する。さらに、量子ショットノイズによる符号化量子光学位相空間におけるシンボル間重なりの課題に対処する。報告された知見は、将来、古典的マイクロ波および量子光学通信リンクをブリッジするための基盤となる枠組みを提供する。 This paper explores the underlying physics behind seamless transduction of digital information encoded in the classical microwave domain to the quantum optical domain. We comprehensively model the quantum mechanical interaction mediating the transduction in a seamless wireless-to-optical converter. We highlight that the quantum mechanical interaction can be enhanced by suitably choosing the physical width and the inter-modulating element spacing in the converter. This study also highlights the encoding of quantum optical phase-space with classical microwave constellation. Furthermore, the challenge of inter-symbol overlap in the encoded quantum optical phase-space due to quantum shot noise is addressed. The reported findings provide a foundational framework for bridging classical microwave and quantum optical communication links in the future.	翻訳日:2024-02-06 16:34:26 公開日:2024-02-05
# マルチイルミナントシーンにおけるスムースネス技術による画素単位のカラーコンポータンス Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes ( http://arxiv.org/abs/2402.02922v1 ) ライセンス: Link先を確認	Umut Cem Entok, Firas Laakom, Farhad Pakdaman, Moncef Gabbouj	(参考訳) ほとんどのシーンは、伝統的な一様照明の仮定が無効であるいくつかの光源によって照らされる。この問題は、主に複数の光源が画像に複雑な空間的影響をもたらすため、ほとんどのカラーコンステンシー法では無視される。さらに、既存の多くの多重照度法は、自然画像の空間依存性から生じる照明の滑らかな変化を保存できない。そこで本研究では,複数光源による画素毎の照明マップを学習し,多色コンテンタンス法を提案する。提案手法は,全変動損失を伴うトレーニングを正則化することにより,隣接画素内の滑らかさを強制する。さらに、エッジを保ちながら、推定画像の自然な外観を高めるために、二元フィルタを更に設定する。さらに,基礎的真理の不確実性にも拘わらず,モデルがうまく一般化できるラベルスモーニング手法を提案する。定量的および定性的な実験により,提案手法が最先端技術より優れていることを示す。 Most scenes are illuminated by several light sources, where the traditional assumption of uniform illumination is invalid. This issue is ignored in most color constancy methods, primarily due to the complex spatial impact of multiple light sources on the image. Moreover, most existing multi-illuminant methods fail to preserve the smooth change of illumination, which stems from spatial dependencies in natural images. Motivated by this, we propose a novel multi-illuminant color constancy method, by learning pixel-wise illumination maps caused by multiple light sources. The proposed method enforces smoothness within neighboring pixels, by regularizing the training with the total variation loss. Moreover, a bilateral filter is provisioned further to enhance the natural appearance of the estimated images, while preserving the edges. Additionally, we propose a label-smoothing technique that enables the model to generalize well despite the uncertainties in ground truth. Quantitative and qualitative experiments demonstrate that the proposed method outperforms the state-of-the-art.	翻訳日:2024-02-06 16:34:16 公開日:2024-02-05
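The entry above regularizes pixel-wise illumination maps with a total variation loss so that neighbouring pixels vary smoothly. As a rough illustration of what such a penalty measures, here is a minimal sketch of the anisotropic total variation of a single-channel map. The paper's exact formulation (isotropic or anisotropic, per-channel weighting, normalization) is not given in the abstract, so this particular form is an assumption.

```python
def total_variation(img):
    """Anisotropic total variation of a 2-D map given as a list of rows.

    Sums absolute differences between vertically and horizontally
    adjacent pixels; smoother maps give smaller values.
    """
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
    return tv


# Hypothetical illumination maps: one uniform, one with abrupt changes.
smooth_map = [[0.5, 0.5, 0.5],
              [0.5, 0.5, 0.5]]
abrupt_map = [[0.5, 0.9, 0.5],
              [0.1, 0.5, 0.9]]

tv_smooth = total_variation(smooth_map)
tv_abrupt = total_variation(abrupt_map)
```

Adding a term like this to a training loss, scaled by a weight, penalizes the abrupt map more than the uniform one. An edge-preserving step such as the bilateral filter mentioned in the abstract is a separate stage and is not shown.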
# インクリメンタル評価を用いた行動パターンの最小セットのマイニング Mining a Minimal Set of Behavioral Patterns using Incremental Evaluation ( http://arxiv.org/abs/2402.02921v1 ) ライセンス: Link先を確認	Mehdi Acheli, Daniela Grigori, Matthias Weidlich	(参考訳) プロセスマイニングは、プロセス実行中に情報システムによって生成されたイベントログを分析する方法を提供する。これにより、医療、製造、電子商取引といった分野におけるプロセスの設計、検証、実行をサポートする。変動が大きいフレキシブルなプロセスの規則性を探るため,その基盤となるプロセスを共同で記述する反復的な行動パターンを考察した。しかし、既存の行動パターンマイニングのアプローチには2つの制限がある。まず、インクリメンタルな計算はパターン候補の生成にのみ取り入れられるが、品質評価には組み込まれないため、スケーラビリティが制限される。第二に、マイニングされたパターンに基づくプロセス分析は、実際的なアプリケーションシナリオで得られるパターンの数が圧倒的に多いため、効果が限られており、その多くが冗長である。本稿では,これらの制約に対処し,行動パターンに基づく複雑で柔軟なプロセスの解析を容易にする。具体的には、パターン候補の品質を評価し、その効率を最適化するインクリメンタルな手順により、最初の行動パターンマイニングアルゴリズムであるCOBPAMを改善する。結果のパターンをより効果的に活用することを目的として、さらに冗長パターンに対するプルーニング戦略を提案し、残りのパターン間の関係を抽出して視覚化してプロセスの洞察を提供する方法を示す。実世界の多様なデータセットを用いた実験は、パターンマイニングに必要なランタイムの大幅な削減を示し、定性的な評価は、パターン間の関係が基盤となるプロセスの分析をどのように導くかを強調している。 Process mining provides methods to analyse event logs generated by information systems during the execution of processes. It thereby supports the design, validation, and execution of processes in domains ranging from healthcare, through manufacturing, to e-commerce. To explore the regularities of flexible processes that show a large behavioral variability, it was suggested to mine recurrent behavioral patterns that jointly describe the underlying process. Existing approaches to behavioral pattern mining, however, suffer from two limitations. First, they show limited scalability as incremental computation is incorporated only in the generation of pattern candidates, but not in the evaluation of their quality. Second, process analysis based on mined patterns shows limited effectiveness due to an overwhelmingly large number of patterns obtained in practical application scenarios, many of which are redundant. In this paper, we address these limitations to facilitate the analysis of complex, flexible processes based on behavioral patterns. 
Specifically, we improve COBPAM, our initial behavioral pattern mining algorithm, by an incremental procedure to evaluate the quality of pattern candidates, optimizing thereby its efficiency. Targeting a more effective use of the resulting patterns, we further propose pruning strategies for redundant patterns and show how relations between the remaining patterns are extracted and visualized to provide process insights. Our experiments with diverse real-world datasets indicate a considerable reduction of the runtime needed for pattern mining, while a qualitative assessment highlights how relations between patterns guide the analysis of the underlying process.	翻訳日:2024-02-06 16:34:01 公開日:2024-02-05
# 散逸系における絡み合った多重、非対称性、量子mpemba効果 Entangled multiplets, asymmetry, and quantum Mpemba effect in dissipative systems ( http://arxiv.org/abs/2402.02918v1 ) ライセンス: Link先を確認	Fabio Caceffo, Sara Murciano, Vincenzo Alba	(参考訳) 近年、エンタングルメント非対称性は、量子クエンチ後の平衡外量子多体系の動的対称性回復を理解するための情報ツールとして登場した。可積分系に対して、非対称性は、Refで指摘された準粒子図形を通して時空のスケーリング限界で理解することができる。 [1]. しかし、一般的な初期状態からの量子クエンチの準粒子像はいまだに欠けていた。ここでは,非対称性を構成する主成分である還元密度行列の荷電モーメントに対する正準粒子像を推定する。我々の公式は、任意の数の励起の絡み合った多重項を生成するクエンチに対して機能する。結果のベンチマークを$XX$のスピンチェーンで行います。まず、多次元定常位相近似に基づく初等的アプローチを用いて、[2] で処理されたクエンチに対する荷電モーメントのダイナミクスを厳密に導出する$\textit{ab initio}$ を提供する。次に, 準粒子画像中では, 同じ結果が容易に得られることを示す。解析の副産物として、長い時間で消滅する絡み合う非対称性を保証する一般的な基準を得る。次に、リンドブラッドマスター方程式を用いて、エンタングルメント非対称性に対する利得と損失散逸の影響を研究する。具体的には、放散の存在下でのいわゆる量子Mpemba効果(QME)の運命について検討する。また,QMEの条件を準粒子ベースで解釈することで,単位動力学が示さない場合でも,散逸はQMEを誘導できることを示す。 Recently, the entanglement asymmetry emerged as an informative tool to understand dynamical symmetry restoration in out-of-equilibrium quantum many-body systems after a quantum quench. For integrable systems the asymmetry can be understood in the space-time scaling limit via the quasiparticle picture, as it was pointed out in Ref. [1]. However, a quasiparticle picture for quantum quenches from generic initial states was still lacking. Here we conjecture a full-fledged quasiparticle picture for the charged moments of the reduced density matrix, which are the main ingredients to construct the asymmetry. Our formula works for quenches producing entangled multiplets of an arbitrary number of excitations. We benchmark our results in the $XX$ spin chain. First, by using an elementary approach based on the multidimensional stationary phase approximation we provide an $\textit{ab initio}$ rigorous derivation of the dynamics of the charged moments for the quench treated in [2]. Then, we show that the same results can be straightforwardly obtained within our quasiparticle picture. 
As a byproduct of our analysis, we obtain a general criterion ensuring a vanishing entanglement asymmetry at long times. Next, by using the Lindblad master equation, we study the effect of gain and loss dissipation on the entanglement asymmetry. Specifically, we investigate the fate of the so-called quantum Mpemba effect (QME) in the presence of dissipation. We show that dissipation can induce QME even if the unitary dynamics does not show it, and we provide a quasiparticle-based interpretation of the condition for the QME.	翻訳日:2024-02-06 16:33:36 公開日:2024-02-05
# 近縁言語間の相互理解性評価のための計算モデル A Computational Model for the Assessment of Mutual Intelligibility Among Closely Related Languages ( http://arxiv.org/abs/2402.02915v1 ) ライセンス: Link先を確認	Jessica Nieder and Johann-Mattis List	(参考訳) 密接に関連する言語は、ある言語の話者が積極的に学習することなく他の言語の話者を理解することができる言語類似性を示す。相互の知性は程度によって異なり、典型的には精神言語実験で試験される。相互理解性を計算的に研究するために,多言語意味ベクトルと多言語音クラスで拡張する言語を人間が学習する認知過程を近似する計算モデルである線形判別学習器を用いたコンピュータ支援手法を提案する。我々は、ドイツ語、オランダ語、英語の3つの近縁なゲルマン語のコグネートデータに基づいてモデルを検証した。私たちのモデルの理解の正確さは、 1)反射の自動トリミング 2) 理解がテストされる言語ペア。多言語モデルアプローチは,言語間の相互理解度の自動評価のための新しい方法論的知見を提供するだけでなく,線形判別学習を多言語環境に拡張する。 Closely related languages show linguistic similarities that allow speakers of one language to understand speakers of another language without having actively learned it. Mutual intelligibility varies in degree and is typically tested in psycholinguistic experiments. To study mutual intelligibility computationally, we propose a computer-assisted method using the Linear Discriminative Learner, a computational model developed to approximate the cognitive processes by which humans learn languages, which we expand with multilingual semantic vectors and multilingual sound classes. We test the model on cognate data from German, Dutch, and English, three closely related Germanic languages. We find that our model's comprehension accuracy depends on 1) the automatic trimming of inflections and 2) the language pair for which comprehension is tested. Our multilingual modelling approach does not only offer new methodological findings for automatic testing of mutual intelligibility across languages but also extends the use of Linear Discriminative Learning to multilingual settings.	翻訳日:2024-02-06 16:33:14 公開日:2024-02-05
# DS-MS-TCN: デュアルスケール多段階時間畳み込みネットワークによるオタゴ運動認識 DS-MS-TCN: Otago Exercises Recognition with a Dual-Scale Multi-Stage Temporal Convolutional Network ( http://arxiv.org/abs/2402.02910v1 ) ライセンス: Link先を確認	Meng Shang, Lenore Dedeyne, Jolan Dupont, Laura Vercauteren, Nadjia Amini, Laurence Lapauw, Evelien Gielen, Sabine Verschueren, Carolina Varon, Walter De Raedt, Bart Vanrumste	(参考訳) オタゴ・エクササイズ・プログラム(OEP)は、バランスと強度を高めることを目的とした高齢者向けの重要なリハビリテーションイニシアチブである。 OEP認識にウェアラブルセンサーを用いた以前の研究にもかかわらず、既存の研究は精度と堅牢性に関して限界を示してきた。本研究は,地域在住高齢者の日常生活におけるOEP運動を認識するために,腰に装着した慣性測定装置(IMU)を用いて,これらの制約に対処する。 36人の高齢者のコホートが実験に参加し、さらに7人の高齢者が自宅でのアセスメントに参加した。本研究は,2段階のシーケンス・ツー・シーケンス分類のために設計したDual-Scale Multi-Stage Temporal Convolutional Network (DS-MS-TCN)を提案する。第1段階では、モデルは各エクササイズ(マイクロラベル)の反復を認識することに集中する。その後の段階は認識を拡張し、完全な範囲の運動(マクロラベル)を包含する。 DS-MS-TCNモデルは、既存の最先端ディープラーニングモデルを超え、f1スコアが80%以上、IoU(Intersection over Union) f1スコアが60%以上である。特に、このモデルはスライディングウインドウ技術を用いた先行研究より優れており、後処理段階やウィンドウサイズ調整の必要性がなくなる。本研究は,人間活動認識(har)システムを強化するための新たな視点を,各活動の反復認識を通じて提示する。 The Otago Exercise Program (OEP) represents a crucial rehabilitation initiative tailored for older adults, aimed at enhancing balance and strength. Despite previous efforts utilizing wearable sensors for OEP recognition, existing studies have exhibited limitations in terms of accuracy and robustness. This study addresses these limitations by employing a single waist-mounted Inertial Measurement Unit (IMU) to recognize OEP exercises among community-dwelling older adults in their daily lives. A cohort of 36 older adults participated in laboratory settings, supplemented by an additional 7 older adults recruited for at-home assessments. The study proposes a Dual-Scale Multi-Stage Temporal Convolutional Network (DS-MS-TCN) designed for two-level sequence-to-sequence classification, incorporating them in one loss function. In the first stage, the model focuses on recognizing each repetition of the exercises (micro labels). 
Subsequent stages extend the recognition to encompass the complete range of exercises (macro labels). The DS-MS-TCN model surpasses existing state-of-the-art deep learning models, achieving f1-scores exceeding 80% and Intersection over Union (IoU) f1-scores surpassing 60% for all four exercises evaluated. Notably, the model outperforms the prior study utilizing the sliding window technique, eliminating the need for post-processing stages and window size tuning. To our knowledge, we are the first to present a novel perspective on enhancing Human Activity Recognition (HAR) systems through the recognition of each repetition of activities.	翻訳日:2024-02-06 16:32:57 公開日:2024-02-05
# 自動走行における歩行者検出のための安全適応損失 A Safety-Adapted Loss for Pedestrian Detection in Automated Driving ( http://arxiv.org/abs/2402.02986v1 ) ライセンス: Link先を確認	Maria Lyssenko, Piyush Pimplikar, Maarten Bieshaar, Farzad Nozarian, Rudolph Triebel	(参考訳) 自動走行(AD)のような安全クリティカルな領域では、オブジェクト検出器によるエラーは歩行者や他の脆弱な道路利用者(VRU)を危険にさらす可能性がある。一般的な評価指標は適切な安全性指標ではないため、最近の研究では、安全クリティカルなVRUを特定し、オブジェクト検出器に対するリスクをバックアノテートするアプローチが採用されている。しかし、これらのアプローチはディープニューラルネットワーク(dnn)トレーニングプロセスにおける安全性因子を考慮しない。したがって、最先端のDNNは、すべての誤検知を、その臨界性に関係なく等しく罰する。その後、事故発生の軽減、すなわち偽陰性化を図り、重要な歩行者の検知性能を高めるために、安全に配慮した訓練戦略が必要である。本稿では,トレーニング中の歩行者あたりの臨界点の推定値を活用する,新たな安全意識の損失変動を提案する。我々は,移動領域からの到達性設定に基づく時間対衝突(TTC-RSB)測定値と距離情報を利用して,臨界度を定量化する最悪の脅威を考慮した。 nuScenesデータセット上でのRetinaNetとFCOSを用いた評価結果から,安全を意識した損失関数を用いたモデルのトレーニングにより,安全クリティカルゾーン外における歩行者の歩行性能を損なうことなく,重要な歩行者の誤検出を軽減できることが示された。 In safety-critical domains like automated driving (AD), errors by the object detector may endanger pedestrians and other vulnerable road users (VRU). As common evaluation metrics are not an adequate safety indicator, recent works employ approaches to identify safety-critical VRU and back-annotate the risk to the object detector. However, those approaches do not consider the safety factor in the deep neural network (DNN) training process. Thus, state-of-the-art DNN penalizes all misdetections equally irrespective of their criticality. Subsequently, to mitigate the occurrence of critical failure cases, i.e., false negatives, a safety-aware training strategy might be required to enhance the detection performance for critical pedestrians. In this paper, we propose a novel safety-aware loss variation that leverages the estimated per-pedestrian criticality scores during training. We exploit the reachability set-based time-to-collision (TTC-RSB) metric from the motion domain along with distance information to account for the worst-case threat quantifying the criticality. 
Our evaluation results using RetinaNet and FCOS on the nuScenes dataset demonstrate that training the models with our safety-aware loss function mitigates the misdetection of critical pedestrians without sacrificing performance for the general case, i.e., pedestrians outside the safety-critical zone.	翻訳日:2024-02-06 16:25:12 公開日:2024-02-05
# 道路シーン理解のためのマルチモーダルマルチタスク基礎モデルの構築:パラダイムの学習から Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives ( http://arxiv.org/abs/2402.02968v1 ) ライセンス: Link先を確認	Sheng Luo, Wei Chen, Wanxin Tian, Rui Liu, Luanxuan Hou, Xiubao Zhang, Haifeng Shen, Ruiqi Wu, Shuyi Geng, Yi Zhou, Ling Shao, Yi Yang, Bojun Gao, Qun Li and Guobin Wu	(参考訳) ファンデーションモデルは様々な分野に大きな影響を与えており、インテリジェントシステムの能力を著しく形作る重要なコンポーネントとして現れている。インテリジェントな車両の文脈では、基礎モデルの力を活用することは、視覚理解の顕著な進歩をもたらす変換的であることが証明されている。マルチモーダルおよびマルチタスク学習機能を備えたマルチモーダルマルチタスク視覚理解基礎モデル(mm-vufms)は、多様なモダリティからデータを効果的に処理し、融合し、強力な適応性を持つ様々な運転関連タスクを同時に処理し、周囲のシーンをより総合的に理解する。本研究では道路シーン用に特別に設計されたmm-vufmの系統的解析を行う。我々の目標は、タスク固有のモデル、統合マルチモーダルモデル、統一マルチタスクモデル、基礎モデル推進技術など、共通プラクティスの包括的な概要を提供するだけでなく、多様な学習パラダイムにおける彼らの高度な能力を強調することにある。これらのパラダイムには、オープンワールド理解、ロードシーンの効率的な転送、継続的な学習、インタラクティブで生成能力が含まれる。さらに、クローズドループ駆動システム、解釈可能性、エンボディドドライブエージェント、世界モデルなど、重要な課題や今後のトレンドに関する洞察を提供する。道路現場におけるMM-VUFMの最近の発展を反映させるため,我々は, https://github.com/rolsheng/MM-VUFM4DSに継続的に更新されたレポジトリを構築した。 Foundation models have indeed made a profound impact on various fields, emerging as pivotal components that significantly shape the capabilities of intelligent systems. In the context of intelligent vehicles, leveraging the power of foundation models has proven to be transformative, offering notable advancements in visual understanding. Equipped with multi-modal and multi-task learning capabilities, multi-modal multi-task visual understanding foundation models (MM-VUFMs) effectively process and fuse data from diverse modalities and simultaneously handle various driving-related tasks with powerful adaptability, contributing to a more holistic understanding of the surrounding scene. In this survey, we present a systematic analysis of MM-VUFMs specifically designed for road scenes. 
Our objective is not only to provide a comprehensive overview of common practices, referring to task-specific models, unified multi-modal models, unified multi-task models, and foundation model prompting techniques, but also to highlight their advanced capabilities in diverse learning paradigms. These paradigms include open-world understanding, efficient transfer for road scenes, continual learning, interactive and generative capability. Moreover, we provide insights into key challenges and future trends, such as closed-loop driving systems, interpretability, embodied driving agents, and world models. To facilitate researchers in staying abreast of the latest developments in MM-VUFMs for road scenes, we have established a continuously updated repository at https://github.com/rolsheng/MM-VUFM4DS	翻訳日:2024-02-06 16:24:48 公開日:2024-02-05
# 条件付きディープGEMによる混合騒音と後部推定 Mixed Noise and Posterior Estimation with Conditional DeepGEM ( http://arxiv.org/abs/2402.02964v1 ) ライセンス: Link先を確認	Paul Hagemann, Johannes Hertrich, Maren Casfor, Sebastian Heidenreich, Gabriele Steidl	(参考訳) 混合ノイズモデルを用いた間接計測とナノメトロジーからの応用に動機づけられ,ベイズ逆問題における後方および雑音パラメータを共同で推定する新しいアルゴリズムを開発した。本稿では,期待最大化(em)アルゴリズムを用いてこの問題を解決することを提案する。現在の雑音パラメータに基づいて、後部を近似した条件正規化フローをEステップで学習する。 mステップでは、分析式を持つemアルゴリズムによって再びノイズパラメータの更新を見つけることを提案する。我々は,条件付き正規化流のトレーニングを前後KLと比較し,従来の手法とは異なり,我々のモデルが多くの測定値から情報を組み込むことができることを示す。 Motivated by indirect measurements and applications from nanometrology with a mixed noise model, we develop a novel algorithm for jointly estimating the posterior and the noise parameters in Bayesian inverse problems. We propose to solve the problem by an expectation maximization (EM) algorithm. Based on the current noise parameters, we learn in the E-step a conditional normalizing flow that approximates the posterior. In the M-step, we propose to find the noise parameter updates again by an EM algorithm, which has analytical formulas. We compare the training of the conditional normalizing flow with the forward and reverse KL, and show that our model is able to incorporate information from many measurements, unlike previous approaches.	翻訳日:2024-02-06 16:24:18 公開日:2024-02-05
# 建物外皮検査のためのカラー・トゥ・サーマルAIによる1クラス異常検出 One-class anomaly detection through color-to-thermal AI for building envelope inspection ( http://arxiv.org/abs/2402.02963v1 ) ライセンス: Link先を確認	Polina Kurtser, Kailun Feng, Thomas Olofsson, Aitor De Andres	(参考訳) 建物外皮のサーモグラフィ検査における異常検出のためのラベルフリー手法を提案する。カラー画像からの熱分布のAIによる予測に基づいている。本手法は、予測された熱分布と実際の熱分布とのミスマッチの高い熱画像領域の1クラス分類器として機能する。このアルゴリズムは、トレーニングに使用するターゲットサンプルを選択することで、特定の特徴を正常または異常として識別することができる。この原理を, 異なる屋外温度で収集したデータを用いてアルゴリズムを訓練することで実証し, 熱橋の検出に至った。本手法は, 定期的な建物検査において専門家を支援するために実装することも, 移動プラットフォームと組み合わせて広い領域の検査を自動化することもできる。 We present a label-free method for detecting anomalies during thermographic inspection of building envelopes. It is based on the AI-driven prediction of thermal distributions from color images. Effectively, the method acts as a one-class classifier of the thermal image regions with high mismatch between the predicted and actual thermal distributions. The algorithm can learn to identify certain features as normal or anomalous by selecting the target sample used for training. We demonstrated this principle by training the algorithm with data collected at different outdoor temperatures, which led to the detection of thermal bridges. The method can be implemented to assist human professionals during routine building inspections or combined with mobile platforms for automating examination of large areas.	翻訳日:2024-02-06 16:24:06 公開日:2024-02-05
# GitBug-Java - 最近のJavaバグの再現可能なベンチマーク GitBug-Java: A Reproducible Benchmark of Recent Java Bugs ( http://arxiv.org/abs/2402.02961v1 ) ライセンス: Link先を確認	André Silva, Nuno Saavedra, Martin Monperrus	(参考訳) バグフィックスベンチマークは自動プログラム修復(APR)とフォールトローカライゼーション(FL)の方法論を評価するのに不可欠である。しかし、Defects4Jに代表される既存のベンチマークは、現代の開発プラクティスに沿った最近のバグ修正を組み込むために進化する必要がある。さらに、再現性は重要な科学的原則であるが、バグフィックスベンチマークには欠けていた。これらのギャップに対処するため、最近のJavaバグの再現可能なベンチマークであるGitBug-Javaを紹介する。 GitBug-Javaは、55の有名なオープンソースリポジトリの2023年のコミット履歴から抽出された199のバグを収録している。 GitBug-Javaを構築するための方法論は、完全に再現可能な環境におけるバグフィックスの保存を保証する。 GitBug-Javaは https://github.com/gitbugactions/gitbug-java で公開している。 Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with contemporary development practices. Moreover, reproducibility, a key scientific principle, has been lacking in bug-fix benchmarks. To address these gaps, we present GitBug-Java, a reproducible benchmark of recent Java bugs. GitBug-Java features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories. The methodology for building GitBug-Java ensures the preservation of bug-fixes in fully-reproducible environments. We publish GitBug-Java at https://github.com/gitbugactions/gitbug-java.	翻訳日:2024-02-06 16:23:54 公開日:2024-02-05
# AdaTreeFormer: 単一高解像度画像からの樹木数に対するショット領域適応 AdaTreeFormer: Few Shot Domain Adaptation for Tree Counting from a Single High-Resolution Image ( http://arxiv.org/abs/2402.02956v1 ) ライセンス: Link先を確認	Hamed Amini Amirkolaee, Miaojing Shi, Lianghua He, and Mark Mulligan	(参考訳) 写真測量やリモートセンシングの分野において、単一の空中画像や衛星画像のみを用いて樹木密度を推定・計数する処理は難しい課題である。しかし、森林管理において重要な役割を担っている。様々な地形の多種多様な木は、樹木の計数モデルがうまく機能することを著しく妨げている。本研究の目的は、十分なラベル付きツリーを持つソースドメインから学習し、限られた数のラベル付きツリーでターゲットドメインに適応するフレームワークを提案することである。我々の手法はAdaTreeFormerと呼ばれ、ソースとターゲットドメインからロバストな特徴を抽出する階層的特徴抽出方式を備えた1つの共有エンコーダを含んでいる。また、ソースドメインとターゲットドメインから自己ドメイン注意マップを抽出するサブネットと、クロスドメイン注意マップを抽出するサブネットの3つで構成されている。後者では,木密度マップの生成中に異なるドメインから関連情報を抽出するアテンション・ツー・アダプティブ・メカニズムを導入し,ソース・ターゲット領域の特徴を段階的に整列する階層的クロスドメイン特徴アライメントスキームを提案する。我々はまた、ソースドメインとターゲットドメインのギャップをさらに減らすために、フレームワークに敵対的学習を取り入れています。我々のAdaTreeFormerは,3つの木数データセットであるie Jiangsu,Yosemite,Londonを用いて,設計済みのドメイン適応タスクを6つ評価し,その性能を著しく向上させる。 The process of estimating and counting tree density using only a single aerial or satellite image is a difficult task in the fields of photogrammetry and remote sensing. However, it plays a crucial role in the management of forests. The huge variety of trees in varied topography severely hinders tree counting models to perform well. The purpose of this paper is to propose a framework that is learnt from the source domain with sufficient labeled trees and is adapted to the target domain with only a limited number of labeled trees. Our method, termed as AdaTreeFormer, contains one shared encoder with a hierarchical feature extraction scheme to extract robust features from the source and target domains. It also consists of three subnets: two for extracting self-domain attention maps from source and target domains respectively and one for extracting cross-domain attention maps. 
For the latter, an attention-to-adapt mechanism is introduced to distill relevant information from different domains while generating tree density maps; a hierarchical cross-domain feature alignment scheme is proposed that progressively aligns the features from the source and target domains. We also incorporate adversarial learning into the framework to further reduce the gap between source and target domains. Our AdaTreeFormer is evaluated on six designed domain adaptation tasks using three tree counting datasets, i.e., Jiangsu, Yosemite, and London, and significantly outperforms state-of-the-art methods.	翻訳日:2024-02-06 16:23:44 公開日:2024-02-05
# 階層型情報共有Dec-POMDPの解法 - 展開形ゲームアプローチ Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach ( http://arxiv.org/abs/2402.02954v1 ) ライセンス: Link先を確認	Johan Peralez, Aurélien Delage, Olivier Buffet, Jilles S. Dibangoye	(参考訳) 最近の理論では、マルチプレイヤーの分散型部分観測可能マルコフ決定過程が等価な単一プレイヤーゲームに変換でき、単一ステージのサブゲームに分解することで、単一プレイヤーゲームを解くためにベルマンの最適性の原理が適用可能であることが示されている。しかし、このアプローチは、各シングルステージのサブゲームにおける全てのプレイヤーの決定変数を絡み合わせるため、二重指数複雑性を持つバックアップとなる。本稿では,階層的な情報共有の下での最適性を維持しつつ,これらの決定変数を解きほぐす方法を示す。これを実現するため、我々は、任意のシングルステージのサブゲームをより小さなサブゲームに分解して解くために最適性の原理を適用し、一度に1人のプレイヤーの決定を行えるようにする。我々のアプローチは、単一ステージのサブゲームに対する解を持つ展開形ゲームが常に存在することを明らかにし、時間計算量を著しく減少させる。実験の結果,この結果を利用したアルゴリズムは,最適性を損なうことなく,より大規模なマルチプレイヤーゲームにスケールアップできることがわかった。 A recent theory shows that a multi-player decentralized partially observable Markov decision process can be transformed into an equivalent single-player game, enabling the application of Bellman's principle of optimality to solve the single-player game by breaking it down into single-stage subgames. However, this approach entangles the decision variables of all players at each single-stage subgame, resulting in backups with a double-exponential complexity. This paper demonstrates how to disentangle these decision variables while maintaining optimality under hierarchical information sharing, a prominent management style in our society. To achieve this, we apply the principle of optimality to solve any single-stage subgame by breaking it down further into smaller subgames, enabling us to make single-player decisions at a time. Our approach reveals that extensive-form games always exist with solutions to a single-stage subgame, significantly reducing time complexity. Our experimental results show that the algorithms leveraging these findings can scale up to much larger multi-player games without compromising optimality.	翻訳日:2024-02-06 16:23:17 公開日:2024-02-05
# Androidマルウェア検出のための機械学習ソリューションの鍵を解く Unraveling the Key of Machine Learning Solutions for Android Malware Detection ( http://arxiv.org/abs/2402.02953v1 ) ライセンス: Link先を確認	Jiahao Liu, Jun Zeng, Fabio Pierazzi, Lorenzo Cavallaro, Zhenkai Liang	(参考訳) Androidのマルウェア検出は、悪意のあるアプリに対する最前線として機能する。機械学習(ML)の急速な進歩により、Android APKから悪意あるパターンを自動的にキャプチャする機能によって、MLベースのAndroidマルウェア検出が注目を集めている。これらの学習駆動手法はマルウェアの検出において有望な結果を報告している。しかし、現在の研究の進展に関する詳細な分析が欠如しているため、この分野の最先端の全体像を得るのは困難である。本稿では,MLベースのAndroidマルウェア検出について,経験的,定量的な分析を伴う包括的な調査を行った。まず文献を調査し,Androidの特徴量エンジニアリングとMLモデリングパイプラインに基づく分類体系に貢献を分類した。次に,MLベースのAndroidマルウェア検出のための汎用フレームワークを設計し,異なる研究コミュニティによる12の代表的なアプローチを再実装し,有効性,堅牢性,効率という3つの主要な側面から評価する。この評価によると、MLベースのアプローチは依然としてオープンな課題に直面しており、より強力なMLモデルがより良いマルウェア検出器を設計するための銀の弾丸ではない、といった洞察に富む発見が得られた。本研究の成果をさらに要約し,今後の研究の指針となる推奨事項を示す。 Android malware detection serves as the front line against malicious apps. With the rapid advancement of machine learning (ML), ML-based Android malware detection has attracted increasing attention due to its capability of automatically capturing malicious patterns from Android APKs. These learning-driven methods have reported promising results in detecting malware. However, the absence of an in-depth analysis of current research progress makes it difficult to gain a holistic picture of the state of the art in this area. This paper presents a comprehensive investigation to date into ML-based Android malware detection with empirical and quantitative analysis. We first survey the literature, categorizing contributions into a taxonomy based on the Android feature engineering and ML modeling pipeline. Then, we design a general-purpose framework for ML-based Android malware detection, re-implement 12 representative approaches from different research communities, and evaluate them from three primary dimensions, i.e., effectiveness, robustness, and efficiency. 
The evaluation reveals that ML-based approaches still face open challenges and provides insightful findings like more powerful ML models are not the silver bullet for designing better malware detectors. We further summarize our findings and put forth recommendations to guide future research.	翻訳日:2024-02-06 16:22:55 公開日:2024-02-05
# ソフトマックスゲーティング混合エキスパートにおける最小二乗推定について On Least Squares Estimation in Softmax Gating Mixture of Experts ( http://arxiv.org/abs/2402.02952v1 ) ライセンス: Link先を確認	Huy Nguyen and Nhat Ho and Alessandro Rinaldo	(参考訳) Mixture of Experts (MoE) モデルは、より複雑で表現力のあるモデルを形成するために、softmax gating関数を使用して複数のエキスパートネットワークを集約する統計的機械学習設計である。スケーラビリティのため、いくつかのアプリケーションで一般的に使われているが、MoEモデルの数学的および統計的な性質は複雑で分析が難しい。その結果、以前の理論研究は主に確率的MoEモデルに焦点をあて、データがガウスMoEモデルから生成されるという非現実的な仮定を課している。本研究では,回帰モデルに基づいてデータをサンプリングした決定論的MoEモデルにおいて,最小二乗推定器(LSE)の性能について検討する。我々は,各種エキスパート関数の収束挙動を特徴付ける強識別可能性という条件を定式化する。強識別可能なエキスパート、すなわち活性化関数 $\mathrm{sigmoid}(\cdot)$ や $\tanh(\cdot)$ を持つ広く使われているフィードフォワードネットワークの推定速度は、驚くほど遅い推定速度を示す多項式エキスパートのそれよりも大幅に速いことを示す。我々の発見はエキスパートの選択に重要な実用的意味を持つ。 The mixture of experts (MoE) model is a statistical machine learning design that aggregates multiple expert networks using a softmax gating function in order to form a more intricate and expressive model. Despite being commonly used in several applications owing to their scalability, the mathematical and statistical properties of MoE models are complex and difficult to analyze. As a result, previous theoretical works have primarily focused on probabilistic MoE models by imposing the impractical assumption that the data are generated from a Gaussian MoE model. In this work, we investigate the performance of the least squares estimators (LSE) under a deterministic MoE model where the data are sampled according to a regression model, a setting that has remained largely unexplored. We establish a condition called strong identifiability to characterize the convergence behavior of various types of expert functions. We demonstrate that the rates for estimating strongly identifiable experts, namely the widely used feed forward networks with activation functions $\mathrm{sigmoid}(\cdot)$ and $\tanh(\cdot)$, are substantially faster than those of polynomial experts, which we show to exhibit a surprisingly slow estimation rate. Our findings have important practical implications for expert selection.	翻訳日:2024-02-06 16:22:37 公開日:2024-02-05
# 動的ビザンチン・ロバスト学習 : 切り替わるビザンチンワーカーへの適応 Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers ( http://arxiv.org/abs/2402.02951v1 ) ライセンス: Link先を確認	Ron Dorfman, Naseem Yehya, Kfir Y. Levy	(参考訳) Byzantine-robust learningは、フォールトトレラントな分散機械学習フレームワークとして登場した。しかし、ほとんどの技法は静的な設定を考慮し、ビザンチンマシンのアイデンティティは学習プロセス中に固定されている。この仮定は、一時的な機能不全や標的型の時間的攻撃を含む現実世界の動的ビザンチンの挙動を捉えていない。この制限に対処するために、$\textsf{DynaBRO}$を提案する。これは$\mathcal{O}(\sqrt{T})$ラウンドのビザンチンアイデンティティ変更(ここで$T$はトレーニングラウンドの総数)に耐えられる新しい手法であり、静的設定の漸近収束率と一致する。本手法は,マルチレベルモンテカルロ(MLMC)勾配推定手法とワーカーの更新の堅牢な集約を組み合わせ,動的ビザンチン戦略からのバイアスを制限するためにフェールセーフフィルタを組み込む。さらに,適応学習率を活用することで,ビザンチンワーカーの割合を知る必要がなくなる。 Byzantine-robust learning has emerged as a prominent fault-tolerant distributed machine learning framework. However, most techniques consider the static setting, wherein the identity of Byzantine machines remains fixed during the learning process. This assumption does not capture real-world dynamic Byzantine behaviors, which may include transient malfunctions or targeted temporal attacks. Addressing this limitation, we propose $\textsf{DynaBRO}$ -- a new method capable of withstanding $\mathcal{O}(\sqrt{T})$ rounds of Byzantine identity alterations (where $T$ is the total number of training rounds), while matching the asymptotic convergence rate of the static setting. Our method combines a multi-level Monte Carlo (MLMC) gradient estimation technique with robust aggregation of worker updates and incorporates a fail-safe filter to limit bias from dynamic Byzantine strategies. Additionally, by leveraging an adaptive learning rate, our approach eliminates the need for knowing the percentage of Byzantine workers.	翻訳日:2024-02-06 16:22:14 公開日:2024-02-05
# 分布外検出のためのカーネルPCA Kernel PCA for Out-of-Distribution Detection ( http://arxiv.org/abs/2402.02949v1 ) ライセンス: Link先を確認	Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang and Jie Yang	(参考訳) OoD(Out-of-Distribution)検出はディープニューラルネットワーク(DNN)の信頼性に不可欠である。既存の研究は、In-Distribution (InD) データからOoDデータを検出する際に、DNNの特徴に主成分分析 (PCA) を単純に適用するだけでは不十分であることを示している。 PCAの失敗は、OoD と InD のネットワーク特徴が、単に線形部分空間で処理するだけでは十分に分離されず、代わりに適切な非線形写像によって解決できることを示唆している。本研究では,OoD検出のためにKernel PCA(KPCA)のフレームワークを利用し,OoDとInDの特徴が著しく異なるパターンで配置される部分空間を求める。 KPCA における非線形カーネルを誘導する2つの特徴写像を考案し、主成分が張る部分空間における InD と OoD データの分離性を高める。テストサンプルが与えられた場合、そのような部分空間における再構成誤差を用いて、推論時に$\mathcal{O}(1)$の時間計算量で検出結果を効率的に得る。複数のOoDデータセットとネットワーク構造に対する大規模な実験結果により、KPCAベースの検出器が効率性と有効性の両面で優れ、最先端のOoD検出性能を持つことが検証された。 Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown the insufficiency of Principal Component Analysis (PCA) straightforwardly applied on the features of DNNs in detecting OoD data from In-Distribution (InD) data. The failure of PCA suggests that the network features residing in OoD and InD are not well separated by simply proceeding in a linear subspace, which instead can be resolved through proper nonlinear mappings. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection, seeking subspaces where OoD and InD features are allocated with significantly different patterns. We devise two feature mappings that induce non-linear kernels in KPCA to advocate the separability between InD and OoD data in the subspace spanned by the principal components. Given any test sample, the reconstruction error in such subspace is then used to efficiently obtain the detection result with $\mathcal{O}(1)$ time complexity in inference. 
Extensive empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA-based detector in efficiency and efficacy with state-of-the-art OoD detection performances.	翻訳日:2024-02-06 16:21:53 公開日:2024-02-05
# RydbergエキシトンによるCu2O単結晶の大規模キャラクタリゼーション Large-scale characterization of Cu2O monocrystals via Rydberg excitons ( http://arxiv.org/abs/2402.02948v1 ) ライセンス: Link先を確認	Kerwan Morin, Delphine Lagarde, Angélique Gillet, Xavier Marie, Thomas Boulier	(参考訳) 励起子のRydberg状態はミクロンサイズに達し、非常に純粋な結晶を必要とする。酸化銅(Cu2O)中のRydberg励起子を、広い領域にわたってサブミクロン分解能で高速かつ空間分解的に評価する実験法を提案する。提案手法では, 全試料をカメラで照明・撮像し, 可動部を必要とせず, 空間分解型共振吸収分光法を実現する。これにより、Rydberg励起子のエネルギー、線幅、ピーク吸収などの空間地図が得られ、サンプル全体の総合的な品質評価が1ショットで得られる。さらに, 試料の発光を同じ領域に撮像することにより, 帯電した酸素空孔のスペクトル品質マップと発光マップとの強い関係を確立する。この結果、共鳴分光法により得られた結果と密接に一致する独立な発光ベースの品質マップが得られる。以上の結果から,天然Cu2O結晶中のRydberg励起子は,光学活性な帯電酸素空孔の影響を主に受けており,この空孔は容易にマッピングできることがわかった。これら2つの相補的手法はCu2Oの結晶特性に関する貴重な知見を提供する。 Rydberg states of excitons can reach microns in size and require extremely pure crystals. We introduce an experimental method for the rapid and spatially-resolved characterization of Rydberg excitons in copper oxide (Cu2O) with sub-micron resolution over large zones. Our approach involves illuminating and imaging the entire sample on a camera to realize a spatially-resolved version of resonant absorption spectroscopy, without any mobile part. This yields spatial maps of Rydberg exciton properties, including their energy, linewidth and peak absorption, providing a comprehensive quality assessment of the entire sample in a single shot. Furthermore, by imaging the sample photoluminescence over the same zone, we establish a strong relationship between the spectral quality map and the photoluminescence map of charged oxygen vacancies. This results in an independent, luminescence-based quality map that closely matches the results obtained through resonant spectroscopy. Our findings reveal that Rydberg excitons in natural Cu2O crystals are predominantly influenced by optically-active charged oxygen vacancies, which can be easily mapped. Together, these two complementary methods provide valuable insights into Cu2O crystal properties.	翻訳日:2024-02-06 16:21:32 公開日:2024-02-05
# HoughToRadon変換:投影空間における特徴改善のための新しいニューラルネットワーク層 HoughToRadon Transform: New Neural Network Layer for Features Improvement in Projection Space ( http://arxiv.org/abs/2402.02946v1 ) ライセンス: Link先を確認	Alexandra Zhabitskaya, Alexander Sheshkus, and Vladimir L. Arlazarov	(参考訳) 本稿では,Hough ToRadon 変換層を導入し,Hough Transform を組み込んだニューラルネットワークの高速化を図り,セマンティックな画像分割問題の解法を提案する。ハフ変換層の後に置くことで、"inner"畳み込みは、より小さな処理された画像領域や、角度とシフトによるパラメータ空間の線形性といった、新しい有益な特性を持つ修正された特徴マップを受け取る。これらの性質はハフ変換だけでは示されなかった。さらに、HoughToRadon変換層は、2つの新しいパラメータを使って中間特徴写像のサイズを調整でき、その結果のニューラルネットワークの速度と品質のバランスをとることができる。オープンなMIDV-500データセットに対する実験により、この新しい手法は文書セグメンテーションタスクの時間節約につながり、最先端の97.7%の精度を実現し、より複雑なHoughEncoderを上回った。 In this paper, we introduce HoughToRadon Transform layer, a novel layer designed to improve the speed of neural networks incorporated with Hough Transform to solve semantic image segmentation problems. By placing it after a Hough Transform layer, "inner" convolutions receive modified feature maps with new beneficial properties, such as a smaller area of processed images and parameter space linearity by angle and shift. These properties were not presented in Hough Transform alone. Furthermore, HoughToRadon Transform layer allows us to adjust the size of intermediate feature maps using two new parameters, thus allowing us to balance the speed and quality of the resulting neural network. Our experiments on the open MIDV-500 dataset show that this new approach leads to time savings in document segmentation tasks and achieves state-of-the-art 97.7% accuracy, outperforming HoughEncoder with larger computational complexity.	翻訳日:2024-02-06 16:21:13 公開日:2024-02-05
# コンピュータビジョンのためのハイブリッドCNNとViTsアーキテクチャのシナジーを探る Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey ( http://arxiv.org/abs/2402.02941v1 ) ライセンス: Link先を確認	Haruna Yunusa, Shiyin Qin, Abdulrahman Hamman Adama Chukkol, Abdulganiyu Abdu Yusuf, Isah Bello, Adamu Lawan	(参考訳) 畳み込みニューラルネットワーク(CNN)とビジョントランスフォーマー(ViT)アーキテクチャのハイブリッドは画期的なアプローチとして登場し、コンピュータビジョン(CV)の境界を押し進めている。この総合的なレビューは、最先端のハイブリッドCNN-ViTアーキテクチャに関する文献を徹底的に調べ、これらの2つのアプローチの相乗効果を探求する。本調査の主な内容は,(1)バニラCNNとViTの背景,(2)CNNとViTsモデルの統合による相乗効果を探求する様々な分類学的ハイブリッドデザインの体系的レビュー,(3)異なるハイブリッドアーキテクチャ間の比較分析とアプリケーションタスク固有の相乗効果,(4)ハイブリッドモデルの課題と今後の方向性,(5) 最後に,重要な発見と推奨事項をまとめて結論づける。このようなハイブリッドcvアーキテクチャの調査を通じて、この調査は、cnnとvitsの複雑なダイナミクスとcvアーキテクチャの将来を形成する上での集団的影響をより深く理解する上で、ガイドとなることを目標としている。 The hybrid of Convolutional Neural Network (CNN) and Vision Transformers (ViT) architectures has emerged as a groundbreaking approach, pushing the boundaries of computer vision (CV). This comprehensive review provides a thorough examination of the literature on state-of-the-art hybrid CNN-ViT architectures, exploring the synergies between these two approaches. The main content of this survey includes: (1) a background on the vanilla CNN and ViT, (2) systematic review of various taxonomic hybrid designs to explore the synergy achieved through merging CNNs and ViTs models, (3) comparative analysis and application task-specific synergy between different hybrid architectures, (4) challenges and future directions for hybrid models, (5) lastly, the survey concludes with a summary of key findings and recommendations. Through this exploration of hybrid CV architectures, the survey aims to serve as a guiding resource, fostering a deeper understanding of the intricate dynamics between CNNs and ViTs and their collective impact on shaping the future of CV architectures.	翻訳日:2024-02-06 16:20:55 公開日:2024-02-05
# UniMem: 長文脈大規模言語モデルの統一ビューを目指して UniMem: Towards a Unified View of Long-Context Large Language Models ( http://arxiv.org/abs/2402.03009v1 ) ライセンス: Link先を確認	Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, Zhiyuan Liu, Maosong Sun	(参考訳) 長文脈処理は、大規模言語モデルの適用性を制限する重要な能力である。大規模言語モデル(LLM)の長文脈処理能力を向上させるための様々な方法が存在するが、それらは孤立した方法で開発され、系統的な分析や強みの統合が欠如しており、さらなる発展を妨げている。本稿では,LLMのメモリ拡張の観点から,既存の長文脈手法を再構成する統一フレームワークUniMemを紹介する。UniMemは、メモリ管理、メモリ書き込み、メモリ読み込み、メモリ注入の4つの重要な側面によって特徴づけられ、様々な長文脈手法を理解するための体系的な理論を提供する。我々は,UniMemに基づく16の既存手法を再構成し,その設計原理と強みを明らかにするために,Transformer-XL,Memorizing Transformer,RMT,Longformerの4つの代表的な方法を分析する。これらの分析に基づいて,これらのアルゴリズムの強みを統合する革新的な手法であるUniMixを提案する。実験の結果、UniMixは、ベースラインよりもかなり低いパープレキシティで長いコンテキストを扱うのに優れた性能を発揮することがわかった。 Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from the view of memory augmentation of LLMs. UniMem is characterized by four key dimensions: Memory Management, Memory Writing, Memory Reading, and Memory Injection, providing a systematic theory for understanding various long-context methods. We reformulate 16 existing methods based on UniMem and analyze four representative methods: Transformer-XL, Memorizing Transformer, RMT, and Longformer into equivalent UniMem forms to reveal their design principles and strengths. Based on these analyses, we propose UniMix, an innovative approach that integrates the strengths of these algorithms.
Experimental results show that UniMix achieves superior performance in handling long contexts with significantly lower perplexity than baselines.	翻訳日:2024-02-06 16:11:41 公開日:2024-02-05
# DexDiffuser:拡散モデルによるDexterous Graspsの生成 DexDiffuser: Generating Dexterous Grasps with Diffusion Models ( http://arxiv.org/abs/2402.02989v1 ) ライセンス: Link先を確認	Zehang Weng, Haofei Lu, Danica Kragic, Jens Lundell	(参考訳) 本稿では,部分的な物体点群に対する把持を生成・評価・改良する新しい器用把持手法であるDexDiffuserを提案する。DexDiffuserは,条件付き拡散に基づく把持サンプラーDexSamplerと,器用把持評価器DexEvaluatorを含む。DexSamplerは,ランダムにサンプリングされた把持を反復的にデノイズすることで,物体点群を条件とした高品質な把持を生成する。また,Evaluator-Guided Diffusion (EGD) と Evaluator-based Sampling Refinement (ESR) の2つの把持改良戦略を導入する。Allegro Handを用いたシミュレーションと実世界の実験により,DexDiffuserは最先端の多指把持生成法FFHNetを一貫して上回り,把持成功率が平均21.71-22.20%高いことを示した。 We introduce DexDiffuser, a novel dexterous grasping method that generates, evaluates, and refines grasps on partial object point clouds. DexDiffuser includes the conditional diffusion-based grasp sampler DexSampler and the dexterous grasp evaluator DexEvaluator. DexSampler generates high-quality grasps conditioned on object point clouds by iterative denoising of randomly sampled grasps. We also introduce two grasp refinement strategies: Evaluator-Guided Diffusion (EGD) and Evaluator-based Sampling Refinement (ESR). Our simulation and real-world experiments on the Allegro Hand consistently demonstrate that DexDiffuser outperforms the state-of-the-art multi-finger grasp generation method FFHNet, with a grasp success rate that is, on average, 21.71-22.20% higher.	翻訳日:2024-02-06 16:11:19 公開日:2024-02-05
# GPTモデルに対する会話再構成攻撃 Conversation Reconstruction Attack Against GPT Models ( http://arxiv.org/abs/2402.02987v1 ) ライセンス: Link先を確認	Junjie Chu and Zeyang Sha and Michael Backes and Yang Zhang	(参考訳) 近年,GPTシリーズモデルに代表される大規模言語モデル (LLM) の分野では,大幅な進歩が見られた。タスク実行を最適化するために、ユーザはクラウド環境にホストされたGPTモデルとマルチラウンドで会話することが多い。これらの複数ラウンドの会話は、潜在的にプライベートな情報と重複し、クラウド内での送信とストレージを必要とする。しかし、この作戦パラダイムは追加のアタックサーフェスを導入する。本稿では,GPTモデルを対象とした特定の会話再構成攻撃について紹介する。提案した会話再構築攻撃は,セッションをハイジャックし,会話を再構築する2つのステップから構成される。その後、GPTモデルが提案された攻撃を受けると、会話に固有のプライバシーリスクを徹底的に評価する。しかし、GPT-4は提案された攻撃に対して一定の堅牢性を示す。次に,従来の会話,特にUNR攻撃とPBU攻撃の再構築を目的とした2つの高度な攻撃を導入する。実験結果から,PBU攻撃は全モデルで有意な性能を示し,意味的類似性スコアは0.60を超え,UNR攻撃はGPT-3.5のみに有効であることが示唆された。以上の結果から,GPTモデルに関わる会話に関連するプライバシーリスクの懸念が浮き彫りになり,これらのモデルが持つ顕著な能力を誤用しないように,コミュニティの注意を引こうとしている。関連大型言語モデルのサプライヤに対して,当社の調査結果を責任を持って開示します。 In recent times, significant advancements have been made in the field of large language models (LLMs), represented by GPT series models. To optimize task execution, users often engage in multi-round conversations with GPT models hosted in cloud environments. These multi-round conversations, potentially replete with private information, require transmission and storage within the cloud. However, this operational paradigm introduces additional attack surfaces. In this paper, we first introduce a specific Conversation Reconstruction Attack targeting GPT models. Our introduced Conversation Reconstruction Attack is composed of two steps: hijacking a session and reconstructing the conversations. Subsequently, we offer an exhaustive evaluation of the privacy risks inherent in conversations when GPT models are subjected to the proposed attack. However, GPT-4 demonstrates certain robustness to the proposed attacks. We then introduce two advanced attacks aimed at better reconstructing previous conversations, specifically the UNR attack and the PBU attack. 
Our experimental findings indicate that the PBU attack yields substantial performance across all models, achieving semantic similarity scores exceeding 0.60, while the UNR attack is effective solely on GPT-3.5. Our results reveal the concern about privacy risks associated with conversations involving GPT models and aim to draw the community's attention to prevent the potential misuse of these models' remarkable capabilities. We will responsibly disclose our findings to the suppliers of related large language models.	翻訳日:2024-02-06 16:10:55 公開日:2024-02-05
# 道路シーン解析のための高分解能UAV画像の教師なし意味セグメンテーション Unsupervised semantic segmentation of high-resolution UAV imagery for road scene parsing ( http://arxiv.org/abs/2402.02985v1 ) ライセンス: Link先を確認	Zihan Ma, Yongshang Li, Ronggui Ma, Chen Liang	(参考訳) UAV画像で道路シーンを解析する際に2つの課題が提示される。まず,UAV画像の高解像度化により,処理が困難になる。第二に、教師付きディープラーニング手法は、堅牢で正確なモデルをトレーニングするために、大量の手動アノテーションを必要とする。本稿では,近年のビジョン言語モデルと基礎的コンピュータビジョンモデルを活用する,教師なしの道路解析フレームワークを導入する。まず,ビジョン言語モデルを用いて超高解像度UAV画像を効率よく処理し,画像内の道路の関心領域を迅速に検出する。その後、ビジョンファウンデーションモデルSAMを用いて、カテゴリ情報のない道路領域のマスクを生成する。その後、自己教師表現学習ネットワークは、すべてのマスク領域から特徴表現を抽出する。最後に、これらの特徴表現をクラスタ化するために教師なしクラスタリングアルゴリズムを適用し、各クラスタにIDを割り当てる。マスク領域は対応するIDと結合して初期擬似ラベルを生成し、通常の意味セグメンテーションのための反復的な自己学習プロセスを開始する。提案手法は,手動アノテーションを使わずに開発データセット上で89.96%のmIoUを実現する。特に注目すべきなのは、提案手法の並外れた柔軟性であり、人間定義のカテゴリの制限を超え、データセット自体から新しいカテゴリの知識を得ることができる。 Two challenges are presented when parsing road scenes in UAV images. First, the high resolution of UAV images makes processing difficult. Second, supervised deep learning methods require a large amount of manual annotations to train robust and accurate models. In this paper, an unsupervised road parsing framework that leverages recent advances in vision language models and fundamental computer vision model is introduced. Initially, a vision language model is employed to efficiently process ultra-large resolution UAV images to quickly detect road regions of interest in the images. Subsequently, the vision foundation model SAM is utilized to generate masks for the road regions without category information. Following that, a self-supervised representation learning network extracts feature representations from all masked regions. Finally, an unsupervised clustering algorithm is applied to cluster these feature representations and assign IDs to each cluster. The masked regions are combined with the corresponding IDs to generate initial pseudo-labels, which initiate an iterative self-training process for regular semantic segmentation.
The proposed method achieves an impressive 89.96% mIoU on the development dataset without relying on any manual annotation. Particularly noteworthy is the extraordinary flexibility of the proposed method, which even goes beyond the limitations of human-defined categories and is able to acquire knowledge of new categories from the dataset itself.	翻訳日:2024-02-06 16:10:33 公開日:2024-02-05
# 変装した自由フェルミオンをもつ量子回路 Quantum circuits with free fermions in disguise ( http://arxiv.org/abs/2402.02984v1 ) ライセンス: Link先を確認	Balázs Pozsgay	(参考訳) 近年、ジョルダン-ウィグナー変換によって解くことができないにもかかわらず、自由フェルミオンスペクトルを持つスピンチェーンモデルの複数のファミリーが発見された。代わりに、自由フェルミオンは、かなり複雑な構成の結果として現れる。本研究では,この問題の量子回路定式化について考察する。各モデルの局所ハミルトニアンの項から構築された局所ユニタリゲートを用いて回路を構築し、次の問いを立てる: どの回路ジオメトリ(ゲート列)が自由フェルミオンスペクトルに導くか? 主な例はフェンドリーの4-フェルミオンモデルであり、様々な幾何学を持つ自由フェルミオン回路を構築する。ある場合には自由フェルミオン性を証明するが、他の幾何では数値的に確認する。驚くべきことに、多くの標準的なブリックワーク回路は自由フェルミオンではないが、自由フェルミオンとなる特定の対称的な構成を同定する。 Recently multiple families of spin chain models were found, which have a free fermionic spectrum, even though they are not solvable by a Jordan-Wigner transformation. Instead, the free fermions emerge as a result of a rather intricate construction. In this work we consider the quantum circuit formulation of the problem. We construct circuits using local unitary gates built from the terms in the local Hamiltonians of the respective models, and ask the question: which circuit geometries (sequence of gates) lead to a free fermionic spectrum? Our main example is the 4-fermion model of Fendley, where we construct free fermionic circuits with various geometries. In certain cases we prove the free fermionic nature, while for other geometries we confirm it numerically. Surprisingly, we find that many standard brickwork circuits are not free fermionic, but we identify certain symmetric constructions which are.	翻訳日:2024-02-06 16:10:13 公開日:2024-02-05
# ロボットマニピュレータの故障診断と耐故障性制御方式に関するレビュー:AI,機械学習,ディジタルツインの最近の進歩 Review on Fault Diagnosis and Fault-Tolerant Control Scheme for Robotic Manipulators: Recent Advances in AI, Machine Learning, and Digital Twin ( http://arxiv.org/abs/2402.02980v1 ) ライセンス: Link先を確認	Md Muzakkir Quamar and Ali Nasir	(参考訳) この総合的なレビュー記事は、ロボットマニピュレータに適した耐故障制御(FTC)の複雑な領域を掘り下げている。我々の調査はFTCの歴史的進化にまたがり、その開発を時間とともに追跡し、人工知能(AI)、機械学習(ML)、デジタルツイン技術(DTT)といった最先端技術の統合によって引き起こされた最近のブレークスルーを注意深く調べています。この記事は、ロボットマニピュレータ制御とフォールトトレランスのランドスケープにおいて、これらの現代のトレンドが果たす変化的影響に特に重点を置いている。歴史的文脈を掘り下げることで、FTCの仕組みの進化を包括的に理解することを目的としています。この旅は、モデルベースと信号ベースのスキームからセンサーの役割への移行を含み、AI、ML、DTTによって実現された現在のパラダイムシフトを探求するためのステージを設定します。この物語は、ロボットマニピュレータの領域内での耐故障性を高めるために、これらの先進技術とそれらの応用の間の複雑な相互作用を見極めながら展開する。本稿は,これらの進歩が近年出現した新しい方法論,技法,応用に与える影響を批判的に評価する。本稿では,ロボットマニピュレータのコンテキスト内での故障診断と耐故障性制御の現状を包括的に把握し,AI,ML,DTTの進歩というより広範な枠組みの中で探究することを目的とする。歴史的基礎と現代的イノベーションの両方を綿密に調べることで、このレビューは既存の知識体系に大きく貢献し、研究者、実践家、そしてロボットマニピュレータ制御の動的な景観をナビゲートする愛好家たちに貴重な洞察を提供する。 This comprehensive review article delves into the intricate realm of fault-tolerant control (FTC) schemes tailored for robotic manipulators. Our exploration spans the historical evolution of FTC, tracing its development over time, and meticulously examines the recent breakthroughs fueled by the synergistic integration of cutting-edge technologies such as artificial intelligence (AI), machine learning (ML), and digital twin technologies (DTT). The article places a particular emphasis on the transformative influence these contemporary trends exert on the landscape of robotic manipulator control and fault tolerance. By delving into the historical context, our aim is to provide a comprehensive understanding of the evolution of FTC schemes. This journey encompasses the transition from model-based and signal-based schemes to the role of sensors, setting the stage for an exploration of the present-day paradigm shift enabled by AI, ML, and DTT. 
The narrative unfolds as we dissect the intricate interplay between these advanced technologies and their applications in enhancing fault tolerance within the domain of robotic manipulators. Our review critically evaluates the impact of these advancements, shedding light on the novel methodologies, techniques, and applications that have emerged in recent times. The overarching goal of this article is to present a comprehensive perspective on the current state of fault diagnosis and fault-tolerant control within the context of robotic manipulators, positioning our exploration within the broader framework of AI, ML, and DTT advancements. Through a meticulous examination of both historical foundations and contemporary innovations, this review significantly contributes to the existing body of knowledge, offering valuable insights for researchers, practitioners, and enthusiasts navigating the dynamic landscape of robotic manipulator control.	翻訳日:2024-02-06 16:09:58 公開日:2024-02-05

# 仮想量子資源蒸留 : 一般的な枠組みと応用

Virtual quantum resource distillation: General framework and applications ( http://arxiv.org/abs/2404.13048v1 )

ライセンス: Link先を確認

Ryuji Takagi, Xiao Yuan, Bartosz Regula, Mile Gu

(参考訳) 我々は,従来の量子資源蒸留を,古典的後処理の力を統合することで拡張する,仮想資源蒸留の一般的な枠組み,すなわち [Phys. Rev. Lett. 132, 050203 (2024)] に提案されている代替蒸留戦略を開発する。ここで提示されるフレームワークは、量子状態だけでなく、量子チャネルや高次プロセスのような動的量子オブジェクトにも適用することができる。計算可能な半定値プログラムの形式での仮想資源蒸留の性能評価とベンチマーク,および運用上の動機付け量について述べる。我々は,量子メモリ,量子通信,非マルコフ力学などの動的資源を含む設定だけでなく,絡み合い,コヒーレンス,マジックなどの標準的な資源理論を含む,様々な具体的な設定に一般の枠組みを適用した。確率蒸留の枠組みについても論じる。

We develop the general framework of virtual resource distillation -- an alternative distillation strategy proposed in [Phys. Rev. Lett. 132, 050203 (2024)], which extends conventional quantum resource distillation by integrating the power of classical postprocessing. The framework presented here is applicable not only to quantum states, but also dynamical quantum objects such as quantum channels and higher-order processes. We provide a general characterization and benchmarks for the performance of virtual resource distillation in the form of computable semidefinite programs as well as several operationally motivated quantities. We apply our general framework to various concrete settings of interest, including standard resource theories such as entanglement, coherence, and magic, as well as settings involving dynamical resources such as quantum memory, quantum communication, and non-Markovian dynamics. The framework of probabilistic distillation is also discussed.

翻訳日:2024-07-01 11:58:46 公開日:2024-02-05

# ニューラルロバストネスの探索メカニズム-幾何とスペクトルの間の橋渡しを探る

Exploring mechanisms of Neural Robustness: probing the bridge between geometry and spectrum ( http://arxiv.org/abs/2405.00679v1 )

ライセンス: Link先を確認

Konstantin Holzhausen, Mia Merlid, Håkon Olav Torvik, Anders Malthe-Sørenssen, Mikkel Elle Lepperød

(参考訳) バックプロパゲーションで最適化された人工ニューラルネットワークは、正確性はあるものの堅牢性に欠けており、その安全性に影響を及ぼす予期せぬ行動を引き起こす。生体神経系はこれらの問題の一部をすでに解決している。したがって、堅牢性の生物学的メカニズムを理解することは、信頼できる安全なシステムを構築するための重要なステップである。人工モデルとは異なり、生物学的ニューロンは隣の細胞活動に基づいて接続を調整する。神経表現におけるロバスト性は、符号化多様体の滑らかさと相関すると仮定される。最近の研究は、マウスの一次視覚野の研究で観察されたべき乗則共分散スペクトルが、表現の正確性と堅牢性の間のバランスの取れたトレードオフを示していることを示唆している。ここでは、勝者総取り(winner-takes-all)ダイナミクスを持つ教師なし局所学習モデルがそのようなべき乗則表現を学習することを示し、今後の研究に対してそのような特性を持つ機構モデルを提供する。本研究の目的は, 神経表現における幾何学的特徴, スペクトル特性, 頑健性, 表現性の間の相互作用を理解することである。そこで, 性能と敵対的ロバスト性を評価しながら, 重み, ジャコビアン, スペクトル正則化を用いて, 表現の滑らかさとスペクトルの関連性を検討した。我々の研究は、生物と人工両方のシステムにおいて、べき乗則スペクトルと最適にスムーズなエンコーディングの基礎となるメカニズムの研究の基礎となる。得られた知見は、哺乳類の脳で堅牢なニューラルネットワークを実現するメカニズムを解明し、より安定で信頼性の高い人工システムの開発に役立つ可能性がある。

Backpropagation-optimized artificial neural networks, while precise, lack robustness, leading to unforeseen behaviors that affect their safety. Biological neural systems do solve some of these issues already. Thus, understanding the biological mechanisms of robustness is an important step towards building trustworthy and safe systems. Unlike artificial models, biological neurons adjust connectivity based on neighboring cell activity. Robustness in neural representations is hypothesized to correlate with the smoothness of the encoding manifold. Recent work suggests power law covariance spectra, which were observed studying the primary visual cortex of mice, to be indicative of a balanced trade-off between accuracy and robustness in representations. Here, we show that unsupervised local learning models with winner takes all dynamics learn such power law representations, providing upcoming studies a mechanistic model with that characteristic. Our research aims to understand the interplay between geometry, spectral properties, robustness, and expressivity in neural representations. Hence, we study the link between representation smoothness and spectrum by using weight, Jacobian and spectral regularization while assessing performance and adversarial robustness. Our work serves as a foundation for future research into the mechanisms underlying power law spectra and optimally smooth encodings in both biological and artificial systems. The insights gained may elucidate the mechanisms that realize robust neural networks in mammalian brains and inform the development of more stable and reliable artificial systems.

翻訳日:2024-07-01 11:19:45 公開日:2024-02-05

# 大規模クラウドシステムにおける依存性認識インシデントリンク

Dependency Aware Incident Linking in Large Cloud Systems ( http://arxiv.org/abs/2403.18639v1 )

ライセンス: Link先を確認

Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan

(参考訳) 信頼性向上への多大な努力にもかかわらず、大規模クラウドサービスは必然的に、サービスの可用性と顧客満足度に大きな影響を与える本番インシデントを経験する。さらに悪いことに、多くの場合、1つのインシデントがカスケード効果によって複数のダウンストリーム障害を引き起こし、依存する複数のサービスにまたがって関連するインシデントを発生させる。多くの場合、オンコールエンジニア(OCE)はこれらのインシデントをサイロ化された状態で調査するため、大量の手作業の負担(manual toil)が発生し、インシデントの緩和までの全体的な時間(time-to-mitigate)が増加する。したがって,効率的なインシデントリンクモデルの開発は,大規模な機能停止を迅速に解決し,オンコール疲労を軽減するために,関連するインシデントをクラスタにグループ化する上で極めて重要である。既存のインシデントリンク手法は、主にインシデント(タイトル、説明、重大度、影響のあるコンポーネントなど)のテキスト情報とコンテキスト情報を活用しているため、サービス間の依存性を活用できない。本稿では、テキストおよびサービス依存グラフ情報を活用する依存性対応インシデントリンク(DiLink)フレームワークを提案し、同一サービスに由来するインシデントリンクだけでなく、異なるサービスやワークロードに由来するインシデントリンクについても、精度とカバレッジを改善する。さらに,Orthogonal Procrustesを用いてマルチモーダル(テキストおよびグラフ)データの埋め込みを整列する手法を提案する。Microsoftの5つのワークロードによる実世界のインシデントに対する大規模な実験結果によると、アライメントメソッドのF1スコアは0.96(現在の最先端メソッドよりも14%向上)である。また、これらの5つのワークロードの610のサービスにこのソリューションをデプロイする作業を進めており、OCEによるインシデント管理の改善と手作業の負担の削減を継続的に支援する。

Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that create several related incidents across different dependent services. Oftentimes, on-call engineers (OCEs) examine these incidents in silos, which leads to a significant amount of manual toil and increases the overall time-to-mitigate for incidents. Therefore, developing efficient incident linking models is of paramount importance for grouping related incidents into clusters so as to quickly resolve major outages and reduce on-call fatigue. Existing incident linking methods mostly leverage textual and contextual information of incidents (e.g., title, description, severity, impacted components), thus failing to leverage the inter-dependencies between services. In this paper, we propose the dependency-aware incident linking (DiLink) framework, which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links not only coming from the same service, but also from different services and workloads. Furthermore, we propose a novel method to align the embeddings of multi-modal (i.e., textual and graphical) data using Orthogonal Procrustes. Extensive experimental results on real-world incidents from 5 workloads of Microsoft demonstrate that our alignment method has an F1-score of 0.96 (14% gain over current state-of-the-art methods). We are also in the process of deploying this solution across 610 services from these 5 workloads to continuously support OCEs in improving incident management and reducing manual toil.

翻訳日:2024-04-01 02:34:48 公開日:2024-02-05

# opML: ブロックチェーン上での楽観的機械学習

opML: Optimistic Machine Learning on Blockchain ( http://arxiv.org/abs/2401.17555v2 )

ライセンス: Link先を確認

KD Conway, Cathie So, Xiaohang Yu, Kartin Wong

(参考訳) 機械学習とブロックチェーン技術の統合は、分散化され、セキュアで、透明性のあるAIサービスというビジョンに後押しされ、関心が高まっている。この文脈で、ブロックチェーンシステムがAIモデル推論を実行できるようにする革新的なアプローチであるopML(Optimistic Machine Learning on chain)を導入する。opMLは、楽観的ロールアップシステムを思わせる、インタラクティブな不正証明プロトコルに基づいている。このメカニズムにより、MLサービスに対する分散的で検証可能なコンセンサスが保証され、信頼性と透明性が向上する。zkML(Zero-Knowledge Machine Learning)とは異なり、opMLはコスト効率と高効率のMLサービスを提供し、参加要件は最小限である。注目すべきは、GPUのない標準PC上で7B-LLaMAのような大規模な言語モデルの実行を可能にし、アクセシビリティを大幅に拡張することである。opMLを通じてブロックチェーンとAIの能力を組み合わせることで、アクセスしやすく、セキュアで、効率的なオンチェーン機械学習に向けた変革的な歩みを開始する。

The integration of machine learning with blockchain technology has witnessed increasing interest, driven by the vision of decentralized, secure, and transparent AI services. In this context, we introduce opML (Optimistic Machine Learning on chain), an innovative approach that empowers blockchain systems to conduct AI model inference. opML relies on an interactive fraud proof protocol, reminiscent of the optimistic rollup systems. This mechanism ensures decentralized and verifiable consensus for ML services, enhancing trust and transparency. Unlike zkML (Zero-Knowledge Machine Learning), opML offers cost-efficient and highly efficient ML services, with minimal participation requirements. Remarkably, opML enables the execution of extensive language models, such as 7B-LLaMA, on standard PCs without GPUs, significantly expanding accessibility. By combining the capabilities of blockchain and AI through opML, we embark on a transformative journey toward accessible, secure, and efficient on-chain machine learning.

翻訳日:2024-03-25 12:08:11 公開日:2024-02-05

# 暗号コンピューティングを用いた市町村のサイバーリスクモデリング

Municipal cyber risk modeling using cryptographic computing to inform cyber policymaking ( http://arxiv.org/abs/2402.01007v2 )

ライセンス: Link先を確認

Avital Baral, Taylor Reynolds, Lawrence Susskind, Daniel J. Weitzner, Angelina Wu

(参考訳) 市町村は破壊的な結果を伴うサイバー攻撃に弱いが、自らのリスクを評価し、セキュリティ態勢を他の自治体と比較するための重要な情報がない。セキュリティ態勢,インシデント,セキュリティコントロールの失敗,損失に関する,暗号学的にセキュアな計算プラットフォームを通じて収集された83の自治体のデータを用いて,我々は,自治体のためのデータ駆動型サイバーリスクモデルとサイバーセキュリティベンチマークを構築した。セクターにおけるセキュリティ態勢のベンチマーク、サイバーインシデントの発生頻度、防御態勢に基づく組織の年間損失予測、個別の障害率と関連する損失に基づくサイバーコントロールの重み付けを作成している。これら4つの項目を組み合わせることで、セクター内のサイバーリスクを定量化し、対処すべきギャップを特定し、政策介入を優先順位付けし、時間とともに介入の進捗を追跡することで、サイバー政策立案の指針として役立つ。市町村の場合、これらの新たに導出されたリスク指標は、サイバーセキュリティ対策の継続的かつ計測に基づく改善の必要性を強調し、弱みと強みを明確に示し、セキュリティ教育、インシデント対応、そして投資したセキュリティ費用1ドル当たりのリスク低減効果が最も高い、セキュリティレベルが最も低い自治体にまず注力することなど、早期の政策目標を政府に提供するものである。

Municipalities are vulnerable to cyberattacks with devastating consequences, but they lack key information to evaluate their own risk and compare their security posture to peers. Using data from 83 municipalities collected via a cryptographically secure computation platform about their security posture, incidents, security control failures, and losses, we build data-driven cyber risk models and cyber security benchmarks for municipalities. We produce benchmarks of the security posture in a sector, the frequency of cyber incidents, forecasted annual losses for organizations based on their defensive posture, and a weighting of cyber controls based on their individual failure rates and associated losses. Combined, these four items can help guide cyber policymaking by quantifying the cyber risk in a sector, identifying gaps that need to be addressed, prioritizing policy interventions, and tracking progress of those interventions over time. In the case of the municipalities, these newly derived risk measures highlight the need for continuous measured improvement of cybersecurity readiness, show clear areas of weakness and strength, and provide governments with some early targets for policy focus such as security education, incident response, and focusing efforts first on municipalities at the lowest security levels that have the highest risk reduction per security dollar invested.

翻訳日:2024-03-25 11:58:26 公開日:2024-02-05

# LLMプログラム合成と未整理のオブジェクトデータベースを用いたオープン・ユニバース室内シーン生成

Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases ( http://arxiv.org/abs/2403.09675v1 )

ライセンス: Link先を確認

Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie

(参考訳) テキストのプロンプトに応じて屋内シーンを生成するシステムを提案する。プロンプトはシーン記述の固定語彙に制限されず、生成されたシーン内のオブジェクトは固定されたオブジェクトカテゴリに制限されない。屋内シーン生成に関するこれまでのほとんどの研究とは異なり、既存の3Dシーンの大規模なトレーニングデータセットは不要である。代わりに、事前訓練された大規模言語モデル(LLM)に符号化された世界知識を活用して、オブジェクトとそれらの間の空間関係を記述するドメイン固有のレイアウト言語でプログラムを合成する。このようなプログラムを実行すると制約充足問題の仕様が作成され、システムは勾配に基づく最適化スキームを用いてこれを解き、オブジェクトの位置と向きを生成する。オブジェクトの幾何学を生成するために、システムはデータベースから3Dメッシュを検索する。カテゴリがアノテートされ相互に整列されたメッシュのデータベースを使用する以前の研究とは異なり、視覚言語モデル(VLM)を使用して、アノテーションがなく整列も一貫していないメッシュの巨大なデータベースからメッシュを取得するパイプラインを開発する。実験により,本システムは従来のクローズドユニバースのシーン生成タスクにおいて,3次元データに基づいて訓練された生成モデルよりも優れており,また,オープンユニバースのシーン生成において,最近のLLMに基づくレイアウト生成手法よりも優れていることが示された。

We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting open-universe indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of existing 3D scenes. Instead, it leverages the world knowledge encoded in pre-trained large language models (LLMs) to synthesize programs in a domain-specific layout language that describe objects and spatial relations between them. Executing such a program produces a specification of a constraint satisfaction problem, which the system solves using a gradient-based optimization scheme to produce object positions and orientations. To produce object geometry, the system retrieves 3D meshes from a database. Unlike prior work which uses databases of category-annotated, mutually-aligned meshes, we develop a pipeline using vision-language models (VLMs) to retrieve meshes from massive databases of un-annotated, inconsistently-aligned meshes. Experimental evaluations show that our system outperforms generative models trained on 3D data for traditional, closed-universe scene generation tasks; it also outperforms a recent LLM-based layout generation method on open-universe scene generation.

翻訳日:2024-03-25 08:06:28 公開日:2024-02-05

# 勾配降下法によるモルフォロジーニューラルネットワークの訓練 : いくつかの理論的考察

Training morphological neural networks with gradient descent: some theoretical insights ( http://arxiv.org/abs/2403.12975v1 )

ライセンス: Link先を確認

Samy Blusseau

(参考訳) モルフォロジーニューラルネットワーク(またはモルフォロジー層)は、完備束上の作用素の表現のような理論的側面や画像処理パイプラインの開発において、数理形態学の進歩を促進する強力なツールとなり得る。しかしながら、これらのアーキテクチャは、少なくとも勾配降下に基づく最適化アルゴリズムを使用する一般的な機械学習フレームワークにおいては、モルフォロジー層が数層を超えるとトレーニングが困難であることが判明している。本稿では、非滑らか最適化の概念であるブーリガン微分の観点から、微分に基づくアプローチとモルフォロジーネットワークに適用されるバックプロパゲーションの可能性と限界について検討する。我々は、特に初期化と学習率に関する洞察と最初の理論的ガイドラインを提供する。

Morphological neural networks, or layers, can be a powerful tool to boost the progress in mathematical morphology, either on theoretical aspects such as the representation of complete lattice operators, or in the development of image processing pipelines. However, these architectures turn out to be difficult to train when they count more than a few morphological layers, at least within popular machine learning frameworks which use gradient descent based optimization algorithms. In this paper we investigate the potential and limitations of differentiation based approaches and back-propagation applied to morphological networks, in light of the non-smooth optimization concept of Bouligand derivative. We provide insights and first theoretical guidelines, in particular regarding initialization and learning rates.

翻訳日:2024-03-25 07:27:10 公開日:2024-02-05
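
The training difficulty mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common max-plus formulation of a dilation layer, y[j] = max_i (x[i] + w[i][j]); the abstract itself does not state the layer definition, so this formulation, the function names, and the tie-breaking rule are all assumptions made for illustration rather than the paper's method. With this formulation, each output passes gradient only to the single weight that attains the maximum, and the layer is non-differentiable wherever two candidates tie.

```python
def dilation_layer(x, w):
    """Max-plus (dilation) layer: y[j] = max_i (x[i] + w[i][j])."""
    return [max(x[i] + w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def dilation_grad_w(x, w, grad_y):
    """Gradient of a loss w.r.t. w, given grad_y = dLoss/dy.

    Only the (i, j) entry that attains the max for output j receives
    gradient; ties are broken by taking the first maximiser, which is
    an arbitrary choice made for this sketch.
    """
    grad = [[0.0] * len(w[0]) for _ in w]
    for j in range(len(w[0])):
        scores = [x[i] + w[i][j] for i in range(len(x))]
        winner = scores.index(max(scores))
        grad[winner][j] = grad_y[j]
    return grad


# Hypothetical input and weights, chosen only to make the sparsity visible.
x = [0.0, 2.0, 1.0]
w = [[0.5, -1.0], [0.0, 0.0], [0.2, 3.0]]
y = dilation_layer(x, w)
g = dilation_grad_w(x, w, [1.0, 1.0])
```

Here four of the six weights receive zero gradient for this input, which gives one intuition for why initialization and learning rate matter when several such layers are stacked.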

# 動的アタックグラフを用いたモノのインターネット(IoT)環境における脅威モデリング

Threat Modelling in Internet of Things (IoT) Environment Using Dynamic Attack Graphs ( http://arxiv.org/abs/2310.01689v2 )

ライセンス: Link先を確認

Marwa Salayma

(参考訳) 本研究は,IoT(Internet of Things,モノのインターネット)環境における攻撃経路の変更を動的に表現するための脅威モデリング手法を提案する。提案手法では,攻撃グラフを用いた脅威の伝播について検討する。しかし、従来のアタックグラフアプローチは、エンタープライズネットワークのように継続的に変更されない静的環境に適用され、静的で通常非常に大きなアタックグラフにつながる。対照的に、IoT環境は動的変更と相互接続によって特徴づけられることが多い。このような新たな相互接続は、対応するアタックグラフが変化するデバイス間の到達性の変化につながる。これは、脅威とリスク分析のための動的トポロジとアタックグラフを必要とする。本稿では,IoT環境において発生する動的システム変化に対処し,システムダイナミクスを許容しながら攻撃経路の特定を可能にする脅威モデリング手法を開発した。動的トポロジとアタックグラフは、関連するグラフを維持することで、IoT環境の変化に迅速に対応できる。研究の動機付けと提案したアプローチを説明するために,医療システムに基づく事例シナリオを紹介した。提案手法はグラフデータベース管理ツール(GDBM)-Neo4jを用いて実装されている。これは高度に接続されたデータのグラフをマッピング、視覚化、クエリするための一般的なツールであり、高速な脅威モデリングメカニズムを提供することで効率よく、動的IoT環境におけるセキュリティ変更のキャプチャに適している。

This work presents a threat modelling approach to represent changes to the attack paths through an Internet of Things (IoT) environment when the environment changes dynamically, i.e., when new devices are added or removed from the system or when whole sub-systems join or leave. The proposed approach investigates the propagation of threats using attack graphs. However, traditional attack graph approaches have been applied in static environments that do not continuously change such as the Enterprise networks, leading to static and usually very large attack graphs. In contrast, IoT environments are often characterised by dynamic change and interconnections; different topologies for different systems may interconnect with each other dynamically and outside the operator control. Such new interconnections lead to changes in the reachability amongst devices according to which their corresponding attack graphs change. This requires dynamic topology and attack graphs for threat and risk analysis. In this paper, a threat modelling approach is developed that copes with dynamic system changes that may occur in IoT environments and enables identifying attack paths whilst allowing for system dynamics. Dynamic topology and attack graphs were developed that are able to cope with the changes in the IoT environment rapidly by maintaining their associated graphs. To motivate the work and illustrate the proposed approach, an example scenario based on healthcare systems is introduced. The proposed approach is implemented using a Graph Database Management Tool (GDBM)- Neo4j- which is a popular tool for mapping, visualising and querying the graphs of highly connected data, and is efficient in providing a rapid threat modelling mechanism, which makes it suitable for capturing security changes in the dynamic IoT environment.

Translated: 2024-03-19 03:31:41 Published: 2024-02-05
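
The dynamic attack-graph idea in the abstract above can be illustrated with a small in-memory sketch. This is not the authors' Neo4j implementation: the `Topology` class, its method names, and the healthcare-style device names are all hypothetical, chosen only to show how attack paths might be recomputed after a device joins or leaves.

```python
from collections import defaultdict


class Topology:
    """Directed reachability graph between devices (hypothetical helper)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def connect(self, src, dst):
        # Record that an attacker on `src` can reach `dst`.
        self.edges[src].add(dst)

    def remove_device(self, device):
        # Drop the device and every edge that points at it.
        self.edges.pop(device, None)
        for targets in self.edges.values():
            targets.discard(device)

    def attack_paths(self, entry, target):
        """Return all simple paths from `entry` to `target`, sorted."""
        paths = []

        def walk(node, path):
            if node == target:
                paths.append(path)
                return
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in path:  # skip nodes already visited (no cycles)
                    walk(nxt, path + [nxt])

        walk(entry, [entry])
        return sorted(paths)


# Assumed example topology: one route from the internet to a patient database.
topo = Topology()
topo.connect("internet", "gateway")
topo.connect("gateway", "patient_db")
before = topo.attack_paths("internet", "patient_db")

# A wearable joins dynamically and opens a second route.
topo.connect("internet", "wearable")
topo.connect("wearable", "patient_db")
after = topo.attack_paths("internet", "patient_db")
```

Recomputing paths on every change is the simplest possible design; a graph database such as the one named in the abstract would instead maintain the graph persistently and answer path queries against it.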

# ALBERTA: ALgorithm-Based Error Resilience in Transformer Architectures ( http://arxiv.org/abs/2310.03841v2 )

License: see the linked page

Haoxuan Liu, Vasu Singh, Michał Filipiuk, Siva Kumar Sastry Hari

Vision Transformers are increasingly deployed in safety-critical applications that demand high reliability. It is crucial to ensure that they execute correctly despite potential errors such as transient hardware errors. We propose a novel algorithm-based resilience framework called ALBERTA that allows us to perform end-to-end resilience analysis and protection of transformer-based architectures. First, our work develops an efficient process for computing and ranking the resilience of transformer layers. We find that, due to the large size of transformer models, applying traditional network redundancy to a subset of the most vulnerable layers provides high error coverage, albeit with impractically high overhead. We address this shortcoming by providing a software-directed, checksum-based error detection technique aimed at protecting the most vulnerable general matrix multiply (GEMM) layers in transformer models that use either floating-point or integer arithmetic. Results show that our approach achieves over 99% coverage for errors that result in a mismatch, with less than 0.2% and 0.01% computation and memory overheads, respectively. Lastly, we present the applicability of our framework on various modern GPU architectures under different numerical precisions. We introduce an efficient self-correction mechanism for resolving erroneous detection with an average of less than 2% overhead per error.

Translated: 2024-03-19 03:12:08 Published: 2024-02-05
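
To make the notion of checksum-based error detection for GEMM more concrete, here is a minimal sketch in the spirit of classic algorithm-based fault tolerance. It is not ALBERTA's actual scheme: the function names, the tolerance parameter `tol`, and the injected fault are assumptions for illustration. The check relies on the identity that the column sums of `C = A·B` equal the column sums of `A` multiplied by `B`.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_checksums_ok(a, b, c, tol=1e-6):
    """Check that the column sums of c equal (column sums of a) times b."""
    a_sums = [[sum(col) for col in zip(*a)]]  # 1 x k row vector: e^T A
    expected = matmul(a_sums, b)[0]           # e^T A B, one extra vector-matrix product
    actual = [sum(col) for col in zip(*c)]    # e^T C
    # `tol` is an assumed allowance for floating-point rounding.
    return all(abs(e - s) <= tol for e, s in zip(expected, actual))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(a, b)
clean_ok = column_checksums_ok(a, b, c)

# Simulate a transient fault by corrupting one output element.
c_faulty = [row[:] for row in c]
c_faulty[1][0] += 16
faulty_ok = column_checksums_ok(a, b, c_faulty)
```

The checksum costs one vector-matrix product instead of a second full GEMM, which is why this family of techniques tends to have far lower overhead than duplicating the layer.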

# Incentive Non-Compatibility of Optimistic Rollups ( http://arxiv.org/abs/2312.01549v2 )

License: see the linked page

Daji Landis

Optimistic rollups are a popular and promising method of increasing the throughput capacity of their underlying chain. These methods rely on economic incentives to guarantee their security. We present a model of optimistic rollups suggesting that the incentives are not necessarily aligned with the expected behavior of the players, thus potentially undermining the security of existing optimistic rollups. We discuss some potential solutions illuminated by our model.

Translated: 2024-03-18 13:15:35 Published: 2024-02-05

# Code-Based Single-Server Private Information Retrieval: Circumventing the Sub-Query Attack ( http://arxiv.org/abs/2402.02871v1 )

License: see the linked page

Neehar Verma, Camilla Hollanti

Private information retrieval (PIR) from a single server is considered, utilizing random linear codes. Presented is a modified version of the first code-based single-server computational PIR scheme, proposed by Holzbaur, Hollanti, and Wachter-Zeh in [Holzbaur et al., "Computational Code-Based Single-Server Private Information Retrieval", 2020 IEEE ISIT]. The original scheme was broken in [Bordage et al., "On the privacy of a code-based single-server computational PIR scheme", Cryptogr. Comm., 2021] by an attack arising from highly probable rank differences in sub-matrices of the user's query. Here, this attack is circumvented by ensuring that the sub-matrices have negligible rank difference. Furthermore, the rank difference cannot be attributed to the desired file index, thereby ensuring the privacy of the scheme. In the case of retrieving multiple files, the rate of the modified scheme is largely unaffected and on par with the original scheme.

Translated: 2024-03-18 07:57:54 Published: 2024-02-05

# UniHENN: Designing More Versatile Homomorphic Encryption-based CNNs without im2col ( http://arxiv.org/abs/2402.03060v1 )

License: see the linked page

Hyunmin Choi, Jihun Kim, Seungho Kim, Seonhye Park, Jeongyong Park, Wonbin Choi, Hyoungshick Kim

Homomorphic encryption enables computations on encrypted data without decryption, which is crucial for privacy-preserving cloud services. However, deploying convolutional neural networks (CNNs) with homomorphic encryption encounters significant challenges, particularly in converting input data into a two-dimensional matrix for convolution, typically achieved using the im2col technique. While efficient, this method limits the variety of deployable CNN models due to compatibility constraints with the encrypted data structure. UniHENN, a homomorphic encryption-based CNN architecture, eliminates the need for im2col, ensuring compatibility with a diverse range of CNN models using homomorphic encryption. Our experiments demonstrate that UniHENN surpasses the leading 2D CNN inference architecture, PyCrCNN, in inference time, as evidenced by its performance with LeNet-1, where it averages 30.090 seconds, significantly faster than PyCrCNN's 794.064 seconds. Furthermore, UniHENN outperforms TenSEAL, which employs im2col, in processing concurrent images, an essential feature for high-demand cloud applications. The versatility of UniHENN is demonstrated across various CNN architectures, including a 1D CNN and six different 2D CNNs, highlighting its flexibility and efficiency. These qualities establish UniHENN as a promising solution for privacy-preserving, cloud-based CNN services, addressing the increasing demand for scalable, secure, and efficient deep learning in cloud computing environments.
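
For readers unfamiliar with im2col, the technique the abstract says UniHENN avoids, the following plaintext sketch shows how it turns a 2D convolution into a matrix product. It involves no homomorphic encryption and is not taken from UniHENN, PyCrCNN, or TenSEAL; the function names and the stride-1, no-padding setting are assumptions made for illustration.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj] for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_via_im2col(image, kernel):
    """Cross-correlation (the CNN 'convolution') as patch matrix times flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    # Fold the flat result back into a 2D feature map.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
patches = im2col(image, 2, 2)
output = conv2d_via_im2col(image, kernel)
```

Note that each input pixel is copied into several patch rows. In plaintext this duplication is cheap, but it fixes a particular data layout, which is the kind of compatibility constraint the abstract attributes to im2col under encryption.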