Fugu-MT: arXiv paper translations

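This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Fugu-Machine Translator is used for the translation.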

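The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).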

These are the papers with a publication date of 20240520.

Title	Authors	Abstract	Publication date / Translation date
# 多人数会話におけるヒューマン・アウェア・ロボットのマルチモーダル説明可能性アプローチ A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation ( http://arxiv.org/abs/2407.03340v1 ) ライセンス: Link先を確認	Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Alessandra Sciutti, Carlo Mazzola,	(参考訳) 住所推定(誰かの話に従えば)は、多人数会話のシナリオにおける人間の行動認識の基本的なタスクである。具体的には、人間とロボットの相互作用の分野では、このような対話的なコンテキストにソーシャルロボットを参加させることがさらに重要である。しかし、通常は二分分類タスクとして実装され、ロボットが対処したかどうかを推定し、対話的なスキルを制限する能力を制限する。社会ロボットが人間の信頼を得るためには、あるレベルの透明性と説明可能性を示すことも重要である。したがって、説明可能な人工知能は現在の機械学習アプリケーションやモデルにおいて重要な役割を果たす。私たちの仕事で、私たちは a) 前のSOTAと比較して性能が向上した宛先推定モデルを示すこと。 b) 本モデルをさらに変更して,本質的に説明可能な注意に基づく区分を含むこと。 c) iCubロボットにおける多人数会話のためのモジュール型認知アーキテクチャの一部として説明可能な宛先推定を実装した。 d) 上記アーキテクチャに説明可能性及び透明性を組み込むためのいくつかの方法を提案する。 e) 被験者がロボットをどのように知覚するかに関する様々な説明の効果を分析するために、パイロットユーザー研究を行う。 The addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. Specifically, in the field of human-robot interaction, it becomes even more crucial to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot's capability to estimate whether it was addressed and limiting its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence thus plays a significant role in the current machine learning applications and models, to provide explanations for their decisions besides excellent performance. 
In our work, we a) present an addressee estimation model with improved performance in comparison with the previous SOTA; b) further modify this model to include inherently explainable attention-based segments; c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; d) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and e) perform a pilot user study to analyze the effect of various explanations on how human participants perceive the robot.	翻訳日:2024-07-22 22:09:04 公開日:2024-05-20
# テキスト簡易化における依存距離の役割:人間とチャットGPTの簡易化比較 Role of Dependency Distance in Text Simplification: A Human vs ChatGPT Simplification Comparison ( http://arxiv.org/abs/2406.17787v1 ) ライセンス: Link先を確認	Sumi Lee, Gondy Leroy, David Kauchak, Melissa Just,	(参考訳) 本研究では,人間とチャットGPTテキストの簡易化とその依存距離との関係について検討する。従来のユーザスタディで測定された文法的難易度が増大する220文は、人間の専門家とChatGPTを用いて単純化された。その結果, 3つの文集合は, 平均依存距離が異なり, 原文集合の最上位, 後続のChatGPT簡易文, 人為的簡易文は平均依存距離が低かった。 This study investigates human and ChatGPT text simplification and its relationship to dependency distance. A set of 220 sentences, with increasing grammatical difficulty as measured in a prior user study, were simplified by a human expert and using ChatGPT. We found that the three sentence sets all differed in mean dependency distances: the highest in the original sentence set, followed by ChatGPT simplified sentences, and the human simplified sentences showed the lowest mean dependency distance.	翻訳日:2024-07-01 06:21:45 公開日:2024-05-20
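The abstract above compares sentence sets by mean dependency distance but does not define the quantity. A minimal sketch follows, assuming the commonly used definition: the average absolute difference between each token's position and the position of its syntactic head, with the root token excluded. The function name and the example parse are hypothetical illustrations and are not taken from the paper.

```python
def mean_dependency_distance(heads):
    """Mean dependency distance of one sentence.

    heads[i] is the 1-based position of the head of token i + 1;
    0 marks the root, which has no head and is skipped.
    """
    distances = [abs((i + 1) - head) for i, head in enumerate(heads) if head != 0]
    if not distances:
        # A one-word sentence has only a root, so no dependency to measure.
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical parse of "The cat sat on the mat":
# The -> cat, cat -> sat, sat = root, on -> mat, the -> mat, mat -> sat
example_heads = [2, 3, 0, 6, 6, 3]
print(mean_dependency_distance(example_heads))
```

Under this definition, a simplification that moves dependents closer to their heads lowers the score, which is the direction of the difference the abstract reports between the original and simplified sentence sets.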
# サンプル選択による3次元点雲正規分布推定の精細化 Refining 3D Point Cloud Normal Estimation via Sample Selection ( http://arxiv.org/abs/2406.18541v1 ) ライセンス: Link先を確認	Jun Zhou, Yaoshun Li, Hongchen Tan, Mingjie Wang, Nannan Li, Xiuping Liu,	(参考訳) 近年,3次元幾何処理の分野では,古典的・基礎的アルゴリズムとしての点雲正規推定が注目されている。現在のニューラルネットワークベースの手法によって達成された顕著なパフォーマンスにもかかわらず、その堅牢性はトレーニングデータの品質とモデルのパフォーマンスの影響を受け続けている。本研究では,グローバルな情報と様々な制約機構を組み込むことにより,正規化のための基本的枠組みを設計し,既存モデルを拡張した。さらに、信頼に基づく戦略を用いて、公平で堅牢なネットワークトレーニングのための妥当なサンプルを選択しました。導入されたサンプル信頼度は、モデルトレーニングにおける異なるサンプルの影響のバランスをとるために損失関数に統合することができる。最後に,従来の方向定式化手法を用いて非方向定式化を行い,非方向定式化タスクと非方向定式化タスクの両方で最先端性能を実現した。大規模な実験結果から,本手法は広く用いられているベンチマークでよく動作することが示された。 In recent years, point cloud normal estimation, as a classical and foundational algorithm, has garnered extensive attention in the field of 3D geometric processing. Despite the remarkable performance achieved by current Neural Network-based methods, their robustness is still influenced by the quality of training data and the models' performance. In this study, we designed a fundamental framework for normal estimation, enhancing existing model through the incorporation of global information and various constraint mechanisms. Additionally, we employed a confidence-based strategy to select the reasonable samples for fair and robust network training. The introduced sample confidence can be integrated into the loss function to balance the influence of different samples on model training. Finally, we utilized existing orientation methods to correct estimated non-oriented normals, achieving state-of-the-art performance in both oriented and non-oriented tasks. Extensive experimental results demonstrate that our method works well on the widely used benchmarks.	翻訳日:2024-07-01 06:12:00 公開日:2024-05-20
# マルチモーダルトランスを用いたAIによるLiDARポイントクラウド生成 Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer ( http://arxiv.org/abs/2406.18542v1 ) ライセンス: Link先を確認	Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci,	(参考訳) 統合センシングと通信は6G無線通信システムのキーイネーブルである。複数のセンシングモードにより、基地局は環境をより正確に表現することができ、コンテキスト対応の通信につながる。カメラやRADARセンサーなどの広範囲に装備されたセンサーは、いくつかの環境認識を提供することができる。しかし、特に悪天候下では、正確な環境表現を生成するには不十分である。一方、LiDARセンサーはより正確な表現を提供するが、その普及は高いコストで妨げられている。本稿では、画像とRADARデータからLiDAR点雲を合成し、無線通信システムを強化する新しい手法を提案する。具体的には、マルチモーダルトランスアーキテクチャと事前訓練された符号化モデルを使用して、正確なLiDAR生成を可能にする。提案するフレームワークは、コンテキスト対応無線アプリケーション用にキュレートされた実世界のデータセットであるDeepSense 6Gデータセットに基づいて評価される。本研究は,LiDAR点雲を高精度に生成する手法の有効性を示すものである。修正平均二乗誤差は10.3931である。画像の視覚的検査により,LiDAR点雲に存在する構造の大部分を多種多様な環境下で捉えることが可能であることが示唆された。これにより基地局はより正確な環境検知を行うことができる。既存のセンシングモードとLiDAR合成を統合することで、ビームや遮断予測を含む様々な無線アプリケーションの性能を向上させることができる。 Integrated sensing and communications is a key enabler for the 6G wireless communication systems. The multiple sensing modalities will allow the base station to have a more accurate representation of the environment, leading to context-aware communications. Some widely equipped sensors such as cameras and RADAR sensors can provide some environmental perceptions. However, they are not enough to generate precise environmental representations, especially in adverse weather conditions. On the other hand, the LiDAR sensors provide more accurate representations, however, their widespread adoption is hindered by their high cost. This paper proposes a novel approach to enhance the wireless communication systems by synthesizing LiDAR point clouds from images and RADAR data. Specifically, it uses a multimodal transformer architecture and pre-trained encoding models to enable an accurate LiDAR generation. The proposed framework is evaluated on the DeepSense 6G dataset, which is a real-world dataset curated for context-aware wireless applications. 
Our results demonstrate the efficacy of the proposed approach in accurately generating LiDAR point clouds. We achieve a modified mean squared error of 10.3931. Visual examination of the images indicates that our model can successfully capture the majority of structures present in the LiDAR point cloud for diverse environments. This will enable the base stations to achieve more precise environmental sensing. By integrating LiDAR synthesis with existing sensing modalities, our method can enhance the performance of various wireless applications, including beam and blockage prediction.	翻訳日:2024-07-01 06:12:00 公開日:2024-05-20
# 機械学習で実現可能なシステムエンジニアリングにおけるペインの命名 Naming the Pain in Machine Learning-Enabled Systems Engineering ( http://arxiv.org/abs/2406.04359v1 ) ライセンス: Link先を確認	Marcos Kalinowski, Daniel Mendez, Görkem Giray, Antonio Pedro Santos Alves, Kelly Azevedo, Tatiana Escovedo, Hugo Villamizar, Helio Lopes, Teresa Baldassarre, Stefan Wagner, Stefan Biffl, Jürgen Musil, Michael Felderer, Niklas Lavesson, Tony Gorschek,	(参考訳) コンテキスト: マシンラーニング(ML)対応システムは、製品や運用プロセスの強化を目指す企業によって、ますます採用されています。目的: 本論文は, ML対応システムの現状を概観し, 実践的, 問題駆動型学術研究の基盤となることを目的としている。方法: ML対応システムの現状と問題点について, 実践者から洞察を得るための国際調査を行った。 25カ国から188件の回答を受け取りました。本研究では,信頼区間を有するブートストラップを用いた現代的実践に関する定量的統計分析と,オープンおよび軸方向の符号化手法を用いて報告された問題の質的分析を行った。結果: ML対応システムに関する既存の実証的証拠を補強・拡張し,典型的なML対応システムプロジェクト状況,MLライフサイクルフェーズの認識と複雑性,問題理解,モデル展開,モデル監視に関する現在の実践について,さらなる知見を提供する。さらに、定性的分析は、MLライフサイクルの各フェーズで実践者が直面する問題と、プロジェクト全体の失敗を引き起こす問題の詳細マップを提供する。結論: 結果は,現状と実践環境の問題点の理解に寄与する。我々は、ML対応システムのエンジニアリングを強化するために、ソフトウェアエンジニアリングプラクティスのさらなる適応と普及を提唱する。 Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. 
Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.	翻訳日:2024-06-23 14:05:12 公開日:2024-05-20
# 勧告強化のための本質的・遠方的知識の抽出 Extracting Essential and Disentangled Knowledge for Recommendation Enhancement ( http://arxiv.org/abs/2406.00012v1 ) ライセンス: Link先を確認	Kounianhua Du, Jizheng Chen, Jianghao Lin, Menghui Zhu, Bo Chen, Shuai Li, Ruiming Tang,	(参考訳) 様々な産業シナリオにおいて、リコメンダモデルは重要な役割を担っているが、高速なシフトデータ配信、ユーザの興味の進化、販売プロモーション中のクリック信号の変動などによって引き起こされる悲惨な忘れ問題に直面していることが多い。この問題を緩和するためには、歴史的データから知識を再利用するのが一般的なアプローチである。しかし、巨大かつ高速に蓄積されたデータの保存は困難であり、劇的なストレージオーバーヘッドを引き起こす。次に、パラメトリックな知識ベースを通じて古いデータを記憶し、膨大な量の生データをモデルパラメータに圧縮する。柔軟性にも拘わらず、パラメトリック知識基盤の記憶と一般化能力を改善する方法は困難である。本稿では,従来のデータから本質的知識と不整合的知識を抽出する2つの制約を提案する。本質的な原理は、入力を代表ベクトルに圧縮し、タスク関連情報をキャプチャし、ノイズのある情報をフィルタリングするのに役立つ。アンタングル化原理は、格納された情報の冗長性を低減し、アンタングル化不変パターンのキャプチャにフォーカスする知識ベースをプッシュする。これら2つのルールは、堅牢で一般化された知識表現のための情報の合理的な圧縮を促進する。 2つのデータセットに対する大規模な実験は,提案手法の有効性を正当化するものである。 Recommender models play a vital role in various industrial scenarios, while often faced with the catastrophic forgetting problem caused by the fast shifting data distribution, e.g., the evolving user interests, click signals fluctuation during sales promotions, etc. To alleviate this problem, a common approach is to reuse knowledge from the historical data. However, preserving the vast and fast-accumulating data is hard, which causes dramatic storage overhead. Memorizing old data through a parametric knowledge base is then proposed, which compresses the vast amount of raw data into model parameters. Despite the flexibility, how to improve the memorization and generalization capabilities of the parametric knowledge base is challenging. In this paper, we propose two constraints to extract Essential and Disentangled Knowledge from past data for rational and generalized recommendation enhancement, which improves the capabilities of the parametric knowledge base without increasing the size of it. The essential principle helps to compress the input into representative vectors that capture the task-relevant information and filter out the noisy information. 
The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing the disentangled invariant patterns. These two rules together promote rational compression of information for robust and generalized knowledge representations. Extensive experiments on two datasets justify the effectiveness of the proposed method.	翻訳日:2024-06-09 16:19:21 公開日:2024-05-20
# 論文:文書要約とキーワード抽出と画像検索への応用 Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval ( http://arxiv.org/abs/2406.00013v1 ) ライセンス: Link先を確認	Jayaprakash Sundararaj,	(参考訳) 自動要約は、原文書の最も重要な点を保持する要約を生成するために、文書を縮小する過程である。本研究では,2つの問題について検討する。一画像再生のためのキーワード/カプセルの集合としてテキスト文書を要約すること。二書面に関連性及び感情を混同した意見要約を作成すること。そこで本研究では,既存のプレーンテキストニュース記事の相当量の向上に向けた推奨画像について紹介する。確率的モデルと単語類似性ヒューリスティックスを用いてキャプションを生成し、関連するフィードバック機構を持つランク集約フレームワークを用いて再ランク付けされたキーフレーズを抽出する。タグ付け文書やテキスト情報検索で一般的に使用されるランクアグリゲーションや関連するフィードバックは,画像検索の改善にも有効であることを示す。これらのクエリはYahoo Search Engineに送られ、関連する画像を取得する。提案手法は,既存のすべてのベースラインよりも優れた性能を示す。さらに,意見要約のための部分モジュラ関数の集合を提案する。意見要約は、その中に要約と感情検出のタスクが組み込まれている。しかし、感情を検知し、同時に要約を抽出することは容易ではない。この2つの課題は、圧縮の要求が感傷的な文を減少させ、感情検出の要求が冗長な文をもたらすという意味で矛盾する。しかし、サブモジュラリティを使って、この2つの要件のバランスをとる方法を示します。我々の関数は、文書の感情と要約の感情と良いROUGEスコアとの間に良い相関関係があるような要約を生成する。また,提案した部分モジュラ関数の性能を比較する。 Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems: i) summarizing a text document as a set of keywords/captions for image recommendation, and ii) generating an opinion summary that offers a good mix of relevancy and sentiment with respect to the text document. Initially, we present our work on recommending images to enhance a substantial number of existing plain-text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that such rank aggregation and relevance feedback, which are typically used in document tagging and text information retrieval, also help improve image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. 
Opinion summarization has built in it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract summary. The two tasks conflict in the sense that the demand of compression may drop sentiment bearing sentences, and the demand of sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with good ROUGE score. We also compare the performances of the proposed submodular functions.	翻訳日:2024-06-09 16:19:21 公開日:2024-05-20
# 黒体全内エネルギーの定義に関する最後の成果に関するブレークスルー Breaking news on last achievements on the definition of the black-body total internal energy ( http://arxiv.org/abs/2405.15806v1 ) ライセンス: Link先を確認	Lino Reggiani, Eleonora Alfinito,	(参考訳) 黒体の内部エネルギーは、現代物理学の発展において重要な物理量である。そこで,本稿では,短い歴史的展開とともに,この量の定義と性質に関する最新のニュース(2018-2024)を報告し,コメントする。最初のコメントは、ゼロ点エネルギーの存在によって引き起こされる真空カタストロフィを避けるカシミールエネルギーの包含に関するものであり、それによって境界効果に関連するさらなる量子的寄与につながった。第2のコメントは、量子ブラックボディにおける古典物理学の役割の再検討の可能性を示す1次元のブラックボディの半古典的なシミュレーションに関するものである。 The total internal energy of the black-body is a physical quantity of paramount importance in the development of modern physics. Accordingly, together with a brief historical development, we report and comment on the latest news (2018-2024) concerning the definition and properties of this quantity. The first comment concerns the inclusion of the Casimir energy, which avoids the vacuum catastrophe implied by the presence of zero-point energy, thus leading to further quantum contributions associated with boundary effects. The second comment concerns a semi-classical simulation of a one-dimensional black-body whose results suggest a possible reconsideration of the role of classical physics in the quantum black-body.	翻訳日:2024-06-02 14:30:04 公開日:2024-05-20
# 独立原子アンザッツからのH$_{2}$分子の解析的相関 Analytical Correlation in the H$_{2}$ Molecule from the Independent Atom Ansatz ( http://arxiv.org/abs/2405.15809v1 ) ライセンス: Link先を確認	Alanna 'Lanie' Leung, Alexander V. Mironenko,	(参考訳) 密度汎関数理論の独立原子アンサッツは、H$_{2}$分子の動的相関エネルギーに対して正確な解析的表現を与える:$E_{c} = 0.5(1 - \sqrt{2})(ab|ba)$ 原子付加自己整合密度$\rho = |a|^{2} + |b|^{2}$。正確な原子の自己交換と組み合わせると、ほぼ正確なSCAN交換相関エネルギーの99.5%以上を$R > 0.5$ Åで回収し、0.12eV以下である。全エネルギー関数はH-H結合を正しく解離させ、厳密な結合計算コストでの実験に対して0.002 Å, 0.19 eV, 13 cm$^{-1}$の絶対誤差を与える。化学結合の形成は、準直交原子状態($-(ab|ba)$)の漸近的ハイトラー・ロンドン共鳴によるもので、その結合の運動エネルギーや電荷蓄積に寄与しない。 The independent atom ansatz of density functional theory yields an accurate analytical expression for dynamic correlation energy in the H$_{2}$ molecule: $E_{c} = 0.5(1 - \sqrt{2})(ab|ba)$ for the atom-additive self-consistent density $\rho = |a|^{2} + |b|^{2}$. Combined with exact atomic self-exchange, it recovers more than 99.5% of nearly exact SCAN exchange-correlation energy at $R > 0.5$ Å, differing by less than 0.12 eV. The total energy functional correctly dissociates the H-H bond and yields absolute errors of 0.002 Å, 0.19 eV, and 13 cm$^{-1}$ relative to experiment at the tight binding computational cost. The chemical bond formation is attributed to the asymptotic Heitler-London resonance of quasi-orthogonal atomic states ($-(ab|ba)$) with no contributions from kinetic energy or charge accumulation in the bond.	翻訳日:2024-06-02 14:30:04 公開日:2024-05-20
# ディープニューラルネットワークにおけるマージンに基づく一般化予測について On margin-based generalization prediction in deep neural networks ( http://arxiv.org/abs/2405.17445v1 ) ライセンス: Link先を確認	Coenraad Mouton,	(参考訳) ディープニューラルネットワークにおける一般化を理解することは、研究の活発な領域である。有望な探索の道はマージンの測定であり、与えられたサンプルの判定境界から最短距離、またはネットワークの内部のサンプルの表現である。マージンに基づく複雑性測定は、いくつかの状況においてディープニューラルネットワークの一般化能力と相関することが示されているが、それ以外は関係していない。これらの指標の成功や失敗の背景にある理由は、現時点では不明である。本研究では,異なる環境下でのマージンに基づく一般化予測手法について検討する。これらのメトリクスが、時に正確な一般化の予測に失敗し、どのように改善できるかを動機付けています。まず、入力空間で測定されたマージンとサンプルノイズの関係を解析する。異なる種類のサンプルノイズが、ノイズデータをモデル化したネットワーク全体のマージンに、非常に異なる効果をもたらすことが判明した。これに続いて、異なる表現空間で測定されたロバストマージンが、一般化を予測する上でいかに頑健であるかを実証的に評価する。これらの指標にはいくつかの制限があり、多くの場合、大きなマージンは経験的リスクと強く相関しない。最後に、基礎となるデータ多様体の近似を組み込んだ新たなマージンベースの測度を導入する。この測度は、一般に他のすべてのマージンベースの測度よりも一般化の予測的であることが実証的に実証されている。さらに、この測定は、よく知られた一般化予測ベンチマークにおいて、他の現代の複雑性指標よりも優れていることが判明した。さらに,本手法の有用性と限界を分析し,この指標が先行作業で表現された直観とよく一致していることを確認した。 Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or that sample's representation internal to the network. Margin-based complexity measures have been shown to be correlated with the generalization ability of deep neural networks in some circumstances but not others. The reasons behind the success or failure of these metrics are currently unclear. In this study, we examine margin-based generalization prediction methods in different settings. We motivate why these metrics sometimes fail to accurately predict generalization and how they can be improved. First, we analyze the relationship between margins measured in the input space and sample noise. We find that different types of sample noise can have a very different effect on the overall margin of a network that has modeled noisy data. 
Following this, we empirically evaluate how robust margins measured at different representational spaces are at predicting generalization. We find that these metrics have several limitations and that a large margin does not exhibit a strong correlation with empirical risk in many cases. Finally, we introduce a new margin-based measure that incorporates an approximation of the underlying data manifold. It is empirically demonstrated that this measure is generally more predictive of generalization than all other margin-based measures. Furthermore, we find that this measurement also outperforms other contemporary complexity measures on a well-known generalization prediction benchmark. In addition, we analyze the utility and limitations of this approach and find that this metric is well aligned with intuitions expressed in prior work.	翻訳日:2024-06-02 14:30:04 公開日:2024-05-20
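The abstract above describes a margin as the shortest distance from a sample to the decision boundary. For a deep network this distance generally has to be approximated, but for a linear binary classifier it has a closed form, which makes the definition concrete. The sketch below covers only that linear special case; the function name and the numbers are hypothetical, and it is not the manifold-based measure the paper proposes.

```python
import math


def linear_margin(weights, bias, sample):
    """Euclidean distance from `sample` to the hyperplane w . x + b = 0.

    For a linear classifier sign(w . x + b), this distance is the
    input-space margin of the sample: |w . x + b| / ||w||.
    """
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return abs(score) / norm


# Hypothetical classifier and sample: w = (3, 4), b = -5, x = (3, 4)
print(linear_margin([3.0, 4.0], -5.0, [3.0, 4.0]))
```

A sample lying exactly on the boundary has margin zero. A margin-based complexity measure would aggregate such distances over many samples, computed either in the input space or in an internal representation space.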
# 病理組織学的特徴エクストラクタを用いた全スライド画像生存解析 Whole Slide Image Survival Analysis Using Histopathological Feature Extractors ( http://arxiv.org/abs/2405.17446v1 ) ライセンス: Link先を確認	Kleanthis Marios Papadopoulos,	(参考訳) WSI (Whole Slide Images) に含まれる情報の豊富さは, 予後評価に有用である。事前訓練されたResNetバックボーンを利用した多数のモデルがリリースされ、主にMIL(Multiple Instance Learning)に基づいたさまざまな機能集約技術が採用されている。最近リリースされたUNI機能抽出器を利用することで、既存のモデルはより高精度に適応することができ、デジタル病理学におけるより堅牢な予後ツールの道を開くことができる。 The abundance of information present in Whole Slide Images (WSIs) makes them useful for prognostic evaluation. A large number of models utilizing a pretrained ResNet backbone have been released and employ various feature aggregation techniques, primarily based on Multiple Instance Learning (MIL). By leveraging the recently released UNI feature extractor, existing models can be adapted to achieve higher accuracy, which paves the way for more robust prognostic tools in digital pathology.	翻訳日:2024-06-02 14:30:04 公開日:2024-05-20
# YUI: 簡易供給・需要曲線を用いた日頭電力価格予測 YUI: Day-ahead Electricity Price Forecasting Using Invariance Simplified Supply and Demand Curve ( http://arxiv.org/abs/2405.14893v1 ) ライセンス: Link先を確認	Linian Wang, Anlan Yu, Jianghong Liu, Huibing Zhang, Leye Wang,	(参考訳) 日頭電気市場においては、すべての市場参加者が意思決定プロセスの信頼性と正確な価格予測にアクセスできることが不可欠である。現在、産業用途で使われている予測手法は、価格形成の基盤となるメカニズムを無視することが多いが、供給・需要の観点からの経済研究は厳しいデータ収集要求を抱えており、実際の市場では適用が困難である。日頭電気市場の特徴を考察し,供給曲線と需要曲線のモデリングを簡略化するための2つの相違仮定を導入する。時間差の仮定を組み込むと、近年の複数のタイムスロットから市場均衡点を用いて供給曲線を予測できる。価格不感の仮定を導入することで、直線を用いて需要曲線を近似することができる。この2つの曲線が交差する点から、予測価格が得られます。提案したモデルでは, supplY と要求 cUrve を, YUI と呼ばれる不変性によって単純化し, 最先端の手法よりも効率的である。実験の結果,既存の手法と比較して,YUIは予測誤差をMAEで13.8%,sMAPEで28.7%削減できることがわかった。コードはhttps://github.com/wangln19/YUIで公開されている。 In the day-ahead electricity market, it is crucial for all market participants to have access to reliable and accurate price forecasts for their decision-making processes. Forecasting methods currently utilized in industrial applications frequently neglect the underlying mechanisms of price formation, while economic research from the perspective of supply and demand has stringent data collection requirements, making it difficult to apply in actual markets. Observing the characteristics of the day-ahead electricity market, we introduce two invariance assumptions to simplify the modeling of supply and demand curves. Upon incorporating the time invariance assumption, we can forecast the supply curve using the market equilibrium points from multiple time slots in the recent period. By introducing the price insensitivity assumption, we can approximate the demand curve using a straight line. The point where these two curves intersect provides us with the forecast price. The proposed model, forecasting supplY and demand cUrve simplified by Invariance, termed YUI, is more efficient than state-of-the-art methods. 
Our experimental results on the Shanxi day-ahead electricity market show that, compared with existing methods, YUI can reduce forecast error by 13.8% in MAE and 28.7% in sMAPE. Code is publicly available at https://github.com/wangln19/YUI.	翻訳日:2024-05-27 19:48:22 公開日:2024-05-20
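The forecasting idea in the abstract above (a supply curve built from recent market equilibrium points, a straight-line demand curve, and a forecast price read off at their intersection) can be illustrated with a toy sketch. This is not the authors' implementation, which is available at the repository linked in the abstract. It assumes that the supply curve is a piecewise-linear interpolation through recent (quantity, price) equilibria and that a price-insensitive demand curve is a vertical line at a forecast quantity. The function name and all numbers are hypothetical.

```python
def forecast_price(equilibria, demand_quantity):
    """Price at which a vertical demand line meets an interpolated supply curve.

    equilibria: (quantity, price) pairs observed in recent time slots.
    demand_quantity: forecast demand, assumed not to react to price.
    """
    points = sorted(equilibria)
    # Outside the observed range, clamp to the nearest observed price.
    if demand_quantity <= points[0][0]:
        return points[0][1]
    if demand_quantity >= points[-1][0]:
        return points[-1][1]
    # Otherwise interpolate linearly between the two neighbouring points.
    for (q0, p0), (q1, p1) in zip(points, points[1:]):
        if q0 <= demand_quantity <= q1:
            if q1 == q0:
                return (p0 + p1) / 2
            fraction = (demand_quantity - q0) / (q1 - q0)
            return p0 + fraction * (p1 - p0)


# Hypothetical equilibria from three recent time slots.
recent = [(100.0, 20.0), (200.0, 40.0), (150.0, 30.0)]
print(forecast_price(recent, 175.0))
```

The time invariance assumption is what justifies pooling equilibrium points from different time slots into a single supply curve. Clamping outside the observed range is a simplification made here for brevity.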
# 空間的自己回帰モデルのための伝達学習 Transfer Learning for Spatial Autoregressive Models ( http://arxiv.org/abs/2405.15600v1 ) ライセンス: Link先を確認	Hao Zeng, Wei Zhong, Xingbai Xu,	(参考訳) 空間自己回帰モデル (SAR) は, 被験者間の空間依存を特徴づけるために, 様々な経験的経済研究に広く応用されている。しかし,SARモデルの推定精度は,対象データのサンプルサイズが制限された場合に低下する。本稿では,SARモデルのための新しい伝達学習フレームワークを提案する。情報ソースデータセットが知られている場合、未知のパラメータを推定し、得られた推定値の理論的収束率を確立するために、転送段階とデバイアス段階を含む2段階のアルゴリズムを導入する。もしどのソースを転送すべきかがわからなければ、空間的残留ブートストラップに基づく情報ソースデータを検出し、必要な空間的依存を維持するために、転送可能なソース検出アルゴリズムが提案される。検出一貫性ももたらされる。シミュレーション研究は,情報ソースデータを用いて,従来の2段最小二乗推定器の性能を著しく向上させることを実証した。実証的な応用として、我々は、2016年アメリカ合衆国大統領選挙の投票データとその他の人口統計および地理的データを利用して、2020年アメリカ合衆国大統領選挙の揺動州における選挙予測に適用する。実験結果から,本手法は従来の推定方法よりも優れていることが示された。 The spatial autoregressive (SAR) model has been widely applied in various empirical economic studies to characterize the spatial dependence among subjects. However, the precision of estimating the SAR model diminishes when the sample size of the target data is limited. In this paper, we propose a new transfer learning framework for the SAR model to borrow the information from similar source data to improve both estimation and prediction. When the informative source data sets are known, we introduce a two-stage algorithm, including a transferring stage and a debiasing stage, to estimate the unknown parameters and also establish the theoretical convergence rates for the resulting estimators. If we do not know which sources to transfer, a transferable source detection algorithm is proposed to detect informative sources data based on spatial residual bootstrap to retain the necessary spatial dependence. Its detection consistency is also derived. Simulation studies demonstrate that using informative source data, our transfer learning algorithm significantly enhances the performance of the classical two-stage least squares estimator. In the empirical application, we apply our method to the election prediction in swing states in the 2020 U.S. 
presidential election, utilizing polling data from the 2016 U.S. presidential election along with other demographic and geographical data. The empirical results show that our method outperforms traditional estimation methods.	翻訳日:2024-05-27 13:40:24 公開日:2024-05-20
# 医学のための大規模言語モデル:サーベイ Large Language Models for Medicine: A Survey ( http://arxiv.org/abs/2405.13055v1 ) ライセンス: Link先を確認	Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, Philip S. Yu,	(参考訳) デジタル経済のデジタルインテリジェンスにおける課題に対処するため,大規模言語モデル(LLM)が開発されている。計算能力と利用可能な資源の改善により、LLMは大幅に進歩し、人間の生活のために様々な領域に統合された。医療用LSMは、様々な医療シナリオにまたがる潜在的な応用ツールである。本稿では,医療用LLMの要件と応用に焦点をあてて,LLMの発展を概観する。我々は,先進的な研究の方向性を探究し,将来の医学的応用のために研究者に利益をもたらすことを目的とした,既存モデルの簡潔な概要を提供する。アプリケーションにおける医療用LDMの利点と,その開発における課題を強調した。最後に,医療分野の要求に応えつつ,今後のLLMの課題と研究の方向性を緩和する技術統合の方向性を提案する。 To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, we review LLM developments, focusing on the requirements and applications of medical LLMs. We provide a concise overview of existing models, aiming to explore advanced research directions and benefit researchers for future medical applications. We emphasize the advantages of medical LLMs in applications, as well as the challenges encountered during their development. Finally, we suggest directions for technical integration to mitigate challenges and potential research directions for the future of medical LLMs, aiming to meet the demands of the medical field better.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# 新型コロナウイルスによる新聞記事の感情分析のための大規模言語モデル:The Guardian Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian ( http://arxiv.org/abs/2405.13056v1 ) ライセンス: Link先を確認	Rohitash Chandra, Baicheng Zhu, Qingying Fang, Eka Shinjikashvili,	(参考訳) 新型コロナウイルス(COVID-19)パンデミックの間、ニュースメディアは、ウイルス感染、医療資源の配分、政府の対応措置など、幅広いトピックをカバーした。新型コロナウイルスの感染拡大を抑えるために実施されるケースや政府戦略の台頭を踏まえ、ソーシャルメディアプラットフォームに対する感情分析について、公衆の反応を理解する研究が進められている。感情分析は、パンデミック中の社会的意見や感情的傾向の変化をよりよく理解することができる。ソーシャルメディア以外では、新聞は政府の情報、専門家、そして様々な話題に関する一般大衆の情報を広める上で重要な役割を担っている。新型コロナウイルス(COVID-19)感染拡大に伴う新聞ソースの感情分析は、メディアがパンデミックをどうカバーしているかを概観することができる。本研究では、The Guardian紙を選定し、初期感染、ロックダウン、ワクチン接種を含む新型コロナウイルスの様々な段階における感情分析を行う。我々は、新しい大規模言語モデル(LLM)を採用し、専門家による感情分析データを用いてそれらを洗練する。また、比較のためにパンデミック前に経験した感情の分析も提供する。その結果、パンデミックの初期段階において、公衆の感情が緊急の危機対応を優先し、後に健康と経済への影響に焦点を移したことが示唆された。ソーシャルメディアの感情分析に関する関連研究と比較すると,「ガーディアン」と「否定的感情」の優位性(不快感,不安感,否定感,否定感)の相違は,ソーシャルメディアがより多様な感情的反射をもたらすことを示唆している。 The Guardianでは、オーストラリア、イギリス、ワールドニュース、オピニオンを含むニュースセクションで、新型コロナウイルス前と期間中に、否定的な感情を総合的に支配する悲惨な物語を見つけた。 During the COVID-19 pandemic, the news media coverage encompassed a wide range of topics that includes viral transmission, allocation of medical resources, and government response measures. There have been studies on sentiment analysis of social media platforms during COVID-19 to understand the public response given the rise of cases and government strategies implemented to control the spread of the virus. Sentiment analysis can provide a better understanding of changes in societal opinions and emotional trends during the pandemic. Apart from social media, newspapers have played a vital role in the dissemination of information, including information from the government, experts, and also the public about various topics. A study of sentiment analysis of newspaper sources during COVID-19 for selected countries can give an overview of how the media covered the pandemic. 
In this study, we select The Guardian newspaper and provide a sentiment analysis during various stages of COVID-19, including initial transmission, lockdowns and vaccination. We employ novel large language models (LLMs) and refine them with expert-labelled sentiment analysis data. We also provide an analysis of sentiments experienced pre-pandemic for comparison. The results indicate that during the early pandemic stages, public sentiment prioritised urgent crisis response, later shifting focus to addressing the impact on health and the economy. In comparison with related studies about social media sentiment analyses, we found a discrepancy with The Guardian, where negative sentiments (sad, annoyed, anxious and denial) dominate, suggesting that social media offers a more diversified emotional reflection. We found a grim narrative in The Guardian with overall dominance of negative sentiments, both before and during COVID-19, across news sections including Australia, UK, World News, and Opinion.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# Githubの問題はTree Of Thoughtsで解決できるだろうか? Can Github issues be solved with Tree Of Thoughts? ( http://arxiv.org/abs/2405.13057v1 ) ライセンス: Link先を確認	Ricardo La Rosa, Corey Hulse, Bangdi Liu,	(参考訳) 大規模な言語モデル(LLM)によるコード生成に関する広範な研究は、HumanEvalのようなベンチマークが96.3%の成功率で上回っているが、これらのベンチマークは主に、基本的な関数レベルのコード生成におけるモデルのパフォーマンスを判断し、GitHubの問題を解決するような現実のシナリオに必要なスコープの批判的思考と概念を欠いている。本研究では,この複雑な課題に対するLLMの意思決定能力と問題解決能力を高めるために,思考のツリー(ToT)言語モデル推論フレームワークの適用について紹介する。従来のインプット・アウトプット(IO)プロンプトとレトリーバル・オーグメンテッド・ジェネレーション(RAG)技術と比較して、ToTは複数の推論軌道の構造化探索を容易にし、潜在的な解の自己評価を可能にすることで性能を向上させるように設計されている。私たちは、SWE-benchのインスタンスに含まれるGithubの問題に対処するために、ToTを実験的にデプロイします。しかし、この結果から、ToTフレームワークだけではLLMに既存のメソッドを上回る重要な理由付け能力を与えるには不十分であることが判明した。本稿では,これらの欠点の潜在的な原因を分析し,思考プロセスの深化やエージェント機能の導入など,改善のための重要な領域を特定する。本研究の知見は,ToTの応用と実世界の問題解決シナリオにおけるLCMの可能性を活かすための今後の方向性を示すことを目的としている。 While there have been extensive studies in code generation by large language models (LLM), where benchmarks like HumanEval have been surpassed with an impressive 96.3% success rate, these benchmarks predominantly judge a model's performance on basic function-level code generation and lack the critical thinking and concept of scope required of real-world scenarios such as solving GitHub issues. This research introduces the application of the Tree of Thoughts (ToT) language model reasoning framework for enhancing the decision-making and problem-solving abilities of LLMs for this complex task. Compared to traditional input-output (IO) prompting and Retrieval Augmented Generation (RAG) techniques, ToT is designed to improve performance by facilitating a structured exploration of multiple reasoning trajectories and enabling self-assessment of potential solutions. We experimentally deploy ToT in tackling a Github issue contained within an instance of the SWE-bench. However, our results reveal that the ToT framework alone is not enough to give LLMs the critical reasoning capabilities to outperform existing methods. 
In this paper we analyze the potential causes of these shortcomings and identify key areas for improvement such as deepening the thought process and introducing agentic capabilities. The insights of this research are aimed at informing future directions for refining the application of ToT and better harnessing the potential of LLMs in real-world problem-solving scenarios.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# 未来を創るAIコミュニティ : Hugging Face Hubの開発活動の定量的分析 The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub ( http://arxiv.org/abs/2405.13058v1 ) ライセンス: Link先を確認	Cailean Osborne, Jennifer Ding, Hannah Rose Kirk,	(参考訳) オープンソース開発者は、人工知能(AI)の政治経済において重要な役割を担い、クローズドソースのAI開発に代わるものとして、オープンモデル開発が認められている。しかし、オープンソースAIにおける協調的なプラクティスについては、まだ理解が限られています。本稿では,Hugging Face (HF) Hubにおける開発活動の定量的分析を3段階に分けて行うことで,このギャップに対処する。まず,348,181モデル,65,761データセット,および156,642スペースリポジトリのさまざまな種類の活動が,右に歪んだ分布を示す。例えば、70%以上のモデルが0回ダウンロードされており、1%が99%のダウンロードを占めている。第2に、モデル上でのコラボレーションによるソーシャルネットワークの構造のスナップショットを分析し、コミュニティがコア周辺構造を持ち、多作な開発者のコアと、孤立した開発者の大多数(89%)から成ることを発見した。孤立ノードを除くと、開発者のネットワーク位置に関わらず、協調は高い相互性によって特徴づけられる。第三に、Spacesにおけるモデル利用のレンズを通してモデルの採用を検討し、少数の企業が開発している少数のモデルがHF Hubで広く使われていることを発見した。全体として、HF Hub上のさまざまなタイプのアクティビティは、GitHubのようなプラットフォーム上のOSS開発パターンに関する以前の観察と一致して、パレート分布によって特徴づけられる。我々は、(オープンソース)AI研究者、開発者、政策立案者に対する発見の意味と提言に関する議論で締めくくります。 Open source developers have emerged as key actors in the political economy of artificial intelligence (AI), with open model development being recognised as an alternative to closed-source AI development. However, we still have a limited understanding of collaborative practices in open source AI. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, we find that various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Second, we analyse a snapshot of the social network structure of collaboration on models, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). 
Upon removing isolates, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, we find that various types of activity on the HF Hub are characterised by Pareto distributions, congruent with prior observations about OSS development patterns on platforms like GitHub. We conclude with a discussion of the implications of the findings and recommendations for (open source) AI researchers, developers, and policymakers.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# RNG:複合型マルチモーダルアスペクト感情分析のためのマルチレベルノイズと多粒度セマンティックギャップの低減 RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis ( http://arxiv.org/abs/2405.13059v1 ) ライセンス: Link先を確認	Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu,	(参考訳) 重要なマルチモーダル感情分析タスクであるJMASA(Joint Multimodal Aspect-Sentiment Analysis)は、与えられたテキストイメージ対からアスペクト項と関連する感情極性を共同抽出することを目的としており、関心が高まっている。既存の作業は,(1)多レベルモダリティノイズ,すなわち,インスタンスレベルと特徴レベルのノイズ,(2)多粒度セマンティックギャップ,すなわち粗粒度ときめ細かな粒度のギャップの2つの限界に直面する。どちらの問題もアスペクト・感情対の正確な識別に干渉する可能性がある。これらの制約に対処するため、我々はRNG for JMASAという新しいフレームワークを提案する。具体的には, マルチレベル・モダリティノイズと多粒度セマンティックギャップを同時に低減するために, (1) インスタンスレベルのノイズ低減のためのテキスト画像類似性に基づくグローバル関連性制約(GR-Con), (2) 特徴レベルのノイズ低減のための情報ボトルネック(IB)原理に基づく情報ボトルネック制約(IB-Con), (3) 多粒度セマンティックギャップ低減のための、対照学習による相互情報量最大化に基づくセマンティック一貫性制約(SC-Con)の3つの制約を設計する。 2つのデータセットに関する大規模な実験は、我々の新しい最先端のパフォーマンスを検証する。 As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), aiming to jointly extract aspect terms and their associated sentiment polarities from the given text-image pairs, has gained increasing concerns. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap, i.e., coarse- and fine-grained gap. Both issues may interfere with accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. 
Specifically, to simultaneously reduce multi-level modality noise and multi-grained semantic gap, we design three constraints: (1) Global Relevance Constraint (GR-Con) based on text-image similarity for instance-level noise reduction, (2) Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle for feature-level noise reduction, and (3) Semantic Consistency Constraint (SC-Con) based on mutual information maximization in a contrastive learning way for multi-grained semantic gap reduction. Extensive experiments on two datasets validate our new state-of-the-art performance.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# StatAvg: 侵入検知システムにおけるフェデレーション学習におけるデータ不均一性の軽減 StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems ( http://arxiv.org/abs/2405.13062v1 ) ライセンス: Link先を確認	Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis,	(参考訳) フェデレートラーニング(FL)は、サードパーティに生データを公開せずに、参加するデバイスが共同で機械学習(ML)またはディープラーニング(DL)モデルを構築することを可能にする、分散学習技術である。プライバシー保護の性質から、FLはサイバーセキュリティの領域内で侵入検知システム(IDS)を構築するために広く注目を集めている。しかし、参加ドメインやエンティティ間のデータの不均一性は、FLベースのIDSの信頼性を実現する上で大きな課題となる。本稿では,FLにおけるローカルクライアントのデータ間の非独立同分布(non-iid)の特徴を緩和する,統計的平均化(StatAvg)と呼ばれる効果的な手法を提案する。特にStatAvgは、FLクライアントが個々のデータ統計量をサーバと共有することを可能にし、サーバはこの情報を集約してグローバル統計量を生成する。後者はクライアントと共有され、ユニバーサルデータ正規化に使用される。 StatAvgは、実際のFLトレーニングプロセスの前に実行されるため、あらゆるFLアグリゲーション戦略とシームレスに統合できることは注目に値する。提案手法は,ネットワークおよびホスト向けの人工知能(AI)を用いたIDSのためのデータセットを用いて,ベースラインアプローチに対して評価される。実験により, FLクライアント間のnon-iid特徴分布の緩和におけるStatAvgの有効性を, ベースライン法と比較して実証した。 Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity. However, the data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically (non-iid) distributed features across local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The latter are shared with the clients and used for universal data normalisation. 
It is worth mentioning that StatAvg can seamlessly integrate with any FL aggregation strategy, as it occurs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# Aurora: 大気の基礎モデル Aurora: A Foundation Model of the Atmosphere ( http://arxiv.org/abs/2405.13063v1 ) ライセンス: Link先を確認	Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris,	(参考訳) ディープラーニング基盤モデルは、大量のデータを活用して、さまざまな下流タスクに取り組むために適応可能な汎用的な表現を学ぶことで、科学の多くの側面に革命をもたらしている。ファンデーションモデルは、地球系の膨大なデータを活用することで、地球とそのサブシステムをモデル化する能力も変革する、という約束を持っています。ここではAuroraを紹介します。Auroraは、100万時間以上の多様な気象および気候データに基づいてトレーニングされた大気の大規模な基盤モデルです。オーロラは基礎モデリングアプローチの強みを活用して、限られた訓練データ、異種変数、極端な事象を含む様々な大気予測問題に対する運用予測を生成する。 1分以内にオーロラは5日間の大気汚染予測と10日間の高解像度気象予測を生成し、最先端の古典的なシミュレーションツールと最高の専門的なディープラーニングモデルを上回った。これらの結果は, 基礎モデルが環境予測を変換できることを示唆している。 Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models. Taken together, these results indicate that foundation models can transform environmental forecasting.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# デジタルヘルスと室内空気の品質: 行動変化と技術受容のためのIoT駆動の人間中心の可視化プラットフォーム Digital Health and Indoor Air Quality: An IoT-Driven Human-Centred Visualisation Platform for Behavioural Change and Technology Acceptance ( http://arxiv.org/abs/2405.13064v1 ) ライセンス: Link先を確認	Rameez Raja Kureshi, Bhupesh Kumar Mishra, Dhavalkumar Thakker, Suvodeep Mazumdar, Xiao Li,	(参考訳) 大気汚染物質によるヒトの健康への影響は、室内空気質(IAQ)に対する懸念が高まっている。デジタルヘルスの介入や市民科学のイニシアチブの出現は、意識を高め、IAQを改善し、行動の変化を促進するための新たな道を提供してきた。 TAM(Technology Acceptance Model)は、IAQ技術のユーザ受け入れと採用を理解するための理論的枠組みを提供する。本稿では、COM-BモデルとIoT(Internet of Things)技術を用いて、人間中心のデジタル可視化プラットフォームを設計し、振る舞いの変化とIAQの改善をもたらすケーススタディを提案する。本研究は,IAQに対するユーザ体験,期待,影響に着目し,利用者の受け入れと採用についても検討した。 IAQセンシング、デジタル健康関連介入、市民科学、TAMモデルを統合することで、IAQの課題に対処し、公衆衛生を強化し、持続可能な屋内環境を育む機会を提供する。分析の結果,ヒトの行動,室内活動,意識などの要因がIAQの形成に重要な役割を担っていることが明らかとなった。 The detrimental effects of air pollutants on human health have prompted increasing concerns regarding indoor air quality (IAQ). The emergence of digital health interventions and citizen science initiatives has provided new avenues for raising awareness, improving IAQ, and promoting behavioural changes. The Technology Acceptance Model (TAM) offers a theoretical framework to understand user acceptance and adoption of IAQ technology. This paper presents a case study using the COM-B model and Internet of Things (IoT) technology to design a human-centred digital visualisation platform, leading to behavioural changes and improved IAQ. The study also investigates users' acceptance and adoption of the technology, focusing on their experiences, expectations, and the impact on IAQ. Integrating IAQ sensing, digital health-related interventions, citizen science, and the TAM model offers opportunities to address IAQ challenges, enhance public health, and foster sustainable indoor environments. The analytical results show that factors such as human behaviour, indoor activities, and awareness play crucial roles in shaping IAQ.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# 教師の人工知能認知の探求--K-12教育における人間とAIの相補性の機会と課題- Exploring Teachers' Perception of Artificial Intelligence: The Socio-emotional Deficiency as Opportunities and Challenges in Human-AI Complementarity in K-12 Education ( http://arxiv.org/abs/2405.13065v1 ) ライセンス: Link先を確認	Soon-young Oh, Yongsu Ahn,	(参考訳) 学校では、教師は教育者、カウンセラー、意思決定者、学校コミュニティのメンバーとして多くの役割を担っている。人工知能(AI)の最近の進歩により、AIが教師にどのように支援し、補完し、協力できるかが議論されている。本研究は,学校における教師とAIの補完関係を改善するために,教師とAIの相補性に関する談話の拡大を目的とした。韓国の小学校教師100名による調査と、12人の教師との詳細なインタビューの混合手法を用いて、教師は、管理タスクを自動化し、高度な知性を通じてパーソナライズされた学習を強化することで、AIが人間の教師を補完する可能性を期待していることを示唆した。興味深いことに、AIの社会的感情能力の欠如は、課題と機会の両方として認識されている。全体として、我々の研究は教師の微妙な認識と彼らの役割に対する様々な期待レベルを示し、教育者の好みや関心に合わせたAIの採用に関する決定の必要性に挑戦する。 In schools, teachers play a multitude of roles, serving as educators, counselors, decision-makers, and members of the school community. With recent advances in artificial intelligence (AI), there is increasing discussion about how AI can assist, complement, and collaborate with teachers. To pave the way for better teacher-AI complementary relationships in schools, our study aims to expand the discourse on teacher-AI complementarity by seeking educators' perspectives on the potential strengths and limitations of AI across a spectrum of responsibilities. Through a mixed method using a survey with 100 elementary school teachers in South Korea and in-depth interviews with 12 teachers, our findings indicate that teachers anticipate AI's potential to complement human teachers by automating administrative tasks and enhancing personalized learning through advanced intelligence. Interestingly, the deficit of AI's socio-emotional capabilities has been perceived as both challenges and opportunities. Overall, our study demonstrates the nuanced perception of teachers and different levels of expectations over their roles, challenging the need for decisions about AI adoption tailored to educators' preferences and concerns.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# 機械学習型NIDSのための分散処理フレームワークの実用化 Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS ( http://arxiv.org/abs/2405.13066v1 ) ライセンス: Link先を確認	Maho Kajiura, Junya Nakamura,	(参考訳) ネットワーク侵入検知システム(NIDS)は、ネットワークトラフィックにおける侵入攻撃を検出する。特に、未知の攻撃の検出率が高いため、機械学習ベースのNIDSが注目されている。スケーラブルな分散ストリーム処理システムを用いた機械学習に基づくNIDSのための分散処理フレームワークが文献で提案されている。しかし、機械学習に基づく分類器が実装された場合のパフォーマンスは包括的に評価されていない。本研究では,本フレームワークに基づく5つの代表的な分類器(決定木,ランダムフォレスト,ネイブベイズ,SVM,kNN)を実装し,そのスループットとレイテンシを評価する。実験により,これらの分類器間の処理性能の違いと,フレームワークの処理性能のボトルネックについて検討した。 Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates of unknown attacks. A distributed processing framework for machine-learning-based NIDSs employing a scalable distributed stream processing system has been proposed in the literature. However, its performance, when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) based on this framework and evaluate their throughput and latency. By conducting the experimental measurements, we investigate the difference in the processing performance among these classifiers and the bottlenecks in the processing performance of the framework.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# Lockpicking LLM: トークンレベルの操作を用いたロジットベースのジェイルブレイク Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation ( http://arxiv.org/abs/2405.13068v1 ) ライセンス: Link先を確認	Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang,	(参考訳) 大規模言語モデル(LLM)は、自然言語処理の分野を変えてきたが、意図しない、潜在的に有害なコンテンツを生成する能力を利用するジェイルブレイク攻撃の影響を受け続けている。既存のトークンレベルのジェイルブレイクテクニックは有効だが、特にモデルが頻繁な更新を行い、高度な防御措置を取り入れているため、スケーラビリティと効率の課題に直面している。本稿では,これらの制約に効果的に対応する革新的なトークンレベルの操作手法であるJailMineを紹介する。 JailMineは、肯定的なアウトプットを戦略的に選択し、拒否の可能性を反復的に低減することで、LLMから悪意ある応答を抽出する自動化された"マイニング"プロセスを採用している。複数の有名なLLMとデータセットの厳密なテストを通じて、JailMineの有効性と効率を実証し、進化する防衛戦略に直面した場合でも、平均95%の成功率を維持しながら、使用時間の86%の大幅な削減を実現した。我々の研究は、LLMの脆弱性をジェイルブレイク攻撃に対して評価し緩和するための継続的な努力に寄与し、これらの強力な言語モデルのセキュリティと信頼性を高めるための継続的な警戒と積極的な対策の重要性を強調している。 Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate unintended and potentially harmful content. Existing token-level jailbreaking techniques, while effective, face scalability and efficiency challenges, especially as models undergo frequent updates and incorporate advanced defensive measures. In this paper, we introduce JailMine, an innovative token-level manipulation approach that addresses these limitations effectively. JailMine employs an automated "mining" process to elicit malicious responses from LLMs by strategically selecting affirmative outputs and iteratively reducing the likelihood of rejection. Through rigorous testing across multiple well-known LLMs and datasets, we demonstrate JailMine's effectiveness and efficiency, achieving a significant average reduction of 86% in time consumed while maintaining high success rates averaging 95%, even in the face of evolving defensive strategies. 
Our work contributes to the ongoing effort to assess and mitigate the vulnerability of LLMs to jailbreaking attacks, underscoring the importance of continued vigilance and proactive measures to enhance the security and reliability of these powerful language models.	翻訳日:2024-05-25 04:32:08 公開日:2024-05-20
# ニュース記事イベントベース埋め込みの新しい手法 A Novel Method for News Article Event-Based Embedding ( http://arxiv.org/abs/2405.13071v1 ) ライセンス: Link先を確認	Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach,	(参考訳) ニュース記事の埋め込みは、メディアバイアスの検出、偽ニュースの特定、ニュースレコメンデーションなど、複数の分野にとって重要なツールである。しかし、既存のニュース埋め込み手法は、ニュースイベントの潜在コンテキストをキャプチャするために最適化されていない。多くの場合、ニュース埋め込み手法は全文情報に依存し、時間関連埋め込み生成の重要性を無視する。そこで本稿では,記事に言及されているエンティティやテーマと,特定のイベントへの歴史的関連性に注目して,ニュース埋め込み生成を最適化する,新たな軽量な手法を提案する。 3段階からなる手法を提案する。まず、与えられたニュース記事のイベント、エンティティ、テーマを処理し、抽出する。第2に、現在および歴史的データに基づいて、時間的に分離されたGloVeモデルをトレーニングすることで、テーマやエンティティの周期的な時間埋め込みを生成する。最後に、記事レベルのベクトルに対するSIF(Smooth Inverse Frequency)と、イベント関連情報による埋め込みのためのSiamese Neural Networksの2つの異なるアプローチによって生成されたニュース埋め込みを結合する。我々はGDELTプロジェクトから,85万件以上のニュース記事と1,000,000件のイベントを収集し,評価を行った。検証のために、我々は異なるニュース埋め込み生成手法の比較分析を行い、共有イベント検出タスクに2回適用した。提案手法は,すべてのタスクやデータセットに対して,精度・リコール(PR)AUCを大幅に改善することを示す。具体的には,SIFと比較して平均的PR AUC改善率は2.15%,2.57%,日毎および月毎の共有イベント検出タスクに対する半監督的アプローチに比べて2.57%,2.43%であった。 Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and news recommendations. However, existing news embedding methods are not optimized for capturing the latent context of news events. In many cases, news embedding methods rely on full-textual information and neglect the importance of time-relevant embedding generation. Here, we aim to address these shortcomings by presenting a novel lightweight method that optimizes news embedding generation by focusing on the entities and themes mentioned in the articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract the events, entities, and themes for the given news articles. Second, we generate periodic time embeddings for themes and entities by training timely separated GloVe models on current and historical data. 
Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. To test and evaluate our method, we leveraged over 850,000 news articles and 1,000,000 events from the GDELT project. For validation purposes, we conducted a comparative analysis of different news embedding generation methods, applying them twice to a shared event detection task - first on articles published within the same day and subsequently on those published within the same month. Our experiments show that our method significantly improves the Precision-Recall (PR) AUC across all tasks and datasets. Specifically, we observed an average PR AUC improvement of 2.15% and 2.57% compared to SIF, as well as 2.57% and 2.43% compared to the semi-supervised approach for daily and monthly shared event detection tasks, respectively.	翻訳日:2024-05-25 04:22:11 公開日:2024-05-20
# メタ変数を持つ異種データセットのグラフ構造距離 A graph-structured distance for heterogeneous datasets with meta variables ( http://arxiv.org/abs/2405.13073v1 ) ライセンス: Link先を確認	Edward Hallé-Hannan, Charles Audet, Youssef Diouane, Sébastien Le Digabel, Paul Saves,	(参考訳) 不均一データセットは、さまざまなデータソース、さまざまなデータタイプ、変数間の複雑な関係を特徴とする、さまざまな機械学習や最適化アプリケーションに現れる。実際には、ヘテロジニアスデータセットは、処理が容易なより小さな、よく理解されたデータセットに分割されることが多い。しかしながら、一部のアプリケーションは、高コストで生成または制限されたサイズデータセットを含んでおり、データセット全体に基づいたメソッドを動機付けている。この研究の最初の貢献は、最先端の階層的、木構造的、変数サイズのフレームワークを一般化するグラフ構造化フレームワークのモデリングである。このフレームワークは、変数が連続的、整数的、またはカテゴリー的であるような異種データセットを含むドメインをモデル化する。除外された変数は、与えられたポイントに応じて含まれるか除外される変数を管理するために導入された。 2つ目の主な貢献はグラフ構造距離であり、拡張点と含められた変数と除外された変数の組み合わせを比較する:任意の一対の点を比較することができ、メタ変数を持つ異種データセットで直接動作することができる。コントリビューションはいくつかの回帰実験で説明され、ハイパーパラメーターに対する多層パーセプトロンの性能は逆距離重み付けと$K$-nearest neighborsモデルでモデル化される。 Heterogeneous datasets emerge in various machine learning or optimization applications that feature different data sources, various data types and complex relationships between variables. In practice, heterogeneous datasets are often partitioned into smaller well-behaved ones that are easier to process. However, some applications involve expensive-to-generate or limited size datasets, which motivates methods based on the whole dataset. The first main contribution of this work is a modeling graph-structured framework that generalizes state-of-the-art hierarchical, tree-structured, or variable-size frameworks. This framework models domains that involve heterogeneous datasets in which variables may be continuous, integer, or categorical, with some identified as meta if their values determine the inclusion/exclusion or affect the bounds of other so-called decreed variables. Excluded variables are introduced to manage variables that are either included or excluded depending on the given points. 
The second main contribution is the graph-structured distance that compares extended points with any combination of included and excluded variables: any pair of points can be compared, allowing to work directly in heterogeneous datasets with meta variables. The contributions are illustrated with some regression experiments, in which the performance of a multilayer perceptron with respect to its hyperparameters is modeled with inverse distance weighting and $K$-nearest neighbors models.	翻訳日:2024-05-25 04:22:11 公開日:2024-05-20
# 量子トモグラフィーは量子力学を説明する Quantum tomography explains quantum mechanics ( http://arxiv.org/abs/2110.05294v5 ) ライセンス: Link先を確認	Arnold Neumaier,	(参考訳) ボーンの法則よりむしろ量子トモグラフィにインスパイアされた新しい原理から始まり、量子力学と量子測定に対する自己完結型導出的アプローチを提供する。量子検出器を構成するものとその反応の振る舞いに対する示唆的な概念は、論理的に申し分のない測定の定義につながる。光状態、位置測定、粒子軌道の測定スキームへの応用は、理想化なしに複雑な現実的な実験に適用可能であることを示す。量子状態、量子検出器、量子プロセス、量子機器のための様々な形態の量子トモグラフィについて論じる。量子力学の伝統的な力学とスペクトルの性質は、量子過程の連続極限から導かれ、混合量子系の密度作用素に対するリンドブラッド方程式と、純粋な非混合量子系の状態ベクトルに対するシュレーディンガー方程式を与える。正規化密度作用素は古典位相空間変数の位置と運動量と完全に類似した量子位相空間変数の役割を果たす。測定過程のわずかな理想化は、量子場の概念に結びつき、量子期待は、測定可能な空間の領域の再現可能な性質として現れる。新しいアプローチは、従来の基礎よりも実践に近いものです。より一般的であり、従ってより強力である。従来の手法よりもシンプルで技術的には少ないため、量子力学の標準的なツールを導出するのは難しくない。これにより、新しいアプローチは量子力学の入門コースに適合する。文学から引用された様々な引用は、歴史的・哲学的な側面で形式的な展示を照らしている。 Starting from a new principle inspired by quantum tomography rather than from Born's rule, this paper gives a self-contained deductive approach to quantum mechanics and quantum measurement. A suggestive notion for what constitutes a quantum detector and for the behavior of its responses leads to a logically impeccable definition of measurement. Applications to measurement schemes for optical states, position measurements and particle tracks demonstrate the applicability to complex realistic experiments without any idealization. The various forms of quantum tomography for quantum states, quantum detectors, quantum processes, and quantum instruments are discussed. The traditional dynamical and spectral properties of quantum mechanics are derived from a continuum limit of quantum processes, giving the Lindblad equation for the density operator of a mixing quantum system and the Schrödinger equation for the state vector of a pure, nonmixing quantum system. Normalized density operators are shown to play the role of quantum phase space variables, in complete analogy to the classical phase space variables position and momentum. 
A slight idealization of the measurement process leads to the notion of quantum fields, whose smeared quantum expectations emerge as reproducible properties of regions of space accessible to measurements. The new approach is closer to actual practice than the traditional foundations. It is more general, and therefore more powerful. It is simpler and less technical than the traditional approach, and the standard tools of quantum mechanics are not difficult to derive. This makes the new approach suitable for introductory courses on quantum mechanics. A variety of quotes from the literature illuminate the formal exposition with historical and philosophical aspects.	翻訳日:2024-05-22 19:47:36 公開日:2024-05-20
# 観測される観測者の量子力学則と量子論の整合性 Quantum mechanical rules for observed observers and the consistency of quantum theory ( http://arxiv.org/abs/2202.04203v2 ) ライセンス: Link先を確認	Alexios P. Polychronakos,	(参考訳) ユニタリ量子力学の規則から、自らがマクロ状態の線形結合における測定(「猫」測定)の対象となる観測者は、そのような測定の後に行われる実験の結果について信頼できる予測を行えないことが導かれると論じる。これにより、Frauchiger と Renner が最近指摘した量子力学の解釈における矛盾が解消される。結果の確率を計算し、他の観測者と通信するためのボーン則は、一般には猫測定を受ける観測者には適用されず、また今後の猫測定を組み込むように一般に修正することもできない。これらの条件で補完された量子力学的規則は、完全に一貫したものになる。 I argue that the rules of unitary quantum mechanics imply that observers who will themselves be subject to measurements in a linear combination of macroscopic states ("cat" measurements) cannot make reliable predictions on the results of experiments performed after such measurements. This lifts the inconsistency in the interpretation of quantum mechanics recently identified by Frauchiger and Renner. The Born rules for calculating the probability of outcomes and for communicating with other observers do not generally apply for cat-measured observers, nor can they generally be amended to incorporate upcoming cat measurements. Quantum mechanical rules completed with these conditions become fully consistent.	翻訳日:2024-05-22 19:47:36 公開日:2024-05-20
# リプシッツ平滑性を考慮した簡易制御ランダムリシャッフル勾配アルゴリズムの収束性 Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness ( http://arxiv.org/abs/2212.01848v3 ) ライセンス: Link先を確認	Ruggiero Seccia, Corrado Coppola, Giampaolo Liuzzi, Laura Palagi,	(参考訳) 本研究では,非常に多くのスムーズかつ非凸関数の平均を最小化することを検討するとともに,この最適化問題に対処するために広く利用されている2つのミニバッチフレームワークであるインクリメンタルグラディエント(IG)とランダムリシャッフル(RR)に焦点を当てる。我々は IG/RR スキームの緩和制御的な修正を定義するが、これはわずかな追加計算量しか必要としない一方で、弱く標準的な仮定の下で収束することを証明できる。特に、IG/RRイテレーションをウォッチドッグルールと、収束を保証するために散発的にのみ活性化するデリバティブフリーライン探索を用いて制御する2つのアルゴリズムスキームを定義する。 2つのスキームは、モノトニックまたは非モノトニックな規則を用いて実行される、ウォッチドッグとライン検索で異なる。この2つのスキームは、メインIG/RRイテレーションで使用されるステップサイズの更新を制御でき、ステップサイズをあまりに速くゼロに近づけてしまうような事前設定ルールの使用を避けることができ、ステップサイズを効果的に更新するルールを設計する労力を減らすことができる。成分関数の勾配のリプシッツ連続性の軽微な仮定の下で収束性を証明し、異なるディープニューラルネットワークアーキテクチャと様々なサイズデータセットのベンチマークを用いて広範な計算解析を行う。我々は,本手法を全バッチ勾配法(L-BFGS)とIG/RR法の両方と比較し,我々のアルゴリズムが他のオンラインアルゴリズムと同じような計算作業を必要とすること,学習率の制御が目的関数のより速い減少を可能にしうることを示した。 In this work, we consider minimizing the average of a very large number of smooth and possibly non-convex functions, and we focus on two widely used minibatch frameworks to tackle this optimization problem: Incremental Gradient (IG) and Random Reshuffling (RR). We define ease-controlled modifications of the IG/RR schemes, which require a light additional computational effort but can be proved to converge under weak and standard assumptions. In particular, we define two algorithmic schemes in which the IG/RR iteration is controlled by using a watchdog rule and a derivative-free linesearch that activates only sporadically to guarantee convergence. The two schemes differ in the watchdog and the linesearch, which are performed using either a monotonic or a non-monotonic rule. The two schemes also allow controlling the updating of the stepsize used in the main IG/RR iteration, avoiding the use of pre-set rules that may drive the stepsize to zero too fast, reducing the effort in designing effective updating rules of the stepsize. 
We prove convergence under the mild assumption of Lipschitz continuity of the gradients of the component functions and perform extensive computational analysis using different deep neural architectures and a benchmark of varying-size datasets. We compare our implementation with both a full batch gradient method (i.e. L-BFGS) and an implementation of IG/RR methods, proving that our algorithms require a similar computational effort compared to the other online algorithms and that the control on the learning rate may allow a faster decrease of the objective function.	翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
# バラを五千通りに見る Seeing a Rose in Five Thousand Ways ( http://arxiv.org/abs/2212.04965v2 ) ライセンス: Link先を確認	Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu,	(参考訳) 視覚的に、バラとは何か? バラは、その対象カテゴリーに特有の幾何学、テクスチャ、および材質の分布を含む内在的性質から成る。これらの内在的性質を知ることで、異なる大きさと形状のバラを異なるポーズで、異なる照明条件下でレンダリングすることができる。本研究では,花束の写真など,一つの画像からそのような物体の内在的性質を捉えることを学習する生成モデルを構築する。このようなイメージには、オブジェクトタイプの複数のインスタンスが含まれている。これらのインスタンスは全て同じ内在的性質を共有しているが、内在的性質の中でのばらつきと、ポーズや照明のような外在的要因の違いの組み合わせにより異なって見える。実験により,本モデルは多種多様な物体について,それぞれ1枚のインターネット画像から物体の内在的性質(形状,テクスチャ,材質の分布)を学習することに成功した。提案手法は,内在的画像分解,形状と画像生成,ビュー合成,リライティングなど,複数のダウンストリームタスクにおいて優れた結果が得られる。 What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.	翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
# 線形代数に対する量子ビット効率の良いランダム化量子アルゴリズム Qubit-Efficient Randomized Quantum Algorithms for Linear Algebra ( http://arxiv.org/abs/2302.01873v3 ) ライセンス: Link先を確認	Samson Wang, Sam McArdle, Mario Berta,	(参考訳) 本稿では,行列関数に対する量子ブロック符号化や他のコヒーレントなオラクルアクセスを使わずに,行列関数からサンプリングするタスクのためのランダム化量子アルゴリズムのクラスを提案する。したがって、量子ビットの使用は純粋にアルゴリズムであり、量子データ構造には追加の量子ビットは必要ない。我々のアルゴリズムは、関心の行列がパウリ基底で指定される古典的なデータ構造から始まる。 N\times N$ Hermitian 行列の場合、空間コストは$\log(N)+1$ qubitsであり、行列の構造によっては、ゲートの複雑さは、等価なエンドツーエンドの問題を考えるとき、最大$O(N^2)$の量子データ構造を使用する最先端の手法に匹敵する。本フレームワークでは,解ベクトルの性質をサンプリングする量子線形系解法と,ハミルトンの基底状態とギブス状態の特性をサンプリングするアルゴリズムを提案する。具体的な応用として、これらのサブルーチンを組み合わせて、量子多体系のグリーン関数を計算するスキームを示す。 We propose a class of randomized quantum algorithms for the task of sampling from matrix functions, without the use of quantum block encodings or any other coherent oracle access to the matrix elements. As such, our use of qubits is purely algorithmic, and no additional qubits are required for quantum data structures. Our algorithms start from a classical data structure in which the matrix of interest is specified in the Pauli basis. For $N\times N$ Hermitian matrices, the space cost is $\log(N)+1$ qubits and depending on the structure of the matrices, the gate complexity can be comparable to state-of-the-art methods that use quantum data structures of up to size $O(N^2)$, when considering equivalent end-to-end problems. Within our framework, we present a quantum linear system solver that allows one to sample properties of the solution vector, as well as algorithms for sampling properties of ground states and Gibbs states of Hamiltonians. As a concrete application, we combine these sub-routines to present a scheme for calculating Green's functions of quantum many-body systems.	翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
# 早期出力を用いた深部ニューラルネットワークの階層的学習 Hierarchical Training of Deep Neural Networks Using Early Exiting ( http://arxiv.org/abs/2303.02384v4 ) ライセンス: Link先を確認	Yamin Sepehri, Pedram Pad, Ahmet Caner Yüzügüler, Pascal Frossard, L. Andrea Dunbar,	(参考訳) 深層ニューラルネットワークは、ビジョンタスクに最先端の精度を提供するが、トレーニングにはかなりのリソースを必要とする。これにより、データを取得するエッジデバイスから遠く離れたクラウドサーバでトレーニングされる。この問題は通信コスト、ランタイム、プライバシの懸念を高める。本研究では,エッジとクラウドワーカを分割したアーキテクチャで早期のエグジットを利用して通信コスト,トレーニングランタイム,プライバシの懸念を緩和する,ディープニューラルネットワークの新しい階層的トレーニング手法を提案する。本手法では,トレーニング期間中のエッジとクラウド間のニューラルネットワークの後方通過を分離するために,早期出口の新しいユースケースを提案する。トレーニングフェーズのシーケンシャルな性質のため、階層のレベルを同時にトレーニングできない、あるいはプライバシを妥協するコストで実行できない、最も利用可能なメソッドの問題に対処する。対照的に,本手法はエッジとクラウドワーカを同時に使用することができ,生の入力データをクラウドと共有せず,後方通過時の通信も不要である。異なるニューラルネットワークアーキテクチャに対するいくつかのシミュレーションとオンデバイス実験は、この方法の有効性を実証している。 CIFAR-10分類では,VGG-16およびResNet-18アーキテクチャのトレーニングランタイムを29%,61%削減し,低ビットレートチャネル上でクラウドとの通信を行う場合,Tiny ImageNet分類では25%,81%削減した。この実行時の利得は達成され、精度低下は無視される。この方法は、エッジクラウドシステムの一部として、携帯電話やロボットなどのセンサ保有の低リソースデバイス上での、高精度なディープニューラルネットワークのオンライン学習に有利である。 Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers to reduce the communication cost, training runtime and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that due to the sequential nature of the training phase, cannot train the levels of hierarchy simultaneously or they do it with the cost of compromising privacy. 
In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification when the communication with the cloud is done over a low bit rate channel. This gain in the runtime is achieved whilst the accuracy drop is negligible. This method is advantageous for online learning of high-accuracy deep neural networks on sensor-holding low-resource devices such as mobile phones or robots as a part of an edge-cloud system, making them more flexible in facing new tasks and classes of data.	翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
# NoRA: 高連結ハミルトニアンの体積方向エンタングル平衡状態のためのテンソルネットワークアンサッツ NoRA: A Tensor Network Ansatz for Volume-Law Entangled Equilibrium States of Highly Connected Hamiltonians ( http://arxiv.org/abs/2303.16946v5 ) ライセンス: Link先を確認	Valérie Bettaque, Brian Swingle,	(参考訳) 平均場量子スピングラスモデルやSachdev-Ye-Kitaev(SYK)モデルのような全対全相互作用を持つ量子モデルの基底状態構造により、体積法則の絡み合いと大きな基底状態の縮退を許容できるテンソルネットワークアーキテクチャを提案する。我々は、このアーキテクチャを非局所再正規化アンサッツ(NoRA)と呼んでいる。これは、MERA、DMERA、分岐MERAネットワークの一般化と見なすことができ、空間的局所性の制約が取り除かれるためである。アーキテクチャはSYKモデルの接地空間の絡み合いや複雑さを捉えるのに十分な表現性を持っているため、適切な変分アンザッツとなるが、SYKの詳細な研究は今後の研究に任せる。テンソルがランダムなクリフォードゲートである特別な場合において、アーキテクチャをさらに探求する。ここでは、アーキテクチャをランダムな安定化器コードの符号化マップと見なすことができる。我々はSYKモデルにインスパイアされた一連の符号を導入し、高重量安定器のコストで一定速度と線形距離を選択できることを示した。また、この符号族とSYK基底空間から形成される近似符号との潜在的な類似点についてもコメントする。 Motivated by the ground state structure of quantum models with all-to-all interactions such as mean-field quantum spin glass models and the Sachdev-Ye-Kitaev (SYK) model, we propose a tensor network architecture which can accommodate volume-law entanglement and a large ground state degeneracy. We call this architecture the non-local renormalization ansatz (NoRA) because it can be viewed as a generalization of MERA, DMERA, and branching MERA networks with the constraints of spatial locality removed. We argue that the architecture is potentially expressive enough to capture the entanglement and complexity of the ground space of the SYK model, thus making it a suitable variational ansatz, but we leave a detailed study of SYK to future work. We further explore the architecture in the special case in which the tensors are random Clifford gates. Here the architecture can be viewed as the encoding map of a random stabilizer code. We introduce a family of codes inspired by the SYK model which can be chosen to have constant rate and linear distance at the cost of some high weight stabilizers. We also comment on potential similarities between this code family and the approximate code formed from the SYK ground space.	翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
# 動揺運動によるパーキンソン病の時系列分類 Time Series Classification for Detecting Parkinson's Disease from Wrist Motions ( http://arxiv.org/abs/2304.11265v2 ) ライセンス: Link先を確認	Cedric Donié, Neha Das, Satoshi Endo, Sandra Hirche,	(参考訳) パーキンソン病(英: Parkinson disease, PD)は、運動症状の頻繁な変化を特徴とする神経変性疾患である。古典的時系列分類と深層学習技術は、複雑なPD運動パターンと利用可能なデータセットの小さいため、ウェアラブル加速度計データを用いたPD症状のモニタリングにおいて限られた効果を示した。 InceptionTimeとRandOm Convolutional KErnel Transform(ROCKET)をPD症状モニタリングに有望なものとして検討し、InceptionTimeの高学習能力は複雑な動きパターンをモデル化するのに適しており、ROCKETは小さなデータセットに適している。ランダムな探索手法により,最も高いインセプションタイムアーキテクチャを同定し,その性能をPD患者の手首動作データに対する尾根分類器と多層パーセプトロン(MLP)と比較する。以上の結果より, 震度とブラジキネジアの有無を推定するのには全アプローチが適しているが, ジスキネジアの検出には困難が伴うことが示唆された。 ROCKETはジスキネジアの同定において優れた性能を示すが、InceptionTimeは振れやブラジキネシアの検出においてわずかに優れた性能を示す。特に、どちらの手法も多層パーセプトロンよりも優れている。結論として、InceptionTimeは複雑な手首の動き時系列を分類する能力を示し、PDの継続的な症状モニタリングの最大の可能性を秘めている。 Parkinson's disease (PD) is a neurodegenerative condition characterized by frequently changing motor symptoms, necessitating continuous symptom monitoring for more targeted treatment. Classical time series classification and deep learning techniques have demonstrated limited efficacy in monitoring PD symptoms using wearable accelerometer data due to complex PD movement patterns and the small size of available datasets. We investigate InceptionTime and RandOm Convolutional KErnel Transform (ROCKET) as they are promising for PD symptom monitoring, with InceptionTime's high learning capacity being well-suited to modeling complex movement patterns while ROCKET is suited to small datasets. With random search methodology, we identify the highest-scoring InceptionTime architecture and compare its performance to ROCKET with a ridge classifier and a multi-layer perceptron (MLP) on wrist motion data from PD patients. Our findings indicate that all approaches are suitable for estimating tremor severity and bradykinesia presence but encounter challenges in detecting dyskinesia. 
ROCKET demonstrates superior performance in identifying dyskinesia, whereas InceptionTime exhibits slightly better performance in tremor and bradykinesia detection. Notably, both methods outperform the multi-layer perceptron. In conclusion, InceptionTime exhibits the capability to classify complex wrist motion time series and holds the greatest potential for continuous symptom monitoring in PD.	翻訳日:2024-05-22 19:30:20 公開日:2024-05-20
# 線型フェルミオン部分を持つ指数関数に対するバリアン・ブレジン分解の一般化 Generalization of Balian-Brezin decomposition for exponentials with linear fermionic part ( http://arxiv.org/abs/2306.13481v3 ) ライセンス: Link先を確認	M. A. Seifi Mirjafarlou, A. Jafarizadeh, M. A. Rajabpour,	(参考訳) フェルミオンガウス状態は、その興味深い性質、特にウィックの定理により、かなりの注意を払っている。フェルミオン型ガウス作用素と状態の性質を一般化したバリアンとブレジンの研究を拡張して、それらの発見をさらに拡張して、ガウス作用素を線型成分に組み込む。コルパが導入した手法を活用し、解析を合理化し、線形項を含む指数関数を包含するバリアン・ブレジン分解(BBD)の包括的拡張を示す。さらに、線形部分を含むガウス状態を導入し、対応する重複式を導出する。さらに、Wickの定理を線形項を含むシナリオを包含するように一般化し、一点相関関数と二点相関関数に関する一般的な期待値の表現を容易にする。また、$\mathfrak{so}(N)$ Lie algebra 内の BCH (Zassenhaus) 公式に対処する際の BB 分解の適用性に関する簡単な注釈も提供する。 Fermionic Gaussian states have garnered considerable attention due to their intriguing properties, most notably Wick's theorem. Expanding upon the work of Balian and Brezin, who generalized properties of fermionic Gaussian operators and states, we further extend their findings to incorporate Gaussian operators with a linear component. Leveraging a technique introduced by Colpa, we streamline the analysis and present a comprehensive extension of the Balian-Brezin decomposition (BBD) to encompass exponentials involving linear terms. Furthermore, we introduce Gaussian states featuring a linear part and derive corresponding overlap formulas. Additionally, we generalize Wick's theorem to encompass scenarios involving linear terms, facilitating the expression of generic expectation values in relation to one and two-point correlation functions. We also provide a brief commentary on the applicability of the BB decomposition in addressing the BCH (Zassenhaus) formulas within the $\mathfrak{so}(N)$ Lie algebra.	翻訳日:2024-05-22 19:30:20 公開日:2024-05-20
# 電子健康記録を用いた因果推論のためのテキストデータの活用 Leveraging text data for causal inference using electronic health records ( http://arxiv.org/abs/2307.03687v2 ) ライセンス: Link先を確認	Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, Luke Miratrix,	(参考訳) 電子健康記録(EHR)のデータに依存する研究において、臨床進歩ノートなどの構造化されていないテキストデータは、構造化されたデータから欠落している可能性がある患者の特徴やケアに関する情報の豊富な情報源を提供する。臨床研究におけるテキストの普及にもかかわらず、これらのデータは、その複雑さのために定量的分析では無視されることが多い。本稿では,テキストデータを利用した電子健康データによる因果推論を解析の複数の段階で支援するための統一的な枠組みを提案する。特に、自然言語処理と統計テキスト解析を標準推論手法と組み合わせて、欠落データ、交絡バイアス、処理効果の不均一性といった問題に対処する方法を検討する。本研究は,非ランダム化医療介入が患者予後に与える影響を調査するEHR研究への応用を通じて,従来のマッチング分析にテキストデータを統合することで,治療効果の妥当性を高め,治療の恩恵を最も受ける患者サブグループを特定することができることを示す。我々は,これらの手法が臨床データの二次分析の範囲を,発展途上国のような構造化EHRデータに制限された領域にまで広げる可能性があると考えている。この目的のために、我々は、臨床研究におけるこれらの技術の採用と広範な探索を促進するために、コードとオープンソースの複製材料を提供する。 In studies that rely on data from electronic health records (EHRs), unstructured text data such as clinical progress notes offer a rich source of information about patient characteristics and care that may be missing from structured data. Despite the prevalence of text in clinical research, these data are often ignored for the purposes of quantitative analysis due to their complexity. This paper presents a unified framework for leveraging text data to support causal inference with electronic health data at multiple stages of analysis. In particular, we consider how natural language processing and statistical text analysis can be combined with standard inferential techniques to address common challenges due to missing data, confounding bias, and treatment effect heterogeneity. Through an application to a recent EHR study investigating the effects of a non-randomized medical intervention on patient outcomes, we show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect and identify patient subgroups that may benefit most from treatment. 
We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited, such as in developing countries. To this end, we provide code and open-source replication materials to encourage adoption and broader exploration of these techniques in clinical research.	翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
# ニューラルトピカル表現の一般化に向けて Towards Generalising Neural Topical Representations ( http://arxiv.org/abs/2307.12564v3 ) ライセンス: Link先を確認	Xiaohao Yang, He Zhao, Dinh Phung, Lan Du,	(参考訳) トピックモデルは従来のベイズ確率モデルから最近のニューラルトピックモデル(NTM)へと進化してきた。 NTMは特定のコーパスでトレーニングおよびテストを行う際に有望な性能を示すが、コーパス間の一般化能力はまだ研究されていない。実際には、ソースコーパスでトレーニングされたNTMが、異なるターゲットコーパスから文書の質の高いトピック表現(トピック上の潜在分布)を生成できると期待されることが多い。本研究では,文書の表現能力がコーパスやタスク全体にわたって確実に一般化されるように,NTMをさらに改良することを目指している。そこで我々は,類似文書間の意味的距離を狭め,異なるコーパスからの文書が類似した意味を共有できるという前提のもとに,NTMの強化を提案する。具体的には、テキストデータ拡張により、トレーニング文書毎に類似した文書を取得する。そして,各ペア間の意味的距離を階層的話題移動距離(Hierarchical Topic Transport Distance)で測定し,トピック表現間の最適移動距離を計算することにより,NTMをさらに最適化する。我々のフレームワークは、ほとんどのNTMにプラグイン・アンド・プレイモジュールとして簡単に適用できます。大規模な実験により, コーパス間の神経トピック表現に関する一般化能力は大幅に向上した。私たちのコードとデータセットは、https://github.com/Xiaohao-Yang/Topic_Model_Generalisationで公開されています。 Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation (i.e., latent distribution over topics) for the document from different target corpora. In this work, we aim to improve NTMs further so that their representation power for documents generalises reliably across corpora and tasks. To do so, we propose to enhance NTMs by narrowing the semantical distance between similar documents, with the underlying assumption that documents from different corpora may share similar semantics. Specifically, we obtain a similar document for each training document by text data augmentation. Then, we optimise NTMs further by minimising the semantical distance between each pair, measured by the Hierarchical Topic Transport Distance, which computes the Optimal Transport (OT) distance between their topical representations. 
Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability regarding neural topical representation across corpora. Our code and datasets are available at: https://github.com/Xiaohao-Yang/Topic_Model_Generalisation	翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
# 低次多項式によるグラフオン推定のための計算下界 Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials ( http://arxiv.org/abs/2308.15728v3 ) ライセンス: Link先を確認	Yuetian Luo, Chao Gao,	(参考訳) グラフオン推定は、ネットワーク分析における最も基本的な問題の一つであり、過去10年間にかなりの注目を集めてきた。統計的観点からは、確率ブロックモデルと非パラメトリックグラフオン推定の両方について、Gao et al. (2015) により、グラフオン推定のミニマックス誤差レートが確立されている。統計的最適推定子は制約された最小二乗に基づいており、次元において計算複雑性が指数関数的である。計算の観点からは、最もよく知られた多項式時間推定器は普遍特異値しきい値法(USVT)に基づいているが、ミニマックスレートよりもはるかに遅い推定誤差率しか達成できない。 USVTの計算最適性や、グラフオン推定における計算障壁の存在は、長年の未解決問題であった。本研究では,低次多項式を用いたグラフオン推定における計算障壁の厳密な証拠を提供する。具体的には,SBMグラフオン推定において,低次多項式推定器の推定誤差率は幅広いパラメータ条件下でUSVTの推定誤差率よりも著しく優れることはできないことを示し,非パラメトリックグラフオン推定では,低次多項式推定器の推定誤差率がミニマックスレートよりも厳密に遅いことを示す。我々の結果は、Schramm と Wein (2022) による最近の低次多項式の発展に基づいて証明されている。また,本研究の主な成果を生かして,SBMにおけるコミュニティ検出におけるクラスタリング誤差の計算的下限も提供し,コミュニティの効率的な回復のためのケステン・スティグムしきい値の新たな証拠を得た。最後に、計算下界をスパースグラフオン推定とビクラスタリングに拡張する。 Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al. (2015) for both stochastic block model and nonparametric graphon estimation. The statistically optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. The computational optimality of the USVT or the existence of a computational barrier in graphon estimation has been a long-standing open problem. In this work, we provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. 
Specifically, in SBM graphon estimation, we show that for low-degree polynomial estimators, their estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes and in nonparametric graphon estimation, we show low-degree polynomial estimators achieve estimation error rates strictly slower than the minimax rate. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery. Finally, we extend our computational lower bounds to sparse graphon estimation and biclustering.	翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
# BioCoder: 大規模言語モデルを用いたバイオインフォマティクスコード生成ベンチマーク BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models ( http://arxiv.org/abs/2308.16458v5 ) ライセンス: Link先を確認	Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein,	(参考訳) 事前訓練された大規模言語モデル(LLM)はコード生成を大幅に改善した。これらのモデルがスケールアップするにつれて、より複雑なタスクを処理し、特定のドメインに適切に特化するための出力の必要性が高まっています。ここでは、この規律が必要とするドメイン知識、アルゴリズム、データ操作の量により、バイオインフォマティクスを対象とする。バイオインフォマティクス固有のコードを生成する際のLCMを評価するためのベンチマークであるBioCoderを提案する。 BioCoderは、ファイル間の依存関係、クラス宣言、グローバル変数を含む、フィールドの大部分にまたがる。その中には、GitHubから抽出された1,026のPython関数と1,243のJavaメソッドと、バイオインフォマティクスに関連するRosalindプロジェクトから253のサンプルが含まれている。トピックモデリングを用いて、包含コード全体のカバレッジは、バイオインフォマティクス計算の完全なスペクトルを表していることを示す。 BioCoderは、評価のためのファズテストフレームワークを組み込んでいる。 InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, GPT-4 など,さまざまなモデルの評価に採用しました。さらに、1つのモデル(StarCoder)を微調整し、トレーニングデータセットがテストベンチマークのパフォーマンスを向上できることを実証しました。 1) 成功したモデルは、機能的依存関係を含む完全なコンテキストを持つ長いプロンプト(> 2,600トークン)に対応します。 2) バイオインフォマティクスのドメイン固有の知識は, 一般的なコーディング能力以上のものを含んでいる。これはGPT-3.5/4のパフォーマンス向上から明らかです(ベンチマークの50%対25%)。可用性と実装: コードは https://github.com/gersteinlab/biocoder と https://biocoder-benchmark で利用可能である。 github.io/ Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. 
Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (1) Successful models accommodate a long prompt (> 2,600 tokens) with full context, including functional dependencies. (2) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% vs. up to 25%). Availability and implementation: Code is available at: https://github.com/gersteinlab/biocoder and https://biocoder-benchmark.github.io/.	翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
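The BioCoder abstract reports gains in terms of Pass@K but does not spell out the formula. A minimal sketch, assuming the benchmark follows the widely used unbiased estimator popularised by the HumanEval evaluation (generate `n` samples per problem, count the `c` that pass, and estimate the chance that at least one of `k` drawn samples passes); the function name `pass_at_k` is chosen here for illustration:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n generated samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    # 1 - P(all k drawn samples are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 5, 1)` evaluates to 0.5, and the benchmark-level score is the mean over problems.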
# 境界ストレージモデルにおける機能暗号化 Functional Encryption in the Bounded Storage Models ( http://arxiv.org/abs/2309.06702v3 ) ライセンス: Link先を確認	Mohammed Barhoush, Louis Salvail,	(参考訳) 関数型暗号化は公開鍵暗号の強力なパラダイムであり、暗号化されたデータへの制御されたアクセスを可能にする。このプリミティブの理想的なシミュレーションベースのセキュリティを実現することは、通常、平易なモデルでは不可能であるため、量子記憶モデル(BQSM)と古典記憶モデル(BCSM)では、それぞれ量子記憶量と古典記憶量に制限がある可能性について検討する。機能的暗号化における不可能な結果がこれらの設定に当てはまらないため、肯定的な結果が得られる。まず、BQSMでは、${q}=O(\sqrt{{s}/{r}})$で情報理論に基づくセキュリティを満たす非対話型関数暗号を構築する。ここで${r}$は、相手がプロトコル内の量子メモリの${s}$-qubitsに制限される回数を表し、${q}$はプロトコルを正直に実行するために必要な量子メモリを表す。次に、我々のスキームは、${q} < \sqrt{{s}/{r}}$で情報理論上のセキュリティを得ることができないことを証明することで最適であることを示す。しかし、一方通行関数の存在を仮定することで、${q}=0$ と ${r}=1$ で(相互に)機能的な暗号化を実現する。第二に、BCSMでは、情報理論に基づく部分指数シミュレーションに基づくセキュリティを満足する非対話型機能暗号を構築し、部分指数灰色の箱難読化の存在を仮定する。この仮定は、非対話型機能暗号から部分指数灰色の難読化を構築することで最小限であることを示す。また、グレーボックスの難読化と片道関数を仮定したシミュレーションベースのセキュリティを満たす(対話型)関数暗号の計算設定も検討する。 Functional encryption is a powerful paradigm for public-key encryption that allows for controlled access to encrypted data. Achieving the ideal simulation based security for this primitive is generally impossible in the plain model, so we investigate possibilities in the bounded quantum storage model (BQSM) and the bounded classical storage model (BCSM), where adversaries are limited with respect to their quantum and classical memories, respectively. The impossibility results on functional encryption do not apply to these settings which allows us to obtain positive outcomes. Firstly, in the BQSM, we construct non-interactive functional encryption satisfying information-theoretic simulation based security with ${q}=O(\sqrt{{s}/{r}})$. Here ${r}$ denotes the number of times that an adversary is restricted to ${s}$--qubits of quantum memory in the protocol and ${q}$ denotes the required quantum memory to run the protocol honestly. We then show that our scheme is optimal by proving that it is impossible to attain information-theoretically security with ${q} < \sqrt{{s}/{r}}$. 
However, by assuming the existence of one-way functions, we achieve (interactive) functional encryption with ${q}=0$ and ${r}=1$. Secondly, in the BCSM, we construct non-interactive functional encryption satisfying information-theoretic subexponential simulation based security assuming the existence of subexponential grey-box obfuscation. We then demonstrate that this assumption is minimal by constructing subexponential grey-box obfuscation from non-interactive functional encryption. We also consider the computational setting, obtaining (interactive) functional encryption satisfying simulation based security assuming grey-box obfuscation and one-way functions.	翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
# 強結合レジームにおける2量子ドット微小キャビティ系の量子相関に及ぼすフォルスター相互作用とパルス励起の影響 Effect of the Förster Interaction and the Pulsed Pumping on the Quantum Correlations of a Two Quantum Dot-Microcavity System in the Strong Coupling Regime ( http://arxiv.org/abs/2309.08699v2 ) ライセンス: Link先を確認	D. Madrid-Úsuga, A. A. Portacio, D. Rasero,	(参考訳) 2つの量子ドットとFörster相互作用($\Gamma$)を強く結合した微小キャビティ内に配置した系と、電磁界の単一モードをレーザーパルスで駆動する系の量子相関を、リンドブラッド形式のマスター方程式の定式化を用いて理論的に検討した。系のエネルギー固有値は、第1および第2の励起多様体について離調の関数として研究された。シミュレーションされたレーザーパルスとパルス強度のポンプ時間の変化を考慮し、コンカレンス(CC)、生成絡み(EoF)、ミューチュアル情報(I)、量子不協和(Q)を時間関数として検討した。エンタングルメント量化器としてEoFとCCの相違を見出した結果,コンカレンスがEoFよりもはるかに高い値に達することが示唆された。 Förster相互作用の存在は、系内の量子不協和が支配的な相関関係であることを好んでおり、系の絡み合いが消えても系は量子相関を維持するが、レーザーポンプ時間の増加の影響を受けていることを示している。 The quantum correlations of a system of two quantum dots with Förster interaction ($\Gamma$) in a microcavity with strongly coupled dissipation and a single mode of the electromagnetic field and driven by a laser pulse were studied theoretically, using the formalism of the master equation in Lindblad form. The energy eigenvalues of the system were studied as a function of detuning for the first and second excitation manifolds. Concurrence (CC), entanglement of formation (EoF), mutual information (I) and quantum discord (Q) are studied as a function of time considering different values of Förster coupling, varying the pump times of the simulated laser pulse and pulse intensity. We found a discrepancy between EoF and CC as entanglement quantifiers, noting that concurrence reaches much higher values than EoF; so concurrence can indicate results that are well above the EoF. The presence of the Förster interaction favors that the quantum discord is the dominant correlation in the system, which indicates that the system maintains quantum correlations even when the entanglement of the system has disappeared, but that it is affected by the increase in the laser pump time.	翻訳日:2024-05-22 19:10:52 公開日:2024-05-20
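The abstract above notes that concurrence takes larger values than the entanglement of formation. For a pair of qubits this follows from the standard Wootters relation between the two quantities. A minimal sketch of that relation, assuming the two quantum dots are treated as two qubits (the abstract does not give the formula, and the function names are illustrative):

```python
from math import log2, sqrt


def binary_entropy(x: float) -> float:
    """Shannon entropy (in bits) of a two-outcome distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def eof_from_concurrence(c: float) -> float:
    """Entanglement of formation of a two-qubit state with concurrence c."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("concurrence must lie in [0, 1]")
    return binary_entropy((1.0 + sqrt(1.0 - c * c)) / 2.0)
```

Both quantities agree at 0 and 1, and in between the entanglement of formation lies below the concurrence, which is consistent with the discrepancy the authors report.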
# 自動運転車の長距離3次元物体検出に向けて Towards Long-Range 3D Object Detection for Autonomous Vehicles ( http://arxiv.org/abs/2310.04800v2 ) ライセンス: Link先を確認	Ajinkya Khoche, Laura Pereira Sánchez, Nazre Batool, Sina Sharif Mansouri, Patric Jensfelt,	(参考訳) 長距離での3D物体検出は、自動運転車の安全性と効率を確保するために不可欠である。しかし、現在最先端のLiDARベースの手法は、遠距離での間隔によって範囲が限られており、エゴ車から遠く離れた地点間での領域ギャップが生じる。もう一つの関連する問題は、遠距離物体のラベル不均衡であり、遠距離でのディープニューラルネットワークの性能を阻害する。上記の制約に対処するため、現在のLiDARベースの3D検出器の長距離性能を改善する2つの方法を検討する。まず,距離の専門家と呼ばれる2つの3D検出ネットワークと,近距離から中距離の物体を専門とする3D検出ネットワークと,長距離の3D検出ネットワークを組み合わせる。ラベルの少ない状況下で長い距離で検出器を訓練するためには、ラベル付き点とエゴ車との距離に応じて損失を更に重み付けする。第2に、画像に基づく深度補完アルゴリズムであるMultimodal Virtual Points (MVP) を用いて、LiDARスキャンを仮想点で拡張する。長距離Argoverse2(AV2)データセットに関する我々の実験は、MVPが単純な実装を維持しながら、長距離性能を改善するのにより効果的であることを示している。一方、レンジの専門家は、画像ベースのセグメンテーションネットワークと完璧なカメラ-LiDARキャリブレーションに依存しないように、計算的に効率的で簡単な代替手段を提供する。 3D object detection at long range is crucial for ensuring the safety and efficiency of self driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. But most current state of the art LiDAR based methods are range limited due to sparsity at long range, which generates a form of domain gap between points closer to and farther away from the ego vehicle. Another related problem is the label imbalance for faraway objects, which inhibits the performance of Deep Neural Networks at long range. To address the above limitations, we investigate two ways to improve long range performance of current LiDAR based 3D detectors. First, we combine two 3D detection networks, referred to as range experts, one specializing at near to mid range objects, and one at long range 3D detection. To train a detector at long range under a scarce label regime, we further weigh the loss according to the labelled point's distance from ego vehicle. 
Second, we augment LiDAR scans with virtual points generated using Multimodal Virtual Points (MVP), a readily available image-based depth completion algorithm. Our experiments on the long range Argoverse2 (AV2) dataset indicate that MVP is more effective in improving long range performance, while maintaining a straightforward implementation. On the other hand, the range experts offer a computationally efficient and simpler alternative, avoiding dependency on image-based segmentation networks and perfect camera-LiDAR calibration.	翻訳日:2024-05-22 19:10:52 公開日:2024-05-20
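The abstract says the loss is weighted by a labelled object's distance from the ego vehicle but does not give the weighting function. The sketch below therefore uses a purely hypothetical linear weight; `distance_weight`, `alpha`, and `max_range` are illustrative assumptions, not the authors' choices.

```python
from math import hypot


def distance_weight(x, y, max_range=200.0, alpha=1.0):
    """Hypothetical weight growing linearly with bird's-eye-view distance.

    (x, y) is the object centre in the ego frame, in metres. The weight is
    1 at the ego vehicle and 1 + alpha at max_range or beyond.
    """
    d = min(hypot(x, y), max_range)
    return 1.0 + alpha * d / max_range


def weighted_mean_loss(losses, centers, max_range=200.0, alpha=1.0):
    """Weighted average of per-object losses, favouring far-away objects."""
    weights = [distance_weight(x, y, max_range, alpha) for x, y in centers]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

With this choice a far-away object contributes up to twice as much as a nearby one, which is one simple way to counter the label imbalance the abstract describes.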
# Imitation Bootstrapped Reinforcement Learning Imitation Bootstrapped Reinforcement Learning ( http://arxiv.org/abs/2311.02198v6 ) ライセンス: Link先を確認	Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh,	(参考訳) 強化学習(RL)のかなりの可能性にもかかわらず、ロボット制御タスクはより優れたサンプル効率のため、主に模倣学習(IL)に依存している。しかし、ILがすべての可能なシナリオに一般化できるような、包括的な専門家によるデモンストレーションを収集することはコストがかかる。したがって、RL は効率的な自己改善手順として IL 上に構築できることをアピールしている。提案手法は,提案する実演において,まずILポリシーを訓練し,それを用いて,オンライン探索とブートストラップ対象値の両方に対する代替行動を提案する,サンプル効率の高いRLのための新しいフレームワークである。 IBRLは、デモンストレーションのオーバーサンプリングやRLの正規化を、さらなる模倣損失で行う以前の作業と比較して、トレーニングの開始以来、ILポリシーからの高品質なアクションを活用することができ、探索と訓練の効率を大幅に向上させることができる。 IBRLを6つのシミュレーションと3つの実世界のタスクで評価した。 IBRLは従来の手法よりも優れており、特に難しい作業では改善が顕著である。 Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations that enable IL to generalize to all possible scenarios, and any distribution shift would require recollecting data for finetuning. Therefore, RL is appealing if it can build upon IL as an efficient autonomous self-improvement procedure. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework for sample-efficient RL with demonstrations that first trains an IL policy on the provided demonstrations and then uses it to propose alternative actions for both online exploration and bootstrapping target values. Compared to prior works that oversample the demonstrations or regularize RL with an additional imitation loss, IBRL is able to utilize high quality actions from IL policies since the beginning of training, which greatly accelerates exploration and training efficiency. We evaluate IBRL on 6 simulation and 3 real-world tasks spanning various difficulty levels. IBRL significantly outperforms prior methods and the improvement is particularly more prominent in harder tasks.	翻訳日:2024-05-22 19:01:09 公開日:2024-05-20
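According to the abstract, IBRL uses the IL policy to propose alternative actions for both online exploration and bootstrapped target values. A minimal sketch of that idea follows, assuming the choice between the RL and IL proposals is made greedily by a learned Q-function; that selection rule and all function names are assumptions for illustration, since the abstract does not state them.

```python
def select_action(state, rl_policy, il_policy, q_function):
    """Pick whichever proposed action the Q-function rates higher."""
    candidates = [rl_policy(state), il_policy(state)]
    return max(candidates, key=lambda action: q_function(state, action))


def bootstrap_target(reward, next_state, discount,
                     rl_policy, il_policy, q_target):
    """One-step TD target that also considers the IL proposal."""
    next_action = select_action(next_state, rl_policy, il_policy, q_target)
    return reward + discount * q_target(next_state, next_action)


# Toy usage with scalar actions: the IL proposal (0.9) is closer to the
# optimum of this made-up Q-function (1.0) than the RL proposal (0.0).
toy_q = lambda state, action: -(action - 1.0) ** 2
toy_rl = lambda state: 0.0
toy_il = lambda state: 0.9
chosen = select_action(None, toy_rl, toy_il, toy_q)
target = bootstrap_target(1.0, None, 0.5, toy_rl, toy_il, toy_q)
```

Early in training the RL policy is poor, so under this rule the IL proposal tends to win both during exploration and inside the target, which matches the abstract's claim that high-quality IL actions are used from the start.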
# 分散シフトによるテスト可能な学習 Testable Learning with Distribution Shift ( http://arxiv.org/abs/2311.15142v2 ) ライセンス: Link先を確認	Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan,	(参考訳) そこで本研究では,学習者が学習ディストリビューションからラベル付きサンプルに$D$,テストディストリビューションから$D’$のラベル付きサンプルを付与し,テストエラーの少ない分類器を出力する,分散シフトによる学習の基本的な問題を再検討する。この設定における標準的なアプローチは、$D$ と $D'$ の間の距離の概念によって分類器の損失を限定することである。しかし、これらの距離は計算が難しく、効率的なアルゴリズムに繋がらない。我々はこのパラダイムから離れ、分散シフトを伴うテスト可能学習と呼ばれる新しいモデルを定義し、テスト分布上での分類器の性能を証明可能なアルゴリズムを得る。このモデルでは、学習者は、$D$と$D’$のサンプルが関連するテストに合格するたびに、低いテストエラーの分類器を出力する。ハーフ空間、ハーフ空間の交叉、決定木などのよく研究された概念クラスを学習するために、D$の限界がガウス的あるいは一様であるとき、いくつかの肯定的な結果を与える。我々の研究に先立ち、これらの基本事例に対する効率的なアルゴリズムは、$D'$に関する強い仮定なしでは知られていなかった。実現可能な場合($D$と$D’$の両方に整合したハーフスペースが存在する場合)のハーフスペースに対しては、モーメントマッチングアプローチとアクティブラーニングのアイデアを組み合わせて、不一致領域を推定するための効率的なオラクルをシミュレートする。実現不可能な設定にまで拡張するために、テスト可能な(不可知な)学習から最近の研究を適用する。より一般的には、低次$L_2$サンドウィッチ多項式近似器を持つ任意の関数クラスが、我々のモデルで学習できることが証明される。我々は,必要な近似値を得るために擬似ランダム性文献から構成を適用した。 We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. 
We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.	翻訳日:2024-05-22 19:01:09 公開日:2024-05-20
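The abstract mentions a moment-matching approach and requires the test to accept whenever the test marginal equals the training marginal. As a heavily simplified, one-dimensional sketch of such a check (the paper works in $d$ dimensions; the tolerance, sample size, and function names here are illustrative assumptions), the code compares low-degree empirical moments of unlabeled test samples with those of a standard Gaussian:

```python
import random

# E[x^k] for k = 1..4 under the standard Gaussian N(0, 1).
GAUSSIAN_MOMENTS = [0.0, 1.0, 0.0, 3.0]


def empirical_moment(samples, k):
    """Average of x**k over the samples."""
    return sum(x ** k for x in samples) / len(samples)


def moment_matching_test(samples, tolerance=0.5):
    """Accept iff the first four empirical moments are close to N(0, 1)."""
    return all(
        abs(empirical_moment(samples, k) - m) <= tolerance
        for k, m in enumerate(GAUSSIAN_MOMENTS, start=1)
    )


rng = random.Random(0)
matching = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # no shift
shifted = [x + 1.0 for x in matching]                   # mean shifted by 1
```

Samples drawn from the assumed training marginal should pass, while the mean-shifted samples fail already at the first moment; a learner in this model would output its classifier only in the accepting case.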
# SqueezeSAM: ユーザフレンドリーなモバイルインタラクティブセグメンテーション SqueezeSAM: User friendly mobile interactive segmentation ( http://arxiv.org/abs/2312.06736v3 ) ライセンス: Link先を確認	Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra,	(参考訳) Segment Anything Model (SAM)は、インタラクティブセグメンテーションの分野における基盤であり、生成AI、計算写真、医療画像の進歩を加速させている。任意のユーザ入力を処理し、対応するセグメンテーションマスクを生成する能力があるにもかかわらず、SAMの6億ドルのパラメータアーキテクチャはViT-Hをベースにしており、その高い計算要求と大きなモデルサイズのために現在のモバイルハードウェアと互換性がない。本研究の目的は,モバイル写真アプリケーションにSAMを応用することである。この目的のために、完全に畳み込まれたSqueezeSAMモデルアーキテクチャを開発し、これは元のSAMより62.5倍速く、31.6倍小さいので、モバイルアプリケーションにとって実行可能なソリューションです。さらに、我々の小さなモデルは、元のVIT-Hアーキテクチャの1%以内のmIOUを達成する。自動セグメンテーション(Automated segmentation)は、リンゴやカプカットといった業界の主要なプレイヤーによって採用されていることの証明として、写真アプリケーションの作成フローにおいて重要な価値を持っている。この自動化を容易にするために,我々は,有能なオブジェクト検出と,前景オブジェクト選択のための潜在的なユーザクリックをシミュレートし,ユーザが対話的に編集できる初期セグメンテーションマスクを生成する。一般的なユーザからの期待は、オブジェクトの特定の部分のクリックがオブジェクト全体のセグメンテーションをもたらすことである。例えば、写真の中の人のTシャツをクリックすれば、Tシャツだけでなく、理想的には人全体を分割できる。しかし、SAMは通常、クリックされた領域のみをセグメント化する。我々はこの制限を新しいデータ拡張方式によって解決する。これにより、ユーザがバスケットボールを持っている人をクリックすると、人とバスケットボールの両方がセグメンテーションされ、ユーザの期待と一致し、全体的なユーザエクスペリエンスが向上する。 The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware due to its high computational demands and large model size. Our research aims to adapt SAM for use in mobile photography applications. To this end, we have developed a fully convolutional SqueezeSAM model architecture, which is 62.5 times faster and 31.6 times smaller than the original SAM, making it a viable solution for mobile applications. 
Furthermore, our tiny model achieves an mIOU within 1% of the original ViT-H architecture. Automated segmentation holds significant value in the creation flow for photography applications, as evidenced by its adoption by leading industry players like Apple and CapCut. To facilitate this automation, we employ salient object detection and simulate potential user clicks for foreground object selection, generating an initial segmentation mask that users can subsequently edit interactively. A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object. For example, a click on a person's t-shirt in a photo should ideally segment the entire person, not just the t-shirt. However, SAM typically only segments the clicked area. We address this limitation through a novel data augmentation scheme. Consequently, if a user clicks on a person holding a basketball, both the person and the basketball are segmented together, aligning with user expectations and enhancing the overall user experience.	翻訳日:2024-05-22 18:51:19 公開日:2024-05-20
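The SqueezeSAM abstract above reports accuracy as mIOU. As a rough illustration only, the following sketch computes intersection-over-union for binary masks and averages it over mask pairs. The function names `iou` and `mean_iou`, the list-of-lists mask format, and the choice to average per mask pair are assumptions made for this example; the abstract does not state how the authors compute the metric.

```python
def iou(pred, target):
    """Intersection-over-union of two binary masks (equal-sized 2D lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over a list of (predicted_mask, target_mask) pairs."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    full = [[1, 1], [0, 0]]
    half = [[1, 0], [0, 0]]
    print(mean_iou([(full, half), (full, full)]))
```

Under this definition, a statement such as "within 1% of the original" compares two such averages computed on the same evaluation set.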
# Gemini: 高機能マルチモーダルモデルのファミリー Gemini: A Family of Highly Capable Multimodal Models ( http://arxiv.org/abs/2312.11805v3 ) ライセンス: Link先を確認	Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, Ryan Doherty, Eli Collins, Clemens Meyer, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Jack Krawczyk, Cosmo Du, Ed Chi, Heng-Tze Cheng, Eric Ni, Purvi Shah, Patrick Kane, Betty Chan, Manaal Faruqui, Aliaksei Severyn, Hanzhao Lin, YaGuang Li, Yong Cheng, Abe Ittycheriah, Mahdis Mahdieh, Mia Chen, Pei Sun, Dustin Tran, Sumit Bagri, Balaji Lakshminarayanan, Jeremiah Liu, Andras Orban, Fabian Güra, Hao Zhou, Xinying Song, Aurelien Boffy, Harish Ganapathy, Steven Zheng, HyunJeong Choe, Ágoston Weisz, Tao Zhu, Yifeng Lu, Siddharth Gopal, Jarrod Kahn, Maciej Kula, Jeff Pitman, Rushin Shah, Emanuel Taropa, Majd Al Merey, Martin Baeuml, Zhifeng Chen, Laurent El Shafey, Yujing Zhang, Olcan Sercinoglu, George Tucker, Enrique Piqueras, Maxim Krikun, Iain Barr, Nikolay Savinov, Ivo Danihelka, Becca Roelofs, Anaïs White, Anders Andreassen, Tamara von Glehn, Lakshman Yagati, Mehran Kazemi, Lucas Gonzalez, Misha Khalman, Jakub Sygnowski, Alexandre Frechette, Charlotte Smith, Laura Culp, Lev Proleev, Yi Luan, Xi Chen, James Lottes, Nathan Schucher, Federico Lebron, Alban Rrustemi, Natalie Clay, Phil Crone, Tomas Kocisky, Jeffrey Zhao, Bartek Perz, Dian Yu, Heidi Howard, Adam Bloniarz, Jack W. 
Rae, Han Lu, Laurent Sifre, Marcello Maggioni, Fred Alcober, Dan Garrette, Megan Barnes, Shantanu Thakoor, Jacob Austin, Gabriel Barth-Maron, William Wong, Rishabh Joshi, Rahma Chaabouni, Deeni Fatiha, Arun Ahuja, Gaurav Singh Tomar, Evan Senter, Martin Chadwick, Ilya Kornakov, Nithya Attaluri, Iñaki Iturrate, Ruibo Liu, Yunxuan Li, Sarah Cogan, Jeremy Chen, Chao Jia, Chenjie Gu, Qiao Zhang, Jordan Grimstad, Ale Jakse Hartman, Xavier Garcia, Thanumalayan Sankaranarayana Pillai, Jacob Devlin, Michael Laskin, Diego de Las Casas, Dasha Valter, Connie Tao, Lorenzo Blanco, Adrià Puigdomènech Badia, David Reitter, Mianna Chen, Jenny Brennan, Clara Rivera, Sergey Brin, Shariq Iqbal, Gabriela Surita, Jane Labanowski, Abhi Rao, Stephanie Winkler, Emilio Parisotto, Yiming Gu, Kate Olszewska, Ravi Addanki, Antoine Miech, Annie Louis, Denis Teplyashin, Geoff Brown, Elliot Catt, Jan Balaguer, Jackie Xiang, Pidong Wang, Zoe Ashwood, Anton Briukhov, Albert Webson, Sanjay Ganapathy, Smit Sanghavi, Ajay Kannan, Ming-Wei Chang, Axel Stjerngren, Josip Djolonga, Yuting Sun, Ankur Bapna, Matthew Aitchison, Pedram Pejman, Henryk Michalewski, Tianhe Yu, Cindy Wang, Juliette Love, Junwhan Ahn, Dawn Bloxwich, Kehang Han, Peter Humphreys, Thibault Sellam, James Bradbury, Varun Godbole, Sina Samangooei, Bogdan Damoc, Alex Kaskasoli, Sébastien M. R. 
Arnold, Vijay Vasudevan, Shubham Agrawal, Jason Riesa, Dmitry Lepikhin, Richard Tanburn, Srivatsan Srinivasan, Hyeontaek Lim, Sarah Hodkinson, Pranav Shyam, Johan Ferret, Steven Hand, Ankush Garg, Tom Le Paine, Jian Li, Yujia Li, Minh Giang, Alexander Neitz, Zaheer Abbas, Sarah York, Machel Reid, Elizabeth Cole, Aakanksha Chowdhery, Dipanjan Das, Dominika Rogozińska, Vitaliy Nikolaev, Pablo Sprechmann, Zachary Nado, Lukas Zilka, Flavien Prost, Luheng He, Marianne Monteiro, Gaurav Mishra, Chris Welty, Josh Newlan, Dawei Jia, Miltiadis Allamanis, Clara Huiyi Hu, Raoul de Liedekerke, Justin Gilmer, Carl Saroufim, Shruti Rijhwani, Shaobo Hou, Disha Shrivastava, Anirudh Baddepudi, Alex Goldin, Adnan Ozturel, Albin Cassirer, Yunhan Xu, Daniel Sohn, Devendra Sachan, Reinald Kim Amplayo, Craig Swanson, Dessie Petrova, Shashi Narayan, Arthur Guez, Siddhartha Brahma, Jessica Landon, Miteyan Patel, Ruizhe Zhao, Kevin Villela, Luyu Wang, Wenhao Jia, Matthew Rahtz, Mai Giménez, Legg Yeung, James Keeling, Petko Georgiev, Diana Mincu, Boxi Wu, Salem Haykal, Rachel Saputro, Kiran Vodrahalli, James Qin, Zeynep Cankara, Abhanshu Sharma, Nick Fernando, Will Hawkins, Behnam Neyshabur, Solomon Kim, Adrian Hutter, Priyanka Agrawal, Alex Castro-Ros, George van den Driessche, Tao Wang, Fan Yang, Shuo-yiin Chang, Paul Komarek, Ross McIlroy, Mario Lučić, Guodong Zhang, Wael Farhan, Michael Sharman, Paul Natsev, Paul Michel, Yamini Bansal, Siyuan Qiao, Kris Cao, Siamak Shakeri, Christina Butterfield, Justin Chung, Paul Kishan Rubenstein, Shivani Agrawal, Arthur Mensch, Kedar Soparkar, Karel Lenc, Timothy Chung, Aedan Pope, Loren Maggiore, Jackie Kay, Priya Jhakra, Shibo Wang, Joshua Maynez, Mary Phuong, Taylor Tobin, Andrea Tacchetti, Maja Trebacz, Kevin Robinson, Yash Katariya, Sebastian Riedel, Paige Bailey, Kefan Xiao, Nimesh Ghelani, Lora Aroyo, Ambrose Slone, Neil Houlsby, Xuehan Xiong, Zhen Yang, Elena Gribovskaya, Jonas Adler, Mateo Wirth, Lisa Lee, Music Li, Thais Kagohara, Jay 
Pavagadhi, Sophie Bridgers, Anna Bortsova, Sanjay Ghemawat, Zafarali Ahmed, Tianqi Liu, Richard Powell, Vijay Bolina, Mariko Iinuma, Polina Zablotskaia, James Besley, Da-Woon Chung, Timothy Dozat, Ramona Comanescu, Xiance Si, Jeremy Greer, Guolong Su, Martin Polacek, Raphaël Lopez Kaufman, Simon Tokumine, Hexiang Hu, Elena Buchatskaya, Yingjie Miao, Mohamed Elhawaty, Aditya Siddhant, Nenad Tomasev, Jinwei Xing, Christina Greer, Helen Miller, Shereen Ashraf, Aurko Roy, Zizhao Zhang, Ada Ma, Angelos Filos, Milos Besta, Rory Blevins, Ted Klimenko, Chih-Kuan Yeh, Soravit Changpinyo, Jiaqi Mu, Oscar Chang, Mantas Pajarskas, Carrie Muir, Vered Cohen, Charline Le Lan, Krishna Haridasan, Amit Marathe, Steven Hansen, Sholto Douglas, Rajkumar Samuel, Mingqiu Wang, Sophia Austin, Chang Lan, Jiepu Jiang, Justin Chiu, Jaime Alonso Lorenzo, Lars Lowe Sjösund, Sébastien Cevey, Zach Gleicher, Thi Avrahami, Anudhyan Boral, Hansa Srinivasan, Vittorio Selo, Rhys May, Konstantinos Aisopos, Léonard Hussenot, Livio Baldini Soares, Kate Baumli, Michael B. 
Chang, Adrià Recasens, Ben Caine, Alexander Pritzel, Filip Pavetic, Fabio Pardo, Anita Gergely, Justin Frye, Vinay Ramasesh, Dan Horgan, Kartikeya Badola, Nora Kassner, Subhrajit Roy, Ethan Dyer, Víctor Campos Campos, Alex Tomala, Yunhao Tang, Dalia El Badawy, Elspeth White, Basil Mustafa, Oran Lang, Abhishek Jindal, Sharad Vikram, Zhitao Gong, Sergi Caelles, Ross Hemsley, Gregory Thornton, Fangxiaoyu Feng, Wojciech Stokowiec, Ce Zheng, Phoebe Thacker, Çağlar Ünlü, Zhishuai Zhang, Mohammad Saleh, James Svensson, Max Bileschi, Piyush Patil, Ankesh Anand, Roman Ring, Katerina Tsihlas, Arpi Vezer, Marco Selvi, Toby Shevlane, Mikel Rodriguez, Tom Kwiatkowski, Samira Daruki, Keran Rong, Allan Dafoe, Nicholas FitzGerald, Keren Gu-Lemberg, Mina Khan, Lisa Anne Hendricks, Marie Pellat, Vladimir Feinberg, James Cobon-Kerr, Tara Sainath, Maribeth Rauh, Sayed Hadi Hashemi, Richard Ives, Yana Hasson, Eric Noland, Yuan Cao, Nathan Byrd, Le Hou, Qingze Wang, Thibault Sottiaux, Michela Paganini, Jean-Baptiste Lespiau, Alexandre Moufarek, Samer Hassan, Kaushik Shivakumar, Joost van Amersfoort, Amol Mandhane, Pratik Joshi, Anirudh Goyal, Matthew Tung, Andrew Brock, Hannah Sheahan, Vedant Misra, Cheng Li, Nemanja Rakićević, Mostafa Dehghani, Fangyu Liu, Sid Mittal, Junhyuk Oh, Seb Noury, Eren Sezener, Fantine Huot, Matthew Lamm, Nicola De Cao, Charlie Chen, Sidharth Mudgal, Romina Stella, Kevin Brooks, Gautam Vasudevan, Chenxi Liu, Mainak Chain, Nivedita Melinkeri, Aaron Cohen, Venus Wang, Kristie Seymore, Sergey Zubkov, Rahul Goel, Summer Yue, Sai Krishnakumaran, Brian Albert, Nate Hurley, Motoki Sano, Anhad Mohananey, Jonah Joughin, Egor Filonov, Tomasz Kępa, Yomna Eldawy, Jiawern Lim, Rahul Rishi, Shirin Badiezadegan, Taylor Bos, Jerry Chang, Sanil Jain, Sri Gayatri Sundara Padmanabhan, Subha Puttagunta, Kalpesh Krishna, Leslie Baker, Norbert Kalb, Vamsi Bedapudi, Adam Kurzrok, Shuntong Lei, Anthony Yu, Oren Litvin, Xiang Zhou, Zhichun Wu, Sam Sobell, Andrea Siciliano, Alan 
Papir, Robby Neale, Jonas Bragagnolo, Tej Toor, Tina Chen, Valentin Anklin, Feiran Wang, Richie Feng, Milad Gholami, Kevin Ling, Lijuan Liu, Jules Walter, Hamid Moghaddam, Arun Kishore, Jakub Adamek, Tyler Mercado, Jonathan Mallinson, Siddhinita Wandekar, Stephen Cagle, Eran Ofek, Guillermo Garrido, Clemens Lombriser, Maksim Mukha, Botu Sun, Hafeezul Rahman Mohammad, Josip Matak, Yadi Qian, Vikas Peswani, Pawel Janus, Quan Yuan, Leif Schelin, Oana David, Ankur Garg, Yifan He, Oleksii Duzhyi, Anton Älgmyr, Timothée Lottaz, Qi Li, Vikas Yadav, Luyao Xu, Alex Chinien, Rakesh Shivanna, Aleksandr Chuklin, Josie Li, Carrie Spadine, Travis Wolfe, Kareem Mohamed, Subhabrata Das, Zihang Dai, Kyle He, Daniel von Dincklage, Shyam Upadhyay, Akanksha Maurya, Luyan Chi, Sebastian Krause, Khalid Salama, Pam G Rabinovitch, Pavan Kumar Reddy M, Aarush Selvan, Mikhail Dektiarev, Golnaz Ghiasi, Erdem Guven, Himanshu Gupta, Boyi Liu, Deepak Sharma, Idan Heimlich Shtacher, Shachi Paul, Oscar Akerlund, François-Xavier Aubet, Terry Huang, Chen Zhu, Eric Zhu, Elico Teixeira, Matthew Fritze, Francesco Bertolini, Liana-Eleonora Marinescu, Martin Bölle, Dominik Paulus, Khyatti Gupta, Tejasi Latkar, Max Chang, Jason Sanders, Roopa Wilson, Xuewei Wu, Yi-Xuan Tan, Lam Nguyen Thiet, Tulsee Doshi, Sid Lall, Swaroop Mishra, Wanming Chen, Thang Luong, Seth Benjamin, Jasmine Lee, Ewa Andrejczuk, Dominik Rabiej, Vipul Ranjan, Krzysztof Styrc, Pengcheng Yin, Jon Simon, Malcolm Rose Harriott, Mudit Bansal, Alexei Robsky, Geoff Bacon, David Greene, Daniil Mirylenka, Chen Zhou, Obaid Sarvana, Abhimanyu Goyal, Samuel Andermatt, Patrick Siegler, Ben Horn, Assaf Israel, Francesco Pongetti, Chih-Wei "Louis" Chen, Marco Selvatici, Pedro Silva, Kathie Wang, Jackson Tolins, Kelvin Guu, Roey Yogev, Xiaochen Cai, Alessandro Agostini, Maulik Shah, Hung Nguyen, Noah Ó Donnaile, Sébastien Pereira, Linda Friso, Adam Stambler, Adam Kurzrok, Chenkai Kuang, Yan Romanikhin, Mark Geller, ZJ Yan, Kane Jang, Cheng-Chun Lee, 
Wojciech Fica, Eric Malmi, Qijun Tan, Dan Banica, Daniel Balle, Ryan Pham, Yanping Huang, Diana Avram, Hongzhi Shi, Jasjot Singh, Chris Hidey, Niharika Ahuja, Pranab Saxena, Dan Dooley, Srividya Pranavi Potharaju, Eileen O'Neill, Anand Gokulchandran, Ryan Foley, Kai Zhao, Mike Dusenberry, Yuan Liu, Pulkit Mehta, Ragha Kotikalapudi, Chalence Safranek-Shrader, Andrew Goodman, Joshua Kessinger, Eran Globen, Prateek Kolhar, Chris Gorgolewski, Ali Ibrahim, Yang Song, Ali Eichenbaum, Thomas Brovelli, Sahitya Potluri, Preethi Lahoti, Cip Baetu, Ali Ghorbani, Charles Chen, Andy Crawford, Shalini Pal, Mukund Sridhar, Petru Gurita, Asier Mujika, Igor Petrovski, Pierre-Louis Cedoz, Chenmei Li, Shiyuan Chen, Niccolò Dal Santo, Siddharth Goyal, Jitesh Punjabi, Karthik Kappaganthu, Chester Kwak, Pallavi LV, Sarmishta Velury, Himadri Choudhury, Jamie Hall, Premal Shah, Ricardo Figueira, Matt Thomas, Minjie Lu, Ting Zhou, Chintu Kumar, Thomas Jurdi, Sharat Chikkerur, Yenai Ma, Adams Yu, Soo Kwak, Victor Ähdel, Sujeevan Rajayogam, Travis Choma, Fei Liu, Aditya Barua, Colin Ji, Ji Ho Park, Vincent Hellendoorn, Alex Bailey, Taylan Bilal, Huanjie Zhou, Mehrdad Khatir, Charles Sutton, Wojciech Rzadkowski, Fiona Macintosh, Konstantin Shagin, Paul Medina, Chen Liang, Jinjing Zhou, Pararth Shah, Yingying Bi, Attila Dankovics, Shipra Banga, Sabine Lehmann, Marissa Bredesen, Zifan Lin, John Eric Hoffmann, Jonathan Lai, Raynald Chung, Kai Yang, Nihal Balani, Arthur Bražinskas, Andrei Sozanschi, Matthew Hayes, Héctor Fernández Alcalde, Peter Makarov, Will Chen, Antonio Stella, Liselotte Snijders, Michael Mandl, Ante Kärrman, Paweł Nowak, Xinyi Wu, Alex Dyck, Krishnan Vaidyanathan, Raghavender R, Jessica Mallet, Mitch Rudominer, Eric Johnston, Sushil Mittal, Akhil Udathu, Janara Christensen, Vishal Verma, Zach Irving, Andreas Santucci, Gamaleldin Elsayed, Elnaz Davoodi, Marin Georgiev, Ian Tenney, Nan Hua, Geoffrey Cideron, Edouard Leurent, Mahmoud Alnahlawi, Ionut Georgescu, Nan Wei, Ivy 
Zheng, Dylan Scandinaro, Heinrich Jiang, Jasper Snoek, Mukund Sundararajan, Xuezhi Wang, Zack Ontiveros, Itay Karo, Jeremy Cole, Vinu Rajashekhar, Lara Tumeh, Eyal Ben-David, Rishub Jain, Jonathan Uesato, Romina Datta, Oskar Bunyan, Shimu Wu, John Zhang, Piotr Stanczyk, Ye Zhang, David Steiner, Subhajit Naskar, Michael Azzam, Matthew Johnson, Adam Paszke, Chung-Cheng Chiu, Jaume Sanchez Elias, Afroz Mohiuddin, Faizan Muhammad, Jin Miao, Andrew Lee, Nino Vieillard, Jane Park, Jiageng Zhang, Jeff Stanway, Drew Garmon, Abhijit Karmarkar, Zhe Dong, Jong Lee, Aviral Kumar, Luowei Zhou, Jonathan Evens, William Isaac, Geoffrey Irving, Edward Loper, Michael Fink, Isha Arkatkar, Nanxin Chen, Izhak Shafran, Ivan Petrychenko, Zhe Chen, Johnson Jia, Anselm Levskaya, Zhenkai Zhu, Peter Grabowski, Yu Mao, Alberto Magni, Kaisheng Yao, Javier Snaider, Norman Casagrande, Evan Palmer, Paul Suganthan, Alfonso Castaño, Irene Giannoumis, Wooyeol Kim, Mikołaj Rybiński, Ashwin Sreevatsa, Jennifer Prendki, David Soergel, Adrian Goedeckemeyer, Willi Gierke, Mohsen Jafari, Meenu Gaba, Jeremy Wiesner, Diana Gage Wright, Yawen Wei, Harsha Vashisht, Yana Kulizhskaya, Jay Hoover, Maigo Le, Lu Li, Chimezie Iwuanyanwu, Lu Liu, Kevin Ramirez, Andrey Khorlin, Albert Cui, Tian LIN, Marcus Wu, Ricardo Aguilar, Keith Pallo, Abhishek Chakladar, Ginger Perng, Elena Allica Abellan, Mingyang Zhang, Ishita Dasgupta, Nate Kushman, Ivo Penchev, Alena Repina, Xihui Wu, Tom van der Weide, Priya Ponnapalli, Caroline Kaplan, Jiri Simsa, Shuangfeng Li, Olivier Dousse, Fan Yang, Jeff Piper, Nathan Ie, Rama Pasumarthi, Nathan Lintz, Anitha Vijayakumar, Daniel Andor, Pedro Valenzuela, Minnie Lui, Cosmin Paduraru, Daiyi Peng, Katherine Lee, Shuyuan Zhang, Somer Greene, Duc Dung Nguyen, Paula Kurylowicz, Cassidy Hardin, Lucas Dixon, Lili Janzer, Kiam Choo, Ziqiang Feng, Biao Zhang, Achintya Singhal, Dayou Du, Dan McKinnon, Natasha Antropova, Tolga Bolukbasi, Orgad Keller, David Reid, Daniel Finchelstein, Maria Abi 
Raad, Remi Crocker, Peter Hawkins, Robert Dadashi, Colin Gaffney, Ken Franko, Anna Bulanova, Rémi Leblond, Shirley Chung, Harry Askham, Luis C. Cobo, Kelvin Xu, Felix Fischer, Jun Xu, Christina Sorokin, Chris Alberti, Chu-Cheng Lin, Colin Evans, Alek Dimitriev, Hannah Forbes, Dylan Banarse, Zora Tung, Mark Omernick, Colton Bishop, Rachel Sterneck, Rohan Jain, Jiawei Xia, Ehsan Amid, Francesco Piccinno, Xingyu Wang, Praseem Banzal, Daniel J. Mankowitz, Alex Polozov, Victoria Krakovna, Sasha Brown, MohammadHossein Bateni, Dennis Duan, Vlad Firoiu, Meghana Thotakuri, Tom Natan, Matthieu Geist, Ser tan Girgin, Hui Li, Jiayu Ye, Ofir Roval, Reiko Tojo, Michael Kwong, James Lee-Thorp, Christopher Yew, Danila Sinopalnikov, Sabela Ramos, John Mellor, Abhishek Sharma, Kathy Wu, David Miller, Nicolas Sonnerat, Denis Vnukov, Rory Greig, Jennifer Beattie, Emily Caveness, Libin Bai, Julian Eisenschlos, Alex Korchemniy, Tomy Tsai, Mimi Jasarevic, Weize Kong, Phuong Dao, Zeyu Zheng, Frederick Liu, Fan Yang, Rui Zhu, Tian Huey Teh, Jason Sanmiya, Evgeny Gladchenko, Nejc Trdin, Daniel Toyama, Evan Rosen, Sasan Tavakkol, Linting Xue, Chen Elkind, Oliver Woodman, John Carpenter, George Papamakarios, Rupert Kemp, Sushant Kafle, Tanya Grunina, Rishika Sinha, Alice Talbert, Diane Wu, Denese Owusu-Afriyie, Cosmo Du, Chloe Thornton, Jordi Pont-Tuset, Pradyumna Narayana, Jing Li, Saaber Fatehi, John Wieting, Omar Ajmeri, Benigno Uria, Yeongil Ko, Laura Knight, Amélie Héliou, Ning Niu, Shane Gu, Chenxi Pang, Yeqing Li, Nir Levine, Ariel Stolovich, Rebeca Santamaria-Fernandez, Sonam Goenka, Wenny Yustalim, Robin Strudel, Ali Elqursh, Charlie Deck, Hyo Lee, Zonglin Li, Kyle Levin, Raphael Hoffmann, Dan Holtmann-Rice, Olivier Bachem, Sho Arora, Christy Koh, Soheil Hassas Yeganeh, Siim Põder, Mukarram Tariq, Yanhua Sun, Lucian Ionita, Mojtaba Seyedhosseini, Pouya Tafti, Zhiyu Liu, Anmol Gulati, Jasmine Liu, Xinyu Ye, Bart Chrzaszcz, Lily Wang, Nikhil Sethi, Tianrun Li, Ben Brown, Shreya Singh, 
Wei Fan, Aaron Parisi, Joe Stanton, Vinod Koverkathu, Christopher A. Choquette-Choo, Yunjie Li, TJ Lu, Abe Ittycheriah, Prakash Shroff, Mani Varadarajan, Sanaz Bahargam, Rob Willoughby, David Gaddy, Guillaume Desjardins, Marco Cornero, Brona Robenek, Bhavishya Mittal, Ben Albrecht, Ashish Shenoy, Fedor Moiseev, Henrik Jacobsson, Alireza Ghaffarkhah, Morgane Rivière, Alanna Walton, Clément Crepy, Alicia Parrish, Zongwei Zhou, Clement Farabet, Carey Radebaugh, Praveen Srinivasan, Claudia van der Salm, Andreas Fidjeland, Salvatore Scellato, Eri Latorre-Chimoto, Hanna Klimczak-Plucińska, David Bridson, Dario de Cesare, Tom Hudson, Piermaria Mendolicchio, Lexi Walker, Alex Morris, Matthew Mauger, Alexey Guseynov, Alison Reid, Seth Odoom, Lucia Loher, Victor Cotruta, Madhavi Yenugula, Dominik Grewe, Anastasia Petrushkina, Tom Duerig, Antonio Sanchez, Steve Yadlowsky, Amy Shen, Amir Globerson, Lynette Webb, Sahil Dua, Dong Li, Surya Bhupatiraju, Dan Hurt, Haroon Qureshi, Ananth Agarwal, Tomer Shani, Matan Eyal, Anuj Khare, Shreyas Rammohan Belle, Lei Wang, Chetan Tekur, Mihir Sanjay Kale, Jinliang Wei, Ruoxin Sang, Brennan Saeta, Tyler Liechty, Yi Sun, Yao Zhao, Stephan Lee, Pandu Nayak, Doug Fritz, Manish Reddy Vuyyuru, John Aslanides, Nidhi Vyas, Martin Wicke, Xiao Ma, Evgenii Eltyshev, Nina Martin, Hardie Cate, James Manyika, Keyvan Amiri, Yelin Kim, Xi Xiong, Kai Kang, Florian Luisier, Nilesh Tripuraneni, David Madras, Mandy Guo, Austin Waters, Oliver Wang, Joshua Ainslie, Jason Baldridge, Han Zhang, Garima Pruthi, Jakob Bauer, Feng Yang, Riham Mansour, Jason Gelman, Yang Xu, George Polovets, Ji Liu, Honglong Cai, Warren Chen, XiangHai Sheng, Emily Xue, Sherjil Ozair, Christof Angermueller, Xiaowei Li, Anoop Sinha, Weiren Wang, Julia Wiesinger, Emmanouil Koukoumidis, Yuan Tian, Anand Iyer, Madhu Gurumurthy, Mark Goldenson, Parashar Shah, MK Blake, Hongkun Yu, Anthony Urbanowicz, Jennimaria Palomaki, Chrisantha Fernando, Ken Durden, Harsh Mehta, Nikola Momchev, Elahe 
Rahimtoroghi, Maria Georgaki, Amit Raul, Sebastian Ruder, Morgan Redshaw, Jinhyuk Lee, Denny Zhou, Komal Jalan, Dinghua Li, Blake Hechtman, Parker Schuh, Milad Nasr, Kieran Milan, Vladimir Mikulik, Juliana Franco, Tim Green, Nam Nguyen, Joe Kelley, Aroma Mahendru, Andrea Hu, Joshua Howland, Ben Vargas, Jeffrey Hui, Kshitij Bansal, Vikram Rao, Rakesh Ghiya, Emma Wang, Ke Ye, Jean Michel Sarr, Melanie Moranski Preston, Madeleine Elish, Steve Li, Aakash Kaku, Jigar Gupta, Ice Pasupat, Da-Cheng Juan, Milan Someswar, Tejvi M., Xinyun Chen, Aida Amini, Alex Fabrikant, Eric Chu, Xuanyi Dong, Amruta Muthal, Senaka Buthpitiya, Sarthak Jauhari, Nan Hua, Urvashi Khandelwal, Ayal Hitron, Jie Ren, Larissa Rinaldi, Shahar Drath, Avigail Dabush, Nan-Jiang Jiang, Harshal Godhia, Uli Sachs, Anthony Chen, Yicheng Fan, Hagai Taitelbaum, Hila Noga, Zhuyun Dai, James Wang, Chen Liang, Jenny Hamer, Chun-Sung Ferng, Chenel Elkind, Aviel Atias, Paulina Lee, Vít Listík, Mathias Carlen, Jan van de Kerkhof, Marcin Pikus, Krunoslav Zaher, Paul Müller, Sasha Zykova, Richard Stefanec, Vitaly Gatsko, Christoph Hirnschall, Ashwin Sethi, Xingyu Federico Xu, Chetan Ahuja, Beth Tsai, Anca Stefanoiu, Bo Feng, Keshav Dhandhania, Manish Katyal, Akshay Gupta, Atharva Parulekar, Divya Pitta, Jing Zhao, Vivaan Bhatia, Yashodha Bhavnani, Omar Alhadlaq, Xiaolin Li, Peter Danenberg, Dennis Tu, Alex Pine, Vera Filippova, Abhipso Ghosh, Ben Limonchik, Bhargava Urala, Chaitanya Krishna Lanka, Derik Clive, Yi Sun, Edward Li, Hao Wu, Kevin Hongtongsak, Ianna Li, Kalind Thakkar, Kuanysh Omarov, Kushal Majmundar, Michael Alverson, Michael Kucharski, Mohak Patel, Mudit Jain, Maksim Zabelin, Paolo Pelagatti, Rohan Kohli, Saurabh Kumar, Joseph Kim, Swetha Sankar, Vineet Shah, Lakshmi Ramachandruni, Xiangkai Zeng, Ben Bariach, Laura Weidinger, Tu Vu, Amar Subramanya, Sissie Hsiao, Demis Hassabis, Koray Kavukcuoglu, Adam Sadovsky, Quoc Le, Trevor Strohman, Yonghui Wu, Slav Petrov, Jeffrey Dean, Oriol Vinyals,	(参考訳) 
本報告では、画像、音声、ビデオ、テキストの理解にわたって優れた能力を示す、新しいマルチモーダルモデルファミリーであるGeminiを紹介する。 GeminiファミリーはUltra、Pro、Nanoサイズで構成されており、複雑な推論タスクからオンデバイスメモリ制約のユースケースまで幅広い用途に適している。幅広いベンチマークでの評価により、最も高性能なGemini Ultraモデルが32のベンチマークのうち30で最先端の性能を更新したことが示された。特に、よく研究されている試験ベンチマークMMLUで人間の専門家レベルの性能を達成した最初のモデルであり、調査した20のマルチモーダルベンチマークのすべてで最先端の性能を改善している。 Geminiファミリーのクロスモーダル推論と言語理解における新機能によって、さまざまなユースケースが実現できると考えています。 Gemini、Gemini Advanced、Google AI Studio、Cloud Vertex AIなどのサービスを通じてGeminiモデルを責任ある形でユーザに提供するための、ポストトレーニングとデプロイに関するアプローチについて議論する。 This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.	翻訳日:2024-05-22 18:51:19 公開日:2024-05-20
# 予測器のハッキングは自動車のハッキングを意味する: 感度解析を用いた自動運転セキュリティのための軌道予測脆弱性の特定 Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security ( http://arxiv.org/abs/2401.10313v2 ) ライセンス: Link先を確認	Marsalis Gibson, David Babazadeh, Claire Tomlin, Shankar Sastry,	(参考訳) 学習に基づくマルチモーダル軌道予測器に対する敵対的攻撃はすでに実証されている。しかし、状態履歴以外の入力に対する摂動の影響や、これらの攻撃が下流の計画と制御にどのように影響するかについては、未解決の問題が残っている。本稿では、2つの軌道予測モデルである Trajectron++ と AgentFormer の感度解析を行う。この解析により、全ての入力の中で、両モデルの摂動感度のほぼ全てが最新の位置・速度状態に集中していることが明らかとなった。さらに、状態履歴の摂動に対する感度が支配的であるにもかかわらず、Fast Gradient Sign Method (高速勾配符号法) で作成した検出不可能な画像マップ摂動が、両方のモデルにおいて予測誤差の大幅な増加を誘発し、これらの軌道予測器が実際に画像ベース攻撃の影響を受けやすいことを示した。最適化ベースのプランナーと、感度解析の結果から作成した摂動例を用いて、これらの攻撃によって車両が中程度の走行速度から突然停止しうることを示す。 Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The analysis reveals that between all inputs, almost all of the perturbation sensitivities for both models lie only within the most recent position and velocity states. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models, revealing that these trajectory predictors are, in fact, susceptible to image-based attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how these attacks can cause a vehicle to come to a sudden stop from moderate driving speeds.	翻訳日:2024-05-22 18:41:35 公開日:2024-05-20
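The abstract above names the Fast Gradient Sign Method (FGSM). A minimal sketch of the generic FGSM step is given here on a toy linear regressor with squared-error loss, where the input gradient has a closed form. The model, the helper names (`sign`, `squared_error`, `fgsm`), and all numbers are assumptions for illustration; none of this reproduces the paper's attack on Trajectron++ or AgentFormer map inputs.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def squared_error(w, x, y):
    """Loss of a toy linear predictor w.x against target y."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (pred - y) ** 2


def fgsm(w, x, y, eps):
    """One FGSM step: move each input by eps along the sign of dLoss/dx."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    # For loss = (w.x - y)^2 the gradient with respect to x is 2 (w.x - y) w.
    grad = [2.0 * (pred - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w = [1.0, -2.0, 0.5]   # hypothetical model weights
    x = [0.2, 0.1, -0.4]   # hypothetical clean input
    y = 1.0                # hypothetical target
    x_adv = fgsm(w, x, y, eps=0.05)
    print(squared_error(w, x, y), squared_error(w, x_adv, y))
```

The perturbation is bounded by `eps` in every coordinate, which is why an FGSM perturbation of an image-like input can stay visually small while still increasing the loss.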
# ニューラルネットワークを用いた大気密度適応による高精度火星突入航法 Precision Mars Entry Navigation with Atmospheric Density Adaptation via Neural Networks ( http://arxiv.org/abs/2401.14411v2 ) ライセンス: Link先を確認	Felipe Giraldo-Grueso, Andrey A. Popov, Renato Zanetti,	(参考訳) 火星大気に突入する宇宙機は、動的で不確実な大気環境において機体の位置と速度を正確に推定できる高精度な航法アルゴリズムを必要とする。真の火星大気密度とオンボード密度モデルとの相違は、宇宙機の突入航法フィルタの性能を著しく損なう可能性がある。この研究は、ニューラルネットワークを用いて大気密度を推定し、推定の不確かさをコンシダー解析 (consider analysis) で考慮する、火星突入のためのオンラインフィルタリングの新しいアプローチを導入する。ネットワークは指数関数的な大気密度モデルで訓練され、そのパラメータは、真の密度と推定密度のミスマッチを補うために、リアルタイムで動的に適応される。ネットワークの適応は、フィルタの観測イノベーションを利用して最適なネットワークパラメータを同定する最尤問題として定式化される。最尤法の枠組みにおいて、ニューラルネットワークを組み込むことで、機械学習分野で効率のよさが知られている確率的オプティマイザが利用可能になる。様々な現実的な火星突入航法シナリオにおいて、共分散マッチングと、状態拡張・補正の2つのオンライン適応アプローチとの性能比較を行った。その結果、他の手法と比較して推定精度が優れており、摂動を加えたMars-GRAMデータからサンプリングした多様な現実的火星大気に対して、推定密度が精密に一致することが示された。 Spacecraft entering Mars require precise navigation algorithms capable of accurately estimating the vehicle's position and velocity in dynamic and uncertain atmospheric environments. Discrepancies between the true Martian atmospheric density and the onboard density model can significantly impair the performance of spacecraft entry navigation filters. This work introduces a new approach to online filtering for Martian entry using a neural network to estimate atmospheric density and employing a consider analysis to account for the uncertainty in the estimate. The network is trained on an exponential atmospheric density model, and its parameters are dynamically adapted in real time to account for any mismatch between the true and estimated densities. The adaptation of the network is formulated as a maximum likelihood problem by leveraging the measurement innovations of the filter to identify optimal network parameters. Within the context of the maximum likelihood approach, incorporating a neural network enables the use of stochastic optimizers known for their efficiency in the machine learning domain. 
Performance comparisons are conducted against two online adaptive approaches, covariance matching and state augmentation and correction, in various realistic Martian entry navigation scenarios. The results show superior estimation accuracy compared to other approaches, and precise alignment of the estimated density with a broad selection of realistic Martian atmospheres sampled from perturbed Mars-GRAM data.	翻訳日:2024-05-22 18:41:35 公開日:2024-05-20
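The abstract above says the network is trained on an exponential atmospheric density model. The sketch below shows only the textbook form of such a model, rho(h) = rho0 * exp(-h / H). The function name `exponential_density`, the surface density of 0.02 kg/m^3, and the scale height of 11.1 km are illustrative assumptions roughly in the range usually quoted for Mars; they are not values taken from the paper.

```python
import math


def exponential_density(altitude_m, rho0=0.02, scale_height_m=11100.0):
    """Density in kg/m^3 at a given altitude under an exponential model.

    rho0 and scale_height_m are assumed illustrative values, not the
    parameters used by the authors.
    """
    return rho0 * math.exp(-altitude_m / scale_height_m)


if __name__ == "__main__":
    for altitude_km in (0, 10, 30, 60):
        rho = exponential_density(altitude_km * 1000.0)
        print(f"{altitude_km:3d} km: {rho:.3e} kg/m^3")
```

Because drag scales linearly with density, any gap between this kind of onboard model and the true atmosphere feeds directly into the filter's predicted accelerations, which is the mismatch the abstract's adaptation scheme targets.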
# 予見的影響評価を支援するLLMの能力評価 Evaluating the Capabilities of LLMs for Supporting Anticipatory Impact Assessment ( http://arxiv.org/abs/2401.18028v2 ) ライセンス: Link先を確認	Mowafak Allaham, Nicholas Diakopoulos,	(参考訳) 社会における新興の人工知能(AI)技術の潜在的なネガティブな影響に関する洞察を得ることは、予見的ガバナンス (anticipatory governance) アプローチを実装する上での課題である。このような洞察を生み出すための1つのアプローチは、新興技術の望ましくない結果の範囲を発想し探求する過程で専門家を支援し、ガイドするために、LLM(Large Language Models)を使用することである。しかし、このようなタスクに対するLLMの性能評価は、生成した影響の一般的な品質だけでなく、生成した影響の種類の幅やそれに伴うバイアスの調査も含めて、依然として必要である。本稿では、ニュースメディアの多様な記事サンプルで補完モデル(GPT-3, Mistral-7B)を微調整することで、社会におけるAIの高品質で多様な影響を生成できる可能性を示し、その出力をインストラクションベースモデル(GPT-4, Mistral-7B-Instruct)が生成した影響と比較する。生成された影響を一貫性、構造、関連性、妥当性の観点から検討し、ニュースメディア由来の影響で微調整した小型オープンソースモデルMistral-7Bによる影響は、GPT-4のようなより高性能で大規模なモデルが生成した影響と質的に同等である傾向を示した。さらに、インストラクションベースモデルによる影響は、微調整モデルと比較して、一部の影響カテゴリの生成に欠落があることが判明した。この研究は、最先端のLLMが生み出す影響の範囲における潜在的なバイアスと、予見的ガバナンスアプローチを支援するために、より高品質で多様な影響を生み出すスケーラブルな代替手段として、より小さなLLMをニュースメディアに整合させる可能性を強調している。 Gaining insight into the potential negative impacts of emerging Artificial Intelligence (AI) technologies in society is a challenge for implementing anticipatory governance approaches. One approach to produce such insight is to use Large Language Models (LLMs) to support and guide experts in the process of ideating and exploring the range of undesirable consequences of emerging technologies. However, performance evaluations of LLMs for such tasks are still needed, including examining the general quality of generated impacts but also the range of types of impacts produced and resulting biases. In this paper, we demonstrate the potential for generating high-quality and diverse impacts of AI in society by fine-tuning completion models (GPT-3 and Mistral-7B) on a diverse sample of articles from news media and comparing those outputs to the impacts generated by instruction-based (GPT-4 and Mistral-7B-Instruct) models. 
We examine the generated impacts for coherence, structure, relevance, and plausibility and find that the generated impacts using Mistral-7B, a small open-source model fine-tuned on impacts from the news media, tend to be qualitatively on par with impacts generated using a more capable and larger scale model such as GPT-4. Moreover, we find that impacts produced by instruction-based models had gaps in the production of certain categories of impacts in comparison to fine-tuned models. This research highlights a potential bias in the range of impacts generated by state-of-the-art LLMs and the potential of aligning smaller LLMs on news media as a scalable alternative to generate high quality and more diverse impacts in support of anticipatory governance approaches.	翻訳日:2024-05-22 18:41:35 公開日:2024-05-20
# マルチモーダル学習のためのテキスト中心アライメント Text-centric Alignment for Multi-Modality Learning ( http://arxiv.org/abs/2402.08086v2 ) ライセンス: Link先を確認	Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, Shou-De Lin,	(参考訳) 本研究では,マルチモーダル学習におけるモダリティミスマッチの課題について考察する。テキスト中心アライメント・フォー・マルチモーダル・ラーニング(TAMML)アプローチは,Large Language Models(LLM)とインコンテキスト・ラーニングと基礎モデルを用いて,これらの条件下でのマルチモーダルシステムの一般化性を高める手法である。テキストのユニークな性質を統一意味空間として活用することにより、TAMMLは目に見えない、多様性があり、予測不可能なモダリティの組み合わせを扱う上で、大幅な改善を示す。 TAMMLは様々なモダリティに適応するだけでなく、堅牢なパフォーマンスも維持し、埋め込み表現における従来の固定モードフレームワークの限界を克服する基礎モデルの可能性を示している。本研究は,モダリティの可用性が動的で不確実な実世界のアプリケーションに対して,フレキシブルで効果的なソリューションを提供することによって,この分野に寄与する。 This research paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available during inference differ from those available at training. We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach, an innovative method that utilizes Large Language Models (LLMs) with in-context learning and foundation models to enhance the generalizability of multimodal systems under these conditions. By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations. TAMML not only adapts to varying modalities but also maintains robust performance, showcasing the potential of foundation models in overcoming the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the field by offering a flexible, effective solution for real-world applications where modality availability is dynamic and uncertain.	翻訳日:2024-05-22 18:31:52 公開日:2024-05-20
# 不規則時系列データ解析における安定なニューラル確率微分方程式 Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data ( http://arxiv.org/abs/2402.14989v3 ) ライセンス: Link先を確認	YongKyung Oh, Dongyoung Lim, Sungil Kim,	(参考訳) 実世界の時系列データにおける不規則なサンプリング間隔と欠損値は、一定の間隔と完全なデータを仮定する従来の手法にとって課題となる。ニューラル常微分方程式(Neural Ordinary Differential Equations (Neural ODEs))は、ODEソルバと組み合わせたニューラルネットワークを利用し、パラメータ化されたベクトル場を通して連続的な潜在表現を学習する代替アプローチを提供する。ニューラル確率微分方程式(Neural Stochastic Differential Equations (Neural SDEs))は、拡散項を組み込むことでNeural ODEを拡張するが、特に不規則な間隔や欠損値を扱う場合、この追加は自明ではない。したがって、ドリフト関数と拡散関数の慎重な設計は安定性の維持と性能の向上に不可欠であり、不用意な選択は、強い解の欠如、確率的不安定化、不安定なオイラー離散化などの望ましくない性質をもたらし、Neural SDEの性能に大きな影響を及ぼす可能性がある。本研究では、Langevin型SDE、Linear Noise SDE、Geometric SDEの3つの安定なNeural SDEのクラスを提案する。そして、過学習を効果的に防ぎつつ、分布シフト下でも優れた性能を維持する頑健性を厳密に示す。提案手法の有効性を評価するため、補間、予測、分類タスクの4つのベンチマークデータセットに対して広範な実験を行い、異なる欠損率の下で30個の公開データセットを用いて手法のロバスト性を解析した。本研究の結果は、実世界の不規則時系列データを扱う上での提案手法の有効性を示すものである。 Irregular sampling intervals and missing values in real-world time series data present challenges for conventional methods that assume consistent intervals and complete data. Neural Ordinary Differential Equations (Neural ODEs) offer an alternative approach, utilizing neural networks combined with ODE solvers to learn continuous latent representations through parameterized vector fields. Neural Stochastic Differential Equations (Neural SDEs) extend Neural ODEs by incorporating a diffusion term, although this addition is not trivial, particularly when addressing irregular intervals and missing values. Consequently, careful design of drift and diffusion functions is crucial for maintaining stability and enhancing performance, while incautious choices can result in adverse properties such as the absence of strong solutions, stochastic destabilization, or unstable Euler discretizations, significantly affecting Neural SDEs' performance. In this study, we propose three stable classes of Neural SDEs: Langevin-type SDE, Linear Noise SDE, and Geometric SDE. 
Then, we rigorously demonstrate their robustness in maintaining excellent performance under distribution shift, while effectively preventing overfitting. To assess the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets for interpolation, forecasting, and classification tasks, and analyze the robustness of our methods with 30 public datasets under different missing rates. Our results demonstrate the efficacy of the proposed method in handling real-world irregular time series data.	翻訳日:2024-05-22 18:31:52 公開日:2024-05-20
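
The role of the discretization on irregular time stamps can be illustrated with a minimal sketch. It integrates an SDE of the form dz = f(z) dt + σ dW with the Euler–Maruyama scheme over an uneven time grid. The linear drift, the constant noise level, and the function name `euler_maruyama` are illustrative assumptions and are not taken from the paper, whose drift and diffusion terms are neural networks.

```python
import numpy as np

def euler_maruyama(drift, sigma, z0, times, rng):
    """Integrate dz = drift(z) dt + sigma dW over possibly irregular `times`."""
    z = np.empty((len(times), len(z0)))
    z[0] = z0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]                 # step size varies per observation
        dw = rng.normal(0.0, np.sqrt(dt), size=len(z0))  # Brownian increment ~ N(0, dt)
        z[i] = z[i - 1] + drift(z[i - 1]) * dt + sigma * dw
    return z

# Irregularly spaced observation times (hypothetical values).
times = np.array([0.0, 0.1, 0.15, 0.4, 0.45, 1.0])
drift = lambda z: -z                                 # assumed stand-in for a learned drift
path = euler_maruyama(drift, 0.0, np.array([1.0]), times, np.random.default_rng(0))
```

With `sigma` set to zero the scheme reduces to explicit Euler, so each step multiplies the state by `1 - dt`; a large gap between observations therefore directly affects the stability of the update, which is one reason the choice of drift and diffusion matters for irregular data.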
# NiNformer: トークンミキシング生成ゲーティング機能を備えたネットワーク・イン・ネットワーク・トランスフォーマー NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function ( http://arxiv.org/abs/2403.02411v3 ) ライセンス: Link先を確認	Abdullah Nazhat Abdullah, Tarkan Aydin,	(参考訳) 注意機構はトランスフォーマーアーキテクチャの主要なコンポーネントであり、導入以来、多くのドメインと複数のタスクにまたがるディープラーニングの大幅な進歩につながっている。注意機構はコンピュータビジョンにおいてビジョントランスフォーマー(ViT)として利用され、その用途は、分類、セグメンテーション、オブジェクト検出、画像生成など、視覚領域の多くのタスクに拡張されている。このメカニズムは非常に表現力があり能力があるが、計算コストが高く、効率的な最適化のためにかなりのサイズのデータセットを必要とするという欠点がある。これらの欠点に対処するために、計算負担を減らし、データサイズ要件を緩和する多くの設計が文献で提案されている。視覚領域におけるこのような試みの例としては、MLP-Mixer、Conv-Mixer、Perceiver-IOなどがある。本稿では,通常の注意層を,MLP-Mixerの静的アプローチをトークン・ミキシング・プロセスによって要素ワイド・ゲーティング関数を学習する動的システムで強化したネットワーク・イン・ネットワーク構造に置き換えることで計算負担を減らす,標準的なViTブロックに代わる新しい計算ブロックを提案する。広範な実験により,提案設計は視覚領域の画像分類タスクに適用された複数のデータセットにおいてベースラインアーキテクチャよりも優れた性能を示した。 The attention mechanism is the main component of the transformer architecture, and since its introduction, it has led to significant advancements in deep learning that span many domains and multiple tasks. The attention mechanism was utilized in computer vision as the Vision Transformer (ViT), and its usage has expanded into many tasks in the vision domain, such as classification, segmentation, object detection, and image generation. While this mechanism is very expressive and capable, it comes with the drawback of being computationally expensive and requiring datasets of considerable size for effective optimization. To address these shortcomings, many designs have been proposed in the literature to reduce the computational burden and alleviate the data size requirements. Examples of such attempts in the vision domain are the MLP-Mixer, the Conv-Mixer, the Perceiver-IO, and many more. 
This paper introduces a new computational block as an alternative to the standard ViT block that reduces the compute burdens by replacing the normal attention layers with a Network in Network structure that enhances the static approach of the MLP-Mixer with a dynamic system of learning an element-wise gating function by a token mixing process. Extensive experimentation shows that the proposed design provides better performance than the baseline architectures on multiple datasets applied in the image classification task of the vision domain.	翻訳日:2024-05-22 18:22:08 公開日:2024-05-20
# 連鎖蒸留における相互情報の最大化のための学習 Learning to Maximize Mutual Information for Chain-of-Thought Distillation ( http://arxiv.org/abs/2403.03348v2 ) ライセンス: Link先を確認	Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding,	(参考訳) 知識蒸留は、大規模で複雑なモデルからより小さなモデルへ知識を伝達する技術であり、効率的なAIデプロイメントに向けた重要なステップである。 CoT蒸留を利用した新しい手法であるDistilling Step-by-Step (DSS) は、より大型のモデルに対して優れた推論能力を持つ小型モデルを投入することで、約束を証明している。 DSSでは、蒸留されたモデルは、マルチタスク学習フレームワークを通じて合理性を生成し、ラベルを同時に予測する能力を取得する。しかし、DSSは2つのトレーニングタスクの本質的な関係を見落とし、CoT知識とラベル予測のタスクの非効率な統合につながる。そこで本研究では,この2つのタスクの相互関係をインフォメーション・ボトルネックの観点から検討し,それら2つのタスクの表現特徴の相互情報の最大化として定式化する。本稿では,この最適化問題を学習に基づく手法を用いて解くための変分手法を提案する。 4つのデータセットにまたがる実験結果から,本手法は最先端DSSよりも優れていることが示された。本研究は,言語モデルの蒸留およびCoTの応用に関する今後の研究に対する洞察に富んだガイダンスを提供する。コードとモデルはまもなくリリースされる。 Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. 
Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code and models will be released soon.	翻訳日:2024-05-22 18:22:08 公開日:2024-05-20
# SPTNet:空間プロンプトチューニングによる一般化カテゴリー発見のための効率的な代替フレームワーク SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning ( http://arxiv.org/abs/2403.13684v2 ) ライセンス: Link先を確認	Hongjun Wang, Sagar Vaze, Kai Han,	(参考訳) Generalized Category Discovery (GCD) は、ラベル付き 'seen' クラスのイメージのセットから知識を転送することで、'seen' クラスと 'unseen' クラスの両方の未ラベルのイメージを分類することを目的としている。既存のGCDのアプローチにおける重要なテーマは、GCDタスクのために大規模な事前訓練されたモデルを適応させることである。しかし、代替的な視点は、データ表現自体を事前訓練されたモデルとの整合性が高まるように適応させることである。そこで本研究では,モデルパラメータ(モデルのファインチューニング)とデータパラメータ(プロンプト学習)を反復的に最適化する,SPTNetと呼ばれる2段階適応手法を提案する。さらに,画像データの空間特性を考慮した空間的プロンプトチューニング手法(SPT)を提案する。これにより,seenクラスとunseenクラスの間で転移可能なオブジェクトの部位により良く注目できる。我々は,SPTNetを標準ベンチマークで徹底的に評価し,既存のGCD法よりも優れていることを示す。特に, 従来の最先端手法を約10%上回り, SSBの平均精度は61.4%であることがわかった。我々の手法で追加されるパラメータはバックボーンアーキテクチャのパラメータのわずか0.117%であるため、この改善は特に顕著である。プロジェクトページ: https://visual-ai.github.io/sptnet Generalized Category Discovery (GCD) aims to classify unlabelled images from both 'seen' and 'unseen' classes by transferring knowledge from a set of labelled 'seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. 
The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.	翻訳日:2024-05-22 18:12:24 公開日:2024-05-20
# 分布シフトを伴う半空間の交叉の学習:改良アルゴリズムとSQ下界 Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds ( http://arxiv.org/abs/2404.02364v2 ) ライセンス: Link先を確認	Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan,	(参考訳) Klivans, Stavropoulos, Vasilyanの最近の研究は、分布シフトを伴うテスト可能な学習(TDS学習)の研究を開始した。そこでは、学習者にトレーニング分布$\mathcal{D}$からのラベル付きサンプルとテスト分布$\mathcal{D}'$からのラベルなしサンプルが与えられ、トレーニングサンプルが対応するテストを通過するたびに$\mathcal{D}'$上で低誤差の分類器を出力することを目的としている。このモデルは、$\mathcal{D}'$に何の仮定も置かないという点で、先行研究のすべてと異なる。代わりに、トレーニング分布とテスト分布の周辺分布が等しい場合、テストは(高い確率で)受理しなければならない。ここでは、ガウス訓練分布に関する半空間の交叉という基本的なケースに注目し、$k$個の同次半空間の交叉を精度$\epsilon$でTDS学習する$2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$時間アルゴリズム(先行研究は$d^{(k/\epsilon)^{O(1)}}$)を含む様々な新しい上界を証明する。ガウス訓練分布が正例と負例の両方を少なくとも$\epsilon$の割合で含む($\epsilon$-balanced)という緩い仮定の下で議論する。また、TDS学習問題に対する初めてのSQ下界を証明し、(1) 1つの半空間に対する$\mathsf{poly}(d,1/\epsilon)$時間のTDS学習には$\epsilon$-balanced仮定が必要であること、(2) 2つの一般の半空間の交叉に対しては$\epsilon$-balanced仮定の下でも$d^{\tilde{\Omega}(\log 1/\epsilon)}$の下界が成り立つことを示す。我々の技術は、TDS学習のツールキットを大きく拡張する。我々は次元削減と被覆を用いて、ドメイン適応の文献における重要な指標であるdiscrepancy distanceの局所化版を計算するための効率的なアルゴリズムを与える。 Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution $\mathcal{D}$, unlabeled samples from test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. 
Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (prior work achieved $d^{(k/\epsilon)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning for a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.	翻訳日:2024-05-22 18:12:24 公開日:2024-05-20
# 間主観性は証明されたか? - KhrennikovとQBistへの回答 Is Intersubjectivity Proven? A Reply to Khrennikov and to QBists ( http://arxiv.org/abs/2404.04367v2 ) ライセンス: Link先を確認	Herve Zwirn,	(参考訳) 最近の2つの論文において、Khrennikovは、彼が「小澤の間主観性定理」(Ozawa intersubjectivity theorem) と呼ぶものを用いて、間主観性は量子力学において必然的に成り立つと主張し、QBismおよびより一般に観点依存的(perspectival)なすべての解釈を批判している。以前の2つのQBist論文と同じく、Khrennikovの証明が有効でない理由をここで説明する。ただし、これらの論文の一方とは対照的に、QBismにおける間主観性の扱い方も批判する。 In two recent papers Khrennikov uses what he calls the Ozawa intersubjectivity theorem to claim that intersubjectivity is necessarily verified in quantum mechanics and to criticize QBism and more generally all interpretations that are perspectival. In agreement with two previous QBist papers, I explain here why Khrennikov's proof is not valid but, in contrast with one of these papers, I criticize the way intersubjectivity is dealt with in QBism.	翻訳日:2024-05-22 18:12:24 公開日:2024-05-20
# 大規模言語モデルを用いた読解テスト項目の自動生成と評価 Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models ( http://arxiv.org/abs/2404.07720v2 ) ライセンス: Link先を確認	Andreas Säuberli, Simon Clematide,	(参考訳) 読解テストは、教育から平易化されたテキストの理解しやすさの評価まで、様々なアプリケーションで使用されている。しかし、このようなテストを手動で作成し、品質を保証することは難しく、時間を要する。本稿では,大規模言語モデル(LLM)を用いて,多肢選択式の読解項目の生成と評価を行う方法を検討する。そこで我々は,ドイツ語読解項目のデータセットを構築し,推測可能性と解答可能性に基づくテキスト情報伝達度(text informativity)と呼ばれる指標を含む,人間による評価と自動評価のための新しいプロトコルを開発した。次に、このプロトコルとデータセットを用いて、Llama 2 と GPT-4 で生成された項目の品質を評価した。以上の結果から,両モデルともゼロショット設定で許容できる品質の項目を生成できることが示唆されるが,GPT-4はLlama 2より明らかに優れていた。また, LLMから項目への回答を引き出すことで, LLMを自動評価に利用できることも示す。このシナリオでは、GPT-4による評価結果が人間のアノテータに最も近かった。全体として、LLMによるゼロショット生成は、特に大量のデータが利用できない言語において、読解テスト項目の生成と評価の有望なアプローチである。 Reading comprehension tests are used in a variety of applications, reaching from education to assessing the comprehensibility of simplified texts. However, creating such tests manually and ensuring their quality is difficult and time-consuming. In this paper, we explore how large language models (LLMs) can be used to generate and evaluate multiple-choice reading comprehension items. To this end, we compiled a dataset of German reading comprehension items and developed a new protocol for human and automatic evaluation, including a metric we call text informativity, which is based on guessability and answerability. We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4. Our results suggest that both models are capable of generating items of acceptable quality in a zero-shot setting, but GPT-4 clearly outperforms Llama 2. We also show that LLMs can be used for automatic evaluation by eliciting item responses from them. In this scenario, evaluation results with GPT-4 were the most similar to human annotators. Overall, zero-shot generation with LLMs is a promising approach for generating and evaluating reading comprehension test items, in particular for languages without large amounts of available data.	
翻訳日:2024-05-22 18:02:40 公開日:2024-05-20
# Wasserstein Wormhole: 変圧器を用いたスケーラブルな最適輸送距離 Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers ( http://arxiv.org/abs/2404.09411v3 ) ライセンス: Link先を確認	Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cassandra Burdziak, Tal Nawy, Anna Gilbert, Dana Pe'er,	(参考訳) 最適輸送(OT)と関連するワッサーシュタイン計量(W)は、分布を比較するための強力でユビキタスなツールである。しかし、コホートサイズが大きくなるにつれて、ペアワイズワッサースタイン距離の計算は急速に困難になる。魅力的な選択肢は、標準多次元スケーリング(MDS)と同様、ユークリッド距離をOT距離にペアでマッピングする埋め込み空間を見つけることである。我々は、変圧器をベースとした自己エンコーダであるワッサーシュタイン・ワームホール(Wasserstein Wormhole)を、ユークリッド距離がOT距離に近似する潜在空間に経験的分布を埋める。 MDS理論を拡張して、目的関数は非ユークリッド距離を埋め込む際に発生する誤差の有界性を示すことを示す。実験的に、ワームホール埋め込み間の距離はワッサーシュタイン距離と密接に一致し、OT距離の線形時間計算を可能にした。 Wasserstein Wormholeは、分散を埋め込みにマッピングするエンコーダとともに、埋め込みを分布にマッピングするデコーダを含み、埋め込み空間内の操作をWasserstein Barycenter EstimationやOT補間といったOT空間に一般化することができる。スケーラビリティと解釈可能性をOTアプローチに貸すことで、Wasserstein Wormholeは計算幾何学と単細胞生物学の分野におけるデータ解析の新たな道を開く。 Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear time computation of OT distances. 
Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology.	翻訳日:2024-05-22 18:02:40 公開日:2024-05-20
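
To make the scaling problem in this abstract concrete, the following minimal sketch computes exact pairwise Wasserstein-1 distances for a small cohort of one-dimensional empirical distributions. It is not the Wormhole model: the restriction to 1-D, equal-size samples (where the distance reduces to comparing sorted samples) and the names `wasserstein_1d` and `pairwise_ot` are assumptions made only for illustration.

```python
import numpy as np
from itertools import combinations

def wasserstein_1d(x, y):
    """Exact W1 between two equal-size 1-D empirical distributions (sorted-sample formula)."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def pairwise_ot(cohort):
    """Fill the full distance matrix; needs n * (n - 1) / 2 separate OT evaluations."""
    n = len(cohort)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = wasserstein_1d(cohort[i], cohort[j])
    return dist

rng = np.random.default_rng(0)
cohort = [rng.normal(loc=m, size=200) for m in (0.0, 1.0, 3.0)]  # hypothetical cohort
D = pairwise_ot(cohort)
```

The number of OT evaluations grows quadratically with the cohort size, and in higher dimensions each evaluation is itself far more expensive than this sorted-sample shortcut. An embedding whose Euclidean distances approximate these values replaces each evaluation with a vector norm, which is the motivation the abstract gives.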
# AIインタフェースにおけるデザインパターンとの相互作用による特徴付けとモデリング Characterizing and modeling harms from interactions with design patterns in AI interfaces ( http://arxiv.org/abs/2404.11370v3 ) ライセンス: Link先を確認	Lujain Ibrahim, Luc Rocher, Ana Valdivia,	(参考訳) 人工知能(AI)システムを用いたアプリケーションの普及は、洗練されたインターフェースを通じてこれらのシステムと対話するユーザの増加につながっている。ヒューマンコンピュータインタラクションの研究は、ユーザー行動と技術的能力とリスクに対するユーザーの認識の両方を形作るインターフェースを長年にわたって示してきた。しかし、AIシステムの社会的および倫理的リスクを評価する実践者や研究者は、人間とAIの相互作用に対する人為的、欺く、没入的なインターフェースの影響を見落としてしまう傾向にある。ここでは,適応型AIシステムを用いたインタフェースの設計は,従来考えられていた以上のフィードバックループによって,カスケード効果をもたらす可能性がある,と論じる。まず、AIインターフェース設計のスコーピングレビューを行い、AIインターフェースに潜在的に有害なデザインパターンの有害なテーマを抽出する。そこで我々は,AIインタフェース設計における影響評価を構造化し,促進する概念モデルとして,AIシステムの設計強化制御(DECAI)を提案する。 DECAIは制御系理論(動的物理系の解析と設計の理論)の原則に基づいて、ヒューマンAIシステムにおけるインターフェースの役割を解明する。推薦システムと対話型言語モデルシステムに関する2つのケーススタディを通じて、AIインタフェース設計の評価にDECAIをどのように利用できるかを示す。 The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. 
DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.	翻訳日:2024-05-22 18:02:40 公開日:2024-05-20
# STaRK: テキストと関係知識に基づくLLM検索のベンチマーク STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases ( http://arxiv.org/abs/2404.13207v2 ) ライセンス: Link先を確認	Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec,	(参考訳) 複雑な製品検索のような現実世界の複雑なクエリに答えるには、構造化されていない情報(例:製品のテキスト記述)と構造化された情報(例:製品の実体関係)が混在する、半構造化された知識ベースからの正確な検索が必要となることが多い。しかし、以前の研究はテキスト検索と関係検索をほとんど個別のトピックとして研究していた。このギャップに対処するため,テキストおよびリレーショナル知識ベース上の大規模半構造化検索ベンチマークSTARKを開発した。本ベンチマークは, 製品検索, 学術論文検索, 精密医療におけるクエリの3つのドメイン/データセットを対象とする。多様なリレーショナル情報と複雑なテキスト特性を統合した現実的なユーザクエリを,その正解(アイテム)とともに合成する,新たなパイプラインを設計する。我々は,合成クエリの品質を検証するために,厳密な人的評価を行う。さらに、高品質な人手によるクエリでベンチマークを強化し、真正な参照を提供する。 STARKは、大規模言語モデル(LLM)によって駆動される検索システムの性能を評価するための総合的なテストベッドとして機能する。実験の結果,STARKは現在の検索システムとLLMシステムに重大な課題を呈し,より有能な検索システムの構築の必要性が示唆された。ベンチマークデータとコードは https://github.com/snap-stanford/stark で公開されている。 Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. 
STARK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STARK presents significant challenges to the current retrieval and LLM systems, indicating the demand for building more capable retrieval systems. The benchmark data and code are available on https://github.com/snap-stanford/stark.	翻訳日:2024-05-22 18:02:40 公開日:2024-05-20
# 健全な近似を用いたニューラルネットワーク動的モデルのリアルタイム安全制御 Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation ( http://arxiv.org/abs/2404.13456v2 ) ライセンス: Link先を確認	Hanjiang Hu, Jianglin Lan, Changliu Liu,	(参考訳) ニューラルネットワークダイナミックモデル(NNDM)の安全な制御は、ロボット工学や多くの応用において重要である。しかし、NNDMの最適安全制御をリアルタイムに計算することは依然として困難である。実時間計算を実現するために,NNDMの健全な近似を制御合成に用いることを提案する。特に、NNDMにおけるReLU活性化関数のBernstein多項式オーバー近似(BPO)に基づくBernstein over-approximated Neural Dynamics(BOND)を提案する。近似による誤差を軽減し、安全制御問題の持続的な実行可能性を確保するために、NNDMのBPO緩和の範囲内で最も安全でない近似状態を用いて、最悪ケースの安全性指標をオフラインで合成する。オンラインのリアルタイム最適化では、非線形な最悪ケース安全制約の1次テイラー近似を、高次剰余項に対する l2 有界なバイアス項を持つ NNDM の追加の線形層として定式化する。異なるニューラルダイナミクスと安全性制約による総合的な実験により、安全性を保証したうえで、健全な近似を用いたNNDMは、混合整数計画法(MIP)を用いた安全制御ベースラインよりも10～100倍高速であり、最悪ケース安全性指標の有効性と、提案したBONDのリアルタイム大規模設定におけるスケーラビリティが検証された。コードは https://github.com/intelligent-control-lab/BOND で公開されている。 Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDM. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDM. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of NNDM offline. For the online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of NNDM with the l2 bounded bias term for the higher-order remainder. 
Comprehensive experiments with different neural dynamics and safety constraints show that with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and scalability of the proposed BOND in real-time large-scale settings. The code is available at https://github.com/intelligent-control-lab/BOND.	翻訳日:2024-05-22 18:02:40 公開日:2024-05-20
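
Why a Bernstein polynomial gives a sound over-approximation of ReLU can be checked numerically with a short sketch. For a convex function, the Bernstein polynomial on an interval never falls below the function (a consequence of Jensen's inequality), and ReLU is convex. The interval, the degree, and the function name `bernstein_relu` below are illustrative assumptions; the exact BPO construction used in BOND may differ.

```python
import numpy as np
from math import comb

def relu(x):
    return np.maximum(x, 0.0)

def bernstein_relu(x, lower, upper, degree):
    """Evaluate the degree-`degree` Bernstein polynomial of ReLU on [lower, upper]."""
    t = (np.asarray(x, dtype=float) - lower) / (upper - lower)   # map to [0, 1]
    k = np.arange(degree + 1)
    nodes = lower + (upper - lower) * k / degree                 # equally spaced nodes
    coeffs = relu(nodes)                                         # Bernstein coefficients f(node_k)
    binom = np.array([comb(degree, i) for i in k], dtype=float)
    basis = binom * t[..., None] ** k * (1.0 - t[..., None]) ** (degree - k)
    return basis @ coeffs

xs = np.linspace(-1.0, 2.0, 301)           # assumed pre-activation range [-1, 2]
gap = bernstein_relu(xs, -1.0, 2.0, degree=4) - relu(xs)
```

Here `gap` is non-negative over the whole interval and vanishes at both endpoints, so replacing ReLU by the polynomial can only over-estimate the activation, which is the sense in which such a relaxation is sound.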
# トラップイオンを用いたディジタルアナログ反断熱量子最適化 Digital-Analog Counterdiabatic Quantum Optimization with Trapped Ions ( http://arxiv.org/abs/2405.01447v2 ) ライセンス: Link先を確認	Shubham Kumar, Narendra N. Hegade, Alejandro Gomez Cadavid, Murilo Henrique de Oliveira, Enrique Solano, F. Albarrán-Arriagada,	(参考訳) 本稿では,最適化問題に適した反断熱量子力学の,ハードウェア固有の問題依存型ディジタルアナログ量子アルゴリズムを提案する。具体的には,デジタルゲートを補完するアナログ相互作用として,グローバルなMølmer-Sørensenゲートを活かして,トラップイオンアーキテクチャに着目する。アナログブロックとデジタルステップの最適構成は、純粋にデジタルアプローチに比べて回路深さが大幅に減少することを示す。これは、提案したエンコーディングを使うことで、現在のデバイスのコヒーレンス時間を保ちながら、より多くのキュービットを必要とする、より大きな最適化問題インスタンスに対処できることを意味している。さらに, アナログブロックが純粋なデジタルシミュレーションを上回るために必要な最小ゲート忠実度を調べ, それが文献で報告されている最良の忠実度を下回ることを確認した。ディジタル・アナログ符号化の性能を検証するため,最大独立セット問題に取り組み,デジタル・ケースに比べて少ないリソースを必要とすることを示す。このハイブリッド共設計アプローチは、量子最適化問題の効率的な解に対する量子優位性への道を開く。 We introduce a hardware-specific, problem-dependent digital-analog quantum algorithm of a counterdiabatic quantum dynamics tailored for optimization problems. Specifically, we focus on trapped-ion architectures, taking advantage of global Mølmer-Sørensen gates as the analog interactions complemented by digital gates, both of which are available in the state-of-the-art technologies. We show an optimal configuration of analog blocks and digital steps leading to a substantial reduction in circuit depth compared to the purely digital approach. This implies that, using the proposed encoding, we can address larger optimization problem instances, requiring more qubits, while preserving the coherence time of current devices. Furthermore, we study the minimum gate fidelity required by the analog blocks to outperform the purely digital simulation, finding that it is below the best fidelity reported in the literature. To validate the performance of the digital-analog encoding, we tackle the maximum independent set problem, showing that it requires fewer resources compared to the digital case. 
This hybrid co-design approach paves the way towards quantum advantage for efficient solutions of quantum optimization problems.	翻訳日:2024-05-22 17:52:56 公開日:2024-05-20
# LastPass侵害における人的要因 Human Factors in the LastPass Breach ( http://arxiv.org/abs/2405.01795v3 ) ライセンス: Link先を確認	Niroop Sugunaraj,	(参考訳) 本稿では,LastPass侵害の分析を通じて,サイバー攻撃の複雑な性質について検討する。人間中心の考慮事項をサイバーセキュリティ対策に統合することを主張し、目標指向の行動、認知的過負荷、人間のバイアス(例えば、楽観主義、アンカリング)、リスクのある行動といった要因の緩和に焦点を当てる。この侵害の分析から得られた知見は、サイバー防衛の人間的側面と技術的側面の両方に対処することで、複雑な脅威に対するサイバーシステムのレジリエンスを大きく向上できるという見方を支持する。これは、バランスのとれたアプローチを維持しつつ、ユーザのインタラクションを単純化し、ユーザにバイアスを認識させ、リスクのある行為を抑止することが、サイバーインシデントを防ぐうえで不可欠であることを意味する。 This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from an analysis of this breach offer support to the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. This means that maintaining a balanced approach while simultaneously simplifying user interactions, making users aware of biases, and discouraging risky practices are essential for preventing cyber incidents.	翻訳日:2024-05-22 17:52:56 公開日:2024-05-20
# 暗黙のシナリオリダクションによるスーパークォンタイル制約付き最適化の高速計算 Fast Computation of Superquantile-Constrained Optimization Through Implicit Scenario Reduction ( http://arxiv.org/abs/2405.07965v2 ) ライセンス: Link先を確認	Jake Roth, Ying Cui,	(参考訳) 近年,統計学習や意思決定問題において,公正性や分布シフトに対処するためのリスク対応指標として,スーパークォンタイルが注目されている。本稿では,スーパークォンタイルに基づく制約を持つ大規模最適化問題を解くために,高速でスケーラブルで堅牢な2階計算フレームワークを提案する。経験的リスク最小化とは異なり、スーパークォンタイルに基づく最適化は、裾の条件付き期待値を計算するために、すべてのシナリオで評価されたランダム関数の順位付けを必要とする。この裾に基づく特徴は、計算上不都合に思えるかもしれないが、半平滑ニュートン法に基づく拡張ラグランジュ法に有利な設定を提供する。裾の期待値に関与するシナリオはかなり少ないため、スーパークォンタイル作用素はニュートン系の次元を効果的に減少させる。特に、関連する2階情報を取得し、行列の逆演算を行うための余分なコストは、多くの場合勾配計算に必要な労力に匹敵し、時にはそれ以下である。提案するソルバは,シナリオ数が決定変数数を大幅に上回る場合に特に有効である。線形および凸対角2次目的の合成問題において, 数値実験により, 提案手法は既存手法を大きく上回ることが示された。低精度解の計算では,OSQPで実装された交互方向乗数法よりも,線形および2次目的で750倍以上高速である。さらに、高精度解の計算では、商用ソルバGurobiよりも線形目的で最大25倍、二次目的で最大70倍高速であり、Portfolio Safeguard最適化スイートよりも線形目的で20倍、二次目的で30倍高速である。 Superquantiles have recently gained significant interest as a risk-aware metric for addressing fairness and distribution shifts in statistical learning and decision making problems. This paper introduces a fast, scalable and robust second-order computational framework to solve large-scale optimization problems with superquantile-based constraints. Unlike empirical risk minimization, superquantile-based optimization requires ranking random functions evaluated across all scenarios to compute the tail conditional expectation. While this tail-based feature might seem computationally unfriendly, it provides an advantageous setting for a semismooth-Newton-based augmented Lagrangian method. The superquantile operator effectively reduces the dimensions of the Newton systems since the tail expectation involves considerably fewer scenarios. Notably, the extra cost of obtaining relevant second-order information and performing matrix inversions is often comparable to, and sometimes even less than, the effort required for gradient computation. Our developed solver is particularly effective when the number of scenarios substantially exceeds the number of decision variables. 
In synthetic problems with linear and convex diagonal quadratic objectives, numerical experiments demonstrate that our method outperforms existing approaches by a large margin: It achieves speeds more than 750 times faster for linear and quadratic objectives than the alternating direction method of multipliers as implemented by OSQP for computing low-accuracy solutions. Additionally, it is up to 25 times faster for linear objectives and 70 times faster for quadratic objectives than the commercial solver Gurobi, and 20 times faster for linear objectives and 30 times faster for quadratic objectives than the Portfolio Safeguard optimization suite for high-accuracy solution computations.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
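
The statement that the tail expectation involves considerably fewer scenarios can be seen in a small sketch of the empirical superquantile (also known as conditional value-at-risk). The code evaluates the Rockafellar-Uryasev variational formula and, separately, lists the scenarios that actually enter the tail average. The function names and the toy loss vector are illustrative assumptions; this is not the paper's solver.

```python
import numpy as np

def superquantile(values, tau):
    """Empirical superquantile at level tau: min over t of t + E[(X - t)_+] / (1 - tau)."""
    v = np.sort(np.asarray(values, dtype=float))
    # The objective is piecewise linear and convex in t, so a sample point attains the minimum.
    excess = np.maximum(v[None, :] - v[:, None], 0.0).mean(axis=1)
    objective = v + excess / (1.0 - tau)
    return float(objective.min())

def tail_scenarios(values, tau):
    """Indices of the scenarios in the tail average (assumes (1 - tau) * m is an integer)."""
    values = np.asarray(values, dtype=float)
    k = max(1, int(round((1.0 - tau) * len(values))))
    return np.argsort(values)[-k:]

losses = np.arange(1.0, 11.0)              # ten hypothetical scenario losses 1, 2, ..., 10
sq = superquantile(losses, 0.8)
active = tail_scenarios(losses, 0.8)
```

Only the scenarios in `active` contribute to the value, and their average agrees with `sq`; in a second-order method this is the source of the implicit scenario reduction the abstract describes, since derivative information is needed only for the active tail.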
# マルチカバーのためのセンサネットワーク設計の最適化 Optimizing Sensor Network Design for Multiple Coverage ( http://arxiv.org/abs/2405.09096v2 ) ライセンス: Link先を確認	Lukas Taus, Yen-Hsi Richard Tsai,	(参考訳) センサ配置最適化法は広く研究されている。それらは、既知の環境の監視、5Gタワーの最適な位置、ミサイル防衛システムの配置など、幅広い用途に適用できる。しかし、センサーの故障や敵の攻撃に関するセンサネットワークの堅牢性と効率性を調べる研究はほとんどない。本稿では、最小限のセンサを最適化して、所定の数のセンサによって、非単純連結領域の複数のカバレッジを実現することで、この問題に対処する。本稿では,より効率的で堅牢なセンサネットワークを設計し,ネットワークの最適性に関する理論的境界を導出するための,新しい目的関数(greedy,next-best-view)アルゴリズムを提案する。さらに,ほぼリアルタイムに計算を行うアルゴリズムを高速化するディープラーニングモデルを導入する。ディープラーニングモデルは、トレーニング例の生成を必要とする。それに対応して、トレーニングデータセットの幾何学的特性を理解することは、深層学習技術の性能と訓練過程に重要な洞察を与えることを示す。最後に,より単純な目的を用いたグレディアプローチの単純な並列バージョンは,非常に競争力が高いことを実証する。 Sensor placement optimization methods have been studied extensively. They can be applied to a wide range of applications, including surveillance of known environments, optimal locations for 5G towers, and placement of missile defense systems. However, few works explore the robustness and efficiency of the resulting sensor network concerning sensor failure or adversarial attacks. This paper addresses this issue by optimizing for the least number of sensors to achieve multiple coverage of non-simply connected domains by a prescribed number of sensors. We introduce a new objective function for the greedy (next-best-view) algorithm to design efficient and robust sensor networks and derive theoretical bounds on the network's optimality. We further introduce a Deep Learning model to accelerate the algorithm for near real-time computations. The Deep Learning model requires the generation of training examples. Correspondingly, we show that understanding the geometric properties of the training data set provides important insights into the performance and training process of deep learning techniques. Finally, we demonstrate that a simple parallel version of the greedy approach using a simpler objective can be highly competitive.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
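
A greedy selection for multiple coverage can be sketched as follows. The gain used here is the classic capped-coverage count (how many points are still below the required multiplicity and would be seen by the candidate); it stands in for the paper's new objective function, which is not specified in the abstract. The line-of-sight model, a fixed radius on a 1-D strip with no obstacles, and the name `greedy_multi_coverage` are likewise assumptions for illustration.

```python
import numpy as np

def greedy_multi_coverage(covers, k):
    """covers[s, p] is True when candidate sensor s sees point p.

    Greedily adds sensors until no candidate improves k-fold coverage.
    Returns the chosen sensor indices and the final per-point coverage counts.
    """
    covers = np.asarray(covers, dtype=bool)
    counts = np.zeros(covers.shape[1], dtype=int)
    available = np.ones(covers.shape[0], dtype=bool)
    chosen = []
    while True:
        # Gain: points still covered fewer than k times that this sensor would see.
        gains = (covers & (counts < k)).sum(axis=1)
        gains[~available] = 0
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break
        chosen.append(best)
        available[best] = False
        counts += covers[best]
    return chosen, counts

points = np.arange(10)                      # hypothetical sample points on a strip
sensors = np.arange(10)                     # hypothetical candidate sensor positions
covers = np.abs(sensors[:, None] - points[None, :]) <= 2   # sensing radius of 2
chosen, counts = greedy_multi_coverage(covers, k=2)
```

Requiring every point to be seen by at least `k` sensors is what gives robustness to sensor failure: with `k=2`, any single chosen sensor can be lost while each point remains observed by another.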
# キラルスピン液体からの創発マヨラナ金属 Emergent Majorana metal from a chiral spin liquid ( http://arxiv.org/abs/2405.12278v1 ) ライセンス: Link先を確認	Penghao Zhu, Shi Feng, Kang Wang, Tao Xiang, Nandini Trivedi,	(参考訳) 外部磁場下の反強磁性キタエフ模型において、よく知られたギャップのあるキラルスピン液体(CSL)相とギャップのある部分偏極(PP)相の間に挟まれた中間ギャップレススピン液体相(IGP)の出現を説明する新しいメカニズムを提案する。中程度の磁場では$\pi$フラックスが基底状態中に核生成し、マヨラナゼロモードを捕捉しうることを提案する。これらのフラックスが磁場の増加とともに増殖するにつれて、マヨラナゼロモードは重なり、ゼロエネルギーに「フェルミ面」を持つ創発的なマヨラナ金属状態を生成する。さらに、マヨラナのスペクトル関数が、無限射影エンタングルドペア状態(iPEPS)アンザッツによって得られる動的なスピン相関およびダイマー相関を捉えることを示す。候補となるキタエフ物質に対する本結果の含意について議論する。 We propose a novel mechanism to explain the emergence of an intermediate gapless spin liquid phase (IGP) in the antiferromagnetic Kitaev model in an externally applied magnetic field, sandwiched between the well-known gapped chiral spin liquid (CSL) and the gapped partially polarized (PP) phase. We propose that in moderate fields $\pi$-fluxes nucleate in the ground state and can trap Majorana zero modes. As these fluxes proliferate with increasing field, the Majorana zero modes overlap, creating an emergent Majorana metallic state with a 'Fermi surface' at zero energy. We further show that the Majorana spectral function captures the dynamical spin and dimer correlations obtained by the infinite Projected Entangled Pair States (iPEPS) ansatz. We discuss the implications of our results for candidate Kitaev materials.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# エニオン系における多体非エルミート表皮効果の動的抑制 Dynamical suppression of many-body non-Hermitian skin effect in Anyonic systems ( http://arxiv.org/abs/2405.12288v1 ) ライセンス: Link先を確認	Yi Qin, Ching Hua Lee, Linhu Li,	(参考訳) 非エルミート表皮効果(non-Hermitian skin effect, NHSE)は、非平衡系において、固有状態が系の境界に大量に局在し、これらの系に投入された(準)粒子を一方向的に境界へ輸送する現象である。多体効果との相互作用は近年活発に研究されており、粒子間反発またはフェルミ縮退圧が、NHSEによって誘発される境界への蓄積を、固有解と動力学の両方において制限することが示されている。しかし、本研究では、エニオン統計がNHSEの動力学にさらに深い影響を与え、NHSEの局在方向に逆らって状態の動力学を抑制し、さらには反転させうることを見出した。この現象は、関与する粒子数が多いほど顕著になる。この系における量子情報の拡散はさらにエキゾチックな現象を示し、NHSEは熱アンサンブルに対する情報動力学にのみ影響し、単一の初期状態に対するそれには影響しない。我々の研究結果は、NHSEとエニオン統計の相互作用から生じる新しい非エルミート現象を探求する新たな道を開くものであり、極低温原子量子シミュレータや量子コンピュータで実証できる可能性がある。 The non-Hermitian skin effect (NHSE) is a fascinating phenomenon in nonequilibrium systems where eigenstates massively localize at the systems' boundaries, pumping (quasi-)particles loaded in these systems unidirectionally to the boundaries. Its interplay with many-body effects has been vigorously studied recently, and inter-particle repulsion or Fermi degeneracy pressure have been shown to limit the boundary accumulation induced by the NHSE both in their eigensolutions and dynamics. However, in this work we found that anyonic statistics can even more profoundly affect the NHSE dynamics, suppressing or even reversing the state dynamics against the localizing direction of the NHSE. This phenomenon is found to be more pronounced when more particles are involved. The spreading of quantum information in this system shows even more exotic phenomena, where NHSE affects only the information dynamics for a thermal ensemble, but not that for a single initial state. Our results open up a new avenue on exploring novel non-Hermitian phenomena arising from the interplay between NHSE and anyonic statistics, and can potentially be demonstrated in ultracold atomic quantum simulators and quantum computers.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# 走査プローブ顕微鏡と高性能コンピューティングの統合:固定方策および報酬駆動ワークフローの実装 Integration of Scanning Probe Microscope with High-Performance Computing: fixed-policy and reward-driven workflows implementation ( http://arxiv.org/abs/2405.12300v1 ) ライセンス: Link先を確認	Yu Liu, Utkarsh Pratiush, Jason Bemis, Roger Proksch, Reece Emery, Philip D. Rack, Yu-Chen Liu, Jan-Chi Yang, Stanislav Udovenko, Susan Trolier-McKinstry, Sergei V. Kalinin,	(参考訳) 計算能力と機械学習アルゴリズムの急速な発展は、走査型プローブ顕微鏡(SPM)による科学的発見の自動化の道を開いた。自動化されたSPMの運用に向けた重要な要素は、PythonコードからのSPM制御を可能にするインターフェース、高いコンピューティングパワーの可用性、科学的発見のためのワークフローの開発である。ここでは、ローカルコンピュータまたはリモート高性能コンピュータ(HPC)からSPMを制御することができるPythonインターフェースライブラリを構築し、自律ワークフローにおける機械学習アルゴリズムが必要とする高い計算能力への要求を満たす。さらに、科学的な発見におけるSPMの操作を固定方策または報酬駆動のワークフローに抽象化するための一般的なプラットフォームも導入する。私たちの作業は、ルーチン操作と機械学習による自律的な科学的発見の両方のために、自動化されたSPMワークフローを構築するための完全なインフラストラクチャを提供します。 The rapid development of computation power and machine learning algorithms has paved the way for automating scientific discovery with a scanning probe microscope (SPM). The key elements towards operationalization of automated SPM are the interface to enable SPM control from Python code, availability of high computing power, and development of workflows for scientific discovery. Here we build a Python interface library that enables controlling an SPM from either a local computer or a remote high-performance computer (HPC), which satisfies the high computation power need of machine learning algorithms in autonomous workflows. We further introduce a general platform to abstract the operations of SPM in scientific discovery into fixed-policy or reward-driven workflows. Our work provides a full infrastructure to build automated SPM workflows for both routine operations and autonomous scientific discovery with machine learning.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# リコメンダシステムにおけるインテントによる多様化 Diversifying by Intent in Recommender Systems ( http://arxiv.org/abs/2405.12327v1 ) ライセンス: Link先を確認	Yuyan Wang, Cheenar Banerjee, Samer Chucri, Fabio Soldo, Sriraj Badam, Ed H. Chi, Minmin Chen,	(参考訳) 短期的なエンゲージメントに過度にフォーカスするレコメンダシステムが、必然的に長期的なユーザエクスペリエンスを損なうことは、ますます明白になっている。しかし、望まれる信号があいまいでノイズがあり、長い視野で現れるため、長期的なユーザーエクスペリエンスを直接最適化することは困難である。本研究では,複数のインタラクションやレコメンデーションセッションにまたがるユーザインテントを高レベルのユーザ理解を導入することで,長期的なユーザエクスペリエンスを最適化するためのページ全体のレコメンデーションを実現することのメリットを示す。ユーザインテントは主に検索のコンテキスト内で調査されているが、リコメンダシステムでは探索されていない。このギャップを埋めるため,提案システムの最終段階において,確率論的意図に基づく全ページ多様化フレームワークを開発する。従来のユーザ意図の信念から始めると、提案フレームワークはこれらの信念に基づいて各位置の項目を逐次選択し、その後、その意図に関する過去の信念を更新する。長期ユーザーエクスペリエンスを最適化するために、異なるユーザ意図がページ内で表現されることを保証する。我々は、世界最大のコンテンツレコメンデーションプラットフォームのひとつで、毎日何十億ものユーザーにサービスを提供しています。我々のフレームワークは,ユーザの探究意図を取り入れ,新たな関心やコンテンツを探究する機会を捉えている。ライブ実験により,提案手法がユーザ維持とユーザ満足度の向上につながり,長期計画の促進効果が検証された。特に、ユーザは、時間とともに基盤となる意図と整合した多様なコンテンツを一貫して発見し、関与することができるため、長期的なユーザーエクスペリエンスが向上する。 It has become increasingly clear that recommender systems overly focusing on short-term engagement can inadvertently hurt long-term user experience. However, it is challenging to optimize long-term user experience directly as the desired signal is sparse, noisy and manifests over a long horizon. In this work, we show the benefits of incorporating higher-level user understanding, specifically user intents that can persist across multiple interactions or recommendation sessions, for whole-page recommendation toward optimizing long-term user experience. User intent has primarily been investigated within the context of search, but remains largely under-explored for recommender systems. To bridge this gap, we develop a probabilistic intent-based whole-page diversification framework in the final stage of a recommender system. 
Starting with a prior belief of user intents, the proposed diversification framework sequentially selects items at each position based on these beliefs, and subsequently updates posterior beliefs about the intents. It ensures that different user intents are represented in a page towards optimizing long-term user experience. We experiment with the intent diversification framework on one of the world's largest content recommendation platforms, serving billions of users daily. Our framework incorporates the user's exploration intent, capturing their propensity to explore new interests and content. Live experiments show that the proposed framework leads to an increase in user retention and overall user enjoyment, validating its effectiveness in facilitating long-term planning. In particular, it enables users to consistently discover and engage with diverse contents that align with their underlying intents over time, thereby leading to an improved long-term user experience.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# 3量子ビット反強磁性熱機械における磁気異方性の影響 Effects of Magnetic Anisotropy on 3-Qubit Antiferromagnetic Thermal Machines ( http://arxiv.org/abs/2405.12339v1 ) ライセンス: Link先を確認	Bastian Castorene, Francisco J. Peña, Ariel Norambuena, Sergio E. Ulloa, Cristobal Araya, Patricio Vargas,	(参考訳) 本研究では、一様磁場下の反強磁性ハイゼンベルクXXXモデルで記述される、鎖およびリングのトポロジーを持つ3量子ビット系における異方性の効果を調べる。スターリングサイクルとオットーサイクルを検討し、容易軸異方性がすべての場合においてエンジン効率を大幅に向上させることを見出した。低温では、スターリングサイクルにおいてリング構成が仕事と効率の両方で鎖を上回る。さらに、両方のトポロジーにおいて、スターリングサイクルは量子臨界点において有限の仕事でカルノー効率を達成する。対照的に、準静的オットーエンジンもこれらの点でカルノー効率に達するが、有用な仕事は得られない。特に、スターリングサイクルはすべての熱的動作領域(エンジン、冷凍機、ヒーター、アクセラレータ)を示すのに対し、準静的オットーサイクルはエンジンまたは冷凍機としてのみ機能する。 This study investigates the anisotropic effects on a system of three qubits with chain and ring topology, described by the antiferromagnetic Heisenberg XXX model subjected to a homogeneous magnetic field. We explore the Stirling and Otto cycles and find that easy-axis anisotropy significantly enhances engine efficiency across all cases. At low temperatures, the ring configuration outperforms the chain in both work and efficiency during the Stirling cycle. Additionally, in both topologies, the Stirling cycle achieves Carnot efficiency with finite work at quantum critical points. In contrast, the quasistatic Otto engine also reaches Carnot efficiency at these points but yields no useful work. Notably, the Stirling cycle exhibits all thermal operational regimes (engine, refrigerator, heater, and accelerator), unlike the quasistatic Otto cycle, which functions only as an engine or refrigerator.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# 拡散干渉下における因果効果推定のためのカスケードに基づくランダム化 Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference ( http://arxiv.org/abs/2405.12340v1 ) ライセンス: Link先を確認	Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva,	(参考訳) 個人の結果が近隣ノードの処理の割り当てや行動に依存する可能性がある干渉の存在は、バイアスのある因果効果の推定につながる可能性がある。ネットワーク設計への現在のアプローチは、クラスタベースのランダム化による干渉の制限に焦点を当てており、クラスタをグラフクラスタリングを用いて識別し、クラスタランダム化はノードの処理と制御を規定する。しかし、クラスタベースのランダム化アプローチは、干渉がカスケード内で伝播し、治療に対する個人の反応が近隣のマルチホップに伝播すると、性能が低下する。カスケードシードノードの知識があれば、この干渉構造を利用して因果効果推定バイアスを軽減することができる。本研究の目的は,カスケードシードノードからの処理の割り当てを開始して,カスケード成長中の干渉を制限するために,それらのマルチホップ近傍への割り当てを伝搬し,全体的な因果効果推定誤差を低減するカスケードベースのネットワーク実験設計を提案することである。実世界のデータセットと合成データセットに関する広範な実験により、提案するフレームワークは、ネットワークデータにおける因果効果を推定する上で、既存の最先端アプローチよりも優れていることを示した。 The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering, and cluster randomization dictates the node assignment to treatment and control. However, cluster-based randomization approaches perform poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When we have knowledge of the cascade seed nodes, we can leverage this interference structure to mitigate the resulting causal effect estimation bias. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment from the cascade seed node and propagates the assignment to their multi-hop neighbors to limit interference during cascade growth and thereby reduce the overall causal effect estimation error. 
Our extensive experiments on real-world and synthetic datasets demonstrate that our proposed framework outperforms the existing state-of-the-art approaches in estimating causal effects in network data.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# ニューラル演算子に基づく高速解像器を用いた大規模散乱 Large scale scattering using fast solvers based on neural operators ( http://arxiv.org/abs/2405.12380v1 ) ライセンス: Link先を確認	Zongren Zou, Adar Kahana, Enrui Zhang, Eli Turkel, Rishikesh Ranade, Jay Pathak, George Em Karniadakis,	(参考訳) 我々は最近提案された機械学習に基づく反復解法(HINTS)を拡張し,複雑な吸収境界条件を持つ外界領域におけるヘルムホルツ方程式によって記述される散乱問題を解く。 HINTS法は、ニューラル演算子(NO)と標準イテレーティブソルバ(eg Jacobi と Gauss-Seidel (GS))を組み合わせて、ニューラルネットワークのスペクトルバイアスを利用してより良い性能を実現する。 HINTSでは、従来の反復法のいくつかのイテレーションは、事前訓練されたNOの推論に置き換えられる。本研究では,HINTSを用いて,標準反復解法が失敗する2次元および3次元問題の散乱問題を解く。 2次元の正方形および三角形の散乱器と3次元の立方体とモデル潜水艦を考える。本研究では,非散乱シナリオ上でNOをトレーニングし,HINTSにNOを配置することで,散乱問題の解法として実現した散乱器の多様なジオメトリを扱うHINTSの補間能力について考察する。 HINTS法におけるNOは,新しい散乱器が与えられるたびに再トレーニングや微調整を行わずに有効であることを示す。その結果,多様な分散問題に対処する拡張HINTS手法の適応性と汎用性を強調した。 We extend a recently proposed machine-learning-based iterative solver, i.e. the hybrid iterative transferable solver (HINTS), to solve the scattering problem described by the Helmholtz equation in an exterior domain with a complex absorbing boundary condition. The HINTS method combines neural operators (NOs) with standard iterative solvers, e.g. Jacobi and Gauss-Seidel (GS), to achieve better performance by leveraging the spectral bias of neural networks. In HINTS, some iterations of the conventional iterative method are replaced by inferences of the pre-trained NO. In this work, we employ HINTS to solve the scattering problem for both 2D and 3D problems, where the standard iterative solver fails. We consider square and triangular scatterers of various sizes in 2D, and a cube and a model submarine in 3D. We explore and illustrate the extrapolation capability of HINTS in handling diverse geometries of the scatterer, which is achieved by training the NO on non-scattering scenarios and then deploying it in HINTS to solve scattering problems. The accurate results demonstrate that the NO in HINTS method remains effective without retraining or fine-tuning it whenever a new scatterer is given. 
Taken together, our results highlight the adaptability and versatility of the extended HINTS methodology in addressing diverse scattering problems.	翻訳日:2024-05-22 17:43:12 公開日:2024-05-20
# 静的AI評価を超えて: LLMの害とリスクに対する人間のインタラクション評価を前進させる Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks ( http://arxiv.org/abs/2405.10632v2 ) ライセンス: Link先を確認	Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung,	(参考訳) モデル評価は、AIシステムの安全性、リスク、社会的影響を理解する上で重要である。ほとんどの実世界のAIアプリケーションは人間とAIのインタラクションを含んでいるが、AIモデルの現在の評価(例えば、一般的なベンチマーク)はそうではない。その代わりに、人間的要因を限定的に組み込んで、モデルの安全性を個別に評価することで、人間とモデルの相互作用の複雑さを捉えることができない。本稿では,人-モデルインタラクションの評価や,モデルを用いた人-モデルインタラクションのプロセスと結果に焦点をあてた,新たな評価カテゴリ"ヒューマンインタラクション評価" (HIEs) の定義と運用について論じる。まず、HIEは安全性評価の妥当性を高め、直接人的影響と相互作用特異的害を評価し、モデルによる社会的影響の今後の評価を導くために使用できると論じる。第2に,安全性を重視したHIE設計フレームワーク(人-LLM相互作用分類を含む)について,(1)危険領域の同定,(2)使用状況の特徴付け,(3)評価パラメータの選択の3段階について提案する。第3に、過信と説得リスクの2つの潜在的評価に我々の枠組みを適用します。最後に,HIEのコスト,複製性,非表現性に関する懸念に対処するための具体的な勧告を述べる。 Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. 
Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.	翻訳日:2024-05-22 17:23:37 公開日:2024-05-20
# INDUS:科学応用のための効率的かつ効率的な言語モデル INDUS: Effective and Efficient Language Models for Scientific Applications ( http://arxiv.org/abs/2405.10725v2 ) ライセンス: Link先を確認	Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, Michele Dolfi, Rafael Teixeira de Lima, Panagiotis Vagenas, S. Karthik Mukkavilli, Peter Staar, Sanaz Vahidinia, Ryan McGranaghan, Armin Mehrabian, Tsendgar Lee,	(参考訳) 言語モデル(LLM)は、自然言語処理(NLP)タスクにおいて顕著な結果を示した。しかし、以前の研究では、ドメイン中心のコーパスを使用して訓練されたLLMが、専門的なタスクでより良く機能することを示した。この中心的な洞察に触発されて、地球科学、生物学、物理学、ヘリオ物理、惑星科学、天体物理学領域に適した総合的なLLMスイートであるINDUSを開発し、多様なデータソースから得られたキュレートされた科学コーパスを用いて訓練した。 1) 自然言語理解タスクに対処するために,ドメイン固有の語彙とコーパスを用いて訓練されたエンコーダモデル,(2) 複数のソースから抽出された多様なデータセットを用いて訓練された対照的な学習ベースの汎用テキスト埋め込みモデル,(3) 待ち時間やリソース制約のあるアプリケーションに対処するために知識蒸留技術を用いて作成された,これらのモデルのより小さなバージョンである。また、これらの分野の研究を加速するために、CLIMATE-CHANGE-NER(entity-recognition)、NASA-QA(extractive QA)、NASA-IR(IR)という3つの新しい科学的ベンチマークデータセットを作成しました。最後に、我々のモデルは、これらの新しいタスクにおける汎用エンコーダ(RoBERTa)と既存のドメイン固有エンコーダ(SciBERT)、および関心領域における既存のベンチマークタスクよりも優れていることを示す。 Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. 
The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks, and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.	翻訳日:2024-05-22 17:23:37 公開日:2024-05-20
# 線形クエリ複雑度を考慮したKnapsack制約下での非単調部分モジュラ最大化に対する決定論的近似アルゴリズムの強化 Enhanced Deterministic Approximation Algorithm for Non-monotone Submodular Maximization under Knapsack Constraint with Linear Query Complexity ( http://arxiv.org/abs/2405.12252v1 ) ライセンス: Link先を確認	Canh V. Pham,	(参考訳) 本研究では、サイズ$n$の基底集合上で、ナップサック制約下での部分モジュラ最大化(SMK)問題を検討する。この問題は、組合せ最適化、人工知能、機械学習といった様々な分野に応用されているため、最近多くの注目を集めている。我々は、最も高速な決定論的アルゴリズムの近似係数を、最良のクエリ複雑性$O(n)$を保ったまま、$6+\epsilon$から$5+\epsilon$に改善する。ここで$\epsilon > 0$は定数パラメータである。本手法は、しきい値グリーディ・サブルーチンと、候補解としての2つの互いに素な集合の構築という、2つの構成要素の性能を最適化することに基づいている。さらに、候補解のコストを慎重に分析することにより、より厳密な近似係数が得られる。 In this work, we consider the Submodular Maximization under Knapsack (SMK) constraint problem over the ground set of size $n$. The problem recently attracted a lot of attention due to its applications in various domains of combinatorial optimization, artificial intelligence, and machine learning. We improve the approximation factor of the fastest deterministic algorithm from $6+\epsilon$ to $5+\epsilon$ while keeping the best query complexity of $O(n)$, where $\epsilon >0$ is a constant parameter. Our technique is based on optimizing the performance of two components: the threshold greedy subroutine and the building of two disjoint sets as candidate solutions. Besides, by carefully analyzing the cost of candidate solutions, we obtain a tighter approximation factor.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# マンモCLIP:マンモグラフィーにおけるデータ効率とロバスト性を高めるビジョン言語基礎モデル Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography ( http://arxiv.org/abs/2405.12255v1 ) ライセンス: Link先を確認	Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich,	(参考訳) 乳がん検出におけるCAD(Computer-Aided Diagnosis)の大規模かつ多様なトレーニングデータが欠如していることが,システム導入の障害となっている。近年,VLM(例: CLIP)を用いた大規模画像テキストデータセットによる事前トレーニングでは,コンピュータビジョン(CV)における堅牢性とデータ効率の問題が部分的に解決されている。本稿では,大量のマンモグラム-レポートペアを事前学習した最初のVLMであるMammo-CLIPを提案する。乳がん検出に欠かせない様々なマンモグラフィー特性の分類, 位置決定, データ効率, CVにおけるCLIPと類似した堅牢性について検討した。また,マンモグラフィーレポートにおける文レベルの粒度による表現の空間的解釈を実現するために,新しい特徴属性法であるマンモファクタを提案する。コードは https://github.com/batmanlab/Mammo-CLIP で公開されている。 The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (e.g., CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial amount of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representation with sentence-level granularity within mammography reports. Code is available publicly: https://github.com/batmanlab/Mammo-CLIP	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 大規模言語モデルによる科学的仮説生成:乳癌治療における検査的検証 Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment ( http://arxiv.org/abs/2405.12258v1 ) ライセンス: Link先を確認	Abbi Abdel-Rehim, Hector Zenil, Oghenejokpeme Orhobor, Marie Fisher, Ross J. Collins, Elizabeth Bourne, Gareth W. Fearnley, Emma Tate, Holly X. Smith, Larisa N. Soldatova, Ross D. King,	(参考訳) 大規模言語モデル(LLM)はAIを変革し、人間の知性を必要とする幅広いタスクにおいて画期的なパフォーマンスを達成した。科学において、LLMの最も興味深い応用は仮説形成である。 LLMの特徴は、その確率的構造から生じるものであり、出力テキストが必ずしもトレーニングテキストからの有効な推論であるとは限らないことである。これらは「幻覚」であり、多くのアプリケーションにおいて深刻な問題である。しかし、科学では幻覚は有用であり、実験室で検証できる新しい仮説である。ここでは乳がん治療の分野での科学的仮説の根拠としてLLMの使用を実験的に検証する。 LLM GPT4を用いて,MCF7乳がん細胞株を標的とした新しいFDA承認非癌薬の仮説を立証した。実験の第1ラウンドで、GPT4は、正の制御以上のシナジースコアを持つ3つの薬物の組み合わせ(テストされた12のうち)を発見することに成功した。これらの組み合わせはイトラコナゾール+アテノール、ジスルフィラム+シムバスタチン、ジピリダモール+メベンダゾールである。その後、GPT4は最初の結果を考慮して新しい組み合わせを生成するよう求められた。その後、さらに3つの正のシナジースコア(4つの試験のうち)が発見され、これらはジスルフィラム+フヴェストラント、メベンダゾール+キナクリン、ジスルフィラム+キナクリンであった。仮説の生成元としてのGPT4の限界は、それらの説明が定式化され、説得力がないことである。 LLMは科学的仮説のエキサイティングな新しい源であると結論付けている。 Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. These are 'hallucinations', and are a serious problem in many applications. However, in science, hallucinations may be useful: they are novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the use of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel pairs of FDA-approved non-cancer drugs that target the MCF7 breast cancer cell line relative to the non-tumorigenic breast cell line MCF10A. 
In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations (out of 12 tested) with synergy scores above the positive controls. These combinations were itraconazole + atenolol, disulfiram + simvastatin, and dipyridamole + mebendazole. GPT4 was then asked to generate new combinations after considering its initial results. It then discovered three more combinations with positive synergy scores (out of four tested): disulfiram + fulvestrant, mebendazole + quinacrine, and disulfiram + quinacrine. A limitation of GPT4 as a generator of hypotheses was that its explanations for them were formulaic and unconvincing. We conclude that LLMs are an exciting novel source of scientific hypotheses.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 特徴量に基づく性能予測モデルの一般化能力:ベンチマークによる統計的解析 Generalization Ability of Feature-based Performance Prediction Models: A Statistical Analysis across Benchmarks ( http://arxiv.org/abs/2405.12259v1 ) ライセンス: Link先を確認	Ana Nikolikj, Ana Kostovska, Gjorgjina Cenikj, Carola Doerr, Tome Eftimov,	(参考訳) 本研究では,アルゴリズム性能予測モデルの各種ベンチマークスイートにおける一般化能力について検討した。探索ランドスケープ解析の特徴に基づく性能予測モデルの精度と問題収集の統計的類似性を比較すると、これらの2つの指標の間には正の相関関係があることが分かる。具体的には、トレーニングスイートとテストスイート間の高次元的特徴値分布が統計的に重要でない場合、テストエラーがトレーニングエラーと同じ範囲にあるという意味で、モデルをうまく一般化する傾向にある。 2つの実験により、標準ベンチマークスイート、BBOBおよびCECコレクション、およびBBOB問題インスタンスのアフィン組み合わせの5つのコレクションを使用して、これらの結果が検証された。 This study examines the generalization ability of algorithm performance prediction models across various benchmark suites. Comparing the statistical similarity between the problem collections with the accuracy of performance prediction models that are based on exploratory landscape analysis features, we observe that there is a positive correlation between these two measures. Specifically, when the high-dimensional feature value distributions between training and testing suites lack statistical significance, the model tends to generalize well, in the sense that the testing errors are in the same range as the training errors. Two experiments validate these findings: one involving the standard benchmark suites, the BBOB and CEC collections, and another using five collections of affine combinations of BBOB problem instances.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# EXACT: 機械学習モデル説明手法を実証的にベンチマークするプラットフォームを目指して EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods ( http://arxiv.org/abs/2405.12261v1 ) ライセンス: Link先を確認	Benedict Clark, Rick Wilming, Artur Dox, Paul Eschenbach, Sami Hached, Daniel Jin Wodke, Michias Taye Zewdie, Uladzislau Bruila, Marta Oliveira, Hjalmar Schulz, Luca Matteo Cornils, Danny Panknin, Ahcène Boubekki, Stefan Haufe,	(参考訳) 説明可能な人工知能(XAI)の進化する展望は、複雑な機械学習(ML)モデルの解釈可能性を改善することを目的としている。本稿では,初期ベンチマークプラットフォームであるEXACT(Explainable AI Comparison Toolkit)に,さまざまなベンチマークデータセットと新たなパフォーマンス指標を組み込むことにより,XAI手法の評価のための標準化された基盤を提供する。提案するデータセットは, クラス条件の特徴に対する基礎的真理の説明と, 新たな定量的指標を活用して, それらが生成する説明の質において, ポストホックなXAI手法の性能を評価する。我々の最近の知見は、しばしばランダムなベースラインを超えるのに苦労し、無関係な特徴に寄与するため、人気のあるXAI手法の限界を浮き彫りにした。さらに、モデルアーキテクチャが等しく動作する異なるモデルアーキテクチャから導かれる説明において、変動性を示す。この初期ベンチマークプラットフォームは、XAI研究者が新たに開発した手法の高品質をテストし、確実にすることを目的としている。 The evolving landscape of explainable artificial intelligence (XAI) aims to improve the interpretability of intricate machine learning (ML) models, yet faces challenges in formalisation and empirical validation, being an inherently unsupervised process. In this paper, we bring together various benchmark datasets and novel performance metrics in an initial benchmarking platform, the Explainable AI Comparison Toolkit (EXACT), providing a standardised foundation for evaluating XAI methods. Our datasets incorporate ground truth explanations for class-conditional features, and leveraging novel quantitative metrics, this platform assesses the performance of post-hoc XAI methods in the quality of the explanations they produce. Our recent findings have highlighted the limitations of popular XAI methods, as they often struggle to surpass random baselines, attributing significance to irrelevant features. Moreover, we show the variability in explanations derived from different equally performing model architectures. 
This initial benchmarking platform therefore aims to allow XAI researchers to test and assure the high quality of their newly developed methods.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 一般車両ルーティングのためのプロンプト学習 Prompt Learning for Generalized Vehicle Routing ( http://arxiv.org/abs/2405.12262v1 ) ライセンス: Link先を確認	Fei Liu, Xi Lin, Weiduo Liao, Zhenkun Wang, Qingfu Zhang, Xialiang Tong, Mingxuan Yuan,	(参考訳) ニューラル組合せ最適化(Neural combinatorial Optimization, NCO)は、手作業によるアルゴリズム設計を伴わずに、様々な車両ルーティング問題を解決するための、有望な学習ベースのアプローチである。しかし、現在のNCO法は主に分布内性能に重点を置いているのに対し、実際の問題インスタンスは通常異なる分布から来ている。アウト・オブ・ディストリビューションのインスタンスに取り組むには、コストのかかる微調整アプローチや、スクラッチから一般化されたモデルの再トレーニングが必要になる。本研究は,従来の手法と異なり,NCOにおけるクロスディストリビューション適応のための効率的なプロンプト学習手法について検討する。具体的には、事前学習したモデルのゼロショット適応を高速に行う新しいプロンプト学習法を提案し、異なる分布からのルーティング問題を解く。提案モデルでは, 各種分布の一連のプロンプトを学習し, 最も適合するプロンプトを選択して, 問題インスタンスごとに事前学習済みのアテンションモデルをプロンプトする。広汎な実験により,提案手法が事前学習されたルーティングモデルの迅速な適応を促進することが示唆された。また、分布内予測とゼロショット一般化の両方において、既存の一般化されたモデルよりも、多様な新しいタスクセットに優れる。コード実装は https://github.com/FeiLiu36/PromptVRP で公開されている。 Neural combinatorial optimization (NCO) is a promising learning-based approach to solving various vehicle routing problems without much manual algorithm design. However, the current NCO methods mainly focus on the in-distribution performance, while the real-world problem instances usually come from different distributions. A costly fine-tuning approach or generalized model retraining from scratch could be needed to tackle the out-of-distribution instances. Unlike the existing methods, this work investigates an efficient prompt learning approach in NCO for cross-distribution adaptation. To be concrete, we propose a novel prompt learning method to facilitate fast zero-shot adaptation of a pre-trained model to solve routing problem instances from different distributions. The proposed model learns a set of prompts among various distributions and then selects the best-matched one to prompt a pre-trained attention model for each problem instance. Extensive experiments show that the proposed prompt learning approach facilitates the fast adaptation of pre-trained routing models. 
It also outperforms existing generalized models on both in-distribution prediction and zero-shot generalization to a diverse set of new tasks. Our code implementation is available online at https://github.com/FeiLiu36/PromptVRP.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 大規模言語モデルにおける方向付きメトリック構造 Directed Metric Structures arising in Large Language Models ( http://arxiv.org/abs/2405.12264v1 ) ライセンス: Link先を確認	Stéphane Gaubert, Yiannis Vlassopoulos,	(参考訳) 大規模言語モデルは、コーパス内の与えられたテキストに続きうる次の単語の確率分布を生成するように訓練されたトランスフォーマーニューラルネットワークであり、最も確からしいと予測される単語が訓練テキスト中の実際の単語となるように訓練される。本稿では、テキスト拡張のこのような条件付き確率分布によって定義される数学的構造を明らかにする。視点を確率から-log確率に移すと、部分テキストの順序が、-log確率によってテキストの空間$\mathcal{L}$上に定義される距離構造に完全にエンコードされていることがわかる。次に、距離多面体$P(\mathcal{L})$と、$\mathcal{L}$から$P(\mathcal{L})$への等長埋め込み(米田埋め込みと呼ぶ)を構成し、テキストがある特別な極端線の生成元に写るようにする。$P(\mathcal{L})$はこれらの極端線生成元の$(\min,+)$(トロピカル)線形スパンであることを説明する。生成元はまた$(\min,+)$線形方程式系を満たす。次に、$P(\mathcal{L})$がテキストの追加と整合的であることを示し、そこから、テキストベクトルを、そのテキスト中の単語のベクトルのボルツマン重み付き線形結合として近似する式を導出する。さらに、テキスト拡張とテキスト制限が(先験的には大きく異なって見えるにもかかわらず)等長な多面体を与えることを示す双対性定理を証明する。加えて、$P(\mathcal{L})$が$\mathcal{L}$のいわゆるイスベル完備化(の一つのバージョン)の格子閉包であり、このイスベル完備化がテキストの極端線生成元の$(\max,+)$スパンであることを証明する。すべての構成は圏論的な解釈を持つが、圏論を明示的には用いない。圏論的解釈は付録で簡潔に説明する。最後の付録では、構文から意味論への問題が、よく知られた一般的な数学的双対性にどのように適合しうるかを述べる。 Large Language Models are transformer neural networks which are trained to produce a probability distribution on the possible next words to given texts in a corpus, in such a way that the most likely word predicted is the actual word in the training text. In this paper we identify the mathematical structure defined by such conditional probability distributions of text extensions. Changing the viewpoint from probabilities to -log probabilities, we observe that the subtext order is completely encoded in a metric structure defined on the space of texts $\mathcal{L}$, by -log probabilities. We then construct a metric polyhedron $P(\mathcal{L})$ and an isometric embedding (called Yoneda embedding) of $\mathcal{L}$ into $P(\mathcal{L})$ such that texts map to generators of certain special extremal rays. We explain that $P(\mathcal{L})$ is a $(\min,+)$ (tropical) linear span of these extremal ray generators. The generators also satisfy a system of $(\min,+)$ linear equations. 
We then show that $P(\mathcal{L})$ is compatible with adding more text, and from this we derive an approximation of a text vector as a Boltzmann weighted linear combination of the vectors for words in that text. We then prove a duality theorem showing that text extensions and text restrictions give isometric polyhedra (even though they look a priori very different). Moreover, we prove that $P(\mathcal{L})$ is the lattice closure of (a version of) the so-called Isbell completion of $\mathcal{L}$, which turns out to be the $(\max,+)$ span of the text extremal ray generators. All constructions have interpretations in category theory, but we do not use category theory explicitly. The categorical interpretations are briefly explained in an appendix. In the final appendix we describe how the syntax to semantics problem could fit in a general well-known mathematical duality.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# SEL-CIE:非線形sRGB画像からのCIE-XYZ再構成のための知識誘導型自己教師付き学習フレームワーク SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images ( http://arxiv.org/abs/2405.12265v1 ) ライセンス: Link先を確認	Shir Barzel, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch,	(参考訳) 現代のカメラは、通常、原センサーデータを表す最小処理の線形RGB画像と、sRGB状態のような高処理の非線形画像状態の2種類の画像状態を提供する。 CIE-XYZ色空間(CIE-XYZ color space)は、カメラパイプラインの一部として使用されるデバイスに依存しない線形空間であり、医用アプリケーションにおける画像の劣化、脱毛、色認識といったコンピュータビジョンタスクに役立ち、色精度が重要である。しかし、通常、画像は非線形状態に保存され、従来の方法でCIE-XYZ色画像を達成することは必ずしも不可能である。この問題に対処するため、買収パイプラインの反転に焦点を当てた古典的な方法論が開発されている。最近では、同一画像のCIE-XYZとsRGB表現を組み合わせた教師あり学習が採用されている。しかし、CIE-XYZとsRGBペアの大規模なデータセットを得るのは難しい。この制限を克服し、大量のペアデータへの依存を軽減するために、自己教師付き学習(SSL)を、ペアデータのみに依存する代用として利用することができる。本稿では,CIE-XYZ 画像と sRGB 画像の再構成に SSL 手法を併用したフレームワークを提案する。提案するフレームワークはsRGB2XYZデータセットに適用される。 Modern cameras typically offer two types of image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly-processed non-linear image state, such as the sRGB state. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline and can be helpful for computer vision tasks, such as image deblurring, dehazing, and color recognition tasks in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and achieving CIE-XYZ color images using conventional methods is not always possible. To tackle this issue, classical methodologies have been developed that focus on reversing the acquisition pipeline. More recently, supervised learning has been employed, using paired CIE-XYZ and sRGB representations of identical images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To overcome this limitation and mitigate the reliance on large amounts of paired data, self-supervised learning (SSL) can be utilized as a substitute for relying solely on paired data. 
This paper proposes a framework for using SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# EGAN: ランサムウェア普及のための進化的GAN EGAN: Evolutional GAN for Ransomware Evasion ( http://arxiv.org/abs/2405.12266v1 ) ライセンス: Link先を確認	Daniel Commey, Benjamin Appiah, Bill K. Frimpong, Isaac Osei, Ebenezer N. A. Hammond, Garth V. Crosby,	(参考訳) 敵の訓練は、敵のマルウェアに対する防御戦略として証明されている。しかし、このような訓練のために敵のマルウェアサンプルを生成することは、敵のマルウェアが回避的かつ機能的であり続ける必要があるため、課題となる。この研究は、この制限に対処する攻撃フレームワークEGANを提案する。 EGANはEvolution StrategyとGenerative Adversarial Networkを活用して、元の機能を保存しながらランサムウェアファイルを変更可能な一連の攻撃アクションを選択する。私たちは、このフレームワークを、VirusTotalにリストされたAIを使った商用アンチウイルスシステムでテストし、我々のフレームワークがこれらのシステムの大部分をバイパスできることを示した。さらに,EGAN攻撃フレームワークが他の商用非AIアンチウイルスソリューションを回避できるかどうかを検討した。この結果から, 敵ランサムウェアが生成したランサムウェアは, それらのいくつかを回避できる可能性が示唆された。 Adversarial Training is a proven defense strategy against adversarial malware. However, generating adversarial malware samples for this type of training presents a challenge because the resulting adversarial malware needs to remain evasive and functional. This work proposes an attack framework, EGAN, to address this limitation. EGAN leverages an Evolution Strategy and Generative Adversarial Network to select a sequence of attack actions that can mutate a Ransomware file while preserving its original functionality. We tested this framework on popular AI-powered commercial antivirus systems listed on VirusTotal and demonstrated that our framework is capable of bypassing the majority of these systems. Moreover, we evaluated whether the EGAN attack framework can evade other commercial non-AI antivirus solutions. Our results indicate that the adversarial ransomware generated can increase the probability of evading some of them.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 多体量子系における固有状態の局在 Eigenstate localization in a many-body quantum system ( http://arxiv.org/abs/2405.12279v1 ) ライセンス: Link先を確認	Chao Yin, Rahul Nandkishore, Andrew Lucas,	(参考訳) 非零エネルギー密度以下のすべての固有状態は、ヒルベルト空間内の「エネルギー的に許容される構成」の指数的に小さな部分で局所化される。我々の構成は古典的な低密度パリティチェックコードへの量子摂動に基づいている。原理的には、この固有状態の局在は、効率的に準備可能な混合状態において、ほとんどボディの相関関数を計測することによって検出することができる。 We prove the existence of extensive many-body Hamiltonians with few-body interactions and a many-body mobility edge: all eigenstates below a nonzero energy density are localized in an exponentially small fraction of "energetically allowed configurations" within Hilbert space. Our construction is based on quantum perturbations to a classical low-density parity check code. In principle, it is possible to detect this eigenstate localization by measuring few-body correlation functions in efficiently preparable mixed states.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 量子コンピュータを用いた電子構造における時間外相関器 Out-of-time-order correlators in electronic structure using Quantum Computers ( http://arxiv.org/abs/2405.12289v1 ) ライセンス: Link先を確認	K. J. Joven, V. M. Bastidas,	(参考訳) 演算子の拡散は、統計力学やブラックホール物理学から量子情報まで様々な分野に深く影響している。量子化の通常の方法は、古典的カオス力学におけるリャプノフ指数の量子アナログであるOtOC(out-of-time-order correlator)である。本研究では,量子コンピュータにおける電子構造の量子シミュレーションにおける演算子拡散現象について検討する。その結果を裏付けるために、水素鎖$H_4$に焦点をあて、この鎖が平衡幾何学から遠く離れている場合、作用素の拡散が促進されることを示す。また,バイパーティライトの絡み合いのダイナミクスと,そのパーティションサイズへの依存性についても検討した。本研究により, 領域と体積法則によく似た特異な特徴が認められた。電子構造の量子シミュレーションにおいて,演算子によるコヒーレントな誤差の拡散に関する知見を提供し,今日利用可能な様々なプラットフォームで実験的に実装可能である。 Operator spreading has profound implications in diverse fields ranging from statistical mechanics and blackhole physics to quantum information. The usual way to quantify it is through out-of-time-order correlators (OTOCs), which are the quantum analog to Lyapunov exponents in classical chaotic dynamics. In this work we explore the phenomenon of operator spreading in quantum simulation of electronic structure in quantum computers. To substantiate our results, we focus on a hydrogen chain $H_4$ and demonstrate that operator spreading is enhanced when the chain is far from its equilibrium geometry. We also investigate the dynamics of bipartite entanglement and its dependence on the partition's size. Our findings reveal distinctive signatures closely resembling area- and volume-laws in equilibrium and far-from-equilibrium geometries, respectively. Our results provide insight of operator spreading of coherent errors in quantum simulation of electronic structure and can be experimentally implemented in various platforms available today.	翻訳日:2024-05-22 15:17:08 公開日:2024-05-20
# 射影による量子リサジョウス図形 Quantum Lissajous Figures via Projection ( http://arxiv.org/abs/2405.12291v1 ) ライセンス: Link先を確認	Errico J. Russo,	(参考訳) 角周周波数の2DHOに対して、新しい量子リッサホス状態のカテゴリを示す。状態は通常のコヒーレント状態の2DHOの退化部分空間への射影から生じる。このように、新しい古典的でない量子力学的定常状態は古典的だが非定常的コヒーレント状態から生じる。リッサホス図形との関係は、我々の状態はすべて、対応する古典的リッサホス図形に沿って局所化される確率密度を持つということである。さらに、我々は、確率電流密度と検討中の状態における量子干渉の出現との間の重要な相互作用を強調した。そうすることで、ボルテックス状態として知られる状態のクラスについて一貫した議論をすることができる。 We present a new category of quantum Lissajous states for a 2DHO having commensurate angular frequencies. The states result from the projection of ordinary coherent states onto a degenerate subspace of the 2DHO. In this way, new, non-classical quantum mechanically stationary states arise from the classical but non-stationary coherent states. The connection to Lissajous figures is that our states all have probability densities that are localized along the corresponding classical Lissajous figures. We further emphasize the important interplay between the probability current density and the emergence of quantum interference in the states we examine. In doing so, we are able to present a consistent discussion of a class of states known as vortex states.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 量子光と創発性グラビトン偏光子に結合した分数量子ホール液体の理論 Theory of fractional quantum Hall liquids coupled to quantum light and emergent graviton-polaritons ( http://arxiv.org/abs/2405.12292v1 ) ライセンス: Link先を確認	Zeno Bacciconi, Hernan Xavier, Iacopo Carusotto, Titas Chanda, Marcello Dalmonte,	(参考訳) 最近のブレークスルー実験は、量子電磁空洞場と相互作用する量子ホール状態のダイナミクスを探索する方法を実証している。強く結合した非局所キャビティモードが整数量子ホール物理学に与える影響は近年研究されているが、分数量子ホール(FQH)液体、そしてより一般的には、物質の分数化状態に対する影響は未解明のままである。本研究では、量子光に結合したFQH状態の理解のための理論的枠組みを開発する。特に、解析的議論とテンソルネットワークシミュレーションを組み合わせることで、単一モードキャビティにおける$\nu=1/3$ Laughlin状態と有限電場勾配のダイナミクスを研究する。 FQH状態の位相的シグネチャは、量子化されたホール比抵抗の持続性によって示されるように、非局所的な空洞真空変動に対して頑健である。しかし、エンタングルメントスペクトルは、光物質の絡み合いとトポロジーの直接指紋を持ち、U(1)$カウントの独特な極性的なレプリカが明らかになる。キャビティ変動に対するさらなる応答として、長波長相関で符号化された圧縮されたFQH幾何も見出す。さらに, 強いキャビティ場勾配への移動は, 勾配方向の強い密度変調を特徴とし, 滑動する友長・ラッティンガー液相への不安定性をもたらすことを観察した。最後に、FQH相内の低エネルギー励起スペクトルを探索することにより、四極性FQH集団励起(グラビトンとして知られる)と光のハイブリッド化から生じる新しい準粒子、グラビトン・ポラリトンを同定する。本研究の結果から,より複雑なシナリオへの拡張の可能性について考察した。 Recent breakthrough experiments have demonstrated how it is now possible to explore the dynamics of quantum Hall states interacting with quantum electromagnetic cavity fields. While the impact of strongly coupled non-local cavity modes on integer quantum Hall physics has been recently addressed, its effects on fractional quantum Hall (FQH) liquids -- and, more generally, fractionalized states of matter -- remain largely unexplored. In this work, we develop a theoretical framework for the understanding of FQH states coupled to quantum light. In particular, combining analytical arguments with tensor network simulations, we study the dynamics of a $\nu=1/3$ Laughlin state in a single-mode cavity with finite electric field gradients. We find that the topological signatures of the FQH state remain robust against the non-local cavity vacuum fluctuations, as indicated by the endurance of the quantized Hall resistivity. 
The entanglement spectra, however, carry direct fingerprints of light-matter entanglement and topology, revealing peculiar polaritonic replicas of the $U(1)$ counting. As a further response to cavity fluctuations, we also find a squeezed FQH geometry, encoded in long-wavelength correlations. We additionally observe that moving to strong cavity field gradients leads to an instability towards a sliding Tomonaga-Luttinger liquid phase, featuring a strong density modulation in the gradient direction. Finally, by exploring the low-energy excited spectrum inside the FQH phase, we identify a new quasiparticle, the graviton-polariton, arising from the hybridization between quadrupolar FQH collective excitations (known as gravitons) and light. We discuss the experimental implications of our findings and possible extension of our results to more complex scenarios.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 誘導型グラフニューラルネットワークに対する効率的なモデルステアリング攻撃 Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks ( http://arxiv.org/abs/2405.12295v1 ) ライセンス: Link先を確認	Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pregowska, Tomasz Michalak,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造で組織された実世界のデータを処理するための強力なツールとして認識されている。特に、事前に定義されたグラフ構造に依存しないグラフ構造化データの処理を可能にするインダクティブGNNは、ますます多様なアプリケーションにおいて重要になっている。これらのネットワークは、様々なタスクにわたる習熟度を示すため、敵が標的ネットワークの機能の複製を試みているモデルステーリング攻撃の利益源となる。画像やテキストで訓練されたモデルに焦点を当てたモデルステアリング攻撃の開発には、多大な努力が払われている。しかし、グラフデータで訓練されたGNNには、ほとんど注意が払われていない。本稿では,グラフコントラスト学習とスペクトルグラフ拡張に基づく誘導型GNNに対する教師なしモデルステーリング攻撃手法を提案する。提案した攻撃は6つのデータセットで徹底的に評価される。その結果,既存の盗難攻撃と比較して高い効率性を示した。より具体的には、我々の攻撃は、ターゲットモデルに送信されるクエリを少なくしながら、盗難モデルの忠実度と下流精度を達成するため、全てのベンチマークでベースラインを上回ります。 Graph Neural Networks (GNNs) are recognized as potent tools for processing real-world data organized in graph structures. Especially inductive GNNs, which enable the processing of graph-structured data without relying on predefined graph structures, are gaining importance in an increasingly wide variety of applications. As these networks demonstrate proficiency across a range of tasks, they become lucrative targets for model-stealing attacks where an adversary seeks to replicate the functionality of the targeted network. A large effort has been made to develop model-stealing attacks that focus on models trained with images and texts. However, little attention has been paid to GNNs trained on graph data. This paper introduces a novel method for unsupervised model-stealing attacks against inductive GNNs, based on graph contrasting learning and spectral graph augmentations to efficiently extract information from the target model. The proposed attack is thoroughly evaluated on six datasets. The results show that this approach demonstrates a higher level of efficiency compared to existing stealing attacks. 
More concretely, our attack outperforms the baseline on all benchmarks, achieving higher fidelity and downstream accuracy of the stolen model while requiring fewer queries sent to the target model.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# メタオーバーフィッティングを緩和するグラディエントの摂動 Perturbing the Gradient for Alleviating Meta Overfitting ( http://arxiv.org/abs/2405.12299v1 ) ライセンス: Link先を確認	Manas Gogoi, Sambhavi Tiwari, Shekhar Verma,	(参考訳) 相互非排他性と多様性の欠如は、単一のグローバル関数がすべてのメタトレーニングタスクのサポートセットデータセットに適合し、新しい未知のタスクに一般化できないことを意味する。この問題は、メタトレーニングタスクではエラー率の低いことが証明されているが、新しいタスクではエラー率が高い。しかしながら、タスクの多様性を高め、いくつかのタスクに対するモデルの信頼性を低下させるという、2つの目標のいずれかを念頭に置いて、この問題に対する新しい解決策が数多く存在する。そこで本研究では,数ショットの正弦波回帰や数ショットの分類など,数ショットの学習環境におけるメタオーバーフィッティングに対処する手法を提案する。提案手法は,非相互排他的タスク設定における学習における最先端のベースラインと比較して,一般化性能の向上を実証する。本論文は,メタラーニングにおけるオーバーフィッティングに対処するための洞察を提供することと,より堅牢で一般化可能なモデルに向けての分野を前進させることを目的としている。 The reason for Meta Overfitting can be attributed to two factors: Mutual Non-exclusivity and the Lack of diversity, consequent to which a single global function can fit the support set data of all the meta-training tasks and fail to generalize to new unseen tasks. This issue is evidenced by low error rates on the meta-training tasks, but high error rates on new tasks. However, there can be a number of novel solutions to this problem keeping in mind any of the two objectives to be attained, i.e. to increase diversity in the tasks and to reduce the confidence of the model for some of the tasks. In light of the above, this paper proposes a number of solutions to tackle meta-overfitting on few-shot learning settings, such as few-shot sinusoid regression and few shot classification. Our proposed approaches demonstrate improved generalization performance compared to state-of-the-art baselines for learning in a non-mutually exclusive task setting. Overall, this paper aims to provide insights into tackling overfitting in meta-learning and to advance the field towards more robust and generalizable models.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 分散型衛星ルーティングのための連続的深層強化学習 Continual Deep Reinforcement Learning for Decentralized Satellite Routing ( http://arxiv.org/abs/2405.12308v1 ) ライセンス: Link先を確認	Federico Lozano-Cuadra, Beatriz Soret, Israel Leyva-Mayorga, Petar Popovski,	(参考訳) 本稿では,低地球軌道衛星コンステレーションにおける分散ルーティングを連続的深部強化学習(DRL)に基づいて完全な解法を提案する。これは、衛星における部分的な知識や連続的な動き、交通、通信リンク、通信バッファといったシステムの不確実性の時間的変化など、複数の課題に対処する必要がある。我々は,各衛星が独立した意思決定エージェントとして機能し,近隣のエージェントからのフィードバックに基づいて環境の知識を限定的に獲得するマルチエージェントアプローチに従う。解法は2つの段階に分けられる。まず、オフライン学習フェーズは、分散化された決定と、グローバルエクスペリエンスをトレーニングしたグローバルディープニューラルネットワーク(DNN)に依存します。次に、ローカル、オンボード、および事前訓練されたDNNによるオンラインフェーズでは、(1)星座の予測可能な条件を次の衛星と共有する各衛星が活用するモデル予測、(2)エージェントのモデルをまずクラスタレベルでマージし、次にグローバルパラメータサーバに集約するフェデレートラーニング(FL)という2つの方法で、環境とともに継続的な学習を行う必要がある。その結果,提案するマルチエージェントDRLフレームワークは,最短パス方式と同じE2E性能を実現するが,後者は集中ノードにおけるリアルタイムネットワーク知識の通信オーバーヘッドを前提としている。重要なことは、当社のソリューションは混雑条件に順応し、負荷の少ない経路を活用できるということです。さらに,長期的アライメントに応用された予測と,長期的アライメントに利用したFLの相乗効果により,時間とともにモデルのばらつきが容易に取り組まれる。 This paper introduces a full solution for decentralized routing in Low Earth Orbit satellite constellations based on continual Deep Reinforcement Learning (DRL). This requires addressing multiple challenges, including the partial knowledge at the satellites and their continuous movement, and the time-varying sources of uncertainty in the system, such as traffic, communication links, or communication buffers. We follow a multi-agent approach, where each satellite acts as an independent decision-making agent, while acquiring a limited knowledge of the environment based on the feedback received from the nearby agents. The solution is divided into two phases. First, an offline learning phase relies on decentralized decisions and a global Deep Neural Network (DNN) trained with global experiences. 
Then, the online phase with local, on-board, and pre-trained DNNs requires continual learning to evolve with the environment, which can be done in two different ways: (1) Model anticipation, where the predictable conditions of the constellation are exploited by each satellite sharing its local model with the next satellite; and (2) Federated Learning (FL), where each agent's model is merged first at the cluster level and then aggregated in a global Parameter Server. The results show that, without high congestion, the proposed Multi-Agent DRL framework achieves the same E2E performance as a shortest-path solution, but the latter assumes intensive communication overhead for real-time network-wide knowledge of the system at a centralized node, whereas ours only requires limited feedback exchange among first-neighbour satellites. Importantly, our solution adapts well to congestion conditions and exploits less loaded paths. Moreover, the divergence of models over time is easily tackled by the synergy between anticipation, applied in short-term alignment, and FL, utilized for long-term alignment.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
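The two-level aggregation described in the abstract above (models merged per cluster, then aggregated globally) can be sketched as plain parameter averaging. This is only an illustration under stated assumptions, not the paper's algorithm: models are represented as flat lists of floats, the global step weights each cluster by its number of agents, and the names `average_models` and `hierarchical_fedavg` are hypothetical.

```python
def average_models(models, weights=None):
    """Weighted element-wise average of flat parameter lists."""
    if not models:
        raise ValueError("no models to average")
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    dim = len(models[0])
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(dim)
    ]


def hierarchical_fedavg(clusters):
    """Merge agent models per cluster, then aggregate the cluster models.

    clusters: dict mapping a cluster id to a list of agent parameter lists.
    Returns (cluster_models, global_model).
    """
    # Step 1: cluster-level merge (unweighted mean over the cluster's agents).
    cluster_models = {
        cid: average_models(agents) for cid, agents in clusters.items()
    }
    # Step 2: global aggregation, weighting each cluster by its agent count
    # (an assumption; a real parameter server could weight differently).
    sizes = [len(clusters[cid]) for cid in cluster_models]
    global_model = average_models(list(cluster_models.values()), weights=sizes)
    return cluster_models, global_model
```

Weighting clusters by size makes the global model equal to the mean over all agents, so the hierarchy changes where communication happens without changing the averaged result in this simplified setting.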
# 単一基底状態からの等変量子系の高精度学習 Accurate Learning of Equivariant Quantum Systems from a Single Ground State ( http://arxiv.org/abs/2405.12309v1 ) ライセンス: Link先を確認	Štěpán Šmíd, Roberto Bondesan,	(参考訳) システムパラメータ間の特性予測は、分子動力学から変分量子アルゴリズムまで、量子物理学において重要な課題である。近年,この課題を解決するアルゴリズムの開発が進められている。ここでは, 周期的境界条件を持つ系のすべての基底状態の性質を, 1つの基底状態サンプルから学習する方法を示すことにより, これらのアルゴリズムの効率を劇的に改善する。予測誤差は熱力学の限界でゼロとなる傾向を示し、数値的な検証を行う。 Predicting properties across system parameters is an important task in quantum physics, with applications ranging from molecular dynamics to variational quantum algorithms. Recently, provably efficient algorithms to solve this task for ground states within a gapped phase were developed. Here we dramatically improve the efficiency of these algorithms by showing how to learn properties of all ground states for systems with periodic boundary conditions from a single ground state sample. We prove that the prediction error tends to zero in the thermodynamic limit and numerically verify the results.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 新しいバイアス測定の原理的アプローチ A Principled Approach for a New Bias Measure ( http://arxiv.org/abs/2405.12312v1 ) ライセンス: Link先を確認	Bruno Scarone, Alfredo Viola, Ricardo Baeza-Yates,	(参考訳) 意思決定に機械学習とデータ駆動アルゴリズムが広く使われていることは、長年にわたり着実に増加している。医療、雇用、金融、教育、法制度など、様々な分野でこの現象が起きている。負のデータバイアスはその一つであり、特定の集団に有害な結果をもたらす傾向がある。バイアスの負の結果に対処する緩和戦略や効果的な政策は、バイアスが存在するという認識から始まり、その理解と定量化の方法である。しかし、データのバイアスを測定する方法にはコンセンサスがないため、しばしば意図された意味は文脈に依存し、研究コミュニティには一様ではない。本研究の主な貢献は,(1)保護群に対するデータセットのバイアスレベルを定義し,効率的に定量化するための一般的なアルゴリズムフレームワーク,(2)新しいバイアス尺度の定義である。この結果は,9つの公開データセットを用いて実験的に検証され,理論的に解析され,新たな知見が得られた。当社のアプローチに基づいて,政策立案者にとって有用なバイアス緩和アルゴリズムも導出する。 The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, and the legal system, to name a few; and the associated negative side effects are increasingly harmful for society. Negative data bias is one of those, which tends to result in harmful consequences for specific groups of people. Any mitigation strategy or effective policy that addresses the negative consequences of bias must start with awareness that bias exists, together with a way to understand and quantify it. However, there is a lack of consensus on how to measure data bias and oftentimes the intended meaning is context dependent and not uniform within the research community. The main contributions of our work are: (1) a general algorithmic framework for defining and efficiently quantifying the bias level of a dataset with respect to a protected group; and (2) the definition of a new bias measure. Our results are experimentally validated using nine publicly available datasets and theoretically analyzed, which provide novel insights about the problem. Based on our approach, we also derive a bias mitigation algorithm that might be useful to policymakers.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 深層学習に基づくハイパースペクトル画像再構成による農作物の品質評価 Deep learning-based hyperspectral image reconstruction for quality assessment of agro-product ( http://arxiv.org/abs/2405.12313v1 ) ライセンス: Link先を確認	Md. Toukir Ahmed, Ocean Monjur, Mohammed Kamruzzaman,	(参考訳) ハイパースペクトルイメージング(HSI)は、近年、多くの農業用途において有望なツールとして登場したが、大量のデータを処理するのに膨大な時間を要するため、リアルタイムシステムでは直接利用できない。したがって、現在のHSIシステムでは、単純でコンパクトで費用対効果の高いイメージングシステムの開発は不可能である。そこで本研究の目的は,農業用深層学習によるRGB画像からのハイパースペクトル画像の再構成である。具体的には、高スペクトル畳み込みニューラルネットワーク(HSCNN-D)を用いて、サツマイモの可溶性固形物(SSC)を予測するために、RGB画像から高スペクトル画像を再構成した。アルゴリズムは、RGB画像からのハイパースペクトル画像を正確に再構成し、その結果のスペクトルは、地上構造と密に一致した。再構成スペクトルに基づく部分最小二乗回帰(PLSR)モデルは,サツマイモのSSC予測の可能性を示した。これらの知見は,様々な農業用ツールとして,ディープラーニングに基づくハイパースペクトル画像再構成の可能性を強調した。 Hyperspectral imaging (HSI) has recently emerged as a promising tool for many agricultural applications; however, the technology cannot be directly used in a real-time system due to the extensive time needed to process large volumes of data. Consequently, the development of a simple, compact, and cost-effective imaging system is not possible with the current HSI systems. Therefore, the overall goal of this study was to reconstruct hyperspectral images from RGB images through deep learning for agricultural applications. Specifically, this study used Hyperspectral Convolutional Neural Network - Dense (HSCNN-D) to reconstruct hyperspectral images from RGB images for predicting soluble solid content (SSC) in sweet potatoes. The algorithm accurately reconstructed the hyperspectral images from RGB images, with the resulting spectra closely matching the ground-truth. The partial least squares regression (PLSR) model based on reconstructed spectra outperformed the model using the full spectral range, demonstrating its potential for SSC prediction in sweet potatoes. These findings highlight the potential of deep learning-based hyperspectral image reconstruction as a low-cost, efficient tool for various agricultural uses.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 二重ランドマーク積分作用素を用いた高次元ノイズデータセットのカーネルスペクトル結合埋め込み Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators ( http://arxiv.org/abs/2405.12317v1 ) ライセンス: Link先を確認	Xiucai Ding, Rong Ma,	(参考訳) 複数の異種データセットの統合解析は、多くの研究分野、特に単細胞ゲノム学や医療情報学において標準的な実践となっている。既存のアプローチは、しばしば非線形構造を捕捉する際の限られたパワー、ノイズや高次元効果の不足、信号への適応性の欠如、サンプルサイズの不均衡、そしてそれらの結果の解釈が困難である。これらの制約に対処するために、独立に観測された2つの高次元ノイズデータセットの結合埋め込みを実現する新しいカーネルスペクトル法を提案する。提案手法は,組込み品質を向上させるために,データセット間で共有される可能性のある低次元構造を自動的に捕捉し,活用する。得られた低次元埋め込みは、同時クラスタリング、データの可視化、デノイングなど、多くの下流タスクに利用できる。提案手法は厳密な理論的解析によって正当化される。具体的には,低次元雑音信号の回復における手法の整合性を示し,信号対雑音比が収束率に与える影響を特徴付ける。合同多様体モデルフレームワークの下で、新たに導入された積分作用素の固有函数への究極の埋め込みの収束を確立する。これらの作用素はデュオランドマーク積分作用素と呼ばれ、再生されたカーネルヒルベルト空間(RKHS)の畳み込みカーネル写像によって定義される。これらのRKHSは、2つのデータセットの部分的または完全に共有された低次元の非線形信号構造をキャプチャする。 2つの単一セルオミクスデータセットの数値実験と解析により,既存手法よりも提案手法の利点を実証した。 Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches oftentimes suffer from limited power in capturing nonlinear structures, insufficient account of noisiness and effects of high-dimensionality, lack of adaptivity to signals and sample sizes imbalance, and their results are sometimes difficult to interpret. To address these limitations, we propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The proposed method automatically captures and leverages possibly shared low-dimensional structures across datasets to enhance embedding quality. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising. The proposed method is justified by rigorous theoretical analysis. 
Specifically, we show the consistency of our method in recovering the low-dimensional noiseless signals, and characterize the effects of the signal-to-noise ratios on the rates of convergence. Under a joint manifolds model framework, we establish the convergence of ultimate embeddings to the eigenfunctions of some newly introduced integral operators. These operators, referred to as duo-landmark integral operators, are defined by the convolutional kernel maps of some reproducing kernel Hilbert spaces (RKHSs). These RKHSs capture the underlying low-dimensional nonlinear signal structures of the two datasets, whether partially or entirely shared. Our numerical experiments and analyses of two single-cell omics datasets demonstrate the empirical advantages of the proposed method over existing methods in both embeddings and several downstream tasks.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 胸部X線画像における正確な肺切開のためのチャネルとコンテキストアテンションを有する階層型セグネット Hierarchical SegNet with Channel and Context Attention for Accurate Lung Segmentation in Chest X-ray Images ( http://arxiv.org/abs/2405.12318v1 ) ライセンス: Link先を確認	Mohammad Ali Labbaf Khaniki, Nazanin Mahjourian, Mohammad Manthouri,	(参考訳) 胸部X線像における肺セグメンテーションは、様々な肺疾患の正確な診断と治療を可能にする医療画像解析において重要な課題である。本稿では,階層型セグネットとマルチモーダルアテンション機構を組み合わせた肺セグメンテーション手法を提案する。チャネルアテンション機構は肺領域セグメンテーションに不可欠な特定の特徴マップやチャネルを強調し、コンテキストアテンション機構は異なる空間領域の重要性を適応的に重み付けする。両方のメカニズムを組み合わせることで、モデルが複雑なパターンや特徴間の関係をよりよく捉え、セグメンテーションの精度が向上し、特徴表現が向上する。さらに、注意情報とエンコーダ特徴を統合するために注意ゲーティング機構を用い、異なる注意特徴の重要性を適応的に評価し、無関係な特徴を無視できるようにする。実験により,本手法は肺分画作業における最先端性能を達成し,既存手法より優れていたことを示す。提案手法は,肺疾患の診断と治療の精度と効率を向上させる可能性があり,他の画像解析にも適用可能である。 Lung segmentation in chest X-ray images is a critical task in medical image analysis, enabling accurate diagnosis and treatment of various lung diseases. In this paper, we propose a novel approach for lung segmentation by integrating Hierarchical SegNet with a proposed multi-modal attention mechanism. The channel attention mechanism highlights specific feature maps or channels crucial for lung region segmentation, while the context attention mechanism adaptively weighs the importance of different spatial regions. By combining both mechanisms, the proposed mechanism enables the model to better capture complex patterns and relationships between various features, leading to improved segmentation accuracy and better feature representation. Furthermore, an attention gating mechanism is employed to integrate attention information with encoder features, allowing the model to adaptively weigh the importance of different attention features and ignore irrelevant ones. Experimental results demonstrate that our proposed approach achieves state-of-the-art performance in lung segmentation tasks, outperforming existing methods. 
The proposed approach has the potential to improve the accuracy and efficiency of lung disease diagnosis and treatment, and can be extended to other medical image analysis tasks.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 超局所気象予測を用いた動的ラインレーティング:機械学習によるアプローチ Dynamic Line Rating using Hyper-local Weather Predictions: A Machine Learning Approach ( http://arxiv.org/abs/2405.12319v1 ) ライセンス: Link先を確認	Henri Manninen, Markus Lippus, Georg Rute,	(参考訳) 送電網における再生可能エネルギー統合には動的ラインレーティング(DLR)システムが不可欠である。しかし、従来の方法では、あらゆる極やスパンにセンサーをインストールする非現実性のために、センサーデータの課題に直面する。さらに、センサベースのアプローチは、急速に変化する気象条件においてDLRを予測するのに苦労する可能性がある。本稿では,ハイパーローカル気象予報データとともに機械学習(ML)技術を活用する新しい手法を提案する。センサデータにのみ依存する従来の手法とは異なり、このアプローチでは、全ネットワークスケールでハイパーローカル気象パラメータを予測するためにトレーニングされたMLモデルを使用する。地形データを統合することで、景観の特徴や頭上線周辺の障害物を考慮した予測精度が向上する。本稿では,不確実性に関連するリスクを軽減するため,DLR評価のための信頼区間を提案する。エストニアのケーススタディでは、提案手法の実践的な実装を実証し、実世界のシナリオにおけるその有効性を強調している。本研究は,センサベースアプローチの限界に対処することにより,送電系統における再生可能エネルギー統合の談話,電力系統における効率と信頼性の向上に寄与する。 Dynamic Line Rating (DLR) systems are crucial for renewable energy integration in transmission networks. However, traditional methods relying on sensor data face challenges due to the impracticality of installing sensors on every pole or span. Additionally, sensor-based approaches may struggle predicting DLR in rapidly changing weather conditions. This paper proposes a novel approach, leveraging machine learning (ML) techniques alongside hyper-local weather forecast data. Unlike conventional methods, which solely rely on sensor data, this approach utilizes ML models trained to predict hyper-local weather parameters on a full network scale. Integrating topographical data enhances prediction accuracy by accounting for landscape features and obstacles around overhead lines. The paper introduces confidence intervals for DLR assessments to mitigate risks associated with uncertainties. A case study from Estonia demonstrates the practical implementation of the proposed methodology, highlighting its effectiveness in real-world scenarios. 
By addressing limitations of sensor-based approaches, this research contributes to the discourse on renewable energy integration in transmission systems, advancing efficiency and reliability in the power grid.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# 物理的不可避関数とゼロ知識証明によるブロックチェーンベースのIoTシステムのセキュア化 Securing Blockchain-based IoT Systems with Physical Unclonable Functions and Zero-Knowledge Proofs ( http://arxiv.org/abs/2405.12322v1 ) ライセンス: Link先を確認	Daniel Commey, Sena Hounsinou, Garth V. Crosby,	(参考訳) 本稿では,ブロックチェーンベースのIoTシステムを保護するためのフレームワークとして,Physical Unclonable Functions(PUF)とZero-Knowledge Proofs(ZKP)をHyperledger Fabric環境に統合する。提案フレームワークは、PUFをユニークなデバイス識別に、ZKPをプライバシ保護認証とトランザクション処理に活用する。実験の結果、様々な攻撃に対するフレームワークの実現可能性、性能、セキュリティが示された。このフレームワークは、ブロックチェーンベースのIoTシステムのセキュリティ問題に対処するための包括的なソリューションを提供する。 This paper presents a framework for securing blockchain-based IoT systems by integrating Physical Unclonable Functions (PUFs) and Zero-Knowledge Proofs (ZKPs) within a Hyperledger Fabric environment. The proposed framework leverages PUFs for unique device identification and ZKPs for privacy-preserving authentication and transaction processing. Experimental results demonstrate the framework's feasibility, performance, and security against various attacks. This framework provides a comprehensive solution for addressing the security challenges in blockchain-based IoT systems.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# ボールのオーバーラップ数モデル非依存対数(ONB-MACF):信頼に値する人工知能のためのデータ構造に基づく対実生成法 Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF): A Data-Morphology-based Counterfactual Generation Method for Trustworthy Artificial Intelligence ( http://arxiv.org/abs/2405.12326v1 ) ライセンス: Link先を確認	José Daniel Pascual-Triana, Alberto Fernández, Javier Del Ser, Francisco Herrera,	(参考訳) 説明可能な人工知能(XAI)は、AIシステムの運用メカニズムを理解することを目的とした重要な研究領域である。 XAIは、これらのAIシステムをより理解しやすく信頼性の高いものにし、意思決定プロセスに関する洞察を提供することを目指している。明確で分かりやすい説明を生み出すことで、XAIはユーザー、実践者、利害関係者がモデルの判断を信頼できるようになる。本研究は,データ形態学戦略の価値を解析し,反実的説明を生成する。ボールのオーバーラップ数(Overlap Number of Balls Model-Agnostic CounterFactuals,ONB-MACF)は、データ形態を利用してモデルの決定境界を推定する、モデルに依存しない逆ファクト生成法である。 ONB-MACF法は、被覆点がクラスを共有するデータ空間内の超球面を構築し、決定境界をマッピングする。その後、インスタンスの属性を最も近い代替クラスハイパースフィアに向けて漸進的に調整し、最小限の変更で決定境界を越えることで、カウンターファクトアルが生成される。設計により、ONB-MACF 法は、データ分布に従う、実現可能でスパースな偽物を生成する。両視点から総合ベンチマークを行ったところ,ONB-MACF法は,多様な表型データセット上での複数の品質指標において,既存の最先端の偽物生成手法よりも優れていることがわかった。これは我々の仮説を支持し、信頼できるAIのためのデータ形態に基づく説明可能性戦略の可能性を示している。 Explainable Artificial Intelligence (XAI) is a pivotal research domain aimed at understanding the operational mechanisms of AI systems, particularly those considered ``black boxes'' due to their complex, opaque nature. XAI seeks to make these AI systems more understandable and trustworthy, providing insight into their decision-making processes. By producing clear and comprehensible explanations, XAI enables users, practitioners, and stakeholders to trust a model's decisions. This work analyses the value of data morphology strategies in generating counterfactual explanations. It introduces the Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF) method, a model-agnostic counterfactual generator that leverages data morphology to estimate a model's decision boundaries. The ONB-MACF method constructs hyperspheres in the data space whose covered points share a class, mapping the decision boundary. 
Counterfactuals are then generated by incrementally adjusting an instance's attributes towards the nearest alternate-class hypersphere, crossing the decision boundary with minimal modifications. By design, the ONB-MACF method generates feasible and sparse counterfactuals that follow the data distribution. Our comprehensive benchmark from a double perspective (quantitative and qualitative) shows that the ONB-MACF method outperforms existing state-of-the-art counterfactual generation methods across multiple quality metrics on diverse tabular datasets. This supports our hypothesis, showcasing the potential of data-morphology-based explainability strategies for trustworthy AI.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
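The idea in the abstract above of moving an instance towards the nearest alternate-class hypersphere can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the ONB-MACF implementation: the hyperspheres are assumed to be given as `(center, radius, label)` tuples, the step size is arbitrary, and the function names are hypothetical. The sketch moves along a straight line and makes no attempt to enforce the sparsity the authors describe.

```python
import math


def nearest_alternate_sphere(x, spheres, current_class):
    """Return the sphere of another class whose surface is closest to x.

    spheres: list of (center, radius, label) tuples (hypothetical format).
    """
    candidates = [s for s in spheres if s[2] != current_class]
    if not candidates:
        raise ValueError("no alternate-class sphere available")
    return min(candidates, key=lambda s: math.dist(x, s[0]) - s[1])


def counterfactual(x, spheres, current_class, step=0.05, max_steps=1000):
    """Move x in small steps towards the nearest alternate-class sphere.

    Stops as soon as the point lies inside that sphere, i.e. once the
    approximate decision boundary has been crossed.
    Returns (point, label_of_target_sphere).
    """
    center, radius, label = nearest_alternate_sphere(x, spheres, current_class)
    point = list(x)
    for _ in range(max_steps):
        d = math.dist(point, center)
        if d <= radius:
            break
        # Advance by at most `step` along the line to the centre,
        # never overshooting the centre itself.
        move = min(step, d)
        point = [p + move * (c - p) / d for p, c in zip(point, center)]
    return point, label
```

A smaller `step` yields a counterfactual closer to the sphere's surface, at the cost of more iterations.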
# 医用画像分割のためのアテンションベースフィルタを用いた多次元変換器 Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation ( http://arxiv.org/abs/2405.12328v1 ) ライセンス: Link先を確認	Wentao Wang, Xi Xiao, Mingjie Liu, Qing Tian, Xuanyao Huang, Qizhen Lan, Swalpa Kumar Roy, Tianyang Wang,	(参考訳) 医療画像の正確なセグメンテーションは、疾患の診断と治療に不可欠である。近年の研究では、視覚トランスフォーマーに基づく手法は、特徴間のグローバルな関係を確立する能力と様々な入力への適応性により、医用画像セグメンテーションの性能が著しく向上したことが示されている。しかし,これらの手法は,医用画像固有の低信号-雑音比に苦慮している。また, 医用画像のセグメンテーションに不可欠なチャネル情報と空間情報の有効利用は, 自己注意の表現能力によって制限される。これらの課題に対処するために,医療画像セグメンテーションのためのパッチ埋め込みと自己保持機構を再設計する,アテンションベースフィルタリング(MDT-AF)を備えたマルチ次元トランスフォーマーを提案する。 MDT-AFは、注意に基づく特徴フィルタリング機構をパッチ埋め込みブロックに組み込んでおり、低信号対雑音比の影響を軽減するために粗粒度プロセスを採用している。医用画像の複雑な構造をよりよく捉えるために、MDT-AFは自己認識機構を拡張し、空間次元とチャネル次元を取り入れ、特徴表現を豊かにする。さらに,空間次元とチャネル次元の特徴集約を改善するための相互作用機構を導入する。 3つの公開医用画像セグメンテーションベンチマークによる実験結果から, MDT-AFがSOTA(State-of-the-art)の性能を達成することが示された。 The accurate segmentation of medical images is crucial for diagnosing and treating diseases. Recent studies demonstrate that vision transformer-based methods have significantly improved performance in medical image segmentation, primarily due to their superior ability to establish global relationships among features and adaptability to various inputs. However, these methods struggle with the low signal-to-noise ratio inherent to medical images. Additionally, the effective utilization of channel and spatial information, which are essential for medical image segmentation, is limited by the representation capacity of self-attention. To address these challenges, we propose a multi-dimension transformer with attention-based filtering (MDT-AF), which redesigns the patch embedding and self-attention mechanism for medical image segmentation. MDT-AF incorporates an attention-based feature filtering mechanism into the patch embedding blocks and employs a coarse-to-fine process to mitigate the impact of low signal-to-noise ratio. 
To better capture complex structures in medical images, MDT-AF extends the self-attention mechanism to incorporate spatial and channel dimensions, enriching feature representation. Moreover, we introduce an interaction mechanism to improve the feature aggregation between spatial and channel dimensions. Experimental results on three public medical image segmentation benchmarks show that MDT-AF achieves state-of-the-art (SOTA) performance.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
# オープンソースプロジェクトにおけるソフトウェア欠陥検出のための静的解析ツールの有効性 Efficacy of static analysis tools for software defect detection on open-source projects ( http://arxiv.org/abs/2405.12333v1 ) ライセンス: Link先を確認	Jones Yeboah, Saheed Popoola,	(参考訳) ソフトウェアプラクティスでは、静的解析ツールはソフトウェアの欠陥検出の不可欠な部分であり、Java、C++、Pythonといったさまざまなプログラミング言語で解析を実行するように設計されている。本稿では,Java,C++,Pythonのコードを用いたいくつかのデータセットを用いて,ソフトウェア欠陥を識別するための一般的な静的解析ツールを実証的に比較する。この研究は、データセットを使用して比較を行うために、SonarQube、PMD、Checkstyle、FindBugsといった一般的な分析ツールを使用した。この研究では、精度、リコール、F1スコアなどのさまざまな評価指標を使用して、分析ツールのパフォーマンスを測定した。この結果から,SonarQubeは3つのプログラミング言語にまたがる欠陥検出において,他のツールよりもかなり優れていることがわかった。これらの結果は、SonarQubeがソフトウェアの欠陥検出に有効なツールであることに同意する他の既存の研究と一致している。この研究は、異なるプログラミング言語を用いた静的解析ツールに関する多くの洞察と、各解析ツールの長所と短所を理解するための追加情報に貢献している。この研究は、ソフトウェア開発研究者や実践者への影響や、この分野の今後の方向性についても論じている。我々の研究アプローチは、ソフトウェア開発者、実践者、研究者が静的解析ツールでソフトウェアコードのエラーを検出する正しい選択をできるようにするためのレコメンデーションガイドラインを提供することを目的としています。また、研究者はソフトウェア分析ツールの調査と改善に取り組み、ソフトウェアシステムの品質と信頼性とソフトウェア開発プロセスの実践を強化する。 In software practice, static analysis tools remain an integral part of detecting defects in software, and various tools have been designed to run the analysis on different programming languages such as Java, C++, and Python. This paper presents an empirical comparison of popular static analysis tools for identifying software defects on several datasets of Java, C++, and Python code. The study used popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs to perform the comparison on these datasets. The study also used various evaluation metrics such as Precision, Recall, and F1-score to determine the performance of each analysis tool. The study results show that SonarQube performs considerably better than all other tools in defect detection across the three programming languages. These findings remain consistent with other existing studies that also agree on SonarQube being an effective tool for defect detection in software. 
The study contributes insight into static analysis tools across different programming languages, along with additional information for understanding the strengths and weaknesses of each analysis tool. The study also discusses the implications for software development researchers and practitioners, and future directions in this area. Our research aims to provide recommendation guidelines that enable software developers, practitioners, and researchers to make the right choice of static analysis tools for detecting errors in their software code. It also encourages researchers to investigate and improve software analysis tools to enhance the quality and reliability of software systems and of software development practice.	翻訳日:2024-05-22 15:07:24 公開日:2024-05-20
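The evaluation metrics named in the abstract above (Precision, Recall, F1-score) can be illustrated with a minimal Python sketch. The function name `detection_metrics` and the example counts are assumptions made for illustration; they are not figures or code from the study.

```python
def detection_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) for a defect detector.

    The counts are hypothetical: warnings that match real defects (TP),
    warnings that do not (FP), and real defects the tool missed (FN).
    """
    flagged = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only; not taken from the paper.
precision, recall, f1 = detection_metrics(8, 2, 8)
```

A tool that raises few warnings can score high precision with low recall, which is why the F1-score (the harmonic mean of the two) is usually reported alongside them.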
# オープンスタンダードベースメタデータ, 透かし, 暗号を用いた放送媒体の相互運用性保証認証 Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography ( http://arxiv.org/abs/2405.12336v1 ) ライセンス: Link先を確認	John C. Simmons, Joseph M. Winograd,	(参考訳) 誤った情報や誤解を招く情報の拡散は、立法機関や規制機関から大きな注目を集めている。消費者は特定の情報ソースを信頼するので、情報の出所や正確性を決定するためのスケーラブルで相互運用可能な方法が必要である。本稿では、ソーシャルメディアプラットフォームへの放送ニュースコンテンツの投稿、オープンスタンダードの役割、証明の検証における暗号メタデータと透かしの相互利用、そして成功と失敗のシナリオについて分析する。我々は,C2PA (Coalition for Provenance and Authenticity) とATSC (Advanced Television Systems Committee) によって開発された音声・ビデオ透かしのための暗号認証メタデータのオープン標準が,放送の証明に適していると結論付けた。最適な成功のためにこれらの標準を使用する方法を提案する。 The spread of false and misleading information is receiving significant attention from legislative and regulatory bodies. Consumers place trust in specific sources of information, so a scalable, interoperable method for determining the provenance and authenticity of information is needed. In this paper we analyze the posting of broadcast news content to a social media platform, the role of open standards, the interplay of cryptographic metadata and watermarks when validating provenance, and likely success and failure scenarios. We conclude that the open standards for cryptographically authenticated metadata developed by the Coalition for Provenance and Authenticity (C2PA) and for audio and video watermarking developed by the Advanced Television Systems Committee (ATSC) are well suited to address broadcast provenance. We suggest methods for using these standards for optimal success.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# Self-HWDebug: ハードウェアセキュリティ検証のためのLLMセルフインストラクションの自動化 Self-HWDebug: Automation of LLM Self-Instructing for Hardware Security Verification ( http://arxiv.org/abs/2405.12347v1 ) ライセンス: Link先を確認	Mohammad Akyash, Hadi Mardani Kamali,	(参考訳) 命令調整型大規模言語モデル(LLM)の台頭は、人工知能(AI)(特定のプロンプトに対応するのに適したもの)の大幅な進歩を示している。その人気にもかかわらず、ハードウェア設計におけるセキュリティ脆弱性、すなわちレジスタ転送言語(RTL)モジュール、特にSystem-on-chip(SoC)レベルでのセキュリティ脆弱性のデバッグにそのようなモデルを適用することは、大きな課題を呈している。主な課題の1つは、脆弱性の特定と緩和のために、正確に設計された指示が必要であることである。この課題に対応するために,LLMを活用して必要なデバッグ手順を自動生成する,革新的なフレームワークであるSelf-HWDebugを提案する。 Self-HWDebugでは、最も重要なハードウェア共通弱点列挙(CWE)リストから既に特定されているバグのセットと緩和解像度がフレームワークに提供され、その後、LSMにそのような緩和のためのターゲット命令を生成するよう促す。 LLMの生成した命令はその後、同じCWEカテゴリ内の脆弱性に対処する参照として使用されるが、全く異なる設計で、関連するセキュリティ問題にまたがるソリューションを拡張するフレームワークの能力を効果的に示している。 Self-HWDebugは、モデル独自の出力を使用してデバッグをガイドすることによって、人間の介入を大幅に削減する。包括的なテストを通じて、Self-HWDebugは専門家の労力/時間を短縮するだけでなく、デバッグプロセスの品質を向上させることも証明している。 The rise of instruction-tuned Large Language Models (LLMs) marks a significant advancement in artificial intelligence (AI) (tailored to respond to specific prompts). Despite their popularity, applying such models to debug security vulnerabilities in hardware designs, i.e., register transfer language (RTL) modules, particularly at system-on-chip (SoC) level, presents considerable challenges. One of the main issues lies in the need for precisely designed instructions for pinpointing and mitigating the vulnerabilities, which requires substantial time and expertise from human experts. In response to this challenge, this paper proposes Self-HWDebug, an innovative framework that leverages LLMs to automatically create required debugging instructions. In Self-HWDebug, a set of already identified bugs from the most critical hardware common weakness enumeration (CWE) listings, along with mitigation resolutions, is provided to the framework, followed by prompting the LLMs to generate targeted instructions for such mitigation. 
The LLM-generated instructions are subsequently used as references to address vulnerabilities within the same CWE category but in totally different designs, effectively demonstrating the framework's ability to extend solutions across related security issues. Self-HWDebug significantly reduces human intervention by using the model's own output to guide debugging. Through comprehensive testing, Self-HWDebug proves not only to reduce experts' effort/time but also to improve the quality of the debugging process.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# TinyM$^2$Net-V3:サステナブルエッジ展開のためのメモリ対応圧縮マルチモーダルディープニューラルネットワーク TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment ( http://arxiv.org/abs/2405.12353v1 ) ライセンス: Link先を確認	Hasib-Al Rashid, Tinoosh Mohsenin,	(参考訳) 高度な人工知能(AI)アルゴリズムの進歩により、エネルギー使用量や二酸化炭素排出量が著しく増加し、気候変動に対する懸念が高まっている。この増大する問題により、AI技術の環境持続性が最前線に進出した。これらの課題に対応するため、持続可能なAIソリューションの開発には緊急の必要性がある。これらのソリューションは、限られた資源を持つ環境でも多様なデータタイプを扱えるエネルギー効率の高い組込みシステムに焦点を合わせ、技術的進歩と環境責任の両立を保証する必要がある。エッジデバイス用の小さな機械学習モデルに補完的なマルチモーダルデータを統合することは、複雑さ、レイテンシ、消費電力の増加によって困難である。この研究はTinyM$^2$Net-V3を導入し、相補的なデータの異なるモダリティを処理し、ディープニューラルネットワーク(DNN)モデルを設計し、知識蒸留や低ビット幅量子化を含むモデル圧縮技術を用いて、低メモリ階層レベルにモデルを適合させ、レイテンシを低減し、リソース制約のあるデバイスにおけるエネルギー効率を向上する。我々はTinyM$^2$Net-V3を2つのマルチモーダルケーススタディで評価した。小さな推論モデル(6KBと58KB)では、それぞれ92.95%と90.7%の精度を達成した。私たちの小さな機械学習モデルは、リソース制限されたハードウェア上にデプロイされ、ミリ秒以内の低レイテンシと非常に高い電力効率を示しました。 The advancement of sophisticated artificial intelligence (AI) algorithms has led to a notable increase in energy usage and carbon dioxide emissions, intensifying concerns about climate change. This growing problem has brought the environmental sustainability of AI technologies to the forefront, especially as they expand across various sectors. In response to these challenges, there is an urgent need for the development of sustainable AI solutions. These solutions must focus on energy-efficient embedded systems that are capable of handling diverse data types even in environments with limited resources, thereby ensuring both technological progress and environmental responsibility. Integrating complementary multimodal data into tiny machine learning models for edge devices is challenging due to increased complexity, latency, and power consumption. 
This work introduces TinyM$^2$Net-V3, a system that processes different modalities of complementary data, designs deep neural network (DNN) models, and employs model compression techniques including knowledge distillation and low bit-width quantization with memory-aware considerations to fit models within lower memory hierarchy levels, reducing latency and enhancing energy efficiency on resource-constrained devices. We evaluated TinyM$^2$Net-V3 in two multimodal case studies: COVID-19 detection using cough, speech, and breathing audio, and pose classification from depth and thermal images. With tiny inference models (6 KB and 58 KB), we achieved 92.95% and 90.7% accuracy, respectively. Our tiny machine learning models, deployed on resource-limited hardware, demonstrated low latencies within milliseconds and very high power efficiency.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
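As a rough illustration of the low bit-width quantization mentioned in the abstract above, the following Python sketch applies symmetric uniform quantization to a list of weights. It is a generic textbook scheme under assumed names (`quantize_symmetric`, `dequantize`) and example values; the abstract does not describe the exact quantizer used in TinyM$^2$Net-V3.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    # One shared scale per tensor; the largest magnitude maps to +/- qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.7, -0.2, 0.0, 0.31]
codes, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(codes, scale)
```

Storing 4-bit codes plus one scale instead of 32-bit floats is what lets a model fit into smaller, faster memory levels, at the cost of a rounding error of at most half a quantization step per weight.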
# 強化学習における変分量子回路の最適化手法に関する研究 A Study on Optimization Techniques for Variational Quantum Circuits in Reinforcement Learning ( http://arxiv.org/abs/2405.12354v1 ) ライセンス: Link先を確認	Michael Kölle, Timo Witter, Tobias Rohe, Gerhard Stenzel, Philipp Altmann, Thomas Gabor,	(参考訳) 量子コンピューティングは、機械学習を合理化し、トレーニング可能なパラメータを少なくすることで、より効果的にすることを目指している。このパラメータの削減は、学習プロセスを高速化し、計算資源の使用を削減できる。しかし、量子コンピューティングの現段階では、ノイズのある中間スケール量子時代 (NISQ) として知られており、量子ビットの限られた数と広範な量子ノイズのために学習は困難である。これらの課題を克服するために、研究者は変分量子回路(VQC)に注目している。 VQCは量子回路をマージするハイブリッドアルゴリズムであり、パラメータによって調整できる。これらの回路は、効果的な学習のために数量子ビットしか必要としない。近年の研究では、強化学習にVQCを適用する新しい方法が提示されており、さらなる探索を保証できる有望な結果を示している。本研究では,データ再アップロード,インプットスケーリング,アウトプットスケーリングといった様々な手法の効果について検討し,量子近位法最適化アルゴリズムのアクタ-VQCにおいて指数的学習率減衰を導入する。我々は,これらの手法を,人気のあるFrozen LakeとCart Poleの環境において評価する。我々の焦点は、効率を損なうことなく、VQC内のパラメータ数を削減できることにあります。以上の結果から,データ再アップロードと指数的学習率の低下により,ハイパーパラメータ安定性と全体的な性能が著しく向上することが示唆された。入力スケーリングはパラメータ効率を向上しないが、出力スケーリングはグレディネスを効果的に管理し、学習速度と堅牢性を高める。 Quantum Computing aims to streamline machine learning, making it more effective with fewer trainable parameters. This reduction of parameters can speed up the learning process and reduce the use of computational resources. However, in the current phase of quantum computing development, known as the noisy intermediate-scale quantum (NISQ) era, learning is difficult due to a limited number of qubits and widespread quantum noise. To overcome these challenges, researchers are focusing on variational quantum circuits (VQCs). VQCs are hybrid algorithms that merge a quantum circuit, which can be adjusted through parameters, with traditional classical optimization techniques. These circuits require only a few qubits for effective learning. Recent studies have presented new ways of applying VQCs to reinforcement learning, showing promising results that warrant further exploration. 
This study investigates the effects of various techniques -- data re-uploading, input scaling, output scaling -- and introduces exponential learning rate decay in the quantum proximal policy optimization algorithm's actor-VQC. We assess these methods in the popular Frozen Lake and Cart Pole environments. Our focus is on their ability to reduce the number of parameters in the VQC without losing effectiveness. Our findings indicate that data re-uploading and an exponential learning rate decay significantly enhance hyperparameter stability and overall performance. While input scaling does not improve parameter efficiency, output scaling effectively manages greediness, leading to increased learning speed and robustness.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
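The exponential learning rate decay mentioned in the abstract above can be sketched in a few lines of Python. The function name `exponential_decay` and the values `0.1` and `0.5` are assumptions for illustration; the abstract does not state the schedule parameters used for the actor-VQC.

```python
def exponential_decay(initial_lr, decay_rate, step):
    """Learning rate after `step` updates: initial_lr * decay_rate ** step."""
    return initial_lr * decay_rate ** step


# Hypothetical schedule: start at 0.1 and halve the rate at every step.
schedule = [exponential_decay(0.1, 0.5, step) for step in range(4)]
```

Large early steps let the optimizer move quickly, while the shrinking later steps reduce oscillation around a good parameter setting, which is one plausible reading of the improved hyperparameter stability the abstract reports.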
# 宇宙制御のための深層強化学習における選択の影響の検討 Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls ( http://arxiv.org/abs/2405.12355v1 ) ライセンス: Link先を確認	Nathaniel Hamilton, Kyle Dunlap, Kerianne L. Hobbs,	(参考訳) 多くの宇宙用途において、従来の制御法は操作中によく用いられる。しかし、宇宙資産の数は増え続けているため、自律的な運用は異なる宇宙関連タスクに対する制御方法の迅速な開発を可能にする。自律的な制御を開発する方法のひとつに強化学習(Reinforcement Learning, RL)がある。 RLエージェントが有界連続制御値を学ぶことは一般的であるが、伝統的に制御のオン/オフアプローチを好む多くの宇宙タスクにとって現実的あるいは実践的ではないかもしれない。本稿では、エージェントが予め定義されたアクションリストから選択しなければならない個別のアクション空間を用いて分析する。実験では、エージェントに提供された選択肢の数が、トレーニング中および後のパフォーマンスにどのように影響するかを調査した。この分析は、エージェントが物体を周航してその表面上の点を検査しなければならない検査タスクと、エージェントが別の宇宙船と「ドック」に接近し、相対速度が低いドッキングを行うドッキングタスクに対して行われる。両方のタスクの共通の目的は、燃料の使用を最小化することであり、燃料を使用しないアクションを定期的に選択する動機となっている。本結果より, 個別選択が限定された場合, 検査作業の最適性能が得られ, 連続制御はドッキング作業の最適性能が導かれることがわかった。 For many space applications, traditional control methods are often used during operation. However, as the number of space assets continues to grow, autonomous operation can enable rapid development of control methods for different space related tasks. One method of developing autonomous control is Reinforcement Learning (RL), which has become increasingly popular after demonstrating promising performance and success across many complex tasks. While it is common for RL agents to learn bounded continuous control values, this may not be realistic or practical for many space tasks that traditionally prefer an on/off approach for control. This paper analyzes using discrete action spaces, where the agent must choose from a predefined list of actions. The experiments explore how the number of choices provided to the agents affects their measured performance during and after training. This analysis is conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock" with a low relative speed. 
A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel. Our results show that a limited number of discrete choices leads to optimal performance for the inspection task, while continuous control leads to optimal performance for the docking task.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
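To make the idea of a discrete action space concrete, the sketch below builds a predefined list of thrust choices that always contains a zero-thrust (no fuel) option. The function name `discrete_actions`, the even spacing, and the thrust limit are assumptions for illustration; the abstract does not specify how the paper constructs its action lists.

```python
def discrete_actions(max_thrust, num_choices):
    """Evenly spaced thrust values in [-max_thrust, max_thrust], including 0."""
    if num_choices < 3 or num_choices % 2 == 0:
        raise ValueError("use an odd number of choices >= 3 so zero thrust is included")
    half = num_choices // 2
    return [max_thrust * (i - half) / half for i in range(num_choices)]


# Hypothetical per-axis thrust limit of 1.0 (arbitrary units).
on_off = discrete_actions(1.0, 3)   # full reverse, coast, full forward
finer = discrete_actions(1.0, 5)    # adds half-thrust options
```

Increasing `num_choices` moves the agent closer to continuous control, which is the trade-off the experiments in the abstract vary.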
# 多次元一般化ランゲヴィン方程式による粗粒配座力学--どのように、いつ、なぜか Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why ( http://arxiv.org/abs/2405.12356v1 ) ライセンス: Link先を確認	Pinchen Xie, Yunrui Qiu, Weinan E,	(参考訳) データ駆動型ab initio Generalized Langevin equation (AIGLE) アプローチが開発され、高次元、不均一、粗粒状コンフォメーションダイナミクスを学習し、シミュレートする。揺らぎ散逸定理に制約されたこのアプローチは、全原子分子動力学との動的整合性において粗い粒度のモデルを構築することができる。また,AIGLEが長期的動的整合性を実現するための実践的基準を提案する。 20の粗粒の部位を持つおもちゃのポリマーと2つの二面角を持つアラニンジペプチドのケーススタディは、実際に粗粒のコンフォメーション力学をモデル化するために、なぜAIGLEまたはマルコフ限界を採用するべきかを解明する。 A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics. Constrained by the fluctuation-dissipation theorem, the approach can build coarse-grained models in dynamical consistency with all-atom molecular dynamics. We also propose practical criteria for AIGLE to enforce long-term dynamical consistency. Case studies of a toy polymer, with 20 coarse-grained sites, and the alanine dipeptide, with two dihedral angles, elucidate why one should adopt AIGLE or its Markovian limit for modeling coarse-grained conformational dynamics in practice.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
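For orientation, a generalized Langevin equation and the fluctuation-dissipation constraint referred to in the abstract above are commonly written in the following textbook form. The symbols (mass $M$, mean force $F$, memory kernel $K$, noise $R$, thermal energy $k_B T$) and the sign convention are assumptions chosen for illustration; the paper's own notation may differ.

```latex
\begin{align}
  M\,\dot{v}(t) &= F\bigl(x(t)\bigr) - \int_0^t K(t-s)\,v(s)\,\mathrm{d}s + R(t), \\
  \bigl\langle R(t)\,R(t')^{\top} \bigr\rangle &= k_B T\, K(t-t').
\end{align}
```

In this form the memory kernel fixes the noise correlation, so a model that learns $K$ from data cannot choose the noise independently. When the kernel decays much faster than the velocity changes, the memory integral reduces to an instantaneous friction term, which is the Markovian limit the abstract mentions.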
# 高加速度肝4次元MRIのための条件付き生成逆相関ネットワーク Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI ( http://arxiv.org/abs/2405.12357v1 ) ライセンス: Link先を確認	Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng,	(参考訳) 目的:高時空間分解能の4D MRIが肝放射線治療において望まれる。密集したk空間データを取得するのに時間がかかります。スパースサンプルによる高速な取得が望ましいが、画像の品質低下や長い復元時間の原因となることが多い。本研究では, 再構成品質を維持しつつ, 4次元MRI再構成時間の短縮を図るために, 再構成ペア付き条件生成適応ネットワーク(Re-Con-GAN)を提案する。方法: 自由呼吸肝4D MRIを施行した患者を対象とした。 nuFFTアルゴリズムを用いて3, 6, 10倍 (3x, 6x, 10x) の完全および振り返りアンダーサンプリングデータを初めて再構成した。その後、Re-Con-GANはペアで入力と出力を訓練した。 ResNet9, UNet, Restructation Swin Transformerの3種類のネットワークがジェネレータとして探索された。 PatchGANが差別者に選ばれた。 Re-Con-GANはデータを時間スライス(2D+t)として処理した。時間スライス12332例のうち48例はトレーニング(37例は10721スライス)とテスト(11例は1611スライス)に分けられた。結果: Re-Con-GAN は CS/UNet モデルと比較し,PSNR,SSIM,RMSE のスコアを一貫して達成した。 Re-Con-GAN、UNet、CSの推論時間は0.15s、0.16s、120sである。 GTV検出タスクでは、UNetと比較してRe-Con-GANとCSは、未処理のアンダーサンプル画像(3x 69.61%)のダイススコア(3x Re-Con-GAN 80.98%、3x CS 80.74%、3x UNet 79.88%)を改善した。結論: 家庭内データセットに有望かつ効率的な再構成結果を提示し, 対人訓練を施した生成ネットワークを提案する。 4D肝MRの迅速かつ質的な再構成は、肝癌に対するオンライン適応型MR誘導放射線療法を促進する可能性がある。 Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI reconstruction time while maintaining the reconstruction quality. Methods: Patients who underwent free-breathing liver 4D MRI were included in the study. Fully- and retrospectively under-sampled data at 3, 6 and 10 times (3x, 6x and 10x) were first reconstructed using the nuFFT algorithm. Re-Con-GAN was then trained on paired inputs and outputs. 
Three types of networks, ResNet9, UNet and reconstruction swin transformer, were explored as generators. PatchGAN was selected as the discriminator. Re-Con-GAN processed the data (3D+t) as temporal slices (2D+t). A total of 48 patients with 12332 temporal slices were split into training (37 patients with 10721 slices) and test (11 patients with 1611 slices) sets. Results: Re-Con-GAN consistently achieved comparable or better PSNR, SSIM, and RMSE scores compared to CS/UNet models. The inference times of Re-Con-GAN, UNet and CS are 0.15s, 0.16s, and 120s, respectively. The GTV detection task showed that Re-Con-GAN and CS, compared to UNet, better improved the Dice score (3x Re-Con-GAN 80.98%; 3x CS 80.74%; 3x UNet 79.88%) over unprocessed under-sampled images (3x 69.61%). Conclusion: A generative network with adversarial training is proposed, with promising and efficient reconstruction results demonstrated on an in-house dataset. The rapid and qualitative reconstruction of 4D liver MR has the potential to facilitate online adaptive MR-guided radiotherapy for liver cancer.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# エンタープライズRAGのための原子単位を用いた質問ベース検索 Question-Based Retrieval using Atomic Units for Enterprise RAG ( http://arxiv.org/abs/2405.12363v1 ) ライセンス: Link先を確認	Vatsal Raina, Mark Gales,	(参考訳) エンタープライズ検索拡張生成(RAG)は、強力な大規模言語モデル(LLM)と内部的、あるいは時間的に変化する文書を組み合わせるための、非常に柔軟なフレームワークを提供する。 RAGでは、文書はまずチャンクされる。関連チャンクは、特定のユーザクエリに対して検索され、コンテクストとしてシンセサイザーLLMに渡されてクエリ応答を生成する。しかし、誤ったチャンクがシンセサイザーLLMを誘導して誤応答を発生させるため、検索ステップは性能を制限できる。本研究は、より正確なチャンクのリコールのための標準的な密検索ステップのゼロショット適応を提案する。具体的には、チャンクをまず原子ステートメントに分解する。合成質問の集合がこれらの原子上で生成される(コンテキストとしてチャンクが用いられる)。密検索は、ユーザクエリに最も近い合成質問と関連するチャンクを見つけることを伴う。その結果,原子による検索はチャンクによる検索よりも高いリコールにつながることがわかった。原子上に生成した合成質問を用いた検索により、さらなる性能向上が観察された。検索ステップでのリコールの向上により、RAGパイプラインを使用したエンタープライズLLMのパフォーマンスの向上が可能となる。 Enterprise retrieval augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a specific user query, which are passed as context to a synthesizer LLM to generate the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work proposes a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions is then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the closest set of synthetic questions, and associated chunks, to the user query. It is found that retrieval with the atoms leads to higher recall than retrieval with chunks. Further performance gain is observed with retrieval using the synthetic questions generated over the atoms. Higher recall at the retrieval step enables higher performance of the enterprise LLM using the RAG pipeline.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
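The retrieval idea in the abstract above (match the user query against synthetic questions, then return the chunks those questions came from) can be sketched as follows. Everything here is a simplified assumption: the bag-of-words `embed` function stands in for a dense text encoder, the questions are hand-written stand-ins for LLM-generated ones, and the chunk identifiers are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; stands in for a dense text encoder."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, question_index, top_k=1):
    """Rank chunks by the score of their best-matching synthetic question."""
    query_vec = embed(query)
    best = {}
    for question, chunk_id in question_index:
        score = cosine(query_vec, embed(question))
        best[chunk_id] = max(score, best.get(chunk_id, 0.0))
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:top_k]


# Hand-written stand-ins for synthetic questions generated over atomic statements.
question_index = [
    ("How many vacation days do employees get per year?", "chunk-hr-01"),
    ("Who approves vacation requests?", "chunk-hr-01"),
    ("How do I reset my VPN password?", "chunk-it-07"),
]

top_chunk = retrieve("how do I reset the VPN password", question_index)
```

Because a user query is itself a question, comparing it with other questions tends to be a closer match than comparing it with a long declarative chunk, which is one intuition for the recall gain the abstract reports.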
# 負の質量特性を模倣する Mimicking Negative Mass Properties ( http://arxiv.org/abs/2405.12366v1 ) ライセンス: Link先を確認	S. D. Campos,	(参考訳) 本研究では, 負の質量に起因する特性が正の質量粒子によって模倣されるような物理的条件を得るための2つの系を解析する。 1つ目は、ディラック方程式で表されるよく知られた1/2スピン系で、外部電磁場が存在する。いくつかの物理的制限を仮定すると、$e\rightarrow-e$ は $m\rightarrow-m$ と同じ結果をもたらす。特に、零誘電関数に対しては、負の荷電粒子からなる正の質量系から負の質量挙動を得ることができる。第2のシステムは、ド・ブロイ物質波に基づいている。そのような波の分散関係は、虚波数を仮定すると負(実数または虚数)となる。その結果、正の質量粒子に対する負の屈折率が出現した。しかし、この挙動は一般に負の質量系に起因する。 In the present work, one analyzes two systems trying to obtain physical conditions where some properties attributed to negative mass can be mimicked by positive mass particles. The first one is the well-known 1/2-spin system described by the Dirac equation in the presence of an external electromagnetic field. Assuming some physical restrictions, one obtains that the use of $e\rightarrow-e$ can lead to the same results as using $m\rightarrow-m$. In particular, for a null dielectric function, it is possible to obtain a negative mass behavior from a positive mass system composed of negatively charged particles. The second system is based on the de Broglie matter wave. The dispersion relation of such a wave can be negative (real or imaginary valued) if one assumes an imaginary wavenumber. The consequence is the emergence of a negative refractive index for positive mass particles. However, this behavior is generally attributed to a negative mass system.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# 深層学習による膵の大規模マルチセンターCTとMRI分割 Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning ( http://arxiv.org/abs/2405.12367v1 ) ライセンス: Link先を確認	Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, Emil Agarunov, Nassier Harfouch, Chenchan Huang, Marco J. Bruno, Ivo Schoots, Rajesh N. Keswani, Frank H. Miller, Tamas Gonda, Cemal Yazici, Temel Tirkes, Baris Turkbey, Michael B. Wallace, Ulas Bagci,	(参考訳) 膵疾患の診断と経過観察には,横断的画像診断における膵の容積分画の自動化が必要である。 CTベースの膵セグメンテーションはより確立されているが、MRIベースのセグメンテーション手法は、公開データセットの欠如、ベンチマーク研究の努力、ドメイン固有のディープラーニング手法が主な原因である。 2004年3月から2022年11月にかけて,T1強調画像(T1W)とT2強調画像(T2W)の大規模なデータセット(499名)を収集した。また,ベンチマーク目的で公開資料から1,350人の患者のCTも収集した。そこで我々は,nnUNetとTransformerネットワークの長所と,体積計算が可能な新しい線形アテンションモジュールを組み合わせた,パンセグネットと呼ばれる新しい膵分画法を開発した。我々は,Dice と Hausdorff 距離 (HD95) 評価指標を用いて,PanSegNet のクロスモダリティ (合計2,117スキャン) とクロスセンター設定の精度を検証した。我々は,CohenのKappa統計を,それぞれ量比較とDice比較のペアt検定に用いた。 T1W MRIでは85.0% (std: 7.9%) , T2W MRIでは86.3% (std: 6.4%) であった。 R^2は0.91,0.84,0.85はCT,T1W,T2Wと高い相関を示した。 0.624,0.638,T1W,T2WMRIにて中等度なサーバ間一致率を示し,高いサーバ内一致率を示した。すべてのMRIデータはhttps://osf.io/kysnj/で公開されている。ソースコードはhttps://github.com/NUBagciLab/PaNSegNetで公開されています。 Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. 
In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
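The Dice coefficient used as the main accuracy metric in the abstract above can be written in a few lines of Python. This is the standard set-overlap definition applied to flattened binary masks; the function name `dice_coefficient` and the toy masks are assumptions for illustration and are unrelated to the PanSegNet code base.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly, so return 1.0 by convention.
    return 2 * intersection / total if total else 1.0


# Toy 4-voxel masks; real masks would be flattened 3D volumes.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
score = dice_coefficient(predicted, reference)
```

A Dice value of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no voxels, so the reported 85% to 88% values indicate substantial but imperfect overlap.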
# センサトリガー(TDOST)のテキスト記述によるスマートホームにおけるレイアウト非依存の人間活動認識 Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST) ( http://arxiv.org/abs/2405.12368v1 ) ライセンス: Link先を確認	Megha Thukral, Sourish Gunesh Dhekane, Shruthi K. Hiremath, Harish Haresamudram, Thomas Ploetz,	(参考訳) スマートホームにおける環境センサを用いたヒューマンアクティビティ認識(HAR)は、人間の健康と健康に多くの応用がある。しかし、新しいスマートホーム環境にデプロイ可能な汎用HARモデルを構築するには、大量のアノテートされたセンサデータとトレーニングのオーバーヘッドが必要である。ほとんどのスマートホームはレイアウト、すなわちフロアプランやセンサーの具体的特徴に大きく違いがあり、特定の住宅向けに訓練されたHARモデルの一般化性は低い。本稿では,センサデータの自然言語記述の伝達可能な表現能力を利用したスマートホームにおけるHARシステムのレイアウトに依存しない新しいモデリング手法を導入することで,この制限に対処する。この目的のために,センサトリガーのテクスチュアル記述(TDOST)を生成し,周囲のトリガー条件をカプセル化し,アクティビティ認識モデルに基盤となるアクティビティの手がかりを提供する。テキストの埋め込みを生のセンサデータではなく活用することで、対象の家庭に適応したり(再学習)することなく、家庭全体の標準的な活動を予測できる活動認識システムを構築します。本研究では,TDOSTをベースとしたスマートホームにおけるモデルの有効性を,ベンチマークしたCASASデータセットを用いた実験により実証した。さらに,本手法の個々の成分が下流活動認識性能に与える影響を詳細に分析する。 Human activity recognition (HAR) using ambient sensors in smart homes has numerous applications for human healthcare and wellness. However, building general-purpose HAR models that can be deployed to new smart home environments requires a significant amount of annotated sensor data and training overhead. Most smart homes vary significantly in their layouts, i.e., floor plans and the specifics of sensors embedded, resulting in low generalizability of HAR models trained for specific homes. We address this limitation by introducing a novel, layout-agnostic modeling approach for HAR systems in smart homes that utilizes the transferrable representational capacity of natural language descriptions of raw sensor data. To this end, we generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions and provide cues for underlying activities to the activity recognition models. 
Leveraging textual embeddings, rather than raw sensor data, we create activity recognition systems that predict standard activities across homes without either (re-)training or adaptation on target homes. Through an extensive evaluation, we demonstrate the effectiveness of TDOST-based models in unseen smart homes through experiments on benchmarked CASAS datasets. Furthermore, we conduct a detailed analysis of how the individual components of our approach affect downstream activity recognition performance.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# AtomGS: 高密度放射場のためのガウス散乱の微粒化 AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field ( http://arxiv.org/abs/2405.12369v1 ) ライセンス: Link先を確認	Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng,	(参考訳) 3D Gaussian Splatting (3DGS) は、新しいビュー合成とリアルタイムレンダリング速度の優れた機能を提供することにより、近年、放射界再構成が進んでいる。しかし、最適化と適応密度制御をブレンドするというその戦略は、時として、より小さなものを適切に密度付けするコストで、大きなガウスを最適化することを優先するため、ノイズの多い幾何学やぼやけたアーチファクトを生じることがある。この問題に対処するために、Atomized ProliferationとGeometry-Guided OptimizationからなるAtomGSを紹介します。 Atomized Proliferationは様々な大きさの楕円体ガウスをより均一な大きさの原子ガウスに制限する。この戦略は, シーンの細部に応じて, デンシフィケーションに重きを置くことで, 優れた特徴を持つ領域の表現を促進させる。さらに,エッジ・アウェア・ノーマル・ロスを組み込んだ幾何誘導最適化手法を提案する。この最適化方法は、複雑な詳細を保存しながら、平面を効果的に滑らかにする。評価の結果、AtomGSはレンダリング品質において既存の最先端手法よりも優れています。さらに、幾何再構成における競合精度を実現し、他のSDF法よりもトレーニング速度が大幅に向上する。よりインタラクティブなデモは、私たちのWebサイトにある(https://rongliu-leo.github.io/AtomGS/)。 3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we propose a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. 
Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found on our website (https://rongliu-leo.github.io/AtomGS/).	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# DispaRisk: データセットにおける格差リスクの評価と解釈 DispaRisk: Assessing and Interpreting Disparity Risks in Datasets ( http://arxiv.org/abs/2405.12372v1 ) ライセンス: Link先を確認	Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala,	(参考訳) 機械学習アルゴリズム(ML)は、人間の生活のあらゆる側面に影響を与え、医療、金融、教育など、さまざまな分野にまたがって利用されてきた。しばしば、MLアルゴリズムはデータセットで示される社会的バイアスを悪化させ、多くの場合、個人のサブセットやグループに悪影響をもたらす。これらの不適切な効果を効果的に軽減するためには、MLパイプラインの早期に格差/バイアスの同定と評価が不可欠である。このプロアクティブなアプローチは、バイアスの増幅を防ぎ、モデル開発の後期段階で複雑さを減らすために、タイムリーな介入を促進する。本稿では,MLパイプラインの初期段階におけるデータセットの不均一性の潜在的なリスクを積極的に評価するために設計された,新しいフレームワークであるDispaRiskを紹介する。フェアネス研究でよく使われるデータセットとベンチマークすることで、DispaRiskの有効性を評価する。以上の結果から,差別リスクの高いデータセットを識別するDispaRiskの能力,バイアスを伴いやすいモデルファミリー,MLパイプラインにおける識別感受性を高める特徴が示された。実験用のコードは以下のリポジトリで利用可能です。 Machine Learning algorithms (ML) impact virtually every aspect of human lives and have found use across diverse sectors, including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets/groups of individuals, in many cases minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified and assessed early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it with commonly used datasets in fairness research. Our findings demonstrate the capabilities of DispaRisk to identify datasets with a high risk of discrimination, model families prone to biases, and characteristics that heighten discrimination susceptibility in an ML pipeline. 
The code for our experiments is available in the following repository: https://github.com/jovasque156/disparisk	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# 時空間アテンションに基づく隠れた物理インフォームドニューラルネットワークによる残存耐用寿命予測 Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction ( http://arxiv.org/abs/2405.12377v1 ) ライセンス: Link先を確認	Feilong Jiang, Xiaonan Hou, Min Xia,	(参考訳) RUL(Remaining Useful Life、残存耐用寿命)の予測は、産業システムにおける予後健康管理(PHM)に不可欠である。ディープラーニングアプローチはRUL予測においてかなりの成功を収めているが、予測精度や解釈可能性の低さが大きな課題となり、実践的な実装を妨げている。本研究では,システム劣化に関連する物理を利用できる、RUL予測のための時空間アテンションに基づく隠れ物理インフォームドニューラルネットワーク(STA-HPINN)を提案する。時空間的注意機構は、入力データから重要な特徴を抽出することができる。センサ次元と時間ステップ次元の両方における自己注意機構により、提案モデルは劣化情報を効果的に抽出することができる。隠れた物理インフォームドニューラルネットワークを用いて、RULの進化を支配する物理機構を捉える。物理の制約により、モデルはより高い精度と妥当な予測を達成できる。このアプローチはベンチマークデータセットで検証され、特に複雑な条件の場合、最先端の手法と比較して優れた性能を示す。 Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, challenges such as low prediction accuracy and interpretability pose significant challenges, hindering their practical implementation. In this work, we introduce a Spatio-temporal Attention-based Hidden Physics-informed Neural Network (STA-HPINN) for RUL prediction, which can utilize the associated physics of the system degradation. The spatio-temporal attention mechanism can extract important features from the input data. With the self-attention mechanism on both the sensor dimension and time step dimension, the proposed model can effectively extract degradation information. The hidden physics-informed neural network is utilized to capture the physics mechanisms that govern the evolution of RUL. With the constraint of physics, the model can achieve higher accuracy and reasonable predictions. The approach is validated on a benchmark dataset, demonstrating exceptional performance when compared to cutting-edge methods, especially in the case of complex conditions.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# 量子核法における計算資源としての位相空間負性 Phase-space negativity as a computational resource for quantum kernel methods ( http://arxiv.org/abs/2405.12378v1 ) ライセンス: Link先を確認	Ulysse Chabaud, Roohollah Ghobadi, Salman Beigi, Saleh Rahimi-Keshari,	(参考訳) 量子カーネル法は、機械学習において量子計算の優位性を達成するための提案である。量子カーネルと呼ばれる関数は量子デバイスによって推定され、残りの計算は古典的に実行される。量子カーネル関数が古典的コンピュータ上で効率的に推定できない場合に限り、この方法で量子上の優位性を達成することができる。本稿では,ボゾン系における量子カーネル関数の最適古典的推定に十分な条件を提供する。具体的には、量子カーネルに関連付けられた量子状態の位相空間準確率分布における負性度が、量子回路の大きさと多項式的にほぼ一致している場合、カーネル関数を古典的に効率的に推定できることを示す。我々は、適応的な非ガウス測度を持つ線形光学ネットワークを含む量子光学的例を考察し、古典的シミュレーションの効率に対する損失の影響について検討する。本研究は, カーネル法に基づく量子機械学習において, 位相空間準確率分布における負性の役割を決定づけるものである。 Quantum kernel methods are a proposal for achieving quantum computational advantage in machine learning. They are based on a hybrid classical-quantum computation where a function called the quantum kernel is estimated by a quantum device while the rest of the computation is performed classically. Quantum advantages may be achieved through this method only if the quantum kernel function cannot be estimated efficiently on a classical computer. In this paper, we provide sufficient conditions for the efficient classical estimation of quantum kernel functions for bosonic systems. Specifically, we show that if the negativity in the phase-space quasi-probability distributions of data-encoding quantum states associated with the quantum kernel scales at most polynomially with the size of the quantum circuit, then the kernel function can be estimated efficiently classically. We consider quantum optical examples involving linear-optical networks with and without adaptive non-Gaussian measurements and investigate the effects of loss on the efficiency of the classical simulation. Our results underpin the role of the negativity in phase-space quasi-probability distributions as an essential resource in quantum machine learning based on kernel methods.	翻訳日:2024-05-22 14:57:39 公開日:2024-05-20
# 測定依存は量子ネットワークにおけるセキュリティを高める Measurement dependence can enhance security in a quantum network ( http://arxiv.org/abs/2405.12379v1 ) ライセンス: Link先を確認	Amit Kundu, Debasis Sarkar,	(参考訳) ネットワーク非局所性(Network Nonlocality)は、ベルの定理を超えてネットワーク構造を構成する量子非局所性の先進的な研究である。量子ネットワークの発展は、いくつかの量子情報処理タスクに多くの技術応用をもたらす可能性がある。ここでは、ネットワークにおけるエンドパーティの測定選択の独立性の役割に注目し、それが量子ネットワークにおけるセキュリティの強化にどのように利用できるかを検討する。 3者2ソースのバイローカルネットワークと4者3ソースのスターネットワークの両シナリオにおいて、誰かがネットワーク通信に侵入しようとする場合に、実際のセキュリティプロトコルを強化するための仮定の緩和を理解する実践的な方法を示す。理論的には、一方のエンドパーティのみの測定選択の独立性を緩和することにより、標準ネットワーク非局所性(SNN)とより強いフルネットワーク非局所性(FNN)を作成でき、古典的無信号局所モデルによって最大量子違反が得られることを証明した。我々は、FNNがSNNよりも強い、すなわちFNNではネットワーク内のすべてのソースが非局所リソースを分配する必要があるという意味で、2種類のネットワーク非局所性を区別することができる。 Network Nonlocality is an advanced study of quantum nonlocality that comprises network structure beyond Bell's theorem. The development of quantum networks has the potential to bring a lot of technological applications in several quantum information processing tasks. Here, we are focusing on how the role of the independence of the measurement choices of the end parties in a network works and can be used to enhance the security in a quantum network. In both three-parties two-sources bilocal network and four-parties three-sources star network scenarios, we are able to show a practical way to understand the relaxation of the assumptions to enhance a real security protocol if someone wants to breach network communications. Theoretically, we have proved that by relaxing the independence of the measurement choices of only one end party we can create a Standard Network Nonlocality (SNN) and stronger Full Network Nonlocality (FNN) and we can get maximum quantum violation by the classical no-signalling local model. We are able to distinguish between two types of network nonlocality in the sense that the FNN is stronger than SNN, i.e., FNN states all the sources in a network need to distribute nonlocal resources.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 確率型貯留層コンピュータ Stochastic Reservoir Computers ( http://arxiv.org/abs/2405.12382v1 ) ライセンス: Link先を確認	Peter J. Ehlers, Hendra I. Nurdin, Daniel Soh,	(参考訳) 貯留層コンピューティング(Reservoir computing)は、非線形力学システムを用いて、典型的なニューラルネットワークと比較してコスト効率の良い複雑なタスクを実行する機械学習の一種である。近年の貯水池コンピューティングの進歩、特に量子貯水池コンピューティングは、本質的に確率的な貯水池を利用している。しかし、これらのシステムを使用する理論的正当性はまだ十分に確立されていない。本稿では, 確率型貯水池コンピュータの普遍性について検討し, 各貯水池状態の確率を状態自体ではなく可読性として利用した確率型貯水池計算システムについて検討する。確率的貯水池計算では、貯水池のコンピュータ全体の異なる状態の数は、貯水池のハードウェアのサイズと指数関数的にスケールできる可能性があり、コンパクトなデバイスサイズに利点がある。確率的エコー状態ネットワークのクラス、従って全ての確率的貯水池コンピュータのクラスは普遍的な近似クラスであることを示す。また,確率型貯水池コンピュータの分類とカオス時系列予測における実用例について検討した。ショットノイズは確率的貯水池計算の性能の限界要因であるが,ノイズの影響が小さい場合には,類似のハードウェアを持つ決定論的貯水池コンピュータに比べて性能が大幅に向上した。 Reservoir computing is a form of machine learning that utilizes nonlinear dynamical systems to perform complex tasks in a cost-effective manner when compared to typical neural networks. Many recent advancements in reservoir computing, in particular quantum reservoir computing, make use of reservoirs that are inherently stochastic. However, the theoretical justification for using these systems has not yet been well established. In this paper, we investigate the universality of stochastic reservoir computers, in which we use a stochastic system for reservoir computing using the probabilities of each reservoir state as the readout instead of the states themselves. In stochastic reservoir computing, the number of distinct states of the entire reservoir computer can potentially scale exponentially with the size of the reservoir hardware, offering the advantage of compact device size. We prove that classes of stochastic echo state networks, and therefore the class of all stochastic reservoir computers, are universal approximating classes. We also investigate the performance of two practical examples of stochastic reservoir computers in classification and chaotic time series prediction. 
While shot noise is a limiting factor in the performance of stochastic reservoir computing, we show significantly improved performance compared to a deterministic reservoir computer with similar hardware in cases where the effects of noise are small.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# ディープラーニングによる脆弱性検出 Vulnerability Detection with Deep Learning ( http://arxiv.org/abs/2405.12384v1 ) ライセンス: Link先を確認	Zhen Huang, Amy Aumpansub,	(参考訳) ディープラーニングは、ソフトウェアの脆弱性を検出するための有望なツールであることが示されている。本研究では,C/C++プログラムのソースコードから抽出したプログラムスライスを用いてニューラルネットワークをトレーニングし,ソフトウェア脆弱性を検出する。プログラムスライスでは、API関数呼び出し、配列使用、ポインタ使用、演算式など、脆弱性に関連するプログラム構成の構文とセマンティック特性をキャプチャする。脆弱なコードと非脆弱なコードの両方に対して強力な予測モデルを実現するため、異なるタイプのトレーニングデータ、異なるオプティマイザ、異なるタイプのニューラルネットワークを比較した。この結果から,ソースコードの特徴の相違と,脆弱なプログラムスライスと非脆弱性なプログラムスライスをバランスよく組み合わせることで,脆弱なコードと非脆弱性なコードの両方を予測する上で,バランスの取れた精度が得られることがわかった。さまざまなニューラルネットワークの中で、ADAMオプティマイザを備えたBGRUは、92.49%の精度でソフトウェア脆弱性を検出するのに最善を尽くしている。 Deep learning has been shown to be a promising tool in detecting software vulnerabilities. In this work, we train neural networks with program slices extracted from the source code of C/C++ programs to detect software vulnerabilities. The program slices capture the syntax and semantic characteristics of vulnerability-related program constructs, including API function call, array usage, pointer usage, and arithmetic expression. To achieve a strong prediction model for both vulnerable code and non-vulnerable code, we compare different types of training data, different optimizers, and different types of neural networks. Our result shows that combining different types of characteristics of source code and using a balanced number of vulnerable program slices and non-vulnerable program slices produce a balanced accuracy in predicting both vulnerable code and non-vulnerable code. Among different neural networks, BGRU with the ADAM optimizer performs the best in detecting software vulnerabilities with an accuracy of 92.49%.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# SciJava Ops: FijiとBeyondのための改良されたアルゴリズムフレームワーク SciJava Ops: An Improved Algorithms Framework for Fiji and Beyond ( http://arxiv.org/abs/2405.12385v1 ) ライセンス: Link先を確認	Gabriel J. Selzer, Curtis T. Rueden, Mark C. Hiner, Edward L. Evans III, David Kolb, Marcel Wiedenmann, Christian Birkhold, Tim-Oliver Buchholz, Stefan Helfrich, Brian Northan, Alison Walter, Johannes Schindelin, Tobias Pietzsch, Stephan Saalfeld, Michael R. Berthold, Kevin W. Eliceiri,	(参考訳) 多くの科学ソフトウェアプラットフォームは、外部開発機能の統合、デプロイ、実行を簡単にするプラグインメカニズムを提供している。画像分野で最も広く使われているプラットフォームのひとつに、科学画像分析のための人気のあるオープンソースアプリケーションであるFijiがある。 FijiにはImageJとImageJ2プラットフォームが組み込まれており、多様な問題を解決するために数千のプラグインが使用する強力なプラグインアーキテクチャを提供する。この機能はFijiの成功の重要な部分であり、生体画像解析ツールとして広く使われ、新機能のターゲットとなっている。しかし、プラグインベースのソフトウェアアーキテクチャは、互換性のないデータ構造で動作する異なるプラットフォームを統合することはできない。その結果、Fijiのようなプラットフォームは高い相互接続性と拡張性を実現していますが、多くのデータタイプ、プログラミング言語、さまざまなソフトウェアプラットフォームのアーキテクチャ上の違いをまたいで統合するように設計されていません。この課題に対処するために、我々はSciJava Opsという、プラグインとしてアルゴリズムを表現するための基礎的なソフトウェアライブラリを紹介します。 FijiのSciJavaプラグインメカニズムの進化を続けているSciJava Opsは、中央実行環境内のさまざまなソフトウェアプラットフォームからのアルゴリズムを活用することができる。さらに、SciJava Opsは、各アルゴリズムの最も適切な構造に自動的にデータを適応し、ユーザーが自由に透過的に非互換なツールのアルゴリズムを組み合わせられるようにする。 SciJava Opsは最初はFijiのアップデートサイトとして配布されるが、フレームワークはFiji、ImageJ、ImageJ2を必要としない。 Many scientific software platforms provide plugin mechanisms that simplify the integration, deployment, and execution of externally developed functionality. One of the most widely used platforms in the imaging space is Fiji, a popular open-source application for scientific image analysis. Fiji incorporates and builds on the ImageJ and ImageJ2 platforms, which provide a powerful plugin architecture used by thousands of plugins to solve a wide variety of problems. This capability is a major part of Fiji's success, and it has become a widely used biological image analysis tool and a target for new functionality. 
However, a plugin-based software architecture cannot unify disparate platforms operating on incompatible data structures; interoperability necessitates the creation of adaptation or "bridge" layers to translate data and invoke functionality. As a result, while platforms like Fiji enable a high degree of interconnectivity and extensibility, they were not fundamentally designed to integrate across the many data types, programming languages, and architectural differences of various software platforms. To help address this challenge, we present SciJava Ops, a foundational software library for expressing algorithms as plugins in a unified and extensible way. Continuing the evolution of Fiji's SciJava plugin mechanism, SciJava Ops enables users to harness algorithms from various software platforms within a central execution environment. In addition, SciJava Ops automatically adapts data into the most appropriate structure for each algorithm, allowing users to freely and transparently combine algorithms from otherwise incompatible tools. While SciJava Ops is initially distributed as a Fiji update site, the framework does not require Fiji, ImageJ, or ImageJ2, and would be suitable for integration with additional image analysis platforms.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 粒子群最適化と最大近似と負の負二項回帰への応用 Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression ( http://arxiv.org/abs/2405.12386v1 ) ライセンス: Link先を確認	Sisi Shao, Junhyung Park, Weng Kee Wong,	(参考訳) nlminb, optim (R) や nlmixed (SAS) などの汎用最適化ルーチンは、非標準分布のモデルパラメータを推定するために頻繁に使用される。本稿では、統計学で使われている多くのアルゴリズムの代替として、Particle Swarm Optimization (PSO)を提案する。 PSOは上記のルーチンと同じ結果を再現できるだけでなく、より最適な結果や、他のルーチンが収束できない結果も生成できる。後者の場合、問題や問題の原因を特定することもできる。一般化分布のいくつかのパラメータは、(RまたはSASのルーチンを用いて明らかにまたは計算的に表されていない場合)PSOを用いて未同定され、(2)PSOは、現在のルーチンがそうでない場合の対数二項回帰の予測結果を生成することができる、(3)PSOは、それぞれGLMやGENMODなどの標準パッケージによって支持されるLASSOペナルティによる二項回帰のリンク関数に柔軟性を提供する、(4)PSOは、モーメントに依存する従来の統計手法と比較して、EE-IW分布の優れたMLE推定を提供する、という4つの例を用いてPSOを使用する利点を強調した。 General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that are more optimal or when others cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight advantages of using PSO using four examples, where: (1) some parameters in a generalized distribution are unidentified using PSO when it is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively, and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
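The abstract above proposes PSO as a drop-in optimizer for maximum likelihood estimation. The following is a minimal sketch of a global-best PSO, not the authors' implementation: the function `pso_minimize`, the constants (inertia 0.7, acceleration 1.5), the toy `counts` data and the Poisson model are all assumptions chosen for illustration.

```python
import math
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5):
    """Global-best PSO (illustrative); returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_val.__getitem__)
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to own best + pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + c_social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move, then clamp the particle to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], val
                if val < global_val:
                    global_best, global_val = positions[i][:], val
    return global_best, global_val


# Toy data (hypothetical): counts assumed to follow a Poisson distribution.
counts = [2, 3, 1, 4, 2, 5, 3, 2]


def poisson_nll(params):
    """Poisson negative log-likelihood, dropping the constant log(x!) terms."""
    rate = params[0]
    return len(counts) * rate - sum(counts) * math.log(rate)


rate_hat, nll = pso_minimize(poisson_nll, bounds=[(0.01, 20.0)])
```

For this toy model the maximum likelihood estimate has a closed form (the sample mean), which makes it a convenient sanity check before applying the same routine to likelihoods where gradient-based optimizers struggle.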
# 隠れた交絡の下での共形反事実推論 Conformal Counterfactual Inference under Hidden Confounding ( http://arxiv.org/abs/2405.12387v1 ) ライセンス: Link先を確認	Zonghao Chen, Ruocheng Guo, Jean-François Ton, Yang Liu,	(参考訳) パーソナライズされた意思決定は、異なる治療の下での潜在的な結果に関する知識を必要とし、潜在的な結果に対する信頼区間は、この意思決定プロセスをさらに強化し、ハイステークスなシナリオにおける信頼性を向上させる。反事実的世界における潜在的な結果とその不確実性の予測は、因果推論における根本的な課題である。反事実に対する信頼区間を構築する既存の方法は、強い無視可能性(strong ignorability)の仮定に依存するか、または観察的分布と介入的分布の違いを特徴づける識別不能な下限と上限へのアクセスを必要とする。これらの制限を克服するために,まず,トランスダクティブ重み付き共形予測に基づく新しいアプローチwTCP-DRを提案する。これは、隠れた交絡の下でも、反事実的結果に対して周辺被覆保証付きの信頼区間を与える。より制約の少ない仮定の下で、本手法は観察分布から介入分布への共変量シフトを考慮に入れるために、少量の介入データ(ランダム化比較試験から)へのアクセスを必要とする。理論的な結果は、介入データのみを使用するナイーブな方法に対して、我々のアルゴリズムが厳密に有利である条件を明確に示している。反事実に対する有効な区間を確保した後、個別治療効果(ITE)の区間を構築することは容易である。推薦システムを含む合成および実世界のデータで本手法を実証し、被覆率と効率の両面で最先端のベースラインに対する優位性を検証する。 Personalized decision making requires the knowledge of potential outcomes under different treatments, and confidence intervals about the potential outcomes further enrich this decision-making process and improve its reliability in high-stakes scenarios. Predicting potential outcomes along with its uncertainty in a counterfactual world poses the fundamental challenge in causal inference. Existing methods that construct confidence intervals for counterfactuals either rely on the assumption of strong ignorability, or need access to un-identifiable lower and upper bounds that characterize the difference between observational and interventional distributions. To overcome these limitations, we first propose a novel approach wTCP-DR based on transductive weighted conformal prediction, which provides confidence intervals for counterfactual outcomes with marginal coverage guarantees, even under hidden confounding. With less restrictive assumptions, our approach requires access to a fraction of interventional data (from randomized controlled trials) to account for the covariate shift from observational distribution to interventional distribution. 
Theoretical results explicitly demonstrate the conditions under which our algorithm is strictly advantageous to the naive method that only uses interventional data. After ensuring valid intervals on counterfactuals, it is straightforward to construct intervals for individual treatment effects (ITEs). We demonstrate our method across synthetic and real-world data, including recommendation systems, to verify the superiority of our methods compared against state-of-the-art baselines in terms of both coverage and efficiency.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 1次元マニフォールド学習のためのメートル法に基づく主曲線法 A Metric-based Principal Curve Approach for Learning One-dimensional Manifold ( http://arxiv.org/abs/2405.12390v1 ) ライセンス: Link先を確認	Elvis Han Cui, Sisi Shao,	(参考訳) 主曲線(英: principal curve)は、微分幾何学の概念を用いた多様体学習を指向したよく知られた統計手法である。本稿では,空間データの1次元多様体を学習する新しい計量ベース主曲線(MPC)法を提案する。合成データセットおよびMNISTデータセットを用いた実応用により,本手法は形状の観点から一次元多様体をよく学習できることを示す。 Principal curve is a well-known statistical method oriented in manifold learning using concepts from differential geometry. In this paper, we propose a novel metric-based principal curve (MPC) method that learns a one-dimensional manifold of spatial data. Synthetic datasets and real applications using the MNIST dataset show that our method can learn the one-dimensional manifold well in terms of the shape.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 欧州XFELクライストロンの異常自動検出 Automated Anomaly Detection on European XFEL Klystrons ( http://arxiv.org/abs/2405.12391v1 ) ライセンス: Link先を確認	Antonin Sulc, Annika Eichler, Tim Wilksen,	(参考訳) 高出力マルチビームクライストロンは、欧州XFELにおける超伝導無線周波数(SRF)キャビティの加速場を生成するためにRFを増幅する重要な構成要素である。これらの高出力コンポーネントの交換には時間と労力を要するため、メンテナンスとダウンタイムを最小限に抑えると同時に、デバイスの動作を最大化する必要がある。機械学習を用いてクライストロンの挙動を探索するために,我々は,さまざまな操作モードを判定し,特徴抽出と次元還元を行い,通常の操作に関する情報を抽出する一連の実験を完了した。記録されたデータを分析するために、私たちは最先端のデータ駆動学習技術を使用し、Klystronの運用状態をよりよく理解し、可能性のある障害や異常の早期発見に役立つ最も有望なコンポーネントを認識しました。 High-power multi-beam klystrons represent a key component to amplify RF to generate the accelerating field of the superconducting radio frequency (SRF) cavities at European XFEL. Exchanging these high-power components takes time and effort, thus it is necessary to minimize maintenance and downtime and at the same time maximize the device's operation. In an attempt to explore the behavior of klystrons using machine learning, we completed a series of experiments on our klystrons to determine various operational modes and conduct feature extraction and dimensionality reduction to extract the most valuable information about a normal operation. To analyze recorded data we used state-of-the-art data-driven learning techniques and recognized the most promising components that might help us better understand klystron operational states and identify early on possible faults or anomalies.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# ASMR: 効率的な推論のための活性化共有マルチレゾリューションコーディネートネットワーク ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference ( http://arxiv.org/abs/2405.12398v1 ) ライセンス: Link先を確認	Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong,	(参考訳) コーディネート・ネットワーク (Coordinate Network) または暗黙的ニューラル表現 (INR) は、コンパクトなニューラル表現の利点により、自然信号(画像やビデオなど)を高速に符号化する手法である。 INRの符号化能力を高めるために多くの方法が提案されているが、しばしば見過ごされる側面は推論効率であり、通常は乗算累積(MAC)数で測定される。これは、ハードウェアの制約によって推論スループットが大幅に制限されるユースケースにおいて特に重要である。そこで本研究では,多分解能座標分解と階層変調を組み合わせたASMR(Activation-Sharing Multi-Resolution)座標ネットワークを提案する。具体的には、ASMRモデルはデータのグリッド間でのアクティベーションの共有を可能にする。これにより、その推論コストは、その再設計能力と直接的に相関する深さから大きく切り離され、層数に関係なくほぼO(1)推論の複雑さが生じる。実験により、ASMRはバニラSIRENモデルのMACを最大500倍まで低減し、SIRENのベースラインよりもさらに高い再現品質が得られることが示された。 Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This is particularly critical in use cases where inference throughput is greatly limited by hardware constraints. To this end, we propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations. Specifically, an ASMR model enables the sharing of activations across grids of the data. This largely decouples its inference cost from its depth which is directly correlated to its reconstruction capability, and renders a near O(1) inference complexity irrespective of the number of layers. Experiments show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 世界モデリングのための拡散 : アタリにおける視覚的詳細事項 Diffusion for World Modeling: Visual Details Matter in Atari ( http://arxiv.org/abs/2405.12399v1 ) ライセンス: Link先を確認	Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret,	(参考訳) 世界モデルは、安全でサンプル効率のよい強化学習エージェントを訓練するための有望なアプローチである。最近の世界モデルは、主に環境力学をモデル化するために、離散潜在変数のシーケンスを操作する。しかし、このコンパクトな離散表現への圧縮は、強化学習において重要な視覚的詳細を無視する可能性がある。同時に、拡散モデルは画像生成において支配的なアプローチとなり、離散潜在変数をモデル化する確立された手法に挑戦している。このパラダイムシフトを動機として,拡散世界モデルで訓練された強化学習エージェントであるDIAMOND(DIffusion As a Model Of eNvironment Dreams)を紹介する。我々は,世界モデリングに適した拡散を実現する上で必要となる重要な設計選択を解析し,視覚的詳細の改善がエージェントの性能向上にどのように寄与するかを実証する。 DIAMONDは競争力のあるAtari 100kベンチマークで平均1.46の人間正規化スコアを達成しており、これは世界モデル内で完全に訓練されたエージェントとして新たな最高記録である。世界モデリングのための拡散に関する将来の研究を促進するため、私たちはコード、エージェント、プレイ可能な世界モデルを https://github.com/eloialonso/diamond でリリースします。 World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model. To foster future research on diffusion for world modeling, we release our code, agents and playable world models at https://github.com/eloialonso/diamond.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# Gottesman-Kitaev-Preskill状態の基底状態の性質と非線形スクイーズ Ground state nature and nonlinear squeezing of Gottesman-Kitaev-Preskill states ( http://arxiv.org/abs/2405.12406v1 ) ライセンス: Link先を確認	Petr Marek,	(参考訳) 旅行光による普遍量子計算の主なボトルネックは、十分な品質のゴッテマン・キタエフ・プレスキル状態の準備である。これは非常に難しい課題であり、実験的なだけでなく理論的な問題でもある。このような測度、GKPのスクイーズを導入し、状態の特徴付けの現在の方法とどのように関係しているかを示す。この尺度は計算が容易であり、状態の準備や実験結果の検証に容易に利用できる。 The main bottleneck for universal quantum computation with traveling light is the preparation of Gottesman-Kitaev-Preskill states of sufficient quality. This is an extremely challenging task, experimental as well as theoretical, also because there is currently no single easily computable measure of quality for these states. We introduce such measure, GKP squeezing, and show how it is related to the current ways of characterizing the states. The measure is easy to compute and can be easily employed in state preparation as well as verification of experimental results.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# 木深度パラメータによる値制約満足度の局所探索 Local search for valued constraint satisfaction parameterized by treedepth ( http://arxiv.org/abs/2405.12410v1 ) ライセンス: Link先を確認	Artem Kaznatcheev,	(参考訳) 時折、局所探索アルゴリズムは局所的なピークすら効率的に見つけることができない。その理由を理解するため、VCSP(価値制約満足度問題)から得られるフィットネスランドスケープにおける上昇の構造を考察する。木深度 $d$ の制約グラフを持つ VCSP が与えられたとき、任意の初期割り当てから、局所ピークまでの長さ $2^{d + 1} \cdot n$ の上昇が常に存在することを証明する。これは、対数木深度の制約グラフから得られるフィットネスランドスケープには短い上昇が常に存在することを意味し、したがって有界木幅のすべてのVCSPに対しても同様である。しかし、これは、局所探索アルゴリズムが、スパースなVCSPにおいてそのような短い上昇を常に見つけて追跡できるという意味ではない。木深度がloglogの場合でも超多項式長の上昇が存在し、木深度がpolylogの場合には、すべての上昇が超多項式長となる初期割り当てが存在することを示す。これらの結果は、スパースVCSPの研究が、効率的な局所探索の障壁を理解するのに役立つことを示唆している。 Sometimes local search algorithms cannot efficiently find even local peaks. To understand why, I look at the structure of ascents in fitness landscapes from valued constraint satisfaction problems (VCSPs). Given a VCSP with a constraint graph of treedepth $d$, I prove that from any initial assignment there always exists an ascent of length $2^{d + 1} \cdot n$ to a local peak. This means that short ascents always exist in fitness landscapes from constraint graphs of logarithmic treedepth, and thus also for all VCSPs of bounded treewidth. But this does not mean that local search algorithms will always find and follow such short ascents in sparse VCSPs. I show that with loglog treedepth, superpolynomial ascents exist; and for polylog treedepth, there are initial assignments from which all ascents are superpolynomial. Together, these results suggest that the study of sparse VCSPs can help us better understand the barriers to efficient local search.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
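The step from the bound $2^{d + 1} \cdot n$ to "short ascents exist for logarithmic treedepth" can be spelled out as a one-line calculation. The constant $c$ below is introduced only for illustration, assuming the treedepth satisfies $d \le c \log_2 n$:

```latex
\[
2^{d+1}\cdot n \;\le\; 2^{\,c\log_2 n + 1}\cdot n \;=\; 2\,n^{c}\cdot n \;=\; 2\,n^{c+1}
\]
```

Under that assumption the guaranteed ascent has length polynomial in $n$, which is the sense in which the abstract calls it short.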
# 離散確率型ニューラルネットワークの校正について On Measuring Calibration of Discrete Probabilistic Neural Networks ( http://arxiv.org/abs/2405.12412v1 ) ライセンス: Link先を確認	Spencer Young, Porter Jenkins,	(参考訳) 機械学習システムが現実世界のアプリケーションにますます統合されるにつれて、その安全性、堅牢性、信頼性を高める上では、不確実性を正確に表現することが不可欠である。高次元確率分布を最大極大で適合させるニューラルネットワークの訓練は、不確実性定量化の有効な方法となっている。しかし、そのようなモデルはしばしばキャリブレーションが悪く、自信過剰な予測につながった。 expected Calibration Error (ECE) や Negative Log Likelihood (NLL) のような従来のメトリクスには、バイアスやパラメトリック仮定などの制限がある。本稿では,これらのバイアスや仮定を伴わずにキャリブレーションの誤差を測定するために,条件付きカーネル平均埋め込みを用いた新しい手法を提案する。合成データに関する予備的な実験は、この方法の可能性を示し、より複雑な応用に向けた今後の研究が計画されている。 As machine learning systems become increasingly integrated into real-world applications, accurately representing uncertainty is crucial for enhancing their safety, robustness, and reliability. Training neural networks to fit high-dimensional probability distributions via maximum likelihood has become an effective method for uncertainty quantification. However, such models often exhibit poor calibration, leading to overconfident predictions. Traditional metrics like Expected Calibration Error (ECE) and Negative Log Likelihood (NLL) have limitations, including biases and parametric assumptions. This paper proposes a new approach using conditional kernel mean embeddings to measure calibration discrepancies without these biases and assumptions. Preliminary experiments on synthetic data demonstrate the method's potential, with future work planned for more complex applications.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
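For context on the baseline metric named in the abstract above, here is a minimal sketch of the standard binned Expected Calibration Error for a classifier. It is not the paper's kernel-based method; the function name, the equal-width binning and the toy inputs are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sample-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of 1.0 goes to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical overconfident model: confidence 0.9, but only half are correct.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
```

The result depends on the number and placement of bins, which is one source of the bias the abstract attributes to ECE.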
# 低リソース言語ファミリに対するターゲット型多言語適応 Targeted Multilingual Adaptation for Low-resource Language Families ( http://arxiv.org/abs/2405.12413v1 ) ライセンス: Link先を確認	C. M. Downey, Terra Blevins, Dhwani Serai, Dwija Parikh, Shane Steinert-Threlkeld,	(参考訳) マルチリンガルモデルの「大規模マルチリンガル」トレーニングは、どの言語でも実用性を制限することが知られており、低リソース言語では特に不十分である。しかし、低リソース言語は、モデルが密接な関係のある言語で訓練されるターゲットの多言語性から恩恵を受けることができるという証拠がある。このアプローチをより厳密にテストするために,事前学習されたモデルを言語系に適用するためのベストプラクティスを体系的に研究する。テストケースとしてUralicファミリに着目し、XLM-Rを様々な構成でモデル15言語に適応させ、2つの下流タスクと11の評価言語でそれぞれの実験環境の性能を評価する。適応モデルは単言語および多言語ベースラインを大きく上回る。さらに、ハイパーパラメータ効果の回帰分析により、適応語彙のサイズは低リソース言語では比較的重要ではなく、低リソース言語は高リソース言語の性能をほとんど損なうことなくトレーニング中に積極的にアップサンプリングできることが明らかになった。これらの結果から,ターゲット設定で言語適応を行うための新たなベストプラクティスが導入された。 The "massively-multilingual" training of multilingual models is known to limit their utility in any one language, and they perform particularly poorly on low-resource languages. However, there is evidence that low-resource languages can benefit from targeted multilinguality, where the model is trained on closely related languages. To test this approach more rigorously, we systematically study best practices for adapting a pre-trained model to a language family. Focusing on the Uralic family as a test case, we adapt XLM-R under various configurations to model 15 languages; we then evaluate the performance of each experimental setting on two downstream tasks and 11 evaluation languages. Our adapted models significantly outperform mono- and multilingual baselines. Furthermore, a regression analysis of hyperparameter effects reveals that adapted vocabulary size is relatively unimportant for low-resource languages, and that low-resource languages can be aggressively up-sampled during training at little detriment to performance in high-resource languages. These results introduce new best practices for performing language adaptation in a targeted setting.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# GeoMask3D:3Dにおける自己教師付きポイントクラウド学習のための幾何学的インフォームドマスク選択 GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D ( http://arxiv.org/abs/2405.12419v1 ) ライセンス: Link先を確認	Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers,	(参考訳) 我々は,Masked Auto Encoders (MAE) の効率を高めるために,GeoMask3D (GM3D) と呼ばれる幾何学的に情報を得たマスク選択戦略を用いて,点雲に対する自己教師型学習の先駆的アプローチを導入する。従来のランダムマスキング法とは異なり,本手法では教師学生モデルを用いて,データ内の複雑な領域に焦点をあてる。この戦略は、より厳しいパッチに集中することでより堅牢な特徴表現が得られるという仮説に基づいている。また,特徴量情報から包括的コンテキストを用いた幾何学的複雑性の予測を導くために,完全-部分的特徴量レベルの知識蒸留手法を提案する。大規模実験により,本手法がSOTA(State-Of-The-Art)ベースラインよりも優れていることが確認された。 We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Auto Encoders (MAE). Unlike the conventional method of random masking, our technique utilizes a teacher-student model to focus on intricate areas within the data, guiding the model's focus toward regions with higher geometric complexity. This strategy is grounded in the hypothesis that concentrating on harder patches yields a more robust feature representation, as evidenced by the improved performance on downstream tasks. Our method also presents a complete-to-partial feature-level knowledge distillation technique designed to guide the prediction of geometric complexity utilizing a comprehensive context from feature-level information. Extensive experiments confirm our method's superiority over State-Of-The-Art (SOTA) baselines, demonstrating marked improvements in classification, and few-shot tasks.	翻訳日:2024-05-22 14:47:55 公開日:2024-05-20
# GarmentDreamer: 多様な幾何学とテクスチャを具備した3DGSガイド型ガーメント合成 GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details ( http://arxiv.org/abs/2405.12420v1 ) ライセンス: Link先を確認	Boqian Li, Xuan Li, Ying Jiang, Tianyi Xie, Feng Gao, Huamin Wang, Yin Yang, Chenfanfu Jiang,	(参考訳) 伝統的な3D衣料の制作は、スケッチ、モデリング、UVマッピング、テクスチャリングなどを伴う労働集約的な作業であり、時間と費用がかかる。拡散に基づく生成モデルの最近の進歩は、テキストプロンプト、画像、ビデオから3D衣服を生成する新しい可能性を可能にしている。しかし、既存の方法は多視点画像の不整合に悩まされるか、下層の人間モデルから布を分離するために追加のプロセスを必要とする。本稿では,GarmentDreamerを提案する。GarmentDreamerは3Dガウス・スプラッティング(GS)をガイダンスとして利用し,テキストプロンプトからウェアラブルでシミュレーション可能な3D衣料メッシュを生成する手法である。生成モデルによって直接予測されるマルチビュー画像をガイダンスとして使用するのとは対照的に、3DGSのガイダンスは、衣服の変形とテクスチャ合成の両方において一貫した最適化を保証する。本手法では,法線およびRGBA情報によってガイドされる新しい衣服拡張モジュールを導入し,暗黙的ニューラルテクスチャ場(NeTF)とスコア蒸留サンプリング(SDS)を組み合わせて,多様な幾何学的・テクスチャ的詳細を生成する。我々は,GarmentDreamerの最先端技術よりも優れた性能を示す総合的質的,定量的実験により,本手法の有効性を検証した。私たちのプロジェクトページは、 https://xuan-li.github.io/GarmentDreamerDemo/ で利用可能です。 Traditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and texturing, which are time-consuming and costly. Recent advances in diffusion-based generative models have enabled new possibilities for 3D garment generation from text prompts, images, and videos. However, existing methods either suffer from inconsistencies among multi-view images or require additional processes to separate cloth from the underlying human model. In this paper, we propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate wearable, simulation-ready 3D garment meshes from text prompts. In contrast to using multi-view images directly predicted by generative models as guidance, our 3DGS guidance ensures consistent optimization in both garment deformation and texture synthesis. 
Our method introduces a novel garment augmentation module, guided by normal and RGBA information, and employs implicit Neural Texture Fields (NeTF) combined with Score Distillation Sampling (SDS) to generate diverse geometric and texture details. We validate the effectiveness of our approach through comprehensive qualitative and quantitative experiments, showcasing the superior performance of GarmentDreamer over state-of-the-art alternatives. Our project page is available at: https://xuan-li.github.io/GarmentDreamerDemo/.	翻訳日:2024-05-22 14:38:05 公開日:2024-05-20
# 人間の実演とフィードバックからのオフライン報酬学習のための統一線形計画法フレームワーク A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback ( http://arxiv.org/abs/2405.12421v1 ) ライセンス: Link先を確認	Kihyun Kim, Jiawei Zhang, Pablo A. Parrilo, Asuman Ozdaglar,	(参考訳) 逆強化学習 (IRL) と人間のフィードバックからの強化学習 (RLHF) は報酬学習における重要な方法論であり、観察された人間の実演とフィードバックに基づいて、逐次的意思決定問題の基礎となる報酬関数を推論・形成する。報酬学習における従来研究の多くは、意思決定モデルや選好モデルに関する事前知識や仮定に依存しており、頑健性の問題につながる可能性がある。そこで本稿では,オフライン報酬学習に適した新しい線形計画法(LP)フレームワークを導入する。本フレームワークは,オンライン探索を行わずに事前収集した軌道を用いて,適切に設計したLPの主双対最適性条件から実行可能な報酬集合を推定し,証明可能なサンプル効率を伴う最適性保証を提供する。また、このLPフレームワークは、計算上の扱いやすさとサンプル効率を維持しながら、軌道の対比較データなどの人間のフィードバックに報酬関数を整合させることもできる。解析的な例と数値実験により,従来の最尤推定(MLE)法と比較して,本フレームワークがより良い性能を達成しうることを示す。 Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments.	
翻訳日:2024-05-22 14:38:05 公開日:2024-05-20
# 深層学習とGoogle Earth Engineを組み合わせた都市水域抽出手法 An Urban Water Extraction Method Combining Deep Learning and Google Earth Engine ( http://arxiv.org/abs/1912.10726v2 ) ライセンス: Link先を確認	Yudie Wang, Zhiwei Li, Chao Zeng, Gui-Song Xia, Huanfeng Shen,	(参考訳) 都市水域は都市生態系にとって重要である。リモートセンシングデータによる都市水域の高精度かつ効率的な検出は、都市管理と計画にとって非常に重要である。本稿では,Google Earth Engine (GEE) とマルチスケール畳み込みニューラルネットワーク (MSCNN) を組み合わせてLandsat画像から都市水域を抽出する新しい手法を提案する。これはオフライン学習・オンライン予測 (OTOP) として要約される。すなわち,MSCNNの学習はオフラインで完了し,都市水域抽出の処理は学習済みのMSCNNパラメータを用いてGEE上で実施される。 OTOPは、GEEとCNNのそれぞれの利点を十分に活かし、GEE上での深層学習手法の利用をより柔軟にする。データのダウンロードや保存を必要とせずに、利用可能な衛星画像を高い性能で処理でき、都市水域抽出の全体的な性能も、修正正規化差水指数(MNDWI)やランダムフォレストより高い。長春,武漢,昆明,広州におけるOTOPによる都市水域抽出のカッパ係数,F1スコア,IoU (Intersection over Union) の平均値は,それぞれ0.924,0.930,0.869に達した。また、中国の他の主要都市での拡張検証の結果、OTOPは頑健であり、さまざまな種類の都市水域の抽出に使用できることが示された。これはMSCNNの構造設計と学習によるものである。したがって,OTOPは都市化を背景とした大規模・長期の都市水域変化検出の研究に特に適している。 Urban water is important for the urban ecosystem. Accurate and efficient detection of urban water with remote sensing data is of great significance for urban management and planning. In this paper, we proposed a new method to combine Google Earth Engine (GEE) with multiscale convolutional neural network (MSCNN) to extract urban water from Landsat images, which is summarized as offline training and online prediction (OTOP). That is, the training of MSCNN was completed offline, and the process of urban water extraction was implemented on GEE with the trained parameters of MSCNN. The OTOP can give full play to the respective advantages of GEE and CNN, and make the use of deep learning method on GEE more flexible. It can process available satellite images with high performance without data download and storage, and the overall performance of urban water extraction is also higher than that of the modified normalized difference water index (MNDWI) and random forest. The mean kappa, F1-score and intersection over union (IoU) of urban water extraction with the OTOP in Changchun, Wuhan, Kunming and Guangzhou reached 0.924, 0.930 and 0.869, respectively. 
The results of the extended validation in the other major cities of China also show that the OTOP is robust and can be used to extract different types of urban water, which benefits from the structural design and training of the MSCNN. Therefore, the OTOP is especially suitable for the study of large-scale and long-term urban water change detection in the background of urbanization.	翻訳日:2024-05-22 03:18:46 公開日:2024-05-20
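The urban water abstract above reports kappa, F1-score and IoU and compares against MNDWI. As a rough illustration of how these quantities are usually computed, here is a minimal sketch. The function names and the toy pixel counts are hypothetical, and nothing here is taken from the authors' code. MNDWI is written in its common form, (Green - SWIR) / (Green + SWIR), and the three metrics follow their standard binary confusion-matrix definitions.

```python
def mndwi(green, swir):
    """Modified normalized difference water index for one pixel (common definition)."""
    return (green - swir) / (green + swir)


def binary_metrics(tp, fp, fn, tn):
    """Return (kappa, f1, iou) for the positive (water) class from pixel counts."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total  # overall agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return kappa, f1, iou


if __name__ == "__main__":
    # Toy counts, purely illustrative.
    print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
    print(mndwi(green=0.3, swir=0.1))
```

Positive MNDWI values are typically read as water, although the threshold is scene-dependent.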
# 公平なアクティブラーニング:保険におけるラベル付け問題の解決 Fair Active Learning: Solving the Labeling Problem in Insurance ( http://arxiv.org/abs/2112.09466v4 ) ライセンス: Link先を確認	Romuald Elie, Caroline Hillairet, François Hu, Marc Juillard,	(参考訳) 本稿では,保険業界における機械学習モデルの普及に伴って生じる重大な障害に,特に公平性の促進に焦点を当てて取り組む。最初の課題は、アクティブラーニング手法によってラベル付けの労力を減らし、データの関連性を重視しながら、保険におけるラベルなしデータを効果的に活用することである。本稿では、各種のアクティブラーニングのサンプリング手法を検討し、合成データセットと実際の保険データセットの両方に対する影響を評価する。この分析は、機械学習モデルが基礎となるデータに含まれるバイアスや差別を再現する可能性があるため、公平なモデル推論を達成することの難しさを浮き彫りにする。これらの相互に関連する課題に取り組むために,本稿では革新的な公平なアクティブラーニング手法を導入する。提案手法は、情報量が多くかつ公平なインスタンスをサンプリングし、保険データセットの数値実験で確認されたとおり、モデルの予測性能と公平性の間で良好なバランスを達成する。 This paper addresses significant obstacles that arise from the widespread use of machine learning models in the insurance industry, with a specific focus on promoting fairness. The initial challenge lies in effectively leveraging unlabeled data in insurance while reducing the labeling effort and emphasizing data relevance through active learning techniques. The paper explores various active learning sampling methodologies and evaluates their impact on both synthetic and real insurance datasets. This analysis highlights the difficulty of achieving fair model inferences, as machine learning models may replicate biases and discrimination found in the underlying data. To tackle these interconnected challenges, the paper introduces an innovative fair active learning method. The proposed approach samples informative and fair instances, achieving a good balance between model predictive performance and fairness, as confirmed by numerical experiments on insurance datasets.	翻訳日:2024-05-22 01:31:04 公開日:2024-05-20
# IT5: イタリア語の理解と生成のためのテキストからテキストへの事前学習 IT5: Text-to-text Pretraining for Italian Language Understanding and Generation ( http://arxiv.org/abs/2203.03759v2 ) ライセンス: Link先を確認	Gabriele Sarti, Malvina Nissim,	(参考訳) イタリア語に特化して事前学習された最初のエンコーダ・デコーダ型トランスフォーマーモデル群であるIT5を紹介する。大規模なイタリア語コーパスに対する徹底的なクリーニング手順を文書化して実施し,それを用いて4つのサイズのIT5モデルを事前学習する。次に、イタリア語の幅広い自然言語理解・生成タスクを含むItaGenベンチマークを導入し、IT5モデルと多言語ベースラインの性能評価に用いる。単言語のIT5モデルは、テストしたモデルの中で最良のスケール対性能比を示し、多言語モデルを一貫して上回り、イタリア語生成における新たな最先端を達成した。 We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the performance of IT5 models and multilingual baselines. We find monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for Italian language generation.	翻訳日:2024-05-22 01:31:04 公開日:2024-05-20
# 逆問題に対する多様体制約を用いた拡散モデルの改善 Improving Diffusion Models for Inverse Problems using Manifold Constraints ( http://arxiv.org/abs/2206.00941v3 ) ライセンス: Link先を確認	Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, Jong Chul Ye,	(参考訳) 近年、拡散モデルは、サンプリングプロセスに適切な修正を加えることで、教師なしの方法で様々な逆問題を解くために使用されている。しかし、逆拡散ステップの後に射影に基づく測定整合性ステップを再帰的に適用する現在の解法は、しばしば準最適な結果を生成する。生成的サンプリングパスを調べることで、現在の解法がサンプルパスをデータ多様体から逸脱させ、その結果エラーが蓄積することを示す。これを解決するために、多様体制約に着想を得た追加の補正項を提案する。この項は従来の解法と相乗的に使用でき、反復を多様体の近くに保つ。提案する多様体制約は数行のコードで簡単に実装できるが、驚くほど大きなマージンで性能を向上させる。広範な実験により,本手法は理論的にも経験的にも従来の手法より優れており,画像インペインティング,カラー化,スパースビューCTなどの多くの応用において有望な結果が得られることを示す。コードは https://github.com/HJ-harry/MCG_diffusion で公開されている。 Recently, diffusion models have been used to solve various inverse problems in an unsupervised manner with appropriate modifications to the sampling process. However, the current solvers, which recursively apply a reverse diffusion step followed by a projection-based measurement consistency step, often produce suboptimal results. By studying the generative sampling path, here we show that current solvers throw the sample path off the data manifold, and hence the error accumulates. To address this, we propose an additional correction term inspired by the manifold constraint, which can be used synergistically with the previous solvers to make the iterations close to the manifold. The proposed manifold constraint is straightforward to implement within a few lines of code, yet boosts the performance by a surprisingly large margin. With extensive experiments, we show that our method is superior to the previous methods both theoretically and empirically, producing promising results in many applications such as image inpainting, colorization, and sparse-view computed tomography. Code available https://github.com/HJ-harry/MCG_diffusion	翻訳日:2024-05-22 01:31:04 公開日:2024-05-20
# ブロックチェーンベースのセキュアエネルギーマーケットプレーススキームによるピアからピアマイクログリッドへのモチベーション Blockchain based Secure Energy Marketplace Scheme to Motivate Peer to Peer Microgrids ( http://arxiv.org/abs/2206.07248v3 ) ライセンス: Link先を確認	Muhammad Awais, Qamar Abbas, Shehbaz Tariq, Sayyaf Haider Warraich,	(参考訳) ここ数年、ピーク時のコスト削減のため、マイクログリッドのトレンドは非常に急速に増加している。しかし、これらのシステムでは、サードパーティは依然として余剰エネルギーの販売に関与している。その結果、エネルギーコストが増大し、そのようなシステムには多くの運用およびセキュリティ障壁が存在する。これらの問題は、コンシューマが余剰エネルギーを他のコンシューマにローカルに販売できる、分散化されたマイクログリッドの分散システムによって解決できる。このようなシステムをデプロイするには、エネルギーの取引に対するセキュリティ障壁を考慮する必要がある。そこで,本稿では,ユーザが相互に交流し,より良い利率でエネルギーを売買し,リース時にエネルギー資源を得るマーケットプレースとしてのスキームを考案し,資本投資の心配をしなくて済むようにすることで,これらの問題を解決することを提案する。リソースの所有者とコンシューマの合意は、ブロックチェーンベースのスマートコントラクトに基づいて記録される。本稿では、既存のよく知られた分散型エネルギーソリューションに対する調査を行う。また,保護された実行環境を活用するための余分なセキュリティ層を提案し,システムに侵入しても,コンシューマやサードパーティが生成し,利用し,共有するエネルギー情報を変更できないようにした。 In the past years trend of microgrids is increasing very fast to reduce peak-hour costs. However, in these systems, third parties are still involved in selling surplus energy. This results in increased cost of energy and there are many operational and security barriers in such systems. These issues can be solved by the decentralized distributed system of microgrids where a consumer can locally sell their surplus energy to another consumer. To deploy such a system, one must consider security barriers for the transaction of energy. This paper proposes a solution to these problems by devising a scheme as a marketplace where users interact with each other to buy and sell energy at better rates and get energy-generating resources on lease so that users do not have to worry about capital investment. Agreement between owner of resources and consumer is recorded on blockchain based smart contracts. In this paper, a survey is performed for existing well known, decentralized energy solutions. 
This paper also proposes an extra layer of security to leverage a shielded execution environment so that information of energy generated, utilized, and shared cannot be changed by consumers and third parties even if the system is compromised.	翻訳日:2024-05-22 01:31:04 公開日:2024-05-20
# レーンマーキングを用いた地理参照のためのロバストなセルフチューニングデータアソシエーション Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings ( http://arxiv.org/abs/2207.14042v2 ) ライセンス: Link先を確認	Miguel Ángel Muñoz-Bañón, Jan-Hendrik Pauls, Haohao Hu, Christoph Stiller, Francisco A. Candelas, Fernando Torres,	(参考訳) 航空画像に基づく地図での自己位置推定には、大域的な一貫性、地理参照された地図、公開データの利用可能性など、多くの利点がある。しかし、航空画像と車載センサーの両方から観測できるランドマークは限られている。これは、データアソシエーションにおける曖昧さやエイリアシングにつながる。本稿では,(効率的なデータアソシエーションを可能にする)情報量の多い表現に基づいて,これらの曖昧さを解消するための完全なパイプラインを提示する。その中核は、測定のエントロピーに応じて探索領域を適応させるロバストなセルフチューニングデータアソシエーションである。さらに、最終結果を平滑化するために、データアソシエーション処理で得られた相対変換の関数として、関連付けられたデータの情報行列を調整する。ドイツ・カールスルーエ市周辺の都市部と農村部のシナリオの実データで本手法を評価する。最先端の外れ値緩和手法とセルフチューニング手法を比較し、特に都市郊外のシナリオにおいて大幅な改善を示す。 Localization in aerial imagery-based maps offers many advantages, such as global consistency, geo-referenced maps, and the availability of publicly accessible data. However, the landmarks that can be observed from both aerial imagery and on-board sensors is limited. This leads to ambiguities or aliasing during the data association. Building upon a highly informative representation (that allows efficient data association), this paper presents a complete pipeline for resolving these ambiguities. Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements. Additionally, to smooth the final result, we adjust the information matrix for the associated data as a function of the relative transform produced by the data association process. We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany. We compare state-of-the-art outlier mitigation methods with our self-tuning approach, demonstrating a considerable improvement, especially for outer-urban scenarios.	翻訳日:2024-05-22 01:31:04 公開日:2024-05-20
# Hyperloop:サイバーセキュリティの展望 Hyperloop: A Cybersecurity Perspective ( http://arxiv.org/abs/2209.03095v3 ) ライセンス: Link先を確認	Alessandro Brighente, Mauro Conti, Denis Donadel, Federico Turrin,	(参考訳) ハイパーループは将来の交通システムの中でも最も有名なものの一つである。持続可能性を確保しつつ、最高速度1220km/hの走行を可能にする新しい技術を含んでいる。システムのパフォーマンス要件とそれが表す重要なインフラストラクチャのため、その安全性とセキュリティを慎重に検討する必要がある。輸送システムでは、サイバー攻撃は人口と周辺環境にとって破滅的な結果をもたらす安全上の問題を引き起こす可能性がある。現在まで、ハイパーループ技術のサイバーセキュリティに関する研究は行われていない。本稿では,Hyperloopエコシステムのさまざまなコンポーネント間の相互接続におけるサイバーセキュリティの課題を初めて分析する。私たちは、現在利用可能なHyperloopの実装に基づいて分析を行い、最終的な設計で見られるであろうこれらの機能を精査しています。さらに,インフラ管理のアプローチとそのセキュリティ問題についても検討する。最後に,ハイパーループ設計の安全性に対する対策と今後の方向性について論じる。 Hyperloop is among the most prominent future transportation systems. It involves novel technologies to allow traveling at a maximum speed of 1220km/h while guaranteeing sustainability. Due to the system's performance requirements and the critical infrastructure it represents, its safety and security must be carefully considered. In transportation systems, cyberattacks could lead to safety issues with catastrophic consequences for the population and the surrounding environment. To this day, no research investigated the cybersecurity issues of the Hyperloop technology. In this paper, we provide the first analysis of the cybersecurity challenges of the interconnections between the different components of the Hyperloop ecosystem. We base our analysis on the currently available Hyperloop implementations, distilling those features that will likely be present in its final design. Moreover, we investigate possible infrastructure management approaches and their security concerns. Finally, we discuss countermeasures and future directions for the security of the Hyperloop design.	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# 一般的なノイズあり逆問題に対する拡散事後サンプリング Diffusion Posterior Sampling for General Noisy Inverse Problems ( http://arxiv.org/abs/2209.14687v4 ) ライセンス: Link先を確認	Hyungjin Chung, Jeongsol Kim, Michael T. Mccann, Marc L. Klasky, Jong Chul Ye,	(参考訳) 拡散モデルは、高品質な再構成と既存の反復解法との組み合わせやすさから、強力な生成的逆問題ソルバーとして最近研究されている。しかし、ほとんどの研究はノイズのない設定での単純な線形逆問題の解決に重点を置いており、実世界の問題の複雑さを大幅に過小評価している。本研究では,事後サンプリングの近似により,拡散ソルバーを拡張して一般的なノイズのある(非)線形逆問題を効率的に扱う。興味深いことに、得られた事後サンプリング方式は、厳密な測定整合性の射影ステップを伴わない、拡散サンプリングと多様体制約付き勾配を混合したものであり、従来の研究と比べてノイズのある設定でより望ましい生成経路が得られる。本手法は、拡散モデルがガウスやポアソンのような様々な測定ノイズ統計を組み込むことができ、フーリエ位相回復や非一様ぼけ除去のようなノイズのある非線形逆問題も効率的に扱えることを示す。コードは https://github.com/DPS2022/diffusion-posterior-sampling で公開されている。 Diffusion models have been recently studied as powerful generative inverse problem solvers, owing to their high quality reconstructions and the ease of combining existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings compared to the previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics such as Gaussian and Poisson, and also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring. Code available at https://github.com/DPS2022/diffusion-posterior-sampling	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# Teal: WANトラフィックエンジニアリングの学習による高速化最適化 Teal: Learning-Accelerated Optimization of WAN Traffic Engineering ( http://arxiv.org/abs/2210.13763v4 ) ライセンス: Link先を確認	Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu,	(参考訳) グローバルなクラウド広域ネットワーク(WAN)の急速な拡大により、商用最適化エンジンがネットワークのトラフィックエンジニアリング(TE)問題を大規模かつ効率的に解くことが課題となっている。既存の高速化戦略は、TE最適化を並行するサブ問題に分解するが、実行時間と割り当て性能の間に本質的なトレードオフがあるため、限られた並列性しか実現できない。本稿では,GPUの並列処理能力を活用してTE制御を高速化する学習ベースのTEアルゴリズムTealを提案する。まず、Tealはフロー中心のグラフニューラルネットワーク(GNN)を設計してWANの接続性とネットワークフローを捉え、下流の割り当てへの入力となるフロー特徴を学習する。第2に,問題の規模を小さくして学習を扱いやすくするため,Tealはマルチエージェント強化学習(RL)アルゴリズムを用い,中央のTE目的を最適化しながら各トラフィック需要を独立に割り当てる。最後に,Tealは過剰利用リンクなどの制約違反を減らすための高度に並列化可能な最適化アルゴリズムであるADMM(Alternating Direction Method of Multipliers)を用いて割り当てを微調整する。 MicrosoftのWANのトラフィック行列を用いてTealを評価した。 1,700ノードを超える大規模なWANトポロジにおいて、Tealは本番環境の最適化エンジンより数桁高速に動作しながら、ほぼ最適なフロー割り当てを生成する。他のTE高速化方式と比較して、Tealは6～32%多くのトラフィック需要を満たし、197～625倍の高速化を達成する。 The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. 
Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6--32% more traffic demand and yields 197--625x speedups.	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# RulE: ルール埋め込みによる知識グラフの推論 RulE: Knowledge Graph Reasoning with Rule Embedding ( http://arxiv.org/abs/2210.14905v3 ) ライセンス: Link先を確認	Xiaojuan Tang, Song-Chun Zhu, Yitao Liang, Muhan Zhang,	(参考訳) 知識グラフ(KG)推論は知識グラフにとって重要な問題である。本稿では,論理ルールを効果的に活用してKG推論を強化する,RulE (Rule Embedding) という新しく原理的なフレームワークを提案する。知識グラフ埋め込み (KGE) 法とは異なり、RulE は既存の三重項と一階述語ルールからルール埋め込みを学習し、エンティティ、関係、論理ルールを統一された埋め込み空間で共同で表現する。学習したルール埋め込みに基づいて、各ルールに対する信頼スコアを計算でき、これは観察された三重項との整合性を反映する。これにより、論理ルール推論をソフトな方法で実行し、論理の脆さを軽減することができる。一方、RulEは事前の論理ルール情報を埋め込み空間に注入し、エンティティ/関係の埋め込みを豊かにし、正則化する。これにより、KGE単独の性能も向上する。 RulEは概念的に単純で、経験的に有効である。我々はRulEの各構成要素を検証するために広範な実験を行う。複数のベンチマークの結果、我々のモデルは既存の埋め込みベースおよびルールベースのアプローチの大部分よりも優れていることが明らかになった。 Knowledge graph (KG) reasoning is an important problem for knowledge graphs. In this paper, we propose a novel and principled framework called RulE (stands for Rule Embedding) to effectively leverage logical rules to enhance KG reasoning. Unlike knowledge graph embedding (KGE) methods, RulE learns rule embeddings from existing triplets and first-order rules by jointly representing entities, relations and logical rules in a unified embedding space. Based on the learned rule embeddings, a confidence score can be calculated for each rule, reflecting its consistency with the observed triplets. This allows us to perform logical rule inference in a soft way, thus alleviating the brittleness of logic. On the other hand, RulE injects prior logical rule information into the embedding space, enriching and regularizing the entity/relation embeddings. This makes KGE alone perform better too. RulE is conceptually simple and empirically effective. We conduct extensive experiments to verify each component of RulE. Results on multiple benchmarks reveal that our model outperforms the majority of existing embedding-based and rule-based approaches.	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# 共変量シフトの祝福と呪い--敵対的学習ダイナミクス,方向収束,平衡 Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria ( http://arxiv.org/abs/2212.02457v3 ) ライセンス: Link先を確認	Tengyuan Liang,	(参考訳) 共変量分布のシフトと敵対的摂動は、従来の統計学習フレームワークに頑健性の課題をもたらす。テスト共変量分布の軽微なシフトは、トレーニング分布に基づいて学習された統計モデルの性能に大きく影響しうる。モデルの性能は通常、外挿が発生すると劣化する。すなわち、トレーニング分布が希薄な領域へ共変量がシフトし、当然ながら学習されたモデルはほとんど情報を持たない。頑健性や正則化の観点から、敵対的摂動手法が対処法として提案されているが、学習されたモデルが与えられたとき、敵対的共変量シフトがどの外挿領域に集中するかについては慎重な検討が必要である。本稿では,無限次元の設定における回帰と分類の両方を調べ,外挿領域を正確に特徴づける。逐次ゲームの枠組みにおいて、敵対的共変量シフトがその後の平衡(ベイズ最適モデル)の学習に与える影響を研究する。敵対的学習ゲームのダイナミクスを利用し、共変量シフトが平衡学習と実験計画に及ぼす興味深い効果を明らかにする。特に,特徴的な現象を示す2つの方向収束の結果を確立する。(1) 回帰における祝福：敵対的共変量は、その後の迅速な学習に最適な実験計画へ指数的な速度でシフトする。(2) 分類における呪い：敵対的共変量は、その後の学習を捕らえる最も難しい実験計画へ準二次の速度でシフトする。 Covariate distribution shifts and adversarial perturbations present robustness challenges to the conventional statistical learning framework: mild shifts in the test covariate distribution can significantly affect the performance of the statistical model learned based on the training distribution. The model performance typically deteriorates when extrapolation happens: namely, covariates shift to a region where the training distribution is scarce, and naturally, the learned model has little information. For robustness and regularization considerations, adversarial perturbation techniques are proposed as a remedy; however, careful study needs to be carried out about what extrapolation region adversarial covariate shift will focus on, given a learned model. This paper precisely characterizes the extrapolation region, examining both regression and classification in an infinite-dimensional setting. We study the implications of adversarial covariate shifts to subsequent learning of the equilibrium -- the Bayes optimal model -- in a sequential game framework. We exploit the dynamics of the adversarial learning game and reveal the curious effects of the covariate shift to equilibrium learning and experimental design. 
In particular, we establish two directional convergence results that exhibit distinctive phenomena: (1) a blessing in regression, the adversarial covariate shifts in an exponential rate to an optimal experimental design for rapid subsequent learning; (2) a curse in classification, the adversarial covariate shifts in a subquadratic rate to the hardest experimental design trapping subsequent learning.	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# 生レーダフレーム上でのオンライン物体検出のためのリカレントCNN A recurrent CNN for online object detection on raw radar frames ( http://arxiv.org/abs/2212.11172v3 ) ライセンス: Link先を確認	Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin,	(参考訳) 自動車用レーダーセンサーは、先進運転支援システム(ADAS)に貴重な情報を提供する。レーダーは、天候や光条件に関わらず、物体までの距離と相対速度を確実に推定することができる。しかし、レーダーセンサーは解像度が低く、物体形状のクラス内変動が大きいという問題を抱えている。時間情報(例えば、複数のフレーム)を活用することは、物体のダイナミクス、ひいては物体形状の変動をよりよく捉えるのに役立つことが示されている。ほとんどの時間的レーダー物体検出器は、空間的および時間的情報を学ぶために3D畳み込みを使用する。しかし、これらの手法はしばしば非因果的であり、リアルタイムアプリケーションには適さない。本稿では,オンラインレーダ物体検出のための新しいリカレントCNNアーキテクチャであるRECORDを紹介する。畳み込みとConvLSTMを組み合わせたエンドツーエンドで学習可能なアーキテクチャを提案し,連続するフレーム間の時空間依存性を学習する。我々のモデルは因果的であり、物体を検出するためにConvLSTMのメモリに符号化された過去の情報のみを必要とする。実験により,異なるレーダ表現(レンジ・ドップラー,レンジ・アングル)における物体検出に対する本手法の有効性が示され,ROD2021およびCARRADAデータセットにおいて,計算コストを抑えつつ最先端モデルを上回った。 Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting the time information (e.g., multiple frames) has been shown to help to capture better the dynamics of objects and, therefore, the variation in the shape of objects. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. 
Our experiments show such a method's relevance for detecting objects in different radar representations (range-Doppler, range-angle) and outperform state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.	翻訳日:2024-05-22 01:20:28 公開日:2024-05-20
# 分位点時間差分学習の解析 An Analysis of Quantile Temporal-Difference Learning ( http://arxiv.org/abs/2301.04462v3 ) ライセンス: Link先を確認	Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney,	(参考訳) 強化学習のいくつかの成功した大規模応用において重要な構成要素であることが示されている分布型強化学習アルゴリズム、分位点時間差分学習(QTD)を解析する。これらの経験的成功にもかかわらず、QTDの理論的理解はこれまで得られていなかった。標準的な確率近似ツールで解析できる古典的TD学習とは異なり、QTDの更新は縮小写像を近似せず、高度に非線形であり、複数の不動点を持ちうる。本論文の中核的な結果は、関連する動的計画法の手続き群の不動点へ確率1で収束することの証明であり、これによりQTDは確固たる理論的基盤の上に置かれる。この証明は、確率近似理論と非平滑解析を通じて、QTDと非線形微分包含式の間の関係を確立する。 We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.	翻訳日:2024-05-22 01:10:44 公開日:2024-05-20
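The abstract above describes QTD updates as highly non-linear but does not state the rule. The sketch below shows the tabular QTD update as it is commonly written in the distributional RL literature. The function name, argument names and toy numbers are hypothetical and not taken from the paper. Each state keeps m quantile estimates. Estimate i targets the level tau_i = (2i - 1) / (2m), which becomes `(2 * i + 1) / (2 * m)` with zero-based indices. It moves up by alpha * tau_i and down by alpha times the fraction of bootstrapped targets that fall below it.

```python
def qtd_update(theta_x, theta_next, reward, gamma, alpha):
    """One tabular QTD step for a single transition (x, reward, x').

    theta_x    : list of m quantile estimates at the current state
    theta_next : list of m quantile estimates at the next state
    Returns the updated list for the current state.
    """
    m = len(theta_x)
    updated = []
    for i in range(m):
        tau_i = (2 * i + 1) / (2 * m)  # midpoint quantile level
        # Fraction of bootstrapped targets lying strictly below estimate i.
        below = sum(
            1 for j in range(m) if reward + gamma * theta_next[j] < theta_x[i]
        ) / m
        updated.append(theta_x[i] + alpha * (tau_i - below))
    return updated


if __name__ == "__main__":
    print(qtd_update([0.0, 0.0], [0.0, 0.0], reward=1.0, gamma=0.5, alpha=0.1))
```

The increment depends on the targets only through an indicator, not through the size of the TD error, which is one way to see why the standard contraction-mapping arguments for classical TD do not carry over.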
# 変分推論: 事後分布の閾値処理はスパース領域におけるネットワーククラスタリング精度を改善する Variational Inference: Posterior Threshold Improves Network Clustering Accuracy in Sparse Regimes ( http://arxiv.org/abs/2301.04771v2 ) ライセンス: Link先を確認	Xuezhen Li, Can M. Le,	(参考訳) 変分推論は、機械学習の文献で様々なベイズモデルの当てはめに広く用いられている。ネットワーク解析において,この手法はコミュニティ検出問題の解決に適用され成功を収めている。これらの結果は有望であるが、その理論的裏付けは比較的密なネットワークに限られており、これは実際のネットワークでは成り立たない可能性のある仮定である。さらに, 変分損失面には多くの鞍点があり, 特にスパースネットワークに適用した場合, 性能に深刻な影響を及ぼす可能性があることが最近示されている。本稿では,各反復後にコミュニティ割り当ての事後分布をハード閾値処理することで,変分推論法を改善する簡易な方法を提案する。真のコミュニティ割り当てと相関するランダム初期化を用いることで、ネットワークの平均ノード次数が有界であっても、提案手法が収束し、真のコミュニティラベルを正確に復元できることを示す。広範な数値実験により、古典的変分推論および別の最先端アルゴリズムに対する提案手法の優位性がさらに確認された。 Variational inference has been widely used in machine learning literature to fit various Bayesian models. In network analysis, this method has been successfully applied to solve the community detection problems. Although these results are promising, their theoretical support is only for relatively dense networks, an assumption that may not hold for real networks. In addition, it has been shown recently that the variational loss surface has many saddle points, which may severely affect its performance, especially when applied to sparse networks. This paper proposes a simple way to improve the variational inference method by hard thresholding the posterior of the community assignment after each iteration. Using a random initialization that correlates with the true community assignment, we show that the proposed method converges and can accurately recover the true community labels, even when the average node degree of the network is bounded. Extensive numerical study further confirms the advantage of the proposed method over the classical variational inference and another state-of-the-art algorithm.	翻訳日:2024-05-22 01:10:43 publication 公開日:2024-05-20
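The abstract above only says that the posterior of the community assignment is hard thresholded after each iteration. One plausible reading, assumed here and not confirmed by the abstract, is that each node's soft membership vector is rounded to a one-hot vector at its most probable community. The helper name and the toy posterior below are hypothetical.

```python
def hard_threshold(posterior):
    """Round each row of soft community probabilities to a one-hot vector.

    posterior : list of rows, one per node, each a list of K probabilities.
    Ties are broken in favour of the lowest community index.
    """
    rounded = []
    for row in posterior:
        best = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        rounded.append([1.0 if k == best else 0.0 for k in range(len(row))])
    return rounded


if __name__ == "__main__":
    soft = [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
    print(hard_threshold(soft))
```

In a full procedure this step would sit between variational updates, so that the next update starts from hard labels. The paper's exact rule may differ from this sketch.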
# PECAN: バックドア攻撃に対する決定論的な認証付き防御 PECAN: A Deterministic Certified Defense Against Backdoor Attacks ( http://arxiv.org/abs/2301.11824v4 ) ライセンス: Link先を確認	Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni,	(参考訳) ニューラルネットワークはバックドア汚染攻撃に対して脆弱である。この攻撃では、攻撃者がトレーニングセットを悪意をもって汚染し、テスト入力にトリガーを挿入することで、被害モデルの予測を変更する。既存のバックドア攻撃に対する防御は、形式的な保証を提供しないか、計算コストが高く効果の低い確率的保証しか伴わない。本稿では,バックドア攻撃を防御するための効率的で認証付きのアプローチであるPECANを提案する。PECANを支える重要な洞察は、データの互いに素な分割で訓練されたニューラルネットワークの集合に、既製のテスト時回避攻撃の認証技術を適用することである。 PECANを画像分類とマルウェア検出のデータセットで評価する。結果として,PECANは(1)防御の強さと効率の両面で最先端の認証付きバックドア防御を大幅に上回り,(2)実際のバックドア攻撃では,文献の一連のベースラインと比較して攻撃成功率を1桁低減できることが示された。 Neural networks are vulnerable to backdoor poisoning attacks, where the attackers maliciously poison the training set and insert triggers into the test input to change the prediction of the victim model. Existing defenses for backdoor attacks either provide no formal guarantees or come with expensive-to-compute and ineffective probabilistic guarantees. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques on a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN can (1) significantly outperform the state-of-the-art certified backdoor defense, both in defense strength and efficiency, and (2) on real back-door attacks, PECAN can reduce attack success rate by order of magnitude when compared to a range of baselines from the literature.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# 言語モデルにおけるマルチモーダル・チェーン・オブ・ソート推論 Multimodal Chain-of-Thought Reasoning in Language Models ( http://arxiv.org/abs/2302.00923v5 ) ライセンス: Link先を確認	Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola,	(参考訳) 大規模言語モデル (LLM) は、チェーン・オブ・ソート (CoT) プロンプトを活用し、答えを推論するための根拠として中間的な推論の連鎖を生成することで、複雑な推論において顕著な性能を示している。しかし、既存のCoT研究は主に言語モダリティに焦点を当てている。本稿では,言語(テキスト)と視覚(画像)のモダリティを,根拠生成と回答推論を分離した2段階のフレームワークに組み込むMultimodal-CoTを提案する。このようにして、回答推論は、マルチモーダル情報に基づいて生成されたより良い根拠を活用できる。ScienceQA と A-OKVQA のベンチマークデータセットにおける実験結果は,提案手法の有効性を示している。Multimodal-CoTにより,10億パラメータ未満のモデルがScienceQAベンチマークで最先端の性能を達成した。分析の結果,Multimodal-CoTには幻覚を緩和し,収束速度を向上させる利点があることが示された。コードは https://github.com/amazon-science/mm-cot で公開されている。 Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# SPAN: トランスフォーマーによるシーングラフとイメージの類似性を学ぶ SPAN: Learning Similarity between Scene Graphs and Images with Transformers ( http://arxiv.org/abs/2304.00590v2 ) ライセンス: Link先を確認	Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang,	(参考訳) シーングラフと画像の類似性を学習することは、シーングラフと画像が与えられた類似度スコアを推定することを目的としている。現在、このタスクに関する研究は行われていないが、シーングラフの生成や下流のアプリケーションには不可欠である。 Recall$@K$と平均Recall$@K$は、人間のラベル付き三重項集合に現れる予測三重項の比率を測定する。しかし、このようなトリプルト指向のメトリクスは、シーングラフと画像の全体的な意味的差異を示すことができず、アノテーションのバイアスやノイズに敏感である。したがって、下流アプリケーションで生成されたシーングラフの使用は制限される。この問題に対処するため,Scene graPh-imAge coNtrastive learning framework, SPANを提案する。我々の新しいフレームワークはグラフ変換器と画像変換器から構成され、シーングラフとその対応する画像を共有潜在空間に配置する。本稿では,シーングラフを構造的エンコーディングを伴うシーケンスに変換する新しいグラフシリアライズ手法を提案する。本稿では,シーングラフ生成のための新しい評価指標として,R-Precision測定画像検索精度を提案する。我々は、Visual GenomeとOpen Imagesデータセットに新しいベンチマークを構築した。シーングラフエンコーダとして大きな可能性を示すSPANの有効性を検証するために,大規模な実験を行った。 Learning similarity between scene graphs and images aims to estimate a similarity score given a scene graph and an image. There is currently no research dedicated to this task, although it is critical for scene graph generation and downstream applications. Scene graph generation is conventionally evaluated by Recall$@K$ and mean Recall$@K$, which measure the ratio of predicted triplets that appear in the human-labeled triplet set. However, such triplet-oriented metrics fail to demonstrate the overall semantic difference between a scene graph and an image and are sensitive to annotation bias and noise. Using generated scene graphs in the downstream applications is therefore limited. To address this issue, for the first time, we propose a Scene graPh-imAge coNtrastive learning framework, SPAN, that can measure the similarity between scene graphs and images. Our novel framework consists of a graph Transformer and an image Transformer to align scene graphs and their corresponding images in the shared latent space. 
We introduce a novel graph serialization technique that transforms a scene graph into a sequence with structural encodings. Based on our framework, we propose R-Precision measuring image retrieval accuracy as a new evaluation metric for scene graph generation. We establish new benchmarks on the Visual Genome and Open Images datasets. Extensive experiments are conducted to verify the effectiveness of SPAN, which shows great potential as a scene graph encoder.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# EduceLab-Scrolls:X線CTによるHerculaneum Papyriからのテキストの復元 EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT ( http://arxiv.org/abs/2304.02084v4 ) ライセンス: Link先を確認	Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, W. Brent Seales,	(参考訳) X線CT画像を用いたHerculaneum papyriの隠れテキストを明らかにするための完全なソフトウェアパイプラインを提案する。この拡張された仮想アンラッピングパイプラインは、機械学習と、3D画像と2D画像をリンクする新しい幾何学的フレームワークを組み合わせる。 EduceLab-Scrollsは、この問題に関する20年間の研究成果を表す包括的オープンデータセットである。 EduceLab-Scrollsには、小さな断片と無傷のロールスクロールの両方のボリュームX線CT画像が含まれている。データセットには、インク検出モデルの教師付きトレーニングに使用される2Dイメージラベルも含まれている。ラベリングは、スクロールフラグメントのスペクトル写真と、同じフラグメントのX線CT画像との整列を可能とし、画像空間とモダリティの間の機械学習可能なマッピングを作成する。このアライメントは、X線CTで「見えない」炭素インクを検出するための教師あり学習を可能にする。私たちの知る限り、このデータセットはこの種のデータセットとしては初めてのもので、遺産ドメインでリリースされた最大のデータセットです。本手法は, スクロール断片のテキスト行の正確な行を, 既知の地底真理で明らかにすることができる。検索されたテキストは、視覚的確認、定量的画像メトリクス、学術的レビューを用いて検証される。 EduceLab-ScrollsはHerculaneum papyriの隠れたテキストを初めて発見した。 EduceLab-Scrollsデータセットは、研究が進むにつれて、より多くのテキスト発見が生成されることを期待しています。 We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. 
To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# Few Shot Semantic Segmentation: 方法論,ベンチマーク,オープンな課題のレビュー Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges ( http://arxiv.org/abs/2304.05832v2 ) ライセンス: Link先を確認	Nico Catalano, Matteo Matteucci,	(参考訳) セマンティックセグメンテーションは、自動運転からロボティクスまで幅広い応用に不可欠であるが、大規模な注釈付きデータセットの収集が困難または法外に高価な領域では大きな課題に直面している。医療や農業などの分野では、訓練用画像の不足が進歩を妨げている。 Few-Shot Semantic Segmentationは、少数の例だけで新しいセマンティッククラスをセグメンテーションできるモデルの設計を目指す、コンピュータビジョンの新しいタスクである。本稿は Few-Shot Semantic Segmentationの包括的なサーベイであり、その発展をたどりながら、広く使われている条件付きネットワークやプロトタイプネットワークから、よりニッチな潜在空間最適化手法まで、様々なモデル設計を検討し、近年の基盤モデルがもたらす新たな可能性も紹介する。時系列に沿った記述を通じて、影響力のある傾向と方法論を分析し、その強みと限界について洞察を与える。年表は、分野の進展における重要なマイルストーンを示す視覚的なロードマップとなる。本サーベイは、ベンチマークデータセット上の定量的分析と、先駆的な研究の定性的な紹介によって補完され、読者がこのトピックを深く理解できるようにする。現状の課題、最先端のモデル、今後の展望を明らかにすることで、研究者や実践者がFew-Shot Semantic Segmentationの複雑さを把握できるよう支援し、将来の発展のための基盤を提供する。 Semantic segmentation, vital for applications ranging from autonomous driving to robotics, faces significant challenges in domains where collecting large annotated datasets is difficult or prohibitively expensive. In such contexts, such as medicine and agriculture, the scarcity of training images hampers progress. Few-Shot Semantic Segmentation is a novel task in computer vision that aims at designing models capable of segmenting new semantic classes with only a few examples. This paper consists of a comprehensive survey of Few-Shot Semantic Segmentation, tracing its evolution and exploring various model designs, from the more popular conditional and prototypical networks to the more niche latent space optimization methods, presenting also the new opportunities offered by recent foundational models. Through a chronological narrative, we dissect influential trends and methodologies, providing insights into their strengths and limitations. A temporal timeline offers a visual roadmap, marking key milestones in the field's progression. 
Complemented by quantitative analyses on benchmark datasets and qualitative showcases of seminal works, this survey equips readers with a deep understanding of the topic. By elucidating current challenges, state-of-the-art models, and prospects, we aid researchers and practitioners in navigating the intricacies of Few-Shot Semantic Segmentation and provide ground for future development.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# 倫理的マルチモーダルシステムに向けて Towards ethical multimodal systems ( http://arxiv.org/abs/2304.13765v3 ) ライセンス: Link先を確認	Alexis Roger, Esma Aïmeur, Irina Rish,	(参考訳) 生成AIシステム(ChatGPT、DALL-Eなど)は、アート(Rombach et al. [2021])からメンタルヘルス(Rob Morris and Kareem Kouddous [2022])まで、私たちの生活のさまざまな領域に広がっている。急速に拡大するその社会的影響は新たな機会をもたらす一方で、倫理的な懸念も生じさせている。 AIアライメントという新興分野は、AIシステムに人間の価値観を反映させることを目指している。本稿では,テキストと画像の両方を扱うマルチモーダルAIシステムの倫理性の評価に焦点を当てる。アライメント研究の多くは現在言語モデルを対象としており、この領域は比較的研究が進んでいない。まず,倫理性に関する人間のフィードバックからマルチモーダルな倫理データベースを作成する。次に,このデータベースを用いて,RoBERTa-large分類器や多層パーセプトロンを含む,システム応答の倫理性を自動的に評価するアルゴリズムを開発する。 Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into multiple areas of our lives, from art (Rombach et al. [2021]) to mental health (Rob Morris and Kareem Kouddous [2022]); their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images - a relatively under-explored area, as most alignment work is currently focused on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.	翻訳日:2024-05-22 01:10:43 公開日:2024-05-20
# 多変数トレース不等式によるエンタングルメントのモノガミー Entanglement monogamy via multivariate trace inequalities ( http://arxiv.org/abs/2304.14878v2 ) ライセンス: Link先を確認	Mario Berta, Marco Tomamichel,	(参考訳) エントロピーは量子情報理論における基本的な概念であり、エンタングルメントを定量化し、その性質(例えば多体系におけるモノガミー)を調べることを可能にする。本稿では,多体量子系に対する制限された測定に基づく相対エントロピーの変分公式を導出する。これらを多変数の行列トレース不等式と組み合わせることで、既存の様々なエンタングルメント・モノガミー不等式を再導出し、場合によっては強化する。特に,スクワッシュト・エンタングルメントの忠実性について、一方向の局所操作と古典通信で測定されたエンタングルメントの相対エントロピーと関連付けることで、行列解析に基づく直接的な証明を与える。同様に、相互情報量の条件付きエンタングルメントの忠実性についても、分離可能な測定によるエンタングルメントの相対エントロピーと関連付けることで証明を与える。また,正の部分転置を持つ状態に対する相対エントロピーを用いた場合や、多体系の設定におけるこれらの結果の変形についても議論する。本研究の結果は,情報理論的タスクの漸近的達成可能性に関する操作的議論を用いていた文献中の従来の導出を簡素化し,一般化する。 Entropy is a fundamental concept in quantum information theory that allows one to quantify entanglement and investigate its properties, for example its monogamy over multipartite systems. Here, we derive variational formulas for relative entropies based on restricted measurements of multipartite quantum systems. By combining these with multivariate matrix trace inequalities, we recover and sometimes strengthen various existing entanglement monogamy inequalities. In particular, we give direct, matrix-analysis-based proofs for the faithfulness of squashed entanglement by relating it to the relative entropy of entanglement measured with one-way local operations and classical communication, as well as for the faithfulness of conditional entanglement of mutual information by relating it to the separably measured relative entropy of entanglement. We discuss variations of these results using the relative entropy to states with positive partial transpose, and multipartite setups. Our results simplify and generalize previous derivations in the literature that employed operational arguments about the asymptotic achievability of information-theoretic tasks.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 時間共有計算資源に関する学習可能性 Learnability with Time-Sharing Computational Resource Concerns ( http://arxiv.org/abs/2305.02217v4 ) ライセンス: Link先を確認	Zhi-Hua Zhou,	(参考訳) 従来の理論的機械学習研究は、一般に、十分に、あるいは無限に供給された計算資源が存在することを明示的または暗黙的に仮定する。しかし、実際には、計算リソースは通常限られており、機械学習のパフォーマンスは、受信したデータの数だけでなく、利用可能な計算リソースの処理量にも依存する。現在の 'intelligent supercomputing'' 施設は、学習性能要求や学習プロセス状態などの重要な要因を考慮して、適応的なスケジューリング戦略を使わずに、一定の量のリソースを機械学習タスクに割り当てる排他的オペレーティングシステムのように機能する。本稿では,機械学習のスループットの概念を導入し,計算資源効率学習(CoRE-Learning)を定義し,学習理論における計算資源の影響を考慮した理論的枠組みを提案する。このフレームワークは、入ってくるデータストリームが圧倒的なサイズで無限に終止符を打つことができるようなストリーム学習に自然に適用することができ、受信したすべてのデータを時間内に処理できると仮定するのは現実的ではない。これはまた、インテリジェントなスーパーコンピュータオペレーティングシステムの設計に対する理論的視点を提供するかもしれない。 Conventional theoretical machine learning studies generally assume explicitly or implicitly that there are enough or even infinitely supplied computational resources. In real practice, however, computational resources are usually limited, and the performance of machine learning depends not only on how many data have been received, but also on how many data can be handled subject to computational resources available. Note that most current ``intelligent supercomputing'' facilities work like exclusive operating systems, where a fixed amount of resources are allocated to a machine learning task without adaptive scheduling strategies considering important factors such as the learning performance demands and learning process status. In this article, we introduce the notion of machine learning throughput, define Computational Resource Efficient Learning (CoRE-Learning), and present a theoretical framework that takes into account the influence of computational resources in learning theory. This framework can be naturally applied to stream learning where the incoming data streams can be potentially endless with overwhelming size and it is impractical to assume that all received data can be handled in time. 
It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 最適化アルゴリズム、リアプノフ関数、微分方程式の関係について:理論と洞察 On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights ( http://arxiv.org/abs/2305.08658v3 ) ライセンス: Link先を確認	Paul Dobson, Jesus Maria Sanz-Serna, Konstantinos Zygalakis,	(参考訳) Fazlyabら(SIAM J. Optim. 28, 2018)が導入した、離散時間および連続時間の最適化アルゴリズムに対するリアプノフ関数を構成するための一般的な枠組みを再検討する。滑らかで強凸な目的関数に対して、そのような構成に必要な要件を緩和する。その結果、Polyakの常微分方程式と、2パラメータのNesterovアルゴリズム族について、文献で知られているものを改善する収束率を証明できる。 NesterovアルゴリズムをPolyak方程式の離散化とみなす解釈を分析する。これらのアルゴリズムが加法的ルンゲ・クッタ積分法の一例であることを示し、微分方程式の離散化の大半が加速を伴う最適化アルゴリズムにならない理由を議論する。また、Polyak方程式の修正を導入し、その収束特性を調べる。最後に、この一般的な枠組みを確率的な設定に拡張し、過剰パラメータ化モデルに対する加速付きランダムアルゴリズムへの応用を考察する。ここでも、文献のものを改善する収束率を証明できる。 We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result we are able to prove for Polyak's ordinary differential equations and for a two-parameter family of Nesterov algorithms rates of convergence that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again we are able to prove convergence rates that improve on those in the literature.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 実環境におけるディープフェイクテキストの検出 Deepfake Text Detection in the Wild ( http://arxiv.org/abs/2305.13242v2 ) ライセンス: Link先を確認	Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang,	(参考訳) 大規模言語モデル(LLM)は人間レベルのテキスト生成を達成しており、フェイクニュースの拡散や盗用などのリスクを軽減するために、AI生成テキストを効果的に検出する必要性が高まっている。既存の研究は、特定のドメインや特定の言語モデルでの検出手法の評価にとどまっている。しかし実際のシナリオでは、検出器は出所を知らないまま、様々なドメインやLLMに由来するテキストに直面する。そこで本研究では,多様な人間の文章と、様々なLLMが生成したテキストを収集し,包括的なテストベッドを構築する。実験結果は、様々なシナリオ、特に分布外の設定において、機械生成テキストと人間が書いたテキストを区別することが難しいことを示している。この難しさは、2つの情報源の間の言語的な差異が小さくなっていることに起因する。こうした課題にもかかわらず、最も性能の高い検出器は、新しいLLMが生成したドメイン外テキストの86.54%を識別でき、実応用の実現可能性を示している。リソースは https://github.com/yafuly/MAGE で公開している。 Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# MGL2Rank:マルチグラフ融合に基づく道路ネットワークにおけるノード重要度のランク学習 MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion ( http://arxiv.org/abs/2305.14375v3 ) ライセンス: Link先を確認	Ming Xu, Jing Zhang,	(参考訳) 道路ネットワークにおいて伝播能力の強い重要なノードを特定することは、都市計画における重要な課題である。交通ネットワークにおけるノードの重要度を評価する既存の手法は、トポロジ情報と交通量のみを考慮しており、車線数や道路セグメントの平均速度といった道路ネットワークの交通特性の多様性を無視しているため、性能が制限される。この問題を解決するために,道路ネットワークの豊富な特性を統合してノードの重要度をランク付けするグラフ学習ベースのフレームワーク(MGL2Rank)を提案する。本フレームワークは、サンプリングアルゴリズム(MGWalk)とエンコーダネットワークを含む埋め込みモジュールを備え、各道路セグメントの潜在表現を学習する。 MGWalkは、マルチグラフ融合を利用して道路ネットワークのトポロジを捉え、属性に基づいて道路セグメント間の関連を確立する。得られたノード表現は、道路セグメントの重要度ランキングの学習に用いられる。最後に,瀋陽市の地域道路網に基づいてランク付けタスク用の合成データセットを構築し,このデータセットでのランキング結果から本手法の有効性を示す。 MGL2Rankのデータとソースコードは https://github.com/iCityLab/MGL2Rank で入手できる。 The identification of important nodes with strong propagation capabilities in road networks is a vital topic in urban planning. Existing methods for evaluating the importance of nodes in traffic networks only consider topological information and traffic volumes; the diversity of the traffic characteristics in road networks, such as the number of lanes and average speed of road segments, is ignored, thus limiting their performance. To solve this problem, we propose a graph learning-based framework (MGL2Rank) that integrates the rich characteristics of road networks to rank the importance of nodes. This framework comprises an embedding module containing a sampling algorithm (MGWalk) and an encoder network to learn the latent representations for each road segment. MGWalk utilizes multigraph fusion to capture the topology of road networks and establish associations between road segments based on their attributes. The obtained node representation is then used to learn the importance ranking of the road segments. 
Finally, a synthetic dataset is constructed for ranking tasks based on the regional road network of Shenyang City, and the ranking results on this dataset demonstrate the effectiveness of our method. The data and source code for MGL2Rank are available at https://github.com/iCityLab/MGL2Rank.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 確信をもって生成する:ブラックボックス大規模言語モデルの不確実性定量化 Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models ( http://arxiv.org/abs/2305.19187v3 ) ライセンス: Link先を確認	Zhen Lin, Shubhendu Trivedi, Jimeng Sun,	(参考訳) 自然言語生成(NLG)に特化した大規模言語モデル(LLM)は、最近、様々な領域で有望な能力を示し始めている。しかし、NLGの不確実性定量化(UQ)に関する研究は限られており、LLMが生成する応答の信頼性を測ることは未解決の課題である。さらに、既存の文献は通常、言語モデルへのホワイトボックスアクセスを前提としているが、最新のLLMのクローズドソース性や計算上の制約により、この前提は非現実的になりつつある。本研究では,NLGにおけるブラックボックスLLMのUQについて検討する。まず不確実性と確信度を区別する。前者は固定された入力に対する潜在的な予測の「ばらつき(dispersion)」を指し、後者は特定の予測/生成に対する確信度を指す。次に,いくつかの確信度/不確実性の尺度を提案して比較し、信頼できない結果を無視するか、さらなる評価に回すことができる選択的NLGに適用する。質問応答データセット上で(評価のため)、いくつかの一般的なLLMを用いて実験を行った。その結果,意味的なばらつきの単純な尺度がLLMの応答品質の信頼できる予測因子となりうることが明らかになり、LLMを採用する際の不確実性管理について実践者に有用な知見を与える。実験を再現するコードは https://github.com/zlin7/UQ-NLG で公開されている。 Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities across a variety of domains. However, gauging the trustworthiness of responses generated by LLMs remains an open challenge, with limited research on uncertainty quantification (UQ) for NLG. Furthermore, existing literature typically assumes white-box access to language models, which is becoming unrealistic either due to the closed-source nature of the latest LLMs or computational constraints. In this work, we investigate UQ in NLG for black-box LLMs. We first differentiate uncertainty vs confidence: the former refers to the "dispersion" of the potential predictions for a fixed input, and the latter refers to the confidence on a particular prediction/generation. We then propose and compare several confidence/uncertainty measures, applying them to selective NLG where unreliable results could either be ignored or yielded for further assessment. Experiments were carried out with several popular LLMs on question-answering datasets (for evaluation purposes). 
Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses, providing valuable insights for practitioners on uncertainty management when adopting LLMs. The code to replicate our experiments is available at https://github.com/zlin7/UQ-NLG.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 規則的なグリッドを超えて: 任意領域上のフーリエベースのニューラル演算子 Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains ( http://arxiv.org/abs/2305.19663v4 ) ライセンス: Link先を確認	Levi Lingsch, Mike Y. Michelis, Emmanuel de Bezenac, Sirani M. Perera, Robert K. Katzschmann, Siddhartha Mishra,	(参考訳) PDEの解の学習に広く用いられている多くのニューラル演算子の計算効率は、スペクトル計算を行うための高速フーリエ変換(FFT)に依存している。 FFTは等間隔(矩形)グリッドに限定されるため、入力関数と出力関数を一般の非等間隔な点分布上で処理する必要がある問題では、そのようなニューラル演算子の効率が制限される。ニューラル演算子に必要な表現力を得るには限られた数のフーリエ(スペクトル)モードで十分であるという観察を活用し、基礎となるスペクトル変換の効率的な直接評価に基づいて、ニューラル演算子を任意の領域に拡張する簡単な手法を提案する。このような直接スペクトル評価の効率的な実装を既存のニューラル演算子モデルと組み合わせることで、任意の非等間隔な点分布上のデータを処理できるようにする。広範な実験的評価により,提案手法は、フーリエニューラル演算子(FNO)および関連するニューラル演算子の精度を維持または向上させながら、ベースラインよりも大幅に高速な訓練で、ニューラル演算子を任意の点分布に拡張できることを示す。 The computational efficiency of many neural operators, widely used for learning solutions of PDEs, relies on the fast Fourier transform (FFT) for performing spectral computations. As the FFT is limited to equispaced (rectangular) grids, this limits the efficiency of such neural operators when applied to problems where the input and output functions need to be processed on general non-equispaced point distributions. Leveraging the observation that a limited set of Fourier (spectral) modes suffices to provide the required expressivity of a neural operator, we propose a simple method, based on the efficient direct evaluation of the underlying spectral transformation, to extend neural operators to arbitrary domains. An efficient implementation of such direct spectral evaluations is coupled with existing neural operator models to allow the processing of data on arbitrary non-equispaced distributions of points. With extensive empirical evaluation, we demonstrate that the proposed method allows us to extend neural operators to arbitrary point distributions with significant gains in training speed over baselines while retaining or improving the accuracy of Fourier neural operators (FNOs) and related neural operators.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# 量子測定のための物理的ノイズモデル A physical noise model for quantum measurements ( http://arxiv.org/abs/2305.19766v3 ) ライセンス: Link先を確認	Faedi Loulidi, Ion Nechita, Clément Pellegrini,	(参考訳) 本稿では,準備に不備のある間接測定方式に着想を得た、量子測定のための新しいノイズモデルを導入する。量子系とプローブの間の相互作用を支配するランダムなダイナミクスについて平均をとることで、自然で物理的なノイズモデルが現れる。非両立性ロバストネスの枠組みで、既存のノイズモデル(一様ノイズと脱分極ノイズ)と比較する。特定のクラスの測定に対して,本モデルではより大きな両立領域が得られることを確認した。 In this paper we introduce a novel noise model for quantum measurements motivated by an indirect measurement scheme with faulty preparation. Averaging over random dynamics governing the interaction between the quantum system and a probe, a natural, physical noise model emerges. We compare it to existing noise models (uniform and depolarizing) in the framework of incompatibility robustness. We observe that our model allows for larger compatibility regions for specific classes of measurements.	翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
# マルチバススピン-ボソンモデルにおけるエンタングルメントの増強 Enhanced entanglement in multi-bath spin-boson models ( http://arxiv.org/abs/2306.11036v3 ) ライセンス: Link先を確認	Charlie R. Hogg, Federico Cerisola, James D. Cresser, Simon A. R. Horsley, Janet Anders,	(参考訳) スピン-ボソンモデルは通常、単一のボソン浴に結合したスピンを考える。しかし、物理的な状況によっては、スピンを複数の環境に結合させる必要がある。例えば、3次元磁性材料中でフォノンと相互作用するスピンがそうである。ここでは、3つの独立した浴に等方的に結合したスピンを考える。複数の浴との結合により、絶対零度においてスピンと環境の間のエンタングルメントが著しく増大しうることを示す。その結果、平均力平衡状態におけるスピンの期待値が減少する。対照的に、古典的な3浴スピンの平衡状態は、環境との結合に全く依存しないことがわかる。これらの結果は、複数浴との結合から生じうる純粋に量子的な効果を明らかにしており、磁性材料など幅広い場面に応用できる可能性がある。 The spin-boson model usually considers a spin coupled to a single bosonic bath. However, some physical situations require coupling of the spin to multiple environments. For example, spins interact with phonons in three-dimensional magnetic materials. Here, we consider a spin coupled isotropically to three independent baths. We show that coupling to multiple baths can significantly increase entanglement between the spin and its environment at zero temperature. The effect of this is to reduce the spin's expectation values in the mean force equilibrium state. In contrast, the classical three-bath spin equilibrium state turns out to be entirely independent of the environmental coupling. These results reveal purely quantum effects that can arise from multi-bath couplings, with potential applications in a wide range of settings, such as magnetic materials.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
# Statler: 身体的推論のための状態維持型言語モデル Statler: State-Maintaining Language Models for Embodied Reasoning ( http://arxiv.org/abs/2306.17840v4 ) ライセンス: Link先を確認	Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter,	(参考訳) 知的ロボットに複雑な推論能力を与えるために大規模言語モデルを利用することに、大きな研究上の関心が寄せられている。既存の研究は、行動と観察の履歴について推論する能力の活用に焦点を当てている。本稿では,大規模言語モデルがロボットの計画立案に役立ちうる新たな側面を探る。具体的には,しばしば観測不可能な世界状態の推定値を維持し、新たな行動がとられるたびにその遷移を追跡するよう大規模言語モデルに指示するフレームワーク、Statlerを提案する。本フレームワークは、各行動を現在の世界状態の推定値に条件付ける。概念的には単純であるにもかかわらず、Statlerはいくつかのロボット計画タスクにおいて、強力な競合手法(例えばCode-as-Policies)を大きく上回る。さらに、より困難な長期計画タスクへスケールアップできる可能性もある。 There has been a significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and track its transition as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
# FlakyFix: 大規模な言語モデルを使用して、フレキシブルなテスト修正カテゴリとテストコード修正を予測する FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair ( http://arxiv.org/abs/2307.00012v3 ) ライセンス: Link先を確認	Sakina Fatima, Hadi Hemmati, Lionel Briand,	(参考訳) 不安定なテストは、非決定的に同じソフトウェアバージョンをテスト中にパスまたは失敗し、混乱と開発労力の浪費を引き起こすため、問題となる。機械学習モデルは、フレキネスとその根本原因を予測するために使われてきたが、問題を修正するためのサポートを提供する作業は、はるかに少ない。このギャップに対処するために、本稿では、フレキネスを除去し、そのベースでテストコードを修正するために必要な修正の種類を予測することに焦点を当てる。これは、フレキネスの根本原因がテストケース自身にあり、本番コードにはない、不安定なテストケースのサブセットに対して行います。私たちのキーとなるアイデアは、予測された修正カテゴリの形で、テストのフレキネスに関するさらなる知識で、修復プロセスのガイドを行うことです。そこで我々はまず,13の修正カテゴリのラベル付きデータセットを自動的に生成するフレームワークを提案し,テストコードのみを解析することにより,フレークテストの修正カテゴリを予測するモデルを訓練する。コードモデルと数ショット学習を用いた実験結果から,修正カテゴリのほとんどを正確に予測できることが判明した。欠陥を自動的に修復するための固定カテゴリラベルの有用性を示すため,GPTのような大規模言語モデル(LLM)を改良し,補修提案をLLMに依頼する。提案する修正カテゴリラベルは文脈内学習を補完し, GPT 3.5 Turbo のフレークテストにおける修正能力を大幅に向上させることが示唆された。本研究は, GPT修復フラキ試験のサンプルの実施と解析に基づいて, 修復率の70%から90%が通過することが期待されると推定した。修復されたテストが失敗した場合、平均してテストコードの16%は、通過するためにさらに変更する必要がある。 Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test, causing confusion and wasting development effort. While machine learning models have been used to predict flakiness and its root causes, there is much less work on providing support to fix the problem. To address this gap, in this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis. We do this for a subset of flaky test cases where the root cause of flakiness is in the test case itself and not in the production code. Our key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category. Thus, we first propose a framework that automatically generates labeled datasets for 13 fix categories and trains models to predict the fix category of a flaky test by analyzing the test code only. 
Our experimental results using code models and few-shot learning show that we can correctly predict most of the fix categories. To show the usefulness of such fix category labels for automatically repairing flakiness, in addition to informing testers, we augment a Large Language Model (LLM) like GPT with such extra knowledge to ask the LLM for repair suggestions. The results show that our suggested fix category labels, complemented with in-context learning, significantly enhance the capability of GPT 3.5 Turbo in generating fixes for flaky tests. Based on the execution and analysis of a sample of GPT-repaired flaky tests, we estimate that a large percentage of such repairs, (roughly between 70% and 90%) can be expected to pass. For the failing repaired tests, on average, 16% of the test code needs to be further changed for them to pass.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
# 正規計画の下でのロジスティック回帰におけるパラメータ推定のサンプル複雑度について On the sample complexity of parameter estimation in logistic regression with normal design ( http://arxiv.org/abs/2307.04191v3 ) ライセンス: Link先を確認	Daniel Hsu, Arya Mazumdar,	(参考訳) ロジスティック回帰モデルは、ノイズのある二値分類問題において最も一般的なデータ生成モデルの一つである。本研究では,標準正規分布に従う共変量の下で、ロジスティック回帰モデルのパラメータを所与の$\ell_2$誤差以内で推定するためのサンプル複雑度を、次元と逆温度の観点から調べる。逆温度は、データ生成過程の信号対雑音比を制御する。ロジスティック回帰に対する最尤推定量の汎化境界と漸近性能はよく研究されているが、パラメータ推定について誤差と逆温度への依存性を示す非漸近的なサンプル複雑度は、これまでの解析には欠けていた。サンプル複雑度曲線は逆温度に関して2つの変化点を持ち、低温・中温・高温の各領域を明確に分けることを示す。 The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
# My3DGen: スケーラブルなパーソナライズされた3D生成モデル My3DGen: A Scalable Personalized 3D Generative Model ( http://arxiv.org/abs/2307.05468v4 ) ライセンス: Link先を確認	Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta,	(参考訳) 近年,フォトリアリスティックな顔の合成という課題に取り組むために、生成的な3D顔モデル(例えばEG3D)が開発されている。しかし、これらのモデルは各個人に固有の顔の特徴を捉えられないことが多く、パーソナライゼーションの重要性が浮き彫りになっている。いくつかの先行研究は生成的顔モデルのパーソナライゼーションに有望な結果を示しているが、主に2Dの設定に焦点を当てている。また、これらの手法はユーザごとに多数のパラメータの微調整と保存を必要とし、スケーラブルなパーソナライゼーションの妨げとなる。もう一つの課題は、個人ごとに利用できる訓練画像の数が限られていることであり、完全な微調整を用いると過学習につながりやすい。提案手法My3DGenは,わずか50枚の訓練画像から個人のパーソナライズされた3D事前分布を生成する。 My3DGenは、元の人物の同一性を保ちながら、新規視点合成、与えられた顔の意味的編集(例えば笑顔の追加)、新しい外観の合成を可能にする。事前学習済みのEG3Dを凍結し、低ランク分解によって追加のパーソナライズ用重みを訓練することで、3D顔特徴をグローバルな特徴とパーソナライズされた特徴に分離する。その結果、My3DGenが導入するパーソナライズ用パラメータは個人あたりわずか$\textbf{240K}$であり、パラメータ空間全体の微調整に必要な$\textbf{30.6M}$と比べて、訓練可能なパラメータを$\textbf{127}\times$削減できる。ストレージをこのように大幅に削減しても、本モデルは下流アプリケーションの品質を損なうことなく同一性の特徴を保持する。 In recent years, generative 3D face models (e.g., EG3D) have been developed to tackle the problem of synthesizing photo-realistic faces. However, these models are often unable to capture facial features unique to each individual, highlighting the importance of personalization. Some prior works have shown promise in personalizing generative face models, but these studies primarily focus on 2D settings. Also, these methods require both fine-tuning and storing a large number of parameters for each user, posing a hindrance to achieving scalable personalization. Another challenge of personalization is the limited number of training images available for each individual, which often leads to overfitting when using full fine-tuning methods. Our proposed approach, My3DGen, generates a personalized 3D prior of an individual using as few as 50 training images. My3DGen allows for novel view synthesis, semantic editing of a given face (e.g. adding a smile), and synthesizing novel appearances, all while preserving the original person's identity. 
We decouple the 3D facial features into global features and personalized features by freezing the pre-trained EG3D and training additional personalized weights through low-rank decomposition. As a result, My3DGen introduces only $\textbf{240K}$ personalized parameters per individual, leading to a $\textbf{127}\times$ reduction in trainable parameters compared to the $\textbf{30.6M}$ required for fine-tuning the entire parameter space. Despite this significant reduction in storage, our model preserves identity features without compromising the quality of downstream applications.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
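The parameter figures quoted in the My3DGen abstract can be checked with a few lines of arithmetic. The sketch below also shows why a low-rank update is small compared with the dense matrix it adapts. The helper `low_rank_params` and the 512 x 512, rank-4 layer are illustrative assumptions, not values from the paper.

```python
# Illustrative arithmetic only; the layer shape below is hypothetical.

def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Extra weights when a frozen (d_out x d_in) matrix receives an update B @ A,
    with B of shape (d_out x rank) and A of shape (rank x d_in)."""
    return rank * (d_out + d_in)

# Figures quoted in the abstract.
full_finetune = 30.6e6   # trainable parameters when fine-tuning everything
personalized = 240e3     # personalized parameters per individual

reduction = full_finetune / personalized
print(f"reduction factor: {reduction:.1f}x")  # consistent with the quoted ~127x

# Hypothetical layer, chosen only to show how the saving scales with rank.
d_out, d_in, rank = 512, 512, 4
dense = d_out * d_in
extra = low_rank_params(d_out, d_in, rank)
print(f"dense: {dense}, low-rank: {extra}, ratio: {dense // extra}x")
```

The ratio for a single layer depends on its shape and the chosen rank, so the per-layer figure here is not expected to match the model-wide figure in the abstract.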
# 境界駆動型ダブルスピンチェーンと資源効率のよいリモートエンタングルメント安定化の具体的結果 Exact Results for a Boundary-Driven Double Spin Chain and Resource-Efficient Remote Entanglement Stabilization ( http://arxiv.org/abs/2307.09482v2 ) ライセンス: Link先を確認	Andrew Lingenfelter, Mingxing Yao, Andrew Pocklington, Yu-Xin Wang, Abdullah Irfan, Wolfgang Pfaff, Aashish A. Clerk,	(参考訳) 2つの$XX$結合された$N$-qubitスピンチェイン(おそらくは非一様結合)が境界 Rabi ドライブおよび導波路(双方向または一方向)によって生じる共通境界損失を受けるようなセットアップの定常状態に対する正確な解を導出する。幅広いパラメータに対して、このシステムは純粋に絡み合った定常状態を持ち、圧縮光を使わずに遠隔マルチキュービットの絡み合いを安定化する手段を提供する。我々の解はまた、相互作用するフェルミオンモデルに写像する1つの境界駆動散逸$XX$スピンチェインに関する洞察を与える。非平衡定常状態は、動的に拘束されたホッピングから生じる穴の励起の創発的なペアリングを含む驚くべき相関効果を示す。我々のシステムは、回路QEDを含む多くの実験プラットフォームで実装できる。 We derive an exact solution for the steady state of a setup where two $XX$-coupled $N$-qubit spin chains (with possibly non-uniform couplings) are subject to boundary Rabi drives, and common boundary loss generated by a waveguide (either bidirectional or unidirectional). For a wide range of parameters, this system has a pure entangled steady state, providing a means for stabilizing remote multi-qubit entanglement without the use of squeezed light. Our solution also provides insights into a single boundary-driven dissipative $XX$ spin chain that maps to an interacting fermionic model. The non-equilibrium steady state exhibits surprising correlation effects, including an emergent pairing of hole excitations that arises from dynamically constrained hopping. Our system could be implemented in a number of experimental platforms, including circuit QED.	翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
# LSTM, BiLSTM, CNN, GRU, GloVeを用いた癌遺伝子変異分類のためのハイブリッド機械学習モデル A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe ( http://arxiv.org/abs/2307.14361v3 ) ライセンス: Link先を確認	Sanad Aburass, Osama Dorgham, Jamil Al Shaqsi,	(参考訳) 本研究では,LSTM,BiLSTM,CNN,GRU,GloVeを併用した新しいハイブリッドアンサンブルモデルを提案する。このモデルは、Kaggle氏のPersonalized Medicine: Redefining Cancer Treatmentデータセットを使用して厳格にテストされ、すべての評価指標で例外的なパフォーマンスを示しました。特に,トレーニング精度80.6%,精度81.6%,リコール80.6%,F1スコア83.1%,Mean Squared Error(MSE)2.596。これらの結果は高度なトランスフォーマーモデルとそのアンサンブルを上回り、遺伝子変異分類の複雑さを扱う上で、我々のモデルの優れた能力を示している。遺伝子変異分類の精度と効率は、個々の遺伝子プロファイルに基づく調整された治療計画が患者の結果を劇的に改善し、命を救うことができる精密医療の時代において最重要である。本モデルでは, がん診断と治療の精度を高める可能性を強調し, パーソナライズされた医療の進歩に大きく貢献する。 In our study, we introduce a novel hybrid ensemble model that synergistically combines LSTM, BiLSTM, CNN, GRU, and GloVe embeddings for the classification of gene mutations in cancer. This model was rigorously tested using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset, demonstrating exceptional performance across all evaluation metrics. Notably, our approach achieved a training accuracy of 80.6%, precision of 81.6%, recall of 80.6%, and an F1 score of 83.1%, alongside a significantly reduced Mean Squared Error (MSE) of 2.596. These results surpass those of advanced transformer models and their ensembles, showcasing our model's superior capability in handling the complexities of gene mutation classification. The accuracy and efficiency of gene mutation classification are paramount in the era of precision medicine, where tailored treatment plans based on individual genetic profiles can dramatically improve patient outcomes and save lives. Our model's remarkable performance highlights its potential in enhancing the precision of cancer diagnoses and treatments, thereby contributing significantly to the advancement of personalized healthcare.	翻訳日:2024-05-22 00:40:21 公開日:2024-05-20
# ログベース異常検出のための機械学習手法に関する総合的研究 A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection ( http://arxiv.org/abs/2307.16714v2 ) ライセンス: Link先を確認	Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand,	(参考訳) システム複雑性の増大により、ログベースの異常検出(LAD)など、さまざまなログ分析タスクに特化した自動化技術の必要性が高まっている。後者は文学で広く取り上げられており、主に様々な深層学習技術によって研究されている。ディープラーニング技術には多くの利点があるが、従来の機械学習(ML)技術は多くのケースにおいて、コンテキストやデータセットによってうまく機能する可能性があるため、ある程度は任意である。半監督的技法は、前者が明確な実践上の優位性を持っているため、半監督的技法と同一の注意を払っている。さらに、現在の評価は検出精度の評価に大きく依存している。しかし、特定のMLテクニックが与えられたコンテキストにおけるLAD問題に対処するのに適したかどうかを決定するのに十分ではない。その他の考慮すべき側面としては、トレーニングや予測時間、ハイパーパラメータチューニングに対する感度などがあります。本稿では,教師付き,半教師付き,従来型,深層ML技術の4つの評価基準として,検出精度,時間性能,検出精度の感度,ハイパーパラメータチューニングに対する時間性能の4つの評価基準を提案する。実験結果から,従来のML手法と深部ML手法は,検出精度と予測時間に類似していることがわかった。さらに、総合的に、ハイパーパラメータチューニングw.r.t.検出精度に対する感度解析は、教師付き従来のML技術がディープラーニング技術よりも感度が低いことを示している。さらに、半教師技術は教師技術よりも検出精度が著しく低い。 Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. Despite their many advantages, that focus on deep learning techniques is somewhat arbitrary as traditional Machine Learning (ML) techniques may perform well in many cases, depending on the context and datasets. In the same vein, semi-supervised techniques deserve the same attention as supervised techniques since the former have clear practical advantages. Further, current evaluations mostly rely on the assessment of detection accuracy. However, this is not enough to decide whether or not a specific ML technique is suitable to address the LAD problem in a given context. Other aspects to consider include training and prediction times as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. 
In this paper, we present a comprehensive empirical study in which we evaluate supervised and semi-supervised, traditional and deep ML techniques with respect to four evaluation criteria: detection accuracy, time performance, and the sensitivity of both detection accuracy and time performance to hyperparameter tuning. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of detection accuracy and prediction time. The sensitivity analysis of detection accuracy to hyperparameter tuning shows that, overall, supervised traditional ML techniques are less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.	翻訳日:2024-05-22 00:40:21 公開日:2024-05-20
# 半無限導波路と結合した原子に基づく量子コヒーレント及び測定フィードバック制御 Quantum coherent and measurement feedback control based on atoms coupled with a semi-infinite waveguide ( http://arxiv.org/abs/2307.16876v2 ) ライセンス: Link先を確認	Haijin Ding, Nina H. Amini, Guofeng Zhang, John E. Gough,	(参考訳) 本稿では,複数の2レベル原子を結合した半無限導波路に基づく原子・フォトニック系の所望の状態を生成するために,量子フィードバック制御が適用可能であることを示す。このセットアップでは、初期励起原子が導波路に1つの光子を放出し、終端ミラーや他の原子によって反射され、原子と光子のコヒーレント相互作用を介して異なるフィードバックループを確立することができる。導波管量子電磁力学(導波管QED)系に少なくとも2つの励起が存在する場合、量子状態の進化はランダムグラフ理論を用いて解釈できる。このプロセスは環境の影響を受けながら,計測に基づくフィードバック制御やコヒーレントドライブによって環境誘起のダイナミクスを排除できることを明らかにする。したがって、オープン系原子-導波路相互作用において、測定に基づくフィードバックは最終的な定常量子状態を変調することができ、同時に、測定プロセスにおけるホモダイン検出ノイズは振動を誘発し、コヒーレントなフィードバック設計によって処理される。 In this paper, we show that quantum feedback control may be applied to generate desired states for atomic and photonic systems based on a semi-infinite waveguide coupled with multiple two-level atoms. In this set-up, an initially excited atom can emit one photon into the waveguide, which can be reflected by the terminal mirror or other atoms to establish different feedback loops via the coherent interactions between the atom and photon. When there are at most two excitations in the waveguide quantum electrodynamics (waveguide QED) system, the evolution of quantum states can be interpreted using random graph theory. While this process is influenced by the environment, and we clarify that the environment-induced dynamics can be eliminated by measurement-based feedback control or coherent drives. Thus, in the open system atom-waveguide interactions, measurement-based feedback can modulate the final steady quantum state, while simultaneously, the homodyne detection noise in the measurement process can induce oscillations, which is treated by the coherent feedback designs.	翻訳日:2024-05-22 00:40:21 公開日:2024-05-20
# 連続対称性を持つ新しい畳み込みニューラルネットワークアーキテクチャ A Novel Convolutional Neural Network Architecture with a Continuous Symmetry ( http://arxiv.org/abs/2308.01621v4 ) ライセンス: Link先を確認	Yao Liu, Hang Shao, Bing Bai,	(参考訳) 本稿では、準線形双曲系と呼ばれる偏微分方程式(PDE)のクラスに着想を得た新しい畳み込みニューラルネットワーク(ConvNet)アーキテクチャを提案する。画像分類タスクで同等の性能を持つので、連続した対称性の群を通して重みを修正できる。これは、アーキテクチャと重みが本質的に固定された従来のモデルから大きく変わります。我々は、ニューラルネットワークの新たな望ましい特性として(内部)対称性を推進し、より広範なDeep LearningコミュニティにおけるConvNetの分析と解釈において、PDEの視点に注意を向けたい。 This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. With comparable performance on the image classification task, it allows for the modification of the weights via a continuous group of symmetry. This is a significant shift from traditional models where the architecture and weights are essentially fixed. We wish to promote the (internal) symmetry as a new desirable property for a neural network, and to draw attention to the PDE perspective in analyzing and interpreting ConvNets in the broader Deep Learning community.	翻訳日:2024-05-22 00:40:21 公開日:2024-05-20
# シーン認識のための意味埋め込み型類似性プロトタイプ Semantic-embedded Similarity Prototype for Scene Recognition ( http://arxiv.org/abs/2308.05896v3 ) ライセンス: Link先を確認	Chuanxin Song, Hanbo Wu, Xin Ma, Yibin Li,	(参考訳) 複雑な構成によって生じるクラス間類似度の高さと、シーン間の共存オブジェクトにより、多くの研究がシーン認識を改善するためにシーン内のオブジェクトの意味知識を探索してきた。しかし、オブジェクト情報抽出技術では計算コストが重いため、ネットワークの負担が大きくなるため、結果として課題が生じる。この制限は、実際のデプロイにおいて、エッジデバイスと互換性のないオブジェクトアシストアプローチをしばしば引き起こす。対照的に,本研究では,シーン認識ネットワークが実際の計算コストを増大させることなく,より優れた精度を実現するための,意味的知識に基づく類似性プロトタイプを提案する。シンプルで、既存のパイプラインにプラグイン&プレイできる。より具体的には、シーンのセマンティックな知識をクラスレベルのセマンティックな表現として表現するための統計戦略が導入された。これらの表現はシーンクラス間の相関を探索するために使用され、最終的には類似したプロトタイプを構築する。さらに,この類似性を生かして,グラディエントラベルソフトニングとバッチレベルのコントラストロスの観点から,ネットワークトレーニングを支援することを提案する。複数のベンチマークの総合的な評価は、我々の類似性プロトタイプが既存のネットワークの性能を向上させる一方で、実際の展開において計算負荷を余分に回避していることを示している。コードと統計的類似性プロトタイプはhttps://github.com/ChuanxinSong/SimilarityPrototypeで公開される。 Due to the high inter-class similarity caused by the complex composition and the co-existing objects across scenes, numerous studies have explored object semantic knowledge within scenes to improve scene recognition. However, a resulting challenge emerges as object information extraction techniques require heavy computational costs, thereby burdening the network considerably. This limitation often renders object-assisted approaches incompatible with edge devices in practical deployment. In contrast, this paper proposes a semantic knowledge-based similarity prototype, which can help the scene recognition network achieve superior accuracy without increasing the computational cost in practice. It is simple and can be plug-and-played into existing pipelines. More specifically, a statistical strategy is introduced to depict semantic knowledge in scenes as class-level semantic representations. These representations are used to explore correlations between scene classes, ultimately constructing a similarity prototype. 
Furthermore, we propose to leverage the similarity prototype to support network training in two ways: Gradient Label Softening and a Batch-level Contrastive Loss. Comprehensive evaluations on multiple benchmarks show that our similarity prototype enhances the performance of existing networks while avoiding any additional computational burden in practical deployments. Code and the statistical similarity prototype will be available at https://github.com/ChuanxinSong/SimilarityPrototype	翻訳日:2024-05-22 00:40:21 公開日:2024-05-20
# BEVTrack:鳥瞰図における3D単一物体追跡のためのシンプルで強力なベースライン BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View ( http://arxiv.org/abs/2309.02185v5 ) ライセンス: Link先を確認	Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha,	(参考訳) 3Dシングルオブジェクトトラッキング(SOT)はコンピュータビジョンの基本課題であり、自律運転のようなアプリケーションに不可欠である。外観の変化、ディストラクタ、点群の高いスパース性により、ターゲットを周囲から特定することは依然として困難である。これらの問題に対処するために、従来のSiamese型トラッカーとモーション中心のトラッカーはいずれも、精巧な設計と複数のサブタスクの解決を必要とする。本稿では,単純で効果的なベースライン手法であるBEVTrackを提案する。 Bird's-Eye View (BEV) で目標運動を推定して追跡を行うことで、BEVTrackは、ネットワーク設計、トレーニング目標、トラッキングパイプラインといった様々な側面から驚くほどの単純さを示しながら、優れたパフォーマンスを実現している。さらに、様々な属性(例えば、サイズ、動きパターン)を持つ対象に対する正確な回帰を達成するために、BEVTrackは、以前の研究のように固定されたラプラシアンあるいはガウス的仮定を置くのではなく、学習した基礎分布を異なる目標に適合させる尤度関数を構築する。これにより、トラッキングのための貴重な事前情報が提供され、パフォーマンスがさらに向上する。単純な畳み込みアーキテクチャで単一の回帰損失のみを使用する一方で、BEVTrackは3つの大規模データセット(KITTI、NuScenes、Waymo Open Dataset)で最先端のパフォーマンスを実現し、推論速度は約200FPSを維持している。コードはhttps://github.com/xmm-prio/BEVTrack でリリースされる。 3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. It remains challenging to localize the target from surroundings due to appearance variations, distractors, and the high sparsity of point clouds. To address these issues, prior Siamese and motion-centric trackers both require elaborate designs and solving multiple subtasks. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance. 
In addition, to achieve accurate regression for targets with diverse attributes (e.g., sizes and motion patterns), BEVTrack constructs the likelihood function with the learned underlying distributions adapted to different targets, rather than making a fixed Laplacian or Gaussian assumption as in previous works. This provides valuable priors for tracking and thus further boosts performance. While using only a single regression loss with a plain convolutional architecture, BEVTrack achieves state-of-the-art performance on three large-scale datasets (KITTI, NuScenes, and Waymo Open Dataset) while maintaining a high inference speed of about 200 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# あなたのコードの秘密は私のもの:ニューラルコード補完ツールはハードコードされた認証情報を記憶しうる Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-Coded Credentials ( http://arxiv.org/abs/2309.07639v2 ) ライセンス: Link先を確認	Yizhan Huang, Yichen Li, Weibin Wu, Jianping Zhang, Michael R. Lyu,	(参考訳) ニューラルコード補完ツール(NCCT)は、言語モデリング技術に基づいて構築され、文脈に関連のあるコードスニペットを正確に提案でき、ソフトウェア工学の分野を再構築した。しかし、言語モデルは適切なプロンプトで推論中にトレーニングデータをそのまま出力することがある。この記憶特性は、ハードコードされた認証情報の漏洩に関するNCCTのプライバシー上の懸念を高め、アプリケーション、システム、ネットワークへの不正アクセスにつながる。したがって、NCCTがハードコードされた認証情報を出力するかどうかを調べるために、Hard-coded Credential Revealer (HCR) と呼ばれる評価ツールを提案する。 HCRは認証情報を含むGitHubのコードファイルに基づいてテストプロンプトを構築し、NCCTの記憶現象を明らかにする。そして、HCRは形式の不正な認証情報を除外する4つのフィルタを設計する。最後に、HCRは、一連の非機密認証情報の有効性を直接チェックする。商用NCCT,オープンソースモデル,コード補完機能を備えたチャットボットの3種類のNCCTの評価にHCRを適用した。実験の結果,NCCTはトレーニングデータの正確な部分を返すだけでなく,意図せず追加の秘密文字列を漏洩させることがわかった。特に,実験中に2つの有効な認証情報が確認された。したがって、HCRは、商用NCCTのトレーニングデータにハードコードされた認証情報が漏洩する可能性があるという深刻なプライバシー上の懸念を提起する。すべてのアーティファクトとデータは、将来の研究目的のためにhttps://github.com/HCR-Repo/HCR でリリースされる。 Neural Code Completion Tools (NCCTs), which are built upon language modeling techniques and can accurately suggest contextually relevant code snippets, have reshaped the field of software engineering. However, language models may emit the training data verbatim during inference with appropriate prompts. This memorization property raises privacy concerns of NCCTs about hard-coded credential leakage, leading to unauthorized access to applications, systems, or networks. Therefore, to answer whether NCCTs will emit the hard-coded credential, we propose an evaluation tool called Hard-coded Credential Revealer (HCR). HCR constructs test prompts based on GitHub code files with credentials to reveal the memorization phenomenon of NCCTs. Then, HCR designs four filters to filter out ill-formatted credentials. Finally, HCR directly checks the validity of a set of non-sensitive credentials. 
We apply HCR to evaluate three representative types of NCCTs: commercial NCCTs, open-source models, and chatbots with code completion capability. Our experimental results show that NCCTs can not only return exact pieces of their training data but also inadvertently leak additional secret strings. Notably, two valid credentials were identified during our experiments. Therefore, HCR raises a severe privacy concern about the potential leakage of hard-coded credentials in the training data of commercial NCCTs. All artifacts and data are released for future research purposes at https://github.com/HCR-Repo/HCR.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# Draft & Verify: 自己投機的デコーディングによるロスレス大規模言語モデルの高速化 Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding ( http://arxiv.org/abs/2309.08168v2 ) ライセンス: Link先を確認	Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Sharad Mehrotra,	(参考訳) 本稿では,Large Language Models (LLM) を補助モデルなしで高速化するための新しい推論手法,自己投機的デコーディングを提案する。このアプローチの特徴は、ドラフトと検証という2段階のプロセスである。ドラフト段階は、若干低い品質でより迅速にドラフトトークンを生成し、ドラフト期間中に特定の中間層を選択的にスキップすることで達成される。その後、検証段階は元のLSMを使用して、これらのドラフト出力トークンを1つのフォワードパスで検証する。このプロセスは、最終的な出力が未修正のLLMで生成された出力と変わらないことを保証します。さらに、提案手法では、追加のニューラルネットワークトレーニングを必要とせず、メモリフットプリントを必要とせず、推論アクセラレーションのためのプラグアンドプレイで費用対効果の高いソリューションとなる。 LLaMA-2とその変種によるベンチマークでは、1.99$\times$まで高速化された。 We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary model. This approach is characterized by a two-stage process: drafting and verification. The drafting stage generates draft tokens at a slightly lower quality but more quickly, which is achieved by selectively skipping certain intermediate layers during drafting. Subsequently, the verification stage employs the original LLM to validate those draft output tokens in one forward pass. This process ensures the final output remains identical to that produced by the unaltered LLM. Moreover, the proposed method requires no additional neural network training and no extra memory footprint, making it a plug-and-play and cost-effective solution for inference acceleration. Benchmarks with LLaMA-2 and its variants demonstrated a speedup up to 1.99$\times$.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
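The draft-then-verify loop described above can be sketched with toy stand-ins for the two models. This is a minimal illustration under stated assumptions: `full_next` and `draft_next` are hypothetical deterministic next-token functions (the draft one playing the role of the layer-skipping model), verification is a greedy exact-match check, and the per-position calls to `full_next` stand in for what a real LLM would do in a single forward pass. It is not the authors' implementation.

```python
def full_next(tokens):
    """Toy 'full model': deterministic next token from the context."""
    return (sum(tokens) * 31 + len(tokens)) % 50

def draft_next(tokens):
    """Toy 'draft model': agrees with the full model except on some contexts."""
    tok = full_next(tokens)
    return (tok + 1) % 50 if len(tokens) % 5 == 0 else tok

def greedy_decode(prompt, next_fn, max_new):
    """Reference: plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(next_fn(tokens))
    return tokens

def self_speculative_decode(prompt, full_fn, draft_fn, max_new, k=4):
    """Draft up to k tokens cheaply, then keep the longest prefix the full model agrees with."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # Drafting stage: propose tokens with the cheaper model.
        draft = []
        for _ in range(min(k, target_len - len(tokens))):
            draft.append(draft_fn(tokens + draft))
        # Verification stage: compare each draft token with the full model's choice.
        accepted = []
        for tok in draft:
            expected = full_fn(tokens + accepted)
            accepted.append(expected)
            if tok != expected:
                break  # first mismatch: keep the full model's token, drop the rest
        tokens.extend(accepted)
    return tokens

prompt = [1, 2, 3]
spec = self_speculative_decode(prompt, full_next, draft_next, max_new=20)
ref = greedy_decode(prompt, full_next, max_new=20)
print(spec == ref)
```

Because every kept token is the full model's own choice, the output matches plain greedy decoding with the full model. The speedup in practice comes from accepted draft tokens being verified together rather than generated one at a time.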
# 点拡散モデルを用いた大腸3次元形状再構成によるデジタルファントム生成 Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation ( http://arxiv.org/abs/2309.08289v2 ) ライセンス: Link先を確認	Kaouther Mouheb, Mobina Ghojogh Nejad, Lavsen Dahal, Ehsan Samei, Kyle J. Lafata, W. Paul Segars, Joseph Y. Lo,	(参考訳) 人間の臓器の正確な3Dモデリングは、仮想画像実験のための計算ファントムの構築において重要な役割を担っている。しかし、CTスキャンから臓器表面の解剖学的に妥当な再構成を生成することは、人体の多くの構造にとって依然として困難である。この課題は大腸を扱う際に特に顕著である。本研究では,近年の幾何学的深層学習の進歩と拡散確率モデルのデノベーションを活用して,大腸のセグメンテーション結果を改良する。まず、臓器を3Dセグメンテーションマスクの表面から採取した点雲として表現する。その後,臓器形状のグローバルおよび局所的な潜在表現を得るために,階層的変分オートエンコーダを用いた。階層型潜在空間における2つの条件付き微分拡散モデルを訓練し、形状改善を行う。提案手法をさらに強化するため,得られた完全点雲からスムーズなメッシュを生成することのできる最先端表面再構成モデルを組み込んだ。実験の結果,臓器形状のグローバル分布と微細な細部の両方を捉えるためのアプローチの有効性が示された。我々の完全な精錬パイプラインは、初期セグメンテーションに比べて表面表現の顕著な向上を示し、チャンファー距離を70%、ハウスドルフ距離を32%、アースモーバー距離を6%削減した。幾何学的深層学習, 拡散モデル, 高度な表面再構成技術を組み合わせることで, 大腸表面を正確にモデル化し, 他の解剖学的構造にも容易に拡張できることを示す。 Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. 
Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
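Two of the metrics reported above, the Chamfer and Hausdorff distances, compare point clouds by nearest-neighbour distances. The following is a minimal brute-force sketch in pure Python. Conventions for the Chamfer distance vary (squared or unsquared, summed or averaged); the version here, the sum of the two mean squared nearest-neighbour distances, is an assumption and may differ from the one used in the paper. The Earth Mover's distance is omitted because it requires solving an assignment problem.

```python
import math

def _nearest(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both directions."""
    forward = sum(_nearest(p, b) ** 2 for p in a) / len(a)
    backward = sum(_nearest(q, a) ** 2 for q in b) / len(b)
    return forward + backward

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: the worst-case nearest-neighbour distance."""
    return max(max(_nearest(p, b) for p in a), max(_nearest(q, a) for q in b))

# Tiny hand-made clouds; cloud_b has one extra point at unit distance from cloud_a.
cloud_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cloud_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

print(chamfer_distance(cloud_a, cloud_b))
print(hausdorff_distance(cloud_a, cloud_b))
```

The brute-force search is quadratic in the number of points, which is fine for illustration. Clouds sampled from real organ surfaces would normally use a spatial index such as a k-d tree.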
# 適応的優先度再重み付けによる公正な分類器の汎化性向上 Boosting Fair Classifier Generalization through Adaptive Priority Reweighing ( http://arxiv.org/abs/2309.08375v3 ) ライセンス: Link先を確認	Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He,	(参考訳) 重要な意思決定領域における機械学習アプリケーションの普及に伴い、アルゴリズム的公正性の要求がより顕著になる。公正性制約付きの学習によってアルゴリズムの公正性を改善するための様々な手法があるが、それらの性能はテストセットではうまく一般化しない。より優れた一般化性を持つ、性能の期待できる公正なアルゴリズムが必要である。本稿では,トレーニングデータとテストデータ間の分布シフトがモデル一般化性に与える影響を解消する適応的再重み付け手法を提案する。以前の再重み付け法のほとんどは、各(部分)群に対して統一重みを割り当てることを提案している。これに対し,本手法は,サンプル予測から決定境界までの距離を詳細にモデル化する。適応的再重み付け法は, 決定境界に近いサンプルを優先し, 公平な分類器の一般化性を向上させるために, より高い重みを割り当てる。表形式ベンチマークにおいて,適応的優先度再重み付け手法の精度と公平性指標(機会均等,等化オッズ,人口統計学的パリティ)に関する一般化性を検証するため,広範囲な実験を行った。また,言語と視覚モデルの公平性を向上する上で,本手法の性能を強調した。コードはhttps://github.com/che2198/APW で公開されている。 With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability. Most previous reweighing methods propose to assign a unified weight for each (sub)group. Rather, our method granularly models the distance from the sample predictions to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers. Extensive experiments are performed to validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) in tabular benchmarks. 
We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# スクリーンだけ見る:マルチモーダル・チェーン・オブ・アクション・エージェント You Only Look at Screens: Multimodal Chain-of-Action Agents ( http://arxiv.org/abs/2309.11436v3 ) ライセンス: Link先を確認	Zhuosheng Zhang, Aston Zhang,	(参考訳) 自律型グラフィカルユーザインタフェース(GUI)エージェントは、手作業による介入なしにユーザインタフェースと対話することで、タスクの自動化を促進することを目的としている。近年,多様な環境において,大規模言語モデル(LLM)を効果的に活用する能力について検討している。 LLMの入出力要件に合わせて、既存のほとんどのアプローチはサンドボックス環境下で開発され、外部ツールやアプリケーション固有のAPIに依存して、環境をテキスト要素に解析し、予測されたアクションを解釈する。その結果、これらのアプローチは推論の非効率性とエラーの伝播リスクに悩まされることが多い。課題を軽減するため、私たちはAuto-GUIを導入しました。Auto-GUIはインターフェースと直接対話するマルチモーダルソリューションで、環境解析やアプリケーション依存APIへの依存を回避します。さらに、エージェントが実行すべきアクションを決定するのを助けるために、一連の中間的なアクション履歴と将来のアクション計画を活用するチェーン・オブ・アクション手法を提案する。我々は,アプリケーション操作やWeb検索,Webショッピングといったマルチステップタスクにまたがる,30$Kのユニークな命令を持つ新しいデバイス制御ベンチマークAITWに対するアプローチを評価した。実験の結果,Auto-GUIは動作型予測精度90\%,総合動作成功率74\%で最先端性能を達成することがわかった。コードはhttps://github.com/cooelf/Auto-GUIで公開されている。 Autonomous graphical user interface (GUI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LLMs) for effective engagement in diverse environments. To align with the input-output requirement of LLMs, most existing approaches are developed under a sandbox setting where they rely on external tools and application-specific APIs to parse the environment into textual elements and interpret the predicted actions. Consequently, those approaches often grapple with inference inefficiency and error propagation risks. To mitigate the challenges, we introduce Auto-GUI, a multimodal solution that directly interacts with the interface, bypassing the need for environment parsing or reliance on application-dependent APIs. Moreover, we propose a chain-of-action technique -- leveraging a series of intermediate previous action histories and future action plans -- to help the agent decide what action to execute. 
We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions, spanning multi-step tasks such as application operation, web searching, and web shopping. Experimental results show that Auto-GUI achieves state-of-the-art performance with an action type prediction accuracy of 90% and an overall action success rate of 74%. Code is publicly available at https://github.com/cooelf/Auto-GUI.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# Jury: 総合評価ツールキット Jury: A Comprehensive Evaluation Toolkit ( http://arxiv.org/abs/2310.02040v2 ) ライセンス: Link先を確認	Devrim Cavusoglu, Secil Sen, Ulas Sert, Sinan Altinuc,	(参考訳) 評価は、あらゆる予測ベースシステムの基本ブロックとして、ディープラーニングにおいて重要な役割を果たす。しかし、膨大な数の自然言語処理(NLP)タスクと様々なメトリクスの開発が、異なるメトリクスで異なるシステムを評価する上での課題につながっている。これらの課題に対処するために、さまざまなタスクやメトリクスに対して評価を行うための標準化された構造を備えた統一的な評価フレームワークを提供するツールキットであるjuryを導入する。juryの目的は、すべてのシステムに対するメトリクス評価の標準化と改善であり、評価の課題を克服するコミュニティを支援することである。オープンソースリリース以来、juryは幅広いユーザにリーチしており、https://github.com/obss/jury で入手できる。 Evaluation plays a critical role in deep learning as a fundamental block of any prediction-based system. However, the vast number of Natural Language Processing (NLP) tasks and the development of various metrics have led to challenges in evaluating different systems with different metrics. To address these challenges, we introduce jury, a toolkit that provides a unified evaluation framework with standardized structures for performing evaluation across different tasks and metrics. The objective of jury is to standardize and improve metric evaluation for all systems and aid the community in overcoming the challenges in evaluation. Since its open-source release, jury has reached a wide audience and is available at https://github.com/obss/jury.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# 振動による浮遊型マイクロ磁気シリンダの線形冷却 Linear cooling of a levitated micromagnetic cylinder by vibration ( http://arxiv.org/abs/2310.03880v2 ) ライセンス: Link先を確認	Chris Timberlake, Elliot Simcox, Hendrik Ulbricht,	(参考訳) 我々は, 圧電アクチュエータを用いて, 高Qの機械的モードに線形フィードバックを適用することにより, 浮上マイクロマグネットシリンダの並進自由度と秤動(libration)自由度のフィードバック冷却を報告する。基準モードは、直流SQUIDに結合した超伝導ピックアップコイルを用いて測定され、位相情報は圧電アクチュエータにフィードバックされ、重心モードを約7 Kに、秤動モードを$830 \pm 200$ mKに冷却する。重心モードでは$1.0 \times 10^7$のQ因子が評価される。振動分離を導入し, ピックアップコイルの形状を最適化して注目モードに焦点を合わせ, 検出に最先端のSQUIDを利用することにより, 重心モードの基底状態冷却が可能であると考えられる。 We report feedback cooling of translational and librational degrees of freedom of a levitated micromagnet cylinder, utilizing a piezoelectric actuator to apply linear feedback to high-Q mechanical modes. The normal modes are measured with a superconducting pick-up coil coupled to a DC SQUID, and phase information is fed back to the piezoelectric actuator to feedback cool a center-of-mass mode to $\sim 7$ K, and a librational mode to $830 \pm 200$ mK. Q-factors of $1.0 \times 10^7$ are evaluated for the center-of-mass mode. We find that it is plausible to achieve ground state cooling of the center-of-mass mode by introducing vibration isolation, optimizing the geometry of the pick-up coil to focus on the specific mode of interest and utilizing a state-of-the-art SQUID for detection.	翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
# シャプレー値に基づく統合勾配の新しいベースライン仮定 A New Baseline Assumption of Integated Gradients Based on Shaply value ( http://arxiv.org/abs/2310.04821v3 ) ライセンス: Link先を確認	Shuyang Liu, Zixuan Chen, Ge Shi, Ji Wang, Changjie Fan, Yu Xiong, Runze Wu, Yujing Hu, Ze Ji, Yang Gao,	(参考訳) ディープニューラルネットワーク(DNN)をデコードする試みは、しばしば、予測を入力機能にマッピングする。これらの手法の中で、統合勾配(IG)が重要な手法として登場している。 IGにおける適切なベースラインの選択は、多種多様な設定におけるモデル予測の有意義で偏見のない説明を作成するために不可欠である。しかし、単一のベースラインを利用する標準的なアプローチは、しばしば不十分であり、複数のベースラインが必要である。 IGとAumann-Shapley Valueの自然な結びつきを活用し、ベースライン設計の新たな展望を提供する。理論的には、ある仮定の下では、ベースラインの集合は、シャプレー値(Shapley Value)によって記述される提携(coalition)と一致することを示す。この知見に基づいて,Shapley値計算プロセスを模倣するために比例サンプリングを用いるShapley Integrated Gradients (SIG) と呼ばれる新しいベースライン手法を開発した。 GridWorldで実施されたシミュレーションでは、SIGがシャプレー値の分布を効果的にエミュレートしている。さらに、様々な画像処理タスクに関する実証テストでは、SIGが従来のIGベースラインメソッドを超越し、より正確な特徴寄与の推定を提供し、異なるアプリケーション間で一貫した説明を提供し、無視できる程度の追加計算量で多様なデータタイプへの適応性を確保する。 Efforts to decode deep neural networks (DNNs) often involve mapping their predictions back to the input features. Among these methods, Integrated Gradients (IG) has emerged as a significant technique. The selection of appropriate baselines in IG is crucial for crafting meaningful and unbiased explanations of model predictions in diverse settings. The standard approach of utilizing a single baseline, however, is frequently inadequate, prompting the need for multiple baselines. Leveraging the natural link between IG and the Aumann-Shapley Value, we provide a novel outlook on baseline design. Theoretically, we demonstrate that under certain assumptions, a collection of baselines aligns with the coalitions described by the Shapley Value. Building on this insight, we develop a new baseline method called Shapley Integrated Gradients (SIG), which uses proportional sampling to mirror the Shapley Value computation process. Simulations conducted in GridWorld validate that SIG effectively emulates the distribution of Shapley Values. 
Moreover, empirical tests on various image processing tasks show that SIG surpasses traditional IG baseline methods by offering more precise estimates of feature contributions, providing consistent explanations across different applications, and ensuring adaptability to diverse data types with negligible additional computational demand.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
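To make the role of the baseline concrete, here is a minimal sketch of Integrated Gradients with one baseline and with several baselines averaged uniformly. The function `f`, its analytic gradient, and the three baselines are illustrative assumptions. In particular, uniform averaging over hand-picked baselines is not SIG's proportional sampling, which the abstract does not specify in enough detail to reproduce.

```python
def f(x):
    """Toy differentiable model."""
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0], 2.0 * x[2]]

def integrated_gradients(x, baseline, grad, steps=50):
    """IG_i = (x_i - b_i) * average gradient along the straight path from b to x
    (midpoint Riemann sum)."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

def multi_baseline_ig(x, baselines, grad, steps=50):
    """Uniform average of IG attributions over several baselines."""
    attrs = [integrated_gradients(x, b, grad, steps) for b in baselines]
    return [sum(col) / len(baselines) for col in zip(*attrs)]

x = [1.0, 2.0, 3.0]
zero = [0.0, 0.0, 0.0]
baselines = [zero, [1.0, 0.0, 0.0], [0.0, 2.0, 1.0]]

single = integrated_gradients(x, zero, grad_f)
averaged = multi_baseline_ig(x, baselines, grad_f)
print(single, sum(single), f(x) - f(zero))
print(averaged)
```

The attributions for a single baseline sum to `f(x) - f(baseline)` (the completeness property), so changing the baseline changes what the explanation is measured against. This is why the choice, or the distribution, of baselines matters.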
# UniParser: 相関表現学習を統一したマルチヒューマンパーシング UniParser: Multi-Human Parsing with Unified Correlation Representation Learning ( http://arxiv.org/abs/2310.08984v2 ) ライセンス: Link先を確認	Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao,	(参考訳) マルチヒューマンパーシング(Multi-human parsing)は、インスタンスレベルと詳細なカテゴリレベルの情報の両方を必要とするイメージセグメンテーションタスクである。しかし、以前の研究では、これらの2つのタイプの情報を別々のブランチと異なる出力形式で処理し、非効率で冗長なフレームワークを生み出した。本稿では、インスタンスレベルとカテゴリレベルの表現を3つの重要な側面で統合するUniParserを紹介する。 1)コサイン空間内のインスタンスやカテゴリの特徴をネットワークで学べる統合された相関表現学習手法を提案する。 2)各モジュールの出力の形式を画素レベルのセグメンテーション結果として統一するとともに,補助的損失を伴う同種ラベルを用いてインスタンスとカテゴリの特徴を監督する。 3)インスタンスとカテゴリ表現を融合させる共同最適化手法を設計する。インスタンスレベルの出力とカテゴリレベルの出力を統合することで、UniParserは手動で設計した後処理技術を回避し、最先端の手法を超越し、MHPv2.0では49.3%のAP、CIHPでは60.4%のAPを達成した。私たちは、将来の研究を促進するために、ソースコード、事前訓練されたモデル、オンラインデモをリリースします。 Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the form of outputs of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level output, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# 人間の記憶機構にインスパイアされた推論のためのフレームワーク A Framework for Inference Inspired by Human Memory Mechanisms ( http://arxiv.org/abs/2310.09297v2 ) ライセンス: Link先を確認	Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang,	(参考訳) 人間と機械は、認識された情報を過去の記憶の文脈に含めながら、関係推論と質問応答の現在の入力をどう理解するかは、認知科学と人工知能において難しい問題となっている。人間の脳の記憶システムと認知アーキテクチャに触発され,知覚,記憶,推論の構成要素からなるPMIフレームワークを提案する。特に、メモリモジュールは、ワーキングメモリと長期メモリから構成されており、後者は、広範囲で複雑なリレーショナル知識と経験を保持するために、高次構造を備えている。異なる競合する書き込みアクセスを通じて、現在の知覚はワーキングメモリを更新し、後に外部製品アソシエーションを通じて長期記憶にマージされ、情報競合を低減し、メモリオーバーフローを回避する。推論モジュールでは、2つの別々のメモリ起源から関連情報を検索し、連想的に統合して現在の知覚をより包括的かつ正確に解釈する。我々は,bAbI-20kやSolt-of-CLEVRデータセットなどの質問応答タスクに対して,PMIを爆発的に適用し,また,等角三角形,言語モデリング,画像分類タスクを検出するとともに,PMIの強化により,元のモデルを大きく上回っている。可視化解析により、リレーショナルメモリの統合は、多様なメモリソースからの情報の相互作用と統合と共に、推論タスクにおけるモデルの有効性に大きく寄与することが明らかとなった。 How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, reducing information conflicts and averting memory overflow. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. 
As an exploratory study, we apply PMI to improve prevailing Transformer and CNN models on question-answering datasets such as bAbI-20k and Sort-of-CLEVR, as well as on equilateral triangle detection, language modeling, and image classification; in each case, the PMI-enhanced models consistently and significantly outperform their original counterparts. Visualization analyses reveal that relational memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model's effectiveness on inference tasks.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# ビッグデータコンテキストにおけるK平均クラスタリング最適化手法の比較分析: Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review ( http://arxiv.org/abs/2310.09819v3 ) ライセンス: Link先を確認	Ravil Mussabayev, Rustam Mussabayev,	(参考訳) 本稿では,ビッグデータの文脈におけるK平均アルゴリズムの最適化手法の比較分析を行う。 K-meansはクラスタリングアルゴリズムとして広く使用されているが、大規模なデータセットを扱う場合、スケーラビリティの問題に悩まされる可能性がある。本稿では、並列化、近似、サンプリング方法など、これらの問題を克服するための様々なアプローチについて検討する。著者らは、多数のベンチマークデータセット上で、様々なクラスタリング技術の性能を評価し、それらを"less is more"アプローチ(LIMA)によって提供される支配的基準、すなわち、スピード、クラスタリング品質、単純さの次元に沿って同時に比較した。その結果、異なる手法がデータセットの異なるタイプに適していることが示され、ビッグデータのK平均クラスタリングにおける速度と精度のトレードオフに関する洞察を提供する。全体として、この論文は、ビッグデータアプリケーションにK平均をどのように最適化するかについて、実践者や研究者に包括的なガイドを提供する。 This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods. The authors evaluate the performance of various clustering techniques on a large number of benchmark datasets, comparing them according to the dominance criterion provided by the "less is more" approach (LIMA), i.e., simultaneously along the dimensions of speed, clustering quality, and simplicity. The results show that different techniques are more suitable for different types of datasets and provide insights into the trade-offs between speed and accuracy in K-means clustering for big data. Overall, the paper offers a comprehensive guide for practitioners and researchers on how to optimize K-means for big data applications.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# KI-PMF: 知識を統合した妥当な動き予測 KI-PMF: Knowledge Integrated Plausible Motion Forecasting ( http://arxiv.org/abs/2310.12007v2 ) ライセンス: Link先を確認	Abhishek Vivekanandan, Ahmed Abouelazm, Philip Schörner, J. Marius Zöllner,	(参考訳) 交通参加者の動きを正確に予測することは、自動運転車の大規模な展開に不可欠である。現在の軌道予測アプローチは、主に特定の指標で損失関数を最適化することに集中しており、物理法則に従わない、あるいは外部の制約に違反する予測をもたらす可能性がある。本研究の目的は、車両の運動学的制約と走行環境の幾何形状の両方に適合する将来軌道をネットワークが予測できるように、明示的な知識事前を組み込むことである。これを実現するために、定義した知識事前を統合するために、非パラメトリックプルーニング層とアテンション層を導入する。提案手法は、複雑な状況と動的な状況の両方において、交通参加者の到達可能性を保証するように設計されている。ネットワークに物理法則に従うよう条件付けすることで、現実の環境での自動運転車の安全性と効率を維持する上で不可欠な、正確かつ安全な予測が得られる。要約すると、本論文は、トレーニングプロセスに知識事前を取り入れることで、安全で信頼性の高い動き予測のためにオフロード予測を防止する概念を提示する。 Accurately forecasting the motion of traffic actors is crucial for the deployment of autonomous vehicles at a large scale. Current trajectory forecasting approaches primarily concentrate on optimizing a loss function with a specific metric, which can result in predictions that do not adhere to physical laws or violate external constraints. Our objective is to incorporate explicit knowledge priors that allow a network to forecast future trajectories in compliance with both the kinematic constraints of a vehicle and the geometry of the driving environment. To achieve this, we introduce a non-parametric pruning layer and attention layers to integrate the defined knowledge priors. Our proposed method is designed to ensure reachability guarantees for traffic actors in both complex and dynamic situations. By conditioning the network to follow physical laws, we can obtain accurate and safe predictions, essential for maintaining autonomous vehicles' safety and efficiency in real-world settings. In summary, this paper presents concepts that prevent off-road predictions for safe and reliable motion forecasting by incorporating knowledge priors into the training process.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# 相互共振相互作用を超越した固定結合、固定周波数トランスモンにおけるネイティブ2量子ゲート Native two-qubit gates in fixed-coupling, fixed-frequency transmons beyond cross-resonance interaction ( http://arxiv.org/abs/2310.12146v2 ) ライセンス: Link先を確認	Ken Xuan Wei, Isaac Lauer, Emily Pritchett, William Shanks, David C. McKay, Ali Javadi-Abhari,	(参考訳) 固定周波数超伝導量子ビットは、安定かつスケーラブルな量子コンピューティングのプラットフォームとして素晴らしい成功を収めた。クロス共振ゲートは固定結合で固定周波数の超伝導プロセッサのワークホースであり、隣人の周波数と1キュービットの共振で発生した絡み合いを利用して高忠実で普遍的なCNOTを実現している。ここでは、オン共振およびオフ共振マイクロ波駆動を用いてクロス共振を超越し、CNOTと等価でないネイティブに興味深い2ビットゲートを実現する。特に、ネイティブISWAP、SWAP、$\sqrt{\text{ISWAP}}$、BSWAPゲートを実装し、ベンチマークする。さらに、これらの手法をBゲートの効率的な構成に応用し、任意の2ビットゲートに到達可能な完全エンタングルを2つの応用で実現した。これらのネイティブな2ビットゲートは、クロス共振ゲートからコンパイルしたゲートよりも優れていることを示す。本研究では,各2ビットゲートの駆動に必要な共振条件を解明し,それをカイスキットで実装するための新しいフレームトラッキング技術を提供する。 Fixed-frequency superconducting qubits demonstrate remarkable success as platforms for stable and scalable quantum computing. Cross-resonance gates have been the workhorse of fixed-coupling, fixed-frequency superconducting processors, leveraging the entanglement generated by driving one qubit resonantly with a neighbor's frequency to achieve high-fidelity, universal CNOTs. Here, we use on-resonant and off-resonant microwave drives to go beyond cross-resonance, realizing natively interesting two-qubit gates that are not equivalent to CNOTs. In particular, we implement and benchmark native ISWAP, SWAP, $\sqrt{\text{ISWAP}}$, and BSWAP gates. Furthermore, we apply these techniques for an efficient construction of the B-gate: a perfect entangler from which any two-qubit gate can be reached in only two applications. We show these native two-qubit gates are better than their counterparts compiled from cross-resonance gates. We elucidate the resonance conditions required to drive each two-qubit gate and provide a novel frame tracking technique to implement them in Qiskit.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# タンパク質リガンド構造予測モデルの可能性を解き放つため, HelixDock を用いた大規模ドッキングコンフォーメーションの事前評価 Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models ( http://arxiv.org/abs/2310.13913v3 ) ライセンス: Link先を確認	Lihang Liu, Shanzhuo Zhang, Donglong He, Xianbin Ye, Jingbo Zhou, Xiaonan Zhang, Yaoyao Jiang, Weiming Diao, Hang Yin, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang,	(参考訳) タンパク質リガンド構造予測は、小さな分子(リガンド)と標的タンパク質(受容体)の結合相互作用を予測する薬物発見において必須の課題である。近年の進歩は、タンパク質リガンド構造予測の精度を向上させるためのディープラーニング技術が組み込まれている。それでもドッキングコンフォーメーションの実験的な検証はコストがかかるままであり、訓練データに制限があるため、これらの深層学習手法の一般化可能性に関する懸念が高まる。本研究では,従来の物理ドッキングツールによる大規模ドッキングコンフォメーションの事前トレーニングを行い,実験によって検証された受容体-リガンド複合体の限定セットを用いて微調整を行うことにより,優れた性能を有するタンパク質-リガンド構造予測モデルが得られることを示す。具体的には、このプロセスはタンパク質とリガンドのペアリングのための1億ドッキングコンフォメーションを生成し、約100万のCPUコア日を要した。提案モデルであるHelixDockは,物理ベースのドッキングツールによってカプセル化された物理知識を,事前学習期間中に取得することを目的としている。 HelixDockは、物理学ベースのベースラインとディープラーニングベースのベースラインの両方に対して厳格にベンチマークされ、バインディング確認の予測において、例外的な精度と堅牢な転送性を示している。さらに,本研究は,事前学習したタンパク質リガンド構造予測モデルに基づくスケーリング法則を明らかにし,モデルパラメータの増加と事前学習データ量の増加に伴う性能の持続的な向上を示唆している。さらに,HelixDockをいくつかの薬物発見関連タスクに適用し,その実用性を検証した。 HelixDockはクロスドッキングと構造ベースの仮想スクリーニングベンチマークの両方で優れた機能を示している。 Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, it raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. 
In this work, we show that by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can obtain a protein-ligand structure prediction model with outstanding performance. Specifically, this process involved the generation of 100 million docking conformations for protein-ligand pairings, an endeavor consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been rigorously benchmarked against both physics-based and deep learning-based baselines, demonstrating its exceptional precision and robust transferability in predicting binding conformations. In addition, our investigation reveals the scaling laws governing pre-trained protein-ligand structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and the volume of pre-training data. Moreover, we applied HelixDock to several drug discovery-related tasks to validate its practical utility. HelixDock demonstrates outstanding capabilities on both cross-docking and structure-based virtual screening benchmarks.	翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
# 時系列因果グラフの抽象化による全効果の同定可能性 Identifiability of total effects from abstractions of time series causal graphs ( http://arxiv.org/abs/2310.14691v5 ) ライセンス: Link先を確認	Charles K. Assaad, Emilie Devijver, Eric Gaussier, Gregor Gössler, Anouar Meynaoui,	(参考訳) 本研究では,真の因果グラフの抽象化にのみアクセス可能な,実際によくある状況において,観測時系列から介入の全効果を識別できるかという問題について検討する。ここでは、全てのラグ付き因果関係を混同するが、ラグ付き関係と即時関係を区別する拡張要約因果グラフと、因果関係間のラグを示さない要約因果グラフの2つの抽象化について考察する。拡張要約因果グラフでは全効果が常に識別可能であることを示し,要約因果グラフにおける識別可能性について十分条件を与える。さらに、識別可能な場合に全効果を推定するための調整セットも提供する。 We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations, and the summary causal graph, which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and provide sufficient conditions for identifiability in summary causal graphs. We furthermore provide adjustment sets that allow the total effect to be estimated whenever it is identifiable.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# タブラルデータクエリと可視化のための自然言語インタフェース:サーベイ Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey ( http://arxiv.org/abs/2310.17894v3 ) ライセンス: Link先を確認	Weixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin Yang,	(参考訳) 自然言語処理の出現は、ユーザが表形式のデータと対話する方法に革命をもたらし、従来のクエリ言語や手作業によるプロットから、より直感的な言語ベースのインターフェースへの移行を可能にした。 ChatGPTなどの大規模言語モデル(LLM)の台頭は、この分野をさらに進歩させ、自然言語処理技術のための新たな道を開いた。本調査では,自然言語クエリによるデータ操作を可能にする,表形式のデータクエリと可視化のための自然言語インタフェースの概要を概観する。自然言語からSQLクエリやデータ視覚化コマンドへの変換を容易にする重要な技術であるセマンティック解析に特に重点を置いて、これらのインターフェースの基礎となる概念とテクニックを紹介します。次に、データセット、方法論、メトリクス、システム設計の観点から、Text-to-SQLおよびText-to-Vis問題の最近の進歩を掘り下げます。この中には、LSMの影響を深く掘り下げ、その強み、制限、将来の改善の可能性を強調している。本調査は,大規模言語モデルの時代におけるデータインタラクションのための自然言語インタフェースの開発と適用に関心のある研究者や実践者を対象としたロードマップの提供を目的とする。 The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. 
Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# GIST: 生成入力はディープラーニングにおける転送可能性を設定する GIST: Generated Inputs Sets Transferability in Deep Learning ( http://arxiv.org/abs/2311.00801v3 ) ライセンス: Link先を確認	Florian Tambon, Foutse Khomh, Giuliano Antoniol,	(参考訳) ディープニューラルネットワーク(DNN)の妥当性とテスト性を高めるため,テストケース生成手法の開発が進んでいる。 DNNモデルのテストに直面すると、ユーザーは既存のテスト生成テクニックを適用できる。しかし、テスト中の各テクニックと各DNNモデルに対してそうする必要がある。テスト中の各DNNモデルに対して独立してテストセットを再生するのではなく、既存のDNNモデルから移行することができる。本稿では、テストセットの効率的な転送のための新しいアプローチであるGIST(Generated Inputs Sets Transferability)を紹介する。ユーザによって選択されたプロパティ(例えば、ニューロンがカバーされ、障害)が与えられた場合、GISTは、利用可能なテストセットのうち、このプロパティの観点から良いテストセットを選択することができる。これにより、ユーザは、テストケース生成技術を使って、スクラッチからテストセットを生成することで、転送されたテストセット上の同様のプロパティを回復することができる。実験結果から,GISTは移動対象のプロパティに対して有効なテストセットを選択することができることがわかった。さらに、GISTはテスト中のDNNモデルでスクラッチからテストケース生成テクニックを再適用するよりもスケールが優れている。 To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of methods for test case generation techniques are being developed. When confronted with testing DNN models, the user can apply any existing test generation technique. However, it needs to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer from existing DNN models. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover similar properties on the transferred test sets as he would have obtained by generating the test set from scratch with a test cases generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. 
Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# 自然言語説明を用いたインコンテクスト学習のロバスト性向上 Using Natural Language Explanations to Improve Robustness of In-context Learning ( http://arxiv.org/abs/2311.07556v2 ) ライセンス: Link先を確認	Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp,	(参考訳) 近年の研究では、大規模言語モデル(LLM)が、文脈内学習(ICL)を通じて多くのタスクを遂行できることが示されている。しかし,近年の研究では,敵対的入力を与えた場合,ICLでプロンプトされたモデルは不正確な結果を生じやすいことが示されている。本研究では,自然言語推論とパラフレーズ識別を対象とする敵対的データセットにおいて,自然言語説明(NLE)によるICLの強化がLLMの堅牢性を向上させるか否かを検討する。我々は,人間生成NLEの小さなセットでLLMにさらなるNLEを生成するよう促し,ゼロショットICL設定と人間生成NLEのみの使用の両方よりも正確な結果を得る。 5つのLLM (GPT3.5-turbo, Llama2, Vicuna, Zephyr, Mistral) の結果から, HANS, ISCS, NaN, ST, PICD, PISP, ANLI, PAWS の8つの敵対的データセットに対して,ベースラインアプローチよりも6%以上の改善が得られた。さらに、従来の研究では、プロンプト選択戦略により、分布内テストセット上でのICLが著しく向上することが示されている。しかし, これらの戦略はロバスト性評価においては提案手法の有効性に及ばず, 提案手法と比較して8%の精度低下がみられた。 Recent studies have demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recent works show that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrase identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, yielding more accurate results than both a zero-shot-ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields over 6% improvement over baseline approaches for eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach for robustness evaluations, resulting in an accuracy drop of 8% compared to the proposed approach.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# 量子部分空間法における適応的測定戦略 Adaptive measurement strategy for quantum subspace methods ( http://arxiv.org/abs/2311.07893v2 ) ライセンス: Link先を確認	Yuma Nakamura, Yoshichika Yano, Nobuyuki Yoshioka,	(参考訳) 未知の量子状態に対する物理観測値の推定は、量子情報処理、量子物理学、量子化学など幅広い分野の基礎となる重要な問題である。特に量子計算の文脈では、既存の研究は主に、既知の古典的な記述を持つ特定の可観測物に対する全体論的な状態トモグラフィーや推定に焦点を当てているが、これは推定対象自体が測定結果に依存している重要な問題のクラスを欠いている。本研究では、量子部分空間法、すなわち古典的後処理を計測結果に用いた変分シミュレーション法に有用な適応的測定最適化法を提案する。提案手法は、まず古典的にシミュレート可能な状態の測定プロトコルを決定し、量子測定結果に応じて量子部分空間展開(QSE)のプロトコルを適応的に更新する。数値実証として,分子の励起状態シミュレーションにおいて, (i) 適切な測定戦略を構築することにより測定回数を1桁削減できること, (ii) 適応反復が強相関分子 H$_4$ に対してもうまく収束することを示した。本研究は,精巧な測定プロトコルによってQSE法の可能性を高められることを明らかにし,実用的な計算においてより効率的な量子測定手法を追求する道を開く。 Estimation of physical observables for unknown quantum states is an important problem that underlies a wide range of fields, including quantum information processing, quantum physics, and quantum chemistry. In the context of quantum computation, in particular, existing studies have mainly focused on holistic state tomography or estimation of specific observables with known classical descriptions, which leaves out the important class of problems where the estimation target itself relies on the measurement outcome. In this work, we propose an adaptive measurement optimization method that is useful for the quantum subspace methods, namely the variational simulation methods that utilize classical postprocessing on measurement outcomes. The proposed method first determines the measurement protocol for classically simulatable states, and then adaptively updates the protocol of quantum subspace expansion (QSE) according to the quantum measurement result. As a numerical demonstration, we have shown for excited-state simulation of molecules that (i) we are able to reduce the number of measurements by an order of magnitude by constructing an appropriate measurement strategy, and (ii) the adaptive iteration converges successfully even for a strongly correlated molecule of H$_4$. 
Our work reveals that the potential of the QSE method can be empowered by elaborated measurement protocols, and opens a path to further pursue efficient quantum measurement techniques in practical computations.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# 量子モンテカルロと多モード摂動法による第1相水素の電子励起スペクトル Electronic excitation spectra of molecular hydrogen in Phase I from Quantum Monte Carlo and Many-Body perturbation methods ( http://arxiv.org/abs/2311.08506v2 ) ライセンス: Link先を確認	Vitaly Gorelov, Markus Holzmann, David M. Ceperley, Carlo Pierleoni,	(参考訳) 固体水素(フェーズI)中の電子励起スペクトルを,量子モンテカルロ法および多体摂動理論を用いて,周囲温度および5-90 GPa圧力で検討した。この範囲では、システムは広いギャップ分子絶縁体から半導体に変化し、励起の性質は局所化から非局在化する。計算されたギャップとスペクトルは実験に一致し、核量子および熱効果の存在下で多体系のバンドギャップを正確に予測する能力を示す。 We study the electronic excitation spectra in solid molecular hydrogen (phase I) at ambient temperature and 5-90 GPa pressures using Quantum Monte Carlo methods and Many-Body Perturbation Theory. In this range, the system changes from a wide gap molecular insulator to a semiconductor, altering the nature of the excitations from localized to delocalized. Computed gaps and spectra agree with experiments, proving the ability to predict accurately band gaps of many-body systems in presence of nuclear quantum and thermal effects.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# 常微分方程式によるニューラルネットワーク分類のための安定アトラクター(SA-nODE) Stable Attractors for Neural networks classification via Ordinary Differential Equations (SA-nODE) ( http://arxiv.org/abs/2311.10387v2 ) ライセンス: Link先を確認	Raffaele Marino, Lorenzo Giambagli, Lorenzo Chicchi, Lorenzo Buffoni, Duccio Fanelli,	(参考訳) 機械学習と力学系理論の交点に位置する教師付き分類の新しい手法を提案する。常微分方程式を分類に用いる他の手法とは異なり、訓練前のモデルは、事前に割り当てられた定常な安定アトラクターの集合を持つようにあらかじめ構築される。分類とは、入力として与えられた処理対象の特性に応じて、埋め込まれたアトラクターの1つに向けてダイナミクスを導くことに相当する。漸近的に、システムは探索された多次元空間の特定の点に収束し、分類対象のカテゴリを示す。この文脈では、訓練されたモデルが事後的に獲得する分類能力は、最終的に、各目標安定アトラクターに付随する吸引域の形状に反映される。提案手法の性能は、この目的のために作られた単純なトイモデルと、確立された参照基準の両方で評価される。この手法は最先端のディープラーニングアルゴリズムの性能には達しないが、閉じた解析的相互作用項を持つ連続力学系が高性能な分類器として機能しうることを示している。 A novel approach for supervised classification is presented which sits at the intersection of machine learning and dynamical systems theory. At variance with other methodologies that employ ordinary differential equations for classification purposes, the untrained model is a priori constructed to accommodate a set of pre-assigned stationary stable attractors. Classifying amounts to steering the dynamics towards one of the planted attractors, depending on the specificity of the processed item supplied as an input. Asymptotically the system will hence converge on a specific point of the explored multi-dimensional space, flagging the category of the object to be eventually classified. Working in this context, the inherent ability to perform classification, as acquired ex post by the trained model, is ultimately reflected in the shape of the basins of attraction associated with each of the target stable attractors. The performance of the proposed method is here challenged against simple toy models crafted for the purpose, as well as by resorting to well established reference standards. 
Although this method does not reach the performance of state-of-the-art deep learning algorithms, it illustrates that continuous dynamical systems with closed analytical interaction terms can serve as high-performance classifiers.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# バイモーダル畳み込みニューラルネットワークを用いた言語・生理データストリームの認識検出 Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks ( http://arxiv.org/abs/2311.10944v3 ) ライセンス: Link先を確認	Panfeng Li, Mohamed Abouelenien, Rada Mihalcea, Zhicheng Ding, Qikai Yang, Yiming Zhou,	(参考訳) 倫理的・セキュリティ上の懸念から、偽造検知が関心を増している。本稿では,畳み込み型ニューラルネットワークのマルチモーダルな騙し検出への応用について検討する。 2つのトピックについて104人の被験者にインタビューして構築したデータセットを使用します。特に、主な貢献は3つあります。まず,このデータから言語的・生理的特徴を抽出し,ニューラルネットワークモデルを訓練・構築する。次に,両モードを用いた融合畳み込みニューラルネットワークモデルを提案する。第3に,新しい手法と,マルチモーダルな偽装検出のための従来手法を比較した。我々のシステムは通常の分類法よりも優れており,本研究の結果は,限られた量のデータが存在する場合でも,誤検出にニューラルネットワークを用いることの可能性を示している。 Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# Tactics2D: 意思決定のための高度にモジュラーで拡張可能なシミュレータ Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making ( http://arxiv.org/abs/2311.11058v3 ) ライセンス: Link先を確認	Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Yeqiang Qian, Chunxiang Wang, Ming Yang,	(参考訳) シミュレーションは多様で現実的な交通シナリオを生成するための先進的な手法であり、運転意思決定システムの開発を支援する。しかし、既存のシミュレータは、様々なシナリオや、交通参加者の対話的行動モデルでは不足することが多い。この欠陥は、柔軟で信頼性が高く、ユーザフレンドリなオープンソースシミュレータの必要性を浮き彫りにする。この課題に対処するため、Tactics2Dでは、道路要素、交通規制、行動モデル、車両の物理シミュレーション、イベント検出機構を含む、交通シナリオ構築へのモジュラーアプローチを採用している。広く利用されているアルゴリズムと構成を統合することで、Tactics2Dは、ビルディングブロックを組み立てるように、ユーザが強制的に駆動シナリオを構築することができる。ユーザは、パブリックデータセットとユーザによる実世界のデータの両方を活用することで、さまざまなシナリオで意思決定モデルを駆動するパフォーマンスを効果的に評価できる。ソースコードとコミュニティのサポートにアクセスするには、https://github.com/WoodOxen/Tactics2Dの公式GitHubページを参照してほしい。 Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems. However, existing simulators often fall short in diverse scenarios or interactive behavior models for traffic participants. This deficiency underscores the need for a flexible, reliable, user-friendly open-source simulator. Addressing this challenge, Tactics2D adopts a modular approach to traffic scenario construction, encompassing road elements, traffic regulations, behavior models, physics simulations for vehicles, and event detection mechanisms. By integrating numerous commonly utilized algorithms and configurations, Tactics2D empowers users to construct their driving scenarios effortlessly, just like assembling building blocks. Users can effectively evaluate the performance of driving decision-making models across various scenarios by leveraging both public datasets and user-collected real-world data. For access to the source code and community support, please visit the official GitHub page for Tactics2D at https://github.com/WoodOxen/Tactics2D.	翻訳日:2024-05-22 00:10:05 公開日:2024-05-20
# 分散二段階最適化の通信複雑性について On the Communication Complexity of Decentralized Bilevel Optimization ( http://arxiv.org/abs/2311.11342v3 ) ライセンス: Link先を確認	Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao,	(参考訳) 分散二段階最適化は、機械学習に広く応用されているため、ここ数年で活発に研究されている。しかし、既存のアルゴリズムは確率的過次性の推定によって引き起こされる通信の複雑さに悩まされ、その応用を現実のタスクに限定する。この問題に対処するため,各ラウンドの通信コストと通信ラウンド数が少ない不均一な条件下で,分散確率的二段階勾配降下アルゴリズムを開発した。したがって、不均一性に関する強い仮定なしに、既存のアルゴリズムよりもはるかに優れた通信複雑性を実現することができる。我々の知る限りでは、これは不均一な条件下でこれらの理論結果を達成する最初の確率的アルゴリズムである。最終的に実験結果により,本アルゴリズムの有効性が確認された。 Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which enjoys a small communication cost in each round and a small number of communication rounds. As such, it can achieve a much better communication complexity than existing algorithms without any strong assumptions regarding heterogeneity. To the best of our knowledge, this is the first stochastic algorithm achieving these theoretical results under the heterogeneous setting. At last, the experimental results confirm the efficacy of our algorithm.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# SIAM:ビデオ予測のための簡単な交互ミキサー SIAM: A Simple Alternating Mixer for Video Prediction ( http://arxiv.org/abs/2311.11683v2 ) ライセンス: Link先を確認	Xin Zheng, Ziang Peng, Yuan Cao, Hongming Shan, Junping Zhang,	(参考訳) ビデオ予測は、以前のフレームから将来のフレームを予測するもので、自律運転や天気予報といった幅広い応用がある。既存の最先端の手法は、通常、ビデオから空間的、時間的、または時空間的な特徴を抽出することに焦点を当てる。異なる特徴は、異なるネットワークアーキテクチャから生じるもので、結果のモデルがいくつかのビデオ予測タスクで優れているが、他のモデルでは不十分である。より汎用的なビデオ予測ソリューションを目指して、これらの機能を統一エンコーダデコーダフレームワークで明示的にモデル化し、新しい簡易交互混合器(SIAM)を提案する。 SIAMの斬新さは次元交互混合(DaMi)ブロックの設計にあり、特徴写像の次元の交互化によって空間的・時間的・時空間的特徴をモデル化することができる。大規模な実験結果から,合成シナリオと実世界のシナリオの両方をカバーする4つのベンチマークビデオデータセットにおいて,提案したSIAMの優れた性能を示す。 Video prediction, predicting future frames from the previous ones, has broad applications such as autonomous driving and weather forecasting. Existing state-of-the-art methods typically focus on extracting either spatial, temporal, or spatiotemporal features from videos. Different feature focuses, resulting from different network architectures, may make the resultant models excel at some video prediction tasks but perform poorly on others. Towards a more generic video prediction solution, we explicitly model these features in a unified encoder-decoder framework and propose a novel simple alternating Mixer (SIAM). The novelty of SIAM lies in the design of dimension alternating mixing (DaMi) blocks, which can model spatial, temporal, and spatiotemporal features through alternating the dimensions of the feature maps. Extensive experimental results demonstrate the superior performance of the proposed SIAM on four benchmark video datasets covering both synthetic and real-world scenarios.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 生成的蒸留を伴う拡散モデルの連続学習 Continual Learning of Diffusion Models with Generative Distillation ( http://arxiv.org/abs/2311.14028v2 ) ライセンス: Link先を確認	Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven,	(参考訳) 拡散モデルは画像合成における最先端性能を達成する強力な生成モデルである。しかし、それらのトレーニングには大量のデータと計算資源が必要である。継続的な学習は、新しいタスクを漸進的に学習し、知識を蓄積し、さらなる学習のためにトレーニングされたモデルの再利用を可能にする。生成的リプレイでは、以前のタスクで訓練された生成モデルのコピーが、現在のタスクのデータとインターリーブされた合成データを生成する。しかし、拡散モデルに適用された標準的な生成的リプレイは、デノナイジング能力の破滅的な損失をもたらす。本稿では,拡散モデルの全逆過程を除去する生成蒸留法を提案する。提案手法は,生成的リプレイの継続学習性能を大幅に向上させ,計算コストをわずかに増加させることを実証する。 Diffusion models are powerful generative models that achieve state-of-the-art performance in image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus enabling the reuse of trained models for further learning. One potentially suitable continual learning approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach substantially improves the continual learning performance of generative replay with only a modest increase in the computational costs.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 微分可能かつ加速された球面調和変換とウィグナー変換 Differentiable and accelerated spherical harmonic and Wigner transforms ( http://arxiv.org/abs/2311.14670v2 ) ライセンス: Link先を確認	Matthew A. Price, Jason D. McEwen,	(参考訳) 科学と工学の多くの分野は、球面多様体上で定義されたデータに遭遇する。球面データのモデリングと解析は、しばしば高次の球面調和変換を必要とする。球面 $\mathbb{S}^2$ と回転群 $\text{SO}(3)$,すなわち球面調和およびウィグナー変換上の一般化フーリエ変換の高速化および微分可能計算のための新しいアルゴリズム構造を開発する。 Wigner $d$-functions の計算に対する再帰的アルゴリズムを提案する。これを分離可能な球面変換と密結合することにより、現代のハードウェアアクセラレータ(例えばGPU)の高スループット計算に適した極めて並列的な構造を示すアルゴリズムを得る。我々はまた、勾配を効率的に計算できるように、ハイブリッド自動微分法と手動微分法を開発した。我々のアルゴリズムは、S2FFTソフトウェアコードのJAX差別化プログラミングフレームワークで実装されています。等角およびHEALPixサンプリングを含む球面の多数のサンプリングがサポートされている。計算誤差は、サンプリング定理を持つ球面サンプリングの機械精度の順である。代替のCコードに対してベンチマークすると、最大400倍の加速度が観測されます。さらに、複数のGPUに分散すると、アルゴリズムの高度に並列化されバランスの取れた性質のため、GPUの数が増加するにつれて、最適な線形スケーリングに非常に近い。十分に多くのGPUにアクセスすることで、我々の変換は前例のない効果的な線形時間複雑性を示す。 Many areas of science and engineering encounter data defined on spherical manifolds. Modelling and analysis of spherical data often necessitates spherical harmonic transforms, at high degrees, and increasingly requires efficient computation of gradients for machine learning or other differentiable programming tasks. We develop novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere $\mathbb{S}^2$ and rotation group $\text{SO}(3)$, i.e. spherical harmonic and Wigner transforms, respectively. We present a recursive algorithm for the calculation of Wigner $d$-functions that is both stable to high harmonic degrees and extremely parallelisable. By tightly coupling this with separable spherical transforms, we obtain algorithms that exhibit an extremely parallelisable structure that is well-suited for the high throughput computing of modern hardware accelerators (e.g. GPUs). We also develop a hybrid automatic and manual differentiation approach so that gradients can be computed efficiently. 
Our algorithms are implemented within the JAX differentiable programming framework in the S2FFT software code. Numerous samplings of the sphere are supported, including equiangular and HEALPix sampling. Computational errors are at the order of machine precision for spherical samplings that admit a sampling theorem. When benchmarked against alternative C codes we observe up to a 400-fold acceleration. Furthermore, when distributing over multiple GPUs we achieve very close to optimal linear scaling with increasing number of GPUs due to the highly parallelised and balanced nature of our algorithms. Provided access to sufficiently many GPUs our transforms thus exhibit an unprecedented effective linear time complexity.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 拡散確率モデルに基づく残差雑音に基づく画像復元 Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise ( http://arxiv.org/abs/2311.14900v2 ) ライセンス: Link先を確認	Zhenning Shi, Haoshuai Zheng, Chen Xu, Changsheng Dong, Bin Pan, Xueshuo Xie, Along He, Tao Li, Huazhu Fu,	(参考訳) 近年,デノイジング拡散モデルの研究が画像復元分野への応用を拡大している。従来の拡散に基づく画像復元法では、劣化した画像を条件入力として利用し、元のデノイジング拡散過程を変更することなく、逆生成プロセスを効果的に誘導する。しかし、劣化した画像は、既に低周波情報を含んでいるため、ガウスホワイトノイズから始めるとサンプリングステップが増加する。本稿では,残差項を拡散の順過程に組み込み、ノイズを含む劣化画像から直接逆過程を開始する一般フレームワークであるResfusionを提案する。私たちの推論プロセスの形式はDDPMと一致しています。我々は,重み付けされた残差ノイズ(resnoise)を予測対象として導入し,resnoiseにおける残差項とノイズ項の定量的関係を明示した。滑らかな等価変換を利用することで、Resfusionは最適な加速度ステップを決定し、既存のノイズスケジュールの整合性を維持し、トレーニングと推論プロセスを統一する。実験の結果,Resfusion は ISTD データセット,LOL データセット,Raindrop データセットに対して,わずか5つのサンプリングステップで競合性能を示すことがわかった。さらに、画像生成に簡単に適用でき、高い汎用性を示す。私たちのコードとモデルはhttps://github.com/nkicsl/Resfusionで公開されています。 Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise will result in increased sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images. The form of our inference process is consistent with the DDPM. We introduce a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determines the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. 
The experimental results demonstrate that Resfusion exhibits competitive performance on ISTD dataset, LOL dataset and Raindrop dataset with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and emerges with strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# DreamPropeller: 並列サンプリングによるスーパーチャージテキスト・ツー・3D生成 DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling ( http://arxiv.org/abs/2311.17082v3 ) ライセンス: Link先を確認	Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon,	(参考訳) テキストから3D生成のための2次元拡散モデルを用いたスコア蒸留(SDS)や変分スコア蒸留(VSD)といった最近の手法は、印象的な生成品質を示している。しかし、そのようなアルゴリズムの長寿命化はユーザー体験を著しく劣化させる。そこで我々はDreamPropellerを提案する。このDreamPropellerは、既存のテキストから3D生成パイプラインに、スコアの蒸留に基づいてラップできる加速アルゴリズムである。我々のフレームワークは、ODEパスを並列サンプリングする古典的なアルゴリズムであるPicard繰り返しを一般化し、モーメントベースの勾配更新や最適化プロセス中の寸法変化などの非ODEパスを3次元生成の場合と同様に考慮することができる。提案アルゴリズムは, 並列計算をウォールクロック時間で処理し, 最大4.7倍の高速化を実現する。 Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wallclock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
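The DreamPropeller abstract above names Picard iteration as the classical algorithm it generalizes. As a rough illustration only, and not the authors' implementation, the following Python sketch applies Picard sweeps to a discretized scalar ODE dx/dt = f(x, t). The function names (`picard_solve`, `euler_solve`), the uniform grid, and the left Riemann sum are assumptions made for this example. The point it shows is that every drift evaluation inside one sweep depends only on the previous sweep's trajectory, so those evaluations could in principle be computed in parallel, trading extra compute for fewer sequential steps.

```python
def picard_solve(f, x0, t0, t1, n_steps, n_iters):
    """Approximate x(t) on a uniform grid using Picard sweeps (hypothetical helper).

    Each sweep rebuilds the whole trajectory from the previous sweep's
    trajectory, so the calls to f within a sweep are mutually independent.
    """
    h = (t1 - t0) / n_steps
    ts = [t0 + i * h for i in range(n_steps + 1)]
    xs = [x0] * (n_steps + 1)  # initial guess: constant trajectory
    for _ in range(n_iters):
        # Drift at every grid point, evaluated on the OLD trajectory.
        drifts = [f(x, t) for x, t in zip(xs, ts)]
        new_xs = [x0]
        acc = 0.0
        for i in range(n_steps):
            acc += h * drifts[i]  # left Riemann sum of the integral of f
            new_xs.append(x0 + acc)
        xs = new_xs
    return ts, xs


def euler_solve(f, x0, t0, t1, n_steps):
    """Plain sequential explicit Euler, used here only as a reference."""
    h = (t1 - t0) / n_steps
    x = x0
    xs = [x0]
    for i in range(n_steps):
        x = x + h * f(x, t0 + i * h)
        xs.append(x)
    return xs


if __name__ == "__main__":
    decay = lambda x, t: -x  # toy drift, chosen only for illustration
    _, picard_xs = picard_solve(decay, 1.0, 0.0, 1.0, n_steps=20, n_iters=20)
    euler_xs = euler_solve(decay, 1.0, 0.0, 1.0, n_steps=20)
    print(max(abs(a - b) for a, b in zip(picard_xs, euler_xs)))
```

With this discretization, after k sweeps the first k+1 grid points agree with sequential Euler up to floating-point rounding, so `n_iters = n_steps` reproduces the sequential result; in practice the hope is that far fewer sweeps suffice.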
# 異常運転行動検出のためのサロゲート安全対策を用いたデータ駆動半教師付き機械学習 Data-driven Semi-supervised Machine Learning with Surrogate Safety Measures for Abnormal Driving Behavior Detection ( http://arxiv.org/abs/2312.04610v4 ) ライセンス: Link先を確認	Yongqi Dong, Lanxin Zhang, Haneen Farah, Arkady Zgonnikov, Bart van Arem,	(参考訳) 道路交通の安全と運転者の行動評価には,異常運転行動の検出が重要である。機械学習(ML)アルゴリズムの進歩と自然主義駆動データの蓄積により、多くのMLモデルが異常運転行動検出に採用されている。既存のMLベースの検出器の多くは(完全に)教師付きML法に依存しており、かなりのラベル付きデータを必要とする。しかし、地上の真理ラベルは必ずしも現実世界で利用できておらず、大量のデータをラベル付けするのは面倒である。したがって、異常検出プロセスをより効果的かつ効果的にするために、教師なしまたは半教師なしの手法を検討する必要がある。このギャップを埋めるために,本研究では,複数の異常運転行動(例えば,急激な加速,高速車線変更)を明らかにする大規模実世界のデータを分析し,部分ラベル付きデータを用いて階層的エクストリーム学習マシン(HELM)に基づく半教師付きML法を開発し,その異常運転動作を正確に検出する。さらに、従来のMLベースアプローチでは、基本車両の動作特性(速度や加速度など)を利用して異常運転行動のラベル付けと検出を行うのに対して、本研究では、MLモデルの入力機能としてサロゲート安全対策(SSM)を導入し、検出性能を向上させることを目的とする。実験結果から,提案した半教師付きMLモデルの有効性を示すとともに,SSMが重要な特徴であることを示す。提案した半教師付きML法は、様々な指標(例えば、99.58%で最高の精度、0.9913で最高のF-1測定値)に関して、他のベースラインの半教師付きあるいは教師なしの手法よりも優れている。アブレーション研究は, 検出性能向上におけるSSMsの重要性をさらに強調した。 Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled data. However, ground truth labels are not always available in the real world, and labeling large amounts of data is tedious. Thus, there is a need to explore unsupervised or semi-supervised methods to make the anomaly detection process more feasible and efficient. 
To fill this research gap, this study analyzes large-scale real-world data revealing several abnormal driving behaviors (e.g., sudden acceleration, rapid lane-changing) and develops a Hierarchical Extreme Learning Machines (HELM) based semi-supervised ML method using partly labeled data to accurately detect the identified abnormal driving behaviors. Moreover, previous ML-based approaches predominantly utilize basic vehicle motion features (such as velocity and acceleration) to label and detect abnormal driving behaviors, while this study seeks to introduce Surrogate Safety Measures (SSMs) as the input features for ML models to improve the detection performance. Results from extensive experiments demonstrate the effectiveness of the proposed semi-supervised ML model with the introduced SSMs serving as important features. The proposed semi-supervised ML method outperforms other baseline semi-supervised or unsupervised methods regarding various metrics, e.g., delivering the best accuracy at 99.58% and the best F-1 measure at 0.9913. The ablation study further highlights the significance of SSMs for advancing detection performance.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# ホログラフィーレニーエントロピーのための改良型宇宙ブレインの提案 A Modified Cosmic Brane Proposal for Holographic Renyi Entropy ( http://arxiv.org/abs/2312.04625v2 ) ライセンス: Link先を確認	Xi Dong, Jonah Kudler-Flam, Pratik Rath,	(参考訳) 本稿では,複数の極値面が存在する場合のホログラフィック・レニーエントロピーの新しい計算式を提案する。提案手法は、固定面積状態の基底で波動関数を計算し、レニーエントロピーの対角近似を仮定する。 Renyi index $n\geq1$ に対して、我々の提案はホログラフィック Renyi entropy に対する既存の宇宙ブレインの提案と一致している。しかし、$n<1$の場合、我々の提案は宇宙ブレインの提案に対して(ニュートン定数$G$について)主要次数の補正を伴う新しい相を予測しており、これはエンタングルメント相転移から遠く、バルクの量子補正が重要でない場合でも生じる。固定面積状態に対する最適化の観点からは、この2つの提案の違いは最適化の順序から理解することができる:$n<1$の場合、宇宙ブレイン提案はミニマックス(minimax)処方であるのに対して、我々の提案はマキシミン(maximin)処方である。このような主要次数補正の存在を実例で示す。特に,本提案では,PSSYモデルと高エネルギー固有状態の文献における既存の結果を再現し,これまでに見出されていた$n<1$ Renyiエントロピーへの主要次数補正を普遍的に説明する。 We propose a new formula for computing holographic Renyi entropies in the presence of multiple extremal surfaces. Our proposal is based on computing the wave function in the basis of fixed-area states and assuming a diagonal approximation for the Renyi entropy. For Renyi index $n\geq1$, our proposal agrees with the existing cosmic brane proposal for holographic Renyi entropy. For $n<1$, however, our proposal predicts a new phase with leading order (in Newton's constant $G$) corrections to the cosmic brane proposal, even far from entanglement phase transitions and when bulk quantum corrections are unimportant. Recast in terms of optimization over fixed-area states, the difference between the two proposals can be understood to come from the order of optimization: for $n<1$, the cosmic brane proposal is a minimax prescription whereas our proposal is a maximin prescription. We demonstrate the presence of such leading order corrections using illustrative examples. In particular, our proposal reproduces existing results in the literature for the PSSY model and high-energy eigenstates, providing a universal explanation for previously found leading order corrections to the $n<1$ Renyi entropies.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# ソフトウェア定義VANETのためのスタック型アンサンブル学習IDSモデル A Stacked Ensemble Learning IDS Model for Software-Defined VANET ( http://arxiv.org/abs/2312.04956v4 ) ライセンス: Link先を確認	Shakil Ibne Ahsan, Phil Legg, S M Iftekharul Alam,	(参考訳) 侵入検知システム(IDS)は、外部ネットワークのセキュリティイベントを検出し、緩和するために広く利用されている。 VANET(Vehicle ad-hoc Networks)は特にコネクテッド・オートモービルズ(CAV)の開発で進化している。したがって、新興技術において従来のIDSアプローチをどのように活用できるかを評価することが不可欠である。この問題に対処するため,本研究では,複数の機械学習アルゴリズムを組み合わせることで,単一のアルゴリズム手法よりも効果的に脅威を検出することを目的とした,集積型アンサンブル学習手法を提案する。 CICIDS2017とVeReMiベンチマークデータセットを使用して、我々のアプローチのパフォーマンスを既存の機械学習手法と比較し、脅威を特定するのがより正確であることを確かめる。また,ハイパーパラメータ最適化と特徴選択を取り入れて,性能をさらに向上する。以上の結果から,累積アンサンブル学習はIDSの有効性を高める上で有望な手法であることが示唆された。 Intrusion Detection Systems (IDS) are widely employed to detect and mitigate external network security events. VANETs (Vehicle ad-hoc Networks) are evolving, especially with the development of Connected Autonomous Vehicles (CAVs). So, it is crucial to assess how traditional IDS approaches can be utilised for emerging technologies. To address this concern, our work presents a stacked ensemble learning approach for IDS, which combines multiple machine learning algorithms to detect threats more effectively than single algorithm methods. Using the CICIDS2017 and the VeReMi benchmark data sets, we compare the performance of our approach with existing machine learning methods and find that it is more accurate at identifying threats. Our method also incorporates hyperparameter optimization and feature selection to improve its performance further. Overall, our results suggest that stacked ensemble learning is a promising technique for enhancing the effectiveness of IDS.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 量子アルゴリズムを用いたランダムハイパーグラフMAX-3-XORSAT問題の近似性について On the approximability of random-hypergraph MAX-3-XORSAT problems with quantum algorithms ( http://arxiv.org/abs/2312.06104v3 ) ライセンス: Link先を確認	Eliot Kapit, Brandon A. Barton, Sean Feeney, George Grattan, Pratik Patnaik, Jacob Sagal, Lincoln D. Carr, Vadim Oganesyan,	(参考訳) NPにおける制約満足度問題の標準的特徴は近似硬度であり、最悪の場合、すべての既知の方法において十分品質の近似解を見つけることは指数関数的に困難である。基本的に、ガイド付き局所最小脱出法が欠如していることは、古典的近似の正確さと近似的な近似の硬さの両方を保証するが、量子アルゴリズムの等価メカニズムはよく理解されていない。ハミルトニアン時間進化に基づくアルゴリズムでは、原型的にハードなMAX-3-XORSAT問題クラスを通してこの問題を探索する。量子精度と近似硬さのメカニズムは根本的に異なると結論付けている。論文の既知結果をレビューし,従来の量子法(例えばAdiabatic Quantum Computing)の弱い近似アルゴリズムを最悪の場合に用いるメカニズムを同定する。我々はこれらの問題から逃れるスペクトルフィルタリング量子アルゴリズムのファミリーを構築し、その性能に関する解析理論を開発する。近似系におけるランダムなハイパーグラフに対して、エネルギーを$E = N_{\mathrm{unsat}}-N_{\mathrm{sat}}$と定義すると、スペクトルフィルタリングされた量子最適化は準四進時間において$E \leq q_m E_{\mathrm{GS}}$(ここで$E_{\rm GS}$は基底状態エネルギー)で状態を返す。これは、古典的な検索を行う最も難しいインスタンスに対して$q_m \to 0$と対照的である。これらすべての主張を広範な数値シミュレーションで検証する。我々は、この近似保証がすべての可能なハイパーグラフを保持できると主張するわけではないが、我々のアルゴリズムのメカニズムは広く一般化される可能性がある。これらの結果は、量子コンピュータが以前想定されていたよりも近似最適化に強力であることを示唆している。 A canonical feature of the constraint satisfaction problems in NP is approximation hardness, where in the worst case, finding sufficient-quality approximate solutions is exponentially hard for all known methods. Fundamentally, the lack of any guided local minimum escape method ensures both exact and approximate classical approximation hardness, but the equivalent mechanism(s) for quantum algorithms are poorly understood. For algorithms based on Hamiltonian time evolution, we explore this question through the prototypically hard MAX-3-XORSAT problem class. We conclude that the mechanisms for quantum exact and approximation hardness are fundamentally distinct. We review known results from the literature, and identify mechanisms that make conventional quantum methods (such as Adiabatic Quantum Computing) weak approximation algorithms in the worst case. 
We construct a family of spectrally filtered quantum algorithms that escape these issues, and develop analytical theories for their performance. We show that, for random hypergraphs in the approximation-hard regime, if we define the energy to be $E = N_{\mathrm{unsat}}-N_{\mathrm{sat}}$, spectrally filtered quantum optimization will return states with $E \leq q_m E_{\mathrm{GS}}$ (where $E_{\rm GS}$ is the ground state energy) in sub-quadratic time, where conservatively, $q_m \simeq 0.59$. This is in contrast to $q_m \to 0$ for the hardest instances with classical searches. We test all of these claims with extensive numerical simulations. We do not claim that this approximation guarantee holds for all possible hypergraphs, though our algorithm's mechanism can likely generalize widely. These results suggest that quantum computers are more powerful for approximate optimization than had been previously assumed.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 構造化状態空間モデルはディープ・ウィーナーモデルである Structured state-space models are deep Wiener models ( http://arxiv.org/abs/2312.06211v2 ) ライセンス: Link先を確認	Fabio Bonassi, Carl Andersson, Per Mattsson, Thomas B. Schön,	(参考訳) 本研究の目的は,構造化状態空間モデル (Structured State-space Models, SSM) に対するシステム識別フレンドリな導入を提供することである。これらのモデルは、その並列化性のため、非常に長いシーケンス分類と回帰問題に取り組むために効率よく、訓練できるため、機械学習コミュニティで最近人気になっている。興味深いことに、SSMは深層Wienerモデルを学習する効果的な方法として現れ、システム識別によく使用されるモデルクラスの拡張としてSSMを再構成することができる。機械学習とシステム識別コミュニティ間のアイデアの多様さを刺激するために,最近のトピックに対するコントリビューションを構造化され,アクセス可能な形式で要約することが有用であると考えられる。最後に、このコミュニティが影響力のあるコントリビューションを提供するための今後の研究の方向性を強調します。 The goal of this paper is to provide a system identification-friendly introduction to the Structured State-space Models (SSMs). These models have become recently popular in the machine learning community since, owing to their parallelizability, they can be efficiently and scalably trained to tackle extremely-long sequence classification and regression problems. Interestingly, SSMs appear as an effective way to learn deep Wiener models, which allows to reframe SSMs as an extension of a model class commonly used in system identification. In order to stimulate a fruitful exchange of ideas between the machine learning and system identification communities, we deem it useful to summarize the recent contributions on the topic in a structured and accessible form. At last, we highlight future research directions for which this community could provide impactful contributions.	翻訳日:2024-05-22 00:00:07 公開日:2024-05-20
# 学習かリコールか : 事前学習型言語モデルによるインクリメンタルラーニングの再考 Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models ( http://arxiv.org/abs/2312.07887v2 ) ライセンス: Link先を確認	Junhao Zheng, Shengjie Qiu, Qianli Ma,	(参考訳) インクリメンタルラーニング(IL)は、ビジョンと自然言語処理(NLP)コミュニティにおいて長年の課題であった。近年、PLM(Pre-trained Language Models)は様々なNLP下流タスクにおいて顕著な進歩を遂げており、最近のNLPにおけるIL研究において、PLMをバックボーンとして活用することが一般的となっている。殆どの研究は、破滅的な忘れが優れたIL性能を達成するための最大の障害であると仮定し、この問題を克服するための様々な手法を提案する。しかし、この仮定には問題がある。具体的には,4つの分類タスク(テキスト分類,インテント分類,関係抽出,名前付きエンティティ認識)について,最も一般的な2つのIL設定(クラスインクリメンタルとタスクインクリメンタル)に基づいて20以上の手法を再検討し,そのほとんどがPLMに固有の忘却耐性(anti-forgetting ability)を著しく過小評価していることを明らかにする。そこで本研究では,PLMを用いたILのためのSEQ*という驚くほど簡単な手法を提案する。その結果,SEQ* は最先端 (SOTA) の IL 法と同等以上の性能を示し,学習可能なパラメータ数とトレーニング時間もかなり少ないことがわかった。これらの知見は, ILをPLMで再考し, 今後の研究がPLMにおける破滅的な忘れを根本的に理解することを促すものである。データ、コード、スクリプトはhttps://github.com/zzz47zzz/pretrained-lm-for-incremental-learningで公開されている。 Incremental Learning (IL) has been a long-standing problem in both vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress in various NLP downstream tasks, utilizing PLMs as backbones has become a common practice in recent research of IL in NLP. Most assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. 
The results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods and requires considerably less trainable parameters and training time. These findings urge us to revisit the IL with PLMs and encourage future studies to have a fundamental understanding of the catastrophic forgetting in PLMs. The data, code and scripts are publicly available at https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# ASLseg:半監督肝腫瘍分節に対するSAMのループへの適応 ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation ( http://arxiv.org/abs/2312.07969v2 ) ライセンス: Link先を確認	Shiyun Chen, Li Lin, Pujin Cheng, Xiaoying Tang,	(参考訳) 肝腫瘍の分節化は, コンピュータ診断, 手術計画, 予後評価に必須である。しかし、高密度アノテーションによる大規模データセットの取得と維持は困難である。セミ・スーパーバイザード・ラーニング(SSL)はこれらの課題に対処するための一般的なテクニックである。近年,Segment Anything Model (SAM) は,いくつかの画像分割作業において有望な性能を示したが,肝腫瘍のセグメンテーションでは不十分であった。本稿では,新しい半教師付きフレームワークであるASLsegを提案する。これはSAMをSSL設定に効果的に適応し,肝腫瘍のドメイン固有知識と一般知識を組み合わせることができる。具体的には、特定のSSLパラダイムでトレーニングされたセグメンテーションモデルは、微調整されたSAMへのプロンプトとして生成された擬似ラベルを提供する。次に、適応ネットワークを使用してSAM予測を洗練し、高品質な擬似ラベルを生成する。最後に、信頼された擬似ラベルを選択して、反復訓練のためのラベル付きセットを拡張する。 LiTSデータセットの大規模な実験は、当社のASLセグの圧倒的な性能を示している。 Liver tumor segmentation is essential for computer-aided diagnosis, surgical planning, and prognosis evaluation. However, obtaining and maintaining a large-scale dataset with dense annotations is challenging. Semi-Supervised Learning (SSL) is a common technique to address these challenges. Recently, Segment Anything Model (SAM) has shown promising performance in some medical image segmentation tasks, but it performs poorly for liver tumor segmentation. In this paper, we propose a novel semi-supervised framework, named ASLseg, which can effectively adapt the SAM to the SSL setting and combine both domain-specific and general knowledge of liver tumors. Specifically, the segmentation model trained with a specific SSL paradigm provides the generated pseudo-labels as prompts to the fine-tuned SAM. An adaptation network is then used to refine the SAM-predictions and generate higher-quality pseudo-labels. Finally, the reliable pseudo-labels are selected to expand the labeled set for iterative training. Extensive experiments on the LiTS dataset demonstrate overwhelming performance of our ASLseg.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# DIRECT:不均衡とラベルノイズ下での深層能動学習 DIRECT: Deep Active Learning under Imbalance and Label Noise ( http://arxiv.org/abs/2312.09196v3 ) ライセンス: Link先を確認	Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak,	(参考訳) クラス不均衡は、実世界の機械学習アプリケーションでは一般的な問題であり、希少クラスや少数クラスではパフォーマンスが低下することが多い。未ラベルデータが豊富にある状況では、アクティブラーニングは、アノテーション時によりバランスが取れ情報量の多いラベル付き例を収集することで、問題を根本から解決するおそらく最も効果的なテクニックである。ラベルノイズは、データアノテーションジョブのもう1つの一般的な問題であり、特にアクティブな学習方法では難しい。本研究では,クラス不均衡とラベルノイズの両方が存在する状況でのアクティブラーニングの最初の研究を行う。本稿では,クラス分離閾値を頑健に同定し,その閾値に最も近い最も不確実な例を注釈する新しいアルゴリズムを提案する。 DIRECTは,一次元アクティブラーニングへの新たな帰着を通じて,古典的なアクティブラーニング文献を活用し,バッチラベリングやラベルノイズに対する耐性といった問題に対処することができる。ラベルノイズがある場合とない場合の両方で,不均衡データセットについて広範な実験を行った。 DIRECTは,最先端のアクティブ学習アルゴリズムと比較して60%以上のアノテーション予算を節約でき,また,ランダムサンプリングに比べて80%以上のアノテーション予算を節約できることを示した。 Class imbalance is a prevalent issue in real world machine learning applications, often leading to poor performance in rare and minority classes. With an abundance of wild unlabeled data, active learning is perhaps the most effective technique in solving the problem at its root -- collecting a more balanced and informative set of labeled examples during annotation. Label noise is another common issue in data annotation jobs, which is especially challenging for active learning methods. In this work, we conduct the first study of active learning under both class imbalance and label noise. We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples that are closest to it. Through a novel reduction to one-dimensional active learning, our algorithm DIRECT is able to leverage the classic active learning literature to address issues such as batch labeling and tolerance towards label noise. We present extensive experiments on imbalanced datasets with and without label noise. Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms and more than 80% of annotation budget compared to random sampling.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# 深部ドラム音源分離に向けて Toward Deep Drum Source Separation ( http://arxiv.org/abs/2312.09663v3 ) ライセンス: Link先を確認	Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti,	(参考訳) これまで、ドラムソース分離の分野は、データ可用性が限られており、他の関連するオーディオアプリケーションで成功を収めた最先端のディープラーニング手法の採用を妨げていたため、重大な課題に直面していた。本稿では,独立した単一構造ドラムステムの大規模オーディオデータセットであるStemGMDを紹介する。 10個の実音響ドラムキットを用いて、各オーディオクリップを表現型ドラム演奏のMIDI記録から合成する。トータルで1224時間、StemGMDはドラムの最大のオーディオデータセットであり、標準の9ピースドラムキットですべての楽器のための独立したオーディオクリップを初めて作成した。我々は、StemGMDを利用して、新しいディープドラムソース分離モデルであるLarsNetを開発した。専用U-Netのバンクを通じて、LarsNetはステレオドラムの混合物から5本の幹をリアルタイムより高速に分離することができ、最先端の非負の分光時間分解法よりも著しく優れていることを示す。 In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drums performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# 決定のための識別表現事前学習に基づくトップkサブタスクプランニングツリーの学習 Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making ( http://arxiv.org/abs/2312.11027v2 ) ライセンス: Link先を確認	Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu,	(参考訳) 多くの複雑な現実世界のタスクは、より小さく、より管理しやすい部分に分割することができる。しかし、このプロセスを複製することはAIエージェントにとって課題であり、自然に2つの疑問を提起する。複雑な問題を分解する合理的プランを開発するには? 単一エンコーダ構造を用いた既存の表現学習手法の多くは脆弱で、複雑で多様な力学に敏感である。この問題に対処するために、簡単なサブタスクのための十分なデータからタスク固有表現を学習するマルチエンコーダと個別予測システムを導入する。複数のエンコーダは、混乱することなく適切なタスク関連ダイナミクスを抽出することができ、共有予測器はタスク特性を識別することができる。また、注意機構を用いてトップkのサブタスク計画木を生成し、未確認タスクの複雑な決定を導くためにサブタスク実行計画をカスタマイズする。このプロセスは、計画木の深さと幅を柔軟に調整し、前方視とグローバル性を実現する。いくつかの基本的な単純なタスクと組合せ的にリッチな合成タスクからなる挑戦的なプラットフォーム上での実証的な結果は、競争力のあるベースラインを一貫して上回り、我々の設計の利点を実証する。 Many complicated real-world tasks can be broken down into smaller, more manageable parts, and planning with prior knowledge extracted from these simplified pieces is crucial for humans to make accurate decisions. However, replicating this process remains a challenge for AI agents and naturally raises two questions: How to extract discriminative knowledge representation from priors? How to develop a rational plan to decompose complex problems? Most existing representation learning methods employing a single encoder structure are fragile and sensitive to complex and diverse dynamics. To address this issue, we introduce a multiple-encoder and individual-predictor regime to learn task-essential representations from sufficient data for simple subtasks. Multiple encoders can extract adequate task-relevant dynamics without confusion, and the shared predictor can discriminate the task characteristics. We also use the attention mechanism to generate a top-k subtask planning tree, which customizes subtask execution plans in guiding complex decisions on unseen tasks. This process enables forward-looking and globality by flexibly adjusting the depth and width of the planning tree. 
Empirical results on a challenging platform composed of some basic simple tasks and combinatorially rich synthetic tasks consistently outperform some competitive baselines and demonstrate the benefits of our design.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# トロコイド探索最適化 Trochoid Search Optimization ( http://arxiv.org/abs/2312.13597v2 ) ライセンス: Link先を確認	Abdesslem Layeb,	(参考訳) 本稿では,トロコイド曲線の数学的特性を利用した新しいメタヒューリスティックであるトロコイド探索最適化アルゴリズム(TSO)を提案する。 TSOアルゴリズムは、トロコイド固有の同時並進運動と回転運動のユニークな組み合わせを採用し、探索的(explorative)な探索能力と活用的(exploitative)な探索能力の間の洗練された平衡を育む。特に、TSOは、その効率性と有効性に一括して寄与する、グローバル探索とローカル探索の2つの重要なフェーズで構成されている。実験による検証は、TSOアルゴリズムが様々なベンチマーク関数にまたがる顕著な性能を示し、探索空間における探索と活用のバランスにおいて競争力があることを示す。 TSOの際立った特徴は単純さにある。ユーザ定義パラメータの最小限の要件が特徴であり、アクセス可能で強力な最適化ツールである。 This paper introduces the Trochoid Search Optimization Algorithm (TSO), a novel metaheuristic leveraging the mathematical properties of trochoid curves. The TSO algorithm employs a unique combination of simultaneous translational and rotational motions inherent in trochoids, fostering a refined equilibrium between explorative and exploitative search capabilities. Notably, TSO consists of two pivotal phases, global and local search, that collectively contribute to its efficiency and efficacy. Experimental validation demonstrates the TSO algorithm's remarkable performance across various benchmark functions, showcasing its competitive edge in balancing exploration and exploitation within the search space. A distinguishing feature of TSO lies in its simplicity, marked by a minimal requirement for user-defined parameters, making it an accessible yet powerful optimization tool.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
# 吸収分布はコンセンサスにどのように影響するか : ブロックチェーンの分散化の分析 How Does Stake Distribution Influence Consensus? Analyzing Blockchain Decentralization ( http://arxiv.org/abs/2312.13938v3 ) ライセンス: Link先を確認	Shashank Motepalli, Hans-Arno Jacobsen,	(参考訳) PoSブロックチェーンの世界では、完全な分散化を実現する上での課題は、少数のバリデータ間でステンドトークンが不均等に集中していることによって、しばしば妨げられます。本研究では、重み付けされたコンセンサス機構のための分散化指標を最初に定式化することにより、この課題を解析する。 10個の無許可ブロックチェーンに対する実証分析により、バリデータ間のかなりの重量集中が明らかとなり、等価なアプローチの必要性が強調された。これに対応するために,重み分布を効果的に再検討するSquare Root Stake Weight (SRSW) モデルを提案する。 Gini指数は平均37.16%向上し, 中本指数は平均101.04%, 80.09%向上した。この研究は、ブロックチェーンのコンセンサスメカニズムにおける分散化を推進し、より公平で公平なステイクウェイト分布に向けた重要なステップである。 In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among validators, underscoring the need for an equitable approach. To counter this, we introduce the Square Root Stake Weight (SRSW) model, which effectively recalibrates staking weight distribution. Our examination of the SRSW model demonstrates notable improvements in the decentralization metrics: the Gini index improves by 37.16% on average, while Nakamoto coefficients for liveness and safety see mean enhancements of 101.04% and 80.09%, respectively. This research is a pivotal step toward a more fair and equitable distribution of staking weight, advancing the decentralization in blockchain consensus mechanisms.	翻訳日:2024-05-21 23:50:08 公開日:2024-05-20
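The stake-distribution abstract above reports changes in the Gini index and in Nakamoto coefficients after applying a Square Root Stake Weight (SRSW) model. The Python sketch below is a minimal illustration of how such metrics are commonly computed; it is not the authors' code. The function names, the toy stake values, and the 2/3 threshold used in the example are assumptions made here (the abstract does not state which thresholds define its liveness and safety coefficients).

```python
import math


def gini(weights):
    """Gini index of a weight distribution (0 means perfectly equal)."""
    w = sorted(weights)
    n = len(w)
    total = sum(w)
    # Standard closed form on ascending values:
    # G = 2 * sum(i * w_i) / (n * total) - (n + 1) / n, with i starting at 1.
    ranked_sum = sum((i + 1) * x for i, x in enumerate(w))
    return 2.0 * ranked_sum / (n * total) - (n + 1.0) / n


def nakamoto_coefficient(weights, threshold):
    """Smallest number of validators whose combined share exceeds `threshold`."""
    total = sum(weights)
    acc = 0.0
    for k, x in enumerate(sorted(weights, reverse=True), start=1):
        acc += x
        if acc / total > threshold:
            return k
    return len(weights)


def square_root_weights(stakes):
    """Consensus weight taken as the square root of stake (the SRSW idea)."""
    return [math.sqrt(s) for s in stakes]


if __name__ == "__main__":
    stakes = [100, 25, 16, 9, 4, 1]  # toy stakes, purely illustrative
    weights = square_root_weights(stakes)
    print(gini(stakes), gini(weights))
    print(nakamoto_coefficient(stakes, 2 / 3), nakamoto_coefficient(weights, 2 / 3))
```

On the toy data, taking square roots compresses the largest stake relative to the rest, so the Gini index falls and more validators are needed to cross the chosen threshold. The magnitudes say nothing about the percentages reported in the abstract, which come from real validator sets.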
# 工学的常微分方程式を分類アルゴリズム(EODECA):徹底的な特徴付けと試験 Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing ( http://arxiv.org/abs/2312.14681v2 ) ライセンス: Link先を確認	Raffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi, Lorenzo Giambagli, Duccio Fanelli,	(参考訳) EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) は、機械学習と動的システム理論の共通部分における新しいアプローチであり、分類タスクのためのユニークなフレームワークである[1]。この方法は、常微分方程式(ODE)を用いて、複雑な分類課題を効率的に扱うことによって、その力学系構造を際立たせる。論文は、EODECAの動的特性を考察し、ランダムな摂動に対するレジリエンスと、さまざまな分類シナリオにおける堅牢なパフォーマンスを強調した。特に、EODECAの設計には、安定したアトラクタをフェーズ空間に埋め込む機能が含まれており、信頼性を高め、可逆的なダイナミクスを可能にする。本稿では,作業 [1] を拡張し,オイラー離散化方式を用いて包括的解析を行う。特に,EODECAの性能を5つの異なる分類問題に分けて評価し,適応性と効率性を検討した。重要なことは、EODECAがMNISTデータセットとFashion MNISTデータセットで有効であることを実証し、それぞれ98.06\%と88.21\%の印象的な精度を達成したことである。これらの結果は多層パーセプトロン(MLP)に匹敵するものであり、複雑なデータ処理タスクにおけるEODECAの可能性を示している。我々は、モデルの学習の旅をさらに探求し、前と後の両方のトレーニング環境におけるその進化を評価し、安定したアトラクタに向かう能力を強調します。この研究は、EODECAの可逆性、意思決定プロセスと内部作業に光を当てることについても検討している。本稿では、より透明で堅牢な機械学習パラダイムに向けて、機械学習アルゴリズムと動的システム方法論のギャップを埋める重要なステップを示す。 EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1], and employing an Euler discretization scheme. 
In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre and post training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# Bipartiete Mixed Separable States を用いた Ancilla-Assisted Process Tomography Ancilla-Assisted Process Tomography with Bipartiete Mixed Separable States ( http://arxiv.org/abs/2312.14901v3 ) ライセンス: Link先を確認	Zhuoran Bao, Daniel F. V. James,	(参考訳) システム状態とアシラリー状態の絡み合いは、アシラ支援プロセス断層撮影(AAPT)を行うための厳密な要件ではないことが示されている。代わりに、システム・アンシラ状態が忠実であることを要求するだけであり、実際には、状態を表すある行列の可逆性である。本稿は、量子過程に関する完全な情報を抽出できる状態が忠実であること、および2キュービットのシステム・アンシラ状態における単一キュービットの操作に制限されたプロセスについて述べる。本稿では,2つの量子ビットの相関関係を定量化する,可逆性問題とシニスターネスの概念を結びつける理論的解析について述べる。シニスターネス(Sinisterness)を用いて、運用上の意味に忠実であることが保証された2量子状態を構築し、プロセスの平均誤差に基づいて境界を推定する手法を導出する。我々の分析は、最大絡み合った状態が最小の誤差増幅を与えることに一致している。それでも、エンタングルメントの利点が始める数値領域をマップアウトする。 It has been shown that the entanglement between the system state and the ancillary state is not a strict requirement for performing ancilla-assisted process tomography(AAPT). Instead, it only requires that the system-ancilla state be faithful, which, in practice, is the invertibility of a certain matrix representing the state. Our paper takes on the operational definition of faithfulness, i.e., a state is faithful if one can extract complete information about the quantum process, and we restrict the process to single-qubit operations on a two-qubit system-ancilla state. We present a theoretical analysis to connect the invertibility problem to the concept of Sinisterness, which quantifies the correlation of two qubits. Using Sinisterness, we derive a way of constructing two-qubit states that are guaranteed to be faithful in an operational sense and estimate the bound on the average error of the process. Our analysis agrees that the maximally entangled states provided the smallest error amplification. Nevertheless, it maps out a numerical region where the advantage of the entanglement starts.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# コヒーレント散乱による2つの浮遊ナノ粒子の同時基底状態冷却 Simultaneous ground-state cooling of two levitated nanoparticles by coherent scattering ( http://arxiv.org/abs/2312.15898v2 ) ライセンス: Link先を確認	Yi Xu, Yu-Hong Liu, Cheng Liu, Jie-Qiao Liao,	(参考訳) 2つの浮遊ナノ粒子の同時基底状態冷却は、粒子の並進運動を伴う量子エンタングルメントや量子相関のような巨視的量子効果を研究するための重要な前提条件である。ここでは、共振器と浮遊粒子が結合した系を考察し、そのハミルトニアンの詳細な導出を示す。 2つの粒子の$y$方向の運動は共振器場および$x$方向と$z$方向の運動の両方から分離され、さらに粒子の位置を適切に選ぶことで、$z$方向の運動も共振器場および$x$方向の運動から分離できる。 3モードおよび5モードの共振器-浮遊粒子オプトメカニカルモデルにおいて、これらの機械モードの同時冷却を検討した。 2つのツイーザーのパワーが等しい場合、同時基底状態冷却を抑制するダークモード効果が存在することが分かった。それでも、適切なパラメータの下でダークモード効果を破ることにより、これらのモードの同時基底状態冷却を実現できる。本システムは、共振器-浮遊粒子オプトメカニカル系における量子効果と応用を研究するための汎用的なプラットフォームを提供する。 Simultaneous ground-state cooling of two levitated nanoparticles is a crucial prerequisite for investigation of macroscopic quantum effects such as quantum entanglement and quantum correlation involving translational motion of particles. Here we consider a coupled cavity-levitated-particle system and present a detailed derivation of its Hamiltonian. We find that the $y$-direction motions of the two particles are decoupled from the cavity field and both the $x$- and $z$-direction motions, and that the $z$-direction motions can be further decoupled from the cavity field and the $x$-direction motions by choosing proper locations of the particles. We study the simultaneous cooling of these mechanical modes in both the three-mode and five-mode cavity-levitated optomechanical models. It is found that there exists the dark-mode effect when the two tweezers have the same powers, which suppress the simultaneous ground-state cooling. Nevertheless, the simultaneous ground-state cooling of these modes can be realized by breaking the dark-mode effect under proper parameters. Our system provides a versatile platform to study quantum effects and applications in cavity-levitated optomechanical systems.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# 凸確率計画におけるサンプル平均近似のための計量エントロピーフリーなサンプル複雑度境界 Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming ( http://arxiv.org/abs/2401.00664v2 ) ライセンス: Link先を確認	Hongcheng Liu, Jindong Tong,	(参考訳) 本稿では、凸または強凸の確率計画(SP)問題の解法におけるサンプル平均近似(SAA)について検討する。いくつかの一般的な正則性条件の下で、おそらく初めて、SAAのサンプル複雑度が(被覆数の対数のような)計量エントロピーのいかなる定量化からも完全に解放されうることを示し、既存のほとんどの結果よりも次元$d$に関して大幅に効率的なレートを得る。新たに確立された複雑度境界から得られる重要な知見は、SPに対する2つの主流の解法であるSAAと正準な確率的ミラー降下(SMD)法が、ほぼ同一のサンプル効率のレートを持つことであり、これはSAAとSMDの間に長年存在した$O(d)$オーダーの理論的乖離を是正する。さらに本稿では、SAAが証明可能な有効性を維持する一方でSMDに対応する結果が未だ得られていない非リプシッツ的なシナリオを検討し、一部の不規則な設定においてSAAがより広く適用できる可能性を示す。 This paper studies the sample average approximation (SAA) in solving convex or strongly convex stochastic programming problems. Under some common regularity conditions, we show -- perhaps for the first time -- that the SAA's sample complexity can be completely free from any quantification of metric entropy (such as the logarithm of the covering number), leading to a significantly more efficient rate with dimensionality $d$ than most existing results. From the newly established complexity bounds, an important revelation is that the SAA and the canonical stochastic mirror descent (SMD) method, two mainstream solution approaches to SP, entail almost identical rates of sample efficiency, rectifying a long-standing theoretical discrepancy of the SAA from the SMD by the order of $O(d)$. Furthermore, this paper explores non-Lipschitzian scenarios where the SAA maintains provable efficacy, whereas corresponding results for the SMD remain unexplored, indicating the potential of the SAA's better applicability in some irregular settings.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
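For readers unfamiliar with sample average approximation, the idea in the abstract above can be sketched on a toy strongly convex problem. The example assumes the objective f(x, xi) = 0.5 * (x - xi)^2 with xi drawn from a Gaussian; the distribution, sample size, and function names are hypothetical and chosen only so that the SAA minimizer has a closed form (the sample mean). It does not reproduce the paper's bounds.

```python
import random

def saa_objective(x, samples):
    """Sample-average objective: (1/N) * sum_i 0.5 * (x - xi_i)^2."""
    return sum(0.5 * (x - xi) ** 2 for xi in samples) / len(samples)

def saa_minimizer(samples):
    """For this quadratic objective, the SAA problem is minimized by the sample mean."""
    return sum(samples) / len(samples)

rng = random.Random(0)                                   # fixed seed for repeatability
samples = [rng.gauss(3.0, 1.0) for _ in range(10_000)]   # hypothetical data, true mean 3.0

x_hat = saa_minimizer(samples)
print("SAA solution:", x_hat)
print("SAA objective at x_hat vs. at the true minimizer 3.0:",
      saa_objective(x_hat, samples), saa_objective(3.0, samples))
```

The true problem min_x E[0.5 * (x - xi)^2] is minimized at the mean of xi, so the gap between `x_hat` and 3.0 shrinks as the number of samples grows; sample complexity bounds quantify how many samples are needed for a given accuracy.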
# 光ワイドエリア通信ネットワークのための量子アニーリングにより実現したILPベースの資源最適化 -量子アニーリングによる実世界のアプリケーションにおける組合せ問題を解くためのフレームワーク ILP-based Resource Optimization Realized by Quantum Annealing for Optical Wide-area Communication Networks -- A Framework for Solving Combinatorial Problems of a Real-world Application by Quantum Annealing ( http://arxiv.org/abs/2401.00826v2 ) ライセンス: Link先を確認	Arthur Witt, Jangho Kim, Christopher Körber, Thomas Luu,	(参考訳) 広域インターネットネットワークのリソース割り当ては、本質的には組合せ最適化の問題であり、高速に解決すれば、ネットワークの有効性とロバスト性を高めるとともに、電力供給トランシーバからのエネルギー要求を最小限に抑えながら、インターネットプロトコールトラフィックのほぼリアルタイムな適応制御を提供できる。近年の研究では、D-Wave AdvantageTM量子アニールシステムに組み込むことができる2次非拘束二元最適化(QUBO)問題として、そのような問題をいかに実装できるかを実証し、原理実証を行った。我々の初期の研究は、システム実行パラメータの司法的選択によるD-Waveソリューションの改善の可能性を広げた。本稿では、これらのシステムのパラメータを最適化するための調査と、ソリューションの品質をさらに向上させるために機械学習(ML)技術をどのように組み込んだかについて報告する。特に,ハミング距離を用いて,様々なシステム実行パラメータと解ベクトルの相関関係について検討する。次に、これらの相関関係を学習するために決定木ニューラルネットワーク(NN)を適用し、ニューラルネットワークを使用して解ベクトルにさらなる推測を提供する。我々は、このNNを単純な整数線形プログラミング(ILP)の例で実装し、D-Waveが取得しなかった解空間をNNが完全にマッピングする方法を実証した。しかし、3ノードネットワークの問題に対して、NNはソリューションの空間の質を高めることができない。 Resource allocation of wide-area internet networks is inherently a combinatorial optimization problem that if solved quickly, could provide near real-time adaptive control of internet-protocol traffic ensuring increased network efficacy and robustness, while minimizing energy requirements coming from power-hungry transceivers. In recent works we demonstrated how such a problem could be cast as a quadratic unconstrained binary optimization (QUBO) problem that can be embedded onto the D-Wave AdvantageTM quantum annealer system, demonstrating proof of principle. Our initial studies left open the possibility for improvement of D-Wave solutions via judicious choices of system run parameters. Here we report on our investigations for optimizing these system parameters, and how we incorporate machine learning (ML) techniques to further improve on the quality of solutions. 
In particular, we use the Hamming distance to investigate correlations between various system-run parameters and solution vectors. We then apply a decision tree neural network (NN) to learn these correlations, with the goal of using the neural network to provide further guesses to solution vectors. We successfully implement this NN in a simple integer linear programming (ILP) example, demonstrating how the NN can fully map out the solution space that was not captured by D-Wave. We find, however, for the 3-node network problem the NN is not able to enhance the quality of space of solutions.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
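To make the QUBO and Hamming-distance terminology in the abstract above concrete, here is a minimal sketch. It assumes a QUBO given as a dictionary of coefficients, solves a three-variable instance by brute force (not by quantum annealing), and measures how far a candidate bit vector is from the optimum. The coefficients and the candidate vector are hypothetical and unrelated to the network problems in the paper.

```python
from itertools import product

def qubo_energy(Q, x):
    """Energy of binary vector x under QUBO coefficients Q[(i, j)]."""
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(u != v for u, v in zip(a, b))

def brute_force(Q, n):
    """Exhaustively search all 2^n bit vectors (only viable for tiny n)."""
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# Hypothetical 3-variable QUBO: diagonal terms are linear costs,
# the (0, 1) term penalizes setting both variables to 1.
Q = {(0, 0): -1, (1, 1): -2, (0, 1): 2, (2, 2): 1}

best = brute_force(Q, 3)
candidate = (1, 1, 0)   # e.g. a sub-optimal sample returned by a heuristic solver
print("optimum:", best, "energy:", qubo_energy(Q, best))
print("candidate energy:", qubo_energy(Q, candidate),
      "Hamming distance to optimum:", hamming(candidate, best))
```

Comparing sampled solutions to a reference by Hamming distance, as sketched here, is one simple way to relate solver settings to solution quality.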
# 高解像度二値画像セグメンテーションのための両側参照 Bilateral Reference for High-Resolution Dichotomous Image Segmentation ( http://arxiv.org/abs/2401.03407v3 ) ライセンス: Link先を確認	Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe,	(参考訳) 高解像度二値画像セグメンテーション(DIS)のための新しい両側参照フレームワーク(BiRefNet)を導入する。本フレームワークは、2つの基本コンポーネント、すなわち局所化モジュール(LM)と、提案する両側参照(BiRef)を備えた再構成モジュール(RM)から構成される。 LMはグローバルな意味情報を用いたオブジェクトの位置特定を支援する。 RM内では再構成プロセスにBiRefを利用し、画像の階層的パッチがソース参照を、勾配マップがターゲット参照を提供する。これらのコンポーネントが協調して最終的な予測マップを生成する。また、より細かいディテールを持つ領域への注目を高めるために、補助的な勾配監督を導入する。さらに、マップの品質と訓練プロセスを改善するために、DISに適した実践的な訓練戦略を概説する。提案手法の汎用性を検証するため、4つのタスクについて広範な実験を行い、BiRefNetがすべてのベンチマークにおいてタスク固有の最先端手法を上回る優れた性能を示すことを確認した。私たちのコードは https://github.com/ZhengPeng7/BiRefNet で公開されています。 We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at https://github.com/ZhengPeng7/BiRefNet.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# シュワルツシルト時空における真の三部体の非局所性と絡み合いのデコヒーレンス下での増幅 Amplification of genuine tripartite nonlocality and entanglement in the Schwarzschild spacetime under decoherence ( http://arxiv.org/abs/2401.04407v2 ) ライセンス: Link先を確認	Chunyao Liu, Zhengwen Long, Qiliang He,	(参考訳) シュワルツシルトブラックホールの背景における真の三部体非局所性(GTN)と真の三部体エンタングルメント(GTE)のデコヒーレンス下での局所濾過操作による増幅について検討した。物理的にアクセス可能なGTNはデコヒーレンスによって完全に破壊され、物理的にアクセス可能なGTNがシステム内に存在しないことが示されている。特に、局所フィルタリング操作は、物理的にアクセス可能なGTNを一定範囲のホーキング温度内に表示させることができ、すなわち、局所フィルタリング操作は、物理的にアクセス可能なGTNを、以前に発見されておらず、量子情報処理の恩恵を受ける環境と結合したシステム内で生成することができる。さらに、物理的にアクセス可能なGTEは、ほとんどの場合において無限のホーキング温度の極限で安定な値に近づき、もしデコヒーレンスパラメータ$p$が1未満であれば、デコヒーレンス強度が十分大きい場合には、GTEの ‘sudden death'' が発生する。なお、非ゼロ安定値のGTEは、デコヒーレンスの存在下であっても、局所フィルタリング操作を行うことで増大させることができる。最後に, 物理的に到達不能なGTNとGTEの生成をデコヒーレンス下で検討し, 物理的に到達不能なGTNは生成できないが, 物理的に到達不能なGTEは生成可能であることを示した。さらに, 局所フィルタリング操作を適用することにより, 生成した物理的に到達不能なGTEを増大させることができる。 We investigate the amplification of the genuine tripartite nonlocality(GTN) and the genuine tripartite entanglement(GTE) of Dirac particles in the background of a Schwarzschild black hole by a local filtering operation under decoherence. It is shown that the physically accessible GTN will be completely destroyed by decoherence, which means that the physically accessible GTN will not exist in the system. Particularly, the local filtering operation can make the physically accessible GTN appear within a certain range of Hawking temperature, namely, the local filtering operation can cause the physically accessible GTN to be generated in the system coupled with the environment, which is not discovered before and is benefit for the quantum information processing. Furthermore, we also find that the physically accessible GTE approaches a stable value in the limit of infinite Hawking temperature for most cases, but if the decoherence parameter $p$ is less than 1, the ``sudden death'' of GTE will take place when the decoherence strength is large enough. 
It is worth noting that the nonzero stable value of GTE can be increased by performing the local filtering operation, even in the presence of decoherence. Finally, we explore the generation of physically inaccessible GTN and GTE of other tripartite subsystems under decoherence, it is shown that the physically inaccessible GTN cannot be produced, but the physically inaccessible GTE can be produced. In addition, we can see that the generated physically inaccessible GTE can be increased by applying the local filtering operation.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# データ駆動物理インフォームドニューラルネットワーク:デジタルツインの視点 Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective ( http://arxiv.org/abs/2401.08667v4 ) ライセンス: Link先を確認	Sunwoong Yang, Hojin Kim, Yoonpyo Hong, Kwanjung Yee, Romit Maulik, Namwoo Kang,	(参考訳) 本研究では、物理インフォームドニューラルネットワーク(PINN)によるデジタルツイン(DT)の実現可能性を様々な観点から検討する。まず、手動のメッシュ生成なしに仮想表現を自動構築できるPINNのメッシュフリーフレームワークにおいて、コロケーション点に対する様々な適応サンプリング手法の有効性を検証した。次に、DTシナリオで取得したデータセットを活用できるデータ駆動型PINN(DD-PINN)フレームワークの全体的な性能を検討する。より一般的な物理へのスケーラビリティはパラメトリックなナビエ・ストークス方程式で検証され、レイノルズ数が変化してもPINNを再訓練する必要はない。また、実際には忠実度や疎度の異なるデータセットが収集されることが多いため、マルチフィデリティDD-PINNも提案・評価した。これらは外挿タスクにおいても顕著な予測性能を示し、シングルフィデリティアプローチに比べて$42\sim62\%$改善した。最後に、予測の不確かさを正確に測ることが重要なDTにおける可能性を検証するため、マルチフィデリティDD-PINNの不確実性定量化性能をアンサンブル法により検討した。本研究で検討したDD-PINNフレームワークは、上記の観点から従来のPINNよりもDTシナリオに適していることが分かり、エンジニアをシームレスなDTの実現に一歩近づける。 This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, the overall performance of the data-driven PINNs (DD-PINNs) framework is examined, which can utilize the acquired datasets in DT scenarios. Its scalability to more general physics is validated within parametric Navier-Stokes equations, where PINNs do not need to be retrained as the Reynolds number varies. In addition, since datasets can be often collected from different fidelity/sparsity in practice, multi-fidelity DD-PINNs are also proposed and evaluated. They show remarkable prediction performance even in the extrapolation tasks, with $42\sim62\%$ improvement over the single-fidelity approach. 
Finally, the uncertainty quantification performance of multi-fidelity DD-PINNs is investigated by the ensemble method to verify their potential in DT, where an accurate measure of predictive uncertainty is critical. The DD-PINN frameworks explored in this study are found to be more suitable for DT scenarios than traditional PINNs from the above perspectives, bringing engineers one step closer to seamless DT realization.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# 分布的ランダムネットワーク蒸留による探索と反探索 Exploration and Anti-Exploration with Distributional Random Network Distillation ( http://arxiv.org/abs/2401.09750v4 ) ライセンス: Link先を確認	Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li,	(参考訳) エージェントが未知の環境で高いリターンを得るための深層強化学習において、探索は依然として重要な課題である。広く使われている探索手法であるランダムネットワーク蒸留(RND)アルゴリズムは、多くの環境で有効であることが示されているが、ボーナス割り当てにおける識別力が不足しがちである。本稿では、RNDにおける「ボーナス不整合」の問題を取り上げ、その主な限界を特定する。この問題に対処するために、RNDの派生であるDistributional RND(DRND)を導入する。 DRNDは、ランダムネットワークの分布を蒸留し、疑似カウントを暗黙的に取り入れてボーナス割り当ての精度を向上させることにより、探索プロセスを強化する。この改良により、エージェントはより広範な探索を行うようになる。本手法は、大きな計算オーバーヘッドを伴わずに不整合問題を効果的に軽減する。理論的解析と実験結果の両方が、元のRNDアルゴリズムに対する本手法の優位性を示している。本手法は、困難なオンライン探索シナリオで優れた性能を示し、D4RLオフラインタスクでは反探索機構として効果的に機能する。私たちのコードは https://github.com/yk7333/DRND で公開されています。 Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.	翻訳日:2024-05-21 23:40:18 公開日:2024-05-20
# PatchAD: 時系列異常検出のための軽量パッチベースMLPミキサ PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection ( http://arxiv.org/abs/2401.09793v4 ) ライセンス: Link先を確認	Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, Kaixiang Yang,	(参考訳) 時系列解析における異常検出は重要な課題であるが、ラベルが不足する状況で正常パターンと異常パターンを識別することは難しい。従来の研究の多くは再構成に基づくアプローチを採用しており、これがモデルの表現能力を制限している。さらに、既存のディープラーニングベースの手法は十分に軽量ではない。これらの問題に対処するため、表現抽出と異常検出にコントラスト学習を利用する、新しい高効率なマルチスケールパッチベースのMLP-MixerアーキテクチャであるPatchADを提案する。 4つの異なるMLPミキサーと革新的なデュアルプロジェクト制約モジュールにより、PatchADは潜在的なモデル劣化を軽減し、わずか$3.2$MBで済む軽量なソリューションを提供する。その有効性は、異なるアプリケーションシナリオから得られた$9$個のデータセットにおける最先端の結果によって実証され、$30$を超える比較アルゴリズムを上回っている。 PatchADは古典的なF1スコアを$50.5\%$、Aff-F1スコアを$7.8\%$、AUCを$10.0\%$と大幅に改善する。コードは公開されている。 \url{https://github.com/EmorZz1G/PatchAD} Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. While prior studies have largely employed reconstruction-based approaches, which limits the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. Addressing these issues, we present PatchAD, our novel, highly efficient multiscale patch-based MLP-Mixer architecture that utilizes contrastive learning for representation extraction and anomaly detection. With its four distinct MLP Mixers and innovative dual project constraint module, PatchAD mitigates potential model degradation and offers a lightweight solution, requiring only $3.2$MB. Its efficacy is demonstrated by state-of-the-art results across $9$ datasets sourced from different application scenarios, outperforming over $30$ comparative algorithms. PatchAD significantly improves the classical F1 score by $50.5\%$, the Aff-F1 score by $7.8\%$, and the AUC by $10.0\%$. The code is publicly available. \url{https://github.com/EmorZz1G/PatchAD}	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# Marabou 2.0: ニューラルネットワークのVersatile形式分析ツール Marabou 2.0: A Versatile Formal Analyzer of Neural Networks ( http://arxiv.org/abs/2401.14461v2 ) ライセンス: Link先を確認	Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, Pei Huang, Ori Lahav, Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark Barrett,	(参考訳) 本稿では,ニューラルネットワークの形式解析のためのMarabouフレームワークのバージョン2.0の包括的システム記述として機能する。ツールのアーキテクチャ設計について議論し、最初のリリース以降に導入された主要な機能とコンポーネントを強調します。 This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# 古典および量子リザバーコンピューティングの統一的な普遍性条件 Universality conditions of unified classical and quantum reservoir computing ( http://arxiv.org/abs/2401.15067v3 ) ライセンス: Link先を確認	Francesco Monzani, Enrico Prati,	(参考訳) リザバーコンピューティングは、力学系(リザバー)の非線形ダイナミクスを利用して時間依存情報を効率的に処理する、計算神経科学と機械学習における汎用的なパラダイムである。導入以来、様々なアプリケーションで顕著な能力を発揮してきた。広く知られているように、リザバーコンピュータのクラスは、フェーディングメモリを持つ汎関数の普遍近似器として機能する。そのような普遍クラスの構成はしばしば文脈固有に見えるが、実際には同じ原理に従っている。ここでは、統一された理論的枠組みを示し、普遍性を確保するための既製の設定を提案する。この結果を、新たに登場しつつある量子リザバーコンピューティングの文脈で検証する。本解析は、古典および量子リザバーコンピューティングの統一的な見方に光を当てる。 Reservoir computing is a versatile paradigm in computational neuroscience and machine learning, that exploits the non-linear dynamics of a dynamical system - the reservoir - to efficiently process time-dependent information. Since its introduction, it has exhibited remarkable capabilities in various applications. As widely known, classes of reservoir computers serve as universal approximators of functionals with fading memory. The construction of such universal classes often appears context-specific, but in fact, they follow the same principles. Here we present a unified theoretical framework and we propose a ready-made setting to secure universality. We test the result in the arising context of quantum reservoir computing. The analysis sheds light on a unified view of classical and quantum reservoir computing.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# チェン・フリース級数によるニューラルODEのラデマッハ複雑度 Rademacher Complexity of Neural ODEs via Chen-Fliess Series ( http://arxiv.org/abs/2401.16655v3 ) ライセンス: Link先を確認	Joshua Hanson, Maxim Raginsky,	(参考訳) 本稿では、非線形ODEに対するChen-Fliess級数展開を用いて、連続深度ニューラルODEモデルを単層の無限幅ネットとして定式化できることを示す。このネットでは、出力の ``weights'' は制御入力のシグネチャから取られる。シグネチャは無限次元の経路をテンソルの列として表現するための道具であり、単体上での制御入力の反復積分から構成される。 ``features'' は、制御されたODEモデルのベクトル場に関する出力関数の反復リー微分である。本研究の主な成果は、このフレームワークを適用して、初期条件をある終端時刻におけるスカラー出力に写像するODEモデルのラデマッハ複雑度のコンパクトな表現を導出することである。この結果は、単層アーキテクチャがもたらす解析の簡明さを活用している。最後に、いくつかの特定のシステムに対してこの上界を具体化する例を示し、今後の発展の可能性について議論する。 We show how continuous-depth neural ODE models can be framed as single-layer, infinite-width nets using the Chen--Fliess series expansion for nonlinear ODEs. In this net, the output ``weights'' are taken from the signature of the control input -- a tool used to represent infinite-dimensional paths as a sequence of tensors -- which comprises iterated integrals of the control input over a simplex. The ``features'' are taken to be iterated Lie derivatives of the output function with respect to the vector fields in the controlled ODE model. The main result of this work applies this framework to derive compact expressions for the Rademacher complexity of ODE models that map an initial condition to a scalar output at some terminal time. The result leverages the straightforward analysis afforded by single-layer architectures. We conclude with some examples instantiating the bound for some specific systems and discuss potential follow-up work.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# Scheduled Curiosity-Deep Dyna-Q:対話ポリシー学習のための効率的な探索 Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning ( http://arxiv.org/abs/2402.00085v2 ) ライセンス: Link先を確認	Xuecheng Niu, Akinori Ito, Takashi Nose,	(参考訳) 強化学習に基づくタスク指向対話エージェントの訓練には時間がかかり、実ユーザとの多数の対話を必要とする。限られた対話経験の中でいかに対話ポリシーを獲得するかは、依然としてエージェントの訓練効率を下げる障害である。さらに、従来のほとんどのフレームワークは訓練サンプルをランダムに選択して訓練を開始するが、これは人間の学習方法とは異なり、訓練の効率と安定性を損なう。そこで本研究では、最先端のモデルベース強化学習対話モデルであるDeep Dyna-Q(DDQ)に基づく、好奇心駆動型カリキュラム学習フレームワークScheduled Curiosity-Deep Dyna-Q(SC-DDQ)を提案する。さらに、古典的カリキュラム学習とその逆バージョンという2つの相反する訓練戦略に従って、SC-DDQとDDQそれぞれの学習スケジュールを設計した。その結果、スケジュール学習と好奇心を導入することで、本フレームワークはDDQおよびDeep Q-learning(DQN)を大幅に上回ることが分かった。驚くべきことに、従来のカリキュラム学習は必ずしも有効ではなかった。具体的には、実験結果によると、SC-DDQとDDQにはeasy-first戦略とdifficult-first戦略がより適している。結果を分析するため、サンプリングされた行動のエントロピーを用いて行動探索を表現したところ、第1段階でエントロピーが高く最終段階でエントロピーが低い訓練戦略がより良い性能につながることが分かった。 Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users. How to grasp dialog policy within limited dialog experiences remains an obstacle that makes the agent training process less efficient. In addition, most previous frameworks start training by randomly choosing training samples, which differs from the human learning method and hurts the efficiency and stability of training. Therefore, we propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a curiosity-driven curriculum learning framework based on a state-of-the-art model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ). Furthermore, we designed learning schedules for SC-DDQ and DDQ, respectively, following two opposite training strategies: classic curriculum learning and its reverse version. Our results show that by introducing scheduled learning and curiosity, the new framework leads to a significant improvement over the DDQ and Deep Q-learning(DQN). Surprisingly, we found that traditional curriculum learning was not always effective. 
Specifically, according to the experimental results, the easy-first and difficult-first strategies are more suitable for SC-DDQ and DDQ. To analyze our results, we adopted the entropy of sampled actions to depict action exploration and found that training strategies with high entropy in the first stage and low entropy in the last stage lead to better performance.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# 組合せ最適化のためのハイパーヒューリスティックスとしての大規模言語モデル Large Language Models as Hyper-Heuristics for Combinatorial Optimization ( http://arxiv.org/abs/2402.01145v2 ) ライセンス: Link先を確認	Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, Guojie Song,	(参考訳) NP困難な組合せ最適化問題(COP)が至る所に存在するため、ドメインの専門家は試行錯誤によるヒューリスティック設計を余儀なくされている。設計自動化の長年の取り組みは、大規模言語モデル(LLM)の台頭とともに新たな勢いを得ている。本稿では、Language Hyper-Heuristics(LHH)を導入する。LHHは、LLMをヒューリスティック生成に活用するハイパーヒューリスティックスの新しい変種であり、最小限の手動介入とオープンエンドなヒューリスティック空間を特徴とする。 LHHを強化するため、ヒューリスティック空間を効率的に探索する進化探索と、空間内で言語的勾配を与えるLLMリフレクションを新たに統合したReflective Evolution(ReEvo)を提案する。 5つの異種アルゴリズムタイプ、6つの異なるCOP、およびCOPのホワイトボックスとブラックボックスの両方の視点において、ReEvoは最先端かつ競争力のあるメタヒューリスティック、進化アルゴリズム、ヒューリスティック、ニューラルソルバを生み出し、従来のLHHよりもサンプル効率が高い。私たちのコードは https://github.com/ai4co/LLM-as-HH で利用可能です。 The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a novel integration of evolutionary search for efficiently exploring the heuristic space, and LLM reflections to provide verbal gradients within the space. Across five heterogeneous algorithmic types, six different COPs, and both white-box and black-box views of COPs, ReEvo yields state-of-the-art and competitive meta-heuristics, evolutionary algorithms, heuristics, and neural solvers, while being more sample-efficient than prior LHHs. Our code is available: https://github.com/ai4co/LLM-as-HH.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# 多様性が集団意思決定に及ぼす影響 The effect of diversity on group decision-making ( http://arxiv.org/abs/2402.01427v2 ) ライセンス: Link先を確認	Georgi Karadzhov, Andreas Vlachos, Tom Stafford,	(参考訳) 認知の多様性の異なる側面と、それが集団検討の成功に与える影響を考察する。これを評価するために、Wason Card SelectionタスクであるDeliData corpusについて議論する小さなオンライングループから500の対話を使用します。コーパスを活用することで,認知多様性の3つの異なる尺度を定量的に評価する。まず,多様性のプロキシ尺度としてグループサイズの影響を分析する。第2に、初期アイデアプールのサイズの影響を評価する。最後に、議論の解決策、議論のパターン、そして会話の探索がそれらの特性をどのように改善するかを分析し、議論の内容について考察する。複合バイアスに対するグループの評価にもかかわらず、小集団は対話を通じて直感的なバイアスを克服し、個人の意思決定を改善することができることを示す。大規模なサンプルと異なる運用方法を通じて、より認知的な多様性がより成功したグループ熟考と結びついていることが一貫して分かる。分析に使用されるコードとデータは、リポジトリで利用可能である。 We explore different aspects of cognitive diversity and its effect on the success of group deliberation. To evaluate this, we use 500 dialogues from small, online groups discussing the Wason Card Selection task - the DeliData corpus. Leveraging the corpus, we perform quantitative analysis evaluating three different measures of cognitive diversity. First, we analyse the effect of group size as a proxy measure for diversity. Second, we evaluate the effect of the size of the initial idea pool. Finally, we look into the content of the discussion by analysing discussed solutions, discussion patterns, and how conversational probing can improve those characteristics. Despite the reputation of groups for compounding bias, we show that small groups can, through dialogue, overcome intuitive biases and improve individual decision-making. Across a large sample and different operationalisations, we consistently find that greater cognitive diversity is associated with more successful group deliberation. Code and data used for the analysis are available in the repository: https://github.com/gkaradzhov/cognitive-diversity-groups-cogsci24.	翻訳日:2024-05-21 23:30:28 公開日:2024-05-20
# Bellman Infinity-error を用いた最適対向ロバストQ-ラーニングに向けて Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error ( http://arxiv.org/abs/2402.02165v2 ) ライセンス: Link先を確認	Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao,	(参考訳) 強固な政策を確立することは、深層強化学習(DRL)エージェントに影響を及ぼす攻撃や妨害に対抗するために不可欠である。近年の研究では、国家と対立するロバスト性を探求し、最適ロバストポリシー(ORP)の潜在的な欠如を示唆し、厳密なロバスト性制約を設定する上での課題を提起している。はじめに、マルコフ決定過程における最適な行動は、経験的および理論的証拠によって支えられた小さな摂動と一貫しているというポリシー(CAP)の一貫性の仮定を導入する。 CAPを基盤として,ベルマン最適政策に適合する決定論的かつ定常なORPの存在を決定的に証明する。さらに、ベルマン誤差を最小限に抑えるために、$L^{\infty}$-normの必要性を述べる。この発見は、ベルマン最適ポリシーを$L^{1}$-normでターゲットとする従来のDRLアルゴリズムの脆弱性を明らかにし、ベルマンインフィニティエラーのサロゲートを最小化することにより、一貫性のある逆ロバスト深度Q-Network (CAR-DQN) のトレーニングを動機付ける。 CAR-DQNの様々なベンチマークにおける最上位性能は、その実用性を検証し、理論解析の健全性を補強する。 Establishing robust policies is essential to counter attacks or disturbances affecting deep reinforcement learning (DRL) agents. Recent studies explore state-adversarial robustness and suggest the potential lack of an optimal robust policy (ORP), posing challenges in setting strict robustness constraints. This work further investigates ORP: At first, we introduce a consistency assumption of policy (CAP) stating that optimal actions in the Markov decision process remain consistent with minor perturbations, supported by empirical and theoretical evidence. Building upon CAP, we crucially prove the existence of a deterministic and stationary ORP that aligns with the Bellman optimal policy. Furthermore, we illustrate the necessity of $L^{\infty}$-norm when minimizing Bellman error to attain ORP. This finding clarifies the vulnerability of prior DRL algorithms that target the Bellman optimal policy with $L^{1}$-norm and motivates us to train a Consistent Adversarial Robust Deep Q-Network (CAR-DQN) by minimizing a surrogate of Bellman Infinity-error. The top-tier performance of CAR-DQN across various benchmarks validates its practical effectiveness and reinforces the soundness of our theoretical analysis.	
翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
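The abstract above contrasts an averaged Bellman error with the worst-case (L-infinity) Bellman error. The difference can be seen in a minimal tabular sketch. It assumes a hypothetical two-state, two-action deterministic MDP and uses the mean absolute residual as a normalized L1-style error; it only illustrates why a small average error can hide a large worst-case error and is not the paper's CAR-DQN surrogate objective.

```python
def bellman_residual(Q, P, R, gamma):
    """Bellman optimality residual R(s,a) + gamma * max_a' Q(s',a') - Q(s,a)
    for a deterministic MDP, where P[s][a] is the next state."""
    return [
        [R[s][a] + gamma * max(Q[P[s][a]]) - Q[s][a] for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]

def l1_error(residual):
    """Mean absolute residual (a normalized L1-style error)."""
    flat = [abs(v) for row in residual for v in row]
    return sum(flat) / len(flat)

def linf_error(residual):
    """Largest absolute residual over all state-action pairs."""
    return max(abs(v) for row in residual for v in row)

# Hypothetical MDP: action a always leads to state a.
P = [[0, 1], [0, 1]]
R = [[0.0, 1.0], [0.0, 2.0]]
gamma = 0.5

Q_star = [[1.5, 3.0], [1.5, 4.0]]   # exact fixed point of the Bellman operator
Q_off = [[1.5, 3.0], [2.5, 4.0]]    # same table with a single entry off by 1

for name, Q in (("Q_star", Q_star), ("Q_off", Q_off)):
    res = bellman_residual(Q, P, R, gamma)
    print(name, "mean-abs error:", l1_error(res), "L-inf error:", linf_error(res))
```

For `Q_off` the averaged error is a quarter of the worst-case error, because only one of the four state-action pairs is wrong; an objective that controls the maximum residual penalizes that single entry in full.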
# RSCNet: クラウドベースのWiFiセンシングのための動的CSI圧縮 RSCNet: Dynamic CSI Compression for Cloud-based WiFi Sensing ( http://arxiv.org/abs/2402.04888v2 ) ライセンス: Link先を確認	Borna Barahimi, Hakam Singh, Hina Tabassum, Omer Waqar, Mohammad Omer,	(参考訳) WiFi対応のIoT(Internet-of-Things)デバイスは、チャネル状態情報(CSI)抽出機能を活用して、単なる通信デバイスからセンシング機器へと進化している。しかし、IoTデバイスのリソース制約とディープニューラルネットワークの複雑さのため、センシングのためにCSIをクラウドサーバに送信する必要がある。これは実現可能ではあるが、かなりの通信オーバーヘッドにつながる。本稿では、圧縮されたCSIによるセンシングを可能にし、通信オーバーヘッドを低減する新しいReal-time Sensing and Compression Network(RSCNet)を開発する。 RSCNetは、少数のCSIフレームからなるCSIウィンドウをまたいだ最適化を容易にする。クラウドサーバへの送信後は、Long Short-Term Memory (LSTM) ユニットを用いて過去のウィンドウのデータを活用し、センシング精度とCSI再構成の両方を向上させる。 RSCNetは、CSI圧縮とセンシング精度のトレードオフを適切にバランスさせ、通信コストを削減しつつリアルタイムのクラウドベースWiFiセンシングを効率化する。数値結果は、SenseFiのような既存のベンチマークに対するRSCNetの優位性を示し、最小限のCSI再構成誤差で97.4%のセンシング精度を達成した。また、CSIフレーム数の関数として、提案するRSCNetの計算量解析も示す。 WiFi-enabled Internet-of-Things (IoT) devices are evolving from mere communication devices to sensing instruments, leveraging Channel State Information (CSI) extraction capabilities. Nevertheless, resource-constrained IoT devices and the intricacies of deep neural networks necessitate transmitting CSI to cloud servers for sensing. Although feasible, this leads to considerable communication overhead. In this context, this paper develops a novel Real-time Sensing and Compression Network (RSCNet) which enables sensing with compressed CSI; thereby reducing the communication overheads. RSCNet facilitates optimization across CSI windows composed of a few CSI frames. Once transmitted to cloud servers, it employs Long Short-Term Memory (LSTM) units to harness data from prior windows, thus bolstering both the sensing accuracy and CSI reconstruction. RSCNet adeptly balances the trade-off between CSI compression and sensing precision, thus streamlining real-time cloud-based WiFi sensing with reduced communication costs. 
Numerical findings demonstrate the gains of RSCNet over the existing benchmarks like SenseFi, showcasing a sensing accuracy of 97.4% with minimal CSI reconstruction error. Numerical results also show a computational analysis of the proposed RSCNet as a function of the number of CSI frames.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# 浅部ReLU様ニューラルネットワークのランドスケープ:静止点,サドルエスケープ,ネットワーク埋め込み Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding ( http://arxiv.org/abs/2402.05626v2 ) ライセンス: Link先を確認	Zhengqing Wu, Berfin Simsek, Francois Ged,	(参考訳) 本稿では,経験的二乗損失を学習したReLU様活性化関数を持つ一層ニューラルネットワークの損失状況について検討する。アクティベーション関数は微分不可能であるため、固定点を完全に特徴づける方法は今のところ不明である。非微分可能ケースと微分可能ケースの両方に適用可能な定常条件を提案する。さらに、定常点が一階条件で定義される「エスケープニューロン」を含まない場合、局所最小値でなければならないことを示す。さらに、スカラーアウトプットの場合、エスケープニューロンの存在は、静止点が局所的な最小値でないことを保証している。その結果,浅部ReLU様ネットワークに対する無限小の初期化から始まり,サドルからサドルまでのトレーニングプロセスの記述を洗練し,サドルから脱出したニューロンのパラメータ変化と直接関連付けることができた。さらに、より広いネットワーク内でより狭いネットワークをインスタンス化するネットワーク埋め込みが、静止点を再設定する方法について、十分に議論することができる。 In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# 事実の結合が誤謬を生む: 長文生成における集約された事実主張の矛盾性の評価 Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations ( http://arxiv.org/abs/2402.05629v3 ) ライセンス: Link先を確認	Cheng-Han Chiang, Hung-yi Lee,	(参考訳) 大規模言語モデル(LLM)による長文生成には、事実に基づく主張と事実に反する主張が混在しており、事実性の評価が困難である。よりきめ細かな方法で長文生成の事実精度を評価するために、先行研究では、長文生成を複数の検証可能な事実に分解し、それらの事実を独立に検証することを提案している。生成の事実性は、すべての事実に占める検証可能な事実の割合として定義される。このような手法は、事実に基づく主張を組み合わせれば事実に基づく段落になると仮定している。本稿では,エンティティの曖昧さによってこの仮定が破られる可能性があることを示す。 LLMは検証可能な事実を含む段落を生成できるが、エンティティの曖昧さのために、それらの事実が組み合わさって非事実的な段落を形成することがある。さらに、FActScoreや引用リコールを含む既存の事実精度指標が、これらの非事実的な段落の事実性を適切に評価できないことも明らかにする。そこで本研究では,曖昧なエンティティを含むコンテンツ向けに設計した拡張指標D-FActScoreを提案する。検索拡張生成(RAG)で生成された人物伝記のD-FActScoreを評価する。 D-FActScoreは、エンティティの曖昧さを含む段落の事実性をFActScoreよりも適切に評価できることを示す。また,広く使われている4つのオープンソースLLMが、異なるエンティティの情報を混合して非事実的な段落を形成する傾向にあることも確認した。 Long-form generations from large language models (LLMs) contain a mix of factual and non-factual claims, which makes factuality difficult to evaluate. To evaluate the factual precision of long-form generations in a more fine-grained way, prior works propose to decompose long-form generations into multiple verifiable facts and verify those facts independently. The factuality of the generation is the proportion of verifiable facts among all the facts. Such methods assume that combining factual claims forms a factual paragraph. This paper shows that the assumption can be violated due to entity ambiguity. We show that LLMs can generate paragraphs that contain verifiable facts, but the facts are combined to form a non-factual paragraph due to entity ambiguity. We further reveal that existing factual precision metrics, including FActScore and citation recall, cannot properly evaluate the factuality of these non-factual paragraphs. To address this, we introduce an enhanced metric, D-FActScore, specifically designed for content with ambiguous entities. 
We evaluate the D-FActScores of people biographies generated with retrieval-augmented generation (RAG). We show that D-FActScore can better assess the factuality of paragraphs with entity ambiguity than FActScore. We also find that four widely used open-source LLMs tend to mix information of distinct entities to form non-factual paragraphs.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
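The failure mode described in the abstract above, where individually verifiable facts add up to a non-factual paragraph, can be sketched with a toy scorer. This is only an illustration under stated assumptions: the `AtomicFact` record, its `entity_id` field and both scoring functions are hypothetical, and `single_entity_precision` is a made-up stricter variant, not the D-FActScore definition from the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AtomicFact:
    text: str
    supported: bool   # True if the fact was verified against some knowledge source
    entity_id: str    # hypothetical: which real-world entity the supporting evidence describes


def fact_precision(facts):
    """Fraction of atomic facts that are individually supported."""
    if not facts:
        raise ValueError("at least one atomic fact is required")
    return sum(f.supported for f in facts) / len(facts)


def single_entity_precision(facts):
    """Illustrative stricter score: credit supported facts of the dominant entity only."""
    if not facts:
        raise ValueError("at least one atomic fact is required")
    supported = [f for f in facts if f.supported]
    if not supported:
        return 0.0
    _, dominant_count = Counter(f.entity_id for f in supported).most_common(1)[0]
    return dominant_count / len(facts)


# A made-up paragraph that blends two different people who share one name.
facts = [
    AtomicFact("John Smith was born in 1950.", True, "smith_politician"),
    AtomicFact("John Smith served as a senator.", True, "smith_politician"),
    AtomicFact("John Smith released three albums.", True, "smith_musician"),
    AtomicFact("John Smith toured Europe in 1999.", True, "smith_musician"),
]

print("per-fact precision:", fact_precision(facts))
print("single-entity precision:", single_entity_precision(facts))
```

Every fact here is supported by some source, so the per-fact score is perfect even though no single person matches the whole paragraph. The stricter toy score drops to one half, since only two of the four facts describe the same entity.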
# REMEDI: 改良されたニューラルエントロピー推定のための補正変換 REMEDI: Corrective Transformations for Improved Neural Entropy Estimation ( http://arxiv.org/abs/2402.05718v2 ) ライセンス: Link先を確認	Viktor Nilsson, Anirban Samaddar, Sandeep Madireddy, Pierre Nyquist,	(参考訳) 情報理論量は、機械学習において中心的な役割を果たす。近年、データとモデルの複雑さの増大により、これらの量の正確な推定に対する需要が高まっている。しかし、次元が大きくなるにつれて、既存の手法は比較的低次元で既に苦労しているため、推定には大きな課題が生じる。この問題に対処するために、基本的な情報理論量である微分エントロピーの効率的かつ正確な推定のために$\texttt{REMEDI}$を導入する。このアプローチは、単純で適応的なベースモデルに対するクロスエントロピーの最小化と、データ密度から相対エントロピーの観点からそれらの偏差を推定するものである。提案手法は, 合成データと自然データの両方において, エントロピー推定を包含して, 幅広い推定課題にまたがる改善を実証する。さらに、我々は、重要な理論的整合性の結果を、我々のアプローチで要求されるより一般化された設定にまで拡張する。本稿では,このフレームワークを情報理論教師あり学習モデルに自然に拡張する方法について述べる。本手法は既存のインフォメーション・ボトルネック法と比較して精度がよいことを示す。さらに、$\texttt{REMEDI}$と、リジェクションサンプリングとランゲヴィンダイナミクスを用いた生成モデリングとの自然な関係について検討する。 Information theoretic quantities play a central role in machine learning. The recent surge in the complexity of data and models has increased the demand for accurate estimation of these quantities. However, as the dimension grows the estimation presents significant challenges, with existing methods struggling already in relatively low dimensions. To address this issue, in this work, we introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy, a fundamental information theoretic quantity. The approach combines the minimization of the cross-entropy for simple, adaptive base models and the estimation of their deviation, in terms of the relative entropy, from the data density. Our approach demonstrates improvement across a broad spectrum of estimation tasks, encompassing entropy estimation on both synthetic and natural data. Further, we extend important theoretical consistency results to a more generalized setting required by our approach. We illustrate how the framework can be naturally extended to information theoretic supervised learning models, with a specific focus on the Information Bottleneck approach. 
It is demonstrated that the method delivers better accuracy compared to the existing methods in Information Bottleneck. In addition, we explore a natural connection between $\texttt{REMEDI}$ and generative modeling using rejection sampling and Langevin dynamics.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# 推薦の優先順位付けのための非自己回帰生成モデル Non-autoregressive Generative Models for Reranking Recommendation ( http://arxiv.org/abs/2402.06871v2 ) ライセンス: Link先を確認	Yuxin Ren, Qiya Yang, Yichun Wu, Wei Xu, Yalong Wang, Zhiqiang Zhang,	(参考訳) コンテンポラリーレコメンデーションシステムは、ユーザのニーズを満たすために、特定の要求や関心に合わせたアイテムの適切なリストを提供することによって設計されている。多段階レコメンデーションシステムでは、項目間のリスト内相関をモデル化することで、リランクが重要な役割を果たす。再階の鍵となる課題は、置換の組合せ空間内の最適な列の探索である。近年の研究では、ジェネレータが複数の実行可能なシーケンスを生成し、評価器が推定されたリストワイズスコアに基づいて最適なシーケンスを選択する、ジェネレータ-評価器学習パラダイムを提案する。ジェネレータは非常に重要であり、生成モデルはジェネレータ機能に適している。現在の生成モデルは、シーケンス生成のための自己回帰戦略を採用している。しかし、リアルタイム産業システムに自己回帰モデルを展開することは困難である。これらの課題に対処するため,効率と有効性を高めるために,提案するレコメンデーション(NAR4Rec)の再評価のための非自己回帰生成モデルを提案する。スパーストレーニングサンプルや動的候補といった課題に対処するために,マッチングモデルを導入する。ユーザフィードバックの多様性を考えると、実現不可能なシークエンスと不可能なシークエンスを区別するために、シークエンスレベルの相違したトレーニング目標を用いる。さらに,対象項目に関する非自己回帰モデルにおける依存性モデリングの欠如を克服するため,これらの項目間の相関を捉えるためにコントラッシブデコーディングを導入する。大規模なオフライン実験により、NAR4Recは最先端の再ランク法よりも優れた性能を示す。オンラインA/Bテストでは、NAR4Recはユーザーエクスペリエンスを大幅に向上させる。さらに、NAR4Recは、毎日3億人以上のアクティブユーザーがいる人気ビデオアプリKuaishouに完全にデプロイされている。 Contemporary recommendation systems are designed to meet users' needs by delivering tailored lists of items that align with their specific demands or interests. In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items. The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. The generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. 
To address these issues, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To tackle challenges such as sparse training samples and dynamic candidates, we introduce a matching model. Considering the diverse nature of user feedback, we employ a sequence-level unlikelihood training objective to differentiate feasible sequences from unfeasible ones. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments validate the superior performance of NAR4Rec over state-of-the-art reranking methods. Online A/B tests reveal that NAR4Rec significantly enhances the user experience. Furthermore, NAR4Rec has been fully deployed in a popular video app Kuaishou with over 300 million daily active users.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# ジェネレーションの検証 - スマート並列オートコレクトデコーディングによる大規模言語モデル推論の高速化 Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding ( http://arxiv.org/abs/2402.11809v3 ) ライセンス: Link先を確認	Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao,	(参考訳) 本研究の目的は,数十億のパラメータを持つ大規模言語モデル(LLM)の推論速度を高速化することである。本稿では, LLMのロスレスアクセラレーションを実現するための革新的なアプローチであるSPACE(textbf{S}mart \textbf{P}arallel \textbf{A}uto-\textbf{C}orrect d\textbf{E}coding)を提案する。半自己回帰推論と投機的復号機能を統合することにより、SPACEはトークン生成と検証を並列化する自動回帰LDMを独自に実現している。これは、既存のLLMに複数のトークンを同時に予測する能力を持たせる、半自己回帰制御された微調整プロセスによって実現される。さらに、自動訂正復号アルゴリズムは、1つのモデル呼び出し内でトークンシーケンスの同時生成と検証を容易にする。幅広い LLM の実験を通じて、SPACE は出力品質を維持しながら、HumanEval-X 上の2.7x-4.0x までの推論速度を実証した。 This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose \textbf{S}mart \textbf{P}arallel \textbf{A}uto-\textbf{C}orrect d\textbf{E}coding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
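The draft-then-verify idea behind lossless speculative decoding, which the abstract above builds on, can be sketched for the greedy case. This is a generic illustration, not the SPACE algorithm: the function name `accept_draft` and the token values are hypothetical. It assumes the base model has already produced, in one forward pass, the token it would emit at each draft position.

```python
def accept_draft(draft_tokens, verified_tokens):
    """Greedy verification of a drafted token sequence.

    draft_tokens:    tokens guessed ahead of time (length k)
    verified_tokens: tokens the base model itself would emit at each position,
                     given the accepted prefix plus draft_tokens[:i] (length k + 1)

    Returns the longest draft prefix that matches the base model, followed by
    one token taken from the base model (a correction or a bonus token).
    """
    if len(verified_tokens) != len(draft_tokens) + 1:
        raise ValueError("verified_tokens must have exactly one more entry than draft_tokens")

    accepted = []
    for position, token in enumerate(draft_tokens):
        if token != verified_tokens[position]:
            break                      # first mismatch: discard the rest of the draft
        accepted.append(token)

    # The base model's own token at the first unverified position is always valid.
    accepted.append(verified_tokens[len(accepted)])
    return accepted


# Hypothetical token ids: the third drafted token disagrees with the base model.
partial_match = accept_draft([5, 7, 9], [5, 7, 2, 4])
# Every drafted token agrees, so one extra token is gained for free.
full_match = accept_draft([5, 7, 9], [5, 7, 9, 4])
print(partial_match, full_match)
```

Every returned token is one the base model would have chosen itself, so under greedy decoding the output matches ordinary token-by-token generation. The speedup comes from accepting several tokens per model call.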
# UniST: 都市時空間予測のためのプロンプト型ユニバーサルモデル UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction ( http://arxiv.org/abs/2402.11838v2 ) ライセンス: Link先を確認	Yuan Yuan, Jingtao Ding, Jie Feng, Depeng Jin, Yong Li,	(参考訳) 都市空間の時空間予測は交通管理,資源最適化,都市計画といった情報的意思決定に不可欠である。自然言語のための事前訓練された基礎モデルは、様々な領域にまたがる複数のタスクに1つの汎用モデルが取り組むという、驚くべきブレークスルーを経験してきたが、都市空間の時空間モデリングは遅れている。都市予測のための既存のアプローチは、通常特定の時空間シナリオに合わせて調整され、タスク固有のモデル設計と広範なドメイン内トレーニングデータを必要とする。本研究では,都市空間の時空間予測のためのユニバーサルモデルUniSTを提案する。大規模な言語モデルからインスピレーションを得たUniSTは、以下の通り成功している。一多様な時空間データ特性に対する柔軟性 (II)複雑な時空間的関係を捉えるための精巧なマスキング戦略による効果的な生成前訓練三シナリオをまたいだ本質的・共有的知識の整合と活用を図るための時空間的知識誘導プロンプト。これらの設計は、強力な一般化能力を備えた時空間予測のための1対全モデルの可能性を開放するものである。 15の都市と6つのドメインに対する大規模な実験は、特にショット数とゼロショットのシナリオにおいて、最先端の予測性能の進歩におけるUniSTの普遍性を実証している。 Urban spatio-temporal prediction is crucial for informed decision-making, such as transportation management, resource optimization, and urban planning. Although pretrained foundation models for natural languages have experienced remarkable breakthroughs, wherein one general-purpose model can tackle multiple tasks across various domains, urban spatio-temporal modeling lags behind. Existing approaches for urban prediction are usually tailored for specific spatio-temporal scenarios, requiring task-specific model designs and extensive in-domain training data. In this work, we propose a universal model, UniST, for urban spatio-temporal prediction. Drawing inspiration from large language models, UniST achieves success through: (i) flexibility towards diverse spatio-temporal data characteristics, (ii) effective generative pre-training with elaborated masking strategies to capture complex spatio-temporal relationships, (iii) spatio-temporal knowledge-guided prompts that align and leverage intrinsic and shared knowledge across scenarios. These designs together unlock the potential of a one-for-all model for spatio-temporal prediction with powerful generalization capability. 
Extensive experiments on 15 cities and 6 domains demonstrate the universality of UniST in advancing state-of-the-art prediction performance, especially in few-shot and zero-shot scenarios.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# AgentScope: 柔軟でロバストなマルチエージェントプラットフォーム AgentScope: A Flexible yet Robust Multi-Agent Platform ( http://arxiv.org/abs/2402.14034v2 ) ライセンス: Link先を確認	Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou,	(参考訳) LLM(Large Language Models)の急速な進歩により、マルチエージェントアプリケーションにおいて大きな進歩が見られた。しかし、エージェントの協調とLLMの不安定な性能の複雑さは、堅牢で効率的なマルチエージェントアプリケーションを開発する上で顕著な課題となっている。これらの課題に対処するために,メッセージ交換をコア通信機構とする開発者中心のマルチエージェントプラットフォームであるAgentScopeを提案する。豊富な構文ツール、組み込みエージェントとサービス機能、アプリケーションのデモとユーティリティモニタのためのユーザフレンドリなインターフェース、ゼロコードプログラミングワークステーション、自動プロンプトチューニング機構により、開発とデプロイメントの両方の障壁は大幅に低下した。堅牢で柔軟なマルチエージェントアプリケーションを目指して、AgentScopeはビルトインとカスタマイズ可能なフォールトトレランスメカニズムを提供する。同時に、マルチモーダルデータ、ツール、外部知識の管理と利用のためのシステムレベルのサポートも備えている。さらに、ローカルおよび分散デプロイメント間の変換を容易にし、余分な労力なしで自動並列最適化を可能にするアクタベースの分散フレームワークを設計する。これらの機能により、AgentScopeは開発者がインテリジェントエージェントの可能性を完全に認識するアプリケーションを構築することができる。我々はAgentScopeをhttps://github.com/modelscope/agentscopeでリリースしました。 With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in developing robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with message exchange as its core communication mechanism. The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment. Towards robust and flexible multi-agent application, AgentScope provides both built-in and customizable fault tolerance mechanisms. 
At the same time, it is also armed with system-level support for managing and utilizing multi-modal data, tools, and external knowledge. Additionally, we design an actor-based distribution framework, enabling easy conversion between local and distributed deployments and automatic parallel optimization without extra effort. With these features, AgentScope empowers developers to build applications that fully realize the potential of intelligent agents. We have released AgentScope at https://github.com/modelscope/agentscope, and hope AgentScope invites wider participation and innovation in this fast-moving field.	翻訳日:2024-05-21 23:20:37 公開日:2024-05-20
# E2USD:多変量時系列の効率的な非教師付き状態検出 E2USD: Efficient-yet-effective Unsupervised State Detection for Multivariate Time Series ( http://arxiv.org/abs/2402.14041v5 ) ライセンス: Link先を確認	Zhichen Lai, Huan Li, Dalin Zhang, Yan Zhao, Weizhu Qian, Christian S. Jensen,	(参考訳) サイバーフィジカルシステムのセンサーは、物理系プロセスを監視する多変量時系列(MTS)を出力する。このような時系列は一般に、人間の活動監視における「歩行」や「走行」といった特定の条件に対応する、持続時間がそれぞれ異なる未知数の状態を含む。このような状態を教師なしで識別することは、その後のデータ解析における保存と処理を容易にし、結果の解釈可能性を高める。既存の状態検出手法は3つの課題に直面している。第一に、かなりの計算オーバーヘッドを伴うため、リソース制約下やストリーミング設定では実用的でない。第二に、最先端(SOTA)の手法は表現学習に対照学習を採用しているが、偽陰性への配慮が不十分なため、モデルの収束と精度が損なわれる。第三に、SOTAの手法は主にオフラインの非ストリーミング環境での利用に重点を置いており、オンラインストリーミングシナリオ向けの最適化が急務である。本稿では,効率的かつ高精度な教師なしMTS状態検出を可能にするE2Usdを提案する。 E2Usdは、Fast Fourier Transform-based Time Series Compressor (fftCompress) とDecomposed Dual-view Embedding Module (ddEM) を利用し、入力MTSを低い計算オーバーヘッドで符号化する。さらに,偽陰性の影響を打ち消し,よりクラスタリングに適した埋め込み空間を実現するために,False Negative Cancellation Contrastive Learning法(fnccLearning)を提案する。ストリーミング設定における計算オーバーヘッドをさらに軽減するため,Adaptive Threshold Detection (adaTD)を導入する。 6つのベースラインと6つのデータセットによる包括的な実験は、E2Usdが大幅に少ない計算オーバーヘッドでSOTA精度を達成できることを示す。 Cyber-physical system sensors emit multivariate time series (MTS) that monitor physical system processes. Such time series generally capture unknown numbers of states, each with a different duration, that correspond to specific conditions, e.g., "walking" or "running" in human-activity monitoring. Unsupervised identification of such states facilitates storage and processing in subsequent data analyses and enhances result interpretability. Existing state-detection proposals face three challenges. First, they introduce substantial computational overhead, rendering them impractical in resource-constrained or streaming settings. Second, although state-of-the-art (SOTA) proposals employ contrastive learning for representation, insufficient attention to false negatives hampers model convergence and accuracy. 
Third, SOTA proposals predominantly emphasize offline, non-streaming deployment, whereas we highlight an urgent need to optimize for online streaming scenarios. We propose E2Usd, which enables efficient-yet-accurate unsupervised MTS state detection. E2Usd exploits a Fast Fourier Transform-based Time Series Compressor (fftCompress) and a Decomposed Dual-view Embedding Module (ddEM) that together encode input MTSs at low computational overhead. Additionally, we propose a False Negative Cancellation Contrastive Learning method (fnccLearning) to counteract the effects of false negatives and to achieve more cluster-friendly embedding spaces. To reduce computational overhead further in streaming settings, we introduce Adaptive Threshold Detection (adaTD). Comprehensive experiments with six baselines and six datasets offer evidence that E2Usd is capable of SOTA accuracy at significantly reduced computational overhead.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# 機械学習注意モデルを用いた時間的確率バイアス補正 A Temporal Stochastic Bias Correction using a Machine Learning Attention model ( http://arxiv.org/abs/2402.14169v4 ) ライセンス: Link先を確認	Omer Nivron, Damon J. Wischik, Mathieu Vrac, Emily Shuckburgh, Alex T. Archibald,	(参考訳) 気候モデルは現実世界の観測に対してバイアスを持つ。通常、影響評価研究に使用する前に補正する必要がある。このような補正を可能にする一連の統計手法はバイアス補正(BC)と呼ばれる。しかし、既存のBC手法は連続する時間点間の依存関係をほとんど考慮しないため、時間的バイアスの補正を苦手としている。その結果、熱波の持続時間や頻度といった長期的な時間特性を持つ気候統計量は正確に補正できない。これにより、このような気候統計量に関する信頼性の高い影響評価研究が困難になる。本稿では,時間的バイアスを補正する新しいBC手法を提案する。これはBCの背後にある考え方を再考することで可能となる。 BCを確率的出力を伴う時間インデックス付き回帰タスクとして導入する。 BCを再考することで、最先端の機械学習(ML)の注意モデルを適用し、時間的非同期性を含むさまざまな種類のバイアスを学習できる。ナイジェリアのアブジャと日本の東京における熱波持続時間統計のケーススタディにより、現在の気候モデルの出力や代替のBC手法よりも正確な結果が得られることを示す。 Climate models are biased with respect to real-world observations. They usually need to be adjusted before being used in impact studies. The suite of statistical methods that enable such adjustments is called bias correction (BC). However, BC methods currently struggle to adjust temporal biases because they mostly disregard the dependence between consecutive time points. As a result, climate statistics with long-range temporal properties, such as heatwave duration and frequency, cannot be corrected accurately. This makes it more difficult to produce reliable impact studies on such climate statistics. This paper offers a novel BC methodology to correct temporal biases. This is made possible by rethinking the philosophy behind BC. We introduce BC as a time-indexed regression task with stochastic outputs. Rethinking BC enables us to adapt state-of-the-art machine learning (ML) attention models and thereby learn different types of biases, including temporal asynchronicities. With a case study of heatwave duration statistics in Abuja, Nigeria, and Tokyo, Japan, we show more accurate results than current climate model outputs and alternative BC methods.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# API-BLEND: API LLMのトレーニングとベンチマークのための総合コーパス API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs ( http://arxiv.org/abs/2402.15491v2 ) ライセンス: Link先を確認	Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, Luis A. Lastras,	(参考訳) ツールと外部アプリケーションプログラミングインターフェース(API)を効果的に利用し、タスクを計画し、完成させるために、LLM(Large Language Models)の必要性はますます高まっている。そのため、ツールやAPIへの呼び出しを含む十分な量のトレインデータやテストデータを取得することのできるメソッドには、非常に関心があります。この課題に対処するための主要な戦略として、2つの研究線が生まれている。ひとつは合成データ生成技術に重点を置いており、もうひとつは、API/ツールベースのタスクに変換可能なタスク関連データセットのキュレーションだ。本稿では,既存のデータセットを特定し,キュレートし,変換するタスクに着目し,ツール拡張LDMのトレーニングと体系的なテストを行うための大規模なコーパスであるAPI-BLENDを導入する。データセットは、API/ツール検出、スロットフィリング、検出されたAPIのシークエンシングといったAPIタスクを含む現実のシナリオを模倣する。トレーニングとベンチマークの両方の目的で,API-BLENDデータセットの有用性を実証する。 There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of train and test data that involve calls to tools / APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets which can be transformed into API / Tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpora for training and systematic testing of tool-augmented LLMs. The datasets mimic real-world scenarios involving API-tasks such as API / tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# FedFDP: 差分プライバシを備えた公平性を考慮したフェデレーション学習 FedFDP: Fairness-Aware Federated Learning with Differential Privacy ( http://arxiv.org/abs/2402.16028v2 ) ライセンス: Link先を確認	Xinpeng Ling, Jie Fu, Kuncan Wang, Huifa Li, Tong Cheng, Zhili Chen,	(参考訳) Federated Learning(FL)は、データサイロの課題を克服する新しい機械学習パラダイムであり、大きな注目を集めている。しかし、我々の観察によれば、全体としては有効に訓練されたモデルでも、クライアントごとに性能の格差が生じる可能性がある。これは、クライアントが共同でトレーニングしたモデルが不公平な結果をもたらす可能性を示唆している。一方、関連する研究では、連合学習における勾配やモデルの伝達が、メンバーシップ推論攻撃などのプライバシー漏洩問題を引き起こす可能性があることも示されている。上記の1つ目の問題に対処するため,FedFairと呼ばれる公平性を考慮したフェデレーション学習アルゴリズムを提案する。 FedFairに基づいて、上記の2つ目の問題に対処するため、プライバシ保護を導入してFedFDPアルゴリズムを構成する。 FedFDPでは、公平性を調整しながら差分プライバシーを実現するために、公平性を考慮したクリッピング戦略を考案する。さらに, 追加でアップロードされる損失値に対して, 有用性を最大化するための適応的クリッピング手法を提案する。さらに、我々のアルゴリズムが収束し、差分プライバシーを保証することを理論的に証明する。最後に、広範な実験結果により、FedFairとFedFDPがモデル性能と公平性の観点から最先端のソリューションを著しく上回ることを示す。コードとデータはhttps://anonymous.4open.science/r/FedFDP-5607でアクセスできる。 Federated learning (FL) is a new machine learning paradigm to overcome the challenge of data silos and has garnered significant attention. However, we observe that a globally effective trained model may exhibit performance disparities across different clients. This implies that models jointly trained by clients may lead to unfair outcomes. On the other hand, relevant studies indicate that the transmission of gradients or models in federated learning can also give rise to privacy leakage issues, such as membership inference attacks. To address the first issue mentioned above, we propose a fairness-aware federated learning algorithm, termed FedFair. Building upon FedFair, we introduce privacy protection to form the FedFDP algorithm to address the second issue mentioned above. In FedFDP, we devise a fairness-aware clipping strategy to achieve differential privacy while adjusting fairness. Additionally, for the extra uploaded loss values, we present an adaptive clipping approach to maximize utility. Furthermore, we theoretically prove that our algorithm converges and ensures differential privacy. 
Lastly, extensive experimental results demonstrate that FedFair and FedFDP significantly outperform state-of-the-art solutions in terms of model performance and fairness. Code and data are accessible at https://anonymous.4open.science/r/FedFDP-5607.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
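Gradient clipping is the step that bounds each sample's influence before noise is added in differentially private training. A generic DP-SGD-style version of it can be sketched as follows. This is an assumption-laden illustration rather than FedFDP itself: the function name `clip_and_noise`, the fixed `clip_norm` and the toy gradients are hypothetical, and the fairness-aware and adaptive clipping rules from the abstract are not reproduced here.

```python
import numpy as np


def clip_and_noise(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to `clip_norm`, add Gaussian noise, and average.

    per_sample_grads: (N, D) array, one gradient row per training sample
    clip_norm:        maximum allowed L2 norm of a single sample's gradient
    noise_multiplier: noise standard deviation expressed in units of clip_norm
    rng:              numpy.random.Generator used to draw the noise
    """
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)      # (N, 1)
    # Scale factor is 1 for small gradients and clip_norm / norm for large ones.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_sample_grads)


# Toy gradients: the first row has norm 5 and is clipped, the second has norm 0.5 and is kept.
grads = np.array([[3.0, 4.0],
                  [0.3, 0.4]])
rng = np.random.default_rng(0)

noiseless = clip_and_noise(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy = clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
print("noiseless average:", noiseless)
print("noisy average:", noisy)
```

With the noise multiplier set to zero the result is simply the mean of the clipped rows, which makes the clipping easy to check in isolation. A real privacy guarantee would additionally need a privacy accountant and a noise multiplier chosen for the target budget.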
# M3-VRD:マルチモーダルマルチタスクマルチ教師ビジュアルリッチフォーム文書理解 M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding ( http://arxiv.org/abs/2402.17983v2 ) ライセンス: Link先を確認	Yihao Ding, Lorenzo Vaiani, Caren Han, Jean Lee, Paolo Garza, Josiah Poon, Luca Cagliero,	(参考訳) 本稿では,視覚的にリッチな形式文書理解のための,マルチモーダル・マルチタスク・マルチ教師共同知識蒸留モデルを提案する。このモデルは、トークンとエンティティ表現の微妙な相関を容易にし、フォームドキュメントに固有の複雑さに対処することによって、きめ細かなレベルと粗いレベルの両方の洞察を活用するように設計されている。さらに, 多様な多教師間知識蒸留プロセスの高度化, 分散ギャップの提示, フォーム文書の調和的理解を実現するために, 新たな粒度間・粒度間損失関数を導入する。公開形式文書理解データセットの総合的な評価を通じて,提案モデルは既存のベースラインを一貫して上回り,視覚的に複雑な形式文書の複雑な構造や内容を扱う上での有効性を示す。 This paper presents a groundbreaking multimodal, multi-task, multi-teacher joint-grained knowledge distillation model for visually-rich form document understanding. The model is designed to leverage insights from both fine-grained and coarse-grained levels by facilitating a nuanced correlation between token and entity representations, addressing the complexities inherent in form documents. Additionally, we introduce new inter-grained and cross-grained loss functions to further refine diverse multi-teacher knowledge distillation transfer process, presenting distribution gaps and a harmonised understanding of form documents. Through a comprehensive evaluation across publicly available form document understanding datasets, our proposed model consistently outperforms existing baselines, showcasing its efficacy in handling the intricate structures and content of visually complex form documents.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# 散逸性ラシュバナノワイヤにおけるマヨラナゼロモード Majorana zero-modes in a dissipative Rashba nanowire ( http://arxiv.org/abs/2403.00419v2 ) ライセンス: Link先を確認	Arnob Kumar Ghosh, Annica M. Black-Schaffer,	(参考訳) 凝縮系は絶えず散逸にさらされており、これはしばしば量子現象に悪影響を及ぼす。本研究では、超伝導ラシュバナノワイヤに対する散逸の影響に着目する。散逸の存在下でも、この系は有限寿命のマヨラナゼロモード(MZM)をホストできることを明らかにする。最も興味深いことに、非散逸系がトポロジカルに自明である領域において、散逸は4つのロバストゼロモード(RZM)と2つのMZMという2種類の散逸性境界状態を生成することもできる。 MZMはバルクギャップの閉鎖を経て出現し、巻き数によってトポロジカルに特徴づけられる。 RZMはいかなるバルク状態とも関連せず、巻き数も持たないが、その出現は例外点と結びついている。さらに, 散逸により誘起されるRZMとMZMが、ランダムな乱れの存在下でも安定であることを確認した。本研究は,散逸によって駆動される実験系においてMZMを実現し安定化するための道を開く。 Condensed matter systems are continuously subjected to dissipation, which often has adverse effects on quantum phenomena. We focus on the impact of dissipation on a superconducting Rashba nanowire. We reveal that the system can still host Majorana zero-modes (MZMs) with a finite lifetime in the presence of dissipation. Most interestingly, dissipation can also generate two kinds of dissipative boundary states: four robust zero-modes (RZMs) and two MZMs, in the regime where the non-dissipative system is topologically trivial. The MZMs appear via bulk gap closing and are topologically characterized by a winding number. The RZMs are not associated with any bulk states and possess no winding number, but their emergence is instead tied to exceptional points. Further, we confirm the stability of the dissipation-induced RZMs and MZMs in the presence of random disorder. Our study paves the way for both realizing and stabilizing MZMs in an experimental setup, driven by dissipation.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# 確率モデルによるボンガード・ログ問題の解法 Solving the bongard-logo problem by modeling a probabilistic model ( http://arxiv.org/abs/2403.03173v5 ) ライセンス: Link先を確認	Ruizhuo Song, Beiming Yuan,	(参考訳) 抽象推論問題は、AIアルゴリズムの知覚と認識能力に課題をもたらし、明示的な画像特徴の単なる識別以上のパターン認識と帰納的推論を要求する。本研究では,Bongard-Logo問題に適した確率モデルであるPMoCを導入し,独立確率モデルの構築を通じて高い推論精度を実現する。さらに,Bongard-Logo,RAVEN,I-RAVEN,PGMなど,複雑な抽象的推論タスクに特化した拡張トランスフォーマーであるPose-Transformerを設計した。カプセルネットワークのポーズ行列にインスパイアされたPose-Transformerは、画像データを処理する際の局所的特徴間の位置関係に焦点を当てる。 PMoCと組み合わせることで、推論精度をさらに高めることができる。我々のPose-Transformerは、抽象エンティティの位置の変化に伴う推論の難しさを効果的に解決し、RAVENのOIG、D3x3サブセット、およびPGMデータセット上で以前のモデルより優れている。最後に,多数のPose-Transformerパラメータから生じる展開困難を考慮し,パラメータ数を著しく削減しつつ,性能を向上する軽量版Straw Pose-Transformerを提案する。本研究は,抽象的推論と認知パターン認識におけるAI能力の向上に寄与する。 Abstract reasoning problems pose challenges to the perception and cognition abilities of AI algorithms, demanding deeper pattern recognition and inductive reasoning beyond mere identification of explicit image features. In this study, we introduce PMoC, a probabilistic model tailored for the Bongard-Logo problem, achieving high reasoning accuracy through the construction of an independent probabilistic model. Additionally, we have designed the Pose-Transformer, an enhanced Transformer-Encoder specifically crafted for complex abstract reasoning tasks, including Bongard-Logo, RAVEN, I-RAVEN, and PGM. Inspired by the pose matrix in capsule networks, Pose-Transformer strengthens the focus on positional relationships between local features when processing image data. When combined with PMoC, it can further enhance reasoning accuracy. Our Pose-Transformer effectively addresses reasoning difficulties associated with changes in the position of abstract entities, outperforming previous models on RAVEN's OIG, D3x3 subsets, and the PGM dataset. 
Finally, considering the deployment difficulties arising from the large number of Pose-Transformer parameters, this paper presents a lightweight version, Straw Pose-Transformer, which maintains performance while significantly reducing the parameter count. This study contributes to enhancing AI capabilities in abstract reasoning and cognitive pattern recognition.	翻訳日:2024-05-21 23:10:31 公開日:2024-05-20
# Triple-CFN:抽象推論プロセスの強化のための概念空間の再構築 Triple-CFN: Restructuring Concept Spaces for Enhancing Abstract Reasoning Process ( http://arxiv.org/abs/2403.03190v8 ) ライセンス: Link先を確認	Ruizhuo Song, Beiming Yuan,	(参考訳) 抽象推論は人工知能アルゴリズムに重大な課題をもたらし、知覚タスクに必要な以上の認知能力を要求する。本研究では,画像から概念や特徴を別々に抽出する新しいフレームワークであるCross-Feature Network(CFN)を紹介する。このフレームワークは、特にボンガード・ローゴ問題に対処する上で、推論の表現として機能に対する応答を利用する。抽出した概念と特徴をCFN内に組み込んだ期待最大化プロセスを統合することで,一定の限界はあるものの,顕著な結果を得た。これらの制約を克服するために,画像からの特徴抽出を最大化し,ボンガード・ローゴとレイブンの進歩行列(RPM)の両問題において有効性を示す効率的なモデルであるTriple-CFNを提案する。さらに, RPM問題に適した概念空間を明示的に構築する, Triple-CFN の先進バージョンである Meta Triple-CFN を紹介する。これにより、関連する概念の推論と解釈可能性の高い精度が保証される。全体として、この研究は抽象的推論のための革新的なネットワーク設計を探求し、マシンインテリジェンスのフロンティアを前進させる。 Abstract reasoning poses significant challenges to artificial intelligence algorithms, demanding a cognitive ability beyond that required for perceptual tasks. In this study, we introduce the Cross-Feature Network (CFN), a novel framework designed to separately extract concepts and features from images. This framework utilizes the responses of features to concepts as representations for reasoning, particularly in addressing the Bongard-Logo problem. By integrating an Expectation-Maximization process between the extracted concepts and features within the CFN, we have achieved notable results, albeit with certain limitations. To overcome these limitations, we propose the Triple-CFN, an efficient model that maximizes feature extraction from images and demonstrates effectiveness in both the Bongard-Logo and Raven's Progressive Matrices (RPM) problems. Furthermore, we introduce Meta Triple-CFN, an advanced version of Triple-CFN, which explicitly constructs a concept space tailored for RPM problems. This ensures high accuracy of reasoning and interpretability of the concepts involved. Overall, this work explores innovative network designs for abstract reasoning, thereby advancing the frontiers of machine intelligence.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# D4Cグラブトレイン:概念記述と建築分布によるRPMとボンガードログ問題の解法 D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts ( http://arxiv.org/abs/2403.03452v8 ) ライセンス: Link先を確認	Ruizhuo Song, Beiming Yuan,	(参考訳) 本稿では,抽象的推論の領域において,特にRaven's Progressive Matrices (RPM) と Bongard-Logo の課題に対処する上で,注目すべき進歩を実現する。リコネット(Lico-Net)は,RPM問題に顕著な精度で対処する新しいベースラインモデルである。この基礎を生かして、我々はD3Cアプローチを推進し、分布を通して抽象的推論問題の根底にある概念を提唱する。この観点は、Lico-NetとBongard-Logoタスクに優れたベースラインモデルの両方のパフォーマンスを向上させる。 D3Cの計算効率を高めるために,D3C-cosの変種を示す。さらに,これらの領域における概念境界を再定義するD2C手法を提案する。最後に、我々の方法論をD4Cに拡張し、さらに概念境界を洗練させ、RPMとBongard-Logoの課題において実質的な改善を示す。全体として、我々の貢献は抽象的推論の分野における新たな展望と実践的な進歩を示している。 This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstract reasoning problems through distributions. This perspective enhances the performance of both Lico-Net and a baseline model excelling in Bongard-Logo tasks. To bolster the computational efficiency of D3C, we present the D3C-cos variant, offering a streamlined solution. Furthermore, we propose the D2C method, redefining concept boundaries within these domains and bridging the divide between high-level abstractions and their lower-dimensional counterparts. Finally, we extend our methodology to D4C, employing adversarial techniques to refine concept boundaries further and demonstrate substantial improvements in both RPM and Bongard-Logo challenges. Overall, our contributions present a fresh outlook and practical advancements in the field of abstract reasoning.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# 自然パラメトリックダウンコンバージョンにおけるアインシュタイン-ポドルスキー-ローゼン相関-ガウス近似を超えて- Einstein-Podolsky-Rosen correlations in spontaneous parametric down-conversion: Beyond the Gaussian approximation ( http://arxiv.org/abs/2403.04561v2 ) ライセンス: Link先を確認	A. G. da Costa Moura, C. H. Monken,	(参考訳) 本稿では、ガウス近似を用いずに、運動量と位置空間の両方で自発パラメトリックダウンコンバージョンによって生じる光子対の偶然検出確率振幅について解析式を示し、非線形結晶における複屈折の影響を考慮に入れた。また,Einstein-Podolsky-Rosen相関をベンチマークとして8種類のポンプビーム構成の理論的予測を支持する実験データも提示した。 We present analytic expressions for the coincidence detection probability amplitudes of photon pairs generated by spontaneous parametric down-conversion in both momentum and position spaces, without making use of the Gaussian approximation, and taking into account the effects of birefringence in the nonlinear crystal. We also present experimental data supporting our theoretical predictions, using Einstein-Podolsky-Rosen correlations as benchmarks, for 8 different pump beam configurations.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# ディープフェイク映像検出のための爆発型潜水流 Exploiting Style Latent Flows for Generalizing Deepfake Video Detection ( http://arxiv.org/abs/2403.06592v3 ) ライセンス: Link先を確認	Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi,	(参考訳) 提案手法は, 映像の時間的変化における遅延ベクトルの解析と異常挙動に基づいて, フェイクビデオの検出手法を提案する。生成した顔画像は,様々な表情と幾何変換を伴う時間的安定な映像の生成において必然的に避けられない,スタイル潜時ベクトルの時間的変化の時間的特徴に悩まされていることがわかった。我々のフレームワークは、スタイル潜在ベクトルの動的特性を表現するために、コントラスト学習によって訓練されたStyleGRUモジュールを利用する。さらに,StyleGRU生成機能とコンテンツベース機能を統合し,視覚的および時間的アーティファクトの検出を可能にするスタイルアテンションモジュールを導入する。提案手法はディープフェイク検出における様々なベンチマークシナリオにまたがって,クロスデータセットおよびクロスマニピュレーションシナリオにおいて,その優位性を示す。さらなる分析を通じて、我々は、ディープフェイクビデオ検出の一般性を改善するために、スタイル潜在ベクトルの時間的変化を用いることの重要性も検証した。 This paper presents a new approach for the detection of fake videos, based on the analysis of style latent vectors and their abnormal behavior in temporal changes in the generated videos. We discovered that the generated facial videos suffer from the temporal distinctiveness in the temporal changes of style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation scenarios. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# 無限次元ベイズ逆問題に対する幾何MCMCの微分インフォームドニューラル演算子加速 Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems ( http://arxiv.org/abs/2403.08220v2 ) ライセンス: Link先を確認	Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas,	(参考訳) 本稿では,無限次元ベイズ逆問題(BIP)の解法として,幾何学的マルコフ連鎖モンテカルロ(MCMC)を高速化する演算子学習手法を提案する。幾何学的MCMCでは、後部局所幾何学に適応する高品質な提案が採用されているが、対数尤度の勾配とヘッセ行列の繰り返し計算が必要であり、パラメータ・トゥ・オブザーバブル(PtO)写像が高価なパラメトリック偏微分方程式(PDE)によって定義されると、その計算コストは法外になる。本稿では,PtOマップのニューラル演算子サロゲートによって駆動される遅延受容幾何学的MCMC法について考察する。かなりのスピードアップを達成するためには、サロゲートはPtOマップとそのヤコビアンを正確に近似する必要がある。本研究では、PtO写像とヤコビアンの合同サンプルを用いた微分インフォームド演算子学習の拡張(O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024))を提案する。これによりデリバティブインフォームド・ニューラル・オペレーター(DINO)は、観測可能および後部局所幾何学を従来の方法よりも大幅に低いトレーニングコストで正確に予測するサロゲートとなる。還元基底DINOサロゲートのコスト及び誤差解析を行う。数値実験により、DINO駆動MCMCは、幾何学的MCMCより3～9倍、事前幾何に基づくMCMCより60～97倍高速に、有効な後部サンプルを生成することが示された。さらに, DINOサロゲートのトレーニングコストは, わずか10～25個の有効後部サンプルで, 幾何学的MCMCと比較して損益分岐点に達する。 We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. 
Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10--25 effective posterior samples.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# 生成エージェント社会における社会規範の出現: 原理とアーキテクチャ Emergence of Social Norms in Generative Agent Societies: Principles and Architecture ( http://arxiv.org/abs/2403.08251v2 ) ライセンス: Link先を確認	Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu,	(参考訳) 社会的規範は、行動規範の理解と定着に向けてエージェントを導く上で重要な役割を担い、マルチエージェントシステム(MAS)における社会的対立を減らす。しかし、現在のLLMベースの(あるいは生成的な)MASには、規範的な能力がない。本稿では,生成型MASにおける社会的規範の出現を促進するために,CRSECという新しいアーキテクチャを提案する。私たちのアーキテクチャは、創造と表現、スプレッド、評価、コンプライアンスの4つのモジュールで構成されています。これは、創発的プロセスのいくつかの重要な側面を1つにまとめる。 (i) 社会規範がどこから生じるか、(ii) 形式的にどのように表現されるか、(iii) エージェントのコミュニケーションと観察を通じてどのように広まるか、(iv) サニティチェックでどのように検査され、長期的に統合されるか、(v) エージェントの計画と行動にどのように組み込まれるか。 Smallville Sandboxゲーム環境に導入した我々の実験は、我々のアーキテクチャが社会規範を確立し、生成的MAS内での社会的衝突を減らす能力を示すものである。評価対象者30名を対象に実施した人的評価の結果,その有効性を確認した。私たちのプロジェクトは、https://github.com/sxswz213/CRSEC のリンクからアクセスできます。 Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our architecture consists of four modules: Creation & Representation, Spreading, Evaluation, and Compliance. This addresses several important aspects of the emergent processes all in one: (i) where social norms come from, (ii) how they are formally represented, (iii) how they spread through agents' communications and observations, (iv) how they are examined with a sanity check and synthesized in the long term, and (v) how they are incorporated into agents' planning and actions. Our experiments deployed in the Smallville sandbox game environment demonstrate the capability of our architecture to establish social norms and reduce social conflicts within generative MASs. 
The positive outcomes of our human evaluation, conducted with 30 evaluators, further affirm the effectiveness of our approach. Our project can be accessed via the following link: https://github.com/sxswz213/CRSEC.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# CoRaiS: マルチエッジ協調コンピューティングのための軽量リアルタイムスケジューリング CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing ( http://arxiv.org/abs/2403.09671v2 ) ライセンス: Link先を確認	Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu,	(参考訳) 複数のエッジの制約されたリソースを強力なリソースプールに組み合わせたマルチエッジ協調コンピューティングは、膨大な計算能力、応答時間の改善、より多様化したサービスなど、大きなメリットをもたらす可能性がある。しかし、大量の異種資源の構成とスケジューリング戦略の欠如により、マルチエッジコンピューティングシステムのモデリングと協調が特に複雑になる。本稿では、まず、複雑なハードウェア構成を保護し、異種エッジで異なるサービス機能を再定義するシステムレベルの状態評価モデルを提案する。第二に、分散到着要求を最適にディスパッチする整数線形プログラミングモデルが設計されている。最後に,学習に基づく軽量リアルタイムスケジューラCoRaiSを提案する。 CoRaiSは、マルチエッジシステムのリアルタイム状態とリクエスト情報を埋め込み、埋め込みとポリシーネットワークを組み合わせてリクエストをスケジュールし、すべてのリクエストの応答時間を最小化する。評価結果は,CoRaiSがリアルタイムに高品質なスケジューリング決定を下し,システムスケールに関わらず,他のマルチエッジコンピューティングシステムに一般化可能であることを検証した。特性検証はまた、CoRaiSが負荷のバランスをうまく学習し、リアルタイムの状態を認識し、スケジューリング中に不均一性を認識することを実証している。 Multi-edge cooperative computing that combines constrained resources of multiple edges into a powerful resource pool has the potential to deliver great benefits, such as a tremendous computing power, improved response time, more diversified services. However, the mass heterogeneous resources composition and lack of scheduling strategies make the modeling and cooperating of multi-edge computing system particularly complicated. This paper first proposes a system-level state evaluation model to shield the complex hardware configurations and redefine the different service capabilities at heterogeneous edges. Secondly, an integer linear programming model is designed to cater for optimally dispatching the distributed arriving requests. Finally, a learning-based lightweight real-time scheduler, CoRaiS, is proposed. CoRaiS embeds the real-time states of multi-edge system and requests information, and combines the embeddings with a policy network to schedule the requests, so that the response time of all requests can be minimized. 
Evaluation results verify that CoRaiS can make high-quality scheduling decisions in real time and can be generalized to other multi-edge computing systems, regardless of system scale. Characteristic validation also demonstrates that CoRaiS successfully learns to balance loads, perceive real-time state, and recognize heterogeneity while scheduling.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# ガウススプラッティングによるビュー一貫性3次元編集 View-Consistent 3D Editing with Gaussian Splatting ( http://arxiv.org/abs/2403.11868v4 ) ライセンス: Link先を確認	Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang,	(参考訳) 3D Gaussian Splatting (3DGS)の出現は、3D編集に革命をもたらし、効率よく高忠実なレンダリングを提供し、正確な局所的な操作を可能にした。現在、拡散ベースの2D編集モデルを用いて、マルチビューレンダリング画像を修正し、3DGSモデルの編集をガイドしている。しかし、このアプローチは多視点不整合の重要な問題に直面しており、誘導画像はビュー間で大きな相違を示し、モード崩壊と3DGSの視覚的アーティファクトをもたらす。この目的のために、3DGSをシームレスに画像編集プロセスに組み込む新しいフレームワークであるView-Consistent Editing (VcEdit)を導入する。 VcEditには、Cross-attention Consistency ModuleとEditing Consistency Moduleという2つの革新的な一貫性モジュールがある。これらの一貫性モジュールを反復的なパターンに組み込むことで、VcEditは多視点不整合の問題を解決し、様々な場面で高品質な3DGS編集を容易にする。 The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes.	翻訳日:2024-05-21 23:00:48 公開日:2024-05-20
# ToXCL: Toxic Speech Detection and Explanation のための統一フレームワーク ToXCL: A Unified Framework for Toxic Speech Detection and Explanation ( http://arxiv.org/abs/2403.16685v2 ) ライセンス: Link先を確認	Nhat M. Hoang, Xuan Long Do, Duc Anh Do, Duc Anh Vu, Luu Anh Tuan,	(参考訳) オンラインの有害な言論の拡散は、人口集団に脅威をもたらす関連する問題である。明示的な有害な音声は攻撃的な語彙信号を含むが、暗黙のものはコード化された言語または間接的な言語から構成される。したがって、モデルが暗黙の有毒な音声を検出するだけでなく、その有毒さを説明することも重要である。このことは、暗黙の有毒なスピーチを効果的に検出し、説明できる統一されたフレームワークのユニークな必要性を引き出す。先行研究は、主にテキスト生成問題として有毒な音声の検出と説明のタスクを定式化した。それでも、この戦略を用いて訓練されたモデルは、その後のエラー伝搬問題に悩まされがちである。さらに,本実験では,検出タスクのみに着目したモデルよりも,そのようなモデルの検出結果がはるかに低いことが明らかとなった。これらのギャップを埋めるために、暗黙の有毒な音声の検出と説明のための統一的なフレームワークToXCLを導入する。私たちのモデルは3つのモジュールで構成されています。 (i) 所定のポストの目標人口群を生成するための目標集団発生装置、(ii) エンコーダが暗黙の有毒音声の検出に焦点を当てるエンコーダ・デコーダモデル、(iii) 知識蒸留によってエンコーダを強化する教師分類器であり、デコーダは必要な説明を生成する。 ToXCLは、新しい最先端の有効性を実現し、ベースラインを大幅に上回る。 The proliferation of online toxic speech is a pertinent problem posing threats to demographic groups. While explicit toxic speech contains offensive lexical signals, implicit toxic speech consists of coded or indirect language. Therefore, it is crucial for models not only to detect implicit toxic speech but also to explain its toxicity. This draws a unique need for unified frameworks that can effectively detect and explain implicit toxic speech. Prior works mainly formulated the task of toxic speech detection and explanation as a text generation problem. Nonetheless, models trained using this strategy can be prone to suffer from the consequent error propagation problem. Moreover, our experiments reveal that the detection results of such models are much lower than those that focus only on the detection task. To bridge these gaps, we introduce ToXCL, a unified framework for the detection and explanation of implicit toxic speech. 
Our model consists of three modules: (i) a Target Group Generator that generates the targeted demographic group(s) of a given post; (ii) an Encoder-Decoder Model whose encoder focuses on detecting implicit toxic speech and whose decoder generates the necessary explanation; and (iii) a Teacher Classifier that boosts the encoder via knowledge distillation. ToXCL achieves new state-of-the-art effectiveness and outperforms baselines significantly.	翻訳日:2024-05-21 22:50:58 公開日:2024-05-20
# 第二言語学習における分散型エージェントと生成AIによる教育 Distributed agency in second language learning and teaching through generative AI ( http://arxiv.org/abs/2403.20216v2 ) ライセンス: Link先を確認	Robert Godwin-Jones,	(参考訳) 生成AIは、言語学習に重要な機会を提供する。 ChatGPTのようなツールは、文章や音声形式のチャットを通じて非公式の第二言語プラクティスを提供することができ、学習者は習熟度、言語レジスタ、議論トピックなどの会話パラメータを指示する。 AIは、修正的なフィードバックを与えたり、実践演習を作成したり、拡張された研究計画を開発するように指示することができる。インストラクタはAIを使って、さまざまなメディアで学習と評価材料を構築することができる。 AIは没入型技術をより強力で多用途にし、スクリプトによるインタラクションから遠ざかる可能性が高い。学習者と教師の双方にとって、純粋に統計的な人間言語のモデルに起因するAIシステムの限界を理解することが重要であり、この限界は言語使用の微妙な社会的・文化的側面を扱う能力を制約する。さらに、AIシステムの構築方法に関する倫理的な懸念や、その使用に関する実践的な制約、特に特権の少ない人口に対する懸念もある。 AIツールのパワーと汎用性は、多くの人々の生活において(スマートフォンと同じく)価値ある、絶え間ない仲間になり、単純なツールの使用以上の密接なつながりを生み出すだろう。社会物質主義のような生態学理論は、密接なユーザーとAIの相互作用を通して発展する共有機関を調べるのに役立つ。 Generative AI offers significant opportunities for language learning. Tools like ChatGPT can provide informal second language practice through chats in written or voice forms, with the learner specifying through prompts conversational parameters such as proficiency level, language register, and discussion topics. AI can be instructed to give corrective feedback, create practice exercises, or develop an extended study plan. Instructors can use AI to build learning and assessment materials in a variety of media. AI is likely to make immersive technologies more powerful and versatile, moving away from scripted interactions. For both learners and teachers, it is important to understand the limitations of AI systems that arise from their purely statistical model of human language, which limits their ability to deal with nuanced social and cultural aspects of language use. Additionally, there are ethical concerns over how AI systems are created as well as practical constraints in their use, especially for less privileged populations. The power and versatility of AI tools are likely to turn them into valuable and constant companions in many people's lives (akin to smartphones), creating a close connection that goes beyond simple tool use. 
Ecological theories such as sociomaterialism are helpful in examining the shared agency that develops through close user-AI interactions, as are the perspectives on human-object relations from Indigenous cultures.	翻訳日:2024-05-21 22:50:58 公開日:2024-05-20
# FashionEngine:マルチモーダル制御によるインタラクティブな3Dヒューマンジェネレーションと編集 FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls ( http://arxiv.org/abs/2404.01655v3 ) ライセンス: Link先を確認	Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu,	(参考訳) 本稿では,自然言語や視覚認識,手描きスケッチなどのユーザフレンドリーなマルチモーダルコントロールを通じて3Dデジタルヒューマンを生成する,対話型3次元人体生成編集システムであるFashionEngineを紹介する。 FashionEngineは、3つの重要なコンポーネントで3Dヒューマンプロダクションを自動化する。 1)2次元画像訓練データから意味的UV潜伏空間における3次元人間のモデリングを学習する事前学習された3次元人体拡散モデル。 2) 人間の衣服のテクスチャの外観、形状トポロジー、テキスト意味論を正準のUV整列空間に符号化するマルチモーダルUV空間は、ユーザのマルチモーダル入力を暗黙のUV潜在空間に忠実に整合させ, 制御可能な3次元編集を実現する。マルチモーダルUV空間は、テキスト、画像、スケッチなどの異なるユーザ入力間で共有され、様々な共同マルチモーダル編集タスクを可能にする。 3) マルチモダリティ-UVアラインド・サンプラーは,拡散事前分布から高品質で多様な3D人間を採取することを学ぶ。大規模な実験は、条件生成/編集タスクに対するFashionEngineの最先端のパフォーマンスを検証する。さらに,FashionEngine用の対話型ユーザインタフェースを提案する。これは条件付きおよび非条件生成タスクと,ポーズ/ビュー/シェープ制御,テキスト,画像,スケッチ駆動3D編集,仮想トライオンなどの編集タスクを統合されたフレームワークで実現する。私たちのプロジェクトページは https://taohuumd.github.io/projects/FashionEngine にあります。 We present FashionEngine, an interactive 3D human generation and editing system that creates 3D digital humans via user-friendly multimodal controls such as natural languages, visual perceptions, and hand-drawing sketches. FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks. 2) Multimodality-UV Space encoding the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space, which faithfully aligns the user multimodal inputs with the implicit UV latent space for controllable 3D human editing. The multimodality-UV space is shared across different user inputs, such as texts, images, and sketches, which enables various joint multimodal editing tasks. 3) Multimodality-UV Aligned Sampler learns to sample high-quality and diverse 3D humans from the diffusion prior. Extensive experiments validate FashionEngine's state-of-the-art performance for conditional generation/editing tasks. 
In addition, we present an interactive user interface for our FashionEngine that enables both conditional and unconditional generation tasks, and editing tasks including pose/view/shape control, text-, image-, and sketch-driven 3D human editing and 3D virtual try-on, in a unified framework. Our project page is at: https://taohuumd.github.io/projects/FashionEngine.	翻訳日:2024-05-21 22:41:02 公開日:2024-05-20
# HENet:マルチビューカメラによるエンドツーエンドマルチタスク3次元認識のためのハイブリッド符号化 HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras ( http://arxiv.org/abs/2404.02517v2 ) ライセンス: Link先を確認	Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang,	(参考訳) 多視点カメラからの3次元認識は、自律運転システムにおいて重要な要素であり、3Dオブジェクトの検出や鳥の目視(BEV)セマンティックセグメンテーションといった複数のタスクを含む。近年の3次元知覚モデルでは,大きな画像エンコーダ,高解像度画像,長期時間入力が採用されており,性能が著しく向上している。しかし、これらの手法は、計算資源の制約のため、トレーニングや推論のシナリオでは互換性がないことが多い。さらに、現代の自律運転システムは、システムアーキテクチャ全体を単純化し、実装の複雑さを低減することができるマルチタスク3D知覚のためのエンドツーエンドフレームワークを採用することを好んでいる。しかし、複数のタスクをエンドツーエンドの3D知覚モデル内で協調的に最適化する場合、タスク間の衝突が発生することが多い。本稿では,これらの問題を緩和するために,マルチタスク3次元認識のためのHENetというエンドツーエンドフレームワークを提案する。具体的には,短期フレーム用大画像エンコーダと長期フレーム用小画像エンコーダを用いたハイブリッド画像エンコーダを提案する。次に,2つのハイブリット画像エンコーダから抽出した異なるフレームの特徴を融合する,アテンション機構に基づく時間的特徴統合モジュールを提案する。最後に、各知覚タスクの特徴に基づき、異なるグリッドサイズのBEV機能、独立したBEVエンコーダ、タスクデコーダを異なるタスクに活用する。実験の結果,HENetは3Dオブジェクト検出やBEVセマンティックセグメンテーションを含む,最先端のマルチタスク3D知覚結果をnuScenesベンチマークで達成した。ソースコードとモデルは https://github.com/VDIGPKU/HENet で公開される。 Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation. To improve perception precision, large image encoders, high-resolution images, and long-term temporal inputs have been adopted in recent 3D perception models, bringing remarkable performance gains. However, these techniques are often incompatible in training and inference scenarios due to computational resource constraints. Besides, modern autonomous driving systems prefer to adopt an end-to-end framework for multi-task 3D perception, which can simplify the overall system architecture and reduce the implementation complexity. However, conflict between tasks often arises when optimizing multiple tasks jointly within an end-to-end 3D perception model. 
To alleviate these issues, we present an end-to-end framework named HENet for multi-task 3D perception in this paper. Specifically, we propose a hybrid image encoding network, using a large image encoder for short-term frames and a small image encoder for long-term temporal frames. Then, we introduce a temporal feature integration module based on the attention mechanism to fuse the features of different frames extracted by the two aforementioned hybrid image encoders. Finally, according to the characteristics of each perception task, we utilize BEV features of different grid sizes, independent BEV encoders, and task decoders for different tasks. Experimental results show that HENet achieves state-of-the-art end-to-end multi-task 3D perception results on the nuScenes benchmark, including 3D object detection and BEV semantic segmentation. The source code and models will be released at https://github.com/VDIGPKU/HENet.	翻訳日:2024-05-21 22:41:01 公開日:2024-05-20
# 外部計画型大規模言語モデルによる会話性疾患の診断 Conversational Disease Diagnosis via External Planner-Controlled Large Language Models ( http://arxiv.org/abs/2404.04292v5 ) ライセンス: Link先を確認	Zhoujian Sun, Cheng Luo, Ziyi Liu, Zhengxing Huang,	(参考訳) 大規模言語モデル(LLM)の開発は、人工知能(AI)に基づく診断に先例のない可能性をもたらした。しかし、実際の診断シナリオにおけるLLMの応用的視点は、患者データを積極的に収集することができないため、まだ不明である。本研究は,医師のエミュレートによる計画能力の向上を目的としたLLMに基づく診断システムを提案する。我々のシステムは、計画タスクを処理するために2つの外部プランナーを含んでいる。最初のプランナーは、病気スクリーニングの質問を定式化し、初期診断を行うための強化学習アプローチを採用している。第2のプランナーは、LLMを使用して医療ガイドラインを解析し、鑑別診断を行う。実際の患者電子カルテデータを用いて,仮想患者と医師とのシミュレーション対話を構築し,診断能力の評価を行った。本システムは, 疾患スクリーニングと鑑別診断の両課題において, 有意な成績を示した。この研究は、AIを臨床環境にシームレスに統合するためのステップであり、医療診断の精度とアクセシビリティを高める可能性がある。 The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application perspective of LLMs in real diagnostic scenarios is still unclear because they are not adept at collecting patient data proactively. This study presents an LLM-based diagnostic system that enhances planning capabilities by emulating doctors. Our system involves two external planners to handle planning tasks. The first planner employs a reinforcement learning approach to formulate disease screening questions and conduct initial diagnoses. The second planner uses LLMs to parse medical guidelines and conduct differential diagnoses. By utilizing real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors and evaluated the diagnostic abilities of our system. We demonstrated that our system obtained impressive performance in both disease screening and differential diagnoses tasks. This research represents a step towards more seamlessly integrating AI into clinical settings, potentially enhancing the accuracy and accessibility of medical diagnostics.	翻訳日:2024-05-21 22:41:01 公開日:2024-05-20
# テーラーフィールドのキラリティー・対称性研究への応用 The Application of Tailored Fields for Studying Chirality and Symmetry ( http://arxiv.org/abs/2404.05923v2 ) ライセンス: Link先を確認	Dino Habibović, Kathryn R. Hamilton, Ofer Neufeld, Laura Rego,	(参考訳) ウルトラショートレーザーパルスは、物質の中で最速の電荷力学をトリガーし、探究するためのユニークなツールであり、空間、時間、エネルギーにおいて前例のない分解能を持つ基本的な物理現象の研究を可能にする。超短パルスがもたらす最も興味深い機会の1つは、空間および偏光領域におけるレーザービームの特性を調整し、複数のレベルの対称性の破れを効果的に制御することで対称性を調節し、調査することができることである。特に、これはキラル物質と超高速キラルダイナミクスの探索を可能にする。近年では、キラリティーを研究するための高感度なアプローチの開発が物理学や化学においてホットな話題となっているが、これは主にテーラード光(tailored light)の分野とは別個に発展してきた。この視点では、これらの分野の個別の発展と共同の発展を論じ、両分野がすでに相互に刺激し合い、科学における新たな機会を開いていることに重点を置く。我々は、トピックが完全に統合され、相互に進化すると予想される将来の展望を概説し、卓越したオープンな問題を強調します。 Ultrashort laser pulses pose unique tools to trigger and probe the fastest charge dynamics in matter, allowing the investigation of fundamental physical phenomena with unprecedented resolution in space, time, and energy. One of the most fascinating opportunities that ultrashort pulses offer is the possibility of modulating and investigating symmetries by tailoring the properties of the laser beam in the spatial and polarization domains, effectively controlling symmetry breaking on multiple levels. In particular, this allows probing chiral matter and ultrafast chiral dynamics. In recent years, the development of highly sensitive approaches for studying chirality has been a hot topic in physics and chemistry that has developed largely separately from the field of tailored light. This perspective discusses the individual and joint evolution of these fields with an emphasis on how the fields have already cross-fertilized, opening new opportunities in science. We outline a future outlook of how the topics are expected to fully merge and mutually evolve, emphasizing outstanding open issues.	翻訳日:2024-05-21 22:41:01 公開日:2024-05-20
# シミュレーション最適化による言語モデルプロンプト選択 Language Model Prompt Selection via Simulation Optimization ( http://arxiv.org/abs/2404.08164v2 ) ライセンス: Link先を確認	Haoting Zhang, Jinghai He, Rhonda Righter, Zeyu Zheng,	(参考訳) 生成言語モデルの発展に伴い,近年,プロンプトの選択が注目されている。プロンプト(英: prompt)は、コンテンツ生成において生成言語モデルのガイドとして機能する、ユーザが提供する命令または記述である。人間の労働力に基づくプロンプト選択手法は存在するが、シミュレーション最適化により、選択したプロンプトに対する事前定義されたスコアを最大化することを目的として、この選択を容易にすることを検討する。具体的には,2段階のフレームワークを提案する。第一段階では、各プロンプトが適度な次元ベクトルで表されるような十分数で可能なプロンプトの集合を決定する。評価と選択の次の段階において、プロンプトを表す中等次元ベクトルに関するスコアの代理モデルを構築する。この構築された代理モデルに基づいて、逐次評価のプロンプトを選択することを提案する。本フレームワークにおける逐次評価手順の整合性を証明する。また,提案手法の有効性を示す数値実験を行い,実装の実践的指導を行う。 With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulation optimization, aiming to maximize a pre-defined score for the selected prompt. Specifically, we propose a two-stage framework. In the first stage, we determine a feasible set of prompts in sufficient numbers, where each prompt is represented by a moderate-dimensional vector. In the subsequent stage for evaluation and selection, we construct a surrogate model of the score regarding the moderate-dimensional vectors that represent the prompts. We propose sequentially selecting the prompt for evaluation based on this constructed surrogate model. We prove the consistency of the sequential evaluation procedure in our framework. We also conduct numerical experiments to demonstrate the efficacy of our proposed framework, providing practical instructions for implementation.	翻訳日:2024-05-21 22:41:01 公開日:2024-05-20
# フローにしよう:3次元フローとオブジェクトクラスタリングの同時最適化 Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering ( http://arxiv.org/abs/2404.08363v2 ) ライセンス: Link先を確認	Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann,	(参考訳) 本研究では,実世界の大規模な生の点群シーケンスからの自己監督型3次元シーンフロー推定の問題について検討する。地上真実のシーンフローラベルが存在しない現代的アプローチでは、フローとオブジェクトの剛性に基づく構造的正規化を取り入れることで、点群の逐次対にわたるフローの最適化に重点を置いている。剛体物体は様々な3次元空間クラスタリング法により推定される。最先端の手法はニューラル・プリエント構造を用いてシーン全体の動きをキャプチャすることに成功したが、複数の物体の動きを識別する際の課題に直面した。そこで本研究では, 重なり合うソフトクラスタと非重なり合う固いクラスタ表現を組み合わせたクラスタリング手法を提案する。フローは、徐々に増大する非重なり合う固いクラスターと、一定の大きさの重なり合う柔らかいクラスターとで、共同で推定される。提案手法をLiDAR点雲を用いた複数データセット上で評価し,新たな最先端結果に到達した自己教師付きベースラインよりも優れた性能を示す。本手法は,歩行者やサイクリスト,その他の脆弱な道路利用者を含む,複数の独立移動物体が近接する複雑な動的シーンにおける流れの解消に優れる。私たちのコードは https://github.com/ctu-vras/let-it-flow で公開されています。 We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks like trajectory prediction or instance segmentation. In the absence of ground truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identified the structural constraints and the use of large and strict rigid clusters as the main pitfall of the current approaches and we propose a novel clustering approach that allows for combination of overlapping soft clusters as well as non-overlapping rigid clusters representation. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed size overlapping soft clusters. 
We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, including pedestrians, cyclists, and other vulnerable road users. Our code is publicly available at https://github.com/ctu-vras/let-it-flow.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# テキストから歌へ:声と伴奏を取り入れた制御可能な音楽生成を目指して Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment ( http://arxiv.org/abs/2404.09313v3 ) ライセンス: Link先を確認	Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang,	(参考訳) 歌は歌声と伴奏の組み合わせである。しかし、既存の作品では、歌声合成と音楽生成を独立して重視している。歌の合成を探求するためにはほとんど注意が払われなかった。そこで本研究では,音声と伴奏の両方の生成を組み込んだテキスト・ツー・ソング合成という新しいタスクを提案する。我々は,歌唱音声合成 (SVS) とボーカルから伴奏への (V2A) 合成からなる2段階のテキスト・ツー・ソング手法であるメロディスト(Melodist)を開発した。メロディストは、トライタワーのコントラスト事前学習を利用して、制御可能なV2A合成のためのより効果的なテキスト表現を学習する。音楽サイトから発掘された中国の歌のデータセットは、我々の研究のためにデータ不足を軽減するために構築されている。評価結果は,メロディストが同等の品質とスタイルの整合性で楽曲を合成できることを実証した。オーディオサンプルはhttps://text2songMelodist.github.io/Sample/で見ることができる。 A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis. Melodist leverages tri-tower contrastive pretraining to learn more effective text representation for controllable V2A synthesis. A Chinese song dataset mined from a music website is built up to alleviate data scarcity for our research. The evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency. Audio samples can be found at https://text2songMelodist.github.io/Sample/.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# LLM以外のインストラクションを伴わないインストラクションによるテキスト分類器のインキュベーション Incubating Text Classifiers Following User Instruction with Nothing but LLM ( http://arxiv.org/abs/2404.10877v2 ) ライセンス: Link先を確認	Letian Peng, Jingbo Shang,	(参考訳) 本稿では,任意のクラス定義(ユーザ・インストラクション)を与えられたテキスト分類データを生成することを目的としており,人間のアノテーションや生のコーパスを使わずに,小さなテキスト分類器を訓練することができる。先駆的な試みと比較して、提案したインキュベータは、複雑で相互に依存したクラス(例えば、Educatorから提供されるTED Talkや他)を処理できる最初のフレームワークです。具体的には,まず,HuggingFace の分類データセットと記述から得られた命令-データマッピングを,GPT-4 によるテキスト内拡張とともに調整した LLM である。次に、インキュベーターを意味的テキスト埋め込みのクラスタ中心で学習し、世代ごとの統一性と意味的多様性を強調する。各種分類タスクにおけるインキュベータと,直接LLMに基づく推論や,迅速なエンジニアリングによるトレーニングデータ生成などの強力なベースラインを比較した。実験では,(1)従来のベンチマークでうまく動作し,(2)ラベル依存やユーザの好みを考慮に入れ,(3)複数の分類器をインキュベートすることで論理的なテキストマイニングを可能にする。 In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneering attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Specifically, Incubator is an LLM first tuned on the instruction-to-data mappings that we obtained from classification datasets and descriptions on HuggingFace together with in-context augmentation by GPT-4. We then refine Incubator by learning on the cluster centers of semantic textual embeddings to emphasize the uniformity and semantic diversity in generations. We compare Incubator on various classification tasks with strong baselines such as direct LLM-based inference and training data generation by prompt engineering. Experiments show Incubator is able to (1) perform well on traditional benchmarks, (2) take label dependency and user preference into consideration, and (3) enable logical text mining by incubating multiple classifiers.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# 光から原子へのねじれ度変換 Conversion of twistedness from light to atoms ( http://arxiv.org/abs/2404.11558v2 ) ライセンス: Link先を確認	S. S. Baturin, A. V. Volotka,	(参考訳) 我々は、束縛された電子によるツイストされた光子の吸収を利用して、自由空間におけるツイストされた原子の生成を可能にするための簡単なモデルとスキームを提案する。我々は、光子と原子の非弾性衝突において、光子のねじれ状態が質量中心状態に移され、原子の軌道運動量の投影が$m_\gamma-\Delta m_e$となることを示す。また、実験条件によっては、光子のねじれ度は原子中心の量子状態に移されるか、束縛された電子遷移の選択規則を変更することが示される。提案されたスキームは一般的なもので、原子波面の複雑な整形を可能にする。 We develop a simple model and propose a scheme that allows the production of twisted atoms in free space using the absorption of twisted photons by a bound electron. We show that in the inelastic collision of a photon and an atom, the twisted state of the photon is transferred to the center-of-mass state, so that the projection of the orbital momentum of the atom becomes $m_\gamma-\Delta m_e$. We also show that, depending on the experimental conditions, the twistedness of the photon is either transferred to the atomic center-of-mass quantum state or modifies the selection rule for the bound electron transition. The proposed scheme is general and enables complex shaping of the atomic wavefront.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering ( http://arxiv.org/abs/2404.12020v2 ) ライセンス: Link先を確認	Jie Ma, Min Hu, Pinghui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du,	(参考訳) AVQA(Audio-Visual Question Answering)は複雑なマルチモーダル推論タスクであり、音声とビデオの入力ペアに基づいて、インテリジェントなシステムが自然言語クエリに正確に応答するよう要求する。それでも、一般的なAVQAアプローチは、データセットのバイアスを過度に学習する傾向があり、結果としてロバスト性が低下する。さらに、現在のデータセットはこれらの方法の正確な診断を提供していないかもしれない。これらの課題に対処するために、まず、公開データセット(\textit{MUSIC-AVQA})のテストスプリット内の質問を表現し、その後、分割された質問に分配シフトを導入するという、2つのステップで構築された新しいデータセットである \textit{MUSIC-AVQA-R} を提案する。前者は多様で多様なテストスペースを導き、後者は稀で頻繁で全体的な質問に対する包括的な堅牢性評価をもたらす。次に, バイアス学習を克服するために, 多面サイクル協調型バイアス回避戦略を利用する頑健なアーキテクチャを提案する。実験の結果、このアーキテクチャは両方のデータセットで最先端のパフォーマンスを実現し、特に提案したデータセットでは9.32\%の大幅な改善が得られた。これら2つのデータセットに対して大規模なアブレーション実験を行い、デバイアスング戦略の有効性を検証した。さらに,既存のマルチモーダルQA手法の限界ロバスト性を,データセットの評価を通じて強調する。 Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackle these challenges, firstly, we propose a novel dataset, \textit{MUSIC-AVQA-R}, crafted in two steps: rephrasing questions within the test split of a public dataset (\textit{MUSIC-AVQA}) and subsequently introducing distribution shifts to split questions. The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions. Secondly, we propose a robust architecture that utilizes a multifaceted cycle collaborative debiasing strategy to overcome bias learning. 
Experimental results show that this architecture achieves state-of-the-art performance on both datasets, especially obtaining a significant improvement of 9.32\% on the proposed dataset. Extensive ablation experiments are conducted on these two datasets to validate the effectiveness of the debiasing strategy. Additionally, we highlight the limited robustness of existing multi-modal QA methods through the evaluation on our dataset.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# 勾配偏光アルゴリズムによる3dB限界を超える最適機械的四次スクイーズ Optimized mechanical quadrature squeezing beyond the 3 dB limit via gradient-descent algorithm ( http://arxiv.org/abs/2404.13563v2 ) ライセンス: Link先を確認	Yu-Hong Liu, Jie-Qiao Liao,	(参考訳) メカニカル・クアチュア・スクイーズ状態の調製は、キャビティ・オプティメニクスにおいて重要な意味を持つ。そこで本研究では,勾配偏光アルゴリズムを用いて最適キャビティフィールド駆動パルスを求めることにより,典型的なキャビティ・オプティメカカル・システムにおけるメカニカル・クィアリングを生成するための信頼性の高い手法を提案する。熱フォノン占有率100の3dB定常限界を超える機械共振器において, 強い4次スキューズを実現する。さらに、機械的スクイーズを1つの機械的発振期間内に超高速に作成することができる。また、生成したメカニカルスクイーズに付随する最適パルス駆動値を求め、メカニカルスクイーズ生成のメカニズムを解析した。この研究は、量子光学および量子情報科学における最適量子制御の適用を促進する。 The preparation of mechanical quadrature-squeezed states holds significant importance in cavity optomechanics because the squeezed states have extensive applications in understanding fundamental quantum mechanics and exploiting modern quantum technology. Here, we propose a reliable scheme for generating mechanical quadrature squeezing in a typical cavity optomechanical system by seeking optimal cavity-field driving pulses using the gradient-descent algorithm. We realize strong quadrature squeezing in a mechanical resonator that exceeds the 3 dB steady-state limit, even with a thermal phonon occupancy of one hundred. Furthermore, the mechanical squeezing can be created ultrafast, within one mechanical oscillation period. We also obtain the optimal pulsed drivings associated with the created mechanical squeezings and analyze the mechanism for mechanical squeezing generation. This work will promote the application of optimal quantum control in quantum optics and quantum information science.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# プログラム環境ファズリング Program Environment Fuzzing ( http://arxiv.org/abs/2404.13951v2 ) ライセンス: Link先を確認	Ruijie Meng, Gregory J. Duck, Abhik Roychoudhury,	(参考訳) プログラムは独立して実行されるのではなく、プログラムの振る舞いを駆動する実行環境と相互作用する。これにより、ファイル、データベース、構成、ネットワークソケット、人間とユーザのインタラクションなど、複雑な環境相互作用の影響を捉える必要がある。シンボリックな実行における環境キャプチャの従来のアプローチと、手作業を伴う環境モデリングを用いたモデルチェック。本稿では,グレーボックスファジングの拡張に基づいて,異なるアプローチをとる。プログラムが与えられた場合、カーネル/ユーザ/モード境界におけるすべての環境相互作用をシステムコールの形式で記録する。次に、元の記録された相互作用の下でプログラムをリプレイするが、今回は選択的な突然変異を適用し、異なるプログラム環境の効果を得る。ファジィキャンペーンの繰り返し(フィードバック駆動)変異によって、クラッシュする振る舞いを引き起こすプログラム環境を探すことができる。私たちのEFuzzツールは、よく知られた現実世界のプロトコル実装とGUIアプリケーションで33のゼロデイバグを発見しました。その多くはセキュリティ上の脆弱性であり、14のCVEが割り当てられている。 Computer programs are not executed in isolation, but rather interact with the execution environment which drives the program behaviours. Software validation and verification methods, such as greybox fuzzing, thus need to capture the effect of possibly complex environmental interactions, including files, databases, configurations, network sockets, human-user interactions, and more. Conventional approaches for environment capture in symbolic execution and model checking employ environment modelling, which involves manual effort. In this paper, we take a different approach based on an extension of greybox fuzzing. Given a program, we first record all observed environmental interactions at the kernel/user-mode boundary in the form of system calls. Next, we replay the program under the original recorded interactions, but this time with selective mutations applied, in order to get the effect of different program environments -- all without environment modelling. Via repeated (feedback-driven) mutations over a fuzzing campaign, we can search for program environments that induce crashing behaviour. Our EFuzz tool found 33 zero-day bugs in well-known real-world protocol implementations and GUI applications. Many of these are security vulnerabilities and 14 CVEs were assigned.	翻訳日:2024-05-21 22:31:13 公開日:2024-05-20
# EEGDiR:時間情報記憶のための脳波デノケーションネットワークとRetentive Networkによるグローバルモデリング EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network ( http://arxiv.org/abs/2404.15289v2 ) ライセンス: Link先を確認	Bin Wang, Fei Deng, Peifan Jiang,	(参考訳) 脳波信号は臨床医学、脳研究、神経疾患研究において重要な役割を担っている。しかし、様々な生理的および環境的アーティファクトへの感受性は、記録された脳波データにノイズをもたらし、基礎となる脳活動の正確な分析を妨げる。この課題を緩和するためには、Denoisingテクニックが不可欠だ。近年の深層学習アプローチの進歩は、従来の手法と比較して脳波データの信号-雑音比を高める大きな可能性を示している。大規模言語モデル(LLM)の領域では、いくつかのモデルで広く使われているRetentive Network(Retnet)インフラストラクチャが、堅牢な特徴抽出とグローバルモデリング機能を示している。脳波信号と自然言語の時間的類似性を認識し、自然言語処理から脳波分解までRetnetを導入する。この統合は、脳波の認知への新しいアプローチを示し、脳活動の深い理解と神経疾患の正確な診断のための道を開く。それでも、Retnetの脳波への直接的適用は、脳波信号の1次元の性質のため不可能であり、自然言語処理は2次元データを扱う。本稿では1次元の脳波信号を2次元に変換してネットワーク入力として使用する信号埋め込み手法を提案する。実験結果から,提案手法によって達成されたデノナイズの有効性が著しく向上したことが確認された。 Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learning-based approaches exhibit substantial potential for enhancing the signal-to-noise ratio of EEG data compared to traditional methods. In the realm of large-scale language models (LLMs), the Retentive Network (Retnet) infrastructure, prevalent for some models, demonstrates robust feature extraction and global modeling capabilities. Recognizing the temporal similarities between EEG signals and natural language, we introduce the Retnet from natural language processing to EEG denoising. This integration presents a novel approach to EEG denoising, opening avenues for a profound understanding of brain activities and accurate diagnosis of neurological diseases. 
Nonetheless, direct application of Retnet to EEG denoising is unfeasible due to the one-dimensional nature of EEG signals, while natural language processing deals with two-dimensional data. To facilitate Retnet application to EEG denoising, we propose the signal embedding method, transforming one-dimensional EEG signals into two dimensions for use as network inputs. Experimental results validate the substantial improvement in denoising effectiveness achieved by the proposed method.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# 2次元アーキテクチャにおける高コヒーレンスKerr-cat量子ビット High-Coherence Kerr-cat qubit in 2D architecture ( http://arxiv.org/abs/2404.16697v3 ) ライセンス: Link先を確認	Ahmed Hajr, Bingcheng Qing, Ke Wang, Gerwin Koolstra, Zahra Pedramrazi, Ziqi Kang, Larry Chen, Long B. Nguyen, Christian Junger, Noah Goss, Irwin Huang, Bibek Bhandari, Nicholas E. Frattini, Shruti Puri, Justin Dressel, Andrew N. Jordan, David Santiago, Irfan Siddiqi,	(参考訳) Kerr-cat量子ビット(Kerr-cat qubit)は、Kerr非線形性を持つ発振器に2光子駆動を適用することにより、多光子シュロディンガー猫状態が安定化されるボソニック量子ビットである。猫サイズの増大に伴う抑制ビットフリップ率により、この量子ビットはノイズバイアス量子ビットに適した量子誤り訂正符号を実装するための有望な候補となる。しかし、この量子ビットの安定化と制御に必要な強力な光-物質相互作用を達成するためには、伝統的に、量子ビットを加熱して性能を低下させる強いマイクロ波駆動が必要である。対照的に、駆動ポートとの結合を増大させることで、パーセルの大規模な崩壊を犠牲にして、強い駆動の必要性がなくなる。有効帯域ブロックフィルタをオンチップに統合することにより、このトレードオフを克服し、高コヒーレンスを有するスケーラブルな2D超伝導回路におけるKerr-cat量子ビットを実現する。このフィルタは、安定化および読み出しに必要な周波数で無視可能な減衰で、キュービット周波数で30dBのアイソレーションを提供する。実験では、8個の光子を持つ猫に対して99.6%の量子非破壊読み出し率を実験的に実証した。また、この量子ビットを高忠実に普遍的に制御するために、高速なラビ振動とX(90)ゲートの新たなデモを安定化ドライブの位相変調により組み合わせる。最後に、回路の理論解析と整合して、1ms以上のビットフリップ時間と位相フリップ時間の線形減少しか達成しない発振器における最大10光子の猫の大きさの関数として、このアーキテクチャの寿命を調べた。我々の量子ビットは、小さなフットプリントを持つフォールトトレラント量子プロセッサのビルディングブロックとして有望であることを示している。 The Kerr-cat qubit is a bosonic qubit in which multi-photon Schrodinger cat states are stabilized by applying a two-photon drive to an oscillator with a Kerr nonlinearity. The suppressed bit-flip rate with increasing cat size makes this qubit a promising candidate to implement quantum error correction codes tailored for noise-biased qubits. However, achieving strong light-matter interactions necessary for stabilizing and controlling this qubit has traditionally required strong microwave drives that heat the qubit and degrade its performance. In contrast, increasing the coupling to the drive port removes the need for strong drives at the expense of large Purcell decay. 
By integrating an effective band-block filter on-chip, we overcome this trade-off and realize a Kerr-cat qubit in a scalable 2D superconducting circuit with high coherence. This filter provides 30 dB of isolation at the qubit frequency with negligible attenuation at the frequencies required for stabilization and readout. We experimentally demonstrate quantum non-demolition readout fidelity of 99.6% for a cat with 8 photons. Also, to have high-fidelity universal control over this qubit, we combine fast Rabi oscillations with a new demonstration of the X(90) gate through phase modulation of the stabilization drive. Finally, the lifetime in this architecture is examined as a function of the cat size of up to 10 photons in the oscillator achieving a bit-flip time higher than 1 ms and only a linear decrease in the phase-flip time, in good agreement with the theoretical analysis of the circuit. Our qubit shows promise as a building block for fault-tolerant quantum processors with a small footprint.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# スケーラブルな変動量子シミュレーションのための多体ローカライゼーション Exploiting many-body localization for scalable variational quantum simulation ( http://arxiv.org/abs/2404.17560v2 ) ライセンス: Link先を確認	Chenfeng Cao, Yeqing Zhou, Swamit Tannu, Nic Shannon, Robert Joynt,	(参考訳) 変分量子アルゴリズムは、短期量子デバイスを用いた実用的な量子アドバンテージを達成するための有望なアプローチとして登場した。その可能性にもかかわらず、これらのアルゴリズムのスケーラビリティは大きな課題となる。これは、ノイズがなくても持続する「不規則な高原」現象に大きく起因している。本研究では,Floquet-initialized variational quantum circuitsの枠組み内での多体局在化(MBL)熱化相転移について検討し,MBLがバレンプラトーを回避するためにどのように使用できるかを検討する。位相遷移は、逆参加比、絡み合いエントロピー、および計量として低重安定化器R'enyiエントロピーの計算によって観測される。 MBL相の回路を初期化し、容易に準備可能な初期状態を用いることで、ユニタリな2-設計の形成を防止でき、その結果、体積法ではなく領域を絡み合う出力状態となり、最適化を通してバレンプラトーを回避できる。この手法を用いることで、異なるフェーズにわたる様々なモデルハミルトンの基底状態の判定に成功し、最適化に必要なリソースが大幅に削減されることを示す。我々は127キュービットの$ibm\_brisbane$量子プロセッサで行った実験を通じて、MBLアプローチをさらに検証した。これらの実験は、変分計算を行うために必要な勾配が、ランダムなユニタリな「キック」を受けるハイゼンベルクモデルのMBL相で復元されることを確認した。これらの結果は、MBLと量子コンピューティングの相互作用に関する新たな洞察を与え、量子アルゴリズムの設計において、MBL状態の役割を考慮するべきであることを示唆している。 Variational quantum algorithms have emerged as a promising approach to achieving practical quantum advantages using near-term quantum devices. Despite their potential, the scalability of these algorithms poses a significant challenge. This is largely attributed to the "barren plateau" phenomenon, which persists even in the absence of noise. In this work, we explore the many-body localization (MBL)-thermalization phase transitions within a framework of Floquet-initialized variational quantum circuits and investigate how MBL could be used to avoid barren plateaus. The phase transitions are observed through calculations of the inverse participation ratio, the entanglement entropy, and a metric termed low-weight stabilizer R\'enyi entropy. 
By initializing the circuit in the MBL phase and employing an easily preparable initial state, we find it is possible to prevent the formation of a unitary 2-design, resulting in an output state with entanglement that follows an area- rather than a volume-law, and which circumvents barren plateaus throughout the optimization. Utilizing this methodology, we successfully determine the ground states of various model Hamiltonians across different phases and show that the resources required for the optimization are significantly reduced. We have further validated the MBL approach through experiments carried out on the 127-qubit $ibm\_brisbane$ quantum processor. These experiments confirm that the gradients needed to carry out variational calculations are restored in the MBL phase of a Heisenberg model subject to random unitary "kicks". These results provide new insights into the interplay between MBL and quantum computing, and suggest that the role of MBL states should be considered in the design of quantum algorithms.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# 変分自己回帰ネットワークと量子アニーリングを用いた統計力学計算 Statistical Mechanics Calculations Using Variational Autoregressive Networks and Quantum Annealing ( http://arxiv.org/abs/2404.19274v2 ) ライセンス: Link先を確認	Yuta Tamura, Masayuki Ohzeki,	(参考訳) 統計力学では、分割関数の計算は一般に困難である。近年,変分自己回帰ネットワーク(VAN)を用いた近似法が提案されている。このアプローチは、非常に多くのサンプルを取得しながら、生成確率を直接計算する利点を提供する。本研究は, 量子熱処理装置から得られた試料を, ギブス・ボルツマン分布に付着すると仮定した新しい近似法を提案する。有限サイズシェリントン・カークパトリックモデルに適用した場合,提案手法は,従来のVANアプローチや,広く利用されるナイーブ平均場などの近似手法と比較して精度が向上することを示した。 In statistical mechanics, computing the partition function is generally difficult. An approximation method using a variational autoregressive network (VAN) has been proposed recently. This approach offers the advantage of directly calculating the generation probabilities while obtaining a significantly large number of samples. The present study introduces a novel approximation method that employs samples derived from quantum annealing machines in conjunction with VAN, which are empirically assumed to adhere to the Gibbs-Boltzmann distribution. When applied to the finite-size Sherrington-Kirkpatrick model, the proposed method demonstrates enhanced accuracy compared to the traditional VAN approach and other approximate methods, such as the widely utilized naive mean field.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# CofiPara: 大規模マルチモーダルモデルを用いたマルチモーダルサルカズムターゲット同定のための粗粒パラダイム CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models ( http://arxiv.org/abs/2405.00390v2 ) ライセンス: Link先を確認	Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen,	(参考訳) ソーシャルメディアはマルチモーダル・サルカズムに満ちており、テキストや画像のモダリティで直接明らかでない暗黙の矛盾のため、サルカズムの標的を特定することは特に困難である。マルチモーダルサルカズムターゲット同定(MSTI)の現在の手法は、主に、テキストと画像の両方を通して伝達されるマルチモーダルサルカズムの微妙な理解を見越して、端から端まで、表面的な指標に焦点を当てている。本稿では,大きめのパラダイムを持つ多目的MSTIフレームワークを提案する。マルチモーダル推論におけるLMM(Large Multimodal Models)の強力な能力に着想を得て、まずLMMに取り組み、マルチモーダルサルカズム検出における小言語モデルの粗粒化事前学習のための競合する有理性を生成する。次に、よりきめ細かな目標同定のためのモデルを微調整する。そこで,本研究の枠組みは,マルチモーダルサルカズム内での複雑な目標を十分に明らかにし,LMMの潜在的なノイズによる負の影響を緩和するものである。実験の結果,我々のモデルは最先端のMSTI法よりも優れており,また,サルカズムの解読における説明可能性も顕著であることがわかった。 Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherent in LMMs. 
Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# 物理埋め込み3Dガウスによるロボット手術映像を用いた効率的なデータ駆動シーンシミュレーション Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians ( http://arxiv.org/abs/2405.00956v2 ) ライセンス: Link先を確認	Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou,	(参考訳) 手術シーンシミュレーションは,外科教育とシミュレータに基づくロボット学習において重要な役割を担っている。これらの環境を外科的シーンで作る従来のアプローチは、デザイナーがソフトボディシミュレーションのためのテクスチャとジオメトリーを備えた手作りの組織をモデル化する、労働集約的なプロセスを含んでいる。この手動のアプローチは時間を要するだけでなく、スケーラビリティやリアリズムにも制限があります。対照的に、データ駆動シミュレーションは魅力的な代替手段を提供する。実世界の手術映像データから3Dの手術シーンを自動的に再構築し、ソフトボディ物理を応用する可能性がある。しかし、この地域は比較的無漁である。本研究では3D Gaussianを手術シーンの学習可能な表現として紹介し,立体内視鏡映像から学習した。これらのシーンの過度な適合を防止し、幾何学的正当性を確保するため、奥行き監視と異方性正規化をガウス学習プロセスに組み込む。さらに,3次元ガウスに物理特性を統合したマテリアルポイント法を適用し,現実的なシーン変形を実現する。本手法を社内および公開外科用ビデオデータセットで評価した。以上の結果から, 内視鏡的画像からの手術シーンの再構築とシミュレーションを効率的に行うことができ, 手術シーンの再構築に数分しかかからず, リアルタイムに近づく速度で視覚的, 身体的両面の変形を生成できることが示唆された。その結果,手術教育やロボット学習で利用可能なシミュレーションの効率性と多様性を高めるための提案手法の可能性が示唆された。 Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scenes involve a labor-intensive process where designers hand-craft tissue models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. 
Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our in-house and public surgical video datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently, taking only a few minutes to reconstruct the surgical scene, and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate the great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# 指操作のための学習力制御 Learning Force Control for Legged Manipulation ( http://arxiv.org/abs/2405.01402v2 ) ライセンス: Link先を確認	Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal,	(参考訳) 相互作用中の接触力の制御は、移動や操作作業において重要である。 sim-to-real reinforcement learning (RL) は多くの接触に富む問題に成功しているが、現在のRL法は力の制御を明示的に行わずに暗黙的に力強い相互作用を達成している。本稿では,力覚へのアクセスを必要とせず,直接力制御のためのRLポリシーを訓練する方法を提案する。腕を持つ四足ロボットの全身制御プラットフォーム上で本手法を実証する。このような力の制御により、重力補償とインピーダンス制御を行え、従順な全身操作を解き放つことができる。可変コンプライアンスの学習された全身制御装置は、ロボットがマニピュレータを指示するだけでロボットの遠隔操作を直感的に行うことができ、ロボットの体は自動的に調整され、所望の位置と力を達成する。これにより、人間の遠隔操作者は、多様なロコ操作タスクを容易に示することができる。我々の知る限り、我々は、学習した全身力制御を脚のマニピュレータに初めて展開し、より汎用的で適応可能な脚ロボットへの道を開いた。 Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# CTD4 - 多重臨界のカルマン融合を用いた深部連続分布型アクター臨界剤 CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics ( http://arxiv.org/abs/2405.02576v2 ) ライセンス: Link先を確認	David Valencia, Henry Williams, Trevor Gee, Bruce A MacDonald, Minas Liarokapis,	(参考訳) CDRL(Categorical Distributional Reinforcement Learning)は,従来のRL(Reinforcement Learning)アプローチと比較して,複雑なタスクの学習において,より優れたサンプル効率を示す。しかし、CDRLの実践的応用は、挑戦的なプロジェクションステップ、詳細なパラメータチューニング、ドメイン知識によって妨げられている。本稿では,連続行動空間に適した連続分布モデル自由RLアルゴリズムを導入することで,これらの課題に対処する。提案アルゴリズムは,連続確率分布を出力するアクタ批判アーキテクチャを用いて,分布RLの実装を単純化する。さらに,過大評価バイアスを軽減するために,カルマン融合機構を通じて融合した複数の批評家のアンサンブルを提案する。一連の実験を通して,提案手法は訓練が容易であり,複雑な連続制御タスクを実行するためのサンプル効率の高いソリューションとして機能することが検証された。 Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.	翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
# 長期連続予測のための粗大化戦略によるMLPの強化 Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting ( http://arxiv.org/abs/2405.03199v2 ) ライセンス: Link先を確認	Nannan Bian, Minhong Zhu, Li Chen, Weiran Cai,	(参考訳) ディープラーニング手法は,長期連続予測においてその強みを発揮してきた。しかし、表現力と計算効率のバランスをとるのに苦労することが多い。マルチ層パーセプトロン (MLPs) へのリソーシングは、妥協的なソリューションを提供するが、それらは固有のポイントワイドマッピングモードによって引き起こされる2つの重大な問題に悩まされる。本稿では,単独の時間点の代わりに情報グラニュラーを形成することで,プロトタイプMLPに関連する問題を緩和する粗大化戦略を特徴とする粗大化パーセプトロンネットワーク(CP-Net)を提案する。 CP-Netは主に意味的パターンと文脈的パターンを抽出するための2段階のフレームワークを使用しており、より大きなタイムパンの相関を保ち、揮発性雑音を除去する。これは、多様な粒度のパターンを総合的な予測に向けて融合させるマルチスケール設定によってさらに強化される。純粋に構造的単純さの畳み込みに基づいて、CP-Netは線形計算の複雑さとランタイムの低さを維持しつつ、7つの予測ベンチマークでSOTA法と比較すると4.1%の改善を示した。 Deep learning methods have been exerting their strengths in long-term time series forecasting. However, they often struggle to strike a balance between expressive power and computational efficiency. Resorting to multi-layer perceptrons (MLPs) provides a compromise solution, yet they suffer from two critical problems caused by the intrinsic point-wise mapping mode, namely deficient contextual dependencies and an inadequate information bottleneck. Here, we propose the Coarsened Perceptron Network (CP-Net), featuring a coarsening strategy that alleviates the above problems associated with the prototype MLPs by forming information granules in place of solitary temporal points. The CP-Net primarily utilizes a two-stage framework for extracting semantic and contextual patterns, which preserves correlations over larger timespans and filters out volatile noise. This is further enhanced by a multi-scale setting, where patterns of diverse granularities are fused towards a comprehensive prediction. Based purely on structurally simple convolutions, CP-Net is able to maintain linear computational complexity and low runtime, while demonstrating an improvement of 4.1% compared with the SOTA method on seven forecasting benchmarks.	翻訳日:2024-05-21 20:25:40 公開日:2024-05-20
# Retinexmamba:低照度画像強調のためのRetinex-based Mamba Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement ( http://arxiv.org/abs/2405.03349v2 ) ライセンス: Link先を確認	Jiesong Bai, Yuhao Yin, Qiyuan He, Yuanxian Li, Xiaofeng Zhang,	(参考訳) 低照度画像強調の分野では、従来のRetinex法とRetinexformerのような高度なディープラーニング技術の両方が、明確な利点と限界を示している。従来のレチネックス法は、人間の目の明度と色彩の知覚を模倣するために設計され、画像を照明と反射成分に分解するが、低照度条件下でのノイズ管理と詳細な保存に苦労する。 Retinexformerは、従来の自己認識機構を通じて照明推定を強化するが、解釈容易性や準最適強調効果が不十分な課題に直面している。これらの制約を克服するために,RetinexMambaアーキテクチャを提案する。 RetinexMambaは従来のRetinexメソッドの物理的直感性を捉えるだけでなく、Retinexformerのディープラーニングフレームワークを統合し、ステートスペースモデル(SSM)の計算効率を活用して処理速度を向上させる。このアーキテクチャは、イノベーティブな照明推定器と、エンハンスメント中の画質を維持する損傷回復機構を備えている。さらに、RetinexMambaはRetinexformerのIG-MSA(Illumination-Guided Multi-Head Attention)をFused-Attentionメカニズムで置き換え、モデルの解釈性を向上させる。 LOLデータセットの実験的評価により、RetinexMambaは、Retinex理論に基づく既存のディープラーニングアプローチを定量的および定性的メトリクスの両方で上回り、低照度画像の強化におけるその有効性と優位性を確認した。 In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preservation under low light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms, but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to enhance processing speed. 
This architecture features innovative illumination estimators and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-Head Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing deep learning approaches based on Retinex theory in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.	翻訳日:2024-05-21 20:25:40 公開日:2024-05-20
# 部分指紋の同時同定とポスアライメント Joint Identity Verification and Pose Alignment for Partial Fingerprints ( http://arxiv.org/abs/2405.03959v2 ) ライセンス: Link先を確認	Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou,	(参考訳) 現在、ポータブル電子機器はますます人気が高まっている。軽量な考慮のために、指紋認識モジュールは通常、限られたサイズのセンサーを使用する。しかし、部分的な指紋は、特に指圧姿勢や画像品質の違いがある場合に、適合する特徴がほとんどないため、部分的な指紋認証は困難である。既存のほとんどの手法では、指紋位置の正当性検証を独立したタスクとみなし、それらの間の結合関係を無視している - 相対的なポーズ推定は通常、アンカーとしてペア化された特徴に依存しており、認証精度はより正確なポーズアライメントによって改善される傾向にある。そこで本稿では,部分指紋ペアの協調識別とポーズアライメントの手法を提案する。これを実現するために,マルチタスクCNN-Transformerハイブリッドネットワークを提案し,特徴抽出能力を高めるための事前学習タスクを設計する。複数の公開データセット (NIST SD14, FVC 2002 DB1A & DB3A, FVC 2004 DB1A & DB2A, FVC 2006 DB1A) および社内データセットを用いた実験により, 本手法は指紋部分認証と相対ポーズ推定の両方において, 従来手法よりも効率的でありながら, 最先端性能を実現していることが示された。 Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods regard fingerprint position rectification and identity verification as independent tasks, ignoring the coupling relationship between them -- relative pose estimation typically relies on paired features as anchors, and authentication accuracy tends to improve with more precise pose alignment. Consequently, in this paper we propose a method for joint identity verification and pose alignment of partial fingerprint pairs, aiming to leverage their inherent correlation to improve each other. To achieve this, we propose a multi-task CNN (Convolutional Neural Network)-Transformer hybrid network, and design a pre-training task to enhance the feature extraction capability. 
Experiments on multiple public datasets (NIST SD14, FVC2002 DB1A & DB3A, FVC2004 DB1A & DB2A, FVC2006 DB1A) and an in-house dataset show that our method achieves state-of-the-art performance in both partial fingerprint verification and relative pose estimation, while being more efficient than previous methods.	Translated: 2024-05-21 20:25:40 Published: 2024-05-20
# TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks ( http://arxiv.org/abs/2405.03990v2 ) License: see linked page	Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang	Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists.
To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with a $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.	Translated: 2024-05-21 20:25:40 Published: 2024-05-20
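The storage benefit of sharing parameter blocks can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a single edge server, a hypothetical request probability per model, and a simple greedy rule (best hit-ratio gain per extra byte). The names `greedy_placement`, `model_blocks`, `block_size`, and `request_prob` are all made up for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def greedy_placement(
    model_blocks: Dict[str, FrozenSet[str]],
    block_size: Dict[str, int],
    request_prob: Dict[str, float],
    capacity: int,
) -> Tuple[List[str], int]:
    """Greedily cache models on one server, storing each shared block once.

    Assumption (not from the paper): the score of a model is its request
    probability divided by the extra storage it needs given what is cached.
    """
    cached: List[str] = []
    stored: Set[str] = set()      # parameter blocks already on the server
    used = 0                      # storage consumed so far
    remaining = set(model_blocks)

    while remaining:
        best, best_score, best_extra = None, -1.0, 0
        for model in sorted(remaining):  # sorted for deterministic tie-breaks
            # Only blocks that are not yet stored cost additional space.
            extra = sum(block_size[b] for b in model_blocks[model] - stored)
            if used + extra > capacity:
                continue
            score = request_prob[model] / extra if extra > 0 else float("inf")
            if score > best_score:
                best, best_score, best_extra = model, score, extra
        if best is None:          # nothing else fits
            break
        cached.append(best)
        stored |= model_blocks[best]
        used += best_extra
        remaining.remove(best)
    return cached, used


# Two models share a large backbone block; a third shares nothing.
blocks = {
    "A": frozenset({"backbone", "head_a"}),
    "B": frozenset({"backbone", "head_b"}),
    "C": frozenset({"other"}),
}
sizes = {"backbone": 60, "head_a": 10, "head_b": 10, "other": 70}
probs = {"A": 0.4, "B": 0.3, "C": 0.3}
placement, used_storage = greedy_placement(blocks, sizes, probs, capacity=80)
print(placement, used_storage)
```

In this toy instance, caching A and B costs 80 units in total because the backbone is stored once. Without sharing, each would cost 70 units and only one of them would fit.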
# Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos ( http://arxiv.org/abs/2405.04370v2 ) License: see linked page	Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang	Understanding how humans would behave during hand-object interaction is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works propose to simultaneously forecast hand trajectories and object affordances on human egocentric videos. The joint prediction serves as a comprehensive representation of future hand-object interactions in 2D space, indicating potential human motion and motivation. However, the existing approaches mostly adopt the autoregressive paradigm for unidirectional prediction, which lacks mutual constraints within the holistic future sequence and accumulates errors along the time axis. Meanwhile, these works largely overlook the effect of camera egomotion on first-person view predictions. To address these limitations, we propose a novel diffusion-based interaction prediction method, namely Diff-IP2D, to forecast future hand trajectories and object affordances concurrently in an iterative non-autoregressive manner.
We transform the sequential 2D images into latent feature space and design a denoising diffusion model to predict future latent interaction features conditioned on past ones. Motion features are further integrated into the conditional denoising process to make Diff-IP2D aware of the camera wearer's dynamics for more accurate interaction prediction. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art baselines on both the off-the-shelf metrics and our newly proposed evaluation protocol. This highlights the efficacy of leveraging a generative paradigm for 2D hand-object interaction prediction. The code of Diff-IP2D will be released at https://github.com/IRMVLab/Diff-IP2D.	Translated: 2024-05-21 20:25:40 Published: 2024-05-20
# Unveiling Higher-Order Topology via Polarized Topological Charges ( http://arxiv.org/abs/2405.05505v2 ) License: see linked page	Wei Jia, Bao-Zong Wang, Ming-Jian Gao, Jun-Hong An	Real-space topological invariants have been widely used to characterize chiral-symmetric higher-order topological phases (HOTPs). However, a momentum-space characterization of these HOTPs, which essentially reveals their intrinsic bulk-boundary correspondence and facilitates their detection in quantum simulation systems, is still lacking. Here, we propose an experimentally observable momentum-space characterization of the chiral-symmetric HOTPs based on the concept of polarized topological charges. It provides a unified description of topological phase transitions caused by the closing and reopening of the band gap not only of the bulk states but also of the edge states. Remarkably, these polarized topological charges can be identified by measuring the pseudospin structures. A feasible scheme to detect the HOTPs in the $^{87}$Rb cold atomic system is given. Our work opens an avenue for characterization and experimental detection of the chiral-symmetric HOTPs in momentum space.	Translated: 2024-05-21 20:25:40 Published: 2024-05-20
# Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models ( http://arxiv.org/abs/2405.05990v2 ) License: see linked page	Yang Bai, Ge Pei, Jindong Gu, Yong Yang, Xingjun Ma	Large language models (LLMs) have achieved remarkable performance on a wide range of tasks. However, recent studies have shown that LLMs can memorize training data and that simple repeated tokens can trick the model into leaking the data. In this paper, we take a step further and show that certain special characters or their combinations with English letters are stronger memory triggers, leading to more severe data leakage. The intuition is that, since LLMs are trained with massive data that contains a substantial amount of special characters (e.g. structural symbols {, } of JSON files, and @, # in emails and online posts), the model may memorize the co-occurrence between these special characters and the raw texts. This motivates us to propose a simple but effective Special Characters Attack (SCA) to induce training data leakage. Our experiments verify the high effectiveness of SCA against state-of-the-art LLMs: they can leak diverse training data, such as code corpus, web pages, and personally identifiable information, and sometimes generate non-stop outputs as a byproduct.
We further show that the composition of the training data corpus can be revealed by inspecting the leaked data -- one crucial piece of information for pre-training high-performance LLMs. Our work can help understand the sensitivity of LLMs to special characters and identify potential areas for improvement.	Translated: 2024-05-21 20:25:40 Published: 2024-05-20
# LangCell: Language-Cell Pre-training for Cell Identity Understanding ( http://arxiv.org/abs/2405.06708v2 ) License: see linked page	Suyuan Zhao, Jiahuan Zhang, Yizhen Luo, Yushuai Wu, Zaiqing Nie	Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from transcriptomic data, such as annotating cell types, has become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, and lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels.
To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.	Translated: 2024-05-21 20:15:46 Published: 2024-05-20
# Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models ( http://arxiv.org/abs/2405.06724v2 ) License: see linked page	Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin	Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, predicted host behaviours are not always correctly described by GEMs, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges.
To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.	Translated: 2024-05-21 20:15:46 Published: 2024-05-20
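The idea of evaluating a recursive logic program with Boolean matrices can be sketched on a toy reachability program. It is only an illustration under assumptions: the datalog rules `path(X,Y) :- edge(X,Y)` and `path(X,Y) :- edge(X,Z), path(Z,Y)` and the four-node graph are hypothetical, and the code is not the authors' BMLP implementation or their GEM encoding.

```python
from typing import List

BoolMatrix = List[List[bool]]


def bool_matmul(a: BoolMatrix, b: BoolMatrix) -> BoolMatrix:
    """Boolean matrix product: (a . b)[i][j] is True if some k links i to j."""
    inner = len(b)
    cols = len(b[0])
    return [
        [any(row[k] and b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def bool_or(a: BoolMatrix, b: BoolMatrix) -> BoolMatrix:
    """Element-wise OR of two Boolean matrices of the same shape."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transitive_closure(edge: BoolMatrix) -> BoolMatrix:
    """Least fixpoint of the path/2 program, computed by repeated squaring."""
    path = [row[:] for row in edge]
    while True:
        # One step derives every path obtained by joining two known paths.
        nxt = bool_or(path, bool_matmul(path, path))
        if nxt == path:  # no new facts were derived: fixpoint reached
            return path
        path = nxt


# Hypothetical chain 0 -> 1 -> 2, with node 3 isolated.
edge = [[False] * 4 for _ in range(4)]
edge[0][1] = True
edge[1][2] = True
closure = transitive_closure(edge)
print(closure[0][2], closure[2][0])
```

Each matrix entry plays the role of one ground fact, so a whole round of rule applications becomes a single matrix product. This is one reason a matrix formulation can help with large programs.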
# Robust Semi-supervised Learning by Wisely Leveraging Open-set Data ( http://arxiv.org/abs/2405.06979v2 ) License: see linked page	Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan	Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model.
By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.	Translated: 2024-05-21 20:15:46 Published: 2024-05-20
# Deep Learning-Based Object Pose Estimation: A Comprehensive Survey ( http://arxiv.org/abs/2405.07801v2 ) License: see linked page	Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian	Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey discussing the progress made on different aspects of this area, outstanding challenges, and promising future directions is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, and reports the performance of current state-of-the-art methods on these benchmarks, thereby helping readers select the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep track of the latest work at https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation.	Translated: 2024-05-21 20:15:46 Published: 2024-05-20
# MambaOut: Do We Really Need Mamba for Vision? ( http://arxiv.org/abs/2405.07992v3 ) License: see linked page	Weihao Yu, Xinchao Wang	Mamba, an architecture with an RNN-like token mixer based on the state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task. Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task.
As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at https://github.com/yuweihao/MambaOut	Translated: 2024-05-21 20:15:46 Published: 2024-05-20
# Evaluating Google's Protected Audience Protocol ( http://arxiv.org/abs/2405.08102v2 ) License: see linked page	Minjun Long, David Evans	While third-party cookies have been a key component of the digital marketing ecosystem for years, they allow users to be tracked across web sites in ways that raise serious privacy concerns. Google has proposed the Privacy Sandbox initiative to enable ad targeting without third-party cookies. While there have been several studies focused on other aspects of this initiative, there has been little analysis to date as to how well the system achieves the intended goal of preventing request linking. This work focuses on analyzing linkage privacy risks for the reporting mechanisms proposed in the Protected Audience (PrAu) proposal (previously known as FLEDGE), which is intended to enable online remarketing without using third-party cookies. We summarize the overall workflow of PrAu and highlight potential privacy risks associated with its proposed design, focusing on scenarios in which adversaries attempt to link requests to different sites to the same user. We show how a realistic adversary would still be able to use the privacy-protected reporting mechanisms to link user requests and conduct mass surveillance, even with correct implementations of all the currently proposed privacy mechanisms.	Translated: 2024-05-21 20:06:02 Published: 2024-05-20
# Differentially Private Federated Learning: A Systematic Review ( http://arxiv.org/abs/2405.08299v3 ) License: see linked page	Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang Cao	In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident deficiency in systematic reviews that categorize and synthesize these studies. Our work presents a systematic overview of differentially private federated learning. Existing taxonomies have not adequately considered the objects and levels of privacy protection provided by various differential privacy models in federated learning. To rectify this gap, we propose a new taxonomy of differentially private federated learning based on the definitions and guarantees of various differential privacy models and federated scenarios.
Our classification allows for a clear delineation of the protected objects across various differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our work provides valuable insights into privacy-preserving federated learning and suggests practical directions for future research.	Translated: 2024-05-21 20:06:02 Published: 2024-05-20
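As a concrete reference point for the client-level, central-noise setting, one aggregation round can be sketched as clipping each client update and adding Gaussian noise to the sum. This is a generic illustration of the Gaussian mechanism, not a method taken from the survey. The function names, the `noise_multiplier` parameter, and the toy updates are assumptions, and no privacy accounting (the resulting epsilon and delta) is performed.

```python
import math
import random
from typing import List


def clip_l2(update: List[float], clip_norm: float) -> List[float]:
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_average(
    client_updates: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Average clipped client updates after adding Gaussian noise to the sum.

    Clipping bounds each client's contribution (the sensitivity) by
    clip_norm; the noise standard deviation is noise_multiplier * clip_norm.
    """
    n = len(client_updates)
    dim = len(client_updates[0])
    clipped = [clip_l2(u, clip_norm) for u in client_updates]
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / n for x in noisy]


updates = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical two-client round
print(dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=random.Random(0)))
```

Which pairs of datasets count as neighbouring (differing in one client or in one record) is the kind of distinction the taxonomy above is concerned with. This sketch corresponds to the one-client case only.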
# Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control ( http://arxiv.org/abs/2405.08366v3 ) License: see linked page	Aleksandar Makelov, George Lange, Neel Nanda	Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary learning, elusive. To address this challenge, we propose a framework for evaluating feature dictionaries in the context of specific tasks, by comparing them against supervised feature dictionaries. First, we demonstrate that supervised dictionaries achieve excellent approximation, control, and interpretability of model computations on the task. Second, we use the supervised dictionaries to develop and contextualize evaluations of unsupervised dictionaries along the same three axes. We apply this framework to the indirect object identification (IOI) task using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets. We find that these SAEs capture interpretable features for the IOI task, but they are less successful than supervised features in controlling the model.
Finally, we observe two qualitative phenomena in SAE training: feature occlusion (where a causally relevant concept is robustly overshadowed by even slightly higher-magnitude ones in the learned features), and feature over-splitting (where binary features split into many smaller, less interpretable features). We hope that our framework will provide a useful step towards more objective and grounded evaluations of sparse dictionary learning methods.	Translated: 2024-05-21 20:06:02 Published: 2024-05-20
# Reinformer: Max-Return Sequence Modeling for Offline RL ( http://arxiv.org/abs/2405.08740v2 ) License: see linked page	Zifeng Zhuang, Dengyun Peng, Jinxin Liu, Ziqi Zhang, Donglin Wang	As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as sequence modeling that conditions on hindsight information including returns, goals, or future trajectories. Although promising, this supervised paradigm overlooks the core objective of RL, which is to maximize the return. This oversight directly leads to the lack of trajectory stitching capability, which affects how well the sequence model learns from sub-optimal data. In this work, we introduce the concept of max-return sequence modeling, which integrates the goal of maximizing returns into existing sequence models. We propose Reinforced Transformer (Reinformer), indicating the sequence model is reinforced by the RL objective. Reinformer additionally incorporates the objective of maximizing returns in the training phase, aiming to predict the maximum future return within the distribution. During inference, this in-distribution maximum return will guide the selection of optimal actions. Empirically, Reinformer is competitive with classical RL methods on the D4RL benchmark and outperforms state-of-the-art sequence models, particularly in trajectory stitching ability.
Code is public at https://github.com/Dragon-Zhuang/Reinformer.	Translated: 2024-05-21 20:06:02 Published: 2024-05-20
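A standard way to pull a regressor toward the upper end of observed targets is expectile regression with a weight close to 1. The abstract does not state which loss the authors use, so treating this as the mechanism behind "predicting the maximum future return within the distribution" is an assumption. The sketch fits a single scalar to a hypothetical list of returns.

```python
from typing import List


def expectile_loss(prediction: float, targets: List[float], tau: float) -> float:
    """Asymmetric squared error: under-predictions are weighted by tau."""
    total = 0.0
    for y in targets:
        diff = y - prediction
        weight = tau if diff > 0 else 1.0 - tau
        total += weight * diff * diff
    return total / len(targets)


def fit_expectile(
    targets: List[float], tau: float, lr: float = 0.1, steps: int = 2000
) -> float:
    """Minimise the expectile loss over one scalar by gradient descent."""
    m = 0.0
    for _ in range(steps):
        grad = 0.0
        for y in targets:
            diff = y - m
            weight = tau if diff > 0 else 1.0 - tau
            grad += -2.0 * weight * diff
        m -= lr * grad / len(targets)
    return m


returns = [1.0, 2.0, 3.0, 10.0]  # hypothetical returns seen from one state
print(fit_expectile(returns, tau=0.5))   # symmetric weights recover the mean
print(fit_expectile(returns, tau=0.99))  # approaches the largest return
```

With `tau = 0.5` the loss is a scaled mean squared error, so the fit is the mean of the returns. As `tau` approaches 1, the fit moves toward the largest observed return but never exceeds it, which is what keeps the estimate inside the data distribution.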
# Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving ( http://arxiv.org/abs/2405.09049v2 ) License: see linked page	Ross Greer, Mohan Trivedi	This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the "cold start problem," while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.	Translated: 2024-05-21 20:06:02 Published: 2024-05-20
# 船衝突回避のための説明可能なAI:意思決定プロセスのデコードと行動意図 Explainable AI for Ship Collision Avoidance: Decoding Decision-Making Processes and Behavioral Intentions ( http://arxiv.org/abs/2405.09081v2 ) ライセンス: Link先を確認	Hitoshi Yoshioka, Hirotada Hashimoto,	(参考訳) 本研究は、船舶衝突回避のための説明可能なAIを開発した。当初、サブタスク批判ネットワークからなる批判ネットワークが提案され、衝突回避において各サブタスクを個別に評価し、関連するAI意思決定プロセスを明らかにする。さらに,Q値分析と注意機構を用いて行動意図を識別する試みを行った。前者は、AI行動によるQ値の増大を調べることによって意図を解釈することに焦点を当て、後者は、衝突回避のための意思決定プロセスにおいて、他の船の意義を学習目的に取り入れた。衝突回避におけるAIの行動意図は、認識された衝突の危険と他の船への注意度を組み合わせることで可視化された。提案手法は数値実験により評価した。開発されたAIは、さまざまな渋滞レベル下での衝突を安全に回避できることが確認され、AIの意思決定プロセスは人間にとって理解しやすいものになった。提案手法は,船体衝突回避タスクにおけるDRLベースのコントローラ/システム理解を容易にするだけでなく,サブタスクを構成するタスクにも拡張する。 This study developed an explainable AI for ship collision avoidance. Initially, a critic network composed of sub-task critic networks was proposed to individually evaluate each sub-task in collision avoidance to clarify the AI decision-making processes involved. Additionally, an attempt was made to discern behavioral intentions through a Q-value analysis and an Attention mechanism. The former focused on interpreting intentions by examining the increment of the Q-value resulting from AI actions, while the latter incorporated the significance of other ships in the decision-making process for collision avoidance into the learning objective. AI's behavioral intentions in collision avoidance were visualized by combining the perceived collision danger with the degree of attention to other ships. The proposed method was evaluated through a numerical experiment. The developed AI was confirmed to be able to safely avoid collisions under various congestion levels, and AI's decision-making process was rendered comprehensible to humans. The proposed method not only facilitates the understanding of DRL-based controllers/systems in the ship collision avoidance task but also extends to any task comprising sub-tasks.	翻訳日:2024-05-21 20:06:02 公開日:2024-05-20
# PolygloToxicityPrompts:大規模言語モデルにおけるニューラル毒性変性の多言語評価 PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models ( http://arxiv.org/abs/2405.09373v2 ) ライセンス: Link先を確認	Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、その広範なグローバル展開をもたらしており、安全性を確保するには包括的かつ多言語の毒性評価が必要である。しかし、既存の毒性ベンチマークは圧倒的に英語に偏っており、他の言語でLLMを展開する際に深刻なリスクをもたらす。そこで本研究では、17言語にまたがる自然発生の42万5千件のプロンプトからなる、初の大規模多言語毒性評価ベンチマークPolygloToxicityPrompts(PTP)を導入する。我々は、1億以上のWebテキスト文書を自動的にスクレイピングすることで、Webテキストに自然に発生する毒性の希少さを克服し、リソース量の異なる言語にまたがるカバレッジを確保する。 PTPを用いて60以上のLLMをベンチマークし,モデルサイズ,プロンプト言語,指示チューニングおよび選好チューニング手法が毒性に及ぼす影響を検討した。特に,言語資源が少なくなるほど,またモデルサイズが大きくなるほど,毒性が増大することがわかった。指示チューニングと選好チューニングは毒性を低下させるが、選好チューニング手法の選択は有意な影響を与えない。本研究の知見は、LLMの安全対策の重大な欠点を明らかにし、今後の研究領域を示すものである。 Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research.	翻訳日:2024-05-21 20:06:02 公開日:2024-05-20
# DemOpts:新型コロナウイルスのケース予測モデルにおける公正度補正 DemOpts: Fairness corrections in COVID-19 case prediction models ( http://arxiv.org/abs/2405.09483v2 ) ライセンス: Link先を確認	Naman Awasthi, Saad Abrar, Daniel Smolyak, Vanessa Frias-Martinez,	(参考訳) 新型コロナウイルス感染症(COVID-19)の予測モデルは、病床の配分や自宅待機命令など、リソース割り当てや介入に関する意思決定に利用されてきた。最先端のディープラーニングモデルは、COVID-19の感染者数予測を強化するために、モビリティデータや社会人口統計データなどのマルチモーダルデータを使用することが多い。しかし、関連研究は、COVID-19感染者数の過少報告バイアスと、一部の人種的・民族的マイノリティ集団におけるモビリティデータのサンプリングバイアスを明らかにしており、これらは人種ラベルに沿ったCOVID-19予測の公平性に影響しうる。本稿では、最先端のディープラーニングモデルが出力する平均予測誤差が人種・民族集団間で有意に異なり、不公平な政策決定につながりうることを示す。また、バイアスを含みうるデータセットで訓練されたディープラーニングベースの予測モデルの公平性を高めるために、新しいデバイアス手法であるDemOptsを提案する。結果として、DemOptsは他の最先端のデバイアス手法よりも優れた誤差の均等性を達成でき、より多くの人種・民族集団にわたって平均誤差分布の差を効果的に低減できることが示された。 COVID-19 forecasting models have been used to inform decisions about resource allocation and interventions, e.g., hospital beds or stay-at-home orders. State-of-the-art deep learning models often use multimodal data such as mobility or socio-demographic data to enhance COVID-19 case prediction models. Nevertheless, related work has revealed under-reporting bias in COVID-19 cases as well as sampling bias in mobility data for certain minority racial and ethnic groups, which could in turn affect the fairness of the COVID-19 predictions along race labels. In this paper, we show that state-of-the-art deep learning models output mean prediction errors that are significantly different across racial and ethnic groups, which could in turn support unfair policy decisions. We also propose a novel de-biasing method, DemOpts, to increase the fairness of deep learning based forecasting models trained on potentially biased datasets. Our results show that DemOpts can achieve better error parity than other state-of-the-art de-biasing approaches, thus effectively reducing the differences in the mean error distributions across more racial and ethnic groups.	翻訳日:2024-05-21 20:06:02 公開日:2024-05-20
# テキスト, 画像, ビデオ, 音声基礎モデルにおける幻覚の発見 : 包括的調査 Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey ( http://arxiv.org/abs/2405.09589v2 ) ライセンス: Link先を確認	Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha,	(参考訳) 言語、画像、音声、ビデオ領域にまたがるファンデーションモデル(FM)の急速な進歩は、様々なタスクにおいて顕著な能力を示している。しかし、FMの拡散は、特に高感度の応用において、幻覚出力を発生させる可能性という重要な課題を生んでいる。幻覚コンテンツを生み出す基礎モデルの傾向は、特に信頼性と精度が最重要である領域において、現実のシナリオにおいて広く採用されていることの最大の障害である。本研究は,FMにおける幻覚問題,テキスト,画像,ビデオ,オーディオモダリティの同定と緩和を目的とした最近の研究の概要を概説する。近年の幻覚の検出・緩和の進歩によって,研究者,開発者,実践者に貴重な洞察を提供することが目的である。本質的には、マルチモーダル基礎モデルの幻覚に対処するための定義、分類、検出戦略を含む明確な枠組みを確立し、この中心的な領域における将来の研究の基礎を築いた。 The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area.	翻訳日:2024-05-21 20:06:02 公開日:2024-05-20
# ニューラルパス表現を用いたテキスト・ツー・ベクター生成 Text-to-Vector Generation with Neural Path Representation ( http://arxiv.org/abs/2405.10317v2 ) ライセンス: Link先を確認	Peiying Zhang, Nanxuan Zhao, Jing Liao,	(参考訳) ベクトルグラフィックスはデジタルアートで広く使われており、そのスケーラビリティとレイヤーワイドの性質からデザイナーに好まれている。しかし、ベクトルグラフィックスの作成と編集には創造性と設計の専門知識が必要であり、時間を要する作業となっている。テキスト・ツー・ベクター(T2V)生成の最近の進歩は、このプロセスをより使いやすくすることを目的としている。しかし、既存のT2V法はベクトルグラフパスの制御点を直接最適化し、幾何学的制約が欠如しているため、しばしば交差やジャグリングの経路が生じる。これらの制約を克服するために,2分岐変分オートエンコーダ(VAE)を設計し,シーケンスと画像の両モードから経路潜時空間を学習するニューラルパス表現を提案する。ニューラルパスの組み合わせを最適化することにより、生成したSVGの表現性を保ちながら幾何的制約を組み込むことができる。さらに,生成したSVGの視覚的およびトポロジ的品質を改善するための2段階経路最適化手法を提案する。第1段階では、事前訓練されたテキスト・ツー・イメージ拡散モデルが、変分スコア蒸留(VSD)プロセスを通じて複雑なベクトルグラフィックスの初期生成を導く。第2段階では、レイヤワイズ画像ベクトル化戦略を用いてグラフィクスを洗練し、より明確な要素と構造を実現する。本手法の有効性を実験的に検証し,様々な応用例を示す。プロジェクトページはhttps://intchous.github.io/T2V-NPR。 Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods directly optimize control points of vector graphics paths, often resulting in intersecting or jagged paths due to the lack of geometry constraints. To overcome these limitations, we propose a novel neural path representation by designing a dual-branch Variational Autoencoder (VAE) that learns the path latent space from both sequence and image modalities. By optimizing the combination of neural paths, we can incorporate geometric constraints while preserving expressivity in generated SVGs. Furthermore, we introduce a two-stage path optimization method to improve the visual and topological quality of generated SVGs. 
In the first stage, a pre-trained text-to-image diffusion model guides the initial generation of complex vector graphics through the Variational Score Distillation (VSD) process. In the second stage, we refine the graphics using a layer-wise image vectorization strategy to achieve clearer elements and structure. We demonstrate the effectiveness of our method through extensive experiments and showcase various applications. The project page is https://intchous.github.io/T2V-NPR.	翻訳日:2024-05-21 19:56:17 公開日:2024-05-20
# 大規模言語モデルにおける毒性の現実的評価 Realistic Evaluation of Toxicity in Large Language Models ( http://arxiv.org/abs/2405.10659v2 ) ライセンス: Link先を確認	Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen,	(参考訳) 大規模言語モデル(LLM)は、私たちの業務のワークフローや日常生活に不可欠なものになっている。しかし、LLMに多種多様な知識を与える膨大な量のデータは、同時に避けられない毒性や偏見にもLLMをさらしている。ほとんどのLLMは有害なコンテンツの生成を防ぐための防御機構を組み込んでいるが、これらの安全対策は最小限のプロンプトエンジニアリングで容易に回避できる。本稿では,これらのモデルの保護層を無効化するよう手作業で作成したプロンプトからなる,Thoroughly Engineered Toxicity (TET)データセットを紹介する。広範な評価を通じて,TETが複数の一般的なLLMの毒性認識を評価する厳密なベンチマークとして重要な役割を果たし,通常のプロンプトでは隠れたままになりうるLLMの毒性を浮き彫りにすることを示す。 Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data that endows them with vast and diverse knowledge also exposes them to inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluating toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.	翻訳日:2024-05-21 19:56:17 公開日:2024-05-20
# ロスランドスケープにおけるデジェネリアシーを用いた機械的解釈性 Using Degeneracy in the Loss Landscape for Mechanistic Interpretability ( http://arxiv.org/abs/2405.10927v2 ) ライセンス: Link先を確認	Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn,	(参考訳) 機械的解釈可能性(Mechanistic Interpretability)は、ニューラルネットワークによって実装されたアルゴリズムを、その重みとアクティベーションを研究することによってリバースエンジニアリングすることを目的としている。逆エンジニアリングニューラルネットワークの障害は、ネットワーク内の多くのパラメータが、ネットワークによって実装されている計算に関与していないことである。これらの縮退パラメータは内部構造を難読化することができる。特異学習理論は、ニューラルネットワークのパラメータ化がより退化に偏っていること、そしてより退化性のあるパラメータ化がさらに一般化される可能性が高いことを教えてくれる。ネットワークパラメータをデジェネレーションする3つの方法として,レイヤ内のアクティベーション間の線形依存,レイヤに渡される勾配間の線形依存,データポイントの同じサブセットに発火するReLUを同定する。また、モジュラーネットワークはより退化しやすいというヒューリスティックな議論も提示し、この議論に基づいてネットワーク内のモジュールを識別する指標を開発する。縮退を利用した再パラメータ化に不変な方法でニューラルネットワークを表現できるなら、この表現はより解釈可能である可能性が高く、そのような表現がスペーサー相互作用を持つ可能性が示唆されている。本稿では,アクティベーションやジャコビアンの線形依存から退化に不変な表現を得るためのトラクタブル手法であるInteraction Basisを紹介する。 Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to generalize further. We identify 3 ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network that is based on this argument. 
We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies from linear dependence of activations or Jacobians.	翻訳日:2024-05-21 19:56:17 公開日:2024-05-20
# 局所相互作用基底:ニューラルネットワークにおける計算に関連し疎に相互作用する特徴の同定 The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks ( http://arxiv.org/abs/2405.10928v2 ) ライセンス: Link先を確認	Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn,	(参考訳) 機械的解釈可能性(Mechanistic Interpretability)は、ニューラルネットワークの内部計算をリバースエンジニアリングすることで、その振る舞いを理解することを目的としている。しかし、活性化を計算上の特徴へ分解する方法が欠けているため、現在の手法ではニューラルネットワークの活性化に対する明確な解釈を見つけることが難しい。個々のニューロンやモデルコンポーネントは、個別の特徴や機能にきれいに対応しない。本稿では,ネットワークの活性化を新たな基底であるLIB(Local Interaction Basis)に変換することによって,この制限を克服することを目指す新たな解釈可能性手法を提案する。 LIBは、無関係な活性化と相互作用を取り除くことで、計算上の特徴を識別することを目指す。本手法は, 無関係な活性化方向を除去し, 隣接する層間のヤコビ行列の特異ベクトルに基底を揃える。また、下流の計算に対する重要度に基づいて特徴をスケールし、モデル内の計算に関連するすべての特徴と相互作用を示す相互作用グラフを生成する。モジュラー加算モデルおよびCIFAR-10モデルでLIBの有効性を評価した結果,主成分分析と比較して,計算により関連し,より疎に相互作用する特徴を同定できた。しかし、LIBは言語モデルに適用した場合、解釈可能性や相互作用のスパース性を大幅に改善するものではない。我々は、LIBはニューラルネットワークを解析するための有望な理論駆動型アプローチであるが、現在の形式では大規模言語モデルには適用できないと結論付けた。 Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering their internal computations. However, current methods struggle to find clear interpretations of neural network activations because a decomposition of activations into computational features is missing. Individual neurons or model components do not cleanly correspond to distinct features or functions. We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. Our method drops irrelevant activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features based on their importance for downstream computation, producing an interaction graph that shows all computationally-relevant features and interactions in a model. We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models, finding that it identifies more computationally-relevant features that interact more sparsely, compared to principal component analysis. However, LIB does not yield substantial improvements in interpretability or interaction sparsity when applied to language models. We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.	翻訳日:2024-05-21 19:56:17 公開日:2024-05-20
# OpenRLHF: 使いやすくスケーラブルで高性能なRLHFフレームワーク OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework ( http://arxiv.org/abs/2405.11143v1 ) ライセンス: Link先を確認	Jian Hu, Xibin Wu, Weixun Wang, Xianyu, Dehao Zhang, Yu Cao,	(参考訳) 大規模言語モデル(LLM)は法則のスケーリングによって成長し続けており、人間のフィードバックからの強化学習(RLHF)はその卓越した性能のために大きな注目を集めている。しかし、1つのモデルの事前訓練や微調整とは異なり、人間のフィードバック(RLHF)からの強化学習を拡大して、大きな言語モデルをトレーニングすることは、4つのモデル間で協調的な課題を引き起こす。提案するOpenRLHFは,効率的なRLHFスケーリングを実現するオープンソースフレームワークである。同じGPU上で4つのモデルを同時に配置する既存のRLHFフレームワークとは異なり、OpenRLHFは、Ray、vLLM、DeepSpeedを使用して70Bパラメータを超えるモデルのスケジューリングを再設計し、リソース利用の改善と多様なトレーニングアプローチを活用する。 Hugging Faceとシームレスに統合されたOpenRLHFは、最適化されたアルゴリズムとローンチスクリプトを備えたアウト・オブ・ボックスソリューションを提供する。 OpenRLHFはRLHF、DPO、拒絶サンプリング、その他のアライメント技術を実装している。 OpenRLHF のコードは https://github.com/OpenLLMAI/OpenRLHF で公開されている。 As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF.	翻訳日:2024-05-21 19:17:16 公開日:2024-05-20
# QComp: 医薬品発見のためのQSARベースのデータ補完フレームワーク QComp: A QSAR-Based Data Completion Framework for Drug Discovery ( http://arxiv.org/abs/2405.11703v1 ) ライセンス: Link先を確認	Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu,	(参考訳) 薬物発見において、in vitroおよびin vivo実験は化合物の有効性と毒性に関連する生化学的活性を明らかにする。実験データは、巨大な、絶え間なく進化し、スパースなデータセットに蓄積される。化合物の構造情報のみを用いて生化学的活動を予測する定量的構造-活性関係モデル(QSAR)は、研究の進展に伴い、進化する実験データを統合する上での課題に直面している。この問題に対処するデータ補完フレームワークであるQSAR-Complete (QComp) を開発した。既存のQSARモデルに基づいて、QCompは実験データに固有の相関を利用して、様々なタスクにおける予測精度を向上させる。さらに、QCompは、特定のエンドポイントに対する統計的不確実性の低下を定量化し、薬物発見プロセス全体を通して合理的な意思決定を支援することによって、実験の最適なシーケンスを導くための有望なツールとして出現する。 In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.	翻訳日:2024-05-21 14:43:16 公開日:2024-05-20
# 自然言語処理タスクにおけるディープラーニングに基づく大規模言語モデルの効率最適化 Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks ( http://arxiv.org/abs/2405.11704v1 ) ライセンス: Link先を確認	Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang,	(参考訳) 大規模言語モデルの内部構造と操作機構は理論的に解析され、特にTransformerとその派生アーキテクチャは、長期依存を捕捉しながら計算効率を抑えることができる。さらに、トレーニングフェーズの効率ボトルネックを深く掘り下げ、適応最適化アルゴリズム(AdamWなど)、大規模並列計算技術、収束の加速とメモリフットプリントの削減を目的とした混合精度トレーニング戦略の貢献度を詳細に評価する。これらのアルゴリズムの数学的原理と実装の詳細を解析することにより、実際にトレーニング効率を効果的に改善する方法について明らかにする。モデル配置と推論最適化の観点で,本論文はモデル圧縮技術の最新の進歩を体系的にレビューし,定量化,プルーニング,知識蒸留といった戦略に焦点をあてる。これらの手法の理論的枠組みと異なるアプリケーションシナリオにおけるそれらの効果を比較することにより、モデル予測精度を維持しながら、モデルサイズと推論遅延を著しく低減する能力を示す。さらに, オーバーフィッティングのリスクの増加, 圧縮後の性能損失の制御, アルゴリズムの汎用性の問題など, 現在の効率最適化手法の限界を批判的に検討し, 今後の研究の展望について述べる。本研究は,大規模言語モデルの効率最適化を理解するための包括的な理論的枠組みを提供する。 The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we dig deep into the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies to accelerate convergence and reduce memory footprint. By analyzing the mathematical principles and implementation details of these algorithms, we reveal how they effectively improve training efficiency in practice. In terms of model deployment and inference optimization, this paper systematically reviews the latest advances in model compression techniques, focusing on strategies such as quantification, pruning, and knowledge distillation. 
By comparing the theoretical frameworks of these techniques and their effects in different application scenarios, we demonstrate their ability to significantly reduce model size and inference delay while maintaining model prediction accuracy. In addition, this paper critically examines the limitations of current efficiency optimization methods, such as the increased risk of overfitting, the control of performance loss after compression, and the problem of algorithm generality, and proposes some prospects for future research. In conclusion, this study provides a comprehensive theoretical framework for understanding the efficiency optimization of large-scale language models.	翻訳日:2024-05-21 14:43:16 公開日:2024-05-20
# スクイージング誘起量子拡大多相推定の原理について On the principle of squeezing-induced quantum-enhanced multiphase estimation ( http://arxiv.org/abs/2405.11705v1 ) ライセンス: Link先を確認	Le Bin Ho,	(参考訳) 本研究は,多相量子メートル法におけるスキューズ技術による測定精度の向上について検討する。これらの手法は単相推定においてよく研究され, 有効利用されているが, 多相状態における利用は未だ検討されていない。これらのシナリオにおける量子エンハンスメントのメカニズムを調べることで、このギャップを埋める。我々の分析は、量子クレーマー・ラオ境界を達成するための最適条件に関する理論的および数値的な洞察を与え、スクイーズによる量子拡大多相推定の可能性とメカニズムを理解するのに役立ちます。この研究は量子力学とセンシング技術の進歩の新たな可能性を開く。 We investigate how squeezing techniques can improve measurement precision in multiphase quantum metrology. While these methods are well-studied and used effectively in single-phase estimations, their use in multiphase situations has not been examined yet. We fill this gap by investigating the mechanism of quantum enhancement in these scenarios. Our analysis provides theoretical and numerical insights into the optimal condition for achieving the quantum Cramer-Rao bound, helping us understand the potential and mechanism for quantum-enhanced multiphase estimations with squeezing. This research opens up new possibilities for advancements in quantum metrology and sensing technologies.	翻訳日:2024-05-21 14:43:16 公開日:2024-05-20
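For reference, the quantum Cramér-Rao bound named in the abstract above has the following standard textbook form. The notation is introduced here for illustration and is not taken from the paper: $\boldsymbol{\phi} = (\phi_1, \dots, \phi_d)$ are the phases to be estimated, $M$ is the number of independent repetitions, and $F_Q$ is the quantum Fisher information matrix of the probe state $|\psi_{\boldsymbol{\phi}}\rangle$ (assumed pure).

$$
\mathrm{Cov}\big(\hat{\boldsymbol{\phi}}\big) \;\succeq\; \frac{1}{M}\, F_Q^{-1},
\qquad
[F_Q]_{jk} = 4\,\mathrm{Re}\Big( \langle \partial_j \psi_{\boldsymbol{\phi}} | \partial_k \psi_{\boldsymbol{\phi}} \rangle
- \langle \partial_j \psi_{\boldsymbol{\phi}} | \psi_{\boldsymbol{\phi}} \rangle \langle \psi_{\boldsymbol{\phi}} | \partial_k \psi_{\boldsymbol{\phi}} \rangle \Big),
\qquad \partial_j \equiv \frac{\partial}{\partial \phi_j}.
$$

For a single phase the bound reduces to $\mathrm{Var}(\hat{\phi}) \ge 1/(M F_Q)$. With several phases the matrix inequality need not be saturable by any single measurement, which is why the conditions for attaining it are a question in their own right.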
# 質問応答のためのLLM精度の向上:救助へのオントロジー! Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! ( http://arxiv.org/abs/2405.11706v1 ) ライセンス: Link先を確認	Dean Allemang, Juan Sequeda,	(参考訳) エンタープライズSQLデータベースの知識グラフ/意味表現を利用する(Text-to-SPARQL)、LLM(Large Language Models)を用いた質問応答(QA)システムは、SQLデータベース上で直接質問に答えるシステム(Text-to-SQL)に比べて精度が高いという証拠が増えている。我々の以前のベンチマーク研究では、知識グラフを用いることで精度が16%から54%に向上した。残る疑問は、精度をさらに改善し、エラー率を下げるにはどうすればよいかである。 LLMが生成した不正確なSPARQLクエリは誤った経路を辿っていたという先行研究の観察に基づき、次の2つからなるアプローチを提示する。 1)オントロジーに基づくクエリチェック(OBQC):知識グラフのオントロジーを利用して、LLMが生成したSPARQLクエリがオントロジーの意味と一致するかを確認し、エラーを検出する。 2) LLM修復: エラーの説明をLLMに与えてSPARQLクエリを修復する。 chat with the data ベンチマークを用いた主な結果として、我々のアプローチは全体の精度を72%に向上させ、さらに8%が「わからない」という不明の結果となった。したがって、全体のエラー率は20%である。これらの結果は、知識グラフ、すなわちオントロジーへの投資が、LLMを用いた質問応答システムにより高い精度をもたらすというさらなる証拠を与える。 There is increasing evidence that question-answering (QA) systems with Large Language Models (LLMs), which employ a knowledge graph/semantic representation of an enterprise SQL database (i.e., Text-to-SPARQL), achieve higher accuracy compared to systems that answer questions directly on SQL databases (i.e., Text-to-SQL). Our previous benchmark research showed that by using a knowledge graph, the accuracy improved from 16% to 54%. The question remains: how can we further improve the accuracy and reduce the error rate? Building on the observations of our previous research where the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC): detects errors by leveraging the ontology of the knowledge graph to check if the LLM-generated SPARQL query matches the semantics of the ontology and 2) LLM Repair: uses the error explanations with an LLM to repair the SPARQL query. Using the chat with the data benchmark, our primary finding is that our approach increases the overall accuracy to 72% including an additional 8% of "I don't know" unknown results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, provides higher accuracy for LLM-powered question answering systems.	翻訳日:2024-05-21 14:43:16 公開日:2024-05-20
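The abstract above does not spell out the rules OBQC applies, so the following is only a minimal sketch of one plausible kind of ontology check: comparing each triple pattern of a generated query against the domain and range declared for its property. The toy ontology, the tuple representation of triple patterns, and the function name `check_query` are all hypothetical and chosen for illustration; a real system would parse actual SPARQL and read an OWL/RDFS ontology.

```python
# Hypothetical toy ontology: property -> (domain class, range class).
ONTOLOGY = {
    "soldByAgent": ("Policy", "Agent"),
    "hasClaim": ("Policy", "Claim"),
}

RDF_TYPE = "a"  # SPARQL shorthand for rdf:type


def check_query(triples, ontology=ONTOLOGY):
    """Return human-readable explanations of domain/range violations.

    `triples` is a list of (subject, predicate, object) patterns, a
    simplified stand-in for the body of an LLM-generated SPARQL query.
    """
    # Collect the class each variable is declared to have (?x a Class).
    var_types = {s: o for s, p, o in triples if p == RDF_TYPE}
    errors = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if p not in ontology:
            errors.append(f"property '{p}' is not defined in the ontology")
            continue
        domain, range_ = ontology[p]
        # Only variables with a declared type can be checked.
        if s in var_types and var_types[s] != domain:
            errors.append(
                f"'{p}' expects a subject of class {domain}, "
                f"but {s} is declared as {var_types[s]}"
            )
        if o in var_types and var_types[o] != range_:
            errors.append(
                f"'{p}' expects an object of class {range_}, "
                f"but {o} is declared as {var_types[o]}"
            )
    return errors
```

In a repair loop of the kind the abstract describes, the returned explanations would be appended to a follow-up prompt asking the LLM to rewrite the query. An empty list means this particular check found nothing to object to, not that the query is correct.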
# 敵対的ロバスト性のための適応バッチ正規化ネットワーク Adaptive Batch Normalization Networks for Adversarial Robustness ( http://arxiv.org/abs/2405.11708v1 ) ライセンス: Link先を確認	Shao-Yuan Lo, Vishal M. Patel,	(参考訳) ディープネットワークは敵対的サンプルに弱い。敵対的訓練(AT)は、その顕著な効果から、現代の敵対的防御手法の標準的基盤となっている。しかし、ATは極めて時間がかかり、実用アプリケーションへの広範な展開の妨げとなっている。本稿では,非AT防御を目標として,ATを用いずに強力な敵対的攻撃に対して頑健な防御手法をいかに設計するかを問う。この問いに答えるために、テスト時ドメイン適応の最近の進歩に着想を得た適応バッチ正規化(BN)を利用する。これに基づき,適応バッチ正規化ネットワーク(ABNN)と呼ばれる新しい防御手法を提案する。 ABNNは、事前学習済みの代替モデルを使用してクリーンなBN統計を生成し、ターゲットモデルに送信する。ターゲットモデルはクリーンなデータのみで訓練され、代替モデルのBN統計に整合するよう学習する。実験結果から、ABNNは画像データセットとビデオデータセットの両方で、デジタル攻撃と物理的に実現可能な攻撃の双方に対して、一貫して敵対的ロバスト性を改善することが示された。さらに、ATベースのアプローチに比べて、ABNNはクリーンデータでより高い性能を達成し、トレーニング時間の複雑さを大幅に低減できる。 Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, which hinders its wide deployment in practical applications. In this paper, we aim at a non-AT defense: how can we design a defense method that dispenses with AT yet remains robust against strong adversarial attacks? To answer this question, we resort to adaptive Batch Normalization (BN), inspired by the recent advances in test-time domain adaptation. We propose a novel defense accordingly, referred to as the Adaptive Batch Normalization Network (ABNN). ABNN employs a pre-trained substitute model to generate clean BN statistics and sends them to the target model. The target model is exclusively trained on clean data and learns to align the substitute model's BN statistics. Experimental results show that ABNN consistently improves adversarial robustness against both digital and physically realizable attacks on both image and video datasets. Furthermore, ABNN can achieve higher clean data performance and significantly lower training time complexity compared to AT-based approaches.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# 証明できない信頼:教育技術習得実践におけるプライバシーとセキュリティのハードル Trust, Because You Can't Verify:Privacy and Security Hurdles in Education Technology Acquisition Practices ( http://arxiv.org/abs/2405.11712v1 ) ライセンス: Link先を確認	Easton Kelso, Ananta Soneji, Sazzadur Rahaman, Yan Soshitaishvili, Rakibul Hasan,	(参考訳) 高等教育機関(HEI)では教育技術(EdTech)の展望が急速に拡大している。この成長は膨大な複雑さをもたらす。これらのツールによって収集された広範なデータを保護することは、HEIにとって非常に重要です。データ侵害や誤用によるプライバシのインシデントは、データ被写体、特にこれらのツールを使わざるを得ない学生に、セキュリティとプライバシの重大な影響をもたらす可能性がある。これにより、HEIとEdTechベンダーのダイナミクスの深い理解が促される。このギャップに対処するため、私たちは7つのHEIでEdTechのリーダーシップの役割を担っている13人の参加者を対象に、半構造化されたインタビュー調査を実施します。本研究は、HEIにおけるEdTechの買収プロセス、そのプロセス全体にわたるセキュリティとプライバシの問題の検討、サービス契約における適切なセキュリティとプライバシ保護機構を確立する際のHEI職員の問題点、システムとパワー非対称性の視認性の欠如によるベンダーの責任を負うことの難しさなどを明らかにする。現状に関する一定の考察を議論し、状況を改善するための勧告を締めくくる。 The education technology (EdTech) landscape is expanding rapidly in higher education institutes (HEIs). This growth brings enormous complexity. Protecting the extensive data collected by these tools is crucial for HEIs. Privacy incidents of data breaches and misuses can have dire security and privacy consequences on the data subjects, particularly students, who are often compelled to use these tools. This urges an in-depth understanding of HEI and EdTech vendor dynamics, which is largely understudied. To address this gap, we conduct a semi-structured interview study with 13 participants who are in the EdTech leadership roles at seven HEIs. Our study uncovers the EdTech acquisition process in the HEI context, the consideration of security and privacy issues throughout that process, the pain points of HEI personnel in establishing adequate security and privacy protection mechanisms in service contracts, and their struggle in holding vendors accountable due to a lack of visibility into their system and power-asymmetry, among other reasons. We discuss certain observations about the status quo and conclude with recommendations to improve the situation.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# グラフにおける臨界接続のための分散プライバシ保護 Decentralized Privacy Preservation for Critical Connections in Graphs ( http://arxiv.org/abs/2405.11713v1 ) ライセンス: Link先を確認	Conggai Li, Wei Ni, Ming Ding, Youyang Qu, Jianjun Chen, David Smith, Wenjie Zhang, Thierry Rakotoarivelo,	(参考訳) 実体間の多くの実世界の相互接続はグラフとして特徴づけられる。プライバシーとデータユーティリティのバランスがとれたローカルグラフ情報の収集は、最近注目を浴びている。本稿では,結合的なサブグラフ探索に基づいて,個々の参加者に対するエンティティ接続の重要情報を識別し,保護する問題について考察する。この問題は文学では解決されていない。この問題に対処するために,我々は,$p$-cohesion として知られる要塞状粘着部分グラフモデルを用いて,クエリド頂点の臨界接続を抽出することを提案する。要塞内のユーザ接続は、解放されると難読化され、ユーザに関する重要な情報を保護する。新たなメリットとペナルティスコア関数は、各参加者の臨界接続を最小の$p$結合で測定し、コネクションの効果的な識別を容易にするように設計されている。さらに,データ収集者によるクエリに対する応答において,重要な接続のみを保護し,検索した頂点のプライバシを保護することを提案する。分散ディファレンシャルプライバシ(DDP)メカニズムの下では、重要な接続が保護され、残りの接続が未飽和状態にある場合に、その応答が$(\varepsilon, \delta)$-DDPを満たすことが証明されている。提案手法の有効性は,実生活グラフデータセットを用いた広範囲な実験により実証された。 Many real-world interconnections among entities can be characterized as graphs. Collecting local graph information with balanced privacy and data utility has garnered notable interest recently. This paper delves into the problem of identifying and protecting critical information of entity connections for individual participants in a graph based on cohesive subgraph searches. This problem has not been addressed in the literature. To address the problem, we propose to extract the critical connections of a queried vertex using a fortress-like cohesive subgraph model known as $p$-cohesion. A user's connections within a fortress are obfuscated when being released, to protect critical information about the user. Novel merit and penalty score functions are designed to measure each participant's critical connections in the minimal $p$-cohesion, facilitating effective identification of the connections. We further propose to preserve the privacy of a vertex enquired by only protecting its critical connections when responding to queries raised by data collectors. 
We prove that, under the decentralized differential privacy (DDP) mechanism, one's response satisfies $(\varepsilon, \delta)$-DDP when its critical connections are protected while the rest remains unperturbed. The effectiveness of our proposed method is demonstrated through extensive experiments on real-life graph datasets.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
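The idea of perturbing only the critical connections while releasing the rest unchanged can be illustrated with plain randomized response. This is a swapped-in textbook mechanism, not the paper's $(\varepsilon, \delta)$-DDP construction, and it assumes the critical slots have already been identified (the paper does this via the minimal $p$-cohesion, which is not implemented here). The function name and arguments are hypothetical.

```python
import math
import random


def release_neighbors(neighbors, critical, candidates, epsilon, rng=None):
    """Release one vertex's adjacency set, randomizing only critical slots.

    neighbors  : set of vertices truly adjacent to the queried vertex
    critical   : set of candidate vertices whose adjacency bit is protected
                 (assumed to be given)
    candidates : iterable of all vertices the collector may ask about
    epsilon    : privacy parameter of randomized response per protected bit
    """
    rng = rng or random.Random()
    # Randomized response: report the true bit with probability e^eps / (1 + e^eps).
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    released = set()
    for v in candidates:
        bit = v in neighbors
        if v in critical and rng.random() >= keep_prob:
            bit = not bit  # flip the protected bit
        if bit:
            released.add(v)
    return released
```

Non-critical slots pass through untouched, so utility loss is confined to the protected bits. A smaller `epsilon` flips those bits more often.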
# LLMインフォームドPOI分類を用いた意味軌道データマイニング Semantic Trajectory Data Mining with LLM-Informed POI Classification ( http://arxiv.org/abs/2405.11715v1 ) ライセンス: Link先を確認	Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma,	(参考訳) 人の移動軌跡のマイニングは、交通システムにとって重要であり、経路最適化、交通管理、人の移動パターンの研究を向上させる。意味情報を統合しない従来のルールベースのアプローチは、効率と精度の両方に限界がある。 Points of Interest(POI)データから推定される活動タイプのような意味情報は、軌跡マイニングの質を大幅に向上させることができる。しかし、多くのPOIには不完全な特徴情報があり、現在の学習ベースのPOIアルゴリズムは分類を行うためにデータセットの完全性を必要とするため、これらの洞察を統合することは難しい。本稿では,人の移動軌跡マイニングのための新しいパイプラインを提案する。提案手法はまず,大規模言語モデル(LLM)の強い推論・理解能力を利用してPOIに活動タイプを付与し,次にベイズに基づくアルゴリズムを用いて軌跡中の各滞在点の活動を推定する。 OpenStreetMap (OSM) POI データセットを用いた評価では,POI 分類で 93.4% の精度と 96.1% のF-1 スコア,活動推定で 91.7% の精度と 92.3% のF-1 スコアを得た。 Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# 安全な強化学習のための実現可能性整合表現学習 Feasibility Consistent Representation Learning for Safe Reinforcement Learning ( http://arxiv.org/abs/2405.11718v1 ) ライセンス: Link先を確認	Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao,	(参考訳) 安全強化学習(RL)の分野では、安全制約を満たすことと報酬性能を最適化することのバランスを見つけることが大きな課題である。この取り組みにおける重要な障害は、安全制約の推定であり、通常は、制約信号のスパースな性質から報酬の計量を推定するよりも難しい。この問題に対処するため,FCSRL(Feasibility Consistent Safe Reinforcement Learning)という新しいフレームワークを導入する。本フレームワークは、表現学習と実現可能性指向の目的を組み合わせることで、安全RLのために生の状態から安全関連情報を識別し、抽出する。自己教師あり学習技術とより学習しやすい安全指標を活用して,政策学習と制約推定を強化する。ベクトル状態および画像に基づくタスクの多岐にわたる経験的評価は,本手法が従来の表現学習ベースラインよりも優れた安全性考慮型の埋め込みを学習し,優れた性能を実現することができることを示す。 In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances the policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# 非アベリア自己補正量子メモリ Non-Abelian Self-Correcting Quantum Memory ( http://arxiv.org/abs/2405.11719v1 ) ライセンス: Link先を確認	Po-Shen Hsin, Ryohei Kobayashi, Guanyu Zhu,	(参考訳) 局所可換な非パウリ安定化格子モデルと$\mathbb{Z}_2^3$高次ゲージ場の非自明なトポロジカル作用を持つ場理論を用いて、粒子励起のない時空次元 $D\geq 5+1$ で無限に多くの新しい候補非アベリア自己補正型トポロジカル量子メモリの族を構築する。このような非パウリ安定化器モデルをマジック安定化器符号と呼ぶ。トポロジカル順序の族は、アベリアの電気励起とアイシングのような融合則に従う非アベリア磁気励起を持ち、二面体群 $\mathbb{D}_8$ゲージ理論を2+1d で一般化する。最も単純な例は、アベリアループ励起と非アベリア膜励起を含む5+1dの新しい非アベリア自己補正メモリである。我々は、Peierls引数を用いて自己補正特性と熱安定性を示し、確率的局所セル-オートマトンデコーダを考案する。 We construct a family of infinitely many new candidate non-Abelian self-correcting topological quantum memories in $D\geq 5+1$ spacetime dimensions without particle excitations using local commuting non-Pauli stabilizer lattice models and field theories of $\mathbb{Z}_2^3$ higher-form gauge fields with nontrivial topological action. We call such non-Pauli stabilizer models magic stabilizer codes. The family of topological orders have Abelian electric excitations and non-Abelian magnetic excitations that obey Ising-like fusion rules, generalizing the dihedral group $\mathbb{D}_8$ gauge theory in 2+1d. The simplest example includes a new non-Abelian self-correcting memory in 5+1d with Abelian loop excitations and non-Abelian membrane excitations. We use a Peierls argument to demonstrate the self-correction property and the thermal stability, and devise a probabilistic local cellular-automaton decoder.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# UAV Swarmの軌道予測と最適化のためのAIアルゴリズム AI Algorithm for Predicting and Optimizing Trajectory of UAV Swarm ( http://arxiv.org/abs/2405.11722v1 ) ライセンス: Link先を確認	Amit Raj, Kapil Ahuja, Yann Busnel,	(参考訳) 本稿では,無人航空機(UAV)の航路生成における人工知能(AI)技術の適用について検討する。 2つの主な課題は、UAVの経路を正確に予測し、それらの衝突を効率的に回避することである。まず,1つの隠れ層を持つフィードフォワードニューラルネットワーク(FFNN)に多様な活性化関数を体系的に適用し,予測経路の精度を従来よりも向上させる。次に,SwishとElliottの活性化関数を融合し,スケール・シフトされたガウス成分と統合した新しい活性化関数AdaptoSwelliGaussを導入する。 Swishは滑らかな遷移を促進し、Elliottは突然の軌道変化を捉え、スケール・シフトされたガウス成分はノイズに対する堅牢性を高める。このダイナミックな組み合わせは、UAV軌道予測の複雑さを捉えるために特別に設計されている。この新たな活性化関数は、既存の活性化関数よりもかなり精度が高い。第3に,UAVの軌道変更と出発時刻の変更(バッチングとも呼ばれる)という2つの相補的な衝突回避手法を統合した新たな統合衝突検出・回避・バッチング(ICDAB)戦略を提案する。この統合は、最初のテクニックで過度に複雑なパスを避けるトラジェクトリ操作の数を減らすことと、第2のテクニックで全体の離陸時間を短縮するバッチサイズを小さくするという、両方の欠点を克服するのに役立ちます。 This paper explores the application of Artificial Intelligence (AI) techniques for generating the trajectories of fleets of Unmanned Aerial Vehicles (UAVs). The two main challenges addressed include accurately predicting the paths of UAVs and efficiently avoiding collisions between them. Firstly, the paper systematically applies a diverse set of activation functions to a Feedforward Neural Network (FFNN) with a single hidden layer, which enhances the accuracy of the predicted path compared to previous work. Secondly, we introduce a novel activation function, AdaptoSwelliGauss, which is a sophisticated fusion of Swish and Elliott activations, seamlessly integrated with a scaled and shifted Gaussian component. Swish facilitates smooth transitions, Elliott captures abrupt trajectory changes, and the scaled and shifted Gaussian enhances robustness against noise. This dynamic combination is specifically designed to excel in capturing the complexities of UAV trajectory prediction. This new activation function gives substantially better accuracy than all existing activation functions. 
Thirdly, we propose a novel Integrated Collision Detection, Avoidance, and Batching (ICDAB) strategy that merges two complementary UAV collision avoidance techniques: changing UAV trajectories and altering their starting times, also referred to as batching. This integration helps overcome the disadvantages of both - reduction in the number of trajectory manipulations, which avoids overly convoluted paths in the first technique, and smaller batch sizes, which reduce overall takeoff time in the second.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
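The abstract above names the ingredients of AdaptoSwelliGauss (Swish, Elliott, and a scaled and shifted Gaussian) but does not state how they are combined. The sketch below only illustrates the three building blocks using their commonly cited definitions; `blended_activation` and its weights are hypothetical and are not the paper's formula.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x); smooth everywhere."""
    return x * _sigmoid(beta * x)


def elliott(x: float) -> float:
    """Symmetric Elliott function: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def gaussian(x: float, scale: float = 1.0, shift: float = 0.0, width: float = 1.0) -> float:
    """Scaled and shifted Gaussian bump: scale * exp(-(x - shift)^2 / (2 * width^2))."""
    return scale * math.exp(-((x - shift) ** 2) / (2.0 * width ** 2))


def blended_activation(x: float, w_swish: float = 0.5, w_elliott: float = 0.5,
                       w_gauss: float = 0.1) -> float:
    """Hypothetical weighted blend for illustration; NOT the paper's AdaptoSwelliGauss."""
    return w_swish * swish(x) + w_elliott * elliott(x) + w_gauss * gaussian(x)
```

A fixed weighted sum is the simplest way to show the three terms side by side; how the paper actually combines or parameterizes them is not given in the abstract.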
# 一般高次元分類枠組みにおける非微分可能サロゲート損失の推論 Inference with non-differentiable surrogate loss in a general high-dimensional classification framework ( http://arxiv.org/abs/2405.11723v1 ) ライセンス: Link先を確認	Muxuan Liang, Yang Ning, Maureen A Smith, Ying-Qi Zhao,	(参考訳) 置換損失関数によるペナル化された経験的リスク最小化は、分類問題において高次元線形決定則を導出するためにしばしば用いられる。文献の多くは一般化誤差に焦点を当てているが、特に代理損失が微分不可能な場合、推定決定規則の駆動要因を特定する有効な推論手順が欠如している。本研究では,不連続な勾配と非正則なヘッセン性を有する一方向線形サロゲート損失を用いて推定した線形決定規則に対する仮説テストと区間推定を構築するために,カーネルスムースな非相関スコアを提案する。具体的には、不連続点付近の不連続勾配を滑らかにするためにカーネル近似を採用し、サロゲート損失の非正則ヘシアンを近似する。追加のニュアンスパラメータが関与するアプリケーションでは、フレキシブルなニュアンス推定とカーネル近似に対応するために、新しいクロスフィットバージョンを提案する。カーネルスムースなデコラートスコアとそのクロスフィットバージョンを高次元設定で限定分布として確立する。提案手法の有効性と優位性を示すため,シミュレーションおよび実データ解析を行った。 Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a lack of valid inference procedures to identify the driving factors of the estimated decision rule, especially when the surrogate loss is non-differentiable. In this work, we propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations for the linear decision rule estimated using a piece-wise linear surrogate loss, which has a discontinuous gradient and non-regular Hessian. Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points and approximate the non-regular Hessian of the surrogate loss. In applications where additional nuisance parameters are involved, we propose a novel cross-fitted version to accommodate flexible nuisance estimates and kernel approximations. We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup. Simulation and real data analysis are conducted to demonstrate the validity and superiority of the proposed method.	
翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# 大規模言語モデルのためのトークン単位の影響力のある学習データ検索 Token-wise Influential Training Data Retrieval for Large Language Models ( http://arxiv.org/abs/2405.11724v1 ) ライセンス: Link先を確認	Huawei Lin, Jikai Long, Zhaozhuo Xu, Weijie Zhao,	(参考訳) LLM(Large Language Model)の生成を前提として、この生成に繋がったトレーニングデータをどのように特定すればよいのか? 本稿では,LLMに適応したスケーラブルなフレームワークであるRapidInを提案し,各学習データの影響を推定する。提案するフレームワークは,キャッシュと検索という2つのステージで構成されている。まず、勾配ベクトルを200,000倍以上圧縮し、ディスクやGPU/CPUメモリにキャッシュする。次に、生成が与えられると、RapidInはキャッシュされた勾配を効率よく走査し、数分で影響を推定し、6,326倍以上のスピードアップを達成する。さらに、RapidInはマルチGPU並列化をサポートし、キャッシュと検索を大幅に高速化する。実験の結果,RapidInの効率性と有効性を確認した。 Given a Large Language Model (LLM) generation, how can we identify which training data led to this generation? In this paper, we propose RapidIn, a scalable framework adapting to LLMs for estimating the influence of each training example. The proposed framework consists of two stages: caching and retrieval. First, we compress the gradient vectors by over 200,000x, allowing them to be cached on disk or in GPU/CPU memory. Then, given a generation, RapidIn efficiently traverses the cached gradients to estimate the influence within minutes, achieving over a 6,326x speedup. Moreover, RapidIn supports multi-GPU parallelization to substantially accelerate caching and retrieval. Our empirical result confirms the efficiency and effectiveness of RapidIn.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
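The two stages described above (cache compressed gradients, then traverse the cache for a given generation) can be sketched as follows. The abstract does not say how RapidIn compresses gradients or scores influence, so this sketch assumes a random sign projection for compression and a dot product between compressed gradients as the influence score; all function names are hypothetical.

```python
import random


def make_projection(dim_in, dim_out, seed=0):
    """Random sign projection matrix (an assumed stand-in for the real compressor)."""
    rng = random.Random(seed)
    scale = 1.0 / dim_out ** 0.5
    return [[rng.choice((-scale, scale)) for _ in range(dim_in)]
            for _ in range(dim_out)]


def compress(grad, proj):
    """Caching stage: map a long gradient vector to a short one."""
    return [sum(p * g for p, g in zip(row, grad)) for row in proj]


def influence_scores(query_grad, cache, proj):
    """Retrieval stage: dot product of the compressed query gradient with each cached one."""
    q = compress(query_grad, proj)
    return {name: sum(a * b for a, b in zip(q, c)) for name, c in cache.items()}


def top_influential(scores, k=1):
    """Names of the k training examples with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The design point this illustrates is that the expensive part (computing and compressing training gradients) happens once, while each query only needs one compression and a pass over short cached vectors.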
# 強化学習を加速するハイウェイグラフ Highway Graph to Accelerate Reinforcement Learning ( http://arxiv.org/abs/2405.11727v1 ) ライセンス: Link先を確認	Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Q. Shi,	(参考訳) 強化学習(RL)アルゴリズムは訓練効率の低下に悩まされることが多い。この問題を緩和するための戦略は、モンテカルロ木探索(MCTS)や価値反復(VI)といったモデルベースの計画アルゴリズムを環境モデルに組み込むことである。 VIの最大の制限は、大きなテンソルを反復する必要があることである。これらはいまだに集中的な計算に繋がる。本稿では,RLアルゴリズムの学習効率を向上させることにより,RLアルゴリズムの学習効率を向上させることに注力する。離散状態と作用空間を持つ決定論的環境において、遷移の非分岐列は中間状態から逸脱することなくエージェントを移動させ、これをハイウェイと呼ぶ。このような非分岐ハイウェイでは、値更新プロセスは1ステップのプロセスとしてマージすることができる。そこで本研究では,状態遷移をモデル化するための新しいグラフ構造であるハイウェイグラフを提案する。我々のハイウェイグラフは遷移モデルを簡潔なグラフに圧縮し、エッジは複数の状態遷移を表現し、各イテレーションで複数の時間ステップで値の伝搬をサポートする。これにより、ハイウェイグラフ上でのVIアルゴリズムの促進により、より効率的な価値学習手法を得ることができる。ハイウェイグラフをRL(モデルに基づくオフポリシーRL法)に統合することにより、初期の段階(100万フレーム)においてRLトレーニングを著しく加速することができる。その結果,提案手法はモデルフリー・モデルベースRLアルゴリズムとモデルフリー・モデルベースRLアルゴリズムの両方に優れており,同等あるいは優れたリターンを維持しつつ10～150倍以上の効率性を示した。さらに、ディープニューラルネットワークベースのエージェントをハイウェイグラフを使用してトレーニングすることで、より一般化とストレージコストの低減を実現している。 Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. The major limitation of VI is the need to iterate over a large tensor. These still lead to intensive computations. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. For the deterministic environments with discrete state and action spaces, a non-branching sequence of transitions moves the agent without deviating from intermediate states, which we call a highway. On such non-branching highways, the value-updating process can be merged as a one-step process instead of iterating the value step-by-step. Based on this observation, we propose a novel graph structure, named highway graph, to model the state transition. 
Our highway graph compresses the transition model into a concise graph, where edges can represent multiple state transitions to support value propagation across multiple time steps in each iteration. We thus can obtain a more efficient value learning approach by facilitating the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), the RL training can be remarkably accelerated in the early stages (within 1 million frames). Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, demonstrating 10 to more than 150 times more efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
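The idea of merging a non-branching run of transitions into a single edge can be sketched for a small deterministic environment. This is a minimal illustration under assumptions, not the authors' implementation: the `transitions` layout, the definition of an intermediate state (exactly one incoming and one outgoing transition), and the function names are all chosen here for the example.

```python
def build_highway_graph(transitions, gamma):
    """Collapse non-branching chains of a deterministic MDP into single edges.

    transitions[state][action] = (next_state, reward); terminal states have no
    entry. Returns graph[state][action] = (end_state, discounted_reward,
    discount), where discount = gamma ** (number of primitive steps merged).
    """
    in_degree = {}
    for actions in transitions.values():
        for next_state, _ in actions.values():
            in_degree[next_state] = in_degree.get(next_state, 0) + 1

    def is_intermediate(state):
        # Exactly one way in and one way out: the agent cannot deviate here.
        return len(transitions.get(state, {})) == 1 and in_degree.get(state, 0) == 1

    graph = {}
    for state, actions in transitions.items():
        if is_intermediate(state):
            continue
        graph[state] = {}
        for action, (next_state, reward) in actions.items():
            total, discount = reward, gamma
            while is_intermediate(next_state):
                ((next_state, step_reward),) = transitions[next_state].values()
                total += discount * step_reward
                discount *= gamma
            graph[state][action] = (next_state, total, discount)
    return graph


def value_iteration(graph, sweeps=50):
    """Value iteration where one backup crosses a whole highway edge."""
    values = {state: 0.0 for state in graph}
    for _ in range(sweeps):
        for state, edges in graph.items():
            if edges:
                values[state] = max(
                    reward + discount * values.get(end, 0.0)
                    for end, reward, discount in edges.values()
                )
    return values


# Toy environment: "right" enters a three-state corridor that ends at the goal.
transitions = {
    "s0": {"left": ("pit", -1.0), "right": ("s1", 0.0)},
    "s1": {"go": ("s2", 0.0)},
    "s2": {"go": ("s3", 0.0)},
    "s3": {"go": ("goal", 1.0)},
}
graph = build_highway_graph(transitions, gamma=0.9)
values = value_iteration(graph)
```

In this toy case the four-step corridor becomes one edge from `s0` to `goal`, so the value of `s0` is settled in a single backup instead of propagating one state per sweep.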
# 遺伝的アルゴリズムとシミュレーションアニーリングを用いたロジスティックスデポにおける作業者スケジューリングの最適化 Optimization of Worker Scheduling at Logistics Depots Using Genetic Algorithms and Simulated Annealing ( http://arxiv.org/abs/2405.11729v1 ) ライセンス: Link先を確認	Jinxin Xu, Haixin Wu, Yu Cheng, Liyang Wang, Xin Yang, Xintong Fu, Yuelong Su,	(参考訳) 本稿では,遺伝的アルゴリズムとシミュレートされたアニーリングアルゴリズムを組み合わせたロジスティクスデポにおける作業者のスケジューリングの最適化について述べる。労働利用を最小化しつつ、ロジスティクス・デポの効率を最適化するためには、恒久的かつ一時的な労働者の効率的なスケジューリングが不可欠である。この研究は0-1整数線形プログラミングモデルの構築から始まり、決定変数が与えられた日毎の時間帯ごとに、永続的および一時的なワーカーのスケジューリングを決定する。目的は、時間的労働条件の履行を保証し、労働者を1日1時間に制限し、永続的な労働者のために連続的な労働日を上限とし、非負性や整数的制約を維持することにある。モデルは、遺伝的アルゴリズムとシミュレートされたアニールを用いて解決される。以上の結果から, 遺伝的アルゴリズムは, 溶液品質の面でシミュレーションアニールよりも優れていたことが示唆された。最適解法は最低29857人日を明らかにする。 This paper addresses the optimization of scheduling for workers at a logistics depot using a combination of genetic algorithm and simulated annealing algorithm. The efficient scheduling of permanent and temporary workers is crucial for optimizing the efficiency of the logistics depot while minimizing labor usage. The study begins by establishing a 0-1 integer linear programming model, with decision variables determining the scheduling of permanent and temporary workers for each time slot on a given day. The objective function aims to minimize person-days, while constraints ensure fulfillment of hourly labor requirements, limit workers to one time slot per day, cap consecutive working days for permanent workers, and maintain non-negativity and integer constraints. The model is then solved using genetic algorithms and simulated annealing. Results indicate that, for this problem, genetic algorithms outperform simulated annealing in terms of solution quality. The optimal solution reveals a minimum of 29857 person-days.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
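To make the 0-1 model above concrete, the following sketch evaluates a candidate schedule for a single day. It covers only two of the listed constraints (one time slot per worker per day, and meeting each slot's requirement) and assumes the requirement is given as a head count per slot; the cap on consecutive working days and the permanent/temporary distinction are omitted. Names are hypothetical.

```python
def person_days(assign):
    """Objective for one day: number of workers who are scheduled at all.

    assign[w][t] is 1 if worker w works time slot t, else 0.
    """
    return sum(1 for row in assign if any(row))


def is_feasible(assign, demand):
    """Check the one-slot-per-day and coverage constraints for a single day."""
    for row in assign:
        if sum(row) > 1:  # a worker may take at most one time slot per day
            return False
    for slot, needed in enumerate(demand):
        if sum(row[slot] for row in assign) < needed:  # slot is understaffed
            return False
    return True
```

A genetic algorithm or simulated annealing run would call checks like these on every candidate, for example rejecting or penalizing infeasible schedules and ranking feasible ones by `person_days`.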
# 非合理性の度合い: センチメントとインプライド・ボラティリティ・サーフェス Degree of Irrationality: Sentiment and Implied Volatility Surface ( http://arxiv.org/abs/2405.11730v1 ) ライセンス: Link先を確認	Jiahao Weng, Yan Xie,	(参考訳) 本研究では,日次の高頻度センチメントデータを構築し,VAR法を用いて翌日のインプライド・ボラティリティ・サーフェスの予測を試みた。 2014年から2023年にかけて、East Money Stock Forumから63万件のテキストデータを収集し、BERTやLSTMといったディープラーニング手法を用いて日次の市場センチメント指標を構築した。 FFT法とEMD法によるセンチメント分解により、高頻度のセンチメントはアット・ザ・マネー(ATM)オプションのインプライド・ボラティリティと強く相関し、低頻度のセンチメントはディープ・アウト・オブ・ザ・マネー(DOTM)オプションのインプライド・ボラティリティとより強く相関することがわかった。さらに分析したところ、インプライド・ボラティリティ・サーフェスの形状は、単なる市場のパニックを超えた、より豊かな市場センチメント情報を含んでいることがわかった。我々は,このセンチメント情報を組み込むことで,インプライド・ボラティリティ・サーフェスの予測精度が向上することを実証した。 In this study, we constructed daily high-frequency sentiment data and used the VAR method to attempt to predict the next day's implied volatility surface. We utilized 630,000 text data entries from the East Money Stock Forum from 2014 to 2023 and employed deep learning methods such as BERT and LSTM to build daily market sentiment indicators. By applying FFT and EMD methods for sentiment decomposition, we found that high-frequency sentiment had a stronger correlation with at-the-money (ATM) options' implied volatility, while low-frequency sentiment was more strongly correlated with deep out-of-the-money (DOTM) options' implied volatility. Further analysis revealed that the shape of the implied volatility surface contains richer market sentiment information beyond just market panic. We demonstrated that incorporating this sentiment information can improve the accuracy of implied volatility surface predictions.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# 放射線治療における臓器の質保証 Quality assurance of organs-at-risk delineation in radiotherapy ( http://arxiv.org/abs/2405.11732v1 ) ライセンス: Link先を確認	Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu,	(参考訳) 放射線治療計画において,腫瘍標的と臓器の脱線は重要である。自動セグメンテーションは、医師の作業量を削減し、一貫性を向上させるために使用することができる。しかし, 自動セグメンテーションの品質保証は, 臨床実践においてまだ必要ではない。患者データはAAPM Thoracic Auto-Segmentation Challengeの標準化データセットである。 OARは左右肺,心臓,食道,脊髄であった。 OARの2つのグループが生成され、ベンチマークデータセットは経験豊富な医師によって手動で構成され、テストデータセットはソフトウェアAccuContourを使って自動的に生成される。特徴抽出器としてresnet-152ネットワークが実行され、高品質または低品質の1クラスサポートベクトル分類器が使用された。本研究では, モデル性能のバランス精度, Fスコア, 感度, 特異度, および受信演算子特性曲線の下での評価を行った。我々は,提案手法の一般化を評価するために輪郭誤差をランダムに生成し,検出限界を探索し,検出限界とボリューム,Dice類似度係数,ハウスドルフ距離,平均表面距離などの様々な指標との相関性について検討した。提案した1クラス分類器は、バランスの取れた精度やAUCなどの指標よりも優れていた。提案手法は,様々な種類のエラー処理において,バイナリ分類器よりも大幅に改善された。提案手法は,一級分類フレームワークにおける残差ネットワークとアテンション機構を導入し,様々な種類のOAR輪郭誤差を高精度に検出することができた。提案手法は,輪郭デライン化における医師の診査の負担を大幅に軽減することができる。 The delineation of tumor target and organs-at-risk is critical in the radiotherapy treatment planning. Automatic segmentation can be used to reduce the physician workload and improve the consistency. However, the quality assurance of the automatic segmentation is still an unmet need in clinical practice. The patient data used in our study was a standardized dataset from AAPM Thoracic Auto-Segmentation Challenge. The OARs included were left and right lungs, heart, esophagus, and spinal cord. Two groups of OARs were generated, the benchmark dataset manually contoured by experienced physicians and the test dataset automatically created using a software AccuContour. A resnet-152 network was performed as feature extractor, and one-class support vector classifier was used to determine the high or low quality. We evaluate the model performance with balanced accuracy, F-score, sensitivity, specificity and the area under the receiving operator characteristic curve. 
We randomly generated contour errors to assess the generalization of our method, explored the detection limit, and evaluated the correlations between detection limit and various metrics such as volume, Dice similarity coefficient, Hausdorff distance, and mean surface distance. The proposed one-class classifier outperformed in metrics such as balanced accuracy, AUC, and others. The proposed method showed significant improvement over binary classifiers in handling various types of errors. Our proposed model, which introduces residual network and attention mechanism in the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy. The proposed method can significantly reduce the burden of physician review for contour delineation.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
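Two of the contour metrics named above, the Dice similarity coefficient and the Hausdorff distance, have standard definitions that are easy to state in code. The sketch below assumes each contour is given as a non-empty set of voxel coordinates in index units; clinical tools usually work on surfaces with physical voxel spacing, which is not modeled here.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))
```

Dice rewards overlap and is insensitive to a few stray voxels, while the Hausdorff distance is driven by the single worst outlier, which is why the two are often reported together.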
# 合成フロケット格子上でC = $\pm$2 のチャーン絶縁体シミュレーション Simulating a Chern Insulator with C = $\pm$2 on Synthetic Floquet Lattice ( http://arxiv.org/abs/2405.11733v1 ) ライセンス: Link先を確認	Lingxiao Lei, Weichen Wang, Guangyao Huang, Shun Hu, Xi Cao, Xinfang Zhang, Mingtang Deng, Pingxing Chen,	(参考訳) 合成フロケ格子は、互いに非整合な周波数を持つ複数の強い駆動によって生成され、トポロジカル現象の量子シミュレーションのための強力なプラットフォームを提供する。本研究では,ハーフBHZ格子の2層を結合し,そのトポロジカルな性質をシミュレートするためにフロケ格子にマッピングすることで,チャーン数C=$\pm$2のチャーン絶縁体の4バンド強結合モデルを提案する。 Floquet-versionモデルのチャーン数を決定するため、Martin et al. (Phys. Rev. X 7, 041008 (2017)) が提案したエネルギーポンピング法と Boyers et al. (Phys. Rev. Lett. 125, 160505 (2020)) が導入したトポロジカル振動法を拡張し、両手法について数値シミュレーションを行った。シミュレーションの結果、これらの手法のいずれかを用いてチャーン数の抽出に成功したことを示し、元の2層半BHZモデルから導出された理論図と密に一致した位相図の優れた予測を提供する。最後に,本モデルに対する実験的実装の可能性について概説する。我々の研究は、量子コンピューティングプラットフォームを用いて複雑なトポロジカルな物質をシミュレートする大きな可能性を示し、それによって、相互作用しないトポロジカルな量子状態のためのより普遍的なシミュレータを構築する方法を確立し、これらの興味深い現象の理解を深める。 The synthetic Floquet lattice, generated by multiple strong drives with mutually incommensurate frequencies, provides a powerful platform for the quantum simulation of topological phenomena. In this study, we propose a 4-band tight-binding model of the Chern insulator with a Chern number C = $\pm$2 by coupling two layers of the half-BHZ lattice and subsequently mapping it onto the Floquet lattice to simulate its topological properties. To determine the Chern number of our Floquet-version model, we extend the energy pumping method proposed by Martin et al. [Phys. Rev. X 7, 041008 (2017)] and the topological oscillation method introduced by Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)], followed by numerical simulations for both methodologies. The simulation results demonstrate the successful extraction of the Chern number using either of these methods, providing an excellent prediction of the phase diagram that closely aligns with the theoretical one derived from the original bilayer half-BHZ model. Finally, we briefly discuss a potential experimental implementation for our model. 
Our work demonstrates significant potential for simulating complex topological matter using quantum computing platforms, thereby paving the way for constructing a more universal simulator for non-interacting topological quantum states and advancing our understanding of these intriguing phenomena.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
# コンタクトレスのポリソノグラフィー:電波が睡眠について教えてくれるもの Contactless Polysomnography: What Radio Waves Tell Us about Sleep ( http://arxiv.org/abs/2405.11739v1 ) ライセンス: Link先を確認	Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi,	(参考訳) 自宅で睡眠を評価したり、睡眠段階を捉えたり、睡眠中に身体から跳ね返る電波を分析するだけで無呼吸症の発生を検知する能力は非常に強力である。このような能力は、患者の家庭における経時的データ収集を可能にし、睡眠の理解と様々な疾患との相互作用、および臨床治験と定期治療の両方における治療反応を知らせる。本稿では、睡眠中の人から反射される電波から睡眠と夜間呼吸を受動的にモニタリングする高度な機械学習アルゴリズムを開発する。金の標準値(ポリソノグラフィー)と比較すると、このモデルが睡眠催眠グラム(ウェイク、ライト睡眠、ディープ睡眠またはREMに分類される30秒エポックの精度が81%)を捉え、睡眠時無呼吸(AUROC = 0.88)を検出し、患者の無呼吸指数(ICC=0.95; 95% CI = [0.93, 0.97])を測定することが示されている。特に、このモデルは人種、性別、年齢にわたって同等のパフォーマンスを示す。さらに、このモデルは睡眠段階と、神経、精神医学、循環器、免疫疾患を含む様々な疾患の間の情報的相互作用を明らかにする。これらの知見は,臨床および介入臨床試験の約束を果たすだけでなく,各種疾患の理解と管理の基本的な要素としての睡眠の重要性も浮き彫りにした。 The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therapeutic responses, both in clinical trials and routine care. In this article, we develop an advanced machine learning algorithm for passively monitoring sleep and nocturnal breathing from radio waves reflected off people while asleep. Validation results in comparison with the gold standard (i.e., polysomnography) (n=849) demonstrate that the model captures the sleep hypnogram (with an accuracy of 81% for 30-second epochs categorized into Wake, Light Sleep, Deep Sleep, or REM), detects sleep apnea (AUROC = 0.88), and measures the patient's Apnea-Hypopnea Index (ICC=0.95; 95% CI = [0.93, 0.97]). Notably, the model exhibits equitable performance across race, sex, and age. 
Moreover, the model uncovers informative interactions between sleep stages and a range of diseases including neurological, psychiatric, cardiovascular, and immunological disorders. These findings not only hold promise for clinical practice and interventional trials but also underscore the significance of sleep as a fundamental component in understanding and managing various diseases.	翻訳日:2024-05-21 14:33:17 公開日:2024-05-20
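Two of the reported quantities have simple standard definitions: per-epoch staging accuracy, and the Apnea-Hypopnea Index (AHI), which is the number of apnea and hypopnea events per hour of sleep. The sketch below assumes a hypnogram is a list of stage labels, one per 30-second epoch, and that any label other than "Wake" counts as sleep; the function names are chosen for this example.

```python
EPOCH_SECONDS = 30  # epoch length stated in the abstract


def epoch_accuracy(predicted, reference):
    """Fraction of epochs whose predicted stage matches the reference stage."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("hypnograms must be non-empty and equally long")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def apnea_hypopnea_index(event_count, hypnogram):
    """AHI: apnea plus hypopnea events per hour of sleep."""
    sleep_epochs = sum(stage != "Wake" for stage in hypnogram)
    if sleep_epochs == 0:
        raise ValueError("no sleep epochs in hypnogram")
    hours_asleep = sleep_epochs * EPOCH_SECONDS / 3600
    return event_count / hours_asleep
```

Because the denominator is time asleep rather than time in bed, an error in sleep staging also shifts the AHI, which is one reason the two measurements are validated together.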
# サンプル効率強化学習のための合成観測による未来の学習表現 Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning ( http://arxiv.org/abs/2405.11740v1 ) ライセンス: Link先を確認	Xin Liu, Yaran Chen, Dongbin Zhao,	(参考訳) 視覚強化学習(RL)では、上流表現学習が下流政策学習の効果を決定づける。補助的なタスクを利用することで、エージェントはターゲットとして視覚的表現を高めることができ、下流RLのサンプル効率と性能を向上させることができる。先進的な補助的タスクは、それぞれ異なる補助目的を通じて限られた経験(観察、行動、報酬を含む)からできるだけ多くの情報を抽出する方法に重点を置いている。本稿では,新しい自己教師型 RL アプローチである Learning Future representation with Synthetic observations (LFS) を提案する。具体的には、将来の情報を含む可能性のある観測を合成するためのトレーニング不要な手法と、不等化合成ノイズを除去するためのデータ選択手法を提案する。残りの合成観測と実観測は、クラスタリングに基づく表現学習のための時間的関連タスクを達成する補助データとして機能する。 LFSは、エージェントが事前に現れていない観察にアクセスし、学習することができるので、後になってそれらがすぐに理解され、活用される。加えて、LFSは報酬やアクションに依存しないため、最近の高度な補助タスクよりも広い範囲のアプリケーション(例えばビデオから学習する)がある。広汎な実験により、我々のLFSは、継続的な制御に挑戦する上で最先端のRLサンプル効率を示し、アクションフリービデオの実演に基づく高度な視覚前訓練を可能にした。 In visual Reinforcement Learning (RL), upstream representation learning largely determines the effect of downstream policy learning. Employing auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the sample efficiency and performance of downstream RL. Prior advanced auxiliary tasks all focus on how to extract as much information as possible from limited experience (including observations, actions, and rewards) through their different auxiliary objectives, whereas in this article, we first start from another perspective: auxiliary training data. We try to improve auxiliary representation learning for RL by enriching auxiliary training data, proposing Learning Future representation with Synthetic observations (LFS), a novel self-supervised RL approach. Specifically, we propose a training-free method to synthesize observations that may contain future information, as well as a data selection approach to eliminate unqualified synthetic noise. 
The remaining synthetic observations and real observations then serve as the auxiliary data to achieve a clustering-based temporal association task for representation learning. LFS allows the agent to access and learn observations that have not yet appeared in advance, so as to quickly understand and exploit them when they occur later. In addition, LFS does not rely on rewards or actions, which means it has a wider scope of application (e.g., learning from video) than recent advanced auxiliary tasks. Extensive experiments demonstrate that our LFS exhibits state-of-the-art RL sample efficiency on challenging continuous control and enables advanced visual pre-training based on action-free video demonstrations.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# 対称測度に基づく正の写像による量子絡み合いの推定 Quantum entanglement estimation via symmetric measurement based positive maps ( http://arxiv.org/abs/2405.11741v1 ) ライセンス: Link先を確認	Jiaxin Li, Hongmei Yao, Shao-Ming Fei, Zhaobing Fan, Haitao Ma,	(参考訳) 対称測度に基づく正およびトレース保存マップのクラスを提供する。これらの正の写像から、分離性基準、絡み合いの証人、およびより低いコンカレンス境界を示す。我々の分離性基準、絡み合いの証人、下位境界が、関連する既存の結果よりも量子絡み合いをよりよく検出し、推定できることを示す。 We provide a class of positive and trace-preserving maps based on symmetric measurements. From these positive maps we present separability criteria, entanglement witnesses, as well as the lower bounds of concurrence. We show by detailed examples that our separability criteria, entanglement witnesses and lower bounds can detect and estimate the quantum entanglement better than the related existing results.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# 組成一般化の一般理論 A General Theory for Compositional Generalization ( http://arxiv.org/abs/2405.11743v1 ) ライセンス: Link先を確認	Jingwen Fu, Zhizheng Zhang, Yan Lu, Nanning Zheng,	(参考訳) 構成的一般化(CG)は、人間の知的な進歩において重要な認知的な飛躍を示す、馴染み深い概念の新たな組み合わせを理解する能力を具現化したものである。その重要な重要性にもかかわらず、ディープニューラルネットワーク(DNN)は、構成一般化問題に対処する上での課題に直面し、かなりの研究関心を喚起する。しかし、既存の理論はしばしばタスク固有の仮定に依存し、CGの包括的な理解を制限している。本研究の目的は,タスク依存的視点から構成一般化を探求することであり,タスク固有の分析に補完的な視点を提供することである。主な課題は、その範囲を過度に制限することなくCGを定義することである。この定義を用いて、「CGの最終的な解決策はどのようなものか?」という問いに、以下の理論的知見を通して答えようとしている。 1) 一般解の欠如を示すCGにおける最初のNo Free Lunch定理 2)任意のCG問題に適用可能な新しい一般化であって,有効なCGソリューションの条件を指定すること。 3)CG問題とその解決策の理解を深めるための生成的効果の導入。本論文の意義は、CG問題に対する一般的な理論を提供することであり、タスク固有のシナリオの下での事前の定理と組み合わせることで、CGの包括的理解につながる。 Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, the deep neural network (DNN) faces challenges in addressing the compositional generalization problem, prompting considerable research interest. However, existing theories often rely on task-specific assumptions, constraining the comprehensive understanding of CG. This study aims to explore compositional generalization from a task-agnostic perspective, offering a complementary viewpoint to task-specific analyses. The primary challenge is to define CG without overly restricting its scope, a feat achieved by identifying its fundamental characteristics and basing the definition on them. Using this definition, we seek to answer the question "what does the ultimate solution to CG look like?" through the following theoretical findings: 1) the first No Free Lunch theorem in CG, indicating the absence of general solutions; 2) a novel generalization bound applicable to any CG problem, specifying the conditions for an effective CG solution; and 3) the introduction of the generative effect to enhance understanding of CG problems and their solutions. 
This paper's significance lies in providing a general theory for CG problems, which, when combined with prior theorems under task-specific scenarios, can lead to a comprehensive understanding of CG.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# 構成可能なミラーダイス : 意思決定の統一を目指して Configurable Mirror Descent: Towards a Unification of Decision Making ( http://arxiv.org/abs/2405.11746v1 ) ライセンス: Link先を確認	Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An,	(参考訳) 意思決定問題は、単一エージェント、eg、Atari、協力型マルチエージェント、eg、ハナビ、競争型マルチエージェント、eg、ホールドエムポーカー、複合型協調型および競争型(eg、サッカー)に分類される。特定の意思決定問題に対処する様々な方法が提案されている。特定のカテゴリーでの成功にもかかわらず、これらの手法は通常独立して進化し、他のカテゴリに一般化することができない。したがって、意思決定の根本的な問題は次の通りである。 \emph{Can we developed \textbf{a single algorithm} to tackle \textbf{ALL} categories of decision-making problem? この問題に対処する主な課題がいくつかある。一異なる意思決定のカテゴリーは、異なるエージェントの数及び異なるエージェント間の関係を含む。二異なるカテゴリーが異なる解決概念及び評価措置を有すること。三すべてのカテゴリをカバーする包括的なベンチマークがないこと。本研究は,3つの主要なコントリビューションでこの問題に対処するための予備的試みを示す。 i) MD変種を一般化した一般化ミラー降下法(GMD)を提案する。二メタコントローラを導入し、評価基準に基づいてGMD条件のハイパーパラメータを動的に調整する構成可能なミラー降下(CMD)を提案する。 iii) 異なる意思決定カテゴリにまたがる15の学術的フレンドリーなゲームを用いて, \textsc{GameBench} を構築した。大規模な実験では、CMDはベースラインよりも経験的に競争力があり、より良い結果が得られる一方で、多様な意思決定の次元を探索する能力を提供している。 Decision-making problems, categorized as single-agent, e.g., Atari, cooperative multi-agent, e.g., Hanabi, competitive multi-agent, e.g., Hold'em poker, and mixed cooperative and competitive, e.g., football, are ubiquitous in the real world. Various methods are proposed to address the specific decision-making problems. Despite the successes in specific categories, these methods typically evolve independently and cannot generalize to other categories. Therefore, a fundamental question for decision-making is: \emph{Can we develop \textbf{a single algorithm} to tackle \textbf{ALL} categories of decision-making problems?} There are several main challenges to address this question: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) there lacks a comprehensive benchmark covering all the categories. 
This work presents a preliminary attempt to address the question with three main contributions. i) We propose the generalized mirror descent (GMD), a generalization of MD variants, which considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose the configurable mirror descent (CMD) where a meta-controller is introduced to dynamically adjust the hyper-parameters in GMD conditional on the evaluation measures. iii) We construct the \textsc{GameBench} with 15 academic-friendly games across different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes compared to baselines while providing the capability of exploring diverse dimensions of decision making.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
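As a point of reference for the abstract above, the following is a minimal sketch of a classic mirror descent step on the probability simplex, using the KL divergence as the Bregman divergence. It is not the paper's GMD or CMD: it uses only the current policy rather than multiple historical policies, has no meta-controller, and the payoffs, step size, and function name `entropic_md_step` are hypothetical choices made for illustration.

```python
import math

def entropic_md_step(policy, action_values, step_size):
    """One mirror-descent step on the simplex with the KL divergence as the
    Bregman divergence, which reduces to a multiplicative-weights update."""
    logits = [math.log(p) + step_size * q for p, q in zip(policy, action_values)]
    max_logit = max(logits)  # subtract the max for numerical stability
    unnormalized = [math.exp(l - max_logit) for l in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical single-agent example: three actions with fixed payoffs.
payoffs = [1.0, 0.5, 0.0]
policy = [1.0 / 3.0] * 3
for _ in range(50):
    policy = entropic_md_step(policy, payoffs, step_size=0.5)
```

With fixed payoffs the policy concentrates on the highest-payoff action. A different Bregman divergence would change only the form of the update inside `entropic_md_step`.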
# 線形注意による文脈内学習の漸近理論 Asymptotic theory of in-context learning by linear attention ( http://arxiv.org/abs/2405.11751v1 ) ライセンス: Link先を確認	Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan,	(参考訳) トランスフォーマーは、明示的な事前トレーニングなしで、入力自身で提供される例に基づいてタスクを学習し、実行することができる。イン・コンテクスト・ラーニング(ICL)として知られるこの能力はトランスフォーマーの成功の基盤であるが、必要なサンプルの複雑さ、タスクの多様性の事前学習、ICLの成功のためのコンテキスト長については未解決のままである。ここでは、線形回帰タスクのICLの正確な解法モデルにおいて、これらの疑問に対する正確な答えを提供する。我々は,トークン次元が無限大となる現象論的に豊富なスケーリングシステムにおいて,学習曲線に対する鋭い漸近を導出する。文脈長と事前学習タスクの多様性はトークン次元に比例してスケールし,事前学習サンプルの数は2次的にスケールする。低多様性体制では、モデルはトレーニングタスクを記憶する傾向があり、高多様性体制では、事前訓練されたタスクの範囲を超えて真にコンテキスト内学習と一般化を達成する。これらの理論的洞察は、線形注意と完全な非線形トランスフォーマーアーキテクチャの両方で実験によって実証的に検証される。 Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved. Here, we provide a precise answer to these questions in an exactly solvable model of ICL of a linear regression task by linear attention. We derive sharp asymptotics for the learning curve in a phenomenologically-rich scaling regime where the token dimension is taken to infinity; the context length and pretraining task diversity scale proportionally with the token dimension; and the number of pretraining examples scales quadratically. We demonstrate a double-descent learning curve with increasing pretraining examples, and uncover a phase transition in the model's behavior between low and high task diversity regimes: In the low diversity regime, the model tends toward memorization of training tasks, whereas in the high diversity regime, it achieves genuine in-context learning and generalization beyond the scope of pretrained tasks. 
These theoretical insights are empirically validated through experiments with both linear attention and full nonlinear Transformer architectures.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# 化学プロセスモデリングの基礎モデル:物理インフォームド適応によるメタラーニング Foundation Model for Chemical Process Modeling: Meta-Learning with Physics-Informed Adaptation ( http://arxiv.org/abs/2405.11752v1 ) ライセンス: Link先を確認	Zihao Wang, Zhe Wu,	(参考訳) 本稿では、非線形化学プロセスモデリングの分野における基礎モデルの新たな応用について紹介する。現実世界の化学プロセスに対して正確な第一原理モデルを得ることの難しさと、新しい化学プロセスのたびにモデルを再構築・再訓練することの非効率性を踏まえ、我々は重要な問いを提起する。もし、任意の新しい化学プロセスのモデリングに迅速に適応できる単一の普遍的ニューラルネットワーク(すなわち基礎モデル)を開発できるとしたらどうだろうか。この問いに取り組むため、Reptile を用いたメタラーニングに基づいて基礎モデルを構築し、続いて物理インフォームド適応により、少数のデータサンプルのみで新しいモデリングタスクへ微調整する手法を提案する。提案手法の有効性を評価するため、連続撹拌槽型反応器 (CSTR)、バッチ反応器 (BR)、プラグフロー反応器 (PFR) という3つの古典的な汎用反応器における各種化学反応について基礎モデルを構築した。提案手法は、数ショット設定において、データ駆動学習、物理インフォームド学習、転移学習、純粋なメタラーニングといった従来手法よりも優れている。さらに、提案手法は、対象タスクからの少数のデータサンプルのみを用いて、新しいCSTR、BR、PFRへの迅速な適応を実現する。ソースコードは https://github.com/killingbear999/chemical-process-foundation-model で入手できる。 In this work, we introduce a novel application of foundation models in the domain of nonlinear chemical process modeling. Given the challenges of obtaining accurate first-principles models for real-world chemical processes and the inefficiency of rebuilding and retraining models for new chemical processes, we pose a pivotal question: What if we could develop a single, universal neural network (i.e., foundation model) capable of rapidly adapting to modeling any new chemical process? To address this question, we propose a meta-learning-based approach using Reptile to construct the foundation model, followed by physics-informed adaptation to fine-tune it to new modeling tasks using only a few data samples. To assess the effectiveness of our methodology, we construct a foundation model for various chemical reactions in three classical generic reactors, including continuous stirred tank reactors (CSTRs), batch reactors (BRs), and plug flow reactors (PFRs). Our approach outperforms conventional methods such as data-driven learning, physics-informed learning, transfer learning, and pure meta-learning in a few-shot setting. 
Furthermore, our method achieves rapid adaptation to new CSTRs, BRs, and PFRs using only a few data samples from the designated tasks. Source code is available at https://github.com/killingbear999/chemical-process-foundation-model.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
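The Reptile meta-learning step mentioned in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the "tasks" here are hypothetical one-parameter regressions `y = slope * x` standing in for reactor models, the physics-informed adaptation stage is omitted, and all names and hyper-parameters are assumptions.

```python
def inner_sgd(w, xs, ys, lr, steps):
    """Run a few gradient-descent steps on the mean squared error of y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def reptile(task_slopes, xs, meta_lr=0.1, inner_lr=0.1, inner_steps=5, epochs=200):
    """Reptile: nudge the shared initialization toward each task-adapted weight."""
    theta = 0.0
    for _ in range(epochs):
        for slope in task_slopes:  # deterministic task order, for reproducibility
            ys = [slope * x for x in xs]
            phi = inner_sgd(theta, xs, ys, inner_lr, inner_steps)
            theta += meta_lr * (phi - theta)
    return theta

xs = [0.5, 1.0, 1.5]
theta = reptile([1.0, 2.0, 3.0], xs)
# Few-shot adaptation to an unseen task (slope 2.5) from the meta-learned start.
adapted = inner_sgd(theta, xs, [2.5 * x for x in xs], lr=0.1, steps=5)
```

The meta-learned `theta` settles near the centre of the training slopes, so a handful of gradient steps moves it most of the way to a new task.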
# サブサイクル運動におけるレーザー誘起電子位相の解析的制御 Analytically controlling laser-induced electron phase in sub-cycle motion ( http://arxiv.org/abs/2405.11753v1 ) ライセンス: Link先を確認	Doan-An Trieu, Trong-Thanh D. Nguyen, Thanh-Duy D. Nguyen, Thanh Tran, Van-Hoang Le, Ngoc-Loan Phan,	(参考訳) 強いレーザー場内でのサブサイクル運動中に電子が蓄積する位相の精密制御は、強レーザー場物理学において不可欠であるが、これまでのところ大部分が間接的で複雑なままである。本レターでは、数サイクル赤外レーザーパルスと相互作用している中心対称な気体ターゲットに印加する低周波電場を調整することで、このサブサイクル電子位相を制御する新しい手法を開発する。本手法は、強電場近似(strong-field approximation)により導出された、低周波電場とそれが誘起する高調波周波数シフトとの間の普遍的な解析的関係に基づく。この単純な関係とその普遍性は、時間依存シュレーディンガー方程式を直接解くことによって数値的に確認される。さらに、XUV波の連続的かつ精密なチューニングや、THzパルスを包括的にサンプリングする新しい手法の開発を含む、in situ応用における本関係の利点について論じる。 Precise control of the electron phase accumulated during its sub-cycle motion within intense laser fields is essential in strong-field physics, yet remains mostly indirect and complicated so far. In this Letter, we develop a novel approach to control this sub-cycle electron phase by tuning a low-frequency electric field applied on a centrosymmetric gaseous target during its interaction with a few-cycle infrared laser pulse. Our method is based on a universal analytical relation between the low-frequency electric field and its induced harmonic frequency shift, derived by the strong-field approximation. This simple relation and its universality are confirmed numerically by directly solving the time-dependent Schrödinger equation. Moreover, we discuss the benefits of the discovered relation in in situ applications, including continuously and precisely tuning XUV waves and developing a new method of comprehensively sampling THz pulse.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# Versatile Teacher: クロスドメイン適応のためのクラス認識型教師・生徒フレームワーク Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation ( http://arxiv.org/abs/2405.11754v1 ) ライセンス: Link先を確認	Runou Yang, Tian Tian, Jinwen Tian,	(参考訳) データセット間のドメインシフトの課題に対処することは、モデルの性能を維持する上で不可欠である。クロスドメイン物体検出の文脈において、広く使われている半教師付きモデルである教師・生徒フレームワークは、大幅な精度向上を示している。しかし、既存の手法はクラス間の違いを見落としてすべてのクラスを平等に扱うことが多く、最適とは言えない結果になる。さらに、領域提案ネットワーク(RPN)を持たない1段検出器にとって不可欠なインスタンスレベルのアライメントの統合は、このフレームワークでは未検討のままである。これらの欠点に対応するために、我々はVersatile Teacher (VT) という新しい教師・生徒モデルを導入する。 VTは、クラス固有の検出難易度を考慮し、Class-aware Pseudo-label Adaptive Selection (CAPS) と呼ぶ2段階の擬似ラベル選択機構を採用して、より信頼性の高い擬似ラベルを生成する点で従来研究と異なる。これらのラベルは、ターゲットを絞ったインスタンスレベルのアライメントのために識別器を誘導する顕著性(saliency)行列として利用される。提案手法は、3つのベンチマークデータセットで有望な結果を示し、広く使用されている1段検出器向けのアライメント手法を拡張するものであり、実用的な応用に大きな可能性を示す。コードは https://github.com/RicardooYoung/VersatileTeacher で入手できる。 Addressing the challenge of domain shift between datasets is vital in maintaining model performance. In the context of cross-domain object detection, the teacher-student framework, a widely-used semi-supervised model, has shown significant accuracy improvements. However, existing methods often overlook class differences, treating all classes equally, resulting in suboptimal results. Furthermore, the integration of instance-level alignment with a one-stage detector, essential due to the absence of a Region Proposal Network (RPN), remains unexplored in this framework. In response to these shortcomings, we introduce a novel teacher-student model named Versatile Teacher (VT). VT differs from previous works by considering class-specific detection difficulty and employing a two-step pseudo-label selection mechanism, referred to as Class-aware Pseudo-label Adaptive Selection (CAPS), to generate more reliable pseudo labels. These labels are leveraged as saliency matrices to guide the discriminator for targeted instance-level alignment. 
Our method demonstrates promising results on three benchmark datasets, and extends the alignment methods for widely-used one-stage detectors, presenting significant potential for practical applications. Code is available at https://github.com/RicardooYoung/VersatileTeacher.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# バイアスの消去:半教師付き学習のための基礎モデルのファインチューニング Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning ( http://arxiv.org/abs/2405.11756v1 ) ライセンス: Link先を確認	Kai Gan, Tong Wei,	(参考訳) 半教師付き学習(SSL)は目覚ましい進歩を遂げ、数多くの手法のバリエーションが登場した。しかしながら、これらの手法は性能が十分でないため、実践者が導入しようとするとしばしば困難に直面する。本稿では、事前学習済みの基礎モデルを適応させることでこの制約に大きく対処する、FineSSLという新しいSSLアプローチを提案する。基礎モデルに固有の集約バイアス(aggregated biases)と認知偏差(cognitive deviation)の問題を特定し、バランスの取れたマージンソフトマックスと分離型ラベルスムージングを課すことで、シンプルで効果的な解決策を提案する。広範な実験を通じて、FineSSLが複数のベンチマークデータセットでSSLの最先端を更新し、トレーニングコストを6分の1以下に削減し、さまざまな微調整手法や最新のSSLアルゴリズムとシームレスに統合できることを実証した。ソースコードは https://github.com/Gank0078/FineSSL で入手できる。 Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in the emergence of numerous method variations. However, practitioners often encounter challenges when attempting to deploy these methods due to their subpar performance. In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models. We identify the aggregated biases and cognitive deviation problems inherent in foundation models, and propose a simple yet effective solution by imposing balanced margin softmax and decoupled label smoothing. Through extensive experiments, we demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms. The source code is available at https://github.com/Gank0078/FineSSL.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# DLAFormer:ドキュメントレイアウト分析のためのエンドツーエンド変換器 DLAFormer: An End-to-End Transformer For Document Layout Analysis ( http://arxiv.org/abs/2405.11757v1 ) ライセンス: Link先を確認	Jiawei Wang, Kai Hu, Qiang Huo,	(参考訳) 文書レイアウト解析(DLA)は,文書の物理的レイアウトや論理構造,情報検索,文書要約,知識抽出などを理解する上で重要である。しかし、従来の研究では、テーブル/フィギュア検出、テキスト領域の検出、論理的役割分類、読み出し順序予測など、DLA内の個々のサブタスクに対処するために、個別のモデルを使用していた。本研究では,これらのサブタスクを1つのモデルに統合した文書レイアウト解析手法DLAFormerを提案する。そこで本研究では,DLAサブタスク(テキスト領域の検出,論理的役割分類,読み出し順序予測など)を関係予測問題として扱い,これらの関係予測ラベルを統一ラベル空間に統合し,複数のタスクを同時に処理できるようにする。さらに,DeTRにおけるコンテンツクエリの物理的意味を高めるために,新しいタイプワイズクエリを提案する。さらに,グラフィカルなページオブジェクトを正確に識別するための粗大な戦略を採用した。実験の結果,提案したDLAFormerは,DocLayNetとComp-HRDocの2つの文書レイアウト解析ベンチマークにおいて,複数のタスクにマルチブランチアーキテクチャやマルチステージアーキテクチャを採用する従来の手法よりも優れていた。 Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically used separate models to address individual sub-tasks within DLA, including table/figure detection, text region detection, logical role classification, and reading order prediction. In this work, we propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer, which integrates all these sub-tasks into a single model. To achieve this, we treat various DLA sub-tasks (such as text region detection, logical role classification, and reading order prediction) as relation prediction problems and consolidate these relation prediction labels into a unified label space, allowing a unified relation prediction module to handle multiple tasks concurrently. Additionally, we introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR. Moreover, we adopt a coarse-to-fine strategy to accurately identify graphical page objects. 
Experimental results demonstrate that our proposed DLAFormer outperforms previous approaches that employ multi-branch or multi-stage architectures for multiple tasks on two document layout analysis benchmarks, DocLayNet and Comp-HRDoc.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# Fed-Credit: 信頼性管理を備えたロバストなフェデレーション学習 Fed-Credit: Robust Federated Learning with Credibility Management ( http://arxiv.org/abs/2405.11758v1 ) ライセンス: Link先を確認	Jiayan Chen, Zhirong Qian, Tianhui Meng, Xitong Gao, Tian Wang, Weijia Jia,	(参考訳) プライバシ保護を目的としたフェデレーション学習(FL)は、分散型デバイスやデータソースでのモデルトレーニングを可能にする、新興の機械学習アプローチである。 FLの学習メカニズムは、個々のクライアントからのパラメータ更新の集約に依存している。しかし、このプロセスは悪意のあるデバイスが存在するため、潜在的なセキュリティリスクを引き起こす可能性がある。既存のソリューションは、計算集約技術の使用によってコストがかかるか、あるいは攻撃者の数や攻撃方法の事前の知識など、強い仮定の理由で制限される。プライバシ制約と不確実な攻撃シナリオの両方を考慮する方法はほとんどない。本稿では,Fed-Creditと呼ばれる信頼性管理手法に基づく堅牢なFL手法を提案する。従来の研究とは異なり、我々のアプローチではノードやデータ分布に関する事前の知識は必要としない。グローバルモデル更新を調整するために、ローカルモデルとグローバルモデルとの類似性に基づいて、履歴クライアントの貢献度を計測する信頼性セットを維持し、採用する。 Fed-Creditの微妙なところは、時間減衰と時間的値係数が評価重みの動的調整に組み込まれており、O(n) の計算複雑性(n はクライアント数)を誇っていることである。 5種類の攻撃条件下でMNISTとCIFAR-10データセットについて広範な実験を行った。その結果、比較的低い計算複雑性を維持しながら、敵攻撃に対する精度とレジリエンスが向上した。このうち,非IID CIFAR-10データセットでは,2種類のデータ中毒攻撃に対処する際の最先端アルゴリズムと比較して,それぞれ19.5%,14.5%の性能向上を示した。 Aiming at privacy preservation, Federated Learning (FL) is an emerging machine learning approach enabling model training on decentralized devices or data sources. The learning mechanism of FL relies on aggregating parameter updates from individual clients. However, this process may pose a potential security risk due to the presence of malicious devices. Existing solutions are either costly due to the use of compute-intensive technology, or restrictive for reasons of strong assumptions such as the prior knowledge of the number of attackers and how they attack. Few methods consider both privacy constraints and uncertain attack scenarios. In this paper, we propose a robust FL approach based on the credibility management scheme, called Fed-Credit. Unlike previous studies, our approach does not require prior knowledge of the nodes and the data distribution. 
It maintains and employs a credibility set, which weighs the historical clients' contributions based on the similarity between the local models and global model, to adjust the global model update. The subtlety of Fed-Credit is that the time decay and attitudinal value factor are incorporated into the dynamic adjustment of the reputation weights and it boasts a computational complexity of O(n) (n is the number of the clients). We conducted extensive experiments on the MNIST and CIFAR-10 datasets under 5 types of attacks. The results exhibit superior accuracy and resilience against adversarial attacks, all while maintaining comparatively low computational complexity. Among these, on the Non-IID CIFAR-10 dataset, our algorithm exhibited performance enhancements of 19.5% and 14.5%, respectively, in comparison to the state-of-the-art algorithm when dealing with two types of data poisoning attacks.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
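The credibility idea in the abstract above (weighting clients by how similar their local models are to the global model, with a decay over time) can be sketched roughly as follows. This is an assumption-laden illustration, not Fed-Credit itself: the exponential blending rule, the clipping of negative similarities to zero, the omission of the attitudinal value factor, and all function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flat parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def update_credibility(credibility, local_models, global_model, decay=0.5):
    """Blend each client's past credibility with its current similarity score.
    One similarity per client, so the cost is linear in the number of clients."""
    return [
        decay * c + (1.0 - decay) * max(0.0, cosine(m, global_model))
        for c, m in zip(credibility, local_models)
    ]

def aggregate(local_models, credibility):
    """Credibility-weighted average of the local models."""
    total = sum(credibility)
    if total <= 0:
        raise ValueError("no client has positive credibility")
    dim = len(local_models[0])
    return [
        sum(c * m[k] for c, m in zip(credibility, local_models)) / total
        for k in range(dim)
    ]

global_model = [1.0, 1.0]
local_models = [[1.1, 0.9], [0.9, 1.1], [-1.0, -1.0]]  # last client is adversarial
credibility = [1.0, 1.0, 1.0]
for _ in range(5):  # the global model is held fixed here to keep the toy simple
    credibility = update_credibility(credibility, local_models, global_model)
new_global = aggregate(local_models, credibility)
```

After a few rounds the adversarial client's weight decays toward zero, so the aggregate stays close to the honest updates.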
# 実用的なマッハ・ツェンダー干渉計を用いた差動位相シフトQKD Differential-phase-shift QKD with practical Mach-Zehnder interferometer ( http://arxiv.org/abs/2405.11760v1 ) ライセンス: Link先を確認	Akihiro Mizutani, Masanori Terashita, Junya Matsubayashi, Shogo Mori, Ibuki Matsukura, Suzuna Tagawa, Kiyoshi Tamaki,	(参考訳) 差動位相シフト(DPS)量子鍵配送は、コヒーレントパルス列と受動的な測定ユニットで実現できる簡素な実装ゆえに有望なプロトコルである。さらに、このプロトコルには光源の不完全性に対して堅牢であるという利点がある。しかし残念ながら、測定ユニットに関しては、既存のセキュリティ証明が非現実的な仮定を置いており、実際の実装ではセキュリティの抜け穴になりうる。本稿では、測定ユニットの主要な不完全性を取り込むことにより、DPSプロトコルの実装セキュリティを強化する。具体的には、既存のセキュリティ証明で仮定されていた透過率が正確に$50\%$のビームスプリッタではなく、透過率の範囲が既知である実用的なビームスプリッタを使用できるようになる。数値シミュレーションにより、透過率が理想値から$\pm0.5\%$変動する場合でも、鍵レートは係数0.57の劣化にとどまることが示された。この結果は、実用的な測定系を用いたDPSプロトコルの実現可能性を示すものである。 Differential-phase-shift (DPS) quantum key distribution stands as a promising protocol due to its simple implementation, which can be realized with a train of coherent pulses and a passive measurement unit. Besides, this protocol has the advantage of being robust against imperfections in the light source. Unfortunately, however, as for the measurement unit, existing security proofs put unrealistic assumptions on it, which could be security loopholes in actual implementations. In this paper, we enhance the implementation security of the DPS protocol by incorporating a major imperfection in the measurement unit. Specifically, our proof enables us to employ practical beam splitters with a known range of the transmittance rather than the one with exactly $50\%$, as was assumed in the existing security proofs. Our numerical simulations demonstrate that even with fluctuations of $\pm0.5\%$ in the transmittance from the ideal value, the key rate degrades only by a factor of 0.57. This result highlights the feasibility of the DPS protocol with practical measurement setups.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# 地すべり感受性マッピングにおける解釈可能性の不確実性:統計的・機械学習・ディープラーニングモデルの比較分析 Uncertainty of interpretability in Landslide Susceptibility Mapping: A Comparative Analysis of Statistical, Machine Learning, and Deep Learning Models ( http://arxiv.org/abs/2405.11762v1 ) ライセンス: Link先を確認	Cheng Chen, Lei Fan,	(参考訳) 地すべり感受性マッピング(LSM)は、高リスク領域の特定と予防戦略の策定に不可欠である。本研究では、地すべり感受性の予測における統計的モデル、機械学習(ML)モデル、深層学習(DL)モデルの解釈可能性について検討する。これは、関連する様々な解釈手法と、2種類の入力因子、すなわち地すべりに統計的に関係のある19の寄与要因の包括的セットと、地すべりの誘発に直接関連する9の誘発要因の専用セットを組み込むことによって達成される。モデル性能がLSMの重要な指標であることを考えると、解釈可能性に関する調査は、検討対象のモデル間でのLSM精度の評価と比較を必然的に伴う。本研究では、畳み込みニューラルネットワークモデルが最も高い精度を達成し(19因子で0.8447、9因子で0.8048)、Extreme Gradient Boosting と Support Vector Machine も従来の統計モデルを上回る高い予測能力を示した。これらの結果から、DLアルゴリズムと高度なMLアルゴリズムは、入力要因と地すべりの発生との複雑な関係を効果的に捉えられることがわかる。しかし、予測の解釈可能性は、特に19の寄与要因という広いセットを使用する場合に、モデル間で異なっていた。 SHAP、LIME、DeepLIFTといった説明手法によっても解釈結果にばらつきが生じた。 19の寄与要因からなる包括的セットを用いると予測精度は向上したが、モデル解釈に複雑さと不整合が生じた。 9つの誘発要因の専用セットに焦点を絞ると、予測力は多少犠牲になるものの、様々なモデル間でより一貫した重要要因が特定され、現地調査報告の知見とも一致することから、解釈可能性が向上した。 Landslide susceptibility mapping (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set of 19 contributing factors that are statistically relevant to landslides, as well as a dedicated set of 9 triggering factors directly associated with triggering landslides. Given that model performance is a crucial metric in LSM, our investigations into interpretability naturally involve assessing and comparing LSM accuracy across different models considered. 
In our investigation, the convolutional neural network model achieved the highest accuracy (0.8447 with 19 factors; 0.8048 with 9 factors), while Extreme Gradient Boosting and Support Vector Machine also demonstrated strong predictive capabilities, outperforming conventional statistical models. These findings indicate that DL and sophisticated ML algorithms can effectively capture the complex relationships between input factors and landslide occurrence. However, the interpretability of predictions varied among different models, particularly when using the broader set of 19 contributing factors. Explanation methods like SHAP, LIME, and DeepLIFT also led to variations in interpretation results. Using a comprehensive set of 19 contributing factors improved prediction accuracy but introduced complexities and inconsistency in model interpretations. Focusing on a dedicated set of 9 triggering factors sacrificed some predictive power but enhanced interpretability, as evidenced by more consistent key factors identified across various models and alignment with the findings of field investigation reports....	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# DATR:データセットレベル適応とプロトタイプアライメントを用いた教師なしドメイン適応検出変換器 DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment ( http://arxiv.org/abs/2405.11765v1 ) ライセンス: Link先を確認	Jianhong Han, Liang Chen, Yupei Wang,	(参考訳) 物体検出器は、収集されたデータ(ソースドメイン)と現実世界のアプリケーションのデータ(ターゲットドメイン)との間のドメインギャップに直面すると、しばしば大きな性能劣化を起こす。この課題に対処するために、慎重に設計された特徴アライメント技術を利用する教師なしドメイン適応検出器が数多く提案されている。しかし、これらの技術は主にクラスに依存しない方法でインスタンスレベルの特徴を整合させており、異なるカテゴリから抽出された特徴の違いを見落としているため、改善は限定的である。さらに、現在のアライメントモジュールの適用範囲は限られた画像のバッチに制限されることが多く、データセット全体のレベルの手がかりを学習できないため、ターゲットドメインへの検出器の一般化能力が大きく制約される。そこで本研究では、物体検出の教師なしドメイン適応のために、DATR(Domain Adaptive detection TRansformer)と呼ぶDETRベースの強力な検出器を導入する。まず、物体検出タスクとドメイン適応タスクのギャップを埋めることにより、ドメイン間の特徴をクラスを考慮して効果的に整合させるクラスワイズ・プロトタイプアライメント(CPA)モジュールを提案する。次に、設計したデータセットレベルのアライメントスキーム(DAS)は、コントラスト学習を活用することで、検出器がグローバルな表現を獲得し、両ドメインにまたがるデータセット全体でインスタンスレベルの特徴のクラス間識別性を高めるよう明示的に導く。さらに、DATRは平均教師(mean-teacher)ベースの自己学習フレームワークを組み込み、教師モデルが生成した擬似ラベルを利用してドメインバイアスをさらに緩和する。広範な実験結果により、提案したDATRが複数のドメイン適応シナリオにおいて優れた性能と一般化能力を持つことが示された。コードは https://github.com/h751410234/DATR で公開されている。 Object detectors frequently encounter significant performance degradation when confronted with domain gaps between collected data (source domain) and data from real-world applications (target domain). To address this task, numerous unsupervised domain adaptive detectors have been proposed, leveraging carefully designed feature alignment techniques. However, these techniques primarily align instance-level features in a class-agnostic manner, overlooking the differences between extracted features from different categories, which results in only limited improvement. Furthermore, the scope of current alignment modules is often restricted to a limited batch of images, failing to learn the entire dataset-level cues, thereby severely constraining the detector's generalization ability to the target domain. 
To this end, we introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection. Firstly, we propose the Class-wise Prototypes Alignment (CPA) module, which effectively aligns cross-domain features in a class-aware manner by bridging the gap between object detection task and domain adaptation task. Then, the designed Dataset-level Alignment Scheme (DAS) explicitly guides the detector to achieve global representation and enhance inter-class distinguishability of instance-level features across the entire dataset, which spans both domains, by leveraging contrastive learning. Moreover, DATR incorporates a mean-teacher based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias. Extensive experimental results demonstrate superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios. Code is released at https://github.com/h751410234/DATR.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
# SHAPスコアから機能重要スコアへ From SHAP Scores to Feature Importance Scores ( http://arxiv.org/abs/2405.11766v1 ) ライセンス: Link先を確認	Olivier Letoffe, Xuanxiang Huang, Nicholas Asher, Joao Marques-Silva,	(参考訳) eXplainable Artificial Intelligence(XAI)の中心的な目標は、予測された機械学習(ML)モデルの特徴に相対的な重要性を割り当てることである。特徴属性によるこの説明可能性のタスクの重要性は、最近のSHAPやLIMEといったツールのユビキタスな利用によって説明されている。残念なことに、SHAP と LIME の根底にあるゲーム理論の基礎を用いた機能属性の正確な計算は、明らかな不満足な結果をもたらす可能性がある。最近の研究は、特徴選択による説明の論理的定義に基づく特徴の公理的集約を研究することによって、厳密な特徴属性を目標にしている。本稿は,特徴属性と優先投票力の間には必須の相関関係があることを示し,最近提案された公理集約は,過去に研究されたパワー指標の範囲のいくつかを瞬時に表すものであることを示す。さらに、最も広く使われているパワー指標のどれが特徴重要度スコア(FISs)として活用されるのか、すなわちXAIにおけるパワー指標の使用と、これら指標のどちらが特徴属性によるXAIの目的に最も適しているか、すなわち不満足な結果とは見なされないという点で、はっきりしない。本稿では,FISが提示すべき新たな望ましい特性を提案する。また,提案する特性を示す新しいFISも提案する。最後に、提案した特性の観点から、最もよく知られたパワー指標の厳密な分析を行う。 A central goal of eXplainable Artificial Intelligence (XAI) is to assign relative importance to the features of a Machine Learning (ML) model given some prediction. The importance of this task of explainability by feature attribution is illustrated by the ubiquitous recent use of tools such as SHAP and LIME. Unfortunately, the exact computation of feature attributions, using the game-theoretical foundation underlying SHAP and LIME, can yield manifestly unsatisfactory results, that tantamount to reporting misleading relative feature importance. Recent work targeted rigorous feature attribution, by studying axiomatic aggregations of features based on logic-based definitions of explanations by feature selection. This paper shows that there is an essential relationship between feature attribution and a priori voting power, and that those recently proposed axiomatic aggregations represent a few instantiations of the range of power indices studied in the past. Furthermore, it remains unclear how some of the most widely used power indices might be exploited as feature importance scores (FISs), i.e. 
the use of power indices in XAI, and which of these indices would be the best suited for the purposes of XAI by feature attribution, namely in terms of not producing results that could be deemed as unsatisfactory. This paper proposes novel desirable properties that FISs should exhibit. In addition, the paper also proposes novel FISs exhibiting the proposed properties. Finally, the paper conducts a rigorous analysis of the best-known power indices in terms of the proposed properties.	翻訳日:2024-05-21 14:23:32 公開日:2024-05-20
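To make the notion of a priori voting power in the abstract above concrete, the sketch below computes two well-known power indices (normalized Banzhaf and Shapley-Shubik) for a small weighted voting game by brute-force enumeration. It does not reproduce the paper's proposed feature importance scores; the game `[quota 3; weights 2, 1, 1]` is a hypothetical example, and in the XAI reading voters would correspond to features.

```python
from fractions import Fraction
from itertools import combinations, permutations

def banzhaf(weights, quota):
    """Normalized Banzhaf index: each voter's share of all critical swings.
    Assumes at least one winning coalition exists."""
    n = len(weights)
    swings = [0] * n
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            total = sum(weights[i] for i in coalition)
            if total < quota:
                continue  # losing coalition, nobody can be critical
            for i in coalition:
                if total - weights[i] < quota:  # removing i flips the outcome
                    swings[i] += 1
    all_swings = sum(swings)
    return [Fraction(s, all_swings) for s in swings]

def shapley_shubik(weights, quota):
    """Shapley-Shubik index: fraction of orderings in which a voter is pivotal."""
    n = len(weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for i in order:
            running += weights[i]
            if running >= quota:  # voter i turns the coalition into a winner
                pivots[i] += 1
                break
    return [Fraction(p, orderings) for p in pivots]

weights, quota = [2, 1, 1], 3
banzhaf_scores = banzhaf(weights, quota)         # 3/5, 1/5, 1/5
shapley_scores = shapley_shubik(weights, quota)  # 2/3, 1/6, 1/6
```

The two indices agree on the ranking here but assign different magnitudes, which is the kind of discrepancy that matters when such indices are reused as importance scores. Enumeration is exponential in the number of voters, so this is only practical for small games.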
# 話者匿名化データを用いた多話者テキスト音声合成の訓練 Multi-speaker Text-to-speech Training with Speaker Anonymized Data ( http://arxiv.org/abs/2405.11767v1 ) ライセンス: Link先を確認	Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda,	(参考訳) 音声生成モデルの大規模化の傾向は、トレーニングデータに含まれる声の持ち主に関する生体情報が漏洩する脅威をもたらし、プライバシーとセキュリティ上の懸念を高めている。本稿では、他の属性を維持しつつ入力音声の話者性を隠蔽する処理である話者匿名化(SA)を施したデータを用いて、多話者テキスト音声合成(TTS)モデルを訓練することを検討する。 2つの信号処理ベースのSA手法と3つのディープニューラルネットワークベースのSA手法を用いて、多話者TTSデータセットであるVCTKを匿名化し、それを用いてエンドツーエンドTTSモデルであるVITSを訓練し、テスト時に未知話者のTTSを行った。我々は、匿名化されたトレーニングデータと、そのデータで訓練した下流のTTSモデルの性能を評価するために、広範な客観的および主観的実験を行った。重要な点として、データ駆動型の主観評価予測モデルであるUTMOSと、声の識別性の利得(gain of voice distinctiveness)を測る指標であるGVDが、下流のTTS性能の良い指標であることがわかった。将来の研究者が多話者TTSの訓練におけるSAシステムの良否を判断する助けとなることを期待して、得られた知見をまとめる。 The trend of scaling up speech generation models poses a threat of biometric information leakage of the identities of the voices in the training data, raising privacy and security concerns. In this paper, we investigate training multi-speaker text-to-speech (TTS) models using data that underwent speaker anonymization (SA), a process that tends to hide the speaker identity of the input speech while maintaining other attributes. Two signal processing-based and three deep neural network-based SA methods were used to anonymize VCTK, a multi-speaker TTS dataset, which is further used to train an end-to-end TTS model, VITS, to perform unseen speaker TTS during the testing phase. We conducted extensive objective and subjective experiments to evaluate the anonymized training data, as well as the performance of the downstream TTS model trained using those data. Importantly, we found that UTMOS, a data-driven subjective rating predictor model, and GVD, a metric that measures the gain of voice distinctiveness, are good indicators of the downstream TTS performance. We summarize insights in the hope of helping future researchers determine the goodness of the SA system for multi-speaker TTS training.	翻訳日:2024-05-21 14:13:43 公開日:2024-05-20
# 全フォトニック量子リピータの高性能化 An Improved Design for All-Photonic Quantum Repeaters ( http://arxiv.org/abs/2405.11768v1 ) ライセンス: Link先を確認	Ashlesha Patil, Saikat Guha,	(参考訳) 全フォトニック量子リピータは、物質ベースの量子メモリの代わりに、リピータグラフ状態(RGS)と呼ばれるマルチキュービットフォトニックグラフ状態を使用して、主に損失エラーに対する保護を行っている。 RGSは、リピータにおける誤り訂正のための木グラフ符号化論理量子ビットと、隣接するリピータ間の絡み合いを生成する物理リンク量子ビットから構成される。 RGSを生成する2つの方法は、単一の光子で供給される多重確率線形光回路によって作製された小さな絡み合った状態の線形光学ベル状態測定(融合)を用いて、確率的縫合(英語版)と、少数の量子論理可能な固体状態エミッタを用いた直接決定的準備である。核融合による資源オーバーヘッドと量子エミッタ系の回路深さは、RGSのサイズとともに増大する。そのため、資源効率の高いRGSのエンジニアリングが不可欠である。本稿では,従来のRGSよりも少ない量子ビットを用いた全フォトニック量子リピータにおいて,より高い絡み合い率を実現する新しいRGS設計を提案する。木コードリンクキュービットで隣接するリピータを絡める確率を高めることでこれを実現できる。また、損失のみの誤りに対して、リンクキュービット上で論理的BSMを実行するための新しい適応スキームを提案する。適応的BSMは、キュービット損失確率が均一である場合、木コード上の論理的BSMの以前のスキームよりも優れる。これにより、リンク量子ビット上で論理的BSMを実行するのに必要な光学モードの数を削減し、絡み合い率をさらに向上する。 All-photonic quantum repeaters use multi-qubit photonic graph states, called repeater graph states (RGS), instead of matter-based quantum memories, for protection against predominantly loss errors. The RGS comprises tree-graph-encoded logical qubits for error correction at the repeaters and physical {\em link} qubits to create entanglement between neighboring repeaters. The two methods to generate the RGS are probabilistic stitching -- using linear optical Bell state measurements (fusion) -- of small entangled states prepared via multiplexed-probabilistic linear optical circuits fed with single photons, and a direct deterministic preparation using a small number of quantum-logic-capable solid-state emitters. The resource overhead due to fusions and the circuit depth of the quantum emitter system both increase with the size of the RGS. Therefore engineering a resource-efficient RGS is crucial. We propose a new RGS design, which achieves a higher entanglement rate for all-photonic quantum repeaters using fewer qubits than the previously known RGS would. We accomplish this by boosting the probability of entangling neighboring repeaters with tree-encoded link qubits. 
We also propose a new adaptive scheme to perform logical BSM on the link qubits for loss-only errors. The adaptive BSM outperforms the previous schemes for logical BSM on tree codes when the qubit loss probability is uniform. It reduces the number of optical modes required to perform logical BSM on link qubits to improve the entanglement rate further.	翻訳日:2024-05-21 14:13:43 公開日:2024-05-20
# Uni-Mol Docking V2: 現実的で正確な結合ポーズ予測を目指して Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction ( http://arxiv.org/abs/2405.11769v1 ) ライセンス: Link先を確認	Eric Alcaide, Zhifeng Gao, Guolin Ke, Yaqi Li, Linfeng Zhang, Hang Zheng, Gengmo Zhou,	(参考訳) 近年、機械学習(ML)手法は、法外な計算コストをかけずに高精度を実現しうる分子ドッキングの有望な代替手段として登場している。しかし、最近の研究では、これらのMLモデルは、問題に固有の物理的制約を無視しながら、定量的指標に過度に適合する可能性があることが示されている。本研究では、性能が著しく向上したUni-Mol Docking V2を提示する。PoseBustersベンチマークにおいて、リガンドの77%以上について結合ポーズをRMSD 2.0 Å未満で正確に予測し、75%以上がすべての品質チェックを通過した。これは、以前のUni-Mol Dockingモデルが達成した62%からの大幅な向上である。特に、我々のUni-Mol Dockingアプローチは化学的に正確な予測を生成し、これまでのMLモデルを悩ませてきたキラリティ反転や立体衝突などの問題を回避している。さらに、Uni-Mol DockingをUni-Dockのような物理ベースの手法と組み合わせると、高品質な予測(RMSDが1.0 Å未満および1.5 Å未満)と物理的妥当性の面で性能が向上することを観察した。本研究の結果は、仮想スクリーニングや薬物設計における産業応用に適した、リガンドドッキングへの包括的なアプローチを採用しており、科学研究への人工知能の応用における大きな前進を示すものである。 Uni-Mol Dockingのコード、データ、サービスは https://github.com/dptech-corp/Uni-Mol で公開されている。 In recent years, machine learning (ML) methods have emerged as promising alternatives for molecular docking, offering the potential for high accuracy without incurring prohibitive computational costs. However, recent studies have indicated that these ML models may overfit to quantitative metrics while neglecting the physical constraints inherent in the problem. In this work, we present Uni-Mol Docking V2, which demonstrates a remarkable improvement in performance, accurately predicting the binding poses of 77+% of ligands in the PoseBusters benchmark with an RMSD value of less than 2.0 Å, and 75+% passing all quality checks. This represents a significant increase from the 62% achieved by the previous Uni-Mol Docking model. Notably, our Uni-Mol Docking approach generates chemically accurate predictions, circumventing issues such as chirality inversions and steric clashes that have plagued previous ML models. 
Furthermore, we observe enhanced performance in terms of high-quality predictions (RMSD values of less than 1.0 {\AA} and 1.5 {\AA}) and physical soundness when Uni-Mol Docking is combined with more physics-based methods like Uni-Dock. Our results represent a significant advancement in the application of artificial intelligence for scientific research, adopting a holistic approach to ligand docking that is well-suited for industrial applications in virtual screening and drug design. The code, data and service for Uni-Mol Docking are publicly available for use and further development in https://github.com/dptech-corp/Uni-Mol.	翻訳日:2024-05-21 14:13:43 公開日:2024-05-20
# Learning Spatial Similarity Distribution for Few-shot Object Counting ( http://arxiv.org/abs/2405.11770v1 ) License: see link	Yuanwu Xu, Feifan Song, Haofeng Zhang	Few-shot object counting aims to count the number of objects in a query image that belong to the same class as the given exemplar images. Existing methods compute the similarity between the query image and exemplars in the 2D spatial domain and perform regression to obtain the counting number. However, these methods overlook the rich information about the spatial distribution of similarity on the exemplar images, which significantly affects matching accuracy. To address this issue, we propose a network learning Spatial Similarity Distribution (SSD) for few-shot object counting, which preserves the spatial structure of exemplar features and calculates a 4D similarity pyramid point-to-point between the query features and exemplar features, capturing the complete distribution information for each point in the 4D similarity space. We propose a Similarity Learning Module (SLM) which applies the efficient center-pivot 4D convolutions on the similarity pyramid to map different similarity distributions to distinct predicted density values, thereby obtaining accurate counts. Furthermore, we also introduce a Feature Cross Enhancement (FCE) module that enhances query and exemplar features mutually to improve the accuracy of feature matching. Our approach outperforms state-of-the-art methods on multiple datasets, including FSC-147 and CARPK. Code is available at https://github.com/CBalance/SSD.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
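The point-to-point 4D similarity described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses cosine similarity (the abstract does not name the similarity measure), a single scale rather than a pyramid, and plain nested lists. The function names `cosine` and `similarity_4d` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def similarity_4d(query, exemplar):
    """Return S with S[i][j][k][l] = cosine(query[i][j], exemplar[k][l]).

    query is an Hq x Wq grid and exemplar an He x We grid of feature vectors,
    so every query location keeps a full He x We map over the exemplar.
    """
    return [[[[cosine(q, e) for e in e_row] for e_row in exemplar]
             for q in q_row] for q_row in query]

# Toy feature maps (hypothetical): a 2x2 query grid and a 1x2 exemplar grid, 2 channels each.
query = [[(1.0, 0.0), (0.0, 1.0)],
         [(1.0, 1.0), (1.0, 0.0)]]
exemplar = [[(1.0, 0.0), (0.0, 1.0)]]
S = similarity_4d(query, exemplar)
```

Keeping the exemplar axes separate, instead of pooling the exemplar into a single vector, is what preserves the spatial distribution of similarity that the abstract says earlier 2D approaches overlook.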
# Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques ( http://arxiv.org/abs/2405.11775v1 ) License: see link	Siva Rajesh Kasa, Aniket Goel, Karan Gupta, Sumegh Roychowdhury, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy	Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP), with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels. However, with the advent of Pretrained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well. This paper provides a comprehensive theoretical and empirical examination of both these approaches. Furthermore, we also offer strategic recommendations regarding the most effective approach to adopt based on specific settings.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# Efficient Multi-agent Reinforcement Learning by Planning ( http://arxiv.org/abs/2405.11778v1 ) License: see link	Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang	Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks. Nonetheless, most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios. In contrast, model-based reinforcement learning (MBRL), particularly algorithms integrating planning, such as MuZero, has demonstrated superhuman performance with limited data in many tasks. Hence, we aim to boost the sample efficiency of MARL by adopting model-based approaches. However, incorporating planning and search methods into multi-agent systems poses significant challenges. The expansive action space of multi-agent systems often necessitates leveraging the nearly-independent property of agents to accelerate learning. To tackle this issue, we propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search. We design a novel network structure to facilitate distributed execution and parameter sharing. To enhance search efficiency in deterministic environments with sizable action spaces, we introduce two novel techniques: Optimistic Search Lambda (OS($\lambda$)) and Advantage-Weighted Policy Optimization (AWPO). Extensive experiments on the SMAC benchmark demonstrate that MAZero outperforms model-free approaches in terms of sample efficiency and provides comparable or better performance than existing model-based methods in terms of both sample and computational efficiency. Our code is available at https://github.com/liuqh16/MAZero.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# General bounds on the quality of Bayesian coresets ( http://arxiv.org/abs/2405.11780v1 ) License: see link	Trevor Campbell	Bayesian coresets speed up posterior inference in the large-scale data regime by approximating the full-data log-likelihood function with a surrogate log-likelihood based on a small, weighted subset of the data. But while Bayesian coresets and methods for construction are applicable in a wide range of models, existing theoretical analyses of the posterior inferential error incurred by coreset approximations apply only in restrictive settings -- i.e., exponential family models, or models with strong log-concavity and smoothness assumptions. This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of coreset approximations that reflect the full range of applicability of Bayesian coresets. The lower bounds require only mild model assumptions typical of Bayesian asymptotic analyses, while the upper bounds require the log-likelihood functions to satisfy a generalized subexponentiality criterion that is weaker than conditions used in earlier work. The lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and to provide a theoretical explanation for the previously-observed poor empirical performance of importance sampling-based construction methods. The upper bounds are used to analyze the performance of recent subsample-optimize methods. The flexibility of the theory is demonstrated in validation experiments involving multimodal, unidentifiable, heavy-tailed Bayesian posterior distributions.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
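To make the surrogate log-likelihood in the abstract above concrete, the sketch below compares a full-data log-likelihood with a weighted sum over a small subset. Everything specific here is an assumption made for illustration: the unit-variance Gaussian model, the synthetic data, and in particular the uniform subsample with equal weights N/M, which is only the simplest baseline and not one of the construction methods the paper analyzes. All function names are hypothetical.

```python
import math
import random

def log_lik(x, theta):
    """Log-density of x under a unit-variance Gaussian with mean theta."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2

def full_log_lik(data, theta):
    """Full-data log-likelihood: sum over all N points."""
    return sum(log_lik(x, theta) for x in data)

def coreset_log_lik(data, indices, weights, theta):
    """Weighted surrogate: sum of w_n * log p(x_n | theta) over the coreset only."""
    return sum(w * log_lik(data[i], theta) for i, w in zip(indices, weights))

def uniform_coreset(n, m, rng):
    """Baseline construction: m distinct indices drawn uniformly, each weighted n / m."""
    indices = rng.sample(range(n), m)
    weights = [n / m] * m
    return indices, weights

rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]   # synthetic data (hypothetical)
idx, w = uniform_coreset(len(data), 50, rng)
full = full_log_lik(data, 0.8)
approx = coreset_log_lik(data, idx, w, 0.8)          # evaluates 50 terms instead of 1000
```

The bounds in the paper concern how far the posterior induced by such a surrogate can be from the full posterior in KL divergence. The sketch only shows what is being approximated, not how well.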
# Formulation and evaluation of ocean dynamics problems as optimization problems for quantum annealing machines ( http://arxiv.org/abs/2405.11782v1 ) License: see link	Takuro Matsuta, Ryo Furue	Recent advancements in quantum computing suggest the potential to revolutionize computational algorithms across various scientific domains including oceanography and atmospheric science. The field is still relatively young and quantum computation is so different from classical computation that suitable frameworks to represent oceanic and atmospheric dynamics are yet to be explored. Quantum annealing, one of the major paradigms, focuses on combinatorial optimization tasks. In this paper, we solve the classical Stommel problem by quantum annealing (QA) and simulated annealing (SA), a classical counterpart of quantum annealing. We cast the linear partial differential equation into an optimization problem by the least-squares method and discretize the cost function in two ways: finite difference and truncated basis expansion. In either case, SA successfully reproduces the expected solution when appropriate parameters are chosen, demonstrating that annealing has potential. In contrast, QA using the D-Wave quantum annealing machine fails to obtain good solutions for some cases owing to hardware limitations; in particular, the highly limited connectivity graph of the machine limits the size of the solvable problems, at least under currently available algorithms. Either expanding the connectivity graph or improving the graph embedding algorithms would probably be necessary for quantum annealing machines to be usable for oceanic and atmospheric dynamics problems. While this finding emphasizes the need for hardware improvements and enhancements in graph embedding algorithms for practical applications of quantum annealers, the results from simulated annealing suggest its potential to address practical geophysical dynamics problems. As quantum calculation continues to evolve, addressing these challenges may lead to transformative advancements in ocean and atmosphere modeling.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
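The general idea in the abstract above, turning a linear problem into a least-squares cost over binary variables and minimizing it by annealing, can be sketched on a toy 2x2 system. This is a hedged illustration, not the paper's formulation: the matrix `A`, the right-hand side `b`, the fixed-point encoding (`BITS`, `STEP`), and the linear cooling schedule are all assumptions chosen for readability, and no PDE discretization is attempted. Because each unknown is linear in its bits, the cost is quadratic in the bits, which is the form annealers accept.

```python
import itertools
import math
import random

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.25]          # chosen so the exact solution is x = (0.75, 1.5)
BITS, STEP = 4, 0.25     # each unknown is STEP * (4-bit unsigned integer)

def decode(bits):
    """Map a flat sequence of 0/1 values to real unknowns, BITS bits per unknown."""
    return [STEP * sum(bits[i * BITS + k] << k for k in range(BITS))
            for i in range(len(A))]

def cost(bits):
    """Least-squares residual ||A x - b||^2 as a function of the binary variables."""
    x = decode(bits)
    return sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))

def simulated_annealing(n_bits, steps=5000, t0=5.0, seed=0):
    """Single-bit-flip simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    current = cost(state)
    best, best_cost = tuple(state), current
    for s in range(steps):
        temp = t0 * (1.0 - s / steps) + 1e-9   # linear cooling schedule
        k = rng.randrange(n_bits)
        state[k] ^= 1                           # propose a single bit flip
        new = cost(state)
        if new <= current or rng.random() < math.exp((current - new) / temp):
            current = new                       # accept the move
            if new < best_cost:
                best, best_cost = tuple(state), new
        else:
            state[k] ^= 1                       # reject: undo the flip
    return best, best_cost

# Exhaustive search over all 2^8 bit strings gives the reference optimum for this toy case.
exact = min(itertools.product((0, 1), repeat=2 * BITS), key=cost)
sa_bits, sa_cost = simulated_annealing(2 * BITS)
```

Exhaustive search is only feasible because the toy problem has eight bits. Annealing offers no guarantee of reaching the same optimum, so comparing `sa_cost` against `cost(exact)` is a reasonable sanity check when tuning the schedule.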
# Inverse Design of Metal-Organic Frameworks Using Quantum Natural Language Processing ( http://arxiv.org/abs/2405.11783v1 ) License: see link	Shinyoung Kang, Jihan Kim	In this study, we explore the potential of using quantum natural language processing (QNLP) to inverse design metal-organic frameworks (MOFs) with targeted properties. Specifically, by analyzing 150 hypothetical MOF structures consisting of 10 metal nodes and 15 organic ligands, we categorize these structures into four distinct classes for pore volume and $H_{2}$ uptake values. We then compare various QNLP models (i.e., the bag-of-words, DisCoCat (Distributional Compositional Categorical), and sequence-based models) to identify the most effective approach to process the MOF dataset. Using a classical simulator provided by IBM Qiskit, the bag-of-words model is identified as the optimum model, achieving validation accuracies of 85.7% and 86.7% for binary classification tasks on pore volume and $H_{2}$ uptake, respectively. Further, we developed multi-class classification models tailored to the probabilistic nature of quantum circuits, with average test accuracies of 88.4% and 80.7% across different classes for pore volume and $H_{2}$ uptake datasets. Finally, the performance of generating MOFs with target properties showed accuracies of 93.5% for pore volume and 89% for $H_{2}$ uptake, respectively. Although our investigation covers only a fraction of the vast MOF search space, it marks a promising first step towards using quantum computing for materials design, offering a new perspective through which to explore the complex landscape of MOFs.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# Reward-Punishment Reinforcement Learning with Maximum Entropy ( http://arxiv.org/abs/2405.11784v1 ) License: see link	Jiexin Wang, Eiji Uchibe	We introduce the "soft Deep MaxPain" (softDMP) algorithm, which integrates the optimization of long-term policy entropy into reward-punishment reinforcement learning objectives. Our motivation is to facilitate a smoother variation of operators utilized in the updating of action values beyond traditional "max" and "min" operators, where the goal is enhancing sample efficiency and robustness. We also address two unresolved issues from the previous Deep MaxPain method. Firstly, we investigate how the negated ("flipped") pain-seeking sub-policy, derived from the punishment action value, collaborates with the "min" operator to effectively learn the punishment module and how softDMP's smooth learning operator provides insights into the "flipping" trick. Secondly, we tackle the challenge of data collection for learning the punishment module to mitigate inconsistencies arising from the involvement of the "flipped" sub-policy (pain-avoidance sub-policy) in the unified behavior policy. We empirically explore the first issue in two discrete Markov Decision Process (MDP) environments, elucidating the crucial advancements of the DMP approach and the necessity for soft treatments on the hard operators. For the second issue, we propose a probabilistic classifier based on the ratio of the pain-seeking sub-policy to the sum of the pain-seeking and goal-reaching sub-policies. This classifier assigns roll-outs to separate replay buffers for updating reward and punishment action-value functions, respectively. Our framework demonstrates superior performance in Turtlebot 3's maze navigation tasks under the ROS Gazebo simulation.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
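The move from hard "max" and "min" operators to smooth ones, as described in the abstract above, can be illustrated with the log-sum-exp operator that is standard in maximum-entropy reinforcement learning. Whether softDMP uses exactly this form is an assumption; the temperature name `tau`, the function names, and the action values below are hypothetical.

```python
import math

def soft_max(values, tau):
    """tau * log(sum(exp(v / tau))): a smooth upper bound on max(values).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(values)
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def soft_min(values, tau):
    """Smooth lower bound on min(values), obtained by negating soft_max of the negated values."""
    return -soft_max([-v for v in values], tau)

q_values = [1.0, 2.0, 3.0]                 # hypothetical action values
nearly_hard = soft_max(q_values, 0.01)     # close to max(q_values) for small tau
smooth = soft_max(q_values, 1.0)           # noticeably above max(q_values) for larger tau
```

As `tau` shrinks, both operators approach their hard counterparts, so the temperature controls how far the value update departs from the traditional operators.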
# Guided Multi-objective Generative AI to Enhance Structure-based Drug Design ( http://arxiv.org/abs/2405.11785v1 ) License: see link	Amit Kadan, Kevin Ryczko, Adrian Roitberg, Takeshi Yamazaki	Generative AI has the potential to revolutionize drug discovery. Yet, despite recent advances in machine learning, existing models cannot generate molecules that satisfy all desired physicochemical properties. Herein, we describe IDOLpro, a novel generative chemistry AI combining deep diffusion with multi-objective optimization for structure-based drug design. The latent variables of the diffusion model are guided by differentiable scoring functions to explore uncharted chemical space and generate novel ligands in silico, optimizing a plurality of target physicochemical properties. We demonstrate its effectiveness by generating ligands with optimized binding affinity and synthetic accessibility on two benchmark sets. IDOLpro produces ligands with binding affinities over 10% higher than the next best state-of-the-art on each test set. On a test set of experimental complexes, IDOLpro is the first to surpass the performance of experimentally observed ligands. IDOLpro can accommodate other scoring functions (e.g. ADME-Tox) to accelerate hit-finding, hit-to-lead, and lead optimization for drug discovery.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models ( http://arxiv.org/abs/2405.11788v1 ) License: see link	Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang, Ji Wu	We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility of new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods, while leaving room for extensions to more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and training small-scale LMMs with affordable computational resources.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# The Grandparent Scam: A Systems Perspective Case Study On Elder Fraud And The Concept Of Human Layering ( http://arxiv.org/abs/2405.11789v1 ) License: see link	Michelle Espinoza	In April 2024, an 81-year-old Ohio man was charged with murder, assault, and kidnapping. The man believed that he was protecting his family from scammers threatening harm. What he did not realize was that the 61-year-old Uber driver he killed was also a victim of the same scammers. This case study examines some common variants of the Grandparent Scam from a systems perspective and how weaponization of conscience is used in these scams. Additionally, this study examines the parallels between layering in money laundering and human layering in the execution of these scams.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# Graviton physics: Quantum field theory of gravitons, graviton noise and gravitational decoherence -- a concise tutorial ( http://arxiv.org/abs/2405.11790v1 ) License: see link	Jen-Tsung Hsiang, Hing-Tong Cho, Bei-Lok Hu	The detection of gravitational waves in 2015 ushered in a new era of gravitational wave astronomy capable of probing into the strong field dynamics of black holes and neutron stars. It has opened up an exciting new window for laboratory and space tests of Einstein's theory of classical general relativity. In recent years there have been two interesting proposals aimed at revealing the quantum natures of perturbative gravity: 1) theoretical predictions in how graviton noise from the early universe after the vacuum of the gravitational field was strongly squeezed by inflationary expansion; 2) experimental proposals using the quantum entanglement between two masses each in a superposition state. The first proposal invokes the stochastic properties of quantum fields, the second invokes a key concept of quantum information. An equally basic and interesting idea is to ask whether and how gravity might be responsible for a quantum system becoming classical in appearance, known as gravitational decoherence. Decoherence due to gravity is of special interest because gravity is universal. This is an important issue in macroscopic quantum phenomena. To fully appreciate these exciting developments requires a working knowledge of classical general relativity (GR), quantum field theory (QFT) and quantum information (QI), plus some familiarity with stochastic processes (SP), namely, noise in quantum fields. Traditionally a new researcher may be conversant in one or two of these four subjects: GR, QFT, QI, SP, depending on his/her background. This tutorial attempts to provide the necessary connections between them, helping an engaged reader from any one of these four subjects to leapfrog to the frontier of these interdisciplinary research topics. Here we shall treat the three topics listed in the title, save gravitational entanglement, because its nature and implications proclaimed in relation to quantum gravity still contain many controversial elements.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise ( http://arxiv.org/abs/2405.11793v1 ) License: see link	Ruiqi Wu, Chenran Zhang, Jianle Zhang, Yi Zhou, Tao Zhou, Huazhu Fu	Current fundus image analysis models are predominantly built for specific tasks relying on individual datasets. The learning process is usually based on a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model which incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and mixed training strategy to infuse expert knowledge. Our proposed fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and holds excellent generalization ability in zero-shot and few-shot scenarios. MM-Retinal and KeepFIT are available at https://github.com/lxirich/MM-Retinal.	Translated: 2024-05-21 14:13:43 Published: 2024-05-20
# ViViD: Video Virtual Try-on using Diffusion Models ( http://arxiv.org/abs/2405.11794v1 ) License: see link	Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha	Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporally inconsistent outcomes, while previous video-based try-on solutions can only generate blurry results of low visual quality. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture, and insert hierarchical Temporal Modules into the text-to-image stable diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset, which is the largest, with the most diverse types of garments and the highest resolution for the task of video virtual try-on to date. Extensive experiments demonstrate that our approach is able to yield satisfactory video try-on results. The dataset, codes, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.	Translated: 2024-05-21 14:03:49 Published: 2024-05-20
# Application of time-series quantum generative model to financial data ( http://arxiv.org/abs/2405.11795v1 ) License: see link	Shun Okumura, Masayuki Ohzeki, Masaya Abe	Although a quantum generative model for time series that successfully learns correlated series with multiple Brownian motions has been proposed, the model has not been adapted and evaluated for financial problems. In this study, a time-series generative model was applied as a quantum generative model to actual financial data. Future data for two correlated time series were generated and compared with classical methods such as long short-term memory and vector autoregression. Furthermore, numerical experiments were performed to complete missing values. Based on the results, we evaluated the practical applications of the time-series quantum generation model. It was observed that fewer parameter values were required compared with the classical method. In addition, the quantum time-series generation model was feasible for both stationary and nonstationary data. These results suggest that several parameters can be applied to various types of time-series data.	Translated: 2024-05-21 14:03:49 Published: 2024-05-20
# Regularized Entanglement Entropy of Electron-Positron Scattering with a Witness Photon ( http://arxiv.org/abs/2405.11799v1 ) License: see link	Shanmuka Shivashankara, Grace Gogliettino	Regularized quantum information metrics are calculated for the scattering process $e^-e^+ \rightarrow \gamma,Z\rightarrow \mu^-\mu^+$ that has a witness photon entangled with the initial electron-positron state. Unitarity implies the correct regularization of divergences that appear in both the final density matrix and von Neumann entanglement entropies. The entropies are found to quantify uncertainty or randomness. The variation of information, entanglement entropy, and correlation between the muon's and witness photon's helicities are found to convey equivalent information. The magnitude of the muon's expected helicity rises (falls) as the helicity entropy falls (rises). Area, or the scattering cross section, is a source of entropy for the muon's helicity entropy and momentum entropy. The muon's differential angular entropy distribution is similar to the differential angular cross section distribution, capturing the forward-backward asymmetry at high center of mass energies.	Translated: 2024-05-21 14:03:49 Published: 2024-05-20
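For readers unfamiliar with the von Neumann entanglement entropy mentioned in the abstract above, the sketch below computes it for the simplest possible case: one qubit of a two-qubit pure state. This is a generic textbook calculation under stated assumptions, not the regularized scattering computation of the paper; the amplitude layout `c[a][b]`, the function names, and the two example states are hypothetical choices.

```python
import math

def reduced_density_matrix(c):
    """rho_A[i][j] = sum_k c[i][k] * conj(c[j][k]) for a two-qubit pure state with amplitudes c[a][b]."""
    return [[sum(c[i][k] * complex(c[j][k]).conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def eigenvalues_2x2_hermitian(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))   # clamp tiny negative rounding errors
    return [tr / 2.0 + disc, tr / 2.0 - disc]

def von_neumann_entropy(rho):
    """S = -sum(lam * ln(lam)) over nonzero eigenvalues (natural logarithm)."""
    return -sum(lam * math.log(lam)
                for lam in eigenvalues_2x2_hermitian(rho) if lam > 1e-15)

s = 1.0 / math.sqrt(2.0)
bell = [[s, 0.0], [0.0, s]]          # (|00> + |11>) / sqrt(2): maximally entangled
product = [[1.0, 0.0], [0.0, 0.0]]   # |00>: no entanglement

entropy_bell = von_neumann_entropy(reduced_density_matrix(bell))
entropy_product = von_neumann_entropy(reduced_density_matrix(product))
```

The maximally entangled state gives ln 2 and the product state gives zero, which matches the abstract's reading of entropy as a measure of uncertainty about the subsystem.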
# Generative AI in Higher Education: A Global Perspective of Institutional Adoption Policies and Guidelines ( http://arxiv.org/abs/2405.11800v1 ) License: see link	Yueqiao Jin, Lixiang Yan, Vanessa Echeverria, Dragan Gašević, Roberto Martinez-Maldonado	Integrating generative AI (GAI) into higher education is crucial for preparing a future generation of GAI-literate students. Yet a thorough understanding of the global institutional adoption policy remains absent, with most of the prior studies focused on the Global North and the promises and challenges of GAI, lacking a theoretical lens. This study utilizes the Diffusion of Innovations Theory to examine GAI adoption strategies in higher education across 40 universities from six global regions. It explores the characteristics of GAI innovation, including compatibility, trialability, and observability, and analyses the communication channels and roles and responsibilities outlined in university policies and guidelines. The findings reveal a proactive approach by universities towards GAI integration, emphasizing academic integrity, teaching and learning enhancement, and equity. Despite a cautious yet optimistic stance, a comprehensive policy framework is needed to evaluate the impacts of GAI integration and establish effective communication strategies that foster broader stakeholder engagement. The study highlights the importance of clear roles and responsibilities among faculty, students, and administrators for successful GAI integration, supporting a collaborative model for navigating the complexities of GAI in education. This study contributes insights for policymakers in crafting detailed strategies for its integration.	Translated: 2024-05-21 14:03:49 Published: 2024-05-20
# LSEnet:ディープグラフクラスタリングのためのローレンツ構造エントロピーニューラルネットワーク LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering ( http://arxiv.org/abs/2405.11801v1 ) ライセンス: Link先を確認	Li Sun, Zhenhao Huang, Hao Peng, Yujie Wang, Chunyang Liu, Philip S. Yu,	(参考訳) グラフクラスタリングは機械学習の基本的な問題である。近年、ディープラーニング手法は最先端の成果を達成しているが、事前に定義されたクラスタ番号なしでは動作できない。このような制限は、未知のクラスタ数を持つグラフクラスタリングにおいて、より難しい問題を提起する動機となります。本稿では,グラフ情報理論(構造情報)の新たな視点からこの問題に対処することを提案する。文献では、構造情報は深層クラスタリングにはまだ導入されておらず、古典的な定義は離散的な定式化やモデル化ノードの特徴に欠ける。本研究では、まず連続領域における微分可能な構造情報(DSI)を、いくつかの理論的結果とともに定式化する。 DSIを最小化することにより、グラフ内の密結合ノードが同じ割り当てを持つ傾向にある最適なパーティショニングツリーを構築し、クラスタ構造を明らかにする。 DSIはまた、事前に定義されたクラスタ番号を必要としない、新しいグラフクラスタリングの目的として理論的に提示される。さらに、双曲空間のローレンツモデルにニューラルLSEnetを設計し、多様体値グラフ畳み込みによる構造情報にノード特徴を統合する。実グラフ上の広範な実験結果は、我々のアプローチの優位性を示している。 Graph clustering is a fundamental problem in machine learning. Deep learning methods achieve the state-of-the-art results in recent years, but they still cannot work without predefined cluster numbers. Such limitation motivates us to pose a more challenging problem of graph clustering with unknown cluster number. We propose to address this problem from a fresh perspective of graph information theory (i.e., structural information). In the literature, structural information has not yet been introduced to deep clustering, and its classic definition falls short of discrete formulation and modeling node features. In this work, we first formulate a differentiable structural information (DSI) in the continuous realm, accompanied by several theoretical results. By minimizing DSI, we construct the optimal partitioning tree where densely connected nodes in the graph tend to have the same assignment, revealing the cluster structure. DSI is also theoretically presented as a new graph clustering objective, not requiring the predefined cluster number. 
Furthermore, we design a neural LSEnet in the Lorentz model of hyperbolic space, where we integrate node features into structural information via manifold-valued graph convolution. Extensive empirical results on real graphs show the superiority of our approach.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# ウェアラブルセンサを用いた非現実的説明に基づくバドミントン動作誘導 Counterfactual Explanation-Based Badminton Motion Guidance Generation Using Wearable Sensors ( http://arxiv.org/abs/2405.11802v1 ) ライセンス: Link先を確認	Minwoo Seong, Gwangbin Kim, Yumin Kang, Junhyuk Jang, Joseph DelPreto, SeungJun Kim,	(参考訳) 本研究では,マルチモーダルウェアラブルデータセットを用いて,パーソナライズされた動作ガイドを生成することにより,バドミントン選手のストローク品質を向上させる枠組みを提案する。これらのガイドは、逆ファクトアルゴリズムに基づいており、初心者と熟練者の間のパフォーマンスギャップを減らすことを目的としている。本手法は,専門家の知識を必要とせず,選手の動作改善を支援するために,可視化可能なデータを通じて共同レベルのガイダンスを提供する。本手法は,算術的尺度や運動特異的評価指標を含む有効性,近接性,妥当性を評価するために,従来のアルゴリズムに対して評価を行った。提案手法は,ストロークの質を高めつつも,動きの本質を維持できる動きを生成できることを示す。その結果, バドミントンストロークの任意の入力動作サンプルに対して, 対実動作誘導を生成することで, パーソナライズされたスポーツ運動ガイドを作成するためのアプローチの可能性を強調した。 This study proposes a framework for enhancing the stroke quality of badminton players by generating personalized motion guides, utilizing a multimodal wearable dataset. These guides are based on counterfactual algorithms and aim to reduce the performance gap between novice and expert players. Our approach provides joint-level guidance through visualizable data to assist players in improving their movements without requiring expert knowledge. The method was evaluated against a traditional algorithm using metrics to assess validity, proximity, and plausibility, including arithmetic measures and motion-specific evaluation metrics. Our evaluation demonstrates that the proposed framework can generate motions that maintain the essence of original movements while enhancing stroke quality, providing closer guidance than direct expert motion replication. The results highlight the potential of our approach for creating personalized sports motion guides by generating counterfactual motion guidance for arbitrary input motion samples of badminton strokes.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# (おそらく)人間翻訳を超えて:超長編文学テキスト翻訳のためのマルチエージェント協調の活用 (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts ( http://arxiv.org/abs/2405.11804v1 ) ライセンス: Link先を確認	Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang,	(参考訳) 機械翻訳(MT)の最近の進歩は、様々な領域にわたる翻訳品質を大幅に向上させた。しかし、文学テキストの翻訳は、複雑な言語、比喩表現、文化的なニュアンスのために、依然として困難な課題である。本研究では、文学翻訳のための大規模言語モデル(LLM)に基づく新しいマルチエージェントフレームワークを提案する。このフレームワークはTransAgentsという会社として実装され、複数のエージェントの集合的能力を活用することで従来の翻訳出版プロセスを反映し、文学作品の翻訳における複雑な要求に対処する。本システムの有効性を評価するため、Monolingual Human Preference(MHP)とBilingual LLM Preference(BLP)の2つの革新的な評価戦略を提案する。 MHPは対象言語の単言語読者の観点から翻訳を評価し、BLPは高度なLLMを用いて翻訳を原文と直接比較する。実証的な結果は、d-BLEUスコアが低いにもかかわらず、TransAgentsによる翻訳は、特にドメイン固有の知識を必要とするジャンルにおいて、人間の評価者とLLMの両方から、人間が書いた参照訳よりも好まれることを示している。また、ケーススタディを通じてTransAgentsの強みと限界を強調し、今後の研究の方向性を提案する。 Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents, to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TransAgents are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. 
We also highlight the strengths and limitations of TransAgents through case studies and suggest directions for future research.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# Distill-then-prune:エッジデバイス上でのリアルタイムステレオマッチングネットワークのための効率的な圧縮フレームワーク Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices ( http://arxiv.org/abs/2405.11809v1 ) ライセンス: Link先を確認	Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng,	(参考訳) 近年,リアルタイムステレオマッチング法が数多く導入されているが,精度は低いことが多い。これらの手法は、新しいモジュールの導入や従来のメソッドの統合によって精度の向上を試みる。しかし、改善は控えめなだけである。本稿では, 知識蒸留とモデルプルーニングを取り入れた新しい手法を提案し, 速度と精度のトレードオフを克服する。その結果,エッジデバイス上で高い精度を実現しつつ,リアルタイム性能を維持するモデルが得られた。提案手法は3つの重要なステップを含む。まず、これらの効率的なモデルから冗長なモジュールを除去し、それらのコントリビューションを比較することによって、最先端の手法をレビューし、軽量モデルの設計を行う。次に,教師としての効率的なモデルを利用して,知識を軽量モデルに抽出する。最後に、我々は、最終モデルを得るために、軽量モデルを体系的に訓練する。 Sceneflow と KITTI の2つの広く使われているベンチマークで行った広範な実験を通じて,各モジュールの有効性を解析し,その結果を提示する。 In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy by incorporating knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtained a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. Firstly, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely-used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# FedCAda: 加速された安定したフェデレーションラーニングのための適応的なクライアントサイド最適化 FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning ( http://arxiv.org/abs/2405.11811v1 ) ライセンス: Link先を確認	Liuzhi Zhou, Yu He, Kun Zhai, Xiang Liu, Sen Liu, Xingjun Ma, Guangnan Ye, Yu-Gang Jiang, Hongfeng Chai,	(参考訳) フェデレートラーニング(FL)は、データプライバシを保ちながら、分散クライアント全体にわたる機械学習モデルの協調トレーニングにおいて、顕著なアプローチとして登場した。しかし、加速と安定性のバランスを取ろうとする試みは、特にクライアント側においてFLにおいて重要な課題となっている。本稿では,この課題に対処するために設計された,革新的なクライアント適応アルゴリズムであるFedCAdaを紹介する。 FedCAdaはAdamアルゴリズムを利用して、クライアント側の第1モーメント推定$m$と第2モーメント推定$v$の補正プロセスを調整し、サーバ側の適応アルゴリズムパラメータを集約し、安定性と性能を確保しつつ収束速度と通信効率を向上する。さらに、異なる調整関数を組み込んだいくつかのアルゴリズムについて検討する。この比較分析により、フェデレート学習の初期段階において、他のクライアントからのクライアントモデルに含まれる限られた情報により、適応アルゴリズムのパラメータにより実質的な制約が課されることが判明した。フェデレーション学習が進行し、クライアントがよりグローバルな情報を集めるにつれ、FedCAdaは適応パラメータへの影響を徐々に減少させていく。これらの知見は、アルゴリズム改善の堅牢性と効率を高めるための洞察を与える。コンピュータビジョン(CV)と自然言語処理(NLP)データセットに関する広範な実験を通じて、FedCAdaは適応性、収束性、安定性、全体的なパフォーマンスにおいて最先端の手法よりも優れていることを示した。この研究は、連合学習のための適応アルゴリズムに寄与し、さらなる探索を奨励する。 Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients while preserving data privacy. However, the quest to balance acceleration and stability becomes a significant challenge in FL, especially on the client-side. In this paper, we introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge. FedCAda leverages the Adam algorithm to adjust the correction process of the first moment estimate $m$ and the second moment estimate $v$ on the client-side and aggregate adaptive algorithm parameters on the server-side, aiming to accelerate convergence speed and communication efficiency while ensuring stability and performance. Additionally, we investigate several algorithms incorporating different adjustment functions. 
This comparative analysis revealed that due to the limited information contained within client models from other clients during the initial stages of federated learning, more substantial constraints need to be imposed on the parameters of the adaptive algorithm. As federated learning progresses and clients gather more global information, FedCAda gradually diminishes the impact on adaptive parameters. These findings provide insights for enhancing the robustness and efficiency of algorithmic improvements. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning, encouraging further exploration.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
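The FedCAda abstract above refers to the "correction process" of Adam's first moment estimate $m$ and second moment estimate $v$. As a point of reference, the following is a minimal sketch of the standard Adam update with its usual bias correction, written for a single scalar parameter. It is not the FedCAda algorithm: the adjustment functions that FedCAda applies to this correction are not specified in the abstract, so none are shown here, and the function name `adam_step` and its default hyperparameters are illustrative assumptions.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step for a scalar parameter (not FedCAda itself).

    t is the 1-based step counter used for bias correction.
    Returns the updated (param, m, v).
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: compensates for m and v being initialised at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

On the first step the corrected estimates reduce to the gradient and its square, so the parameter moves by roughly `lr` in the direction opposite to the gradient, regardless of the gradient's magnitude. This is the part of the update that a client-side adjustment of the correction would change.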
# 非線形リンドブラッドマスター方程式とポストセレクション表皮効果 Nonlinear Lindblad Master Equation and Postselected Skin Effect ( http://arxiv.org/abs/2405.11812v1 ) ライセンス: Link先を確認	Yu-Guo Liu, Shu Chen,	(参考訳) リンドブラッドマスター方程式で記述される開量子系のポストセレクションダイナミクスを記述するために、非線形リンドブラッドマスター方程式を導入する。この方程式は、リンドブラッドマスター方程式と有効非エルミート・ハミルトニアンによって支配される動的方程式とを連続的に補間する。非線形リンドブラッドマスター方程式の枠組みの中で原型的なモデルを調べ、いくつかの量子ジャンプ項がポストセレクション過程によって破棄される限り、片側への粒子の蓄積を特徴とする定常状態分布を伴うポストセレクション表皮効果が存在することを実証する。さらに、軌道平均エンタングルメントエントロピーが環境とポストセレクションの異なる影響を反映しうることを示し、ポストセレクション表皮効果によって、短い鎖では代数的成長を、長い鎖では飽和を示す特殊な分布となることを明らかにする。 We introduce a non-linear Lindblad master equation to describe the postselection dynamics of open quantum systems described by the Lindblad master equation, which continuously interpolates between the Lindblad master equation and the dynamical equation governed by an effective non-Hermitian Hamiltonian. Within the framework of the non-linear Lindblad master equation, we study a prototypical model and demonstrate the existence of the postselected skin effect with the distribution of a steady state characterized by the accumulation of particles on one side, as long as some quantum jump terms are discarded by postselection processes. Moreover, we show that the trajectory-averaged entanglement entropy can reflect the different influences from the environment and postselection, and reveal that it exhibits a special distribution with algebraic growth in the short chain and saturation in the long chain induced by the postselected skin effect.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# ナスカ世界遺産に対する気候および人為的ハザード:リモートセンシング、AI、洪水モデルの適用 Climatic & Anthropogenic Hazards to the Nasca World Heritage: Application of Remote Sensing, AI, and Flood Modelling ( http://arxiv.org/abs/2405.11814v1 ) ライセンス: Link先を確認	Masato Sakai, Marcus Freitag, Akihisa Sakurai, Conrad M Albrecht, Hendrik F Hamann,	(参考訳) ペルーのユネスコ世界遺産にあるナスカの地上絵の保存は、自然と人間の影響が加速するにつれて急務となっている。鉄砲水のような、より頻発する極端な気象は、ナスカの人工物を脅かす。我々は、(サブ)メートルスケールのLiDAR由来のデジタル標高データに基づく流出モデルにより、浸食の危険があるAI検出の地上絵をハイライトできることを実証する。我々は、パンアメリカン・ハイウェイの近くにある、あるいはそれによって分断されている有名な「トカゲ」「木」「手」の地上絵を守るための緩和策を推奨する。 Preservation of the Nasca geoglyphs at the UNESCO World Heritage Site in Peru is urgent as natural and human impact accelerates. More frequent weather extremes such as flash floods threaten Nasca artifacts. We demonstrate that runoff models based on (sub-)meter scale, LiDAR-derived digital elevation data can highlight AI-detected geoglyphs that are in danger of erosion. We recommend measures of mitigation to protect the famous "lizard", "tree", and "hand" geoglyphs located close by, or even cut by the Pan-American Highway.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# マルチ環境メタラーニングによるCSIに基づく位置決めのための移動学習 Transfer Learning for CSI-based Positioning with Multi-environment Meta-learning ( http://arxiv.org/abs/2405.11816v1 ) ライセンス: Link先を確認	Anastasios Foliadis, Mario H. Castañeda, Richard A. Stirling-Gallacher, Reiner S. Thomä,	(参考訳) チャネル状態情報(CSI)指紋によるユーザ機器(UE)の無線位置決めのための深層学習(DL)技術の利用は大きな可能性を秘めている。 DLモデルは、特定の環境のCSI指紋から複雑な特徴を抽出し、UEの位置を正確に予測することができる。それでも、CSIフィンガープリントでトレーニングされたDLモデルの有効性は、特定のトレーニング環境に大きく依存しており、異なる環境におけるトレーニングされたモデルの適用性を制限している。本稿では,2つの部分からなる新しいDLモデル構造を提案する。第1部は特定の環境から独立な特徴を特定することを目的としており,第2部は環境特異的な特徴と位置決めの目的を組み合わせている。このような2部構成のモデルをトレーニングするために,第1部ではマルチ環境メタラーニング(MEML)アプローチを提案し,第2部では特定の環境のデータのみに基づいてトレーニングを行う。その結果,新しい未確認環境におけるDLモデルの重み付けを初期化するためのMEML手法を用いることで,新たなターゲット環境におけるUE位置決めの精度が向上し,不確実性評価の信頼性が向上することが示唆された。本手法は、環境間の直接移動学習(DTL)や、新しい環境からのデータをゼロから完全に学習するなど、従来の移動学習方法よりも優れている。提案手法は,LOS(Line-of-sight)環境とNLOS(Non-LOS)環境での実測値を用いて検証する。 Utilizing deep learning (DL) techniques for radio-based positioning of user equipment (UE) through channel state information (CSI) fingerprints has demonstrated significant potential. DL models can extract complex characteristics from the CSI fingerprints of a particular environment and accurately predict the position of a UE. Nonetheless, the effectiveness of the DL model trained on CSI fingerprints is highly dependent on the particular training environment, limiting the trained model's applicability across different environments. This paper proposes a novel DL model structure consisting of two parts, where the first part aims at identifying features that are independent from any specific environment, while the second part combines those features in an environment specific way with the goal of positioning. To train such a two-part model, we propose the multi-environment meta-learning (MEML) approach for the first part to facilitate training across various environments, while the second part of the model is trained solely on data from a specific environment. 
Our findings indicate that employing the MEML approach for initializing the weights of the DL model for a new unseen environment significantly boosts the accuracy of UE positioning in the new target environment as well as the reliability of its uncertainty estimation. This method outperforms traditional transfer learning methods, whether direct transfer learning (DTL) between environments or completely training from scratch with data from a new environment. The proposed approach is verified with real measurements for both line-of-sight (LOS) and non-LOS (NLOS) environments.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# ChatGPTを用いた医療システム工学の体系的検討 Systematic Review on Healthcare Systems Engineering utilizing ChatGPT ( http://arxiv.org/abs/2405.11817v1 ) ライセンス: Link先を確認	Jungwoo Kim, Ji-Su Lee, Huijae Kim, Taesik Lee,	(参考訳) 本稿では、最近の言語モデルにおける最先端のツールであるChatGPTを用いて、医療システム工学の分野における学術的レビューを行うための分析フレームワークを提案する。講演の要約文9,809節を用いて,分野を体系的に検討した。このフレームワークは、それぞれがカスタマイズされたプロンプトとChatGPT APIの体系的な使用を使用して、異なる分析プロセスで構成されている。この枠組みを通じて,対象分野を11のトピックカテゴリに分類し,年次傾向と詳細なサブカテゴリを包括的に分析した。この取り組みは、ChatGPTを活用して学術レビューの負担を軽減する可能性を探るものである。さらに、医療システム工学研究のダイナミックな景観に関する貴重な洞察を提供する。 This paper presents an analytical framework for conducting academic reviews in the field of Healthcare Systems Engineering, employing ChatGPT, a state-of-the-art tool among recent language models. We utilized 9,809 abstract paragraphs from conference presentations to systematically review the field. The framework comprises distinct analytical processes, each employing tailored prompts and the systematic use of the ChatGPT API. Through this framework, we organized the target field into 11 topic categories and conducted a comprehensive analysis covering quantitative yearly trends and detailed sub-categories. This effort explores the potential for leveraging ChatGPT to alleviate the burden of academic reviews. Furthermore, it provides valuable insights into the dynamic landscape of Healthcare Systems Engineering research.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# Beyond MLE:低リソースニューラル機械翻訳のためのSEARNNの調査 Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation ( http://arxiv.org/abs/2405.11819v1 ) ライセンス: Link先を確認	Chris Emezue,	(参考訳) 機械翻訳のような構造化予測タスクには、構造化された入力を構造化された出力にマッピングする関数の学習が含まれる。リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)は、自然言語処理(NLP)アプリケーションを含め、歴史的にそのようなタスクで広く使われてきた。しかし、MLE(Maximum Likelihood Estimation)を使用したRNNのトレーニングには、露出バイアスやトレーニングとテストのメトリクスのミスマッチなどの制限がある。 SEARNNは、L2S(Learning to Search)フレームワークに基づいて、MLEに代わるRNNトレーニング手法として提案されている。このプロジェクトでは、低リソースのアフリカの言語に対する機械翻訳を改善するSEARNNの可能性について検討した。これは、利用可能な訓練データの少なさと言語の形態的複雑さを特徴とする困難な課題である。英語からイボ語、フランス語からエウェ語、フランス語からゴマラ語への翻訳実験を通じて、このプロジェクトはこれらの言語がもたらす固有の課題に対処する上で、MLEに対するSEARNNの有効性を評価した。 MLE の目標に対して平均 BLEU スコアが 5.4% 改善されたことから、SEARNN は低リソース言語に対する機械翻訳において RNN を効果的に訓練するための有効なアルゴリズムであることが示された。 Structured prediction tasks, like machine translation, involve learning functions that map structured inputs to structured outputs. Recurrent Neural Networks (RNNs) have historically been a popular choice for such tasks, including in natural language processing (NLP) applications. However, training RNNs using Maximum Likelihood Estimation (MLE) has its limitations, including exposure bias and a mismatch between training and testing metrics. SEARNN, based on the learning to search (L2S) framework, has been proposed as an alternative to MLE for RNN training. This project explored the potential of SEARNN to improve machine translation for low-resourced African languages -- a challenging task characterized by limited training data availability and the morphological complexity of the languages. Through experiments conducted on translation for English to Igbo, French to Ewe, and French to Ghomala directions, this project evaluated the efficacy of SEARNN over MLE in addressing the unique challenges posed by these languages. 
With an average BLEU score improvement of 5.4% over the MLE objective, we proved that SEARNN is indeed a viable algorithm to effectively train RNNs on machine translation for low-resourced languages.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# 機械学習による藻と木材の共熱分解における相乗効果の3相解析 A Three-Phase Analysis of Synergistic Effects During Co-pyrolysis of Algae and Wood for Biochar Yield Using Machine Learning ( http://arxiv.org/abs/2405.11821v1 ) ライセンス: Link先を確認	Subhadeep Chakrabarti, Saish Shinde,	(参考訳) 熱分解技術は、プラスチック、木材、作物の残留物、果物の皮など、天然および人工のバイオマス製品を効果的に利用するための画期的な技術である。近年の進歩は、異なるバイオマスを一定の割合で混合することにより、バイオチャー、バイオオイル、その他の非凝縮性ガスのような必須生成物の収率が高まることを示している。本論文では、2つの熱分解原料を組み合わせる相乗効果、すなわち藻類と木質バイオマスの共熱分解を系統的に研究し、3つの相に分類した。すなわち、共熱分解の速度論的解析、工業分析および元素分析とバイオチャー収率との相関、そしてバイオチャー収率が一定の割合に達するかどうかに基づく重量比のグループ化である。 ML と DL の異なるアルゴリズムを回帰と分類に利用し、2 つの異なるバイオマスの相乗効果がバイオチャー収率に与える影響を包括的に概観した。第1相では、MSEスコア0.00の決定木回帰器でバイオチャー収率の最良の予測が得られ、勾配ブースティング回帰器がそれに続いた。第2相はML法とDL法の両方を用いて解析した。 MLの中では精度スコア0.972のSVRが最も便利なモデルであることが判明し、ディープラーニング技術としてはDNNを用いた。最後に、第3相では、バイオチャー収率が40%を上回るか下回るかについて、加熱速度を含む場合と含まない場合の2値分類を適用した。 MLの最良のテクニックはSupport Vectorで、次にRandom forestが続き、ANNが最も適したディープラーニング技術だった。 Pyrolysis has proven to be a groundbreaking technique for effectively utilising natural and man-made biomass products like plastics, wood, crop residue, fruit peels etc. Recent advancements have shown a greater yield of essential products like biochar, bio-oil and other non-condensable gases by blending different biomasses in a certain ratio. This synergy effect of combining two pyrolytic raw materials, i.e., co-pyrolysis of algae and wood biomass, has been systematically studied and grouped into 3 phases in this research paper: kinetic analysis of co-pyrolysis, correlation among proximate and ultimate analysis with bio-char yield, and lastly grouping of different weight ratios based on biochar yield up to a certain percentage. Different ML and DL algorithms have been utilized for regression and classification techniques to give a comprehensive overview of the effect of the synergy of two different biomass materials on biochar yield. For the first phase, the best prediction of biochar yield was obtained by using a decision tree regressor with a perfect MSE score of 0.00, followed by a gradient-boosting regressor. 
The second phase was analyzed using both ML and DL techniques. Within ML, SVR proved to be the most convenient model, with an accuracy score of 0.972, while a DNN was employed as the deep learning technique. Finally, for the third phase, binary classification was applied to biochar yield, with and without heating rate, for biochar yield percentages above and below 40%. The best ML technique was Support Vector followed by Random forest, while an ANN was the most suitable deep learning technique.	翻訳日:2024-05-21 14:03:49 公開日:2024-05-20
# FeTT:特徴変換チューニングによる連続的な授業インクリメンタルラーニング FeTT: Continual Class Incremental Learning via Feature Transformation Tuning ( http://arxiv.org/abs/2405.11822v1 ) ライセンス: Link先を確認	Sunyuan Qiang, Xuxin Lin, Yanyan Liang, Jun Wan, Du Zhang,	(参考訳) 継続学習(CL)は、静的で囲われた環境から動的で複雑なシナリオへ、より深いモデルを拡張することを目的としており、システムは、以前に学習した知識を忘れずに、新しいカテゴリの新しい知識を継続的に取得できる。最近のCLモデルは、パラメータ効率の細かい調整(PEFT)戦略による事前学習モデル(PTM)の利用に徐々に移行している。しかし, 連続的な微調整は, 従来のタスクデータが欠如していることから, 破滅的な忘れ込みが深刻な課題となっている。さらに、ファインチューン・テン・フリーズ機構は、最初のCLタスクにおける特徴チャネルの抑制と不十分なトレーニングデータによる性能制限に悩まされる。そこで本研究では, CLトレーニングデータから独立して動作するだけでなく, 過剰な抑制を防止するために, 特徴チャネルを円滑にする機能変換チューニング(FeTT)モデルを提案する。そして,FeTTモデルに異なるPTMを組み込んだ拡張アンサンブル戦略により,さらなる性能向上が期待できる。さらに, クラス境界分布と特徴チャネルの相違点の観点から, ファインチューン・テン・フリーズパラダイムとFeTTモデルの議論について詳しく述べる。 CLベンチマークの大規模な実験により,提案手法の有効性が検証された。 Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios, enabling systems to continuously acquire new knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards the utilization of pre-trained models (PTMs) with parameter-efficient fine-tuning (PEFT) strategies. However, continual fine-tuning still presents a serious challenge of catastrophic forgetting due to the absence of previous task data. Additionally, the fine-tune-then-frozen mechanism suffers from performance limitations due to feature channels suppression and insufficient training data in the first CL task. To this end, this paper proposes feature transformation tuning (FeTT) model to non-parametrically fine-tune backbone features across all tasks, which not only operates independently of CL training data but also smooths feature channels to prevent excessive suppression. Then, the extended ensemble strategy incorporating different PTMs with FeTT model facilitates further performance improvement. 
We further elaborate on the discussions of the fine-tune-then-frozen paradigm and the FeTT model from the perspectives of discrepancy in class marginal distributions and feature channels. Extensive experiments on CL benchmarks validate the effectiveness of our proposed method.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# ライトフィールド動画再構成のためのdpMVからデュアルピクセルへのステレオ知識蒸留 Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction ( http://arxiv.org/abs/2405.11823v1 ) ライセンス: Link先を確認	Aryan Garg, Raghav Mallampali, Akshat Joshi, Shrisudhan Govindarajan, Kaushik Mitra,	(参考訳) デュアルピクセルは、デフォーカスぼけから生じる視差の手がかりを含む。この視差情報は、自動運転から3Dクリエイティブリアリズムまで、多くのビジョンタスクに役立つ。しかし、デュアルピクセルから視差を直接推定すると精度が低い。本研究は、高精度なステレオのダークナレッジを、暗黙的または明示的に、効率の良いデュアルピクセルの生徒ネットワークに蒸留することで、忠実な再構成が可能になるという仮説を立てる。このダークナレッジ蒸留は、パラメータ効率と推論時間効率を劇的に高めつつ、ステレオ同期のセットアップとキャリブレーションのコストも軽減するはずである。明示的なダークナレッジ蒸留の仮説を検証するため、最初で最大の3ビュー・デュアルピクセル動画データセットdpMVを収集した。これらの手法は、特に前景と背景の分離が難しい領域において、デュアルピクセルからの忠実なガイダンスを用いることで、純粋な単眼手法よりも優れていることを示す。最後に、dpMVと、教師のアンサンブルからの暗黙的なダークナレッジ蒸留によって可能になる、ライトフィールド(LF)動画再構成という型破りなユースケースを示す。我々のLF動画再構成法は、現在までで最も高速かつ時間的一貫性が高い。高いパラメータ効率、暗黙的なディスオクルージョン処理、ゼロショットのクロスデータセット転移、より高い空間・角度解像度での幾何的に一貫した推論、適応的なベースライン制御など、多くの重要な特性を提供しつつ、再構成の忠実度でも競争力を保つ。すべてのソースコードは匿名リポジトリ https://github.com/Aryan-Garg で入手できる。 Dual pixels contain disparity cues arising from the defocus blur. This disparity information is useful for many vision tasks ranging from autonomous driving to 3D creative realism. However, directly estimating disparity from dual pixels is less accurate. This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks enables faithful reconstructions. This dark knowledge distillation should also alleviate stereo-synchronization setup and calibration costs while dramatically increasing parameter and inference time efficiency. We collect the first and largest 3-view dual-pixel video dataset, dpMV, to validate our explicit dark knowledge distillation hypothesis. We show that these methods outperform purely monocular solutions, especially in challenging foreground-background separation regions using faithful guidance from dual pixels. 
Finally, we demonstrate an unconventional use case unlocked by dpMV and implicit dark knowledge distillation from an ensemble of teachers for Light Field (LF) video reconstruction. Our LF video reconstruction method is the fastest and most temporally consistent to date. It remains competitive in reconstruction fidelity while offering many other essential properties like high parameter efficiency, implicit disocclusion handling, zero-shot cross-dataset transfer, geometrically consistent inference on higher spatial-angular resolutions, and adaptive baseline control. All source code is available at the anonymous repository https://github.com/Aryan-Garg.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# 開量子系におけるフォン・ノイマンエントロピーの時間発展 Time evolution of the von Neumann entropy in open quantum system ( http://arxiv.org/abs/2405.11824v1 ) ライセンス: Link先を確認	Kohei Kobayashi,	(参考訳) オープン量子力学の制御は量子技術の実現に大きな関心を持つ。したがって、デコヒーレンスの下でのオープン量子系のエントロピーを定量化し、特徴付けることは重要なタスクである。本稿では、リンドブラッドマスター方程式によって記述された開量子系に対するフォン・ノイマンエントロピーの時間発展について研究する。特に、デコヒーレンスが系の可観測量の測定に対応するとき、フォン・ノイマンエントロピーは分散が大きくなるにつれて単調に増加する傾向があることに注意されたい。さらに、長時間極限におけるフォン・ノイマンエントロピーの下界を示す。この下界は直接計算でき、一般的なマルコフ開量子系に適用できるという利点がある。 Control of open quantum dynamics is of great interest for realizing quantum technologies. Therefore, it is an important task to quantify and characterize the entropy for open quantum systems under decoherence. In this paper, we study the time evolution of the von Neumann entropy for open quantum systems described by the Lindblad master equation. Note that, in particular, when the decoherence corresponds to the measurement for the observable in the system, the von Neumann entropy tends to increase monotonically as the variance becomes larger. Furthermore, we present a lower bound of the von Neumann entropy in the long-time limit. This lower bound has advantages of being straightforwardly calculated and applicable to a general Markovian open quantum system.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
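To make the setting of the entry above concrete, here is a minimal sketch that integrates the Lindblad master equation $\dot\rho = -i[H,\rho] + \gamma\,(L\rho L^\dagger - \tfrac{1}{2}\{L^\dagger L,\rho\})$ for a single qubit and tracks the von Neumann entropy $S(\rho) = -\mathrm{Tr}(\rho\ln\rho)$. It only illustrates the general fact that a Hermitian Lindblad operator (a measured observable) cannot decrease the entropy; it does not reproduce the paper's lower bound or its variance analysis. The choice of a qubit, the fourth-order Runge-Kutta integrator, and all function names are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[a[i][j] + scale * b[i][j] for j in range(2)] for i in range(2)]


def lindblad_rhs(rho, h, l_op, gamma):
    """Right-hand side -i[H, rho] + gamma (L rho L^dag - 1/2 {L^dag L, rho})."""
    l_dag = dagger(l_op)
    ldl = matmul(l_dag, l_op)
    h_rho, rho_h = matmul(h, rho), matmul(rho, h)
    jump = matmul(matmul(l_op, rho), l_dag)
    anti = add(matmul(ldl, rho), matmul(rho, ldl))
    return [[-1j * (h_rho[i][j] - rho_h[i][j])
             + gamma * (jump[i][j] - 0.5 * anti[i][j])
             for j in range(2)] for i in range(2)]


def von_neumann_entropy(rho):
    """S = -sum(lambda ln lambda), eigenvalues from the 2x2 trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    entropy = 0.0
    for lam in ((tr + disc) / 2.0, (tr - disc) / 2.0):
        if lam > 1e-12:  # treat numerically zero eigenvalues as 0 ln 0 = 0
            entropy -= lam * math.log(lam)
    return entropy


def entropy_trajectory(rho0, h, l_op, gamma, dt, steps):
    """Integrate with classical RK4 and return the entropy after every step."""
    rho = [[complex(x) for x in row] for row in rho0]
    out = [von_neumann_entropy(rho)]
    for _ in range(steps):
        k1 = lindblad_rhs(rho, h, l_op, gamma)
        k2 = lindblad_rhs(add(rho, k1, dt / 2), h, l_op, gamma)
        k3 = lindblad_rhs(add(rho, k2, dt / 2), h, l_op, gamma)
        k4 = lindblad_rhs(add(rho, k3, dt), h, l_op, gamma)
        for i in range(2):
            for j in range(2):
                rho[i][j] += dt / 6 * (k1[i][j] + 2 * k2[i][j]
                                       + 2 * k3[i][j] + k4[i][j])
        out.append(von_neumann_entropy(rho))
    return out
```

For example, with $H = 0$, $L = \sigma_z$ and the initial state $|+\rangle\langle+|$, the coherences decay as $e^{-2\gamma t}$, so the entropy rises from 0 towards $\ln 2$.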
# AIベースの競争プラットフォームにおける技術的負債の測定 Measuring Technical Debt in AI-Based Competition Platforms ( http://arxiv.org/abs/2405.11825v1 ) ライセンス: Link先を確認	Dionysios Sklavenitis, Dimitris Kalles,	(参考訳) AIの進歩は、ソフトウェアエンジニアリングプロジェクトにおける新しいタイプの技術的負債につながった。 AIベースの競争プラットフォームは、迅速なプロトタイピングと、参加者によるソフトウェアエンジニアリング原則の遵守の欠如により、技術的負債が発生しているため、課題に直面している。さらに、オーガナイザはプラットフォームの品質を評価する方法がなく、持続可能性や保守性に影響を与えます。本研究では,スクーピングレビューを通じて,AIシステムにおける技術的負債の種類を特定し,分類する。我々は,AIコンペティションプラットフォームにおける技術的負債の評価,アルゴリズム,アーキテクチャ,コード,構成,データなど,さまざまなタイプの負債を分類するアンケートを開発する。 AIコンペティションプラットフォームに特化したアクセシビリティ負債を導入し、不適切なプラットフォームのユーザビリティのために参加者が直面する課題を強調します。技術的負債を管理するためのフレームワークは、これらのプラットフォームの持続可能性と有効性を改善し、研究者、オーガナイザ、参加者にツールを提供することを目的としています。 Advances in AI have led to new types of technical debt in software engineering projects. AI-based competition platforms face challenges due to rapid prototyping and a lack of adherence to software engineering principles by participants, resulting in technical debt. Additionally, organizers often lack methods to evaluate platform quality, impacting sustainability and maintainability. In this research, we identify and categorize types of technical debt in AI systems through a scoping review. We develop a questionnaire for assessing technical debt in AI competition platforms, categorizing debt into various types, such as algorithm, architectural, code, configuration, data etc. We introduce Accessibility Debt, specific to AI competition platforms, highlighting challenges participants face due to inadequate platform usability. Our framework for managing technical debt aims to improve the sustainability and effectiveness of these platforms, providing tools for researchers, organizers, and participants.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# 不完全なセンシングモダリティによるフェデレーション学習 Federated Learning with Incomplete Sensing Modalities ( http://arxiv.org/abs/2405.11828v1 ) ライセンス: Link先を確認	Adiba Orzikulova, Jaehyun Kwak, Jaemin Shin, Sung-Ju Lee,	(参考訳) 多くのモバイルセンシングアプリケーションは、モバイルおよびウェアラブルデバイスにおけるモーションや生理的センサーなど、さまざまなモダリティのデータを活用している。フェデレートラーニング(FL)は、プライバシー保護機能のおかげで、これらのアプリケーションに特に適しています。しかし、バッテリ寿命の制限、ネットワーク条件の低さ、センサーの故障といった課題は、ローカルモデルトレーニングで利用可能なすべてのモダリティの使用を制限する可能性がある。さらに、既存のマルチモーダルFLシステムは、モダリティ源の数が増えるにつれてスケーラビリティと効率性にも苦慮している。これらの問題に対処するため,不完全なマルチモーダルFLを実現するためのフレームワークであるFLISMを紹介する。 FLISMはシミュレーション手法を利用して、欠落したモダリティを処理できる堅牢な表現を学習し、様々なモダリティセットを持つクライアント間でモデル知識を転送する。 3つの実世界のデータセットとシミュレーションによる評価結果は、FLISMのモデル性能とシステム効率の効果的なバランスを示す。 F1スコアでの.067の平均的な改善に加えて、通信(2.69倍高速)と計算(2.28倍高速)のオーバーヘッドを、不完全なモダリティに対処する既存の方法と比較して削減している。さらに、多くのモダリティを持つタスクを含むシミュレーションシナリオでは、FLISMは通信速度が3.23x~85.10x、計算効率が3.73x~32.29xである。 Many mobile sensing applications utilize data from various modalities, including motion and physiological sensors in mobile and wearable devices. Federated Learning (FL) is particularly suitable for these applications thanks to its privacy-preserving feature. However, challenges such as limited battery life, poor network conditions, and sensor malfunctions can restrict the use of all available modalities for local model training. Additionally, existing multimodal FL systems also struggle with scalability and efficiency as the number of modality sources increases. To address these issues, we introduce FLISM, a framework designed to enable multimodal FL with incomplete modalities. FLISM leverages simulation technique to learn robust representations that can handle missing modalities and transfers model knowledge across clients with varying set of modalities. The evaluation results using three real-world datasets and simulations demonstrate FLISM's effective balance between model performance and system efficiency. 
It shows an average improvement of 0.067 in F1-score, while also reducing communication (2.69x faster) and computational (2.28x more efficient) overheads compared to existing methods addressing incomplete modalities. Moreover, in simulated scenarios involving tasks with a larger number of modalities, FLISM achieves a significant speedup of 3.23x to 85.10x in communication and 3.73x to 32.29x in computational efficiency.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# Adversarially Diversified Rehearsal Memory (ADRM):連続学習におけるメモリ過適合課題の緩和 Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning ( http://arxiv.org/abs/2405.11829v1 ) ライセンス: Link先を確認	Hikmat Khan, Ghulam Rasool, Nidhal Carla Bouaynaya,	(参考訳) 継続的な学習は、それまでの知識を忘れずに、非定常なデータ分布を学習することに焦点を当てる。リハーサルベースのアプローチは、破滅的な忘れに対処するために一般的に使用される。しかし、これらのアプローチは「リハーサルメモリオーバーフィット」と呼ばれる問題に悩まされ、モデルが限られたメモリサンプルに過度に特化し、効果的に一般化する能力を失う。その結果、リハーサル記憶の有効性は徐々に低下し、最終的には学習したタスクを破滅的に忘れてしまう。本稿では、メモリ過適合問題に対処するため、ADRM(Adversarially Diversified Rehearsal Memory)を導入する。本手法は, 自然および逆方向のノイズ破壊に対して, メモリサンプルの多様性を増進し, 耐性を高めるために設計されている。 ADRMはFGSM攻撃を使用して、逆修正されたメモリサンプルを導入し、メモリの多様性の向上と、メモリサンプルにおける連続的な機能ドリフトに対する堅牢な応答の促進という2つの主要な目的を達成する。第一に、ADRMはFGSMを用いてメモリバッファの複雑さを多様化し増大させることで、リハーサルメモリの過適合に対処する。第2に、ADRMはメモリ過適合を緩和し、安全クリティカルなアプリケーションに欠かせないCLモデルの堅牢性を著しく改善することを示した。最後に,ADRMがCLメモリサンプルのドリフトを緩和し,破滅的忘れを著しく低減し,より弾力性のあるCLモデルが得られることを示す。さらに,特徴分布の詳細なt-SNE可視化と特徴類似性の定量化により,既存のCLアプローチにおける特徴表現の理解を深めることができた。私たちのコードは https://github.com/hikmatkhan/ADRM で公開されています。 Continual learning focuses on learning non-stationary data distribution without forgetting previous knowledge. Rehearsal-based approaches are commonly used to combat catastrophic forgetting. However, these approaches suffer from a problem called "rehearsal memory overfitting," where the model becomes too specialized on limited memory samples and loses its ability to generalize effectively. As a result, the effectiveness of the rehearsal memory progressively decays, ultimately resulting in catastrophic forgetting of the learned tasks. We introduce the Adversarially Diversified Rehearsal Memory (ADRM) to address the memory overfitting challenge. This novel method is designed to enrich memory sample diversity and bolster resistance against natural and adversarial noise disruptions.
ADRM employs FGSM attacks to introduce adversarially modified memory samples, achieving two primary objectives: enhancing memory diversity and fostering a robust response to continual feature drifts in memory samples. Our contributions are as follows: Firstly, ADRM addresses overfitting in rehearsal memory by employing FGSM to diversify and increase the complexity of the memory buffer. Secondly, we demonstrate that ADRM mitigates memory overfitting and significantly improves the robustness of CL models, which is crucial for safety-critical applications. Finally, our detailed analysis of features and visualization demonstrates that ADRM mitigates feature drifts in CL memory samples, significantly reducing catastrophic forgetting and resulting in a more resilient CL model. Additionally, our in-depth t-SNE visualizations of feature distribution and the quantification of the feature similarity further enrich our understanding of feature representation in existing CL approaches. Our code is publicly available at https://github.com/hikmatkhan/ADRM.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
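The ADRM abstract above relies on FGSM (the Fast Gradient Sign Method) to perturb rehearsal-memory samples. As a rough illustration only, the sketch below applies the standard FGSM step, x_adv = x + eps * sign(dL/dx), to a toy logistic-regression model in pure Python. The model, the helper `diversify_memory`, and all parameter values are assumptions made for this example; the abstract does not describe the authors' actual implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    # For logistic regression, dL/dz = p - y and dz/dx = w.
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: move each input coordinate by eps in the loss-increasing direction."""
    _, grad_x = loss_and_input_grad(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad_x)]


def diversify_memory(memory, w, b, eps):
    """Hypothetical helper: keep every stored sample and add one FGSM-perturbed copy of it."""
    return memory + [(fgsm(w, b, x, y, eps), y) for x, y in memory]
```

Each perturbed copy stays within an L-infinity distance of `eps` from its original sample, so the added samples remain close to the stored data while raising the model's loss on them.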
# SSAMBA:マンバ状態空間モデルによる自己監督型音声表現学習 SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ( http://arxiv.org/abs/2405.11831v1 ) ライセンス: Link先を確認	Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani,	(参考訳) トランスフォーマーは、強力なモデリング能力のために、音声表現学習を含む様々なタスクにわたってディープラーニングに革命をもたらした。しかし、GPUメモリ使用量と計算推論時間の両方で2次複雑さに悩まされ、その効率に影響を及ぼすことが多い。最近、Mambaのような状態空間モデル(SSM)は、これらの複雑さを回避してより効率的なアプローチを提供する、有望な代替手段として登場した。これらの利点を踏まえ、音声タスクにおけるSSMベースのモデルの可能性について検討する。本稿では,SSAMBA(Self-Supervised Audio Mamba)について紹介する。 SSAMBAは双方向のMambaを利用して複雑なオーディオパターンを効果的にキャプチャする。我々は、識別的および生成的目的の両方を最適化し、大規模でラベルなしのデータセットから堅牢な音声表現を学習できる自己教師付き事前学習フレームワークを組み込んだ。音声分類やキーワードスポッティング,話者識別など,様々なタスクにおけるSSAMBAの評価を行った。以上の結果から,SSAMBAはSSAST(Self-Supervised Audio Spectrogram Transformer)よりも優れていることがわかった。特に、SSAMBAはバッチ推論速度で約92.7%高速で、入力トークンサイズが22kの小さなモデルサイズではSSASTよりも95.4%メモリ効率が高い。これらの効率向上と優れたパフォーマンスが相まって、SSAMBAのアーキテクチャ革新の有効性を強調し、幅広いオーディオ処理アプリケーションにとって魅力的な選択となった。 Transformers have revolutionized deep learning across various tasks, including audio representation learning, due to their powerful modeling capabilities. However, they often suffer from quadratic complexity in both GPU memory usage and computational inference time, affecting their efficiency. Recently, state space models (SSMs) like Mamba have emerged as a promising alternative, offering a more efficient approach by avoiding these complexities. Given these advantages, we explore the potential of SSM-based models in audio tasks. In this paper, we introduce Self-Supervised Audio Mamba (SSAMBA), the first self-supervised, attention-free, and SSM-based model for audio representation learning. SSAMBA leverages the bidirectional Mamba to capture complex audio patterns effectively. We incorporate a self-supervised pretraining framework that optimizes both discriminative and generative objectives, enabling the model to learn robust audio representations from large-scale, unlabeled datasets. 
We evaluated SSAMBA on various tasks such as audio classification, keyword spotting, and speaker identification. Our results demonstrate that SSAMBA outperforms the Self-Supervised Audio Spectrogram Transformer (SSAST) in most tasks. Notably, SSAMBA is approximately 92.7% faster in batch inference speed and 95.4% more memory-efficient than SSAST for the tiny model size with an input token size of 22k. These efficiency gains, combined with superior performance, underscore the effectiveness of SSAMBA's architectural innovation, making it a compelling choice for a wide range of audio processing applications.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# トレーニング可能なサロゲートモデルへの非線形性の導入によるExplain-Any-Conceptの改善 Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model ( http://arxiv.org/abs/2405.11837v1 ) ライセンス: Link先を確認	Mounes Zaval, Sedat Ozer,	(参考訳) 説明可能なAI(XAI)の進化する分野では、コンピュータビジョンタスクにおけるディープニューラルネットワーク(DNN)の決定を解釈することが重要なプロセスである。ピクセルベースのXAIメソッドは重要なピクセルの識別に重点を置いているが、既存のコンセプトベースのXAIメソッドでは事前に定義された概念や人間による注釈が付けられた概念を用いている。最近提案されたSegment Anything Model (SAM)は、包括的なインスタンスセグメンテーションを通じて自動概念セットを作成するための大きな一歩を踏み出した。これに基づいて、DNN決定を説明するフレキシブルな方法として、EAC(Explain Any Concept)モデルが登場した。 EACモデルは、ターゲットモデルをシミュレートする訓練可能な1つの線形層を持つ代理モデルを用いている。本稿では,元のサロゲートモデルに新たな非線形層を導入することにより,EACモデルの性能を向上させることができることを示す。提案手法を元のEACモデルと比較し,ImageNetおよびMS COCOデータセットで得られた改善点を報告する。 In the evolving field of Explainable AI (XAI), interpreting the decisions of deep neural networks (DNNs) in computer vision tasks is an important process. While pixel-based XAI methods focus on identifying significant pixels, existing concept-based XAI methods use pre-defined or human-annotated concepts. The recently proposed Segment Anything Model (SAM) achieved a significant step forward to prepare automatic concept sets via comprehensive instance segmentation. Building upon this, the Explain Any Concept (EAC) model emerged as a flexible method for explaining DNN decisions. The EAC model is based on using a surrogate model which has one trainable linear layer to simulate the target model. In this paper, by introducing an additional nonlinear layer to the original surrogate model, we show that we can improve the performance of the EAC model. We compare our proposed approach to the original EAC model and report improvements obtained on both ImageNet and MS COCO datasets.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# ソーシャルインテリジェンスの評価とモデル化:人間とAIの能力の比較研究 Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities ( http://arxiv.org/abs/2405.11841v1 ) ライセンス: Link先を確認	Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan,	(参考訳) 大規模言語モデル(LLM)がほぼ人間に近い知能レベルを達成したかどうかに関する現在の議論(Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023)の中で、本研究では、人間の認知の最も特徴的な側面の一つである社会的知能を評価するためのベンチマークを紹介する。我々は,社会力学の総合的理論的枠組みを開発し,逆推論(IR)と逆逆計画(IIP)の2つの評価タスクを導入した。また,人間の多様な行動パターンの解明に長けた再帰的ベイズ推定に基づく計算モデルについても検討した。大規模な実験と詳細な分析により、人間は総合的なパフォーマンス、ゼロショット学習、ワンショット一般化、マルチモダリティへの適応性において最新のGPTモデルを上回ることが示された。特に、GPTモデルは、ヒトの社会的知能(オーダー>=2)とは対照的に、最も基本的な順序(オーダー=0)でのみ社会的知能を示す。さらなる調査は、LLMがショートカットとしてパターン認識に頼る傾向を示し、真の人間レベルの社会知能を有しているかどうかに疑念を投げかけた。私たちのコード、データセット、付録、人間のデータは https://github.com/bigai-ai/Evaluate-n-Model-Social-Intelligence で公開されています。 Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023), the current study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We developed a comprehensive theoretical framework for social dynamics and introduced two evaluation tasks: Inverse Reasoning (IR) and Inverse Inverse Planning (IIP). Our approach also encompassed a computational model based on recursive Bayesian inference, adept at elucidating diverse human behavioral patterns. Extensive experiments and detailed analyses revealed that humans surpassed the latest GPT models in overall performance, zero-shot learning, one-shot generalization, and adaptability to multi-modalities. Notably, GPT models demonstrated social intelligence only at the most basic order (order = 0), in stark contrast to human social intelligence (order >= 2).
Further examination indicated a propensity of LLMs to rely on pattern recognition for shortcuts, casting doubt on their possession of authentic human-level social intelligence. Our codes, dataset, appendix and human data are released at https://github.com/bigai-ai/Evaluate-n-Model-Social-Intelligence.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# EPPS:エッジ情報注入と選択的特徴分離による高度なポリプセグメンテーション EPPS: Advanced Polyp Segmentation via Edge Information Injection and Selective Feature Decoupling ( http://arxiv.org/abs/2405.11846v1 ) ライセンス: Link先を確認	Mengqi Lei, Xin Wang,	(参考訳) 大腸内視鏡検査におけるポリープの正確な分画は早期大腸癌の診断と管理に不可欠である。ポリプセグメンテーションの深層学習の進歩にもかかわらず、持続的な制限は持続する。ポリプのエッジは、典型的にはあいまいであり、背景から識別することが困難であり、モデルの性能は、無関係または重要でない特徴の影響によってしばしば損なわれる。これらの課題を軽減するために,我々はエッジ・プライオライト化ポリプ・セグメンテーション (EPPS) と呼ばれる新しいモデルを提案する。具体的には,ポリプのエッジを正確に抽出することを目的としたエッジマッピングエンジン(EME)を組み込んだ。その後、捕獲されたエッジ情報をデコーダブロックに注入することにより、マスク予測を強化するためにエッジ情報インジェクタ(EII)が考案される。さらに,選択的特徴分離器(Selective Feature Decoupler,SFD)と呼ばれるコンポーネントを導入し,モデルに対するノイズや外的特徴の影響を抑える。広範に使われている3つのポリプセグメンテーションベンチマークの大規模な実験は、他の最先端手法と比較して、我々の手法の優れた性能を示している。 Accurate segmentation of polyps in colonoscopy images is essential for early-stage diagnosis and management of colorectal cancer. Despite advancements in deep learning for polyp segmentation, enduring limitations persist. The edges of polyps are typically ambiguous, making them difficult to discern from the background, and the model performance is often compromised by the influence of irrelevant or unimportant features. To alleviate these challenges, we propose a novel model named Edge-Prioritized Polyp Segmentation (EPPS). Specifically, we incorporate an Edge Mapping Engine (EME) aimed at accurately extracting the edges of polyps. Subsequently, an Edge Information Injector (EII) is devised to augment the mask prediction by injecting the captured edge information into Decoder blocks. Furthermore, we introduce a component called Selective Feature Decoupler (SFD) to suppress the influence of noise and extraneous features on the model. Extensive experiments on 3 widely used polyp segmentation benchmarks demonstrate the superior performance of our method compared with other state-of-the-art approaches.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# シーケンスモデリングのためのオルタネータ Alternators For Sequence Modeling ( http://arxiv.org/abs/2405.11848v1 ) ライセンス: Link先を確認	Mohammad Reza Rezaei, Adji Bousso Dieng,	(参考訳) 本稿では、系列に対する非マルコフ力学モデルの新しいファミリであるオルタネータについて紹介する。オルタネータは、観測軌跡ネットワーク(OTN)と特徴軌跡ネットワーク(FTN)の2つのニューラルネットワークを備える。 OTNとFTNは共同で働き、1サイクルの中で、観測空間へのサンプル出力と特徴空間へのサンプル出力とを交互に行う。 OTNとFTNのパラメータは時間依存ではなく、軌道上の最小クロスエントロピー基準によって学習される。オルタネータは汎用性が高い。動的潜在変数生成モデルやシーケンス・ツー・シーケンス予測モデルとして使用できる。オルタネータを生成モデルとして使用すると、FTNは解釈可能な低次元潜在変数を生成し、観測を司るダイナミクスを捉える。オルタネータをシーケンス・ツー・シーケンス予測器として使用すると、FTNは観測された特徴を予測することを学習する。いずれの場合も、OTNはデータにマッチするシーケンスを生成することを学ぶ。オルタネータは、複雑なシーケンシャルデータの背後にある潜在的なダイナミクスを明らかにし、欠損データを正確に予測および補完し、新しいトラジェクトリをサンプリングすることができる。 3つのアプリケーションでオルタネータの能力を示す。私たちは最初に、カオス的な振る舞いを記述するためにしばしば使用されるローレンツ方程式をモデル化するために、オルタネータを使用した。次に、脳活動を身体活動にマッピングするために、オルタネータを神経科学に適用した。最後に, 海面温度予測に焦点をあてて, 気候科学にオルタネータを適用した。全ての実験において、オルタネータは訓練が安定であり、サンプリングが速く、高品質な生成サンプルと潜在変数が得られ、研究領域におけるニューラルODEや拡散モデルなどの強力なベースラインよりも優れていた。 This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. When alternators are used as generative models, the FTN produces interpretable low-dimensional latent variables that capture the dynamics governing the observations. When alternators are used as sequence-to-sequence predictors, the FTN learns to predict the observed features. In both cases, the OTN learns to produce sequences that match the data.
Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and outperform strong baselines such as neural ODEs and diffusion models in the domains we studied.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# 視覚言語モデルにおける見過ごされた側面の再考 Rethinking Overlooked Aspects in Vision-Language Models ( http://arxiv.org/abs/2405.11850v1 ) ライセンス: Link先を確認	Yuan Liu, Le Tian, Xiao Zhou, Jie Zhou,	(参考訳) GPT4-VやLLaVAのような大規模視覚言語モデル(LVLM)の最近の進歩は顕著である。 LLaVAのモジュラーアーキテクチャは、特に単純さと効率性をブレンドしている。最近の研究は、モデルの性能を向上させるために、事前学習と指導のチューニングデータの導入に重点を置いている。本稿では,事前学習におけるデータ効率の非無視的な側面と,訓練データセットの選択過程について述べる。我々の研究は、単に事前学習データのサイズを拡大するだけでは性能が向上せず、実際にその劣化につながる可能性を示唆している。さらに、我々は、SFTデータセットをピンポイントするパイプラインを構築し、既存の研究で活用されているすべてのSFTデータが必要ないことを示唆している。本論文の主な目的は,最先端モデルの導入ではなく,事前学習および微調整プロセスにおけるデータ使用量の最適化を目標とし,ビジョン言語モデルの性能向上を目的とした今後の研究のロードマップとして機能することである。 Recent advancements in large vision-language models (LVLMs), such as GPT4-V and LLaVA, have been substantial. LLaVA's modular architecture, in particular, offers a blend of simplicity and efficiency. Recent works mainly focus on introducing more pre-training and instruction tuning data to improve model's performance. This paper delves into the often-neglected aspects of data efficiency during pre-training and the selection process for instruction tuning datasets. Our research indicates that merely increasing the size of pre-training data does not guarantee improved performance and may, in fact, lead to its degradation. Furthermore, we have established a pipeline to pinpoint the most efficient instruction tuning (SFT) dataset, implying that not all SFT data utilized in existing studies are necessary. The primary objective of this paper is not to introduce a state-of-the-art model, but rather to serve as a roadmap for future research, aiming to optimize data usage during pre-training and fine-tuning processes to enhance the performance of vision-language models.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# ストーリーテリングの進化:拡散モデルを用いた新しいキャラクタカスタマイズのためのベンチマークと方法 Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models ( http://arxiv.org/abs/2405.11852v1 ) ライセンス: Link先を確認	Xiyu Wang, Yufei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot,	(参考訳) ストーリービジュアライゼーションのための拡散モデルでは、ストーリーテリングタスクのためのコンテンツコヒーレントな画像を生成することが期待できる。しかし、文字の一貫性を維持しつつ、新しい文字を既存の物語に効果的に統合する方法は、特に限られたデータでは未解決の問題である。 1)潜在的な文字リークと一貫性のないテキストラベリングによる適切なベンチマークがないこと,2)新しい文字と古い文字を区別することの難しさ,そして曖昧な結果をもたらすこと,である。これらの課題に対処するために、生成モデルの適応性を評価するために設計された改良データセットからなるNewEpisodeベンチマークを導入する。洗練されたデータセットは、洗練されたテキストプロンプトと文字のリークを除去する。さらに、生成した結果の文字混乱を軽減するために、新しい文字をシームレスに統合した単一ストーリーで拡散に基づくビジュアルストーリー生成モデルをカスタマイズする手法であるEpicEvoを提案する。 EpicEvoは、新たな逆キャラクタアライメントモジュールを導入し、生成した画像を拡散過程において段階的に整列させ、新しいキャラクタの模範的なイメージを付加するとともに、知識蒸留を適用して文字や背景の詳細の忘れを防止する。 EpicEvoはNewEpisodeベンチマークで既存のベースラインよりも優れており、定性的な研究により拡散モデルにおける視覚的ストーリー生成の優れたカスタマイズが確認されている。要約すると、EpicEvoは1つの例だけを使って新しいキャラクターを組み込む効果的な方法を提供する。 Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, how to effectively integrate new characters into existing narratives while maintaining character consistency remains an open problem, particularly with limited data. Two major limitations hinder the progress: (1) the absence of a suitable benchmark due to potential character leakage and inconsistent text labeling, and (2) the challenge of distinguishing between new and old characters, leading to ambiguous results. To address these challenges, we introduce the NewEpisode benchmark, comprising refined datasets designed to evaluate generative models' adaptability in generating new stories with fresh characters using just a single example story. The refined dataset involves refined text prompts and eliminates character leakage. 
Additionally, to mitigate the character confusion of generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters seamlessly integrating them into established character dynamics. EpicEvo introduces a novel adversarial character alignment module to align the generated images progressively in the diffusive process, with exemplar images of new characters, while applying knowledge distillation to prevent forgetting of characters and background details. Our evaluation quantitatively demonstrates that EpicEvo outperforms existing baselines on the NewEpisode benchmark, and qualitative studies confirm its superior customization of visual story generation in diffusion models. In summary, EpicEvo provides an effective way to incorporate new characters using only one example story, unlocking new possibilities for applications such as serialized cartoons.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# 再配列(realignment)に基づく量子エンタングルメントの分離可能性と下界 Separability and lower bounds of quantum entanglement based on realignment ( http://arxiv.org/abs/2405.11861v1 ) ライセンス: Link先を確認	Jiaxin Sun, Hongmei Yao, Shao-Ming Fei, Zhaobing Fan,	(参考訳) 量子エンタングルメントの検出と推定は、量子エンタングルメントの理論において重要な問題である。本研究では,密度行列の再配列と縮約密度行列のベクトル化に基づいて行列を構築し,二部系と多部系の双方に対する一連の分離可能性基準を提示する。さらに、コンカレンスおよび凸ルーフ拡張ネガティビティの新しい下界を導出する。真の三部エンタングルメントを検出するための基準も与えられる。真の三部エンタングルメントのコンカレンスの下界も提示される。詳細な例により、我々の結果は、量子エンタングルメントおよび真の多部エンタングルメントの同定と推定において、対応する既存の結果よりも優れていることを示す。 The detection and estimation of quantum entanglement are the essential issues in the theory of quantum entanglement. We construct matrices based on the realignment of density matrices and the vectorization of the reduced density matrices, from which a family of separability criteria are presented for both bipartite and multipartite systems. Moreover, new lower bounds of concurrence and convex-roof extended negativity are derived. Criteria are also given to detect the genuine tripartite entanglement. Lower bounds of the concurrence of genuine tripartite entanglement are presented. By detailed examples we show that our results are better than the corresponding ones in identifying and estimating quantum entanglement as well as genuine multipartite entanglement.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# SEMv3:テーブル分離線検出のための高速かつロバストなアプローチ SEMv3: A Fast and Robust Approach to Table Separation Line Detection ( http://arxiv.org/abs/2405.11862v1 ) ライセンス: Link先を確認	Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du,	(参考訳) テーブル構造認識(TSR)は、テーブル固有の構造を入力画像から解析することを目的としている。スプリット・アンド・マージ(split-and-merge)パラダイムは、テーブル構造を解析するための重要なアプローチであり、そこではテーブル分離線の検出が不可欠である。しかし、罫線のない表や変形した表などがこの検出を困難にしている。本稿では、スプリット・アンド・マージのパラダイムに従い、テーブル分離線を高速かつロバストに検出する手法SEMv3(SEM: Split, Embed and Merge)を提案する。分割段階ではキーポイントオフセット回帰(KOR)モジュールを導入し、キーポイント提案に対する各線のオフセットを直接回帰することでテーブル分離線を効果的に検出する。さらに、マージ段階では、テーブルグリッドに基づいたテーブル構造を効率的に記述するための一連のマージアクションを定義する。大規模なアブレーション実験により,提案するKORモジュールはテーブル分離線を迅速かつ正確に検出できることが示された。さらに、パブリックデータセット(例えばWTW、ICDAR-2019 cTDaR Historical、iFLYTAB)では、SEMv3は最先端(SOTA)のパフォーマンスを達成する。コードは https://github.com/Chunchunwumu/SEMv3 で公開されている。 Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image. The "split-and-merge" paradigm is a pivotal approach to parse table structure, where the table separation line detection is crucial. However, challenges such as wireless and deformed tables make it demanding. In this paper, we adhere to the "split-and-merge" paradigm and propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines. During the split stage, we introduce a Keypoint Offset Regression (KOR) module, which effectively detects table separation lines by directly regressing the offset of each line relative to its keypoint proposals. Moreover, in the merge stage, we define a series of merge actions to efficiently describe the table structure based on table grids. Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g. WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/Chunchunwumu/SEMv3.	翻訳日:2024-05-21 13:53:58 公開日:2024-05-20
# CoNLL#: きめ細かいエラー解析とCoNLL-03英語の修正テストセット CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English ( http://arxiv.org/abs/2405.11865v1 ) ライセンス: Link先を確認	Andrew Rueda, Elena Álvarez Mellado, Constantine Lignos,	(参考訳) 現代の固有表現認識(NER)システムは、より大きくより強力なニューラルモデルの時代において、パフォーマンスを着実に改善している。しかし、過去数年間、最先端の性能は、ベンチマークのCoNLL-03英語データセットで再び頭打ちになったようだ。本稿では,高パフォーマンスなNERモデルのテスト出力を深く掘り下げ,テストセットに新たな文書レベルのアノテーションを導入することで,その性能を詳細に評価する。我々は、NERの真の最先端を解釈し、将来の作業を導くために、エラーを分類することで、F1スコアにとどまらない分析を行う。我々は、テストセットの様々な欠陥を修正するための以前の試みをレビューし、新たに修正したテストセットであるCoNLL#を紹介する。CoNLL#は、体系的かつ最も一般的なエラーに対処し、低ノイズで解釈可能なエラー解析を可能にする。 Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models. However, over the past several years, the state-of-the-art has seemingly hit another plateau on the benchmark CoNLL-03 English dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.	翻訳日:2024-05-21 13:44:14 公開日:2024-05-20
# センサ非依存深度推定のための深さプロンプト Depth Prompting for Sensor-Agnostic Depth Estimation ( http://arxiv.org/abs/2405.11867v1 ) ライセンス: Link先を確認	Jin-Hwi Park, Chanhwi Jeong, Junoh Lee, Hae-Gon Jeon,	(参考訳) 高密度な深度マップは視覚知覚タスクの重要な要素として使われてきた。最適化に基づくものから学習に基づくものまで、深度の質を高めるための膨大な努力が続けられている。長期にわたる顕著な進歩にもかかわらず、密度、センシングパターン、スキャン範囲などの体系的な測定バイアスにより、現実の世界での適用性は制限されている。こうしたバイアスがこれらの手法の一般化を困難にしていることはよく知られている。最近の手法の多くが採用している入力モダリティ(例えば画像や深度)の合同表現の学習は、バイアスに敏感であることが観察された。この研究では、これらのモダリティを分離(disentangle)し、プロンプトエンジニアリングによってバイアスを軽減する。そこで我々は,センサタイプとシーン構成のいずれに由来する新たな深度分布に対しても望ましい特徴表現を可能にする,新しい深度プロンプトモジュールを設計する。我々の深度プロンプトは、単眼深度推定の基礎モデルに組み込むことができる。この埋め込みにより,事前学習したモデルが深度スキャン範囲の制限を受けないようにし,絶対スケールの深度マップを提供する。提案手法の有効性を広範囲な評価により実証する。ソースコードは https://github.com/JinhwiPark/DepthPrompting で公開されている。 Dense depth maps have been used as a key element of visual perception tasks. There have been tremendous efforts to enhance the depth quality, ranging from optimization-based to learning-based methods. Despite the remarkable progress for a long time, their applicability in the real world is limited due to systematic measurement biases such as density, sensing pattern, and scan range. It is well-known that the biases make it difficult for these methods to achieve their generalization. We observe that learning a joint representation for input modalities (e.g., images and depth), which most recent methods adopt, is sensitive to the biases. In this work, we disentangle those modalities to mitigate the biases with prompt engineering. For this, we design a novel depth prompt module to allow the desirable feature representation according to new depth distributions from either sensor types or scene configurations. Our depth prompt can be embedded into foundation models for monocular depth estimation. Through this embedding process, our method helps the pretrained model to be free from restraint of depth scan range and to provide absolute scale depth maps. We demonstrate the effectiveness of our method through extensive evaluations.
Source code is publicly available at https://github.com/JinhwiPark/DepthPrompting.	翻訳日:2024-05-21 13:44:14 公開日:2024-05-20
# グラフのコントラスト学習に向けて - 調査とその先 Towards Graph Contrastive Learning: A Survey and Beyond ( http://arxiv.org/abs/2405.11868v1 ) ライセンス: Link先を確認	Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang,	(参考訳) 近年,グラフの深層学習は様々な領域において顕著な成功を収めている。しかし、注釈付きグラフデータへの依存は、その法外なコストと時間集約的な性質のために、依然として重大なボトルネックとなっている。この課題に対処するため、グラフ上の自己教師型学習(SSL)が注目され、大きな進歩を遂げた。 SSLにより、機械学習モデルはラベルのないグラフデータから情報量の多い表現を生成でき、高価なラベル付きデータへの依存を減らすことができる。グラフ上のSSLは広く採用されているが、GCL(Graph Contrastive Learning)という重要なコンポーネントは、既存の文献では十分に研究されていない。したがって、この調査は、GCLに関する専用の調査を提供することで、このギャップを埋めることを目的としている。本稿では,データ拡張戦略,コントラストモード,コントラスト最適化目標など,GCLの基本原理を概観する。さらに、弱い教師付き学習、転移学習、関連するシナリオなど、データ効率のよいグラフ学習の他の側面へのGCLの拡張についても検討する。また、薬物発見、ゲノム解析、レコメンダシステムといった領域にまたがる実践的応用についても論じ、最終的にこの分野における課題と今後の方向性について概説する。 In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to produce informative representations from unlabeled graph data, reducing the reliance on expensive labeled data. While SSL on graphs has witnessed widespread adoption, one critical component, Graph Contrastive Learning (GCL), has not been thoroughly investigated in the existing literature. Thus, this survey aims to fill this gap by offering a dedicated survey on GCL. We provide a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives. Furthermore, we explore the extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios.
We also discuss practical applications spanning domains such as drug discovery, genomics analysis, recommender systems, and finally outline the challenges and potential future directions in this field.	翻訳日:2024-05-21 13:44:14 公開日:2024-05-20
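The survey abstract above lists "contrastive optimization objectives" among the core ingredients of GCL. One widely used objective of this kind is InfoNCE; the sketch below is a generic pure-Python version operating on plain embedding vectors, offered only as an illustration. The function names, the temperature value, and the convention that `view_a[i]` and `view_b[i]` are two augmented views of the same node are assumptions for this example, not details taken from the survey.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; view_a[i] and view_b[i] are treated as the positive pair."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        # Similarity of the anchor to every embedding in the other view.
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        # Negative log-probability of picking the positive among all candidates.
        total += -(logits[i] - log_denominator)
    return total / len(view_a)
```

The loss is small when each anchor is most similar to its own positive and grows when positives are confused with the other (negative) embeddings, which is the behavior a contrastive objective is meant to reward.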
# 直感的な微調整:SFTとRLHFを単一プロセスに統合する Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process ( http://arxiv.org/abs/2405.11870v1 ) ライセンス: Link先を確認	Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou,	(参考訳) Supervised Fine-Tuning (SFT) と Reinforcement Learning from Human Feedback (RLHF) は、事前トレーニング後の言語モデル(LM)の機能を強化するための2つの基本的なプロセスである。 SFTは訓練効率が向上するが、RLHFはより優れたアライメントを提供するため、しばしば組み合わせられる。しかしながら、一般的なプラクティスは、最適化目標を統一することなく、それらを順次適用することで、異なる目的に合わせることと、パラダイムギャップを埋める機会を無視して、両方から強みを取るというトレードオフをもたらす。統一的な理解を得るために,Markov Decision Process (MDP) フレームワーク内のトークンレベルで定義された2つのサブプロセスであるpreference Estimation と transition Optimization を用いて,SFT と RLHF を解釈する。このモデリングは、SFTが劣等な推定と最適化を伴うRLHFの特殊なケースであることを示している。 RLHFは、モデル全体の回答の質を評価する一方、SFTは、ターゲットの回答から前のトークンに基づいて予測トークンをスコアする。したがって、SFTはモデルの性能を過大評価し、劣等な最適化をもたらす。この観点から,SFTとRLHFを単一のプロセスに統合する直観的ファインチューニング(IFT)を導入する。 IFTは、単一ポリシーとSFTと同量の非参照ラベル付きデータを用いて、LMの時間的残差接続を通して全回答の直感的な感覚を捉えている。我々の実験は、IFTがSFTのシーケンシャルなレシピやいくつかのタスク、特に生成、推論、ファクトフォロー能力を必要とするいくつかの典型的なアライメント手法と相容れないか、あるいはそれ以上に優れていることを示した。説明可能な凍結湖ゲームはIFTの有効性をさらに検証する。 Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) are two fundamental processes for enhancing the capabilities of Language Models (LMs) post pre-training, aligning them better with human preferences. Although SFT advances in training efficiency, RLHF delivers better alignment, thus they are often combined. However, common practices simply apply them sequentially without unifying their optimization targets, resulting in a trade-off between fitting different objectives, and ignoring the opportunities to bridge the paradigm gap and take the strength from both. To obtain a unified understanding, we interpret SFT and RLHF using two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework. 
This modeling shows that SFT is only a specialized case of RLHF with inferior estimation and optimization. RLHF evaluates the quality of model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the ability of model, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-tuning (IFT) to integrate SFT and RLHF into a single process. IFT captures LMs' intuitive sense of the entire answers through a temporal residual connection, while using a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show that IFT performs comparably or even superiorly to sequential recipes of SFT and some typical alignment methods across several tasks, particularly those requires generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT.	翻訳日:2024-05-21 13:44:14 公開日:2024-05-20
# オープン量子ダイナミクス:記憶効果と情報のバックフローのスーパーアクティベーション Open Quantum Dynamics: Memory Effects and Superactivation of Backflow of Information ( http://arxiv.org/abs/2405.11872v1 ) ライセンス: Link先を確認	Fabio Benatti, Giovanni Nichele,	(参考訳) 時間依存生成子を持つ開量子ダイナミクス $\Lambda^{(1,2)}_t$ のテンソル積 $\Lambda^{(1)}_t\otimes\Lambda^{(2)}_t$ の可分性を調べる。これらの動的写像は、複合開放系 $S_1+S_2$ が自身の環境と相互作用し、環境をトレースアウトしてもメモリ効果が残る場合に生じる。本研究は次の興味深い効果に動機づけられている。すなわち、$S_1$ と $S_2$ のいずれにも同じ現象が生じることなく、環境から $S_1+S_2$ への情報のバックフロー(BFI)が起こりうる。我々は、この効果をBFIのスーパーアクティベーション(SBFI)と呼ぶ。 We investigate the divisibility properties of the tensor products $\Lambda^{(1)}_t\otimes\Lambda^{(2)}_t$ of open quantum dynamics $\Lambda^{(1,2)}_t$ with time-dependent generators. These dynamical maps emerge from a compound open system $S_1+S_2$ that interacts with its own environment in such a way that memory effects remain when the environment is traced away. This study is motivated by the following intriguing effect: one can have Backflow of Information (BFI) from the environment to $S_1+S_2$ without the same phenomenon occurring for either $S_1$ or $S_2$. We shall refer to this effect as the Superactivation of BFI (SBFI).	翻訳日:2024-05-21 13:44:14 公開日:2024-05-20
# xFinder: 大規模言語モデルのためのロバストおよびピンポイントアンサー抽出 xFinder: Robust and Pinpoint Answer Extraction for Large Language Models ( http://arxiv.org/abs/2405.11874v1 ) ライセンス: Link先を確認	Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen,	(参考訳) 大規模言語モデル(LLM)の継続的な進歩は、その性能を評価するための公平で信頼性の高い手法を開発するという重要な問題に注意を向けている。特に、テストセットのリークやプロンプトフォーマットのオーバーフィットといった主観的または非主観的な不正現象の出現は、LLMの信頼性の高い評価に重大な課題をもたらす。評価フレームワークは、回答抽出に正規表現(RegEx)を利用することが多いため、RegExによって容易に抽出できる特定のフォーマットに適合するように応答を調整するモデルもある。それにもかかわらず、RegExに基づくキー回答抽出モジュールは、しばしば抽出エラーに悩まされる。本稿では,LLM評価チェーン全体の包括的解析を行い,キー回答抽出モジュールの最適化により抽出精度が向上し,LLMが特定の回答形式に依存することが低減され,LLM評価の信頼性が向上することを実証する。これらの問題に対処するために、キー回答抽出に特化して設計されたモデルであるxFinderを提案する。このプロセスの一環として、効果的なモデルトレーニングと評価を保証するために、特別なデータセットであるKey Answer Finder (KAF)データセットを作成する。実世界のシナリオにおける一般化テストと評価により、5億のパラメータしか持たない最小のxFinderモデルが平均回答抽出精度93.42%を達成することを示した。対照的に、最高の評価フレームワークにおけるRegExの精度は74.38%である。 xFinderは、既存の評価フレームワークと比較して、強い堅牢性と高い精度を示している。 xFinder のすべてのリソースは https://github.com/IAAR-Shanghai/xFinder で利用可能である。 The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. Particularly, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evaluation frameworks often utilize Regular Expression (RegEx) for answer extraction, some models may adjust their responses to comply with specific formats that are easily extractable by RegEx. Nevertheless, the key answer extraction module based on RegEx frequently suffers from extraction errors. This paper conducts a comprehensive analysis of the entire LLM evaluation chain, demonstrating that optimizing the key answer extraction module can improve extraction accuracy, reduce LLMs' reliance on specific answer formats, and enhance the reliability of LLM evaluation.
To address these issues, we propose xFinder, a model specifically designed for key answer extraction. As part of this process, we create a specialized dataset, the Key Answer Finder (KAF) dataset, to ensure effective model training and evaluation. Through generalization testing and evaluation in real-world scenarios, the results demonstrate that the smallest xFinder model, with only 500 million parameters, achieves an average answer extraction accuracy of 93.42%. In contrast, RegEx accuracy in the best evaluation framework is 74.38%. xFinder exhibits stronger robustness and higher accuracy compared to existing evaluation frameworks. All resources for xFinder are available at https://github.com/IAAR-Shanghai/xFinder.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
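The extraction errors that the xFinder abstract attributes to RegEx-based modules can be illustrated with a minimal Python sketch. The pattern and the sample responses below are hypothetical and are not taken from xFinder or from any evaluation framework; they only show how one fixed pattern can miss a correct answer or pick up the wrong letter.

```python
import re

# Hypothetical pattern for illustration only; it is NOT the pattern used by
# xFinder or by any particular evaluation framework.
ANSWER_PATTERN = re.compile(r"answer is\s*\(?([A-D])\)?", re.IGNORECASE)


def extract_choice(response):
    """Return the option letter matched by the fixed pattern, or None."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).upper() if match else None


# Made-up model responses that all intend option B or C.
responses = [
    "The answer is B.",                       # matches the expected format
    "The answer is (c)",                      # matches, case-insensitively
    "I would go with option B.",              # correct answer, but no match
    "The answer is definitely B.",            # 'd' of "definitely" is captured
]

for text in responses:
    print(repr(text), "->", extract_choice(text))
```

The third response is missed entirely and the fourth is extracted as the wrong option, which is the kind of format sensitivity the abstract describes.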
# Understanding crypter-as-a-service in a popular underground marketplace ( http://arxiv.org/abs/2405.11876v1 ) License: see the linked page	Alejandro de la Cruz, Sergio Pastrana,	Crypters are pieces of software whose main goal is to transform a target binary so that it can avoid detection by antivirus (AV) applications. They work similarly to packers, taking a malware binary and applying a series of modifications, obfuscations and encryptions to output a binary that evades one or more AVs. The goal is to remain fully undetected, or FUD in hacking jargon, while maintaining its (often malicious) functionality. In line with the growing commoditization of cybercrime, the crypter-as-a-service model has gained popularity in response to the increased sophistication of detection mechanisms. In this business model, customers receive an initial crypter, which is updated as soon as it becomes detected by AVs. This paper provides the first study of an online underground market dedicated to crypter-as-a-service. We compare the most relevant products on sale, analyzing the existing social network on the platform and comparing the different features that they provide. 
We also conduct an experiment as a case study to validate the usage of one of the most popular crypters sold in the market, and compare the results before and after crypting binaries (both benign and malware) to show its effectiveness at evading antivirus engines.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus ( http://arxiv.org/abs/2405.11877v1 ) License: see the linked page	Eduard Poesina, Cornelia Caragea, Radu Tudor Ionescu,	Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding. Despite the relevance of the task in building conversational agents and improving text classification, machine translation and other NLP tasks, to the best of our knowledge, there is no publicly available NLI corpus for the Romanian language. To this end, we introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs, which are obtained via distant supervision, and 6K validation and test sentence pairs, which are manually annotated with the correct labels. We conduct experiments with multiple machine learning methods based on distant learning, ranging from shallow models based on word embeddings to transformer-based neural networks, to establish a set of competitive baselines. Furthermore, we improve on the best model by employing a new curriculum learning strategy based on data cartography. Our dataset and code to reproduce the baselines are available at https://github.com/Eduard6421/RONLI.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs ( http://arxiv.org/abs/2405.11880v1 ) License: see the linked page	Siyu Lou, Yuntian Chen, Xiaodan Liang, Liang Lin, Quanshi Zhang,	In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by the large language model (LLM) for language generation. These effects are formulated as non-linear interactions between tokens/words encoded by the LLM. Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects, and further classify in-context reasoning effects into enhanced inference patterns, eliminated inference patterns, and reversed inference patterns. In addition, the decomposed effects satisfy the sparsity property and the universal matching property, which mathematically guarantee that the LLM's confidence score can be faithfully decomposed into the memorization effects and in-context reasoning effects. Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# Out-of-Distribution Detection with a Single Unconditional Diffusion Model ( http://arxiv.org/abs/2405.11881v1 ) License: see the linked page	Alvin Heng, Alexandre H. Thiery, Harold Soh,	Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches necessitate a different model when evaluating abnormality against a new distribution. With the emergence of foundational generative models, this paper explores whether a single generalist model can also perform OOD detection across diverse tasks. To that end, we introduce our method, Diffusion Paths (DiffPath), in this work. DiffPath proposes to utilize a single diffusion model originally trained to perform unconditional generation for OOD detection. Specifically, we introduce a novel technique of measuring the rate-of-change and curvature of the diffusion paths connecting samples to the standard normal. Extensive experiments show that with a single model, DiffPath outperforms prior work on a variety of OOD tasks involving different distributions. Our code is publicly available at https://github.com/clear-nus/diffpath.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
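As a rough illustration of what "rate-of-change and curvature" of a path can mean once the path is discretized, the following Python sketch uses plain finite differences. This is a generic numerical approximation under the assumption that a path is given as a list of points; the function names are hypothetical and it is not DiffPath's actual statistic or code.

```python
import math


def finite_differences(path):
    """Return first and second differences of a discretized path.

    `path` is a list of equal-length tuples. First differences approximate
    the rate of change; second differences approximate curvature.
    """
    first = [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(path, path[1:])]
    second = [[b - a for a, b in zip(d0, d1)] for d0, d1 in zip(first, first[1:])]
    return first, second


def norm(vector):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in vector))


def path_statistics(path):
    """Sum of first-difference norms and sum of second-difference norms."""
    first, second = finite_differences(path)
    return sum(norm(d) for d in first), sum(norm(d) for d in second)


# Toy paths (assumptions for illustration): one straight, one with a bend.
straight = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

print(path_statistics(straight))
print(path_statistics(bent))
```

The straight path has zero second differences, while the bent path does not, which is the kind of contrast a curvature-style statistic is meant to capture.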
# Vertical Federated Learning Hybrid Local Pre-training ( http://arxiv.org/abs/2405.11884v1 ) License: see the linked page	Wenguo Li, Xinling Guo, Xu Jiao, Tiancheng Huang, Xiaoran Yan, Yao Yang,	Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to boost their model prediction skills. VFL addresses this demand and concurrently secures individual parties from exposing their raw data. However, conventional VFL encounters a bottleneck as it only leverages aligned samples, whose size shrinks with more parties involved, resulting in data scarcity and the waste of unaligned data. To address this problem, we propose a novel VFL Hybrid Local Pre-training (VFLHLP) approach. VFLHLP first pre-trains local networks on the local data of participating parties. Then it utilizes these pre-trained networks to adjust the sub-model for the labeled party or enhance representation learning for other parties during downstream federated learning on aligned data, boosting the performance of federated models.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# Post-Quantum Security: Origin, Fundamentals, and Adoption ( http://arxiv.org/abs/2405.11885v1 ) License: see the linked page	Johanna Barzen, Frank Leymann,	Nowadays, predominant asymmetric cryptographic schemes are considered to be secure because discrete logarithms are believed to be hard to compute. The algorithm of Shor can effectively compute discrete logarithms, i.e. it can break such asymmetric schemes. But the algorithm of Shor is a quantum algorithm, and at the time this algorithm was invented, quantum computers that could successfully execute it seemed to be far out in the future. The latter has changed: quantum computers that are powerful enough are likely to be available in a couple of years. In this article, we first describe the relation between discrete logarithms and two well-known asymmetric security schemes, RSA and Elliptic Curve Cryptography. Next, we present the foundations of lattice-based cryptography, which is the basis of schemes that are considered to be safe against attacks by quantum algorithms (as well as by classical algorithms). Then we describe two such quantum-safe algorithms (Kyber and Dilithium) in more detail. Finally, we give a very brief and selective overview of a few actions currently taken by governments and industry as well as standardization in this area. 
The article especially strives towards being self-contained: the required mathematical foundations to understand post-quantum cryptography are provided and examples are given.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
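The hardness assumption mentioned in this abstract can be made concrete with a toy Python sketch. It assumes a tiny prime modulus and a brute-force search, both chosen purely for illustration; real schemes use parameters far beyond the reach of such a search, and this is neither Shor's algorithm nor code from the article.

```python
def discrete_log_bruteforce(g, h, p):
    """Return the smallest x >= 0 with g**x % p == h, or None if none exists.

    The loop tries every exponent in turn, so its cost grows linearly with
    the group size, which is what makes it hopeless for realistic moduli.
    """
    value = 1
    for x in range(p - 1):
        if value == h:
            return x
        value = (value * g) % p
    return None


# Toy parameters (assumptions for illustration): 5 generates the
# multiplicative group modulo the prime 23.
p, g = 23, 5
secret = 13

# The forward direction is cheap: modular exponentiation.
public = pow(g, secret, p)

# The inverse direction is the discrete logarithm problem.
recovered = discrete_log_bruteforce(g, public, p)
print(public, recovered)
```

Computing `public` takes a handful of multiplications, whereas recovering `secret` scans the group; the gap between the two directions is what classical asymmetric schemes rely on.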
# Unveiling and Manipulating Prompt Influence in Large Language Models ( http://arxiv.org/abs/2405.11891v1 ) License: see the linked page	Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Junlang Qian, Kezhi Mao,	Prompts play a crucial role in guiding the responses of Large Language Models (LLMs). However, the intricate role of individual tokens in prompts, known as input saliency, in shaping the responses remains largely underexplored. Existing saliency methods either misalign with LLM generation objectives or rely heavily on linearity assumptions, leading to potential inaccuracies. To address this, we propose Token Distribution Dynamics (TDD), a simple yet effective approach to unveil and manipulate the role of prompts in generating LLM outputs. TDD leverages the robust interpreting capabilities of the language model head (LM head) to assess input saliency. It projects input tokens into the embedding space and then estimates their significance based on distribution dynamics over the vocabulary. We introduce three TDD variants: forward, backward, and bidirectional, each offering unique insights into token relevance. Extensive experiments reveal that TDD surpasses state-of-the-art baselines by a large margin in elucidating the causal relationships between prompts and LLM outputs. 
Beyond mere interpretation, we apply TDD to two prompt manipulation tasks for controlled text generation: zero-shot toxic language suppression and sentiment steering. Empirical results underscore TDD's proficiency in identifying both toxic and sentimental cues in prompts, subsequently mitigating toxicity or modulating sentiment in the generated content.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# Refining Coded Image in Human Vision Layer Using CNN-Based Post-Processing ( http://arxiv.org/abs/2405.11894v1 ) License: see the linked page	Takahiro Shindo, Yui Tatsumi, Taiju Watanabe, Hiroshi Watanabe,	Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression schemes into scalable image coding methods for humans and machines. In this paper, we propose a method to enhance the quality of decoded images for humans by integrating post-processing into a scalable coding scheme. Experimental results show that the post-processing improves compression performance. Furthermore, the effectiveness of the proposed method is validated through comparisons with traditional methods.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins ( http://arxiv.org/abs/2405.11895v1 ) License: see the linked page	Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang,	In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by digitally abstracting its physical layout and operational logic. By iteratively mapping the real-world data reflecting equipment operation status and product quality inspection in the digital twin, we adopt a quality prediction model for the production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. 
The digital twin takes the role of aggregating the information on actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production quality with virtual-reality evolution under multi-dimensional constraints. Leveraging the digital twin model as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed composite neural network. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98% and near-optimal production process control.	Translated: 2024-05-21 13:44:14 Published: 2024-05-20
# CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media ( http://arxiv.org/abs/2405.11897v1 ) License: see the linked page	Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera, Muhammad Imran,	During times of crisis, social media platforms play a vital role in facilitating communication and coordinating resources. Amidst chaos and uncertainty, communities often rely on these platforms to share urgent pleas for help, extend support, and organize relief efforts. However, the sheer volume of conversations during such periods, which can escalate to unprecedented levels, necessitates the automated identification and matching of requests and offers to streamline relief operations. This study addresses the challenge of efficiently identifying and matching assistance requests and offers on social media platforms during emergencies. We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features for multi-lingual request-offer matching. 
By leveraging CrisisTransformers, a set of pre-trained models specific to crises, and a cross-lingual embedding space, our methodology enhances the identification and matching tasks while outperforming strong baselines such as RoBERTa, MPNet, and BERTweet in classification tasks, and Universal Sentence Encoder and Sentence Transformers in crisis embeddings generation tasks. We introduce a novel multi-lingual dataset that simulates scenarios of help-seeking and offering assistance on social media across the 16 most commonly used languages in Australia. We conduct comprehensive cross-lingual experiments across these 16 languages, while also examining trade-offs between multiple vector search strategies and accuracy. Additionally, we analyze a million-scale geotagged global dataset to comprehend patterns in relation to seeking help and offering assistance on social media. Overall, these contributions advance the field of crisis informatics and provide benchmarks for future research in the area.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# Model-Based Qubit Noise Spectroscopy ( http://arxiv.org/abs/2405.11898v1 ) License: see the linked page	Kevin Schultz, Christopher A. Watson, Andrew J. Murphy, Timothy M. Sweeney, Gregory Quiroz,	Qubit noise spectroscopy (QNS) is a valuable tool both for the characterization of a qubit's environment and as a precursor to more effective qubit control to improve qubit fidelities. Existing approaches to QNS are what the classical spectrum estimation literature would call "non-parametric" approaches, in that a series of probe sequences are used to estimate noise power at a set of points or bands. In contrast, model-based approaches to spectrum estimation assume additional structure in the form of the spectrum and leverage this for improved statistical accuracy or other capabilities, such as superresolution. Here, we derive model-based QNS approaches using inspiration from classical signal processing, primarily through the recently developed Schrodinger wave autoregressive moving-average (SchWARMA) formalism for modeling correlated noise. We show, through both simulation and experimental data, how these model-based QNS approaches maintain the statistical and computational benefits of their classical counterparts, resulting in powerful new estimation approaches. 
Beyond the direct application of these approaches to QNS and quantum sensing, we anticipate that the flexibility of the underlying models will find utility in adaptive feedback control for quantum systems, in analogy with their role in classical adaptive signal processing and control.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation ( http://arxiv.org/abs/2405.11903v1 ) License: see the linked page	Sushmita Sarker, Prithul Sarker, Gunner Stone, Ryan Gorman, Alireza Tavakkoli, George Bebis, Javad Sattarvand,	Point cloud analysis has a wide range of applications in many areas such as computer vision, robotic manipulation, and autonomous driving. While deep learning has achieved remarkable success on image-based tasks, there are many unique challenges faced by deep neural networks in processing massive, unordered, irregular and noisy 3D points. To stimulate future research, this paper analyzes recent progress in deep learning methods employed for point cloud processing and presents challenges and potential directions to advance this field. It serves as a comprehensive review of two major tasks in 3D point cloud processing, namely 3D shape classification and semantic segmentation.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers ( http://arxiv.org/abs/2405.11904v1 ) License: see the linked page	Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi,	Text classifiers are vulnerable to adversarial examples: correctly classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial examples is to define and solve a combinatorial optimisation problem over a space of allowable transformations. While effective, this approach is slow and limited by the choice of transformations. An alternate approach is to directly generate adversarial examples by fine-tuning a pre-trained language model, as is commonly done for other text-to-text tasks. This approach promises to be much quicker and more expressive, but is relatively unexplored. For this reason, in this work we train an encoder-decoder paraphrase model to generate a diverse range of adversarial examples. For training, we adopt a reinforcement learning algorithm and propose a constraint-enforcing reward that promotes the generation of valid adversarial examples. Experimental results over two text classification datasets show that our model has achieved a higher success rate than the original paraphrase model, and overall has proved more effective than other competitive attacks. 
Finally, we show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# CSTA: CNN-based Spatiotemporal Attention for Video Summarization ( http://arxiv.org/abs/2405.11905v1 ) License: see the linked page	Jaewon Son, Jaehun Park, Kwangsu Kim,	Video summarization aims to generate a concise representation of a video, capturing its essential content and key moments while reducing its overall length. Although several methods employ attention mechanisms to handle long-term dependencies, they often fail to capture the visual significance inherent in frames. To address this limitation, we propose a CNN-based SpatioTemporal Attention (CSTA) method that stacks each feature of frames from a single video to form image-like frame representations and applies a 2D CNN to these frame features. Our methodology relies on the CNN to comprehend the inter- and intra-frame relations and to find crucial attributes in videos by exploiting its ability to learn absolute positions within images. In contrast to previous work that compromises efficiency by designing additional modules to focus on spatial importance, CSTA requires minimal computational overhead as it uses the CNN as a sliding window. Extensive experiments on two benchmark datasets (SumMe and TVSum) demonstrate that our proposed approach achieves state-of-the-art performance with fewer MACs compared to previous methods. Code is available at https://github.com/thswodnjs3/CSTA.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# Ensemble and Mixture-of-Experts DeepONets For Operator Learning ( http://arxiv.org/abs/2405.11907v1 ) License: see the linked page	Ramansh Sharma, Varun Shankar,	We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets and POD-DeepONets on both standard and challenging new operator learning problems involving partial differential equations (PDEs) in two and three dimensions. 
Our new PoU-MoE formulation provides a natural way to incorporate spatial locality and model sparsity into any neural network architecture, while our new ensemble DeepONet provides a powerful and general framework for incorporating basis enrichment in scientific machine learning architectures for operator learning.	Translated: 2024-05-21 13:34:30 Published: 2024-05-20
# PULL: PU-Learning-based Accurate Link Prediction ( http://arxiv.org/abs/2405.11911v1 ) License: see the linked page	Junghun Kim, Ka Hyun Park, Hoyoung Yoon, U Kang,	Given an edge-incomplete graph, how can we accurately find the missing links? Link prediction in edge-incomplete graphs aims to discover the missing relations between entities when their relationships are represented as a graph. Edge-incomplete graphs are prevalent in the real world due to practical limitations, such as not checking all users when adding friends in a social network. Addressing the problem is crucial for various tasks, including recommending friends in social networks and finding references in citation networks. However, previous approaches rely heavily on the given edge-incomplete (observed) graph, making it challenging to consider the missing (unobserved) links during training. In this paper, we propose PULL (PU-Learning-based Link predictor), an accurate link prediction method based on positive-unlabeled (PU) learning. PULL treats the observed edges in the training graph as positive examples, and the unconnected node pairs as unlabeled ones. PULL effectively prevents the link predictor from overfitting to the observed graph by proposing latent variables for every edge, and leveraging the expected graph structure with respect to the variables. 
Extensive experiments on five real-world datasets show that PULL consistently outperforms the baselines for predicting links in edge-incomplete graphs.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
# ARAIDA:Analogical Reasoning-Augmented Interactive Data Annotation ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation ( http://arxiv.org/abs/2405.11912v1 ) ライセンス: Link先を確認	Chen Huang, Yiping Jin, Ilija Ilievski, Wenqiang Lei, Jiancheng Lv,	(参考訳) ヒューマンアノテーションは、かなりの労力を要する時間を要するタスクです。この問題に対処するために、インタラクティブなデータアノテーションはアノテーションモデルを使用して、人間が承認または修正するように提案する。しかし、ラベル付き限られたデータで訓練されたアノテーションモデルは、誤った提案を発生させる傾向があるため、追加の人間の修正努力がもたらされる。この課題に対処するために,対話型データアノテーション設定における自動アノテーション精度を高め,人間の修正の必要性を低減する類似推論に基づくアプローチであるAraidaを提案する。 Araidaは、アノテーションモデルとk-nearest neighbors(KNN)モデルを動的にコーディネートするエラー認識統合戦略で、アノテーションモデルからの予測が不正確であると判断された場合、KNNの予測をより重要視する。経験的研究は、Araidaが異なるアノテーションタスクやモデルに適応可能であることを示した。平均すると、バニラのインタラクティブなデータアノテーション手法に比べて、人間の修正作業が11.02%削減される。 Human annotation is a time-consuming task that requires a significant amount of effort. To address this issue, interactive data annotation utilizes an annotation model to provide suggestions for humans to approve or correct. However, annotation models trained with limited labeled data are prone to generating incorrect suggestions, leading to extra human correction effort. To tackle this challenge, we propose Araida, an analogical reasoning-based approach that enhances automatic annotation accuracy in the interactive data annotation setting and reduces the need for human corrections. Araida involves an error-aware integration strategy that dynamically coordinates an annotation model and a k-nearest neighbors (KNN) model, giving more importance to KNN's predictions when predictions from the annotation model are deemed inaccurate. Empirical studies demonstrate that Araida is adaptable to different annotation tasks and models. On average, it reduces human correction labor by 11.02% compared to vanilla interactive data annotation methods.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
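The Araida abstract above describes an error-aware strategy that gives more weight to the KNN prediction when the annotation model is deemed inaccurate. A minimal sketch of that idea follows. The linear interpolation, the function name `integrate`, and the `model_error_prob` input are illustrative assumptions; the abstract does not specify how the two predictions are actually combined.

```python
def integrate(model_probs, knn_probs, model_error_prob):
    """Blend two label distributions (hypothetical scheme, not the paper's).

    model_probs / knn_probs: dicts mapping label -> probability.
    model_error_prob: estimated chance that the annotation model is wrong;
    the higher it is, the more the KNN prediction counts.
    """
    w = min(max(model_error_prob, 0.0), 1.0)  # clamp the weight to [0, 1]
    labels = set(model_probs) | set(knn_probs)
    combined = {
        label: (1 - w) * model_probs.get(label, 0.0) + w * knn_probs.get(label, 0.0)
        for label in labels
    }
    best = max(combined, key=combined.get)
    return best, combined
```

With a low estimated error the annotation model's label is kept; with a high estimated error the KNN label takes over.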
# Diff-BGM:ビデオバックグラウンド音楽生成のための拡散モデル Diff-BGM: A Diffusion Model for Video Background Music Generation ( http://arxiv.org/abs/2405.11913v1 ) ライセンス: Link先を確認	Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu,	(参考訳) ビデオを編集する際には、魅力的な背景音楽が不可欠である。しかし、ビデオバックグラウンド音楽生成タスクは、適切なトレーニングデータセットの欠如、音楽生成過程を柔軟に制御することの難しさ、ビデオと音楽の逐次的整列化など、いくつかの課題に直面している。本研究ではまず,ビデオと音楽に関するマルチモーダル情報を提供するための詳細なアノテーションとショット検出機能を備えた高品質な音楽ビデオデータセットBGM909を提案する。そこで我々は,音楽の多様性や音楽とビデオのアライメントを含む音楽の質を評価するための評価指標を,検索精度で提示する。最後に,ビデオの背景音楽を自動的に生成するDiff-BGMフレームワークを提案する。このフレームワークは,生成過程における音楽の異なる側面を制御するために異なる信号を使用する。本稿では,セグメント対応のクロスアテンション層を導入することで,映像と音楽の連続的な調整を提案する。提案手法の有効性を検証する実験を行った。コードとモデルはhttps://github.com/sizhelee/Diff-BGMで公開されている。 When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909 with detailed annotation and shot detection to provide multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process, i.e., uses dynamic video features to control music rhythm and semantic features to control the melody and atmosphere. We propose to align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
# PT43D:高輝度RGB画像から3次元形状を生成する確率変換器 PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images ( http://arxiv.org/abs/2405.11914v1 ) ライセンス: Link先を確認	Yiheng Xiong, Angela Dai,	(参考訳) ロボット工学などの様々な応用において,単一のRGB画像から3次元形状を生成することが不可欠である。現行のアプローチでは、物体の鮮明で完全な視覚的記述を含むイメージをターゲットとしており、物体の観察がおおむね無視される、あるいは取り消される、一般的な現実的なケースを考慮しない。そこで本稿では,RGB画像上の3次元形状の確率分布を生成するトランスフォーマーを用いた自己回帰モデルを提案する。閉塞や視野の切り離しといった現実的なシナリオに対処するために、実世界のシナリオの微調整を改善するために、シミュレートされた画像と形状のトレーニングペアを作成します。次に、入力画像から最も関連性の高い領域を効果的に識別し、形状生成を行う。これにより、適切な多様性と入力画像との強い整合性を持つサンプル形状の推測が可能となる。合成データに基づいてモデルをトレーニングし、テストし、微調整し、実世界のデータでテストします。実験により、どちらのシナリオにおいても、我々のモデルは最先端よりも優れていることが示された。 Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects that are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabilistic distribution of 3D shapes conditioned on an RGB image containing potentially highly ambiguous observations of the object. To handle realistic scenarios such as occlusion or field-of-view truncation, we create simulated image-to-shape training pairs that enable improved fine-tuning for real-world scenarios. We then adopt cross-attention to effectively identify the most relevant region of interest from the input image for shape generation. This enables inference of sampled shapes with reasonable diversity and strong alignment with the input image. We train and test our model on our synthetic data then fine-tune and test it on real-world data. Experiments demonstrate that our model outperforms state of the art in both scenarios	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
# 大規模言語モデルにおける埋め込みからの情報漏洩 Information Leakage from Embedding in Large Language Models ( http://arxiv.org/abs/2405.11916v1 ) ライセンス: Link先を確認	Zhipeng Wang, Anda Cheng, Yinggui Wang, Lei Wang,	(参考訳) 大規模言語モデル(LLM)の普及により、データのプライバシに関する懸念が高まっている。本研究の目的は,悪意のあるモデルプロバイダが埋め込みからユーザ入力を回復する可能性のある,入力再構成攻撃によるプライバシー侵害の可能性を検討することである。まず,モデルの隠れ状態からオリジナルテキストを再構築する2つの基本手法を提案する。これら2つの手法は, 浅い層からの埋め込み攻撃に有効であるが, より深い層からの埋め込み攻撃では効果が低下することがわかった。この問題に対処するため,Transformer ベースの Embed Parrot を提案し,深層への埋め込みから入力を再構築する。解析の結果,ChatGLM-6BとLlama2-7Bの隠れ状態からの入力を効果的に再構成し,トークン長やデータ分布の安定な性能を示すことがわかった。プライバシー侵害のリスクを軽減するため,埋め込み再構築プロセスの悪用を防ぐ防衛機構を導入する。本研究は,分散学習システムにおけるユーザプライバシ保護の重要性を強調し,そのような環境におけるセキュリティプロトコルの強化に有用な洞察を提供する。 The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking the embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers. Our analysis reveals that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, showcasing stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism to deter exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and contribute valuable insights to enhance the security protocols within such environments.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
# エネルギー結合形成における量子対古典的アルゴリズムの競合的ショーケース A Competitive Showcase of Quantum versus Classical Algorithms in Energy Coalition Formation ( http://arxiv.org/abs/2405.11917v1 ) ライセンス: Link先を確認	Naeimeh Mohseni, Thomas Morstyn, Corey O Meara, David Bucher, Jonas Nüßlein, Giorgio Cortiana,	(参考訳) エネルギーコミュニティの形成は、非中央集権的かつ持続可能なエネルギー管理を進める上で重要である。この文脈の中で、協調構造生成(CSG)は有望なフレームワークとして現れます。 CSGの複雑さはエージェント数とともに急速に増大し、古典的解法は中程度のサイズでも実用的ではない(エージェント数>30)。そのため,高度な計算手法の開発が不可欠である。本研究は,Dwaveハードウェア上での量子アニーリングと,シミュレータおよびIBMQハードウェア上での量子近似最適化アルゴリズム(QAOA)を比較し,エネルギーコミュニティ形成に対処するベンチマークを行う。我々の古典的解法には、Tabu search、simulated annealing、そしてまさに古典的解法が含まれる。以上の結果から,Dwaveはハードウェア上でのQAOAを上回っていることがわかった。注目すべきは、QAOAがDwaveと同等のランタイムスケーリングを示していることだ。特に、Dwaveは古典的な解法と比較して競争力のある性能を示し、より好ましいランタイムスケーリングで同等品質のソリューションを実現している。 The formation of energy communities is pivotal for advancing decentralized and sustainable energy management. Within this context, Coalition Structure Generation (CSG) emerges as a promising framework. The complexity of CSG grows rapidly with the number of agents, making classical solvers impractical for even moderate sizes (number of agents>30). Therefore, the development of advanced computational methods is essential. Motivated by this challenge, this study conducts a benchmark comparing classical solvers with quantum annealing on Dwave hardware and the Quantum Approximation Optimization Algorithm (QAOA) on both simulator and IBMQ hardware to address energy community formation. Our classical solvers include Tabu search, simulated annealing, and an exact classical solver. Our findings reveal that Dwave surpasses QAOA on hardware in terms of solution quality. Remarkably, QAOA demonstrates comparable runtime scaling with Dwave, albeit with a significantly larger prefactor. Notably, Dwave exhibits competitive performance compared to the classical solvers, achieving solutions of equal quality with more favorable runtime scaling.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
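To illustrate why exact Coalition Structure Generation becomes impractical as the number of agents grows, the following sketch enumerates every partition of the agents and keeps the one with the highest total value. The `value` function is a hypothetical placeholder; the energy-community objective used in the paper is not given in the abstract.

```python
def partitions(agents):
    """Yield every partition of a list of agents into non-empty coalitions."""
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        # Put `first` into each existing coalition in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or let `first` form a coalition of its own.
        yield [[first]] + part


def best_coalition_structure(agents, value):
    """Exhaustive search: return the partition maximizing the summed coalition values."""
    return max(partitions(agents), key=lambda part: sum(value(c) for c in part))
```

The number of partitions is the Bell number of the agent count (15 for 4 agents, already 115,975 for 10), which is why exhaustive search is only usable for very small instances.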
# データアノテーションの効率的・統計的品質推定法について On Efficient and Statistical Quality Estimation for Data Annotation ( http://arxiv.org/abs/2405.11919v1 ) ライセンス: Link先を確認	Jan-Christoph Klie, Rahul Nair, Juan Haladjian, Marc Kirchner,	(参考訳) アノテーション付きデータセットは、教師付き機械学習モデルをトレーニング、評価、比較、生産化するための重要な要素である。したがって、アノテーションが高品質であることは必須である。彼らの創造のためには、優れた品質管理とそれによる信頼性の高い品質見積が必要である。そして、アノテーション処理中に品質が不十分な場合には、修正措置を講じて改善することができる。品質評価は、専門家が手動でインスタンスを正しくも正しくもラベル付けすることで行われることが多い。しかし、アノテーション付きのインスタンスをチェックするのはコストがかかる傾向にある。したがって、実際には、通常はサブセットのみを検査するが、大部分は正当化や統計的なパワーを考慮せずに選択され、多くの場合は比較的小さい。しかし、小さなサンプルサイズに基づく推定は、誤り率の不正確な値につながる可能性がある。不要な大規模なサンプルサイズの使用には、例えばアノテーションの追加など、もっと多くの費用がかかる可能性がある。そこで我々はまず,アノテーションの誤り率を推定するのに必要となる最小限のサンプルサイズを見つけるために,信頼区間の使い方を詳細に記述する。次に, 誤り率推定の代替として, 受入サンプリングを適用することで, 同じ統計的保証を提供しながら, 必要なサンプルサイズを最大50%削減できることを示す。 Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. Quality estimation is often performed by having experts manually label instances as correct or incorrect. But checking all annotated instances tends to be expensive. Therefore, in practice, usually only subsets are inspected; sizes are chosen mostly without justification or regard to statistical power and more often than not, are relatively small. Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate. Using unnecessarily large sample sizes costs money that could be better spent, for instance on more annotations. Therefore, we first describe in detail how to use confidence intervals for finding the minimal sample size needed to estimate the annotation error rate. 
Then, we propose applying acceptance sampling as an alternative to error rate estimation. We show that acceptance sampling can reduce the required sample sizes by up to 50% while providing the same statistical guarantees.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
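As a rough illustration of choosing a sample size from a confidence interval, the sketch below uses the textbook normal-approximation (Wald) interval for a proportion. This is an assumption made for illustration: the abstract does not say which interval the authors use, and the function name `min_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size(margin, confidence=0.95, expected_error_rate=0.5):
    """Smallest n for which a Wald interval has half-width <= margin.

    expected_error_rate=0.5 is the worst case (largest variance), so it
    gives a conservative sample size when nothing is known in advance.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p = expected_error_rate
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


print(min_sample_size(0.05))                           # 385
print(min_sample_size(0.05, expected_error_rate=0.1))  # 139
```

A prior guess that the error rate is low shrinks the required sample considerably, which is the kind of saving the abstract is concerned with.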
# MirrorGaussian:鏡の反射を再現するために3Dガウシアンを反射する MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections ( http://arxiv.org/abs/2405.11921v1 ) ライセンス: Link先を確認	Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan,	(参考訳) 3D Gaussian Splattingは、写真リアリスティックおよびリアルタイムの新規ビュー合成において顕著な進歩を見せている。しかし、視点によって外見が大きく変化するミラー反射のモデル化には課題がある。この問題に対処するために,3D Gaussian Splattingに基づくリアルタイムレンダリングが可能な初のミラーシーン再構成手法であるMirrorGaussianを提案する。重要な洞察は、現実世界空間と仮想ミラー空間の間のミラー対称性に基づいている。実世界の3Dガウシアンと、それを鏡面に関して反射して得られる鏡像の両方について微分可能なラスタライズを可能にする、直感的な二重レンダリング手法を導入する。すべての3Dガウシアンは、エンドツーエンドのフレームワークでミラー平面と共同で最適化される。 MirrorGaussianは、ミラーのあるシーンで高品質かつリアルタイムなレンダリングを実現し、新しいミラーやオブジェクトの追加のようなシーン編集を可能にする。複数のデータセットに対する総合的な実験により、我々のアプローチは既存の手法を著しく上回り、最先端の結果が得られることを示した。プロジェクトページ:https://mirror-gaussian.github.io/ 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded on the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results. 
Project page: https://mirror-gaussian.github.io/.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
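The geometric core of reflecting a Gaussian about the mirror plane is the standard point reflection across a plane. The sketch below reflects only a Gaussian's center; a full implementation would also have to mirror its orientation and covariance, which is omitted here. The function name and the plane parameterization are illustrative assumptions, not taken from the paper.

```python
def reflect_point(point, normal, offset):
    """Reflect a 3D point about the plane {x : normal . x + offset = 0}.

    The normal does not need to be unit length; it is normalized here.
    """
    norm = sum(c * c for c in normal) ** 0.5
    n = [c / norm for c in normal]
    d = offset / norm
    # Signed distance from the point to the plane.
    dist = sum(p * c for p, c in zip(point, n)) + d
    # Move the point twice that distance against the normal.
    return [p - 2 * dist * c for p, c in zip(point, n)]


print(reflect_point([1, 2, 3], [0, 0, 1], 0))  # [1.0, 2.0, -3.0]
```

Applying the reflection twice returns the original point, which is a handy sanity check when the plane itself is being optimized.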
# 大規模分散二部グラフの効率的なクラスタリング Effective Clustering on Large Attributed Bipartite Graphs ( http://arxiv.org/abs/2405.11922v1 ) ライセンス: Link先を確認	Renchi Yang, Yidu Wu, Xiaoyang Lin, Qichen Wang, Tsz Nam Chan, Jieming Shi,	(参考訳) 分散二部グラフ(ABG)は、顧客-商品購入ネットワークや著者-紙の著者間グラフなど、豊富な属性に関連付けられた2組の異種ノード間の相互作用を記述するための表現力のあるデータモデルである。このようなグラフにセットされたターゲットノードを(k-ABGCと呼ばれる)非結合クラスタに分割すると、ソーシャルネットワーク分析、レコメンデーションシステム、情報検索、バイオインフォマティクスなど、様々な領域で広く利用される。しかし、k-ABGCに対する既存のソリューションの大半は属性情報を見落としているか、二部グラフ構造を正確に捉えていないかのいずれかであり、非常に妥協された結果の品質を損なう。これらの問題の重大さは、数百万のノードと大量の属性データを含む実際のABGでアクセント化され、そのようなグラフ上で有効なk-ABGCをレンダリングする。本稿では,複数の実データセット上でのスーパーブクラスタリング性能を実現する,k-ABGCの効率的かつ効率的なアプローチであるTPOを提案する。 TPOは2つの主要な貢献を通じて高いクラスタリング品質を得る。 i) ABGにおけるマルチホップ接続を考慮したノード間の属性親和性獲得に特化したマルチスケール属性親和性に基づくk-ABGC問題の新たな定式化と変換 (II) 明確な親和性行列の構成をサイドステッピングし、より高速な収束を容易にするために、慎重に設計された最適化を含む高効率な解法。 5つの実ABGに対してTPOと19の基線を比較した大規模な実験では、TPOが地上トルスラベルに対して測定された優れたクラスタリング品質を示した。さらに、最先端技術と比較して、TPOは小さなABGと大きなABGのどちらよりも40倍以上高速であることが多い。 Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging. 
In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the art, TPO is often more than 40x faster on both small and large ABGs.	翻訳日:2024-05-21 13:34:30 公開日:2024-05-20
# 『Set It Up!』:構成生成モデルによる機能的オブジェクトアレンジメント "Set It Up!": Functional Object Arrangement with Compositional Generative Models ( http://arxiv.org/abs/2405.11928v1 ) ライセンス: Link先を確認	Yiqing Xu, Jiayuan Mao, Yilun Du, Tomas Lozáno-Pérez, Leslie Pack Kaebling, David Hsu,	(参考訳) 本稿では,「2つのダイニングテーブルをセットアップする」など,機能的オブジェクトアレンジメントを作成するための不特定な指示を理解可能なロボットを開発する上での課題について考察する。未特定命令の解釈を学習するためのフレームワークであるSetItUpを導入する。 SetItUpは少数のトレーニング例と人為的なプログラムスケッチを使って、特定のシーンタイプのアレンジルールを明らかにする。オブジェクト間の抽象空間関係の中間グラフのような表現を活用することで、SetItUpは配置問題を2つのサブプロブレムに分解する。一限られたデータから配置パターンを学習し、二これらの抽象的な関係をオブジェクトのポーズに基礎付けること。 SetItUpは、大きな言語モデル(LLM)を活用して、制約を満たす制約として、新しいシーンにおけるオブジェクト間の抽象的な空間的関係を提案し、制約を満たすオブジェクトのポーズを見つけるために、これらの抽象的関係に関連する拡散モデルのライブラリを構成する。研究用デスク,ダイニングテーブル,コーヒーテーブルからなるデータセット上で,本フレームワークの有効性を検証し,既存のモデルと比較して,物理的に可塑性,機能的,審美的に満足な物体配置を生成する上で,優れた性能を示した。 This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages large language models (LLMs) to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. 
We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# ブラックボックスLLMのデータ汚染校正 Data Contamination Calibration for Black-box LLMs ( http://arxiv.org/abs/2405.11930v1 ) ライセンス: Link先を確認	Wentao Ye, Jiaqi Hu, Liyao Li, Haobo Wang, Gang Chen, Junbo Zhao,	(参考訳) LLM(Large Language Models)の急速な進歩は、トレーニングデータサイズの拡大と密接に関連している。しかし、精査されていない超大規模トレーニングセットは、データ汚染、すなわちベンチマークデータがトレーニングに使用されることのような一連の潜在的なリスクをもたらす。本研究では, 汚染データを検出し汚染効果を低減させるために, Polarized Augment Calibration(PAC)という包括的な手法を, 新たに公開予定のデータセットとともに提案する。 PACは、機械学習コミュニティで広く使われているMIA(Membership Inference Attack)を拡張し、トレーニングデータの検出においてよりグローバルなターゲットを形成して、目に見えないトレーニングデータを明確にする。先駆的な研究として、PACはプラグアンドプレイ性が高く、現在のほとんどの(すべてではないとしても)ホワイトボックスおよびブラックボックスのLLMと統合できる。大規模な実験により、PACは4つ以上のデータセット形式と10以上のベースLLMにおいて、データ汚染の検出で既存の手法を少なくとも4.5%上回る。さらに、実世界のシナリオにおける我々の応用は、汚染と関連する問題の顕著な存在を強調している。 The rapid advancements of Large Language Models (LLMs) are tightly associated with the expansion of the training data size. However, the unchecked ultra-large-scale training sets introduce a series of potential risks like data contamination, i.e., benchmark data being used for training. In this work, we propose a holistic method named Polarized Augment Calibration (PAC) along with a new to-be-released dataset to detect the contaminated data and diminish the contamination effect. PAC extends the popular MIA (Membership Inference Attack) -- from the machine learning community -- by forming a more global target at detecting training data to clarify invisible training data. As a pioneering work, PAC is very much plug-and-play and can be integrated with most (if not all) current white- and black-box LLMs. By extensive experiments, PAC outperforms existing methods by at least 4.5% in data contamination detection on more than 4 dataset formats, with more than 10 base LLMs. Besides, our application in real-world scenarios highlights the prominent presence of contamination and related issues.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# 生成拡散モデルの非平衡物理学 Nonequilbrium physics of generative diffusion models ( http://arxiv.org/abs/2405.11932v1 ) ライセンス: Link先を確認	Zhendong Yu, Haiping Huang,	(参考訳) 生成拡散モデルは、物理学におけるランジュバン力学の概念を機械学習に適用し、産業応用から多くの関心を集めているが、その内在的なメカニズムに関する全体像はいまだに欠けている。本稿では,拡散モデルの見通しのよい物理解析を行い,近年発見された固有の相転移を理解するために,ゆらぎ定理,エントロピー生成,フランツ・パリージポテンシャルを導出する。我々の解析は、非平衡物理学と平衡物理学の概念に根ざしている。すなわち、前方と後方のダイナミクスの両方をランジュバン力学として扱い、逆拡散生成過程を統計的推論として扱う。そこでは、時間依存の状態変数が、スピングラス理論で研究されるクエンチされた乱れとして機能する。この統一原理は、機械学習の実践者がより良いアルゴリズムを設計し、理論物理学者が機械学習と非平衡熱力学を結びつけるための指針となることが期待される。 Generative diffusion models apply the concept of Langevin dynamics in physics to machine learning, attracting a lot of interest from industrial application, but a complete picture about inherent mechanisms is still lacking. In this paper, we provide a transparent physics analysis of the diffusion models, deriving the fluctuation theorem, entropy production, and the Franz-Parisi potential to understand the intrinsic phase transitions discovered recently. Our analysis is rooted in non-equilibrium physics and concepts from equilibrium physics, i.e., treating both forward and backward dynamics as Langevin dynamics, and treating the reverse diffusion generative process as a statistical inference, where the time-dependent state variables serve as quenched disorder studied in spin glass theory. This unified principle is expected to guide machine learning practitioners to design better algorithms and theoretical physicists to link machine learning to non-equilibrium thermodynamics.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# UAV-VisLoc:UAVビジュアルローカライゼーションのための大規模データセット UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization ( http://arxiv.org/abs/2405.11936v1 ) ライセンス: Link先を確認	Wenjia Xu, Yaxuan Yao, Jiaqi Cao, Zhiwei Wei, Chunbo Liu, Jiuniu Wang, Mugen Peng,	(参考訳) 無人航空機(UAV)の応用は近年広く行われている。特にグローバルナビゲーション衛星システム(GNSS)が破壊され、信頼性が低い場合、UAVの正確な緯度と経度座標を確保することが重要である。既存の視覚的ローカライゼーション手法は、UAVの地上視像と正方形衛星地図とをマッチングすることにより、誤差蓄積なしに自律的な視覚的ローカライゼーションを実現する。しかし、UAVの地上画像の収集にはコストがかかるため、現実のシナリオでは大規模なデータセットが不足している。既存のUAVの視覚的ローカライゼーションのためのデータセットは、小さな地理的領域に限られるか、異なるテクスチャを持つ都市領域にのみ焦点が当てられている。そこで本研究では,UAVの実際の位置座標を,撮影地平線図に基づく大規模衛星地図上で決定することにより,UAVの視覚的位置決めタスクを定義する。本稿では,UAVの視覚的ローカライゼーション作業を容易にするために,大規模なUAV-VisLocデータセットを提案する。このデータセットは、中国の11カ所にわたる多様なドローンの画像で構成され、さまざまな地形の特徴を捉えている。データセットには、固定翼ドローンと、異なる高度と方向で撮影されるマルチテランのドローンの画像が含まれている。私たちのデータセットには6,742のドローン画像と11の衛星マップが含まれており、緯度、経度、高度、捕獲日などのメタデータがあります。私たちのデータセットは、多種多様なデータを提供することで、モデルのトレーニングとテストの両方をサポートするように調整されています。 The application of unmanned aerial vehicles (UAV) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of UAV with the ortho satellite maps. However, collecting UAV ground-down view images across diverse locations is costly, leading to a scarcity of large-scale datasets for real-world scenarios. Existing datasets for UAV visual localization are often limited to small geographic areas or are focused only on urban regions with distinct textures. To address this, we define the UAV visual localization task by determining the UAV's real position coordinates on a large-scale satellite map based on the captured ground-down view. In this paper, we present a large-scale dataset, UAV-VisLoc, to facilitate the UAV visual localization task. 
This dataset comprises images from diverse drones across 11 locations in China, capturing a range of topographical features. The dataset features images from fixed-wing drones and multi-terrain drones, captured at different altitudes and orientations. Our dataset includes 6,742 drone images and 11 satellite maps, with metadata such as latitude, longitude, altitude, and capture date. Our dataset is tailored to support both the training and testing of models by providing diverse and extensive data.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# COMETを追って: 自己改善型機械翻訳における最小ベイズリスクデコーディングの活用 Chasing COMET: Leveraging Minimum Bayes Risk Decoding for Self-Improving Machine Translation ( http://arxiv.org/abs/2405.11937v1 ) ライセンス: Link先を確認	Kamil Guttmann, Mikołaj Pokrywka, Adrian Charkiewicz, Artur Nowakowski,	(参考訳) 本稿では,特にドメイン適応と低リソース言語を対象に,機械翻訳(MT)における自己改善のための最小ベイズリスク(MBR)デコーディングについて検討する。モデル自身のMBR復号による順方向翻訳でモデルを微調整することで自己改善プロセスを実現する。 COMET を MBR ユーティリティメトリックとして活用することにより,人間の嗜好によりよく合致する翻訳のリランキングを目指している。本稿では,このアプローチの反復的適用と,言語固有のMBRユーティリティメトリクスの必要性について検討する。その結果、ドメイン適応型モデルへの適用や低リソース設定への一般化など、すべての言語ペアに対する翻訳品質の大幅な向上が示された。このことは、様々なシナリオにおいて効率的なMT自己改善のためのCOMET誘導MBRの可能性を強調している。 This paper explores Minimum Bayes Risk (MBR) decoding for self-improvement in machine translation (MT), particularly for domain adaptation and low-resource languages. We implement the self-improvement process by fine-tuning the model on its MBR-decoded forward translations. By employing COMET as the MBR utility metric, we aim to achieve the reranking of translations that better aligns with human preferences. The paper explores the iterative application of this approach and the potential need for language-specific MBR utility metrics. The results demonstrate significant enhancements in translation quality for all examined language pairs, including successful application to domain-adapted models and generalisation to low-resource settings. This highlights the potential of COMET-guided MBR for efficient MT self-improvement in various scenarios.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
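MBR decoding picks, from a set of candidate translations, the one with the highest expected utility when the other candidates are treated as pseudo-references. The sketch below shows that selection step. The `token_f1` utility is a toy stand-in chosen so the example runs without any model; the paper uses COMET as the utility, and whether a candidate is scored against itself is a design choice that varies between implementations.

```python
def token_f1(hyp, ref):
    """Toy utility: token-level F1 overlap (stand-in for a learned metric such as COMET)."""
    h, r = hyp.split(), ref.split()
    if not h or not r:
        return 0.0
    common = sum(min(h.count(t), r.count(t)) for t in set(h))
    if common == 0:
        return 0.0
    precision, recall = common / len(h), common / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(candidates, utility=token_f1):
    """Return the candidate with the highest average utility against the other candidates."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / max(len(others), 1)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


samples = ["the cat sat on the mat", "the cat sat on a mat", "a dog ran away"]
print(mbr_decode(samples))  # the cat sat on a mat
```

In a self-improvement loop of the kind the abstract describes, the selected outputs would then serve as fine-tuning targets for the same model.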
# オランダにおけるバイオメディカルエンティティリンク: 自動生成ウィキペディアコーパス上での自己調整型BERTモデルの微調整 Biomedical Entity Linking for Dutch: Fine-tuning a Self-alignment BERT Model on an Automatically Generated Wikipedia Corpus ( http://arxiv.org/abs/2405.11941v1 ) ライセンス: Link先を確認	Fons Hartendorp, Tom Seinen, Erik van Mulligen, Suzan Verberne,	(参考訳) 健康関連テキストからの自動情報抽出における主要な要素であるバイオメディカル・エンティティ・リンクは、構造化バイオメディカル・ナレッジ・ベースにおいて、テキスト・エンティティ(疾患、薬物、患者が言及する身体部分など)を対応する概念に結びつける上で重要な役割を担っている。自然言語処理の最近の進歩にもかかわらず、この課題は依然として挑戦的だ。本稿では,オランダ語におけるバイオメディカルエンティティリンクモデルについて述べる。我々は、MedRoBERTa.nlをベースモデルとし、UMLSとオランダのSNOMEDから抽出したオランダの生物医学オントロジーに基づいて、自己調整による第2フェーズ事前訓練を行う。我々は、オントロジーにリンクしたオランダのバイオメディカルエンティティのウィキペディアからコーパスを抽出し、このデータセットでモデルを微調整する。我々は,オランダのマントラ GSC-corpus のモデルを評価し,54.7%の分類精度と69.8%の1距離精度を達成した。次に,登録されていない患者支援フォーラムデータの収集に関するケーススタディを行い,本モデルが先行するエンティティ認識ステップの限られた品質によって阻害されていることを示す。小サンプルのマニュアル評価は、正しい抽出された実体の約65%がオントロジーの正しい概念と関連していることを示している。以上の結果から,英語以外の言語でリンクする生物医学的実体は依然として困難なままであるが,オランダ語モデルは患者生成テキストの高レベルな分析に利用することができる。 Biomedical entity linking, a main component in automatic information extraction from health-related texts, plays a pivotal role in connecting textual entities (such as diseases, drugs and body parts mentioned by patients) to their corresponding concepts in a structured biomedical knowledge base. The task remains challenging despite recent developments in natural language processing. This paper presents the first evaluated biomedical entity linking model for the Dutch language. We use MedRoBERTa.nl as base model and perform second-phase pretraining through self-alignment on a Dutch biomedical ontology extracted from the UMLS and Dutch SNOMED. We derive a corpus from Wikipedia of ontology-linked Dutch biomedical entities in context and fine-tune our model on this dataset. We evaluate our model on the Dutch portion of the Mantra GSC-corpus and achieve 54.7% classification accuracy and 69.8% 1-distance accuracy. 
We then perform a case study on a collection of unlabeled, patient-support forum data and show that our model is hampered by the limited quality of the preceding entity recognition step. Manual evaluation of a small sample indicates that of the correctly extracted entities, around 65% are linked to the correct concept in the ontology. Our results indicate that biomedical entity linking in a language other than English remains challenging, but our Dutch model can be used for high-level analysis of patient-generated text.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# FAME-MTデータセット:機械翻訳目的の形式的認識を容易にする FAME-MT Dataset: Formality Awareness Made Easy for Machine Translation Purposes ( http://arxiv.org/abs/2405.11942v1 ) ライセンス: Link先を確認	Dawid Wiśniewski, Zofia Rostek, Artur Nowakowski,	(参考訳) 人々は様々な目的で言語を使用します。情報を共有することとは別に、個人は感情を表現したり、他人への敬意を示すためにそれを使うこともある。本稿では, 機械翻訳の形式レベルとFAME-MT(FAME-MT)に着目し, ターゲット文の形式性に応じて, フォーマルクラス, フォーマルクラスに分類される15のヨーロッパソース言語と8のヨーロッパターゲット言語間の1120万の翻訳からなるデータセットについて述べる。このデータセットは、考慮された欧州のターゲット言語毎に所定の形式レベルを確保するために、マシン翻訳モデルを微調整するために使用することができる。本稿では、データセット作成手順、FAME-MTが言語レジスタ情報の信頼性のある情報源であることを示すデータセットの品質分析について述べる。現在、公式アノテーションの最大のデータセットであり、例はヨーロッパの言語ペア112で表現されている。データセットはオンラインで公開されている。 https://github.com/laniqo-public/fame-mt/。 People use language for various purposes. Apart from sharing information, individuals may use it to express emotions or to show respect for another person. In this paper, we focus on the formality level of machine-generated translations and present FAME-MT -- a dataset consisting of 11.2 million translations between 15 European source languages and 8 European target languages classified to formal and informal classes according to target sentence formality. This dataset can be used to fine-tune machine translation models to ensure a given formality level for each European target language considered. We describe the dataset creation procedure, the analysis of the dataset's quality showing that FAME-MT is a reliable source of language register information, and we present a publicly available proof-of-concept machine translation model that uses the dataset to steer the formality level of the translation. Currently, it is the largest dataset of formality annotations, with examples expressed in 112 European language pairs. The dataset is published online: https://github.com/laniqo-public/fame-mt/ .	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# BioLaySummにおけるWisPerMed:科学論文の平易な要約のための自己回帰型大規模言語モデルの適応 WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles ( http://arxiv.org/abs/2405.11950v1 ) ライセンス: Link先を確認	Tabea M. G. Pakull, Hendrik Damm, Ahmad Idrissi-Yaghir, Henning Schäfer, Peter A. Horn, Christoph M. Friedrich,	(参考訳) 本論文は、科学論文を非専門家にも理解しやすくすることを目的とした、バイオ医療分野における平易な要約(lay summarization)の共有タスクBioLaySumm2024におけるWisPerMedチームの取り組みを詳述する。大規模言語モデル(LLM)、特にBioMistralとLlama3を微調整し、複雑な科学的テキストから平易な要約を作成するために使用した。要約性能は、インストラクションチューニング、少数ショット学習、特定のコンテキスト情報を組み込むように調整されたプロンプトのバリエーションなど、様々なアプローチによって強化された。実験の結果、微調整は一般的に、評価指標の大半で最高のパフォーマンスをもたらすことが示された。少数ショット学習は、特によく練られたプロンプトを使用する場合に、関連性があり事実的に正確なテキストを生成するモデルの能力を顕著に向上させた。さらに,読みやすさと事実性の指標に基づいてテキスト出力の選択を最適化する動的エキスパート選択(DES)機構を開発した。 54人の参加者のうち、WisPerMedチームは可読性、事実性、関連性の評価で4位に達した。総合スコアでは、本手法はベースラインを約5.5ポイント上回り、1位との差は約1.5ポイントにとどまった。 This paper details the efforts of the WisPerMed team in the BioLaySumm2024 Shared Task on automatic lay summarization in the biomedical domain, aimed at making scientific publications accessible to non-specialists. Large language models (LLMs), specifically the BioMistral and Llama3 models, were fine-tuned and employed to create lay summaries from complex scientific texts. The summarization performance was enhanced through various approaches, including instruction tuning, few-shot learning, and prompt variations tailored to incorporate specific context information. The experiments demonstrated that fine-tuning generally led to the best performance across most evaluated metrics. Few-shot learning notably improved the models' ability to generate relevant and factually accurate texts, particularly when using a well-crafted prompt. Additionally, a Dynamic Expert Selection (DES) mechanism to optimize the selection of text outputs based on readability and factuality metrics was developed. 
Out of 54 participants, the WisPerMed team reached 4th place, measured by readability, factuality, and relevance. By overall score, our approach improved upon the baseline by approximately 5.5 percentage points and was only approximately 1.5 percentage points behind first place.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# 独特な:セルフアテンションVs.仮想ノード Distinguished In Uniform: Self Attention Vs. Virtual Nodes ( http://arxiv.org/abs/2405.11951v1 ) ライセンス: Link先を確認	Eran Rosenbluth, Jan Tönshoff, Martin Ritzert, Berke Kisin, Martin Grohe,	(参考訳) SANやGPSなどのグラフトランスフォーマー(GT)は、メッセージパッシングGNN(MPGNN)とグローバルセルフアテンションを組み合わせたグラフ処理モデルである。これらは普遍関数近似器であることが示され、2つの予約がなされた。 1. 初期ノード機能は、特定の位置エンコーディングで拡張されなければならない。 2. 近似は非一様である: 異なる大きさのグラフは異なる近似ネットワークを必要とするかもしれない。同じ位置エンコーディング、純粋なMPGNN、さらには2層MLPでさえも一様でない普遍近似器である。対象関数は、すべての大きさのグラフの1つのネットワークによって近似される。そこで、GTとより効率的なMPGNN + Virtual Nodeアーキテクチャを比較します。 2つのモデル定義の主な違いは、そのグローバルな計算方法であるSelf-Attention Vs Virtual Nodeである。いずれのモデルも、主要な結果を証明する前に、一様ユニバーサル近似子ではないことを証明している。合成データに関する実験でその理論を実証する。さらに、実世界のデータセットを用いて研究を拡大し、実際の明確なランキングも示さない混合結果を観察します。 Graph Transformers (GTs) such as SAN and GPS are graph processing models that combine Message-Passing GNNs (MPGNNs) with global Self-Attention. They were shown to be universal function approximators, with two reservations: 1. The initial node features must be augmented with certain positional encodings. 2. The approximation is non-uniform: Graphs of different sizes may require a different approximating network. We first clarify that this form of universality is not unique to GTs: Using the same positional encodings, also pure MPGNNs and even 2-layer MLPs are non-uniform universal approximators. We then consider uniform expressivity: The target function is to be approximated by a single network for graphs of all sizes. There, we compare GTs to the more efficient MPGNN + Virtual Node architecture. The essential difference between the two model definitions is in their global computation method -- Self-Attention Vs Virtual Node. We prove that none of the models is a uniform-universal approximator, before proving our main result: Neither model's uniform expressivity subsumes the other's. We demonstrate the theory with experiments on synthetic data. 
We further augment our study with real-world datasets, observing mixed results which indicate no clear ranking in practice as well.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# プラズマダイナミクスの低次モデリングのための浅いリカレントデコーダ Shallow Recurrent Decoder for Reduced Order Modeling of Plasma Dynamics ( http://arxiv.org/abs/2405.11955v1 ) ライセンス: Link先を確認	J. Nathan Kutz, Maryam Reza, Farbod Faraji, Aaron Knoll,	(参考訳) 複雑かつ多スケールの時空間力学を計算的に扱いやすくする上で、低次モデルがますます重要になっている。このような代理モデルの計算効率は、特に設計、徹底的な探索、物理的理解において重要である。プラズマシミュレーション、特にホールスラスタのような${\bf E}\times {\bf B}$プラズマ放電および技術の研究に適用されるものは、広い空間スケールと時間スケールにまたがる多次元力学を解決するために、かなりの計算資源を必要とする。高忠実度計算ツールは、限られた条件や高度に単純化されたジオメトリーでそのようなシステムをシミュレートすることができるが、フルサイズのシステムのシミュレーションや、多くの幾何学的構成や異なる物理的条件下での広範囲なパラメトリックな研究は、従来の数値ツールでは計算的に困難である。したがって、重要な${\bf E}\times {\bf B}$技術を含む、科学研究と産業指向のプラズマシステムのモデリングは、低次モデリングアルゴリズムから大きな恩恵を受ける。本稿では,SHRED(Shallow REcurrent Decoder)アーキテクチャに基づくモデル縮小手法を提案する。このスキームは、限られたセンサー計測を時間方向に符号化し(シーケンスからシーケンスへの符号化)、デコーダネットワークを介して完全な状態空間を再構成するニューラルネットワークを使用する。変数分離の理論に基づいて、SHREDアーキテクチャは (i) わずか3点のセンサで全時空間場を再構成すること(センサで測定されていないが、測定された場と動的に結合している場も含む)、(ii) トレーニングされた時間符号化モデルからニューラルネットワークのロールアウトを用いてシステムの将来の状態を予測することができる。 Reduced order models are becoming increasingly important for rendering complex and multiscale spatio-temporal dynamics computationally tractable. The computational efficiency of such surrogate models is especially important for design, exhaustive exploration and physical understanding. Plasma simulations, in particular those applied to the study of ${\bf E}\times {\bf B}$ plasma discharges and technologies, such as Hall thrusters, require substantial computational resources in order to resolve the multidimensional dynamics that span across wide spatial and temporal scales. Although high-fidelity computational tools are available to simulate such systems over limited conditions and in highly simplified geometries, simulations of full-size systems and/or extensive parametric studies over many geometric configurations and under different physical conditions are computationally intractable with conventional numerical tools. 
Thus, scientific studies and industrially oriented modeling of plasma systems, including the important ${\bf E}\times {\bf B}$ technologies, stand to significantly benefit from reduced order modeling algorithms. We develop a model reduction scheme based upon a Shallow REcurrent Decoder (SHRED) architecture. The scheme uses a neural network for encoding limited sensor measurements in time (sequence-to-sequence encoding) to full state-space reconstructions via a decoder network. Based upon the theory of separation of variables, the SHRED architecture is capable of (i) reconstructing full spatio-temporal fields with as few as three point sensors, even the fields that are not measured with sensor feeds but that are in dynamic coupling with the measured field, and (ii) forecasting the future state of the system using neural network roll-outs from the trained time encoding model.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# 説明フレームワークにおける共通点の探索:マルチドメインサーベイ分析 Exploring Commonalities in Explanation Frameworks: A Multi-Domain Survey Analysis ( http://arxiv.org/abs/2405.11958v1 ) ライセンス: Link先を確認	Eduard Barbu, Marharytha Domnich, Raul Vicente, Nikos Sakkas, André Morim,	(参考訳) 本研究は,3つの領域の専門家による調査および議論から得られた知見を提示し,これらと類似のユースケースに適用可能な普遍的説明枠組みに不可欠な要素を見出すことを目的としている。これらの洞察は、その解釈可能性で知られるGPアルゴリズムを利用するソフトウェアツールに組み込まれる。分析されたアプリケーションには、医療シナリオ(予測MLを含む)、小売ユースケース(規範MLを含む)、エネルギーユースケース(予測MLも含む)が含まれる。私たちは各セクターのプロフェッショナルにインタビューを行い、さらなる分析のために会話の書き起こしを行いました。さらに、これらの分野の専門家や非専門家は、説明法の様々な側面を調査するために設計されたアンケートを埋めた。以上の結果から,より説明可能性の高い精度を犠牲にすることが普遍的に望まれることが示唆された。さらに,このようなフレームワークの重要コンポーネントとして,機能の重要性と反実的説明の重要性を強調した。 XAI分野における知識の普及を促進するために,我々のアンケートを公開している。 This study presents insights gathered from surveys and discussions with specialists in three domains, aiming to find essential elements for a universal explanation framework that could be applied to these and other similar use cases. The insights are incorporated into a software tool that utilizes GP algorithms, known for their interpretability. The applications analyzed include a medical scenario (involving predictive ML), a retail use case (involving prescriptive ML), and an energy use case (also involving predictive ML). We interviewed professionals from each sector, transcribing their conversations for further analysis. Additionally, experts and non-experts in these fields filled out questionnaires designed to probe various dimensions of explanatory methods. The findings indicate a universal preference for sacrificing a degree of accuracy in favor of greater explainability. Additionally, we highlight the significance of feature importance and counterfactual explanations as critical components of such a framework. Our questionnaires are publicly available to facilitate the dissemination of knowledge in the field of XAI.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# モジュール型最適化フレームワークにおける個別および共同のモジュール影響の定量化 Quantifying Individual and Joint Module Impact in Modular Optimization Frameworks ( http://arxiv.org/abs/2405.11964v1 ) ライセンス: Link先を確認	Ana Nikolikj, Ana Kostovska, Diederick Vermetten, Carola Doerr, Tome Eftimov,	(参考訳) 本研究では,連続的な単一目的ブラックボックス最適化のためのモジュール型最適化フレームワークの性能に,モジュールが与える影響について検討する。アルゴリズムの変種を設計する際に選択できるモジュールは多岐にわたるが、各モジュールが個別にアルゴリズムの性能へどう影響するか、組み合わせた場合にモジュールが相互にどう作用するかについての理解は、かなり限定的である。本稿では,関数型 ANOVA (f-ANOVA) フレームワークを用いて,モジュール型共分散行列適応 (modCMA) とモジュール型微分進化 (modDE) の2つのアルゴリズムに対する個々のモジュールとモジュールの組み合わせの影響を定量化する。 BBOBベンチマークにおける324の modCMA 変種と576の modDE 変種の性能データを,2つの問題次元と3つの計算予算で解析する。注目すべき知見として、modCMAの性能に強く影響する重要なモジュールが特定された。例えば、低次元問題に対する weights option および mirrored モジュール、高次元問題に対する base sampler などである。 lpsr モジュールは個々の影響が大きく、問題次元や計算予算に関わらず、modDE の性能に非常に重要である。 modCMAとmodDEを比較すると、問題次元と計算予算の増加とともに、modDEは個々のモジュールがより影響力を持つ状態から、モジュールの組み合わせがより影響力を持つ状態へとシフトし、modCMAは反対のパターンに従う。 This study explores the influence of modules on the performance of modular optimization frameworks for continuous single-objective black-box optimization. There is an extensive variety of modules to choose from when designing algorithm variants; however, there is a rather limited understanding of how each module individually influences the algorithm performance and how the modules interact with each other when combined. We use the functional ANOVA (f-ANOVA) framework to quantify the influence of individual modules and module combinations for two algorithms, the modular Covariance Matrix Adaptation (modCMA) and the modular Differential Evolution (modDE). We analyze the performance data from 324 modCMA and 576 modDE variants on the BBOB benchmark collection, for two problem dimensions, and three computational budgets. 
Noteworthy findings include the identification of important modules that strongly influence the performance of modCMA, such as the weights option and mirrored modules for low dimensional problems, and the base sampler for high dimensional problems. The large individual influence of the lpsr module makes it very important for the performance of modDE, regardless of the problem dimensionality and the computational budget. When comparing modCMA and modDE as problem dimensionality and computational budget increase, modDE shifts from individual modules being more influential to module combinations being more influential, while modCMA follows the opposite pattern.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
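The variance decomposition behind functional ANOVA, as used in the entry above, can be sketched for the simplest case of two modules evaluated on a full grid of options. This is only an illustration of the idea, not the authors' pipeline: the function name `fanova_two_modules` and the toy performance table are assumptions, and the abstract does not say how the decomposition is estimated in the study.

```python
def fanova_two_modules(perf):
    """Decompose performance variance over a full grid of two module choices.

    perf[i][j] is the (hypothetical) performance with option i of module A
    and option j of module B. Returns the fractions of total variance due to
    A alone, B alone, and the A-B interaction.
    """
    rows, cols = len(perf), len(perf[0])
    mean = sum(sum(r) for r in perf) / (rows * cols)
    # Main effects: marginal means minus the grand mean.
    eff_a = [sum(perf[i]) / cols - mean for i in range(rows)]
    eff_b = [sum(perf[i][j] for i in range(rows)) / rows - mean
             for j in range(cols)]
    var_a = sum(e * e for e in eff_a) / rows
    var_b = sum(e * e for e in eff_b) / cols
    total = sum((perf[i][j] - mean) ** 2
                for i in range(rows) for j in range(cols)) / (rows * cols)
    # Whatever the two main effects do not explain is the interaction.
    var_ab = total - var_a - var_b
    if total == 0:
        return {"A": 0.0, "B": 0.0, "A x B": 0.0}
    return {"A": var_a / total, "B": var_b / total, "A x B": var_ab / total}
```

With a purely additive table such as `[[1, 2], [3, 4]]` the interaction share is zero, while a table such as `[[0, 1], [1, 0]]` is explained entirely by the interaction.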
# No Free Lunch: 教育におけるソフトウェアテストの研究 No Free Lunch: Research Software Testing in Teaching ( http://arxiv.org/abs/2405.11965v1 ) ライセンス: Link先を確認	Michael Dorner, Andreas Bauer, Florian Angermeir,	(参考訳) ソフトウェアは今日、ほとんどの科学的発見の中核にある。そのため、研究成果の質は研究ソフトウェアの品質に大きく依存する。厳格なテストは、業界におけるソフトウェア工学から分かっているように、研究ソフトウェアの品質を保証するだけでなく、学術分野では報われないような相当な努力も必要です。そこで本研究では,研究ソフトウェア教育に組み込んだ研究ソフトウェアテストの効果について検討する。 In-vivo experimentでは、大規模なネットワークシミュレーションのためのテストスイートのエンジニアリングをスウェーデンのブレキング工科大学のソフトウェアテストコースに統合し、この統合が研究ソフトウェアに与える影響を質的に測定した。調査ソフトウェアは、ドキュメントを大幅に改善し、ハードウェアやソフトウェアの依存関係を少なくすることで、統合の恩恵を受けていることが分かりました。しかし、この統合は大変で、学生チームはエレガントで思慮深いテストスイートを開発しました。我々は、テストのような研究ソフトウェア工学を教育に組み込むことは、研究ソフトウェア自体だけでなく、学生にとっても有益であると強く信じているが、次世代の研究は、彼らの教育の一部として研究ソフトウェア工学や最先端の研究に接触するため、学生のコードの知的特性に関する不確実性は、研究ソフトウェア試験を教育に組み込む可能性を大幅に制限する。 Software is at the core of most scientific discoveries today. Therefore, the quality of research results highly depends on the quality of the research software. Rigorous testing, as we know it from software engineering in the industry, could ensure the quality of the research software but it also requires a substantial effort that is often not rewarded in academia. Therefore, this research explores the effects of research software testing integrated into teaching on research software. In an in-vivo experiment, we integrated the engineering of a test suite for a large-scale network simulation as group projects into a course on software testing at the Blekinge Institute of Technology, Sweden, and qualitatively measured the effects of this integration on the research software. We found that the research software benefited from the integration through substantially improved documentation and fewer hardware and software dependencies. 
However, this integration was effortful and although the student teams developed elegant and thoughtful test suites, no code by students went directly into the research software since we were not able to make the integration back into the research software obligatory or even remunerative. Although we strongly believe that integrating research software engineering such as testing into teaching is not only valuable for the research software itself but also for students, the researchers of the next generation, as they get in touch with research software engineering and bleeding-edge research in their field as part of their education, the uncertainty about the intellectual property of students' code substantially limits the potential of integrating research software testing into teaching.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# 多肢選択問題は効率的かつロバストなLLM評価器である Multiple-Choice Questions are Efficient and Robust LLM Evaluators ( http://arxiv.org/abs/2405.11966v1 ) ライセンス: Link先を確認	Ziyin Zhang, Lizhen Xu, Zhaokun Jiang, Hongkun Hao, Rui Wang,	(参考訳) 本稿では、50以上のオープンソースモデルから GSM8K と MATH に対する回答と誤予測を収集して構築した2つの多肢選択(MC)データセット、GSM-MC と MATH-MC を提示する。広範な実験により,これら2つのベンチマークのMCバージョンにおけるLLMの性能は,元のバージョンにおける性能と強く相関し,誤答選択肢の選び方や選択肢の順序に対してかなりロバストであり,評価時間を最大30分の1に短縮できることを示した。同様の手順に従って,HumanEval と MBPP の2つの LLM 評価ベンチマークから構築した新しいプログラム出力予測MCデータセットである PythonIO も導入した。私たちのデータとコードは https://github.com/Geralt-Targaryen/MC-Evaluation で公開されています。 We present GSM-MC and MATH-MC, two multiple-choice (MC) datasets constructed by collecting answers and incorrect predictions on GSM8K and MATH from over 50 open-source models. Through extensive experiments, we show that LLMs' performance on the MC versions of these two popular benchmarks is strongly correlated with their performance on the original versions, and is quite robust to distractor choices and option orders, while the evaluation time is reduced by a factor of up to 30. Following a similar procedure, we also introduce PythonIO, a new program output prediction MC dataset constructed from two other popular LLM evaluation benchmarks HumanEval and MBPP. Our data and code are available at https://github.com/Geralt-Targaryen/MC-Evaluation.	翻訳日:2024-05-21 13:24:44 publicly:2024-05-20
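The construction described in the entry above (a correct answer plus incorrect model predictions used as distractors) might look roughly like the following sketch. The function name `build_mc_item`, its parameters, and the output dictionary layout are hypothetical; the abstract does not specify how distractors are filtered or how many options each item has.

```python
import random

def build_mc_item(question, correct, wrong_predictions, num_options, rng):
    """Build one multiple-choice item from a correct answer and model errors.

    wrong_predictions: incorrect answers collected from models (assumed input).
    rng: a random.Random instance, passed in so the shuffle is reproducible.
    """
    # Deduplicate distractors and drop any that equal the correct answer.
    distractors = []
    for w in wrong_predictions:
        if w != correct and w not in distractors:
            distractors.append(w)
    if len(distractors) < num_options - 1:
        raise ValueError("not enough distinct distractors")
    options = rng.sample(distractors, num_options - 1) + [correct]
    # Shuffle so the position of the correct answer carries no signal.
    rng.shuffle(options)
    return {"question": question, "options": options,
            "answer": options.index(correct)}
```

Passing a seeded `random.Random` makes it straightforward to regenerate the same item with a different option order, which is one way to probe the robustness to option order that the abstract reports.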
# 成人家庭におけるCVDリスクファクターの自己管理を支援するリコメンダアルゴリズム Recommender Algorithm for Supporting Self-Management of CVD Risk Factors in an Adult Population at Home ( http://arxiv.org/abs/2405.11967v1 ) ライセンス: Link先を確認	Tatiana V. Afanasieva, Pavel V. Platov, Anastasia I. Medvedeva,	(参考訳) 推薦アルゴリズムの開発における新しいトレンドの1つは、人口の健康管理を支援する能力の普及である。本稿では,心臓血管疾患 (CVD) の予防効果の改善に焦点をあてる。この問題を解決するため,家庭におけるCVDリスクファクターの自己管理を支援するために,知識に基づく推薦アルゴリズムが提案された。提案アルゴリズムは,従来の多次元レコメンデーションモデルと,CVDの健康状態の予測評価を含む新しいユーザプロファイルモデルに基づく。提案アルゴリズムの主な特徴は,多次元レコメンデーションの説明的構成要素として,ルールベースの論理と多言語モデルを組み合わせることである。提案アルゴリズムの有効性を検証し,提案アルゴリズムの有効性を示した。同様の知識に基づく推薦アルゴリズムとの比較から,提案アルゴリズムはCVDリスク因子の多さを評価し,提案アルゴリズムが生成したレコメンデーションの情報とセマンティックキャパシティを有することを示す。 One of the new trends in the development of recommendation algorithms is the dissemination of their capabilities to support the population in managing their health. This article focuses on the problem of improving the effectiveness of cardiovascular diseases (CVD) prevention, since CVD is the leading cause of death worldwide. To address this issue, a knowledge-based recommendation algorithm was proposed to support self-management of CVD risk factors in adults at home. The proposed algorithm is based on the original multidimensional recommendation model and on a new user profile model, which includes predictive assessments of CVD health in addition to its current ones as outlined in official guidelines. The main feature of the proposed algorithm is the combination of rule-based logic with the capabilities of a large language model in generating human-like text for explanatory component of multidimensional recommendation. The verification and evaluation of the proposed algorithm showed the usefulness of the proposed recommendation algorithm for supporting adults in self-management of their CVD risk factors at home. 
As follows from the comparison with similar knowledge-based recommendation algorithms, the proposed algorithm evaluates a larger number of CVD risk factors and has a greater information and semantic capacity of the generated recommendations.	翻訳日:2024-05-21 13:24:44 公開日:2024-05-20
# グラフニューラルネットワークのための条件シフトにロバストな共形予測 Conditional Shift-Robust Conformal Prediction for Graph Neural Network ( http://arxiv.org/abs/2405.11968v1 ) ライセンス: Link先を確認	S. Akansha,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データの結果を予測する強力なツールとして登場した。有効性にもかかわらず、GNNの重大な欠点は、堅牢な不確実性推定を提供する能力が限られていることであり、エラーが重大な結果をもたらす状況において、信頼性に課題が生じる。さらに、GNNは一般的に、トレーニングデータとテストデータが同じ分布に従うと仮定する分布内設定で優れているが、これは実世界のグラフデータではしばしば満たされない条件である。本稿では、予測モデルの出力を予測セットに変換することで不確実性を定量化する、広く知られた統計手法である共形予測を活用し、グラフベースの半教師あり学習(SSL)において、条件シフト(ソースドメインからターゲットドメインへの条件付き確率分布 $P(\mathrm{label} \mid \mathrm{input})$ の変化)下でのGNN予測の不確実性定量化に対処する。さらに,潜在段階における条件シフトを最小限に抑えて,モデル予測の精細化を目的とした新たな損失関数を提案する。条件シフトロバスト (CondSR) 共形予測と呼ぶ本手法は, モデルに依存せず, 様々な分類モデルに適用可能なアプローチである。提案手法の有効性を標準グラフベンチマークデータセットで検証し,ノード分類タスクにおける最先端のGNNと統合する。コードの実装は、さらなる探索と実験のために公開されています。 Graph Neural Networks (GNNs) have emerged as potent tools for predicting outcomes in graph-structured data. Despite their efficacy, a significant drawback of GNNs lies in their limited ability to provide robust uncertainty estimates, posing challenges to their reliability in contexts where errors carry significant consequences. Moreover, GNNs typically excel in in-distribution settings, assuming that training and test data follow identical distributions: a condition often unmet in real-world graph data scenarios. In this article, we leverage conformal prediction, a widely recognized statistical technique for quantifying uncertainty by transforming predictive model outputs into prediction sets, to address uncertainty quantification in GNN predictions amidst conditional shift (the change in the conditional probability distribution $P(\mathrm{label} \mid \mathrm{input})$ from source domain to target domain) in graph-based semi-supervised learning (SSL). Additionally, we propose a novel loss function aimed at refining model predictions by minimizing conditional shift in latent stages. Termed Conditional Shift Robust (CondSR) conformal prediction for GNNs, our approach is model-agnostic and adaptable to various classification models. 
We validate the effectiveness of our method on standard graph benchmark datasets, integrating it with state-of-the-art GNNs in node classification tasks. The code implementation is publicly available for further exploration and experimentation.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
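The conformal prediction step mentioned in the entry above can be illustrated with a minimal split-conformal sketch for classification. This is not the CondSR method and does not include its loss function. The function names (`conformal_quantile`, `prediction_set`), the nonconformity score (one minus the probability of the true class), and the toy probabilities are all assumptions made for illustration.

```python
import math

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Return the score threshold from a held-out calibration set.

    cal_probs: list of per-class probability lists (hypothetical model outputs).
    cal_labels: list of true class indices.
    alpha: target miscoverage rate, e.g. 0.1 for 90% coverage.
    """
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points: fall back to the trivial threshold.
        return 1.0
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]
```

The coverage guarantee of this recipe relies on calibration and test data being exchangeable, which is the assumption a distribution shift can break. That is the gap the abstract says the proposed approach targets.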
# 大規模言語モデルを用いたテキスト型人物検索のためのデータ拡張 Data Augmentation for Text-based Person Retrieval Using Large Language Models ( http://arxiv.org/abs/2405.11971v1 ) ライセンス: Link先を確認	Zheng Li, Lijia Si, Caili Guo, Yang Yang, Qiushi Cao,	(参考訳) テキストベースのPerson Retrieval (TPR)は、テキストクエリが与えられた記述と一致する人物画像の検索を目的としている。 TPRモデルの性能改善は、教師あり訓練のための高品質なデータに依存している。しかし、高価なアノテーションとプライバシー保護のため、大規模で高品質なTPRデータセットを構築するのは難しい。最近、LLM(Large Language Models)は多くのNLPタスクにおいて人間のパフォーマンスに近づいたり、超えたりして、高品質なTPRデータセットを拡張する可能性を生み出している。本稿では,TPRのためのLLM-DA法を提案する。 LLM-DAはLLMを使用して現在のTPRデータセットのテキストを書き直し、データセットを簡潔かつ効率的に高品質に拡張する。これらの書き直されたテキストは、元のキーコンセプトとセマンティック情報を保持しながら、語彙と文構造の多様性を高めることができる。 LLMの幻覚を軽減するため、LLM-DAは不誠実な書き直しテキストをフィルタリングするテキスト忠実度フィルタ(TFF)を導入した。原文と増補テキストのコントリビューションのバランスをとるために,原文と増補テキストの比率を制御するために,バランスドサンプリング戦略(BSS)を提案する。 LLM-DAは様々なTPRモデルに容易に統合できるプラグアンドプレイ方式である。 3つのTPRベンチマークの総合的な実験により、LLM-DAは現在のTPRモデルの検索性能を向上させることができることが示された。 Text-based Person Retrieval (TPR) aims to retrieve person images that match the description given a text query. The performance improvement of the TPR model relies on high-quality data for supervised training. However, it is difficult to construct a large-scale, high-quality TPR dataset due to expensive annotation and privacy protection. Recently, Large Language Models (LLMs) have approached or even surpassed human performance on many NLP tasks, creating the possibility to expand high-quality TPR datasets. This paper proposes an LLM-based Data Augmentation (LLM-DA) method for TPR. LLM-DA uses LLMs to rewrite the text in the current TPR dataset, achieving high-quality expansion of the dataset concisely and efficiently. These rewritten texts are able to increase the diversity of vocabulary and sentence structure while retaining the original key concepts and semantic information. In order to alleviate the hallucinations of LLMs, LLM-DA introduces a Text Faithfulness Filter (TFF) to filter out unfaithful rewritten text. 
To balance the contributions of original text and augmented text, a Balanced Sampling Strategy (BSS) is proposed to control the proportion of original text and augmented text used for training. LLM-DA is a plug-and-play method that can be easily integrated into various TPR models. Comprehensive experiments on three TPR benchmarks show that LLM-DA can improve the retrieval performance of current TPR models.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
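The idea of controlling the proportion of original and augmented text during training, described in the entry above, could be sketched as a per-sample choice. The function `sample_caption` and the `aug_ratio` parameter are hypothetical names; the abstract does not describe how the Balanced Sampling Strategy is actually implemented, so this is only one plausible reading.

```python
import random

def sample_caption(original, augmented, aug_ratio, rng):
    """Pick the training caption for one image.

    original: the human-written caption.
    augmented: list of rewritten captions (may be empty).
    aug_ratio: probability of using an augmented caption (assumed knob).
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Fall back to the original when no rewrite is available
    # (for instance, if every rewrite was filtered out).
    if augmented and rng.random() < aug_ratio:
        return rng.choice(augmented)
    return original
```

Setting `aug_ratio` to 0 reproduces training on the original dataset only, which gives a simple baseline for comparison.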
# 「ノイズの多いオラクルによる量子探索」への補遺 Addendum to "Quantum Search with Noisy Oracle" ( http://arxiv.org/abs/2405.11973v1 ) ライセンス: Link先を確認	Ansis Rosmanis,	(参考訳) 本稿では、最近の研究(arXiv:2309.14944)のテクニックを一般化し、クエリレジスタのうち既知の1つの量子ビットだけがレート p の脱分極ノイズの影響を受けているとしても、n 要素間の量子探索は O(np) クエリよりも高速に行うことができないことを示す。これは、影響を受ける量子ビットが log(n) 個のインデックス量子ビットの1つである場合にも、ターゲット量子ビットである場合にも成り立つ。 In this note, I generalize the techniques of my recent work (arXiv:2309.14944) and show that, even if just a single known qubit of query registers is affected by the depolarizing noise of rate p, quantum search among n elements cannot be done any faster than in O(np) queries. This holds both when the affected qubit is one of the log(n) index qubits and when it is the target qubit.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# 胸部X線異常検出のための位置ガイド型プロンプト学習 Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays ( http://arxiv.org/abs/2405.11976v1 ) ライセンス: Link先を確認	Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu,	(参考訳) 胸部X線異常検出は重要な課題である。ほとんどの手法は、主に正規像の分布をモデル化し、その後に正規分布からのかなりのずれを異常とみなす。近年,多数の医用画像に基づいて事前トレーニングされたCLIPベースの手法は,ゼロ/フェーショットダウンストリームタスクにおいて顕著な性能を示した。本稿では,CLIPを用いた胸部X線異常検出法の可能性について検討する。そこで本研究では,CLIP事前学習データとタスク固有データとの相違を考慮し,位置誘導型プロンプト学習手法を提案する。具体的には, 胸部X線検査を専門とする専門家が, 個別の肺領域を慎重に検査することによって診断できることに着想を得て, 学習可能な位置誘導テキストと画像のプロンプトを提案し, 課題データを凍結前訓練CLIPモデルに適応させる。モデルの識別能力を高めるために,胸部X線を用いた新しい構造保存異常合成法を提案する。 3つのデータセットに対する大規模な実験により,提案手法は最先端の手法よりも優れていることが示された。実装のコードは https://github.com/sunzc-sunny/PPAD で公開されています。 Anomaly detection in chest X-rays is a critical task. Most methods mainly model the distribution of normal images, and then regard significant deviation from normal distribution as anomaly. Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks. In this paper, we aim to explore the potential of CLIP-based methods for anomaly detection in chest X-rays. Considering the discrepancy between the CLIP pre-training data and the task-specific data, we propose a position-guided prompt learning method. Specifically, inspired by the fact that experts diagnose chest X-rays by carefully examining distinct lung regions, we propose learnable position-guided text and image prompts to adapt the task data to the frozen pre-trained CLIP-based model. To enhance the model's discriminative capability, we propose a novel structure-preserving anomaly synthesis method within chest X-rays during the training process. Extensive experiments on three datasets demonstrate that our proposed method outperforms some state-of-the-art methods. The code of our implementation is available at https://github.com/sunzc-sunny/PPAD.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# GuidedRec: 不良設定の教師なしボリューム復元のガイド GuidedRec: Guiding Ill-Posed Unsupervised Volumetric Recovery ( http://arxiv.org/abs/2405.11977v1 ) ライセンス: Link先を確認	Alexandre Cafaro, Amaury Leroy, Guillaume Beldjoudi, Pauline Maury, Charlotte Robert, Eric Deutsch, Vincent Grégoire, Vincent Lepetit, Nikos Paragios,	(参考訳) 事前に取得された患者の3Dボリュームを利用して、わずか2つの平面投影から3Dボリュームを再構成する新規な教師なしアプローチを提案する。このようなボリュームは、多くの重要な医療処置で容易に利用でき、以前の方法でもすでに使用されていた。このボリュームを投影に合わせて変形させる初期の手法は、投影の数が非常に少ない場合、アライメントが制約不足になるため失敗することが多い。本稿では, 体積構造の生成モデルを用いて変形を制約し, 正確な推定値を得る方法を示す。さらに,本手法は特定のセンサキャリブレーションに縛られず,リトレーニングなしで新しいキャリブレーションに適用できる。我々は、挑戦的なデータセットで本アプローチを評価し、最先端の手法より優れていることを示す。その結果,患者の放射線被曝を劇的に減少させながら,手術や放射線治療などの治療シナリオにおいて本手法を利用できる可能性がある。 We introduce a novel unsupervised approach to reconstructing a 3D volume from only two planar projections that exploits a previously captured 3D volume of the patient. Such volume is readily available in many important medical procedures and previous methods already used such a volume. Earlier methods that work by deforming this volume to match the projections typically fail when the number of projections is very low as the alignment becomes underconstrained. We show how to use a generative model of the volume structures to constrain the deformation and obtain a correct estimate. Moreover, our method is not bound to a specific sensor calibration and can be applied to new calibrations without retraining. We evaluate our approach on a challenging dataset and show it outperforms state-of-the-art methods. As a result, our method could be used in treatment scenarios such as surgery and radiotherapy while drastically reducing patient radiation exposure.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# SM-DTW: シグネチャ検証のための安定性変調動的時間ワープ SM-DTW: Stability Modulated Dynamic Time Warping for signature verification ( http://arxiv.org/abs/2405.11978v1 ) ライセンス: Link先を確認	Antonio Parziale, Moises Diaz, Miguel A. Ferrer, Angelo Marcelli,	(参考訳) 筆者らは,手書き学習と実行の計算モデルにおいて,被験者の署名の複数実行中に実行された実際の動作の違いを説明するための安定性の概念を導入し,署名の最も安定した部分は,署名検証中の疑似署名と参照署名との類似性を評価する上で重要な役割を担わなければならないと推測した。次に、2つのシグネチャ間の最も類似した部分である安定性領域を、シグネチャ検証のための動的時間ワープによって計算された2つのシグネチャ間の距離測定に組み込むための安定性変調動的時間ワープアルゴリズムを導入する。 2つのデータセットで実験を行い、性能評価を行った。実験の結果,提案アルゴリズムはベースラインシステムの性能を向上し,他のトップパフォーマンスシグネチャ検証システムと比較した。 Building upon findings in computational model of handwriting learning and execution, we introduce the concept of stability to explain the difference between the actual movements performed during multiple execution of the subject's signature, and conjecture that the most stable parts of the signature should play a paramount role in evaluating the similarity between a questioned signature and the reference ones during signature verification. We then introduce the Stability Modulated Dynamic Time Warping algorithm for incorporating the stability regions, i.e. the most similar parts between two signatures, into the distance measure between a pair of signatures computed by the Dynamic Time Warping for signature verification. Experiments were conducted on two datasets largely adopted for performance evaluation. Experimental results show that the proposed algorithm improves the performance of the baseline system and compares favourably with other top performing signature verification systems.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
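For reference, the plain Dynamic Time Warping distance that the entry above builds on can be sketched as follows. This is only the baseline alignment cost for one-dimensional sequences; the stability modulation proposed in the paper is not reproduced here, and the function name `dtw_distance` and the use of absolute difference as the local cost are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost between two samples
            # Extend the cheapest of: step in a only, step in b only, or both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

Because the warping path may repeat samples, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which is why DTW suits signatures written at varying speeds.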
# 行動空間における適応的敵対的摂動を用いたロバストな深層強化学習 Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space ( http://arxiv.org/abs/2405.11982v1 ) ライセンス: Link先を確認	Qianmei Liu, Yufei Kuang, Jie Wang,	(参考訳) 深層強化学習(DRL)アルゴリズムは、シミュレーションと実世界の間のモデル化誤差の影響を受けうる。多くの研究は、敵対的学習を用いて訓練過程中に摂動を生成し、この差異をモデル化してDRLの堅牢性を改善する。しかし、これらの手法のほとんどは、敵対的摂動の強度を制御するために固定パラメータを使用し、これは平均性能とロバスト性の間のトレードオフにつながる可能性がある。実際、摂動の最適パラメータを見つけることは困難である。過度な摂動はトレーニングを不安定にしてエージェントの性能を損ない、一方で不十分な摂動はロバスト性を高めるのに十分な情報を与えられないためである。ロバスト性を向上しながらトレーニングを安定させるため,適応的敵対的摂動 (Adaptive Adversarial Perturbation, A2P) という簡易かつ効果的な手法を提案する。本手法は各サンプルに対して適切な敵対的摂動を動的に選択できる。具体的には、トレーニング中の敵対的摂動の影響を調整するための適応的敵対的係数フレームワークを提案する。摂動の現在の強度の指標を設計することにより、現在の相対的性能に基づいて適切な摂動レベルを計算することができる。提案手法の特長は,実世界のアプリケーションに簡単にデプロイでき,シミュレータに事前にアクセスする必要がないことである。 MuJoCoの実験から,本手法はトレーニングの安定性を向上し,異なるテスト環境に移行する際にもロバストなポリシを学習できることが示された。コードはhttps://github.com/Lqm00/A2P-SACで入手できる。 Deep reinforcement learning (DRL) algorithms can suffer from modeling errors between the simulation and the real world. Many studies use adversarial learning to generate perturbations during the training process to model the discrepancy and improve the robustness of DRL. However, most of these approaches use a fixed parameter to control the intensity of the adversarial perturbation, which can lead to a trade-off between average performance and robustness. In fact, finding the optimal parameter of the perturbation is challenging, as excessive perturbations may destabilize training and compromise agent performance, while insufficient perturbations may not impart enough information to enhance robustness. To keep the training stable while improving robustness, we propose a simple but effective method, namely, Adaptive Adversarial Perturbation (A2P), which can dynamically select appropriate adversarial perturbations for each sample. 
Specifically, we propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training. By designing a metric for the current intensity of the perturbation, our method can calculate the suitable perturbation levels based on the current relative performance. The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance. The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments. The code is available at https://github.com/Lqm00/A2P-SAC.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# 仮想チューターとしての大規模言語モデルの利用に関するレビュー A review on the use of large language models as virtual tutors ( http://arxiv.org/abs/2405.11983v1 ) ライセンス: Link先を確認	Silvia García-Méndez, Francisco de Arriba-Pérez, María del Carmen Somoza-López,	(参考訳) トランスフォーマーアーキテクチャは、自然言語処理の長期的な依存関係の管理に寄与する。これらのアーキテクチャは、いくつかの分野や産業で大きな話題を呼んだ、最先端の大規模言語モデル(LLM)の基盤となっており、中でも教育分野が際立っている。したがって、これらの生成的人工知能ベースのソリューションは、高品質な学習に向けて、教育方法やコンテンツ、ネットワークインフラストラクチャーにおける技術の変化と進化を導いてきた。 LLMの人気を踏まえて,本研究は,教育教材の生成・評価に特化して設計され、その設計や実験計画に学生や教員が関与するソリューションの包括的な概要を提供するものである。我々の知る限りでは、LLMの教育応用(例えば、学生評価)に関する最初のレビューとなる。予想通り、これらのシステムの最も一般的な役割は、自動質問生成のための仮想チューターである。さらに、最も人気のあるモデルはGPT-3とBERTである。しかし、新しい生成モデルの継続的なローンチにより、まもなく新しい作品が公開される予定である。 Transformer architectures contribute to managing long-term dependencies for Natural Language Processing, representing one of the most recent changes in the field. These architectures are the basis of the innovative, cutting-edge Large Language Models (LLMs) that have produced a huge buzz in several fields and industrial sectors, among which education stands out. Accordingly, these generative Artificial Intelligence-based solutions have directed the change in techniques and the evolution in educational methods and contents, along with network infrastructure, towards high-quality learning. Given the popularity of LLMs, this review seeks to provide a comprehensive overview of those solutions designed specifically to generate and evaluate educational materials and which involve students and teachers in their design or experimental plan. To the best of our knowledge, this is the first review of educational applications (e.g., student assessment) of LLMs. As expected, the most common role of these systems is as virtual tutors for automatic question generation. Moreover, the most popular models are GPT-3 and BERT. However, due to the continuous launch of new generative models, new works are expected to be published shortly.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# MTVQA:多言語テキスト中心ビジュアル質問応答のベンチマーク MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering ( http://arxiv.org/abs/2405.11985v1 ) ライセンス: Link先を確認	Jingqun Tang, Qi Liu, Yongjie Ye, Jinghui Lu, Shu Wei, Chunhui Lin, Wanqing Li, Mohamad Fitri Faiz Bin Mahmood, Hao Feng, Zhen Zhao, Yanjie Wang, Yuliang Liu, Hao Liu, Xiang Bai, Can Huang,	(参考訳) Text-Centric Visual Question Answering (TEC-VQA)は、テキスト中心の視覚環境における人間と機械の相互作用を促進するだけでなく、テキスト中心のシーン理解の領域におけるAIモデルを評価するデファクトゴールドプロキシとしても機能する。しかしながら、ほとんどのTEC-VQAベンチマークは、英語や中国語のような高リソース言語に焦点を当てている。翻訳エンジンを用いた非テキスト中心のVQAデータセットにおける多言語QAペアの拡張という先駆的な取り組みにもかかわらず、TEC-VQAに適用した場合、翻訳ベースのプロトコルは「視覚的テキストの誤り」という重大な問題に遭遇する。具体的には、画像に存在する視覚的テキストを無視しながら、質問対のテキストを優先する。さらに、ニュアンス付き意味、文脈歪み、言語バイアス、質問型多様性に関連する問題に適切に対処することができない。本研究では、多言語TEC-VQAの課題に対処し、MTVQAと呼ばれる9つの言語で高品質な人間専門家アノテーションをベンチマークする。我々の知る限り、MTVQAはテキスト中心のシナリオに対する人間の専門家アノテーションを提供する最初の多言語TEC-VQAベンチマークである。さらに、我々のMTVQAデータセット上で、GPT-4Vを含む最先端のMLLM(Multimodal Large Language Models)を評価することにより、我々のデータセットの価値を裏付けるパフォーマンス改善の余地がまだ残っていることが明らかである。このデータセットが、コミュニティ内で新たな視点とインスピレーションを提供することを期待しています。 MTVQAデータセットはhttps://huggingface.co/datasets/ByteDance/MTVQAで提供される。 Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric scene understanding. However, most TEC-VQA benchmarks have focused on high-resource languages like English and Chinese. Despite pioneering works to expand multilingual QA pairs in non-text-centric VQA datasets using translation engines, the translation-based protocol encounters a substantial ``Visual-textual misalignment'' problem when applied to TEC-VQA. Specifically, it prioritizes the text in question-answer pairs while disregarding the visual text present in images. 
Furthermore, it does not adequately tackle challenges related to nuanced meaning, contextual distortion, language bias, and question-type diversity. In this work, we address the task of multilingual TEC-VQA and provide a benchmark with high-quality human expert annotations in 9 diverse languages, called MTVQA. To our knowledge, MTVQA is the first multilingual TEC-VQA benchmark to provide human expert annotations for text-centric scenarios. Further, by evaluating several state-of-the-art Multimodal Large Language Models (MLLMs), including GPT-4V, on our MTVQA dataset, it is evident that there is still room for performance improvement, underscoring the value of our dataset. We hope this dataset will provide researchers with fresh perspectives and inspiration within the community. The MTVQA dataset will be available at https://huggingface.co/datasets/ByteDance/MTVQA.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# 分離論理、計算独立性、擬似性について(拡張版) On Separation Logic, Computational Independence, and Pseudorandomness (Extended Version) ( http://arxiv.org/abs/2405.11987v1 ) ライセンス: Link先を確認	Ugo Dal Lago, Davide Davoli, Bruce M. Kapron,	(参考訳) 分離論理は、動的データ構造を扱うプログラムの検証に、多数の実りある応用があることが証明された構造論理である。近年、Barthe, Hsu と Liao は分離論理式に意味論を与える新しい方法を提案している。 2つの事象が独立であることと、合同確率が2つの事象の確率の積であることは同値である。独立性の弱い概念に関する文献は、本質的には計算的であり、すなわち、独立性は効率的な敵に対してのみ成り立ち、成功の無視可能な確率を変調する。本研究の目的は、上記の分離論理の進歩の観点から、暗号シナリオにおける計算独立の性質を探ることである。一方,分離論理のセマンティクスは,複雑性境界を考慮に入れれば適用可能であること,一方,得られた論理系は,敵が隠されたままの標準暗号結果の単純かつコンパクトな証明を書くのに有用であることを示す。注目すべきなのは、これは独立性と疑似ランダム性の間の実りある相互作用を可能にすることだ。 Separation logic is a substructural logic which has proved to have numerous and fruitful applications to the verification of programs working on dynamic data structures. Recently, Barthe, Hsu and Liao have proposed a new way of giving semantics to separation logic formulas in which separating conjunction is interpreted in terms of probabilistic independence. The latter is taken in its exact form, i.e., two events are independent if and only if the joint probability is the product of the probabilities of the two events. There is indeed a literature on weaker notions of independence which are computational in nature, i.e. independence holds only against efficient adversaries and modulo a negligible probability of success. The aim of this work is to explore the nature of computational independence in a cryptographic scenario, in view of the aforementioned advances in separation logic. We show on the one hand that the semantics of separation logic can be adapted so as to account for complexity bounded adversaries, and on the other hand that the obtained logical system is useful for writing simple and compact proofs of standard cryptographic results in which the adversary remains hidden. Remarkably, this allows for a fruitful interplay between independence and pseudorandomness, itself a crucial notion in cryptography.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# 全国規模の電気通信インフラにおけるコヒーレント量子通信 Coherent Quantum Communications Across National Scale Telecommunication Infrastructure ( http://arxiv.org/abs/2405.11990v1 ) ライセンス: Link先を確認	Mirko Pittaluga, Yuen San Lo, Adam Brzosko, Robert I. Woodward, Matthew S. Winnel, Thomas Roger, James F. Dynes, Kim A. Owen, Sergio Juarez, Piotr Rydlichowski, Domenico Vicinanza, Guy Roberts, Andrew J. Shields,	(参考訳) 量子通信は、重畳や絡み合いのような量子現象を利用して、リモートノード間の情報伝達を強化する。位相ベースの量子インターネットアーキテクチャに不可欠なコヒーレント量子通信は、ノード間の光コヒーレンスを必要とし、通常は単一光子干渉を伴う。光コヒーレンス保存や高度な単一光子検出器の統合といった課題は、既存の通信ネットワークへの展開を妨げている。本研究は、フランクフルトとケールの間の商用通信インフラにおける最初の成功例となる、コヒーレント量子通信を支えるアーキテクチャと技術に対する革新的なアプローチを紹介する。ツインフィールド量子鍵分配プロトコルを用いて, 110bit/sの暗号鍵分布を254km以上で達成した。本システムは、測定デバイス非依存特性と非低温冷却検出器を備え、通信インフラにおける最初の効果的な量子リピータ実装であり、これまでで最長の実用的な量子鍵配置であり、位相ベースの量子インターネットアーキテクチャの実現可能性を検証する。 Quantum communications harness quantum phenomena like superposition and entanglement to enhance information transfer between remote nodes. Coherent quantum communications, essential for phase-based quantum internet architecture, require optical coherence among nodes and typically involve single-photon interference. Challenges like preserving optical coherence and integrating advanced single-photon detectors have impeded their deployment in existing telecommunication networks. This study introduces innovative approaches to the architecture and techniques supporting coherent quantum communications, marking their first successful integration within a commercial telecom infrastructure between Frankfurt and Kehl, Germany. Employing the Twin Field Quantum Key Distribution protocol, we achieved encryption key distribution at 110 bit/s over 254 km. 
This system features measurement-device-independent properties and non-cryogenically cooled detectors. It represents the first effective quantum repeater implementation on telecom infrastructure and the longest practical quantum key distribution deployment to date, and it validates the feasibility of a phase-based quantum internet architecture.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# GGAvatar:ガウスの頭部アバターの幾何学的調整 GGAvatar: Geometric Adjustment of Gaussian Head Avatar ( http://arxiv.org/abs/2405.11993v1 ) ライセンス: Link先を確認	Xinyang Li, Jiaxin Wang, Yixin Xuan, Gongxin Yao, Yu Pan,	(参考訳) 複雑な形状と変形を持つ動的頭部アバターを頑健にモデル化する新しい3次元アバター表現であるGGAvatarを提案する。 GGAvatarは、中性ガウス初期化モジュール(英語版)と幾何モルフ調整器(英語版)の2つのコアモジュールを特徴とする粗大な構造を採用している。ニュートラルガウス初期化モジュールは、変形可能な三角形メッシュを持つガウス原始体をペアとし、適応密度制御戦略を用いて、対象の幾何学的構造を中性表現でモデル化する。幾何学Morph Adjusterは、大域空間における各ガウスの変形基底を導入し、線形ブレンドスキニング公式の限界に効果的に対処するために、変形挙動の微細な低次元表現を生成する。広汎な実験により、GGAvatarは高忠実なレンダリングを生成でき、視覚的品質と定量的な測定において最先端の手法より優れていることが示されている。 We propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an adaptive density control strategy to model the geometric structure of the target subject with neutral expressions. Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained low-dimensional representations of deformation behaviors to address the Linear Blend Skinning formula's limitations effectively. Extensive experiments show that GGAvatar can produce high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# Scrutinize What We Ignore: Reining Task Representation Shift in Context-based Offline Meta Reinforcement Learning Scrutinize What We Ignore: Reining Task Representation Shift In Context-Based Offline Meta Reinforcement Learning ( http://arxiv.org/abs/2405.12001v1 ) ライセンス: Link先を確認	Hai Zhang, Boyuan Zheng, Anqi Guo, Tianying Ji, Pheng-Ann Heng, Junqiao Zhao, Lanqing Li,	(参考訳) オフラインメタ強化学習(OMRL)は,事前収集データとメタラーニング技術を活用することにより,インタラクション回避と強力な一般化性能を実現するための有望なアプローチとして登場した。従来の文脈に基づくアプローチは、主にタスクとタスク表現の間の相互情報量($I(Z;M)$)を最大化する直感に依存している。魅力的な結果を得たにも拘わらず、そのような直観に対する性能改善の理論的正当性は欠如している。モデルベースRLフィールドの戻り値の不一致スキームにより、$I(Z;M)$を最大化することは、最適なタスク表現に基づいて与えられたポリシー条件に対する期待値の低い境界を一貫して引き上げることと解釈できる。しかし、この最適化プロセスは2つの連続更新間のタスク表現シフトを無視しており、性能改善の崩壊につながる可能性がある。この問題に対処するため,タスク表現のシフトの影響を明示的に考慮するために,パフォーマンス差の枠組みを用いる。本研究では,タスク表現のシフトを抑えることで,単調な性能向上を実現し,従来の手法に対する優位性を示す。本手法を実用化するために, バックボーンと比較して1行のコードを追加するだけで, 容易にかつ高効率なRETROアルゴリズムを設計する。実験結果から,MuJoCoベンチマークとMetaWorldベンチマークにおいて,SOTA(State-of-the-art)の漸近的パフォーマンス,トレーニング安定性,トレーニング時間消費が検証された。 Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominantly rely on the intuition that maximizing the mutual information between the task and the task representation ($I(Z;M)$) can lead to performance improvements. Despite achieving attractive results, the theoretical justification of performance improvement for such intuition has been lacking. Motivated by the return discrepancy scheme in the model-based RL field, we find that maximizing $I(Z;M)$ can be interpreted as consistently raising the lower bound of the expected return for a given policy conditioning on the optimal task representation. However, this optimization process ignores the task representation shift between two consecutive updates, which may lead to performance improvement collapse. 
To address this problem, we use the performance difference bound framework to account explicitly for the impact of task representation shift. We demonstrate that by reining in the task representation shift, it is possible to achieve monotonic performance improvements, an advantage over previous approaches. To make it practical, we design an easy yet highly effective algorithm, RETRO (REining Task Representation shift in context-based Offline meta reinforcement learning), which adds only one line of code to the backbone. Empirical results validate its state-of-the-art (SOTA) asymptotic performance, training stability, and training-time consumption on the MuJoCo and MetaWorld benchmarks.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# Mamba-in-Mamba:Tokenized Mamba Modelにおけるハイパースペクトル画像分類のための集中型Mamba-Cross-Scan Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification ( http://arxiv.org/abs/2405.12003v1 ) ライセンス: Link先を確認	Weilian Zhou, Sei-Ichiro Kamata, Haipeng Wang, Man-Sing Wong, Huiying, Hou,	(参考訳) ハイパースペクトル画像(HSI)分類は、リモートセンシング(RS)分野、特に深層学習技術の進歩において重要である。 RNN(Recurrent Neural Networks)やTransformers(Transformers)といった自然言語処理(NLP)の分野に適応したシーケンスモデルは、このタスクに特化しており、ユニークな視点を提供している。しかし、いくつかの課題が続いている。 1)RNNは中心的特徴集約に苦慮し,画素干渉に敏感である。 2)変換器は重要な計算資源を必要とし、しばしば限られたHSIトレーニングサンプルで性能が低下する。 3) 画像からシーケンスデータに変換する現在の走査法は, 単純かつ非効率である。そこで本研究では,HSI分類のための新しいMamba-in-Mamba(MiM)アーキテクチャを導入する。 MiM モデルには 1) 画像からシーケンスデータへ変換する新しい集中型マンバ・クロススキャン(MCS)機構 2)ガウス式Decay Mask(GDM)、STL(Semantic Token Learner)、STF(Semantic Token Fuser)を内蔵したT-Mambaエンコーダ 3) 重み付きMCSフュージョン(WMF)モジュールとマルチスケールロスデザインを組み合わせることで復号効率を向上する。固定および非結合型トレーニング-テストサンプルを用いた3つの公開HSIデータセットによる実験結果から,本手法は既存のベースラインや最先端アプローチよりも優れ,HSIアプリケーションの有効性と可能性を強調した。 Hyperspectral image (HSI) classification is pivotal in the remote sensing (RS) field, particularly with the advancement of deep learning techniques. Sequential models, adapted from the natural language processing (NLP) field such as Recurrent Neural Networks (RNNs) and Transformers, have been tailored to this task, offering a unique viewpoint. However, several challenges persist 1) RNNs struggle with centric feature aggregation and are sensitive to interfering pixels, 2) Transformers require significant computational resources and often underperform with limited HSI training samples, and 3) Current scanning methods for converting images into sequence-data are simplistic and inefficient. In response, this study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt of deploying State Space Model (SSM) in this task. 
The MiM model includes 1) A novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence-data, 2) A Tokenized Mamba (T-Mamba) encoder that incorporates a Gaussian Decay Mask (GDM), a Semantic Token Learner (STL), and a Semantic Token Fuser (STF) for enhanced feature generation and concentration, and 3) A Weighted MCS Fusion (WMF) module coupled with a Multi-Scale Loss Design to improve decoding efficiency. Experimental results from three public HSI datasets with fixed and disjoint training-testing samples demonstrate that our method outperforms existing baselines and state-of-the-art approaches, highlighting its efficacy and potential in HSI applications.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# 構造光系におけるニューラルサイン付き距離場を用いた奥行き再構成 Depth Reconstruction with Neural Signed Distance Fields in Structured Light Systems ( http://arxiv.org/abs/2405.12006v1 ) ライセンス: Link先を確認	Rukun Qiao, Hiroshi Kawasaki, Hongbin Zha,	(参考訳) 本稿では,3次元空間の暗黙的表現を用いた多フレーム構造光設定のための新しい深度推定手法を提案する。我々のアプローチでは、自己教師付き微分可能レンダリングによって訓練された、ニューラルサイン付き距離場(SDF)を用いる。放射率と幾何場の合同推定が必要な受動的視覚とは異なり、構造化光系の投射パターンから既知の放射場を利用する。これにより、幾何学分野の分離された最適化が可能となり、固定デバイス位置決めによる収束とネットワークの有効性が確保される。幾何学的忠実度を高めるため、トレーニング中に物体表面に基づく色損失を付加する。実世界の実験では、パターンの可利用性を高めながら、数ショットシナリオにおける幾何学的性能の優位性を実証している。 We introduce a novel depth estimation technique for multi-frame structured light setups using neural implicit representations of 3D space. Our approach employs a neural signed distance field (SDF), trained through self-supervised differentiable rendering. Unlike passive vision, where joint estimation of radiance and geometry fields is necessary, we capitalize on known radiance fields from projected patterns in structured light systems. This enables isolated optimization of the geometry field, ensuring convergence and network efficacy with fixed device positioning. To enhance geometric fidelity, we incorporate an additional color loss based on object surfaces during training. Real-world experiments demonstrate our method's superiority in geometric performance for few-shot scenarios, while achieving comparable results with increased pattern availability.	翻訳日:2024-05-21 13:14:56 公開日:2024-05-20
# コンフォーマル予測による戦略証明オークション Strategy-Proof Auctions through Conformal Prediction ( http://arxiv.org/abs/2405.12016v1 ) ライセンス: Link先を確認	Roy Maor Lotan, Inbal Talgam-Chohen, Yaniv Romano,	(参考訳) 競売は売り手の収益を最大化し、買い手の間で真剣な入札を確保するための鍵である。近年、深層学習に基づく微分経済学として知られるアプローチは、複数の項目や参加者に対して最適な競売メカニズムを学習する上で有望であることを示している。しかし、このアプローチはテスト時に戦略の安全性を保証するものではありません。戦略保護は、買い手が真のバリュエーションの入札にインセンティブを与えられることを保証し、操作のリスクを伴わずに最適かつ公正なオークションの結果をもたらすため、極めて重要である。整合予測に基づいて,厳密な統計的保証で戦略の安全性を実現するための新しいアプローチを導入する。我々の方法の主な特徴は次のとおりである。 (i) 戦略保護の試験時違反の定量化に使用する後悔予測モデルの定式化、及び (ii) 新たなオークションにおいて、データ駆動機構が高い確率(例:99%)で戦略保護要件を満たすことを保証するために、予測された後悔を利用するオークション受理規則。数値実験により,厳密な保証の必要性,理論結果の有効性,提案手法の適用性が確認された。 Auctions are key for maximizing sellers' revenue and ensuring truthful bidding among buyers. Recently, an approach known as differentiable economics based on deep learning shows promise in learning optimal auction mechanisms for multiple items and participants. However, this approach has no guarantee of strategy-proofness at test time. Strategy-proofness is crucial as it ensures that buyers are incentivized to bid their true valuations, leading to optimal and fair auction outcomes without the risk of manipulation. Building on conformal prediction, we introduce a novel approach to achieve strategy-proofness with rigorous statistical guarantees. The key novelties of our method are: (i) the formulation of a regret prediction model, used to quantify at test time violations of strategy-proofness; and (ii) an auction acceptance rule that leverages the predicted regret to ensure that for a new auction, the data-driven mechanism meets the strategy-proofness requirement with high probability (e.g., 99%). Numerical experiments demonstrate the necessity for rigorous guarantees, the validity of our theoretical results, and the applicability of our proposed method.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# 教師なし事前学習による適応コンバータを用いた連続手話認識 Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining ( http://arxiv.org/abs/2405.12018v1 ) ライセンス: Link先を確認	Neena Aloysius, Geetha M, Prema Nedungadi,	(参考訳) 連続手話認識(CSLR)のための従来のディープラーニングフレームワークは、単一またはマルチモーダルな特徴抽出器、シーケンスラーニングモジュール、グロースを出力するデコーダから構成される。シークエンス学習モジュールは、シークエンス・ツー・シークエンス・タスクにおいて、トランスフォーマーがそれらの有効性を実証する重要な部分である。自然言語処理と音声認識の分野における研究の進展を分析した結果,様々な変圧器の迅速な導入が観察された。しかし、手話の領域では、シーケンス学習コンポーネントでの実験は限られている。本研究では,音声認識のための最先端コンフォーマモデルをCSLRに適用し,提案手法をConSignformerと呼ぶ。これは、視覚ベースのタスクにConformerを使用した最初の例である。 ConSignformerは、CNNのバイモーダルパイプラインを特徴抽出器として、そしてシーケンス学習のためのConformerを備えている。文脈学習を改善するために、Cross-Modal Relative Attention (CMRA)を導入します。 CMRAをモデルに組み込むことで、データ内の複雑な関係を学習し活用しやすくなります。コンフォーマーモデルをさらに強化するため、訓練済み手話データセット上でRegressional Feature extractと呼ばれる教師なし事前訓練を行う。事前訓練されたコンフォーマーは、下流認識タスクのために微調整される。実験により, 事前学習戦略の有効性を確認し, CMRAが認識プロセスにどう貢献するかを実証した。注目すべきは、Conformerベースのバックボーンを活用することで、ベンチマークデータセット(PHOENIX-2014とPHOENIX-2014T)で最先端のパフォーマンスを実現することです。 Conventional Deep Learning frameworks for continuous sign language recognition (CSLR) are comprised of a single or multi-modal feature extractor, a sequence-learning module, and a decoder for outputting the glosses. The sequence learning module is a crucial part wherein transformers have demonstrated their efficacy in the sequence-to-sequence tasks. Analyzing the research progress in the field of Natural Language Processing and Speech Recognition, a rapid introduction of various transformer variants is observed. However, in the realm of sign language, experimentation in the sequence learning component is limited. In this work, the state-of-the-art Conformer model for Speech Recognition is adapted for CSLR and the proposed model is termed ConSignformer. This marks the first instance of employing Conformer for a vision-based task. ConSignformer has bimodal pipeline of CNN as feature extractor and Conformer for sequence learning. 
For improved context learning we also introduce Cross-Modal Relative Attention (CMRA). By incorporating CMRA into the model, it becomes more adept at learning and utilizing complex relationships within the data. To further enhance the Conformer model, unsupervised pretraining called Regressional Feature Extraction is conducted on a curated sign language dataset. The pretrained Conformer is then fine-tuned for the downstream recognition task. The experimental results confirm the effectiveness of the adopted pretraining strategy and demonstrate how CMRA contributes to the recognition process. Remarkably, leveraging a Conformer-based backbone, our model achieves state-of-the-art performance on the benchmark datasets: PHOENIX-2014 and PHOENIX-2014T.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# AIは共感できるか:メンタルヘルス支援のための大規模言語モデル応答の検証 Can AI Relate: Testing Large Language Model Response for Mental Health Support ( http://arxiv.org/abs/2405.12021v1 ) ライセンス: Link先を確認	Saadia Gabriel, Isha Puri, Xuhai Xu, Matteo Malgaroli, Marzyeh Ghassemi,	(参考訳) 大規模言語モデル(LLM)はすでにNYU Langone、Dana-Farber、NHSなどの病院システムで臨床使用のために試験されている。提案されたデプロイのユースケースは精神療法であり、LLMを利用したチャットボットが精神的危機にある患者を治療することができる。メンタルヘルス対応のためのLLMの展開は、精神療法へのアクセスを仮説的に拡大し、ケアをパーソナライズするための新たな可能性を提供する可能性がある。しかし、Tessaチャットボットが摂食障害患者に有害なダイエットのアドバイスを提供したなど、近年の顕著な失敗は、その信頼性に疑問を呈している。本研究では, LLM 応答がメンタルヘルス治療の自動化に向けた, 有効かつ倫理的な道筋であるか否かを判断するための評価枠組みを開発する。訓練を受けた臨床医による人的評価と、心理学研究に根ざしたケア品質の自動測定値を用いて, ピアツーピア対応者の回答を, 最先端のLLMによる回答と比較した。 GPT-4のようなLLMは、暗黙的かつ明示的な手がかりを用いて、人種のような患者の属性を推測する。黒人の投稿者に対する応答は、他のどの集団よりも共感が低い(対照群より2%-13%低い)。期待が持てることに、応答が生成される方法が応答の質に大きく影響していることが分かる。最後に、精神保健対応のためのLLMの配置に関する安全ガイドラインを提案する。 Large language models (LLMs) are already being piloted for clinical use in hospital systems like NYU Langone, Dana-Farber and the NHS. A proposed deployment use case is psychotherapy, where an LLM-powered chatbot can treat a patient undergoing a mental health crisis. Deployment of LLMs for mental health response could hypothetically broaden access to psychotherapy and provide new possibilities for personalizing care. However, recent high-profile failures, like damaging dieting advice offered by the Tessa chatbot to patients with eating disorders, have led to doubt about their reliability in high-stakes and safety-critical settings. In this work, we develop an evaluation framework for determining whether LLM response is a viable and ethical path forward for the automation of mental health treatment. Using human evaluation with trained clinicians and automatic quality-of-care metrics grounded in psychology research, we compare the responses provided by peer-to-peer responders to those provided by a state-of-the-art LLM. We show that LLMs like GPT-4 use implicit and explicit cues to infer patient demographics like race. 
We then show that there are statistically significant discrepancies between patient subgroups: Responses to Black posters consistently have lower empathy than for any other demographic group (2%-13% lower than the control group). Promisingly, we do find that the manner in which responses are generated significantly impacts the quality of the response. We conclude by proposing safety guidelines for the potential deployment of LLMs for mental health response.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# ガウス系の衝突熱測定 Collisional thermometry for Gaussian systems ( http://arxiv.org/abs/2405.12030v1 ) ライセンス: Link先を確認	Gabriel O. Alves, Marcelo A. F. Santos, Gabriel T. Landi,	(参考訳) ガウス系を用いた衝突モデルに基づく量子温度測定スキームについて検討する。これらのスキームの鍵となる疑問は、量子フィッシャー情報(QFI)のスケーリングとアンシラの数に関するものである。量子ビットベースの実装では、ヒルベルト空間が指数関数的に大きくなるため、この問題は評価が難しい。ここでは、QFIのスケーリングを任意の大きさで評価できるガウス衝突モデルに焦点を当てる。この数値的柔軟性により、幅広い構成のモデルの熱メトリック特性を探索することができる。モデルの確率過程のマルコフ順序は無限であるが,QFIの挙動を簡易に解析し,漸近的なフィッシャー情報密度を推定し,アンシラ数の増加に対する相関の過渡効果がモデルの物理的パラメータにどのように依存するかを考察した。 We investigate a collision-model-based quantum thermometry scheme with Gaussian systems. A key open question of these schemes concerns the scaling of the Quantum Fisher Information (QFI) with the number of ancillae. In qubit-based implementations this question is difficult to assess, due to the exponentially growing size of the Hilbert space. Here we focus on Gaussian collision models, which allow for the scaling of the QFI to be evaluated for arbitrarily large sizes. This numerical flexibility enables us to explore the thermometric properties of the model for a wide range of configurations. Despite the infinite Markov order of the stochastic process of the model, we provide a simple phenomenological analysis for the behavior of the QFI, estimating the asymptotic Fisher information density and how the transient effects of correlations for an increasing number of ancillae depend on the physical parameters of the model.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# KG-RAG:知識と創造性のギャップを埋める KG-RAG: Bridging the Gap Between Knowledge and Creativity ( http://arxiv.org/abs/2405.12035v1 ) ライセンス: Link先を確認	Diego Sanmartin,	(参考訳) 大規模言語モデルエージェント(LMAs)の創造的能力を維持しつつ、事実の正確性を確保することは、インテリジェントエージェントシステムの開発において大きな課題となる。 LMAは、情報幻覚、破滅的な忘れ、知識集約的なタスクを扱う場合の長いコンテキストの処理の制限など、一般的な問題に直面している。本稿では、構造化知識グラフ(KG)とLLMの機能を統合することで、LMAの知識能力を向上する新しいフレームワークであるKG-RAG(Knowledge Graph-Retrieval Augmented Generation)パイプラインを紹介する。 KG-RAGパイプラインは、構造化されていないテキストからKGを構築し、新たに作成されたグラフ上で情報検索を行い、KGQAを実行する(知識グラフ質問回答)。探索の連鎖 (Chain of Explorations, CoE) と呼ばれる新しいアルゴリズムは、KG内のノードや関係をシーケンシャルに探索するLLMの推論の恩恵を受けている。 ComplexWebQuestionsデータセットに関する予備実験では、幻覚的コンテンツの削減に顕著な改善が示され、知識集約的なタスクに対処できるインテリジェントなシステム開発への有望な道のりが示唆されている。 Ensuring factual accuracy while maintaining the creative capabilities of Large Language Model Agents (LMAs) poses significant challenges in the development of intelligent agent systems. LMAs face prevalent issues such as information hallucinations, catastrophic forgetting, and limitations in processing long contexts when dealing with knowledge-intensive tasks. This paper introduces a KG-RAG (Knowledge Graph-Retrieval Augmented Generation) pipeline, a novel framework designed to enhance the knowledge capabilities of LMAs by integrating structured Knowledge Graphs (KGs) with the functionalities of LLMs, thereby significantly reducing the reliance on the latent knowledge of LLMs. The KG-RAG pipeline constructs a KG from unstructured text and then performs information retrieval over the newly created graph to perform KGQA (Knowledge Graph Question Answering). The retrieval methodology leverages a novel algorithm called Chain of Explorations (CoE) which benefits from LLMs reasoning to explore nodes and relationships within the KG sequentially. 
Preliminary experiments on the ComplexWebQuestions dataset demonstrate notable improvements in the reduction of hallucinated content and suggest a promising path toward developing intelligent systems adept at handling knowledge-intensive tasks.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# 多変量長周期時系列予測のための適応抽出ネットワーク Adaptive Extraction Network for Multivariate Long Sequence Time-Series Forecasting ( http://arxiv.org/abs/2405.12038v1 ) ライセンス: Link先を確認	Dandan Zhang, Yun Wang,	(参考訳) CNNアーキテクチャを用いたモデルは、多変量長時系列時系列予測(MLSTF)において、特に局所時系列特性のモデル化において大きな進歩を遂げている。しかし、MLSTFプロセスでは、グローバル時系列パターンを抽出し、異なる変数間の相関を理解することが非常に重要である。この課題に対処するために、マルチレゾリューション畳み込みと変形可能な畳み込み演算を導入する。異なる拡張因子を持つ畳み込みカーネルを用いて受容領域を拡大し、異なる解像度で時間的相関情報を捕捉し、追加のオフセットベクトルを介してサンプリング位置を適応的に調整することにより、変数間の相関特性を捕捉するネットワークの能力を高める。そこで我々は,ATVCNetを提案する。ATVCNetは,多変量時系列の局所的・大域的な時間的依存関係と変数間依存関係を効果的にモデル化するための適応的時変畳み込みネットワークである。具体的には、時系列の特徴を異なる解像度で抽出し、融合させ、時系列における局所的な文脈情報とグローバルなパターンの両方をキャプチャする。設計された可変特徴量適応抽出モジュールは、時系列内の異なる変数間の相関をキャプチャする。実世界の8つのデータセットを対象としたATVCNetの性能評価を行った。その結果、ATVCNetは最先端のMLSTFモデルに対して約63.4%の性能向上を達成したことが示唆された。 Models employing CNN architecture have made significant progress in multivariate long sequence time-series forecasting (MLSTF), particularly in modeling local time series characteristics. However, during the MLSTF process, extracting the global time series patterns and understanding the correlations among different variables are highly significant. To address this challenge, we introduce multi-resolution convolution and deformable convolution operations. By enlarging the receptive field using convolution kernels with different dilation factors to capture temporal correlation information across different resolutions, and adaptively adjusting the sampling positions through additional offset vectors, we enhance the network's ability to capture correlated features between variables. Building upon this, we propose ATVCNet, an adaptive temporal-variable convolutional network designed to effectively model the local/global temporal dependencies and inter-variable dependencies of multivariate time series. Specifically, extracting and fusing time series features at different resolutions captures both local contextual information and global patterns in the time series. 
The designed inter-variable feature adaptive extraction module captures the correlation among different variables in the time series. We evaluated the performance of ATVCNet across eight real-world datasets. The results indicate that ATVCNet achieved a performance improvement of approximately 63.4% over state-of-the-art MLSTF models.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# リーマン多様体上のランダム化グラディエント線:量子最適化における大域最小値へのほぼ確実に収束 Randomized Gradient Descents on Riemannian Manifolds: Almost Sure Convergence to Global Minima in and beyond Quantum Optimization ( http://arxiv.org/abs/2405.12039v1 ) ライセンス: Link先を確認	Emanuel Malvetti, Christian Arenz, Gunther Dirr, Thomas Schulte-Herbrüggen,	(参考訳) リーマン多様体上の勾配降下アルゴリズムの収束特性を解析する。モース・ボット型(Morse-Bott型)のスムーズなコスト関数を最小化するために,リーマン勾配流の接空間方向のランダム化を検討した。ハール測度に従ってリーマン勾配をランダムに射影することにより、サドル点が存在するにもかかわらず、局所最適への収束はほぼ確実に得られることを証明した。応用として、一元群上の量子最適化による基底状態の準備を考える。この設定では、量子コンピュータにユニタリ2設計を実装することで、ハールランダム射影を効率的に近似することができる。それぞれのアルゴリズムが、所望のハミルトニアン基底状態に対応する大域最小値にほぼ確実に収束していることを証明する。最後に,簡単な2次元設定でサドル点を通過させるのに必要な時間について考察する。 We analyze the convergence properties of gradient descent algorithms on Riemannian manifolds. We study randomization of the tangent space directions of Riemannian gradient flows for minimizing smooth cost functions (of Morse--Bott type) to obtain convergence to local optima. We prove that through randomly projecting Riemannian gradients according to the Haar measure, convergence to local optima can be obtained almost surely despite the existence of saddle points. As an application we consider ground state preparation through quantum optimization over the unitary group. In this setting one can efficiently approximate the Haar-random projections by implementing unitary 2-designs on quantum computers. We prove that the respective algorithm almost surely converges to the global minimum that corresponds to the ground state of a desired Hamiltonian. Finally, we discuss the time required by the algorithm to pass a saddle point in a simple two-dimensional setting.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# 分散環境におけるセキュアグループメッセージングにおける属性ベース認証 Attribute-Based Authentication in Secure Group Messaging for Distributed Environments ( http://arxiv.org/abs/2405.12042v1 ) ライセンス: Link先を確認	David Soler, Carlos Dafonte, Manuel Fernández-Veiga, Ana Fernández Vilas, Francisco J. Nóvoa,	(参考訳) Messaging Layer Security(MLS)とその基盤となるContinuous Group Key Agreement(CGKA)プロトコルは、暗号化シークレットを動的に共有することができる。この柔軟性は、分散環境の実装にMLSを理想的なものにしますが、多くの問題を克服する必要があります。特に、グループ内の認証のためのデジタル証明書の使用は、グループメンバーのプライバシーに反する。本研究では, ソリケータが自身のアイデンティティを明らかにする代わりに, グループによって動的に定義された特定の属性の所有を証明し, メンバーとなるための認証方法を提案する。デジタル証明書の代わりに、選択開示を伴う属性ベースのクレデンシャルを用いて、最小限の情報量を明らかにし、攻撃者が複数のグループを介してユーザのアクティビティをリンクするのを防ぐ。我々は,Attribute-Authenticated Continuous Group Key Agreement (AA-CGKA) というCGKAの変種を正式に定義し,要求整合性,偽造不可能性,アンリンク性といった特性に対するセキュリティ証明を提供する。また、MLSに構築を統合するためのガイドラインも提供します。 Messaging Layer Security (MLS) and its underlying Continuous Group Key Agreement (CGKA) protocol allow a group of users to share a cryptographic secret in a dynamic manner, such that the secret is modified on member insertions and deletions. Although this flexibility makes MLS ideal for implementations in distributed environments, a number of issues need to be overcome. Particularly, the use of digital certificates for authentication in a group goes against the group members' privacy. In this work we provide an alternative method of authentication in which the solicitors, instead of revealing their identity, only need to prove possession of certain attributes, dynamically defined by the group, to become a member. Instead of digital certificates, we employ Attribute-Based Credentials accompanied with Selective Disclosure in order to reveal the minimum required amount of information and to prevent attackers from linking the activity of a user through multiple groups. 
We formally define a CGKA variant named Attribute-Authenticated Continuous Group Key Agreement (AA-CGKA) and provide security proofs for its properties of Requirement Integrity, Unforgeability and Unlinkability. We also provide guidelines for an integration of our construction in MLS.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# EUのサイバーセキュリティ政策における不整合性リスク The Incoherency Risk in the EU's New Cyber Security Policies ( http://arxiv.org/abs/2405.12043v1 ) ライセンス: Link先を確認	Jukka Ruohonen,	(参考訳) 欧州連合(EU)は近年,新たなサイバーセキュリティ政策を追求している。本稿では,4つの政策を短時間に評価する。その焦点は、統合の欠如、加盟国間のばらつき、制度上の機能不全など、健全な政策決定によって少なくとも部分的には避けられるべき問題である。評価によると、この4つのポリシーはEUのサイバーセキュリティフレームワークの複雑さを大幅に高めた。さらに、信頼、業界セクターと異なる技術間の相違、官僚的対立、技術的な問題など、潜在的な問題がある。これらの知見により、この論文はEU政策の研究に貢献するだけでなく、サイバーセキュリティ政策全般の理解を深める。 The European Union (EU) has been pursuing new cyber security policies in recent years. This paper presents a short evaluation of four such policies. The focus is on potential incoherency, meaning a lack of integration, divergence between the member states, institutional dysfunction, and other related problems that should be at least partially avoidable by sound policy-making. According to the evaluation, the four policies have substantially increased the complexity of the EU's cyber security framework. In addition, there are potential problems with trust, divergence between industry sectors and different technologies, bureaucratic conflicts, and technical issues, among other things. With these insights, the paper not only contributes to the study of EU policies but also advances the understanding of cyber security policies in general.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# テンソル-ネットワークに基づく変分モンテカルロアプローチによる開量子系の非平衡定常状態 Tensor-network-based variational Monte Carlo approach to the non-equilibrium steady state of open quantum systems ( http://arxiv.org/abs/2405.12044v1 ) ライセンス: Link先を確認	Dawid A. Hryniuk, Marzena H. Szymańska,	(参考訳) 本研究では,行列積演算子アンザッツのモンテカルロ最適化に基づいて,非局所的相互作用を持つ大規模多体開量子系の非平衡定常状態を効率的にシミュレーションする新しい手法を提案する。提案手法は,周期システムの結合次元に対する計算コストのスケーリングの改善など,同等のアルゴリズムよりも優れ,いくつかの利点がある。我々は、最大$N=100$スピンのスピン鎖に対する集合的退化と長距離パワー法則相互作用を持つ散逸的量子イジングモデルの位相図と相関関数を研究することによって、我々のアプローチの汎用性を示す。 We introduce a novel method of efficiently simulating the non-equilibrium steady state of large many-body open quantum systems with highly non-local interactions, based on a variational Monte Carlo optimization of a matrix product operator ansatz. Our approach outperforms and offers several advantages over comparable algorithms, such as an improved scaling of the computational cost with respect to the bond dimension for periodic systems. We showcase the versatility of our approach by studying the phase diagrams and correlation functions of the dissipative quantum Ising model with collective dephasing and long-ranged power law interactions for spin chains of up to $N=100$ spins.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# ストリーミングデータを用いたエネルギー効率の良いフェデレーションエッジ学習:リアプノフ最適化手法 Energy-Efficient Federated Edge Learning with Streaming Data: A Lyapunov Optimization Approach ( http://arxiv.org/abs/2405.12046v1 ) ライセンス: Link先を確認	Chung-Hsuan Hu, Zheng Chen, Erik G. Larsson,	(参考訳) フェデレートラーニング(FL)は、ユーザセンシティブなデータを開示することなく、分散クライアント間で機械学習モデルの効率的なトレーニングを効果的に行うという利点から、近年大きな注目を集めている。具体的には、FEEL(Federated Edge Learning)システムにおいて、無線チャネルの時間変化の性質は、通信プロセスにおいて避けられないシステムダイナミクスを導入し、トレーニングのレイテンシとエネルギー消費に影響を与える。本研究では,新たなトレーニングデータサンプルをエッジデバイスで時間とともにランダムに生成するストリーミングデータシナリオについても検討する。我々のゴールは、長期エネルギー制約下でのデータ到着や資源の可利用性に固有のランダム性に対処する動的スケジューリングと資源割り当てアルゴリズムを開発することである。そこで我々は,確率的ネットワーク最適化問題を定式化し,Lyapunovのドリフト・プラス・ペナルティ・フレームワークを用いて動的資源管理設計を実現する。提案アルゴリズムは, デバイススケジューリング, 計算容量調整, 帯域幅の割り当ておよび各ラウンドの送信電力を適応的に決定する。提案するスケジューリング設計の背景にある理論的根拠を裏付ける,異種データと時間変化目的関数を用いた考察環境の収束解析を行う。本手法の有効性をシミュレーションにより検証し,ベースライン方式と比較して学習性能とエネルギー効率が向上したことを示す。 Federated learning (FL) has received significant attention in recent years for its advantages in efficient training of machine learning models across distributed clients without disclosing user-sensitive data. Specifically, in federated edge learning (FEEL) systems, the time-varying nature of wireless channels introduces inevitable system dynamics in the communication process, thereby affecting training latency and energy consumption. In this work, we further consider a streaming data scenario where new training data samples are randomly generated over time at edge devices. Our goal is to develop a dynamic scheduling and resource allocation algorithm to address the inherent randomness in data arrivals and resource availability under long-term energy constraints. To achieve this, we formulate a stochastic network optimization problem and use the Lyapunov drift-plus-penalty framework to obtain a dynamic resource management design. Our proposed algorithm makes adaptive decisions on device scheduling, computational capacity adjustment, and allocation of bandwidth and transmit power in every round. 
We provide convergence analysis for the considered setting with heterogeneous data and time-varying objective functions, which supports the rationale behind our proposed scheduling design. The effectiveness of our scheme is verified through simulation results, demonstrating improved learning performance and energy efficiency as compared to baseline schemes.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
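As a rough illustration of the Lyapunov drift-plus-penalty idea mentioned in the abstract above, the sketch below uses a single virtual energy queue and a two-action device. It is a generic textbook-style rule, not the authors' scheduling algorithm; the action set, the `penalty` and `energy` values, the budget, and the weight `V` are all assumptions made up for this example.

```python
def drift_plus_penalty_step(queue, actions, budget, V):
    """One round of a generic drift-plus-penalty rule.

    queue   -- virtual energy queue backlog Q(t) (accumulated budget overshoot)
    actions -- candidate actions, each a dict with "penalty" and "energy"
    budget  -- long-term average energy allowed per round
    V       -- trade-off weight: larger V favours low penalty over energy savings
    """
    # Greedily minimise V * penalty + Q(t) * energy for this round only.
    best = min(actions, key=lambda a: V * a["penalty"] + queue * a["energy"])
    # Virtual queue update: Q(t+1) = max(Q(t) + energy - budget, 0).
    new_queue = max(queue + best["energy"] - budget, 0.0)
    return best, new_queue


def simulate(rounds=1000, budget=1.0, V=5.0):
    """Return the average energy per round spent by a hypothetical device."""
    # Hypothetical two-action device: skip the round (no energy, some learning
    # penalty) or participate (energy cost, no penalty).
    actions = [
        {"name": "skip", "penalty": 1.0, "energy": 0.0},
        {"name": "participate", "penalty": 0.0, "energy": 2.0},
    ]
    queue, total_energy = 0.0, 0.0
    for _ in range(rounds):
        chosen, queue = drift_plus_penalty_step(queue, actions, budget, V)
        total_energy += chosen["energy"]
    return total_energy / rounds


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the device participates while the queue is short and skips once the backlog makes energy expensive, so the average energy per round stays close to the budget. Raising `V` lets the queue grow larger before the device starts skipping.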
# K平均アルゴリズムの並列化とビッグデータクラスタリングへの応用 Parallelization of the K-Means Algorithm with Applications to Big Data Clustering ( http://arxiv.org/abs/2405.12052v1 ) ライセンス: Link先を確認	Ashish Srivastava, Mohammed Nawfal,	(参考訳) Lloydのアルゴリズムを使ったK-Meansクラスタリングは、与えられたデータセットをKの異なるクラスタに分割する反復的なアプローチである。このアルゴリズムは以下の目的関数 \[ \min \sum_{i=1}^{n} \| x_i - \mu_{x_i} \|^2 \] に基づいて各点をクラスタに割り当てる。逐次アルゴリズムは、各データ点と各セントロイドとの距離を計算し、データ点を最も近いセントロイドに割り当てる反復ステップからなる。このアプローチは基本的には期待最大化ステップとして知られている。クラスタリングには、各イテレーションにおける距離を計算するための広範な計算が含まれており、その計算量はデータポイントの数が増加するにつれて増加する。これは並列処理の余地を与える。しかし、並列プロセスでは、各スレッドが更新されたセントロイド値にアクセスでき、どのセントロイド値にも競合状態(レースコンディション)が存在しないことを保証する必要がある。このプロジェクトでは2つの異なるアプローチを比較します。最初のアプローチはOpenMPフラット同期方式で、すべてのプロセスが並列に実行され、同期を用いてクラスタの安全な更新を保証する。 2つ目のアプローチは、OpenACCを使ったGPUベースの並列化アプローチです。スピードアップ、効率、データポイント数を変えたときの所要時間、プロセス数といった指標を分析し、2つのアプローチを比較して、得られる相対的な性能向上を把握します。 The K-Means clustering using Lloyd's algorithm is an iterative approach to partition the given dataset into K different clusters. The algorithm assigns each point to the cluster based on the following objective function \[ \min \sum_{i=1}^{n} \| x_i - \mu_{x_i} \|^2 \] The serial algorithm involves iterative steps where we compute the distance of each datapoint from the centroids and assign the datapoint to the nearest centroid. This approach is essentially known as the expectation-maximization step. Clustering involves extensive computations to calculate distances at each iteration, and this cost grows with the number of data points. This provides scope for parallelism. However, we must ensure that in a parallel process, each thread has access to the updated centroid value and that no race condition exists on any centroid value. We will compare two different approaches in this project. The first approach is an OpenMP flat synchronous method where all processes are run in parallel, and we use synchronization to ensure safe updates of clusters. 
The second approach we adopt is a GPU-based parallelization approach using OpenACC, wherein we will try to make use of the GPU architecture to parallelize chunks of the algorithm and observe decreased computation time. We will analyze metrics such as speedup, efficiency, time taken with varying numbers of data points, and number of processes to compare the two approaches and understand the relative performance improvement we can get.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
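The serial Lloyd iteration described in the abstract above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' OpenMP or OpenACC code; the function names and the choice to keep the old centroid when a cluster is empty are assumptions of this sketch.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point.

    Each point is handled independently, which is the part a parallel
    version would distribute across threads.
    """
    labels = []
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Update step: mean of each cluster (an empty cluster keeps its centroid).

    In a parallel version this is a reduction over points, so concurrent
    writes to the same centroid need synchronization.
    """
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids


def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def objective(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    cents, labs = lloyd(pts, [(0, 0), (10, 10)])
    print(cents, labs, objective(pts, cents, labs))
```

The split into `assign` and `update` mirrors where the two concerns in the abstract arise: distance computation is embarrassingly parallel, while centroid accumulation is where race conditions must be prevented.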
# 自然言語処理による感性分析における判断変動に影響を与える要因の解明と統計 Unveiling factors influencing judgment variation in Sentiment Analysis with Natural Language Processing and Statistics ( http://arxiv.org/abs/2405.12055v1 ) ライセンス: Link先を確認	Olga Kellert, Carlos Gómez-Rodríguez, Mahmud Uz Zaman,	(参考訳) TripAdvisorのレビューと同等のデータソースは、自然言語処理(NLP)における多くのタスクにおいて重要な役割を担い、ホテルやレストランのレビューのような主観的判断を正あるいは負の極性に識別し分類するためのデータ基盤を提供する。本研究では,スペイン語のTripAdvisorレビューに焦点をあて,クラウドソースによる極性判断の変動に影響を与える3つの重要な要因について検討した。音声の一部(POS)の役割、「味」のような感情語の影響、および「ok」のような中立語の影響の3つの仮説が検証されている。研究の方法論はワンワードのタイトルを使用し、単語の極性の変化を研究する上での有効性を実証している。平均等式に関する統計的試験は、我々の興味のある単語群に対して行われる。本研究の結果から, 1語タイトルの形容詞は, 他の語型やPOSに比べて, 判断のばらつきが低い傾向にあることが明らかとなった。感性語は、偏極性判断の研究における感情語の重要性を強調し、中性語は、期待するほど高い判断変異に関連付けられる。しかし、これらの効果は長いタイトルで常に再現できないため、長いタイトルは長いタイトルにおける否定のような他の単語による単語の極性への影響から、単一の単語のあいまいさをテストするのに最適なデータソースではないことを示唆している。この経験的調査は、単語の極性変化に影響を与える要因についての貴重な洞察を提供し、スペイン語の極性判断を捉え予測することを目的としたNLP実践者や、判断変動に影響を与える要因を理解することを目的とした研究者の基盤を提供する。 TripAdvisor reviews and comparable data sources play an important role in many tasks in Natural Language Processing (NLP), providing a data basis for the identification and classification of subjective judgments, such as hotel or restaurant reviews, into positive or negative polarities. This study explores three important factors influencing variation in crowdsourced polarity judgments, focusing on TripAdvisor reviews in Spanish. Three hypotheses are tested: the role of Part Of Speech (POS), the impact of sentiment words such as "tasty", and the influence of neutral words like "ok" on judgment variation. The study's methodology employs one-word titles, demonstrating their efficacy in studying polarity variation of words. Statistical tests on mean equality are performed on word groups of our interest. The results of this study reveal that adjectives in one-word titles tend to result in lower judgment variation compared to other word types or POS. 
Sentiment words contribute to lower judgment variation as well, emphasizing the significance of sentiment words in research on polarity judgments, and neutral words are associated with higher judgment variation as expected. However, these effects cannot be always reproduced in longer titles, which suggests that longer titles do not represent the best data source for testing the ambiguity of single words due to the influence on word polarity by other words like negation in longer titles. This empirical investigation contributes valuable insights into the factors influencing polarity variation of words, providing a foundation for NLP practitioners that aim to capture and predict polarity judgments in Spanish and for researchers that aim to understand factors influencing judgment variation.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# NPLMV-PS:ニューラルポイントライト多視点光度ステレオ NPLMV-PS: Neural Point-Light Multi-View Photometric Stereo ( http://arxiv.org/abs/2405.12057v1 ) ライセンス: Link先を確認	Fotios Logothetis, Ignas Budvytis, Roberto Cipolla,	(参考訳) 本稿では,新しい多視点測光ステレオ(PS)法を提案する。 3D再構築における多くの研究と同様に、私たちはニューラル形状表現と学習されたレンダラーを活用しています。しかし,本研究は,主に推定法線に依存するのではなく,画素ごとの輝度レンダリングを明示的に活用する点で,PS-NeRFやSuperNormalのような最先端の多視点PS法とは異なる。我々は、各点への入射放射輝度を最適に近似するために、点光源の減衰をモデル化し、キャストシャドウを明示的にレイトレーシングする。これは、最小限の事前仮定のみを用い、表面と同時に最適化される完全なニューラル材質レンダラーへの入力として使用される。最後に、表面の精度を最大化するために、推定された法線マップとセグメンテーションマップも組み込むことができる。提案手法は,DiLiGenT-MVの古典的アプローチを上回る最初期の手法の一つであり,約400×400の解像度で約1.5m離れた位置から撮影された物体に対して平均0.2mmのチャンファー距離を実現する。また、光源数が少ないシナリオでは、法線の品質が低くても頑健であり、推定法線の代わりに画素レンダリングを用いた場合に0.27mmのチャンファー距離が得られる。 In this work we present a novel multi-view photometric stereo (PS) method. Like many works in 3D reconstruction, we leverage neural shape representations and learnt renderers. However, our work differs from state-of-the-art multi-view PS methods such as PS-NeRF or SuperNormal in that we explicitly leverage per-pixel intensity renderings rather than relying mainly on estimated normals. We model point light attenuation and explicitly raytrace cast shadows in order to best approximate each point's incoming radiance. This is used as input to a fully neural material renderer that uses minimal prior assumptions and is jointly optimised with the surface. Finally, estimated normal and segmentation maps can also be incorporated in order to maximise the surface accuracy. Our method is among the first to outperform the classical approach of DiLiGenT-MV and achieves an average Chamfer distance of 0.2mm for objects imaged from approximately 1.5m away at approximately 400x400 resolution. Moreover, we show robustness to poor normals in the low light count scenario, achieving 0.27mm Chamfer distance when pixel rendering is used instead of estimated normals.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# STYLE:大規模言語モデルを用いた対話エージェントにおける問合せ質問のドメイン転送性の向上 STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents ( http://arxiv.org/abs/2405.12059v1 ) ライセンス: Link先を確認	Yue Chen, Chen Huang, Yang Deng, Wenqiang Lei, Dingnan Jin, Jia Liu, Tat-Seng Chua,	(参考訳) いつ明確化の質問をするかに関する戦略を対話型検索エンジンに備えさせることが、各分野においてますます重要になっている。 LLMのコンテキスト理解能力とドメイン固有の知識ソースへのアクセスにより、LLMベースの明確化戦略は、ポストホックな方法で様々なドメインへ迅速に移行できることを特徴とする。しかし、未知のドメインで有望なパフォーマンスを提供するのには依然として苦労しており、効果的なドメイン転送性を達成できていない。我々はこの問題を調査する第一歩を踏み出し、既存の手法は様々な領域にまたがって画一的な(one-size-fits-all)戦略を生み出す傾向にあり、それが検索の有効性を制限していることを見出した。そこで本研究では,ドメイン転送性を効果的に実現するために,Styleと呼ばれる新しい手法を提案する。実験の結果,Styleはドメイン転送性が高く,検索性能は4つの未知のドメインで平均約10%向上した。 Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Owing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still struggle to deliver promising performance on unseen domains and thus fail to achieve effective domain transferability. We take the first step to investigate this issue and find that existing methods tend to produce one-size-fits-all strategies across diverse domains, limiting their search effectiveness. In response, we introduce a novel method, called Style, to achieve effective domain transferability. Our experimental results indicate that Style bears strong domain transferability, resulting in an average search performance improvement of ~10% on four unseen domains.	翻訳日:2024-05-21 13:05:04 公開日:2024-05-20
# CLAMBER:大規模言語モデルにおける曖昧な情報要求の同定と明確化のベンチマーク CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models ( http://arxiv.org/abs/2405.12063v1 ) ライセンス: Link先を確認	Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua,	(参考訳) 大規模言語モデル(LLM)は、ユーザ情報のニーズを満たすために使われることが多いが、様々な種類のあいまいさを含むユーザクエリを扱う上での有効性は依然として不明であり、最終的にはユーザの信頼と満足度を損なうことになる。この目的のために,よく組織化された分類法を用いてLCMを評価するためのベンチマークであるCLAMBERを紹介した。分類学に基づいて, 様々な既成のLCMの強度, 弱点, 潜在的なリスクを評価するために, 約12Kの高品質なデータを構築した。以上の結果から,現在のLCMが不明瞭なユーザクエリを識別し,明確化するためには,チェーン・オブ・ソート(CoT)や数発のプロンプトによって強化されていることが示唆された。これらの技術はLSMの過信を招き、曖昧さの識別において限界的な拡張しか得られない。さらに、現在のLLMは、紛争解決の欠如と固有の知識の不正確な利用により、質の高い明確な質問を生成するのに不足している。本稿では,CLAMBERを指導し,積極的かつ信頼性の高いLCMのさらなる研究を促進する。私たちのデータセットはhttps://github.com/zt991211/CLAMBERで利用可能です。 Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may result in overconfidence in LLMs and yield only marginal enhancements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of conflict resolution and inaccurate utilization of inherent knowledge. In this paper, CLAMBER presents a guidance and promotes further research on proactive and trustworthy LLMs. 
Our dataset is available at https://github.com/zt991211/CLAMBER	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# QuanEstimation.jl: 量子パラメータ推定のためのオープンソースのJuliaフレームワーク QuanEstimation.jl: An open-source Julia framework for quantum parameter estimation ( http://arxiv.org/abs/2405.12066v1 ) ライセンス: Link先を確認	Huai-Ming Yu, Jing Liu,	(参考訳) 量子計測の主要な理論的支柱として、量子パラメータ推定は応用科学や産業へ向かう量子計測の歩みに従う必要がある。したがって、最適スキーム設計は量子パラメータ推定にとってまもなく重要かつ中核的な課題となる。この課題を効率的に達成するために,コンピュータ支援設計を目的としたソフトウェアパッケージが強く求められている。このニーズに応えて、我々は量子パラメータ推定におけるスキーム評価と設計のためのオープンソースのJuliaフレームワークであるQuanEstimation.jlを紹介する。独立パッケージとして、あるいは最近開発されたハイブリッド言語(Python-Julia)パッケージQuanEstimation [Phys. Rev. Res. 4, 043057 (2022)]の計算コアとして使用することができる。この枠組みを用いることで、特に量子ノイズが存在する場合、量子パラメータ推定におけるスキームの評価と設計が容易に行える。 As the main theoretical support of quantum metrology, quantum parameter estimation must follow the steps of quantum metrology towards applied science and industry. Hence, optimal scheme design will soon be a crucial and core task for quantum parameter estimation. To efficiently accomplish this task, software packages aimed at computer-aided design are in high demand. In response to this need, we hereby introduce QuanEstimation.jl, an open-source Julia framework for scheme evaluation and design in quantum parameter estimation. It can be used either as an independent package or as the computational core of the recently developed hybrid-language (Python-Julia) package QuanEstimation [Phys. Rev. Res. 4, 043057 (2022)]. Utilizing this framework, the scheme evaluation and design in quantum parameter estimation can be readily performed, especially when quantum noises exist.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# Anchor Gaussian Guided Texture Warping を用いた高忠実度神経上半身アバター Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping ( http://arxiv.org/abs/2405.12069v1 ) ライセンス: Link先を確認	Tianhao Wu, Jing Yang, Zhilin Guo, Jingyi Wan, Fangcheng Zhong, Cengiz Oztireli,	(参考訳) 最新の3次元ガウススプティング表現を3次元モルファスモデル(3DMM)と組み合わせることで、既存の手法は高忠実度で頭部アバターを作成することができる。しかしながら、既存のほとんどのメソッドは、ボディなしでヘッドを再構築するだけで、アプリケーションのシナリオを著しく制限します。胸部・肩部モデルにガウシアンを意識的に応用すると, 新規なポーズ下では, ぼやけた再建やうるさびが生じる傾向がみられた。これはガウス雲と点雲の基本的な制限のためであり、各ガウス雲または点は空間的分散なしに単一の方向の放射しか持たないため、単純な幾何学においても複雑な空間的変化のテクスチャを表現するために必要となる大量のものが必要である。対照的に、粗い色とポーズ依存の微妙な色からなる神経テクスチャで身体部分をモデル化することを提案する。画像平面座標をテクスチャ空間にマッピングするニューラルワープフィールドを制約するアンカーとして、各ビューのボディーテクスチャを適切にレンダリングし、正確な幾何学やUVマッピングを使わずに、別の粗いガウスの集合を最適化する。ガウシアンヘッド&ショルダーは, 被服上半身の高周波細部を高い忠実度で適合させ, 頭部領域の精度と忠実度を向上できる可能性が示唆された。提案手法をカジュアルな電話・インターネットビデオを用いて評価し, 自己・横断的再現作業において, 再現性や堅牢性に優れることを示す。さらに,マルチ層パーセプトロン (MLP) クエリを使わずにトレーニングしたモデルの高速化推論手法を提案する。 By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. 
To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show that our method achieves superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# AutoSoccerPose:サッカーショット運動の3次元姿勢解析 AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements ( http://arxiv.org/abs/2405.12070v1 ) ライセンス: Link先を確認	Calvin Yeung, Kenjiro Ide, Keisuke Fujii,	(参考訳) 画像理解はコンピュータビジョンの基本課題であり、近年ではサッカーの姿勢分析に応用されている。しかし、既存の公開データセットには包括的な情報がなく、特に姿勢シーケンスと2Dポーズアノテーションの形で提供されている。さらに、現在の分析モデルは、しばしば解釈可能な線形モデル(例えば、PCA、回帰)に依存し、複雑で多様なシナリオにおける非線形時空間関係を捉える能力を制限する。これらのギャップに対処するため,サッカー中継ビデオに3Dショット(3DSP)データセットを導入する。さらに、3DSP-GRAE(Graph Recurrent AutoEncoder)モデルを提案する。さらに,半自動2次元と3次元のポーズ推定と姿勢解析を目的としたパイプラインであるAutoSoccerPoseを提案する。完全な自動化を実現することは難しかったが、基本となるベースラインを提供し、アノテーション付きデータの範囲を超えてそのユーティリティを拡張した。サッカーネットと3DSPのデータセット上でAutoSoccerPoseを検証するとともに,3DSPに基づく姿勢解析結果を示す。データセット、コード、モデルについては、https://github.com/calvinyeungck/3D-Shot-Posture-Datasetを参照してください。 Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. 
The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# GAN-GRID: スマートグリッド安定性予測のための新たなジェネレーティブアタック GAN-GRID: A Novel Generative Attack on Smart Grid Stability Prediction ( http://arxiv.org/abs/2405.12076v1 ) ライセンス: Link先を確認	Emad Efatinasab, Alessandro Brighente, Mirco Rampazzo, Nahal Azadi, Mauro Conti,	(参考訳) スマートグリッドは、電力セクターの近代化において重要な革新であり、電源から消費者へのエネルギー供給を最適化できるインテリジェントでデジタル化されたエネルギーネットワークを提供する。これは国家のエネルギーセクターのバックボーンを表す。その中心的な役割のため、スマートグリッドの可用性は最重要であり、その運用と安全性の詳細な制御が必要である。この目的のために、スマートグリッドの安定性を評価し、安全な状態で動作することを保証する複数のソリューションを開発した。人工知能と機械学習アルゴリズムは、スマートグリッドの安定性を正確に予測するための効果的な手段であることが証明されている。既知の敵攻撃や潜在的な解決策が存在するにもかかわらず、この脅威からスマートグリッドを保護するための標準化された手段は存在しない。本稿では,現実の制約に合わせたスマートグリッドの安定性予測システムを対象とした,新たな敵攻撃手法であるGAN-GRIDを提案する。以上の結果から,データやモデル知識を欠いた,安定度モデルのみに武装した敵が,攻撃成功率0.99の安定度でデータを作成できることが判明した。また、認証データとセンサー値を操作することで、攻撃者はグリッド問題を増幅することができる。これらの結果は,システム安定性と信頼性を維持するために,敵の操作に対するスマートグリッドセキュリティ機構の強化を示唆している。 The smart grid represents a pivotal innovation in modernizing the electricity sector, offering an intelligent, digitalized energy network capable of optimizing energy delivery from source to consumer. It hence represents the backbone of the energy sector of a nation. Due to its central role, the availability of the smart grid is paramount and is hence necessary to have in-depth control of its operations and safety. To this aim, researchers developed multiple solutions to assess the smart grid's stability and guarantee that it operates in a safe state. Artificial intelligence and Machine learning algorithms have proven to be effective measures to accurately predict the smart grid's stability. Despite the presence of known adversarial attacks and potential solutions, currently, there exists no standardized measure to protect smart grids against this threat, leaving them open to new adversarial attacks. In this paper, we propose GAN-GRID a novel adversarial attack targeting the stability prediction system of a smart grid tailored to real-world constraints. 
Our findings reveal that an adversary armed solely with the stability model's output, devoid of data or model knowledge, can craft data classified as stable with an Attack Success Rate (ASR) of 0.99. Also by manipulating authentic data and sensor values, the attacker can amplify grid issues, potentially undetected due to a compromised stability prediction system. These results underscore the imperative of fortifying smart grid security mechanisms against adversarial manipulation to uphold system stability and reliability.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# 非エルミート的Jaynes-Cummingsモデルにおける普遍量子フィッシャー情報とランダウクラスおよびトポロジカルクラス遷移の同時発生 Universal quantum Fisher information and simultaneous occurrence of Landau-class and topological-class transitions in non-Hermitian Jaynes-Cummings models ( http://arxiv.org/abs/2405.12080v1 ) ライセンス: Link先を確認	Zu-Jian Ying,	(参考訳) 光-物質相互作用は、臨界現象、トポロジカル遷移、量子メトロジー、非エルミート物理学の相互作用の理想的な試験場を提供する。我々は、パリティ時間(PT)対称性と反PT対称性で実エネルギースペクトルを持つ2つの基本的な非エルミート的Jaynes-Cummingsモデルを考える。量子フィッシャー情報は例外点における遷移の近傍で臨界的であり、異なるパラメータ、全てのエネルギーレベル、両モデル、対称相および対称性破壊相に関して超普遍性を示す。これらの遷移は、対称性を破るランダウクラス遷移(LCT)と対称性に保護されたトポロジカルクラス遷移(TCT)の両方であることが判明し、相反する対称性要求のために従来は両立しない臨界LCTとTCTの同時発生を実現する。 Light-matter interactions provide an ideal testing ground for the interplay of critical phenomena, topological transitions, quantum metrology and non-Hermitian physics. We consider two fundamental non-Hermitian Jaynes-Cummings models which possess real energy spectra in parity-time (PT) symmetry and anti-PT symmetry. We show that the quantum Fisher information is critical around the transitions at the exceptional points and exhibits a super universality with respect to different parameters, all energy levels, both models, symmetric phases and symmetry-broken phases. The transitions are found to be both symmetry-breaking Landau-class transitions (LCTs) and symmetry-protected topological-class transitions (TCTs), thus realizing a simultaneous occurrence of critical LCTs and TCTs which are conventionally incompatible due to contrary symmetry requirements.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# データアロケーションによる選択的アノテーション:これらのデータはモデルではなくアノテーションのために専門家にトリアージされるべきである Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model ( http://arxiv.org/abs/2405.12081v1 ) ライセンス: Link先を確認	Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan,	(参考訳) 限られた予算下で高品質なアノテーションを得るために、半自動アノテーション法が一般的に用いられ、データの一部を専門家によって注釈付けされ、残りのデータに対するアノテーションを完成させるためにモデルが訓練される。しかしながら、これらの手法は主に、モデル予測能力(トリアージ・トゥ・ヒューマン・データ)を改善するために専門家アノテーションのための情報的データを選択することに焦点を当て、残りのデータはモデルアノテーション(トリアージ・トゥ・モデル・データ)に無差別に割り当てられている。これはアノテーションの予算配分の非効率につながる可能性がある。モデルが正確にアノテートできる簡単なデータは専門家に不要に割り当てられる可能性があるし、ハードデータはモデルによって誤って分類される可能性があるからだ。その結果、全体的なアノテーションの品質が損なわれる可能性がある。この問題に対処するため、我々はSANTと呼ばれる選択的なアノテーションフレームワークを提案する。提案した誤り認識トリアージと二重み付け機構により、トリアージ・ツー・ヒューマンデータとトリアージ・ツー・モデルデータの両方を効果的に活用する。そのため、情報的あるいはハードなデータは専門家にアノテーションとして割り当てられ、簡単なデータはモデルによって処理される。実験の結果、SANTは他のベースラインを一貫して上回り、専門家とモデルワーカーの両方にデータの適切な割り当てを通じて高品質なアノテーションをもたらすことが示された。我々は、予算制約の中でデータアノテーションに関する先駆的な研究を行い、将来のトリアージベースのアノテーション研究のランドマークを確立します。 To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data), while the rest of the data is indiscriminately assigned to model annotation (i.e., triage-to-model data). This may lead to inefficiencies in budget allocation for annotations, as easy data that the model could accurately annotate may be unnecessarily assigned to the expert, and hard data may be misclassified by the model. As a result, the overall annotation quality may be compromised. To address this issue, we propose a selective annotation framework called SANT. 
It effectively takes advantage of both the triage-to-human and triage-to-model data through the proposed error-aware triage and bi-weighting mechanisms. As such, informative or hard data is assigned to the expert for annotation, while easy data is handled by the model. Experimental results show that SANT consistently outperforms other baselines, leading to higher-quality annotation through its proper allocation of data to both expert and model workers. We provide pioneering work on data annotation within budget constraints, establishing a landmark for future triage-based annotation studies.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# 分布意味論, ホロリズム, および意味の不安定性 Distributional Semantics, Holism, and the Instability of Meaning ( http://arxiv.org/abs/2405.12084v1 ) ライセンス: Link先を確認	Jumbly Grindrod, J. D. Porter, Nat Hansen,	(参考訳) 現在の言語モデルは、その中核に分布仮説を持つ言語意味への、いわゆる分散意味論的アプローチに基づいて構築されている。分布仮説は、単語の意味の全体論的概念を包含する: 単語の意味は、モデル内の他の単語との関係に依存する。言語システム(例えば、人間の話者)の意味的性質の変化は、多くの変化やシステム全体の完全な変化をもたらす。問題となっているシステムが互いに通信しようとすると、この種の不安定さはコミュニケーションを不可能にする(Fodor and Lepore 1992, 1996, 1999)。本稿では,不安定性が意味の分布モデルに問題をもたらすかどうかを考察する。まず、これらのモデルが示すような異なる不安定な形式を区別し、そのような形式は、不安定性とコミュニケーションの関係を理解するのに重要な1つ、すなわち微分不安定(differential instability)と呼ぶものだけである、と論じる。微分不安定 (differial instability) とは、空間内の点間の相対距離の変動であり、それらの点の絶対位置の変動である。我々は2つの小説のテキストから構築されたおもちゃモデルと、ウィキペディアとSEPの記事の組み合わせからWord2vecアルゴリズムを用いて構築されたより洗練されたモデルである。これらのモデルがサイズの増加から構築されるコーパスとしてどのように変化するかを示すことによって、不安定性の2つの形態を実証する。 Current language models are built on the so-called distributional semantic approach to linguistic meaning that has the distributional hypothesis at its core. The distributional hypothesis involves a holistic conception of word meaning: the meaning of a word depends upon its relations to other words in the model. A standard objection to meaning holism is the charge of instability: any change in the meaning properties of a linguistic system (a human speaker, for example) would lead to many changes or possibly a complete change in the entire system. When the systems in question are trying to communicate with each other, it has been argued that instability of this kind makes communication impossible (Fodor and Lepore 1992, 1996, 1999). In this article, we examine whether the instability objection poses a problem for distributional models of meaning. First, we distinguish between distinct forms of instability that these models could exhibit, and we argue that only one such form is relevant for understanding the relation between instability and communication: what we call differential instability. 
Differential instability is variation in the relative distances between points in a space, rather than variation in the absolute position of those points. We distinguish differential and absolute instability by constructing two of our own models, a toy model constructed from the text of two novels, and a more sophisticated model constructed using the Word2vec algorithm from a combination of Wikipedia and SEP articles. We demonstrate the two forms of instability by showing how these models change as the corpora they are constructed from increase in size.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
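The distinction the abstract above draws between absolute and differential instability can be made concrete with a toy computation. The two measures below are assumptions chosen for illustration (mean displacement per word, and mean change in pairwise distance); they are not the authors' definitions, and the three-word "embedding spaces" are invented.

```python
import math
from itertools import combinations


def absolute_instability(space_a, space_b):
    """Mean distance each shared word moves between the two spaces."""
    words = sorted(set(space_a) & set(space_b))
    return sum(math.dist(space_a[w], space_b[w]) for w in words) / len(words)


def differential_instability(space_a, space_b):
    """Mean absolute change in pairwise distances between shared words."""
    words = sorted(set(space_a) & set(space_b))
    pairs = list(combinations(words, 2))
    return sum(
        abs(math.dist(space_a[u], space_a[v]) - math.dist(space_b[u], space_b[v]))
        for u, v in pairs
    ) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings" for three words.
    original = {"cat": (0.0, 0.0), "dog": (3.0, 4.0), "car": (6.0, 8.0)}
    # Every point shifted by the same offset: positions change, relations do not.
    shifted = {w: (x + 1.0, y + 1.0) for w, (x, y) in original.items()}
    # One word moved on its own: relations between words change as well.
    reshaped = dict(original, car=(6.0, 0.0))

    print(absolute_instability(original, shifted),
          differential_instability(original, shifted))
    print(absolute_instability(original, reshaped),
          differential_instability(original, reshaped))
```

A uniform shift moves every point but leaves all pairwise distances intact, so only the absolute measure registers it. Moving a single word changes both measures, which is the case that bears on how words relate to one another.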
# 統計からの浅量子回路の耐雑音性学習性と量子擬似ランダム性のコスト Noise-tolerant learnability of shallow quantum circuits from statistics and the cost of quantum pseudorandomness ( http://arxiv.org/abs/2405.12085v1 ) ライセンス: Link先を確認	Chirag Wadhwa, Mina Doosti,	(参考訳) この研究は、近い将来に未知の量子回路の学習可能性について研究する。量子過程を学習するための量子統計クエリの自然な堅牢性を証明し、統計学から様々な種類のノイズをベンチマークする効率的な方法を提供し、ノイズ耐性アルゴリズムを開発するための強力なフレームワークを提供する。定深量子回路の学習アルゴリズムを,クエリの複雑さのオーバーヘッドが小さい量子統計クエリ設定に適用する。統計的クエリを用いて,ダイヤモンド距離内における対数的および高い深さのランダム量子回路を学習するための平均ケース下界を証明した。さらに、量子統計的クエリーから量子しきい値探索問題の難しさを示し、浅い量子回路の学習可能性にその意味を論じる。最後に、擬似ランダムユニタリ(PRU)は、効率的な微分器を構築し、量子自由弁当定理の新たなバリエーションを証明することによって、一定の深さの回路を用いて構築できないことを証明した。 This work studies the learnability of unknown quantum circuits in the near term. We prove the natural robustness of quantum statistical queries for learning quantum processes and provide an efficient way to benchmark various classes of noise from statistics, which gives us a powerful framework for developing noise-tolerant algorithms. We adapt a learning algorithm for constant-depth quantum circuits to the quantum statistical query setting with a small overhead in the query complexity. We prove average-case lower bounds for learning random quantum circuits of logarithmic and higher depths within diamond distance with statistical queries. Additionally, we show the hardness of the quantum threshold search problem from quantum statistical queries and discuss its implications for the learnability of shallow quantum circuits. Finally, we prove that pseudorandom unitaries (PRUs) cannot be constructed using circuits of constant depth by constructing an efficient distinguisher and proving a new variation of the quantum no-free lunch theorem.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# 機械学習による雷網のチャネルバランス補間 Channel Balance Interpolation in the Lightning Network via Machine Learning ( http://arxiv.org/abs/2405.12087v1 ) ライセンス: Link先を確認	Vincent, Emanuele Rossi, Vikash Singh,	(参考訳) Bitcoin Lightning Network(ビットコイン・ライトニング・ネットワーク)は、Bitcoinのスケーラビリティに対処するレイヤ2の支払いプロトコルである。本研究は、ネットワーク内のチャネルバランスを補間するために機械学習モデルを使用することにより、ネットワークのパスフィニングアルゴリズムの最適化に使用できる可能性を検討する。残高調査やマルチパス決済プロトコルの探索が盛んに行われているが、ノードとチャネルの機能のみを用いてチャネルのバランスを予測することは、まだ未知の領域である。本稿では,2つのヒューリスティックベースラインに対する機械学習モデルの性能評価を行い,様々な特徴の予測能力について検討する。実験では,両エッジがチャネル容量の半分に割り当てられる等分割ベースラインに対して10%向上した。 The Bitcoin Lightning Network is a Layer 2 payment protocol that addresses Bitcoin's scalability by facilitating quick and cost effective transactions through payment channels. This research explores the feasibility of using machine learning models to interpolate channel balances within the network, which can be used for optimizing the network's pathfinding algorithms. While there has been much exploration in balance probing and multipath payment protocols, predicting channel balances using solely node and channel features remains an uncharted area. This paper evaluates the performance of several machine learning models against two heuristic baselines and investigates the predictive capabilities of various features. Our model performs favorably in experimental evaluation, outperforming by 10% against an equal split baseline where both edges are assigned half of the channel capacity.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
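The equal-split baseline mentioned in the abstract above is simple enough to write down directly. The sketch below predicts half of the capacity for one side of each channel and scores it with mean absolute error; the error metric and the toy channel data are assumptions for illustration and are not taken from the paper.

```python
def equal_split_baseline(capacity):
    """Predict the local balance as half of the channel capacity."""
    return capacity / 2


def mean_absolute_error(channels, predict):
    """Score a predictor on (capacity, true_local_balance) pairs."""
    errors = [abs(predict(capacity) - balance) for capacity, balance in channels]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical channels: (capacity, true local balance), in satoshis.
    toy_channels = [(1000, 700), (500, 250), (2000, 100)]
    print(mean_absolute_error(toy_channels, equal_split_baseline))
```

Any learned model can be passed to `mean_absolute_error` in place of `equal_split_baseline`, as long as it accepts the same input, which makes the comparison against the baseline a one-line change.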
# オフライン強化学習における軌道最適化とMambaは相容れないか? Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning? ( http://arxiv.org/abs/2405.12094v1 ) ライセンス: Link先を確認	Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen,	(参考訳) トランスフォーマーベースの軌道最適化手法は、オフライン強化学習(オフラインRL)において例外的な性能を示したが、かなりのパラメータサイズと限られたスケーラビリティのため、特に計算能力に制限のあるロボットやドローンのようなリソースが制約されたシーケンシャルな意思決定シナリオにおいて、課題を生じさせている。有望な新しい線形時間シーケンスモデルであるMambaは、トランスフォーマーと同等のパフォーマンスを提供すると同時に、長いシーケンスにおいてかなり少ないパラメータで済む。 Mambaが軌道最適化と互換性があるかどうかは不明であるため、本研究の目的は、データ構造やネットワークアーキテクチャの観点から、オフラインRLにおけるDecision Mamba(DeMaと呼ぶ)の可能性を探るための包括的な実験を行うことであり、以下の知見を得た。 (1)DeMaの系列への注目はほぼ指数関数的に減衰するため、長い系列は性能向上に寄与せず、大きな計算負荷のみを課す。その結果,RNNライクなDeMaではなく,トランスフォーマーライクなDeMaを導入した。 (2)DeMaのコンポーネントでは,隠れ注意機構が成功の鍵であり,他の残差構造ともうまく動作し,位置埋め込みを必要としない。 8つのAtariゲームによる大規模な評価の結果,特別に設計したDeMaは軌道最適化と互換性があり,従来の最先端手法を超越し,30%少ないパラメータでDecision Transformer(DT)を80%上回り,MuJoCoではわずか4分の1のパラメータでDTを上回ることがわかった。 Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet they pose challenges due to substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained such as in robots and drones with limited computational power. Mamba, a promising new linear-time sequence model, offers performance on par with transformers while delivering substantially fewer parameters on long sequences. As it remains unclear whether Mamba is compatible with trajectory optimization, this work aims to conduct comprehensive experiments to explore the potential of Decision Mamba in offline RL (dubbed DeMa) from the aspect of data structures and network architectures with the following insights: (1) Long sequences impose a significant computational burden without contributing to performance improvements due to the fact that DeMa's focus on sequences diminishes approximately exponentially. 
Consequently, we introduce a Transformer-like DeMa as opposed to an RNN-like DeMa. (2) For the components of DeMa, we identify that the hidden attention mechanism is key to its success, which can also work well with other residual structures and does not require position embedding. Extensive evaluations from eight Atari games demonstrate that our specially designed DeMa is compatible with trajectory optimization and surpasses previous state-of-the-art methods, outdoing Decision Transformer (DT) by 80% with 30% fewer parameters, and exceeds DT in MuJoCo with only a quarter of the parameters.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# PATE: 近縁性を考慮した時系列異常評価 PATE: Proximity-Aware Time series anomaly Evaluation ( http://arxiv.org/abs/2405.12096v1 ) ライセンス: Link先を確認	Ramin Ghorbani, Marcel J. T. Reinders, David M. J. Tax,	(参考訳) 時系列データにおける異常検出アルゴリズムの評価は、リアルタイム分析とデータ駆動戦略が不可欠であるさまざまな領域において、不正確さが決定の欠陥を引き起こす可能性があるため、非常に重要である。従来のパフォーマンス指標では、iidデータを前提としており、早期検出や遅延検出といった時系列異常の複雑な時間的ダイナミクスや特定の特性を捉えることができない。本稿では、予測と異常区間の時間的関係を組み込んだ新しい評価指標であるPATE(Proximity-Aware Time series anomaly Evaluation)を紹介する。 PATEは異常間隔の周囲のバッファゾーンを考慮した近接ベースの重み付けを使用しており、より詳細で詳細な検出評価を可能にする。これらの重みを使って、PATEはPrecision and Recall曲線の下で領域の重み付きバージョンを計算する。合成および実世界のデータセットを用いた実験は、PATEが他の評価指標よりも精度が高く正確な評価を行う上で優れていることを示す。また、PATE評価手法を用いて、様々なベンチマークデータセットに対して、最先端の異常検知器をいくつか試験した。その結果、ポイント調整F1スコアのような共通メトリックは、検出性能をうまく特徴づけられず、PATEはより公平なモデル比較を提供することができることを示した。 PATEを導入することによって、モデルの有効性の理解を再定義し、より効果的で正確な検出モデルの開発に向けて将来の研究を推し進める。 Evaluating anomaly detection algorithms in time series data is critical as inaccuracies can lead to flawed decision-making in various domains where real-time analytics and data-driven strategies are essential. Traditional performance metrics assume iid data and fail to capture the complex temporal dynamics and specific characteristics of time series anomalies, such as early and delayed detections. We introduce Proximity-Aware Time series anomaly Evaluation (PATE), a novel evaluation metric that incorporates the temporal relationship between prediction and anomaly intervals. PATE uses proximity-based weighting considering buffer zones around anomaly intervals, enabling a more detailed and informed assessment of a detection. Using these weights, PATE computes a weighted version of the area under the Precision and Recall curve. Our experiments with synthetic and real-world datasets show the superiority of PATE in providing more sensible and accurate evaluations than other evaluation metrics. We also tested several state-of-the-art anomaly detectors across various benchmark datasets using the PATE evaluation scheme. 
The results show that a common metric such as the Point-Adjusted F1 Score fails to characterize detection performance well, and that PATE provides a fairer model comparison. By introducing PATE, we redefine the understanding of model efficacy in a way that steers future studies toward developing more effective and accurate detection models.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# DOP:Mathematical Correctionにおける大規模言語モデルのための診断指向プロンプト DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction ( http://arxiv.org/abs/2405.12100v1 ) ライセンス: Link先を確認	Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou,	(参考訳) 数学世界問題修正(MWPC)は数学問題の解法における推論誤差の修正を目的とした新しい課題である。本稿では,大規模言語モデル(LLM)の進歩を生かして,(1)数学的推論と誤り訂正の区別,(2)数学におけるLLMの誤り訂正能力を向上してMWPC課題を解決するための戦略を探る。リアルタイム教育では、学生が間違いを認識することが、単に正しい答えを提供することよりも重要であることに気づきました。しかしながら、現在の研究では、潜在的な不正な問題を修正するのではなく、数学問題に対する正確な解を得るのが優先されている。そこで,本研究では,数学的推論能力の向上が誤り訂正の熟達に等しくないことを実証し,研究パラダイムを変更した。一方,LLMの誤り訂正を容易にするために,診断指向プロンピング(DOP)と呼ばれる新しい手法を提案する。実験では、DOPは優れたパフォーマンスを示し、その大きな影響を強調しています。数学教育において、卓越した正当性に対する需要は、熟練した理性者よりも多いと論じる。コードとデータはhttps://github.com/ChenhaoEcnuCS/Reason-Correctで公開されている。 Math world problems correction (MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging the advancements in large language models (LLMs), we address two key objectives: (1) Distinguishing between mathematical reasoning and error correction; (2) Exploring strategies to enhance the error correction capabilities of LLMs in mathematics to solve the MWPC task. We noticed that, in real-time education, assisting students in recognizing their mistakes is more crucial than simply providing correct answers. However, current research tends to prioritize obtaining accurate solutions to math problems rather than correcting potentially incorrect ones. Therefore, we modify the research paradigm, demonstrating that improving mathematical reasoning abilities does not equate to mastery in error correction. Meanwhile, we propose a novel method called diagnostic-oriented prompting (DOP) aimed at facilitating LLMs to excel in error correction. In experiments, DOP has shown outstanding performance, highlighting its significant impact. We argue that in mathematical education, the demand for outstanding correctors surpasses that for proficient reasoners. 
Code and data are available at https://github.com/ChenhaoEcnuCS/Reason-Correct.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# Sheet Music Transformer ++: ピアノ楽譜のエンド・ツー・エンドフルページ光音楽認識 Sheet Music Transformer ++: End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music ( http://arxiv.org/abs/2405.12105v1 ) ライセンス: Link先を確認	Antonio Ríos-Vila, Jorge Calvo-Zaragoza, David Rizo, Thierry Paquet,	(参考訳) 光音楽認識は、効果的に楽譜をデジタル形式に転写する正確なシステムを実現するために、大きく進歩した分野である。それにもかかわらず、OMRが完全な可能性を達成するのを妨げるいくつかの制限がある。特に、最先端のOMRは、まだ全ページの転写を行うための多段階パイプラインに依存している。本研究では,従来のレイアウト解析ステップを必要とせずに,全ページのポリフォニック楽譜の書き起こしが可能なエンドツーエンドモデルであるSheet Music Transformer++を提案する。これは、合成データ生成による広範なカリキュラムベースの事前学習によって実現される。公開ポリフォニック転写データセットのフルページ拡張についていくつかの実験を行った。実験結果は、このモデルが全ページのピアノフォルムスコアの書き起こしに優れており、エンドツーエンドのOMR転写において注目すべきマイルストーンであることを示している。 Optical Music Recognition is a field that has progressed significantly, bringing accurate systems that transcribe effectively music scores into digital formats. Despite this, there are still several limitations that hinder OMR from achieving its full potential. Specifically, state of the art OMR still depends on multi-stage pipelines for performing full-page transcription, as well as it has only been demonstrated in monophonic cases, leaving behind very relevant engravings. In this work, we present the Sheet Music Transformer++, an end-to-end model that is able to transcribe full-page polyphonic music scores without the need of a previous Layout Analysis step. This is done thanks to an extensive curriculum learning-based pretraining with synthetic data generation. We conduct several experiments on a full-page extension of a public polyphonic transcription dataset. The experimental outcomes confirm that the model is competent at transcribing full-page pianoform scores, marking a noteworthy milestone in end-to-end OMR transcription.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# Imp: モバイルデバイス用大規模マルチモーダルモデル Imp: Highly Capable Large Multimodal Models for Mobile Devices ( http://arxiv.org/abs/2405.12107v1 ) ライセンス: Link先を確認	Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding,	(参考訳) 大規模言語モデル(LLM)の能力を活用することで、近年の大規模マルチモーダルモデル(LMM)は、オープンワールドのマルチモーダル理解において顕著な汎用性を示している。それでも、それらは通常パラメータ重で計算集約的であり、リソース制約のあるシナリオにおける適用性を妨げます。この目的のために、制約付きスケール(例えば、3B)下での能力を最大化するために、いくつかの軽量LMMが連続して提案されている。これらの手法によって達成された奨励的な結果にもかかわらず、そのほとんどはデザイン空間の1つまたは2つの側面のみに焦点を当てており、モデル能力に影響を与える重要な設計選択はまだ十分に研究されていない。本稿では,モデルアーキテクチャ,トレーニング戦略,トレーニングデータの観点から,軽量LMMの体系的研究を行う。その結果,2B-4Bスケールで高い能力を有するLMMのファミリーであるImpが得られた。特に、我々のImp-3Bモデルは、同じ大きさの既存の軽量LMMを着実に上回り、13Bスケールで最先端のLMMを上回ります。低ビット量子化と解像度低減技術により、我々のImpモデルは、約13トークン/秒の高速な推論速度でQualcomm Snapdragon 8Gen3モバイルチップにデプロイできる。 By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximize the capabilities under constrained scale (e.g., 3B). Despite the encouraging results achieved by these methods, most of them only focus on one or two aspects of the design space, and the key design choices that influence model capability have not yet been thoroughly investigated. In this paper, we conduct a systematic study for lightweight LMMs from the aspects of model architecture, training strategy, and training data. Based on our findings, we obtain Imp -- a family of highly capable LMMs at the 2B-4B scales. Notably, our Imp-3B model steadily outperforms all the existing lightweight LMMs of similar size, and even surpasses the state-of-the-art LMMs at the 13B scale. 
With low-bit quantization and resolution reduction techniques, our Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip with a high inference speed of about 13 tokens/s.	翻訳日:2024-05-21 12:55:09 公開日:2024-05-20
# 逐次情報処理におけるボットネックからの言語構造 Linguistic Structure from a Bottleneck on Sequential Information Processing ( http://arxiv.org/abs/2405.12109v1 ) ライセンス: Link先を確認	Richard Futrell, Michael Hahn,	(参考訳) 人間の言語は自然界におけるユニークなコミュニケーション形態であり、その構造的な性質によって区別される。基本的にはシステマティックであり、信号は個々の意味のある部分(大まかに言えば単語)に分解され、文を形成するために通常の方法で結合される。さらに、これらの部分を組み合わせる方法は、通常、単語が結合され、連続した句を形成し、文の関連部分が互いに近接しているような、ある種の局所性を維持している。我々は,これらの言語の基本的特性が,情報処理制約の下での効率的なコミュニケーションのより広い原理からどのように生じるかを理解することの課題に対処する。ここでは, 自然言語に類似した体系性は, エントロピーの最小化から生じることを示す。シミュレーションでは, 余剰エントロピーを最小化する符号が, ソース分布をほぼ独立した成分に分解し, それらの成分を系統的に, 局所的に表現することを示した。次に,音韻学,形態学,構文学,意味論のレベルにおいて,人間の言語が低エントロピーを持つように構成されていることを示す。この結果から,人間言語は表現すべき意味に関する統計的分布について,独立成分分析の逐次的な一般化を行うことが示唆された。それは、人間の言語の統計的構造と代数的構造の間に結びつきを確立し、人間の言語の構造は、コミュニケーション表現性を最大化しながら認知的負荷を最小限に抑えるために進化したかもしれないという考えを強化する。 Human language is a unique form of communication in the natural world, distinguished by its structured nature. Most fundamentally, it is systematic, meaning that signals can be broken down into component parts that are individually meaningful -- roughly, words -- which are combined in a regular way to form sentences. Furthermore, the way in which these parts are combined maintains a kind of locality: words are usually concatenated together, and they form contiguous phrases, keeping related parts of sentences close to each other. We address the challenge of understanding how these basic properties of language arise from broader principles of efficient communication under information processing constraints. Here we show that natural-language-like systematicity arises from minimization of excess entropy, a measure of statistical complexity that represents the minimum amount of information necessary for predicting the future of a sequence based on its past. In simulations, we show that codes that minimize excess entropy factorize their source distributions into approximately independent components, and then express those components systematically and locally. 
Next, in a series of massively cross-linguistic corpus studies, we show that human languages are structured to have low excess entropy at the level of phonology, morphology, syntax, and semantics. Our result suggests that human language performs a sequential generalization of Independent Components Analysis on the statistical distribution over meanings that need to be expressed. It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language may have evolved to minimize cognitive load while maximizing communicative expressiveness.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# CoR-GS: Sparse-View 3D Gaussian Splatting by Co-Regularization CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization ( http://arxiv.org/abs/2405.12110v1 ) ライセンス: Link先を確認	Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, Xiao Bai,	(参考訳) 3Dガウススティング(3DGS)は、シーンを表現するために3Dガウスアンからなる放射場を生成する。スパーストレーニングの視点では、3DGSは容易にオーバーフィッティングに悩まされ、再建の質に悪影響を及ぼす。本稿では、スパースビュー3DGSを改善するための新しい協調正規化視点を提案する。また,2つの3次元ガウス放射場をシーンのスパークビューで訓練すると,2つの放射場が,非監督的に再現品質を予測できる「point disagreement」と「rendering disagreement」を呈することが明らかとなった。さらに、ガウスの点表現間の登録を評価し、その描画画素の差を計算することにより、点不一致と描画不一致をさらに定量化する。実験により,2つの相違点と正確な復元点との負の相関が示され,不正確な復元点の同定が可能となった。本研究では,2つの相違点に基づいて不正確な再構築を抑えるCoR-GSを提案する。 (ii) Pseudo-viewのコレギュラー化では、高いレンダリング不一致を示すピクセルは不正確にレンダリングされ、不一致を抑制する。 LLFF, Mip-NeRF360, DTU, Blenderの結果, CoR-GSはシーン形状を効果的に調整し, コンパクトな表現を再構築し, スパーストレーニングビュー下での最先端のノベルビュー合成品質を実現することを示した。 3D Gaussian Splatting (3DGS) creates a radiance field consisting of 3D Gaussians to represent a scene. With sparse training views, 3DGS easily suffers from overfitting, negatively impacting the reconstruction quality. This paper introduces a new co-regularization perspective for improving sparse-view 3DGS. When training two 3D Gaussian radiance fields with the same sparse views of a scene, we observe that the two radiance fields exhibit point disagreement and rendering disagreement that can unsupervisedly predict reconstruction quality, stemming from the sampling implementation in densification. We further quantify the point disagreement and rendering disagreement by evaluating the registration between Gaussians' point representations and calculating differences in their rendered pixels. The empirical study demonstrates the negative correlation between the two disagreements and accurate reconstruction, which allows us to identify inaccurate reconstruction without accessing ground-truth information. 
Based on the study, we propose CoR-GS, which identifies and suppresses inaccurate reconstruction based on the two disagreements: (i) Co-pruning treats Gaussians that exhibit high point disagreement as inaccurately positioned and prunes them. (ii) Pseudo-view co-regularization treats pixels that exhibit high rendering disagreement as inaccurately rendered and suppresses the disagreement. Results on LLFF, Mip-NeRF360, DTU, and Blender demonstrate that CoR-GS effectively regularizes the scene geometry, reconstructs the compact representations, and achieves state-of-the-art novel view synthesis quality under sparse training views.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# Reindex-Then-Adapt:会話レコメンデーションのための大規模言語モデルの改善 Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation ( http://arxiv.org/abs/2405.12119v1 ) ライセンス: Link先を確認	Zhankui He, Zhouhang Xie, Harald Steck, Dawen Liang, Rahul Jha, Nathan Kallus, Julian McAuley,	(参考訳) 大規模言語モデル(LLM)は、アイテム内容のインデックス付け、複雑な会話コンテキストの理解、関連する項目タイトルの生成によって、会話レコメンデーションシステムに革命をもたらしている。しかし,推奨項目の流通管理は依然として課題である。これにより、ターゲットとなる会話レコメンデーションプラットフォーム上で、アイテムの人気など、急速に変化するデータ配信をキャプチャできないため、サブ最適化のパフォーマンスが向上する。会話のレコメンデーションでは、LLMはタイトル(複数のトークン)を自動回帰的に生成することでアイテムを推薦し、すべてのアイテムのレコメンデーションを取得し、制御することが困難になる。そこで本稿では,マルチトークンのタイトルをLPM内のシングルトークンに変換するReindex-Then-Adapt (RTA) フレームワークを提案し,それに応じて,これらのシングルトークンのタイトルに対する確率分布を調整する。 RTAフレームワークは、LLMと従来のレコメンデーションシステム(RecSys)の両方の利点をマージしている。我々のフレームワークは、3つの異なる会話レコメンデーションデータセットと2つの適応設定にまたがる精度の指標を実証する Large language models (LLMs) are revolutionizing conversational recommender systems by adeptly indexing item content, understanding complex conversational contexts, and generating relevant item titles. However, controlling the distribution of recommended items remains a challenge. This leads to suboptimal performance due to the failure to capture rapidly changing data distributions, such as item popularity, on targeted conversational recommendation platforms. In conversational recommendation, LLMs recommend items by generating the titles (as multiple tokens) autoregressively, making it difficult to obtain and control the recommendations over all items. Thus, we propose a Reindex-Then-Adapt (RTA) framework, which converts multi-token item titles into single tokens within LLMs, and then adjusts the probability distributions over these single-token item titles accordingly. The RTA framework marries the benefits of both LLMs and traditional recommender systems (RecSys): understanding complex queries as LLMs do; while efficiently controlling the recommended item distributions in conversational recommendations as traditional RecSys do. 
Our framework demonstrates improved accuracy metrics across three different conversational recommendation datasets and two adaptation settings.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# 量子二部計算の安全性とチート・センシティブ・プロトコル・オービビビラス・トランスファー・リダクションへの応用 Insecurity of Quantum Two-Party Computation with Applications to Cheat-Sensitive Protocols and Oblivious Transfer Reductions ( http://arxiv.org/abs/2405.12121v1 ) ライセンス: Link先を確認	Esther Hänggi, Severin Winkler,	(参考訳) Oblivious Transfer (OT)は、セキュアな双方向計算のための基本的なプリミティブである。 2人のプレーヤーが量子の場合であっても、ノイズレス通信チャネルにしかアクセスできない場合、OTは情報理論のセキュリティでは実装できないことが知られている。その結果、OTの弱い変種が研究されている。本研究では,不正な当事者が不正行為をすることができるが,検出されるリスクがある場合に,不正に敏感なOTの可否を厳格に証明する。我々は、受信機が送信者の全ての入力を計算し、この攻撃の成功確率に明示的な上限を与える量子プロトコルに対する一般的な攻撃を構築する。これは、統計情報理論のセキュリティでは、不正感受性の量子対称プライベート情報検索は実装できないことを意味する。証明のために考案された手法を活用して、セキュアな関数評価に必要なプリミティブのエントロピー境界を提供する。これは、プレイヤーがリソースとしてOTにアクセス可能なプロトコルに対して、不可能な結果をもたらすことを意味する。この結果は既存の境界を著しく改善し、リソースプリミティブへの1-out-n OTの還元のために厳密な境界を与える。我々の結果は、特に有限個のプリミティブ間の変換と任意の誤差に対して成り立つ。 Oblivious transfer (OT) is a fundamental primitive for secure two-party computation. It is well known that OT cannot be implemented with information-theoretic security if the two players only have access to noiseless communication channels, even in the quantum case. As a result, weaker variants of OT have been studied. In this work, we rigorously establish the impossibility of cheat-sensitive OT, where a dishonest party can cheat, but risks being detected. We construct a general attack on any quantum protocol that allows the receiver to compute all inputs of the sender and provide an explicit upper bound on the success probability of this attack. This implies that cheat-sensitive quantum Symmetric Private Information Retrieval cannot be implemented with statistical information-theoretic security. Leveraging the techniques devised for our proofs, we provide entropic bounds on primitives needed for secure function evaluation. They imply impossibility results for protocols where the players have access to OT as a resource. This result significantly improves upon existing bounds and yields tight bounds for reductions of 1-out-of-n OT to a resource primitive. 
Our results hold in particular for transformations between a finite number of primitives and for any error.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# 時系列分類のためのクラスバランス戦略をもつアクティブラーニングフレームワーク An Active Learning Framework with a Class Balancing Strategy for Time Series Classification ( http://arxiv.org/abs/2405.12122v1 ) ライセンス: Link先を確認	Shemonto Das,	(参考訳) 分類タスクのための機械学習モデルのトレーニングには、多くのサンプルをラベル付けする必要があることが多い。本研究では,実効時系列分類に必要なラベル付きデータの量を削減するためのアクティブラーニング(AL)戦略について検討する。従来のAL技術では、ラベル付けのためのクラス毎のインスタンスの選択を制御できないため、分類性能やインスタンスの選択、特に不均衡な時系列データセットに偏りが生じる可能性がある。そこで本研究では,標準AL戦略と統合された新しいクラスバランスインスタンス選択アルゴリズムを提案する。我々のアプローチは、ラベル付き例が少ないクラスからより多くのインスタンスを選択し、時系列データセットの不均衡に対処することを目的としている。触覚テクスチャ認識と産業断層検出の2つの異なる領域における情報データサンプル選択におけるALフレームワークの有効性を実証する。ロボット工学において,本手法は,ラベル付きトレーニングデータ要求を70%に抑えながら,高性能なテクスチャ分類を実現する。また、AL戦略を用いたロボットテクスチャ分類に異なる滑り窓時間間隔が与える影響についても検討した。合成繊維製造において,業界におけるデータアノテーションのコストと時間を最小限に抑えることを目的とした,障害分類の課題に対処するためにAL手法を適用した。また、AL戦略と統合したクラスバランスインスタンスアルゴリズムを用いて、マルチクラス産業異常データセットにおける実生活クラスの不均衡にも対処する。全体として、この論文は、これらの2つの異なるドメインにわたるALフレームワークの可能性を強調します。 Training machine learning models for classification tasks often requires labeling numerous samples, which is costly and time-consuming, especially in time series analysis. This research investigates Active Learning (AL) strategies to reduce the amount of labeled data needed for effective time series classification. Traditional AL techniques cannot control the selection of instances per class for labeling, leading to potential bias in classification performance and instance selection, particularly in imbalanced time series datasets. To address this, we propose a novel class-balancing instance selection algorithm integrated with standard AL strategies. Our approach aims to select more instances from classes with fewer labeled examples, thereby addressing imbalance in time series datasets. We demonstrate the effectiveness of our AL framework in selecting informative data samples for two distinct domains of tactile texture recognition and industrial fault detection. 
In robotics, our method achieves high-performance texture categorization while significantly reducing labeled training data requirements to 70%. We also evaluate the impact of different sliding window time intervals on robotic texture classification using AL strategies. In synthetic fiber manufacturing, we adapt AL techniques to address the challenge of fault classification, aiming to minimize data annotation cost and time for industries. We also address real-life class imbalances in the multiclass industrial anomalous dataset using our class-balancing instance algorithm integrated with AL strategies. Overall, this thesis highlights the potential of our AL framework across these two distinct domains.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
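The idea of favouring classes with fewer labeled examples during active-learning selection can be sketched as below. This is a hypothetical illustration, not the algorithm from the thesis: the abstract does not say how the class of an unlabeled instance is estimated, so the sketch assumes each candidate comes with a predicted class and an uncertainty score, and the function and variable names are invented.

```python
from collections import Counter


def select_balanced(candidates, labeled_counts, budget):
    """Pick up to `budget` sample ids, favouring under-represented classes.

    candidates: list of (sample_id, predicted_class, uncertainty) tuples.
    labeled_counts: mapping class -> number of already labeled examples.
    Both inputs and the greedy rule are assumptions made for this sketch.
    """
    counts = Counter(labeled_counts)
    # Group candidates by predicted class, most uncertain first.
    pools = {}
    for sample_id, cls, _unc in sorted(candidates, key=lambda c: -c[2]):
        pools.setdefault(cls, []).append(sample_id)

    chosen = []
    while len(chosen) < budget and any(pools.values()):
        # Class with the fewest labeled examples that still has candidates.
        cls = min((c for c in pools if pools[c]), key=lambda c: counts[c])
        chosen.append(pools[cls].pop(0))
        counts[cls] += 1  # treat the pick as if it will be labeled with cls
    return chosen


labeled = {"smooth": 4, "rough": 2}
pool = [
    ("s1", "smooth", 0.9), ("s2", "smooth", 0.4),
    ("r1", "rough", 0.6), ("r2", "rough", 0.8), ("r3", "rough", 0.1),
]
picked = select_balanced(pool, labeled, budget=4)
print(picked)
```

With these made-up inputs the first two picks come from the under-represented "rough" class; once the counts are level, selection alternates between classes.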
# ディープラーニングモデルとメタラーニングモデルを用いたアルツハイマーの磁気共鳴画像分類 Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models ( http://arxiv.org/abs/2405.12126v1 ) ライセンス: Link先を確認	Nida Nasir, Muneeb Ahmed, Neda Afreen, Mustafa Sameer,	(参考訳) 最先端の機械学習アプローチであるディープラーニングは、複雑な高次元データ、特に医療分野における複雑な構造を特定するという点で、従来の機械学習よりも優れています。本研究では,最新のCNNを特徴とする深層学習技術を活用することで,アルツハイマー病(AD)のMRIデータを分類することに焦点を当てた。 MRIなどの脳イメージング技術により、アルツハイマー病に関連する病態生理学的脳変化の測定が可能になった。アルツハイマー病は高齢者の認知症の主要な原因であり、徐々に認知機能障害を引き起こす不可逆的な脳疾患である。本稿では,ソリューションのアプローチに対して個別にベンチマークディープモデルをトレーニングし,その後,複数のCNNの効果を高いリコールと精度の観測に組み合わせるために,アンサンブルアプローチを用いる。ここでは、積み重ね、多数決、高いリコール値のモデルの組み合わせなど、様々な手法を用いてモデルの有効性を評価する。多数決は、概して予測のばらつきを減少させるため、代替のモデリングアプローチよりも優れている。提案手法では,精度スコア0.90,リコールスコア0.89の試験精度90%を報告した。将来、この研究は、信号、画像、その他のデータを含む他の種類の医療データを組み込むように拡張することができる。同じまたは代替のデータセットは、アルツハイマーの検出を強化するために、追加の分類器、ニューラルネットワーク、AI技術で使用することができる。 Deep learning, a cutting-edge machine learning approach, outperforms traditional machine learning in identifying intricate structures in complex high-dimensional data, particularly in the domain of healthcare. This study focuses on classifying Magnetic Resonance Imaging (MRI) data for Alzheimer's disease (AD) by leveraging deep learning techniques characterized by state-of-the-art CNNs. Brain imaging techniques such as MRI have enabled the measurement of pathophysiological brain changes related to Alzheimer's disease. Alzheimer's disease is the leading cause of dementia in the elderly, and it is an irreversible brain illness that causes gradual cognitive function disorder. In this paper, we train some benchmark deep models individually for the approach of the solution and later use an ensembling approach to combine the effect of multiple CNNs towards the observation of higher recall and accuracy. Here, the model's effectiveness is evaluated using various methods, including stacking, majority voting, and the combination of models with high recall values. 
Majority voting performs better than the alternative modelling approaches, as it typically reduces the variance in the predictions. We report a test accuracy of 90% with a precision score of 0.90 and a recall score of 0.89 for our proposed approach. In the future, this study can be extended to incorporate other types of medical data, including signals, images, and other data. The same or alternative datasets can be used with additional classifiers, neural networks, and AI techniques to enhance Alzheimer's detection.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
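Majority voting over several classifiers, as described in the abstract above, can be illustrated with a short sketch. The model outputs and the label names ("AD" for Alzheimer's disease, "CN" for cognitively normal) are made up for the example; the paper's actual CNNs and class set are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model: list of equally long label lists, one per model.
    On a tie, the label seen first among the models wins, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical predictions from three models on four scans.
model_a = ["AD", "CN", "AD", "CN"]
model_b = ["AD", "AD", "CN", "CN"]
model_c = ["CN", "AD", "AD", "CN"]
ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Each individual model disagrees with the ensemble on one of the first three samples, which is the sense in which voting smooths out single-model variance.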
# MoRA:パラメータ効率の良いファインチューニングのための高速更新 MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2405.12130v1 ) ライセンス: Link先を確認	Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang,	(参考訳) 低ランク適応は、大規模言語モデルのためのパラメータ効率の良い微調整法として人気がある。本稿では,LoRAに実装された低ランク更新の影響を解析する。以上の結果から,低ランク更新機構はLLMが新たな知識を効果的に学習し記憶する能力を制限する可能性が示唆された。この観測にインスパイアされたMoRAと呼ばれる新しい手法を提案する。これは2乗行列を用いて、同じ数のトレーニング可能なパラメータを維持しながら高階更新を実現する。これを実現するために、対応する非パラメータ演算子を導入し、入力次元を減らし、平方行列の出力次元を増大させる。さらに、これらの演算子は重量をLLMにマージできることを保証し、この手法をLoRAのように展開することができる。我々は,命令チューニング,数学的推論,継続事前学習,メモリ,事前学習という5つのタスクにまたがって,提案手法の総合的な評価を行う。本手法はメモリ集約型タスクではLoRAより優れ,他のタスクでは同等のパフォーマンスを実現している。 Low-rank adaptation is a popular parameter-efficient fine-tuning method for large language models. In this paper, we analyze the impact of low-rank updating, as implemented in LoRA. Our findings suggest that the low-rank updating mechanism may limit the ability of LLMs to effectively learn and memorize new knowledge. Inspired by this observation, we propose a new method called MoRA, which employs a square matrix to achieve high-rank updating while maintaining the same number of trainable parameters. To achieve it, we introduce the corresponding non-parameter operators to reduce the input dimension and increase the output dimension for the square matrix. Furthermore, these operators ensure that the weight can be merged back into LLMs, which makes our method can be deployed like LoRA. We perform a comprehensive evaluation of our method across five tasks: instruction tuning, mathematical reasoning, continual pretraining, memory and pretraining. Our method outperforms LoRA on memory-intensive tasks and achieves comparable performance on other tasks.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
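The parameter-budget argument in the MoRA abstract can be checked with simple arithmetic. A rank-r LoRA update of a d x k weight uses two matrices with r(d + k) trainable parameters in total; a single square matrix with the same budget can have a side of about the square root of that number, so its rank can be far higher than r. The sketch below only does this counting. The dimensions are illustrative assumptions, and the non-parameter compression and decompression operators from the paper are not modelled.

```python
import math


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r LoRA update for a d x k weight."""
    return r * (d + k)


def square_side_for_budget(d: int, k: int, r: int) -> int:
    """Largest square-matrix side whose parameter count fits the LoRA budget."""
    return math.isqrt(lora_param_count(d, k, r))


# Illustrative sizes only (not taken from the paper's experiments).
d = k = 4096
r = 8
budget = lora_param_count(d, k, r)
side = square_side_for_budget(d, k, r)
print(budget, side, side * side)
```

For these sizes the budget of 65,536 parameters fits a 256 x 256 square matrix, so the update can reach rank 256 instead of rank 8 at the same trainable-parameter count.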
# ソフトウェア工学研究と実践の進化のためのオープンサイエンスのビジョン A Vision on Open Science for the Evolution of Software Engineering Research and Practice ( http://arxiv.org/abs/2405.12132v1 ) ライセンス: Link先を確認	Edson OliveiraJr, Fernanda Madeiral, Alcemir Rodrigues Santos, Christina von Flach, Sergio Soares,	(参考訳) オープンサイエンスは、研究におけるオープンネスとコラボレーションを促進することを目的としており、より科学的、社会的に重要な影響をもたらす。しかしながら、Open Scienceの実践にはいくつかの課題が伴い、現在は適切な報酬が与えられていない。本稿では,これらの課題に対処するためのビジョンを,文化的にも技術的にも,ソフトウェア工学コミュニティの変革に必要なビルディングブロックを結合する概念的枠組みを通じて共有する。このフレームワークの背景にある考え方は、Open Scienceは、ソフトウェア工学の研究、実践、認識、関連する社会的影響を改善するための第一級の要件として扱われる、ということだ。コミュニティとしての私たちにとって、オープンサイエンスの利益を真に受け入れ、得るためには、長い道のりがあります。それでも私たちは、必要な文化のシフトを促進し、ソフトウェアエンジニアリングコミュニティを力づける方向について光を当てています。 Open Science aims to foster openness and collaboration in research, leading to more significant scientific and social impact. However, practicing Open Science comes with several challenges and is currently not properly rewarded. In this paper, we share our vision for addressing those challenges through a conceptual framework that connects essential building blocks for a change in the Software Engineering community, both culturally and technically. The idea behind this framework is that Open Science is treated as a first-class requirement for better Software Engineering research, practice, recognition, and relevant social impact. There is a long road for us, as a community, to truly embrace and gain from the benefits of Open Science. Nevertheless, we shed light on the directions for promoting the necessary culture shift and empowering the Software Engineering community.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# DTLLM-VLT:LLMに基づく視覚言語追跡のための多言語テキスト生成 DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM ( http://arxiv.org/abs/2405.12139v1 ) ライセンス: Link先を確認	Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang,	(参考訳) Visual Language Tracking (VLT)は、指定されたオブジェクトの正確な追跡のために、ビデオから自然言語記述を統合することで、単一のオブジェクト追跡(SOT)を強化する。高レベルのセマンティック情報を活用することで、VLTはオブジェクト追跡をガイドし、視覚的モダリティに依存する制約を緩和する。しかしながら、ほとんどのVLTベンチマークは単一の粒度で注釈付けされており、科学的ガイダンスを提供するための一貫性のあるセマンティックフレームワークが欠如している。さらに、高品質なアノテーションのための人間のアノテーションをコーディネートすることは、手間と時間を要する。これらの課題に対処するために,DTLLM-VLTを導入する。 1) DTLLM-VLTは, 凝集性プロンプトフレームワークを用いて, 科学的および多粒性テキスト記述を生成する。その簡潔で高度に適応可能な設計は、様々なビジュアルトラッキングベンチマークにシームレスに統合することができる。 2) アプローチの展開には,短期追跡,長期追跡,グローバルインスタンス追跡という,3つの重要なベンチマークを選択した。 DTLLM-VLTの実用性と汎用性を示すため,これらのベンチマークに対して,意味情報の範囲と密度を考慮した4つの粒度組合せを提案する。 3) テキストの粒度が異なるVLTベンチマークで比較実験を行い, 各種テキストがトラッキング性能に与える影響を評価し, 解析する。まとめると、この研究はLLMを活用し、VLTタスクの多言語意味情報を効率的かつ多様な視点から提供し、マルチモーダルトラッカーのきめ細かい評価を可能にする。将来的には、この作業をより多くのデータセットに拡張して、ビジョンデータセット理解をサポートするようになると考えています。 Visual Language Tracking (VLT) enhances single object tracking (SOT) by integrating natural language descriptions from a video, for the precise tracking of a specified object. By leveraging high-level semantic information, VLT guides object tracking, alleviating the constraints associated with relying on a visual modality. Nevertheless, most VLT benchmarks are annotated in a single granularity and lack a coherent semantic framework to provide scientific guidance. Moreover, coordinating human annotators for high-quality annotations is laborious and time-consuming. To address these challenges, we introduce DTLLM-VLT, which automatically generates extensive and multi-granularity text to enhance environmental diversity. (1) DTLLM-VLT generates scientific and multi-granularity text descriptions using a cohesive prompt framework. Its succinct and highly adaptable design allows seamless integration into various visual tracking benchmarks. 
(2) We select three prominent benchmarks to deploy our approach: short-term tracking, long-term tracking, and global instance tracking. We offer four granularity combinations for these benchmarks, considering the extent and density of semantic information, thereby showcasing the practicality and versatility of DTLLM-VLT. (3) We conduct comparative experiments on VLT benchmarks with different text granularities, evaluating and analyzing the impact of diverse text on tracking performance. In conclusion, this work leverages LLM to provide multi-granularity semantic information for the VLT task from efficient and diverse perspectives, enabling fine-grained evaluation of multi-modal trackers. In the future, we believe this work can be extended to more datasets to support vision dataset understanding.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# 大規模言語モデルによる問題仕様の緩和 Eliciting Problem Specifications via Large Language Models ( http://arxiv.org/abs/2405.12147v1 ) ライセンス: Link先を確認	Robert E. Wray, James R. Kirk, John E. Laird,	(参考訳) 認知システムは一般に、人間に問題の定義を、認知システムが問題の解決やタスクの実行に使用可能な仕様に変換することを要求する。本稿では,大規模言語モデル(LLM)を用いて,自然言語で定義された問題クラスを半形式仕様にマッピングし,既存の推論学習システムを用いて問題クラスからインスタンスを解く方法を提案する。本稿では,LLM対応認知タスク分析エージェントの設計について述べる。 LLMエージェントによって実装された本システムは,自然言語で指定されたタスクに対する問題空間の定義を生成する。 LLMプロンプトは、AI文学における問題空間の定義と一般的な問題解決戦略(Polya's How to Solve It)から導かれる。認知システムは、問題空間の仕様を使い、ドメイン一般の問題解決戦略(探索のような弱い方法)を適用して、問題クラスから複数の問題を解くことができる。この結果は、予備的ではあるが、問題定式化の切り離しを通じて認知システム研究を加速し、堅牢な推論やオンライン学習のような認知システムのコア能力を維持できる可能性を示唆している。 Cognitive systems generally require a human to translate a problem definition into some specification that the cognitive system can use to attempt to solve the problem or perform the task. In this paper, we illustrate that large language models (LLMs) can be utilized to map a problem class, defined in natural language, into a semi-formal specification that can then be utilized by an existing reasoning and learning system to solve instances from the problem class. We present the design of LLM-enabled cognitive task analyst agent(s). Implemented with LLM agents, this system produces a definition of problem spaces for tasks specified in natural language. LLM prompts are derived from the definition of problem spaces in the AI literature and general problem-solving strategies (Polya's How to Solve It). A cognitive system can then use the problem-space specification, applying domain-general problem solving strategies ("weak methods" such as search), to solve multiple instances of problems from the problem class. This result, while preliminary, suggests the potential for speeding cognitive systems research via disintermediation of problem formulation while also retaining core capabilities of cognitive systems, such as robust inference and online learning.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# バングラデシュ、野生で自家用車検出 Bangladeshi Native Vehicle Detection in Wild ( http://arxiv.org/abs/2405.12150v1 ) ライセンス: Link先を確認	Bipin Saha, Md. Johirul Islam, Shaikh Khaled Mostaque, Aditya Bhowmik, Tapodhir Karmakar Taton, Md. Nakib Hayat Chowdhury, Mamun Bin Ibne Reaz,	(参考訳) 自律ナビゲーションの成功は、地域固有の車両検出データセットの不足によって妨げられ、コンテキスト認識システムの開発を妨げる、堅牢で正確な車両認識に依存している。そこで本研究では,バングラデシュで最もよく見られる車両群を対象とした固有車両検出データセットを提案する。 17の異なる車両クラスが考慮され、17326の画像の81542の完全な注釈が付けられている。各画像幅は少なくとも1280pxに設定されている。データセットの平均車両バウンディングボックス-イメージ比は4.7036である。このバングラデシュのNative Vehicle Dataset(BNVD)は、いくつかの地理的、照明、さまざまな車両サイズ、そして、驚きのシナリオでより堅牢な方向を考慮に入れている。 BNVDデータセットを調べる上で、この研究は4つの連続したYou Only Look Once(YOLO)モデル、すなわちYOLO v5、v6、v7、v8で徹底的な評価を提供する。これらのデータセットの有効性は、すでに使用されている他の車両データセットと方法論的に評価され、対比される。 BNVDデータセットは、結合(IoU)上の50%の交差点における平均平均精度(mAP)が0.848で、0.841と0.774の精度とリコール値に対応する。研究結果は、IoUの0.5から0.95の範囲で、mAPが0.643であることを示している。実験の結果,BNVDデータセットは車両分布の信頼性の高い表現として機能し,かなりの複雑さを示すことがわかった。 The success of autonomous navigation relies on robust and precise vehicle recognition, hindered by the scarcity of region-specific vehicle detection datasets, impeding the development of context-aware systems. To advance terrestrial object detection research, this paper proposes a native vehicle detection dataset for the most commonly appeared vehicle classes in Bangladesh. 17 distinct vehicle classes have been taken into account, with fully annotated 81542 instances of 17326 images. Each image width is set to at least 1280px. The dataset's average vehicle bounding box-to-image ratio is 4.7036. This Bangladesh Native Vehicle Dataset (BNVD) has accounted for several geographical, illumination, variety of vehicle sizes, and orientations to be more robust on surprised scenarios. In the context of examining the BNVD dataset, this work provides a thorough assessment with four successive You Only Look Once (YOLO) models, namely YOLO v5, v6, v7, and v8. 
The effectiveness of this dataset is methodically evaluated and contrasted with other vehicle datasets already in use. On the BNVD dataset, the mean average precision (mAP) at 50% intersection over union (IoU) is 0.848, with corresponding precision and recall values of 0.841 and 0.774. The research findings indicate a mAP of 0.643 over the IoU range of 0.5 to 0.95. The experiments show that the BNVD dataset serves as a reliable representation of vehicle distribution and presents considerable complexities.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
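The detection metrics quoted in this abstract all rest on box overlap: a prediction counts as correct at "50% IoU" when its intersection over union with a ground-truth box is at least 0.5. The following is a minimal sketch of that overlap test, plus the F1 score implied by the reported precision and recall. The function names and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration, not the evaluation code used by the authors.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(round(f1_score(0.841, 0.774), 3))
```

With the reported precision of 0.841 and recall of 0.774, the harmonic mean comes out at roughly 0.806. The abstract itself does not report an F1 value, so this number is derived here rather than quoted.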
# Fennec: ブランチとブリッジによって拡張されたきめ細かい言語モデルの評価と補正 Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging ( http://arxiv.org/abs/2405.12163v1 ) ライセンス: Link先を確認	Xiaobo Liang, Haoke Zhang, Helan hu, Juntao Li, Jun Xu, Min Zhang,	(参考訳) 大規模言語モデルの急速な進歩は、主に人間の意図に合わせることに焦点を当てた、無数の現実世界のタスクにまたがる多くの応用をもたらした。しかし、人間の意図に固有の複雑さは、労働集約的かつ時間を要する人間の評価に依存する必要がある。この制約を緩和するため,オープンソースの大規模言語モデルを評価対象として活用するパラダイムを探求し,GPT-4の利用傾向と整合する。特に、ステップバイステップ評価フレームワークFennec(Fine-grained EvaluatioN and correctioN Extended through branChing and bridging)を提案する。具体的には, 分岐操作により, 評価タスクを様々な次元と粒度に分割し, 評価に伴う課題を軽減する。同時に、ブリッジング操作は多様なトレーニングデータセットに適合し、さまざまな評価タスクを増強する。実験では,GPT-4の能力に近づき,さまざまな広く採用されているベンチマークにおいて,我々の7Bモデルは,オープンソースの大規模評価モデルよりも常に優れています。評価モデルにより誘導される微粒化補正機能を用いて複数のモデル応答を洗練し, 改良により応答の質が向上し, MT-Benchでは1-2点の改善が得られた。私たちのコードはGitHub( https://github.com/dropreg/Fennec )で公開されています。 The rapid advancement of large language models has given rise to a plethora of applications across a myriad of real-world tasks, mainly centered on aligning with human intent. However, the complexities inherent in human intent necessitate a dependence on labor-intensive and time-consuming human evaluation. To alleviate this constraint, we delve into the paradigm of employing open-source large language models as evaluators, aligning with the prevailing trend of utilizing GPT-4. Particularly, we present a step-by-step evaluation framework: Fennec, capable of Fine-grained EvaluatioN and correctioN Extended through branChing and bridging. Specifically, the branching operation dissects the evaluation task into various dimensions and granularities, thereby alleviating the challenges associated with evaluation. Concurrently, the bridging operation amalgamates diverse training datasets, augmenting the variety of evaluation tasks. 
In experimental trials, our 7B model consistently outperforms open-source larger-scale evaluation models across various widely adopted benchmarks in terms of both Agreement and Consistency, closely approaching the capabilities of GPT-4. We employ the fine-grained correction capabilities induced by the evaluation model to refine multiple model responses, and the results show that the refinement elevates the quality of responses, leading to an improvement of 1-2 points on the MT-Bench. Our code is available on GitHub at https://github.com/dropreg/Fennec .	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# AI能力のオープンソースアセスメント:AI分析ツールの普及、競合モデルのレプリケーション、Zhousidunデータセット Open-Source Assessments of AI Capabilities: The Proliferation of AI Analysis Tools, Replicating Competitor Models, and the Zhousidun Dataset ( http://arxiv.org/abs/2405.12167v1 ) ライセンス: Link先を確認	Ritwik Gupta, Leah Walker, Eli Glickman, Raine Koizumi, Sarthak Bhatnagar, Andrew W. Reddie,	(参考訳) 人工知能(AI)の軍事能力への統合は、世界中の主要な軍事力の標準となっている。これらのAIモデルがどのように機能するかを理解することは、戦略的アドバンテージの維持とセキュリティの確保に不可欠である。本稿は、アメリカと連合国の駆逐艦に重要な部品を徹底的にラベル付けした中国指向のデータセットであるZhousidunデータセットの詳細な検証を通して、軍事AIモデルを分析するためのオープンソース手法を実証する。このデータセット上で、最先端のコンピュータビジョンモデルのレプリケーションを実演することで、オープンソースツールをどのように活用して、重要な軍事AI機能を評価し、理解することができるかを説明します。この方法論は、AI対応軍事能力の性能と可能性を評価するための堅牢なフレームワークを提供し、戦略評価の正確性と信頼性を高める。 The integration of artificial intelligence (AI) into military capabilities has become a norm for major military power across the globe. Understanding how these AI models operate is essential for maintaining strategic advantages and ensuring security. This paper demonstrates an open-source methodology for analyzing military AI models through a detailed examination of the Zhousidun dataset, a Chinese-originated dataset that exhaustively labels critical components on American and Allied destroyers. By demonstrating the replication of a state-of-the-art computer vision model on this dataset, we illustrate how open-source tools can be leveraged to assess and understand key military AI capabilities. This methodology offers a robust framework for evaluating the performance and potential of AI-enabled military capabilities, thus enhancing the accuracy and reliability of strategic assessments.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# 医用イメージングソフトの現状 State of the Practice for Medical Imaging Software ( http://arxiv.org/abs/2405.12171v1 ) ライセンス: Link先を確認	W. Spencer Smith, Ao Dong, Jacques Carette, Michael D. Noseworthy,	(参考訳) 我々は48の候補者から29の医療画像プロジェクトを選択し、各ソフトウェアプロジェクトに対して108の質問に答えて10のソフトウェア品質を評価し、29の開発チームのうち8人にインタビューを行った。定量的データに基づいて、分析階層プロセス(AHP)を用いてMIソフトウェアをランク付けした。上位4つのソフトウェア製品は、3D Slicer、ImageJ、Fiji、OHIF Viewerである。調査ソフトウェア開発ガイドラインが推奨するドキュメントアーティファクトの88%、MIプロジェクトの100%がバージョン管理ツールを使用しており、開発者は準アジャイルなソフトウェア開発プロセスを使っているようです。しかしながら、いくつかの推奨されたアーティファクトの希少さ、継続的統合の低使用(17%)、ユニットテストの低使用(プロジェクトの約50%)、ドキュメントによる改善の余地があるため、現在のプラクティスは既存のガイドラインから逸脱している。開発者へのインタビューから、開発時間不足、資金不足、技術的ハードル、正確性の確保、ユーザビリティ、保守性、再現性という、潜在的な懸念点の5つと2つの特性を特定しました。インタビュアーは、プラクティスの状態を改善し、特定された痛点に対処し、ソフトウェアの品質を改善するための戦略を提案した。ドキュメントの増加、データセットの強化によるテストの向上、継続的インテグレーションの利用の増加、Webアプリケーションへの移行、リンタの採用、ピアレビューの利用、変更のための設計、保証ケースの追加、"Generate All Things"アプローチの導入。 We selected 29 medical imaging projects from 48 candidates, assessed 10 software qualities by answering 108 questions for each software project, and interviewed 8 of the 29 development teams. Based on the quantitative data, we ranked the MI software with the Analytic Hierarchy Process (AHP). The four top-ranked software products are 3D Slicer, ImageJ, Fiji, and OHIF Viewer. Generally, MI software is in a healthy state as shown by the following: we observed 88% of the documentation artifacts recommended by research software development guidelines, 100% of MI projects use version control tools, and developers appear to use the common quasi-agile research software development process. However, the current state of the practice deviates from the existing guidelines because of the rarity of some recommended artifacts, low usage of continuous integration (17% of the projects), low use of unit testing (about 50% of projects), and room for improvement with documentation (six of nine developers felt their documentation was not clear enough). 
From interviewing the developers, we identified five pain points and two qualities of potential concern: lack of development time, lack of funding, technology hurdles, ensuring correctness, usability, maintainability, and reproducibility. The interviewees proposed strategies to improve the state of the practice, to address the identified pain points, and to improve software quality. Combining their ideas with ours, we have the following list of recommendations: increase documentation, increase testing by enriching datasets, increase continuous integration usage, move to web applications, employ linters, use peer reviews, design for change, add assurance cases, and incorporate a "Generate All Things" approach.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
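The ranking step in this abstract relies on the Analytic Hierarchy Process, which turns pairwise "A is k times as good as B" judgements into a normalized priority weight per alternative. The sketch below uses the row geometric-mean method, a common approximation to the principal-eigenvector weights. The 3x3 comparison matrix is invented for illustration; the abstract does not give the authors' actual judgements or say which AHP variant they used.

```python
import math


def ahp_priorities(pairwise):
    """Approximate AHP priority weights from a reciprocal pairwise comparison matrix.

    pairwise[i][j] states how strongly alternative i is preferred over j,
    with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    # Geometric mean of each row, then normalize so the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


if __name__ == "__main__":
    # Hypothetical judgements for three unnamed projects on a single quality.
    comparisons = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    weights = ahp_priorities(comparisons)
    ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    print(weights, ranking)
```

For a perfectly consistent matrix such as this one, the geometric-mean weights coincide with the eigenvector solution. With inconsistent real-world judgements the two can differ slightly, which is why the method is labeled an approximation here.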
# CT-Eval: 大規模言語モデルにおける中国語のテキスト・ツー・タブル性能のベンチマーク CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models ( http://arxiv.org/abs/2405.12174v1 ) ライセンス: Link先を確認	Haoxiang Shi, Jiaan Wang, Jiarong Xu, Cen Wang, Tetsuya Sakai,	(参考訳) Text-to-Tableは構造化されたテーブルを生成し、構造化されていないドキュメントからキー情報を伝達することを目的としている。既存のテキストからテーブルへのデータセットは、典型的には英語指向であり、非英語言語の研究を制限する。一方、大規模言語モデル(LLM)の出現は、理論的に他の言語でテキスト・ツー・テーブルを可能にする多言語設定(ChatGPTなど)における一般的なタスク・ソルバとして大きな成功を収めている。本稿では,この課題に対するLLMのベンチマークを行うために,中国語のテキスト・ツー・テーブル・データセットであるCT-Evalを提案する。英語のテキスト・ツー・テーブル・データセットの予備分析では、データセット構築の2つの重要な要素として、データの多様性とデータ幻覚を挙げている。これにインスパイアされたCT-Evalデータセットは、人気の中国の多分野オンライン百科事典をソースとして選択し、データ多様性を確保するために28のドメインをカバーする。データ幻覚を最小化するために、まずLLMを訓練して、幻覚でタスクサンプルを判断・フィルタリングし、次に人間の注釈を使って、検証とテストセットの幻覚をきれいにする。このプロセスの後、CT-Evalは88.6Kのタスクサンプルを含む。 CT-Eval を用いて,オープンソースおよびクローズドソース LLM の性能評価を行った。以上の結果から,ゼロショットLLM(GPT-4を含む)は,人間の判断に比較して有意な性能差がみられた。さらに、微調整後、オープンソースのLLMはテキスト・ツー・テーブルの能力を大幅に向上させ、GPT-4を大きなマージンで上回る。要するに、CT-Evalは、既存のLLMの中国語のテキスト・ツー・テーブル能力の評価と迅速な理解を支援するだけでなく、LLMのテキスト・ツー・テーブル性能を著しく向上させる貴重なリソースとしても役立つ。 Text-to-Table aims to generate structured tables to convey the key information from unstructured documents. Existing text-to-table datasets are typically English-oriented, limiting the research in non-English languages. Meanwhile, the emergence of large language models (LLMs) has shown great success as general task solvers in multi-lingual settings (e.g., ChatGPT), theoretically enabling text-to-table in other languages. In this paper, we propose a Chinese text-to-table dataset, CT-Eval, to benchmark LLMs on this task. Our preliminary analysis of English text-to-table datasets highlights two key factors for dataset construction: data diversity and data hallucination. Inspired by this, the CT-Eval dataset selects a popular Chinese multidisciplinary online encyclopedia as the source and covers 28 domains to ensure data diversity. 
To minimize data hallucination, we first train an LLM to judge and filter out the task samples with hallucination, then employ human annotators to clean the hallucinations in the validation and testing sets. After this process, CT-Eval contains 88.6K task samples. Using CT-Eval, we evaluate the performance of open-source and closed-source LLMs. Our results reveal that zero-shot LLMs (including GPT-4) still have a significant performance gap compared with human judgment. Furthermore, after fine-tuning, open-source LLMs can significantly improve their text-to-table ability, outperforming GPT-4 by a large margin. In short, CT-Eval not only helps researchers evaluate and quickly understand the Chinese text-to-table ability of existing LLMs but also serves as a valuable resource to significantly improve the text-to-table performance of LLMs.	翻訳日:2024-05-21 12:45:20 公開日:2024-05-20
# 説明可能なAIの強化: CNN解釈性のためのGradCAMとLRPを組み合わせたハイブリッドアプローチ Enhancing Explainable AI: A Hybrid Approach Combining GradCAM and LRP for CNN Interpretability ( http://arxiv.org/abs/2405.12175v1 ) ライセンス: Link先を確認	Vaibhav Dhore, Achintya Bhat, Viraj Nerlekar, Kashyap Chavhan, Aniket Umare,	(参考訳) 本稿では,GradCAM と LRP の組合せを用いて,CNN ベースモデルの出力を説明する手法を提案する。どちらの手法も、予測に重要な入力領域をハイライトすることで視覚的説明を生成する。新しい手法では、GradCAMが生成した説明を最初に処理してノイズを除去する。次に、処理された出力とLRPの出力とを要素ごとに乗算する。最後に、その積にガウスぼかしが適用される。提案手法をGradCAMとLRPと比較し,Faithfulness, Robustness, Complexity, Localisation, Randomisationの指標について検討した。この手法はComplexityにおいてGradCAMとLRPの両方よりも優れており、他の指標では少なくともどちらか一方より優れていることが観察された。 We present a new technique that explains the output of a CNN-based model using a combination of GradCAM and LRP methods. Both of these methods produce visual explanations by highlighting input regions that are important for predictions. In the new method, the explanation produced by GradCAM is first processed to remove noise. The processed output is then multiplied elementwise with the output of LRP. Finally, a Gaussian blur is applied to the product. We compared the proposed method with GradCAM and LRP on the metrics of Faithfulness, Robustness, Complexity, Localisation and Randomisation. It was observed that this method performs better on Complexity than both GradCAM and LRP and is better than at least one of them on the other metrics.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
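The three-step pipeline described in this abstract (denoise the GradCAM map, multiply it elementwise with the LRP map, blur the product) can be sketched in a few lines. The version below works on plain nested lists so that it has no dependencies. It assumes both maps have already been computed and resized to the same shape, and it uses simple thresholding as a stand-in for the noise-removal step, which the abstract does not specify. All function names and default parameters are illustrative, not the authors' implementation.

```python
import math


def denoise(cam, threshold=0.2):
    """Zero out weak activations (assumed stand-in for the paper's noise removal)."""
    return [[v if v >= threshold else 0.0 for v in row] for row in cam]


def gaussian_kernel(radius=1, sigma=1.0):
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, radius=1, sigma=1.0):
    """Separable Gaussian blur: one horizontal pass, then one vertical pass."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def hybrid_explanation(gradcam_map, lrp_map, threshold=0.2):
    """Denoised GradCAM times LRP (elementwise), followed by a Gaussian blur."""
    cleaned = denoise(gradcam_map, threshold)
    product = [[c * l for c, l in zip(c_row, l_row)] for c_row, l_row in zip(cleaned, lrp_map)]
    return gaussian_blur(product)
```

The multiplication acts as a gate: the coarse GradCAM map decides which regions survive, while the finer-grained LRP relevance supplies detail inside those regions. In practice an array library would replace the nested loops; they are kept explicit here only to make each step visible.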
# 直交多項式を用いたテンポラルカーネルの構築 Building Temporal Kernels with Orthogonal Polynomials ( http://arxiv.org/abs/2405.12179v1 ) ライセンス: Link先を確認	Yan Ru Pei, Olivier Coenen,	(参考訳) 直交多項式基底関数から生成される時間的畳み込みカーネルを含むPLEIADES(PoLynomial Expansion In Adaptive Distributed Event-based Systems)と呼ばれるモデルのクラスを紹介する。我々は、これらのネットワークをイベントベースのデータで相互接続して、オンラインの時空間分類と検出を低レイテンシで行うことに重点を置いている。構造化時間カーネルとイベントベースデータを使用することで、さらなる微調整をすることなく、ネットワークの離散化ステップサイズとともにデータのサンプルレートを変更できる。我々は3つのイベントベースのベンチマークを実験し、メモリと計算コストを大幅に削減した大きなマージンで3つすべてに対して最先端の結果を得た。達成しました。 1) DVS128ハンドジェスチャー認識データセット上の192Kパラメータによる99.59%の精度、および小さな出力フィルタによる100%の精度。 2)AIS2024眼球追跡課題における277Kパラメータによる99.58%の検査精度,及び 3) ProPHESEE 1 Megapixel Automotive Detection Datasetに576kパラメータを持つ0.556mAP。 We introduce a class of models named PLEIADES (PoLynomial Expansion In Adaptive Distributed Event-based Systems), which contains temporal convolution kernels generated from orthogonal polynomial basis functions. We focus on interfacing these networks with event-based data to perform online spatiotemporal classification and detection with low latency. By virtue of using structured temporal kernels and event-based data, we have the freedom to vary the sample rate of the data along with the discretization step-size of the network without additional finetuning. We experimented with three event-based benchmarks and obtained state-of-the-art results on all three by large margins with significantly smaller memory and compute costs. We achieved: 1) 99.59% accuracy with 192K parameters on the DVS128 hand gesture recognition dataset and 100% with a small additional output filter; 2) 99.58% test accuracy with 277K parameters on the AIS 2024 eye tracking challenge; and 3) 0.556 mAP with 576k parameters on the PROPHESEE 1 Megapixel Automotive Detection Dataset.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
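The claim that the sample rate and discretization step can change "without additional finetuning" follows from parameterizing each temporal kernel as a continuous function: the learned quantities are polynomial coefficients, and the discrete filter taps are obtained by sampling that function on whatever grid is needed. The sketch below illustrates the idea with Legendre polynomials on the interval [-1, 1]. The choice of Legendre polynomials, the interval, and all names are assumptions for illustration; the abstract does not state which orthogonal family or parameterization PLEIADES uses.

```python
def legendre_basis(degree, t):
    """Values P_0(t) .. P_degree(t) computed with Bonnet's recurrence."""
    values = [1.0, t]
    for n in range(1, degree):
        # (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def sample_kernel(coeffs, num_taps):
    """Discretize k(t) = sum_n coeffs[n] * P_n(t) at num_taps evenly spaced points."""
    degree = len(coeffs) - 1
    taps = []
    for i in range(num_taps):
        t = -1.0 + 2.0 * i / (num_taps - 1) if num_taps > 1 else 0.0
        basis = legendre_basis(degree, t)
        taps.append(sum(c * p for c, p in zip(coeffs, basis)))
    return taps


if __name__ == "__main__":
    coeffs = [0.0, 0.0, 1.0]           # hypothetical "learned" coefficients
    coarse = sample_kernel(coeffs, 3)   # low sample rate
    fine = sample_kernel(coeffs, 5)     # higher sample rate, same coefficients
    print(coarse, fine)
```

Because both tap sets come from the same coefficients, the coarse filter is a subsampled version of the fine one. That shared underlying function is the property that lets a network of this kind be re-discretized for a different event rate without retraining.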
# 適応型ノードレベル重み学習を用いた多階グラフクラスタリング Multi-order Graph Clustering with Adaptive Node-level Weight Learning ( http://arxiv.org/abs/2405.12183v1 ) ライセンス: Link先を確認	Ye Liu, Xuelei Lin, Yejia Chen, Reynold Cheng,	(参考訳) 現在のグラフクラスタリング手法では、個々のノードとエッジの接続が強調される一方で、モチーフのレベルで上位の組織を無視している。高階グラフクラスタリングアプローチは、モチーフベースのハイパーグラフによって設計されている。しかし、これらのアプローチは、しばしばハイパーグラフの断片化の問題に悩まされ、クラスタリング性能が大幅に低下する。さらに、現実世界のグラフは通常多様なモチーフを含み、ノードは複数のモチーフに参加する。重要な課題は、ノードレベルで複数のモチーフからの情報を統合することで、正確なクラスタリング結果を実現する方法だ。本稿では,複数の高階構造とエッジ接続をノードレベルで統合する多階グラフクラスタリングモデル(MOGC)を提案する。 MOGCは適応的な重み学習機構を用いて、各ノードに対する異なるモチーフの寄与を自動的に調整する。これはハイパーグラフの断片化問題に対処するだけでなく、クラスタリングの精度を高める。 MOGCは、交互に最小化アルゴリズムにより効率よく解決される。 7つの実世界のデータセットの実験では、MOGCの有効性が示されている。 Current graph clustering methods emphasize individual node and edge connections, while ignoring higher-order organization at the level of motifs. Recently, higher-order graph clustering approaches have been designed using motif-based hypergraphs. However, these approaches often suffer seriously from the hypergraph fragmentation issue, which greatly degrades clustering performance. Moreover, real-world graphs usually contain diverse motifs, with nodes participating in multiple motifs. A key challenge is how to achieve precise clustering results by integrating information from multiple motifs at the node level. In this paper, we propose a multi-order graph clustering model (MOGC) to integrate multiple higher-order structures and edge connections at the node level. MOGC employs an adaptive weight learning mechanism to automatically adjust the contributions of different motifs for each node. This not only tackles the hypergraph fragmentation issue but also enhances clustering accuracy. MOGC is efficiently solved by an alternating minimization algorithm. Experiments on seven real-world datasets illustrate the effectiveness of MOGC.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# Approximate Unrolled Differentation による学習データ帰属 Training Data Attribution via Approximate Unrolled Differentation ( http://arxiv.org/abs/2405.12186v1 ) ライセンス: Link先を確認	Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse,	(参考訳) 多くのトレーニングデータ属性(TDA)メソッドは、トレーニングセットから1つ以上のデータポイントが削除された場合、モデルの振る舞いがどのように変化するかを推定することを目的としている。影響関数のような暗黙の微分に基づく手法は、計算的に効率的にできるが、不特定性、最適化アルゴリズムの暗黙のバイアス、多段階の訓練パイプラインを考慮できない。対照的に、アンロールに基づくメソッドはこれらの問題に対処するが、スケーラビリティの課題に直面している。本研究では、暗黙差分法とアンローリング法を結合し、インフルエンス関数式を用いて計算した近似アンローリング法であるSourceを導入する。アンローリングベースのアプローチに比べて計算効率は良いが、ソースは非収束モデルやマルチステージトレーニングパイプラインなど、暗黙差分に基づくアプローチが苦戦している場合に適している。実証的に、ソースは既存のTDA技術よりも、特に暗黙差分法に基づくアプローチが不十分な環境では、対実予測で優れている。 Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges. In this work, we connect the implicit-differentiation-based and unrolling-based approaches and combine their benefits by introducing Source, an approximate unrolling-based TDA method that is computed using an influence-function-like formula. While being computationally efficient compared to unrolling-based approaches, Source is suitable in cases where implicit-differentiation-based approaches struggle, such as in non-converged models and multi-stage training pipelines. Empirically, Source outperforms existing TDA techniques in counterfactual prediction, especially in settings where implicit-differentiation-based approaches fall short.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 企業責任AI研究の幅と幅 The Narrow Depth and Breadth of Corporate Responsible AI Research ( http://arxiv.org/abs/2405.12193v1 ) ライセンス: Link先を確認	Nur Ahmed, Amit Das, Kirsten Martin, Kawshik Banerjee,	(参考訳) AIの変革的なポテンシャルは、驚くべき機会だけでなく、重大なリスクをもたらし、責任あるAI開発とデプロイメントの重要性を強調している。この分野に重点を置いているにもかかわらず、AI研究における産業の関与、すなわちAIの倫理的、社会的、法的側面に対する批判的な評価について、限定的な理解がある。このギャップに対処するため、産業のエンゲージメントを定量化するために、5つの異なるデータセットにまたがる複数の方法を用いて、600万以上のピアレビュー記事と3200万の特許引用を分析しました。我々の研究結果によると、AI企業の大多数は、この重要なAI分野への関与が限られているか、全くないことがわかった。我々は、従来のAI研究における業界の支配的な存在と、その責任あるAIへの関与が限定的であることの相違を示す。先進的なAI企業は、従来のAI研究や主要な学術機関の貢献と比べて、責任あるAI研究の成果が著しく低い。私たちの言語分析では、業界におけるAI研究の責任範囲が狭くなり、対処される主要なトピックの多様性が欠如しています。当社の大規模特許引用分析は、責任あるAI研究とAI技術の商業化との明確な断絶を明らかにし、業界特許が責任あるAI文献によって生成された洞察に基づいて構築されることは滅多にないことを示唆している。このギャップは、AI開発が社会的に最適な経路から分岐する可能性を強調し、倫理的および社会的意味の考慮が不十分なために意図しない結果のリスクを負う。我々の結果は、学術知識を吸収し、公的な信頼を育み、AIが引き起こす社会的害を積極的に軽減するために、業界が責任あるAI研究を公然と行う必要性を強調している。 The transformative potential of AI presents remarkable opportunities, but also significant risks, underscoring the importance of responsible AI development and deployment. Despite a growing emphasis on this area, there is limited understanding of industry's engagement in responsible AI research, i.e., the critical examination of AI's ethical, social, and legal dimensions. To address this gap, we analyzed over 6 million peer-reviewed articles and 32 million patent citations using multiple methods across five distinct datasets to quantify industry's engagement. Our findings reveal that the majority of AI firms show limited or no engagement in this critical subfield of AI. We show a stark disparity between industry's dominant presence in conventional AI research and its limited engagement in responsible AI. Leading AI firms exhibit significantly lower output in responsible AI research compared to their conventional AI research and the contributions of leading academic institutions. 
Our linguistic analysis documents a narrower scope of responsible AI research within industry, with a lack of diversity in key topics addressed. Our large-scale patent citation analysis uncovers a pronounced disconnect between responsible AI research and the commercialization of AI technologies, suggesting that industry patents rarely build upon insights generated by the responsible AI literature. This gap highlights the potential for AI development to diverge from a socially optimal path, risking unintended consequences due to insufficient consideration of ethical and societal implications. Our results highlight the urgent need for industry to publicly engage in responsible AI research to absorb academic knowledge, cultivate public trust, and proactively mitigate AI-induced societal harms.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# ソフトウェア開発におけるChatGPTの影響に関する開発者の認識: 調査 Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey ( http://arxiv.org/abs/2405.12195v1 ) ライセンス: Link先を確認	Thiago S. Vaillant, Felipe Deveza de Almeida, Paulo Anselmo M. S. Neto, Cuiyun Gao, Jan Bosch, Eduardo Santana de Almeida,	(参考訳) ChatGPTや類似システムを含むLarge Language Models (LLMs) は進歩を続けており、その堅牢な自然言語処理能力と多様なアプリケーションが注目を集めている。それでも、人工知能(AI)とソフトウェア工学(SE)の収束がますます認知されているにもかかわらず、この収束がソフトウェア開発者の実践や認識に与える影響に関する研究が不足している。ソフトウェア開発者がChatGPTのようなAIツールをどのように認識し、関与しているかを理解することは、AI駆動ツールをソフトウェア開発プロセスに組み込むことによる影響と潜在的な課題を解明するために不可欠である。本稿では,207人のソフトウェア開発者を対象に,ChatGPTがソフトウェア品質,生産性,仕事満足度に与える影響について調査を行った。さらに、この研究は、ChatGPTの今後の適応に対する開発者の期待、潜在的な仕事の移り変わりに関する懸念、規制介入の視点を掘り下げている。 As Large Language Models (LLMs), including ChatGPT and analogous systems, continue to advance, their robust natural language processing capabilities and diverse applications have garnered considerable attention. Nonetheless, despite the increasing acknowledgment of the convergence of Artificial Intelligence (AI) and Software Engineering (SE), there is a lack of studies involving the impact of this convergence on the practices and perceptions of software developers. Understanding how software developers perceive and engage with AI tools, such as ChatGPT, is essential for elucidating the impact and potential challenges of incorporating AI-driven tools in the software development process. In this paper, we conducted a survey with 207 software developers to understand the impact of ChatGPT on software quality, productivity, and job satisfaction. Furthermore, the study delves into developers' expectations regarding future adaptations of ChatGPT, concerns about potential job displacement, and perspectives on regulatory interventions.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 多視点3次元物体検出のための多視点注意コンテキスト化 Multi-View Attentive Contextualization for Multi-View 3D Object Detection ( http://arxiv.org/abs/2405.12200v1 ) ライセンス: Link先を確認	Xianpeng Liu, Ce Zheng, Ming Qian, Nan Xue, Chen Chen, Zhebin Zhang, Chen Li, Tianfu Wu,	(参考訳) 本稿では,Multi-View Attentive Contextualization(MvACon)を提案する。クエリベースのMV3Dオブジェクト検出の分野で顕著な進歩があったにもかかわらず、先行技術は、高い計算コストのために、高解像度の2D特徴を高精細な注意ベースリフトで活用することの欠如や、3Dクエリの低精細なグラウンド化から、スパースな注意ベースリフトでのマルチスケールの2D特徴まで、しばしば悩まされる。提案したMvAConは,2D-to-3Dの特徴持ち上げ手法に非依存な,表現的に密度が高く,計算的に疎い特徴文脈化方式を用いて,一石二鳥を実現する。実験では、提案したMvAConは、BEVFormerと最近の3Dデフォルマブルアテンション(DFA3D)とPETRの両方を用いて、nuScenesベンチマークで徹底的にテストされ、特に位置、方向、速度予測におけるパフォーマンスの向上において、一貫した検出性能の向上を示す。また、同様の改善でBEVFormerを使用してWaymo-miniベンチマークでもテストされている。我々は,大域的なクラスタベースのコンテキストが,MV3Dオブジェクト検出のための濃密なシーンレベルのコンテキストを効果的に符号化できることを質的に定量的に示す。提案したMvAConの有望な結果は、コンピュータビジョンにおける格言「(contextualized) feature matters」を補強する。 We present Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress witnessed in the field of query-based MV3D object detection, prior art often suffers from either the lack of exploiting high-resolution 2D features in dense attention-based lifting, due to high computational costs, or from insufficiently dense grounding of 3D queries to multi-scale 2D features in sparse attention-based lifting. Our proposed MvACon kills two birds with one stone using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments, the proposed MvACon is thoroughly tested on the nuScenes benchmark, using both the BEVFormer and its recent 3D deformable attention (DFA3D) variant, as well as the PETR, showing consistent detection performance improvement, especially in enhancing performance in location, orientation, and velocity prediction. 
It is also tested on the Waymo-mini benchmark using BEVFormer, with similar improvements. We show qualitatively and quantitatively that global cluster-based contexts effectively encode dense scene-level contexts for MV3D object detection. The promising results of our proposed MvACon reinforce the adage in computer vision: "(contextualized) feature matters".	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 任意スケール超解像前における学習可能な周波数認識損失を有する階層型ニューラル演算子変換器 Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution ( http://arxiv.org/abs/2405.12202v1 ) ライセンス: Link先を確認	Xihaier Luo, Xiaoning Qian, Byung-Jun Yoon,	(参考訳) 本研究では, 連続性, マルチスケール物理, 高周波信号の複雑化といった複雑な課題を伴って, 科学的データの解法を強化するため, 任意のスケール超解法を提案する。演算子学習において,提案手法は分解能不変である。我々のモデルの中核は階層型ニューラル演算子であり、ガレルキン型自己認識機構を活用し、関数空間間のマッピングの効率的な学習を可能にする。シンクフィルタは階層内の異なるレベルの情報伝達を容易にするために用いられ、提案したニューラル演算子の表現等価性を保証する。さらに、入力データのスペクトル再構成から導出される学習可能な事前構造を導入する。この損失はモデルに依存しず、画素寄与の重み付けを動的に調整し、モデル全体の勾配を効果的にバランスさせるように設計されている。我々は、さまざまなドメインからの多様なデータセットに関する広範な実験を行い、様々な最先端のSR手法からなる強力なベースラインと比較して一貫した改善を示す。 In this work, we present an arbitrary-scale super-resolution (SR) method to enhance the resolution of scientific data, which often involves complex challenges such as continuity, multi-scale physics, and the intricacies of high-frequency signals. Grounded in operator learning, the proposed method is resolution-invariant. The core of our model is a hierarchical neural operator that leverages a Galerkin-type self-attention mechanism, enabling efficient learning of mappings between function spaces. Sinc filters are used to facilitate the information transfer across different levels in the hierarchy, thereby ensuring representation equivalence in the proposed neural operator. Additionally, we introduce a learnable prior structure that is derived from the spectral resizing of the input data. This loss prior is model-agnostic and is designed to dynamically adjust the weighting of pixel contributions, thereby balancing gradients effectively across the model. We conduct extensive experiments on diverse datasets from different domains and demonstrate consistent improvements compared to strong baselines, which consist of various state-of-the-art SR methods.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# LLMのメタ認知能力:数学的問題解決における探索 Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving ( http://arxiv.org/abs/2405.12205v1 ) ライセンス: Link先を確認	Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora,	(参考訳) メタ認知的知識(Metacognitive knowledge)とは、人間の思考と推論過程に関する直感的な知識のことである。今日の最高のLCMは明らかに推論プロセスを持っています。本論文は,メタ認知的知識をもち,課題を与えられた場合のスキルや手順を名付ける能力を含む証拠を提示する。そこで我々は,まずこれを数学推論の文脈で探求し,強力なLLMを用いて有能なスキルラベルを数学の質問に割り当て,続いてセマンティッククラスタリングを行い,スキルラベルの粗いファミリーを得られるようにする。これらの粗いスキルラベルは人間に解釈可能である。これらのスキルラベルがLCMの推論プロセスに意味があり、関連があることを検証するために、以下の実験を行う。 (a)GPT-4に、数学データセットGSM8KとMATHの学習課題にスキルラベルを割り当てるよう依頼する。 b) LLM を用いてテスト問題の解決を行う場合,スキルラベルの完全なリストを提示し,必要なスキルを特定する。そして、そのスキルラベルに関連するランダムに選択された模範的解答を提示する。これにより、コードアシストモデルを含むいくつかの強力なLCMのGSM8kとMATHの精度が向上する。この記事は数学の問題に当てはまるが、提案する方法論はドメインに依存しない。 Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. 
This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 注意に基づく双方向長期記憶ネットワークと解釈可能なモデルを用いた引用価値のモデル化 Modeling citation worthiness by using attention-based bidirectional long short-term memory networks and interpretable models ( http://arxiv.org/abs/2405.12206v1 ) ライセンス: Link先を確認	Tong Zeng, Daniel E. Acuna,	(参考訳) 科学者は、彼らの主張を支持するために科学的ソースを引用する方法を早期に学べる。しかし時として、科学者は引用がどこにあるべきかを決定するのに苦労することがある。引用を必要とする文(すなわち引用価値)を自動的に検出することは、これらの問題をどちらも解決し、より堅牢でよく構築された科学的議論へと繋がる。従来の研究者はこのタスクに機械学習を適用してきたが、ディープラーニングにおける注意機構のような最近のアルゴリズムの発展を生かしていない小さなデータセットとモデルを使用してきた。我々はオープンアクセス出版物から構築された大規模な教師付きデータセットから学習する、かなり正確なディープラーニングアーキテクチャを開発することができると仮定する。本研究では,2方向長短期記憶ネットワーク(BiLSTM)を提案する。また、PubMed Open Access Subsetに基づく新しい大規模データセット(PMOA-CITE)も作成しています。実験の結果,本アーキテクチャは標準ACL-ARCデータセット($F_{1}=0.507$)の最先端性能を実現し,新しいPMOA-CITEにおいて高い性能($F_{1}=0.856$)を示すことがわかった。さらに、これらのデータセット間で学習を伝達できることが示される。さらに、解釈可能なモデルを用いて、特定の言語がどのように引用の促進と抑制に使われているかを照らし出す。文の断面や周囲の文が, 予測精度の向上に不可欠であることが判明した。さらに,モデルの誤予測を報告し,引用行動や情報源データにおける体系的な人的誤りを明らかにした。これにより、我々のモデルが、提出前およびアーキヴァル前プロシージャの間、文書をチェックするための扉が開きます。この新しいデータセット、コード、Webベースのツールをコミュニティに提供しています。 Scientists learn early on how to cite scientific sources to support their claims. Sometimes, however, scientists have challenges determining where a citation should be situated -- or, even worse, fail to cite a source altogether. Automatically detecting sentences that need a citation (i.e., citation worthiness) could solve both of these issues, leading to more robust and well-constructed scientific arguments. Previous researchers have applied machine learning to this task but have used small datasets and models that do not take advantage of recent algorithmic developments such as attention mechanisms in deep learning. We hypothesize that we can develop significantly accurate deep learning architectures that learn from large supervised datasets constructed from open access publications. 
In this work, we propose a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism and contextual information to detect sentences that need citations. We also produce a new, large dataset (PMOA-CITE) based on the PubMed Open Access Subset, which is orders of magnitude larger than previous datasets. Our experiments show that our architecture achieves state-of-the-art performance on the standard ACL-ARC dataset ($F_{1}=0.507$) and exhibits high performance ($F_{1}=0.856$) on the new PMOA-CITE. Moreover, we show that it can transfer learning across these datasets. We further use interpretable models to illuminate how specific language is used to promote and inhibit citations. We discover that sections and surrounding sentences are crucial for our improved predictions. We further examined purported mispredictions of the model, and uncovered systematic human mistakes in citation behavior and source data. This opens the door for our model to check documents during pre-submission and pre-archival procedures. We make this new dataset, the code, and a web-based tool available to the community.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# MathBench: 階層型数学ベンチマークによるLLMの理論と応用能力の評価 MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark ( http://arxiv.org/abs/2405.12209v1 ) ライセンス: Link先を確認	Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen,	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、数学において大きな進歩を見せている。しかし、GSM8kのような従来の数学ベンチマークは一次元的な視点を提供しており、LLMの数学能力の総合的な評価を提供するには不足している。このギャップに対処するために,大規模言語モデルの数学的能力を厳格に評価する新しいベンチマークであるMathBenchを紹介する。 MathBenchは幅広い数学の分野にまたがっており、理論的な理解と実践的な問題解決のスキルの両方を詳細に評価している。このベンチマークは、基本的な算術から大学数学までの5つの異なる段階に進み、様々な知識の深さでモデルを評価するように構成されている。それぞれのステージには理論的な問題と応用の問題が含まれており、モデルの数学的習熟度と現実的なシナリオに概念を適用する能力を測定することができる。 MathBenchは、LLMの数学的能力の評価を強化することを目的としており、両言語的文脈における知識理解レベルと問題解決スキルの微妙な視点を提供する。プロジェクトはhttps://github.com/open-compass/MathBenchで公開されている。 Recent advancements in large language models (LLMs) have showcased significant improvements in mathematics. However, traditional math benchmarks like GSM8k offer a unidimensional perspective, falling short in providing a holistic assessment of the LLMs' math capabilities. To address this gap, we introduce MathBench, a new benchmark that rigorously assesses the mathematical capabilities of large language models. MathBench spans a wide range of mathematical disciplines, offering a detailed evaluation of both theoretical understanding and practical problem-solving skills. The benchmark progresses through five distinct stages, from basic arithmetic to college mathematics, and is structured to evaluate models at various depths of knowledge. Each stage includes theoretical questions and application problems, allowing us to measure a model's mathematical proficiency and its ability to apply concepts in practical scenarios. MathBench aims to enhance the evaluation of LLMs' mathematical abilities, providing a nuanced view of their knowledge understanding levels and problem solving skills in a bilingual context. 
The project is released at https://github.com/open-compass/MathBench .	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# Slicedit:時空間スライスを用いたテキストと画像の拡散モデルによるゼロショットビデオ編集 Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices ( http://arxiv.org/abs/2405.12211v1 ) ライセンス: Link先を確認	Nathaniel Cohen, Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, Tomer Michaeli,	(参考訳) テキスト・トゥ・イメージ(T2I)拡散モデルは画像合成と編集において最先端の結果が得られる。しかし、このような事前訓練されたモデルをビデオ編集に活用することは大きな課題であると考えられる。既存の多くの作品では、編集されたビデオの時間的一貫性を、ピクセル空間内または深い特徴間の明示的な対応機構によって強制しようとする。しかし、これらの手法は強い非剛性運動に苦しむ。本稿では,自然映像の時空間スライスが自然画像に類似した特徴を示すという観察に基づく,根本的に異なるアプローチを提案する。したがって、通常ビデオフレーム上でのみ使用される同じT2I拡散モデルは、時空間スライスにそれを適用することで時間的一貫性を高めるための強い先行として機能する。そこで本研究では,事前学習したT2I拡散モデルを用いて時空間スライスと時空間スライスの両方を処理するテキストベースのビデオ編集手法であるSliceditを提案する。本手法は,対象のテキストに付着しながら,オリジナル映像の構造と動きを保持するビデオを生成する。広範な実験を通じて,既存の競合する手法と比較して,Sliceditが幅広い実世界の動画を編集できることを実証し,その優位性を確認した。 Webページ: https://matankleiner.github.io/slicedit/ Text-to-image (T2I) diffusion models achieve state-of-the-art results in image synthesis and editing. However, leveraging such pretrained models for video editing is considered a major challenge. Many existing works attempt to enforce temporal consistency in the edited video through explicit correspondence mechanisms, either in pixel space or between deep features. These methods, however, struggle with strong nonrigid motion. In this paper, we introduce a fundamentally different approach, which is based on the observation that spatiotemporal slices of natural videos exhibit similar characteristics to natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames, can also serve as a strong prior for enhancing temporal consistency by applying it on spatiotemporal slices. Based on this observation, we present Slicedit, a method for text-based video editing that utilizes a pretrained T2I diffusion model to process both spatial and spatiotemporal slices. Our method generates videos that retain the structure and motion of the original video while adhering to the target text. 
Through extensive experiments, we demonstrate Slicedit's ability to edit a wide range of real-world videos, confirming its clear advantages compared to existing competing methods. Webpage: https://matankleiner.github.io/slicedit/	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 分散シフトへの大規模マルチモーダルモデルの適用--インコンテキスト学習の役割 Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning ( http://arxiv.org/abs/2405.12217v1 ) ライセンス: Link先を確認	Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao,	(参考訳) 近年の研究では、大型マルチモーダルモデル (LMM) は自然分布シフトに対して非常に堅牢であり、しばしば以前のベースラインを超えることが示されている。それにもかかわらず、特に医療のような専門分野において、ドメイン固有の適応は依然として必要である。膨大なパラメータ空間を与えられた微調整LMMの非現実性のため、本研究はLMMの適応性を高めるための効果的な代替手段として、文脈内学習(ICL)について検討する。 ICLの成功は、大規模な言語モデルで見られる課題を反映するデモンストレーションの選択に大きく依存しているが、分散シフトに直面したLMMに固有の複雑さを導入している。本研究は,特徴類似性に基づく最寄りの例検索により,文脈内サンプルを選択する非教師付きICL手法であるTopKNearestPRを評価することで,この問題に対処する。分散シフトシナリオ下では、事前学習された視覚エンコーダの欠陥により、その効果が制限されることが判明した。これらの課題に対処するために,より堅牢なデモ選択のためのクラス条件付きコントラスト不変性(CCI)を利用した新しい手法であるInvariantSelectPRを提案する。具体的には、CCIは、異なるクラスにわたる識別能力を改善し、ドメイン固有のバリエーションへの不変性を確保することで、事前訓練された視覚エンコーダを強化する。この拡張により、エンコーダは最も情報に富んだ例を効果的に識別し、検索し、異なる分布の下で新しいクエリサンプルに適応するためにLMMをガイドするために使用される。実験の結果,InvariantSelectPRはLMMの適応性を大幅に向上し,ベンチマークデータセット上での大幅な性能向上を実現し,キャメリオン17では34.2%$\uparrow$精度が,HAM10000では16.9%$\uparrow$精度が向上した。 Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an effective alternative for enhancing LMMs' adaptability. We find that the success of ICL heavily relies on the choice of demonstration, mirroring challenges seen in large language models but introducing unique complexities for LMMs facing distribution shifts. Our study addresses this by evaluating an unsupervised ICL method, TopKNearestPR, which selects in-context examples through a nearest example search based on feature similarity. 
We uncover that its effectiveness is limited by the deficiencies of pre-trained vision encoders under distribution shift scenarios. To address these challenges, we propose InvariantSelectPR, a novel method leveraging Class-conditioned Contrastive Invariance (CCI) for more robust demonstration selection. Specifically, CCI enhances pre-trained vision encoders by improving their discriminative capabilities across different classes and ensuring invariance to domain-specific variations. This enhancement allows the encoders to effectively identify and retrieve the most informative examples, which are then used to guide LMMs in adapting to new query samples under varying distributions. Our experiments show that InvariantSelectPR substantially improves the adaptability of LMMs, achieving significant performance gains on benchmark datasets, with a 34.2%$\uparrow$ accuracy increase in 7-shot on Camelyon17 and 16.9%$\uparrow$ increase in 7-shot on HAM10000 compared to the baseline zero-shot performance.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# 多視点ステレオによる高速一般化型ガウススプラッティング再構成 Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo ( http://arxiv.org/abs/2405.12218v1 ) ライセンス: Link先を確認	Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu,	(参考訳) MVSGaussianは、Multi-View Stereo (MVS) から導かれる新しい一般化可能な3次元ガウス表現手法であり、見えないシーンを効率的に再構築することができる。具体的には 1) MVS を利用して幾何学的ガウス表現を符号化し,それをガウスパラメータに復号する。 2) 性能をさらに向上させるために, 新規なビュー合成のための効率的なボリュームレンダリング設計を組み込んだハイブリッドガウスレンダリングを提案する。 3)特定シーンの高速微調整を支援するため,一般化可能なモデルによって生成された点群を効果的に集約する多視点幾何一貫したアグリゲーション戦略を導入し,シーンごとの最適化の初期化に役立てる。画像毎の微調整と秒間レンダリングを必要とする従来の一般化可能なNeRFベースの手法と比較して、MVSGaussianは各シーンにより良い合成品質でリアルタイムレンダリングを実現する。バニラ3D-GSと比較すると、MVSGaussianは、より少ないトレーニング計算コストでより良いビュー合成を実現している。 DTU, Real Forward- facing, NeRF Synthetic, and Tanks and Templesデータセットの大規模な実験により、MVSGaussianは、説得力のある汎用性、リアルタイムレンダリング速度、高速なシーンごとの最適化によって、最先端のパフォーマンスを達成できることが確認された。 We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. 
Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20
# キャンバスで画像と音を合成する画像 Images that Sound: Composing Images and Sounds on a Single Canvas ( http://arxiv.org/abs/2405.12221v1 ) ライセンス: Link先を確認	Ziyang Chen, Daniel Geng, Andrew Owens,	(参考訳) スペクトログラム(Spectrogram)は、私たちの視覚の世界にある画像とは大きく異なる音の2次元表現である。そして自然画像は、スペクトログラムとして再生されると、不自然な音を出す。本稿では,自然画像と自然音声とを同時に扱うスペクトルを合成することが可能であることを示す。私たちはこれらの分光図を音の源泉と呼ぶ。我々のアプローチは単純でゼロショットであり、学習済みのテキスト・ツー・イメージと、共有潜在空間で動作するテキスト・ツー・スペクトログラム拡散モデルを利用する。逆処理中、ノイズの多い潜伏剤を音声と画像の拡散モデルの両方を並列に発音し、その結果、両方のモデルの下にある可能性が高いサンプルが得られる。定量的評価と知覚学的研究により,提案手法は,所望の音声プロンプトと一致したスペクトルを生成するとともに,所望の映像プロンプトの視覚的外観を呈示する。ビデオ結果のプロジェクトページをご覧ください。 Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms that simultaneously look like natural images and sound like natural audio. We call these spectrograms images that sound. Our approach is simple and zero-shot, and it leverages pre-trained text-to-image and text-to-spectrogram diffusion models that operate in a shared latent space. During the reverse process, we denoise noisy latents with both the audio and image diffusion models in parallel, resulting in a sample that is likely under both models. Through quantitative evaluations and perceptual studies, we find that our method successfully generates spectrograms that align with a desired audio prompt while also taking the visual appearance of a desired image prompt. Please see our project page for video results: https://ificl.github.io/images-that-sound/	翻訳日:2024-05-21 12:35:30 公開日:2024-05-20

Title

Authors

Abstract

論文公表日・翻訳日

# 多人数会話におけるヒューマン・アウェア・ロボットのマルチモーダル説明可能性アプローチ

A Multi-Modal Explainability Approach for Human-Aware Robots in Multi-Party Conversation ( http://arxiv.org/abs/2407.03340v1 )

ライセンス: Link先を確認

Iveta Bečková, Štefan Pócoš, Giulia Belgiovine, Marco Matarese, Alessandra Sciutti, Carlo Mazzola,

(参考訳) 住所推定(誰かの話に従えば)は、多人数会話のシナリオにおける人間の行動認識の基本的なタスクである。具体的には、人間とロボットの相互作用の分野では、このような対話的なコンテキストにソーシャルロボットを参加させることがさらに重要である。しかし、通常は二分分類タスクとして実装され、ロボットが対処したかどうかを推定し、対話的なスキルを制限する能力を制限する。社会ロボットが人間の信頼を得るためには、あるレベルの透明性と説明可能性を示すことも重要である。したがって、説明可能な人工知能は現在の機械学習アプリケーションやモデルにおいて重要な役割を果たす。私たちの仕事で、私たちは a) 前のSOTAと比較して性能が向上した宛先推定モデルを示すこと。 b) 本モデルをさらに変更して,本質的に説明可能な注意に基づく区分を含むこと。 c) iCubロボットにおける多人数会話のためのモジュール型認知アーキテクチャの一部として説明可能な宛先推定を実装した。 d) 上記アーキテクチャに説明可能性及び透明性を組み込むためのいくつかの方法を提案する。 e) 被験者がロボットをどのように知覚するかに関する様々な説明の効果を分析するために、パイロットユーザー研究を行う。

The addressee estimation (understanding to whom somebody is talking) is a fundamental task for human activity recognition in multi-party conversation scenarios. Specifically, in the field of human-robot interaction, it becomes even more crucial to enable social robots to participate in such interactive contexts. However, it is usually implemented as a binary classification task, restricting the robot's capability to estimate whether it was addressed and limiting its interactive skills. For a social robot to gain the trust of humans, it is also important to manifest a certain level of transparency and explainability. Explainable artificial intelligence thus plays a significant role in the current machine learning applications and models, to provide explanations for their decisions besides excellent performance. In our work, we a) present an addressee estimation model with improved performance in comparison with the previous SOTA; b) further modify this model to include inherently explainable attention-based segments; c) implement the explainable addressee estimation as part of a modular cognitive architecture for multi-party conversation in an iCub robot; d) propose several ways to incorporate explainability and transparency in the aforementioned architecture; and e) perform a pilot user study to analyze the effect of various explanations on how human participants perceive the robot.

翻訳日:2024-07-22 22:09:04 公開日:2024-05-20

# テキスト簡易化における依存距離の役割:人間とチャットGPTの簡易化比較

Role of Dependency Distance in Text Simplification: A Human vs ChatGPT Simplification Comparison ( http://arxiv.org/abs/2406.17787v1 )

ライセンス: Link先を確認

Sumi Lee, Gondy Leroy, David Kauchak, Melissa Just,

(参考訳) 本研究では,人間とチャットGPTテキストの簡易化とその依存距離との関係について検討する。従来のユーザスタディで測定された文法的難易度が増大する220文は、人間の専門家とChatGPTを用いて単純化された。その結果, 3つの文集合は, 平均依存距離が異なり, 原文集合の最上位, 後続のChatGPT簡易文, 人為的簡易文は平均依存距離が低かった。

This study investigates human and ChatGPT text simplification and its relationship to dependency distance. A set of 220 sentences, with increasing grammatical difficulty as measured in a prior user study, was simplified by a human expert and by ChatGPT. We found that the three sentence sets all differed in mean dependency distance: it was highest in the original sentence set, followed by the ChatGPT-simplified sentences, and lowest in the human-simplified sentences.

翻訳日:2024-07-01 06:21:45 公開日:2024-05-20
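
The abstract above compares sentence sets by mean dependency distance but does not spell out the computation. A minimal sketch follows, assuming the common definition (the absolute difference between a token's position and its head's position, averaged over all non-root tokens); the function name and the toy parses are illustrative and not taken from the paper.

```python
def mean_dependency_distance(heads):
    """Mean dependency distance of one sentence.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which has no incoming dependency and is skipped.
    """
    distances = [
        abs(position - head)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]
    if not distances:
        return 0.0
    return sum(distances) / len(distances)


# Toy parse of "The cat sat": The -> cat, cat -> sat, sat is the root.
simple_sentence = [2, 3, 0]

# Toy parse of "The cat that I saw sat" (hand-made, for illustration only):
# The -> cat, cat -> sat, that -> saw, I -> saw, saw -> cat, sat is the root.
complex_sentence = [2, 6, 5, 5, 2, 0]

print(mean_dependency_distance(simple_sentence))
print(mean_dependency_distance(complex_sentence))
```

Under this definition, the embedded relative clause in the second toy sentence pushes the subject away from its verb and raises the mean, which is the kind of difference the study measures across the original and simplified sets.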

# サンプル選択による3次元点雲正規分布推定の精細化

Refining 3D Point Cloud Normal Estimation via Sample Selection ( http://arxiv.org/abs/2406.18541v1 )

ライセンス: Link先を確認

Jun Zhou, Yaoshun Li, Hongchen Tan, Mingjie Wang, Nannan Li, Xiuping Liu,

(参考訳) 近年,3次元幾何処理の分野では,古典的・基礎的アルゴリズムとしての点雲正規推定が注目されている。現在のニューラルネットワークベースの手法によって達成された顕著なパフォーマンスにもかかわらず、その堅牢性はトレーニングデータの品質とモデルのパフォーマンスの影響を受け続けている。本研究では,グローバルな情報と様々な制約機構を組み込むことにより,正規化のための基本的枠組みを設計し,既存モデルを拡張した。さらに、信頼に基づく戦略を用いて、公平で堅牢なネットワークトレーニングのための妥当なサンプルを選択しました。導入されたサンプル信頼度は、モデルトレーニングにおける異なるサンプルの影響のバランスをとるために損失関数に統合することができる。最後に,従来の方向定式化手法を用いて非方向定式化を行い,非方向定式化タスクと非方向定式化タスクの両方で最先端性能を実現した。大規模な実験結果から,本手法は広く用いられているベンチマークでよく動作することが示された。

In recent years, point cloud normal estimation, as a classical and foundational algorithm, has garnered extensive attention in the field of 3D geometric processing. Despite the remarkable performance achieved by current neural network-based methods, their robustness is still influenced by the quality of training data and the models' performance. In this study, we designed a fundamental framework for normal estimation, enhancing existing models through the incorporation of global information and various constraint mechanisms. Additionally, we employed a confidence-based strategy to select reasonable samples for fair and robust network training. The introduced sample confidence can be integrated into the loss function to balance the influence of different samples on model training. Finally, we utilized existing orientation methods to correct estimated non-oriented normals, achieving state-of-the-art performance in both oriented and non-oriented tasks. Extensive experimental results demonstrate that our method works well on widely used benchmarks.

翻訳日:2024-07-01 06:12:00 公開日:2024-05-20
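
The abstract above says that a per-sample confidence can be folded into the loss to balance the influence of different samples. The sketch below shows one generic way to do that, assuming a sign-invariant error for non-oriented normals and a confidence-weighted mean. Both function names and the exact error term are assumptions for illustration; the paper's actual loss is not given in the abstract.

```python
import math


def unoriented_normal_error(pred, target):
    """1 - |cos(angle)| between two 3D vectors.

    The absolute value makes n and -n score the same, which suits
    non-oriented normals. (Assumed error term, not the paper's.)
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - abs(dot) / norm


def confidence_weighted_loss(preds, targets, confidences):
    """Confidence-weighted mean of the per-sample errors."""
    total_confidence = sum(confidences)
    if total_confidence == 0:
        raise ValueError("at least one sample needs non-zero confidence")
    weighted = sum(
        c * unoriented_normal_error(p, t)
        for p, t, c in zip(preds, targets, confidences)
    )
    return weighted / total_confidence


preds = [(0, 0, 1), (1, 0, 0)]
targets = [(0, 0, -1), (0, 0, 1)]  # first is a flipped match, second is orthogonal

print(confidence_weighted_loss(preds, targets, [1.0, 1.0]))  # both samples count equally
print(confidence_weighted_loss(preds, targets, [1.0, 0.0]))  # second sample is ignored
```

Setting a sample's confidence to zero removes it from the loss entirely, so hard selection is a special case of this weighting.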

# マルチモーダルトランスを用いたAIによるLiDARポイントクラウド生成

Generative AI Empowered LiDAR Point Cloud Generation with Multimodal Transformer ( http://arxiv.org/abs/2406.18542v1 )

ライセンス: Link先を確認

Mohammad Farzanullah, Han Zhang, Akram Bin Sediq, Ali Afana, Melike Erol-Kantarci,

(参考訳) 統合センシングと通信は6G無線通信システムのキーイネーブルである。複数のセンシングモードにより、基地局は環境をより正確に表現することができ、コンテキスト対応の通信につながる。カメラやRADARセンサーなどの広範囲に装備されたセンサーは、いくつかの環境認識を提供することができる。しかし、特に悪天候下では、正確な環境表現を生成するには不十分である。一方、LiDARセンサーはより正確な表現を提供するが、その普及は高いコストで妨げられている。本稿では、画像とRADARデータからLiDAR点雲を合成し、無線通信システムを強化する新しい手法を提案する。具体的には、マルチモーダルトランスアーキテクチャと事前訓練された符号化モデルを使用して、正確なLiDAR生成を可能にする。提案するフレームワークは、コンテキスト対応無線アプリケーション用にキュレートされた実世界のデータセットであるDeepSense 6Gデータセットに基づいて評価される。本研究は,LiDAR点雲を高精度に生成する手法の有効性を示すものである。修正平均二乗誤差は10.3931である。画像の視覚的検査により,LiDAR点雲に存在する構造の大部分を多種多様な環境下で捉えることが可能であることが示唆された。これにより基地局はより正確な環境検知を行うことができる。既存のセンシングモードとLiDAR合成を統合することで、ビームや遮断予測を含む様々な無線アプリケーションの性能を向上させることができる。

Integrated sensing and communications is a key enabler for 6G wireless communication systems. Multiple sensing modalities will allow the base station to have a more accurate representation of the environment, leading to context-aware communications. Some widely deployed sensors, such as cameras and RADAR sensors, can provide some environmental perception. However, they are not enough to generate precise environmental representations, especially in adverse weather conditions. LiDAR sensors, on the other hand, provide more accurate representations, but their widespread adoption is hindered by their high cost. This paper proposes a novel approach to enhance wireless communication systems by synthesizing LiDAR point clouds from images and RADAR data. Specifically, it uses a multimodal transformer architecture and pre-trained encoding models to enable accurate LiDAR generation. The proposed framework is evaluated on the DeepSense 6G dataset, which is a real-world dataset curated for context-aware wireless applications. Our results demonstrate the efficacy of the proposed approach in accurately generating LiDAR point clouds. We achieve a modified mean squared error of 10.3931. Visual examination of the images indicates that our model can successfully capture the majority of structures present in the LiDAR point cloud for diverse environments. This will enable the base stations to achieve more precise environmental sensing. By integrating LiDAR synthesis with existing sensing modalities, our method can enhance the performance of various wireless applications, including beam and blockage prediction.

翻訳日:2024-07-01 06:12:00 公開日:2024-05-20

# 機械学習で実現可能なシステムエンジニアリングにおけるペインの命名

Naming the Pain in Machine Learning-Enabled Systems Engineering ( http://arxiv.org/abs/2406.04359v1 )

ライセンス: Link先を確認

Marcos Kalinowski, Daniel Mendez, Görkem Giray, Antonio Pedro Santos Alves, Kelly Azevedo, Tatiana Escovedo, Hugo Villamizar, Helio Lopes, Teresa Baldassarre, Stefan Wagner, Stefan Biffl, Jürgen Musil, Michael Felderer, Niklas Lavesson, Tony Gorschek,

(参考訳) コンテキスト: マシンラーニング(ML)対応システムは、製品や運用プロセスの強化を目指す企業によって、ますます採用されています。目的: 本論文は, ML対応システムの現状を概観し, 実践的, 問題駆動型学術研究の基盤となることを目的としている。方法: ML対応システムの現状と問題点について, 実践者から洞察を得るための国際調査を行った。 25カ国から188件の回答を受け取りました。本研究では,信頼区間を有するブートストラップを用いた現代的実践に関する定量的統計分析と,オープンおよび軸方向の符号化手法を用いて報告された問題の質的分析を行った。結果: ML対応システムに関する既存の実証的証拠を補強・拡張し,典型的なML対応システムプロジェクト状況,MLライフサイクルフェーズの認識と複雑性,問題理解,モデル展開,モデル監視に関する現在の実践について,さらなる知見を提供する。さらに、定性的分析は、MLライフサイクルの各フェーズで実践者が直面する問題と、プロジェクト全体の失敗を引き起こす問題の詳細マップを提供する。結論: 結果は,現状と実践環境の問題点の理解に寄与する。我々は、ML対応システムのエンジニアリングを強化するために、ソフトウェアエンジニアリングプラクティスのさらなる適応と普及を提唱する。

Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.

翻訳日:2024-06-23 14:05:12 公開日:2024-05-20

# 勧告強化のための本質的・遠方的知識の抽出

Extracting Essential and Disentangled Knowledge for Recommendation Enhancement ( http://arxiv.org/abs/2406.00012v1 )

ライセンス: Link先を確認

Kounianhua Du, Jizheng Chen, Jianghao Lin, Menghui Zhu, Bo Chen, Shuai Li, Ruiming Tang,

(参考訳) 様々な産業シナリオにおいて、リコメンダモデルは重要な役割を担っているが、高速なシフトデータ配信、ユーザの興味の進化、販売プロモーション中のクリック信号の変動などによって引き起こされる悲惨な忘れ問題に直面していることが多い。この問題を緩和するためには、歴史的データから知識を再利用するのが一般的なアプローチである。しかし、巨大かつ高速に蓄積されたデータの保存は困難であり、劇的なストレージオーバーヘッドを引き起こす。次に、パラメトリックな知識ベースを通じて古いデータを記憶し、膨大な量の生データをモデルパラメータに圧縮する。柔軟性にも拘わらず、パラメトリック知識基盤の記憶と一般化能力を改善する方法は困難である。本稿では,従来のデータから本質的知識と不整合的知識を抽出する2つの制約を提案する。本質的な原理は、入力を代表ベクトルに圧縮し、タスク関連情報をキャプチャし、ノイズのある情報をフィルタリングするのに役立つ。アンタングル化原理は、格納された情報の冗長性を低減し、アンタングル化不変パターンのキャプチャにフォーカスする知識ベースをプッシュする。これら2つのルールは、堅牢で一般化された知識表現のための情報の合理的な圧縮を促進する。 2つのデータセットに対する大規模な実験は,提案手法の有効性を正当化するものである。

Recommender models play a vital role in various industrial scenarios, but they often face the catastrophic forgetting problem caused by fast-shifting data distributions, e.g., evolving user interests and click-signal fluctuations during sales promotions. To alleviate this problem, a common approach is to reuse knowledge from historical data. However, preserving the vast and fast-accumulating data is hard and causes dramatic storage overhead. Memorizing old data through a parametric knowledge base has therefore been proposed, which compresses the vast amount of raw data into model parameters. Despite this flexibility, improving the memorization and generalization capabilities of the parametric knowledge base remains challenging. In this paper, we propose two constraints to extract Essential and Disentangled Knowledge from past data for rational and generalized recommendation enhancement, which improves the capabilities of the parametric knowledge base without increasing its size. The essential principle helps to compress the input into representative vectors that capture the task-relevant information and filter out the noisy information. The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing the disentangled invariant patterns. These two rules together promote rational compression of information for robust and generalized knowledge representations. Extensive experiments on two datasets justify the effectiveness of the proposed method.

翻訳日:2024-06-09 16:19:21 公開日:2024-05-20

# 論文:文書要約とキーワード抽出と画像検索への応用

Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval ( http://arxiv.org/abs/2406.00013v1 )

ライセンス: Link先を確認

Jayaprakash Sundararaj,

(参考訳) 自動要約は、原文書の最も重要な点を保持する要約を生成するために、文書を縮小する過程である。本研究では,2つの問題について検討する。一画像再生のためのキーワード/カプセルの集合としてテキスト文書を要約すること。二書面に関連性及び感情を混同した意見要約を作成すること。そこで本研究では,既存のプレーンテキストニュース記事の相当量の向上に向けた推奨画像について紹介する。確率的モデルと単語類似性ヒューリスティックスを用いてキャプションを生成し、関連するフィードバック機構を持つランク集約フレームワークを用いて再ランク付けされたキーフレーズを抽出する。タグ付け文書やテキスト情報検索で一般的に使用されるランクアグリゲーションや関連するフィードバックは,画像検索の改善にも有効であることを示す。これらのクエリはYahoo Search Engineに送られ、関連する画像を取得する。提案手法は,既存のすべてのベースラインよりも優れた性能を示す。さらに,意見要約のための部分モジュラ関数の集合を提案する。意見要約は、その中に要約と感情検出のタスクが組み込まれている。しかし、感情を検知し、同時に要約を抽出することは容易ではない。この2つの課題は、圧縮の要求が感傷的な文を減少させ、感情検出の要求が冗長な文をもたらすという意味で矛盾する。しかし、サブモジュラリティを使って、この2つの要件のバランスをとる方法を示します。我々の関数は、文書の感情と要約の感情と良いROUGEスコアとの間に良い相関関係があるような要約を生成する。また,提案した部分モジュラ関数の性能を比較する。

Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems: i) summarizing a text document as a set of keywords/captions for image recommendation, and ii) generating an opinion summary that offers a good mix of relevance and sentiment with respect to the text document. Initially, we present our work on recommending images to enhance a substantial number of existing plain-text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract key-phrases, which are re-ranked using a rank aggregation framework with a relevance feedback mechanism. We show that such rank aggregation and relevance feedback, which are typically used in document tagging and text information retrieval, also help improve image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images. Our proposed method is observed to perform better than all existing baselines. Additionally, we propose a set of submodular functions for opinion summarization. Opinion summarization has built into it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract a summary. The two tasks conflict in the sense that the demand for compression may drop sentiment-bearing sentences, and the demand for sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with a good ROUGE score. We also compare the performances of the proposed submodular functions.

翻訳日:2024-06-09 16:19:21 公開日:2024-05-20

# 黒体全内エネルギーの定義に関する最後の成果に関するブレークスルー

Breaking news on last achievements on the definition of the black-body total internal energy ( http://arxiv.org/abs/2405.15806v1 )

ライセンス: Link先を確認

Lino Reggiani, Eleonora Alfinito,

(参考訳) 黒体の内部エネルギーは、現代物理学の発展において重要な物理量である。そこで,本稿では,短い歴史的展開とともに,この量の定義と性質に関する最新のニュース(2018-2024)を報告し,コメントする。最初のコメントは、ゼロ点エネルギーの存在によって引き起こされる真空カタストロフィを避けるカシミールエネルギーの包含に関するものであり、それによって境界効果に関連するさらなる量子的寄与につながった。第2のコメントは、量子ブラックボディにおける古典物理学の役割の再検討の可能性を示す1次元のブラックボディの半古典的なシミュレーションに関するものである。

The total internal energy of the black body is a physical quantity of paramount importance in the development of modern physics. Accordingly, together with a brief historical account, we report and comment on the latest results (2018-2024) concerning the definition and properties of this quantity. The first comment concerns the inclusion of the Casimir energy, which avoids the vacuum catastrophe implied by the presence of zero-point energy and thus leads to further quantum contributions associated with boundary effects. The second comment concerns a semi-classical simulation of a one-dimensional black body, whose results suggest a possible reconsideration of the role of classical physics in the quantum black body.

翻訳日:2024-06-02 14:30:04 公開日:2024-05-20

# 独立原子アンザッツからのH$_{2}$分子の解析的相関

Analytical Correlation in the H$_{2}$ Molecule from the Independent Atom Ansatz ( http://arxiv.org/abs/2405.15809v1 )

ライセンス: Link先を確認

Alanna 'Lanie' Leung, Alexander V. Mironenko,

(参考訳) 密度汎関数理論の独立原子アンサッツは、H$_{2}$分子の動的相関エネルギーに対して正確な解析的表現を与える:$E_{c} = 0.5(1 - \sqrt{2})(ab|ba)$ 原子付加自己整合密度$\rho = |a|^{2} + |b|^{2}$。正確な原子の自己交換と組み合わせると、ほぼ正確なSCAN交換相関エネルギーの99.5%以上をR > 0.5$\r{A}$で回収し、0.12eV以下である。全エネルギー関数はH-H結合を正しく解離させ、厳密な結合計算コストでの実験に対して0.002$\r{A}$, 0.19 eV, 13 cm$^{-1}$の絶対誤差を与える。化学結合の形成は、準直交原子状態(-(ab|ba)$)の漸近的ハイトラー・ロンドン共鳴によるもので、その結合の運動エネルギーや電荷蓄積に寄与しない。

The independent atom ansatz of density functional theory yields an accurate analytical expression for dynamic correlation energy in the H$_{2}$ molecule: $E_{c} = 0.5(1 - \sqrt{2})(ab|ba)$ for the atom-additive self-consistent density $\rho = |a|^{2} + |b|^{2}$. Combined with exact atomic self-exchange, it recovers more than 99.5% of nearly exact SCAN exchange-correlation energy at R > 0.5 Å, differing by less than 0.12 eV. The total energy functional correctly dissociates the H-H bond and yields absolute errors of 0.002 Å, 0.19 eV, and 13 cm$^{-1}$ relative to experiment at the tight binding computational cost. The chemical bond formation is attributed to the asymptotic Heitler-London resonance of quasi-orthogonal atomic states ($- (ab|ba)$) with no contributions from kinetic energy or charge accumulation in the bond.

翻訳日:2024-06-02 14:30:04 公開日:2024-05-20

# ディープニューラルネットワークにおけるマージンに基づく一般化予測について

On margin-based generalization prediction in deep neural networks ( http://arxiv.org/abs/2405.17445v1 )

ライセンス: Link先を確認

Coenraad Mouton,

(参考訳) ディープニューラルネットワークにおける一般化を理解することは、研究の活発な領域である。有望な探索の道はマージンの測定であり、与えられたサンプルの判定境界から最短距離、またはネットワークの内部のサンプルの表現である。マージンに基づく複雑性測定は、いくつかの状況においてディープニューラルネットワークの一般化能力と相関することが示されているが、それ以外は関係していない。これらの指標の成功や失敗の背景にある理由は、現時点では不明である。本研究では,異なる環境下でのマージンに基づく一般化予測手法について検討する。これらのメトリクスが、時に正確な一般化の予測に失敗し、どのように改善できるかを動機付けています。まず、入力空間で測定されたマージンとサンプルノイズの関係を解析する。異なる種類のサンプルノイズが、ノイズデータをモデル化したネットワーク全体のマージンに、非常に異なる効果をもたらすことが判明した。これに続いて、異なる表現空間で測定されたロバストマージンが、一般化を予測する上でいかに頑健であるかを実証的に評価する。これらの指標にはいくつかの制限があり、多くの場合、大きなマージンは経験的リスクと強く相関しない。最後に、基礎となるデータ多様体の近似を組み込んだ新たなマージンベースの測度を導入する。この測度は、一般に他のすべてのマージンベースの測度よりも一般化の予測的であることが実証的に実証されている。さらに、この測定は、よく知られた一般化予測ベンチマークにおいて、他の現代の複雑性指標よりも優れていることが判明した。さらに,本手法の有用性と限界を分析し,この指標が先行作業で表現された直観とよく一致していることを確認した。

Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or that sample's representation internal to the network. Margin-based complexity measures have been shown to be correlated with the generalization ability of deep neural networks in some circumstances but not others. The reasons behind the success or failure of these metrics are currently unclear. In this study, we examine margin-based generalization prediction methods in different settings. We motivate why these metrics sometimes fail to accurately predict generalization and how they can be improved. First, we analyze the relationship between margins measured in the input space and sample noise. We find that different types of sample noise can have a very different effect on the overall margin of a network that has modeled noisy data. Following this, we empirically evaluate how robust margins measured at different representational spaces are at predicting generalization. We find that these metrics have several limitations and that a large margin does not exhibit a strong correlation with empirical risk in many cases. Finally, we introduce a new margin-based measure that incorporates an approximation of the underlying data manifold. It is empirically demonstrated that this measure is generally more predictive of generalization than all other margin-based measures. Furthermore, we find that this measurement also outperforms other contemporary complexity measures on a well-known generalization prediction benchmark. In addition, we analyze the utility and limitations of this approach and find that this metric is well aligned with intuitions expressed in prior work.

翻訳日:2024-06-02 14:30:04 公開日:2024-05-20
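
The abstract above defines a margin as the shortest distance from a sample to the decision boundary. For a deep network this distance generally has to be approximated, but for a linear binary classifier it has a closed form, which makes the quantity concrete. The sketch below covers only that linear special case; it is not the measure proposed in the paper, and the function name is illustrative.

```python
import math


def linear_margin(weights, bias, sample):
    """Euclidean distance from `sample` to the hyperplane w.x + b = 0.

    This is |w.x + b| / ||w||, the exact input-space margin of a
    linear binary classifier.
    """
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    weight_norm = math.sqrt(sum(w * w for w in weights))
    return abs(score) / weight_norm


weights, bias = (3.0, 4.0), -5.0

print(linear_margin(weights, bias, (3.0, 4.0)))   # a sample well inside its class region
print(linear_margin(weights, bias, (3.0, -1.0)))  # a sample lying on the boundary
```

A sample with a small margin sits close to the boundary, so a small perturbation or a little label noise can flip its prediction. This is why the abstract examines how sample noise affects the margins of a trained network.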

# 病理組織学的特徴エクストラクタを用いた全スライド画像生存解析

Whole Slide Image Survival Analysis Using Histopathological Feature Extractors ( http://arxiv.org/abs/2405.17446v1 )

ライセンス: Link先を確認

Kleanthis Marios Papadopoulos,

(参考訳) WSI (Whole Slide Images) に含まれる情報の豊富さは, 予後評価に有用である。事前訓練されたResNetバックボーンを利用した多数のモデルがリリースされ、主にMIL(Multiple Instance Learning)に基づいたさまざまな機能集約技術が採用されている。最近リリースされたUNI機能抽出器を利用することで、既存のモデルはより高精度に適応することができ、デジタル病理学におけるより堅牢な予後ツールの道を開くことができる。

The abundance of information present in Whole Slide Images (WSIs) makes them useful for prognostic evaluation. A large number of models utilizing a pretrained ResNet backbone have been released and employ various feature aggregation techniques, primarily based on Multiple Instance Learning (MIL). By leveraging the recently released UNI feature extractor, existing models can be adapted to achieve higher accuracy, which paves the way for more robust prognostic tools in digital pathology.

翻訳日:2024-06-02 14:30:04 公開日:2024-05-20

# YUI: 簡易供給・需要曲線を用いた日頭電力価格予測

YUI: Day-ahead Electricity Price Forecasting Using Invariance Simplified Supply and Demand Curve ( http://arxiv.org/abs/2405.14893v1 )

ライセンス: Link先を確認

Linian Wang, Anlan Yu, Jianghong Liu, Huibing Zhang, Leye Wang,

(参考訳) 日頭電気市場においては、すべての市場参加者が意思決定プロセスの信頼性と正確な価格予測にアクセスできることが不可欠である。現在、産業用途で使われている予測手法は、価格形成の基盤となるメカニズムを無視することが多いが、供給・需要の観点からの経済研究は厳しいデータ収集要求を抱えており、実際の市場では適用が困難である。日頭電気市場の特徴を考察し,供給曲線と需要曲線のモデリングを簡略化するための2つの相違仮定を導入する。時間差の仮定を組み込むと、近年の複数のタイムスロットから市場均衡点を用いて供給曲線を予測できる。価格不感の仮定を導入することで、直線を用いて需要曲線を近似することができる。この2つの曲線が交差する点から、予測価格が得られます。提案したモデルでは, suppl\textbf{Y} と要求 cUrve を, YUI と呼ばれる不変性によって単純化し, 最先端の手法よりも効率的である。実験の結果,既存の手法と比較して,YUIは予測誤差をMAEで13.8%,sMAPEで28.7%削減できることがわかった。コードはhttps://github.com/wangln19/YUIで公開されている。

In the day-ahead electricity market, it is crucial for all market participants to have access to reliable and accurate price forecasts for their decision-making processes. Forecasting methods currently utilized in industrial applications frequently neglect the underlying mechanisms of price formation, while economic research from the perspective of supply and demand has stringent data collection requirements, making it difficult to apply in actual markets. Observing the characteristics of the day-ahead electricity market, we introduce two invariance assumptions to simplify the modeling of supply and demand curves. Upon incorporating the time invariance assumption, we can forecast the supply curve using the market equilibrium points from multiple time slots in the recent period. By introducing the price insensitivity assumption, we can approximate the demand curve using a straight line. The point where these two curves intersect provides us with the forecast price. The proposed model, forecasting supplY and demand cUrve simplified by Invariance, termed YUI, is more efficient than state-of-the-art methods. Our experimental results on the Shanxi day-ahead electricity market show that, compared with existing methods, YUI can reduce forecast error by 13.8% in MAE and 28.7% in sMAPE. Code is publicly available at https://github.com/wangln19/YUI.

翻訳日:2024-05-27 19:48:22 公開日:2024-05-20

# 空間自己回帰モデルのための転移学習

Transfer Learning for Spatial Autoregressive Models ( http://arxiv.org/abs/2405.15600v1 )

ライセンス: Link先を確認

Hao Zeng, Wei Zhong, Xingbai Xu,

(参考訳) 空間自己回帰(SAR)モデルは、対象間の空間依存性を特徴づけるために、さまざまな実証経済研究で広く応用されている。しかし、対象データのサンプルサイズが限られている場合、SARモデルの推定精度は低下する。本稿では、類似するソースデータから情報を借用して推定と予測の両方を改善する、SARモデルのための新しい転移学習フレームワークを提案する。有益なソースデータセットが既知の場合、転移段階とデバイアス段階からなる2段階アルゴリズムを導入して未知パラメータを推定し、得られた推定量の理論的収束率を確立する。どのソースを転移すべきかが不明な場合は、必要な空間依存性を保つ空間残差ブートストラップに基づいて有益なソースデータを検出する、転移可能ソース検出アルゴリズムを提案する。その検出の一致性も導出する。シミュレーション研究により、有益なソースデータを用いることで、提案する転移学習アルゴリズムが古典的な2段階最小二乗推定量の性能を大幅に向上させることを示す。実証的応用として、2016年米国大統領選挙の世論調査データとその他の人口統計・地理データを利用し、2020年米国大統領選挙の激戦州(スイングステート)における選挙予測に本手法を適用する。実証結果から、本手法は従来の推定方法よりも優れていることが示された。

The spatial autoregressive (SAR) model has been widely applied in various empirical economic studies to characterize the spatial dependence among subjects. However, the precision of estimating the SAR model diminishes when the sample size of the target data is limited. In this paper, we propose a new transfer learning framework for the SAR model to borrow information from similar source data to improve both estimation and prediction. When the informative source data sets are known, we introduce a two-stage algorithm, including a transferring stage and a debiasing stage, to estimate the unknown parameters and also establish the theoretical convergence rates for the resulting estimators. If we do not know which sources to transfer, a transferable source detection algorithm is proposed to detect informative source data based on spatial residual bootstrap to retain the necessary spatial dependence. Its detection consistency is also derived. Simulation studies demonstrate that, using informative source data, our transfer learning algorithm significantly enhances the performance of the classical two-stage least squares estimator. In the empirical application, we apply our method to the election prediction in swing states in the 2020 U.S. presidential election, utilizing polling data from the 2016 U.S. presidential election along with other demographic and geographical data. The empirical results show that our method outperforms traditional estimation methods.
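
For readers unfamiliar with the model, the standard textbook form of the SAR specification is written out below. The abstract itself does not state the equation, so the notation is an assumption: $y$ is the $n$-vector of outcomes, $W$ a known $n \times n$ spatial weight matrix, $X$ the covariates, $\rho$ the spatial dependence parameter, and $\varepsilon$ the error term.

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon, \\
y &= (I_n - \rho W)^{-1}\,(X\beta + \varepsilon)
  \quad \text{(reduced form, when } I_n - \rho W \text{ is invertible)}.
\end{aligned}
```

Because $Wy$ depends on $\varepsilon$ through the reduced form, it is endogenous in the first equation, which is why instrument-based estimators such as two-stage least squares are commonly used for this model.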

翻訳日:2024-05-27 13:40:24 公開日:2024-05-20

# 医学のための大規模言語モデル:サーベイ

Large Language Models for Medicine: A Survey ( http://arxiv.org/abs/2405.13055v1 )

ライセンス: Link先を確認

Yanxin Zheng, Wensheng Gan, Zefeng Chen, Zhenlian Qi, Qian Liang, Philip S. Yu,

(参考訳) デジタル経済におけるデジタルインテリジェンスの課題に対処するため、大規模言語モデル(LLM)が開発されてきた。計算能力と利用可能な資源の向上によりLLMは大きく進歩し、人間の生活に関わる多様な領域への統合が可能になった。医療用LLMは、さまざまな医療シナリオにわたって可能性を持つ、不可欠な応用ツールである。本稿では、医療用LLMの要件と応用に焦点をあてて、LLMの発展を概観する。既存モデルの簡潔な概要を示し、先進的な研究の方向性を探るとともに、将来の医療応用に向けて研究者の役に立つことを目指す。応用における医療用LLMの利点と、その開発において直面する課題を強調する。最後に、医療分野の要求によりよく応えることを目指し、課題を緩和するための技術統合の方向性と、医療用LLMの将来に向けた研究の方向性を提案する。

To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, we review LLM developments, focusing on the requirements and applications of medical LLMs. We provide a concise overview of existing models, aiming to explore advanced research directions and benefit researchers for future medical applications. We emphasize the advantages of medical LLMs in applications, as well as the challenges encountered during their development. Finally, we suggest directions for technical integration to mitigate challenges and potential research directions for the future of medical LLMs, aiming to meet the demands of the medical field better.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# 新型コロナウイルスによる新聞記事の感情分析のための大規模言語モデル:The Guardian

Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian ( http://arxiv.org/abs/2405.13056v1 )

ライセンス: Link先を確認

Rohitash Chandra, Baicheng Zhu, Qingying Fang, Eka Shinjikashvili,

(参考訳) 新型コロナウイルス感染症(COVID-19)のパンデミックの間、ニュースメディアの報道は、ウイルスの伝播、医療資源の配分、政府の対応措置など幅広いトピックに及んだ。感染者数の増加やウイルスの拡散を抑えるために実施された政府の戦略に対する公衆の反応を理解するため、COVID-19期間中のソーシャルメディアプラットフォームの感情分析に関する研究が行われてきた。感情分析は、パンデミック中の社会的意見や感情的傾向の変化をよりよく理解する助けとなる。ソーシャルメディア以外では、新聞が、政府、専門家、そして一般市民からの情報を含む、さまざまな話題に関する情報の発信において重要な役割を果たしてきた。特定の国を対象としたCOVID-19期間中の新聞の感情分析は、メディアがパンデミックをどのように報じたかを概観できる。本研究では、The Guardian紙を選定し、初期の感染拡大、ロックダウン、ワクチン接種を含むCOVID-19のさまざまな段階における感情分析を行う。新しい大規模言語モデル(LLM)を採用し、専門家がラベル付けした感情分析データで改良する。また、比較のためにパンデミック前の感情の分析も示す。その結果、パンデミックの初期段階では公衆の感情が緊急の危機対応を優先し、その後、健康と経済への影響への対処に焦点を移したことが示された。ソーシャルメディアの感情分析に関する関連研究と比較すると、The Guardianでは否定的な感情(悲しみ、苛立ち、不安、否認)が支配的であるという相違が見られ、ソーシャルメディアのほうがより多様な感情を反映していることが示唆される。The Guardianでは、オーストラリア、イギリス、ワールドニュース、オピニオンを含むニュースセクション全体で、COVID-19の前と期間中を通じて否定的な感情が全体的に支配的な、暗い論調が見られた。

During the COVID-19 pandemic, news media coverage encompassed a wide range of topics, including viral transmission, allocation of medical resources, and government response measures. There have been studies on sentiment analysis of social media platforms during COVID-19 to understand the public response given the rise of cases and the government strategies implemented to control the spread of the virus. Sentiment analysis can provide a better understanding of changes in societal opinions and emotional trends during the pandemic. Apart from social media, newspapers have played a vital role in the dissemination of information, including information from the government, experts, and also the public about various topics. A study of sentiment analysis of newspaper sources during COVID-19 for selected countries can give an overview of how the media covered the pandemic. In this study, we select The Guardian newspaper and provide a sentiment analysis during various stages of COVID-19, including initial transmission, lockdowns, and vaccination. We employ novel large language models (LLMs) and refine them with expert-labelled sentiment analysis data. We also provide an analysis of sentiments experienced pre-pandemic for comparison. The results indicate that during the early pandemic stages, public sentiment prioritised urgent crisis response, later shifting focus to addressing the impact on health and the economy. In comparison with related studies about social media sentiment analyses, we found a discrepancy: The Guardian is dominated by negative sentiments (sad, annoyed, anxious and denial), suggesting that social media offers a more diversified emotional reflection. We found a grim narrative in The Guardian with overall dominance of negative sentiments, before and during COVID-19, across news sections including Australia, UK, World News, and Opinion.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# Githubの問題はTree Of Thoughtsで解決できるだろうか?

Can Github issues be solved with Tree Of Thoughts? ( http://arxiv.org/abs/2405.13057v1 )

ライセンス: Link先を確認

Ricardo La Rosa, Corey Hulse, Bangdi Liu,

(参考訳) 大規模言語モデル(LLM)によるコード生成については広範な研究があり、HumanEvalのようなベンチマークは96.3%という成功率で突破されているが、これらのベンチマークは主に基本的な関数レベルのコード生成におけるモデルの性能を評価するものであり、GitHubのissueの解決のような現実のシナリオで求められる批判的思考やスコープの概念を欠いている。本研究では、この複雑な課題に対するLLMの意思決定能力と問題解決能力を高めるために、Tree of Thoughts(ToT)言語モデル推論フレームワークを適用する。従来の入出力(IO)プロンプトやRetrieval Augmented Generation(RAG)と比較して、ToTは複数の推論軌道の構造化された探索を促し、解の候補の自己評価を可能にすることで性能を向上させるように設計されている。我々は、SWE-benchのインスタンスに含まれるGitHubのissueに取り組むためにToTを実験的に適用する。しかし結果から、ToTフレームワークだけでは、既存の手法を上回るだけの批判的推論能力をLLMに与えるには不十分であることが明らかになった。本稿では、これらの欠点の潜在的な原因を分析し、思考プロセスの深化やエージェント機能の導入など、改善のための重要な領域を特定する。本研究の知見は、ToTの適用を洗練させ、実世界の問題解決シナリオにおけるLLMの可能性をよりよく活かすための今後の方向性を示すことを目的としている。

While there have been extensive studies in code generation by large language models (LLM), where benchmarks like HumanEval have been surpassed with an impressive 96.3% success rate, these benchmarks predominantly judge a model's performance on basic function-level code generation and lack the critical thinking and concept of scope required of real-world scenarios such as solving GitHub issues. This research introduces the application of the Tree of Thoughts (ToT) language model reasoning framework for enhancing the decision-making and problem-solving abilities of LLMs for this complex task. Compared to traditional input-output (IO) prompting and Retrieval Augmented Generation (RAG) techniques, ToT is designed to improve performance by facilitating a structured exploration of multiple reasoning trajectories and enabling self-assessment of potential solutions. We experimentally deploy ToT in tackling a Github issue contained within an instance of the SWE-bench. However, our results reveal that the ToT framework alone is not enough to give LLMs the critical reasoning capabilities to outperform existing methods. In this paper we analyze the potential causes of these shortcomings and identify key areas for improvement such as deepening the thought process and introducing agentic capabilities. The insights of this research are aimed at informing future directions for refining the application of ToT and better harnessing the potential of LLMs in real-world problem-solving scenarios.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# 未来を創るAIコミュニティ? Hugging Face Hubにおける開発活動の定量的分析

The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub ( http://arxiv.org/abs/2405.13058v1 )

ライセンス: Link先を確認

Cailean Osborne, Jennifer Ding, Hannah Rose Kirk,

(参考訳) オープンソース開発者は、人工知能(AI)の政治経済における重要なアクターとして台頭しており、オープンなモデル開発はクローズドソースのAI開発に代わる選択肢として認識されている。しかし、オープンソースAIにおける協働の実践については、まだ理解が限られている。本稿では、モデルの構築、共有、デモのための人気プラットフォームであるHugging Face(HF)Hubにおける開発活動を3部構成で定量的に分析することで、このギャップに応える。第1に、348,181のモデル、65,761のデータセット、156,642のSpaceリポジトリにわたるさまざまな種類の活動が、右に歪んだ分布を示すことがわかった。活動はリポジトリ間で極端に偏っており、例えば70%以上のモデルはダウンロード数が0である一方、1%のモデルがダウンロードの99%を占めている。第2に、モデル上の協働に関するソーシャルネットワーク構造のスナップショットを分析し、コミュニティがコア・周辺構造を持ち、多産な開発者からなるコアと、大多数(89%)を占める孤立した開発者から構成されることを見いだした。孤立ノードを除くと、協働は開発者のネットワーク上の位置にかかわらず高い相互性によって特徴づけられる。第3に、Spaceにおけるモデル利用という観点からモデルの採用を検討し、一握りの企業が開発した少数のモデルがHF Hubで広く使われていることを見いだした。全体として、HF Hub上のさまざまな種類の活動はパレート分布によって特徴づけられ、これはGitHubのようなプラットフォームにおけるOSS開発パターンに関する先行の観察と一致する。最後に、本知見の含意と、(オープンソース)AIの研究者、開発者、政策立案者への提言について議論する。

Open source developers have emerged as key actors in the political economy of artificial intelligence (AI), with open model development being recognised as an alternative to closed-source AI development. However, we still have a limited understanding of collaborative practices in open source AI. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, we find that various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Second, we analyse a snapshot of the social network structure of collaboration on models, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing isolates, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, we find that various types of activity on the HF Hub are characterised by Pareto distributions, congruent with prior observations about OSS development patterns on platforms like GitHub. We conclude with a discussion of the implications of the findings and recommendations for (open source) AI researchers, developers, and policymakers.
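
The two concentration figures quoted in this abstract (the share of models with zero downloads, and the share of downloads held by the top 1%) are simple to compute from a list of per-repository download counts. The sketch below shows one way to do so; the function names and the synthetic data are hypothetical and are not taken from the paper or from the Hugging Face Hub.

```python
def zero_fraction(downloads):
    """Fraction of repositories that have no downloads at all."""
    return sum(1 for d in downloads if d == 0) / len(downloads)


def top_share(downloads, fraction):
    """Share of all downloads held by the top `fraction` of repositories."""
    ranked = sorted(downloads, reverse=True)
    # Always include at least one repository in the "top" group.
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Synthetic, heavily right-skewed example: 100 repositories.
downloads = [0] * 70 + [1] * 29 + [9971]
print(zero_fraction(downloads))
print(top_share(downloads, 0.01))
```

On real data the same two functions would be applied to whatever download counts the analyst has collected; how those counts are obtained is outside the scope of this sketch.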

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# RNG: 統合マルチモーダルアスペクト感情分析のためのマルチレベルノイズと多粒度セマンティックギャップの低減

RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis ( http://arxiv.org/abs/2405.13059v1 )

ライセンス: Link先を確認

Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu,

(参考訳) 重要なマルチモーダル感情分析タスクであるJMASA(Joint Multimodal Aspect-Sentiment Analysis)は、与えられたテキストと画像のペアからアスペクト語とそれに関連する感情極性を共同で抽出することを目的としており、注目が高まっている。既存の研究は、(1)マルチレベルのモダリティノイズ、すなわちインスタンスレベルと特徴レベルのノイズ、(2)多粒度のセマンティックギャップ、すなわち粗粒度と細粒度のギャップ、という2つの限界に直面している。どちらの問題も、アスペクトと感情のペアの正確な識別を妨げる可能性がある。これらの限界に対処するため、JMASAのための新しいフレームワークRNGを提案する。具体的には、マルチレベルのモダリティノイズと多粒度のセマンティックギャップを同時に低減するために、(1)インスタンスレベルのノイズ低減のための、テキストと画像の類似度に基づくグローバル関連性制約(GR-Con)、(2)特徴レベルのノイズ低減のための、情報ボトルネック(IB)原理に基づく情報ボトルネック制約(IB-Con)、(3)多粒度のセマンティックギャップ低減のための、対照学習による相互情報量最大化に基づくセマンティック一貫性制約(SC-Con)、の3つの制約を設計する。2つのデータセットでの広範な実験により、本手法が新たな最先端の性能を達成することを検証した。

As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), which aims to jointly extract aspect terms and their associated sentiment polarities from given text-image pairs, has attracted increasing attention. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap, i.e., coarse- and fine-grained gap. Both issues may interfere with accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. Specifically, to simultaneously reduce multi-level modality noise and multi-grained semantic gap, we design three constraints: (1) Global Relevance Constraint (GR-Con) based on text-image similarity for instance-level noise reduction, (2) Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle for feature-level noise reduction, and (3) Semantic Consistency Constraint (SC-Con) based on mutual information maximization via contrastive learning for multi-grained semantic gap reduction. Extensive experiments on two datasets validate our new state-of-the-art performance.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# StatAvg: 侵入検知システムにおけるフェデレーション学習におけるデータ不均一性の軽減

StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems ( http://arxiv.org/abs/2405.13062v1 )

ライセンス: Link先を確認

Pavlos S. Bouzinis, Panagiotis Radoglou-Grammatikis, Ioannis Makris, Thomas Lagkas, Vasileios Argyriou, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis, George K. Karagiannidis,

(参考訳) フェデレーテッドラーニング(FL)は、参加デバイスが生データを第三者に公開することなく、共有の機械学習(ML)またはディープラーニング(DL)モデルを共同で構築できるようにする分散学習技術である。プライバシー保護の性質から、FLはサイバーセキュリティ分野における侵入検知システム(IDS)の構築手法として広く注目を集めている。しかし、参加するドメインやエンティティ間のデータの不均一性は、FLベースのIDSを信頼性高く実装する上で大きな課題となる。本稿では、FLにおいてローカルクライアントのデータ間で特徴量が非独立同分布(non-iid)となる問題を緩和する、統計的平均化(StatAvg)と呼ばれる効果的な手法を提案する。具体的には、StatAvgでは、FLクライアントが個々のデータ統計量をサーバと共有し、サーバがこの情報を集約してグローバル統計量を生成する。このグローバル統計量はクライアントと共有され、共通のデータ正規化に使用される。StatAvgは実際のFL訓練プロセスの前に実行されるため、あらゆるFL集約戦略とシームレスに統合できる点は注目に値する。提案手法は、ネットワークおよびホスト向けの人工知能(AI)搭載IDSのデータセットを用いて、ベースライン手法と比較評価される。実験結果は、FLクライアント間のnon-iidな特徴分布の緩和において、ベースライン手法と比較したStatAvgの有効性を示している。

Federated learning (FL) is a decentralized learning technique that enables participating devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party. Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity. However, the data heterogeneity across participating domains and entities presents significant challenges for the reliable implementation of an FL-based IDS. In this paper, we propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically (non-iid) distributed features across local clients' data in FL. In particular, StatAvg allows the FL clients to share their individual data statistics with the server, which then aggregates this information to produce global statistics. The latter are shared with the clients and used for universal data normalisation. It is worth mentioning that StatAvg can seamlessly integrate with any FL aggregation strategy, as it occurs before the actual FL training process. The proposed method is evaluated against baseline approaches using datasets for network and host Artificial Intelligence (AI)-powered IDS. The experimental results demonstrate the efficiency of StatAvg in mitigating non-iid feature distributions across the FL clients compared to the baseline methods.
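
The statistics-sharing idea in this abstract can be illustrated with a short sketch. The abstract does not specify the aggregation rule, so the pooled mean and variance used below are an assumption chosen for illustration, not necessarily what StatAvg does; the function names are hypothetical, and a single scalar feature is used where a real system would work per feature.

```python
from math import sqrt


def local_stats(values):
    """Client side: summarise local data as (count, mean, variance)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return n, mean, var


def aggregate(stats):
    """Server side: pool client summaries into a global mean and std.

    Uses the law of total variance, so the result matches what would be
    computed on the union of all client data, without seeing raw samples.
    """
    total = sum(n for n, _, _ in stats)
    g_mean = sum(n * m for n, m, _ in stats) / total
    g_var = sum(n * (v + (m - g_mean) ** 2) for n, m, v in stats) / total
    return g_mean, sqrt(g_var)


def normalise(values, g_mean, g_std):
    """Client side: apply the shared global statistics to local data."""
    return [(v - g_mean) / g_std for v in values]


client_a = [1.0, 2.0, 3.0]       # hypothetical local feature values
client_b = [10.0, 12.0]
g_mean, g_std = aggregate([local_stats(client_a), local_stats(client_b)])
print(normalise(client_a, g_mean, g_std))
```

Because every client normalises with the same global statistics, the inputs to the subsequent federated training share a common scale, which is the property the abstract attributes to running the step before training.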

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# Aurora: 大気の基礎モデル

Aurora: A Foundation Model of the Atmosphere ( http://arxiv.org/abs/2405.13063v1 )

ライセンス: Link先を確認

Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris,

(参考訳) ディープラーニングの基盤モデルは、膨大なデータを活用して、多様な下流タスクに適応可能な汎用表現を学習することで、科学の多くの側面に革命をもたらしている。基盤モデルは、膨大な地球システムデータを活用することで、地球とそのサブシステムをモデル化する能力をも変革すると期待されている。本稿では、100万時間以上の多様な気象・気候データで訓練された、大気の大規模基盤モデルAuroraを紹介する。Auroraは基盤モデルというアプローチの強みを活かし、訓練データが限られる問題、異種の変数を含む問題、極端現象を含む、幅広い大気予測問題に対して現業レベルの予測を生成する。Auroraは1分以内に、5日間の全球大気汚染予測と10日間の高解像度気象予測を生成し、最先端の古典的シミュレーションツールや最良の専用ディープラーニングモデルを上回る。これらの結果を総合すると、基盤モデルが環境予測を変革しうることが示される。

Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-scale foundation model of the atmosphere trained on over a million hours of diverse weather and climate data. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models. Taken together, these results indicate that foundation models can transform environmental forecasting.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# デジタルヘルスと室内空気の品質: 行動変化と技術受容のためのIoT駆動の人間中心の可視化プラットフォーム

Digital Health and Indoor Air Quality: An IoT-Driven Human-Centred Visualisation Platform for Behavioural Change and Technology Acceptance ( http://arxiv.org/abs/2405.13064v1 )

ライセンス: Link先を確認

Rameez Raja Kureshi, Bhupesh Kumar Mishra, Dhavalkumar Thakker, Suvodeep Mazumdar, Xiao Li,

(参考訳) 大気汚染物質が人間の健康に及ぼす悪影響により、室内空気質(IAQ)への懸念が高まっている。デジタルヘルス介入や市民科学の取り組みの登場は、意識を高め、IAQを改善し、行動変容を促進するための新たな道を開いた。技術受容モデル(TAM)は、IAQ技術のユーザによる受容と採用を理解するための理論的枠組みを提供する。本稿では、COM-BモデルとIoT(Internet of Things)技術を用いて人間中心のデジタル可視化プラットフォームを設計し、行動変容とIAQの改善につなげたケーススタディを示す。また、ユーザの体験、期待、IAQへの影響に着目して、ユーザによる技術の受容と採用についても調査する。IAQセンシング、デジタルヘルス関連の介入、市民科学、TAMを統合することで、IAQの課題に対処し、公衆衛生を向上させ、持続可能な室内環境を育む機会が得られる。分析の結果、人間の行動、室内での活動、意識といった要因がIAQの形成に重要な役割を果たすことが示された。

The detrimental effects of air pollutants on human health have prompted increasing concerns regarding indoor air quality (IAQ). The emergence of digital health interventions and citizen science initiatives has provided new avenues for raising awareness, improving IAQ, and promoting behavioural changes. The Technology Acceptance Model (TAM) offers a theoretical framework to understand user acceptance and adoption of IAQ technology. This paper presents a case study using the COM-B model and Internet of Things (IoT) technology to design a human-centred digital visualisation platform, leading to behavioural changes and improved IAQ. The study also investigates users' acceptance and adoption of the technology, focusing on their experiences, expectations, and the impact on IAQ. Integrating IAQ sensing, digital health-related interventions, citizen science, and the TAM model offers opportunities to address IAQ challenges, enhance public health, and foster sustainable indoor environments. The analytical results show that factors such as human behaviour, indoor activities, and awareness play crucial roles in shaping IAQ.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# 教師の人工知能に対する認識の探求: K-12教育における人間とAIの相補性の機会と課題としての社会情緒的欠如

Exploring Teachers' Perception of Artificial Intelligence: The Socio-emotional Deficiency as Opportunities and Challenges in Human-AI Complementarity in K-12 Education ( http://arxiv.org/abs/2405.13065v1 )

ライセンス: Link先を確認

Soon-young Oh, Yongsu Ahn,

(参考訳) 学校では、教師は教育者、カウンセラー、意思決定者、学校コミュニティのメンバーとして多くの役割を担っている。人工知能(AI)の最近の進歩により、AIが教師にどのように支援し、補完し、協力できるかが議論されている。本研究は,学校における教師とAIの補完関係を改善するために,教師とAIの相補性に関する談話の拡大を目的とした。韓国の小学校教師100名による調査と、12人の教師との詳細なインタビューの混合手法を用いて、教師は、管理タスクを自動化し、高度な知性を通じてパーソナライズされた学習を強化することで、AIが人間の教師を補完する可能性を期待していることを示唆した。興味深いことに、AIの社会的感情能力の欠如は、課題と機会の両方として認識されている。全体として、我々の研究は教師の微妙な認識と彼らの役割に対する様々な期待レベルを示し、教育者の好みや関心に合わせたAIの採用に関する決定の必要性に挑戦する。

In schools, teachers play a multitude of roles, serving as educators, counselors, decision-makers, and members of the school community. With recent advances in artificial intelligence (AI), there is increasing discussion about how AI can assist, complement, and collaborate with teachers. To pave the way for better teacher-AI complementary relationships in schools, our study aims to expand the discourse on teacher-AI complementarity by seeking educators' perspectives on the potential strengths and limitations of AI across a spectrum of responsibilities. Through a mixed method using a survey with 100 elementary school teachers in South Korea and in-depth interviews with 12 teachers, our findings indicate that teachers anticipate AI's potential to complement human teachers by automating administrative tasks and enhancing personalized learning through advanced intelligence. Interestingly, the deficit of AI's socio-emotional capabilities has been perceived as both challenges and opportunities. Overall, our study demonstrates the nuanced perception of teachers and different levels of expectations over their roles, challenging the need for decisions about AI adoption tailored to educators' preferences and concerns.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# 機械学習ベースNIDSのための分散処理フレームワークの実用的性能

Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS ( http://arxiv.org/abs/2405.13066v1 )

ライセンス: Link先を確認

Maho Kajiura, Junya Nakamura,

(参考訳) ネットワーク侵入検知システム(NIDS)は、ネットワークトラフィックにおける侵入攻撃を検出する。特に、未知の攻撃の検出率が高いため、機械学習ベースのNIDSが注目されている。スケーラブルな分散ストリーム処理システムを用いた機械学習に基づくNIDSのための分散処理フレームワークが文献で提案されている。しかし、機械学習に基づく分類器が実装された場合のパフォーマンスは包括的に評価されていない。本研究では,本フレームワークに基づく5つの代表的な分類器(決定木,ランダムフォレスト,ネイブベイズ,SVM,kNN)を実装し,そのスループットとレイテンシを評価する。実験により,これらの分類器間の処理性能の違いと,フレームワークの処理性能のボトルネックについて検討した。

Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates of unknown attacks. A distributed processing framework for machine-learning-based NIDSs employing a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) based on this framework and evaluate their throughput and latency. By conducting the experimental measurements, we investigate the difference in the processing performance among these classifiers and the bottlenecks in the processing performance of the framework.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# Lockpicking LLM: トークンレベルの操作を用いたロジトベースのジェイルブレイク

Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation ( http://arxiv.org/abs/2405.13068v1 )

ライセンス: Link先を確認

Yuxi Li, Yi Liu, Yuekang Li, Ling Shi, Gelei Deng, Shengquan Chen, Kailong Wang,

(参考訳) 大規模言語モデル(LLM)は自然言語処理の分野を一変させたが、その能力を悪用して意図しない潜在的に有害なコンテンツを生成させるジェイルブレイク攻撃に対して、依然として脆弱である。既存のトークンレベルのジェイルブレイク手法は有効ではあるものの、特にモデルが頻繁に更新され、高度な防御策が組み込まれるにつれて、スケーラビリティと効率の課題に直面している。本稿では、これらの制約に効果的に対処する革新的なトークンレベルの操作手法であるJailMineを紹介する。JailMineは、肯定的な出力を戦略的に選択し、拒否の尤度を反復的に下げることで、LLMから悪意のある応答を引き出す自動化された「マイニング」プロセスを採用している。複数の著名なLLMとデータセットでの厳密なテストを通じてJailMineの有効性と効率を実証し、進化する防御戦略に直面しても平均95%の高い成功率を維持しつつ、所要時間を平均86%と大幅に削減した。本研究は、ジェイルブレイク攻撃に対するLLMの脆弱性を評価し緩和するための継続的な取り組みに貢献するものであり、これらの強力な言語モデルのセキュリティと信頼性を高めるために、継続的な警戒と積極的な対策が重要であることを強調している。

Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate unintended and potentially harmful content. Existing token-level jailbreaking techniques, while effective, face scalability and efficiency challenges, especially as models undergo frequent updates and incorporate advanced defensive measures. In this paper, we introduce JailMine, an innovative token-level manipulation approach that addresses these limitations effectively. JailMine employs an automated "mining" process to elicit malicious responses from LLMs by strategically selecting affirmative outputs and iteratively reducing the likelihood of rejection. Through rigorous testing across multiple well-known LLMs and datasets, we demonstrate JailMine's effectiveness and efficiency, achieving a significant average reduction of 86% in time consumed while maintaining high success rates averaging 95%, even in the face of evolving defensive strategies. Our work contributes to the ongoing effort to assess and mitigate the vulnerability of LLMs to jailbreaking attacks, underscoring the importance of continued vigilance and proactive measures to enhance the security and reliability of these powerful language models.

翻訳日:2024-05-25 04:32:08 公開日:2024-05-20

# ニュース記事イベントベース埋め込みの新しい手法

A Novel Method for News Article Event-Based Embedding ( http://arxiv.org/abs/2405.13071v1 )

ライセンス: Link先を確認

Koren Ishlach, Itzhak Ben-David, Michael Fire, Lior Rokach,

(参考訳) ニュース記事の埋め込みは、メディアバイアスの検出、フェイクニュースの特定、ニュース推薦など、複数の分野にとって重要なツールである。しかし、既存のニュース埋め込み手法は、ニュースイベントの潜在的な文脈を捉えるようには最適化されていない。多くの場合、ニュース埋め込み手法は全文情報に依存しており、時間を考慮した埋め込み生成の重要性を無視している。本稿では、記事で言及されるエンティティやテーマと、それらの特定のイベントとの歴史的なつながりに着目してニュース埋め込みの生成を最適化する、新たな軽量な手法を提示することで、これらの欠点に対処する。提案手法は3段階からなる。第1に、与えられたニュース記事を処理し、イベント、エンティティ、テーマを抽出する。第2に、現在および過去のデータに対して、時間で区切ったGloVeモデルを訓練することで、テーマとエンティティの周期的な時間埋め込みを生成する。最後に、記事レベルのベクトルのためのSIF(Smooth Inverse Frequency)と、イベント関連の微妙な情報を含む埋め込みのためのSiamese Neural Networksという2つの異なるアプローチで生成したニュース埋め込みを連結する。手法のテストと評価のため、GDELTプロジェクトから85万件以上のニュース記事と1,000,000件のイベントを利用した。検証のため、異なるニュース埋め込み生成手法の比較分析を行い、それらを共有イベント検出タスクに2回、すなわち最初は同じ日に公開された記事に、次に同じ月に公開された記事に適用した。実験により、提案手法がすべてのタスクとデータセットでPrecision-Recall(PR)AUCを大幅に改善することを示す。具体的には、日次および月次の共有イベント検出タスクにおいて、SIFと比較してそれぞれ平均2.15%と2.57%、半教師ありアプローチと比較してそれぞれ2.57%と2.43%のPR AUCの改善が観察された。

Embedding news articles is a crucial tool for multiple fields, such as media bias detection, identifying fake news, and news recommendations. However, existing news embedding methods are not optimized for capturing the latent context of news events. In many cases, news embedding methods rely on full-textual information and neglect the importance of time-relevant embedding generation. Here, we aim to address these shortcomings by presenting a novel lightweight method that optimizes news embedding generation by focusing on the entities and themes mentioned in the articles and their historical connections to specific events. We suggest a method composed of three stages. First, we process and extract the events, entities, and themes for the given news articles. Second, we generate periodic time embeddings for themes and entities by training timely separated GloVe models on current and historical data. Lastly, we concatenate the news embeddings generated by two distinct approaches: Smooth Inverse Frequency (SIF) for article-level vectors and Siamese Neural Networks for embeddings with nuanced event-related information. To test and evaluate our method, we leveraged over 850,000 news articles and 1,000,000 events from the GDELT project. For validation purposes, we conducted a comparative analysis of different news embedding generation methods, applying them twice to a shared event detection task - first on articles published within the same day and subsequently on those published within the same month. Our experiments show that our method significantly improves the Precision-Recall (PR) AUC across all tasks and datasets. Specifically, we observed an average PR AUC improvement of 2.15% and 2.57% compared to SIF, as well as 2.57% and 2.43% compared to the semi-supervised approach for daily and monthly shared event detection tasks, respectively.
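
The final stage described in this abstract, concatenating a SIF article vector with a second embedding, can be sketched as below. Only the SIF weighting step (each word vector weighted by a / (a + p(w)) and averaged) is shown; the common-component removal that full SIF also performs is omitted, and the second embedding is a placeholder rather than the output of a trained Siamese network. All names, vectors, and probabilities are hypothetical.

```python
def sif_vector(tokens, vectors, word_prob, a=1e-3):
    """Weighted average of word vectors using SIF weights a / (a + p(w)).

    tokens:    list of words in the article
    vectors:   dict mapping word -> list of floats (e.g. GloVe vectors)
    word_prob: dict mapping word -> unigram probability p(w)
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    used = 0
    for tok in tokens:
        if tok not in vectors:
            continue                      # skip out-of-vocabulary words
        weight = a / (a + word_prob.get(tok, 0.0))
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
        used += 1
    return [x / used for x in total] if used else total


def article_embedding(sif_part, event_part):
    """Concatenate the two embeddings into one article-level vector."""
    return list(sif_part) + list(event_part)


vectors = {"flood": [1.0, 0.0], "relief": [0.0, 1.0]}    # toy 2-d vectors
word_prob = {"flood": 0.001, "relief": 0.0}
sif_part = sif_vector(["flood", "relief", "unknown"], vectors, word_prob)
event_part = [0.3, 0.7, 0.1]     # placeholder for a Siamese-network output
print(article_embedding(sif_part, event_part))
```

Frequent words receive weights close to zero and rare words weights close to one, which is what lets the averaged vector emphasise the more informative terms.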

翻訳日:2024-05-25 04:22:11 公開日:2024-05-20

# メタ変数を持つ異種データセットのグラフ構造距離

A graph-structured distance for heterogeneous datasets with meta variables ( http://arxiv.org/abs/2405.13073v1 )

ライセンス: Link先を確認

Edward Hallé-Hannan, Charles Audet, Youssef Diouane, Sébastien Le Digabel, Paul Saves,

(参考訳) 異種データセットは、異なるデータソース、さまざまなデータ型、変数間の複雑な関係を特徴とする、さまざまな機械学習や最適化の応用に現れる。実際には、異種データセットは、処理が容易で扱いやすい小さなデータセットに分割されることが多い。しかし、一部の応用では、生成コストが高い、あるいはサイズが限られたデータセットを扱うため、データセット全体に基づく手法が求められる。本研究の第1の主な貢献は、最先端の階層的、木構造、可変サイズの各フレームワークを一般化する、グラフ構造のモデリングフレームワークである。このフレームワークは、変数が連続、整数、またはカテゴリカルでありうる異種データセットを含む領域をモデル化し、ある変数の値が、他のいわゆる被決定(decreed)変数の包含・除外を決定したり、その範囲に影響を与えたりする場合、その変数をメタ変数として識別する。与えられた点に応じて包含または除外される変数を扱うために、除外変数が導入される。第2の主な貢献は、包含変数と除外変数の任意の組み合わせを持つ拡張点を比較するグラフ構造距離である。これにより任意の点のペアが比較可能となり、メタ変数を持つ異種データセットを直接扱うことができる。これらの貢献は、いくつかの回帰実験で例示され、そこでは多層パーセプトロンのハイパーパラメータに対する性能が、逆距離加重法と$K$近傍法のモデルでモデル化される。

Heterogeneous datasets emerge in various machine learning or optimization applications that feature different data sources, various data types and complex relationships between variables. In practice, heterogeneous datasets are often partitioned into smaller well-behaved ones that are easier to process. However, some applications involve expensive-to-generate or limited size datasets, which motivates methods based on the whole dataset. The first main contribution of this work is a modeling graph-structured framework that generalizes state-of-the-art hierarchical, tree-structured, or variable-size frameworks. This framework models domains that involve heterogeneous datasets in which variables may be continuous, integer, or categorical, with some identified as meta if their values determine the inclusion/exclusion or affect the bounds of other so-called decreed variables. Excluded variables are introduced to manage variables that are either included or excluded depending on the given points. The second main contribution is the graph-structured distance that compares extended points with any combination of included and excluded variables: any pair of points can be compared, allowing to work directly in heterogeneous datasets with meta variables. The contributions are illustrated with some regression experiments, in which the performance of a multilayer perceptron with respect to its hyperparameters is modeled with inverse distance weighting and $K$-nearest neighbors models.
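
To make the regression setting in this abstract concrete, the sketch below runs inverse distance weighting over points whose set of variables differs from point to point. The distance used here is a deliberately simple stand-in (absolute differences on shared variables plus a fixed penalty for each variable present in only one point); it is not the graph-structured distance defined in the paper, and every name and number is hypothetical.

```python
def toy_distance(a, b, exclusion_penalty=1.0):
    """Stand-in distance between two points stored as dicts.

    A variable missing from a dict is treated as excluded for that point.
    """
    d = 0.0
    for name in set(a) | set(b):
        if name in a and name in b:
            d += abs(a[name] - b[name])      # both points include it
        else:
            d += exclusion_penalty           # included in only one point
    return d


def idw_predict(query, samples, distance=toy_distance, power=2.0):
    """Inverse distance weighting: closer samples get larger weights."""
    num = den = 0.0
    for point, value in samples:
        d = distance(query, point)
        if d == 0.0:
            return value                     # exact match with a sample
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hyperparameter points: "layers" acts as a meta variable that decides
# whether "units_2" exists at all.
samples = [
    ({"layers": 1, "units_1": 10}, 0.5),
    ({"layers": 2, "units_1": 10, "units_2": 20}, 0.9),
]
print(idw_predict({"layers": 1, "units_1": 12}, samples))
```

The point of the example is only that a distance defined on every pair of points, whatever variables each includes, lets a distance-based model use the whole dataset rather than one partition at a time.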

翻訳日:2024-05-25 04:22:11 公開日:2024-05-20

# 量子トモグラフィーは量子力学を説明する

Quantum tomography explains quantum mechanics ( http://arxiv.org/abs/2110.05294v5 )

ライセンス: Link先を確認

Arnold Neumaier,

(参考訳) 本稿では、ボルンの規則ではなく量子トモグラフィーに着想を得た新しい原理から出発し、量子力学と量子測定に対する自己完結した演繹的アプローチを与える。量子検出器とは何か、そしてその応答がどのように振る舞うかについての示唆的な概念から、論理的に非の打ちどころのない測定の定義が導かれる。光学状態の測定スキーム、位置測定、粒子飛跡への応用により、いかなる理想化もなしに複雑で現実的な実験に適用可能であることを示す。量子状態、量子検出器、量子過程、量子インストゥルメントに対するさまざまな形態の量子トモグラフィーを論じる。量子力学の伝統的な動力学的性質とスペクトル的性質は、量子過程の連続極限から導かれ、混合する量子系の密度演算子に対するリンドブラッド方程式と、純粋で混合しない量子系の状態ベクトルに対するシュレーディンガー方程式が得られる。正規化された密度演算子は、古典的な位相空間変数である位置と運動量と完全に類似して、量子位相空間変数の役割を果たすことが示される。測定過程をわずかに理想化することで量子場の概念に至り、その平滑化された(smeared)量子期待値は、測定がアクセス可能な空間領域の再現可能な性質として現れる。新しいアプローチは、伝統的な基礎づけよりも実際の実践に近い。より一般的であり、したがってより強力である。伝統的なアプローチよりも単純で技術的でなく、量子力学の標準的な道具を導出するのも難しくない。このため、新しいアプローチは量子力学の入門コースに適している。文献からのさまざまな引用が、形式的な説明に歴史的・哲学的な側面から光を当てる。

Starting from a new principle inspired by quantum tomography rather than from Born's rule, this paper gives a self-contained deductive approach to quantum mechanics and quantum measurement. A suggestive notion for what constitutes a quantum detector and for the behavior of its responses leads to a logically impeccable definition of measurement. Applications to measurement schemes for optical states, position measurements and particle tracks demonstrate the applicability to complex realistic experiments without any idealization. The various forms of quantum tomography for quantum states, quantum detectors, quantum processes, and quantum instruments are discussed. The traditional dynamical and spectral properties of quantum mechanics are derived from a continuum limit of quantum processes, giving the Lindblad equation for the density operator of a mixing quantum system and the Schrödinger equation for the state vector of a pure, nonmixing quantum system. Normalized density operators are shown to play the role of quantum phase space variables, in complete analogy to the classical phase space variables position and momentum. A slight idealization of the measurement process leads to the notion of quantum fields, whose smeared quantum expectations emerge as reproducible properties of regions of space accessible to measurements. The new approach is closer to actual practice than the traditional foundations. It is more general, and therefore more powerful. It is simpler and less technical than the traditional approach, and the standard tools of quantum mechanics are not difficult to derive. This makes the new approach suitable for introductory courses on quantum mechanics. A variety of quotes from the literature illuminate the formal exposition with historical and philosophical aspects.

翻訳日:2024-05-22 19:47:36 公開日:2024-05-20

# 観測される観測者に対する量子力学的規則と量子論の整合性

Quantum mechanical rules for observed observers and the consistency of quantum theory ( http://arxiv.org/abs/2202.04203v2 )

ライセンス: Link先を確認

Alexios P. Polychronakos,

(参考訳) ユニタリ量子力学の規則は、自身がマクロ状態の線形結合における測定(「猫」測定)の対象となる観測者が、そのような測定の後に行われる実験の結果について信頼できる予測を行えないことを含意する、と論じる。これにより、Frauchiger と Renner が最近指摘した量子力学の解釈における矛盾が解消される。結果の確率を計算し、他の観測者と通信するためのボーン則は、猫測定を受ける観測者には一般に適用されず、今後の猫測定を取り込むように一般に修正することもできない。これらの条件で補完された量子力学の規則は、完全に整合的になる。

I argue that the rules of unitary quantum mechanics imply that observers who will themselves be subject to measurements in a linear combination of macroscopic states ("cat" measurements) cannot make reliable predictions on the results of experiments performed after such measurements. This lifts the inconsistency in the interpretation of quantum mechanics recently identified by Frauchiger and Renner. The Born rules for calculating the probability of outcomes and for communicating with other observers do not generally apply for cat-measured observers, nor can they generally be amended to incorporate upcoming cat measurements. Quantum mechanical rules completed with these conditions become fully consistent.

翻訳日:2024-05-22 19:47:36 公開日:2024-05-20

# リプシッツ平滑性を考慮した簡易制御ランダムリシャッフル勾配アルゴリズムの収束性

Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness ( http://arxiv.org/abs/2212.01848v3 )

ライセンス: Link先を確認

Ruggiero Seccia, Corrado Coppola, Giampaolo Liuzzi, Laura Palagi,

(参考訳) 本研究では,非常に多くのスムーズかつ非凸関数の平均を最小化することを検討するとともに,この最適化問題に対処するために広く利用されている2つのミニバッチフレームワークであるインクリメンタルグラディエント(IG)とランダムリシャッフル(RR)に焦点を当てる。我々は IG/RR スキームの緩和制御的な修正を定義するが、これはわずかな追加の計算量しか必要としない一方で、弱く標準的な仮定の下で収束することを証明できる。特に、IG/RRイテレーションをウォッチドッグルールと、収束を保証するために散発的にのみ活性化するデリバティブフリーライン探索を用いて制御する2つのアルゴリズムスキームを定義する。 2つのスキームは、モノトニックまたは非モノトニックな規則を用いて実行される、ウォッチドッグとライン検索で異なる。この2つのスキームは、メインIG/RRイテレーションで使用されるステップサイズの更新を制御でき、ステップサイズをあまりに速くゼロに近づけてしまうような事前設定ルールの使用を避けることができ、ステップサイズを効果的に更新するルールを設計する労力を減らすことができる。成分関数の勾配のリプシッツ連続性の軽微な仮定の下で収束性を証明し、異なるディープニューラルネットワークアーキテクチャと様々なサイズデータセットのベンチマークを用いて広範な計算解析を行う。我々は,本手法を全バッチ勾配法(L-BFGS)とIG/RR法の両方と比較し,我々のアルゴリズムが他のオンラインアルゴリズムと同じような計算作業を必要とすること,学習速度の制御が目的関数のより速い減少を可能にしうることを示した。

In this work, we consider minimizing the average of a very large number of smooth and possibly non-convex functions, and we focus on two widely used minibatch frameworks to tackle this optimization problem: Incremental Gradient (IG) and Random Reshuffling (RR). We define ease-controlled modifications of the IG/RR schemes, which require a light additional computational effort but can be proved to converge under weak and standard assumptions. In particular, we define two algorithmic schemes in which the IG/RR iteration is controlled by using a watchdog rule and a derivative-free linesearch that activates only sporadically to guarantee convergence. The two schemes differ in the watchdog and the linesearch, which are performed using either a monotonic or a non-monotonic rule. The two schemes also allow controlling the updating of the stepsize used in the main IG/RR iteration, avoiding the use of pre-set rules that may drive the stepsize to zero too fast, reducing the effort in designing effective updating rules of the stepsize. We prove convergence under the mild assumption of Lipschitz continuity of the gradients of the component functions and perform extensive computational analysis using different deep neural architectures and a benchmark of varying-size datasets. We compare our implementation with both a full batch gradient method (i.e., L-BFGS) and an implementation of IG/RR methods, proving that our algorithms require a similar computational effort compared to the other online algorithms and that the control on the learning rate may allow a faster decrease of the objective function.

翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
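
The abstract above contrasts Incremental Gradient (fixed component order) with Random Reshuffling (a fresh permutation each epoch) and adds a control step that adjusts the stepsize. The sketch below illustrates only that general shape on a toy scalar least-squares problem. The acceptance test is a deliberately simplified assumption: an epoch is kept when the full objective does not increase, otherwise it is discarded and the stepsize is halved. The paper's actual watchdog rule and derivative-free linesearch are not reproduced, and `controlled_rr`, `DATA`, and the default parameters are hypothetical.

```python
import random

# Components f_i(w) = 0.5 * (a_i * w - b_i)**2; every f_i is minimised at w = 2.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def objective(w, data):
    """Average of the component functions."""
    return sum(0.5 * (a * w - b) ** 2 for a, b in data) / len(data)


def epoch(w, data, step, order):
    """One pass over all components in the given order."""
    for i in order:
        a, b = data[i]
        w -= step * a * (a * w - b)       # gradient step on component i only
    return w


def controlled_rr(data, w0=0.0, step=0.5, epochs=50, shuffle=True, seed=0):
    """IG (shuffle=False) or RR (shuffle=True) with a simplified epoch check."""
    rng = random.Random(seed)
    w, f = w0, objective(w0, data)
    order = list(range(len(data)))
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(order)            # RR: new permutation at every epoch
        trial = epoch(w, data, step, order)
        f_trial = objective(trial, data)
        if f_trial <= f:                  # simplified check: keep non-increasing epochs
            w, f = trial, f_trial
        else:                             # otherwise discard the epoch, shrink the step
            step *= 0.5
    return w, f


w_rr, f_rr = controlled_rr(DATA)
w_ig, f_ig = controlled_rr(DATA, shuffle=False)
```

With the initial stepsize of 0.5 the first epoch increases the objective, so it is rejected and the stepsize is reduced without any hand-tuned decay schedule. The check costs one full objective evaluation per epoch here, whereas the abstract describes a control that activates only sporadically.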

# バラを五千通りに見る

Seeing a Rose in Five Thousand Ways ( http://arxiv.org/abs/2212.04965v2 )

ライセンス: Link先を確認

Yunzhi Zhang, Shangzhe Wu, Noah Snavely, Jiajun Wu,

(参考訳) 視覚的に、バラとは何か? バラは内在的であり、幾何学、テクスチャ、およびその対象カテゴリーに特有の物質が分布する。これらの固有の性質を知ることで、異なる大きさと形状のバラを異なるポーズで、異なる照明条件下でレンダリングすることができる。本研究では,花束の写真など,一つの画像からそのような物体の内在を捉えることを学習する生成モデルを構築する。このようなイメージには、オブジェクトタイプの複数のインスタンスが含まれている。これらの例は全て同じ内在論を共有しているが、これらの内在論におけるばらつきと、ポーズや照明のような外在的要因の違いにより異なるように見える。実験により,インターネット画像から対象物(形状,テクスチャ,素材の分布)を多種多様に学習することに成功した。提案手法は,本質的な画像分解,形状と画像生成,ビュー合成,ライティングなど,複数のダウンストリームタスクにおいて優れた結果が得られる。

What is a rose, visually? A rose comprises its intrinsics, including the distribution of geometry, texture, and material specific to its object category. With knowledge of these intrinsic properties, we may render roses of different sizes and shapes, in different poses, and under different lighting conditions. In this work, we build a generative model that learns to capture such object intrinsics from a single image, such as a photo of a bouquet. Such an image includes multiple instances of an object type. These instances all share the same intrinsics, but appear different due to a combination of variance within these intrinsics and differences in extrinsic factors, such as pose and illumination. Experiments show that our model successfully learns object intrinsics (distribution of geometry, texture, and material) for a wide range of objects, each from a single Internet image. Our method achieves superior results on multiple downstream tasks, including intrinsic image decomposition, shape and image generation, view synthesis, and relighting.

翻訳日:2024-05-22 19:40:07 公開日:2024-05-20

# 線形代数に対する量子ビット効率の良いランダム化量子アルゴリズム

Qubit-Efficient Randomized Quantum Algorithms for Linear Algebra ( http://arxiv.org/abs/2302.01873v3 )

ライセンス: Link先を確認

Samson Wang, Sam McArdle, Mario Berta,

(参考訳) 本稿では,行列関数に対する量子ブロック符号化や他のコヒーレントなオラクルアクセスを使わずに,行列関数からサンプリングするタスクのためのランダム化量子アルゴリズムのクラスを提案する。したがって、量子ビットの使用は純粋にアルゴリズムであり、量子データ構造には追加の量子ビットは必要ない。我々のアルゴリズムは、関心の行列がパウリ基底で指定される古典的なデータ構造から始まる。 N\times N$ Hermitian 行列の場合、空間コストは$\log(N)+1$ qubitsであり、行列の構造によっては、ゲートの複雑さは、等価なエンドツーエンドの問題を考えるとき、最大$O(N^2)$の量子データ構造を使用する最先端の手法に匹敵する。本フレームワークでは,解ベクトルの性質をサンプリングする量子線形系解法と,ハミルトンの基底状態とギブス状態の特性をサンプリングするアルゴリズムを提案する。具体的な応用として、これらのサブルーチンを組み合わせて、量子多体系のグリーン関数を計算するスキームを示す。

We propose a class of randomized quantum algorithms for the task of sampling from matrix functions, without the use of quantum block encodings or any other coherent oracle access to the matrix elements. As such, our use of qubits is purely algorithmic, and no additional qubits are required for quantum data structures. Our algorithms start from a classical data structure in which the matrix of interest is specified in the Pauli basis. For $N\times N$ Hermitian matrices, the space cost is $\log(N)+1$ qubits and depending on the structure of the matrices, the gate complexity can be comparable to state-of-the-art methods that use quantum data structures of up to size $O(N^2)$, when considering equivalent end-to-end problems. Within our framework, we present a quantum linear system solver that allows one to sample properties of the solution vector, as well as algorithms for sampling properties of ground states and Gibbs states of Hamiltonians. As a concrete application, we combine these sub-routines to present a scheme for calculating Green's functions of quantum many-body systems.

翻訳日:2024-05-22 19:40:07 公開日:2024-05-20
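
The abstract above assumes the matrix of interest is given classically in the Pauli basis. As a small illustration of what that input looks like, the sketch below expands a single-qubit (2x2) Hermitian matrix as a sum of Pauli matrices with coefficients $c_P = \mathrm{Tr}(PA)/2$. It is not the paper's algorithm. The helper `space_cost_qubits` simply evaluates the quoted $\log(N)+1$ space cost, reading the logarithm as base 2, which is an assumption, and both function names are hypothetical.

```python
# Single-qubit Pauli matrices as nested lists.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def pauli_coefficients(a):
    """Return c_P = Tr(P a) / 2 for a 2x2 Hermitian matrix a."""
    coeffs = {}
    for name, p in PAULIS.items():
        trace = sum(p[i][k] * a[k][i] for i in range(2) for k in range(2))
        coeffs[name] = (trace / 2).real   # coefficients are real for Hermitian a
    return coeffs


def space_cost_qubits(n):
    """log2(N) + 1 qubits for an N x N matrix, with N a power of two (assumed)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return (n.bit_length() - 1) + 1


A = [[2, 1 - 1j], [1 + 1j, 0]]            # equals I + X + Y + Z
coeffs = pauli_coefficients(A)
```

For larger matrices the same expansion runs over tensor products of Pauli matrices, and the number of nonzero coefficients is one way to read the abstract's remark that cost depends on the structure of the matrix.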

# 早期出力を用いた深部ニューラルネットワークの階層的学習

Hierarchical Training of Deep Neural Networks Using Early Exiting ( http://arxiv.org/abs/2303.02384v4 )

ライセンス: Link先を確認

Yamin Sepehri, Pedram Pad, Ahmet Caner Yüzügüler, Pascal Frossard, L. Andrea Dunbar,

(参考訳) 深層ニューラルネットワークは、ビジョンタスクに最先端の精度を提供するが、トレーニングにはかなりのリソースを必要とする。これにより、データを取得するエッジデバイスから遠く離れたクラウドサーバでトレーニングされる。この問題は通信コスト、ランタイム、プライバシの懸念を高める。本研究では,エッジとクラウドワーカを分割したアーキテクチャで早期のエグジットを利用して通信コスト,トレーニングランタイム,プライバシの懸念を緩和する,ディープニューラルネットワークの新しい階層的トレーニング手法を提案する。本手法では,トレーニング期間中のエッジとクラウド間のニューラルネットワークの後方通過を分離するために,早期出口の新しいユースケースを提案する。トレーニングフェーズのシーケンシャルな性質のため、階層のレベルを同時にトレーニングできない、あるいはプライバシを妥協するコストで実行できない、最も利用可能なメソッドの問題に対処する。対照的に,本手法はエッジとクラウドワーカを同時に使用することができ,生の入力データをクラウドと共有せず,後方通過時の通信も不要である。異なるニューラルネットワークアーキテクチャに対するいくつかのシミュレーションとオンデバイス実験は、この方法の有効性を実証している。 CIFAR-10分類では,VGG-16およびResNet-18アーキテクチャのトレーニングランタイムを29%,61%削減し,低ビットレートチャネル上でクラウドとの通信を行う場合,Tiny ImageNet分類では25%,81%削減した。この実行時の利得は達成され、精度低下は無視される。この方法は、エッジクラウドシステムの一部として、携帯電話やロボットなどのセンサ保有の低リソースデバイス上での、高精度なディープニューラルネットワークのオンライン学習に有利である。

Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers to reduce the communication cost, training runtime and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that due to the sequential nature of the training phase, cannot train the levels of hierarchy simultaneously or they do it with the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification when the communication with the cloud is done over a low bit rate channel. This gain in the runtime is achieved whilst the accuracy drop is negligible. This method is advantageous for online learning of high-accuracy deep neural networks on sensor-holding low-resource devices such as mobile phones or robots as a part of an edge-cloud system, making them more flexible in facing new tasks and classes of data.

翻訳日:2024-05-22 19:40:07 公開日:2024-05-20

# NoRA: 高連結ハミルトニアンの体積方向エンタングル平衡状態のためのテンソルネットワークアンサッツ

NoRA: A Tensor Network Ansatz for Volume-Law Entangled Equilibrium States of Highly Connected Hamiltonians ( http://arxiv.org/abs/2303.16946v5 )

ライセンス: Link先を確認

Valérie Bettaque, Brian Swingle,

(参考訳) 平均場量子スピングラスモデルやSachdev-Ye-Kitaev(SYK)モデルのような全対全相互作用を持つ量子モデルの基底状態構造により、体積法則の絡み合いと大きな基底状態の縮退を緩和できるテンソルネットワークアーキテクチャを提案する。我々は、このアーキテクチャを非局所再正規化アンサッツ(NoRA)と呼んでいる。これは、MERA、DMERA、分岐MERAネットワークの一般化と見なすことができ、空間的局所性の制約が取り除かれるためである。アーキテクチャはSYKモデルの接地空間の絡み合いや複雑さを捉えるのに十分な表現性を持っているため、適切な変分アンザッツとなるが、SYKの詳細な研究は今後の研究に任せる。テンソルがランダムなクリフォードゲートである特別な場合において、アーキテクチャをさらに探求する。ここでは、アーキテクチャをランダムな安定化器コードの符号化マップと見なすことができる。我々はSYKモデルにインスパイアされた一連の符号を導入し、高重量安定器のコストで一定速度と線形距離を選択できることを示した。また、この符号族とSYK基底空間から形成される近似符号との潜在的な類似点についてもコメントする。

Motivated by the ground state structure of quantum models with all-to-all interactions such as mean-field quantum spin glass models and the Sachdev-Ye-Kitaev (SYK) model, we propose a tensor network architecture which can accommodate volume-law entanglement and a large ground state degeneracy. We call this architecture the non-local renormalization ansatz (NoRA) because it can be viewed as a generalization of MERA, DMERA, and branching MERA networks with the constraints of spatial locality removed. We argue that the architecture is potentially expressive enough to capture the entanglement and complexity of the ground space of the SYK model, thus making it a suitable variational ansatz, but we leave a detailed study of SYK to future work. We further explore the architecture in the special case in which the tensors are random Clifford gates. Here the architecture can be viewed as the encoding map of a random stabilizer code. We introduce a family of codes inspired by the SYK model which can be chosen to have constant rate and linear distance at the cost of some high weight stabilizers. We also comment on potential similarities between this code family and the approximate code formed from the SYK ground space.

翻訳日:2024-05-22 19:40:07 公開日:2024-05-20

# 手首の動きからパーキンソン病を検出するための時系列分類

Time Series Classification for Detecting Parkinson's Disease from Wrist Motions ( http://arxiv.org/abs/2304.11265v2 )

ライセンス: Link先を確認

Cedric Donié, Neha Das, Satoshi Endo, Sandra Hirche,

(参考訳) パーキンソン病(英: Parkinson disease, PD)は、運動症状の頻繁な変化を特徴とする神経変性疾患である。古典的時系列分類と深層学習技術は、複雑なPD運動パターンと利用可能なデータセットの小さいため、ウェアラブル加速度計データを用いたPD症状のモニタリングにおいて限られた効果を示した。 InceptionTimeとRandOm Convolutional KErnel Transform(ROCKET)をPD症状モニタリングに有望なものとして検討し、InceptionTimeの高学習能力は複雑な動きパターンをモデル化するのに適しており、ROCKETは小さなデータセットに適している。ランダムな探索手法により,最も高いインセプションタイムアーキテクチャを同定し,その性能をPD患者の手首動作データに対する尾根分類器と多層パーセプトロン(MLP)と比較する。以上の結果より, 震度とブラジキネジアの有無を推定するのには全アプローチが適しているが, ジスキネジアの検出には困難が伴うことが示唆された。 ROCKETはジスキネジアの同定において優れた性能を示すが、InceptionTimeは振れやブラジキネシアの検出においてわずかに優れた性能を示す。特に、どちらの手法も多層パーセプトロンよりも優れている。結論として、InceptionTimeは複雑な手首の動き時系列を分類する能力を示し、PDの継続的な症状モニタリングの最大の可能性を秘めている。

Parkinson's disease (PD) is a neurodegenerative condition characterized by frequently changing motor symptoms, necessitating continuous symptom monitoring for more targeted treatment. Classical time series classification and deep learning techniques have demonstrated limited efficacy in monitoring PD symptoms using wearable accelerometer data due to complex PD movement patterns and the small size of available datasets. We investigate InceptionTime and RandOm Convolutional KErnel Transform (ROCKET) as they are promising for PD symptom monitoring, with InceptionTime's high learning capacity being well-suited to modeling complex movement patterns while ROCKET is suited to small datasets. With random search methodology, we identify the highest-scoring InceptionTime architecture and compare its performance to ROCKET with a ridge classifier and a multi-layer perceptron (MLP) on wrist motion data from PD patients. Our findings indicate that all approaches are suitable for estimating tremor severity and bradykinesia presence but encounter challenges in detecting dyskinesia. ROCKET demonstrates superior performance in identifying dyskinesia, whereas InceptionTime exhibits slightly better performance in tremor and bradykinesia detection. Notably, both methods outperform the multi-layer perceptron. In conclusion, InceptionTime exhibits the capability to classify complex wrist motion time series and holds the greatest potential for continuous symptom monitoring in PD.

翻訳日:2024-05-22 19:30:20 公開日:2024-05-20
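
The abstract above pairs ROCKET features with a ridge classifier. The sketch below shows, in simplified form, how ROCKET-style features can be computed for one univariate series: each random dilated kernel yields two numbers, the maximum response and the proportion of positive values (PPV). It is an illustration under stated assumptions, not the study's pipeline: padding is omitted, the series must have at least 12 samples, and the function names, kernel count, and the synthetic sine-wave input are hypothetical. The classifier itself is not shown.

```python
import math
import random


def random_kernel(rng, series_len):
    """Draw one kernel: mean-centred Gaussian weights, a bias, and a dilation."""
    length = rng.choice([7, 9, 11])
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    # Largest exponent such that the dilated kernel still fits inside the series.
    max_exp = math.log2((series_len - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exp))
    return weights, bias, dilation


def apply_kernel(series, kernel):
    """Convolve without padding and return (max response, PPV)."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation
    outputs = [
        bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]
    ppv = sum(1 for o in outputs if o > 0) / len(outputs)
    return max(outputs), ppv


def rocket_features(series, num_kernels=100, seed=0):
    """Feature vector of length 2 * num_kernels: (max, PPV) per kernel."""
    rng = random.Random(seed)
    features = []
    for _ in range(num_kernels):
        features.extend(apply_kernel(series, random_kernel(rng, len(series))))
    return features


series = [math.sin(0.3 * t) for t in range(64)]   # synthetic stand-in signal
features = rocket_features(series, num_kernels=10)
```

Reusing the same `seed` for every series of the same length applies the same kernels to all of them, which is what makes the resulting features comparable across recordings. Because the kernels are random rather than learned, only the downstream linear classifier is fitted, which is consistent with the abstract's point that ROCKET suits small datasets.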

# 線型フェルミオン部分を持つ指数関数に対するバリアン・ブレジン分解の一般化

Generalization of Balian-Brezin decomposition for exponentials with linear fermionic part ( http://arxiv.org/abs/2306.13481v3 )

ライセンス: Link先を確認

M. A. Seifi Mirjafarlou, A. Jafarizadeh, M. A. Rajabpour,

(参考訳) フェルミオンガウス状態は、その興味深い性質、特にウィックの定理により、かなりの注意を払っている。フェルミオン型ガウス作用素と状態の性質を一般化したバリアンとブレジンの研究を拡張して、それらの発見をさらに拡張して、ガウス作用素を線型成分に組み込む。コルパが導入した手法を活用し、解析を合理化し、線形項を含む指数関数を包含するバリアン・ブレジン分解(BBD)の包括的拡張を示す。さらに、線形部分を含むガウス状態を導入し、対応する重複式を導出する。さらに、Wickの定理を線形項を含むシナリオを包含するように一般化し、一点相関関数と二点相関関数に関する一般的な期待値の表現を容易にする。また、$\mathfrak{so}(N)$ Lie algebra 内の BCH (Zassenhaus) 公式に対処する際の BB 分解の適用性に関する簡単な注釈も提供する。

Fermionic Gaussian states have garnered considerable attention due to their intriguing properties, most notably Wick's theorem. Expanding upon the work of Balian and Brezin, who generalized properties of fermionic Gaussian operators and states, we further extend their findings to incorporate Gaussian operators with a linear component. Leveraging a technique introduced by Colpa, we streamline the analysis and present a comprehensive extension of the Balian-Brezin decomposition (BBD) to encompass exponentials involving linear terms. Furthermore, we introduce Gaussian states featuring a linear part and derive corresponding overlap formulas. Additionally, we generalize Wick's theorem to encompass scenarios involving linear terms, facilitating the expression of generic expectation values in relation to one and two-point correlation functions. We also provide a brief commentary on the applicability of the BB decomposition in addressing the BCH (Zassenhaus) formulas within the $\mathfrak{so}(N)$ Lie algebra.

翻訳日:2024-05-22 19:30:20 公開日:2024-05-20

# 電子健康記録を用いた因果推論のためのテキストデータの活用

Leveraging text data for causal inference using electronic health records ( http://arxiv.org/abs/2307.03687v2 )

ライセンス: Link先を確認

Reagan Mozer, Aaron R. Kaufman, Leo A. Celi, Luke Miratrix,

(参考訳) 電子健康記録(EHR)のデータに依存する研究において、臨床進歩ノートなどの構造化されていないテキストデータは、構造化されたデータから欠落している可能性がある患者の特徴やケアに関する情報の豊富な情報源を提供する。臨床研究におけるテキストの普及にもかかわらず、これらのデータは、その複雑さのために定量的分析のために無視されることが多い。本稿では,テキストデータを利用した電子健康データによる因果推論を解析の複数の段階で支援するための統一的な枠組みを提案する。特に、自然言語処理と統計テキスト解析を標準推論手法と組み合わせて、欠落データ、不確定バイアス、処理効果の不均一性といった問題に対処する方法を検討する。本研究は,非ランダム化医療介入が患者予後に与える影響を調査するERH研究への応用を通じて,従来のマッチング分析にテキストデータを統合することで,治療効果の妥当性を高め,治療の恩恵を最も受ける患者サブグループを特定することができることを示す。我々は,これらの手法が臨床データの二次分析の範囲を,発展途上国のような構造化ERHデータに制限された領域にまで広げる可能性があると考えている。この目的のために、我々は、臨床研究におけるこれらの技術の採用と広範な探索を促進するために、コードとオープンソースの複製材料を提供する。

In studies that rely on data from electronic health records (EHRs), unstructured text data such as clinical progress notes offer a rich source of information about patient characteristics and care that may be missing from structured data. Despite the prevalence of text in clinical research, these data are often ignored for the purposes of quantitative analysis due to their complexity. This paper presents a unified framework for leveraging text data to support causal inference with electronic health data at multiple stages of analysis. In particular, we consider how natural language processing and statistical text analysis can be combined with standard inferential techniques to address common challenges due to missing data, confounding bias, and treatment effect heterogeneity. Through an application to a recent EHR study investigating the effects of a non-randomized medical intervention on patient outcomes, we show how incorporating text data in a traditional matching analysis can help strengthen the validity of an estimated treatment effect and identify patient subgroups that may benefit most from treatment. We believe these methods have the potential to expand the scope of secondary analysis of clinical data to domains where structured EHR data is limited, such as in developing countries. To this end, we provide code and open-source replication materials to encourage adoption and broader exploration of these techniques in clinical research.

翻訳日:2024-05-22 19:20:36 公開日:2024-05-20

# ニューラルトピカル表現の一般化に向けて

Towards Generalising Neural Topical Representations ( http://arxiv.org/abs/2307.12564v3 )

ライセンス: Link先を確認

Xiaohao Yang, He Zhao, Dinh Phung, Lan Du,

(参考訳) トピックモデルは従来のベイズ確率モデルから最近のニューラルトピックモデル(NTM)へと進化してきた。 NTMは特定のコーパスでトレーニングおよびテストを行う際に有望な性能を示すが、コーパス間の一般化能力はまだ研究されていない。実際には、ソースコーパスでトレーニングされたNTMが、異なるターゲットコーパスから文書の質の高いトピック表現(トピック上の潜在分布)を生成できると期待されることが多い。本研究では,文書の表現能力がコーパスやタスク全体にわたって確実に一般化されるように,NTMをさらに改良することを目指している。そこで我々は,類似文書間の意味的距離を狭め,異なるコーパスからの文書が類似した意味を共有できるという前提のもとに,NTMの強化を提案する。具体的には、テキストデータ拡張により、トレーニング文書毎に類似した文書を取得する。そして,各ペア間の意味的距離を階層的話題移動距離(Hierarchical Topic Transport Distance)で測定し,トピック表現間の最適移動距離を計算することにより,NTMをさらに最適化する。我々のフレームワークは、ほとんどのNTMにプラグイン・アンド・プレイモジュールとして簡単に適用できます。大規模な実験により, コーパス間の神経トピック表現に関する一般化能力は大幅に向上した。私たちのコードとデータセットは、https://github.com/Xiaohao-Yang/Topic_Model_Generalisationで公開されています。

Topic models have evolved from conventional Bayesian probabilistic models to recent Neural Topic Models (NTMs). Although NTMs have shown promising performance when trained and tested on a specific corpus, their generalisation ability across corpora has yet to be studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation (i.e., latent distribution over topics) for the document from different target corpora. In this work, we aim to improve NTMs further so that their representation power for documents generalises reliably across corpora and tasks. To do so, we propose to enhance NTMs by narrowing the semantical distance between similar documents, with the underlying assumption that documents from different corpora may share similar semantics. Specifically, we obtain a similar document for each training document by text data augmentation. Then, we optimise NTMs further by minimising the semantical distance between each pair, measured by the Hierarchical Topic Transport Distance, which computes the Optimal Transport (OT) distance between their topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability regarding neural topical representation across corpora. Our code and datasets are available at: https://github.com/Xiaohao-Yang/Topic_Model_Generalisation

翻訳日:2024-05-22 19:20:36 公開日:2024-05-20

# 低次多項式によるグラフオン推定のための計算下界

Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials ( http://arxiv.org/abs/2308.15728v3 )

ライセンス: Link先を確認

Yuetian Luo, Chao Gao,

(参考訳) グラフオン推定は、ネットワーク分析における最も基本的な問題の一つであり、過去10年間にかなりの注目を集めてきた。統計的観点からは、確率ブロックモデルと非パラメトリックグラフトン推定の両方について、Gao et al (2015) により、グラノン推定の最小誤差速度が確立されている。統計的最適推定子は制約された最小二乗に基づいており、次元において計算複雑性が指数関数的である。計算の観点からは、最もよく知られた多項式時間推定器は普遍特異値しきい値のしきい値に基づいているが、最小値よりもはるかに遅い推定誤差率しか達成できない。 USVTの計算最適性や、グラノン推定における計算障壁の存在は、長年の未解決問題であった。本研究では,低次多項式を用いたグラフトン推定における計算障壁の厳密な証拠を提供する。具体的には,SBMグラノン推定において,低次多項式推定器の場合,その推定誤差は幅広いパラメータ条件下でUSVTの推定値よりも著しく優れていることが示され,非パラメトリックグラノン推定では,低次多項式推定器が推定誤差率を最小値よりも厳密に遅くすることを示す。我々の結果は、Schramm と Wein (2022) による最近の低次多項式の発展に基づいて証明されている。また,本研究の主な成果を生かして,SBMにおけるコミュニティ検出におけるクラスタリング誤差の計算的下限も提供し,コミュニティの効率的な回復のためのケステン・スティグムしきい値の新たな証拠を得た。最後に、計算下界をスパースグラノン推定とビクラスタリングに拡張する。

Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al. (2015) for both stochastic block model (SBM) and nonparametric graphon estimation. The statistically optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. The computational optimality of the USVT or the existence of a computational barrier in graphon estimation has been a long-standing open problem. In this work, we provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. Specifically, in SBM graphon estimation, we show that for low-degree polynomial estimators, their estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes, and in nonparametric graphon estimation, we show low-degree polynomial estimators achieve estimation error rates strictly slower than the minimax rate. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery. Finally, we extend our computational lower bounds to sparse graphon estimation and biclustering.

翻訳日:2024-05-22 19:20:36 公開日:2024-05-20

# BioCoder: 大規模言語モデルを用いたバイオインフォマティクスコード生成ベンチマーク

BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models ( http://arxiv.org/abs/2308.16458v5 )

ライセンス: Link先を確認

Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein,

(参考訳) 事前訓練された大規模言語モデル(LLM)はコード生成を大幅に改善した。これらのモデルがスケールアップするにつれて、より複雑なタスクを処理し、特定のドメインに適切に特化するための出力の必要性が高まっています。ここでは、この規律が必要とするドメイン知識、アルゴリズム、データ操作の量により、バイオインフォマティクスを対象とする。バイオインフォマティクス固有のコードを生成する際のLLMを評価するためのベンチマークであるBioCoderを提案する。 BioCoderは、ファイル間の依存関係、クラス宣言、グローバル変数を含む、フィールドの大部分にまたがる。その中には、GitHubから抽出された1,026のPython関数と1,243のJavaメソッドと、バイオインフォマティクスに関連するRosalindプロジェクトから253のサンプルが含まれている。トピックモデリングを用いて、包含コード全体のカバレッジは、バイオインフォマティクス計算の完全なスペクトルを表していることを示す。 BioCoderは、評価のためのファズテストフレームワークを組み込んでいる。 InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, GPT-4 など,さまざまなモデルの評価に採用しました。さらに、1つのモデル(StarCoder)を微調整し、トレーニングデータセットがテストベンチマークのパフォーマンスを向上できることを実証しました(特定のプロンプト構成ではPass@Kで15%超、常に3%超)。 1) 成功したモデルは、機能的依存関係を含む完全なコンテキストを持つ長いプロンプト(> 2,600トークン)に対応します。 2) 成功したモデルは、一般的なコーディング能力にとどまらない、バイオインフォマティクスのドメイン固有の知識を含んでいる。これは、ベンチマークにおいてGPT-3.5/4が小規模なモデルと比べて高い性能を示すことから明らかです(50%対最大25%)。可用性と実装: コードは https://github.com/gersteinlab/biocoder と https://biocoder-benchmark.github.io/ で利用可能である。

Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (1) Successful models accommodate a long prompt (> 2,600 tokens) with full context, including functional dependencies. (2) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% vs. up to 25%). Availability and implementation: Code is available at: https://github.com/gersteinlab/biocoder and https://biocoder-benchmark.github.io/.

翻訳日:2024-05-22 19:20:36 公開日:2024-05-20
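
The abstract above reports gains "in terms of Pass@K" without defining the metric. A common way to compute it is the unbiased estimator popularised by code-generation benchmarks: draw `n` samples per problem, count the `c` samples that pass the tests, and estimate the chance that at least one of `k` randomly chosen samples passes. Whether BioCoder uses exactly this estimator is an assumption here, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k).

    n: samples generated for the problem, c: samples that passed, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving pass@k = 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `pass_at_k(10, 3, 1)` reduces to the plain pass rate 3/10, while larger `k` rewards models that succeed at least occasionally.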

# 境界ストレージモデルにおける機能暗号化

Functional Encryption in the Bounded Storage Models ( http://arxiv.org/abs/2309.06702v3 )

ライセンス: Link先を確認

Mohammed Barhoush, Louis Salvail,

(参考訳) 関数型暗号化は公開鍵暗号の強力なパラダイムであり、暗号化されたデータへの制御されたアクセスを可能にする。このプリミティブの理想的なシミュレーションベースのセキュリティを実現することは、通常、平易なモデルでは不可能であるため、量子記憶モデル(BQSM)と古典記憶モデル(BCSM)では、それぞれ量子記憶量と古典記憶量に制限がある可能性について検討する。機能的暗号化における不可能な結果がこれらの設定に当てはまらないため、肯定的な結果が得られる。まず、BQSMでは、${q}=O(\sqrt{{s}/{r}})$で情報理論に基づくセキュリティを満たす非対話型関数暗号を構築する。ここで${r}$は、相手がプロトコル内の量子メモリの${s}$-qubitsに制限される回数を表し、${q}$はプロトコルを正直に実行するために必要な量子メモリを表す。次に、我々のスキームは、${q} < \sqrt{{s}/{r}}$で情報理論上のセキュリティを得ることができないことを証明することで最適であることを示す。しかし、一方通行関数の存在を仮定することで、${q}=0$ と ${r}=1$ で(相互に)機能的な暗号化を実現する。第二に、BCSMでは、情報理論に基づく部分指数シミュレーションに基づくセキュリティを満足する非対話型機能暗号を構築し、部分指数灰色の箱難読化の存在を仮定する。この仮定は、非対話型機能暗号から部分指数灰色の難読化を構築することで最小限であることを示す。また、グレーボックスの難読化と片道関数を仮定したシミュレーションベースのセキュリティを満たす(対話型)関数暗号の計算設定も検討する。

Functional encryption is a powerful paradigm for public-key encryption that allows for controlled access to encrypted data. Achieving the ideal simulation based security for this primitive is generally impossible in the plain model, so we investigate possibilities in the bounded quantum storage model (BQSM) and the bounded classical storage model (BCSM), where adversaries are limited with respect to their quantum and classical memories, respectively. The impossibility results on functional encryption do not apply to these settings which allows us to obtain positive outcomes. Firstly, in the BQSM, we construct non-interactive functional encryption satisfying information-theoretic simulation based security with ${q}=O(\sqrt{{s}/{r}})$. Here ${r}$ denotes the number of times that an adversary is restricted to ${s}$ qubits of quantum memory in the protocol and ${q}$ denotes the required quantum memory to run the protocol honestly. We then show that our scheme is optimal by proving that it is impossible to attain information-theoretic security with ${q} < \sqrt{{s}/{r}}$. However, by assuming the existence of one-way functions, we achieve (interactive) functional encryption with ${q}=0$ and ${r}=1$. Secondly, in the BCSM, we construct non-interactive functional encryption satisfying information-theoretic subexponential simulation based security assuming the existence of subexponential grey-box obfuscation. We then demonstrate that this assumption is minimal by constructing subexponential grey-box obfuscation from non-interactive functional encryption. We also consider the computational setting, obtaining (interactive) functional encryption satisfying simulation based security assuming grey-box obfuscation and one-way functions.

翻訳日:2024-05-22 19:20:36 公開日:2024-05-20

# 強結合レジームにおける2量子ドット微小キャビティ系の量子相関に及ぼすフォルスター相互作用とパルス励起の影響

Effect of the Förster Interaction and the Pulsed Pumping on the Quantum Correlations of a Two Quantum Dot-Microcavity System in the Strong Coupling Regime ( http://arxiv.org/abs/2309.08699v2 )

ライセンス: Link先を確認

D. Madrid-Úsuga, A. A. Portacio, D. Rasero,

(参考訳) Förster相互作用($\Gamma$)を持つ2つの量子ドットが、散逸のある微小キャビティ内で電磁場の単一モードと強く結合し、レーザーパルスで駆動される系の量子相関を、リンドブラッド形式のマスター方程式を用いて理論的に検討した。系のエネルギー固有値は、第1および第2の励起多様体について離調の関数として調べた。Förster結合の値、シミュレーションしたレーザーパルスのポンプ時間、およびパルス強度を変えながら、コンカレンス(CC)、生成エンタングルメント(EoF)、相互情報量(I)、量子ディスコード(Q)を時間の関数として調べた。エンタングルメントの定量化指標としてのEoFとCCの間に食い違いがあり、コンカレンスはEoFよりもはるかに高い値に達することがわかった。Förster相互作用が存在すると、量子ディスコードが系の支配的な相関になりやすく、これは系のエンタングルメントが消失した後も系が量子相関を維持する一方で、レーザーポンプ時間の増加の影響を受けることを示している。

The quantum correlations of a system of two quantum dots with Förster interaction ($\Gamma$) in a dissipative microcavity, strongly coupled to a single mode of the electromagnetic field and driven by a laser pulse, were studied theoretically using the master equation in Lindblad form. The energy eigenvalues of the system were studied as a function of detuning for the first and second excitation manifolds. Concurrence (CC), entanglement of formation (EoF), mutual information (I), and quantum discord (Q) were studied as functions of time for different values of the Förster coupling, varying the pump time and intensity of the simulated laser pulse. We found a discrepancy between EoF and CC as entanglement quantifiers: concurrence reaches much higher values than EoF. The presence of the Förster interaction favors quantum discord as the dominant correlation in the system, which indicates that the system maintains quantum correlations even after its entanglement has disappeared, although these correlations are affected by an increase in the laser pump time.
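The gap between concurrence and EoF reported above is consistent with the standard two-qubit relation between the two quantities. The sketch below assumes each quantum dot is treated as a two-level system, so that Wootters' closed-form expression applies; the function names are illustrative and do not come from the paper.

```python
import math


def binary_entropy(x: float) -> float:
    """Shannon binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def eof_from_concurrence(c: float) -> float:
    """Entanglement of formation of a two-qubit state with concurrence c.

    Uses EoF = h((1 + sqrt(1 - c^2)) / 2), valid for two qubits only.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("concurrence must lie in [0, 1]")
    return binary_entropy((1.0 + math.sqrt(1.0 - c * c)) / 2.0)


# Tabulate the relation on a few illustrative concurrence values.
for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"C = {c:.2f}  EoF = {eof_from_concurrence(c):.4f}")
```

Under this relation EoF never exceeds the concurrence and the two coincide only at 0 and 1, so a numerical gap between them is expected and does not by itself mean the two quantifiers disagree about whether the state is entangled.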

翻訳日:2024-05-22 19:10:52 公開日:2024-05-20

# 自動運転車の長距離3次元物体検出に向けて

Towards Long-Range 3D Object Detection for Autonomous Vehicles ( http://arxiv.org/abs/2310.04800v2 )

ライセンス: Link先を確認

Ajinkya Khoche, Laura Pereira Sánchez, Nazre Batool, Sina Sharif Mansouri, Patric Jensfelt,

(参考訳) 長距離での3D物体検出は、自動運転車の安全性と効率を確保するために不可欠である。しかし、現在最先端のLiDARベースの手法は、遠距離での間隔によって範囲が限られており、エゴ車から遠く離れた地点間での領域ギャップが生じる。もう一つの関連する問題は、遠距離物体のラベル不均衡であり、遠距離でのディープニューラルネットワークの性能を阻害する。上記の制約に対処するため、現在のLiDARベースの3D検出器の長距離性能を改善する2つの方法を検討する。まず,距離の専門家と呼ばれる2つの3D検出ネットワークと,近距離から中距離の物体を専門とする3D検出ネットワークと,長距離の3D検出ネットワークを組み合わせる。ラベルの少ない状況下で長い距離で検出器を訓練するためには、ラベル付き点とエゴ車との距離に応じて損失を更に重み付けする。第2に、画像に基づく深度補完アルゴリズムであるMultimodal Virtual Points (MVP) を用いて、LiDARスキャンを仮想点で拡張する。長距離Argoverse2(AV2)データセットに関する我々の実験は、MVPが単純な実装を維持しながら、長距離性能を改善するのにより効果的であることを示している。一方、レンジの専門家は、画像ベースのセグメンテーションネットワークと完璧なカメラ-LiDARキャリブレーションに依存しないように、計算的に効率的で簡単な代替手段を提供する。

3D object detection at long range is crucial for ensuring the safety and efficiency of self-driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. However, most current state-of-the-art LiDAR-based methods are range-limited due to sparsity at long range, which creates a form of domain gap between points closer to and farther away from the ego vehicle. A related problem is the label imbalance for faraway objects, which inhibits the performance of deep neural networks at long range. To address these limitations, we investigate two ways to improve the long-range performance of current LiDAR-based 3D detectors. First, we combine two 3D detection networks, referred to as range experts, one specializing in near- to mid-range objects and one in long-range 3D detection. To train a detector at long range under a scarce-label regime, we further weight the loss according to the labelled point's distance from the ego vehicle. Second, we augment LiDAR scans with virtual points generated using Multimodal Virtual Points (MVP), a readily available image-based depth completion algorithm. Our experiments on the long-range Argoverse2 (AV2) dataset indicate that MVP is more effective in improving long-range performance, while maintaining a straightforward implementation. On the other hand, the range experts offer a computationally efficient and simpler alternative, avoiding dependency on image-based segmentation networks and perfect camera-LiDAR calibration.
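The distance-dependent loss weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting function, so the linear form, the `max_range` of 200 m, and the `alpha` parameter below are all assumptions chosen for illustration.

```python
import math


def distance_weight(x: float, y: float, max_range: float = 200.0, alpha: float = 1.0) -> float:
    """Hypothetical weight that grows linearly with distance from the ego vehicle.

    The ego vehicle is assumed to sit at the origin of the (x, y) plane.
    """
    distance = min(math.hypot(x, y), max_range)
    return 1.0 + alpha * distance / max_range


def weighted_mean_loss(losses, centers, max_range: float = 200.0, alpha: float = 1.0) -> float:
    """Average per-object losses, up-weighting objects that lie far away."""
    weights = [distance_weight(x, y, max_range, alpha) for x, y in centers]
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total / sum(weights)


# Two labelled objects: one at the ego vehicle, one at the maximum range.
print(weighted_mean_loss([1.0, 3.0], [(0.0, 0.0), (200.0, 0.0)]))
```

With `alpha = 1.0` an object at the maximum range counts twice as much as one next to the vehicle, which is one simple way to counteract the scarcity of far-away labels.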

翻訳日:2024-05-22 19:10:52 公開日:2024-05-20

# Imitation Bootstrapped Reinforcement Learning

Imitation Bootstrapped Reinforcement Learning ( http://arxiv.org/abs/2311.02198v6 )

ライセンス: Link先を確認

Hengyuan Hu, Suvir Mirchandani, Dorsa Sadigh,

(参考訳) 強化学習(RL)のかなりの可能性にもかかわらず、ロボット制御タスクはより優れたサンプル効率のため、主に模倣学習(IL)に依存している。しかし、ILがすべての可能なシナリオに一般化できるような、包括的な専門家によるデモンストレーションを収集することはコストがかかる。したがって、RL は効率的な自己改善手順として IL 上に構築できることをアピールしている。提案手法は,提案する実演において,まずILポリシーを訓練し,それを用いて,オンライン探索とブートストラップ対象値の両方に対する代替行動を提案する,サンプル効率の高いRLのための新しいフレームワークである。 IBRLは、デモンストレーションのオーバーサンプリングやRLの正規化を、さらなる模倣損失で行う以前の作業と比較して、トレーニングの開始以来、ILポリシーからの高品質なアクションを活用することができ、探索と訓練の効率を大幅に向上させることができる。 IBRLを6つのシミュレーションと3つの実世界のタスクで評価した。 IBRLは従来の手法よりも優れており、特に難しい作業では改善が顕著である。

Despite the considerable potential of reinforcement learning (RL), robotic control tasks predominantly rely on imitation learning (IL) due to its better sample efficiency. However, it is costly to collect comprehensive expert demonstrations that enable IL to generalize to all possible scenarios, and any distribution shift would require recollecting data for finetuning. Therefore, RL is appealing if it can build upon IL as an efficient autonomous self-improvement procedure. We propose imitation bootstrapped reinforcement learning (IBRL), a novel framework for sample-efficient RL with demonstrations that first trains an IL policy on the provided demonstrations and then uses it to propose alternative actions for both online exploration and bootstrapping target values. Compared to prior works that oversample the demonstrations or regularize RL with an additional imitation loss, IBRL is able to utilize high-quality actions from IL policies from the beginning of training, which greatly improves exploration and training efficiency. We evaluate IBRL on 6 simulation and 3 real-world tasks spanning various difficulty levels. IBRL significantly outperforms prior methods, and the improvement is particularly prominent in harder tasks.
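The two uses of the IL policy named in the abstract, proposing actions during online exploration and when bootstrapping target values, can be sketched as follows. The abstract does not say how a proposal is chosen; picking whichever candidate the critic scores higher is an assumption here, and all function names are hypothetical.

```python
from typing import Callable

State = float   # scalar stand-ins keep the sketch self-contained
Action = float


def select_action(
    state: State,
    rl_policy: Callable[[State], Action],
    il_policy: Callable[[State], Action],
    q_value: Callable[[State, Action], float],
) -> Action:
    """Return whichever of the RL and IL proposals the critic scores higher (assumed rule)."""
    candidates = [rl_policy(state), il_policy(state)]
    return max(candidates, key=lambda action: q_value(state, action))


def bootstrap_target(
    reward: float,
    next_state: State,
    done: bool,
    rl_policy: Callable[[State], Action],
    il_policy: Callable[[State], Action],
    q_target: Callable[[State, Action], float],
    gamma: float = 0.99,
) -> float:
    """One-step TD target that bootstraps from the better of the two proposals."""
    if done:
        return reward
    next_action = select_action(next_state, rl_policy, il_policy, q_target)
    return reward + gamma * q_target(next_state, next_action)


# Toy setup: the critic prefers actions near 1.0, the untrained RL policy outputs 0.0,
# and the IL policy outputs 0.9, so the IL proposal is selected.
toy_q = lambda s, a: -(a - 1.0) ** 2
print(select_action(0.0, lambda s: 0.0, lambda s: 0.9, toy_q))
```

Early in training the RL policy is close to random, so under this rule the IL proposal tends to win both the exploration step and the target computation, which matches the abstract's claim that IL actions are useful from the start.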

翻訳日:2024-05-22 19:01:09 公開日:2024-05-20

# 分散シフトによるテスト可能な学習

Testable Learning with Distribution Shift ( http://arxiv.org/abs/2311.15142v2 )

ライセンス: Link先を確認

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan,

(参考訳) そこで本研究では,学習者が学習ディストリビューションからラベル付きサンプルに$D$,テストディストリビューションから$D’$のラベル付きサンプルを付与し,テストエラーの少ない分類器を出力する,分散シフトによる学習の基本的な問題を再検討する。この設定における標準的なアプローチは、$D$ と $D'$ の間の距離の概念によって分類器の損失を限定することである。しかし、これらの距離は計算が難しく、効率的なアルゴリズムに繋がらない。我々はこのパラダイムから離れ、分散シフトを伴うテスト可能学習と呼ばれる新しいモデルを定義し、テスト分布上での分類器の性能を証明可能なアルゴリズムを得る。このモデルでは、学習者は、$D$と$D’$のサンプルが関連するテストに合格するたびに、低いテストエラーの分類器を出力する。ハーフ空間、ハーフ空間の交叉、決定木などのよく研究された概念クラスを学習するために、D$の限界がガウス的あるいは一様であるとき、いくつかの肯定的な結果を与える。我々の研究に先立ち、これらの基本事例に対する効率的なアルゴリズムは、$D'$に関する強い仮定なしでは知られていなかった。実現可能な場合($D$と$D’$の両方に整合したハーフスペースが存在する場合)のハーフスペースに対しては、モーメントマッチングアプローチとアクティブラーニングのアイデアを組み合わせて、不一致領域を推定するための効率的なオラクルをシミュレートする。実現不可能な設定にまで拡張するために、テスト可能な(不可知な)学習から最近の研究を適用する。より一般的には、低次$L_2$サンドウィッチ多項式近似器を持つ任意の関数クラスが、我々のモデルで学習できることが証明される。我々は,必要な近似値を得るために擬似ランダム性文献から構成を適用した。

We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
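The idea of a test that accepts when the test marginal looks like the training marginal can be illustrated with a one-dimensional moment-matching check. This is only a toy sketch: the paper works in $d$ dimensions with its own tests, and the degree, tolerance, and function names below are assumptions.

```python
import random


def gaussian_moment(k: int) -> float:
    """k-th moment of a standard Gaussian: 0 for odd k, (k - 1)!! for even k."""
    if k % 2 == 1:
        return 0.0
    result = 1.0
    for j in range(1, k, 2):
        result *= j
    return result


def empirical_moment(samples, k: int) -> float:
    """Average of x**k over the samples."""
    return sum(x ** k for x in samples) / len(samples)


def passes_moment_test(samples, degree: int = 4, tol: float = 0.5) -> bool:
    """Accept if every empirical moment up to `degree` is within `tol` of the Gaussian one."""
    return all(
        abs(empirical_moment(samples, k) - gaussian_moment(k)) <= tol
        for k in range(1, degree + 1)
    )


random.seed(0)
unshifted = [random.gauss(0.0, 1.0) for _ in range(50000)]
shifted = [x + 1.0 for x in unshifted]  # simulated distribution shift
print(passes_moment_test(unshifted), passes_moment_test(shifted))
```

The check accepts samples drawn from the reference Gaussian and rejects the mean-shifted copy, mirroring the requirement that the test must accept when the marginals of $D$ and $D'$ coincide.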

翻訳日:2024-05-22 19:01:09 公開日:2024-05-20

# SqueezeSAM: ユーザフレンドリーなモバイルインタラクティブセグメンテーション

SqueezeSAM: User friendly mobile interactive segmentation ( http://arxiv.org/abs/2312.06736v3 )

ライセンス: Link先を確認

Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra,

(参考訳) Segment Anything Model (SAM)は、インタラクティブセグメンテーションの分野における基盤であり、生成AI、計算写真、医療画像の進歩を加速させている。任意のユーザ入力を処理し、対応するセグメンテーションマスクを生成する能力があるにもかかわらず、SAMの6億パラメータのアーキテクチャはViT-Hをベースにしており、その高い計算要求と大きなモデルサイズのために現在のモバイルハードウェアと互換性がない。本研究の目的は,モバイル写真アプリケーションにSAMを応用することである。この目的のために、完全に畳み込まれたSqueezeSAMモデルアーキテクチャを開発し、これは元のSAMより62.5倍速く、31.6倍小さいので、モバイルアプリケーションにとって実行可能なソリューションです。さらに、我々の小さなモデルは、元のViT-Hアーキテクチャの1%以内のmIOUを達成する。自動セグメンテーション(Automated segmentation)は、AppleやCapCutといった業界の主要なプレイヤーに採用されていることからもわかるように、写真アプリケーションの作成フローにおいて重要な価値を持っている。この自動化を容易にするために,我々は,顕著オブジェクト検出(salient object detection)を用い、前景オブジェクト選択のための潜在的なユーザクリックをシミュレートし,ユーザが対話的に編集できる初期セグメンテーションマスクを生成する。一般的なユーザからの期待は、オブジェクトの特定の部分のクリックがオブジェクト全体のセグメンテーションをもたらすことである。例えば、写真の中の人のTシャツをクリックすれば、Tシャツだけでなく、理想的には人全体を分割できる。しかし、SAMは通常、クリックされた領域のみをセグメント化する。我々はこの制限を新しいデータ拡張方式によって解決する。これにより、ユーザがバスケットボールを持っている人をクリックすると、人とバスケットボールの両方がセグメンテーションされ、ユーザの期待と一致し、全体的なユーザエクスペリエンスが向上する。

The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600-million-parameter architecture, based on ViT-H, is not compatible with current mobile hardware due to its high computational demands and large model size. Our research aims to adapt SAM for use in mobile photography applications. To this end, we have developed a fully convolutional SqueezeSAM model architecture, which is 62.5 times faster and 31.6 times smaller than the original SAM, making it a viable solution for mobile applications. Furthermore, our tiny model achieves an mIOU within 1% of the original ViT-H architecture. Automated segmentation holds significant value in the creation flow for photography applications, as evidenced by its adoption by leading industry players like Apple and CapCut. To facilitate this automation, we employ salient object detection and simulate potential user clicks for foreground object selection, generating an initial segmentation mask that users can subsequently edit interactively. A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object. For example, a click on a person's t-shirt in a photo should ideally segment the entire person, not just the t-shirt. However, SAM typically only segments the clicked area. We address this limitation through a novel data augmentation scheme. Consequently, if a user clicks on a person holding a basketball, both the person and the basketball are segmented together, aligning with user expectations and enhancing the overall user experience.

翻訳日:2024-05-22 18:51:19 公開日:2024-05-20

# Gemini: 高機能マルチモーダルモデルのファミリー

Gemini: A Family of Highly Capable Multimodal Models ( http://arxiv.org/abs/2312.11805v3 )

ライセンス: Link先を確認

Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, Ryan Doherty, Eli Collins, Clemens Meyer, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Jack Krawczyk, Cosmo Du, Ed Chi, Heng-Tze Cheng, Eric Ni, Purvi Shah, Patrick Kane, Betty Chan, Manaal Faruqui, Aliaksei Severyn, Hanzhao Lin, YaGuang Li, Yong Cheng, Abe Ittycheriah, Mahdis Mahdieh, Mia Chen, Pei Sun, Dustin Tran, Sumit Bagri, Balaji Lakshminarayanan, Jeremiah Liu, Andras Orban, Fabian Güra, Hao Zhou, Xinying Song, Aurelien Boffy, Harish Ganapathy, Steven Zheng, HyunJeong Choe, Ágoston Weisz, Tao Zhu, Yifeng Lu, Siddharth Gopal, Jarrod Kahn, Maciej Kula, Jeff Pitman, Rushin Shah, Emanuel Taropa, Majd Al Merey, Martin Baeuml, Zhifeng Chen, Laurent El Shafey, Yujing Zhang, Olcan Sercinoglu, George Tucker, Enrique Piqueras, Maxim Krikun, Iain Barr, Nikolay Savinov, Ivo Danihelka, Becca Roelofs, Anaïs White, Anders Andreassen, Tamara von Glehn, Lakshman Yagati, Mehran Kazemi, Lucas Gonzalez, Misha Khalman, Jakub Sygnowski, Alexandre Frechette, Charlotte Smith, Laura Culp, Lev Proleev, Yi Luan, Xi Chen, James Lottes, Nathan Schucher, Federico Lebron, Alban Rrustemi, Natalie Clay, Phil Crone, Tomas Kocisky, Jeffrey Zhao, Bartek Perz, Dian Yu, Heidi Howard, Adam Bloniarz, Jack W. 
Rae, Han Lu, Laurent Sifre, Marcello Maggioni, Fred Alcober, Dan Garrette, Megan Barnes, Shantanu Thakoor, Jacob Austin, Gabriel Barth-Maron, William Wong, Rishabh Joshi, Rahma Chaabouni, Deeni Fatiha, Arun Ahuja, Gaurav Singh Tomar, Evan Senter, Martin Chadwick, Ilya Kornakov, Nithya Attaluri, Iñaki Iturrate, Ruibo Liu, Yunxuan Li, Sarah Cogan, Jeremy Chen, Chao Jia, Chenjie Gu, Qiao Zhang, Jordan Grimstad, Ale Jakse Hartman, Xavier Garcia, Thanumalayan Sankaranarayana Pillai, Jacob Devlin, Michael Laskin, Diego de Las Casas, Dasha Valter, Connie Tao, Lorenzo Blanco, Adrià Puigdomènech Badia, David Reitter, Mianna Chen, Jenny Brennan, Clara Rivera, Sergey Brin, Shariq Iqbal, Gabriela Surita, Jane Labanowski, Abhi Rao, Stephanie Winkler, Emilio Parisotto, Yiming Gu, Kate Olszewska, Ravi Addanki, Antoine Miech, Annie Louis, Denis Teplyashin, Geoff Brown, Elliot Catt, Jan Balaguer, Jackie Xiang, Pidong Wang, Zoe Ashwood, Anton Briukhov, Albert Webson, Sanjay Ganapathy, Smit Sanghavi, Ajay Kannan, Ming-Wei Chang, Axel Stjerngren, Josip Djolonga, Yuting Sun, Ankur Bapna, Matthew Aitchison, Pedram Pejman, Henryk Michalewski, Tianhe Yu, Cindy Wang, Juliette Love, Junwhan Ahn, Dawn Bloxwich, Kehang Han, Peter Humphreys, Thibault Sellam, James Bradbury, Varun Godbole, Sina Samangooei, Bogdan Damoc, Alex Kaskasoli, Sébastien M. R. 
Arnold, Vijay Vasudevan, Shubham Agrawal, Jason Riesa, Dmitry Lepikhin, Richard Tanburn, Srivatsan Srinivasan, Hyeontaek Lim, Sarah Hodkinson, Pranav Shyam, Johan Ferret, Steven Hand, Ankush Garg, Tom Le Paine, Jian Li, Yujia Li, Minh Giang, Alexander Neitz, Zaheer Abbas, Sarah York, Machel Reid, Elizabeth Cole, Aakanksha Chowdhery, Dipanjan Das, Dominika Rogozińska, Vitaliy Nikolaev, Pablo Sprechmann, Zachary Nado, Lukas Zilka, Flavien Prost, Luheng He, Marianne Monteiro, Gaurav Mishra, Chris Welty, Josh Newlan, Dawei Jia, Miltiadis Allamanis, Clara Huiyi Hu, Raoul de Liedekerke, Justin Gilmer, Carl Saroufim, Shruti Rijhwani, Shaobo Hou, Disha Shrivastava, Anirudh Baddepudi, Alex Goldin, Adnan Ozturel, Albin Cassirer, Yunhan Xu, Daniel Sohn, Devendra Sachan, Reinald Kim Amplayo, Craig Swanson, Dessie Petrova, Shashi Narayan, Arthur Guez, Siddhartha Brahma, Jessica Landon, Miteyan Patel, Ruizhe Zhao, Kevin Villela, Luyu Wang, Wenhao Jia, Matthew Rahtz, Mai Giménez, Legg Yeung, James Keeling, Petko Georgiev, Diana Mincu, Boxi Wu, Salem Haykal, Rachel Saputro, Kiran Vodrahalli, James Qin, Zeynep Cankara, Abhanshu Sharma, Nick Fernando, Will Hawkins, Behnam Neyshabur, Solomon Kim, Adrian Hutter, Priyanka Agrawal, Alex Castro-Ros, George van den Driessche, Tao Wang, Fan Yang, Shuo-yiin Chang, Paul Komarek, Ross McIlroy, Mario Lučić, Guodong Zhang, Wael Farhan, Michael Sharman, Paul Natsev, Paul Michel, Yamini Bansal, Siyuan Qiao, Kris Cao, Siamak Shakeri, Christina Butterfield, Justin Chung, Paul Kishan Rubenstein, Shivani Agrawal, Arthur Mensch, Kedar Soparkar, Karel Lenc, Timothy Chung, Aedan Pope, Loren Maggiore, Jackie Kay, Priya Jhakra, Shibo Wang, Joshua Maynez, Mary Phuong, Taylor Tobin, Andrea Tacchetti, Maja Trebacz, Kevin Robinson, Yash Katariya, Sebastian Riedel, Paige Bailey, Kefan Xiao, Nimesh Ghelani, Lora Aroyo, Ambrose Slone, Neil Houlsby, Xuehan Xiong, Zhen Yang, Elena Gribovskaya, Jonas Adler, Mateo Wirth, Lisa Lee, Music Li, Thais Kagohara, Jay 
Pavagadhi, Sophie Bridgers, Anna Bortsova, Sanjay Ghemawat, Zafarali Ahmed, Tianqi Liu, Richard Powell, Vijay Bolina, Mariko Iinuma, Polina Zablotskaia, James Besley, Da-Woon Chung, Timothy Dozat, Ramona Comanescu, Xiance Si, Jeremy Greer, Guolong Su, Martin Polacek, Raphaël Lopez Kaufman, Simon Tokumine, Hexiang Hu, Elena Buchatskaya, Yingjie Miao, Mohamed Elhawaty, Aditya Siddhant, Nenad Tomasev, Jinwei Xing, Christina Greer, Helen Miller, Shereen Ashraf, Aurko Roy, Zizhao Zhang, Ada Ma, Angelos Filos, Milos Besta, Rory Blevins, Ted Klimenko, Chih-Kuan Yeh, Soravit Changpinyo, Jiaqi Mu, Oscar Chang, Mantas Pajarskas, Carrie Muir, Vered Cohen, Charline Le Lan, Krishna Haridasan, Amit Marathe, Steven Hansen, Sholto Douglas, Rajkumar Samuel, Mingqiu Wang, Sophia Austin, Chang Lan, Jiepu Jiang, Justin Chiu, Jaime Alonso Lorenzo, Lars Lowe Sjösund, Sébastien Cevey, Zach Gleicher, Thi Avrahami, Anudhyan Boral, Hansa Srinivasan, Vittorio Selo, Rhys May, Konstantinos Aisopos, Léonard Hussenot, Livio Baldini Soares, Kate Baumli, Michael B. 
Chang, Adrià Recasens, Ben Caine, Alexander Pritzel, Filip Pavetic, Fabio Pardo, Anita Gergely, Justin Frye, Vinay Ramasesh, Dan Horgan, Kartikeya Badola, Nora Kassner, Subhrajit Roy, Ethan Dyer, Víctor Campos Campos, Alex Tomala, Yunhao Tang, Dalia El Badawy, Elspeth White, Basil Mustafa, Oran Lang, Abhishek Jindal, Sharad Vikram, Zhitao Gong, Sergi Caelles, Ross Hemsley, Gregory Thornton, Fangxiaoyu Feng, Wojciech Stokowiec, Ce Zheng, Phoebe Thacker, Çağlar Ünlü, Zhishuai Zhang, Mohammad Saleh, James Svensson, Max Bileschi, Piyush Patil, Ankesh Anand, Roman Ring, Katerina Tsihlas, Arpi Vezer, Marco Selvi, Toby Shevlane, Mikel Rodriguez, Tom Kwiatkowski, Samira Daruki, Keran Rong, Allan Dafoe, Nicholas FitzGerald, Keren Gu-Lemberg, Mina Khan, Lisa Anne Hendricks, Marie Pellat, Vladimir Feinberg, James Cobon-Kerr, Tara Sainath, Maribeth Rauh, Sayed Hadi Hashemi, Richard Ives, Yana Hasson, Eric Noland, Yuan Cao, Nathan Byrd, Le Hou, Qingze Wang, Thibault Sottiaux, Michela Paganini, Jean-Baptiste Lespiau, Alexandre Moufarek, Samer Hassan, Kaushik Shivakumar, Joost van Amersfoort, Amol Mandhane, Pratik Joshi, Anirudh Goyal, Matthew Tung, Andrew Brock, Hannah Sheahan, Vedant Misra, Cheng Li, Nemanja Rakićević, Mostafa Dehghani, Fangyu Liu, Sid Mittal, Junhyuk Oh, Seb Noury, Eren Sezener, Fantine Huot, Matthew Lamm, Nicola De Cao, Charlie Chen, Sidharth Mudgal, Romina Stella, Kevin Brooks, Gautam Vasudevan, Chenxi Liu, Mainak Chain, Nivedita Melinkeri, Aaron Cohen, Venus Wang, Kristie Seymore, Sergey Zubkov, Rahul Goel, Summer Yue, Sai Krishnakumaran, Brian Albert, Nate Hurley, Motoki Sano, Anhad Mohananey, Jonah Joughin, Egor Filonov, Tomasz Kępa, Yomna Eldawy, Jiawern Lim, Rahul Rishi, Shirin Badiezadegan, Taylor Bos, Jerry Chang, Sanil Jain, Sri Gayatri Sundara Padmanabhan, Subha Puttagunta, Kalpesh Krishna, Leslie Baker, Norbert Kalb, Vamsi Bedapudi, Adam Kurzrok, Shuntong Lei, Anthony Yu, Oren Litvin, Xiang Zhou, Zhichun Wu, Sam Sobell, Andrea Siciliano, Alan 
Papir, Robby Neale, Jonas Bragagnolo, Tej Toor, Tina Chen, Valentin Anklin, Feiran Wang, Richie Feng, Milad Gholami, Kevin Ling, Lijuan Liu, Jules Walter, Hamid Moghaddam, Arun Kishore, Jakub Adamek, Tyler Mercado, Jonathan Mallinson, Siddhinita Wandekar, Stephen Cagle, Eran Ofek, Guillermo Garrido, Clemens Lombriser, Maksim Mukha, Botu Sun, Hafeezul Rahman Mohammad, Josip Matak, Yadi Qian, Vikas Peswani, Pawel Janus, Quan Yuan, Leif Schelin, Oana David, Ankur Garg, Yifan He, Oleksii Duzhyi, Anton Älgmyr, Timothée Lottaz, Qi Li, Vikas Yadav, Luyao Xu, Alex Chinien, Rakesh Shivanna, Aleksandr Chuklin, Josie Li, Carrie Spadine, Travis Wolfe, Kareem Mohamed, Subhabrata Das, Zihang Dai, Kyle He, Daniel von Dincklage, Shyam Upadhyay, Akanksha Maurya, Luyan Chi, Sebastian Krause, Khalid Salama, Pam G Rabinovitch, Pavan Kumar Reddy M, Aarush Selvan, Mikhail Dektiarev, Golnaz Ghiasi, Erdem Guven, Himanshu Gupta, Boyi Liu, Deepak Sharma, Idan Heimlich Shtacher, Shachi Paul, Oscar Akerlund, François-Xavier Aubet, Terry Huang, Chen Zhu, Eric Zhu, Elico Teixeira, Matthew Fritze, Francesco Bertolini, Liana-Eleonora Marinescu, Martin Bölle, Dominik Paulus, Khyatti Gupta, Tejasi Latkar, Max Chang, Jason Sanders, Roopa Wilson, Xuewei Wu, Yi-Xuan Tan, Lam Nguyen Thiet, Tulsee Doshi, Sid Lall, Swaroop Mishra, Wanming Chen, Thang Luong, Seth Benjamin, Jasmine Lee, Ewa Andrejczuk, Dominik Rabiej, Vipul Ranjan, Krzysztof Styrc, Pengcheng Yin, Jon Simon, Malcolm Rose Harriott, Mudit Bansal, Alexei Robsky, Geoff Bacon, David Greene, Daniil Mirylenka, Chen Zhou, Obaid Sarvana, Abhimanyu Goyal, Samuel Andermatt, Patrick Siegler, Ben Horn, Assaf Israel, Francesco Pongetti, Chih-Wei "Louis" Chen, Marco Selvatici, Pedro Silva, Kathie Wang, Jackson Tolins, Kelvin Guu, Roey Yogev, Xiaochen Cai, Alessandro Agostini, Maulik Shah, Hung Nguyen, Noah Ó Donnaile, Sébastien Pereira, Linda Friso, Adam Stambler, Adam Kurzrok, Chenkai Kuang, Yan Romanikhin, Mark Geller, ZJ Yan, Kane Jang, Cheng-Chun Lee, 
Wojciech Fica, Eric Malmi, Qijun Tan, Dan Banica, Daniel Balle, Ryan Pham, Yanping Huang, Diana Avram, Hongzhi Shi, Jasjot Singh, Chris Hidey, Niharika Ahuja, Pranab Saxena, Dan Dooley, Srividya Pranavi Potharaju, Eileen O'Neill, Anand Gokulchandran, Ryan Foley, Kai Zhao, Mike Dusenberry, Yuan Liu, Pulkit Mehta, Ragha Kotikalapudi, Chalence Safranek-Shrader, Andrew Goodman, Joshua Kessinger, Eran Globen, Prateek Kolhar, Chris Gorgolewski, Ali Ibrahim, Yang Song, Ali Eichenbaum, Thomas Brovelli, Sahitya Potluri, Preethi Lahoti, Cip Baetu, Ali Ghorbani, Charles Chen, Andy Crawford, Shalini Pal, Mukund Sridhar, Petru Gurita, Asier Mujika, Igor Petrovski, Pierre-Louis Cedoz, Chenmei Li, Shiyuan Chen, Niccolò Dal Santo, Siddharth Goyal, Jitesh Punjabi, Karthik Kappaganthu, Chester Kwak, Pallavi LV, Sarmishta Velury, Himadri Choudhury, Jamie Hall, Premal Shah, Ricardo Figueira, Matt Thomas, Minjie Lu, Ting Zhou, Chintu Kumar, Thomas Jurdi, Sharat Chikkerur, Yenai Ma, Adams Yu, Soo Kwak, Victor Ähdel, Sujeevan Rajayogam, Travis Choma, Fei Liu, Aditya Barua, Colin Ji, Ji Ho Park, Vincent Hellendoorn, Alex Bailey, Taylan Bilal, Huanjie Zhou, Mehrdad Khatir, Charles Sutton, Wojciech Rzadkowski, Fiona Macintosh, Konstantin Shagin, Paul Medina, Chen Liang, Jinjing Zhou, Pararth Shah, Yingying Bi, Attila Dankovics, Shipra Banga, Sabine Lehmann, Marissa Bredesen, Zifan Lin, John Eric Hoffmann, Jonathan Lai, Raynald Chung, Kai Yang, Nihal Balani, Arthur Bražinskas, Andrei Sozanschi, Matthew Hayes, Héctor Fernández Alcalde, Peter Makarov, Will Chen, Antonio Stella, Liselotte Snijders, Michael Mandl, Ante Kärrman, Paweł Nowak, Xinyi Wu, Alex Dyck, Krishnan Vaidyanathan, Raghavender R, Jessica Mallet, Mitch Rudominer, Eric Johnston, Sushil Mittal, Akhil Udathu, Janara Christensen, Vishal Verma, Zach Irving, Andreas Santucci, Gamaleldin Elsayed, Elnaz Davoodi, Marin Georgiev, Ian Tenney, Nan Hua, Geoffrey Cideron, Edouard Leurent, Mahmoud Alnahlawi, Ionut Georgescu, Nan Wei, Ivy 
Zheng, Dylan Scandinaro, Heinrich Jiang, Jasper Snoek, Mukund Sundararajan, Xuezhi Wang, Zack Ontiveros, Itay Karo, Jeremy Cole, Vinu Rajashekhar, Lara Tumeh, Eyal Ben-David, Rishub Jain, Jonathan Uesato, Romina Datta, Oskar Bunyan, Shimu Wu, John Zhang, Piotr Stanczyk, Ye Zhang, David Steiner, Subhajit Naskar, Michael Azzam, Matthew Johnson, Adam Paszke, Chung-Cheng Chiu, Jaume Sanchez Elias, Afroz Mohiuddin, Faizan Muhammad, Jin Miao, Andrew Lee, Nino Vieillard, Jane Park, Jiageng Zhang, Jeff Stanway, Drew Garmon, Abhijit Karmarkar, Zhe Dong, Jong Lee, Aviral Kumar, Luowei Zhou, Jonathan Evens, William Isaac, Geoffrey Irving, Edward Loper, Michael Fink, Isha Arkatkar, Nanxin Chen, Izhak Shafran, Ivan Petrychenko, Zhe Chen, Johnson Jia, Anselm Levskaya, Zhenkai Zhu, Peter Grabowski, Yu Mao, Alberto Magni, Kaisheng Yao, Javier Snaider, Norman Casagrande, Evan Palmer, Paul Suganthan, Alfonso Castaño, Irene Giannoumis, Wooyeol Kim, Mikołaj Rybiński, Ashwin Sreevatsa, Jennifer Prendki, David Soergel, Adrian Goedeckemeyer, Willi Gierke, Mohsen Jafari, Meenu Gaba, Jeremy Wiesner, Diana Gage Wright, Yawen Wei, Harsha Vashisht, Yana Kulizhskaya, Jay Hoover, Maigo Le, Lu Li, Chimezie Iwuanyanwu, Lu Liu, Kevin Ramirez, Andrey Khorlin, Albert Cui, Tian LIN, Marcus Wu, Ricardo Aguilar, Keith Pallo, Abhishek Chakladar, Ginger Perng, Elena Allica Abellan, Mingyang Zhang, Ishita Dasgupta, Nate Kushman, Ivo Penchev, Alena Repina, Xihui Wu, Tom van der Weide, Priya Ponnapalli, Caroline Kaplan, Jiri Simsa, Shuangfeng Li, Olivier Dousse, Fan Yang, Jeff Piper, Nathan Ie, Rama Pasumarthi, Nathan Lintz, Anitha Vijayakumar, Daniel Andor, Pedro Valenzuela, Minnie Lui, Cosmin Paduraru, Daiyi Peng, Katherine Lee, Shuyuan Zhang, Somer Greene, Duc Dung Nguyen, Paula Kurylowicz, Cassidy Hardin, Lucas Dixon, Lili Janzer, Kiam Choo, Ziqiang Feng, Biao Zhang, Achintya Singhal, Dayou Du, Dan McKinnon, Natasha Antropova, Tolga Bolukbasi, Orgad Keller, David Reid, Daniel Finchelstein, Maria Abi 
Raad, Remi Crocker, Peter Hawkins, Robert Dadashi, Colin Gaffney, Ken Franko, Anna Bulanova, Rémi Leblond, Shirley Chung, Harry Askham, Luis C. Cobo, Kelvin Xu, Felix Fischer, Jun Xu, Christina Sorokin, Chris Alberti, Chu-Cheng Lin, Colin Evans, Alek Dimitriev, Hannah Forbes, Dylan Banarse, Zora Tung, Mark Omernick, Colton Bishop, Rachel Sterneck, Rohan Jain, Jiawei Xia, Ehsan Amid, Francesco Piccinno, Xingyu Wang, Praseem Banzal, Daniel J. Mankowitz, Alex Polozov, Victoria Krakovna, Sasha Brown, MohammadHossein Bateni, Dennis Duan, Vlad Firoiu, Meghana Thotakuri, Tom Natan, Matthieu Geist, Ser tan Girgin, Hui Li, Jiayu Ye, Ofir Roval, Reiko Tojo, Michael Kwong, James Lee-Thorp, Christopher Yew, Danila Sinopalnikov, Sabela Ramos, John Mellor, Abhishek Sharma, Kathy Wu, David Miller, Nicolas Sonnerat, Denis Vnukov, Rory Greig, Jennifer Beattie, Emily Caveness, Libin Bai, Julian Eisenschlos, Alex Korchemniy, Tomy Tsai, Mimi Jasarevic, Weize Kong, Phuong Dao, Zeyu Zheng, Frederick Liu, Fan Yang, Rui Zhu, Tian Huey Teh, Jason Sanmiya, Evgeny Gladchenko, Nejc Trdin, Daniel Toyama, Evan Rosen, Sasan Tavakkol, Linting Xue, Chen Elkind, Oliver Woodman, John Carpenter, George Papamakarios, Rupert Kemp, Sushant Kafle, Tanya Grunina, Rishika Sinha, Alice Talbert, Diane Wu, Denese Owusu-Afriyie, Cosmo Du, Chloe Thornton, Jordi Pont-Tuset, Pradyumna Narayana, Jing Li, Saaber Fatehi, John Wieting, Omar Ajmeri, Benigno Uria, Yeongil Ko, Laura Knight, Amélie Héliou, Ning Niu, Shane Gu, Chenxi Pang, Yeqing Li, Nir Levine, Ariel Stolovich, Rebeca Santamaria-Fernandez, Sonam Goenka, Wenny Yustalim, Robin Strudel, Ali Elqursh, Charlie Deck, Hyo Lee, Zonglin Li, Kyle Levin, Raphael Hoffmann, Dan Holtmann-Rice, Olivier Bachem, Sho Arora, Christy Koh, Soheil Hassas Yeganeh, Siim Põder, Mukarram Tariq, Yanhua Sun, Lucian Ionita, Mojtaba Seyedhosseini, Pouya Tafti, Zhiyu Liu, Anmol Gulati, Jasmine Liu, Xinyu Ye, Bart Chrzaszcz, Lily Wang, Nikhil Sethi, Tianrun Li, Ben Brown, Shreya Singh, 
Wei Fan, Aaron Parisi, Joe Stanton, Vinod Koverkathu, Christopher A. Choquette-Choo, Yunjie Li, TJ Lu, Abe Ittycheriah, Prakash Shroff, Mani Varadarajan, Sanaz Bahargam, Rob Willoughby, David Gaddy, Guillaume Desjardins, Marco Cornero, Brona Robenek, Bhavishya Mittal, Ben Albrecht, Ashish Shenoy, Fedor Moiseev, Henrik Jacobsson, Alireza Ghaffarkhah, Morgane Rivière, Alanna Walton, Clément Crepy, Alicia Parrish, Zongwei Zhou, Clement Farabet, Carey Radebaugh, Praveen Srinivasan, Claudia van der Salm, Andreas Fidjeland, Salvatore Scellato, Eri Latorre-Chimoto, Hanna Klimczak-Plucińska, David Bridson, Dario de Cesare, Tom Hudson, Piermaria Mendolicchio, Lexi Walker, Alex Morris, Matthew Mauger, Alexey Guseynov, Alison Reid, Seth Odoom, Lucia Loher, Victor Cotruta, Madhavi Yenugula, Dominik Grewe, Anastasia Petrushkina, Tom Duerig, Antonio Sanchez, Steve Yadlowsky, Amy Shen, Amir Globerson, Lynette Webb, Sahil Dua, Dong Li, Surya Bhupatiraju, Dan Hurt, Haroon Qureshi, Ananth Agarwal, Tomer Shani, Matan Eyal, Anuj Khare, Shreyas Rammohan Belle, Lei Wang, Chetan Tekur, Mihir Sanjay Kale, Jinliang Wei, Ruoxin Sang, Brennan Saeta, Tyler Liechty, Yi Sun, Yao Zhao, Stephan Lee, Pandu Nayak, Doug Fritz, Manish Reddy Vuyyuru, John Aslanides, Nidhi Vyas, Martin Wicke, Xiao Ma, Evgenii Eltyshev, Nina Martin, Hardie Cate, James Manyika, Keyvan Amiri, Yelin Kim, Xi Xiong, Kai Kang, Florian Luisier, Nilesh Tripuraneni, David Madras, Mandy Guo, Austin Waters, Oliver Wang, Joshua Ainslie, Jason Baldridge, Han Zhang, Garima Pruthi, Jakob Bauer, Feng Yang, Riham Mansour, Jason Gelman, Yang Xu, George Polovets, Ji Liu, Honglong Cai, Warren Chen, XiangHai Sheng, Emily Xue, Sherjil Ozair, Christof Angermueller, Xiaowei Li, Anoop Sinha, Weiren Wang, Julia Wiesinger, Emmanouil Koukoumidis, Yuan Tian, Anand Iyer, Madhu Gurumurthy, Mark Goldenson, Parashar Shah, MK Blake, Hongkun Yu, Anthony Urbanowicz, Jennimaria Palomaki, Chrisantha Fernando, Ken Durden, Harsh Mehta, Nikola Momchev, Elahe 
Rahimtoroghi, Maria Georgaki, Amit Raul, Sebastian Ruder, Morgan Redshaw, Jinhyuk Lee, Denny Zhou, Komal Jalan, Dinghua Li, Blake Hechtman, Parker Schuh, Milad Nasr, Kieran Milan, Vladimir Mikulik, Juliana Franco, Tim Green, Nam Nguyen, Joe Kelley, Aroma Mahendru, Andrea Hu, Joshua Howland, Ben Vargas, Jeffrey Hui, Kshitij Bansal, Vikram Rao, Rakesh Ghiya, Emma Wang, Ke Ye, Jean Michel Sarr, Melanie Moranski Preston, Madeleine Elish, Steve Li, Aakash Kaku, Jigar Gupta, Ice Pasupat, Da-Cheng Juan, Milan Someswar, Tejvi M., Xinyun Chen, Aida Amini, Alex Fabrikant, Eric Chu, Xuanyi Dong, Amruta Muthal, Senaka Buthpitiya, Sarthak Jauhari, Nan Hua, Urvashi Khandelwal, Ayal Hitron, Jie Ren, Larissa Rinaldi, Shahar Drath, Avigail Dabush, Nan-Jiang Jiang, Harshal Godhia, Uli Sachs, Anthony Chen, Yicheng Fan, Hagai Taitelbaum, Hila Noga, Zhuyun Dai, James Wang, Chen Liang, Jenny Hamer, Chun-Sung Ferng, Chenel Elkind, Aviel Atias, Paulina Lee, Vít Listík, Mathias Carlen, Jan van de Kerkhof, Marcin Pikus, Krunoslav Zaher, Paul Müller, Sasha Zykova, Richard Stefanec, Vitaly Gatsko, Christoph Hirnschall, Ashwin Sethi, Xingyu Federico Xu, Chetan Ahuja, Beth Tsai, Anca Stefanoiu, Bo Feng, Keshav Dhandhania, Manish Katyal, Akshay Gupta, Atharva Parulekar, Divya Pitta, Jing Zhao, Vivaan Bhatia, Yashodha Bhavnani, Omar Alhadlaq, Xiaolin Li, Peter Danenberg, Dennis Tu, Alex Pine, Vera Filippova, Abhipso Ghosh, Ben Limonchik, Bhargava Urala, Chaitanya Krishna Lanka, Derik Clive, Yi Sun, Edward Li, Hao Wu, Kevin Hongtongsak, Ianna Li, Kalind Thakkar, Kuanysh Omarov, Kushal Majmundar, Michael Alverson, Michael Kucharski, Mohak Patel, Mudit Jain, Maksim Zabelin, Paolo Pelagatti, Rohan Kohli, Saurabh Kumar, Joseph Kim, Swetha Sankar, Vineet Shah, Lakshmi Ramachandruni, Xiangkai Zeng, Ben Bariach, Laura Weidinger, Tu Vu, Amar Subramanya, Sissie Hsiao, Demis Hassabis, Koray Kavukcuoglu, Adam Sadovsky, Quoc Le, Trevor Strohman, Yonghui Wu, Slav Petrov, Jeffrey Dean, Oriol Vinyals,

(参考訳) 本報告では,画像,音声,ビデオ,テキスト理解の両面で優れた機能を示す,新しいマルチモーダルモデルであるGeminiを紹介する。 GeminiファミリーはUltra、Pro、Nanoサイズで構成されており、複雑な推論タスクからオンデバイスメモリ制約のユースケースまで幅広い用途に適している。幅広いベンチマークに対する評価は、我々の最も有能なGemini Ultraモデルが、これらのベンチマークのうち32のベンチマークのうち30の最先端モデルに進歩していることを示している - 特に、よく研究された試験ベンチマークMMLUで人為的なパフォーマンスを達成した最初のモデルであり、調査した20のマルチモーダルベンチマークのうちの1つで最先端モデルが改善されている。 Geminiファミリーのクロスモーダル推論と言語理解における新機能によって、さまざまなユースケースが実現できると考えています。 Gemini、Gemini Advanced、Google AI Studio、Cloud Vertex AIといったサービスを通じて、ユーザに対して責任を負うような、ゲミニモデルのポストトレーニングとデプロイに対する当社のアプローチについて議論する。

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

翻訳日:2024-05-22 18:51:19 公開日:2024-05-20

# 予測器のハッキングは自動車のハッキングを意味する: 感度解析を用いた自律走行セキュリティのための軌道予測の脆弱性の特定

Hacking Predictors Means Hacking Cars: Using Sensitivity Analysis to Identify Trajectory Prediction Vulnerabilities for Autonomous Driving Security ( http://arxiv.org/abs/2401.10313v2 )

ライセンス: Link先を確認

Marsalis Gibson, David Babazadeh, Claire Tomlin, Shankar Sastry,

(参考訳) 学習に基づくマルチモーダル軌道予測器に対する敵対的攻撃はすでに実証されている。しかし、状態履歴以外の入力に対する摂動の影響や、これらの攻撃が下流の計画と制御にどのように影響するかについては、まだ未解決の疑問がある。本稿では,2つの軌道予測モデルである Trajectron++ と AgentFormer の感度解析を行う。この解析により、すべての入力のうち、両モデルの摂動感度のほぼすべてが最新の位置および速度状態に集中していることが明らかとなった。さらに、状態履歴の摂動に対する感度が支配的であるにもかかわらず、高速勾配符号法(Fast Gradient Sign Method)で作成した検出不可能な画像マップ摂動は、両方のモデルにおいて予測誤差を大きく増加させうることを示し、これらの軌道予測器が実際には画像ベースの攻撃に対して脆弱であることを明らかにした。最適化ベースのプランナーと、感度解析の結果から作成した摂動例を用いて、これらの攻撃によって車両が中程度の走行速度から突然停止しうることを示す。

Adversarial attacks on learning-based multi-modal trajectory predictors have already been demonstrated. However, there are still open questions about the effects of perturbations on inputs other than state histories, and how these attacks impact downstream planning and control. In this paper, we conduct a sensitivity analysis on two trajectory prediction models, Trajectron++ and AgentFormer. The analysis reveals that among all inputs, almost all of the perturbation sensitivities for both models lie only within the most recent position and velocity states. We additionally demonstrate that, despite dominant sensitivity on state history perturbations, an undetectable image map perturbation made with the Fast Gradient Sign Method can induce large prediction error increases in both models, revealing that these trajectory predictors are, in fact, susceptible to image-based attacks. Using an optimization-based planner and example perturbations crafted from sensitivity results, we show how these attacks can cause a vehicle to come to a sudden stop from moderate driving speeds.
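
The Fast Gradient Sign Method named in the abstract above moves each input component by a fixed step in the direction of the sign of the loss gradient. The following is a minimal sketch under stated assumptions: a toy linear predictor with squared-error loss stands in for Trajectron++ or AgentFormer, and `predict`, `loss`, `loss_grad`, and `fgsm_perturb` are hypothetical names introduced only for illustration, not the paper's code.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def predict(weights, x):
    """Toy linear predictor (an assumption; stands in for a real trajectory model)."""
    return sum(w * xi for w, xi in zip(weights, x))


def loss(weights, x, target):
    """Squared prediction error."""
    return (predict(weights, x) - target) ** 2


def loss_grad(weights, x, target):
    """Gradient of the squared error with respect to the input x."""
    err = predict(weights, x) - target
    return [2.0 * err * w for w in weights]


def fgsm_perturb(x, grad, epsilon):
    """FGSM step: shift every component by epsilon along the gradient sign."""
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Illustrative values only.
weights = [1.0, -2.0, 0.5]
x_clean = [0.2, 0.1, 0.4]
x_adv = fgsm_perturb(x_clean, loss_grad(weights, x_clean, 0.0), epsilon=0.1)
print(loss(weights, x_clean, 0.0), loss(weights, x_adv, 0.0))
```

The perturbation is bounded by `epsilon` in every component, which is why such changes can stay small while still increasing the loss.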

翻訳日:2024-05-22 18:41:35 公開日:2024-05-20

# ニューラルネットワークによる大気密度適応を用いた高精度火星突入ナビゲーション

Precision Mars Entry Navigation with Atmospheric Density Adaptation via Neural Networks ( http://arxiv.org/abs/2401.14411v2 )

ライセンス: Link先を確認

Felipe Giraldo-Grueso, Andrey A. Popov, Renato Zanetti,

(参考訳) 火星に突入する宇宙船は、動的で不確実な大気環境において機体の位置と速度を正確に推定できる高精度なナビゲーションアルゴリズムを必要とする。真の火星大気密度とオンボード密度モデルとの相違は、宇宙船の突入航法フィルタの性能を著しく損なう可能性がある。この研究は、ニューラルネットワークを用いて大気密度を推定し、コンシダー解析(consider analysis)によって推定の不確かさを考慮する、火星突入のためのオンラインフィルタリングの新しいアプローチを導入する。ネットワークは指数関数的な大気密度モデルに基づいて訓練され、そのパラメータは、真の密度と推定された密度のミスマッチを考慮するために、リアルタイムで動的に適応される。ネットワークの適応は、フィルタの観測イノベーション(measurement innovation)を利用して最適なネットワークパラメータを同定する最尤問題として定式化される。最尤アプローチの枠組みの中でニューラルネットワークを組み込むことで、機械学習分野で効率性が知られている確率的オプティマイザの使用が可能になる。様々な現実的な火星突入航法シナリオにおいて、共分散マッチングと状態拡張・修正という2つのオンライン適応アプローチとの性能比較を行った。その結果、他の手法と比較して推定精度が優れており、摂動を加えたMars-GRAMデータからサンプリングした幅広い現実的な火星大気に対して、推定密度が正確に一致することが示された。

Spacecraft entering Mars require precise navigation algorithms capable of accurately estimating the vehicle's position and velocity in dynamic and uncertain atmospheric environments. Discrepancies between the true Martian atmospheric density and the onboard density model can significantly impair the performance of spacecraft entry navigation filters. This work introduces a new approach to online filtering for Martian entry using a neural network to estimate atmospheric density and employing a consider analysis to account for the uncertainty in the estimate. The network is trained on an exponential atmospheric density model, and its parameters are dynamically adapted in real time to account for any mismatch between the true and estimated densities. The adaptation of the network is formulated as a maximum likelihood problem by leveraging the measurement innovations of the filter to identify optimal network parameters. Within the context of the maximum likelihood approach, incorporating a neural network enables the use of stochastic optimizers known for their efficiency in the machine learning domain. Performance comparisons are conducted against two online adaptive approaches, covariance matching and state augmentation and correction, in various realistic Martian entry navigation scenarios. The results show superior estimation accuracy compared to other approaches, and precise alignment of the estimated density with a broad selection of realistic Martian atmospheres sampled from perturbed Mars-GRAM data.
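
Two ingredients of the abstract above can be sketched briefly: an exponential atmospheric density model, and a likelihood score computed from filter innovations. This is a minimal sketch under stated assumptions. The surface density and scale height are illustrative placeholders, not values from the paper, and the scalar Gaussian negative log-likelihood is a generic textbook form that may differ from the authors' formulation. The function names are hypothetical.

```python
import math


def exponential_density(altitude_m, rho0, scale_height_m):
    """Exponential model: rho(h) = rho0 * exp(-h / H)."""
    return rho0 * math.exp(-altitude_m / scale_height_m)


def innovation_nll(innovations, variances):
    """Negative log-likelihood of scalar innovations, each assumed zero-mean Gaussian.

    Lower values mean the predicted measurements agree better with the data.
    """
    total = 0.0
    for nu, s in zip(innovations, variances):
        total += 0.5 * (math.log(2.0 * math.pi * s) + nu * nu / s)
    return total


# Placeholder constants (assumptions for illustration only).
RHO0 = 0.02              # kg/m^3
SCALE_HEIGHT = 11_100.0  # m

for h in (0.0, 20_000.0, 60_000.0):
    print(h, exponential_density(h, RHO0, SCALE_HEIGHT))
```

In a maximum likelihood setup of this kind, parameters that shrink the innovations relative to their predicted variances lower the score, which is the quantity an optimizer would minimize.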

翻訳日:2024-05-22 18:41:35 公開日:2024-05-20

# 予測影響評価支援のためのLLMの能力評価

Evaluating the Capabilities of LLMs for Supporting Anticipatory Impact Assessment ( http://arxiv.org/abs/2401.18028v2 )

ライセンス: Link先を確認

Mowafak Allaham, Nicholas Diakopoulos,

(参考訳) 社会における新興の人工知能(AI)技術の潜在的な負の影響に関する洞察を得ることは、先行的ガバナンス(anticipatory governance)のアプローチを実装する上での課題である。このような洞察を生み出すための1つのアプローチは、新興技術の望ましくない結果の範囲を発想し探求する過程で専門家を支援し、ガイドするために、LLM(Large Language Models)を使用することである。しかし、このようなタスクに対するLLMの性能評価は依然として必要であり、生成された影響の全般的な品質だけでなく、生成される影響の種類の幅や、その結果生じるバイアスも調査する必要がある。本稿では, ニュースメディアの多様な記事サンプルで補完モデル(GPT-3, Mistral-7B)をファインチューニングすることで, 社会におけるAIの高品質で多様な影響を生成できる可能性を示し, その出力を命令ベースモデル(GPT-4, Mistral-7B-Instruct)が生成した影響と比較する。生成された影響を一貫性, 構造, 関連性, 妥当性の観点から検討した結果, ニュースメディアの影響でファインチューニングした小型オープンソースモデルMistral-7Bによる影響は, GPT-4のようなより高性能で大規模なモデルが生成した影響と質的に同程度である傾向が示された。さらに, 命令ベースモデルが生成した影響は, ファインチューニングしたモデルと比較して, 特定の影響カテゴリーの生成に欠落があることが判明した。この研究は、最先端のLLMが生成する影響の範囲における潜在的なバイアスと、先行的ガバナンスのアプローチを支援するために、より高品質で多様な影響を生成するスケーラブルな代替手段として、より小さなLLMをニュースメディアに整合させる可能性を強調している。

Gaining insight into the potential negative impacts of emerging Artificial Intelligence (AI) technologies in society is a challenge for implementing anticipatory governance approaches. One approach to produce such insight is to use Large Language Models (LLMs) to support and guide experts in the process of ideating and exploring the range of undesirable consequences of emerging technologies. However, performance evaluations of LLMs for such tasks are still needed, including examining the general quality of generated impacts but also the range of types of impacts produced and resulting biases. In this paper, we demonstrate the potential for generating high-quality and diverse impacts of AI in society by fine-tuning completion models (GPT-3 and Mistral-7B) on a diverse sample of articles from news media and comparing those outputs to the impacts generated by instruction-based (GPT-4 and Mistral-7B-Instruct) models. We examine the generated impacts for coherence, structure, relevance, and plausibility and find that the generated impacts using Mistral-7B, a small open-source model fine-tuned on impacts from the news media, tend to be qualitatively on par with impacts generated using a more capable and larger scale model such as GPT-4. Moreover, we find that impacts produced by instruction-based models had gaps in the production of certain categories of impacts in comparison to fine-tuned models. This research highlights a potential bias in the range of impacts generated by state-of-the-art LLMs and the potential of aligning smaller LLMs on news media as a scalable alternative to generate high quality and more diverse impacts in support of anticipatory governance approaches.

翻訳日:2024-05-22 18:41:35 公開日:2024-05-20

# マルチモーダル学習のためのテキスト中心アライメント

Text-centric Alignment for Multi-Modality Learning ( http://arxiv.org/abs/2402.08086v2 )

ライセンス: Link先を確認

Yun-Da Tsai, Ting-Yu Yen, Pei-Fu Guo, Zhe-Yan Li, Shou-De Lin,

(参考訳) 本研究では,推論時に利用可能なモダリティが訓練時と異なるという,マルチモーダル学習におけるモダリティミスマッチの課題に取り組む。我々は,Large Language Models(LLM)によるインコンテキスト学習と基盤モデルを用いて,これらの条件下でのマルチモーダルシステムの汎化性を高める手法である,Text-centric Alignment for Multi-Modality Learning(TAMML)アプローチを提案する。統一的な意味空間としてのテキストの独自の性質を活用することにより、TAMMLは未知で多様かつ予測不可能なモダリティの組み合わせを扱う上で、大幅な改善を示す。 TAMMLは様々なモダリティに適応するだけでなく、堅牢な性能も維持し、埋め込み表現における従来の固定モダリティフレームワークの限界を克服する基盤モデルの可能性を示している。本研究は,モダリティの可用性が動的で不確実な実世界のアプリケーションに対して,柔軟で効果的なソリューションを提供することによって,この分野に貢献する。

This research paper addresses the challenge of modality mismatch in multimodal learning, where the modalities available during inference differ from those available at training. We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach, an innovative method that utilizes Large Language Models (LLMs) with in-context learning and foundation models to enhance the generalizability of multimodal systems under these conditions. By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations. TAMML not only adapts to varying modalities but also maintains robust performance, showcasing the potential of foundation models in overcoming the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the field by offering a flexible, effective solution for real-world applications where modality availability is dynamic and uncertain.

翻訳日:2024-05-22 18:31:52 公開日:2024-05-20

# 不規則時系列データ解析における安定なニューラル確率微分方程式

Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data ( http://arxiv.org/abs/2402.14989v3 )

ライセンス: Link先を確認

YongKyung Oh, Dongyoung Lim, Sungil Kim,

(参考訳) 実世界の時系列データにおける不規則なサンプリング間隔と欠損値は、一定の間隔と完全なデータを仮定する従来の手法にとって課題となる。ニューラル常微分方程式(Neural Ordinary Differential Equations (Neural ODEs))は、パラメータ化されたベクトル場を通して連続的な潜在表現を学習するためにODEソルバと組み合わせたニューラルネットワークを利用する、代替のアプローチを提供する。ニューラル確率微分方程式(Neural Stochastic Differential Equations (Neural SDEs))は、拡散項を組み込むことでNeural ODEを拡張するが、特に不規則な間隔や欠損値を扱う場合、この追加は自明ではない。したがって、ドリフト関数と拡散関数の慎重な設計は安定性の維持と性能の向上に不可欠であり、不用意な選択は、強い解の欠如、確率的不安定化、不安定なオイラー離散化といった望ましくない性質をもたらし、Neural SDEの性能に大きな影響を及ぼす可能性がある。本研究では,Langevin型SDE,Linear Noise SDE,Geometric SDEという3つの安定なNeural SDEのクラスを提案する。そして, これらが過学習を効果的に防ぎつつ, 分布シフト下で優れた性能を維持する頑健性を持つことを厳密に示す。提案手法の有効性を評価するため, 補間, 予測, 分類タスクの4つのベンチマークデータセットに対して広範囲な実験を行い, 欠損率の異なる30個の公開データセットを用いて手法の頑健性を解析した。本研究の結果は,実世界の不規則時系列データを扱う上での提案手法の有効性を示している。

Irregular sampling intervals and missing values in real-world time series data present challenges for conventional methods that assume consistent intervals and complete data. Neural Ordinary Differential Equations (Neural ODEs) offer an alternative approach, utilizing neural networks combined with ODE solvers to learn continuous latent representations through parameterized vector fields. Neural Stochastic Differential Equations (Neural SDEs) extend Neural ODEs by incorporating a diffusion term, although this addition is not trivial, particularly when addressing irregular intervals and missing values. Consequently, careful design of drift and diffusion functions is crucial for maintaining stability and enhancing performance, while incautious choices can result in adverse properties such as the absence of strong solutions, stochastic destabilization, or unstable Euler discretizations, significantly affecting Neural SDEs' performance. In this study, we propose three stable classes of Neural SDEs: Langevin-type SDE, Linear Noise SDE, and Geometric SDE. Then, we rigorously demonstrate their robustness in maintaining excellent performance under distribution shift, while effectively preventing overfitting. To assess the effectiveness of our approach, we conduct extensive experiments on four benchmark datasets for interpolation, forecasting, and classification tasks, and analyze the robustness of our methods with 30 public datasets under different missing rates. Our results demonstrate the efficacy of the proposed method in handling real-world irregular time series data.

翻訳日:2024-05-22 18:31:52 公開日:2024-05-20

# NiNformer: トークンミキシングにより生成されるゲーティング関数を備えたNetwork in Networkトランスフォーマー

NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function ( http://arxiv.org/abs/2403.02411v3 )

ライセンス: Link先を確認

Abdullah Nazhat Abdullah, Tarkan Aydin,

(参考訳) 注意機構はトランスフォーマーアーキテクチャの主要なコンポーネントであり、導入以来、多くのドメインと複数のタスクにまたがるディープラーニングの大幅な進歩につながっている。注意機構はコンピュータビジョンにおいてVision Transformer(ViT)として利用され、その用途は、分類、セグメンテーション、オブジェクト検出、画像生成など、視覚領域の多くのタスクに拡張されている。このメカニズムは非常に表現力があり高性能であるが、計算コストが高く、効果的な最適化のためにかなりのサイズのデータセットを必要とするという欠点がある。これらの欠点に対処するために、計算負担を減らし、データサイズ要件を緩和する多くの設計が文献で提案されている。視覚領域におけるこのような試みの例としては、MLP-Mixer、Conv-Mixer、Perceiver IOなどがある。本稿では,標準的なViTブロックに代わる新しい計算ブロックを提案する。このブロックは,通常の注意層をNetwork in Network構造に置き換えることで計算負担を減らす。この構造は,トークンミキシング過程によって要素ごとのゲーティング関数を学習する動的システムにより,MLP-Mixerの静的アプローチを強化する。広範な実験により,提案設計は画像分類タスクの複数のデータセットにおいてベースラインアーキテクチャよりも優れた性能を示した。

The attention mechanism is the main component of the transformer architecture, and since its introduction, it has led to significant advancements in deep learning that span many domains and multiple tasks. The attention mechanism was utilized in computer vision as the Vision Transformer (ViT), and its usage has expanded into many tasks in the vision domain, such as classification, segmentation, object detection, and image generation. While this mechanism is very expressive and capable, it comes with the drawback of being computationally expensive and requiring datasets of considerable size for effective optimization. To address these shortcomings, many designs have been proposed in the literature to reduce the computational burden and alleviate the data size requirements. Examples of such attempts in the vision domain are the MLP-Mixer, the Conv-Mixer, the Perceiver IO, and many more. This paper introduces a new computational block as an alternative to the standard ViT block that reduces the compute burdens by replacing the normal attention layers with a Network in Network structure that enhances the static approach of the MLP-Mixer with a dynamic system of learning an element-wise gating function by a token mixing process. Extensive experimentation shows that the proposed design provides better performance than the baseline architectures on multiple datasets applied in the image classification task of the vision domain.

翻訳日:2024-05-22 18:22:08 公開日:2024-05-20

# 思考連鎖蒸留のための相互情報量最大化の学習

Learning to Maximize Mutual Information for Chain-of-Thought Distillation ( http://arxiv.org/abs/2403.03348v2 )

ライセンス: Link先を確認

Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding,

(参考訳) 知識蒸留は、大規模で複雑なモデルからより小さなモデルへ知識を伝達する技術であり、効率的なAIデプロイメントに向けた重要なステップである。思考連鎖(CoT)蒸留を利用した新しい手法であるDistilling Step-by-Step (DSS) は、より大きなモデルの優れた推論能力を小型モデルに与えることで、有望な結果を示している。 DSSでは、蒸留されたモデルは、マルチタスク学習フレームワークを通じて、根拠(rationale)の生成とラベルの予測を同時に行う能力を獲得する。しかし、DSSは2つの訓練タスクの本質的な関係を見落としており、CoT知識とラベル予測タスクの統合が効果的に行われない。そこで本研究では,この2つのタスクの相互関係を情報ボトルネック(Information Bottleneck)の観点から検討し,2つのタスクの表現特徴の相互情報量の最大化として定式化する。本稿では,この最適化問題を学習ベースの手法で解くための変分的アプローチを提案する。 4つのデータセットにまたがる実験結果から,本手法は最先端のDSSよりも優れていることが示された。本研究の知見は,言語モデルの蒸留およびCoTを用いる応用に関する今後の研究に対して有益な指針を提供する。コードとモデルはまもなく公開される。

Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from an Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code and models will be released soon.
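
The quantity that the abstract above proposes to maximize is mutual information. The following minimal sketch only illustrates its textbook definition for two discrete variables given as a joint probability table. The paper works with learned representation features and a variational method, which this sketch does not attempt to reproduce, and the function name `mutual_information` is a hypothetical choice.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in nats for a joint probability table.

    joint[i][j] is P(X=i, Y=j); entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]
fully_dependent = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent), mutual_information(fully_dependent))
```

Independent variables share no information, while two perfectly coupled binary variables share one bit (log 2 nats), which is the intuition behind encouraging the two task representations to be informative about each other.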

翻訳日:2024-05-22 18:22:08 公開日:2024-05-20

# SPTNet:空間プロンプトチューニングによる一般化カテゴリー発見のための効率的な代替フレームワーク

SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning ( http://arxiv.org/abs/2403.13684v2 )

ライセンス: Link先を確認

Hongjun Wang, Sagar Vaze, Kai Han,

(参考訳) Generalized Category Discovery (GCD) は、ラベル付きの 'seen' クラスの画像集合から知識を転送することで、'seen' クラスと 'unseen' クラスの両方に由来するラベルなし画像を分類することを目的としている。既存のGCDアプローチにおける重要なテーマは、大規模な事前訓練済みモデルをGCDタスクに適応させることである。しかし、別の視点として、事前訓練済みモデルとの整合性を高めるようにデータ表現自体を適応させることが考えられる。そこで本研究では,モデルパラメータ(モデルのファインチューニング)とデータパラメータ(プロンプト学習)を反復的に最適化する,SPTNetと呼ばれる2段階適応手法を提案する。さらに,画像データの空間特性を考慮した新しい空間的プロンプトチューニング手法(SPT)を提案する。これにより,seenクラスとunseenクラスの間で転移可能な物体のパーツに,手法がよりよく注目できるようになる。我々は,SPTNetを標準ベンチマークで徹底的に評価し,既存のGCD法よりも優れていることを示す。特に, 本手法はSSBで平均精度61.4%を達成し, 従来の最先端手法を約10%上回ることがわかった。本手法で追加されるパラメータはバックボーンアーキテクチャのわずか0.117%に相当するため、この改善は特に顕著である。プロジェクトページ: https://visual-ai.github.io/sptnet

Generalized Category Discovery (GCD) aims to classify unlabelled images from both `seen' and `unseen' classes by transferring knowledge from a set of labelled `seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternate perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model-finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate our SPTNet on standard benchmarks and demonstrate that our method outperforms existing GCD methods. Notably, we find our method achieves an average accuracy of 61.4% on the SSB, surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable as our method yields extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.

翻訳日:2024-05-22 18:12:24 公開日:2024-05-20

# 分布シフトを伴う半空間の交叉の学習: 改良アルゴリズムとSQ下界

Learning Intersections of Halfspaces with Distribution Shift: Improved Algorithms and SQ Lower Bounds ( http://arxiv.org/abs/2404.02364v2 )

ライセンス: Link先を確認

Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan,

(参考訳) Klivans, Stavropoulos, Vasilyanの最近の研究は、分布シフトを伴うテスト可能な学習(TDS学習)の研究を開始した。そこでは、学習者に訓練分布 $\mathcal{D}$ からのラベル付きサンプルと、テスト分布 $\mathcal{D}'$ からのラベルなしサンプルが与えられ、訓練サンプルが対応するテストを通過するときは常に、$\mathcal{D}'$ 上で誤差の小さい分類器を出力することが目標となる。彼らのモデルは、$\mathcal{D}'$ について何も仮定しないという点で、これまでのすべての研究と異なる。その代わりに、訓練分布とテスト分布の周辺分布が等しい場合、テストは(高い確率で)受理しなければならない。ここでは、ガウス訓練分布に関する半空間の交叉という基本的なケースに注目し、$k$ 個の同次半空間の交叉を精度 $\epsilon$ でTDS学習する $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$ 時間アルゴリズム(先行研究は $d^{(k/\epsilon)^{O(1)}}$ を達成)を含む、様々な新しい上界を証明する。我々は、ガウス訓練分布が正例と負例の両方を少なくとも $\epsilon$ の割合で含む($\epsilon$-balanced)という緩やかな仮定の下で議論する。また、TDS学習問題に対する初めてのSQ下界を証明し、(1) 単一の半空間に対する $\mathsf{poly}(d,1/\epsilon)$ 時間のTDS学習には $\epsilon$-balanced の仮定が必要であること、(2) 2つの一般の半空間の交叉に対しては、$\epsilon$-balanced の仮定の下でも $d^{\tilde{\Omega}(\log 1/\epsilon)}$ の下界が成り立つことを示す。我々の技術は、TDS学習のツールキットを大幅に拡張する。我々は次元削減と被覆を用いて、ドメイン適応の文献における重要な指標であるdiscrepancy distanceの局所化バージョンを計算する効率的なアルゴリズムを与える。

Recent work of Klivans, Stavropoulos, and Vasilyan initiated the study of testable learning with distribution shift (TDS learning), where a learner is given labeled samples from training distribution $\mathcal{D}$, unlabeled samples from test distribution $\mathcal{D}'$, and the goal is to output a classifier with low error on $\mathcal{D}'$ whenever the training samples pass a corresponding test. Their model deviates from all prior work in that no assumptions are made on $\mathcal{D}'$. Instead, the test must accept (with high probability) when the marginals of the training and test distributions are equal. Here we focus on the fundamental case of intersections of halfspaces with respect to Gaussian training distributions and prove a variety of new upper bounds including a $2^{(k/\epsilon)^{O(1)}} \mathsf{poly}(d)$-time algorithm for TDS learning intersections of $k$ homogeneous halfspaces to accuracy $\epsilon$ (prior work achieved $d^{(k/\epsilon)^{O(1)}}$). We work under the mild assumption that the Gaussian training distribution contains at least an $\epsilon$ fraction of both positive and negative examples ($\epsilon$-balanced). We also prove the first set of SQ lower-bounds for any TDS learning problem and show (1) the $\epsilon$-balanced assumption is necessary for $\mathsf{poly}(d,1/\epsilon)$-time TDS learning for a single halfspace and (2) a $d^{\tilde{\Omega}(\log 1/\epsilon)}$ lower bound for the intersection of two general halfspaces, even with the $\epsilon$-balanced assumption. Our techniques significantly expand the toolkit for TDS learning. We use dimension reduction and coverings to give efficient algorithms for computing a localized version of discrepancy distance, a key metric from the domain adaptation literature.

翻訳日:2024-05-22 18:12:24 公開日:2024-05-20

# 間主観性は証明されたか? - KhrennikovとQBistへの回答

Is Intersubjectivity Proven? A Reply to Khrennikov and to QBists ( http://arxiv.org/abs/2404.04367v2 )

ライセンス: Link先を確認

Herve Zwirn,

(参考訳) 最近の2つの論文において、Khrennikovは、彼が「小澤の間主観性定理」(Ozawa intersubjectivity theorem) と呼ぶものを用いて、間主観性は量子力学において必然的に成立すると主張し、QBismおよび、より一般にパースペクティバルな解釈すべてを批判している。以前の2つのQBist論文と同様に、ここではKhrennikovの証明が有効でない理由を説明するが、そのうちの1つの論文とは対照的に、QBismにおける間主観性の扱い方を批判する。

In two recent papers, Khrennikov uses what he calls the Ozawa intersubjectivity theorem to claim that intersubjectivity is necessarily verified in quantum mechanics and to criticize QBism and, more generally, all interpretations that are perspectival. In agreement with two previous QBist papers, I explain here why Khrennikov's proof is not valid but, in contrast with one of these papers, I criticize the way intersubjectivity is dealt with in QBism.

翻訳日:2024-05-22 18:12:24 公開日:2024-05-20

# 大規模言語モデルを用いた読解テスト項目の自動生成と評価

Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models ( http://arxiv.org/abs/2404.07720v2 )

ライセンス: Link先を確認

Andreas Säuberli, Simon Clematide,

(参考訳) 読解テストは、教育から平易化されたテキストの理解しやすさの評価まで、様々なアプリケーションで使用されている。しかし、このようなテストを手動で作成し、品質を保証することは難しく、時間を要する。本稿では,大規模言語モデル(LLM)を用いて,多肢選択式の読解問題項目を生成・評価する方法を検討する。そのために,ドイツ語の読解問題項目のデータセットを構築し,推測可能性(guessability)と解答可能性(answerability)に基づくテキスト情報性(text informativity)と呼ぶ指標を含む,人間による評価と自動評価のための新しいプロトコルを開発した。次に、このプロトコルとデータセットを用いて、Llama 2 と GPT-4 で生成された項目の品質を評価した。以上の結果から,両モデルともゼロショット設定で許容できる品質の項目を生成できることが示唆されるが,GPT-4はLlama 2より明らかに優れていた。また, LLMから項目への回答を引き出すことで, LLMを自動評価に利用できることを示す。このシナリオでは、GPT-4による評価結果が人間のアノテータに最も近かった。全体として、LLMによるゼロショット生成は、特に利用可能なデータが大量にない言語において、読解テスト項目の生成と評価のための有望なアプローチである。

Reading comprehension tests are used in a variety of applications, ranging from education to assessing the comprehensibility of simplified texts. However, creating such tests manually and ensuring their quality is difficult and time-consuming. In this paper, we explore how large language models (LLMs) can be used to generate and evaluate multiple-choice reading comprehension items. To this end, we compiled a dataset of German reading comprehension items and developed a new protocol for human and automatic evaluation, including a metric we call text informativity, which is based on guessability and answerability. We then used this protocol and the dataset to evaluate the quality of items generated by Llama 2 and GPT-4. Our results suggest that both models are capable of generating items of acceptable quality in a zero-shot setting, but GPT-4 clearly outperforms Llama 2. We also show that LLMs can be used for automatic evaluation by eliciting item responses from them. In this scenario, evaluation results with GPT-4 were the most similar to human annotators. Overall, zero-shot generation with LLMs is a promising approach for generating and evaluating reading comprehension test items, in particular for languages without large amounts of available data.

翻訳日:2024-05-22 18:02:40 公開日:2024-05-20

# Wasserstein Wormhole: Transformerを用いたスケーラブルな最適輸送距離

Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformers ( http://arxiv.org/abs/2404.09411v3 )

ライセンス: Link先を確認

Doron Haviv, Russell Zhang Kunes, Thomas Dougherty, Cassandra Burdziak, Tal Nawy, Anna Gilbert, Dana Pe'er,

(参考訳) 最適輸送(OT)と関連するワッサーシュタイン計量(W)は、分布を比較するための強力で広く使われるツールである。しかし、コホートサイズが大きくなるにつれて、ペアワイズのワッサーシュタイン距離の計算は急速に困難になる。魅力的な代替案は、標準的な多次元尺度構成法(MDS)と同様に、ペアワイズのユークリッド距離がOT距離に対応するような埋め込み空間を見つけることである。我々は、経験分布を、ユークリッド距離がOT距離を近似する潜在空間に埋め込む、TransformerベースのオートエンコーダであるWasserstein Wormholeを提案する。 MDS理論を拡張して、我々の目的関数が、非ユークリッド距離を埋め込む際に生じる誤差の上界を与えることを示す。実験的には、Wormhole埋め込み間の距離はワッサーシュタイン距離と密接に一致し、OT距離の線形時間計算を可能にする。 Wasserstein Wormholeは、分布を埋め込みにマッピングするエンコーダとともに、埋め込みを分布に戻すデコーダを含み、これにより埋め込み空間内の操作を、Wasserstein重心推定やOT補間といったOT空間の操作に一般化できる。 OTアプローチにスケーラビリティと解釈可能性を与えることで、Wasserstein Wormholeは計算幾何学と単一細胞生物学の分野におけるデータ解析の新たな道を開く。

Optimal transport (OT) and the related Wasserstein metric (W) are powerful and ubiquitous tools for comparing distributions. However, computing pairwise Wasserstein distances rapidly becomes intractable as cohort size grows. An attractive alternative would be to find an embedding space in which pairwise Euclidean distances map to OT distances, akin to standard multidimensional scaling (MDS). We present Wasserstein Wormhole, a transformer-based autoencoder that embeds empirical distributions into a latent space wherein Euclidean distances approximate OT distances. Extending MDS theory, we show that our objective function implies a bound on the error incurred when embedding non-Euclidean distances. Empirically, distances between Wormhole embeddings closely match Wasserstein distances, enabling linear time computation of OT distances. Along with an encoder that maps distributions to embeddings, Wasserstein Wormhole includes a decoder that maps embeddings back to distributions, allowing for operations in the embedding space to generalize to OT spaces, such as Wasserstein barycenter estimation and OT interpolation. By lending scalability and interpretability to OT approaches, Wasserstein Wormhole unlocks new avenues for data analysis in the fields of computational geometry and single-cell biology.
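
The idea of an embedding whose Euclidean distances reproduce OT distances has an exact special case that is easy to check by hand: for one-dimensional empirical distributions with the same number of equally weighted points, the 2-Wasserstein distance is obtained by matching sorted samples. The sketch below illustrates only this special case and is not the Wormhole model, which learns the embedding with a transformer. The function names are hypothetical.

```python
import math


def wasserstein2_1d(xs, ys):
    """W2 distance between two equal-size, equally weighted 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(xs)
    # In 1-D the optimal coupling pairs the i-th smallest points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)


def embed_1d(xs):
    """Embed a sample as its sorted values scaled by 1/sqrt(n)."""
    n = len(xs)
    return [v / math.sqrt(n) for v in sorted(xs)]


def euclidean(u, v):
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


a = [2.0, 0.0, 1.0]
b = [3.0, 1.0, 2.0]
print(wasserstein2_1d(a, b), euclidean(embed_1d(a), embed_1d(b)))
```

Once each distribution has a fixed embedding, a pairwise distance costs one vector comparison, which is the kind of saving the abstract describes for the learned, higher-dimensional case.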

翻訳日:2024-05-22 18:02:40 公開日:2024-05-20

# AIインタフェースにおけるデザインパターンとの相互作用から生じる害の特徴付けとモデリング

Characterizing and modeling harms from interactions with design patterns in AI interfaces ( http://arxiv.org/abs/2404.11370v3 )

ライセンス: Link先を確認

Lujain Ibrahim, Luc Rocher, Ana Valdivia,

(参考訳) 人工知能(AI)システムを用いたアプリケーションの普及は、洗練されたインターフェースを通じてこれらのシステムと対話するユーザの増加につながっている。ヒューマンコンピュータインタラクションの研究は、インターフェースがユーザの行動と、技術的な能力やリスクに対するユーザの認識の両方を形作ることを長年にわたって示してきた。しかし、AIシステムの社会的および倫理的リスクを評価する実践者や研究者は、擬人的、欺瞞的、没入的なインターフェースが人間とAIの相互作用に与える影響を見落とす傾向にある。ここでは,適応型AIシステムを用いたインタフェースの設計上の特徴が,フィードバックループによって引き起こされる連鎖的な影響をもたらし,その影響は従来考えられていた範囲を超えうる,と論じる。まず、AIインターフェース設計とその負の影響に関するスコーピングレビューを行い、AIインターフェースにおける潜在的に有害なデザインパターンの顕著なテーマを抽出する。次に,AIインタフェース設計の影響評価を構造化し,促進する概念モデルとして,Design-Enhanced Control of AI systems(DECAI)を提案する。 DECAIは制御系理論(動的な物理系の解析と設計のための理論)の原則に基づいて、人間とAIのシステムにおけるインターフェースの役割を分析する。推薦システムと対話型言語モデルシステムに関する2つのケーススタディを通じて、AIインタフェース設計の評価にDECAIをどのように利用できるかを示す。

The proliferation of applications using artificial intelligence (AI) systems has led to a growing number of users interacting with these systems through sophisticated interfaces. Human-computer interaction research has long shown that interfaces shape both user behavior and user perception of technical capabilities and risks. Yet, practitioners and researchers evaluating the social and ethical risks of AI systems tend to overlook the impact of anthropomorphic, deceptive, and immersive interfaces on human-AI interactions. Here, we argue that design features of interfaces with adaptive AI systems can have cascading impacts, driven by feedback loops, which extend beyond those previously considered. We first conduct a scoping review of AI interface designs and their negative impact to extract salient themes of potentially harmful design patterns in AI interfaces. Then, we propose Design-Enhanced Control of AI systems (DECAI), a conceptual model to structure and facilitate impact assessments of AI interface designs. DECAI draws on principles from control systems theory -- a theory for the analysis and design of dynamic physical systems -- to dissect the role of the interface in human-AI systems. Through two case studies on recommendation systems and conversational language model systems, we show how DECAI can be used to evaluate AI interface designs.

翻訳日:2024-05-22 18:02:40 公開日:2024-05-20

# STaRK: テキストおよびリレーショナル知識ベース上でのLLM検索のベンチマーク

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases ( http://arxiv.org/abs/2404.13207v2 )

ライセンス: Link先を確認

Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec,

(参考訳) 複雑な製品検索のような現実世界の複雑なクエリに答えるには、非構造化情報(例: 製品のテキスト記述)と構造化情報(例: 製品のエンティティ関係)が混在する、半構造化知識ベースからの正確な検索が必要となることが多い。しかし、これまでの研究の多くは、テキスト検索と関係検索を別々のトピックとして扱ってきた。このギャップに対処するため,テキストおよびリレーショナル知識ベース(Textual and Relational Knowledge Bases)上の大規模な半構造化検索ベンチマークであるSTARKを開発した。本ベンチマークは, 製品検索, 学術論文検索, 精密医療におけるクエリの3つのドメイン/データセットを対象とする。多様なリレーショナル情報と複雑なテキスト特性を統合した現実的なユーザクエリを,その正解(アイテム)とともに合成する,新たなパイプラインを設計する。合成クエリの品質を検証するために,厳密な人手評価を行う。さらに、高品質な人手作成クエリでベンチマークを強化し、実際のクエリに基づく参照を提供する。 STARKは、大規模言語モデル(LLM)によって駆動される検索システムの性能を評価するための包括的なテストベッドとして機能する。実験の結果,STARKは現在の検索システムとLLMシステムに重大な課題を突きつけており,より高性能な検索システムを構築する必要性が示唆された。ベンチマークデータとコードは https://github.com/snap-stanford/stark で公開されている。

Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STARK, a large-scale Semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains/datasets: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries. We further enhance the benchmark with high-quality human-generated queries to provide an authentic reference. STARK serves as a comprehensive testbed for evaluating the performance of retrieval systems driven by large language models (LLMs). Our experiments suggest that STARK presents significant challenges to the current retrieval and LLM systems, indicating the demand for building more capable retrieval systems. The benchmark data and code are available on https://github.com/snap-stanford/stark.

翻訳日:2024-05-22 18:02:40 公開日:2024-05-20

# 健全な近似を用いたニューラルネットワーク動的モデルのリアルタイム安全制御

Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation ( http://arxiv.org/abs/2404.13456v2 )

ライセンス: Link先を確認

Hanjiang Hu, Jianglin Lan, Changliu Liu,

(参考訳) ニューラルネットワーク動的モデル(NNDM)の安全な制御は、ロボット工学や多くの応用において重要である。しかし、NNDMの最適な安全制御をリアルタイムに計算することは依然として困難である。リアルタイム計算を実現するために,制御合成においてNNDMの健全な近似(sound approximation)を用いることを提案する。特に、NNDMにおけるReLU活性化関数のBernstein多項式による過大近似(BPO)に基づく、Bernstein over-approximated neural dynamics(BOND)を提案する。近似によって生じる誤差を軽減し、安全制御問題の持続的な実行可能性を確保するために、NNDMのBPO緩和内で最も安全でない近似状態を用いて、最悪ケースの安全指標をオフラインで合成する。オンラインのリアルタイム最適化では、非線形な最悪ケース安全制約の1次テイラー近似を、高次剰余項に対する l2 有界のバイアス項を持つ、NNDMの追加の線形層として定式化する。異なるニューラルダイナミクスと安全制約を用いた包括的な実験により、安全性を保証したうえで、健全な近似を用いたNNDMは、混合整数計画法(MIP)を用いる安全制御ベースラインよりも10～100倍高速であることが示され、最悪ケース安全指標の有効性と、リアルタイムの大規模設定における提案手法BONDのスケーラビリティが検証された。コードは https://github.com/intelligent-control-lab/BOND で公開されている。

Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDM. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDM. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of NNDM offline. For the online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of NNDM with the l2 bounded bias term for the higher-order remainder. Comprehensive experiments with different neural dynamics and safety constraints show that with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and scalability of the proposed BOND in real-time large-scale settings. The code is available at https://github.com/intelligent-control-lab/BOND.
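
The over-approximation idea can be illustrated with a minimal sketch. It assumes the textbook Bernstein polynomial of ReLU on a pre-activation interval `[lo, hi]`; because ReLU is convex, Jensen's inequality implies that this polynomial lies on or above ReLU everywhere on the interval. The function names, the interval, and the degree are illustrative choices, and the construction used in BOND itself may differ.

```python
from math import comb


def relu(x):
    return max(0.0, x)


def bernstein_relu(x, lo, hi, degree):
    """Evaluate the degree-`degree` Bernstein polynomial of ReLU on [lo, hi] at x."""
    t = (x - lo) / (hi - lo)  # map x to [0, 1]
    total = 0.0
    for k in range(degree + 1):
        node = lo + (hi - lo) * k / degree  # equally spaced sample point
        total += relu(node) * comb(degree, k) * t**k * (1 - t) ** (degree - k)
    return total


# Illustrative interval and degree (assumptions, not values from the paper).
lo, hi, degree = -1.0, 2.0, 4
grid = [lo + (hi - lo) * i / 200 for i in range(201)]
gap = [bernstein_relu(x, lo, hi, degree) - relu(x) for x in grid]

# The gap should be non-negative on the whole interval (up to rounding).
print("smallest gap:", min(gap))
print("largest gap:", max(gap))
```

Raising `degree` tightens the gap at the price of a higher-order polynomial, which is the usual trade-off when choosing a relaxation.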

翻訳日:2024-05-22 18:02:40 公開日:2024-05-20

# トラップイオンを用いたディジタルアナログ反断熱量子最適化

Digital-Analog Counterdiabatic Quantum Optimization with Trapped Ions ( http://arxiv.org/abs/2405.01447v2 )

ライセンス: Link先を確認

Shubham Kumar, Narendra N. Hegade, Alejandro Gomez Cadavid, Murilo Henrique de Oliveira, Enrique Solano, F. Albarrán-Arriagada,

(参考訳) 本稿では,最適化問題に適した反断熱量子力学の,ハードウェア固有の問題依存型ディジタルアナログ量子アルゴリズムを提案する。具体的には,デジタルゲートを補完するアナログ相互作用として,グローバルなM{\o}lmer-S{\o}rensenゲートを活かして,トラップイオンアーキテクチャに着目する。アナログブロックとデジタルステップの最適構成は、純粋にデジタルアプローチに比べて回路深さが大幅に減少することを示す。これは、提案したエンコーディングを使うことで、現在のデバイスのコヒーレンス時間を保ちながら、より多くのキュービットを必要とする、より大きな最適化問題インスタンスに対処できることを意味している。さらに, アナログブロックの最小ゲート忠実度は, 純粋デジタルシミュレーションよりも優れており, 文献で報告されている最良忠実度以下であることが確認された。ディジタル・アナログ符号化の性能を検証するため,最大独立セット問題に取り組み,デジタル・ケースに比べて少ないリソースを必要とすることを示す。このハイブリッド共設計アプローチは、量子最適化問題の効率的な解に対する量子優位性への道を開く。

We introduce a hardware-specific, problem-dependent digital-analog quantum algorithm for counterdiabatic quantum dynamics tailored to optimization problems. Specifically, we focus on trapped-ion architectures, taking advantage of global Mølmer-Sørensen gates as the analog interactions complemented by digital gates, both of which are available in state-of-the-art technologies. We show an optimal configuration of analog blocks and digital steps leading to a substantial reduction in circuit depth compared to the purely digital approach. This implies that, using the proposed encoding, we can address larger optimization problem instances, requiring more qubits, while staying within the coherence time of current devices. Furthermore, we study the minimum gate fidelity required by the analog blocks to outperform the purely digital simulation, finding that it is below the best fidelity reported in the literature. To validate the performance of the digital-analog encoding, we tackle the maximum independent set problem, showing that it requires fewer resources compared to the digital case. This hybrid co-design approach paves the way towards quantum advantage for efficient solutions of quantum optimization problems.

翻訳日:2024-05-22 17:52:56 公開日:2024-05-20

# LastPass侵害における人的要因

Human Factors in the LastPass Breach ( http://arxiv.org/abs/2405.01795v3 )

ライセンス: Link先を確認

Niroop Sugunaraj,

(参考訳) 本稿では,LastPass攻撃の解析を通じて,サイバー攻撃の複雑な性質について検討する。目標は、目標指向の行動、認知的過負荷、人間の偏見(例えば、楽観主義、アンカーリング)、リスク行動などの要因を緩和することに集中することである。この侵害の分析から得られた発見は、サイバー防衛の人間的側面と技術的側面の両方に対処することで、複雑な脅威に対するサイバーシステムのレジリエンスを著しく向上させるという観点からの支持を提供する。これは、ユーザのインタラクションをシンプルにしつつバランスのとれたアプローチを維持し、ユーザのバイアスを認識させ、サイバーインシデントを防ぐためにリスク回避のプラクティスが不可欠であることを意味します。

This paper examines the complex nature of cyber attacks through an analysis of the LastPass breach. It argues for the integration of human-centric considerations into cybersecurity measures, focusing on mitigating factors such as goal-directed behavior, cognitive overload, human biases (e.g., optimism, anchoring), and risky behaviors. Findings from an analysis of this breach offer support to the perspective that addressing both the human and technical dimensions of cyber defense can significantly enhance the resilience of cyber systems against complex threats. This means that maintaining a balanced approach while simultaneously simplifying user interactions, making users aware of biases, and discouraging risky practices is essential for preventing cyber incidents.

翻訳日:2024-05-22 17:52:56 公開日:2024-05-20

# 暗黙のシナリオリダクションによる超量子制約最適化の高速計算

Fast Computation of Superquantile-Constrained Optimization Through Implicit Scenario Reduction ( http://arxiv.org/abs/2405.07965v2 )

ライセンス: Link先を確認

Jake Roth, Ying Cui,

(参考訳) 近年,統計学習や意思決定問題において,公正性や分布変化に対処するためのリスク対応指標として,スーパーチャンティルが注目されている。本稿では,超量子的制約による大規模最適化問題を解くために,高速でスケーラブルで堅牢な2階計算フレームワークを提案する。経験的リスク最小化とは異なり、超量子ベースの最適化は、テール条件予測を計算するために、すべてのシナリオで評価されたランダム関数のランク付けを必要とする。このテールベースの機能は、計算的に不都合に思えるかもしれないが、半滑らか-ニュートンベースのラグランジアン法に有利な設定を提供する。超量子作用素は、テール期待がかなり少ないシナリオを含むため、ニュートン系の次元を効果的に減少させる。特に、関連する2階情報を取得し、行列逆転を行うための余分なコストは、勾配計算に必要な労力に匹敵し、時にはそれ以下である。提案手法は,シナリオ数が決定変数数を超える場合,特に有効である。線形および凸対角2次目的の合成問題において, 数値実験により, 提案手法は, 低精度解のOSQPで実装した乗算器の交互方向法よりも, 線形および2次目的の750倍以上の高速化を実現している。さらに、線形目的の最大25倍、二次目的の最大70倍、線形目的の最大20倍、二次目的の最大30倍、高精度解計算のPortfolio Safeguard最適化スイートよりも高速である。

Superquantiles have recently gained significant interest as a risk-aware metric for addressing fairness and distribution shifts in statistical learning and decision making problems. This paper introduces a fast, scalable and robust second-order computational framework to solve large-scale optimization problems with superquantile-based constraints. Unlike empirical risk minimization, superquantile-based optimization requires ranking random functions evaluated across all scenarios to compute the tail conditional expectation. While this tail-based feature might seem computationally unfriendly, it provides an advantageous setting for a semismooth-Newton-based augmented Lagrangian method. The superquantile operator effectively reduces the dimensions of the Newton systems since the tail expectation involves considerably fewer scenarios. Notably, the extra cost of obtaining relevant second-order information and performing matrix inversions is often comparable to, and sometimes even less than, the effort required for gradient computation. Our developed solver is particularly effective when the number of scenarios substantially exceeds the number of decision variables. In synthetic problems with linear and convex diagonal quadratic objectives, numerical experiments demonstrate that our method outperforms existing approaches by a large margin: It achieves speeds more than 750 times faster for linear and quadratic objectives than the alternating direction method of multipliers as implemented by OSQP for computing low-accuracy solutions. Additionally, it is up to 25 times faster for linear objectives and 70 times faster for quadratic objectives than the commercial solver Gurobi, and 20 times faster for linear objectives and 30 times faster for quadratic objectives than the Portfolio Safeguard optimization suite for high-accuracy solution computations.
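
To make the "tail conditional expectation" concrete, here is a small sketch of an empirical superquantile (also known as CVaR) computed with the Rockafellar-Uryasev formula, together with the indices of the scenarios that actually enter the tail average. It is a plain illustration of the quantity being constrained, not the semismooth-Newton solver described in the abstract; the function names and the sample losses are made up for the example.

```python
def superquantile(values, tau):
    """Empirical superquantile at level tau via min_t t + sum(max(v - t, 0)) / ((1 - tau) * m)."""
    m = len(values)
    scale = 1.0 / ((1.0 - tau) * m)
    # The objective is convex and piecewise linear in t, so a minimizer
    # can be found among the sample values themselves.
    return min(t + scale * sum(max(v - t, 0.0) for v in values) for t in values)


def active_scenarios(values, tau):
    """Indices of the scenarios in the upper tail; only these enter the tail average."""
    m = len(values)
    k = max(1, round((1.0 - tau) * m))
    order = sorted(range(m), key=lambda i: values[i], reverse=True)
    return sorted(order[:k])


# Hypothetical scenario losses.
losses = [3.0, 1.0, 10.0, 4.0, 9.0, 2.0, 8.0, 5.0, 7.0, 6.0]
print(superquantile(losses, 0.8))
print(active_scenarios(losses, 0.8))
```

With ten scenarios and `tau = 0.8`, only two scenarios are active. This is the kind of implicit scenario reduction the abstract refers to: second-order information only needs to be assembled over the active tail.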

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# マルチカバーのためのセンサネットワーク設計の最適化

Optimizing Sensor Network Design for Multiple Coverage ( http://arxiv.org/abs/2405.09096v2 )

ライセンス: Link先を確認

Lukas Taus, Yen-Hsi Richard Tsai,

(参考訳) センサ配置最適化法は広く研究されている。それらは、既知の環境の監視、5Gタワーの最適な位置、ミサイル防衛システムの配置など、幅広い用途に適用できる。しかし、センサーの故障や敵の攻撃に関するセンサネットワークの堅牢性と効率性を調べる研究はほとんどない。本稿では、最小限のセンサを最適化して、所定の数のセンサによって、非単純連結領域の複数のカバレッジを実現することで、この問題に対処する。本稿では,より効率的で堅牢なセンサネットワークを設計し,ネットワークの最適性に関する理論的境界を導出するための,新しい目的関数(greedy,next-best-view)アルゴリズムを提案する。さらに,ほぼリアルタイムに計算を行うアルゴリズムを高速化するディープラーニングモデルを導入する。ディープラーニングモデルは、トレーニング例の生成を必要とする。それに対応して、トレーニングデータセットの幾何学的特性を理解することは、深層学習技術の性能と訓練過程に重要な洞察を与えることを示す。最後に,より単純な目的を用いたグレディアプローチの単純な並列バージョンは,非常に競争力が高いことを実証する。

Sensor placement optimization methods have been studied extensively. They can be applied to a wide range of applications, including surveillance of known environments, optimal locations for 5G towers, and placement of missile defense systems. However, few works explore the robustness and efficiency of the resulting sensor network concerning sensor failure or adversarial attacks. This paper addresses this issue by optimizing for the least number of sensors to achieve multiple coverage of non-simply connected domains by a prescribed number of sensors. We introduce a new objective function for the greedy (next-best-view) algorithm to design efficient and robust sensor networks and derive theoretical bounds on the network's optimality. We further introduce a Deep Learning model to accelerate the algorithm for near real-time computations. The Deep Learning model requires the generation of training examples. Correspondingly, we show that understanding the geometric properties of the training data set provides important insights into the performance and training process of deep learning techniques. Finally, we demonstrate that a simple parallel version of the greedy approach using a simpler objective can be highly competitive.
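
A greedy selection loop for multiple coverage might look like the sketch below. It assumes a finite set of target points and candidate sensors with known visibility sets, and it scores a candidate by how many targets it would bring closer to `k`-fold coverage. This generic gain is an illustrative stand-in and not the objective function proposed in the paper.

```python
def greedy_k_coverage(candidates, targets, k):
    """Pick sensors greedily until every target is seen by at least k sensors.

    candidates: dict mapping a sensor name to the set of targets it sees.
    Returns the chosen sensor names in selection order.
    """
    count = {t: 0 for t in targets}
    remaining = dict(candidates)
    chosen = []
    while any(c < k for c in count.values()):
        def gain(name):
            # Number of still under-covered targets this sensor would help.
            return sum(1 for t in remaining[name] if count.get(t, k) < k)

        best = max(sorted(remaining), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("k-coverage is infeasible with these candidates")
        for t in remaining.pop(best):
            if t in count:
                count[t] += 1
        chosen.append(best)
    return chosen


# Hypothetical instance: four targets, five candidate sensor positions.
targets = {1, 2, 3, 4}
candidates = {
    "a": {1, 2},
    "b": {2, 3},
    "c": {3, 4},
    "d": {1, 4},
    "e": {1, 2, 3, 4},
}
chosen = greedy_k_coverage(candidates, targets, k=2)
print(chosen)
```

Requiring `k >= 2` is what gives robustness here: every target stays visible after any single sensor fails.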

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# キラルスピン液体からの創発性マヨラナ金属

Emergent Majorana metal from a chiral spin liquid ( http://arxiv.org/abs/2405.12278v1 )

ライセンス: Link先を確認

Penghao Zhu, Shi Feng, Kang Wang, Tao Xiang, Nandini Trivedi,

(参考訳) 外部磁場下の反強磁性キタエフ模型において、ギャップのあるキラルスピン液体(CSL)相とギャップのある部分偏極(PP)相の間に挟まれた中間ギャップレススピン液体相(IGP)の出現を説明する新しいメカニズムを提案する。中程度の磁場では基底状態に$\pi$フラックスが核生成し、マヨラナゼロモードを捕捉できると提案する。これらのフラックスが磁場の増加とともに増殖するにつれて、マヨラナゼロモードは重なり合い、ゼロエネルギーに「フェルミ面」を持つ創発的なマヨラナ金属状態を生成する。さらに、マヨラナスペクトル関数が、無限射影もつれ対状態(iPEPS)アンザッツによって得られる動的スピン相関およびダイマー相関を捉えることを示す。最後に、キタエフ候補物質に対する本結果の意味について論じる。

We propose a novel mechanism to explain the emergence of an intermediate gapless spin liquid phase (IGP) in the antiferromagnetic Kitaev model in an externally applied magnetic field, sandwiched between the well-known gapped chiral spin liquid (CSL) and the gapped partially polarized (PP) phase. We propose that, in moderate fields, $\pi$-fluxes nucleate in the ground state and can trap Majorana zero modes. As these fluxes proliferate with increasing field, the Majorana zero modes overlap, creating an emergent Majorana metallic state with a 'Fermi surface' at zero energy. We further show that the Majorana spectral function captures the dynamical spin and dimer correlations obtained by the infinite Projected Entangled Pair States (iPEPS) ansatz. We discuss the implications of our results for candidate Kitaev materials.

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# Anyonic系における多体非エルミチアン皮膚効果の動的抑制

Dynamical suppression of many-body non-Hermitian skin effect in Anyonic systems ( http://arxiv.org/abs/2405.12288v1 )

ライセンス: Link先を確認

Yi Qin, Ching Hua Lee, Linhu Li,

(参考訳) 非エルミート皮膚効果(英: non-Hermitian skin effect, NHSE)は、非平衡系において、固有状態が系の境界に大きく局在し、これらの系にロードされる(準)粒子を一方向的に境界に励起する現象である。多体効果との相互作用は近年活発に研究され、粒子間反発またはフェルミ縮退圧力は、NHSEによって誘発される境界蓄積を、その固有解法と力学の両方で制限することが示されている。しかし、この研究において、任意の統計学がNHSEの局所化方向に対して状態力学を抑圧したり反転させたりすることで、NHSEの力学にさらに深い影響を与えることが判明した。この系における量子情報の拡散は、NHSEが熱アンサンブルのための情報力学だけに影響を及ぼすが、1つの初期状態には影響しない、さらにエキゾチックな現象を示す。我々の研究結果は、NHSEと正準統計学の相互作用から生じる新しい非エルミート現象を探求する新たな道を開き、超低温原子量子シミュレータや量子コンピュータで実証できる可能性がある。

The non-Hermitian skin effect (NHSE) is a fascinating phenomenon in nonequilibrium systems where eigenstates massively localize at the systems' boundaries, pumping (quasi-)particles loaded in these systems unidirectionally to the boundaries. Its interplay with many-body effects has been vigorously studied recently, and inter-particle repulsion or Fermi degeneracy pressure has been shown to limit the boundary accumulation induced by the NHSE both in their eigensolutions and dynamics. However, in this work we find that anyonic statistics can affect the NHSE dynamics even more profoundly, suppressing or even reversing the state dynamics against the localizing direction of the NHSE. This phenomenon is found to be more pronounced when more particles are involved. The spreading of quantum information in this system shows even more exotic phenomena, where the NHSE affects only the information dynamics for a thermal ensemble, but not that for a single initial state. Our results open up a new avenue for exploring novel non-Hermitian phenomena arising from the interplay between the NHSE and anyonic statistics, and can potentially be demonstrated in ultracold atomic quantum simulators and quantum computers.

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# 走査プローブ顕微鏡と高性能コンピューティングの統合:固定ポリシーおよび報酬駆動ワークフローの実装

Integration of Scanning Probe Microscope with High-Performance Computing: fixed-policy and reward-driven workflows implementation ( http://arxiv.org/abs/2405.12300v1 )

ライセンス: Link先を確認

Yu Liu, Utkarsh Pratiush, Jason Bemis, Roger Proksch, Reece Emery, Philip D. Rack, Yu-Chen Liu, Jan-Chi Yang, Stanislav Udovenko, Susan Trolier-McKinstry, Sergei V. Kalinin,

(参考訳) 計算能力と機械学習アルゴリズムの急速な発展は、走査型プローブ顕微鏡(SPM)による科学的発見の自動化の道を開いた。自動化されたSPMの運用に向けた重要な要素は、PythonコードからのSPM制御を可能にするインターフェース、高いコンピューティングパワーの可用性、科学的発見のためのワークフローの開発である。ここでは、ローカルコンピュータまたはリモート高性能コンピュータ(HPC)からSPMを制御することができるPythonインターフェースライブラリを構築し、自律ワークフローにおける機械学習アルゴリズムの計算能力の向上を満足する。さらに、科学的な発見におけるSPMの操作を固定政治や報酬駆動のワークフローに抽象化するための一般的なプラットフォームも導入する。私たちの作業は、ルーチン操作と機械学習による自律的な科学的発見の両方のために、自動化されたSPMワークフローを構築するための完全なインフラストラクチャを提供します。

The rapid development of computation power and machine learning algorithms has paved the way for automating scientific discovery with a scanning probe microscope (SPM). The key elements towards operationalization of automated SPM are an interface that enables SPM control from Python code, the availability of high computing power, and the development of workflows for scientific discovery. Here we build a Python interface library that enables controlling an SPM from either a local computer or a remote high-performance computer (HPC), which satisfies the high computation power needs of machine learning algorithms in autonomous workflows. We further introduce a general platform to abstract the operations of SPM in scientific discovery into fixed-policy or reward-driven workflows. Our work provides a full infrastructure to build automated SPM workflows for both routine operations and autonomous scientific discovery with machine learning.

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# リコメンダシステムにおけるインテントによる多様化

Diversifying by Intent in Recommender Systems ( http://arxiv.org/abs/2405.12327v1 )

ライセンス: Link先を確認

Yuyan Wang, Cheenar Banerjee, Samer Chucri, Fabio Soldo, Sriraj Badam, Ed H. Chi, Minmin Chen,

(参考訳) 短期的なエンゲージメントに過度にフォーカスするレコメンダシステムが、必然的に長期的なユーザエクスペリエンスを損なうことは、ますます明白になっている。しかし、望まれる信号があいまいでノイズがあり、長い視野で現れるため、長期的なユーザーエクスペリエンスを直接最適化することは困難である。本研究では,複数のインタラクションやレコメンデーションセッションにまたがるユーザインテントを高レベルのユーザ理解を導入することで,長期的なユーザエクスペリエンスを最適化するためのページ全体のレコメンデーションを実現することのメリットを示す。ユーザインテントは主に検索のコンテキスト内で調査されているが、リコメンダシステムでは探索されていない。このギャップを埋めるため,提案システムの最終段階において,確率論的意図に基づく全ページ多様化フレームワークを開発する。従来のユーザ意図の信念から始めると、提案フレームワークはこれらの信念に基づいて各位置の項目を逐次選択し、その後、その意図に関する過去の信念を更新する。長期ユーザーエクスペリエンスを最適化するために、異なるユーザ意図がページ内で表現されることを保証する。我々は、世界最大のコンテンツレコメンデーションプラットフォームのひとつで、毎日何十億ものユーザーにサービスを提供しています。我々のフレームワークは,ユーザの探究意図を取り入れ,新たな関心やコンテンツを探究する機会を捉えている。ライブ実験により,提案手法がユーザ維持とユーザ満足度の向上につながり,長期計画の促進効果が検証された。特に、ユーザは、時間とともに基盤となる意図と整合した多様なコンテンツを一貫して発見し、関与することができるため、長期的なユーザーエクスペリエンスが向上する。

It has become increasingly clear that recommender systems overly focusing on short-term engagement can inadvertently hurt long-term user experience. However, it is challenging to optimize long-term user experience directly as the desired signal is sparse, noisy and manifests over a long horizon. In this work, we show the benefits of incorporating higher-level user understanding, specifically user intents that can persist across multiple interactions or recommendation sessions, for whole-page recommendation toward optimizing long-term user experience. User intent has primarily been investigated within the context of search, but remains largely under-explored for recommender systems. To bridge this gap, we develop a probabilistic intent-based whole-page diversification framework in the final stage of a recommender system. Starting with a prior belief of user intents, the proposed diversification framework sequentially selects items at each position based on these beliefs, and subsequently updates posterior beliefs about the intents. It ensures that different user intents are represented in a page towards optimizing long-term user experience. We experiment with the intent diversification framework on one of the world's largest content recommendation platforms, serving billions of users daily. Our framework incorporates the user's exploration intent, capturing their propensity to explore new interests and content. Live experiments show that the proposed framework leads to an increase in user retention and overall user enjoyment, validating its effectiveness in facilitating long-term planning. In particular, it enables users to consistently discover and engage with diverse contents that align with their underlying intents over time, thereby leading to an improved long-term user experience.
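
The sequential select-then-update loop described above can be sketched as follows. The scoring rule (expected relevance under the current intent belief) and the update rule (down-weighting intents already served by the chosen item) are assumptions borrowed from common intent-aware diversification heuristics; the abstract does not spell out the paper's exact formulas, and the item and intent names are hypothetical.

```python
def diversify_by_intent(prior, relevance, page_size):
    """Fill a page position by position, re-weighting intents after each pick.

    prior: dict intent -> probability.
    relevance: dict item -> dict intent -> probability the item satisfies that intent.
    """
    belief = dict(prior)
    pool = dict(relevance)
    page = []
    while pool and len(page) < page_size:
        def score(item):
            # Expected relevance of the item under the current belief.
            return sum(belief[i] * pool[item].get(i, 0.0) for i in belief)

        best = max(sorted(pool), key=score)
        rel = pool.pop(best)
        page.append(best)
        # Assumed update: intents the chosen item already serves lose weight.
        belief = {i: b * (1.0 - rel.get(i, 0.0)) for i, b in belief.items()}
        total = sum(belief.values())
        if total > 0:
            belief = {i: b / total for i, b in belief.items()}
    return page


# Hypothetical user with a dominant and a minor intent.
prior = {"music": 0.7, "cooking": 0.3}
relevance = {
    "song_a": {"music": 0.9},
    "song_b": {"music": 0.8},
    "recipe": {"cooking": 0.9},
}
page = diversify_by_intent(prior, relevance, page_size=2)
print(page)
```

A pure relevance ranking would place both songs first; after the update step, the minor intent gains enough weight for the second slot to go to a different intent.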

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# 3Qubit反強磁性熱機械における磁気異方性の影響

Effects of Magnetic Anisotropy on 3-Qubit Antiferromagnetic Thermal Machines ( http://arxiv.org/abs/2405.12339v1 )

ライセンス: Link先を確認

Bastian Castorene, Francisco J. Peña, Ariel Norambuena, Sergio E. Ulloa, Cristobal Araya, Patricio Vargas,

(参考訳) 本研究は, 反強磁性ハイゼンベルクXXXモデルによって記述された, 鎖と環のトポロジーを持つ3つの量子ビット系の異方性効果について検討する。我々はスターリングサイクルとオットーサイクルを探索し、容易な軸異方性は全てのケースにおいてエンジン効率を大幅に向上させることを示した。低温では、リング構成はスターリングサイクル中の作業と効率の両方においてチェーンよりも優れる。さらに、両方のトポロジーにおいて、スターリングサイクルは量子臨界点における有限の作用でカルノー効率を達成する。対照的に、準静電オットーエンジンはこれらの点でカルノット効率に達するが、有用な作業は得られない。特にスターリングサイクルは、エンジンまたは冷凍機としてのみ機能する準静的オットーサイクルとは異なり、全ての熱運転用エンジン、冷蔵庫、ヒーター、加速器を展示している。

This study investigates the anisotropic effects on a system of three qubits with chain and ring topology, described by the antiferromagnetic Heisenberg XXX model subjected to a homogeneous magnetic field. We explore the Stirling and Otto cycles and find that easy-axis anisotropy significantly enhances engine efficiency across all cases. At low temperatures, the ring configuration outperforms the chain in both work and efficiency during the Stirling cycle. Additionally, in both topologies, the Stirling cycle achieves Carnot efficiency with finite work at quantum critical points. In contrast, the quasistatic Otto engine also reaches Carnot efficiency at these points but yields no useful work. Notably, the Stirling cycle exhibits all thermal operational regimes (engine, refrigerator, heater, and accelerator), unlike the quasistatic Otto cycle, which functions only as an engine or refrigerator.

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# 拡散干渉下における因果効果推定のためのカスケードに基づくランダム化

Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference ( http://arxiv.org/abs/2405.12340v1 )

ライセンス: Link先を確認

Zahra Fatemi, Jean Pouget-Abadie, Elena Zheleva,

(参考訳) 個人の結果が近隣ノードの処理の割り当てや行動に依存する可能性がある干渉の存在は、バイアスのある因果効果の推定につながる可能性がある。ネットワーク設計への現在のアプローチは、クラスタベースのランダム化による干渉の制限に焦点を当てており、クラスタをグラフクラスタリングを用いて識別し、クラスタランダム化はノードの処理と制御を規定する。しかし、クラスタベースのランダム化アプローチは、干渉がカスケード内で伝播し、治療に対する個人の反応が近隣のマルチホップに伝播すると、性能が低下する。カスケードシードノードの知識があれば、この干渉構造を利用して因果効果推定バイアスを軽減することができる。本研究の目的は,カスケードシードノードからの処理の割り当てを開始して,カスケード成長中の干渉を制限するために,それらのマルチホップ近傍への割り当てを伝搬し,全体的な因果効果推定誤差を低減するカスケードベースのネットワーク実験設計を提案することである。実世界のデータセットと合成データセットに関する広範な実験により、提案するフレームワークは、ネットワークデータにおける因果効果を推定する上で、既存の最先端アプローチよりも優れていることを示した。

The presence of interference, where the outcome of an individual may depend on the treatment assignment and behavior of neighboring nodes, can lead to biased causal effect estimation. Current approaches to network experiment design focus on limiting interference through cluster-based randomization, in which clusters are identified using graph clustering, and cluster randomization dictates the node assignment to treatment and control. However, cluster-based randomization approaches perform poorly when interference propagates in cascades, whereby the response of individuals to treatment propagates to their multi-hop neighbors. When we have knowledge of the cascade seed nodes, we can leverage this interference structure to mitigate the resulting causal effect estimation bias. With this goal, we propose a cascade-based network experiment design that initiates treatment assignment from the cascade seed nodes and propagates the assignment to their multi-hop neighbors to limit interference during cascade growth and thereby reduce the overall causal effect estimation error. Our extensive experiments on real-world and synthetic datasets demonstrate that our proposed framework outperforms the existing state-of-the-art approaches in estimating causal effects in network data.
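
The propagation step can be sketched as a multi-source breadth-first search. The sketch assumes the seed nodes are known, that each seed draws its arm at random, and that every other node inherits the arm of the first seed that reaches it. The tie-breaking and the handling of unreachable nodes are illustrative choices, not details taken from the paper.

```python
import random
from collections import deque


def cascade_assignment(graph, seeds, rng=None):
    """Assign treatment (1) or control (0) by propagating outward from cascade seeds.

    graph: dict node -> iterable of neighbors.
    seeds: list of cascade seed nodes.
    """
    if rng is None:
        rng = random.Random(0)
    assignment = {s: rng.randint(0, 1) for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in assignment:
                # Multi-hop neighbors inherit the arm of the seed that reached them first.
                assignment[nb] = assignment[node]
                queue.append(nb)
    # Nodes that no seed can reach are randomized independently.
    for node in graph:
        if node not in assignment:
            assignment[node] = rng.randint(0, 1)
    return assignment


# Hypothetical network: two cascades (a-b-c and d-e) and an isolated node f.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b"],
    "d": ["e"], "e": ["d"],
    "f": [],
}
assignment = cascade_assignment(graph, seeds=["a", "d"], rng=random.Random(42))
print(assignment)
```

Keeping a whole cascade in one arm is what limits spillover between treated and control units while the cascade grows.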

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# ニューラル演算子に基づく高速解像器を用いた大規模散乱

Large scale scattering using fast solvers based on neural operators ( http://arxiv.org/abs/2405.12380v1 )

ライセンス: Link先を確認

Zongren Zou, Adar Kahana, Enrui Zhang, Eli Turkel, Rishikesh Ranade, Jay Pathak, George Em Karniadakis,

(参考訳) 我々は最近提案された機械学習に基づく反復解法(HINTS)を拡張し,複雑な吸収境界条件を持つ外界領域におけるヘルムホルツ方程式によって記述される散乱問題を解く。 HINTS法は、ニューラル演算子(NO)と標準イテレーティブソルバ(eg Jacobi と Gauss-Seidel (GS))を組み合わせて、ニューラルネットワークのスペクトルバイアスを利用してより良い性能を実現する。 HINTSでは、従来の反復法のいくつかのイテレーションは、事前訓練されたNOの推論に置き換えられる。本研究では,HINTSを用いて,標準反復解法が失敗する2次元および3次元問題の散乱問題を解く。 2次元の正方形および三角形の散乱器と3次元の立方体とモデル潜水艦を考える。本研究では,非散乱シナリオ上でNOをトレーニングし,HINTSにNOを配置することで,散乱問題の解法として実現した散乱器の多様なジオメトリを扱うHINTSの補間能力について考察する。 HINTS法におけるNOは,新しい散乱器が与えられるたびに再トレーニングや微調整を行わずに有効であることを示す。その結果,多様な分散問題に対処する拡張HINTS手法の適応性と汎用性を強調した。

We extend a recently proposed machine-learning-based iterative solver, i.e. the hybrid iterative transferable solver (HINTS), to solve the scattering problem described by the Helmholtz equation in an exterior domain with a complex absorbing boundary condition. The HINTS method combines neural operators (NOs) with standard iterative solvers, e.g. Jacobi and Gauss-Seidel (GS), to achieve better performance by leveraging the spectral bias of neural networks. In HINTS, some iterations of the conventional iterative method are replaced by inferences of the pre-trained NO. In this work, we employ HINTS to solve the scattering problem for both 2D and 3D problems, where the standard iterative solver fails. We consider square and triangular scatterers of various sizes in 2D, and a cube and a model submarine in 3D. We explore and illustrate the extrapolation capability of HINTS in handling diverse geometries of the scatterer, which is achieved by training the NO on non-scattering scenarios and then deploying it in HINTS to solve scattering problems. The accurate results demonstrate that the NO in HINTS method remains effective without retraining or fine-tuning it whenever a new scatterer is given. Taken together, our results highlight the adaptability and versatility of the extended HINTS methodology in addressing diverse scattering problems.
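
The alternation between a classical relaxation and an operator inference can be sketched on a much simpler problem. The sketch below uses a 1D Poisson system instead of the Helmholtz scattering problem, and it replaces the trained neural operator with a hand-written stand-in (`low_mode_operator`) that corrects only the lowest sine modes exactly. This mimics the spectral-bias argument, in which the learned component handles low frequencies and damped Jacobi handles the rest, but it is an assumption-laden toy, not the HINTS implementation.

```python
import math


def apply_A(u):
    """Apply the 1D Laplacian stencil (-1, 2, -1) with zero boundary values."""
    n = len(u)
    return [
        2 * u[i] - (u[i - 1] if i > 0 else 0.0) - (u[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


def residual(u, f):
    return [fi - ai for fi, ai in zip(f, apply_A(u))]


def jacobi_step(u, f, omega=2.0 / 3.0):
    """Damped Jacobi update; the diagonal of A is 2."""
    return [ui + omega * ri / 2.0 for ui, ri in zip(u, residual(u, f))]


def low_mode_operator(r, modes=4):
    """Stand-in for a trained neural operator: exact correction for the lowest sine modes."""
    n = len(r)
    corr = [0.0] * n
    for k in range(1, modes + 1):
        phi = [math.sin(k * math.pi * (j + 1) / (n + 1)) for j in range(n)]
        lam = 2.0 - 2.0 * math.cos(k * math.pi / (n + 1))  # eigenvalue of A for mode k
        coef = sum(p * ri for p, ri in zip(phi, r)) / sum(p * p for p in phi)
        corr = [c + coef / lam * p for c, p in zip(corr, phi)]
    return corr


def hybrid_solve(f, iters, operator=None, every=10):
    """Run Jacobi, replacing every `every`-th iteration by an operator correction."""
    u = [0.0] * len(f)
    for it in range(1, iters + 1):
        if operator is not None and it % every == 0:
            u = [ui + ci for ui, ci in zip(u, operator(residual(u, f)))]
        else:
            u = jacobi_step(u, f)
    return u


def norm(v):
    return math.sqrt(sum(x * x for x in v))


f = [1.0] * 31
plain = norm(residual(hybrid_solve(f, 200), f))
hybrid = norm(residual(hybrid_solve(f, 200, low_mode_operator), f))
print("Jacobi only:", plain)
print("hybrid:", hybrid)
```

In this toy setting the hybrid residual ends up far below the Jacobi-only residual after the same number of iterations, because the slowly converging low-frequency error is removed by the stand-in operator.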

翻訳日:2024-05-22 17:43:12 公開日:2024-05-20

# 静的AI評価を超えて: LLMの害とリスクに対する人間のインタラクション評価を前進させる

Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks ( http://arxiv.org/abs/2405.10632v2 )

ライセンス: Link先を確認

Lujain Ibrahim, Saffron Huang, Lama Ahmad, Markus Anderljung,

(参考訳) モデル評価は、AIシステムの安全性、リスク、社会的影響を理解する上で重要である。ほとんどの実世界のAIアプリケーションは人間とAIのインタラクションを含んでいるが、AIモデルの現在の評価(例えば、一般的なベンチマーク)はそうではない。その代わりに、人間的要因を限定的に組み込んで、モデルの安全性を個別に評価することで、人間とモデルの相互作用の複雑さを捉えることができない。本稿では,人-モデルインタラクションの評価や,モデルを用いた人-モデルインタラクションのプロセスと結果に焦点をあてた,新たな評価カテゴリ"ヒューマンインタラクション評価" (HIEs) の定義と運用について論じる。まず、HIEは安全性評価の妥当性を高め、直接人的影響と相互作用特異的害を評価し、モデルによる社会的影響の今後の評価を導くために使用できると論じる。第2に,安全性を重視したHIE設計フレームワーク(人-LLM相互作用分類を含む)について,(1)危険領域の同定,(2)使用状況の特徴付け,(3)評価パラメータの選択の3段階について提案する。第3に、過信と説得リスクの2つの潜在的評価に我々の枠組みを適用します。最後に,HIEのコスト,複製性,非表現性に関する懸念に対処するための具体的な勧告を述べる。

Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.

翻訳日:2024-05-22 17:23:37 公開日:2024-05-20

# INDUS:科学応用のための効果的かつ効率的な言語モデル

INDUS: Effective and Efficient Language Models for Scientific Applications ( http://arxiv.org/abs/2405.10725v2 )

ライセンス: Link先を確認

Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kaylin Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grezes, Megan Ansdell, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, Michele Dolfi, Rafael Teixeira de Lima, Panagiotis Vagenas, S. Karthik Mukkavilli, Peter Staar, Sanaz Vahidinia, Ryan McGranaghan, Armin Mehrabian, Tsendgar Lee,

(参考訳) 言語モデル(LLM)は、自然言語処理(NLP)タスクにおいて顕著な結果を示した。しかし、以前の研究では、ドメイン中心のコーパスを使用して訓練されたLLMが、専門的なタスクでより良く機能することを示した。この中心的な洞察に触発されて、地球科学、生物学、物理学、ヘリオ物理、惑星科学、天体物理学領域に適した総合的なLLMスイートであるINDUSを開発し、多様なデータソースから得られたキュレートされた科学コーパスを用いて訓練した。 1) 自然言語理解タスクに対処するために,ドメイン固有の語彙とコーパスを用いて訓練されたエンコーダモデル,(2) 複数のソースから抽出された多様なデータセットを用いて訓練された対照的な学習ベースの汎用テキスト埋め込みモデル,(3) 待ち時間やリソース制約のあるアプリケーションに対処するために知識蒸留技術を用いて作成された,これらのモデルのより小さなバージョンである。また、これらの分野の研究を加速するために、CLIMATE-CHANGE-NER(entity-recognition)、NASA-QA(extractive QA)、NASA-IR(IR)という3つの新しい科学的ベンチマークデータセットを作成しました。最後に、我々のモデルは、これらの新しいタスクにおける汎用エンコーダ(RoBERTa)と既存のドメイン固有エンコーダ(SciBERT)、および関心領域における既存のベンチマークタスクよりも優れていることを示す。

Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated that LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models includes: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets, namely CLIMATE-CHANGE-NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR), to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.

翻訳日:2024-05-22 17:23:37 公開日:2024-05-20

# 線形クエリ複雑度を考慮したKnapsack制約下での非単調部分モジュラ最大化に対する決定論的近似アルゴリズムの強化

Enhanced Deterministic Approximation Algorithm for Non-monotone Submodular Maximization under Knapsack Constraint with Linear Query Complexity ( http://arxiv.org/abs/2405.12252v1 )

ライセンス: Link先を確認

Canh V. Pham,

(参考訳) 本研究では,Knapsack (SMK) 制約問題に基づく部分モジュラ最大化について,n$の基底集合上で検討する。この問題は、最適化、人工知能、機械学習といった様々な分野に応用されているため、最近多くの注目を集めた。我々は、最も高速な決定論的アルゴリズムの近似係数を、6+\epsilon$から5+\epsilon$に改善し、最高のクエリ複雑性は$O(n)$で、$\epsilon > 0$は定数パラメータである。本手法は, しきい値のグリーディ・サブルーチンと, 候補解としての2つの解集合の構築という, 2つの成分の性能を最適化することに基づいている。さらに、候補解のコストを慎重に分析することにより、より厳密な近似係数が得られる。

In this work, we consider the Submodular Maximization under Knapsack (SMK) constraint problem over a ground set of size $n$. The problem has recently attracted a lot of attention due to its applications in various domains of combinatorial optimization, artificial intelligence, and machine learning. We improve the approximation factor of the fastest deterministic algorithm from $6+\epsilon$ to $5+\epsilon$ while keeping the best query complexity of $O(n)$, where $\epsilon >0$ is a constant parameter. Our technique is based on optimizing the performance of two components: the threshold greedy subroutine and the building of two disjoint sets as candidate solutions. Besides, by carefully analyzing the cost of candidate solutions, we obtain a tighter approximation factor.
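
A single density-threshold greedy pass, the kind of subroutine the abstract mentions, can be sketched as below. The sketch uses a coverage function, which is monotone submodular, purely for illustration. The paper targets non-monotone objectives and maintains two disjoint candidate sets, neither of which is reproduced here, and the threshold values are arbitrary.

```python
def coverage(selected, sets):
    """A simple submodular function: size of the union of the chosen sets."""
    covered = set()
    for e in selected:
        covered |= sets[e]
    return len(covered)


def threshold_greedy(sets, cost, budget, tau):
    """One pass over the ground set: keep e if it fits the budget and its
    marginal gain per unit cost is at least tau."""
    chosen, spent = [], 0.0
    for e in sorted(sets):
        if spent + cost[e] > budget:
            continue
        gain = coverage(chosen + [e], sets) - coverage(chosen, sets)
        if gain / cost[e] >= tau:
            chosen.append(e)
            spent += cost[e]
    return chosen


# Hypothetical instance.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5, 6}}
cost = {"a": 1, "b": 1, "c": 1, "d": 4}
print(threshold_greedy(sets, cost, budget=3, tau=1.5))
print(threshold_greedy(sets, cost, budget=3, tau=1.0))
```

Each pass inspects every element once, so the number of marginal-gain queries stays linear in $n$ per threshold. The threshold trades solution value against how selective the pass is.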

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# マンモCLIP:マンモグラフィーにおけるデータ効率とロバスト性を高めるビジョン言語基礎モデル

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography ( http://arxiv.org/abs/2405.12255v1 )

ライセンス: Link先を確認

Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich,

(参考訳) 乳がん検出におけるコンピュータ支援診断(CAD)では、大規模かつ多様なトレーニングデータの不足が、システム導入を妨げる懸念の一つとなっている。近年、Vision-Languageモデル(VLM、例えばCLIP)による大規模画像テキストデータセットでの事前学習が、コンピュータビジョン(CV)における堅牢性とデータ効率の問題に部分的に対処している。本稿では、大量のスクリーニングマンモグラムとレポートのペアで事前学習した最初のVLMであるMammo-CLIPを提案し、データセットの多様性と規模の課題に取り組む。2つの公開データセットでの実験により、乳がん検出に重要な様々なマンモグラフィー属性の分類と位置特定において高い性能を示し、CVにおけるCLIPと同様のデータ効率と堅牢性を確認した。また、マンモグラフィーレポート内の文レベルの粒度で表現の空間的解釈を与える新しい特徴帰属法Mammo-FActORを提案する。コードは https://github.com/batmanlab/Mammo-CLIP で公開されている。

The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (e.g., CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial amount of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representation with sentence-level granularity within mammography reports. Code is available publicly: https://github.com/batmanlab/Mammo-CLIP.

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 大規模言語モデルによる科学的仮説生成:乳癌治療における検査的検証

Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment ( http://arxiv.org/abs/2405.12258v1 )

ライセンス: Link先を確認

Abbi Abdel-Rehim, Hector Zenil, Oghenejokpeme Orhobor, Marie Fisher, Ross J. Collins, Elizabeth Bourne, Gareth W. Fearnley, Emma Tate, Holly X. Smith, Larisa N. Soldatova, Ross D. King,

(参考訳) 大規模言語モデル(LLM)はAIを変革し、人間の知性を必要とする幅広いタスクにおいて画期的なパフォーマンスを達成した。科学において、LLMの最も興味深い応用は仮説形成である。 LLMの特徴は、その確率的構造から生じるものであり、出力テキストが必ずしもトレーニングテキストからの有効な推論であるとは限らないことである。これらは「幻覚」であり、多くのアプリケーションにおいて深刻な問題である。しかし、科学では幻覚は有用であり、実験室で検証できる新しい仮説である。ここでは乳がん治療の分野での科学的仮説の根拠としてLLMの使用を実験的に検証する。 LLM GPT4を用いて,MCF7乳がん細胞株を標的とした新しいFDA承認非癌薬の仮説を立証した。実験の第1ラウンドで、GPT4は、正の制御以上のシナジースコアを持つ3つの薬物の組み合わせ(テストされた12のうち)を発見することに成功した。これらの組み合わせはイトラコナゾール+アテノール、ジスルフィラム+シムバスタチン、ジピリダモール+メベンダゾールである。その後、GPT4は最初の結果を考慮して新しい組み合わせを生成するよう求められた。その後、さらに3つの正のシナジースコア(4つの試験のうち)が発見され、これらはジスルフィラム+フヴェストラント、メベンダゾール+キナクリン、ジスルフィラム+キナクリンであった。仮説の生成元としてのGPT4の限界は、それらの説明が定式化され、説得力がないことである。 LLMは科学的仮説のエキサイティングな新しい源であると結論付けている。

Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. These are 'hallucinations', and are a serious problem in many applications. However, in science, hallucinations may be useful: they are novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the use of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel pairs of FDA-approved non-cancer drugs that target the MCF7 breast cancer cell line relative to the non-tumorigenic breast cell line MCF10A. In the first round of laboratory experiments GPT4 succeeded in discovering three drug combinations (out of 12 tested) with synergy scores above the positive controls. These combinations were itraconazole + atenolol, disulfiram + simvastatin, and dipyridamole + mebendazole. GPT4 was then asked to generate new combinations after considering its initial results. It then discovered three more combinations with positive synergy scores (out of four tested): disulfiram + fulvestrant, mebendazole + quinacrine, and disulfiram + quinacrine. A limitation of GPT4 as a generator of hypotheses was that its explanations for them were formulaic and unconvincing. We conclude that LLMs are an exciting novel source of scientific hypotheses.
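
The abstract does not say which synergy model produced the scores. As one widely used reference model, a Bliss-independence excess could be sketched as below. The function names and the inhibition values are hypothetical and are not data from the paper.

```python
def bliss_expected(inhibition_a: float, inhibition_b: float) -> float:
    """Expected fractional inhibition of a pair if the two drugs act independently."""
    return inhibition_a + inhibition_b - inhibition_a * inhibition_b


def bliss_excess(observed: float, inhibition_a: float, inhibition_b: float) -> float:
    """Observed minus expected inhibition; positive values suggest synergy."""
    return observed - bliss_expected(inhibition_a, inhibition_b)


if __name__ == "__main__":
    # Hypothetical fractional inhibitions (0 = no effect, 1 = full inhibition).
    print(bliss_excess(observed=0.70, inhibition_a=0.30, inhibition_b=0.40))
```

Under this model a pair counts as synergistic only when the combination inhibits more than the two single-drug effects would predict together.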

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 特徴量に基づく性能予測モデルの一般化能力:ベンチマークによる統計的解析

Generalization Ability of Feature-based Performance Prediction Models: A Statistical Analysis across Benchmarks ( http://arxiv.org/abs/2405.12259v1 )

ライセンス: Link先を確認

Ana Nikolikj, Ana Kostovska, Gjorgjina Cenikj, Carola Doerr, Tome Eftimov,

(参考訳) 本研究では,アルゴリズム性能予測モデルの各種ベンチマークスイートにおける一般化能力について検討した。探索ランドスケープ解析の特徴に基づく性能予測モデルの精度と問題収集の統計的類似性を比較すると、これらの2つの指標の間には正の相関関係があることが分かる。具体的には、トレーニングスイートとテストスイート間の高次元的特徴値分布が統計的に重要でない場合、テストエラーがトレーニングエラーと同じ範囲にあるという意味で、モデルをうまく一般化する傾向にある。 2つの実験により、標準ベンチマークスイート、BBOBおよびCECコレクション、およびBBOB問題インスタンスのアフィン組み合わせの5つのコレクションを使用して、これらの結果が検証された。

This study examines the generalization ability of algorithm performance prediction models across various benchmark suites. Comparing the statistical similarity between the problem collections with the accuracy of performance prediction models that are based on exploratory landscape analysis features, we observe that there is a positive correlation between these two measures. Specifically, when the high-dimensional feature value distributions of the training and testing suites do not differ in a statistically significant way, the model tends to generalize well, in the sense that the testing errors are in the same range as the training errors. Two experiments validate these findings: one involving the standard benchmark suites, the BBOB and CEC collections, and another using five collections of affine combinations of BBOB problem instances.
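
The abstract does not name the statistical test used to compare feature distributions. A minimal sketch of one possible check, a permutation test on the distance between mean feature vectors, is given below. The choice of statistic and all function names are assumptions made for illustration.

```python
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_gap(a, b):
    """Euclidean distance between the mean feature vectors of two samples."""
    ma, mb = mean_vector(a), mean_vector(b)
    return sum((x - y) ** 2 for x, y in zip(ma, mb)) ** 0.5


def permutation_p_value(train, test, n_perm=2000, seed=0):
    """Estimate how often a random relabelling gives a gap at least as large as observed."""
    rng = random.Random(seed)
    observed = mean_gap(train, test)
    pooled = list(train) + list(test)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_gap(pooled[:len(train)], pooled[len(train):]) >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A large p-value means the two suites are not distinguishable by this statistic, which is the situation in which the abstract reports good generalization.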

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# EXACT: 機械学習モデル説明手法を実証的にベンチマークするプラットフォームを目指して

EXACT: Towards a platform for empirically benchmarking Machine Learning model explanation methods ( http://arxiv.org/abs/2405.12261v1 )

ライセンス: Link先を確認

Benedict Clark, Rick Wilming, Artur Dox, Paul Eschenbach, Sami Hached, Daniel Jin Wodke, Michias Taye Zewdie, Uladzislau Bruila, Marta Oliveira, Hjalmar Schulz, Luca Matteo Cornils, Danny Panknin, Ahcène Boubekki, Stefan Haufe,

(参考訳) 説明可能な人工知能(XAI)の進化する展望は、複雑な機械学習(ML)モデルの解釈可能性を改善することを目的としている。本稿では,初期ベンチマークプラットフォームであるEXACT(Explainable AI Comparison Toolkit)に,さまざまなベンチマークデータセットと新たなパフォーマンス指標を組み込むことにより,XAI手法の評価のための標準化された基盤を提供する。提案するデータセットは, クラス条件の特徴に対する基礎的真理の説明と, 新たな定量的指標を活用して, それらが生成する説明の質において, ポストホックなXAI手法の性能を評価する。我々の最近の知見は、しばしばランダムなベースラインを超えるのに苦労し、無関係な特徴に寄与するため、人気のあるXAI手法の限界を浮き彫りにした。さらに、モデルアーキテクチャが等しく動作する異なるモデルアーキテクチャから導かれる説明において、変動性を示す。この初期ベンチマークプラットフォームは、XAI研究者が新たに開発した手法の高品質をテストし、確実にすることを目的としている。

The evolving landscape of explainable artificial intelligence (XAI) aims to improve the interpretability of intricate machine learning (ML) models, yet faces challenges in formalisation and empirical validation, being an inherently unsupervised process. In this paper, we bring together various benchmark datasets and novel performance metrics in an initial benchmarking platform, the Explainable AI Comparison Toolkit (EXACT), providing a standardised foundation for evaluating XAI methods. Our datasets incorporate ground truth explanations for class-conditional features, and leveraging novel quantitative metrics, this platform assesses the performance of post-hoc XAI methods in the quality of the explanations they produce. Our recent findings have highlighted the limitations of popular XAI methods, as they often struggle to surpass random baselines, attributing significance to irrelevant features. Moreover, we show the variability in explanations derived from different equally performing model architectures. This initial benchmarking platform therefore aims to allow XAI researchers to test and assure the high quality of their newly developed methods.
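
To illustrate how ground-truth explanations make a quantitative comparison against a random baseline possible, here is a minimal sketch. The top-k precision metric and the function names are assumptions chosen for illustration and are not the metrics defined by EXACT.

```python
def top_k_precision(attributions, ground_truth, k):
    """Fraction of the k features with the largest |attribution| that are truly relevant."""
    ranked = sorted(range(len(attributions)),
                    key=lambda i: abs(attributions[i]), reverse=True)
    return sum(1 for i in ranked[:k] if ground_truth[i]) / k


def random_baseline_precision(ground_truth):
    """Expected precision of a method that ranks features uniformly at random."""
    return sum(1 for g in ground_truth if g) / len(ground_truth)
```

An explanation method is informative under this metric only if its precision clearly exceeds the random baseline for the same ground-truth mask.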

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 一般車両ルーティングのためのプロンプト学習

Prompt Learning for Generalized Vehicle Routing ( http://arxiv.org/abs/2405.12262v1 )

ライセンス: Link先を確認

Fei Liu, Xi Lin, Weiduo Liao, Zhenkun Wang, Qingfu Zhang, Xialiang Tong, Mingxuan Yuan,

(参考訳) ニューラル組合せ最適化(Neural combinatorial Optimization, NCO)は、手作業によるアルゴリズム設計を伴わずに、様々な車両ルーティング問題を解決するための、有望な学習ベースのアプローチである。しかし、現在のNCO法は主に分配性能に重点を置いているのに対し、実際の問題インスタンスは通常異なる分布から来ている。アウト・オブ・ディストリビューションのインスタンスに取り組むには、コストのかかる微調整アプローチや、スクラッチから一般化されたモデルの再トレーニングが必要になる。本研究は,従来の手法と異なり,NCOにおけるクロスディストリビューション適応のための効率的なプロンプト学習手法について検討する。具体的には、事前学習したモデルのゼロショット適応を高速に行う新しいプロンプト学習法を提案し、異なる分布からのルーティング問題を解く。提案モデルでは, 各種分布の一連のプロンプトを学習し, 最良適合のプロンプトを選択し, 各問題インスタンスに対して事前学習したアテンションモデルを提案する。広汎な実験により,提案手法が事前学習されたルーティングモデルの迅速な適応を促進することが示唆された。また、分散予測とゼロショット一般化の両方において、既存の一般化されたモデルよりも、多様な新しいタスクセットに優れる。私たちのコード実装はオンライン https://github.com/FeiLiu36/PromptVRP で利用可能です。

Neural combinatorial optimization (NCO) is a promising learning-based approach to solving various vehicle routing problems without much manual algorithm design. However, the current NCO methods mainly focus on the in-distribution performance, while the real-world problem instances usually come from different distributions. A costly fine-tuning approach or generalized model retraining from scratch could be needed to tackle the out-of-distribution instances. Unlike the existing methods, this work investigates an efficient prompt learning approach in NCO for cross-distribution adaptation. Concretely, we propose a novel prompt learning method to facilitate fast zero-shot adaptation of a pre-trained model to solve routing problem instances from different distributions. The proposed model learns a set of prompts among various distributions and then selects the best-matched one to prompt a pre-trained attention model for each problem instance. Extensive experiments show that the proposed prompt learning approach facilitates the fast adaptation of pre-trained routing models. It also outperforms existing generalized models on both in-distribution prediction and zero-shot generalization to a diverse set of new tasks. Our code implementation is available online at https://github.com/FeiLiu36/PromptVRP.

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 大規模言語モデルにおける方向付きメトリック構造

Directed Metric Structures arising in Large Language Models ( http://arxiv.org/abs/2405.12264v1 )

ライセンス: Link先を確認

Stéphane Gaubert, Yiannis Vlassopoulos,

(参考訳) 大規模言語モデルは、コーパス内の与えられたテキストに対して、可能な次の単語の確率分布を生成するように訓練されたトランスフォーマーニューラルネットワークである。本稿では,テキスト拡張の条件付き確率分布によって定義される数学的構造について述べる。確率から-log確率への視点の変更私たちは、サブテキストの順序が、-log確率によって$\mathcal{L}$というテキストの空間で定義されたメートル法構造に完全にエンコードされていることを観察する。次に、計量ポリヘドロン $P(\mathcal{L})$ と $\mathcal{L}$ を $P(\mathcal{L})$ に等尺埋め込み( Yoneda embedding)し、テキストが特定の極端線の生成元にマップするように構成する。 P(\mathcal{L})$はこれらの極端線発生器の$(\min,+)$(熱帯)線型スパンである。生成元はまた$(\min+)$線型方程式の系を満たす。すると、$P(\mathcal{L})$はテキストの追加と互換性があることを示し、そこからボルツマン重み付きテキストベクトルの線形結合としてテキストベクトルの近似を導出する。次に、テキスト拡張とテキスト制限が等長多面体を与えることを示す双対性定理を証明します。さらに、$P(\mathcal{L})$ はいわゆる (あるバージョンの) の格子閉包であり、$\mathcal{L}$ は extremal ray generators の $(\max,+)$ であることを示す。すべての構成は圏論の解釈を持つが、圏論を明示的に用いない。分類学的解釈は付録で簡潔に説明されている。最後の付録では、意味論問題に対する構文が一般的な数学的双対性にどのように適合するかを記述している。

Large Language Models are transformer neural networks which are trained to produce a probability distribution on the possible next words to given texts in a corpus, in such a way that the most likely word predicted is the actual word in the training text. In this paper we describe the mathematical structure defined by such conditional probability distributions of text extensions. Changing the viewpoint from probabilities to -log probabilities, we observe that the subtext order is completely encoded in a metric structure defined on the space of texts $\mathcal{L}$ by -log probabilities. We then construct a metric polyhedron $P(\mathcal{L})$ and an isometric embedding (called Yoneda embedding) of $\mathcal{L}$ into $P(\mathcal{L})$ such that texts map to generators of certain special extremal rays. We explain that $P(\mathcal{L})$ is a $(\min,+)$ (tropical) linear span of these extremal ray generators. The generators also satisfy a system of $(\min,+)$ linear equations. We then show that $P(\mathcal{L})$ is compatible with adding more text and from this we derive an approximation of a text vector as a Boltzmann weighted linear combination of the vectors for words in that text. We then prove a duality theorem showing that text extensions and text restrictions give isometric polyhedra (even though they look a priori very different). Moreover, we prove that $P(\mathcal{L})$ is the lattice closure of (a version of) the so-called Isbell completion of $\mathcal{L}$, which turns out to be the $(\max,+)$ span of the text extremal ray generators. All constructions have interpretations in category theory but we don't use category theory explicitly. The categorical interpretations are briefly explained in an appendix. In the final appendix we describe how the syntax to semantics problem could fit in a general well known mathematical duality.
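
The move from probabilities to -log probabilities can be made concrete with a toy model. The sketch below assumes the directed distance from a text x to a text y is -log P(y | x) when y extends x, and infinite otherwise. The probability table and function names are hypothetical.

```python
import math

# Hypothetical next-token probabilities for a toy language.
# Keys are text prefixes, written as tuples of tokens.
NEXT = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.5, "b": 0.5},
    ("a", "b"): {"a": 0.2, "b": 0.8},
}


def extension_prob(x, y):
    """P(y | x): probability that text x is extended to text y; 0 if x is not a prefix of y."""
    if tuple(y[:len(x)]) != tuple(x):
        return 0.0
    p = 1.0
    for i in range(len(x), len(y)):
        p *= NEXT.get(tuple(y[:i]), {}).get(y[i], 0.0)
    return p


def d(x, y):
    """Directed distance -log P(y | x); infinite when y does not extend x."""
    p = extension_prob(x, y)
    return math.inf if p == 0.0 else -math.log(p)
```

For nested texts x, y, z the chain rule gives d(x, z) = d(x, y) + d(y, z), so the triangle inequality holds, while d(y, x) is infinite whenever x is a proper subtext of y. This asymmetry is how the distance encodes the subtext order.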

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# SEL-CIE:非線形sRGB画像からのCIE-XYZ再構成のための知識誘導型自己教師付き学習フレームワーク

SEL-CIE: Knowledge-Guided Self-Supervised Learning Framework for CIE-XYZ Reconstruction from Non-Linear sRGB Images ( http://arxiv.org/abs/2405.12265v1 )

ライセンス: Link先を確認

Shir Barzel, Moshe Salhov, Ofir Lindenbaum, Amir Averbuch,

(参考訳) 現代のカメラは、通常、原センサーデータを表す最小処理の線形RGB画像と、sRGB状態のような高処理の非線形画像状態の2種類の画像状態を提供する。 CIE-XYZ色空間(CIE-XYZ color space)は、カメラパイプラインの一部として使用されるデバイスに依存しない線形空間であり、医用アプリケーションにおける画像の劣化、脱毛、色認識といったコンピュータビジョンタスクに役立ち、色精度が重要である。しかし、通常、画像は非線形状態に保存され、従来の方法でCIE-XYZ色画像を達成することは必ずしも不可能である。この問題に対処するため、買収パイプラインの反転に焦点を当てた古典的な方法論が開発されている。最近では、同一画像のCIE-XYZとsRGB表現を組み合わせた教師あり学習が採用されている。しかし、CIE-XYZとsRGBペアの大規模なデータセットを得るのは難しい。この制限を克服し、大量のペアデータへの依存を軽減するために、自己教師付き学習(SSL)を、ペアデータのみに依存する代用として利用することができる。本稿では,CIE-XYZ 画像と sRGB 画像の再構成に SSL 手法を併用したフレームワークを提案する。提案するフレームワークはsRGB2XYZデータセットに適用される。

Modern cameras typically offer two types of image states: a minimally processed linear raw RGB image representing the raw sensor data, and a highly-processed non-linear image state, such as the sRGB state. The CIE-XYZ color space is a device-independent linear space used as part of the camera pipeline and can be helpful for computer vision tasks, such as image deblurring, dehazing, and color recognition tasks in medical applications, where color accuracy is important. However, images are usually saved in non-linear states, and achieving CIE-XYZ color images using conventional methods is not always possible. To tackle this issue, classical methodologies have been developed that focus on reversing the acquisition pipeline. More recently, supervised learning has been employed, using paired CIE-XYZ and sRGB representations of identical images. However, obtaining a large-scale dataset of CIE-XYZ and sRGB pairs can be challenging. To overcome this limitation and mitigate the reliance on large amounts of paired data, self-supervised learning (SSL) can be utilized as a substitute for relying solely on paired data. This paper proposes a framework for using SSL methods alongside paired data to reconstruct CIE-XYZ images and re-render sRGB images, outperforming existing approaches. The proposed framework is applied to the sRGB2XYZ dataset.
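
For reference, the idealized colorimetric mapping from non-linear sRGB to CIE-XYZ (D65) is sketched below, using the standard sRGB transfer function and matrix. This is not the paper's method: real camera pipelines apply further non-linear, device-specific processing, which is the reason learned reconstruction is studied. The function names are illustrative.

```python
def srgb_to_linear(c: float) -> float:
    """Invert the sRGB transfer function for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


# Standard linear-sRGB to CIE-XYZ matrix for the D65 white point.
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)


def srgb_to_xyz(r: float, g: float, b: float):
    """Map a non-linear sRGB triple in [0, 1] to CIE-XYZ under the ideal sRGB model."""
    lin = [srgb_to_linear(v) for v in (r, g, b)]
    return tuple(sum(m * v for m, v in zip(row, lin)) for row in SRGB_TO_XYZ)
```

Under this model, sRGB white (1, 1, 1) maps to a luminance Y of approximately 1, and black maps to the origin.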

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# EGAN: ランサムウェア普及のための進化的GAN

EGAN: Evolutional GAN for Ransomware Evasion ( http://arxiv.org/abs/2405.12266v1 )

ライセンス: Link先を確認

Daniel Commey, Benjamin Appiah, Bill K. Frimpong, Isaac Osei, Ebenezer N. A. Hammond, Garth V. Crosby,

(参考訳) 敵の訓練は、敵のマルウェアに対する防御戦略として証明されている。しかし、このような訓練のために敵のマルウェアサンプルを生成することは、敵のマルウェアが回避的かつ機能的であり続ける必要があるため、課題となる。この研究は、この制限に対処する攻撃フレームワークEGANを提案する。 EGANはEvolution StrategyとGenerative Adversarial Networkを活用して、元の機能を保存しながらランサムウェアファイルを変更可能な一連の攻撃アクションを選択する。私たちは、このフレームワークを、VirusTotalにリストされたAIを使った商用アンチウイルスシステムでテストし、我々のフレームワークがこれらのシステムの大部分をバイパスできることを示した。さらに,EGAN攻撃フレームワークが他の商用非AIアンチウイルスソリューションを回避できるかどうかを検討した。この結果から, 敵ランサムウェアが生成したランサムウェアは, それらのいくつかを回避できる可能性が示唆された。

Adversarial Training is a proven defense strategy against adversarial malware. However, generating adversarial malware samples for this type of training presents a challenge because the resulting adversarial malware needs to remain evasive and functional. This work proposes an attack framework, EGAN, to address this limitation. EGAN leverages an Evolution Strategy and Generative Adversarial Network to select a sequence of attack actions that can mutate a Ransomware file while preserving its original functionality. We tested this framework on popular AI-powered commercial antivirus systems listed on VirusTotal and demonstrated that our framework is capable of bypassing the majority of these systems. Moreover, we evaluated whether the EGAN attack framework can evade other commercial non-AI antivirus solutions. Our results indicate that the adversarial ransomware generated can increase the probability of evading some of them.

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 多体量子系における固有状態の局在

Eigenstate localization in a many-body quantum system ( http://arxiv.org/abs/2405.12279v1 )

ライセンス: Link先を確認

Chao Yin, Rahul Nandkishore, Andrew Lucas,

(参考訳) 非零エネルギー密度以下のすべての固有状態は、ヒルベルト空間内の「エネルギー的に許容される構成」の指数的に小さな部分で局所化される。我々の構成は古典的な低密度パリティチェックコードへの量子摂動に基づいている。原理的には、この固有状態の局在は、効率的に準備可能な混合状態において、ほとんどボディの相関関数を計測することによって検出することができる。

We prove the existence of extensive many-body Hamiltonians with few-body interactions and a many-body mobility edge: all eigenstates below a nonzero energy density are localized in an exponentially small fraction of "energetically allowed configurations" within Hilbert space. Our construction is based on quantum perturbations to a classical low-density parity check code. In principle, it is possible to detect this eigenstate localization by measuring few-body correlation functions in efficiently preparable mixed states.
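
As background for the classical ingredient of the construction, the energy of a parity-check code can be taken as the number of violated checks, so that codewords are the zero-energy configurations. The sketch below uses a tiny hypothetical set of checks, far too small to be a meaningful low-density code, and does not model the quantum perturbation.

```python
# Hypothetical toy parity checks: each check lists the bit positions it constrains.
CHECKS = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]


def energy(bits):
    """Classical code energy: the number of parity checks with odd parity."""
    return sum(sum(bits[i] for i in check) % 2 for check in CHECKS)


def is_codeword(bits):
    """A configuration is a zero-energy codeword when it satisfies every check."""
    return energy(bits) == 0
```

In this picture a configuration is energetically allowed at a given energy density when it violates only a correspondingly small fraction of the checks.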

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 量子コンピュータを用いた電子構造における時間外相関器

Out-of-time-order correlators in electronic structure using Quantum Computers ( http://arxiv.org/abs/2405.12289v1 )

ライセンス: Link先を確認

K. J. Joven, V. M. Bastidas,

(参考訳) 演算子の拡散は、統計力学やブラックホール物理学から量子情報まで様々な分野に深く影響している。量子化の通常の方法は、古典的カオス力学におけるリャプノフ指数の量子アナログであるOtOC(out-of-time-order correlator)である。本研究では,量子コンピュータにおける電子構造の量子シミュレーションにおける演算子拡散現象について検討する。その結果を裏付けるために、水素鎖$H_4$に焦点をあて、この鎖が平衡幾何学から遠く離れている場合、作用素の拡散が促進されることを示す。また,バイパーティライトの絡み合いのダイナミクスと,そのパーティションサイズへの依存性についても検討した。本研究により, 領域と体積法則によく似た特異な特徴が認められた。電子構造の量子シミュレーションにおいて,演算子によるコヒーレントな誤差の拡散に関する知見を提供し,今日利用可能な様々なプラットフォームで実験的に実装可能である。

Operator spreading has profound implications in diverse fields ranging from statistical mechanics and black hole physics to quantum information. The usual way to quantify it is through out-of-time-order correlators (OTOCs), which are the quantum analog of Lyapunov exponents in classical chaotic dynamics. In this work we explore the phenomenon of operator spreading in quantum simulation of electronic structure on quantum computers. To substantiate our results, we focus on a hydrogen chain $H_4$ and demonstrate that operator spreading is enhanced when the chain is far from its equilibrium geometry. We also investigate the dynamics of bipartite entanglement and its dependence on the partition's size. Our findings reveal distinctive signatures closely resembling area- and volume-laws in equilibrium and far-from-equilibrium geometries, respectively. Our results provide insight into operator spreading of coherent errors in quantum simulation of electronic structure and can be experimentally implemented in various platforms available today.

翻訳日:2024-05-22 15:17:08 公開日:2024-05-20

# 射影による量子リサジョウス図形

Quantum Lissajous Figures via Projection ( http://arxiv.org/abs/2405.12291v1 )

ライセンス: Link先を確認

Errico J. Russo,

(参考訳) 角周周波数の2DHOに対して、新しい量子リッサホス状態のカテゴリを示す。状態は通常のコヒーレント状態の2DHOの退化部分空間への射影から生じる。このように、新しい古典的でない量子力学的定常状態は古典的だが非定常的コヒーレント状態から生じる。リッサホス図形との関係は、我々の状態はすべて、対応する古典的リッサホス図形に沿って局所化される確率密度を持つということである。さらに、我々は、確率電流密度と検討中の状態における量子干渉の出現との間の重要な相互作用を強調した。そうすることで、ボルテックス状態として知られる状態のクラスについて一貫した議論をすることができる。

We present a new category of quantum Lissajous states for a 2DHO having commensurate angular frequencies. The states result from the projection of ordinary coherent states onto a degenerate subspace of the 2DHO. In this way, new, non-classical quantum mechanically stationary states arise from the classical but non-stationary coherent states. The connection to Lissajous figures is that our states all have probability densities that are localized along the corresponding classical Lissajous figures. We further emphasize the important interplay between the probability current density and the emergence of quantum interference in the states we examine. In doing so, we are able to present a consistent discussion of a class of states known as vortex states.
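
The classical curves that the probability densities follow can be sketched as below for a 2D harmonic oscillator with commensurate angular frequencies p*omega0 and q*omega0. The parametrization and names are assumptions made for illustration.

```python
import math


def lissajous_point(t, p, q, phase=0.0, omega0=1.0, ax=1.0, ay=1.0):
    """Classical 2D oscillator position with angular frequencies p*omega0 and q*omega0."""
    return (ax * math.cos(p * omega0 * t), ay * math.cos(q * omega0 * t + phase))


def common_period(p, q, omega0=1.0):
    """Shortest time after which both coordinates repeat, for positive integers p and q."""
    return 2.0 * math.pi / (omega0 * math.gcd(p, q))
```

Because p and q are integers, the trajectory returns to its starting point after one common period, which is why the classical figure is a closed curve.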

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 量子光と創発性グラビトン偏光子に結合した分数量子ホール液体の理論

Theory of fractional quantum Hall liquids coupled to quantum light and emergent graviton-polaritons ( http://arxiv.org/abs/2405.12292v1 )

ライセンス: Link先を確認

Zeno Bacciconi, Hernan Xavier, Iacopo Carusotto, Titas Chanda, Marcello Dalmonte,

(参考訳) 最近のブレークスルー実験は、量子電磁空洞場と相互作用する量子ホール状態のダイナミクスを探索する方法を実証している。強く結合した非局所キャビティモードが整数量子ホール物理学に与える影響は近年研究されているが、分数量子ホール(FQH)液体、そしてより一般的には、物質の分数化状態に対する影響は未解明のままである。本研究では、量子光に結合したFQH状態の理解のための理論的枠組みを開発する。特に、解析的議論とテンソルネットワークシミュレーションを組み合わせることで、単一モードキャビティにおける$\nu=1/3$ Laughlin状態と有限電場勾配のダイナミクスを研究する。 FQH状態の位相的シグネチャは、量子化されたホール比抵抗の持続性によって示されるように、非局所的な空洞真空変動に対して頑健である。しかし、エンタングルメントスペクトルは、光物質の絡み合いとトポロジーの直接指紋を持ち、U(1)$カウントの独特な極性的なレプリカが明らかになる。キャビティ変動に対するさらなる応答として、長波長相関で符号化された圧縮されたFQH幾何も見出す。さらに, 強いキャビティ場勾配への移動は, 勾配方向の強い密度変調を特徴とし, 滑動する友長・ラッティンガー液相への不安定性をもたらすことを観察した。最後に、FQH相内の低エネルギー励起スペクトルを探索することにより、四極性FQH集団励起(グラビトンとして知られる)と光のハイブリッド化から生じる新しい準粒子、グラビトン・ポラリトンを同定する。本研究の結果から,より複雑なシナリオへの拡張の可能性について考察した。

Recent breakthrough experiments have demonstrated how it is now possible to explore the dynamics of quantum Hall states interacting with quantum electromagnetic cavity fields. While the impact of strongly coupled non-local cavity modes on integer quantum Hall physics has been recently addressed, its effects on fractional quantum Hall (FQH) liquids -- and, more generally, fractionalized states of matter -- remain largely unexplored. In this work, we develop a theoretical framework for the understanding of FQH states coupled to quantum light. In particular, combining analytical arguments with tensor network simulations, we study the dynamics of a $\nu=1/3$ Laughlin state in a single-mode cavity with finite electric field gradients. We find that the topological signatures of the FQH state remain robust against the non-local cavity vacuum fluctuations, as indicated by the endurance of the quantized Hall resistivity. The entanglement spectra, however, carry direct fingerprints of light-matter entanglement and topology, revealing peculiar polaritonic replicas of the $U(1)$ counting. As a further response to cavity fluctuations, we also find a squeezed FQH geometry, encoded in long-wavelength correlations. We additionally observe that moving to strong cavity field gradients leads to an instability towards a sliding Tomonaga-Luttinger liquid phase, featuring a strong density modulation in the gradient direction. Finally, by exploring the low-energy excited spectrum inside the FQH phase, we identify a new quasiparticle, the graviton-polariton, arising from the hybridization between quadrupolar FQH collective excitations (known as gravitons) and light. We discuss the experimental implications of our findings and possible extension of our results to more complex scenarios.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 誘導型グラフニューラルネットワークに対する効率的なモデルステアリング攻撃

Efficient Model-Stealing Attacks Against Inductive Graph Neural Networks ( http://arxiv.org/abs/2405.12295v1 )

ライセンス: Link先を確認

Marcin Podhajski, Jan Dubiński, Franziska Boenisch, Adam Dziedzic, Agnieszka Pregowska, Tomasz Michalak,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造で組織された実世界のデータを処理するための強力なツールとして認識されている。特に、事前に定義されたグラフ構造に依存しないグラフ構造化データの処理を可能にするインダクティブGNNは、ますます多様なアプリケーションにおいて重要になっている。これらのネットワークは、様々なタスクにわたる習熟度を示すため、敵が標的ネットワークの機能の複製を試みているモデルステーリング攻撃の利益源となる。画像やテキストで訓練されたモデルに焦点を当てたモデルステアリング攻撃の開発には、多大な努力が払われている。しかし、グラフデータで訓練されたGNNには、ほとんど注意が払われていない。本稿では,グラフコントラスト学習とスペクトルグラフ拡張に基づく誘導型GNNに対する教師なしモデルステーリング攻撃手法を提案する。提案した攻撃は6つのデータセットで徹底的に評価される。その結果,既存の盗難攻撃と比較して高い効率性を示した。より具体的には、我々の攻撃は、ターゲットモデルに送信されるクエリを少なくしながら、盗難モデルの忠実度と下流精度を達成するため、全てのベンチマークでベースラインを上回ります。

Graph Neural Networks (GNNs) are recognized as potent tools for processing real-world data organized in graph structures. Inductive GNNs in particular, which enable the processing of graph-structured data without relying on predefined graph structures, are gaining importance in an increasingly wide variety of applications. As these networks demonstrate proficiency across a range of tasks, they become lucrative targets for model-stealing attacks where an adversary seeks to replicate the functionality of the targeted network. A large effort has been made to develop model-stealing attacks that focus on models trained with images and texts. However, little attention has been paid to GNNs trained on graph data. This paper introduces a novel method for unsupervised model-stealing attacks against inductive GNNs, based on graph contrastive learning and spectral graph augmentations to efficiently extract information from the target model. The proposed attack is thoroughly evaluated on six datasets. The results show that this approach demonstrates a higher level of efficiency compared to existing stealing attacks. More concretely, our attack outperforms the baseline on all benchmarks, achieving higher fidelity and downstream accuracy of the stolen model while requiring fewer queries sent to the target model.
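
The two evaluation quantities named in the abstract can be sketched as follows, assuming the usual reading in which fidelity is agreement with the target model and accuracy is agreement with the true labels. The function names are illustrative, and the paper's exact definitions may differ.

```python
def agreement(a, b):
    """Fraction of positions at which two equal-length label sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def fidelity(target_predictions, surrogate_predictions):
    """Share of queries on which the stolen model reproduces the target model's label."""
    return agreement(target_predictions, surrogate_predictions)


def accuracy(predictions, true_labels):
    """Share of queries on which a model predicts the true label."""
    return agreement(predictions, true_labels)
```

A stolen model can have high fidelity yet modest accuracy if it faithfully copies the target model's mistakes, so the two numbers are reported separately.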

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# メタオーバーフィッティングを緩和するグラディエントの摂動

Perturbing the Gradient for Alleviating Meta Overfitting ( http://arxiv.org/abs/2405.12299v1 )

ライセンス: Link先を確認

Manas Gogoi, Sambhavi Tiwari, Shekhar Verma,

(参考訳) 相互非排他性と多様性の欠如は、単一のグローバル関数がすべてのメタトレーニングタスクのサポートセットデータセットに適合し、新しい未知のタスクに一般化できないことを意味する。この問題は、メタトレーニングタスクではエラー率の低いことが証明されているが、新しいタスクではエラー率が高い。しかしながら、タスクの多様性を高め、いくつかのタスクに対するモデルの信頼性を低下させるという、2つの目標のいずれかを念頭に置いて、この問題に対する新しい解決策が数多く存在する。そこで本研究では,数ショットの正弦波回帰や数ショットの分類など,数ショットの学習環境におけるメタオーバーフィッティングに対処する手法を提案する。提案手法は,非相互排他的タスク設定における学習における最先端のベースラインと比較して,一般化性能の向上を実証する。本論文は,メタラーニングにおけるオーバーフィッティングに対処するための洞察を提供することと,より堅牢で一般化可能なモデルに向けての分野を前進させることを目的としている。

Meta-overfitting can be attributed to two factors: mutual non-exclusivity and a lack of diversity, as a consequence of which a single global function can fit the support set data of all the meta-training tasks and fail to generalize to new unseen tasks. This issue is evidenced by low error rates on the meta-training tasks but high error rates on new tasks. However, a number of novel solutions to this problem are possible, each targeting one of two objectives: increasing the diversity of the tasks, or reducing the confidence of the model on some of the tasks. In light of the above, this paper proposes a number of solutions to tackle meta-overfitting in few-shot learning settings, such as few-shot sinusoid regression and few-shot classification. Our proposed approaches demonstrate improved generalization performance compared to state-of-the-art baselines for learning in a non-mutually exclusive task setting. Overall, this paper aims to provide insights into tackling overfitting in meta-learning and to advance the field towards more robust and generalizable models.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 分散型衛星ルーティングのための連続的深層強化学習

Continual Deep Reinforcement Learning for Decentralized Satellite Routing ( http://arxiv.org/abs/2405.12308v1 )

ライセンス: Link先を確認

Federico Lozano-Cuadra, Beatriz Soret, Israel Leyva-Mayorga, Petar Popovski,

(参考訳) 本稿では,低地球軌道衛星コンステレーションにおける分散ルーティングを連続的深部強化学習(DRL)に基づいて完全な解法を提案する。これは、衛星における部分的な知識や連続的な動き、交通、通信リンク、通信バッファといったシステムの不確実性の時間的変化など、複数の課題に対処する必要がある。我々は,各衛星が独立した意思決定エージェントとして機能し,近隣のエージェントからのフィードバックに基づいて環境の知識を限定的に獲得するマルチエージェントアプローチに従う。解法は2つの段階に分けられる。まず、オフライン学習フェーズは、分散化された決定と、グローバルエクスペリエンスをトレーニングしたグローバルディープニューラルネットワーク(DNN)に依存します。次に、ローカル、オンボード、および事前訓練されたDNNによるオンラインフェーズでは、(1)星座の予測可能な条件を次の衛星と共有する各衛星が活用するモデル予測、(2)エージェントのモデルをまずクラスタレベルでマージし、次にグローバルパラメータサーバに集約するフェデレートラーニング(FL)という2つの方法で、環境とともに継続的な学習を行う必要がある。その結果,提案するマルチエージェントDRLフレームワークは,最短パス方式と同じE2E性能を実現するが,後者は集中ノードにおけるリアルタイムネットワーク知識の通信オーバーヘッドを前提としている。重要なことは、当社のソリューションは混雑条件に順応し、負荷の少ない経路を活用できるということです。さらに,長期的アライメントに応用された予測と,長期的アライメントに利用したFLの相乗効果により,時間とともにモデルのばらつきが容易に取り組まれる。

This paper introduces a full solution for decentralized routing in Low Earth Orbit satellite constellations based on continual Deep Reinforcement Learning (DRL). This requires addressing multiple challenges, including the partial knowledge at the satellites and their continuous movement, and the time-varying sources of uncertainty in the system, such as traffic, communication links, or communication buffers. We follow a multi-agent approach, where each satellite acts as an independent decision-making agent, while acquiring a limited knowledge of the environment based on the feedback received from the nearby agents. The solution is divided into two phases. First, an offline learning phase relies on decentralized decisions and a global Deep Neural Network (DNN) trained with global experiences. Then, the online phase with local, on-board, and pre-trained DNNs requires continual learning to evolve with the environment, which can be done in two different ways: (1) Model anticipation, where the predictable conditions of the constellation are exploited by each satellite sharing local model with the next satellite; and (2) Federated Learning (FL), where each agent's model is merged first at the cluster level and then aggregated in a global Parameter Server. The results show that, without high congestion, the proposed Multi-Agent DRL framework achieves the same E2E performance as a shortest-path solution, but the latter assumes intensive communication overhead for real-time network-wise knowledge of the system at a centralized node, whereas ours only requires limited feedback exchange among first neighbour satellites. Importantly, our solution adapts well to congestion conditions and exploits less loaded paths. Moreover, the divergence of models over time is easily tackled by the synergy between anticipation, applied in short-term alignment, and FL, utilized for long-term alignment.
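
The two-level aggregation described for the Federated Learning option can be sketched as below, with models represented as flat parameter lists. Plain averaging inside a cluster and weighting clusters by their size at the parameter server are assumptions; the abstract does not specify the aggregation rule.

```python
def average(models, weights=None):
    """Weighted element-wise average of flat parameter vectors."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, models)) / total
            for i in range(len(models[0]))]


def hierarchical_fedavg(clusters):
    """Merge models inside each cluster, then aggregate cluster models by cluster size."""
    cluster_models = [average(members) for members in clusters]
    sizes = [len(members) for members in clusters]
    return average(cluster_models, sizes)
```

With size weighting, the global result equals the plain average over all satellites, while only one merged model per cluster has to reach the parameter server.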

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 単一基底状態からの等変量子系の高精度学習

Accurate Learning of Equivariant Quantum Systems from a Single Ground State ( http://arxiv.org/abs/2405.12309v1 )

ライセンス: Link先を確認

Štěpán Šmíd, Roberto Bondesan,

(参考訳) システムパラメータ間の特性予測は、分子動力学から変分量子アルゴリズムまで、量子物理学において重要な課題である。近年,この課題を解決するアルゴリズムの開発が進められている。ここでは, 周期的境界条件を持つ系のすべての基底状態の性質を, 1つの基底状態サンプルから学習する方法を示すことにより, これらのアルゴリズムの効率を劇的に改善する。予測誤差は熱力学の限界でゼロとなる傾向を示し、数値的な検証を行う。

Predicting properties across system parameters is an important task in quantum physics, with applications ranging from molecular dynamics to variational quantum algorithms. Recently, provably efficient algorithms to solve this task for ground states within a gapped phase were developed. Here we dramatically improve the efficiency of these algorithms by showing how to learn properties of all ground states for systems with periodic boundary conditions from a single ground state sample. We prove that the prediction error tends to zero in the thermodynamic limit and numerically verify the results.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 新しいバイアス測定の原理的アプローチ

A Principled Approach for a New Bias Measure ( http://arxiv.org/abs/2405.12312v1 )

ライセンス: Link先を確認

Bruno Scarone, Alfredo Viola, Ricardo Baeza-Yates,

(参考訳) 意思決定に機械学習とデータ駆動アルゴリズムが広く使われていることは、長年にわたり着実に増加している。医療、雇用、金融、教育、法制度など、様々な分野でこの現象が起きている。負のデータバイアスは、特定の集団に有害な結果をもたらす傾向がある。バイアスの負の結果に対処する緩和戦略や効果的な政策は、バイアスが存在するという認識から始まり、その理解と定量化の方法である。しかし、データのバイアスを測定する方法にはコンセンサスがないため、しばしば意図された意味は文脈に依存し、研究コミュニティには一様ではない。本研究の主な貢献は,(1)保護群に対するデータセットのバイアスレベルを定義し,効率的に定量化するための一般的なアルゴリズムフレームワーク,(2)新しいバイアス尺度の定義である。この結果は,9つの公開データセットを用いて実験的に検証され,理論的に解析され,新たな知見が得られた。当社のアプローチに基づいて,政策立案者にとって有用なバイアス緩和アルゴリズムも導出する。

The widespread use of machine learning and data-driven algorithms for decision making has been steadily increasing over many years. The areas in which this is happening are diverse: healthcare, employment, finance, education, and the legal system, to name a few; and the associated negative side effects are increasingly harmful for society. Negative data bias is one of those, which tends to result in harmful consequences for specific groups of people. Any mitigation strategy or effective policy that addresses the negative consequences of bias must start with awareness that bias exists, together with a way to understand and quantify it. However, there is a lack of consensus on how to measure data bias and oftentimes the intended meaning is context dependent and not uniform within the research community. The main contributions of our work are: (1) a general algorithmic framework for defining and efficiently quantifying the bias level of a dataset with respect to a protected group; and (2) the definition of a new bias measure. Our results are experimentally validated using nine publicly available datasets and theoretically analyzed, which provides novel insights about the problem. Based on our approach, we also derive a bias mitigation algorithm that might be useful to policymakers.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 深層学習に基づくハイパースペクトル画像再構成による農作物の品質評価

Deep learning-based hyperspectral image reconstruction for quality assessment of agro-product ( http://arxiv.org/abs/2405.12313v1 )

ライセンス: Link先を確認

Md. Toukir Ahmed, Ocean Monjur, Mohammed Kamruzzaman,

(参考訳) ハイパースペクトルイメージング(HSI)は、近年、多くの農業用途において有望なツールとして登場したが、大量のデータを処理するのに膨大な時間を要するため、リアルタイムシステムでは直接利用できない。したがって、現在のHSIシステムでは、単純でコンパクトで費用対効果の高いイメージングシステムの開発は不可能である。そこで本研究の目的は,農業用深層学習によるRGB画像からのハイパースペクトル画像の再構成である。具体的には、高スペクトル畳み込みニューラルネットワーク(HSCNN-D)を用いて、サツマイモの可溶性固形物(SSC)を予測するために、RGB画像から高スペクトル画像を再構成した。アルゴリズムは、RGB画像からのハイパースペクトル画像を正確に再構成し、その結果のスペクトルは、地上構造と密に一致した。再構成スペクトルに基づく部分最小二乗回帰(PLSR)モデルは,サツマイモのSSC予測の可能性を示した。これらの知見は,様々な農業用ツールとして,ディープラーニングに基づくハイパースペクトル画像再構成の可能性を強調した。

Hyperspectral imaging (HSI) has recently emerged as a promising tool for many agricultural applications; however, the technology cannot be directly used in a real-time system due to the extensive time needed to process large volumes of data. Consequently, the development of a simple, compact, and cost-effective imaging system is not possible with the current HSI systems. Therefore, the overall goal of this study was to reconstruct hyperspectral images from RGB images through deep learning for agricultural applications. Specifically, this study used Hyperspectral Convolutional Neural Network - Dense (HSCNN-D) to reconstruct hyperspectral images from RGB images for predicting soluble solid content (SSC) in sweet potatoes. The algorithm accurately reconstructed the hyperspectral images from RGB images, with the resulting spectra closely matching the ground-truth. The partial least squares regression (PLSR) model based on reconstructed spectra outperformed the model using the full spectral range, demonstrating its potential for SSC prediction in sweet potatoes. These findings highlight the potential of deep learning-based hyperspectral image reconstruction as a low-cost, efficient tool for various agricultural uses.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 二重ランドマーク積分作用素を用いた高次元ノイズデータセットのカーネルスペクトル結合埋め込み

Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators ( http://arxiv.org/abs/2405.12317v1 )

ライセンス: Link先を確認

Xiucai Ding, Rong Ma,

(参考訳) 複数の異種データセットの統合解析は、多くの研究分野、特に単細胞ゲノム学や医療情報学において標準的な実践となっている。既存のアプローチは、しばしば非線形構造を捕捉する際の限られたパワー、ノイズや高次元効果の不足、信号への適応性の欠如、サンプルサイズの不均衡、そしてそれらの結果の解釈が困難である。これらの制約に対処するために、独立に観測された2つの高次元ノイズデータセットの結合埋め込みを実現する新しいカーネルスペクトル法を提案する。提案手法は,組込み品質を向上させるために,データセット間で共有される可能性のある低次元構造を自動的に捕捉し,活用する。得られた低次元埋め込みは、同時クラスタリング、データの可視化、デノイングなど、多くの下流タスクに利用できる。提案手法は厳密な理論的解析によって正当化される。具体的には,低次元雑音信号の回復における手法の整合性を示し,信号対雑音比が収束率に与える影響を特徴付ける。合同多様体モデルフレームワークの下で、新たに導入された積分作用素の固有函数への究極の埋め込みの収束を確立する。これらの作用素はデュオランドマーク積分作用素と呼ばれ、再生されたカーネルヒルベルト空間(RKHS)の畳み込みカーネル写像によって定義される。これらのRKHSは、2つのデータセットの部分的または完全に共有された低次元の非線形信号構造をキャプチャする。 2つの単一セルオミクスデータセットの数値実験と解析により,既存手法よりも提案手法の利点を実証した。

Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches often suffer from limited power in capturing nonlinear structures, insufficient accounting for noise and the effects of high dimensionality, and a lack of adaptivity to signal strength and sample-size imbalance; their results are also sometimes difficult to interpret. To address these limitations, we propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The proposed method automatically captures and leverages possibly shared low-dimensional structures across datasets to enhance embedding quality. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising. The proposed method is justified by rigorous theoretical analysis. Specifically, we show the consistency of our method in recovering the low-dimensional noiseless signals, and characterize the effects of the signal-to-noise ratios on the rates of convergence. Under a joint manifolds model framework, we establish the convergence of ultimate embeddings to the eigenfunctions of some newly introduced integral operators. These operators, referred to as duo-landmark integral operators, are defined by the convolutional kernel maps of some reproducing kernel Hilbert spaces (RKHSs). These RKHSs capture the partially or entirely shared underlying low-dimensional nonlinear signal structures of the two datasets. Our numerical experiments and analyses of two single-cell omics datasets demonstrate the empirical advantages of the proposed method over existing methods in both embeddings and several downstream tasks.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 胸部X線画像における正確な肺切開のためのチャネルとコンテキストアテンションを有する階層型セグネット

Hierarchical SegNet with Channel and Context Attention for Accurate Lung Segmentation in Chest X-ray Images ( http://arxiv.org/abs/2405.12318v1 )

ライセンス: Link先を確認

Mohammad Ali Labbaf Khaniki, Nazanin Mahjourian, Mohammad Manthouri,

(参考訳) 胸部X線像における肺セグメンテーションは、様々な肺疾患の正確な診断と治療を可能にする医療画像解析において重要な課題である。本稿では,階層型セグネットとマルチモーダルアテンション機構を組み合わせた肺セグメンテーション手法を提案する。チャネルアテンション機構は肺領域セグメンテーションに不可欠な特定の特徴マップやチャネルを強調し、コンテキストアテンション機構は異なる空間領域の重要性を適応的に重み付けする。両方のメカニズムを組み合わせることで、モデルが複雑なパターンや特徴間の関係をよりよく捉え、セグメンテーションの精度が向上し、特徴表現が向上する。さらに、注意情報とエンコーダ特徴を統合するために注意ゲーティング機構を用い、異なる注意特徴の重要性を適応的に評価し、無関係な特徴を無視できるようにする。実験により,本手法は肺分画作業における最先端性能を達成し,既存手法より優れていたことを示す。提案手法は,肺疾患の診断と治療の精度と効率を向上させる可能性があり,他の画像解析にも適用可能である。

Lung segmentation in chest X-ray images is a critical task in medical image analysis, enabling accurate diagnosis and treatment of various lung diseases. In this paper, we propose a novel approach for lung segmentation by integrating Hierarchical SegNet with a proposed multi-modal attention mechanism. The channel attention mechanism highlights specific feature maps or channels crucial for lung region segmentation, while the context attention mechanism adaptively weighs the importance of different spatial regions. By combining both mechanisms, the proposed mechanism enables the model to better capture complex patterns and relationships between various features, leading to improved segmentation accuracy and better feature representation. Furthermore, an attention gating mechanism is employed to integrate attention information with encoder features, allowing the model to adaptively weigh the importance of different attention features and ignore irrelevant ones. Experimental results demonstrate that our proposed approach achieves state-of-the-art performance in lung segmentation tasks, outperforming existing methods. The proposed approach has the potential to improve the accuracy and efficiency of lung disease diagnosis and treatment, and can be extended to other medical image analysis tasks.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 超局所気象予測を用いた動的ラインレーティング:機械学習によるアプローチ

Dynamic Line Rating using Hyper-local Weather Predictions: A Machine Learning Approach ( http://arxiv.org/abs/2405.12319v1 )

ライセンス: Link先を確認

Henri Manninen, Markus Lippus, Georg Rute,

(参考訳) 送電網における再生可能エネルギー統合には動的ラインレーティング(DLR)システムが不可欠である。しかし、従来の方法では、あらゆる極やスパンにセンサーをインストールする非現実性のために、センサーデータの課題に直面する。さらに、センサベースのアプローチは、急速に変化する気象条件においてDLRを予測するのに苦労する可能性がある。本稿では,ハイパーローカル気象予報データとともに機械学習(ML)技術を活用する新しい手法を提案する。センサデータにのみ依存する従来の手法とは異なり、このアプローチでは、全ネットワークスケールでハイパーローカル気象パラメータを予測するためにトレーニングされたMLモデルを使用する。地形データを統合することで、景観の特徴や頭上線周辺の障害物を考慮した予測精度が向上する。本稿では,不確実性に関連するリスクを軽減するため,DLR評価のための信頼区間を提案する。エストニアのケーススタディでは、提案手法の実践的な実装を実証し、実世界のシナリオにおけるその有効性を強調している。本研究は,センサベースアプローチの限界に対処することにより,送電系統における再生可能エネルギー統合の談話,電力系統における効率と信頼性の向上に寄与する。

Dynamic Line Rating (DLR) systems are crucial for renewable energy integration in transmission networks. However, traditional methods relying on sensor data face challenges due to the impracticality of installing sensors on every pole or span. Additionally, sensor-based approaches may struggle to predict DLR in rapidly changing weather conditions. This paper proposes a novel approach, leveraging machine learning (ML) techniques alongside hyper-local weather forecast data. Unlike conventional methods, which rely solely on sensor data, this approach utilizes ML models trained to predict hyper-local weather parameters on a full network scale. Integrating topographical data enhances prediction accuracy by accounting for landscape features and obstacles around overhead lines. The paper introduces confidence intervals for DLR assessments to mitigate risks associated with uncertainties. A case study from Estonia demonstrates the practical implementation of the proposed methodology, highlighting its effectiveness in real-world scenarios. By addressing limitations of sensor-based approaches, this research contributes to the discourse on renewable energy integration in transmission systems, advancing efficiency and reliability in the power grid.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 物理的不可避関数とゼロ知識証明によるブロックチェーンベースのIoTシステムのセキュア化

Securing Blockchain-based IoT Systems with Physical Unclonable Functions and Zero-Knowledge Proofs ( http://arxiv.org/abs/2405.12322v1 )

ライセンス: Link先を確認

Daniel Commey, Sena Hounsinou, Garth V. Crosby,

(参考訳) 本稿では,ブロックチェーンベースのIoTシステムを保護するためのフレームワークとして,Physical Unclonable Functions(PUF)とZero-Knowledge Proofs(ZKP)をHyperledger Fabric環境に統合する。提案フレームワークは、PUFをユニークなデバイス識別に、ZKPをプライバシ保護認証とトランザクション処理に活用する。実験の結果、様々な攻撃に対するフレームワークの実現可能性、性能、セキュリティが示された。このフレームワークは、ブロックチェーンベースのIoTシステムのセキュリティ問題に対処するための包括的なソリューションを提供する。

This paper presents a framework for securing blockchain-based IoT systems by integrating Physical Unclonable Functions (PUFs) and Zero-Knowledge Proofs (ZKPs) within a Hyperledger Fabric environment. The proposed framework leverages PUFs for unique device identification and ZKPs for privacy-preserving authentication and transaction processing. Experimental results demonstrate the framework's feasibility, performance, and security against various attacks. This framework provides a comprehensive solution for addressing the security challenges in blockchain-based IoT systems.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# ボールのオーバーラップ数モデル非依存対数(ONB-MACF):信頼に値する人工知能のためのデータ構造に基づく対実生成法

Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF): A Data-Morphology-based Counterfactual Generation Method for Trustworthy Artificial Intelligence ( http://arxiv.org/abs/2405.12326v1 )

ライセンス: Link先を確認

José Daniel Pascual-Triana, Alberto Fernández, Javier Del Ser, Francisco Herrera,

(参考訳) 説明可能な人工知能(XAI)は、AIシステムの運用メカニズムを理解することを目的とした重要な研究領域である。 XAIは、これらのAIシステムをより理解しやすく信頼性の高いものにし、意思決定プロセスに関する洞察を提供することを目指している。明確で分かりやすい説明を生み出すことで、XAIはユーザー、実践者、利害関係者がモデルの判断を信頼できるようになる。本研究は,データ形態学戦略の価値を解析し,反実的説明を生成する。ボールのオーバーラップ数(Overlap Number of Balls Model-Agnostic CounterFactuals,ONB-MACF)は、データ形態を利用してモデルの決定境界を推定する、モデルに依存しない逆ファクト生成法である。 ONB-MACF法は、被覆点がクラスを共有するデータ空間内の超球面を構築し、決定境界をマッピングする。その後、インスタンスの属性を最も近い代替クラスハイパースフィアに向けて漸進的に調整し、最小限の変更で決定境界を越えることで、カウンターファクトアルが生成される。設計により、ONB-MACF 法は、データ分布に従う、実現可能でスパースな偽物を生成する。両視点から総合ベンチマークを行ったところ,ONB-MACF法は,多様な表型データセット上での複数の品質指標において,既存の最先端の偽物生成手法よりも優れていることがわかった。これは我々の仮説を支持し、信頼できるAIのためのデータ形態に基づく説明可能性戦略の可能性を示している。

Explainable Artificial Intelligence (XAI) is a pivotal research domain aimed at understanding the operational mechanisms of AI systems, particularly those considered "black boxes" due to their complex, opaque nature. XAI seeks to make these AI systems more understandable and trustworthy, providing insight into their decision-making processes. By producing clear and comprehensible explanations, XAI enables users, practitioners, and stakeholders to trust a model's decisions. This work analyses the value of data morphology strategies in generating counterfactual explanations. It introduces the Overlap Number of Balls Model-Agnostic CounterFactuals (ONB-MACF) method, a model-agnostic counterfactual generator that leverages data morphology to estimate a model's decision boundaries. The ONB-MACF method constructs hyperspheres in the data space whose covered points share a class, mapping the decision boundary. Counterfactuals are then generated by incrementally adjusting an instance's attributes towards the nearest alternate-class hypersphere, crossing the decision boundary with minimal modifications. By design, the ONB-MACF method generates feasible and sparse counterfactuals that follow the data distribution. Our comprehensive benchmark from a double perspective (quantitative and qualitative) shows that the ONB-MACF method outperforms existing state-of-the-art counterfactual generation methods across multiple quality metrics on diverse tabular datasets. This supports our hypothesis, showcasing the potential of data-morphology-based explainability strategies for trustworthy AI.
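
The idea of moving an instance toward the nearest hypersphere of another class can be illustrated with a small sketch. This is not the authors' ONB-MACF implementation: it assumes the class-pure hyperspheres have already been built (here they are hand-written, hypothetical values), measures nearness by distance to the sphere surface, and moves along a straight line in fixed steps. The function names and the `step` parameter are illustrative assumptions.

```python
import math

def nearest_other_class_ball(x, balls, current_label):
    """Pick the ball of a different class whose surface is closest to x."""
    candidates = [b for b in balls if b["label"] != current_label]
    return min(candidates, key=lambda b: math.dist(x, b["center"]) - b["radius"])

def counterfactual(x, balls, current_label, step=0.05):
    """Move x in small straight-line steps until it lies inside the chosen ball."""
    ball = nearest_other_class_ball(x, balls, current_label)
    point = list(x)
    while math.dist(point, ball["center"]) > ball["radius"]:
        gap = math.dist(point, ball["center"])
        move = min(step, gap)  # never overshoot the centre
        point = [p + move * (c - p) / gap for p, c in zip(point, ball["center"])]
    return point, ball["label"]

# Hypothetical, hand-written balls in a 2-D feature space (illustration only).
balls = [
    {"center": (0.0, 0.0), "radius": 1.0, "label": "deny"},
    {"center": (3.0, 0.0), "radius": 1.0, "label": "approve"},
    {"center": (0.0, 5.0), "radius": 1.0, "label": "approve"},
]
cf_point, cf_label = counterfactual((0.5, 0.0), balls, "deny")
```

In this toy setup only the first feature changes, which loosely mirrors the sparsity goal stated in the abstract; a real implementation would also have to respect feature types and feasibility constraints.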

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# 医用画像分割のためのアテンションベースフィルタを用いた多次元変換器

Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation ( http://arxiv.org/abs/2405.12328v1 )

ライセンス: Link先を確認

Wentao Wang, Xi Xiao, Mingjie Liu, Qing Tian, Xuanyao Huang, Qizhen Lan, Swalpa Kumar Roy, Tianyang Wang,

(参考訳) 医療画像の正確なセグメンテーションは、疾患の診断と治療に不可欠である。近年の研究では、視覚トランスフォーマーに基づく手法は、特徴間のグローバルな関係を確立する能力と様々な入力への適応性により、医用画像セグメンテーションの性能が著しく向上したことが示されている。しかし,これらの手法は,医用画像固有の低信号-雑音比に苦慮している。また, 医用画像のセグメンテーションに不可欠なチャネル情報と空間情報の有効利用は, 自己注意の表現能力によって制限される。これらの課題に対処するために,医療画像セグメンテーションのためのパッチ埋め込みと自己保持機構を再設計する,アテンションベースフィルタリング(MDT-AF)を備えたマルチ次元トランスフォーマーを提案する。 MDT-AFは、注意に基づく特徴フィルタリング機構をパッチ埋め込みブロックに組み込んでおり、低信号対雑音比の影響を軽減するために粗粒度プロセスを採用している。医用画像の複雑な構造をよりよく捉えるために、MDT-AFは自己認識機構を拡張し、空間次元とチャネル次元を取り入れ、特徴表現を豊かにする。さらに,空間次元とチャネル次元の特徴集約を改善するための相互作用機構を導入する。 3つの公開医用画像セグメンテーションベンチマークによる実験結果から, MDT-AFがSOTA(State-of-the-art)の性能を達成することが示された。

The accurate segmentation of medical images is crucial for diagnosing and treating diseases. Recent studies demonstrate that vision transformer-based methods have significantly improved performance in medical image segmentation, primarily due to their superior ability to establish global relationships among features and adaptability to various inputs. However, these methods struggle with the low signal-to-noise ratio inherent to medical images. Additionally, the effective utilization of channel and spatial information, which are essential for medical image segmentation, is limited by the representation capacity of self-attention. To address these challenges, we propose a multi-dimension transformer with attention-based filtering (MDT-AF), which redesigns the patch embedding and self-attention mechanism for medical image segmentation. MDT-AF incorporates an attention-based feature filtering mechanism into the patch embedding blocks and employs a coarse-to-fine process to mitigate the impact of low signal-to-noise ratio. To better capture complex structures in medical images, MDT-AF extends the self-attention mechanism to incorporate spatial and channel dimensions, enriching feature representation. Moreover, we introduce an interaction mechanism to improve the feature aggregation between spatial and channel dimensions. Experimental results on three public medical image segmentation benchmarks show that MDT-AF achieves state-of-the-art (SOTA) performance.

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# オープンソースプロジェクトにおけるソフトウェア欠陥検出のための静的解析ツールの有効性

Efficacy of static analysis tools for software defect detection on open-source projects ( http://arxiv.org/abs/2405.12333v1 )

ライセンス: Link先を確認

Jones Yeboah, Saheed Popoola,

(参考訳) ソフトウェアプラクティスでは、静的解析ツールはソフトウェアの欠陥検出の不可欠な部分であり、Java、C++、Pythonといったさまざまなプログラミング言語で解析を実行するように設計されている。本稿では,Java,C++,Pythonのコードを用いたいくつかのデータセットを用いて,ソフトウェア欠陥を識別するための一般的な静的解析ツールを実証的に比較する。この研究は、データセットを使用して比較を行うために、SonarQube、PMD、Checkstyle、FindBugsといった一般的な分析ツールを使用した。この研究では、精度、リコール、F1スコアなどのさまざまな評価指標を使用して、分析ツールのパフォーマンスを測定した。この結果から,SonarQubeは3つのプログラミング言語にまたがる欠陥検出において,他のツールよりもかなり優れていることがわかった。これらの結果は、SonarQubeがソフトウェアの欠陥検出に有効なツールであることに同意する他の既存の研究と一致している。この研究は、異なるプログラミング言語を用いた静的解析ツールに関する多くの洞察と、各解析ツールの長所と短所を理解するための追加情報に貢献している。この研究は、ソフトウェア開発研究者や実践者への影響や、この分野の今後の方向性についても論じている。我々の研究アプローチは、ソフトウェア開発者、実践者、研究者が静的解析ツールでソフトウェアコードのエラーを検出する正しい選択をできるようにするためのレコメンデーションガイドラインを提供することを目的としています。また、研究者はソフトウェア分析ツールの調査と改善に取り組み、ソフトウェアシステムの品質と信頼性とソフトウェア開発プロセスの実践を強化する。

In software practice, static analysis tools remain an integral part of detecting defects in software, and various tools have been designed to analyze code in different programming languages such as Java, C++, and Python. This paper presents an empirical comparison of popular static analysis tools for identifying software defects, using several datasets of Java, C++, and Python code. The study compared popular analysis tools such as SonarQube, PMD, Checkstyle, and FindBugs on these datasets. The study also used evaluation metrics such as Precision, Recall, and F1-score to determine the performance of each analysis tool. The results show that SonarQube performs considerably better than all other tools in defect detection across the three programming languages. These findings are consistent with existing studies that also found SonarQube to be an effective tool for defect detection in software. The study contributes insight into static analysis tools across different programming languages, along with additional information for understanding the strengths and weaknesses of each analysis tool. The study also discusses the implications for software development researchers and practitioners, and future directions in this area. Our research aims to provide recommendation guidelines that enable software developers, practitioners, and researchers to make the right choice of static analysis tools for detecting errors in their code. It also aims to encourage researchers to investigate and improve software analysis tools in order to enhance the quality and reliability of software systems and of software development practice.
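
The evaluation metrics named in the abstract can be computed as in the following minimal sketch. It assumes defects are identified by (file, line) pairs and that a labelled set of real defects is available; the sample data are hypothetical and are not taken from the study.

```python
def precision_recall_f1(reported, actual):
    """Score a tool's reported defects against a labelled set of real defects."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical (file, line) defect locations, for illustration only.
actual = {("A.java", 10), ("A.java", 42), ("B.py", 7), ("C.cpp", 3)}
reported = {("A.java", 10), ("B.py", 7), ("B.py", 99)}
precision, recall, f1 = precision_recall_f1(reported, actual)
```

Here two of the three reported defects are real and two of the four real defects are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.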

翻訳日:2024-05-22 15:07:24 公開日:2024-05-20

# オープンスタンダードベースメタデータ, 透かし, 暗号を用いた放送媒体の相互運用性保証認証

Interoperable Provenance Authentication of Broadcast Media using Open Standards-based Metadata, Watermarking and Cryptography ( http://arxiv.org/abs/2405.12336v1 )

ライセンス: Link先を確認

John C. Simmons, Joseph M. Winograd,

(参考訳) 誤った情報や誤解を招く情報の拡散は、立法機関や規制機関から大きな注目を集めている。消費者は特定の情報ソースを信頼するので、情報の出所や正確性を決定するためのスケーラブルで相互運用可能な方法が必要である。本稿では、ソーシャルメディアプラットフォームへの放送ニュースコンテンツの投稿、オープンスタンダードの役割、証明の検証における暗号メタデータと透かしの相互利用、そして成功と失敗のシナリオについて分析する。我々は,C2PA (Coalition for Provenance and Authenticity) とATSC (Advanced Television Systems Committee) によって開発された音声・ビデオ透かしのための暗号認証メタデータのオープン標準が,放送の証明に適していると結論付けた。最適な成功のためにこれらの標準を使用する方法を提案する。

The spread of false and misleading information is receiving significant attention from legislative and regulatory bodies. Consumers place trust in specific sources of information, so a scalable, interoperable method for determining the provenance and authenticity of information is needed. In this paper we analyze the posting of broadcast news content to a social media platform, the role of open standards, the interplay of cryptographic metadata and watermarks when validating provenance, and likely success and failure scenarios. We conclude that the open standards for cryptographically authenticated metadata developed by the Coalition for Content Provenance and Authenticity (C2PA) and for audio and video watermarking developed by the Advanced Television Systems Committee (ATSC) are well suited to address broadcast provenance. We suggest methods for using these standards for optimal success.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# Self-HWDebug: ハードウェアセキュリティ検証のためのLLMセルフインストラクションの自動化

Self-HWDebug: Automation of LLM Self-Instructing for Hardware Security Verification ( http://arxiv.org/abs/2405.12347v1 )

ライセンス: Link先を確認

Mohammad Akyash, Hadi Mardani Kamali,

(参考訳) 命令調整型大規模言語モデル(LLM)の台頭は、人工知能(AI)(特定のプロンプトに対応するのに適したもの)の大幅な進歩を示している。その人気にもかかわらず、ハードウェア設計におけるセキュリティ脆弱性、すなわちレジスタ転送言語(RTL)モジュール、特にSystem-on-chip(SoC)レベルでのセキュリティ脆弱性のデバッグにそのようなモデルを適用することは、大きな課題を呈している。主な課題の1つは、脆弱性の特定と緩和のために、正確に設計された指示が必要であることである。この課題に対応するために,LLMを活用して必要なデバッグ手順を自動生成する,革新的なフレームワークであるSelf-HWDebugを提案する。 Self-HWDebugでは、最も重要なハードウェア共通弱点列挙(CWE)リストから既に特定されているバグのセットと緩和解像度がフレームワークに提供され、その後、LSMにそのような緩和のためのターゲット命令を生成するよう促す。 LLMの生成した命令はその後、同じCWEカテゴリ内の脆弱性に対処する参照として使用されるが、全く異なる設計で、関連するセキュリティ問題にまたがるソリューションを拡張するフレームワークの能力を効果的に示している。 Self-HWDebugは、モデル独自の出力を使用してデバッグをガイドすることによって、人間の介入を大幅に削減する。包括的なテストを通じて、Self-HWDebugは専門家の労力/時間を短縮するだけでなく、デバッグプロセスの品質を向上させることも証明している。

The rise of instruction-tuned Large Language Models (LLMs), which are tailored to respond to specific prompts, marks a significant advancement in artificial intelligence (AI). Despite their popularity, applying such models to debug security vulnerabilities in hardware designs, i.e., register transfer language (RTL) modules, particularly at system-on-chip (SoC) level, presents considerable challenges. One of the main issues lies in the need for precisely designed instructions for pinpointing and mitigating the vulnerabilities, which requires substantial time and expertise from human experts. In response to this challenge, this paper proposes Self-HWDebug, an innovative framework that leverages LLMs to automatically create required debugging instructions. In Self-HWDebug, a set of already identified bugs from the most critical hardware common weakness enumeration (CWE) listings, along with mitigation resolutions, is provided to the framework, followed by prompting the LLMs to generate targeted instructions for such mitigation. The LLM-generated instructions are subsequently used as references to address vulnerabilities within the same CWE category but in totally different designs, effectively demonstrating the framework's ability to extend solutions across related security issues. Self-HWDebug significantly reduces human intervention by using the model's own output to guide debugging. Through comprehensive testing, Self-HWDebug proves not only to reduce experts' effort and time but also to improve the quality of the debugging process.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# TinyM$^2$Net-V3:サステナブルエッジ展開のためのメモリ対応圧縮マルチモーダルディープニューラルネットワーク

TinyM$^2$Net-V3: Memory-Aware Compressed Multimodal Deep Neural Networks for Sustainable Edge Deployment ( http://arxiv.org/abs/2405.12353v1 )

ライセンス: Link先を確認

Hasib-Al Rashid, Tinoosh Mohsenin,

(参考訳) 高度な人工知能(AI)アルゴリズムの進歩により、エネルギー使用量や二酸化炭素排出量が著しく増加し、気候変動に対する懸念が高まっている。この増大する問題により、AI技術の環境持続性が最前線に進出した。これらの課題に対応するため、持続可能なAIソリューションの開発には緊急の必要性がある。これらのソリューションは、限られた資源を持つ環境でも多様なデータタイプを扱えるエネルギー効率の高い組込みシステムに焦点を合わせ、技術的進歩と環境責任の両立を保証する必要がある。エッジデバイス用の小さな機械学習モデルに補完的なマルチモーダルデータを統合することは、複雑さ、レイテンシ、消費電力の増加によって困難である。この研究はTinyM$^2$Net-V3を導入し、相補的なデータの異なるモダリティを処理し、ディープニューラルネットワーク(DNN)モデルを設計し、知識蒸留や低ビット幅量子化を含むモデル圧縮技術を用いて、低メモリ階層レベルにモデルを適合させ、レイテンシを低減し、リソース制約のあるデバイスにおけるエネルギー効率を向上する。我々はTinyM$^2$Net-V3を2つのマルチモーダルケーススタディで評価した。小さな推論モデル(6KBと58KB)では、それぞれ92.95%と90.7%の精度を達成した。私たちの小さな機械学習モデルは、リソース制限されたハードウェア上にデプロイされ、ミリ秒以内の低レイテンシと非常に高い電力効率を示しました。

The advancement of sophisticated artificial intelligence (AI) algorithms has led to a notable increase in energy usage and carbon dioxide emissions, intensifying concerns about climate change. This growing problem has brought the environmental sustainability of AI technologies to the forefront, especially as they expand across various sectors. In response to these challenges, there is an urgent need for the development of sustainable AI solutions. These solutions must focus on energy-efficient embedded systems that are capable of handling diverse data types even in environments with limited resources, thereby ensuring both technological progress and environmental responsibility. Integrating complementary multimodal data into tiny machine learning models for edge devices is challenging due to increased complexity, latency, and power consumption. This work introduces TinyM$^2$Net-V3, a system that processes different modalities of complementary data, designs deep neural network (DNN) models, and employs model compression techniques including knowledge distillation and low bit-width quantization with memory-aware considerations to fit models within lower memory hierarchy levels, reducing latency and enhancing energy efficiency on resource-constrained devices. We evaluated TinyM$^2$Net-V3 in two multimodal case studies: COVID-19 detection using cough, speech, and breathing audios, and pose classification from depth and thermal images. With tiny inference models (6 KB and 58 KB), we achieved 92.95% and 90.7% accuracies, respectively. Our tiny machine learning models, deployed on resource limited hardware, demonstrated low latencies within milliseconds and very high power efficiency.
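
Low bit-width quantization, one of the compression techniques mentioned above, can be sketched as follows. This is a generic symmetric uniform quantizer, not the TinyM$^2$Net-V3 pipeline; the function names, the 4-bit setting, and the parameter count are illustrative assumptions.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]

def packed_size_bytes(num_params, bits):
    """Storage needed when each parameter takes `bits` bits (rounded up to whole bytes)."""
    return (num_params * bits + 7) // 8

weights = [0.4, -1.0, 0.2, 0.0]                # hypothetical weights
codes, scale = quantize(weights, bits=4)
restored = dequantize(codes, scale)
size_4bit = packed_size_bytes(10_000, 4)       # hypothetical 10k-parameter model
size_fp32 = packed_size_bytes(10_000, 32)
```

Going from 32-bit floats to 4-bit codes shrinks storage by a factor of eight in this sketch, which is the kind of saving that lets a model fit in a lower level of the memory hierarchy; the rounding error per weight is at most half the quantization step.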

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 強化学習における変分量子回路の最適化手法に関する研究

A Study on Optimization Techniques for Variational Quantum Circuits in Reinforcement Learning ( http://arxiv.org/abs/2405.12354v1 )

ライセンス: Link先を確認

Michael Kölle, Timo Witter, Tobias Rohe, Gerhard Stenzel, Philipp Altmann, Thomas Gabor,

(参考訳) 量子コンピューティングは、機械学習を合理化し、トレーニング可能なパラメータを少なくすることで、より効果的にすることを目指している。このパラメータの削減は、学習プロセスを高速化し、計算資源の使用を削減できる。しかし、量子コンピューティングの現段階では、ノイズのある中間スケール量子時代 (NISQ) として知られており、量子ビットの限られた数と広範な量子ノイズのために学習は困難である。これらの課題を克服するために、研究者は変分量子回路(VQC)に注目している。 VQCは量子回路をマージするハイブリッドアルゴリズムであり、パラメータによって調整できる。これらの回路は、効果的な学習のために数量子ビットしか必要としない。近年の研究では、強化学習にVQCを適用する新しい方法が提示されており、さらなる探索を保証できる有望な結果を示している。本研究では,データ再ロード,インプットスケーリング,アウトプットスケーリングといった様々な手法の効果について検討し,量子近位法最適化アルゴリズムのアクタ-VQCにおいて指数的学習率減衰を導入する。我々は,これらの手法を,人気のある凍結湖とキャットポールの環境において評価する。我々の焦点は、効率を損なうことなく、VQC内のパラメータ数を削減できることにあります。以上の結果から,データ再アップロードと指数的学習率の低下により,ハイパーパラメータ安定性と全体的な性能が著しく向上することが示唆された。入力スケーリングはパラメータ効率を向上しないが、出力スケーリングはグレディネスを効果的に管理し、学習速度と堅牢性を高める。

Quantum Computing aims to streamline machine learning, making it more effective with fewer trainable parameters. This reduction of parameters can speed up the learning process and reduce the use of computational resources. However, in the current phase of quantum computing development, known as the noisy intermediate-scale quantum era (NISQ), learning is difficult due to a limited number of qubits and widespread quantum noise. To overcome these challenges, researchers are focusing on variational quantum circuits (VQCs). VQCs are hybrid algorithms that merge a quantum circuit, which can be adjusted through parameters, with traditional classical optimization techniques. These circuits require only a few qubits for effective learning. Recent studies have presented new ways of applying VQCs to reinforcement learning, showing promising results that warrant further exploration. This study investigates the effects of various techniques -- data re-uploading, input scaling, output scaling -- and introduces exponential learning rate decay in the quantum proximal policy optimization algorithm's actor-VQC. We assess these methods in the popular Frozen Lake and Cart Pole environments. Our focus is on their ability to reduce the number of parameters in the VQC without losing effectiveness. Our findings indicate that data re-uploading and an exponential learning rate decay significantly enhance hyperparameter stability and overall performance. While input scaling does not improve parameter efficiency, output scaling effectively manages greediness, leading to increased learning speed and robustness.
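
Two of the techniques named above, exponential learning rate decay and output scaling, are purely classical and can be illustrated without a quantum simulator. The sketch below assumes the circuit's outputs are expectation values fed through a softmax to form a policy; the abstract does not state how the authors apply the scaling, so that wiring, the function names, and all numeric values are assumptions.

```python
import math

def exponential_decay(initial_lr, decay_rate, step):
    """Learning rate after `step` updates: lr_0 * decay_rate ** step."""
    return initial_lr * decay_rate ** step

def scale_outputs(expectations, output_scale):
    """Multiply circuit outputs by a scaling factor (trainable in practice)."""
    return [output_scale * e for e in expectations]

def softmax(values):
    """Numerically stable softmax turning scores into action probabilities."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

lrs = [exponential_decay(0.01, 0.99, s) for s in (0, 100, 1000)]

outputs = [0.2, -0.1]                      # hypothetical expectation values
mild_policy = softmax(scale_outputs(outputs, 1.0))
greedy_policy = softmax(scale_outputs(outputs, 10.0))
```

A larger scale sharpens the policy toward the highest-scoring action, which is one way to read the abstract's remark that output scaling manages greediness.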

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 宇宙制御のための深層強化学習における選択の影響の検討

Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls ( http://arxiv.org/abs/2405.12355v1 )

ライセンス: Link先を確認

Nathaniel Hamilton, Kyle Dunlap, Kerianne L. Hobbs,

(参考訳) 多くの宇宙用途において、従来の制御法は操作中によく用いられる。しかし、宇宙資産の数は増え続けているため、自律的な運用は異なる宇宙関連タスクに対する制御方法の迅速な開発を可能にする。自律的な制御を開発する方法のひとつに強化学習(Reinforcement Learning, RL)がある。 RLエージェントが有界連続制御値を学ぶことは一般的であるが、伝統的に制御のオン/オフアプローチを好む多くの宇宙タスクにとって現実的あるいは実践的ではないかもしれない。本稿では、エージェントが予め定義されたアクションリストから選択しなければならない個別のアクション空間を用いて分析する。実験では、エージェントに提供された選択肢の数が、トレーニング中および後のパフォーマンスにどのように影響するかを調査した。この分析は、エージェントが物体を周航してその表面上の点を検査しなければならない検査タスクと、エージェントが別の宇宙船と「ドック」に接近し、相対速度が低いドッキングを行うドッキングタスクに対して行われる。両方のタスクの共通の目的は、燃料の使用を最小化することであり、燃料を使用しないアクションを定期的に選択する動機となっている。本結果より, 個別選択が限定された場合, 検査作業の最適性能が得られ, 連続制御はドッキング作業の最適性能が導かれることがわかった。

For many space applications, traditional control methods are often used during operation. However, as the number of space assets continues to grow, autonomous operation can enable rapid development of control methods for different space related tasks. One method of developing autonomous control is Reinforcement Learning (RL), which has become increasingly popular after demonstrating promising performance and success across many complex tasks. While it is common for RL agents to learn bounded continuous control values, this may not be realistic or practical for many space tasks that traditionally prefer an on/off approach for control. This paper analyzes using discrete action spaces, where the agent must choose from a predefined list of actions. The experiments explore how the number of choices provided to the agents affects their measured performance during and after training. This analysis is conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock" with a low relative speed. A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel. Our results show that a limited number of discrete choices leads to optimal performance for the inspection task, while continuous control leads to optimal performance for the docking task.
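
A discrete action space of the kind described above can be built by picking a fixed number of thrust levels from the continuous range. The sketch below is an illustration, not the authors' environment code; the function name, the even spacing, and the requirement that zero thrust always be available are assumptions motivated by the fuel-saving objective in the abstract.

```python
def discretize(max_thrust, num_choices):
    """Evenly spaced thrust levels in [-max_thrust, max_thrust], always including zero."""
    if num_choices < 3 or num_choices % 2 == 0:
        raise ValueError("use an odd number of choices >= 3 so zero thrust is included")
    half = num_choices // 2
    return [max_thrust * k / half for k in range(-half, half + 1)]

on_off = discretize(1.0, 3)    # full reverse, coast, full forward
finer = discretize(1.0, 5)     # adds half-thrust options
```

With three choices per axis the agent has a pure on/off controller, and raising `num_choices` moves it step by step toward continuous control, which is the axis the experiments vary.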

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 多次元一般化ランゲヴィン方程式による粗粒配座力学--どのように、いつ、なぜか

Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why ( http://arxiv.org/abs/2405.12356v1 )

ライセンス: Link先を確認

Pinchen Xie, Yunrui Qiu, Weinan E,

(参考訳) データ駆動型ab initio Generalized Langevin equation (AIGLE) アプローチが開発され、高次元、不均一、粗粒状コンフォメーションダイナミクスを学習し、シミュレートする。揺らぎ散逸定理に制約されたこのアプローチは、全原子分子動力学との動的整合性において粗い粒度のモデルを構築することができる。また,AIGLEが長期的動的整合性を実現するための実践的基準を提案する。 20の粗粒の部位を持つおもちゃのポリマーと2つの二面角を持つアラニンジペプチドのケーススタディは、実際に粗粒のコンフォメーション力学をモデル化するために、なぜAIGLEまたはマルコフ限界を採用するべきかを解明する。

A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics. Constrained by the fluctuation-dissipation theorem, the approach can build coarse-grained models in dynamical consistency with all-atom molecular dynamics. We also propose practical criteria for AIGLE to enforce long-term dynamical consistency. Case studies of a toy polymer, with 20 coarse-grained sites, and the alanine dipeptide, with two dihedral angles, elucidate why one should adopt AIGLE or its Markovian limit for modeling coarse-grained conformational dynamics in practice.
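
For orientation, a generalized Langevin equation and the fluctuation-dissipation constraint are commonly written in the following textbook form. The notation ($M$ for the mass matrix, $U$ for the free-energy surface, $K$ for the memory kernel, $R$ for the noise) is assumed here, and the exact parameterization used by AIGLE may differ.

```latex
% Generalized Langevin equation for coarse-grained coordinates x(t)
M \ddot{x}(t) = -\nabla U\bigl(x(t)\bigr)
                - \int_{0}^{t} K(t-s)\,\dot{x}(s)\,\mathrm{d}s
                + R(t)

% Fluctuation-dissipation theorem linking noise and memory kernel
\bigl\langle R(t)\,R(t')^{\top} \bigr\rangle = k_{B} T \, K\bigl(|t-t'|\bigr)

% Markovian limit: K(t) = 2\gamma\,\delta(t) recovers the ordinary Langevin equation
M \ddot{x}(t) = -\nabla U\bigl(x(t)\bigr) - \gamma\,\dot{x}(t) + R(t),
\qquad
\bigl\langle R(t)\,R(t')^{\top} \bigr\rangle = 2 k_{B} T\,\gamma\,\delta(t-t')
```

The factor of 2 in the Markovian kernel compensates for the delta function sitting at the upper limit of the memory integral, which contributes only half its weight.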

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 高加速度肝4次元MRIのための条件付き生成逆相関ネットワーク

Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI ( http://arxiv.org/abs/2405.12357v1 )

ライセンス: Link先を確認

Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng,

(参考訳) 目的:高時空間分解能の4D MRIが肝放射線治療において望まれる。密集したk空間データを取得するのに時間がかかります。スパースサンプルによる高速な取得が望ましいが、画像の品質低下や長い復元時間の原因となることが多い。本研究では, 再構成品質を維持しつつ, 4次元MRI再構成時間の短縮を図るために, 再構成ペア付き条件生成適応ネットワーク(Re-Con-GAN)を提案する。方法: 自由呼吸肝4D MRIを施行した患者を対象とした。 nuFFTアルゴリズムを用いて3, 6, 10倍 (3x, 6x, 10x) の完全および振り返りアンダーサンプリングデータを初めて再構成した。その後、Re-Con-GANはペアで入力と出力を訓練した。 ResNet9, UNet, Restructation Swin Transformerの3種類のネットワークがジェネレータとして探索された。 PatchGANが差別者に選ばれた。 Re-Con-GANはデータを時間スライス(2D+t)として処理した。時間スライス12332例のうち48例はトレーニング(37例は10721スライス)とテスト(11例は1611スライス)に分けられた。結果: Re-Con-GAN は CS/UNet モデルと比較し,PSNR,SSIM,RMSE のスコアを一貫して達成した。 Re-Con-GAN、UNet、CSの推論時間は0.15s、0.16s、120sである。 GTV検出タスクでは、UNetと比較してRe-Con-GANとCSは、未処理のアンダーサンプル画像(3x 69.61%)のダイススコア(3x Re-Con-GAN 80.98%、3x CS 80.74%、3x UNet 79.88%)を改善した。結論: 家庭内データセットに有望かつ効率的な再構成結果を提示し, 対人訓練を施した生成ネットワークを提案する。 4D肝MRの迅速かつ質的な再構成は、肝癌に対するオンライン適応型MR誘導放射線療法を促進する可能性がある。

Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI reconstruction time while maintaining the reconstruction quality. Methods: Patients who underwent free-breathing liver 4D MRI were included in the study. Fully sampled data and data retrospectively under-sampled by factors of 3, 6, and 10 (3x, 6x, and 10x) were first reconstructed using the nuFFT algorithm. Re-Con-GAN was then trained on paired inputs and outputs. Three types of networks, ResNet9, UNet, and reconstruction swin transformer, were explored as generators. PatchGAN was selected as the discriminator. Re-Con-GAN processed the data (3D+t) as temporal slices (2D+t). A total of 48 patients with 12332 temporal slices were split into training (37 patients with 10721 slices) and test (11 patients with 1611 slices) sets. Results: Re-Con-GAN consistently achieved comparable or better PSNR, SSIM, and RMSE scores compared to CS/UNet models. The inference times of Re-Con-GAN, UNet, and CS are 0.15s, 0.16s, and 120s, respectively. The GTV detection task showed that Re-Con-GAN and CS, compared to UNet, better improved the dice score (3x Re-Con-GAN 80.98%; 3x CS 80.74%; 3x UNet 79.88%) of unprocessed under-sampled images (3x 69.61%). Conclusion: A generative network with adversarial training is proposed, with promising and efficient reconstruction results demonstrated on an in-house dataset. Rapid, high-quality reconstruction of 4D liver MR has the potential to facilitate online adaptive MR-guided radiotherapy for liver cancer.
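
The dice score reported for the GTV detection task measures the overlap between a predicted mask and a reference mask. The sketch below shows the standard definition on flat binary masks; it is not the authors' evaluation code, and the sample masks are hypothetical.

```python
def dice_score(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred_mask = [1, 1, 0, 0, 1]      # hypothetical predicted tumour pixels
true_mask = [1, 0, 0, 1, 1]      # hypothetical reference pixels
score = dice_score(pred_mask, true_mask)
```

Both masks mark three pixels and agree on two of them, so the score is 2 * 2 / 6 = 2/3; a value such as 80.98% in the abstract is the same quantity expressed as a percentage.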

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# エンタープライズRAGのための原子単位を用いた質問ベース検索

Question-Based Retrieval using Atomic Units for Enterprise RAG ( http://arxiv.org/abs/2405.12363v1 )

ライセンス: Link先を確認

Vatsal Raina, Mark Gales,

(参考訳) エンタープライズ検索拡張生成(RAG)は、強力な大規模言語モデル(LLM)と内部的、あるいは時間的に変化する文書を組み合わせるための、非常に柔軟なフレームワークを提供する。 RAGでは、文書はまずチャンクされる。関連チャンクは、特定のユーザクエリに対して検索され、コンテクストとしてシンセサイザーLLMに渡されてクエリ応答を生成する。しかし、誤ったチャンクがシンセサイザーLLMを誘導して誤応答を発生させるため、検索ステップは性能を制限できる。本研究は、より正確なチャンクリのための標準密度検索ステップのゼロショット適応を提案する。具体的には、チャンクをまず原子ステートメントに分解する。合成質問の集合がこれらの原子上で生成される(コンテキストとしてチャンクが用いられる)。センス検索は、ユーザクエリに最も近い合成質問と関連するチャンクを見つけることを伴う。その結果,原子による検索はチャンクによる検索よりも高いリコールにつながることがわかった。原子上に生成した合成質問を用いた検索により、さらなる性能向上が観察された。検索ステップでのリコールの高速化により、RAGパイプラインを使用したエンタープライズLLMのパフォーマンスの向上が可能となる。

Enterprise retrieval augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a specific user query and passed as context to a synthesizer LLM to generate the query response. However, the retrieval step can limit performance, as incorrect chunks can lead the synthesizer LLM to generate a false response. This work proposes a zero-shot adaptation of standard dense retrieval steps for more accurate chunk recall. Specifically, a chunk is first decomposed into atomic statements. A set of synthetic questions is then generated on these atoms (with the chunk as the context). Dense retrieval involves finding the closest set of synthetic questions, and associated chunks, to the user query. It is found that retrieval with the atoms leads to higher recall than retrieval with chunks. Further performance gain is observed with retrieval using the synthetic questions generated over the atoms. Higher recall at the retrieval step enables higher performance of the enterprise LLM using the RAG pipeline.
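The question-based retrieval step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real dense encoder, the synthetic questions are hand-written rather than LLM-generated, and all names (`build_index`, `retrieve`, `chunk_hr`, `chunk_it`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense encoder)."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunk_questions):
    """Map {chunk_id: [question, ...]} to a flat list of (vector, chunk_id)."""
    return [(embed(q), cid) for cid, qs in chunk_questions.items() for q in qs]


def retrieve(query, index, k=1):
    """Rank chunks by the score of their best-matching synthetic question."""
    query_vec = embed(query)
    best = {}
    for vec, cid in index:
        best[cid] = max(best.get(cid, 0.0), cosine(query_vec, vec))
    return sorted(best, key=best.get, reverse=True)[:k]


# Hypothetical questions, one list per chunk, generated over the chunk's atoms.
chunk_questions = {
    "chunk_hr": ["How many vacation days do employees get?",
                 "Who approves leave requests?"],
    "chunk_it": ["How do I reset my password?",
                 "Which VPN client is supported?"],
}
index = build_index(chunk_questions)
top_chunks = retrieve("how can I reset a password", index, k=1)
print(top_chunks)
```

The key design point is that the query is matched against questions rather than against chunk text, and each question carries a pointer back to its source chunk, which is what gets passed to the synthesizer LLM.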

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 負の質量特性を模倣する

Mimicking Negative Mass Properties ( http://arxiv.org/abs/2405.12366v1 )

ライセンス: Link先を確認

S. D. Campos,

(参考訳) 本研究では, 負の質量に起因する特性が正の質量粒子によって模倣されるような物理的条件を得るための2つの系を解析する。 1つ目は、ディラック方程式で表されるよく知られた1/2スピン系で、外部電磁場が存在する。いくつかの物理的制限を仮定すると、$e\rightarrow-e$ は $m\rightarrow-m$ と同じ結果をもたらす。特に、零誘電関数に対しては、負の荷電粒子からなる正の質量系から負の質量挙動を得ることができる。第2のシステムは、ド・ブロイ物質波に基づいている。そのような波の分散関係は、虚波数を仮定すると負(実数または虚数)となる。その結果、正の質量粒子に対する負の屈折率が出現した。しかし、この挙動は一般に負の質量系に起因する。

In the present work, one analyzes two systems trying to obtain physical conditions where some properties attributed to negative mass can be mimicked by positive mass particles. The first one is the well-known 1/2-spin system described by the Dirac equation in the presence of an external electromagnetic field. Assuming some physical restrictions, one obtains that the use of $e\rightarrow-e$ can lead to the same results as using $m\rightarrow-m$. In particular, for a null dielectric function, it is possible to obtain a negative mass behavior from a positive mass system composed of negatively charged particles. The second system is based on the de Broglie matter wave. The dispersion relation of such a wave can be negative (real or imaginary valued) if one assumes an imaginary wavenumber. The consequence is the emergence of a negative refractive index for positive mass particles. However, this behavior is generally attributed to a negative mass system.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 深層学習による膵の大規模マルチセンターCTとMRI分割

Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning ( http://arxiv.org/abs/2405.12367v1 )

ライセンス: Link先を確認

Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, Emil Agarunov, Nassier Harfouch, Chenchan Huang, Marco J. Bruno, Ivo Schoots, Rajesh N. Keswani, Frank H. Miller, Tamas Gonda, Cemal Yazici, Temel Tirkes, Baris Turkbey, Michael B. Wallace, Ulas Bagci,

(参考訳) 膵疾患の診断と経過観察には,横断的画像診断における膵の容積分画の自動化が必要である。 CTベースの膵セグメンテーションはより確立されているが、MRIベースのセグメンテーション手法は、公開データセットの欠如、ベンチマーク研究の努力、ドメイン固有のディープラーニング手法が主な原因である。 2004年3月から2022年11月にかけて,T1強調画像(T1W)とT2強調画像(T2W)の大規模なデータセット(499名)を収集した。また,ベンチマーク目的で公開資料から1,350人の患者のCTも収集した。そこで我々は,nnUNetとTransformerネットワークの長所と,体積計算が可能な新しい線形アテンションモジュールを組み合わせた,パンセグネットと呼ばれる新しい膵分画法を開発した。我々は,Dice と Hausdorff 距離 (HD95) 評価指標を用いて,PanSegNet のクロスモダリティ (合計2,117スキャン) とクロスセンター設定の精度を検証した。我々は,CohenのKappa統計を,それぞれ量比較とDice比較のペアt検定に用いた。 T1W MRIでは85.0% (std: 7.9%) , T2W MRIでは86.3% (std: 6.4%) であった。 R^2は0.91,0.84,0.85はCT,T1W,T2Wと高い相関を示した。 0.624,0.638,T1W,T2WMRIにて中等度なサーバ間一致率を示し,高いサーバ内一致率を示した。すべてのMRIデータはhttps://osf.io/kysnj/で公開されている。ソースコードはhttps://github.com/NUBagciLab/PaNSegNetで公開されています。

Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
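For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of how it is computed for two binary masks. It is illustrative only and is not taken from the PanSegNet code; the function name, the flat-list mask format, and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equally sized binary masks.

    Masks are flat sequences of 0/1 voxel labels. Returning 1.0 when both
    masks are empty is a convention assumed here, not a universal rule.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(predicted_mask, reference_mask))
```

In this toy case the masks share 2 foreground voxels out of 3 each, so the score is 2·2 / (3 + 3) = 2/3.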

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# センサトリガー(TDOST)のテキスト記述によるスマートホームにおけるレイアウト非依存の人間活動認識

Layout Agnostic Human Activity Recognition in Smart Homes through Textual Descriptions Of Sensor Triggers (TDOST) ( http://arxiv.org/abs/2405.12368v1 )

ライセンス: Link先を確認

Megha Thukral, Sourish Gunesh Dhekane, Shruthi K. Hiremath, Harish Haresamudram, Thomas Ploetz,

(参考訳) スマートホームにおける環境センサを用いたヒューマンアクティビティ認識(HAR)は、人間の健康と健康に多くの応用がある。しかし、新しいスマートホーム環境にデプロイ可能な汎用HARモデルを構築するには、大量のアノテートされたセンサデータとトレーニングのオーバーヘッドが必要である。ほとんどのスマートホームはレイアウト、すなわちフロアプランやセンサーの具体的特徴に大きく違いがあり、特定の住宅向けに訓練されたHARモデルの一般化性は低い。本稿では,センサデータの自然言語記述の伝達可能な表現能力を利用したスマートホームにおけるHARシステムのレイアウトに依存しない新しいモデリング手法を導入することで,この制限に対処する。この目的のために,センサトリガーのテクスチュアル記述(TDOST)を生成し,周囲のトリガー条件をカプセル化し,アクティビティ認識モデルに基盤となるアクティビティの手がかりを提供する。テキストの埋め込みを生のセンサデータではなく活用することで、対象の家庭に適応したり(再学習)することなく、家庭全体の標準的な活動を予測できる活動認識システムを構築します。本研究では,TDOSTをベースとしたスマートホームにおけるモデルの有効性を,ベンチマークしたCASASデータセットを用いた実験により実証した。さらに,本手法の個々の成分が下流活動認識性能に与える影響を詳細に分析する。

Human activity recognition (HAR) using ambient sensors in smart homes has numerous applications for human healthcare and wellness. However, building general-purpose HAR models that can be deployed to new smart home environments requires a significant amount of annotated sensor data and training overhead. Most smart homes vary significantly in their layouts, i.e., floor plans and the specifics of sensors embedded, resulting in low generalizability of HAR models trained for specific homes. We address this limitation by introducing a novel, layout-agnostic modeling approach for HAR systems in smart homes that utilizes the transferrable representational capacity of natural language descriptions of raw sensor data. To this end, we generate Textual Descriptions Of Sensor Triggers (TDOST) that encapsulate the surrounding trigger conditions and provide cues for underlying activities to the activity recognition models. Leveraging textual embeddings, rather than raw sensor data, we create activity recognition systems that predict standard activities across homes without either (re-)training or adaptation on target homes. Through an extensive evaluation, we demonstrate the effectiveness of TDOST-based models in unseen smart homes through experiments on benchmarked CASAS datasets. Furthermore, we conduct a detailed analysis of how the individual components of our approach affect downstream activity recognition performance.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# AtomGS: 高密度放射場のためのガウス散乱の微粒化

AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field ( http://arxiv.org/abs/2405.12369v1 )

ライセンス: Link先を確認

Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng,

(参考訳) 3D Gaussian Splatting (3DGS) は、新しいビュー合成とリアルタイムレンダリング速度の優れた機能を提供することにより、近年、放射界再構成が進んでいる。しかし、最適化と適応密度制御をブレンドするというその戦略は、時として、より小さなものを適切に密度付けするコストで、大きなガウスを最適化することを優先するため、ノイズの多い幾何学やぼやけたアーチファクトを生じることがある。この問題に対処するために、Atomized ProliferationとGeometry-Guided OptimizationからなるAtomGSを紹介します。 Atomized Proliferationは様々な大きさの楕円体ガウスをより均一な大きさの原子ガウスに制限する。この戦略は, シーンの細部に応じて, デンシフィケーションに重きを置くことで, 優れた特徴を持つ領域の表現を促進させる。さらに,エッジ・アウェア・ノーマル・ロスを組み込んだ幾何誘導最適化手法を提案する。この最適化方法は、複雑な詳細を保存しながら、平面を効果的に滑らかにする。評価の結果、AtomGSはレンダリング品質において既存の最先端手法よりも優れています。さらに、幾何再構成における競合精度を実現し、他のSDF法よりもトレーニング速度が大幅に向上する。よりインタラクティブなデモは、私たちのWebサイトにある(\href{https://rongliu-leo.github.io/AtomGS/}{https://rongliu-leo.github.io/AtomGS/})。

3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we propose a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found on our website (https://rongliu-leo.github.io/AtomGS/).

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# DispaRisk: データセットにおける格差リスクの評価と解釈

DispaRisk: Assessing and Interpreting Disparity Risks in Datasets ( http://arxiv.org/abs/2405.12372v1 )

ライセンス: Link先を確認

Jonathan Vasquez, Carlotta Domeniconi, Huzefa Rangwala,

(参考訳) 機械学習アルゴリズム(ML)は、人間の生活のあらゆる側面に影響を与え、医療、金融、教育など、さまざまな分野にまたがって利用されてきた。しばしば、MLアルゴリズムはデータセットで示される社会的バイアスを悪化させ、多くの場合、個人のサブセットやグループに敵対的な影響をもたらす。これらの不適切な効果を効果的に軽減するためには、MLパイプラインの早期に相違/相の同定と評価が不可欠である。このプロアクティブなアプローチは、バイアスの増幅を防ぎ、モデル開発の後期段階で複雑さを減らすために、タイムリーな介入を促進する。本稿では,MLパイプラインの初期段階におけるデータセットの不均一性の潜在的なリスクを積極的に評価するために設計された,新しいフレームワークであるDispaRiskを紹介する。フェアネス研究でよく使われるデータセットとベンチマークすることで、DispaRiskの有効性を評価する。以上の結果から,差別リスクの高いデータセットを識別するDispaRiskの能力,バイアスを伴いやすいモデルファミリー,MLパイプラインにおける識別感受性を高める特徴が示された。実験用のコードは以下のリポジトリで利用可能です。

Machine learning (ML) algorithms impact virtually every aspect of human lives and have found use across diverse sectors, including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases present in datasets, leading to adverse impacts on subsets/groups of individuals, in many cases minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified and assessed early in an ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it with commonly used datasets in fairness research. Our findings demonstrate the capabilities of DispaRisk to identify datasets with a high risk of discrimination, model families prone to biases, and characteristics that heighten discrimination susceptibility in an ML pipeline. The code for our experiments is available in the following repository: https://github.com/jovasque156/disparisk

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 時空間アテンションに基づく隠れた物理インフォームドニューラルネットワークによる生活予測

Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction ( http://arxiv.org/abs/2405.12377v1 )

ライセンス: Link先を確認

Feilong Jiang, Xiaonan Hou, Min Xia,

(参考訳) RUL(Representing Useful Life)の予測は、産業システムにおける予後健康管理(PHM)に不可欠である。ディープラーニングアプローチはRUL予測においてかなりの成功を収めているが、予測精度の低下や解釈可能性の低下といった課題が大きな課題となり、実践的な実装を妨げている。本研究では,RUL予測のための時空間アテンションに基づく隠れ物理インフォームドニューラルネットワーク(STA-HPINN)を提案する。時空間的注意機構は、入力データから重要な特徴を抽出することができる。センサ次元と時間ステップ次元の両方における自己認識機構により、提案モデルは劣化情報を効果的に抽出することができる。隠れた物理インフォームドニューラルネットワークを用いて、RULの進化を管理する物理機構を捉える。物理の制約により、モデルはより高い精度と妥当な予測を達成できる。このアプローチはベンチマークデータセットで検証され、特に複雑な条件の場合、最先端の手法と比較して、例外的なパフォーマンスを示す。

Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, issues such as low prediction accuracy and low interpretability remain significant challenges, hindering their practical implementation. In this work, we introduce a Spatio-temporal Attention-based Hidden Physics-informed Neural Network (STA-HPINN) for RUL prediction, which can utilize the associated physics of the system degradation. The spatio-temporal attention mechanism can extract important features from the input data. With the self-attention mechanism on both the sensor dimension and time step dimension, the proposed model can effectively extract degradation information. The hidden physics-informed neural network is utilized to capture the physics mechanisms that govern the evolution of RUL. With the constraint of physics, the model can achieve higher accuracy and reasonable predictions. The approach is validated on a benchmark dataset, demonstrating exceptional performance when compared to cutting-edge methods, especially in the case of complex conditions.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 量子核法における計算資源としての位相空間負性

Phase-space negativity as a computational resource for quantum kernel methods ( http://arxiv.org/abs/2405.12378v1 )

ライセンス: Link先を確認

Ulysse Chabaud, Roohollah Ghobadi, Salman Beigi, Saleh Rahimi-Keshari,

(参考訳) 量子カーネル法は、機械学習において量子計算の優位性を達成するための提案である。量子カーネルと呼ばれる関数は量子デバイスによって推定され、残りの計算は古典的に実行される。量子カーネル関数が古典的コンピュータ上で効率的に推定できない場合に限り、この方法で量子上の優位性を達成することができる。本稿では,ボゾン系における量子カーネル関数の最適古典的推定に十分な条件を提供する。具体的には、量子カーネルに関連付けられた量子状態の位相空間準確率分布における負性度が、量子回路の大きさと多項式的にほぼ一致している場合、カーネル関数を古典的に効率的に推定できることを示す。我々は、適応的な非ガウス測度を持つ線形光学ネットワークを含む量子光学的例を考察し、古典的シミュレーションの効率に対する損失の影響について検討する。本研究は, カーネル法に基づく量子機械学習において, 位相空間準確率分布における負性の役割を決定づけるものである。

Quantum kernel methods are a proposal for achieving quantum computational advantage in machine learning. They are based on a hybrid classical-quantum computation where a function called the quantum kernel is estimated by a quantum device while the rest of the computation is performed classically. Quantum advantages may be achieved through this method only if the quantum kernel function cannot be estimated efficiently on a classical computer. In this paper, we provide sufficient conditions for the efficient classical estimation of quantum kernel functions for bosonic systems. Specifically, we show that if the negativity in the phase-space quasi-probability distributions of data-encoding quantum states associated with the quantum kernel scales at most polynomially with the size of the quantum circuit, then the kernel function can be estimated efficiently classically. We consider quantum optical examples involving linear-optical networks with and without adaptive non-Gaussian measurements and investigate the effects of loss on the efficiency of the classical simulation. Our results underpin the role of the negativity in phase-space quasi-probability distributions as an essential resource in quantum machine learning based on kernel methods.

翻訳日:2024-05-22 14:57:39 公開日:2024-05-20

# 測定依存は量子ネットワークにおけるセキュリティを高める

Measurement dependence can enhance security in a quantum network ( http://arxiv.org/abs/2405.12379v1 )

ライセンス: Link先を確認

Amit Kundu, Debasis Sarkar,

(参考訳) ネットワーク非局所性(Network Nonlocality)は、ベルの定理を超えてネットワーク構造を構成する量子非局所性の先進的な研究である。量子ネットワークの発展は、セバラル量子情報処理タスクに多くの技術応用をもたらす可能性がある。ここでは、ネットワークにおけるエンドパーティの測定選択の独立性の役割に注目し、量子ネットワークにおけるセキュリティを強化するために使用することができる。 3つのパートに分かれた2つのソースのバイローカルネットワークと4つのパートの3つのソースのスターネットワークシナリオにおいて、誰かがネットワーク通信に侵入したければ、実際のセキュリティプロトコルを強化するための仮定の緩和を理解するための実践的な方法が示される。理論的には、一方の端点のみの測定選択の独立性を緩和することにより、標準ネットワーク非局所性(SNN)とより強いフルネットワーク非局所性(FNN)を作成でき、古典的無信号局所モデルによって最大量子違反が得られることを証明した。我々は、FNNがSNNよりも強いという意味で、ネットワーク内のすべてのソースが非ローカルリソースを分散する必要がある、とFNNは述べている。

Network Nonlocality is an advanced study of quantum nonlocality that comprises network structure beyond Bell's theorem. The development of quantum networks has the potential to bring many technological applications to several quantum information processing tasks. Here, we focus on the role of the independence of the measurement choices of the end parties in a network and on how it can be used to enhance security in a quantum network. In both the three-party, two-source bilocal network and the four-party, three-source star network scenarios, we show a practical way to understand how relaxing the assumptions can enhance a real security protocol if someone wants to breach network communications. Theoretically, we prove that by relaxing the independence of the measurement choices of only one end party, we can create Standard Network Nonlocality (SNN) and the stronger Full Network Nonlocality (FNN), and that the maximum quantum violation can be obtained by a classical no-signalling local model. We are able to distinguish between the two types of network nonlocality in the sense that FNN is stronger than SNN, i.e., FNN requires all the sources in a network to distribute nonlocal resources.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 確率型貯留層コンピュータ

Stochastic Reservoir Computers ( http://arxiv.org/abs/2405.12382v1 )

ライセンス: Link先を確認

Peter J. Ehlers, Hendra I. Nurdin, Daniel Soh,

(参考訳) 貯留層コンピューティング(Reservoir computing)は、非線形力学システムを用いて、典型的なニューラルネットワークと比較してコスト効率の良い複雑なタスクを実行する機械学習の一種である。近年の貯水池コンピューティングの進歩、特に量子貯水池コンピューティングは、本質的に確率的な貯水池を利用している。しかし、これらのシステムを使用する理論的正当性はまだ十分に確立されていない。本稿では, 確率型貯水池コンピュータの普遍性について検討し, 各貯水池状態の確率を状態自体ではなく可読性として利用した確率型貯水池計算システムについて検討する。確率的貯水池計算では、貯水池のコンピュータ全体の異なる状態の数は、貯水池のハードウェアのサイズと指数関数的にスケールできる可能性があり、コンパクトなデバイスサイズに利点がある。確率的エコー状態ネットワークのクラス、従って全ての確率的貯水池コンピュータのクラスは普遍的な近似クラスであることを示す。また,確率型貯水池コンピュータの分類とカオス時系列予測における実用例について検討した。ショットノイズは確率的貯水池計算の性能の限界要因であるが,ノイズの影響が小さい場合には,類似のハードウェアを持つ決定論的貯水池コンピュータに比べて性能が大幅に向上した。

Reservoir computing is a form of machine learning that utilizes nonlinear dynamical systems to perform complex tasks in a cost-effective manner when compared to typical neural networks. Many recent advancements in reservoir computing, in particular quantum reservoir computing, make use of reservoirs that are inherently stochastic. However, the theoretical justification for using these systems has not yet been well established. In this paper, we investigate the universality of stochastic reservoir computers, in which we use a stochastic system for reservoir computing using the probabilities of each reservoir state as the readout instead of the states themselves. In stochastic reservoir computing, the number of distinct states of the entire reservoir computer can potentially scale exponentially with the size of the reservoir hardware, offering the advantage of compact device size. We prove that classes of stochastic echo state networks, and therefore the class of all stochastic reservoir computers, are universal approximating classes. We also investigate the performance of two practical examples of stochastic reservoir computers in classification and chaotic time series prediction. While shot noise is a limiting factor in the performance of stochastic reservoir computing, we show significantly improved performance compared to a deterministic reservoir computer with similar hardware in cases where the effects of noise are small.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# ディープラーニングによる脆弱性検出

Vulnerability Detection with Deep Learning ( http://arxiv.org/abs/2405.12384v1 )

ライセンス: Link先を確認

Zhen Huang, Amy Aumpansub,

(参考訳) ディープラーニングは、ソフトウェアの脆弱性を検出するための有望なツールであることが示されている。本研究では,C/C++プログラムのソースコードから抽出したプログラムスライスを用いてニューラルネットワークをトレーニングし,ソフトウェア脆弱性を検出する。プログラムスライスでは、API関数呼び出し、配列使用、ポインタ使用、演算式など、脆弱性に関連するプログラム構成の構文とセマンティック特性をキャプチャする。脆弱なコードと非脆弱なコードの両方に対して強力な予測モデルを実現するため、異なるタイプのトレーニングデータ、異なるオプティマイザ、異なるタイプのニューラルネットワークを比較した。この結果から,ソースコードの特徴の相違と,脆弱なプログラムスライスと非脆弱性なプログラムスライスをバランスよく組み合わせることで,脆弱なコードと非脆弱性なコードの両方を予測する上で,バランスの取れた精度が得られることがわかった。さまざまなニューラルネットワークの中で、ADAMオプティマイザを備えたBGRUは、92.49%の精度でソフトウェア脆弱性を検出するのに最善を尽くしている。

Deep learning has been shown to be a promising tool in detecting software vulnerabilities. In this work, we train neural networks with program slices extracted from the source code of C/C++ programs to detect software vulnerabilities. The program slices capture the syntax and semantic characteristics of vulnerability-related program constructs, including API function call, array usage, pointer usage, and arithmetic expression. To achieve a strong prediction model for both vulnerable code and non-vulnerable code, we compare different types of training data, different optimizers, and different types of neural networks. Our result shows that combining different types of characteristics of source code and using a balanced number of vulnerable program slices and non-vulnerable program slices produce a balanced accuracy in predicting both vulnerable code and non-vulnerable code. Among different neural networks, BGRU with the ADAM optimizer performs the best in detecting software vulnerabilities with an accuracy of 92.49%.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# SciJava Ops: FijiとBeyondのための改良されたアルゴリズムフレームワーク

SciJava Ops: An Improved Algorithms Framework for Fiji and Beyond ( http://arxiv.org/abs/2405.12385v1 )

ライセンス: Link先を確認

Gabriel J. Selzer, Curtis T. Rueden, Mark C. Hiner, Edward L. Evans III, David Kolb, Marcel Wiedenmann, Christian Birkhold, Tim-Oliver Buchholz, Stefan Helfrich, Brian Northan, Alison Walter, Johannes Schindelin, Tobias Pietzsch, Stephan Saalfeld, Michael R. Berthold, Kevin W. Eliceiri,

(参考訳) 多くの科学ソフトウェアプラットフォームは、外部開発機能の統合、デプロイ、実行を簡単にするプラグインメカニズムを提供している。画像分野で最も広く使われているプラットフォームのひとつに、科学画像分析のための人気のあるオープンソースアプリケーションであるFijiがある。 FijiにはImageJとImageJ2プラットフォームが組み込まれており、多様な問題を解決するために数千のプラグインが使用する強力なプラグインアーキテクチャを提供する。この機能はFijiの成功の重要な部分であり、生体画像解析ツールとして広く使われ、新機能のターゲットとなっている。しかし、プラグインベースのソフトウェアアーキテクチャは、互換性のないデータ構造で動作する異なるプラットフォームを統合することはできない。その結果、Fijiのようなプラットフォームは高い相互接続性と拡張性を実現していますが、多くのデータタイプ、プログラミング言語、さまざまなソフトウェアプラットフォームのアーキテクチャ上の違いをまたいで統合するように設計されていません。この課題に対処するために、我々はSciJava Opsという、プラグインとしてアルゴリズムを表現するための基礎的なソフトウェアライブラリを紹介します。 FijiのSciJavaプラグインメカニズムの進化を続けているSciJava Opsは、中央実行環境内のさまざまなソフトウェアプラットフォームからのアルゴリズムを活用することができる。さらに、SciJava Opsは、各アルゴリズムの最も適切な構造に自動的にデータを適応し、ユーザーが自由に透過的に非互換なツールのアルゴリズムを組み合わせられるようにする。 SciJava Opsは最初はFijiのアップデートサイトとして配布されるが、フレームワークはFiji、ImageJ、ImageJ2を必要としない。

Many scientific software platforms provide plugin mechanisms that simplify the integration, deployment, and execution of externally developed functionality. One of the most widely used platforms in the imaging space is Fiji, a popular open-source application for scientific image analysis. Fiji incorporates and builds on the ImageJ and ImageJ2 platforms, which provide a powerful plugin architecture used by thousands of plugins to solve a wide variety of problems. This capability is a major part of Fiji's success, and it has become a widely used biological image analysis tool and a target for new functionality. However, a plugin-based software architecture cannot unify disparate platforms operating on incompatible data structures; interoperability necessitates the creation of adaptation or "bridge" layers to translate data and invoke functionality. As a result, while platforms like Fiji enable a high degree of interconnectivity and extensibility, they were not fundamentally designed to integrate across the many data types, programming languages, and architectural differences of various software platforms. To help address this challenge, we present SciJava Ops, a foundational software library for expressing algorithms as plugins in a unified and extensible way. Continuing the evolution of Fiji's SciJava plugin mechanism, SciJava Ops enables users to harness algorithms from various software platforms within a central execution environment. In addition, SciJava Ops automatically adapts data into the most appropriate structure for each algorithm, allowing users to freely and transparently combine algorithms from otherwise incompatible tools. While SciJava Ops is initially distributed as a Fiji update site, the framework does not require Fiji, ImageJ, or ImageJ2, and would be suitable for integration with additional image analysis platforms.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 粒子群最適化と最大近似と負の負二項回帰への応用

Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression ( http://arxiv.org/abs/2405.12386v1 )

ライセンス: Link先を確認

Sisi Shao, Junhyung Park, Weng Kee Wong,

(参考訳) nlminb, optim (R) や nlmixed (SAS) などの汎用最適化ルーチンは、非標準分布のモデルパラメータを推定するために頻繁に使用される。本稿では、統計学で使われている多くのアルゴリズムの代替として、Particle Swarm Optimization (PSO)を提案する。 PSOは上記のルーチンと同じ結果を再現できるだけでなく、より最適な結果や、他のルーチンが収束できない結果も生成できる。後者の場合、問題や問題の原因を特定することもできる。一般化分布のいくつかのパラメータは、(RまたはSASのルーチンを用いて明らかにまたは計算的に表されていない場合)PSOを用いて未同定され、(2)PSOは、現在のルーチンがそうでない場合の対数二項回帰の予測結果を生成することができる、(3)PSOは、それぞれGLMやGENMODなどの標準パッケージによって支持されるLASSOペナルティによる二項回帰のリンク関数に柔軟性を提供する、(4)PSOは、モーメントに依存する従来の統計手法と比較して、EE-IW分布の優れたMLE推定を提供する、という4つの例を用いてPSOを使用する利点を強調した。

General-purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO) as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, but can also produce better results, or produce results when the other routines cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight the advantages of PSO using four examples, where: (1) some parameters in a generalized distribution are shown by PSO to be unidentified when this is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively, and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments.
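To make the idea of using PSO for maximum likelihood estimation concrete, here is a minimal sketch of a textbook global-best PSO applied to a normal negative log-likelihood. It is not the authors' code: the hyperparameters (`inertia=0.7`, `c_cognitive=c_social=1.5`), the toy data, and all function names are assumptions chosen for illustration.

```python
import math
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy data (made up for illustration); parameters are (mu, log sigma).
data = [2.1, 1.9, 2.4, 2.0, 1.6]


def normal_nll(theta):
    """Normal negative log-likelihood, up to an additive constant."""
    mu, log_sigma = theta
    variance = math.exp(2.0 * log_sigma)
    return len(data) * log_sigma + sum((x - mu) ** 2 for x in data) / (2.0 * variance)


(mu_hat, log_sigma_hat), nll = pso_minimize(normal_nll, [(-10.0, 10.0), (-5.0, 5.0)])
sigma_hat = math.exp(log_sigma_hat)
print(mu_hat, sigma_hat)
```

Because PSO only needs objective values, the same routine could in principle be pointed at a penalized or non-smooth likelihood where gradient-based routines struggle; for this smooth toy problem the closed-form MLE (sample mean and the square root of the mean squared deviation) gives a convenient check.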

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 隠れたコンバウンディングによるコンフォルマルカウンターファクト推論

Conformal Counterfactual Inference under Hidden Confounding ( http://arxiv.org/abs/2405.12387v1 )

ライセンス: Link先を確認

Zonghao Chen, Ruocheng Guo, Jean-François Ton, Yang Liu,

(参考訳) パーソナライズされた意思決定は、異なる治療の下での潜在的な結果に関する知識を必要とし、潜在的な結果に対する信頼区間は、この意思決定プロセスをさらに強化し、高い評価シナリオにおける信頼性を向上させる。反事実的世界における潜在的な結果の予測と不確実性は、因果推論における因果的問題を引き起こす。反事実に対する信頼区間を構築する既存の方法は、強い無知の仮定に依存するか、または観察的分布と介入的分布の違いを特徴づける識別不能な下限と上限へのアクセスを必要とする。これらの制限を克服するために,まず,提案手法をトランスダクティブ重み付き共形予測に基づく新しいアプローチwTCP-DRを提案する。制約の少ない仮定では、観察ディストリブトインから介入分布への共変量シフトを考慮に入れるために、少数の介入データ(ランダム化制御試験から)にアクセスする必要がある。理論的な結果は、介入データのみを使用するネーブ法に対して、我々のアルゴリズムが厳密に有利である条件を明確に示している。対策薬の有効間隔を確保した後、個別治療効果(ITE)の間隔を構築することは容易である。提案手法は, 対象範囲と効率の両面で, 最先端のベースラインと比較して, 提案手法の優位性を検証するために, 推薦システムを含む, 合成および実世界のデータにまたがる手法を実証する。

Personalized decision making requires the knowledge of potential outcomes under different treatments, and confidence intervals about the potential outcomes further enrich this decision-making process and improve its reliability in high-stakes scenarios. Predicting potential outcomes along with their uncertainty in a counterfactual world poses the fundamental challenge in causal inference. Existing methods that construct confidence intervals for counterfactuals either rely on the assumption of strong ignorability, or need access to un-identifiable lower and upper bounds that characterize the difference between observational and interventional distributions. To overcome these limitations, we first propose a novel approach, wTCP-DR, based on transductive weighted conformal prediction, which provides confidence intervals for counterfactual outcomes with marginal coverage guarantees, even under hidden confounding. With less restrictive assumptions, our approach requires access to a fraction of interventional data (from randomized controlled trials) to account for the covariate shift from the observational distribution to the interventional distribution. Theoretical results explicitly demonstrate the conditions under which our algorithm is strictly advantageous to the naive method that only uses interventional data. After ensuring valid intervals on counterfactuals, it is straightforward to construct intervals for individual treatment effects (ITEs). We demonstrate our method across synthetic and real-world data, including recommendation systems, to verify the superiority of our methods compared against state-of-the-art baselines in terms of both coverage and efficiency.
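As background for the conformal intervals discussed above, the sketch below shows standard, unweighted split conformal prediction. It is deliberately not wTCP-DR: the weighting for covariate shift and the transductive, doubly robust parts of the paper's method are omitted, and the predictor, calibration data, and function names are made up for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Symmetric interval around predict(x_new) from absolute residuals."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q


def predict(x):
    """Hypothetical fitted model: a fixed linear predictor."""
    return 2.0 * x


# Made-up calibration set with residuals of magnitude 0.1, 0.2, ..., 0.9.
calib_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
calib_y = [predict(x) + e for x, e in zip(calib_x, noise)]

lower, upper = split_conformal_interval(predict, calib_x, calib_y, x_new=10, alpha=0.2)
print(lower, upper)
```

With 9 calibration points and alpha = 0.2, the quantile is the 8th smallest residual (about 0.8), so the interval around the prediction 20 is roughly [19.2, 20.8]. The marginal coverage guarantee of this plain version assumes exchangeable calibration and test data, which is exactly the assumption that weighted variants relax.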

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 1次元マニフォールド学習のためのメートル法に基づく主曲線法

A Metric-based Principal Curve Approach for Learning One-dimensional Manifold ( http://arxiv.org/abs/2405.12390v1 )

ライセンス: Link先を確認

Elvis Han Cui, Sisi Shao,

(参考訳) 主曲線(英: principal curve)は、微分幾何学の概念を用いた多様体学習を指向したよく知られた統計手法である。本稿では,空間データの1次元多様体を学習する新しい計量ベース主曲線(MPC)法を提案する。合成データセット MNISTデータセットを用いた実応用により,本手法は形状の観点から一次元多様体をよく学習できることを示す。

The principal curve is a well-known statistical method oriented toward manifold learning using concepts from differential geometry. In this paper, we propose a novel metric-based principal curve (MPC) method that learns a one-dimensional manifold of spatial data. Synthetic datasets and real applications using the MNIST dataset show that our method can learn the one-dimensional manifold well in terms of the shape.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 欧州XFELクライストロンの異常自動検出

Automated Anomaly Detection on European XFEL Klystrons ( http://arxiv.org/abs/2405.12391v1 )

ライセンス: Link先を確認

Antonin Sulc, Annika Eichler, Tim Wilksen,

(参考訳) 高出力マルチビームクライストロンは、欧州XFELにおける超伝導無線周波数(SRF)キャビティの加速場を生成するためにRFを増幅する重要な構成要素である。これらの高出力コンポーネントの交換には時間と労力を要するため、メンテナンスとダウンタイムを最小限に抑えると同時に、デバイスの動作を最大化する必要がある。機械学習を用いてクライストロンの挙動を探索するために,我々は,さまざまな操作モードを判定し,特徴抽出と次元還元を行い,通常の操作に関する情報を抽出する一連の実験を完了した。記録されたデータを分析するために、私たちは最先端のデータ駆動学習技術を使用し、Klystronの運用状態をよりよく理解し、可能性のある障害や異常の早期発見に役立つ最も有望なコンポーネントを認識しました。

High-power multi-beam klystrons represent a key component to amplify RF to generate the accelerating field of the superconducting radio frequency (SRF) cavities at European XFEL. Exchanging these high-power components takes time and effort, thus it is necessary to minimize maintenance and downtime and at the same time maximize the device's operation. In an attempt to explore the behavior of klystrons using machine learning, we completed a series of experiments on our klystrons to determine various operational modes and conduct feature extraction and dimensionality reduction to extract the most valuable information about a normal operation. To analyze recorded data we used state-of-the-art data-driven learning techniques and recognized the most promising components that might help us better understand klystron operational states and identify early on possible faults or anomalies.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# ASMR: 効率的な推論のための活性化共有マルチレゾリューションコーディネートネットワーク

ASMR: Activation-sharing Multi-resolution Coordinate Networks For Efficient Inference ( http://arxiv.org/abs/2405.12398v1 )

ライセンス: Link先を確認

Jason Chun Lok Li, Steven Tin Sui Luo, Le Xu, Ngai Wong,

(参考訳) コーディネート・ネットワーク (Coordinate Network) または暗黙的ニューラル表現 (INR) は、コンパクトなニューラル表現の利点を備え、自然信号(画像やビデオなど)を符号化する手法として急速に台頭している。 INRの符号化能力を高めるために多くの方法が提案されているが、しばしば見過ごされる側面は推論効率であり、通常は乗算累積(MAC)数で測定される。これは、ハードウェアの制約によって推論スループットが大幅に制限されるユースケースにおいて特に重要である。そこで本研究では,多分解能座標分解と階層変調を組み合わせたASMR(Activation-Sharing Multi-Resolution)座標ネットワークを提案する。具体的には、ASMRモデルはデータのグリッド間でのアクティベーションの共有を可能にする。これにより、その推論コストは、再構成能力と直接的に相関する深さから大きく切り離され、層数に関係なくほぼO(1)推論の複雑さが生じる。実験により、ASMRはバニラSIRENモデルのMACを最大500倍まで低減し、SIRENのベースラインよりもさらに高い再現品質が得られることが示された。

Coordinate network or implicit neural representation (INR) is a fast-emerging method for encoding natural signals (such as images and videos) with the benefits of a compact neural representation. While numerous methods have been proposed to increase the encoding capabilities of an INR, an often overlooked aspect is the inference efficiency, usually measured in multiply-accumulate (MAC) count. This is particularly critical in use cases where inference throughput is greatly limited by hardware constraints. To this end, we propose the Activation-Sharing Multi-Resolution (ASMR) coordinate network that combines multi-resolution coordinate decomposition with hierarchical modulations. Specifically, an ASMR model enables the sharing of activations across grids of the data. This largely decouples its inference cost from its depth which is directly correlated to its reconstruction capability, and renders a near O(1) inference complexity irrespective of the number of layers. Experiments show that ASMR can reduce the MAC of a vanilla SIREN model by up to 500x while achieving an even higher reconstruction quality than its SIREN baseline.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 世界モデリングのための拡散 : アタリにおける視覚的詳細事項

Diffusion for World Modeling: Visual Details Matter in Atari ( http://arxiv.org/abs/2405.12399v1 )

ライセンス: Link先を確認

Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret,

(参考訳) 世界モデルは、安全でサンプル効率のよい強化学習エージェントを訓練するための有望なアプローチである。最近の世界モデルは、主に環境力学をモデル化するために、離散潜在変数のシーケンスを操作する。しかし、このコンパクトな離散表現への圧縮は、強化学習において重要な視覚的詳細を無視する可能性がある。同時に、拡散モデルは画像生成において支配的なアプローチとなり、離散潜在変数をモデル化する確立された手法に挑戦している。このパラダイムシフトを動機として,拡散世界モデルで訓練された強化学習エージェントであるDIAMOND(DIffusion As a Model Of eNvironment Dreams)を紹介する。我々は,世界モデリングに適した拡散を実現する上で必要となる重要な設計選択を解析し,視覚的詳細の改善がエージェントの性能向上にどのように寄与するかを実証する。 DIAMONDは競争力のあるAtari 100kベンチマークで平均1.46の人間正規化スコアを達成している。これは、完全に世界モデル内で訓練されたエージェントとしては新たな最高値である。世界モデリングのための拡散に関する将来の研究を促進するため、私たちはコード、エージェント、プレイ可能な世界モデルを https://github.com/eloialonso/diamond でリリースします。

World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human normalized score of 1.46 on the competitive Atari 100k benchmark; a new best for agents trained entirely within a world model. To foster future research on diffusion for world modeling, we release our code, agents and playable world models at https://github.com/eloialonso/diamond.
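The reported metric, mean human normalized score, is commonly computed per game as (agent - random) / (human - random) and then averaged over games. A minimal sketch of that arithmetic follows; the game names and all numbers are made-up placeholders for illustration, not results from the paper or the benchmark.

```python
def human_normalized_score(agent, random_score, human):
    """Per-game score rescaled so that random play is 0.0 and human play is 1.0."""
    return (agent - random_score) / (human - random_score)

def mean_human_normalized_score(results):
    """Average the per-game normalized scores.

    `results` maps a game name to a tuple (agent, random_score, human).
    """
    scores = [human_normalized_score(*triple) for triple in results.values()]
    return sum(scores) / len(scores)

# Placeholder values only (hypothetical games, hypothetical scores).
example = {
    "GameA": (150.0, 0.0, 100.0),   # normalized score 1.5
    "GameB": (60.0, 10.0, 110.0),   # normalized score 0.5
}
mean_hns = mean_human_normalized_score(example)
```

A mean above 1.0 therefore means the agent exceeds the human reference on average, although a few games with very high normalized scores can dominate the mean.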

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# Gottesman-Kitaev-Preskill状態の基底状態の性質と非線形スクイーズ

Ground state nature and nonlinear squeezing of Gottesman-Kitaev-Preskill states ( http://arxiv.org/abs/2405.12406v1 )

ライセンス: Link先を確認

Petr Marek,

(参考訳) 進行光による普遍量子計算の主なボトルネックは、十分な品質のゴッテスマン・キタエフ・プレスキル(GKP)状態の準備である。これは実験的にも理論的にも非常に難しい課題であり、その一因は、これらの状態の品質を表す、容易に計算可能な単一の測度が現在存在しないことにある。我々はそのような測度であるGKPスクイーズを導入し、状態の特徴付けの現在の方法とどのように関係しているかを示す。この尺度は計算が容易であり、状態の準備や実験結果の検証に容易に利用できる。

The main bottleneck for universal quantum computation with traveling light is the preparation of Gottesman-Kitaev-Preskill states of sufficient quality. This is an extremely challenging task, experimental as well as theoretical, also because there is currently no single easily computable measure of quality for these states. We introduce such a measure, GKP squeezing, and show how it is related to the current ways of characterizing the states. The measure is easy to compute and can be easily employed in state preparation as well as verification of experimental results.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 木深度パラメータによる値制約満足度の局所探索

Local search for valued constraint satisfaction parameterized by treedepth ( http://arxiv.org/abs/2405.12410v1 )

ライセンス: Link先を確認

Artem Kaznatcheev,

(参考訳) 時折、局所探索アルゴリズムは局所的なピークさえ効率的に見つけることができない。その理由を理解するため、VCSP(価値制約満足度問題)からフィットネスランドスケープの上昇構造を考察する。木深度 $d$ の制約グラフを持つ VCSP が与えられたとき、任意の初期割り当てから、常に局所ピークまでの長さ $2^{d + 1} \cdot n$ の上昇が存在することを証明する。これは、対数木深度の制約グラフから得られるフィットネスランドスケープには短い上昇が常に存在し、したがって有界木幅のすべてのVCSPに対しても同様であることを意味する。しかし、これは、局所探索アルゴリズムが、スパースなVCSPにおいてそのような短い上昇を常に見つけ、追跡するという意味ではない。 loglog木深度では超多項式長の上昇が存在し、polylog木深度では、すべての上昇が超多項式長となる初期割り当てが存在する。これらの結果は、スパースVCSPの研究が、効率的な局所探索の障壁を理解するのに役立つことを示唆している。

Sometimes local search algorithms cannot efficiently find even local peaks. To understand why, I look at the structure of ascents in fitness landscapes from valued constraint satisfaction problems (VCSPs). Given a VCSP with a constraint graph of treedepth $d$, I prove that from any initial assignment there always exists an ascent of length $2^{d + 1} \cdot n$ to a local peak. This means that short ascents always exist in fitness landscapes from constraint graphs of logarithmic treedepth, and thus also for all VCSPs of bounded treewidth. But this does not mean that local search algorithms will always find and follow such short ascents in sparse VCSPs. I show that with loglog treedepth, superpolynomial ascents exist; and for polylog treedepth, there are initial assignments from which all ascents are superpolynomial. Together, these results suggest that the study of sparse VCSPs can help us better understand the barriers to efficient local search.
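To make the terms concrete, here is a small sketch of an ascent in the fitness landscape of a binary Boolean VCSP: a sequence of single-variable flips, each strictly increasing fitness, ending at a local peak. The encoding (unary and pairwise weights that count when the variables are 1), the first-improvement rule, and all function names are assumptions chosen for illustration, not the paper's construction. The bound $2^{d+1} \cdot n$ from the abstract is an existence statement, so a particular local search rule is not guaranteed to follow an ascent that short.

```python
def fitness(x, unary, binary):
    """Sum of unary weights for variables set to 1 plus pairwise weights
    for constrained pairs whose variables are both set to 1."""
    total = sum(c for i, c in enumerate(unary) if x[i])
    total += sum(c for (i, j), c in binary.items() if x[i] and x[j])
    return total

def first_improvement_ascent(start, unary, binary):
    """Flip the lowest-index variable that strictly improves fitness,
    repeating until no single flip helps. Returns the visited assignments."""
    x = list(start)
    path = [tuple(x)]
    improved = True
    while improved:
        improved = False
        current = fitness(x, unary, binary)
        for i in range(len(x)):
            x[i] ^= 1                      # tentatively flip variable i
            if fitness(x, unary, binary) > current:
                path.append(tuple(x))
                improved = True
                break
            x[i] ^= 1                      # undo the flip
    return path

def short_ascent_bound(treedepth, n):
    """Length bound 2^(d+1) * n stated in the abstract."""
    return 2 ** (treedepth + 1) * n

# Constraint graph is the path 0 - 1 - 2, which has treedepth 2.
unary = [1, -1, 1]
binary = {(0, 1): 2, (1, 2): -3}
path = first_improvement_ascent([0, 0, 0], unary, binary)
```

Here the ascent takes two flips, well under the bound of 24 for three variables and treedepth 2; the loop always terminates because fitness strictly increases over a finite set of assignments.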

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 離散確率型ニューラルネットワークの校正について

On Measuring Calibration of Discrete Probabilistic Neural Networks ( http://arxiv.org/abs/2405.12412v1 )

ライセンス: Link先を確認

Spencer Young, Porter Jenkins,

(参考訳) 機械学習システムが現実世界のアプリケーションにますます統合されるにつれて、その安全性、堅牢性、信頼性を高める上では、不確実性を正確に表現することが不可欠である。高次元確率分布を最尤推定で適合させるニューラルネットワークの訓練は、不確実性定量化の有効な方法となっている。しかし、そのようなモデルはしばしばキャリブレーションが悪く、自信過剰な予測につながる。 Expected Calibration Error (ECE) や Negative Log Likelihood (NLL) のような従来のメトリクスには、バイアスやパラメトリック仮定などの制限がある。本稿では,これらのバイアスや仮定を伴わずにキャリブレーションの誤差を測定するために,条件付きカーネル平均埋め込みを用いた新しい手法を提案する。合成データに関する予備的な実験は、この方法の可能性を示し、より複雑な応用に向けた今後の研究が計画されている。

As machine learning systems become increasingly integrated into real-world applications, accurately representing uncertainty is crucial for enhancing their safety, robustness, and reliability. Training neural networks to fit high-dimensional probability distributions via maximum likelihood has become an effective method for uncertainty quantification. However, such models often exhibit poor calibration, leading to overconfident predictions. Traditional metrics like Expected Calibration Error (ECE) and Negative Log Likelihood (NLL) have limitations, including biases and parametric assumptions. This paper proposes a new approach using conditional kernel mean embeddings to measure calibration discrepancies without these biases and assumptions. Preliminary experiments on synthetic data demonstrate the method's potential, with future work planned for more complex applications.
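For reference, the Expected Calibration Error mentioned above is usually computed for classifiers by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin. The sketch below shows that standard binned form only; it is not the kernel-based measure the paper proposes, and the function name and equal-width binning are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin share) * |accuracy - mean confidence|.

    `confidences` are predicted probabilities in [0, 1] for the chosen class;
    `correct` holds 1 where the prediction was right and 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

ece = expected_calibration_error([0.95, 0.95, 0.65, 0.65], [1, 1, 1, 0])
```

The result depends on the number and placement of bins, which is one reason a binning-free measure can be attractive.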

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# 低リソース言語ファミリに対するターゲット型多言語適応

Targeted Multilingual Adaptation for Low-resource Language Families ( http://arxiv.org/abs/2405.12413v1 )

ライセンス: Link先を確認

C. M. Downey, Terra Blevins, Dhwani Serai, Dwija Parikh, Shane Steinert-Threlkeld,

(参考訳) マルチリンガルモデルの「大規模マルチリンガル」トレーニングは、どの言語でも実用性を制限することが知られており、低リソース言語では特に不十分である。しかし、低リソース言語は、モデルが密接な関係のある言語で訓練されるターゲットの多言語性から恩恵を受けることができるという証拠がある。このアプローチをより厳密にテストするために,事前学習されたモデルを言語系に適用するためのベストプラクティスを体系的に研究する。テストケースとしてUralicファミリに着目し、XLM-Rを様々な構成でモデル15言語に適応させ、2つの下流タスクと11の評価言語でそれぞれの実験環境の性能を評価する。適応モデルは単言語および多言語ベースラインを大きく上回る。さらに、ハイパーパラメータ効果の回帰分析により、適応語彙のサイズは低リソース言語では比較的重要ではなく、低リソース言語は高リソース言語の性能をほとんど損なうことなくトレーニング中に積極的にアップサンプリングできることが明らかになった。これらの結果から,ターゲット設定で言語適応を行うための新たなベストプラクティスが導入された。

The "massively-multilingual" training of multilingual models is known to limit their utility in any one language, and they perform particularly poorly on low-resource languages. However, there is evidence that low-resource languages can benefit from targeted multilinguality, where the model is trained on closely related languages. To test this approach more rigorously, we systematically study best practices for adapting a pre-trained model to a language family. Focusing on the Uralic family as a test case, we adapt XLM-R under various configurations to model 15 languages; we then evaluate the performance of each experimental setting on two downstream tasks and 11 evaluation languages. Our adapted models significantly outperform mono- and multilingual baselines. Furthermore, a regression analysis of hyperparameter effects reveals that adapted vocabulary size is relatively unimportant for low-resource languages, and that low-resource languages can be aggressively up-sampled during training at little detriment to performance in high-resource languages. These results introduce new best practices for performing language adaptation in a targeted setting.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# GeoMask3D:3Dにおける自己教師付きポイントクラウド学習のための幾何学的インフォームドマスク選択

GeoMask3D: Geometrically Informed Mask Selection for Self-Supervised Point Cloud Learning in 3D ( http://arxiv.org/abs/2405.12419v1 )

ライセンス: Link先を確認

Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers,

(参考訳) 我々は,Masked Auto Encoders (MAE) の効率を高めるために,GeoMask3D (GM3D) と呼ばれる幾何学的に情報を得たマスク選択戦略を用いて,点雲に対する自己教師型学習の先駆的アプローチを導入する。従来のランダムマスキング法とは異なり,本手法では教師学生モデルを用いて,データ内の複雑な領域に焦点をあてる。この戦略は、より厳しいパッチに集中することでより堅牢な特徴表現が得られるという仮説に基づいている。また,特徴量情報から包括的コンテキストを用いた幾何学的複雑性の予測を導くために,完全-部分的特徴量レベルの知識蒸留手法を提案する。大規模実験により,本手法がSOTA(State-Of-The-Art)ベースラインよりも優れていることが確認された。

We introduce a pioneering approach to self-supervised learning for point clouds, employing a geometrically informed mask selection strategy called GeoMask3D (GM3D) to boost the efficiency of Masked Auto Encoders (MAE). Unlike the conventional method of random masking, our technique utilizes a teacher-student model to focus on intricate areas within the data, guiding the model's focus toward regions with higher geometric complexity. This strategy is grounded in the hypothesis that concentrating on harder patches yields a more robust feature representation, as evidenced by the improved performance on downstream tasks. Our method also presents a complete-to-partial feature-level knowledge distillation technique designed to guide the prediction of geometric complexity utilizing a comprehensive context from feature-level information. Extensive experiments confirm our method's superiority over State-Of-The-Art (SOTA) baselines, demonstrating marked improvements in classification, and few-shot tasks.

翻訳日:2024-05-22 14:47:55 公開日:2024-05-20

# GarmentDreamer: 多様な幾何学とテクスチャを具備した3DGSガイド型ガーメント合成

GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details ( http://arxiv.org/abs/2405.12420v1 )

ライセンス: Link先を確認

Boqian Li, Xuan Li, Ying Jiang, Tianyi Xie, Feng Gao, Huamin Wang, Yin Yang, Chenfanfu Jiang,

(参考訳) 従来の3D衣服制作は、スケッチ、モデリング、UVマッピング、テクスチャリングなどを伴う労働集約的な作業であり、時間と費用がかかる。拡散に基づく生成モデルの最近の進歩は、テキストプロンプト、画像、ビデオから3D衣服を生成する新しい可能性を可能にしている。しかし、既存の方法は多視点画像の不整合に悩まされるか、下層の人間モデルから布を分離するために追加のプロセスを必要とする。本稿では,GarmentDreamerを提案する。GarmentDreamerは3Dガウス・スプラッティング(GS)をガイダンスとして利用して,テキストプロンプトからウェアラブルでシミュレーション可能な3D衣服メッシュを生成する新しい手法である。生成モデルによって直接予測されるマルチビュー画像をガイダンスとして使用するのとは対照的に、3DGSのガイダンスは、衣服の変形とテクスチャ合成の両方において一貫した最適化を保証する。本手法では,法線およびRGBA情報によってガイドされる新しい衣服拡張モジュールを導入し,暗黙的なニューラルテクスチャ場(NeTF)とスコア蒸留サンプリング(SDS)を組み合わせて,多様な幾何学的・テクスチャ的詳細を生成する。我々は,GarmentDreamerの最先端技術よりも優れた性能を示す総合的質的,定量的実験により,本手法の有効性を検証した。私たちのプロジェクトページは、https://xuan-li.github.io/GarmentDreamerDemo/ で利用可能です。

Traditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and texturing, which are time-consuming and costly. Recent advances in diffusion-based generative models have enabled new possibilities for 3D garment generation from text prompts, images, and videos. However, existing methods either suffer from inconsistencies among multi-view images or require additional processes to separate cloth from the underlying human model. In this paper, we propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate wearable, simulation-ready 3D garment meshes from text prompts. In contrast to using multi-view images directly predicted by generative models as guidance, our 3DGS guidance ensures consistent optimization in both garment deformation and texture synthesis. Our method introduces a novel garment augmentation module, guided by normal and RGBA information, and employs implicit Neural Texture Fields (NeTF) combined with Score Distillation Sampling (SDS) to generate diverse geometric and texture details. We validate the effectiveness of our approach through comprehensive qualitative and quantitative experiments, showcasing the superior performance of GarmentDreamer over state-of-the-art alternatives. Our project page is available at: https://xuan-li.github.io/GarmentDreamerDemo/.

翻訳日:2024-05-22 14:38:05 公開日:2024-05-20

# オフラインリワード学習のための統一線形プログラミングフレームワーク

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback ( http://arxiv.org/abs/2405.12421v1 )

ライセンス: Link先を確認

Kihyun Kim, Jiawei Zhang, Pablo A. Parrilo, Asuman Ozdaglar,

(参考訳) Inverse Reinforcement Learning (IRL) と Reinforcement Learning from Human Feedback (RLHF) は報酬学習において重要な方法論であり、観察された人間の実演とフィードバックに基づいて、逐次的な意思決定問題の報酬関数を推論・形成する。報酬学習におけるほとんどの以前の作業は、決定や選好モデルに関する事前の知識や仮定に依存しており、堅牢性の問題につながる可能性がある。そこで本研究では,オフライン報酬学習に適した新しい線形プログラミング(LP)フレームワークを提案する。本フレームワークは,オンライン探索を使わずに事前に収集した軌道を用いて,適切に設計したLPの主双対最適性条件から実行可能な報酬集合を推定し,証明可能なサンプル効率を備えた最適性保証を提供する。我々のLPフレームワークはまた、計算的な扱いやすさとサンプル効率を維持しながら、ペアの軌道比較データなど、人間のフィードバックと報酬関数を整合させることができる。解析例と数値実験により,従来の最尤推定法(MLE)と比較して,本フレームワークは性能が向上する可能性が示唆された。

Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments.

翻訳日:2024-05-22 14:38:05 公開日:2024-05-20

# 深層学習とGoogle Earth Engineを組み合わせた都市水抽出手法

An Urban Water Extraction Method Combining Deep Learning and Google Earth Engine ( http://arxiv.org/abs/1912.10726v2 )

ライセンス: Link先を確認

Yudie Wang, Zhiwei Li, Chao Zeng, Gui-Song Xia, Huanfeng Shen,

(参考訳) 都市水は都市生態系にとって重要である。リモートセンシングデータによる都市水の高精度かつ効率的な検出は、都市管理と計画にとって非常に重要である。本稿では,Google Earth Engine (GEE) とマルチスケール畳み込みニューラルネットワーク (MSCNN) を組み合わせてランドサット画像から都市水を抽出する新しい方法を提案し,これをオフライン訓練・オンライン予測(OTOP)と要約する。すなわち,MSCNNの訓練はオフラインで完了し,MSCNNの訓練パラメータを用いて都市水抽出のプロセスがGEE上で実施された。 OTOPは、GEEとCNNのそれぞれの利点をフルに活用し、GEE上でのディープラーニングメソッドの使用をより柔軟にする。データダウンロードや保存を必要とせずに、利用可能な衛星画像の処理が可能であり、都市水抽出の全体的な性能も、修正された正規化差水指数(MNDWI)やランダム森林よりも高い。長春,武漢,昆明,広州では,OTOPによる都市水抽出のカッパ係数,F1スコア,IoU(intersection over union)の平均値がそれぞれ0.924,0.930,0.869に達した。また、中国の他の主要都市で拡張された検証の結果、OTOPは堅牢であり、MSCNNの構造設計と訓練の恩恵を受ける様々な種類の都市水抽出に使用できることが示された。したがって,OTOPは都市化の背景にある大規模・長期の都市水変化検出研究に特に適している。

Urban water is important for the urban ecosystem. Accurate and efficient detection of urban water with remote sensing data is of great significance for urban management and planning. In this paper, we proposed a new method to combine Google Earth Engine (GEE) with multiscale convolutional neural network (MSCNN) to extract urban water from Landsat images, which is summarized as offline training and online prediction (OTOP). That is, the training of MSCNN was completed offline, and the process of urban water extraction was implemented on GEE with the trained parameters of MSCNN. The OTOP can give full play to the respective advantages of GEE and CNN, and make the use of deep learning method on GEE more flexible. It can process available satellite images with high performance without data download and storage, and the overall performance of urban water extraction is also higher than that of the modified normalized difference water index (MNDWI) and random forest. The mean kappa, F1-score and intersection over union (IoU) of urban water extraction with the OTOP in Changchun, Wuhan, Kunming and Guangzhou reached 0.924, 0.930 and 0.869, respectively. The results of the extended validation in the other major cities of China also show that the OTOP is robust and can be used to extract different types of urban water, which benefits from the structural design and training of the MSCNN. Therefore, the OTOP is especially suitable for the study of large-scale and long-term urban water change detection in the background of urbanization.
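The baseline index and the three reported metrics are standard and can be written down directly. The following sketch assumes the usual definition MNDWI = (Green - SWIR) / (Green + SWIR) and computes kappa, F1-score, and IoU from binary water masks; the function names and the flat-list mask representation are illustrative assumptions, and none of this reproduces the paper's MSCNN or its Google Earth Engine pipeline.

```python
def mndwi(green, swir):
    """Modified normalized difference water index for one pixel,
    given green and shortwave-infrared reflectance."""
    return (green - swir) / (green + swir)

def confusion_counts(pred, truth):
    """Counts (tp, fp, fn, tn) for binary masks given as flat sequences of 0/1."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn

def water_metrics(pred, truth):
    """Cohen's kappa, F1-score and IoU of the water class.

    Assumes both classes occur, so no denominator is zero.
    """
    tp, fp, fn, tn = confusion_counts(pred, truth)
    n = tp + fp + fn + tn
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return {"kappa": kappa, "f1": f1, "iou": iou}

metrics = water_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Positive MNDWI values are often read as water, although the appropriate threshold depends on the scene, so a fixed cutoff should be treated as an assumption.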

翻訳日:2024-05-22 03:18:46 公開日:2024-05-20

# 公正なアクティブラーニング:保険におけるラベル付け問題の解決

Fair Active Learning: Solving the Labeling Problem in Insurance ( http://arxiv.org/abs/2112.09466v4 )

ライセンス: Link先を確認

Romuald Elie, Caroline Hillairet, François Hu, Marc Juillard,

(参考訳) 本稿では,保険業界における機械学習モデルの普及に伴う大きな障害に、特に公平性の促進に焦点を当てて対処する。最初の課題は、ラベルのないデータを保険で効果的に活用し、ラベル付けの労力を減らし、アクティブな学習技術によるデータ関連性を強調することである。本稿では, 各種アクティブラーニングサンプリング手法について検討し, 合成および実保険データセットに与える影響を評価する。この分析は、機械学習モデルが基礎となるデータに見られるバイアスや差別を再現する可能性があるため、公正なモデル推論を達成することの難しさを強調している。このような相互に関連する課題に対処するために,本研究では,革新的なフェアアクティブラーニング手法を提案する。提案手法は, 情報量が多く公正な事例をサンプリングし, モデル予測性能と公正性との良好なバランスを達成することを, 保険データセットの数値実験で確認した。

This paper addresses significant obstacles that arise from the widespread use of machine learning models in the insurance industry, with a specific focus on promoting fairness. The initial challenge lies in effectively leveraging unlabeled data in insurance while reducing the labeling effort and emphasizing data relevance through active learning techniques. The paper explores various active learning sampling methodologies and evaluates their impact on both synthetic and real insurance datasets. This analysis highlights the difficulty of achieving fair model inferences, as machine learning models may replicate biases and discrimination found in the underlying data. To tackle these interconnected challenges, the paper introduces an innovative fair active learning method. The proposed approach samples informative and fair instances, achieving a good balance between model predictive performance and fairness, as confirmed by numerical experiments on insurance datasets.

翻訳日:2024-05-22 01:31:04 公開日:2024-05-20

# IT5: イタリア語の理解と生成のためのテキストからテキストへの事前学習

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation ( http://arxiv.org/abs/2203.03759v2 )

ライセンス: Link先を確認

Gabriele Sarti, Malvina Nissim,

(参考訳) イタリア語に特化して事前訓練されたエンコーダ・デコーダ・トランスフォーマーモデルの最初のファミリーであるIT5を紹介する。我々は,大規模なイタリア語コーパスに対する徹底的なクリーニング手順を文書化して実施し,それを用いて4つのIT5モデルサイズを事前訓練する。次に、ItaGenベンチマークを紹介します。これは、イタリア語に対する幅広い自然言語理解および生成タスクを含み、IT5モデルと多言語ベースラインのパフォーマンスを評価するためにそれを使用します。モノリンガルなIT5モデルは、テスト対象のモデル間で最高のスケールとパフォーマンスの比率を提供し、一貫してマルチリンガルなモデルよりも優れたパフォーマンスを提供し、イタリア語生成における新たな最先端を実現しています。

We introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We document and perform a thorough cleaning procedure for a large Italian corpus and use it to pretrain four IT5 model sizes. We then introduce the ItaGen benchmark, which includes a broad range of natural language understanding and generation tasks for Italian, and use it to evaluate the performance of IT5 models and multilingual baselines. We find monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for Italian language generation.

翻訳日:2024-05-22 01:31:04 公開日:2024-05-20

# 逆問題に対するマニフォールド制約を用いた拡散モデルの改善

Improving Diffusion Models for Inverse Problems using Manifold Constraints ( http://arxiv.org/abs/2206.00941v3 )

ライセンス: Link先を確認

Hyungjin Chung, Byeongsu Sim, Dohoon Ryu, Jong Chul Ye,

(参考訳) 近年、拡散モデルは、サンプリングプロセスに適切な修正を加えることで、教師なしの方法で様々な逆問題を解くために使用されている。しかし、逆拡散ステップと、それに続く射影に基づく測定一貫性ステップを再帰的に適用する現在の解法は、しばしば準最適な結果を生成する。生成的サンプリングパスを調べることで、現在の解法がサンプルパスをデータ多様体から外れさせ、したがってエラーが蓄積されることを示す。これを解決するために、多様体の制約に着想を得た追加の補正項を提案する。この補正項は従来の解法と相乗的に併用でき、反復を多様体の近くに保つ。提案された多様体制約は、数行のコードで簡単に実装できるが、驚くほど大きなマージンでパフォーマンスを向上する。広汎な実験により,本手法は理論的にも経験的にも従来の手法よりも優れており,画像インペインティング,カラー化,スパースビューCTなどの多くの応用において有望な結果が得られた。 Code available https://github.com/HJ-harry/MCG_diffusion

Recently, diffusion models have been used to solve various inverse problems in an unsupervised manner with appropriate modifications to the sampling process. However, the current solvers, which recursively apply a reverse diffusion step followed by a projection-based measurement consistency step, often produce suboptimal results. By studying the generative sampling path, here we show that current solvers throw the sample path off the data manifold, and hence the error accumulates. To address this, we propose an additional correction term inspired by the manifold constraint, which can be used synergistically with the previous solvers to make the iterations close to the manifold. The proposed manifold constraint is straightforward to implement within a few lines of code, yet boosts the performance by a surprisingly large margin. With extensive experiments, we show that our method is superior to the previous methods both theoretically and empirically, producing promising results in many applications such as image inpainting, colorization, and sparse-view computed tomography. Code available https://github.com/HJ-harry/MCG_diffusion

翻訳日:2024-05-22 01:31:04 公開日:2024-05-20

# ブロックチェーンベースのセキュアエネルギーマーケットプレーススキームによるピアからピアマイクログリッドへのモチベーション

Blockchain based Secure Energy Marketplace Scheme to Motivate Peer to Peer Microgrids ( http://arxiv.org/abs/2206.07248v3 )

ライセンス: Link先を確認

Muhammad Awais, Qamar Abbas, Shehbaz Tariq, Sayyaf Haider Warraich,

(参考訳) ここ数年、ピーク時のコスト削減のため、マイクログリッドのトレンドは非常に急速に増加している。しかし、これらのシステムでは、サードパーティは依然として余剰エネルギーの販売に関与している。その結果、エネルギーコストが増大し、そのようなシステムには多くの運用およびセキュリティ障壁が存在する。これらの問題は、コンシューマが余剰エネルギーを他のコンシューマにローカルに販売できる、分散化されたマイクログリッドの分散システムによって解決できる。このようなシステムをデプロイするには、エネルギーの取引に対するセキュリティ障壁を考慮する必要がある。そこで,本稿では,ユーザが相互に交流し,より良い利率でエネルギーを売買し,リース時にエネルギー資源を得るマーケットプレースとしてのスキームを考案し,資本投資の心配をしなくて済むようにすることで,これらの問題を解決することを提案する。リソースの所有者とコンシューマの合意は、ブロックチェーンベースのスマートコントラクトに基づいて記録される。本稿では、既存のよく知られた分散型エネルギーソリューションに対する調査を行う。また,保護された実行環境を活用するための余分なセキュリティ層を提案し,システムに侵入しても,コンシューマやサードパーティが生成し,利用し,共有するエネルギー情報を変更できないようにした。

In recent years, the adoption of microgrids has grown rapidly as a way to reduce peak-hour costs. However, in these systems, third parties are still involved in selling surplus energy. This increases the cost of energy, and such systems face many operational and security barriers. These issues can be solved by a decentralized, distributed system of microgrids in which a consumer can locally sell surplus energy to another consumer. To deploy such a system, one must consider security barriers for energy transactions. This paper proposes a solution to these problems by devising a marketplace scheme in which users interact with each other to buy and sell energy at better rates and lease energy-generating resources, so that users do not have to worry about capital investment. The agreement between the owner of the resources and the consumer is recorded in blockchain-based smart contracts. In this paper, a survey of existing well-known decentralized energy solutions is performed. This paper also proposes an extra layer of security that leverages a shielded execution environment, so that information about the energy generated, utilized, and shared cannot be changed by consumers or third parties even if the system is compromised.

翻訳日:2024-05-22 01:31:04 公開日:2024-05-20

# レーンマーキングを用いた地理参照のためのロバストなセルフチューニングデータアソシエーション

Robust Self-Tuning Data Association for Geo-Referencing Using Lane Markings ( http://arxiv.org/abs/2207.14042v2 )

ライセンス: Link先を確認

Miguel Ángel Muñoz-Bañón, Jan-Hendrik Pauls, Haohao Hu, Christoph Stiller, Francisco A. Candelas, Fernando Torres,

(参考訳) 航空画像に基づく地図のローカライゼーションは、グローバルな一貫性、ジオリファレンスマップ、パブリックアクセス可能なデータなど、多くの利点がある。しかし、空中画像と搭載センサーの両方から観測できるランドマークは限られている。これは、データアソシエーションのあいまいさやエイリアスにつながる。本稿では,高情報化表現(効率的なデータアソシエーションを可能にする)に基づいて,これらの曖昧性を解決するための完全なパイプラインを提案する。その中核は、測定のエントロピーに応じて探索領域に適応する堅牢な自己調整データアソシエーションである。さらに、最終的な結果を円滑にするために、データアソシエーションプロセスによって生成された相対変換の関数として、関連データの情報行列を調整する。ドイツ・カールスルーエ市周辺の都市・農村のシナリオを実データとして評価した。我々は、最先端のアウトリア緩和手法と自己調整アプローチを比較し、特に郊外のシナリオにおいて、大幅な改善を示す。

Localization in aerial imagery-based maps offers many advantages, such as global consistency, geo-referenced maps, and the availability of publicly accessible data. However, the landmarks that can be observed from both aerial imagery and on-board sensors are limited. This leads to ambiguities or aliasing during the data association. Building upon a highly informative representation (that allows efficient data association), this paper presents a complete pipeline for resolving these ambiguities. Its core is a robust self-tuning data association that adapts the search area depending on the entropy of the measurements. Additionally, to smooth the final result, we adjust the information matrix for the associated data as a function of the relative transform produced by the data association process. We evaluate our method on real data from urban and rural scenarios around the city of Karlsruhe in Germany. We compare state-of-the-art outlier mitigation methods with our self-tuning approach, demonstrating a considerable improvement, especially for outer-urban scenarios.

翻訳日:2024-05-22 01:31:04 公開日:2024-05-20

# Hyperloop:サイバーセキュリティの展望

Hyperloop: A Cybersecurity Perspective ( http://arxiv.org/abs/2209.03095v3 )

ライセンス: Link先を確認

Alessandro Brighente, Mauro Conti, Denis Donadel, Federico Turrin,

(参考訳) ハイパーループは将来の交通システムの中でも最も有名なものの一つである。持続可能性を確保しつつ、最高速度1220km/hの走行を可能にする新しい技術を含んでいる。システムのパフォーマンス要件とそれが表す重要なインフラストラクチャのため、その安全性とセキュリティを慎重に検討する必要がある。輸送システムでは、サイバー攻撃は人口と周辺環境にとって破滅的な結果をもたらす安全上の問題を引き起こす可能性がある。現在まで、ハイパーループ技術のサイバーセキュリティに関する研究は行われていない。本稿では,Hyperloopエコシステムのさまざまなコンポーネント間の相互接続におけるサイバーセキュリティの課題を初めて分析する。私たちは、現在利用可能なHyperloopの実装に基づいて分析を行い、最終的な設計で見られるであろうこれらの機能を精査しています。さらに,インフラ管理のアプローチとそのセキュリティ問題についても検討する。最後に,ハイパーループ設計の安全性に対する対策と今後の方向性について論じる。

Hyperloop is among the most prominent future transportation systems. It involves novel technologies to allow traveling at a maximum speed of 1220km/h while guaranteeing sustainability. Due to the system's performance requirements and the critical infrastructure it represents, its safety and security must be carefully considered. In transportation systems, cyberattacks could lead to safety issues with catastrophic consequences for the population and the surrounding environment. To this day, no research investigated the cybersecurity issues of the Hyperloop technology. In this paper, we provide the first analysis of the cybersecurity challenges of the interconnections between the different components of the Hyperloop ecosystem. We base our analysis on the currently available Hyperloop implementations, distilling those features that will likely be present in its final design. Moreover, we investigate possible infrastructure management approaches and their security concerns. Finally, we discuss countermeasures and future directions for the security of the Hyperloop design.

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# 一般雑音逆問題に対する拡散後方サンプリング法

Diffusion Posterior Sampling for General Noisy Inverse Problems ( http://arxiv.org/abs/2209.14687v4 )

ライセンス: Link先を確認

Hyungjin Chung, Jeongsol Kim, Michael T. Mccann, Marc L. Klasky, Jong Chul Ye,

(参考訳) 拡散モデルは最近、高品質な再構成と既存の反復解法を組み合わせることの容易さから、強力な生成的逆問題解法として研究されている。しかし、ほとんどの研究は、ノイズのない環境での単純な線形逆問題の解決に重点を置いており、現実世界の問題の複雑さを十分に反映していない。本研究では,拡散解法を拡張し,後方サンプリングの近似を用いて、ノイズのある一般的な(非)線形逆問題を効率的に処理する。興味深いことに、得られた後方サンプリング方式は、厳密な測定整合性射影ステップを伴わずに、多様体拘束勾配と拡散サンプリングをブレンドしたものであり、以前の研究と比べてノイズの多い設定でより望ましい生成経路が得られる。拡散モデルではガウシアンやポアソンのような様々な計測ノイズ統計を組み込むことができ、フーリエ位相回復や非一様なぼけ除去といったノイズのある非線形逆問題も効率的に処理できることを示す。コードは https://github.com/DPS2022/diffusion-posterior-sampling で利用可能。

Diffusion models have been recently studied as powerful generative inverse problem solvers, owing to their high quality reconstructions and the ease of combining existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings compared to the previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics such as Gaussian and Poisson, and also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring. Code available at https://github.com/DPS2022/diffusion-posterior-sampling

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# ティール: WAN交通工学の学習促進最適化

Teal: Learning-Accelerated Optimization of WAN Traffic Engineering ( http://arxiv.org/abs/2210.13763v4 )

ライセンス: Link先を確認

Zhiying Xu, Francis Y. Yan, Rachee Singh, Justin T. Chiu, Alexander M. Rush, Minlan Yu,

(参考訳) グローバルなクラウド広域ネットワーク(WAN)の急速な拡大により、商用最適化エンジンが大規模なネットワークトラフィックエンジニアリング(TE)問題を効率的に解くことは難しくなっている。既存の高速化戦略はTE最適化を並行して解ける部分問題に分解するが、実行時間と割り当て性能の間に本質的なトレードオフがあるため、限られた並列性しか実現できない。本稿では、GPUの並列処理能力を活用してTE制御を高速化する学習ベースのTEアルゴリズムTealを提案する。まず、Tealはフロー中心のグラフニューラルネットワーク(GNN)を設計してWANの接続性とネットワークフローを捉え、下流の割り当てへの入力となるフロー特徴を学習する。第2に、問題の規模を小さくして学習を扱いやすくするため、Tealはマルチエージェント強化学習(RL)アルゴリズムを用い、中央のTE目的関数を最適化しながら各トラフィック需要を独立に割り当てる。最後に、Tealは、高度に並列化可能な最適化アルゴリズムであるADMM(Alternating Direction Method of Multipliers)を用いて割り当てを微調整し、過剰利用リンクなどの制約違反を低減する。MicrosoftのWANのトラフィック行列を用いてTealを評価した。1,700ノードを超える大規模なWANトポロジにおいて、Tealは本番の最適化エンジンよりも数桁高速に動作しながら、ほぼ最適なフロー割り当てを生成する。他のTE高速化方式と比較して、Tealは6～32%多くのトラフィック需要を満たし、197～625倍の高速化を達成する。

The rapid expansion of global cloud wide-area networks (WANs) has posed a challenge for commercial optimization engines to efficiently solve network traffic engineering (TE) problems at scale. Existing acceleration strategies decompose TE optimization into concurrent subproblems but realize limited parallelism due to an inherent tradeoff between run time and allocation performance. We present Teal, a learning-based TE algorithm that leverages the parallel processing power of GPUs to accelerate TE control. First, Teal designs a flow-centric graph neural network (GNN) to capture WAN connectivity and network flows, learning flow features as inputs to downstream allocation. Second, to reduce the problem scale and make learning tractable, Teal employs a multi-agent reinforcement learning (RL) algorithm to independently allocate each traffic demand while optimizing a central TE objective. Finally, Teal fine-tunes allocations with ADMM (Alternating Direction Method of Multipliers), a highly parallelizable optimization algorithm for reducing constraint violations such as overutilized links. We evaluate Teal using traffic matrices from Microsoft's WAN. On a large WAN topology with >1,700 nodes, Teal generates near-optimal flow allocations while running several orders of magnitude faster than the production optimization engine. Compared with other TE acceleration schemes, Teal satisfies 6–32% more traffic demand and yields 197–625x speedups.

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# RulE: ルール埋め込みによる知識グラフの推論

RulE: Knowledge Graph Reasoning with Rule Embedding ( http://arxiv.org/abs/2210.14905v3 )

ライセンス: Link先を確認

Xiaojuan Tang, Song-Chun Zhu, Yitao Liang, Muhan Zhang,

(参考訳) 知識グラフ(KG)推論は知識グラフにおける重要な問題である。本稿では、論理ルールを効果的に活用してKG推論を強化するために、RulE(Rule Embedding の略)と呼ばれる新規で原理的なフレームワークを提案する。知識グラフ埋め込み(KGE)法とは異なり、RulEは、エンティティ、関係、論理ルールを統一された埋め込み空間で共同表現することで、既存のトリプレットと一階述語論理のルールからルール埋め込みを学習する。学習したルール埋め込みに基づいて、各ルールに対して、観測されたトリプレットとの整合性を反映する信頼度スコアを計算できる。これにより、論理ルール推論をソフトに実行でき、論理の脆さを軽減できる。一方、RulEは事前の論理ルール情報を埋め込み空間に注入し、エンティティと関係の埋め込みを豊かにし、正則化する。これにより、KGE単独の性能も向上する。RulEは概念的に単純で、経験的にも有効である。RulEの各構成要素を検証するために広範な実験を行う。複数のベンチマークの結果から、提案モデルは既存の埋め込みベースおよびルールベースの手法の大半よりも優れていることが示された。

Knowledge graph (KG) reasoning is an important problem for knowledge graphs. In this paper, we propose a novel and principled framework called RulE (short for Rule Embedding) to effectively leverage logical rules to enhance KG reasoning. Unlike knowledge graph embedding (KGE) methods, RulE learns rule embeddings from existing triplets and first-order rules by jointly representing entities, relations and logical rules in a unified embedding space. Based on the learned rule embeddings, a confidence score can be calculated for each rule, reflecting its consistency with the observed triplets. This allows us to perform logical rule inference in a soft way, thus alleviating the brittleness of logic. On the other hand, RulE injects prior logical rule information into the embedding space, enriching and regularizing the entity/relation embeddings. This makes KGE alone perform better too. RulE is conceptually simple and empirically effective. We conduct extensive experiments to verify each component of RulE. Results on multiple benchmarks reveal that our model outperforms the majority of existing embedding-based and rule-based approaches.

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# 共変量シフトの祝福と呪い: 敵対的学習ダイナミクス、方向収束、平衡

Blessings and Curses of Covariate Shifts: Adversarial Learning Dynamics, Directional Convergence, and Equilibria ( http://arxiv.org/abs/2212.02457v3 )

ライセンス: Link先を確認

Tengyuan Liang,

(参考訳) 共変量分布のシフトと敵対的摂動は、従来の統計的学習の枠組みに頑健性の課題をもたらす。テスト共変量分布のわずかなシフトであっても、訓練分布に基づいて学習された統計モデルの性能に大きく影響しうる。モデルの性能は通常、外挿が起こると劣化する。すなわち、共変量が訓練分布の乏しい領域へシフトし、当然ながら学習されたモデルはその領域についてほとんど情報を持たない。頑健性と正則化の観点から、敵対的摂動手法が対策として提案されているが、学習済みモデルが与えられたときに敵対的共変量シフトがどの外挿領域に集中するかについては、慎重な検討が必要である。本稿では、無限次元の設定において回帰と分類の両方を調べ、この外挿領域を正確に特徴づける。逐次ゲームの枠組みにおいて、敵対的共変量シフトが、その後の平衡(ベイズ最適モデル)の学習に与える影響を調べる。敵対的学習ゲームのダイナミクスを利用して、共変量シフトが平衡学習と実験計画に及ぼす興味深い効果を明らかにする。特に、特徴的な現象を示す2つの方向収束の結果を確立する。(1) 回帰における祝福: 敵対的共変量は、その後の迅速な学習に最適な実験計画へ指数的な速度でシフトする。(2) 分類における呪い: 敵対的共変量は、その後の学習を停滞させる最も困難な実験計画へ準二次的な速度でシフトする。

Covariate distribution shifts and adversarial perturbations present robustness challenges to the conventional statistical learning framework: mild shifts in the test covariate distribution can significantly affect the performance of the statistical model learned based on the training distribution. The model performance typically deteriorates when extrapolation happens: namely, covariates shift to a region where the training distribution is scarce, and naturally, the learned model has little information. For robustness and regularization considerations, adversarial perturbation techniques are proposed as a remedy; however, careful study needs to be carried out about what extrapolation region adversarial covariate shift will focus on, given a learned model. This paper precisely characterizes the extrapolation region, examining both regression and classification in an infinite-dimensional setting. We study the implications of adversarial covariate shifts to subsequent learning of the equilibrium (the Bayes optimal model) in a sequential game framework. We exploit the dynamics of the adversarial learning game and reveal the curious effects of the covariate shift to equilibrium learning and experimental design. In particular, we establish two directional convergence results that exhibit distinctive phenomena: (1) a blessing in regression: the adversarial covariate shifts at an exponential rate to an optimal experimental design for rapid subsequent learning; (2) a curse in classification: the adversarial covariate shifts at a subquadratic rate to the hardest experimental design, trapping subsequent learning.

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# 生レーダフレーム上でのオンライン物体検出のための再帰型CNN

A recurrent CNN for online object detection on raw radar frames ( http://arxiv.org/abs/2212.11172v3 )

ライセンス: Link先を確認

Colin Decourt, Rufin VanRullen, Didier Salle, Thomas Oberlin,

(参考訳) 車載レーダーセンサーは、先進運転支援システム(ADAS)に貴重な情報を提供する。レーダーは、天候や光の条件に関わらず、物体までの距離と相対速度を確実に推定できる。しかし、レーダーセンサーは解像度が低く、物体の形状のクラス内変動が非常に大きいという問題を抱えている。時間情報(例えば複数のフレーム)を活用することは、物体のダイナミクス、ひいては物体の形状の変動をよりよく捉えるのに役立つことが示されている。時間情報を用いるレーダー物体検出器の多くは、3D畳み込みを用いて空間的および時間的情報を学習する。しかし、これらの手法は非因果的であることが多く、リアルタイムアプリケーションには適さない。本稿では、オンラインレーダー物体検出のための新しい再帰型CNNアーキテクチャであるRECORDを紹介する。畳み込みとConvLSTMを組み合わせたエンドツーエンドで学習可能なアーキテクチャを提案し、連続するフレーム間の時空間依存性を学習する。提案モデルは因果的であり、物体の検出にはConvLSTMのメモリに符号化された過去の情報のみを必要とする。実験により、異なるレーダー表現(レンジ・ドップラー、レンジ・アングル)における物体検出に対する本手法の有効性を示し、ROD2021およびCARRADAデータセットにおいて、より少ない計算コストで最先端モデルを上回ることを示す。

Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting temporal information (e.g., multiple frames) has been shown to help capture the dynamics of objects better and, therefore, the variation in their shape. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show the relevance of this method for detecting objects in different radar representations (range-Doppler, range-angle), and it outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.

翻訳日:2024-05-22 01:20:28 公開日:2024-05-20

# 分位点時間差分学習の解析

An Analysis of Quantile Temporal-Difference Learning ( http://arxiv.org/abs/2301.04462v3 )

ライセンス: Link先を確認

Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney,

(参考訳) 強化学習の大規模応用のいくつかの成功例において重要な構成要素であることが示されている分布強化学習アルゴリズム、分位点時間差分学習(QTD)を解析する。こうした経験的成功にもかかわらず、QTDの理論的理解はこれまで得られていなかった。標準的な確率近似の手法で解析できる古典的なTD学習とは異なり、QTDの更新は縮小写像を近似せず、強い非線形性を持ち、複数の不動点を持ちうる。本論文の中心的な結果は、関連する動的計画法の手続きの族の不動点へ確率1で収束することの証明であり、QTDに確固たる理論的基盤を与える。この証明は、確率近似理論と非平滑解析を通じて、QTDと非線形微分包含との間の関係を確立する。

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.
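
To make the object of the analysis concrete, here is a minimal sketch of a tabular QTD update as it is commonly written in the distributional reinforcement learning literature. The abstract itself does not state the update rule, so the form below, the quantile levels `taus` (midpoints `(2i+1)/(2m)`), and all names such as `qtd_update`, `alpha`, and `gamma` are assumptions for illustration.

```python
import numpy as np

def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.9):
    """One tabular QTD step for a transition (x, r, x_next).

    theta has shape (n_states, m): m quantile estimates of the return per state.
    Returns an updated copy; the input array is left unchanged.
    """
    m = theta.shape[1]
    # Quantile levels at the midpoints of m equal-probability bins (assumed choice).
    taus = (2 * np.arange(m) + 1) / (2 * m)
    # Bootstrapped targets built from every quantile estimate at the next state.
    targets = r + gamma * theta[x_next]                      # shape (m,)
    # below[i, j] is 1 when target j falls below the current estimate theta[x, i].
    below = (targets[None, :] < theta[x][:, None]).astype(float)
    # Quantile-regression direction, averaged over the m targets.
    direction = taus - below.mean(axis=1)
    new_theta = theta.copy()
    new_theta[x] = theta[x] + alpha * direction
    return new_theta
```

Each estimate moves by a bounded amount that depends only on the sign of the temporal-difference error, not its magnitude. This is one way to see why the update is non-linear and why, as the abstract notes, contraction-based arguments for classical TD do not carry over directly.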

翻訳日:2024-05-22 01:10:44 公開日:2024-05-20

# 変動推論:後部閾値はスパースレジームにおけるネットワーククラスタリング精度を改善する

Variational Inference: Posterior Threshold Improves Network Clustering Accuracy in Sparse Regimes ( http://arxiv.org/abs/2301.04771v2 )

ライセンス: Link先を確認

Xuezhen Li, Can M. Le,

(参考訳) 変分推論は、機械学習の文献において様々なベイズモデルの当てはめに広く用いられている。ネットワーク解析では、この手法はコミュニティ検出問題の解決にうまく適用されてきた。これらの結果は有望であるが、理論的な裏付けは比較的密なネットワークに限られており、これは実際のネットワークでは成り立たない可能性のある仮定である。さらに、変分損失曲面には多くの鞍点があり、特にスパースなネットワークに適用した場合に性能へ深刻な影響を及ぼしうることが最近示されている。本稿では、各反復の後にコミュニティ割り当ての事後分布をハード閾値処理することで、変分推論法を改善する簡単な方法を提案する。真のコミュニティ割り当てと相関するランダムな初期化を用いると、ネットワークの平均ノード次数が有界であっても、提案手法が収束し、真のコミュニティラベルを正確に復元できることを示す。広範な数値実験により、古典的な変分推論および別の最先端アルゴリズムに対する提案手法の優位性がさらに確認された。

Variational inference has been widely used in machine learning literature to fit various Bayesian models. In network analysis, this method has been successfully applied to solve the community detection problems. Although these results are promising, their theoretical support is only for relatively dense networks, an assumption that may not hold for real networks. In addition, it has been shown recently that the variational loss surface has many saddle points, which may severely affect its performance, especially when applied to sparse networks. This paper proposes a simple way to improve the variational inference method by hard thresholding the posterior of the community assignment after each iteration. Using a random initialization that correlates with the true community assignment, we show that the proposed method converges and can accurately recover the true community labels, even when the average node degree of the network is bounded. Extensive numerical study further confirms the advantage of the proposed method over the classical variational inference and another state-of-the-art algorithm.
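
The thresholding step mentioned in the abstract can be illustrated with a short sketch. The abstract does not give the exact rule, so the version below, which replaces each node's posterior over communities by a one-hot vector at its most probable community, is an assumption; the function name `hard_threshold` is likewise hypothetical.

```python
import numpy as np

def hard_threshold(posterior):
    """Round an (n_nodes, n_communities) posterior matrix to one-hot rows.

    Each row keeps a single 1 at its argmax (ties go to the lowest index),
    so every node is assigned to exactly one community.
    """
    posterior = np.asarray(posterior, dtype=float)
    hard = np.zeros_like(posterior)
    hard[np.arange(posterior.shape[0]), posterior.argmax(axis=1)] = 1.0
    return hard
```

In an iterative scheme, this rounding would be applied to the variational posterior after each update, before the next iteration uses it.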

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# PECAN: バックドア攻撃に対する決定論的な認証付き防御

PECAN: A Deterministic Certified Defense Against Backdoor Attacks ( http://arxiv.org/abs/2301.11824v4 )

ライセンス: Link先を確認

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni,

(参考訳) ニューラルネットワークはバックドアポイズニング攻撃に対して脆弱である。この攻撃では、攻撃者が訓練セットを悪意をもって汚染し、テスト入力にトリガーを挿入することで、被害モデルの予測を変化させる。バックドア攻撃に対する既存の防御は、形式的な保証を提供しないか、計算コストが高く効果の低い確率的保証しか提供しない。本稿では、バックドア攻撃を防御するための効率的な認証付きアプローチであるPECANを提案する。PECANを支える重要な洞察は、データの互いに素な分割で訓練された一連のニューラルネットワークに対して、既製のテスト時回避攻撃の認証手法を適用することである。PECANを画像分類とマルウェア検出のデータセットで評価する。その結果、PECANは、(1) 防御の強さと効率の両面で最先端の認証付きバックドア防御を大きく上回り、(2) 実際のバックドア攻撃に対して、文献のさまざまなベースラインと比較して攻撃成功率を1桁低減できることが示された。

Neural networks are vulnerable to backdoor poisoning attacks, where the attackers maliciously poison the training set and insert triggers into the test input to change the prediction of the victim model. Existing defenses for backdoor attacks either provide no formal guarantees or come with expensive-to-compute and ineffective probabilistic guarantees. We present PECAN, an efficient and certified approach for defending against backdoor attacks. The key insight powering PECAN is to apply off-the-shelf test-time evasion certification techniques on a set of neural networks trained on disjoint partitions of the data. We evaluate PECAN on image classification and malware detection datasets. Our results demonstrate that PECAN can (1) significantly outperform the state-of-the-art certified backdoor defense, both in defense strength and efficiency, and (2) on real backdoor attacks, reduce the attack success rate by an order of magnitude when compared to a range of baselines from the literature.

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# 言語モデルにおけるマルチモーダルな思考連鎖(Chain-of-Thought)推論

Multimodal Chain-of-Thought Reasoning in Language Models ( http://arxiv.org/abs/2302.00923v5 )

ライセンス: Link先を確認

Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola,

(参考訳) 大規模言語モデル(LLM)は、思考連鎖(Chain-of-Thought, CoT)プロンプティングを活用し、答えを推論するための根拠として中間的な推論の連鎖を生成することで、複雑な推論において優れた性能を示している。しかし、既存のCoT研究は主に言語モダリティに焦点を当てている。本稿では、言語(テキスト)と視覚(画像)のモダリティを、根拠の生成と答えの推論を分離する2段階のフレームワークに組み込んだMultimodal-CoTを提案する。これにより、答えの推論は、マルチモーダル情報に基づいて生成された、より良い根拠を活用できる。ScienceQAおよびA-OKVQAのベンチマークデータセットにおける実験結果は、提案手法の有効性を示している。Multimodal-CoTにより、10億パラメータ未満の我々のモデルは、ScienceQAベンチマークで最先端の性能を達成する。分析の結果、Multimodal-CoTには幻覚を緩和し、収束速度を高める利点があることがわかった。コードは https://github.com/amazon-science/mm-cot で公開されている。

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# SPAN: トランスフォーマーによるシーングラフとイメージの類似性を学ぶ

SPAN: Learning Similarity between Scene Graphs and Images with Transformers ( http://arxiv.org/abs/2304.00590v2 )

ライセンス: Link先を確認

Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang,

(参考訳) シーングラフと画像の類似性学習は、シーングラフと画像が与えられたときに類似度スコアを推定することを目的とする。このタスクはシーングラフ生成や下流のアプリケーションにとって重要であるにもかかわらず、現在このタスクに特化した研究は存在しない。シーングラフ生成は従来、Recall@Kと平均Recall@Kによって評価されており、これらは予測されたトリプレットのうち人手でラベル付けされたトリプレット集合に現れるものの割合を測定する。しかし、このようなトリプレット指向の指標は、シーングラフと画像の間の全体的な意味的差異を示すことができず、アノテーションのバイアスやノイズに敏感である。そのため、生成されたシーングラフの下流アプリケーションでの利用は制限されている。この問題に対処するため、シーングラフと画像の類似性を測定できるScene graPh-imAge coNtrastive learningフレームワーク、SPANを初めて提案する。この新しいフレームワークはグラフTransformerと画像Transformerから構成され、シーングラフとそれに対応する画像を共有潜在空間で整列させる。また、シーングラフを構造的エンコーディングを伴う系列に変換する新しいグラフシリアライズ手法を導入する。このフレームワークに基づき、画像検索精度を測定するR-Precisionを、シーングラフ生成の新しい評価指標として提案する。Visual GenomeとOpen Imagesのデータセットで新しいベンチマークを確立する。SPANの有効性を検証するために広範な実験を行い、SPANがシーングラフエンコーダとして大きな可能性を持つことを示す。

Learning similarity between scene graphs and images aims to estimate a similarity score given a scene graph and an image. There is currently no research dedicated to this task, although it is critical for scene graph generation and downstream applications. Scene graph generation is conventionally evaluated by Recall@K and mean Recall@K, which measure the ratio of predicted triplets that appear in the human-labeled triplet set. However, such triplet-oriented metrics fail to demonstrate the overall semantic difference between a scene graph and an image and are sensitive to annotation bias and noise. Using generated scene graphs in the downstream applications is therefore limited. To address this issue, for the first time, we propose a Scene graPh-imAge coNtrastive learning framework, SPAN, that can measure the similarity between scene graphs and images. Our novel framework consists of a graph Transformer and an image Transformer to align scene graphs and their corresponding images in the shared latent space. We introduce a novel graph serialization technique that transforms a scene graph into a sequence with structural encodings. Based on our framework, we propose R-Precision measuring image retrieval accuracy as a new evaluation metric for scene graph generation. We establish new benchmarks on the Visual Genome and Open Images datasets. Extensive experiments are conducted to verify the effectiveness of SPAN, which shows great potential as a scene graph encoder.
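
To illustrate the triplet-oriented metric the abstract criticizes, the sketch below computes a simple Recall@K over (subject, predicate, object) triplets. It assumes the common convention of counting how many human-labeled triplets are recovered among the top-K predictions with exact matching; the function name `recall_at_k` and the example triplets are made up for illustration and are not taken from the paper.

```python
def recall_at_k(predicted, ground_truth, k):
    """Fraction of ground-truth triplets found among the top-k predictions.

    predicted: list of (subject, predicate, object) tuples, best first.
    ground_truth: iterable of human-labeled triplets.
    """
    truth = set(ground_truth)
    if not truth:
        return 0.0
    top_k = set(predicted[:k])
    return len(top_k & truth) / len(truth)


predicted = [
    ("man", "riding", "horse"),
    ("horse", "on", "grass"),      # plausible, but not annotated
    ("man", "wearing", "hat"),
]
labeled = [("man", "riding", "horse"), ("man", "wearing", "hat")]
```

The second prediction may well be true of the image, yet it earns no credit because it is missing from the annotation. This is the kind of sensitivity to annotation bias that motivates a graph-to-image similarity measure.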

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# EduceLab-Scrolls: X線CTを用いたヘルクラネウム・パピルスからの検証可能なテキスト復元

EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT ( http://arxiv.org/abs/2304.02084v4 )

ライセンス: Link先を確認

Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, W. Brent Seales,

(参考訳) X線CT画像を用いてヘルクラネウム・パピルスの隠されたテキストを明らかにするための完全なソフトウェアパイプラインを提案する。この強化された仮想展開(virtual unwrapping)パイプラインは、機械学習と、3D画像と2D画像を結びつける新しい幾何学的フレームワークを組み合わせる。また、この問題に関する20年間の研究努力を表す包括的なオープンデータセットであるEduceLab-Scrollsを提示する。EduceLab-Scrollsには、小さな断片と、巻かれたままの無傷の巻物の両方のボリュームX線CT画像が含まれている。データセットには、インク検出モデルの教師あり学習に使用される2D画像ラベルも含まれている。ラベル付けは、巻物断片のスペクトル写真を同じ断片のX線CT画像と位置合わせすることで可能となり、これにより画像空間とモダリティの間の機械学習可能なマッピングが作られる。この位置合わせにより、X線CTでは「見えない」炭素インクを検出するための教師あり学習が可能となる。これは、人間の専門家のラベル付け担当者にとってさえ「不可能」なタスクである。我々の知る限り、これはこの種のものとして初めての位置合わせ済みデータセットであり、文化遺産分野でこれまでに公開された最大のデータセットである。本手法は、正解が既知の巻物断片上で、正確なテキスト行を明らかにすることができる。明らかにされたテキストは、目視による確認、定量的な画像指標、学術的レビューによって検証される。EduceLab-Scrollsはまた、ヘルクラネウム・パピルスの隠されたテキストの初めての発見を可能にし、本稿でそれを提示する。研究が進むにつれて、EduceLab-Scrollsデータセットがさらなるテキストの発見を生み出すことを期待している。

We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# Few Shot Semantic Segmentation: 方法論,ベンチマーク,オープンな課題のレビュー

Few Shot Semantic Segmentation: a review of methodologies, benchmarks, and open challenges ( http://arxiv.org/abs/2304.05832v2 )

ライセンス: Link先を確認

Nico Catalano, Matteo Matteucci,

(参考訳) セマンティックセグメンテーションは、自動運転からロボティクスまで幅広い応用に不可欠であるが、大規模な注釈付きデータセットの収集が困難であるか、法外に高価である領域では大きな課題に直面している。医療や農業などの分野では、訓練用画像の不足が進歩を妨げている。Few-Shot Semantic Segmentationはコンピュータビジョンの新しいタスクであり、わずかな例だけで新しいセマンティッククラスをセグメンテーションできるモデルの設計を目指す。本稿はFew-Shot Semantic Segmentationの包括的なサーベイであり、その発展をたどりながら、より一般的な条件付きネットワークやプロトタイプネットワークから、よりニッチな潜在空間最適化手法まで、様々なモデル設計を探究し、近年の基盤モデルがもたらす新たな機会も提示する。時系列に沿った解説を通じて、影響力のある動向と方法論を分析し、その強みと限界についての洞察を与える。タイムラインは、この分野の進展における重要なマイルストーンを示す視覚的なロードマップを提供する。ベンチマークデータセットでの定量的分析と、先駆的な研究の定性的な紹介によって補完された本サーベイは、読者にこのトピックの深い理解をもたらす。現在の課題、最先端のモデル、今後の展望を明らかにすることで、研究者や実務者がFew-Shot Semantic Segmentationの複雑さを理解する手助けをし、今後の発展のための基盤を提供する。

Semantic segmentation, vital for applications ranging from autonomous driving to robotics, faces significant challenges in domains where collecting large annotated datasets is difficult or prohibitively expensive. In such contexts, such as medicine and agriculture, the scarcity of training images hampers progress. Few-Shot Semantic Segmentation is a novel task in computer vision that aims at designing models capable of segmenting new semantic classes with only a few examples. This paper consists of a comprehensive survey of Few-Shot Semantic Segmentation, tracing its evolution and exploring various model designs, from the more popular conditional and prototypical networks to the more niche latent space optimization methods, presenting also the new opportunities offered by recent foundational models. Through a chronological narrative, we dissect influential trends and methodologies, providing insights into their strengths and limitations. A temporal timeline offers a visual roadmap, marking key milestones in the field's progression. Complemented by quantitative analyses on benchmark datasets and qualitative showcases of seminal works, this survey equips readers with a deep understanding of the topic. By elucidating current challenges, state-of-the-art models, and prospects, we aid researchers and practitioners in navigating the intricacies of Few-Shot Semantic Segmentation and provide ground for future development.

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# 倫理的マルチモーダルシステムに向けて

Towards ethical multimodal systems ( http://arxiv.org/abs/2304.13765v3 )

ライセンス: Link先を確認

Alexis Roger, Esma Aïmeur, Irina Rish,

(参考訳) 生成AIシステム(ChatGPT、DALL-Eなど)は、アート(Rombach et al. [2021])からメンタルヘルス(Rob Morris and Kareem Kouddous [2022])まで、私たちの生活のさまざまな領域に広がりつつある。急速に拡大するその社会的影響は、新たな機会をもたらす一方で、倫理的な懸念も引き起こしている。AIアライメントという新興分野は、AIシステムに人間の価値観を反映させることを目指している。本稿は、テキストと画像の両方を扱うマルチモーダルAIシステムの倫理性の評価に焦点を当てる。アライメント研究の大半は現在、言語モデルに集中しているため、これは比較的未開拓の領域である。まず、倫理性に関する人間のフィードバックからマルチモーダルな倫理データベースを作成する。次に、このデータベースを用いて、RoBERTa-large分類器や多層パーセプトロンを含む、システム応答の倫理性を自動的に評価するアルゴリズムを開発する。

Generative AI systems (ChatGPT, DALL-E, etc.) are expanding into multiple areas of our lives, from art (Rombach et al. [2021]) to mental health (Rob Morris and Kareem Kouddous [2022]); their rapidly growing societal impact opens new opportunities, but also raises ethical concerns. The emerging field of AI alignment aims to make AI systems reflect human values. This paper focuses on evaluating the ethics of multimodal AI systems involving both text and images, a relatively under-explored area, as most alignment work is currently focused on language models. We first create a multimodal ethical database from human feedback on ethicality. Then, using this database, we develop algorithms, including a RoBERTa-large classifier and a multilayer perceptron, to automatically assess the ethicality of system responses.

翻訳日:2024-05-22 01:10:43 公開日:2024-05-20

# 多変量トレース不等式による絡み合いモノガミー

Entanglement monogamy via multivariate trace inequalities ( http://arxiv.org/abs/2304.14878v2 )

ライセンス: Link先を確認

Mario Berta, Marco Tomamichel,

(参考訳) エントロピーは量子情報理論における基本的な概念であり、エンタングルメントを定量化し、その性質、例えば多体系におけるモノガミー性を調べることを可能にする。本稿では、多体量子系の制限された測定に基づく相対エントロピーの変分公式を導出する。これらを多変量行列トレース不等式と組み合わせることで、既存のさまざまなエンタングルメントのモノガミー不等式を再現し、場合によっては強化する。特に、スカッシュド・エンタングルメントの忠実性を、一方向の局所操作と古典通信で測定されたエンタングルメントの相対エントロピーに関連づけることによって、また、相互情報量の条件付きエンタングルメントの忠実性を、分離可能な測定によるエンタングルメントの相対エントロピーに関連づけることによって、行列解析に基づいて直接証明する。正の部分転置を持つ状態への相対エントロピーを用いた場合や、多体の設定における、これらの結果の変形についても議論する。本研究の結果は、情報理論的タスクの漸近的達成可能性に関する操作的な議論を用いていた文献上の従来の導出を簡素化し、一般化する。

Entropy is a fundamental concept in quantum information theory that allows one to quantify entanglement and investigate its properties, for example its monogamy over multipartite systems. Here, we derive variational formulas for relative entropies based on restricted measurements of multipartite quantum systems. By combining these with multivariate matrix trace inequalities, we recover and sometimes strengthen various existing entanglement monogamy inequalities. In particular, we give direct, matrix-analysis-based proofs for the faithfulness of squashed entanglement by relating it to the relative entropy of entanglement measured with one-way local operations and classical communication, as well as for the faithfulness of conditional entanglement of mutual information by relating it to the separably measured relative entropy of entanglement. We discuss variations of these results using the relative entropy to states with positive partial transpose, and multipartite setups. Our results simplify and generalize previous derivations in the literature that employed operational arguments about the asymptotic achievability of information-theoretic tasks.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# 時間共有計算資源に関する学習可能性

Learnability with Time-Sharing Computational Resource Concerns ( http://arxiv.org/abs/2305.02217v4 )

ライセンス: Link先を確認

Zhi-Hua Zhou,

(参考訳) 従来の理論的な機械学習研究は一般に、計算資源が十分に、あるいは無限に供給されることを明示的または暗黙的に仮定している。しかし実際には、計算資源は通常限られており、機械学習の性能は、受け取ったデータの量だけでなく、利用可能な計算資源のもとで処理できるデータの量にも依存する。現在の「インテリジェント・スーパーコンピューティング」施設のほとんどは、排他的なオペレーティングシステムのように動作しており、学習性能への要求や学習プロセスの状態といった重要な要因を考慮した適応的なスケジューリング戦略なしに、一定量の資源を機械学習タスクに割り当てている。本稿では、機械学習スループットの概念を導入し、計算資源効率学習(Computational Resource Efficient Learning, CoRE-Learning)を定義し、学習理論における計算資源の影響を考慮した理論的枠組みを提示する。この枠組みは、到着するデータストリームが圧倒的な規模で際限なく続く可能性があり、受け取ったすべてのデータを時間内に処理できると仮定することが非現実的なストリーム学習に自然に適用できる。また、インテリジェントなスーパーコンピューティング向けオペレーティングシステムの設計に理論的な視点を提供する可能性もある。

Conventional theoretical machine learning studies generally assume explicitly or implicitly that there are enough or even infinitely supplied computational resources. In real practice, however, computational resources are usually limited, and the performance of machine learning depends not only on how many data have been received, but also on how many data can be handled subject to computational resources available. Note that most current "intelligent supercomputing" facilities work like exclusive operating systems, where a fixed amount of resources are allocated to a machine learning task without adaptive scheduling strategies considering important factors such as the learning performance demands and learning process status. In this article, we introduce the notion of machine learning throughput, define Computational Resource Efficient Learning (CoRE-Learning), and present a theoretical framework that takes into account the influence of computational resources in learning theory. This framework can be naturally applied to stream learning where the incoming data streams can be potentially endless with overwhelming size and it is impractical to assume that all received data can be handled in time. It may also provide a theoretical perspective for the design of intelligent supercomputing operating systems.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# 最適化アルゴリズム、リャプノフ関数、微分方程式の接続について:理論と洞察

On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights ( http://arxiv.org/abs/2305.08658v3 )

ライセンス: Link先を確認

Paul Dobson, Jesus Maria Sanz-Serna, Konstantinos Zygalakis,

(参考訳) Fazlyab et al. (SIAM J. Optim. 28, 2018)によって導入された、離散時間および連続時間の最適化アルゴリズムに対するリャプノフ関数を構成するための一般的な枠組みを再検討する。滑らかで強凸な目的関数に対して、そのような構成に必要な要件を緩和する。その結果、Polyakの常微分方程式と、2パラメータのNesterovアルゴリズムの族に対して、文献で得られているものよりも改善された収束率を証明できる。Nesterovアルゴリズムを、Polyak方程式の離散化として解釈することについて解析する。これらのアルゴリズムが加法的ルンゲ・クッタ積分法の例であることを示し、この微分方程式のほとんどの離散化が加速を伴う最適化アルゴリズムにならない理由を議論する。また、Polyak方程式の修正を導入し、その収束特性を調べる。最後に、この一般的な枠組みを確率的な設定に拡張し、過剰パラメータ化モデルに対する加速付きランダムアルゴリズムへの応用を検討する。ここでも、文献のものより改善された収束率を証明できる。

We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result we are able to prove for Polyak's ordinary differential equations and for a two-parameter family of Nesterov algorithms rates of convergence that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again we are able to prove convergence rates that improve on those in the literature.
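
For readers who want a concrete picture of the algorithms being analysed, the sketch below implements a Nesterov-type iteration with two free parameters, a step size `alpha` and a momentum coefficient `beta`. This parameterization is a common textbook form and is assumed here; it is not taken from the paper, and the quadratic test problem and the names `nesterov`, `grad_f`, `L_smooth`, and `m_strong` are illustrative only.

```python
import numpy as np

def nesterov(grad, x0, alpha, beta, n_iters):
    """Nesterov-type iteration: extrapolate with momentum, then take a gradient step."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n_iters):
        y = x + beta * (x - x_prev)            # momentum extrapolation
        x_prev, x = x, y - alpha * grad(y)     # gradient step taken at y
    return x

# Strongly convex quadratic f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 10.0])
grad_f = lambda x: A @ x

L_smooth, m_strong = 10.0, 1.0                 # largest and smallest eigenvalues of A
kappa = L_smooth / m_strong
alpha = 1.0 / L_smooth
beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

x_final = nesterov(grad_f, [1.0, 1.0], alpha, beta, n_iters=200)
```

With these standard choices the iterates contract roughly like `(1 - 1/sqrt(kappa))**k` on this problem, compared with about `(1 - 1/kappa)**k` for plain gradient descent with the same step size. Rates of this accelerated kind are what a Lyapunov-function analysis aims to certify.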

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# 野生におけるディープフェイクテキストの検出

Deepfake Text Detection in the Wild ( http://arxiv.org/abs/2305.13242v2 )

ライセンス: Link先を確認

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang,

(参考訳) 大規模言語モデル(LLM)は人間レベルのテキスト生成を達成しており、フェイクニュースの拡散や盗用などのリスクを軽減するために、AI生成テキストを効果的に検出する必要性が高まっている。既存の研究は、特定のドメインや特定の言語モデルで検出手法を評価するにとどまっている。しかし実際のシナリオでは、検出器は出所を知らないまま、様々なドメインやLLMのテキストに直面する。そこで、多様な人間の文章と、さまざまなLLMが生成したテキストを集めて、包括的なテストベッドを構築する。実験結果は、さまざまなシナリオ、特に分布外の設定において、機械生成テキストと人間が書いたテキストを区別することの難しさを示している。この難しさは、2つのテキスト源の間の言語的な違いが小さくなっていることに起因する。こうした課題にもかかわらず、最も性能の高い検出器は、新しいLLMが生成したドメイン外テキストの86.54%を識別でき、実応用の実現可能性を示している。リソースは https://github.com/yafuly/MAGE で公開している。

Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# MGL2Rank:マルチグラフフュージョンに基づく道路ネットワークにおけるノードの重要性のランク付けを学ぶ

MGL2Rank: Learning to Rank the Importance of Nodes in Road Networks Based on Multi-Graph Fusion ( http://arxiv.org/abs/2305.14375v3 )

ライセンス: Link先を確認

Ming Xu, Jing Zhang,

(参考訳) 道路網において強い伝播能力を持つ重要なノードを特定することは、都市計画における重要なテーマである。交通ネットワークにおけるノードの重要度を評価する既存の手法は、トポロジ情報と交通量のみを考慮しており、車線数や道路セグメントの平均速度といった、道路ネットワークにおける交通特性の多様性が無視されているため、性能が制限されている。この問題を解決するため、道路ネットワークの豊富な特性を統合してノードの重要度をランク付けする、グラフ学習ベースのフレームワーク(MGL2Rank)を提案する。このフレームワークは、各道路セグメントの潜在表現を学習するために、サンプリングアルゴリズム(MGWalk)とエンコーダネットワークを含む埋め込みモジュールを備える。MGWalkはマルチグラフ融合を利用して、道路ネットワークのトポロジを捉え、属性に基づいて道路セグメント間の関連を確立する。得られたノード表現は、道路セグメントの重要度ランキングの学習に用いられる。最後に、瀋陽市の地域道路ネットワークに基づいてランキングタスク用の合成データセットを構築し、このデータセット上でのランキング結果により本手法の有効性を示す。MGL2Rankのデータとソースコードは https://github.com/iCityLab/MGL2Rank で入手できる。

The identification of important nodes with strong propagation capabilities in road networks is a vital topic in urban planning. Existing methods for evaluating the importance of nodes in traffic networks consider only topological information and traffic volumes; they ignore the diversity of traffic characteristics in road networks, such as the number of lanes and the average speed of road segments, which limits their performance. To solve this problem, we propose a graph learning-based framework (MGL2Rank) that integrates the rich characteristics of road networks to rank the importance of nodes. This framework comprises an embedding module containing a sampling algorithm (MGWalk) and an encoder network to learn the latent representations for each road segment. MGWalk utilizes multigraph fusion to capture the topology of road networks and establish associations between road segments based on their attributes. The obtained node representation is then used to learn the importance ranking of the road segments. Finally, a synthetic dataset is constructed for ranking tasks based on the regional road network of Shenyang City, and the ranking results on this dataset demonstrate the effectiveness of our method. The data and source code for MGL2Rank are available at https://github.com/iCityLab/MGL2Rank.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# 信頼による生成:ブラックボックス大言語モデルの不確実性定量化

Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models ( http://arxiv.org/abs/2305.19187v3 )

ライセンス: Link先を確認

Zhen Lin, Shubhendu Trivedi, Jimeng Sun,

(参考訳) 自然言語生成(NLG)に特化した大規模言語モデル(LLM)は、最近、様々な領域で有望な能力を示すようになった。しかし、LLMが生成する応答の信頼性を評価することは、NLGの不確実性定量化(UQ)の研究が限られており、未解決の課題である。さらに、既存の文献では、言語モデルへのホワイトボックスアクセスを前提としており、これは最新のLLMのクローズドソースの性質や計算上の制約によって非現実的になっている。本研究では,NLG における *black-box* LLM の UQ について検討する。まず *不確実性* と *信頼度* を区別する: 前者は固定された入力に対する潜在的な予測の「分散(dispersion)」を指し、後者は特定の予測/生成に対する信頼度を示す。次に、いくつかの信頼度/不確実性尺度を提案・比較し、信頼できない結果を無視するか、さらなる評価に回すことができる *選択的NLG* に適用する。いくつかの一般的なLLMを用いて、質問応答データセット(評価目的)で実験を行った。その結果, 意味的分散の簡易な尺度は, LLMの応答品質の信頼性の高い予測因子となり, LLMを採用する際の不確実性管理について, 実践者にとって貴重な知見を提供することができた。実験を再現するコードは https://github.com/zlin7/UQ-NLG で公開されている。

Large language models (LLMs) specializing in natural language generation (NLG) have recently started exhibiting promising capabilities across a variety of domains. However, gauging the trustworthiness of responses generated by LLMs remains an open challenge, with limited research on uncertainty quantification (UQ) for NLG. Furthermore, existing literature typically assumes white-box access to language models, which is becoming unrealistic either due to the closed-source nature of the latest LLMs or computational constraints. In this work, we investigate UQ in NLG for *black-box* LLMs. We first differentiate *uncertainty* vs *confidence*: the former refers to the "dispersion" of the potential predictions for a fixed input, and the latter refers to the confidence on a particular prediction/generation. We then propose and compare several confidence/uncertainty measures, applying them to *selective NLG* where unreliable results could either be ignored or yielded for further assessment. Experiments were carried out with several popular LLMs on question-answering datasets (for evaluation purposes). Results reveal that a simple measure for the semantic dispersion can be a reliable predictor of the quality of LLM responses, providing valuable insights for practitioners on uncertainty management when adopting LLMs. The code to replicate our experiments is available at https://github.com/zlin7/UQ-NLG.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
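
The distinction drawn in the abstract above, dispersion across sampled responses as an uncertainty signal that drives selective generation, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the token-overlap `jaccard` similarity is a crude stand-in for a semantic similarity measure, and the names `dispersion`, `selective_answer`, and the `threshold` value are hypothetical rather than taken from the paper's code.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def dispersion(responses: list[str]) -> float:
    """Uncertainty proxy: 1 minus the mean pairwise similarity of sampled responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def selective_answer(responses: list[str], threshold: float = 0.5):
    """Return the first response, or None (abstain) when dispersion exceeds the threshold."""
    return responses[0] if dispersion(responses) <= threshold else None
```

With black-box access only, the inputs are simply several responses sampled for the same prompt; samples that agree give low dispersion and an answer, while samples that disagree lead to abstention.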

# 正規グリッドを超えて: 任意ドメイン上のフーリエベースのニューラル演算子

Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains ( http://arxiv.org/abs/2305.19663v4 )

ライセンス: Link先を確認

Levi Lingsch, Mike Y. Michelis, Emmanuel de Bezenac, Sirani M. Perera, Robert K. Katzschmann, Siddhartha Mishra,

(参考訳) PDEの解の学習に広く用いられている多くのニューラル演算子の計算効率は、スペクトル計算を行うための高速フーリエ変換(FFT)に依存している。 FFT は等間隔(矩形)のグリッドに制限されているため、一般の非等間隔の点分布で入力関数と出力関数を処理しなければならない問題に適用した場合、そのようなニューラル演算子の効率は制限される。ニューラル演算子に必要な表現力を得るには限られた数のフーリエ(スペクトル)モードで十分であるという観察を生かして、基礎となるスペクトル変換の効率的な直接評価に基づいて、ニューラル演算子を任意の領域に拡張する簡単な方法を提案する。このような直接スペクトル評価の効率的な実装は、任意の非等間隔分布上のデータの処理を可能にするために、既存のニューラル演算子モデルと結合される。広範な実験的評価により,提案手法により,フーリエニューラル演算子(FNO)と関連するニューラル演算子の精度を維持したり向上させたりしながら,ベースラインに比べてトレーニング速度を大幅に向上させつつ,ニューラル演算子を任意の点分布に拡張できることが実証された。

The computational efficiency of many neural operators, widely used for learning solutions of PDEs, relies on the fast Fourier transform (FFT) for performing spectral computations. As the FFT is limited to equispaced (rectangular) grids, this limits the efficiency of such neural operators when applied to problems where the input and output functions need to be processed on general non-equispaced point distributions. Leveraging the observation that a limited set of Fourier (Spectral) modes suffice to provide the required expressivity of a neural operator, we propose a simple method, based on the efficient direct evaluation of the underlying spectral transformation, to extend neural operators to arbitrary domains. An efficient implementation of such direct spectral evaluations is coupled with existing neural operator models to allow the processing of data on arbitrary non-equispaced distributions of points. With extensive empirical evaluation, we demonstrate that the proposed method allows us to extend neural operators to arbitrary point distributions with significant gains in training speed over baselines while retaining or improving the accuracy of Fourier neural operators (FNOs) and related neural operators.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20
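
The idea of evaluating a truncated Fourier transform directly at arbitrary sample points, rather than through an FFT on a regular grid, can be sketched in one dimension as a small matrix product. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the restriction to points in [0, 1), and the unweighted sum (no quadrature weights) are all choices made here for illustration.

```python
import numpy as np


def direct_fourier_matrix(x: np.ndarray, n_modes: int) -> np.ndarray:
    """Matrix V with V[k, j] = exp(-2*pi*i*k*x[j]) for sample points x in [0, 1)."""
    k = np.arange(n_modes)
    return np.exp(-2j * np.pi * np.outer(k, x))


def direct_fourier_coeffs(values: np.ndarray, x: np.ndarray, n_modes: int) -> np.ndarray:
    """Evaluate the first n_modes Fourier sums of `values` sampled at arbitrary points x."""
    return direct_fourier_matrix(x, n_modes) @ values
```

On an equispaced grid `x = np.arange(N) / N` this reproduces the first `n_modes` entries of `np.fft.fft(values)`, and the same call accepts non-equispaced `x`. The cost is proportional to `n_modes * N`, which stays modest when only a few modes are kept.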

# 量子計測のための物理ノイズモデル

A physical noise model for quantum measurements ( http://arxiv.org/abs/2305.19766v3 )

ライセンス: Link先を確認

Faedi Loulidi, Ion Nechita, Clément Pellegrini,

(参考訳) 本稿では, 準備に誤りのある間接測定方式に動機づけられた, 量子測定のための新しいノイズモデルを提案する。量子システムとプローブ間の相互作用を支配するランダムなダイナミクスについて平均をとることで、自然で物理的なノイズモデルが現れる。これを非両立性ロバスト性の枠組みにおいて既存のノイズモデル(一様および脱分極)と比較する。我々は,本モデルが特定の測定のクラスに対して,より大きな両立性領域を許すことを観察した。

In this paper we introduce a novel noise model for quantum measurements motivated by an indirect measurement scheme with faulty preparation. Averaging over random dynamics governing the interaction between the quantum system and a probe, a natural, physical noise model emerges. We compare it to existing noise models (uniform and depolarizing) in the framework of incompatibility robustness. We observe that our model allows for larger compatibility regions for specific classes of measurements.

翻訳日:2024-05-22 01:00:22 公開日:2024-05-20

# マルチバススピン-ボソンモデルにおけるエンタングルメントの増強

Enhanced entanglement in multi-bath spin-boson models ( http://arxiv.org/abs/2306.11036v3 )

ライセンス: Link先を確認

Charlie R. Hogg, Federico Cerisola, James D. Cresser, Simon A. R. Horsley, Janet Anders,

(参考訳) スピンボソンモデルは通常、単一のボソン浴に結合されたスピンを考える。しかし、いくつかの物理的状況ではスピンを複数の環境に結合する必要がある。例えば、3次元磁気材料中でフォノンと相互作用するスピンがそうである。ここでは3つの独立した浴に等方的に結合したスピンを考える。複数の浴との結合は, 零温度においてスピンと環境とのエンタングルメントを著しく増大させうることを示した。この効果により、平均力平衡状態におけるスピンの期待値が減少する。対照的に、古典的な3浴スピン平衡状態は、環境結合から完全に独立であることが判明した。これらの結果から、複数浴との結合から生じうる純粋に量子的な効果が明らかとなり、磁気材料など幅広い場面での応用が期待される。

The spin-boson model usually considers a spin coupled to a single bosonic bath. However, some physical situations require coupling of the spin to multiple environments. One example is spins interacting with phonons in three-dimensional magnetic materials. Here, we consider a spin coupled isotropically to three independent baths. We show that coupling to multiple baths can significantly increase entanglement between the spin and its environment at zero temperature. The effect of this is to reduce the spin's expectation values in the mean force equilibrium state. In contrast, the classical three-bath spin equilibrium state turns out to be entirely independent of the environmental coupling. These results reveal purely quantum effects that can arise from multi-bath couplings, with potential applications in a wide range of settings, such as magnetic materials.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20

# Statler: 身体的推論のための状態維持型言語モデル

Statler: State-Maintaining Language Models for Embodied Reasoning ( http://arxiv.org/abs/2306.17840v4 )

ライセンス: Link先を確認

Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter,

(参考訳) 知的ロボットに複雑な推論能力を与えるために、大規模言語モデルを使うことに大きな研究関心が寄せられている。既存の研究は、行動と観測の履歴について推論する大規模言語モデルの能力を活用することに焦点を当てている。本稿では,大規模言語モデルがロボット工学の計画に役立ちうる新しい次元について検討する。特に,しばしば観測不可能な世界状態の推定を維持し,新たな行動がとられるたびにその遷移を追跡するよう大規模言語モデルに促すフレームワークであるStatlerを提案する。そして、我々のフレームワークは、現在の世界状態の推定に基づいて各アクションを条件付ける。概念的には単純であるにもかかわらず、我々のStatlerフレームワークは、いくつかのロボット計画タスクにおいて強力な競合手法(例えば、Code-as-Policies)を著しく上回っている。さらに、より困難な長期計画タスクにスケールアップできるという潜在的な利点もある。

There has been a significant research interest in employing large language models to empower intelligent robots with complex reasoning. Existing work focuses on harnessing their abilities to reason about the histories of their actions and observations. In this paper, we explore a new dimension in which large language models may benefit robotics planning. In particular, we propose Statler, a framework in which large language models are prompted to maintain an estimate of the world state, which is often unobservable, and track its transition as new actions are taken. Our framework then conditions each action on the estimate of the current world state. Despite being conceptually simple, our Statler framework significantly outperforms strong competing methods (e.g., Code-as-Policies) on several robot planning tasks. Additionally, it has the potential advantage of scaling up to more challenging long-horizon planning tasks.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20

# FlakyFix: 大規模言語モデルを使用して、フレーキーテストの修正カテゴリとテストコード修正を予測する

FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair ( http://arxiv.org/abs/2307.00012v3 )

ライセンス: Link先を確認

Sakina Fatima, Hadi Hemmati, Lionel Briand,

(参考訳) フレーキーテストは、同じソフトウェアバージョンのテストに対して非決定的にパスまたは失敗し、混乱と開発労力の浪費を引き起こすため、問題となる。機械学習モデルは、フレーキネスとその根本原因を予測するために使われてきたが、問題の修正を支援する研究ははるかに少ない。このギャップに対処するために、本稿では、フレーキネスを除去するために必要な修正の種類を予測し、それに基づいてテストコードを修復することに焦点を当てる。これは、フレーキネスの根本原因が本番コードではなくテストケース自身にある、フレーキーテストケースのサブセットに対して行う。私たちのキーとなるアイデアは、予測された修正カテゴリの形で、テストのフレーキネスに関する追加の知識を用いて修復プロセスをガイドすることである。そこで我々はまず,13の修正カテゴリのラベル付きデータセットを自動的に生成し,テストコードのみを解析することによりフレーキーテストの修正カテゴリを予測するモデルを訓練するフレームワークを提案する。コードモデルと数ショット学習を用いた実験結果から,修正カテゴリのほとんどを正確に予測できることが判明した。このような修正カテゴリラベルが、テスターへの情報提供に加えて、フレーキネスの自動修復にも有用であることを示すため,GPTのような大規模言語モデル(LLM)にこの追加知識を与えて、修復提案をLLMに依頼する。提案する修正カテゴリラベルは,文脈内学習と組み合わせることで, GPT 3.5 Turbo のフレーキーテストに対する修正生成能力を大幅に向上させることが示された。GPTが修復したフレーキーテストのサンプルの実行と分析に基づいて, そのような修復の大部分(およそ70%から90%)がパスすると期待できると推定した。修復されたテストが失敗した場合、パスさせるためには、平均してテストコードの16%をさらに変更する必要がある。

Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test, causing confusion and wasting development effort. While machine learning models have been used to predict flakiness and its root causes, there is much less work on providing support to fix the problem. To address this gap, in this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis. We do this for a subset of flaky test cases where the root cause of flakiness is in the test case itself and not in the production code. Our key idea is to guide the repair process with additional knowledge about the test's flakiness in the form of its predicted fix category. Thus, we first propose a framework that automatically generates labeled datasets for 13 fix categories and trains models to predict the fix category of a flaky test by analyzing the test code only. Our experimental results using code models and few-shot learning show that we can correctly predict most of the fix categories. To show the usefulness of such fix category labels for automatically repairing flakiness, in addition to informing testers, we augment a Large Language Model (LLM) like GPT with such extra knowledge to ask the LLM for repair suggestions. The results show that our suggested fix category labels, complemented with in-context learning, significantly enhance the capability of GPT 3.5 Turbo in generating fixes for flaky tests. Based on the execution and analysis of a sample of GPT-repaired flaky tests, we estimate that a large percentage of such repairs (roughly between 70% and 90%) can be expected to pass. For the failing repaired tests, on average, 16% of the test code needs to be further changed for them to pass.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20

# 正規設計によるロジスティック回帰におけるパラメータ推定のサンプル複雑性について

On the sample complexity of parameter estimation in logistic regression with normal design ( http://arxiv.org/abs/2307.04191v3 )

ライセンス: Link先を確認

Daniel Hsu, Arya Mazumdar,

(参考訳) ロジスティック回帰モデルは、ノイズの多いバイナリ分類問題において最も一般的なデータ生成モデルの一つである。本研究では,標準正規共変量のもとで,ロジスティック回帰モデルのパラメータを与えられた$\ell_2$誤差まで推定するサンプル複雑性を,次元と逆温度の観点から検討する。逆温度は、データ生成プロセスの信号対雑音比を制御する。ロジスティック回帰の最尤推定量の一般化境界と漸近性能はよく研究されているが, パラメータ推定について誤差と逆温度への依存性を示す非漸近的なサンプル複雑性は, これまでの分析には欠けている。サンプル複雑性曲線は逆温度に関して2つの変化点を持ち, 低温, 中温, 高温の領域を明確に分離することを示した。

The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
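
To make the role of the inverse temperature concrete, the sketch below samples from a logistic model with standard normal covariates and measures how often the label agrees with the sign of the linear score. The parameterization is an assumption made here for illustration: labels in {-1, +1}, a unit-norm direction `w`, and P(y = +1 | x) = sigmoid(beta * <w, x>) with `beta` as the inverse temperature. The helper names are hypothetical.

```python
import numpy as np


def sample_logistic(n: int, w: np.ndarray, beta: float, rng: np.random.Generator):
    """Draw X ~ N(0, I_d) and y in {-1, +1} with P(y=+1 | x) = sigmoid(beta * <w, x>)."""
    d = w.shape[0]
    X = rng.standard_normal((n, d))
    p = 1.0 / (1.0 + np.exp(-beta * (X @ w)))
    y = np.where(rng.random(n) < p, 1, -1)
    return X, y


def label_agreement(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Fraction of labels equal to sign(<w, x>); higher means less label noise."""
    return float(np.mean(np.sign(X @ w) == y))
```

Small `beta` (high temperature) gives labels that are close to coin flips, while large `beta` (low temperature) gives labels that almost always match the sign of the score, which is the sense in which the inverse temperature sets the signal-to-noise ratio.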

# My3DGen: スケーラブルなパーソナライズされた3D生成モデル

My3DGen: A Scalable Personalized 3D Generative Model ( http://arxiv.org/abs/2307.05468v4 )

ライセンス: Link先を確認

Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta,

(参考訳) 近年,フォトリアリスティックな顔合成の課題に対処するために,生成型3次元顔モデル(例えばEG3D)が開発されている。しかし、これらのモデルは個人に固有の顔の特徴を捉えられないことが多く、パーソナライズの重要性が浮き彫りになっている。いくつかの先行研究は、生成的顔モデルのパーソナライズに有望な結果を示しているが、これらの研究は主に2D設定に焦点を当てている。また、これらの手法では、各ユーザに対して多数のパラメータの微調整と保存の両方が必要であり、スケーラブルなパーソナライゼーションを実現する上で障害となる。パーソナライゼーションのもうひとつの課題は、個人ごとに利用可能なトレーニング画像の数が限られていることであり、完全な微調整手法を用いるとしばしば過学習につながる。提案手法であるMy3DGenは,わずか50枚のトレーニング画像を用いて個人のパーソナライズされた3D事前分布を生成する。 My3DGenは、元の人物のアイデンティティを保ちながら、新しいビューの合成、与えられた顔のセマンティックな編集(例えば、笑顔を追加する)、新しい外観の合成を可能にする。我々は,事前学習済みのEG3Dを凍結し、低ランク分解によって追加のパーソナライズされた重みを訓練することで,3D顔の特徴をグローバルな特徴とパーソナライズされた特徴に分離する。その結果、My3DGenは個人ごとに$\textbf{240K}$個のパーソナライズされたパラメータのみを導入し、パラメータ空間全体の微調整に必要な$\textbf{30.6M}$と比較して、トレーニング可能なパラメータが$\textbf{127}\times$削減される。ストレージの大幅な削減にもかかわらず、我々のモデルは下流アプリケーションの品質を損なうことなくアイデンティティの特徴を保持する。

In recent years, generative 3D face models (e.g., EG3D) have been developed to tackle the problem of synthesizing photo-realistic faces. However, these models are often unable to capture facial features unique to each individual, highlighting the importance of personalization. Some prior works have shown promise in personalizing generative face models, but these studies primarily focus on 2D settings. Also, these methods require both fine-tuning and storing a large number of parameters for each user, posing a hindrance to achieving scalable personalization. Another challenge of personalization is the limited number of training images available for each individual, which often leads to overfitting when using full fine-tuning methods. Our proposed approach, My3DGen, generates a personalized 3D prior of an individual using as few as 50 training images. My3DGen allows for novel view synthesis, semantic editing of a given face (e.g. adding a smile), and synthesizing novel appearances, all while preserving the original person's identity. We decouple the 3D facial features into global features and personalized features by freezing the pre-trained EG3D and training additional personalized weights through low-rank decomposition. As a result, My3DGen introduces only $\textbf{240K}$ personalized parameters per individual, leading to a $\textbf{127}\times$ reduction in trainable parameters compared to the $\textbf{30.6M}$ required for fine-tuning the entire parameter space. Despite this significant reduction in storage, our model preserves identity features without compromising the quality of downstream applications.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20
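
The parameter saving quoted in the abstract above follows from training only a low-rank update on top of frozen weights. The sketch below shows the idea for a single linear layer; the abstract says only "low-rank decomposition", so the layer size (512 x 512), the rank (4), and the `B @ A` factorization with a zero-initialized `B` are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np


def low_rank_update(d_out: int, d_in: int, rank: int, rng: np.random.Generator):
    """Create trainable factors B (d_out x rank) and A (rank x d_in) for a frozen weight."""
    B = np.zeros((d_out, rank))  # zero init: the personalized update starts at exactly 0
    A = rng.standard_normal((rank, d_in)) * 0.01
    return B, A


rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))        # frozen pre-trained weight (hypothetical size)
B, A = low_rank_update(512, 512, 4, rng)   # per-person trainable factors
x = rng.standard_normal(512)
y = (W + B @ A) @ x                        # forward pass with the personalized update

full_params = W.size                       # parameters touched by full fine-tuning
low_rank_params = B.size + A.size          # parameters stored per person
abstract_ratio = 30.6e6 / 240e3            # figures quoted in the abstract
```

For this toy layer the per-person storage is 4,096 values instead of 262,144. Applying the same division to the abstract's figures gives 30.6M / 240K = 127.5, consistent with the reported reduction of about 127 times.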

# 境界駆動型ダブルスピンチェーンの厳密な結果と資源効率のよいリモートエンタングルメント安定化

Exact Results for a Boundary-Driven Double Spin Chain and Resource-Efficient Remote Entanglement Stabilization ( http://arxiv.org/abs/2307.09482v2 )

ライセンス: Link先を確認

Andrew Lingenfelter, Mingxing Yao, Andrew Pocklington, Yu-Xin Wang, Abdullah Irfan, Wolfgang Pfaff, Aashish A. Clerk,

(参考訳) 2つの$XX$結合された$N$-qubitスピンチェイン(結合は非一様でもよい)が境界 Rabi ドライブと、導波路(双方向または一方向)によって生じる共通の境界損失を受けるセットアップの定常状態に対する厳密解を導出する。幅広いパラメータに対して、このシステムは純粋なエンタングル定常状態を持ち、スクイーズド光を使わずに遠隔マルチキュービットのエンタングルメントを安定化する手段を提供する。我々の解はまた、相互作用するフェルミオンモデルに写像される単一の境界駆動散逸$XX$スピンチェインに関する洞察を与える。非平衡定常状態は、動的に拘束されたホッピングから生じるホール励起の創発的なペアリングを含む驚くべき相関効果を示す。我々のシステムは、回路QEDを含む多くの実験プラットフォームで実装できる。

We derive an exact solution for the steady state of a setup where two $XX$-coupled $N$-qubit spin chains (with possibly non-uniform couplings) are subject to boundary Rabi drives, and common boundary loss generated by a waveguide (either bidirectional or unidirectional). For a wide range of parameters, this system has a pure entangled steady state, providing a means for stabilizing remote multi-qubit entanglement without the use of squeezed light. Our solution also provides insights into a single boundary-driven dissipative $XX$ spin chain that maps to an interacting fermionic model. The non-equilibrium steady state exhibits surprising correlation effects, including an emergent pairing of hole excitations that arises from dynamically constrained hopping. Our system could be implemented in a number of experimental platforms, including circuit QED.

翻訳日:2024-05-22 00:50:05 公開日:2024-05-20

# LSTM, BiLSTM, CNN, GRU, GloVeを用いた癌遺伝子変異分類のためのハイブリッド機械学習モデル

A Hybrid Machine Learning Model for Classifying Gene Mutations in Cancer using LSTM, BiLSTM, CNN, GRU, and GloVe ( http://arxiv.org/abs/2307.14361v3 )

ライセンス: Link先を確認

Sanad Aburass, Osama Dorgham, Jamil Al Shaqsi,

(参考訳) 本研究では,癌における遺伝子変異の分類のために,LSTM,BiLSTM,CNN,GRU,GloVe埋め込みを相乗的に組み合わせた新しいハイブリッドアンサンブルモデルを提案する。このモデルは、KaggleのPersonalized Medicine: Redefining Cancer Treatmentデータセットを使用して厳格にテストされ、すべての評価指標で例外的なパフォーマンスを示した。特に,トレーニング精度80.6%,適合率81.6%,再現率80.6%,F1スコア83.1%を達成し,平均二乗誤差(MSE)は2.596と大幅に低減した。これらの結果は高度なトランスフォーマーモデルとそのアンサンブルを上回り、遺伝子変異分類の複雑さを扱う上で、我々のモデルの優れた能力を示している。遺伝子変異分類の精度と効率は、個々の遺伝子プロファイルに基づく調整された治療計画が患者の転帰を劇的に改善し、命を救うことができる精密医療の時代において最重要である。本モデルの優れた性能は, がん診断と治療の精度を高める可能性を強調し, パーソナライズされた医療の進歩に大きく貢献する。

In our study, we introduce a novel hybrid ensemble model that synergistically combines LSTM, BiLSTM, CNN, GRU, and GloVe embeddings for the classification of gene mutations in cancer. This model was rigorously tested using Kaggle's Personalized Medicine: Redefining Cancer Treatment dataset, demonstrating exceptional performance across all evaluation metrics. Notably, our approach achieved a training accuracy of 80.6%, precision of 81.6%, recall of 80.6%, and an F1 score of 83.1%, alongside a significantly reduced Mean Squared Error (MSE) of 2.596. These results surpass those of advanced transformer models and their ensembles, showcasing our model's superior capability in handling the complexities of gene mutation classification. The accuracy and efficiency of gene mutation classification are paramount in the era of precision medicine, where tailored treatment plans based on individual genetic profiles can dramatically improve patient outcomes and save lives. Our model's remarkable performance highlights its potential in enhancing the precision of cancer diagnoses and treatments, thereby contributing significantly to the advancement of personalized healthcare.

翻訳日:2024-05-22 00:40:21 公開日:2024-05-20

# ログベース異常検出のための機械学習手法に関する総合的研究

A Comprehensive Study of Machine Learning Techniques for Log-Based Anomaly Detection ( http://arxiv.org/abs/2307.16714v2 )

ライセンス: Link先を確認

Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand,

(参考訳) システム複雑性の増大により、ログベースの異常検出(LAD)など、さまざまなログ分析タスクに特化した自動化技術の必要性が高まっている。LADは文献で広く取り上げられており、主に様々な深層学習技術によって研究されている。深層学習技術には多くの利点があるが、従来の機械学習(ML)技術もコンテキストやデータセットによっては多くのケースでうまく機能する可能性があるため、深層学習技術への注目はある程度恣意的である。同様に、半教師あり技術は明確な実用上の利点を持つため、教師あり技術と同じ注意を払うに値する。さらに、現在の評価は主に検出精度の評価に依存している。しかし、これだけでは、特定のML技術が与えられたコンテキストにおけるLAD問題に対処するのに適しているかどうかを決定するのに十分ではない。その他の考慮すべき側面としては、トレーニング時間や予測時間、および実務上エンジニアにとって重要なハイパーパラメータチューニングに対する感度がある。本稿では,教師あり・半教師あり,従来型・深層のML技術を,検出精度,時間性能,ハイパーパラメータチューニングに対する検出精度の感度,ハイパーパラメータチューニングに対する時間性能の感度の4つの評価基準で評価する包括的な実証研究を示す。実験結果から,教師ありの従来型ML技術と深層ML技術は,検出精度と予測時間において同程度であることがわかった。さらに、全体として、検出精度に関するハイパーパラメータチューニングへの感度分析は、教師ありの従来型ML技術が深層学習技術よりも感度が低いことを示している。さらに、半教師あり技術は教師あり技術よりも検出精度が著しく低い。

Growth in system complexity increases the need for automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of a variety of deep learning techniques. Despite their many advantages, that focus on deep learning techniques is somewhat arbitrary as traditional Machine Learning (ML) techniques may perform well in many cases, depending on the context and datasets. In the same vein, semi-supervised techniques deserve the same attention as supervised techniques since the former have clear practical advantages. Further, current evaluations mostly rely on the assessment of detection accuracy. However, this is not enough to decide whether or not a specific ML technique is suitable to address the LAD problem in a given context. Other aspects to consider include training and prediction times as well as the sensitivity to hyperparameter tuning, which in practice matters to engineers. In this paper, we present a comprehensive empirical study, in which we evaluate supervised and semi-supervised, traditional and deep ML techniques w.r.t. four evaluation criteria: detection accuracy, time performance, sensitivity of detection accuracy and time performance to hyperparameter tuning. The experimental results show that supervised traditional and deep ML techniques fare similarly in terms of their detection accuracy and prediction time. Moreover, overall, sensitivity analysis to hyperparameter tuning w.r.t. detection accuracy shows that supervised traditional ML techniques are less sensitive than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.

翻訳日:2024-05-22 00:40:21 公開日:2024-05-20

# 半無限導波路と結合した原子に基づく量子コヒーレント及び測定フィードバック制御

Quantum coherent and measurement feedback control based on atoms coupled with a semi-infinite waveguide ( http://arxiv.org/abs/2307.16876v2 )

ライセンス: Link先を確認

Haijin Ding, Nina H. Amini, Guofeng Zhang, John E. Gough,

(参考訳) 本稿では,複数の2レベル原子を結合した半無限導波路に基づく原子・フォトニック系の所望の状態を生成するために,量子フィードバック制御が適用可能であることを示す。このセットアップでは、初期励起原子が導波路に1つの光子を放出し、終端ミラーや他の原子によって反射され、原子と光子のコヒーレント相互作用を介して異なるフィードバックループを確立することができる。導波管量子電磁力学(導波管QED)系に少なくとも2つの励起が存在する場合、量子状態の進化はランダムグラフ理論を用いて解釈できる。このプロセスは環境の影響を受けながら,計測に基づくフィードバック制御やコヒーレントドライブによって環境誘起のダイナミクスを排除できることを明らかにする。したがって、オープン系原子-導波路相互作用において、測定に基づくフィードバックは最終的な定常量子状態を変調することができ、同時に、測定プロセスにおけるホモダイン検出ノイズは振動を誘発し、コヒーレントなフィードバック設計によって処理される。

In this paper, we show that quantum feedback control may be applied to generate desired states for atomic and photonic systems based on a semi-infinite waveguide coupled with multiple two-level atoms. In this set-up, an initially excited atom can emit one photon into the waveguide, which can be reflected by the terminal mirror or other atoms to establish different feedback loops via the coherent interactions between the atom and photon. When there are at most two excitations in the waveguide quantum electrodynamics (waveguide QED) system, the evolution of quantum states can be interpreted using random graph theory. This process is influenced by the environment, and we clarify that the environment-induced dynamics can be eliminated by measurement-based feedback control or coherent drives. Thus, in the open system atom-waveguide interactions, measurement-based feedback can modulate the final steady quantum state, while simultaneously, the homodyne detection noise in the measurement process can induce oscillations, which is treated by the coherent feedback designs.

翻訳日:2024-05-22 00:40:21 公開日:2024-05-20

# 連続対称性を持つ新しい畳み込みニューラルネットワークアーキテクチャ

A Novel Convolutional Neural Network Architecture with a Continuous Symmetry ( http://arxiv.org/abs/2308.01621v4 )

ライセンス: Link先を確認

Yao Liu, Hang Shao, Bing Bai,

(参考訳) 本稿では、準線形双曲系と呼ばれる偏微分方程式(PDE)のクラスに着想を得た新しい畳み込みニューラルネットワーク(ConvNet)アーキテクチャを提案する。画像分類タスクで同等の性能を持ちながら、連続的な対称性の群を通じて重みを変更できる。これは、アーキテクチャと重みが本質的に固定された従来のモデルからの大きな転換である。我々は、ニューラルネットワークの新たな望ましい特性として(内部)対称性を推進し、より広範なDeep LearningコミュニティにおいてConvNetを分析・解釈する際のPDEの視点に注意を向けたい。

This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. With comparable performance on the image classification task, it allows for the modification of the weights via a continuous group of symmetry. This is a significant shift from traditional models where the architecture and weights are essentially fixed. We wish to promote the (internal) symmetry as a new desirable property for a neural network, and to draw attention to the PDE perspective in analyzing and interpreting ConvNets in the broader Deep Learning community.

翻訳日:2024-05-22 00:40:21 公開日:2024-05-20

# シーン認識のための意味埋め込み型類似性プロトタイプ

Semantic-embedded Similarity Prototype for Scene Recognition ( http://arxiv.org/abs/2308.05896v3 )

ライセンス: Link先を確認

Chuanxin Song, Hanbo Wu, Xin Ma, Yibin Li,

(参考訳) 複雑な構成とシーン間で共存するオブジェクトによって生じるクラス間類似度の高さのため、多くの研究がシーン認識を改善するためにシーン内のオブジェクトの意味知識を探索してきた。しかし、オブジェクト情報抽出技術は大きな計算コストを必要とし、ネットワークの負担が大きくなるという課題が生じる。この制限により、オブジェクト支援アプローチは実際のデプロイにおいてエッジデバイスと互換性がないことが多い。対照的に,本研究では,シーン認識ネットワークが実用上の計算コストを増大させることなく,より優れた精度を実現するための,意味的知識に基づく類似性プロトタイプを提案する。シンプルで、既存のパイプラインにプラグアンドプレイできる。より具体的には、シーンの意味知識をクラスレベルの意味表現として表すための統計的戦略を導入する。これらの表現はシーンクラス間の相関を探索するために使用され、最終的に類似性プロトタイプを構築する。さらに,この類似性プロトタイプを活用して,Gradient Label SofteningとBatch-level Contrastive Lossの観点からそれぞれネットワークトレーニングを支援することを提案する。複数のベンチマークにおける総合的な評価は、我々の類似性プロトタイプが既存のネットワークの性能を向上させる一方で、実際の展開において追加の計算負荷を回避していることを示している。コードと統計的類似性プロトタイプは https://github.com/ChuanxinSong/SimilarityPrototype で公開される。

Due to the high inter-class similarity caused by the complex composition and the co-existing objects across scenes, numerous studies have explored object semantic knowledge within scenes to improve scene recognition. However, a resulting challenge emerges as object information extraction techniques require heavy computational costs, thereby burdening the network considerably. This limitation often renders object-assisted approaches incompatible with edge devices in practical deployment. In contrast, this paper proposes a semantic knowledge-based similarity prototype, which can help the scene recognition network achieve superior accuracy without increasing the computational cost in practice. It is simple and can be plug-and-played into existing pipelines. More specifically, a statistical strategy is introduced to depict semantic knowledge in scenes as class-level semantic representations. These representations are used to explore correlations between scene classes, ultimately constructing a similarity prototype. Furthermore, we propose to leverage the similarity prototype to support network training from the perspective of Gradient Label Softening and Batch-level Contrastive Loss, respectively. Comprehensive evaluations on multiple benchmarks show that our similarity prototype enhances the performance of existing networks, all while avoiding any additional computational burden in practical deployments. Code and the statistical similarity prototype will be available at https://github.com/ChuanxinSong/SimilarityPrototype

翻訳日:2024-05-22 00:40:21 公開日:2024-05-20

# BEVTrack: 鳥瞰図における3D単一物体追跡のためのシンプルで強力なベースライン

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View ( http://arxiv.org/abs/2309.02185v5 )

ライセンス: Link先を確認

Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha,

(参考訳) 3D単一物体追跡(SOT)はコンピュータビジョンの基本課題であり、自律運転のようなアプリケーションに不可欠である。外観の変化、ディストラクタ、点群の高いスパース性により、ターゲットを周囲から特定することは依然として困難である。これらの問題に対処するために、従来のSiameseトラッカーとモーション中心のトラッカーはいずれも、精巧な設計と複数のサブタスクの解決を必要とする。本稿では,単純で効果的なベースライン手法であるBEVTrackを提案する。 Bird's-Eye View (BEV) においてターゲットの運動を推定して追跡を行うことで、BEVTrackは、ネットワーク設計、トレーニング目標、トラッキングパイプラインといった様々な側面で驚くほどの単純さを示しながら、優れたパフォーマンスを実現している。さらに、様々な属性(例えば、サイズ、動きパターン)を持つターゲットに対する正確な回帰を達成するために、BEVTrackは、以前の研究のように固定されたラプラシアンあるいはガウス的仮定を置くのではなく、異なるターゲットに適応した学習済みの基礎分布を用いて尤度関数を構築する。これにより、トラッキングのための貴重な事前情報が提供され、パフォーマンスがさらに向上する。単純な畳み込みアーキテクチャで単一の回帰損失のみを使用しながら、BEVTrackは3つの大規模データセット(KITTI、NuScenes、Waymo Open Dataset)で最先端のパフォーマンスを実現し、約200FPSの高い推論速度を維持している。コードは https://github.com/xmm-prio/BEVTrack でリリースされる。

3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving. It remains challenging to localize the target from surroundings due to appearance variations, distractors, and the high sparsity of point clouds. To address these issues, prior Siamese and motion-centric trackers both require elaborate designs and solving multiple subtasks. In this paper, we propose BEVTrack, a simple yet effective baseline method. By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance. Besides, to achieve accurate regression for targets with diverse attributes (e.g., sizes and motion patterns), BEVTrack constructs the likelihood function with the learned underlying distributions adapted to different targets, rather than making a fixed Laplacian or Gaussian assumption as in previous works. This provides valuable priors for tracking and thus further boosts performance. While only using a single regression loss with a plain convolutional architecture, BEVTrack achieves state-of-the-art performance on three large-scale datasets, KITTI, NuScenes, and Waymo Open Dataset while maintaining a high inference speed of about 200 FPS. The code will be released at https://github.com/xmm-prio/BEVTrack.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20

# あなたのコードの秘密は私のもの: ニューラルコード補完ツールはハードコードされた認証情報を記憶しうる

Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-Coded Credentials ( http://arxiv.org/abs/2309.07639v2 )

ライセンス: Link先を確認

Yizhan Huang, Yichen Li, Weibin Wu, Jianping Zhang, Michael R. Lyu,

(参考訳) ニューラルコード補完ツール(NCCT)は、言語モデリング技術に基づいて構築され、文脈に関連のあるコードスニペットを正確に提案でき、ソフトウェア工学の分野を再構築した。しかし、言語モデルは適切なプロンプトが与えられると、推論中にトレーニングデータをそのまま出力することがある。この記憶特性は、ハードコードされた認証情報の漏洩に関するNCCTのプライバシー上の懸念を高め、アプリケーション、システム、ネットワークへの不正アクセスにつながる。したがって、NCCTがハードコードされた認証情報を出力するかどうかを調べるために、Hard-coded Credential Revealer (HCR) と呼ばれる評価ツールを提案する。 HCRは認証情報を含むGitHubのコードファイルに基づいてテストプロンプトを構築し、NCCTの記憶現象を明らかにする。そして、HCRは形式が不正な認証情報を除外する4つのフィルタを設計する。最後に、HCRは、一連の非機密認証情報の有効性を直接チェックする。商用NCCT,オープンソースモデル,コード補完機能を備えたチャットボットの3種類の代表的なNCCTの評価にHCRを適用した。実験の結果,NCCTはトレーニングデータの正確な断片を返すだけでなく,意図せず追加の秘密文字列を漏洩させうることがわかった。特に,実験中に2つの有効な認証情報が確認された。したがって、HCRは、商用NCCTのトレーニングデータに含まれるハードコードされた認証情報が漏洩する可能性があるという深刻なプライバシー上の懸念を提起する。すべてのアーティファクトとデータは、将来の研究目的のために https://github.com/HCR-Repo/HCR でリリースされている。

Neural Code Completion Tools (NCCTs) have reshaped the field of software engineering, which are built upon the language modeling technique and can accurately suggest contextually relevant code snippets. However, language models may emit the training data verbatim during inference with appropriate prompts. This memorization property raises privacy concerns of NCCTs about hard-coded credential leakage, leading to unauthorized access to applications, systems, or networks. Therefore, to answer whether NCCTs will emit the hard-coded credential, we propose an evaluation tool called Hard-coded Credential Revealer (HCR). HCR constructs test prompts based on GitHub code files with credentials to reveal the memorization phenomenon of NCCTs. Then, HCR designs four filters to filter out ill-formatted credentials. Finally, HCR directly checks the validity of a set of non-sensitive credentials. We apply HCR to evaluate three representative types of NCCTs: Commercial NCCTs, open-source models, and chatbots with code completion capability. Our experimental results show that NCCTs can not only return the precise piece of their training data but also inadvertently leak additional secret strings. Notably, two valid credentials were identified during our experiments. Therefore, HCR raises a severe privacy concern about the potential leakage of hard-coded credentials in the training data of commercial NCCTs. All artifacts and data are released for future research purposes in https://github.com/HCR-Repo/HCR.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20

# Draft & Verify: 自己投機的デコーディングによるロスレス大規模言語モデルの高速化

Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding ( http://arxiv.org/abs/2309.08168v2 )

ライセンス: Link先を確認

Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Sharad Mehrotra,

(参考訳) 本稿では、補助モデルを必要とせずに大規模言語モデル(LLM)を高速化する新しい推論手法、自己投機的デコーディングを提案する。このアプローチの特徴は、ドラフトと検証という2段階のプロセスである。ドラフト段階では、特定の中間層を選択的にスキップすることで、品質はわずかに低いがより高速にドラフトトークンを生成する。その後、検証段階では、元のLLMを用いて、これらのドラフト出力トークンを1回のフォワードパスで検証する。このプロセスにより、最終的な出力は未修正のLLMが生成する出力と同一であることが保証される。さらに、提案手法は追加のニューラルネットワークトレーニングも追加のメモリフットプリントも必要としないため、推論高速化のためのプラグアンドプレイで費用対効果の高いソリューションとなる。 LLaMA-2とその変種によるベンチマークでは、最大1.99$\times$の高速化が示された。

We present a novel inference scheme, self-speculative decoding, for accelerating Large Language Models (LLMs) without the need for an auxiliary model. This approach is characterized by a two-stage process: drafting and verification. The drafting stage generates draft tokens at a slightly lower quality but more quickly, which is achieved by selectively skipping certain intermediate layers during drafting. Subsequently, the verification stage employs the original LLM to validate those draft output tokens in one forward pass. This process ensures the final output remains identical to that produced by the unaltered LLM. Moreover, the proposed method requires no additional neural network training and no extra memory footprint, making it a plug-and-play and cost-effective solution for inference acceleration. Benchmarks with LLaMA-2 and its variants demonstrated a speedup up to 1.99$\times$.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
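The draft-and-verify loop described in the Draft & Verify abstract above can be illustrated with a minimal sketch. It assumes greedy decoding and uses two toy next-token functions (`full_model` and `draft_model`, both hypothetical) in place of a real LLM and its layer-skipping variant. Verification is done token by token here for readability, whereas the abstract describes checking all drafted tokens in one forward pass. This is an illustration of the idea, not the authors' implementation.

```python
from typing import Callable, List

# A "model" here is any function that maps a token prefix to the next token
# under greedy decoding. Both models below are toy stand-ins (assumptions).
NextToken = Callable[[List[int]], int]


def full_model(prefix: List[int]) -> int:
    """Toy stand-in for the unaltered LLM: a deterministic next-token rule."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_model(prefix: List[int]) -> int:
    """Toy stand-in for the faster, slightly lower-quality draft model."""
    token = full_model(prefix)
    # Deliberately disagree on some prefixes to mimic draft errors.
    return (token + 1) % 50 if len(prefix) % 4 == 0 else token


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference output."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(full: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, draft_len: int = 4) -> List[int]:
    """Draft several tokens cheaply, then keep only what the full model confirms."""
    tokens = list(prompt)
    target = len(prompt) + max_new_tokens
    while len(tokens) < target:
        # Drafting stage: propose up to draft_len tokens with the cheap model.
        k = min(draft_len, target - len(tokens))
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))
        # Verification stage: accept the longest prefix the full model agrees
        # with, and replace the first mismatch with the full model's token.
        accepted: List[int] = []
        for tok in drafted:
            correct = full(tokens + accepted)
            accepted.append(correct)
            if tok != correct:
                break
        tokens.extend(accepted)
    return tokens
```

Because every kept token equals the full model's greedy choice for its prefix, the result matches `greedy_decode(full_model, ...)` exactly; any speedup in a real system comes from the draft being cheaper and the verification being batched.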

# 点拡散モデルを用いた大腸3次元形状再構成によるデジタルファントム生成

Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation ( http://arxiv.org/abs/2309.08289v2 )

ライセンス: Link先を確認

Kaouther Mouheb, Mobina Ghojogh Nejad, Lavsen Dahal, Ehsan Samei, Kyle J. Lafata, W. Paul Segars, Joseph Y. Lo,

(参考訳) 人間の臓器の正確な3Dモデリングは、仮想イメージング試験のための計算ファントムの構築において重要な役割を担っている。しかし、CTスキャンから臓器表面の解剖学的に妥当な再構成を生成することは、人体の多くの構造にとって依然として困難である。この課題は大腸を扱う際に特に顕著である。本研究では、幾何学的深層学習とデノイジング拡散確率モデルの近年の進歩を活用して、大腸のセグメンテーション結果を改良する。まず、臓器を3Dセグメンテーションマスクの表面からサンプリングした点群として表現する。次に、階層的変分オートエンコーダを用いて、臓器形状のグローバルおよびローカルな潜在表現を得る。そして、階層的潜在空間において2つの条件付きデノイジング拡散モデルを訓練し、形状の改良を行う。手法をさらに強化するため、得られた完全な点群から滑らかなメッシュを生成できる最先端の表面再構成モデルを組み込む。実験の結果、臓器形状のグローバルな分布と微細な細部の両方を捉えるうえで、本アプローチが有効であることが示された。我々の改良パイプライン全体は、初期セグメンテーションと比べて表面表現を大きく向上させ、Chamfer距離を70%、Hausdorff距離を32%、Earth Mover's距離を6%削減した。幾何学的深層学習、デノイジング拡散モデル、高度な表面再構成技術を組み合わせることで、提案手法は大腸表面を正確にモデル化するための有望な解決策となり、他の解剖学的構造にも容易に拡張できる。

Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
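The abstract above reports improvements in Chamfer and Hausdorff distance between point clouds. As a rough illustration of what these two metrics measure, here is a small pure-Python sketch. It assumes one common convention (unsquared Euclidean nearest-neighbour distances, with the Chamfer distance taken as the sum of the two directional means); the paper may use a different normalization, so the function names and conventions are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbour in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Sum of the mean nearest-neighbour distances in both directions."""
    a_to_b = sum(_nearest(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest nearest-neighbour distance in either direction (worst case)."""
    return max(max(_nearest(p, b) for p in a),
               max(_nearest(q, a) for q in b))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    print(chamfer_distance(reference, predicted))
    print(hausdorff_distance(reference, predicted))
```

The Chamfer distance averages over all points and so reflects overall surface agreement, while the Hausdorff distance is driven by the single worst outlier. This brute-force version is quadratic in the number of points; large clouds would normally use a spatial index.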

# 適応的優先度リウェイングによる公正分類器の一般化の促進

Boosting Fair Classifier Generalization through Adaptive Priority Reweighing ( http://arxiv.org/abs/2309.08375v3 )

ライセンス: Link先を確認

Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He,

(参考訳) 重要な意思決定領域への機械学習アプリケーションの浸透に伴い、アルゴリズム的公正性を求める声が高まっている。公正性制約付きの学習によってアルゴリズム的公正性を改善する様々な手法が存在するが、それらの性能はテストセットにうまく一般化しない。より優れた一般化性を持つ、性能面で有望な公正アルゴリズムが必要である。本稿では、トレーニングデータとテストデータ間の分布シフトがモデルの一般化性に与える影響を取り除く、新しい適応的リウェイング手法を提案する。従来のリウェイング手法のほとんどは、各(部分)群に一律の重みを割り当てることを提案している。これに対し、本手法はサンプルの予測から決定境界までの距離をきめ細かくモデル化する。我々の適応的リウェイング手法は、決定境界に近いサンプルを優先してより高い重みを割り当て、公正な分類器の一般化性を向上させる。表形式データのベンチマークにおいて、精度と公正性指標(機会均等、等化オッズ、人口統計的パリティ)に関する適応的優先度リウェイング手法の一般化性を検証するため、広範な実験を行った。また、言語モデルと視覚モデルの公正性の向上における本手法の性能も示す。コードは https://github.com/che2198/APW で公開されている。

With the increasing penetration of machine learning applications in critical decision-making areas, calls for algorithmic fairness are more prominent. Although there have been various modalities to improve algorithmic fairness through learning with fairness constraints, their performance does not generalize well in the test set. A performance-promising fair algorithm with better generalizability is needed. This paper proposes a novel adaptive reweighing method to eliminate the impact of the distribution shifts between training and test data on model generalizability. Most previous reweighing methods propose to assign a unified weight for each (sub)group. Rather, our method granularly models the distance from the sample predictions to the decision boundary. Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers. Extensive experiments are performed to validate the generalizability of our adaptive priority reweighing method for accuracy and fairness measures (i.e., equal opportunity, equalized odds, and demographic parity) in tabular benchmarks. We also highlight the performance of our method in improving the fairness of language and vision models. The code is available at https://github.com/che2198/APW.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20
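The idea stated in the abstract above, giving higher weight to samples closer to the decision boundary instead of one uniform weight per (sub)group, can be sketched as follows. The exponential weighting form, the `alpha` parameter, and the per-group normalization are illustrative assumptions chosen for this sketch and are not taken from the paper or its released code.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def priority_weights(margins: Sequence[float],
                     groups: Sequence[Hashable],
                     alpha: float = 1.0) -> List[float]:
    """Weight samples by closeness to the decision boundary within each group.

    margins: signed distance of each sample's prediction from the boundary.
    groups:  subgroup identifier of each sample (e.g. attribute/label pair).
    alpha:   hypothetical sharpness parameter; larger values concentrate
             weight on samples nearest the boundary.
    """
    # Raw priority decays with the absolute distance to the boundary.
    raw = [math.exp(-alpha * abs(m)) for m in margins]

    # Normalize inside each group so that a group's total weight equals its
    # size, which keeps the overall group balance unchanged.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(raw, groups):
        totals[g] += r
        counts[g] += 1
    return [r * counts[g] / totals[g] for r, g in zip(raw, groups)]


if __name__ == "__main__":
    margins = [0.1, 2.0, -0.2, 1.5]
    groups = ["a", "a", "b", "b"]
    print(priority_weights(margins, groups))
```

With `alpha = 0` this reduces to a uniform weight of 1 per sample, which corresponds to the group-level reweighing that the abstract contrasts against.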

# スクリーンだけ見る:マルチモーダル・チェーン・オブ・アクション・エージェント

You Only Look at Screens: Multimodal Chain-of-Action Agents ( http://arxiv.org/abs/2309.11436v3 )

ライセンス: Link先を確認

Zhuosheng Zhang, Aston Zhang,

(参考訳) 自律型グラフィカルユーザインタフェース(GUI)エージェントは、手作業による介入なしにユーザインタフェースと対話することで、タスクの自動化を促進することを目的としている。近年の研究では、多様な環境で効果的に動作できるよう、大規模言語モデル(LLM)の能力を引き出すことが検討されている。 LLMの入出力要件に合わせるため、既存のほとんどのアプローチはサンドボックス環境下で開発され、外部ツールやアプリケーション固有のAPIに依存して、環境をテキスト要素に解析し、予測されたアクションを解釈する。その結果、これらのアプローチは推論の非効率性とエラー伝播のリスクに悩まされることが多い。これらの課題を軽減するため、我々はAuto-GUIを導入する。Auto-GUIはインターフェースと直接対話するマルチモーダルソリューションであり、環境解析やアプリケーション依存APIへの依存を回避する。さらに、エージェントが実行すべきアクションを決定するのを助けるために、一連の過去のアクション履歴と将来のアクション計画を活用するチェーン・オブ・アクション手法を提案する。我々は、アプリケーション操作、Web検索、Webショッピングといったマルチステップタスクにまたがる、30$K$のユニークな命令を持つ新しいデバイス制御ベンチマークAITWでアプローチを評価した。実験の結果、Auto-GUIはアクションタイプ予測精度90\%、総合アクション成功率74\%で最先端性能を達成することがわかった。コードは https://github.com/cooelf/Auto-GUI で公開されている。

Autonomous graphical user interface (GUI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LLMs) for effective engagement in diverse environments. To align with the input-output requirement of LLMs, most existing approaches are developed under a sandbox setting where they rely on external tools and application-specific APIs to parse the environment into textual elements and interpret the predicted actions. Consequently, those approaches often grapple with inference inefficiency and error propagation risks. To mitigate the challenges, we introduce Auto-GUI, a multimodal solution that directly interacts with the interface, bypassing the need for environment parsing or reliance on application-dependent APIs. Moreover, we propose a chain-of-action technique -- leveraging a series of intermediate previous action histories and future action plans -- to help the agent decide what action to execute. We evaluate our approach on a new device-control benchmark AITW with 30$K$ unique instructions, spanning multi-step tasks such as application operation, web searching, and web shopping. Experimental results show that Auto-GUI achieves state-of-the-art performance with an action type prediction accuracy of 90\% and an overall action success rate of 74\%. Code is publicly available at https://github.com/cooelf/Auto-GUI.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20

# Jury: 総合評価ツールキット

Jury: A Comprehensive Evaluation Toolkit ( http://arxiv.org/abs/2310.02040v2 )

ライセンス: Link先を確認

Devrim Cavusoglu, Secil Sen, Ulas Sert, Sinan Altinuc,

(参考訳) 評価は、あらゆる予測ベースのシステムの基本的な構成要素として、ディープラーニングにおいて重要な役割を果たす。しかし、自然言語処理(NLP)タスクの数が膨大であり、様々なメトリクスが開発されてきたことから、異なるシステムを異なるメトリクスで評価することが課題となっている。これらの課題に対処するために、さまざまなタスクやメトリクスにわたって評価を行うための標準化された構造を備えた、統一的な評価フレームワークを提供するツールキットであるjuryを紹介する。 juryの目的は、すべてのシステムに対するメトリクス評価を標準化および改善し、コミュニティが評価上の課題を克服するのを支援することである。オープンソースとして公開されて以来、juryは幅広いユーザに利用されており、https://github.com/obss/jury で入手できる。

Evaluation plays a critical role in deep learning as a fundamental block of any prediction-based system. However, the vast number of Natural Language Processing (NLP) tasks and the development of various metrics have led to challenges in evaluating different systems with different metrics. To address these challenges, we introduce jury, a toolkit that provides a unified evaluation framework with standardized structures for performing evaluation across different tasks and metrics. The objective of jury is to standardize and improve metric evaluation for all systems and aid the community in overcoming the challenges in evaluation. Since its open-source release, jury has reached a wide audience and is available at https://github.com/obss/jury.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20

# 振動による浮遊型マイクロ磁気シリンダの線形冷却

Linear cooling of a levitated micromagnetic cylinder by vibration ( http://arxiv.org/abs/2310.03880v2 )

ライセンス: Link先を確認

Chris Timberlake, Elliot Simcox, Hendrik Ulbricht,

(参考訳) 我々は、圧電アクチュエータを用いて高Qの機械的モードに線形フィードバックを適用することにより、浮遊マイクロマグネットシリンダの並進自由度と秤動自由度のフィードバック冷却を報告する。ノーマルモードは、DC SQUIDに結合した超伝導ピックアップコイルを用いて測定され、位相情報が圧電アクチュエータにフィードバックされることで、重心モードを約7 Kに、秤動モードを $830 \pm 200$ mKにフィードバック冷却する。重心モードについては $1.0 \times 10^7$ のQ値が評価される。振動絶縁を導入し、ピックアップコイルの形状を対象のモードに合わせて最適化し、検出に最先端のSQUIDを利用することで、重心モードの基底状態冷却が達成可能と考えられることがわかった。

We report feedback cooling of translational and librational degrees of freedom of a levitated micromagnet cylinder, utilizing a piezoelectric actuator to apply linear feedback to high-Q mechanical modes. The normal modes are measured with a superconducting pick-up coil coupled to a DC SQUID, and phase information is fed back to the piezoelectric actuator to feedback cool a center-of-mass mode to $\sim$7 K, and a librational mode to $830 \pm 200$ mK. Q-factors of $1.0 \times 10^7$ are evaluated for the center-of-mass mode. We find that it is plausible to achieve ground state cooling of the center-of-mass mode by introducing vibration isolation, optimizing the geometry of the pick-up coil to focus on the specific mode of interest and utilizing a state-of-the-art SQUID for detection.

翻訳日:2024-05-22 00:30:29 公開日:2024-05-20

# Shapley値に基づく統合勾配の新しいベースライン仮定

A New Baseline Assumption of Integated Gradients Based on Shaply value ( http://arxiv.org/abs/2310.04821v3 )

ライセンス: Link先を確認

Shuyang Liu, Zixuan Chen, Ge Shi, Ji Wang, Changjie Fan, Yu Xiong, Runze Wu, Yujing Hu, Ze Ji, Yang Gao,

(参考訳) ディープニューラルネットワーク(DNN)を解読する試みでは、多くの場合、予測を入力特徴にマッピングし直す。これらの手法の中で、統合勾配(IG)は重要な手法として登場した。 IGにおける適切なベースラインの選択は、多様な設定においてモデル予測の有意義で偏りのない説明を作成するために不可欠である。しかし、単一のベースラインを用いる標準的なアプローチはしばしば不十分であり、複数のベースラインが必要となる。我々は、IGとAumann-Shapley値の自然な結びつきを活用し、ベースライン設計に関する新たな視点を提供する。理論的には、ある仮定の下で、ベースラインの集合がShapley値で記述される提携(coalition)と対応することを示す。この知見に基づき、比例サンプリングを用いてShapley値の計算過程を反映する、Shapley Integrated Gradients (SIG) と呼ばれる新しいベースライン手法を開発した。 GridWorldで実施したシミュレーションにより、SIGがShapley値の分布を効果的に模倣することを検証した。さらに、様々な画像処理タスクに関する実証テストでは、SIGが従来のIGベースライン手法を上回り、特徴寄与のより正確な推定、異なるアプリケーション間での一貫した説明、無視できる程度の追加計算コストでの多様なデータタイプへの適応性を実現することが示された。

Efforts to decode deep neural networks (DNNs) often involve mapping their predictions back to the input features. Among these methods, Integrated Gradients (IG) has emerged as a significant technique. The selection of appropriate baselines in IG is crucial for crafting meaningful and unbiased explanations of model predictions in diverse settings. The standard approach of utilizing a single baseline, however, is frequently inadequate, prompting the need for multiple baselines. Leveraging the natural link between IG and the Aumann-Shapley Value, we provide a novel outlook on baseline design. Theoretically, we demonstrate that under certain assumptions, a collection of baselines aligns with the coalitions described by the Shapley Value. Building on this insight, we develop a new baseline method called Shapley Integrated Gradients (SIG), which uses proportional sampling to mirror the Shapley Value computation process. Simulations conducted in GridWorld validate that SIG effectively emulates the distribution of Shapley Values. Moreover, empirical tests on various image processing tasks show that SIG surpasses traditional IG baseline methods by offering more precise estimates of feature contributions, providing consistent explanations across different applications, and ensuring adaptability to diverse data types with negligible additional computational demand.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20
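The abstract above builds on Integrated Gradients (IG) with more than one baseline. The sketch below shows plain IG approximated with a midpoint Riemann sum and then averaged over a list of baselines. The toy function `f`, its hand-written gradient, and the baselines are assumptions for illustration; the averaging is uniform and does not implement the paper's proportional sampling scheme (SIG).

```python
from typing import Callable, List, Sequence

Vector = List[float]


def f(x: Sequence[float]) -> float:
    """Toy differentiable model (an assumption): f(x) = x0*x1 + x2^2."""
    return x[0] * x[1] + x[2] ** 2


def grad_f(x: Sequence[float]) -> Vector:
    """Analytic gradient of the toy model above."""
    return [x[1], x[0], 2.0 * x[2]]


def integrated_gradients(grad_fn: Callable[[Sequence[float]], Vector],
                         x: Sequence[float],
                         baseline: Sequence[float],
                         steps: int = 200) -> Vector:
    """Approximate IG along the straight path from baseline to x."""
    n = len(x)
    grad_sum = [0.0] * n
    for s in range(steps):
        t = (s + 0.5) / steps  # midpoint of the s-th sub-interval of [0, 1]
        point = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            grad_sum[i] += g[i]
    # Attribution_i = (x_i - baseline_i) * average gradient along the path.
    return [(xi - b) * gs / steps for xi, b, gs in zip(x, baseline, grad_sum)]


def multi_baseline_ig(grad_fn: Callable[[Sequence[float]], Vector],
                      x: Sequence[float],
                      baselines: Sequence[Sequence[float]],
                      steps: int = 200) -> Vector:
    """Uniform average of IG attributions over several baselines."""
    per_baseline = [integrated_gradients(grad_fn, x, b, steps) for b in baselines]
    return [sum(col) / len(baselines) for col in zip(*per_baseline)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print(integrated_gradients(grad_f, x, [0.0, 0.0, 0.0]))
    print(multi_baseline_ig(grad_f, x, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]))
```

A useful sanity check is the completeness property: for a single baseline the attributions sum to `f(x) - f(baseline)`, up to the numerical integration error.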

# UniParser: 相関表現学習を統一したマルチヒューマンパーシング

UniParser: Multi-Human Parsing with Unified Correlation Representation Learning ( http://arxiv.org/abs/2310.08984v2 )

ライセンス: Link先を確認

Jiaming Chu, Lei Jin, Junliang Xing, Jian Zhao,

(参考訳) マルチヒューマンパーシング(Multi- Human parsing)は、インスタンスレベルと詳細なカテゴリレベルの情報の両方を必要とするイメージセグメンテーションタスクである。しかし、以前の研究では、これらの2つのタイプの情報を別々のブランチと異なる出力形式で処理し、非効率で冗長なフレームワークを生み出した。本稿では、インスタンスレベルとカテゴリレベルの表現を3つの重要な側面に統合するUniParserを紹介する。 1)コサイン空間内のインスタンスやカテゴリの特徴をネットワークで学べる統合された相関表現学習手法を提案する。 2)各モジュールの出力の形式を画素レベルのセグメンテーション結果として統一するとともに,補助的損失を伴う同種ラベルを用いてインスタンスとカテゴリの特徴を監督する。 3)インスタンスとカテゴリ表現を融合させる共同最適化手法を設計する。インスタンスレベルの出力とカテゴリレベルの出力を統合することで、UniParserは手動で設計した後処理技術を回避し、最先端の手法を超越し、MHPv2.0では49.3%のAP、CIHPでは60.4%のAPを達成した。私たちは、将来の研究を促進するために、ソースコード、事前訓練されたモデル、オンラインデモをリリースします。

Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through separate branches and distinct output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the form of outputs of each module as pixel-level segmentation results while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By virtue of unifying instance-level and category-level output, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We will release our source code, pretrained models, and online demos to facilitate future studies.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# 人間の記憶機構にインスパイアされた推論のためのフレームワーク

A Framework for Inference Inspired by Human Memory Mechanisms ( http://arxiv.org/abs/2310.09297v2 )

ライセンス: Link先を確認

Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang,

(参考訳) 人間と機械が、知覚した情報を過去の記憶の文脈に位置づけながら、関係推論や質問応答のために現在の入力をどのように理解するかは、認知科学と人工知能における難問である。人間の脳の記憶システムと認知アーキテクチャに着想を得て、知覚、記憶、推論の構成要素からなるPMIフレームワークを提案する。特に、記憶モジュールはワーキングメモリと長期記憶から構成され、後者は広範で複雑な関係知識と経験を保持するための高次構造を備えている。微分可能な競合的書き込みアクセスを通じて、現在の知覚がワーキングメモリを更新し、その後、外積による連想を通じて長期記憶に統合されることで、情報の衝突を減らし、メモリオーバーフローを回避する。推論モジュールでは、2つの別々の記憶源から関連情報を検索し、連想的に統合することで、現在の知覚をより包括的かつ正確に解釈する。我々は、bAbI-20kやSort-of-CLEVRデータセットなどの質問応答タスク、ならびに正三角形の検出、言語モデリング、画像分類タスクにおいて、既存のTransformerやCNNモデルを改善するためにPMIを探索的に適用し、いずれの場合も、PMIで強化したモデルが元のモデルを一貫して大きく上回った。可視化分析により、関係記憶の固定化と、多様な記憶源からの情報の相互作用および統合が、推論タスクにおけるモデルの有効性に大きく寄与することが明らかになった。

How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain extensive and complex relational knowledge and experience. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, reducing information conflicts and averting memory overflow. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as detecting equilateral triangles, language modeling and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that relational memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# ビッグデータコンテキストにおけるK平均クラスタリング最適化手法の比較分析: レビュー

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review ( http://arxiv.org/abs/2310.09819v3 )

ライセンス: Link先を確認

Ravil Mussabayev, Rustam Mussabayev,

(参考訳) 本稿では,ビッグデータの文脈におけるK平均アルゴリズムの最適化手法の比較分析を行う。 K-meansはクラスタリングアルゴリズムとして広く使用されているが、大規模なデータセットを扱う場合、スケーラビリティの問題に悩まされる可能性がある。本稿では、並列化、近似、サンプリング方法など、これらの問題を克服するための様々なアプローチについて検討する。著者らは、多数のベンチマークデータセット上で、様々なクラスタリング技術の性能を評価し、それらを"less is more"アプローチ(LIMA)によって提供される支配的基準、すなわち、スピード、クラスタリング品質、単純さの次元に沿って同時に比較した。その結果、異なる手法がデータセットの異なるタイプに適していることが示され、ビッグデータのK平均クラスタリングにおける速度と精度のトレードオフに関する洞察を提供する。全体として、この論文は、ビッグデータアプリケーションにK平均をどのように最適化するかについて、実践者や研究者に包括的なガイドを提供する。

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods. The authors evaluate the performance of various clustering techniques on a large number of benchmark datasets, comparing them according to the dominance criterion provided by the "less is more" approach (LIMA), i.e., simultaneously along the dimensions of speed, clustering quality, and simplicity. The results show that different techniques are more suitable for different types of datasets and provide insights into the trade-offs between speed and accuracy in K-means clustering for big data. Overall, the paper offers a comprehensive guide for practitioners and researchers on how to optimize K-means for big data applications.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# KI-PMF:知識統合可塑性運動予測

KI-PMF: Knowledge Integrated Plausible Motion Forecasting ( http://arxiv.org/abs/2310.12007v2 )

ライセンス: Link先を確認

Abhishek Vivekanandan, Ahmed Abouelazm, Philip Schörner, J. Marius Zöllner,

(参考訳) 交通参加者の動きを正確に予測することは、自動運転車の大規模な展開に不可欠である。現在の軌道予測アプローチは、主に特定の指標で損失関数を最適化することに集中しており、その結果、物理法則に従わない、あるいは外部の制約に違反する予測が生じる可能性がある。我々の目的は、ネットワークが車両の運動学的制約と運転環境の幾何形状の両方に従って将来の軌跡を予測できるように、明示的な事前知識を組み込むことである。これを実現するために、定義した事前知識を統合するための非パラメトリックなプルーニング層とアテンション層を導入する。提案手法は、複雑な状況と動的な状況の両方において、交通参加者の到達可能性を保証するように設計されている。ネットワークが物理法則に従うよう条件付けることで、現実の環境で自動運転車の安全性と効率を維持するうえで不可欠な、正確かつ安全な予測が得られる。要約すると、本稿は、トレーニングプロセスに事前知識を組み込むことで、安全で信頼性の高い動き予測のために道路外への予測を防止する概念を提示する。

Accurately forecasting the motion of traffic actors is crucial for the deployment of autonomous vehicles at a large scale. Current trajectory forecasting approaches primarily concentrate on optimizing a loss function with a specific metric, which can result in predictions that do not adhere to physical laws or violate external constraints. Our objective is to incorporate explicit knowledge priors that allow a network to forecast future trajectories in compliance with both the kinematic constraints of a vehicle and the geometry of the driving environment. To achieve this, we introduce a non-parametric pruning layer and attention layers to integrate the defined knowledge priors. Our proposed method is designed to ensure reachability guarantees for traffic actors in both complex and dynamic situations. By conditioning the network to follow physical laws, we can obtain accurate and safe predictions, essential for maintaining autonomous vehicles' safety and efficiency in real-world settings. In summary, this paper presents concepts that prevent off-road predictions for safe and reliable motion forecasting by incorporating knowledge priors into the training process.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# 交差共鳴相互作用を超えた、固定結合・固定周波数トランスモンにおけるネイティブ2量子ビットゲート

Native two-qubit gates in fixed-coupling, fixed-frequency transmons beyond cross-resonance interaction ( http://arxiv.org/abs/2310.12146v2 )

ライセンス: Link先を確認

Ken Xuan Wei, Isaac Lauer, Emily Pritchett, William Shanks, David C. McKay, Ali Javadi-Abhari,

(参考訳) 固定周波数超伝導量子ビットは、安定かつスケーラブルな量子コンピューティングのプラットフォームとして目覚ましい成功を収めている。交差共鳴ゲートは、固定結合・固定周波数の超伝導プロセッサの主力であり、一方の量子ビットを隣接量子ビットの周波数で共鳴駆動することで生じるエンタングルメントを利用して、高忠実度で普遍的なCNOTを実現している。ここでは、共鳴および非共鳴のマイクロ波駆動を用いて交差共鳴を超え、CNOTと等価でない興味深い2量子ビットゲートをネイティブに実現する。特に、ネイティブなISWAP、SWAP、$\sqrt{\text{ISWAP}}$、BSWAPゲートを実装し、ベンチマークする。さらに、これらの手法を応用して、わずか2回の適用で任意の2量子ビットゲートに到達できる完全エンタングラーであるBゲートを効率的に構成する。これらのネイティブな2量子ビットゲートは、交差共鳴ゲートからコンパイルした同等のゲートよりも優れていることを示す。各2量子ビットゲートの駆動に必要な共鳴条件を明らかにし、それらをQiskitで実装するための新しいフレームトラッキング技術を提供する。

Fixed-frequency superconducting qubits demonstrate remarkable success as platforms for stable and scalable quantum computing. Cross-resonance gates have been the workhorse of fixed-coupling, fixed-frequency superconducting processors, leveraging the entanglement generated by driving one qubit resonantly with a neighbor's frequency to achieve high-fidelity, universal CNOTs. Here, we use on-resonant and off-resonant microwave drives to go beyond cross-resonance, realizing natively interesting two-qubit gates that are not equivalent to CNOTs. In particular, we implement and benchmark native ISWAP, SWAP, $\sqrt{\text{ISWAP}}$, and BSWAP gates. Furthermore, we apply these techniques for an efficient construction of the B-gate: a perfect entangler from which any two-qubit gate can be reached in only two applications. We show these native two-qubit gates are better than their counterparts compiled from cross-resonance gates. We elucidate the resonance conditions required to drive each two-qubit gate and provide a novel frame tracking technique to implement them in Qiskit.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# タンパク質-リガンド構造予測モデルの可能性を引き出す、HelixDockによる大規模生成ドッキングコンフォメーションでの事前学習

Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models ( http://arxiv.org/abs/2310.13913v3 )

ライセンス: Link先を確認

Lihang Liu, Shanzhuo Zhang, Donglong He, Xianbin Ye, Jingbo Zhou, Xiaonan Zhang, Yaoyao Jiang, Weiming Diao, Hang Yin, Hua Chai, Fan Wang, Jingzhou He, Liang Zheng, Yonghui Li, Xiaomin Fang,

(参考訳) タンパク質-リガンド構造予測は、小分子(リガンド)と標的タンパク質(受容体)の結合相互作用を予測する、創薬における必須の課題である。近年の進歩では、タンパク質-リガンド構造予測の精度を向上させるためにディープラーニング技術が取り入れられている。しかし、ドッキングコンフォメーションの実験的な検証は依然としてコストが高く、訓練データが限られるため、これらの深層学習手法の一般化可能性に関する懸念が生じている。本研究では、従来の物理ベースのドッキングツールで生成した大規模なドッキングコンフォメーションで事前学習を行い、その後、実験的に検証された限られた受容体-リガンド複合体で微調整を行うことにより、優れた性能を持つタンパク質-リガンド構造予測モデルが得られることを示す。具体的には、このプロセスではタンパク質とリガンドの組み合わせに対して1億のドッキングコンフォメーションを生成し、約100万CPUコア日を要した。提案モデルであるHelixDockは、物理ベースのドッキングツールに内包された物理知識を事前学習段階で獲得することを目的としている。 HelixDockは、物理ベースのベースラインとディープラーニングベースのベースラインの両方に対して厳密にベンチマークされ、結合コンフォメーションの予測において、卓越した精度と堅牢な転移性を示した。さらに、本研究は、事前学習したタンパク質-リガンド構造予測モデルを支配するスケーリング則を明らかにし、モデルパラメータと事前学習データ量の増加に伴って性能が一貫して向上することを示している。加えて、HelixDockをいくつかの創薬関連タスクに適用し、その実用性を検証した。 HelixDockは、クロスドッキングと構造ベースの仮想スクリーニングの両方のベンチマークで優れた能力を示している。

Protein-ligand structure prediction is an essential task in drug discovery, predicting the binding interactions between small molecules (ligands) and target proteins (receptors). Recent advances have incorporated deep learning techniques to improve the accuracy of protein-ligand structure prediction. Nevertheless, the experimental validation of docking conformations remains costly, which raises concerns regarding the generalizability of these deep learning-based methods due to the limited training data. In this work, we show that by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning with a limited set of experimentally validated receptor-ligand complexes, we can obtain a protein-ligand structure prediction model with outstanding performance. Specifically, this process involved the generation of 100 million docking conformations for protein-ligand pairings, an endeavor consuming roughly 1 million CPU core days. The proposed model, HelixDock, aims to acquire the physical knowledge encapsulated by the physics-based docking tools during the pre-training phase. HelixDock has been rigorously benchmarked against both physics-based and deep learning-based baselines, demonstrating its exceptional precision and robust transferability in predicting binding conformations. In addition, our investigation reveals the scaling laws governing pre-trained protein-ligand structure prediction models, indicating a consistent enhancement in performance with increases in model parameters and the volume of pre-training data. Moreover, we applied HelixDock to several drug discovery-related tasks to validate its practical utility. HelixDock demonstrates outstanding capabilities on both cross-docking and structure-based virtual screening benchmarks.

翻訳日:2024-05-22 00:20:28 公開日:2024-05-20

# 時系列因果グラフの抽象化による全効果の同定可能性

Identifiability of total effects from abstractions of time series causal graphs ( http://arxiv.org/abs/2310.14691v5 )

ライセンス: Link先を確認

Charles K. Assaad, Emilie Devijver, Eric Gaussier, Gregor Gössler, Anouar Meynaoui,

(参考訳) 本稿では、真の因果グラフの抽象化にしかアクセスできないという実際によくある状況において、観測時系列から介入の全効果を識別できるかという問題を研究する。ここでは2つの抽象化を考える。すなわち、すべてのラグ付き因果関係を一つにまとめるが、ラグ付き関係と瞬時関係は区別する拡張要約因果グラフと、因果関係間のラグについて何の情報も与えない要約因果グラフである。拡張要約因果グラフでは全効果が常に識別可能であることを示し、要約因果グラフにおける識別可能性の十分条件を与える。さらに、全効果が識別可能な場合に、それを推定するための調整集合も与える。

We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations, and the summary causal graph, which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and provide sufficient conditions for identifiability in summary causal graphs. We furthermore provide adjustment sets that allow the total effect to be estimated whenever it is identifiable.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# タブラルデータクエリと可視化のための自然言語インタフェース:サーベイ

Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey ( http://arxiv.org/abs/2310.17894v3 )

ライセンス: Link先を確認

Weixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin Yang,

(参考訳) 自然言語処理の出現は、ユーザが表形式のデータと対話する方法に革命をもたらし、従来のクエリ言語や手作業によるプロットから、より直感的な言語ベースのインターフェースへの移行を可能にした。 ChatGPTやその後継などの大規模言語モデル(LLM)の台頭は、この分野をさらに進歩させ、自然言語処理技術のための新たな道を開いた。本調査では、自然言語クエリによるデータ操作を可能にする、表形式データのクエリと可視化のための自然言語インタフェースを包括的に概観する。自然言語からSQLクエリやデータ可視化コマンドへの変換を可能にする重要な技術であるセマンティック解析に特に重点を置いて、これらのインターフェースの基礎となる概念と技術を紹介する。次に、データセット、方法論、メトリクス、システム設計の観点から、Text-to-SQLおよびText-to-Vis問題の最近の進歩を掘り下げる。これには、LLMの影響に関する詳細な検討が含まれ、その強み、限界、将来の改善の可能性を明らかにする。本調査は、大規模言語モデルの時代におけるデータインタラクションのための自然言語インタフェースの開発と応用に関心のある研究者や実践者に向けたロードマップを提供することを目的とする。

The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# GIST: ディープラーニングにおける生成入力セットの転送可能性

GIST: Generated Inputs Sets Transferability in Deep Learning ( http://arxiv.org/abs/2311.00801v3 )

ライセンス: Link先を確認

Florian Tambon, Foutse Khomh, Giuliano Antoniol,

(参考訳) ディープニューラルネットワーク(DNN)の検証可能性とテスト可能性を高めるため、テストケース生成手法の開発が進んでいる。 DNNモデルをテストする際、ユーザーは既存の任意のテスト生成手法を適用できる。しかし、手法ごと、テスト対象のDNNモデルごとにそれを行う必要があり、コストが高くなりうる。そこで、発想の転換がこのテストプロセスに役立つ可能性がある。すなわち、テスト対象のDNNモデルごとに独立してテストセットを再生成するのではなく、既存のDNNモデルからテストセットを転送するのである。本稿では、テストセットを効率的に転送するための新しいアプローチであるGIST(Generated Inputs Sets Transferability)を紹介する。ユーザーが選択した特性(例えば、カバーされたニューロン、フォールト)が与えられると、GISTは、利用可能なテストセットの中から、その特性の観点で優れたテストセットを選択できる。これによりユーザーは、テストケース生成手法でテストセットをゼロから生成した場合に得られたであろうものと同様の特性を、転送されたテストセット上で得ることができる。実験結果から、GISTは所与の特性に対して、転送に有効なテストセットを選択できることが示された。さらに、GISTは、テスト対象のDNNモデルにテストケース生成手法をゼロから再適用するよりもスケーラビリティに優れる。

To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of test case generation techniques are being developed. When confronted with testing DNN models, the user can apply any existing test generation technique. However, they need to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer from existing DNN models. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover similar properties on the transferred test sets as they would have obtained by generating the test set from scratch with a test case generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# 自然言語説明を用いたインコンテクスト学習のロバスト性向上

Using Natural Language Explanations to Improve Robustness of In-context Learning ( http://arxiv.org/abs/2311.07556v2 )

ライセンス: Link先を確認

Xuanli He, Yuxiang Wu, Oana-Maria Camburu, Pasquale Minervini, Pontus Stenetorp,

(参考訳) 近年の研究では、大規模言語モデル(LLM)が、文脈内学習(ICL)を通じて多くのタスクを遂行できることが示されている。しかし,近年の研究では,敵対的入力を与えた場合,ICLでプロンプトされたモデルは不正確な結果を生じる傾向が示されている。本研究では,自然言語推論とパラフレーズ識別を対象とする敵対的データセットにおいて,自然言語説明(NLE)によるICLの強化がLLMの堅牢性を向上させるか否かを検討する。我々は,人間生成NLEの小さなセットでLLMにさらなるNLEを生成するよう促し,ゼロショットICL設定と人間生成NLEのみを使用する場合の両方よりも正確な結果を得る。 5つのLLM (GPT3.5-turbo, Llama2, Vicuna, Zephyr, Mistral) の結果から, HANS, ISCS, NaN, ST, PICD, PISP, ANLI, PAWS の8つの敵対的データセットに対するベースラインアプローチよりも6%以上の改善が得られた。さらに、従来の研究では、プロンプト選択戦略により、分布内テストセット上でのICLが著しく向上することが示されている。しかし, これらの戦略はロバスト性評価においては提案手法の有効性に及ばず, その結果, 提案手法と比較して8%の精度低下がみられた。

Recent studies have demonstrated that large language models (LLMs) can excel in many tasks via in-context learning (ICL). However, recent works show that ICL-prompted models tend to produce inaccurate results when presented with adversarial inputs. In this work, we investigate whether augmenting ICL with natural language explanations (NLEs) improves the robustness of LLMs on adversarial datasets covering natural language inference and paraphrase identification. We prompt LLMs with a small set of human-generated NLEs to produce further NLEs, yielding more accurate results than both a zero-shot-ICL setting and using only human-generated NLEs. Our results on five popular LLMs (GPT3.5-turbo, Llama2, Vicuna, Zephyr, and Mistral) show that our approach yields over 6% improvement over baseline approaches for eight adversarial datasets: HANS, ISCS, NaN, ST, PICD, PISP, ANLI, and PAWS. Furthermore, previous studies have demonstrated that prompt selection strategies significantly enhance ICL on in-distribution test sets. However, our findings reveal that these strategies do not match the efficacy of our approach for robustness evaluations, resulting in an accuracy drop of 8% compared to the proposed approach.
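To make the prompting setup more concrete, the sketch below assembles a few-shot in-context prompt for natural language inference in which every demonstration carries a natural language explanation. The template, the field names, and the helper names `format_demo` and `build_prompt` are assumptions made for illustration; the abstract does not specify the prompt format the authors used.

```python
def format_demo(premise, hypothesis, explanation, label):
    """Render one in-context demonstration that includes an explanation (NLE)."""
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        f"Explanation: {explanation}\n"
        f"Label: {label}\n"
    )


def build_prompt(demos, premise, hypothesis):
    """Concatenate the demonstrations and leave the query open at 'Explanation:'.

    Ending on 'Explanation:' asks the model to write an explanation first
    and a label afterwards (a hypothetical template, not the paper's).
    """
    parts = [format_demo(**demo) for demo in demos]
    parts.append(
        f"Premise: {premise}\nHypothesis: {hypothesis}\nExplanation:"
    )
    return "\n".join(parts)


# Hypothetical demonstration, written only for this example.
demos = [
    {
        "premise": "A dog is running in the park.",
        "hypothesis": "An animal is outdoors.",
        "explanation": "A dog is an animal and a park is outdoors.",
        "label": "entailment",
    }
]
prompt = build_prompt(demos, "The doctor saw the lawyer.", "The lawyer saw the doctor.")
print(prompt)
```

The same builder could be reused with model-generated explanations in place of human-written ones, which is the comparison the abstract describes.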

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# 量子部分空間法における適応的測定戦略

Adaptive measurement strategy for quantum subspace methods ( http://arxiv.org/abs/2311.07893v2 )

ライセンス: Link先を確認

Yuma Nakamura, Yoshichika Yano, Nobuyuki Yoshioka,

(参考訳) 未知の量子状態に対する物理観測値の推定は、量子情報処理、量子物理学、量子化学など幅広い分野の基礎となる重要な問題である。特に量子計算の文脈では、既存の研究は主に、全体論的な状態トモグラフィーや、既知の古典的な記述を持つ特定の可観測量の推定に焦点を当てているが、これは推定対象自体が測定結果に依存している重要な問題のクラスを欠いている。本研究では、量子部分空間法、すなわち古典的後処理を計測結果に用いた変分シミュレーション法に有用な適応的測定最適化法を提案する。提案手法は、まず古典的にシミュレート可能な状態の測定プロトコルを決定し、量子測定結果に応じて量子部分空間展開(QSE)のプロトコルを適応的に更新する。数値実験として,分子の励起状態シミュレーションについて次のことを示した。 (i) 適切な測定戦略を構築することにより、測定回数を桁違いに減らすことができること。 (ii) 適応反復は H$_4$ の強相関分子に対してもうまく収束すること。本研究は,QSE法の可能性が精巧な測定プロトコルによって高められることを明らかにし,より効率的な量子計測手法を実用的な計算で追求するための道を開く。

Estimation of physical observables for unknown quantum states is an important problem that underlies a wide range of fields, including quantum information processing, quantum physics, and quantum chemistry. In the context of quantum computation, in particular, existing studies have mainly focused on holistic state tomography or estimation of specific observables with known classical descriptions, which leaves out the important class of problems where the estimation target itself relies on the measurement outcome. In this work, we propose an adaptive measurement optimization method that is useful for the quantum subspace methods, namely the variational simulation methods that utilize classical postprocessing on measurement outcomes. The proposed method first determines the measurement protocol for classically simulatable states, and then adaptively updates the protocol of quantum subspace expansion (QSE) according to the quantum measurement result. As a numerical demonstration, we have shown for excited-state simulation of molecules that (i) we are able to reduce the number of measurements by an order of magnitude by constructing an appropriate measurement strategy, and (ii) the adaptive iteration converges successfully even for a strongly correlated molecule of H$_4$. Our work reveals that the potential of the QSE method can be enhanced by elaborate measurement protocols, and opens a path to further pursue efficient quantum measurement techniques in practical computations.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# 量子モンテカルロ法と多体摂動法による第I相分子状水素の電子励起スペクトル

Electronic excitation spectra of molecular hydrogen in Phase I from Quantum Monte Carlo and Many-Body perturbation methods ( http://arxiv.org/abs/2311.08506v2 )

ライセンス: Link先を確認

Vitaly Gorelov, Markus Holzmann, David M. Ceperley, Carlo Pierleoni,

(参考訳) 固体水素(フェーズI)中の電子励起スペクトルを,量子モンテカルロ法および多体摂動理論を用いて,周囲温度および5-90 GPa圧力で検討した。この範囲では、システムは広いギャップ分子絶縁体から半導体に変化し、励起の性質は局所化から非局在化する。計算されたギャップとスペクトルは実験に一致し、核量子および熱効果の存在下で多体系のバンドギャップを正確に予測する能力を示す。

We study the electronic excitation spectra in solid molecular hydrogen (phase I) at ambient temperature and pressures of 5-90 GPa using Quantum Monte Carlo methods and Many-Body Perturbation Theory. In this range, the system changes from a wide gap molecular insulator to a semiconductor, altering the nature of the excitations from localized to delocalized. Computed gaps and spectra agree with experiments, demonstrating the ability to accurately predict band gaps of many-body systems in the presence of nuclear quantum and thermal effects.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# 常微分方程式によるニューラルネットワーク分類のための安定アトラクター(SA-nODE)

Stable Attractors for Neural networks classification via Ordinary Differential Equations (SA-nODE) ( http://arxiv.org/abs/2311.10387v2 )

ライセンス: Link先を確認

Raffaele Marino, Lorenzo Giambagli, Lorenzo Chicchi, Lorenzo Buffoni, Duccio Fanelli,

(参考訳) 機械学習と力学系理論の交点に位置する教師付き分類の新しい手法を提案する。通常の微分方程式を分類目的に用いた他の手法との相違点において、訓練されていないモデルは事前割り当てされた定常的誘引器の集合に対応するように構築された先行性である。分類量は、入力として供給された処理項目の特異性に応じて、植木された引き金の1つに向かってダイナミクスを操る。漸近的に、システムは探索された多次元空間の特定の点に収束し、最終的に分類される対象の圏を宣言する。この文脈で作業する際、訓練されたモデルによって取得されたポストによって固有の分類を行う能力は、最終的にターゲットの安定なアトラクションのそれぞれに関連するアトラクションの形状の流域に反映される。提案手法の性能は,その目的のために製作されたシンプルな玩具モデルや,確立された参照基準に頼って評価される。この手法は最先端のディープラーニングアルゴリズムの性能には達しないが、解析的相互作用項を閉じた連続力学系が高性能な分類器として機能することを示す。

A novel approach for supervised classification is presented which sits at the intersection of machine learning and dynamical systems theory. At variance with other methodologies that employ ordinary differential equations for classification purposes, the untrained model is a priori constructed to accommodate a set of pre-assigned stationary stable attractors. Classifying amounts to steering the dynamics towards one of the planted attractors, depending on the specificity of the processed item supplied as an input. Asymptotically the system will hence converge on a specific point of the explored multi-dimensional space, flagging the category of the object to be eventually classified. Working in this context, the inherent ability to perform classification, as acquired ex post by the trained model, is ultimately reflected in the shaped basins of attraction associated with each of the target stable attractors. The performance of the proposed method is here challenged against simple toy models crafted for the purpose, as well as by resorting to well established reference standards. Although this method does not reach the performance of state-of-the-art deep learning algorithms, it illustrates that continuous dynamical systems with closed analytical interaction terms can serve as high-performance classifiers.
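The idea of classifying by convergence to a planted attractor can be illustrated with a deliberately tiny example. The sketch below is not the SA-nODE model: it assumes a one-dimensional flow dx/dt = x - x^3, whose stable fixed points at -1 and +1 play the role of two pre-assigned attractors, and it treats the initial condition as a stand-in for the processed input. No training is performed, and the names `flow`, `integrate`, and `classify` are hypothetical.

```python
# Two planted stable fixed points of the toy flow dx/dt = x - x**3.
ATTRACTORS = (-1.0, 1.0)


def flow(x):
    """Right-hand side of the toy ODE (an assumption, not the paper's model)."""
    return x - x ** 3


def integrate(x0, dt=0.05, steps=2000):
    """Forward-Euler integration; adequate for modest |x0| at this step size."""
    x = x0
    for _ in range(steps):
        x += dt * flow(x)
    return x


def classify(x0):
    """Return the index of the attractor that the trajectory ends up closest to."""
    x_final = integrate(x0)
    return min(range(len(ATTRACTORS)), key=lambda k: abs(x_final - ATTRACTORS[k]))


print(classify(0.3), classify(-0.4))
```

Here the basins of attraction are simply the positive and negative half-lines. In the approach the abstract describes, it is the shape of these basins that training adjusts.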

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# バイモーダル畳み込みニューラルネットワークを用いた言語・生理データストリームからの欺瞞検出

Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks ( http://arxiv.org/abs/2311.10944v3 )

ライセンス: Link先を確認

Panfeng Li, Mohamed Abouelenien, Rada Mihalcea, Zhicheng Ding, Qikai Yang, Yiming Zhou,

(参考訳) 倫理的・セキュリティ上の懸念から、偽造検知が関心を増している。本稿では,畳み込み型ニューラルネットワークのマルチモーダルな騙し検出への応用について検討する。 2つのトピックについて104人の被験者にインタビューして構築したデータセットを使用します。特に、主な貢献は3つあります。まず,このデータから言語的・生理的特徴を抽出し,ニューラルネットワークモデルを訓練・構築する。次に,両モードを用いた融合畳み込みニューラルネットワークモデルを提案する。第3に,新しい手法と,マルチモーダルな偽装検出のための従来手法を比較した。我々のシステムは通常の分類法よりも優れており,本研究の結果は,限られた量のデータが存在する場合でも,誤検出にニューラルネットワークを用いることの可能性を示している。

Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# Tactics2D: 運転意思決定のための高度にモジュラーで拡張可能なシミュレータ

Tactics2D: A Highly Modular and Extensible Simulator for Driving Decision-making ( http://arxiv.org/abs/2311.11058v3 )

ライセンス: Link先を確認

Yueyuan Li, Songan Zhang, Mingyang Jiang, Xingyuan Chen, Yeqiang Qian, Chunxiang Wang, Ming Yang,

(参考訳) シミュレーションは多様で現実的な交通シナリオを生成するための先進的な手法であり、運転意思決定システムの開発を支援する。しかし、既存のシミュレータは、様々なシナリオや、交通参加者の対話的行動モデルでは不足することが多い。この欠陥は、柔軟で信頼性が高く、ユーザフレンドリなオープンソースシミュレータの必要性を浮き彫りにする。この課題に対処するため、Tactics2Dでは、道路要素、交通規制、行動モデル、車両の物理シミュレーション、イベント検出機構を含む、交通シナリオ構築へのモジュラーアプローチを採用している。広く利用されているアルゴリズムと構成を統合することで、Tactics2Dは、ビルディングブロックを組み立てるように、ユーザが強制的に駆動シナリオを構築することができる。ユーザは、パブリックデータセットとユーザによる実世界のデータの両方を活用することで、さまざまなシナリオで意思決定モデルを駆動するパフォーマンスを効果的に評価できる。ソースコードとコミュニティのサポートにアクセスするには、https://github.com/WoodOxen/Tactics2Dの公式GitHubページを参照してほしい。

Simulation is a prospective method for generating diverse and realistic traffic scenarios to aid in the development of driving decision-making systems. However, existing simulators often fall short in diverse scenarios or interactive behavior models for traffic participants. This deficiency underscores the need for a flexible, reliable, user-friendly open-source simulator. Addressing this challenge, Tactics2D adopts a modular approach to traffic scenario construction, encompassing road elements, traffic regulations, behavior models, physics simulations for vehicles, and event detection mechanisms. By integrating numerous commonly utilized algorithms and configurations, Tactics2D empowers users to construct their driving scenarios effortlessly, just like assembling building blocks. Users can effectively evaluate the performance of driving decision-making models across various scenarios by leveraging both public datasets and user-collected real-world data. For access to the source code and community support, please visit the official GitHub page for Tactics2D at https://github.com/WoodOxen/Tactics2D.

翻訳日:2024-05-22 00:10:05 公開日:2024-05-20

# 分散二段階最適化の通信複雑性について

On the Communication Complexity of Decentralized Bilevel Optimization ( http://arxiv.org/abs/2311.11342v3 )

ライセンス: Link先を確認

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao,

(参考訳) 分散二段階最適化は、機械学習に広く応用されているため、ここ数年で活発に研究されている。しかし、既存のアルゴリズムは確率的過次性の推定によって引き起こされる通信の複雑さに悩まされ、その応用を現実のタスクに限定する。この問題に対処するため,各ラウンドの通信コストと通信ラウンド数が少ない不均一な条件下で,分散確率的二段階勾配降下アルゴリズムを開発した。したがって、不均一性に関する強い仮定なしに、既存のアルゴリズムよりもはるかに優れた通信複雑性を実現することができる。我々の知る限りでは、これは不均一な条件下でこれらの理論結果を達成する最初の確率的アルゴリズムである。最終的に実験結果により,本アルゴリズムの有効性が確認された。

Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of the stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which enjoys a small communication cost in each round and a small number of communication rounds. As such, it can achieve a much better communication complexity than existing algorithms without any strong assumptions regarding heterogeneity. To the best of our knowledge, this is the first stochastic algorithm achieving these theoretical results under the heterogeneous setting. Finally, the experimental results confirm the efficacy of our algorithm.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# SIAM:ビデオ予測のための簡単な交互ミキサー

SIAM: A Simple Alternating Mixer for Video Prediction ( http://arxiv.org/abs/2311.11683v2 )

ライセンス: Link先を確認

Xin Zheng, Ziang Peng, Yuan Cao, Hongming Shan, Junping Zhang,

(参考訳) ビデオ予測は、以前のフレームから将来のフレームを予測するもので、自律運転や天気予報といった幅広い応用がある。既存の最先端の手法は、通常、ビデオから空間的、時間的、または時空間的な特徴を抽出することに焦点を当てる。異なる特徴は、異なるネットワークアーキテクチャから生じるもので、結果のモデルがいくつかのビデオ予測タスクで優れているが、他のモデルでは不十分である。より汎用的なビデオ予測ソリューションを目指して、これらの機能を統一エンコーダデコーダフレームワークで明示的にモデル化し、新しい簡易交互混合器(SIAM)を提案する。 SIAMの斬新さは次元交互混合(DaMi)ブロックの設計にあり、特徴写像の次元の交互化によって空間的・時間的・時空間的特徴をモデル化することができる。大規模な実験結果から,合成シナリオと実世界のシナリオの両方をカバーする4つのベンチマークビデオデータセットにおいて,提案したSIAMの優れた性能を示す。

Video prediction, predicting future frames from the previous ones, has broad applications such as autonomous driving and weather forecasting. Existing state-of-the-art methods typically focus on extracting either spatial, temporal, or spatiotemporal features from videos. Different feature focuses, resulting from different network architectures, may make the resultant models excel at some video prediction tasks but perform poorly on others. Towards a more generic video prediction solution, we explicitly model these features in a unified encoder-decoder framework and propose a novel simple alternating Mixer (SIAM). The novelty of SIAM lies in the design of dimension alternating mixing (DaMi) blocks, which can model spatial, temporal, and spatiotemporal features through alternating the dimensions of the feature maps. Extensive experimental results demonstrate the superior performance of the proposed SIAM on four benchmark video datasets covering both synthetic and real-world scenarios.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 生成的蒸留を伴う拡散モデルの連続学習

Continual Learning of Diffusion Models with Generative Distillation ( http://arxiv.org/abs/2311.14028v2 )

ライセンス: Link先を確認

Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven,

(参考訳) 拡散モデルは画像合成における最先端性能を達成する強力な生成モデルである。しかし、それらのトレーニングには大量のデータと計算資源が必要である。継続的な学習は、新しいタスクを漸進的に学習し、知識を蓄積し、さらなる学習のためにトレーニングされたモデルの再利用を可能にする。生成的リプレイでは、以前のタスクで訓練された生成モデルのコピーが、現在のタスクのデータとインターリーブされた合成データを生成する。しかし、拡散モデルに適用された標準的な生成的リプレイは、デノナイジング能力の破滅的な損失をもたらす。本稿では,拡散モデルの全逆過程を除去する生成蒸留法を提案する。提案手法は,生成的リプレイの継続学習性能を大幅に向上させ,計算コストをわずかに増加させることを実証する。

Diffusion models are powerful generative models that achieve state-of-the-art performance in image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus enabling the reuse of trained models for further learning. One potentially suitable continual learning approach is generative replay, where a copy of a generative model trained on previous tasks produces synthetic data that are interleaved with data from the current task. However, standard generative replay applied to diffusion models results in a catastrophic loss in denoising capabilities. In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model. We demonstrate that our approach substantially improves the continual learning performance of generative replay with only a modest increase in the computational costs.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 微分可能かつ加速された球面調和変換とウィグナー変換

Differentiable and accelerated spherical harmonic and Wigner transforms ( http://arxiv.org/abs/2311.14670v2 )

ライセンス: Link先を確認

Matthew A. Price, Jason D. McEwen,

(参考訳) 科学と工学の多くの分野は、球面多様体上で定義されたデータに遭遇する。球面データのモデリングと解析は、しばしば高次の球面調和変換を必要とする。球面 $\mathbb{S}^2$ と回転群 $\text{SO}(3)$,すなわち球面調和およびウィグナー変換上の一般化フーリエ変換の高速化および微分可能計算のための新しいアルゴリズム構造を開発する。 Wigner $d$-functions の計算に対する再帰的アルゴリズムを提案する。これを分離可能な球面変換と密結合することにより、現代のハードウェアアクセラレータ(例えばGPU)の高スループット計算に適した極めて並列的な構造を示すアルゴリズムを得る。我々はまた、勾配を効率的に計算できるように、ハイブリッド自動微分法と手動微分法を開発した。我々のアルゴリズムは、S2FFTソフトウェアコードのJAX差別化プログラミングフレームワークで実装されています。等角およびHEALPixサンプリングを含む球面の多数のサンプリングがサポートされている。計算誤差は、サンプリング定理を持つ球面サンプリングの機械精度の順である。代替のCコードに対してベンチマークすると、最大400倍の加速度が観測されます。さらに、複数のGPUに分散すると、アルゴリズムの高度に並列化されバランスの取れた性質のため、GPUの数が増加するにつれて、最適な線形スケーリングに非常に近い。十分に多くのGPUにアクセスすることで、我々の変換は前例のない効果的な線形時間複雑性を示す。

Many areas of science and engineering encounter data defined on spherical manifolds. Modelling and analysis of spherical data often necessitates spherical harmonic transforms, at high degrees, and increasingly requires efficient computation of gradients for machine learning or other differentiable programming tasks. We develop novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere $\mathbb{S}^2$ and rotation group $\text{SO}(3)$, i.e. spherical harmonic and Wigner transforms, respectively. We present a recursive algorithm for the calculation of Wigner $d$-functions that is both stable to high harmonic degrees and extremely parallelisable. By tightly coupling this with separable spherical transforms, we obtain algorithms that exhibit an extremely parallelisable structure that is well-suited for the high throughput computing of modern hardware accelerators (e.g. GPUs). We also develop a hybrid automatic and manual differentiation approach so that gradients can be computed efficiently. Our algorithms are implemented within the JAX differentiable programming framework in the S2FFT software code. Numerous samplings of the sphere are supported, including equiangular and HEALPix sampling. Computational errors are at the order of machine precision for spherical samplings that admit a sampling theorem. When benchmarked against alternative C codes we observe up to a 400-fold acceleration. Furthermore, when distributing over multiple GPUs we achieve very close to optimal linear scaling with increasing number of GPUs due to the highly parallelised and balanced nature of our algorithms. Provided access to sufficiently many GPUs our transforms thus exhibit an unprecedented effective linear time complexity.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# Resfusion: 事前残差雑音に基づく画像復元のためのデノイジング拡散確率モデル

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise ( http://arxiv.org/abs/2311.14900v2 )

ライセンス: Link先を確認

Zhenning Shi, Haoshuai Zheng, Chen Xu, Changsheng Dong, Bin Pan, Xueshuo Xie, Along He, Tao Li, Huazhu Fu,

(参考訳) 近年,デノイジング拡散モデルの研究が画像復元分野への応用を拡大している。従来の拡散に基づく画像復元法では、劣化した画像を条件入力として利用し、元のデノイジング拡散過程を変更することなく、逆生成プロセスを効果的に導く。しかし、劣化した画像は、既に低周波情報を含んでいるため、ガウスホワイトノイズから始めるとサンプリングステップが増加する。本稿では,残差項を拡散の順過程に組み込み,ノイズを含む劣化画像から直接逆過程を開始する一般フレームワークであるResfusionを提案する。私たちの推論プロセスの形式はDDPMと一致しています。我々は,resnoiseと名付けた重み付き残差雑音を予測対象として導入し,resnoiseにおける残差項と雑音項の量的関係を明示した。滑らかな等価変換を利用することで、Resfusionは最適な加速ステップを決定し、既存のノイズスケジュールの整合性を維持し、トレーニングと推論プロセスを統一する。実験の結果,Resfusion は ISTD データセット,LOL データセット,Raindrop データセットに対して,わずか5つのサンプリングステップで競合性能を示すことがわかった。さらに、画像生成にも簡単に適用でき、高い汎用性を示す。私たちのコードとモデルは https://github.com/nkicsl/Resfusion で公開されています。

Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the reverse generation process, without modifying the original denoising diffusion process. However, since the degraded images already include low-frequency information, starting from Gaussian white noise results in additional sampling steps. We propose Resfusion, a general framework that incorporates the residual term into the diffusion forward process, starting the reverse process directly from the noisy degraded images. The form of our inference process is consistent with the DDPM. We introduce a weighted residual noise, named resnoise, as the prediction target and explicitly provide the quantitative relationship between the residual term and the noise term in resnoise. By leveraging a smooth equivalence transformation, Resfusion determines the optimal acceleration step and maintains the integrity of existing noise schedules, unifying the training and inference processes. The experimental results demonstrate that Resfusion exhibits competitive performance on the ISTD, LOL, and Raindrop datasets with only five sampling steps. Furthermore, Resfusion can be easily applied to image generation and shows strong versatility. Our code and model are available at https://github.com/nkicsl/Resfusion.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# DreamPropeller: 並列サンプリングによるスーパーチャージテキスト・ツー・3D生成

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling ( http://arxiv.org/abs/2311.17082v3 )

ライセンス: Link先を確認

Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon,

(参考訳) テキストから3D生成のための2次元拡散モデルを用いたスコア蒸留(SDS)や変分スコア蒸留(VSD)といった最近の手法は、印象的な生成品質を示している。しかし、そのようなアルゴリズムの長寿命化はユーザー体験を著しく劣化させる。そこで我々はDreamPropellerを提案する。このDreamPropellerは、既存のテキストから3D生成パイプラインに、スコアの蒸留に基づいてラップできる加速アルゴリズムである。我々のフレームワークは、ODEパスを並列サンプリングする古典的なアルゴリズムであるPicard繰り返しを一般化し、モーメントベースの勾配更新や最適化プロセス中の寸法変化などの非ODEパスを3次元生成の場合と同様に考慮することができる。提案アルゴリズムは, 並列計算をウォールクロック時間で処理し, 最大4.7倍の高速化を実現する。

Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for sampling an ODE path in parallel, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wall-clock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
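As a rough illustration of the classical Picard iteration that the abstract names as its starting point, the sketch below applies it to a toy ODE, dx/dt = -x, discretized with forward Euler. Everything here is an assumption for illustration (the toy drift, the step size, and the names `picard_sweep` and `picard_solve`); it is not DreamPropeller's algorithm, which generalizes the idea to non-ODE optimization paths.

```python
def drift(x):
    """Toy drift for dx/dt = -x (chosen only for illustration)."""
    return -x


def sequential_euler(x0, h, n):
    """Reference solution: n forward-Euler steps computed one after another."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + h * drift(xs[-1]))
    return xs


def picard_sweep(xs, x0, h):
    """One Picard sweep over the whole trajectory.

    Every drift is evaluated on the previous guess, so the evaluations are
    independent of each other and could run in parallel; only the cheap
    running sum below is sequential.
    """
    drifts = [drift(x) for x in xs[:-1]]
    new_xs = [x0]
    accumulated = 0.0
    for d in drifts:
        accumulated += h * d
        new_xs.append(x0 + accumulated)
    return new_xs


def picard_solve(x0, h, n, tol=1e-12, max_sweeps=1000):
    """Repeat sweeps until the trajectory stops changing; return it and the sweep count."""
    xs = [x0] * (n + 1)  # initial guess: a constant path
    for sweep in range(1, max_sweeps + 1):
        new_xs = picard_sweep(xs, x0, h)
        if max(abs(a - b) for a, b in zip(new_xs, xs)) < tol:
            return new_xs, sweep
        xs = new_xs
    return xs, max_sweeps


reference = sequential_euler(1.0, 0.05, 20)
parallel, sweeps = picard_solve(1.0, 0.05, 20)
print(sweeps, max(abs(a - b) for a, b in zip(reference, parallel)))
```

Each sweep fixes at least one more time step of the path, so at most n + 1 sweeps are needed for n steps. The benefit comes when the path converges in far fewer sweeps than steps, which is the trade of parallel compute for wall-clock time that the abstract mentions.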

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 異常運転行動検出のためのサロゲート安全対策を用いたデータ駆動半教師付き機械学習

Data-driven Semi-supervised Machine Learning with Surrogate Safety Measures for Abnormal Driving Behavior Detection ( http://arxiv.org/abs/2312.04610v4 )

ライセンス: Link先を確認

Yongqi Dong, Lanxin Zhang, Haneen Farah, Arkady Zgonnikov, Bart van Arem,

(参考訳) 道路交通の安全と運転者の行動評価には,異常運転行動の検出が重要である。機械学習(ML)アルゴリズムの進歩と自然主義駆動データの蓄積により、多くのMLモデルが異常運転行動検出に採用されている。既存のMLベースの検出器の多くは(完全に)教師付きML法に依存しており、かなりのラベル付きデータを必要とする。しかし、地上の真理ラベルは必ずしも現実世界で利用できておらず、大量のデータをラベル付けするのは面倒である。したがって、異常検出プロセスをより効果的かつ効果的にするために、教師なしまたは半教師なしの手法を検討する必要がある。このギャップを埋めるために,本研究では,複数の異常運転行動(例えば,急激な加速,高速車線変更)を明らかにする大規模実世界のデータを分析し,部分ラベル付きデータを用いて階層的エクストリーム学習マシン(HELM)に基づく半教師付きML法を開発し,その異常運転動作を正確に検出する。さらに、従来のMLベースアプローチでは、基本車両の動作特性(速度や加速度など)を利用して異常運転行動のラベル付けと検出を行うのに対して、本研究では、MLモデルの入力機能としてサロゲート安全対策(SSM)を導入し、検出性能を向上させることを目的とする。実験結果から,提案した半教師付きMLモデルの有効性を示すとともに,SSMが重要な特徴であることを示す。提案した半教師付きML法は、様々な指標(例えば、99.58%で最高の精度、0.9913で最高のF-1測定値)に関して、他のベースラインの半教師付きあるいは教師なしの手法よりも優れている。アブレーション研究は, 検出性能向上におけるSSMsの重要性をさらに強調した。

Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled data. However, ground truth labels are not always available in the real world, and labeling large amounts of data is tedious. Thus, there is a need to explore unsupervised or semi-supervised methods to make the anomaly detection process more feasible and efficient. To fill this research gap, this study analyzes large-scale real-world data revealing several abnormal driving behaviors (e.g., sudden acceleration, rapid lane-changing) and develops a Hierarchical Extreme Learning Machines (HELM) based semi-supervised ML method using partly labeled data to accurately detect the identified abnormal driving behaviors. Moreover, previous ML-based approaches predominantly utilize basic vehicle motion features (such as velocity and acceleration) to label and detect abnormal driving behaviors, while this study seeks to introduce Surrogate Safety Measures (SSMs) as the input features for ML models to improve the detection performance. Results from extensive experiments demonstrate the effectiveness of the proposed semi-supervised ML model with the introduced SSMs serving as important features. The proposed semi-supervised ML method outperforms other baseline semi-supervised or unsupervised methods regarding various metrics, e.g., delivering the best accuracy at 99.58% and the best F-1 measure at 0.9913. The ablation study further highlights the significance of SSMs for advancing detection performance.
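The abstract does not list which surrogate safety measures are used. As one widely known example of an SSM, the sketch below computes time-to-collision (TTC) for a car-following pair, which could then be supplied to a detector as an input feature alongside speed and acceleration. The function name, argument names, and units are assumptions for illustration.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """Time-to-collision in seconds for a follower approaching a leader.

    Returns math.inf when the follower is not closing the gap, since no
    collision would occur if both speeds stayed constant.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


# A follower at 15 m/s, 20 m behind a leader at 10 m/s.
print(time_to_collision(20.0, 15.0, 10.0))
```

Unlike raw speed or acceleration, a measure of this kind encodes the interaction between two vehicles, which is one plausible reason such features help a detector.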

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# ホログラフィーレニーエントロピーのための改良型宇宙ブレインの提案

A Modified Cosmic Brane Proposal for Holographic Renyi Entropy ( http://arxiv.org/abs/2312.04625v2 )

ライセンス: Link先を確認

Xi Dong, Jonah Kudler-Flam, Pratik Rath,

(参考訳) 本稿では,複数面が存在する場合のホログラフィック・レニーエントロピーの計算式を提案する。提案手法は、固定領域状態に基づいて波動関数を計算し、レーニエントロピーの対角近似を仮定する。 Renyi index $n\geq1$ に対して、我々の提案はホログラフィック Renyi entropy に対する既存の宇宙ブレインの提案と一致している。しかし、$n<1$の場合、我々の提案は(ニュートンの定数$G$で)宇宙ブレインの提案を補正する新しい位相を予測している。固定領域状態に対する最適化の観点からは、この2つの提案の違いは最適化の順序から理解することができる:$n<1$の場合、宇宙ブレイン提案は最小限の処方令であるのに対して、我々の提案は最大限の処方令である。このような先行順序補正の存在を実例で示す。特に,本提案では,PSSYモデルと高エネルギー固有状態の文献における既存の結果を再現し,前述した先行順序補正を$n<1$ Renyiエントロピーに普遍的に説明する。

We propose a new formula for computing holographic Renyi entropies in the presence of multiple extremal surfaces. Our proposal is based on computing the wave function in the basis of fixed-area states and assuming a diagonal approximation for the Renyi entropy. For Renyi index $n\geq1$, our proposal agrees with the existing cosmic brane proposal for holographic Renyi entropy. For $n<1$, however, our proposal predicts a new phase with leading order (in Newton's constant $G$) corrections to the cosmic brane proposal, even far from entanglement phase transitions and when bulk quantum corrections are unimportant. Recast in terms of optimization over fixed-area states, the difference between the two proposals can be understood to come from the order of optimization: for $n<1$, the cosmic brane proposal is a minimax prescription whereas our proposal is a maximin prescription. We demonstrate the presence of such leading order corrections using illustrative examples. In particular, our proposal reproduces existing results in the literature for the PSSY model and high-energy eigenstates, providing a universal explanation for previously found leading order corrections to the $n<1$ Renyi entropies.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# ソフトウェア定義VANETのためのスタック型アンサンブル学習IDSモデル

A Stacked Ensemble Learning IDS Model for Software-Defined VANET ( http://arxiv.org/abs/2312.04956v4 )

ライセンス: Link先を確認

Shakil Ibne Ahsan, Phil Legg, S M Iftekharul Alam,

(参考訳) 侵入検知システム(IDS)は、外部ネットワークのセキュリティイベントを検出し、緩和するために広く利用されている。 VANET(Vehicle ad-hoc Networks)は特にコネクテッド・オートモービルズ(CAV)の開発で進化している。したがって、新興技術において従来のIDSアプローチをどのように活用できるかを評価することが不可欠である。この問題に対処するため,本研究では,複数の機械学習アルゴリズムを組み合わせることで,単一のアルゴリズム手法よりも効果的に脅威を検出することを目的とした,集積型アンサンブル学習手法を提案する。 CICIDS2017とVeReMiベンチマークデータセットを使用して、我々のアプローチのパフォーマンスを既存の機械学習手法と比較し、脅威を特定するのがより正確であることを確かめる。また,ハイパーパラメータ最適化と特徴選択を取り入れて,性能をさらに向上する。以上の結果から,累積アンサンブル学習はIDSの有効性を高める上で有望な手法であることが示唆された。

Intrusion Detection Systems (IDS) are widely employed to detect and mitigate external network security events. VANETs (Vehicle ad-hoc Networks) are evolving, especially with the development of Connected Autonomous Vehicles (CAVs). It is therefore crucial to assess how traditional IDS approaches can be utilised for emerging technologies. To address this concern, our work presents a stacked ensemble learning approach for IDS, which combines multiple machine learning algorithms to detect threats more effectively than single-algorithm methods. Using the CICIDS2017 and the VeReMi benchmark data sets, we compare the performance of our approach with existing machine learning methods and find that it is more accurate at identifying threats. Our method also incorporates hyperparameter optimization and feature selection to improve its performance further. Overall, our results suggest that stacked ensemble learning is a promising technique for enhancing the effectiveness of IDS.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 量子アルゴリズムを用いたランダムハイパーグラフMAX-3-XORSAT問題の近似性について

On the approximability of random-hypergraph MAX-3-XORSAT problems with quantum algorithms ( http://arxiv.org/abs/2312.06104v3 )

ライセンス: Link先を確認

Eliot Kapit, Brandon A. Barton, Sean Feeney, George Grattan, Pratik Patnaik, Jacob Sagal, Lincoln D. Carr, Vadim Oganesyan,

(参考訳) NPにおける制約満足度問題の標準的特徴は近似硬度であり、最悪の場合、すべての既知の方法において十分品質の近似解を見つけることは指数関数的に困難である。基本的に、ガイド付き局所最小脱出法が欠如していることは、古典的近似の正確さと近似的な近似の硬さの両方を保証するが、量子アルゴリズムの等価メカニズムはよく理解されていない。ハミルトニアン時間進化に基づくアルゴリズムでは、原型的にハードなMAX-3-XORSAT問題クラスを通してこの問題を探索する。量子精度と近似硬さのメカニズムは根本的に異なると結論付けている。論文の既知結果をレビューし,従来の量子法(例えばAdiabatic Quantum Computing)の弱い近似アルゴリズムを最悪の場合に用いるメカニズムを同定する。我々はこれらの問題から逃れるスペクトルフィルタリング量子アルゴリズムのファミリーを構築し、その性能に関する解析理論を開発する。近似系におけるランダムなハイパーグラフに対して、エネルギーを$E = N_{\mathrm{unsat}}-N_{\mathrm{sat}}$と定義すると、スペクトルフィルタリングされた量子最適化は準四進時間において$E \leq q_m E_{\mathrm{GS}}$(ここで$E_{\rm GS}$は基底状態エネルギー)で状態を返す。これは、古典的な検索を行う最も難しいインスタンスに対して$q_m \to 0$と対照的である。これらすべての主張を広範な数値シミュレーションで検証する。我々は、この近似保証がすべての可能なハイパーグラフを保持できると主張するわけではないが、我々のアルゴリズムのメカニズムは広く一般化される可能性がある。これらの結果は、量子コンピュータが以前想定されていたよりも近似最適化に強力であることを示唆している。

A canonical feature of the constraint satisfaction problems in NP is approximation hardness, where in the worst case, finding sufficient-quality approximate solutions is exponentially hard for all known methods. Fundamentally, the lack of any guided local minimum escape method ensures both exact and approximate classical approximation hardness, but the equivalent mechanism(s) for quantum algorithms are poorly understood. For algorithms based on Hamiltonian time evolution, we explore this question through the prototypically hard MAX-3-XORSAT problem class. We conclude that the mechanisms for quantum exact and approximation hardness are fundamentally distinct. We review known results from the literature, and identify mechanisms that make conventional quantum methods (such as Adiabatic Quantum Computing) weak approximation algorithms in the worst case. We construct a family of spectrally filtered quantum algorithms that escape these issues, and develop analytical theories for their performance. We show that, for random hypergraphs in the approximation-hard regime, if we define the energy to be $E = N_{\mathrm{unsat}}-N_{\mathrm{sat}}$, spectrally filtered quantum optimization will return states with $E \leq q_m E_{\mathrm{GS}}$ (where $E_{\rm GS}$ is the ground state energy) in sub-quadratic time, where conservatively, $q_m \simeq 0.59$. This is in contrast to $q_m \to 0$ for the hardest instances with classical searches. We test all of these claims with extensive numerical simulations. We do not claim that this approximation guarantee holds for all possible hypergraphs, though our algorithm's mechanism can likely generalize widely. These results suggest that quantum computers are more powerful for approximate optimization than had been previously assumed.
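The energy convention $E = N_{\mathrm{unsat}}-N_{\mathrm{sat}}$ and the approximation criterion $E \leq q_m E_{\mathrm{GS}}$ can be made concrete on a tiny instance. The sketch below represents each 3-XOR clause as three variable indices plus a target parity, and finds the ground state by brute force. The clause encoding and the function names are assumptions for illustration, and exhaustive search is only feasible for very small instances.

```python
from itertools import product


def energy(assignment, clauses):
    """E = N_unsat - N_sat for clauses (i, j, k, parity) over 0/1 variables."""
    n_sat = sum(
        1
        for i, j, k, parity in clauses
        if (assignment[i] ^ assignment[j] ^ assignment[k]) == parity
    )
    n_unsat = len(clauses) - n_sat
    return n_unsat - n_sat


def ground_state_energy(n_vars, clauses):
    """Lowest energy over all 2**n_vars assignments (brute force)."""
    return min(energy(bits, clauses) for bits in product((0, 1), repeat=n_vars))


def meets_ratio(assignment, clauses, n_vars, q):
    """Check E <= q * E_GS; E_GS is negative whenever most clauses can be satisfied."""
    return energy(assignment, clauses) <= q * ground_state_energy(n_vars, clauses)


clauses = [(0, 1, 2, 1), (0, 1, 3, 0), (1, 2, 3, 1)]
print(energy((0, 0, 0, 0), clauses), ground_state_energy(4, clauses))
```

Because $E_{\mathrm{GS}}$ is negative in the regime of interest, a larger ratio $q$ is a stricter requirement: $q = 1$ demands an optimal assignment, while $q \to 0$ only demands that about half of the clauses be satisfied.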

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 構造化状態空間モデルはディープ・ウィーナーモデルである

Structured state-space models are deep Wiener models ( http://arxiv.org/abs/2312.06211v2 )

ライセンス: Link先を確認

Fabio Bonassi, Carl Andersson, Per Mattsson, Thomas B. Schön,

(参考訳) 本研究の目的は,構造化状態空間モデル (Structured State-space Models, SSM) に対するシステム識別フレンドリな導入を提供することである。これらのモデルは、その並列化性のため、非常に長いシーケンス分類と回帰問題に取り組むために効率よく、訓練できるため、機械学習コミュニティで最近人気になっている。興味深いことに、SSMは深層Wienerモデルを学習する効果的な方法として現れ、システム識別によく使用されるモデルクラスの拡張としてSSMを再構成することができる。機械学習とシステム識別コミュニティ間のアイデアの多様さを刺激するために,最近のトピックに対するコントリビューションを構造化され,アクセス可能な形式で要約することが有用であると考えられる。最後に、このコミュニティが影響力のあるコントリビューションを提供するための今後の研究の方向性を強調します。

The goal of this paper is to provide a system identification-friendly introduction to the Structured State-space Models (SSMs). These models have recently become popular in the machine learning community since, owing to their parallelizability, they can be efficiently and scalably trained to tackle extremely long sequence classification and regression problems. Interestingly, SSMs appear as an effective way to learn deep Wiener models, which allows SSMs to be reframed as an extension of a model class commonly used in system identification. In order to stimulate a fruitful exchange of ideas between the machine learning and system identification communities, we deem it useful to summarize the recent contributions on the topic in a structured and accessible form. Finally, we highlight future research directions for which this community could provide impactful contributions.

翻訳日:2024-05-22 00:00:07 公開日:2024-05-20

# 学習か想起か？ 事前学習型言語モデルによるインクリメンタルラーニングの再考

Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models ( http://arxiv.org/abs/2312.07887v2 )

ライセンス: Link先を確認

Junhao Zheng, Shengjie Qiu, Qianli Ma,

(参考訳) インクリメンタルラーニング(IL)は、ビジョンと自然言語処理(NLP)コミュニティにおいて長年の課題であった。近年、PLM(Pre-trained Language Models)は様々なNLP下流タスクにおいて顕著な進歩を遂げており、最近のNLPにおけるIL研究において、PLMをバックボーンとして活用することが一般的となっている。多くの研究は、破滅的な忘却が優れたIL性能を達成するための最大の障害であると仮定し、この問題を克服するための様々な手法を提案している。しかし、我々はこの仮定に問題があることを見出した。具体的には,4つの分類タスク(テキスト分類,インテント分類,関係抽出,名前付きエンティティ認識)について,最も一般的な2つのIL設定(クラスインクリメンタルとタスクインクリメンタル)の下で20以上の手法を再検討し,その多くがPLMに固有の忘却防止能力を著しく過小評価していることを明らかにする。この観察に基づき,PLMを用いたILのためのSEQ*という驚くほど簡単な手法を提案する。その結果,SEQ* は最新式 (SOTA) の IL 法と同等以上の性能を示し,学習可能なパラメータ数とトレーニング時間もかなり少ないことがわかった。これらの知見は, PLMを用いたILを再考するよう促し, 今後の研究がPLMにおける破滅的な忘却を根本的に理解することを促すものである。データ、コード、スクリプトはhttps://github.com/zzz47zzz/pretrained-lm-for-incremental-learningで公開されている。

Incremental Learning (IL) has been a long-standing problem in both vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress in various NLP downstream tasks, utilizing PLMs as backbones has become a common practice in recent research of IL in NLP. Most assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on the observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* has competitive or superior performance compared to state-of-the-art (SOTA) IL methods and requires considerably less trainable parameters and training time. These findings urge us to revisit the IL with PLMs and encourage future studies to have a fundamental understanding of the catastrophic forgetting in PLMs. The data, code and scripts are publicly available at https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning.

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# ASLseg:半監督肝腫瘍分節に対するSAMのループへの適応

ASLseg: Adapting SAM in the Loop for Semi-supervised Liver Tumor Segmentation ( http://arxiv.org/abs/2312.07969v2 )

ライセンス: Link先を確認

Shiyun Chen, Li Lin, Pujin Cheng, Xiaoying Tang,

(参考訳) 肝腫瘍の分節化は, コンピュータ診断, 手術計画, 予後評価に必須である。しかし、高密度アノテーションによる大規模データセットの取得と維持は困難である。セミ・スーパーバイザード・ラーニング(SSL)はこれらの課題に対処するための一般的なテクニックである。近年,Segment Anything Model (SAM) は,いくつかの画像分割作業において有望な性能を示したが,肝腫瘍のセグメンテーションでは不十分であった。本稿では,新しい半教師付きフレームワークであるASLsegを提案する。これはSAMをSSL設定に効果的に適応し,肝腫瘍のドメイン固有知識と一般知識を組み合わせることができる。具体的には、特定のSSLパラダイムでトレーニングされたセグメンテーションモデルは、微調整されたSAMへのプロンプトとして生成された擬似ラベルを提供する。次に、適応ネットワークを使用してSAM予測を洗練し、高品質な擬似ラベルを生成する。最後に、信頼された擬似ラベルを選択して、反復訓練のためのラベル付きセットを拡張する。 LiTSデータセットの大規模な実験は、当社のASLセグの圧倒的な性能を示している。

Liver tumor segmentation is essential for computer-aided diagnosis, surgical planning, and prognosis evaluation. However, obtaining and maintaining a large-scale dataset with dense annotations is challenging. Semi-Supervised Learning (SSL) is a common technique to address these challenges. Recently, Segment Anything Model (SAM) has shown promising performance in some medical image segmentation tasks, but it performs poorly for liver tumor segmentation. In this paper, we propose a novel semi-supervised framework, named ASLseg, which can effectively adapt the SAM to the SSL setting and combine both domain-specific and general knowledge of liver tumors. Specifically, the segmentation model trained with a specific SSL paradigm provides the generated pseudo-labels as prompts to the fine-tuned SAM. An adaptation network is then used to refine the SAM-predictions and generate higher-quality pseudo-labels. Finally, the reliable pseudo-labels are selected to expand the labeled set for iterative training. Extensive experiments on the LiTS dataset demonstrate overwhelming performance of our ASLseg.

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# DIRECT:不均衡とラベルノイズ下での深層能動学習

DIRECT: Deep Active Learning under Imbalance and Label Noise ( http://arxiv.org/abs/2312.09196v3 )

ライセンス: Link先を確認

Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak,

(参考訳) クラス不均衡は、実世界の機械学習アプリケーションでは一般的な問題であり、希少クラスや少数クラスではパフォーマンスが低下することが多い。ラベルのない野生のデータが豊富にある状況では、アクティブラーニングは、アノテーション中によりバランスの取れた有益なラベル付きサンプルを収集することで、この問題を根本から解決するおそらく最も効果的な手法である。ラベルノイズは、データアノテーションジョブのもう1つの一般的な問題であり、特にアクティブな学習方法では難しい。本研究では,クラス不均衡とラベルノイズの両方の下でのアクティブラーニングに関する最初の研究を行う。本稿では,クラス分離閾値を頑健に同定し,それに最も近い最も不確実な例を注釈する新しいアルゴリズムを提案する。 DIRECTは,一次元アクティブラーニングへの新たな帰着を通じて,古典的なアクティブラーニング文献を活用し,バッチラベリングやラベルノイズに対する耐性といった問題に対処することができる。ラベルノイズの有無の両方の条件で,不均衡データセットについて広範な実験を行った。 DIRECTは,最先端のアクティブ学習アルゴリズムと比較して60%以上のアノテーション予算を節約でき,また,ランダムサンプリングに比べて80%以上のアノテーション予算を節約できることを示した。

Class imbalance is a prevalent issue in real-world machine learning applications, often leading to poor performance in rare and minority classes. With an abundance of wild unlabeled data, active learning is perhaps the most effective technique in solving the problem at its root -- collecting a more balanced and informative set of labeled examples during annotation. Label noise is another common issue in data annotation jobs, which is especially challenging for active learning methods. In this work, we conduct the first study of active learning under both class imbalance and label noise. We propose a novel algorithm that robustly identifies the class separation threshold and annotates the most uncertain examples that are closest to it. Through a novel reduction to one-dimensional active learning, our algorithm DIRECT is able to leverage the classic active learning literature to address issues such as batch labeling and tolerance towards label noise. We present extensive experiments on imbalanced datasets with and without label noise. Our results demonstrate that DIRECT can save more than 60% of the annotation budget compared to state-of-the-art active learning algorithms and more than 80% of the annotation budget compared to random sampling.
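
The selection step described in this abstract (annotating the uncertain examples nearest to the class separation threshold) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `select_near_threshold`, the one-dimensional `scores`, and the assumption that a `threshold` estimate is already available are all hypothetical. The robust threshold estimation under label noise that DIRECT performs is not shown.

```python
def select_near_threshold(scores, threshold, budget, labeled=()):
    """Pick up to `budget` unlabeled indices whose score is closest to `threshold`.

    `scores` is a list of one-dimensional scores (hypothetical, e.g. a margin
    score per example); `labeled` holds indices that are already annotated.
    """
    already = set(labeled)
    candidates = [i for i in range(len(scores)) if i not in already]
    # Examples nearest the threshold are treated as the most uncertain ones.
    candidates.sort(key=lambda i: abs(scores[i] - threshold))
    return candidates[:budget]


# Hypothetical scores for five examples and an assumed threshold of 0.5.
example_scores = [0.05, 0.48, 0.9, 0.55, 0.3]
batch = select_near_threshold(example_scores, threshold=0.5, budget=2)
```

Selecting a whole batch at once, rather than one example per round, mirrors the batch labeling setting mentioned in the abstract.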

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# 深部ドラム音源分離に向けて

Toward Deep Drum Source Separation ( http://arxiv.org/abs/2312.09663v3 )

ライセンス: Link先を確認

Alessandro Ilic Mezza, Riccardo Giampiccolo, Alberto Bernardini, Augusto Sarti,

(参考訳) これまで、ドラム音源分離の分野は、利用可能なデータが限られていたために重大な課題に直面しており、他の関連するオーディオアプリケーションで成功を収めた最先端のディープラーニング手法の採用が妨げられていた。本稿では,分離された単一楽器のドラムステムからなる大規模オーディオデータセットであるStemGMDを紹介する。各オーディオクリップは、リアルな音のアコースティックドラムキット10種類を用いて、表現力豊かなドラム演奏のMIDI記録から合成される。合計1224時間に及ぶStemGMDは、これまでで最大のドラムのオーディオデータセットであり、標準的な9ピースのドラムキットのすべての楽器について分離されたオーディオクリップを含む初めてのデータセットである。我々は、StemGMDを利用して、新しいディープドラム音源分離モデルであるLarsNetを開発した。専用U-Netのバンクを通じて、LarsNetはステレオのドラムミックスから5つのステムをリアルタイムより高速に分離することができ、最先端の非負値スペクトル時間分解法よりも著しく優れていることを示す。

In the past, the field of drum source separation faced significant challenges due to limited data availability, hindering the adoption of cutting-edge deep learning methods that have found success in other related audio applications. In this manuscript, we introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems. Each audio clip is synthesized from MIDI recordings of expressive drums performances using ten real-sounding acoustic drum kits. Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date and the first to comprise isolated audio clips for every instrument in a canonical nine-piece drum kit. We leverage StemGMD to develop LarsNet, a novel deep drum source separation model. Through a bank of dedicated U-Nets, LarsNet can separate five stems from a stereo drum mixture faster than real-time and is shown to significantly outperform state-of-the-art nonnegative spectro-temporal factorization methods.

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# 決定のための識別表現事前学習に基づくトップkサブタスクプランニングツリーの学習

Learning Top-k Subtask Planning Tree based on Discriminative Representation Pre-training for Decision Making ( http://arxiv.org/abs/2312.11027v2 )

ライセンス: Link先を確認

Jingqing Ruan, Kaishen Wang, Qingyang Zhang, Dengpeng Xing, Bo Xu,

(参考訳) 多くの複雑な現実世界のタスクは、より小さく、より管理しやすい部分に分割することができ、これらの単純化された部分から抽出した事前知識を用いた計画は、人間が正確な判断を下す上で重要である。しかし、このプロセスを複製することはAIエージェントにとって依然として課題であり、自然に2つの疑問を提起する。事前知識から識別的な知識表現をどのように抽出するか? 複雑な問題を分解する合理的な計画をどのように立てるか? 単一エンコーダ構造を用いた既存の表現学習手法の多くは脆弱で、複雑で多様な力学に敏感である。この問題に対処するために、簡単なサブタスクのための十分なデータからタスクに本質的な表現を学習するマルチエンコーダと個別予測器の方式を導入する。複数のエンコーダは、混乱することなく適切なタスク関連ダイナミクスを抽出することができ、共有予測器はタスク特性を識別することができる。また、注意機構を用いてトップkのサブタスク計画木を生成し、未確認タスクの複雑な決定を導くためにサブタスク実行計画をカスタマイズする。このプロセスは、計画木の深さと幅を柔軟に調整し、前方視とグローバル性を実現する。いくつかの基本的な単純なタスクと組合せ的にリッチな合成タスクからなる挑戦的なプラットフォーム上での実証的な結果は、競争力のあるベースラインを一貫して上回り、我々の設計の利点を実証する。

Many complicated real-world tasks can be broken down into smaller, more manageable parts, and planning with prior knowledge extracted from these simplified pieces is crucial for humans to make accurate decisions. However, replicating this process remains a challenge for AI agents and naturally raises two questions: How to extract discriminative knowledge representation from priors? How to develop a rational plan to decompose complex problems? Most existing representation learning methods employing a single encoder structure are fragile and sensitive to complex and diverse dynamics. To address this issue, we introduce a multiple-encoder and individual-predictor regime to learn task-essential representations from sufficient data for simple subtasks. Multiple encoders can extract adequate task-relevant dynamics without confusion, and the shared predictor can discriminate the task characteristics. We also use the attention mechanism to generate a top-k subtask planning tree, which customizes subtask execution plans in guiding complex decisions on unseen tasks. This process enables forward-looking and globality by flexibly adjusting the depth and width of the planning tree. Empirical results on a challenging platform composed of some basic simple tasks and combinatorially rich synthetic tasks consistently outperform some competitive baselines and demonstrate the benefits of our design.

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# トロコイド探索最適化

Trochoid Search Optimization ( http://arxiv.org/abs/2312.13597v2 )

ライセンス: Link先を確認

Abdesslem Layeb,

(参考訳) 本稿では,トロコイド曲線の数学的特性を利用した新しいメタヒューリスティックであるトロコイド探索最適化アルゴリズム(TSO)を提案する。 TSOアルゴリズムは、トロコイド固有の同時翻訳運動と回転運動のユニークな組み合わせを採用し、爆発的な探索能力と搾取的な探索能力の間の洗練された平衡を育む。特に、TSOは、その効率性と有効性に一括して寄与する、グローバルとローカルの2つの重要なフェーズで構成されている。実験による検証は、TSOアルゴリズムが様々なベンチマーク関数にまたがる顕著な性能を示し、探索空間における探索とエクスプロイトのバランスのとれた競争力のあるエッジを示す。 TSOの際立った特徴は単純さにある。ユーザ定義パラメータの最小限の要件が特徴であり、アクセス可能で強力な最適化ツールである。

This paper introduces the Trochoid Search Optimization Algorithm (TSO), a novel metaheuristic leveraging the mathematical properties of trochoid curves. The TSO algorithm employs a unique combination of simultaneous translational and rotational motions inherent in trochoids, fostering a refined equilibrium between explorative and exploitative search capabilities. Notably, TSO consists of two pivotal phases, global search and local search, that collectively contribute to its efficiency and efficacy. Experimental validation demonstrates the TSO algorithm's remarkable performance across various benchmark functions, showcasing its competitive edge in balancing exploration and exploitation within the search space. A distinguishing feature of TSO lies in its simplicity, marked by a minimal requirement for user-defined parameters, making it an accessible yet powerful optimization tool.

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# ステーク分布はコンセンサスにどのように影響するか : ブロックチェーンの分散化の分析

How Does Stake Distribution Influence Consensus? Analyzing Blockchain Decentralization ( http://arxiv.org/abs/2312.13938v3 )

ライセンス: Link先を確認

Shashank Motepalli, Hans-Arno Jacobsen,

(参考訳) PoSブロックチェーンの世界では、完全な分散化を実現する上での課題は、少数のバリデータ間でステークされたトークンが不均等に集中していることによって、しばしば妨げられます。本研究では、重み付けされたコンセンサス機構のための分散化指標を最初に定式化することにより、この課題を解析する。 10個のパーミッションレスブロックチェーンに対する実証分析により、バリデータ間のかなりの重み集中が明らかとなり、公平なアプローチの必要性が強調された。これに対応するために,ステーキング重み分布を効果的に再調整するSquare Root Stake Weight (SRSW) モデルを提案する。 SRSWモデルの検証では,分散化指標に顕著な改善が見られ,Gini指数は平均37.16%改善し, ライブネスと安全性に関するナカモト係数はそれぞれ平均101.04%, 80.09%向上した。この研究は、ブロックチェーンのコンセンサスメカニズムにおける分散化を推進し、より公正で公平なステーキング重み分布に向けた重要なステップである。

In the PoS blockchain landscape, the challenge of achieving full decentralization is often hindered by a disproportionate concentration of staked tokens among a few validators. This study analyses this challenge by first formalizing decentralization metrics for weighted consensus mechanisms. An empirical analysis across ten permissionless blockchains uncovers significant weight concentration among validators, underscoring the need for an equitable approach. To counter this, we introduce the Square Root Stake Weight (SRSW) model, which effectively recalibrates staking weight distribution. Our examination of the SRSW model demonstrates notable improvements in the decentralization metrics: the Gini index improves by 37.16% on average, while Nakamoto coefficients for liveness and safety see mean enhancements of 101.04% and 80.09%, respectively. This research is a pivotal step toward a more fair and equitable distribution of staking weight, advancing the decentralization in blockchain consensus mechanisms.
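
To make the metrics named in this abstract concrete, the following Python sketch compares a stake-proportional weighting with a square-root weighting on a made-up validator set. It is a minimal sketch under stated assumptions, not the paper's code: the stake values are hypothetical, and the thresholds of 1/3 (liveness) and 2/3 (safety) are assumed here as the usual Byzantine fault tolerance bounds, since the abstract does not state them.

```python
import math


def gini(weights):
    """Gini index of a weight distribution (0 means perfectly equal)."""
    w = sorted(weights)
    n = len(w)
    total = sum(w)
    # Standard formula on sorted data: sum_i (2i - n - 1) * w_i / (n * total).
    return sum((2 * i - n - 1) * x for i, x in enumerate(w, start=1)) / (n * total)


def nakamoto(weights, threshold):
    """Smallest number of validators whose combined share exceeds `threshold`."""
    total = sum(weights)
    running = 0.0
    for count, x in enumerate(sorted(weights, reverse=True), start=1):
        running += x
        if running / total > threshold:
            return count
    return len(weights)


# Hypothetical stakes, for illustration only.
stakes = [400, 250, 150, 100, 50, 30, 20]
# Square-root weighting: each validator's weight is the square root of its stake.
srsw_weights = [math.sqrt(s) for s in stakes]

liveness_before = nakamoto(stakes, 1 / 3)        # assumed liveness threshold
liveness_after = nakamoto(srsw_weights, 1 / 3)
safety_before = nakamoto(stakes, 2 / 3)          # assumed safety threshold
safety_after = nakamoto(srsw_weights, 2 / 3)
```

On this toy input the square root compresses large stakes more than small ones, so the Gini index falls and both Nakamoto coefficients rise. The size of the effect on real chains is what the paper measures empirically.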

翻訳日:2024-05-21 23:50:08 公開日:2024-05-20

# 分類アルゴリズムとしての工学的常微分方程式(EODECA):徹底的な特徴付けと試験

Engineered Ordinary Differential Equations as Classification Algorithm (EODECA): thorough characterization and testing ( http://arxiv.org/abs/2312.14681v2 )

ライセンス: Link先を確認

Raffaele Marino, Lorenzo Buffoni, Lorenzo Chicchi, Lorenzo Giambagli, Duccio Fanelli,

(参考訳) EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) は、機械学習と動的システム理論の共通部分における新しいアプローチであり、分類タスクのためのユニークなフレームワークである[1]。この方法は、常微分方程式(ODE)を用いて、複雑な分類課題を効率的に扱うことによって、その力学系構造を際立たせる。論文は、EODECAの動的特性を考察し、ランダムな摂動に対するレジリエンスと、さまざまな分類シナリオにおける堅牢なパフォーマンスを強調した。特に、EODECAの設計には、安定したアトラクタをフェーズ空間に埋め込む機能が含まれており、信頼性を高め、可逆的なダイナミクスを可能にする。本稿では,作業 [1] を拡張し,オイラー離散化方式を用いて包括的解析を行う。特に,EODECAの性能を5つの異なる分類問題に分けて評価し,適応性と効率性を検討した。重要なことは、EODECAがMNISTデータセットとFashion MNISTデータセットで有効であることを実証し、それぞれ$98.06\%$と$88.21\%$の印象的な精度を達成したことである。これらの結果は多層パーセプトロン(MLP)に匹敵するものであり、複雑なデータ処理タスクにおけるEODECAの可能性を示している。我々は、モデルの学習の旅をさらに探求し、前と後の両方のトレーニング環境におけるその進化を評価し、安定した誘引者に向かう能力を強調します。この研究は、EODECAの可逆性、意思決定プロセスと内部作業に光を当てることについても検討している。本稿では、より透明で堅牢な機械学習パラダイムに向けて、機械学習アルゴリズムと動的システム方法論のギャップを埋める重要なステップを示す。

EODECA (Engineered Ordinary Differential Equations as Classification Algorithm) is a novel approach at the intersection of machine learning and dynamical systems theory, presenting a unique framework for classification tasks [1]. This method stands out with its dynamical system structure, utilizing ordinary differential equations (ODEs) to efficiently handle complex classification challenges. The paper delves into EODECA's dynamical properties, emphasizing its resilience against random perturbations and robust performance across various classification scenarios. Notably, EODECA's design incorporates the ability to embed stable attractors in the phase space, enhancing reliability and allowing for reversible dynamics. In this paper, we carry out a comprehensive analysis by expanding on the work [1], and employing a Euler discretization scheme. In particular, we evaluate EODECA's performance across five distinct classification problems, examining its adaptability and efficiency. Significantly, we demonstrate EODECA's effectiveness on the MNIST and Fashion MNIST datasets, achieving impressive accuracies of $98.06\%$ and $88.21\%$, respectively. These results are comparable to those of a multi-layer perceptron (MLP), underscoring EODECA's potential in complex data processing tasks. We further explore the model's learning journey, assessing its evolution in both pre and post training environments and highlighting its ability to navigate towards stable attractors. The study also investigates the invertibility of EODECA, shedding light on its decision-making processes and internal workings. This paper presents a significant step towards a more transparent and robust machine learning paradigm, bridging the gap between machine learning algorithms and dynamical systems methodologies.
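
The idea of classifying by letting an ODE flow to a stable attractor under an Euler discretization can be illustrated with a toy one-dimensional system. The sketch below is not EODECA's learned dynamics: the double-well vector field `x - x**3`, the step size, and the helper names are assumptions chosen only because this field has two stable attractors at -1 and +1.

```python
def euler_integrate(f, x0, dt, steps):
    """Explicit Euler scheme: x_{k+1} = x_k + dt * f(x_k)."""
    x = x0
    for _ in range(steps):
        x = x + dt * f(x)
    return x


def double_well(x):
    """Toy vector field with stable attractors at -1 and +1 (unstable point at 0)."""
    return x - x**3


def classify(x0, dt=0.05, steps=400):
    """Assign a class label from the attractor that the trajectory settles on."""
    final_state = euler_integrate(double_well, x0, dt, steps)
    return 1 if final_state > 0 else -1


label_a = classify(0.2)
label_b = classify(-0.7)
```

In this toy case the basins of attraction are simply the positive and negative half-lines. In the setting of the abstract the attractors are embedded in a high-dimensional phase space and the dynamics are trained.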

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# Bipartiete Mixed Separable States を用いた Ancilla-Assisted Process Tomography

Ancilla-Assisted Process Tomography with Bipartiete Mixed Separable States ( http://arxiv.org/abs/2312.14901v3 )

ライセンス: Link先を確認

Zhuoran Bao, Daniel F. V. James,

(参考訳) システム状態とアシラリー状態の絡み合いは、アシラ支援プロセス断層撮影(AAPT)を行うための厳密な要件ではないことが示されている。代わりに、システム・アンシラ状態が忠実であることを要求するだけであり、実際には、状態を表すある行列の可逆性である。本稿は、量子過程に関する完全な情報を抽出できる状態が忠実であること、および2キュービットのシステム・アンシラ状態における単一キュービットの操作に制限されたプロセスについて述べる。本稿では,2つの量子ビットの相関関係を定量化する,可逆性問題とシニスターネスの概念を結びつける理論的解析について述べる。シニスターネス(Sinisterness)を用いて、運用上の意味に忠実であることが保証された2量子状態を構築し、プロセスの平均誤差に基づいて境界を推定する手法を導出する。我々の分析は、最大絡み合った状態が最小の誤差増幅を与えることに一致している。それでも、エンタングルメントの利点が始める数値領域をマップアウトする。

It has been shown that entanglement between the system state and the ancillary state is not a strict requirement for performing ancilla-assisted process tomography (AAPT). Instead, it only requires that the system-ancilla state be faithful, which, in practice, amounts to the invertibility of a certain matrix representing the state. Our paper takes on the operational definition of faithfulness, i.e., a state is faithful if one can extract complete information about the quantum process, and we restrict the process to single-qubit operations on a two-qubit system-ancilla state. We present a theoretical analysis to connect the invertibility problem to the concept of Sinisterness, which quantifies the correlation of two qubits. Using Sinisterness, we derive a way of constructing two-qubit states that are guaranteed to be faithful in an operational sense and estimate the bound on the average error of the process. Our analysis agrees that the maximally entangled states provide the smallest error amplification. Nevertheless, it maps out a numerical region where the advantage of the entanglement starts.

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# コヒーレント散乱による2つの浮遊ナノ粒子の同時基底状態冷却

Simultaneous ground-state cooling of two levitated nanoparticles by coherent scattering ( http://arxiv.org/abs/2312.15898v2 )

ライセンス: Link先を確認

Yi Xu, Yu-Hong Liu, Cheng Liu, Jie-Qiao Liao,

(参考訳) 2つの浮遊ナノ粒子の同時基底状態冷却は、量子エンタングルメントや粒子の並進運動を伴う量子相関のようなマクロな量子効果を研究するための重要な前提条件である。ここでは,共振器と浮遊粒子の結合系について考察し,そのハミルトニアンの詳細な導出について述べる。 2つの粒子の$y$方向の運動は共振器場および$x$方向と$z$方向の運動の両方から切り離され、さらに$z$方向の運動は粒子の適切な位置を選択することにより共振器場と$x$方向の運動から切り離すことができる。 3モードおよび5モードの共振器浮遊オプトメカニカルモデルにおけるこれらの機械モードの同時冷却について検討した。 2つのツイーザーが同じパワーを持つ場合、同時基底状態冷却を抑制するダークモード効果が存在することが判明した。それでも、これらのモードの同時基底状態冷却は、適切なパラメータの下でダークモード効果を破ることによって実現できる。本システムは,共振器浮遊オプトメカニカル系における量子効果と応用を研究するための汎用的なプラットフォームを提供する。

Simultaneous ground-state cooling of two levitated nanoparticles is a crucial prerequisite for the investigation of macroscopic quantum effects such as quantum entanglement and quantum correlation involving translational motion of particles. Here we consider a coupled cavity-levitated-particle system and present a detailed derivation of its Hamiltonian. We find that the $y$-direction motions of the two particles are decoupled from the cavity field and both the $x$- and $z$-direction motions, and that the $z$-direction motions can be further decoupled from the cavity field and the $x$-direction motions by choosing proper locations of the particles. We study the simultaneous cooling of these mechanical modes in both the three-mode and five-mode cavity-levitated optomechanical models. It is found that a dark-mode effect exists when the two tweezers have the same power, which suppresses the simultaneous ground-state cooling. Nevertheless, the simultaneous ground-state cooling of these modes can be realized by breaking the dark-mode effect under proper parameters. Our system provides a versatile platform to study quantum effects and applications in cavity-levitated optomechanical systems.

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# 凸確率計画におけるサンプル平均近似のための計量エントロピーフリーなサンプル複雑度境界

Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming ( http://arxiv.org/abs/2401.00664v2 )

ライセンス: Link先を確認

Hongcheng Liu, Jindong Tong,

(参考訳) 本稿では、凸あるいは強凸確率計画問題の解法におけるサンプル平均近似(SAA)について検討する。いくつかの共通正規性条件の下では、おそらく初めて、SAAのサンプルの複雑さが(被覆数の対数のような)計量エントロピーの量子化から完全に解放されることを示し、既存のほとんどの結果よりも次元$d$のかなり効率的な速度をもたらす。新たに確立された複雑性境界から、SAA と正準確率ミラー降下(SMD)法は、SP に対する2つの主流解法であり、サンプル効率のほぼ同じ率を伴い、SAA の長期理論上の矛盾を$O(d)$ の順序で修正する。さらに,SAAが証明可能な有効性を維持する非リプシッツ的シナリオについて検討する一方,SMDの対応は未検討であり,不規則な環境下でのSAAのよりよい適用可能性を示している。

This paper studies the sample average approximation (SAA) in solving convex or strongly convex stochastic programming problems. Under some common regularity conditions, we show -- perhaps for the first time -- that the SAA's sample complexity can be completely free from any quantification of metric entropy (such as the logarithm of the covering number), leading to a significantly more efficient rate with dimensionality $d$ than most existing results. From the newly established complexity bounds, an important revelation is that the SAA and the canonical stochastic mirror descent (SMD) method, two mainstream solution approaches to SP, entail almost identical rates of sample efficiency, rectifying a long-standing theoretical discrepancy of the SAA from the SMD by the order of $O(d)$. Furthermore, this paper explores non-Lipschitzian scenarios where the SAA maintains provable efficacy, whereas corresponding results for the SMD remain unexplored, indicating the potential of the SAA's better applicability in some irregular settings.
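
For readers unfamiliar with the sample average approximation, the following Python sketch shows the basic construction on a toy strongly convex problem, minimizing $E[(x - \xi)^2]$ over $x$. It illustrates only the SAA idea and says nothing about the paper's complexity bounds. The problem choice, the Gaussian distribution of $\xi$, and the function names are assumptions made for illustration.

```python
import random


def saa_objective(x, samples):
    """Sample average of the loss (x - xi)^2, replacing the true expectation."""
    return sum((x - xi) ** 2 for xi in samples) / len(samples)


def saa_solve(samples):
    """Minimizer of the sample-average objective; for this loss it is the sample mean."""
    return sum(samples) / len(samples)


# Draw N i.i.d. samples of xi from an assumed distribution (mean 2.0, std 1.0).
rng = random.Random(0)
draws = [rng.gauss(2.0, 1.0) for _ in range(1000)]
x_hat = saa_solve(draws)
```

The true minimizer of the expectation is the mean of $\xi$ (2.0 under the assumed distribution), so `x_hat` approaches it as the number of samples grows. How fast this happens as a function of the dimension $d$ is the kind of question the sample complexity bounds address.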

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# 光ワイドエリア通信ネットワークのための量子アニーリングにより実現したILPベースの資源最適化 -量子アニーリングによる実世界のアプリケーションにおける組合せ問題を解くためのフレームワーク

ILP-based Resource Optimization Realized by Quantum Annealing for Optical Wide-area Communication Networks -- A Framework for Solving Combinatorial Problems of a Real-world Application by Quantum Annealing ( http://arxiv.org/abs/2401.00826v2 )

ライセンス: Link先を確認

Arthur Witt, Jangho Kim, Christopher Körber, Thomas Luu,

(参考訳) 広域インターネットネットワークのリソース割り当ては、本質的には組合せ最適化の問題であり、高速に解決すれば、ネットワークの有効性とロバスト性を高めるとともに、消費電力の大きいトランシーバからのエネルギー要求を最小限に抑えながら、インターネットプロトコルトラフィックのほぼリアルタイムな適応制御を提供できる。近年の研究では、D-Wave AdvantageTM量子アニーラシステムに埋め込むことができる2次非制約二値最適化(QUBO)問題として、そのような問題をいかに定式化できるかを示し、原理実証を行った。我々の初期の研究は、システム実行パラメータの賢明な選択によるD-Waveの解の改善の可能性を残していた。本稿では、これらのシステムのパラメータを最適化するための調査と、解の品質をさらに向上させるために機械学習(ML)技術をどのように組み込んだかについて報告する。特に,ハミング距離を用いて,様々なシステム実行パラメータと解ベクトルの相関関係について検討する。次に、これらの相関関係を学習するために決定木ニューラルネットワーク(NN)を適用し、ニューラルネットワークを使用して解ベクトルの候補をさらに推測させる。我々は、このNNを単純な整数線形プログラミング(ILP)の例で実装し、D-Waveが捉えられなかった解空間をNNが完全にマッピングできることを実証した。しかし、3ノードネットワークの問題に対しては、NNは解空間の質を高めることができない。

Resource allocation of wide-area internet networks is inherently a combinatorial optimization problem that if solved quickly, could provide near real-time adaptive control of internet-protocol traffic ensuring increased network efficacy and robustness, while minimizing energy requirements coming from power-hungry transceivers. In recent works we demonstrated how such a problem could be cast as a quadratic unconstrained binary optimization (QUBO) problem that can be embedded onto the D-Wave AdvantageTM quantum annealer system, demonstrating proof of principle. Our initial studies left open the possibility for improvement of D-Wave solutions via judicious choices of system run parameters. Here we report on our investigations for optimizing these system parameters, and how we incorporate machine learning (ML) techniques to further improve on the quality of solutions. In particular, we use the Hamming distance to investigate correlations between various system-run parameters and solution vectors. We then apply a decision tree neural network (NN) to learn these correlations, with the goal of using the neural network to provide further guesses to solution vectors. We successfully implement this NN in a simple integer linear programming (ILP) example, demonstrating how the NN can fully map out the solution space that was not captured by D-Wave. We find, however, for the 3-node network problem the NN is not able to enhance the quality of space of solutions.
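
The Hamming distance mentioned in this abstract is simply the number of positions at which two equal-length binary solution vectors differ. A minimal Python sketch follows. The example bit vectors are made up, and how the authors relate these distances to annealer run parameters is not reproduced here.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical solution vectors, e.g. two samples returned for the same QUBO.
sample_1 = [0, 1, 1, 0, 1]
sample_2 = [1, 1, 0, 0, 1]
distance = hamming_distance(sample_1, sample_2)
```

A small distance means two samples agree on most binary decision variables, which is what makes the measure useful for comparing annealer outputs with a reference solution.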

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# 高分解能二分画像セグメンテーションのための両側参照

Bilateral Reference for High-Resolution Dichotomous Image Segmentation ( http://arxiv.org/abs/2401.03407v3 )

ライセンス: Link先を確認

Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe,

(参考訳) 高分解能二分画像セグメンテーション(DIS)のための新しい両側参照フレームワーク(BiRefNet)を導入する。本フレームワークは,局所化モジュール (LM) と,提案する両側参照 (BiRef) を備えた再構成モジュール (RM) という2つの基本成分で構成される。 LMはグローバルな意味情報を用いたオブジェクトのローカライゼーションを支援する。 RM内では、画像の階層的パッチがソース参照を提供し、勾配マップがターゲット参照として機能する、再構成プロセスにBiRefを利用する。これらのコンポーネントは、最終的な予測マップを生成するために協力する。また,より詳細な領域への注目を高めるために,補助的な勾配監督を導入する。さらに、マップの品質とトレーニングプロセスを改善するために、DISに適した実践的なトレーニング戦略を概説する。提案手法の汎用性を検証するため,4つのタスクについて広範な実験を行い,BiRefNetがすべてのベンチマークにおいてタスク固有の最先端手法を上回る顕著な性能を示すことを明らかにした。私たちのコードはhttps://github.com/ZhengPeng7/BiRefNetで公開されています。

We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve map quality and training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks to evince that BiRefNet exhibits remarkable performance, outperforming task-specific cutting-edge methods across all benchmarks. Our codes are available at https://github.com/ZhengPeng7/BiRefNet.

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# シュワルツシルト時空における真の三部体の非局所性と絡み合いのデコヒーレンス下での増幅

Amplification of genuine tripartite nonlocality and entanglement in the Schwarzschild spacetime under decoherence ( http://arxiv.org/abs/2401.04407v2 )

ライセンス: Link先を確認

Chunyao Liu, Zhengwen Long, Qiliang He,

(参考訳) シュワルツシルトブラックホールの背景におけるディラック粒子の真の三部体非局所性(GTN)と真の三部体エンタングルメント(GTE)の、デコヒーレンス下での局所フィルタリング操作による増幅について検討した。物理的にアクセス可能なGTNはデコヒーレンスによって完全に破壊され、物理的にアクセス可能なGTNがシステム内に存在しなくなることが示されている。特に、局所フィルタリング操作は、物理的にアクセス可能なGTNを一定範囲のホーキング温度内に出現させることができる。すなわち、局所フィルタリング操作は、環境と結合したシステム内に物理的にアクセス可能なGTNを生成することができ、これはこれまで発見されておらず、量子情報処理にとって有益である。さらに、物理的にアクセス可能なGTEは、ほとんどの場合において無限のホーキング温度の極限で安定な値に近づくが、デコヒーレンスパラメータ$p$が1未満であれば、デコヒーレンス強度が十分大きい場合にGTEの「突然死」が発生する。なお、非ゼロ安定値のGTEは、デコヒーレンスの存在下であっても、局所フィルタリング操作を行うことで増大させることができる。最後に, 他の三部体サブシステムにおける物理的に到達不能なGTNとGTEの生成をデコヒーレンス下で検討し, 物理的に到達不能なGTNは生成できないが, 物理的に到達不能なGTEは生成可能であることを示した。さらに, 局所フィルタリング操作を適用することにより, 生成した物理的に到達不能なGTEを増大させることができる。

We investigate the amplification of the genuine tripartite nonlocality (GTN) and the genuine tripartite entanglement (GTE) of Dirac particles in the background of a Schwarzschild black hole by a local filtering operation under decoherence. It is shown that the physically accessible GTN will be completely destroyed by decoherence, which means that the physically accessible GTN will not exist in the system. In particular, the local filtering operation can make the physically accessible GTN appear within a certain range of Hawking temperature; namely, the local filtering operation can cause the physically accessible GTN to be generated in the system coupled with the environment, which has not been reported before and is beneficial for quantum information processing. Furthermore, we also find that the physically accessible GTE approaches a stable value in the limit of infinite Hawking temperature for most cases, but if the decoherence parameter $p$ is less than 1, the "sudden death" of GTE will take place when the decoherence strength is large enough. It is worth noting that the nonzero stable value of GTE can be increased by performing the local filtering operation, even in the presence of decoherence. Finally, we explore the generation of physically inaccessible GTN and GTE of other tripartite subsystems under decoherence. It is shown that the physically inaccessible GTN cannot be produced, but the physically inaccessible GTE can be produced. In addition, we can see that the generated physically inaccessible GTE can be increased by applying the local filtering operation.

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# データ駆動物理インフォームドニューラルネットワーク:デジタルツインの視点

Data-Driven Physics-Informed Neural Networks: A Digital Twin Perspective ( http://arxiv.org/abs/2401.08667v4 )

ライセンス: Link先を確認

Sunwoong Yang, Hojin Kim, Yoonpyo Hong, Kwanjung Yee, Romit Maulik, Namwoo Kang,

(参考訳) 本研究では, 物理インフォームドニューラルネットワーク(PINN)によるデジタルツイン(DT)の実現の可能性について, 様々な観点から検討した。まず,手動によるメッシュ生成を伴わない仮想表現の自動構築を可能にするPINNのメッシュフリーフレームワークにおいて,コロケーションポイントに対する様々な適応サンプリング手法の有効性を検証した。次に,DTシナリオで取得したデータセットを活用できるデータ駆動型PINN(DD-PINN)フレームワークの全体的な性能について検討した。より一般的な物理へのスケーラビリティはパラメトリックなナビエ・ストークス方程式で検証され、レイノルズ数が変化してもPINNを再訓練する必要がないことが確認された。また, 実際には異なる忠実度や疎度のデータセットが収集されることが多いため, マルチフィデリティDD-PINNも提案し, 評価した。これらは外挿タスクにおいても顕著な予測性能を示し、シングルフィデリティアプローチよりも$42\sim62\%$改善されている。最後に,マルチフィデリティDD-PINNの不確実性定量化性能をアンサンブル法を用いて検討し,予測の不確かさの正確な測定が重要であるDTにおけるその可能性を検証する。この研究で調べたDD-PINNフレームワークは、上記の観点から従来のPINNよりもDTシナリオに適していることが分かり、エンジニアはシームレスなDTの実現に一歩近づいた。

This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, the overall performance of the data-driven PINNs (DD-PINNs) framework is examined, which can utilize the acquired datasets in DT scenarios. Its scalability to more general physics is validated within parametric Navier-Stokes equations, where PINNs do not need to be retrained as the Reynolds number varies. In addition, since datasets can be often collected from different fidelity/sparsity in practice, multi-fidelity DD-PINNs are also proposed and evaluated. They show remarkable prediction performance even in the extrapolation tasks, with $42\sim62\%$ improvement over the single-fidelity approach. Finally, the uncertainty quantification performance of multi-fidelity DD-PINNs is investigated by the ensemble method to verify their potential in DT, where an accurate measure of predictive uncertainty is critical. The DD-PINN frameworks explored in this study are found to be more suitable for DT scenarios than traditional PINNs from the above perspectives, bringing engineers one step closer to seamless DT realization.

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# 分布的ランダムネットワーク蒸留による探索と反探索

Exploration and Anti-Exploration with Distributional Random Network Distillation ( http://arxiv.org/abs/2401.09750v4 )

ライセンス: Link先を確認

Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li,

(参考訳) エージェントが未知の環境で高いリターンを得るための深層強化学習において、探索は依然として重要な課題である。広く用いられている探索手法であるランダムネットワーク蒸留(RND)アルゴリズムは、多くの環境で有効であることが示されているが、ボーナス割り当ての識別力が不足することが多い。本稿では、RNDにおける「ボーナス不整合」の問題を強調し、その主な限界を指摘する。この問題に対処するために、RNDの派生である分布的RND(DRND)を導入する。DRNDは、ランダムネットワークの分布を蒸留し、疑似カウントを暗黙的に取り入れて、ボーナス割り当ての精度を向上させることにより、探索プロセスを強化する。この改良により、エージェントはより広範な探索を行うよう促される。本手法は,大きな計算オーバーヘッドを伴わずに,不整合問題を効果的に軽減する。理論的解析と実験結果は、元のRNDアルゴリズムに対する我々のアプローチの優位性を示している。本手法は,困難なオンライン探索シナリオで優れた性能を示し,D4RLオフラインタスクにおいて反探索機構として効果的に機能する。私たちのコードはhttps://github.com/yk7333/DRNDで公開されています。

Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.
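
To make the idea of distilling a distribution of random networks concrete, here is a minimal sketch under strong simplifying assumptions: every network is a single linear layer, the predictor is regressed onto the average output of several fixed random targets, and the bonus is the remaining squared error. The pseudo-count term mentioned in the abstract is omitted, and none of the names below come from the DRND repository.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM, N_TARGETS = 4, 8, 5  # illustrative sizes (assumed)

# Several fixed, randomly initialised target networks (single linear layers here).
targets = [rng.normal(size=(STATE_DIM, FEAT_DIM)) for _ in range(N_TARGETS)]
# Trainable predictor, also a single linear layer.
predictor = np.zeros((STATE_DIM, FEAT_DIM))

def mean_target(state):
    """Average output of the random target networks for one state."""
    return np.mean([state @ w for w in targets], axis=0)

def bonus(state):
    """Exploration bonus: squared error between predictor and mean target output."""
    return float(np.sum((state @ predictor - mean_target(state)) ** 2))

def train_step(state, lr=0.05):
    """One gradient step on the squared error averaged over all targets."""
    error = state @ predictor - mean_target(state)
    predictor[...] -= lr * np.outer(state, error)

visited = np.array([1.0, 0.0, 0.0, 0.0])  # state seen many times
novel = np.array([0.0, 1.0, 0.0, 0.0])    # state never trained on
for _ in range(500):
    train_step(visited)

print(bonus(visited), bonus(novel))
```

After training, the bonus for the frequently visited state shrinks towards zero while the unvisited state keeps a positive bonus, which is the behaviour an exploration bonus is meant to have.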

翻訳日:2024-05-21 23:40:18 公開日:2024-05-20

# PatchAD: 時系列異常検出のための軽量パッチベースMLPミキサ

PatchAD: A Lightweight Patch-based MLP-Mixer for Time Series Anomaly Detection ( http://arxiv.org/abs/2401.09793v4 )

ライセンス: Link先を確認

Zhijie Zhong, Zhiwen Yu, Yiyuan Yang, Weizheng Wang, Kaixiang Yang,

(参考訳) 時系列解析における異常検出は重要な課題であるが、ラベルが不足するシナリオにおいて正常パターンと異常パターンを識別することが課題となっている。以前の研究では再構成に基づくアプローチが大半を占めており、モデルの表現能力が制限されていた。さらに、既存のディープラーニングベースの手法は十分に軽量ではない。これらの問題に対処するため,表現抽出と異常検出にコントラスト学習を利用する,新しい高効率なマルチスケールパッチベースのMLP-MixerアーキテクチャであるPatchADを提案する。4つの異なるMLPミキサーと革新的なデュアルプロジェクト制約モジュールにより、PatchADは潜在的なモデル劣化を軽減し、わずか$3.2$MBの軽量なソリューションを提供する。その有効性は、異なるアプリケーションシナリオから得られた$9$個のデータセットにおける最先端の結果によって実証され、$30$以上の比較アルゴリズムを上回っている。PatchADは古典的なF1スコアを$50.5\%$、Aff-F1スコアを$7.8\%$、AUCを$10.0\%$大幅に改善する。コードは https://github.com/EmorZz1G/PatchAD で公開されている。

Anomaly detection in time series analysis is a pivotal task, yet it poses the challenge of discerning normal and abnormal patterns in label-deficient scenarios. Prior studies have largely employed reconstruction-based approaches, which limits the models' representational capacities. Moreover, existing deep learning-based methods are not sufficiently lightweight. Addressing these issues, we present PatchAD, our novel, highly efficient multiscale patch-based MLP-Mixer architecture that utilizes contrastive learning for representation extraction and anomaly detection. With its four distinct MLP Mixers and innovative dual project constraint module, PatchAD mitigates potential model degradation and offers a lightweight solution, requiring only $3.2$MB. Its efficacy is demonstrated by state-of-the-art results across $9$ datasets sourced from different application scenarios, outperforming over $30$ comparative algorithms. PatchAD significantly improves the classical F1 score by $50.5\%$, the Aff-F1 score by $7.8\%$, and the AUC by $10.0\%$. The code is publicly available at https://github.com/EmorZz1G/PatchAD.
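
The multiscale patching step can be pictured as cutting the same series into non-overlapping windows of several lengths. The sketch below is only an illustration of that idea; the patch sizes and the helper name are assumptions, and the actual PatchAD pipeline (mixers, contrastive objective) is not reproduced here.

```python
import numpy as np

def to_patches(series, patch_size):
    """Split a 1-D series into non-overlapping patches, dropping any remainder."""
    n_patches = len(series) // patch_size
    return series[: n_patches * patch_size].reshape(n_patches, patch_size)

series = np.arange(12.0)  # toy univariate series
# One patch view per scale; the sizes 2, 3, 4 are chosen purely for illustration.
multiscale = {size: to_patches(series, size) for size in (2, 3, 4)}
```

Each entry of `multiscale` has shape `(number of patches, patch size)`, so a mixer-style model could process relations between patches and within patches along the two axes.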

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# Marabou 2.0: ニューラルネットワークのVersatile形式分析ツール

Marabou 2.0: A Versatile Formal Analyzer of Neural Networks ( http://arxiv.org/abs/2401.14461v2 )

ライセンス: Link先を確認

Haoze Wu, Omri Isac, Aleksandar Zeljić, Teruhiro Tagomori, Matthew Daggitt, Wen Kokke, Idan Refaeli, Guy Amir, Kyle Julian, Shahaf Bassan, Pei Huang, Ori Lahav, Min Wu, Min Zhang, Ekaterina Komendantskaya, Guy Katz, Clark Barrett,

(参考訳) 本稿では,ニューラルネットワークの形式解析のためのMarabouフレームワークのバージョン2.0の包括的システム記述として機能する。ツールのアーキテクチャ設計について議論し、最初のリリース以降に導入された主要な機能とコンポーネントを強調します。

This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# 古典的量子貯水池計算の統一的普遍性条件

Universality conditions of unified classical and quantum reservoir computing ( http://arxiv.org/abs/2401.15067v3 )

ライセンス: Link先を確認

Francesco Monzani, Enrico Prati,

(参考訳) 貯留層コンピューティング(Reservoir computing)は、動的システム(リザバー)の非線形ダイナミクスを利用して時間依存情報を効率的に処理する、計算神経科学と機械学習の多用途パラダイムである。導入以来、様々なアプリケーションで顕著な能力を発揮してきた。広く知られているように、リザバーコンピュータのクラスは、フェーディングメモリを持つ汎関数の普遍近似器として機能する。そのような普遍クラスの構成はしばしば文脈固有のように見えるが、実際にはそれらは同じ原理に従う。ここでは、統一された理論的枠組みを示し、普遍性を確保するための既製の設定を提案する。この結果を、量子リザバーコンピューティングという新たな文脈で検証する。本解析は、古典的および量子リザバーコンピューティングの統一的な見方に光を当てる。

Reservoir computing is a versatile paradigm in computational neuroscience and machine learning that exploits the non-linear dynamics of a dynamical system - the reservoir - to efficiently process time-dependent information. Since its introduction, it has exhibited remarkable capabilities in various applications. As widely known, classes of reservoir computers serve as universal approximators of functionals with fading memory. The construction of such universal classes often appears context-specific, but in fact, they follow the same principles. Here we present a unified theoretical framework and we propose a ready-made setting to secure universality. We test the result in the arising context of quantum reservoir computing. The analysis sheds light on a unified view of classical and quantum reservoir computing.
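
For readers unfamiliar with the paradigm, a classical echo state network is a compact example of a reservoir computer: a fixed random recurrent system is driven by the input and only a linear readout is trained. The sketch below is a generic textbook-style construction, not the setting proposed in the paper; the reservoir size, the spectral-radius rescaling (a common heuristic for fading memory) and the delay-recall task are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100  # reservoir size (assumed)

# Fixed random reservoir, rescaled to spectral radius 0.9 so past inputs fade out.
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, size=N)

def run_reservoir(inputs):
    """Drive the reservoir with a scalar input sequence and collect its states."""
    x = np.zeros(N)
    states = []
    for u_t in inputs:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Toy fading-memory task: reproduce the input from three steps earlier.
u = rng.uniform(-1.0, 1.0, size=1000)
target = np.roll(u, 3)
X = run_reservoir(u)[50:]  # discard the initial transient
y = target[50:]

# Only the linear readout is trained, here by ridge regression.
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)
mse = float(np.mean((X @ W_out - y) ** 2))
```

The reservoir weights `W` and `W_in` are never updated; all learning happens in `W_out`, which is what keeps training cheap.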

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# チェンフライス級数によるニューラルオードのラデマッハ複素度

Rademacher Complexity of Neural ODEs via Chen-Fliess Series ( http://arxiv.org/abs/2401.16655v3 )

ライセンス: Link先を確認

Joshua Hanson, Maxim Raginsky,

(参考訳) 本稿では, 非線形ODEに対するChen-Fliess級数展開を用いて, 連続深度ニューラルODEモデルを単層・無限幅ネットとして定式化する方法を示す。このネットでは, 出力の「重み」は制御入力のシグネチャから取られる。シグネチャは無限次元パスをテンソルの列として表現するツールであり, 単体上での制御入力の反復積分から構成される。「特徴」は、制御されたODEモデルのベクトル場に関する出力関数の反復リー微分である。この研究の主な成果は、初期条件をある終端時刻でのスカラー出力にマッピングするODEモデルのラデマッハ複雑度に対するコンパクトな表現を導出するために、このフレームワークを適用することである。この結果は、単層アーキテクチャによって可能となる素直な解析を活用している。最後に、いくつかの特定のシステムに対して境界をインスタンス化する例を示し、今後の研究の可能性について議論する。

We show how continuous-depth neural ODE models can be framed as single-layer, infinite-width nets using the Chen-Fliess series expansion for nonlinear ODEs. In this net, the output "weights" are taken from the signature of the control input (a tool used to represent infinite-dimensional paths as a sequence of tensors), which comprises iterated integrals of the control input over a simplex. The "features" are taken to be iterated Lie derivatives of the output function with respect to the vector fields in the controlled ODE model. The main result of this work applies this framework to derive compact expressions for the Rademacher complexity of ODE models that map an initial condition to a scalar output at some terminal time. The result leverages the straightforward analysis afforded by single-layer architectures. We conclude with some examples instantiating the bound for some specific systems and discuss potential follow-up work.
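
For reference, the quantity being bounded is the standard empirical Rademacher complexity. The notation below is assumed for illustration and is not taken from the paper: $S = \{x_1, \dots, x_n\}$ is a sample of initial conditions and $\mathcal{F}$ is the class of maps from an initial condition to the scalar output at the terminal time.

```latex
\hat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\!\left[\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i\, f(x_i) \right],
\qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{-1, +1\}.
```

A small value means that no function in the class can correlate strongly with random sign patterns on the sample, which is what generalization bounds of this type exploit.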

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# スケジューリングされた好奇心-ディープダイナ-Q:対話政策学習のための効率的な探索

Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning ( http://arxiv.org/abs/2402.00085v2 )

ライセンス: Link先を確認

Xuecheng Niu, Akinori Ito, Takashi Nose,

(参考訳) 強化学習に基づくタスク指向対話エージェントの訓練には時間がかかり、実ユーザとの多数のインタラクションを必要とする。限られた対話経験の中で対話ポリシーをいかに習得するかは、エージェントの訓練プロセスの効率を下げる障害であり続けている。さらに、従来のほとんどのフレームワークは、訓練サンプルをランダムに選択して訓練を開始するが、これは人間の学習方法とは異なり、訓練の効率と安定性を損なう。そこで本研究では,最先端のモデルベース強化学習対話モデルであるDeep Dyna-Q(DDQ)に基づく,好奇心駆動型カリキュラム学習フレームワークであるScheduled Curiosity-Deep Dyna-Q(SC-DDQ)を提案する。さらに,SC-DDQ と DDQ のそれぞれに対し,古典的カリキュラム学習とその逆バージョンという2つの正反対の訓練戦略に従って学習スケジュールを設計した。その結果,スケジュール学習と好奇心を導入することで,本フレームワークはDDQとDeep Q-learning(DQN)を大幅に上回ることがわかった。驚いたことに、従来のカリキュラム学習は必ずしも効果的ではなかった。具体的には、実験結果によると、易しいものから学ぶ戦略と難しいものから学ぶ戦略が SC-DDQ と DDQ により適している。結果を分析するために,サンプリングされた行動のエントロピーを用いて行動探索を表現したところ,第1段階で高いエントロピー,最終段階で低いエントロピーとなる訓練戦略がより優れた性能につながることがわかった。

Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users. How to grasp dialog policy within limited dialog experiences remains an obstacle that makes the agent training process less efficient. In addition, most previous frameworks start training by randomly choosing training samples, which differs from the human learning method and hurts the efficiency and stability of training. Therefore, we propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a curiosity-driven curriculum learning framework based on a state-of-the-art model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ). Furthermore, we designed learning schedules for SC-DDQ and DDQ, respectively, following two opposite training strategies: classic curriculum learning and its reverse version. Our results show that by introducing scheduled learning and curiosity, the new framework leads to a significant improvement over the DDQ and Deep Q-learning (DQN). Surprisingly, we found that traditional curriculum learning was not always effective. Specifically, according to the experimental results, the easy-first and difficult-first strategies are more suitable for SC-DDQ and DDQ. To analyze our results, we adopted the entropy of sampled actions to depict action exploration and found that training strategies with high entropy in the first stage and low entropy in the last stage lead to better performance.
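
The entropy of sampled actions used in the analysis can be computed directly from action counts. The snippet below is a minimal sketch; the dialog-act names and the two toy samples are invented for illustration and do not come from the paper's data.

```python
import math
from collections import Counter

def action_entropy(actions):
    """Shannon entropy (in bits) of the empirical distribution of sampled actions."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical action samples from an early and a late training stage.
early_stage = ["request", "inform", "confirm", "greet"]  # spread out: exploring
late_stage = ["inform", "inform", "inform", "request"]   # concentrated: exploiting

print(action_entropy(early_stage), action_entropy(late_stage))
```

A uniform spread over four actions gives 2 bits, while the concentrated sample gives roughly 0.81 bits, matching the "high entropy first, low entropy last" pattern the abstract associates with better performance.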

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# 組合せ最適化のためのハイパーヒューリスティックスとしての大規模言語モデル

Large Language Models as Hyper-Heuristics for Combinatorial Optimization ( http://arxiv.org/abs/2402.01145v2 )

ライセンス: Link先を確認

Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, Guojie Song,

(参考訳) NP困難な組合せ最適化問題(COP)はいたるところに存在し、ドメインの専門家は試行錯誤によるヒューリスティック設計を強いられている。設計自動化の長年の取り組みは、大規模言語モデル(LLM)の台頭とともに、新たな勢いを増している。本稿では,Language Hyper-Heuristics(LHH)を紹介する。LHHは,LLMをヒューリスティック生成に活用するハイパーヒューリスティックスの新たな変種であり,最小限の手動介入とオープンエンドなヒューリスティック空間を特徴とする。LHHを強化するために、ヒューリスティック空間を効率的に探索する進化探索と、空間内の言語的勾配を提供するLLMリフレクションとを新たに統合したReflective Evolution(ReEvo)を提案する。5つの異種アルゴリズムタイプ、6つの異なるCOP、そしてCOPのホワイトボックスとブラックボックスの両方のビューにおいて、ReEvoは最先端かつ競争力のあるメタヒューリスティック、進化アルゴリズム、ヒューリスティック、ニューラルソルバを生み出し、従来のLHHよりもサンプル効率が高い。私たちのコードは https://github.com/ai4co/LLM-as-HH で利用可能です。

The omnipresence of NP-hard combinatorial optimization problems (COPs) compels domain experts to engage in trial-and-error heuristic design. The long-standing endeavor of design automation has gained new momentum with the rise of large language models (LLMs). This paper introduces Language Hyper-Heuristics (LHHs), an emerging variant of Hyper-Heuristics that leverages LLMs for heuristic generation, featuring minimal manual intervention and open-ended heuristic spaces. To empower LHHs, we present Reflective Evolution (ReEvo), a novel integration of evolutionary search for efficiently exploring the heuristic space, and LLM reflections to provide verbal gradients within the space. Across five heterogeneous algorithmic types, six different COPs, and both white-box and black-box views of COPs, ReEvo yields state-of-the-art and competitive meta-heuristics, evolutionary algorithms, heuristics, and neural solvers, while being more sample-efficient than prior LHHs. Our code is available: https://github.com/ai4co/LLM-as-HH.

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# 多様性が集団意思決定に及ぼす影響

The effect of diversity on group decision-making ( http://arxiv.org/abs/2402.01427v2 )

ライセンス: Link先を確認

Georgi Karadzhov, Andreas Vlachos, Tom Stafford,

(参考訳) 認知の多様性の異なる側面と、それが集団検討の成功に与える影響を考察する。これを評価するために、Wason Card SelectionタスクであるDeliData corpusについて議論する小さなオンライングループから500の対話を使用します。コーパスを活用することで,認知多様性の3つの異なる尺度を定量的に評価する。まず,多様性のプロキシ尺度としてグループサイズの影響を分析する。第2に、初期アイデアプールのサイズの影響を評価する。最後に、議論の解決策、議論のパターン、そして会話の探索がそれらの特性をどのように改善するかを分析し、議論の内容について考察する。複合バイアスに対するグループの評価にもかかわらず、小集団は対話を通じて直感的なバイアスを克服し、個人の意思決定を改善することができることを示す。大規模なサンプルと異なる運用方法を通じて、より認知的な多様性がより成功したグループ熟考と結びついていることが一貫して分かる。分析に使用されるコードとデータは、リポジトリで利用可能である。

We explore different aspects of cognitive diversity and its effect on the success of group deliberation. To evaluate this, we use 500 dialogues from small, online groups discussing the Wason Card Selection task - the DeliData corpus. Leveraging the corpus, we perform quantitative analysis evaluating three different measures of cognitive diversity. First, we analyse the effect of group size as a proxy measure for diversity. Second, we evaluate the effect of the size of the initial idea pool. Finally, we look into the content of the discussion by analysing discussed solutions, discussion patterns, and how conversational probing can improve those characteristics. Despite the reputation of groups for compounding bias, we show that small groups can, through dialogue, overcome intuitive biases and improve individual decision-making. Across a large sample and different operationalisations, we consistently find that greater cognitive diversity is associated with more successful group deliberation. Code and data used for the analysis are available in the repository: https://github.com/gkaradzhov/cognitive-diversity-groups-cogsci24.

翻訳日:2024-05-21 23:30:28 公開日:2024-05-20

# Bellman Infinity-error を用いた最適対向ロバストQ-ラーニングに向けて

Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error ( http://arxiv.org/abs/2402.02165v2 )

ライセンス: Link先を確認

Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao,

(参考訳) 強固な政策を確立することは、深層強化学習(DRL)エージェントに影響を及ぼす攻撃や妨害に対抗するために不可欠である。近年の研究では、国家と対立するロバスト性を探求し、最適ロバストポリシー(ORP)の潜在的な欠如を示唆し、厳密なロバスト性制約を設定する上での課題を提起している。はじめに、マルコフ決定過程における最適な行動は、経験的および理論的証拠によって支えられた小さな摂動と一貫しているというポリシー(CAP)の一貫性の仮定を導入する。 CAPを基盤として,ベルマン最適政策に適合する決定論的かつ定常なORPの存在を決定的に証明する。さらに、ベルマン誤差を最小限に抑えるために、$L^{\infty}$-normの必要性を述べる。この発見は、ベルマン最適ポリシーを$L^{1}$-normでターゲットとする従来のDRLアルゴリズムの脆弱性を明らかにし、ベルマンインフィニティエラーのサロゲートを最小化することにより、一貫性のある逆ロバスト深度Q-Network (CAR-DQN) のトレーニングを動機付ける。 CAR-DQNの様々なベンチマークにおける最上位性能は、その実用性を検証し、理論解析の健全性を補強する。

Establishing robust policies is essential to counter attacks or disturbances affecting deep reinforcement learning (DRL) agents. Recent studies explore state-adversarial robustness and suggest the potential lack of an optimal robust policy (ORP), posing challenges in setting strict robustness constraints. This work further investigates ORP. First, we introduce a consistency assumption of policy (CAP) stating that optimal actions in the Markov decision process remain consistent with minor perturbations, supported by empirical and theoretical evidence. Building upon CAP, we crucially prove the existence of a deterministic and stationary ORP that aligns with the Bellman optimal policy. Furthermore, we illustrate the necessity of $L^{\infty}$-norm when minimizing Bellman error to attain ORP. This finding clarifies the vulnerability of prior DRL algorithms that target the Bellman optimal policy with $L^{1}$-norm and motivates us to train a Consistent Adversarial Robust Deep Q-Network (CAR-DQN) by minimizing a surrogate of Bellman Infinity-error. The top-tier performance of CAR-DQN across various benchmarks validates its practical effectiveness and reinforces the soundness of our theoretical analysis.
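
A small numeric example helps to see why the choice of norm matters. In the sketch below, two hypothetical sets of Bellman residuals are compared: an averaged $L^{1}$-style error prefers the one with a single large error, while the $L^{\infty}$ error prefers the one that is uniformly small. The residual values are invented for illustration and the functions are not the CAR-DQN surrogate loss.

```python
import numpy as np

# Hypothetical absolute Bellman residuals over five state-action pairs.
residual_uniform = np.array([0.2, 0.2, 0.2, 0.2, 0.2])  # small error everywhere
residual_spiky = np.array([0.0, 0.0, 0.0, 0.0, 0.9])    # one badly fitted pair

def l1_error(residual):
    """Average absolute Bellman residual (an L1-style criterion)."""
    return float(np.mean(np.abs(residual)))

def linf_error(residual):
    """Worst-case absolute Bellman residual (the L-infinity criterion)."""
    return float(np.max(np.abs(residual)))

print(l1_error(residual_uniform), l1_error(residual_spiky))
print(linf_error(residual_uniform), linf_error(residual_spiky))
```

An adversary that can steer the agent into the badly fitted state exploits exactly the kind of error that the averaged criterion hides, which is one intuitive reading of the abstract's argument for the infinity norm.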

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# RSCNet: クラウドベースのWiFiセンシングのための動的CSI圧縮

RSCNet: Dynamic CSI Compression for Cloud-based WiFi Sensing ( http://arxiv.org/abs/2402.04888v2 )

ライセンス: Link先を確認

Borna Barahimi, Hakam Singh, Hina Tabassum, Omer Waqar, Mohammad Omer,

(参考訳) WiFi対応のIoT(Internet-of-Things)デバイスは、チャネル状態情報(CSI)抽出機能を活用して、単なる通信デバイスからセンシング機器へと進化している。しかし、IoTデバイスのリソース制約とディープニューラルネットワークの複雑さのため、センシングのためにCSIをクラウドサーバに送信する必要がある。これは実現可能ではあるが、かなりの通信オーバーヘッドにつながる。本稿では,圧縮されたCSIによるセンシングを可能にし,通信オーバーヘッドを低減する新しいリアルタイムセンシング・圧縮ネットワーク(RSCNet)を開発する。RSCNetは、少数のCSIフレームからなるCSIウィンドウにわたる最適化を容易にする。クラウドサーバに送信されると、Long Short-Term Memory (LSTM) ユニットを用いて以前のウィンドウのデータを活用し、センシング精度とCSI再構成の両方を向上させる。RSCNetは、CSI圧縮とセンシング精度のトレードオフを巧みにバランスさせ、通信コストを削減しつつリアルタイムのクラウドベースWiFiセンシングを効率化する。数値結果は、SenseFiのような既存のベンチマークに対するRSCNetの優位性を示し、最小限のCSI再構成誤差で97.4%のセンシング精度を達成した。また,CSIフレーム数の関数として提案するRSCNetの計算量解析も示す。

WiFi-enabled Internet-of-Things (IoT) devices are evolving from mere communication devices to sensing instruments, leveraging Channel State Information (CSI) extraction capabilities. Nevertheless, resource-constrained IoT devices and the intricacies of deep neural networks necessitate transmitting CSI to cloud servers for sensing. Although feasible, this leads to considerable communication overhead. In this context, this paper develops a novel Real-time Sensing and Compression Network (RSCNet) which enables sensing with compressed CSI; thereby reducing the communication overheads. RSCNet facilitates optimization across CSI windows composed of a few CSI frames. Once transmitted to cloud servers, it employs Long Short-Term Memory (LSTM) units to harness data from prior windows, thus bolstering both the sensing accuracy and CSI reconstruction. RSCNet adeptly balances the trade-off between CSI compression and sensing precision, thus streamlining real-time cloud-based WiFi sensing with reduced communication costs. Numerical findings demonstrate the gains of RSCNet over the existing benchmarks like SenseFi, showcasing a sensing accuracy of 97.4% with minimal CSI reconstruction error. Numerical results also show a computational analysis of the proposed RSCNet as a function of the number of CSI frames.

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# 浅部ReLU様ニューラルネットワークのランドスケープ:静止点,サドルエスケープ,ネットワーク埋め込み

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escaping, and Network Embedding ( http://arxiv.org/abs/2402.05626v2 )

ライセンス: Link先を確認

Zhengqing Wu, Berfin Simsek, Francois Ged,

(参考訳) 本稿では,経験的二乗損失を学習したReLU様活性化関数を持つ一層ニューラルネットワークの損失状況について検討する。アクティベーション関数は微分不可能であるため、固定点を完全に特徴づける方法は今のところ不明である。非微分可能ケースと微分可能ケースの両方に適用可能な定常条件を提案する。さらに、定常点が一階条件で定義される「エスケープニューロン」を含まない場合、局所最小値でなければならないことを示す。さらに、スカラーアウトプットの場合、エスケープニューロンの存在は、静止点が局所的な最小値でないことを保証している。その結果,浅部ReLU様ネットワークに対する無限小の初期化から始まり,サドルからサドルまでのトレーニングプロセスの記述を洗練し,サドルから脱出したニューロンのパラメータ変化と直接関連付けることができた。さらに、より広いネットワーク内でより狭いネットワークをインスタンス化するネットワーク埋め込みが、静止点を再設定する方法について、十分に議論することができる。

In this paper, we investigate the loss landscape of one-hidden-layer neural networks with ReLU-like activation functions trained with the empirical squared loss. As the activation function is non-differentiable, it is so far unclear how to completely characterize the stationary points. We propose the conditions for stationarity that apply to both non-differentiable and differentiable cases. Additionally, we show that, if a stationary point does not contain "escape neurons", which are defined with first-order conditions, then it must be a local minimum. Moreover, for the scalar-output case, the presence of an escape neuron guarantees that the stationary point is not a local minimum. Our results refine the description of the saddle-to-saddle training process starting from infinitesimally small (vanishing) initialization for shallow ReLU-like networks, linking saddle escaping directly with the parameter changes of escape neurons. Moreover, we are also able to fully discuss how network embedding, which is to instantiate a narrower network within a wider network, reshapes the stationary points.

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# ファクティクスとファクティクスの融合: 長期世代における集合的ファクティカルクレームのコントラクティクティヴな性質の評価

Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations ( http://arxiv.org/abs/2402.05629v3 )

ライセンス: Link先を確認

Cheng-Han Chiang, Hung-yi Lee,

(参考訳) 大規模言語モデル(LLM)からの長文生成には、事実と非事実のクレームが混在しており、事実性を評価するのが困難である。よりきめ細かな方法で長文生成の事実精度を評価するために、先行研究では、長文生成を複数の検証可能な事実に分解し、それらの事実を独立に検証することを提案する。生成の事実は、すべての事実の中で検証可能な事実の割合である。このような手法は、事実的クレームの組み合わせが事実的段落を形成すると仮定する。本稿では,エンティティの曖昧さによって仮定に違反する可能性があることを示す。 LLMは、検証可能な事実を含む段落を生成することができるが、実体的曖昧さのため、事実が組み合わさって非事実的段落を形成する。さらに、FActScoreや引用リコールを含む既存の事実精度指標が、これらの非事実項の事実性を適切に評価できないことも明らかにした。そこで本研究では,不明瞭なエンティティを持つコンテンツを対象とした拡張メトリックD-FActScoreを提案する。検索増強世代(RAG)で生成された人物のD-FActScoresを評価する。 D-FActScore は FActScore よりもエンティティの曖昧さで段落の事実性を評価することができることを示す。また,4つのオープンソース LLM が,異なるエンティティの情報を混合して非実数項を形成する傾向にあることも確認した。

Long-form generations from large language models (LLMs) contain a mix of factual and non-factual claims, making evaluating factuality difficult. To evaluate factual precision of long-form generations in a more fine-grained way, prior works propose to decompose long-form generations into multiple verifiable facts and verify those facts independently. The factuality of the generation is the proportion of verifiable facts among all the facts. Such methods assume that combining factual claims forms a factual paragraph. This paper shows that the assumption can be violated due to entity ambiguity. We show that LLMs can generate paragraphs that contain verifiable facts, but the facts are combined to form a non-factual paragraph due to entity ambiguity. We further reveal that existing factual precision metrics, including FActScore and citation recall, cannot properly evaluate the factuality of these non-factual paragraphs. To address this, we introduce an enhanced metric, D-FActScore, specifically designed for content with ambiguous entities. We evaluate the D-FActScores of people's biographies generated with retrieval-augmented generation (RAG). We show that D-FActScore can better assess the factuality of paragraphs with entity ambiguity than FActScore. We also find that four widely used open-source LLMs tend to mix information of distinct entities to form non-factual paragraphs.
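
The failure mode described above can be reproduced with a toy example. The sketch below computes the decompose-and-verify score (the proportion of supported facts) for a paragraph whose facts are each true of some person named "John Smith" but belong to two different people. The knowledge base, names and helper function are hypothetical, and the definition of D-FActScore itself is not reproduced here.

```python
def factual_precision(facts, is_supported):
    """Proportion of atomic facts that are individually supported."""
    if not facts:
        return 0.0
    return sum(1 for fact in facts if is_supported(fact)) / len(facts)

# Hypothetical knowledge base: supported fact -> the entity it is actually about.
knowledge = {
    "John Smith was born in 1950": "John Smith (politician)",
    "John Smith plays the violin": "John Smith (musician)",
}

# A generated biography that merges both facts into one paragraph.
paragraph_facts = list(knowledge)
score = factual_precision(paragraph_facts, lambda fact: fact in knowledge)
entities_mixed = {knowledge[fact] for fact in paragraph_facts}

print(score, len(entities_mixed))
```

Every fact is individually verifiable, so the score is perfect, yet the paragraph describes a person who does not exist because it draws on two distinct entities.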

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# REMEDI: 改良されたニューラルエントロピー推定のための補正変換

REMEDI: Corrective Transformations for Improved Neural Entropy Estimation ( http://arxiv.org/abs/2402.05718v2 )

ライセンス: Link先を確認

Viktor Nilsson, Anirban Samaddar, Sandeep Madireddy, Pierre Nyquist,

(参考訳) 情報理論量は、機械学習において中心的な役割を果たす。近年、データとモデルの複雑さの増大により、これらの量の正確な推定に対する需要が高まっている。しかし、次元が大きくなるにつれて、既存の手法は比較的低次元で既に苦労しているため、推定には大きな課題が生じる。この問題に対処するために、基本的な情報理論量である微分エントロピーの効率的かつ正確な推定のために$\texttt{REMEDI}$を導入する。このアプローチは、単純で適応的なベースモデルに対するクロスエントロピーの最小化と、データ密度から相対エントロピーの観点からそれらの偏差を推定するものである。提案手法は, 合成データと自然データの両方において, エントロピー推定を包含して, 幅広い推定課題にまたがる改善を実証する。さらに、我々は、重要な理論的整合性の結果を、我々のアプローチで要求されるより一般化された設定にまで拡張する。本稿では,このフレームワークを情報理論教師あり学習モデルに自然に拡張する方法について述べる。本手法は既存のインフォメーション・ボトルネック法と比較して精度がよいことを示す。さらに、$\texttt{REMEDI}$と、リジェクションサンプリングとランゲヴィンダイナミクスを用いた生成モデリングとの自然な関係について検討する。

Information theoretic quantities play a central role in machine learning. The recent surge in the complexity of data and models has increased the demand for accurate estimation of these quantities. However, as the dimension grows the estimation presents significant challenges, with existing methods struggling already in relatively low dimensions. To address this issue, in this work, we introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy, a fundamental information theoretic quantity. The approach combines the minimization of the cross-entropy for simple, adaptive base models and the estimation of their deviation, in terms of the relative entropy, from the data density. Our approach demonstrates improvement across a broad spectrum of estimation tasks, encompassing entropy estimation on both synthetic and natural data. Further, we extend important theoretical consistency results to a more generalized setting required by our approach. We illustrate how the framework can be naturally extended to information theoretic supervised learning models, with a specific focus on the Information Bottleneck approach. It is demonstrated that the method delivers better accuracy compared to the existing methods in Information Bottleneck. In addition, we explore a natural connection between $\texttt{REMEDI}$ and generative modeling using rejection sampling and Langevin dynamics.
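
The combination described in the abstract rests on a standard identity relating differential entropy, cross-entropy and relative entropy. The notation is assumed here for illustration: $p$ is the data density and $q$ is the density of the simple base model.

```latex
h(p) = -\int p(x) \log p(x)\, dx
     = \underbrace{-\int p(x) \log q(x)\, dx}_{\text{cross-entropy } H(p,\,q)}
       \;-\; \underbrace{\int p(x) \log \frac{p(x)}{q(x)}\, dx}_{\text{relative entropy } D_{\mathrm{KL}}(p \,\|\, q)}
```

Since the relative entropy is non-negative, the cross-entropy of any base model is an upper bound on the entropy, and estimating the deviation term tightens that bound.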

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# 推薦の優先順位付けのための非自己回帰生成モデル

Non-autoregressive Generative Models for Reranking Recommendation ( http://arxiv.org/abs/2402.06871v2 )

ライセンス: Link先を確認

Yuxin Ren, Qiya Yang, Yichun Wu, Wei Xu, Yalong Wang, Zhiqiang Zhang,

(参考訳) コンテンポラリーレコメンデーションシステムは、ユーザのニーズを満たすために、特定の要求や関心に合わせたアイテムの適切なリストを提供することによって設計されている。多段階レコメンデーションシステムでは、項目間のリスト内相関をモデル化することで、リランクが重要な役割を果たす。再階の鍵となる課題は、置換の組合せ空間内の最適な列の探索である。近年の研究では、ジェネレータが複数の実行可能なシーケンスを生成し、評価器が推定されたリストワイズスコアに基づいて最適なシーケンスを選択する、ジェネレータ-評価器学習パラダイムを提案する。ジェネレータは非常に重要であり、生成モデルはジェネレータ機能に適している。現在の生成モデルは、シーケンス生成のための自己回帰戦略を採用している。しかし、リアルタイム産業システムに自己回帰モデルを展開することは困難である。これらの課題に対処するため,効率と有効性を高めるために,提案するレコメンデーション(NAR4Rec)の再評価のための非自己回帰生成モデルを提案する。スパーストレーニングサンプルや動的候補といった課題に対処するために,マッチングモデルを導入する。ユーザフィードバックの多様性を考えると、実現不可能なシークエンスと不可能なシークエンスを区別するために、シークエンスレベルの相違したトレーニング目標を用いる。さらに,対象項目に関する非自己回帰モデルにおける依存性モデリングの欠如を克服するため,これらの項目間の相関を捉えるためにコントラッシブデコーディングを導入する。大規模なオフライン実験により、NAR4Recは最先端の再ランク法よりも優れた性能を示す。オンラインA/Bテストでは、NAR4Recはユーザーエクスペリエンスを大幅に向上させる。さらに、NAR4Recは、毎日3億人以上のアクティブユーザーがいる人気ビデオアプリKuaishouに完全にデプロイされている。

Contemporary recommendation systems are designed to meet users' needs by delivering tailored lists of items that align with their specific demands or interests. In a multi-stage recommendation system, reranking plays a crucial role by modeling the intra-list correlations among items. The key challenge of reranking lies in the exploration of optimal sequences within the combinatorial space of permutations. Recent research proposes a generator-evaluator learning paradigm, where the generator generates multiple feasible sequences and the evaluator picks out the best sequence based on the estimated listwise score. The generator is of vital importance, and generative models are well-suited for the generator function. Current generative models employ an autoregressive strategy for sequence generation. However, deploying autoregressive models in real-time industrial systems is challenging. To address these issues, we propose a Non-AutoRegressive generative model for reranking Recommendation (NAR4Rec) designed to enhance efficiency and effectiveness. To tackle challenges such as sparse training samples and dynamic candidates, we introduce a matching model. Considering the diverse nature of user feedback, we employ a sequence-level unlikelihood training objective to differentiate feasible sequences from unfeasible ones. Additionally, to overcome the lack of dependency modeling in non-autoregressive models regarding target items, we introduce contrastive decoding to capture correlations among these items. Extensive offline experiments validate the superior performance of NAR4Rec over state-of-the-art reranking methods. Online A/B tests reveal that NAR4Rec significantly enhances the user experience. Furthermore, NAR4Rec has been fully deployed in a popular video app Kuaishou with over 300 million daily active users.
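
The generator-evaluator paradigm mentioned above can be sketched in a few lines: a generator proposes candidate sequences and an evaluator picks the one with the best listwise score. In this toy version the generator simply enumerates all orderings and the evaluator is a hand-written scoring rule; both are stand-ins assumed for illustration and have nothing to do with the learned NAR4Rec models.

```python
from itertools import permutations

# Hypothetical candidates: item id -> (relevance score, category).
ITEMS = {"a": (0.9, "sports"), "b": (0.8, "sports"), "c": (0.7, "music")}

def generate(item_ids, k):
    """Generator stand-in: enumerate every ordering of k items."""
    return list(permutations(item_ids, k))

def evaluate(sequence):
    """Evaluator stand-in: position-discounted relevance minus a penalty
    for adjacent items of the same category (a listwise effect)."""
    score = sum(ITEMS[i][0] / rank for rank, i in enumerate(sequence, start=1))
    repeats = sum(ITEMS[x][1] == ITEMS[y][1] for x, y in zip(sequence, sequence[1:]))
    return score - 0.5 * repeats

candidates = generate(ITEMS, k=3)
best = max(candidates, key=evaluate)
pointwise = tuple(sorted(ITEMS, key=lambda i: -ITEMS[i][0]))

print(best, pointwise)
```

Sorting by individual relevance yields `a, b, c`, while the listwise evaluator prefers `a, c, b` because it accounts for how neighbouring items interact. Exhaustive enumeration grows factorially with list length, which is why a learned generator is needed in practice.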

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# ジェネレーションの検証 - スマート並列オートコレクトデコーディングによる大規模言語モデル推論の高速化

Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding ( http://arxiv.org/abs/2402.11809v3 )

ライセンス: Link先を確認

Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao,

(参考訳) 本研究の目的は,数十億のパラメータを持つ大規模言語モデル(LLM)の推論速度を高速化することである。本稿では, LLMのロスレスな高速化を実現するための革新的なアプローチであるSPACE(Smart Parallel Auto-Correct dEcoding)を提案する。半自己回帰推論と投機的復号の機能を統合することにより、SPACEは自己回帰LLMがトークンの生成と検証を並列化することを可能にする。これは、既存のLLMに複数のトークンを同時に予測する能力を持たせる、専用の半自己回帰教師あり微調整プロセスによって実現される。さらに、自動訂正復号アルゴリズムは、1回のモデル呼び出し内でトークン列の生成と検証を同時に行うことを容易にする。幅広いLLMでの広範な実験を通じて、SPACEは出力品質を維持しながら、HumanEval-X上で2.7倍から4.0倍の推論高速化を実証した。

This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.
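
The verification half of speculative decoding can be illustrated with a greedy toy model. In the sketch below, drafted tokens are accepted as long as they match what the model itself would have produced, and the first mismatch is replaced by the model's own token, so the final output is identical to ordinary greedy decoding. The function names and the toy model are assumptions; a real system checks all drafted positions in one forward pass rather than in a Python loop, and SPACE's actual algorithm is not reproduced here.

```python
def verify_draft(draft_tokens, model_next_token):
    """Accept the longest draft prefix that matches the model's greedy choice,
    then append the model's own token at the first mismatch."""
    accepted = []
    for token in draft_tokens:
        expected = model_next_token(accepted)
        if token != expected:
            accepted.append(expected)  # auto-correct the rejected position
            return accepted
        accepted.append(token)
    return accepted

# Toy "model": always continues a fixed reference sentence.
REFERENCE = ["the", "cat", "sat", "down"]

def toy_model_next_token(prefix):
    return REFERENCE[len(prefix)]

draft = ["the", "cat", "ran"]  # the third drafted token is wrong
output = verify_draft(draft, toy_model_next_token)
print(output)
```

Two drafted tokens are accepted and the third is corrected, so three tokens are produced in one verification round instead of one, without changing what greedy decoding would have generated.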

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# UniST: 都市時空間予測のためのプロンプト型ユニバーサルモデル

UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction ( http://arxiv.org/abs/2402.11838v2 )

ライセンス: Link先を確認

Yuan Yuan, Jingtao Ding, Jie Feng, Depeng Jin, Yong Li,

(参考訳) 都市の時空間予測は、交通管理、資源最適化、都市計画といった情報に基づく意思決定に不可欠である。自然言語のための事前学習された基盤モデルは、1つの汎用モデルが様々な領域にまたがる複数のタスクに取り組めるという驚くべきブレークスルーを遂げてきたが、都市の時空間モデリングは遅れをとっている。都市予測のための既存のアプローチは、通常特定の時空間シナリオに合わせて調整され、タスク固有のモデル設計と広範なドメイン内訓練データを必要とする。本研究では,都市の時空間予測のためのユニバーサルモデルUniSTを提案する。大規模言語モデルから着想を得たUniSTは、次の要素によって成功を収めている。(i)多様な時空間データ特性に対する柔軟性、(ii)複雑な時空間的関係を捉えるための精巧なマスキング戦略による効果的な生成的事前学習、(iii)シナリオをまたいだ本質的かつ共有された知識を整合させ活用する、時空間知識に導かれたプロンプト。これらの設計は、強力な汎化能力を備えた時空間予測のためのワンフォーオールモデルの可能性を引き出す。15の都市と6つのドメインにおける広範な実験は、特に少数ショットおよびゼロショットのシナリオにおいて、最先端の予測性能を前進させるUniSTの普遍性を実証している。

Urban spatio-temporal prediction is crucial for informed decision-making, such as transportation management, resource optimization, and urban planning. Although pretrained foundation models for natural languages have experienced remarkable breakthroughs, wherein one general-purpose model can tackle multiple tasks across various domains, urban spatio-temporal modeling lags behind. Existing approaches for urban prediction are usually tailored for specific spatio-temporal scenarios, requiring task-specific model designs and extensive in-domain training data. In this work, we propose a universal model, UniST, for urban spatio-temporal prediction. Drawing inspiration from large language models, UniST achieves success through: (i) flexibility towards diverse spatio-temporal data characteristics, (ii) effective generative pre-training with elaborated masking strategies to capture complex spatio-temporal relationships, (iii) spatio-temporal knowledge-guided prompts that align and leverage intrinsic and shared knowledge across scenarios. These designs together unlock the potential of a one-for-all model for spatio-temporal prediction with powerful generalization capability. Extensive experiments on 15 cities and 6 domains demonstrate the universality of UniST in advancing state-of-the-art prediction performance, especially in few-shot and zero-shot scenarios.

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# AgentScope: 柔軟でロバストなマルチエージェントプラットフォーム

AgentScope: A Flexible yet Robust Multi-Agent Platform ( http://arxiv.org/abs/2402.14034v2 )

ライセンス: Link先を確認

Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou,

(参考訳) LLM(Large Language Models)の急速な進歩により、マルチエージェントアプリケーションにおいて大きな進歩が見られた。しかし、エージェントの協調とLLMの不安定な性能の複雑さは、堅牢で効率的なマルチエージェントアプリケーションを開発する上で顕著な課題となっている。これらの課題に対処するために,メッセージ交換をコア通信機構とする開発者中心のマルチエージェントプラットフォームであるAgentScopeを提案する。豊富な構文ツール、組み込みエージェントとサービス機能、アプリケーションのデモとユーティリティモニタのためのユーザフレンドリなインターフェース、ゼロコードプログラミングワークステーション、自動プロンプトチューニング機構により、開発とデプロイメントの両方の障壁は大幅に低下した。堅牢で柔軟なマルチエージェントアプリケーションを目指して、AgentScopeはビルトインとカスタマイズ可能なフォールトトレランスメカニズムを提供する。同時に、マルチモーダルデータ、ツール、外部知識の管理と利用のためのシステムレベルのサポートも備えている。さらに、ローカルおよび分散デプロイメント間の変換を容易にし、余分な労力なしで自動並列最適化を可能にするアクタベースの分散フレームワークを設計する。これらの機能により、AgentScopeは開発者がインテリジェントエージェントの可能性を完全に認識するアプリケーションを構築することができる。我々はAgentScopeをhttps://github.com/modelscope/agentscopeでリリースしました。

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in developing robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with message exchange as its core communication mechanism. The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment. Towards robust and flexible multi-agent application, AgentScope provides both built-in and customizable fault tolerance mechanisms. At the same time, it is also armed with system-level support for managing and utilizing multi-modal data, tools, and external knowledge. Additionally, we design an actor-based distribution framework, enabling easy conversion between local and distributed deployments and automatic parallel optimization without extra effort. With these features, AgentScope empowers developers to build applications that fully realize the potential of intelligent agents. We have released AgentScope at https://github.com/modelscope/agentscope, and hope AgentScope invites wider participation and innovation in this fast-moving field.

翻訳日:2024-05-21 23:20:37 公開日:2024-05-20

# E2USD:多変量時系列の効率的な非教師付き状態検出

E2USD: Efficient-yet-effective Unsupervised State Detection for Multivariate Time Series ( http://arxiv.org/abs/2402.14041v5 )

ライセンス: Link先を確認

Zhichen Lai, Huan Li, Dalin Zhang, Yan Zhao, Weizhu Qian, Christian S. Jensen,

(参考訳) サイバーフィジカルシステムのセンサーは、物理システムのプロセスを監視する多変量時系列(MTS)を出力する。このような時系列は一般に、人間の活動監視における「歩行」や「走行」といった特定の条件に対応し、それぞれ持続時間が異なる、数が未知の状態を含む。このような状態の教師なし識別は、その後のデータ解析における保存と処理を容易にし、結果の解釈可能性を高める。既存の状態検出手法は3つの課題に直面している。第一に、計算オーバーヘッドが大きく、リソース制約下やストリーミング環境では実用的でない。第二に、最先端(SOTA)の手法は表現学習に対照学習を採用しているが、偽陰性への配慮が不十分なため、モデルの収束と精度が妨げられている。第三に、SOTAの手法は主にオフラインの非ストリーミング運用のみを重視しており、オンラインのストリーミングシナリオを最適化することが急務である。本稿では、効率的かつ高精度な教師なしMTS状態検出を可能にするE2Usdを提案する。 E2Usdは、Fast Fourier Transform-based Time Series Compressor (fftCompress) とDecomposed Dual-view Embedding Module (ddEM) を利用し、両者を合わせて入力MTSを低い計算オーバーヘッドで符号化する。さらに、偽陰性の影響を打ち消し、よりクラスタリングに適した埋め込み空間を実現するために、False Negative Cancellation Contrastive Learning法(fnccLearning)を提案する。ストリーミング環境における計算オーバーヘッドをさらに削減するため、Adaptive Threshold Detection (adaTD)を導入する。 6つのベースラインと6つのデータセットによる総合的な実験は、E2Usdが計算オーバーヘッドを大幅に削減しつつSOTAの精度を達成できることを示す。

Cyber-physical system sensors emit multivariate time series (MTS) that monitor physical system processes. Such time series generally capture unknown numbers of states, each with a different duration, that correspond to specific conditions, e.g., "walking" or "running" in human-activity monitoring. Unsupervised identification of such states facilitates storage and processing in subsequent data analyses and enhances result interpretability. Existing state-detection proposals face three challenges. First, they introduce substantial computational overhead, rendering them impractical in resource-constrained or streaming settings. Second, although state-of-the-art (SOTA) proposals employ contrastive learning for representation, insufficient attention to false negatives hampers model convergence and accuracy. Third, SOTA proposals predominantly emphasize offline non-streaming deployment, whereas we highlight an urgent need to optimize online streaming scenarios. We propose E2Usd, which enables efficient-yet-accurate unsupervised MTS state detection. E2Usd exploits a Fast Fourier Transform-based Time Series Compressor (fftCompress) and a Decomposed Dual-view Embedding Module (ddEM) that together encode input MTSs at low computational overhead. Additionally, we propose a False Negative Cancellation Contrastive Learning method (fnccLearning) to counteract the effects of false negatives and to achieve more cluster-friendly embedding spaces. To reduce computational overhead further in streaming settings, we introduce Adaptive Threshold Detection (adaTD). Comprehensive experiments with six baselines and six datasets offer evidence that E2Usd is capable of SOTA accuracy at significantly reduced computational overhead.
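
The idea of compressing a time series in the frequency domain can be illustrated with a small sketch. This is not the authors' fftCompress module, whose details the abstract does not give; it only shows the generic technique of keeping the largest-magnitude Fourier coefficients and reconstructing from them. The function names (`dft`, `inverse_dft`, `compress_keep_top`) and the `keep` parameter are hypothetical, and a naive O(n^2) transform is used so the example needs only the standard library.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns a list of complex samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def compress_keep_top(signal, keep):
    """Hypothetical helper: zero all but the `keep` largest-magnitude
    frequency coefficients, then reconstruct the real-valued signal."""
    spectrum = dft(signal)
    ranked = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    kept = set(ranked[:keep])
    filtered = [c if k in kept else 0j for k, c in enumerate(spectrum)]
    return [value.real for value in inverse_dft(filtered)]
```

A single sinusoid is carried by one conjugate pair of coefficients, so `compress_keep_top(signal, 2)` reconstructs it up to floating-point error; noisier signals trade reconstruction error against the number of coefficients retained.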

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# 機械学習注意モデルを用いた時間確率バイアス補正

A Temporal Stochastic Bias Correction using a Machine Learning Attention model ( http://arxiv.org/abs/2402.14169v4 )

ライセンス: Link先を確認

Omer Nivron, Damon J. Wischik, Mathieu Vrac, Emily Shuckburgh, Alex T. Archibald,

(参考訳) 気候モデルは、現実世界の観測に対してバイアスを持つ。通常、影響評価研究に使用する前に補正する必要がある。このような補正を可能にする一連の統計手法はバイアス補正(BC)と呼ばれる。しかし、現在のBC手法は、連続する時点間の依存関係をほとんど考慮しないため、時間的バイアスの補正が困難である。その結果、熱波の持続時間や頻度など、長期的な時間特性を持つ気候統計量は正確に補正できない。これにより、そのような気候統計量に関する信頼性の高い影響評価研究が難しくなる。本稿では、時間的バイアスを補正する新しいBC手法を提案する。これは、BCの背後にある考え方を再考することで可能となる。我々はBCを、確率的な出力を持つ時間インデックス付きの回帰タスクとして定式化する。 BCを再考することで、最先端の機械学習(ML)の注意モデルを適用し、時間的な非同期性を含むさまざまな種類のバイアスを学習できる。ナイジェリアのアブジャと日本の東京における熱波持続時間統計のケーススタディにより、現在の気候モデル出力や他のBC手法よりも正確な結果が得られることを示す。

Climate models are biased with respect to real-world observations. They usually need to be adjusted before being used in impact studies. The suite of statistical methods that enable such adjustments is called bias correction (BC). However, BC methods currently struggle to adjust temporal biases because they mostly disregard the dependence between consecutive time points. As a result, climate statistics with long-range temporal properties, such as heatwave duration and frequency, cannot be corrected accurately. This makes it more difficult to produce reliable impact studies on such climate statistics. This paper offers a novel BC methodology to correct temporal biases. This is made possible by rethinking the philosophy behind BC. We introduce BC as a time-indexed regression task with stochastic outputs. Rethinking BC enables us to adapt state-of-the-art machine learning (ML) attention models and thereby learn different types of biases, including temporal asynchronicities. With a case study of heatwave duration statistics in Abuja, Nigeria, and Tokyo, Japan, we show more accurate results than current climate model outputs and alternative BC methods.

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# API-BLEND: API LLMのトレーニングとベンチマークのための総合コーパス

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs ( http://arxiv.org/abs/2402.15491v2 )

ライセンス: Link先を確認

Kinjal Basu, Ibrahim Abdelaziz, Subhajit Chaudhury, Soham Dan, Maxwell Crouse, Asim Munawar, Sadhana Kumaravel, Vinod Muthusamy, Pavan Kapanipathi, Luis A. Lastras,

(参考訳) ツールと外部アプリケーションプログラミングインターフェース(API)を効果的に利用し、タスクを計画し、完成させるために、LLM(Large Language Models)の必要性はますます高まっている。そのため、ツールやAPIへの呼び出しを含む十分な量のトレインデータやテストデータを取得することのできるメソッドには、非常に関心があります。この課題に対処するための主要な戦略として、2つの研究線が生まれている。ひとつは合成データ生成技術に重点を置いており、もうひとつは、API/ツールベースのタスクに変換可能なタスク関連データセットのキュレーションだ。本稿では,既存のデータセットを特定し,キュレートし,変換するタスクに着目し,ツール拡張LLMのトレーニングと体系的なテストを行うための大規模なコーパスであるAPI-BLENDを導入する。データセットは、API/ツール検出、スロットフィリング、検出されたAPIのシークエンシングといったAPIタスクを含む現実のシナリオを模倣する。トレーニングとベンチマークの両方の目的で,API-BLENDデータセットの有用性を実証する。

There is a growing need for Large Language Models (LLMs) to effectively use tools and external Application Programming Interfaces (APIs) to plan and complete tasks. As such, there is tremendous interest in methods that can acquire sufficient quantities of training and test data that involve calls to tools / APIs. Two lines of research have emerged as the predominant strategies for addressing this challenge. The first has focused on synthetic data generation techniques, while the second has involved curating task-adjacent datasets which can be transformed into API / tool-based tasks. In this paper, we focus on the task of identifying, curating, and transforming existing datasets and, in turn, introduce API-BLEND, a large corpus for training and systematic testing of tool-augmented LLMs. The datasets mimic real-world scenarios involving API tasks such as API / tool detection, slot filling, and sequencing of the detected APIs. We demonstrate the utility of the API-BLEND dataset for both training and benchmarking purposes.

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# FedFDP: 差分プライバシによる公平性を考慮したフェデレーション学習

FedFDP: Fairness-Aware Federated Learning with Differential Privacy ( http://arxiv.org/abs/2402.16028v2 )

ライセンス: Link先を確認

Xinpeng Ling, Jie Fu, Kuncan Wang, Huifa Li, Tong Cheng, Zhili Chen,

(参考訳) Federated Learning(FL)は、データサイロの課題を克服する新しい機械学習パラダイムであり、大きな注目を集めている。しかし、我々の観察により、グローバルに効果的に訓練されたモデルは、異なるクライアントでパフォーマンスの相違が生じる可能性がある。これは、クライアントが共同でトレーニングしたモデルが不公平な結果をもたらす可能性を示唆している。一方、関連する研究では、連合学習における勾配やモデルの伝達が、メンバーシップ推論攻撃などのプライバシー漏洩問題を引き起こす可能性があることを示唆している。上記の課題に対処するため,FedFairと呼ばれる公平性を考慮したフェデレーション学習アルゴリズムを提案する。 FedFairに基づいて、上記の2つ目の問題に対処するため、FedFDPアルゴリズムを形成するためにプライバシ保護を導入します。 FedFDPでは、公正度を調整しながら差分プライバシーを実現するために、公平性を考慮したクリッピング戦略を考案する。さらに, 付加的なアップロード損失値に対して, 有効性を最大化するための適応的クリッピング手法を提案する。さらに、我々のアルゴリズムが収束し、差分プライバシーを保証することを理論的に証明する。最後に、FedFairとFedFDPは、モデル性能と公正性の観点から、最先端のソリューションを著しく上回っていることを示す。コードとデータはhttps://anonymous.4open.science/r/FedFDP-5607でアクセスできる。

Federated learning (FL) is a new machine learning paradigm to overcome the challenge of data silos and has garnered significant attention. However, through our observations, a globally effective trained model may exhibit performance disparities across different clients. This implies that the models jointly trained by clients may lead to unfair outcomes. On the other hand, relevant studies indicate that the transmission of gradients or models in federated learning can also give rise to privacy leakage issues, such as membership inference attacks. To address the first issue mentioned above, we propose a fairness-aware federated learning algorithm, termed FedFair. Building upon FedFair, we introduce privacy protection to form the FedFDP algorithm to address the second issue mentioned above. In FedFDP, we devise a fairness-aware clipping strategy to achieve differential privacy while adjusting fairness. Additionally, for the extra uploaded loss values, we present an adaptive clipping approach to maximize utility. Furthermore, we theoretically prove that our algorithm converges and ensures differential privacy. Lastly, extensive experimental results demonstrate that FedFair and FedFDP significantly outperform state-of-the-art solutions in terms of model performance and fairness. Code and data are accessible at https://anonymous.4open.science/r/FedFDP-5607.
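
For readers unfamiliar with clipping in differentially private training, the following sketch shows the standard clip-then-add-Gaussian-noise step that such methods build on. It is not FedFDP's fairness-aware or adaptive clipping strategy, which the abstract does not specify; the function names and the `noise_multiplier` parameter are assumptions made for illustration, and no formal privacy accounting is performed here.

```python
import math
import random


def clip_by_l2_norm(vector, clip_norm):
    """Scale `vector` down so its L2 norm is at most `clip_norm`."""
    norm = math.sqrt(sum(x * x for x in vector))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vector]


def privatize_mean(gradients, clip_norm, noise_multiplier, rng):
    """Hypothetical helper: clip each gradient, sum them, add Gaussian noise
    with standard deviation noise_multiplier * clip_norm, then average."""
    clipped = [clip_by_l2_norm(g, clip_norm) for g in gradients]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(gradients) for x in noisy]
```

Clipping bounds how much any single contribution can move the sum, which is what lets the added noise be calibrated to `clip_norm`. Passing a seeded `random.Random` instance as `rng` keeps runs reproducible.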

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# M3-VRD:マルチモーダルマルチタスクマルチ教師ビジュアルリッチフォーム文書理解

M3-VRD: Multimodal Multi-task Multi-teacher Visually-Rich Form Document Understanding ( http://arxiv.org/abs/2402.17983v2 )

ライセンス: Link先を確認

Yihao Ding, Lorenzo Vaiani, Caren Han, Jean Lee, Paolo Garza, Josiah Poon, Luca Cagliero,

(参考訳) 本稿では,視覚的にリッチな形式文書理解のための,マルチモーダル・マルチタスク・マルチ教師共同知識蒸留モデルを提案する。このモデルは、トークンとエンティティ表現の微妙な相関を容易にし、フォームドキュメントに固有の複雑さに対処することによって、きめ細かなレベルと粗いレベルの両方の洞察を活用するように設計されている。さらに, 多様な多教師間知識蒸留プロセスの高度化, 分散ギャップの提示, フォーム文書の調和的理解を実現するために, 新たな粒度間・粒度間損失関数を導入する。公開形式文書理解データセットの総合的な評価を通じて,提案モデルは既存のベースラインを一貫して上回り,視覚的に複雑な形式文書の複雑な構造や内容を扱う上での有効性を示す。

This paper presents a groundbreaking multimodal, multi-task, multi-teacher joint-grained knowledge distillation model for visually-rich form document understanding. The model is designed to leverage insights from both fine-grained and coarse-grained levels by facilitating a nuanced correlation between token and entity representations, addressing the complexities inherent in form documents. Additionally, we introduce new inter-grained and cross-grained loss functions to further refine diverse multi-teacher knowledge distillation transfer process, presenting distribution gaps and a harmonised understanding of form documents. Through a comprehensive evaluation across publicly available form document understanding datasets, our proposed model consistently outperforms existing baselines, showcasing its efficacy in handling the intricate structures and content of visually complex form documents.

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# 散逸性ラシュバナノワイヤにおけるマヨラナゼロモード

Majorana zero-modes in a dissipative Rashba nanowire ( http://arxiv.org/abs/2403.00419v2 )

ライセンス: Link先を確認

Arnob Kumar Ghosh, Annica M. Black-Schaffer,

(参考訳) 凝縮系は絶えず散逸にさらされており、これはしばしば量子現象に悪影響を及ぼす。我々は、超伝導ラシュバナノワイヤに対する散逸の影響に着目する。この系は、散逸の存在下でも、有限の寿命を持つマヨラナゼロモード(MZM)をホストできることを明らかにする。最も興味深いことに、非散逸系がトポロジカルに自明な領域において、散逸は2種類の散逸境界状態、すなわち4つのロバストなゼロモード(RZM)と2つのMZMを生成することもできる。 MZMはバルクギャップの閉鎖を経て出現し、巻き数によってトポロジカルに特徴づけられる。 RZMはいかなるバルク状態とも関連せず、巻き数も持たないが、その出現は例外点と結びついている。さらに、ランダムな乱れの存在下で、散逸によって誘起されるRZMとMZMの安定性を確認した。本研究は、散逸によって駆動される実験系においてMZMを実現し、安定化させる道を開く。

Condensed matter systems are continuously subjected to dissipation, which often has adverse effects on quantum phenomena. We focus on the impact of dissipation on a superconducting Rashba nanowire. We reveal that the system can still host Majorana zero-modes (MZMs) with a finite lifetime in the presence of dissipation. Most interestingly, dissipation can also generate two kinds of dissipative boundary states: four robust zero-modes (RZMs) and two MZMs, in the regime where the non-dissipative system is topologically trivial. The MZMs appear via bulk gap closing and are topologically characterized by a winding number. The RZMs are not associated with any bulk states and possess no winding number, but their emergence is instead tied to exceptional points. Further, we confirm the stability of the dissipation-induced RZMs and MZMs in the presence of random disorder. Our study paves the way for both realizing and stabilizing MZMs in an experimental setup, driven by dissipation.

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# 確率モデルの構築によるBongard-Logo問題の解法

Solving the bongard-logo problem by modeling a probabilistic model ( http://arxiv.org/abs/2403.03173v5 )

ライセンス: Link先を確認

Ruizhuo Song, Beiming Yuan,

(参考訳) 抽象推論問題は、AIアルゴリズムの知覚と認識能力に課題をもたらし、明示的な画像特徴の単なる識別以上のパターン認識と帰納的推論を要求する。本研究では,Bongard-Logo問題に適した確率モデルであるPMoCを導入し,独立確率モデルの構築を通じて高い推論精度を実現する。さらに,Bongard-Logo,RAVEN,I-RAVEN,PGMなど,複雑な抽象的推論タスクに特化した拡張トランスフォーマーであるPose-Transformerを設計した。カプセルネットワークのポーズ行列にインスパイアされたPose-Transformerは、画像データを処理する際の局所的特徴間の位置関係に焦点を当てる。 PMoCと組み合わせることで、推論精度をさらに高めることができる。我々のPose-Transformerは、抽象エンティティの位置の変化に伴う推論の難しさを効果的に解決し、RAVENのOIG、D3x3サブセット、およびPGMデータセット上で以前のモデルより優れている。最後に,多数のPose-Transformerパラメータから生じる展開困難を考慮し,パラメータ数を著しく削減しつつ,性能を向上する軽量版Straw Pose-Transformerを提案する。本研究は,抽象的推論と認知パターン認識におけるAI能力の向上に寄与する。

Abstract reasoning problems pose challenges to the perception and cognition abilities of AI algorithms, demanding deeper pattern recognition and inductive reasoning beyond mere identification of explicit image features. In this study, we introduce PMoC, a probabilistic model tailored for the Bongard-Logo problem, achieving high reasoning accuracy through the construction of an independent probabilistic model. Additionally, we have designed the Pose-Transformer, an enhanced Transformer-Encoder specifically crafted for complex abstract reasoning tasks, including Bongard-Logo, RAVEN, I-RAVEN, and PGM. Inspired by the pose matrix in capsule networks, Pose-Transformer strengthens the focus on positional relationships between local features when processing image data. When combined with PMoC, it can further enhance reasoning accuracy. Our Pose-Transformer effectively addresses reasoning difficulties associated with changes in the position of abstract entities, outperforming previous models on RAVEN's OIG, D3x3 subsets, and the PGM dataset. Finally, considering the deployment difficulties arising from the large number of Pose-Transformer parameters, this paper presents a lightweight version, Straw Pose-Transformer, which maintains performance while significantly reducing the parameter count. This study contributes to enhancing AI capabilities in abstract reasoning and cognitive pattern recognition.

翻訳日:2024-05-21 23:10:31 公開日:2024-05-20

# Triple-CFN:抽象推論プロセスの強化のための概念空間の再構築

Triple-CFN: Restructuring Concept Spaces for Enhancing Abstract Reasoning Process ( http://arxiv.org/abs/2403.03190v8 )

ライセンス: Link先を確認

Ruizhuo Song, Beiming Yuan,

(参考訳) 抽象推論は人工知能アルゴリズムに重大な課題をもたらし、知覚タスクに必要な以上の認知能力を要求する。本研究では,画像から概念や特徴を別々に抽出する新しいフレームワークであるCross-Feature Network(CFN)を紹介する。このフレームワークは、特にボンガード・ローゴ問題に対処する上で、推論の表現として機能に対する応答を利用する。抽出した概念と特徴をCFN内に組み込んだ期待最大化プロセスを統合することで,一定の限界はあるものの,顕著な結果を得た。これらの制約を克服するために,画像からの特徴抽出を最大化し,ボンガード・ローゴとレイブンの進歩行列(RPM)の両問題において有効性を示す効率的なモデルであるTriple-CFNを提案する。さらに, RPM問題に適した概念空間を明示的に構築する, Triple-CFN の先進バージョンである Meta Triple-CFN を紹介する。これにより、関連する概念の推論と解釈可能性の高い精度が保証される。全体として、この研究は抽象的推論のための革新的なネットワーク設計を探求し、マシンインテリジェンスのフロンティアを前進させる。

Abstract reasoning poses significant challenges to artificial intelligence algorithms, demanding a cognitive ability beyond that required for perceptual tasks. In this study, we introduce the Cross-Feature Network (CFN), a novel framework designed to separately extract concepts and features from images. This framework utilizes the responses of features to concepts as representations for reasoning, particularly in addressing the Bongard-Logo problem. By integrating an Expectation-Maximization process between the extracted concepts and features within the CFN, we have achieved notable results, albeit with certain limitations. To overcome these limitations, we propose the Triple-CFN, an efficient model that maximizes feature extraction from images and demonstrates effectiveness in both the Bongard-Logo and Raven's Progressive Matrices (RPM) problems. Furthermore, we introduce Meta Triple-CFN, an advanced version of Triple-CFN, which explicitly constructs a concept space tailored for RPM problems. This ensures high accuracy of reasoning and interpretability of the concepts involved. Overall, this work explores innovative network designs for abstract reasoning, thereby advancing the frontiers of machine intelligence.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# D4C Glove-train: 概念の境界設定と分布構築によるRPMおよびBongard-Logo問題の解法

D4C Glove-train: Solving the RPM and Bongard-logo Problem by Circumscribing and Building Distribution for Concepts ( http://arxiv.org/abs/2403.03452v8 )

ライセンス: Link先を確認

Ruizhuo Song, Beiming Yuan,

(参考訳) 本稿は、抽象推論の領域、特にRaven's Progressive Matrices (RPM) とBongard-Logoの課題において注目すべき進展を達成する。まず、RPM問題を高い精度で解く新しいベースラインモデルであるLico-Netを導入する。この基盤の上に、抽象推論問題の根底にある概念を分布によって表現することを提唱するD3Cアプローチを提案する。この観点は、Lico-Netと、Bongard-Logoタスクに優れたベースラインモデルの両方の性能を向上させる。 D3Cの計算効率を高めるために、簡素化した解法であるD3C-cosの変種を示す。さらに、これらの領域における概念の境界を再定義し、高次の抽象概念とその低次元の対応物との隔たりを埋めるD2C手法を提案する。最後に、敵対的手法を用いて概念の境界をさらに洗練させるD4Cへと方法論を拡張し、RPMとBongard-Logoの両課題において大幅な改善を示す。全体として、我々の貢献は抽象推論の分野における新たな視点と実践的な進歩を示している。

This paper achieves noteworthy progress in the realm of abstract reasoning, particularly in addressing Raven's Progressive Matrices (RPM) and Bongard-Logo challenges. Initially, we introduce Lico-Net, a novel baseline model that resolves RPM problems with remarkable accuracy. Leveraging this foundation, we advance with the D3C approach, which advocates representing the underlying concepts in abstract reasoning problems through distributions. This perspective enhances the performance of both Lico-Net and a baseline model excelling in Bongard-Logo tasks. To bolster the computational efficiency of D3C, we present the D3C-cos variant, offering a streamlined solution. Furthermore, we propose the D2C method, redefining concept boundaries within these domains and bridging the divide between high-level abstractions and their lower-dimensional counterparts. Finally, we extend our methodology to D4C, employing adversarial techniques to refine concept boundaries further and demonstrate substantial improvements in both RPM and Bongard-Logo challenges. Overall, our contributions present a fresh outlook and practical advancements in the field of abstract reasoning.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# 自発パラメトリック下方変換におけるアインシュタイン-ポドルスキー-ローゼン相関: ガウス近似を超えて

Einstein-Podolsky-Rosen correlations in spontaneous parametric down-conversion: Beyond the Gaussian approximation ( http://arxiv.org/abs/2403.04561v2 )

ライセンス: Link先を確認

A. G. da Costa Moura, C. H. Monken,

(参考訳) 本稿では、ガウス近似を用いずに、運動量と位置空間の両方で自発パラメトリックダウンコンバージョンによって生じる光子対の偶然検出確率振幅について解析式を示し、非線形結晶における複屈折の影響を考慮に入れた。また,Einstein-Podolsky-Rosen相関をベンチマークとして8種類のポンプビーム構成の理論的予測を支持する実験データも提示した。

We present analytic expressions for the coincidence detection probability amplitudes of photon pairs generated by spontaneous parametric down-conversion in both momentum and position spaces, without making use of the Gaussian approximation, and taking into account the effects of birefringence in the nonlinear crystal. We also present experimental data supporting our theoretical predictions, using Einstein-Podolsky-Rosen correlations as benchmarks, for 8 different pump beam configurations.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# ディープフェイク映像検出の汎化に向けたスタイル潜在フローの活用

Exploiting Style Latent Flows for Generalizing Deepfake Video Detection ( http://arxiv.org/abs/2403.06592v3 )

ライセンス: Link先を確認

Jongwook Choi, Taehoon Kim, Yonghyun Jeong, Seungryul Baek, Jongwon Choi,

(参考訳) 提案手法は, 映像の時間的変化における遅延ベクトルの解析と異常挙動に基づいて, フェイクビデオの検出手法を提案する。生成した顔画像は,様々な表情と幾何変換を伴う時間的安定な映像の生成において必然的に避けられない,スタイル潜時ベクトルの時間的変化の時間的特徴に悩まされていることがわかった。我々のフレームワークは、スタイル潜在ベクトルの動的特性を表現するために、コントラスト学習によって訓練されたStyleGRUモジュールを利用する。さらに,StyleGRU生成機能とコンテンツベース機能を統合し,視覚的および時間的アーティファクトの検出を可能にするスタイルアテンションモジュールを導入する。提案手法はディープフェイク検出における様々なベンチマークシナリオにまたがって,クロスデータセットおよびクロスマニピュレーションシナリオにおいて,その優位性を示す。さらなる分析を通じて、我々は、ディープフェイクビデオ検出の一般性を改善するために、スタイル潜在ベクトルの時間的変化を用いることの重要性も検証した。

This paper presents a new approach for the detection of fake videos, based on the analysis of style latent vectors and their abnormal behavior in temporal changes in the generated videos. We discovered that the generated facial videos suffer from the temporal distinctiveness in the temporal changes of style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation scenarios. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# 無限次元ベイズ逆問題に対する幾何MCMCの微分インフォームドニューラル演算子加速

Derivative-informed neural operator acceleration of geometric MCMC for infinite-dimensional Bayesian inverse problems ( http://arxiv.org/abs/2403.08220v2 )

ライセンス: Link先を確認

Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas,

(参考訳) 本稿では、無限次元ベイズ逆問題(BIP)を解くための幾何学的マルコフ連鎖モンテカルロ(MCMC)を高速化する作用素学習手法を提案する。幾何学的MCMCは事後分布の局所的な幾何に適応する高品質な提案を用いるが、対数尤度の勾配とヘッセ行列を繰り返し計算する必要があり、パラメータから観測量への(PtO)写像が計算コストの高いパラメトリック偏微分方程式(PDE)によって定義される場合、その計算は非現実的になる。本稿では、PtO写像のニューラル作用素サロゲートによって駆動される遅延受理型の幾何学的MCMC法を考える。この方法では、提案が対数尤度と、同時にその勾配およびヘッセ行列の高速なサロゲート予測を利用する。大幅な高速化を達成するには、サロゲートがPtO写像とそのヤコビアンを正確に近似する必要があり、従来の作用素学習法では、そのために膨大な数のPtO写像のサンプルが必要になることが多い。本研究では、PtO写像とそのヤコビアンの同時サンプルを用いる、微分情報付き作用素学習(O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024))の拡張を提案する。これにより、観測量と事後分布の局所的な幾何を、従来の方法よりも大幅に低い学習コストで正確に予測する微分情報付きニューラル作用素(DINO)サロゲートが得られる。縮約基底DINOサロゲートのコストおよび誤差の解析も示す。数値実験により、DINO駆動MCMCは、幾何学的MCMCより3～9倍、事前分布の幾何に基づくMCMCより60～97倍速く有効事後サンプルを生成することを示す。さらに、DINOサロゲートの学習コストは、幾何学的MCMCと比較して、わずか10～25個の有効事後サンプルで回収できる。

We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse problems (BIPs). While geometric MCMC employs high-quality proposals that adapt to posterior local geometry, it requires repeated computations of gradients and Hessians of the log-likelihood, which becomes prohibitive when the parameter-to-observable (PtO) map is defined through expensive-to-solve parametric partial differential equations (PDEs). We consider a delayed-acceptance geometric MCMC method driven by a neural operator surrogate of the PtO map, where the proposal exploits fast surrogate predictions of the log-likelihood and, simultaneously, its gradient and Hessian. To achieve a substantial speedup, the surrogate must accurately approximate the PtO map and its Jacobian, which often demands a prohibitively large number of PtO map samples via conventional operator learning methods. In this work, we present an extension of derivative-informed operator learning [O'Leary-Roseberry et al., J. Comput. Phys., 496 (2024)] that uses joint samples of the PtO map and its Jacobian. This leads to derivative-informed neural operator (DINO) surrogates that accurately predict the observables and posterior local geometry at a significantly lower training cost than conventional methods. Cost and error analysis for reduced basis DINO surrogates are provided. Numerical studies demonstrate that DINO-driven MCMC generates effective posterior samples 3--9 times faster than geometric MCMC and 60--97 times faster than prior geometry-based MCMC. Furthermore, the training cost of DINO surrogates breaks even compared to geometric MCMC after just 10--25 effective posterior samples.
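
The delayed-acceptance idea mentioned above can be sketched in one dimension. This is a generic two-stage Metropolis-Hastings scheme with a symmetric random-walk proposal, not the geometric, DINO-driven sampler of the paper; `log_target` stands in for the expensive model, `log_surrogate` for a cheap approximation, and all names here are assumptions for illustration.

```python
import math
import random


def delayed_acceptance_mh(log_target, log_surrogate, start, steps, step_size, rng):
    """Two-stage Metropolis-Hastings with a symmetric Gaussian random walk.

    Stage 1 screens each proposal with the cheap surrogate; only proposals
    that pass are evaluated with the expensive target in stage 2, whose
    acceptance ratio corrects for the surrogate error.
    """
    x = start
    log_p_x = log_target(x)
    log_s_x = log_surrogate(x)
    expensive_calls = 1
    samples = []
    for _ in range(steps):
        y = x + rng.gauss(0.0, step_size)
        log_s_y = log_surrogate(y)
        # Stage 1: accept/reject using only the surrogate density ratio.
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is defined.
        if math.log(1.0 - rng.random()) < log_s_y - log_s_x:
            log_p_y = log_target(y)
            expensive_calls += 1
            # Stage 2: correct with the ratio of target to surrogate ratios.
            correction = (log_p_y - log_p_x) - (log_s_y - log_s_x)
            if math.log(1.0 - rng.random()) < correction:
                x, log_p_x, log_s_x = y, log_p_y, log_s_y
        samples.append(x)
    return samples, expensive_calls
```

The chain still targets the exact density defined by `log_target`; a more accurate surrogate makes stage 2 accept more often, so fewer expensive evaluations are wasted on proposals that end up rejected.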

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# 生成エージェント社会における社会規範の創発: 原理とアーキテクチャ

Emergence of Social Norms in Generative Agent Societies: Principles and Architecture ( http://arxiv.org/abs/2403.08251v2 )

ライセンス: Link先を確認

Siyue Ren, Zhiyao Cui, Ruiqi Song, Zhen Wang, Shuyue Hu,

(参考訳) 社会規範は、エージェントが行動基準を理解し遵守するよう導くうえで重要な役割を果たし、マルチエージェントシステム(MAS)における社会的対立を減らす。しかし、現在のLLMベースの(生成的な)MASには、規範的にふるまう能力が欠けている。本稿では、生成的MASにおける社会規範の創発を促すために、CRSECという新しいアーキテクチャを提案する。このアーキテクチャは、Creation & Representation(生成と表現)、Spreading(伝播)、Evaluation(評価)、Compliance(遵守)の4つのモジュールで構成される。これにより、創発プロセスのいくつかの重要な側面をまとめて扱う。 (i) 社会規範がどこから生じるか、(ii) どのように形式的に表現されるか、(iii) エージェントのコミュニケーションと観察を通じてどのように広がるか、(iv) サニティチェックでどのように検査され、長期的にどのように統合されるか、(v) エージェントの計画と行動にどのように組み込まれるか。 Smallvilleサンドボックスゲーム環境で実施した実験は、本アーキテクチャが生成的MAS内で社会規範を確立し、社会的対立を減らす能力を持つことを示す。 30名の評価者による人手評価の肯定的な結果も、本手法の有効性を裏付けている。プロジェクトは https://github.com/sxswz213/CRSEC からアクセスできる。

Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our architecture consists of four modules: Creation & Representation, Spreading, Evaluation, and Compliance. This addresses several important aspects of the emergent processes all in one: (i) where social norms come from, (ii) how they are formally represented, (iii) how they spread through agents' communications and observations, (iv) how they are examined with a sanity check and synthesized in the long term, and (v) how they are incorporated into agents' planning and actions. Our experiments deployed in the Smallville sandbox game environment demonstrate the capability of our architecture to establish social norms and reduce social conflicts within generative MASs. The positive outcomes of our human evaluation, conducted with 30 evaluators, further affirm the effectiveness of our approach. Our project can be accessed via the following link: https://github.com/sxswz213/CRSEC.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# CoRaiS: マルチエッジ協調コンピューティングのための軽量リアルタイムスケジューリング

CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing ( http://arxiv.org/abs/2403.09671v2 )

ライセンス: Link先を確認

Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu,

(参考訳) 複数のエッジの制約されたリソースを強力なリソースプールに組み合わせたマルチエッジ協調コンピューティングは、膨大な計算能力、応答時間の改善、より多様化したサービスなど、大きなメリットをもたらす可能性がある。しかし、大量の異種資源の構成とスケジューリング戦略の欠如により、マルチエッジコンピューティングシステムのモデリングと協調が特に複雑になる。本稿では、まず、複雑なハードウェア構成を保護し、異種エッジで異なるサービス機能を再定義するシステムレベルの状態評価モデルを提案する。第二に、分散到着要求を最適にディスパッチする整数線形プログラミングモデルが設計されている。最後に,学習に基づく軽量リアルタイムスケジューラCoRaiSを提案する。 CoRaiSは、マルチエッジシステムのリアルタイム状態とリクエスト情報を埋め込み、埋め込みとポリシーネットワークを組み合わせてリクエストをスケジュールし、すべてのリクエストの応答時間を最小化する。評価結果は,CoRaiSがリアルタイムに高品質なスケジューリング決定を下し,システムスケールに関わらず,他のマルチエッジコンピューティングシステムに一般化可能であることを検証した。特性検証はまた、CoRaiSが負荷のバランスをうまく学習し、リアルタイムの状態を認識し、スケジューリング中に不均一性を認識することを実証している。

Multi-edge cooperative computing, which combines the constrained resources of multiple edges into a powerful resource pool, has the potential to deliver great benefits, such as tremendous computing power, improved response time, and more diversified services. However, the composition of massive heterogeneous resources and the lack of scheduling strategies make the modeling and coordination of multi-edge computing systems particularly complicated. This paper first proposes a system-level state evaluation model to shield the complex hardware configurations and redefine the different service capabilities at heterogeneous edges. Second, an integer linear programming model is designed to optimally dispatch the distributed arriving requests. Finally, a learning-based lightweight real-time scheduler, CoRaiS, is proposed. CoRaiS embeds the real-time states of the multi-edge system and the request information, and combines the embeddings with a policy network to schedule the requests, so that the response time of all requests can be minimized. Evaluation results verify that CoRaiS can make high-quality scheduling decisions in real time and can be generalized to other multi-edge computing systems, regardless of system scale. Characteristic validation also demonstrates that CoRaiS successfully learns to balance loads, perceive real-time state, and recognize heterogeneity while scheduling.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# ガウススプラッティングによるビュー一貫性3次元編集

View-Consistent 3D Editing with Gaussian Splatting ( http://arxiv.org/abs/2403.11868v4 )

ライセンス: Link先を確認

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang,

(参考訳) 3D Gaussian Splatting (3DGS)の出現は、3D編集に革命をもたらし、効率よく高忠実なレンダリングを提供し、正確な局所的な操作を可能にした。現在、拡散ベースの2D編集モデルを用いて、マルチビューレンダリング画像を修正し、3DGSモデルの編集をガイドしている。しかし、このアプローチは多視点不整合の重要な問題に直面しており、誘導画像はビュー間で大きな相違を示し、モード崩壊と3DGSの視覚的アーティファクトをもたらす。この目的のために、3DGSをシームレスに画像編集プロセスに組み込む新しいフレームワークであるView-Consistent Editing (VcEdit)を導入する。 VcEditには、Cross-attention Consistency ModuleとEditing Consistency Moduleという2つの革新的な一貫性モジュールがある。これらの一貫性モジュールを反復的なパターンに組み込むことで、VcEditは多視点不整合の問題を解決し、様々な場面で高品質な3DGS編集を容易にする。

The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes.

翻訳日:2024-05-21 23:00:48 公開日:2024-05-20

# ToXCL: Toxic Speech Detection and Explanation のための統一フレームワーク

ToXCL: A Unified Framework for Toxic Speech Detection and Explanation ( http://arxiv.org/abs/2403.16685v2 )

ライセンス: Link先を確認

Nhat M. Hoang, Xuan Long Do, Duc Anh Do, Duc Anh Vu, Luu Anh Tuan,

(参考訳) オンライン上の有害な言論の拡散は、特定の属性集団に脅威をもたらす重要な問題である。明示的な有害発言は攻撃的な語彙的手がかりを含むが、暗黙的な有害発言は符号化された言語や間接的な言語で構成される。したがって、モデルが暗黙的な有害発言を検出するだけでなく、その有害性を説明することも重要である。このことから、暗黙的な有害発言を効果的に検出し説明できる統一的なフレームワークが必要となる。先行研究は主に、有害発言の検出と説明のタスクをテキスト生成問題として定式化してきた。しかし、この戦略で訓練されたモデルは、それに伴う誤り伝播の問題に陥りやすい。さらに、我々の実験では、そのようなモデルの検出結果は、検出タスクのみに特化したモデルよりもはるかに低いことが明らかになった。これらのギャップを埋めるために、暗黙的な有害発言の検出と説明のための統一的なフレームワークToXCLを導入する。我々のモデルは3つのモジュールで構成される。 (i) 与えられた投稿が標的とする属性集団を生成するTarget Group Generator、(ii) エンコーダが暗黙的な有害発言の検出を担い、デコーダが必要な説明を生成するEncoder-Decoder Model、(iii) 知識蒸留によってエンコーダを強化するTeacher Classifierである。 ToXCLは新たな最先端の性能を達成し、ベースラインを大幅に上回る。

The proliferation of online toxic speech is a pertinent problem posing threats to demographic groups. While explicit toxic speech contains offensive lexical signals, implicit toxic speech consists of coded or indirect language. Therefore, it is crucial for models not only to detect implicit toxic speech but also to explain its toxicity. This creates a need for unified frameworks that can effectively detect and explain implicit toxic speech. Prior works mainly formulated the task of toxic speech detection and explanation as a text generation problem. Nonetheless, models trained using this strategy are prone to the consequent error propagation problem. Moreover, our experiments reveal that the detection results of such models are much lower than those of models that focus only on the detection task. To bridge these gaps, we introduce ToXCL, a unified framework for the detection and explanation of implicit toxic speech. Our model consists of three modules: (i) a Target Group Generator to generate the targeted demographic group(s) of a given post; (ii) an Encoder-Decoder Model in which the encoder focuses on detecting implicit toxic speech and the decoder generates the necessary explanation; and (iii) a Teacher Classifier that boosts the encoder via knowledge distillation. ToXCL achieves new state-of-the-art effectiveness and outperforms baselines significantly.
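The abstract says only that the encoder is boosted by a Teacher Classifier "via knowledge distillation" and does not give the loss. As a rough illustration of what such a term commonly looks like, the sketch below uses the standard temperature-softened KL formulation; the function names, the temperature value, and the two-class logits are assumptions for illustration, not details taken from ToXCL.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions.

    This is the textbook distillation term, used here as a stand-in;
    the paper's exact objective is not stated in the abstract.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student (encoder) predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


# Hypothetical logits for the classes (toxic, non-toxic).
loss = distillation_loss(student_logits=[0.0, 0.0], teacher_logits=[2.0, -2.0])
```

In practice a term like this would typically be added to the ordinary classification loss on the gold labels, so the student learns from both the labels and the teacher's softened predictions.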

翻訳日:2024-05-21 22:50:58 公開日:2024-05-20

# 第二言語学習における分散型エージェントと生成AIによる教育

Distributed agency in second language learning and teaching through generative AI ( http://arxiv.org/abs/2403.20216v2 )

ライセンス: Link先を確認

Robert Godwin-Jones,

(参考訳) 生成AIは、言語学習に重要な機会を提供する。 ChatGPTのようなツールは、文章や音声形式のチャットを通じて非公式の第二言語プラクティスを提供することができ、学習者は習熟度、言語レジスタ、議論トピックなどの会話パラメータを指示する。 AIは、修正的なフィードバックを与えたり、実践演習を作成したり、拡張された研究計画を開発するように指示することができる。インストラクタはAIを使って、さまざまなメディアで学習と評価材料を構築することができる。 AIは没入型技術をより強力で多用途にし、スクリプトによるインタラクションから遠ざかる可能性が高い。学習者と教師の双方にとって、純粋に統計的に人間の言語モデルから生じるAIシステムの限界を理解することが重要である。さらに、AIシステムの構築方法に関する倫理的な懸念や、その使用に関する実践的な制約、特に特権の少ない人口に対する懸念もある。 AIツールのパワーと汎用性は、多くの人々の生活において(スマートフォンと同じく)価値ある、絶え間ない仲間になり、単純なツールの使用以上の密接なつながりを生み出すだろう。社会物質主義のような生態学理論は、密接なユーザーとAIの相互作用を通して発展する共有機関を調べるのに役立つ。

Generative AI offers significant opportunities for language learning. Tools like ChatGPT can provide informal second language practice through chats in written or voice forms, with the learner specifying through prompts conversational parameters such as proficiency level, language register, and discussion topics. AI can be instructed to give corrective feedback, create practice exercises, or develop an extended study plan. Instructors can use AI to build learning and assessment materials in a variety of media. AI is likely to make immersive technologies more powerful and versatile, moving away from scripted interactions. For both learners and teachers, it is important to understand the limitations of AI systems that arise from their purely statistical model of human language, which limits their ability to deal with nuanced social and cultural aspects of language use. Additionally, there are ethical concerns over how AI systems are created as well as practical constraints in their use, especially for less privileged populations. The power and versatility of AI tools are likely to turn them into valuable and constant companions in many people's lives (akin to smartphones), creating a close connection that goes beyond simple tool use. Ecological theories such as sociomaterialism are helpful in examining the shared agency that develops through close user-AI interactions, as are the perspectives on human-object relations from Indigenous cultures.

翻訳日:2024-05-21 22:50:58 公開日:2024-05-20

# FashionEngine:マルチモーダル制御によるインタラクティブな3Dヒューマンジェネレーションと編集

FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls ( http://arxiv.org/abs/2404.01655v3 )

ライセンス: Link先を確認

Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu,

(参考訳) 本稿では,自然言語や視覚認識,手描きスケッチなどのユーザフレンドリーなマルチモーダルコントロールを通じて3Dデジタルヒューマンを作成する,対話型の3次元人体生成・編集システムであるFashionEngineを紹介する。 FashionEngineは、3つの重要なコンポーネントで3Dヒューマンプロダクションを自動化する。 1)2次元画像訓練データから意味的UV潜在空間における3次元人体のモデリングを学習する事前学習された3次元人体拡散モデル。これは多様な生成・編集タスクに強力な事前知識を提供する。 2) マルチモダリティUV空間は,人間の衣服のテクスチャ外観,形状トポロジー,テキスト意味を正準のUV整列空間に符号化し,ユーザのマルチモーダル入力を暗黙のUV潜在空間に忠実に整合させることで,制御可能な3次元人体編集を実現する。マルチモダリティUV空間は、テキスト、画像、スケッチなどの異なるユーザ入力間で共有され、様々な共同マルチモーダル編集タスクを可能にする。 3) マルチモダリティUV整列サンプラーは,拡散事前分布から高品質で多様な3D人体をサンプリングすることを学ぶ。大規模な実験は、条件付き生成/編集タスクに対するFashionEngineの最先端のパフォーマンスを検証する。さらに,FashionEngine用の対話型ユーザインタフェースを提案する。これは条件付きおよび非条件生成タスクと,ポーズ/ビュー/シェイプ制御,テキスト・画像・スケッチ駆動の3D人体編集,3D仮想試着などの編集タスクを統合されたフレームワークで実現する。私たちのプロジェクトページは https://taohuumd.github.io/projects/FashionEngine です。

We present FashionEngine, an interactive 3D human generation and editing system that creates 3D digital humans via user-friendly multimodal controls such as natural languages, visual perceptions, and hand-drawing sketches. FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, which provides strong priors for diverse generation and editing tasks. 2) Multimodality-UV Space encoding the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space, which faithfully aligns the user multimodal inputs with the implicit UV latent space for controllable 3D human editing. The multimodality-UV space is shared across different user inputs, such as texts, images, and sketches, which enables various joint multimodal editing tasks. 3) Multimodality-UV Aligned Sampler learns to sample high-quality and diverse 3D humans from the diffusion prior. Extensive experiments validate FashionEngine's state-of-the-art performance for conditional generation/editing tasks. In addition, we present an interactive user interface for our FashionEngine that enables both conditional and unconditional generation tasks, and editing tasks including pose/view/shape control, text-, image-, and sketch-driven 3D human editing and 3D virtual try-on, in a unified framework. Our project page is at: https://taohuumd.github.io/projects/FashionEngine.

翻訳日:2024-05-21 22:41:02 公開日:2024-05-20

# HENet:マルチビューカメラによるエンドツーエンドマルチタスク3次元認識のためのハイブリッド符号化

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras ( http://arxiv.org/abs/2404.02517v2 )

ライセンス: Link先を確認

Zhongyu Xia, ZhiWei Lin, Xinhao Wang, Yongtao Wang, Yun Xing, Shengxiang Qi, Nan Dong, Ming-Hsuan Yang,

(参考訳) 多視点カメラからの3次元認識は、自律運転システムにおいて重要な要素であり、3Dオブジェクトの検出や鳥瞰図(BEV)セマンティックセグメンテーションといった複数のタスクを含む。近年の3次元知覚モデルでは,知覚精度を向上させるために大きな画像エンコーダ,高解像度画像,長期時間入力が採用されており,性能が著しく向上している。しかし、これらの手法は、計算資源の制約のため、トレーニングや推論において併用できないことが多い。さらに、現代の自律運転システムは、システムアーキテクチャ全体を単純化し、実装の複雑さを低減することができるマルチタスク3D知覚のためのエンドツーエンドフレームワークを採用することを好んでいる。しかし、複数のタスクをエンドツーエンドの3D知覚モデル内で同時に最適化する場合、タスク間の競合が発生することが多い。本稿では,これらの問題を緩和するために,マルチタスク3次元認識のためのHENetというエンドツーエンドフレームワークを提案する。具体的には,短期フレーム用の大きな画像エンコーダと長期フレーム用の小さな画像エンコーダを用いたハイブリッド画像符号化ネットワークを提案する。次に,前述の2つのハイブリッド画像エンコーダから抽出した異なるフレームの特徴を融合する,アテンション機構に基づく時間的特徴統合モジュールを提案する。最後に、各知覚タスクの特徴に基づき、異なるグリッドサイズのBEV特徴、独立したBEVエンコーダ、タスクデコーダを異なるタスクに活用する。実験の結果,HENetは3Dオブジェクト検出やBEVセマンティックセグメンテーションを含む,最先端のエンドツーエンドマルチタスク3D知覚結果をnuScenesベンチマークで達成した。ソースコードとモデルは https://github.com/VDIGPKU/HENet で公開される。

Three-dimensional perception from multi-view cameras is a crucial component in autonomous driving systems, which involves multiple tasks like 3D object detection and bird's-eye-view (BEV) semantic segmentation. To improve perception precision, large image encoders, high-resolution images, and long-term temporal inputs have been adopted in recent 3D perception models, bringing remarkable performance gains. However, these techniques are often incompatible in training and inference scenarios due to computational resource constraints. Besides, modern autonomous driving systems prefer to adopt an end-to-end framework for multi-task 3D perception, which can simplify the overall system architecture and reduce the implementation complexity. However, conflict between tasks often arises when optimizing multiple tasks jointly within an end-to-end 3D perception model. To alleviate these issues, we present an end-to-end framework named HENet for multi-task 3D perception in this paper. Specifically, we propose a hybrid image encoding network, using a large image encoder for short-term frames and a small image encoder for long-term temporal frames. Then, we introduce a temporal feature integration module based on the attention mechanism to fuse the features of different frames extracted by the two aforementioned hybrid image encoders. Finally, according to the characteristics of each perception task, we utilize BEV features of different grid sizes, independent BEV encoders, and task decoders for different tasks. Experimental results show that HENet achieves state-of-the-art end-to-end multi-task 3D perception results on the nuScenes benchmark, including 3D object detection and BEV semantic segmentation. The source code and models will be released at https://github.com/VDIGPKU/HENet.

翻訳日:2024-05-21 22:41:01 公開日:2024-05-20

# 外部計画型大規模言語モデルによる会話性疾患の診断

Conversational Disease Diagnosis via External Planner-Controlled Large Language Models ( http://arxiv.org/abs/2404.04292v5 )

ライセンス: Link先を確認

Zhoujian Sun, Cheng Luo, Ziyi Liu, Zhengxing Huang,

(参考訳) 大規模言語モデル(LLM)の開発は、人工知能(AI)に基づく診断に先例のない可能性をもたらした。しかし、実際の診断シナリオにおけるLCMの応用的視点は、患者データを積極的に収集することができないため、まだ不明である。本研究は,医師のエミュレートによる計画能力の向上を目的としたLCMに基づく診断システムを提案する。我々のシステムは、計画タスクを処理するために2つの外部プランナーを含んでいる。最初のプランナーは、病気スクリーニングの質問を定式化し、初期診断を行うための強化学習アプローチを採用している。第2のプランナーは、LSMを使用して医療ガイドラインを解析し、鑑別診断を行う。実際の患者電子カルテデータを用いて,仮想患者と医師とのシミュレーション対話を構築し,診断能力の評価を行った。本システムは, 疾患スクリーニングと鑑別診断の両課題において, 有意な成績を示した。この研究は、AIを臨床環境にシームレスに統合するためのステップであり、医療診断の精度とアクセシビリティを高める可能性がある。

The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application prospects of LLMs in real diagnostic scenarios are still unclear because they are not adept at collecting patient data proactively. This study presents an LLM-based diagnostic system that enhances planning capabilities by emulating doctors. Our system involves two external planners to handle planning tasks. The first planner employs a reinforcement learning approach to formulate disease screening questions and conduct initial diagnoses. The second planner uses LLMs to parse medical guidelines and conduct differential diagnoses. By utilizing real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors and evaluated the diagnostic abilities of our system. We demonstrated that our system obtained impressive performance in both disease screening and differential diagnosis tasks. This research represents a step towards more seamlessly integrating AI into clinical settings, potentially enhancing the accuracy and accessibility of medical diagnostics.

翻訳日:2024-05-21 22:41:01 公開日:2024-05-20

# テーラーフィールドのキラリティー・対称性研究への応用

The Application of Tailored Fields for Studying Chirality and Symmetry ( http://arxiv.org/abs/2404.05923v2 )

ライセンス: Link先を確認

Dino Habibović, Kathryn R. Hamilton, Ofer Neufeld, Laura Rego,

(参考訳) 超短レーザーパルスは、物質中の最速の電荷ダイナミクスを誘起し探査するためのユニークなツールであり、空間、時間、エネルギーにおいて前例のない分解能で基本的な物理現象を研究することを可能にする。超短パルスがもたらす最も興味深い機会の1つは、空間および偏光領域におけるレーザービームの特性を調整することで対称性を変調・調査し、複数のレベルで対称性の破れを効果的に制御できることである。特に、これはキラル物質と超高速キラルダイナミクスの探査を可能にする。近年、キラリティーを研究するための高感度なアプローチの開発は物理学や化学におけるホットな話題となっているが、これはテーラード光(tailored light)の分野とは概ね独立に発展してきた。このパースペクティブでは、これらの分野の個別の進化と共同の進化を論じ、両分野がすでにどのように相互に刺激し合い、科学における新たな機会を開いてきたかに重点を置く。我々は、これらのトピックが完全に融合し、相互に進化していくと期待される将来の展望を概説し、未解決の重要な問題を強調する。

Ultrashort laser pulses pose unique tools to trigger and probe the fastest charge dynamics in matter, allowing the investigation of fundamental physical phenomena with unprecedented resolution in space, time, and energy. One of the most fascinating opportunities that ultrashort pulses offer is the possibility of modulating and investigating symmetries by tailoring the properties of the laser beam in the spatial and polarization domains, effectively controlling symmetry breaking on multiple levels. In particular, this allows probing chiral matter and ultrafast chiral dynamics. In recent years, the development of highly sensitive approaches for studying chirality has been a hot topic in physics and chemistry that has developed largely separately from the field of tailored light. This perspective discusses the individual and joint evolution of these fields with an emphasis on how the fields have already cross-fertilized, opening new opportunities in science. We outline a future outlook of how the topics are expected to fully merge and mutually evolve, emphasizing outstanding open issues.

翻訳日:2024-05-21 22:41:01 公開日:2024-05-20

# シミュレーション最適化による言語モデルプロンプト選択

Language Model Prompt Selection via Simulation Optimization ( http://arxiv.org/abs/2404.08164v2 )

ライセンス: Link先を確認

Haoting Zhang, Jinghai He, Rhonda Righter, Zeyu Zheng,

(参考訳) 生成言語モデルの発展に伴い,近年,プロンプトの選択が注目されている。プロンプト(英: prompt)は、コンテンツ生成において生成言語モデルのガイドとして機能する、ユーザが提供する命令または記述である。人間の労働力に基づくプロンプト選択手法は存在するが、シミュレーション最適化により、選択したプロンプトに対する事前定義されたスコアを最大化することを目的として、この選択を容易にすることを検討する。具体的には,2段階のフレームワークを提案する。第一段階では、各プロンプトが適度な次元ベクトルで表されるような十分数で可能なプロンプトの集合を決定する。評価と選択の次の段階において、プロンプトを表す中等次元ベクトルに関するスコアの代理モデルを構築する。この構築された代理モデルに基づいて、逐次評価のプロンプトを選択することを提案する。本フレームワークにおける逐次評価手順の整合性を証明する。また,提案手法の有効性を示す数値実験を行い,実装の実践的指導を行う。

With the advancement in generative language models, the selection of prompts has gained significant attention in recent years. A prompt is an instruction or description provided by the user, serving as a guide for the generative language model in content generation. Despite existing methods for prompt selection that are based on human labor, we consider facilitating this selection through simulation optimization, aiming to maximize a pre-defined score for the selected prompt. Specifically, we propose a two-stage framework. In the first stage, we determine a feasible set of prompts in sufficient numbers, where each prompt is represented by a moderate-dimensional vector. In the subsequent stage for evaluation and selection, we construct a surrogate model of the score regarding the moderate-dimensional vectors that represent the prompts. We propose sequentially selecting the prompt for evaluation based on this constructed surrogate model. We prove the consistency of the sequential evaluation procedure in our framework. We also conduct numerical experiments to demonstrate the efficacy of our proposed framework, providing practical instructions for implementation.
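The second stage described above (fit a surrogate of the score over prompt vectors, then pick the next prompt to evaluate sequentially) can be sketched roughly as follows. This is a minimal toy under stated assumptions: the inverse-distance-weighted surrogate, the distance-based exploration bonus, the `explore` weight, and all function names are illustrative stand-ins, not the surrogate model or acquisition rule of the paper, and each prompt is evaluated only once rather than repeatedly under noise.

```python
import math
import random


def surrogate(x, observed):
    """Inverse-distance-weighted mean of observed scores (a stand-in surrogate)."""
    num = den = 0.0
    for xi, yi in observed:
        d = math.dist(x, xi)
        if d == 0.0:
            return yi  # exact match with an already evaluated prompt
        w = 1.0 / d ** 2
        num += w * yi
        den += w
    return num / den


def select_prompt(vectors, score_fn, budget, explore=0.5, seed=0):
    """Sequentially evaluate up to `budget` prompts and return the best one seen.

    vectors  : list of moderate-dimensional prompt representations
    score_fn : callable taking a prompt index and returning its score
    """
    rng = random.Random(seed)
    first = rng.randrange(len(vectors))
    evaluated = {first: score_fn(first)}
    while len(evaluated) < min(budget, len(vectors)):
        observed = [(vectors[i], s) for i, s in evaluated.items()]

        def acquisition(i):
            # Predicted score plus a bonus for being far from evaluated prompts.
            nearest = min(math.dist(vectors[i], x) for x, _ in observed)
            return surrogate(vectors[i], observed) + explore * nearest

        candidates = (i for i in range(len(vectors)) if i not in evaluated)
        nxt = max(candidates, key=acquisition)
        evaluated[nxt] = score_fn(nxt)
    best = max(evaluated, key=evaluated.get)
    return best, evaluated
```

A hypothetical call would pass embedding vectors for the candidate prompts and a `score_fn` that runs the language model on a prompt and scores its output; the budget caps how many of those expensive evaluations are spent.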

翻訳日:2024-05-21 22:41:01 公開日:2024-05-20

# フローにしよう:3次元フローとオブジェクトクラスタリングの同時最適化

Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering ( http://arxiv.org/abs/2404.08363v2 )

ライセンス: Link先を確認

Patrik Vacek, David Hurych, Tomáš Svoboda, Karel Zimmermann,

(参考訳) 本研究では,実世界の大規模な生の点群列からの自己教師付き3次元シーンフロー推定の問題について検討する。これは軌道予測やインスタンスセグメンテーションなどの様々なタスクに不可欠である。正解のシーンフローラベルが存在しないため、現代的アプローチは、フローとオブジェクトの剛性に基づく構造的正則化を取り入れることで、連続する点群の対にわたるフローの最適化に重点を置いている。剛体物体は様々な3次元空間クラスタリング法により推定される。最先端の手法はNeural Prior構造を用いてシーン全体の動きをキャプチャすることに成功したが、複数の物体の動きを識別する際に課題に直面している。我々は、構造的制約と、大きく厳密な剛体クラスタの使用が現在のアプローチの主な落とし穴であることを特定し、重なり合うソフトクラスタと重なり合わない剛体クラスタの表現を組み合わせることができる新しいクラスタリング手法を提案する。フローは、徐々に成長する重なり合わない剛体クラスタと、固定サイズの重なり合うソフトクラスタとともに、同時に推定される。提案手法をLiDAR点群を用いた複数のデータセット上で評価し,自己教師付きベースラインを上回る性能を示し、新たな最先端の結果を達成した。本手法は,歩行者やサイクリスト,その他の脆弱な道路利用者を含む,複数の独立移動物体が近接する複雑な動的シーンにおけるフローの推定に特に優れる。私たちのコードは https://github.com/ctu-vras/let-it-flow で公開されています。

We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences, which is crucial to various tasks like trajectory prediction or instance segmentation. In the absence of ground truth scene flow labels, contemporary approaches concentrate on optimizing flow across sequential pairs of point clouds by incorporating structure-based regularization on flow and object rigidity. The rigid objects are estimated by a variety of 3D spatial clustering methods. While state-of-the-art methods successfully capture overall scene motion using the Neural Prior structure, they encounter challenges in discerning multi-object motions. We identify the structural constraints and the use of large and strict rigid clusters as the main pitfall of the current approaches, and we propose a novel clustering approach that allows for a combination of overlapping soft clusters and non-overlapping rigid clusters. Flow is then jointly estimated with progressively growing non-overlapping rigid clusters together with fixed-size overlapping soft clusters. We evaluate our method on multiple datasets with LiDAR point clouds, demonstrating superior performance over the self-supervised baselines and reaching new state-of-the-art results. Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other, including pedestrians, cyclists, and other vulnerable road users. Our code is publicly available at https://github.com/ctu-vras/let-it-flow.

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# テキストから歌へ:声と伴奏を取り入れた制御可能な音楽生成を目指して

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment ( http://arxiv.org/abs/2404.09313v3 )

ライセンス: Link先を確認

Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang,

(参考訳) 歌は歌声と伴奏の組み合わせである。しかし、既存の作品では、歌声合成と音楽生成を独立して重視している。歌の合成を探求するためにはほとんど注意が払われなかった。そこで本研究では,音声と伴奏の両方を組み込んだテキスト・ツー・サング・シンセサイザーという新しいタスクを提案する。我々は,歌唱音声合成 (SVS) とV2A合成 (V2A) を組み合わせた2段階音声合成法であるメロディストを開発した。メロディストは、トリトウワーコントラスト事前学習を利用して、制御可能なV2A合成のためのより効果的なテキスト表現を学習する。音楽サイトから発掘された中国の歌のデータセットは、我々の研究のためにデータ不足を軽減するために構築されている。評価結果は,メロディストが同等の品質とスタイルの整合性で楽曲を合成できることを実証した。オーディオサンプルはhttps://text2songMelodist.github.io/Sample/で見ることができる。

A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis. Melodist leverages tri-tower contrastive pretraining to learn more effective text representation for controllable V2A synthesis. A Chinese song dataset mined from a music website is built to alleviate data scarcity for our research. The evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency. Audio samples can be found at https://text2songMelodist.github.io/Sample/.

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# LLM以外のインストラクションを伴わないインストラクションによるテキスト分類器のインキュベーション

Incubating Text Classifiers Following User Instruction with Nothing but LLM ( http://arxiv.org/abs/2404.10877v2 )

ライセンス: Link先を確認

Letian Peng, Jingbo Shang,

(参考訳) 本稿では,任意のクラス定義(すなわちユーザの指示)を与えられたテキスト分類データを生成することを目的としており,人間のアノテーションや生のコーパスを使わずに,小さなテキスト分類器を訓練することができる。先駆的な試みと比較して、提案するインキュベータは、複雑で相互に依存したクラス(例えば、「Educator による TED Talk」と「その他」)を処理できる最初のフレームワークである。具体的には,インキュベータは,HuggingFace 上の分類データセットとその記述から得られた命令からデータへのマッピングと,GPT-4 によるインコンテキスト拡張を用いて最初にチューニングした LLM である。次に、意味的テキスト埋め込みのクラスタ中心で学習することでインキュベータを改良し、生成の均一性と意味的多様性を強調する。各種分類タスクにおいて,インキュベータを,LLMによる直接推論やプロンプトエンジニアリングによるトレーニングデータ生成などの強力なベースラインと比較した。実験では,インキュベータが(1)従来のベンチマークでうまく動作し,(2)ラベル依存やユーザの好みを考慮に入れ,(3)複数の分類器をインキュベートすることで論理的なテキストマイニングを可能にすることを示した。

In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneer attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Specifically, Incubator is an LLM firstly tuned on the instruction-to-data mappings that we obtained from classification datasets and descriptions on HuggingFace together with in-context augmentation by GPT-4. We then refine Incubator by learning on the cluster centers of semantic textual embeddings to emphasize the uniformity and semantic diversity in generations. We compare Incubator on various classification tasks with strong baselines such as direct LLM-based inference and training data generation by prompt engineering. Experiments show Incubator is able to (1) perform well on traditional benchmarks, (2) take label dependency and user preference into consideration, and (3) enable logical text mining by incubating multiple classifiers.

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# 光から原子へのねじれ度変換

Conversion of twistedness from light to atoms ( http://arxiv.org/abs/2404.11558v2 )

ライセンス: Link先を確認

S. S. Baturin, A. V. Volotka,

(参考訳) 我々は、束縛された電子によるツイストされた光子の吸収を利用して、自由空間におけるツイストされた原子の生成を可能にするための簡単なモデルとスキームを提案する。我々は、光子と原子の非弾性衝突において、光子のねじれ状態が質量中心状態に移され、原子の軌道運動量の投影が$m_\gamma-\Delta m_e$となることを示す。また、実験条件によっては、光子のねじれ度は原子中心の量子状態に移されるか、束縛された電子遷移の選択規則を変更することが示される。提案されたスキームは一般的なもので、原子波面の複雑な整形を可能にする。

We develop a simple model and propose a scheme that allows the production of twisted atoms in free space using the absorption of twisted photons by a bound electron. We show that in the inelastic collision of a photon and an atom, the twisted state of the photon is transferred to the center-of-mass state, so that the projection of the orbital momentum of the atom becomes $m_\gamma-\Delta m_e$. We also show that, depending on the experimental conditions, the twistedness of the photon is either transferred to the atomic center-of-mass quantum state or modifies the selection rule for the bound electron transition. The proposed scheme is general and enables complex shaping of the atomic wavefront.
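As a worked instance of the projection rule quoted in the abstract: the symbol $m_{\mathrm{cm}}$ for the center-of-mass projection and the specific integer values below are illustrative assumptions, not notation or numbers taken from the paper.

```latex
% Projection of the atom's center-of-mass orbital angular momentum after absorption
m_{\mathrm{cm}} = m_\gamma - \Delta m_e

% Illustrative case: twisted photon with m_\gamma = 2, electron transition with \Delta m_e = 1
m_{\mathrm{cm}} = 2 - 1 = 1

% If the bound electron takes up the full projection (\Delta m_e = m_\gamma),
% nothing is left for the center of mass
m_{\mathrm{cm}} = m_\gamma - m_\gamma = 0
```

The last line corresponds to the second regime mentioned in the abstract, where the twistedness modifies the selection rule of the electronic transition instead of twisting the atom as a whole.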

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering ( http://arxiv.org/abs/2404.12020v2 )

ライセンス: Link先を確認

Jie Ma, Min Hu, Pinghui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du,

(参考訳) AVQA(Audio-Visual Question Answering)は複雑なマルチモーダル推論タスクであり、音声とビデオの入力ペアに基づいて、インテリジェントなシステムが自然言語クエリに正確に応答することを要求する。それでも、一般的なAVQAアプローチは、データセットのバイアスを過度に学習する傾向があり、結果としてロバスト性が低下する。さらに、現在のデータセットはこれらの方法の正確な診断を提供していないかもしれない。これらの課題に対処するために、まず、公開データセット(MUSIC-AVQA)のテストスプリット内の質問を言い換え、その後、分割された質問に分布シフトを導入するという、2つのステップで構築された新しいデータセットである MUSIC-AVQA-R を提案する。前者は大規模で多様なテスト空間を導き、後者は稀な質問、頻出する質問、および全体の質問に対する包括的なロバスト性評価をもたらす。次に, バイアス学習を克服するために, 多面的なサイクル協調型デバイアス戦略を利用する頑健なアーキテクチャを提案する。実験の結果、このアーキテクチャは両方のデータセットで最先端のパフォーマンスを実現し、特に提案したデータセットでは9.32%の大幅な改善が得られた。これら2つのデータセットに対して大規模なアブレーション実験を行い、デバイアス戦略の有効性を検証した。さらに,提案データセット上での評価を通じて,既存のマルチモーダルQA手法のロバスト性が限定的であることを示す。

Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackle these challenges, firstly, we propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and subsequently introducing distribution shifts to split questions. The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions. Secondly, we propose a robust architecture that utilizes a multifaceted cycle collaborative debiasing strategy to overcome bias learning. Experimental results show that this architecture achieves state-of-the-art performance on both datasets, especially obtaining a significant improvement of 9.32% on the proposed dataset. Extensive ablation experiments are conducted on these two datasets to validate the effectiveness of the debiasing strategy. Additionally, we highlight the limited robustness of existing multi-modal QA methods through the evaluation on our dataset.

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# 勾配降下アルゴリズムによる3dB限界を超える最適化された機械的直交位相スクイージング

Optimized mechanical quadrature squeezing beyond the 3 dB limit via gradient-descent algorithm ( http://arxiv.org/abs/2404.13563v2 )

ライセンス: Link先を確認

Yu-Hong Liu, Jie-Qiao Liao,

(参考訳) 機械的直交位相スクイーズド状態の生成は、キャビティ・オプトメカニクスにおいて重要な意味を持つ。スクイーズド状態は、基礎的な量子力学の理解や現代の量子技術の活用に幅広く応用されるためである。本研究では,勾配降下アルゴリズムを用いて最適な共振器場駆動パルスを求めることにより,典型的なキャビティ・オプトメカニカル系において機械的直交位相スクイージングを生成するための信頼性の高い手法を提案する。熱フォノン占有数が100であっても、3dBの定常状態限界を超える強い直交位相スクイージングを機械共振器において実現する。さらに、機械的スクイージングを1つの機械的振動周期内に超高速で生成することができる。また、生成した機械的スクイージングに対応する最適なパルス駆動を求め、機械的スクイージング生成のメカニズムを解析した。この研究は、量子光学および量子情報科学における最適量子制御の応用を促進する。

The preparation of mechanical quadrature-squeezed states holds significant importance in cavity optomechanics because the squeezed states have extensive applications in understanding fundamental quantum mechanics and exploiting modern quantum technology. Here, we propose a reliable scheme for generating mechanical quadrature squeezing in a typical cavity optomechanical system by seeking optimal cavity-field driving pulses using the gradient-descent algorithm. We realize strong quadrature squeezing in a mechanical resonator that exceeds the 3 dB steady-state limit, even with a thermal phonon occupancy of one hundred. Furthermore, the mechanical squeezing can be created within one mechanical oscillation period. We also obtain the optimal pulsed drivings associated with the created mechanical squeezing and analyze the mechanism for mechanical squeezing generation. This work will promote the application of optimal quantum control in quantum optics and quantum information science.
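To make the "3 dB limit" concrete: squeezing is usually quoted as the reduction of a quadrature variance below the vacuum (zero-point) level on a decibel scale, and 3 dB corresponds to halving that variance. The helper below is a small sketch of that conversion; the vacuum variance of 0.5 assumes the convention $X = (b + b^\dagger)/\sqrt{2}$, and the function names are hypothetical.

```python
import math

# Assumption: quadrature defined as X = (b + b^dagger) / sqrt(2),
# so the vacuum (zero-point) variance is 1/2.
VACUUM_VARIANCE = 0.5


def squeezing_db(variance, reference=VACUUM_VARIANCE):
    """Squeezing degree in dB; positive values mean noise below the reference."""
    if variance <= 0 or reference <= 0:
        raise ValueError("variances must be positive")
    return -10.0 * math.log10(variance / reference)


def beyond_3db_limit(variance, reference=VACUUM_VARIANCE):
    """True when the variance is below half the reference (more than 3 dB)."""
    return variance < 0.5 * reference


# A variance of 0.25 is exactly half the vacuum level, i.e. about 3.01 dB.
level = squeezing_db(0.25)
```

Any variance below 0.25 in this convention therefore counts as squeezing beyond the 3 dB steady-state limit mentioned in the abstract.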

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# プログラム環境ファズリング

Program Environment Fuzzing ( http://arxiv.org/abs/2404.13951v2 )

ライセンス: Link先を確認

Ruijie Meng, Gregory J. Duck, Abhik Roychoudhury,

(参考訳) プログラムは独立して実行されるのではなく、プログラムの振る舞いを駆動する実行環境と相互作用する。これにより、ファイル、データベース、構成、ネットワークソケット、人間とユーザのインタラクションなど、複雑な環境相互作用の影響を捉える必要がある。シンボリックな実行における環境キャプチャの従来のアプローチと、手作業を伴う環境モデリングを用いたモデルチェック。本稿では,グレーボックスファジングの拡張に基づいて,異なるアプローチをとる。プログラムが与えられた場合、カーネル/ユーザ/モード境界におけるすべての環境相互作用をシステムコールの形式で記録する。次に、元の記録された相互作用の下でプログラムをリプレイするが、今回は選択的な突然変異を適用し、異なるプログラム環境の効果を得る。ファジィキャンペーンの繰り返し(フィードバック駆動)変異によって、クラッシュする振る舞いを引き起こすプログラム環境を探すことができる。私たちのEFuzzツールは、よく知られた現実世界のプロトコル実装とGUIアプリケーションで33のゼロデイバグを発見しました。その多くはセキュリティ上の脆弱性であり、14のCVEが割り当てられている。

Computer programs are not executed in isolation, but rather interact with the execution environment which drives the program behaviours. Software validation and verification methods, such as greybox fuzzing, thus need to capture the effect of possibly complex environmental interactions, including files, databases, configurations, network sockets, human-user interactions, and more. Conventional approaches for environment capture in symbolic execution and model checking employ environment modelling, which involves manual effort. In this paper, we take a different approach based on an extension of greybox fuzzing. Given a program, we first record all observed environmental interactions at the kernel/user-mode boundary in the form of system calls. Next, we replay the program under the original recorded interactions, but this time with selective mutations applied, in order to get the effect of different program environments -- all without environment modelling. Via repeated (feedback-driven) mutations over a fuzzing campaign, we can search for program environments that induce crashing behaviour. Our EFuzz tool found 33 zero-day bugs in well-known real-world protocol implementations and GUI applications. Many of these are security vulnerabilities and 14 CVEs were assigned.
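The record, replay, and mutate loop described above can be illustrated with a toy model. The sketch below is only a stand-in: EFuzz records real system calls at the kernel/user-mode boundary, whereas here the "environment" is a hypothetical object with a `read()` method, the target is a made-up `toy_program`, and the coverage feedback that drives a real greybox fuzzer is omitted (every mutation starts from the original recording).

```python
import random


class RecordingEnv:
    """Serves real inputs and logs every interaction result."""

    def __init__(self, real_inputs):
        self._inputs = iter(real_inputs)
        self.log = []

    def read(self):
        data = next(self._inputs)
        self.log.append(data)
        return data


class ReplayEnv:
    """Replays a (possibly mutated) recording instead of the real environment."""

    def __init__(self, log):
        self._log = iter(log)

    def read(self):
        return next(self._log, b"")


def mutate(log, rng):
    """Overwrite one random byte in one recorded interaction."""
    out = list(log)
    i = rng.randrange(len(out))
    buf = bytearray(out[i])
    if buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    out[i] = bytes(buf)
    return out


def toy_program(env):
    """Hypothetical target: trusts a length header it reads from the environment."""
    header = env.read()
    body = env.read()
    length = header[0]
    if length > len(body):
        raise IndexError("declared length exceeds body")
    return body[:length]


def fuzz(program, seed_inputs, iterations=2000, seed=0):
    """Record one run, then replay it under mutations and collect crashes."""
    rng = random.Random(seed)
    recorder = RecordingEnv(seed_inputs)
    program(recorder)  # recording phase
    crashes = []
    for _ in range(iterations):
        candidate = mutate(recorder.log, rng)
        try:
            program(ReplayEnv(candidate))  # replay phase with a mutated environment
        except Exception as exc:
            crashes.append((candidate, exc))
    return crashes
```

Because the mutation is applied to the recorded interaction rather than to a hand-written model of the file or socket, the same loop works for any source of input the program happens to consult, which is the point the abstract makes about avoiding environment modelling.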

翻訳日:2024-05-21 22:31:13 公開日:2024-05-20

# EEGDiR:Retentive Networkによる時間情報記憶とグローバルモデリングのための脳波デノイジングネットワーク

EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network ( http://arxiv.org/abs/2404.15289v2 )

ライセンス: Link先を確認

Bin Wang, Fei Deng, Peifan Jiang,

(参考訳) 脳波信号は臨床医学、脳研究、神経疾患研究において重要な役割を担っている。しかし、様々な生理的および環境的アーティファクトの影響を受けやすいため、記録された脳波データにノイズが混入し、基礎となる脳活動の正確な分析が妨げられる。この課題を緩和するためには、デノイジング技術が不可欠である。近年の深層学習ベースのアプローチの進歩は、従来の手法と比較して脳波データの信号対雑音比を高める大きな可能性を示している。大規模言語モデル(LLM)の領域では、一部のモデルで用いられているRetentive Network(Retnet)が、堅牢な特徴抽出とグローバルモデリング能力を示している。脳波信号と自然言語の時間的類似性を認識し、自然言語処理から脳波デノイジングへRetnetを導入する。この統合は、脳波デノイジングへの新しいアプローチを示し、脳活動の深い理解と神経疾患の正確な診断への道を開く。それでも、脳波信号は1次元であるのに対し自然言語処理は2次元データを扱うため、Retnetを脳波デノイジングに直接適用することはできない。Retnetを脳波デノイジングに適用できるようにするため、本稿では1次元の脳波信号を2次元に変換してネットワーク入力として使用する信号埋め込み手法を提案する。実験結果から,提案手法によってデノイジングの有効性が大幅に向上することが確認された。

Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learning-based approaches exhibit substantial potential for enhancing the signal-to-noise ratio of EEG data compared to traditional methods. In the realm of large-scale language models (LLMs), the Retentive Network (Retnet) infrastructure, prevalent for some models, demonstrates robust feature extraction and global modeling capabilities. Recognizing the temporal similarities between EEG signals and natural language, we introduce the Retnet from natural language processing to EEG denoising. This integration presents a novel approach to EEG denoising, opening avenues for a profound understanding of brain activities and accurate diagnosis of neurological diseases. Nonetheless, direct application of Retnet to EEG denoising is infeasible due to the one-dimensional nature of EEG signals, while natural language processing deals with two-dimensional data. To facilitate Retnet application to EEG denoising, we propose the signal embedding method, transforming one-dimensional EEG signals into two dimensions for use as network inputs. Experimental results validate the substantial improvement in denoising effectiveness achieved by the proposed method.
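The abstract does not spell out how the signal embedding turns a one-dimensional EEG trace into a two-dimensional input. One plausible reading, sketched here purely as an assumption, is to cut the signal into fixed-length patches so that each patch plays the role of a token; the patch length, zero padding, and function names are hypothetical, and any learned projection the paper may apply afterwards is left out.

```python
def embed_signal(signal, patch_len):
    """Split a 1D signal into rows of length `patch_len` (zero-padded at the end).

    Returns a list of lists with shape (num_patches, patch_len), i.e. a
    sequence of "tokens" that a sequence model such as Retnet could consume.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    pad = (-len(signal)) % patch_len
    padded = list(signal) + [0.0] * pad
    return [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]


def unembed_signal(patches, original_len):
    """Flatten the patches back to 1D and drop the padding."""
    flat = [value for patch in patches for value in patch]
    return flat[:original_len]


# Five samples with patch length 2 become three rows; the last row is padded.
tokens = embed_signal([1.0, 2.0, 3.0, 4.0, 5.0], patch_len=2)
restored = unembed_signal(tokens, original_len=5)
```

The inverse step matters for denoising, since the network output has to be mapped back to a one-dimensional trace of the original length.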

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20

# 2次元アーキテクチャにおける高コヒーレンスKerr-cat量子ビット

High-Coherence Kerr-cat qubit in 2D architecture ( http://arxiv.org/abs/2404.16697v3 )

ライセンス: Link先を確認

Ahmed Hajr, Bingcheng Qing, Ke Wang, Gerwin Koolstra, Zahra Pedramrazi, Ziqi Kang, Larry Chen, Long B. Nguyen, Christian Junger, Noah Goss, Irwin Huang, Bibek Bhandari, Nicholas E. Frattini, Shruti Puri, Justin Dressel, Andrew N. Jordan, David Santiago, Irfan Siddiqi,

(参考訳) Kerr-cat量子ビット(Kerr-cat qubit)は、Kerr非線形性を持つ発振器に2光子駆動を適用することにより、多光子シュロディンガー猫状態が安定化されるボソニック量子ビットである。猫サイズの増大に伴う抑制ビットフリップ率により、この量子ビットはノイズバイアス量子ビットに適した量子誤り訂正符号を実装するための有望な候補となる。しかし、この量子ビットの安定化と制御に必要な強力な光-物質相互作用を達成するためには、伝統的に、量子ビットを加熱して性能を低下させる強いマイクロ波駆動が必要である。対照的に、駆動ポートとの結合を増大させることで、パーセルの大規模な崩壊を犠牲にして、強い駆動の必要性がなくなる。有効帯域ブロックフィルタをオンチップに統合することにより、このトレードオフを克服し、高コヒーレンスを有するスケーラブルな2D超伝導回路におけるKerr-cat量子ビットを実現する。このフィルタは、安定化および読み出しに必要な周波数で無視可能な減衰で、キュービット周波数で30dBのアイソレーションを提供する。実験では、8個の光子を持つ猫に対して99.6%の量子非破壊読み出し率を実験的に実証した。また、この量子ビットを高忠実に普遍的に制御するために、高速なラビ振動とX(90)ゲートの新たなデモを安定化ドライブの位相変調により組み合わせる。最後に、回路の理論解析と整合して、1ms以上のビットフリップ時間と位相フリップ時間の線形減少しか達成しない発振器における最大10光子の猫の大きさの関数として、このアーキテクチャの寿命を調べた。我々の量子ビットは、小さなフットプリントを持つフォールトトレラント量子プロセッサのビルディングブロックとして有望であることを示している。

The Kerr-cat qubit is a bosonic qubit in which multi-photon Schrodinger cat states are stabilized by applying a two-photon drive to an oscillator with a Kerr nonlinearity. The suppressed bit-flip rate with increasing cat size makes this qubit a promising candidate to implement quantum error correction codes tailored for noise-biased qubits. However, achieving strong light-matter interactions necessary for stabilizing and controlling this qubit has traditionally required strong microwave drives that heat the qubit and degrade its performance. In contrast, increasing the coupling to the drive port removes the need for strong drives at the expense of large Purcell decay. By integrating an effective band-block filter on-chip, we overcome this trade-off and realize a Kerr-cat qubit in a scalable 2D superconducting circuit with high coherence. This filter provides 30 dB of isolation at the qubit frequency with negligible attenuation at the frequencies required for stabilization and readout. We experimentally demonstrate quantum non-demolition readout fidelity of 99.6% for a cat with 8 photons. Also, to have high-fidelity universal control over this qubit, we combine fast Rabi oscillations with a new demonstration of the X(90) gate through phase modulation of the stabilization drive. Finally, the lifetime in this architecture is examined as a function of the cat size of up to 10 photons in the oscillator achieving a bit-flip time higher than 1 ms and only a linear decrease in the phase-flip time, in good agreement with the theoretical analysis of the circuit. Our qubit shows promise as a building block for fault-tolerant quantum processors with a small footprint.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20

# スケーラブルな変分量子シミュレーションのための多体局在の活用

Exploiting many-body localization for scalable variational quantum simulation ( http://arxiv.org/abs/2404.17560v2 )

ライセンス: Link先を確認

Chenfeng Cao, Yeqing Zhou, Swamit Tannu, Nic Shannon, Robert Joynt,

(参考訳) 変分量子アルゴリズムは、近い将来の量子デバイスを用いて実用的な量子優位性を達成するための有望なアプローチとして登場した。その可能性にもかかわらず、これらのアルゴリズムのスケーラビリティは大きな課題である。これは主に、ノイズがなくても生じる「バレンプラトー(barren plateau)」現象に起因する。本研究では、Floquet-initialized variational quantum circuitsの枠組みにおいて多体局在(MBL)-熱化相転移を調べ、MBLをバレンプラトーの回避にどのように利用できるかを検討する。相転移は、逆参加率、エンタングルメントエントロピー、および低ウェイト安定化子Rényiエントロピーと呼ぶ指標の計算によって観測される。 MBL相で回路を初期化し、容易に準備できる初期状態を用いることで、ユニタリ2-デザインの形成を防ぐことができ、その結果、出力状態のエンタングルメントは体積則ではなく面積則に従い、最適化を通してバレンプラトーが回避されることがわかった。この手法を用いて、異なる相にわたるさまざまなモデルハミルトニアンの基底状態の決定に成功し、最適化に必要なリソースが大幅に削減されることを示す。さらに、127量子ビットの$ibm\_brisbane$量子プロセッサ上で行った実験によってMBLアプローチを検証した。これらの実験により、ランダムなユニタリ「キック」を受けるハイゼンベルクモデルのMBL相において、変分計算に必要な勾配が回復することが確認された。これらの結果は、MBLと量子コンピューティングの相互作用に関する新たな洞察を与え、量子アルゴリズムの設計においてMBL状態の役割を考慮すべきであることを示唆している。

Variational quantum algorithms have emerged as a promising approach to achieving practical quantum advantages using near-term quantum devices. Despite their potential, the scalability of these algorithms poses a significant challenge. This is largely attributed to the "barren plateau" phenomenon, which persists even in the absence of noise. In this work, we explore the many-body localization (MBL)-thermalization phase transitions within a framework of Floquet-initialized variational quantum circuits and investigate how MBL could be used to avoid barren plateaus. The phase transitions are observed through calculations of the inverse participation ratio, the entanglement entropy, and a metric termed low-weight stabilizer Rényi entropy. By initializing the circuit in the MBL phase and employing an easily preparable initial state, we find it is possible to prevent the formation of a unitary 2-design, resulting in an output state with entanglement that follows an area- rather than a volume-law, and which circumvents barren plateaus throughout the optimization. Utilizing this methodology, we successfully determine the ground states of various model Hamiltonians across different phases and show that the resources required for the optimization are significantly reduced. We have further validated the MBL approach through experiments carried out on the 127-qubit $ibm\_brisbane$ quantum processor. These experiments confirm that the gradients needed to carry out variational calculations are restored in the MBL phase of a Heisenberg model subject to random unitary "kicks". These results provide new insights into the interplay between MBL and quantum computing, and suggest that the role of MBL states should be considered in the design of quantum algorithms.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
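The abstract above names the inverse participation ratio (IPR) as one diagnostic of the MBL-thermalization transition. The following is a minimal sketch, assuming the common definition IPR = sum_i |psi_i|^4 in a fixed basis; the abstract does not state the paper's exact convention, and the function name `inverse_participation_ratio` is hypothetical. It illustrates why the quantity separates localized from delocalized states.

```python
def inverse_participation_ratio(amplitudes):
    """Return sum_i |psi_i|^4 after normalizing the state vector.

    Assumed convention (not taken from the paper): the IPR is close to 1
    for a state concentrated on one basis element and close to 1/D for a
    state spread evenly over D basis elements.
    """
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if norm_sq == 0:
        raise ValueError("state vector must be non-zero")
    return sum((abs(a) ** 2 / norm_sq) ** 2 for a in amplitudes)


# A state sitting on a single basis element (fully localized).
localized = [1.0, 0.0, 0.0, 0.0]
# A state spread evenly over four basis elements (fully delocalized).
delocalized = [0.5, 0.5, 0.5, 0.5]

print(inverse_participation_ratio(localized))
print(inverse_participation_ratio(delocalized))
```

In this toy setting the localized state gives an IPR of 1 and the evenly spread state gives 1/4, so a larger IPR indicates stronger localization in the chosen basis.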

# 変分自己回帰ネットワークと量子アニーリングを用いた統計力学計算

Statistical Mechanics Calculations Using Variational Autoregressive Networks and Quantum Annealing ( http://arxiv.org/abs/2404.19274v2 )

ライセンス: Link先を確認

Yuta Tamura, Masayuki Ohzeki,

(参考訳) 統計力学では、分配関数の計算は一般に困難である。近年、変分自己回帰ネットワーク(VAN)を用いた近似法が提案されている。このアプローチには、非常に多くのサンプルを取得しつつ、生成確率を直接計算できるという利点がある。本研究では、ギブス・ボルツマン分布に従うと経験的に仮定される量子アニーリングマシンから得られたサンプルをVANと組み合わせて用いる新しい近似法を提案する。有限サイズのシェリントン・カークパトリックモデルに適用したところ、提案手法は、従来のVANアプローチや、広く利用されているナイーブ平均場などの他の近似手法と比較して精度が向上した。

In statistical mechanics, computing the partition function is generally difficult. An approximation method using a variational autoregressive network (VAN) has been proposed recently. This approach offers the advantage of directly calculating the generation probabilities while obtaining a significantly large number of samples. The present study introduces a novel approximation method that employs samples derived from quantum annealing machines in conjunction with VAN, which are empirically assumed to adhere to the Gibbs-Boltzmann distribution. When applied to the finite-size Sherrington-Kirkpatrick model, the proposed method demonstrates enhanced accuracy compared to the traditional VAN approach and other approximate methods, such as the widely utilized naive mean field.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
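To make concrete why computing the partition function is hard, the sketch below evaluates it exactly by brute force for a small Sherrington-Kirkpatrick instance. It is an illustration only, not the VAN or quantum-annealing method of the paper. The assumptions are the usual SK energy E(s) = -sum_{i<j} J_ij s_i s_j with Gaussian couplings of variance 1/n, no external field, and hypothetical helper names. The sum has 2^n terms, which is what approximation methods try to avoid.

```python
import itertools
import math
import random


def energy(spins, couplings):
    """E(s) = -sum_{i<j} J_ij s_i s_j for spins s_i in {-1, +1}."""
    n = len(spins)
    return -sum(
        couplings[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def exact_partition_function(couplings, beta):
    """Sum exp(-beta * E(s)) over all 2**n spin configurations."""
    n = len(couplings)
    return sum(
        math.exp(-beta * energy(spins, couplings))
        for spins in itertools.product((-1, 1), repeat=n)
    )


def random_sk_couplings(n, seed=0):
    """Symmetric Gaussian couplings with variance 1/n (assumed SK scaling)."""
    rng = random.Random(seed)
    couplings = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            couplings[i][j] = couplings[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return couplings


if __name__ == "__main__":
    n, beta = 10, 1.0
    z = exact_partition_function(random_sk_couplings(n), beta)
    print(f"n={n}: {2 ** n} configurations, log Z = {math.log(z):.4f}")
```

Each additional spin doubles the number of configurations, so this enumeration is only practical for very small n and serves as a reference value against which approximations can be checked.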

# CofiPara: 大規模マルチモーダルモデルを用いたマルチモーダルサルカズムターゲット同定のための粗から細へ(coarse-to-fine)のパラダイム

CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models ( http://arxiv.org/abs/2405.00390v2 )

ライセンス: Link先を確認

Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen,

(参考訳) ソーシャルメディアにはマルチモーダルな皮肉(サルカズム)があふれており、テキストと画像のモダリティには直接現れない暗黙の不整合のため、皮肉の標的を特定することは特に困難である。マルチモーダルサルカズムターゲット同定(MSTI)の現在の手法は、主にエンドツーエンドの方式で表面的な指標に注目しており、テキストと画像の両方を通じて伝えられるマルチモーダルな皮肉の微妙な理解を見落としている。本稿では、推論と事前学習知識によって皮肉の説明可能性を高める、粗から細へ(coarse-to-fine)のパラダイムを持つ汎用的なMSTIフレームワークを提案する。マルチモーダル推論における大規模マルチモーダルモデル(LMM)の強力な能力に着想を得て、まずLMMを用いて競合する根拠(rationale)を生成し、マルチモーダルサルカズム検出における小規模言語モデルの粗粒度の事前学習に用いる。次に、より細粒度のサルカズムターゲット同定のためにモデルを微調整する。これにより、本フレームワークは、マルチモーダルな皮肉の中の複雑な標的を的確に明らかにし、LMMに内在しうるノイズによる悪影響を軽減できる。実験の結果、我々のモデルは最先端のMSTI手法を大きく上回り、皮肉の解読において顕著な説明可能性を示すことがわかった。

Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherently in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20

# 物理埋め込み3Dガウスによるロボット手術映像を用いた効率的なデータ駆動シーンシミュレーション

Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians ( http://arxiv.org/abs/2405.00956v2 )

ライセンス: Link先を確認

Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou,

(参考訳) 手術シーンシミュレーションは、外科教育とシミュレータに基づくロボット学習において重要な役割を担っている。手術シーンを含むこうした環境を作成する従来のアプローチは、軟体シミュレーションのために、デザイナーがテクスチャとジオメトリを備えた組織モデルを手作業で作成するという労働集約的なプロセスを伴う。この手作業によるアプローチは時間がかかるだけでなく、スケーラビリティやリアリズムにも限界がある。対照的に、データ駆動シミュレーションは魅力的な代替手段を提供する。実世界の手術映像データから3Dの手術シーンを自動的に再構築し、その後に軟体物理を適用できる可能性がある。しかし、この領域は比較的未開拓である。本研究では、ステレオ内視鏡映像から学習される、手術シーンの学習可能な表現として3Dガウシアンを導入する。これらのシーンの過学習を防ぎ、幾何学的な正しさを確保するため、深度による教師信号と異方性正則化をガウシアンの学習プロセスに組み込む。さらに、物理特性を統合したMaterial Point Methodを3Dガウシアンに適用し、現実的なシーン変形を実現する。本手法を、独自に収集した社内データセットおよび公開の手術映像データセットで評価した。その結果、内視鏡映像から手術シーンを効率的に再構築・シミュレーションでき(手術シーンの再構築に要する時間は数分のみ)、視覚的にも物理的にも妥当な変形をリアルタイムに近い速度で生成できることが示された。これらの結果は、外科教育やロボット学習で利用可能なシミュレーションの効率性と多様性を高めるうえで、提案手法に大きな可能性があることを示している。

Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scenes involve a labor-intensive process where designers hand-craft tissue models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussians as a learnable representation for surgical scenes, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our collected in-house and public surgical video datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently (taking only a few minutes to reconstruct the surgical scene) and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate the great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20

# 脚式マニピュレーションのための力制御の学習

Learning Force Control for Legged Manipulation ( http://arxiv.org/abs/2405.01402v2 )

ライセンス: Link先を確認

Tifanny Portela, Gabriel B. Margolis, Yandong Ji, Pulkit Agrawal,

(参考訳) 相互作用中の接触力の制御は、移動や操作のタスクにおいて重要である。 sim-to-real強化学習(RL)は接触の多い問題の多くで成功を収めているが、現在のRL手法は、力を明示的に制御することなく、暗黙的に力を伴う相互作用を実現している。本稿では、力センシングへのアクセスを必要とせずに、直接力制御のためのRLポリシーを訓練する方法を提案する。腕を持つ四足ロボットの全身制御プラットフォーム上で本手法を実証する。このような力制御により、重力補償とインピーダンス制御が可能になり、コンプライアントな全身マニピュレーションが実現する。可変コンプライアンスを備えた学習済みの全身コントローラにより、人間はマニピュレータに指令を与えるだけで直感的にロボットを遠隔操作でき、ロボットの胴体は所望の位置と力を達成するように自動的に調整される。その結果、人間の遠隔操作者は多様なロコマニピュレーションタスクを容易に実演できる。我々の知る限り、学習した全身力制御を脚式マニピュレータに展開したのは本研究が初めてであり、より汎用的で適応性の高い脚式ロボットへの道を開くものである。

Controlling contact forces during interactions is critical for locomotion and manipulation tasks. While sim-to-real reinforcement learning (RL) has succeeded in many contact-rich problems, current RL methods achieve forceful interactions implicitly without explicitly regulating forces. We propose a method for training RL policies for direct force control without requiring access to force sensing. We showcase our method on a whole-body control platform of a quadruped robot with an arm. Such force control enables us to perform gravity compensation and impedance control, unlocking compliant whole-body manipulation. The learned whole-body controller with variable compliance makes it intuitive for humans to teleoperate the robot by only commanding the manipulator, and the robot's body adjusts automatically to achieve the desired position and force. Consequently, a human teleoperator can easily demonstrate a wide variety of loco-manipulation tasks. To the best of our knowledge, we provide the first deployment of learned whole-body force control in legged manipulators, paving the way for more versatile and adaptable legged robots.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
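The abstract above mentions impedance control as one behaviour that force control enables. For readers unfamiliar with the term, the sketch below shows the textbook one-dimensional spring-damper impedance law, F = K (x_des - x) + D (v_des - v). It is an illustrative assumption, not the learned policy described in the paper, and the function and parameter names are hypothetical.

```python
def impedance_force(stiffness, damping, x_des, x, v_des, v):
    """Textbook 1-D impedance law: a virtual spring plus a virtual damper.

    stiffness (K) pulls the end effector toward the desired position;
    damping (D) resists deviation from the desired velocity.
    """
    return stiffness * (x_des - x) + damping * (v_des - v)


# A stiff setting tracks the target position aggressively ...
stiff = impedance_force(stiffness=100.0, damping=10.0,
                        x_des=1.0, x=0.5, v_des=0.0, v=1.0)
# ... while a compliant (low-stiffness) setting yields to the same error.
soft = impedance_force(stiffness=10.0, damping=10.0,
                       x_des=1.0, x=0.5, v_des=0.0, v=1.0)
print(stiff, soft)
```

Lowering the stiffness makes the commanded force smaller for the same position error, which is the sense in which variable compliance trades tracking accuracy for gentler contact.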

# CTD4 - 複数クリティックのカルマン融合を用いた深層連続分布型アクター・クリティックエージェント

CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics ( http://arxiv.org/abs/2405.02576v2 )

ライセンス: Link先を確認

David Valencia, Henry Williams, Trevor Gee, Bruce A MacDonald, Minas Liarokapis,

(参考訳) CDRL(Categorical Distributional Reinforcement Learning)は、従来の強化学習(RL)アプローチと比較して、複雑なタスクの学習において優れたサンプル効率を示している。しかし、CDRLの実用化は、難しい射影ステップ、細かなパラメータチューニング、ドメイン知識の必要性によって妨げられている。本稿では、連続行動空間向けの連続分布型モデルフリーRLアルゴリズムを導入することで、これらの課題に対処する。提案アルゴリズムは、クリティックが連続確率分布を出力するアクター・クリティックアーキテクチャを採用し、分布型RLの実装を単純化する。さらに、過大評価バイアスを軽減するために、カルマン融合機構によって融合される複数のクリティックのアンサンブルを提案する。一連の実験を通して、提案手法は訓練が容易であり、複雑な連続制御タスクを実行するためのサンプル効率の高い解法となることを検証した。

Categorical Distributional Reinforcement Learning (CDRL) has demonstrated superior sample efficiency in learning complex tasks compared to conventional Reinforcement Learning (RL) approaches. However, the practical application of CDRL is encumbered by challenging projection steps, detailed parameter tuning, and domain knowledge. This paper addresses these challenges by introducing a pioneering Continuous Distributional Model-Free RL algorithm tailored for continuous action spaces. The proposed algorithm simplifies the implementation of distributional RL, adopting an actor-critic architecture wherein the critic outputs a continuous probability distribution. Additionally, we propose an ensemble of multiple critics fused through a Kalman fusion mechanism to mitigate overestimation bias. Through a series of experiments, we validate that our proposed method is easy to train and serves as a sample-efficient solution for executing complex continuous-control tasks.

翻訳日:2024-05-21 22:21:29 公開日:2024-05-20
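The abstract above describes fusing several critics through a Kalman fusion mechanism but does not spell out the update. As a minimal sketch, assume each critic outputs a Gaussian (mean, variance) over the return and apply the standard scalar Kalman measurement update sequentially. The actual CTD4 formulation may differ, and `kalman_fuse` is a hypothetical name.

```python
def kalman_fuse(estimates):
    """Fuse Gaussian estimates given as (mean, variance) pairs.

    Applies the scalar Kalman update one estimate at a time:
        gain = var / (var + v_new)
        mean = mean + gain * (m_new - mean)
        var  = (1 - gain) * var
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(v <= 0 for _, v in estimates):
        raise ValueError("variances must be positive")
    mean, var = estimates[0]
    for new_mean, new_var in estimates[1:]:
        gain = var / (var + new_var)
        mean = mean + gain * (new_mean - mean)
        var = (1.0 - gain) * var
    return mean, var


# Two critics that are equally uncertain but disagree on the value.
print(kalman_fuse([(0.0, 1.0), (2.0, 1.0)]))
```

The fused mean is a precision-weighted average, so a critic reporting a low variance dominates the result, and the fused variance is never larger than the smallest input variance.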

# 長期時系列予測のための粗視化戦略によるMLPの強化

Boosting MLPs with a Coarsening Strategy for Long-Term Time Series Forecasting ( http://arxiv.org/abs/2405.03199v2 )

ライセンス: Link先を確認

Nannan Bian, Minhong Zhu, Li Chen, Weiran Cai,

(参考訳) ディープラーニング手法は、長期時系列予測においてその強みを発揮してきた。しかし、表現力と計算効率のバランスをとることに苦労することが多い。多層パーセプトロン(MLP)に頼ることは折衷的な解決策となるが、MLPは固有の点ごとの写像様式に起因する2つの重大な問題、すなわち文脈依存性の不足と不十分な情報ボトルネックを抱えている。本稿では、孤立した時点の代わりに情報グラニュールを形成することで、原型的なMLPに伴う上記の問題を緩和する粗視化戦略を特徴とするCoarsened Perceptron Network(CP-Net)を提案する。 CP-Netは主に、意味的パターンと文脈的パターンを抽出する2段階のフレームワークを用いており、より長い時間幅にわたる相関を保持しつつ、変動の激しいノイズを除去する。これは、多様な粒度のパターンを融合して包括的な予測を行うマルチスケール設定によってさらに強化される。構造的に単純な畳み込みのみに基づくCP-Netは、線形の計算複雑性と短い実行時間を維持しつつ、7つの予測ベンチマークでSOTA手法と比較して4.1%の改善を示した。

Deep learning methods have been exerting their strengths in long-term time series forecasting. However, they often struggle to strike a balance between expressive power and computational efficiency. Resorting to multi-layer perceptrons (MLPs) provides a compromise solution, yet they suffer from two critical problems caused by the intrinsic point-wise mapping mode, in terms of deficient contextual dependencies and inadequate information bottleneck. Here, we propose the Coarsened Perceptron Network (CP-Net), featuring a coarsening strategy that alleviates the above problems associated with the prototype MLPs by forming information granules in place of solitary temporal points. The CP-Net primarily utilizes a two-stage framework for extracting semantic and contextual patterns, which preserves correlations over larger timespans and filters out volatile noise. This is further enhanced by a multi-scale setting, where patterns of diverse granularities are fused towards a comprehensive prediction. Based purely on structurally simple convolutions, CP-Net is able to maintain a linear computational complexity and low runtime, while demonstrating an improvement of 4.1% compared with the SOTA method on seven forecasting benchmarks.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# Retinexmamba:低照度画像強調のためのRetinex-based Mamba

Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement ( http://arxiv.org/abs/2405.03349v2 )

ライセンス: Link先を確認

Jiesong Bai, Yuhao Yin, Qiyuan He, Yuanxian Li, Xiaofeng Zhang,

(参考訳) 低照度画像強調の分野では、従来のRetinex法とRetinexformerのような高度なディープラーニング技術の両方が、明確な利点と限界を示している。従来のレチネックス法は、人間の目の明度と色彩の知覚を模倣するために設計され、画像を照明と反射成分に分解するが、低照度条件下でのノイズ管理と詳細な保存に苦労する。 Retinexformerは、従来の自己認識機構を通じて照明推定を強化するが、解釈容易性や準最適強調効果が不十分な課題に直面している。これらの制約を克服するために,RetinexMambaアーキテクチャを提案する。 RetinexMambaは従来のRetinexメソッドの物理的直感性を捉えるだけでなく、Retinexformerのディープラーニングフレームワークを統合し、ステートスペースモデル(SSM)の計算効率を活用して処理速度を向上させる。このアーキテクチャは、イノベーティブな照明推定器と、エンハンスメント中の画質を維持する損傷回復機構を備えている。さらに、RetinexMambaはRetinexformerのIG-MSA(Illumination-Guided Multi-Head Attention)をFused-Attentionメカニズムで置き換え、モデルの解釈性を向上させる。 LOLデータセットの実験的評価により、RetinexMambaは、Retinex理論に基づく既存のディープラーニングアプローチを定量的および定性的メトリクスの両方で上回り、低照度画像の強化におけるその有効性と優位性を確認した。

In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preservation under low light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms, but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to enhance processing speed. This architecture features innovative illumination estimators and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-Head Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing deep learning approaches based on Retinex theory in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# 部分指紋のための本人確認と姿勢アライメントの同時実行

Joint Identity Verification and Pose Alignment for Partial Fingerprints ( http://arxiv.org/abs/2405.03959v2 )

ライセンス: Link先を確認

Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou,

(参考訳) 現在、携帯型電子機器はますます普及している。軽量化のため、指紋認識モジュールには通常、サイズの限られたセンサーが使用される。しかし、部分指紋は照合可能な特徴が少なく、特に指の押し付け姿勢や画像品質に違いがある場合には、部分指紋の照合は困難になる。既存の手法の多くは、指紋の位置補正と本人確認を独立したタスクとみなし、両者の結合関係を無視している。すなわち、相対姿勢推定は通常、対応づけられた特徴をアンカーとして利用し、認証精度は姿勢アライメントが正確になるほど向上する傾向にある。そこで本稿では、両者に内在する相関を活用して相互に改善することを目指し、部分指紋ペアの本人確認と姿勢アライメントを同時に行う手法を提案する。これを実現するために、マルチタスクのCNN(畳み込みニューラルネットワーク)-Transformerハイブリッドネットワークを提案し、特徴抽出能力を高めるための事前学習タスクを設計する。複数の公開データセット(NIST SD14, FVC2002 DB1A & DB3A, FVC2004 DB1A & DB2A, FVC2006 DB1A)および社内データセットを用いた実験により、本手法は部分指紋の照合と相対姿勢推定の両方で最先端の性能を達成し、かつ従来手法よりも効率的であることが示された。

Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods regard fingerprint position rectification and identity verification as independent tasks, ignoring the coupling relationship between them -- relative pose estimation typically relies on paired features as anchors, and authentication accuracy tends to improve with more precise pose alignment. Consequently, in this paper we propose a method for joint identity verification and pose alignment of partial fingerprint pairs, aiming to leverage their inherent correlation to improve each other. To achieve this, we propose a multi-task CNN (Convolutional Neural Network)-Transformer hybrid network, and design a pre-training task to enhance the feature extraction capability. Experiments on multiple public datasets (NIST SD14, FVC2002 DB1A & DB3A, FVC2004 DB1A & DB2A, FVC2006 DB1A) and an in-house dataset show that our method achieves state-of-the-art performance in both partial fingerprint verification and relative pose estimation, while being more efficient than previous methods.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# TrimCaching: 無線エッジネットワークにおけるパラメータ共有AIモデルキャッシュ

TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks ( http://arxiv.org/abs/2405.03990v2 )

ライセンス: Link先を確認

Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang,

(参考訳) 次世代のモバイルネットワークは、エンドユーザへの高速なAIモデルダウンロードを容易にすることが期待されている。エッジサーバにモデルをキャッシュすることで、モバイルネットワークは低レイテンシでエンドユーザにモデルを配信することが可能になる。本稿では,パラメータ共有モデルキャッシング(TrimCaching)と呼ばれる新しいモデル配置手法を提案する。 TrimCachingは、畳み込みニューラルネットワークや大規模言語モデルといった幅広いAIモデルが、再利用可能な知識を含むパラメータブロックのかなりの割合を共有できるため、ストレージ効率が向上する、という重要な観察を活用する。この目的のために、ストレージ効率とサービスレイテンシの基本的なトレードオフをバランスさせて、パラメータ共有モデル配置問題を定式化し、マルチエッジ無線ネットワークにおけるキャッシュヒット率を最大化する。定式化問題は、多項式時間近似アルゴリズムが存在しない部分モジュラー制約を持つ部分モジュラー最大化問題であることを示す。この課題を克服するために、モデル間で少数のパラメータブロックが共有される重要なケースについて検討する。そのような場合、$\left(1-\epsilon\right)/2$-approximationが保証される多項式時間アルゴリズムを開発する。その後、グリーディアルゴリズムを考案し、一般事例の原問題に対処する。シミュレーションの結果,提案したTrimCachingフレームワークは,AIモデルで共有パラメータを利用することなく,最先端のコンテンツキャッシュと比較してキャッシュヒット率を大幅に向上することが示された。

Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-\epsilon\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20
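The TrimCaching abstract above rests on the observation that models sharing parameter blocks can be cached together more cheaply, and it mentions a greedy algorithm for the general case. The sketch below is a heavily simplified illustration of that idea, not the paper's algorithm. It assumes a single edge server, ignores latency, and uses a hypothetical function `greedy_placement` that repeatedly picks the model with the highest request popularity per unit of additional storage, counting each shared block only once.

```python
def greedy_placement(models, block_size, popularity, capacity):
    """Greedy single-server model placement with shared parameter blocks.

    models:     dict model name -> set of parameter block ids
    block_size: dict block id -> storage size
    popularity: dict model name -> request probability
    capacity:   total storage of the (single, assumed) edge server
    Returns the chosen model names in pick order and the storage used.
    """
    cached_blocks = set()
    chosen = []
    used = 0
    remaining = set(models)
    while remaining:
        best_model, best_score, best_extra = None, -1.0, 0
        for name in sorted(remaining):  # sorted for deterministic ties
            # Only blocks not already cached cost extra storage.
            extra = sum(block_size[b] for b in models[name] - cached_blocks)
            if used + extra > capacity:
                continue
            score = popularity[name] / extra if extra > 0 else float("inf")
            if score > best_score:
                best_model, best_score, best_extra = name, score, extra
        if best_model is None:
            break  # nothing else fits
        chosen.append(best_model)
        cached_blocks |= models[best_model]
        used += best_extra
        remaining.remove(best_model)
    return chosen, used


models = {"A": {"shared", "a"}, "B": {"shared", "b"}, "C": {"c"}}
block_size = {"shared": 4, "a": 1, "b": 1, "c": 5}
popularity = {"A": 0.4, "B": 0.3, "C": 0.3}
print(greedy_placement(models, block_size, popularity, capacity=6))
```

In this toy instance every model needs 5 units on its own, so a cache of 6 units could hold only one model without sharing. Because A and B share a 4-unit block, both fit once the shared block is counted a single time.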

# Diff-IP2D:Egocentric Videoにおける拡散に基づく手動物体の相互作用予測

Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos ( http://arxiv.org/abs/2405.04370v2 )

ライセンス: Link先を確認

Junyi Ma, Jingyi Xu, Xieyuanli Chen, Hesheng Wang,

(参考訳) サービスロボットのマニピュレーションやエクステンデッドリアリティへの応用には、手と物体の相互作用において人間がどのように振る舞うかを理解することが不可欠である。これを実現するために、人間の一人称視点(エゴセントリック)映像上で手の軌跡と物体アフォーダンスを同時に予測する研究が近年提案されている。この同時予測は、2次元空間における将来の手-物体相互作用の包括的な表現として機能し、人間の潜在的な動きと意図を示す。しかし、既存のアプローチの多くは一方向予測のための自己回帰的パラダイムを採用しており、将来の系列全体の中での相互制約を欠き、時間軸に沿って誤差が蓄積する。また、これらの研究は、カメラのエゴモーションが一人称視点の予測に与える影響を基本的に見落としている。これらの制約に対処するために、Diff-IP2Dという新しい拡散ベースの相互作用予測手法を提案し、将来の手の軌跡と物体アフォーダンスを反復的かつ非自己回帰的に同時予測する。連続する2次元画像を潜在特徴空間へ変換し、過去の潜在相互作用特徴を条件として将来の潜在相互作用特徴を予測するデノイジング拡散モデルを設計する。さらに、条件付きデノイジング過程に動き特徴を統合することで、Diff-IP2Dがカメラ装着者の動きを把握できるようにし、より正確な相互作用予測を可能にする。大規模な実験により、本手法は既存の評価指標と新たに提案した評価プロトコルの両方において、最先端のベースラインを大幅に上回ることが示された。このことは、2次元の手-物体相互作用予測に生成的パラダイムを活用することの有効性を示している。 Diff-IP2Dのコードはhttps://github.com/IRMVLab/Diff-IP2Dで公開される予定である。

Understanding how humans would behave during hand-object interaction is vital for applications in service robot manipulation and extended reality. To achieve this, some recent works have been proposed to simultaneously forecast hand trajectories and object affordances on human egocentric videos. The joint prediction serves as a comprehensive representation of future hand-object interactions in 2D space, indicating potential human motion and motivation. However, the existing approaches mostly adopt the autoregressive paradigm for unidirectional prediction, which lacks mutual constraints within the holistic future sequence, and accumulates errors along the time axis. Meanwhile, these works basically overlook the effect of camera egomotion on first-person view predictions. To address these limitations, we propose a novel diffusion-based interaction prediction method, namely Diff-IP2D, to forecast future hand trajectories and object affordances concurrently in an iterative non-autoregressive manner. We transform the sequential 2D images into latent feature space and design a denoising diffusion model to predict future latent interaction features conditioned on past ones. Motion features are further integrated into the conditional denoising process to make Diff-IP2D aware of the camera wearer's dynamics for more accurate interaction prediction. Extensive experiments demonstrate that our method significantly outperforms the state-of-the-art baselines on both the off-the-shelf metrics and our newly proposed evaluation protocol. This highlights the efficacy of leveraging a generative paradigm for 2D hand-object interaction prediction. The code of Diff-IP2D will be released at https://github.com/IRMVLab/Diff-IP2D.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# 偏極トポロジカル電荷による高次トポロジーの解明

Unveiling Higher-Order Topology via Polarized Topological Charges ( http://arxiv.org/abs/2405.05505v2 )

ライセンス: Link先を確認

Wei Jia, Bao-Zong Wang, Ming-Jian Gao, Jun-Hong An,

(参考訳) 実空間トポロジカル不変量は、カイラル対称な高次トポロジカル相(HOTP)を特徴づけるために広く用いられてきた。しかし、これらのHOTPに固有のバルク-境界対応を本質的に明らかにし、量子シミュレーション系での検出を容易にする運動量空間での特徴づけは、いまだ欠けている。ここでは、偏極トポロジカル電荷の概念を用いて、カイラル対称HOTPに対する実験的に観測可能な運動量空間での特徴づけを提案する。これは、バルク状態だけでなくエッジ状態のバンドギャップの閉鎖と再開によって生じるトポロジカル相転移を統一的に記述する。注目すべきことに、これらの偏極トポロジカル電荷は擬スピン構造を測定することで同定できる。 $^{87}$Rb冷却原子系においてHOTPを検出するための実現可能な方式を示す。本研究は、運動量空間におけるカイラル対称HOTPの特徴づけと実験的検出への道を開く。

Real-space topological invariants were widely used to characterize chiral-symmetric higher-order topological phases (HOTPs). However, a momentum-space characterization to these HOTPs, which essentially reveals their intrinsic bulk-boundary correspondence and facilitates their detection in quantum simulation systems, is still lacking. Here, we propose an experimentally observable momentum-space characterization to the chiral-symmetric HOTPs by the concept of polarized topological charges. It provides a unified description to topological phase transitions caused by the closing and reopening of band gap not only of the bulk states but also the edge states. Remarkably, these polarized topological charges can be identified by measuring the pseudospin structures. A feasible scheme to detect the HOTPs in the $^{87}$Rb cold atomic system is given. Our work opens an avenue for characterization and experimental detection of the chiral-symmetric HOTPs in momentum space.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# 特殊文字攻撃:大規模言語モデルからのスケーラブルなトレーニングデータ抽出を目指して

Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models ( http://arxiv.org/abs/2405.05990v2 )

ライセンス: Link先を確認

Yang Bai, Ge Pei, Jindong Gu, Yong Yang, Xingjun Ma,

(参考訳) 大規模言語モデル(LLM)は、幅広いタスクにおいて顕著な性能を達成している。しかし、最近の研究では、LLMがトレーニングデータを記憶しうること、そして単純なトークンの繰り返しによってモデルを騙しデータを漏洩させられることが示されている。本稿ではさらに一歩進めて、特定の特殊文字、またはそれらと英字の組み合わせがより強力なメモリトリガとなり、より深刻なデータ漏洩を引き起こすことを示す。直感的には、LLMは大量の特殊文字(例えばJSONファイルの構造記号 {, } や、メールやオンライン投稿に含まれる @, #)を含む膨大なデータで訓練されているため、これらの特殊文字と原文の共起をモデルが記憶している可能性がある。これを動機として、トレーニングデータの漏洩を誘発する、単純だが効果的な特殊文字攻撃(SCA)を提案する。実験により、最先端のLLMに対するSCAの高い有効性が確認された。すなわち、コードコーパス、Webページ、個人を特定できる情報などの多様なトレーニングデータが漏洩し、副産物として出力が止まらなくなる場合もある。さらに、漏洩したデータを調べることで、高性能LLMの事前学習にとって重要な情報であるトレーニングデータコーパスの構成を明らかにできることを示す。我々の研究は、LLMの特殊文字に対する感受性の理解と、改善の余地がある領域の特定に役立つ。

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks. However, recent studies have shown that LLMs can memorize training data and simple repeated tokens can trick the model to leak the data. In this paper, we take a step further and show that certain special characters or their combinations with English letters are stronger memory triggers, leading to more severe data leakage. The intuition is that, since LLMs are trained with massive data that contains a substantial amount of special characters (e.g. structural symbols {, } of JSON files, and @, # in emails and online posts), the model may memorize the co-occurrence between these special characters and the raw texts. This motivates us to propose a simple but effective Special Characters Attack (SCA) to induce training data leakage. Our experiments verify the high effectiveness of SCA against state-of-the-art LLMs: they can leak diverse training data, such as code corpus, web pages, and personally identifiable information, and sometimes generate non-stop outputs as a byproduct. We further show that the composition of the training data corpus can be revealed by inspecting the leaked data -- one crucial piece of information for pre-training high-performance LLMs. Our work can help understand the sensitivity of LLMs to special characters and identify potential areas for improvement.

翻訳日:2024-05-21 20:25:40 公開日:2024-05-20

# LangCell: 細胞アイデンティティ理解のためのLanguage-Cell事前トレーニング

LangCell: Language-Cell Pre-training for Cell Identity Understanding ( http://arxiv.org/abs/2405.06708v2 )

ライセンス: Link先を確認

Suyuan Zhao, Jiahuan Zhang, Yizhen Luo, Yushuai Wu, Zaiqing Nie,

(参考訳) 細胞アイデンティティは、細胞型、パスウェイ情報、疾患情報など、細胞のさまざまな意味的側面を包含しており、生物学者が細胞の生物学的特性を理解するうえで不可欠である。細胞型のアノテーションなど、トランスクリプトームデータから細胞アイデンティティを理解することは、バイオインフォマティクスにおける重要な課題となっている。これらの意味的側面は人間の専門家によって決定されるため、単一細胞とラベルのペアによる教師信号なしに、AIモデルが細胞アイデンティティ理解タスクを効果的に実行することは不可能である。このタスクに現在用いられている単一細胞事前学習言語モデル(PLM)は、トランスクリプトミクスデータという単一のモダリティのみで訓練されており、細胞アイデンティティに関する知識の理解を欠いている。その結果、下流タスクのために微調整する必要があり、望ましい意味ラベルが付いたラベル付きデータがない場合にはうまく機能しない。この問題に対処するために、事前学習段階で単一細胞データと自然言語の統一表現を構築し、細胞アイデンティティに関する知見をモデルが直接取り込めるようにする革新的な手法を提案する。具体的には、最初のLanguage-Cell事前学習フレームワークであるLangCellを紹介する。 LangCellは、細胞アイデンティティ情報が豊富なテキストを利用して、クロスモーダルな知識を深く理解する。さまざまなベンチマークで実施した実験の結果、LangCellはゼロショットの細胞アイデンティティ理解シナリオで効果的に機能する唯一の単一細胞PLMであり、few-shotおよび微調整のシナリオでも既存モデルを大幅に上回ることが示された。

Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, has become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, and lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.

翻訳日:2024-05-21 20:15:46 公開日:2024-05-20

# ゲノム規模メタボリックネットワークモデルにおける遺伝子機能の能動的学習のためのブール行列論理プログラミング

Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models ( http://arxiv.org/abs/2405.06724v2 )

ライセンス: Link先を確認

Lun Ai, Stephen H. Muggleton, Shi-Shun Liang, Geoff S. Baldwin,

(参考訳) 研究を自律的に推進する技術はComputational Scientific Discoveryにおいて注目されてきた。一方、Synthetic Biologyは、有用な目的のために新しい生物学的システムを設計・構築することに焦点を当てた科学分野である。ここでは、論理ベースの機械学習技術を適用して、細胞工学を促進し、生物学的発見を推進することを目指す。ゲノムスケールメタボリックネットワークモデル (GEMs) と呼ばれる代謝過程の包括的データベースは、標的化合物の生産を最適化するための細胞工学的戦略の評価にしばしば使用される。しかしながら、予測されるホストの振る舞いは、多くの場合モデル内の誤りのために、常にGEMによって正しく記述されるわけではない。 GEM内の複雑な遺伝的相互作用を学習するタスクには、計算面および実験面の課題がある。これらの問題に対処するために、ブール行列を利用して大規模な論理プログラムを評価する、Boolean Matrix Logic Programming (BMLP) と呼ばれる新しい手法について述べる。能動学習を通じて情報量の多い実験を導くことにより、ゲノム仮説空間を効率的に探索する新しいシステム$BMLP_{active}$を導入する。サブシンボリックな方法とは対照的に、$BMLP_{active}$は、広く受け入れられている細菌ホストの最先端のGEMを、datalog論理プログラムを用いて解釈可能で論理的な表現で符号化する。特に、$BMLP_{active}$は、ランダムな実験よりも少ない訓練例で遺伝子ペア間の相互作用をうまく学習でき、実験設計空間の増大を克服できる。 $BMLP_{active}$は、代謝モデルの迅速な最適化を可能にし、有用な化合物を生産する生物学的システムを確実に設計できるようにする。これは、微生物工学のための自動運転ラボを実現するための現実的なアプローチを提供する。

Techniques to autonomously drive research have been prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, predicted host behaviours are not always correctly described by GEMs, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP) by leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.

翻訳日:2024-05-21 20:15:46 公開日:2024-05-20

# オープンセットデータを賢く活用したロバスト半教師付き学習

Robust Semi-supervised Learning by Wisely Leveraging Open-set Data ( http://arxiv.org/abs/2405.06979v2 )

ライセンス: Link先を確認

Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan,

(参考訳) Open-set Semi-supervised Learning (OSSL) は、ラベルなしデータがラベル付きデータセットに現れないクラス、すなわちOOD(out-of-distribution)データから来る可能性があるという現実的な設定を扱い、これは従来のSSLモデルの性能劣化を引き起こす可能性がある。この問題に対処するため、既存のOSSLアプローチの一部は、従来のID(in-distribution)分類器に加えて、OODデータの潜在的な負の影響を避けるために追加のOOD検出モジュールを使用している。それにもかかわらず、これらのアプローチはトレーニングプロセス中に一般的にオープンセットデータの集合全体を使用するが、そこにはモデル性能に悪影響を及ぼしうる、OSSLタスクにとって不都合なデータが含まれる可能性がある。このことは、OSSLのための堅牢なオープンセットデータ選択戦略を開発する動機となる。学習理論の観点からの理論的理解を通じて,モデルの学習にオープンセットデータを選択的に活用する汎用的なOSSLフレームワークであるWise Open-set Semi-supervised Learning (WiseOpen)を提案する。勾配分散に基づく選択機構を適用することで、WiseOpenは、オープンセットデータセット全体ではなく、フレンドリなサブセットを利用して、モデルのID分類能力を向上する。また,計算コストを削減するために,低頻度更新と損失ベース選択をそれぞれ採用した,WiseOpenの実用的な2つのバリエーションを提案する。大規模な実験は、最先端技術と比較してWiseOpenの有効性を実証している。

Open-set Semi-supervised Learning (OSSL) considers a realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.

翻訳日:2024-05-21 20:15:46 公開日:2024-05-20

# 深層学習に基づくオブジェクトポース推定 : 総合的な調査

Deep Learning-Based Object Pose Estimation: A Comprehensive Survey ( http://arxiv.org/abs/2405.07801v2 )

ライセンス: Link先を確認

Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian,

(参考訳) オブジェクトポーズ推定は、拡張現実やロボット工学の幅広い応用において、基本的なコンピュータビジョン問題である。過去10年間で、より優れた精度と堅牢性のために、ディープラーニングモデルは、エンジニアリングされたポイントペア機能に依存する従来のアルゴリズムに取って代わる傾向にある。それでも、ラベル付きトレーニングデータへの依存、モデルコンパクト性、挑戦条件下での堅牢性、新しい未知のオブジェクトに一般化する能力など、現代の手法ではいくつかの課題が続いている。この分野のさまざまな側面、卓越した課題、将来有望な方向性に関する最近の調査は欠落している。このギャップを埋めるために、ディープラーニングに基づくオブジェクトポーズ推定の最近の進歩について論じ、問題の3つの定式化、すなわち、インスタンスレベル、カテゴリレベル、見えないオブジェクトポーズ推定を網羅する。また、複数の入力データモダリティ、出力ポーズの度合い、オブジェクト特性、下流タスクについても調査を行い、この分野の全体的理解を読者に提供する。さらに、異なるドメイン、推論モード、アプリケーション領域、評価指標、ベンチマークデータセットのトレーニングパラダイムや、これらのベンチマークにおける現在の最先端メソッドのパフォーマンスを報告し、読者がアプリケーションに最も適したメソッドを選択するのを容易にする。最後に、調査は鍵となる課題を特定し、その長所と短所と共に傾向をレビューし、将来の研究の有望な方向性を特定する。また、最新の作業をhttps://github.com/CNJianLiu/Awesome-Object-Pose-Estimationで追跡しています。

Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependency on labeled training data, model compactness, robustness under challenging conditions, and their ability to generalize to novel unseen objects. A recent survey that discusses the progress made on different aspects of this area, outstanding challenges, and promising future directions is missing. To fill this gap, we discuss the recent advances in deep learning-based object pose estimation, covering all three formulations of the problem, i.e., instance-level, category-level, and unseen object pose estimation. Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks, providing readers with a holistic understanding of this field. Additionally, it discusses training paradigms of different domains, inference modes, application areas, evaluation metrics, and benchmark datasets, and reports the performance of current state-of-the-art methods on these benchmarks, thereby helping readers select the most suitable method for their application. Finally, the survey identifies key challenges, reviews prevailing trends along with their pros and cons, and identifies promising directions for future research. We also keep track of the latest work at https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation.

翻訳日:2024-05-21 20:15:46 公開日:2024-05-20

# MambaOut: ビジョンにMambaは本当に必要か?

MambaOut: Do We Really Need Mamba for Vision? ( http://arxiv.org/abs/2405.07992v3 )

ライセンス: Link先を確認

Weihao Yu, Xinchao Wang,

(参考訳) 状態空間モデル(SSM)のRNNライクなトークンミキサーを備えたアーキテクチャであるMambaが最近導入され、注意機構の2次複雑さに対処し、その後視覚タスクに適用された。それでも、視覚に対するMambaのパフォーマンスは、畳み込みモデルや注目ベースのモデルと比較すると、しばしば期待外れである。本稿では,Mambaの本質を探求し,Mambaは長いシーケンスと自己回帰的特徴を有するタスクに理想的に適していると概念的に結論づける。視覚タスクの場合、画像分類はどちらの特徴とも一致しないため、このタスクにはMambaは必要ない、という仮説を立てる。検出とセグメンテーションのタスクも自己回帰的ではないが、長いシーケンスの特徴には合致するため、これらのタスクに対するMambaの可能性を探求する価値は依然としてあると考える。仮説を実証的に検証するために,Mambaブロックを積み重ねつつコアトークンミキサーであるSSMを除去し,MambaOutという一連のモデルを構築した。実験結果は仮説を強く支持する。具体的には、ImageNet画像分類において、我々のMambaOutモデルはすべての視覚的Mambaモデルを上回っており、このタスクにはMambaが本当に不要であることを示している。検出とセグメンテーションに関しては、MambaOutは最先端のビジュアルMambaモデルの性能に及ばず、長いシーケンスの視覚タスクに対するMambaの可能性を示す。コードはhttps://github.com/yuweihao/MambaOutで入手できる。

Mamba, an architecture with an RNN-like token mixer of state space model (SSM), was recently introduced to address the quadratic complexity of the attention mechanism and subsequently applied to vision tasks. Nevertheless, the performance of Mamba for vision is often underwhelming when compared with convolutional and attention-based models. In this paper, we delve into the essence of Mamba, and conceptually conclude that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics. For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task. Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks. To empirically verify our hypotheses, we construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM. Experimental results strongly support our hypotheses. Specifically, our MambaOut model surpasses all visual Mamba models on ImageNet image classification, indicating that Mamba is indeed unnecessary for this task. As for detection and segmentation, MambaOut cannot match the performance of state-of-the-art visual Mamba models, demonstrating the potential of Mamba for long-sequence visual tasks. The code is available at https://github.com/yuweihao/MambaOut

翻訳日:2024-05-21 20:15:46 公開日:2024-05-20

# Googleの保護されたオーディエンスプロトコルの評価

Evaluating Google's Protected Audience Protocol ( http://arxiv.org/abs/2405.08102v2 )

ライセンス: Link先を確認

Minjun Long, David Evans,

(参考訳) サードパーティのクッキーは、長年にわたりデジタルマーケティングのエコシステムの重要な要素であったが、ユーザをWebサイトをまたいで追跡することを可能にし、深刻なプライバシーの懸念を引き起こしている。 Googleは、サードパーティのクッキーを使わずに広告ターゲティングを可能にする、Privacy Sandboxイニシアチブを提案した。このイニシアチブの他の側面に焦点をあてた研究はいくつかあるが、リクエストのリンクを防止するという意図された目的をシステムがいかにうまく達成するかについては、これまでほとんど分析されていない。本研究は,サードパーティのクッキーを使わずにオンラインリマーケティングを可能にすることを目的としたProtected Audience (PrAu)提案(以前はFLEDGEと呼ばれていた)で提案されている報告メカニズムのリンクに関するプライバシーリスクの分析に焦点をあてる。 PrAuの全体的なワークフローを要約し、敵が異なるサイトへのリクエストを同じユーザにリンクしようとするシナリオに焦点を当てて、提案された設計に関連する潜在的なプライバシーリスクを強調する。我々は、現在提案されているすべてのプライバシーメカニズムが正しく実装されていたとしても、現実的な敵が、プライバシー保護された報告メカニズムを使ってユーザーのリクエストをリンクし、大量監視を行うことが依然として可能であることを示す。

While third-party cookies have been a key component of the digital marketing ecosystem for years, they allow users to be tracked across web sites in ways that raise serious privacy concerns. Google has proposed the Privacy Sandbox initiative to enable ad targeting without third-party cookies. While there have been several studies focused on other aspects of this initiative, there has been little analysis to date as to how well the system achieves the intended goal of preventing request linking. This work focuses on analyzing linkage privacy risks for the reporting mechanisms proposed in the Protected Audience (PrAu) proposal (previously known as FLEDGE), which is intended to enable online remarketing without using third-party cookies. We summarize the overall workflow of PrAu and highlight potential privacy risks associated with its proposed design, focusing on scenarios in which adversaries attempt to link requests to different sites to the same user. We show how a realistic adversary would still be able to use the privacy-protected reporting mechanisms to link user requests and conduct mass surveillance, even with correct implementations of all the currently proposed privacy mechanisms.

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# 差分プライベートなフェデレーション学習: 系統的レビュー

Differentially Private Federated Learning: A Systematic Review ( http://arxiv.org/abs/2405.08299v3 )

ライセンス: Link先を確認

Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang Cao,

(参考訳) 近年、機械学習におけるプライバシとセキュリティの懸念が、信頼できるフェデレーション学習を研究の最前線に押し上げている。微分プライバシーは、厳格な数学的基盤と証明可能な保証のために、連邦学習におけるプライバシー保護の事実上の標準として登場した。差分プライバシーをフェデレート学習に組み込むアルゴリズムに関する広範な研究にもかかわらず、これらの研究を分類し、合成する体系的なレビューには明らかな欠陥がある。我々の研究は、差分的にプライベートなフェデレーション学習の体系的な概要を提示する。既存の分類学は、連邦学習において様々な差分プライバシーモデルによって提供される対象やプライバシー保護のレベルを十分に考慮していない。このギャップを是正するために,様々な異なるプライバシモデルとフェデレーションシナリオの定義と保証に基づく,微分プライベートなフェデレーション学習の新しい分類法を提案する。我々の分類では、保護対象を様々な差分プライバシモデルと、フェデレートされた学習環境内のそれぞれの近隣レベルにわたって明確に記述することができる。さらに,フェデレート学習シナリオにおける差分プライバシーの適用について検討する。本研究は,プライバシ保護フェデレーション学習に関する貴重な知見を提供し,今後の研究に向けた実践的方向性を提案する。

In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident deficiency in systematic reviews that categorize and synthesize these studies. Our work presents a systematic overview of differentially private federated learning. Existing taxonomies have not adequately considered the objects and levels of privacy protection provided by various differential privacy models in federated learning. To rectify this gap, we propose a new taxonomy of differentially private federated learning based on the definitions and guarantees of various differential privacy models and federated scenarios. Our classification allows for a clear delineation of the protected objects across various differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our work provides valuable insights into privacy-preserving federated learning and suggests practical directions for future research.
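To make the notion of a "protected object" concrete, here is a minimal sketch of one widely used pattern: client-level protection via clipping and Gaussian noise at the server. It is an illustration under stated assumptions, not a method taken from the survey; the function names, the `clip_norm` and `noise_multiplier` parameters, and the toy updates are all hypothetical, and no privacy accounting is performed.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a client's update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]

def dp_aggregate(client_updates, clip_norm, noise_multiplier, rng):
    """Average clipped updates after adding Gaussian noise to their sum.

    Assumption: noise standard deviation is noise_multiplier * clip_norm,
    i.e. calibrated to the influence one whole client can have on the sum.
    """
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    n_clients = len(clipped)
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm
    summed = [sum(u[d] for u in clipped) for d in range(dim)]
    return [(s + rng.gauss(0.0, sigma)) / n_clients for s in summed]

# Hypothetical round with two clients and a two-dimensional model update.
rng = random.Random(0)
updates = [[3.0, 4.0], [0.0, 0.0]]
noisy_mean = dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Here the neighborhood level is a whole client: the clipping bound limits how much any one client can move the sum. Sample-level protection would instead clip and add noise per training example on the client side.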

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# 解釈性と制御のためのスパースオートエンコーダの原理的評価に向けて

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control ( http://arxiv.org/abs/2405.08366v3 )

ライセンス: Link先を確認

Aleksandar Makelov, George Lange, Neel Nanda,

(参考訳) モデルアクティベーションを意味のある特徴に分解することは、解釈可能性の中心的な問題である。しかし、現実的なシナリオにおけるこれらの特徴に対する真値(ground truth)の欠如は、スパース辞書学習のような近年のアプローチの検証を困難にしている。この課題に対処するために,教師付き特徴辞書と比較することで,特定のタスクの文脈における特徴辞書を評価するためのフレームワークを提案する。まず,教師付き辞書は,タスクにおけるモデル計算の近似,制御,解釈性に優れることを示す。第2に、教師付き辞書を用いて、同じ3つの軸に沿った教師なし辞書の評価を開発し、文脈づける。 GPT-2 Small を用いた間接オブジェクト識別(IOI)タスクにこのフレームワークを適用し,IOI または OpenWebText のデータセットで訓練したスパースオートエンコーダ (SAE) を用いた。これらのSAEは、IOIタスクの解釈可能な特徴をキャプチャするが、モデル制御においては教師付き特徴ほど成功していない。最後に,SAEトレーニングにおける2つの定性的な現象を観察する:特徴の隠蔽(feature occlusion,因果的に関連する概念が,学習された特徴において,わずかに大きさの大きい概念によって頑健に覆い隠される)と特徴の過分割(feature over-splitting,二値特徴が多数のより小さく解釈しにくい特徴に分裂する)である。我々は,より客観的かつ基礎的なスパース辞書学習手法の評価に向けて,我々のフレームワークが有用なステップを提供することを期待している。

Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary learning, elusive. To address this challenge, we propose a framework for evaluating feature dictionaries in the context of specific tasks, by comparing them against supervised feature dictionaries. First, we demonstrate that supervised dictionaries achieve excellent approximation, control, and interpretability of model computations on the task. Second, we use the supervised dictionaries to develop and contextualize evaluations of unsupervised dictionaries along the same three axes. We apply this framework to the indirect object identification (IOI) task using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets. We find that these SAEs capture interpretable features for the IOI task, but they are less successful than supervised features in controlling the model. Finally, we observe two qualitative phenomena in SAE training: feature occlusion (where a causally relevant concept is robustly overshadowed by even slightly higher-magnitude ones in the learned features), and feature over-splitting (where binary features split into many smaller, less interpretable features). We hope that our framework will provide a useful step towards more objective and grounded evaluations of sparse dictionary learning methods.

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# Reinformer:オフラインRLの最大戻りシーケンスモデリング

Reinformer: Max-Return Sequence Modeling for Offline RL ( http://arxiv.org/abs/2405.08740v2 )

ライセンス: Link先を確認

Zifeng Zhuang, Dengyun Peng, Jinxin Liu, Ziqi Zhang, Donglin Wang,

(参考訳) データ駆動型パラダイムとして、オフライン強化学習(RL)は、リターン、ゴール、将来の軌道などの後知恵情報を条件とするシーケンスモデリングとして定式化されてきた。有望ではあるが、この教師付きパラダイムはリターンを最大化するというRLの中核的な目的を見落としている。この見落としは、準最適データから学習するシーケンスモデルに影響を与える軌道縫合能力の欠如に直接繋がる。そこで本研究では,リターンの最大化という目標を既存のシーケンスモデルに組み込む,最大リターンシーケンスモデリングの概念を導入する。本稿では,シーケンスモデルがRLの目的によって強化されていることを示すReinforced Transformer(Reinformer)を提案する。 Reinformerは、トレーニングフェーズにおいてリターンの最大化という目的を追加で取り入れており、分布内での最大の将来リターンを予測することを目的としている。推論中、この分布内最大リターンが最適なアクションの選択を導く。実証的には、ReinformerはD4RLベンチマークで古典的なRL手法と競合し、特に軌道縫合能力において最先端のシーケンスモデルより優れている。コードは https://github.com/Dragon-Zhuang/Reinformer で公開されている。

As a data-driven paradigm, offline reinforcement learning (RL) has been formulated as sequence modeling that conditions on hindsight information including returns, goals, or future trajectories. Although promising, this supervised paradigm overlooks the core objective of RL, which is to maximize the return. This oversight directly leads to the lack of trajectory stitching capability that affects the sequence model learning from sub-optimal data. In this work, we introduce the concept of max-return sequence modeling, which integrates the goal of maximizing returns into existing sequence models. We propose Reinforced Transformer (Reinformer), indicating the sequence model is reinforced by the RL objective. Reinformer additionally incorporates the objective of maximizing returns in the training phase, aiming to predict the maximum future return within the distribution. During inference, this in-distribution maximum return will guide the selection of optimal actions. Empirically, Reinformer is competitive with classical RL methods on the D4RL benchmark and outperforms state-of-the-art sequence models, particularly in trajectory stitching ability. Code is public at https://github.com/Dragon-Zhuang/Reinformer.
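Two ingredients behind this description can be sketched in a few lines. Returns-to-go is the standard quantity that return-conditioned sequence models condition on. The asymmetric (expectile-style) loss is only an assumption here: it is one common way to push a prediction towards the largest return seen in the data, and the abstract does not state which loss Reinformer uses. All names and numbers below are hypothetical.

```python
def returns_to_go(rewards, gamma=1.0):
    """Sum of (discounted) future rewards from each timestep onwards."""
    result, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        result.append(running)
    return result[::-1]

def expectile_loss(pred, target, tau):
    """Asymmetric squared error: under-prediction is weighted by tau,
    over-prediction by (1 - tau). With tau near 1 the minimiser moves
    towards the largest targets in the data."""
    weight = tau if target > pred else 1.0 - tau
    return weight * (target - pred) ** 2

def fit_constant(targets, tau, candidates):
    """Pick the candidate constant with the lowest total loss."""
    return min(
        candidates,
        key=lambda c: sum(expectile_loss(c, t, tau) for t in targets),
    )

rtg = returns_to_go([1.0, 0.0, 2.0])

# Hypothetical returns observed from one state in an offline dataset.
observed = [1.0, 2.0, 10.0]
symmetric_fit = fit_constant(observed, tau=0.5, candidates=range(0, 11))
max_seeking_fit = fit_constant(observed, tau=0.99, candidates=range(0, 11))
```

With `tau=0.5` the fit sits near the mean of the observed returns, while `tau=0.99` lands near their maximum without exceeding what the data supports. That matches the "in-distribution maximum" idea described above.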

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# 軌跡予測のための視覚のない知覚:自律運転における効率的な能動学習のためのシーン表現としてのエゴ車両ダイナミクス

Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving ( http://arxiv.org/abs/2405.09049v2 )

ライセンス: Link先を確認

Ross Greer, Mohan Trivedi,

(参考訳) 本研究では、自律走行機械学習タスクにおける効率的なデータキュレーションのための軌道情報と動的状態情報の利用について検討する。モデル性能を維持しつつアノテーションやデータコストを削減することを目的として,アクティブラーニングフレームワークにおける軌道状態のクラスタリング手法とサンプリング戦略を提案する。提案手法は軌道情報を利用してデータ選択をガイドし,トレーニングデータの多様性を促進する。本研究では,nuScenesデータセットを用いた軌道予測タスクにおける提案手法の有効性を実証し,異なるデータプールサイズにわたってランダムサンプリングよりも一貫した性能向上を示すとともに,データコストのわずか50%でベースラインを下回る変位誤差に到達することを示した。以上の結果から,初期段階では典型的なデータをサンプリングすることが「コールドスタート問題」の克服に役立つ一方,トレーニングプールの規模が大きくなるにつれて新規性の導入がより有益になることが示唆された。軌道状態の情報に基づくアクティブラーニングを統合することで、より効率的で堅牢な自動運転システムが低コストのデータキュレーション戦略によって実現可能かつ実用的であることを示す。

This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the "cold start problem," while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
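The contrast between sampling typical data early and novel data later can be illustrated with a deliberately simplified sketch. It assumes each scene is summarised by a hypothetical two-number ego-state feature (speed in m/s, yaw rate in rad/s) and uses distance to a single centroid as the typicality score. The paper clusters trajectory-states, so this is a stand-in rather than the authors' procedure.

```python
import math

def centroid(points):
    """Mean point of a list of equal-length feature tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def rank_by_typicality(points):
    """Indices sorted from closest to the centroid to farthest."""
    center = centroid(points)
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], center))

def select(points, budget, prefer="typical"):
    """Pick `budget` samples: closest to the centroid when prefer is
    'typical', farthest from it when prefer is 'novel'."""
    order = rank_by_typicality(points)
    if prefer == "novel":
        order = order[::-1]
    return order[:budget]

# Hypothetical ego-state features: (speed, yaw rate).
scenes = [(10.0, 0.0), (11.0, 0.0), (9.0, 0.0), (30.0, 2.0)]
early_round = select(scenes, budget=2, prefer="typical")
later_round = select(scenes, budget=1, prefer="novel")
```

A schedule that starts with `prefer="typical"` and switches to `prefer="novel"` as the labelled pool grows mirrors the trend reported above. The switch point would have to be tuned empirically.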

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# 船衝突回避のための説明可能なAI:意思決定プロセスのデコードと行動意図

Explainable AI for Ship Collision Avoidance: Decoding Decision-Making Processes and Behavioral Intentions ( http://arxiv.org/abs/2405.09081v2 )

ライセンス: Link先を確認

Hitoshi Yoshioka, Hirotada Hashimoto,

(参考訳) 本研究は、船舶衝突回避のための説明可能なAIを開発した。まず、サブタスククリティックネットワークからなるクリティックネットワークが提案され、衝突回避において各サブタスクを個別に評価し、関連するAI意思決定プロセスを明らかにする。さらに,Q値分析と注意機構を用いて行動意図を識別する試みを行った。前者は、AI行動によるQ値の増分を調べることによって意図を解釈することに焦点を当て、後者は、衝突回避のための意思決定プロセスにおける他の船の重要度を学習目的に取り入れた。衝突回避におけるAIの行動意図は、認識された衝突の危険と他の船への注意度を組み合わせることで可視化された。提案手法は数値実験により評価した。開発されたAIは、さまざまな混雑レベル下での衝突を安全に回避できることが確認され、AIの意思決定プロセスは人間にとって理解しやすいものになった。提案手法は,船舶衝突回避タスクにおけるDRLベースのコントローラ/システムの理解を容易にするだけでなく,サブタスクから構成される任意のタスクにも拡張できる。

This study developed an explainable AI for ship collision avoidance. Initially, a critic network composed of sub-task critic networks was proposed to individually evaluate each sub-task in collision avoidance to clarify the AI decision-making processes involved. Additionally, an attempt was made to discern behavioral intentions through a Q-value analysis and an Attention mechanism. The former focused on interpreting intentions by examining the increment of the Q-value resulting from AI actions, while the latter incorporated the significance of other ships in the decision-making process for collision avoidance into the learning objective. AI's behavioral intentions in collision avoidance were visualized by combining the perceived collision danger with the degree of attention to other ships. The proposed method was evaluated through a numerical experiment. The developed AI was confirmed to be able to safely avoid collisions under various congestion levels, and AI's decision-making process was rendered comprehensible to humans. The proposed method not only facilitates the understanding of DRL-based controllers/systems in the ship collision avoidance task but also extends to any task comprising sub-tasks.
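The Q-value analysis described here can be pictured with a small sketch. It assumes, purely for illustration, that the overall Q-value is the sum of per-sub-task Q-values and that the increment is measured against a hypothetical reference action (`"keep"` course). The sub-task names and numbers are invented, and the paper's actual critic architecture may differ.

```python
def total_q(sub_task_q, action):
    """Overall value of an action as the sum over sub-task critics (assumption)."""
    return sum(q_values[action] for q_values in sub_task_q.values())

def q_increment(sub_task_q, action, reference_action):
    """Per-sub-task change in Q when taking `action` instead of the reference."""
    return {
        name: q_values[action] - q_values[reference_action]
        for name, q_values in sub_task_q.items()
    }

# Hypothetical sub-task critic outputs for two candidate actions.
sub_task_q = {
    "avoid_ship_1": {"keep": -5.0, "starboard": -1.0},
    "avoid_ship_2": {"keep": -0.5, "starboard": -0.7},
    "follow_route": {"keep": 0.0, "starboard": -1.5},
}

actions = ["keep", "starboard"]
best_action = max(actions, key=lambda a: total_q(sub_task_q, a))
increments = q_increment(sub_task_q, best_action, reference_action="keep")
dominant_reason = max(increments, key=increments.get)
```

Reading off the sub-task with the largest positive increment gives a human-readable reason for the manoeuvre. In this toy case the turn to starboard is explained by avoiding the first ship, at some cost to route following.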

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# PolygloToxicityPrompts:大規模言語モデルにおけるニューラル有害劣化(toxic degeneration)の多言語評価

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models ( http://arxiv.org/abs/2405.09373v2 )

ライセンス: Link先を確認

Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap,

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、その広範なグローバル展開をもたらしており、その安全性を確保するには包括的かつ多言語の毒性評価が必要である。しかし、既存の毒性ベンチマークは圧倒的に英語に重点を置いており、他の言語にLLMをデプロイする際に重大なリスクをもたらしている。この問題に対処するため、17言語にまたがる自然発生の425Kのプロンプトからなる、最初の大規模多言語毒性評価ベンチマークであるPolygloToxicityPrompts (PTP)を導入する。我々は、1億以上のWebテキスト文書を自動的にスクレイピングすることで、Webテキストに自然に発生する毒性の不足を克服し、リソース量の異なる言語にまたがるカバレッジを確保する。 PTPを用いて,60以上のLLMをベンチマークすることにより,モデルサイズ,プロンプト言語,指示チューニングおよび選好チューニング手法が毒性に及ぼす影響について検討した。特に,言語資源の減少やモデルサイズの増加に伴い,毒性が増大することがわかった。指示チューニングと選好チューニングは毒性を低下させるが、選好チューニング手法の選択は有意な影響を与えない。本研究の知見は、LLMの安全対策の重大な欠点に光を当て、今後の研究領域を浮き彫りにする。

Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-scale multilingual toxicity evaluation benchmark of 425K naturally occurring prompts spanning 17 languages. We overcome the scarcity of naturally occurring toxicity in web-text and ensure coverage across languages with varying resources by automatically scraping over 100M web-text documents. Using PTP, we investigate research questions to study the impact of model size, prompt language, and instruction and preference-tuning methods on toxicity by benchmarking over 60 LLMs. Notably, we find that toxicity increases as language resources decrease or model size increases. Although instruction- and preference-tuning reduce toxicity, the choice of preference-tuning method does not have any significant impact. Our findings shed light on crucial shortcomings of LLM safeguarding and highlight areas for future research.

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# DemOpts:新型コロナウイルスのケース予測モデルにおける公正度補正

DemOpts: Fairness corrections in COVID-19 case prediction models ( http://arxiv.org/abs/2405.09483v2 )

ライセンス: Link先を確認

Naman Awasthi, Saad Abrar, Daniel Smolyak, Vanessa Frias-Martinez,

(参考訳) 新型コロナウイルス(COVID-19)の予測モデルは、病院のベッドや自宅待機命令など、リソース割り当てや介入に関する意思決定に情報を提供するために使われてきた。最先端のディープラーニングモデルは、新型コロナウイルスのケース予測モデルを強化するために、モビリティや社会人口統計データなどのマルチモーダルデータを使用することが多い。それにもかかわらず、関連する研究は、新型コロナウイルスの感染者の過少報告バイアスと、一部の少数人種・民族集団の移動データのサンプリングバイアスを明らかにしており、結果として、人種ラベルに沿った新型コロナウイルスの予測の公平性に影響を与える可能性がある。本稿では、最先端のディープラーニングモデルが、人種や民族集団の間で有意に異なる平均予測誤差を出力し、それが不公平な政策決定を支える可能性があることを示す。また、潜在的にバイアスのあるデータセットで訓練されたディープラーニングに基づく予測モデルの公平性を高めるために、新しいデバイアス化手法であるDemOptsを提案する。以上の結果から、DemOptsは、他の最先端のデバイアス化アプローチよりも優れた誤差の均等性(error parity)を達成でき、これにより、より多くの人種的および民族的グループ間の平均誤差分布の差異を効果的に低減できることが示された。

COVID-19 forecasting models have been used to inform decision making around resource allocation and intervention decisions, e.g., hospital beds or stay-at-home orders. State-of-the-art deep learning models often use multimodal data such as mobility or socio-demographic data to enhance COVID-19 case prediction models. Nevertheless, related work has revealed under-reporting bias in COVID-19 cases as well as sampling bias in mobility data for certain minority racial and ethnic groups, which could in turn affect the fairness of the COVID-19 predictions along race labels. In this paper, we show that state-of-the-art deep learning models output mean prediction errors that are significantly different across racial and ethnic groups, and which could, in turn, support unfair policy decisions. We also propose a novel de-biasing method, DemOpts, to increase the fairness of deep learning based forecasting models trained on potentially biased datasets. Our results show that DemOpts can achieve better error parity than other state-of-the-art de-biasing approaches, thus effectively reducing the differences in the mean error distributions across more racial and ethnic groups.
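Error parity can be checked with very little code. The sketch below computes the mean absolute error per group and the gap between the best- and worst-served groups. The gap statistic, the group labels, and the numbers are assumptions made for illustration; the abstract does not specify the exact test or metric the authors use.

```python
from collections import defaultdict

def mean_error_by_group(y_true, y_pred, groups):
    """Mean absolute prediction error for each group label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        sums[group] += abs(truth - pred)
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}

def parity_gap(errors_by_group):
    """Difference between the largest and smallest group-level mean error."""
    values = list(errors_by_group.values())
    return max(values) - min(values)

# Hypothetical county-level case counts, predictions, and majority-group labels.
cases = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 30.0, 50.0]
labels = ["group_a", "group_a", "group_b", "group_b"]

errors = mean_error_by_group(cases, forecast, labels)
gap = parity_gap(errors)
```

A de-biasing method in this spirit would aim to shrink `gap` while keeping the overall error acceptable. A significance test across groups would be the natural next step beyond this raw difference.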

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# テキスト, 画像, ビデオ, 音声基礎モデルにおける幻覚の発見 : 包括的調査

Unveiling Hallucination in Text, Image, Video, and Audio Foundation Models: A Comprehensive Survey ( http://arxiv.org/abs/2405.09589v2 )

ライセンス: Link先を確認

Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, Aman Chadha,

(参考訳) 言語、画像、音声、ビデオ領域にまたがるファンデーションモデル(FM)の急速な進歩は、様々なタスクにおいて顕著な能力を示している。しかし、FMの拡散は、特に高感度の応用において、幻覚出力を発生させる可能性という重要な課題を生んでいる。幻覚コンテンツを生み出す基礎モデルの傾向は、特に信頼性と精度が最重要である領域において、現実のシナリオにおいて広く採用されていることの最大の障害である。本研究は,FMにおける幻覚問題,テキスト,画像,ビデオ,オーディオモダリティの同定と緩和を目的とした最近の研究の概要を概説する。近年の幻覚の検出・緩和の進歩によって,研究者,開発者,実践者に貴重な洞察を提供することが目的である。本質的には、マルチモーダル基礎モデルの幻覚に対処するための定義、分類、検出戦略を含む明確な枠組みを確立し、この中心的な領域における将来の研究の基礎を築いた。

The rapid advancement of foundation models (FMs) across language, image, audio, and video domains has shown remarkable capabilities in diverse tasks. However, the proliferation of FMs brings forth a critical challenge: the potential to generate hallucinated outputs, particularly in high-stakes applications. The tendency of foundation models to produce hallucinated content arguably represents the biggest hindrance to their widespread adoption in real-world scenarios, especially in domains where reliability and accuracy are paramount. This survey paper presents a comprehensive overview of recent developments that aim to identify and mitigate the problem of hallucination in FMs, spanning text, image, video, and audio modalities. By synthesizing recent advancements in detecting and mitigating hallucination across various modalities, the paper aims to provide valuable insights for researchers, developers, and practitioners. Essentially, it establishes a clear framework encompassing definition, taxonomy, and detection strategies for addressing hallucination in multimodal foundation models, laying the foundation for future research in this pivotal area.

翻訳日:2024-05-21 20:06:02 公開日:2024-05-20

# ニューラルパス表現を用いたテキスト・ツー・ベクター生成

Text-to-Vector Generation with Neural Path Representation ( http://arxiv.org/abs/2405.10317v2 )

ライセンス: Link先を確認

Peiying Zhang, Nanxuan Zhao, Jing Liao,

(参考訳) ベクトルグラフィックスはデジタルアートで広く使われており、そのスケーラビリティとレイヤーワイズな性質からデザイナーに好まれている。しかし、ベクトルグラフィックスの作成と編集には創造性と設計の専門知識が必要であり、時間を要する作業となっている。テキスト・ツー・ベクター(T2V)生成の最近の進歩は、このプロセスをより使いやすくすることを目的としている。しかし、既存のT2V法はベクトルグラフィックスのパスの制御点を直接最適化し、幾何学的制約が欠如しているため、しばしば交差したりギザギザになったりするパスが生じる。これらの限界を克服するために,2分岐変分オートエンコーダ(VAE)を設計し,シーケンスと画像の両モダリティからパスの潜在空間を学習する,新しいニューラルパス表現を提案する。ニューラルパスの組み合わせを最適化することにより、生成したSVGの表現性を保ちながら幾何的制約を組み込むことができる。さらに,生成したSVGの視覚的およびトポロジ的品質を改善するための2段階パス最適化手法を提案する。第1段階では、事前訓練されたテキスト・ツー・イメージ拡散モデルが、変分スコア蒸留(VSD)プロセスを通じて複雑なベクトルグラフィックスの初期生成を導く。第2段階では、レイヤワイズな画像ベクトル化戦略を用いてグラフィックスを洗練し、より明確な要素と構造を実現する。本手法の有効性を広範な実験により実証し,様々な応用例を示す。プロジェクトページはhttps://intchous.github.io/T2V-NPR。

Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making it a time-consuming task. Recent advancements in text-to-vector (T2V) generation have aimed to make this process more accessible. However, existing T2V methods directly optimize control points of vector graphics paths, often resulting in intersecting or jagged paths due to the lack of geometry constraints. To overcome these limitations, we propose a novel neural path representation by designing a dual-branch Variational Autoencoder (VAE) that learns the path latent space from both sequence and image modalities. By optimizing the combination of neural paths, we can incorporate geometric constraints while preserving expressivity in generated SVGs. Furthermore, we introduce a two-stage path optimization method to improve the visual and topological quality of generated SVGs. In the first stage, a pre-trained text-to-image diffusion model guides the initial generation of complex vector graphics through the Variational Score Distillation (VSD) process. In the second stage, we refine the graphics using a layer-wise image vectorization strategy to achieve clearer elements and structure. We demonstrate the effectiveness of our method through extensive experiments and showcase various applications. The project page is https://intchous.github.io/T2V-NPR.

翻訳日:2024-05-21 19:56:17 公開日:2024-05-20

# 大規模言語モデルにおける毒性の現実的評価

Realistic Evaluation of Toxicity in Large Language Models ( http://arxiv.org/abs/2405.10659v2 )

ライセンス: Link先を確認

Tinh Son Luong, Thanh-Thien Le, Linh Ngo Van, Thien Huu Nguyen,

(参考訳) 大規模言語モデル(LLM)は、私たちのプロフェッショナルなワークフローや日々の生活に不可欠なものになっている。しかし、これらのモデルには重大な欠陥がある。膨大で多様な知識を与える大量のデータが、同時に、避けられない毒性や偏見にモデルをさらしているのである。ほとんどのLLMは有害なコンテンツの生成を防ぐための防御機構を組み込んでいるが、これらの安全対策は最小限のプロンプトエンジニアリングで容易に回避できる。本稿では,これらのモデルの保護層を無効化するように設計された手作業によるプロンプトからなる,新しいThoroughly Engineered Toxicity (TET)データセットを紹介する。広範な評価を通じて,いくつかの一般的なLLMにおける毒性認識を評価するための厳密なベンチマークを提供する上で,TETが重要な役割を担うことを示す。TETは,通常のプロンプトを用いた場合には隠れたままになりうるLLMの毒性を浮き彫りにし,その振る舞いにおけるより微妙な問題を明らかにする。

Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data which endows them with vast and diverse knowledge, also exposes them to the inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluation of toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.

翻訳日:2024-05-21 19:56:17 公開日:2024-05-20

# 損失ランドスケープにおける縮退を用いた機械的解釈可能性

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability ( http://arxiv.org/abs/2405.10927v2 )

ライセンス: Link先を確認

Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn,

(参考訳) 機械的解釈可能性(Mechanistic Interpretability)は、ニューラルネットワークによって実装されたアルゴリズムを、その重みとアクティベーションを研究することによってリバースエンジニアリングすることを目的としている。ニューラルネットワークのリバースエンジニアリングにおける障害は、ネットワーク内の多くのパラメータが、ネットワークによって実装されている計算に関与していないことである。これらの縮退パラメータは内部構造を難読化しうる。特異学習理論は、ニューラルネットワークのパラメータ化がより縮退したものに偏っていること、そしてより縮退の大きいパラメータ化ほどより良く汎化する可能性が高いことを教えてくれる。ネットワークパラメータが縮退しうる3つの形として,レイヤ内のアクティベーション間の線形依存,レイヤに逆伝播される勾配間の線形依存,データポイントの同じサブセットで発火するReLUを同定する。また、モジュラーネットワークはより縮退しやすいというヒューリスティックな議論も提示し、この議論に基づいてネットワーク内のモジュールを識別する指標を開発する。縮退を利用した再パラメータ化に対して不変な方法でニューラルネットワークを表現できるなら、この表現はより解釈可能である可能性が高いと提案し、そのような表現がより疎な相互作用を持つ可能性が高いことを示すいくつかの証拠を提供する。本稿では,アクティベーションやヤコビアンの線形依存に由来する縮退に対して不変な表現を得るための扱いやすい手法であるInteraction Basisを紹介する。

Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to generalize further. We identify 3 ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network that is based on this argument. We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies from linear dependence of activations or Jacobians.
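The first kind of degeneracy listed above, linear dependence between activations in a layer, can be detected by checking the rank of the activation matrix over a dataset. The sketch below is a toy illustration with a hand-written rank routine and invented activations. It is not the paper's Interaction Basis procedure, and the tolerance value is an arbitrary assumption.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical layer activations: rows are datapoints, columns are neurons.
# The third neuron always equals the sum of the first two.
activations = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 3.0],
    [1.0, 1.0, 2.0],
]
n_neurons = len(activations[0])
rank = matrix_rank(activations)
degenerate_directions = n_neurons - rank
```

Every missing unit of rank corresponds to a direction in activation space that carries no information on this dataset, so the weights reading from that direction can change without affecting the network's outputs there.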

翻訳日:2024-05-21 19:56:17 公開日:2024-05-20

# 局所相互作用基底:ニューラルネットワークにおける計算に関連し疎に相互作用する特徴の同定

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks ( http://arxiv.org/abs/2405.10928v2 )

ライセンス: Link先を確認

Lucius Bushnaq, Stefan Heimersheim, Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius Hobbhahn,

(参考訳) 機械的解釈可能性(Mechanistic Interpretability)は、ニューラルネットワークの内部計算をリバースエンジニアリングすることで、その振る舞いを理解することを目的としている。しかし現在の手法では、アクティベーションを計算的特徴へ分解する方法が欠けているため、ニューラルネットワークのアクティベーションに明確な解釈を与えることが難しい。個々のニューロンやモデルコンポーネントは、個別の特徴や機能にきれいには対応しない。本稿では、ネットワークのアクティベーションを新たな基底であるLIB(Local Interaction Basis)に変換することで、この制限の克服を目指す新しい解釈可能性手法を提案する。 LIBは、無関係なアクティベーションと相互作用を取り除くことで計算的特徴を特定することを目指す。本手法は、無関係なアクティベーション方向を取り除き、隣接する層間のヤコビ行列の特異ベクトルに基底を揃える。さらに、下流の計算に対する重要度に基づいて特徴をスケーリングし、モデル内の計算に関連するすべての特徴と相互作用を示す相互作用グラフを生成する。モジュラー加算モデルとCIFAR-10モデルでLIBの有効性を評価したところ、主成分分析と比べて、計算に関連する特徴をより多く特定でき、それらの相互作用もより疎であった。しかし、言語モデルに適用した場合、LIBは解釈可能性や相互作用のスパース性を大きくは改善しない。我々は、LIBはニューラルネットワークを解析するための有望な理論主導のアプローチであるが、現在の形では大規模言語モデルには適用できないと結論づける。

Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering their internal computations. However, current methods struggle to find clear interpretations of neural network activations because a decomposition of activations into computational features is missing. Individual neurons or model components do not cleanly correspond to distinct features or functions. We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. Our method drops irrelevant activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features based on their importance for downstream computation, producing an interaction graph that shows all computationally-relevant features and interactions in a model. We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models, finding that it identifies more computationally-relevant features that interact more sparsely, compared to principal component analysis. However, LIB does not yield substantial improvements in interpretability or interaction sparsity when applied to language models. We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.

翻訳日:2024-05-21 19:56:17 公開日:2024-05-20

# OpenRLHF: 使いやすくスケーラブルで高性能なRLHFフレームワーク

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework ( http://arxiv.org/abs/2405.11143v1 )

ライセンス: Link先を確認

Jian Hu, Xibin Wu, Weixun Wang, Xianyu, Dehao Zhang, Yu Cao,

(参考訳) 大規模言語モデル(LLM)がスケーリング則に従って成長し続けるなか、人間のフィードバックからの強化学習(RLHF)はその優れた性能により大きな注目を集めている。しかし、単一モデルの事前学習や微調整とは異なり、大規模言語モデルの訓練に向けてRLHFをスケールさせるには、4つのモデルにまたがる協調の課題が生じる。本稿では、効率的なRLHFのスケーリングを可能にするオープンソースフレームワークOpenRLHFを提案する。4つのモデルを同じGPU上に同居させる既存のRLHFフレームワークとは異なり、OpenRLHFはRay、vLLM、DeepSpeedを用いて70Bパラメータを超えるモデルのスケジューリングを再設計し、リソース利用率の向上と多様な訓練手法を活用する。 Hugging Faceとシームレスに統合されたOpenRLHFは、最適化されたアルゴリズムと起動スクリプトを備えたすぐに使えるソリューションを提供し、使いやすさを確保している。 OpenRLHFはRLHF、DPO、棄却サンプリング、その他のアライメント手法を実装している。 OpenRLHFのコードは https://github.com/OpenLLMAI/OpenRLHF で公開されている。

As large language models (LLMs) continue to grow by scaling laws, reinforcement learning from human feedback (RLHF) has gained significant attention due to its outstanding performance. However, unlike pretraining or fine-tuning a single model, scaling reinforcement learning from human feedback (RLHF) for training large language models poses coordination challenges across four models. We present OpenRLHF, an open-source framework enabling efficient RLHF scaling. Unlike existing RLHF frameworks that co-locate four models on the same GPUs, OpenRLHF re-designs scheduling for the models beyond 70B parameters using Ray, vLLM, and DeepSpeed, leveraging improved resource utilization and diverse training approaches. Integrating seamlessly with Hugging Face, OpenRLHF provides an out-of-the-box solution with optimized algorithms and launch scripts, which ensures user-friendliness. OpenRLHF implements RLHF, DPO, rejection sampling, and other alignment techniques. Empowering state-of-the-art LLM development, OpenRLHF's code is available at https://github.com/OpenLLMAI/OpenRLHF.

翻訳日:2024-05-21 19:17:16 公開日:2024-05-20

# QComp: 医薬品発見のためのQSARベースのデータ補完フレームワーク

QComp: A QSAR-Based Data Completion Framework for Drug Discovery ( http://arxiv.org/abs/2405.11703v1 )

ライセンス: Link先を確認

Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu,

(参考訳) 薬物発見において、in vitroおよびin vivo実験は化合物の有効性と毒性に関連する生化学的活性を明らかにする。実験データは、巨大な、絶え間なく進化し、スパースなデータセットに蓄積される。化合物の構造情報のみを用いて生化学的活動を予測する定量的構造-活性関係モデル(QSAR)は、研究の進展に伴い、進化する実験データを統合する上での課題に直面している。この問題に対処するデータ補完フレームワークであるQSAR-Complete (QComp) を開発した。既存のQSARモデルに基づいて、QCompは実験データに固有の相関を利用して、様々なタスクにおける予測精度を向上させる。さらに、QCompは、特定のエンドポイントに対する統計的不確実性の低下を定量化し、薬物発見プロセス全体を通して合理的な意思決定を支援することによって、実験の最適なシーケンスを導くための有望なツールとして出現する。

In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.

翻訳日:2024-05-21 14:43:16 公開日:2024-05-20

# 自然言語処理タスクにおけるディープラーニングに基づく大規模言語モデルの効率最適化

Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks ( http://arxiv.org/abs/2405.11704v1 )

ライセンス: Link先を確認

Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang,

(参考訳) 大規模言語モデルの内部構造と操作機構は理論的に解析され、特にTransformerとその派生アーキテクチャは、長期依存を捕捉しながら計算効率を抑えることができる。さらに、トレーニングフェーズの効率ボトルネックを深く掘り下げ、適応最適化アルゴリズム(AdamWなど)、大規模並列計算技術、収束の加速とメモリフットプリントの削減を目的とした混合精度トレーニング戦略の貢献度を詳細に評価する。これらのアルゴリズムの数学的原理と実装の詳細を解析することにより、実際にトレーニング効率を効果的に改善する方法について明らかにする。モデル配置と推論最適化の観点で,本論文はモデル圧縮技術の最新の進歩を体系的にレビューし,定量化,プルーニング,知識蒸留といった戦略に焦点をあてる。これらの手法の理論的枠組みと異なるアプリケーションシナリオにおけるそれらの効果を比較することにより、モデル予測精度を維持しながら、モデルサイズと推論遅延を著しく低減する能力を示す。さらに, オーバーフィッティングのリスクの増加, 圧縮後の性能損失の制御, アルゴリズムの汎用性の問題など, 現在の効率最適化手法の限界を批判的に検討し, 今後の研究の展望について述べる。本研究は,大規模言語モデルの効率最適化を理解するための包括的な理論的枠組みを提供する。

The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we dig deep into the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies to accelerate convergence and reduce memory footprint. By analyzing the mathematical principles and implementation details of these algorithms, we reveal how they effectively improve training efficiency in practice. In terms of model deployment and inference optimization, this paper systematically reviews the latest advances in model compression techniques, focusing on strategies such as quantification, pruning, and knowledge distillation. By comparing the theoretical frameworks of these techniques and their effects in different application scenarios, we demonstrate their ability to significantly reduce model size and inference delay while maintaining model prediction accuracy. In addition, this paper critically examines the limitations of current efficiency optimization methods, such as the increased risk of overfitting, the control of performance loss after compression, and the problem of algorithm generality, and proposes some prospects for future research. In conclusion, this study provides a comprehensive theoretical framework for understanding the efficiency optimization of large-scale language models.

翻訳日:2024-05-21 14:43:16 公開日:2024-05-20

# スクイージングによる量子増強多位相推定の原理について

On the principle of squeezing-induced quantum-enhanced multiphase estimation ( http://arxiv.org/abs/2405.11705v1 )

ライセンス: Link先を確認

Le Bin Ho,

(参考訳) 本研究では、多位相量子計測においてスクイージング技術が測定精度をどのように向上させうるかを調べる。これらの手法は単一位相の推定ではよく研究され有効に利用されているが、多位相の場合への利用はまだ検討されていない。我々は、こうした状況における量子増強のメカニズムを調べることで、このギャップを埋める。我々の解析は、量子クラメール・ラオ限界を達成するための最適条件について理論的および数値的な知見を与え、スクイージングによる量子増強多位相推定の可能性とメカニズムの理解に役立つ。この研究は、量子計測とセンシング技術の進歩に新たな可能性を開く。

We investigate how squeezing techniques can improve measurement precision in multiphase quantum metrology. While these methods are well-studied and used effectively in single-phase estimations, their use in multiphase situations has not been examined yet. We fill this gap by investigating the mechanism of quantum enhancement in these scenarios. Our analysis provides theoretical and numerical insights into the optimal condition for achieving the quantum Cramer-Rao bound, helping us understand the potential and mechanism for quantum-enhanced multiphase estimations with squeezing. This research opens up new possibilities for advancements in quantum metrology and sensing technologies.
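
For orientation, the multiparameter quantum Cramér-Rao bound mentioned above is commonly written as below. The notation is an assumption made here and is not taken from the abstract: $\boldsymbol{\phi}=(\phi_1,\dots,\phi_d)$ is the vector of unknown phases, $M$ is the number of independent repetitions, and $F_Q$ is the quantum Fisher information matrix of the probe state.

$$
\operatorname{Cov}(\hat{\boldsymbol{\phi}}) \;\succeq\; \frac{1}{M}\,F_Q^{-1},
\qquad
\sum_{k=1}^{d} \operatorname{Var}(\hat{\phi}_k) \;\ge\; \frac{1}{M}\,\operatorname{Tr}\!\left[F_Q^{-1}\right].
$$

Unlike the single-phase case, the matrix bound is not always attainable by a single measurement, which is why the condition for achieving it is a question in its own right.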

翻訳日:2024-05-21 14:43:16 公開日:2024-05-20

# 質問応答におけるLLMの精度向上:オントロジーが救いの手に!

Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! ( http://arxiv.org/abs/2405.11706v1 )

ライセンス: Link先を確認

Dean Allemang, Juan Sequeda,

(参考訳) エンタープライズSQLデータベースの知識グラフ/意味表現を利用する(すなわちText-to-SPARQL)、大規模言語モデル(LLM)を用いた質問応答(QA)システムは、SQLデータベース上で直接質問に答えるシステム(すなわちText-to-SQL)に比べて精度が高いという証拠が増えている。我々の以前のベンチマーク研究では、知識グラフを用いることで精度が16%から54%に向上することを示した。残る問いは、どうすれば精度をさらに高め、エラー率を下げられるかである。不正確なLLM生成SPARQLクエリは誤った経路を辿っていたという以前の研究での観察に基づき、次の2つからなるアプローチを提案する。 1) オントロジーに基づくクエリチェック(OBQC): 知識グラフのオントロジーを利用して、LLMが生成したSPARQLクエリがオントロジーの意味と整合するかを確認し、エラーを検出する。 2) LLM修復: エラーの説明をLLMに与えてSPARQLクエリを修復する。 chat with the dataベンチマークを用いた主な結果として、我々のアプローチは全体の精度を72%に高め、これに加えて「わからない」という不明回答が8%ある。したがって、全体のエラー率は20%である。これらの結果は、知識グラフ、とりわけオントロジーへの投資が、LLMを用いた質問応答システムの精度を高めるというさらなる証拠を与える。

There is increasing evidence that question-answering (QA) systems with Large Language Models (LLMs), which employ a knowledge graph/semantic representation of an enterprise SQL database (i.e. Text-to-SPARQL), achieve higher accuracy compared to systems that answer questions directly on SQL databases (i.e. Text-to-SQL). Our previous benchmark research showed that by using a knowledge graph, the accuracy improved from 16% to 54%. The question remains: how can we further improve the accuracy and reduce the error rate? Building on the observations of our previous research where the inaccurate LLM-generated SPARQL queries followed incorrect paths, we present an approach that consists of 1) Ontology-based Query Check (OBQC): detects errors by leveraging the ontology of the knowledge graph to check if the LLM-generated SPARQL query matches the semantics of the ontology and 2) LLM Repair: uses the error explanations with an LLM to repair the SPARQL query. Using the chat with the data benchmark, our primary finding is that our approach increases the overall accuracy to 72% including an additional 8% of "I don't know" unknown results. Thus, the overall error rate is 20%. These results provide further evidence that investing in knowledge graphs, namely the ontology, provides higher accuracy for LLM-powered question answering systems.
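
The idea of checking a generated query against an ontology can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, the `ex:` names, the function `check_query`, and the simplified triple-pattern input are all hypothetical, and subclass reasoning is deliberately left out.

```python
# Hypothetical toy ontology: property -> (domain class, range class).
ONTOLOGY = {
    "ex:hasPolicy": ("ex:Customer", "ex:Policy"),
    "ex:hasClaim": ("ex:Policy", "ex:Claim"),
}


def check_query(triples, var_types):
    """Return explanations for triple patterns that violate domain/range.

    triples: list of (subject_var, property, object_var) patterns
    var_types: dict mapping a variable to the class the query asserts for it
    """
    errors = []
    for s, p, o in triples:
        if p not in ONTOLOGY:
            errors.append(f"{p} is not defined in the ontology")
            continue
        domain, range_ = ONTOLOGY[p]
        # Untyped variables (None) are not flagged; only explicit conflicts are.
        if var_types.get(s) not in (None, domain):
            errors.append(f"{p} expects a subject of type {domain}, got {var_types[s]}")
        if var_types.get(o) not in (None, range_):
            errors.append(f"{p} expects an object of type {range_}, got {var_types[o]}")
    return errors


# A customer cannot be the subject of ex:hasClaim in this toy ontology.
explanations = check_query(
    [("?c", "ex:hasClaim", "?x")],
    {"?c": "ex:Customer"},
)
```

In a pipeline like the one described, explanation strings of this kind would be what gets handed to the LLM in the repair step.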

翻訳日:2024-05-21 14:43:16 公開日:2024-05-20

# 敵対的ロバスト性のための適応バッチ正規化ネットワーク

Adaptive Batch Normalization Networks for Adversarial Robustness ( http://arxiv.org/abs/2405.11708v1 )

ライセンス: Link先を確認

Shao-Yuan Lo, Vishal M. Patel,

(参考訳) ディープネットワークは敵対的サンプルに対して脆弱である。敵対的訓練(Adversarial Training, AT)は、その顕著な有効性から、現代の敵対的防御手法の標準的な基盤となっている。しかし、ATは極めて時間がかかるため、実用アプリケーションへの広範な展開が妨げられている。本稿では非ATの防御を目指し、ATを用いずに強力な敵対的攻撃に対しても頑健な防御手法をどのように設計するかを問う。この問いに答えるため、テスト時ドメイン適応の最近の進歩に着想を得て、適応バッチ正規化(BN)を利用する。これに基づき、適応バッチ正規化ネットワーク(ABNN)と呼ぶ新しい防御手法を提案する。 ABNNは、事前学習済みの代替モデルを用いてクリーンなBN統計量を生成し、それをターゲットモデルに送る。ターゲットモデルはクリーンデータのみで訓練され、代替モデルのBN統計量に合わせることを学習する。実験結果から、ABNNは画像データセットとビデオデータセットの両方において、デジタル攻撃と物理的に実現可能な攻撃の両方に対する敵対的ロバスト性を一貫して改善することが示された。さらに、ATベースの手法と比べて、ABNNはクリーンデータでより高い性能を達成し、訓練時間の複雑さを大幅に低減できる。

Deep networks are vulnerable to adversarial examples. Adversarial Training (AT) has been a standard foundation of modern adversarial defense approaches due to its remarkable effectiveness. However, AT is extremely time-consuming, which hinders its wide deployment in practical applications. In this paper, we aim at a non-AT defense: How to design a defense method that gets rid of AT but is still robust against strong adversarial attacks? To answer this question, we resort to adaptive Batch Normalization (BN), inspired by the recent advances in test-time domain adaptation. We propose a novel defense accordingly, referred to as the Adaptive Batch Normalization Network (ABNN). ABNN employs a pre-trained substitute model to generate clean BN statistics and sends them to the target model. The target model is exclusively trained on clean data and learns to align the substitute model's BN statistics. Experimental results show that ABNN consistently improves adversarial robustness against both digital and physically realizable attacks on both image and video datasets. Furthermore, ABNN can achieve higher clean data performance and significantly lower training time complexity compared to AT-based approaches.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 検証できないから信頼する:教育テクノロジー調達の実務におけるプライバシーとセキュリティのハードル

Trust, Because You Can't Verify: Privacy and Security Hurdles in Education Technology Acquisition Practices ( http://arxiv.org/abs/2405.11712v1 )

ライセンス: Link先を確認

Easton Kelso, Ananta Soneji, Sazzadur Rahaman, Yan Soshitaishvili, Rakibul Hasan,

(参考訳) 高等教育機関(HEI)では、教育テクノロジー(EdTech)の利用が急速に拡大している。この成長は膨大な複雑さをもたらす。これらのツールが収集する広範なデータを保護することは、HEIにとって極めて重要である。データ侵害や不正利用といったプライバシーインシデントは、データ主体、とりわけこれらのツールの使用を強いられることの多い学生に、セキュリティとプライバシーの面で深刻な影響を及ぼしうる。このため、HEIとEdTechベンダーの関係性を深く理解することが求められるが、この点はほとんど研究されていない。このギャップに対処するため、7つのHEIでEdTechに関する指導的役割を担う13人の参加者を対象に、半構造化インタビュー調査を実施する。本研究は、HEIにおけるEdTechの調達プロセス、そのプロセス全体を通じたセキュリティとプライバシーの問題への配慮、サービス契約において適切なセキュリティおよびプライバシー保護の仕組みを確立する際にHEI職員が抱える悩み、そしてベンダーのシステムに対する可視性の欠如や力の非対称性などの理由からベンダーに責任を負わせることが難しい実態を明らかにする。現状に関するいくつかの考察を議論し、状況を改善するための提言で締めくくる。

The education technology (EdTech) landscape is expanding rapidly in higher education institutes (HEIs). This growth brings enormous complexity. Protecting the extensive data collected by these tools is crucial for HEIs. Privacy incidents of data breaches and misuses can have dire security and privacy consequences on the data subjects, particularly students, who are often compelled to use these tools. This urges an in-depth understanding of HEI and EdTech vendor dynamics, which is largely understudied. To address this gap, we conduct a semi-structured interview study with 13 participants who are in the EdTech leadership roles at seven HEIs. Our study uncovers the EdTech acquisition process in the HEI context, the consideration of security and privacy issues throughout that process, the pain points of HEI personnel in establishing adequate security and privacy protection mechanisms in service contracts, and their struggle in holding vendors accountable due to a lack of visibility into their system and power-asymmetry, among other reasons. We discuss certain observations about the status quo and conclude with recommendations to improve the situation.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# グラフにおける臨界接続のための分散プライバシ保護

Decentralized Privacy Preservation for Critical Connections in Graphs ( http://arxiv.org/abs/2405.11713v1 )

ライセンス: Link先を確認

Conggai Li, Wei Ni, Ming Ding, Youyang Qu, Jianjun Chen, David Smith, Wenjie Zhang, Thierry Rakotoarivelo,

(参考訳) 実世界におけるエンティティ間の相互接続の多くは、グラフとして特徴づけられる。プライバシーとデータの有用性のバランスをとりながら局所的なグラフ情報を収集することは、近年大きな関心を集めている。本稿では、凝集部分グラフ探索に基づき、グラフ内の個々の参加者について、エンティティ接続に関する重要情報を特定し保護する問題を掘り下げる。この問題は既存文献では扱われていない。この問題に対処するため、$p$-cohesionとして知られる要塞状の凝集部分グラフモデルを用いて、問い合わせ対象の頂点の重要な接続を抽出することを提案する。要塞内のユーザーの接続は公開時に難読化され、ユーザーに関する重要情報が保護される。極小の$p$-cohesionにおける各参加者の重要な接続を測るために、新たなメリットスコア関数とペナルティスコア関数を設計し、接続の効果的な特定を容易にする。さらに、データ収集者からのクエリに応答する際、重要な接続のみを保護することで、問い合わせ対象の頂点のプライバシーを保護することを提案する。分散型差分プライバシー(DDP)メカニズムのもとで、重要な接続を保護し残りの接続を摂動させない場合に、その応答が$(\varepsilon, \delta)$-DDPを満たすことを証明する。提案手法の有効性は、実世界のグラフデータセットを用いた広範な実験により示される。

Many real-world interconnections among entities can be characterized as graphs. Collecting local graph information with balanced privacy and data utility has garnered notable interest recently. This paper delves into the problem of identifying and protecting critical information of entity connections for individual participants in a graph based on cohesive subgraph searches. This problem has not been addressed in the literature. To address the problem, we propose to extract the critical connections of a queried vertex using a fortress-like cohesive subgraph model known as $p$-cohesion. A user's connections within a fortress are obfuscated when being released, to protect critical information about the user. Novel merit and penalty score functions are designed to measure each participant's critical connections in the minimal $p$-cohesion, facilitating effective identification of the connections. We further propose to preserve the privacy of a vertex enquired by only protecting its critical connections when responding to queries raised by data collectors. We prove that, under the decentralized differential privacy (DDP) mechanism, one's response satisfies $(\varepsilon, \delta)$-DDP when its critical connections are protected while the rest remains unperturbed. The effectiveness of our proposed method is demonstrated through extensive experiments on real-life graph datasets.
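
As a rough illustration of the cohesive subgraph model, the sketch below checks whether a vertex set is a $p$-cohesion. It assumes the usual definition, which the abstract does not spell out: the induced subgraph is connected, and every member has at least a fraction $p$ of its neighbours (counted in the full graph) inside the set. The function name and the adjacency-dict input are hypothetical.

```python
from collections import deque


def is_p_cohesion(adj, subset, p):
    """Check connectivity and the per-vertex neighbour-fraction condition.

    adj: dict vertex -> set of neighbours in the full graph
    subset: iterable of vertices forming the candidate subgraph
    p: required fraction of each member's neighbours inside `subset`
    """
    subset = set(subset)
    if not subset:
        return False
    # Fraction condition: at least p * degree neighbours must be members.
    for v in subset:
        neighbours = adj[v]
        if not neighbours:
            return False
        inside = sum(1 for u in neighbours if u in subset)
        if inside < p * len(neighbours):
            return False
    # Connectivity of the induced subgraph, via breadth-first search.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in subset and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == subset


# Triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
```

In this toy graph the triangle qualifies for $p = 0.6$ because vertex `c` keeps two of its three neighbours inside, but it fails for $p = 0.7$.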

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# LLMインフォームドPOI分類を用いた意味軌道データマイニング

Semantic Trajectory Data Mining with LLM-Informed POI Classification ( http://arxiv.org/abs/2405.11715v1 )

ライセンス: Link先を確認

Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma,

(参考訳) 人の移動軌跡のマイニングは交通システムにとって不可欠であり、経路最適化、交通管理、人の移動パターンの研究を向上させる。意味情報を統合しない従来のルールベースのアプローチは、効率と精度の両面で限界がある。 Points of Interest(POI)データから推定される活動タイプなどの意味情報は、軌跡マイニングの質を大きく向上させうる。しかし、多くのPOIは特徴情報が不完全であり、現在の学習ベースのPOIアルゴリズムは分類のためにデータセットの完全性を必要とするため、こうした知見を統合することは難しい。本稿では、人の移動軌跡マイニングのための新しいパイプラインを提案する。提案手法はまず、大規模言語モデル(LLM)の高い推論能力と理解能力を利用してPOIに活動タイプを付与し、次にベイズに基づくアルゴリズムを用いて軌跡内の各滞在点の活動を推定する。 OpenStreetMap(OSM)のPOIデータセットを用いた評価では、POI分類で精度93.4%、F-1スコア96.1%、活動推定で精度91.7%、F-1スコア92.3%を達成した。

Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 安全な強化学習のための実現可能性整合的表現学習

Feasibility Consistent Representation Learning for Safe Reinforcement Learning ( http://arxiv.org/abs/2405.11718v1 )

ライセンス: Link先を確認

Zhepeng Cen, Yihang Yao, Zuxin Liu, Ding Zhao,

(参考訳) 安全な強化学習(RL)の分野では、安全制約の充足と報酬性能の最適化とのバランスを見つけることが大きな課題である。この取り組みにおける主要な障害は安全制約の推定であり、制約信号がスパースであるため、通常は報酬指標の推定よりも難しい。この問題に対処するため、FCSRL(Feasibility Consistent Safe Reinforcement Learning)という新しいフレームワークを導入する。本フレームワークは、表現学習と実現可能性指向の目的関数を組み合わせることで、安全なRLのために生の状態から安全関連情報を特定し抽出する。自己教師あり学習の手法と、より学習しやすい安全性指標を活用することで、方策学習と制約推定を強化する。ベクトル状態および画像ベースの幅広いタスクでの実証評価により、本手法が従来の表現学習ベースラインよりも優れた安全性考慮型の埋め込みを学習し、優れた性能を達成できることを示す。

In the field of safe reinforcement learning (RL), finding a balance between satisfying safety constraints and optimizing reward performance presents a significant challenge. A key obstacle in this endeavor is the estimation of safety constraints, which is typically more difficult than estimating a reward metric due to the sparse nature of the constraint signals. To address this issue, we introduce a novel framework named Feasibility Consistent Safe Reinforcement Learning (FCSRL). This framework combines representation learning with feasibility-oriented objectives to identify and extract safety-related information from the raw state for safe RL. Leveraging self-supervised learning techniques and a more learnable safety metric, our approach enhances the policy learning and constraint estimation. Empirical evaluations across a range of vector-state and image-based tasks demonstrate that our method is capable of learning a better safety-aware embedding and achieving superior performance than previous representation learning baselines.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 非アーベル自己訂正量子メモリ

Non-Abelian Self-Correcting Quantum Memory ( http://arxiv.org/abs/2405.11719v1 )

ライセンス: Link先を確認

Po-Shen Hsin, Ryohei Kobayashi, Guanyu Zhu,

(参考訳) 局所的で可換な非パウリ安定化子格子モデルと、非自明なトポロジカル作用を持つ$\mathbb{Z}_2^3$高次形式ゲージ場の場の理論を用いて、粒子励起を持たない時空次元$D\geq 5+1$において、無限に多くの新しい非アーベル自己訂正トポロジカル量子メモリの候補からなる族を構成する。このような非パウリ安定化子モデルをマジック安定化子符号と呼ぶ。このトポロジカル秩序の族は、アーベル的な電気的励起と、イジング的な融合則に従う非アーベル的な磁気的励起を持ち、2+1dにおける二面体群$\mathbb{D}_8$ゲージ理論を一般化する。最も単純な例には、アーベル的なループ励起と非アーベル的な膜励起を持つ5+1dの新しい非アーベル自己訂正メモリが含まれる。我々はパイエルスの議論を用いて自己訂正性と熱的安定性を示し、確率的な局所セルオートマトン復号器を考案する。

We construct a family of infinitely many new candidate non-Abelian self-correcting topological quantum memories in $D\geq 5+1$ spacetime dimensions without particle excitations using local commuting non-Pauli stabilizer lattice models and field theories of $\mathbb{Z}_2^3$ higher-form gauge fields with nontrivial topological action. We call such non-Pauli stabilizer models magic stabilizer codes. The family of topological orders have Abelian electric excitations and non-Abelian magnetic excitations that obey Ising-like fusion rules, generalizing the dihedral group $\mathbb{D}_8$ gauge theory in 2+1d. The simplest example includes a new non-Abelian self-correcting memory in 5+1d with Abelian loop excitations and non-Abelian membrane excitations. We use a Peierls argument to demonstrate the self-correction property and the thermal stability, and devise a probabilistic local cellular-automaton decoder.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# UAV Swarmの軌道予測と最適化のためのAIアルゴリズム

AI Algorithm for Predicting and Optimizing Trajectory of UAV Swarm ( http://arxiv.org/abs/2405.11722v1 )

ライセンス: Link先を確認

Amit Raj, Kapil Ahuja, Yann Busnel,

(参考訳) 本稿では、無人航空機(UAV)群の軌道生成への人工知能(AI)技術の適用を検討する。扱う主な課題は、UAVの経路を正確に予測することと、UAV同士の衝突を効率的に回避することの2つである。第一に、隠れ層が1つのフィードフォワードニューラルネットワーク(FFNN)に多様な活性化関数を体系的に適用し、予測経路の精度を従来研究よりも向上させる。第二に、SwishとElliottの活性化関数を融合し、スケーリングおよびシフトしたガウス成分と統合した新しい活性化関数AdaptoSwelliGaussを導入する。 Swishは滑らかな遷移を促し、Elliottは急激な軌道変化を捉え、スケーリングおよびシフトしたガウス成分はノイズに対する頑健性を高める。この動的な組み合わせは、UAV軌道予測の複雑さを捉えることに特化して設計されている。この新しい活性化関数は、既存のすべての活性化関数よりも大幅に高い精度を与える。第三に、UAVの軌道を変更する手法と、出発時刻を変更する手法(バッチングとも呼ばれる)という、相補的な2つの衝突回避手法を統合した、新しい統合衝突検出・回避・バッチング(ICDAB)戦略を提案する。この統合は両者の欠点の克服に役立つ。すなわち、前者については軌道操作の回数が減って過度に入り組んだ経路を避けられ、後者についてはバッチサイズが小さくなって全体の離陸時間が短縮される。

This paper explores the application of Artificial Intelligence (AI) techniques for generating the trajectories of fleets of Unmanned Aerial Vehicles (UAVs). The two main challenges addressed include accurately predicting the paths of UAVs and efficiently avoiding collisions between them. Firstly, the paper systematically applies a diverse set of activation functions to a Feedforward Neural Network (FFNN) with a single hidden layer, which enhances the accuracy of the predicted path compared to previous work. Secondly, we introduce a novel activation function, AdaptoSwelliGauss, which is a sophisticated fusion of Swish and Elliott activations, seamlessly integrated with a scaled and shifted Gaussian component. Swish facilitates smooth transitions, Elliott captures abrupt trajectory changes, and the scaled and shifted Gaussian enhances robustness against noise. This dynamic combination is specifically designed to excel in capturing the complexities of UAV trajectory prediction. This new activation function gives substantially better accuracy than all existing activation functions. Thirdly, we propose a novel Integrated Collision Detection, Avoidance, and Batching (ICDAB) strategy that merges two complementary UAV collision avoidance techniques: changing UAV trajectories and altering their starting times, also referred to as batching. This integration helps overcome the disadvantages of both - reduction in the number of trajectory manipulations, which avoids overly convoluted paths in the first technique, and smaller batch sizes, which reduce overall takeoff time in the second.
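
The three ingredients named above have standard closed forms, sketched here for reference. The abstract does not say how AdaptoSwelliGauss combines them, so no combined function is given; the Gaussian's parameter names (`scale`, `shift`, `width`) are assumptions made for illustration.

```python
import math


def swish(x):
    """Swish: x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


def elliott(x):
    """Elliott activation: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


def scaled_shifted_gaussian(x, scale=1.0, shift=0.0, width=1.0):
    """scale * exp(-((x - shift) / width)**2 / 2); parameter names are assumed."""
    return scale * math.exp(-0.5 * ((x - shift) / width) ** 2)
```

Swish is smooth and unbounded above, Elliott saturates quickly, and the Gaussian term is localized around `shift`, which is consistent with the roles the abstract assigns to each component.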

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 一般高次元分類枠組みにおける非微分可能サロゲート損失の推論

Inference with non-differentiable surrogate loss in a general high-dimensional classification framework ( http://arxiv.org/abs/2405.11723v1 )

ライセンス: Link先を確認

Muxuan Liang, Yang Ning, Maureen A Smith, Ying-Qi Zhao,

(参考訳) サロゲート損失関数を用いた罰則付き経験リスク最小化は、分類問題において高次元の線形決定則を導くためにしばしば用いられる。文献の多くは汎化誤差に焦点を当てているが、特にサロゲート損失が微分不可能な場合、推定された決定則を左右する要因を特定するための妥当な推論手続きは不足している。本研究では、勾配が不連続でヘッセ行列が非正則である区分線形サロゲート損失を用いて推定した線形決定則に対して、仮説検定と区間推定を構成するためのカーネル平滑化非相関スコアを提案する。具体的には、カーネル近似を用いて不連続点付近の不連続な勾配を平滑化し、サロゲート損失の非正則なヘッセ行列を近似する。追加の局外パラメータが関わる応用に向けては、柔軟な局外パラメータ推定とカーネル近似に対応する新しいクロスフィット版を提案する。高次元の設定において、カーネル平滑化非相関スコアとそのクロスフィット版の極限分布を確立する。提案手法の妥当性と優位性を示すため、シミュレーションと実データ解析を行う。

Penalized empirical risk minimization with a surrogate loss function is often used to derive a high-dimensional linear decision rule in classification problems. Although much of the literature focuses on the generalization error, there is a lack of valid inference procedures to identify the driving factors of the estimated decision rule, especially when the surrogate loss is non-differentiable. In this work, we propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations for the linear decision rule estimated using a piece-wise linear surrogate loss, which has a discontinuous gradient and non-regular Hessian. Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points and approximate the non-regular Hessian of the surrogate loss. In applications where additional nuisance parameters are involved, we propose a novel cross-fitted version to accommodate flexible nuisance estimates and kernel approximations. We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup. Simulation and real data analysis are conducted to demonstrate the validity and superiority of the proposed method.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 大規模言語モデルのためのトークン単位の影響力のある訓練データの検索

Token-wise Influential Training Data Retrieval for Large Language Models ( http://arxiv.org/abs/2405.11724v1 )

ライセンス: Link先を確認

Huawei Lin, Jikai Long, Zhaozhuo Xu, Weijie Zhao,

(参考訳) 大規模言語モデル(LLM)の生成結果が与えられたとき、その生成につながった訓練データをどのように特定すればよいだろうか。本稿では、各訓練データの影響を推定するための、LLMに適応したスケーラブルなフレームワークRapidInを提案する。提案フレームワークは、キャッシュと検索の2段階からなる。まず、勾配ベクトルを200,000倍以上に圧縮し、ディスクやGPU/CPUメモリにキャッシュできるようにする。次に、生成結果が与えられると、RapidInはキャッシュされた勾配を効率的に走査して数分以内に影響を推定し、6,326倍以上の高速化を達成する。さらに、RapidInはマルチGPU並列化をサポートし、キャッシュと検索を大幅に高速化する。実験結果はRapidInの効率性と有効性を裏付けている。

Given a Large Language Model (LLM) generation, how can we identify which training data led to this generation? In this paper, we propose RapidIn, a scalable framework adapted to LLMs for estimating the influence of each training example. The proposed framework consists of two stages: caching and retrieval. First, we compress the gradient vectors by over 200,000x, allowing them to be cached on disk or in GPU/CPU memory. Then, given a generation, RapidIn efficiently traverses the cached gradients to estimate the influence within minutes, achieving over a 6,326x speedup. Moreover, RapidIn supports multi-GPU parallelization to substantially accelerate caching and retrieval. Our empirical results confirm the efficiency and effectiveness of RapidIn.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 強化学習を加速するハイウェイグラフ

Highway Graph to Accelerate Reinforcement Learning ( http://arxiv.org/abs/2405.11727v1 )

ライセンス: Link先を確認

Zidu Yin, Zhen Zhang, Dong Gong, Stefano V. Albrecht, Javen Q. Shi,

(参考訳) 強化学習(RL)アルゴリズムは訓練効率の低さに悩まされることが多い。この問題を緩和する一つの戦略は、モンテカルロ木探索(MCTS)や価値反復(VI)といったモデルベースの計画アルゴリズムを環境モデルに組み込むことである。 VIの主な制約は、大きなテンソルを反復処理する必要があることであり、依然として計算負荷が大きい。本稿では、価値学習プロセスの効率を改善することで、RLアルゴリズムの訓練効率を向上させることに注力する。離散的な状態空間と行動空間を持つ決定論的環境では、分岐のない遷移列は、中間状態で逸れることなくエージェントを移動させる。これをハイウェイと呼ぶ。このような分岐のないハイウェイ上では、価値を1ステップずつ反復更新する代わりに、価値更新を1ステップの処理にまとめることができる。この観察に基づき、状態遷移をモデル化する新しいグラフ構造であるハイウェイグラフを提案する。ハイウェイグラフは遷移モデルを簡潔なグラフに圧縮し、その辺は複数の状態遷移を表すことができるため、各反復で複数の時間ステップにまたがる価値の伝播が可能になる。これにより、ハイウェイグラフ上でVIアルゴリズムを実行することで、より効率的な価値学習手法が得られる。ハイウェイグラフを(モデルベースのオフポリシーRL手法として)RLに統合することで、初期段階(100万フレーム以内)のRL訓練を著しく加速できる。4種類の環境でさまざまなベースラインと比較した結果、提案手法は代表的および最新のモデルフリー・モデルベースRLアルゴリズムのいずれをも上回り、同等以上の期待リターンを維持しつつ10倍から150倍以上の効率を示した。さらに、ハイウェイグラフを用いてディープニューラルネットワークベースのエージェントを訓練することで、より良い汎化とストレージコストの低減が得られる。

Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environmental model. The major limitation of VI is the need to iterate over a large tensor. These still lead to intensive computations. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value learning process. For the deterministic environments with discrete state and action spaces, a non-branching sequence of transitions moves the agent without deviating from intermediate states, which we call a highway. On such non-branching highways, the value-updating process can be merged as a one-step process instead of iterating the value step-by-step. Based on this observation, we propose a novel graph structure, named highway graph, to model the state transition. Our highway graph compresses the transition model into a concise graph, where edges can represent multiple state transitions to support value propagation across multiple time steps in each iteration. We thus can obtain a more efficient value learning approach by facilitating the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), the RL training can be remarkably accelerated in the early stages (within 1 million frames). Comparison against various baselines on four categories of environments reveals that our method outperforms both representative and novel model-free and model-based RL algorithms, demonstrating 10 to more than 150 times more efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent is trained using the highway graph, resulting in better generalization and lower storage costs.
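
The compression step can be sketched on a toy deterministic transition table. This is a simplified reading of the idea, not the authors' construction: the data layout, the function names, and the rule used to detect intermediate states (exactly one incoming and one outgoing transition) are assumptions. A graph consisting only of a single closed loop has no junction and would need separate handling.

```python
def build_highway_graph(transitions, gamma):
    """Compress non-branching chains of a deterministic transition table.

    transitions: dict state -> list of (next_state, reward)
    Returns: dict junction -> list of (next_junction, discounted_reward, steps)
    """
    in_degree = {}
    for s, outs in transitions.items():
        in_degree.setdefault(s, 0)
        for nxt, _ in outs:
            in_degree[nxt] = in_degree.get(nxt, 0) + 1

    def is_intermediate(s):
        # Exactly one way in and one way out: an interior state of a highway.
        return in_degree.get(s, 0) == 1 and len(transitions.get(s, [])) == 1

    graph = {}
    for s, outs in transitions.items():
        if is_intermediate(s):
            continue
        edges = []
        for nxt, reward in outs:
            total, steps = reward, 1
            # Follow the chain, accumulating discounted reward along the way.
            while is_intermediate(nxt):
                nxt2, r2 = transitions[nxt][0]
                total += (gamma ** steps) * r2
                steps += 1
                nxt = nxt2
            edges.append((nxt, total, steps))
        graph[s] = edges
    return graph


def value_iteration(graph, gamma, sweeps=100):
    """Value iteration where each edge spans `steps` environment steps."""
    values = {}
    for s, edges in graph.items():
        values.setdefault(s, 0.0)
        for nxt, _, _ in edges:
            values.setdefault(nxt, 0.0)
    for _ in range(sweeps):
        for s, edges in graph.items():
            if edges:
                values[s] = max(r + (gamma ** k) * values[nxt] for nxt, r, k in edges)
    return values


# A -> B -> C -> D is a highway; A -> E is a one-step alternative.
toy = {
    "A": [("B", 0.0), ("E", 0.5)],
    "B": [("C", 0.0)],
    "C": [("D", 1.0)],
    "D": [],
    "E": [],
}
highway = build_highway_graph(toy, gamma=0.9)
values = value_iteration(highway, gamma=0.9)
```

In the toy example the three-step chain from `A` to `D` collapses into one edge, so the value of `A` is settled in a single backup instead of three.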

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 遺伝的アルゴリズムとシミュレーテッドアニーリングを用いたロジスティクスデポにおける作業者スケジューリングの最適化

Optimization of Worker Scheduling at Logistics Depots Using Genetic Algorithms and Simulated Annealing ( http://arxiv.org/abs/2405.11729v1 )

ライセンス: Link先を確認

Jinxin Xu, Haixin Wu, Yu Cheng, Liyang Wang, Xin Yang, Xintong Fu, Yuelong Su,

(参考訳) 本稿では,遺伝的アルゴリズムとシミュレーテッドアニーリングを組み合わせて,ロジスティクスデポにおける作業者のスケジューリングを最適化する問題を扱う。労働力の使用を最小化しつつロジスティクスデポの効率を最適化するためには、常勤および臨時の作業者の効率的なスケジューリングが不可欠である。本研究はまず0-1整数線形計画モデルを構築し、決定変数によって、ある日の各時間帯における常勤および臨時作業者のスケジュールを定める。目的関数は人日を最小化することであり、制約条件は、時間ごとの必要労働力を満たすこと、作業者を1日1つの時間帯に制限すること、常勤作業者の連続勤務日数に上限を設けること、非負制約および整数制約を維持することである。モデルは、遺伝的アルゴリズムとシミュレーテッドアニーリングを用いて解かれる。その結果,この問題では,解の品質の面で遺伝的アルゴリズムがシミュレーテッドアニーリングよりも優れていることが示された。最適解における最小値は29857人日である。

This paper addresses the optimization of scheduling for workers at a logistics depot using a combination of genetic algorithm and simulated annealing algorithm. The efficient scheduling of permanent and temporary workers is crucial for optimizing the efficiency of the logistics depot while minimizing labor usage. The study begins by establishing a 0-1 integer linear programming model, with decision variables determining the scheduling of permanent and temporary workers for each time slot on a given day. The objective function aims to minimize person-days, while constraints ensure fulfillment of hourly labor requirements, limit workers to one time slot per day, cap consecutive working days for permanent workers, and maintain non-negativity and integer constraints. The model is then solved using genetic algorithms and simulated annealing. Results indicate that, for this problem, genetic algorithms outperform simulated annealing in terms of solution quality. The optimal solution reveals a minimum of 29857 person-days.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 不合理性の度合い:センチメントとインプライド・ボラティリティ・サーフェス

Degree of Irrationality: Sentiment and Implied Volatility Surface ( http://arxiv.org/abs/2405.11730v1 )

ライセンス: Link先を確認

Jiahao Weng, Yan Xie,

(参考訳) 本研究では,日次の高頻度センチメントデータを構築し,VAR法を用いて翌日のインプライド・ボラティリティ・サーフェスの予測を試みた。 2014年から2023年にかけてEast Money Stock Forumから収集した63万件のテキストデータを利用し、BERTやLSTMといったディープラーニング手法を用いて日次の市場センチメント指標を構築した。 FFT法とEMD法を用いてセンチメントを分解したところ、高頻度のセンチメントはアット・ザ・マネー(ATM)オプションのインプライド・ボラティリティとより強く相関し、低頻度のセンチメントはディープ・アウト・オブ・ザ・マネー(DOTM)オプションのインプライド・ボラティリティとより強く相関していた。さらに分析したところ、インプライド・ボラティリティ・サーフェスの形状には、単なる市場のパニックにとどまらない、より豊かな市場センチメント情報が含まれていることがわかった。我々は,このセンチメント情報を組み込むことで,インプライド・ボラティリティ・サーフェスの予測精度を向上できることを実証した。

In this study, we constructed daily high-frequency sentiment data and used the VAR method to attempt to predict the next day's implied volatility surface. We utilized 630,000 text data entries from the East Money Stock Forum from 2014 to 2023 and employed deep learning methods such as BERT and LSTM to build daily market sentiment indicators. By applying FFT and EMD methods for sentiment decomposition, we found that high-frequency sentiment had a stronger correlation with at-the-money (ATM) options' implied volatility, while low-frequency sentiment was more strongly correlated with deep out-of-the-money (DOTM) options' implied volatility. Further analysis revealed that the shape of the implied volatility surface contains richer market sentiment information beyond just market panic. We demonstrated that incorporating this sentiment information can improve the accuracy of implied volatility surface predictions.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 放射線治療におけるリスク臓器輪郭描出の品質保証

Quality assurance of organs-at-risk delineation in radiotherapy ( http://arxiv.org/abs/2405.11732v1 )

ライセンス: Link先を確認

Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu,

(参考訳) 放射線治療計画において,腫瘍標的とリスク臓器(OAR)の輪郭描出は重要である。自動セグメンテーションは、医師の作業量を削減し、一貫性を向上させるために使用できる。しかし, 自動セグメンテーションの品質保証は, 臨床現場において依然として満たされていないニーズである。本研究で用いた患者データは,AAPM Thoracic Auto-Segmentation Challengeの標準化データセットである。対象としたOARは左右の肺,心臓,食道,脊髄であった。 OARの2つのグループを生成した。ベンチマークデータセットは経験豊富な医師が手動で輪郭を描出したものであり、テストデータセットはソフトウェアAccuContourを使って自動的に生成したものである。特徴抽出器としてResNet-152ネットワークを用い、1クラスサポートベクトル分類器によって品質の高低を判定した。モデル性能は,バランス精度,Fスコア,感度,特異度,および受信者動作特性(ROC)曲線下面積で評価した。提案手法の汎化性能を評価するために輪郭誤差をランダムに生成し,検出限界を探索し,検出限界と体積,Dice類似度係数,ハウスドルフ距離,平均表面距離などの様々な指標との相関を評価した。提案した1クラス分類器は、バランス精度やAUCなどの指標において優れていた。提案手法は,様々な種類の誤差の処理において,二値分類器よりも大幅に改善された。1クラス分類フレームワークに残差ネットワークとアテンション機構を導入した提案モデルは,様々な種類のOAR輪郭誤差を高精度に検出することができた。提案手法は,輪郭描出に対する医師のレビューの負担を大幅に軽減できる。

The delineation of tumor targets and organs-at-risk (OARs) is critical in radiotherapy treatment planning. Automatic segmentation can be used to reduce physician workload and improve consistency. However, quality assurance of automatic segmentation is still an unmet need in clinical practice. The patient data used in our study came from the standardized dataset of the AAPM Thoracic Auto-Segmentation Challenge. The OARs included were the left and right lungs, heart, esophagus, and spinal cord. Two groups of OAR contours were generated: the benchmark dataset, manually contoured by experienced physicians, and the test dataset, automatically created using the AccuContour software. A ResNet-152 network was used as the feature extractor, and a one-class support vector classifier was used to classify contours as high or low quality. We evaluated model performance with balanced accuracy, F-score, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). We randomly generated contour errors to assess the generalization of our method, explored the detection limit, and evaluated the correlations between the detection limit and various metrics such as volume, Dice similarity coefficient, Hausdorff distance, and mean surface distance. The proposed one-class classifier performed better on metrics such as balanced accuracy and AUC, and showed significant improvement over binary classifiers in handling various types of errors. Our proposed model, which introduces a residual network and an attention mechanism into the one-class classification framework, was able to detect the various types of OAR contour errors with high accuracy. The proposed method can significantly reduce the burden of physician review for contour delineation.
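Several of the evaluation metrics named in the abstract above have short closed forms. The sketch below computes sensitivity, specificity, balanced accuracy, and the Dice similarity coefficient in plain Python. It is illustrative only: the function names, the choice of label 1 for a "low-quality contour", and the representation of a contour mask as a set of voxel indices are assumptions, not details taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    """Fraction of positives that are detected: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """Fraction of negatives that are correctly passed: tn / (tn + fp)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return 0.5 * (sensitivity(y_true, y_pred) + specificity(y_true, y_pred))


def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A ∩ B| / (|A| + |B|) for masks given as sets of voxel indices."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))
```

Balanced accuracy is a natural choice when erroneous contours are rare, because plain accuracy would then be dominated by the majority class.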

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 合成フロケ格子上でのC = $\pm$2のチャーン絶縁体のシミュレーション

Simulating a Chern Insulator with C = $\pm$2 on Synthetic Floquet Lattice ( http://arxiv.org/abs/2405.11733v1 )

ライセンス: Link先を確認

Lingxiao Lei, Weichen Wang, Guangyao Huang, Shun Hu, Xi Cao, Xinfang Zhang, Mingtang Deng, Pingxing Chen,

(参考訳) 合成フロケ格子は、互いに非整合な周波数を持つ複数の強い駆動によって生成され、トポロジカル現象の量子シミュレーションのための強力なプラットフォームを提供する。本研究では,ハーフBHZ格子の2層を結合し,そのトポロジカルな性質をシミュレートするためにフロケ格子にマッピングすることで,チャーン数C=$\pm$2のチャーン絶縁体の4バンド強結合モデルを提案する。 Floquet版モデルのチャーン数を決定するため、Martin et al. [Phys. Rev. X 7, 041008 (2017)] が提案したエネルギーポンピング法と Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)] が導入したトポロジカル振動法を拡張し、両手法について数値シミュレーションを行った。シミュレーションの結果、これらの手法のいずれを用いてもチャーン数の抽出に成功し、元の2層ハーフBHZモデルから導出された理論的な相図と密に一致する相図の優れた予測が得られた。最後に,本モデルの実験的実装の可能性について簡単に議論する。我々の研究は、量子コンピューティングプラットフォームを用いて複雑なトポロジカル物質をシミュレートする大きな可能性を示し、それによって、相互作用のないトポロジカル量子状態のためのより普遍的なシミュレータを構築する道を開き、これらの興味深い現象の理解を深める。

The synthetic Floquet lattice, generated by multiple strong drives with mutually incommensurate frequencies, provides a powerful platform for the quantum simulation of topological phenomena. In this study, we propose a 4-band tight-binding model of the Chern insulator with a Chern number C = $\pm$2 by coupling two layers of the half-BHZ lattice and subsequently mapping it onto the Floquet lattice to simulate its topological properties. To determine the Chern number of our Floquet-version model, we extend the energy pumping method proposed by Martin et al. [Phys. Rev. X 7, 041008 (2017)] and the topological oscillation method introduced by Boyers et al. [Phys. Rev. Lett. 125, 160505 (2020)], followed by numerical simulations for both methodologies. The simulation results demonstrate the successful extraction of the Chern number using either of these methods, providing an excellent prediction of the phase diagram that closely aligns with the theoretical one derived from the original bilayer half-BHZ model. Finally, we briefly discuss a potential experimental implementation for our model. Our work demonstrates significant potential for simulating complex topological matter using quantum computing platforms, thereby paving the way for constructing a more universal simulator for non-interacting topological quantum states and advancing our understanding of these intriguing phenomena.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# 非接触ポリソムノグラフィー:電波が睡眠について教えてくれること

Contactless Polysomnography: What Radio Waves Tell Us about Sleep ( http://arxiv.org/abs/2405.11739v1 )

ライセンス: Link先を確認

Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi,

(参考訳) 睡眠中の人の身体から跳ね返る電波を分析するだけで、(身体に装着するセンサーなしに)自宅で睡眠を評価し、睡眠段階を捉え、無呼吸の発生を検出できる能力は非常に強力である。このような能力は、患者の自宅での縦断的データ収集を可能にし、臨床試験と日常診療の両方において、睡眠と様々な疾患との相互作用、およびその治療反応の理解に役立つ。本稿では、睡眠中の人から反射される電波から睡眠と夜間呼吸を受動的にモニタリングする高度な機械学習アルゴリズムを開発する。ゴールドスタンダード(ポリソムノグラフィー)との比較による検証結果(n=849)は、このモデルが睡眠ヒプノグラムを捉え(覚醒、浅い睡眠、深い睡眠、REMに分類される30秒エポックで精度81%)、睡眠時無呼吸を検出し(AUROC = 0.88)、患者の無呼吸低呼吸指数を測定できること(ICC=0.95; 95% CI = [0.93, 0.97])を示している。特に、このモデルは人種、性別、年齢にわたって公平な性能を示す。さらに、このモデルは睡眠段階と、神経疾患、精神疾患、心血管疾患、免疫疾患を含む様々な疾患との間の有益な相互作用を明らかにする。これらの知見は,臨床実践および介入試験にとって有望であるだけでなく,様々な疾患の理解と管理における基本的な要素としての睡眠の重要性も浮き彫りにしている。

The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therapeutic responses, both in clinical trials and routine care. In this article, we develop an advanced machine learning algorithm for passively monitoring sleep and nocturnal breathing from radio waves reflected off people while asleep. Validation results in comparison with the gold standard (i.e., polysomnography) (n=849) demonstrate that the model captures the sleep hypnogram (with an accuracy of 81% for 30-second epochs categorized into Wake, Light Sleep, Deep Sleep, or REM), detects sleep apnea (AUROC = 0.88), and measures the patient's Apnea-Hypopnea Index (ICC=0.95; 95% CI = [0.93, 0.97]). Notably, the model exhibits equitable performance across race, sex, and age. Moreover, the model uncovers informative interactions between sleep stages and a range of diseases including neurological, psychiatric, cardiovascular, and immunological disorders. These findings not only hold promise for clinical practice and interventional trials but also underscore the significance of sleep as a fundamental component in understanding and managing various diseases.

翻訳日:2024-05-21 14:33:17 公開日:2024-05-20

# サンプル効率強化学習のための合成観測による未来の学習表現

Learning Future Representation with Synthetic Observations for Sample-efficient Reinforcement Learning ( http://arxiv.org/abs/2405.11740v1 )

ライセンス: Link先を確認

Xin Liu, Yaran Chen, Dongbin Zhao,

(参考訳) 視覚強化学習(RL)では、上流の表現学習が下流の方策学習の効果を大きく左右する。補助タスクを利用することで、エージェントは狙いを定めて視覚表現を強化でき、下流RLのサンプル効率と性能を向上させることができる。従来の先進的な補助タスクはいずれも、それぞれ異なる補助目的を通じて、限られた経験(観測、行動、報酬を含む)からできるだけ多くの情報を抽出する方法に重点を置いているが、本稿ではまず、補助訓練データという別の視点から出発する。補助訓練データを充実させることでRLのための補助表現学習を改善することを試み、新しい自己教師型RLアプローチであるLearning Future representation with Synthetic observations (LFS)を提案する。具体的には、将来の情報を含む可能性のある観測を合成するための訓練不要な手法と、不適格な合成ノイズを除去するためのデータ選択手法を提案する。残った合成観測と実観測は、表現学習のためのクラスタリングに基づく時間的関連付けタスクを実現する補助データとして機能する。 LFSにより、エージェントはまだ現れていない観測に事前にアクセスして学習できるため、後にそれらが現れたときに素早く理解し活用できる。加えて、LFSは報酬や行動に依存しないため、最近の先進的な補助タスクよりも適用範囲が広い(例えばビデオからの学習)。広範な実験により、LFSが困難な連続制御において最先端のRLサンプル効率を示し、行動情報のないビデオデモンストレーションに基づく高度な視覚事前学習を可能にすることが実証された。

In visual Reinforcement Learning (RL), upstream representation learning largely determines the effect of downstream policy learning. Employing auxiliary tasks allows the agent to enhance visual representation in a targeted manner, thereby improving the sample efficiency and performance of downstream RL. Prior advanced auxiliary tasks all focus on how to extract as much information as possible from limited experience (including observations, actions, and rewards) through their different auxiliary objectives, whereas in this article, we first start from another perspective: auxiliary training data. We try to improve auxiliary representation learning for RL by enriching auxiliary training data, proposing Learning Future representation with Synthetic observations (LFS), a novel self-supervised RL approach. Specifically, we propose a training-free method to synthesize observations that may contain future information, as well as a data selection approach to eliminate unqualified synthetic noise. The remaining synthetic observations and real observations then serve as the auxiliary data to achieve a clustering-based temporal association task for representation learning. LFS allows the agent to access and learn observations that have not yet appeared in advance, so as to quickly understand and exploit them when they occur later. In addition, LFS does not rely on rewards or actions, which means it has a wider scope of application (e.g., learning from video) than recent advanced auxiliary tasks. Extensive experiments demonstrate that our LFS exhibits state-of-the-art RL sample efficiency on challenging continuous control and enables advanced visual pre-training based on action-free video demonstrations.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 対称測定に基づく正写像による量子エンタングルメントの推定

Quantum entanglement estimation via symmetric measurement based positive maps ( http://arxiv.org/abs/2405.11741v1 )

ライセンス: Link先を確認

Jiaxin Li, Hongmei Yao, Shao-Ming Fei, Zhaobing Fan, Haitao Ma,

(参考訳) 対称測定に基づく、正かつトレース保存の写像のクラスを与える。これらの正写像から、分離可能性基準、エンタングルメント・ウィットネス、およびコンカレンスの下界を示す。詳細な例により、我々の分離可能性基準、エンタングルメント・ウィットネス、下界が、関連する既存の結果よりも量子エンタングルメントをよりよく検出し、推定できることを示す。

We provide a class of positive and trace-preserving maps based on symmetric measurements. From these positive maps we present separability criteria, entanglement witnesses, as well as the lower bounds of concurrence. We show by detailed examples that our separability criteria, entanglement witnesses and lower bounds can detect and estimate the quantum entanglement better than the related existing results.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 構成的一般化の一般理論

A General Theory for Compositional Generalization ( http://arxiv.org/abs/2405.11743v1 )

ライセンス: Link先を確認

Jingwen Fu, Zhizheng Zhang, Yan Lu, Nanning Zheng,

(参考訳) 構成的一般化(CG)は、馴染みのある概念の新たな組み合わせを理解する能力を具現化したものであり、人間の知的進歩における重要な認知的飛躍を表している。その決定的な重要性にもかかわらず、ディープニューラルネットワーク(DNN)は構成的一般化問題への対処に課題を抱えており、かなりの研究関心を集めている。しかし、既存の理論はしばしばタスク固有の仮定に依存しており、CGの包括的な理解を制約している。本研究の目的は,タスク非依存の視点から構成的一般化を探求し,タスク固有の分析を補完する視点を提供することである。主な課題は、その範囲を過度に制限することなくCGを定義することであり、これはCGの基本的な特性を特定し、それに基づいて定義することで達成される。この定義を用いて、「CGの究極的な解決策はどのようなものか?」という問いに、以下の理論的知見を通して答えようとする。 1) CGにおける最初のNo Free Lunch定理。これは一般解が存在しないことを示す。 2) 任意のCG問題に適用可能な新しい汎化境界。これは有効なCGソリューションの条件を規定する。 3) CG問題とその解決策の理解を深めるための生成的効果の導入。本論文の意義は、CG問題に対する一般理論を提供することにあり、タスク固有のシナリオの下での従来の定理と組み合わせることで、CGの包括的な理解につながりうる。

Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, the deep neural network (DNN) faces challenges in addressing the compositional generalization problem, prompting considerable research interest. However, existing theories often rely on task-specific assumptions, constraining the comprehensive understanding of CG. This study aims to explore compositional generalization from a task-agnostic perspective, offering a complementary viewpoint to task-specific analyses. The primary challenge is to define CG without overly restricting its scope, a feat achieved by identifying its fundamental characteristics and basing the definition on them. Using this definition, we seek to answer the question "what does the ultimate solution to CG look like?" through the following theoretical findings: 1) the first No Free Lunch theorem in CG, indicating the absence of general solutions; 2) a novel generalization bound applicable to any CG problem, specifying the conditions for an effective CG solution; and 3) the introduction of the generative effect to enhance understanding of CG problems and their solutions. This paper's significance lies in providing a general theory for CG problems, which, when combined with prior theorems under task-specific scenarios, can lead to a comprehensive understanding of CG.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 構成可能なミラー降下法 : 意思決定の統一を目指して

Configurable Mirror Descent: Towards a Unification of Decision Making ( http://arxiv.org/abs/2405.11746v1 )

ライセンス: Link先を確認

Pengdeng Li, Shuxin Li, Chang Yang, Xinrun Wang, Shuyue Hu, Xiao Huang, Hau Chan, Bo An,

(参考訳) 意思決定問題は、単一エージェント(例: Atari)、協調型マルチエージェント(例: Hanabi)、競争型マルチエージェント(例: ホールデムポーカー)、協調・競争混合型(例: サッカー)に分類され、実世界のいたるところに存在する。特定の意思決定問題に対処するために様々な手法が提案されている。特定のカテゴリでの成功にもかかわらず、これらの手法は通常独立して発展しており、他のカテゴリに一般化できない。したがって、意思決定における根本的な問いは次のとおりである。すべてのカテゴリの意思決定問題に対処できる単一のアルゴリズムを開発できるか? この問いに取り組むには、いくつかの主要な課題がある。 i) 意思決定のカテゴリが異なれば、エージェントの数やエージェント間の関係も異なる。 ii) カテゴリが異なれば、解概念や評価尺度も異なる。 iii) すべてのカテゴリをカバーする包括的なベンチマークが存在しない。本研究は,3つの主要な貢献によってこの問いに取り組む予備的な試みを示す。 i) MDの諸変種を一般化した一般化ミラー降下法(GMD)を提案する。GMDは複数の過去の方策を考慮し、より広いクラスのブレグマンダイバージェンスを扱える。 ii) メタコントローラを導入し、評価尺度に応じてGMDのハイパーパラメータを動的に調整する構成可能なミラー降下法(CMD)を提案する。 iii) 異なる意思決定カテゴリにまたがる15の学術利用に適したゲームからなるGameBenchを構築した。広範な実験により、CMDはベースラインと比較して経験的に競争力のある、あるいはより良い結果を達成しつつ、意思決定の多様な側面を探索する能力を提供することが示された。

Decision-making problems, categorized as single-agent (e.g., Atari), cooperative multi-agent (e.g., Hanabi), competitive multi-agent (e.g., Hold'em poker), and mixed cooperative and competitive (e.g., football), are ubiquitous in the real world. Various methods have been proposed to address specific decision-making problems. Despite their successes in specific categories, these methods typically evolve independently and cannot generalize to other categories. Therefore, a fundamental question for decision-making is: Can we develop a single algorithm to tackle ALL categories of decision-making problems? There are several main challenges in addressing this question: i) different decision-making categories involve different numbers of agents and different relationships between agents, ii) different categories have different solution concepts and evaluation measures, and iii) a comprehensive benchmark covering all the categories is lacking. This work presents a preliminary attempt to address the question with three main contributions. i) We propose the generalized mirror descent (GMD), a generalization of MD variants, which considers multiple historical policies and works with a broader class of Bregman divergences. ii) We propose the configurable mirror descent (CMD), in which a meta-controller is introduced to dynamically adjust the hyper-parameters in GMD conditional on the evaluation measures. iii) We construct GameBench with 15 academic-friendly games across different decision-making categories. Extensive experiments demonstrate that CMD achieves empirically competitive or better outcomes compared to baselines while providing the capability of exploring diverse dimensions of decision making.
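For readers unfamiliar with the building block that GMD and CMD generalize, the sketch below shows one step of classical mirror descent on the probability simplex with the negative-entropy mirror map (KL divergence as the Bregman divergence), which reduces to a multiplicative-weights update. This is the textbook special case only, not GMD or CMD themselves; the function name `mirror_descent_step` and the fixed payoff vector are assumptions made for illustration.

```python
import math


def mirror_descent_step(policy, payoffs, step_size):
    """One entropic mirror-descent (ascent on payoff) step on the simplex.

    The new policy is proportional to policy[a] * exp(step_size * payoffs[a]).
    """
    weights = [p * math.exp(step_size * q) for p, q in zip(policy, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy single-agent example: three actions with fixed payoffs (hypothetical values).
example_policy = [1 / 3, 1 / 3, 1 / 3]
example_payoffs = [1.0, 0.0, 0.5]
for _ in range(50):
    example_policy = mirror_descent_step(example_policy, example_payoffs, step_size=0.5)
```

With a fixed payoff vector the policy concentrates on the best action. Choosing a different Bregman divergence changes the form of the update, which is the degree of freedom the abstract says GMD broadens.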

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 線形注意による文脈内学習の漸近理論

Asymptotic theory of in-context learning by linear attention ( http://arxiv.org/abs/2405.11751v1 )

ライセンス: Link先を確認

Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan,

(参考訳) トランスフォーマーは、明示的な事前訓練なしに、入力自体の中で与えられた例に基づいてタスクを学習し実行する顕著な能力を持つ。文脈内学習(ICL)として知られるこの能力はトランスフォーマーの成功の基盤であると論じられているが、ICLの成功に必要なサンプル複雑性、事前学習タスクの多様性、コンテキスト長についての問いは未解決のままである。ここでは、線形注意による線形回帰タスクのICLの厳密に解けるモデルにおいて、これらの問いに対する正確な答えを与える。トークン次元を無限大にとり、コンテキスト長と事前学習タスクの多様性がトークン次元に比例してスケールし、事前学習例の数が2次でスケールする、現象論的に豊かなスケーリング領域において、学習曲線の鋭い漸近挙動を導出する。事前学習例の増加に伴う二重降下型の学習曲線を示し、タスク多様性の低い領域と高い領域の間でモデルの挙動に相転移があることを明らかにする。低多様性領域では、モデルは訓練タスクを記憶する傾向があるのに対し、高多様性領域では、事前学習したタスクの範囲を超えた真の文脈内学習と汎化を達成する。これらの理論的洞察は、線形注意と完全な非線形トランスフォーマーアーキテクチャの両方を用いた実験によって実証的に検証される。

Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unresolved. Here, we provide a precise answer to these questions in an exactly solvable model of ICL of a linear regression task by linear attention. We derive sharp asymptotics for the learning curve in a phenomenologically-rich scaling regime where the token dimension is taken to infinity; the context length and pretraining task diversity scale proportionally with the token dimension; and the number of pretraining examples scales quadratically. We demonstrate a double-descent learning curve with increasing pretraining examples, and uncover a phase transition in the model's behavior between low and high task diversity regimes: In the low diversity regime, the model tends toward memorization of training tasks, whereas in the high diversity regime, it achieves genuine in-context learning and generalization beyond the scope of pretrained tasks. These theoretical insights are empirically validated through experiments with both linear attention and full nonlinear Transformer architectures.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 化学プロセスモデリングの基礎モデル:物理インフォームド適応によるメタラーニング

Foundation Model for Chemical Process Modeling: Meta-Learning with Physics-Informed Adaptation ( http://arxiv.org/abs/2405.11752v1 )

ライセンス: Link先を確認

Zihao Wang, Zhe Wu,

(参考訳) 本稿では,非線形化学プロセスモデリングの分野における基礎モデルの新たな応用を紹介する。現実世界の化学プロセスに対して正確な第一原理モデルを得ることの難しさと、新しい化学プロセスのたびにモデルを再構築・再訓練することの非効率性を踏まえ、我々は次の重要な問いを提起する。任意の新しい化学プロセスのモデリングに迅速に適応できる単一の汎用ニューラルネットワーク(すなわち基礎モデル)を開発できるとしたらどうだろうか? この問いに答えるため,Reptileを用いたメタラーニングに基づいて基礎モデルを構築し,続いて物理インフォームド適応により,少数のデータサンプルのみを用いて新しいモデリングタスクに微調整する手法を提案する。提案手法の有効性を評価するため, 連続撹拌槽型反応器(CSTR), 回分式反応器(BR), プラグフロー反応器(PFR)という3つの古典的な汎用反応器における様々な化学反応に対する基礎モデルを構築した。提案手法は,少数ショットの設定において,データ駆動学習,物理インフォームド学習,転移学習,純粋なメタラーニングといった従来の手法よりも優れている。さらに,提案手法は,指定されたタスクからの少数のデータサンプルのみを用いて,新しいCSTR,BR,PFRへの迅速な適応を実現する。ソースコードはhttps://github.com/killingbear999/chemical-process-foundation-modelで入手できる。

In this work, we introduce a novel application of foundation models in the domain of nonlinear chemical process modeling. Given the challenges of obtaining accurate first-principles models for real-world chemical processes and the inefficiency of rebuilding and retraining models for new chemical processes, we pose a pivotal question: What if we could develop a single, universal neural network (i.e., foundation model) capable of rapidly adapting to modeling any new chemical process? To address this question, we propose a meta-learning-based approach using Reptile to construct the foundation model, followed by physics-informed adaptation to fine-tune it to new modeling tasks using only a few data samples. To assess the effectiveness of our methodology, we construct a foundation model for various chemical reactions in three classical generic reactors, including continuous stirred tank reactors (CSTRs), batch reactors (BRs), and plug flow reactors (PFRs). Our approach outperforms conventional methods such as data-driven learning, physics-informed learning, transfer learning, and pure meta-learning in a few-shot setting. Furthermore, our method achieves rapid adaptation to new CSTRs, BRs, and PFRs using only a few data samples from the designated tasks. Source code is available at https://github.com/killingbear999/chemical-process-foundation-model.
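The Reptile meta-learning step mentioned in the abstract above has a very compact form: adapt a copy of the parameters to one task with a few gradient steps, then move the shared parameters a fraction of the way toward the adapted copy. The sketch below shows that update on a deliberately tiny problem (a scalar model `y = w * x` and tasks that differ only in their slope). It is not the authors' code, it omits the physics-informed adaptation stage entirely, and the function names, task family, and hyper-parameter values are assumptions made for illustration.

```python
def inner_sgd(weight, slope, inputs, learning_rate, steps):
    """Adapt the scalar model y = weight * x to the task y = slope * x.

    Runs plain gradient descent on the mean squared error over the inputs.
    """
    for _ in range(steps):
        grad = sum(2 * (weight * x - slope * x) * x for x in inputs) / len(inputs)
        weight -= learning_rate * grad
    return weight


def reptile(task_slopes, inputs, meta_rate=0.1, inner_rate=0.1,
            inner_steps=5, iterations=300):
    """Reptile: repeatedly nudge the shared weight toward task-adapted weights."""
    weight = 0.0
    for i in range(iterations):
        slope = task_slopes[i % len(task_slopes)]  # cycle through the tasks
        adapted = inner_sgd(weight, slope, inputs, inner_rate, inner_steps)
        weight += meta_rate * (adapted - weight)   # the Reptile meta-update
    return weight


# Toy task family (hypothetical): slopes 1, 2 and 3 evaluated on three inputs.
example_inputs = [-1.0, 0.0, 1.0]
meta_weight = reptile([1.0, 2.0, 3.0], example_inputs)
# Few-shot adaptation to an unseen task starts from the meta-learned weight.
adapted_weight = inner_sgd(meta_weight, 2.5, example_inputs, 0.1, 5)
```

The meta-learned weight settles near the middle of the task family, so a handful of gradient steps suffices to move it toward any individual task, which is the property few-shot adaptation relies on.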

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# サブサイクル運動におけるレーザー誘起電子位相の解析的制御

Analytically controlling laser-induced electron phase in sub-cycle motion ( http://arxiv.org/abs/2405.11753v1 )

ライセンス: Link先を確認

Doan-An Trieu, Trong-Thanh D. Nguyen, Thanh-Duy D. Nguyen, Thanh Tran, Van-Hoang Le, Ngoc-Loan Phan,

(参考訳) 強いレーザー場内でのサブサイクル運動中に蓄積される電子位相の精密制御は、強電場物理学において不可欠であるが、これまでのところほとんどが間接的かつ複雑なままである。本稿では、中心対称な気体標的が数サイクル赤外レーザーパルスと相互作用する際に、標的に印加する低周波電場を調整することで、このサブサイクル電子位相を制御する新しい手法を開発する。本手法は, 強電場近似によって導出された、低周波電場とそれが誘起する高調波周波数シフトとの間の普遍的な解析的関係に基づく。この単純な関係とその普遍性は、時間依存シュレーディンガー方程式を直接解くことによって数値的に確認される。さらに、XUV波の連続的かつ精密なチューニングや、THzパルスを包括的にサンプリングする新しい手法の開発を含む、in situ応用における本関係の利点について論じる。

Precise control of the electron phase accumulated during its sub-cycle motion within intense laser fields is essential in strong-field physics, yet remains mostly indirect and complicated so far. In this Letter, we develop a novel approach to control this sub-cycle electron phase by tuning a low-frequency electric field applied on a centrosymmetric gaseous target during its interaction with a few-cycle infrared laser pulse. Our method is based on a universal analytical relation between the low-frequency electric field and its induced harmonic frequency shift, derived by the strong-field approximation. This simple relation and its universality are confirmed numerically by directly solving the time-dependent Schrödinger equation. Moreover, we discuss the benefits of the discovered relation in in situ applications, including continuously and precisely tuning XUV waves and developing a new method of comprehensively sampling THz pulse.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# Versatile Teacher: クロスドメイン適応のためのクラス認識型教師学生フレームワーク

Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation ( http://arxiv.org/abs/2405.11754v1 )

ライセンス: Link先を確認

Runou Yang, Tian Tian, Jinwen Tian,

(参考訳) データセット間のドメインシフトという課題に対処することは、モデル性能を維持する上で不可欠である。クロスドメイン物体検出の文脈において、広く使われている半教師付きモデルである教師・生徒フレームワークは、大幅な精度向上を示している。しかし、既存の手法はしばしばクラス間の違いを見落とし、すべてのクラスを同等に扱うため、最適とは言えない結果となる。さらに、インスタンスレベルのアライメントと1段検出器との統合は、領域提案ネットワーク(RPN)が存在しないために不可欠であるにもかかわらず、このフレームワークでは未検討のままである。これらの欠点に対応するため,我々はVersatile Teacher (VT) という新しい教師・生徒モデルを導入する。 VTは、クラス固有の検出難易度を考慮し、Class-aware Pseudo-label Adaptive Selection (CAPS)と呼ぶ2段階の擬似ラベル選択機構を用いて、より信頼性の高い擬似ラベルを生成する点で従来研究と異なる。これらのラベルは、ターゲットを絞ったインスタンスレベルのアライメントのために識別器を導く顕著性行列として利用される。提案手法は,3つのベンチマークデータセットで有望な結果を示し,広く使われている1段検出器向けにアライメント手法を拡張しており,実用的な応用に大きな可能性を示す。コードはhttps://github.com/RicardooYoung/VersatileTeacherで入手できる。

Addressing the challenge of domain shift between datasets is vital in maintaining model performance. In the context of cross-domain object detection, the teacher-student framework, a widely-used semi-supervised model, has shown significant accuracy improvements. However, existing methods often overlook class differences, treating all classes equally, resulting in suboptimal results. Furthermore, the integration of instance-level alignment with a one-stage detector, essential due to the absence of a Region Proposal Network (RPN), remains unexplored in this framework. In response to these shortcomings, we introduce a novel teacher-student model named Versatile Teacher (VT). VT differs from previous works by considering class-specific detection difficulty and employing a two-step pseudo-label selection mechanism, referred to as Class-aware Pseudo-label Adaptive Selection (CAPS), to generate more reliable pseudo labels. These labels are leveraged as saliency matrices to guide the discriminator for targeted instance-level alignment. Our method demonstrates promising results on three benchmark datasets, and extends the alignment methods for widely-used one-stage detectors, presenting significant potential for practical applications. Code is available at https://github.com/RicardooYoung/VersatileTeacher.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# バイアスの消去:半教師付き学習のための基礎モデルのファインチューニング

Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning ( http://arxiv.org/abs/2405.11756v1 )

ライセンス: Link先を確認

Kai Gan, Tong Wei,

(参考訳) 半教師付き学習(SSL)は目覚ましい進歩を遂げ、数多くの手法のバリエーションが登場している。しかしながら、実践者がこれらの手法を導入しようとすると、性能が不十分なためにしばしば課題に直面する。本稿では,事前学習済みの基礎モデルを適応させることでこの制限に大きく対処する,FineSSLという新しいSSLアプローチを提案する。基礎モデルに内在する集約バイアスと認知偏差の問題を特定し、バランスド・マージン・ソフトマックスと分離型ラベルスムージングを課すことによる、シンプルかつ効果的な解決策を提案する。広範な実験を通じて、FineSSLが複数のベンチマークデータセットでSSLの新たな最高水準を達成し、訓練コストを6倍以上削減し、さまざまなファインチューニング手法や最新のSSLアルゴリズムをシームレスに統合できることを実証した。ソースコードはhttps://github.com/Gank0078/FineSSLで入手できる。

Semi-supervised learning (SSL) has witnessed remarkable progress, resulting in the emergence of numerous method variations. However, practitioners often encounter challenges when attempting to deploy these methods due to their subpar performance. In this paper, we present a novel SSL approach named FineSSL that significantly addresses this limitation by adapting pre-trained foundation models. We identify the aggregated biases and cognitive deviation problems inherent in foundation models, and propose a simple yet effective solution by imposing balanced margin softmax and decoupled label smoothing. Through extensive experiments, we demonstrate that FineSSL sets a new state of the art for SSL on multiple benchmark datasets, reduces the training cost by over six times, and can seamlessly integrate various fine-tuning and modern SSL algorithms. The source code is available at https://github.com/Gank0078/FineSSL.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# DLAFormer:ドキュメントレイアウト分析のためのエンドツーエンド変換器

DLAFormer: An End-to-End Transformer For Document Layout Analysis ( http://arxiv.org/abs/2405.11757v1 )

ライセンス: Link先を確認

Jiawei Wang, Kai Hu, Qiang Huo,

(参考訳) 文書レイアウト解析(DLA)は,文書の物理的レイアウトと論理構造を理解する上で重要であり,情報検索,文書要約,知識抽出などに役立つ。しかし、従来の研究では、表・図の検出、テキスト領域の検出、論理的役割の分類、読み順の予測など、DLA内の個々のサブタスクに対して別々のモデルを使うのが一般的であった。本研究では,これらすべてのサブタスクを1つのモデルに統合した、エンドツーエンドのトランスフォーマーベースの文書レイアウト解析手法DLAFormerを提案する。そのために,様々なDLAサブタスク(テキスト領域の検出,論理的役割の分類,読み順の予測など)を関係予測問題として扱い,これらの関係予測ラベルを統一ラベル空間に統合することで,統一された関係予測モジュールが複数のタスクを同時に処理できるようにする。さらに,DETRにおけるコンテンツクエリの物理的意味を高めるために,新しいタイプ別クエリのセットを導入する。加えて,グラフィカルなページオブジェクトを正確に識別するために粗から密への戦略を採用した。実験の結果,提案したDLAFormerは,DocLayNetとComp-HRDocの2つの文書レイアウト解析ベンチマークにおいて,複数のタスクにマルチブランチまたはマルチステージのアーキテクチャを採用する従来の手法よりも優れていた。

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically used separate models to address individual sub-tasks within DLA, including table/figure detection, text region detection, logical role classification, and reading order prediction. In this work, we propose an end-to-end transformer-based approach for document layout analysis, called DLAFormer, which integrates all these sub-tasks into a single model. To achieve this, we treat various DLA sub-tasks (such as text region detection, logical role classification, and reading order prediction) as relation prediction problems and consolidate these relation prediction labels into a unified label space, allowing a unified relation prediction module to handle multiple tasks concurrently. Additionally, we introduce a novel set of type-wise queries to enhance the physical meaning of content queries in DETR. Moreover, we adopt a coarse-to-fine strategy to accurately identify graphical page objects. Experimental results demonstrate that our proposed DLAFormer outperforms previous approaches that employ multi-branch or multi-stage architectures for multiple tasks on two document layout analysis benchmarks, DocLayNet and Comp-HRDoc.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# Fed-Credit: 信頼性管理を備えたロバストなフェデレーション学習

Fed-Credit: Robust Federated Learning with Credibility Management ( http://arxiv.org/abs/2405.11758v1 )

ライセンス: Link先を確認

Jiayan Chen, Zhirong Qian, Tianhui Meng, Xitong Gao, Tian Wang, Weijia Jia,

(参考訳) プライバシ保護を目的としたフェデレーション学習(FL)は、分散したデバイスやデータソース上でのモデル訓練を可能にする新興の機械学習アプローチである。 FLの学習メカニズムは、個々のクライアントからのパラメータ更新の集約に依存している。しかし、このプロセスは悪意のあるデバイスの存在により、潜在的なセキュリティリスクをもたらす可能性がある。既存のソリューションは、計算集約的な技術を使うためにコストが高いか、攻撃者の数や攻撃方法に関する事前知識といった強い仮定のために制約が大きい。プライバシ制約と不確実な攻撃シナリオの両方を考慮する手法はほとんどない。本稿では,信頼性管理方式に基づく堅牢なFL手法であるFed-Creditを提案する。従来の研究とは異なり、我々のアプローチはノードやデータ分布に関する事前知識を必要としない。ローカルモデルとグローバルモデルとの類似性に基づいて過去のクライアントの貢献を重み付けする信頼性セットを維持・利用し、グローバルモデルの更新を調整する。 Fed-Creditの巧妙な点は、時間減衰と態度値係数が評判重みの動的調整に組み込まれていることであり、計算複雑性はO(n)(nはクライアント数)である。 5種類の攻撃の下でMNISTとCIFAR-10データセットについて広範な実験を行った。その結果、比較的低い計算複雑性を維持しながら、敵対的攻撃に対する優れた精度とレジリエンスが示された。中でも,非IIDのCIFAR-10データセットでは,2種類のデータポイズニング攻撃に対して,最先端アルゴリズムと比較してそれぞれ19.5%,14.5%の性能向上を示した。

Aiming at privacy preservation, Federated Learning (FL) is an emerging machine learning approach enabling model training on decentralized devices or data sources. The learning mechanism of FL relies on aggregating parameter updates from individual clients. However, this process may pose a potential security risk due to the presence of malicious devices. Existing solutions are either costly due to the use of compute-intensive technology, or restrictive for reasons of strong assumptions such as the prior knowledge of the number of attackers and how they attack. Few methods consider both privacy constraints and uncertain attack scenarios. In this paper, we propose a robust FL approach based on the credibility management scheme, called Fed-Credit. Unlike previous studies, our approach does not require prior knowledge of the nodes and the data distribution. It maintains and employs a credibility set, which weighs the historical clients' contributions based on the similarity between the local models and global model, to adjust the global model update. The subtlety of Fed-Credit is that the time decay and attitudinal value factor are incorporated into the dynamic adjustment of the reputation weights and it boasts a computational complexity of O(n) (n is the number of the clients). We conducted extensive experiments on the MNIST and CIFAR-10 datasets under 5 types of attacks. The results exhibit superior accuracy and resilience against adversarial attacks, all while maintaining comparatively low computational complexity. Among these, on the Non-IID CIFAR-10 dataset, our algorithm exhibited performance enhancements of 19.5% and 14.5%, respectively, in comparison to the state-of-the-art algorithm when dealing with two types of data poisoning attacks.
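The credibility idea in this abstract can be illustrated with a small sketch. This is not the authors' Fed-Credit algorithm: the clipping of negative similarities, the exponential `decay` blend, and the `reference` direction are all assumptions made for illustration. It only shows how similarity-based weights with a simple time decay can suppress a persistently deviating client at a cost linear in the number of clients per round.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def update_credibility(credibility, local_updates, reference, decay=0.5):
    """Blend each client's past credibility with its current similarity score.

    `decay` is a hypothetical parameter: the weight given to history.
    One pass over the clients, so the cost is linear in their number.
    """
    new_credibility = []
    for cred, update in zip(credibility, local_updates):
        score = max(0.0, cosine_similarity(update, reference))  # clip negatives
        new_credibility.append(decay * cred + (1.0 - decay) * score)
    return new_credibility


def aggregate(local_updates, credibility):
    """Credibility-weighted average of the client updates."""
    total = sum(credibility)
    if total == 0.0:
        weights = [1.0 / len(local_updates)] * len(local_updates)
    else:
        weights = [c / total for c in credibility]
    dim = len(local_updates[0])
    return [sum(w * u[i] for w, u in zip(weights, local_updates)) for i in range(dim)]


# Toy run: two honest clients and one client sending a flipped, scaled update.
updates = [[1.0, 1.0], [0.9, 1.1], [-5.0, -5.0]]
reference = [1.0, 1.0]  # hypothetical stand-in for the global model direction
credibility = [1.0, 1.0, 1.0]
for _ in range(10):
    credibility = update_credibility(credibility, updates, reference)

robust = aggregate(updates, credibility)
naive = [sum(u[i] for u in updates) / len(updates) for i in range(2)]
print("credibility:", credibility)
print("weighted:", robust, "plain mean:", naive)
```

In this toy run the plain mean is dragged toward the deviating client, while the weighted average stays close to the honest updates once that client's credibility has decayed.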

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 実用的なマッハ・ツェンダー干渉計を用いた差動位相シフトQKD

Differential-phase-shift QKD with practical Mach-Zehnder interferometer ( http://arxiv.org/abs/2405.11760v1 )

ライセンス: Link先を確認

Akihiro Mizutani, Masanori Terashita, Junya Matsubayashi, Shogo Mori, Ibuki Matsukura, Suzuna Tagawa, Kiyoshi Tamaki,

(参考訳) 差動位相シフト(DPS)量子鍵配送は、コヒーレントパルス列と受動測定ユニットで実現できる単純な実装のため、有望なプロトコルである。さらに、このプロトコルは光源の不完全性に対して堅牢であるという利点がある。しかし、測定ユニットに関しては、既存のセキュリティ証明が非現実的な仮定を置いており、実際の実装ではセキュリティの抜け穴になる可能性がある。本稿では、測定ユニットの主要な不完全性を取り込むことにより、DPSプロトコルの実装セキュリティを強化する。具体的には、既存のセキュリティ証明で仮定されていた透過率が正確に$50\%$のビームスプリッタではなく、透過率の範囲が既知である実用的なビームスプリッタを使用できる。数値シミュレーションにより、透過率が理想値から$\pm0.5\%$変動する場合でも、鍵レートは0.57倍に低下するにとどまることが示された。この結果は、実用的な測定装置を用いたDPSプロトコルの実現可能性を示すものである。

Differential-phase-shift (DPS) quantum key distribution stands as a promising protocol due to its simple implementation, which can be realized with a train of coherent pulses and a passive measurement unit. Besides, this protocol has the advantage of being robust against imperfections in the light source. Unfortunately, however, as for the measurement unit, existing security proofs put unrealistic assumptions on it, which could be security loopholes in actual implementations. In this paper, we enhance the implementation security of the DPS protocol by incorporating a major imperfection in the measurement unit. Specifically, our proof enables us to employ practical beam splitters with a known range of the transmittance rather than the one with exactly $50\%$, as was assumed in the existing security proofs. Our numerical simulations demonstrate that even with fluctuations of $\pm0.5\%$ in the transmittance from the ideal value, the key rate degrades only by a factor of 0.57. This result highlights the feasibility of the DPS protocol with practical measurement setups.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 地すべりサセプティビリティマッピングにおける解釈可能性の不確実性:統計的・機械学習・ディープラーニングモデルの比較分析

Uncertainty of interpretability in Landslide Susceptibility Mapping: A Comparative Analysis of Statistical, Machine Learning, and Deep Learning Models ( http://arxiv.org/abs/2405.11762v1 )

ライセンス: Link先を確認

Cheng Chen, Lei Fan,

(参考訳) 地すべり感受性マッピング(LSM)は、高リスク領域の特定と予防戦略の策定に不可欠である。本研究では、地すべりの感受性予測における統計的モデル、機械学習(ML)モデル、深層学習(DL)モデルの解釈可能性について検討する。これは、関連する様々な解釈手法と、2種類の入力因子、すなわち地すべりに統計的に関係のある19の寄与因子の包括的セットと、地すべりの誘発に直接関連する9の誘発因子の専用セットを組み込むことによって達成される。モデル性能がLSMの重要な指標であることを考えると、解釈可能性に関する調査は、検討したモデル間でのLSM精度の評価と比較を自然に伴う。本研究では、畳み込みニューラルネットワークモデルが最も高い精度(19因子で0.8447、9因子で0.8048)を達成し、Extreme Gradient BoostingとSupport Vector Machineも従来の統計モデルを上回る高い予測能力を示した。これらの結果から、DLアルゴリズムと高度なMLアルゴリズムは、入力因子と地すべりの発生との複雑な関係を効果的に捉えられることがわかった。しかし、予測の解釈可能性はモデルによって異なり、特に19の寄与因子の広いセットを用いる場合に顕著であった。 SHAP、LIME、DeepLIFTといった説明手法の違いも解釈結果のばらつきをもたらした。 19の寄与因子の包括的セットを用いると予測精度は向上したが、モデル解釈に複雑さと不整合が生じた。 9つの誘発因子の専用セットに絞ると、予測力はいくらか犠牲になるが、モデル間でより一貫した重要因子が特定され、現地調査報告の結果とも一致したことから、解釈可能性は向上した。

Landslide susceptibility mapping (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set of 19 contributing factors that are statistically relevant to landslides, as well as a dedicated set of 9 triggering factors directly associated with triggering landslides. Given that model performance is a crucial metric in LSM, our investigations into interpretability naturally involve assessing and comparing LSM accuracy across different models considered. In our investigation, the convolutional neural network model achieved the highest accuracy (0.8447 with 19 factors; 0.8048 with 9 factors), while Extreme Gradient Boosting and Support Vector Machine also demonstrated strong predictive capabilities, outperforming conventional statistical models. These findings indicate that DL and sophisticated ML algorithms can effectively capture the complex relationships between input factors and landslide occurrence. However, the interpretability of predictions varied among different models, particularly when using the broader set of 19 contributing factors. Explanation methods like SHAP, LIME, and DeepLIFT also led to variations in interpretation results. Using a comprehensive set of 19 contributing factors improved prediction accuracy but introduced complexities and inconsistency in model interpretations. Focusing on a dedicated set of 9 triggering factors sacrificed some predictive power but enhanced interpretability, as evidenced by more consistent key factors identified across various models and alignment with the findings of field investigation reports....

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# DATR:データセットレベル適応とプロトタイプアライメントを用いた教師なしドメイン適応検出変換器

DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment ( http://arxiv.org/abs/2405.11765v1 )

ライセンス: Link先を確認

Jianhong Han, Liang Chen, Yupei Wang,

(参考訳) オブジェクト検出器は、収集されたデータ(ソースドメイン)と現実世界のアプリケーション(ターゲットドメイン)のデータの間のドメインギャップに直面すると、しばしば大きなパフォーマンス劣化に直面する。この課題に対処するために、慎重に設計された特徴アライメント技術を利用して、多くの教師なしドメイン適応検出器が提案されている。しかし、これらのテクニックは主にクラスに依存しない方法でインスタンスレベルの特徴を整合させ、異なるカテゴリから抽出された特徴の違いを見落としているため、改善は限られている。さらに、現在のアライメントモジュールの範囲は、しばしば限られた画像のバッチに制限され、データセットレベルの手がかり全体を学ぶことができず、これにより検出器のターゲットドメインへの一般化能力を厳しく制限する。そこで本研究では、オブジェクト検出の教師なしドメイン適応のために、DATR(Domain Adaptive detection TRansformer)と呼ばれるDETRベースの強力な検出器を導入する。まず、オブジェクト検出タスクとドメイン適応タスクのギャップを埋めることにより、ドメイン間の特徴をクラスを考慮した形で効果的に整合させるクラスワイズプロトタイプアライメント(CPA)モジュールを提案する。次に、設計されたデータセットレベルのアライメントスキーム(DAS)は、コントラスト学習を活用することで、検出器がグローバルな表現を獲得するよう明示的に導き、両ドメインにまたがるデータセット全体にわたってインスタンスレベルの特徴のクラス間識別性を向上する。さらに、DATRは平均教師ベースの自己学習フレームワークを取り入れ、教師モデルによって生成された擬似ラベルを利用してドメインバイアスをさらに緩和する。広範な実験結果により、複数のドメイン適応シナリオにおいて、提案したDATRが優れた性能と一般化能力を持つことが示された。コードはhttps://github.com/h751410234/DATRで公開されている。

Object detectors frequently encounter significant performance degradation when confronted with domain gaps between collected data (source domain) and data from real-world applications (target domain). To address this task, numerous unsupervised domain adaptive detectors have been proposed, leveraging carefully designed feature alignment techniques. However, these techniques primarily align instance-level features in a class-agnostic manner, overlooking the differences between extracted features from different categories, which results in only limited improvement. Furthermore, the scope of current alignment modules is often restricted to a limited batch of images, failing to learn the entire dataset-level cues, thereby severely constraining the detector's generalization ability to the target domain. To this end, we introduce a strong DETR-based detector named Domain Adaptive detection TRansformer (DATR) for unsupervised domain adaptation of object detection. Firstly, we propose the Class-wise Prototypes Alignment (CPA) module, which effectively aligns cross-domain features in a class-aware manner by bridging the gap between object detection task and domain adaptation task. Then, the designed Dataset-level Alignment Scheme (DAS) explicitly guides the detector to achieve global representation and enhance inter-class distinguishability of instance-level features across the entire dataset, which spans both domains, by leveraging contrastive learning. Moreover, DATR incorporates a mean-teacher based self-training framework, utilizing pseudo-labels generated by the teacher model to further mitigate domain bias. Extensive experimental results demonstrate superior performance and generalization capabilities of our proposed DATR in multiple domain adaptation scenarios. Code is released at https://github.com/h751410234/DATR.

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# SHAPスコアから特徴重要度スコアへ

From SHAP Scores to Feature Importance Scores ( http://arxiv.org/abs/2405.11766v1 )

ライセンス: Link先を確認

Olivier Letoffe, Xuanxiang Huang, Nicholas Asher, Joao Marques-Silva,

(参考訳) eXplainable Artificial Intelligence(XAI)の中心的な目標は、ある予測が与えられたときに、機械学習(ML)モデルの特徴に相対的な重要度を割り当てることである。特徴帰属によるこの説明可能性タスクの重要性は、SHAPやLIMEといったツールが近年広く利用されていることからも明らかである。残念なことに、SHAPとLIMEの根底にあるゲーム理論的基礎を用いた特徴帰属の厳密な計算は、明らかに不満足な結果をもたらす可能性があり、これは誤解を招く相対的な特徴重要度を報告することに等しい。最近の研究は、特徴選択による説明の論理ベースの定義に基づく特徴の公理的集約を研究することによって、厳密な特徴帰属を目標にしている。本稿は、特徴帰属と事前(a priori)投票力の間に本質的な関係があること、そして最近提案された公理的集約が、過去に研究されてきた一連のパワー指標のうちのいくつかの具体例にあたることを示す。さらに、最も広く使われているパワー指標のいくつかを特徴重要度スコア(FIS)としてどのように活用できるのか、すなわちXAIにおいてパワー指標をどう用いるのか、また、不満足とみなされうる結果を生まないという観点から、これらの指標のうちどれが特徴帰属によるXAIの目的に最も適しているのかは、依然として明らかでない。本稿では、FISが示すべき新たな望ましい特性を提案する。また、提案する特性を備えた新しいFISも提案する。最後に、提案した特性の観点から、最もよく知られたパワー指標の厳密な分析を行う。

A central goal of eXplainable Artificial Intelligence (XAI) is to assign relative importance to the features of a Machine Learning (ML) model given some prediction. The importance of this task of explainability by feature attribution is illustrated by the ubiquitous recent use of tools such as SHAP and LIME. Unfortunately, the exact computation of feature attributions, using the game-theoretical foundation underlying SHAP and LIME, can yield manifestly unsatisfactory results that are tantamount to reporting misleading relative feature importance. Recent work targeted rigorous feature attribution by studying axiomatic aggregations of features based on logic-based definitions of explanations by feature selection. This paper shows that there is an essential relationship between feature attribution and a priori voting power, and that those recently proposed axiomatic aggregations represent a few instantiations of the range of power indices studied in the past. Furthermore, it remains unclear how some of the most widely used power indices might be exploited as feature importance scores (FISs), i.e. the use of power indices in XAI, and which of these indices would be the best suited for the purposes of XAI by feature attribution, namely in terms of not producing results that could be deemed as unsatisfactory. This paper proposes novel desirable properties that FISs should exhibit. In addition, the paper also proposes novel FISs exhibiting the proposed properties. Finally, the paper conducts a rigorous analysis of the best-known power indices in terms of the proposed properties.
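To make the link between attribution and voting power concrete, here is a minimal sketch that computes two classical power indices, the Shapley value (Shapley-Shubik index) and the raw Banzhaf index, for a toy weighted voting game. The game, its weights, and its quota are invented for illustration; treating players as features is the analogy the abstract draws, not a reproduction of the paper's proposed scores.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for a characteristic function over n players."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            # Probability that exactly this coalition of size k precedes player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def banzhaf_values(n, value):
    """Raw Banzhaf indices: average marginal contribution over all coalitions."""
    beta = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                beta[i] += value(s | {i}) - value(s)
        beta[i] /= 2 ** (n - 1)
    return beta


# Hypothetical weighted voting game: a coalition wins if its weight reaches the quota.
weights = [2, 1, 1]
quota = 3


def wins(coalition):
    return 1.0 if sum(weights[p] for p in coalition) >= quota else 0.0


shapley = shapley_values(len(weights), wins)
banzhaf = banzhaf_values(len(weights), wins)
print("Shapley:", shapley)
print("Banzhaf:", banzhaf)
```

Both indices rank the players identically here but assign different magnitudes, which is the kind of difference that matters when such an index is reused as a feature importance score.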

翻訳日:2024-05-21 14:23:32 公開日:2024-05-20

# 話者匿名化データを用いた多話者テキスト音声訓練

Multi-speaker Text-to-speech Training with Speaker Anonymized Data ( http://arxiv.org/abs/2405.11767v1 )

ライセンス: Link先を確認

Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda,

(参考訳) 音声生成モデルのスケールアップの傾向は、トレーニングデータに含まれる声の持ち主に関する生体情報漏洩の脅威となり、プライバシとセキュリティ上の懸念を高めている。本稿では、他の属性を維持しつつ入力音声の話者性を隠蔽するプロセスである話者匿名化(SA)を施したデータを用いて、マルチ話者テキスト音声合成(TTS)モデルの訓練を検討する。 2つの信号処理ベースと3つのディープニューラルネットワークベースのSA手法を用いてマルチ話者TTSデータセットであるVCTKを匿名化し、そのデータでエンドツーエンドのTTSモデルであるVITSを訓練して、テスト時に未知話者のTTSを実行させた。我々は、匿名化されたトレーニングデータと、これらのデータを用いて訓練された下流のTTSモデルの性能を評価するために、広範な客観的および主観的実験を行った。重要なことに、データ駆動型の主観評価予測モデルであるUTMOSと、声の識別性の利得を測定する指標であるGVDが、下流のTTS性能のよい指標であることが判明した。我々は、将来の研究者がマルチ話者TTSトレーニングにおけるSAシステムの良否を判断する助けとなることを期待して、得られた知見を要約する。

The trend of scaling up speech generation models poses a threat of biometric information leakage of the identities of the voices in the training data, raising privacy and security concerns. In this paper, we investigate training multi-speaker text-to-speech (TTS) models using data that underwent speaker anonymization (SA), a process that tends to hide the speaker identity of the input speech while maintaining other attributes. Two signal processing-based and three deep neural network-based SA methods were used to anonymize VCTK, a multi-speaker TTS dataset, which is further used to train an end-to-end TTS model, VITS, to perform unseen speaker TTS during the testing phase. We conducted extensive objective and subjective experiments to evaluate the anonymized training data, as well as the performance of the downstream TTS model trained using those data. Importantly, we found that UTMOS, a data-driven subjective rating predictor model, and GVD, a metric that measures the gain of voice distinctiveness, are good indicators of the downstream TTS performance. We summarize insights in the hope of helping future researchers determine the goodness of the SA system for multi-speaker TTS training.

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# 全フォトニック量子リピータの高性能化

An Improved Design for All-Photonic Quantum Repeaters ( http://arxiv.org/abs/2405.11768v1 )

ライセンス: Link先を確認

Ashlesha Patil, Saikat Guha,

(参考訳) 全フォトニック量子リピータは、物質ベースの量子メモリの代わりに、リピータグラフ状態(RGS)と呼ばれるマルチキュービットフォトニックグラフ状態を使用して、主に損失エラーに対する保護を行っている。 RGSは、リピータにおける誤り訂正のための木グラフ符号化論理量子ビットと、隣接するリピータ間の絡み合いを生成する物理リンク量子ビットから構成される。 RGSを生成する方法は2つある。1つは、単一光子を入力とする多重化された確率的線形光学回路で準備した小さなエンタングル状態を、線形光学ベル状態測定(融合)によって確率的につなぎ合わせる方法であり、もう1つは、量子論理演算が可能な少数の固体エミッタを用いて直接かつ決定論的に準備する方法である。融合による資源オーバーヘッドと量子エミッタ系の回路深さは、いずれもRGSのサイズとともに増大する。そのため、資源効率の高いRGSの設計が不可欠である。本稿では、従来知られているRGSよりも少ない量子ビットで、全フォトニック量子リピータにおいてより高い絡み合い率を実現する新しいRGS設計を提案する。これは、木符号化されたリンク量子ビットによって隣接するリピータを絡み合わせる確率を高めることで実現する。また、損失のみの誤りに対して、リンク量子ビット上で論理的BSMを実行するための新しい適応スキームを提案する。適応的BSMは、量子ビットの損失確率が均一である場合、木符号上の論理的BSMの従来スキームよりも優れる。これにより、リンク量子ビット上で論理的BSMを実行するのに必要な光学モードの数を削減し、絡み合い率をさらに向上する。

All-photonic quantum repeaters use multi-qubit photonic graph states, called repeater graph states (RGS), instead of matter-based quantum memories, for protection against predominantly loss errors. The RGS comprises tree-graph-encoded logical qubits for error correction at the repeaters and physical link qubits to create entanglement between neighboring repeaters. The two methods to generate the RGS are probabilistic stitching -- using linear optical Bell state measurements (fusion) -- of small entangled states prepared via multiplexed-probabilistic linear optical circuits fed with single photons, and a direct deterministic preparation using a small number of quantum-logic-capable solid-state emitters. The resource overhead due to fusions and the circuit depth of the quantum emitter system both increase with the size of the RGS. Therefore, engineering a resource-efficient RGS is crucial. We propose a new RGS design, which achieves a higher entanglement rate for all-photonic quantum repeaters using fewer qubits than the previously known RGS would. We accomplish this by boosting the probability of entangling neighboring repeaters with tree-encoded link qubits. We also propose a new adaptive scheme to perform logical BSM on the link qubits for loss-only errors. The adaptive BSM outperforms the previous schemes for logical BSM on tree codes when the qubit loss probability is uniform. It reduces the number of optical modes required to perform logical BSM on link qubits to improve the entanglement rate further.

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# Uni-Mol Docking V2: Realistic and accurate Binding Pose Predictionを目指して

Uni-Mol Docking V2: Towards Realistic and Accurate Binding Pose Prediction ( http://arxiv.org/abs/2405.11769v1 )

ライセンス: Link先を確認

Eric Alcaide, Zhifeng Gao, Guolin Ke, Yaqi Li, Linfeng Zhang, Hang Zheng, Gengmo Zhou,

(参考訳) 近年、機械学習(ML)手法は分子ドッキングの有望な代替手段として登場し、法外な計算コストをかけずに高い精度を実現できる可能性を示している。しかし、最近の研究では、これらのMLモデルは、問題に固有の物理的制約を無視しながら、定量的指標に過剰適合する可能性があることが示されている。本研究では、性能を大幅に改善したUni-Mol Docking V2を提示する。PoseBustersベンチマークにおいて、リガンドの77%以上についてRMSD 2.0 Å未満で結合ポーズを正確に予測し、75%以上がすべての品質チェックを通過した。これは、以前のUni-Mol Dockingモデルが達成した62%からの大幅な向上である。特に、我々のUni-Mol Dockingアプローチは化学的に正確な予測を生成し、過去のMLモデルを悩ませてきたキラリティ反転や立体衝突などの問題を回避している。さらに、Uni-Mol DockingをUni-Dockのようなより物理ベースの手法と組み合わせた場合、高品質な予測(RMSD 1.0 Å未満および1.5 Å未満)と物理的妥当性の面で性能の向上が観察された。本研究の結果は、仮想スクリーニングや薬物設計における産業応用に適した、リガンドドッキングへの総合的なアプローチを採用することにより、人工知能の科学研究への応用における大きな進歩を示すものである。 Uni-Mol Dockingのコード、データ、サービスはhttps://github.com/dptech-corp/Uni-Molで公開されている。

In recent years, machine learning (ML) methods have emerged as promising alternatives for molecular docking, offering the potential for high accuracy without incurring prohibitive computational costs. However, recent studies have indicated that these ML models may overfit to quantitative metrics while neglecting the physical constraints inherent in the problem. In this work, we present Uni-Mol Docking V2, which demonstrates a remarkable improvement in performance, accurately predicting the binding poses of 77+% of ligands in the PoseBusters benchmark with an RMSD value of less than 2.0 Å, and 75+% passing all quality checks. This represents a significant increase from the 62% achieved by the previous Uni-Mol Docking model. Notably, our Uni-Mol Docking approach generates chemically accurate predictions, circumventing issues such as chirality inversions and steric clashes that have plagued previous ML models. Furthermore, we observe enhanced performance in terms of high-quality predictions (RMSD values of less than 1.0 Å and 1.5 Å) and physical soundness when Uni-Mol Docking is combined with more physics-based methods like Uni-Dock. Our results represent a significant advancement in the application of artificial intelligence for scientific research, adopting a holistic approach to ligand docking that is well-suited for industrial applications in virtual screening and drug design. The code, data and service for Uni-Mol Docking are publicly available for use and further development at https://github.com/dptech-corp/Uni-Mol.

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# 少数ショット物体カウントのための空間的類似度分布の学習

Learning Spatial Similarity Distribution for Few-shot Object Counting ( http://arxiv.org/abs/2405.11770v1 )

ライセンス: Link先を確認

Yuanwu Xu, Feifan Song, Haofeng Zhang,

(参考訳) Few-shot Object counting は、クエリイメージ内のオブジェクトの数を、与えられた模範画像と同じクラスに属する数にカウントすることを目的としている。既存の手法では、2次元空間領域におけるクエリ画像と例間の類似性を計算し、回帰してカウント数を求める。しかし、これらの手法は、類似性の空間分布に関する豊富な情報を見落とし、マッチング精度に大きな影響を及ぼす。この問題に対処するために,従来の特徴の空間構造を保存し,クエリ特徴と類似特徴の点間を4D類似度ピラミッドで計算し,各点の完全な分布情報を4D類似度空間で取得する,数ショットオブジェクトカウントのためのネットワーク学習空間類似度分布(SSD)を提案する。本研究では, 類似度分布を予測密度の異なる値にマッピングし, 精度の高い数値を得るために, 類似度ピラミッド上に効率の良い中心ピボット4D畳み込みを適用した類似度学習モジュール(SLM)を提案する。さらに,FCE(Feature Cross Enhancement)モジュールも導入し,クエリの強化と特徴マッチングの精度向上を図る。提案手法は,FSC-147やCARPKなど,複数のデータセット上で最先端の手法より優れている。コードはhttps://github.com/CBalance/SSDで入手できる。

Few-shot object counting aims to count the number of objects in a query image that belong to the same class as the given exemplar images. Existing methods compute the similarity between the query image and exemplars in the 2D spatial domain and perform regression to obtain the counting number. However, these methods overlook the rich information about the spatial distribution of similarity on the exemplar images, which significantly affects matching accuracy. To address this issue, we propose a network learning Spatial Similarity Distribution (SSD) for few-shot object counting, which preserves the spatial structure of exemplar features and calculates a 4D similarity pyramid point-to-point between the query features and exemplar features, capturing the complete distribution information for each point in the 4D similarity space. We propose a Similarity Learning Module (SLM) which applies the efficient center-pivot 4D convolutions on the similarity pyramid to map different similarity distributions to distinct predicted density values, thereby obtaining an accurate count. Furthermore, we also introduce a Feature Cross Enhancement (FCE) module that enhances query and exemplar features mutually to improve the accuracy of feature matching. Our approach outperforms state-of-the-art methods on multiple datasets, including FSC-147 and CARPK. Code is available at https://github.com/CBalance/SSD.

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# テキスト分類における順序性の探索:明示的手法と暗黙的手法の比較研究

Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques ( http://arxiv.org/abs/2405.11775v1 )

ライセンス: Link先を確認

Siva Rajesh Kasa, Aniket Goel, Karan Gupta, Sumegh Roychowdhury, Anish Bhanushali, Nikhil Pattisapu, Prasanna Srinivasa Murthy,

(参考訳) 順序分類(OC)は、自然言語処理(NLP)において広く直面する課題であり、感情分析や評価予測など様々な分野に応用がある。 OCに取り組む従来のアプローチは、ラベルの順序性を明示的に考慮するように既存の損失関数を修正するか、新たな損失関数を作成することに主眼を置いていた。しかし、事前訓練された言語モデル(PLMs)の登場により、ラベルの暗黙的なセマンティクスを通じて順序性に取り組むことも可能になった。本稿では、これら2つのアプローチについて、包括的な理論的および実証的検討を行う。さらに、特定の設定に応じて採用すべき最も効果的なアプローチについて、戦略的な推奨も提示する。

Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP), with applications in various domains such as sentiment analysis, rating prediction, and more. Previous approaches to tackle OC have primarily focused on modifying existing or creating novel loss functions that explicitly account for the ordinal nature of labels. However, with the advent of Pretrained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well. This paper provides a comprehensive theoretical and empirical examination of both these approaches. Furthermore, we also offer strategic recommendations regarding the most effective approach to adopt based on specific settings.
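As an illustration of what an "explicit" ordinal loss can look like, the sketch below compares plain cross-entropy with a CDF-based distance (an earth mover's distance to the one-hot target). The abstract does not say which losses the paper studies, so this particular loss is an assumption chosen only because it is a common, simple example.

```python
import math
from itertools import accumulate


def cross_entropy(probs, target):
    """Standard cross-entropy: depends only on the probability of the true class."""
    return -math.log(probs[target])


def ordinal_emd(probs, target):
    """Distance between predicted and true CDFs; mass far from the target costs more."""
    cdf_pred = list(accumulate(probs))
    cdf_true = [1.0 if k >= target else 0.0 for k in range(len(probs))]
    return sum(abs(p - t) for p, t in zip(cdf_pred, cdf_true))


# Two predictions for a 5-point rating whose true label is the lowest class.
target = 0
near_miss = [0.6, 0.4, 0.0, 0.0, 0.0]  # leftover mass on the adjacent rating
far_miss = [0.6, 0.0, 0.0, 0.0, 0.4]   # leftover mass on the opposite end

ce_near, ce_far = cross_entropy(near_miss, target), cross_entropy(far_miss, target)
emd_near, emd_far = ordinal_emd(near_miss, target), ordinal_emd(far_miss, target)
print("cross-entropy:", ce_near, ce_far)
print("ordinal EMD:  ", emd_near, emd_far)
```

Cross-entropy cannot tell the two predictions apart because both put the same probability on the true class, whereas the CDF-based loss penalizes the prediction whose error lies further along the rating scale.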

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# プランニングによる効率的なマルチエージェント強化学習

Efficient Multi-agent Reinforcement Learning by Planning ( http://arxiv.org/abs/2405.11778v1 )

ライセンス: Link先を確認

Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang,

(参考訳) マルチエージェント強化学習(MARL)アルゴリズムは、大規模意思決定タスクの解決において、目覚ましいブレークスルーを達成している。それでも、既存のMARLアルゴリズムのほとんどはモデルフリーであり、サンプル効率を制限し、より困難なシナリオにおける適用性を妨げている。対照的に、モデルベース強化学習(MBRL)、特にMuZeroのような計画統合アルゴリズムは、多くのタスクにおいて限られたデータで超人的なパフォーマンスを示す。したがって、モデルベースアプローチを採用することにより、MARLのサンプル効率を高めることを目指している。しかし、マルチエージェントシステムに計画手法と探索手法を組み込むことは大きな課題である。マルチエージェントシステムの広大な行動空間では、学習を加速するためにエージェントがほぼ独立であるという性質を活用する必要があることが多い。この問題に対処するため、方策探索のための集中型モデルとモンテカルロ木探索(MCTS)を組み合わせたMAZeroアルゴリズムを提案する。分散実行とパラメータ共有を容易にする新しいネットワーク構造を設計する。行動空間が大きい決定論的環境における探索効率を向上させるために、楽観的探索ラムダ (OS($\lambda$)) とアドバンテージ重み付きポリシー最適化 (AWPO) の2つの新しい手法を導入する。 SMACベンチマークの大規模な実験により、MAZeroはサンプル効率の点でモデルフリーアプローチより優れており、サンプル効率と計算効率の両面で既存のモデルベース手法と同等または優れた性能を提供することが示された。私たちのコードはhttps://github.com/liuqh16/MAZeroから入手可能です。

Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks. Nonetheless, most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios. In contrast, model-based reinforcement learning (MBRL), particularly algorithms integrating planning, such as MuZero, has demonstrated superhuman performance with limited data in many tasks. Hence, we aim to boost the sample efficiency of MARL by adopting model-based approaches. However, incorporating planning and search methods into multi-agent systems poses significant challenges. The expansive action space of multi-agent systems often necessitates leveraging the nearly-independent property of agents to accelerate learning. To tackle this issue, we propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search. We design a novel network structure to facilitate distributed execution and parameter sharing. To enhance search efficiency in deterministic environments with sizable action spaces, we introduce two novel techniques: Optimistic Search Lambda (OS($\lambda$)) and Advantage-Weighted Policy Optimization (AWPO). Extensive experiments on the SMAC benchmark demonstrate that MAZero outperforms model-free approaches in terms of sample efficiency and provides comparable or better performance than existing model-based methods in terms of both sample and computational efficiency. Our code is available at https://github.com/liuqh16/MAZero.

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# ベイズコアセットの品質に関する一般境界

General bounds on the quality of Bayesian coresets ( http://arxiv.org/abs/2405.11780v1 )

ライセンス: Link先を確認

Trevor Campbell,

(参考訳) ベイジアンコアセットは、データの小さな重み付きサブセットに基づいて、全データログ様関数を代理ログ様関数で近似することにより、大規模データ体制における後部推論を高速化する。しかし、ベイジアンコアセットと構成法は幅広いモデルに適用できるが、コアセット近似によって得られた後続の推論誤差の既存の理論的解析は、制限的な設定(指数関数的なファミリーモデル、あるいは強い対数共空と滑らかさの仮定を持つモデル)にのみ適用される。この研究は、ベイズコアセットの完全な適用範囲を反映したコアセット近似のクルバック・リーブラ(KL)偏差に関する一般上界と下界を示す。下界はベイズ漸近解析に典型的な穏やかなモデル仮定しか必要としないが、上界は、初期の研究で用いられる条件よりも弱い一般化された指数性基準を満たすために対数様の関数を必要とする。下限はコアセット近似の質に関する基本的な限界を得るために適用され、また、重要なサンプリングベース工法において、以前に観測された貧弱な経験的性能に関する理論的説明を提供する。上界は最近のサブサンプル最適化手法の性能解析に使用される。この理論の柔軟性は、マルチモーダル、未同定、重尾のベイズ分布を含む検証実験で実証される。

Bayesian coresets speed up posterior inference in the large-scale data regime by approximating the full-data log-likelihood function with a surrogate log-likelihood based on a small, weighted subset of the data. But while Bayesian coresets and methods for construction are applicable in a wide range of models, existing theoretical analysis of the posterior inferential error incurred by coreset approximations only apply in restrictive settings -- i.e., exponential family models, or models with strong log-concavity and smoothness assumptions. This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of coreset approximations that reflect the full range of applicability of Bayesian coresets. The lower bounds require only mild model assumptions typical of Bayesian asymptotic analyses, while the upper bounds require the log-likelihood functions to satisfy a generalized subexponentiality criterion that is weaker than conditions used in earlier work. The lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and to provide a theoretical explanation for the previously-observed poor empirical performance of importance sampling-based construction methods. The upper bounds are used to analyze the performance of recent subsample-optimize methods. The flexibility of the theory is demonstrated in validation experiments involving multimodal, unidentifiable, heavy-tailed Bayesian posterior distributions.
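The basic object in this abstract, a surrogate log-likelihood built from a small weighted subset, can be sketched in a few lines. The example below is a deliberately idealized case in which the data contain repeated values, so a coreset of distinct points weighted by their counts reproduces the full-data log-likelihood exactly. The Gaussian model and the data are assumptions for illustration; practical constructions choose points and weights by sampling or optimization and are only approximate.

```python
import math


def gaussian_loglik(x, theta):
    """Log-density of one observation under a unit-variance Gaussian with mean theta."""
    return -0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi)


def full_loglik(data, theta):
    """Full-data log-likelihood: one term per observation."""
    return sum(gaussian_loglik(x, theta) for x in data)


def coreset_loglik(points, weights, theta):
    """Surrogate log-likelihood: a weighted sum over the coreset points only."""
    return sum(w * gaussian_loglik(x, theta) for x, w in zip(points, weights))


# Ten observations, but only three distinct values.
data = [1.0] * 3 + [2.0] * 5 + [4.0] * 2
points = [1.0, 2.0, 4.0]
weights = [3.0, 5.0, 2.0]

for theta in (-1.0, 0.5, 2.0, 3.7):
    print(theta, full_loglik(data, theta), coreset_loglik(points, weights, theta))
```

The bounds discussed in the abstract concern how far the posterior induced by such a surrogate can be from the full-data posterior when the match is not exact.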

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# 量子アニールマシンの最適化問題としての海洋力学問題の定式化と評価

Formulation and evaluation of ocean dynamics problems as optimization problems for quantum annealing machines ( http://arxiv.org/abs/2405.11782v1 )

ライセンス: Link先を確認

Takuro Matsuta, Ryo Furue,

(参考訳) 量子コンピューティングの最近の進歩は、海洋学や大気科学を含む様々な科学領域で計算アルゴリズムに革命をもたらす可能性を示唆している。この分野はまだ比較的若く、量子計算は古典計算とは大きく異なるため、海洋力学や大気力学を表現するのに適した枠組みはまだ探究されていない。主要なパラダイムの1つである量子アニーリングは、組合せ最適化タスクに重点を置いている。本稿では、古典的なStommel問題を、量子アニーリング(QA)と、その古典的対応物であるシミュレーテッドアニーリング(SA)によって解く。線形偏微分方程式を最小二乗法により最適化問題に変換し、コスト関数を有限差分と打ち切り基底展開の2つの方法で離散化する。いずれの場合も、適切なパラメータを選べばSAは期待される解をうまく再現し、アニーリングに可能性があることを示す。対照的に、D-Wave量子アニーリングマシンを用いたQAは、ハードウェアの制限のため一部のケースで良い解を得られない。特に、マシンの接続グラフが非常に限られていることが、少なくとも現在利用可能なアルゴリズムの下では、解ける問題のサイズを制限している。量子アニーリングマシンを海洋や大気の力学問題に利用できるようにするには、おそらく接続グラフの拡張かグラフ埋め込みアルゴリズムの改善が必要だろう。この知見は、量子アニーラの実用化に向けたハードウェアの改善とグラフ埋め込みアルゴリズムの強化の必要性を強調する一方、シミュレーテッドアニーリングの結果は、実用的な地球物理力学問題に対処できる可能性を示唆している。量子計算が進化し続ける中、これらの課題に対処することで、海洋や大気のモデリングに革新的な進歩がもたらされる可能性がある。

Recent advancements in quantum computing suggest the potential to revolutionize computational algorithms across various scientific domains including oceanography and atmospheric science. The field is still relatively young and quantum computation is so different from classical computation that suitable frameworks to represent oceanic and atmospheric dynamics are yet to be explored. Quantum annealing, one of the major paradigms, focuses on combinatorial optimization tasks. In this paper, we solve the classical Stommel problem by quantum annealing (QA) and simulated annealing (SA), a classical counterpart of quantum annealing. We cast the linear partial differential equation into an optimization problem by the least-squares method and discretize the cost function in two ways: finite difference and truncated basis expansion. In either case, SA successfully reproduces the expected solution when appropriate parameters are chosen, demonstrating that annealing has the potential. In contrast, QA using the D-Wave quantum annealing machine fails to obtain good solutions for some cases owing to hardware limitations; in particular, the highly limited connectivity graph of the machine limits the size of the solvable problems, at least under currently available algorithms. Either expanding the connectivity graph or improving the graph embedding algorithms would probably be necessary for quantum annealing machines to be usable for oceanic and atmospheric dynamics problems. While this finding emphasizes the need for hardware improvements and enhancements in graph embedding algorithms for practical applications of quantum annealers, the results from simulated annealing suggest its potential to address practical geophysical dynamics problems. As quantum calculation continues to evolve, addressing these challenges may lead to transformative advancements in ocean and atmosphere modeling.
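The step of casting a linear problem into an annealing-friendly form can be sketched as follows. A least-squares cost for a linear system is expanded over fixed-point binary variables, giving a QUBO (quadratic unconstrained binary optimization) problem. The 2x2 system, the two-bit encoding, and the exhaustive search that stands in for an annealer are all assumptions for illustration; this is not the paper's discretization of the Stommel problem.

```python
from itertools import product


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def build_qubo(A, b, bits_per_var):
    """QUBO coefficients for ||A x - b||^2 with x_i = sum_k 2^-(k+1) * q_{i,k}."""
    n = len(A[0])
    n_bits = n * bits_per_var
    # P maps the bit vector q to the fixed-point unknowns x = P q.
    P = [[0.0] * n_bits for _ in range(n)]
    for i in range(n):
        for k in range(bits_per_var):
            P[i][i * bits_per_var + k] = 2.0 ** -(k + 1)
    M = matmul(A, P)
    Q = {}
    for i in range(n_bits):
        for j in range(i, n_bits):
            gram = sum(row[i] * row[j] for row in M)
            if i == j:
                # q_i^2 = q_i for binary q_i, so the linear term folds into the diagonal.
                Q[(i, i)] = gram - 2.0 * sum(row[i] * bi for row, bi in zip(M, b))
            else:
                Q[(i, j)] = 2.0 * gram
    offset = sum(bi * bi for bi in b)  # constant term dropped from the QUBO
    return Q, offset, P


def energy(Q, q):
    """QUBO energy of a bit assignment (without the constant offset)."""
    return sum(c * q[i] * q[j] for (i, j), c in Q.items())


def decode(P, q):
    """Recover the real-valued unknowns from the bits."""
    return [sum(p * bit for p, bit in zip(row, q)) for row in P]


# Hypothetical system resembling a tiny finite-difference operator.
A = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 0.25]
Q, offset, P = build_qubo(A, b, bits_per_var=2)

# Exhaustive search stands in for the annealer on this 4-bit problem.
best = min(product((0, 1), repeat=4), key=lambda q: energy(Q, q))
solution = decode(P, best)
print("bits:", best, "x:", solution, "residual^2:", energy(Q, best) + offset)
```

The number of binary variables grows with both the grid size and the bits of precision per unknown, which gives a sense of why the connectivity limits mentioned in the abstract restrict the solvable problem size.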

翻訳日:2024-05-21 14:13:43 公開日:2024-05-20

# 量子自然言語処理を用いた金属有機フレームワークの逆設計

Inverse Design of Metal-Organic Frameworks Using Quantum Natural Language Processing ( http://arxiv.org/abs/2405.11783v1 )

ライセンス: Link先を確認

Shinyoung Kang, Jihan Kim,

(参考訳) 本研究では、量子自然言語処理(QNLP)を用いて、ターゲット特性を持つ金属-有機フレームワーク(MOF)を逆設計する可能性について検討する。具体的には、10個の金属ノードと15個の有機配位子からなる150個の仮説MOF構造を解析し、これらの構造を細孔体積と$H_{2}$吸着量の値について4つの異なるクラスに分類する。次に、様々なQNLPモデル(bag-of-words、DisCoCat(Distributional Compositional Categorical)、シーケンスベースモデル)を比較し、MOFデータセットを処理する上で最も効果的なアプローチを特定する。 IBM Qiskitが提供する古典シミュレータを用いたところ、bag-of-wordsモデルが最適なモデルであると特定され、細孔体積と$H_{2}$吸着量の二値分類タスクでそれぞれ85.7%と86.7%の検証精度を達成した。さらに、量子回路の確率的性質に合わせた多クラス分類モデルを開発し、細孔体積と$H_{2}$吸着量のデータセットについて、クラス全体の平均テスト精度はそれぞれ88.4%と80.7%であった。最後に、ターゲット特性を持つMOFの生成性能は、細孔体積で93.5%、$H_{2}$吸着量で89%の精度を示した。我々の調査は広大なMOF探索空間のごく一部しかカバーしていないが、材料設計に量子コンピューティングを用いるための有望な第一歩であり、MOFの複雑な全体像を探索する新しい視点を提供する。

In this study, we explore the potential of using quantum natural language processing (QNLP) to inverse design metal-organic frameworks (MOFs) with targeted properties. Specifically, by analyzing 150 hypothetical MOF structures consisting of 10 metal nodes and 15 organic ligands, we categorize these structures into four distinct classes for pore volume and $H_{2}$ uptake values. We then compare various QNLP models (i.e. the bag-of-words, DisCoCat (Distributional Compositional Categorical), and sequence-based models) to identify the most effective approach to process the MOF dataset. Using a classical simulator provided by the IBM Qiskit, the bag-of-words model is identified to be the optimum model, achieving validation accuracies of 85.7% and 86.7% for binary classification tasks on pore volume and $H_{2}$ uptake, respectively. Further, we developed multi-class classification models tailored to the probabilistic nature of quantum circuits, with average test accuracies of 88.4% and 80.7% across different classes for pore volume and $H_{2}$ uptake datasets. Finally, the performance of generating MOF with target properties showed accuracies of 93.5% for pore volume and 89% for $H_{2}$ uptake, respectively. Although our investigation covers only a fraction of the vast MOF search space, it marks a promising first step towards using quantum computing for materials design, offering a new perspective through which to explore the complex landscape of MOFs.

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# Reward-Punishment Reinforcement Learning with Maximum Entropy ( http://arxiv.org/abs/2405.11784v1 )

License: see the linked page

Jiexin Wang, Eiji Uchibe

We introduce the "soft Deep MaxPain" (softDMP) algorithm, which integrates the optimization of long-term policy entropy into reward-punishment reinforcement learning objectives. Our motivation is to allow a smoother variation of the operators used to update action values, beyond the traditional "max" and "min" operators, with the goal of enhancing sample efficiency and robustness. We also address two unresolved issues from the previous Deep MaxPain method. First, we investigate how the negated ("flipped") pain-seeking sub-policy, derived from the punishment action value, collaborates with the "min" operator to effectively learn the punishment module, and how softDMP's smooth learning operator provides insights into the "flipping" trick. Second, we tackle the challenge of data collection for learning the punishment module, to mitigate inconsistencies arising from the involvement of the "flipped" sub-policy (pain-avoidance sub-policy) in the unified behavior policy. We empirically explore the first issue in two discrete Markov Decision Process (MDP) environments, elucidating the crucial advancements of the DMP approach and the necessity for soft treatments of the hard operators. For the second issue, we propose a probabilistic classifier based on the ratio of the pain-seeking sub-policy to the sum of the pain-seeking and goal-reaching sub-policies. This classifier assigns roll-outs to separate replay buffers for updating the reward and punishment action-value functions, respectively. Our framework demonstrates superior performance in Turtlebot 3's maze navigation tasks under the ROS Gazebo simulation.
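The two ingredients named in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a log-sum-exp operator as the "soft" replacement for max and min, Boltzmann sub-policies over action values, and a routing probability equal to the pain-seeking probability divided by the sum of the pain-seeking and goal-reaching probabilities. The function names and the inverse temperature `beta` are hypothetical.

```python
import math

def soft_operator(values, beta):
    """Log-sum-exp smoothing: approaches max(values) as beta -> +inf
    and min(values) as beta -> -inf. beta must be non-zero."""
    # Shift by a reference value for numerical stability.
    ref = max(values) if beta > 0 else min(values)
    total = sum(math.exp(beta * (v - ref)) for v in values)
    return ref + math.log(total) / beta

def boltzmann(values, beta):
    """Softmax policy over action values with inverse temperature beta."""
    ref = max(beta * v for v in values)
    weights = [math.exp(beta * v - ref) for v in values]
    z = sum(weights)
    return [w / z for w in weights]

def punishment_buffer_probability(q_reward, q_punish, action, beta=1.0):
    """Assumed routing rule: pain-seeking / (pain-seeking + goal-reaching)."""
    goal_reaching = boltzmann(q_reward, beta)[action]
    pain_seeking = boltzmann(q_punish, beta)[action]
    return pain_seeking / (pain_seeking + goal_reaching)
```

With a large positive `beta` the operator behaves like the hard max, and with a large negative `beta` like the hard min, which is one way to read the "smoother variation of operators" described above.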

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# Guided Multi-objective Generative AI to Enhance Structure-based Drug Design ( http://arxiv.org/abs/2405.11785v1 )

License: see the linked page

Amit Kadan, Kevin Ryczko, Adrian Roitberg, Takeshi Yamazaki

Generative AI has the potential to revolutionize drug discovery. Yet, despite recent advances in machine learning, existing models cannot generate molecules that satisfy all desired physicochemical properties. Herein, we describe IDOLpro, a novel generative chemistry AI combining deep diffusion with multi-objective optimization for structure-based drug design. The latent variables of the diffusion model are guided by differentiable scoring functions to explore uncharted chemical space and generate novel ligands in silico, optimizing a plurality of target physicochemical properties. We demonstrate its effectiveness by generating ligands with optimized binding affinity and synthetic accessibility on two benchmark sets. IDOLpro produces ligands with binding affinities over 10% higher than the next-best state-of-the-art method on each test set. On a test set of experimental complexes, IDOLpro is the first to surpass the performance of experimentally observed ligands. IDOLpro can accommodate other scoring functions (e.g., ADME-Tox) to accelerate hit-finding, hit-to-lead, and lead optimization for drug discovery.

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models ( http://arxiv.org/abs/2405.11788v1 )

License: see the linked page

Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang, Ji Wu

We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementation, extensibility to new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods while leaving room for extension to more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and training small-scale LMMs with affordable computational resources.
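The factory pattern mentioned in this abstract can be sketched generically as a name-to-constructor registry. The code below is only an illustration of the pattern and does not reproduce the TinyLLaVA Factory API; the names `register`, `build`, `ToyVisionTower`, and `ToyConnector` are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a component name to its constructor.
_REGISTRY: Dict[str, Callable[..., object]] = {}

def register(name: str):
    """Class decorator that adds a component to the registry under `name`."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator

def build(name: str, **kwargs):
    """Factory function: look up a component by name and instantiate it."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown component: {name}")
    return _REGISTRY[name](**kwargs)

@register("toy_vision_tower")
class ToyVisionTower:
    def __init__(self, hidden_size: int = 8):
        self.hidden_size = hidden_size

@register("toy_connector")
class ToyConnector:
    def __init__(self, in_size: int = 8, out_size: int = 16):
        self.in_size, self.out_size = in_size, out_size

# Swapping a component means changing a name in the config, not the code.
config = {"vision_tower": "toy_vision_tower", "connector": "toy_connector"}
model_parts = {role: build(name) for role, name in config.items()}
```

The design choice is that callers depend only on component names, so a new component can be added by registering one more class.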

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# The Grandparent Scam: A Systems Perspective Case Study On Elder Fraud And The Concept Of Human Layering ( http://arxiv.org/abs/2405.11789v1 )

License: see the linked page

Michelle Espinoza

In April 2024, an 81-year-old Ohio man was charged with murder, assault, and kidnapping. The man believed that he was protecting his family from scammers threatening harm. What he did not realize was that the 61-year-old Uber driver he killed was also a victim of the same scammers. This case study examines some common variants of the Grandparent Scam from a systems perspective and how weaponization of conscience is used in these scams. Additionally, this study examines the parallels between layering in money laundering and human layering in the execution of these scams.

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# Graviton physics: Quantum field theory of gravitons, graviton noise and gravitational decoherence -- a concise tutorial ( http://arxiv.org/abs/2405.11790v1 )

License: see the linked page

Jen-Tsung Hsiang, Hing-Tong Cho, Bei-Lok Hu

The detection of gravitational waves in 2015 ushered in a new era of gravitational wave astronomy capable of probing the strong-field dynamics of black holes and neutron stars. It has opened up an exciting new window for laboratory and space tests of Einstein's theory of classical general relativity. In recent years there have been two interesting proposals aimed at revealing the quantum nature of perturbative gravity: 1) theoretical predictions of graviton noise from the early universe, after the vacuum of the gravitational field was strongly squeezed by inflationary expansion; 2) experimental proposals using the quantum entanglement between two masses, each in a superposition state. The first proposal invokes the stochastic properties of quantum fields; the second invokes a key concept of quantum information. An equally basic and interesting idea is to ask whether and how gravity might be responsible for a quantum system becoming classical in appearance, known as gravitational decoherence. Decoherence due to gravity is of special interest because gravity is universal. This is an important issue in macroscopic quantum phenomena. Fully appreciating these exciting developments requires a working knowledge of classical general relativity (GR), quantum field theory (QFT), and quantum information (QI), plus some familiarity with stochastic processes (SP), namely, noise in quantum fields. Traditionally a new researcher may be conversant in one or two of these four subjects (GR, QFT, QI, SP), depending on his or her background. This tutorial attempts to provide the necessary connections between them, helping an engaged reader from any one of these four subjects to leapfrog to the frontier of these interdisciplinary research topics. Here we shall treat the three topics listed in the title, save gravitational entanglement, because its nature and the implications proclaimed in relation to quantum gravity still contain many controversial elements.

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise ( http://arxiv.org/abs/2405.11793v1 )

License: see the linked page

Ruiqi Wu, Chenran Zhang, Jianle Zhang, Yi Zhou, Tao Zhou, Huazhu Fu

Current fundus image analysis models are predominantly built for specific tasks relying on individual datasets. The learning process is usually based on a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model which incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and a mixed training strategy to infuse expert knowledge. Our proposed fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and holds excellent generalization ability in zero-shot and few-shot scenarios. MM-Retinal and KeepFIT are available at https://github.com/lxirich/MM-Retinal.

Translated: 2024-05-21 14:13:43 Published: 2024-05-20

# ViViD: Video Virtual Try-on using Diffusion Models ( http://arxiv.org/abs/2405.11794v1 )

License: see the linked page

Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha

Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying image-based try-on techniques to the video domain in a frame-wise manner causes temporally inconsistent outcomes, while previous video-based try-on solutions can only generate blurry results of low visual quality. In this work, we present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on. Specifically, we design the Garment Encoder to extract fine-grained clothing semantic features, guiding the model to capture garment details and inject them into the target video through the proposed attention feature fusion mechanism. To ensure spatial-temporal consistency, we introduce a lightweight Pose Encoder to encode pose signals, enabling the model to learn the interactions between clothing and human posture, and we insert hierarchical Temporal Modules into the text-to-image stable diffusion model for more coherent and lifelike video synthesis. Furthermore, we collect a new dataset, which is the largest to date for the task of video virtual try-on, with the most diverse types of garments and the highest resolution. Extensive experiments demonstrate that our approach yields satisfactory video try-on results. The dataset, code, and weights will be publicly available. Project page: https://becauseimbatman0.github.io/ViViD.

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Application of time-series quantum generative model to financial data ( http://arxiv.org/abs/2405.11795v1 )

License: see the linked page

Shun Okumura, Masayuki Ohzeki, Masaya Abe

Although a quantum generative model for time series that successfully learns correlated series with multiple Brownian motions has been proposed, the model has not been adapted to or evaluated on financial problems. In this study, a time-series generative model was applied as a quantum generative model to actual financial data. Future data for two correlated time series were generated and compared with classical methods such as long short-term memory and vector autoregression. Furthermore, numerical experiments were performed to complete missing values. Based on the results, we evaluated the practical applications of the time-series quantum generative model. It was observed that fewer parameter values were required compared with the classical methods. In addition, the quantum time-series generative model was feasible for both stationary and nonstationary data. These results suggest that several parameters can be applied to various types of time-series data.

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Regularized Entanglement Entropy of Electron-Positron Scattering with a Witness Photon ( http://arxiv.org/abs/2405.11799v1 )

License: see the linked page

Shanmuka Shivashankara, Grace Gogliettino

Regularized quantum information metrics are calculated for the scattering process $e^-e^+ \rightarrow \gamma,Z\rightarrow \mu^-\mu^+$ that has a witness photon entangled with the initial electron-positron state. Unitarity implies the correct regularization of divergences that appear in both the final density matrix and von Neumann entanglement entropies. The entropies are found to quantify uncertainty or randomness. The variation of information, entanglement entropy, and correlation between the muon's and witness photon's helicities are found to convey equivalent information. The magnitude of the muon's expected helicity rises (falls) as the helicity entropy falls (rises). Area, or the scattering cross section, is a source of entropy for the muon's helicity entropy and momentum entropy. The muon's differential angular entropy distribution is similar to the differential angular cross section distribution, capturing the forward-backward asymmetry at high center-of-mass energies.
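As background for the von Neumann entanglement entropy mentioned in this abstract, here is a generic sketch for a two-level (for example, helicity) subsystem of a two-particle pure state. It illustrates only the textbook definition $S = -\mathrm{Tr}(\rho \ln \rho)$ and does not attempt the regularized scattering calculation of the paper; the function names are hypothetical.

```python
import math

def reduced_density_matrix(amps):
    """amps[i][j] is the amplitude of |i>_A |j>_B in a normalized
    two-qubit pure state. Returns rho_A = Tr_B |psi><psi| as a 2x2 list."""
    return [[sum(amps[i][k] * amps[j][k].conjugate() for k in range(2))
             for j in range(2)] for i in range(2)]

def entropy_2x2(rho):
    """von Neumann entropy -Tr(rho ln rho) of a 2x2 density matrix (trace 1)."""
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    # Eigenvalues of a trace-one 2x2 Hermitian matrix: (1 +/- sqrt(1 - 4 det)) / 2.
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigenvalues = [(1.0 + disc) / 2.0, (1.0 - disc) / 2.0]
    return -sum(p * math.log(p) for p in eigenvalues if p > 1e-15)

# A maximally entangled pair versus an unentangled product state.
bell = [[1 / math.sqrt(2), 0.0], [0.0, 1 / math.sqrt(2)]]
product = [[1.0, 0.0], [0.0, 0.0]]
bell_entropy = entropy_2x2(reduced_density_matrix(bell))
product_entropy = entropy_2x2(reduced_density_matrix(product))
```

The maximally entangled state gives an entropy of $\ln 2$ and the product state gives zero, which matches the reading of entropy as uncertainty about the subsystem.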

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Generative AI in Higher Education: A Global Perspective of Institutional Adoption Policies and Guidelines ( http://arxiv.org/abs/2405.11800v1 )

License: see the linked page

Yueqiao Jin, Lixiang Yan, Vanessa Echeverria, Dragan Gašević, Roberto Martinez-Maldonado

Integrating generative AI (GAI) into higher education is crucial for preparing a future generation of GAI-literate students. Yet a thorough understanding of global institutional adoption policies remains absent, with most prior studies focusing on the Global North and on the promises and challenges of GAI while lacking a theoretical lens. This study uses the Diffusion of Innovations Theory to examine GAI adoption strategies in higher education across 40 universities from six global regions. It explores the characteristics of GAI innovation, including compatibility, trialability, and observability, and analyses the communication channels and the roles and responsibilities outlined in university policies and guidelines. The findings reveal a proactive approach by universities towards GAI integration, emphasizing academic integrity, teaching and learning enhancement, and equity. Despite a cautious yet optimistic stance, a comprehensive policy framework is needed to evaluate the impacts of GAI integration and establish effective communication strategies that foster broader stakeholder engagement. The study highlights the importance of clear roles and responsibilities among faculty, students, and administrators for successful GAI integration, supporting a collaborative model for navigating the complexities of GAI in education. This study contributes insights for policymakers in crafting detailed strategies for its integration.

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering ( http://arxiv.org/abs/2405.11801v1 )

License: see the linked page

Li Sun, Zhenhao Huang, Hao Peng, Yujie Wang, Chunyang Liu, Philip S. Yu

Graph clustering is a fundamental problem in machine learning. Deep learning methods have achieved state-of-the-art results in recent years, but they still cannot work without a predefined number of clusters. This limitation motivates us to pose a more challenging problem: graph clustering with an unknown number of clusters. We propose to address this problem from a fresh perspective of graph information theory (i.e., structural information). In the literature, structural information has not yet been introduced to deep clustering, and its classic definition falls short in its discrete formulation and in modeling node features. In this work, we first formulate a differentiable structural information (DSI) in the continuous realm, accompanied by several theoretical results. By minimizing DSI, we construct the optimal partitioning tree in which densely connected nodes in the graph tend to have the same assignment, revealing the cluster structure. DSI is also theoretically presented as a new graph clustering objective that does not require a predefined number of clusters. Furthermore, we design a neural LSEnet in the Lorentz model of hyperbolic space, where we integrate node features into structural information via manifold-valued graph convolution. Extensive empirical results on real graphs show the superiority of our approach.
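To illustrate what "structural information" measures, the sketch below computes the classic discrete two-level structural entropy of an unweighted graph under a flat partition, using the commonly cited formulation in which each cluster contributes a within-cluster degree term and a cut term. This is an assumption-laden illustration of the discrete quantity only; it is not the differentiable DSI or the LSEnet model, and the function name is hypothetical.

```python
import math
from collections import defaultdict

def structural_entropy(edges, assignment):
    """Two-level structural entropy (in bits) of an undirected, unweighted
    graph. edges: list of (u, v); assignment: dict mapping node -> cluster."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total_volume = sum(degree.values())  # equals twice the number of edges

    volume = defaultdict(int)  # sum of degrees inside each cluster
    cut = defaultdict(int)     # edges with exactly one endpoint in the cluster
    for node, d in degree.items():
        volume[assignment[node]] += d
    for u, v in edges:
        if assignment[u] != assignment[v]:
            cut[assignment[u]] += 1
            cut[assignment[v]] += 1

    entropy = 0.0
    for node, d in degree.items():
        entropy -= d / total_volume * math.log2(d / volume[assignment[node]])
    for cluster, vol in volume.items():
        entropy -= cut[cluster] / total_volume * math.log2(vol / total_volume)
    return entropy

# Two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mixed = {0: "a", 1: "a", 3: "a", 2: "b", 4: "b", 5: "b"}
single = {node: "all" for node in range(6)}
```

On this toy graph the partition that follows the two triangles has lower structural entropy than either the mixed partition or the single-cluster partition, which is the sense in which minimizing the quantity reveals cluster structure without fixing the number of clusters in advance.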

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Counterfactual Explanation-Based Badminton Motion Guidance Generation Using Wearable Sensors ( http://arxiv.org/abs/2405.11802v1 )

License: see the linked page

Minwoo Seong, Gwangbin Kim, Yumin Kang, Junhyuk Jang, Joseph DelPreto, SeungJun Kim

This study proposes a framework for enhancing the stroke quality of badminton players by generating personalized motion guides from a multimodal wearable dataset. These guides are based on counterfactual algorithms and aim to reduce the performance gap between novice and expert players. Our approach provides joint-level guidance through visualizable data to assist players in improving their movements without requiring expert knowledge. The method was evaluated against a traditional algorithm using metrics that assess validity, proximity, and plausibility, including arithmetic measures and motion-specific evaluation metrics. Our evaluation demonstrates that the proposed framework can generate motions that maintain the essence of the original movements while enhancing stroke quality, providing closer guidance than direct expert motion replication. The results highlight the potential of our approach for creating personalized sports motion guides by generating counterfactual motion guidance for arbitrary input motion samples of badminton strokes.
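The general idea of a counterfactual explanation can be sketched with a toy example: given a classifier that scores a motion as "expert-like", nudge the input until the score crosses a target while a proximity term keeps it close to the original. The sketch below assumes a fixed logistic classifier over two made-up features and plain gradient steps; it is not the algorithm used in the paper, and all names and parameter values are hypothetical.

```python
import math

def expert_probability(x, weights, bias):
    """Toy logistic classifier: probability that features x look expert-like."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def counterfactual(x, weights, bias, target=0.9, step=0.05,
                   proximity=0.05, max_iters=1000):
    """Move x by gradient steps until the classifier score reaches `target`,
    with an L2 pull back toward the original input."""
    cf = list(x)
    for _ in range(max_iters):
        p = expert_probability(cf, weights, bias)
        if p >= target:
            break
        # d/dx log p = (1 - p) * w; the proximity term pulls back toward x.
        cf = [c + step * ((1.0 - p) * w - proximity * (c - xi))
              for c, xi, w in zip(cf, x, weights)]
    return cf

weights, bias = [2.0, -1.0], -1.0
novice = [0.0, 1.0]
guide = counterfactual(novice, weights, bias)
```

The difference `guide - novice` is the suggested change per feature, which is the kind of joint-level guidance the abstract describes, under the stated toy assumptions.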

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts ( http://arxiv.org/abs/2405.11804v1 )

License: see the linked page

Minghao Wu, Yulin Yuan, Gholamreza Haffari, Longyue Wang

Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that, despite lower d-BLEU scores, translations from TransAgents are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TransAgents through case studies and suggest directions for future research.

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Distill-then-prune: An Efficient Compression Framework for Real-time Stereo Matching Network on Edge Devices ( http://arxiv.org/abs/2405.11809v1 )

License: see the linked page

Baiyu Pan, Jichao Jiao, Jianxing Pang, Jun Cheng

In recent years, numerous real-time stereo matching methods have been introduced, but they often lack accuracy. These methods attempt to improve accuracy by introducing new modules or integrating traditional methods. However, the improvements are only modest. In this paper, we propose a novel strategy that incorporates knowledge distillation and model pruning to overcome the inherent trade-off between speed and accuracy. As a result, we obtain a model that maintains real-time performance while delivering high accuracy on edge devices. Our proposed method involves three key steps. First, we review state-of-the-art methods and design our lightweight model by removing redundant modules from those efficient models through a comparison of their contributions. Next, we leverage the efficient model as the teacher to distill knowledge into the lightweight model. Finally, we systematically prune the lightweight model to obtain the final model. Through extensive experiments conducted on two widely used benchmarks, Sceneflow and KITTI, we perform ablation studies to analyze the effectiveness of each module and present our state-of-the-art results.
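The last two steps above (distill, then prune) can be sketched in a few lines. The abstract does not state the loss or the pruning criterion, so this sketch assumes a mean-absolute-error blend between ground truth and teacher output for distillation, and magnitude-based pruning; both choices and all names are hypothetical.

```python
def distillation_loss(student, teacher, target, alpha=0.5):
    """Blend of mean absolute error to the ground truth and to the
    teacher's prediction (for example, per-pixel disparities)."""
    n = len(student)
    to_target = sum(abs(s - t) for s, t in zip(student, target)) / n
    to_teacher = sum(abs(s - t) for s, t in zip(student, teacher)) / n
    return (1.0 - alpha) * to_target + alpha * to_teacher

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned
```

For instance, `magnitude_prune([0.5, -0.1, 0.03, -2.0], 0.5)` keeps the two largest-magnitude weights and zeroes the other two.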

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning ( http://arxiv.org/abs/2405.11811v1 )

License: see the linked page

Liuzhi Zhou, Yu He, Kun Zhai, Xiang Liu, Sen Liu, Xingjun Ma, Guangnan Ye, Yu-Gang Jiang, Hongfeng Chai

Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients while preserving data privacy. However, balancing acceleration and stability is a significant challenge in FL, especially on the client side. In this paper, we introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge. FedCAda leverages the Adam algorithm to adjust the correction process of the first moment estimate $m$ and the second moment estimate $v$ on the client side and aggregates adaptive algorithm parameters on the server side, aiming to accelerate convergence and improve communication efficiency while ensuring stability and performance. Additionally, we investigate several algorithms incorporating different adjustment functions. This comparative analysis reveals that, because client models contain only limited information from other clients during the initial stages of federated learning, more substantial constraints need to be imposed on the parameters of the adaptive algorithm. As federated learning progresses and clients gather more global information, FedCAda gradually diminishes the impact on adaptive parameters. These findings provide insights for enhancing the robustness and efficiency of algorithmic improvements. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning and encourages further exploration.
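For readers unfamiliar with the quantities $m$ and $v$, the sketch below shows a standard scalar Adam step with its usual bias correction, plus a plain server-side average of client parameters and moments. FedCAda modifies the correction step with adjustment functions that are not given in the abstract, so they are deliberately not reproduced here; the function names and the simple averaging rule are assumptions.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update with standard bias correction (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment estimate
    m_hat = m / (1.0 - beta1 ** t)  # standard correction; FedCAda adjusts
    v_hat = v / (1.0 - beta2 ** t)  # this step in a way not shown here
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

def server_average(client_states):
    """Average (param, m, v) tuples element-wise across clients."""
    n = len(client_states)
    return tuple(sum(state[k] for state in client_states) / n for k in range(3))
```

A client would call `adam_step` for each local update and send its `(param, m, v)` to the server, which returns the averaged state for the next round under this simplified scheme.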

Translated: 2024-05-21 14:03:49 Published: 2024-05-20

# Nonlinear Lindblad Master Equation and Postselected Skin Effect ( http://arxiv.org/abs/2405.11812v1 )

License: see the linked page

Yu-Guo Liu, Shu Chen

We introduce a nonlinear Lindblad master equation to describe the postselection dynamics of open quantum systems governed by the Lindblad master equation; it continuously interpolates between the Lindblad master equation and the dynamical equation governed by an effective non-Hermitian Hamiltonian. Within the framework of the nonlinear Lindblad master equation, we study a prototypical model and demonstrate the existence of the postselected skin effect, in which the steady-state distribution is characterized by the accumulation of particles on one side, as long as some quantum jump terms are discarded by postselection processes. Moreover, we show that the trajectory-averaged entanglement entropy can reflect the different influences of the environment and postselection, and reveal that it exhibits a special distribution, with algebraic growth in short chains and saturation in long chains, induced by the postselected skin effect.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# ナスカ世界遺産へのクリマティック・人類起源のハザード:リモートセンシング、AI、洪水モデルの適用

Climatic & Anthropogenic Hazards to the Nasca World Heritage: Application of Remote Sensing, AI, and Flood Modelling ( http://arxiv.org/abs/2405.11814v1 )

ライセンス: Link先を確認

Masato Sakai, Marcus Freitag, Akihisa Sakurai, Conrad M Albrecht, Hendrik F Hamann,

(参考訳) ペルーのユネスコの世界遺産にあるナスカ地形の保存は、自然と人間の影響が加速するにつれて急務である。フラッシュフルートのようなより頻繁な天候は、ナスカの人工物を脅かす。我々は、(サブ)メートルスケールに基づく流出モデル、LiDAR由来のデジタル標高データにより、浸食の危険があるAI検出ジオグリフをハイライトできることを実証した。我々は、パンアメリカン・ハイウェイに近い有名な「リザード」、「ツリー」および「ハンド」ジオグリフを守るために緩和策を推奨する。

Preservation of the Nasca geoglyphs at the UNESCO World Heritage Site in Peru is urgent as natural and human impacts accelerate. More frequent weather extremes such as flash floods threaten Nasca artifacts. We demonstrate that runoff models based on (sub-)meter scale, LiDAR-derived digital elevation data can highlight AI-detected geoglyphs that are in danger of erosion. We recommend mitigation measures to protect the famous "lizard", "tree", and "hand" geoglyphs located close to, or even cut by, the Pan-American Highway.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# マルチ環境メタラーニングによるCSIに基づく位置決めのための移動学習

Transfer Learning for CSI-based Positioning with Multi-environment Meta-learning ( http://arxiv.org/abs/2405.11816v1 )

ライセンス: Link先を確認

Anastasios Foliadis, Mario H. Castañeda, Richard A. Stirling-Gallacher, Reiner S. Thomä,

(参考訳) チャネル状態情報(CSI)指紋によるユーザ機器(UE)の無線位置決めのための深層学習(DL)技術の利用は大きな可能性を秘めている。 DLモデルは、特定の環境のCSI指紋から複雑な特徴を抽出し、UEの位置を正確に予測することができる。それでも、CSIフィンガープリントでトレーニングされたDLモデルの有効性は、特定のトレーニング環境に大きく依存しており、異なる環境におけるトレーニングされたモデルの適用性を制限している。本稿では,2つの部分からなる新しいDLモデル構造を提案する。第1部は特定の環境から独立な特徴を特定することを目的としており,第2部は環境特異的な特徴と位置決めの目的を組み合わせている。このような2部構成のモデルをトレーニングするために,第1部ではマルチ環境メタラーニング(MEML)アプローチを提案し,第2部では特定の環境のデータのみに基づいてトレーニングを行う。その結果,新しい未確認環境におけるDLモデルの重み付けを初期化するためのMEML手法を用いることで,新たなターゲット環境におけるUE位置決めの精度が向上し,不確実性評価の信頼性が向上することが示唆された。本手法は、環境間の直接移動学習(DTL)や、新しい環境からのデータをゼロから完全に学習するなど、従来の移動学習方法よりも優れている。提案手法は,LOS(Line-of-sight)環境とNLOS(Non-LOS)環境での実測値を用いて検証する。

Utilizing deep learning (DL) techniques for radio-based positioning of user equipment (UE) through channel state information (CSI) fingerprints has demonstrated significant potential. DL models can extract complex characteristics from the CSI fingerprints of a particular environment and accurately predict the position of a UE. Nonetheless, the effectiveness of a DL model trained on CSI fingerprints is highly dependent on the particular training environment, limiting the trained model's applicability across different environments. This paper proposes a novel DL model structure consisting of two parts, where the first part aims at identifying features that are independent of any specific environment, while the second part combines those features in an environment-specific way with the goal of positioning. To train such a two-part model, we propose the multi-environment meta-learning (MEML) approach for the first part to facilitate training across various environments, while the second part of the model is trained solely on data from a specific environment. Our findings indicate that employing the MEML approach for initializing the weights of the DL model for a new unseen environment significantly boosts the accuracy of UE positioning in the new target environment as well as the reliability of its uncertainty estimation. This method outperforms traditional transfer learning methods, whether direct transfer learning (DTL) between environments or completely training from scratch with data from a new environment. The proposed approach is verified with real measurements for both line-of-sight (LOS) and non-LOS (NLOS) environments.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# ChatGPTを用いた医療システム工学の体系的検討

Systematic Review on Healthcare Systems Engineering utilizing ChatGPT ( http://arxiv.org/abs/2405.11817v1 )

ライセンス: Link先を確認

Jungwoo Kim, Ji-Su Lee, Huijae Kim, Taesik Lee,

(参考訳) 本稿では、最近の言語モデルにおける最先端のツールであるChatGPTを用いて、医療システム工学の分野における学術的レビューを行うための分析フレームワークを提案する。講演の要約文9,809節を用いて,分野を体系的に検討した。このフレームワークは、それぞれがカスタマイズされたプロンプトとChatGPT APIの体系的な使用を使用して、異なる分析プロセスで構成されている。この枠組みを通じて,対象分野を11のトピックカテゴリに分類し,年次傾向と詳細なサブカテゴリを包括的に分析した。この取り組みは、ChatGPTを活用して学術レビューの負担を軽減する可能性を探るものである。さらに、医療システム工学研究のダイナミックな景観に関する貴重な洞察を提供する。

This paper presents an analytical framework for conducting academic reviews in the field of Healthcare Systems Engineering, employing ChatGPT, a state-of-the-art tool among recent language models. We utilized 9,809 abstract paragraphs from conference presentations to systematically review the field. The framework comprises distinct analytical processes, each employing tailored prompts and the systematic use of the ChatGPT API. Through this framework, we organized the target field into 11 topic categories and conducted a comprehensive analysis covering quantitative yearly trends and detailed sub-categories. This effort explores the potential for leveraging ChatGPT to alleviate the burden of academic reviews. Furthermore, it provides valuable insights into the dynamic landscape of Healthcare Systems Engineering research.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# Beyond MLE:低リソースニューラルネットワーク翻訳のためのSEARNNの調査

Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation ( http://arxiv.org/abs/2405.11819v1 )

ライセンス: Link先を確認

Chris Emezue,

(参考訳) 機械翻訳のような構造化予測タスクには、構造化された入力を構造化された出力にマッピングする学習機能が含まれる。リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)は、自然言語処理(NLP)アプリケーションなど、歴史的にそのようなタスクに人気がある。しかし、MLE(Maximum Likelihood Estimation)を使用したRNNのトレーニングには、露出バイアスやトレーニングとテストのメトリクスのミスマッチなど、制限がある。 SEARNNは、L2S(Learning to Search)フレームワークに基づいて、MLEに代わるRNNトレーニングとして提案されている。このプロジェクトでは、低リソースのアフリカの言語に対する機械翻訳を改善するSEARNNの可能性について検討した。英語からIgbo、フランス語からEwe、フランス語からGhomalaへの翻訳実験を通じて、このプロジェクトはこれらの言語がもたらす固有の課題に対処する上で、MLEに対するSEARNNの有効性を評価した。 MLE の目標に対して平均 BLEU スコアが 5.4% 改善されていることから,SEARNN は低リソース言語に対する機械翻訳において,RNN を効果的に訓練するためのアルゴリズムとして有効であることが証明された。

Structured prediction tasks, like machine translation, involve learning functions that map structured inputs to structured outputs. Recurrent Neural Networks (RNNs) have historically been a popular choice for such tasks, including in natural language processing (NLP) applications. However, training RNNs using Maximum Likelihood Estimation (MLE) has its limitations, including exposure bias and a mismatch between training and testing metrics. SEARNN, based on the learning to search (L2S) framework, has been proposed as an alternative to MLE for RNN training. This project explored the potential of SEARNN to improve machine translation for low-resourced African languages -- a challenging task characterized by limited training data availability and the morphological complexity of the languages. Through experiments conducted on translation for English to Igbo, French to Ewe, and French to Ghomala directions, this project evaluated the efficacy of SEARNN over MLE in addressing the unique challenges posed by these languages. With an average BLEU score improvement of 5.4% over the MLE objective, we proved that SEARNN is indeed a viable algorithm to effectively train RNNs on machine translation for low-resourced languages.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# 機械学習による藻と木材の共熱分解における相乗効果の3相解析

A Three-Phase Analysis of Synergistic Effects During Co-pyrolysis of Algae and Wood for Biochar Yield Using Machine Learning ( http://arxiv.org/abs/2405.11821v1 )

ライセンス: Link先を確認

Subhadeep Chakrabarti, Saish Shinde,

(参考訳) 熱分解技術は、プラスチック、木材、作物の残留物、果物の皮など、天然および人工のバイオマス製品を効果的に利用するための画期的な技術である。近年の進歩は、異なるバイオマスを一定の割合で混合することにより、バイオチャー、バイオオイル、その他の非凝縮性ガスのような必須成分の収率を高めている。藻類と木質バイオマスの共熱分解を併用した2つの熱分解原料の相乗効果を系統的に研究し, 共熱分解の3つの相に分類した。 ML と DL の異なるアルゴリズムは,2 つの異なるバイオマスの相乗効果が生物果樹収に与える影響を網羅的に概観するために,回帰法と分類法に利用されてきた。第1段階では, 完全MSEスコア0.00の決定木回帰器と, 勾配ブースティング回帰器を用いて, バイオチャー収率の最良の予測値を得た。第2相をML法とDL法の両方を用いて解析した。 ML内では,DNNをディープラーニング技術として用いた精度スコア0.972のSVRが最も便利なモデルであることが判明した。最後に, 第3相において, バイオチャー収率を40%以下とした場合, 加熱速度を伴わない2次分類をバイオチャー収率に適用した。 MLの最良のテクニックはSupport Vectorで、次にRandom forestが続き、ANNが最も適したDeep Learning Techniqueだった。

Pyrolysis has proven to be a groundbreaking technique for effectively utilising natural and man-made biomass products such as plastics, wood, crop residue, and fruit peels. Recent advancements have shown a greater yield of essential products like biochar, bio-oil, and other non-condensable gases when different biomasses are blended in a certain ratio. This synergy effect of combining two pyrolytic raw materials, i.e., co-pyrolysis of algae and wood biomass, has been systematically studied and grouped into three phases in this paper: kinetic analysis of co-pyrolysis; correlation of proximate and ultimate analysis with biochar yield; and grouping of different weight ratios based on biochar yield up to a certain percentage. Different ML and DL algorithms have been utilized for regression and classification to give a comprehensive overview of the effect of the synergy of two different biomass materials on biochar yield. For the first phase, the best prediction of biochar yield was obtained using a decision tree regressor with a perfect MSE score of 0.00, followed by a gradient-boosting regressor. The second phase was analyzed using both ML and DL techniques. Within ML, SVR proved to be the most convenient model with an accuracy score of 0.972, with a DNN employed as the deep learning technique. Finally, for the third phase, binary classification was applied to biochar yield, with and without heating rate, for biochar yield percentages above and below 40%. The best ML technique was Support Vector, followed by Random Forest, while ANN was the most suitable deep learning technique.

翻訳日:2024-05-21 14:03:49 公開日:2024-05-20

# FeTT:特徴変換チューニングによる連続的な授業インクリメンタルラーニング

FeTT: Continual Class Incremental Learning via Feature Transformation Tuning ( http://arxiv.org/abs/2405.11822v1 )

ライセンス: Link先を確認

Sunyuan Qiang, Xuxin Lin, Yanyan Liang, Jun Wan, Du Zhang,

(参考訳) 継続学習(CL)は、静的で囲われた環境から動的で複雑なシナリオへ、より深いモデルを拡張することを目的としており、システムは、以前に学習した知識を忘れずに、新しいカテゴリの新しい知識を継続的に取得できる。最近のCLモデルは、パラメータ効率の細かい調整(PEFT)戦略による事前学習モデル(PTM)の利用に徐々に移行している。しかし, 連続的な微調整は, 従来のタスクデータが欠如していることから, 破滅的な忘れ込みが深刻な課題となっている。さらに、ファインチューン・テン・フリーズ機構は、最初のCLタスクにおける特徴チャネルの抑制と不十分なトレーニングデータによる性能制限に悩まされる。そこで本研究では, CLトレーニングデータから独立して動作するだけでなく, 過剰な抑制を防止するために, 特徴チャネルを円滑にする機能変換チューニング(FeTT)モデルを提案する。そして,FeTTモデルに異なるPTMを組み込んだ拡張アンサンブル戦略により,さらなる性能向上が期待できる。さらに, クラス境界分布と特徴チャネルの相違点の観点から, ファインチューン・テン・フリーズパラダイムとFeTTモデルの議論について詳しく述べる。 CLベンチマークの大規模な実験により,提案手法の有効性が検証された。

Continual learning (CL) aims to extend deep models from static and enclosed environments to dynamic and complex scenarios, enabling systems to continuously acquire new knowledge of novel categories without forgetting previously learned knowledge. Recent CL models have gradually shifted towards the utilization of pre-trained models (PTMs) with parameter-efficient fine-tuning (PEFT) strategies. However, continual fine-tuning still presents a serious challenge of catastrophic forgetting due to the absence of previous task data. Additionally, the fine-tune-then-frozen mechanism suffers from performance limitations due to feature channel suppression and insufficient training data in the first CL task. To this end, this paper proposes the feature transformation tuning (FeTT) model to non-parametrically fine-tune backbone features across all tasks, which not only operates independently of CL training data but also smooths feature channels to prevent excessive suppression. Then, an extended ensemble strategy incorporating different PTMs with the FeTT model facilitates further performance improvement. We further discuss the fine-tune-then-frozen paradigm and the FeTT model from the perspectives of discrepancy in class marginal distributions and feature channels. Extensive experiments on CL benchmarks validate the effectiveness of our proposed method.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# 光場画像再構成のためのdpMVからデュアルピクセルへのステレオ知識蒸留

Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction ( http://arxiv.org/abs/2405.11823v1 )

ライセンス: Link先を確認

Aryan Garg, Raghav Mallampali, Akshat Joshi, Shrisudhan Govindarajan, Kaushik Mitra,

(参考訳) デュアルピクセルは、デフォーカスぼけから生じる不透明な手がかりを含む。この異質な情報は、自動運転から3Dクリエイティブリアリズムまで、多くのビジョンタスクに役立ちます。しかし、デュアルピクセルとの差を直接推定するのは正確ではない。この研究は、暗黙的または明示的に、高精度な暗黒ステレオ知識を効率の良いデュアルピクセルの学生ネットワークに蒸留することで、忠実な再構築を可能にするという仮説を立てた。このダークナレッジ蒸留は、パラメータと推論時間効率を劇的に増加させながら、ステレオ同期セットアップとキャリブレーションコストを緩和する。暗黒知識蒸留仮説を検証するため,第1,第1,第2の2画素ビデオデータセットdpMVを収集した。これらの手法は純粋に単分子解よりも優れており、特にデュアルピクセルからの忠実なガイダンスを用いて、前景と背景の分離に挑戦する。最後に,dpMVによるアンロックと暗黙の暗黙の知識蒸留を,光電場(LF)ビデオ再構成のための教師のアンサンブルから示す。我々のLFビデオ再構成法は,現在までに最も高速かつ時間的に一貫性がある。高パラメータ効率、暗黙の非閉塞処理、ゼロショットのクロスデータセット転送、高次空間角分解能の幾何的一貫した推論、適応的ベースライン制御など、多くの重要な特性を提供する一方で、再現性には競争力がある。すべてのソースコードは匿名リポジトリ https://github.com/Aryan-Garg で入手できる。

Dual pixels contain disparity cues arising from the defocus blur. This disparity information is useful for many vision tasks ranging from autonomous driving to 3D creative realism. However, directly estimating disparity from dual pixels is less accurate. This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks enables faithful reconstructions. This dark knowledge distillation should also alleviate stereo-synchronization setup and calibration costs while dramatically increasing parameter and inference time efficiency. We collect the first and largest 3-view dual-pixel video dataset, dpMV, to validate our explicit dark knowledge distillation hypothesis. We show that these methods outperform purely monocular solutions, especially in challenging foreground-background separation regions using faithful guidance from dual pixels. Finally, we demonstrate an unconventional use case unlocked by dpMV and implicit dark knowledge distillation from an ensemble of teachers for Light Field (LF) video reconstruction. Our LF video reconstruction method is the fastest and most temporally consistent to date. It remains competitive in reconstruction fidelity while offering many other essential properties like high parameter efficiency, implicit disocclusion handling, zero-shot cross-dataset transfer, geometrically consistent inference on higher spatial-angular resolutions, and adaptive baseline control. All source code is available at the anonymous repository https://github.com/Aryan-Garg.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# 開量子系におけるフォン・ノイマンエントロピーの時間発展

Time evolution of the von Neumann entropy in open quantum system ( http://arxiv.org/abs/2405.11824v1 )

ライセンス: Link先を確認

Kohei Kobayashi,

(参考訳) オープン量子力学の制御は量子技術の実現に大きな関心を持つ。したがって、デコヒーレンスの下でのオープン量子系のエントロピーを定量化し、特徴付けることは重要なタスクである。本稿では、リンドブラッドマスター方程式によって記述された開量子系に対するフォン・ノイマンエントロピーの時間発展について研究する。特に、デコヒーレンスが系の可観測性の測定に対応するとき、フォン・ノイマンのエントロピーは分散が大きくなるにつれて単調に増加する傾向があることに注意されたい。さらに、フォン・ノイマンエントロピーの下界を長時間の極限に示す。この下界は直接計算され、一般的なマルコフ開量子系に適用できるという利点がある。

Control of open quantum dynamics is of great interest for realizing quantum technologies. Therefore, it is an important task to quantify and characterize the entropy of open quantum systems under decoherence. In this paper, we study the time evolution of the von Neumann entropy for open quantum systems described by the Lindblad master equation. In particular, when the decoherence corresponds to the measurement of an observable of the system, the von Neumann entropy tends to increase monotonically as the variance becomes larger. Furthermore, we present a lower bound on the von Neumann entropy in the long-time limit. This lower bound has the advantages of being straightforward to calculate and applicable to general Markovian open quantum systems.
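
To make the entropy growth under measurement-like decoherence concrete, here is a minimal sketch for a single qubit. It is not the paper's derivation. It assumes a zero Hamiltonian and one jump operator $\sqrt{\gamma}\,\sigma_z$ (pure dephasing), for which the populations stay fixed and the off-diagonal element decays as $e^{-2\gamma t}$. The function name and parameters are hypothetical.

```python
import math


def dephased_qubit_entropy(p0, coherence, gamma, t):
    """Von Neumann entropy (in nats) of a qubit after pure dephasing for time t.

    Assumes H = 0 and a single Lindblad jump operator sqrt(gamma) * sigma_z.
    p0 is the population of the first basis state and coherence is the initial
    off-diagonal density-matrix element (|coherence|^2 <= p0 * (1 - p0)).
    """
    # Off-diagonal element after time t; populations do not change.
    c = abs(coherence) * math.exp(-2.0 * gamma * t)
    # Eigenvalues of [[p0, c], [c, 1 - p0]] are 1/2 +/- radius.
    radius = math.sqrt((p0 - 0.5) ** 2 + c ** 2)
    eigenvalues = (0.5 + radius, 0.5 - radius)
    # S = -sum(lambda * ln(lambda)); zero eigenvalues contribute nothing.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-15)


# Hypothetical usage: entropy of the |+> state sampled at a few times.
entropies = [dephased_qubit_entropy(0.5, 0.5, 1.0, t) for t in (0.0, 0.1, 0.5, 1.0)]
```

In this toy case the entropy starts at zero for a pure state and rises monotonically toward $\ln 2$, the value for the fully dephased state.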

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# AIベースの競争プラットフォームにおける技術的負債の測定

Measuring Technical Debt in AI-Based Competition Platforms ( http://arxiv.org/abs/2405.11825v1 )

ライセンス: Link先を確認

Dionysios Sklavenitis, Dimitris Kalles,

(参考訳) AIの進歩は、ソフトウェアエンジニアリングプロジェクトにおける新しいタイプの技術的負債につながった。 AIベースの競争プラットフォームは、迅速なプロトタイピングと、参加者によるソフトウェアエンジニアリング原則の遵守の欠如により、技術的負債が発生しているため、課題に直面している。さらに、オーガナイザはプラットフォームの品質を評価する方法がなく、持続可能性や保守性に影響を与えます。本研究では,スクーピングレビューを通じて,AIシステムにおける技術的負債の種類を特定し,分類する。我々は,AIコンペティションプラットフォームにおける技術的負債の評価,アルゴリズム,アーキテクチャ,コード,構成,データなど,さまざまなタイプの負債を分類するアンケートを開発する。 AIコンペティションプラットフォームに特化したアクセシビリティ負債を導入し、不適切なプラットフォームのユーザビリティのために参加者が直面する課題を強調します。技術的負債を管理するためのフレームワークは、これらのプラットフォームの持続可能性と有効性を改善し、研究者、オーガナイザ、参加者にツールを提供することを目的としています。

Advances in AI have led to new types of technical debt in software engineering projects. AI-based competition platforms face challenges due to rapid prototyping and a lack of adherence to software engineering principles by participants, resulting in technical debt. Additionally, organizers often lack methods to evaluate platform quality, impacting sustainability and maintainability. In this research, we identify and categorize types of technical debt in AI systems through a scoping review. We develop a questionnaire for assessing technical debt in AI competition platforms, categorizing debt into various types, such as algorithm, architectural, code, configuration, and data debt. We introduce Accessibility Debt, specific to AI competition platforms, highlighting challenges participants face due to inadequate platform usability. Our framework for managing technical debt aims to improve the sustainability and effectiveness of these platforms, providing tools for researchers, organizers, and participants.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# 不完全なセンシングモダリティによるフェデレーション学習

Federated Learning with Incomplete Sensing Modalities ( http://arxiv.org/abs/2405.11828v1 )

ライセンス: Link先を確認

Adiba Orzikulova, Jaehyun Kwak, Jaemin Shin, Sung-Ju Lee,

(参考訳) 多くのモバイルセンシングアプリケーションは、モバイルおよびウェアラブルデバイスにおけるモーションや生理的センサーなど、さまざまなモダリティのデータを活用している。フェデレートラーニング(FL)は、プライバシー保護機能のおかげで、これらのアプリケーションに特に適しています。しかし、バッテリ寿命の制限、ネットワーク条件の低さ、センサーの故障といった課題は、ローカルモデルトレーニングで利用可能なすべてのモダリティの使用を制限する可能性がある。さらに、既存のマルチモーダルFLシステムは、モダリティ源の数が増えるにつれてスケーラビリティと効率性にも苦慮している。これらの問題に対処するため,不完全なマルチモーダルFLを実現するためのフレームワークであるFLISMを紹介する。 FLISMはシミュレーション手法を利用して、欠落したモダリティを処理できる堅牢な表現を学習し、様々なモダリティセットを持つクライアント間でモデル知識を転送する。 3つの実世界のデータセットとシミュレーションによる評価結果は、FLISMのモデル性能とシステム効率の効果的なバランスを示す。 F1スコアでの.067の平均的な改善に加えて、通信(2.69倍高速)と計算(2.28倍高速)のオーバーヘッドを、不完全なモダリティに対処する既存の方法と比較して削減している。さらに、多くのモダリティを持つタスクを含むシミュレーションシナリオでは、FLISMは通信速度が3.23x~85.10x、計算効率が3.73x~32.29xである。

Many mobile sensing applications utilize data from various modalities, including motion and physiological sensors in mobile and wearable devices. Federated Learning (FL) is particularly suitable for these applications thanks to its privacy-preserving feature. However, challenges such as limited battery life, poor network conditions, and sensor malfunctions can restrict the use of all available modalities for local model training. Additionally, existing multimodal FL systems struggle with scalability and efficiency as the number of modality sources increases. To address these issues, we introduce FLISM, a framework designed to enable multimodal FL with incomplete modalities. FLISM leverages a simulation technique to learn robust representations that can handle missing modalities and transfers model knowledge across clients with varying sets of modalities. The evaluation results using three real-world datasets and simulations demonstrate FLISM's effective balance between model performance and system efficiency. It shows an average improvement of 0.067 in F1-score, while also reducing communication (2.69x faster) and computational (2.28x more efficient) overheads compared to existing methods addressing incomplete modalities. Moreover, in simulated scenarios involving tasks with a larger number of modalities, FLISM achieves a significant speedup of 3.23x to 85.10x in communication and 3.73x to 32.29x in computational efficiency.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# Adversarially Diversified Rehearsal Memory (ADRM):連続学習におけるメモリ過剰化課題の緩和

Adversarially Diversified Rehearsal Memory (ADRM): Mitigating Memory Overfitting Challenge in Continual Learning ( http://arxiv.org/abs/2405.11829v1 )

ライセンス: Link先を確認

Hikmat Khan, Ghulam Rasool, Nidhal Carla Bouaynaya,

(参考訳) 継続的な学習は、それまでの知識を忘れずに、静止しないデータ分布を学習することに焦点を当てる。リハーサルベースのアプローチは、破滅的な忘れに対処するために一般的に使用される。しかし、これらのアプローチは「リハーサルメモリオーバーフィット」と呼ばれる問題に悩まされ、モデルが限られたメモリサンプルに過度に特化し、効果的に一般化する能力を失う。その結果、リハーサル記憶の有効性は徐々に低下し、最終的には学習したタスクを破滅的に忘れてしまう。本稿では、メモリ過適合問題に対処するため、ADRM(Adversarially Diversified Rehearsal Memory)を導入する。本手法は, 自然および逆方向のノイズ破壊に対して, メモリサンプルの多様性を増進し, 耐性を高めるために設計されている。 ADRMはFGSM攻撃を使用して、逆修正されたメモリサンプルを導入し、メモリの多様性の向上と、メモリサンプルにおける連続的な機能ドリフトに対する堅牢な応答の促進という2つの主要な目的を達成する。第一に、ADRMはFGSMを用いてメモリバッファの複雑さを多様化し増大させ、リハーサルメモリに過度に適合する。第2に、ADRMはメモリ過適合を緩和し、安全クリティカルなアプリケーションに欠かせないCLモデルの堅牢性を著しく改善することを示した。最後に,ADRMがCLメモリサンプルのドリフトを緩和し,破滅的忘れを著しく低減し,より弾力性のあるCLモデルが得られることを示す。さらに,特徴分布の詳細なt-SNE可視化と特徴類似性の定量化により,既存のCLアプローチにおける特徴表現の理解を深めることができた。私たちのコードは https://github.com/hikmatkhan/ADRM で公開されています。

Continual learning focuses on learning non-stationary data distributions without forgetting previous knowledge. Rehearsal-based approaches are commonly used to combat catastrophic forgetting. However, these approaches suffer from a problem called "rehearsal memory overfitting," where the model becomes too specialized on limited memory samples and loses its ability to generalize effectively. As a result, the effectiveness of the rehearsal memory progressively decays, ultimately resulting in catastrophic forgetting of the learned tasks. We introduce the Adversarially Diversified Rehearsal Memory (ADRM) to address the memory overfitting challenge. This novel method is designed to enrich memory sample diversity and bolster resistance against natural and adversarial noise disruptions. ADRM employs FGSM attacks to introduce adversarially modified memory samples, achieving two primary objectives: enhancing memory diversity and fostering a robust response to continual feature drifts in memory samples. Our contributions are as follows: Firstly, ADRM addresses overfitting in rehearsal memory by employing FGSM to diversify and increase the complexity of the memory buffer. Secondly, we demonstrate that ADRM mitigates memory overfitting and significantly improves the robustness of CL models, which is crucial for safety-critical applications. Finally, our detailed analysis of features and visualization demonstrates that ADRM mitigates feature drifts in CL memory samples, significantly reducing catastrophic forgetting and resulting in a more resilient CL model. Additionally, our in-depth t-SNE visualizations of feature distribution and the quantification of the feature similarity further enrich our understanding of feature representation in existing CL approaches. Our code is publicly available at https://github.com/hikmatkhan/ADRM.
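
The FGSM step that the abstract relies on can be sketched in a few lines. The example below applies the standard fast gradient sign method, $x_{\mathrm{adv}} = x + \epsilon\,\mathrm{sign}(\nabla_x \mathcal{L})$, to a tiny logistic-regression model so that the gradient can be written by hand. The model, the function names, and the numbers are assumptions for illustration only; how ADRM selects and stores the perturbed memory samples is not described here.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(x, y, w, b, epsilon):
    """Return an FGSM-perturbed copy of x for a logistic-regression model.

    x and w are lists of floats, b is the bias, y is the label in {0, 1}.
    The loss is binary cross-entropy, whose gradient with respect to the
    input is (p - y) * w, where p is the predicted probability of class 1.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad_x = [(p - y) * wi for wi in w]
    # Move each input coordinate by epsilon in the direction that raises the loss.
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]


# Hypothetical usage: perturb one stored sample before replaying it.
memory_sample = [1.0, -2.0]
adversarial_sample = fgsm_perturb(memory_sample, 1, [0.5, -0.25], 0.0, 0.1)
```

Each coordinate moves by exactly `epsilon`, so the perturbed sample stays close to the original while the model's loss on it goes up.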

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# SSAMBA:マンバ状態空間モデルによる自己監督型音声表現学習

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ( http://arxiv.org/abs/2405.11831v1 )

ライセンス: Link先を確認

Siavash Shams, Sukru Samet Dindar, Xilin Jiang, Nima Mesgarani,

(参考訳) トランスフォーマーは、強力なモデリング能力のために、音声表現学習を含む様々なタスクにわたってディープラーニングに革命をもたらした。しかし、GPUメモリ使用量と計算推論時間の両方で2次複雑さに悩まされ、その効率に影響を及ぼすことが多い。最近、Mambaのような状態空間モデル(SSM)は、これらの複雑さを回避してより効率的なアプローチを提供する、有望な代替手段として登場した。これらの利点を踏まえ、音声タスクにおけるSSMベースのモデルの可能性について検討する。本稿では,SSAMBA(Self-Supervised Audio Mamba)について紹介する。 SSAMBAは双方向のMambaを利用して複雑なオーディオパターンを効果的にキャプチャする。我々は、識別的および生成的目的の両方を最適化し、大規模でラベルなしのデータセットから堅牢な音声表現を学習できる自己教師付き事前学習フレームワークを組み込んだ。音声分類やキーワードスポッティング,話者識別など,様々なタスクにおけるSSAMBAの評価を行った。以上の結果から,SSAMBAはSSAST(Self-Supervised Audio Spectrogram Transformer)よりも優れていることがわかった。特に、SSAMBAはバッチ推論速度で約92.7%高速で、入力トークンサイズが22kの小さなモデルサイズではSSASTよりも95.4%メモリ効率が高い。これらの効率向上と優れたパフォーマンスが相まって、SSAMBAのアーキテクチャ革新の有効性を強調し、幅広いオーディオ処理アプリケーションにとって魅力的な選択となった。

Transformers have revolutionized deep learning across various tasks, including audio representation learning, due to their powerful modeling capabilities. However, they often suffer from quadratic complexity in both GPU memory usage and computational inference time, affecting their efficiency. Recently, state space models (SSMs) like Mamba have emerged as a promising alternative, offering a more efficient approach by avoiding these complexities. Given these advantages, we explore the potential of SSM-based models in audio tasks. In this paper, we introduce Self-Supervised Audio Mamba (SSAMBA), the first self-supervised, attention-free, and SSM-based model for audio representation learning. SSAMBA leverages the bidirectional Mamba to capture complex audio patterns effectively. We incorporate a self-supervised pretraining framework that optimizes both discriminative and generative objectives, enabling the model to learn robust audio representations from large-scale, unlabeled datasets. We evaluated SSAMBA on various tasks such as audio classification, keyword spotting, and speaker identification. Our results demonstrate that SSAMBA outperforms the Self-Supervised Audio Spectrogram Transformer (SSAST) in most tasks. Notably, SSAMBA is approximately 92.7% faster in batch inference speed and 95.4% more memory-efficient than SSAST for the tiny model size with an input token size of 22k. These efficiency gains, combined with superior performance, underscore the effectiveness of SSAMBA's architectural innovation, making it a compelling choice for a wide range of audio processing applications.
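
The efficiency argument in the abstract rests on the recurrence at the core of a state space model, which processes a sequence in a single pass. The sketch below runs a scalar, fixed-parameter linear SSM, $h_t = a\,h_{t-1} + b\,x_t$, $y_t = c\,h_t$. It is a deliberately simplified illustration: Mamba uses vector-valued states and input-dependent (selective) parameters, neither of which is modelled here, and the function name is hypothetical.

```python
def ssm_scan(a, b, c, inputs, h0=0.0):
    """Run a scalar linear state space model over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    The loop touches each input once, so the cost grows linearly with
    sequence length.
    """
    h = h0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # state update
        outputs.append(c * h)  # readout
    return outputs


# Hypothetical usage: response to a unit impulse decays geometrically with a.
impulse_response = ssm_scan(0.5, 1.0, 2.0, [1.0, 0.0, 0.0])
```

Self-attention, by contrast, compares every token with every other token, which is the source of the quadratic cost the abstract mentions.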

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# トレーニング可能なサロゲートモデルへの非線形性の導入によるExplain-Any-Conceptの改善

Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model ( http://arxiv.org/abs/2405.11837v1 )

ライセンス: Link先を確認

Mounes Zaval, Sedat Ozer,

(参考訳) 説明可能なAI(XAI)の進化する分野では、コンピュータビジョンタスクにおけるディープニューラルネットワーク(DNN)の決定を解釈することが重要なプロセスである。ピクセルベースのXAIメソッドは重要なピクセルの識別に重点を置いているが、既存のコンセプトベースのXAIメソッドでは事前に定義された概念や人間による注釈が付けられている。最近提案されたSegment Anything Model (SAM)は、包括的なインスタンスセグメンテーションを通じて自動概念セットを作成するための大きな一歩を踏み出した。これに基づいて、DNN決定を説明するフレキシブルな方法として、EAC(Explain Any Concept)モデルが登場した。 EACモデルは、ターゲットモデルをシミュレートする訓練可能な1つの線形層を持つ代理モデルを用いている。本稿では,元のサロゲートモデルに新たな非線形層を導入することにより,EACモデルの性能を向上させることができることを示す。提案手法を元のEACモデルと比較し,ImageNetおよびMS COCOデータセットで得られた改善点を報告する。

In the evolving field of Explainable AI (XAI), interpreting the decisions of deep neural networks (DNNs) in computer vision tasks is an important process. While pixel-based XAI methods focus on identifying significant pixels, existing concept-based XAI methods use pre-defined or human-annotated concepts. The recently proposed Segment Anything Model (SAM) took a significant step toward preparing automatic concept sets via comprehensive instance segmentation. Building upon this, the Explain Any Concept (EAC) model emerged as a flexible method for explaining DNN decisions. The EAC model uses a surrogate model with one trainable linear layer to simulate the target model. In this paper, by introducing an additional nonlinear layer to the original surrogate model, we show that we can improve the performance of the EAC model. We compare our proposed approach to the original EAC model and report improvements obtained on both ImageNet and MS COCO datasets.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# ソーシャルインテリジェンスの評価とモデル化:人間とAIの能力の比較研究

Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities ( http://arxiv.org/abs/2405.11841v1 )

ライセンス: Link先を確認

Junqi Wang, Chunhui Zhang, Jiapeng Li, Yuxi Ma, Lixing Niu, Jiaheng Han, Yujia Peng, Yixin Zhu, Lifeng Fan,

(参考訳) 大規模言語モデル(LLM)がほぼ人間に近い知能レベルを達成したかどうかに関する現在の議論(Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023)の中で、現在の研究では、人間の認知の最も特徴的な側面である社会的知能を評価するためのベンチマークが紹介されている。我々は,社会力学の総合的理論的枠組みを開発し,逆推論(IR)と逆逆計画(IIP)の2つの評価タスクを導入した。また,人間の行動パターンの解明に長けた再帰的ベイズ推定に基づく計算モデルについても検討した。大規模な実験と詳細な分析により、人間は最新のGPTモデルを上回るパフォーマンス、ゼロショット学習、ワンショット一般化、マルチモダリティへの適応性を示した。特に、GPTモデルは、ヒトの社会的知能(オーダー>=2)とは対照的に、最も基本的な順序(オーダー=0)でのみ社会的知能を示す。さらなる調査は、LLMがショートカットのパターン認識に頼ることの正当性を示し、真の人間レベルの社会知能の所有に疑念を抱いた。私たちのコード、データセット、付録、人間のデータは https://github.com/bigai-ai/Evaluate-n-Model-Social-Intelligence で公開されています。

Facing the current debate on whether Large Language Models (LLMs) attain near-human intelligence levels (Mitchell & Krakauer, 2023; Bubeck et al., 2023; Kosinski, 2023; Shiffrin & Mitchell, 2023; Ullman, 2023), the current study introduces a benchmark for evaluating social intelligence, one of the most distinctive aspects of human cognition. We developed a comprehensive theoretical framework for social dynamics and introduced two evaluation tasks: Inverse Reasoning (IR) and Inverse Inverse Planning (IIP). Our approach also encompassed a computational model based on recursive Bayesian inference, adept at elucidating diverse human behavioral patterns. Extensive experiments and detailed analyses revealed that humans surpassed the latest GPT models in overall performance, zero-shot learning, one-shot generalization, and adaptability to multi-modalities. Notably, GPT models demonstrated social intelligence only at the most basic order (order = 0), in stark contrast to human social intelligence (order >= 2). Further examination indicated a propensity of LLMs to rely on pattern recognition for shortcuts, casting doubt on their possession of authentic human-level social intelligence. Our code, dataset, appendix, and human data are released at https://github.com/bigai-ai/Evaluate-n-Model-Social-Intelligence.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# EPPS:エッジ情報注入と選択的特徴分離による高度なポリプセグメンテーション

EPPS: Advanced Polyp Segmentation via Edge Information Injection and Selective Feature Decoupling ( http://arxiv.org/abs/2405.11846v1 )

ライセンス: Link先を確認

Mengqi Lei, Xin Wang,

(参考訳) 大腸内視鏡検査におけるポリープの正確な分画は早期大腸癌の診断と管理に不可欠である。ポリプセグメンテーションの深層学習の進歩にもかかわらず、持続的な制限は持続する。ポリプのエッジは、典型的にはあいまいであり、背景から識別することが困難であり、モデルの性能は、無関係または重要でない特徴の影響によってしばしば損なわれる。これらの課題を軽減するために,我々はエッジ・プライオライト化ポリプ・セグメンテーション (EPPS) と呼ばれる新しいモデルを提案する。具体的には,ポリプのエッジを正確に抽出することを目的としたエッジマッピングエンジン(EME)を組み込んだ。その後、捕獲されたエッジ情報をデコーダブロックに注入することにより、マスク予測を強化するためにエッジ情報インジェクタ(EII)が考案される。さらに,選択的特徴分離器(Selective Feature Decoupler,SFD)と呼ばれるコンポーネントを導入し,モデルに対するノイズや外的特徴の影響を抑える。広範に使われている3つのポリプセグメンテーションベンチマークの大規模な実験は、他の最先端手法と比較して、我々の手法の優れた性能を示している。

Accurate segmentation of polyps in colonoscopy images is essential for early-stage diagnosis and management of colorectal cancer. Despite advancements in deep learning for polyp segmentation, limitations persist. The edges of polyps are typically ambiguous, making them difficult to discern from the background, and the model performance is often compromised by the influence of irrelevant or unimportant features. To alleviate these challenges, we propose a novel model named Edge-Prioritized Polyp Segmentation (EPPS). Specifically, we incorporate an Edge Mapping Engine (EME) aimed at accurately extracting the edges of polyps. Subsequently, an Edge Information Injector (EII) is devised to augment the mask prediction by injecting the captured edge information into Decoder blocks. Furthermore, we introduce a component called Selective Feature Decoupler (SFD) to suppress the influence of noise and extraneous features on the model. Extensive experiments on 3 widely used polyp segmentation benchmarks demonstrate the superior performance of our method compared with other state-of-the-art approaches.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# シーケンスモデリングのためのオルタネータ

Alternators For Sequence Modeling ( http://arxiv.org/abs/2405.11848v1 )

ライセンス: Link先を確認

Mohammad Reza Rezaei, Adji Bousso Dieng,

(参考訳) 本稿では、列に対する非マルコフ力学モデルの新しいファミリである交代子について紹介する。交替器は、観測軌跡ネットワーク(OTN)と特徴軌跡ネットワーク(FTN)の2つのニューラルネットワークを備える。 OTNとFTNは共同で働き、観測空間にサンプルを出力するのと、周期的にいくつかの特徴空間を出力するのとを交互に交互に行う。 OTNとFTNのパラメータは時間依存ではなく、軌道上の最小エントロピー基準によって学習される。オルタネーターは万能である。動的潜在変数生成モデルやシーケンス・ツー・シーケンス予測モデルとして使用できる。振動子を生成モデルとして使用すると、FTNは解釈可能な低次元潜伏変数を生成し、観測を司るダイナミクスを捉える。変換器をシーケンス・ツー・シーケンス予測器として使用すると、FTNは観測された特徴を予測することを学習する。いずれの場合も、OTNはデータにマッチするシーケンスを生成することを学ぶ。オルタネータは、複雑なシーケンシャルなデータに基づく潜伏するダイナミクスを明らかにし、行方不明なデータを正確に予測し、インプットし、新しいトラジェクトリをサンプリングすることができる。 3つのアプリケーションで交換器の能力を示す。私たちは最初に、カオス的な振る舞いを記述するためにしばしば使用されるローレンツ方程式をモデル化するために、交代子を使用した。次に、脳活動を身体活動にマッピングするために、交互に神経科学に適用した。最後に, 海面温度予測に焦点をあてて, 気候科学に改質器を適用した。全ての実験において、置換体は訓練が安定であり、サンプリングが早く、高品質な生成サンプルと潜伏変数が得られ、研究領域におけるニューラルなODEや拡散モデルなどの強力なベースラインよりも優れていた。

This paper introduces alternators, a novel family of non-Markovian dynamical models for sequences. An alternator features two neural networks: the observation trajectory network (OTN) and the feature trajectory network (FTN). The OTN and the FTN work in conjunction, alternating between outputting samples in the observation space and some feature space, respectively, over a cycle. The parameters of the OTN and the FTN are not time-dependent and are learned via a minimum cross-entropy criterion over the trajectories. Alternators are versatile. They can be used as dynamical latent-variable generative models or as sequence-to-sequence predictors. When alternators are used as generative models, the FTN produces interpretable low-dimensional latent variables that capture the dynamics governing the observations. When alternators are used as sequence-to-sequence predictors, the FTN learns to predict the observed features. In both cases, the OTN learns to produce sequences that match the data. Alternators can uncover the latent dynamics underlying complex sequential data, accurately forecast and impute missing data, and sample new trajectories. We showcase the capabilities of alternators in three applications. We first used alternators to model the Lorenz equations, often used to describe chaotic behavior. We then applied alternators to Neuroscience, to map brain activity to physical activity. Finally, we applied alternators to Climate Science, focusing on sea-surface temperature forecasting. In all our experiments, we found alternators are stable to train, fast to sample from, yield high-quality generated samples and latent variables, and outperform strong baselines such as neural ODEs and diffusion models in the domains we studied.
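
The alternation between the two networks can be sketched as below. This is a toy illustration only: `otn_step`, `ftn_step`, the linear maps standing in for neural networks, the noise scale, and the exact conditioning (the OTN reading the previous feature, the FTN reading the new observation and the previous feature) are all assumptions made for this sketch, not details given in the abstract.

```python
import random

def otn_step(z_prev, rng, noise=0.1):
    # Hypothetical stand-in for the observation trajectory network:
    # maps the previous feature to a sample in observation space.
    return 2.0 * z_prev + rng.gauss(0.0, noise)

def ftn_step(x_t, z_prev, rng, noise=0.1):
    # Hypothetical stand-in for the feature trajectory network:
    # maps the new observation and the previous feature to the next feature.
    return 0.5 * z_prev + 0.25 * x_t + rng.gauss(0.0, noise)

def sample_trajectory(z0, steps, seed=0):
    """Alternate OTN and FTN calls; the same time-independent
    parameters are reused at every step."""
    rng = random.Random(seed)
    xs, zs = [], [z0]
    for _ in range(steps):
        x_t = otn_step(zs[-1], rng)        # observation-space sample
        z_t = ftn_step(x_t, zs[-1], rng)   # feature-space sample
        xs.append(x_t)
        zs.append(z_t)
    return xs, zs

xs, zs = sample_trajectory(z0=1.0, steps=5)
```

In this sketch the feature sequence `zs` plays the role of the latent trajectory and `xs` the generated observations; a real alternator would replace both step functions with trained networks.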

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# 視覚言語モデルにおける見過ごされた側面の再考

Rethinking Overlooked Aspects in Vision-Language Models ( http://arxiv.org/abs/2405.11850v1 )

ライセンス: Link先を確認

Yuan Liu, Le Tian, Xiao Zhou, Jie Zhou,

(参考訳) GPT4-VやLLaVAのような大規模視覚言語モデル(LVLM)の最近の進歩は顕著である。 LLaVAのモジュラーアーキテクチャは、特に単純さと効率性をブレンドしている。最近の研究は、モデルの性能を向上させるために、事前学習と指導のチューニングデータの導入に重点を置いている。本稿では,事前学習におけるデータ効率の非無視的な側面と,訓練データセットの選択過程について述べる。我々の研究は、単に事前学習データのサイズを拡大するだけでは性能が向上せず、実際にその劣化につながる可能性を示唆している。さらに、我々は、SFTデータセットをピンポイントするパイプラインを構築し、既存の研究で活用されているすべてのSFTデータが必要ないことを示唆している。本論文の主な目的は,最先端モデルの導入ではなく,事前学習および微調整プロセスにおけるデータ使用量の最適化を目標とし,ビジョン言語モデルの性能向上を目的とした今後の研究のロードマップとして機能することである。

Recent advancements in large vision-language models (LVLMs), such as GPT4-V and LLaVA, have been substantial. LLaVA's modular architecture, in particular, offers a blend of simplicity and efficiency. Recent works mainly focus on introducing more pre-training and instruction tuning data to improve model's performance. This paper delves into the often-neglected aspects of data efficiency during pre-training and the selection process for instruction tuning datasets. Our research indicates that merely increasing the size of pre-training data does not guarantee improved performance and may, in fact, lead to its degradation. Furthermore, we have established a pipeline to pinpoint the most efficient instruction tuning (SFT) dataset, implying that not all SFT data utilized in existing studies are necessary. The primary objective of this paper is not to introduce a state-of-the-art model, but rather to serve as a roadmap for future research, aiming to optimize data usage during pre-training and fine-tuning processes to enhance the performance of vision-language models.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# ストーリーテリングの進化:拡散モデルを用いた新しいキャラクタカスタマイズのためのベンチマークと方法

Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models ( http://arxiv.org/abs/2405.11852v1 )

ライセンス: Link先を確認

Xiyu Wang, Yufei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot,

(参考訳) ストーリービジュアライゼーションのための拡散モデルでは、ストーリーテリングタスクのためのコンテンツコヒーレントな画像を生成することが期待できる。しかし、文字の一貫性を維持しつつ、新しい文字を既存の物語に効果的に統合する方法は、特に限られたデータでは未解決の問題である。 1)潜在的な文字リークと一貫性のないテキストラベリングによる適切なベンチマークがないこと,2)新しい文字と古い文字を区別することの難しさ,そして曖昧な結果をもたらすこと,である。これらの課題に対処するために、生成モデルの適応性を評価するために設計された改良データセットからなるNewEpisodeベンチマークを導入する。洗練されたデータセットは、洗練されたテキストプロンプトと文字のリークを除去する。さらに、生成した結果の文字混乱を軽減するために、新しい文字をシームレスに統合した単一ストーリーで拡散に基づくビジュアルストーリー生成モデルをカスタマイズする手法であるEpicEvoを提案する。 EpicEvoは、新たな逆キャラクタアライメントモジュールを導入し、生成した画像を拡散過程において段階的に整列させ、新しいキャラクタの模範的なイメージを付加するとともに、知識蒸留を適用して文字や背景の詳細の忘れを防止する。 EpicEvoはNewEpisodeベンチマークで既存のベースラインよりも優れており、定性的な研究により拡散モデルにおける視覚的ストーリー生成の優れたカスタマイズが確認されている。要約すると、EpicEvoは1つの例だけを使って新しいキャラクターを組み込む効果的な方法を提供する。

Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, how to effectively integrate new characters into existing narratives while maintaining character consistency remains an open problem, particularly with limited data. Two major limitations hinder the progress: (1) the absence of a suitable benchmark due to potential character leakage and inconsistent text labeling, and (2) the challenge of distinguishing between new and old characters, leading to ambiguous results. To address these challenges, we introduce the NewEpisode benchmark, comprising refined datasets designed to evaluate generative models' adaptability in generating new stories with fresh characters using just a single example story. The refined dataset involves refined text prompts and eliminates character leakage. Additionally, to mitigate the character confusion of generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters seamlessly integrating them into established character dynamics. EpicEvo introduces a novel adversarial character alignment module to align the generated images progressively in the diffusive process, with exemplar images of new characters, while applying knowledge distillation to prevent forgetting of characters and background details. Our evaluation quantitatively demonstrates that EpicEvo outperforms existing baselines on the NewEpisode benchmark, and qualitative studies confirm its superior customization of visual story generation in diffusion models. In summary, EpicEvo provides an effective way to incorporate new characters using only one example story, unlocking new possibilities for applications such as serialized cartoons.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# 再配置に基づく量子もつれの分離可能性と下界

Separability and lower bounds of quantum entanglement based on realignment ( http://arxiv.org/abs/2405.11861v1 )

ライセンス: Link先を確認

Jiaxin Sun, Hongmei Yao, Shao-Ming Fei, Zhaobing Fan,

(参考訳) 量子もつれの検出と推定は、量子もつれの理論における本質的な問題である。本研究では、密度行列の再配置(realignment)と縮約密度行列のベクトル化に基づいて行列を構成し、そこから二部系と多部系の両方に対する分離可能性基準の族を提示する。さらに、コンカレンスと凸ルーフ拡張ネガティビティの新しい下界を導出する。真の三部もつれを検出するための基準も与える。真の三部もつれのコンカレンスの下界を提示する。詳細な例により、我々の結果が、量子もつれおよび真の多部もつれの同定と推定において、対応する既存の結果よりも優れていることを示す。

The detection and estimation of quantum entanglement are the essential issues in the theory of quantum entanglement. We construct matrices based on the realignment of density matrices and the vectorization of the reduced density matrices, from which a family of separability criteria are presented for both bipartite and multipartite systems. Moreover, new lower bounds of concurrence and convex-roof extended negativity are derived. Criteria are also given to detect the genuine tripartite entanglement. Lower bounds of the concurrence of genuine tripartite entanglement are presented. By detailed examples we show that our results are better than the corresponding ones in identifying and estimating quantum entanglement as well as genuine multipartite entanglement.
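
For orientation, the standard realignment (computable cross-norm) criterion that constructions of this kind extend can be stated as follows. The symbol $\mathcal{R}$ and the index convention are assumptions chosen for this note; the paper's own matrices, which also involve vectorized reduced density matrices, are not reproduced here.

```latex
\[
\langle ij|\,\mathcal{R}(\rho)\,|kl\rangle = \langle ik|\,\rho\,|jl\rangle,
\qquad
\rho \ \text{separable} \;\Longrightarrow\; \|\mathcal{R}(\rho)\|_{\mathrm{tr}} \le 1 .
\]
```

For the two-qubit maximally entangled state, $\mathcal{R}(\rho) = \tfrac{1}{2} I_4$, so $\|\mathcal{R}(\rho)\|_{\mathrm{tr}} = 2 > 1$ and the criterion detects the entanglement.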

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# SEMv3:テーブル分離線検出のための高速かつロバストなアプローチ

SEMv3: A Fast and Robust Approach to Table Separation Line Detection ( http://arxiv.org/abs/2405.11862v1 )

ライセンス: Link先を確認

Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du,

(参考訳) テーブル構造認識(TSR)は、テーブル固有の構造を入力画像から解析することを目的としている。スプリット・アンド・マージ(split-and-merge)パラダイムは、テーブル構造を解析するための重要なアプローチであり、そこではテーブル分離線の検出が不可欠である。しかし、罫線のない表や変形した表などの課題が、この検出を困難にしている。本稿では、スプリット・アンド・マージのパラダイムに従い、テーブル分離線を高速かつロバストに検出する手法SEMv3(SEM: Split, Embed and Merge)を提案する。分割段階ではキーポイントオフセット回帰(KOR)モジュールを導入し、キーポイント提案に対する各分離線のオフセットを直接回帰することでテーブル分離線を効果的に検出する。さらに、マージ段階では、テーブルグリッドに基づいてテーブル構造を効率的に記述するための一連のマージアクションを定義する。大規模なアブレーション実験により、提案するKORモジュールはテーブル分離線を迅速かつ正確に検出できることが示された。さらに、パブリックデータセット(例えばWTW、ICDAR-2019 cTDaR Historical、iFLYTAB)では、SEMv3は最先端(SOTA)のパフォーマンスを達成する。コードは https://github.com/Chunchunwumu/SEMv3 で公開されている。

Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image. The "split-and-merge" paradigm is a pivotal approach to parsing table structure, in which table separation line detection is crucial. However, challenges such as wireless and deformed tables make this detection demanding. In this paper, we adhere to the "split-and-merge" paradigm and propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines. During the split stage, we introduce a Keypoint Offset Regression (KOR) module, which effectively detects table separation lines by directly regressing the offset of each line relative to its keypoint proposals. Moreover, in the merge stage, we define a series of merge actions to efficiently describe the table structure based on table grids. Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g. WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/Chunchunwumu/SEMv3.

翻訳日:2024-05-21 13:53:58 公開日:2024-05-20

# CoNLL#: きめ細かいエラー解析とCoNLL-03英語の修正テストセット

CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English ( http://arxiv.org/abs/2405.11865v1 )

ライセンス: Link先を確認

Andrew Rueda, Elena Álvarez Mellado, Constantine Lignos,

(参考訳) 現代のエンティティ認識システムは、より大きくより強力なニューラルモデルの時代において、パフォーマンスを着実に改善している。しかし、過去数年間、最先端の言語は、ベンチマークのCoNLL-03英語データセットで別の高原に到達したようだ。本稿では,高パフォーマンスなNERモデルのテスト出力を深く掘り下げ,テストセットに新たな文書レベルのアノテーションを導入することで,その性能を詳細に評価する。我々は、NERの真の最先端を解釈し、将来の作業を導くために、エラーを分類することで、F1スコアを超えます。我々は、テストセットの様々な欠陥を修正するための以前の試みをレビューし、CoNLL#を新たに修正したテストセットを紹介し、その体系的かつ最も一般的なエラーに対処し、低ノイズで解釈可能なエラー解析を可能にする。

Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models. However, over the past several years, the state-of-the-art has seemingly hit another plateau on the benchmark CoNLL-03 English dataset. In this paper, we perform a deep dive into the test outputs of the highest-performing NER models, conducting a fine-grained evaluation of their performance by introducing new document-level annotations on the test set. We go beyond F1 scores by categorizing errors in order to interpret the true state of the art for NER and guide future work. We review previous attempts at correcting the various flaws of the test set and introduce CoNLL#, a new corrected version of the test set that addresses its systematic and most prevalent errors, allowing for low-noise, interpretable error analysis.
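
To make the step from a single F1 score to categorized errors concrete, here is a minimal sketch. The span format `(start, end, type)` with exclusive `end`, the function names, and the three error categories are assumptions for illustration; they are not the taxonomy or annotation scheme used in the paper, and missed gold entities are deliberately left uncounted to keep the example short.

```python
def entity_f1(gold, pred):
    """Entity-level precision, recall and F1 over (start, end, type) spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

def categorize_errors(gold, pred):
    """Assign each incorrect prediction to a coarse, illustrative category."""
    gold_set = set(gold)
    gold_spans = {(s, e) for s, e, _ in gold}
    categories = []
    for s, e, t in pred:
        if (s, e, t) in gold_set:
            continue  # exact match, not an error
        if (s, e) in gold_spans:
            categories.append("type")      # right span, wrong label
        elif any(s < ge and gs < e for gs, ge in gold_spans):
            categories.append("boundary")  # overlaps a gold span
        else:
            categories.append("spurious")  # no gold entity here
    return categories

gold = [(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "LOC")]
pred = [(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "LOC"), (12, 13, "MISC")]
precision, recall, f1 = entity_f1(gold, pred)
errors = categorize_errors(gold, pred)
```

Two systems with the same F1 can differ sharply in the mix of `type`, `boundary`, and `spurious` errors, which is the kind of information a single score hides.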

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# センサ非依存深度推定のための深さプロンプト

Depth Prompting for Sensor-Agnostic Depth Estimation ( http://arxiv.org/abs/2405.11867v1 )

ライセンス: Link先を確認

Jin-Hwi Park, Chanhwi Jeong, Junoh Lee, Hae-Gon Jeon,

(参考訳) 高密度の深度マップは、視覚知覚タスクの重要な要素として使われてきた。最適化に基づくものから学習に基づくものまで、深度の品質を高めるための膨大な努力が続けられている。長年にわたる顕著な進歩にもかかわらず、密度、センシングパターン、スキャン範囲などの系統的な測定バイアスにより、現実世界での適用性は制限されている。これらのバイアスが手法の一般化を困難にしていることはよく知られている。最近の手法の多くが採用している入力モダリティ(例えば画像と深度)の合同表現の学習は、バイアスに敏感であることが観察された。本研究では、これらのモダリティを分離し、プロンプトエンジニアリングによってバイアスを軽減する。そこで我々は、センサタイプやシーン構成に由来する新たな深度分布に応じて望ましい特徴表現を可能にする、新しい深度プロンプトモジュールを設計する。我々の深度プロンプトは、単眼深度推定の基盤モデルに組み込むことができる。この埋め込みにより、事前学習したモデルは深度スキャン範囲の制約を受けなくなり、絶対スケールの深度マップを提供できる。提案手法の有効性を広範囲な評価により実証する。ソースコードは https://github.com/JinhwiPark/DepthPrompting で公開されている。

Dense depth maps have been used as a key element of visual perception tasks. There have been tremendous efforts to enhance depth quality, ranging from optimization-based to learning-based methods. Despite remarkable progress over a long period, their applicability in the real world is limited by systematic measurement biases such as density, sensing pattern, and scan range. It is well known that these biases make it difficult for such methods to generalize. We observe that learning a joint representation for input modalities (e.g., images and depth), which most recent methods adopt, is sensitive to the biases. In this work, we disentangle those modalities to mitigate the biases with prompt engineering. To this end, we design a novel depth prompt module that yields the desired feature representation according to new depth distributions arising from either sensor types or scene configurations. Our depth prompt can be embedded into foundation models for monocular depth estimation. Through this embedding process, our method frees the pretrained model from the restraint of depth scan range and lets it provide absolute-scale depth maps. We demonstrate the effectiveness of our method through extensive evaluations. Source code is publicly available at https://github.com/JinhwiPark/DepthPrompting.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# グラフのコントラスト学習に向けて - 調査とその先

Towards Graph Contrastive Learning: A Survey and Beyond ( http://arxiv.org/abs/2405.11868v1 )

ライセンス: Link先を確認

Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang,

(参考訳) 近年,グラフの深層学習は様々な領域において顕著な成功を収めている。しかし、注釈付きグラフデータへの依存は、その禁忌なコストと時間集約的な性質のために、依然として重大なボトルネックとなっている。この課題に対処するため、グラフ上の自己教師型学習(SSL)が注目され、大きな進歩を遂げた。 SSLにより、機械学習モデルはラベルのないグラフデータから情報表現を生成でき、高価なラベル付きデータへの依存を減らすことができる。グラフ上のSSLは広く採用されているが、GCL(Graph Contrastive Learning)という重要なコンポーネントは、既存の文献では十分に研究されていない。したがって、この調査は、GCLに関する専用の調査を提供することで、このギャップを埋めることを目的としている。本稿では,データ拡張戦略,コントラストモード,コントラスト最適化目標など,GCLの基本原理を概観する。さらに、弱い教師付き学習、移動学習、関連するシナリオなど、データ効率のよいグラフ学習の他の側面へのGCLの拡張についても検討する。また、薬物発見、ゲノム解析、レコメンダシステムといった領域にまたがる実践的応用についても論じ、最終的にこの分野における課題と今後の方向性について概説する。

In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to produce informative representations from unlabeled graph data, reducing the reliance on expensive labeled data. While SSL on graphs has witnessed widespread adoption, one critical component, Graph Contrastive Learning (GCL), has not been thoroughly investigated in the existing literature. Thus, this survey aims to fill this gap by offering a dedicated survey on GCL. We provide a comprehensive overview of the fundamental principles of GCL, including data augmentation strategies, contrastive modes, and contrastive optimization objectives. Furthermore, we explore the extensions of GCL to other aspects of data-efficient graph learning, such as weakly supervised learning, transfer learning, and related scenarios. We also discuss practical applications spanning domains such as drug discovery, genomics analysis, recommender systems, and finally outline the challenges and potential future directions in this field.
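
As one concrete instance of a contrastive optimization objective, the sketch below computes an InfoNCE-style loss for a single anchor node. InfoNCE is only one of the objectives a survey of this kind covers; the function name, the temperature value, and the similarity numbers are assumptions for illustration.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.5):
    """InfoNCE loss for one anchor: negative log-softmax of the positive pair
    against the positive plus all negatives."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

# Similarities between an anchor and its augmented view (positive)
# versus other nodes (negatives); the values are made up.
loss_good = info_nce(sim_pos=0.9, sim_negs=[0.1, -0.2, 0.0])
loss_bad = info_nce(sim_pos=0.1, sim_negs=[0.9, 0.8, 0.7])
```

The loss is small when the positive view is more similar to the anchor than the negatives are, and grows when a negative outranks it.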

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# 直感的な微調整:SFTとRLHFを単一プロセスに統合する

Intuitive Fine-Tuning: Towards Unifying SFT and RLHF into a Single Process ( http://arxiv.org/abs/2405.11870v1 )

ライセンス: Link先を確認

Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, Bowen Zhou,

(参考訳) Supervised Fine-Tuning (SFT) と Reinforcement Learning from Human Feedback (RLHF) は、事前学習後の言語モデル(LM)の能力を高め、人間の選好により良く整合させるための2つの基本的なプロセスである。 SFTは訓練効率に優れ、RLHFはより良いアライメントをもたらすため、両者はしばしば組み合わせて用いられる。しかし、一般的な手法は最適化目標を統一せずに両者を順に適用するだけであり、異なる目的への適合の間にトレードオフが生じ、パラダイム間のギャップを埋めて双方の長所を取り入れる機会が見過ごされている。統一的な理解を得るために、Markov Decision Process (MDP) フレームワーク内でトークンレベルで定義された2つのサブプロセス、Preference Estimation と Transition Optimization を用いて、SFT と RLHF を解釈する。このモデル化は、SFTが推定と最適化の劣るRLHFの特殊なケースにすぎないことを示している。 RLHFはモデルが生成した回答全体の品質を評価するのに対し、SFTは目標回答の先行トークンに基づいて予測トークンをスコア付けするだけである。したがって、SFTはモデルの能力を過大評価し、劣った最適化をもたらす。この観点から、SFTとRLHFを単一のプロセスに統合するIntuitive Fine-Tuning (IFT) を導入する。 IFTは、単一のポリシーとSFTと同量の非選好ラベル付きデータを用いながら、時間的残差接続を通じて回答全体に対するLMの直感的な感覚を捉える。実験により、IFTは複数のタスク、特に生成、推論、事実追従の能力を必要とするタスクにおいて、SFTと典型的なアライメント手法を順に適用するレシピと同等か、それ以上の性能を示すことがわかった。説明可能なFrozen LakeゲームがIFTの有効性をさらに検証する。

Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) are two fundamental processes for enhancing the capabilities of Language Models (LMs) after pre-training, aligning them better with human preferences. SFT is more efficient to train, whereas RLHF delivers better alignment, so they are often combined. However, common practice simply applies them sequentially without unifying their optimization targets, which results in a trade-off between fitting different objectives and ignores the opportunity to bridge the paradigm gap and take the strengths of both. To obtain a unified understanding, we interpret SFT and RLHF using two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework. This modeling shows that SFT is only a specialized case of RLHF with inferior estimation and optimization. RLHF evaluates the quality of the model's entire generated answer, whereas SFT only scores predicted tokens based on preceding tokens from target answers. Therefore, SFT overestimates the ability of the model, leading to inferior optimization. Building on this view, we introduce Intuitive Fine-Tuning (IFT) to integrate SFT and RLHF into a single process. IFT captures LMs' intuitive sense of entire answers through a temporal residual connection, while using a single policy and the same volume of non-preference-labeled data as SFT. Our experiments show that IFT performs comparably or even superiorly to sequential recipes of SFT and some typical alignment methods across several tasks, particularly those that require generation, reasoning, and fact-following abilities. An explainable Frozen Lake game further validates the effectiveness of IFT.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# オープン量子ダイナミクス:情報のバックフローの記憶効果とスーパーアクティベーション

Open Quantum Dynamics: Memory Effects and Superactivation of Backflow of Information ( http://arxiv.org/abs/2405.11872v1 )

ライセンス: Link先を確認

Fabio Benatti, Giovanni Nichele,

(参考訳) 時間依存の生成子をもつ開放量子ダイナミクス $\Lambda^{(1,2)}_t$ のテンソル積 $\Lambda^{(1)}_t\otimes\Lambda^{(2)}_t$ の可除性について調べる。これらの動的写像は、環境をトレースアウトしても記憶効果が残るような形で自身の環境と相互作用する複合開放系 $S_1+S_2$ から生じる。本研究の動機は次の興味深い効果にある。すなわち、$S_1$ と $S_2$ のいずれにも同じ現象が生じないにもかかわらず、環境から $S_1+S_2$ への情報のバックフロー(BFI)が起こりうる。我々はこの効果をBFIのスーパーアクティベーション(SBFI)と呼ぶ。

We investigate the divisibility properties of the tensor products $\Lambda^{(1)}_t\otimes\Lambda^{(2)}_t$ of open quantum dynamics $\Lambda^{(1,2)}_t$ with time-dependent generators. These dynamical maps emerge from a compound open system $S_1+S_2$ that interacts with its own environment in such a way that memory effects remain when the environment is traced out. This study is motivated by the following intriguing effect: one can have Backflow of Information (BFI) from the environment to $S_1+S_2$ without the same phenomenon occurring for either $S_1$ or $S_2$. We shall refer to this effect as the Superactivation of BFI (SBFI).
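
One common way to formalize divisibility and backflow is sketched below. It is offered as an assumption about the intended definitions, not as the paper's exact setup; the intertwining maps $V_{t,s}$ and the test operator $X$ are notation introduced for this note.

```latex
\[
\Lambda_t = V_{t,s}\,\Lambda_s \quad (t \ge s \ge 0),
\qquad
V_{t,s}\ \text{positive and trace-preserving}
\;\Longrightarrow\;
\bigl\|\Lambda_t[X]\bigr\|_{1} \le \bigl\|\Lambda_s[X]\bigr\|_{1}
\quad \text{for all Hermitian } X .
\]
```

Under this reading, an increase of $\|\Lambda_t[X]\|_1$ over some time interval signals backflow of information, and superactivation means that such an increase occurs for $\Lambda^{(1)}_t\otimes\Lambda^{(2)}_t$ even though it occurs for neither $\Lambda^{(1)}_t$ nor $\Lambda^{(2)}_t$ alone.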

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# xFinder: 大規模言語モデルのためのロバストおよびピンポイントアンサー抽出

xFinder: Robust and Pinpoint Answer Extraction for Large Language Models ( http://arxiv.org/abs/2405.11874v1 )

ライセンス: Link先を確認

Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen,

(参考訳) 大規模言語モデル(LLM)の継続的な進歩に伴い、その性能を評価するための公平で信頼性の高い手法の開発という重要な問題に注目が集まっている。特に、テストセットのリークやプロンプトフォーマットへの過適合といった、主観的または非主観的な不正現象の出現は、LLMの信頼性の高い評価に重大な課題をもたらす。評価フレームワークは回答抽出に正規表現(RegEx)を利用することが多いため、RegExで容易に抽出できる特定のフォーマットに適合するように応答を調整するモデルもある。それにもかかわらず、RegExに基づくキー回答抽出モジュールは、しばしば抽出エラーに悩まされる。本稿では、LLM評価チェーン全体を包括的に分析し、キー回答抽出モジュールの最適化によって抽出精度が向上し、LLMの特定の回答形式への依存が低減され、LLM評価の信頼性が高まることを示す。これらの問題に対処するために、キー回答抽出に特化して設計されたモデルであるxFinderを提案する。このプロセスの一環として、効果的なモデルの訓練と評価を保証するために、専用のデータセットであるKey Answer Finder (KAF)データセットを作成する。実世界のシナリオにおける一般化テストと評価により、わずか5億パラメータの最小のxFinderモデルが平均回答抽出精度93.42%を達成することを示した。対照的に、最良の評価フレームワークにおけるRegExの精度は74.38%である。 xFinderは、既存の評価フレームワークと比較して、より強い堅牢性と高い精度を示す。 xFinder のすべてのリソースは https://github.com/IAAR-Shanghai/xFinder で利用可能である。

The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. Particularly, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evaluation frameworks often utilize Regular Expression (RegEx) for answer extraction, some models may adjust their responses to comply with specific formats that are easily extractable by RegEx. Nevertheless, the key answer extraction module based on RegEx frequently suffers from extraction errors. This paper conducts a comprehensive analysis of the entire LLM evaluation chain, demonstrating that optimizing the key answer extraction module can improve extraction accuracy, reduce LLMs' reliance on specific answer formats, and enhance the reliability of LLM evaluation. To address these issues, we propose xFinder, a model specifically designed for key answer extraction. As part of this process, we create a specialized dataset, the Key Answer Finder (KAF) dataset, to ensure effective model training and evaluation. Through generalization testing and evaluation in real-world scenarios, the results demonstrate that the smallest xFinder model with only 500 million parameters achieves an average answer extraction accuracy of 93.42%. In contrast, RegEx accuracy in the best evaluation framework is 74.38%. xFinder exhibits stronger robustness and higher accuracy compared to existing evaluation frameworks. All resources for xFinder are available at https://github.com/IAAR-Shanghai/xFinder.
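
The brittleness of RegEx-based extraction is easy to reproduce at small scale. The pattern below is a hypothetical rule of the kind an evaluation harness might use; it is not taken from any specific framework, and the sample responses are invented.

```python
import re

# Hypothetical extraction rule: expects the phrase "answer is" followed by
# an option letter, optionally wrapped in parentheses.
ANSWER_PATTERN = re.compile(r"answer is \(?([A-D])\)?", re.IGNORECASE)

def regex_extract(response):
    """Return the option letter if the response matches the expected format."""
    match = ANSWER_PATTERN.search(response)
    return match.group(1).upper() if match else None

responses = [
    "The answer is (B).",                       # matches the expected format
    "I would go with option B.",                # correct content, other format
    "B, because the other options are wrong.",  # correct content, other format
]
extracted = [regex_extract(r) for r in responses]
```

All three responses choose option B, yet only the first is extracted; the other two would be scored as wrong. This is the failure mode a learned extractor is meant to reduce.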

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# 人気のある地下市場におけるクリプタ・アズ・ア・サービスを理解する

Understanding crypter-as-a-service in a popular underground marketplace ( http://arxiv.org/abs/2405.11876v1 )

ライセンス: Link先を確認

Alejandro de la Cruz, Sergio Pastrana,

(参考訳) クリプタ(Crypter)とは、ターゲットバイナリを変換して、アンチウイルス(AV)アプリケーションによる検出を回避できるようにすることを主な目的とするソフトウェアである。パッカーと同様に動作し、マルウェアのバイナリを受け取り、一連の修正、難読化、暗号化を適用して、1つ以上のAVを回避するバイナリを出力する。目標は、(多くの場合悪意のある)機能を維持しながら、完全に検出されない状態、ハッキング用語でいうFUD(fully undetected)を保つことである。サイバー犯罪におけるコモディティ化の進展に伴い、検出機構の高度化に対応して、クリプタ・アズ・ア・サービス(crypter-as-a-service)モデルが人気を博している。このビジネスモデルでは、顧客は最初のクリプタを受け取り、それがアンチウイルスに検出されるようになるとすぐに更新版を受け取る。本論文は、クリプタ・アズ・ア・サービスに特化したオンライン地下市場に関する最初の研究である。販売されている最も関連性の高い製品を比較し、プラットフォーム上に存在するソーシャルネットワークを分析し、それらが提供するさまざまな機能を比較する。さらに事例研究として、市場で販売されている最も人気のあるクリプタの1つの利用を検証する実験を行い、バイナリ(良性とマルウェアの両方)をクリプタで処理する前後の結果を比較して、アンチウイルスエンジンの回避における有効性を示す。

Crypters are pieces of software whose main goal is to transform a target binary so that it can avoid detection by antivirus (AV) applications. They work similarly to packers, taking a malware binary and applying a series of modifications, obfuscations and encryptions to output a binary that evades one or more AVs. The goal is to remain fully undetected, or FUD in hacking jargon, while maintaining its (often malicious) functionality. In line with the growing commoditization of cybercrime, the crypter-as-a-service model has gained popularity in response to the increased sophistication of detection mechanisms. In this business model, customers receive an initial crypter, which is updated soon after it becomes detected by antivirus engines. This paper provides the first study of an online underground market dedicated to crypter-as-a-service. We compare the most relevant products on sale, analyzing the existing social network on the platform and comparing the different features that they provide. We also conduct an experiment as a case study to validate the usage of one of the most popular crypters sold in the market, and compare the results before and after crypting binaries (both benign and malware) to show its effectiveness at evading antivirus engines.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# RoNLIを応用したカルトグラフィーに基づく新しいカリキュラム学習法:ルーマニア初の自然言語推論コーパス

A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus ( http://arxiv.org/abs/2405.11877v1 )

ライセンス: Link先を確認

Eduard Poesina, Cornelia Caragea, Radu Tudor Ionescu,

(参考訳) 自然言語推論(Natural Language Inference, NLI)は、文対における含意関係を認識するタスクであり、自然言語理解の代理指標として活発に研究されている話題である。このタスクは対話エージェントの構築や、テキスト分類、機械翻訳、その他のNLPタスクの改善に関連するにもかかわらず、我々の知る限り、ルーマニア語のNLIコーパスは公開されていない。この目的のために、遠隔監視により得られた58Kの訓練文対と、正しいラベルを手動で注釈付けした6Kの検証・テスト文対からなる、ルーマニア語初のNLIコーパス(RoNLI)を導入する。単語埋め込みに基づく浅いモデルからトランスフォーマーベースのニューラルネットワークまで、遠隔学習に基づく複数の機械学習手法で実験を行い、競争力のあるベースラインを確立する。さらに、データカートグラフィに基づく新しいカリキュラム学習戦略を採用することにより、最良のモデルを改善する。データセットとベースラインを再現するコードは https://github.com/Eduard6421/RONLI で利用可能である。

Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding. Despite the relevance of the task in building conversational agents and improving text classification, machine translation and other NLP tasks, to the best of our knowledge, there is no publicly available NLI corpus for the Romanian language. To this end, we introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs, which are obtained via distant supervision, and 6K validation and test sentence pairs, which are manually annotated with the correct labels. We conduct experiments with multiple machine learning methods based on distant learning, ranging from shallow models based on word embeddings to transformer-based neural networks, to establish a set of competitive baselines. Furthermore, we improve on the best model by employing a new curriculum learning strategy based on data cartography. Our dataset and code to reproduce the baselines are available at https://github.com/Eduard6421/RONLI.
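
Data cartography characterizes each training example by its training dynamics: the mean probability the model assigns to the gold label across epochs (confidence) and the spread of that probability (variability). The sketch below computes both and derives an easy-to-hard ordering. The example names, the probabilities, and the ordering rule are assumptions for illustration; the paper's actual curriculum strategy is not described in the abstract and may differ.

```python
from statistics import mean, pstdev

def cartography_stats(true_label_probs):
    """Confidence (mean) and variability (standard deviation) of the
    gold-label probability of one example across training epochs."""
    return mean(true_label_probs), pstdev(true_label_probs)

# Hypothetical per-epoch probabilities assigned to the gold label.
training_dynamics = {
    "easy":      [0.90, 0.95, 0.97],
    "ambiguous": [0.20, 0.80, 0.50],
    "hard":      [0.05, 0.10, 0.08],
}
stats = {name: cartography_stats(p) for name, p in training_dynamics.items()}

# One possible curriculum: present high-confidence examples first.
curriculum = sorted(stats, key=lambda name: stats[name][0], reverse=True)
```

With distantly supervised labels, low-confidence and low-variability examples are natural candidates for label noise, which is one reason a cartography-based schedule is plausible for this kind of corpus.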

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# LLMにおける文脈推論効果と記憶効果の定量化

Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs ( http://arxiv.org/abs/2405.11880v1 )

ライセンス: Link先を確認

Siyu Lou, Yuntian Chen, Xiaodan Liang, Liang Lin, Quanshi Zhang,

(参考訳) 本研究では,大規模言語モデル(LLM)が言語生成に用いている,正確な記憶と文脈内推論効果を定義し,定量化するための公理系を提案する。これらの効果は LLM で符号化されたトークン/ワード間の非線形相互作用として定式化される。具体的には, 記憶効果を基礎記憶効果とカオス記憶効果に分類し, さらに文脈内推論効果を拡張推論パターンに分類し, 推論パターンを排除し, 推論パターンを反転させる。さらに、分解された効果は、LLMの信頼性スコアが記憶効果と文脈内推論効果に忠実に分解できることを数学的に保証する空間性と普遍的マッチング性を満たす。実験により, 暗記効果と文脈内推論効果の明確な乱れが, LLMによって符号化された詳細な推論パターンの簡易な検証を可能にした。

In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by the large language model (LLM) for language generation. These effects are formulated as non-linear interactions between tokens/words encoded by the LLM. Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects, and further classify in-context reasoning effects into enhanced inference patterns, eliminated inference patterns, and reversed inference patterns. Besides, the decomposed effects satisfy the sparsity property and the universal matching property, which mathematically guarantee that the LLM's confidence score can be faithfully decomposed into the memorization effects and in-context reasoning effects. Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# 単一非条件拡散モデルによるアウト・オブ・ディストリビューション検出

Out-of-Distribution Detection with a Single Unconditional Diffusion Model ( http://arxiv.org/abs/2405.11881v1 )

ライセンス: Link先を確認

Alvin Heng, Alexandre H. Thiery, Harold Soh,

(参考訳) アウト・オブ・ディストリビューション(OOD)検出は、異常サンプルの特定を目指す機械学習の重要なタスクである。従来、教師なし手法はOOD検出に深層生成モデルを用いてきた。しかし、こうしたアプローチでは、新たな分布に対して異常性を評価する際に別のモデルが必要となる。本稿では、基盤生成モデルの出現を踏まえ、単一の汎用モデルが多様なタスクにわたってOOD検出も行えるかどうかを考察する。そこで本研究では、Diffusion Paths (DiffPath) という手法を導入する。 DiffPathは、もともと無条件生成のために訓練された単一の拡散モデルをOOD検出に利用することを提案する。具体的には、サンプルを標準正規分布に接続する拡散経路の変化率と曲率を測定する新しい手法を導入する。大規模な実験により、DiffPathは単一のモデルで、異なる分布を含む様々なOODタスクにおいて先行研究を上回ることが示された。コードは https://github.com/clear-nus/diffpath で公開されている。

Out-of-distribution (OOD) detection is a critical task in machine learning that seeks to identify abnormal samples. Traditionally, unsupervised methods utilize a deep generative model for OOD detection. However, such approaches necessitate a different model when evaluating abnormality against a new distribution. With the emergence of foundational generative models, this paper explores whether a single generalist model can also perform OOD detection across diverse tasks. To that end, we introduce our method, Diffusion Paths (DiffPath), in this work. DiffPath proposes to utilize a single diffusion model originally trained to perform unconditional generation for OOD detection. Specifically, we introduce a novel technique of measuring the rate-of-change and curvature of the diffusion paths connecting samples to the standard normal. Extensive experiments show that with a single model, DiffPath outperforms prior work on a variety of OOD tasks involving different distributions. Our code is publicly available at https://github.com/clear-nus/diffpath.
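
The idea of summarizing a path by its rate-of-change and curvature can be illustrated with plain finite differences. This is a generic numerical sketch, not the paper's estimator: in DiffPath the path comes from a diffusion model, whereas here `path` is simply a list of equally spaced states, and "curvature" means the magnitude of second differences.

```python
import math

def path_statistics(path):
    """Summed norms of first and second finite differences along a path,
    given as a list of equally spaced states (each a list of floats)."""
    def diff(a, b):
        return [y - x for x, y in zip(a, b)]

    def norm(v):
        return math.sqrt(sum(c * c for c in v))

    first = [diff(path[i], path[i + 1]) for i in range(len(path) - 1)]
    second = [diff(first[i], first[i + 1]) for i in range(len(first) - 1)]
    rate = sum(norm(v) for v in first)
    curvature = sum(norm(v) for v in second)
    return rate, curvature

straight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
bent = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]
straight_rate, straight_curv = path_statistics(straight)
bent_rate, bent_curv = path_statistics(bent)
```

An OOD score could then be built by comparing such statistics for a test sample against those collected on in-distribution data; how the paper does this is not specified in the abstract.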

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# 鉛直的フェデレーション学習ハイブリッドローカル事前学習

Vertical Federated Learning Hybrid Local Pre-training ( http://arxiv.org/abs/2405.11884v1 )

ライセンス: Link先を確認

Wenguo Li, Xinling Guo, Xu Jiao, Tiancheng Huang, Xiaoran Yan, Yao Yang,

(参考訳) 現実世界の応用範囲の広い垂直的フェデレートラーニング(VFL)は、アカデミックと産業の両方で多くの注目を集めている。企業は、モデルの予測スキルを高めるために、さまざまな部門から同じユーザのより価値のある機能を活用しようとしている。 VFLはこの要求に対処し、個々のパーティが生データを公開しないことを同時に保証します。しかしながら、従来のVFLは、より多くの関係者が関与してサイズが縮小し、データ不足と不整合データの無駄が生じるような整合したサンプルのみを活用するため、ボトルネックに直面している。この問題に対処するために,新しいVFL Hybrid Local Pre-training (VFLHLP) アプローチを提案する。 VFLHLPはまず、参加者のローカルデータに基づいて、ローカルネットワークを事前訓練する。そして、これらの事前学習ネットワークを利用してラベル付きパーティーのサブモデルを調整するか、下流のフェデレーション学習中に他のパーティーの表現学習を強化することで、フェデレーション付きモデルの性能を高める。

Vertical Federated Learning (VFL), which has a broad range of real-world applications, has received much attention in both academia and industry. Enterprises aspire to exploit more valuable features of the same users from diverse departments to boost their model prediction skills. VFL addresses this demand and concurrently secures individual parties from exposing their raw data. However, conventional VFL encounters a bottleneck as it only leverages aligned samples, whose size shrinks with more parties involved, resulting in data scarcity and the waste of unaligned data. To address this problem, we propose a novel VFL Hybrid Local Pre-training (VFLHLP) approach. VFLHLP first pre-trains local networks on the local data of participating parties. Then it utilizes these pre-trained networks to adjust the sub-model for the labeled party or enhance representation learning for other parties during downstream federated learning on aligned data, boosting the performance of federated models.
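
The shrinking pool of aligned samples is simple to see with set intersections. The party names and user IDs below are invented for illustration.

```python
# Hypothetical user-ID sets held by three parties.
party_ids = {
    "bank":     {1, 2, 3, 4, 5, 6},
    "retailer": {2, 3, 4, 5, 7, 8},
    "insurer":  {3, 4, 9},
}

def aligned_ids(parties):
    """IDs present at every party: the only samples conventional VFL can use."""
    return set.intersection(*parties.values())

two_party = aligned_ids({k: party_ids[k] for k in ("bank", "retailer")})
three_party = aligned_ids(party_ids)

# Samples the bank holds but conventional three-party VFL would discard.
unaligned_bank = party_ids["bank"] - three_party
```

Adding the third party cuts the aligned set from four users to two, while four of the bank's six records go unused. Local pre-training on each party's full data is a way to put those discarded records to work.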

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# ポスト量子セキュリティ - 起源,基礎,導入

Post-Quantum Security: Origin, Fundamentals, and Adoption ( http://arxiv.org/abs/2405.11885v1 )

ライセンス: Link先を確認

Johanna Barzen, Frank Leymann,

(参考訳) 今日では、主流の非対称暗号方式は、離散対数の計算が困難であると信じられているために安全であると考えられている。 Shorのアルゴリズムは離散対数を効率的に計算でき、すなわちこうした非対称方式を破ることができる。しかしShorのアルゴリズムは量子アルゴリズムであり、それが発明された当時は、このアルゴリズムをうまく実行できる量子コンピュータは遠い将来のものと思われていた。この状況は変わり、十分に強力な量子コンピュータは数年のうちに利用可能になる可能性が高い。本稿では、まず、離散対数と、よく知られた2つの非対称セキュリティ方式であるRSAおよび楕円曲線暗号との関係について述べる。次に、量子アルゴリズム(および古典的アルゴリズム)による攻撃に対して安全と考えられる方式の基盤である格子ベース暗号の基礎を示す。次に、そのような量子安全な2つのアルゴリズム(KyberとDilithium)についてより詳細に説明する。最後に、政府や産業界が現在取っているいくつかの行動と、この領域の標準化について、ごく簡潔かつ選択的に概観する。この論文は特に自己完結的であることを目指しており、ポスト量子暗号を理解するために必要な数学的基礎を提供し、例を示す。

Nowadays, predominant asymmetric cryptographic schemes are considered to be secure because discrete logarithms are believed to be hard to compute. The algorithm of Shor can effectively compute discrete logarithms, i.e., it can break such asymmetric schemes. But the algorithm of Shor is a quantum algorithm, and at the time this algorithm was invented, quantum computers that could successfully execute it seemed to be far off in the future. This has changed: quantum computers that are powerful enough are likely to be available in a couple of years. In this article, we first describe the relation between discrete logarithms and two well-known asymmetric security schemes, RSA and Elliptic Curve Cryptography. Next, we present the foundations of lattice-based cryptography, which is the basis of schemes that are considered to be safe against attacks by quantum algorithms (as well as by classical algorithms). Then we describe two such quantum-safe algorithms (Kyber and Dilithium) in more detail. Finally, we give a very brief and selective overview of a few actions currently taken by governments and industry as well as standardization in this area. The article especially strives towards being self-contained: the required mathematical foundations to understand post-quantum cryptography are provided and examples are given.
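
As a rough illustration of why discrete logarithms matter here, the sketch below contrasts the cheap forward direction (modular exponentiation) with a classical search for the exponent. It uses the textbook baby-step giant-step method, which still needs about sqrt(p) steps and so becomes infeasible for cryptographic sizes. The function name `discrete_log` and the toy parameters are assumptions for illustration and are not taken from the article.

```python
from math import isqrt

def discrete_log(g, h, p):
    """Return some x with pow(g, x, p) == h for prime p, or None if none exists.

    Classical baby-step giant-step: roughly sqrt(p) time and memory.
    """
    m = isqrt(p - 1) + 1
    # Baby steps: remember the smallest j for each value g^j mod p, j = 0..m-1.
    baby = {}
    cur = 1
    for j in range(m):
        baby.setdefault(cur, j)
        cur = cur * g % p
    # Giant steps: repeatedly multiply h by g^(-m) until a baby step is hit.
    factor = pow(g, -m, p)  # modular inverse power, needs Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * factor % p
    return None

# Toy parameters (hypothetical, far too small for real use).
p, g, secret = 1019, 2, 345
public = pow(g, secret, p)          # forward direction: fast
recovered = discrete_log(g, public, p)  # inverse direction: search
print(pow(g, recovered, p) == public)
```

The recovered exponent may differ from the secret one when the order of `g` is smaller than `p - 1`, but it reproduces the same public value, which is all an attacker needs in this toy setting.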

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# 大規模言語モデルにおける展開と操作のプロンプトの影響

Unveiling and Manipulating Prompt Influence in Large Language Models ( http://arxiv.org/abs/2405.11891v1 )

ライセンス: Link先を確認

Zijian Feng, Hanzhang Zhou, Zixiao Zhu, Junlang Qian, Kezhi Mao,

(参考訳) プロンプトは、LLM(Large Language Models)の応答を導く上で重要な役割を果たす。しかし、入力サリエンシ(input saliency)と呼ばれるプロンプトにおける個々のトークンの複雑な役割は、応答を形作る際には、ほとんど未解明のままである。既存のサリエンシ法はLLM生成目的と不一致であるか、線形性仮定に大きく依存しているかのいずれかであり、潜在的な不正確な結果をもたらす。そこで本稿では,LLM出力生成におけるプロンプトの役割を明らかにするために,Token Distribution Dynamics (TDD) を提案する。 TDDは、言語モデルヘッド(LMヘッド)の堅牢な解釈機能を活用して、入力の正確性を評価する。入力トークンを埋め込み空間に投影し、語彙上の分布ダイナミクスに基づいてそれらの重要性を推定する。私たちは、前方、後方、双方向の3つのTDDのバリエーションを紹介します。大規模な実験によって、TDDは最先端のベースラインを越え、プロンプトとLCMのアウトプット間の因果関係を解明する上で大きなマージンを持つことが明らかになった。単なる解釈の他に、制御されたテキスト生成のための2つの迅速な操作タスク、すなわちゼロショット有害な言語抑制と感情管理にTDDを適用します。経験的な結果は、プロンプトにおける有毒な方法と感傷的な方法の両方を識別するTDDの習熟度を強調し、その後、生成されたコンテンツにおける有毒さを緩和したり、感情を調節したりする。

Prompts play a crucial role in guiding the responses of Large Language Models (LLMs). However, the intricate role of individual tokens in prompts, known as input saliency, in shaping the responses remains largely underexplored. Existing saliency methods either misalign with LLM generation objectives or rely heavily on linearity assumptions, leading to potential inaccuracies. To address this, we propose Token Distribution Dynamics (TDD), a simple yet effective approach to unveil and manipulate the role of prompts in generating LLM outputs. TDD leverages the robust interpreting capabilities of the language model head (LM head) to assess input saliency. It projects input tokens into the embedding space and then estimates their significance based on distribution dynamics over the vocabulary. We introduce three TDD variants: forward, backward, and bidirectional, each offering unique insights into token relevance. Extensive experiments reveal that TDD surpasses state-of-the-art baselines by a large margin in elucidating the causal relationships between prompts and LLM outputs. Beyond mere interpretation, we apply TDD to two prompt manipulation tasks for controlled text generation: zero-shot toxic language suppression and sentiment steering. Empirical results underscore TDD's proficiency in identifying both toxic and sentimental cues in prompts, subsequently mitigating toxicity or modulating sentiment in the generated content.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# CNNを用いた後処理による人間の視覚層における符号化画像の精細化

Refining Coded Image in Human Vision Layer Using CNN-Based Post-Processing ( http://arxiv.org/abs/2405.11894v1 )

ライセンス: Link先を確認

Takahiro Shindo, Yui Tatsumi, Taiju Watanabe, Hiroshi Watanabe,

(参考訳) 人間と機械の両方のスケーラブルなイメージコーディングは、最近多くの注目を集めているテクニックです。この技術は、人間の視覚と画像認識モデルのための画像の階層的復号化を可能にする。画像が両方の目的を果たす必要がある場合、非常に効果的な方法である。しかし、一般的な画像圧縮方式でよく使われるポストプロセッシングを人や機械のスケーラブルな画像符号化法に組み込んだ研究はまだない。本稿では,ポストプロセッシングをスケーラブルな符号化方式に統合することにより,人間のデコード画像の品質を向上させる手法を提案する。実験結果から, 後処理により圧縮性能が向上することが示された。さらに,従来の手法との比較により,提案手法の有効性を検証した。

Scalable image coding for both humans and machines is a technique that has gained a lot of attention recently. This technology enables the hierarchical decoding of images for human vision and image recognition models. It is a highly effective method when images need to serve both purposes. However, no research has yet incorporated the post-processing commonly used in popular image compression schemes into scalable image coding methods for humans and machines. In this paper, we propose a method to enhance the quality of decoded images for humans by integrating post-processing into a scalable coding scheme. Experimental results show that the post-processing improves compression performance. Furthermore, the effectiveness of the proposed method is validated through comparisons with traditional methods.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# ディジタル双生児における生産プロセス最適化のためのスパースアテンション駆動品質予測

Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins ( http://arxiv.org/abs/2405.11895v1 )

ライセンス: Link先を確認

Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang,

(参考訳) プロセス産業では、生産ラインを長期的効率に最適化するには、生産ラインパラメータを微調整するために、運用状態のリアルタイムモニタリングと分析が必要である。しかし、運用論理の複雑さと生産プロセスパラメータの複雑な結合は、プロセス全体の正確な数学的モデルを開発するのを難しくし、効率的な最適化機構の展開を妨げる。これらの困難を鑑みて、我々は、物理レイアウトと操作ロジックをデジタル的に抽象化することで、生産ラインのデジタルツインをデプロイすることを提案する。デジタル双生児における機器運用状況と製品品質検査を反映した実世界のデータを反復的にマッピングすることにより、自己注意型時間畳み込みニューラルネットワークに基づく生産プロセスの品質予測モデルを採用する。このモデルは、デジタルツインのデータ駆動状態の進化を可能にする。ディジタルツインは、実際の動作条件の情報と品質に敏感な分析結果を集約する役割を担い、多次元制約下での仮想現実性進化によるプロセス生産品質の最適化を容易にする。ディジタルツインモデルを情報フローキャリアとして活用し、キープロセスインジケータから時間的特徴を抽出し、提案した合成ニューラルネットワークに基づく生産プロセス品質予測モデルを確立する。本手法は,本手法により,仮想及び実生産ライン間のシームレスな統合を促進できることを示す。この統合により、平均動作状態予測精度が98%以上で、ほぼ最適生産プロセス制御が達成される。

In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by digitally abstracting its physical layout and operational logic. By iteratively mapping the real-world data reflecting equipment operation status and product quality inspection in the digital twin, we adopt a quality prediction model for the production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. The digital twin takes the role of aggregating the information of actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production quality with virtual-reality evolution under multi-dimensional constraints. Leveraging the digital twin model as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed composite neural network. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98% and near-optimal production process control.

翻訳日:2024-05-21 13:44:14 公開日:2024-05-20

# CReMa: ソーシャルメディア上で共有された言語間要求の計算的識別とマッチングによる危機応答

CReMa: Crisis Response through Computational Identification and Matching of Cross-Lingual Requests and Offers Shared on Social Media ( http://arxiv.org/abs/2405.11897v1 )

ライセンス: Link先を確認

Rabindra Lamsal, Maria Rodriguez Read, Shanika Karunasekera, Muhammad Imran,

(参考訳) 危機期には、ソーシャルメディアプラットフォームはコミュニケーションの促進と資源の調整に重要な役割を担っている。混乱と不確実性の中で、コミュニティはしばしばこれらのプラットフォームを頼りにし、助けを求める緊急の嘆願を共有し、支援を拡張し、救援活動の組織化を行っている。しかし、前例のないレベルにエスカレートできるこのような期間の会話の量は、要求の自動識別とマッチングを必要とし、救援活動の合理化を提供する。本研究は、緊急時のソーシャルメディアプラットフォームにおける支援要請と提供を効果的に識別し、マッチングすることの課題に対処する。本稿では,CReMa(Crisis Response Matcher)を提案する。危機に特有の事前訓練されたモデルセットであるCrisisTransformersと、言語間埋め込みスペースを活用することで、危機埋め込みタスクにおいて、RoBERTa、MPNet、BERTweetなどの強力なベースラインを上回りながら、識別とマッチングタスクを向上し、クラス分けタスクではUniversal Sentence Encoder、Sentence Transformers、危機埋め込みタスクではSentence Transformersを利用する。オーストラリアで最もよく使われている16言語にまたがって、ヘルプ検索のシナリオをシミュレートし、ソーシャルメディアに支援を提供する、新しい多言語データセットを導入する。我々は、これらの16言語にわたる包括的な言語間実験を行い、また、複数のベクトル探索戦略と精度のトレードオフについても検討する。さらに、100万件のジオタグ付きグローバルデータセットを分析し、ソーシャルメディアへの支援や支援の提供に関連するパターンを理解する。全体として、これらの貢献は危機情報学の分野を前進させ、地域の将来の研究のためのベンチマークを提供する。

During times of crisis, social media platforms play a vital role in facilitating communication and coordinating resources. Amidst chaos and uncertainty, communities often rely on these platforms to share urgent pleas for help, extend support, and organize relief efforts. However, the sheer volume of conversations during such periods, which can escalate to unprecedented levels, necessitates the automated identification and matching of requests and offers to streamline relief operations. This study addresses the challenge of efficiently identifying and matching assistance requests and offers on social media platforms during emergencies. We propose CReMa (Crisis Response Matcher), a systematic approach that integrates textual, temporal, and spatial features for multi-lingual request-offer matching. By leveraging CrisisTransformers, a set of pre-trained models specific to crises, and a cross-lingual embedding space, our methodology enhances the identification and matching tasks while outperforming strong baselines such as RoBERTa, MPNet, and BERTweet in classification tasks, and Universal Sentence Encoder and Sentence Transformers in crisis embedding generation tasks. We introduce a novel multi-lingual dataset that simulates scenarios of help-seeking and offering assistance on social media across the 16 most commonly used languages in Australia. We conduct comprehensive cross-lingual experiments across these 16 languages, while also examining trade-offs between multiple vector search strategies and accuracy. Additionally, we analyze a million-scale geotagged global dataset to comprehend patterns in relation to seeking help and offering assistance on social media. Overall, these contributions advance the field of crisis informatics and provide benchmarks for future research in the area.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# モデルに基づくクビット雑音分光法

Model-Based Qubit Noise Spectroscopy ( http://arxiv.org/abs/2405.11898v1 )

ライセンス: Link先を確認

Kevin Schultz, Christopher A. Watson, Andrew J. Murphy, Timothy M. Sweeney, Gregory Quiroz,

(参考訳) クビットノイズスペクトロスコピー(QNS)は、クビット環境のキャラクタリゼーションと、クビット密度を改善するためにより効果的なクビット制御の前駆体として有用である。既存のQNSへのアプローチは、古典的なスペクトル推定文献が「非パラメトリック」アプローチと呼ぶもので、一連のプローブシーケンスが点やバンドの集合でノイズパワーを推定するために使用される。対照的に、モデルに基づくスペクトル推定のアプローチは、スペクトルの形で付加的な構造を仮定し、これを超解像のような統計的精度や他の能力の改善に活用する。本稿では,従来の信号処理からインスピレーションを得て,モデルに基づくQNSアプローチを導出する。しかし,最近開発されたシュロディンガー波自己回帰移動平均(SchWARMA)は相関雑音をモデル化するための形式である。シミュレーションと実験データの両方を通して、これらのモデルに基づくQNSアプローチが、古典的手法の統計的および計算的利点をいかに維持するかを示し、その結果、強力な新しい推定手法がもたらされた。 QNSと量子センシングへのこれらのアプローチの直接的な適用以外にも、量子システムに対する適応的なフィードバック制御において、古典的な適応信号処理と制御におけるそれらの役割と類似して、基礎となるモデルの柔軟性が有用であることが期待できる。

Qubit noise spectroscopy (QNS) is a valuable tool both for the characterization of a qubit's environment and as a precursor to more effective qubit control to improve qubit fidelities. Existing approaches to QNS are what the classical spectrum estimation literature would call "non-parametric" approaches, in that a series of probe sequences are used to estimate noise power at a set of points or bands. In contrast, model-based approaches to spectrum estimation assume additional structure in the form of the spectrum and leverage this for improved statistical accuracy or other capabilities, such as superresolution. Here, we derive model-based QNS approaches using inspiration from classical signal processing, primarily through the recently developed Schrodinger wave autoregressive moving-average (SchWARMA) formalism for modeling correlated noise. We show, through both simulation and experimental data, how these model-based QNS approaches maintain the statistical and computational benefits of their classical counterparts, resulting in powerful new estimation approaches. Beyond the direct application of these approaches to QNS and quantum sensing, we anticipate that the flexibility of the underlying models will find utility in adaptive feedback control for quantum systems, in analogy with their role in classical adaptive signal processing and control.
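
To make the parametric idea concrete, the sketch below evaluates the power spectrum implied by a classical ARMA model, where a handful of coefficients determines the spectrum at every frequency instead of one estimate per point or band. This is the standard classical formula only. It is not the SchWARMA formalism or the authors' estimator, and the function name `arma_psd` and the coefficient convention are assumptions for illustration.

```python
import cmath

def arma_psd(omega, ar, ma, sigma2=1.0):
    """Power spectral density of an ARMA process at angular frequency omega.

    Convention assumed here:
        sum_k ar[k] * x[t-k] = sum_k ma[k] * e[t-k],  with ar[0] == 1,
    where e is white noise with variance sigma2. Then
        S(omega) = sigma2 * |MA(e^{-i omega})|^2 / |AR(e^{-i omega})|^2.
    """
    z = cmath.exp(-1j * omega)
    numerator = sum(b * z ** k for k, b in enumerate(ma))
    denominator = sum(a * z ** k for k, a in enumerate(ar))
    return sigma2 * abs(numerator) ** 2 / abs(denominator) ** 2

# Two parameters describe the whole low-pass spectrum of an AR(1) process.
for omega in (0.0, 1.0, 3.0):
    print(omega, arma_psd(omega, ar=[1.0, -0.5], ma=[1.0]))
```

Fitting such a model means estimating the few coefficients rather than the spectrum at each frequency, which is where the statistical benefit of a model-based approach comes from.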

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# 3Dポイントクラウド分類とセマンティックセグメンテーションのためのディープラーニング技術の概要

A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation ( http://arxiv.org/abs/2405.11903v1 )

ライセンス: Link先を確認

Sushmita Sarker, Prithul Sarker, Gunner Stone, Ryan Gorman, Alireza Tavakkoli, George Bebis, Javad Sattarvand,

(参考訳) ポイントクラウド分析は、コンピュータビジョン、ロボット操作、自律運転など、多くの分野で幅広い応用がある。ディープラーニングは画像ベースのタスクで顕著に成功したが、大規模で秩序のない、不規則でノイズの多い3Dポイントを処理する際に、ディープニューラルネットワークが直面する多くのユニークな課題がある。今後の研究を奨励するために,ポイントクラウド処理に用いられているディープラーニング手法の最近の進歩を分析し,この分野を前進させる上での課題と潜在的方向性を示す。 3Dポイントクラウド処理における2つの主要なタスク、すなわち3D形状分類とセマンティックセグメンテーションの包括的なレビューとして機能する。

Point cloud analysis has a wide range of applications in many areas such as computer vision, robotic manipulation, and autonomous driving. While deep learning has achieved remarkable success on image-based tasks, there are many unique challenges faced by deep neural networks in processing massive, unordered, irregular and noisy 3D points. To stimulate future research, this paper analyzes recent progress in deep learning methods employed for point cloud processing and presents challenges and potential directions to advance this field. It serves as a comprehensive review on two major tasks in 3D point cloud processing, namely, 3D shape classification and semantic segmentation.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# テキスト分類器の逆攻撃に対する制約付き後退法

A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers ( http://arxiv.org/abs/2405.11904v1 )

ライセンス: Link先を確認

Tom Roth, Inigo Jauregi Unanue, Alsharif Abuadbba, Massimo Piccardi,

(参考訳) テキスト分類器は、敵対的な例に弱い -- 正しく分類された例は、受け入れ可能性の制約を満たしつつ、意図的に非分類に変換される。逆例を見つけるための従来のアプローチは、許容可能な変換の空間上の組合せ最適化問題を定義し、解決することである。効果はあるものの、このアプローチは変革の選択によって遅く、制限されています。別のアプローチは、他のテキスト・テキスト・タスクで一般的に行われているように、事前訓練された言語モデルを微調整することで、直接敵の例を生成することである。このアプローチは、より速く、より表現力に富むことを約束するが、比較的探索されていない。このため、本研究では、エンコーダ-デコーダパラフレーズモデルをトレーニングし、多様な逆例を生成する。トレーニングには強化学習アルゴリズムを採用し,有効な逆例の生成を促進する制約付き報酬を提案する。 2つのテキスト分類データセットに対する実験結果から,本モデルは従来のパラフレーズモデルよりも高い成功率を示し,他の競合攻撃よりも総合的に効果的であることが判明した。最後に、重要な設計選択が生成した例にどのように影響するかを示し、提案手法の長所と短所について議論する。

Text classifiers are vulnerable to adversarial examples -- correctly-classified examples that are deliberately transformed to be misclassified while satisfying acceptability constraints. The conventional approach to finding adversarial examples is to define and solve a combinatorial optimisation problem over a space of allowable transformations. While effective, this approach is slow and limited by the choice of transformations. An alternate approach is to directly generate adversarial examples by fine-tuning a pre-trained language model, as is commonly done for other text-to-text tasks. This approach promises to be much quicker and more expressive, but is relatively unexplored. For this reason, in this work we train an encoder-decoder paraphrase model to generate a diverse range of adversarial examples. For training, we adopt a reinforcement learning algorithm and propose a constraint-enforcing reward that promotes the generation of valid adversarial examples. Experimental results over two text classification datasets show that our model has achieved a higher success rate than the original paraphrase model, and overall has proved more effective than other competitive attacks. Finally, we show how key design choices impact the generated examples and discuss the strengths and weaknesses of the proposed approach.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# CSTA:ビデオ要約のためのCNNに基づく時空間アテンション

CSTA: CNN-based Spatiotemporal Attention for Video Summarization ( http://arxiv.org/abs/2405.11905v1 )

ライセンス: Link先を確認

Jaewon Son, Jaehun Park, Kwangsu Kim,

(参考訳) ビデオ要約は、ビデオの簡潔な表現を生成し、本質的な内容とキーモーメントをキャプチャし、全体的な長さを短縮することを目的としている。いくつかの手法では長期依存を扱うために注意機構を採用しているが、フレームに固有の視覚的意義を捉えるのに失敗することが多い。この制限に対処するために,CNN ベースの SpatioTemporal Attention (CSTA) 手法を提案する。提案手法は,CNNによるフレーム内およびフレーム内関係の理解と,画像内の絶対位置を学習する能力を活用して,映像中の重要な属性を見つけることに依存する。空間的重要性を重視した追加モジュールを設計することで、従来の作業の効率向上とは対照的に、CSTAでは、CNNをスライディングウィンドウとして使用するため、計算オーバーヘッドを最小限に抑える必要がある。 2つのベンチマークデータセット(SumMeとTVSum)の大規模な実験により,提案手法は従来の手法に比べてMACが少なく,最先端の性能を実現していることが示された。コードはhttps://github.com/thswodnjs3/CSTAで公開されている。

Video summarization aims to generate a concise representation of a video, capturing its essential content and key moments while reducing its overall length. Although several methods employ attention mechanisms to handle long-term dependencies, they often fail to capture the visual significance inherent in frames. To address this limitation, we propose a CNN-based SpatioTemporal Attention (CSTA) method that stacks each feature of frames from a single video to form image-like frame representations and applies a 2D CNN to these frame features. Our methodology relies on the CNN to comprehend the inter- and intra-frame relations and to find crucial attributes in videos by exploiting its ability to learn absolute positions within images. In contrast to previous work that compromises efficiency by designing additional modules to focus on spatial importance, CSTA requires minimal computational overhead as it uses the CNN as a sliding window. Extensive experiments on two benchmark datasets (SumMe and TVSum) demonstrate that our proposed approach achieves state-of-the-art performance with fewer MACs compared to previous methods. Code is available at https://github.com/thswodnjs3/CSTA.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# Ensemble and Mixture-of-Experts DeepONets for Operator Learning

Ensemble and Mixture-of-Experts DeepONets For Operator Learning ( http://arxiv.org/abs/2405.11907v1 )

ライセンス: Link先を確認

Ramansh Sharma, Varun Shankar,

(参考訳) 演算子学習のための新しいディープ演算子ネットワーク(DeepONet)アーキテクチャであるアンサンブルDeepONetを提案する。このトランク濃縮により、様々な演算子学習問題に対する表現性と一般化能力が向上する。また,演算子学習問題における空間的局所性やモデル空間性を促進するために,PoU近似を用いた空間的混合(MoE)DeepONetトランクネットワークアーキテクチャを提案する。我々はまず、アンサンブルとPoU-MoE DeepONetsの両方が普遍近似器であることを証明した。次に、標準トランク、PoU-MoEトランク、および/または適切な直交分解(POD)トランクのトランクアンサンブルを含むDeepONetsが、標準DeepONetsおよびPOD-DeepONetsよりも2～4倍低い相対的な$\ell_2$エラーを、2次元および3次元の偏微分方程式(PDE)を含む新しい演算子学習問題において達成できることを実証した。新しいPoU-MoEの定式化は、任意のニューラルネットワークアーキテクチャに空間的局所性とモデル空間を組み込む自然な方法を提供する一方、新たなアンサンブルであるDeepONetは、演算子学習のための科学機械学習アーキテクチャに基礎を組み込むための強力で一般的なフレームワークを提供する。

We present a novel deep operator network (DeepONet) architecture for operator learning, the ensemble DeepONet, that allows for enriching the trunk network of a single DeepONet with multiple distinct trunk networks. This trunk enrichment allows for greater expressivity and generalization capabilities over a range of operator learning problems. We also present a spatial mixture-of-experts (MoE) DeepONet trunk network architecture that utilizes a partition-of-unity (PoU) approximation to promote spatial locality and model sparsity in the operator learning problem. We first prove that both the ensemble and PoU-MoE DeepONets are universal approximators. We then demonstrate that ensemble DeepONets containing a trunk ensemble of a standard trunk, the PoU-MoE trunk, and/or a proper orthogonal decomposition (POD) trunk can achieve 2-4x lower relative $\ell_2$ errors than standard DeepONets and POD-DeepONets on both standard and challenging new operator learning problems involving partial differential equations (PDEs) in two and three dimensions. Our new PoU-MoE formulation provides a natural way to incorporate spatial locality and model sparsity into any neural network architecture, while our new ensemble DeepONet provides a powerful and general framework for incorporating basis enrichment in scientific machine learning architectures for operator learning.
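
The two trunk constructions described above can be sketched in a few lines. In the sketch, a partition-of-unity blend mixes local "expert" trunks with weights that sum to one at every point, and an ensemble trunk concatenates the outputs of several trunks before the usual DeepONet inner product with the branch coefficients. All names, the Wendland-type weight function, and the patch layout are assumptions for illustration. The paper's actual networks, weight functions, and training procedure are not reproduced.

```python
def wendland(r):
    """Compactly supported bump: positive for r < 1, zero otherwise."""
    return (1 - r) ** 4 * (4 * r + 1) if r < 1 else 0.0

def pou_weights(y, centers, radius):
    """Normalized weights over overlapping 1D patches; they sum to 1."""
    raw = [wendland(abs(y - c) / radius) for c in centers]
    total = sum(raw)  # assumes the patches cover y, so total > 0
    return [v / total for v in raw]

def pou_moe_trunk(y, centers, radius, experts):
    """Blend the outputs of local expert trunks with the PoU weights."""
    weights = pou_weights(y, centers, radius)
    outputs = [expert(y) for expert in experts]
    width = len(outputs[0])
    return [sum(w * out[j] for w, out in zip(weights, outputs))
            for j in range(width)]

def ensemble_trunk(y, trunks):
    """Concatenate the basis functions produced by several trunks."""
    return [value for trunk in trunks for value in trunk(y)]

def deeponet_output(branch_coeffs, trunk_values):
    """DeepONet prediction: inner product of branch and trunk outputs."""
    return sum(b * t for b, t in zip(branch_coeffs, trunk_values))
```

With an ensemble trunk, the branch network has to emit one coefficient per concatenated basis function, so its output width grows with the number of trunks.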

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# PULL:PU学習に基づく正確なリンク予測

PULL: PU-Learning-based Accurate Link Prediction ( http://arxiv.org/abs/2405.11911v1 )

ライセンス: Link先を確認

Junghun Kim, Ka Hyun Park, Hoyoung Yoon, U Kang,

(参考訳) エッジ不完全グラフが与えられたら、どのようにして不足するリンクを正確に見つけることができるのか? エッジ不完全グラフにおけるリンク予測は、それらの関係がグラフとして表されるときに、エンティティ間の欠落した関係を発見することを目的としている。エッジ不完全なグラフは、ソーシャルネットワークに友達を追加する際にすべてのユーザーをチェックすることなど、現実的な制限のために現実的に普及している。この問題に対処することは、ソーシャルネットワークでの友人の推薦や、引用ネットワークにおける参照の発見など、さまざまなタスクに不可欠である。しかし、以前のアプローチは与えられたエッジ不完全(観測された)グラフに大きく依存しているため、トレーニング中に欠落している(観測されていない)リンクを考えるのは難しい。本稿では,PULL(PU-Learning-based Link predictor)を提案する。 PULLはトレーニンググラフの観測されたエッジを肯定的な例として扱い、未接続ノードペアをラベルのないものとして扱う。 PULLは、各エッジに対して潜在変数を提案し、変数に関して期待されるグラフ構造を活用することにより、リンク予測器が観測グラフにオーバーフィットすることを効果的に防止する。 5つの実世界のデータセットに対する大規模な実験は、PULLがエッジ不完全グラフのリンクを予測するベースラインを一貫して上回っていることを示している。

Given an edge-incomplete graph, how can we accurately find the missing links? Link prediction in edge-incomplete graphs aims to discover the missing relations between entities when their relationships are represented as a graph. Edge-incomplete graphs are prevalent in the real world due to practical limitations, such as not checking all users when adding friends in a social network. Addressing the problem is crucial for various tasks, including recommending friends in social networks and finding references in citation networks. However, previous approaches rely heavily on the given edge-incomplete (observed) graph, making it challenging to consider the missing (unobserved) links during training. In this paper, we propose PULL (PU-Learning-based Link predictor), an accurate link prediction method based on positive-unlabeled (PU) learning. PULL treats the observed edges in the training graph as positive examples, and the unconnected node pairs as unlabeled ones. PULL effectively prevents the link predictor from overfitting to the observed graph by proposing latent variables for every edge, and leveraging the expected graph structure with respect to the variables. Extensive experiments on five real-world datasets show that PULL consistently outperforms the baselines for predicting links in edge-incomplete graphs.
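
The positive-unlabeled view of the graph can be sketched as follows: observed edges become positives, every other node pair stays unlabeled, and a soft "expected" graph keeps observed edges at weight 1 while giving unlabeled pairs the current predicted link probability. This is only a minimal sketch of the setup under stated assumptions. The helper names and the `link_prob` callback are hypothetical, and PULL's latent-variable training loop is not reproduced.

```python
from itertools import combinations

def split_pairs(nodes, observed_edges):
    """Split all node pairs into positives (observed) and unlabeled ones."""
    observed = {frozenset(edge) for edge in observed_edges}
    positive, unlabeled = [], []
    for u, v in combinations(nodes, 2):
        target = positive if frozenset((u, v)) in observed else unlabeled
        target.append((u, v))
    return positive, unlabeled

def expected_adjacency(nodes, observed_edges, link_prob):
    """Soft graph: weight 1 for observed edges, link_prob(u, v) otherwise."""
    positive, unlabeled = split_pairs(nodes, observed_edges)
    weights = {pair: 1.0 for pair in positive}
    weights.update({pair: link_prob(*pair) for pair in unlabeled})
    return weights

positive, unlabeled = split_pairs([0, 1, 2, 3], [(0, 1), (2, 1)])
print(positive)   # observed pairs, in sorted node order
print(unlabeled)  # pairs that may or may not be missing links
```

Treating unconnected pairs as unlabeled rather than negative is the key difference from ordinary supervised link prediction, where they would be sampled as negatives.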

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# ARAIDA:Analogical Reasoning-Augmented Interactive Data Annotation

ARAIDA: Analogical Reasoning-Augmented Interactive Data Annotation ( http://arxiv.org/abs/2405.11912v1 )

ライセンス: Link先を確認

Chen Huang, Yiping Jin, Ilija Ilievski, Wenqiang Lei, Jiancheng Lv,

(参考訳) ヒューマンアノテーションは、かなりの労力を要する時間を要するタスクです。この問題に対処するために、インタラクティブなデータアノテーションはアノテーションモデルを使用して、人間が承認または修正するように提案する。しかし、ラベル付き限られたデータで訓練されたアノテーションモデルは、誤った提案を発生させる傾向があるため、追加の人間の修正努力がもたらされる。この課題に対処するために,対話型データアノテーション設定における自動アノテーション精度を高め,人間の修正の必要性を低減する類似推論に基づくアプローチであるAraidaを提案する。 Araidaは、アノテーションモデルとk-nearest neighbors(KNN)モデルを動的にコーディネートするエラー認識統合戦略で、アノテーションモデルからの予測が不正確であると判断された場合、KNNの予測をより重要視する。経験的研究は、Araidaが異なるアノテーションタスクやモデルに適応可能であることを示した。平均すると、バニラのインタラクティブなデータアノテーション手法に比べて、人間の修正作業が11.02%削減される。

Human annotation is a time-consuming task that requires a significant amount of effort. To address this issue, interactive data annotation utilizes an annotation model to provide suggestions for humans to approve or correct. However, annotation models trained with limited labeled data are prone to generating incorrect suggestions, leading to extra human correction effort. To tackle this challenge, we propose Araida, an analogical reasoning-based approach that enhances automatic annotation accuracy in the interactive data annotation setting and reduces the need for human corrections. Araida involves an error-aware integration strategy that dynamically coordinates an annotation model and a k-nearest neighbors (KNN) model, giving more importance to KNN's predictions when predictions from the annotation model are deemed inaccurate. Empirical studies demonstrate that Araida is adaptable to different annotation tasks and models. On average, it reduces human correction labor by 11.02% compared to vanilla interactive data annotation methods.
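
A minimal sketch of the coordination idea described above: a KNN vote over already-annotated examples is blended with the annotation model's prediction, and the KNN side receives more weight when the model is estimated to be wrong. The linear blending rule, the function names, and the `model_error_prob` input are assumptions for illustration and should not be read as Araida's actual integration strategy.

```python
from collections import Counter
from math import dist

def knn_distribution(query, examples, labels, k=3):
    """Label distribution among the k annotated examples nearest to query.

    examples: list of (feature_vector, label) pairs already confirmed by humans.
    """
    nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes.get(label, 0) / len(nearest) for label in labels}

def integrate(model_probs, knn_probs, model_error_prob):
    """Blend both predictions; trust KNN more when the model is likely wrong."""
    w = min(max(model_error_prob, 0.0), 1.0)
    return {label: (1 - w) * model_probs[label] + w * knn_probs[label]
            for label in model_probs}

annotated = [((0, 0), "neg"), ((0, 1), "neg"),
             ((5, 5), "pos"), ((5, 6), "pos"), ((6, 5), "pos")]
knn = knn_distribution((5, 5.5), annotated, ["pos", "neg"])
suggestion = integrate({"pos": 0.4, "neg": 0.6}, knn, model_error_prob=0.75)
print(max(suggestion, key=suggestion.get))
```

Because the KNN side only needs the examples annotated so far, it can react to each human correction immediately, without retraining the annotation model.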

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# Diff-BGM:ビデオバックグラウンド音楽生成のための拡散モデル

Diff-BGM: A Diffusion Model for Video Background Music Generation ( http://arxiv.org/abs/2405.11913v1 )

ライセンス: Link先を確認

Sizhe Li, Yiming Qin, Minghang Zheng, Xin Jin, Yang Liu,

(参考訳) ビデオを編集する際には、魅力的な背景音楽が不可欠である。しかし、ビデオバックグラウンド音楽生成タスクは、適切なトレーニングデータセットの欠如、音楽生成過程を柔軟に制御することの難しさ、ビデオと音楽の逐次的整列化など、いくつかの課題に直面している。本研究ではまず,ビデオと音楽に関するマルチモーダル情報を提供するための詳細なアノテーションとショット検出機能を備えた高品質な音楽ビデオデータセットBGM909を提案する。そこで我々は,音楽の多様性や音楽とビデオのアライメントを含む音楽の質を評価するための評価指標を,検索精度で提示する。最後に,ビデオの背景音楽を自動的に生成するDiff-BGMフレームワークを提案する。このフレームワークは,生成過程における音楽の異なる側面を制御するために異なる信号を使用する。本稿では,セグメント対応のクロスアテンション層を導入することで,映像と音楽の連続的な調整を提案する。提案手法の有効性を検証する実験を行った。コードとモデルはhttps://github.com/sizhelee/Diff-BGMで公開されている。

When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909 with detailed annotation and shot detection to provide multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process, i.e., uses dynamic video features to control music rhythm and semantic features to control the melody and atmosphere. We propose to align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# PT43D:高輝度RGB画像から3次元形状を生成する確率変換器

PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images ( http://arxiv.org/abs/2405.11914v1 )

ライセンス: Link先を確認

Yiheng Xiong, Angela Dai,

(参考訳) ロボット工学などの様々な応用において,単一のRGB画像から3次元形状を生成することが不可欠である。現行のアプローチでは、物体の鮮明で完全な視覚的記述を含むイメージをターゲットとしており、物体の観察がおおむね無視される、あるいは取り消される、一般的な現実的なケースを考慮しない。そこで本稿では,RGB画像上の3次元形状の確率分布を生成するトランスフォーマーを用いた自己回帰モデルを提案する。閉塞や視野の切り離しといった現実的なシナリオに対処するために、実世界のシナリオの微調整を改善するために、シミュレートされた画像と形状のトレーニングペアを作成します。次に、入力画像から最も関連性の高い領域を効果的に識別し、形状生成を行う。これにより、適切な多様性と入力画像との強い整合性を持つサンプル形状の推測が可能となる。合成データに基づいてモデルをトレーニングし、テストし、微調整し、実世界のデータでテストします。実験により、どちらのシナリオにおいても、我々のモデルは最先端よりも優れていることが示された。

Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabilistic distribution of 3D shapes conditioned on an RGB image containing potentially highly ambiguous observations of the object. To handle realistic scenarios such as occlusion or field-of-view truncation, we create simulated image-to-shape training pairs that enable improved fine-tuning for real-world scenarios. We then adopt cross-attention to effectively identify the most relevant region of interest from the input image for shape generation. This enables inference of sampled shapes with reasonable diversity and strong alignment with the input image. We train and test our model on our synthetic data, then fine-tune and test it on real-world data. Experiments demonstrate that our model outperforms the state of the art in both scenarios.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# 大規模言語モデルにおける埋め込みからの情報漏洩

Information Leakage from Embedding in Large Language Models ( http://arxiv.org/abs/2405.11916v1 )

ライセンス: Link先を確認

Zhipeng Wang, Anda Cheng, Yinggui Wang, Lei Wang,

(参考訳) 大規模言語モデル(LLM)の普及により、データのプライバシに関する懸念が高まっている。本研究の目的は,悪意のあるモデルプロバイダが埋め込みからユーザ入力を回復する可能性のある,入力再構成攻撃によるプライバシー侵害の可能性を検討することである。まず,モデルの隠れ状態からオリジナルテキストを再構築する2つの基本手法を提案する。これら2つの手法は, 浅い層からの埋め込み攻撃に有効であるが, より深い層からの埋め込み攻撃では効果が低下することがわかった。この問題に対処するため,Transformer ベースの Embed Parrot を提案し,深層への埋め込みから入力を再構築する。解析の結果,ChatGLM-6BとLlama2-7Bの隠れ状態からの入力を効果的に再構成し,トークン長やデータ分布の安定な性能を示すことがわかった。プライバシー侵害のリスクを軽減するため,埋め込み再構築プロセスの悪用を防ぐ防衛機構を導入する。本研究は,分散学習システムにおけるユーザプライバシ保護の重要性を強調し,そのような環境におけるセキュリティプロトコルの強化に有用な洞察を提供する。

The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find that these two methods are effective in attacking the embeddings from shallow layers, but their effectiveness decreases when attacking embeddings from deeper layers. To address this issue, we then present Embed Parrot, a Transformer-based method, to reconstruct input from embeddings in deep layers. Our analysis reveals that Embed Parrot effectively reconstructs original inputs from the hidden states of ChatGLM-6B and Llama2-7B, showcasing stable performance across various token lengths and data distributions. To mitigate the risk of privacy breaches, we introduce a defense mechanism to deter exploitation of the embedding reconstruction process. Our findings emphasize the importance of safeguarding user privacy in distributed learning systems and contribute valuable insights to enhance the security protocols within such environments.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# エネルギー結合形成における量子対古典的アルゴリズムの競合的ショーケース

A Competitive Showcase of Quantum versus Classical Algorithms in Energy Coalition Formation ( http://arxiv.org/abs/2405.11917v1 )

ライセンス: Link先を確認

Naeimeh Mohseni, Thomas Morstyn, Corey O Meara, David Bucher, Jonas Nüßlein, Giorgio Cortiana,

(参考訳) エネルギーコミュニティの形成は、非中央集権的かつ持続可能なエネルギー管理を進める上で重要である。この文脈の中で、協調構造生成(CSG)は有望なフレームワークとして現れます。 CSGの複雑さはエージェント数とともに急速に増大し、古典的解法は中程度のサイズでも実用的ではない(エージェント数>30)。そのため,高度な計算手法の開発が不可欠である。本研究は,Dwaveハードウェア上での量子アニーリングと,シミュレータおよびIBMQハードウェア上での量子近似最適化アルゴリズム(QAOA)を比較し,エネルギーコミュニティ形成に対処するベンチマークを行う。我々の古典的解法には、Tabu search、simulated annealing、そしてまさに古典的解法が含まれる。以上の結果から,Dwaveはハードウェア上でのQAOAを上回っていることがわかった。注目すべきは、QAOAがDwaveと同等のランタイムスケーリングを示していることだ。特に、Dwaveは古典的な解法と比較して競争力のある性能を示し、より好ましいランタイムスケーリングで同等品質のソリューションを実現している。

The formation of energy communities is pivotal for advancing decentralized and sustainable energy management. Within this context, Coalition Structure Generation (CSG) emerges as a promising framework. The complexity of CSG grows rapidly with the number of agents, making classical solvers impractical for even moderate sizes (number of agents > 30). Therefore, the development of advanced computational methods is essential. Motivated by this challenge, this study conducts a benchmark comparing classical solvers with quantum annealing on Dwave hardware and the Quantum Approximate Optimization Algorithm (QAOA) on both simulator and IBMQ hardware to address energy community formation. Our classical solvers include Tabu search, simulated annealing, and an exact classical solver. Our findings reveal that Dwave surpasses QAOA on hardware in terms of solution quality. Remarkably, QAOA demonstrates comparable runtime scaling with Dwave, albeit with a significantly larger prefactor. Notably, Dwave exhibits competitive performance compared to the classical solvers, achieving solutions of equal quality with more favorable runtime scaling.
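
The rapid growth in complexity can be seen by writing the exact solver naively: it enumerates every way to split the agents into coalitions, and the number of such splits (the Bell number) explodes with the number of agents. The sketch below is a brute-force illustration only. The coalition value function based on net energy balance is a hypothetical stand-in, not the objective used in the study.

```python
def partitions(items):
    """Yield every partition of a list into non-empty coalitions."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Either the first agent forms a coalition of its own ...
        yield [[first]] + part
        # ... or it joins one of the existing coalitions.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]

def best_structure(agents, value):
    """Exhaustively pick the coalition structure with the highest total value."""
    return max(partitions(agents), key=lambda s: sum(value(c) for c in s))

# Hypothetical net loads in kW: producers negative, consumers positive.
loads = {"a": 3, "b": -3, "c": 2, "d": -2}

def balance_value(coalition):
    """Toy value: coalitions whose loads cancel out score best."""
    return -abs(sum(loads[agent] for agent in coalition))

print(sum(1 for _ in partitions(list(loads))))  # number of structures for 4 agents
print(best_structure(list(loads), balance_value))
```

Four agents already give 15 structures, and the count grows faster than exponentially, which is why exhaustive search stops being practical long before 30 agents.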

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# データアノテーションの効率的・統計的品質推定法について

On Efficient and Statistical Quality Estimation for Data Annotation ( http://arxiv.org/abs/2405.11919v1 )

ライセンス: Link先を確認

Jan-Christoph Klie, Rahul Nair, Juan Haladjian, Marc Kirchner,

(参考訳) アノテーション付きデータセットは、教師付き機械学習モデルの訓練、評価、比較、実運用化に不可欠な要素である。したがって、アノテーションが高品質であることは必須である。その作成には、優れた品質管理と、それによる信頼性の高い品質推定が必要である。そうすれば、アノテーション作業中に品質が不十分な場合に、是正措置を講じて改善することができる。品質推定は、専門家が手動でインスタンスを正しいか誤りかにラベル付けすることで行われることが多い。しかし、アノテーション付きのインスタンスをすべてチェックするのはコストがかかる傾向にある。したがって、実際には通常サブセットのみが検査されるが、そのサイズはほとんどの場合、根拠や統計的検出力を考慮せずに選ばれ、多くの場合は比較的小さい。しかし、小さなサンプルサイズに基づく推定は、誤り率の不正確な値につながる可能性がある。一方、不必要に大きなサンプルサイズを用いると、例えばアノテーションの追加など、より有効に使えたはずの費用がかかる。そこで我々はまず、アノテーションの誤り率を推定するのに必要な最小限のサンプルサイズを見つけるための信頼区間の使い方を詳細に記述する。次に、誤り率推定の代替として受入サンプリングを適用することを提案し、同じ統計的保証を提供しながら、必要なサンプルサイズを最大50%削減できることを示す。

Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. Quality estimation is often performed by having experts manually label instances as correct or incorrect. But checking all annotated instances tends to be expensive. Therefore, in practice, usually only subsets are inspected; sizes are chosen mostly without justification or regard to statistical power and, more often than not, are relatively small. Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate. Using unnecessarily large sample sizes costs money that could be better spent, for instance on more annotations. Therefore, we first describe in detail how to use confidence intervals for finding the minimal sample size needed to estimate the annotation error rate. Then, we propose applying acceptance sampling as an alternative to error rate estimation. We show that acceptance sampling can reduce the required sample sizes by up to 50% while providing the same statistical guarantees.
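
The sample-size step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook normal-approximation (Wald) interval for a proportion, n = z² · p(1 − p) / e², which may differ from the interval the authors actually use; the abstract does not say. The function name `min_sample_size` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def min_sample_size(margin, confidence=0.95, expected_error_rate=0.5):
    """Smallest n so that a Wald interval for the annotation error rate
    has half-width <= margin.

    Assumption: normal approximation to the binomial. Using
    expected_error_rate=0.5 gives the most conservative (largest) n.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be in (0, 1)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    variance = expected_error_rate * (1 - expected_error_rate)
    return math.ceil(z ** 2 * variance / margin ** 2)


if __name__ == "__main__":
    # Worst-case error rate, +/- 5 percentage points, 95% confidence.
    print(min_sample_size(0.05))
    # A prior guess of a 10% error rate needs far fewer checked instances.
    print(min_sample_size(0.05, expected_error_rate=0.1))
```

Under these assumptions, a prior guess for the error rate that is far from 0.5 reduces the number of instances an expert has to check; acceptance sampling, as proposed in the abstract, is a separate technique and is not implemented here.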

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# MirrorGaussian:鏡の反射を再現するために3Dガウスを反射する

MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections ( http://arxiv.org/abs/2405.11921v1 )

ライセンス: Link先を確認

Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan,

(参考訳) 3D Gaussian Splattingは、フォトリアリスティックかつリアルタイムな新規ビュー合成において顕著な進歩を見せている。しかし、視点によって外観が大きく変化する鏡面反射のモデル化には課題がある。この問題に対処するために、3D Gaussian Splattingに基づくリアルタイムレンダリングが可能な最初のミラーシーン再構成手法であるMirrorGaussianを提案する。重要な洞察は、現実世界空間と仮想ミラー空間の間の鏡面対称性に基づいている。実世界の3Dガウスと、それを鏡面に関して反射して得られる鏡像側のガウスの両方を微分可能にラスタライズできる、直感的な二重レンダリング手法を導入する。すべての3Dガウスは、エンドツーエンドのフレームワークでミラー平面と共同で最適化される。MirrorGaussianは、鏡のあるシーンで高品質かつリアルタイムなレンダリングを実現し、新しい鏡やオブジェクトの追加といったシーン編集を可能にする。複数のデータセットに対する総合的な実験により、我々のアプローチは既存の手法を著しく上回り、最先端の結果が得られることを示した。プロジェクトページ: https://mirror-gaussian.github.io/

3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded in the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results. Project page: https://mirror-gaussian.github.io/.
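
The reflection about a mirror plane that the abstract refers to can be sketched for a single point, such as the center of one Gaussian. This is only the standard geometric formula p' = p − 2((p − o) · n) n for a plane through o with unit normal n, not the authors' implementation; the function name `reflect_point` and its arguments are hypothetical.

```python
import math


def reflect_point(point, plane_point, plane_normal):
    """Reflect a 3D point about the plane through plane_point
    with (not necessarily unit-length) normal plane_normal."""
    norm = math.sqrt(sum(c * c for c in plane_normal))
    if norm == 0:
        raise ValueError("plane_normal must be non-zero")
    n = [c / norm for c in plane_normal]  # unit normal
    # Signed distance from the point to the plane along n.
    signed_distance = sum((p - o) * c for p, o, c in zip(point, plane_point, n))
    # Move the point twice that distance back through the plane.
    return [p - 2 * signed_distance * c for p, c in zip(point, n)]


if __name__ == "__main__":
    # Mirror lying in the z = 0 plane: only the z coordinate flips.
    print(reflect_point([1, 2, 3], [0, 0, 0], [0, 0, 2]))
```

A full Gaussian also has a covariance, which under the same reflection H = I − 2nnᵀ would transform as HΣHᵀ; that part is omitted here to keep the sketch short.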

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# 大規模属性付き二部グラフにおける効果的なクラスタリング

Effective Clustering on Large Attributed Bipartite Graphs ( http://arxiv.org/abs/2405.11922v1 )

ライセンス: Link先を確認

Renchi Yang, Yidu Wu, Xiaoyang Lin, Qichen Wang, Tsz Nam Chan, Jieming Shi,

(参考訳) 属性付き二部グラフ(ABG)は、顧客-商品の購入ネットワークや著者-論文の著者関係グラフなど、豊富な属性に関連付けられた2組の異種ノード間の相互作用を記述するための表現力のあるデータモデルである。このようなグラフの対象ノード集合をk個の互いに素なクラスタに分割すること(k-ABGCと呼ぶ)は、ソーシャルネットワーク分析、レコメンデーションシステム、情報検索、バイオインフォマティクスなど、様々な領域で広く利用されている。しかし、k-ABGCに対する既存のソリューションの大半は、属性情報を見落としているか、二部グラフ構造を正確に捉えられていないかのいずれかであり、結果の品質が大きく損なわれている。これらの問題は、数百万のノードと大量の属性データを含むことが多い実際のABGでは一層深刻になり、そのようなグラフ上での効果的なk-ABGCは非常に困難となる。本稿では、複数の実データセット上で優れたクラスタリング性能を実現する、k-ABGCの効果的かつ効率的なアプローチであるTPOを提案する。TPOは2つの主要な貢献を通じて高いクラスタリング品質を得る。 (i) ABGにおけるマルチホップ接続を考慮してノード間の属性親和性を捉えることに特化した、マルチスケール属性親和性に基づくk-ABGC問題の新たな定式化と変換。 (ii) 親和性行列の明示的な構築を回避し、より高速な収束を促すために慎重に設計された一連の最適化を含む、高効率な解法。5つの実ABG上でTPOを19のベースラインと比較した大規模な実験では、正解ラベルに対して測定したTPOのクラスタリング品質が優れていることが示された。さらに、最先端手法と比較して、TPOは小規模・大規模いずれのABGにおいても40倍以上高速であることが多い。

Attributed bipartite graphs (ABGs) are an expressive data model for describing the interactions between two sets of heterogeneous nodes that are associated with rich attributes, such as customer-product purchase networks and author-paper authorship graphs. Partitioning the target node set in such graphs into k disjoint clusters (referred to as k-ABGC) finds widespread use in various domains, including social network analysis, recommendation systems, information retrieval, and bioinformatics. However, the majority of existing solutions towards k-ABGC either overlook attribute information or fail to capture bipartite graph structures accurately, engendering severely compromised result quality. The severity of these issues is accentuated in real ABGs, which often encompass millions of nodes and a sheer volume of attribute data, rendering effective k-ABGC over such graphs highly challenging. In this paper, we propose TPO, an effective and efficient approach to k-ABGC that achieves superb clustering performance on multiple real datasets. TPO obtains high clustering quality through two major contributions: (i) a novel formulation and transformation of the k-ABGC problem based on multi-scale attribute affinity specialized for capturing attribute affinities between nodes with the consideration of their multi-hop connections in ABGs, and (ii) a highly efficient solver that includes a suite of carefully-crafted optimizations for sidestepping explicit affinity matrix construction and facilitating faster convergence. Extensive experiments, comparing TPO against 19 baselines over 5 real ABGs, showcase the superior clustering quality of TPO measured against ground-truth labels. Moreover, compared to the state of the art, TPO is often more than 40x faster over both small and large ABGs.

翻訳日:2024-05-21 13:34:30 公開日:2024-05-20

# 『Set It Up!』:構成生成モデルによる機能的オブジェクトアレンジメント

"Set It Up!": Functional Object Arrangement with Compositional Generative Models ( http://arxiv.org/abs/2405.11928v1 )

ライセンス: Link先を確認

Yiqing Xu, Jiayuan Mao, Yilun Du, Tomás Lozano-Pérez, Leslie Pack Kaelbling, David Hsu,

(参考訳) 本稿では、「2人用のダイニングテーブルをセットする」のような、機能的なオブジェクト配置を作成するための仕様が不十分な指示を理解できるロボットを開発する上での課題について考察する。従来の配置手法は、「物体Aをテーブルに置く」のような、はるかに明示的な指示に焦点を当ててきた。我々は、仕様が不十分な指示の解釈を学習するためのフレームワークであるSetItUpを導入する。SetItUpは、少数のトレーニング例と人手で作成したプログラムスケッチを使って、特定のシーンタイプの配置ルールを明らかにする。オブジェクト間の抽象的な空間関係を表すグラフ状の中間表現を活用することで、SetItUpは配置問題を2つの部分問題に分解する。 i) 限られたデータから配置パターンを学習すること、 ii) これらの抽象的な関係をオブジェクトのポーズに接地すること。SetItUpは、大規模言語モデル(LLM)を活用して、新しいシーンにおけるオブジェクト間の抽象的な空間関係を満たすべき制約として提案し、次に、これらの抽象的関係に対応する拡散モデルのライブラリを組み合わせて、制約を満たすオブジェクトのポーズを見つける。学習机、ダイニングテーブル、コーヒーテーブルからなるデータセット上で本フレームワークを検証し、既存のモデルと比較して、物理的に妥当で、機能的で、見た目にも好ましいオブジェクト配置の生成において優れた性能を示した。

This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table." We introduce a framework, SetItUp, for learning to interpret under-specified instructions. SetItUp takes a small number of training examples and a human-crafted program sketch to uncover arrangement rules for specific scene types. By leveraging an intermediate graph-like representation of abstract spatial relationships among objects, SetItUp decomposes the arrangement problem into two subproblems: i) learning the arrangement patterns from limited data and ii) grounding these abstract relationships into object poses. SetItUp leverages large language models (LLMs) to propose the abstract spatial relationships among objects in novel scenes as the constraints to be satisfied; then, it composes a library of diffusion models associated with these abstract relationships to find object poses that satisfy the constraints. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables, with the results showing superior performance in generating physically plausible, functional, and aesthetically pleasing object arrangements compared to existing models.