Fugu-MT: arxivの論文翻訳

このサイトではarxivの論文のうち、30ページ以下でCreative Commonsライセンス（CC 0, CC BY, CC BY-SA）の論文を日本語訳しています。本文がCCでない論文、長すぎる論文はメタデータのみを翻訳しています。（arxivのメタデータは CC 0です。）翻訳文のライセンスはCC BY-SA 4.0です。翻訳にはFugu-Machine Translatorを利用しています。

本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。

公開日が20240529となっている論文です。

Title	Authors	Abstract	論文公表日・翻訳日
# 早期軽度認知障害の解剖学的バイオマーカー同定のための機械学習アプローチ A Machine Learning Approach for Identifying Anatomical Biomarkers of Early Mild Cognitive Impairment ( http://arxiv.org/abs/2407.00040v1 ) ライセンス: Link先を確認	Alwani Liyana Ahmad, Jose Sanchez-Bornot, Roberto C. Sotero, Damien Coyle, Zamzuri Idris, Ibrahima Faye,	(参考訳) アルツハイマー病(英語: Alzheimer's Disease、AD)は、認知機能や運動機能に障害を与え、高齢化に主に影響を及ぼす進行性神経変性疾患である。磁気共鳴画像(MRI)のようなアクセス可能な方法でADを早期に検出することは、疾患の進行を停止または遅らせるための効果的な介入を開発するために不可欠である。本研究の目的は、MRIベースのバイオマーカーを選択し、個人を健康的なコントロール(HC)と不安定なコントロール(uHC)に分類する機械学習手法を網羅的に分析することである。この研究は、アルツハイマー病ニューロインフォマティクスイニシアチブ(ADNI)とOASIS-3(Open Access Series of Imaging Studies)のMRIデータを利用しており、HCとuHCの両方の参加者に焦点を当てている。この研究は、バランスの取れたデータセットとバランスの取れていないデータセットの分類法をテストすることで、不均衡なデータの課題に対処し、多項式回帰を用いてデータを調和させて、年齢、性別、頭蓋内容積などのニュアンス変数を緩和する。その結果、Gaussian Naive Bayes と RusBoost の分類器は最適な性能を示し、それぞれ ADNI データセット上で76.46% と 72.48% の精度を達成した。 OASIS-3データセットでは、Kernel Naive BayesとRusBoostは64.66%から75.71%のアキュラシーを発生させ、年齢に合わせたデータセットをさらに改善した。後角皮質、海馬、外側心室、外側眼窩前頭皮質などの脳領域は、早期の認知機能低下の過程で大きな影響が認められる。小さなサンプルサイズのような制限にもかかわらず、この研究の調和化アプローチはバイオマーカーの選択の堅牢性を高め、MRIを用いた早期AD検出のための半自動機械学習パイプラインの可能性を示している。 Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that primarily affects the aging population by impairing cognitive and motor functions. Early detection of AD through accessible methodologies like magnetic resonance imaging (MRI) is vital for developing effective interventions to halt or slow the disease's progression. This study aims to perform a comprehensive analysis of machine learning techniques for selecting MRI-based biomarkers and classifying individuals into healthy controls (HC) and unstable controls (uHC) who later show mild cognitive impairment within five years. The research utilizes MRI data from the Alzheimer's Disease Neuroinformatics Initiative (ADNI) and the Open Access Series of Imaging Studies 3 (OASIS-3), focusing on both HC and uHC participants. The study addresses the challenges of imbalanced data by testing classification methods on balanced and unbalanced datasets, and harmonizes data using polynomial regression to mitigate nuisance variables like age, gender, and intracranial volume. Results indicate that Gaussian Naive Bayes and RusBoost classifiers shows an optimal performance, achieving accuracies of up to 76.46% and 72.48% respectively on the ADNI dataset. For the OASIS-3 dataset, Kernel Naive Bayes and RusBoost yield accuracies ranging from 64.66% to 75.71%, improving further in age-matched datasets. Brain regions like the entorhinal cortex, hippocampus, lateral ventricle, and lateral orbitofrontal cortex are identified as significantly impacted during early cognitive decline. Despite limitations such as small sample sizes, the study's harmonization approach enhances the robustness of biomarker selection, suggesting the potential of this semi-automatic machine learning pipeline for early AD detection using MRI.	翻訳日:2024-07-22 22:38:24 公開日:2024-05-29
# cryoSPHERE:Cryo EMからの単一粒子の不均一な再構成 cryoSPHERE: Single-particle heterogeneous reconstruction from cryo EM ( http://arxiv.org/abs/2407.01574v1 ) ライセンス: Link先を確認	Gabriel Ducrocq, Lukas Grunewald, Sebastian Westenhoff, Fredrik Lindsten,	(参考訳) タンパク質の3次元構造は、その機能を決定する上で重要な役割を果たす。 AlphaFoldのような手法はアミノ酸配列のみに基づくタンパク質構造予測に革命をもたらした。しかし、タンパク質はしばしば複数の異なるコンフォメーションに出現し、完全なコンフォメーション分布を解決することは極めて重要である。単一粒子の低温電子顕微鏡(cryo EM)は、与えられたタンパク質の多数の画像をキャプチャする強力なツールであり、しばしば異なるコンフォーメーション(粒子として参照)を持つ。しかし、この画像はタンパク質の非常にノイズの多い投射であり、Cryo EM再構成の伝統的な方法は、1つまたは数個のコンフォメーションの回復に限られている。本稿では,AlphaFoldのタンパク質構造を入力として利用する深層学習手法であるCryoSPHEREを紹介する。この定式化は、単一のタンパク質構造の有意義な再構成を取り戻すのに十分な制約をもたらすことが示されている。異種再建の現況に対して一貫した改善が見られた例を3例に挙げる。 The three-dimensional structure of a protein plays a key role in determining its function. Methods like AlphaFold have revolutionized protein structure prediction based only on the amino-acid sequence. However, proteins often appear in multiple different conformations, and it is highly relevant to resolve the full conformational distribution. Single-particle cryo-electron microscopy (cryo EM) is a powerful tool for capturing a large number of images of a given protein, frequently in different conformations (referred to as particles). The images are, however, very noisy projections of the protein, and traditional methods for cryo EM reconstruction are limited to recovering a single, or a few, conformations. In this paper, we introduce cryoSPHERE, a deep learning method that takes as input a nominal protein structure, e.g. from AlphaFold, learns how to divide it into segments, and how to move these as approximately rigid bodies to fit the different conformations present in the cryo EM dataset. This formulation is shown to provide enough constraints to recover meaningful reconstructions of single protein structures. This is illustrated in three examples where we show consistent improvements over the current state-of-the-art for heterogeneous reconstruction.	翻訳日:2024-07-22 22:18:55 公開日:2024-05-29
# Hebbian Learningを用いたホップフィールドネットワークのプロトタイプ解析 Prototype Analysis in Hopfield Networks with Hebbian Learning ( http://arxiv.org/abs/2407.03342v1 ) ライセンス: Link先を確認	Hayden McAlister, Anthony Robins, Lech Szymanski,	(参考訳) ホップフィールドネットワークにおけるプロトタイプ形成について論じる。通常、高い相関状態を持つヘビアン学習は、劣化したメモリ性能をもたらす。このような学習は、未学習状態が大きな相関した状態のサブセットの代表として出現し、能力難を緩和するプロトタイプ形成に繋がることを示す。このプロセスは、人間の認知におけるプロトタイプ学習と類似している。本稿では,心理学,統計物理学,計算機科学からの貢献を網羅した,連想記憶におけるプロトタイプ学習の実質的な文献レビューを行う。理論的な観点からプロトタイプの生成を分析し,学習用プロトタイプのサンプル数,これらのサンプルのノイズ数,非サンプル状態の数に基づいて,これらの状態に対する安定性条件を導出する。安定性条件は、安定性変化の要因としてプロトタイプ状態に対する安定性の確率を構築するために用いられる。また、従来のネットワーク分析と類似点があり、プロトタイプのキャパシティを見つけることができます。標準ヘビアン学習を用いた単純なホップフィールドネットワークを用いた実験により,プロトタイプ形成に対するこれらの期待を裏付ける。我々は実験を、複数のプロトタイプでデータに基づいて訓練されたホップフィールドネットワークに拡張し、同時に複数のプロトタイプを安定化できるネットワークを見つける。複数のプロトタイプ状態のアトラクションの流域を測定し,実例数と実例の一致により,アトラクタ強度が増大することを示した。我々はプロトタイプ状態の安定性と支配性をこれらの状態のエネルギープロファイルと結びつけ、特にプロファイル形状をターゲット状態や他の刺激状態と比較する。 We discuss prototype formation in the Hopfield network. Typically, Hebbian learning with highly correlated states leads to degraded memory performance. We show this type of learning can lead to prototype formation, where unlearned states emerge as representatives of large correlated subsets of states, alleviating capacity woes. This process has similarities to prototype learning in human cognition. We provide a substantial literature review of prototype learning in associative memories, covering contributions from psychology, statistical physics, and computer science. We analyze prototype formation from a theoretical perspective and derive a stability condition for these states based on the number of examples of the prototype presented for learning, the noise in those examples, and the number of non-example states presented. The stability condition is used to construct a probability of stability for a prototype state as the factors of stability change. We also note similarities to traditional network analysis, allowing us to find a prototype capacity. We corroborate these expectations of prototype formation with experiments using a simple Hopfield network with standard Hebbian learning. We extend our experiments to a Hopfield network trained on data with multiple prototypes and find the network is capable of stabilizing multiple prototypes concurrently. We measure the basins of attraction of the multiple prototype states, finding attractor strength grows with the number of examples and the agreement of examples. We link the stability and dominance of prototype states to the energy profile of these states, particularly when comparing the profile shape to target states or other spurious states.	翻訳日:2024-07-22 22:09:04 公開日:2024-05-29
# 近代ホップフィールドネットワークにおけるロバスト性向上とハイパーパラメータ選択 Improved Robustness and Hyperparameter Selection in Modern Hopfield Networks ( http://arxiv.org/abs/2407.08742v1 ) ライセンス: Link先を確認	Hayden McAlister, Anthony Robins, Lech Szymanski,	(参考訳) 現代のホップフィールドネットワークは、よりシャープな相互作用関数を許容することによって、古典的なホップフィールドネットワークを一般化する。これにより、近くの学習されたアトラクションが互いに干渉しないため、自己連想記憶としてのネットワークの容量が増大する。しかし、ネットワークの実装は、メモリベクトルとプローブベクトルのドット積に大きな指数を適用することに依存している。データの次元が大きければ、計算は非常に大きくなり、実際の実装で浮動小数点数を使用する場合の問題が発生する。この問題を詳細に記述し、元のネットワーク記述を変更して問題を緩和し、更新やトレーニング中にネットワークのダイナミクスを変更することはないことを示す。また、現在のホップフィールドネットワークにおけるハイパーパラメータ選択を大幅に改善し、相互作用頂点への依存を取り除き、元のネットワークのように相互作用頂点に大きく変化しない最適なハイパーパラメータ領域が得られることを示した。 The modern Hopfield network generalizes the classical Hopfield network by allowing for sharper interaction functions. This increases the capacity of the network as an autoassociative memory as nearby learned attractors will not interfere with one another. However, the implementation of the network relies on applying large exponents to the dot product of memory vectors and probe vectors. If the dimension of the data is large the calculation can be very large and result in problems when using floating point numbers in a practical implementation. We describe this problem in detail, modify the original network description to mitigate the problem, and show the modification will not alter the networks' dynamics during update or training. We also show our modification greatly improves hyperparameter selection for the modern Hopfield network, removing the dependence on the interaction vertex and resulting in an optimal region of hyperparameters that does not significantly change with the interaction vertex as it does in the original network.	翻訳日:2024-07-22 13:48:17 公開日:2024-05-29
# 看護計画の最適化:医療機関のサプライチェーンアプローチ Optimizing Nurse Scheduling: A Supply Chain Approach for Healthcare Institutions ( http://arxiv.org/abs/2407.11195v1 ) ライセンス: Link先を確認	Jubin Thomas,	(参考訳) 組織を管理する場合、プランナーは多くの困難なシナリオに直面します。このような例では、直感や管理的経験のみに頼るだけでは十分ではなく、定量的なアプローチが必要である。この需要は、厳重なスケールと制約の複雑さが重大な課題を引き起こすビッグデータの時代においてさらに強調される。そこで本研究の目的は,組織管理における重要な課題である人事スケジューリングの基盤となる枠組みを提供することである。具体的には,契約義務や強制休業期間などの要因によって複雑化した作業である,スタッフのシフト割り当ての最適化に焦点をあてる。さらに、現在の状況は、様々な産業で従業員不足が頻発していることが特徴であり、多くの組織では、それに対応するための効率的で信頼性の高い管理ツールが欠如している。したがって, 医療環境における人事スケジューリングの課題である, 介護者のロスター問題に特に注目が集まっている。これらの問題は、単一の医療施設が数百人の看護師を雇う可能性があることや、適切なスタッフレベルや夜勤の休息といった厳しい制約が課せられることを考えると、様々な変数が特徴である。さらに、新型コロナウイルス(COVID-19)のパンデミックは、医療機関のスタッフの課題を悪化させ、従業員ニーズを正確に評価し、危機状況下での効果的な運用のためのシフト割り当てを最適化することの重要性を浮き彫りにした。 When managing an organization, planners often encounter numerous challenging scenarios. In such instances, relying solely on intuition or managerial experience may not suffice, necessitating a quantitative approach. This demand is further accentuated in the era of big data, where the sheer scale and complexity of constraints pose significant challenges. Therefore, the aim of this study is to provide a foundational framework for addressing personnel scheduling, a critical issue in organizational management. Specifically, we focus on optimizing shift assignments for staff, a task fraught with complexities due to factors such as contractual obligations and mandated rest periods. Moreover, the current landscape is characterized by frequent employee shortages across various industries, with many organizations lacking efficient and dependable management tools to address them. Therefore, our attention is particularly drawn to the nurse rostering problem, a personnel scheduling challenge prevalent in healthcare settings. These issues are characterized by a multitude of variables, given that a single healthcare facility may employ hundreds of nurses, alongside stringent constraints such as the need for adequate staffing levels and rest periods postnight shifts. Furthermore, the ongoing COVID19 pandemic has exacerbated staffing challenges in healthcare institutions, underlining the importance of accurately assessing staffing needs and optimizing shift allocations for effective operation amidst crisis situations.	翻訳日:2024-07-22 12:00:08 公開日:2024-05-29
# オープンソースライセンスの変更と取り消しについて On the modification and revocation of open source licences ( http://arxiv.org/abs/2407.13064v1 ) ライセンス: Link先を確認	Paul Gagnon, Misha Benjamin, Justine Gauthier, Catherine Regis, Jenny Lee, Alexei Nordell-Markovits,	(参考訳) 歴史的に、オープンソースライセンス下で資料がリリースされると、オープンソースへのコミットメントは無効とみなされてきた。本稿では,オープンソースコントリビュータがユーザを強制する権利のサブセットの作成について論じる。 (i)最新のモデルの更新。 (二新たな利用制限を受理すること、又は三ソフトウェアの使用を全面的に停止すること。これは従来のオープンソースアプローチから逸脱するものの、オープンソースAIモデルに関連する法的、評判、道徳的なリスクは、下流の使用をもっとコントロールできるコントリビュータを正当化する可能性がある。最近の法律改正により、あるケースでは、オープンソースコントリビュータの責任への扉が開かれた。著者らは、下流のユーザがバイアスやガードレール回避、あるいは彼らのコントリビューションに対する敵攻撃といった問題に対処するアップデートを確実に実施できることを、コントリビュータは歓迎するだろうと考えている。最後に、このライセンスカテゴリがRAILライセンスとどのように相互作用するか、OSSプラットフォームやスキャニングツールといった主要な利害関係者による運用と採用の方法について述べる。 Historically, open source commitments have been deemed irrevocable once materials are released under open source licenses. In this paper, the authors argue for the creation of a subset of rights that allows open source contributors to force users to (i) update to the most recent version of a model, (ii) accept new use case restrictions, or even (iii) cease using the software entirely. While this would be a departure from the traditional open source approach, the legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses. Recent legislative changes have also opened the door to liability of open source contributors in certain cases. The authors believe that contributors would welcome the ability to ensure that downstream users are implementing updates that address issues like bias, guardrail workarounds or adversarial attacks on their contributions. Finally, this paper addresses how this license category would interplay with RAIL licenses, and how it should be operationalized and adopted by key stakeholders such as OSS platforms and scanning tools.	翻訳日:2024-07-22 08:18:00 公開日:2024-05-29
# 因果グラフ検証に向けたPrompt-based vs. Fine-Tuned LLMs Prompt-based vs. Fine-tuned LLMs Toward Causal Graph Verification ( http://arxiv.org/abs/2406.16899v1 ) ライセンス: Link先を確認	Yuni Susanti, Nina Holsmoelle,	(参考訳) 本研究の目的は,テキストソースを用いた因果グラフの自動検証に自然言語処理(NLP)技術を適用することである。因果グラフは、しばしば教師なし因果発見法から派生し、人間の専門家による手作業による評価を必要とする。 NLP技術、すなわちBERTやChatGPTのような大規模言語モデル(LLM)は、テキストコンテキストに基づいてノードペア間の因果関係を観測できるかどうかを予測することによって、結果の因果グラフを検証できる可能性がある。本研究では,(1)因果関係分類タスクに微調整された事前学習言語モデル,(2)プロンプトベースLPMの2種類のNLPモデルの性能を比較した。プロンプトベースのLLMが様々なタスクに対して比較的うまく機能する以前の研究とは対照的に、バイオメディカルおよびオープンドメインのデータセットに関する予備実験では、微調整されたモデルはプロンプトベースのLLMよりも優れており、F1スコアは最大20.5ポイント向上している。コードと事前処理されたデータセットをリポジトリで共有しました。 This work aims toward an application of natural language processing (NLP) technology for automatic verification of causal graphs using text sources. A causal graph is often derived from unsupervised causal discovery methods and requires manual evaluation from human experts. NLP technologies, i.e., Large Language Models (LLMs) such as BERT and ChatGPT, can potentially be used to verify the resulted causal graph by predicting if causal relation can be observed between node pairs based on the textual context. In this work, we compare the performance of two types of NLP models: (1) Pre-trained language models fine-tuned for causal relation classification task and, (2) prompt-based LLMs. Contrasted to previous studies where prompt-based LLMs work relatively well over a set of diverse tasks, preliminary experiments on biomedical and open-domain datasets suggest that the fine-tuned models far outperform the prompt-based LLMs, up to 20.5 points improvement of F1 score. We shared the code and the pre-processed datasets in our repository.	翻訳日:2024-07-01 06:41:31 公開日:2024-05-29
# 平等な説明可能なAIを進化させる学際的専門知識 Interdisciplinary Expertise to Advance Equitable Explainable AI ( http://arxiv.org/abs/2406.18563v1 ) ライセンス: Link先を確認	Chloe R. Bennett, Heather Cole-Lewis, Stephanie Farquhar, Naama Haamel, Boris Babenko, Oran Lang, Mat Fleck, Ilana Traynis, Charles Lau, Ivor Horn, Courtney Lyles,	(参考訳) 人工知能(AI)の分野は、健康と医療に急速に影響している。従来の研究は、データ代表性やモデルパフォーマンスに厳格な注意を払って、エクイティを推し進め、バイアスを減らす必要性を明確に示した。しかし、社会疫学のベストプラクティスと健康のエクイティを活用してAIの説明可能性を向上させる機会もあり、見いだされた協会の仮説の策定に役立てることができる。本稿では、説明可能なAI(XAI)に注目し、複数の視点からAIモデルの説明を議論し、批判的に評価し、将来の研究のバイアスと方向性の領域を特定するための学際的専門家パネルレビューのためのフレームワークを記述する。我々は,学際的専門家パネルの重要性を強調し,歴史的かつ文脈的に理解された,より正確で公平な解釈を創出する。学際的なパネルディスカッションは、バイアスを減らし、潜在的な共同創設者を特定し、文献にギャップがある追加研究の機会を特定するのに役立つ。これらの洞察は、AIモデルの改善の機会を示唆する。 The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persists for populations who face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance to advance equity and reduce bias. However, there is an opportunity to also improve the explainability of AI by leveraging best practices of social epidemiology and health equity to help us develop hypotheses for associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel to produce more accurate, equitable interpretations which are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where there are gaps in the literature. In turn, these insights can suggest opportunities for AI model improvement.	翻訳日:2024-07-01 06:00:20 公開日:2024-05-29
# 回転平均化:サイクルグラフにおける原始双対法と閉形 Rotation Averaging: A Primal-Dual Method and Closed-Forms in Cycle Graphs ( http://arxiv.org/abs/2406.18564v1 ) ライセンス: Link先を確認	Gabriel Moreira, Manuel Marques, João Paulo Costeira,	(参考訳) 幾何的再構成の土台である回転平均化(英語版)は、それらの間の測定された相対方向の集合を最適に説明する絶対回転の集合を求める。回転の同期は、バンドル調整や動きからの構造化の不可欠な部分であるだけでなく、視覚的同時ローカライゼーションやマッピングにも応用され、反復型ソルバの初期化やカメラネットワークキャリブレーションにも応用されている。しかし、この最適化問題は非凸と高次元の両方である。本稿では,最大推定点からこの問題に対処し,2倍のコントリビューションを行う。まず、広く受け入れられているスペクトル初期化を動機とした、新しい原始双対法を考案した。さらに、サイクルグラフトポロジにおける平均回転点の定常点を特徴付け、スペクトルグラフ理論におけるこの結果の文脈化を行う。提案手法を複数の設定でベンチマークし、双対性理論を用いて解を証明し、精度と性能を著しく向上させる。 A cornerstone of geometric reconstruction, rotation averaging seeks the set of absolute rotations that optimally explains a set of measured relative orientations between them. In addition to being an integral part of bundle adjustment and structure-from-motion, the problem of synchronizing rotations also finds applications in visual simultaneous localization and mapping, where it is used as an initialization for iterative solvers, and camera network calibration. Nevertheless, this optimization problem is both non-convex and high-dimensional. In this paper, we address it from a maximum likelihood estimation standpoint and make a twofold contribution. Firstly, we set forth a novel primal-dual method, motivated by the widely accepted spectral initialization. Further, we characterize stationary points of rotation averaging in cycle graphs topologies and contextualize this result within spectral graph theory. We benchmark the proposed method in multiple settings and certify our solution via duality theory, achieving a significant gain in precision and performance.	翻訳日:2024-07-01 06:00:20 公開日:2024-05-29
# Chiribella et al., Phys. Lett. 132 (2024) 190201, arXiv:2301.10885 Comment on Chiribella et al., Phys. Rev. Lett. 132 (2024) 190201, arXiv:2301.10885 ( http://arxiv.org/abs/2406.04363v1 ) ライセンス: Link先を確認	Robert B. Griffiths,	(参考訳) Chiribella et al の論文 'Bell Nonlocality in Classical Systems Coexisting with Other System Types' は、非可換な量子プロジェクタを無視した方法で量子文脈における 'classical' を定義し、したがってヒルベルト量子量子理論と矛盾しない。 The article `Bell Nonlocality in Classical Systems Coexisting with Other System Types' by Chiribella et al. defines `classical' in a quantum context in a way that ignores noncommuting quantum projectors, and is hence inconsistent with Hilbert-space quantum theory.	翻訳日:2024-06-23 14:05:12 公開日:2024-05-29
# オートマタ・シ・アプリカティイの論文 Sisteme Hibride de Invatare Automata si Aplicatii ( http://arxiv.org/abs/2406.11870v1 ) ライセンス: Link先を確認	Eduard Hogea, Darian Onchis,	(参考訳) 本稿では、分類と回帰のために、ディープニューラルネットワークアプローチとニューロシンボリックアプローチを提案する。 Logic Tensor Networksに基づくニューロシンボリック予測モデルは、警告または攻撃と呼ばれる悪い接続の特徴と通常の接続の特徴を説明すると同時に、識別することができる。提案するハイブリッドシステムは、経験を通じて、深層ニューラルネットワークによる独自の改善能力と、象徴的な人工知能アプローチによって提供される結果の解釈可能性の両方を取り入れている。ハイブリッドシステムへのシフトの必要性を正当化するために、高密度ニューラルネットワークとニューロシンボリックネットワークの詳細な説明、実装、比較を行う。関連する比較のために、同じデータセットをトレーニングに使用し、結果のメトリクスを比較した。結果のメトリクスのレビューでは、どちらの手法も予測モデルに類似した精度を持つが、Logic Tensor Networksはデータよりもインタラクティブな精度と推論の推論も可能である。また、過度な緩和やスケーラビリティの問題といった他の利点や欠点も議論されている。 In this paper, a deep neural network approach and a neuro-symbolic one are proposed for classification and regression. The neuro-symbolic predictive models based on Logic Tensor Networks are capable of discriminating and in the same time of explaining the characterization of bad connections, called alerts or attacks, and of normal connections. The proposed hybrid systems incorporate both the ability of deep neural networks to improve on their own through experience and the interpretability of the results provided by symbolic artificial intelligence approach. To justify the need for shifting towards hybrid systems, explanation, implementation, and comparison of the dense neural network and the neuro-symbolic network is performed in detail. For the comparison to be relevant, the same datasets were used in training and the metrics resulted have been compared. A review of the resulted metrics shows that while both methods have similar precision in their predictive models, with Logic Tensor Networks being also possible to have interactive accuracy and deductive reasoning over data. Other advantages and disadvantages such as overfitting mitigation and scalability issues are also further discussed.	翻訳日:2024-06-23 13:24:48 公開日:2024-05-29
# Pretrained Mobility Transformer: 人体移動のための基礎モデル Pretrained Mobility Transformer: A Foundation Model for Human Mobility ( http://arxiv.org/abs/2406.02578v1 ) ライセンス: Link先を確認	Xinhua Wu, Haoyu He, Yanchao Wang, Qi Wang,	(参考訳) ユビキタスなモバイルデバイスは、個人が都市空間を詳細にナビゲートし利用する方法を明らかにする、膨大な量の位置情報ベースのサービスデータを生成している。本研究では,都市空間と人間の移動性を理解するための基礎モデルを構築するために,これらの広範囲な未ラベルのユーザトラジェクトリを利用する。本稿では, ユーザトラジェクトリを自己回帰的に処理し, 地理的領域をトークンに変換し, 空間的および時間的情報をこれらの表現内に埋め込むためのトランスフォーマアーキテクチャを利用する, PMT (textbf{M}obility \textbf{T}ransformer) を提案する。 2ヶ月間に3つの大都市圏で実施された実験は、PMTが地域の地理的・社会的なデコグラフィー特性を捉える能力を示している。提案したPMTは、次の位置予測、軌道計算、軌道生成など、様々な下流タスクにまたがる。これらの結果は、都市空間機能と個人の移動性嗜好に関する新たな洞察を提供する、人間の移動性の複雑なパターンの復号化におけるPMTの能力と有効性を支持する。 Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the \textbf{P}retrained \textbf{M}obility \textbf{T}ransformer (PMT), which leverages the transformer architecture to process user trajectories in an autoregressive manner, converting geographical areas into tokens and embedding spatial and temporal information within these representations. Experiments conducted in three U.S. metropolitan areas over a two-month period demonstrate PMT's ability to capture underlying geographic and socio-demographic characteristics of regions. The proposed PMT excels across various downstream tasks, including next-location prediction, trajectory imputation, and trajectory generation. These results support PMT's capability and effectiveness in decoding complex patterns of human mobility, offering new insights into urban spatial functionality and individual mobility preferences.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-29
# 効率的な数値計算のためのオープンソースフレームワーク An Open-Source Framework for Efficient Numerically-Tailored Computations ( http://arxiv.org/abs/2406.02579v1 ) ライセンス: Link先を確認	Louis Ledoux, Marc Casas,	(参考訳) 本稿では,効率的な行列行列乗算(MMM)を容易にするために設計された多用途オープンソースフレームワークを提案する。このフレームワークは2つの主要なコントリビューションを提供している: 1つは、算術データパス生成のための微調整された自動パイプラインで、高度にカスタマイズ可能なシストリックなMMMカーネルを実現する。このフレームワークは、人工知能(AI)推論や海面高度(SSH)計算など、さまざまな数値要件を示す多様なハイパフォーマンスコンピューティング(HPC)ワークロードに対して、エネルギーコスト当たりの精度を体系的に向上させる。 AI推論では、ResNet18、ResNet34、ResNet50、DenseNet121、DenseNet161、DenseNet169、VGG11という最先端のニューラルネットワークモデルを、2つのデータセット、2つのコンピュータフォーマット、27の異なる中間演算データパスと共に検討する。 IEEE754-32の3.3\times$とResNet50のImageNet推論中のBfloat16の1.4\times$の3.3\times$である。これは従来の浮動小数点演算器(FPU)に匹敵する8.2.3\%と8.6\%の精度を維持しながら達成される。 SSH計算の文脈では、FPUにおける従来の2倍精度演算と4倍精度演算の精度を上回る2倍精度の単語を用いて、完全再現可能な結果を得る。提案手法は, IEEE754-64 と IEEE754-128 と比較して, SSH の計算精度を最低で 5\times$ と $27\times$ で向上させ, 結果として 5.6\times$ と $115.1\times$ の計算精度の向上を実現した。 We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of $3.3\times$ for IEEE754-32 and $1.4\times$ for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of $82.3\%$ and $86\%$, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of $5\times$ and $27\times$ compared to IEEE754-64 and IEEE754-128, respectively, resulting in $5.6\times$ and $15.1\times$ improvements in accuracy per power cost.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-29
# ディープニューラルネットワークとしてのカオスダイナミクスの爆発 Exploiting Chaotic Dynamics as Deep Neural Networks ( http://arxiv.org/abs/2406.02580v1 ) ライセンス: Link先を確認	Shuhong Liu, Nozomi Akashi, Qingyao Huang, Yasuo Kuniyoshi, Kohei Nakajima,	(参考訳) カオスは、非線形性および初期状態に対する感度から生じる複素ダイナミクスを示す。これらの特徴は、高度な計算応用のポテンシャルを裏付ける表現性の深さを示唆している。しかし、情報処理にカオス力学を効果的に活用するための戦略は、ほとんど解明されていない。本研究では,様々な最先端の深層ニューラルネットワークでカオスの本質を見出すことができることを示した。この啓示から着想を得た本研究では,カオス力学を直接活用して深層学習アーキテクチャを提案する。我々のアプローチは、異なるカオスシステムにまたがって体系的に評価される。すべての場合において、我々のフレームワークは精度、収束速度、効率の点で従来のディープニューラルネットワークに優れた結果をもたらす。さらに,本手法では,過渡的カオス形成の活発な役割を見出した。この研究は、情報処理において長年見過ごされてきたカオスの統合のための新しい経路を提供し、機械学習とニューロモルフィック計算の領域におけるカオス力学の将来的な融合に関する洞察を提供する。 Chaos presents complex dynamics arising from nonlinearity and a sensitivity to initial states. These characteristics suggest a depth of expressivity that underscores their potential for advanced computational applications. However, strategies to effectively exploit chaotic dynamics for information processing have largely remained elusive. In this study, we reveal that the essence of chaos can be found in various state-of-the-art deep neural networks. Drawing inspiration from this revelation, we propose a novel method that directly leverages chaotic dynamics for deep learning architectures. Our approach is systematically evaluated across distinct chaotic systems. In all instances, our framework presents superior results to conventional deep neural networks in terms of accuracy, convergence speed, and efficiency. Furthermore, we found an active role of transient chaos formation in our scheme. Collectively, this study offers a new path for the integration of chaos, which has long been overlooked in information processing, and provides insights into the prospective fusion of chaotic dynamics within the domains of machine learning and neuromorphic computation.	翻訳日:2024-06-09 15:49:54 公開日:2024-05-29
# ε$-Optimally Solving Zero-Sum POSGs $ε$-Optimally Solving Zero-Sum POSGs ( http://arxiv.org/abs/2406.00054v1 ) ライセンス: Link先を確認	Erwan Escudie, Matthia Sabatelli, Jilles Dibangoye,	(参考訳) ゼロサム部分可観測確率ゲーム (zs-POSGs) の解法は、元のゲームを占領マルコフゲームと呼ばれる新しいゲームに埋め込む。この再構成により、zs-POSGを解くためにベルマンの最適性原理を適用することができる。しかし、現在のソリューションを改善するには、指数関数的に多くの潜在的な制約を持つ線形プログラムを解く必要があり、このアプローチのスケーラビリティを著しく制限する。本稿では、この制限を克服するために、最適値関数の新たな一様連続性特性を利用する。まず、最適性を損なうことなく、最新の更新ルールよりも計算効率の良い新しい演算子を構築する。特に、現在の解を改善するには、指数関数的な制約の減少を伴う線形プログラムが必要となる。また,各領域における保証を維持しつつ,既存の手法のスケーラビリティを向上させる点ベースの値反復アルゴリズムについても示す。 A recent method for solving zero-sum partially observable stochastic games (zs-POSGs) embeds the original game into a new one called the occupancy Markov game. This reformulation allows applying Bellman's principle of optimality to solve zs-POSGs. However, improving a current solution requires solving a linear program with exponentially many potential constraints, which significantly restricts the scalability of this approach. This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation. We first construct a new operator that is computationally more efficient than the state-of-the-art update rules without compromising optimality. In particular, improving a current solution now involves a linear program with an exponential drop in constraints. We then also show that point-based value iteration algorithms utilizing our findings improve the scalability of existing methods while maintaining guarantees in various domains.	翻訳日:2024-06-06 08:53:00 公開日:2024-05-29
# 文脈と時間知覚的長期記憶を用いた会話エージェントの実現に向けて Toward Conversational Agents with Context and Time Sensitive Long-term Memory ( http://arxiv.org/abs/2406.00057v1 ) ライセンス: Link先を確認	Nick Alonso, Tomás Figliolia, Anthony Ndirango, Beren Millidge,	(参考訳) 近年,長期記憶を持つ会話エージェントへの関心が高まっており,検索強化生成(RAG)を用いた言語モデルの開発が急速に進んでいる。最近まで、RAGに関するほとんどの研究は、長文の会話の情報ではなく、ウィキペディアのような巨大なテキストデータベースからの情報検索に重点を置いてきた。本稿では,データベースの静的検索と比較して,長文形式の会話データからの効果的な検索が2つの問題に直面していることを論じる。 1)時間/イベントベースのクエリで、会話イベントの時間や順序(例えば、火曜日の第3回会話)に基づいて、モデルが過去の会話に関する情報を取得する必要がある。 2) 周囲の会話コンテキストを理解する必要があるあいまいなクエリ。これらの課題に対処できるRAGベースのエージェントをより良く開発するために、私たちは、最近の長文でシミュレートされた会話のデータセットの上に構築された、あいまいで時間的な質問の新しいデータセットを作成し、標準RAGベースのアプローチがそのような質問を不十分に扱うことを実証する。そこで我々は,連鎖型検索手法,標準ベクトルデータベース検索,問合せを曖昧にするためのプロンプト手法を組み合わせた新しい検索モデルを開発し,これらの課題を解決するための現在の手法よりも大幅に改善されていることを示す。この新しいデータセットとより高度なRAGエージェントは、重要なベンチマークとして機能し、さまざまなAIアプリケーションで使用可能な、効果的なメモリ拡張会話エージェントへと踏み込むことができると考えています。 There has recently been growing interest in conversational agents with long-term memory which has led to the rapid development of language models that use retrieval-augmented generation (RAG). Until recently, most work on RAG has focused on information retrieval from large databases of texts, like Wikipedia, rather than information from long-form conversations. In this paper, we argue that effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval: 1) time/event-based queries, which requires the model to retrieve information about previous conversations based on time or the order of a conversational event (e.g., the third conversation on Tuesday), and 2) ambiguous queries that require surrounding conversational context to understand. To better develop RAG-based agents that can deal with these challenges, we generate a new dataset of ambiguous and time-based questions that build upon a recent dataset of long-form, simulated conversations, and demonstrate that standard RAG based approaches handle such questions poorly. We then develop a novel retrieval model which combines chained-of-table search methods, standard vector-database retrieval, and a prompting method to disambiguate queries, and demonstrate that this approach substantially improves over current methods at solving these tasks. We believe that this new dataset and more advanced RAG agent can act as a key benchmark and stepping stone towards effective memory augmented conversational agents that can be used in a wide variety of AI applications.	翻訳日:2024-06-06 08:53:00 公開日:2024-05-29
# Conveyor: ツール部分実行を備えた効率的なツール対応LDM Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution ( http://arxiv.org/abs/2406.00059v1 ) ライセンス: Link先を確認	Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo,	(参考訳) 大規模言語モデル(LLM)サービスワークロードの複雑さは、ChatGPTプラグインのような外部ツール呼び出しとの統合によって大幅に増大している。本稿では, LLMデコーディングと並行してツール部分実行を行う要求に対して, 効率的なLLMサービスを実現するための新たな機会を特定する。この目的のために、外部ツールを含む要求を処理するために最適化された効率的なLLMサービスシステムであるConveyorを設計する。ツール開発者がLCMサービスシステムに部分的な実行機会を公開するための新しいインターフェースと、部分的なツール実行を容易にする要求スケジューラを導入する。ツールの部分的な実行は、要求完了のレイテンシを最大38.8%改善することを示した。 The complexity of large language model (LLM) serving workloads has substantially increased due to the integration with external tool invocations, such as ChatGPT plugins. In this paper, we identify a new opportunity for efficient LLM serving for requests that trigger tools: tool partial execution alongside LLM decoding. To this end, we design Conveyor, an efficient LLM serving system optimized for handling requests involving external tools. We introduce a novel interface for tool developers to expose partial execution opportunities to the LLM serving system and a request scheduler that facilitates partial tool execution. Our results demonstrate that tool partial execution can improve request completion latency by up to 38.8%.	翻訳日:2024-06-06 08:53:00 公開日:2024-05-29
# 言語モデルのカスケード・アウェア・トレーニング Cascade-Aware Training of Language Models ( http://arxiv.org/abs/2406.00060v1 ) ライセンス: Link先を確認	Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go,	(参考訳) サービスコストとレイテンシの削減は、ビジネスアプリケーションに言語モデル(LM)を配置する上で、基本的な懸念事項である。これを解決するために、LMのカスケードは、より単純なクエリのためにより小さなモデルを条件付きで使用する効果的なソリューションを提供する。カスケードシステムは一般に独立に訓練されたモデルで構築され、訓練中にカスケードされたLMの推論時間相互作用を考慮するという利点を無視している。本稿では,カスケードの性能トレードオフを最適化する手法として,カスケード対応トレーニング(CAT)を提案する。我々は,小規模なLMをカスケードや下流の能力において,その位置を意識して訓練することで,推定時間の利点を得る。提案手法の有効性を,SuperGLUE,WMT22,FLAN2021データセットの60以上のLMタスクで示す。 Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the cascaded LMs during training. In this paper, we present cascade-aware training(CAT), an approach to optimizing the overall quality-cost performance tradeoff of a cascade of LMs. We achieve inference-time benefits by training the small LM with awareness of its place in a cascade and downstream capabilities. We demonstrate the value of the proposed method with over 60 LM tasks of the SuperGLUE, WMT22, and FLAN2021 datasets.	翻訳日:2024-06-06 08:53:00 公開日:2024-05-29
# STAT: トレーニング後の変圧器の収縮 STAT: Shrinking Transformers After Training ( http://arxiv.org/abs/2406.00061v1 ) ライセンス: Link先を確認	Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle,	(参考訳) 本稿では,変換器モデルに微調整を伴わない簡単なアルゴリズムSTATを提案する。 STATは、次の層の重みを補正して精度を保ちながら、注意頭とニューロンの両方をネットワークから排除する。ネットワーク内の各層ブロックは、ネットワーク構造を保存する一連の基本行列分解を用いて圧縮される。われわれのアルゴリズムは、BERTを圧縮するのに数分を要し、単一のGPUを用いて7Bパラメータを持つモデルを圧縮するのに3時間もかからない。わずか数百のデータ例を使用して、STATはネットワークの出力を保存し、既存の勾配のないプルーニング法を改善する。優れた微調整を含む手法とさえ競合する。本稿では, GLUE, Squad, WikiText2 などのベンチマークを用いて, BERT, DistilBERT, Llama-2 などのエンコーダアーキテクチャとデコーダアーキテクチャの両方に適用した。 We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next layer. Each layer block in the network is compressed using a series of principled matrix factorizations that preserve the network structure. Our entire algorithm takes minutes to compress BERT, and less than three hours to compress models with 7B parameters using a single GPU. Using only several hundred data examples, STAT preserves the output of the network and improves upon existing gradient-free pruning methods. It is even competitive with methods that include significant fine-tuning. We demonstrate our method on both encoder and decoder architectures, including BERT, DistilBERT, and Llama-2 using benchmarks such as GLUE, Squad, WikiText2.	翻訳日:2024-06-06 08:43:16 公開日:2024-05-29
# 臨床テキスト匿名化のための大規模言語モデルの可能性 : 比較研究 Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study ( http://arxiv.org/abs/2406.00062v1 ) ライセンス: Link先を確認	David Pissarra, Isabel Curioso, João Alveira, Duarte Pereira, Bruno Ribeiro, Tomás Souper, Vasco Gomes, André V. Carreiro, Vitor Rolla,	(参考訳) 自動臨床テキスト匿名化は、患者のプライバシーと安全性を確保しつつ、二次的使用のためにテキスト健康データを広く共有する可能性を秘めている。文学において多くの複雑で理論的に成功した匿名化解の提案にもかかわらず、これらの手法は依然として欠陥がある。そのため、医療機関はデータへのオープンアクセスを望んでいない。近年のLarge Language Models (LLM) の開発は、様々なタスクを遂行する能力を考えると、この分野をさらに発展させる有望な機会となっている。本稿では,LLMによる生成匿名化の課題に適した6つの新しい評価指標を提案する。さらに, LLM法の比較研究を行い, 2つのベースライン法との比較を行った。本研究は,臨床テキストの信頼性の高い匿名化に向けて,LCMを用いたモデルを構築した。 Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage while assuring patient privacy and safety. Despite the proposal of many complex and theoretically successful anonymization solutions in literature, these techniques remain flawed. As such, clinical institutions are still reluctant to apply them for open access to their data. Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field, given their capability to perform various tasks. This paper proposes six new evaluation metrics tailored to the challenges of generative anonymization with LLMs. Moreover, we present a comparative study of LLM-based methods, testing them against two baseline techniques. Our results establish LLM-based models as a reliable alternative to common approaches, paving the way toward trustworthy anonymization of clinical text.	翻訳日:2024-06-06 08:43:16 公開日:2024-05-29
# システム2レコメンダ:時間的ポイント・プロシースによる勧告システムにおける実用性とエンゲージメントの両立 System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes ( http://arxiv.org/abs/2406.01611v1 ) ライセンス: Link先を確認	Arpit Agarwal, Nicolas Usunier, Alessandro Lazaric, Maximilian Nickel,	(参考訳) リコメンダーシステムは現代の人間の体験の重要な部分であり、その影響は、食べる食べ物から読むニュースまで様々である。しかし、レコメンデーションプラットフォームがユーザ目標とどの程度一致しているかについては、まだ議論がある。この議論を刺激する中核的な問題は、プラットフォームがコンテンツを最適化するために使用している主要な指標である、いいね!、シェア、ウォッチタイムなどのエンゲージメント信号に基づいてユーザユーティリティを推測する、という課題である。これは、ユーザーがユーティリティ駆動の意思決定プロセス(System-2と呼ぶ)、例えば、それらに関連するニュースを読むことは、しばしば衝動的意思決定プロセス(System-1と呼ぶ)、例えばクリックベイトニュースに費やす時間によって構成されるためである。その結果、観測されたエンゲージメントがユーティリティ駆動なのかインパルス駆動なのかを推測することは困難である。本稿では、エンゲージメント信号ではなく、プラットフォームへのリターン確率に基づいてユーザユーティリティを推測する、リコメンデータシステムに対する新しいアプローチを提案する。私たちの直感は、ユーザーがユーティリティを作成すれば、長期的にはプラットフォームに戻る傾向がありますが、ユーティリティを追加しない純粋なエンゲージメント駆動インタラクションは、短期的にはユーザリターンに影響を与えるかもしれませんが、持続的な効果はありません。本稿では,過去のコンテンツインタラクションが,自己興奮型ホークスプロセスに基づくユーザの到着率に影響を及ぼす生成モデルを提案する。これらのプラットフォームへの到着率は、System-1とSystem-2の両方の決定プロセスの組み合わせである。 System-2の到着強度は実用性に依存するが、System-1の到着強度は即時的な満足度に依存し、急速に消滅する傾向にある。そこで本研究では,システム1とシステム2のアンタングルを解消し,ユーザ利用によるコンテンツ最適化を可能にすることを解析的に示す。提案手法の有効性を実証するために, 合成データの実験を行った。 Recommender systems are an important part of the modern human experience whose influence ranges from the food we eat to the news we read. Yet, there is still debate as to what extent recommendation platforms are aligned with the user goals. A core issue fueling this debate is the challenge of inferring a user utility based on engagement signals such as likes, shares, watch time etc., which are the primary metric used by platforms to optimize content. This is because users utility-driven decision-processes (which we refer to as System-2), e.g., reading news that are relevant for them, are often confounded by their impulsive decision-processes (which we refer to as System-1), e.g., spend time on click-bait news. As a result, it is difficult to infer whether an observed engagement is utility-driven or impulse-driven. In this paper we explore a new approach to recommender systems where we infer user utility based on their return probability to the platform rather than engagement signals. Our intuition is that users tend to return to a platform in the long run if it creates utility for them, while pure engagement-driven interactions that do not add utility, may affect user return in the short term but will not have a lasting effect. We propose a generative model in which past content interactions impact the arrival rates of users based on a self-exciting Hawkes process. These arrival rates to the platform are a combination of both System-1 and System-2 decision processes. The System-2 arrival intensity depends on the utility and has a long lasting effect, while the System-1 intensity depends on the instantaneous gratification and tends to vanish rapidly. We show analytically that given samples it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility. We conduct experiments on synthetic data to demonstrate the effectiveness of our approach.	翻訳日:2024-06-05 21:31:36 公開日:2024-05-29
# 量子力学から解釈まで、理想的な量子計測を教える Teaching ideal quantum measurement, from dynamics to interpretation ( http://arxiv.org/abs/2405.20353v1 ) ライセンス: Link先を確認	Armen E. Allahverdyan, Roger Balian, Theo M. Nieuwenhuizen,	(参考訳) 本稿では, 実験系Sと装置Aとの相互作用の動的過程として解析された理想的な測度に関する大学院コースについて, 量子統計力学によって記述した。装置A=M+Bは、マクロな測定装置Mと浴槽Bを含み、測定の理想性の要件により、分離された化合物系S+M+Bのハミルトニアンを特定することができる。結果として生じる力学方程式は、単純なモデルに対して解くことができる。保存法は、切り離しと登録という2つの独立した緩和機構を含むことが示されている。 M と B の大きい大きさで正当化される近似が必要である。 S+A の最終密度行列 $\hat{\cal D}(t_f)$ は平衡形式を持つ。これは、測定の大規模な実行の結果を世界規模で記述している。測定問題、すなわち$\hat{\cal D}(t_f)$から個々の物理特性を抽出すると、その不明瞭さが実行の部分集合に関連付けられた部分に分割されることから生じる。この曖昧さに対処するため、各ランは、M の異なるポインタ値 $A_i$ で終わると仮定する。ボルンの法則は、試験された可観測物の保存法則から生じ、S. Von Neumann の初期状態から M の最終的な表示の出現頻度を表す。我々は、非可換観測値の測定を解析する際に、$q$-probabilitiesと$q$-correlationsという用語を提唱する。これらの考え方は、異なるタイプのコースに適応することができる。 We present a graduate course on ideal measurements, analyzed as dynamical processes of interaction between the tested system S and an apparatus A, described by quantum statistical mechanics. The apparatus A=M+B involves a macroscopic measuring device M and a bath B. The requirements for ideality of the measurement allow us to specify the Hamiltonian of the isolated compound system S+M+B. The resulting dynamical equations may be solved for simple models. Conservation laws are shown to entail two independent relaxation mechanisms: truncation and registration. Approximations, justified by the large size of M and of B, are needed. The final density matrix $\hat{\cal D}(t_f)$ of S+A has an equilibrium form. It describes globally the outcome of a large set of runs of the measurement. The measurement problem, i.e., extracting physical properties of individual runs from $\hat{\cal D}(t_f)$, then arises due to the ambiguity of its splitting into parts associated with subsets of runs. To deal with this ambiguity, we postulate that each run ends up with a distinct pointer value $A_i$ of the macroscopic M. This is compatible with the principles of quantum mechanics. Born's rule then arises from the conservation law for the tested observable; it expresses the frequency of occurrence of the final indications $A_i$ of M in terms of the initial state of S. Von Neumann's reduction amounts to updating of information due to selection of $A_i$. We advocate the terms $q$-probabilities and $q$-correlations when analyzing measurements of non-commuting observables. These ideas may be adapted to different types of courses.	翻訳日:2024-06-03 18:44:15 公開日:2024-05-29
# アコースティックエミッションとディープトランスファー学習による溶接継手の条件モニタリングについて:一般化, 正規損失および超収束 On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence ( http://arxiv.org/abs/2405.20887v1 ) ライセンス: Link先を確認	Emmanuel Ramasso, Rafael de O. Teloli, Romain Marcel,	(参考訳) 本稿では, 畳み込みニューラルネットワーク(CNN)を用いた深部伝達学習を用いて, 超音波放射を用いたボルト接合部の状態をモニタリングする。ボルト構造は多くのメカニカルシステムにおいて重要な要素であり、その状態を監視する能力は、効果的な構造的健康モニタリングに不可欠である。 3本のボルトで接続された2本の細いビームからなるOrION-AEベンチマークを用いて,本手法の性能評価を行った。この構造から得られたデータは、連続ウェーブレット変換を用いて音響放射データストリームを画像に変換し、事前学習したCNNを用いて特徴抽出と復調を行う。本実験では, ボルトの締め付けレベルを推定するために, 単センサと多センサフュージョンを比較し, 実・前フィルタデータを用いた性能評価を行った。特に,CNNに基づく移動学習の一般化機能に着目し,不正確な予測を根本事実に近づいた場合,不正確な予測を過度に減らし,誤分類誤りを助長する順序的損失関数について検討した。ネットワーク構成や学習速度スケジューラについても検討し,ネットワーク間の複数イテレーションにおいて高い分類精度を実現する。さらに,CNNを用いた伝達学習の音響放射によるボルト状構造物監視の一般化能力について,訓練中に必要となる事前情報量について検証した。 This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor the condition of bolted joints using acoustic emissions. Bolted structures are critical components in many mechanical systems, and the ability to monitor their condition status is crucial for effective structural health monitoring. We evaluated the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts, where highly noisy acoustic emission measurements were taken to detect changes in the applied tightening torque of the bolts. The data used from this structure is derived from the transformation of acoustic emission data streams into images using continuous wavelet transform, and leveraging pretrained CNNs for feature extraction and denoising. Our experiments compared single-sensor versus multiple-sensor fusion for estimating the tightening level (loosening) of bolts and evaluated the use of raw versus prefiltered data on the performance. We particularly focused on the generalization capabilities of CNN-based transfer learning across different measurement campaigns and we studied ordinal loss functions to penalize incorrect predictions less severely when close to the ground truth, thereby encouraging misclassification errors to be in adjacent classes. Network configurations as well as learning rate schedulers are also investigated, and super-convergence is obtained, i.e., high classification accuracy is achieved in a few number of iterations with different networks. Furthermore, results demonstrate the generalization capabilities of CNN-based transfer learning for monitoring bolted structures by acoustic emission with varying amounts of prior information required during training.	翻訳日:2024-06-03 14:08:24 公開日:2024-05-29
# 原子核ノルム規則化マトリックスの相対誤差境界解析 Relative Error Bound Analysis for Nuclear Norm Regularized Matrix Completion ( http://arxiv.org/abs/1504.06817v2 ) ライセンス: Link先を確認	Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou,	(参考訳) 本稿では,核ノルム正規化行列の完備化に対する相対誤差を,フルランク行列の完備化に着目して開発する。対象行列のトップ固有空間が不整合であるという仮定の下で、未知行列の最良の低ランク近似を回復する相対上界を導出する。複数の研究がフルランク行列補完の回復誤差の分析に費やされているが、その誤差境界は通常加法的であり、完全な回復ケースを得ることができず、より一般的には固有値の歪んだ分布を利用するのが困難である。本分析は, 正規化定式化の最適条件と, 低ランク行列完備化の既定保証に基づく。我々の知る限りでは、これは行列完備化の正規化された定式化のために証明された最初の相対的境界である。 In this paper, we develop a relative error bound for nuclear norm regularized matrix completion, with the focus on the completion of full-rank matrices. Under the assumption that the top eigenspaces of the target matrix are incoherent, we derive a relative upper bound for recovering the best low-rank approximation of the unknown matrix. Although multiple works have been devoted to analyzing the recovery error of full-rank matrix completion, their error bounds are usually additive, making it impossible to obtain the perfect recovery case and more generally difficult to leverage the skewed distribution of eigenvalues. Our analysis is built upon the optimality condition of the regularized formulation and existing guarantees for low-rank matrix completion. To the best of our knowledge, this is the first relative bound that has been proved for the regularized formulation of matrix completion.	翻訳日:2024-06-02 18:32:50 公開日:2024-05-29
# 言語横断・文字レベルニューラルな形態的タグ付け Cross-lingual, Character-Level Neural Morphological Tagging ( http://arxiv.org/abs/1708.09157v4 ) ライセンス: Link先を確認	Ryan Cotterell, Georg Heigold,	(参考訳) 一般的なNLPタスクであっても、多くの言語では十分な監視ができない。そこで本研究では,高リソース言語と低リソース言語に対する形態的タグ付けを予測するために,文字レベルのリカレントなニューラルタグをトレーニングするトランスファーラーニング手法について検討する。複数の関連言語間の共同文字表現の学習は、高リソース言語から低リソース言語への知識伝達を成功させ、モノリンガルモデルの精度を最大30%向上させる。 Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.	翻訳日:2024-06-02 14:47:20 公開日:2024-05-29
# オープンオントロジースロット充填のための弾性CRF Elastic CRFs for Open-ontology Slot Filling ( http://arxiv.org/abs/1811.01331v2 ) ライセンス: Link先を確認	Yinpei Dai, Yichi Zhang, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng,	(参考訳) スロットフィリングはタスク指向の対話システムにおいて重要なコンポーネントであり、(ユーザ)発話をスロットと呼ばれるセマンティックな概念にパースするために使用される。オントロジーはスロットの集合と各スロットが取ることのできる値によって定義される。スロットフィリングをシーケンスラベリングタスクとして扱う最も広く使われているプラクティスは、2つの主な欠点に悩まされている。まず、オントロジーは通常事前に定義され、固定されているため、目に見えないスロットの新しいラベルを検出できない。第二に、スロットラベルのワンホット符号化は、スロットと類似のセマンティクスとの相関を無視するので、異なるドメインで学んだ知識を共有することは困難である。これらの問題に対処するために,各スロットは自然言語記述の埋め込みによって表現され,CRF層でモデル化される弾性条件付きランダムフィールド(eCRF)と呼ばれる新しいモデルを提案する。スロットに対する言語記述が利用可能であれば、eCRFによって新しいスロット値を検出することができる。実験の結果,eCRFはドメイン内タスクとクロスドメインタスクの両方において既存のモデルよりも優れており,特に未確認スロットや値の予測において優れていることがわかった。 Slot filling is a crucial component in task-oriented dialog systems that is used to parse (user) utterances into semantic concepts called slots. An ontology is defined by the collection of slots and the values that each slot can take. The most widely used practice of treating slot filling as a sequence labeling task suffers from two main drawbacks. First, the ontology is usually pre-defined and fixed and therefore is not able to detect new labels for unseen slots. Second, the one-hot encoding of slot labels ignores the correlations between slots with similar semantics, which makes it difficult to share knowledge learned across different domains. To address these problems, we propose a new model called elastic conditional random field (eCRF), where each slot is represented by the embedding of its natural language description and modeled by a CRF layer. New slot values can be detected by eCRF whenever a language description is available for the slot. In our experiment, we show that eCRFs outperform existing models in both in-domain and cross-domain tasks, especially in predicting unseen slots and values.	翻訳日:2024-06-02 14:47:20 公開日:2024-05-29
# 量子ニューラルネットワークの一般化研究 Generalization Study of Quantum Neural Network ( http://arxiv.org/abs/2006.02388v2 ) ライセンス: Link先を確認	JinZhe Jiang, Xin Zhang, Chen Li, YaQian Zhao, RenGang Li,	(参考訳) 一般化はニューラルネットワークの重要な特徴であり、それについて多くの研究がなされている。近年、量子コンプ・ティング(quantum compu-ting)の発展に伴い、新たな機会がもたらされる。本稿では,量子ゲートによって構築された量子ニューラルネットワークのクラスについて検討した。このモデルでは、特徴データをまずヒルベルト空間の量子状態にマッピングし、その上にユニタリ進化を実装し、最後に量子状態の即時測定によって分類結果を得ることができた。四項ニューラルネットワークにおける全ての演算はユニタリであるため、パラメータはヒルベルト空間の超球面を構成する。従来のニューラルネットワークと比較すると、パラメータ空間はフラットである。したがって、局所的な最適化に陥ることは容易ではなく、量子ニューラルネットワークはより一般化されている。提案手法を検証するため,提案手法を3つの公開データセット上で評価した。 Generalization is an important feature of neural network, and there have been many studies on it. Recently, with the development of quantum compu-ting, it brings new opportunities. In this paper, we studied a class of quantum neural network constructed by quantum gate. In this model, we mapped the feature data to a quantum state in Hilbert space firstly, and then implement unitary evolution on it, in the end, we can get the classification result by im-plement measurement on the quantum state. Since all the operations in quan-tum neural networks are unitary, the parameters constitute a hypersphere of Hilbert space. Compared with traditional neural network, the parameter space is flatter. Therefore, it is not easy to fall into local optimum, which means the quantum neural networks have better generalization. In order to validate our proposal, we evaluated our model on three public datasets, the results demonstrated that our model has better generalization than the classical neu-ral network with the same structure.	翻訳日:2024-06-02 14:47:20 公開日:2024-05-29
# ChexNetで事前訓練したResNet50を用いたコロナ病に対応するX線画像の転写学習アプローチ Transfer learning approach to Classify the X-ray image that corresponds to corona disease Using ResNet50 pretrained by ChexNet ( http://arxiv.org/abs/2105.08382v2 ) ライセンス: Link先を確認	Mahyar Bolhassani,	(参考訳) コロナウイルスは世界中の人々に悪影響を及ぼした。コビッド19ウイルスと、肺炎やインフルエンザなどの他の呼吸器疾患との間には共通の症状がある。そのため, 早期診断は, 患者を救うだけでなく, 感染拡大を防ぐためにも重要である。最も信頼性の高い診断方法の1つは、肺のX線画像によるものである。深層学習アプローチの助けを借りて,感染した肺の状態を知るための深層モデルを教えることができる。したがって、新しいサンプルをCovid19感染患者であるかどうかの分類が可能である。このプロジェクトでは、ImageNetデータセットとCheXNetデータセットによって事前訓練されたResNet50に基づいて、ディープモデルをトレーニングする。 Kaggle氏が導入した不均衡なCoronaHack Chest X-Rayデータセットに基づいて、バイナリとマルチクラスの分類を適用した。また,Focal lossとCross Entropy lossの比較を行った。 Coronavirus adversely has affected people worldwide. There are common symptoms between the Covid19 virus disease and other respiratory diseases like pneumonia or Influenza. Therefore, diagnosing it fast is crucial not only to save patients but also to prevent it from spreading. One of the most reliant methods of diagnosis is through X-ray images of a lung. With the help of deep learning approaches, we can teach the deep model to learn the condition of an affected lung. Therefore, it can classify the new sample as if it is a Covid19 infected patient or not. In this project, we train a deep model based on ResNet50 pretrained by ImageNet dataset and CheXNet dataset. Based on the imbalanced CoronaHack Chest X-Ray dataset introducing by Kaggle we applied both binary and multi-class classification. Also, we compare the results when using Focal loss and Cross entropy loss.	翻訳日:2024-06-01 00:29:19 公開日:2024-05-29
# マニピュレーションとピアメカニズム:サーベイ Manipulation and Peer Mechanisms: A Survey ( http://arxiv.org/abs/2210.01984v3 ) ライセンス: Link先を確認	Matthew Olckers, Toby Walsh,	(参考訳) ピアメカニズムでは、賞の競争相手も誰が勝つかを決定する。各競技者には、賞のランク、成績、候補者の指名を依頼することができる。この賞は、金融援助、コースグレード、会議での賞などの価値があり得るため、競技者は、その仕組みを操作する誘惑を受けることができる。ピアメカニズムの操作を防止または回避するためのアプローチを調査する。我々はいくつかの重要な研究課題を特定して調査を締めくくった。 In peer mechanisms, the competitors for a prize also determine who wins. Each competitor may be asked to rank, grade, or nominate peers for the prize. Since the prize can be valuable, such as financial aid, course grades, or an award at a conference, competitors may be tempted to manipulate the mechanism. We survey approaches to prevent or discourage the manipulation of peer mechanisms. We conclude our survey by identifying several important research challenges.	翻訳日:2024-06-01 00:22:17 公開日:2024-05-29
# LoopDraw: 形状合成と編集のためのループベース自己回帰モデル LoopDraw: a Loop-Based Autoregressive Model for Shape Synthesis and Editing ( http://arxiv.org/abs/2212.04981v2 ) ライセンス: Link先を確認	Nam Anh Dinh, Haochen Wang, Greg Shakhnarovich, Rana Hanocka,	(参考訳) 幾何学の普遍的な3次元表現は存在しないが、点雲、メッシュ、暗黙の関数、ボクセルなど多くの代替品がある。本研究では, 断面閉ループの列を用いて, 形状を表現するための新しい, 説得力のある代替手段を提案する。すべての平面にまたがるループは、自己回帰的な形状の合成と編集に活用する組織階層を形成します。ループは基底形状の非局所的な記述であり、単純なループ操作(シフトなど)は幾何学に大きな構造変化をもたらす。これは、点雲の点や三角形メッシュの三角形のような局所原始的な操作とは対照的である。さらに、ループは直感的かつ自然なプリミティブであり、コンピュータとユーザの両方の形状を解析し、編集するものであることを実証する。 There is no settled universal 3D representation for geometry with many alternatives such as point clouds, meshes, implicit functions, and voxels to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthesis and editing. Loops are a non-local description of the underlying shape, as simple loop manipulations (such as shifts) result in significant structural changes to the geometry. This is in contrast to manipulating local primitives such as points in a point cloud or a triangle in a triangle mesh. We further demonstrate that loops are intuitive and natural primitive for analyzing and editing shapes, both computationally and for users.	翻訳日:2024-06-01 00:12:24 公開日:2024-05-29
# 最適2倍ロバスト推定のためのニュアンス関数チューニング Nuisance Function Tuning for Optimal Doubly Robust Estimation ( http://arxiv.org/abs/2212.14857v2 ) ライセンス: Link先を確認	Sean McGrath, Rajarshi Mukherjee,	(参考訳) 二重頑健な汎函数の推定子は、平均処理効果汎関数に対する確率スコアと条件結果平均のような2つの複素ニュアンス関数を推定することに依存する。因果推論と条件付き独立性試験の文献にまたがる応用を目撃した二重頑健な非パラメトリック関数に対して、ニュアンス関数を最適収束率で推定する方法の問題点を考察する。いくつかのプラグイン型推定器とワンステップ型推定器に対して、ニュアンス関数推定器の異なるチューニングパラメータ選択と興味関数の最適推定率に関するサンプル分割戦略の相互作用を述べる。これらの各推定器および各サンプル分割戦略について、興味の関数に対する最適収束率を得るために、低規則性条件下でニュアンス関数推定器をアンダースムースすることの必要性を示す。適切なニュアンス関数チューニングとサンプル分割戦略により、これらの推定器のいくつかは、ニュアンス関数のすべてのH\"古い滑らか度クラスにおいて収束の最小値を達成することができることを示す。 Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in type estimators and a one-step type estimator, we illustrate the interplay between different tuning parameter choices for the nuisance function estimators and sample splitting strategies on the optimal rate of estimating the functional of interest. For each of these estimators and each sample splitting strategy, we show the necessity to undersmooth the nuisance function estimators under low regularity conditions to obtain optimal rates of convergence for the functional of interest. By performing suitable nuisance function tuning and sample splitting strategies, we show that some of these estimators can achieve minimax rates of convergence in all H\"older smoothness classes of the nuisance functions.	翻訳日:2024-06-01 00:12:24 公開日:2024-05-29
# 音源とターゲット埋め込みの混合による配電シフトへのわずかな適応 Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings ( http://arxiv.org/abs/2305.14521v3 ) ライセンス: Link先を確認	Yihao Xue, Ali Payani, Yu Yang, Baharan Mirzasoleiman,	(参考訳) トレーニング済みの機械学習モデルは、新しいターゲット環境にデプロイされた場合、分散シフトに適応する必要がある。対象分布からラベル付きデータを取得する場合、ターゲット分布からのサンプルを少数含む少数ショット適応が必須となる。そこで本研究では,MixProを提案する。 MixProはまず、トレーニング済みの大規模なデータと、ターゲットとする少数のデータとを混合(直線的に組み合わせ)することによって、比較的大きなデータセットを生成する。このプロセスは、小さなターゲットデータ中の特定のノイズを緩和しながら、ソースとターゲットの両方の重要な特徴を保存します。そして、混合埋め込み上に線形分類器を訓練し、小さなターゲットデータを過度に適合させることなく、モデルを目標分布に効果的に適応させる。理論的には、従来の方法よりもMixProの利点を実証する。実験の結果,MixPro がベースラインを最大 7 % 上回る性能を示し,対象とする例は 2-4 例に留まった。 Pretrained machine learning models need to be adapted to distribution shifts when deployed in new target environments. When obtaining labeled data from the target distribution is expensive, few-shot adaptation with only a few examples from the target distribution becomes essential. In this work, we propose MixPro, a lightweight and highly data-efficient approach for few-shot adaptation. MixPro first generates a relatively large dataset by mixing (linearly combining) pre-trained embeddings of large source data with those of the few target examples. This process preserves important features of both source and target distributions, while mitigating the specific noise in the small target data. Then, it trains a linear classifier on the mixed embeddings to effectively adapts the model to the target distribution without overfitting the small target data. Theoretically, we demonstrate the advantages of MixPro over previous methods. Our experiments, conducted across various model architectures on 8 datasets featuring different types of distribution shifts, reveal that MixPro can outperform baselines by up to 7\%, with only 2-4 target examples.	翻訳日:2024-06-01 00:12:24 公開日:2024-05-29
# SecureFalcon: LLMによるソフトウェア脆弱性の自動検出はまだ存在するか? SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? ( http://arxiv.org/abs/2307.06616v2 ) ライセンス: Link先を確認	Mohamed Amine Ferrag, Ammar Battah, Norbert Tihanyi, Ridhi Jain, Diana Maimut, Fatima Alwahedi, Thierry Lestable, Narinderjit Singh Thandi, Abdechakour Mechri, Merouane Debbah, Lucas C. Cordeiro,	(参考訳) ソフトウェアの脆弱性は、クラッシュ、データ損失、セキュリティ侵害など、数多くの問題を引き起こす可能性がある。これらの問題は品質を著しく侵害し、ソフトウェアアプリケーションやシステムの市場採用に悪影響を及ぼす可能性がある。静的解析のような従来のバグ修正手法は、しばしば偽陽性を生成する。 FV(Formal Verification)の形式である境界モデルチェックは、静的アナライザよりも正確な結果を提供するが、かなりのリソースを必要とし、開発者の生産性を著しく損なう。機械学習(ML)は、FVメソッドに匹敵する精度を達成でき、ほぼリアルタイムで人気のあるインスタントコード補完フレームワークで使用できるか? 本稿では,Falcon-40Bモデルから派生した1億1100万のパラメータしか持たない,ソフトウェア脆弱性の分類に適した,革新的なモデルアーキテクチャであるSecureFalconを紹介する。最高のパフォーマンスを達成するため、FormAIデータセットとFalconVulnDBという2つのデータセットを使用してモデルをトレーニングしました。 FalconVulnDBは、最近のパブリックデータセット、すなわちSySeVRフレームワーク、Draper VDISC、Bigvul、Diversevul、SARD Juliet、ReVealデータセットの組み合わせである。これらのデータセットには、CWE-119、CWE-120、CWE-476、CWE-122、CWE-190、CWE-121、CWE-78、CWE-787、CWE-20、CWE-762など、最も危険なソフトウェア脆弱性が含まれている。 SecureFalconはバイナリ分類で94%の精度、マルチクラス化で最大92%、即時CPU推論時間を実現している。 BERT、RoBERTa、CodeBERT、および従来のMLアルゴリズムといった既存のモデルよりも優れており、ソフトウェアの脆弱性検出とインスタントコード補完フレームワークの境界を推し進めることを約束している。 Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks.	翻訳日:2024-06-01 00:02:40 公開日:2024-05-29
# 単体及び多体量子カオスにおける局所レベル間隔の統計 Statistics of local level spacings in single- and many-body quantum chaos ( http://arxiv.org/abs/2308.06766v2 ) ライセンス: Link先を確認	Peng Tian, Roman Riser, Eugene Kanzieper,	(参考訳) 局所的なレベルの間隔の概念を導入し、確率行列理論のアプローチでそれらの統計を研究する。無限次元のランダム行列の極限において、平均局所間隔の普遍列と、量子系の大域対称性とその内部-カオスまたは正則-力学を一意に識別するそれらの比を決定する。これらの発見は、単体および多体量子系を監視するための新しい枠組みを提供するもので、リーマンゼータ関数の零点、不合理な矩形ビリヤードのスペクトル、サハデフ・イェーキタエフ・ハミルトンの多体スペクトルの数値実験によって裏付けられている。 We introduce a notion of local level spacings and study their statistics within a random-matrix-theory approach. In the limit of infinite-dimensional random matrices, we determine universal sequences of mean local spacings and of their ratios which uniquely identify the global symmetries of a quantum system and its internal -- chaotic or regular -- dynamics. These findings, which offer a new framework to monitor single- and many-body quantum systems, are corroborated by numerical experiments performed for zeros of the Riemann zeta function, spectra of irrational rectangular billiards and many-body spectra of the Sachdev-Ye-Kitaev Hamiltonians.	翻訳日:2024-06-01 00:02:40 公開日:2024-05-29
# 高速かつレグレトな最適アーム同定法:基本極限と低複雑さアルゴリズム Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms ( http://arxiv.org/abs/2309.00591v3 ) ライセンス: Link先を確認	Qining Zhang, Lei Ying,	(参考訳) 本稿では,2つの目的を持つ確率的マルチアーマッド帯域(MAB)問題について考察する。一最適な腕の素早い識別及びコミットメント (ii)$T$連続ラウンドの一連の報酬最大化。それぞれの目的は、個別によく研究されているが、つまり、最高の腕の識別である。 (i)と後悔の最小化 (ii) 実用的重要性にもかかわらず, 両目的の同時実現は未解決の問題である。本稿では,これら2つの目的を達成することを目的とした,emph{Regret Optimal Best Arm Identification} (ROBAI)を紹介する。事前に決定された停止時間と適応的な停止時間の両方でROBAIを解くため、EOCPとその変種と呼ばれるアルゴリズムをそれぞれ提示する。これはガウスと一般の包帯において漸近的最適後悔を達成できるだけでなく、事前決定された停止時間を持つ$\mathcal{O}(\log T)$ラウンドと適応的な停止時間を持つ$\mathcal{O}(\log^2 T)$ラウンドにおいて最適なアームにコミットする。さらに、ROBAIのコミットメント時間(サンプル複雑性と同等)の低い境界を特徴付け、EOCPとその変種が予め決定された停止時間に最適なサンプルであり、適応的な停止時間にほぼ最適であることを示す。数値的な結果は、我々の理論解析を裏付け、古典的 UCB アルゴリズムによってもたらされる興味深い「過剰探索」現象を明らかにし、EOCP は UCB よりもはるかに早く探索を中止しているにもかかわらず、より少ない後悔、すなわち $\mathcal{O}(\log T)$ 対 $\mathcal{O}(T)$ である。 This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces \emph{Regret Optimal Best Arm Identification} (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present an algorithm called EOCP and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in $\mathcal{O}(\log T)$ rounds with pre-determined stopping time and $\mathcal{O}(\log^2 T)$ rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to the sample complexity) of ROBAI, showing that EOCP and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting "over-exploration" phenomenon carried by classic UCB algorithms, such that EOCP has smaller regret even though it stops exploration much earlier than UCB, i.e., $\mathcal{O}(\log T)$ versus $\mathcal{O}(T)$, which suggests over-exploration is unnecessary and potentially harmful to system performance.	翻訳日:2024-05-31 23:52:32 公開日:2024-05-29
# Hi Model, generating 'nice' without 'good' is not bad as generate 'rice'! Hi Model, generating 'nice' instead of 'good' is not as bad as generating 'rice'! Towards Context and Semantic Infused Dialogue Generation Loss Function and Evaluation Metric ( http://arxiv.org/abs/2309.05804v2 ) ライセンス: Link先を確認	Abhisek Tiwari, Muhammed Sinan, Kaushik Roy, Amit Sheth, Sriparna Saha, Pushpak Bhattacharyya,	(参考訳) 過去20年間、対話モデリングは、単純なルールベースの応答からパーソナライズされ説得力のある応答生成へと、大きな進歩を遂げてきた。しかし,これらの進歩にもかかわらず,対話生成の目的関数と評価指標はいまだに停滞している。これらの語彙ベースのメトリクス、例えばクロスエントロピーとBLEUには2つの重要な制限がある。 (a)意味的考慮のない単語間マッチング:「ニセ」と「ライス」を「良い」に生成できないために同じクレジットを割り当てる (b) 生成された応答を評価するための欠落したコンテキスト属性: 生成された応答が進行中の対話コンテキストに関連しているとしても、コーパスで提供される金の発話にマッチしないよう罰せられることがある。本稿では,これらの制約を包括的に検討し,Semantic Infused Contextualized diaLogue (SemTextualLogue)ロス関数と呼ばれる新たな損失関数を提案する。また、文脈と意味的関連性の両方を取り入れて、Dialuationと呼ばれる評価指標を定式化する。課題指向シナリオとオープンドメインシナリオを含む2つの対話コーパス上で,事前学習モデルと事前学習モデルの両方を実験した。 SemTextualLoguelossで訓練した対話生成モデルは,従来のクロスエントロピー損失関数よりも優れた性能を示した。その結果、対話生成モデルの効果的な訓練は、意味論と文脈を取り入れることに大きく依存していることが判明した。このパターンは、従来のメトリクスと比較して、文脈と意味の両方の考慮が人間の評価と強く相関する、導入されたダイアリュージョンメトリックにも反映されている。 Over the past two decades, dialogue modeling has made significant strides, moving from simple rule-based responses to personalized and persuasive response generation. However, despite these advancements, the objective functions and evaluation metrics for dialogue generation have remained stagnant. These lexical-based metrics, e.g., cross-entropy and BLEU, have two key limitations: (a) word-to-word matching without semantic consideration: It assigns the same credit for failure to generate "nice" and "rice" for "good", (b) missing context attribute for evaluating the generated response: Even if a generated response is relevant to the ongoing dialogue context, it may still be penalized for not matching the gold utterance provided in the corpus. In this paper, we first investigate these limitations comprehensively and propose a new loss function called Semantic Infused Contextualized diaLogue (SemTextualLogue) loss function. We also formulate an evaluation metric called Dialuation, incorporating both context and semantic relevance. We experimented with both non-pretrained and pre-trained models on two dialogue corpora, encompassing task-oriented and open-domain scenarios. We found that the dialogue generation models trained with SemTextualLogueloss attained superior performance compared to the traditional cross-entropy loss function. The findings establish that the effective training of a dialogue generation model hinges significantly on incorporating semantics and context. This pattern is also mirrored in the introduced Dialuation metric, where the consideration of both context and semantics correlates more strongly with human evaluation compared to traditional metrics.	翻訳日:2024-05-31 23:52:32 公開日:2024-05-29
# 深層学習における不特定性回避のためのループポーラリティ解析 Loop Polarity Analysis to Avoid Underspecification in Deep Learning ( http://arxiv.org/abs/2309.10211v2 ) ライセンス: Link先を確認	Donald Martin, Jr., David Kinney,	(参考訳) ディープラーニングは、データの複雑なパターンを検出するための強力なテクニックセットである。しかし、そのプロセスの因果構造が過小評価されると、深層学習モデルは脆くなり、データ生成プロセスの分布の変化に対する堅牢性に欠ける。本稿では,データ生成プロセスの因果構造を特定するツールとしてループ極性解析を応用し,深層学習パイプラインにおけるシステム構造とシステム挙動の関係について,より堅牢な理解を符号化する。 SIRモデルに基づくシミュレートされた流行データを用いて、システムを構成する異なるフィードバックループの極性を測定することで、ニューラルネットワークのより堅牢な推論を実現し、ディープラーニングモデルのアウト・オブ・ディストリビューション性能を改善し、システム力学にインスパイアされたアプローチを機械学習開発パイプラインに注入する方法を実証する。 Deep learning is a powerful set of techniques for detecting complex patterns in data. However, when the causal structure of that process is underspecified, deep learning models can be brittle, lacking robustness to shifts in the distribution of the data-generating process. In this paper, we turn to loop polarity analysis as a tool for specifying the causal structure of a data-generating process, in order to encode a more robust understanding of the relationship between system structure and system behavior within the deep learning pipeline. We use simulated epidemic data based on an SIR model to demonstrate how measuring the polarity of the different feedback loops that compose a system can lead to more robust inferences on the part of neural networks, improving the out-of-distribution performance of a deep learning model and infusing a system-dynamics-inspired approach into the machine learning development pipeline.	翻訳日:2024-05-31 23:52:32 公開日:2024-05-29
# 非エルゴードカイラル量子力学のためのライダーベルクプラットフォーム Rydberg platform for non-ergodic chiral quantum dynamics ( http://arxiv.org/abs/2309.12392v2 ) ライセンス: Link先を確認	Riccardo J. Valencia-Tortora, Nicola Pancotti, Michael Fleischhauer, Hannes Bernien, Jamir Marino,	(参考訳) 本稿では,右(または左)の原子が励起された場合にのみ,原子が状態を変化させることができる方向の反ブロッケード条件により,リドベルク原子のキラル相互作用を工学的に制御する機構を提案する。提案手法のスケーラビリティにより,一方向キャラクタを持つ運動的制約付きモデルの多体ダイナミクスを探索することができる。我々は、原子に作用する2つの駆動場の強度を単に調整することで、傷跡、閉じ込め、あるいは局在化を通じて非エルゴード的挙動を観察する。我々は、我々のメカニズムが古典的なノイズの存在下でどのように持続し、相互作用におけるキラリティの度合いを調整できるかを議論し、中性原子配列を用いた指向性、強い相関の量子力学のフロンティアに向けて展開する。 We propose a mechanism for engineering chiral interactions in Rydberg atoms via a directional antiblockade condition, where an atom can change its state only if an atom to its right (or left) is excited. The scalability of our scheme enables us to explore the many-body dynamics of kinetically constrained models with unidirectional character. We observe non-ergodic behavior via either scars, confinement, or localization, upon simply tuning the strength of two driving fields acting on the atoms. We discuss how our mechanism persists in the presence of classical noise and how the degree of chirality in the interactions can be tuned, opening towards the frontier of directional, strongly correlated, quantum mechanics using neutral atoms arrays.	翻訳日:2024-05-31 23:52:32 公開日:2024-05-29
# Think, Act, and Ask: オープンワールドの対話型パーソナライズされたロボットナビゲーション Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation ( http://arxiv.org/abs/2310.07968v4 ) ライセンス: Link先を確認	Yinpei Dai, Run Peng, Sikai Li, Joyce Chai,	(参考訳) Zero-Shot Object Navigation (ZSON)は、エージェントが未知の環境でオープン語彙オブジェクトへナビゲートすることを可能にする。 ZSONの既存の研究は主に、汎用オブジェクトクラスを見つけるための個別の命令に従うことに焦点を当てており、自然言語の相互作用の利用や、ユーザ固有のオブジェクトを特定する複雑さを無視している。これらの制限に対処するために、ZIPON(Zero-shot Interactive Personalized Object Navigation)を導入する。 ZIPON を解決するために,Large Language Models (LLM) を用いた Open-woRld Interactive persOnalized Navigation (ORION) と呼ばれる新しいフレームワークを提案する。実験の結果,ユーザフィードバックを活用できる対話型エージェントの性能は著しく向上した。しかし,タスク完了とナビゲーションとインタラクションの効率のバランスが良好であることは,すべての方法において依然として困難である。さらに,多様なユーザフィードバックフォームがエージェントのパフォーマンスに与える影響について,さらなる知見を提供する。コードはhttps://github.com/sled-group/navchat.comで入手できる。 Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents' performance. Code is available at https://github.com/sled-group/navchat.	翻訳日:2024-05-31 23:52:32 公開日:2024-05-29
# 情報理論に基づく等価性の発見に向けて Towards Information Theory-Based Discovery of Equivariances ( http://arxiv.org/abs/2310.16555v3 ) ライセンス: Link先を確認	Hippolyte Charvin, Nicola Catenacci Volpi, Daniel Polani,	(参考訳) 対称性の存在は、システムに厳密な制約のセットを課す。この制約された構造により、インテリジェントなエージェントがそのようなシステムと対話し、システムの対称性を内部化して情報処理によって学習と一般化の効率を大幅に改善することができる。並行して、複雑性に制約のある学習と行動の原則モデルが、情報理論の手法の利用を増大させる。ここでは、これら2つの視点を統合して、情報理論レンズがシステムの対称性の効果を「見る」ことができるかどうかを理解したい。そこで本研究では,学習と情報制約を考慮した適応行動に関する多くの原則研究において,生産的基盤として機能するインフォメーション・ボトルネック(Information Bottleneck)の新たな変種を提案する。離散的な場合と特定の技術的前提の下では)我々の手法は対称性と情報パロジニーのある種の双対性を定式化していることを示す。この情報理論的処理は、さらに「粗さ」が対応する最適圧縮によって保存される入力出力相互情報の量によって測定される「ソフト」同値の原理的概念を示唆する。この新たな概念は、有界有理性(bounded rationality)の分野と神経表現における対称性の研究の間の橋渡しを提供する。この枠組みは、(現実的かつソフトな)同値を自動的に発見することを可能にする。 The presence of symmetries imposes a stringent set of constraints on a system. This constrained structure allows intelligent agents interacting with such a system to drastically improve the efficiency of learning and generalization, through the internalisation of the system's symmetries into their information-processing. In parallel, principled models of complexity-constrained learning and behaviour make increasing use of information-theoretic methods. Here, we wish to marry these two perspectives and understand whether and in which form the information-theoretic lens can "see" the effect of symmetries of a system. For this purpose, we propose a novel variant of the Information Bottleneck principle, which has served as a productive basis for many principled studies of learning and information-constrained adaptive behaviour. We show (in the discrete case and under a specific technical assumption) that our approach formalises a certain duality between symmetry and information parsimony: namely, channel equivariances can be characterised by the optimal mutual information-preserving joint compression of the channel's input and output. This information-theoretic treatment furthermore suggests a principled notion of "soft" equivariance, whose "coarseness" is measured by the amount of input-output mutual information preserved by the corresponding optimal compression. This new notion offers a bridge between the field of bounded rationality and the study of symmetries in neural representations. The framework may also allow (exact and soft) equivariances to be automatically discovered.	翻訳日:2024-05-31 23:42:43 公開日:2024-05-29
# 量子電子回路の統一線形応答理論 Unified linear response theory of quantum electronic circuits ( http://arxiv.org/abs/2310.17399v2 ) ライセンス: Link先を確認	L. Peri, M. Benito, C. J. B. Ford, M. F. Gonzalez-Zalba,	(参考訳) 有限周波数での多レベル量子系の電気応答のモデル化は、典型的には2つの不完全パラダイムの文脈で行われてきた。 (i)任意の周波数で有効であるが、動的損失を無視する入出力理論及び (II)半古典理論は、よくダイナミックな散逸効果を捉えるが、低周波数でのみ正確である。ここでは、任意の周波数に有効な統一理論を開発し、緩和と強調によってもたらされる量子的振る舞いと非一意的効果の両方を捉える。この理論により、マルチレベルシステムは、エネルギー準位にのみ依存する共振性RCC回路の普遍的な小信号等価回路モデルによって記述できる。我々は,2重量子ドット電荷量子ビットとマヨラナ量子ビットに適用し,アディバティックから共鳴,コヒーレントから非コヒーレントまで連続的にシステムを記述する能力を示し,量子状態の読み出しを改善するための新しい現実的な実験を提案する。我々のモデルは、ハイブリッド量子古典回路の設計を容易にし、量子ビット制御と量子状態の読み出しをシミュレーションする。 Modelling the electrical response of multi-level quantum systems at finite frequency has been typically performed in the context of two incomplete paradigms: (i) input-output theory, which is valid at any frequency but neglects dynamic losses, and (ii) semiclassical theory, which captures well dynamic dissipation effects but is only accurate at low frequencies. Here, we develop a unifying theory, valid for arbitrary frequencies, that captures both the quantum behaviour and the non-unitary effects introduced by relaxation and dephasing. The theory allows a multi-level system to be described by a universal small-signal equivalent circuit model, a resonant RLC circuit, whose topology only depends on the number of energy levels. We apply our model to a double quantum-dot charge qubit and a Majorana qubit, showing the capability to continuously describe the systems from adiabatic to resonant and from coherent to incoherent, suggesting new and realistic experiments for improved quantum state readout. Our model will facilitate the design of hybrid quantum-classical circuits and the simulation of qubit control and quantum state readout.	翻訳日:2024-05-31 23:42:43 公開日:2024-05-29
# 量子状態の安定化器分解を最適化するための基底 Bases for optimising stabiliser decompositions of quantum states ( http://arxiv.org/abs/2311.17384v2 ) ライセンス: Link先を確認	Nadish de Silva, Ming Yin, Sergii Strelchuk,	(参考訳) スタビライザー状態は量子計算理論において中心的な役割を果たす。例えば、最も一般的な量子誤り訂正スキームの計算基底状態を符号化するのに使用される。任意量子状態は多くの安定化器分解(安定化器状態の重ね合わせとして表される方法)を許容する。安定化器分解の構造を理解することは、短期量子コンピュータの検証とシミュレーションに重要な応用をもたらす。我々は、$n$-qubit 安定化状態の線型依存のベクトル空間を導入し、研究する。これらの空間はベクトルを含む標準基底を持ち、その大きさは指数関数的に$n$で成長する。定数サイズ3の線形依存のエレガントな基底を構築する。我々のスパース基底は、まずすべての$n$-qubit安定化状態の辞書をコンパイルせずに計算できる。我々は既存の手法よりも多くの量子ビットの状態の安定化度を明示的に計算するためにそれらを利用する。最後に、魔法状態の安定化器ランクに関する理論的境界を改善するための将来の応用について述べる。 Stabiliser states play a central role in the theory of quantum computation. For example, they are used to encode computational basis states in the most common quantum error correction schemes. Arbitrary quantum states admit many stabiliser decompositions: ways of being expressed as a superposition of stabiliser states. Understanding the structure of stabiliser decompositions has significant applications in verifying and simulating near-term quantum computers. We introduce and study the vector space of linear dependencies of $n$-qubit stabiliser states. These spaces have canonical bases containing vectors whose size grows exponentially in $n$. We construct elegant bases of linear dependencies of constant size three. Critically, our sparse bases can be computed without first compiling a dictionary of all $n$-qubit stabiliser states. We utilise them to explicitly compute the stabiliser extent of states of more qubits than is feasible with existing techniques. Finally, we delineate future applications to improving theoretical bounds on the stabiliser rank of magic states.	翻訳日:2024-05-31 23:32:48 公開日:2024-05-29
# 古典的フレネット・サーレット装置から量子力学的進化の曲率とねじりまで : 第1報定常ハミルトニアス From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part I. Stationary Hamiltonians ( http://arxiv.org/abs/2311.18458v3 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro,	(参考訳) 三次元ユークリッド空間における空間曲線のフレネット・セレット装置が曲線の局所幾何学を決定することが知られている。特に、Frenet-Serret 装置は曲線の曲率やねじれを含む重要な幾何学的不変量を特定する。また、量子情報科学において、物理系に関する量子情報を符号化する量子状態を巧みに操作する際には、低複雑性と高効率が重要な特徴であることが認識されている。本稿では,動的に変化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化について,幾何学的視点を提案する。具体的には、シュロディンガー方程式を指定した定常ハミルトニアンの下で一元的に進化する平行輸送された純粋量子状態によって追跡される射影ヒルベルト空間における量子軌道に対するフレネット・セレット装置の量子バージョンを提案する。提案した定数曲率係数は, 接ベクトルと状態ベクトルとの共変微分の大きさ2乗で与えられる。提案した定数ねじれ係数は、接ベクトルと状態ベクトルの両方に直交する接ベクトルの共変微分の射影の大きさの2乗で定義される。トーション係数は、量子曲線のねじれの便利な測度を提供する。注目すべきは、提案した曲率とねじり係数が文献に存在するものと一致するが、全く異なる方法で導入されることである。 It is known that the Frenet-Serret apparatus of a space curve in three-dimensional Euclidean space determines the local geometry of curves. In particular, the Frenet-Serret apparatus specifies important geometric invariants, including the curvature and the torsion of a curve. It is also acknowledged in quantum information science that low complexity and high efficiency are essential features to achieve when cleverly manipulating quantum states that encode quantum information about a physical system. In this paper, we propose a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by dynamically evolving state vectors. Specifically, we propose a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced by a parallel-transported pure quantum state evolving unitarily under a stationary Hamiltonian specifying the Schrodinger equation. Our proposed constant curvature coefficient is given by the magnitude squared of the covariant derivative of the tangent vector to the state vector and represents a useful measure of the bending of the quantum curve. Our proposed constant torsion coefficient, instead, is defined in terms of the magnitude squared of the projection of the covariant derivative of the tangent vector, orthogonal to both the tangent vector and the state vector. The torsion coefficient provides a convenient measure of the twisting of the quantum curve. Remarkably, we show that our proposed curvature and torsion coefficients coincide with those existing in the literature, although introduced in a completely different manner...	翻訳日:2024-05-31 23:32:48 公開日:2024-05-29
# 古典的フレネット・サーレット装置から量子力学的進化の曲率とねじりまで : 第2報非定常ハミルトニアス From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part II. Nonstationary Hamiltonians ( http://arxiv.org/abs/2311.18463v3 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro,	(参考訳) 非定常ハミルトニアンの下で進化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化に関する幾何学的視点を示す。具体的には、定常ハミルトニアンに対する既存の幾何学的視点を頼りに、時変曲率とねじれ係数の両方が重要な役割を果たす時間依存量子力学シナリオへの我々の理論的構成の一般化について議論する。具体的には、シュロディンガーの進化方程式を指定した時間依存ハミルトニアンの下で、並列輸送された純粋量子状態によって一元的に進化したヒルベルト空間における量子軌道に対するフレネット・セレット装置の量子バージョンを示す。時変曲率係数は、接ベクトルの共変微分を状態ベクトルに二乗して指定し、量子曲線の曲げを測定する。時間変化のねじれ係数は、接ベクトルの共変微分の状態ベクトルへの射影の大きさの2乗、接ベクトルと状態ベクトルに直交し、さらに量子曲線のねじれを測定することによって与えられる。時間変化の設定は、統計的観点からよりリッチな構造を示す。例えば、時間に依存しない構成とは異なり、一般化された分散の概念は非定常ハミルトニアンの下で進化する量子状態によってトレースされた曲線のねじれの定義において非自明に現れる。我々の構成の意義を物理的に説明するために、正弦波振動時間依存電位によって指定された、正確に可溶な時間依存の2状態ラビ問題に適用する。 We present a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by state vectors evolving under nonstationary Hamiltonians. Specifically, relying on the existing geometric viewpoint for stationary Hamiltonians, we discuss the generalization of our theoretical construct to time-dependent quantum-mechanical scenarios where both time-varying curvature and torsion coefficients play a key role. Specifically, we present a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced out by a parallel-transported pure quantum state evolving unitarily under a time-dependent Hamiltonian specifying the Schrodinger evolution equation. The time-varying curvature coefficient is specified by the magnitude squared of the covariant derivative of the tangent vector to the state vector and measures the bending of the quantum curve. The time-varying torsion coefficient, instead, is given by the magnitude squared of the projection of the covariant derivative of the tangent vector to the state vector, orthogonal to the tangent vector and state vector and, in addition, measures the twisting of the quantum curve. We find that the time-varying setting exhibits a richer structure from a statistical standpoint. For instance, unlike the time-independent configuration, we find that the notion of generalized variance enters nontrivially in the definition of the torsion of a curve traced out by a quantum state evolving under a nonstationary Hamiltonian. To physically illustrate the significance of our construct, we apply it to an exactly soluble time-dependent two-state Rabi problem specified by a sinusoidal oscillating time-dependent potential...	翻訳日:2024-05-31 23:32:48 公開日:2024-05-29
# MonoNPHM:モノクロビデオからの動的頭部再構成 MonoNPHM: Dynamic Head Reconstruction from Monocular Videos ( http://arxiv.org/abs/2312.06740v2 ) ライセンス: Link先を確認	Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner,	(参考訳) モノクラーRGBビデオからの動的3次元頭部再構成のためのモノクラーニューラルパラメトリックヘッドモデル(MonoNPHM)を提案する。そこで本研究では,ニューラルパラメトリックモデル上でのテクスチャ場をパラメータ化する潜在外観空間を提案する。我々は、RGBからの勾配が逆レンダリング中の潜時幾何学符号に効果的に影響を及ぼすような、下層の幾何学と相関する予測色値を制約する。表現空間の表現能力を高めるために,超次元で後方変形場を拡大し,位相的に困難な表現における色や幾何学的表現を改善する。先行学習としてMonoNPHMを用いて,符号付き距離場に基づくボリュームレンダリングを用いた3次元頭部再構成の課題にアプローチする。後ろ向きの変形場を数値的に反転させることで,我々の正準幾何学的表現と密接な結びつきを持つ顔アンカーポイントを用いたランドマークロスを組み込んだ。単眼RGBビデオからの動的顔再構成の課題を評価するために,カジュアルな条件下でのKinectシークエンスを20個記録する。 MonoNPHMはすべてのベースラインを大きなマージンで上回り、RGBトラッキングを通じて容易にアクセス可能なニューラルパラメトリック顔モデルに向けた重要なステップとなる。 We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering. To increase the representational capacity of our expression space, we augment our backward deformation field with hyper-dimensions, thus improving color and geometry representation in topologically challenging expressions. Using MonoNPHM as a learned prior, we approach the task of 3D head reconstruction using signed distance field based volumetric rendering. By numerically inverting our backward deformation field, we incorporated a landmark loss using facial anchor points that are closely tied to our canonical geometry representation. To evaluate the task of dynamic face reconstruction from monocular RGB videos we record 20 challenging Kinect sequences under casual conditions. MonoNPHM outperforms all baselines with a significant margin, and makes an important step towards easily accessible neural parametric face models through RGB tracking.	翻訳日:2024-05-31 23:32:48 公開日:2024-05-29
# 物質波の交叉ウィグナー関数によるグーイ位相と量子干渉 Gouy phase and quantum interference with cross-Wigner functions for matter-waves ( http://arxiv.org/abs/2401.00083v2 ) ライセンス: Link先を確認	Lucas S. Marinho, Pedro R. Dieguez, Carlos H. S. Vieira, Irismar G. da Paz,	(参考訳) グーイ相は、古典的な電磁波から物質波、量子光学まで、様々な波の現象を正確に記述するために不可欠である。本研究では, 相互ウィグナー変換に基づく位相空間法を用いて, 相関ガウス波パケットによって特徴付けられる物質波の進化における空間的および時間的干渉を解析する。まず、その自由進化を伴う初期関数の交叉ウィグナーと、二重スリット配置による進化について考察する。グローバルなグーイ位相を取得する波動関数と異なり、クロスウィグナーは進化時間が異なるため、グーイ位相差を取得する。以上の結果から,時間的干渉の正確な記述には時間的様相が重要であることが示唆された。さらに、物質波を用いた二重スリット実験において、空間強度干渉項からクロスウィグナーを再構成するためのウィグナー関数に基づく手法を提案する。 The Gouy phase is essential for accurately describing various wave phenomena, ranging from classical electromagnetic waves to matter waves and quantum optics. In this work, we employ phase-space methods based on the cross-Wigner transformation to analyze spatial and temporal interference in the evolution of matter waves characterized initially by a correlated Gaussian wave packet. First, we consider the cross-Wigner of the initial function with its free evolution, and second for the evolution through a double-slit arrangement. Different from the wave function which acquires a global Gouy phase, we find that the cross-Wigner acquires a Gouy phase difference due to different evolution times. The results suggest that temporal like-Gouy phases are important for an accurate description of temporal interference. Furthermore, we propose a technique based on the Wigner function to reconstruct the cross-Wigner from the spatial intensity interference term in a double-slit experiment with matter waves.	翻訳日:2024-05-31 23:23:04 公開日:2024-05-29
# GA-SmaAt-GNet: 極端沈殿用生成逆小注意GNet GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting ( http://arxiv.org/abs/2401.09881v2 ) ライセンス: Link先を確認	Eloy Reulen, Siamak Mehrkanoon,	(参考訳) 近年、データ駆動モデリングのアプローチは、気象予報など様々な気象学の分野で大きな注目を集めている。しかし、これらの手法は極度の気象条件を扱う際にしばしば困難に直面する。そこで本研究では, 降水量抑制のためのGA-SmaAt-GNetモデルを提案する。このモデルは独自のSmaAt-GNetジェネレータを備えており、成功しているSmaAt-UNetアーキテクチャを拡張し、降水マスク(二値降水マップ)を統合して予測精度を高めることができる。さらに、GA-SmaAt-GNetはPix2Pixアーキテクチャにインスパイアされた注意増強された識別器を組み込んでいる。この革新的なフレームワークは、複数のデータソースを使用して、生成的な降水今ストリーミングする方法を舗装する。オランダの実際の降水データを用いて,SmaAt-GNetとGA-SmaAt-GNetの性能を評価し,他のモデルと比較して,全体的な性能および極端な降水イベントに対する顕著な改善を明らかにした。具体的には,夏と秋の降水強度が平均的にピークである場合,本アーキテクチャは主な性能向上を示す。さらに,GA-SmaAt-GNetモデルと降水データセットの不確実性解析を行い,その予測能力について考察する。最後に、Grad-CAMを用いてモデルの予測を視覚的に説明し、ネットワーク全体の入力アクティベーションの領域をハイライトするアクティベーションヒートマップを生成する。 In recent years, data-driven modeling approaches have gained significant attention across various meteorological applications, particularly in weather forecasting. However, these methods often face challenges in handling extreme weather conditions. In response, we present the GA-SmaAt-GNet model, a novel generative adversarial framework for extreme precipitation nowcasting. This model features a unique SmaAt-GNet generator, an extension of the successful SmaAt-UNet architecture, capable of integrating precipitation masks (binarized precipitation maps) to enhance predictive accuracy. Additionally, GA-SmaAt-GNet incorporates an attention-augmented discriminator inspired by the Pix2Pix architecture. This innovative framework paves the way for generative precipitation nowcasting using multiple data sources. We evaluate the performance of SmaAt-GNet and GA-SmaAt-GNet using real-life precipitation data from the Netherlands, revealing notable improvements in overall performance and for extreme precipitation events compared to other models. Specifically, our proposed architecture demonstrates its main performance gain in summer and autumn, when precipitation intensity is typically at its peak. Furthermore, we conduct uncertainty analysis on the GA-SmaAt-GNet model and the precipitation dataset, providing insights into its predictive capabilities. Finally, we employ Grad-CAM to offer visual explanations of our model's predictions, generating activation heatmaps that highlight areas of input activation throughout the network.	翻訳日:2024-05-31 23:23:04 公開日:2024-05-29
# ログアクセス不要のブラックボックス大言語モデル強化のためのスケッチガイド付き制約付き復号法 Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access ( http://arxiv.org/abs/2401.09967v2 ) ライセンス: Link先を確認	Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West,	(参考訳) 制約付き復号化(Constrained decoding)は、言語モデル出力の制約を強制するテクニックで、再訓練やアーキテクチャの変更なしにテキスト生成を制御する手段を提供する。しかしながら、そのアプリケーションは一般的に、ユーザーが次のトーケン分布(通常はソフトマックスロジットを介して)にアクセスできるモデルに限定されており、ブラックボックスの大規模言語モデル(LLM)で制限される。本稿では,ブラックボックスLLMのロジットにアクセスせずに動作するブラックボックスLLMの制約付き復号法であるスケッチ誘導制約復号法(SGCD)を提案する。 SGCDは、ローカルにホストされた補助モデルを使用して、制約のないブラックボックスLSMの出力を洗練し、この初期出力を「スケッチ」として効果的に処理し、さらなる実験を行う。このアプローチは、従来のロジットベースのテクニックを補完するものであり、完全なモデルの透明性が利用できない設定で制約付きデコードの適用を可能にする。本研究では,複雑なNLPタスクに対するブラックボックスLLMの有用性と柔軟性をいかに向上させるかを示す。 Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM. SGCD utilizes a locally hosted auxiliary model to refine the output of an unconstrained blackbox LLM, effectively treating this initial output as a "sketch" for further elaboration. This approach is complementary to traditional logit-based techniques and enables the application of constrained decoding in settings where full model transparency is unavailable. We demonstrate the efficacy of SGCD through experiments in closed information extraction and constituency parsing, showing how it enhances the utility and flexibility of blackbox LLMs for complex NLP tasks.	翻訳日:2024-05-31 23:23:04 公開日:2024-05-29
# Chem-FINESE:テキスト再構成によるファインショット要素抽出の検証 Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction ( http://arxiv.org/abs/2401.10189v4 ) ライセンス: Link先を確認	Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji,	(参考訳) 化学領域における微粒な数発の実体抽出は、2つの固有の課題に直面している。第一に、一般領域におけるエンティティ抽出タスクと比較して、化学論文からの文は、通常より多くのエンティティを含む。さらに、エンティティ抽出モデルは通常、長い尾型のエンティティを抽出することが困難である。本稿では,これら2つの課題に対処するために,シークエンス・ツー・シーケンス(seq2seq)をベースとした複数ショットのエンティティ抽出手法であるChem-FINESEを提案する。私たちのChem-FINESEは、入力文から名前付きエンティティを抽出するSeq2seqエンティティ抽出器と、抽出されたエンティティから元の入力文を再構成するSeq2seqセルフバリデーションモジュールの2つのコンポーネントを備えている。優れたエンティティ抽出システムがエンティティを忠実に抽出する必要があるという事実にインスパイアされた新しい自己検証モジュールは、エンティティ抽出結果を活用して元の入力文を再構築する。さらに,抽出過程における過剰コピーを減らすために,新たなコントラスト損失を設計する。最後に、ChemNERスキーマでドメインの専門家によって注釈付けされた、新しいきめ細かい化学エンティティ抽出データセットであるChemNER+をリリースする。 ChemNER+とCHEMETのデータセットによる数ショット設定の実験では、新たに提案したフレームワークは、それぞれ8.26%と6.84%の絶対F1スコアゲインに寄与している。 Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Moreover, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Our Chem-FINESE has two components: a seq2seq entity extractor to extract named entities from the input sentence and a seq2seq self-validation module to reconstruct the original input sentence from extracted entities. Inspired by the fact that a good entity extraction system needs to extract entities faithfully, our new self-validation module leverages entity extraction results to reconstruct the original input sentence. Besides, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset that is annotated by domain experts with the ChemNER schema. Experiments in few-shot settings with both ChemNER+ and CHEMET datasets show that our newly proposed framework has contributed up to 8.26% and 6.84% absolute F1-score gains respectively.	翻訳日:2024-05-31 23:23:04 公開日:2024-05-29
# ループ変換器を用いたグラフアルゴリズムのシミュレーション Simulation of Graph Algorithms with Looped Transformers ( http://arxiv.org/abs/2402.01107v2 ) ライセンス: Link先を確認	Artur Back de Luca, Kimon Fountoulakis,	(参考訳) ニューラルネットワークを用いたグラフアルゴリズムの実行は、最近、有望な経験的進歩のために大きな関心を集めている。このことは、ニューラルネットワークが推論ステップをリレーショナルデータで再現する方法について、さらなる理解を動機付けている。本研究では,理論的な観点から,グラフ上のアルゴリズムをシミュレートするトランスフォーマーネットワークの能力について検討する。私たちが使用しているアーキテクチャは、グラフと相互作用する追加の注意頭を持つループ変換器です。我々は,このアーキテクチャがDijkstraの最短経路,Breadth- and Depth-First Search,Kosarajuの強結合成分,および複数のアルゴリズムを同時にシミュレーションできることを示す。ネットワーク内のパラメータ数は入力グラフのサイズによって増加しないため、ネットワークは上記のアルゴリズムを任意のグラフに対してシミュレートすることができる。この性質にもかかわらず、有限精度による解のシミュレーションには限界がある。最後に,付加的なアテンションヘッドを利用する場合のチューリング完全度を一定幅で示す。 The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.	翻訳日:2024-05-31 23:13:17 公開日:2024-05-29
# 固有スペクトルによるカーネルリッジレス回帰におけるオーバーフィッティングの特徴 Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum ( http://arxiv.org/abs/2402.01297v3 ) ライセンス: Link先を確認	Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius,	(参考訳) 固定された入力次元に対するオーバーパラメータ化された状態において、カーネルリッジレス回帰(KRR)に対する既存の非漸近的テストエラー境界を強化するために、カーネル行列の条件数に対する新しい境界を導出する。多項式スペクトル減衰を持つ核に対しては、以前の研究から境界を回復し、指数減衰に対しては、我々の境界は非自明で新規である。私たちの貢献は2つあります。一ガウス以下の設計の前提の下で、過度に誘惑された過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過大な過度の現象を厳格に証明し、文献の既存のギャップを埋めること。 (II) 従来のガウス設計の前提を用いてKRR一般化を近似することに対する懸念を提起し, この特徴の独立性が, 過度な適合性を保証する上で重要な役割を担っていることを確認した。 We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.	翻訳日:2024-05-31 23:13:17 公開日:2024-05-29
# S2malloc: 統計的に安全なアロケータ S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More ( http://arxiv.org/abs/2402.01894v2 ) ライセンス: Link先を確認	Ruizhe Wang, Meng Xu, N. Asokan,	(参考訳) ヒープメモリへの攻撃、メモリオーバーフロー、ダブルおよび無効なフリー、UAF(Use-after-free)、および様々なヒープ・スプレー技術は増加を続けている。既存のエントロピーベースの安全なメモリアロケータは、これらの攻撃ベクトルのほとんど全てに対して統計的に防御する。彼らはUAF攻撃に対する防御を主張するが、その設計は(失敗に終わった)試みを検出するように調整されていない。このため、このエントロピーベースの保護に打ち勝つために、攻撃者はヒープスプレーの可能性を秘め、同じ攻撃を繰り返すだけで成功の可能性がさらに向上する。 S2mallocを導入し、他のセキュリティ保証を妥協したり、大幅な性能上のオーバーヘッドを発生させることなく、UAF-attempt検出を強化することを目的としている。これを実現するために、UAFの試みを検知する自由ブロックカナリア(FBC)、攻撃者が被害者のオブジェクトを正確に上書きするのを阻止するランダムインブロックオフセット(RIO)、攻撃者のアドレスに基づいてブロックサイズを推定するランダムバッグレイアウト(RBL)の3つの革新的な構成を用いる。私たちはそれを示します (a) RIOオフセットのオブジェクトサイズを25%保存することにより、攻撃者が同じポインタを再利用した場合は8バイトのカナリアが69%の保護率を提供し、攻撃者が64バイトのオブジェクトをターゲットとするUAF攻撃に対して、他の攻撃に対して同等またはそれ以上のセキュリティ保証を持たずに、96%の保護率を提供する。 (b) S2mallocは実用的であり、PARSECでの実行時のオーバーヘッドはわずか2.8%、SPECでは11.5%である。最先端のエントロピーベースのアロケータと比較して、S2mallocはさらなる性能オーバーヘッドを発生させることなくUAF保護を改善する。 UAFを緩和するアロケータと比較して、S2mallocは、オーバーヘッドを大幅に低減するために、保護の失敗の極小確率で取引する。 Attacks on heap memory, encompassing memory overflow, double and invalid free, use-after-free (UAF), and various heap spraying techniques are ever-increasing. Existing entropy-based secure memory allocators provide statistical defenses against virtually all of these attack vectors. Although they claim protections against UAF attacks, their designs are not tailored to detect (failed) attempts. Consequently, to beat this entropy-based protection, an attacker can simply launch the same attack repeatedly with the potential use of heap spraying to further improve their chance of success. We introduce S2malloc, aiming to enhance UAF-attempt detection without compromising other security guarantees or introducing significant performance overhead. To achieve this, we use three innovative constructs in secure allocator design: free block canaries (FBC) to detect UAF attempts, random in-block offset (RIO) to stop the attacker from accurately overwriting the victim object, and random bag layout (RBL) to impede attackers from estimating the block size based on its address. We show that (a) by reserving 25% of the object size for the RIO offset, an 8-byte canary offers a 69% protection rate if the attacker reuses the same pointer and 96% protection rate if the attacker does not, against UAF exploitation attempts targeting a 64 bytes object, with equal or higher security guarantees against all other attacks; and (b) S2malloc is practical, with only a 2.8% run-time overhead on PARSEC and an 11.5% overhead on SPEC. Compared to state-of-the-art entropy-based allocators, S2malloc improves UAF-protection without incurring additional performance overhead. Compared to UAF-mitigating allocators, S2malloc trades off a minuscule probability of failed protection for significantly lower overhead.	翻訳日:2024-05-31 23:13:17 公開日:2024-05-29
# ニューラルコントラクトのダイナミクスを学習する - 線形化の拡張とグローバルな保証 Learning Neural Contracting Dynamics: Extended Linearization and Global Guarantees ( http://arxiv.org/abs/2402.08090v3 ) ライセンス: Link先を確認	Sean Jaffe, Alexander Davydov, Deniz Lapsekili, Ambuj Singh, Francesco Bullo,	(参考訳) 学習力学系におけるグローバルな安定性と堅牢性を保証することは、不確実性に直面したシステムの健全性を保証するために不可欠である。拡張線形化契約力学(ELCD)は,グローバルな契約性を保証するニューラルネットワークベースの力学系である。 ELCDの鍵となる特徴は、非線形ベクトル場の拡張線型化のパラメトリゼーションである。最も基本的な形では、ELCDは保証される。 (i)グローバルに指数関数的に安定する (二)均衡縮小、及び (三)世界規模のメートル法に関する契約データ空間におけるより一般的なメトリクスに対する縮約を可能にするため、データ空間と潜在空間の間の微分同相を訓練し、潜在空間における縮約を強制し、データ空間における大域的縮約性を保証する。我々は,高次元LASA,マルチリンク振り子,ローゼンブロックデータセット上でのELCDの性能を示す。 Global stability and robustness guarantees in learned dynamical systems are essential to ensure well-behavedness of the systems in the face of uncertainty. We present Extended Linearized Contracting Dynamics (ELCD), the first neural network-based dynamical system with global contractivity guarantees in arbitrary metrics. The key feature of ELCD is a parametrization of the extended linearization of the nonlinear vector field. In its most basic form, ELCD is guaranteed to be (i) globally exponentially stable, (ii) equilibrium contracting, and (iii) globally contracting with respect to some metric. To allow for contraction with respect to more general metrics in the data space, we train diffeomorphisms between the data space and a latent space and enforce contractivity in the latent space, which ensures global contractivity in the data space. We demonstrate the performance of ELCD on the high dimensional LASA, multi-link pendulum, and Rosenbrock datasets.	翻訳日:2024-05-31 21:05:54 公開日:2024-05-29
# GradSafe: 安全臨界勾配解析によるLCMの脱獄プロンプト検出 GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis ( http://arxiv.org/abs/2402.13494v2 ) ライセンス: Link先を確認	Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong,	(参考訳) 大規模言語モデル(LLM)は、脱獄プロンプトからの脅威に直面している。 jailbreakプロンプトを検出する既存の方法は、主にオンラインモデレーションAPIまたは微調整LDMである。しかし、これらの戦略は、広範囲でリソース集約的なデータ収集とトレーニングプロセスを必要とすることが多い。本研究では, LLMにおける安全クリティカルパラメータの勾配を精査し, 脱獄プロンプトを効果的に検出するGradSafeを提案する。 LLMのジェイルブレイクに対する損失の勾配は、コンプライアンス応答と組み合わせることで、特定の安全クリティカルパラメータに類似したパターンを示す。対照的に、安全なプロンプトは異なる勾配パターンをもたらす。この観測に基づいて、GradSafeは、(コンプライアンス対応を備えた)プロンプトから勾配を分析して、Jailbreakプロンプトを正確に検出する。 Llama Guardは、大規模なデータセットによる微調整によってジェイルブレイクのプロンプトを検出するが、Llama-2にさらなるトレーニングなしで適用されたGradSafeは、Llama Guardより優れていた。 ToxicChatとXSTestで評価したように、この優れたパフォーマンスはゼロショットとアダプションの両方のシナリオで一貫しています。ソースコードはhttps://github.com/xyq7/GradSafeで入手できる。 Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of safety-critical parameters in LLMs. Our method is grounded in a pivotal observation: the gradients of an LLM's loss for jailbreak prompts paired with compliance response exhibit similar patterns on certain safety-critical parameters. In contrast, safe prompts lead to different gradient patterns. Building on this observation, GradSafe analyzes the gradients from prompts (paired with compliance responses) to accurately detect jailbreak prompts. We show that GradSafe, applied to Llama-2 without further training, outperforms Llama Guard, despite its extensive finetuning with a large dataset, in detecting jailbreak prompts. This superior performance is consistent across both zero-shot and adaptation scenarios, as evidenced by our evaluations on ToxicChat and XSTest. The source code is available at https://github.com/xyq7/GradSafe.	翻訳日:2024-05-31 20:54:36 公開日:2024-05-29
# UNITS: 統合マルチタスク時系列モデル UNITS: A Unified Multi-Task Time Series Model ( http://arxiv.org/abs/2403.00131v2 ) ライセンス: Link先を確認	Shanghua Gao, Teddy Koker, Owen Queen, Thomas Hartvigsen, Theodoros Tsiligkaridis, Marinka Zitnik,	(参考訳) 時系列モデルの進歩は、従来のディープラーニング手法から、事前訓練された基礎モデルへのシフトを促している。事前訓練されたトランスフォーマーと再プログラムされたテキストベースのLCMは、最先端の結果を報告するが、最高のパフォーマンスのアーキテクチャはタスクによって大きく異なり、しばしばモデルは時系列予測のみに焦点を当てるなど、限られた範囲を持つ。予測的および生成的時系列タスクを単一のフレームワークで統一するモデルは、達成が困難なままである。タスクトークン化を用いたマルチタスク時系列モデルUniTSを導入し,予測および生成タスクを単一モデル内で表現する。 UniTSは、ユニバーサル時系列表現を得るために設計された改良されたトランスフォーマーブロックを利用する。この設計は、多種多様な動的パターン、サンプリングレート、時間スケールを持つ、多種多様なマルチドメイン事前トレーニングデータセットから、多くの下流データセットへの転送可能性を誘導する。人間の活動センサー、医療、エンジニアリング、ファイナンスドメインにまたがる38のデータセットに対して、UniTSモデルは、12の予測モデル、20の分類モデル、18の異常検出モデル、および16の計算モデルに対して好意的に機能する。 UniTSは、新しいデータドメインやタスクを評価する際に、効果的な数ショットと迅速な学習機能を示す。従来のシングルタスク設定では、UniTSは強いタスク特化時系列モデルより優れている。ソースコードとデータセットはhttps://github.com/mims-harvard/UniTSで公開されている。 Advances in time series models are driving a shift from conventional deep learning methods to pre-trained foundational models. While pre-trained transformers and reprogrammed text-based LLMs report state-of-the-art results, the best-performing architectures vary significantly across tasks, and models often have limited scope, such as focusing only on time series forecasting. Models that unify predictive and generative time series tasks under a single framework remain challenging to achieve. We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. UniTS leverages a modified transformer block designed to obtain universal time series representations. This design induces transferability from a heterogeneous, multi-domain pre-training dataset-often with diverse dynamic patterns, sampling rates, and temporal scales-to many downstream datasets, which can also be diverse in task specifications and data domains. Across 38 datasets spanning human activity sensors, healthcare, engineering, and finance domains, UniTS model performs favorably against 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including repurposed text-based LLMs. UniTS demonstrates effective few-shot and prompt learning capabilities when evaluated on new data domains and tasks. In the conventional single-task setting, UniTS outperforms strong task-specialized time series models. The source code and datasets are available at https://github.com/mims-harvard/UniTS.	翻訳日:2024-05-31 20:54:36 公開日:2024-05-29
# IOI:非参照画像とビデオ品質メトリクスに対する可視的ワンイテレーション・アドバイザリアタック IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics ( http://arxiv.org/abs/2403.05955v2 ) ライセンス: Link先を確認	Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin,	(参考訳) 非参照画像とビデオ品質のメトリクスは、ビデオ処理ベンチマークで広く使われている。ビデオアタックによる学習ベースのメトリクスの堅牢性は、広く研究されていない。成功したことに加えて、ビデオ処理ベンチマークで使用できる攻撃は、高速で受け入れがたいものである必要がある。 Invisible One-Iteration (IOI) は参照画像やビデオ品質の指標に反する攻撃である。対象および主観的テストにより,画像とビデオのデータセットを用いた8つの先行手法との比較を行った。本手法は,攻撃性能と速度を同等に保ちながら,攻撃された各種メトリックアーキテクチャの視覚的品質に優れていた。私たちはGitHubでコードを公開しました。 No-reference image- and video-quality metrics are widely used in video processing benchmarks. The robustness of learning-based metrics under video attacks has not been widely studied. In addition to having success, attacks that can be employed in video processing benchmarks must be fast and imperceptible. This paper introduces an Invisible One-Iteration (IOI) adversarial attack on no reference image and video quality metrics. We compared our method alongside eight prior approaches using image and video datasets via objective and subjective tests. Our method exhibited superior visual quality across various attacked metric architectures while maintaining comparable attack success and speed. We made the code available on GitHub: https://github.com/katiashh/ioi-attack.	翻訳日:2024-05-31 20:44:52 公開日:2024-05-29
# 分子を解釈可能な文法のランダムウォークとして表現する Representing Molecules as Random Walks Over Interpretable Grammars ( http://arxiv.org/abs/2403.08147v2 ) ライセンス: Link先を確認	Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik,	(参考訳) 分子発見の最近の研究は、主に小さな薬物のような分子に焦点が当てられ、同様に材料設計において適切な技術を持たない多くの重要な応用が残されている。これらの応用は、既知のサブ構造を用いて慎重に設計されるサンプルが少なく、より複雑な分子構造に依存していることが多い。本稿では,設計基盤となるモチーフを特徴とする階層設計空間を明示的に記述したグラフ文法を用いて,そのような分子を表現・推論するためのデータ効率・解釈可能なモデルを提案する。本稿では,分子生成と特性予測の両方を容易にする設計空間上のランダムウォークという,新しい表現を提案する。本研究では, 予測分子の性能, 効率, 合成可能性の観点から, 既存の手法に対する明確な優位性を実証し, 提案手法の化学的解釈可能性に関する詳細な知見を提供する。 Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.	翻訳日:2024-05-31 20:44:52 公開日:2024-05-29
# メンタルヘルスのための大規模言語モデル:システムレビュー Large Language Model for Mental Health: A Systematic Review ( http://arxiv.org/abs/2403.15401v2 ) ライセンス: Link先を確認	Zhijun Guo, Alvina Lai, Johan Hilge Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li,	(参考訳) 大規模言語モデル(LLM)は、デジタルヘルスの潜在的な応用に対して大きな注目を集めている一方、メンタルヘルスへの応用は、現在進行中の議論の対象となっている。本研究は, 早期スクリーニング, デジタル介入, 臨床応用の強さと限界に着目し, 精神保健におけるLSMの使用状況を評価することを目的とする。 PRISMAガイドラインに従って, PubMed, IEEE Xplore, Scopus, JMIRをキーワードとして検索した。非英語記事を除く2017年1月1日から2023年12月31日までの記事を掲載した。 30項目が評価され, テキストによる精神疾患と自殺的思考検出(n=12), メンタルヘルス会話エージェント(n=5), その他のメンタルヘルスにおけるLSMの応用と評価(n=13。 LLMは、メンタルヘルスの問題を検知し、アクセス可能で非スティグマタイズされたeヘルスサービスを提供する上で、かなりの効果を発揮する。しかし、現在の臨床使用に伴うリスクは、彼らの利益を上回る可能性がある。この研究は、専門家によって注釈付けされた多言語データセットの欠如、生成されたコンテンツの正確性と信頼性に関する懸念、LCMの「ブラックボックス」の性質による解釈可能性の課題、永続的な倫理的ジレンマなど、いくつかの重要な問題を明らかにしている。これには、明確な倫理的枠組みの欠如、データのプライバシーへの懸念、セラピストと患者の双方によるLSMへの過度な信頼の可能性が含まれており、従来の医療行為を損なう可能性がある。これらの問題にもかかわらず、LSMの急速な開発は、新たな臨床支援としての可能性を強調し、この分野における継続的な研究と開発の必要性を強調している。 Large language models (LLMs) have attracted significant attention for potential applications in digital health, while their application in mental health is subject to ongoing debate. This systematic review aims to evaluate the usage of LLMs in mental health, focusing on their strengths and limitations in early screening, digital interventions, and clinical applications. Adhering to PRISMA guidelines, we searched PubMed, IEEE Xplore, Scopus, and the JMIR using keywords: 'mental health OR mental illness OR mental disorder OR psychiatry' AND 'large language models'. We included articles published between January 1, 2017, and December 31, 2023, excluding non-English articles. 30 articles were evaluated, which included research on mental illness and suicidal ideation detection through text (n=12), usage of LLMs for mental health conversational agents (CAs) (n=5), and other applications and evaluations of LLMs in mental health (n=13). LLMs exhibit substantial effectiveness in detecting mental health issues and providing accessible, de-stigmatized eHealth services. However, the current risks associated with the clinical use might surpass their benefits. The study identifies several significant issues: the lack of multilingual datasets annotated by experts, concerns about the accuracy and reliability of the content generated, challenges in interpretability due to the 'black box' nature of LLMs, and persistent ethical dilemmas. These include the lack of a clear ethical framework, concerns about data privacy, and the potential for over-reliance on LLMs by both therapists and patients, which could compromise traditional medical practice. Despite these issues, the rapid development of LLMs underscores their potential as new clinical aids, emphasizing the need for continued research and development in this area.	翻訳日:2024-05-31 20:35:08 公開日:2024-05-29
# 巨大な言語モデルでさえ、間違った理由を正すのが難しい Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons ( http://arxiv.org/abs/2403.17760v2 ) ライセンス: Link先を確認	Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, Lori Levin,	(参考訳) 本稿では,NLPの観点から,トークンの区別のみに基づいて包括性を識別するモデルを最小化し,GPT-4とLlama 2が強いバイアスで失敗する可能性を示す,大きな語彙重なりを持つNLIのための小さな挑戦データセットを提案する。そして、この失敗を説明するために、さらに挑戦的なサブタスクを作成します。計算言語学の観点から、曲面特徴によって区別できない3種類の形容詞を持つ構成群を同定する。これにより, LLM のこれらの構造に対する理解を様々な方法で探究することが可能となり, 両者の区別に様々な方法で失敗し, それらの意味を適切に表現したり, 語彙的特徴を捉えたりすることができないことが示唆された。 In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort to explain this failure. From a Computational Linguistics perspective, we identify a group of constructions with three classes of adjectives which cannot be distinguished by surface features. This enables us to probe for LLM's understanding of these constructions in various ways, and we find that they fail in a variety of ways to distinguish between them, suggesting that they don't adequately represent their meaning or capture the lexical properties of phrasal heads.	翻訳日:2024-05-31 20:35:08 公開日:2024-05-29
# Serpent: マルチスケール構造化状態空間モデルによるスケーラブルで効率的な画像復元 Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models ( http://arxiv.org/abs/2403.17902v2 ) ライセンス: Link先を確認	Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi,	(参考訳) 効率的な画像復元アーキテクチャの計算構築ブロックのランドスケープは、畳み込み処理と様々な注意機構の組み合わせによって支配されている。しかし、畳み込みフィルタは効率的ではあるが本質的に局所的であるため、画像内の長距離依存関係のモデリングに苦慮している。対照的に、注意は任意の画像領域間のグローバルな相互作用を捉えるのに優れるが、画像次元の二次的なコストに悩まされる。本研究では,最近の状態空間モデル(SSM)とマルチスケール信号処理を組み合わせた高解像度画像復元のための効率的なアーキテクチャであるSerpentを提案する。もともとシーケンスモデリングのために導入されたSSMは、入力サイズが好適な線形スケーリングで、グローバルな受容場を維持することができる。本稿では,従来の信号処理原理に着想を得た新しい階層型アーキテクチャを提案し,入力画像をシーケンスの集合に変換し,マルチスケールで処理する。実験結果から,Serpentはコンピュート・オブ・ザ・アーティファクト(FLOPSの最大150ドル分の削減)と最大5ドル分のGPUメモリを必要とすると同時に,コンピュート・オブ・ザ・アーティファクトに匹敵する再現性を実現することができることを示した。 Serpentによって達成された効率向上は、特に高解像度で顕著である。 The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques, while requiring orders of magnitude less compute (up to $150$ fold reduction in FLOPS) and a factor of up to $5\times$ less GPU memory while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions.	翻訳日:2024-05-31 20:35:08 公開日:2024-05-29
# DeFT:効率的な木構造LPM推論のためのFlashツリーアテンションによるデコーディング DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ( http://arxiv.org/abs/2404.00242v2 ) ライセンス: Link先を確認	Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin,	(参考訳) LLMとのツリー構造相互作用の需要が高まる中、木構造推論に適したIO対応木注目アルゴリズムであるDeFT(Decoding with Flash Tree-Attention)を導入する。従来のシーケンスベースのデコーディングとは異なり、ツリー構造化デコーディングは、自己整合性、少数ショットプロンプト、マルチステップ推論、マルチモデル/ヘッド調整など、現代的なタスク要件に適合する。しかし、既存のシーケンスベースの推論システムは木構造デコードには適していないため、計算、メモリフットプリント、メモリアクセスの冗長性が低下し、推論効率が低下する。この課題に対処するために、DeFTは、メモリ効率の低いメモリフットプリントによるメモリ効率の低い注意計算を、(1)QKVの作成:木分割によるKV誘導グループ化戦略を提案し、GPUグローバルメモリとオンチップ共有メモリ間のKVキャッシュのメモリ読み込み/書き込みを最小化しながら、GPUリソース利用を最適化する;(2)注意計算:各QKVグループの部分的注意を融合カーネルで計算し、ツリートポロジーを意識したグローバルリコメンデーション戦略を用いて最終注目を得る。注意計算中に73-99%のKVキャッシュIOと100%のIOを減らし(例えば、Softmax)、DeFTは3つの実用的なツリーベースワークロードで2.52/3.82倍の高速化を実現している。 Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference. Unlike traditional sequence-based decoding, tree-structured decoding better accommodates modern task requirements, including self-consistency, few-shot prompting, multi-step reasoning, and multi-model/head coordination. However, existing sequence-based inference systems are ill-suited for tree-structured decoding, resulting in redundancy in computation, memory footprints, and memory access, thereby undermining inference efficiency. To address this challenge, DeFT maintains memory-efficient attention calculation with low memory footprints through two key stages: (1) QKV Preparation: We propose a KV-Guided Grouping Strategy with Tree Split to intelligently group QKV, optimizing GPU resource utilization while minimizing memory reads/writes for KV cache between GPU global memory and on-chip shared memory; (2)Attention Calculation: We compute partial attention of each QKV group in a fused kernel and employ a Tree-topology-aware Global Reduction strategy to obtain final attention. By reducing 73-99% KV cache IO and nearly 100% IO for partial results during attention calculation (e.g., Softmax), DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads: namely, few-shot prompting, multi-step reasoning, and speculative decoding, over state-of-the-art attention algorithms.	翻訳日:2024-05-31 20:35:08 公開日:2024-05-29
# 音声基礎モデルの大規模評価 A Large-Scale Evaluation of Speech Foundation Models ( http://arxiv.org/abs/2404.09385v2 ) ライセンス: Link先を確認	Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee,	(参考訳) ファンデーションモデルパラダイムは、共有ファンデーションモデルを利用して、さまざまなタスクに対して最先端(SOTA)のパフォーマンスを実現し、下流固有のモデリングとデータアノテーションを最小限にする必要がある。このアプローチは自然言語処理(NLP)分野において極めて重要であることが証明されている。しかし、音声処理コミュニティには、このパラダイムを体系的に探求するための同様の設定が欠けている。本研究では,音声処理の汎用性能ベンチマーク (SUPERB) を構築し,このパラダイムの有効性について検討する。凍結基盤モデルを用いてSUPERBにおける音声処理タスクに対処する統合マルチタスクフレームワークを提案する。この結果とコミュニティの投稿とを組み合わせることで,基礎モデルパラダイムがスピーチに有望であること,マルチタスクフレームワークがシンプルかつ効果的であること,そして最も優れた基礎モデルが,ほとんどのSUPERBタスク間での競争的一般化性を示していること,などが確認できる。再現性と拡張性のために、決定論的ベンチマークを可能にし、オンラインのリーダーボードによる結果共有を可能にし、コミュニティ主導のベンチマークデータベースを通じてコラボレーションを促進し、新しい開発サイクルをサポートする長期的なプラットフォームを開発しました。最後に,SUPERBと音声基礎モデルの詳細な理解を目的とした一連の分析を行い,モデル内のタスク間の情報フロー,重み付きベンチマークプロトコルの正確性,ベンチマークの統計的意義と堅牢性などについて述べる。 The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.	翻訳日:2024-05-31 20:25:21 公開日:2024-05-29
# インシデント応答GPT:生成人工知能を用いた交通事故対応計画の作成 IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence ( http://arxiv.org/abs/2404.18550v2 ) ライセンス: Link先を確認	Artur Grigorev, Adriana-Simona Mihaita Khaled Saleh, Yuming Ou,	(参考訳) 道路事故による交通渋滞は、都市環境において大きな課題となり、汚染、経済的な損失、交通渋滞が増大する。これらのインシデントを効果的に管理することは、その悪影響を軽減するために不可欠であるが、都市交通システムの複雑さと潜在的なインシデントの多様性は、かなりの障害を表している。本稿では,迅速な情報提供,適応可能な交通事故対応計画を提供することで,交通管理当局を支援する革新的なソリューションであるインシデントレスGPTを紹介する。生成型AIプラットフォームをリアルタイムトラフィックインシデントレポートと運用ガイドラインに統合することにより,交通インシデントに対応する意思決定プロセスの合理化を目指す。この研究は、交通管理におけるAIの展開に関わる重要な課題に対処する。都市交通ネットワークの複雑さの克服、リアルタイムな意思決定能力の確保、地方法と規制の整合、AI駆動システムに対する公的な受け入れの確保などだ。事故報告のテキスト分析、交通シミュレーションによるAIレコメンデーションの検証、透明で検証されたAIシステムの実装の組み合わせを通じて、IncidenceResponseGPTは、トラフィックフローを最適化し、交通インシデントに直面した混雑を低減するための有望なアプローチを提供する。この作業は、交通管理当局、緊急対応チーム、自治体など、都市交通管理とインシデント管理のすべての統合的なステークホルダーにも及んでいる。本研究は,交通事故の迅速解決だけでなく,都市交通システムへの全体的な影響を最小限に抑える枠組みを開発することを目的としている。 Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents. The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.	翻訳日:2024-05-31 20:25:21 公開日:2024-05-29
# LLMの理解には統計的一般化以上のものが必要だ Understanding LLMs Requires More Than Statistical Generalization ( http://arxiv.org/abs/2405.01964v2 ) ライセンス: Link先を確認	Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár,	(参考訳) この10年、ディープラーニング理論における花の咲く研究が「なぜディープラーニングは一般化するのか?」と答えようとしている。パースペクティブの強力なシフトは、補間系における過度にパラメトリケートされたモデルの研究という、この進歩を早めた。本稿では, LLMの望ましい性質のいくつかは, 良好な統計一般化の結果ではなく, 別々に理論的な説明を必要とするため, もう一つの視点シフトが原因であると主張する。我々の中心的な議論は、AR確率モデルは本質的には識別不可能である、という観察に依存している。我々は,(1)ゼロショット規則外挿の非識別性,(2)文脈内学習の近似的非識別性,(3)微視的学習の非識別性という3つのケーススタディを通じて,非識別性が実際的関連性を持つ理由を考察した。我々は, LLM関連一般化対策, 伝達可能性, 誘導バイアスに着目した有望な研究方向性を概観する。 The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.	翻訳日:2024-05-31 20:25:21 公開日:2024-05-29
# 宇宙における機械学習: 搭載MLモデルの放射能に対するロバスト性の調査 Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation ( http://arxiv.org/abs/2405.02642v2 ) ライセンス: Link先を確認	Kevin Lange, Federico Fontana, Francesco Rossi, Mattia Varile, Giovanni Apruzzese,	(参考訳) 現代の宇宙船はますます機械学習(ML)に依存している。しかし、宇宙の物理機器は、放射線などの様々な自然の危険にさらされており、コンピュータ装置の正しい操作を阻害する可能性がある。自然に誘発される欠陥がML関連ハードウェアに損傷をもたらすことを示す証拠は数多くあるが、宇宙用途のMLモデルに対する放射の影響は十分に研究されていない。これは問題であり、これらの自然現象によってMLモデルがどのように影響を受けるかを理解していないため、放射耐性MLソフトウェアを開発する上で「どこから始めるか」は不確実である。 ML研究者として、私たちはこのジレンマに取り組みます。機械学習を専門とするスペースインダストリー実践者と組むことで,最先端技術に関するリフレクティブな分析を行う。本研究は, 宇宙船用MLモデルに対する自然災害の影響について, 先行研究が徹底的に検証しなかった事実を提示する。そして、"負の結果"を通して、いくつかの既存のオープンソース技術は、衛星におけるMLのいくつかの応用に対する放射の影響を研究するために、研究者によってはほとんど利用できないことを示す。建設的なステップとして、我々は現在のフレームワークを活用して、放射誘発断層に対するクラウド検出のための実用的なMLモデルのロバスト性を評価するための簡単な実験を行った。我々の評価は、すべての欠点が、いくつかの先行研究で主張されているような破壊的なものではないことを明らかにしている。私たちのリソースを一般公開することで、宇宙耐性MLモデルの開発を先導するために、研究者が宇宙船にアクセスできる足場を提供しています。 Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.	翻訳日:2024-05-31 20:15:18 公開日:2024-05-29
# 知識グラフに基づくニューラルシンボリックシステムの研究 Exploring knowledge graph-based neural-symbolic system from application perspective ( http://arxiv.org/abs/2405.03524v4 ) ライセンス: Link先を確認	Shenzhe Zhu, Shengxiang Sun,	(参考訳) 人工知能(AI)とディープニューラルネットワークの進歩は、視覚とテキスト処理に大きな進歩をもたらした。しかし、AIシステムにおける人間のような推論と解釈可能性を達成することは、依然として大きな課題である。ニューラルネットワークをシンボリックシステムと統合するNeural-Symbolicパラダイムは、より解釈可能なAIへの有望な経路を提供する。このパラダイムの中では、知識グラフ(KG)が重要であり、相互接続された実体や関係を通じて知識を表現する構造的かつ動的な方法を提供する。本稿では、KGに基づくニューラルシンボリック統合の最近の進歩について、ニューラルネットワークの論理的知識による推論と解釈可能性の向上(Symbol for Neural)、ニューラルネットワーク手法(Neural for Symbol)によるシンボリックシステムの完全性と正確性の改善(Neural for Symbol)、ハイブリッドニューラルシンボリック統合(Hybrid Neural-Symbolic Integration)におけるそれらの組み合わせ適用の促進という、3つのカテゴリの統合をサポートする方法について検討する。最新のトレンドを強調し、Neural-Symbolic AIにおける今後の研究方向を提案する。 Advancements in Artificial Intelligence (AI) and deep neural networks have driven significant progress in vision and text processing. However, achieving human-like reasoning and interpretability in AI systems remains a substantial challenge. The Neural-Symbolic paradigm, which integrates neural networks with symbolic systems, presents a promising pathway toward more interpretable AI. Within this paradigm, Knowledge Graphs (KG) are crucial, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, typically as triples (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, examining how it supports integration in three categories: enhancing the reasoning and interpretability of neural networks with symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes future research directions in Neural-Symbolic AI.	翻訳日:2024-05-31 20:15:18 公開日:2024-05-29
# 耳に耳を傾ける:雑音のある音声をターゲットに Look Once to Hear: Target Speech Hearing with Noisy Examples ( http://arxiv.org/abs/2405.06289v3 ) ライセンス: Link先を確認	Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota,	(参考訳) 混み合った環境では、人間の脳はターゲット話者からのスピーチに集中することができる。本稿では,この能力を実現するための新しいインテリジェントな聴取システムを提案する。ナイーブなアプローチは、ターゲット話者を登録するためにクリーンな音声サンプルを必要とすることである。しかしこれは、クリーンな例を得ることは現実のシナリオでは困難であり、ユニークなユーザーインターフェイスの問題を生み出すため、聞き取り可能なアプリケーションドメインとうまく一致しない。本稿では,対象話者を数秒間観察して,目標話者の単一,短く,雑音の多いバイノーラルな例を捉える,最初の登録インタフェースを提案する。このノイズのある例は、干渉する話者や雑音の存在下での音声抽出の登録と後続の音声抽出に使用される。本システムでは,5秒未満の雑音の入出力音声を用いて7.01dBの信号品質向上を実現し,6.24msで8ミリ秒の音声チャンクを処理可能である。本研究は,屋内および屋外のマルチパス環境における実世界の静的・移動型話者への一般化を実証するものである。最後に、ノイズの多い例の登録インターフェースは、クリーンな例に比べてパフォーマンスの劣化を起こさないが、便利でユーザフレンドリーである。一歩後退して、人工知能による人間の聴覚知覚を高めるための重要な一歩を踏み出した。 https://github.com/vb000/LookOnceToHear.com/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/ s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing to ignore all interfering speech and noise, but the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is however not well aligned with the hearable application domain since obtaining a clean example is challenging in real world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms of audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing the human auditory perception with artificial intelligence. We provide code and data at: https://github.com/vb000/LookOnceToHear.	翻訳日:2024-05-31 20:15:18 公開日:2024-05-29
# URDFormer: 実世界の画像から人工シミュレーション環境を構築するパイプライン URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images ( http://arxiv.org/abs/2405.11656v2 ) ライセンス: Link先を確認	Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta,	(参考訳) 視覚的にも身体的にも現実的にもシミュレーションシーンを構築することは、ロボット工学からコンピュータビジョンまで、領域における実践的な関心の問題である。この問題は、大規模なデータハングリー学習手法が物理的な意思決定システムのための新たなトレーニングデータソースを求める研究者によってさらに重要になっている。しかし、シミュレーションモデルの構築は依然として手作業で行われていることが多い。グラフィックデザイナとシミュレーションエンジニアは、事前に定義された資産を使って、リアルな動的およびキネマティックな特性を持つリッチなシーンを構築する。これは、データ駆動型ロボット制御に必要な一般化特性を達成するために、少数のシーンにスケールする可能性があるが、我々は「自然」キネマティック構造と動的構造を完備した、多数の現実的なシーンを合成できるパイプラインが必要である。この問題に対処するため、我々は自然画像から構造を推論しシミュレーションシーンを生成するモデルを開発し、Webスケールのデータセットからスケーラブルなシーン生成を可能にした。これらのイメージ・トゥ・シミュレートモデルをトレーニングするために、現実的な画像から完全なシーンモデルへのマッピング、逆問題のモデル化を可能にするペア化トレーニングデータを生成するために、制御可能なテキスト・ツー・イメージ生成モデルをどのように利用できるかを示す。このパラダイムによって、セマンティックおよび物理リアリズムを用いたシミュレーションにおいて、大規模なシーンデータセットを構築することができることを示す。本稿では,実世界の画像から機械的・動的構造を表現したシミュレーションシーンを生成し,ロボット制御ポリシのトレーニングに使用する統合エンドツーエンドパイプラインを提案する。そして、オブジェクトの操作のようなタスクのために、現実世界にしっかりとデプロイします。そこで本研究は,シミュレーション環境を大規模に生成するためのパイプラインと,ロバストなロボット制御ポリシをトレーニングする統合システムの両方を提供する。 Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by hand. A graphic designer and a simulation engineer work with predefined assets to construct rich scenes with realistic dynamic and kinematic properties. While this may scale to small numbers of scenes, to achieve the generalization properties that are required for data-driven robotic control, we require a pipeline that is able to synthesize large numbers of realistic scenes, complete with 'natural' kinematic and dynamic structures. To attack this problem, we develop models for inferring structure and generating simulation scenes from natural images, allowing for scalable scene generation from web-scale datasets. To train these image-to-simulation models, we show how controllable text-to-image generative models can be used in generating paired training data that allows for modeling of the inverse problem, mapping from realistic images back to complete scene models. We show how this paradigm allows us to build large datasets of scenes in simulation with semantic and physical realism. We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images and use these for training robotic control policies. We then robustly deploy in the real world for tasks like articulated object manipulation. In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies in the resulting environments.	翻訳日:2024-05-31 20:15:18 公開日:2024-05-29
# 自己改善による大規模視覚言語モデルにおける視覚言語モダリティアライメントの強化 Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement ( http://arxiv.org/abs/2405.15973v2 ) ライセンス: Link先を確認	Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao,	(参考訳) 大規模視覚言語モデル(LVLM)は、特定のデータセットに対する視覚指導による様々な視覚的質問応答および推論タスクにおいて印象的な結果を得た。しかし、視覚的モダリティと言語的モダリティの整合性を改善する余地は依然として大きい。このアライメントを強化するには、通常、その能力と品質に大きく依存する外部モデルやデータが必要である。本稿では,自己改善による視覚的・言語的モダリティの整合性を向上し,外部モデルやデータの必要性を解消するフレームワークであるSIMAを提案する。 SIMAは、既存のビジョンインストラクションチューニングデータセットからのプロンプトを活用して、自己生成応答を生成し、コンテキスト内自己批判機構を使用して、優先順位調整のためのレスポンスペアを選択する。重要なイノベーションは、コンテキスト内自己批判プロセス中に3つの視覚メトリクスを導入し、画像の理解を深める応答の選択においてLVLMを導くことである。 14の幻覚と総合的なベンチマークの実験を通して、SIMAは全てのベンチマークでモデル性能を向上するだけでなく、過去のアプローチよりも優れたモダリティアライメントを実現することを示した。 Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the needs for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which can guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.	翻訳日:2024-05-31 20:05:24 公開日:2024-05-29
# 医用テキストデータの要約におけるオープンソース言語モデルの比較分析 Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data ( http://arxiv.org/abs/2405.16295v3 ) ライセンス: Link先を確認	Yuhao Chen, Zhimu Wang, Bo Wen, Farhana Zulkernine,	(参考訳) 医療ノートや対話における構造化されていないテキストには、豊富な情報が含まれている。近年のLarge Language Models (LLMs) の進歩は、非構造化テキストデータに対する回答および要約タスクにおいて優れた性能を示し、従来のテキスト解析手法よりも優れている。しかし、医学図表のような分野固有のデータに対して、異なるLCMの性能を客観的に評価し報告する科学的研究は文献に欠けている。 GPT-4 をアセスメントとして,医療要約タスクにおける Llama2 や Mistral などのオープンソース LLM の性能評価手法を提案する。 LLMの定量的評価に対する革新的なアプローチは、品質管理を可能にし、特定のタスクに有効なLLMの選択を支援し、デジタルヘルスにおける知識発見を促進する。 Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.	翻訳日:2024-05-31 19:55:33 公開日:2024-05-29
# 重要なことを記憶する: マルチトラバースからの創発的シーン分解 Memorize What Matters: Emergent Scene Decomposition from Multitraverse ( http://arxiv.org/abs/2405.17187v2 ) ライセンス: Link先を確認	Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez,	(参考訳) 人間は自然に永久的な要素の記憶を保持するが、短命の瞬間はしばしば記憶のひび割れを乗り越える。この選択的保持は、ロボット知覚、局所化、マッピングに不可欠である。ロボットにこの能力を付与するために,3次元ガウスマッピング(3DGM)を導入する。 3DGMは、同じ領域から複数のRGBビデオをガウスベースの環境マップに変換し、同時に2D短命なオブジェクトセグメンテーションを実行する。私たちのキーとなる観察は、オブジェクトが頻繁に変化する間、環境は横断的に一貫しているということです。これにより、環境オブジェクトの分解を実現するために、繰り返し発生するトラバーサルからの自己超越を活用できる。より具体的には、3DGMは、堅牢な微分可能なレンダリング問題としてマルチトラバース環境マッピングを定式化し、環境のピクセルとオブジェクトをそれぞれインレーヤとアウトレーヤとして扱う。頑健な特徴蒸留, 特徴残量マイニング, 頑健な最適化を用いて, 3DGMは人間の介入なしに2次元分割と3次元マッピングを共同で行う。 We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and Neural rendering。本手法の有効性と可能性を検証した。 Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.	翻訳日:2024-05-31 19:55:33 公開日:2024-05-29
# 乱流の大規模渦シミュレーションのためのデータ駆動クロージャモデルの誤差解析 A note on the error analysis of data-driven closure models for large eddy simulations of turbulence ( http://arxiv.org/abs/2405.17612v2 ) ライセンス: Link先を確認	Dibyajyoti Chakraborty, Shivam Barwey, Hong Zhang, Romit Maulik,	(参考訳) 本研究では,データ駆動型乱流閉鎖モデルを用いて,流れの軌跡予測における誤差伝搬の数学的定式化を行う。大渦シミュレーション予測の予測状態がサブサンプル直接数値シミュレーションの予測状態に近くなければならないという仮定の下で,データ駆動クロージャモデルを利用する場合の予測誤差の上限を求める。また、この誤差は、時間ステップサイズと、クロージャを用いて最初のワンステップエラーを増幅する役割を担っているヤコビアンに大きく影響されることも示している。また, この誤差は, 閉包定式化のジャコビアンの影響を受けやすいシステムヤコビアンの上界とロールアウト時間で指数関数的に伝播することを示した。これらの知見は、同定されたエラーバウンド項に基づくMLモデルの新たな正規化手法の開発を可能にし、その堅牢性を改善し、エラーの伝播を低減する。 In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.	翻訳日:2024-05-31 19:55:33 公開日:2024-05-29
# ハイブリッドな選好最適化:補助的目的による直接選好最適化の強化 Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives ( http://arxiv.org/abs/2405.17956v2 ) ライセンス: Link先を確認	Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu,	(参考訳) 大規模言語モデル(LLM)の整合性を確保するため、先行研究は人間フィードバック(RLHF)や直接選好最適化(DPO)による強化学習を活用している。 DPOは、最大推定に基づいてより単純なフレームワークを提供するが、LLM設計者の好みに応じて、言語モデルをチューニングし、非微分可能および非バイナリ目的を容易に最大化する能力に妥協する(例えば、より単純な言語を使用したり、特定の有害なコンテンツを最小化するなど)。これらは、ユーザの好みと一致せず、バイナリの好みデータによって引き付けられることもない。本稿では,DPOの簡易性と性能をRLの一般化性に活かすために,DPOとRLHFのハイブリッドアプローチを提案する。 DPOの暗黙的な報酬分解に対する単純な拡張により、LLM をチューニングすることで、オフライン RL を用いて任意の補助報酬の集合を最大化することができる。提案手法であるHybrid Preference Optimization (HPO) は, ユーザの嗜好と補助的設計目的の両方に効果的に一般化できると同時に, 様々な課題のあるベンチマークやモデルサイズでアライメント性能を保っていることを示す。 For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to tune language models to easily maximize non-differentiable and non-binary objectives according to the LLM designer's preferences (e.g., using simpler language or minimizing specific kinds of harmful content). These may neither align with user preferences nor even be able to be captured tractably by binary preference data. To leverage the simplicity and performance of DPO with the generalizability of RL, we propose a hybrid approach between DPO and RLHF. With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards using offline RL. The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives, while preserving alignment performance across a range of challenging benchmarks and model sizes.	翻訳日:2024-05-31 19:45:41 公開日:2024-05-29
# 効果的な崩壊理論としての因果フェルミオン系 Causal Fermion Systems as an Effective Collapse Theory ( http://arxiv.org/abs/2405.19254v1 ) ライセンス: Link先を確認	Felix Finster, Johannes Kleiner, Claudio F. Paganini,	(参考訳) 非相対論的極限において、因果フェルミオン系は効果的な崩壊理論をもたらすことが示されている。 Schr\\odinger方程式に対する非線形および確率的補正項は因果作用原理から導かれる。統計作用素の力学は、Kossakowski-Lindblad形式の決定論的方程式によって記述される。さらに、量子状態はボルンの規則と互換性のある動的崩壊を起こす。有効モデルは連続自発局所化モデルと類似しているが、確率積分の保存法則と顕微鏡長スケール$\ell_{\min}$の時間的非局所性により異なる。 It is shown that, in the non-relativistic limit, causal fermion systems give rise to an effective collapse theory. The nonlinear and stochastic correction terms to the Schr\"odinger equation are derived from the causal action principle. The dynamics of the statistical operator is described by a deterministic equation of Kossakowski-Lindblad form. Moreover, the quantum state undergoes a dynamical collapse compatible with Born's rule. The effective model has similarities with the continuous spontaneous localization model, but differs from it by a conservation law for the probability integral as well as a non-locality in time on a microscopic length scale $\ell_{\min}$.	翻訳日:2024-05-31 19:45:41 公開日:2024-05-29
# O(\sqrt{T})= Regret を用いた線形二次レギュレータ学習のための近似トンプソンサンプリング Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret ( http://arxiv.org/abs/2405.19380v1 ) ライセンス: Link先を確認	Yeoneung Kim, Gihun Kim, Insoon Yang,	(参考訳) 本稿では,線形二次レギュレータ(LQR)を改良したベイズ的残差値$O(\sqrt{T})$で学習する近似トンプソンサンプリングアルゴリズムを提案する。本手法では,Langevin の動的特性と簡単な励起機構を巧みに設計したプレコンディショナーを用いる。励振信号は、プレコンディショナーの最小固有値を時間とともに増加させ、近似した後方サンプリングプロセスを加速させることを示す。さらに,本アルゴリズムにより生成された近似後縁部の非自明な濃度特性を同定する。これらの性質により、システム状態のモーメントを束縛し、文献でよく使われるパラメータ集合に対する非現実的な制限的仮定を伴わずに$O(\sqrt{T})$ regret boundを達成できる。 We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Moreover, we identify nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without the unrealistic restrictive assumptions on parameter sets that are often used in the literature.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 反モニー洗浄のためのネットワーク分析 -系統的な文献レビューと実験的評価- Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation ( http://arxiv.org/abs/2405.19383v1 ) ライセンス: Link先を確認	Bruno Deprez, Toon Vanderschueren, Wouter Verbeke, Bart Baesens, Tim Verdonck,	(参考訳) マネーロンダリングは、違法な活動の資金提供によって社会を負担する、広範囲にわたる課題を提示する。マネーロンダリングをより効果的に戦い、検出するために、ネットワーク情報の利用がますます検討され、マネーロンダリングには必ずしも相互接続されたパーティが伴うことを悪用している。これにより、反マネーロンダリング(AML)のためのネットワーク分析(NA)に関する文献が急増した。しかし、文献は断片化されており、既存の作品の包括的な概要が欠落している。これにより、適用可能なメソッドとその比較検出能力の限定的な理解がもたらされる。そこで本稿では,文献の大規模かつ体系的なレビューを行う。我々は、Web of ScienceとScopusデータベースの97の論文を特定し分析し、その結果、Bockel-Rickermannらの詐欺分析フレームワークによるアプローチの分類結果を得た。さらに,一様セットアップにおける顕著なNA手法の性能評価と比較を行うための総合的な実験フレームワークを提案する。このフレームワークは一般公開されているEllipticデータセットに適用され、手動機能エンジニアリング、ランダムウォークベースのメソッド、ディープラーニングGNNを実装している。ネットワーク分析により,グラフニューラルネットワークを用いたAMLモデルの予測能力が向上し,最良の結果が得られた。研究者や実践者がこれらの結果を拡張し、プロプライエタリなデータで実験できるように、実験フレームワークのオープンソース実装が提供されている。そこで我々は,AMLにおけるネットワーク分析の分析と評価に向けて,標準化されたアプローチを推進することを目的としている。 Money laundering presents a pervasive challenge, burdening society by financing illegal activities. To more effectively combat and detect money laundering, the use of network information is increasingly being explored, exploiting that money laundering necessarily involves interconnected parties. This has lead to a surge in literature on network analytics (NA) for anti-money laundering (AML). The literature, however, is fragmented and a comprehensive overview of existing work is missing. This results in limited understanding of the methods that may be applied and their comparative detection power. Therefore, this paper presents an extensive and systematic review of the literature. We identify and analyse 97 papers in the Web of Science and Scopus databases, resulting in a taxonomy of approaches following the fraud analytics framework of Bockel-Rickermann et al.. Moreover, this paper presents a comprehensive experimental framework to evaluate and compare the performance of prominent NA methods in a uniform setup. The framework is applied on the publicly available Elliptic data set and implements manual feature engineering, random walk-based methods, and deep learning GNNs. We conclude from the results that network analytics increases the predictive power of the AML model with graph neural networks giving the best results. An open source implementation of the experimental framework is provided to facilitate researchers and practitioners to extend upon these results and experiment on proprietary data. As such, we aim to promote a standardised approach towards the analysis and evaluation of network analytics for AML.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# VLEOシミュレーションのためのニューラルネットワーク:熱圏モデリングのためのサーモネットの導入 NeuralODEs for VLEO simulations: Introducing thermoNET for Thermosphere Modeling ( http://arxiv.org/abs/2405.19384v1 ) ライセンス: Link先を確認	Dario Izzo, Giacomo Acciarini, Francesco Biscani,	(参考訳) 本研究では,衛星軌道伝搬における熱圏密度を表現するために,微分可能な計算量を削減した新しいニューラルアーキテクチャ「サーモネット」を提案する。運動方程式の右側にニューラルネットワークが現れるため、結果として生じる衛星力学はニューラルノード(NeuralODE)によって制御され、その完全に微分可能な性質によって特徴付けられる。ネットワークパラメータの効率的なトレーニングは、2つの異なるアプローチによって行われる。最初のアプローチでは、ネットワークは宇宙船の動力学とは独立して訓練を行い、JB-08やNRLMSISE-00といった地上の真理モデルに対して純粋な回帰処理を行う。第2のパラダイムでは、ネットワークパラメータは観測されたダイナミクスに基づいて学習され、ODEの感度によって適応する。どちらの場合も、結果はフレキシブルでコンパクトな熱圏密度モデルであり、軌道予測の精度を維持しながら数値伝播効率を大幅に向上させる。 We introduce a novel neural architecture termed thermoNET, designed to represent thermospheric density in satellite orbital propagation using a reduced amount of differentiable computations. Due to the appearance of a neural network on the right-hand side of the equations of motion, the resulting satellite dynamics is governed by a NeuralODE, a neural Ordinary Differential Equation, characterized by its fully differentiable nature, allowing the derivation of variational equations (hence of the state transition matrix) and facilitating its use in connection to advanced numerical techniques such as Taylor-based numerical propagation and differential algebraic techniques. Efficient training of the network parameters occurs through two distinct approaches. In the first approach, the network undergoes training independently of spacecraft dynamics, engaging in a pure regression task against ground truth models, including JB-08 and NRLMSISE-00. In the second paradigm, network parameters are learned based on observed dynamics, adapting through ODE sensitivities. In both cases, the outcome is a flexible, compact model of the thermosphere density greatly enhancing numerical propagation efficiency while maintaining accuracy in the orbital predictions.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 10年ぶりのビデオ異常検出:調査と展望 Video Anomaly Detection in 10 Years: A Survey and Outlook ( http://arxiv.org/abs/2405.19387v1 ) ライセンス: Link先を確認	Moshira Abdalla, Sajid Javed, Muaz Al Radi, Anwaar Ulhaq, Naoufel Werghi,	(参考訳) ビデオ異常検出(VAD)は、監視、医療、環境監視といった様々な領域において非常に重要である。多くの調査では従来のVAD手法に重点を置いているが、特定のアプローチや新たなトレンドを探求する深みを欠いていることが多い。この調査では、従来の教師付きトレーニングパラダイムを超えて、弱教師付き、自己監督型、教師なしのアプローチを包含する、ディープラーニングベースのVADを調査している。このレビューの顕著な特徴は、大規模なデータセット、特徴抽出、学習方法、損失関数、正規化、異常スコア予測を含む、VADパラダイムの中核的な課題の調査である。さらに,視覚言語モデル(VLM)をVADの強力な特徴抽出器として検討した。 VLMは視覚データをビデオからテキスト記述や音声言語と統合し、異常検出に不可欠なシーンの微妙な理解を可能にする。これらの課題に対処し、今後の研究方向性を提案することにより、複雑な実世界のシナリオにおいて、VLMの能力を活用した堅牢で効率的なVADシステムの開発を促進することを目的としている。この包括的分析は、既存の知識ギャップを埋め、研究者に貴重な洞察を与え、VAD研究の将来形成に貢献しようとしている。 Video anomaly detection (VAD) holds immense importance across diverse domains such as surveillance, healthcare, and environmental monitoring. While numerous surveys focus on conventional VAD methods, they often lack depth in exploring specific approaches and emerging trends. This survey explores deep learning-based VAD, expanding beyond traditional supervised training paradigms to encompass emerging weakly supervised, self-supervised, and unsupervised approaches. A prominent feature of this review is the investigation of core challenges within the VAD paradigms including large-scale datasets, features extraction, learning methods, loss functions, regularization, and anomaly score prediction. Moreover, this review also investigates the vision language models (VLMs) as potent feature extractors for VAD. VLMs integrate visual data with textual descriptions or spoken language from videos, enabling a nuanced understanding of scenes crucial for anomaly detection. By addressing these challenges and proposing future research directions, this review aims to foster the development of robust and efficient VAD systems leveraging the capabilities of VLMs for enhanced anomaly detection in complex real-world scenarios. This comprehensive analysis seeks to bridge existing knowledge gaps, provide researchers with valuable insights, and contribute to shaping the future of VAD research.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 統一的変分法による2次元電子ガスの基底状態相 Ground state phases of the two-dimension electron gas with a unified variational approach ( http://arxiv.org/abs/2405.19397v1 ) ライセンス: Link先を確認	Conor Smith, Yixiao Chen, Ryan Levy, Yubo Yang, Miguel A. Morales, Shiwei Zhang,	(参考訳) 2次元電子ガス(2DEG)は基本的なモデルであり、近年の2次元物質の実験的および理論的研究の進展により関心が高まりつつある。 2DEGの基底状態の現在の理解は、異なる相に対する異なるアンサーゼの変分比較に基づいて、量子モンテカルロ計算に依存する。我々は, メッセージパス型ニューラル量子状態アーキテクチャを用いた一般的な逆流型波動関数である単一変分アンサッツを用いて, 密度範囲全体の統一的な記述を行う。変分最適化は、前回の最良の結果よりも低い基底状態エネルギーをもたらす。ウィグナー結晶(WC)相への遷移は、現在信じられているよりも低い密度の rs = 37 +/- 1 で自動的に起こる。液体とWC相の間、同じアンザッツと変分探索は、幅広い密度の中間状態の存在を強く示唆し、短距離ネマティックスピン相関が強化された。 The two-dimensional electron gas (2DEG) is a fundamental model, which is drawing increasing interest because of recent advances in experimental and theoretical studies of 2D materials. Current understanding of the ground state of the 2DEG relies on quantum Monte Carlo calculations, based on variational comparisons of different ansatze for different phases. We use a single variational ansatz, a general backflow-type wave function using a message-passing neural quantum state architecture, for a unified description across the entire density range. The variational optimization consistently leads to lower ground-state energies than previous best results. Transition into a Wigner crystal (WC) phase occurs automatically at rs = 37 +/- 1, a density lower than currently believed. Between the liquid and WC phases, the same ansatz and variational search strongly suggest the existence of intermediate states in a broad range of densities, with enhanced short-range nematic spin correlations.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 大N場理論のニューラルスケーリング法則:リッジレス限界を超えた解法モデル Neural Scaling Laws From Large-N Field Theory: Solvable Model Beyond the Ridgeless Limit ( http://arxiv.org/abs/2405.19398v1 ) ライセンス: Link先を確認	Zhengkang Zhang,	(参考訳) ニューラルネットワークに基づく多くの機械学習モデルはスケーリング法則を示しており、その性能はモデルのサイズとトレーニングデータセットに関するパワー法則としてスケールする。我々は、Maloney, Roberts, Sully が最近提案したモデルにおいて、ニューラルスケーリング法則を研究するための簡易な設定を提供するために、大N場の理論法を用いる。本手法は, モデル動作の正則化に不可欠であるリッジパラメータの非ゼロ値に対して, 後者の論文の結果を拡張した。新たなより正確なスケーリング法則の獲得に加えて、モデルとトレーニングデータセットサイズの間の対称性を説明するダイアグラムレベルでの双対性変換も発見する。同じ双対性は、量子場理論をシミュレートするニューラルネットワークを設計する最近の試みの根底にある。 Many machine learning models based on neural networks exhibit scaling laws: their performance scales as power laws with respect to the sizes of the model and training data set. We use large-N field theory methods to solve a model recently proposed by Maloney, Roberts and Sully which provides a simplified setting to study neural scaling laws. Our solution extends the result in this latter paper to general nonzero values of the ridge parameter, which are essential to regularize the behavior of the model. In addition to obtaining new and more precise scaling laws, we also uncover a duality transformation at the diagrams level which explains the symmetry between model and training data set sizes. The same duality underlies recent efforts to design neural networks to simulate quantum field theories.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 直接絡み合い生成のためのパリティ依存状態伝達 Parity-dependent state transfer for direct entanglement generation ( http://arxiv.org/abs/2405.19408v1 ) ライセンス: Link先を確認	Federico A. Roy, João H. Romeiro, Leon Koch, Ivan Tsitsilin, Johannes Schirk, Niklas J. Glaser, Niklas Bruckmoser, Malay Singh, Franz X. Haslbeck, Gerhard B. P. Huber, Gleb Krylov, Achim Marx, Frederik Pfeiffer, Christian M. F. Schneider, Christian Schweizer, Florian Wallner, David Bunch, Lea Richard, Lasse Södergren, Klaus Liegener, Max Werninghaus, Stefan Filipp,	(参考訳) 量子情報技術が進歩するにつれ、スケーリングと接続性の課題に直面している。特に、遠方の量子ビット間の接続の必要性と、効率的な絡み合いの生成の必要性の2つが技術的実装から独立している。完全状態移動(Perfect State Transfer)は、キュービット格子の遠いノード間の量子状態の時間的最適転送と、最も近い近傍結合のみを実現する技術であり、デバイス接続性を改善する重要なツールを提供する。重要なことに、転送プロトコルは効果的なパリティに依存しない非局所的な相互作用をもたらし、その効用を絡み合った状態の効率的な生成にまで拡張する。ここでは, 超伝導量子ビットの連鎖上での完全状態移動と多ビット絡みの発生を実験的に実証する。このシステムは6つの固定周波数トランスモンキュービットから構成されており、結合はパラメトリックドライブを介して制御される。すべての結合を同時に活性化し、個々の振幅と周波数をエンジニアリングすることにより、最大6キュービットのパーフェクトステートトランスファーを実装し、異なる初期状態に対するそれぞれの単一励起ダイナミクスを観察する。次に、このプロトコルを複数の励起の存在下で適用し、そのパリティに依存した性質を検証する。最後に、この特性を利用して、単一転送操作のみを用いてマルチキュービットグリーンバーガー・ホーネ・ザイリンガー状態を作成し、その効率的な絡み合い生成への応用を実証する。 As quantum information technologies advance they face challenges in scaling and connectivity. In particular, two necessities remain independent of the technological implementation: the need for connectivity between distant qubits and the need for efficient generation of entanglement. Perfect State Transfer is a technique which realises the time optimal transfer of a quantum state between distant nodes of qubit lattices with only nearest-neighbour couplings, hence providing an important tool to improve device connectivity. Crucially, the transfer protocol results in effective parity-dependent non-local interactions, extending its utility to the efficient generation of entangled states. Here, we experimentally demonstrate Perfect State Transfer and the generation of multi-qubit entanglement on a chain of superconducting qubits. The system consists of six fixed-frequency transmon qubits connected by tunable couplers, where the couplings are controlled via parametric drives. By simultaneously activating all couplings and engineering their individual amplitudes and frequencies, we implement Perfect State Transfer on up to six qubits and observe the respective single-excitation dynamics for different initial states. We then apply the protocol in the presence of multiple excitations and verify its parity-dependent property, where the number of excitations within the chain controls the phase of the transferred state. Finally, we utilise this property to prepare a multi-qubit Greenberger-Horne-Zeilinger state using only a single transfer operation, demonstrating its application for efficient entanglement generation.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 物質のk局所量子相の安定性について On stability of k-local quantum phases of matter ( http://arxiv.org/abs/2405.19412v1 ) ライセンス: Link先を確認	Ali Lavasani, Michael J. Gullans, Victor V. Albert, Maissam Barkeshli,	(参考訳) 現在のトポロジカル位相の理論の枠組みは、幾何学的に局所的な相互作用を持つ系の熱力学的極限に基づいている。自然な疑問は、幾何学的局所性の制約を緩和し、それをより弱いグラフ理論の$k$-局所性の概念に置き換えるならば、物質相の概念がどの程度明確に定義されているかである。この問題に対処するためのステップとして、一般的な量子的低密度パリティチェック符号に対応するハミルトンの摂動に対するエネルギーギャップの安定性を分析し、Bravyi と Hastings [Commun. Math. Phys. 307, 609 (2011)] の仕事を延長する。主な結果のまとめとして、もしある定数 $\varepsilon_1,\varepsilon_2>0$ が存在して、相互作用グラフ上の半径 $r の球の大きさ $\Gamma(r)$ が$\Gamma(r) = O(\exp(r^{1-\varepsilon_1}))$ を満たすと、半径 $r\le\rho^\ast = O(\log(n)^{1+\varepsilon_2})$ の局所基底状態は局所的な摂動に対して安定となる。これは、$D$-次元ユークリッドの場合よりもほぼ指数関数的に改善され、$\Gamma(r) = O(r^D)$ と $\rho^\ast = O(n^\alpha)$ は、ある$\alpha > 0$ である。従うアプローチは、$\varepsilon_1 = 0$を持つ有限レートqLDPC符号の安定性を証明できない。局所ハミルトニアンは広い零温度エントロピーを持つことができるので、熱力学の第3法則の意味を論じる。 The current theoretical framework for topological phases of matter is based on the thermodynamic limit of a system with geometrically local interactions. A natural question is to what extent the notion of a phase of matter remains well-defined if we relax the constraint of geometric locality, and replace it with a weaker graph-theoretic notion of $k$-locality. As a step towards answering this question, we analyze the stability of the energy gap to perturbations for Hamiltonians corresponding to general quantum low-density parity-check codes, extending work of Bravyi and Hastings [Commun. Math. Phys. 307, 609 (2011)]. A corollary of our main result is that if there exist constants $\varepsilon_1,\varepsilon_2>0$ such that the size $\Gamma(r)$ of balls of radius $r$ on the interaction graph satisfy $\Gamma(r) = O(\exp(r^{1-\varepsilon_1}))$ and the local ground states of balls of radius $r\le\rho^\ast = O(\log(n)^{1+\varepsilon_2})$ are locally indistinguishable, then the energy gap of the associated Hamiltonian is stable against local perturbations. This gives an almost exponential improvement over the $D$-dimensional Euclidean case, which requires $\Gamma(r) = O(r^D)$ and $\rho^\ast = O(n^\alpha)$ for some $\alpha > 0$. The approach we follow falls just short of proving stability of finite-rate qLDPC codes, which have $\varepsilon_1 = 0$; we discuss some strategies to extend the result to these cases. We discuss implications for the third law of thermodynamics, as $k$-local Hamiltonians can have extensive zero-temperature entropy.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# VisTA-SR:農業用低速度撮像カメラの精度と分解能の向上 VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture ( http://arxiv.org/abs/2405.19413v1 ) ライセンス: Link先を確認	Heesup Yun, Sassoum Lo, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles,	(参考訳) 熱カメラは、植物温度の非侵襲的な測定を可能にするため、農業研究において重要なツールである。低コストのサーマルカメラを利用することで、農業研究と生産におけるサーマルイメージング導入の障壁を低くすることができる。本稿では,農業用低コスト熱画像カメラの温度精度と画質を改善するためのアプローチを提案する。コンピュータビジョン技術、特にディープラーニングネットワークの進歩を活用して、RGBと熱画像を組み合わせた低解像度サーマルカメラの能力を高めるために、$\textbf{Vis}$ual \&$\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement)という手法を提案する。この研究には、温度測定の校正と検証、ペア画像データセットの取得、農業用熱画像に適したディープラーニングネットワークの開発が含まれる。本研究は,農業領域における画像強調の課題に対処し,高分解能産業用カメラに代わる低コスト熱カメラの可能性を探るものである。実験により, 農業における温度精度と画像のシャープ性の向上に本手法が有効であることを示し, よりアクセスしやすく, 効率的な熱イメージングソリューションの確立を図った。 Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called $\textbf{VisTA-SR}$ ($\textbf{Vis}$ual \& $\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement) that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 許容性による安全:高速かつ安全な強化学習のためのシールド構築 Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning ( http://arxiv.org/abs/2405.19414v1 ) ライセンス: Link先を確認	Alexander Politowicz, Sahisnu Mazumder, Bing Liu,	(参考訳) 実生活問題に対する強化学習(RL)ソリューションの設計は依然として大きな課題である。主な関心領域は安全である。シールドディング」は、ユーザ定義の安全仕様を安全なエージェント動作に変換することで、RLの安全性を強制する一般的な手法である。しかし、これらの手法は、極端な学習遅延に悩まされ、問題のモデルや安全なドメインの設計に広範囲な人的努力を必要とするか、事前計算を必要とする。本稿では,安全と遮蔽構造に対処する新しい許容性に基づく枠組みを提案する。許容性はもともと、RLトレーニング効率を改善するための最適な解決策にはならない(許容不可能な)動作を排除するために設計された。本論文は,安全性を本枠組みに自然に組み込むことが可能であること,すなわち,安全性を含む許容範囲を延長することにより,安全性と効率の向上を両立できることを示す。 3つの標準RLアプリケーションを用いた実験評価は, 提案手法の有効性を示す。 Designing Reinforcement Learning (RL) solutions for real-life problems remains a significant challenge. A major area of concern is safety. "Shielding" is a popular technique to enforce safety in RL by turning user-defined safety specifications into safe agent behavior. However, these methods either suffer from extreme learning delays, demand extensive human effort in designing models and safe domains in the problem, or require pre-computation. In this paper, we propose a new permissibility-based framework to deal with safety and shield construction. Permissibility was originally designed for eliminating (non-permissible) actions that will not lead to an optimal solution to improve RL training efficiency. This paper shows that safety can be naturally incorporated into this framework, i.e. extending permissibility to include safety, and thereby we can achieve both safety and improved efficiency. Experimental evaluation using three standard RL applications shows the effectiveness of the approach.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 生成的類似性を持つコントラスト学習を用いたヒト誘導的ビアーゼをキャプチャする空間の学習 Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases ( http://arxiv.org/abs/2405.19420v1 ) ライセンス: Link先を確認	Raja Marjieh, Sreejan Kumar, Declan Campbell, Liyi Zhang, Gianluca Bencomo, Jake Snell, Thomas L. Griffiths,	(参考訳) 人間は、少数の例から学び、感覚データから有用な情報を抽象化するために、強い帰納バイアスに頼る。機械学習モデルにそのようなバイアスを注入することで、数ショットの学習、堅牢性、アライメントなど、さまざまなベンチマークのパフォーマンスが向上することが示されている。しかし、人間の類似性判断のような心理的に豊かなトレーニングデータがスケールするにはコストがかかるため、目標を達成するための効果的なトレーニング手順を見つけることは困難である。ここでは,2つのデータポイントが同一分布からサンプリングされた場合の類似性を考えるベイズ的類似性の概念を導入することで,この問題に対処する。この尺度は確率的プログラムを含む複雑な生成過程に適用できる。生成的類似性は, 特定の帰納的バイアスを表現する空間埋め込みの学習を可能にするため, 正確な形状を抽出可能な場合でも, 対照的な学習目標を定義するのに有効であることを示す。本研究では, 幾何学的形状の帰納的バイアスを捕捉し, 確率的プログラムによってパラメータ化される抽象的描画スタイルをよりよく識別するために, 提案手法の有用性を実証する。 Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data. Instilling such biases in machine learning models has been shown to improve their performance on various benchmarks including few-shot learning, robustness, and alignment. However, finding effective training procedures to achieve that goal can be challenging as psychologically-rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by introducing a Bayesian notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable, enabling learning of spatial embeddings that express specific inductive biases. We demonstrate the utility of our approach by showing how it can be used to capture human inductive biases for geometric shapes, and to better distinguish different abstract drawing styles that are parameterized by probabilistic programs.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# バイスタブル画像を用いた視覚言語モデルの評価 Evaluating Vision-Language Models on Bistable Images ( http://arxiv.org/abs/2405.19423v1 ) ライセンス: Link先を確認	Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch,	(参考訳) ビスタブル・イメージ(ビスタブル・イメージ、英: Bistable image、または不明瞭または可逆的イメージ)は、2つの異なる解釈で見ることができる視覚刺激を示すが、観察者は同時に見ることはできない。本研究では,バイスタブル画像を用いた視覚言語モデルについて,これまでで最も広範な検討を行った。私たちは手動で29枚のバイスタブル画像と関連するラベルを集め、明るさ、色調、回転で116種類の操作を行ないました。 6つのモデルアーキテクチャにまたがる分類タスクと生成タスクにおいて,12種類のモデルを評価した。以上の結果から,Idefics 系と LLaVA1.5-13b 系のモデルを除くと,各モデル間の解釈の相違が顕著であり,画像操作下での差は最小であり,画像回転の例外は少ないことが明らかとなった。さらに、モデル嗜好を人間と比較し、モデルが人間と同じ連続性バイアスを示しておらず、しばしば人間の初期解釈から分岐していることを指摘した。また,プロンプトの変動や同義語ラベルの使用の影響についても検討し,これらの要因が画像操作よりもモデル解釈に大きく影響することを発見した。すべてのコードとデータはオープンソースである。 Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected them to 116 different manipulations in brightness, tint, and rotation. We evaluated twelve different models in both classification and generative tasks across six model architectures. Our findings reveal that, with the exception of models from the Idefics family and LLaVA1.5-13b, there is a pronounced preference for one interpretation over another among the models, and minimal variance under image manipulations, with few exceptions on image rotations. Additionally, we compared the model preferences with humans, noting that the models do not exhibit the same continuity biases as humans and often diverge from human initial interpretations. We also investigated the influence of variations in prompts and the use of synonymous labels, discovering that these factors significantly affect model interpretations more than image manipulations showing a higher influence of the language priors on bistable image interpretations compared to image-text training data. All code and data is open sourced.	翻訳日:2024-05-31 19:35:56 公開日:2024-05-29
# 拡散政策攻撃者:拡散政策に対する敵対的攻撃の作法 Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies ( http://arxiv.org/abs/2405.19424v1 ) ライセンス: Link先を確認	Yipu Chen, Haotian Xue, Yongxin Chen,	(参考訳) 拡散モデル (DM) は行動クローニング (BC) の有望なアプローチとして出現している。 DMに基づく拡散ポリシー(DP)は、BCのパフォーマンスを新たな高さまで向上させ、様々なタスクにまたがる堅牢な有効性を証明し、その固有の柔軟性と実装の容易さを兼ね備えている。政策創出の基盤としてDPの採用が増加しているにもかかわらず、安全の重大な問題は未解決のままである。過去の試みでは、深い政策ネットワークをターゲットとしていたが、DPは拡散モデルを政策ネットワークとして使用し、連鎖構造とランダム性によって従来の手法による攻撃を効果的に行なわなかった。本稿では, 敵のシナリオを導入し, オフライン・オンライン攻撃を包含し, グローバル・パッチベースの攻撃を行うことにより, DPの安全性に関する包括的検討を行う。 DP-Attackerは、上記すべてのシナリオにまたがる効果的な敵攻撃を構築できるアルゴリズム群である。我々は、様々な操作タスクにわたる事前訓練された拡散ポリシーに対する攻撃を行う。実験により,DP-Attackerは全てのシナリオにおいてDPの成功率を大幅に低下させることができることを示した。特にオフラインのシナリオでは、DP-Attackerはすべてのフレームに適用可能な高度に転送可能な摂動を生成することができる。さらに、環境に適用された場合、効果的にモデルを欺くような逆の物理的パッチの作成について説明する。ビデオの結果は以下のとおりだ。 Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC). Diffusion policies (DP) based on DMs have elevated BC performance to new heights, demonstrating robust efficacy across diverse tasks, coupled with their inherent flexibility and ease of implementation. Despite the increasing adoption of DP as a foundation for policy generation, the critical issue of safety remains largely unexplored. While previous attempts have targeted deep policy networks, DP used diffusion models as the policy network, making it ineffective to be attacked using previous methods because of its chained structure and randomness injected. In this paper, we undertake a comprehensive examination of DP safety concerns by introducing adversarial scenarios, encompassing offline and online attacks, and global and patch-based attacks. We propose DP-Attacker, a suite of algorithms that can craft effective adversarial attacks across all aforementioned scenarios. We conduct attacks on pre-trained diffusion policies across various manipulation tasks. Through extensive experiments, we demonstrate that DP-Attacker has the capability to significantly decrease the success rate of DP for all scenarios. Particularly in offline scenarios, DP-Attacker can generate highly transferable perturbations applicable to all frames. Furthermore, we illustrate the creation of adversarial physical patches that, when applied to the environment, effectively deceive the model. Video results are put in: https://sites.google.com/view/diffusion-policy-attacker.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 言語モデルエージェントのための適応型会話内チーム構築 Adaptive In-conversation Team Building for Language Model Agents ( http://arxiv.org/abs/2405.19425v1 ) ライセンス: Link先を確認	Linxin Song, Jiale Liu, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang,	(参考訳) 複数の大規模言語モデル(LLM)エージェントを活用することは、複雑なタスクに取り組む上で有望なアプローチであることを示している。タスクが与えられたら、効果的に解決するためのLLMエージェントのチームをどのように構築すればよいか? 私たちの新しい適応型チーム構築パラダイムは、Captain Agentという新しいエージェント設計を通じて実現された柔軟なソリューションを提供します。タスク解決プロセスの各ステップごとに動的にチームを編成し、管理し、ネストしたグループの会話とリフレクションを利用して、多様な専門知識を確保し、ステレオタイプ的なアウトプットを防ぐ。柔軟性がありながら構造化された問題解決のアプローチを可能にし、冗長性を低減し、出力の多様性を高めるのに役立つ。 6つの実世界のシナリオに対する総合的な評価では、エージェントは21.94%の精度で既存のマルチエージェントメソッドを著しく上回り、タスク固有のプロンプトエンジニアリングを必要とせずに優れたパフォーマンスを提供する。 Leveraging multiple large language model (LLM) agents has shown to be a promising approach for tackling complex tasks, while the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. It dynamically forms and manages teams for each step of a task-solving process, utilizing nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs. It allows for a flexible yet structured approach to problem-solving and can help reduce redundancy and enhance output diversity. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods with 21.94% improvement in average accuracy, providing outstanding performance without requiring task-specific prompt engineering.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 深層学習による経口読解頻度の評価 Deep Learning for Assessment of Oral Reading Fluency ( http://arxiv.org/abs/2405.19426v1 ) ライセンス: Link先を確認	Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao,	(参考訳) 読み流しの評価はリテラシープログラムの重要な要素であり、早期教育介入の指導と監視に役立っている。教員が実施する演習のリソース集約性を考えると,口頭読みの音声記録を操作できる自動ツールの開発は,客観的かつ高度にスケーラブルなソリューションとして魅力的である。精度、レート、表現力などの複雑な側面は、読み流しの人間の判断を下す。そこで本研究では,人間専門家がラベル付けした物語テキストの子どもの音声記録の学習データセットのエンドツーエンドモデリングについて検討する。事前訓練されたwav2vec2.0モデルは、ラベル付きデータの限られた量による課題を軽減する可能性から採用されている。本報告では, 学習した語彙・音響・韻律的特徴の組込みが, 読み流しの知覚に重要であることを明らかにする。 Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource intensive nature of the exercise when conducted by teachers, the development of automatic tools that can operate on audio recordings of oral reading is attractive as an objective and highly scalable solution. Multiple complex aspects such as accuracy, rate and expressiveness underlie human judgements of reading fluency. In this work, we investigate end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. The pre-trained wav2vec2.0 model is adopted due its potential to alleviate the challenges from the limited amount of labeled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 量子ヒストリーにおける空間と時間相関 Space and time correlations in quantum histories ( http://arxiv.org/abs/2405.19427v1 ) ライセンス: Link先を確認	Leonardo Castellani, Anna Gabetti,	(参考訳) 一般化された量子ヒストリーの定式化は、同じ履歴密度行列の異なるトレースを取ることによって、空間と時間相関の対称的な処理を可能にする。この枠組みで空間的・時間的絡み合いをどう特徴づけるかを思い出す。静的複合システムのケケットに履歴状態をマッピングする操作プロトコルが提示される。例えば、我々のアプローチでは、Leggett-Garg と temporal CHSH の不等式がどのように破られるかを示す。 The formalism of generalized quantum histories allows a symmetrical treatment of space and time correlations, by taking different traces of the same history density matrix. We recall how to characterize spatial and temporal entanglement in this framework. An operative protocol is presented, to map a history state into the ket of a static composite system. We show, by examples, how the Leggett-Garg and the temporal CHSH inequalities can be violated in our approach.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# コンフォーマル再帰的特徴除去 Conformal Recursive Feature Elimination ( http://arxiv.org/abs/2405.19429v1 ) ライセンス: Link先を確認	Marcos López-De-Castro, Alberto García-Galindo, Rubén Armañanzas,	(参考訳) 従来の統計手法とは異なり、コンフォーマル予測(CP)はデータの交換可能性のみに基づいて、個々の予測に関連する有効かつ正確な信頼レベルを決定することができる。本稿では,CP フレームワークを利用した新しい特徴選択手法を提案する。提案したCRFE(Conformal Recursive Feature Elimination)は,データセットの非整合性を高める機能を特定し,再帰的に削除する。また、CRFEの自動停止基準と、機能のサブセット間の一貫性を測定するための新しいインデックスも提示する。 CRFE選択は、データの複数のパーティションを用いて、複数のマルチクラスデータセット上の古典的再帰的特徴除去(RFE)手法と比較される。その結果、CRFEはデータセットの半分でRFEを明らかに上回り、残りの半分では同様のパフォーマンスを実現していることがわかった。自動停止基準は、分類性能を計算せずに有効かつ非冗長な機能のサブセットを提供する。 Unlike traditional statistical methods, Conformal Prediction (CP) allows for the determination of valid and accurate confidence levels associated with individual predictions based only on exchangeability of the data. We here introduce a new feature selection method that takes advantage of the CP framework. Our proposal, named Conformal Recursive Feature Elimination (CRFE), identifies and recursively removes features that increase the non-conformity of a dataset. We also present an automatic stopping criterion for CRFE, as well as a new index to measure consistency between subsets of features. CRFE selections are compared to the classical Recursive Feature Elimination (RFE) method on several multiclass datasets by using multiple partitions of the data. The results show that CRFE clearly outperforms RFE in half of the datasets, while achieving similar performance in the rest. The automatic stopping criterion provides subsets of effective and non-redundant features without computing any classification performance.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 合意を超えて:言語的インフォームド・カウンセリングに基づく自動評価手法の合理化 Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals ( http://arxiv.org/abs/2405.19433v1 ) ライセンス: Link先を確認	Yupei Wang, Renfen Hu, Zhe Zhao,	(参考訳) 現在の自動エッセイスコアリング(AES)手法は、ヒトのレイカーと高い一致を示しているが、それらのスコアリングメカニズムは十分に解明されていない。提案手法は,Large Language Models (LLMs) によって支援された対実的介入を用いて,エッセイ評価において,BERT のようなモデルは主に文レベルの特徴に焦点をあてる一方で,LLM は慣習,言語複雑性,組織に順応するものであり,より包括的アライメントとルーブリックのスコアリングとの整合性を示す。さらに、LLMはフィードバック中の反事実的介入を識別することができる。我々のアプローチは、ニューラルネットワークAES手法の理解を改善し、モデル駆動決定における透明性を求める他の領域にも適用できる。コードとデータはGitHubでリリースされる。 While current automated essay scoring (AES) methods show high agreement with human raters, their scoring mechanisms are not fully explored. Our proposed method, using counterfactual intervention assisted by Large Language Models (LLMs), reveals that when scoring essays, BERT-like models primarily focus on sentence-level features, while LLMs are attuned to conventions, language complexity, as well as organization, indicating a more comprehensive alignment with scoring rubrics. Moreover, LLMs can discern counterfactual interventions during feedback. Our approach improves understanding of neural AES methods and can also apply to other domains seeking transparency in model-driven decisions. The codes and data will be released at GitHub.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 一般化滑らか性下における多目的最適化の収束性について On the Convergence of Multi-objective Optimization under Generalized Smoothness ( http://arxiv.org/abs/2405.19440v1 ) ライセンス: Link先を確認	Qi Zhang, Peiyao Xiao, Kaiyi Ji, Shaofeng Zou,	(参考訳) 多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論的な分析を伴う効果的なアルゴリズムを提供しているが、それらは標準の$L$-smoothや、リカレントニューラルネットワーク(RNN)やトランスフォーマーのようなニューラルネットワークには不満足な境界段階の仮定によって制限されている。本稿では、より一般的で現実的な$\ell$-smooth損失関数の研究を行い、$\ell$は勾配ノルムの一般非減少関数である。目的物間の最小改善を最大化する競合回避(CA)方向を近似した,$\ell$-smooth MOO問題,一般化されたSmooth Multi-objective Gradient descent (GSMGrad) とその確率的変種であるStochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad) の2つの新しいシングルループアルゴリズムを開発した。両アルゴリズムの総合収束解析を行い, 平均CA距離を保証した$\epsilon$-accurate Pareto定常点(すなわち, 更新方向とCA方向のギャップ)に全反復で収束することを示し, 完全$\mathcal{O}(\epsilon^{-2})$と$\mathcal{O}(\epsilon^{-4})$サンプルは決定論的および確率的設定にそれぞれ必要である。私たちのアルゴリズムは、より多くのサンプルを使用して、各イテレーションにおいてより厳密な$\epsilon$-level CA距離を保証することができます。また,GSMGradと同等の性能保証を達成しつつ,一定の時間と空間のみを用いてGSMGrad-FAという実用的なGSMGradの変種を提案する。提案手法の有効性を検証し,提案手法の有効性を検証した。 Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which are typically unsatisfactory for neural networks, such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems, Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad), which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. Our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we propose a practical variant of GSMGrad named GSMGrad-FA using only constant-level time and space, while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 動き平均化による大規模DSM登録 Large-scale DSM registration via motion averaging ( http://arxiv.org/abs/2405.19442v1 ) ライセンス: Link先を確認	Ningli Xu, Rongjun Qin,	(参考訳) 広域デジタルサーフェスモデル(DSM)の生成には、多数の個人と部分的に重複したDSMを登録する必要がある。これは、複数のDSMからの多くの観測が考慮された場合、メモリオーバーフローを引き起こすため、典型的な登録アルゴリズムでは難しい問題となる。逐次登録アルゴリズムは計算を著しく削減できるが、特に小さな重なり合ったペアに対して脆弱であり、大きなエラーの蓄積につながる。本研究では,DSM間の相対的なポーズを表すエッジを持つシーングラフを構築するために,ペアワイズDSMを登録する,動き平均化問題としてDSM登録タスクを構築する新しいソリューションを提案する。具体的には、大きなDSMのグリッド構造に基づいて、新しい近接探索法を用いてペアワイズ登録を行う。シーングラフは,O(N)複雑性の極めて高速な動き平均アルゴリズムを用いて最適化可能である(Nは画像数を指す)。高分解能衛星由来DSMの評価は、計算と精度を著しく向上させる。 Generating wide-area digital surface models (DSMs) requires registering a large number of individual, and partially overlapped DSMs. This presents a challenging problem for a typical registration algorithm, since when a large number of observations from these multiple DSMs are considered, it may easily cause memory overflow. Sequential registration algorithms, although can significantly reduce the computation, are especially vulnerable for small overlapped pairs, leading to a large error accumulation. In this work, we propose a novel solution that builds the DSM registration task as a motion averaging problem: pair-wise DSMs are registered to build a scene graph, with edges representing relative poses between DSMs. Specifically, based on the grid structure of the large DSM, the pair-wise registration is performed using a novel nearest neighbor search method. We show that the scene graph can be optimized via an extremely fast motion average algorithm with O(N) complexity (N refers to the number of images). Evaluation of high-resolution satellite-derived DSM demonstrates significant improvement in computation and accuracy.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# MathChat: マルチターンインタラクションにおける数学的推論と指導のベンチマーク MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions ( http://arxiv.org/abs/2405.19444v1 ) ライセンス: Link先を確認	Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu,	(参考訳) 大規模言語モデル(LLM)は数学的な問題解決において、特に一ターンの質問応答形式において顕著な能力を示した。しかし、現実のシナリオでは、多ターンや対話的な情報交換を必要とする数学的な質問応答がしばしば必要であり、これらのタスクにおけるLLMの性能はまだ未定である。本稿では,LLMを幅広い数学的タスクの範囲にわたって評価するための総合的なベンチマークであるMathChatを紹介する。これらのタスクは、マルチターン相互作用とオープンエンド生成におけるモデルの能力を評価するために構成される。我々は,MathChatベンチマークにおける様々なSOTA LLMの性能評価を行い,これらのモデルが一ターン質問応答において優れている一方で,持続的な推論と対話理解を必要とする複雑なシナリオにおいて,それらの性能は著しく低下していることを示した。マルチターンタスクやオープンエンドタスクに直面する既存のLLMの制限に対処するため,LLMファインタニングのための合成対話型数学データセットであるMathChatSyncを開発した。実験結果は、MathChatsyncのような多種多様な対話型インストラクションチューニングデータセットでLLMをトレーニングする必要性を強調している。本研究は, LLMの多元数理推論能力向上に向けた有望な方向性の1つを概説し, 対話型数学的問題解決や実世界の応用に長けているLCMの開発を推し進めるものであると考えている。 Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires multi turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a comprehensive benchmark specifically designed to evaluate LLMs across a broader spectrum of mathematical tasks. These tasks are structured to assess the models' abilities in multiturn interactions and open ended generation. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding. To address the above limitations of existing LLMs when faced with multiturn and open ended tasks, we develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations. Experimental results emphasize the need for training LLMs with diverse, conversational instruction tuning datasets like MathChatsync. We believe this work outlines one promising direction for improving the multiturn mathematical reasoning abilities of LLMs, thus pushing forward the development of LLMs that are more adept at interactive mathematical problem solving and real world applications.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# FourierMamba: イメージデライニングのためのステートスペースモデルとフーリエラーニング統合 FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining ( http://arxiv.org/abs/2405.19450v1 ) ライセンス: Link先を確認	Dong Li, Yidi Liu, Xueyang Fu, Senyan Xu, Zheng-Jun Zha,	(参考訳) Image derainingは雨が降る画像から雨の跡を取り除き、透明な背景を復元することを目的としている。現在、フーリエ変換を用いたいくつかの研究は、降雨を捉える前に有効な周波数として機能するため、画像の劣化に有効であることが証明されている。しかし、画像に低周波と高周波の依存性があるにもかかわらず、これらのフーリエ法は、学習手順の整合性に異なる周波数の相関を利用することは稀であり、画像デラリニングにおける周波数情報の完全利用を制限している。あるいは、最近登場したMamba手法は、様々な領域(例えば、空間的、時間的)における相関をモデル化するための効果と効率を描いており、異なる周波数を相関付けるために、探索されていないフーリエ空間にMambaを導入することは、画像のデライニングを改善するのに役立つと論じている。これにより,FourierMambaという新たなフレームワークが提案され,Fourier空間におけるMambaとのイメージデベリングが実現された。フーリエマムバのコアは、フーリエ空間における周波数順序のユニークな配置に依拠し、低周波順序形式は空間次元(軸に配置されていない)とチャネル次元(軸に配置されている)で異なる形で表される。そこで我々は、空間次元とチャネル次元のフーリエ空間情報を異なる設計で関連付けるフーリエマンバを設計する。具体的には、空間次元フーリエ空間において、周波数をスキャンして低周波数から高周波数に並べ替えることで、周波数間の接続を秩序的に関連付けるジグザグ符号を導入する。 Image deraining aims to remove rain streaks from rainy images and restore clear backgrounds. Currently, some research that employs the Fourier transform has proved to be effective for image deraining, due to it acting as an effective frequency prior for capturing rain streaks. However, despite there exists dependency of low frequency and high frequency in images, these Fourier-based methods rarely exploit the correlation of different frequencies for conjuncting their learning procedures, limiting the full utilization of frequency information for image deraining. Alternatively, the recently emerged Mamba technique depicts its effectiveness and efficiency for modeling correlation in various domains (e.g., spatial, temporal), and we argue that introducing Mamba into its unexplored Fourier spaces to correlate different frequencies would help improve image deraining. This motivates us to propose a new framework termed FourierMamba, which performs image deraining with Mamba in the Fourier space. Owning to the unique arrangement of frequency orders in Fourier space, the core of FourierMamba lies in the scanning encoding of different frequencies, where the low-high frequency order formats exhibit differently in the spatial dimension (unarranged in axis) and channel dimension (arranged in axis). Therefore, we design FourierMamba that correlates Fourier space information in the spatial and channel dimensions with distinct designs. Specifically, in the spatial dimension Fourier space, we introduce the zigzag coding to scan the frequencies to rearrange the orders from low to high frequencies, thereby orderly correlating the connections between frequencies; in the channel dimension Fourier space with arranged orders of frequencies in axis, we can directly use Mamba to perform frequency correlation and improve the channel information representation.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 遮蔽クラッツァーポテンシャルとその変種について On the screened Kratzer potential and its variants ( http://arxiv.org/abs/2405.19451v1 ) ライセンス: Link先を確認	Francisco M. Fernández,	(参考訳) 最近提案された2原子分子の振動-回転スペクトルとその熱力学特性は欠陥を示す。これらのポテンシャルのパラメータ $D_e $ と $r_e$ は、それぞれ解離エネルギーと平衡結合長ではないことが容易に示せる。私たちは、その間違いをシンプルで一般的な方法で克服する方法を示します。 We argue that several potentials proposed recently for the analysis of the vibrational-rotational spectra of diatomic molecules and their thermodynamic properties exhibit a flaw. One can easily show that the parameters $D_e $ and $r_e$ in those potentials are not the dissociation energy and equilibrium bond length, respectively, as the proposers believe. We show how to overcome the mistake in a simple and quite general way.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# ゲイター:現実世界の四足歩行のための一貫した表現学習 Gaitor: Learning a Unified Representation Across Gaits for Real-World Quadruped Locomotion ( http://arxiv.org/abs/2405.19452v1 ) ライセンス: Link先を確認	Alexander L. Mitchell, Wolfgang Merkt, Aristotelis Papatheodorou, Ioannis Havoutis, Ingmar Posner,	(参考訳) 現在の四足歩行の最先端は、地形を横断する頑丈な動きを生み出すことができるが、望ましいロボット軌道をトロットやクロールのような個別の移動スキルに分割する必要がある。対照的に、本研究では、歩行タイプと特徴の連続的なブレンディングを可能にする四足歩行の単一統一表現を学習する可能性を示す。本稿では,運動スキルのゆがみのある表現を学習し,トレーニング中に見られるすべての歩行タイプに共通する情報を共有するGaitorを提案する。学習した表現に現れる構造は、異なる歩行タイプ間の位相相関を符号化できることで解釈可能である。これらは連続的な歩行遷移を生み出すために利用することができる。また、フットスイング特性はアンタングル化され、直接対応可能である。この構造化された潜在表現で動作する初歩的な地形符号化と学習プランナーとともに、Gaitorは、不均一な地形に反応しながら、所望の歩行タイプやユーザの特徴を含む動作コマンドを受信することができる。我々はANYmal Cプラットフォーム上でのシミュレーションと実世界の両方の設定でGaitorを評価した。我々の知る限りでは、これは複数の歩行に対して統一的で解釈可能な潜在表現を学習する最初の仕事であり、実際の四足歩行ロボット上で異なる移動モード間でオンデマンドで連続的にブレンドする結果となった。 The current state-of-the-art in quadruped locomotion is able to produce robust motion for terrain traversal but requires the segmentation of a desired robot trajectory into a discrete set of locomotion skills such as trot and crawl. In contrast, in this work we demonstrate the feasibility of learning a single, unified representation for quadruped locomotion enabling continuous blending between gait types and characteristics. We present Gaitor, which learns a disentangled representation of locomotion skills, thereby sharing information common to all gait types seen during training. The structure emerging in the learnt representation is interpretable in that it is found to encode phase correlations between the different gait types. These can be leveraged to produce continuous gait transitions. In addition, foot swing characteristics are disentangled and directly addressable. Together with a rudimentary terrain encoding and a learned planner operating in this structured latent representation, Gaitor is able to take motion commands including desired gait type and characteristics from a user while reacting to uneven terrain. We evaluate Gaitor in both simulated and real-world settings on the ANYmal C platform. To the best of our knowledge, this is the first work learning such a unified and interpretable latent representation for multiple gaits, resulting in on-demand continuous blending between different locomotion modes on a real quadruped robot.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# 誤り耐性スプリットフッド学習のためのスプリットポイントの最適化 Optimizing Split Points for Error-Resilient SplitFed Learning ( http://arxiv.org/abs/2405.19453v1 ) ライセンス: Link先を確認	Chamani Shiranthika, Parvaneh Saeedi, Ivan V. Bajić,	(参考訳) フェデレート・ラーニング(FL)、スプリット・ラーニング(SL)、スプリット・フェデレート・ラーニング(Split Federated Learning)といった分散学習の最近の進歩は、機械学習の可能性を広げている。 SplitFedは、FL内の個々のクライアントの計算負担を最小限に抑え、プライバシーを維持しながらSLを並列化する。本研究では,SplitFedのパケット損失に対するモデル分割点のレジリエンスについて検討した。 SplitFedのパラメータアグリゲーション戦略について検討し、各ポイントでモデルを分割する影響を調べる。実験はヒト胚画像分割作業で行われ、より深い分割点の統計的に有意な利点を明らかにした。 Recent advancements in decentralized learning, such as Federated Learning (FL), Split Learning (SL), and Split Federated Learning (SplitFed), have expanded the potentials of machine learning. SplitFed aims to minimize the computational burden on individual clients in FL and parallelize SL while maintaining privacy. This study investigates the resilience of SplitFed to packet loss at model split points. It explores various parameter aggregation strategies of SplitFed by examining the impact of splitting the model at different points-either shallow split or deep split-on the final global model performance. The experiments, conducted on a human embryo image segmentation task, reveal a statistically significant advantage of a deeper split point.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# Deep Grokking: ディープニューラルネットワークはより一般化するのか? Deep Grokking: Would Deep Neural Networks Generalize Better? ( http://arxiv.org/abs/2405.19454v1 ) ライセンス: Link先を確認	Simin Fan, Razvan Pascanu, Martin Jaggi,	(参考訳) グラッキング現象に関する最近の研究は、ニューラルネットワークのトレーニング力学と一般化挙動の複雑さを照らしている。グロキング(Grokking)とは、ネットワークがトレーニングセットに完全に適合する拡張オーバーフィッティングフェーズの後に発生する、テストセット上でのネットワークの一般化精度の急激な上昇を指す。既存の研究は主に2層MLPや1層トランスフォーマーのような浅層ネットワークに焦点を当てているが、我々はディープネットワーク(例えば12層MLP)のグラッキングについて検討する。我々はこの現象を実証的に再現し、深層ニューラルネットワークが浅いものよりもグラッキングの影響を受けやすいことを発見した。一方,テスト精度が2次サージを示すMLPモデルの深さを増大させると,浅層モデルにはほとんど見られない,興味深い多段階一般化現象が観測される。さらに,特徴量の減少と,グルーキング時の過度適合から一般化段階への位相遷移の間の説得力のある対応を明らかにする。さらに,多段階一般化現象は特徴ランクの二重発色パターンとよく一致していることがわかった。これらの観測から、内部特徴ランクは重量ノルムと比較してモデルの一般化挙動のより有望な指標となる可能性が示唆された。我々の研究は、ディープニューラルネットワークに潜入し、特徴ランクと一般化性能の関係を調査する最初のものであると信じています。 Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focus on shallow networks such as 2-layer MLP and 1-layer Transformer, we explore grokking on deep networks (e.g. 12-layer MLP). We empirically replicate the phenomenon and find that deep neural networks can be more susceptible to grokking than its shallower counterparts. Meanwhile, we observe an intriguing multi-stage generalization phenomenon when increase the depth of the MLP model where the test accuracy exhibits a secondary surge, which is scarcely seen on shallow models. We further uncover compelling correspondences between the decreasing of feature ranks and the phase transition from overfitting to the generalization stage during grokking. Additionally, we find that the multi-stage generalization phenomenon often aligns with a double-descent pattern in feature ranks. These observations suggest that internal feature rank could serve as a more promising indicator of the model's generalization behavior compared to the weight-norm. We believe our work is the first one to dive into grokking in deep neural networks, and investigate the relationship of feature rank and generalization performance.	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# スタートアップ評価パイプラインの自動化 - スタートアップ成功予測フレームワーク(SSFF) An Automated Startup Evaluation Pipeline: Startup Success Forecasting Framework (SSFF) ( http://arxiv.org/abs/2405.19456v1 ) ライセンス: Link先を確認	Xisen Wang, Yigit Ihlamur,	(参考訳) スタートアップを初期段階で評価することは、専門家による詳細な分析を必要とする複雑なタスクである。このプロセスを大規模に自動化することは、ビジネスに大きな影響を与えるが、固有の複雑さは課題を引き起こす。本稿では、従来の機械学習と高度な言語モデルを組み合わせた新しい自動化システムであるStartup Success Forecasting Framework(SSFF)を導入することで、この課題に対処する。このインテリジェントエージェントベースのアーキテクチャは、分析をエンドツーエンドで実行するベンチャーキャピタリストのように、推論、行動、合成、決定するために設計されています。 SSFFは3つの主な部分で構成されている。 - 予測ブロック: ランダムな森林とニューラルネットワークを使用して予測を行う。外部知識ブロック:外部ソースからリアルタイム情報を取得する。このフレームワークは、創業者とスタートアップの説明に関する最小限の入力データを必要とし、外部リソースからの追加データでそれを強化し、すべて自動化された方法で高精度に詳細な分析を行う。 Evaluating startups in their early stages is a complex task that requires detailed analysis by experts. While automating this process on a large scale can significantly impact businesses, the inherent complexity poses challenges. This paper addresses this challenge by introducing the Startup Success Forecasting Framework (SSFF), a new automated system that combines traditional machine learning with advanced language models. This intelligent agent-based architecture is designed to reason, act, synthesize, and decide like a venture capitalist to perform the analysis end-to-end. The SSFF is made up of three main parts: - Prediction Block: Uses random forests and neural networks to make predictions. - Analyst Block: Simulates VC analysis scenario and uses SOTA prompting techniques - External Knowledge Block: Gathers real-time information from external sources. This framework requires minimal input data about the founder and startup description, enhances it with additional data from external resources, and performs a detailed analysis with high accuracy, all in an automated manner	翻訳日:2024-05-31 19:26:02 公開日:2024-05-29
# MemControl:自動パラメータ選択による医療拡散モデルにおける記憶の緩和 MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection ( http://arxiv.org/abs/2405.19458v1 ) ライセンス: Link先を確認	Raman Dutt, Pedro Sanchez, Ondrej Bohdal, Sotirios A. Tsaftaris, Timothy Hospedales,	(参考訳) 拡散モデルは、トレーニング分布を忠実に反映した画像を生成する際、顕著な能力を示す。しかし、これらのモデルはデータの記憶をトレーニングする傾向があり、特に医用画像のような敏感な分野において、プライバシー、倫理、法的な懸念を生じさせる。我々は、暗記は深層モデルの過度パラメータ化によって引き起こされると仮定し、微調整時のモデルキャパシティの正規化が効果的な緩和戦略である可能性を示唆した。パラメータ効率のよい微調整(PEFT)手法は、特定のパラメータを選択的に更新することでキャパシティ制御に有望なアプローチを提供する。しかし、生成品質と記憶のバランスをとる学習可能なパラメータの最適なサブセットを見つけることは、いまだ解明されていない。この課題に対処するために、記憶と生成品質の指標を報酬として利用することにより、自動パラメータ選択をガイドする二段階最適化フレームワークを提案する。我々のフレームワークは、生成記憶トレードオフを満たすために更新すべき最適パラメータセットをうまく識別する。我々は,医用画像生成の特定のタスクに対する実験を行い,モデルパラメータの0.019%を微調整することで,既存の最先端のトレーニング時間緩和戦略を上回りました。さらに、我々のフレームワークを通じて得られた戦略は、異なるデータセットやドメイン間で転送可能であることを示す。提案するフレームワークは,大規模なデータセットに対してスケーラブルであり,報酬関数の選択に非依存である。最後に、我々のフレームワークと既存のアプローチを組み合わせることで、さらなる記憶の緩和を実現できることを示す。 Diffusion models show a remarkable ability in generating images that closely mirror the training distribution. However, these models are prone to training data memorization, leading to significant privacy, ethical, and legal concerns, particularly in sensitive fields such as medical imaging. We hypothesize that memorization is driven by the overparameterization of deep models, suggesting that regularizing model capacity during fine-tuning could be an effective mitigation strategy. Parameter-efficient fine-tuning (PEFT) methods offer a promising approach to capacity control by selectively updating specific parameters. However, finding the optimal subset of learnable parameters that balances generation quality and memorization remains elusive. To address this challenge, we propose a bi-level optimization framework that guides automated parameter selection by utilizing memorization and generation quality metrics as rewards. Our framework successfully identifies the optimal parameter set to be updated to satisfy the generation-memorization tradeoff. We perform our experiments for the specific task of medical image generation and outperform existing state-of-the-art training-time mitigation strategies by fine-tuning as few as 0.019% of model parameters. Furthermore, we show that the strategies learned through our framework are transferable across different datasets and domains. Our proposed framework is scalable to large datasets and agnostic to the choice of reward functions. Finally, we show that our framework can be combined with existing approaches for further memorization mitigation.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# クラスタリングに基づくドメイン一般化のための検証スプリット Clustering-Based Validation Splits for Domain Generalisation ( http://arxiv.org/abs/2405.19461v1 ) ライセンス: Link先を確認	Andrea Napoli, Paul White,	(参考訳) 本稿では,ドメインシフトによるモデル選択の問題について考察する。この設定では、トレーニングセットと検証セットの間の最大平均誤差(MMD)が、選択されたモデルの一般化可能性を高めることが提案される。この目的を最大化するカーネルk平均クラスタリングに基づくデータ分割アルゴリズムを提案する。このアルゴリズムは線形プログラミングを利用して分割のサイズ、ラベル、(任意に)群分布を制御し、収束を保証する。このテクニックは、ドメイン一般化(DG)と教師なしドメイン適応(UDA)タスクの両方において、さまざまなデータセットとトレーニングアルゴリズムの代替分割戦略を一貫して上回る。分析はまた、トレーニングと検証セットの間のMDDが、テスト領域の精度と強いランク関連(\rho=0.63$)であることを示し、このアプローチの有効性をさらに裏付けている。 This paper considers the problem of model selection under domain shift. In this setting, it is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models. A data splitting algorithm based on kernel k-means clustering, which maximises this objective, is presented. The algorithm leverages linear programming to control the size, label, and (optionally) group distributions of the splits, and comes with convergence guarantees. The technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation (DG) and unsupervised domain adaptation (UDA) tasks. Analysis also shows the MMD between the training and validation sets to be strongly rank-correlated ($\rho=0.63$) with test domain accuracy, further substantiating the validity of this approach.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# クリティカルラーニング期間: 効率的なデータ処理のための早期トレーニングダイナミクスを活用する Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning ( http://arxiv.org/abs/2405.19462v1 ) ライセンス: Link先を確認	Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker,	(参考訳) ニューラルマシン翻訳モデルは、非常にデータと計算能力が高い。しかし、全てのデータポイントがモデルトレーニングと一般化に等しく寄与するわけではない。低値のデータポイントを取り除くためのデータプルーニングは、モデルの性能を大幅に低下させることなく、計算予算を大幅に削減する利点がある。本稿では、初期モデルトレーニングのダイナミクスを活用して、モデル性能の最も関連性の高いデータポイントを識別する新しいデータプルーニング手法であるチェックポイントアクロスタイム(CAT)を提案する。我々は、COMET-QE、LASER、LaBSEなど、いくつかのデータプルーニング技術に対してCATをベンチマークする。 CAT は Indo-European 言語のベンチマークを複数のテストセットで上回ります。英語-ドイツ語、英語-フランス語、英語-スワヒリの翻訳タスクに適用すると、CATはトレーニングデータの最大50%をプルーニングしながら、完全なデータセットを使用するのに匹敵するパフォーマンスを達成する。我々は、CATが選択したデータポイントを検査し、それよりも長い文や、ユニークな単語や稀な単語が好まれる傾向にあることを示す。 Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically reducing the compute budget without significant drop in model performance. In this paper, we propose a new data pruning technique: Checkpoints Across Time (CAT), that leverages early model training dynamics to identify the most relevant data points for model performance. We benchmark CAT against several data pruning techniques including COMET-QE, LASER and LaBSE. We find that CAT outperforms the benchmarks on Indo-European languages on multiple test sets. When applied to English-German, English-French and English-Swahili translation tasks, CAT achieves comparable performance to using the full dataset, while pruning up to 50% of training data. We inspect the data points that CAT selects and find that it tends to favour longer sentences and sentences with unique or rare words.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# ストリームデータを用いた楽器可変回帰の確率的最適化アルゴリズム Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data ( http://arxiv.org/abs/2405.19463v1 ) ライセンス: Link先を確認	Xuxing Chen, Abhishek Roy, Yifan Hu, Krishnakumar Balasubramanian,	(参考訳) 本研究では,条件付き確率最適化問題として問題を見極め,インストゥルメンタル変数回帰のためのアルゴリズムを開発し,解析する。最小二乗変数回帰の文脈では、我々のアルゴリズムは行列逆転やミニバッチを必要とせず、ストリーミングデータを用いて機器変数回帰を行うための完全なオンラインアプローチを提供する。真のモデルが線型であれば、次数 $\mathcal{O}(\log T/T)$ と $\mathcal{O}(1/T^{1-\iota})$ がそれぞれ$\iota>0$ となる。重要なことは、2サンプルのオラクルが利用可能である場合、我々は共同創設者と機器変数の関係を明示的にモデル化し、推定することを避け、この問題をミニマックス最適化問題として再定義することに基づく最近の研究に対するアプローチの利点を実証する。理論的結果を裏付ける数値実験が提供される。 We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$, respectively under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between confounder and the instrumental variables, demonstrating the benefit of the proposed approach over recent works based on reformulating the problem as minimax optimization problems. Numerical experiments are provided to corroborate the theoretical results.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# スマートシティディジタル双生児のための生成AIを活用する:データ、シナリオ、3D都市モデル、都市デザインの自律的生成に関する調査 Leveraging Generative AI for Smart City Digital Twins: A Survey on the Autonomous Generation of Data, Scenarios, 3D City Models, and Urban Designs ( http://arxiv.org/abs/2405.19464v1 ) ライセンス: Link先を確認	Haowen Xu, Femi Omitaomu, Soheil Sabri, Xiao Li, Yongze Song,	(参考訳) 先進的な情報、コミュニケーション、およびコンピューティング技術を統合することで、現代の都市のデジタルトランスフォーメーションは、効率的で持続可能な都市管理のためのデータ駆動型スマートシティアプリケーションの時代を象徴している。それらの効果にもかかわらず、これらのアプリケーションは、異なる都市サブシステムを監視し、特徴付けるために、大量の高次元およびマルチドメインデータに頼り、データ品質と可用性によって制限されたアプリケーション領域における課題を示し、また、都市シナリオの生成や代替案の設計にコストがかかる。ディープラーニングの新たな研究領域として、生成人工知能(AI)モデルは、データとコード生成における独自の価値を実証している。本稿では, 交通・移動管理, エネルギーシステム運用, 建築・インフラ管理, 都市デザインなど, 都市部におけるスマートシティの領域における課題に対処するために, 生成型AI技術と都市デジタルツインの革新的な統合を検討することを目的とする。調査は、一般的な生成AIモデルとその応用分野の導入から始まり、続いて、生成AI技術の自律的能力を活用した既存の都市科学応用の構造化されたレビューで始まった。 (a)都市モニタリングと予測分析を促進するためのデータ拡張 b) 合成データ及びシナリオ生成 (c)自動3D都市モデリング、及び (d)都市デザインと最適化の創出。このレビューに基づいて、スマートシティのより信頼性が高く、スケーラブルで、自動化された管理のために、生成可能なAIモデルを次世代の都市デジタルツインに統合する潜在的な機会と技術的戦略について論じる。 The digital transformation of modern cities by integrating advanced information, communication, and computing technologies has marked the epoch of data-driven smart city applications for efficient and sustainable urban management. Despite their effectiveness, these applications often rely on massive amounts of high-dimensional and multi-domain data for monitoring and characterizing different urban sub-systems, presenting challenges in application areas that are limited by data quality and availability, as well as costly efforts for generating urban scenarios and design alternatives. As an emerging research area in deep learning, Generative Artificial Intelligence (AI) models have demonstrated their unique values in data and code generation. This survey paper aims to explore the innovative integration of generative AI techniques and urban digital twins to address challenges in the realm of smart cities in various urban sectors, such as transportation and mobility management, energy system operations, building and infrastructure management, and urban design. The survey starts with the introduction of popular generative AI models with their application areas, followed by a structured review of the existing urban science applications that leverage the autonomous capability of the generative AI techniques to facilitate (a) data augmentation for promoting urban monitoring and predictive analytics, (b) synthetic data and scenario generation, (c) automated 3D city modeling, and (d) generative urban design and optimization. Based on the review, this survey discusses potential opportunities and technical strategies that integrate generative AI models into the next-generation urban digital twins for more reliable, scalable, and automated management of smart cities.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# RAP: Sparse-and-Correlated Adapterを用いた効率的なテキストビデオ検索 RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter ( http://arxiv.org/abs/2405.19465v1 ) ライセンス: Link先を確認	Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li,	(参考訳) Text-Video Retrieval (TVR)は、関連するビデオコンテンツと自然言語クエリを連携させることを目的としている。現在までに、ほとんどの最先端のTVR手法は、大規模な事前学習された視覚言語モデル(例えばCLIP)に基づいて、画像からビデオへの変換学習を学習している。しかし、TVR用にトレーニング済みのモデルを完全に微調整することは、非常に高価な計算コストを発生させる。そこで本研究では,テキスト・ビデオ検索の高速化を図るため,テキスト・ビデオ検索の手法をRAP (sparse-andcorrelated AdaPter) を用いて提案する。テキスト・ビデオのシナリオに適合するため,RAPには時間的間隔と相関性という2つの欠かせない特徴が備わっている。具体的には,凍結したCLIPバックボーンから画像毎の特徴を改良する低ランク変調モジュールを提案する。さらに、まずトップレスポンシブな視覚パッチを選択し、学習可能な時間とパッチのオフセットによる相関モデリングを強化する非同期な自己認識機構を導入する。 4つのTVRデータセットに対する大規模な実験により、RAPは完全な微調整や他のパラメータ効率の良い微調整方法と比較して、優れた、または同等のパフォーマンスを達成することが示された。 Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning based on large-scale pre-trained visionlanguage models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient text-video Retrieval with a sparse-andcorrelated AdaPter (RAP), i.e., fine-tuning the pre-trained model with a few parameterized layers. To accommodate the text-video scenario, we equip our RAP with two indispensable characteristics: temporal sparsity and correlation. Specifically, we propose a low-rank modulation module to refine the per-image features from the frozen CLIP backbone, which accentuates salient frames within the video features while alleviating temporal redundancy. Besides, we introduce an asynchronous self-attention mechanism that first selects the top responsive visual patches and augments the correlation modeling between them with learnable temporal and patch offsets. Extensive experiments on four TVR datasets demonstrate that RAP achieves superior or comparable performance compared to the fully fine-tuned counterpart and other parameter-efficient fine-tuning methods.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 自己回帰生成による後方サンプリング Posterior Sampling via Autoregressive Generation ( http://arxiv.org/abs/2405.19466v1 ) ライセンス: Link先を確認	Kelly W Zhang, Tiffany, Cai, Hongseok Namkoong, Daniel Russo,	(参考訳) 知的エージェントは不確実性を理解し、それを解決するために積極的に情報を集める必要がある。本稿では,大規模な履歴データから帯域幅アルゴリズムを学習するための新しいフレームワークを提案する。まず、過去のデータを用いて自己回帰モデルを事前訓練し、繰り返しフィードバック/リワードの順序を予測する(例えば、時間とともに異なるユーザに対して表示されるニュース記事に対する応答)。正確な予測を行うために、モデルは、リッチなアクション特徴(例:記事の見出し)と、より多くの報酬が集められるにつれて信念を研ぐ方法(例:各記事が推奨されるようにクリックする)に基づいて、暗黙的に情報事前を学習する。意思決定時には、各アクションに対して想像された報酬の列を自動で(インプット)サンプリングし、最大平均的な報酬でアクションを選択する。ヒューリスティックとは程遠いが、我々のアプローチはトンプソンサンプリング(学習前の学習)の実装であり、注目すべき活発な探索アルゴリズムである。我々は,事前学習の損失がオンライン意思決定性能を直接制御できることを証明し,事前学習された言語モデルのエンドツーエンド微調整を統合してニュース記事の見出しテキストを処理し,パフォーマンスを向上させるニューズレコメンデーションタスクにおいて,我々のフレームワークを実証する。 Real-world decision-making requires grappling with a perpetual lack of data as environments change; intelligent agents must comprehend uncertainty and actively gather information to resolve it. We propose a new framework for learning bandit algorithms from massive historical data, which we demonstrate in a cold-start recommendation problem. First, we use historical data to pretrain an autoregressive model to predict a sequence of repeated feedback/rewards (e.g., responses to news articles shown to different users over time). In learning to make accurate predictions, the model implicitly learns an informed prior based on rich action features (e.g., article headlines) and how to sharpen beliefs as more rewards are gathered (e.g., clicks as each article is recommended). At decision-time, we autoregressively sample (impute) an imagined sequence of rewards for each action, and choose the action with the largest average imputed reward. Far from a heuristic, our approach is an implementation of Thompson sampling (with a learned prior), a prominent active exploration algorithm. We prove our pretraining loss directly controls online decision-making performance, and we demonstrate our framework on a news recommendation task where we integrate end-to-end fine-tuning of a pretrained language model to process news article headline text to improve performance.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 機械学習におけるデータ最小化原理 The Data Minimization Principle in Machine Learning ( http://arxiv.org/abs/2405.19471v1 ) ライセンス: Link先を確認	Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto,	(参考訳) データ最小化の原則は、不正使用、不正アクセス、データ漏洩の可能性を最小化するために収集、処理、保持されたデータの量を減らすことを目的としている。プライバシ・バイ・デザインの原則に則って、データ最小化はさまざまなグローバルなデータ保護規制によって支持されている。しかし、厳密な定式化が欠如しているため、その実践的な実装は依然として課題である。本稿では,このギャップに対処し,その法的定義に基づくデータ最小化のための最適化フレームワークを提案する。その後、データ最小化の実行にいくつかの最適化アルゴリズムを適用し、最小化目標への準拠やユーザのプライバシへの影響の観点から包括的な評価を行う。我々の分析は、データ最小化のプライバシー期待と実際のプライバシー利益のミスマッチを強調し、現実のプライバシーリスクの複数の側面を考慮に入れたアプローチの必要性を強調している。 The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for approaches that account for multiple facets of real-world privacy risks.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 基礎モデルの時代への参加 Participation in the age of foundation models ( http://arxiv.org/abs/2405.19479v1 ) ライセンス: Link先を確認	Harini Suresh, Emily Tseng, Meg Young, Mary L. Gray, Emma Pierson, Karen Levy,	(参考訳) ファンデーションモデルの能力に対する関心と投資の高まりは、そのようなシステムを幅広い公共サービスに影響を与えるように位置づけてきた。これらの機会の他に、これらのシステムが既存の権力不均衡を緩和し、疎外化コミュニティに不均衡な害をもたらすリスクがある。参加型アプローチは、利害関係者に代理店や意思決定権限を貸すことを約束する。しかし、参加型AI/MLにおける既存のアプローチは、一般的にコンテキストに深く根ざしている。私たちの論文はこの質問を尋問する。まず、ファンデーションモデルへの参加を取り入れる既存の試みについて検討する。参加者と規模の間の緊張を強調し、インパクトのあるコミュニティが、普遍的な適用を意図した基盤モデルを有意義に形成することは困難であることを示す。これに対し、我々は、より局所的でアプリケーション指向の、有意義な参加機会を特定する、参加型ファンデーションモデルのための青写真を開発する。基盤」レイヤに加えて、我々のフレームワークでは、ステークホルダーが基盤となるドメインに対して共通の技術基盤、規範、ガバナンスを開発する「下層」層と、影響のあるコミュニティが特定の下流タスクに基盤モデルを使用することを形作る「下層」層を提案しています。中間の「下層」層は、検討すべき潜在的な害の範囲を包含し、議論と介入のためのより具体的な道筋をコミュニティに与えている。同時に、関連するユースケースにまたがって入力をスケールすることで、重複を回避する。臨床医療,金融サービス,ジャーナリズムの3つのケーススタディを通じて,この多層モデルが基盤層にのみ介入するよりも,より有意義な参加機会を生み出すかを説明する。 Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the "foundation" layer, our framework proposes the "subfloor'' layer, in which stakeholders develop shared technical infrastructure, norms and governance for a grounded domain, and the "surface'' layer, in which affected communities shape the use of a foundation model for a specific downstream task. The intermediate "subfloor'' layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# テンソルSVDのための量子アルゴリズム Quantum Algorithms for tensor-SVD ( http://arxiv.org/abs/2405.19485v1 ) ライセンス: Link先を確認	Jezer Jojo, Ankit Khandelwal, M Girish Chandra,	(参考訳) 量子コンピューティングへの応用の有望な領域は線形代数問題である。本研究では,2つの新しい量子t-SVD (tensor-SVD)アルゴリズムを提案する。最初のアルゴリズムは、主にコンテキスト認識レコメンデーションシステムのための量子t-SVDアルゴリズムを提案した以前の研究に基づいている。しかし、新しいアルゴリズムは、元の欠点に対処し、修正しようとしており、既存の作業と根本的に異なるアプローチである。提案する2番目のアルゴリズムは、既知の変分量子SVDアルゴリズムに基づくハイブリッド変分法を用いる。 A promising area of applications for quantum computing is in linear algebra problems. In this work, we introduce two new quantum t-SVD (tensor-SVD) algorithms. The first algorithm is largely based on previous work that proposed a quantum t-SVD algorithm for context-aware recommendation systems. The new algorithm however seeks to address and fix certain drawbacks to the original, and is fundamentally different in its approach compared to the existing work. The second algorithm proposed uses a hybrid variational approach largely based on a known variational quantum SVD algorithm.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 大規模データのためのオンライン非パラメトリック教師付き学習 Online Nonparametric Supervised Learning for Massive Data ( http://arxiv.org/abs/2405.19486v1 ) ライセンス: Link先を確認	Mohamed Chaouch, Omama M. Al-Hamed,	(参考訳) 単純さ、計算コストの低さ、データ要求の面での利点にもかかわらず、線形判別分析、二次判別分析、ロジスティック回帰のようなパラメトリック機械学習アルゴリズムは、線形性、通常課される正規分布と高次元性に対する特徴の整合性、といった深刻な欠点に悩まされている。特徴制約の線形性と正規性を克服するバッチカーネルベースの非パラメトリック分類器は、教師付き分類問題の興味深い代替手段である。しかし、それは『次元の計算』に苦しめられている。この問題は、ビッグデータ時代の爆発的なサンプルサイズによって緩和できるが、大規模データサイズはデータの保存と分類器の計算にいくつかの課題をもたらす。これらの課題により、古典的な非パラメトリック分類器はもはや適用されない。これにより,非パラメトリック分類器の大規模化とストリーミングデータフレームワークのリアルタイム計算に適応した高速なアルゴリズムを開発することができる。このオンライン分類器は2つのステップを含む。まず、計算コストを非常に低く抑えるために、オンラインの原理成分分析について検討する。そして、確率近似アルゴリズムを適用し、非パラメトリック分類器のリアルタイム計算を得る。提案手法は、リアルタイムな胎児の健康モニタリングによく使用される機械学習アルゴリズムと比較して評価・比較する。研究によると、オフライン(またはバッチ)だけでなく、オンライン分類器もランダムな森林アルゴリズムと競合する。さらに、オンライン分類器はオフライン分類器と比較して、最良のトレードオフ精度/計算コストを与えることを示した。 Despite their benefits in terms of simplicity, low computational cost and data requirement, parametric machine learning algorithms, such as linear discriminant analysis, quadratic discriminant analysis or logistic regression, suffer from serious drawbacks including linearity, poor fit of features to the usually imposed normal distribution and high dimensionality. Batch kernel-based nonparametric classifier, which overcomes the linearity and normality of features constraints, represent an interesting alternative for supervised classification problem. However, it suffers from the ``curse of dimension". The problem can be alleviated by the explosive sample size in the era of big data, while large-scale data size presents some challenges in the storage of data and the calculation of the classifier. These challenges make the classical batch nonparametric classifier no longer applicable. This motivates us to develop a fast algorithm adapted to the real-time calculation of the nonparametric classifier in massive as well as streaming data frameworks. This online classifier includes two steps. First, we consider an online principle components analysis to reduce the dimension of the features with a very low computation cost. Then, a stochastic approximation algorithm is deployed to obtain a real-time calculation of the nonparametric classifier. The proposed methods are evaluated and compared to some commonly used machine learning algorithms for real-time fetal well-being monitoring. The study revealed that, in terms of accuracy, the offline (or Batch), as well as, the online classifiers are good competitors to the random forest algorithm. Moreover, we show that the online classifier gives the best trade-off accuracy/computation cost compared to the offline classifier.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 大規模言語モデルに基づく全二重音声対話方式 A Full-duplex Speech Dialogue Scheme Based On Large Language Models ( http://arxiv.org/abs/2405.19487v1 ) ライセンス: Link先を確認	Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia,	(参考訳) 本稿では,シームレスな対話が可能な生成対話システムについて述べる。これは、知覚モジュール、運動関数モジュール、および2つの状態を持つ単純な有限状態マシン(ニューラルFSMと呼ばれる)の概念を認識するために慎重に整列された大きな言語モデル(LLM)に基づいている。知覚機能モジュールと運動機能モジュールは同時に動作し、システムは同時にユーザの声を聴くことができる。 LLMは、問い合わせ応答のためのテキストトークンを生成し、神経FSMに制御トークンを出力することにより、応答、待機、または中断を開始するための自律的な決定を行う。 LLMのこれらのタスクはすべて、リアルタイムに対話のシリアライズされたビュー上で次のトークン予測として実行される。実生活のインタラクションをシミュレーションした自動品質評価では,LLMベースの半二重対話システムと比較して,平均会話応答遅延を3倍以上に削減し,500ミリ秒以内の応答を50%以上と評価した。 LLMをわずか80億のパラメータで実行すると、音声による対話において最も有効な商用LLMよりも8%高い割り込み精度を示す。 We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states. The perception and motor function modules operate simultaneously, allowing the system to simultaneously speak and listen to the user. The LLM generates textual tokens for inquiry responses and makes autonomous decisions to start responding to, wait for, or interrupt the user by emitting control tokens to the neural FSM. All these tasks of the LLM are carried out as next token prediction on a serialized view of the dialogue in real-time. In automatic quality evaluations simulating real-life interaction, the proposed system reduces the average conversation response latency by more than 3 folds compared with LLM-based half-duplex dialogue systems while responding within less than 500 milliseconds in more than 50% of evaluated interactions. Running a LLM with only 8 billion parameters, our system exhibits a 8% higher interruption precision rate than the best available commercial LLM for voice-based dialogue.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# Total Segmentator MRI : MRI画像における59個の解剖学的構造の配列非依存的セグメンテーション TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images ( http://arxiv.org/abs/2405.19492v1 ) ライセンス: Link先を確認	Tugba Akinci D'Antonoli, Lucas K. Berger, Ashraya K. Indrakanti, Nathan Vishwanathan, Jakob Weiß, Matthias Jung, Zeynep Berkarda, Alexander Rau, Marco Reisert, Thomas Küstner, Alexandra Walter, Elmar M. Merkle, Martin Segeroth, Joshy Cyriac, Shan Yang, Jakob Wasserthal,	(参考訳) 目的:MR画像のほとんどの解剖学的構造をMRシーケンスから独立して自動的かつ堅牢に分割できる,オープンソースで使いやすいセグメンテーションモデルを開発すること。材料と方法:本研究では,TotalSegmentatorをMR画像に拡張した。 298個のMRIスキャンと227個のCTスキャンを用いて59個の解剖学的構造(20の臓器,18の骨,11の筋肉,7の血管,3の組織型)を分類した。 MRとCTの画像は、通常の臨床研究からランダムにサンプリングされ、現実世界のデータセット(年齢、病理、スキャナー、身体部分、シーケンス、コントラスト、エコー時間、反復時間、フィールド強度、スライス厚さ、サイト)を表す。我々は,このデータセット上でnnU-Netセグメンテーションアルゴリズムを訓練し,モデルの性能を評価するためにDice類似度係数(Dice)を算出した。結果: Dice スコアは 0.824 (CI: 0.801, 0.842) であった。このモデルは、他の2つの公開セグメンテーションモデル(Dice score, 0.824 vs 0.762; p<0.001 and 0.762 versus 0.542; p<0.001)を大きく上回った。 The CT image test set of the original TotalSegmentator paper, almost match the performance of the original TotalSegmentator (Dice score, 0.960 versus 0.970; p<0.001)。結論:本モデルでは,TotalSegmentatorをMR画像に拡張する。アノテーション付きデータセット(https://zenodo.org/doi/10.5281/zenodo.11367004)とオープンソースツールキット(https://www.github.com/wasserth/TotalSegmentator)が公開されている。 Purpose: To develop an open-source and easy-to-use segmentation model that can automatically and robustly segment most major anatomical structures in MR images independently of the MR sequence. Materials and Methods: In this study we extended the capabilities of TotalSegmentator to MR images. 298 MR scans and 227 CT scans were used to segment 59 anatomical structures (20 organs, 18 bones, 11 muscles, 7 vessels, 3 tissue types) relevant for use cases such as organ volumetry, disease characterization, and surgical planning. The MR and CT images were randomly sampled from routine clinical studies and thus represent a real-world dataset (different ages, pathologies, scanners, body parts, sequences, contrasts, echo times, repetition times, field strengths, slice thicknesses and sites). We trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients (Dice) to evaluate the model's performance. Results: The model showed a Dice score of 0.824 (CI: 0.801, 0.842) on the test set, which included a wide range of clinical data with major pathologies. The model significantly outperformed two other publicly available segmentation models (Dice score, 0.824 versus 0.762; p<0.001 and 0.762 versus 0.542; p<0.001). On the CT image test set of the original TotalSegmentator paper it almost matches the performance of the original TotalSegmentator (Dice score, 0.960 versus 0.970; p<0.001). Conclusion: Our proposed model extends the capabilities of TotalSegmentator to MR images. The annotated dataset (https://zenodo.org/doi/10.5281/zenodo.11367004) and open-source toolkit (https://www.github.com/wasserth/TotalSegmentator) are publicly available.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 後方刺激ブリルアン散乱による光学的絡み合い Optomechanical entanglement induced by backward stimulated Brillouin scattering ( http://arxiv.org/abs/2405.19494v1 ) ライセンス: Link先を確認	P. Djorwé, A. H. Abdel-Aty, K. S. Nisar, S. G. Nana Engo,	(参考訳) 本稿では,ロバストな光学的絡み合いを生成する手法を提案する。このスキームは、光学構造内にホストされる後方刺激ブリルアン散乱(BSBS)プロセスに基づいている。我々のベンチマークシステムは、BSBS(放射圧)プロセスを介して2つの光学モードに結合された音響(機械)モードで構成されている。有効機械結合の適度な値に対して、BSBSは比較的弱い絡み合いを誘導する。この絡み合いは、機械的結合強度が十分に強い場合、少なくとも1桁まで大きく強化される。生成した絡み合いは熱ゆらぎに対して十分に堅牢である。我々の研究は、BSBS効果に基づくエンタングルメント生成の新しいスキームを提供し、マイクロ波やハイブリッド光学構造に拡張することができる。このような生成された絡み合った状態は、量子情報処理、量子センシング、量子コンピューティングに利用できる。 We propose a scheme to generate robust optomechanical entanglement. This scheme is based on a Backward Stimulated Brillouin Scattering (BSBS) process, which is hosted within an optomechanical structure. Our benchmark system consists of an acoustic (mechanical) mode coupled to two optical modes through the BSBS (radiation pressure) process. For a moderate values of the effective mechanical coupling, the BSBS induces a relatively weak entanglement. This entanglement is greatly enhanced, for at least up to one order of magnitude, when the mechanical coupling strength is strong enough. The generated entanglement is robust enough against thermal fluctuation. Our work provides a new scheme for entanglement generation based on BSBS effect, and can be extended to microwaves and hybrid optomechanical structures. Such a generated entangled states can be used for quantum information processing, quantum sensing, and quantum computing.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# Qiskit Code Assistant: 量子コンピューティングコードを生成するためのLLMのトレーニング Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code ( http://arxiv.org/abs/2405.19495v1 ) ライセンス: Link先を確認	Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito,	(参考訳) Code Large Language Models (Code LLMs) は強力なツールとして登場し、コーディングプロセスの自動化とアプリケーション構築に必要な時間と労力の削減によって、ソフトウェア開発の世界に革命をもたらした。本稿では,量子コンピューティングの分野を専門とする Code LLM のトレーニングに焦点をあてる。まず、古典的なプログラミング手法や言語とは大きく異なる量子コンピューティングプログラミングのユニークなニーズについて議論する。量子コンピューティングに特化したコードLLMは、量子コンピューティングと量子情報理論の基本的な理解を必要とする。しかし、利用可能な量子コードサンプルの不足と、継続的なデータセット更新を必要とする急速に進化するフィールドは、重大な課題を呈している。さらに,Qiskitライブラリを用いた高品質な量子コードを生成するために,コードLLMをトレーニングする作業についても論じる。本研究は, 訓練用LLMの様々な側面と訓練条件, および現在のモデルで得られた結果について検討することを含む。我々のモデルを評価するために、我々は、Qiskitを用いた量子コンピューティングプログラミングの分野に特化して設計された一連のテストを含むHumanEvalに似たカスタムベンチマークを開発した。以上の結果から,我々のモデルは量子コンピューティングタスクにおける既存の最先端モデルよりも優れていたことが示唆された。また、コード提案の例を示し、私たちのモデルを他の関連するコードLLMと比較します。最後に、量子コンピューティングの計算科学者、研究者、実践者に対して、Code LLMsの潜在的なメリットについて論じる。この状況に関係のあるさまざまな機能や今後の作業についても検討しています。 Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum computing and quantum information theory. However, the scarcity of available quantum code examples and the rapidly evolving field, which necessitates continuous dataset updates, present significant challenges. Moreover, we discuss our work on training Code LLMs to produce high-quality quantum code using the Qiskit library. This work includes an examination of the various aspects of the LLMs used for training and the specific training conditions, as well as the results obtained with our current models. To evaluate our models, we have developed a custom benchmark, similar to HumanEval, which includes a set of tests specifically designed for the field of quantum computing programming using Qiskit. Our findings indicate that our model outperforms existing state-of-the-art models in quantum computing tasks. We also provide examples of code suggestions, comparing our model to other relevant code LLMs. Finally, we introduce a discussion on the potential benefits of Code LLMs for quantum computing computational scientists, researchers, and practitioners. We also explore various features and future work that could be relevant in this context.	翻訳日:2024-05-31 19:16:17 公開日:2024-05-29
# 未ペアデータを用いた音声ドメイン転送のためのガウス流橋 Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data ( http://arxiv.org/abs/2405.19497v1 ) ライセンス: Link先を確認	Eloi Moliner, Sebastian Braun, Hannes Gamper,	(参考訳) オーディオドメイン転送(Audio domain transfer)とは、元のコンテンツを保持しながら、異なるドメインの特性に合わせて音声信号を変更するプロセスである。本稿では,生成モデルにおけるガウス流橋の可能性について検討する。提案フレームワークは,2つの決定論的確率フローの一連の実装を通じて,音声信号の異なる分布間の伝達問題に対処する。提案フレームワークは,対象領域の特定の側面を定義する連続制御変数を通じて,対象分布特性の操作を容易にする。特に、このアプローチはペアの例をトレーニングに頼ってはいません。音声内容の一貫性を維持する上での課題に対処するため,チャンクをベースとしたデータサンプルとノイズの最適輸送結合を含むトレーニング戦略を推奨する。教師なし手法と確立されたベースラインを比較すると,残響や歪み操作のタスクにおいて,競争性能が向上することがわかった。この研究で得られた興味深い結果は、さらなる探査の可能性を浮き彫りにしている。 Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for this problem. The presented framework addresses the transport problem across different distributions of audio signals through the implementation of a series of two deterministic probability flows. The proposed framework facilitates manipulation of the target distribution properties through a continuous control variable, which defines a certain aspect of the target domain. Notably, this approach does not rely on paired examples for training. To address identified challenges on maintaining the speech content consistent, we recommend a training strategy that incorporates chunk-based minibatch Optimal Transport couplings of data samples and noise. Comparing our unsupervised method with established baselines, we find competitive performance in tasks of reverberation and distortion manipulation. Despite encoutering limitations, the intriguing results obtained in this study underscore potential for further exploration.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 機械心理学:人工汎用知能研究の促進のための非公理推論システムと操作条件の統合 Machine Psychology: Integrating Operant Conditioning with the Non-Axiomatic Reasoning System for Advancing Artificial General Intelligence Research ( http://arxiv.org/abs/2405.19498v1 ) ライセンス: Link先を確認	Robert Johansson,	(参考訳) 本稿では,機械心理学という学際的枠組みを導入し,操作的学習心理学から特定の人工知能モデル,非公理推論システム(NARS)を融合させ,人工知能(AGI)の研究を強化する。この枠組みの中核的な前提は、適応は生物学的および人工知能の両方にとって不可欠であり、操作的な条件付け原理によって理解できるということである。本研究は,OpenNARS for Applications (ONA) を用いた3つの操作的学習タスクを通して,本手法を評価する。単純な識別タスクでは、NARSは迅速な学習を示し、トレーニングとテストの段階で完全な精度を達成した。タスク条件が逆転した際の動作の調整に成功し、NARSの適応性を示した。条件付き識別タスクにおいて、NARSは複雑な学習シナリオを効果的に処理し、条件付きキューに基づいて複雑な仮説を形成し、活用することで高い精度を達成する。これらの知見は適応型AGIシステム構築のフレームワークとしてのオペラントコンディショニングの活用を支援する。 NARSの知識と資源不足の条件下での運用能力は、感覚運動の推論能力と相まって、AGIの堅牢なモデルとして確立されている。 Machine Psychologyフレームワークは、継続的学習やゴール駆動行動といった自然知性の要素を取り入れることで、現実世界のアプリケーションにスケーラブルで柔軟なアプローチを提供する。今後の研究は、強化されたNARSシステム、より高度なタスクを使用して、このフレームワークを多種多様な複雑な課題に適用し、人間レベルのAIの開発をさらに進展させるべきである。 This paper introduces an interdisciplinary framework called Machine Psychology, which merges principles from operant learning psychology with a specific Artificial Intelligence model, the Non-Axiomatic Reasoning System (NARS), to enhance Artificial General Intelligence (AGI) research. The core premise of this framework is that adaptation is crucial to both biological and artificial intelligence and can be understood through operant conditioning principles. The study assesses this approach via three operant learning tasks using OpenNARS for Applications (ONA): simple discrimination, changing contingencies, and conditional discrimination tasks. In the simple discrimination task, NARS demonstrated rapid learning, achieving perfect accuracy during both training and testing phases. The changing contingencies task showcased NARS's adaptability, as it successfully adjusted its behavior when task conditions were reversed. In the conditional discrimination task, NARS handled complex learning scenarios effectively, achieving high accuracy by forming and utilizing intricate hypotheses based on conditional cues. These findings support the application of operant conditioning as a framework for creating adaptive AGI systems. NARS's ability to operate under conditions of insufficient knowledge and resources, coupled with its sensorimotor reasoning capabilities, establishes it as a robust model for AGI. The Machine Psychology framework, by incorporating elements of natural intelligence such as continuous learning and goal-driven behavior, offers a scalable and flexible approach for real-world applications. Future research should investigate using enhanced NARS systems, more advanced tasks, and applying this framework to diverse, complex challenges to further progress the development of human-level AI.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 勝利のためのモメンタム:異種環境における協調的強化学習 Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments ( http://arxiv.org/abs/2405.19499v1 ) ライセンス: Link先を確認	Han Wang, Sihong He, Zhili Zhang, Fei Miao, James Anderson,	(参考訳) 我々は、フェデレート強化学習(FRL)問題を探り、N$エージェントが共通の方針を、軌跡データを共有せずに共同で学習する。これまで、既存のFRL作業は、主に同じまたは‘類似’環境で動作するエージェントに焦点を当ててきた。対照的に、我々の問題設定は、任意に大きな環境不均一性を可能にします。完全に異なる環境における平均性能を最大化する最適ポリシーを得るために,FedSVRPG-MとFedHAPG-Mの2つのアルゴリズムを提案する。既存の結果とは対照的に, 運動量機構を利用するFedSVRPG-MとFedHAPG-Mは, 環境の不均一性に関わらず, 平均性能関数の定常点に正確に収束できることを実証した。さらに、分散還元法やヘッセン近似の利点を取り入れることで、両アルゴリズムは、$\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$のサンプル複雑性を特徴とする最先端の収束結果が得られる。特に,本アルゴリズムはエージェント数に関して線形収束の高速化を享受し,共通ポリシーの発見におけるエージェント間の協調のメリットを強調している。 We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# MDS-ViTNet:視覚変換器による視線追跡の精度予測の改善 MDS-ViTNet: Improving saliency prediction for Eye-Tracking with Vision Transformer ( http://arxiv.org/abs/2405.19501v1 ) ライセンス: Link先を確認	Polezhaev Ignat, Goncharenko Igor, Iurina Natalya,	(参考訳) 本稿では、視覚的サリエンシ予測や視線追跡を改善するため、MDS-ViTNet(Multi Decoder Saliency by Vision Transformer Network)と呼ばれる新しい手法を提案する。このアプローチは、マーケティング、医療、ロボティクス、小売など、さまざまな分野において大きな可能性を秘めている。本稿では、従来のImageNetバックボーンを超えて、Vision Transformerを利用するネットワークアーキテクチャを提案する。フレームワークはエンコーダ-デコーダ構造を採用し、エンコーダはSwinトランスフォーマーを使用して最も重要な機能を効率的に埋め込む。このプロセスにはTransfer Learningメソッドが含まれており、Vision TransformerのレイヤはEncoder Transformerで変換され、CNN Decoderにシームレスに統合される。この手法は、元の入力画像からの情報損失を最小限に抑える。デコーダは2つの異なる注意マップを生成するためにデュアルデコーダを利用するマルチデコーダ技術を採用している。これらの写像はその後、追加のCNNモデルを介して特異出力に結合される。我々のトレーニングモデルMDS-ViTNetは、いくつかのベンチマークで最先端の結果を得る。さらなるコラボレーションを促進するために、コードやモデル、データセットを一般向けに公開するつもりです。 In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction or eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed most important features. This process involves a Transfer Learning method, wherein layers from the Vision Transformer are converted by the Encoder Transformer and seamlessly integrated into a CNN Decoder. This methodology ensures minimal information loss from the original input image. The decoder employs a multi-decoding technique, utilizing dual decoders to generate two distinct attention maps. These maps are subsequently combined into a singular output via an additional CNN model. Our trained model MDS-ViTNet achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets accessible to the public.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 低温における2000サイト光ツイーザアレイにおける単一原子の再配置 Rearrangement of single atoms in a 2000-site optical tweezers array at cryogenic temperatures ( http://arxiv.org/abs/2405.19503v1 ) ライセンス: Link先を確認	Grégoire Pichard, Desiree Lim, Etienne Bloch, Julien Vaneecloo, Lilian Bourachot, Gert-Jan Both, Guillaume Mériaux, Sylvain Dutartre, Richard Hostein, Julien Paris, Bruno Ximenez, Adrien Signoles, Antoine Browaeys, Thierry Lahaye, Davide Dreon,	(参考訳) 我々は6Kの低温環境下で2088箇所までの光学的ツイーザからなる1つのルビジウム原子のトラップについて報告する。我々のアプローチは、真空だが室温の顕微鏡目的と、捕捉された原子の低温環境を確保するために目的が突き出ている窓のない熱シールドとの併用に依拠する。効率的なトラップを行うのに十分な光学パワーを達成するために、我々は2つのレーザーをわずかに異なる波長で組み合わせる。設計の性能と限界について論じる。最後に、フィールドプログラマブルゲートアレイによって制御される移動光学式ツイーザを用いた828原子ターゲットアレイの原子間アレンジメントを実証する。 We report on the trapping of single rubidium atoms in large arrays of optical tweezers comprising up to 2088 sites in a cryogenic environment at 6 K. Our approach relies on the use of microscope objectives that are in-vacuum but at room temperature, in combination with windowless thermal shields into which the objectives are protruding to ensure a cryogenic environment for the trapped atoms. To achieve enough optical power for efficient trapping, we combine two lasers at slightly different wavelengths. We discuss the performance and limitations of our design. Finally, we demonstrate atom-by-atom rearrangement of an 828-atom target array using moving optical tweezers controlled by a field-programmable gate array.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 任意遅延を伴う時間変数ネットワークにおける分散最適化 Decentralized Optimization in Time-Varying Networks with Arbitrary Delays ( http://arxiv.org/abs/2405.19513v1 ) ライセンス: Link先を確認	Tomas Ortega, Hamid Jafarkhani,	(参考訳) 通信遅延によるネットワークの分散最適化問題を考察する。そのようなネットワークの例としては、協調機械学習、センサーネットワーク、マルチエージェントシステムなどがある。通信遅延を模倣するため、ネットワークに仮想非計算ノードを追加し、有向グラフを生成する。このことは、有向グラフ上の分散最適化ソリューションの調査を動機付けている。既存のソリューションでは、ノードはアウト学位を知っていると仮定し、適用性は制限される。この制限を克服するために、DT-GOと呼ばれる新しいゴシップベースのアルゴリズムを導入する。このアルゴリズムは一般的な有向ネットワーク、例えば遅延や限定的な認知能力を持つネットワークに適用できる。我々は凸目標と非凸目標の両方に対して収束率を導出し、このアルゴリズムが集中確率勾配Descentと同じ複雑さのオーダーを達成することを示す。言い換えれば、グラフ位相と遅延の影響は高次項に限られる。さらに、時間変化のネットワークトポロジに対応するために分析を拡張します。理論的知見を裏付ける数値シミュレーションが提案されている。 We consider a decentralized optimization problem for networks affected by communication delays. Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems. To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs. This motivates investigating decentralized optimization solutions on directed graphs. Existing solutions assume nodes know their out-degrees, resulting in limited applicability. To overcome this limitation, we introduce a novel gossip-based algorithm, called DT-GO, that does not need to know the out-degrees. The algorithm is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities. We derive convergence rates for both convex and non-convex objectives, showing that our algorithm achieves the same complexity order as centralized Stochastic Gradient Descent. In other words, the effects of the graph topology and delays are confined to higher-order terms. Additionally, we extend our analysis to accommodate time-varying network topologies. Numerical simulations are provided to support our theoretical findings.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 無線周波数での視覚認識の実現 Enabling Visual Recognition at Radio Frequency ( http://arxiv.org/abs/2405.19516v1 ) ライセンス: Link先を確認	Haowen Lai, Gaoxiang Luo, Yifei Liu, Mingmin Zhao,	(参考訳) 本稿では、光信号に挑戦する条件に対する耐性を提供しつつ、RF分解能をLiDARに近づける新しいRFイメージングシステムであるPanoRadarを紹介する。当社のLiDAR対応3D画像は, 表面正規推定, セマンティックセグメンテーション, オブジェクト検出など, 無線周波数での様々な視覚認識タスクを初めて実現した。 PanoRadarは、回転するシングルチップmmWaveレーダーと、新しい信号処理と機械学習アルゴリズムを組み合わせて、周囲の高解像度な3D画像を生成する。本システムはロボットの動きを正確に推定し,高密度の合成アンテナ網によるコヒーレントイメージングを可能にする。また、高い方位分解能を利用して、学習に基づく手法で高分解能を向上させる。さらに、PanoRadarは2D畳み込みによる3D学習に取り組み、RF信号のユニークな特性のために課題に対処する。以上の結果から,パノラダルの12棟の建物における堅牢な性能が示された。 This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 中距離を超える大気・海洋予測のためのハイブリッド・マシンラーニング/物理モデルの可能性を探る Exploring the Potential of Hybrid Machine-Learning/Physics-Based Modeling for Atmospheric/Oceanic Prediction Beyond the Medium Range ( http://arxiv.org/abs/2405.19518v1 ) ライセンス: Link先を確認	Dhruvit Patel, Troy Arcomano, Brian Hunt, Istvan Szunyogh, Edward Ott,	(参考訳) 本稿では、機械学習(ML)と従来の物理モデルを組み合わせたハイブリッドモデリング手法の可能性について検討する。短距離・中距離気象予測のためのアプローチを検証したArcomano et al(2022年)と、気候モデリングの可能性を調査したArcomano et al(2023年)の作業を拡張した。論文の予測実験に使用するハイブリッドモデルは,低分解能で簡易なパラメータ化大気一般循環モデル(AGCM)SPEEDYに基づいている。 SPEEDYのハイブリッド化された予後変数に加えて、現在のモデルには3つのMLベースの予後変数がある。そのうちの1つは6~hの累積降水であり、もう1つは海面温度であり、もう1つは海深300mの層の熱量である。このモデルには、エルニーニョのサイクルと、季節によって3～7ヶ月の降水を伴う世界的なテレコネクションを予測する能力がある。このモデルはケルビン波やロスビー波、MJOに伴う降水の赤道変動を捉えている。赤道域の降水量の予測は東太平洋で15日、西太平洋で11.5日である。このモデルは空間分解能が低いが、これらのタスクには高解像度で純粋に物理ベースの従来の運用予測モデルに匹敵する予測技術がある。 This paper explores the potential of a hybrid modeling approach that combines machine learning (ML) with conventional physics-based modeling for weather prediction beyond the medium range. It extends the work of Arcomano et al. (2022), which tested the approach for short- and medium-range weather prediction, and the work of Arcomano et al. (2023), which investigated its potential for climate modeling. The hybrid model used for the forecast experiments of the paper is based on the low-resolution, simplified parameterization atmospheric general circulation model (AGCM) SPEEDY. In addition to the hybridized prognostic variables of SPEEDY, the current version of the model has three purely ML-based prognostic variables. One of these is 6~h cumulative precipitation, another is the sea surface temperature, while the third is the heat content of the top 300 m deep layer of the ocean. The model has skill in predicting the El Ni\~no cycle and its global teleconnections with precipitation for 3-7 months depending on the season. The model captures equatorial variability of the precipitation associated with Kelvin and Rossby waves and MJO. Predictions of the precipitation in the equatorial region have skill for 15 days in the East Pacific and 11.5 days in the West Pacific. Though the model has low spatial resolution, for these tasks it has prediction skill comparable to what has been published for high-resolution, purely physics-based, conventional operational forecast models.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 低リソース医療質問応答のための2層検索拡張フレームワーク:Redditデータを用いた概念実証 Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data ( http://arxiv.org/abs/2405.19519v1 ) ライセンス: Link先を確認	Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker,	(参考訳) Retrieval augmented generation(RAG)は、生成モデル出力を制限し、関連するインコンテキストテキストを提供することで幻覚の可能性を軽減する能力を提供する。生成的大言語モデル(LLM)は、文脈が有限であるため、トークンの数を組み込むことができるため、答えを生成するための知識の量を制限することができる。本稿では,クエリに着目した回答生成のための2層RAGフレームワークを提案する。評価は,資源制約設定における2層フレームワークの有効性を示し,研究者がユーザからリアルタイムに近いデータを得ることを可能にする。 Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for query-focused answer generation and evaluate a proof-of-concept for this framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. The evaluations demonstrate the effectiveness of the two-layer framework in resource constrained settings to enable researchers in obtaining near real-time data from users.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 困難を伴うクラウドソーシング:不均一項目に対するベイズ評価モデル Crowdsourcing with Difficulty: A Bayesian Rating Model for Heterogeneous Items ( http://arxiv.org/abs/2405.19521v1 ) ライセンス: Link先を確認	Seong Woo Han, Ozan Adıgüzel, Bob Carpenter,	(参考訳) 応用統計学と機械学習では、訓練に使用される「金の標準」はしばしば偏りがあり、ほとんど常にうるさい。 DawidとSkeneの人気の高いクラウドソーシングモデルは、レーダ(コーダ、アノテータ)の感度と特異性を調整するが、トレーニングのために収集されたレーティングデータの分布特性を捉えることができず、それがトレーニングのバイアスとなる。本研究では,難易度,識別性,推測可能性に項目レベルの影響を加えることで,コンセンサスカテゴリを推測できる汎用的な測定エラーモデルを提案する。さらに、これらのモデルのバイモーダル後部を制限し、(必要であれば許容)敵のレーダを避ける方法を示す。このモデルが後方予測チェックに適合するかどうかを検証し, ベイジアンによる$\chi^2$検定の類似性を検証した。 Dawid と Skene のモデルは適合試験の良さによって無視されるが、アイテムの不均一性を調整する新しいモデルは拒否されない。我々は,2つのよく研究されたデータセット,歯科用X線撮影におけるケーリーのバイナリ・レーティング・データ,および自然言語による含意について述べる。 In applied statistics and machine learning, the "gold standards" used for training are often biased and almost always noisy. Dawid and Skene's justifiably popular crowdsourcing model adjusts for rater (coder, annotator) sensitivity and specificity, but fails to capture distributional properties of rating data gathered for training, which in turn biases training. In this study, we introduce a general purpose measurement-error model with which we can infer consensus categories by adding item-level effects for difficulty, discriminativeness, and guessability. We further show how to constrain the bimodal posterior of these models to avoid (or if necessary, allow) adversarial raters. We validate our model's goodness of fit with posterior predictive checks, the Bayesian analogue of $\chi^2$ tests. Dawid and Skene's model is rejected by goodness of fit tests, whereas our new model, which adjusts for item heterogeneity, is not rejected. We illustrate our new model with two well-studied data sets, binary rating data for caries in dental X-rays and implication in natural language.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# 人工知能インデックスレポート2024 Artificial Intelligence Index Report 2024 ( http://arxiv.org/abs/2405.19522v1 ) ライセンス: Link先を確認	Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Jack Clark,	(参考訳) 2024指数は今まででもっとも包括的な指標であり、AIが社会に影響を及ぼした影響がより顕著になることはなかった重要な瞬間に到達した。今年は、AIの技術的進歩、技術に対する大衆の認識、開発を取り巻く地政学のダイナミクスなど、重要なトレンドをより広範囲にカバーする範囲を広げました。このエディションでは、AIトレーニングコストに関する新たな見積、責任あるAIの状況に関する詳細な分析、AIが科学と医学に与える影響に関するまったく新しい章が導入されている。 AIインデックスは、人工知能(AI)に関連するデータを追跡、照合、蒸留、可視化する。私たちのミッションは、政策立案者、研究者、幹部、ジャーナリスト、一般大衆に対して、AIの複雑な分野に関するより徹底的で厳密な理解を深めるために、偏見のない、厳格に審査された、広範囲にソースされたデータを提供することです。 AIインデックスは、人工知能に関するデータと洞察の最も信頼性が高く権威のある情報源の1つとして、世界的に認識されている。ニューヨーク・タイムズ、ブルームバーグ、ガーディアンなどの主要新聞では、何百もの学術的引用を集めており、米国、英国、欧州連合の高レベルの政策立案者によって参照されている。今年のエディションは、サイズ、スケール、スコープのすべての旧版を上回り、AIが私たちの人生で持つ重要性が増していることを反映している。 The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI's impact on science and medicine. The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights on artificial intelligence. Previous editions have been cited in major newspapers, including the The New York Times, Bloomberg, and The Guardian, have amassed hundreds of academic citations, and been referenced by high-level policymakers in the United States, the United Kingdom, and the European Union, among other places. This year's edition surpasses all previous ones in size, scale, and scope, reflecting the growing significance that AI is coming to hold in all of our lives.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# ポイント・プロセス・ラーニングとTaccs-Fiksel 推定の特殊な場合の比較 Comparison of Point Process Learning and its special case Takacs-Fiksel estimation ( http://arxiv.org/abs/2405.19523v1 ) ライセンス: Link先を確認	Julia Jansson, Ottmar Cronie,	(参考訳) 最近、Cronie et al (2024)はポイントプロセスのクロスバリデーションの概念と、ポイントプロセス学習(PPL)と呼ばれる新しい統計方法論を導入した。 PPLでは、ポイントプロセス/パターンをトレーニングと検証セットに分割し、パラメトリドのパパンガルー条件強度によって後者を前者から予測する。モデルパラメータは点過程予測誤差を最小化することで推定され、この概念はPPLの2番目のビルディングブロックとして導入された。 PPLは、Gibsハードコアプロセスのカーネル強度推定とパラメータ推定の両方において、最先端技術よりも優れていることを示した。後者の場合、最先端技術は擬似的類似度推定によって表される。本稿では,PPLとTaccs-Fiksel推定の関係について検討する。本稿では, 特定の損失関数を持つPLPが, クロスバリデーション体制を離脱する傾向にある場合, 特定の損失関数を持つPLPをTakacs-Fiksel推定に漸近的に還元するという意味では, PPLの特別な場合であることを示す。さらに、PPLは重み関数によって与えられるある種のハイパーパラメータを伴い、予測誤差が期待値ゼロであることを保証する。重み関数は一般ギブスモデルに対して明示的だが難解な形式をとることを示す。そこで本研究では,実際の重量関数を推定するための異なる手法を提案する。一般のPPLセットアップが特殊ケースであるTakacs-Fiksel推定と比較してどのように動作するかを評価するため、一般的なGibsモデルでは損失関数やハイパーパラメータが得られ、PPLは平均二乗誤差でTakacs-Fiksel推定を著しく上回る。ここで、ハイパーパラメータは、クロスバリデーションパラメータと重み関数の推定値である。 Recently, Cronie et al. (2024) introduced the notion of cross-validation for point processes and a new statistical methodology called Point Process Learning (PPL). In PPL one splits a point process/pattern into a training and a validation set, and then predicts the latter from the former through a parametrised Papangelou conditional intensity. The model parameters are estimated by minimizing a point process prediction error; this notion was introduced as the second building block of PPL. It was shown that PPL outperforms the state-of-the-art in both kernel intensity estimation and estimation of the parameters of the Gibbs hard-core process. In the latter case, the state-of-the-art was represented by pseudolikelihood estimation. In this paper we study PPL in relation to Takacs-Fiksel estimation, of which pseudolikelihood is a special case. We show that Takacs-Fiksel estimation is a special case of PPL in the sense that PPL with a specific loss function asymptotically reduces to Takacs-Fiksel estimation if we let the cross-validation regime tend to leave-one-out cross-validation. Moreover, PPL involves a certain type of hyperparameter given by a weight function which ensures that the prediction errors have expectation zero if and only if we have the correct parametrisation. We show that the weight function takes an explicit but intractable form for general Gibbs models. Consequently, we propose different approaches to estimate the weight function in practice. In order to assess how the general PPL setup performs in relation to its special case Takacs-Fiksel estimation, we conduct a simulation study where we find that for common Gibbs models we can find loss functions and hyperparameters so that PPL typically outperforms Takacs-Fiksel estimation significantly in terms of mean square error. Here, the hyperparameters are the cross-validation parameters and the weight function estimate.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# AIリスクマネジメントは安全とセキュリティの両方を取り入れるべきである AI Risk Management Should Incorporate Both Safety and Security ( http://arxiv.org/abs/2405.19524v1 ) ライセンス: Link先を確認	Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal,	(参考訳) 安全に整合した言語モデルにおけるセキュリティ脆弱性の暴露、例えば、敵攻撃に対する感受性は、AIの安全性とAIのセキュリティの間の複雑な相互作用に光を当てている。現在、2つの規律はAIリスク管理という大まかな目標の下にまとめられているが、それらは歴史的に別々に進化し、異なる視点を生み出している。そこで,本稿では,AIリスクマネジメントの利害関係者が,安全と安全の間のニュアンス,シナジー,相互作用を意識し,主に効果的で全体論的リスク軽減アプローチを考案するために,両分野の視点を明確かつ考慮しなくてはならないことを主張する。残念なことに、このビジョンは「安全」と「安全」の基本的な概念の定義が矛盾し、コミュニティ全体でのコンセンサスが欠如しているため、しばしば難解である。 AIのリスク管理はますます学際的になってきており、この問題は特に健全だ。この概念的課題を踏まえ、我々は、コミュニティ間の共通理解と効果的なコラボレーションを促進することを目的として、AIの安全性とAIのセキュリティの違いと相互作用を明らかにする統一された参照フレームワークを導入する。 The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise mostly effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# ビデオオブジェクトセグメンテーションにおける領域一般化のためのサブネットの動的成長木を用いた生涯学習 Lifelong Learning Using a Dynamically Growing Tree of Sub-networks for Domain Generalization in Video Object Segmentation ( http://arxiv.org/abs/2405.19525v1 ) ライセンス: Link先を確認	Islam Osman, Mohamed S. Shehata,	(参考訳) 現在の最先端のビデオオブジェクトセグメンテーションモデルは、大量のラベル付きトレーニングデータセットを用いた教師あり学習を用いて大きな成功を収めている。しかし、これらのモデルは単一のソースドメインを使用してトレーニングされ、同じソースドメインからサンプルされたビデオを使用して評価される。これらのモデルが異なる対象領域からサンプリングされたビデオを用いて評価されると、それらの性能はドメインの一般化が貧弱なため著しく低下する。本稿では,マルチドメインソースから効果的に学習するサブネットワーク(DGT)の動的成長木を提案する。 DGTは、学習済みのドメインを忘れることなく、モデルが新しいドメインから継続的に効果的に学習することを可能にする、新しい生涯学習技術を使用している。したがって、モデルはドメイン外のビデオに一般化することができる。提案手法は,シングルソース・イン・ドメイン(従来のビデオ・オブジェクト・セグメンテーション),マルチソース・イン・ドメイン,マルチソース・アウト・オブ・ドメイン・ビデオ・オブジェクト・セグメンテーションを用いて評価する。 DGTの結果は、DAVIS16データセットとDAVIS17データセットでそれぞれ0.2%と3.5%という、単一ソースのドメイン内パフォーマンス向上を示している。しかし、DGTをドメイン内マルチソースを用いて評価すると、この結果は最先端のビデオオブジェクトセグメンテーションや他の生涯学習技術と比較して優れた性能を示し、Fスコアの平均的なパフォーマンスは6.9%、破滅的最小化は6.9%向上した。最後に、ドメイン外実験では、DGTのパフォーマンスは、それぞれ1ショットと5ショットの最先端よりも2.7%、4%向上している。 Current state-of-the-art video object segmentation models have achieved great success using supervised learning with massive labeled training datasets. However, these models are trained using a single source domain and evaluated using videos sampled from the same source domain. When these models are evaluated using videos sampled from a different target domain, their performance degrades significantly due to poor domain generalization, i.e., their inability to learn from multi-domain sources simultaneously using traditional supervised learning. In this paper, We propose a dynamically growing tree of sub-networks (DGT) to learn effectively from multi-domain sources. DGT uses a novel lifelong learning technique that allows the model to continuously and effectively learn from new domains without forgetting the previously learned domains. Hence, the model can generalize to out-of-domain videos. The proposed work is evaluated using single-source in-domain (traditional video object segmentation), multi-source in-domain, and multi-source out-of-domain video object segmentation. The results of DGT show a single source in-domain performance gain of 0.2% and 3.5% on the DAVIS16 and DAVIS17 datasets, respectively. However, when DGT is evaluated using in-domain multi-sources, the results show superior performance compared to state-of-the-art video object segmentation and other lifelong learning techniques with an average performance increase in the F-score of 6.9% with minimal catastrophic forgetting. Finally, in the out-of-domain experiment, the performance of DGT is 2.7% and 4% better than state-of-the-art in 1 and 5-shots, respectively.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# モーションプリミティブを用いたリアルタイムロボット支援ハンドオブジェクトインタラクション Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives ( http://arxiv.org/abs/2405.19531v1 ) ライセンス: Link先を確認	Mingqi Yuan, Huijiang Wang, Kai-Fung Chu, Fumiya Iida, Bo Li, Wenjun Zeng,	(参考訳) 人工知能(AI)の進歩は、人間-ロボットインタラクション(HRI)技術の進化を促している。しかし、特に人間との物理的接触を必要とするタスクにおいて、シームレスな相互作用を達成する上で重要な課題が残っている。これらの課題は、人間の行動の正確なリアルタイム認識、ロボットの適応制御アルゴリズム、人間とロボットの動きの効果的な調整の必要性から生じる。本稿では,動的ロボット支援ハンドオブジェクトインタラクション(HOI)に着目した物理HRIの強化手法を提案する。提案手法は,ロボットとロボットの協調を支援するために,ポーズ推定,適応ロボット制御,モーションプリミティブを統合した。具体的には,動作プリミティブモデル(MPM)が人間の手の動きをロボット動作に変換するように設計された,単一のRGB画像から手の動きをリアルタイムに3Dモデリングするトランスフォーマーベースのアルゴリズムを用いる。ロボットのアクション実装は、継続的に更新された3Dハンドモデルを使用して動的に微調整される。リングウェアリングタスクを含む実験的な検証は、リアルタイムな動作に適応し、正確なタスク実行を支援するシステムの有効性を実証する。 Advances in artificial intelligence (AI) have been propelling the evolution of human-robot interaction (HRI) technologies. However, significant challenges remain in achieving seamless interactions, particularly in tasks requiring physical contact with humans. These challenges arise from the need for accurate real-time perception of human actions, adaptive control algorithms for robots, and the effective coordination between human and robotic movements. In this paper, we propose an approach to enhancing physical HRI with a focus on dynamic robot-assisted hand-object interaction (HOI). Our methodology integrates hand pose estimation, adaptive robot control, and motion primitives to facilitate human-robot collaboration. Specifically, we employ a transformer-based algorithm to perform real-time 3D modeling of human hands from single RGB images, based on which a motion primitives model (MPM) is designed to translate human hand motions into robotic actions. The robot's action implementation is dynamically fine-tuned using the continuously updated 3D hand models. Experimental validations, including a ring-wearing task, demonstrate the system's effectiveness in adapting to real-time movements and assisting in precise task executions.	翻訳日:2024-05-31 19:06:28 公開日:2024-05-29
# マルチMarginal Matching Gapによる複数表現の対比 Contrasting Multiple Representations with the Multi-Marginal Matching Gap ( http://arxiv.org/abs/2405.19532v1 ) ライセンス: Link先を確認	Zoe Piran, Michal Klein, James Thornton, Marco Cuturi,	(参考訳) 複数の(k\geq 3$)ビューやモダリティを通して見ることができる複雑なオブジェクトの有意義な表現を学習することは、機械学習における中核的なタスクである。既存のメソッドでは、元々はペアビュー用に意図された損失を使用し、$\tfrac12k(k-1)$ロスペアをインスタンス化するか、あるいは、‘textit{one vs. average-of-rest}戦略に従って、最小の埋め込みを使用することで、$k$ビューに拡張している。我々はM3G(Multi-marginal matching gap)を提案し、M3G(Multi-marginal optimal transport)理論からツールを借りてすべての$k$ビューを同時に組み込む。それぞれのビューが$k$-tuplesに変換され、その後$k$-tuplesに変換されると、損失は、$n$ ground-truth $k$-tuplesとMM-OTポリマッチングコストをマッチングするコストとを比較します。 MM-OT問題の指数関数的複雑性$O(n^k$)は恐ろしいように思えるが、その問題に対するシンクホーンアルゴリズムの適切な一般化は、例えば、$k=3\sim 6$ view に6.4~\sim128$ のミニバッチを用いてスケールできることを実験で示す。本実験は、自己教師型タスクとマルチモーダル型タスクの両方において、ペアワイズ損失のマルチビュー拡張よりも性能が向上したことを示す。 Learning meaningful representations of complex objects that can be seen through multiple ($k\geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12k(k-1)$ loss-pairs, or by using reduced embeddings, following a \textit{one vs. average-of-rest} strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n\times k$ vectors. While the exponential complexity $O(n^k$) of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k=3\sim 6$ views using mini-batches of size $64~\sim128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 選好学習アルゴリズムは選好ランキングを学習しない Preference Learning Algorithms Do Not Learn Preference Rankings ( http://arxiv.org/abs/2405.19534v1 ) ライセンス: Link先を確認	Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho,	(参考訳) 優先学習アルゴリズム(例えば、RLHFやDPO)は、LLMを操り、人間に好まれる世代を生成するために頻繁に使われていますが、その内部動作に対する私たちの理解は限定的です。そこで本研究では,選好学習モデルを用いて,好ましくない出力よりも好ましくない出力により高い確率を割り当てる従来の知恵を,$\textit{ ranking accuracy}$で測定した。驚いたことに、ほとんどの最先端の選好調整モデルでは、一般的な選好データセットでは60%未満のランキング精度が得られる。さらに、DPO や RLHF の目的を完璧に最適化すれば、優先順位調整 LLM が達成できるという $\textit{idealized ranking accuracy}$ を導出する。我々は既存のモデルが有意な$\textit{alignment gap}$ -- $\textit{i.e.}$を示すことを示した。提案手法は,参照モデルにおける微妙なランク付け誤りの修正に経験的かつ理論的に不適なDPO目的に起因し,与えられた選好データポイントの学習の難しさを定量化するための単純かつ効率的な公式を導出する。最後に、評価精度は、モデルが目的の基準モデルに近い場合に、経験的に人気の高い利率指標と強く相関し、オン・ポリティ(例えば、RLHF)とオフ・ポリティ(例えば、DPO)の選好学習アルゴリズムの違いにさらに光を当てることを示した。 Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via $\textit{ranking accuracy}$. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the $\textit{idealized ranking accuracy}$ that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant $\textit{alignment gap}$ -- $\textit{i.e.}$, a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen correlations for teleporting collective spin state in a two-dimensional trapion crystal Generating Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen correlations for teleporting collective spin states in a two dimensional trapped ion crystal ( http://arxiv.org/abs/2405.19536v1 ) ライセンス: Link先を確認	Muhammad Miskeen Khan, Edwin Chaparro, Bhuvanesh Sundar, Allison Carter, John Bollinger, Klaus Molmer, Ana Maria Rey,	(参考訳) 我々は、エンジニアEinstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen(EPR)相関に対する絡み合い資源としてのフォノン$\unicode{x2013}$介在相互作用の利用を提案し、2$\unicode{x2013}$次元イオン結晶における集合スピン状態のテレポーテーションを行う。我々は、異なる核スピン度に対応するサブシステム間の連続可変量子テレポーテーションプロトコルをエミュレートする。それぞれにおいて、量子状態は電子スピンの度合いで符号化され、結晶の振動モードと結合する。スピンコヒーレント状態とその相変位変種、絡み合ったスピンスクイーズ状態、およびディック状態の高忠実テレポーテーションは、数十イオンから数百イオンの配列における現実的な実験条件に対して可能であることを示す。 We propose the use of phonon$\unicode{x2013}$mediated interactions as an entanglement resource to engineer Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen (EPR) correlations and to perform teleportation of collective spin states in two$\unicode{x2013}$dimensional ion crystals. We emulate continuous variable quantum teleportation protocols between subsystems corresponding to different nuclear spin degrees of freedom. In each of them, a quantum state is encoded in an electronic spin degree of freedom that couples to the vibrational modes of the crystal. We show that high fidelity teleportation of spin-coherent states and their phase-displaced variant, entangled spin-squeezed states, and Dicke states, is possible for realistic experimental conditions in arrays from a few tens to a few hundred ions.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# パラメータ化量子回路の最適複雑性 Optimal complexity of parameterized quantum circuits ( http://arxiv.org/abs/2405.19537v1 ) ライセンス: Link先を確認	Guilherme Ilário Correr, Pedro C. Azado, Diogo O. Soares-Pinto, Gabriel Carlo,	(参考訳) パラメータ化量子回路は、NISQ時代の領域における量子変分アルゴリズムの発展に重要な役割を果たしている。さまざまなタスクを実行する実際の能力を知ることが,その上で最も重要なのです。それらを普遍ランダム回路の原型クラスと比較することにより、ハール測度によって定義される漸近的複雑性へのアプローチはより速く、それに到達するためのゲートが少なくなることが判明した。このためにトポロジーが重要視されている。メジャー化基準は、表現可能性と平均的絡み合いを補完するツールとして証明されている。 Parameterized quantum circuits play a key role for the development of quantum variational algorithms in the realm of the NISQ era. Knowing their actual capability of performing different kinds of tasks is then of the utmost importance. By comparing them with a prototypical class of universal random circuits we have found that their approach to the asymptotic complexity defined by the Haar measure is faster, needing less gates to reach it. Topology has been revealed crucial for this. The majorization criterion has proven as a relevant complementary tool to the expressibility and the mean entanglement.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# CheXpert Plus:何百もの放射線学のテキスト、画像、患者 CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients ( http://arxiv.org/abs/2405.19538v1 ) ライセンス: Link先を確認	Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz,	(参考訳) 5年前にCheXpertの最初の論文がリリースされて以来、CheXpertは最も広く使われ、引用された臨床AIデータセットの1つになった。ビジョン言語モデルの出現は、CheXpertイメージに関連するレポートの共有要求の高まりを招き、人口統計データを取得することへのAIフェアネス研究者の関心が高まった。これを解決するため、CheXpert Plusは、放射線学の分野におけるその後のすべての機械学習タスクに対するモデルのスケーリング、パフォーマンス、堅牢性、公平性を高めるために公開された、新しい放射線学データソースのコレクションとして機能する。 CheXpert Plusは、放射線学で公開された最大のテキストデータセットで、合計で3600万のテキストトークンがあり、1300万のインプレッショントークンが含まれている。私たちの知る限りでは、これは放射線学における最大のテキスト識別の取り組みであり、ほぼ100万PHIが匿名化されている。大規模な英語ペアデータセットが放射線学でリリースされたのは2回目であり、これにより初めて大規模なクロスインスティテュートトレーニングが可能になる。全てのレポートは、DICOMフォーマットの高品質な画像と組み合わせられ、様々な臨床および社会経済的グループを含む多数の画像と患者のメタデータ、および多くの病理ラベルとRadGraphアノテーションが提供される。このデータセットは、放射線科医のさらなる支援と医療改善に役立つAIモデルの研究を促進することを願っている。 https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 モデルは以下のURLで利用可能である。 Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 大規模配電系統における低エントロピー結合の計算 Computing Low-Entropy Couplings for Large-Support Distributions ( http://arxiv.org/abs/2405.19540v1 ) ライセンス: Link先を確認	Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Foerster, J. Zico Kolter,	(参考訳) 最小エントロピー結合(MEC、Minimum-Entropy coupling)とは、最小エントロピーを持つ最小エントロピーを求める過程であり、因果関係やステガノグラフィーなどの分野で応用されている。しかし、既存のアルゴリズムは、大容量の分布に対して計算的に抽出可能であるか、特定の分布タイプに制限されているか、ハイパーパラメータの選択に敏感である。この研究は、従来の反復MEC(IMEC)アプローチを一般化されたパーティションベースの形式主義に統一することで、これらの制限に対処する。この枠組みから任意の離散分布を処理できる新しいIMECアルゴリズムであるARIMECを導出し、最適化されたハイパーパラメータ設定に対してIMECを堅牢にする方法を提案する。これらの革新はIMECの言語モデルを用いた高スループットステガノグラフィーへの応用を促進する。私たちのコードベースはhttps://github.com/ssokota/mec で公開されています。 Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec .	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# Aモード超音波信号の動的復号化による解剖学的領域認識とリアルタイム骨追跡法 Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals ( http://arxiv.org/abs/2405.19542v1 ) ライセンス: Link先を確認	Bangyu Lan, Stefano Stramigioli, Kenan Niu,	(参考訳) 整形外科と補綴ロボットの運動解析には正確な骨追跡が不可欠である。従来の方法(例えば皮膚マーカー)は軟部組織のアーティファクトであり、手術で使用される骨のピンは、追加の外傷や感染のリスクをもたらす。エレクトロミオグラフィー(EMG)では、関節角度を直接測定できないため、運動学的推定のための複雑なアルゴリズムが必要である。これらの問題に対処するため、Aモード超音波による追跡は非侵襲的で安全な代替手段として提案されている。しかし、この手法は、受信した超音波信号を処理する際にピーク検出の精度が限られている。本稿では,Aモード超音波信号を用いた解剖学的領域認識と骨追跡のための深層学習手法を提案する。このアルゴリズムは、同時に骨追跡を行い、Aモード超音波トランスデューサが置かれた解剖学的領域を特定することができる。これは、カスケードされたU-Netのすべてのエンコーディング層とデコード層の間の完全な接続を含み、骨のピークを持つ可能性が高い信号領域のみに焦点を合わせ、ピークの正確な位置を特定し、信号の解剖学的領域を分類する。実験では, 関節周囲の解剖学的領域に対する動的追跡条件下で, 解剖学的領域の分類において97%の精度, 約0.5$\pm$1mmの精度を示した。一般に, 超音波が付加機能として付加された解剖学的領域の精度と認識において, 従来の手法を超える大きな可能性を示す。 Accurate bone tracking is crucial for kinematic analysis in orthopedic surgery and prosthetic robotics. Traditional methods (e.g., skin markers) are subject to soft tissue artifacts, and the bone pins used in surgery introduce the risk of additional trauma and infection. For electromyography (EMG), its inability to directly measure joint angles requires complex algorithms for kinematic estimation. To address these issues, A-mode ultrasound-based tracking has been proposed as a non-invasive and safe alternative. However, this approach suffers from limited accuracy in peak detection when processing received ultrasound signals. To build a precise and real-time bone tracking approach, this paper introduces a deep learning-based method for anatomical region recognition and bone tracking using A-mode ultrasound signals, specifically focused on the knee joint. The algorithm is capable of simultaneously performing bone tracking and identifying the anatomical region where the A-mode ultrasound transducer is placed. It contains the fully connection between all encoding and decoding layers of the cascaded U-Nets to focus only on the signal region that is most likely to have the bone peak, thus pinpointing the exact location of the peak and classifying the anatomical region of the signal. The experiment showed a 97% accuracy in the classification of the anatomical regions and a precision of around 0.5$\pm$1mm under dynamic tracking conditions for various anatomical areas surrounding the knee joint. In general, this approach shows great potential beyond the traditional method, in terms of the accuracy achieved and the recognition of the anatomical region where the ultrasound has been attached as an additional functionality.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 最適双対化による大規模言語モデルのワンショット安全アライメント One-Shot Safety Alignment for Large Language Models via Optimal Dualization ( http://arxiv.org/abs/2405.19544v1 ) ライセンス: Link先を確認	Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding,	(参考訳) LLM(Large Language Models, 大規模言語モデル)を取り巻く安全性の懸念が高まり、その利便性と安全性を同時に向上するために、様々な人間の好みに合わせる必要がある。有望なアプローチは、RLHF(Reinforcement Learning from Human Feedback)を通じて安全性の制約を実施することである。このような制約付きRLHFでは、一般的なラグランジアンベースの原始双対ポリシー最適化手法は計算コストが高く、しばしば不安定である。本稿では,制約付きアライメントを等価な非制約アライメント問題に還元する双対化の観点を提案する。我々は、閉形式を持つ滑らかで凸な双対函数を事前に最適化する。このショートカットは、煩雑な原始二重ポリシー反復の必要性を排除し、計算負担を大幅に低減し、訓練安定性を向上させる。我々の戦略はモデルベースと嗜好ベースのシナリオ(それぞれMoCANとPeCAN)の2つの実践的アルゴリズムに導かれる。幅広い実験により,本手法の有効性が示された。 The growing safety concerns surrounding Large Language Models (LLMs) raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, common Lagrangian-based primal-dual policy optimization methods are computationally expensive and often unstable. This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. We do so by pre-optimizing a smooth and convex dual function that has a closed form. This shortcut eliminates the need for cumbersome primal-dual policy iterations, thus greatly reducing the computational burden and improving training stability. Our strategy leads to two practical algorithms in model-based and preference-based scenarios (MoCAN and PeCAN, respectively). A broad range of experiments demonstrate the effectiveness of our methods.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# マルチモーダルコントラスト学習のためのCLIPLosとノルムに基づくデータ選択法 CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning ( http://arxiv.org/abs/2405.19547v1 ) ライセンス: Link先を確認	Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du,	(参考訳) データ選択は、大規模なビジュアル言語モデル(例えば、CLIP)、特に騒がしいWebキュレートデータセットにおいて、中心的な問題として現れている。 3つの主要なデータ選択アプローチは、(1)外部の非CLIPモデルを活用してデータ選択を支援すること、(2)元々のOpenAI CLIPモデルよりも高品質なデータを選択するのに効果的であるCLIPスタイルの埋め込みモデルをトレーニングすること、(3)特定のモデルプロパティを必要とせずにCLIP埋め込みに適用可能なより良いメトリクスや戦略を設計すること(例えば、CLIPScoreは人気のあるメトリックである)である。最初の2つのアプローチは広く研究されているが、第3のアプローチは未調査のままである。本稿では,2つの新しい手法を提案することによって,第3のアプローチを推し進める。まず,1つのサンプルから2つのモダリティのアライメントのみを考慮する古典的なCLIPスコアの代わりに,1つのサンプルとその対照的なペア間のアライメントを追加するCLIPロスインスパイア法であるnegCLIPLossを導入する。第二に、下流タスクが分かっている場合、事前学習データと対象データとの類似性を測定するために、ノルムシムという新しい基準ベースの指標を提案する。我々は、データ選択ベンチマークDataComp~\cite{gadre2023datacomp}でメソッドをテストする。 OpenAIのCLIP-L/14のみを使用した最高のベースラインと比較すると,ImageNet-1kでは5.3倍,38ダウンストリーム評価タスクでは2.8倍の改善を実現している。さらに、negCLIPLossとNormSimはどちらも既存の技術と互換性がある。現在のベストメソッドDFN~\cite{fang2023data} とHYPE~\cite{kim2024hype} を組み合わせることで、ダウンストリームタスクにおける平均パフォーマンスを0.9\%向上させ、新しい最先端を実現することができます。 Data selection has emerged as a core issue for large-scale visual-language model pretaining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data than the original OpenAI CLIP model, and (3) designing better metrics or strategies universally applicable to any CLIP embedding without requiring specific model properties (e.g., CLIPScore is one popular metric). While the first two approaches have been extensively studied, the third remains under-explored. In this paper, we advance the third approach by proposing two new methods. Firstly, instead of classical CLIP scores that only consider the alignment between two modalities from a single sample, we introduce negCLIPLoss, a CLIP loss-inspired method that adds the alignment between one sample and its contrastive pairs as an extra normalization term for better quality measurement. Secondly, when downstream tasks are known, we propose a new norm-based metric, NormSim, to measure the similarity between pretraining data and target data. We test our methods on the data selection benchmark, DataComp~\cite{gadre2023datacomp}. Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3\% improvement on ImageNet-1k and a 2.8\% improvement on 38 downstream evaluation tasks. Moreover, both negCLIPLoss and NormSim are compatible with existing techniques. By combining our methods with the current best methods DFN~\cite{fang2023data} and HYPE~\cite{kim2024hype}, we can boost average performance on downstream tasks by 0.9\%, achieving a new state-of-the-art.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# RLeXplore: 本質的な動機付け強化学習における加速研究 RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning ( http://arxiv.org/abs/2405.19548v1 ) ライセンス: Link先を確認	Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng,	(参考訳) 外部報酬は、特定のタスクにおける強化学習(RL)エージェントを効果的に導くことができる。しかしながら、外在的な報酬は、設計やアノテーションに必要な人的労力のために、複雑な環境でしばしば不足する。この制限は、補助的かつ高密度な信号を提供し、エージェントが教師なしの方法で学習できるようにする本質的な報酬の必要性を浮き彫りにする。様々な本質的な報酬の定式化が提案されているが、その実装と最適化の詳細は不十分であり、標準化が欠如しているため、研究の進展を妨げている。このギャップに対処するため、我々はRLeXploreを紹介した。RLeXploreは8つの最先端固有の報酬アルゴリズムの信頼性のある実装を提供する統一的で高度にモジュール化されたプラグイン・アンド・プレイのフレームワークである。さらに、重要な実装の詳細を特定し、本質的な動機付けRLにおける適切な標準プラクティスを確立するための詳細な研究を行う。 RLeXploreのソースコードはhttps://github.com/RLE-Foundation/RLeXploreで公開されている。 Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# パスワード同期モデルによるストレス試験能力の回復 Stress-Testing Capability Elicitation With Password-Locked Models ( http://arxiv.org/abs/2405.19550v1 ) ライセンス: Link先を確認	Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger,	(参考訳) 大規模言語モデル(LLM)の安全性を決定するためには、AI開発者は危険な能力を評価する必要がある。しかし、単純なプロンプト戦略はLLMの全機能を引き出すのに失敗することが多い。より堅牢に機能を引き出す1つの方法は、タスクを完了させるためにLLMを微調整することです。本稿では,微調整による誘引が能力を引き出すのに十分である条件について検討する。これを実現するために、パスワードロックモデル(LLM)を導入し、その機能の一部を意図的に隠蔽するように微調整する。具体的には、これらのLSMは、プロンプトにパスワードが存在する場合にのみこれらの能力を示すように訓練され、それ以外はより弱いLSMを模倣する。パスワードロックされたモデルは、パスワードを使わずにこれらのパスワードロックされた機能を引き出すことができるかどうかをテストすることによって、機能を引き出す新しい方法を可能にする。いくつかの高品質なデモは、パスワードでロックされた機能を完全に引き出すのに十分であることがわかった。より驚くべきことに、微調整は、同じパスワードや異なるパスワードを使ってロックされた他の機能を引き出すことができる。さらに、評価のみが利用可能であり、実演ではない場合、強化学習のようなアプローチは、しばしば能力を引き出すことができる。全体としては、ファインチューニングは現在のモデルの隠れた能力を引き出す効果的な方法であるが、高品質なデモンストレーションが得られない場合、例えばモデルの(隠された)能力が人間のデモの能力を上回る場合など、信頼性が低い可能性があることを示唆している。 To determine the safety of large language models (LLMs), AI developers must be able to assess their dangerous capabilities. But simple prompting strategies often fail to elicit an LLM's full capabilities. One way to elicit capabilities more robustly is to fine-tune the LLM to complete the task. In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. To do this, we introduce password-locked models, LLMs fine-tuned such that some of their capabilities are deliberately hidden. Specifically, these LLMs are trained to exhibit these capabilities only when a password is present in the prompt, and to imitate a much weaker LLM otherwise. Password-locked models enable a novel method of evaluating capabilities elicitation methods, by testing whether these password-locked capabilities can be elicited without using the password. We find that a few high-quality demonstrations are often sufficient to fully elicit password-locked capabilities. More surprisingly, fine-tuning can elicit other capabilities that have been locked using the same password, or even different passwords. Furthermore, when only evaluations, and not demonstrations, are available, approaches like reinforcement learning are still often able to elicit capabilities. Overall, our findings suggest that fine-tuning is an effective method of eliciting hidden capabilities of current models, but may be unreliable when high-quality demonstrations are not available, e.g. as may be the case when models' (hidden) capabilities exceed those of human demonstrators.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# ソフト分解を用いた連続モンテカルロのマルチモーダル分布の収束境界 Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition ( http://arxiv.org/abs/2405.19553v1 ) ライセンス: Link先を確認	Holden Lee, Matheau Santana-Gijzen,	(参考訳) 逐次モンテカルロ法(SMC)アルゴリズムで得られたサンプルの実証測度の下での関数$f$の分散に関するバウンダリを,大域マルコフ連鎖混合力学よりも局所的に依存する時間的複雑さで証明する。 SMCはマルコフ・チェイン・モンテカルロ (MCMC) 法であり、既知の分布からN$の粒子を引いてから始まり、各インスタンスでマルコフ連鎖を滑らかにするために再サンプリングする。原則として、SMCは多モード性から問題を緩和しようとする。しかし、SMCのほとんどの理論的保証は、一様条件でのみ効率的である大域的な混合時間境界を仮定することで得られる。局所MCMCダイナミクスにのみ依存する混合時間を用いて,真のマルチモーダル設定でバウンダリが得られることを示す。 We prove bounds on the variance of a function $f$ under the empirical measure of the samples obtained by the Sequential Monte Carlo (SMC) algorithm, with time complexity depending on local rather than global Markov chain mixing dynamics. SMC is a Markov Chain Monte Carlo (MCMC) method, which starts by drawing $N$ particles from a known distribution, and then, through a sequence of distributions, re-weights and re-samples the particles, at each instance applying a Markov chain for smoothing. In principle, SMC tries to alleviate problems from multi-modality. However, most theoretical guarantees for SMC are obtained by assuming global mixing time bounds, which are only efficient in the uni-modal setting. We show that bounds can be obtained in the truly multi-modal setting, with mixing times that depend only on local MCMC dynamics.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 離散分布のクラスタリング混合:Mitraのアルゴリズムについて Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm ( http://arxiv.org/abs/2405.19559v1 ) ライセンス: Link先を確認	Mohamed Seif, Yanxi Chen,	(参考訳) 本稿では、一般的な離散混合分布モデルを分類するためのMitraのアルゴリズム \cite{mitra2008clustering} の洗練された解析を行う。このアルゴリズムはスペクトルクラスタリング \cite{mcsherry 2001spectral} に基づいて構築され、確率分布に対して魅力的な条件を提供する。この分析は,確率的ブロックモデルに2分割するようにモデルを調整し,より洗練された条件を与える。その結果, 分離条件の改善が得られた。 In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined conditions. Compared to those derived in \cite{mitra2008clustering}, our improved separation conditions are obtained.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# Quo Vadis ChatGPT : 大規模言語モデルから大規模知識モデルへ Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models ( http://arxiv.org/abs/2405.19561v1 ) ライセンス: Link先を確認	Venkat Venkatasubramanian, Arijit Chakraborty,	(参考訳) 自然言語処理や画像合成といったアプリケーションにおけるトランスフォーマーベースの生成ニューラルネットワークアーキテクチャを用いたChatGPTやその他の大規模言語モデル(LLM)の驚くべき成功は、プロセスシステム工学(PSE)の潜在的な可能性に多くの研究者が興奮している。これらの分野でのLLMのほぼ人間的なパフォーマンスは非常に印象的であり、驚き、そして大きなブレークスルーです。その機能は、ドキュメントの最初のドラフトを書くこと、コード記述補助、テキストの要約など、特定のタスクで非常に役立ちます。しかし、その成功は、詳細なドメイン知識の欠如のために、まだ推論、計画、説明ができないため、非常に科学的領域において限られている。これは化学工学などの分野において、物理や化学(および生物学)の基本法則、構成的関係、材料、プロセス、システムに関する高度な技術知識によって支配される問題である。純粋にデータ駆動機械学習はすぐに使えるが、科学と工学の分野におけるAIの長期的な成功は、第一原理と技術的な知識を効果的に活用するハイブリッドAIシステムの開発に依存する。我々はこれらのハイブリッドAIシステムをLKM(Large Knowledge Models)と呼んでいる。本稿では,化学工学におけるこのようなシステム開発における課題と機会について論じる。 The startling success of ChatGPT and other large language models (LLMs) using transformer-based generative neural network architecture in applications such as natural language processing and image synthesis has many researchers excited about potential opportunities in process systems engineering (PSE). The almost human-like performance of LLMs in these areas is indeed very impressive, surprising, and a major breakthrough. Their capabilities are very useful in certain tasks, such as writing first drafts of documents, code writing assistance, text summarization, etc. However, their success is limited in highly scientific domains as they cannot yet reason, plan, or explain due to their lack of in-depth domain knowledge. This is a problem in domains such as chemical engineering as they are governed by fundamental laws of physics and chemistry (and biology), constitutive relations, and highly technical knowledge about materials, processes, and systems. Although purely data-driven machine learning has its immediate uses, the long-term success of AI in scientific and engineering domains would depend on developing hybrid AI systems that use first principles and technical knowledge effectively. We call these hybrid AI systems Large Knowledge Models (LKMs), as they will not be limited to only NLP-based techniques or NLP-like applications. In this paper, we discuss the challenges and opportunities in developing such systems in chemical engineering.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 選択的説明 Selective Explanations ( http://arxiv.org/abs/2405.19562v1 ) ライセンス: Link先を確認	Lucas Monteiro Paes, Dennis Wei, Flavio P. Calmon,	(参考訳) 特徴属性法は、重要点を入力特徴に割り当てることで、ブラックボックス機械学習(ML)モデルを説明する。これらの手法は大規模MLモデルでは計算コストがかかる。この課題に対処するために、機械学習モデルに1つの推論だけで特徴帰属スコアを予測するためのトレーニングを行う、償却説明書の開発への取り組みが増えている。その効率にもかかわらず、償却された説明者は不正確な予測や誤解を招く説明を生み出すことができる。本稿では,新しい特徴帰属法である選択的説明法を提案する。 (i)償却説明書が品質の低い説明を生成するときの検出 (二)最初の推測による説明という手法を用いて、これらの説明を改善する。我々の選択的説明法は,説明を受けるサンプルのごく一部を初期推定で特定し,償却説明書と高品質な説明書とのギャップを埋める基本的方法を提供する。 Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there has been increasing efforts to develop amortized explainers, where a machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. In this paper, we propose selective explanations, a novel feature attribution method that (i) detects when amortized explainers generate low-quality explanations and (ii) improves these explanations using a technique called explanations with initial guess. Our selective explanation method allows practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers and their high-quality counterparts.	翻訳日:2024-05-31 18:56:18 公開日:2024-05-29
# 大規模言語モデルにおける未学習の気候誤報 Unlearning Climate Misinformation in Large Language Models ( http://arxiv.org/abs/2405.19563v1 ) ライセンス: Link先を確認	Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis,	(参考訳) 気候変動に関する誤報は、人類にとって最も深刻な脅威の1つに対処する上で、重要な障害となっている。本稿では,気候情報に関する大規模言語モデル(LLM)の事実的精度について検討する。実/偽ラベル付きQ&Aデータを用いて、気候変動問題に対する真正な応答を生成できるオープンソースモデルを比較し、気候変動に関する主張を微調整し評価する。本研究は, 意図的に偽の気候情報に汚染されたモデルの検出可能性について検討し, 他領域におけるモデル応答の精度には影響しないことを示した。さらに, 未学習アルゴリズム, 微調整, 検索・拡張生成(RAG)の有効性を, 気候変動トピックに基づく現実的なLLMの有効性と比較した。評価の結果, 未学習アルゴリズムは, プライバシの文脈における非効率性を示唆する以前の知見にもかかわらず, 曖昧な概念的主張に対して有効であることが明らかとなった。これらの知見は、より現実的に信頼性の高いLLMの開発を導くことを目的としており、誤情報攻撃に対するLLMの安全性を確保するための追加作業の必要性を強調している。 Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information. Using true/false labeled Q&A data for fine-tuning and evaluating LLMs on climate-related claims, we compare open-source models, assessing their ability to generate truthful responses to climate change questions. We investigate the detectability of models intentionally poisoned with false climate information, finding that such poisoning may not affect the accuracy of a model's responses in other domains. Furthermore, we compare the effectiveness of unlearning algorithms, fine-tuning, and Retrieval-Augmented Generation (RAG) for factually grounding LLMs on climate change topics. Our evaluation reveals that unlearning algorithms can be effective for nuanced conceptual claims, despite previous findings suggesting their inefficacy in privacy contexts. These insights aim to guide the development of more factually reliable LLMs and highlight the need for additional work to secure LLMs against misinformation attacks.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# 2次元Rydberg原子アレイにおける雑音耐性パリティ制御ゲートによる量子誤差検出 Quantum error detection with noise-resilient parity-controlled gate in two-dimensional Rydberg atom arrays ( http://arxiv.org/abs/2405.19564v1 ) ライセンス: Link先を確認	F. Q. Guo, S. L. Su, Weibin Li, X. Q. Shao,	(参考訳) 量子誤り検出は主に量子情報処理の基本的な操作である量子ビットパリティの正確な測定に依存する。本稿では,2次元Rydberg原子配列内の量子誤差を検出するためのレジリエントパリティ制御ゲートを提案する。本手法は, 補助原子の動的進化を追跡することにより, 仮想励起制御原子の偶数パリティと奇数パリティの識別を可能にする。 Rydberg状態とRydberg状態の間のスピン交換双極子相互作用と単光子と2光子駆動を用いることで、Rydberg-parity測定を従来の方法と比較して大幅に高速化する。本稿では,3ビット繰り返し符号,ZZZ$およびXXXX$の安定化器を特徴とする標準曲面符号,およびXZZX$構成の回転曲面符号について検討し,単発読み出しによる安定化器の測定を容易にする。我々は,Rydberg状態間の望ましくない相互作用,原子位置のゆらぎ,劣化ノイズ,レーザー振幅の不均一性などの潜在的な実験的不完全性を考慮して,我々の戦略の有効性を評価するために,徹底的な数値シミュレーションを行った。特に、パリティメータの物理機構の信頼性と利点の確保に重点を置いている。これらの結果は、我々のプロトコルの堅牢性と生存性を確認し、近い将来、Rydberg原子系を用いた量子エラー検出の有望な候補として位置づけた。 Quantum error detection relies primarily on precise measurement of qubit parity, a fundamental operation in quantum information processing. Here, we introduce a resilient parity-controlled gate tailored for detecting quantum errors within a 2D Rydberg atom array. Our method enables the discrimination between even and odd parities of virtually excited control atoms by tracking the dynamic evolution of an auxiliary atom. Using spin-exchange dipolar interactions of Rydberg states and single- and two-photon driving between ground states and Rydberg states, our method speeds up Rydberg-parity measurements by a large amount compared to previous methods. In practical application, we explore three-qubit repetition codes, standard surface codes featuring stabilizers in the forms $ZZZZ$ and $XXXX$, as well as rotated surface codes in the $XZZX$ configuration, facilitating the measurement of stabilizers with a single-shot readout. We carry out thorough numerical simulations to evaluate the feasibility of our strategy, considering potential experimental imperfections such as undesired interactions between Rydberg states, fluctuations in atomic positions, dephasing noise, and laser amplitude inhomogeneities. Particular emphasis is placed on ensuring the reliability and advantages of the physical mechanisms of the parity meter. These results affirm the robustness and viability of our protocol, positioning it as a promising candidate for quantum error detection employing the Rydberg atom system in the foreseeable future.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# Dr-LLaVA:シンボリック臨床接地による視覚指導 Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding ( http://arxiv.org/abs/2405.19567v1 ) ライセンス: Link先を確認	Shenghuan Sun, Gregory M. Goldgof, Alexander Schubert, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed Alaa,	(参考訳) VLM(Vision-Language Models)は、医用画像を分析し、自然言語の相互作用に関わり、診断や治療のタスクを支援することで、臨床医を支援する。しかしながら、VLMはしばしば「幻覚的」な振る舞いを示し、文脈的マルチモーダル情報に基づかないテキスト出力を生成する。この課題は医学領域において特に顕著であり、VLM出力は単一の相互作用において正確であるだけでなく、マルチターン会話を通して臨床推論や診断経路と一致している。そこで本研究では,臨床推論のシンボル表現を用いて,VLMを医用知識に根ざしたアライメントアルゴリズムを提案する。これらの表現は利用されます i) GPT-4誘導視覚指導チューニングデータを大規模に生成し、臨床とVLMの会話と臨床推論のデモンストレーションをシミュレートし、 (II)臨床医とVLMの相互作用を通じてVLM世代の臨床的妥当性を評価する自動報酬関数を作成する。提案アルゴリズムは, トレーニングデータ生成や報酬モデル構築における人間の関与を排除し, 人間のフィードバックによる標準的な強化学習(RLHF)と比較してコストを低減させる。本アルゴリズムを用いて骨髄病理のスライド解析に最適化された対話型VLMであるDr-LLaVAを開発し,マルチターン医療会話において高い性能を示した。 Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we do not only require VLM outputs to be accurate in single interactions but also to be consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. For this purpose, we propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge. These representations are utilized to (i) generate GPT-4-guided visual instruction tuning data at scale, simulating clinician-VLM conversations with demonstrations of clinical reasoning, and (ii) create an automatic reward function that evaluates the clinical validity of VLM generations throughout clinician-VLM interactions. Our algorithm eliminates the need for human involvement in training data generation or reward model construction, reducing costs compared to standard reinforcement learning with human feedback (RLHF). We apply our alignment algorithm to develop Dr-LLaVA, a conversational VLM finetuned for analyzing bone marrow pathology slides, demonstrating strong performance in multi-turn medical conversations.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# インクリメンタル・フォーショット・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマン Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation ( http://arxiv.org/abs/2405.19568v1 ) ライセンス: Link先を確認	Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding,	(参考訳) インクリメンタルなFew-shot Semantic Segmentation(iFSS)の目的は、トレーニング済みのセグメンテーションモデルを、古いトレーニングデータにアクセスせずに、アノテートされた少数のイメージを通じて、新しいクラスに拡張することだ。新たなクラスを漸進的に学習する過程で、古いクラスのデータ分布が破壊され、破滅的な忘れがもたらされる。一方、新しいクラスにはサンプルがほとんどないため、新しいクラスの満足できる表現を学習することは不可能である。 iFSS問題に対して、OINetと呼ばれる背景埋め込み空間 \textbf{O}rganization とプロトタイプ \textbf{I}nherit Network を提案する。具体的には、ベースクラスをトレーニングする場合、OINetはバックグラウンドで複数の分類ヘッドを使用し、複数のサブクラスプロトタイプを設定して、潜在する新規クラスに埋め込みスペースを予約する。そこで本研究では,新しいクラスを段階的に学習する過程で,現在学習している新しいクラスに最もよく適合するサブクラスプロトタイプを選択し,選択したプロトタイプの埋め込み空間を継承する戦略を提案する。この操作により、ベースクラスの分布に影響を与えることなく、少数のサンプルを使用して、新しいクラスを埋め込みスペースに登録することができる。 Pascal-VOC と COCO の結果,OINet は新たな最先端を実現している。 The goal of incremental Few-shot Semantic Segmentation (iFSS) is to extend pre-trained segmentation models to new classes via few annotated images without access to old training data. During incrementally learning novel classes, the data distribution of old classes will be destroyed, leading to catastrophic forgetting. Meanwhile, the novel classes have only few samples, making models impossible to learn the satisfying representations of novel classes. For the iFSS problem, we propose a network called OINet, i.e., the background embedding space \textbf{O}rganization and prototype \textbf{I}nherit Network. Specifically, when training base classes, OINet uses multiple classification heads for the background and sets multiple sub-class prototypes to reserve embedding space for the latent novel classes. During incrementally learning novel classes, we propose a strategy to select the sub-class prototypes that best match the current learning novel classes and make the novel classes inherit the selected prototypes' embedding space. This operation allows the novel classes to be registered in the embedding space using few samples without affecting the distribution of the base classes. Results on Pascal-VOC and COCO show that OINet achieves a new state of the art.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# 組立およびブールプリミティブによる凸分解の改善 Improved Convex Decomposition with Ensembling and Boolean Primitives ( http://arxiv.org/abs/2405.19569v1 ) ライセンス: Link先を確認	Vaibhav Vavilala, Florian Kluger, Seemandhar Jain, Bodo Rosenhahn, David Forsyth,	(参考訳) プリミティブ(幾何学的に単純な形状で構造を正確に抽象化する)の観点でシーンを記述することは、確立されたビジョン問題である。異なるシーンは異なる数のプリミティブを必要とし、プリミティブは強く相互作用するが、提案されたソリューションは推論時に評価することができる。最先端の手法は、一定数のプリミティブからなる開始点を予測するための学習された回帰手順と、幾何を洗練させ、冗長プリミティブを除去する降下法を含む。手法は深度, 正常予測, シーンセグメンテーションの精度で評価される。本稿では,精度の大幅な向上が期待できることを示す。 (a)少数の負の原始体を取り入れて b) さまざまなレグレッション手順をまとめる。組み立ては予測される各スタートポイントを精錬し、損失を埋め合わせることでベストを選択する。標準データセットにおける大規模な実験により、負のプリミティブが多数の画像で有用であることが確認され、我々の精巧な選択選択戦略がより優れていることが確認され、適合問題が非常に難しいことが確認された。 Describing a scene in terms of primitives -- geometrically simple shapes that offer a parsimonious but accurate abstraction of structure -- is an established vision problem. This is a good model of a difficult fitting problem: different scenes require different numbers of primitives and primitives interact strongly, but any proposed solution can be evaluated at inference time. The state of the art method involves a learned regression procedure to predict a start point consisting of a fixed number of primitives, followed by a descent method to refine the geometry and remove redundant primitives. Methods are evaluated by accuracy in depth and normal prediction and in scene segmentation. This paper shows that very significant improvements in accuracy can be obtained by (a) incorporating a small number of negative primitives and (b) ensembling over a number of different regression procedures. Ensembling is by refining each predicted start point, then choosing the best by fitting loss. Extensive experiments on a standard dataset confirm that negative primitives are useful in a large fraction of images, and that our refine-then-choose strategy outperforms choose-then-refine, confirming that the fitting problem is very difficult.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# 高速拡散インバージョンによるブラインド画像復元 Blind Image Restoration via Fast Diffusion Inversion ( http://arxiv.org/abs/2405.19572v1 ) ライセンス: Link先を確認	Hamadi Chihaoui, Abdelhak Lemkhenter, Paolo Favaro,	(参考訳) 近年,事前学習した拡散モデルを用いて画像復元(IR)課題を解決するための様々な手法が提案されている。しかし、これらの手法の多くは、IRタスクの劣化演算子が完全に知られていると仮定している。さらに、これらの手法の共通する特徴は、劣化した入力画像との整合性を満たすために拡散サンプリングプロセスを変更することである。この選択は、最近、準最適であることが示され、復元された画像がデータ多様体から逸脱する原因となった。これらの問題に対処するため,高速拡散インバージョン (BIRD) を用いたブラインド画像復元法を提案する。復元された画像がデータ多様体上に配置されることを保証するため,事前学習した拡散モデルに基づく新しいサンプリング手法を提案する。提案手法の鍵となる考え方は、初期ノイズがサンプリングされると、逆サンプリングを変更すること、すなわち、中間ラテントを全て変更しないことである。これは究極的には、入力ノイズの空間における最適化問題としてIRタスクをキャストすることと同値である。さらに, 完全にローリングされていない拡散モデルの逆転に伴う計算コストを軽減するために, これらのモデルの本質的能力を活用して, 大きな時間ステップを用いて前方拡散過程をスキップする。画像復元作業におけるBIRDの有効性を実験的に検証し,それらすべてに対して最先端の性能を実現することを示す。私たちのコードはhttps://github.com/hamadichihaoui/BIRD.comで公開されています。 Recently, various methods have been proposed to solve Image Restoration (IR) tasks using a pre-trained diffusion model leading to state-of-the-art performance. However, most of these methods assume that the degradation operator in the IR task is completely known. Furthermore, a common characteristic among these approaches is that they alter the diffusion sampling process in order to satisfy the consistency with the degraded input image. This choice has recently been shown to be sub-optimal and to cause the restored image to deviate from the data manifold. To address these issues, we propose Blind Image Restoration via fast Diffusion inversion (BIRD) a blind IR method that jointly optimizes for the degradation model parameters and the restored image. To ensure that the restored images lie onto the data manifold, we propose a novel sampling technique on a pre-trained diffusion model. A key idea in our method is not to modify the reverse sampling, i.e., not to alter all the intermediate latents, once an initial noise is sampled. This is ultimately equivalent to casting the IR task as an optimization problem in the space of the input noise. Moreover, to mitigate the computational cost associated with inverting a fully unrolled diffusion model, we leverage the inherent capability of these models to skip ahead in the forward diffusion process using large time steps. We experimentally validate BIRD on several image restoration tasks and show that it achieves state of the art performance on all of them. Our code is available at https://github.com/hamadichihaoui/BIRD.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# ハウサ映画レビューにおけるアスペクト・ポラリティ分類のための深部畳み込みニューラルネットワークモデル A Deep Convolutional Neural Network-based Model for Aspect and Polarity Classification in Hausa Movie Reviews ( http://arxiv.org/abs/2405.19575v1 ) ライセンス: Link先を確認	Umar Ibrahim, Abubakar Yakubu Zandam, Fatima Muhammad Adam, Aminu Musa,	(参考訳) Aspect-based Sentiment Analysis (ABSA) は、感情のニュアンスを理解するのに不可欠である。本稿では,感情分析研究においてあまり表現されない言語であるハウサ映画レビューにおいて,アスペクトと極性分類に適した新しいDeep Convolutional Neural Network(CNN)モデルを提案する。 Hausa ABSAデータセットが作成され、リソースの可用性に大きなギャップを埋める。このデータセットは、TF-IDF変換のためにsci-kit-learnを使用して前処理され、手動で注釈付きアスペクトレベルの特徴オントロジーワードと感情極性割り当てを含む。提案モデルでは,CNNとアスペクトワード予測の注意機構を組み合わせることで,文脈情報と感情の極性を活用する。アスペクト項抽出の91%、感情極性分類の92%の精度で、モデルは従来のマシンモデルより優れており、特定のアスペクトや感情に関する洞察を提供する。本研究は、ABSA研究、特に表現不足言語において、異文化間言語研究に影響を及ぼす。 Aspect-based Sentiment Analysis (ABSA) is crucial for understanding sentiment nuances in text, especially across diverse languages and cultures. This paper introduces a novel Deep Convolutional Neural Network (CNN)-based model tailored for aspect and polarity classification in Hausa movie reviews, an underrepresented language in sentiment analysis research. A comprehensive Hausa ABSA dataset is created, filling a significant gap in resource availability. The dataset, preprocessed using sci-kit-learn for TF-IDF transformation, includes manually annotated aspect-level feature ontology words and sentiment polarity assignments. The proposed model combines CNNs with attention mechanisms for aspect-word prediction, leveraging contextual information and sentiment polarities. With 91% accuracy on aspect term extraction and 92% on sentiment polarity classification, the model outperforms traditional machine models, offering insights into specific aspects and sentiments. This study advances ABSA research, particularly in underrepresented languages, with implications for cross-cultural linguistic research.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# 情報システムマネジメントの転換 : ディジタルエンジニアリング統合のための参照モデル Transforming Information Systems Management: A Reference Model for Digital Engineering Integration ( http://arxiv.org/abs/2405.19576v1 ) ライセンス: Link先を確認	John Bonar, John Hastings,	(参考訳) デジタルエンジニアリングの実践は、情報保証とシステムライフサイクル管理を改善するために、重要かつ未使用のポテンシャルを提供する。本稿では、モデルベースのエンジニアリング、デジタルスレッド、統合された製品ライフサイクルといった機能が、一般的なフレームワークのギャップにどのように対処できるかを検討する。参照モデルは、参照情報システムにデジタルエンジニアリング技術を適用し、トレーサビリティ、リスク可視性、正確性、統合性を示す。モデルは、ビューをまたいだ権威的な要素を再利用しながら、戦略的ニーズと要求とアーキテクチャを結びつける。モデルの分析は、デジタルエンジニアリングがコンプライアンス、監視、変更管理、リスクアセスメントのギャップを埋めることを示している。デジタルエンジニアリングの採用が目的であることは、サイバーセキュリティ、オペレーション、サービス配信、システムガバナンスを、包括的なデジタルシステム表現を通じて変革する可能性があることを示している。この研究は、組織がインフラを近代化し、デジタルトランスフォーメーションを追求するときに、情報システムにデジタル工学の応用を成熟させる基盤を提供する。 Digital engineering practices offer significant yet underutilized potential for improving information assurance and system lifecycle management. This paper examines how capabilities like model-based engineering, digital threads, and integrated product lifecycles can address gaps in prevailing frameworks. A reference model demonstrates applying digital engineering techniques to a reference information system, exhibiting enhanced traceability, risk visibility, accuracy, and integration. The model links strategic needs to requirements and architecture while reusing authoritative elements across views. Analysis of the model shows digital engineering closes gaps in compliance, monitoring, change management, and risk assessment. Findings indicate purposeful digital engineering adoption could transform cybersecurity, operations, service delivery, and system governance through comprehensive digital system representations. This research provides a foundation for maturing application of digital engineering for information systems as organizations modernize infrastructure and pursue digital transformation.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# スピン系における安定化器レニーエントロピーのための非平衡量子モンテカルロアルゴリズム Non-equilibrium Quantum Monte Carlo Algorithm for Stabilizer Rényi Entropy in Spin Systems ( http://arxiv.org/abs/2405.19577v1 ) ライセンス: Link先を確認	Zejun Liu, Bryan K. Clark,	(参考訳) 量子マジック(英: Quantum magic, nonstabilizerness)は、安定化状態を持つ古典的なシミュラビリティに関する量子系の重要な特徴である。本研究では,量子魔法の尺度の1つである安定化器R'enyiエントロピーを,サインプロブレム自由ハミルトニアンを持つスピン系で計算するための,新しい,効率的なアルゴリズムを提案する。このアルゴリズムは、2つの分割関数のアンサンブル間の作業の経路積分の量子モンテカルロシミュレーションに基づいており、全ての空間次元と温度に適用される。このアルゴリズムは, 有限温度と零温度の両方で1次元および2次元の逆場Isingモデル上で実演し, テンソルネットワークに基づくアルゴリズムと定量的に一致することを示す。さらに,計算コストを解析し,解析的および数値的証拠の両方をシステムサイズの多項式として提供する。 Quantum magic, or nonstabilizerness, provides a crucial characterization of quantum systems, regarding the classical simulability with stabilizer states. In this work, we propose a novel and efficient algorithm for computing stabilizer R\'enyi entropy, one of the measures for quantum magic, in spin systems with sign-problem free Hamiltonians. This algorithm is based on the quantum Monte Carlo simulation of the path integral of the work between two partition function ensembles and it applies to all spatial dimensions and temperatures. We demonstrate this algorithm on the one and two dimensional transverse field Ising model at both finite and zero temperatures and show the quantitative agreements with tensor-network based algorithms. Furthermore, we analyze the computational cost and provide both analytical and numerical evidences for it to be polynomial in system size.	翻訳日:2024-05-31 18:46:29 公開日:2024-05-29
# EvaGaussian:Blurry画像からのガウス散乱を支援するイベントストリーム EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images ( http://arxiv.org/abs/2405.20224v1 ) ライセンス: Link先を確認	Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian,	(参考訳) 3次元ガウススプラッティング(3D-GS)は、3次元シーン再構成と新しいビュー合成において例外的な機能を示した。しかし、その訓練は高品質でシャープな画像と正確なカメラポーズに大きく依存している。これらの要件を満たすことは、高速移動カメラや長時間露光を必要とする低照度環境において、モーションブルーのイメージが一般的に遭遇する、理想的でない現実世界のシナリオでは難しい。これらの課題に対処するために,イベントストリーム支援ガウシアンスプラッティング(EvaGaussians)を紹介した。これは,イベントカメラがキャプチャしたイベントストリームを統合して,ぼやけた画像から高品質な3D-GSを再構築する,新たなアプローチである。イベントカメラによって提供される高時間分解能とダイナミックレンジを利用して、イベントストリームを利用して、動きブル画像の形成過程を明示的にモデル化し、3D-GSの劣化する再構築を導く。露出時間中に3D-GSパラメータを共同で最適化し,カメラモーショントラジェクトリを復元することにより,複雑なテクスチャの詳細を持つ高忠実度新規ビューの獲得を確実なものにすることができる。提案手法を網羅的に評価し,従来の最先端のデブロアレンダリング手法と比較した。定性的・定量的な比較は, ぼやけた画像から細部を復元し, 高忠実なノベルビューを創出する上で, 既存の手法を超越していることを示す。 3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light environments that require long exposure times. To address these challenges, we introduce Event Stream Assisted Gaussian Splatting (EvaGaussians), a novel approach that integrates event streams captured by an event camera to assist in reconstructing high-quality 3D-GS from blurry images. Capitalizing on the high temporal resolution and dynamic range offered by the event camera, we leverage the event streams to explicitly model the formation process of motion-blurred images and guide the deblurring reconstruction of 3D-GS. By jointly optimizing the 3D-GS parameters and recovering camera motion trajectories during the exposure time, our method can robustly facilitate the acquisition of high-fidelity novel views with intricate texture details. We comprehensively evaluated our method and compared it with previous state-of-the-art deblurring rendering methods. Both qualitative and quantitative comparisons demonstrate that our method surpasses existing techniques in restoring fine details from blurry images and producing high-fidelity novel views.	翻訳日:2024-05-31 13:29:24 公開日:2024-05-29
# ViTGAN:視覚変換器を用いたガン訓練 ViTGAN: Training GANs with Vision Transformers ( http://arxiv.org/abs/2107.04589v2 ) ライセンス: Link先を確認	Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu,	(参考訳) 近年、視覚変換器(ViT)は、視覚固有の誘導バイアスを少なくしながら、画像認識に競争力を発揮している。本稿では,このような性能を画像生成に拡張できるかどうかについて検討する。この目的のために、我々はViTアーキテクチャをGAN(Generative Adversarial Network)に統合する。 ViT差別者に対しては、既存のGANの正規化手法が自己注意と不適切な相互作用をし、トレーニング中に深刻な不安定を生じさせることが観察された。この問題を解決するために、我々は、VTを用いたGANのトレーニングのための新しい正規化手法をいくつか導入する。 ViTジェネレータに対しては,収束を容易にするため,潜在層と画素マッピング層のアーキテクチャ選択について検討する。実証的に、我々のアプローチはViTGANと呼ばれ、CIFAR-10、CelebA、LSUN寝室という3つのデータセット上で、主要なCNNベースのGANモデルに匹敵する性能を実現している。 Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# MRCpy: 最小リスク分類のためのライブラリ MRCpy: A Library for Minimax Risk Classifiers ( http://arxiv.org/abs/2108.01952v4 ) ライセンス: Link先を確認	Kartheek Bondugula, Verónica Álvarez, José I. Segovia-Martín, Aritz Pérez, Santiago Mazuelas,	(参考訳) 教師付き分類のためのライブラリは、機械学習手法の幅広い使用を可能にしている。既存のライブラリであるScikit-learn、 Caret、mlpackは、古典的な経験的リスク最小化(ERM)アプローチに基づいた技術を実装している。我々は,ロバストリスク最小化(RRM)アプローチに基づいて,ミニマックスリスク分類器 (MRC) を実装したPythonライブラリ MRCpy を提案する。このライブラリは、性能保証を提供し、高次元での効率的な学習を可能にし、分散シフトに適応できる複数の MRC を提供する。 MRCpyはオブジェクト指向のアプローチに従い、Scikit-learnのような人気のあるPythonライブラリの標準に準拠している。ソースコードはGPL-3.0ライセンスでhttps://github.com/MachineLearningBCAM/MRCpyで入手できる。 Libraries for supervised classification have enabled the wide-spread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The library offers multiple variants of MRCs that can provide performance guarantees, enable efficient learning in high dimensions, and adapt to distribution shifts. MRCpy follows an object-oriented approach and adheres to the standards of popular Python libraries, such as scikit-learn, facilitating readability and easy usage together with a seamless integration with other libraries. The source code is available under the GPL-3.0 license at https://github.com/MachineLearningBCAM/MRCpy.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# PARIS:睡眠の質を高めるためのパーソナライズされたアクティビティ勧告 PARIS: Personalized Activity Recommendation for Improving Sleep Quality ( http://arxiv.org/abs/2110.13745v2 ) ライセンス: Link先を確認	Meghna Singh, Saksham Goel, Abhiraj Mohan, Jaideep Srivastava,	(参考訳) 睡眠の質は、人々の身体的および精神的な健康に大きな影響を与えます。睡眠不足の人は、身体的・精神的苦痛、活動制限、不安、痛みを報告しやすい。さらに、ここ数年、アクティビティ監視やヘルストラッキングのためのアプリケーションやデバイスが爆発的に増えている。これらのウェアラブルデバイスから収集された信号は、睡眠品質の研究と改善に利用することができる。本稿では,身体活動と睡眠の質の関係を利用して,機械学習技術を用いて睡眠改善を支援する方法を提案する。通常、人々はいくつかの行動モードを持ち、その生体機能は分割できる。アクティビティデータに基づいて時系列クラスタリングを行うと、特定の対象に対して最も明白な行動モードと相関するクラスタセンターが見つかる。アクティビティのレシピは、各クラスタ内の各動作モードに対して、適切な睡眠品質のために生成される。これらのアクティビティレシピは、日々のルーチン中に被験者にリラックスした、激しいアクティビティの混合を提案するためのアクティビティ推奨エンジンに供給される。推奨は、睡眠の質の向上を目的とし、年齢、性別、体重指数(BMI)、安静時心拍数など、被験者のライフスタイルの制約に基づいてさらにパーソナライズされる。これは、心拍数を下げたり、睡眠の全体的な品質を改善したりといった長期的な健康目標に役立ちます。 The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking. Signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we utilize the relationship between physical activity and sleep quality to find ways of assisting people improve their sleep using machine learning techniques. People usually have several behavior modes that their bio-functions can be divided into. Performing time series clustering on activity data, we find cluster centers that would correlate to the most evident behavior modes for a specific subject. Activity recipes are then generated for good sleep quality for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine for suggesting a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints, i.e. their age, gender, body mass index (BMI), resting heart rate, etc, with the objective of the recommendation being the improvement of that night's quality of sleep. This would in turn serve a longer-term health objective, like lowering heart rate, improving the overall quality of sleep, etc.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# 交換対称コインを持つ2つの量子ウォーカー間の空間的絡み合い Spatial entanglement between two quantum walkers with exchange symmetric coins ( http://arxiv.org/abs/2203.00873v2 ) ライセンス: Link先を確認	Ibrahim Yahaya Muhammad, Tanapat Deesuwan, Sikarin Yoo-Kong, Suwat Tangwancharoen, Monsit Tanasittikosol,	(参考訳) 本研究では、2つのコイン状態間の初期および最終交換対称性が、2つの量子ウォーカー間の空間的絡み合いのダイナミクスにどのように影響するかを検討する。特に、初期状態が反対称であり、コインの最終的な測定結果が対称的な結果をもたらすとき、硬貨がいつ測定されたかに関わらず、すべての初期絡み合いは空間的な自由度に転送される。逆に、最終的な結果が反対称であれば、空間絡み合いは減衰振動を示し、周期(T$)はコイン演算子パラメータ(\theta$)に逆比例する。これらの挙動は対称初期状態に対して逆転する。さらに,選択後の結果に対称性がない場合,初期状態に関わらず,同じ空間的絡み合い減衰を観測する。我々の研究結果は、量子ウォークにおける対称性が絡み合いのダイナミクスにどのように影響するかを明らかにし、量子技術への応用に対する潜在的な洞察を提供する。 We investigate how the initial and final exchange symmetries between the two-coin states influence the spatial entanglement dynamics between the two corresponding quantum walkers. Notably, when the initial state is anti-symmetric and the final measurement on the coins yields symmetric outcomes, all the initial entanglement will be transferred to the spatial degrees of freedom, regardless of when the coins are measured. Conversely, if the final outcomes are anti-symmetric, the spatial entanglement exhibits damped oscillation with a period ($T$) being inversely proportional to the coin operator parameter ($\theta$). These behaviours are reversed for symmetric initial states. Moreover, we also observe the same spatial entanglement damping regardless of the initial state when the post-selected results lack symmetry. Our findings reveal how symmetries affect the entanglement dynamics in quantum walks, offering potential insights for applications in quantum technology.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# 双方向オンライン市場のための動的マッチングバンド Dynamic Matching Bandit For Two-Sided Online Markets ( http://arxiv.org/abs/2205.03699v3 ) ライセンス: Link先を確認	Yuantong Li, Chi-hua Wang, Guang Cheng, Will Wei Sun,	(参考訳) 両面のオンラインマッチングプラットフォームは、様々な市場で採用されている。しかし、現在の市場でのエージェントの好みは通常暗黙的で未知であるため、データから学ぶ必要がある。意思決定プロセスに関わる動的側情報の増加に伴い、現代のオンラインマッチング手法では、コンテキスト情報に基づいてエージェントのシフト好みを追跡する能力が要求される。これにより,この動的オンラインマッチング問題とコンテキスト情報との新たな枠組みが提案され,マッチング決定における動的嗜好が実現される。既存の作業はオンラインマッチングと静的な嗜好に重点を置いているが、これは不十分である。本稿では,この問題に適応する動的マッチング帯域幅アルゴリズムを提案する。提案する動的マッチングアルゴリズムの鍵となる要素は、統計的保証を伴う選好ランクのオンライン推定である。理論的には,提案した動的マッチングアルゴリズムは,高い確率でエージェント-最適安定マッチング結果を提供する。特に、対数的後悔上限$\mathcal{O}(\log(T))$を証明し、対応するインスタンス依存の後悔下限を構築する。実験では、動的マッチングアルゴリズムは、様々な選好スキーム、文脈の次元、報奨雑音レベル、文脈変動レベルに対して堅牢であることを示し、求職市場への適用により、提案手法の実用性をさらに実証する。 Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side information involved in the decision process, modern online matching methodology demands the capability to track shifting preferences for agents based on contextual information. This motivates us to propose a novel framework for this dynamic online matching problem with contextual information, which allows for dynamic preferences in matching decisions. Existing works focus on online matching with static preferences, but this is insufficient: the two-sided preference changes as soon as one side's contextual information updates, resulting in non-static matching. In this paper, we propose a dynamic matching bandit algorithm to adapt to this problem. The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee. Theoretically, we show that the proposed dynamic matching algorithm delivers an agent-optimal stable matching result with high probability. In particular, we prove a logarithmic regret upper bound $\mathcal{O}(\log(T))$ and construct a corresponding instance-dependent matching regret lower bound. In the experiments, we demonstrate that dynamic matching algorithm is robust to various preference schemes, dimensions of contexts, reward noise levels, and context variation levels, and its application to a job-seeking market further demonstrates the practical usage of the proposed method.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# 単一原子からの絡み合った多光子グラフ状態の効率的な生成 Efficient generation of entangled multi-photon graph states from a single atom ( http://arxiv.org/abs/2205.12736v2 ) ライセンス: Link先を確認	Philip Thomas, Leonardo Ruscio, Olivier Morin, Gerhard Rempe,	(参考訳) 絡み合いは強力な概念であり、科学的・技術的進歩の可能性を秘めている。現代の研究の中心は、絡み合った状態の生成と制御を少数のキュービットから多くのキュービットに拡張し、それらをデコヒーレンスから保護することである。これらのクビットキャリアは自然に頑丈で操作が容易であるため、光子は顕著な役割を担っている。しかし、フォトニックエンタングルメントを作成するための最も成功した技術は本質的に確率的であり、従って拡張性に厳しい制限が課せられる。ここでは、空洞内に単一のメモリ原子を持つ決定論的プロトコルを実装することにより、これらを回避する。調整原子軌道回転による単一光子放射をインターリーブし,グリーンベルガー・ホルン・ゼリンジャー状態の最大14光子と最大12光子の線形クラスター状態のそれぞれ76(6)%,56(4)%を効率よく成長させる。光子1個あたり43.18(7)%のソース対検出効率のおかげで、これらの大きな状態は1分間に1度、以前のどの実験よりも桁違いに速く測定できる。将来的には、この速度をさらに高めることができ、このスキームは空洞内の2つの原子に拡張したり、量子力学的に結合して高次元のクラスター状態を生成することができる。フォトニックエンタングルメント生成の確率的スキームによって生じる限界を克服し、我々はスケーラブルな計測に基づく量子計算と通信の方法を提案する。 Entanglement is a powerful concept with an enormous potential for scientific and technological advances. A central focus in modern research is to extend the generation and control of entangled states from few to many qubits, and protect them against decoherence. Optical photons play a prominent role as these qubit carriers are naturally robust and easy to manipulate. However, the most successful technique to date for creating photonic entanglement is inherently probabilistic and therefore subject to severe scalability limitations. Here we avoid these by implementing a deterministic protocol with a single memory atom in a cavity. We interleave controlled single-photon emissions with tailored atomic qubit rotations to efficiently grow Greenberger-Horne-Zeilinger states of up to 14 photons and linear cluster states of up to 12 photons with a fidelity lower bounded by 76(6)% and 56(4)%, respectively. Thanks to a source-to-detection efficiency of 43.18(7)% per photon we measure these large states about once every minute, orders of magnitude faster than in any previous experiment. In the future, this rate could be increased even further, the scheme could be extended to two atoms in a cavity, or several sources could be quantum mechanically coupled, to generate higher-dimensional cluster states. Overcoming the limitations encountered by probabilistic schemes for photonic entanglement generation, our results may offer a way towards scalable measurement-based quantum computation and communication.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# ツイスト付き等変微分(TED)K-理論における任意の位相次数 Anyonic Topological Order in Twisted Equivariant Differential (TED) K-Theory ( http://arxiv.org/abs/2206.13563v2 ) ライセンス: Link先を確認	Hisham Sati, Urs Schreiber,	(参考訳) 等変K-理論による非相互作用性結晶性トポロジカル絶縁相の分類は広く受け入れられているが、それ故に、トポロジカルブレイド量子ゲートを支持する位相的に秩序づけられた基底状態を持つ位相への一般化は、広く開放されている。それとは対照的に、相互作用しない位相を分類するK-理論の成功は、相互作用する位相秩序のK-理論的な分類を先導するものとして暗黙的に認識され、代わりに他の提案の混合が検討されている。しかしながら、K-理論だけが価電子の実際の物理学と密接に結びついており、自己整合性は、他の任意の提案がK-理論に結び付けなければならないことを要求している。ここでは、結晶のブリルアントーラス orbi-オリエンティフォールド内の結節点の補間における点の構成空間のツイスト等変微分(TED)K理論により、特に相互作用する2d半金属において、対称性の保護/エンハンスSU(2)-アノニックトポロジー次数の分類について詳細に議論する。特に、(1) 位相的 2d 半金属相 modulo 大域質量項は、直交点の補集合の平坦な微分等式 K-理論によって分類される; (2) n-電子相互作用相は、ブリルアントーラスの n 個の点の構成空間のK-理論によって分類される; (3) 等式 K-電子の「インナー局所系」による幾分無視されたねじれは、フェルミ粒子を任意の量子aに変換するChen, Wilczeck, Witten & Halperin (1989) の効果的な「実」ゲージ相互作用を反映する; (4) 誘導されたSu(2)-電子相互作用相は、相互作用束の構成空間の相互作用のチャーンクラスに反映される、という主張を論じる。 While the classification of non-interacting crystalline topological insulator phases by equivariant K-theory has become widely accepted, its generalization to anyonic interacting phases -- hence to phases with topologically ordered ground states supporting topological braid quantum gates -- has remained wide open. On the contrary, the success of K-theory with classifying non-interacting phases seems to have tacitly been perceived as precluding a K-theoretic classification of interacting topological order; and instead a mix of other proposals has been explored. However, only K-theory connects closely to the actual physics of valence electrons; and self-consistency demands that any other proposal must connect to K-theory. Here we provide a detailed argument for the classification of symmetry protected/enhanced su(2)-anyonic topological order, specifically in interacting 2d semi-metals, by the twisted equivariant differential (TED) K-theory of configuration spaces of points in the complement of nodal points inside the crystal's Brillouin torus orbi-orientifold. We argue, in particular, that: (1) topological 2d semi-metal phases modulo global mass terms are classified by the flat differential twisted equivariant K-theory of the complement of the nodal points; (2) n-electron interacting phases are classified by the K-theory of configuration spaces of n points in the Brillouin torus; (3) the somewhat neglected twisting of equivariant K-theory by "inner local systems" reflects the effective "fictitious" gauge interaction of Chen, Wilczeck, Witten & Halperin (1989), which turns fermions into anyonic quanta; (4) the induced su(2)-anyonic topological order is reflected in the twisted Chern classes of the interacting valence bundle over configuration space, constituting the hypergeometric integral construction of monodromy braid representations.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# ポテンシャルエネルギー損失による分断学習の保護 Protecting Split Learning by Potential Energy Loss ( http://arxiv.org/abs/2210.09617v2 ) ライセンス: Link先を確認	Fei Zheng, Chaochao Chen, Lingjuan Lyu, Xinyi Fu, Xing Fu, Weiqiang Wang, Xiaolin Zheng, Jianwei Yin,	(参考訳) 実践的なプライバシー保護学習手法として、スプリットラーニングは学術や産業で注目を集めている。しかし、中間結果がトレーニングと推論中に共有されるため、そのセキュリティは常に疑問視されている。本稿では,分割学習の前方埋め込みによるプライバシー漏洩に着目した。具体的には、フォワード埋め込みにはラベルに関する情報が多すぎるため、攻撃者はいくつかのラベル付きサンプルを使用してトップモデルを微調整するか、クラスタリングのような教師なしのアタックを実行して、フォワード埋め込みから真のラベルを推測することができる。このようなプライバシリークを防止するため,同じクラスの埋め込みを決定境界に向かって押し進めることで,フォワード埋め込みをより複雑にするための潜在的なエネルギー損失を提案する。したがって、攻撃者が前方埋め込みから学ぶことは困難である。実験の結果,本手法は細調整攻撃とクラスタリング攻撃の両方の性能を著しく低下させることがわかった。 As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent such kind of privacy leakage, we propose the potential energy loss to make the forward embeddings become more 'complicated', by pushing embeddings of the same class towards the decision boundary. Therefore, it is hard for the attacker to learn from the forward embeddings. Experiment results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# 教師なし領域適応による画像編集のためのGANインバージョン GAN Inversion for Image Editing via Unsupervised Domain Adaptation ( http://arxiv.org/abs/2211.12123v2 ) ライセンス: Link先を確認	Siyu Xing, Chen Gong, Hewei Guo, Xiao-Yu Zhang, Xinwen Hou, Yu Liu,	(参考訳) 既存のGANインバージョン手法は、より一般的な低品質(LQ)入力に苦しむ一方で、高品質(HQ)イメージの再構築に優れる。この問題に対処するために、HQおよびLQ画像の効果的な逆変換と編集のために、Unsupervised Domain Adaptation (UDA) をインバージョンプロセス、すなわち UDA-inversion として提案する。未ペアのHQイメージをソースドメインとして、LQイメージを未ラベルのターゲットドメインとして、対象ドメインの損失値がソースドメインの損失によって上界となるという理論的保証と、2つのドメイン間の差を測定する新しい差分関数を導入する。その後、この上限を最小化してHQおよびLQ画像の正確な潜時符号を得る。これにより、HQ画像の構成的表現を自然に学習し、監督なしでLQ画像に変換することができる。 UDA-InversionはFFHQデータセットで22.14のPSNRを実現し、教師付きメソッドと互換性がある。 Existing GAN inversion methods work brilliantly in reconstructing high-quality (HQ) images while struggling with more common low-quality (LQ) inputs in practical application. To address this issue, we propose Unsupervised Domain Adaptation (UDA) in the inversion process, namely UDA-inversion, for effective inversion and editing of both HQ and LQ images. Regarding unpaired HQ images as the source domain and LQ images as the unlabeled target domain, we introduce a theoretical guarantee: loss value in the target domain is upper-bounded by loss in the source domain and a novel discrepancy function measuring the difference between two domains. Following that, we can only minimize this upper bound to obtain accurate latent codes for HQ and LQ images. Thus, constructive representations of HQ images can be spontaneously learned and transformed into LQ images without supervision. UDA-Inversion achieves a better PSNR of 22.14 on FFHQ dataset and performs comparably to supervised methods.	翻訳日:2024-05-31 02:51:07 公開日:2024-05-29
# 確率的文脈 MDP のためのエルダーベースレグレット Eluder-based Regret for Stochastic Contextual MDPs ( http://arxiv.org/abs/2211.14932v3 ) ライセンス: Link先を確認	Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour,	(参考訳) 本稿では,確率的マルコフ決定過程(CMDP)における後悔最小化のためのE-UC$^3$RLアルゴリズムを提案する。このアルゴリズムは、実現可能な関数クラスと \emph{offline} 最小二乗およびログ損失回帰オラクルへのアクセスという最小の仮定の下で機能する。我々のアルゴリズムは効率が良く(効率的なオフライン回帰オラクルを仮定すると)、$ \widetilde{O}(H^3 \sqrt{T \|S\| \|A\|d_{\mathrm{E}}(\mathcal{P}) \log (\|\mathcal{F}\|\|\mathcal{P}\|/\delta) )} の後悔を保証する。我々の知る限り、我々のアルゴリズムは、一般的なオフライン関数近似設定の下で動作しているCMDPに対する、最初の効率的かつレート最適後悔最小化アルゴリズムである。さらに、エルダー次元を別の興味を持つような一般有界測度に拡張する。 We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of realizable function class and access to \emph{offline} least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $ \widetilde{O}(H^3 \sqrt{T \|S\| \|A\|d_{\mathrm{E}}(\mathcal{P}) \log (\|\mathcal{F}\| \|\mathcal{P}\|/ \delta) )}) , $ with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, $\mathcal{P}$ and $\mathcal{F}$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively, and $d_{\mathrm{E}}(\mathcal{P})$ is the Eluder dimension of $\mathcal{P}$ w.r.t the Hellinger distance. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting. In addition, we extend the Eluder dimension to general bounded metrics which may be of separate interest.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# Quota と Complementary Preferences Constraint による両面競合型マッチング推奨市場 Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints ( http://arxiv.org/abs/2301.10230v3 ) ライセンス: Link先を確認	Yuantong Li, Guang Cheng, Xiaowu Dai,	(参考訳) 本稿では,エージェントの嗜好が未知であり,データから学習しなければならないような,相補的な嗜好とクォータ制約を伴う双方向のオンラインマッチング市場の課題に対処する,新たなレコメンデーションアルゴリズムを提案する。混合クォータと相補的な選好制約の存在は、マッチングプロセスの不安定性を招き、この問題を解決するのが難しくなる。この課題を克服するために、バンドレート学習フレームワークとして問題を定式化し、マルチエージェントマルチタイプトンプソンサンプリング(MMTS)アルゴリズムを提案する。このアルゴリズムは、トンプソンサンプリングの強度と新しい二重マッチング手法を組み合わせて、安定したマッチング結果を提供する。我々の理論的分析は、MMTSが安定性を達成できることを示すものであり、全会社のクォータ$Q$、利用可能なタイプワーカーの最大サイズの平方根である$\sqrt{K_{\max}T}}とタイムホライズン$T$に対して線形性を示す高い確率で、合計$\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-Bayesian regretを持つ。さらに,様々な環境下でのMMTSの有効性についてもシミュレーション研究を行った。実験で使われたコードを提供しています。 In this paper, we propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints, where agents' preferences are unknown a priori and must be learned from data. The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process, making this problem challenging to solve. To overcome this challenge, we formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm. The algorithm combines the strengths of Thompson Sampling for exploration with a new double matching technique to provide a stable matching outcome. Our theoretical analysis demonstrates the effectiveness of MMTS as it can achieve stability and has a total $\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-Bayesian regret with high probability, which exhibits linearity with respect to the total firm's quota $Q$, the square root of the maximum size of available type workers $\sqrt{K_{\max}}$ and time horizon $T$. In addition, simulation studies also demonstrate MMTS's effectiveness in various settings. We provide code used in our experiments \url{https://github.com/Likelyt/Double-Matching}.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# アルゴリズム設計型ニューラルネットワーク(ADANNs):パラメトリック偏微分方程式の高次深部演算子学習 Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations ( http://arxiv.org/abs/2302.03286v2 ) ライセンス: Link先を確認	Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger,	(参考訳) 本稿では,パラメータ偏微分方程式(PDE)に関連する近似演算子に対する新しいディープラーニング手法を提案する。特に、特定の近似問題に適した特定のANN初期化スキームとともに、特定の人工知能ニューラルネットワーク(ANN)アーキテクチャを設計する新しい戦略を導入する。提案手法では,高速な古典的数値近似手法と深層演算子学習手法を併用する。具体的には、既存のANNアーキテクチャのカスタマイズされた適応と、これらのANNアーキテクチャの特別な初期化を導入し、初期化時に、ANNが検討された近似問題に対して、選択された最適化された古典的数値アルゴリズムを忠実に模倣するようにした。得られたANNアーキテクチャとその初期化スキームは、数値アルゴリズムや文学からの一般的なディープラーニング手法に強く影響を受けており、その意味では、アルゴリズム設計されたニューラルネットワーク(ADANN)として、アルゴリズムで作成された初期化スキームとともに導入されたANNを参照する。いくつかのパラメトリックPDEの場合,提案手法を数値的に検証する。検証された数値例では、ADANN手法は既存の近似アルゴリズムと、文献からのディープラーニングの方法論を著しく上回っている。 In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptions of existing ANN architectures together with specialized initializations for these ANN architectures so that at initialization we have that the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing traditional approximation algorithms as well as existing deep operator learning methodologies from the literature.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# プライベート推定におけるサブセットベースインスタンス最適性 Subset-Based Instance Optimality in Private Estimation ( http://arxiv.org/abs/2303.01262v3 ) ライセンス: Link先を確認	Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh,	(参考訳) 本稿では,差分プライベート推定アルゴリズムのインスタンス最適性を新たに定義する。私たちの定義では、データセットの$D$に対して同時に競合する最適なアルゴリズムと、最高のプライベートベンチマークアルゴリズムが必要です。 (a)事前にD$を知っており、 (b) は$D$の大規模なサブセット上での最悪のパフォーマンスによって評価される。つまり、潜在的に極端な点が$D$に加算された場合、ベンチマークアルゴリズムはうまく動作しない。これにより、ベンチマークは以前の作業で提案されたベンチマークよりも大幅に強くなります。それにもかかわらず、実際の評価されたデータセットに対して、手段、量子化、および$\ell_p$-norm最小化を含む幅広いデータセット特性のクラスを推定する際に、インスタンス最適性の概念を達成するプライベートアルゴリズムを構築する方法を示す。具体的には,提案アルゴリズムが既存アルゴリズムの漸近的性能と同時に一致するか,あるいは超えていることを示す。 We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# 安全-帯域制限下における中本合意のトレードオフ- Security--Throughput Tradeoff of Nakamoto Consensus under Bandwidth Constraints ( http://arxiv.org/abs/2303.09113v3 ) ライセンス: Link先を確認	Lucianna Kiffer, Joachim Neu, Srivatsan Sridhar, Aviv Zohar, David Tse,	(参考訳) セキュリティとパフォーマンスのトレードオフの古典的な問題を再考する: 限られた能力を持つノードのネットワークが与えられると、特定のブロックの生産速度に対して、敵の力の何パーセントがNCであるのか? 制限された通信や計算資源といった現実的な制約を捉えない有界遅延モデルであるため、中本プロトコルの最先端分析ではこの問題に答えられていない。境界帯域モデルを用いて,PoW中本コンセンサスに対するセキュリティ・パフォーマンストレードオフを改良した解析手法を開発した。このモデルでは,従来の有界遅延モデルとは対照的に,中本氏の私的攻撃はもはや最悪の攻撃ではなく,帯域幅の制限によるネットワーク混雑を悪用したティーシング戦略と呼ばれる新たな攻撃戦略が著しく悪化していることが示されている。 PoSでは、同化ブロックは、非常に低いブロック生成率を除いて、従来のPoS Nakamotoコンセンサスプロトコルの安全性を損なうため、混雑を悪化させる可能性がある。このような均等なスパムに対処するため、我々はBlanking NC (BlaNC) と呼ぶPoS NCプロトコルの変種を提示し、PoW NCと同じレジリエンスを実現する。 For Nakamoto's longest-chain consensus protocol, whose proof-of-work (PoW) and proof-of-stake (PoS) variants power major blockchains such as Bitcoin and Cardano, we revisit the classic problem of the security-performance tradeoff: Given a network of nodes with limited capacities, against what fraction of adversary power is Nakamoto consensus (NC) secure for a given block production rate? State-of-the-art analyses of Nakamoto's protocol fail to answer this question because their bounded-delay model does not capture realistic constraints such as limited communication- and computation-resources. We develop a new analysis technique to prove a refined security-performance tradeoff for PoW Nakamoto consensus in a bounded-bandwidth model. In this model, we show that, in contrast to the classic bounded-delay model, Nakamoto's private attack is no longer the worst attack, and a new attack strategy we call the teasing strategy, that exploits the network congestion caused by limited bandwidth, is strictly worse. In PoS, equivocating blocks can exacerbate congestion, making the traditional PoS Nakamoto consensus protocol insecure except at very low block production rates. To counter such equivocation spamming, we present a variant of the PoS NC protocol we call Blanking NC (BlaNC), which achieves the same resilience as PoW NC.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# 拡散モデルのセマンティック潜在空間における解釈的方向の発見 Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models ( http://arxiv.org/abs/2303.11073v2 ) ライセンス: Link先を確認	René Haas, Inbar Huberman-Spiegelglas, Rotem Mulayoff, Stella Graßhof, Sami S. Brandt, Tomer Michaeli,	(参考訳) Denoising Diffusion Models (DDM) はGenerative Adversarial Networks (GAN) と強力な競合関係にある。しかし、画像合成や編集に広く用いられているにもかかわらず、その潜在空間はいまだによく理解されていない。近年,「$h$-space」とよばれるDDMのセマンティック潜在空間が,GANを連想させる形でセマンティック画像編集を容易にすることが示されている。 h$-スペースは拡散過程のすべての時間ステップでDDMのデノイザーのボトルネックアクティベーションから成り立っている。本稿では,h-spaceの特性について検討し,その中に意味のある意味的方向を求めるための新しい手法を提案する。まず、事前訓練されたDDMにおける解釈可能な意味方向を明らかにするための教師なし手法の研究から始める。具体的には,グローバル潜伏方向が潜伏空間の主成分として現れることを示す。さらに,遅延符号のデノイザWr.t.のヤコビアンのスペクトル解析により,画像固有の意味方向を検出する新しい手法を提案する。次に、非条件DDMにおける教師付き方式で方向を求めることで分析を拡張した。実画像のラベル付きデータセットか、ドメイン固有の属性分類器で生成されたサンプルにアノテートすることで、そのような方向を見つけることができることを示す。さらに、簡単な線形射影により、検出された方向を意味的に切り離す方法を示す。私たちのアプローチは、アーキテクチャの変更、テキストベースのガイダンス、CLIPベースの最適化、モデル微調整を必要とせずに適用できます。 Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs). However, despite their widespread use in image synthesis and editing applications, their latent space is still not as well understood. Recently, a semantic latent space for DDMs, coined `$h$-space', was shown to facilitate semantic image editing in a way reminiscent of GANs. The $h$-space is comprised of the bottleneck activations in the DDM's denoiser across all timesteps of the diffusion process. In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it. We start by studying unsupervised methods for revealing interpretable semantic directions in pretrained DDMs. Specifically, we show that global latent directions emerge as the principal components in the latent space. Additionally, we provide a novel method for discovering image-specific semantic directions by spectral analysis of the Jacobian of the denoiser w.r.t. the latent code. Next, we extend the analysis by finding directions in a supervised fashion in unconditional DDMs. We demonstrate how such directions can be found by relying on either a labeled data set of real images or by annotating generated samples with a domain-specific attribute classifier. We further show how to semantically disentangle the found direction by simple linear projection. Our approaches are applicable without requiring any architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# グラフ上のランダム逆問題:分散オンライン学習 Random Inverse Problems Over Graphs: Decentralized Online Learning ( http://arxiv.org/abs/2303.11789v5 ) ライセンス: Link先を確認	Tao Li, Xiwei Zhang,	(参考訳) ネットワークグラフ上の分散ランダム逆問題のフレームワークをオンライン測定で構築し,分散化されたオンライン学習アルゴリズムを提案する。これはヒルベルト空間における分散パラメータ推定と、再現されたカーネルヒルベルト空間(RKHS-LMS)における最小平均平方問題を統一する。我々は、アルゴリズムの収束を、L2有界なマルティンゲール差項を持つヒルベルト空間における不均一なランダム差分方程式のクラスにおける漸近安定性に変換し、ヒルベルト空間におけるL2-漸近安定性理論を開発する。ネットワークグラフが連結され、フォワード作用素の列が励起条件の無限次元時空間持続性を満たすならば、全てのノードの見積もりは平均二乗であり、ほぼ確実に一致している。さらに,RKHSにおける非定常および非独立なオンラインデータストリームに基づく分散オンライン学習アルゴリズムを提案し,ランダム入力データによって誘導される演算子が励振条件の無限次元時空間持続性を満たす場合,そのアルゴリズムが平均二乗でほぼ確実に整合であることを証明した。 We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2 -asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# SoftED:時系列イベント検出のソフト評価基準 SoftED: Metrics for Soft Evaluation of Time Series Event Detection ( http://arxiv.org/abs/2304.00439v2 ) ライセンス: Link先を確認	Rebecca Salles, Janio Lima, Rafaelli Coutinho, Esther Pacitti, Florent Masseglia, Reza Akbarinia, Chao Chen, Jonathan Garibaldi, Fabio Porto, Eduardo Ogasawara,	(参考訳) 時系列イベント検出法は,検出精度にのみ焦点をあてた標準分類基準によって評価される。しかし、事象を検出する不正確さは、しばしば、隣り合う検出に反映される前のまたは遅れた影響から生じる。これらの検出は、必要なアクションをトリガーしたり、不満足な結果を軽減するのに役立ちます。この文脈では、現在のメトリクスは不十分であり、イベント検出のコンテキストには不十分である。近隣の検知に対する時間と時間的寛容の両方の概念を取り入れたメトリクスの需要がある。本稿では,イベント検出手法のソフトアセスメントのために設計された,新しいメトリクスセットであるSoftEDメトリクスを紹介する。これにより、検出精度と、その検出がイベントを表す程度の両方を評価することができる。彼らは、通常の分類指標と比較して、時間的寛容性を36倍以上の実験に取り入れ、イベントと代表的検出を関連付けることにより、イベント検出の評価を改善した。 SoftEDメトリクスは、検出評価とメソッド選択への貢献を示すドメインスペシャリストによって検証された。 Time series event detection methods are evaluated mainly by standard classification metrics that focus solely on detection accuracy. However, inaccuracy in detecting an event can often result from its preceding or delayed effects reflected in neighboring detections. These detections are valuable to trigger necessary actions or help mitigate unwelcome consequences. In this context, current metrics are insufficient and inadequate for the context of event detection. There is a demand for metrics that incorporate both the concept of time and temporal tolerance for neighboring detections. This paper introduces SoftED metrics, a new set of metrics designed for soft evaluating event detection methods. They enable the evaluation of both detection accuracy and the degree to which their detections represent events. They improved event detection evaluation by associating events and their representative detections, incorporating temporal tolerance in over 36\% of experiments compared to the usual classification metrics. SoftED metrics were validated by domain specialists that indicated their contribution to detection evaluation and method selection.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# 複合組織均質化のためのニューラルネットワークトランスモデル A Neural Network Transformer Model for Composite Microstructure Homogenization ( http://arxiv.org/abs/2304.07877v2 ) ライセンス: Link先を確認	Emil Pitz, Kishore Pochiraju,	(参考訳) 複合組織における不均一性と不確実性は、厳密にモデル化された場合の計算ボトルネックか、応力場における解の不正確さと、近似された場合の故障予測につながる。任意の構造と非線形構造を解析するのに適する手法は存在するが、その計算コストは大規模構造解析において実用的ではない。サロゲートモデル (Surrogate Model) またはリダクションオーダーモデル (ROM) は、一般的に効率を高めるが、通常は単一のマイクロ構造でキャリブレーションされる。森田中法のような均質化法は、幅広い構成特性に対して急速な均質化をもたらす。しかし、位相における応力やひずみ平均化のような仮定を単純化することは、ミクロ構造における決定論的および確率的バリエーションの両方を考慮すべきである。本稿では,様々なミクロ構造や構成成分の知識を捉えるトランスフォーマーニューラルネットワークアーキテクチャについて述べる。エラストプラストマトリックス内の線形弾性繊維の任意の合成ミクロ構造のイメージや抽象化が与えられた場合、トランスフォーマーネットワークは、履歴依存、非線形、均質化されたストレス-ひずみ応答を予測する。主成分分析 (PCA) を用いた2点統計計算と, 畳み込みニューラルネットワーク (CNN) を用いたオートエンコーダを用いた。どちらの手法も、均質化物質応答を正確に予測する。開発されたトランスニューラルネットワークは、様々なミクロ構造に一般化可能で拡張可能な、ミクロ構造間翻訳の効率的な手段を提供する。本稿では,サイクリングおよびランダム負荷下でのネットワークアーキテクチャ,データ生成のトレーニングとテスト,パフォーマンスについて述べる。 Heterogeneity and uncertainty in a composite microstructure lead to either computational bottlenecks if modeled rigorously or to solution inaccuracies in the stress field and failure predictions if approximated. Although methods suitable for analyzing arbitrary and non-linear microstructures exist, their computational cost makes them impractical to use in large-scale structural analysis. Surrogate models or Reduced Order Models (ROMs) commonly enhance efficiencies but are typically calibrated with a single microstructure. Homogenization methods, such as the Mori-Tanaka method, offer rapid homogenization for a wide range of constituent properties. However, simplifying assumptions, like stress and strain averaging in phases, render the consideration of both deterministic and stochastic variations in microstructure infeasible. This paper illustrates a transformer neural network architecture that captures the knowledge of various microstructures and constituents, enabling it to function as a computationally efficient homogenization surrogate model. Given an image or an abstraction of an arbitrary composite microstructure of linearly elastic fibers in an elastoplastic matrix, the transformer network predicts the history-dependent, non-linear, and homogenized stress-strain response. Two methods for encoding microstructure features were tested: calculating two-point statistics using Principal Component Analysis (PCA) for dimensionality reduction and employing an autoencoder with a Convolutional Neural Network (CNN). Both methods accurately predict the homogenized material response. The developed transformer neural network offers an efficient means for microstructure-to-property translation, generalizable and extendable to a variety of microstructures. The paper describes the network architecture, training and testing data generation, and performance under cycling and random loadings.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# PLIP:人物表現学習のための言語画像事前学習 PLIP: Language-Image Pre-training for Person Representation Learning ( http://arxiv.org/abs/2305.08386v2 ) ライセンス: Link先を確認	Jialong Zuo, Jiahao Hong, Feng Zhang, Changqian Yu, Hanyu Zhou, Changxin Gao, Nong Sang, Jingdong Wang,	(参考訳) 言語イメージ事前学習は、一般的なドメインにおける強力な表現を学習するための効果的なテクニックである。しかし、直接人体表現学習を行う場合、これらの一般的な事前学習法は不満足な性能に悩まされる。理由は、批判的な人物の特徴、すなわちきめ細かい属性やアイデンティティを無視するからである。この問題に対処するために,PLIPと呼ばれる人物表現学習のための新しい言語画像事前学習フレームワークを提案する。具体的には、3つのプレテキストタスクを精巧に設計する。 1) テキスト誘導画像のカラー化は,人物関連画像領域と微粒なカラー部分のテキストフレーズとの対応性を確立することを目的としている。 2【画像誘導属性予測】は、画像中の人物の微粒な属性情報をマイニングすることを目的とする。 3) アイデンティティベースのVision-Language Contrastは、インスタンスレベルではなく、アイデンティティレベルでのクロスモーダル表現の相関を目指している。さらに,事前トレーニングフレームワークを実装するために,SynTH-PEDESという画像テキストペアを用いた大規模人物データセットを構築し,テキストアノテーションを自動生成する。我々は、SynTH-PEDES上でPLIPを事前訓練し、下流の人中心のタスクにまたがってモデルを評価する。 PLIPはこれらのタスクの既存のメソッドを大幅に改善するだけでなく、ゼロショットやドメインの一般化設定でも優れた機能を示している。コード、データセット、重み付けは~\url{https://github.com/Zplusdragon/PLIP} でリリースされる。 Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when directly turning to person representation learning, these general pre-training methods suffer from unsatisfactory performance. The reason is that they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we elaborately design three pretext tasks: 1) Text-guided Image Colorization, aims to establish the correspondence between the person-related image regions and the fine-grained color-part textual phrases. 2) Image-guided Attributes Prediction, aims to mine fine-grained attribute information of the person body in the image; and 3) Identity-based Vision-Language Contrast, aims to correlate the cross-modal representations at the identity level rather than the instance level. Moreover, to implement our pre-train framework, we construct a large-scale person dataset with image-text pairs named SYNTH-PEDES by automatically generating textual annotations. We pre-train PLIP on SYNTH-PEDES and evaluate our models by spanning downstream person-centric tasks. PLIP not only significantly improves existing methods on all these tasks, but also shows great ability in the zero-shot and domain generalization settings. The code, dataset and weights will be released at~\url{https://github.com/Zplusdragon/PLIP}	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# 物理インフォームドニューラルネットワークの効率的な誤り証明 Efficient Error Certification for Physics-Informed Neural Networks ( http://arxiv.org/abs/2305.10157v2 ) ライセンス: Link先を確認	Francisco Eiras, Adel Bibi, Rudy Bunel, Krishnamurthy Dj Dvijotham, Philip Torr, M. Pawan Kumar,	(参考訳) 最近の研究は、物理情報ニューラルネットワーク(PINN)が偏微分方程式(PDE)を効率的に解くことができるという有望な証拠を提供している。しかし、従来の研究では、時空間領域をまたいだPINNの最悪の残差(数値解法の耐性に類似した尺度)を保証できなかった。実世界のアプリケーションでは、有限の点集合に対するテストは、異なる集合におけるパフォーマンスが著しく悪化する可能性があるため、配置のための十分な基盤とみなすことはできない。この問題を緩和するため、PINNの継続的な適用性ドメインに対するエラーベースの条件を保証します。 PINN残差エラーをバインドする汎用的で効率的でスケーラブルなポストトレーニングフレームワークである$\partial$-CROWNを導入する。本稿では,古典的に研究されている2つのPINN(Burgers' と Schr\odinger' の方程式)と,Allan-Cahn と Diffusion-Sorption の2つの実世界の応用において,より難易度の高い証明を得る上での有効性を示す。 Recent work provides promising evidence that Physics-Informed Neural Networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish guaranteed error-based conditions for PINNs over their continuous applicability domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PINNs - Burgers' and Schr\"odinger's equations -, and two more challenging ones with real-world applications - the Allan-Cahn and Diffusion-Sorption equations.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# UP5:Fairness-Aware RecommendationのためのUnbiased Foundation Model UP5: Unbiased Foundation Model for Fairness-aware Recommendation ( http://arxiv.org/abs/2305.12090v2 ) ライセンス: Link先を確認	Wenyue Hua, Yingqiang Ge, Shuyuan Xu, Jianchao Ji, Yongfeng Zhang,	(参考訳) LLM(Large Language Models)のような基礎モデルの最近の進歩は、それらをRecommender Systems(RS)の最前線へと押し上げている。実用性にもかかわらず、LSMが社会的ステレオタイプを必然的に持続させ、不当な勧告をもたらすのではないかという懸念が高まっている。本論文は,多くのユーザが意思決定や需要充足のために考えるように,RSにとって公平性は不可欠であるため,性別や年齢などの特定の敏感な特徴に公平であるように推奨するレコメンデーションに対して,ユーザ側の公正性に焦点をあてる。本稿では,LLM ベースのレコメンデーションモデルにおいて,T5 と LLaMA のバックボーンをベースとしたレコメンデーションモデルが示す不公平さの程度について検討し,LLM ベースのレコメンデーションモデルにおけるユーザの公平な扱いを促進するための適切な方法について議論する。フェアネスを意識したLLMレコメンデーションのための新しいCFP法をUnbiased Foundation mOdels(UFO)に導入する。実世界の2つのデータセットであるMovieLens-1MとInsurationで実験を行い、マッチングベースとシーケンシャルベースの両方のフェアネス対応レコメンデーションモデルと比較した。その結果,CFPは高い公正度でより優れたレコメンデーション性能が得られることがわかった。データとコードはhttps://github.com/agiresearch/UP5.comで公開されている。 Recent advances in Foundation Models such as Large Language Models (LLMs) have propelled them to the forefront of Recommender Systems (RS). Despite their utility, there is a growing concern that LLMs might inadvertently perpetuate societal stereotypes, resulting in unfair recommendations. Since fairness is critical for RS as many users take it for decision-making and demand fulfillment, this paper focuses on user-side fairness for LLM-based recommendation where the users may require a recommender system to be fair on specific sensitive features such as gender or age. In this paper, we dive into the extent of unfairness exhibited by LLM-based recommender models based on both T5 and LLaMA backbones, and discuss appropriate methods for promoting equitable treatment of users in LLM-based recommendation models. We introduce a novel Counterfactually-Fair-Prompt (CFP) method towards Unbiased Foundation mOdels (UFO) for fairness-aware LLM-based recommendation. Experiments are conducted on two real-world datasets, MovieLens-1M and Insurance, and compared with both matching-based and sequential-based fairness-aware recommendation models. Results show that CFP achieves better recommendation performance with a high level of fairness. Data and code are open-sourced at https://github.com/agiresearch/UP5.	翻訳日:2024-05-31 02:41:05 公開日:2024-05-29
# InstructVid2Vid:自然言語による制御可能なビデオ編集 InstructVid2Vid: Controllable Video Editing with Natural Language Instructions ( http://arxiv.org/abs/2305.12328v2 ) ライセンス: Link先を確認	Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang,	(参考訳) InstructVid2Vidは人間の言語指導による映像編集のためのエンドツーエンド拡散方式である。我々のアプローチは、自然言語ディレクティブによって案内される映像操作を強化し、サンプルごとの微調整や逆変換の必要性を排除します。提案したInstructVid2Vidモデルは、予め訓練された画像生成モデルであるStable Diffusionを変更して、ビデオフレームの時間依存シーケンスを生成する。異なるモデルの集合的インテリジェンスを活用することで、私たちは、実世界のシナリオでデータを収集するよりコスト効率の良い代替手段として、ビデオインストラクション三脚に富んだトレーニングデータセットを構築しました。生成したビデオ内の連続したフレーム間のコヒーレンスを高めるために、フレーム間一貫性損失を提案し、トレーニングプロセス中にそれを組み込む。推論段階におけるマルチモーダル分類器フリーガイダンスにより、生成されたビデオは、入力されたビデオと付随する命令の両方に共鳴することができる。実験結果から,InstructVid2Vidは高品質で時間的コヒーレントなビデオを生成し,属性編集や背景変更,スタイル転送などの多様な編集を行うことができることがわかった。これらの結果は,提案手法の汎用性と有効性を示すものである。 We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion. The proposed InstructVid2Vid model modifies a pretrained image generation model, Stable Diffusion, to generate a time-dependent sequence of video frames. By harnessing the collective intelligence of disparate models, we engineer a training dataset rich in video-instruction triplets, which is a more cost-efficient alternative to collecting data in real-world scenarios. To enhance the coherence between successive frames within the generated videos, we propose the Inter-Frames Consistency Loss and incorporate it during the training process. With multimodal classifier-free guidance during the inference stage, the generated videos is able to resonate with both the input video and the accompanying instructions. Experimental results demonstrate that InstructVid2Vid is capable of generating high-quality, temporally coherent videos and performing diverse edits, including attribute editing, background changes, and style transfer. These results underscore the versatility and effectiveness of our proposed method.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# ランジュバンモンテカルロアルゴリズムによる非対数・非平滑サンプリング Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms ( http://arxiv.org/abs/2305.15988v2 ) ライセンス: Link先を確認	Tim Tsz-Kit Lau, Han Liu, Thomas Pock,	(参考訳) マルチモーダル性のため,低次元でもしばしば困難であるガウス混合系(例えば,ガウス混合系)からの近似サンプリング問題について検討する。我々はマルコフ連鎖モンテカルロ法(MCMC)を用いてこの課題を実行することに集中する。さらに, 近位MCMC法が開発されている2つの非平滑症例にも関心がある。 (i)非滑らかな前者は、ガウス混合とみなす。 (II)ラプラシアン混合分布このような非滑らかで非log-concaveサンプリングタスクは、ベイズ推論や画像のデコンボリューションのような逆問題の画像化への幅広い応用から生じる。我々は,最もよく用いられるLangevin Monte Carloアルゴリズムの性能を比較するために数値シミュレーションを行う。 We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases for which a large class of proximal MCMC methods have been developed: (i) a nonsmooth prior is considered with a Gaussian mixture likelihood; (ii) a Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling tasks arise from a wide range of applications to Bayesian inference and imaging inverse problems such as image deconvolution. We perform numerical simulations to compare the performance of most commonly used Langevin Monte Carlo algorithms.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# ベイズ原理によるニューラル付加モデルの改善 Improving Neural Additive Models with Bayesian Principles ( http://arxiv.org/abs/2305.16905v4 ) ライセンス: Link先を確認	Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin,	(参考訳) ニューラル加算モデル(NAM)は、個別の加算サブネットワークにおける入力特徴を扱うことにより、ディープニューラルネットワークの透明性を高める。しかし、それらは校正された不確実性を提供し、関連する特徴や相互作用の選択を可能にする固有のメカニズムを欠いている。ベイズの観点から NAM にアプローチすることで、我々はこれらを3つの主要な方法で拡張する。 a) 個別の付加的部分ネットワークに信頼性のある期間を提供することロ経験的ベイズ手続により特徴の暗黙的な選択を行うための限界確率を推定すること。 c) 微調整されたモデルにおける二階相互作用の候補としての機能対のランク付けを容易にすること。特にLaplace-approximated NAMs (LA-NAMs) を開発した。 Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 自己教師付き学習のための行列情報理論 Matrix Information Theory for Self-Supervised Learning ( http://arxiv.org/abs/2305.17326v6 ) ライセンス: Link先を確認	Yifan Zhang, Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan,	(参考訳) 最大エントロピー符号化フレームワークは、SimSiam、Barlow Twins、MECといった多くの非コントラスト学習手法に対して統一的な視点を提供する。このフレームワークに着想を得たMatrix-SSLは,行列情報理論を利用して最大エントロピー符号化損失を行列均一性損失として解釈する手法である。さらに、Matrix-SSLは、行列アライメント損失をシームレスに取り込み、異なる分岐に共分散行列を直接アライメントすることで、最大エントロピー符号化法を強化する。実験結果から, Matrix-SSLは, 線形評価条件下でのImageNetデータセットや, 伝達学習タスクのためのMS-COCO上で, 最先端の手法よりも優れていることがわかった。具体的には,MS-COCO上で伝達学習を行う場合,MoCo v2やBYOLといった従来のSOTA手法よりも3.3%向上し,800エポックの事前学習に比べて400エポックに留まった。また,行列クロスエントロピー損失を用いた7Bモデルを微調整し,標準クロスエントロピー損失に対するGSM8Kデータセットのマージンを3.1%とすることで,表現学習を言語モデリングシステムに導入する。コードはhttps://github.com/yifanzhang-pro/Matrix-SSLで公開されている。 The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# フローガイド型ナノスケールローカライゼーションの設計空間の展望 Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization ( http://arxiv.org/abs/2305.18493v2 ) ライセンス: Link先を確認	Filip Lemic, Gerard Calvo Bartra, Arnau Brosa López, Jorge Torres Gómez, Jakob Struye, Falko Dressler, Sergi Abadal, Xavier Costa Perez,	(参考訳) Terahertz(THz)をベースとした無線通信機能を備えたナノデバイスは、ヒトの血流内におけるフロー誘導局在のプライマーを提供する。このようなローカライゼーションは、知覚された事象の場所をイベント自体に割り当てることを可能にし、早期かつ正確な診断の線に沿って精度の高い医療の恩恵を与え、コストと侵襲性を低減させる。フロー誘導型ローカライゼーションはまだ初歩的な段階であり、この問題を対象とする研究はごくわずかである。それにもかかわらず、提案手法の性能評価は、通常、単一の性能指標に沿って、そのようなスケール(例えば、ナノデバイスの限られたエネルギー)と、そのような困難な環境(例えば、体内のTHz伝搬の極端減衰)で関係する様々な側面を無視する非標準化方法で既に実施されている。このように、これらの評価は現実主義のレベルが低く、客観的に比較することはできない。この問題に対処するために、我々はシナリオの環境およびスケールに関連する特質を説明し、その精度や信頼性などの不均一なパフォーマンス指標に沿って、最先端のフロー誘導型ローカライゼーションアプローチの2つの性能を評価する。 Nanodevices with Terahertz (THz)-based wireless communication capabilities are providing a primer for flow-guided localization within the human bloodstreams. Such localization is allowing for assigning the locations of sensed events with the events themselves, providing benefits in precision medicine along the lines of early and precise diagnostics, and reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric, and ignoring various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and for such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared in an objective way. Toward addressing this issue, we account for the environmental and scale-related peculiarities of the scenario and assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics such as the accuracy and reliability of localization.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# Vocos: 高品質音声合成のための時間領域とフーリエベースニューラルボコーダのギャップを埋める Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis ( http://arxiv.org/abs/2306.00814v3 ) ライセンス: Link先を確認	Hubert Siuzdak,	(参考訳) ニューラルヴォコーディングの最近の進歩は、主に時間領域で動作するジェネレーティブ・アドバイサル・ネットワーク(GAN)によって駆動される。このアプローチは有効であるが、時間周波数表現による帰納バイアスを無視し、再帰的かつ計算集約的なアップサンプリング操作をもたらす。フーリエに基づく時間周波数表現は、人間の聴覚知覚とより正確に一致し、その計算のために確立された高速アルゴリズムの恩恵を受ける、魅力的な代替手段である。それでも、複雑な値を持つ分光器の直接再構成は歴史的に問題であり、主に位相回復の問題が原因である。本研究は、フーリエスペクトル係数を直接生成する新しいモデルであるVocosを提示することで、このギャップを埋めようとしている。我々の評価で示されているように、Vocosは音質の最先端に適合するだけでなく、計算効率も大幅に向上し、時間-ドメインのニューラル・ヴォコーディング・アプローチに比べて処理速度が大幅に向上する。ソースコードとモデルの重み付けはhttps://github.com/gemelo-ai/vocos.comでオープンソース化された。 Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time-domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in reduntant and computionally-intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefitting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that directly generates Fourier spectral coefficients. Vocos not only matches the state-of-the-art in audio quality, as demonstrated in our evaluations, but it also substantially improves computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. The source code and model weights have been open-sourced at https://github.com/gemelo-ai/vocos.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 交流場センサとしてのフロケット時間結晶 Floquet time-crystals as sensors of AC fields ( http://arxiv.org/abs/2306.03927v4 ) ライセンス: Link先を確認	Fernando Iemini, Rosario Fazio, Anna Sanpera,	(参考訳) 離散時間結晶で示される長距離空間秩序と時間秩序は、極端に弱い信号を感知する際に有利な特性となる。本稿では、弱いACフィールドの量子センサとしての性能について検討し、量子フィッシャー情報測定を用いて、長時間の尋問を可能としながら、ショットノイズ限界を克服できることを実証する。このようなシステムでは、集団間相互作用はノイズに対してその力学を安定させ、不完全をプロトコル化するのに十分堅牢である。 The long range spatial and temporal ordering displayed by discrete time crystals, can become advantageous properties when used for sensing extremely weak signals. Here, we investigate their performance as quantum sensors of weak AC-fields and demonstrate, using the quantum Fisher information measure, that they can overcome the shot noise limit while allowing long interrogation times. In such systems, collective interactions stabilize their dynamics against noise making them robust enough to protocol imperfections.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 確率的崩壊: より単純なサブネットに向けたSGDダイナミクスのグラディエントノイズの抽出方法 Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks ( http://arxiv.org/abs/2306.04251v3 ) ライセンス: Link先を確認	Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli,	(参考訳) 本研究では,より単純なサブネットワークに過度に表現的ネットワークを駆動する確率勾配降下(SGD)の強い暗黙バイアスを明らかにし,独立パラメータの数を劇的に削減し,一般化を改善する。このバイアスを明らかにするために、SGD によって修正されないパラメータ空間の不変集合や部分集合を同定する。我々は、より単純な(スパースまたはローランクの)サブネットワークに対応する不変集合の2つのクラスに焦点を合わせ、モダンアーキテクチャに一般的に現れる。解析により、SGDはこれらの単純不変集合に対する確率的誘引性の性質を示すことが明らかとなった。我々は,不変量集合の周囲のロスランドスケープの曲率と,確率勾配によってもたらされる雑音との競合に基づいて,確率的誘引性の十分な条件を確立する。顕著なことに、ノイズのレベルが増大すると魅力が増し、サドルポイントや列車損失の局所的な最大値に関連する魅力的な不変集合が出現する。我々は、訓練されたディープニューラルネットワークにおける魅力的な不変集合の存在を経験的に観察し、SGDのダイナミクスがしばしば消滅または冗長なニューロンを持つ単純なサブネットに崩壊することを示す。さらに、この確率的崩壊の単純化プロセスが、線形教師学生フレームワークの一般化にどう役立つかを実証する。最後に、この分析を通じて、長期にわたる学習率の高い早期学習が、その後の一般化に有効である理由を機械的に説明する。 In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler (sparse or low-rank) subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# Decom--CAM: tell me you see, in details! Feature-Level Interpretation via Decomposition Class Activation Map Decom--CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map ( http://arxiv.org/abs/2306.04644v2 ) ライセンス: Link先を確認	Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Juan Zhang, Xuan Gong, Baochang Zhang,	(参考訳) ディープラーニングの解釈は非常に難しい問題です。クラスアクティベーションマップ(CAM)は、オブジェクトの位置を強調することによって深層モデルの予測を解釈するために広く使われているが、決定を行うためにモデルが使用する健全な機能についての洞察を得られていない。さらに、既存の評価プロトコルは、解釈可能性のパフォーマンスとモデルの判断品質の相関を見落とし、より根本的な問題を提示します。本稿では,2段階の解法である分解クラス活性化マップ(Decom-CAM)を提案する。 Decom-CAMは、特異値分解を用いて中間活性化写像を直交的特徴に分解し、それらの積分により塩分マップを生成する。特徴の直交性により、CAMは局所的な特徴を捉え、入力画像の目、鼻、顔などの意味的要素を特定できるため、深いモデル解釈にとってより有益である。包括的比較を保証するため、分類精度の結果に基づいてデータセットをサブセットに分割し、各サブセットの解釈可能性性能を別々に評価することで、新しい評価プロトコルを導入する。以上の結果から,Decom-CAMは,すべてのレベルの分類精度において,より高精度な精度マップを生成することにより,最先端の手法を著しく上回ることを示す。機能レベルの解釈可能性のアプローチと組み合わせることで、ディープニューラルネットワークの意思決定プロセスを理解するための新たな方向性の道を開くことができる。 Interpretation of deep learning remains a very challenging problem. Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions. Furthermore, existing evaluation protocols often overlook the correlation between interpretability performance and the model's decision quality, which presents a more fundamental issue. This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM), which offers a feature-level interpretation of the model's prediction. Decom-CAM decomposes intermediate activation maps into orthogonal features using singular value decomposition and generates saliency maps by integrating them. The orthogonality of features enables CAM to capture local features and can be used to pinpoint semantic components such as eyes, noses, and faces in the input image, making it more beneficial for deep model interpretation. To ensure a comprehensive comparison, we introduce a new evaluation protocol by dividing the dataset into subsets based on classification accuracy results and evaluating the interpretability performance on each subset separately. Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly by generating more precise saliency maps across all levels of classification accuracy. Combined with our feature-level interpretability approach, this paper could pave the way for a new direction for understanding the decision-making process of deep neural networks.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 量子ビット準備・測定シナリオにおける半対称情報完全測定の自己検査 Self-testing of semisymmetric informationally complete measurements in a qubit prepare-and-measure scenario ( http://arxiv.org/abs/2306.07248v4 ) ライセンス: Link先を確認	Gábor Drótos, Károly F. Pál, Tamás Vértesi,	(参考訳) 自己検査は量子システムを認証するための強力な方法である。当初、デバイス非依存(DI)設定で提案されていたセルフテストは、セミデバイス非依存(セミDI)設定に緩和された。本研究では,1パラメータファミリーに属する特定の種類の非射影量子ビット測定を,半DI準備・測定(PM)シナリオを用いて自己検査することに焦点を当てた。興味深いことに,これまでに発見された最も単純なPMシナリオは,4つの準備と4つの測定のみで,第4の測定を自己検査するためのものである。この測定は 4-アウトカムな非射影作用素評価測度 (POVM) であり、Geng et al [Phys. Rev. Lett. 126, 100401 (2021)] によって導入された半対称情報完備(半SIC) POVM のクラスに該当する。そこで我々は,PMシナリオにおけるセミDI自己検査の分析手法を開発した。我々の結果は、潜在的に最小限の PM シナリオ内で超極小の qubit POVM を自己テストする方法を開拓する。 Self-testing is a powerful method for certifying quantum systems. Initially proposed in the device-independent (DI) setting, self-testing has since been relaxed to the semi-device-independent (semi-DI) setting. In this study, we focus on the self-testing of a specific type of non-projective qubit measurements belonging to a one-parameter family, using the semi-DI prepare-and-measure (PM) scenario. Remarkably, we identify the simplest PM scenario discovered so far, involving only four preparations and four measurements, for self-testing the fourth measurement. This particular measurement is a four-outcome non-projective positive operator-valued measure (POVM) and falls in the class of semisymmetric informationally complete (semi-SIC) POVMs introduced by Geng et al. [Phys. Rev. Lett. 126, 100401 (2021)]. To achieve this, we develop analytical techniques for semi-DI self-testing in the PM scenario. Our results shall pave the way towards self-testing any extremal qubit POVM within a potentially minimal PM scenario.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# ニューラルサーフェスレンダリングによるクリーニングシーンにおけるAny-View 6DoFロボットグラスピングの学習 Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering ( http://arxiv.org/abs/2306.07392v4 ) ライセンス: Link先を確認	Snehal Jauhri, Ishikaa Lunawat, Georgia Chalvatzaki,	(参考訳) 現実世界のロボット操作において重要な課題は、追加のシーン探索を必要とせずに、あらゆる視点から散らばったシーンのオブジェクトを効果的につかむ6DoFである。この研究は、グルーピングをレンダリングとして再解釈し、6DoFグルーピング検出のための新しい方法であるNeuGraspNetを導入し、ニューラルボリューム表現とサーフェスレンダリングの進歩を活用する。ロボットのエンドエフェクタと物体の表面との相互作用を符号化し、共同学習により局所的な物体表面をレンダリングし、共有特徴空間における把握機能を学習する。このアプローチでは、グローバルな(シーンレベルの)特徴をつかむために、局所的な(グラフレベルの)神経表面の特徴をつかむために使用します。これにより、部分的に観察されたシーンであっても、有効で完全に暗黙的な6DoFのクオリティ予測が可能になる。 NeuGraspNetは、モバイル操作のシナリオに共通するランダムな視点で動作し、既存の暗黙的および半単純的把握方法より優れている。この手法の現実的な適用性は、オープンで散らばった空間をつかむ移動マニピュレータロボットで実証されている。 Project website at https://sites.google.com/view/neugraspnet A significant challenge for real-world robotic manipulation is the effective 6DoF grasping of objects in cluttered scenes from any single viewpoint without the need for additional scene exploration. This work reinterprets grasping as rendering and introduces NeuGraspNet, a novel method for 6DoF grasp detection that leverages advances in neural volumetric representations and surface rendering. It encodes the interaction between a robot's end-effector and an object's surface by jointly learning to render the local object surface and learning grasping functions in a shared feature space. The approach uses global (scene-level) features for grasp generation and local (grasp-level) neural surface features for grasp evaluation. This enables effective, fully implicit 6DoF grasp quality prediction, even in partially observed scenes. NeuGraspNet operates on random viewpoints, common in mobile manipulation scenarios, and outperforms existing implicit and semi-implicit grasping methods. The real-world applicability of the method has been demonstrated with a mobile manipulator robot, grasping in open, cluttered spaces. Project website at https://sites.google.com/view/neugraspnet	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# DiffAug:ロバスト分類器の訓練のためのディフューズ・アンド・ディネーズ強化 DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers ( http://arxiv.org/abs/2306.09192v2 ) ライセンス: Link先を確認	Chandramouli Sastry, Sri Harsha Dumpala, Sageev Oore,	(参考訳) DiffAugは、画像分類器を訓練するための、単純で効率的な拡散に基づく拡張手法である。与えられた例にDiffAugを適用すると、1つの前方拡散ステップと1つの逆拡散ステップからなる。 ResNet-50アーキテクチャとVision Transformerアーキテクチャの両方を用いて、DiffAugで訓練された分類器を網羅的に評価し、コバリアレートシフトに対するロバスト性の向上、検証された逆精度、および分布検出における単一ステップ逆拡散の驚くべき効果を実証する。 DiffAugをAugMixやDeepAugmentのような他の拡張と組み合わせると、さらなる堅牢性の向上が示されます。最後に、このアプローチに基づいて分類器誘導拡散を改善する。 (i)分類器の一般化 (二)勾配品質(知覚アライメントの改善)、及び (iii)画像生成性能。そこで本稿では,新たなデータを必要としない頑健さを向上し,既存の拡張アプローチを効果的に補完する,計算効率のよいトレーニング手法を提案する。 We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step. Using both ResNet-50 and Vision Transformer architectures, we comprehensively evaluate classifiers trained with DiffAug and demonstrate the surprising effectiveness of single-step reverse diffusion in improving robustness to covariate shifts, certified adversarial accuracy and out of distribution detection. When we combine DiffAug with other augmentations such as AugMix and DeepAugment we demonstrate further improved robustness. Finally, building on this approach, we also improve classifier-guided diffusion wherein we observe improvements in: (i) classifier-generalization, (ii) gradient quality (i.e., improved perceptual alignment) and (iii) image generation performance. We thus introduce a computationally efficient technique for training with improved robustness that does not require any additional data, and effectively complements existing augmentation approaches.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 非対称初期状態からの電荷ゆらぎのダイナミクス Dynamics of charge fluctuations from asymmetric initial states ( http://arxiv.org/abs/2306.12404v3 ) ライセンス: Link先を確認	Bruno Bertini, Katja Klobas, Mario Collura, Pasquale Calabrese, Colin Rylands,	(参考訳) 保存電荷密度は、量子多体系において非常に特殊な観測可能量であり、建設によって力学に関する情報を符号化する。したがって、それらの進化は一般的な観測可能なものよりもはるかに単純な解釈であり、任意の時間にシステムの状態に関する普遍的な情報を返すことが期待されている。ここでは、電荷非対称初期状態で準備された系における保存されたU(1)電荷のゆらぎのダイナミクスについて検討する。与えられたサブシステムの電荷ゆらぎを、切り刻まれた電荷のフルカウント統計と、電荷の対称性セクターに解決されたサブシステムと残りの部分の間の量子的絡み合いを用いて特徴づける。初期状態が空間において均質であるとしても、電荷揺らぎは初期状態の電荷非対称性に起因する有効不均一性を生成することを示す。この観測により、この問題を不均一な電荷対称状態上の電荷ゆらぎにマッピングし、最近開発された時空双対性アプローチを用いてそれを扱う。相互作用可能なシステムに対する処理を専門にすることで、時空双対性アプローチと一般化された流体力学を組み合わせて明確な予測を求める。 Conserved-charge densities are very special observables in quantum many-body systems as, by construction, they encode information about the dynamics. Therefore, their evolution is expected to be of much simpler interpretation than that of generic observables and to return universal information on the state of the system at any given time. Here we study the dynamics of the fluctuations of conserved U(1) charges in systems that are prepared in charge-asymmetric initial states. We characterise the charge fluctuations in a given subsystem using the full-counting statistics of the truncated charge and the quantum entanglement between the subsystem and the rest resolved to the symmetry sectors of the charge. We show that, even though the initial states considered are homogeneous in space, the charge fluctuations generate an effective inhomogeneity due to the charge-asymmetric nature of the initial states. We use this observation to map the problem into that of charge fluctuations on inhomogeneous, charge-symmetric states and treat it using a recently developed space-time duality approach. Specialising the treatment to interacting integrable systems we combine the space-time duality approach with generalised hydrodynamics to find explicit predictions.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 非エルミート型低温原子ポンプによるき裂線形応答 Kinked linear response from non-Hermitian cold-atom pumping ( http://arxiv.org/abs/2306.13139v2 ) ライセンス: Link先を確認	Fang Qin, Ruizhe Shen, Linhu Li, Ching Hua Lee,	(参考訳) 非エルミート系と非エルミート系が指数関数的に局所化された皮膚モードを持つことはよく知られている。しかし,本研究では, 急激な物理的衝動の欠如にもかかわらず, 非ハーモニティ性は量子気体の半古典波パケット軌道において, 急激で顕著なキツネを生じさせることがわかった。これは、すべての物理的カップリングが局所的であっても、非エルミートポンピングから過小評価された内在的非局所性に由来するため、不連続なバンド幾何学やベリー曲率につながるバンド構造における謎的な特異性をもたらす。具体的には,超低温原子配置におけるキンク応答の実現に焦点をあてる。実測実験では, 物理原子雲の微調整をせずに応答キンクを観察できるようなレーザー誘起損失を有する2次元光学格子の超低温原子配置を提案する。以上の結果から,非エルミチアン励起による非エルミチアン励起による特異な非単調な挙動が示され,超低温原子プラットフォーム上での非エルミチアン動力学研究の新たな道筋が示唆された。 It is well known that non-Hermitian, non-reciprocal systems may harbor exponentially localized skin modes. However, in this work, we find that, generically, non-Hermiticity gives rise to abrupt and prominent kinks in the semi-classical wave packet trajectories of quantum gases, despite the absence of sudden physical impulses. This physically stems from a hitherto underappreciated intrinsic non-locality from non-Hermitian pumping, even if all physical couplings are local, thereby resulting in enigmatic singularities in the band structure that lead to discontinuous band geometry and Berry curvature. Specifically, we focus on the realization of the kinked response in an ultracold atomic setup. For a concrete experimental demonstration, we propose an ultracold atomic setup in a two-dimensional optical lattice with laser-induced loss such that response kinks can be observed without fine-tuning in the physical atomic cloud dynamics. Our results showcase unique non-monotonic behavior from non-Hermitian pumping beyond the non-Hermitian skin effect and suggest new avenues for investigating non-Hermitian dynamics on ultracold atomic platforms.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# 高励起状態の断熱時間進化 Adiabatic time evolution of highly excited states ( http://arxiv.org/abs/2306.13967v3 ) ライセンス: Link先を確認	Hadi Yarloo, Hua-Chen Zhang, Anne E. B. Nielsen,	(参考訳) 量子システムの断熱時間進化は、状態の準備から計算の単純化、トポロジカル変換、最適化や量子コンピューティングに至るまで、広く使われているツールである。断熱時間進化は一般にギャップのある基底状態に対してうまく機能するが、保護エネルギーギャップが欠如しているスペクトルの中央の熱状態に対してはうまく機能しない。ここでは、特定の種類の高励起状態である量子多体傷が、エネルギーギャップの保護がないにもかかわらず断熱的な時間進化に適していることが示される。テンソルネットワークと2次元分数的量子ホールモデルから構築された2つのかなり異なるモデルを考えると、必要な最終断熱忠実度が約0.99のとき、量子不足は断熱力学に関してギャップ付き基底状態と類似する。 1次元モデルの傷痕状態が断熱的に変換できる最大速度は、一般的な熱と障害駆動の局所化状態の両方に対して指数関数的に減少するのに対し、システムサイズによるパワー則として減少する。傾斜速度が一定で非常に低い場合、一元性からの距離のずれはスカー状態のランプ速度と線形にスケールするが、ギャップのある地盤状態の2次的な変化は見いだされる。したがって、0.9999以上のような、必要となる断熱係数が非常に高い場合には、ギャップのある基底状態がより良く動作する。傷跡から漏れ出す2つのメカニズムを特定し,その結果を説明する。単一で孤立した基底状態の操作は量子的応用では一般的であるが、傷跡状態の断熱的進化は、単一のシステムで同時に基底状態のような状態の塔全体を操作できる柔軟性を提供する。 Adiabatic time evolution of quantum systems is a widely used tool with applications ranging from state preparation through simplifications of computations and topological transformations to optimization and quantum computing. Adiabatic time evolution generally works well for gapped ground states, but not for thermal states in the middle of the spectrum that lack a protecting energy gap. Here we show that quantum many-body scars -- a particular type of highly excited states -- are suitable for adiabatic time evolution despite the absence of a protecting energy gap. Considering two rather different models, namely a one-dimensional model constructed from tensor networks and a two-dimensional fractional quantum Hall model with anyons, we find that the quantum scars perform similarly to gapped ground states with respect to adiabatic dynamics when the required final adiabatic fidelity is around 0.99. The maximum speed at which the scar state of the one-dimensional model can be adiabatically transformed decreases as a power law with system size, as opposed to exponentially for both generic thermal and disorder-driven localized states. At constant and very low ramp speed, we find that the deviation of the fidelity from unity scales linearly with ramp speed for scar states, but quadratically for gapped ground states. The gapped ground states hence perform better when the required adiabatic fidelities are very high, such as 0.9999 and above. We identify two mechanisms for leakage out of the scar state and use them to explain our results. While manipulating a single, isolated ground state is common in quantum applications, adiabatic evolution of scar states provides the flexibility to manipulate an entire tower of ground-state-like states simultaneously in a single system.	翻訳日:2024-05-31 02:31:12 公開日:2024-05-29
# ノイズの多い量子木:補正なしの無限保護 Noisy Quantum Trees: Infinite Protection Without Correction ( http://arxiv.org/abs/2306.14294v3 ) ライセンス: Link先を確認	Shiv Akshar Yadavalli, Iman Marvian,	(参考訳) 我々は,木構造を持つ量子ネットワークについて研究し,そこでは根から葉まで情報を伝達する。ネットワーク内の各ノードにおいて、受信されたキュービットは、新しいアンシラ量子ビットと一元的に相互作用し、その後、各キュービットはノイズチャネルを介して次のレベルで別のノードに送信される。したがって,木深度が大きくなるにつれて,情報の非局在化によって達成されるノイズの可逆効果と,そのようなノイズに対する保護との競合が生じる。古典的な設定では、各ノードが入力ビットを複数の出力ビットにコピーするだけで、このモデルは広範に応用できる木上の放送や再構成の問題として研究されている。本研究では,この問題の量子バージョンについて検討する。本稿では,各ノードにおけるCliffordエンコーダについて検討し,各エッジに単一キュービットのPauliノイズチャネルとともに,入力キュービットを安定化器コードに符号化する。このようなノイズの多い量子木は、新鮮な(低エントロピー)アンシラ量子ビットのストリームにアクセスすることができるが、誤り訂正を行うことができないシナリオを記述している。したがって、それらは量子的フォールトトレランスに関して異なる視点を提供する。さらに、連結符号のエンコーダ内のノイズの影響を記述するための有用なモデルを提供する。距離やエンコーダの特性といった符号の性質に依存する特定の雑音閾値を超えると、情報は木の深さとともに指数関数的に減衰する。一方、ある効率的な復号器の研究により、距離d>=2の符号と十分小さい(しかしゼロでない)雑音に対して、古典的な情報と絡み合いは無限の深さの雑音木に伝播することを示した。実際、これは、各ノードに特定の2-qubitエンコーダを持つバイナリツリーでさえも当てはまり、受信したキュービットは、距離 d=1 のバイナリ反復符号で符号化される。 We study quantum networks with tree structures, in which information propagates from a root to leaves. At each node in the network, the received qubit unitarily interacts with fresh ancilla qubits, after which each qubit is sent through a noisy channel to a different node in the next level. Therefore, as the tree depth grows, there is a competition between the irreversible effect of noise and the protection against such noise achieved by delocalization of information. In the classical setting, where each node simply copies the input bit into multiple output bits, this model has been studied as the broadcasting or reconstruction problem on trees, which has broad applications. In this work, we study the quantum version of this problem. We consider a Clifford encoder at each node that encodes the input qubit in a stabilizer code, along with a single qubit Pauli noise channel at each edge. Such noisy quantum trees describe a scenario in which one has access to a stream of fresh (low-entropy) ancilla qubits, but cannot perform error correction. Therefore, they provide a different perspective on quantum fault tolerance. Furthermore, they provide a useful model for describing the effect of noise within the encoders of concatenated codes. We prove that above certain noise thresholds, which depend on the properties of the code such as its distance, as well as the properties of the encoder, information decays exponentially with the depth of the tree. On the other hand, by studying certain efficient decoders, we prove that for codes with distance d>=2 and for sufficiently small (but non-zero) noise, classical information and entanglement propagate over a noisy tree with infinite depth. Indeed, we find that this remains true even for binary trees with certain 2-qubit encoders at each node, which encode the received qubit in the binary repetition code with distance d=1.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# テキストアノテーションのためのオープンソースLCM:モデル設定と微調整のための実践的ガイド Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning ( http://arxiv.org/abs/2307.02179v2 ) ライセンス: Link先を確認	Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Mohammadmasiha Zahedivafa, Juan Diego Bermeo, Maria Korobeynikova, Fabrizio Gilardi,	(参考訳) 本稿では、政治科学研究に典型的なテキスト分類タスクにおけるオープンソースのLarge Language Models(LLM)の性能について検討する。姿勢・話題・関連分類などの課題を調べることで,テキスト分析におけるLLMの使用に関する情報的判断を学者に指導することを目指す。具体的には、ニュース記事やつぶやきデータセットを用いたテキストアノテーションタスクにおいて、ゼロショットと微調整の両方のLDMの評価を行う。解析の結果、微調整によりオープンソースのLCMの性能が向上し、ゼロショットのGPT-3.5やGPT-4に匹敵する結果が得られるが、微調整のGPT-3.5には遅れが生じる。さらに,注釈付きテキストを比較的控えめな量で微調整を施すことが,少人数の訓練より望ましいことを確認した。この結果から,微調整されたオープンソース LLM は,幅広いテキストアノテーションアプリケーションに効果的に展開できることが示唆された。テキストアノテーションへのLLMの適用を容易にするPythonノートを提供する。 This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# 収束保証を用いたフェアネスを考慮したフェデレーションミニマックス最適化 Fairness-aware Federated Minimax Optimization with Convergence Guarantee ( http://arxiv.org/abs/2307.04417v3 ) ライセンス: Link先を確認	Gerry Windiarto Mohamad Dunda, Shenghui Song,	(参考訳) フェデレートラーニング(FL)はそのプライバシー保護機能のためにかなりの注目を集めている。それでも、ユーザデータ管理の自由の欠如は、モデルが人種や性別などのセンシティブな要因に偏っている、グループフェアネスの問題につながる可能性がある。そこで本研究では,FLにおけるグループフェアネス問題に明示的に対処するために,拡張ラグランジアン法(FFALM)を用いたフェアフェデレーション平均化アルゴリズムを提案する。具体的には、トレーニング目標に公正性制約を課し、制約付き最適化問題の最小化を解消する。次に、FFALMの収束率の理論上界を導出する。 FFALMの公正性向上効果は,CelebA および UTKFace データセットにおいて,統計的に重大な不均一性の存在下で実証的に示された。 Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models are biased towards sensitive factors such as race or gender. To tackle this issue, this paper proposes a novel algorithm, fair federated averaging with augmented Lagrangian method (FFALM), designed explicitly to address group fairness issues in FL. Specifically, we impose a fairness constraint on the training objective and solve the minimax reformulation of the constrained optimization problem. Then, we derive the theoretical upper bound for the convergence rate of FFALM. The effectiveness of FFALM in improving fairness is shown empirically on CelebA and UTKFace datasets in the presence of severe statistical heterogeneity.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# マルチモーダルからモノモーダルリンパ腫サブタイプモデルへの知識伝達のための視覚トランスフォーマーに基づくフレームワーク A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models ( http://arxiv.org/abs/2308.01328v3 ) ライセンス: Link先を確認	Bilel Guetarni, Feryal Windal, Halim Benhabiles, Marianne Petit, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard,	(参考訳) リンパ腫の亜型を決定することは、生存率を高めるために患者治療を改善するための重要なステップである。この文脈では、遺伝子発現技術に依存している既存のゴールドスタンダード診断法は非常に高価で時間を要するため、アクセシビリティが低下する。 IHC(免疫組織化学)技術に基づく代替診断法は(WHOによって推奨されている)存在するが、同様の制限に悩まされており、精度は低い。ディープラーニングモデルを用いた全体スライド画像(WSI)解析は、既存の方法に対する費用対効果と迅速な代替手段を提供する、がん診断の有望な可能性を示している。本研究では,高分解能WSIからDLBCL(Diffuse Large B-Cell Lymphoma)癌サブタイプを識別するための視覚トランスフォーマーベースのフレームワークを提案する。この目的のために、様々なWSIモダリティから分類器モデルを訓練するためのマルチモーダルアーキテクチャを導入する。次に、知識蒸留プロセスを通じてこのモデルを活用し、モノモーダル分類器の学習を効率的に導く。 157例のリンパ腫データセットを用いて検討したところ, モノモーダル分類モデルの有望な性能が得られた。さらに,本実験データから推定したパワーロー曲線から,有意な患者数からのトレーニングデータが増えると,IHC技術との競合診断精度が向上する可能性が示唆された。さらに,外部乳癌データセット(BCIデータセット)に関する追加実験により,本フレームワークの有効性を確認した。 Determining lymphoma subtypes is a crucial step for better patient treatment targeting to potentially increase their survival chances. In this context, the existing gold standard diagnosis method, which relies on gene expression technology, is highly expensive and time-consuming, making it less accessibility. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. Whole Slide Image (WSI) analysis using deep learning models has shown promising potential for cancer diagnosis, that could offer cost-effective and faster alternatives to existing methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we introduce a multi-modal architecture to train a classifier model from various WSI modalities. We then leverage this model through a knowledge distillation process to efficiently guide the learning of a mono-modal classifier. Our experimental study conducted on a lymphoma dataset of 157 patients shows the promising performance of our mono-modal classification model, outperforming six recent state-of-the-art methods. In addition, the power-law curve, estimated on our experimental data, suggests that with more training data from a reasonable number of additional patients, our model could achieve competitive diagnosis accuracy with IHC technologies. Furthermore, the efficiency of our framework is confirmed through an additional experimental study on an external breast cancer dataset (BCI dataset).	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# GPLaSDI:Deep Autoencoderによるガウス過程に基づく解釈可能な遅延空間ダイナミクスの同定 GPLaSDI: Gaussian Process-based Interpretable Latent Space Dynamics Identification through Deep Autoencoder ( http://arxiv.org/abs/2308.05882v3 ) ライセンス: Link先を確認	Christophe Bonneville, Youngsoo Choi, Debojyoti Ghosh, Jonathan L. Belof,	(参考訳) 偏微分方程式(PDE)の数値解法は困難であり、計算コストも高い。これにより、精度は高いがフルオーダーモデル(FOM)よりも高速な低階モデル(ROM)が開発された。近年、機械学習の進歩により、LaSDI(Latent Space Dynamics Identification)のような非線形射影法が作成できるようになった。 LaSDIはオートエンコーダを用いて全階PDEソリューションを潜在空間にマッピングし、潜在空間力学を管理するODEのシステムを学ぶ。縮小潜時空間におけるODEシステムの補間と解法により、予測潜時空間力学をデコーダに供給することにより、高速かつ正確なROM予測を行うことができる。本稿では,遅延空間ODE補間のためのガウス過程(GP)に依存する新しいLaSDIベースのフレームワークであるGPLaSDIを紹介する。 GPを使うことには2つの大きな利点がある。まず、ROM予測に対する不確実性の定量化を可能にする。第二に、この予測の不確実性を活用することで、追加のトレーニングデータポイントの厳選による効率的な適応トレーニングが可能になる。このアプローチは、基礎となるPDEの事前知識を必要としない。したがって、GPLaSDI は本質的に非侵入的であり、既知の PDE やその残余のない問題に適用することができる。本稿では,バーガース方程式,プラズマ物理学におけるブラソフ方程式,熱バブル問題に対する我々のアプローチの有効性を実証する。提案手法は, 最大7%の相対誤差で200～10万倍の高速化を実現する。 Numerically solving partial differential equations (PDEs) can be challenging and computationally expensive. This has led to the development of reduced-order models (ROMs) that are accurate but faster than full order models (FOMs). Recently, machine learning advances have enabled the creation of non-linear projection methods, such as Latent Space Dynamics Identification (LaSDI). LaSDI maps full-order PDE solutions to a latent space using autoencoders and learns the system of ODEs governing the latent space dynamics. By interpolating and solving the ODE system in the reduced latent space, fast and accurate ROM predictions can be made by feeding the predicted latent space dynamics into the decoder. In this paper, we introduce GPLaSDI, a novel LaSDI-based framework that relies on Gaussian process (GP) for latent space ODE interpolations. Using GPs offers two significant advantages. First, it enables the quantification of uncertainty over the ROM predictions. Second, leveraging this prediction uncertainty allows for efficient adaptive training through a greedy selection of additional training data points. This approach does not require prior knowledge of the underlying PDEs. Consequently, GPLaSDI is inherently non-intrusive and can be applied to problems without a known PDE or its residual. We demonstrate the effectiveness of our approach on the Burgers equation, Vlasov equation for plasma physics, and a rising thermal bubble problem. Our proposed method achieves between 200 and 100,000 times speed-up, with up to 7% relative error.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# 勾配支配確率最適化のための均質化手法 A Homogenization Approach for Gradient-Dominated Stochastic Optimization ( http://arxiv.org/abs/2308.10630v3 ) ライセンス: Link先を確認	Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye,	(参考訳) グラディエント優位性は強い凸性よりも弱い条件であるが、非凸最適化においても大域収束を十分に保証する。この特性は、機械学習、強化学習(RL)、および運用管理に広く応用されている。本稿では,最近提案された均質化アプローチに基づく確率関数の確率的等質二階降下法(SHSODM)を提案する。理論的には, サンプルの複雑性解析を行い, さらに分散低減手法を取り入れた拡張結果を示す。以上の結果から,SHSODMは勾配優先確率最適化法において,立方正則化を伴わない他の2次法で達成される最もよく知られたサンプル複雑性と一致した。理論的には、ホモジェナイゼーションアプローチはニュートン型システムではなく、各イテレーションにおいて極端固有ベクトル問題を解くことのみに依存しているため、我々の手法は、不条件問題におけるより安価な計算コストとロバスト性を利用することができる。いくつかのRLタスクに関する数値実験は、SHSODMの他のオフ・ザ・シェルフ法と比較して優れた性能を示す。 Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient dominance property based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis, and further present an enhanced result by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization but without cubic regularization. Empirically, since the homogenization approach only relies on solving extremal eigenvector problem at each iteration instead of Newton-type system, our methods gain the advantage of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# 可視マーカーを用いた無布地変形性表面再構成 Textureless Deformable Surface Reconstruction with Invisible Markers ( http://arxiv.org/abs/2308.13678v2 ) ライセンス: Link先を確認	Xinyuan Li, Yu Guo, Yubei Tu, Yu Ji, Yanchen Liu, Jinwei Ye, Changxi Zheng,	(参考訳) 変形可能な表面をテクスチャをほとんどあるいは全く持たずに再構築・追跡することは、長年にわたる課題となっている。基本的に、課題は、クロスイメージ対応を確立するための特徴を欠いた、テクスチャのない表面から生じている。本研究では, 物体の表面性状を積極的に高め, 3次元表面再構成と対応追跡を容易にする新しい種類のマーカーを提案する。我々のマーカーは蛍光染料でできており、紫外線(UV)の下でのみ可視であり、通常の照明条件下では見えない。マーカーを活用することで,紫外光と可視光の下での表面変形を時間多重的に捉えるマルチカメラシステムを設計する。紫外線の下では、物体のマーカーが表面のテクスチャを豊かにし、高品質な3D形状の再構築と追跡を可能にする。可視光の下では、マーカーは見えなくなり、物体の元々の触れられていない外観を捉えることができます。我々は,手振り,表情,手振り布,物体間相互作用など,さまざまな困難な場面で実験を行った。これらすべてのケースにおいて、我々のシステムは堅牢で高品質な3D再構成と追跡を実現できることを実証する。 Reconstructing and tracking deformable surface with little or no texture has posed long-standing challenges. Fundamentally, the challenges stem from textureless surfaces lacking features for establishing cross-image correspondences. In this work, we present a novel type of markers to proactively enrich the object's surface features, and thereby ease the 3D surface reconstruction and correspondence tracking. Our markers are made of fluorescent dyes, visible only under the ultraviolet (UV) light and invisible under regular lighting condition. Leveraging the markers, we design a multi-camera system that captures surface deformation under the UV light and the visible light in a time multiplexing fashion. Under the UV light, markers on the object emerge to enrich its surface texture, allowing high-quality 3D shape reconstruction and tracking. Under the visible light, markers become invisible, allowing us to capture the object's original untouched appearance. We perform experiments on various challenging scenes, including hand gestures, facial expressions, waving cloth, and hand-object interaction. In all these cases, we demonstrate that our system is able to produce robust, high-quality 3D reconstruction and tracking.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# シャーロック・ホームズは夢を演じない:社会科学と生命科学におけるエビデンス理論の重要性 Sherlock Holmes Doesn't Play Dice: The significance of Evidence Theory for the Social and Life Sciences ( http://arxiv.org/abs/2309.03222v2 ) ライセンス: Link先を確認	V. L. Raju Chinthalapati, Guido Fioretti,	(参考訳) エビデンス理論 (Demster-Shafer Theory, Belief Functions Theory) はデータ融合においてますます使われてきているが、社会科学と生命科学におけるその可能性はしばしば、その特徴に対する認識の欠如によって曖昧になっている。この論文では、事象が成立する恐れから生じる不確実性を、エビデンス理論が表現できると強調する。対照的に、確率論は意思決定者が現在検討している可能性に制限されなければならない。次に,確率論の様々なバージョンに対するベイズの理論と,デンプスター・シェーファーの組合せルールがどのように関連しているかを説明し,情報理論のどの応用をエビデンス理論によって拡張できるかについて議論する。最後に、我々の主張を、監査演習に現れる部分的に重なり合う、部分的に矛盾する解を理解するためにエビデンス理論が用いられる例で説明する。 While Evidence Theory (Demster-Shafer Theory, Belief Functions Theory) is being increasingly used in data fusion, its potentialities in the Social and Life Sciences are often obscured by lack of awareness of its distinctive features. With this paper we stress that Evidence Theory can express the uncertainty deriving from the fear that events may materialize, that one has not been able to figure out. By contrast, Probability Theory must limit itself to the possibilities that a decision-maker is currently envisaging. Subsequently, we illustrate how Dempster-Shafer's combination rule relates to Bayes' Theorem for various versions of Probability Theory and discuss which applications of Information Theory can be enhanced by Evidence Theory. Finally, we illustrate our claims with an example where Evidence Theory is used to make sense of the partially overlapping, partially contradictory solutions that appear in an auditing exercise.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# ハイブリッド量子最適化を用いた分散スケジューリングによる需要側応答のインセンティブ Incentivising Demand Side Response through Discount Scheduling using Hybrid Quantum Optimization ( http://arxiv.org/abs/2309.05502v2 ) ライセンス: Link先を確認	David Bucher, Jonas Nüßlein, Corey O'Meara, Ivan Angelov, Benedikt Wimmer, Kumar Ghosh, Giorgio Cortiana, Claudia Linnhoff-Popien,	(参考訳) デマンドサイド・レスポンス(Demand Side Response, DSR)は、消費者が電気需要の管理に積極的に参加できるようにする戦略である。高需要時のグリッドの歪みを緩和し、(再生可能な)電力資源のよりバランスよく効率的な利用を促進することを目的としている。我々は、ディスカウントスケジューリングを通じてDSRを実装し、地域エネルギー混合がより再生可能エネルギーで構成されている時間に電力消費パターンを調整するために消費者に個別の価格インセンティブを提供する。個々の顧客消費に対する割引を調整するため、ディスカウントスケジューリング問題(DSP)は大規模な組合せ最適化タスクとなる。そこで我々は,D-WaveのLeap Hybrid Cloudを用いたハイブリッド量子コンピューティングアプローチを採用した。従来の混合整数オプティマイザであるGurobiに対して,固定実行時のソリューション品質と,割引割当の公平性の観点から,Leapをベンチマークした。さらに,サブルーチンを動作させる量子コンピュータや古典コンピュータに適用したDSPの大規模分解アルゴリズムを提案する。実世界のデータから生成された合成データを用いて,古典的分解法は,最大3200人の消費者に最適な総合的な「新規解法」品質を与えるが,ハイブリッド量子アプローチは消費者に均等に分散したディスカウントを提供する。 Demand Side Response (DSR) is a strategy that enables consumers to actively participate in managing electricity demand. It aims to alleviate strain on the grid during high demand and promote a more balanced and efficient use of (renewable) electricity resources. We implement DSR through discount scheduling, which involves offering discrete price incentives to consumers to adjust their electricity consumption patterns to times when their local energy mix consists of more renewable energy. Since we tailor the discounts to individual customers' consumption, the Discount Scheduling Problem (DSP) becomes a large combinatorial optimization task. Consequently, we adopt a hybrid quantum computing approach, using D-Wave's Leap Hybrid Cloud. We benchmark Leap against Gurobi, a classical Mixed Integer optimizer in terms of solution quality at fixed runtime and fairness in terms of discount allocation. Furthermore, we propose a large-scale decomposition algorithm/heuristic for the DSP, applied with either quantum or classical computers running the subroutines, which significantly reduces the problem size while maintaining solution quality. Using synthetic data generated from real-world data, we observe that the classical decomposition method obtains the best overall \newp{solution quality for problem sizes up to 3200 consumers, however, the hybrid quantum approach provides more evenly distributed discounts across consumers.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# RGB-Dビデオからの物理に基づく剛体物体追跡と摩擦フィルタ Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos ( http://arxiv.org/abs/2309.15703v2 ) ライセンス: Link先を確認	Rama Krishna Kandukuri, Michael Strecke, Joerg Stueckler,	(参考訳) 感覚観察による物体の相互作用の物理に基づく理解は、拡張現実やロボット工学において必須の能力である。シミュレーションと制御のためにシーンのプロパティをキャプチャすることができる。本稿では,RGB-D画像から剛体物体を3次元で追跡し,物体の物理的特性を推定する,リアル・トゥ・シムのための新しい手法を提案する。我々は,任意のメッシュ形状の接触と摩擦をモデル化できる拡張カルマンフィルタの状態遷移モデルとして,微分可能な物理シミュレーションを用いて,物理的に妥当な軌道を推定する。提案手法は, 位置, 向き, 速度をフィルタし, 同時に物体の摩擦係数を推定できることを実証する。我々は,単一物体と衝突物体の合成画像列における様々なスライディングシナリオに対するアプローチを分析する。また、実世界のデータセットに対する我々のアプローチを実証し、評価する。我々は,この新たな問題設定と手法との比較において,今後の研究を促進するために,新しいベンチマークデータセットを公開している。 Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables to capture the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as state-transition model in an Extended Kalman Filter which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, velocities, and concurrently can estimate the coefficient of friction of the objects. We analyze our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# 従来のNo-Go理論を克服する - 複数のアクセスチャネルにおける量子アドバンテージ Overcoming Traditional No-Go Theorems: Quantum Advantage in Multiple Access Channels ( http://arxiv.org/abs/2309.17263v2 ) ライセンス: Link先を確認	Ananya Chakraborty, Sahil Gopalkrishna Naik, Edwin Peter Lobo, Ram Krishna Patra, Samrat Sen, Mir Alimuddin, Amit Mukherjee, Manik Banik,	(参考訳) マルチノード構成の領域へのポイント・ツー・ポイント通信モデルの拡張は、インターネットや通信ネットワークにおける多くのアプリケーションを見つける。ここでは、Multiple Access Channel(MAC)と呼ばれる、一般的に遭遇するネットワーク構成において、量子通信の新たな利点を確立する。 MACは複数の遠隔送信者で構成され、それぞれのメッセージを共通の受信機に送信する。量子超デンス符号化プロトコルとは異なり、ここで報告される利点は送信者と受信者の絡み合いを引き起こすことなく実現される。特に、そのような利点は、1つの送信者と1つの受信者を含む伝統的なポイント・ツー・ポイントの通信では達成不可能であり、そこではホレヴォとフランケル・ワイナーのノー・ゴー定理の制限が成立する。 MAC設定内では、この独特な利点は、受信機が複数の送信機から受信した量子システムを同時に復号するユニークな能力を通じて実現される。興味深いことに、MAC設計のいくつかは、プゼー=バレット=ルドルフの定理や「絡みのない非局所性」の概念など、もともと全く異なる目的のために探求された量子基礎の様々な構成からインスピレーションを得ている。ネットワーク通信における直接的な応用の他に、提示された量子優位性は「入力なしの量子非局所性」の概念と深いつながりを示唆し、絡み合った測定の半デバイス非依存的な認証の可能性を秘めている。 Extension of point-to-point communication model to the realm of multi-node configurations finds a plethora of applications in internet and telecommunication networks. Here, we establish a novel advantage of quantum communication in a commonly encountered network configuration known as the Multiple Access Channel (MAC). A MAC consists of multiple distant senders aiming to send their respective messages to a common receiver. Unlike the quantum superdense coding protocol, the advantage reported here is realized without invoking entanglement between the senders and the receiver. Notably, such an advantage is unattainable in traditional point-to-point communication involving one sender and one receiver, where the limitations imposed by the Holevo and Frankel Weiner no-go theorems come into play. Within the MAC setup, this distinctive advantage materializes through the receiver's unique ability to simultaneously decode the quantum systems received from multiple senders. Intriguingly, some of our MAC designs draw inspiration from various other constructs in quantum foundations, such as the Pusey-Barrett-Rudolph theorem and the concept of `nonlocality without entanglement', originally explored for entirely different purposes. Beyond its immediate applications in network communication, the presented quantum advantage hints at a profound connection with the concept of `quantum nonlocality without inputs' and holds the potential for semi-device-independent certification of entangled measurements.	翻訳日:2024-05-31 02:21:25 公開日:2024-05-29
# 自己指導型学習における情報の流れ Information Flow in Self-Supervised Learning ( http://arxiv.org/abs/2309.17281v3 ) ライセンス: Link先を確認	Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan, Yifan Zhang,	(参考訳) 本稿では,2つのデュアルブランチ(シームズアーキテクチャ)自己教師型学習アプローチ,すなわちBarlow Twinsとスペクトルコントラスト学習を,行列相互情報のレンズを通して包括的に分析する。これらの手法の損失関数は,行列の相互情報と行列の関節エントロピーの両方を暗黙的に最適化することを証明する。この知見は、特にMAEやU-MAEといった単一ブランチアルゴリズムのカテゴリについて、相互情報と共同エントロピーがエントロピーとなる分野をさらに探求するきっかけとなる。この直感に基づいて,行列ベースのエントロピー推定を正規化器として活用し,U-MAEを特別に仮定する新しい手法であるマトリックス変分マスク付きオートエンコーダ(M-MAE)を導入する。実験的な評価は、M-MAEの有効性を最先端の手法と比較し、線形プローブのViT-Baseが3.9%改善され、微細チューニングのViT-Largeが1%改善された。 In this paper, we conduct a comprehensive analysis of two dual-branch (Siamese architecture) self-supervised learning approaches, namely Barlow Twins and spectral contrastive learning, through the lens of matrix mutual information. We prove that the loss functions of these methods implicitly optimize both matrix mutual information and matrix joint entropy. This insight prompts us to further explore the category of single-branch algorithms, specifically MAE and U-MAE, for which mutual information and joint entropy become the entropy. Building on this intuition, we introduce the Matrix Variational Masked Auto-Encoder (M-MAE), a novel method that leverages the matrix-based estimation of entropy as a regularizer and subsumes U-MAE as a special case. The empirical evaluations underscore the effectiveness of M-MAE compared with the state-of-the-art methods, including a 3.9% improvement in linear probing ViT-Base, and a 1% improvement in fine-tuning ViT-Large, both on ImageNet.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 連続したコントラスト音声言語理解 Continual Contrastive Spoken Language Understanding ( http://arxiv.org/abs/2310.02699v2 ) ライセンス: Link先を確認	Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj,	(参考訳) 近年、ニューラルネットワークは様々な分野において顕著な進歩を見せており、音声処理は例外ではない。しかし、この分野における最近のブレークスルーは、大規模なデータセットと膨大なコンピューティングリソースを使用した広範なオフライントレーニングを必要とする。残念なことに、これらのモデルは、新しいタスクを継続的に学習する際に、以前取得した知識を維持するのに苦労している。本稿では,クラスインクリメンタルラーニング(CIL)設定における音声言語理解のためのシーケンス・ツー・シーケンス学習モデルの問題点を考察し,経験的リプレイとコントラスト学習の組み合わせに依存するCIL手法であるCOCONUTを提案する。 COCONUTは、リハーサルサンプルのみに適用された標準的な教師付きコントラスト損失の修正版を通じて、同じクラスからより近いサンプルを引き出し、他のクラスをプッシュすることで、学習された表現を保存する。さらに,音声とテキストの特徴を整列させることにより,モデルが新たなデータの識別的表現をより学習するのに役立つマルチモーダル・コントラッシブ・ロスを利用する。また, 比較的損失の強さと, 蒸留に使用する教師・学生アーキテクチャを組み合わせるために, 異なるコントラスト的設計について検討した。確立された2つのSLUデータセットに対する実験により,提案手法の有効性とベースラインに対する大幅な改善が示された。また,COCONUTをデコーダ側で動作させるメソッドと組み合わせることで,さらなるメトリクス改善が期待できることを示す。 Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 抗レラクゼーション被覆および緩衝ガス充填アルカリ気相細胞における光蓄積の比較研究 Comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells ( http://arxiv.org/abs/2310.03726v3 ) ライセンス: Link先を確認	Marin Ðujić, D. Buhin, N. Šantić, D. Aumiler, T. Ban,	(参考訳) 熱ルビジウム蒸気中における電磁誘導透過 (EIT) を用いた反ラキセーション被覆および緩衝ガス充填アルカリ気相セルの光蓄積特性の比較検討を行った。バッファーガスを充填したセルを使用することで, 保存時間と効率が, 抗レラクテーション被覆セルに比べて$10倍向上した。この知見は将来の電界展開可能な高性能量子メモリの開発に寄与する。 We perform a comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali-vapor cells using electromagnetically induced transparency (EIT) in warm rubidium vapor. The use of a buffer-gas-filled cell resulted in $\approx$10-fold improvement in storage time and efficiency compared to antirelaxation coated cells. Our findings contribute to the development of future field-deployable high-performance quantum memories.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 主成分分析のための非接触ホテルリングのデフレの誤り伝播について On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis ( http://arxiv.org/abs/2310.04283v2 ) ライセンス: Link先を確認	Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis,	(参考訳) 主成分分析(PCA)は、データセットの分散を最もよく表す、いわゆる主成分によって分散される部分空間を見つけることを目的としている。デフレレーション法は、最も重要でないものから始まり、重要でないものに向かって、個別の主成分を逐次発見する一般的なメタアルゴリズムである。しかし、デフレが進むにつれて、主成分の不正確な推定による数値誤差は、その逐次的性質により伝播する。本稿では,不正確なHotellingのデフレ手法の誤差伝搬を数学的に特徴づける。先頭の固有ベクトルを見つけるためのサブルーチンが抽象的で、様々なアルゴリズムを表現できる場合の$i)$と、サブルーチンとしてパワーイテレーションが使用される場合の$ii)の2つのシナリオを考えます。後者の場合、電力反復による追加方向情報により、サブルーチン非依存の場合よりも厳密な誤差が得られる。どちらのシナリオでも、エラーがどのように進行し、その後の主成分推定に影響を及ぼすかを明確に特徴付ける。 Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the imprecise estimation of principal components propagate due to its sequential nature. This paper mathematically characterizes the error propagation of the inexact Hotelling's deflation method. We consider two scenarios: $i)$ when the sub-routine for finding the leading eigenvector is abstract and can represent various algorithms; and $ii)$ when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than the sub-routine agnostic case. For both scenarios, we explicitly characterize how the errors progress and affect subsequent principal component estimations.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 近似量子誤り訂正符号の複雑さと順序 Complexity and order in approximate quantum error-correcting codes ( http://arxiv.org/abs/2310.04710v2 ) ライセンス: Link先を確認	Jinmin Yi, Weicheng Ye, Daniel Gottesman, Zi-Wen Liu,	(参考訳) 量子回路の複雑度と近似量子誤差補正(AQEC)特性の厳密な関係を確立し,格子系を含む全次元および幾何学的シナリオを網羅する。そこで本研究では,AQECの最適精度と密接に関連しているサブシステム分散(subsystem variance)と呼ばれるコードパラメータについて紹介する。我々の重要な発見は、サブシステムの分散が$O(k/n)$しきい値以下であれば、コードサブ空間の任意の状態は特定の回路の複雑さの低い境界に従わなければならないということです。この結果に基づいて、サブスペース間の境界として$O(k/n)$を提案し、AQECコードとしてカウントすべきではない。 AQECのこの理論は、多体量子系の量子複雑性と秩序を理解するための汎用的なフレームワークを提供し、多体および高エネルギー物理学において顕著な重要性を持つ、特にトポロジカル秩序と臨界量子系の広い物理シナリオに対する新しい洞察を提供する。我々は、大まかに$O(1/n)$は、非自明な量子秩序に関連する特徴に対するサブシステム分散の共通で、物理的に重要な ` `scaling threshold''' を表す、様々な観点から観察する。 We establish rigorous connections between quantum circuit complexity and approximate quantum error correction (AQEC) properties, covering both all-to-all and geometric scenarios including lattice systems. To this end, we introduce a type of code parameter that we call subsystem variance, which is closely related to the optimal AQEC precision. Our key finding is that if the subsystem variance is below an $O(k/n)$ threshold then any state in the code subspace must obey certain circuit complexity lower bounds, which identify nontrivial ``phases'' of codes. Based on our results, we propose $O(k/n)$ as a boundary between subspaces that should and should not count as AQEC codes. This theory of AQEC provides a versatile framework for understanding the quantum complexity and order of many-body quantum systems, offering new insights for wide-ranging physical scenarios, in particular topological order and critical quantum systems which are of outstanding importance in many-body and high energy physics. We observe from various different perspectives that roughly $O(1/n)$ represents a common, physically significant ``scaling threshold'' of subsystem variance for features associated with nontrivial quantum order.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 連続的グローバル最適化に基づくParFam --(ニューラルガイド付き)シンボリック回帰 ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization ( http://arxiv.org/abs/2310.05537v3 ) ライセンス: Link先を確認	Philipp Scholl, Katharina Bieker, Hillary Hauger, Gitta Kutyniok,	(参考訳) 記号回帰(SR)の問題は、物理法則の特定や、与えられたデータから金融市場の振舞いを記述する数学的方程式の導出など、多くの異なる応用で生じる。 SRの問題に対処するためには様々な方法があり、しばしば遺伝的プログラミングに基づいている。しかしながら、これらの手法は通常複雑であり、様々なハイパーパラメータを含む。本稿では,ParFamを用いて離散的記号回帰問題を連続的に変換する手法を提案する。グローバルオプティマイザと組み合わせることで,SR問題に対処するための高効率な手法が提案される。我々はParFamの表現率を理論的に解析し、SRベンチマークスーツSRBenchに基づく広範な数値実験によりその性能を実証し、最先端の結果が得られたことを示す。さらに、ParFamをガイドするために、事前訓練されたトランスフォーマーネットワークDL-ParFamを組み込んだ拡張を行い、最適化プロセスを最大2等級高速化する。私たちのコードと結果はhttps://github.com/Philipp238/parfam.comで確認できます。 The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suit SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two magnitudes. Our code and results can be found at https://github.com/Philipp238/parfam.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# InstructRetro: Retrieval-Augmented Pretrainingのインストラクションチューニング InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining ( http://arxiv.org/abs/2310.07713v3 ) ライセンス: Link先を確認	Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro,	(参考訳) 自動回帰型大言語モデル~(LLM)の検索による事前学習は、外部データベースを活用することにより、より複雑で現実的な正確性を示す。しかし、既存の事前学習によるLLMのサイズは制限されている(例えば、Retroは7.5Bパラメータを持つ)ため、命令チューニングとゼロショットの一般化の有効性が制限されている。本稿では,検索を前提としたLLMとしては最大規模のRetro 48Bを紹介する。具体的には、12兆のトークンから検索することで、Retro拡張法を用いて、さらに1000億のトークンに43BのGPTモデルを事前訓練し続けます。特に、得られた基盤モデルであるRetro 48Bは、1.2TトークンでトレーニングされたGPT 43Bを、わずか2.58%のGPU時間で上回っており、この手法のスケーリング可能性を示している。 Retroでのインストラクションチューニングの後、InstructRetroは幅広いゼロショットタスクにおいて、命令チューニングされたGPTよりも大幅に改善されていることを示す。具体的には、InstructRetroの平均的な改善は、8つの短い形式QAにまたがるGPTよりも7%、長い形式QAに10%、そして3つの要約タスクに16%である。驚いたことに、InstructRetroアーキテクチャからエンコーダを廃止し、デコーダのバックボーンを直接使用でき、同等の結果が得られます。提案手法は, 学習前の検索を継続し, より優れたGPTデコーダを得るための有望な方向を示すものである。私たちのコードとチェックポイントは、https://huggingface.co/nvidia/retro-48b-instruct-4k.comで公開されています。 Pretraining auto-regressive large language models~(LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. Specifically, we continue to pretrain a 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. Notably, the obtained foundation model, Retro 48B, largely outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the significant scaling potential of the method. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA and reading comprehension tasks, 10% over GPT across 4 challenging long-form QA tasks, and 16% over GPT across 3 summarization tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. Our results highlight the promising direction to obtain a better GPT decoder through continued pretraining with retrieval before instruction tuning. Our code and checkpoints are publicly available at: https://huggingface.co/nvidia/retro-48b-instruct-4k.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 量子技術のための自己参照光位相雑音解析器 A self-referenced optical phase noise analyzer for quantum technologies ( http://arxiv.org/abs/2310.08258v2 ) ライセンス: Link先を確認	Robert Freund, Christian D. Marciniak, Thomas Monz,	(参考訳) 第二世代の量子技術は、工学化された量子システムを利用して古典的な代替品より優れていることを目標としている。量子的優位性を実現するために必要なコヒーレンスを維持するには、ホストシステムが被るノイズの詳細な知識と制御が必要である。パワースペクトル密度によるノイズプロセスの特徴付けは、科学や技術を通して日常的に行われ、必要なタスクとなる。例えば、主要な量子技術プラットフォームにおける位相ノイズパワースペクトルを決定することは、多くの位相ノイズアナライザの範囲外か、あるいは違法に高価である。本研究では,量子技術応用のためのコスト効率の高い光位相ノイズアナライザを提示し,特徴付ける。この設定を用いて、729\ \rm{nm}$に近い2つのライン幅のウルトラ安定振動子を比較し、これらを基準として、この測定装置で達成されたノイズフロアを、制限とトレードオフに焦点をあてて決定し、議論する。この実装において達成されたノイズフロアは、低コストで全ストック構成であり、低複雑さの位相ノイズアナライザであり、商用製品と比較して好適である。このセットアップは、多くのコンポーネントメーカーがそうであるように、より安定した参照や運用量子システムをセンサーとして使用せずに、特にアプリケーションを見つけることができる。 Second generation quantum technologies aim to outperform classical alternatives by utilizing engineered quantum systems. Maintaining the coherence required to enable any quantum advantage requires detailed knowledge and control over the noise the hosting system is subjected to. Characterizing noise processes via their power spectral density is routinely done throughout science and technology and can be a demanding task. Determining the phase noise power spectrum in leading quantum technology platforms, for example, can be either outside the reach of many phase noise analyzers, or be prohibitively expensive. In this work, we present and characterize a cost-effective optical phase noise analyzer for quantum technology applications. Using this setup we compare two $\approx1\ \rm{Hz}$ linewidth ultra-stable oscillators near $729\ \rm{nm}$, using them as references to determine and discuss the noise floor achieved in this measurement apparatus with a focus on limitations and their tradeoffs. The achieved noise floor in this implementation of a low-cost, all-stock component, low-complexity phase noise analyzer compares favourably to commercial offerings. This setup can find application in particular without a more stable reference or operational quantum system as sensor as would be the case for many component manufacturers.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 実世界ツインフィールド量子鍵分布における位相ノイズ Phase Noise in Real-World Twin-Field Quantum Key Distribution ( http://arxiv.org/abs/2310.08621v3 ) ライセンス: Link先を確認	Gianluca Bertaina, Cecilia Clivati, Simone Donadello, Carlo Liorni, Alice Meda, Salvatore Virzì, Marco Gramegna, Marco Genovese, Filippo Levi, Davide Calonico, Massimiliano Dispenza, Ivo Pietro Degiovanni,	(参考訳) ツインフィールド量子鍵分布(TF-QKD)プロトコルの現実実装におけるノイズ源の影響について,光子源からの位相雑音に着目して検討した。この研究は、鍵レートの決定におけるレーザー品質、ネットワークトポロジー、繊維長、アームバランス、検出器性能の役割を強調している。注目すべきは、主要なTF-QKDプロトコルが、異なるメカニズムにもかかわらず位相ノイズの影響を受けていることである。本研究は,2つ以上の波長幅のレーザーと位相制御技術によるデューティサイクルの改善を実証し,高精度時間/周波数分布サービスによる潜在的な相乗効果を明らかにする。統合と小型化に向けて進化するウルトラスタブルレーザーは、既存のネットワーク上でのアジャイルTF-QKD実装を約束する。位相ノイズと実用的な制約に適切に対処することで、いくつかの国で開発中の量子通信インフラの安全な長距離リンクを確立するのに不可欠な、一貫した鍵レート予測、プロトコルの選択、レイアウト設計が可能になる。 The impact of noise sources in real-world implementations of Twin-Field Quantum Key Distribution (TF-QKD) protocols is investigated, focusing on phase noise from photon sources and connecting fibers. This work emphasizes the role of laser quality, network topology, fiber length, arm balance, and detector performance in determining key rates. Remarkably, it reveals that the leading TF-QKD protocols are similarly affected by phase noise despite different mechanisms. This study demonstrates duty cycle improvements of over a factor of two through narrow-linewidth lasers and phase-control techniques, highlighting the potential synergy with high-precision time/frequency distribution services. Ultrastable lasers, evolving toward integration and miniaturization, offer promise for agile TF-QKD implementations on existing networks. Properly addressing phase noise and practical constraints allows for consistent key rate predictions, protocol selection, and layout design, crucial for establishing secure long-haul links for the Quantum Communication Infrastructures under development in several countries.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# CLIPによるインクリメンタルオブジェクト検出 Incremental Object Detection with CLIP ( http://arxiv.org/abs/2310.08815v2 ) ライセンス: Link先を確認	Ziyue Huang, Yupeng He, Qingjie Liu, Yunhong Wang,	(参考訳) インクリメンタルな分類タスクとは対照的に、インクリメンタルな検出タスクは、複数の連続学習段階にわたって異なるラベル付き境界ボックスを持つことができるため、データのあいまいさの存在によって特徴付けられる。この現象は、しばしばモデルが新しいクラスを効果的に学習する能力を損なう。しかし、既存の研究はモデルの前方互換性にはあまり注意を払わず、漸進的な学習に適していることを制限している。この障害を克服するために、CLIPのような視覚言語モデルを用いて、異なるクラスセットのテキスト特徴埋め込みを生成することを提案する。次に、段階的なシナリオをシミュレートするために、早期の学習段階において利用できない新しいクラスを置き換えるために、スーパークラスを使用します。最後に、CLIP画像エンコーダを用いて、潜在的なオブジェクトを正確に識別する。そこで我々は,この微妙に認識された検出ボックスを擬似アノテーションとしてトレーニングプロセスに組み込むことにより,検出性能をさらに向上させる。我々は,PASCAL VOC 2007データセットを用いた様々な漸進的な学習環境に対するアプローチを評価し,そのアプローチは,特に新クラスの認識において最先端の手法よりも優れていることを示す。 In contrast to the incremental classification task, the incremental detection task is characterized by the presence of data ambiguity, as an image may have differently labeled bounding boxes across multiple continuous learning stages. This phenomenon often impairs the model's ability to effectively learn new classes. However, existing research has paid less attention to the forward compatibility of the model, which limits its suitability for incremental learning. To overcome this obstacle, we propose leveraging a visual-language model such as CLIP to generate text feature embeddings for different class sets, which enhances the feature space globally. We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario. Finally, we utilize the CLIP image encoder to accurately identify potential objects. We incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving the detection performance. We evaluate our approach on various incremental learning settings using the PASCAL VOC 2007 dataset, and our approach outperforms state-of-the-art methods, particularly for recognizing the new classes.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# ポイントワイド相互情報プロファイルの特性と推定について On the Properties and Estimation of Pointwise Mutual Information Profiles ( http://arxiv.org/abs/2310.10240v2 ) ライセンス: Link先を確認	Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx,	(参考訳) ポイントワイド相互情報プロファイル(ポイントワイド相互情報プロファイル、英: pointwise mutual information profile)は、与えられた確率変数のペアに対するポイントワイド相互情報の分布である。その重要な性質の1つは、期待値がこれらの確率変数間の相互情報であることである。本稿では,多変量正規分布の分布を解析的に記述し,モンテカルロ法を用いてその分布を正確に推定できる分布の新たなファミリーであるベンドとミキシングモデルを導入する。次に、ベンドモデルとミキシングモデルを用いて、既存の相互情報推定器の限界を調査し、変分推定器で使用される神経評論家の行動を調べ、実験的な外乱が相互情報推定に与える影響を理解する方法を示す。最後に,ベンドモデルとミキシングモデルを用いて相互情報のモデルベースベイズ推定を行い,不確実性定量化が必要な領域専門知識の問題に適合することを示す。 The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables. One of its important properties is that its expected value is precisely the mutual information between these random variables. In this paper, we analytically describe the profiles of multivariate normal distributions and introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods. We then show how Bend and Mix Models can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how Bend and Mix Models can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# Self-Pro: グラフニューラルネットワークのためのセルフプロンプトとチューニングフレームワーク Self-Pro: Self-Prompt and Tuning Framework for Graph Neural Networks ( http://arxiv.org/abs/2310.10362v2 ) ライセンス: Link先を確認	Chenghua Gong, Xiang Li, Jianxiang Yu, Cheng Yao, Jiaqi Tan, Chengcheng Yu,	(参考訳) グラフはWebアプリケーションにとって重要なモデリングツールとなり、グラフニューラルネットワーク(GNN)はグラフ表現学習において大きな成功を収めた。しかし、彼らの演技は大量の監督に大きく依存している。近年, 'pre-train, fine-tune'' はラベル依存や一般化の貧弱な問題に対処するパラダイムとなっている。しかし、事前学習戦略はホモフィリーなグラフとヘテロフィリーなグラフで異なり、様々な下流タスクの目的も異なる。これにより、プリテキストとダウンストリームタスクの間にギャップが生じ、結果として‘負の転送’が発生し、パフォーマンスが低下する。自然言語処理の素早い学習にインスパイアされた多くの研究は、ギャップを埋め、事前訓練されたモデルを完全に活用する。しかし、グラフプロンプトの既存の方法はホモフィリーに調整されており、グラフ上の固有のヘテロフィリーを無視している。一方、そのほとんどはランダムに初期化されたプロンプトに依存しており、安定性に悪影響を及ぼす。そこで本研究では,モデルとデータ自体に基づくグラフのプロンプトフレームワークであるSelf-Promptを提案する。まず,非対称なグラフのコントラスト学習を前文として導入し,前文と下流タスクの目的を整合させる。次に,プリトレーニングをセルフアダプタとして再利用し,タスク適応のためのグラフ自体に基づいたセルフプロンプトを導入する。最後に、11のベンチマークデータセットに対する広範な実験を行い、その優位性を実証する。私たちは \url{https://github.com/gongchenghua/Self-Pro} でコードを提供しています。 Graphs have become an important modeling tool for Web applications, and graph neural networks (GNNs) have achieved great success in graph representation learning. However, their performance heavily relies on a large amount of supervision. Recently, ``pre-train, fine-tune'' has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in ``negative transfer'' and poor performance. Inspired by prompt learning in natural language processing, many studies turn to bridge the gap and fully leverage the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impact on the stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning as pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from pre-training as the self adapter and introduce self-prompts based on graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our codes at \url{https://github.com/gongchenghua/Self-Pro}.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# ACES: 自動生成モデルによる多言語プログラミングパズルの生成 ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models ( http://arxiv.org/abs/2310.10692v4 ) ライセンス: Link先を確認	Julien Pourcel, Cédric Colas, Gaia Molinaro, Pierre-Yves Oudeyer, Laetitia Teodorescu,	(参考訳) 新しく興味深い問題を発明する能力は、イノベーション、芸術、科学を駆動する人間の知能の驚くべき特徴である。そこで我々は,Pythonプログラミングパズルの文脈において,最先端の生成モデルのパワーを活用して,難解で解決可能な問題を多種多様に生成することを目的として,このプロセスを自動化する手法を提案する。本質的に動機づけられた文献に触発されて、自動コーデック検索(ACES)は、発生した問題の多様性と難易度を共同で最適化する。 LLM生成セマンティックディスクリプタの空間における問題(例えば、文字列操作、動的プログラミングなど)を表現し、その難しさをLlama-3-70Bの成功率の線形化関数として経験的に測定する。 ACESは、以前生成された問題をコンテキスト内例として用いて、ターゲットセマンティック記述子(ゴール指向探索)の多様性を達成するために、大きな言語モデルに難しい問題を生成するように反復的に促す。 ACESは、ベースラインメソッドが生み出す問題よりも多様性があり、また既存のPythonプログラミングベンチマークで見られる問題よりも、11の最先端コード LLM で平均して3倍難しい問題を生成する。 The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically motivated literature, Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems. We represent problems in a space of LLM-generated semantic descriptors describing the programming skills required to solve them (e.g. string manipulation, dynamic programming, etc.) and measure their difficulty empirically as a linearly decreasing function of the success rate of Llama-3-70B, a state-of-the-art LLM problem solver. ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors (goal-directed exploration) using previously generated problems as in-context examples. ACES generates problems that are more diverse and more challenging than problems produced by baseline methods and three times more challenging than problems found in existing Python programming benchmarks on average across 11 state-of-the-art code LLMs.	翻訳日:2024-05-31 02:11:35 公開日:2024-05-29
# 3D-GPT:大規模言語モデルを用いた手続き型3Dモデリング 3D-GPT: Procedural 3D Modeling with Large Language Models ( http://arxiv.org/abs/2310.12945v2 ) ライセンス: Link先を確認	Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould,	(参考訳) 効率的な自動コンテンツ作成の追求において、変更可能なパラメータとルールベースのシステムを活用する手続き生成が有望なアプローチとして現れている。それにもかかわらず、規則、アルゴリズム、パラメータの深い理解を必要とする複雑な性質を考えると、それは要求に満ちた努力かもしれない。作業負荷を低減するため,命令駆動型3Dモデリングのための大規模言語モデル~(LLM)を利用するフレームワークである3D-GPTを導入する。 3D-GPTは、3Dモデリングタスクをアクセス可能なセグメントに分割し、各タスクにアプエージェントを割り当てる。 3D-GPTは、タスクディスパッチエージェント、概念化エージェント、モデリングエージェントの3つのコアエージェントを統合する。彼らは共同で2つの目標を達成する。まず、簡潔なシーン記述を強化し、後続の命令に基づいてテキストを動的に適応させながら、それらを詳細な形式に進化させる。第二に、プロシージャ生成を統合し、リッチテキストからパラメータ値を抽出し、3Dソフトウェアに精通してアセット生成を行う。我々の実証調査では、3D-GPTが解釈し、指示を実行し、信頼性の高い結果を提供するだけでなく、人間デザイナーと効果的に協力することを確認した。さらに、Blenderとシームレスに統合され、拡張された操作可能性のロックが解除される。本研究は3次元モデリングにおけるLLMの可能性を強調し,シーン生成とアニメーションの今後の進歩のための基本的なフレームワークを提供する。 In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models~(LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 大規模言語モデルはなぜ正しい連鎖を生成するのか? Why Can Large Language Models Generate Correct Chain-of-Thoughts? ( http://arxiv.org/abs/2310.13571v3 ) ライセンス: Link先を確認	Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar,	(参考訳) 本稿では,大規模言語モデル(LLM)の能力について述べる。本研究では,LLMを効果的に誘導し,コヒーレントな思考連鎖を生成する方法について検討する。これを実現するために,自然言語生成に適した2階層階層型グラフィカルモデルを提案する。この枠組み内では、真の言語に由来するものと比較して、LLM生成された思考の連鎖の可能性を測る魅力的な幾何学的収束率を確立する。本研究は,LLMが推論スキルを要求されるタスクのパフォーマンス向上を説明する上で,(潜在的に)正しい思考系列を生成できることを理論的に正当化するものである。 This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts (potentially) explaining performance gains in tasks demanding reasoning skills.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 光の超放射性と回転量子流体からの絡み合い Entanglement from superradiance and rotating quantum fluids of light ( http://arxiv.org/abs/2310.16031v3 ) ライセンス: Link先を確認	Adrià Delhom, Killian Guerrero, Paula Calizaya, Kévin Falque, Alberto Bramati, Anthony J. Brady, Maxime J. Jacquet, Ivan Agullo,	(参考訳) 超放射光による放射の増幅は、多くの物理系で観測される普遍的な現象である。超放射能散乱は、コヒーレント状態を含む異なる入力状態の絡み合いを生じさせ、この現象の本質的な量子的性質を確立することを実証する。これらの概念を実験に適用するために,光の偏光流体の散逸ダイナミクスにより動的に安定な地平線のないエルゴリージョンを構築する新しい手法を提案する。我々は,安定なエルゴリージョンの作成を実証するために,システムを数値シミュレーションする。その後,本システム内での回転超放射能について検討し,主に絡み合いの発生と,その拡張の可能性について検討した。この方法では、入力状態を自由に制御できる最先端の実験において、回転超放射による量子放出の研究が可能である。 The amplification of radiation by superradiance is a universal phenomenon observed in numerous physical systems. We demonstrate that superradiant scattering generates entanglement for different input states, including coherent states, thereby establishing the inherently quantum nature of this phenomenon. To put these concepts to the test, we propose a novel approach to create horizonless ergoregions, which are nonetheless dynamically stable thanks to the dissipative dynamics of a polaritonic fluid of light. We numerically simulate the system to demonstrate the creation of a stable ergoregion. Subsequently, we investigate rotational superradiance within this system, with a primary focus on entanglement generation and the possibilities for its enhancement using current techniques. Our methods permit the investigation of quantum emission by rotational superradiance in state-of-the-art experiments, in which the input state can be controlled at will.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 3時間拘束型アクター臨界および拘束型自然アクター臨界アルゴリズムの有限時間解析 Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms ( http://arxiv.org/abs/2310.16363v3 ) ライセンス: Link先を確認	Prashansa Panda, Shalabh Bhatnagar,	(参考訳) アクター批判法は、特に状態-作用空間が大きい場合に、広範囲の強化学習タスクに多大な応用を見出した。本稿では,不等式制約を含む制約付きマルコフ決定過程(C-MDP)の関数近似を用いたアクター評論家および自然なアクター評論家アルゴリズムについて考察し,これらのアルゴリズムを非i.d(マルコフアン)環境で非漸近解析する。目的関数と制約関数の両方が所定コスト関数の政策依存の長期平均となるような長期平均コスト基準を考察する。ラグランジュ乗算器法を用いて不等式制約を扱う。これらのアルゴリズムが性能(ラグランジュ)関数$L(\theta,\gamma)$の1次定常点(すなわち $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$)を見つけることが保証されていることを証明している。また、3つの異なるセーフティガイム環境の実験結果も示す。 Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on three different Safety-Gym environments.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 原子アンサンブルにおける弱場励起以外の光散乱特性 Light scattering properties beyond weak-field excitation in atomic ensembles ( http://arxiv.org/abs/2310.17106v2 ) ライセンス: Link先を確認	Chung-Hsien Wang, Nai-Yu Tsai, Yi-Cheng Wang, H. H. Jen,	(参考訳) 大型原子系の光学的性質の研究において、線形結合方程式による系のダイナミクスを単純化するために弱いレーザー駆動がしばしば仮定される。本稿では,原子アンサンブルの光散乱特性について,累積膨張法を用いて検討する。定常方程式に高次相関を漸進的に組み込むことにより、完全な密度行列を解いた正確な解と比較して精度を向上することができる。分析の結果, 弱い双極子-双極子相互作用 (DDI) の段階において, 1次展開は光深度に対する良好な予測を導出し, より密度の高い原子配置は高次相関を考慮する必要があることがわかった。入射光の強度が増加すると、原子飽和効果が顕著になり、光透過性、エネルギーシフト、崩壊速度が著しく変化する。この飽和現象は、弱い駆動条件下でもサブラジアント原子配列にまで広がり、線形モデルからはかなり逸脱する。本研究は,線形モデルに対する平均場モデルを精度と計算複雑性の両立を図ったものである。しかし、このような光物質相互作用系におけるヒルベルト空間の指数関数的増加により理論的に難しいため、大きくて密度の高い原子系における高次累積物の役割は依然として不明である。 In the study of optical properties of large atomic system, a weak laser driving is often assumed to simplify the system dynamics by linearly coupled equations. Here, we investigate the light scattering properties of atomic ensembles beyond weak-field excitation through the cumulant expansion method. By progressively incorporating higher-order correlations into the steady-state equations, an enhanced accuracy can be achieved in comparison to the exact solutions from solving a full density matrix. Our analysis reveals that, in the regime of weak dipole-dipole interaction (DDI), the first-order expansion yields satisfactory predictions for optical depth, while denser atomic configurations necessitate consideration of higher-order correlations. As the intensity of incident light increases, atom saturation effects become noticeable, giving rise to significant changes in light transparency, energy shift, and decay rate. This saturation phenomenon extends to subradiant atom arrays even under weak driving conditions, leading to substantial deviations from the linear model. Our findings demonstrate the mean-field models as good extensions to linear models as it balances both accuracy and computational complexity. However, the crucial role of higher-order cumulants in large and dense atom systems remains unclear, since it is challenging theoretically owing to the exponentially increasing Hilbert space in such light-matter interacting systems.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# セマンティック通信を利用した無線AI生成コンテンツ(AIGC)プロビジョニングフレームワーク A Wireless AI-Generated Content (AIGC) Provisioning Framework Empowered by Semantic Communication ( http://arxiv.org/abs/2310.17705v2 ) ライセンス: Link先を確認	Runze Cheng, Yao Sun, Dusit Niyato, Lan Zhang, Lei Zhang, Muhammad Ali Imran,	(参考訳) 生成型AIアプリケーションは、多種多様な高品質なAI生成コンテンツ(AIGC)を作成することで、最近、巨大なユーザベースに対応している。モバイルデバイスの普及とモバイルトラフィックの急速な増加により、無線通信ネットワークによる高品質なAIGCサービスへのユビキタスなアクセスが、未来の方向になりつつある。しかし、不安定なチャネル、帯域幅の限られたリソース、不均一な分散計算リソースを備えた無線ネットワークにおいて、適切なAIGCサービスを提供することは困難である。これらの課題に対処するために、セムコムを用いたセマンティック通信(セムコム)によるAIGC(セムAIGC)生成および送信フレームワークを提案する。具体的には、セマンティックエンコーダとデコーダに拡散モデルを統合することで、ワークロード調整可能なトランシーバを設計し、エッジおよびローカルでの計算資源利用の調整を可能にする。さらに、リソースを意識したwOrk lOad Trade-off(ROOT)スキームを考案し、トランスシーバの負荷適応決定をインテリジェントに行い、動的無線チャンネル条件やサービス要件に応じたコンテンツを生成し、送信し、微調整する。提案するSemAIGCフレームワークは,従来の手法に比べてレイテンシとコンテンツ品質が優れていることがシミュレーションによって検証された。 Generative AI applications have been recently catering to a vast user base by creating diverse and high-quality AI-generated content (AIGC). With the proliferation of mobile devices and rapid growth of mobile traffic, providing ubiquitous access to high-quality AIGC services via wireless communication networks is becoming the future direction. However, it is challenging to provide qualified AIGC services in wireless networks with unstable channels, limited bandwidth resources, and unevenly distributed computational resources. To tackle these challenges, we propose a semantic communication (SemCom)-empowered AIGC (SemAIGC) generation and transmission framework, where only semantic information of the content rather than all the binary bits should be generated and transmitted by using SemCom. Specifically, SemAIGC integrates diffusion models within the semantic encoder and decoder to design a workload-adjustable transceiver thereby allowing adjustment of computational resource utilization in edge and local. In addition, a Resource-aware wOrk lOad Trade-off (ROOT) scheme is devised to intelligently make workload adaptation decisions for the transceiver, thus efficiently generating, transmitting, and fine-tuning content as per dynamic wireless channel conditions and service requirements. Simulations verify the superiority of our proposed SemAIGC framework in terms of latency and content quality compared to conventional approaches.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# モデル適応によるデバイアスアルゴリズム Debiasing Algorithm through Model Adaptation ( http://arxiv.org/abs/2310.18913v4 ) ライセンス: Link先を確認	Tomasz Limisiewicz, David Mareček, Tomáš Musil,	(参考訳) 大規模言語モデルは、ますます増え続けるタスクの解決策になりつつある。しかし、能力の増大に伴い、モデルはトレーニングデータに存在するバイアスやステレオタイプから生じる急激な相関に依存する傾向にある。本研究は,言語モデルにおけるジェンダーバイアスの検出と緩和のための新しい手法を提案する。因果解析を行い、問題のあるモデル成分を同定し、フィードフォワードの中間層が最も偏りを伝達しやすいことを明らかにする。解析結果に基づいて,これらの層の重み行列に線形射影を適用することにより,モデルに介入する。提案手法であるDAMAは,下流タスクにおけるモデルの性能を維持しながら,様々な指標によって測定されるバイアスを著しく低減する。我々はLLaMAの最先端性能を再訓練する手法とモデルのためのコードをリリースし、バイアスを著しく低減した。 Large language models are becoming the go-to solution for the ever-growing number of tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data. This work proposes a novel method for detecting and mitigating gender bias in language models. We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias. Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers. Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. We release code for our method and models, which retrain LLaMA's state-of-the-art performance while being significantly less biased.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# MIST:メンバーシップ不変のサブスペーストレーニングによるメンバーシップ推論攻撃の回避 MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training ( http://arxiv.org/abs/2311.00919v2 ) ライセンス: Link先を確認	Jiacheng Li, Ninghui Li, Bruno Ribeiro,	(参考訳) メンバー推論(MI)攻撃では、敵は機械学習(ML)モデルをトレーニングするためにインスタンスが使用されているかどうかを判断しようとする。 MI攻撃は、プライベートデータを使用してMLモデルをトレーニングする際の大きなプライバシー上の懸念事項である。文献におけるほとんどのMI攻撃は、MLモデルがトレーニングデータに適合するように訓練されているという事実を生かし、トレーニングインスタンスに非常に少ない損失をもたらす。したがって、ほとんどのMI攻撃に対する防御は、モデルのトレーニングデータへの適合性を低下させようとする。しかし、一般的には精度が低下する。トレーニングインスタンスがMI攻撃に対する脆弱性の度合いが異なることを観察する。ほとんどのインスタンスは、トレーニングに含まれていない場合でも、損失が小さくなります。これらのインスタンスでは、モデルをMI攻撃の心配なしにうまく適合させることができる。効果的な防御は、MI攻撃に弱いインスタンスを(暗黙的に)特定し、過度な適合を避ける必要がある。大きな課題は、効率的なトレーニングプロセスにおいて、そのような効果をどのように達成するかである。表現学習における2つの新たな進歩を生かして,MI攻撃を防御する新しいメンバーシップ・不変部分空間訓練(MIST)手法を提案する。 MISTは、他のインスタンスに大きな影響を与えることなく、脆弱性のあるインスタンスの過度な適合を避ける。我々は、MISTと他の様々なSOTAMI防衛を、いくつかのSOTAMI攻撃と比較し、広範囲にわたる実験的研究を行った。 MISTは他の防御よりも優れており、テスト精度は最小限に抑えられる。 In Member Inference (MI) attacks, the adversary try to determine whether an instance is used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoids overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 再利用を学ぶ:知識スコープの制限と拒否メカニズムを通じて、大規模言語モデルをより制御可能で信頼性の高いものにする Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism ( http://arxiv.org/abs/2311.01041v3 ) ライセンス: Link先を確認	Lang Cao,	(参考訳) 大きな言語モデル(LLM)は印象的な言語理解と生成能力を示し、様々な領域にわたる幅広い質問に答えることを可能にする。しかし、これらのモデルは欠陥がなく、しばしばエラーや誤報を含む応答を生成する。これらの不正確さは、一般に幻覚と呼ばれ、多くのシナリオでLLMを信頼できない、さらには使用できないようにしている。本稿では,LLMにおける幻覚の問題を,特に質問応答の文脈において緩和することに焦点を当てる。全ての質問に答える代わりに、私たちはLLMにエラーを避けるために難しい質問に答えることを拒否するように指示する拒絶メカニズムを探求する。そこで我々は,L2R(Learning to Refuse)と呼ばれるシンプルで効果的な解を提案する。これを実現するため、構造化知識ベースを用いてLLMの世界のすべての理解を表現し、追跡可能な金の知識を提供する。この知識基盤はLLMとは分離されており、当初は空だった。検証済みの知識で満たされ、徐々に拡張される。 LLMがドメイン外の質問に遭遇すると、システムはその知識の範囲を認識し、その質問に答えられるかどうかを判断する。さらに,LLMの知識ベースを自動的かつ効率的に拡張する手法を提案する。定性的かつ定量的な分析により,LLMの可制御性と信頼性が向上することが実証された。 Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty. It can be filled with validated knowledge and progressively expanded. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# HetCAN:デュアルレベル認識を備えた異種グラフカスケード注意ネットワーク HetCAN: A Heterogeneous Graph Cascade Attention Network with Dual-Level Awareness ( http://arxiv.org/abs/2311.03275v2 ) ライセンス: Link先を確認	Zeyuan Zhao, Qingqing Ge, Anfeng Cheng, Yiding Liu, Xiang Li, Shuaiqiang Wang,	(参考訳) 異種グラフニューラルネットワーク(HGNN)は、最近、現実世界のアプリケーションでユビキタスな異種グラフをモデリングする際、顕著な能力を示した。ヘテロジニアスグラフの既存の手法の多くは、複数の畳み込み層や注意層を積み重ねてノード埋め込みを学習しており、これはノードレベルの側面から高次情報をキャプチャすると考えられる。しかし、異種グラフの異なる種類のノードには多様な特徴があり、また特徴レベルの側面から高次情報を含むノードの特徴間の相互作用を捉えることも必要である。さらに、ほとんどのメソッドは、まずノードの機能を同じ低次元空間にマッピングすることで整列するが、この方法でノードの型情報を失う可能性がある。本稿では,複数のカスケードブロックからなる新規なヘテロジニアスグラフカスケード注意ネットワーク(HetCAN)を提案する。各カスケードブロックは、タイプアウェアエンコーダとディメンションアウェアエンコーダの2つのコンポーネントを含む。具体的には、タイプ認識エンコーダはノード型情報の損失を補償し、グラフの不均一性をフル活用することを目的としている。次元認識エンコーダは、ノード特徴間の相互作用をキャプチャすることで、特徴レベルの高次情報を学ぶことができる。これらのコンポーネントの助けを借りて、HetCANはノードの特徴、グラフの不均一性、およびノード埋め込みにおけるグラフ構造に関する情報を包括的にエンコードすることができる。大規模な実験は、HetCANが先進的な競争相手よりも優れていることを示し、その効率性と堅牢性を示している。 Heterogeneous graph neural networks(HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Most existing methods for heterogeneous graphs mainly learn node embeddings by stacking multiple convolutional or attentional layers, which can be considered as capturing the high-order information from node-level aspect. However, different types of nodes in heterogeneous graphs have diverse features, it is also necessary to capture interactions among node features, namely the high-order information from feature-level aspect. In addition, most methods first align node features by mapping them into one same low-dimensional space, while they may lose some type information of nodes in this way. To address these problems, in this paper, we propose a novel Heterogeneous graph Cascade Attention Network (HetCAN) composed of multiple cascade blocks. Each cascade block includes two components, the type-aware encoder and the dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and aims to make full use of graph heterogeneity. The dimension-aware encoder is able to learn the feature-level high-order information by capturing the interactions among node features. With the assistance of these components, HetCAN can comprehensively encode information of node features, graph heterogeneity and graph structure in node embeddings. Extensive experiments demonstrate the superiority of HetCAN over advanced competitors and also exhibit its efficiency and robustness.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# 無限大データの最小二乗クラスタリングのための高性能ハイブリッドアルゴリズム High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data ( http://arxiv.org/abs/2311.04517v3 ) ライセンス: Link先を確認	Ravil Mussabayev, Rustam Mussabayev,	(参考訳) 本稿では,Infinitely Tall Data (MSSC-ITD) の最小階数クラスタリング(Minimum Sum-of-Squares Clustering of Infinitely Tall Data, MSC-ITD)という,クラスタリング問題の新しい定式化と,その有効解に対するハイブリッド並列手法の革新的な集合であるHPClustを提案する。現代の高性能コンピューティング技術を利用することで、HPClustは、有効性、計算効率、拡張性といった主要なクラスタリング指標を強化する。 MapReduceフレームワークによる処理時間を短縮するバニラデータ並列処理とは対照的に,本手法では,マルチストラテジーな競合協調並列処理と,目的関数ランドスケープの複雑な特性を活用して,優れた性能を実現する。スケールに苦しむ他のアルゴリズムとは異なり、当社のアルゴリズムは本質的に並列であり、スケーラビリティと並列性の向上によるソリューション品質の向上、中小データセット用に設計された高度なアルゴリズムよりも優れています。 4つの並列戦略を特徴とするHPClustの評価は,従来の手法や最先端手法よりも優れた性能を示す。これらの結果から,並列処理はクラスタリング効率を向上するだけでなく,精度も向上することが示された。さらに、計算効率とクラスタリング品質のバランスについて検討し、データセットの詳細とリソース可用性に基づいた最適な並列戦略に関する洞察を提供する。本研究はクラスタリングアルゴリズムにおける並列性についての理解を深め,MSSC-ITD に対して,高度な並列アプローチの厳密なハイブリッド化が最適な結果をもたらすことを示す。合成データに関する実験は、HPClustの異常なスケーラビリティとノイズに対する堅牢性をさらに確認した。 This paper introduces a novel formulation of the clustering problem, namely the Minimum Sum-of-Squares Clustering of Infinitely Tall Data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive-cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism, and outperforming even advanced algorithms designed for small and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on synthetic data further confirm HPClust's exceptional scalability and robustness to noise.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# PINE:シークレット共有ベクトルの効率的なノルム境界検証 PINE: Efficient Norm-Bound Verification for Secret-Shared Vectors ( http://arxiv.org/abs/2311.10237v2 ) ライセンス: Link先を確認	Guy N. Rothblum, Eran Omri, Junye Chen, Kunal Talwar,	(参考訳) 高次元ベクトルのセキュアアグリゲーションは、フェデレートされた統計学と学習における基本的なプリミティブである。 PRIOのような2サーバシステムは、秘密共有ベクトルのスケーラブルな集約を可能にする。敵のクライアントは集約を操作しようとするかもしれないので、それぞれの(秘密の共有された)コントリビューションが適切に形成されていることを保証することが重要です。本研究では、各寄与ベクトルがユークリッドノルムに有界であることを保証するという、重要かつよく研究された目標に焦点を当てる。有界ノルム寄与を保証するための既存のプロトコルは、大きな通信オーバーヘッドを発生させるか、ノルム境界の近似的な検証しかできない。通信オーバーヘッドの少ない正確な標準検証を可能にする新しいプロトコルであるPrivate Inexpensive Norm Enforcement (PINE)を提案する。高次元ベクトルの場合、従来の16-32倍のオーバヘッドに比べて通信オーバヘッドは数パーセントである。 Secure aggregation of high-dimensional vectors is a fundamental primitive in federated statistics and learning. A two-server system such as PRIO allows for scalable aggregation of secret-shared vectors. Adversarial clients might try to manipulate the aggregate, so it is important to ensure that each (secret-shared) contribution is well-formed. In this work, we focus on the important and well-studied goal of ensuring that each contribution vector has bounded Euclidean norm. Existing protocols for ensuring bounded-norm contributions either incur a large communication overhead, or only allow for approximate verification of the norm bound. We propose Private Inexpensive Norm Enforcement (PINE): a new protocol that allows exact norm verification with little communication overhead. For high-dimensional vectors, our approach has a communication overhead of a few percent, compared to the 16-32x overhead of previous approaches.	翻訳日:2024-05-31 00:10:23 公開日:2024-05-29
# Bayesian Neural Networks: Min-Max Game Framework Bayesian Neural Networks: A Min-Max Game Framework ( http://arxiv.org/abs/2311.11126v2 ) ライセンス: Link先を確認	Junping Hong, Ercan Engin Kuruoglu,	(参考訳) 本稿では,ゲーム理論を定式化したベイズニューラルネットワーク(BNN)と最大符号化速度歪み損失を用いたディープニューラルネットワークのロバスト性と雑音解析について予備的検討を行う。 BNNは深層学習にある程度の堅牢性を提供しており、ミニマックス法はベイズ法を補助する自然な保守的な方法であった。最近の閉ループ転写ニューラルネットワークに触発されて、決定論的ニューラルネットワーク$f$とサンプリングネットワーク$f + \xi$または$f + r\xi$の間のゲーム理論を介してBNNを定式化する。従来のBNNと比較すると、BNNは中心$f$とサンプリングポイント$f + r\xi$の間の一定のギャップ内で解空間を学習し、以前のBNNと比較して意味のある事前設定を持つ保守的な選択である。さらに、$f$ と $f + r\xi$ の間の最小点は、十分に訓練されたモデル $f$ で、部分空間次元が十分に大きいときに安定となる。これにより、$f$は、たとえ$f$が真のデータを数回繰り返してオンライントレーニングしているとしても、予測レベルよりもサブスペース内の配布外データやノイズデータを認識する確率が高い。これまでのところ、我々の実験はMNISTとFashion MNISTのデータセットに限られており、現実的なデータセットと複雑なニューラルネットワークモデルを用いたさらなる実験は、上記の議論を検証するために実装されるべきである。 This paper is a preliminary study of the robustness and noise analysis of deep neural networks via a game theory formulation Bayesian Neural Networks (BNN) and the maximal coding rate distortion loss. BNN has been shown to provide some robustness to deep learning, and the minimax method used to be a natural conservative way to assist the Bayesian method. Inspired by the recent closed-loop transcription neural network, we formulate the BNN via game theory between the deterministic neural network $f$ and the sampling network $f + \xi$ or $f + r\xi$. Compared with previous BNN, BNN via game theory learns a solution space within a certain gap between the center $f$ and the sampling point $f + r\xi$, and is a conservative choice with a meaningful prior setting compared with previous BNN. Furthermore, the minimum points between $f$ and $f + r\xi$ become stable when the subspace dimension is large enough with a well-trained model $f$. With these, the model $f$ can have a high chance of recognizing the out-of-distribution data or noise data in the subspace rather than the prediction level, even if $f$ is in online training after a few iterations of true data. So far, our experiments are limited to MNIST and Fashion MNIST data sets, more experiments with realistic data sets and complicated neural network models should be implemented to validate the above arguments.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# 量子系における局所純度蒸留-純度と絡み合いの相補性を探る Local Purity Distillation in Quantum Systems: Exploring the Complementarity Between Purity and Entanglement ( http://arxiv.org/abs/2311.11820v2 ) ライセンス: Link先を確認	Ray Ganardi, Piotr Masajada, Moein Naseri, Alexander Streltsov,	(参考訳) 量子力学と量子絡み合いは、量子情報科学において重要な関係を持つ2つの重要な量子資源理論を表す。その重要性にもかかわらず、この2つの理論の複雑な関係は未だ完全には理解されていない。ここでは、特に局所冷却過程の文脈において、絡み合いと熱力学の相互作用を掘り下げる。ギブス保存型ローカル操作と古典通信の枠組みを導入・開発する。本フレームワークでは,リモートパーティがローカルシステムを地上状態に効果的に冷却できる戦略を探求する。我々の分析は、量子状態の1つのコピーのみがアクセス可能なシナリオに重点を置いており、理想的な性能は、これらの制約の下で達成可能な基底状態への可能な限りの忠実さによって定義される。局所冷却は局所純度の抽出と一致し, 完全縮退した局所ハミルトン系システムに着目する。この文脈では、局所純度抽出の効率性とシステムに存在する絡み合いの度合いとの間に強力なリンクを確立する。さらに、多くの関連するシナリオにおいて、最適性能は半定値プログラミング手法によって正確に決定できることを実証する。本研究は, 絡み込み検出・推定技術など, 様々な実用化への扉を開くものである。我々は、有界な絡み合い状態のクラスに対する絡み合いの量を評価することによってこれを実証する。 Quantum thermodynamics and quantum entanglement represent two pivotal quantum resource theories with significant relevance in quantum information science. Despite their importance, the intricate relationship between these two theories is still not fully understood. Here, we delve into the interplay between entanglement and thermodynamics, particularly in the context of local cooling processes. We introduce and develop the framework of Gibbs-preserving local operations and classical communication. Within this framework, we explore strategies enabling remote parties to effectively cool their local systems to the ground state. Our analysis is centered on scenarios where only a single copy of a quantum state is accessible, with the ideal performance defined by the highest possible fidelity to the ground state achievable under these constraints. We focus on systems with fully degenerate local Hamiltonians, where local cooling aligns with the extraction of local purity. In this context, we establish a powerful link between the efficiency of local purity extraction and the degree of entanglement present in the system, a concept we define as purity-entanglement complementarity. Moreover, we demonstrate that in many pertinent scenarios, the optimal performance can be precisely determined through semidefinite programming techniques. Our findings open doors to various practical applications, including techniques for entanglement detection and estimation. We demonstrate this by evaluating the amount of entanglement for a class of bound entangled states.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# 創発的医用画像評価のための特徴抽出 : 進化する傾向に対する新たな証拠 Feature Extraction for Generative Medical Imaging Evaluation: New Evidence Against an Evolving Trend ( http://arxiv.org/abs/2311.13717v3 ) ライセンス: Link先を確認	McKell Woodland, Austin Castelo, Mais Al Taie, Jessica Albuquerque Marques Silva, Mohamed Eltaher, Frank Mohn, Alexander Shieh, Austin Castelo, Suprateek Kundu, Joshua P. Yung, Ankit B. Patel, Kristy K. Brock,	(参考訳) Fr'echet Inception Distance (FID)は、合成画像の品質を評価するために広く用いられている指標である。 ImageNetベースの特徴抽出装置に依存しており、医療画像に適用可能であるかどうかは不明だ。最近のトレンドは、医用画像で訓練された特徴抽出器を通して、医用画像にFIDを適用することである。本研究では,ImageNetをベースとした抽出器がRadImageNetよりも人間の判断に整合していることを示すことで,この実践に挑戦する。我々は,Fr'echet distances (FDs) を用いた4つの医用画像モダリティと4つのデータ拡張技術を用いた16のStyleGAN2ネットワークの評価を行った。視覚的チューリングテストによる人的判断と比較したところ,ImageNetをベースとした抽出機が人的判断と整合性のあるランキングを作成したのに対し,ImageNetをトレーニングしたSwaV抽出機から抽出したFDは専門家による評価と有意な相関を示した。対照的に、RadImageNetベースのランキングは不安定であり、人間の判断とは矛盾していた。以上の結果から,医用画像抽出装置はFDを本質的に改善せず,信頼性を損なうことさえできないという新たな証拠が得られた。私たちのコードはhttps://github.com/mckellwoodland/fid-med-eval.comで利用可能です。 Fr\'echet Inception Distance (FID) is a widely used metric for assessing synthetic image quality. It relies on an ImageNet-based feature extractor, making its applicability to medical imaging unclear. A recent trend is to adapt FID to medical imaging through feature extractors trained on medical images. Our study challenges this practice by demonstrating that ImageNet-based extractors are more consistent and aligned with human judgment than their RadImageNet counterparts. We evaluated sixteen StyleGAN2 networks across four medical imaging modalities and four data augmentation techniques with Fr\'echet distances (FDs) computed using eleven ImageNet or RadImageNet-trained feature extractors. Comparison with human judgment via visual Turing tests revealed that ImageNet-based extractors produced rankings consistent with human judgment, with the FD derived from the ImageNet-trained SwAV extractor significantly correlating with expert evaluations. In contrast, RadImageNet-based rankings were volatile and inconsistent with human judgment. Our findings challenge prevailing assumptions, providing novel evidence that medical image-trained feature extractors do not inherently improve FDs and can even compromise their reliability. Our code is available at https://github.com/mckellwoodland/fid-med-eval.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# Polyak モメンタムを呈し, 大型カタパルトによる発火性小腫の発見 Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults ( http://arxiv.org/abs/2311.15051v3 ) ライセンス: Link先を確認	Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun,	(参考訳) ポリアクの運動量による勾配降下は、現代の機械や深層学習で広く使われているが、訓練軌道に対するその影響の具体的な理解はいまだ解明されていない。本研究では, 線形対角線ネットワークや非線形ニューラルネットワークの場合, 学習率の高い運動量勾配は大きなカタパルトを呈し, 勾配勾配よりもはるかに平坦なミニマに向かって反復することを示した。大カタパルトは自己安定化効果(Damian et al , 2023)の運動量"延長"によって引き起こされると仮定する。我々は、単純なおもちゃの例と線形対角ネットワークの仮説を支持する実証的な証拠で、我々の仮説を理論的、実証的に支持する。 Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains elusive. In this work, we empirically show that for linear diagonal networks and nonlinear neural networks, momentum gradient descent with a large learning rate displays large catapults, driving the iterates towards much flatter minima than those found by gradient descent. We hypothesize that the large catapult is caused by momentum "prolonging" the self-stabilization effect (Damian et al., 2023). We provide theoretical and empirical support for our hypothesis in a simple toy example and empirical evidence supporting our hypothesis for linear diagonal networks.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# 連続測定による空洞結合原子アンサンブルのスピンスクイーズ生成の解析 Analysis of spin-squeezing generation in cavity-coupled atomic ensembles with continuous measurements ( http://arxiv.org/abs/2311.15725v3 ) ライセンス: Link先を確認	A. Caprotti, M. Barbiero, M. G. Tarallo, M. G. Genoni, G. Bertaina,	(参考訳) 我々は3レベル原子の光キャビティへの結合と透過キャビティ場の連続量子測定によるスピンスクイーズ状態の生成を分析し、原子アンサンブルの進化をモニタリングする。解析処理と顕微鏡シミュレーションにより,原子数$N$で大きなスピンスクイーズを実現できることを示した。しかし、いくつかの文献とは対照的に、最適なアプローチで提案される継続的なフィードバックなしにハイゼンベルクのスケーリングが得られないことを明確にする。実際、断熱キャビティ除去近似と大きな$N$制限では、スピンスクイーズに対して$N^{-2/3}$、対応するプロトコル長に対して$N^{-1/3}$のスケーリング挙動が見つかる。これらの結果はブロッホ球の曲率を考えることでのみ得られるが、これは集合スピン作用素をその赤道に直交的に線型化することで不正確な予測が得られるからである。完全なシミュレーションにより, スピンスクイーズ生成がシステムパラメータにどのように依存するかを特徴付けるとともに, キャビティ充填のダイナミクスと徐々に混合して, メトロジー上の優位性が失われるまで, 悪いキャビティ状態から逸脱する。最後に、このスピンスクイーズプロトコルの最先端光時計への応用について論じる。 We analyze the generation of spin-squeezed states via coupling of three-level atoms to an optical cavity and continuous quantum measurement of the transmitted cavity field in order to monitor the evolution of the atomic ensemble. Using analytical treatment and microscopic simulations of the dynamics, we show that one can achieve significant spin squeezing, favorably scaling with the number of atoms $N$. However, contrary to some previous literature, we clarify that it is not possible to obtain Heisenberg scaling without the continuous feedback that is proposed in optimal approaches. In fact, in the adiabatic cavity removal approximation and large $N$ limit, we find the scaling behavior $N^{-2/3}$ for spin squeezing and $N^{-1/3}$ for the corresponding protocol duration. These results can be obtained only by considering the curvature of the Bloch sphere, since linearizing the collective spin operators tangentially to its equator yields inaccurate predictions. With full simulations, we characterize how spin-squeezing generation depends on the system parameters and departs from the bad cavity regime, by gradually mixing with cavity-filling dynamics until metrological advantage is lost. Finally, we discuss the relevance of this spin-squeezing protocol to state-of-the-art optical clocks.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# スパイにおけるホログラフィックエンタングルメントエントロピー A Holographic Entanglement Entropy at Spi ( http://arxiv.org/abs/2311.16056v2 ) ライセンス: Link先を確認	Abir Ghosh, Chethan Krishnan,	(参考訳) 場の量子論における部分領域に対する有限エンタングルメントエントロピーを定義するには、2つの論理的に独立なスケール、すなわち部分領域のサイズを制御するIRスケールとUVカットオフが必要である。 AdS/CFTでは、IRスケールはAdS長尺、UVカットオフはバルクラジアルカットオフ、サブリージョンは無次元の角度で指定される。これは、AdS/CFTにおける龍高柳表面とその領域を決定するデータである。漸近的に平坦な空間には、空間無限大(spi)に関連付けることのできる ``spi-部分領域' という概念が存在すると論じる。幾何的にAdS部分領域とは全く異なるが、この角度データはスピの2分割として解釈できる重要な特徴を持っている。したがって、スパイス領域に関連するRT面の面積は、AdS/CFTのように、この二分割の下でのバルク状態の還元密度行列の絡み合いエントロピーと解釈できる。対称スパイサブリージョンでは、これらのRT面は漸近カウサルダイヤモンドの腰である。空の平坦空間では、それらはリンドラー地平線に還元され、カッシーニ、フエルタ、マイヤーズのAdS-リンドラー地平線に類似する。これらの結果は、空空間のスクリーンに固定された最小曲面に関する以前の研究と結びついており、また、ブラックホールがバルクにある場合の議論を一般化している。スパイス領域としてのブラックホール RT の表面の位相は様々であり、AdS のブラックホール (小、大) の位相と自然に結合する。重要な観測は、放射状カットオフは平らな空間におけるIRスケールと関連しており、実際には紫外線の発散は存在しないということである。これは、サブAdSスケールにおいてホログラフィック双対性はIR/IR対応であり、自由度は局所QFTのそれではなく長弦のものであるという以前の提案と一致している。弦はもちろん、UV有限である。 Defining finite entanglement entropy for a subregion in quantum field theory requires the introduction of two logically independent scales: an IR scale that controls the size of the subregion, and a UV cut-off. In AdS/CFT, the IR scale is the AdS lengthscale, the UV cut-off is the bulk radial cut-off, and the subregion is specified by dimensionless angles. This is the data that determines Ryu-Takayanagi surfaces and their areas in AdS/CFT. We argue that in asymptotically flat space there exists the notion of a ``spi-subregion" that one can associate to spatial infinity (spi). Even though geometrically quite different from an AdS subregion, this angle data has the crucial feature that it allows an interpretation as a bi-partitioning of spi. Therefore, the area of the RT surface associated to the spi-subregion can be interpreted as the entanglement entropy of the reduced density matrix of the bulk state under this bi-partition, as in AdS/CFT. For symmetric spi-subregions, these RT surfaces are the waists of Asymptotic Causal Diamonds. In empty flat space they reduce to Rindler horizons, and are analogues of the AdS-Rindler horizons of Casini, Huerta \& Myers. We connect these results to previous work on minimal surfaces anchored to screens in empty space, but also generalize the discussion to the case where there are black holes in the bulk. The phases of black hole RT surfaces as the spi-subregion is varied, naturally connect with those of black holes (small and large) in AdS. A key observation is that the radial cut-off is associated to an IR scale in flat space -- and in fact there are no UV divergences. We argue that this is consistent with previous suggestions that in sub-AdS scales the holographic duality is an IR/IR correspondence and that the degrees of freedom are {\em not} those of a local QFT, but those of long strings. Strings are of course, famously UV finite.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# FedAL: 敵対的学習によって実現されたブラックボックスのフェデレーション知識蒸留 FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning ( http://arxiv.org/abs/2311.16584v2 ) ライセンス: Link先を確認	Pengchao Han, Xingyan Shi, Jianwei Huang,	(参考訳) 知識蒸留(KD)は、異なるモデルアーキテクチャを持ち、ローカルデータやモデルパラメータを他と共有しない分散クライアント間の協調学習を可能にする。各クライアントは、フェデレートされたKDとして知られるターゲットとして、すべてのクライアントモデルの平均モデル出力/機能を使用して、ローカルモデルを更新する。しかし、クライアントのローカルモデルが不均一なローカルデータセットでトレーニングされている場合、既存のフェデレーションKDメソッドはよく機能しないことが多い。本稿では,クライアント間のデータ不均一性に対処するために,Adversarial Learning (FedAL) によって実現されたフェデレーション知識の蒸留を提案する。まず、データの不均一性に起因するクライアント間の局所モデル出力のばらつきを軽減するため、サーバはクライアント間のコンセンサスモデル出力をクライアントと差別者間のmin-maxゲームを介してクライアント間のコンセンサスモデル出力を達成するために、クライアントのローカルモデルトレーニングを誘導する識別器として機能する。さらに、クライアントの不均一なローカルデータのために、クライアントのローカルトレーニングとグローバルな知識伝達の間に破滅的な忘れが生じる可能性がある。この課題に向けて、我々は、クライアントが他者へ知識を転送/学習する能力を保証するため、ローカルトレーニングとグローバルナレッジトランスファーの両方において、予測の少ない正規化を設計する。実験により,FedALとその変異体は,他の連合KDベースラインよりも高い精度が得られることが示された。 Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data and model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, known as federated KD. However, existing federated KD methods often do not perform well when clients' local models are trained with heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the local model output divergence across clients caused by data heterogeneity, the server acts as a discriminator to guide clients' local model training to achieve consensus model outputs among clients through a min-max game between clients and the discriminator. Moreover, catastrophic forgetting may happen during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. Towards this challenge, we design the less-forgetting regularization for both local training and global knowledge transfer to guarantee clients' ability to transfer/learn knowledge to/from others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# アンチエイリアスレンダリングのためのマルチスケール3次元ガウススプラッティング Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering ( http://arxiv.org/abs/2311.17089v2 ) ライセンス: Link先を確認	Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee,	(参考訳) 3Dガウシアンは最近、3D再構成とレンダリングの非常に効率的な表現として現れた。高精細度で高精細度で高精細度でレンダリングするが、低精細度や遠方のカメラ位置でレンダリングすると大幅に劣化する。低解像度または遠距離レンダリングにおいて、画像の画素サイズは、スティングされた各3Dガウスの画面サイズと比較してニキスト周波数以下になり、エイリアス効果をもたらす。レンダリングは1ピクセルあたりのよりスプティングされたガウスアンの連続したアルファブレンディングによって劇的に遅くなる。これらの問題に対処するために,異なるスケールでガウスを維持できるマルチスケール3次元ガウススプラッティングアルゴリズムを提案する。高解像度画像はより小さなガウスでレンダリングされ、低解像度画像はより小さなガウスでレンダリングされる。同様のトレーニング時間で,本アルゴリズムは,1次元ガウス分割よりも4$\times$-128$\times$スケールレンダリングで,13\%-66\% PSNRと160\%-2400\%のレンダリング速度を達成できる。私たちのコードや他の結果は、プロジェクトのWebサイトhttps://jokeryan.github.io/projects/ms-gs/で公開されています。 3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite its high rendering quality and speed at high resolutions, they both deteriorate drastically when rendered at lower resolutions or from far away camera position. During low resolution or far away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian and leads to aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13\%-66\% PSNR and 160\%-2400\% rendering speed improvement at 4$\times$-128$\times$ scale rendering on Mip-NeRF360 dataset compared to the single scale 3D Gaussian splitting. Our code and more results are available on our project website https://jokeryan.github.io/projects/ms-gs/	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# Brainformer:fMRIによる人間の視覚脳機能とマシンビジョンモデル Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI ( http://arxiv.org/abs/2312.00236v2 ) ライセンス: Link先を確認	Xuan-Bac Nguyen, Xin Li, Pawan Sinha, Samee U. Khan, Khoa Luu,	(参考訳) 人間の知覚は、信念を形成し、現実を理解する上で重要な役割を果たす。脳機能のより深い理解は、新しいディープニューラルネットワークの開発につながるだろう。本研究では,人間の知覚システムにおける機能的磁気共鳴イメージング(fMRI)パターンを機械学習の観点から解析するための,単純で効果的なトランスフォーマーベースフレームワークであるBrainformerを提案する。具体的には,fMRI信号を用いて脳活動パターンを探索するマルチスケールfMRI変換器を提案する。このアーキテクチャは、高次元fMRI信号符号化のためのシンプルだが効率的なモジュールを含み、3D Voxels Embeddingと呼ばれる新しい埋め込み技術が組み込まれている。第2に、脳の関心領域の機能からインスピレーションを得て、脳fMRI誘導損失と呼ばれる新しい損失関数を導入する。この損失関数は、fMRIデータを用いてディープニューラルネットワークのこれらの領域からの脳活動パターンを模倣する。この研究は、人間の知覚からニューラルネットワークへ知識を伝達する先進的なアプローチを導入する。実験により、fMRI情報を活用することで、様々な画像認識タスクにおいて、マシンビジョンモデルがState-of-the-Artメソッドに匹敵する結果が得られることを示した。 Human perception plays a vital role in forming beliefs and understanding reality. A deeper understanding of brain functionality will lead to the development of novel deep neural networks. In this work, we introduce a novel framework named Brainformer, a straightforward yet effective Transformer-based framework, to analyze Functional Magnetic Resonance Imaging (fMRI) patterns in the human perception system from a machine-learning perspective. Specifically, we present the Multi-scale fMRI Transformer to explore brain activity patterns through fMRI signals. This architecture includes a simple yet efficient module for high-dimensional fMRI signal encoding and incorporates a novel embedding technique called 3D Voxels Embedding. Secondly, drawing inspiration from the functionality of the brain's Region of Interest, we introduce a novel loss function called Brain fMRI Guidance Loss. This loss function mimics brain activity patterns from these regions in the deep neural network using fMRI data. This work introduces a prospective approach to transfer knowledge from human perception to neural networks. Our experiments demonstrate that leveraging fMRI information allows the machine vision model to achieve results comparable to State-of-the-Art methods in various image recognition tasks.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# ImputeFormer: 一般化可能な時空間インプットのための低ランク変換器 ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation ( http://arxiv.org/abs/2312.01728v3 ) ライセンス: Link先を確認	Tong Nie, Guoyang Qin, Wei Ma, Yuewen Mei, Jian Sun,	(参考訳) データ不足は、特に時空間データのモデリングにおいて、科学と工学の両方のタスクにおいて広範囲にわたる問題である。この問題はデータ駆動型ソリューションに貢献するために多くの研究を惹きつける。既存の計算ソリューションには、主に低ランクモデルとディープラーニングモデルが含まれる。前者は一般的な構造上の先行を前提としているが、モデル能力は限られている。後者は表現性の健全な特徴を持っているが、基礎となる時空間構造についての事前の知識が欠けている。 2つのパラダイムの強みを生かして、強い帰納バイアスと高いモデル表現率のバランスをとるために、低ランク化誘起変換器を実証する。時空間データの固有構造を活用することにより、バランスの取れた信号-雑音表現を学習し、様々な計算問題に対して一般化することができる。交通流,太陽エネルギー,スマートメーター,空気品質など,異種データセットの精度,効率,汎用性において,その優位性を示す。実証結果の証明は、低ランク性のような時系列プリミティブを組み込むことで、広範囲の時空間計算問題にアプローチする一般化可能なモデルの開発を大幅に促進できるという強い信念を与える。 Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem attracts many studies to contribute to data-driven solutions. Existing imputation solutions mainly include low-rank models and deep learning models. The former assumes general structural priors but has limited model capacity. The latter possesses salient features of expressivity but lacks prior knowledge of the underlying spatiotemporal structures. Leveraging the strengths of both two paradigms, we demonstrate a low rankness-induced Transformer to achieve a balance between strong inductive bias and high model expressivity. The exploitation of the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it generalizable for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and versatility in heterogeneous datasets, including traffic flow, solar energy, smart meters, and air quality. Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rankness, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# FaultFormer: 適応型ベアリング故障分類のための事前学習用変換器 FaultFormer: Pretraining Transformers for Adaptable Bearing Fault Classification ( http://arxiv.org/abs/2312.02380v3 ) ライセンス: Link先を確認	Anthony Zhou, Amir Barati Farimani,	(参考訳) グローバルな消費の増大は、ディープラーニングのスマート製造や機械の健康モニタリングへの重要な応用を動機付けてきた。特に、振動データの解析は、軸受欠陥の検出により予測保守に関する有意義な洞察を抽出する大きな可能性を秘めている。ディープラーニングは、これらの機械的故障を予測する強力な方法だが、新しいタスクやデータセットへの一般化性に欠け、高価なラベル付き機械的データを必要とする。本稿では,トランスモデルに基づく自己教師型事前学習および微調整フレームワークを提案することで,この問題に対処する。特に、トランスモデルを用いて、最先端の精度に到達するための異なるトークン化とデータ拡張戦略について検討する。さらに、振動信号に対する自己教師付きマスクプリトレーニングとその低データ状態、タスク適応、データセット適応への応用を実証する。事前トレーニングは、不足した未確認のトレーニングサンプルのパフォーマンス向上と、事前トレーニングディストリビューション以外の障害クラスを微調整する際のパフォーマンス向上を可能にする。さらに、事前訓練されたトランスフォーマーは、数ショットで異なるデータセットに一般化できることが示されている。このパラダイムでは、異なるベアリング、障害、機械からラベル付けされていないデータに基づいてモデルを事前訓練し、特定の製造ニーズに合った、新しいデータ共有アプリケーションに素早くデプロイすることが可能になる。 The growth of global consumption has motivated important applications of deep learning to smart manufacturing and machine health monitoring. In particular, analyzing vibration data offers great potential to extract meaningful insights into predictive maintenance by the detection of bearing faults. Deep learning can be a powerful method to predict these mechanical failures; however, they lack generalizability to new tasks or datasets and require expensive, labeled mechanical data. We address this by presenting a novel self-supervised pretraining and fine-tuning framework based on transformer models. In particular, we investigate different tokenization and data augmentation strategies to reach state-of-the-art accuracies using transformer models. Furthermore, we demonstrate self-supervised masked pretraining for vibration signals and its application to low-data regimes, task adaptation, and dataset adaptation. Pretraining is able to improve performance on scarce, unseen training samples, as well as when fine-tuning on fault classes outside of the pretraining distribution. Furthermore, pretrained transformers are shown to be able to generalize to a different dataset in a few-shot manner. This introduces a new paradigm where models can be pretrained on unlabeled data from different bearings, faults, and machinery and quickly deployed to new, data-scarce applications to suit specific manufacturing needs.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# マシンビジョンセラピー:マルチモーダルな大規模言語モデルでは、文脈内学習による視覚的ロバスト性を高めることができる Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning ( http://arxiv.org/abs/2312.02546v2 ) ライセンス: Link先を確認	Zhuo Huang, Chang Liu, Yinpeng Dong, Hang Su, Shibao Zheng, Tongliang Liu,	(参考訳) Contrastive Language-Image Pre-Training (CLIP) のような視覚モデルは、優れた一般化性能を示すが、そのゼロショットのロバスト性は、微調整なしではout-of-Distribution (OOD) のシナリオで制限されている。一般的に行われるように、人間の監督を好ましく提供するのではなく、強力な視覚的理解能力を持つマルチモーダルな大規模言語モデル(MLLM)を利用することができる。しかし、MLLMはタスクの不整合性により視覚問題に苦しむことが示され、その利用を妨げている。本稿では,MLLMを効果的に活用して,視覚モデルからノイズ予測を補正するマシンビジョンセラピーを提案する。復調ラベルを微調整することにより、教師なしの方法で学習モデルの性能を高めることができる。不整合性問題を解決するために,視覚タスクをMLLMと整合させる新しいDICL戦略を提案する。具体的には、あるクラスが他のクラスと混同される確率を推定する遷移行列を推定することにより、最も確率の高いノイズクラスから正しい例と間違った例を含む命令を構築することができる。このような命令は、ICL能力を持つ任意のMLLMにおいて、視覚モデルの誤った予測を検出し、修正するのに役立つ。 ImageNet、WILDS、DomainBed、その他のOODデータセットに関する広範な実験を通じて、本手法の定量的かつ定性的な効果を慎重に検証する。私たちのコードはhttps://github.com/tmllab/Machine_Vision_Therapyで利用可能です。 Although vision models such as Contrastive Language-Image Pre-Training (CLIP) show impressive generalization performance, their zero-shot robustness is still limited under Out-of-Distribution (OOD) scenarios without fine-tuning. Instead of undesirably providing human supervision as commonly done, it is possible to take advantage of Multi-modal Large Language Models (MLLMs) that hold powerful visual understanding abilities. However, MLLMs are shown to struggle with vision problems due to the incompatibility of tasks, thus hindering their utilization. In this paper, we propose to effectively leverage MLLMs to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models. By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner. To solve the incompatibility issue, we propose a novel Denoising In-Context Learning (DICL) strategy to align vision tasks with MLLMs. Concretely, by estimating a transition matrix that captures the probability of one class being confused with another, an instruction containing a correct exemplar and an erroneous one from the most probable noisy class can be constructed. Such an instruction can help any MLLMs with ICL ability to detect and rectify incorrect predictions of vision models. Through extensive experiments on ImageNet, WILDS, DomainBed, and other OOD datasets, we carefully validate the quantitative and qualitative effectiveness of our method. Our code is available at https://github.com/tmllab/Machine_Vision_Therapy.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# 自己監督型事前訓練とカスタマイズ型ファインチューニングを用いた変圧器によるレーンレンダリングの知的異常検出 Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning ( http://arxiv.org/abs/2312.04398v2 ) ライセンス: Link先を確認	Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah,	(参考訳) デジタルマップを使った急成長するナビゲーションサービスは、ドライバーにとって非常に便利だ。それでも、レーンレンダリングマップ画像における異常の存在は、しばしば潜在的な危険をもたらし、そのような異常は人間の運転者に誤解を与え、結果として安全でない運転条件に寄与する。そこで本論文では,データ前処理,マスク付き画像モデリング(MiM)手法による自己教師型事前学習,ラベル平滑化によるクロスエントロピーベース損失を用いた微調整,そして後処理により,最先端のディープラーニング技術,特にトランスフォーマーモデルを用いた4相パイプラインを提案する。提案したパイプラインの有効性を検証した各種実験を行った。その結果,提案パイプラインはレーンレンダリング画像異常検出において優れた性能を示し,特にMiMを用いた自己教師付き事前学習は,全体のトレーニング時間を著しく短縮し,検出精度を大幅に向上させることができることがわかった。例えば、Uniform Maskingを自己教師付きプレトレーニング(Swin-Trans-UM)として使用すると、94.77%の精度が得られ、AUCスコアは0.9743となり、プレトレーニングのない純粋なSwin Transformer(Swin-Trans)は94.01%、AUCは0.9498となった。微調整のエポックは、オリジナルの280から41に劇的に縮小された。結論として,MiMや他の先進的なディープラーニング技術を用いた自己教師付き事前学習を取り入れたパイプラインが,デジタルナビゲーションシステムにおけるレーンレンダリング画像異常検出の精度と効率を高めるための堅牢なソリューションとして登場した。 The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# AIベースのリアクティブシステムによるサイバー攻撃の対処 - 全体論と今後の展望 Tackling Cyberattacks through AI-based Reactive Systems: A Holistic Review and Future Vision ( http://arxiv.org/abs/2312.06229v2 ) ライセンス: Link先を確認	Sergio Bernardez Molina, Pantaleone Nespoli, Félix Gómez Mármol,	(参考訳) 情報技術(IT)の利用が、今日の世界で指数的な成長を遂げていることは否定できない。このデジタルトランスフォーメーションは、サイバー犯罪の領域において、数多くのセキュリティ上の課題を引き起こしている。こうした脅威に応えて、公共部門と民間部門はITセキュリティ対策の強化を優先している。セキュリティ上の懸念が高まる中、人工知能(AI)はサイバーセキュリティの世界で注目を集めている。本稿では,AIによる脅威応答システムの最近の進歩を包括的に調査する。私たちの知る限り、AI反応ドメインに関する最新の調査は2017年に実施された。それ以来、多くの文献が出版されており、レビューする価値がある。最先端の反応系に関する包括的調査では、複数の値を持つ5つの重要な特徴が同定され、異なる研究間の均質な比較が促進された。さらに、記事収集の厳密な方法論を通じて、この分野で最も関係のある22の出版物が選択されている。その後、これらの出版物は、識別された特徴を用いて詳細な分析の対象となり、論文間の重要な関係を明らかにする包括的な概要を生成できるようになった。これらの関係は、文学における潜在的なギャップの同定とともに、論文でさらに詳しく説明され、将来的な貢献を導く可能性がある。これらの潜在的なギャップを指摘し、具体的な提案を通じて可能な開発領域を提案することで、合計7つの研究課題が特定されている。 There is no denying that the use of Information Technology (IT) is undergoing exponential growth in today's world. This digital transformation has also given rise to a multitude of security challenges, notably in the realm of cybercrime. In response to these growing threats, public and private sectors have prioritized the strengthening of IT security measures. In light of the growing security concern, Artificial Intelligence (AI) has gained prominence within the cybersecurity landscape. This paper presents a comprehensive survey of recent advancements in AI-driven threat response systems. To the best of our knowledge, the most recent survey covering the AI reaction domain was conducted in 2017. Since then, considerable literature has been published, and therefore, it is worth reviewing it. In this comprehensive survey of the state of the art reaction systems, five key features with multiple values have been identified, facilitating a homogeneous comparison between the different works. In addition, through a meticulous methodology of article collection, the 22 most relevant publications in the field have been selected. Then each of these publications has been subjected to a detailed analysis using the features identified, which has allowed for the generation of a comprehensive overview revealing significant relationships between the papers. These relationships are further elaborated in the paper, along with the identification of potential gaps in the literature, which may guide future contributions. A total of seven research challenges have been identified, pointing out these potential gaps and suggesting possible areas of development through concrete proposals.	翻訳日:2024-05-31 00:00:32 公開日:2024-05-29
# テキストガイドでリアル世界のイメージをデノイング Tell Me What You See: Text-Guided Real-World Image Denoising ( http://arxiv.org/abs/2312.10191v2 ) ライセンス: Link先を確認	Erez Yosef, Raja Giryes,	(参考訳) ノイズセンサによる画像再構成は難しい問題である。多くの解決策が提案されているが、主なアプローチは、シーンのノイズの真の統計をモデル化すると共に、優れた自然像を事前に学習することである。非常に低い照明条件下では、そのようなアプローチは通常不十分であり、例えば、複数のキャプチャーを使用するという形で追加情報が必要である。我々は,シーンを撮影する撮影者が容易に行えるように,シーンを事前に記述する代替手段として提案する。画像生成における拡散モデルの成功に触発されて,テキスト誘導拡散モデルを用いて,画像キャプション情報の追加は,合成画像と実画像の両方において,画像の復調と再構成を著しく改善することを示す。 Image reconstruction from noisy sensor measurements is a challenging problem. Many solutions have been proposed for it, where the main approach is learning good natural images prior along with modeling the true statistics of the noise in the scene. In the presence of very low lighting conditions, such approaches are usually not enough, and additional information is required, e.g., in the form of using multiple captures. We suggest as an alternative to add a description of the scene as prior, which can be easily done by the photographer capturing the scene. Inspired by the remarkable success of diffusion models for image generation, using a text-guided diffusion model we show that adding image caption information significantly improves image denoising and reconstruction on both synthetic and real-world images.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# SICsとTriangle Group$(3,3,3)$ SICs and the Triangle Group $(3,3,3)$ ( http://arxiv.org/abs/2312.13400v2 ) ライセンス: Link先を確認	Danylo Yakymenko,	(参考訳) 対称情報完備な正値測度 (SICs for short) がすべての次元に存在するという問題は、ザウナー予想として知られており、今日まで残っている。既知のSICの例のほとんどは、ワイル・ハイゼンベルク群の作用の軌道として構成されている。このような場合、SICは、ワイル・ハイゼンベルク群の自己同型を定義する、いわゆる正準三次ユニタリの下で不変であるようである。ここでは、これらの位数 3 個のユニタリが三角形群 $(3,3,3)$ の射影ユニタリ表現に現れることを示す。このような表現の完全な記述と、正準三次ユニタリの構造に関する結果を得るためにどのように使用できるかを示す。特に、任意の正準位数 3 のユニタリが、次元 $d>3$ が素数であれば、ザウナーのユニタリに共役であるという事実を証明する別の方法を示す。 The problem of existence of symmetric informationally-complete positive operator-valued measures (SICs for short) in every dimension is known as Zauner's conjecture and remains open to this day. Most of the known SIC examples are constructed as an orbit of the Weyl-Heisenberg group action. It appears that in these cases SICs are invariant under the so-called canonical order-three unitaries, which define automorphisms of the Weyl-Heisenberg group. In this note, we show that those order-three unitaries appear in projective unitary representations of the triangle group $(3,3,3)$. We give a full description of such representations and show how it can be used to obtain results about the structure of canonical order-three unitaries. In particular, we present an alternative way of proving the fact that any canonical order-three unitary is conjugate to Zauner's unitary if the dimension $d>3$ is prime.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 平面符号による二項符号の結合 Concatenating Binomial Codes with the Planar Code ( http://arxiv.org/abs/2312.14390v2 ) ライセンス: Link先を確認	Juliette Soule, Andrew C. Doherty, Arne L. Grimsmo,	(参考訳) 回転対称ボソニック符号(英: Rotation symmetric bosonic codes)は、特に超伝導量子ビット実験において、量子ビットの振動度に魅力的な符号化法である。これらのコードはかなりの損失とデファス化を許容するが、大規模なデバイスを実現するためには、より高いレベルのコードと組み合わせる必要がある。耐故障性量子計算のための計測に基づくスキームにおいて,これらの符号と平面符号の整合性について検討する。本稿では,基本レベルエンコーディングとしての二項符号に着目し,様々な種類の測定プロトコルにおいて損失を受ける場合のブレークフェアポイントを推定する。これらの符号は光子損失誤差に耐性があるが、ゲート演算と測定には平均光子数と位相分解能の両方を必要とする。二項符号量子ビットを用いた平面符号の優れた性能を得るために、適応位相測定、最大量子状態推定、重み付き最小重み復号法を実装する必要がある。 Rotation symmetric bosonic codes are an attractive encoding for qubits into oscillator degrees of freedom, particularly in superconducting qubit experiments. While these codes can tolerate considerable loss and dephasing, they will need to be combined with higher level codes to achieve large-scale devices. We investigate concatenating these codes with the planar code in a measurement-based scheme for fault-tolerant quantum computation. We focus on binomial codes as the base level encoding, and estimate break-even points for such encodings under loss for various types of measurement protocol. These codes are more resistant to photon loss errors, but require both higher mean photon numbers and higher phase resolution for gate operations and measurements. We find that it is necessary to implement adaptive phase measurements, maximum likelihood quantum state inference, and weighted minimum weight decoding to obtain good performance for a planar code using binomial code qubits.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 多項集合に対する排他的有限時間相関関数:量子輸送と熱力学の理論的枠組みの連結 Exact finite-time correlation functions for multi-terminal setups: Connecting theoretical frameworks for quantum transport and thermodynamics ( http://arxiv.org/abs/2312.15065v3 ) ライセンス: Link先を確認	Gianmichele Blasi, Shishir Khandelwal, Géraldine Haack,	(参考訳) オープン量子系の輸送は、量子マスター方程式、散乱行列、ハイゼンベルク運動方程式など、様々な理論的な枠組みを通して探索することができる。フレームワークの選択は、相互作用の存在、システムと環境の間の結合強度、定常状態か過渡状態かといった要因に依存する。既存の文献はこれらのフレームワークを独立して扱い、統一された視点を欠いている。我々の研究は、電圧と温度の偏りの下での2段階のセットアップにおいて、最小レベルの量子ドットモデルを用いて、これらのアプローチの役割と現状を明らかにすることで、このギャップに対処する。粒子およびエネルギー電流の解析式と定常状態と過渡状態の両方における変動を導出する。ハイゼンベルク方程式の厳密な結果は、それぞれの妥当性条件の中で散乱行列やマスター方程式のアプローチと一致していることが示される。重要なことは、Heisenberg や散乱行列による弱カップリングにおけるマスター方程式の適用性を任意の結合強度でブリッジし、弱カップリング限界のプロトコルを確立することである。 Transport in open quantum systems can be explored through various theoretical frameworks, including the quantum master equation, scattering matrix, and Heisenberg equation of motion. The choice of framework depends on factors such as the presence of interactions, the coupling strength between the system and environment, and whether the focus is on steady-state or transient regimes. Existing literature treats these frameworks independently, lacking a unified perspective. Our work addresses this gap by clarifying the role and status of these approaches using a minimal single-level quantum dot model in a two-terminal setup under voltage and temperature biases. We derive analytical expressions for particle and energy currents and their fluctuations in both steady-state and transient regimes. Exact results from the Heisenberg equation are shown to align with scattering matrix and master equation approaches within their respective validity regimes. Crucially, we establish a protocol for the weak-coupling limit, bridging the applicability of master equations at weak-coupling with Heisenberg or scattering matrix approaches at arbitrary coupling strength.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# ファジィドライバ生成のためのプロンプトファジィ Prompt Fuzzing for Fuzz Driver Generation ( http://arxiv.org/abs/2312.17677v2 ) ライセンス: Link先を確認	Yunlong Lyu, Yuxuan Xie, Peng Chen, Hao Chen,	(参考訳) 高品質なファジィドライバを構築するには時間を要するだけでなく、ライブラリの深い理解も必要です。しかし、最先端のファズドライバ自動生成技術は期待に届かなかった。消費者コードから派生したファジィドライバは深い州に到達できるが、カバー範囲は限られている。逆に、解釈ファジィはほとんどのAPI呼び出しを探索できるが、大規模な検索空間内では数多くの試行が必要である。 PromptFuzzは,未知のライブラリコードを探索するために,ファジィドライバを反復的に生成するファジィ処理を行う,カバーガイド付ファジィファジィである。ファジィファジィ処理におけるファジィドライバのAPI使用法を検討するために,命令型プログラム生成,誤プログラム検証,カバレッジ誘導型プロンプト突然変異,制約付きファジィスケジューリングなど,いくつかの重要な手法を提案する。 PromptFuzzを実装し,14の現実世界のライブラリで評価した。 OSS-FuzzとHopper(最先端のファズドライバ生成ツール)と比較して、PromptFuzzが生成したファズドライバはそれぞれOSS-FuzzとHopperのブランチカバレッジの1.61倍と1.63倍を達成した。さらに、PromptFuzzが生成したファズドライバは、合計49回のクラッシュのうち33回の真に新しいバグを検出し、そのうち30回のバグがそれぞれのコミュニティによって確認されている。 Crafting high-quality fuzz drivers not only is time-consuming but also requires a deep understanding of the library. However, the state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep states, they have limited coverage. Conversely, interpretative fuzzing can explore most API calls but requires numerous attempts within a large search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program validation, coverage-guided prompt mutation, and constrained fuzzer scheduling. We implemented PromptFuzz and evaluated it on 14 real-world libraries. Compared with OSS-Fuzz and Hopper (the state-of-the-art fuzz driver generation tool), fuzz drivers generated by PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than those by OSS-Fuzz and Hopper, respectively. Moreover, the fuzz drivers generated by PromptFuzz detected 33 genuine, new bugs out of a total of 49 crashes, out of which 30 bugs have been confirmed by their respective communities.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# タブラルデータの自動モデル選択 Automated Model Selection for Tabular Data ( http://arxiv.org/abs/2401.00961v2 ) ライセンス: Link先を確認	Avinash Amballa, Gayathri Akkinapalli, Manas Madine, Naga Pavana Priya Yarrabolu, Przemyslaw A. Grabowicz,	(参考訳) 表形式のデータセットの形式で構造化されたデータには、異なる、離散的な特徴が含まれており、個々の重要度とターゲットに対する相対的重要性は様々である。 1つ以上の機能の組み合わせは、単純な個々の機能コントリビューションよりも予測的かつ有意義なものです。 Rの混合効果線形モデルライブラリは、モデル設計においてそのようなインタラクティブな機能の組み合わせを提供することができる。しかし、多くの特徴とそこから選択できる相互作用を考えると、モデル選択は指数関数的に難しいタスクとなる。計算コストを小さく保ちながら特徴的相互作用を取り入れた表型データセットの予測のためのモデル選択プロセスを自動化することを目的としている。このフレームワークには、優先順位ベースのランダムグリッド検索とグレディ検索という、2つの異なる機能選択のアプローチが含まれている。優先性に基づくアプローチでは、事前確率を用いて特徴組合せを効率的に探索し、探索を誘導する。 Greedyメソッドは、その影響に基づいて機能を追加したり削除したりすることで、反復的にソリューションを構築します。合成実験は、予測的特徴の組み合わせを効果的に捉える能力を示す。 Structured data in the form of tabular datasets contain features that are distinct and discrete, with varying individual and relative importances to the target. Combinations of one or more features may be more predictive and meaningful than simple individual feature contributions. R's mixed effect linear models library allows users to provide such interactive feature combinations in the model design. However, given many features and possible interactions to select from, model selection becomes an exponentially difficult task. We aim to automate the model selection process for predictions on tabular datasets incorporating feature interactions while keeping computational costs small. The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method. The Priority-based approach efficiently explores feature combinations using prior probabilities to guide the search. The Greedy method builds the solution iteratively by adding or removing features based on their impact. Experiments on synthetic demonstrate the ability to effectively capture predictive feature combinations.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 変動創発予測のためのグラフニューラルネットワークのダイナミクスに基づく特徴増強 Dynamics-based Feature Augmentation of Graph Neural Networks for Variant Emergence Prediction ( http://arxiv.org/abs/2401.03390v2 ) ライセンス: Link先を確認	Majd Al Aawar, Srikar Mutnuri, Mansooreh Montazerin, Ajitesh Srivastava,	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックで、新型ウイルスの出現が大きな要因となっている。 1つ以上の国で新しい変種が出現すると、他の国は、その潜在的な到着に備えてその拡散を監視する。新しい変種の影響と流行のピークのタイミングは、変種がいつ到着するかに大きく依存する。新たな変種の普及を予測するための現在の手法は統計モデルに依存しているが、これらの手法は、新しい変種が既に関心のある領域に到達し、有意な有病率を持つ場合にのみ有効である。既存の変種が特定のリージョンにいつ到着するかを予測できますか? この問題に対処するために,変量力学インフォームドグラフニューラルネット(GNN)アプローチを提案する。まず、大規模な流行モデルに適用可能な地域(国)のペア間での変動有病率のダイナミクスを導出する。このダイナミクスは、GNNに特定の機能を導入する動機となっている。提案した動的インフォームドGNNは,現在普及している物理インフォームドニューラルネットワーク(PINN)のフレームワークを含む,すべてのベースラインより優れていることを示す。そこで本研究では,87か国,36変種を対象に,ユーザ定義モデルの予測性能を評価するベンチマークツールを提案する。 During the COVID-19 pandemic, a major driver of new surges has been the emergence of new variants. When a new variant emerges in one or more countries, other nations monitor its spread in preparation for its potential arrival. The impact of the new variant and the timings of epidemic peaks in a country highly depend on when the variant arrives. The current methods for predicting the spread of new variants rely on statistical modeling, however, these methods work only when the new variant has already arrived in the region of interest and has a significant prevalence. Can we predict when a variant existing elsewhere will arrive in a given region? To address this question, we propose a variant-dynamics-informed Graph Neural Network (GNN) approach. First, we derive the dynamics of variant prevalence across pairs of regions (countries) that apply to a large class of epidemic models. The dynamics motivate the introduction of certain features in the GNN. We demonstrate that our proposed dynamics-informed GNN outperforms all the baselines, including the currently pervasive framework of Physics-Informed Neural Networks (PINNs). To advance research in this area, we introduce a benchmarking tool to assess a user-defined model's prediction performance across 87 countries and 36 variants.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# SynHing: グラフ学習と説明のための合成異種情報ネットワーク生成 SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation ( http://arxiv.org/abs/2401.04133v2 ) ライセンス: Link先を確認	Ming-Yi Hong, Yi-Hsiang Huang, Shao-En Lin, You-Chen Teng, Chih-Yu Wang, Che Lin,	(参考訳) グラフニューラルネットワーク(GNN)は、コミュニティ分析やレコメンデーションシステムなど、さまざまな領域でグラフ構造を記述している。 GNNの解釈がますます重要になるにつれて、堅牢なベースラインと拡張グラフデータセットの需要は、特に異種情報ネットワーク(HIN)の文脈において強調される。そこで我々はSynHingを紹介した。SynHingはグラフ学習と説明の強化を目的としたSynthetic Heterogeneous Information Network Generationの新しいフレームワークである。 SynHingは、ターゲットHINの主要なモチーフを体系的に識別し、クラスタ内およびクラスタ間マージモジュールによるボトムアップ生成プロセスを使用する。この過程は、後処理技術によって補われ、合成HINが元のグラフの構造的および統計的性質を密接に反映することを保証する。重要な点として、SynHingはGNN説明モデルの評価、説明可能な合成HIN生成のための新しい標準の設定、複雑なネットワークにおける解釈可能な機械学習の進歩に寄与する、地道なモチーフを提供する。 Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 拡散モデルに基づく効率的な画像分解ネットワーク Efficient Image Deblurring Networks based on Diffusion Models ( http://arxiv.org/abs/2401.05907v2 ) ライセンス: Link先を確認	Kang Chen, Yuanjie Liu,	(参考訳) 本稿では,デフォーカスデブロアリングのためのスライディングウインドウモデル,Swintormerについて述べる。この方法は拡散モデルを用いて遅延前の特徴を生成し、より詳細な画像の復元を支援する。さらに、スライドウィンドウ戦略を適用することで、推論効率を高めるために、特殊なTransformerブロックが組み込まれている。この新しいアプローチの採用により、イテレーション毎にMAC(Multiply-Accumulate Operations)が大幅に削減され、メモリ要件が大幅に削減された。現在のGRL法と比較して、Swintormerモデルはメモリ容量に依存する計算負荷を140.35 GMACsから8.02 GMACsに大幅に削減し、デフォーカスのデフォーカスを27.04 dBから27.07 dBに改善した。この革新的な技術は、メモリ制限されたデバイス上での高解像度画像の処理を可能にし、潜在的なアプリケーションシナリオを大幅に広げる。この記事では、それぞれのネットワークモジュールが最終的なパフォーマンスにどのように貢献するかを網羅的に調査するアブレーションスタディをまとめて紹介する。ソースコードとモデルは、以下のWebサイトで利用可能になる。 This article presents a sliding window model for defocus deblurring, named Swintormer, which achieves the best performance to date with remarkably low memory usage. This method utilizes a diffusion model to generate latent prior features, aiding in the restoration of more detailed images. Additionally, by adapting the sliding window strategy, it incorporates specialized Transformer blocks to enhance inference efficiency. The adoption of this new approach has led to a substantial reduction in Multiply-Accumulate Operations (MACs) per iteration, drastically cutting down memory requirements. In comparison to the currently leading GRL method, our Swintormer model significantly reduces the computational load that must depend on memory capacity, from 140.35 GMACs to 8.02 GMACs, while improving the Peak Signal-to-Noise Ratio (PSNR) for defocus deblurring from 27.04 dB to 27.07 dB. This innovative technique enables the processing of higher resolution images on memory-limited devices, vastly broadening potential application scenarios. The article wraps up with an ablation study, offering a comprehensive examination of how each network module contributes to the final performance.The source code and model will be available at the following website: https://github.com/bnm6900030/swintormer.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 大規模言語モデルのためのコードシミュレーションの課題 Code Simulation Challenges for Large Language Models ( http://arxiv.org/abs/2401.09074v3 ) ライセンス: Link先を確認	Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge,	(参考訳) 多くの推論、計画、問題解決タスクは本質的なアルゴリズムの性質を共有している。この研究は、Large Language Models (LLM) がいかにコーディングとアルゴリズムタスクをシミュレートし、そのようなアルゴリズム推論タスクにおける一般的な機能についての洞察を提供するかを研究する。我々は、直線プログラムのベンチマーク、クリティカルパスを含むコード、近似命令および冗長命令を導入する。さらに,アルゴリズムのソートとネストループによるLLMのシミュレーション能力を評価し,ルーチンの計算複雑性がLLMの実行をシミュレートする能力に直接影響を与えることを示す。最も強力なLCMは比較的強力なシミュレーション能力を示すが、このプロセスは脆弱であり、パターン認識に大きく依存しており、記憶の影響を受けている。本稿では,コンパイラの計算パターンを行/追従することによって,LLMにコード実行行をシミュレートするように指示する,既成の計算処理手法であるChain of Simulation(CoSm)を提案する。 CoSmは、シミュレーション性能を改善しながら、LLMの記憶と浅いパターン認識を効率的に行う。コードシミュレーションにおけるCoSmの成功は、他の一般的なシミュレーション推論タスクにインスピレーションを与えるものだと考えている。 Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine simulation reasoning tasks.	翻訳日:2024-05-30 23:50:38 公開日:2024-05-29
# 大規模言語モデル時代の進化的計算:サーベイとロードマップ Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap ( http://arxiv.org/abs/2401.10034v3 ) ライセンス: Link先を確認	Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, Kay Chen Tan,	(参考訳) 大規模言語モデル (LLMs) は自然言語処理に革命をもたらしただけでなく、様々な分野にも進出し、人工知能への大きな一歩を踏み出した。 LLMと進化的アルゴリズム(EA)の相互作用は、目的や方法論の違いにもかかわらず、複雑な問題に適用可能性の共通の追求を共有している。一方、EAは、ブラックボックス設定下でのLLMのさらなる拡張のための最適化フレームワークを提供し、柔軟性のあるグローバル検索能力を持つLLMに権限を与えることができる。一方、LLMに固有の豊富なドメイン知識により、EAはよりインテリジェントな検索を行うことができる。さらに、LLMのテキスト処理と生成能力は、幅広いタスクにまたがってEAをデプロイするのに役立ちます。本稿では,これらの相補的優位性に基づいて,相互インスピレーションを LLM 強化 EA と EA 強化 LLM の2つの主要経路に分類する,徹底的なレビューと,先進的なロードマップを提供する。コード生成、ソフトウェア工学、ニューラルアーキテクチャ探索、および様々な生成タスクを含む様々なシナリオにおいて、LLMとEAの相補性を実証するために、いくつかの統合されたシナジー手法が導入された。 LLM時代のEA研究に焦点をあてた最初の総合的なレビューとして、本論文はLLMとEAの協調可能性を理解するための基礎的な足場を提供する。特定された課題と今後の方向性は、研究者や実践者が、最適化と人工知能の進歩を推進し、この革新的なコラボレーションの可能性を最大限に解き放つためのガイダンスを提供する。私たちは、関連する論文をインデックスするGitHubリポジトリを作成しました。 Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can provide an optimization framework for LLM's further enhancement under black-box settings, empowering LLM with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EA to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on the EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence. We have created a GitHub repository to index the relevant papers: https://github.com/wuxingyu-ai/LLM4EC.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 交通流予測のためのハイブリッド時変グラフニューラルネットワーク A novel hybrid time-varying graph neural network for traffic flow forecasting ( http://arxiv.org/abs/2401.10155v3 ) ライセンス: Link先を確認	Benao Dai, Baolin Ye, Lingxi Li,	(参考訳) これらの課題を克服するために,交通流予測のためのハイブリッド時変グラフニューラルネットワーク(HTVGNN)を提案する。まず、時間変化マスク強化に基づく新しいマルチアテンション機構を報告し、トラフィックネットワーク内の異なるトラフィックノード間の動的時間依存性をより正確にモデル化した。次に,道路ネットワークにおける異なる交通ノード間の静的および動的空間的関連を同時に学習するグラフ学習手法を提案する。一方、時間変化グラフの学習能力を高めるために、各時間ステップで学習したグラフを結合するグラフ学習機構が設計された。最後に,提案手法の有効性を4つの実データを用いて実証した。シミュレーションの結果,HTVGNNは最先端の時空間グラフニューラルネットワークモデルと比較して予測精度が優れていることがわかった。さらに、このアブレーション実験により、結合グラフ学習機構がHTVGNNの長期予測性能を効果的に向上できることを確認した。 In order to overcome these challenges, we have proposed a novel hybrid time-varying graph neural network (HTVGNN) for traffic flow prediction. Firstly, a novel time-aware multi-attention mechanism based on time-varying mask enhancement was reported to more accurately model the dynamic temporal dependencies among distinct traffic nodes in the traffic network. Secondly, we have proposed a novel graph learning strategy to concurrently learn both static and dynamic spatial associations between different traffic nodes in road networks. Meanwhile, in order to enhance the learning ability of time-varying graphs, a coupled graph learning mechanism was designed to couple the graphs learned at each time step. Finally, the effectiveness of the proposed method HTVGNN was demonstrated with four real data sets. Simulation results revealed that HTVGNN achieves superior prediction accuracy compared to the state of the art space-time graph neural network models. Additionally, the ablation experiment verifies that the coupled graph learning mechanism can effectively improve the long-term prediction performance of HTVGNN.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 知的障害者の表情認識における深層学習の有効性の評価 Assessing the Efficacy of Deep Learning Approaches for Facial Expression Recognition in Individuals with Intellectual Disabilities ( http://arxiv.org/abs/2401.11877v2 ) ライセンス: Link先を確認	F. Xavier Gaya-Morey, Silvia Ramis, Jose M. Buades-Rubio, Cristina Manresa-Yee,	(参考訳) 表情認識は、ユーザの感情状態を識別する能力を持つ社会ロボットを付与する手段として重要視されている。社会ロボティクスの使用には、家庭、介護施設、保育所など様々な設定が含まれており、幅広い利用者に利用されている。しかし,知的障害者の表情認識の直接的利用は,本研究ではまだ研究されていない。この目的を達成するために、知的障害を持つ個人がいないデータセットの集合や、そのような個人を特徴とするデータセットを含む、さまざまなアプローチで12の畳み込みニューラルネットワークのセットをトレーニングする。本研究の結果は, 知的障害者, 知的障害者, および, 知的障害者の間で, 表情に有意な差異が認められた。注目すべきことに,本研究では,各利用者の表情を効果的に扱えるように調整したユーザ固有の訓練手法により,この集団内での表情認識の必要性が示された。 Facial expression recognition has gained significance as a means of imparting social robots with the capacity to discern the emotional states of users. The use of social robotics includes a variety of settings, including homes, nursing homes or daycare centers, serving to a wide range of users. Remarkable performance has been achieved by deep learning approaches, however, its direct use for recognizing facial expressions in individuals with intellectual disabilities has not been yet studied in the literature, to the best of our knowledge. To address this objective, we train a set of 12 convolutional neural networks in different approaches, including an ensemble of datasets without individuals with intellectual disabilities and a dataset featuring such individuals. Our examination of the outcomes, both the performance and the important image regions for the models, reveals significant distinctions in facial expressions between individuals with and without intellectual disabilities, as well as among individuals with intellectual disabilities. Remarkably, our findings show the need of facial expression recognition within this population through tailored user-specific training methodologies, which enable the models to effectively address the unique expressions of each user.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 厳格なAI監査にはブラックボックスアクセスが不十分 Black-Box Access is Insufficient for Rigorous AI Audits ( http://arxiv.org/abs/2401.14446v3 ) ライセンス: Link先を確認	Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell,	(参考訳) AIシステムの外部監査は、AIガバナンスの重要なメカニズムとして、ますます認識されている。しかし、監査の有効性は監査人に与えられるアクセスの程度に依存する。最近の最先端のAIシステムの監査は、主にブラックボックスアクセスに依存しており、監査官はシステムに問い合わせて出力を観察することしかできない。しかしながら、システムの内部動作(例えば重量、アクティベーション、勾配)へのホワイトボックスアクセスは、監査人がより強力な攻撃を行い、モデルをより徹底的に解釈し、微調整を行うことを可能にする。一方、トレーニングやデプロイメント情報(方法論、コード、ドキュメンテーション、データ、デプロイメントの詳細、内部評価からの発見など)への外部アクセスは、監査人が開発プロセスを精査し、より対象とする評価を設計できるようにします。本稿では,ブラックボックス監査の限界と,ホワイトボックス監査とアウトサイドボックス監査の利点について検討する。また、これらの監査を最小限のセキュリティリスクで実施するための技術的、物理的、法的保護についても論じる。その結果,(1)監査員が使用するアクセスと手法に関する透明性は,監査結果を適切に解釈するには必要であり,(2)ブラックボックスのみよりも,ホワイトボックスとアウト・ザ・ボックスのアクセスの方がかなり精査できることがわかった。 External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 深層学習とオープンアース観測データを用いたグローバル氷河マッピングに向けて Towards Global Glacier Mapping with Deep Learning and Open Earth Observation Data ( http://arxiv.org/abs/2401.15113v2 ) ライセンス: Link先を確認	Konstantin A. Maslov, Claudio Persello, Thomas Schellenberger, Alfred Stein,	(参考訳) 正確な地球規模の氷河マッピングは、気候変動の影響を理解するために重要である。その重要性にもかかわらず、世界規模での自動氷河マッピングはほとんど未調査のままである。本稿では、このギャップに対処し、畳み込み変換型ディープラーニングモデルであるGlaViTU(GlaViTU)を提案する。空間的, 時間的, クロスセンサーの一般化を評価することで, 従来観測されていなかった画像に対して, 我々の最善策は >0.85 の団結を達成し, 高山アジアなどの破片の多い地域では >0.75 まで低下し, クリーンアイスが支配する地域では >0.90 まで上昇することを示す。面積と距離の偏差の点での人間の専門家の不確実性に対する比較検証は、GlaViTUのパフォーマンス、アプローチ、あるいは専門家レベルのデラインの整合性を強調している。合成開口レーダデータ、すなわち後方散乱と干渉コヒーレンスを追加することで、利用可能なすべての領域の精度が向上する。氷河の度合いの調整された信頼性が報告され、予測はより信頼性が高く解釈可能である。また、世界中の氷河の9%をカバーするベンチマークデータセットもリリースしました。本研究は, 自動多時期・グローバル氷河マッピングへの取り組みを支援する。 Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite imagery. Assessing the spatial, temporal and cross-sensor generalisation shows that our best strategy achieves intersection over union >0.85 on previously unobserved images in most cases, which drops to >0.75 for debris-rich areas such as High-Mountain Asia and increases to >0.90 for regions dominated by clean ice. A comparative validation against human expert uncertainties in terms of area and distance deviations underscores GlaViTU performance, approaching or matching expert-level delineation. Adding synthetic aperture radar data, namely, backscatter and interferometric coherence, increases the accuracy in all regions where available. The calibrated confidence for glacier extents is reported making the predictions more reliable and interpretable. We also release a benchmark dataset that covers 9% of glaciers worldwide. Our results support efforts towards automated multitemporal and global glacier mapping.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# マルコフ連鎖モンテカルロの並列アフィン変換チューニング Parallel Affine Transformation Tuning of Markov Chain Monte Carlo ( http://arxiv.org/abs/2401.16567v2 ) ライセンス: Link先を確認	Philip Schär, Michael Habeck, Daniel Rudolf,	(参考訳) マルコフ連鎖モンテカルロサンプリング器の性能は、その共分散構造、確率質量の位置、尾の挙動などのターゲット分布の性質に強く依存する。本研究では, サンプル空間の単射アフィン変換を用いて, 対象分布の特性を向上し, 変換された空間内を走行するサンプリング器の性能を向上する。特に,サンプリング中のアフィン変換を適応的に学習するフレキシブルでユーザフレンドリなスキームを提案する。さらに,本手法とギブシアン極スライスサンプリングを組み合わせることで,実世界のデータに基づいて,比較的低い計算コストで高品質なサンプルを作成できることを示す。 The performance of Markov chain Monte Carlo samplers strongly depends on the properties of the target distribution such as its covariance structure, the location of its probability mass and its tail behavior. We explore the use of bijective affine transformations of the sample space to improve the properties of the target distribution and thereby the performance of samplers running in the transformed space. In particular, we propose a flexible and user-friendly scheme for adaptively learning the affine transformation during sampling. Moreover, the combination of our scheme with Gibbsian polar slice sampling is shown to produce samples of high quality at comparatively low computational cost in several settings based on real-world data.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 位置:ベイジアンディープラーニングは大規模AIの時代に必要である Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI ( http://arxiv.org/abs/2402.00809v3 ) ライセンス: Link先を確認	Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang,	(参考訳) ディープラーニング研究の現在の状況では、大規模な画像と言語データセットを含む教師付きタスクにおいて、高い予測精度を達成することに重点が置かれている。しかし、より広い視点から見れば、不確実性、活動的かつ継続的な学習、科学的なデータなど、見落とされがちなメトリクス、タスク、データタイプが、注意を喚起する。 Bayesian Deep Learning(BDL)は,これらのさまざまな設定にまたがってメリットを提供する,有望な道の1つである。本稿では,BDLが深層学習の能力を高めることができることを示唆する。 BDLの強みを再考し、既存の課題を認識し、これらの障害に対処するためのエキサイティングな研究方法を強調します。今後の議論は、大規模ファンデーションモデルをBDLと組み合わせて、その潜在能力を最大限に活用する方法に焦点を当てている。 In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# スパイクニューラルネットワークによる効率的な時系列予測 Efficient and Effective Time-Series Forecasting with Spiking Neural Networks ( http://arxiv.org/abs/2402.01533v2 ) ライセンス: Link先を確認	Changze Lv, Yansen Wang, Dongqi Han, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li,	(参考訳) 生物学的ニューロンのスパイク行動にインスパイアされたスパイキングニューラルネットワーク(SNN)は、時間的データの複雑さを捉えるためのユニークな経路を提供する。しかし,SNNを時系列予測に適用することは,効率的な時間的アライメントの難しさ,符号化プロセスの複雑さ,モデル選択のための標準ガイドラインの欠如などにより困難である。本稿では,時間情報処理におけるスパイクニューロンの効率を活かした時系列予測タスクにおけるSNNの枠組みを提案する。一連の実験を通して,提案手法が従来の時系列予測手法に匹敵する,あるいは優れた結果をもたらすことを示す。さらに,SNNの時系列データにおける時間的依存性を捉える能力を評価するための詳細な解析実験を行い,時間的データの複雑なダイナミクスをモデル化する上で,そのニュアンスな強度と有効性について貴重な知見を提供する。本研究は, SNNの普及に寄与し, 時系列予測タスクの代替として, より生物学的にインスパイアされ, 時間的に意識された予測モデルを開発するための経路を提供する。私たちのコードはhttps://github.com/microsoft/SeqSNNで公開されています。 Spiking neural networks (SNNs), inspired by the spiking behavior of biological neurons, provide a unique pathway for capturing the intricacies of temporal data. However, applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. In this paper, we propose a framework for SNNs in time-series forecasting tasks, leveraging the efficiency of spiking neurons in processing temporal information. Through a series of experiments, we demonstrate that our proposed SNN-based approaches achieve comparable or superior results to traditional time-series forecasting methods on diverse benchmarks with much less energy consumption. Furthermore, we conduct detailed analysis experiments to assess the SNN's capacity to capture temporal dependencies within time-series data, offering valuable insights into its nuanced strengths and effectiveness in modeling the intricate dynamics of temporal data. Our study contributes to the expanding field of SNNs and offers a promising alternative for time-series forecasting tasks, presenting a pathway for the development of more biologically inspired and temporally aware forecasting models. Our code is available at https://github.com/microsoft/SeqSNN.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 大規模言語モデルのためのガードレールの構築 Building Guardrails for Large Language Models ( http://arxiv.org/abs/2402.01822v2 ) ライセンス: Link先を確認	Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang,	(参考訳) 大規模言語モデル(LLM)が私たちの日常生活に統合されるにつれて、リスクの特定と緩和が不可欠です。 LLMの入力や出力をフィルタリングするガードレールは、コアセーフガード技術として登場した。このポジションペーパーでは、現在のオープンソースソリューション(Llama Guard, Nvidia NeMo, Guardrails AI)を詳しく調べ、より完全なソリューションを構築するための課題と道筋について論じる。従来の研究から強固な証拠を引用し,様々なLLMアプリケーションにおける多様な文脈の包括的考察に基づいて,LLMのガードレール構築のための体系的アプローチを提唱する。我々は,複数の専門分野のチームと共同で,正確な技術的要件の特定,要求の複雑さを受け入れるための高度なニューラルシンボリック実装の探索,最終製品の品質を保証するための検証とテストの開発などを通じて,社会工学的手法を採用することを提案する。 As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# グラフニューラルネットワークのための分布内プロキシグラフの生成 Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks ( http://arxiv.org/abs/2402.02036v2 ) ライセンス: Link先を確認	Zhuomin Chen, Jiaxing Zhang, Jingchao Ni, Xiaoting Li, Yuchen Bian, Md Mezbahul Islam, Ananda Mohan Mondal, Hua Wei, Dongsheng Luo,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフデータ処理においてビルディングブロックとなり、重要な領域で広く応用されている。高度なアプリケーションにGNNをデプロイする必要性の高まりは、意思決定プロセスにおけるユーザ説明可能性を必要としている。 GNNの説明可能性のための一般的なパラダイムは、ラベルを元のグラフと比較することで説明可能な部分グラフを特定することである。この課題は、トレーニングセットの元々のグラフから説明可能なサブグラフのセットへの相当な分布シフトにより、ラベルの正確な予測ができないため、困難である。そこで本研究では,学習データの分布を示す説明可能な部分グラフのプロキシグラフを生成する手法を提案する。本稿では,グラフ生成器を用いてプロキシグラフを生成するパラメトリック手法を提案する。情報理論に基づく新たなトレーニング目的は、プロキシグラフがトレーニングデータの分布に従属するだけでなく、説明的要因も保持するように設計されている。このような生成されたプロキシグラフは、説明可能な部分グラフのラベルの予測を確実に近似するために使用することができる。提案手法は, GNNのより正確な説明が可能であることを示す。 Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate our method achieves more accurate explanations for GNNs.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 限界を超えて:大規模言語モデルにおける文脈長を拡張する手法の調査 Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models ( http://arxiv.org/abs/2402.02244v3 ) ライセンス: Link先を確認	Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi,	(参考訳) 近年,大規模言語モデル (LLM) は,文脈理解,論理的推論への関与,応答の生成など,顕著な能力を示している。しかし、これは厳密な計算とメモリの要求を犠牲にして達成され、長い入力シーケンスを効果的にサポートする能力を妨げる。本調査は,LLMのシーケンス長を延長するために考案された最近の手法と手法を包括的にレビューし,長文理解の能力を高めるものである。特に、計算要求の比例的な増加を回避しつつ、より長いシーケンスの処理を強化するために設計された、修正された位置符号化や変更された注意機構などのアーキテクチャ変更を含む幅広い手法をレビューし、分類する。本研究で検討した多種多様な手法は, LLMの異なる位相,すなわちトレーニング, 微調整, 推論に利用することができる。これにより、LLMは拡張シーケンスを効率的に処理できる。今後の研究の方向性を示唆する上で,LLMの継続的な進歩におけるシーケンス長の重要性を浮き彫りにした上で,現行の方法論の限界について論じる。 Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies is discussed in the last section along with the suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer ( http://arxiv.org/abs/2402.02464v3 ) ライセンス: Link先を確認	Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li,	(参考訳) 非ユークリッドグラフを純粋言語やユークリッドベクトルとしてモデル化することは可能か。非ユークリッド性はグラフモデリングにおいて長期にわたる課題を提起している。最近のグラフニューラルネットワークとグラフ変換器はユークリッドベクトルとしてグラフを符号化しようとするが、元のグラフをベクトルから復元することは依然として困難である。本稿では,非ユークリッドグラフをユークリッド空間の学習可能なグラフワードに変換するGraph2Seqエンコーダと,グラフワードから元のグラフを再構成して情報等価性を確保するGraphGPTデコーダを紹介する。 1) 事前学習したGraph2Seqはグラフ表現学習に優れ、8/9ドルのグラフ分類と回帰タスクで最先端の結果が得られる。 2) 事前訓練したグラフGPTは強力なグラフ生成器として機能し, 少数ショットグラフ生成と条件グラフ生成の両方を実行する強力な能力によって実証された。 (3) Graph2Seq+GraphGPT は、既知の非ユークリッド問題を克服し、ユークリッド空間におけるグラフの効果的な混合を可能にする。 (4)エッジ中心の事前学習フレームワークであるGraphsGPTは、グラフドメインタスクにおいて、表現と生成の両方において優れた効果を示す。コードは \href{https://github.com/A4Bio/GraphsGPT}{GitHub} で公開されている。 Can we model Non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The Non-Euclidean property have posed a long term challenge in graph modeling. Despite recent graph neural networks and graph transformers efforts encoding graphs as Euclidean vectors, recovering the original graph from vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring an Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and yield some interesting findings: (1) The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on $8/9$ graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, demonstrated by its strong ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation. Code is available at \href{https://github.com/A4Bio/GraphsGPT}{GitHub}.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 難解なギブズサンプリング Diffusive Gibbs Sampling ( http://arxiv.org/abs/2402.03008v5 ) ライセンス: Link先を確認	Wenlin Chen, Mingtian Zhang, Brooks Paige, José Miguel Hernández-Lobato, David Barber,	(参考訳) 従来のマルコフ・チェイン・モンテカルロ法(MCMC)のマルチモーダル分布に対する不適切な混合は、ベイズ推論や分子動力学のような実践的応用において重要な課題である。そこで本稿では,ディフューシブギブズサンプリング(Diffusive Gibbs Sampling, DiGS)を提案する。 DiGSは拡散モデルにおける最近の発展を統合し、ガウスの畳み込みを利用して元の空間の孤立モードをブリッジする補助ノイズ分布を作成し、ギブスサンプリングを用いて両方の空間からサンプルを交互に描画する。新規なメトロポリス・ウィスティン・ギブス法は, サンプリング工程における混合性を高めるために提案されている。 DiGSは、並列テンパリングのような最先端の手法よりも、マルチモーダル分布をサンプリングするためのより優れた混合特性を示し、ガウス、ベイズニューラルネットワーク、分子動力学の混合を含む様々なタスクにおける性能を大幅に改善した。 The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# IMUSE:IMUベースの表情キャプチャ IMUSE: IMU-based Facial Expression Capture ( http://arxiv.org/abs/2402.03944v2 ) ライセンス: Link先を確認	Youjia Wang, Yiwen Wu, Hengan Zhou, Hongyang Lin, Xingyue Peng, Yingwenqi Jiang, Yingsheng Zhu, Guanpeng Long, Yatu Zhang, Jingya Wang, Lan Xu, Jingyi Yu,	(参考訳) 顔の動きのキャプチャーと分析では、支配的なソリューションは一般的に、プライバシーを保護できず、閉塞に対して脆弱な視覚的手がかりに基づいている。慣性測定ユニット (IMU) は救難の可能性を秘めているが、主にフルボディのモーションキャプチャーに採用されている。本稿では,このギャップを埋めるためにIMUSICを提案する。これは純粋IMU信号を用いた表情キャプチャの新しい経路であり,従来の視覚的解とはかなり離れているため,IMUSICのキーデザインは三部作である。我々はまず、解剖学的に駆動されるIMU配置スキームを伴って、顔の撮影に適したマイクロIMUを設計する。そして、多様な表情とパフォーマンスのために、リッチなIMU/視覚信号を提供する新しいIMU-ARKitデータセットをコントリビュートする。このようなユニークなマルチモダリティは、IMUベースの顔行動分析のような将来の方向性に大きな可能性をもたらします。さらに、IMU-ARKitを用いて、純IMU信号から顔のブレンドシェープパラメータを正確に予測する強力なベースライン手法を提案する。具体的には、この新たなトラッキングタスクのための2段階のトレーニング戦略を備えたTransformer拡散モデルを調整する。 IMUSICフレームワークは,視覚的手法が乱れ,同時にユーザのプライバシを保護するシナリオにおいて,正確な顔認証を行うことができる。 IMUSICアプローチの有効性を検証するため,IMU構成と技術コンポーネントの両方について広範な実験を行った。特に、IMUSICは、プライバシー保護の顔キャプチャー、隠蔽に対するハイブリッドキャプチャー、視覚的手がかりによってしばしば見えない微小な顔の動きの検出など、様々な可能性と斬新な応用を可能にしている。私たちは、私たちのコミュニティで顔のキャプチャと分析の可能性をさらに強化するために、データセットと実装をリリースします。 For facial motion capture and analysis, the dominated solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) serve as potential rescues yet are mainly adopted for full-body motion capture. In this paper, we propose IMUSIC to fill the gap, a novel path for facial expression capture using purely IMU signals, significantly distant from previous visual solutions.The key design in our IMUSIC is a trilogy. We first design micro-IMUs to suit facial capture, companion with an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. Such unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. Specifically, we tailor a Transformer diffusion model with a two-stage training strategy for this novel tracking task. The IMUSIC framework empowers us to perform accurate facial capture in scenarios where visual methods falter and simultaneously safeguard user privacy. We conduct extensive experiments about both the IMU configuration and technical components to validate the effectiveness of our IMUSIC approach. Notably, IMUSIC enables various potential and novel applications, i.e., privacy-protecting facial capture, hybrid capture against occlusions, or detecting minute facial movements that are often invisible through visual cues. We will release our dataset and implementations to enrich more possibilities of facial capture and analysis in our community.	翻訳日:2024-05-30 23:40:54 公開日:2024-05-29
# 学習アルゴリズムによるより柔軟なPAC-Bayesianメタラーニング More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms ( http://arxiv.org/abs/2402.04054v2 ) ライセンス: Link先を確認	Hossein Zakerinia, Amin Behjati, Christoph H. Lampert,	(参考訳) PAC-Bayesian理論を用いたメタラーニング手法の研究のための新しいフレームワークを提案する。これまでの作業に比べて大きな利点は、タスク間の知識の伝達を実現する方法において、柔軟性を高めることである。従来のアプローチでは、モデル上の事前分布を学習することで、間接的にしか実現できなかった。対照的に、新しい一般化は、メタ学習のプロセスが将来のタスクに使用されるべき学習アルゴリズムを学習するよりも、はるかに直接的に表現できることを証明している。フレームワークの柔軟性は、幅広いメタ学習メカニズムを分析したり、新しいメカニズムを設計したりするのに適しています。理論的貢献以外は、我々のフレームワークが実用的なメタ学習メカニズムの予測品質を改善することを実証的に示しています。 We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory. Its main advantage over previous work is that it allows for more flexibility in how the transfer of knowledge between tasks is realized. For previous approaches, this could only happen indirectly, by means of learning prior distributions over models. In contrast, the new generalization bounds that we prove express the process of meta-learning much more directly as learning the learning algorithm that should be used for future tasks. The flexibility of our framework makes it suitable to analyze a wide range of meta-learning mechanisms and even design new mechanisms. Other than our theoretical contributions we also show empirically that our framework improves the prediction quality in practical meta-learning mechanisms.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# モダリティギャップ全体での検索によるマルチモーダル非教師付きドメイン一般化 Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap ( http://arxiv.org/abs/2402.04416v2 ) ライセンス: Link先を確認	Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis,	(参考訳) ドメイン一般化 (Domain Generalization, DG) は、共有ラベル空間の仮定の下で、1つ以上のソースドメインを利用するテストドメインを見えないように一般化するモデルを学習する重要な問題である。しかし、ほとんどのDG手法は、ターゲットのラベル空間における豊富なソースデータへのアクセスを前提としている。この設定では、微調整中にタスク非依存の未ラベルソースデータセットを使用する、unsupervised domain generalization (MUDG) のマルチモーダルバージョンに取り組む。私たちのフレームワークは、ソースデータセットとターゲットタスクの関係を明示的に想定していません。代わりに、ソースデータセットを、共同ビジョン言語空間で正確かつ効率的に検索できるという前提にのみ依存する。 MUDG設定で3つのコントリビューションを行います。まず,テキストクエリと粗い量子化に使用される画像セントロイドとの間の距離が大きいため,近接した近接探索が低リコールに悩まされることを理論的に示す。そこで我々は,画像空間の代わりにクエリ空間にセントロイドを格納することで,近傍のリコールを改善する単純なクラスタリングアルゴリズムであるペアk-meansを提案する。第2に、ゼロショット精度を改善し、検索した画像データを多様化するために、ターゲットラベルに対する適応的なテキスト拡張方式を提案する。最後に、下流目標精度をさらに向上させるため、2つの単純だが効果的なコンポーネントを提示する。我々は、それぞれのベンチマークで、最先端の名前のみの転送、ソースフリーDG、ゼロショット(ZS)の手法と比較し、20種類のデータセットで一貫した精度の向上を示す。コードは:https://github.com/Chris210634/mudg Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# グラフ上の確率測定のための一般ソボレフ輸送 Generalized Sobolev Transport for Probability Measures on a Graph ( http://arxiv.org/abs/2402.04516v2 ) ライセンス: Link先を確認	Tam Le, Truyen Nguyen, Kenji Fukumizu,	(参考訳) グラフ距離空間上での測度に対する最適輸送(OT)問題について検討する。最近、Le et al (2022) はグラフ構造を利用し、高速な計算のために閉形式表現を生成する OT の変種、すなわち Sobolev transport (ST) を提案する。しかし、ST は定義の中の $L^p$ の幾何構造と本質的に結合しているため、他の先行構造に対して ST を利用するのは自明ではない。対照的に、古典的なOTは、基礎となるコスト関数を変更することによって、様々な幾何学構造に適応する柔軟性を持つ。重要な例はOrlicz-Wasserstein (OW) であり、これは \emph{Orlicz 幾何構造を利用して$L^p$構造を超えて動く。標準的な$p$オーダーのWassersteinと比較して、OWは、特定の機械学習アプローチを著しく前進させるのに役立ちます。それでもOWは、2レベル最適化の定式化により、計算に新たな課題を提起している。本研究では,Orlicz構造に対する特定の凸関数のクラスを利用して,一般化ソボレフ輸送(GST)を提案する。 GSTはSTを特別な場合として包含し、$L^p$幾何を超える事前構造に利用できる。 OWに関して、OWの複雑な2段階最適化問題とは異なり、GSTを計算するために単変量最適化問題を単に解くだけでよいことを示す。 GSTはOWよりも数桁高速であることを示す。さらに、文書分類におけるGSTの利点と、トポロジカルデータ解析におけるいくつかの課題について、予備的な証拠を提供する。 We study the optimal transport (OT) problem for measures supported on a graph metric space. Recently, Le et al. (2022) leverage the graph structure and propose a variant of OT, namely Sobolev transport (ST), which yields a closed-form expression for a fast computation. However, ST is essentially coupled with the $L^p$ geometric structure within its definition which makes it nontrivial to utilize ST for other prior structures. In contrast, the classic OT has the flexibility to adapt to various geometric structures by modifying the underlying cost function. An important instance is the Orlicz-Wasserstein (OW) which moves beyond the $L^p$ structure by leveraging the \emph{Orlicz geometric structure}. Comparing to the usage of standard $p$-order Wasserstein, OW remarkably helps to advance certain machine learning approaches. Nevertheless, OW brings up a new challenge on its computation due to its two-level optimization formulation. In this work, we leverage a specific class of convex functions for Orlicz structure to propose the generalized Sobolev transport (GST). GST encompasses the ST as its special case, and can be utilized for prior structures beyond the $L^p$ geometry. In connection with the OW, we show that one only needs to simply solve a univariate optimization problem to compute the GST, unlike the complex two-level optimization problem in OW. We empirically illustrate that GST is several-order faster than the OW. Moreover, we provide preliminary evidences on the advantages of GST for document classification and for several tasks in topological data analysis.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# ラベルシフトロバストテスト時間適応のためのチャネル選択正規化 Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation ( http://arxiv.org/abs/2402.04958v2 ) ライセンス: Link先を確認	Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky,	(参考訳) ディープニューラルネットワークは多くの異なるタスクに有用な応用があるが、その性能はデータ分散の変化によって大きく影響を受ける可能性がある。例えば、バイオメディカル分野では、トレーニングとテストデータセット間のデータ(異なるマシン、人口)の変化によってパフォーマンスが影響を受ける可能性がある。実世界のシナリオに対するロバストさと一般化を保証するため、最近、推論中に新しいデータ分布にモデルを調整するためのアプローチとしてテスト時間適応法が研究されている。テスト時のバッチ正規化は、ドメインシフトベンチマークで魅力的なパフォーマンスを達成した、シンプルで一般的な方法である。テストバッチのバッチ正規化統計を再計算することで実装される。これまでの研究は、トレーニングデータと同じラベル分布を持つテストデータによる分析に重点を置いてきた。しかし、多くの実用的な応用において、この手法はラベルの分布シフトに弱いため、時には破滅的な失敗を引き起こすことがある。これにより、デプロイにテスト時間適応手法を適用するリスクが生じる。本稿では、ディープネットワークにおけるチャネルのみを選択的に適応させ、ラベルシフトに敏感な劇的な適応を最小化することで、この問題に対処することを提案する。 1) 後続のネットワーク層はラベルシフトに敏感であり,(2) 個々の特徴は特定のクラスに敏感である。提案手法をCIFAR10-C, Imagenet-C, 脂肪肝診断の3つの分類課題に適用し, 共変量およびラベル分布の変化について検討した。提案手法は,TTAの利点を生かしつつ,他の手法に共通する障害のリスクを大幅に低減するとともに,ハイパーパラメータの選択に頑健であることを示す。 Deep neural networks have useful applications in many different tasks, however their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test time adaptation methods in deployment. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, including CIFAR10-C, Imagenet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method allows to bring the benefits of TTA while significantly reducing the risk of failure common in other methods, while being robust to choice in hyperparameters.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# 原理的優先度ベイズ最適化 Principled Preferential Bayesian Optimization ( http://arxiv.org/abs/2402.05367v2 ) ライセンス: Link先を確認	Wenjie Xu, Wenbin Wang, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones,	(参考訳) 優先ベイズ最適化 (BO) の問題について検討し, 2つの候補解に対してのみ好みのフィードバックでブラックボックス関数を最適化することを目的とする。確率比のアイデアに触発されて、選好フィードバックのみを用いてブラックボックス関数の信頼度セットを構築する。この問題を解くために,効率的な計算手法を用いた楽観的アルゴリズムを開発した。この境界により、予測された最良の解を、保証された収束率で報告するスキームを設計することができる。ガウス過程, 標準試験関数, 熱的快適性最適化問題のサンプル実験結果から, 提案手法は, 既往の最先端ヒューリスティックよりも安定に, あるいは競争的な性能を達成できることが示唆された。 We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the total cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# 言語エージェント強化のためのエントロピー規則化トークンレベルポリシー最適化 Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement ( http://arxiv.org/abs/2402.06700v3 ) ライセンス: Link先を確認	Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen,	(参考訳) 大規模言語モデル(LLM)は、対話的な意思決定タスクにおいてインテリジェントなエージェントとして期待されている。伝統的なアプローチは、しばしば厳密に設計されたプロンプト、高品質な例、文脈内学習、教師付き微調整(RLHF)のための追加の報酬モデルに依存する。強化学習(Reinforcement Learning, RL)は、タスク固有の環境に直接関与することで、これらの依存関係を克服するLLMの動的代替手段を提供する。それでも、大きなハードルに直面している。 1) 探索を必要とする指数的に広大な活動空間から生じる不安定性 2)行動レベルの報酬信号に基づいてトークン単位のクレジットを割り当てることの課題は,報酬の最大化とコーパスデータの正確なモデル化の相違をもたらす。これらの課題に対応するために,トークンレベルでLLMを最適化するためのエントロピー拡張RL法であるEntropy-Regularized Token-level Policy Optimization (ETPO)を導入する。 ETPOの中心となるのは、RLプロセスと言語モデリングの原則を調和させるように設計された、新しいソフトなベルマンアップデートです。この手法は、Q関数の更新を粗いアクションレベルの視点からより粒度の細かいトークンレベルの視点へ分解し、最適化整合性の理論的証明に裏付ける。重要なことに、この分解は行動探索において線形時間の複雑さを生じさせる。我々は,データサイエンスコード生成を多段階対話タスクのシリーズとしてモデル化するシミュレーション環境におけるETPOの有効性を評価する。トークンレベルの分解とPPO法の適用の動機について、より詳細な予備研究については、arXiv:2405.15821を参照してください。 Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging directly with task-specific environments. Nonetheless, it faces significant hurdles: 1) instability stemming from the exponentially vast action space requiring exploration; 2) challenges in assigning token-level credit based on action-level reward signals, resulting in discord between maximizing rewards and accurately modeling corpus data. In response to these challenges, we introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. At the heart of ETPO is our novel per-token soft Bellman update, designed to harmonize the RL process with the principles of language modeling. This methodology decomposes the Q-function update from a coarse action-level view to a more granular token-level perspective, backed by theoretical proof of optimization consistency. Crucially, this decomposition renders linear time complexity in action exploration. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks; results underline ETPO's potential as a robust method for refining the interactive decision-making capabilities of language agents. For a more detailed preliminary work describing our motivation for token-level decomposition and applying it in PPO methods, please refer to arXiv:2405.15821.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# 力学系における実験設計のためのネスティング粒子フィルタ Nesting Particle Filters for Experimental Design in Dynamical Systems ( http://arxiv.org/abs/2402.07868v4 ) ライセンス: Link先を確認	Sahel Iqbal, Adrien Corenflos, Simo Särkkä, Hany Abdulsamad,	(参考訳) 本稿では,リスクに敏感な政策最適化として定式化した非交換可能データに対するベイズ実験設計手法を提案する。 Inside-Out SMC$^2$ algorithm, a nested sequential Monte Carlo technique to inferimal design, and embed it into a Particle Markov chain Monte Carlo framework to perform gradient-based policy amortization。提案手法は, コントラスト推定器に頼らないため, 他のアモータイズされた実験設計手法と異なる。一連の力学系の数値検証は,他の最先端戦略と比較して,本手法の有効性を示す。 In this paper, we propose a novel approach to Bayesian experimental design for non-exchangeable data that formulates it as risk-sensitive policy optimization. We develop the Inside-Out SMC$^2$ algorithm, a nested sequential Monte Carlo technique to infer optimal designs, and embed it into a particle Markov chain Monte Carlo framework to perform gradient-based policy amortization. Our approach is distinct from other amortized experimental design techniques, as it does not rely on contrastive estimators. Numerical validation on a set of dynamical systems showcases the efficacy of our method in comparison to other state-of-the-art strategies.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# 提案的満足度のための少ないデータからより良い表現を学習する Learning Better Representations From Less Data For Propositional Satisfiability ( http://arxiv.org/abs/2402.08365v2 ) ライセンス: Link先を確認	Mohamed Ghanem, Frederik Schmitt, Julian Siber, Bernd Finkbeiner,	(参考訳) NP完全問題に対するニューラルネットワークのトレーニングは通常、非常に大量のトレーニングデータを必要とし、しばしば出力の正確性を保証するために計算に高価なシンボル検証器と結合する必要がある。本稿では,NP完全問題であるNeuResについて述べる。証明書駆動のトレーニングとエキスパートのイテレーションを組み合わせることで、私たちのモデルは、分類のみのためにトレーニングされたモデルよりも優れた表現を学びます。 NeuRes は証明システムとして命題分解を使い、満足できない証明を生成し、真理の割り当てを満足させる過程を加速し、両方の可能性を並列に探索する。そこで本研究では,新しい節を導出するために,動的公式の埋め込みから句のペアを自動回帰的に選択するアテンションベースアーキテクチャを提案する。さらに、我々は、モデル生成証明が、より長い教師の証明を、新たな基礎的真実として徐々に置き換える専門家の反復を採用する。これにより、先進的な解法によって生成される証明のデータセットを、余分なガイダンスなしでトレーニング後に約32%削減できる。このことは、NeuResがその自己改善ワークフローのために教師アルゴリズムの最適性によって制限されないことを示している。このモデルでは,NuroSATよりも,正しく分類された例と証明された例の両方において,はるかに優れた性能が得られることを示す。 Training neural networks on NP-complete problems typically demands very large amounts of training data and often needs to be coupled with computationally expensive symbolic verifiers to ensure output correctness. In this paper, we present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability, being the quintessential NP-complete problem. By combining certificate-driven training and expert iteration, our model learns better representations than models trained for classification only, with a much higher data efficiency -- requiring orders of magnitude less training data. NeuRes employs propositional resolution as a proof system to generate proofs of unsatisfiability and to accelerate the process of finding satisfying truth assignments, exploring both possibilities in parallel. To realize this, we propose an attention-based architecture that autoregressively selects pairs of clauses from a dynamic formula embedding to derive new clauses. Furthermore, we employ expert iteration whereby model-generated proofs progressively replace longer teacher proofs as the new ground truth. This enables our model to reduce a dataset of proofs generated by an advanced solver by ~32% after training on it with no extra guidance. This shows that NeuRes is not limited by the optimality of the teacher algorithm owing to its self-improving workflow. We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# マルコフ決定過程による遷移制約ベイズ最適化 Transition Constrained Bayesian Optimization via Markov Decision Processes ( http://arxiv.org/abs/2402.08406v2 ) ライセンス: Link先を確認	Jose Pablo Folch, Calvin Tsay, Robert M Lee, Behrang Shafei, Weronika Ormaniec, Andreas Krause, Mark van der Wilk, Ruth Misener, Mojmír Mutný,	(参考訳) ベイズ最適化はブラックボックス関数を最適化する手法である。従来は、検索スペースを任意にクエリできる設定に重点を置いていた。しかし、現実の多くの問題は、この柔軟性を提供していない。特に、次のクエリの検索空間は、以前のものに依存しているかもしれない。物理科学において、局所的な運動の制約、特定の変数の単調性、測定の正確性に影響を与える遷移といった形で生じる。いずれにせよ、そのような移行の制約は計画の形式を必要とする。この研究はマルコフ決定過程の枠組みを通じて古典的ベイズ最適化を拡張した。我々は,地平線全体に向けて計画する政策を得るため,強化学習を用いて実用機能の抽出可能な線形化を反復的に解決する。これは、政策空間における取得関数の最適化と平行である。結果として得られる政策は歴史に依存し、マルコフ的でない可能性がある。本稿では, 化学反応器最適化, 情報経路計画, 機械校正, その他の合成例の応用例を紹介する。 Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends classical Bayesian optimization via the framework of Markov Decision Processes. We iteratively solve a tractable linearization of our utility function using reinforcement learning to obtain a policy that plans ahead for the entire horizon. This is a parallel to the optimization of an acquisition function in policy space. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, machine calibration, and other synthetic examples.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# SLEB: 冗長性検証によるLLMのストリーム化と変圧器ブロックの除去 SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ( http://arxiv.org/abs/2402.09025v2 ) ライセンス: Link先を確認	Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim,	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて非常に効果的であることが証明されている。しかし、それらの多数のパラメータは、実践的なデプロイに重大な課題を生じさせる。 LLMのサイズと複雑さを減らすことを目的とした技術であるPruningは、ネットワークから冗長なコンポーネントを取り除くことで潜在的なソリューションを提供する。プルーニングの約束にもかかわらず、既存の手法は、かなりエンドツーエンドのLSM推論スピードアップを達成するのに苦労することが多い。本稿では、冗長なトランスブロックを排除し、LCMを合理化するための新しいアプローチであるSLEBを紹介する。 LLMは隣接するブロックの出力間に高い類似性を有するブロックレベルの冗長性を示すため、我々は変圧器ブロックをプルーニングの基本単位として選択する。この選択により、LLMの処理速度を効果的に向上できる。実験結果から,SLEBはLLM推論を高速化し,高いパープレキシティと精度を維持しつつ,従来のLLMプルーニング法よりも優れており,SLEBはLLMの効率を高めるための有望な技術であることが示された。コードは、https://github.com/jiwonsong-dev/SLEB.comで入手できる。 Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# 連続多変量分布の普遍生成モデルとしてのパラメータ化量子回路 Parameterized quantum circuits as universal generative models for continuous multivariate distributions ( http://arxiv.org/abs/2402.09848v2 ) ライセンス: Link先を確認	Alice Barthe, Michele Grossi, Sofia Vallecorsa, Jordi Tura, Vedran Dunjko,	(参考訳) パラメータ化量子回路は、回帰、分類、生成タスクにおける機械学習モデルの基盤として広く使われている。教師付き学習では、その表現性は徹底的に研究され、いくつかの普遍性特性が証明されている。しかし、量子生成モデリングの場合、特に連続変数上の分布をモデル化するタスクでは、ほとんど知られていない。本研究では,サンプルモデルを用いた予測値の抽出を行う。このようなモデルは、古典的なランダムデータがアップロードされた量子回路から、固定可観測物のセットの期待値を出力する。多変量分布の生成のための変分量子アルゴリズムの普遍性を証明する。必要最小のキュービット数と必要最小限の必要な測定量とを接続し、普遍性を許容し、厳密な境界を証明できる様々なアーキテクチャを探索する。我々の結果は、生成的モデリングタスクにおける将来の量子回路の設計を導くのに役立つかもしれない。 Parameterized quantum circuits have been extensively used as the basis for machine learning models in regression, classification, and generative tasks. For supervised learning, their expressivity has been thoroughly investigated and several universality properties have been proven. However, in the case of quantum generative modelling, much less is known, especially when the task is to model distributions over continuous variables. In this work, we elucidate expectation value sampling-based models. Such models output the expectation values of a set of fixed observables from a quantum circuit into which classical random data has been uploaded. We prove the universality of such variational quantum algorithms for the generation of multivariate distributions. We explore various architectures which allow universality and prove tight bounds connecting the minimal required qubit number, and the minimal required number of measurements needed. Our results may help guide the design of future quantum circuits in generative modelling tasks.	翻訳日:2024-05-30 23:31:04 公開日:2024-05-29
# LLMs as Bridges:Reformulating Grounded Multimodal Named Entity Recognition LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition ( http://arxiv.org/abs/2402.09989v4 ) ライセンス: Link先を確認	Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan,	(参考訳) Grounded Multimodal Named Entity Recognition (GMNER) は、名前付きエンティティ、エンティティタイプ、および対応する視覚領域を識別することを目的とした、初期段階のマルチモーダルタスクである。 GMNERタスクは2つの難しい特性を示す。 1) ソーシャルメディアにおける画像テキストペア間の相関が弱かったため, 名前付きエンティティのかなりの部分が接地不能となった。 2) 類似したタスク(例えば,句の局所化,表現理解の参照など)でよく用いられる粗粒度参照表現と細粒度名前付きエンティティとの区別がある。本稿では,大規模な言語モデル(LLM)を接続ブリッジとして活用することにより,GMNERをMNER-VE-VGタスクに再構成する統合フレームワークであるRiVEGを提案する。この改革は2つの利点をもたらす。 1) MNERの最適性能を維持し, 地域特徴を事前に抽出するためにオブジェクト検出手法を用いることの必要性を排除し, 既存のGMNER手法の2つの大きな限界に自然に対処する。 2) エンティティ拡張表現とビジュアルエンタテインメント(VE)モジュールの導入により,ビジュアルグラウンド(VG)とエンティティグラウンド(EG)が統合される。これによってRiVEGは,現在のあるいは将来的なマルチモーダル事前トレーニングモデルのVisual EntailmentとVisual Grounding機能を,懸命に継承することが可能になります。大規模な実験により、RiVEGは既存のGMNERデータセットの最先端の手法より優れており、全3つのサブタスクで10.65%、6.21%、および8.83%の絶対的なリードを達成している。 Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging properties: 1) The weak correlation between image-text pairs in social media results in a significant portion of named entities being ungroundable. 2) There exists a distinction between coarse-grained referring expressions commonly used in similar tasks (e.g., phrase localization, referring expression comprehension) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge. This reformulation brings two benefits: 1) It maintains the optimal MNER performance and eliminates the need for employing object detection methods to pre-extract regional features, thereby naturally addressing two major limitations of existing GMNER methods. 2) The introduction of entity expansion expression and Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG). It enables RiVEG to effortlessly inherit the Visual Entailment and Visual Grounding capabilities of any current or prospective multimodal pretraining models. Extensive experiments demonstrate that RiVEG outperforms state-of-the-art methods on the existing GMNER dataset and achieves absolute leads of 10.65%, 6.21%, and 8.83% in all three subtasks.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# DDPMインバージョンを用いたゼロショット非教師付きテキスト音声編集 Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion ( http://arxiv.org/abs/2402.10009v4 ) ライセンス: Link先を確認	Hila Manor, Tomer Michaeli,	(参考訳) 大規模な事前学習モデルを用いて、ゼロショットで信号を編集する手法は、最近画像領域で急速に進歩している。しかし、この波はまだオーディオ領域に届いていない。本稿では,DDPMインバージョンと事前学習拡散モデルを用いた2つの音声信号のゼロショット編集手法について検討する。まず、ZEro-shot Text-based Audio (ZETA) 編集を画像領域から採用する。第2のZEro-shot UnSupervized (ZEUS) 編集は、意味論的に意味のある編集方向を監督なしで発見するための新しいアプローチである。音楽信号に適用すると、特定の楽器の参加の制御からメロディの即興演奏まで、音楽的に興味深い変更が多岐にわたることが分かる。サンプルとコードはhttps://hilamanor.github.io/AudioEditing/ で確認できる。 Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found in https://hilamanor.github.io/AudioEditing/ .	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# PointMamba: ポイントクラウド分析のためのシンプルな状態空間モデル PointMamba: A Simple State Space Model for Point Cloud Analysis ( http://arxiv.org/abs/2402.10739v4 ) ライセンス: Link先を確認	Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai,	(参考訳) トランスフォーマーは、優れたグローバルモデリング能力のために、ポイントクラウド分析タスクの基本的なアーキテクチャの1つになっています。しかし、注意機構は2次複雑さを持つため、大域的モデリングに訴える線形複雑化法の設計を行うことができる。本稿では,最近の代表的状態空間モデル(SSM)であるMambaを,NLPからポイントクラウド解析タスクへ移行したPointMambaを提案する。従来のトランスフォーマーとは異なり、PointMambaは線形複雑性アルゴリズムを採用し、グローバルなモデリング能力を示しながら計算コストを大幅に削減する。具体的には、空間充填曲線を有効点トークン化に利用し、非常に単純で非階層的なマンバエンコーダをバックボーンとして採用する。総合的な評価では、PointMambaは複数のデータセットで優れたパフォーマンスを実現し、GPUメモリ使用量とFLOPを大幅に削減している。本研究は,3次元視覚関連課題におけるSSMの可能性を明らかにするとともに,今後の研究に有効なマンバベースラインを提示する。コードはhttps://github.com/LMD0311/PointMambaで入手できる。 Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, making the design of a linear complexity method with global modeling appealing. In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs. Specifically, our method leverages space-filling curves for effective point tokenization and adopts an extremely simple, non-hierarchical Mamba encoder as the backbone. Comprehensive evaluations demonstrate that PointMamba achieves superior performance across multiple datasets while significantly reducing GPU memory usage and FLOPs. This work underscores the potential of SSMs in 3D vision-related tasks and presents a simple yet effective Mamba-based baseline for future research. The code is available at https://github.com/LMD0311/PointMamba.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# RAG-Driver:マルチモーダル大言語モデルにおける検索強化型インコンテキスト学習による汎用運転説明 RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model ( http://arxiv.org/abs/2402.10828v2 ) ライセンス: Link先を確認	Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd,	(参考訳) 私たちは、しばしば不透明なAIメソッドを使用するロボットを信頼する必要があります。彼らは私たち自身を説明する必要があり、彼らの説明を信頼する必要があります。この点において、説明責任は、特に複雑な自律運転において、エンドユーザー間の透明性と受け入れを促進するために、信頼できる自律的意思決定において重要な役割を担っている。近年のMLLM(Multi-Modal Large Language Model)の進歩は、自然言語の説明とともに制御予測を生成することにより、駆動エージェントとしての説明可能性を高める有望な可能性を示している。しかし、高価なアノテーションコストと異なるデータセット間のドメインギャップによる厳しいデータ不足は、堅牢で汎用的なシステムの開発を極めて難しい課題にしている。さらに,MLLMの厳格に高価なトレーニング要件と破滅的忘れの未解決問題により,展開後の一般性はさらに制限された。これらの課題に対処するために,提案するRAG-Driverは,高能率,説明性,一般化可能な自律運転にコンテキスト内学習を活用する,検索強化型マルチモーダルな大規模言語モデルである。 RAG-Driverが運転動作の説明,正当化,制御信号の予測を行う上で,最先端の性能を発揮することを実証的に検証した。さらに重要なのは、さらなる訓練をすることなく、目に見えない環境に例外的なゼロショットの一般化能力を示すことだ。 We need to trust robots that use often opaque AI methods. They need to explain themselves to us, and we need to trust their explanation. In this regard, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets makes the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLM and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstration, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training endeavours.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# ベイズ最適化によるペロブスカイト実験からの物理材料パラメータ抽出 Physics-based material parameters extraction from perovskite experiments via Bayesian optimization ( http://arxiv.org/abs/2402.11101v4 ) ライセンス: Link先を確認	Hualin Zhan, Viqar Ahmad, Azul Mayon, Grace Tabi, Anh Dinh Bui, Zhuofeng Li, Daniel Walter, Hieu Nguyen, Klaus Weber, Thomas White, Kylie Catchpole,	(参考訳) 実験的分析からペロブスカイトの物質パラメータを抽出する能力は、光電気・光電子応用の合理的な設計に不可欠である。しかし、この分析の難しさは、理論モデルの複雑さとペロブスカイトの材料パラメータの数によって著しく増加する。ここでは、キャリアのドリフト拡散と動的欠陥占有を含む複雑なフル物理モデルに基づいて、過渡発光実験から有機金属ペロブスカイト半導体の8つの基本材料パラメータを抽出できる解析プラットフォームを開発するためにベイズ最適化を用いる。熱劣化の例としては、キャリヤ移動率とトラップアシスト組換え係数が顕著に低下し、欠陥エネルギー準位はほぼ変化していないことが示されている。キャリヤ移動率の低下は, パーロブスカイト型太陽電池の熱劣化に対する全体的な影響を減少させ, 補充係数の低下が補充係数の増大に影響を及ぼすにもかかわらず, 補充係数を減少させることによって支配することができる。将来、このプラットフォームは他の実験や実験の組み合わせに便利に適用でき、半導体材料の発見と最適化を加速する。 The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# 大規模言語モデルのための知識境界のベンチマーク:モデル評価の異なる視点 Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation ( http://arxiv.org/abs/2402.11493v2 ) ライセンス: Link先を確認	Xunjian Yin, Xu Zhang, Jie Ruan, Xiaojun Wan,	(参考訳) 近年,多種多様なタスクにおいて顕著な性能を達成し,大規模言語モデルの開発において大きな進歩を遂げている。言語モデルの知識能力を評価するため,従来の研究では,質問応答ペアに基づくベンチマークが多数提案されている。我々は,言語モデルがアクティベートに敏感であるため,不確定な質問や限定的な言い回しをクエリとして評価することは,信頼性が高く,包括的ではないと論じる。そこで本稿では,言語モデル内での素早い知識と素早い知識の両方を包含する,知識境界という新しい概念を導入する。知識境界は言語モデル評価の迅速な感度を回避し、より信頼性と堅牢性を高める。与えられたモデルの知識境界を探索するために,各知識に対して最適なプロンプトを識別する新しいアルゴリズムである,セマンティック制約付き予測勾配降下法を提案する。実験では,既存の手法と比較して知識境界計算におけるアルゴリズムの性能が優れていることを示した。さらに,知識境界を持つ複数の領域における複数の言語モデルの能力を評価する。 In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed lots of benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or limited paraphrases as the query, since language models are sensitive to prompt. Therefore, we introduce a novel concept named knowledge boundary to encompass both prompt-agnostic and prompt-sensitive knowledge within language models. Knowledge boundary avoids prompt sensitivity in language model evaluations, rendering them more dependable and robust. To explore the knowledge boundary for a given model, we propose projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. Experiments demonstrate a superior performance of our algorithm in computing the knowledge boundary compared to existing methods. Furthermore, we evaluate the ability of multiple language models in several domains with knowledge boundary.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# ワッサーシュタイン分布ロバストモデルに対する普遍的一般化保証 Universal generalization guarantees for Wasserstein distributionally robust models ( http://arxiv.org/abs/2402.11981v2 ) ライセンス: Link先を確認	Tam Le, Jérôme Malick,	(参考訳) 分散ロバストな最適化は、堅牢な機械学習モデルをトレーニングし、データの不確実性と分散シフトをキャプチャする魅力的な方法として登場した。最近の統計分析により、ワッサーシュタイン曖昧性集合から構築されたロバストモデルが優れた一般化を保証することが証明され、次元性の呪いが破られる。しかし、これらの結果は特定の場合、近似のコスト、あるいは実際は検証が難しい仮定の下で得られる。対照的に、この記事では、輸送コスト関数や損失関数、潜在的に凸や非平滑性を含むすべての実例をカバーする正確な一般化を保証する。例えば、私たちの結果は制限的な仮定を必要とせず、ディープラーニングに適用されます。この結果は,非平滑解析法と古典的濃度解析法を組み合わせた新しい証明手法によって達成される。我々のアプローチは、(二重)正則化を含む分布的に頑健な問題をワッサーシュタイン/シンクホーンの最近のバージョンに拡張するのに十分である。 Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models built from Wasserstein ambiguity sets have nice generalization guarantees, breaking the curse of dimensionality. However, these results are obtained in specific cases, at the cost of approximations, or under assumptions difficult to verify in practice. In contrast, we establish, in this article, exact generalization guarantees that cover all practical cases, including any transport cost function and any loss function, potentially non-convex and nonsmooth. For instance, our result applies to deep learning, without requiring restrictive assumptions. We achieve this result through a novel proof technique that combines nonsmooth analysis rationale with classical concentration results. Our approach is general enough to extend to the recent versions of Wasserstein/Sinkhorn distributionally robust problems that involve (double) regularizations.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# 多発性対数的ミニマックス後悔を伴うリニアバンディット Linear bandits with polylogarithmic minimax regret ( http://arxiv.org/abs/2402.12042v2 ) ライセンス: Link先を確認	Josep Lumbreras, Marco Tomamichel,	(参考訳) 本研究では,未知ベクトルに近づいた単位球上の動作を選択すると,下ガウス雑音パラメータが線形に消滅する線形確率帯域の雑音モデルについて検討する。我々は,この問題に対するアルゴリズムを導入し,時間軸で$\log^3(T)$,時間軸で$T$と,典型的な帯域幅アルゴリズムに対するこの後悔の平方根スケーリングとは対照的に,ミニマックス後悔のスケーリングを$\log^3(T)$とする。我々の戦略は、重み付けされた最小二乗推定に基づいて、固有値関係 $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time steps $t$ をノイズモデルとは独立で、独立した関心を持つような幾何学的議論を通じて達成する。これにより、各時間ステップにおける期待された後悔を$O(\frac1{t})$の順番で厳格に制御することができ、累積的後悔の対数的スケーリングにつながります。 We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# 医療用多言語言語モデルの構築に向けて Towards Building Multilingual Language Model for Medicine ( http://arxiv.org/abs/2402.13963v3 ) ライセンス: Link先を確認	Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie,	(参考訳) オープンソースの多言語医療言語モデルの開発は、様々な地域から幅広い言語的に多様な聴衆に利益をもたらすことができる。まず、MMedCと呼ばれる6つの主要言語を含む約25.5Bのトークンを含む多言語医療コーパスを構築し、さらに、多言語医療LLMの開発を監視するために、MMedBenchと呼ばれる有理性を備えた多言語医療多言語質問応答ベンチマークを提案し、第3に、MMedCで訓練された他の自動回帰型言語モデルとともに、ベンチマーク上で多数のオープンソースの大規模言語モデル(LLM)を評価した。我々の最終モデルであるMMed-Llama 3は、8Bパラメータしか持たないが、GPT-4に匹敵するようなMMedBenchおよび英語ベンチマークの他のすべてのオープンソースモデルと比較して、優れた性能が得られる。そこで本研究では,多言語医療用LLMの開発を支援するための大規模コーパス,ベンチマーク,一連のモデルを提案する。 The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions. To promote this domain, we present contributions from the following: First, we construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages, termed as MMedC, enabling auto-regressive domain adaptation for general LLMs; Second, to monitor the development of multilingual medical LLMs, we propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench; Third, we have assessed a number of open-source large language models (LLMs) on our benchmark, along with those further auto-regressive trained on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In conclusion, in this work, we present a large-scale corpus, a benchmark and a series of models to support the development of multilingual medical LLMs.	翻訳日:2024-05-30 23:21:18 公開日:2024-05-29
# EOS決定の観点からのマルチモーダル幻覚の緩和 Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective ( http://arxiv.org/abs/2402.14545v2 ) ライセンス: Link先を確認	Zihao Yue, Liang Zhang, Qin Jin,	(参考訳) 大規模なマルチモーダルモデル(LMM)は、視覚的な入力に存在しないコンテンツを生成するため、しばしば多モーダル幻覚に悩まされる。本稿では,この問題の新たなアングルを探究する:過度に詳細なトレーニングデータにより,モデルが生成をタイムリーに終了する能力が損なわれ,視覚的知覚限界を超えて出力が継続する。特殊な終末トークンであるEOSを用いて、モデルがどのように生成を終了させるかを調べることで、生成したテキストと画像を比較してシーケンス全体の完全性を評価する。この観察は、モデルが過度に長い出力を避けるために、その視覚的知覚に基づいて適切なEOS決定を行う固有の可能性を持っていることを示唆している。このような可能性を活用するために,モデルが正規指導データから学習することで幻覚を減らすことができる訓練目標と,有害な訓練データがモデル幻覚を悪化させるのを防ぐためのデータフィルタリング戦略の2つの手法を検討する。どちらの手法も追加のデータや知識を必要とせずにLMMの幻覚性能を大幅に向上させる。 Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to terminate generation with EOS, the special end-of-sentence token, we find that the model assesses the completeness of the entire sequence by comparing the generated text with the image. This observation suggests that the model possesses an inherent potential of making proper EOS decisions based on its visual perception to avoid overly lengthy outputs. To take advantage of such potential, we explore two methods to mitigate multimodal hallucinations: a training objective that enables the model to reduce hallucinations by learning from regular instruction data, and a data filtering strategy to prevent harmful training data from exacerbating model hallucinations. Both methods significantly improve the hallucination performance of LMMs, without requiring any additional data or knowledge.	翻訳日:2024-05-30 23:21:17 公開日:2024-05-29
# 対人訓練における不均一規則化の再考 : ロバスト性・精度トレードオフの改善 Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off ( http://arxiv.org/abs/2402.14648v2 ) ライセンス: Link先を確認	Futa Waseda, Ching-Chun Chang, Isao Echizen,	(参考訳) 敵の訓練は、敵の例(AE)を防衛する最先端のアプローチであるが、正確さを犠牲にして高い堅牢性が達成されるロバストネス・精度のトレードオフに悩まされている。本研究では,このトレードオフを緩和するために,潜在表現の不等式正規化を活用し,識別的かつ逆向きに不変表現を学習する。非分散正規化を伴う表現学習における2つの主要な課題を解析し、(1)不分散損失と分類目的との「段階的な衝突」が最適下収束をもたらすこと、(2)クリーン入力と逆入力の分散分布から生じる混合分布問題について分析する。これらの問題に対処するため,非対称的非分散損失と停止段階演算と予測器を組み込んだ非対称表現正規化逆行訓練(AR-AT)と,混合分布問題を解決するための分割バッチノーム(BN)構造を提案する。本手法は,識別能力を犠牲にすることなく,逆不変表現を学習することにより,ロバスト性・精度のトレードオフを大幅に改善する。さらに,本研究の知見が知識蒸留に基づく防衛手法との関連性について考察し,それらの相対的成功の深い理解に寄与する。 Although adversarial training has been the state-of-the-art approach to defend against adversarial examples (AEs), it suffers from a robustness-accuracy trade-off, where high robustness is achieved at the cost of clean accuracy. In this work, we leverage invariance regularization on latent representations to learn discriminative yet adversarially invariant representations, aiming to mitigate this trade-off. We analyze two key issues in representation learning with invariance regularization: (1) a "gradient conflict" between invariance loss and classification objectives, leading to suboptimal convergence, and (2) the mixture distribution problem arising from diverged distributions of clean and adversarial inputs. To address these issues, we propose Asymmetrically Representation-regularized Adversarial Training (AR-AT), which incorporates asymmetric invariance loss with stop-gradient operation and a predictor to improve the convergence, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative ability. Furthermore, we discuss the relevance of our findings to knowledge-distillation-based defense methods, contributing to a deeper understanding of their relative successes.	翻訳日:2024-05-30 23:21:17 公開日:2024-05-29
# ダブルIウォーターマーク : LLMファインチューニングのためのモデル著作権保護 Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning ( http://arxiv.org/abs/2402.14883v2 ) ライセンス: Link先を確認	Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li,	(参考訳) さまざまなアプリケーションをサポートするために、ビジネスオーナーにとって一般的で効率的なアプローチは、LLMオーナやクラウドサーバが提供するAPIを通じて、トレーニング済みのLLMを微調整するための貴重なデータセットを活用している。しかし、このプロセスはモデル誤用のかなりのリスクを伴い、ビジネスオーナーに深刻な経済的影響をもたらす可能性がある。したがって、LLM微調整中にこれらのカスタマイズされたモデルの著作権を保護することは、緊急の現実的な要件となっているが、そのような保護を提供するための既存のソリューションは限られている。このプレス問題に対処するため、「ダブルI透かし」と呼ばれる新しい透かし手法を提案する。具体的には、インストラクションチューニングデータに基づいて、2種類のバックドアデータパラダイムを導入し、それぞれインストラクションと入力をトリガーとする。 LLMの学習機能を活用して、データセットにカスタマイズされたバックドアサンプルを組み込むことにより、細調整中に特定の透かし情報をカスタマイズされたモデルに効果的に注入することで、商業シナリオにおける透かしの注入と検証が容易になる。提案手法を各種微調整法で評価し, その無害性, 頑健性, 独特性, 不受容性, 妥当性を定量的および定性的な分析により検証した。 To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these customized models during LLM fine-tuning has become an urgent practical requirement, but there are limited existing solutions to provide such protection. To tackle this pressing issue, we propose a novel watermarking approach named ``Double-I watermark''. Specifically, based on the instruct-tuning data, two types of backdoor data paradigms are introduced with trigger in the instruction and the input, respectively. By leveraging LLM's learning capability to incorporate customized backdoor samples into the dataset, the proposed approach effectively injects specific watermarking information into the customized model during fine-tuning, which makes it easy to inject and verify watermarks in commercial scenarios. We evaluate the proposed "Double-I watermark" under various fine-tuning methods, demonstrating its harmlessness, robustness, uniqueness, imperceptibility, and validity through both quantitative and qualitative analyses.	翻訳日:2024-05-30 23:21:17 公開日:2024-05-29
# 実用性保証によるデータの公平性の達成 Achievable Fairness on Your Data With Utility Guarantees ( http://arxiv.org/abs/2402.17106v2 ) ライセンス: Link先を確認	Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu,	(参考訳) 機械学習のフェアネスでは、異なるセンシティブなグループ間の格差を最小限に抑えるトレーニングモデルはしばしば精度を低下させる。このトレードオフの深刻さは、本質的にデータセットの不均衡やバイアスといったデータセット特性に依存しているため、多様なデータセット間で均一な公平性要件を使用することは疑問の余地が残る。これを解決するために、厳密な統計的保証を背景として、個々のデータセットに適合する公平性-正確性トレードオフ曲線を近似する計算効率の良い手法を提案する。 You-Only-Train-Once(YOTO)フレームワークを利用することで、トレードオフ曲線を近似する際に複数のモデルを訓練する際の計算負担を軽減する。そこで本研究では,推定誤差による誤った結論を避けつつ,モデルフェアネスを監査する堅牢な枠組みを実践者に提供し,評価の不確実性を定量化する手法を提案する。我々の実験は、表形式(例えば、アダルト)、画像(CelebA)、言語(Jigsaw)データセットにまたがるものであり、我々のアプローチは、様々なデータモダリティで達成可能な最適トレードオフを確実に定量化するだけでなく、SOTAフェアネス法における準最適性の検出にも役立ちます。 In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.	翻訳日:2024-05-30 23:21:17 公開日:2024-05-29
# 大規模言語モデルの学習自由長期スケーリング Training-Free Long-Context Scaling of Large Language Models ( http://arxiv.org/abs/2402.17463v2 ) ライセンス: Link先を確認	Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong,	(参考訳) 大規模言語モデル(LLM)によるコヒーレントテキストの処理と生成能力は,入力トークンの数が事前学習期間を超えると著しく低下する。 Llama2 70Bは100k以上のトークンのコンテキストウィンドウを連続的なトレーニングなしでサポートできる。長いシーケンスの注意計算をチャンクベースのモジュールに分解することで、DCAは同じチャンク(Intra-Chunk)と異なるチャンク(Inter-Chunk)内のトークンの相対的な位置情報を効果的にキャプチャし、Flash Attentionとシームレスに統合する。 DCAは、その印象的な補間機能に加えて、微調整されたモデルに匹敵する、あるいはそれ以上に優れた、実用的な長期コンテキストタスクのパフォーマンスを実現している。プロプライエタリモデルと比較すると,トレーニングフリーの70Bモデルでは,gpt-3.5-16kのパフォーマンスの94%を達成しています。この作業で使用されるすべてのコードとデータは、 \url{https://github.com/HKUNLP/ChunkLlama} でリリースされる。 The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By decomposing the attention computation for long sequences into chunk-based modules, DCA manages to effectively capture the relative positional information of tokens within the same chunk (Intra-Chunk) and across distinct chunks (Inter-Chunk), as well as integrates seamlessly with Flash Attention. In addition to its impressive extrapolation capability, DCA achieves performance on practical long-context tasks that is comparable to or even better than that of finetuned models. When compared with proprietary models, our training-free 70B model attains 94% of the performance of gpt-3.5-16k, indicating it is a viable open-source alternative. All code and data used in this work are released at \url{https://github.com/HKUNLP/ChunkLlama}.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# ジョブショップスケジューリング問題の解決のための双方向グラフ注意ネットワークを用いたトポロジ表現の学習 Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem ( http://arxiv.org/abs/2402.17606v2 ) ライセンス: Link先を確認	Cong Zhang, Zhiguang Cao, Yaoxin Wu, Wen Song, Jing Sun,	(参考訳) 既存の学習に基づくジョブショップスケジューリング問題(JSSP)の解法は、通常、非方向グラフに適した既製のGNNモデルを使用し、解離グラフ(DG)のリッチで有意義なトポロジ構造を無視する。本稿では,このアテンション機構に基づく新しいGNNアーキテクチャである,トポロジ対応双方向グラフアテンションネットワーク(TBGAT)を提案し,JSSPをローカル検索フレームワークに組み込む。具体的には、TBGATは、それぞれ前方と後方のビューからDGを埋め込み、ビューの異なるトポロジに従ってメッセージが伝播し、グラフの注意を通して集約される。そこで本稿では,DGの前方および後方トポロジ的ソートを計算するためのメッセージパス機構に基づく新しい演算子を提案する。さらに,TBGATはジョブ数とマシン数に線形計算の複雑さがあることを理論的および実験的に示し,本手法の実用的価値を高めた。さらに、5つの合成データセットと7つの古典的なベンチマークに関する広範な実験により、TBGATは広い範囲のニューラルネットワークよりも大きなマージンで、新しいSOTA結果を達成することが示された。すべてのコードとデータはhttps://github.com/zcaicaros/TBGAT.comで公開されている。 Existing learning-based methods for solving job shop scheduling problems (JSSP) usually use off-the-shelf GNN models tailored to undirected graphs and neglect the rich and meaningful topological structures of disjunctive graphs (DGs). This paper proposes the topology-aware bidirectional graph attention network (TBGAT), a novel GNN architecture based on the attention mechanism, to embed the DG for solving JSSP in a local search framework. Specifically, TBGAT embeds the DG from a forward and a backward view, respectively, where the messages are propagated by following the different topologies of the views and aggregated via graph attention. Then, we propose a novel operator based on the message-passing mechanism to calculate the forward and backward topological sorts of the DG, which are the features for characterizing the topological structures and exploited by our model. In addition, we theoretically and experimentally show that TBGAT has linear computational complexity to the number of jobs and machines, respectively, strengthening our method's practical value. Besides, extensive experiments on five synthetic datasets and seven classic benchmarks show that TBGAT achieves new SOTA results by outperforming a wide range of neural methods by a large margin. All the code and data are publicly available online at https://github.com/zcaicaros/TBGAT.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# バランシング法:拡散モデルにおける分散誘導型デバイアス Balancing Act: Distribution-Guided Debiasing in Diffusion Models ( http://arxiv.org/abs/2402.18206v3 ) ライセンス: Link先を確認	Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, R. Venkatesh Babu,	(参考訳) 拡散モデル(DM)は、前例のない画像生成能力を持つ強力な生成モデルとして登場した。これらのモデルは、データ拡張とクリエイティブなアプリケーションに広く利用されている。しかし、DMはトレーニングデータセットに存在するバイアスを反映する。これは特に、DMが1つのサブグループと他のグループ(例えば、女性と男性)を優先する顔の文脈において関係している。本研究では,追加データやモデル再学習に頼ることなく,DMを劣化させる手法を提案する。具体的には,生成した画像を所定の属性分布に従うように強制する分散誘導法を提案する。これを実現するために、UNetを識別する潜在機能には、リッチな階層的セマンティクスが備わっており、デバイアス発生を誘導するためにも同様に活用できる、という重要な洞察に基づいて構築する。 ADP(Attribute Distribution Predictor)をトレーニングします - 潜伏した特徴を属性の分布にマッピングする小さなmlpです。 ADPは、既存の属性分類器から生成された擬似ラベルで訓練される。 ADPを用いた配電誘導により,公平な生成が可能となる。提案手法は, 単一/複数属性間のバイアスを低減し, 非条件およびテキスト条件拡散モデルにおいて, ベースラインのマージンを著しく上回る。さらに、生成されたデータとトレーニングセットを再バランスさせることにより、フェア属性分類器をトレーニングする下流タスクを提案する。 Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in the training datasets. This is especially concerning in the context of faces, where the DM prefers one demographic subgroup vs others (eg. female vs male). In this work, we present a method for debiasing DMs without relying on additional data or model retraining. Specifically, we propose Distribution Guidance, which enforces the generated images to follow the prescribed attribute distribution. To realize this, we build on the key insight that the latent features of denoising UNet hold rich demographic semantics, and the same can be leveraged to guide debiased generation. We train Attribute Distribution Predictor (ADP) - a small mlp that maps the latent features to the distribution of attributes. ADP is trained with pseudo labels generated from existing attribute classifiers. The proposed Distribution Guidance with ADP enables us to do fair generation. Our method reduces bias across single/multiple attributes and outperforms the baseline by a significant margin for unconditional and text-conditional diffusion models. Further, we present a downstream task of training a fair attribute classifier by rebalancing the training set with our generated data.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 量子コンピュータにおける部分微分方程式の変分量子シミュレーションにおける境界処理 Boundary Treatment for Variational Quantum Simulations of Partial Differential Equations on Quantum Computers ( http://arxiv.org/abs/2402.18619v2 ) ライセンス: Link先を確認	Paul Over, Sergio Bengoechea, Thomas Rung, Francesco Clerici, Leonardo Scandurra, Eugene de Villiers, Dieter Jaksch,	(参考訳) 本稿では, 2次偏微分方程式で表される初期境界値問題を解くための変分量子アルゴリズムを提案する。このアプローチでは、現在のノイズの多い中間スケール量子時代の量子コンピュータに適した、ハイブリッドな古典/量子ハードウェアを使用する。偏微分方程式は、まずモジュラー制御-状態演算子(アンザッツ)で最適制御問題に変換される。最適化器が必要とする目的関数とその導関数は、アンシラ量子ビットを測定して量子コンピュータ上で効率よく評価でき、最適化手順は古典的なハードウェアを用いる。この研究の焦点は境界条件の処理であり、補正手法を用いて量子ハードウェアの特性に合わせて調整される。この目的のために、偏微分方程式の境界条件と離散項はユニタリ演算の列に分解され、その後量子ゲートにコンパイルされる。量子ハードウェアを古典的にエミュレートすることにより、2階偏微分方程式に対して、アプローチの精度とゲートの複雑さを評価する。例としては、様々なディリクレ条件、ノイマン条件、ロビン条件と組み合わせてスカラー特性に対する定常かつ非定常な拡散輸送方程式がある。このフレキシブルアプローチの結果は、関連する量子回路の量子ビット数において、顕著なポリログの複雑さのスケーリングと組み合わせて、堅牢な振る舞いと強い予測精度を示す。残る課題は最適化手順を高速化する適応的なアンザッツ戦略を指す。 The paper presents a variational quantum algorithm to solve initial-boundary value problems described by second-order partial differential equations. The approach uses hybrid classical/quantum hardware that is well suited for quantum computers of the current noisy intermediate-scale quantum era. The partial differential equation is initially translated into an optimal control problem with a modular control-to-state operator (ansatz). The objective function and its derivatives required by the optimizer can efficiently be evaluated on a quantum computer by measuring an ancilla qubit, while the optimization procedure employs classical hardware. The focal aspect of the study is the treatment of boundary conditions, which is tailored to the properties of the quantum hardware using a correction technique. For this purpose, the boundary conditions and the discretized terms of the partial differential equation are decomposed into a sequence of unitary operations and subsequently compiled into quantum gates. The accuracy and gate complexity of the approach are assessed for second-order partial differential equations by classically emulating the quantum hardware. The examples include steady and unsteady diffusive transport equations for a scalar property in combination with various Dirichlet, Neumann, or Robin conditions. The results of this flexible approach display a robust behavior and a strong predictive accuracy in combination with a remarkable polylog complexity scaling in the number of qubits of the involved quantum circuits. Remaining challenges refer to adaptive ansatz strategies that speed up the optimization procedure.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 拡散モデルによる顔スワップ Face Swap via Diffusion Model ( http://arxiv.org/abs/2403.01108v2 ) ライセンス: Link先を確認	Feifei Wang,	(参考訳) 本稿では,2つのポートレート画像間の顔交換のための拡散モデルに基づくフレームワークを提案する。基本フレームワークは3つのコンポーネント(IP-Adapter、ControlNet、Stable Diffusionのインパインティングパイプライン)で構成され、それぞれ顔の特徴符号化、マルチ条件生成、顔インパインティングである。さらに、顔面誘導最適化とCodeFormerベースのブレンディングを導入して、生成品質をさらに改善します。具体的には、最近の軽量化手法(DreamBooth-LoRA)に取り組み、アイデンティティの整合性を保証する。 1) 情報源の同一性を表すために稀な識別子 "sks" を用いて, 2) 画像の特徴をテキストの特徴のように各横断層に注入する。次に、安定拡散の強い塗装能力を活用し、ターゲットポートレートのキャニー画像と顔検出アノテーションを条件として利用し、ContorlNetの生成をガイドし、ソースポートレートとターゲットポートレートを整列させる。さらに顔のアライメントを補正するため、サンプル生成時のテキスト埋め込みを最適化するために顔誘導損失を追加する。コードは、https://github.com/somuchtome/Faceswap.comで入手できる。 This technical report presents a diffusion model based framework for face swapping between two portrait images. The basic framework consists of three components, i.e., IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, for face feature encoding, multi-conditional generation, and face inpainting respectively. Besides, I introduce facial guidance optimization and CodeFormer based blending to further improve the generation quality. Specifically, we engage a recent light-weighted customization method (i.e., DreamBooth-LoRA), to guarantee the identity consistency by 1) using a rare identifier "sks" to represent the source identity, and 2) injecting the image features of source portrait into each cross-attention layer like the text features. Then I resort to the strong inpainting ability of Stable Diffusion, and utilize canny image and face detection annotation of the target portrait as the conditions, to guide ContorlNet's generation and align source portrait with the target portrait. To further correct face alignment, we add the facial guidance loss to optimize the text embedding during the sample generation. The code is available at: https://github.com/somuchtome/Faceswap	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# WebCiteS: Citationsを用いた中国語Web検索結果の分散クエリ焦点要約(Attributed Query-Focused Summarization) WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations ( http://arxiv.org/abs/2403.01774v2 ) ライセンス: Link先を確認	Haolin Deng, Chang Wang, Xin Li, Dezhang Yuan, Junlang Zhan, Tianhua Zhou, Jin Ma, Jun Gao, Ruifeng Xu,	(参考訳) 大規模言語モデル(LLM)における属性の強化は重要な課題である。実現可能なアプローチの1つは、LLMが世代をサポートする外部ソースを引用できるようにすることである。しかし、この領域の既存のデータセットと評価方法には、依然として顕著な制限がある。本研究では、属性付きクエリ中心要約(AQFS)のタスクを定式化し、7kの人称注釈の要約を引用した中国語データセットであるWebCiteSを提示する。 WebCiteSは、実際のユーザクエリとWeb検索結果から派生したもので、モデルのトレーニングと評価のための貴重なリソースを提供する。帰属評価における先行研究は、起伏誤差と引用誤差を区別しない。また、複数のソースから部分的なサポートを引き出す文の自動検証にも不足している。これらの課題に対処するために、詳細なメトリクスを開発し、自動評価器が文を細かな検証のためにサブステートに分解できるようにする。 WebCiteSのオープンソースモデルとプロプライエタリモデルの両方を包括的に評価することは、LLMが正しく引用する上で直面する課題を浮き彫りにして、さらなる改善の必要性を浮き彫りにしている。データセットとコードは、この決定的な分野のさらなる研究を促進するために、オープンソース化される。 Enhancing the attribution in large language models (LLMs) is a crucial task. One feasible approach is to enable LLMs to cite external sources that support their generations. However, existing datasets and evaluation methods in this domain still exhibit notable limitations. In this work, we formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset featuring 7k human-annotated summaries with citations. WebCiteS derives from real-world user queries and web search results, offering a valuable resource for model training and evaluation. Prior works in attribution evaluation do not differentiate between groundedness errors and citation errors. They also fall short in automatically verifying sentences that draw partial support from multiple sources. We tackle these issues by developing detailed metrics and enabling the automatic evaluator to decompose the sentences into sub-claims for fine-grained verification. Our comprehensive evaluation of both open-source and proprietary models on WebCiteS highlights the challenge LLMs face in correctly citing sources, underscoring the necessity for further improvement. The dataset and code will be open-sourced to facilitate further research in this crucial field.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 空気質推論のための時空間ニューラルネットワーク Spatio-Temporal Field Neural Networks for Air Quality Inference ( http://arxiv.org/abs/2403.02354v2 ) ライセンス: Link先を確認	Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Kun Wang, Shifen Cheng, Yuxuan Liang,	(参考訳) 空気質推定問題は、限られた観測地点からの履歴データを利用して、未知の場所で空気質指数を推定することを目的としている。ステーションのメンテナンスコストの高さによるデータの分散性を考慮すると、優れた推論アルゴリズムはコストを効果的に削減し、データの粒度を改善できる。時空間グラフニューラルネットワークはこの問題に対して優れた進歩を遂げているが、非ユークリッドおよび離散データ構造モデリングではそのポテンシャルが制限されている。本研究では、新しいモデルである時空間ニューラルネットワークとそれに対応する新しいフレームワークであるピラミッド推論を提案することにより、2つの異なる時空間的視点、フィールド、グラフを組み合わせるための最初の試みを行う。広範にわたる実験により,中国本土の大気質推定において,提案モデルと枠組みの優位性を実証した。 The air quality inference problem aims to utilize historical data from a limited number of observation sites to infer the air quality index at an unknown location. Considering the sparsity of data due to the high maintenance cost of the stations, good inference algorithms can effectively save the cost and refine the data granularity. While spatio-temporal graph neural networks have made excellent progress on this problem, their non-Euclidean and discrete data structure modeling of reality limits its potential. In this work, we make the first attempt to combine two different spatio-temporal perspectives, fields and graphs, by proposing a new model, Spatio-Temporal Field Neural Network, and its corresponding new framework, Pyramidal Inference. Extensive experiments validate that our model achieves state-of-the-art performance in nationwide air quality inference in the Chinese Mainland, demonstrating the superiority of our proposed model and framework.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# アクティブな統計的推測 Active Statistical Inference ( http://arxiv.org/abs/2403.03208v2 ) ライセンス: Link先を確認	Tijana Zrnic, Emmanuel J. Candès,	(参考訳) アクティブ・ラーニングの概念に着想を得て,機械学習支援データ収集を用いた統計的推論のためのアクティブ・推論法を提案。収集可能なラベルの数に関する予算を仮定すると、この方法論は機械学習モデルを使用して、どのデータポイントがラベルにとって最も有益なものかを識別し、予算を効果的に活用する。モデルは不確実性を示すデータポイントに対してラベルの収集を優先順位付けし、自信のあるモデルの予測に依存する。アクティブ推論は、ブラックボックス機械学習モデルを利用し、データ分散を処理しながら、確実に妥当な信頼区間と仮説テストを構成する。キーポイントは、非適応的に収集されたデータに依存する既存のベースラインよりもはるかに少ないサンプルで同じレベルの精度を達成することである。これは、同じ数のサンプルに対して、アクティブ推論はより小さな信頼区間とより強力なp値を可能にすることを意味する。我々は、世論調査、国勢調査分析、およびプロテオミクスからデータセットに対するアクティブな推測を評価する。 Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 一から多へ:言語モデルにおける毒性緩和の範囲を広げる From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models ( http://arxiv.org/abs/2403.03893v2 ) ライセンス: Link先を確認	Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis,	(参考訳) これまで、言語モデルにおける毒性の緩和は、ほぼ完全に単一言語設定に焦点が当てられていた。言語モデルが多言語機能を取り入れているため、私たちの安全対策はペースを保ちます。この研究ギャップを認識し,本手法は,複数の言語が提示する複雑さに対処するため,従来の毒性緩和の範囲を広げるものである。言語間で十分なアノテートされたデータセットがないため、私たちは翻訳データを用いて緩和手法を評価し、強化する。また,静的かつ連続的な毒性緩和シナリオにおいて,検索強化手法に対する微調整緩和手法の比較を行った。これにより,翻訳品質と言語間移動が毒性軽減に及ぼす影響を検討することができる。また、モデルのサイズとデータ量がこれらの緩和努力の成功にどのように影響するかについても検討する。本研究は,9つの言語を網羅し,多種多様な言語族と資源利用のレベルを表現している。総合的な実験を通じて、多言語毒性緩和の複雑さに関する洞察を提供し、価値ある洞察を提供し、このますます重要な分野における将来の研究の道を開く。コードとデータはhttps://github.com/for-ai/goodtriever.comで公開されている。 To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and the cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, offering valuable insights and paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 文脈偏見におけるロバスト感情認識 Robust Emotion Recognition in Context Debiasing ( http://arxiv.org/abs/2403.05963v2 ) ライセンス: Link先を確認	Dingkang Yang, Kun Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Lihua Zhang,	(参考訳) 文脈認識型感情認識(CAER)は、近年、制約のない環境における感情コンピューティング技術の実践的応用を高めている。メインストリームCAER法は多様な文脈と主観的特徴からアンサンブル表現を抽出し,対象者の感情状態を知覚する。進歩にもかかわらず、最大の課題は、コンテキストバイアスの干渉によるものである。有害なバイアスは、モデルに背景のコンテキストと感情のラベルの間の急激な相関に頼らざるを得ない。本稿では,このような問題に対処するために,反現実的感情推論(CLEF)フレームワークを提案する。具体的には、まず一般化因果グラフを定式化し、CAERの変数間の因果関係を分離する。因果グラフに続いて、CLEFはコンテキストバイアスによって引き起こされる副作用を捉えるために、非侵襲的なコンテキストブランチを導入している。提案手法では, 実測結果と実測結果とを比較して, 全体因果効果から直接文脈効果を排除し, バイアス緩和と頑健な予測を行う。モデルに依存しないフレームワークとして、CLEFは既存のメソッドに簡単に統合でき、一貫したパフォーマンス向上をもたらす。 Context-aware emotion recognition (CAER) has recently boosted the practical applications of affective computing techniques in unconstrained environments. Mainstream CAER methods invariably extract ensemble representations from diverse contexts and subject-centred characteristics to perceive the target person's emotional state. Despite advancements, the biggest challenge remains due to context bias interference. The harmful bias forces the models to rely on spurious correlations between background contexts and emotion labels in likelihood estimation, causing severe performance bottlenecks and confounding valuable context priors. In this paper, we propose a counterfactual emotion inference (CLEF) framework to address the above issue. Specifically, we first formulate a generalized causal graph to decouple the causal relationships among the variables in CAER. Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias. During the inference, we eliminate the direct context effect from the total causal effect by comparing factual and counterfactual outcomes, resulting in bias mitigation and robust prediction. As a model-agnostic framework, CLEF can be readily integrated into existing methods, bringing consistent performance gains.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 画像復元のための拡散浄化を伴うデカップリングデータ整合性 Decoupled Data Consistency with Diffusion Purification for Image Restoration ( http://arxiv.org/abs/2403.06054v5 ) ライセンス: Link先を確認	Xiang Li, Soo Min Kwon, Ismail R. Alkhouri, Saiprasad Ravishankar, Qing Qu,	(参考訳) 拡散モデルは最近、データ分布をモデル化する能力に優れ、幅広い画像復元タスクに優れており、強力な生成前駆体として注目を集めている。画像復元の問題を解決するために,拡散モデルの逆サンプリングプロセスに追加の確率勾配ステップを組み込むことで,データ一貫性を実現する手法が多数存在する。しかし、さらなる勾配のステップは、計算オーバーヘッドが大きくなり、推論時間が増大するにつれて、現実の実用的な応用に挑戦する。また、データ一貫性ステップの数は、逆サンプリングステップの数によって制限されるため、加速拡散モデルサンプリング器を使用する際のさらなる困難が生じる。本研究では,データ整合性から逆処理を分離することにより,これらの問題に対処する新しい拡散型画像復元法を提案する。本手法は,データの整合性を維持するための再構成フェーズと,拡散浄化による事前処理を行う精製フェーズの交互化を含む。我々の手法は多目的性を示し、潜在空間における効率的な問題解決に高い適応性を与える。さらに、一貫性モデルを統合することで、多数のサンプリングステップの必要性を低減する。提案手法の有効性は,画像のデノイング,デブロアリング,インペイント,超解像など,画像修復作業における総合的な実験を通じて検証される。 Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# 有限温度混合状態のSj$\ddot{\text{o}}$qvist量子幾何テンソル Sj$\ddot{\text{o}}$qvist quantum geometric tensor of finite-temperature mixed states ( http://arxiv.org/abs/2403.06944v2 ) ライセンス: Link先を確認	Zheng Zhou, Xu-Yang Hou, Xin Wang, Jia-Chen Tang, Hao Guo, Chih-Chun Chien,	(参考訳) 量子幾何学テンソル(QGT)は、量子状態の局所的な幾何学的性質と関連する位相情報を明らかにする。ここで、QGTの有限温度での混合量子状態への一般化は、Sj$\ddot{\text{o}}$qvist 距離に基づいて展開される。結果の Sj$\ddot{\text{o}}$qvist QGT は密度行列の個々のスペクトルレベルのゲージ変換の下で不変である。ピタゴラスのような関係は距離とゲージ変換を結び、平行輸送条件の役割を明らかにする。 QGTの真の部分は自然にフィッシャー・ラオ計量とフビニ・スタディ計量の和に分解され、量子距離への異なる寄与を区別することができる。 QGTの虚部はベリー曲率の重み付け和に比例し、ある条件下での混合状態の幾何学的位相をもたらす。本稿では,QGTの温度依存性を説明するために,異なる次元の3つの例を示す。 The quantum geometric tensor (QGT) reveals local geometric properties and associated topological information of quantum states. Here a generalization of the QGT to mixed quantum states at finite temperatures based on the Sj$\ddot{\text{o}}$qvist distance is developed. The resulting Sj$\ddot{\text{o}}$qvist QGT is invariant under gauge transformations of individual spectrum levels of the density matrix. A Pythagorean-like relation connects the distances and gauge transformations, which clarifies the role of the parallel-transport condition. The real part of the QGT naturally decomposes into a sum of the Fisher-Rao metric and Fubini-Study metric, allowing a distinction between different contributions to the quantum distance. The imaginary part of the QGT is proportional to a weighted summation of the Berry curvatures, which leads to a geometric phase for mixed states under certain conditions. We present three examples of different dimensions to illustrate the temperature dependence of the QGT and a discussion on possible implications.	翻訳日:2024-05-30 23:11:33 公開日:2024-05-29
# SemGauss-SLAM:Dense Semantic Gaussian Splatting SLAM SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM ( http://arxiv.org/abs/2403.07494v3 ) ライセンス: Link先を確認	Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang,	(参考訳) 本稿では,3次元ガウス表現を用いた高密度セマンティックSLAMシステムSemGauss-SLAMを提案する。本システムでは,3次元ガウス表現にセマンティックな特徴を組み込んで,環境の空間的レイアウト内に意味情報をエンコードすることで,正確なセマンティックなシーン表現を実現する。さらに、3次元ガウス表現の更新のための特徴レベル損失を提案し、3次元ガウス最適化のための高レベルガイダンスを可能にする。さらに,3次元ガウス表現とカメラポーズの協調最適化に多フレーム意味型アソシエーションを活用することで,追跡における累積ドリフトを低減し,セマンティック再構築精度を向上させるために,セマンティックインフォームドバンドルアライメントを導入し,低ドリフトトラッキングと正確なマッピングを実現する。我々のSemGauss-SLAM法は,ReplicaおよびScanNetデータセット上でのマッピングと追跡の精度の観点から,既存の放射場に基づくSLAM法よりも優れた性能を示すとともに,高精度なセマンティックセマンティックセマンティックセグメンテーションと密集セマンティックマッピングの優れた機能を示す。 We propose SemGauss-SLAM, a dense semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging multi-frame semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to low-drift tracking and accurate mapping. Our SemGauss-SLAM method demonstrates superior performance over existing radiance field-based SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in high-precision semantic segmentation and dense semantic mapping.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# Itergent (2+1)D Topological Order from Iterative (1+1)D gauging Emergent (2+1)D topological orders from iterative (1+1)D gauging ( http://arxiv.org/abs/2403.07575v2 ) ライセンス: Link先を確認	Jose Garre Rubio,	(参考訳) ゲージは、既存の大域対称性をローカライズするためにゲージ場を導入し、その結果、再びゲージできるゲージ場上の双対大域対称性をもたらす。スピン鎖上のゲージ過程をアベリア群対称性で反復し、2次元格子にゲージ場を配置することにより、局所対称性は任意のアベリア群に対して$XZX$-codeの安定化子となる。ゲージマップをツイストすることで、奇数の小冊子項に違反し、融合によって移動双極子励起が生じるような、陽イオンを明示的に閉じ込める新しい符号を得る。我々の構成は、初期(1+1)D大域対称系の異なる量子位相をとることによって、任意のギャップ付き境界を自然に実現している。提案手法は,より低次元のトポロジコードを得るための新しい経路を確立し,そのギャップ境界とテンソルネットワーク表現を同定する。 Gauging introduces gauge fields in order to localize an existing global symmetry, resulting in a dual global symmetry on the gauge fields that can be gauged again. By iterating the gauging process on spin chains with Abelian group symmetries and arranging the gauge fields in a 2D lattice, the local symmetries become the stabilizer of the $XZZX$-code for any Abelian group. By twisting the gauging map we obtain new codes that explicitly confine anyons, which violate an odd number of plaquette terms and whose fusion results in mobile dipole excitations. Our construction naturally realizes any gapped boundary by taking different quantum phases of the initial (1+1)D globally symmetric system. Our method establishes a new route to obtain higher dimensional topological codes from lower ones, to identify their gapped boundaries and their tensor network representations.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# アダプティブ・アダプティブ・アダプティブ・アダプティブ・アダプティブ・アダプティブ・トレーニング(ACAT)の導入によるMLロバストネスの促進 Introducing Adaptive Continuous Adversarial Training (ACAT) to Enhance ML Robustness ( http://arxiv.org/abs/2403.10461v2 ) ライセンス: Link先を確認	Mohamed elShehaby, Aditya Kotha, Ashraf Matrawy,	(参考訳) 敵の訓練は、敵の攻撃に対する機械学習(ML)モデルの堅牢性を高める。しかし、ネットワーク/サイバーセキュリティ領域におけるラベル付きトレーニングデータと敵のトレーニングデータを取得することは困難かつコストがかかる。そこで,本論文では,現実に検出された逆データを用いて,連続学習セッション中に,逆学習サンプルをモデルに統合するAdaptive Continuous Adversarial Training (ACAT)を紹介する。 SPAM検出データセットによる実験結果から、ACATは従来のプロセスと比較して逆サンプル検出に要する時間を短縮することを示した。さらに, MLに基づくSPAMフィルタの精度は, 3回のトレーニング後に69%から88%に向上した。 Adversarial training enhances the robustness of Machine Learning (ML) models against adversarial attacks. However, obtaining labeled training and adversarial training data in network/cybersecurity domains is challenging and costly. Therefore, this letter introduces Adaptive Continuous Adversarial Training (ACAT), a method that integrates adversarial training samples into the model during continuous learning sessions using real-world detected adversarial data. Experimental results with a SPAM detection dataset demonstrate that ACAT reduces the time required for adversarial sample detection compared to traditional processes. Moreover, the accuracy of the under-attack ML-based SPAM filter increased from 69% to over 88% after just three retraining sessions.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 先行学習によるフローベース生成超解法モデルの構築 Boosting Flow-based Generative Super-Resolution Models via Learned Prior ( http://arxiv.org/abs/2403.10988v3 ) ライセンス: Link先を確認	Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang, Hao-Wei Chen, Roy Tseng, Chien Feng, Chun-Yi Lee,	(参考訳) フローベース超解像(SR)モデルは、高品質な画像を生成する際に驚くべき能力を示した。しかし、これらの手法は、グリッドアーティファクト、爆発する逆数、固定サンプリング温度による最適以下の結果など、画像生成においていくつかの課題に直面している。これらの問題を克服するために、フローベースSRモデルの推論フェーズに先立って学習された条件を導入する。この前者は,低解像度画像上に条件付き潜在モジュールによって予測された潜時符号であり,フローモデルによりSR画像に変換される。我々のフレームワークは、アーキテクチャや事前訓練された重量を変更することなく、現代のフローベースSRモデルとシームレスに統合するように設計されている。提案手法の有効性を,広範囲な実験とアブレーション解析により評価した。提案するフレームワークは,フローベースSRモデルに固有のすべての問題に対処し,様々なSRシナリオにおける性能を向上させる。私たちのコードは、https://github.com/liyuantsao/BFSRで利用可能です。 Flow-based super-resolution (SR) models have demonstrated astonishing capabilities in generating high-quality images. However, these methods encounter several challenges during image generation, such as grid artifacts, exploding inverses, and suboptimal results due to a fixed sampling temperature. To overcome these issues, this work introduces a conditional learned prior to the inference phase of a flow-based SR model. This prior is a latent code predicted by our proposed latent module conditioned on the low-resolution image, which is then transformed by the flow model into an SR image. Our framework is designed to seamlessly integrate with any contemporary flow-based SR model without modifying its architecture or pre-trained weights. We evaluate the effectiveness of our proposed framework through extensive experiments and ablation analyses. The proposed framework successfully addresses all the inherent issues in flow-based SR models and enhances their performance in various SR scenarios. Our code is available at: https://github.com/liyuantsao/BFSR	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 学習不要損失に基づく拡散誘導の理解と改善 Understanding and Improving Training-free Loss-based Diffusion Guidance ( http://arxiv.org/abs/2403.12404v2 ) ライセンス: Link先を確認	Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li,	(参考訳) 事前訓練された拡散モデルにさらなる制御を加えることが、コンピュータビジョン、強化学習、科学のためのAIなど、ますます人気のある研究領域となっている。近年,クリーンな画像に事前学習したオフ・ザ・シェルフネットワークを用いて,学習自由損失に基づくガイダンスを提案する研究がいくつかある。このアプローチは、拡散誘導の無料ランチを提供するように見えるユニバーサル制御フォーマットのゼロショット条件生成を可能にする。本稿では,トレーニングフリーガイダンスの理解を深め,その限界を克服することを目的としている。我々は,学習自由指導を最適化の観点から支援する理論解析を行い,それを分類者に基づく(または分類者なし)指導と区別する。これらの欠点を解明するために, 学習自由指導が逆勾配の影響を受けやすいことを理論的に証明し, 分類器指導と比較して緩やかな収束率を示す。次に,その限界を克服するために,理論的理論的根拠と実証的証拠を伴って,一連の手法を導入する。画像と動きの生成実験により,これらの手法の有効性が確認された。 Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of training-free guidance, as well as overcome its limitations. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients and exhibits slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 量子オンサガー関係 Quantum Onsager relations ( http://arxiv.org/abs/2403.12896v3 ) ライセンス: Link先を確認	Mankei Tsang,	(参考訳) 量子情報幾何学を用いて、定常状態に近い開系の力学をモデル化するOnsagerレート方程式の量子一般化を導出する。一般化された方程式は、力のフレキシブルな定義と、エントロピー生成の従来の定義を超えて、統計的発散測度と量子フィッシャー情報化測度を包含する。また、オープン量子系に対する時間反転と詳細バランスの一般的な概念を提案して、輸送テンソルに対する量子オンサーガー-カシミール関係を導出する。結果は、統計力学とパラメータ推定理論の間に顕著な関連性を確立する。 Using quantum information geometry, I derive quantum generalizations of the Onsager rate equations, which model the dynamics of an open system near a steady state. The generalized equations hold for a flexible definition of the forces as well as a large class of statistical divergence measures and quantum-Fisher-information metrics beyond the conventional definition of entropy production. I also derive quantum Onsager-Casimir relations for the transport tensors by proposing a general concept of time reversal and detailed balance for open quantum systems. The results establish a remarkable connection between statistical mechanics and parameter estimation theory.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# LLM埋め込みによるテキストクラスタリング Text clustering with LLM embeddings ( http://arxiv.org/abs/2403.15112v2 ) ライセンス: Link先を確認	Alina Petukhova, Joao P. Matos-Carvalho, Nuno Fachada,	(参考訳) テキストクラスタリングは、デジタルコンテンツの増加を組織化する上で重要なアプローチであり、分類されていないデータに隠されたパターンを構造化し見つけるのに役立つ。本研究では,大規模言語モデル(LLM)におけるテキスト埋め込みとクラスタリングアルゴリズムの違いが,テキストデータセットのクラスタリングに与える影響について検討した。組込みがクラスタリング結果にどのように影響するか, 要約による次元還元による役割, 組込みサイズ調整について, 一連の実験を行った。その結果、LLM埋め込みは構造化言語のニュアンスを捉えるのに優れており、BERTは性能において軽量な選択肢を導いていることがわかった。さらに,組込み次元の増大や要約手法はクラスタリング効率を均一に向上させるものではないことが判明し,これらの手法が実生活モデルで使用するためには慎重な分析が必要であることが示唆された。これらの結果は、テキストクラスタリングアプリケーションにおいて、ニュアンス付きテキスト表現の必要性と計算可能性との複雑なバランスを浮き彫りにする。本研究は, 従来のテキストクラスタリングフレームワークを拡張し, LLMからの埋め込みを組み込むことで, 方法論改善の道を切り開くとともに, 各種テキスト解析における新たな手法を開拓する。 Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. In this research, we investigated how different textual embeddings - particularly those used in large language models (LLMs) - and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and embedding size adjustment. Results reveal that LLM embeddings excel at capturing the nuances of structured language, while BERT leads the lightweight options in performance. In addition, we find that increasing embedding dimensionality and summarisation techniques do not uniformly improve clustering efficiency, suggesting that these strategies require careful analysis to use in real-life models. These results highlight a complex balance between the need for nuanced text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, thereby paving the way for improved methodologies and opening new avenues for future research in various types of textual analysis.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# MSCoTDet:マルチスペクトルペデストリアン検出のための言語駆動型マルチモーダルフュージョン MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection ( http://arxiv.org/abs/2403.15209v2 ) ライセンス: Link先を確認	Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro,	(参考訳) RGBと熱モダリティの相補的な情報により, マルチスペクトル歩行者検出は, 概日適用にとって魅力的である。しかしながら、現在のモデルは、特に統計的に偏ったデータセットから得られたモダリティバイアスのために、特定のケース(例えば、熱障害のある歩行者)で歩行者を検出することができないことが多い。本稿では,Large Language Models (LLMs) を用いた多スペクトル歩行者検出におけるモダリティバイアスの緩和について検討する。そこで我々は,マルチスペクトル・チェーン・オブ・ソート(MSCoT)のプロンプト戦略を設計し,LLMがマルチスペクトル歩行者検出を行うように促す。さらに,MSCoTプロンプトをマルチスペクトル歩行者検出に統合するMSCoTDet(Multispectral Chain-of-Thought Detection)フレームワークを提案する。この目的のために我々は,MSCoTの出力を融合させる言語駆動型マルチモーダルフュージョン (LMF) 戦略を設計し,視覚に基づくマルチスペクトル歩行者検出モデルの検出結果に即した。大規模な実験により、MSCoTDetはモダリティバイアスを効果的に軽減し、多スペクトル歩行者検出を改善することが検証された。 Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in certain cases (e.g., thermal-obscured pedestrians), particularly due to the modality bias learned from statistically biased datasets. In this paper, we investigate how to mitigate modality bias in multispectral pedestrian detection using Large Language Models (LLMs). Accordingly, we design a Multispectral Chain-of-Thought (MSCoT) prompting strategy, which prompts the LLM to perform multispectral pedestrian detection. Moreover, we propose a novel Multispectral Chain-of-Thought Detection (MSCoTDet) framework that integrates MSCoT prompting into multispectral pedestrian detection. To this end, we design a Language-driven Multi-modal Fusion (LMF) strategy that enables fusing the outputs of MSCoT prompting with the detection results of vision-based multispectral pedestrian detection models. Extensive experiments validate that MSCoTDet effectively mitigates modality biases and improves multispectral pedestrian detection.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 量子/古典系と量子軌道に対するマルコフ力学 Markovian dynamics for a quantum/classical system and quantum trajectories ( http://arxiv.org/abs/2403.16065v2 ) ライセンス: Link先を確認	Alberto Barchielli,	(参考訳) 量子軌道法は、数値計算の出発点としてオープンシステム理論において使われ、連続した時間で量子システムのモニタリングを記述するために用いられる。ここでは、この手法を拡張して、量子/古典ハイブリッドシステムのダイナミクスに対する一般的なアプローチを開発する。 2つの結合確率微分方程式を用いることで、古典的成分と、それぞれ固有の力学を持ち、互いに相互作用する量子的成分を記述することができる。数学的に厳密な構成は、マルコフの合同力学を持ち、量子成分のヒルベルト空間上の有界作用素のみを含むという制限の下で与えられる。重要な特徴は、相互作用が量子成分から古典成分への情報のフローを許容するならば、必然的に力学は散逸的であることである。また、この理論は、純粋に量子の場合において量子力学半群に還元され、純粋に古典的な場合においてリウヴィル方程式とコルモゴロフ-フォッカー-プランク方程式を含む適切なハイブリッド力学半群とどのように結びついているかを示す。さらに、この半群は、提案された確率力学をハイブリッドマスター方程式に基づく他の様々な提案と比較することができる。いくつかの単純な例は、説明できる様々な物理的な振る舞いを示すために構築されており、特に隠れ絡みを示すモデルが導入されている。 Quantum trajectory techniques have been used in the theory of open systems as a starting point for numerical computations and to describe the monitoring of a quantum system in continuous time. Here we extend this technique and use it to develop a general approach to the dynamics of quantum/classical hybrid systems. By using two coupled stochastic differential equations, we can describe a classical component and a quantum one which have their own intrinsic dynamics and which interact with each other. A mathematically rigorous construction is given, under the restriction of having a Markovian joint dynamics and of involving only bounded operators on the Hilbert space of the quantum component. An important feature is that, if the interaction allows for a flow of information from the quantum component to the classical one, necessarily the dynamics is dissipative. We show also how this theory is connected to a suitable hybrid dynamical semigroup, which reduces to a quantum dynamical semigroup in the purely quantum case and includes Liouville and Kolmogorov-Fokker-Plank equations in the purely classical case. Moreover, this semigroup allows to compare the proposed stochastic dynamics with various other proposals based on hybrid master equations. Some simple examples are constructed in order to show the variety of physical behaviours which can be described; in particular, a model presenting hidden entanglement is introduced.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# SegICL:医療画像におけるセグメンテーション強化のためのマルチモーダルインコンテキスト学習フレームワーク SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging ( http://arxiv.org/abs/2403.16578v3 ) ライセンス: Link先を確認	Lingdong Shen, Fangxin Shang, Xiaoshuang Huang, Yehui Yang, Haifeng Huang, Shiming Xiang,	(参考訳) 医用画像のセグメンテーションの分野では、アウト・オブ・ディストリビューション(OOD)のセグメンテーションタスクを費用対効果で扱うことが大きな課題である。ユニバーサルセグメンテーションモデル(Universal segmentation model)は、医療画像の様々なモダリティを一般化することを目的としたソリューションである。少ないショットの学習セグメンテーション法は、典型的にはデータの特定のモダリティのために設計されており、他のモダリティで使用するために直接転送することはできない。そこで我々は,画像セグメンテーションにIn-Context Learning(ICL)を活用する新しいアプローチであるSegICLを紹介した。既存の方法とは異なり、SegICLはテキスト誘導セグメンテーションを採用し、小さなイメージマスクペアでコンテキスト内学習を行う機能を備えており、OODタスク(OODモダリティとデータセットを含む)のスクラッチや微調整からモデルをトレーニングする必要がなくなる。 OODタスクにおけるショット数とセグメンテーション性能の正の相関を示す。ショット供給時のセグメンテーション性能はゼロショット設定時の性能の約1.5倍である。これは、SegICLがコンテキスト情報に基づく新しいセグメンテーションタスクに効果的に対処していることを示している。さらに、SegICLはOODおよび分散タスクのメインストリームモデルに匹敵するパフォーマンスを示す。私たちのコードは、論文レビューの後にリリースされます。 In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models is a solution, which aim to generalize across the diverse modality of medical images, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL has the capability to employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experimental demonstrates a positive correlation between the number of shots and segmentation performance on OOD tasks. The performance of segmentation when provided thre-shots is approximately 1.5 times better than the performance in a zero-shot setting. This indicates that SegICL effectively address new segmentation tasks based on contextual information. Additionally, SegICL also exhibits comparable performance to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 都市VLP:都市域プロファイリングのためのマルチグラニュラリティビジョンランゲージ準備 UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Region Profiling ( http://arxiv.org/abs/2403.16831v2 ) ライセンス: Link先を確認	Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang,	(参考訳) 都市域のプロファイリングは、人口動態、インフラ、経済活動などの特徴を保存しつつ、都市の低次元的な表現を学習することを目的としている。しかし、一般的な事前訓練されたモデル、特に衛星画像に依存しているモデルは、二重課題に直面している。第一に、衛星データからマクロレベルのパターンのみに集中させることは、ある場所での建築詳細などの微妙な詳細さの欠如による偏見をもたらす可能性があり、第二に、事前訓練されたモデルにおける解釈可能性の欠如は、都市計画の透明な証拠を提供する上での有用性を制限している。これらの問題に対処して、ビジョン・ランゲージ事前学習に基づくUrbanVLPという新しいフレームワークを考案した。我々のUrbanVLPは、マクロ(サテライト)レベルとマイクロ(ストリートビュー)レベルの複数粒度情報をシームレスに統合し、事前訓練されたモデルの制限を克服します。さらに、自動テキスト生成と校正を導入し、都市画像の高品質なテキスト記述を作成することにより、下流アプリケーションにおける解釈可能性を高める。 6つの都市指標予測タスクで実施された厳密な実験は、その優れた性能を示している。 Urban region profiling aims to learn a low-dimensional representation of a given urban area while preserving its characteristics, such as demographics, infrastructure, and economic activities, for urban planning and development. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place.Secondly, the lack of interpretability in pretrained models limits their utility in providing transparent evidence for urban planning. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro (street-view) levels, overcoming the limitations of prior pretrained models. Moreover, it introduces automatic text generation and calibration, elevating interpretability in downstream applications by producing high-quality text descriptions of urban imagery. Rigorous experiments conducted across six urban indicator prediction tasks underscore its superior performance.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 関係探索による大規模言語モデルを用いた曖昧なエンティティマッチング Disambiguate Entity Matching using Large Language Models through Relation Discovery ( http://arxiv.org/abs/2403.17344v2 ) ライセンス: Link先を確認	Zezhou Huang,	(参考訳) エンティティマッチングは、ファジィ結合や重複解消といったタスクの中心にある、データ統合とクリーニングにおいて重要な課題である。従来のアプローチでは、編集距離やJaccardの類似性、最近では、GPTのような大規模言語モデル(LLM)の進歩を含む組み込みやディープニューラルネットワークなど、ファジィな項表現の克服に重点を置いてきた。しかし、エンティティマッチングにおける中核的な課題は、特に外部データベースとの統合において「マッチ」を構成するものを定義することの曖昧さにまで及んでいる。この曖昧さは、実体間の詳細と粒度の異なるレベルから生じ、正確な一致を複雑にする。本稿では,意味的類似性を純粋に識別するアプローチから,マッチングにおけるあいまいさの解消に不可欠なエンティティ間の「関係」を理解し定義するアプローチを提案する。本手法では,タスクに関連する一連の関係を事前に定義することにより,類似性のスペクトルをより効率的にナビゲートすることができる。 Entity matching is a critical challenge in data integration and cleaning, central to tasks like fuzzy joins and deduplication. Traditional approaches have focused on overcoming fuzzy term representations through methods such as edit distance, Jaccard similarity, and more recently, embeddings and deep neural networks, including advancements from large language models (LLMs) like GPT. However, the core challenge in entity matching extends beyond term fuzziness to the ambiguity in defining what constitutes a "match," especially when integrating with external databases. This ambiguity arises due to varying levels of detail and granularity among entities, complicating exact matches. We propose a novel approach that shifts focus from purely identifying semantic similarities to understanding and defining the "relations" between entities as crucial for resolving ambiguities in matching. By predefining a set of relations relevant to the task at hand, our method allows analysts to navigate the spectrum of similarity more effectively, from exact matches to conceptually related entities.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# マルチビュー特徴抽出のための量子加速クロスレグレッションアルゴリズム Quantum accelerated cross regression algorithm for multiview feature extraction ( http://arxiv.org/abs/2403.17444v2 ) ライセンス: Link先を確認	Hai-Ling Liu, Ya-Qian Zhao, Ren-Gang Li, Xin Zhang,	(参考訳) マルチビュー特徴抽出(MvFE)は、機械学習、画像処理、その他の分野に広く応用されている。大規模高次元データを扱う場合、MvFEにより古典コンピュータの性能は深刻な問題に直面し、高価な行列計算を行う。この課題に対処するために、MvFEのための量子加速クロスレグレッションアルゴリズムを提案する。 1) MvFE の分野における量子コンピューティングのギャップを埋める MvFE の量子バージョンアルゴリズムを提案し、(2) 量子アルゴリズムは対象データ行列のブロックエンコーディングを構築するように設計され、ブロックエンコーディングフレームワークに基づく最適なハミルトンシミュレーション技術を使用して、対象データ行列の量子シミュレーションを効率的に実現することができる。提案手法は,アルゴリズムのシミュレーション誤差への依存を低減し,アルゴリズム性能を向上させる。(3)古典的アルゴリズムと比較して,提案アルゴリズムは,データ点数,データ点の次元,ビューデータ数において多項式加速度を有する。 Multi-view Feature Extraction (MvFE) has wide applications in machine learning, image processing and other fields. When dealing with massive high-dimensional data, the performance of classical computer faces severe challenges due to MvFE involves expensive matrix calculation. To address this challenge, a quantum-accelerated cross-regression algorithm for MvFE is proposed. The main contributions are as follows:(1) a quantum version algorithm for MvFE is proposed for the first time, filling the gap of quantum computing in the field of MvFE;(2) a quantum algorithm is designed to construct the block-encoding of the target data matrix, so that the optimal Hamiltonian simulation technology based on the block-encoding framework can be used to efficiently realize the quantum simulation of the target data matrix. This approach reduces the dependence of the algorithm's on simulation errors to enhance algorithm performance;(3) compared with the classical counterpart algorithm, the proposed quantum algorithm has a polynomial acceleration in the number of data points, the dimension of data points and the number of view data.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# マスクオートエンコーダはPDE学習者である Masked Autoencoders are PDE Learners ( http://arxiv.org/abs/2403.17728v2 ) ライセンス: Link先を確認	Anthony Zhou, Amir Barati Farimani,	(参考訳) 偏微分方程式(PDE)に対するニューラルソルバは、高速で正確な物理解を生成する大きな可能性を持っているが、その実用性は、その一般化性によって制限されている。 PDEは幅広いスケールで進化し、様々な振る舞いを示す。これらの現象を予測するには、様々な係数、境界条件、解像度、方程式を含む様々な入力の学習表現が必要となる。一般化可能なPDEモデリングへのステップとして,物理問題に対するマスク付き事前学習を適用する。 PDEを横断する自己教師型学習によって、マスク付きオートエンコーダは異種物理学を統合し、意味のある潜在表現を学習し、この空間で潜在PDE算術を実行することができる。さらに,マスク付きプレトレーニングによりPDE係数の回帰とPDE特徴の分類が向上することが実証された。最後に、学習した潜在表現にニューラルソルバを条件付けすることで、様々な係数、離散化、境界条件、および目に見えないPDEにおけるタイムステッピングと超分解能のパフォーマンスを向上させることができる。マスク付きプレトレーニングは、大規模でラベルなし、異質なデータセットにまたがる統一的な方法として現れて、大規模に潜在物理学を学ぶことを願っている。 Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients, boundary conditions, resolutions, or even equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for physics problems. Through self-supervised learning across PDEs, masked autoencoders can consolidate heterogeneous physics to learn meaningful latent representations and perform latent PDE arithmetic in this space. Furthermore, we demonstrate that masked pretraining can improve PDE coefficient regression and the classification of PDE features. Lastly, conditioning neural solvers on learned latent representations can improve time-stepping and super-resolution performance across a variety of coefficients, discretizations, or boundary conditions, as well as on unseen PDEs. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.	翻訳日:2024-05-30 23:01:49 公開日:2024-05-29
# 単語マッチングを超える: 構文が機械翻訳における文脈内例選択を改善した Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation ( http://arxiv.org/abs/2403.19285v2 ) ライセンス: Link先を確認	Chenming Tang, Zhixiang Wang, Yunfang Wu,	(参考訳) In-context Learning (ICL) は、大規模言語モデル (LLM) の時代において、あるタスクに対して LLM のパワーを誘発するいくつかの例が示される、流行の促進戦略である。情報のある例をどうやって選ぶかは、未解決の問題である。機械翻訳(MT)のテキスト内サンプル選択は、構文レベルの深い知識を無視しつつ、表面的な単語レベルの特徴に重点を置いている。本稿では,ポリノミアル距離を用いた依存関係木間の構文的類似性を計算し,構文に基づくMTの例選択手法を提案する。さらに,単語レベルと構文レベルの両方の基準で選択された例を組み合わせたアンサンブル戦略を提案する。英語と6の共通言語による実験結果から,文法はMTのICLを効果的に向上し,12の翻訳方向のうち11のCOMETスコアが最も高い。 In-context learning (ICL) is the trending prompting strategy in the era of large language models (LLMs), where a few examples are demonstrated to evoke LLMs' power for a given task. How to select informative examples remains an open issue. Previous works on in-context example selection for machine translation (MT) focus on superficial word-level features while ignoring deep syntax-level knowledge. In this paper, we propose a syntax-based in-context example selection method for MT, by computing the syntactic similarity between dependency trees using Polynomial Distance. In addition, we propose an ensemble strategy combining examples selected by both word-level and syntax-level criteria. Experimental results between English and 6 common languages indicate that syntax can effectively enhancing ICL for MT, obtaining the highest COMET scores on 11 out of 12 translation directions.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 前向きパスのみを用いたテスト時間モデル適応 Test-Time Model Adaptation with Only Forward Passes ( http://arxiv.org/abs/2404.01650v2 ) ライセンス: Link先を確認	Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao,	(参考訳) テストタイム適応は、トレーニング済みのモデルを、潜在的に分布シフトのある未確認テストサンプルに適応させるのに有効であることが証明されている。しかし、現実のシナリオでは、モデルは通常、リソース制限されたデバイス(例えばFPGA)にデプロイされ、しばしば量子化され、アクセラレーションのための非修飾パラメータでハードコードされる。既存のメソッドは、サポートされないかもしれないモデル更新の計算集約的なバックプロパゲーションに大きく依存しているため、多くの場合、実現不可能である。そこで本研究では,テスト時間フォワード最適化適応法(FOA)を提案する。 FOAでは、微分自由共分散行列適応進化戦略を用いて、新たに追加されたプロンプト(モデルの入力として)のみを学習する。この戦略をオンラインの教師なし環境下で安定的に動作させるため、テスト学習統計の不一致とモデル予測エントロピーを測定して、新しい適合度関数を考案する。さらに、シフトテストサンプルのモデルアクティベーションを直接調整し、ソーストレーニング領域と整合させ、適応性能をさらに向上させるアクティベーションシフト方式を設計する。 FOAはバックプロパゲーションやモデルウェイトを変更することなく、量子化された8ビットのViT上で動作し、32ビットの32ビットのViTでは勾配ベースのTENTより優れ、ImageNet-Cでは最大24倍のメモリ削減を実現している。 Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating that may be not supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to solely learn a newly added prompt (as model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function by measuring test-training statistic discrepancy and model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, making them align with the source training domain, thereby further enhancing adaptation performance. Without using any backpropagation and altering model weights, FOA runs on quantized 8-bit ViT outperforms gradient-based TENT on full-precision 32-bit ViT, while achieving an up to 24-fold memory reduction on ImageNet-C.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 文脈内学習のデコンストラクタ:破壊によるプロンプト理解 Deconstructing In-Context Learning: Understanding Prompts via Corruption ( http://arxiv.org/abs/2404.02054v2 ) ライセンス: Link先を確認	Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky,	(参考訳) 大きな言語モデル(LLMs)から$``$learn in context$"$は、提供されたプロンプトに基づいて、その使用が爆発的に増加し、ChatGPT、Claude、BardといったAIアシスタントの普及につながった。これらのAIアシスタントは、人間のフィードバックを使用するアライメント技術によって、マイナーな迅速な修正に対して堅牢であることが知られている。対照的に、彼らがバックボーンとして使用する基礎となる事前訓練されたLSMは、この点において脆いことが知られている。高品質のバックボーンモデルの構築は依然として中心的な課題であり、その品質を評価するための一般的なアプローチは、ほとんどショット評価を行うことである。このような評価は、マイナーな迅速な修正に非常に敏感であることや、特定のインコンテキストの例を選択することで有名である。これまでの研究では、プロンプトの異なる要素の変更がモデルのパフォーマンスにどのように影響するかを調べてきた。しかし、これらの初期の研究は特定のプロンプト属性の限られた数に集中する傾向があり、しばしば矛盾する結果を生んだ。さらに、以前の研究では、パラメータが150億未満のモデルに焦点を当てたり、GPT-3やPaLMのようなブラックボックスモデルのみを精査し、複製を困難にしていた。本研究では,全プロンプトをタスク記述,デモインプット,ラベル,インラインインストラクションの4つのコンポーネントに分解する。これらの要素の構造的・意味的腐敗がモデル性能に及ぼす影響について検討する。分類と生成タスクをカバーする10のデータセットを用いて,1.5Bから70Bのモデルについて検討した。プロンプト内の繰り返しテキストはモデル性能を向上し、より大きなモデル($30B)はプロンプトのセマンティクスにより敏感であることがわかった。最後に、実演にタスクとインライン命令を追加することで、意味的に破損してもモデル性能が向上することが観察された。 The ability of large language models (LLMs) to $``$learn in context$"$ based on the provided prompt has led to an explosive growth in their use, culminating in the proliferation of AI assistants such as ChatGPT, Claude, and Bard. These AI assistants are known to be robust to minor prompt modifications, mostly due to alignment techniques that use human feedback. In contrast, the underlying pre-trained LLMs they use as a backbone are known to be brittle in this respect. Building high-quality backbone models remains a core challenge, and a common approach to assessing their quality is to conduct few-shot evaluation. Such evaluation is notorious for being highly sensitive to minor prompt modifications, as well as the choice of specific in-context examples. Prior work has examined how modifying different elements of the prompt can affect model performance. However, these earlier studies tended to concentrate on a limited number of specific prompt attributes and often produced contradictory results. Additionally, previous research either focused on models with fewer than 15 billion parameters or exclusively examined black-box models like GPT-3 or PaLM, making replication challenging. In the present study, we decompose the entire prompt into four components: task description, demonstration inputs, labels, and inline instructions provided for each demonstration. We investigate the effects of structural and semantic corruptions of these elements on model performance. We study models ranging from 1.5B to 70B in size, using ten datasets covering classification and generation tasks. We find that repeating text within the prompt boosts model performance, and bigger models ($\geq$30B) are more sensitive to the semantics of the prompt. Finally, we observe that adding task and inline instructions to the demonstrations enhances model performance even when the instructions are semantically corrupted.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# ポイントクラウド映像表現学習のためのPDEモデリングの検討 On Exploring PDE Modeling for Point Cloud Video Representation Learning ( http://arxiv.org/abs/2404.04720v2 ) ライセンス: Link先を確認	Zhuoxu Huang, Zhenkun Fan, Tao Xu, Jungong Han,	(参考訳) 複雑な構造と秩序のない空間配置のため、ポイントクラウドビデオ表現学習は困難である。従来の手法はフレーム・ツー・フレームの相関やポイント・ワイド対応追跡に苦慮している。近年、偏微分方程式(PDE)は、特定の制約の中で空間的時間的データ情報を均一に解く新しい視点を提供する。有形点対応の追跡は依然として困難であるが,PDE解決問題としてポイントクラウド映像表現学習の形式化を提案する。 PDEは時間とともに空間形状の変形を解くために使用される流体解析にインスパイアされ、時間的情報によって影響を受ける空間点の変動を解決するためにPDEを用いている。空間的時間的相関をモデル化することにより、時間的特徴と空間的変動を規則化し、ポイントクラウドビデオにおける表現学習を強化することを目指す。我々は、PointNetライクなエンコーダとPDE解決モジュールで構成されるMotion PointNetを紹介する。当初,空間変動の初期状態をモデル化する軽量で効果的なエンコーダを構築した。その後,PDE分解モジュールをパラメータ化潜在空間で開発し,ポイントクラウドビデオに固有の時空間相関に対処する。 PDEの解法は、特徴分布の変換において重要なコントラスト学習構造により導かれ、洗練され、ポイントクラウドビデオデータ内の特徴表現が最適化される。注目すべきは、Motion PointNetがMSRAction-3Dデータセットで97.52%という驚くべき精度を達成したことです。 Point cloud video representation learning is challenging due to complex structures and unordered spatial arrangement. Traditional methods struggle with frame-to-frame correlations and point-wise correspondence tracking. Recently, partial differential equations (PDE) have provided a new perspective in uniformly solving spatial-temporal data information within certain constraints. While tracking tangible point correspondence remains challenging, we propose to formalize point cloud video representation learning as a PDE-solving problem. Inspired by fluid analysis, where PDEs are used to solve the deformation of spatial shape over time, we employ PDE to solve the variations of spatial points affected by temporal information. By modeling spatial-temporal correlations, we aim to regularize spatial variations with temporal features, thereby enhancing representation learning in point cloud videos. We introduce Motion PointNet composed of a PointNet-like encoder and a PDE-solving module. Initially, we construct a lightweight yet effective encoder to model an initial state of the spatial variations. Subsequently, we develop our PDE-solving module in a parameterized latent space, tailored to address the spatio-temporal correlations inherent in point cloud video. The process of solving PDE is guided and refined by a contrastive learning structure, which is pivotal in reshaping the feature distribution, thereby optimizing the feature representation within point cloud video data. Remarkably, our Motion PointNet achieves an impressive accuracy of 97.52% on the MSRAction-3D dataset, surpassing the current state-of-the-art in all aspects while consuming minimal resources (only 0.72M parameters and 0.82G FLOPs).	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 2024年エヴァラティンのラテンパイプ:ラテンのモルフォシンタクティック分析 ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin ( http://arxiv.org/abs/2404.05839v2 ) ライセンス: Link先を確認	Milan Straka, Jana Straková, Federica Gamba,	(参考訳) 我々は、EvaLatin 2024 Dependency Parsing 共有タスクの受賞申請である LatinPipe を提示する。本システムでは, 基本および大型の事前学習型LMの微調整による連結と, 係り受け解析と形態解析の両方を共同で学習する形態学用ドット積アテンションヘッド, ソフトマックス分類ヘッドから構成される。これは、より統一されたアノテーションスタイルを達成するために、アノテーションの追加調和を利用して、利用可能な7つのラテンコーパスからのサンプリングによって訓練される。微調整の前に、凍結重量のあるいくつかの初期エポックでシステムを訓練する。また、Transformer(s)上にBiLSTMレイヤを積み重ねることで、局所的な相対的コンテキスト化も追加します。最後に、7つのランダムにインスタンス化されたネットワークから出力された確率分布を最終提出のためにアンサンブルする。コードはhttps://github.com/ufal/evalatin2024-latinpipeで公開されている。 We present LatinPipe, the winning submission to the EvaLatin 2024 Dependency Parsing shared task. Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology to jointly learn both dependency parsing and morphological analysis. It is trained by sampling from seven publicly available Latin corpora, utilizing additional harmonization of annotations to achieve a more unified annotation style. Before fine-tuning, we train the system for a few initial epochs with frozen weights. We also add additional local relative contextualization by stacking the BiLSTM layers on top of the Transformer(s). Finally, we ensemble output probability distributions from seven randomly instantiated networks for the final submission. The code is available at https://github.com/ufal/evalatin2024-latinpipe.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# CoVoMix:人間のような多話者会話のためのゼロショット音声生成の改善 CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations ( http://arxiv.org/abs/2404.06690v2 ) ライセンス: Link先を確認	Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng,	(参考訳) ゼロショット音声合成(TTS)モデリングの最近の進歩は、高忠実で多様な音声を生成するために大きな進歩をもたらした。しかし、対話生成は、音声における人間のような自然性を達成するとともに、引き続き課題である。本稿では,ゼロショット,ヒューマンライク,マルチスピーカ,マルチラウンド音声生成のための新しいモデルであるCoVoMix: Conversational Voice Mixture Generationを紹介する。 CoVoMixはまず対話テキストを個別のトークンの複数のストリームに変換する。これらのトークンストリームは、フローマッチングベースの音響モデルに入力され、混合メル-スペクトログラムを生成する。最後に、HiFi-GANモデルを用いて音声波形を生成する。さらに、対話モデリングと生成の有効性を測定するための総合的なメトリクスセットを考案する。実験の結果,CoVoMixは自然性やコヒーレンスにおいて人間に似た対話を生成できるだけでなく,複数の話者が複数ラウンドの会話を行うことができることがわかった。これは、ある話者の発話が他の話者の介在物や笑いとシームレスに混合される単一のチャンネルで生成された事例によって例示され、後者が注意深いリスナーとしての役割を示す。オーディオサンプルはhttps://aka.ms/covomix.comで入手できる。 Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. This is exemplified by instances generated in a single channel where one speaker's utterance is seamlessly mixed with another's interjections or laughter, indicating the latter's role as an attentive listener. Audio samples are available at https://aka.ms/covomix.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 有限マルコフ長による混合状態量子相の安定性 Stability of mixed-state quantum phases via finite Markov length ( http://arxiv.org/abs/2404.07251v2 ) ライセンス: Link先を確認	Shengqi Sang, Timothy H. Hsieh,	(参考訳) ハミルトン基底状態の量子相の場合、エネルギーギャップは、ギャップが有限である限り、位相の安定性を保証する上で中心的な役割を果たす。混合状態相と遷移を特徴づける等しく重要な量として,量子条件相互情報(CMI)が指数関数的に減衰する長さ尺度であるマルコフ長を提案する。局所リンドブラディアンの下で進化する状態について、マルコフの長さが進化に沿って有限であるならば、それは同じ相のままであり、つまり、前者の進化を逆転できる別の準局所リンドブラディアン進化が存在するということである。この診断をデコヒーレンスに基づくトーリックコードに適用し,マルコフ長はデコヒーレンス遷移以外の至る所で有限であることを示す。この場合、CMIはランダム結合イジングモデルにおける点欠陥の自由エネルギーコストにマッピングできる。これは混合状態相転移が陰極性転移と一致することを示唆し、準局所復号チャネルも示唆している。 For quantum phases of Hamiltonian ground states, the energy gap plays a central role in ensuring the stability of the phase as long as the gap remains finite. We propose Markov length, the length scale at which the quantum conditional mutual information (CMI) decays exponentially, as an equally essential quantity characterizing mixed-state phases and transitions. For a state evolving under a local Lindbladian, we argue that if its Markov length remains finite along the evolution, then it remains in the same phase, meaning there exists another quasi-local Lindbladian evolution that can reverse the former one. We apply this diagnostic to toric code subject to decoherence and show that the Markov length is finite everywhere except at its decodability transition, at which it diverges. CMI in this case can be mapped to the free energy cost of point defects in the random bond Ising model. This implies that the mixed state phase transition coincides with the decodability transition and also suggests a quasi-local decoding channel.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# Sketch-Plan-Generalize:帰納的一般化可能な空間概念の連続的なFew-Shot学習 Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts ( http://arxiv.org/abs/2404.07774v2 ) ライセンス: Link先を確認	Namasivayam Kalithasan, Sachit Sachdeva, Himanshu Gaurav Singh, Vishal Bindal, Arnav Tuli, Gurarmaan Singh Panjeta, Divyanshu Aggarwal, Rohan Paul, Parag Singla,	(参考訳) 本研究の目的は,高層タワーの帰納的構成として,空間概念の帰納的一般化を学習できるようにすることである。人間の実演を前提として、観測されたインスタンスを説明するsuccinct ${ program}$表現を推論する学習アーキテクチャを模索する。さらに、このアプローチは、異なる大きさの新規構造や、以前に学習された概念の階層的な構成として表される複雑な構造に誘導的に一般化すべきである。事前訓練された大きな(視覚的な)言語モデルのコード生成機能と純粋にニューラルモデルを使用する既存のアプローチは、a-prioriが目にしない複雑な概念への一般化が不十分であることを示している。私たちのキーとなる洞察は、帰納的概念学習を要因とすることです。 (i)${\it Sketch:}$新しい概念の粗いシグネチャを検出して推測する (ii)${\it Plan:}$ MCTS search over grounded action sequences (iii)${\it Generalize:}$ordered planをインダクティブプログラムとして抽象化する。私たちのパイプラインは、一般化とモジュラーの再利用を促進し、継続的な概念学習を可能にします。提案手法は,大規模言語モデル(LLM)のコード生成能力と基底的ニューラルネットワーク表現の利点を組み合わせることで,LLMとニューラルオンリーのアプローチに関連する複雑な構造を構築するタスクにおいて,より強力な帰納的一般化を示すニューラルシンボリックプログラムを実現する。さらに、後続の具体化指導のための学習概念を用いた推論と計画能力を示す。 Our goal is to enable embodied agents to learn inductively generalizable spatial concepts, e.g., learning staircase as an inductive composition of towers of increasing height. Given a human demonstration, we seek a learning architecture that infers a succinct ${program}$ representation that explains the observed instance. Additionally, the approach should generalize inductively to novel structures of different sizes or complex structures expressed as a hierarchical composition of previously learned concepts. Existing approaches that use code generation capabilities of pre-trained large (visual) language models, as well as purely neural models, show poor generalization to a-priori unseen complex concepts. Our key insight is to factor inductive concept learning as (i) ${\it Sketch:}$ detecting and inferring a coarse signature of a new concept (ii) ${\it Plan:}$ performing MCTS search over grounded action sequences (iii) ${\it Generalize:}$ abstracting out grounded plans as inductive programs. Our pipeline facilitates generalization and modular reuse, enabling continual concept learning. Our approach combines the benefits of the code generation ability of large language models (LLM) along with grounded neural representations, resulting in neuro-symbolic programs that show stronger inductive generalization on the task of constructing complex structures in relation to LLM-only and neural-only approaches. Furthermore, we demonstrate reasoning and planning capabilities with learned concepts for embodied instruction following.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 検索クエリのセマンティックなドメイン内商品識別 Semantic In-Domain Product Identification for Search Queries ( http://arxiv.org/abs/2404.09091v2 ) ライセンス: Link先を確認	Sanat Sharma, Jayant Kumar, Twisha Naik, Zhaoyu Lu, Arvind Srikantan, Tracy Holloway King,	(参考訳) 検索クエリにおける正確な製品識別と暗黙の製品識別は、特にAdobeのような50以上の製品を持ち、数百のツールにまたがるクエリをカバーしている企業において、ユーザーエクスペリエンスの向上に不可欠である。本研究では,ユーザ行動データから製品分類器を学習するための新しい手法を提案する。私たちのセマンティックモデルでは、デプロイされた表面におけるCTRの相対的な改善(クリックスルーレート)が25%以上、ヌルレートが50%以上減少し、アプリカードが2倍増加し、製品の可視性が向上しました。 Accurate explicit and implicit product identification in search queries is critical for enhancing user experiences, especially at a company like Adobe which has over 50 products and covers queries across hundreds of tools. In this work, we present a novel approach to training a product classifier from user behavioral data. Our semantic model led to >25% relative improvement in CTR (click through rate) across the deployed surfaces; a >50% decrease in null rate; a 2x increase in the app cards surfaced, which helps drive product visibility.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 効率的な量子力学のための排他的あるいはエンコードされた代数構造 Exclusive-or encoded algebraic structure for efficient quantum dynamics ( http://arxiv.org/abs/2404.09312v2 ) ライセンス: Link先を確認	Lukas Broers, Ludwig Mathey,	(参考訳) 本稿では,多体2レベル量子系の代数構造を捉える形式的手法を提案する。この形式主義は、対応するリー代数の元の列挙指標のバイナリ表現に基づいている。その代数の任意の大きな要素の作用は、ビット単位の排他的操作に還元される。この形式主義は自然に多体密度作用素のスパース表現を生成し、そのサイズは動的トランケーション法によって制御される。我々は、この形式主義がリアルタイム進化、消散的リンドブラッド作用、想像的時間進化、および射影的測定プロセスにどのように適用されるかを実証する。量子力学計算のこのアプローチは、密度演算子の非零成分の数と線形に近似する。この排他的あるいは表現的量子代数をORQAと呼ぶ。概念実証として、最大22の2レベルシステムに対する最大独立集合問題に対する量子アニール過程をシミュレートすることで、この形式性の数値的な実証を行う。 We propose a formalism that captures the algebraic structure of many-body two-level quantum systems, and directly motivates an efficient numerical method. This formalism is based on the binary representation of the enumeration-indices of the elements of the corresponding Lie algebra. The action of arbitrarily large elements of that algebra reduces to a few bit-wise exclusive-or operations. This formalism naturally produces sparse representations of many-body density operators, the size of which we control through a dynamic truncation method. We demonstrate how this formalism applies to real-time evolution, dissipative Lindblad action, imaginary-time evolution, and projective measurement processes. We find that this approach to calculating quantum dynamics scales close to linearly with the number of non-zero components in the density operator. We refer to this exclusive-or represented quantum algebra as ORQA. As a proof of concept, we provide a numerical demonstration of this formalism by simulating quantum annealing processes for the maximum independent set problem for up to 22 two-level systems.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# AudioProtoPNet:鳥音分類のための解釈可能なディープラーニングモデル AudioProtoPNet: An interpretable deep learning model for bird sound classification ( http://arxiv.org/abs/2404.10420v2 ) ライセンス: Link先を確認	René Heinrich, Bernhard Sick, Christoph Scholz,	(参考訳) 近年、鳥類の多様性を監視するための深層学習モデルが提案されている。これらのモデルは音響信号を解析することにより高精度に鳥種を検出することができる。しかし、従来のディープラーニングアルゴリズムは、意思決定プロセスに関する洞察を提供するブラックボックスモデルである。鳥類学者のようなドメインの専門家にとって、これらのモデルは効率的であるだけでなく、補助ツールとして使われるために解釈可能であることが重要である。本研究では,そのモデルアーキテクチャによる固有解釈性を提供する音声分類に,Prototypeal Part Network (ProtoPNet) を適用した。本手法は,特徴抽出のためのConvNeXtバックボーンアーキテクチャに基づいて,訓練データのスペクトログラムを用いて各鳥類の原型パターンを学習する。新しいデータの分類は、これらのプロトタイプを潜在空間で比較することで行われ、同時にモデルの判断に対する理解しやすい説明を提供する。異なる地理的領域の鳥種を表す7つの異なるデータセットを用いて,本モデルの性能評価を行った。実験の結果, 平均AUROCは0.82, 平均cmAPは0.37となり, 鳥の音響分類における最先端のブラックボックスモデルに匹敵する結果を得た。そこで本研究は, 生物音響鳥類分類の困難な課題においても, 強力かつ解釈可能な深層学習モデルを開発して, ドメインの専門家に貴重な洞察を提供することを実証する。 Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaption of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 大規模言語モデルによる規範的要件の運用 Normative Requirements Operationalization with Large Language Models ( http://arxiv.org/abs/2404.12335v2 ) ライセンス: Link先を確認	Nick Feng, Lina Marsso, S. Getir Yaman, Isobel Standen, Yesugen Baatartogtokh, Reem Ayad, Victória Oldemburgo de Mello, Bev Townsend, Hanne Bartels, Ana Cavalcanti, Radu Calinescu, Marsha Chechik,	(参考訳) 規範的な非機能要件は、社会的、法的、倫理的、共感的、文化的規範の違反を避けるために、システムが観察しなければならない制約を規定する。これらの要件は一般的に、異なる専門知識や優先順位を持つ非技術者の利害関係者(倫理学者、弁護士、社会科学者など)によって定義されるため、その整合性と一貫性の確保は非常に困難である。近年の研究では、規則として規範的要件を規定するためにドメイン固有の言語を使用して、一貫性を形式的なメソッドで分析できるという課題に対処している。本稿では,システム機能の抽象表現間の意味的関係を抽出するために,大規模言語モデルを用いた補完的アプローチを提案する。これらの関係は、しばしば非技術的利害関係者(例えば、常識やドメイン知識に基づいて)によって暗黙的に仮定され、規範的要求の一貫性を引き出して分析するための自動推論技術を強化するために使用される。実世界のケーススタディを通じて,規範的要件の導出と運用へのアプローチの有効性を示す。 Normative non-functional requirements specify constraints that a system must observe in order to avoid violations of social, legal, ethical, empathetic, and cultural norms. As these requirements are typically defined by non-technical system stakeholders with different expertise and priorities (ethicists, lawyers, social scientists, etc.), ensuring their well-formedness and consistency is very challenging. Recent research has tackled this challenge using a domain-specific language to specify normative requirements as rules whose consistency can then be analysed with formal methods. In this paper, we propose a complementary approach that uses Large Language Models to extract semantic relationships between abstract representations of system capabilities. These relations, which are often assumed implicitly by non-technical stakeholders (e.g., based on common sense or domain knowledge), are then used to enrich the automated reasoning techniques for eliciting and analyzing the consistency of normative requirements. We show the effectiveness of our approach to normative requirements elicitation and operationalization through a range of real-world case studies.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# ドメイン固有の質問応答のための検索補助生成 Retrieval Augmented Generation for Domain-specific Question Answering ( http://arxiv.org/abs/2404.14760v2 ) ライセンス: Link先を確認	Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte,	(参考訳) 質問応答(QA)は,大規模言語モデルの高度開発において重要な応用となっている。質問応答のための一般的な訓練済みの大規模言語モデルは、金融、医療、教育、顧客サービスといった特定の分野の知識や用語を適切に理解するために訓練されていない。ドメイン固有の理解をより良くするために、私たちはAdobe製品のための社内質問回答システムを構築しました。本稿では,大規模問合せデータベースをコンパイルする新しいフレームワークを提案し,大規模言語モデルの検索対応微調整手法を開発した。我々は,レトリバーの微調整が最終世代に大きな改善をもたらすことを示す。我々の全体的なアプローチは、文脈的接地のための最新の検索情報を維持しながら、世代間の幻覚を減らす。 Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# GSM8K で >97% を達成する: 問題を深く理解することで LLM が数学語問題により良い解をもたらす Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems ( http://arxiv.org/abs/2404.14963v3 ) ライセンス: Link先を確認	Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao,	(参考訳) CoT(Chain-of-Thought)のプロンプトにより、さまざまな推論タスクにわたるLLM(Large Language Models)のパフォーマンスが向上した。しかし、CoTは複雑な数学用語の問題を扱うのに不足しており、通常、意味的誤解エラー、計算エラー、ステップミスエラーという3つの落とし穴に悩まされる。従来の研究では、計算エラーとステップミスエラーに対処するが、LLMの性能を制限する主要な要因である意味的誤解の誤りを無視する。そこで本研究では,LLMの数学的問題解決能力を改善するために,意味的誤りに対処するシンプルな解法であるDeeply Understanding the Problems (DUP)を提案する。提案手法の核心は, LLMが問題を深く理解し, より良い推論に使用する重要な問題解決情報を抽出することを奨励することである。 10種類の多変量推論ベンチマークによる大規模な実験により、我々のDUP法は、他の手法よりもずっと優れています。さらに奨励的に、DUPはGSM8Kベンチマークで新しいSOTA結果を達成し、精度は97.1%である。 Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing errors, but neglect the semantic misunderstanding errors, which is the major factor limiting the LLMs' performance. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms the other counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% under zero-shot setting.	翻訳日:2024-05-30 22:52:03 公開日:2024-05-29
# 医療産業における大規模言語モデル応用の評価に関する総合的研究 A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry ( http://arxiv.org/abs/2404.15777v4 ) ライセンス: Link先を確認	Yining Huang, Keke Tang, Meilian Chen, Boyuan Wang,	(参考訳) 2017年のTransformerアーキテクチャの開始以来、GPTやBERTのような大規模言語モデル(LLM)は大幅に進化し、言語理解と生成の高度な能力を持つ様々な産業に影響を与えた。これらのモデルは、医療分野を変革する可能性を示し、その効果的かつ倫理的な展開を保証するための特別な評価フレームワークの必要性を強調している。この包括的調査は、医療におけるLSMの広範な適用と必要な評価を概説し、医療の成果を高める上で、その能力を完全に活用するための実証的検証の重要性を強調した。本調査は,臨床環境,医療用テキストデータ処理,研究,教育,公衆衛生への意識といった分野におけるLCM応用の詳細な分析を行うために構成されている。まず,臨床診断,医用テキストデータ処理,情報検索,データ分析,教育コンテンツ生成などのタスクにおける評価結果に基づいて,様々な医療応用におけるLCMの役割を探求することから始める。その後のセクションでは、モデル、評価者、比較実験を含む、採用される評価方法とメトリクスについて包括的な議論がなされている。さらに,これらの評価に用いたベンチマークとデータセットについて検討し,質問応答,要約,情報抽出,バイオインフォマティクス,情報検索,総合ベンチマークなどのタスクのベンチマークを分類した記述を提供する。この構造は、医療領域におけるLSMの有効性、正確性、ユーザビリティ、倫理的整合性についてどのように評価されるか、徹底的に理解することを保証する。はぁ...。 Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in various medical applications, detailing their evaluation based on performance in tasks such as clinical diagnosis, medical text data processing, information retrieval, data analysis, and educational content generation. The subsequent sections offer a comprehensive discussion on the evaluation methods and metrics employed, including models, evaluators, and comparative experiments. We further examine the benchmarks and datasets utilized in these evaluations, providing a categorized description of benchmarks for tasks like question answering, summarization, information extraction, bioinformatics, information retrieval and general comprehensive benchmarks. This structure ensures a thorough understanding of how LLMs are assessed for their effectiveness, accuracy, usability, and ethical alignment in the medical domain. ...	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# REBEL:Regressing Relative Rewardsによる強化学習 REBEL: Reinforcement Learning via Regressing Relative Rewards ( http://arxiv.org/abs/2404.16767v2 ) ライセンス: Link先を確認	Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun,	(参考訳) 元々は連続的な制御問題のために開発されたが、PPO(Proximal Policy Optimization)は、生成モデルの微調整を含む様々な強化学習(RL)応用のワークホースとして登場した。残念ながら、PPOは安定収束を可能にするために複数のヒューリスティック(例えば、値ネットワーク、クリップ)を必要としており、これらのコンポーネントの正確な実装に敏感であることで有名である。これに対し、我々は後退して、生成モデルの時代における最小限のRLアルゴリズムがどのようなものになるのかを尋ねる。本稿では、ポリシー最適化の問題をきれいに軽減し、2つの完了間の相対報酬をプロンプトに回帰させ、極めて軽量な実装を可能にするアルゴリズムREBELを提案する。理論的には、自然ポリシーグラディエントのような基本的RLアルゴリズムはREBELの変種と見なせることが証明され、RLの文献における収束とサンプルの複雑さの観点から最も強力な理論的保証と一致させることができる。 REBELはまた、オフラインデータをきれいに組み込んで、実際によく見られる非推移的な好みを扱うように拡張することもできる。経験的に、REBELは言語モデリングと画像生成に統一的なアプローチを提供し、PPOやDPOに近い性能で、PPOよりも実装が簡単で、計算効率が良い。 Llama-3-8B-インストラクションを微調整すると、REBELはAlpacaEval 2.0、MT-Bench、Open LLM Leaderboardで高いパフォーマンスを達成した。 While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative reward between two completions to a prompt in terms of the policy, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and be extended to handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally efficient than PPO. When fine-tuning Llama-3-8B-Instruct, REBEL achieves strong performance in AlpacaEval 2.0, MT-Bench, and Open LLM Leaderboard.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# ブロックチェーンを使用したパブリッククラウドにおけるデータ共有の緩和 Mitigating Data Sharing in Public Cloud using Blockchain ( http://arxiv.org/abs/2404.16872v2 ) ライセンス: Link先を確認	Patil Pratik Vijaykumar, Prerna Tulsiani, Dr. Sunil Mane,	(参考訳) パブリック・クラウド・コンピューティングは、ビジネスの運営方法に変化をもたらしたため、現代のITインフラの基本的な部分となっている。しかし、クラウドセキュリティの懸念は、データ保護、共有、アクセス制御に関連する新たなリスクと課題をもたらす。ブロックチェーンとクラウドのシナジスティックな統合は、大きな可能性を秘めている。ブロックチェーンの分散台帳は、中央集権的な権威への依存を減らすため、透明性、不変性、効率性を保証する。これを受けて、当社のフレームワークは、データ権利、データ共有、データバリデーションといった重要な側面を備えた、クラウド内のセキュアなデータエコシステムを提案しています。また、このアプローチはデータマイグレーションの必要性をなくすことで、相互運用性とスケーラビリティを向上させることを目指している。これにより、既存のパブリッククラウドベースのシステムが、信頼性の強化とクラウドデータの非再検討を容易にブロックチェーンをデプロイできるようになる。 Public Cloud Computing has become a fundamental part of modern IT infrastructure as its adoption has transformed the way businesses operate. However, cloud security concerns introduce new risks and challenges related to data protection, sharing, and access control. A synergistic integration of blockchain with the cloud holds immense potential. Blockchain's distributed ledger ensures transparency, immutability, and efficiency as it reduces the reliance on centralized authorities. Motivated by this, our framework proposes a secure data ecosystem in the cloud with the key aspects being Data Rights, Data Sharing, and Data Validation. Also, this approach aims to increase its interoperability and scalability by eliminating the need for data migration. This will ensure that existing public cloud-based systems can easily deploy blockchain enhancing trustworthiness and non-repudiation of cloud data.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# 会員推論攻撃に対するセンターベース緩和学習 Center-Based Relaxed Learning Against Membership Inference Attacks ( http://arxiv.org/abs/2404.17674v2 ) ライセンス: Link先を確認	Xingli Fang, Jung-Eun Kim,	(参考訳) メンバーシップ推論攻撃(MIA)は現在、主要なプライバシ攻撃戦略の1つと考えられており、その防御機構も広く検討されている。しかしながら、既存の防御アプローチと、パフォーマンスとデプロイメントコストの理想的なモデルとの間にはまだギャップがあります。特に,モデルのプライバシ脆弱性は,モデルのデータ記憶能力と一般化能力のギャップと密接に相関していることがわかった。そこで本研究では,任意の分類モデルに適応し,最小限あるいは不要なモデル一般化性を犠牲にすることで,プライバシ保護を提供する,CRL(Central-based relaxed learning)と呼ばれるアーキテクチャに依存しない新たな学習パラダイムを提案する。我々はCRLがメンバーデータと非メンバーデータの一貫性をよりよく維持できることを強調する。標準分類データセットに関する広範な実験を通じて、モデルキャパシティやデータコストを必要とせずに、このアプローチが同等のパフォーマンスを示すことを実証的に示す。 Membership inference attacks (MIAs) are currently considered one of the main privacy attack strategies, and their defense mechanisms have also been extensively explored. However, there is still a gap between the existing defense approaches and ideal models in performance and deployment costs. In particular, we observed that the privacy vulnerability of the model is closely correlated with the gap between the model's data-memorizing ability and generalization ability. To address this, we propose a new architecture-agnostic training paradigm called center-based relaxed learning (CRL), which is adaptive to any classification model and provides privacy preservation by sacrificing a minimal or no loss of model generalizability. We emphasize that CRL can better maintain the model's consistency between member and non-member data. Through extensive experiments on standard classification datasets, we empirically show that this approach exhibits comparable performance without requiring additional model capacity or data costs.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# GRAMMAR:ドメイン特化検索拡張言語モデルの評価のための基礎的およびモジュール的手法 GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model ( http://arxiv.org/abs/2404.19232v4 ) ライセンス: Link先を確認	Xinzhe Li, Ming Liu, Shang Gao,	(参考訳) Retrieval-augmented Generation (RAG) システムは、ドメイン固有の知識ベースを問うために、様々な産業で活発に研究され、展開されている。しかし、これらのシステムを評価することは、ドメイン固有のクエリの不足とそれに対応する基礎的な真実、そして障害の原因を診断するための体系的なアプローチの欠如など、ユニークな課題を示す。これらの課題に対処するために、GRAMMAR(GRounded and Modular Methodology for Assessment of RAG)という2つの要素からなる評価フレームワークを導入する。 1)リレーショナルデータベースとLLMを活用して,スケーラブルな問合せ対を効率的に生成するデータ生成プロセス。この方法では、言語的バリエーションからクエリロジックを分離し、デバッグ機能を増強する。 2)知識ギャップと堅牢性を区別し,欠陥モジュールの識別を可能にする評価フレームワーク。我々の経験的結果は、モデル脆弱性を正確に識別するために、現在の基準フリー評価手法の限界とGRAMMARの信頼性を裏付けるものである。 Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs. This method facilitates the separation of query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# TRAMBA: 携帯・ウェアラブルプラットフォーム上での音声・骨伝導音声の高分解能・高機能化のためのハイブリッドトランスフォーマとマンバアーキテクチャ TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms ( http://arxiv.org/abs/2405.01242v3 ) ライセンス: Link先を確認	Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia,	(参考訳) 本稿では,モバイルおよびウェアラブルプラットフォームに適した音響・骨伝導音声強調のためのハイブリッドトランスフォーマーTRAMBAとMambaアーキテクチャを提案する。骨伝導音声強調は、モバイルおよびウェアラブルプラットフォームで採用されるには、いくつかの理由から非現実的である。 i) データ収集は労働集約的であり,その結果,不足する。 (II)数百MBのメモリフットプリントを持つ最先端モデルと資源制約システムに適した手法の間には,性能ギャップが存在する。 TRAMBAを振動に基づくセンシングに適応させるため、広範に利用できる音声音声データセットを用いてTRAMBAを事前訓練する。そして、少量の骨伝導データで微調整を行う。 TRAMBAは、PESQが最大7.3%、STOIが1.8%、メモリフットプリントが桁違いに小さく、推論速度が最大465倍である。我々はTRAMBAを実システムに統合し、TRAMBAを示す i)データサンプリングや送信を少なくすることで、ウェアラブルのバッテリ寿命を最大160%向上させる。 (ii) 雑音の多い環境下では, 放送音声よりも高品質な音声を生成する。 (iii)メモリフットプリントは20.0MB未満である。 We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# ロバスト平均化による正規化Q-ラーニング Regularized Q-learning through Robust Averaging ( http://arxiv.org/abs/2405.02201v2 ) ライセンス: Link先を確認	Peter Schmitt-Förster, Tobias Sutter,	(参考訳) 本稿では,既存のQラーニング手法の弱点を原則的に解決する,2RA Qラーニングと呼ばれる新しいQラーニング変種を提案する。そのような弱点の1つは、制御できない、しばしばパフォーマンスが低下する、基礎となる推定バイアスである。本稿では,最大予測値項に対する分布的に頑健な推定器を提案し,提案した推定バイアスのレベルを正確に制御する。分布的に堅牢な推定器は、提案アルゴリズムがWatkinsのQ-learningに匹敵する計算コストを持つようなクローズドフォームの解を認めている。表の場合, 2RA Q-learning は最適方針に収束し, その漸近平均二乗誤差を解析する。最後に,理論的な知見を裏付ける様々な設定の数値実験を行い,既存の手法よりも2RA Q-learningが優れていることを示す。 We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# ResNCT:CT尿中ネフロート相画像の深層学習モデル ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography ( http://arxiv.org/abs/2405.04629v2 ) ライセンス: Link先を確認	Syed Jamal Safdar Gardezi, Lucas Aronson, Peter Wawrzyn, Hongkun Yu, E. Jason Abel, Daniel D. Shapiro, Meghan G. Lubner, Joshua Warner, Giuseppe Toia, Lu Mao, Pallavi Tiwari, Andrew L. Wentland,	(参考訳) 目的:CT urography(CTU)検査における腎画像合成のためのトランスフォーマーに基づく深層学習モデルの開発と評価を行う。資料と方法: この振り返り研究は地方機関審査委員会によって承認された。深層学習モデル開発のための3相CT尿路撮影を行った119例(平均SD年齢:65ドル:12歳:75/44男性/女性)のデータセットを作成した。各患者の3段階はアフィン登録アルゴリズムで一致した。ネフロート相CT画像合成(ResNCT)のための残留トランスフォーマモデル(Residual transformer model)を開発した。合成画像は、ピーク信号対雑音比(PSNR)、構造類似度指数(SSIM)、正規化クロス相関係数(NCC)、平均絶対誤差(MAE)、ルート平均二乗誤差(RMSE)など、複数の性能指標を用いて評価した。結果: ResNCTモデルは非コントラストおよび尿路画像入力から合成腎画像を生成することに成功した。地上の真相のネフローグラフィー画像では、モデルによって合成された画像は、高いPSNR (27.8$\pm$ 2.7 dB)、SSIM (0.88$\pm$ 0.05)、NAC (0.98$\pm$ 0.02)、低いMAE (0.02$\pm$ 0.005)、RMSE (0.042$\pm$ 0.016) を達成した。結論: ResNCT モデルにより, 地上の真理画像と高い類似性を有するネフロート相CT画像が合成された。 ResNCT モデルでは,CTU 試験において33% の放射線線量減少による腎症相の獲得を除去する手段を提供する。 Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, that were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# LangCell: 細胞アイデンティティ理解のためのLanguage-Cell事前トレーニング LangCell: Language-Cell Pre-training for Cell Identity Understanding ( http://arxiv.org/abs/2405.06708v3 ) ライセンス: Link先を確認	Suyuan Zhao, Jiahuan Zhang, Yizhen Luo, Yushuai Wu, Zaiqing Nie,	(参考訳) 細胞識別は、細胞の種類、経路情報、疾患情報など、細胞の様々な意味的側面を包含しており、生物学者がその生物学的特性を理解するのに不可欠である。細胞型アノテートなどの転写学的データから細胞識別を理解することは、生体情報学において重要な課題となっている。これらのセマンティックな側面は人間の専門家によって決定されるため、単一セルとラベルペアによって提供される監視信号なしで、AIモデルが細胞アイデンティティ理解タスクを効果的に実行することは不可能である。このタスクに現在使用されているシングルセル事前訓練言語モデル(PLM)は、単一のモダリティ、トランスクリプトミクスデータのみに基づいて訓練され、セルアイデンティティの知識の理解が欠如している。結果として、望ましいセマンティックラベルでラベル付きデータを欠いている場合には、ダウンストリームタスクや苦労のために微調整される必要がある。この問題に対処するために,事前学習期間中に単一セルデータと自然言語の統一表現を構築し,セルアイデンティティに関連する洞察を直接組み込むという,革新的な手法を提案する。より具体的には、最初のLanguage-Cell事前トレーニングフレームワークであるLangCellを紹介します。 LangCellは、セルアイデンティティ情報に富んだテキストを利用して、クロスモーダルな知識の深い理解を得る。異なるベンチマークで実施された実験の結果、LangCellはゼロショットのセル識別理解シナリオで効果的に機能する唯一のシングルセルPLMであり、また、少数ショットと微調整のセル識別理解シナリオで既存のモデルよりも大幅に優れていることが示された。 Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, have become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# LLMで作ったブラックボックスの解説は、逆向きに役に立つ LLM-Generated Black-box Explanations Can Be Adversarially Helpful ( http://arxiv.org/abs/2405.06800v2 ) ライセンス: Link先を確認	Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, Zining Zhu,	(参考訳) 大規模言語モデル(LLM)は,デジタルアシスタントとして機能することで,複雑な問題の解決と理解を支援する重要なツールになりつつある。 LLMは、これらの問題の入力と出力のみを与えられた場合、すなわち `black-box'' アプローチで、説得力のある説明を生成することができる。しかし、我々の研究はこのアプローチに結びついている隠れたリスクを明らかにし、それを逆助力(adversarial helpness)と呼ぶ。 LLMの説明が間違った答えを正しく見せると、これは起こります。本稿では,この問題が人間だけでなく,LLM評価者にも影響を及ぼすことを示す。より深く掘り下げて、LLMが採用する主要な説得戦略を特定し、検証する。以上の結果から,これらのモデルでは,質問の再フレーミング,信頼度の向上,ミスリードした回答を信頼できる光で表現するためのチェリーピッキングエビデンスなどの戦略が採用されていることが明らかとなった。 LLMが逆向きに有用な説明を生成する際に複雑な構造的知識をナビゲートできるかどうかを調べるため、グラフをナビゲートして特別なタスクを作成する。ほとんどのLSMは、単純なグラフに沿った代替経路を見つけることができず、それらの誤解を招く説明は複雑な知識を用いた論理的推論によってのみ生成されるものではないことを示唆している。これらの知見は,ブラックボックスの説明設定の限界に光を当て,LLMの安全利用に関するアドバイスを提供する。 Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these problems, i.e., in a ``black-box'' approach. However, our research uncovers a hidden risk tied to this approach, which we call adversarial helpfulness. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions. In this paper, we show that this issue affects not just humans, but also LLM evaluators. Digging deeper, we identify and examine key persuasive strategies employed by LLMs. Our findings reveal that these models employ strategies such as reframing the questions, expressing an elevated level of confidence, and cherry-picking evidence to paint misleading answers in a credible light. To examine if LLMs are able to navigate complex-structured knowledge when generating adversarially helpful explanations, we create a special task based on navigating through graphs. Most LLMs are not able to find alternative paths along simple graphs, indicating that their misleading explanations aren't produced by only logical deductions using complex knowledge. These findings shed light on the limitations of the black-box explanation setting and allow us to provide advice on the safe usage of LLMs.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# 緩やかな非定常過程からの因果推論 Causal Inference from Slowly Varying Nonstationary Processes ( http://arxiv.org/abs/2405.06902v2 ) ライセンス: Link先を確認	Kang Du, Yu Xiang,	(参考訳) 制限構造因果モデル(SCM)フレームワークによる観測データからの因果推論は、非ガウス性や非線形性などのデータ生成機構による原因と効果の非対称性に大きく依存する。この手法は定常時系列に適応できるが、非定常時系列から因果関係を推定することは難しい課題である。本研究では,時間変化フィルタと定常雑音による制約付きSCMを新たに提案し,非定常性から非定常性への非対称性を利用して,二変量およびネットワーク設定の因果同定を行う。本稿では,2変量進化スペクトルの強力な推定値を利用して,ゆっくりと変化するプロセスに効率的な手順を提案する。提案手法の有効性を示すために,高次および非滑らかなフィルタを含む各種合成および実データセットの評価を行った。 Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# PeRFlow:Universal Plug-and-Play AcceleratorとしてのPiecewise Rectified Flow PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator ( http://arxiv.org/abs/2405.07510v3 ) ライセンス: Link先を確認	Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng,	(参考訳) 拡散モデルを高速化するフローベース手法であるPecewise Rectified Flow(PeRFlow)を提案する。 PeRFlowは、生成フローのサンプリングプロセスを複数の時間ウィンドウに分割し、リフロー操作を通じて各間隔の軌跡を直線化し、断片的な線形フローに近づく。 PeRFlowは数ステップの世代で優れたパフォーマンスを達成する。さらに、専用のパラメータ化を通じて、PeRFlowモデルは事前訓練された拡散モデルから知識を継承する。このように、トレーニングは高速に収束し、得られたモデルは、事前訓練された拡散モデルに基づいて様々なワークフローと互換性のある普遍的なプラグアンドプレイアクセラレータとして機能する、有利な転送能力を示す。トレーニングと推論のためのコードも公開されている。 https://github.com/magic-research/piecewise-rectified-flow We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, the training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. Codes for training and inference are publicly released. https://github.com/magic-research/piecewise-rectified-flow	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# AnoVox: 自動運転におけるマルチモーダル異常検出ベンチマーク AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving ( http://arxiv.org/abs/2405.07865v2 ) ライセンス: Link先を確認	Daniel Bogdoll, Iramm Hamdard, Lukas Namgyu Rößler, Felix Geisler, Muhammed Bayram, Felix Wang, Jan Imhof, Miguel de Campos, Anushervon Tabarov, Yitian Yang, Hanno Gottschalk, J. Marius Zöllner,	(参考訳) 自動運転車のスケールアップは、道路上のまれな物体のような異常に対処する能力に大きく依存している。このような状況に対処するためには、そもそも異常を検出する必要がある。自動走行の異常検出はここ数年で大きな進歩を遂げてきたが、カメラデータに強く焦点を絞った設計の悪いベンチマークに悩まされている。本研究では,自動運転におけるANOmaly検出のための最大のベンチマークであるAnoVoxを提案する。 AnoVoxは、大規模なマルチモーダルセンサーデータと空間的VOXel地上真実を組み込んでおり、使用済みセンサとは無関係な方法の比較を可能にしている。正規性の形式的定義を提案し,従順なトレーニングデータセットを提供する。 AnoVoxは、コンテンツと時間的異常の両方を含む最初のベンチマークである。 The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.	翻訳日:2024-05-30 22:42:17 公開日:2024-05-29
# オープンソース生成AIのリスクと機会 Risks and Opportunities of Open-Source Generative AI ( http://arxiv.org/abs/2405.08597v3 ) ライセンス: Link先を確認	Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster,	(参考訳) Generative AI(Gen AI)の応用は、科学や医学、教育など、さまざまな分野に革命をもたらすことが期待されている。こうした地震的な変化の可能性は、この技術の潜在的なリスクについて活発に議論を巻き起こし、特にAI開発をリードする大手テック企業からの厳しい規制を要求した。この規制は、オープンソースの生成AIの誕生する分野を危険にさらす可能性がある。 Gen AI開発のための3段階のフレームワーク(近、中、長期)を使用して、現在利用可能なもの(中、中)と、より大きな機能(長期)を備えたオープンソース生成AIモデルのリスクと機会を分析します。全体として、オープンソースのGen AIの利点は、そのリスクを上回っている、と私たちは主張する。そのため、我々は、モデル、トレーニング、評価データのオープンソース化を奨励し、オープンソースの生成AIに関連するリスクを管理するための一連の推奨とベストプラクティスを提供します。 Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 特徴融合ネットワークを用いた人・機械用スケーラブル画像符号化 Scalable Image Coding for Humans and Machines Using Feature Fusion Network ( http://arxiv.org/abs/2405.09152v3 ) ライセンス: Link先を確認	Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe,	(参考訳) 画像認識モデルがより普及するにつれて、機械や人間のスケーラブルなコーディング方法がより重要になる。画像認識モデルの応用例としては、交通監視と農業管理がある。これらのユースケースでは、スケーラブルな符号化手法が有効であることが証明される。人間や機械の既存の画像圧縮手法は、これらの要件をある程度満たしている。しかし,これらの圧縮法は特定の画像認識モデルにのみ有効である。本稿では,多数の画像認識モデルと互換性のある人や機械を対象とした,学習に基づくスケーラブルな画像符号化手法を提案する。我々は,機械用画像圧縮モデルと圧縮モデルを組み合わせて,人間の画像復号を容易にするための追加情報を提供する。これらの圧縮モデルの特徴は、効率的な画像圧縮を実現するために、特徴融合ネットワークを用いて融合される。本手法では,特徴融合ネットワークにおいて,異なるサイズの特徴の組み合わせを可能とし,パラメータ数を削減するために,付加的な情報圧縮モデルを調整する。提案手法では,パラメータ数を削減しつつ,画像圧縮モデルを効率よく組み合わせることを確認する。さらに、デコードされた画像の品質とビットレートの観点から画像圧縮性能を評価することにより、提案手法の有効性を実証する。 As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# スピン対称性と熱的利用密度汎関数理論 Spin Symmetry in Thermally-Assisted-Occupation Density Functional Theory ( http://arxiv.org/abs/2405.09187v2 ) ライセンス: Link先を確認	Yu-Yang Wang, Jeng-Da Chai,	(参考訳) マルチ参照(MR)特性を持つ電子系では、従来の交換相関(xc)エネルギー汎関数を持つコーン・シャム密度汎関数論(KS-DFT)は誤ったスピン密度と関連する性質をもたらす。例えば、H2 解離の場合、KS-DFT で機能する同じ xc エネルギーで得られるスピン制限およびスピン非制限の解は、スピン非制限の解における非物理的スピン対称性の破れ効果を、はっきりと異なるものにすることができる。近年, 熱共役密度汎関数理論 (TAO-DFT) は, 実測温度を適切に選択した場合に, 上記のスピン対称性の破れを解消することが示されている。本研究では, TAO-DFTに基づく応答理論を開発し, 十分に高温のTAO-DFTがMR系の非物理的スピン対称性の破れを解消できることを示した。さらに, H2, N2, He2, Ne2の解離, およびねじれたエチレンの解離に対して, 種々の架空の温度のTAO-DFT計算を行った。 For electronic systems with multi-reference (MR) character, Kohn-Sham density functional theory (KS-DFT) with the conventional exchange-correlation (xc) energy functionals can lead to incorrect spin densities and related properties. For example, for H2 dissociation, the spin-restricted and spin-unrestricted solutions obtained with the same xc energy functional in KS-DFT can be distinctly different, yielding the unphysical spin-symmetry breaking effects in the spin-unrestricted solutions. Recently, thermally-assisted-occupation density functional theory (TAO-DFT) has been shown to resolve the aforementioned spin-symmetry breaking, when the fictitious temperature is properly chosen. In this work, a response theory based on TAO-DFT is developed to demonstrate that TAO-DFT with a sufficiently large fictitious temperature can always resolve the unphysical spin-symmetry breaking in MR systems. To further support this, TAO-DFT calculations with various fictitious temperatures are performed for the dissociation of H2, N2, He2, and Ne2 as well as the twisted ethylene.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# ホップ代数からの一般化クラスター状態:非可逆対称性とホップテンソルネットワーク表現 Generalized cluster states from Hopf algebras: non-invertible symmetry and Hopf tensor network representation ( http://arxiv.org/abs/2405.09277v2 ) ライセンス: Link先を確認	Zhian Jia,	(参考訳) クラスタ状態は、測定ベースの量子計算(MBQC)にとって重要なリソースである。対称性保護トポロジカル秩序(SPT)を示すため、トポロジカルフェーズの研究にも重要な役割を果たしている。ホップ代数に基づくクラスター状態の構成について述べる。有限群値quditをホップ代数値quditに一般化し、ホップ代数の正則作用に基づく一般化されたパウリ-X作用素を導入し、ホップ代数上の既約表現作用に基づく一般化されたパウリ-Z作用素を導入することにより、ホップ量子の包括的理論を開発する。ホップ四重項に対して非可逆対称性が自然に現れることを示す。その後、クラスタグラフと呼ばれる二部グラフに対して、同一性状態と自明な表現状態はそれぞれ偶数頂点と奇数頂点に割り当てる。エッジアンタングルを制御された正規動作として導入し、ホップクラスター状態の一般的な構成を提供する。エッジエンタングルの可換性を確保するために,任意の三角形多様体に対してクラスタ格子を構築する手法を提案する。構築を説明する例として,1dクラスタ状態の例を例に挙げる。これはSPT相の有望な候補として機能するので、このシナリオのためにギャップ付きハミルトニアンを構築し、その非可逆対称性に関する詳細な議論を掘り下げる。また,1dクラスタ状態モデルが準1dホップ量子二重モデルと等価であることを示す。また、構造定数のテンソル表現とホップ代数の弦図形を統合することでホップクラスター状態のホップテンソルネットワーク表現を導入する。 Cluster states are crucial resources for measurement-based quantum computation (MBQC). It exhibits symmetry-protected topological (SPT) order, thus also playing a crucial role in studying topological phases. We present the construction of cluster states based on Hopf algebras. By generalizing the finite group valued qudit to a Hopf algebra valued qudit and introducing the generalized Pauli-X operator based on the regular action of the Hopf algebra, as well as the generalized Pauli-Z operator based on the irreducible representation action on the Hopf algebra, we develop a comprehensive theory of Hopf qudits. We demonstrate that non-invertible symmetry naturally emerges for Hopf qudits. Subsequently, for a bipartite graph termed the cluster graph, we assign the identity state and trivial representation state to even and odd vertices, respectively. Introducing the edge entangler as controlled regular action, we provide a general construction of Hopf cluster states. To ensure the commutativity of the edge entangler, we propose a method to construct a cluster lattice for any triangulable manifold. We use the 1d cluster state as an example to illustrate our construction. As this serves as a promising candidate for SPT phases, we construct the gapped Hamiltonian for this scenario and delve into a detailed discussion of its non-invertible symmetries. We also show that the 1d cluster state model is equivalent to the quasi-1d Hopf quantum double model. We also introduce the Hopf tensor network representation of Hopf cluster states by integrating the tensor representation of structure constants with the string diagrams of the Hopf algebra.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 短距離光学系における機械学習:包括的調査 Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey ( http://arxiv.org/abs/2405.09557v2 ) ライセンス: Link先を確認	Chen Shao, Elias Giacoumidis, Syed Moktacim Billah, Shi Li, Jialei Li, Prashasti Sahu, Andre Richter, Tobias Kaefer, Michael Faerber,	(参考訳) 近年,様々な直接検出・自己整合型短距離通信アプリケーションにおける機械学習アルゴリズムの利用について,広範な研究が進められている。これらのアプリケーションには、帯域幅要求予測、信号品質監視、障害検出、トラフィック予測、デジタル信号処理(DSP)に基づく等化など、幅広いタスクが含まれている。汎用的なアプローチとして、機械学習は、決定論的手法が不足する可能性のある光学系ネットワークにおける確率現象に対処する能力を示す。しかし、DSP等化アルゴリズムの場合、その性能改善はしばしば限界であり、特にパッシブ光ネットワーク(PON)のようなコストに敏感な短距離通信シナリオでは、その複雑さは著しく高い。時間的依存を捕捉し、不規則パターンや非線形パターンを効果的に処理し、変動時間間隔を調節する。本稿では,短距離通信における機械学習技術の応用について概説する。特に、機械学習信号処理に使用される時系列手法の新たな分類法を導入し、構造化された分類フレームワークを提供する。我々の分類学は、現在の時系列法を、伝統的な方法、フーリエ畳み込みに基づく方法、トランスフォーマーに基づくモデル、時系列畳み込みネットワークの4つのグループに分類する。最後に、この急速に発展する分野における今後の研究の方向性を強調し、ハードウェア実装に関連する複雑さを軽減するための具体的な解決策を概説する。我々は,複雑性問題に対処して,短時間の光通信システムにおいて,より実用的で効率的な機械学習アプローチの展開の道を開くことを目的としている。 In recent years, extensive research has been conducted to explore the utilization of machine learning algorithms in various direct-detected and self-coherent short-reach communication applications. These applications encompass a wide range of tasks, including bandwidth request prediction, signal quality monitoring, fault detection, traffic prediction, and digital signal processing (DSP)-based equalization. As a versatile approach, machine learning demonstrates the ability to address stochastic phenomena in optical systems networks where deterministic methods may fall short. However, when it comes to DSP equalization algorithms, their performance improvements are often marginal, and their complexity is prohibitively high, especially in cost-sensitive short-reach communications scenarios such as passive optical networks (PONs). They excel in capturing temporal dependencies, handling irregular or nonlinear patterns effectively, and accommodating variable time intervals. Within this extensive survey, we outline the application of machine learning techniques in short-reach communications, specifically emphasizing their utilization in high-bandwidth demanding PONs. Notably, we introduce a novel taxonomy for time-series methods employed in machine learning signal processing, providing a structured classification framework. Our taxonomy categorizes current time series methods into four distinct groups: traditional methods, Fourier convolution-based methods, transformer-based models, and time-series convolutional networks. Finally, we highlight prospective research directions within this rapidly evolving field and outline specific solutions to mitigate the complexity associated with hardware implementations. We aim to pave the way for more practical and efficient deployment of machine learning approaches in short-reach optical communication systems by addressing complexity concerns.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# ガウス状態に対するGKLSベクトル場ダイナミクス GKLS Vector Field Dynamics for Gaussian States ( http://arxiv.org/abs/2405.10282v2 ) ライセンス: Link先を確認	Hans Cruz-Prado, Octavio Castaños, Giuseppe Marmo, Francisco Nettel,	(参考訳) ガウス状態によって記述された系に対するGKLS生成器に付随するベクトル場を構築する。このベクトル場は作用素の代数の双対空間上で定義され、位置と運動量の2次作用素に制限される。 GKLS動力学は分解原理、すなわち、このベクトル場を3つの部分、保守的ハミルトン成分、勾配的成分、チェ・クラウスベクトル場に分解できることを示した。最後の2つの用語は、散逸に関連する「摂動」と見なされている。散逸項の異なる調和振動子に対する例を示す。 We construct the vector field associated with the GKLS generator for systems described by Gaussian states. This vector field is defined on the dual space of the algebra of operators, restricted to operators quadratic in position and momentum. It is shown that the GKLS dynamics accepts a decomposition principle, that is, this vector field can be decomposed in three parts, a conservative Hamiltonian component, a gradient-like and a Choi-Krauss vector field. The last two terms are considered a "perturbation" associated with dissipation. Examples are presented for a harmonic oscillator with different dissipation terms.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 下次収束は弱凸関数に部分微分収束をもたらす:一様速度保証を伴う Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees ( http://arxiv.org/abs/2405.10289v3 ) ライセンス: Link先を確認	Feng Ruan,	(参考訳) 非平滑で非凸確率最適化では、集団リスクにアプローチする際のサンプル平均推定値の定常点を解析するために、部分微分写像の均一収束を理解することが重要である。しかし、この収束を特徴づけることは依然として根本的な課題である。この研究は、経験的リスクが集団リスクに収束するにつれて、部分微分写像の均一収束と下次写像の均一収束を結びつけることによって、新しい視点を導入する。確率的弱凸対象に対しては、任意の開集合において、級数(対応する部分微分集合から任意に選択される)の収束に関する一様有界は、ハウスドルフ計量によって測られる部分微分集合自体の収束に関する一様有界となることを証明している。この手法を用いて,確率凸合成対象の偏微分集合に対する一様収束率を導出する。我々の結果は、Hausdorff計量において、集団と有限サンプル部分微分が連続である必要があるが、それでも厳密な収束速度を提供する、文学における主要な分布仮定に頼らない。これらの保証は、有限サンプル内のそのような目的の非滑らかな風景に対する新たな洞察をもたらす。 In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk as they approach the population risk. Yet, characterizing this convergence remains a fundamental challenge. This work introduces a novel perspective by connecting the uniform convergence of subdifferential mappings to that of subgradient mappings as empirical risk converges to the population risk. We prove that, for stochastic weakly-convex objectives, and within any open set, a uniform bound on the convergence of subgradients -- chosen arbitrarily from the corresponding subdifferential sets -- translates to a uniform bound on the convergence of the subdifferential sets itself, measured by the Hausdorff metric. Using this technique, we derive uniform convergence rates for subdifferential sets of stochastic convex-composite objectives. Our results do not rely on key distributional assumptions in the literature, which require the population and finite sample subdifferentials to be continuous in the Hausdorff metric, yet still provide tight convergence rates. These guarantees lead to new insights into the nonsmooth landscapes of such objectives within finite samples.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 放射線診断の自動化 : 最近の進歩を振り返って Automated Radiology Report Generation: A Review of Recent Advances ( http://arxiv.org/abs/2405.10842v2 ) ライセンス: Link先を確認	Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi,	(参考訳) 医療画像部門の需要が高まる中、放射線技師がタイムリーで正確なレポートを配信する能力に負担がかかっている。人工知能の最近の技術進歩は、自動放射線学レポート生成(ARRG)に大きな可能性を示し、研究の爆発を引き起こした。本稿では,現代ARRG手法の方法論的考察を行う。 (i)可用性、サイズ、採用率などの特性に基づくデータセットの評価。二コントラスト学習、強化学習等の深層学習訓練方法を検討すること。 3) CNNとトランスフォーマーモデルのバリエーションを含む最先端のモデルアーキテクチャを探求すること。四マルチモーダル入力及び知識グラフによる臨床知識の統合に関するアウトライン技術及び (v) 一般的に適用されるNLP測定値や質的臨床評価を含む, 現行モデル評価手法の精査を行った。さらに、レビューされたモデルの定量的結果を分析し、トップパフォーマンスモデルを調べ、さらなる洞察を求める。最後に、潜在的な新しい方向が強調され、他の放射線学的モダリティから追加のデータセットが採用され、将来の発展の重要な領域として予測される評価方法が改善された。 Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# CC-GPX:Common Crawlによる高品質アノテート地理空間データの抽出 CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl ( http://arxiv.org/abs/2405.11039v2 ) ライセンス: Link先を確認	Ilya Ilyankou, Meihui Wang, James Haworth, Stefano Cavazzi,	(参考訳) Common Crawl (CC) コーパスは2008年以来9.5ペタバイト以上のデータを含む最大のオープンウェブクローリングデータセットである。データセットは、大規模な言語モデルのトレーニングに役立ち、(望ましくない)コンテンツのために研究され、より小さなドメイン固有のデータセットのために蒸留されている。しかし、我々の知る限りでは、注釈付き地理空間データの源としてCCを用いる研究は行われていない。本稿では,CC で発見された GPX ファイルから注釈付きユーザ生成トラックを抽出する効率的なパイプラインと,最新の CC リリース6 から,人文記述と MultiLineString ベクトルデータのペア化によるマルチモーダルデータセットを提案する。このデータセットは、人々のアウトドアアクティビティパターン、アウトドアエクスペリエンスについて話す方法、軌跡生成やアノテーションモデルの開発に使用することができる。再現可能なコードはGitHubで入手可能です。 The Common Crawl (CC) corpus is the largest open web crawl dataset containing 9.5+ petabytes of data captured since 2008. The dataset is instrumental in training large language models, and as such it has been studied for (un)desirable content, and distilled for smaller, domain-specific datasets. However, to our knowledge, no research has been dedicated to using CC as a source of annotated geospatial data. In this paper, we introduce an efficient pipeline to extract annotated user-generated tracks from GPX files found in CC, and the resulting multimodal dataset with 1,416 pairings of human-written descriptions and MultiLineString vector data from the 6 most recent CC releases. The dataset can be used to study people's outdoor activity patterns, the way people talk about their outdoor experiences, and for developing trajectory generation or track annotation models. Our reproducible code is available on GitHub: https://github.com/ilyankou/cc-gpx	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 効率的なRow-wise Attentionを用いた高分解能マルチビュー拡散 Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention ( http://arxiv.org/abs/2405.11616v2 ) ライセンス: Link先を確認	Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo,	(参考訳) 本稿では,単一視点画像から高解像度のマルチビュー画像を生成する新しい多視点拡散法であるEra3Dを紹介する。マルチビュー生成の大幅な進歩にもかかわらず、既存の手法はカメラ前のミスマッチ、非効率性、解像度の低さに悩まされ、結果として画質の悪いマルチビュー画像となる。具体的には、入力画像は予め定義されたカメラタイプ、例えば焦点距離が一定である視点カメラに従わなければならないと仮定し、仮定が失敗すると歪んだ形状になる。さらに、それらが採用するフルイメージや高密度なマルチビューの注目は、画像解像度が増大するにつれて、計算複雑性の爆発的な爆発を引き起こす。仮定と現実のギャップを埋めるために、Era3Dはまず拡散型カメラ予測モジュールを提案し、入力画像の焦点長と高さを推定し、形状歪みのない画像を生成する。さらに,多視点拡散の先駆的先行を強制するために,行ワイドアテンションと呼ばれるシンプルだが効率的なアテンション層が用いられ,効率的なクロスビュー情報融合が実現されている。その結果、最先端の手法と比較して、Era3Dは最大512512解像度の高品質なマルチビュー画像を生成し、計算複雑性を12倍に削減した。総合的な実験により、Era3Dは様々な単一ビューの入力画像から高品質で詳細な3Dメッシュを再構築でき、ベースラインのマルチビュー拡散法よりも大幅に優れていることが示された。プロジェクトページ: https://penghtyx.github.io/Era3D/。 In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should comply with a predefined camera type, e.g. a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails. Moreover, the full-image or dense multiview attention they employ leads to an exponential explosion of computational complexity as image resolution increases, resulting in prohibitively expensive training costs. To bridge the gap between assumption and reality, Era3D first proposes a diffusion-based camera prediction module to estimate the focal length and elevation of the input image, which allows our method to generate images without shape distortions. Furthermore, a simple but efficient attention layer, named row-wise attention, is used to enforce epipolar priors in the multiview diffusion, facilitating efficient cross-view information fusion. Consequently, compared with state-of-the-art methods, Era3D generates high-quality multiview images with up to a 512512 resolution while reducing computation complexity by 12x times. Comprehensive experiments demonstrate that Era3D can reconstruct high-quality and detailed 3D meshes from diverse single-view input images, significantly outperforming baseline multiview diffusion methods. Project page: https://penghtyx.github.io/Era3D/.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# Track Anything Rapter (TAR) Track Anything Rapter(TAR) ( http://arxiv.org/abs/2405.11655v2 ) ライセンス: Link先を確認	Tharun V. Puthanveettil, Fnu Obaid ur Rahman,	(参考訳) 物体追跡はコンピュータビジョンにおける基本的なタスクであり、交通監視、ロボット工学、自律走行車追跡など、様々な領域にまたがる幅広い実用的応用がある。本研究の目的は,テキスト,画像,クリックなどのユーザが提供するマルチモーダルクエリに基づいて,関心のあるオブジェクトを検出し,セグメンテーションし,追跡することを目的とした,TAR(Track Anything Rapter)と呼ばれる高度な航空車両システムを開発することである。 TARは、DINO、CLIP、SAMといった最先端の事前訓練モデルを使用して、クエリされたオブジェクトの相対的なポーズを推定する。トラッキング問題はVisual Servoingタスクとしてアプローチされており、UAVは高度なモーションプランニングと制御アルゴリズムを通じてオブジェクトに一貫してフォーカスすることができる。我々は、これらの基礎モデルとカスタムの高レベル制御アルゴリズムの統合によって、カスタムビルドされたPX4 Autopilot対応のVoxl2 M500ドローンに、高度に安定して正確なトラッキングシステムを構築する方法を紹介する。追従アルゴリズムの性能を検証するために,Vicon ベースの基底真理と比較した。さらに,オクルージョンを含むシナリオにおける追跡支援における基礎モデルの信頼性を評価する。最後に、クリック、バウンディングボックス、イメージテンプレートなど、複数のモードでシームレスに機能するモデルの能力をテストし、検証する。 Object tracking is a fundamental task in computer vision with broad practical applications across various domains, including traffic monitoring, robotics, and autonomous vehicle tracking. In this project, we aim to develop a sophisticated aerial vehicle system known as Track Anything Rapter (TAR), designed to detect, segment, and track objects of interest based on user-provided multimodal queries, such as text, images, and clicks. TAR utilizes cutting-edge pre-trained models like DINO, CLIP, and SAM to estimate the relative pose of the queried object. The tracking problem is approached as a Visual Servoing task, enabling the UAV to consistently focus on the object through advanced motion planning and control algorithms. We showcase how the integration of these foundational models with a custom high-level control algorithm results in a highly stable and precise tracking system deployed on a custom-built PX4 Autopilot-enabled Voxl2 M500 drone. To validate the tracking algorithm's performance, we compare it against Vicon-based ground truth. Additionally, we evaluate the reliability of the foundational models in aiding tracking in scenarios involving occlusions. Finally, we test and validate the model's ability to work seamlessly with multiple modalities, such as click, bounding box, and image templates.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 3峡谷貯水池における地すべり感受性マッピングのための統計的・機械学習・深層学習モデルの解釈可能性 Interpretability of Statistical, Machine Learning, and Deep Learning Models for Landslide Susceptibility Mapping in Three Gorges Reservoir Area ( http://arxiv.org/abs/2405.11762v2 ) ライセンス: Link先を確認	Cheng Chen, Lei Fan,	(参考訳) 地すべり感受性マッピング(LSM)は,高リスク領域の特定と予防戦略の実施に不可欠である。本研究では,地すべりの感受性予測における統計的,機械学習(ML),深層学習(DL)モデルの解釈可能性について検討した。これは、地すべりに統計的に関係のある19の要因の包括的セットと、地すべりを誘発する直接に関連する9の要因の専用のセットの2種類の入力因子を組み込むことによって達成される。モデル性能がLSMの重要な指標であることを考えると、解釈可能性に関する調査は、考慮されたモデル間でのLSMの精度の評価と比較を自然に行ないます。本研究では、畳み込みニューラルネットワークモデルが最も精度が高く(19因子0.8447、0.8048、9因子 0.8048)、一方Extreme Gradient Boosting and Support Vector Machineは、従来の統計モデルよりも優れた予測能力を示した。これらの結果から,DLアルゴリズムと高度MLアルゴリズムは,入力要因と地すべりの発生との複雑な関係を効果的に捉えることができることがわかった。しかし、予測の解釈性は、特に19の要因のより広いセットを使用する場合、様々なモデルで異なっていた。 SHAP、LIME、DeepLIFTといった説明法も解釈結果のバリエーションをもたらしている。 19因子からなる包括的集合を用いることで予測精度は向上したが、モデル解釈における複雑さと矛盾が導入された。予測力は犠牲になったが、様々なモデルにまたがるより一貫した重要な要因によって証明され、フィールド調査レポートの調査結果と一致していたように、9つの要因の専用セットに焦点をあてることで解釈可能性を高めた。 Landslide susceptibility mapping (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set of 19 contributing factors that are statistically relevant to landslides, as well as a dedicated set of 9 triggering factors directly associated with triggering landslides. Given that model performance is a crucial metric in LSM, our investigations into interpretability naturally involve assessing and comparing LSM accuracy across different models considered. In our investigation, the convolutional neural network model achieved the highest accuracy (0.8447 with 19 factors; 0.8048 with 9 factors), while Extreme Gradient Boosting and Support Vector Machine also demonstrated strong predictive capabilities, outperforming conventional statistical models. These findings indicate that DL and sophisticated ML algorithms can effectively capture the complex relationships between input factors and landslide occurrence. However, the interpretability of predictions varied among different models, particularly when using the broader set of 19 contributing factors. Explanation methods like SHAP, LIME, and DeepLIFT also led to variations in interpretation results. Using a comprehensive set of 19 contributing factors improved prediction accuracy but introduced complexities and inconsistency in model interpretations. Focusing on a dedicated set of 9 triggering factors sacrificed some predictive power but enhanced interpretability, as evidenced by more consistent key factors identified across various models and alignment with the findings of field investigation reports....	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# データアノテーションの効率的・統計的品質推定法について On Efficient and Statistical Quality Estimation for Data Annotation ( http://arxiv.org/abs/2405.11919v2 ) ライセンス: Link先を確認	Jan-Christoph Klie, Juan Haladjian, Marc Kirchner, Rahul Nair,	(参考訳) アノテーション付きデータセットは、教師付き機械学習モデルをトレーニング、評価、比較、生産化するための重要な要素である。したがって、アノテーションが高品質であることは必須である。彼らの創造のためには、優れた品質管理とそれによる信頼性の高い品質見積が必要である。そして、アノテーション処理中に品質が不十分な場合には、修正措置を講じて改善することができる。品質評価は、専門家が手動でインスタンスを正しくも正しくもラベル付けすることで行われることが多い。しかし、アノテーション付きのインスタンスをチェックするのはコストがかかる傾向にある。したがって、実際には、通常はサブセットのみを検査するが、大部分は正当化や統計的なパワーを考慮せずに選択され、多くの場合は比較的小さい。しかし、小さなサンプルサイズに基づく推定は、誤り率の不正確な値につながる可能性がある。不要な大規模なサンプルサイズの使用には、例えばアノテーションの追加など、もっと多くの費用がかかる可能性がある。そこで我々はまず,アノテーションの誤り率を推定するのに必要となる最小限のサンプルサイズを見つけるために,信頼区間の使い方を詳細に記述する。次に, 誤り率推定の代替として, 受入サンプリングを適用することで, 同じ統計的保証を提供しながら, 必要なサンプルサイズを最大50%削減できることを示す。 Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. Quality estimation is often performed by having experts manually label instances as correct or incorrect. But checking all annotated instances tends to be expensive. Therefore, in practice, usually only subsets are inspected; sizes are chosen mostly without justification or regard to statistical power and more often than not, are relatively small. Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate. Using unnecessarily large sample sizes costs money that could be better spent, for instance on more annotations. Therefore, we first describe in detail how to use confidence intervals for finding the minimal sample size needed to estimate the annotation error rate. Then, we propose applying acceptance sampling as an alternative to error rate estimation We show that acceptance sampling can reduce the required sample sizes up to 50% while providing the same statistical guarantees.	翻訳日:2024-05-30 22:32:31 公開日:2024-05-29
# 単一画像の学習:マルチモーダル大言語モデルにおける効率的な機械学習 Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models ( http://arxiv.org/abs/2405.12523v2 ) ライセンス: Link先を確認	Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi,	(参考訳) 機械学習は、機械学習モデルにエンコードされたプライベートまたはセンシティブな情報を削除することによって、忘れられる権利を持つ個人に権限を与える。しかし、Multimodal Large Language Models (MLLM) にMUを効果的に適用できるかは、特にリークされた概念の視覚的データを忘れるシナリオにおいて不確実である。この課題を克服するために, 複数ステップで単一の画像を微調整することで, 概念の視覚的認識を解き放つための, SIU (Single Image Unlearning) を提案する。 SIUは2つの重要な側面から構成される。 i)多面的微調整データの構築。我々は,忘れられる概念の微調整データを構築するための4つの目標を導入する。 (二)共同訓練損失概念の視覚的認識を同期的に忘れ,MLLMの実用性を維持するために,Cross Entropy Lossと組み合わせた新しいDual Masked KL-divergence Lossを用いてMLLMを微調整する。本手法と並行して,MLLMにおけるMUの新しいベンチマークであるMMUBenchを確立し,その評価のためのメトリクスの集合を導入する。 MMUBench の実験結果から,SIU は既存手法の性能を大幅に上回っていることがわかった。さらに,SIUは侵入的メンバーシップ推論攻撃や脱獄攻撃を回避できることがわかった。私たちの知る限りでは、MLLMでMUを初めて探求しています。近い将来、コードとベンチマークをリリースします。 Machine unlearning empowers individuals with the `right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for few steps. SIU consists of two key aspects: (i) Constructing Multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Jointly training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods. Furthermore, we surprisingly find that SIU can avoid invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-29
# AIのタッチを見つける: LLM対応のスパンをテキストで識別する Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text ( http://arxiv.org/abs/2405.12689v2 ) ライセンス: Link先を確認	Yafu Li, Zhilin Wang, Leyang Cui, Wei Bi, Shuming Shi, Yue Zhang,	(参考訳) AI生成テキスト検出は、強力な言語モデルが人間レベルの生成に近づくにつれ、注目を集めている。限定的な作業は、(部分的には)AIパラフレーズテキストの検出に費やされている。しかし、AIパラフレーズは、テキストの洗練と多様性のための様々なアプリケーションシナリオで一般的に使用される。そこで本研究では,パラフレーズ付きテキストスパン検出(PTD)という新たな検出フレームワークを提案し,テキスト内のパラフレーズ付きテキストスパンを同定する。テキストレベルの検出とは異なり、PTDは全文を取り込み、各文にパラフレーズ度を示すスコアを割り当てる。パラフレーズ付きテキストスパン検出のための専用データセットであるPASTEDを構築した。 In-distriionとout-of-distriionの結果は、AIパラフレーズテキストスパンの同定におけるPTDモデルの有効性を示す。統計的およびモデル解析は、パラフレーズ付きテキストの周囲の文脈の重要な役割を説明する。広範な実験により、PTDモデルは多種多様なパラフレージングプロンプトと複数のパラフレージングテキストスパンに一般化できることが示されている。私たちはリソースをhttps://github.com/Linzwcs/PASTEDでリリースします。 AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Different from text-level detection, PTD takes in the full text and assigns each of the sentences with a score indicating the paraphrasing degree. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-29
# 検索広告戦略の最適化:強化広告ランキングと入札のための強化強化学習と一般第二価格オークションの統合 Optimizing Search Advertising Strategies: Integrating Reinforcement Learning with Generalized Second-Price Auctions for Enhanced Ad Ranking and Bidding ( http://arxiv.org/abs/2405.13381v2 ) ライセンス: Link先を確認	Chang Zhou, Yang Zhao, Jin Cao, Yi Shen, Xiaoling Cui, Chiyu Cheng,	(参考訳) 本稿では,Eコマースプラットフォームにおける広告ランキングと入札機構に着目し,検索広告における戦略的最適化手法の統合について検討する。強化学習と進化戦略の組み合わせを用いて,多様なユーザインタラクションに適応し,広告主コスト,ユーザ関連性,プラットフォーム収益のバランスを最適化する動的モデルを提案する。提案手法は,広告の配置精度とコスト効率を大幅に向上させ,実際のシナリオにおけるモデルの適用性を示すものである。 This paper explores the integration of strategic optimization methods in search advertising, focusing on ad ranking and bidding mechanisms within E-commerce platforms. By employing a combination of reinforcement learning and evolutionary strategies, we propose a dynamic model that adjusts to varying user interactions and optimizes the balance between advertiser cost, user relevance, and platform revenue. Our results suggest significant improvements in ad placement accuracy and cost efficiency, demonstrating the model's applicability in real-world scenarios.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-29
# 物理AIハイブリッドモデリングによる天気予報の微粒化 Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling ( http://arxiv.org/abs/2405.13796v3 ) ライセンス: Link先を確認	Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai,	(参考訳) データ駆動人工知能(AI)モデルは、特に中距離や近距離での天気予報において大きな進歩を遂げている。しかし、ほとんどのデータ駆動の天気予報モデルは、時間次元の微細な物理的進化ではなく、データマッピングの学習に焦点を当てたブラックボックスシステムである。その結果、データセットの時間スケールの制限により、これらのモデルはより詳細な時間スケールでの予測を妨げている。本稿では,天気予報をトレーニングデータセットを超える細粒度テンポラルスケールに一般化する物理AIハイブリッドモデル(WeatherGFT)を提案する。具体的には、小さな時間スケール(例えば300秒)で物理進化をシミュレートするために慎重に設計されたPDEカーネルを使用し、学習可能なルータと並列ニューラルネットワークを用いてバイアス補正を行う。さらに、異なるリードタイムでのモデルの一般化を促進するためのリードタイムアウェアトレーニングフレームワークを導入する。物理AIモジュールの重み解析は、物理学が大きな進化をし、AIが適応的に修正を行うことを示している。大規模な実験により、WeatherGFTは時間単位のデータセットでトレーニングされ、複数のリードタイムで最先端のパフォーマンスを達成し、30分間の予測を一般化する能力を示している。 Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets prevent these models from forecasting at finer time scales. This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales beyond training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale (e.g., 300 seconds) and use a parallel neural networks with a learnable router for bias correction. Furthermore, we introduce a lead time-aware training framework to promote the generalization of the model at different lead times. The weight analysis of physics-AI modules indicates that physics conducts major evolution while AI performs corrections adaptively. Extensive experiments show that WeatherGFT trained on an hourly dataset, achieves state-of-the-art performance across multiple lead times and exhibits the capability to generalize 30-minute forecasts.	翻訳日:2024-05-30 22:22:47 公開日:2024-05-29
# マルチモーダル大言語モデルにおける視覚的推論補充のための投機的プロンプト Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models ( http://arxiv.org/abs/2405.13872v2 ) ライセンス: Link先を確認	Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang,	(参考訳) CoT(Chain-of-Thought)と関連する合理性に基づく研究の最近の進歩は、複雑な推論タスクにおけるLarge Language Models(LLM)の性能を大幅に向上させた。 MLLM(Multimodal Large Language Models)の進化に伴い、複雑なマルチモーダル推論問題に対処する能力の向上が重要なフロンティアとなっている。しかし、CoTにマルチモーダルな論理を組み込むことは、まだ十分には研究されていない。本稿では,MLLMの視覚的合理性を段階的に抽出する,IoT(Image-of-Thought)プロンプト手法を提案する。具体的には、IoTプロンプトは入力画像と質問に基づいて重要な視覚情報抽出操作を自動的に設計することができる。視覚情報リファインメントの各ステップは、複雑な視覚的推論問題に対する回答をサポートする特定の視覚的理性を特定する。テキストCoT以外にも、IoTは視覚的およびテキスト的合理性を利用して、MLLMが複雑なマルチモーダル情報を理解するのに役立つ。 IoTプロンプトは、さまざまなMLLMのさまざまな視覚的理解タスクにおいて、ゼロショットの視覚的推論性能を改善した。さらに、IoTによって生成されたステップバイステップの視覚的特徴説明は、視覚的推論プロセスを解明し、大規模マルチモーダルモデルの認知過程の分析を支援する。 Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT has yet to be thoroughly investigated. We propose the Image-of-Thought (IoT) prompting method, which helps MLLMs to extract visual rationales step-by-step. Specifically, IoT prompting can automatically design critical visual information extraction operations based on the input images and questions. Each step of visual information refinement identifies specific visual rationales that support answers to complex visual reasoning questions. Beyond the textual CoT, IoT simultaneously utilizes visual and textual rationales to help MLLMs understand complex multimodal information. IoT prompting has improved zero-shot visual reasoning performance across various visual understanding tasks in different MLLMs. Moreover, the step-by-step visual feature explanations generated by IoT prompting elucidate the visual reasoning process, aiding in analyzing the cognitive processes of large multimodal models	翻訳日:2024-05-30 22:22:47 公開日:2024-05-29
# 実現可能なコンセプトセットジェネレータによる高速説明可能性 Fast Explainability via Feasible Concept Sets Generator ( http://arxiv.org/abs/2405.18664v1 ) ライセンス: Link先を確認	Deng Pan, Nuno Moniz, Nitesh Chawla,	(参考訳) 長年のジレンマは、一般的な適用性と推論速度という、より広範な説明方法の適用を防止する。一方、既存のモデルに依存しない説明法は、説明すべき予測モデルについて最小限の事前推定を行う。それでも、モデルの振る舞いを近似するために、伝播やバックプロパゲーションを通じてモデルに追加のクエリが必要であるため、推論が遅くなり、時間に敏感なタスクでの使用が妨げられる。一方で、低コストで高速な推論を実現するためのモデルに依存した様々な説明が提案されている。本研究では,モデルに依存しないアプローチの普遍性とモデル固有のアプローチの効率とのギャップを,予測モデルの構造を仮定せずに新たなフレームワークを提案し,推論時に高い効率を達成し,リアルタイムな説明を可能にすることによって橋渡しする。これを実現するために、まず、人間の理解可能な概念の集合を通して説明を定義し、最小限の概念集合を通してモデル予測を解明する枠組みを提案する。第二に、最小限の可能な集合生成器が予測モデルに付随する説明として学習できることを示し、予測のための説明を生成する。最後に、実時間推論を容易にしながら、堅牢な説明を提供する新しいモデルに依存しない手法を実装することにより、この枠組みを検証する。我々の主張は包括的な実験によって裏付けられ、我々のアプローチの有効性と効率を強調している。 A long-standing dilemma prevents the broader application of explanation methods: general applicability and inference speed. On the one hand, existing model-agnostic explanation methods usually make minimal pre-assumptions about the prediction models to be explained. Still, they require additional queries to the model through propagation or back-propagation to approximate the models' behaviors, resulting in slow inference and hindering their use in time-sensitive tasks. On the other hand, various model-dependent explanations have been proposed that achieve low-cost, fast inference but at the expense of limiting their applicability to specific model structures. In this study, we bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches by proposing a novel framework without assumptions on the prediction model's structures, achieving high efficiency during inference and allowing for real-time explanations. To achieve this, we first define explanations through a set of human-comprehensible concepts and propose a framework to elucidate model predictions via minimal feasible concept sets. Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model, generating explanations for predictions. Finally, we validate this framework by implementing a novel model-agnostic method that provides robust explanations while facilitating real-time inference. Our claims are substantiated by comprehensive experiments, highlighting the effectiveness and efficiency of our approach.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# Zipper: モダリティを再利用するための多層デコーダアーキテクチャ Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities ( http://arxiv.org/abs/2405.18669v1 ) ライセンス: Link先を確認	Vicky Zayats, Peter Chen, Melissa Merrari, Dirk Padfield,	(参考訳) 複数の生成基盤モデル、特に異なるモダリティで訓練されたモデルを統合することは、その部分の総和よりも大きい何かに重大な課題をもたらす。 2つの主要なハードルは、整列データ(同様の意味を持つが異なるモダリティで表現される概念)の可用性と、ドメイン間の生成タスクにおいて、元のユニモーダル能力を損なうことなく、効果的にユニモーダル表現を活用することである。本稿では,これらの問題に対処する多目的デコーダアーキテクチャであるZipperを提案する。音声とテキストのモダリティを融合させる実験では,限定されたテキスト音声データを持つシナリオにおいて,提案アーキテクチャが極めて競合的に機能することを示した。また,本モデルでは,対応する変調塔(e.g.テキスト)を凍結することにより,単調(e.g.テキスト・テキスト生成)生成性能を選択的に維持する柔軟性を示す。出力モダリティがテキストである自動音声認識(ASR)のようなクロスモーダルタスクにおいて、テキストバックボーンの凍結が無視可能な性能劣化をもたらすことを示す。出力モダリティが音声であるTTS(text-to-Speech Generation)のようなクロスモーダルなタスクでは、事前訓練された音声バックボーンを使用することで、ベースラインよりも優れたパフォーマンスが得られることを示す。 Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but is expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain generative tasks, without compromising their original unimodal capabilities. We propose Zipper, a multi-tower decoder architecture that addresses these concerns by using cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders. In our experiments fusing speech and text modalities, we show the proposed architecture performs very competitively in scenarios with limited aligned text-speech data. We also showcase the flexibility of our model to selectively maintain unimodal (e.g., text-to-text generation) generation performance by freezing the corresponding modal tower (e.g. text). In cross-modal tasks such as automatic speech recognition (ASR) where the output modality is text, we show that freezing the text backbone results in negligible performance degradation. In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# リレーショナルデータベースへの微分プライベート合成データの適用 Adapting Differentially Private Synthetic Data to Relational Databases ( http://arxiv.org/abs/2405.18670v1 ) ライセンス: Link先を確認	Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan,	(参考訳) 既存の差分プライベート(DP)合成データ生成機構は、典型的には単一ソーステーブルを仮定する。実際には、データは複数のテーブルに分散し、テーブルにまたがる関係を持つことが多い。本稿では,既存のDP機構と組み合わせて合成関係データベースを生成するアルゴリズムを提案する。本アルゴリズムは,参照整合性を維持しつつ,低次辺分布の近似誤差を最小限に抑えるために,個々の合成表間の関係を反復的に洗練する。最後に,提案アルゴリズムのDPと理論的実用性を保証する。 Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce the first-of-its-kind algorithm that can be combined with any existing DP mechanisms to generate synthetic relational databases. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity. Finally, we provide both DP and theoretical utility guarantees for our algorithm.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 透かしのカウンターファクトな説明 Watermarking Counterfactual Explanations ( http://arxiv.org/abs/2405.18671v1 ) ライセンス: Link先を確認	Hangzhi Guo, Amulya Yadav,	(参考訳) 説明可能な人工知能(XAI)の分野は、現代の機械学習(ML)モデルを支える意思決定プロセスについてエンドユーザに説明を提供する技術に焦点を当てている。 XAIテクニックの広大な宇宙では、予測結果に悪影響を及ぼす個々のエンドユーザーに対して、容易に理解しやすく、行動可能な(あるいは対照的な)ケースを提供することによって、MLモデルの予測を説明するために、反ファクトリアル(CF)の説明がエンドユーザによって好まれることが多い。しかし、最近の研究では、実世界のアプリケーションでCFの説明を使用する際の重大なセキュリティ上の懸念が示されている。特に、悪意のある敵はCFの説明を利用して、プロプライエタリなMLモデルに対してクエリ効率の良いモデル抽出攻撃を行うことができる。本稿では,不許可なモデル抽出攻撃(CF説明に依存する)の検出に利用することができるモデル非依存型透かしフレームワーク(CF説明に透かしを追加する)を提案する。提案するフレームワークは,2段階の最適化問題を解くことで,生成したCF説明に識別不能な透かしを埋め込むことにより,これらのCF説明に依存する将来のモデル抽出攻撃を,Null hypothesis important testing (NHST) スキームを用いて検出し,これらの埋め込み透かしが生成されたCF説明の品質を損なわないことを保証する。我々は,本フレームワークの性能を,実世界のさまざまなデータセット,CF説明手法,モデル抽出手法で評価し,透かしを用いたCF説明を用いてトレーニングした抽出MLモデルを正確に識別するために,透かし検出システムを使用することを実証した。我々の研究は、現実世界のアプリケーションでCFの説明を安全に採用するための道を開いた。 The field of Explainable Artificial Intelligence (XAI) focuses on techniques for providing explanations to end-users about the decision-making processes that underlie modern-day machine learning (ML) models. Within the vast universe of XAI techniques, counterfactual (CF) explanations are often preferred by end-users as they help explain the predictions of ML models by providing an easy-to-understand & actionable recourse (or contrastive) case to individual end-users who are adversely impacted by predicted outcomes. However, recent studies have shown significant security concerns with using CF explanations in real-world applications; in particular, malicious adversaries can exploit CF explanations to perform query-efficient model extraction attacks on proprietary ML models. In this paper, we propose a model-agnostic watermarking framework (for adding watermarks to CF explanations) that can be leveraged to detect unauthorized model extraction attacks (which rely on the watermarked CF explanations). Our novel framework solves a bi-level optimization problem to embed an indistinguishable watermark into the generated CF explanation such that any future model extraction attacks that rely on these watermarked CF explanations can be detected using a null hypothesis significance testing (NHST) scheme, while ensuring that these embedded watermarks do not compromise the quality of the generated CF explanations. We evaluate this framework's performance across a diverse set of real-world datasets, CF explanation methods, and model extraction techniques, and show that our watermarking detection system can be used to accurately identify extracted ML models that are trained using the watermarked CF explanations. Our work paves the way for the secure adoption of CF explanations in real-world applications.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 微視的画像分類のためのLLMに基づく階層的概念分解 LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification ( http://arxiv.org/abs/2405.18672v1 ) ライセンス: Link先を確認	Renyi Qu, Mark Yatskar,	(参考訳) 視覚言語タスクの解釈可能なモデルの最近の進歩は、競争的な性能を達成したが、大きな言語モデル(LLM)からの非構造化テキスト出力に依存しているため、その解釈可能性に悩まされることがしばしばある。これはランダム性を導入し、AIシステムの安全性問題に対処するために不可欠な透明性と信頼性の両方を損なう。本稿では,構造化概念解析によるモデル解釈可能性の向上を目的とした新しいフレームワークである‘texttt{Hi-CoDe}(階層概念分解)’を紹介する。 1)入力画像を視覚概念の階層構造に分解し,視覚概念木を形成する。 2) CLIPから派生した概念特化機能を利用する単純な線形分類器のアンサンブルを用いて分類を行う。我々のアプローチは、最先端のモデルの性能だけでなく、意思決定プロセスに対する明確な洞察を提供し、さまざまな概念の重要性を強調することによって透明性も向上します。これにより、潜在的な障害モードを詳細に分析し、モデルコンパクト性を向上させることができるため、精度を損なうことなく、新しいベンチマークを解釈可能である。 Recent advancements in interpretable models for vision-language tasks have achieved competitive performance; however, their interpretability often suffers due to the reliance on unstructured text outputs from large language models (LLMs). This introduces randomness and compromises both transparency and reliability, which are essential for addressing safety issues in AI systems. We introduce \texttt{Hi-CoDe} (Hierarchical Concept Decomposition), a novel framework designed to enhance model interpretability through structured concept analysis. Our approach consists of two main components: (1) We use GPT-4 to decompose an input image into a structured hierarchy of visual concepts, thereby forming a visual concept tree. (2) We then employ an ensemble of simple linear classifiers that operate on concept-specific features derived from CLIP to perform classification. Our approach not only aligns with the performance of state-of-the-art models but also advances transparency by providing clear insights into the decision-making process and highlighting the importance of various concepts. This allows for a detailed analysis of potential failure modes and improves model compactness, therefore setting a new benchmark in interpretability without compromising the accuracy.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# ベイズ忠実データ同化のためのディープベイズフィルタ Deep Bayesian Filter for Bayes-faithful Data Assimilation ( http://arxiv.org/abs/2405.18674v1 ) ライセンス: Link先を確認	Yuta Tarumi, Keisuke Fukuda, Shin-ichi Maeda,	(参考訳) 非線形状態空間モデルの状態推定は難しい課題である。既存の同化法は主に、真の後部が必然的にガウス的でないような物理的空間上のガウス的後部を仮定する。非線形状態空間モデル(SSM)のデータ同化のためのディープベイズフィルタ(DBF)を提案する。 DBFは、新しい潜伏変数 $h_t$ を新しい潜伏変数 (``fancy'') 空間上に構築し、観測を $o_t$ に同化する。周辺一空想空間上の状態遷移を直線的に制限すること (ii) ガウス逆観測作用素 $q(h_t\|o_t)$ を学習すると、後部は常に DBF に対してガウス的である。非常に特筆すべきは、後部の構造化設計は、時間ステップでモンテカルロサンプリングエラーを蓄積することなく、後部の再帰的な計算のための解析公式を提供することである。 DBF はガウス逆観測作用素 $q(h_t\|o_t)$ とその他の潜在 SSM パラメータ(例えば、ダイナミックス行列)を求める。実験の結果,DBFは様々なタスクや条件下でモデルベースアプローチや潜在同化手法よりも優れていた。 State estimation for nonlinear state space models is a challenging task. Existing assimilation methodologies predominantly assume Gaussian posteriors on physical space, where true posteriors become inevitably non-Gaussian. We propose Deep Bayesian Filtering (DBF) for data assimilation on nonlinear state space models (SSMs). DBF constructs new latent variables $h_t$ on a new latent (``fancy'') space and assimilates observations $o_t$. By (i) constraining the state transition on fancy space to be linear and (ii) learning a Gaussian inverse observation operator $q(h_t\|o_t)$, posteriors always remain Gaussian for DBF. Quite distinctively, the structured design of posteriors provides an analytic formula for the recursive computation of posteriors without accumulating Monte-Carlo sampling errors over time steps. DBF seeks the Gaussian inverse observation operators $q(h_t\|o_t)$ and other latent SSM parameters (e.g., dynamics matrix) by maximizing the evidence lower bound. Experiments show that DBF outperforms model-based approaches and latent assimilation methods in various tasks and conditions.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# Zero-to-Hero:アテンションマップフィルタリングによるゼロショット新規ビュー合成の実現 Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering ( http://arxiv.org/abs/2405.18677v1 ) ライセンス: Link先を確認	Ido Sobol, Chenfeng Xu, Or Litany,	(参考訳) 単一のソースイメージに基づいて任意のビューからリアルなイメージを生成することは、電子商取引から没入型仮想体験に至るまで、コンピュータビジョンにおいて重要な課題である。拡散モデル、特にZero-1-to-3モデルの最近の進歩は、可塑性ビュー、ビデオ、および3Dモデルを生成するために広く採用されている。しかし、これらのモデルは、新しいビュー生成における矛盾と不確実性、特に視点の変化に苦慮している。本研究では,Zero-1-to-3のデノナイズ過程において,注目マップを操作することによってビュー合成を向上させる新しいテストタイム手法であるZero-to-Heroを提案する。偏極化過程と確率勾配降下(SGD)の類似性を引き出すことにより,注目マップを集約し,生成信頼性と信頼性を向上させるフィルタリング機構を実装した。このプロセスは、再訓練や重要な計算資源を必要とせず、幾何的整合性を改善する。さらに、ソースビューからの情報を統合するために自己認識機構を変更し、形状歪みを低減する。これらのプロセスは、特別なサンプリングスケジュールによってさらにサポートされます。実験結果から, 分布域外オブジェクトの多種多様な集合に対して, 忠実度と整合性に大きな改善が認められた。 Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still struggle with inconsistencies and implausibility in new views generation, especially for challenging changes in viewpoint. In this work, we propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps during the denoising process of Zero-1-to-3. By drawing an analogy between the denoising process and stochastic gradient descent (SGD), we implement a filtering mechanism that aggregates attention maps, enhancing generation reliability and authenticity. This process improves geometric consistency without requiring retraining or significant computational resources. Additionally, we modify the self-attention mechanism to integrate information from the source view, reducing shape distortions. These processes are further supported by a specialized sampling schedule. Experimental results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# Vim-F: 周波数領域での学習から得られる視覚状態空間モデル Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain ( http://arxiv.org/abs/2405.18679v1 ) ライセンス: Link先を確認	Juntao Zhang, Kun Bian, Peng Cheng, Wenbo An, Jianning Liu, Jun Zhou,	(参考訳) 近年、Mambaディープラーニングモデルとして知られる効率的なハードウェア対応設計を持つステートスペースモデル(SSM)は、言語理解のような長いシーケンスのモデリングにおいて大きな進歩を遂げている。したがって、SSMに基づく効率的で汎用的な視覚バックボーンの構築は有望な方向である。従来の畳み込みニューラルネットワーク(CNN)やビジョントランスフォーマー(ViT)と比較して、ビジョン・マンバ(ViM)メソッドのパフォーマンスは、まだ完全に競合していない。 SSMが画像データを処理するために、ViMは一般的に2D画像を1Dシーケンスに平らにし、必然的にいくつかの2Dローカル依存関係を無視し、グローバルな視点から空間的関係を解釈するモデルの能力を弱める。我々は、Fast Fourier Transform (FFT) を用いて特徴マップのスペクトルを取得し、元の特徴マップに追加し、VIMが周波数領域と空間領域の両方で統一された視覚表現をモデル化できるようにする。周波数領域情報の導入により、ViMはスキャン中にグローバルな受容野を持つことができる。周波数領域と空間領域の両方で純粋なマンバエンコーダとスキャンを利用するVim-Fと呼ばれる新しいモデルを提案する。さらに,Vim-F への位置埋め込みの必要性を疑問視し,Vim-F における位置埋め込みの必要性を考察した。最後に、Vim-Fのパッチ埋め込みを再設計し、より局所的な相関を捉えるために畳み込みステムを活用し、Vim-Fの性能をさらに向上させる。コードは以下の通り: \url{https://github.com/yws-wxs/Vim-F}。 In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building efficient and general-purpose visual backbones based on SSMs is a promising direction. Compared to traditional convolutional neural networks (CNNs) and Vision Transformers (ViTs), the performance of Vision Mamba (ViM) methods is not yet fully competitive. To enable SSMs to process image data, ViMs typically flatten 2D images into 1D sequences, inevitably ignoring some 2D local dependencies, thereby weakening the model's ability to interpret spatial relationships from a global perspective. We use Fast Fourier Transform (FFT) to obtain the spectrum of the feature map and add it to the original feature map, enabling ViM to model a unified visual representation in both frequency and spatial domains. The introduction of frequency domain information enables ViM to have a global receptive field during scanning. We propose a novel model called Vim-F, which employs pure Mamba encoders and scans in both the frequency and spatial domains. Moreover, we question the necessity of position embedding in ViM and remove it accordingly in Vim-F, which helps to fully utilize the efficient long-sequence modeling capability of ViM. Finally, we redesign a patch embedding for Vim-F, leveraging a convolutional stem to capture more local correlations, further improving the performance of Vim-F. Code is available at: \url{https://github.com/yws-wxs/Vim-F}.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 高次元近傍探索のためのナビゲート可能なグラフ:構築と限界 Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits ( http://arxiv.org/abs/2405.18680v1 ) ライセンス: Link先を確認	Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, Torsten Suel,	(参考訳) 近年,グラフに基づく近接探索手法への関心が高まっており,その多くが高次元点集合上のナビゲート可能なグラフの構築に重点を置いている。グラフは、任意の開始ノードから任意の目標ノードへ、所定の距離関数に従って目的地に最も近い隣人へ常に移動する、欲求ルーティング戦略を用いて、うまく移動することができればナビゲート可能である。完全なグラフは任意の点集合に対してナビゲート可能であるが、アプリケーションにとって重要な問題はスペーサーグラフを構築することができるかどうかである。この問題は低次元においてかなりよく理解されているが、高次元の点集合に対する最初の上界と下界のいくつかを確立する。まず、任意の次元の任意の$n$点に対して平均次数$O(\sqrt{n \log n })$のナビゲート可能なグラフを構築するための単純かつ効率的な方法を与える。ユークリッド計量の$O(\log n)$次元の下でも、任意の$\alpha < 1/2$に対して平均級数$O(n^{\alpha})$のナビゲート可能なグラフが存在しない。我々の下界は二項確率変数の鋭い反集中境界に依存しており、これはランダムな点の集合の近傍が著しく重複しないことを示すために使われる。 There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n })$ for any set of $n$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 組合せ最適化のためのランダムキーGRASP A random-key GRASP for combinatorial optimization ( http://arxiv.org/abs/2405.18681v1 ) ライセンス: Link先を確認	Antonio A. Chaves, Mauricio G. C. Resende, Ricardo M. A. Silva,	(参考訳) 本稿ではランダムキーオプティマイザ(RKO)パラダイムを用いた問題独立なGRASPメタヒューリスティックを提案する。 GRASP (greedy randomized Adaptive search procedure) は、半グレディな構成手順を繰り返し適用し、その後局所的な探索手順を施すメタヒューリスティックな組合せ最適化法である。すべてのイテレーションで見つかる最良のソリューションは、GRASPのソリューションとして返される。 Continuous GRASP (C-GRASP) は、ユニットハイパーキューブの継続的な最適化のためのGRASPの拡張である。ランダムキー最適化器(RKO)は、ランダムキーのベクトルを用いて、組合せ最適化問題の解を符号化する。デコーダを使用して、ランダムキーのベクトルによって符号化されたソリューションを評価する。ランダムキーGRASPは、デコーダを用いてユニットハイパーキューブの点を評価するC-GRASPである。問題非依存のコンポーネントと問題依存のデコーダからなるランダムキーGRASPについて述べる。概念実証として、ランダムキーGRASPは、旅行セールスマン問題、ハブのツリー配置問題、スタイナー三重被覆問題、ノード容量グラフ分割問題、ジョブシークエンシングとツール切替問題という5つのNPハード組合せ最適化問題でテストされる。 This paper proposes a problem-independent GRASP metaheuristic using the random-key optimizer (RKO) paradigm. GRASP (greedy randomized adaptive search procedure) is a metaheuristic for combinatorial optimization that repeatedly applies a semi-greedy construction procedure followed by a local search procedure. The best solution found over all iterations is returned as the solution of the GRASP. Continuous GRASP (C-GRASP) is an extension of GRASP for continuous optimization in the unit hypercube. A random-key optimizer (RKO) uses a vector of random keys to encode a solution to a combinatorial optimization problem. It uses a decoder to evaluate a solution encoded by the vector of random keys. A random-key GRASP is a C-GRASP where points in the unit hypercube are evaluated employing a decoder. We describe random key GRASP consisting of a problem-independent component and a problem-dependent decoder. As a proof of concept, the random-key GRASP is tested on five NP-hard combinatorial optimization problems: traveling salesman problem, tree of hubs location problem, Steiner triple covering problem, node capacitated graph partitioning problem, and job sequencing and tool switching problem.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# GPTは医学的理解を再定義できるか? 医学的機械読解におけるGPTの評価 Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension ( http://arxiv.org/abs/2405.18682v1 ) ライセンス: Link先を確認	Shubham Vatsal, Ayush Singh,	(参考訳) 大規模言語モデル(LLM)は、異なる領域における多くのタスクにおいて顕著なパフォーマンスを示している。しかし, 閉本バイオメディカル機械読解術(MRC)の成績は, 深く評価されていない。本研究は,4つの閉本バイオメディカルMCCベンチマークにおけるGPTの評価である。従来のプロンプト手法を実験し、新しいプロンプト手法を導入する。 LLM固有の検索問題のいくつかを解決するため,従来のRAGセットアップにおいて,ベクトルデータベースを用いて重要なチャンクを検索する必要性を緩和するImplicit Retrieval Augmented Generation (RAG) というプロンプト戦略を提案する。さらに,本手法による自然言語生成の質的評価について報告する。その結果、我々の新しいプロンプト技術は、4つのデータセットのうち2つで最高のパフォーマンスを得ることができ、残りの2つにランク付けできることがわかった。実験により、ゼロショット設定でもGPTのような現代のLLMは教師付きモデルよりも優れており、2つのベンチマークで新たなSoTA(State-of-the-art)結果が得られた。 Large language models (LLMs) have shown remarkable performance on many tasks in different domains. However, their performance in closed-book biomedical machine reading comprehension (MRC) has not been evaluated in depth. In this work, we evaluate GPT on four closed-book biomedical MRC benchmarks. We experiment with different conventional prompting techniques as well as introduce our own novel prompting method. To solve some of the retrieval problems inherent to LLMs, we propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases to retrieve important chunks in traditional RAG setups. Moreover, we report qualitative assessments on the natural language generation outputs from our approach. The results show that our new prompting technique is able to get the best performance in two out of four datasets and ranks second in rest of them. Experiments show that modern-day LLMs like GPT even in a zero-shot setting can outperform supervised models, leading to new state-of-the-art (SoTA) results on two of the benchmarks.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 半群正規化を用いた時間連続ネットワークによる画像登録のための差分型学習 Learning Diffeomorphism for Image Registration with Time-Continuous Networks using Semigroup Regularization ( http://arxiv.org/abs/2405.18684v1 ) ライセンス: Link先を確認	Mohammadjavad Matinkia, Nilanjan Ray,	(参考訳) DIR(Diffomorphic Image registration)は、3次元画像解析において重要な課題であり、画像間の変形を保存するトポロジーを見つけることを目的としている。フローマップ微分方程式の解を微分同相変形として焦点をあてた最近の手法では、離散時間ステップと様々な正規化項を使い、ヤコビアンの負の行列式をペナル化し、解ベクトル場の滑らかさを課す。本稿では, 時間連続体における微分同相を, 正規化項が少なく, 付加的な統合を伴わないような, 微分同相な3次元画像登録のための新しい学習手法を提案する。フローマップの基本特性の1つとして、半群特性を正規化の唯一の形式として利用し、一対のイメージ間の時間的に連続な微分同相流を保証する。この特性を活用することで、トレーニングと評価の両方において、さらなる正規化項の必要性が軽減され、スケーリングとスキャアリング統合が不要になる。時間連続微分同相を実現するために、拡散モデルでよく用いられる手法である時間埋め込みユニペットを用いる。提案手法は, 連続時間間隔における微分同相性を保証することにより, より良い登録結果が得られることを示す。 2つの公開データセット(OASISとCANDI)に対する実験結果は、学習に基づく手法と最適化に基づく手法の両方よりも、我々のモデルの方が優れていることを示す。 Diffeomorphic image registration (DIR) is a critical task in 3D medical image analysis, aimed at finding topology preserving deformations between pairs of images. Focusing on the solution of the flow map differential equation as the diffeomorphic deformation, recent methods use discrete timesteps along with various regularization terms to penalize the negative determinant of Jacobian and impose smoothness of the solution vector field. In this paper, we propose a novel learning-based approach for diffeomorphic 3D-image registration which finds the diffeomorphisms in the time continuum with fewer regularization terms and no additional integration. As one of the fundamental properties of flow maps, we exploit the semigroup property as the only form of regularization, ensuring temporally continuous diffeomorphic flows between pairs of images. Leveraging this property, our method alleviates the need for additional regularization terms and scaling and squaring integration during both training and evaluation. To achieve time-continuous diffeomorphisms, we employ time-embedded UNets, a technique commonly utilized in diffusion models. The proposed method reveals that ensuring diffeomorphism in a continuous time interval leads to better registration results. Experimental results on two public datasets (OASIS and CANDI) demonstrate the superiority of our model over both learning-based and optimization-based methods.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 学習密度比による拒絶 Rejection via Learning Density Ratios ( http://arxiv.org/abs/2405.18686v1 ) ライセンス: Link先を確認	Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen,	(参考訳) 拒絶による分類は、モデルを予測しないことを許容する学習パラダイムとして現れます。主なアプローチは、典型的な損失関数を増大させることで教師付き学習パイプラインを変更することである。そこで我々は,事前学習したモデルの性能を最大化する理想的なデータ分布を求める。これは損失リスクの最適化を通じて$ \phi$-divergence regularization 項で定式化することができる。この理想的な分布を通して、この分布とデータ分布の密度比を利用して拒絶判定を行うことができる。私たちは、$ \phi $-divergencesが$ \alpha $-divergenceのファミリーによって指定される設定に焦点を当てます。私たちのフレームワークはクリーンでノイズの多いデータセットで実証的にテストされています。 Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $ \phi$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $ \phi $-divergences are specified by the family of $ \alpha $-divergence. Our framework is tested empirically over clean and noisy datasets.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 家庭用ロボティクスの強化:効率的なトレーニングとパフォーマンス向上のための対話型強化学習 Advancing Household Robotics: Deep Interactive Reinforcement Learning for Efficient Training and Enhanced Performance ( http://arxiv.org/abs/2405.18687v1 ) ライセンス: Link先を確認	Arpita Soni, Sujatha Alla, Suresh Dodda, Hemanth Volikatla,	(参考訳) 家庭内ロボットが家事を行う市場は、こうしたロボットが日常の責任を和らげるにつれて成長している。国内ロボットは一般的に、労働者を解雇したとしてしばしば批判される産業ロボットとは対照的に、人間の労働を緩和する役割で歓迎されている。しかし、これらのロボットが家事を行う前には、周囲の認識、意思決定、人間の行動の把握など、いくつかの小さな活動に精通する必要がある。強化学習(Reinforcement Learning, RL)は、ロボットが自分の環境と対話し、報酬を最大限にするために自分の行動を最適化する方法を学ぶための、重要なロボティクス技術として登場した。しかし、Deep Reinforcement Learningの目標は、RLとニューラルネットワークを組み合わせることで、現実の環境でより複雑で継続的なアクションステートスペースに対処することだ。 DeepRLの有効性は、対話的なフィードバックを通じてさらに強化され、トレーナーがロボットの学習プロセスを高速化するためのリアルタイムガイダンスを提供する。それにもかかわらず、現在の手法には欠点があり、すなわち、同じ条件下で繰り返し学習される指導の一時的な適用である。そこで本研究では,永続的なルールベースシステムを利用したDeep Interactive Reinforcement Learningを通じて,情報とアドバイスを保存・再利用する新しい手法を提案する。この方法は訓練プロセスを短縮するだけでなく、インストラクターが実行しなければならない反復回数を減らす。本研究は,家庭用ロボットの開発を推進し,学習者としての有効性と効率を向上させる可能性を秘めている。 The market for domestic robots made to perform household chores is growing as these robots relieve people of everyday responsibilities. Domestic robots are generally welcomed for their role in easing human labor, in contrast to industrial robots, which are frequently criticized for displacing human workers. But before these robots can carry out domestic chores, they need to become proficient in several minor activities, such as recognizing their surroundings, making decisions, and picking up on human behaviors. Reinforcement learning, or RL, has emerged as a key robotics technology that enables robots to interact with their environment and learn how to optimize their actions to maximize rewards. However, the goal of Deep Reinforcement Learning is to address more complicated, continuous action-state spaces in real-world settings by combining RL with Neural Networks. The efficacy of DeepRL can be further augmented through interactive feedback, in which a trainer offers real-time guidance to expedite the robot's learning process. Nevertheless, the current methods have drawbacks, namely the transient application of guidance that results in repeated learning under identical conditions. Therefore, we present a novel method to preserve and reuse information and advice via Deep Interactive Reinforcement Learning, which utilizes a persistent rule-based system. This method not only expedites the training process but also lessens the number of repetitions that instructors will have to carry out. This study has the potential to advance the development of household robots and improve their effectiveness and efficiency as learners.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# 適応的経験推定による効率的選好型強化学習 Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation ( http://arxiv.org/abs/2405.18688v1 ) ライセンス: Link先を確認	Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han,	(参考訳) 評価に基づく強化学習(PbRL)は、報酬工学を使わずにトレーニングエージェントに優れた能力を示す。しかしながら、PbRLの顕著な制限は、人間のフィードバックへの依存である。この依存は、価値/政治学習と組み合わせた正確な報酬学習を必要とする学習ループに起因しており、かなりの数のサンプルを必要とする。学習ループを強化するために,ラベルスムース化とポリシー規則化を併用した効率的なPbRL法であるSEERを提案する。ラベルスムーシングは、人間の嗜好ラベルをスムースにすることで報酬モデルの過度な適合を減らす。さらに、現在のリプレイメモリからサポートされた状態-アクションペアを使用して、保守的な推定値$\widehat{Q}$をブートストラップし、過大評価バイアスを緩和し、ポリシー学習規則化に利用します。オンラインとオフラインの両方で、さまざまな複雑なタスクに対する実験結果から、我々のアプローチがフィードバック効率を向上し、最先端の手法を大きなマージンで上回ることを示した。アブレーション研究により、SEERは以前の研究と比べてより正確なQ-関数を達成することが明らかとなった。 Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques. Label smoothing reduces overfitting of the reward model by smoothing human preference labels. Additionally, we bootstrap a conservative estimate $\widehat{Q}$ using well-supported state-action pairs from the current replay memory to mitigate overestimation bias and utilize it for policy learning regularization. Our experimental results across a variety of complex tasks, both in online and offline settings, demonstrate that our approach improves feedback efficiency, outperforming state-of-the-art methods by a large margin. Ablation studies further reveal that SEER achieves a more accurate Q-function compared to prior work.	翻訳日:2024-05-30 21:13:51 公開日:2024-05-29
# DeepHGNN:階層的関連多変量時系列のグラフニューラルネットワークによる予測手法の検討 DeepHGNN: Study of Graph Neural Network based Forecasting Methods for Hierarchically Related Multivariate Time Series ( http://arxiv.org/abs/2405.18693v1 ) ライセンス: Link先を確認	Abishek Sriramulu, Nicolas Fourrier, Christoph Bergmeir,	(参考訳) グラフニューラルネットワーク(GNN)は,特に系列内時間相関と系列間関係を同時に考慮する能力について,予測領域において大きな注目を集めている。本稿では,複雑な階層構造における予測を目的とした新しい階層型GNN(DeepHGNN)フレームワークを提案する。 DeepHGNNのユニークな点は、その革新的なグラフベースの階層的補間とエンドツーエンドの和解機構にある。このアプローチは、様々な階層レベルの予測精度とコヒーレンスを保証し、それらをまたいだ信号を共有し、階層的な予測において重要な課題に対処する。階層的時系列における重要な洞察は、予測可能性のレベル間でのばらつきであり、上位レベルは通常より予測可能なコンポーネントを提示する。 DeepHGNNは、すべての階層レベルの知識をプールし、活用することで、この洞察に基づいて、全体的な予測精度を向上させる。複数の最先端モデルに対する包括的評価セットにより,DeepHGNNの優れた性能が確認された。本研究は,DeepHGNNが予測精度を大幅に向上させる効果を実証するだけでなく,階層的時系列予測におけるグラフベースの手法の理解にも寄与する。 Graph Neural Networks (GNN) have gained significant traction in the forecasting domain, especially for their capacity to simultaneously account for intra-series temporal correlations and inter-series relationships. This paper introduces a novel Hierarchical GNN (DeepHGNN) framework, explicitly designed for forecasting in complex hierarchical structures. The uniqueness of DeepHGNN lies in its innovative graph-based hierarchical interpolation and an end-to-end reconciliation mechanism. This approach ensures forecast accuracy and coherence across various hierarchical levels while sharing signals across them, addressing a key challenge in hierarchical forecasting. A critical insight in hierarchical time series is the variance in forecastability across levels, with upper levels typically presenting more predictable components. DeepHGNN capitalizes on this insight by pooling and leveraging knowledge from all hierarchy levels, thereby enhancing the overall forecast accuracy. Our comprehensive evaluation set against several state-of-the-art models confirm the superior performance of DeepHGNN. This research not only demonstrates DeepHGNN's effectiveness in achieving significantly improved forecast accuracy but also contributes to the understanding of graph-based methods in hierarchical time series forecasting.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# 収束保証者によるスペクトルリスク安全強化学習 Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees ( http://arxiv.org/abs/2405.18698v1 ) ライセンス: Link先を確認	Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh,	(参考訳) リスク対応型強化学習(RCRL)の分野は,リスク対策に基づく制約を明示的に扱うことにより,最悪のシナリオの可能性を効果的に低減するために開発されている。しかし、リスク尺度の非線形性は収束性と最適性を達成することを困難にしている。非線形性によって引き起こされる困難を克服するために,スペクトルリスク尺度制約付きRLアルゴリズム,スペクトルリスク制約付きポリシー最適化(SRCPO)を提案する。双レベル最適化構造では、外部問題はリスク測度から導出される双対変数を最適化することであり、内部問題はこれらの双対変数が与えられたときの最適ポリシーを見つけることである。提案手法は,我々の知る限り,表の設定における最適収束を保証する最初の方法である。さらに,提案手法は連続制御タスク上で評価され,制約を満たす他のRCRLアルゴリズムの中で最高の性能を示した。 The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO), a bilevel optimization approach that utilizes the duality of spectral risk measures. In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy given these dual variables. The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting. Furthermore, the proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# シーン認識型ニューラルヒューマンモーション予測のためのマルチコンディション潜時拡散ネットワーク Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction ( http://arxiv.org/abs/2405.18700v1 ) ライセンス: Link先を確認	Xuehao Gao, Yang Yang, Yang Wu, Shaoyi Du, Auo-Jun Qi,	(参考訳) 3次元の人間の動きを推定することは、人間の活動を理解し、その人の意図を分析するなど、多くの応用において基本である。人間の動きを予測するために多くの実りある努力がなされてきたが、ほとんどのアプローチはポーズ駆動の予測に焦点を合わせ、文脈環境から離れて人間の動きを推測することで、シーン内の身体の位置運動を残している。しかし、現実世界の人間の動きはゴール指向であり、周囲のシーンの空間的レイアウトの影響を強く受けている。本稿では,従来の3次元体の動きと現在の3次元シーンのコンテキストに基づいて,人間の動作予測タスクを多条件共同推論問題として再構成するマルチコンディション潜伏拡散ネットワーク(MCLD)を提案する。具体的には、MCLDは、原動列上での関節分布を直接モデル化する代わりに、後続の埋め込み空間内で条件拡散プロセスを実行し、過去の体の動きと現在のシーン条件の埋め込みから将来の人間の動き埋め込みへの相互マッピングを特徴付ける。大規模人間の動き予測データセットに関する大規模な実験により、我々のMCLDは、現実的および多種多様な予測に関する最先端の手法よりも大幅に改善されていることが示された。 Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made to human motion prediction, most approaches focus on pose-driven prediction and inferring human motion in isolation from the contextual environment, thus leaving the body location movement in the scene behind. However, real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes. In this paper, instead of planning future human motion in a 'dark' room, we propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates the human motion prediction task as a multi-condition joint inference problem based on the given historical 3D body motion and the current 3D scene contexts. Specifically, instead of directly modeling joint distribution over the raw motion sequences, MCLD performs a conditional diffusion process within the latent embedding space, characterizing the cross-modal mapping from the past body movement and current scene context condition embeddings to the future human motion embedding. Extensive experiments on large-scale human motion prediction datasets demonstrate that our MCLD achieves significant improvements over the state-of-the-art methods on both realistic and diverse predictions.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# FocSAM: セグメンテーションにおけるフォーカスされたオブジェクトを深く掘り下げる FocSAM: Delving Deeply into Focused Objects in Segmenting Anything ( http://arxiv.org/abs/2405.18706v1 ) ライセンス: Link先を確認	You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji,	(参考訳) Segment Anything Model (SAM)はセグメンテーションモデルの注目すべきマイルストーンであり、その堅牢なゼロショット機能と多様なプロンプトを扱う能力によって強調されている。 SAMは、インタラクティブセグメンテーションを、大きなエンコーダによるイメージ前処理と、軽量デコーダによるインタラクティブ推論に分離し、効率的なリアルタイムパフォーマンスを保証するパイプラインに従う。しかしSAMは、このパイプライン上での挑戦的なサンプルの安定性の問題に直面している。これらの問題は2つの主な要因から生じる。まず、画像前処理は、画像レベルのズームイン戦略を用いてSAMを動的に無効にし、インタラクション中にターゲットオブジェクトに再フォーカスする。第二に、軽量デコーダは画像埋め込みと対話的な情報を十分に統合するのに苦労する。これら2つの制限に対処するため、我々は2つの重要な側面に基づいてパイプラインを再設計したFocSAMを提案する。まず,Dwin-MSA(Dynamic Window Multi-head Self-Attention)を提案する。 Dwin-MSAは対象オブジェクトの周囲の注意計算をローカライズし、最小の計算オーバーヘッドでオブジェクト関連の埋め込みを強化する。次に,Pixel-wise Dynamic ReLU (P-DyReLU) を提案する。実験的に、FocSAMはSAMのインタラクティブセグメンテーション性能を向上し、既存の最先端の手法をセグメンテーション品質に適合させ、CPU上での推論時間の5.6%しか必要としない。 The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, SAM faces stability issues in challenging samples upon this pipeline. These issues arise from two main factors. Firstly, the image preprocessing disables SAM from dynamically using image-level zoom-in strategies to refocus on the target object during interaction. Secondly, the lightweight decoder struggles to sufficiently integrate interactive information with image embeddings. To address these two limitations, we propose FocSAM with a pipeline redesigned on two pivotal aspects. First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object. Dwin-MSA localizes attention computations around the target object, enhancing object-related embeddings with minimal computational overhead. Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks that have significant impacts on the overall segmentation results. Experimentally, FocSAM augments SAM's interactive segmentation performance to match the existing state-of-the-art method in segmentation quality, requiring only about 5.6% of this method's inference time on CPUs.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# ベクトルエッジコンピューティングにおける適応的・並列的フェデレーション学習 Adaptive and Parallel Split Federated Learning in Vehicular Edge Computing ( http://arxiv.org/abs/2405.18707v1 ) ライセンス: Link先を確認	Xianke Qiang, Zheng Chang, Yun Hu, Lei Liu, Timo Hamalainen,	(参考訳) 車両エッジインテリジェンス(VEI)は、車両エッジコンピューティング(VEC)システムに人工知能(AI)を収容することで、将来のインテリジェントトランスポートシステムを実現するための有望なパラダイムである。フェデレーテッド・ラーニング(FL)は、VEIにおける車両データのプライバシを保護しつつ、局所的に協調的なモデルトレーニングとアグリゲーションを促進する基本的な技術の1つである。しかし、従来のFLは、車両の不均一性に適応し、資源に制約のある車両で大規模なモデルを訓練し、重量プライバシーの漏洩をモデル化する際の課題に直面している。一方、スプリットラーニング(SL)は、モデルワイトリークのリスクを軽減し、車両上でのトレーニング負荷を解放する、有望な協調学習フレームワークとして提案されている。 SLは、モデル全体を車両側モデルとEC側モデルに分割することで、車両とエッジクラウド(EC)の間のモデルを順次訓練する。本研究では、SLとFLの利点を組み合わせて、ベクトルエッジコンピューティング(ASFV)のための適応分割フェデレート学習スキームを開発する。 ASFVスキームはモデルを適応的に分割し、移動体選択と資源配分を考慮したトレーニングプロセスを並列化する。非独立で同一の分散データを用いて行った広範囲なシミュレーションにより、提案手法は既存のベンチマークと比較してトレーニングの遅延を著しく低減し、ネットワークのダイナミクスや車両の移動性に適応することを示した。 Vehicular edge intelligence (VEI) is a promising paradigm for enabling future intelligent transportation systems by accommodating artificial intelligence (AI) at the vehicular edge computing (VEC) system. Federated learning (FL) stands as one of the fundamental technologies facilitating collaborative model training locally and aggregation, while safeguarding the privacy of vehicle data in VEI. However, traditional FL faces challenges in adapting to vehicle heterogeneity, training large models on resource-constrained vehicles, and remaining susceptible to model weight privacy leakage. Meanwhile, split learning (SL) is proposed as a promising collaborative learning framework which can mitigate the risk of model wights leakage, and release the training workload on vehicles. SL sequentially trains a model between a vehicle and an edge cloud (EC) by dividing the entire model into a vehicle-side model and an EC-side model at a given cut layer. In this work, we combine the advantages of SL and FL to develop an Adaptive Split Federated Learning scheme for Vehicular Edge Computing (ASFV). The ASFV scheme adaptively splits the model and parallelizes the training process, taking into account mobile vehicle selection and resource allocation. Our extensive simulations, conducted on non-independent and identically distributed data, demonstrate that the proposed ASFV solution significantly reduces training latency compared to existing benchmarks, while adapting to network dynamics and vehicles' mobility.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# リコメンダシステムのための特徴インタラクション選択のための認知的進化学習 Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems ( http://arxiv.org/abs/2405.18708v1 ) ライセンス: Link先を確認	Runlong Yu, Qixiang Shao, Qi Liu, Huan Liu, Enhong Chen,	(参考訳) 機能相互作用の選択は、商業レコメンデータシステムにおける基本的な問題である。ほとんどのアプローチは、専門家の指導の下で、同じ事前定義された操作によって、すべての特徴と相互作用を均等に列挙します。 1) アーキテクチャがタスクやデータに不適応であるため、モデルの学習能力を保証することはできない; (2) 機能やインタラクションが不要なノイズをもたらし、トレーニングプロセスが複雑になる可能性がある。本稿では,タスクガイダンスに基づく適切な操作,特徴,インタラクションを選択するために,モデルを適応的に進化させることを目的とする。自然生物の進化と機能に触発されて,認知能力は多様な環境下で反応し,生き残ることができる生物の特性を指す,新しい『textsl{Cognitive EvoLutionary Learning』(CELL)フレームワークを提案する。これは3つの段階、すなわちDNA探索、ゲノム探索、モデル機能から構成される。特に、モデルとタスクの関係を生物と自然環境の関係とみなすならば、機能対の相互作用は二重鎖DNAに類似し、関連する特徴と相互作用はゲノムに類似することができる。この線に沿って、自然選択のための生物の生存率をシミュレートするために、操作、特徴、相互作用に関するモデルの適合度を診断する。 Cellはさまざまなタスクやデータに対して,さまざまなモデルに適応的に進化し,実践者がオフザシェルフモデルにアクセスできることを示す。 4つの実世界のデータセットに対する大規模な実験は、細胞が最先端のベースラインを大幅に上回っていることを示している。また、我々は、セルが特徴対の予め定義された相互作用パターンを一貫して発見できることを確認するために、合成実験を行う。 Feature interaction selection is a fundamental problem in commercial recommender systems. Most approaches equally enumerate all features and interactions by the same pre-defined operation under expert guidance. Their recommendation is unsatisfactory sometimes due to the following issues: (1)~They cannot ensure the learning abilities of models because their architectures are poorly adaptable to tasks and data; (2)~Useless features and interactions can bring unnecessary noise and complicate the training process. In this paper, we aim to adaptively evolve the model to select appropriate operations, features, and interactions under task guidance. Inspired by the evolution and functioning of natural organisms, we propose a novel \textsl{Cognitive EvoLutionary Learning (CELL)} framework, where cognitive ability refers to a property of organisms that allows them to react and survive in diverse environments. It consists of three stages, i.e., DNA search, genome search, and model functioning. Specifically, if we regard the relationship between models and tasks as the relationship between organisms and natural environments, interactions of feature pairs can be analogous to double-stranded DNA, of which relevant features and interactions can be analogous to genomes. Along this line, we diagnose the fitness of the model on operations, features, and interactions to simulate the survival rates of organisms for natural selection. We show that CELL can adaptively evolve into different models for different tasks and data, which enables practitioners to access off-the-shelf models. Extensive experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines. Also, we conduct synthetic experiments to ascertain that CELL can consistently discover the pre-defined interaction patterns for feature pairs.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# FP8とリターン:LLMトレーニングの安定性に及ぼす高精度化の効果の定量化 To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability ( http://arxiv.org/abs/2405.18710v1 ) ライセンス: Link先を確認	Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee,	(参考訳) 大規模言語モデル(LLM)事前学習に伴う膨大な計算コストは、プロセスの高速化のために、精度の低い浮動小数点表現に大きな関心を惹き付けている。その結果、BrainFloat16(BF16)の精度は、近年のアクセラレーターにハードウェアサポートが組み込まれているLCMトレーニングのデファクトスタンダードとなった。 FP8が最近導入された最新のプロセッサでは、この傾向はさらに進んでいる。しかしながら、BF16より安定でないことが判明したFP16の以前の経験は、FP8がFP16よりも少ないビットでも、LCMトレーニングに費用対効果があるかどうかという懸念を提起している。我々は、コスト効率を高めるために、高精度トレーニングスキームは、高精度トレーニングスキームと同等のトレーニング安定性とハイパーパラメータ感度を持つ必要があると論じる。しかし、現在利用可能なFP8訓練方法は、経済的代替品としての使用を可能にするには不十分であることがわかった。これにより、ランダムシード間の堅牢性や学習率の観点から、低精度LDMトレーニングの安定性を検討することができる。そこで本研究では,自動回帰言語モデルにおける損失ランドスケープのシャープネスを定量化するための新しい評価手法と指標を提案する。浮動小数点表現におけるインクリメンタルビット削減をシミュレートすることにより,表現力とトレーニング安定性の関係を解析し,今後の研究を支援する。 The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent accelerators. This trend has gone even further in the latest processors, where FP8 has recently been introduced. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8, with even fewer bits than FP16, can be a cost-effective option for LLM training. We argue that reduced-precision training schemes must have similar training stability and hyperparameter sensitivities to their higher-precision counterparts in order to be cost-effective. However, we find that currently available methods for FP8 training are not robust enough to allow their use as economical replacements. This prompts us to investigate the stability of reduced-precision LLM training in terms of robustness across random seeds and learning rates. To this end, we propose new evaluation techniques and a new metric for quantifying loss landscape sharpness in autoregressive language models. By simulating incremental bit reductions in floating-point representations, we analyze the relationship between representational power and training stability with the intent of aiding future research into the field.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# 内部整合性を持つ言語モデルにおける推論の校正 Calibrating Reasoning in Language Models with Internal Consistency ( http://arxiv.org/abs/2405.18711v1 ) ライセンス: Link先を確認	Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li,	(参考訳) 大型言語モデル (LLM) は様々な推論タスクにおいて印象的な能力を示しており、例えばチェーン・オブ・シント (CoT) のような技術によって助けられ、言語化された推論が引き起こされる。しかし、LSMは明らかな誤りや矛盾のあるテキストを生成することが多く、それらが堅牢に処理し、生成した有理性を利用する能力に疑問を呈している。本研究では, 内部表現のレンズによるLLMにおけるCoT推論について検討し, これらの表現が生成した有理数にどのように影響するかに着目した。予備分析の結果、生成した有理値が解答精度を向上させる一方で、中間層におけるモデルの内部表現と最終層における表現との間に矛盾が生じ、それらの推論プロセスの信頼性を損なう可能性が示唆された。そこで本研究では,中間層から復号された遅延予測の一致を検証し,モデルの信頼性の尺度として内部整合性を提案する。異なるモデルとデータセットにわたる広範な実験研究により、内部の一貫性が正しい推論経路と間違った推論経路を効果的に区別することを示した。そこで本研究では,内部整合性の高い高重み付け推論経路によるCoT推論の校正手法を提案する。さらなる分析により、レイヤ間の注意パターンとフィードフォワードモジュールが明らかになり、内部の不整合の出現に関する洞察が得られる。本研究は, LLMの自己評価に内部表現を用いることの可能性を示すものである。 Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate CoT reasoning in LLMs through the lens of internal representations, focusing on how these representations are influenced by generated rationales. Our preliminary analysis reveals that while generated rationales improve answer accuracy, inconsistencies emerge between the model's internal representations in middle layers and those in final layers, potentially undermining the reliability of their reasoning processes. To address this, we propose internal consistency as a measure of the model's confidence by examining the agreement of latent predictions decoded from intermediate layers. Extensive empirical studies across different models and datasets demonstrate that internal consistency effectively distinguishes between correct and incorrect reasoning paths. Motivated by this, we propose a new approach to calibrate CoT reasoning by up-weighting reasoning paths with high internal consistency, resulting in a significant boost in reasoning performance. Further analysis uncovers distinct patterns in attention and feed-forward modules across layers, providing insights into the emergence of internal inconsistency. In summary, our results demonstrate the potential of using internal representations for self-evaluation of LLMs.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# NeRF on-the-go: Exploiting Uncertainity for Distractor-free NeRFs in the Wild NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild ( http://arxiv.org/abs/2405.18715v1 ) ライセンス: Link先を確認	Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng,	(参考訳) ニューラルネットワーク(Neural Radiance Fields、NeRF)は、静的なシーンのマルチビュー画像からフォトリアリスティックなビューを合成することに成功したが、動いた物体、影、照明変更などの邪魔をする動的な現実世界環境では課題に直面している。既存の手法は、制御された環境と低い閉塞率を管理するが、特に高い閉塞シナリオ下では、レンダリング品質が不足する。本稿では,手軽にキャプチャされた画像列のみから,複雑なシーンにおける新規ビューのロバストな合成を可能にする,シンプルで効果的なNeRF On-the-goを提案する。不確実性に陥りつつも,本手法は捕集に支配的であったとしても,効率的に散逸を除去するだけでなく,顕著に高速な収束速度を実現する。様々な場面における総合的な実験を通して,本手法は最先端技術よりも顕著に改善されていることを示す。この進歩は、多様な動的現実世界のアプリケーションにおいて、NeRFの新しい道を開く。 Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios. In this paper, we introduce NeRF On-the-go, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences. Delving into uncertainty, our method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed. Through comprehensive experiments on various scenes, our method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# SketchDeco: カラーでB&Wスケッチをデコレート SketchDeco: Decorating B&W Sketches with Colour ( http://arxiv.org/abs/2405.18716v1 ) ライセンス: Link先を確認	Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song,	(参考訳) 本稿では,色彩の普遍的な幼児期活動とそのデザイン・ストーリーボードへの応用から着想を得た,色彩のスケッチ化のための新しいアプローチを提案する。精度と利便性のバランスを考慮し,直感的なユーザコントロールを実現するために,地域マスクとカラーパレットを活用し,手動のカラーアサインやテキストプロンプトの制限をクリアする。 ControlNetとステージ生成を戦略的に組み合わせ、安定拡散v1.5を導入し、BLIP-2テキストプロンプトを活用することにより、忠実な画像生成とユーザ指向のカラー化を容易にする。局所的およびグローバルな一貫性の課題に対処するため,我々は,インバージョンスキーム,ガイド付きサンプリング,スケーリング係数を持つ自己保持機構などの発明的なソリューションを採用している。このツールは、高速でトレーニングのないだけでなく、消費者向けのNvidia RTX 4090 Super GPUとも互換性がある。 Project Page: \url{https://chaitron.github.io/SketchDeco/} This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the limitations of textual prompts. By strategically combining ControlNet and staged generation, incorporating Stable Diffusion v1.5, and leveraging BLIP-2 text prompts, our methodology facilitates faithful image generation and user-directed colourisation. Addressing challenges of local and global consistency, we employ inventive solutions such as an inversion scheme, guided sampling, and a self-attention mechanism with a scaling factor. The resulting tool is not only fast and training-free but also compatible with consumer-grade Nvidia RTX 4090 Super GPUs, making it a valuable asset for both creative professionals and enthusiasts in various fields. Project Page: \url{https://chaitron.github.io/SketchDeco/}	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# ベイジアンパースケーションによる効率的なモデル非依存アライメント Efficient Model-agnostic Alignment via Bayesian Persuasion ( http://arxiv.org/abs/2405.18718v1 ) ライセンス: Link先を確認	Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang,	(参考訳) 近年の大規模言語モデル(LLM)の進歩により,LLMと人間の意図との合意を維持するための効果的な手法としてアライメントが出現している。現在の手法は、主に監視ファインチューニング(SFT)や人間からのフィードバックからの強化学習(RLHF)を通じて直接訓練される。本稿では,より小さなモデルを用いてブラックボックスの大規模モデルをコーディネートする効率的な手法について検討し,モデルに依存しない軽量ベイズパーステンションアライメントフレームワークを提案する。我々はこの問題を,小型モデルの観点からの信号処理戦略の最適化として定式化する。説得プロセスでは、小さなモデル(アドバイザ)が情報項目(すなわち状態)を観察し、大きなモデル(Receiver)を説得して、改善された応答を引き出す。その後、受信者は、入力、アドバイザからの信号、および情報項目に関する更新された信念に基づいて応答を生成する。筆者らは,本フレームワークを用いてトレーニングを行うことで,様々なタスクにおいて,各種受信者の性能を大幅に向上させることができることを示した。理論的には,我々の説得の枠組みを解析し,助言者の後悔に上限を与え,最適なシグナル伝達戦略を学習する上での有効性を確認した。実験の結果, GPT-2は様々なモデルの性能を著しく向上し, 数学的推論能力は16.1%, コード生成能力は13.7%向上した。ベイズパーステンションの観点からアライメントフレームワークを再考するための最初のステップを、私たちの作業が提供してくれることを願っています。 With recent advancements in large language models (LLMs), alignment has emerged as an effective technique for keeping LLMs consensus with human intent. Current methods primarily involve direct training through Supervised Fine-tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), both of which require substantial computational resources and extensive ground truth data. This paper explores an efficient method for aligning black-box large models using smaller models, introducing a model-agnostic and lightweight Bayesian Persuasion Alignment framework. We formalize this problem as an optimization of the signaling strategy from the small model's perspective. In the persuasion process, the small model (Advisor) observes the information item (i.e., state) and persuades large models (Receiver) to elicit improved responses. The Receiver then generates a response based on the input, the signal from the Advisor, and its updated belief about the information item. Through training using our framework, we demonstrate that the Advisor can significantly enhance the performance of various Receivers across a range of tasks. We theoretically analyze our persuasion framework and provide an upper bound on the Advisor's regret, confirming its effectiveness in learning the optimal signaling strategy. Our Empirical results demonstrates that GPT-2 can significantly improve the performance of various models, achieving an average enhancement of 16.1% in mathematical reasoning ability and 13.7% in code generation. We hope our work can provide an initial step toward rethinking the alignment framework from the Bayesian Persuasion perspective.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# コンテキスト位置エンコーディング: 重要なものを数えることを学ぶ Contextual Position Encoding: Learning to Count What's Important ( http://arxiv.org/abs/2405.18719v1 ) ライセンス: Link先を確認	Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar,	(参考訳) 注意機構はLarge Language Models (LLM) の重要なコンポーネントであり、シーケンス内のトークン同士の対話を可能にするが、順序不変である。 PE(Incorporating position encoding)は、i-thトークンへの出席など、位置ごとの対応を可能にする。しかし、現在のPE法ではトークンカウントを用いて位置を導出しているため、i-th文への出席など、より高度な抽象レベルに一般化することはできない。本論文では,モデルによって決定される特定のトークンにのみ位置を増設することにより,コンテキスト上で位置を条件付けることのできる新しい位置符号化手法であるコンテキスト位置符号化(CoPE)を提案する。これにより、$i$-thの特定の単語、名詞、文への出席など、より一般的な位置アドレス付けが可能になる。一般的な位置埋め込みがフェールした場合,CoPEは選択コピー,カウント,フリップフロップといったタスクを解くことができ,言語モデリングやコーディングタスクの難易度を改善することができることを示す。 The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# 視覚言語ナビゲーションのための大規模モデルによる修正可能なランドマーク発見 Correctable Landmark Discovery via Large Models for Vision-Language Navigation ( http://arxiv.org/abs/2405.18721v1 ) ライセンス: Link先を確認	Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang,	(参考訳) Vision-Language Navigation (VLN) は、ターゲット位置に到達するために、エージェントが言語命令に従う必要がある。ナビゲーションを成功させる重要な要因は、指導で暗示されるランドマークを様々な視覚的観察と整合させることである。しかしながら、以前のVLNエージェントは、限られたナビゲーションデータから学習し、十分なオープンワールドアライメント知識がないため、特に探索されていないシーンでは正確なモダリティアライメントを実行できない。本研究では,Currectable LaNdmark DiScOvery と呼ばれる新しい VLN パラダイムをLarge ModEls (CONSOLE) 経由で提案する。 CONSOLEでは、2つの大きなモデルChatGPTとCLIPに基づく新しい修正可能なランドマーク発見スキームを導入することで、VLNをオープンワールドシーケンシャルなランドマーク発見問題として捉えた。具体的には、ChatGPTを使用して、豊かなオープンワールドのランドマークコモンセンスを提供し、これらのコモンセンスに基づいてCLIP駆動のランドマーク発見を行う。視覚的制約の欠如による前者の騒音を軽減するため,学習可能な共起スコアリングモジュールを導入し,実際の観測結果に基づいて各共起の重要度を補正し,正確なランドマーク発見を行う。我々はさらに、異なるVLNエージェントとエレガントな組み合わせのための観察強化戦略を設計し、修正されたランドマーク特徴を用いて行動決定のための観察機能を得る。複数の人気のあるVLNベンチマーク(R2R、REVERIE、R4R、RxR)の大規模な実験結果から、強力なベースラインよりもCONSOLEの顕著な優位性が確認された。特に,我々のCONSOLEは,目に見えないシナリオにおいて,R2RとR4Rの最先端結果を確立している。コードはhttps://github.com/expectorlin/CONSOLEで入手できる。 Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. Especially, our CONSOLE establishes the new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# コンフォーマルデプレッション予測 Conformal Depression Prediction ( http://arxiv.org/abs/2405.18723v1 ) ライセンス: Link先を確認	Yonghong Li, Shan Qu, Xiuzhuang Zhou,	(参考訳) 深層学習に基づく既存の抑うつ認識手法は将来性を示すが、それらの実践的応用は信頼性の欠如によって妨げられ、深層モデルはしばしば「textit{black box}」モデルとして展開されるため、モデル予測の信頼性については不透明である。うつ病認識のようなリスクの高い臨床応用では、意思決定において不確実性定量化が不可欠である。本稿では,共形予測(CP)に基づく不確実性定量化手法である共形抑うつ予測(CDP)を導入する。 CDPはプラグ・アンド・プレイのモジュールで、モデルの再トレーニングも、うつ病データ分布の仮定も必要としない。 CDPは、入力毎の性能保証ではなく、全ての入力に対して平均的な性能保証を提供するため、近似条件付き共形予測法であるCDP-ACCを提案する。 CDP-ACCは、まず、近傍緩和により予測分布を推定し、次に、ネストシーケンスを構成することで、各入力に対してより厳密な予測間隔を提供する等角スコア関数を導入する。 AVEC 2013 と AVEC 2014 データセットにおける不確実性定量化のうつ病認識への応用と CDP と CDP-ACC の有効性と優位性を実証的に実証した。 While existing depression recognition methods based on deep learning show promise, their practical application is hindered by the lack of trustworthiness, as these deep models are often deployed as \textit{black box} models, leaving us uncertain about the confidence of the model predictions. For high-risk clinical applications like depression recognition, uncertainty quantification is essential in decision-making. In this paper, we introduce conformal depression prediction (CDP), a depression recognition method with uncertainty quantification based on conformal prediction (CP), giving valid confidence intervals with theoretical coverage guarantees for the model predictions. CDP is a plug-and-play module that requires neither model retraining nor an assumption about the depression data distribution. As CDP provides only an average performance guarantee across all inputs rather than per-input performance guarantee, we propose CDP-ACC, an improved conformal prediction with approximate conditional coverage. CDP-ACC firstly estimates the prediction distribution through neighborhood relaxation, and then introduces a conformal score function by constructing nested sequences, so as to provide tighter prediction interval for each specific input. We empirically demonstrate the application of uncertainty quantification in depression recognition, and the effectiveness and superiority of CDP and CDP-ACC on the AVEC 2013 and AVEC 2014 datasets	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# 多ラベル特性予測のための階層型プロンプトを用いた微分分子表現の適応 Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction ( http://arxiv.org/abs/2405.18724v1 ) ライセンス: Link先を確認	Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu, Wen Zhang,	(参考訳) 分子特性の正確な予測は、薬物発見の分野において重要である。しかし、既存の手法では、実世界の分子が通常複数の特性ラベルを持つという事実を完全には考慮していない。したがって、分子表現学習モデルは、タスク間の多粒性相関情報を考慮した微分分子表現を生成する必要がある。この目的のために,階層型プロンプト分子表現学習フレームワーク (HiPM) を導入し,タスク認識プロンプトを通じて分子表現におけるタスクの差分表現を強化し,ラベル間の共有情報を用いて異なるタスク間の負の伝達を緩和する。 HiPMは主に、分子表現エンコーダ(MRE)とタスク・アウェア・プロンプタ(TAP)の2つのコアコンポーネントで構成されている。 MREは、原子レベルとモチーフレベルの分子的特徴を捉えるために、階層的メッセージパッシングネットワークアーキテクチャを使用し、TAPは、集約的階層的クラスタリングを使用して、タスクの親和性と特異性を反映したプロンプトツリーを構築し、モデルがマルチラベルプロパティ予測の複雑さを効果的に処理できるようにする。大規模な実験により、HiPMは様々なマルチラベルデータセットにまたがって最先端のパフォーマンスを達成し、マルチラベル分子表現学習の新しい視点を提供する。 Accurate prediction of molecular properties is critical in the field of drug discovery. However, existing methods do not fully consider the fact that molecules in the real world usually possess multiple property labels, and complex high-order relationships may exist among these labels. Therefore, molecular representation learning models should generate differential molecular representations that consider multi-granularity correlation information among tasks. To this end, our research introduces a Hierarchical Prompted Molecular Representation Learning Framework (HiPM), which enhances the differential expression of tasks in molecular representations through task-aware prompts, and utilizes shared information among labels to mitigate negative transfer between different tasks. HiPM primarily consists of two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). The MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atomic and motif levels, while the TAP uses agglomerative hierarchical clustering to build a prompt tree that reflects the affinity and distinctiveness of tasks, enabling the model to effectively handle the complexity of multi-label property predictions. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a new perspective on multi-label molecular representation learning.	翻訳日:2024-05-30 21:04:06 公開日:2024-05-29
# 根拠のないモバイルクラウドセンシングデータの質を高めることは可能か? Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth? ( http://arxiv.org/abs/2405.18725v1 ) ライセンス: Link先を確認	Jiajie Li, Bo Gu, Shimin Gong, Zhou Su, Mohsen Guizani,	(参考訳) モバイル・クラウドセンシング(MCS)は、様々な領域で顕著なトレンドとなっている。しかし、モバイルユーザ(MU)が送信したセンシングデータの質を保証することは、依然として複雑で困難な問題である。この課題に対処するためには、低品質なセンシングデータを検出し、MCSシステムの正常な操作を妨害する可能性のある悪意のあるMUを特定するための高度な手法が必要である。そこで本稿では,低品質データを高品質データからセンシングタスクで分離可能なPRBTD(Predict- and reputation-based truth discovery)フレームワークを提案する。まず、相関に着目した時空間変換ネットワークを用いて、入力センシングデータの基礎的真実を予測する。そして、予測結果に基づいて特徴としてデータの検知誤差を抽出し、データ間の影響を計算する。最後に、評価に基づく真理探索(TD)モジュールを設計し、低品質なデータとその意味を識別する。 MUが送信したセンシングデータを考えると、PRBTDは重いノイズでデータを排除し、悪意のあるMUを高精度に識別することができる。 PRBTDは同定精度とデータ品質向上の点で既存の手法よりも優れていた。 Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this article proposes a prediction- and reputation-based truth discovery (PRBTD) framework, which can separate low-quality data from high-quality data in sensing tasks. First, we apply a correlation-focused spatial-temporal transformer network to predict the ground truth of the input sensing data. Then, we extract the sensing errors of the data as features based on the prediction results to calculate the implications among the data. Finally, we design a reputation-based truth discovery (TD) module for identifying low-quality data with their implications. Given sensing data submitted by MUs, PRBTD can eliminate the data with heavy noise and identify malicious MUs with high accuracy. Extensive experimental results demonstrate that PRBTD outperforms the existing methods in terms of identification accuracy and data quality enhancement.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# 聴覚処理経路の逆転:fMRIによる粗大な音像再構成 Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI ( http://arxiv.org/abs/2405.18726v1 ) ライセンス: Link先を確認	Che Liu, Changde Du, Xiaoyu Chen, Huiguang He,	(参考訳) 低レベルの音響特徴から高レベルの意味理解に音を変換する人間の聴覚システムの階層的処理からインスピレーションを得て,新しい粗大な音声再構成手法を提案する。非侵襲的機能的磁気共鳴画像(fMRI)データを活用することで,聴覚処理の逆経路を再現する。 CLAPを用いてfMRIデータを低次元のセマンティック空間に粗くデコードし、続いてセマンティック特徴によって導かれる高次元AudioMAE潜在空間に細粒度デコードする。これらの微細な神経機能は、潜在拡散モデル(LDM)によるオーディオ再構成の条件として機能する。 Brain2Sound、Brain2Music、Brain2Speechの3つの公開fMRIデータセットに対する検証は、FD、FAD、KLといったメトリクスで最先端のパフォーマンスを示す、スタンドアローンの微細なアプローチよりも粗大な復号法の方が優れていることを示す。さらに,復号化時に意味的プロンプトを用いることで,意味的特徴が最適でない場合に,再構成音声の品質を向上させる。多様な刺激にまたがるモデルの多角性を示すことは、脳から音声への普遍的な枠組みとしての可能性を浮き彫りにしている。本研究は,人間の聴覚システムの理解に寄与し,神経復号法と音声再構成法の境界を推し進める。 Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utilize CLAP to decode fMRI data coarsely into a low-dimensional semantic space, followed by a fine-grained decoding into the high-dimensional AudioMAE latent space guided by semantic features. These fine-grained neural features serve as conditions for audio reconstruction through a Latent Diffusion Model (LDM). Validation on three public fMRI datasets-Brain2Sound, Brain2Music, and Brain2Speech-underscores the superiority of our coarse-to-fine decoding method over stand-alone fine-grained approaches, showcasing state-of-the-art performance in metrics like FD, FAD, and KL. Moreover, by employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal. The demonstrated versatility of our model across diverse stimuli highlights its potential as a universal brain-to-audio framework. This research contributes to the comprehension of the human auditory system, pushing boundaries in neural decoding and audio reconstruction methodologies.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# CtrlA: プローブ誘導制御による適応型検索拡張生成 CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control ( http://arxiv.org/abs/2405.18727v1 ) ライセンス: Link先を確認	Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu,	(参考訳) 大規模言語モデル(LLM)の幻覚を、検索された外部知識で緩和するための有望な解決策として、検索拡張世代(RAG)が出現している。 Adaptive RAGは、検索の必要性を動的に評価し、外部知識と内部知識のバランスをとることによって、このアプローチを強化する。しかし,既存の適応RAG法は,LLMの言語的,あるいは確率的フィードバックに頼って,要求に基づく検索を主に実現し,慎重に構築したデータセットを直接微調整することで,信頼性の低い検索要求決定,高コスト化,および準最適応答生成を実現している。 CtrlAと呼ばれる効果的なプローブ誘導適応RAGフレームワークを導入することで、LCMの内部状態を探索し、そのような問題を緩和する試みを初めて提示する。具体的には、CtrlAは、LLMの内部状態を監視し、信頼度を評価するための信頼プローブと、LLMの表現を操作することによってLCMの振舞いを調節する。実験により、CtrlAは様々なタスクにおいて既存の適応RAG法よりも優れていることが示され、正直な制御によりLLMを効果的に誠実にすることができ、信頼性監視が検索トリガの有望な指標であることが証明された。私たちのコードはhttps://github.com/HSLiu-Initial/CtrlA.git.comで公開されています。 Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying on superficially verbalize-based or probability-based feedback of LLMs, or directly fine-tuning LLMs via carefully crafted datasets, resulting in unreliable retrieval necessity decisions, heavy extra costs, and sub-optimal response generation. We present the first attempts to delve into the internal states of LLMs to mitigate such issues by introducing an effective probe-guided adaptive RAG framework, termed CtrlA. Specifically, CtrlA employs an honesty probe to regulate the LLM's behavior by manipulating its representations for increased honesty, and a confidence probe to monitor the internal states of LLM and assess confidence levels, determining the retrieval necessity during generation. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks, the honesty control can effectively make LLMs more honest and confidence monitoring is proven to be a promising indicator of retrieval trigger. Our codes are available at https://github.com/HSLiu-Initial/CtrlA.git.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# オフライン強化学習のための優先アクション最適化拡散法 Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning ( http://arxiv.org/abs/2405.18729v1 ) ライセンス: Link先を確認	Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He,	(参考訳) オフライン強化学習(RL)は、以前に収集したデータセットから最適なポリシーを学ぶことを目的としている。近年、その強力な表現能力により、拡散モデルはオフラインのRL問題に対するポリシーモデルとして大きな可能性を示している。しかし、拡散ポリシーに基づく以前のオフラインRLアルゴリズムは、一般的にポリシーを改善するために重み付き回帰を採用する。このアプローチは、収集されたアクションのみを使用してポリシーを最適化し、Q値に敏感であり、さらなるパフォーマンス向上の可能性を制限する。そこで本研究では,オフラインRLのための新たな優先作用最適化拡散ポリシーを提案する。特に、表現的条件拡散モデルを用いて、行動ポリシーの多様な分布を表現する。一方、拡散モデルに基づいて、同じ行動分布内にある好ましい行動が、評論家関数を介して自動的に生成される。さらに, 騒音優先行動に適応し, 安定した訓練を行うことにより, 政策改善を図るために, 雑音優先選択最適化を設計する。特にKitchenやAntMazeのような粗末な報酬タスクにおいて,従来のオフラインRL手法と比較して,提案手法が競争力や優れた性能を提供することを示す。さらに,提案手法の有効性を実証的に証明した。 Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach optimizes the policy only using the collected actions and is sensitive to Q-values, which limits the potential for further performance enhancement. To this end, we propose a novel preferred-action-optimized diffusion policy for offline RL. In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy. Meanwhile, based on the diffusion model, preferred actions within the same behavior distribution are automatically generated through the critic function. Moreover, an anti-noise preference optimization is designed to achieve policy improvement by using the preferred actions, which can adapt to noise-preferred actions for stable training. Extensive experiments demonstrate that the proposed method provides competitive or superior performance compared to previous state-of-the-art offline RL methods, particularly in sparse reward tasks such as Kitchen and AntMaze. Additionally, we empirically prove the effectiveness of anti-noise preference optimization.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# VBIM-Net:逆散乱問題に対する変分独立反復ネットワーク VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems ( http://arxiv.org/abs/2405.18731v1 ) ライセンス: Link先を確認	Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao,	(参考訳) 近年,逆散乱問題 (ISP) の解法として,フィールド型反復法と深層学習 (DL) 技術を統合する可能性が高まっている。本稿では,VBIM-Netという新しい変動型ボーン・イテレーティブ・ネットワークを提案する。提案するVBIM-Netは,複数のサブネットワーク層による変動ボルン反復法(VBIM)における全電界とコントラストの交互更新をエミュレートする。我々は,各サブネットワークにコントラスト変動の計算を組み込み,散乱体残差を近似コントラスト変動に変換し,U-Netで拡張することにより,既存のアプローチのように一致した測定寸法とグリッド解像度の要求を回避する。各層の出力の総体とコントラストは、サブネットの変数の物理的解釈性を保証するVBIM-Netの損失関数に制御される。さらに、モデルの安定性を高めるために、余分なノイズを伴うトレーニングスキームを設計する。提案したVBIM-Netのインバージョン品質,一般化能力,ロバスト性を検証した。この研究は、効率的なフィールド型DLスキームの設計に新たなインスピレーションを与えるかもしれない。 Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating updates of the total electric field and the contrast in the variational Born iterative method (VBIM) by multiple layers of subnetworks. We embed the calculation of the contrast variation into each of the subnetworks, converting the scattered field residual into an approximate contrast variation and then enhancing it by a U-Net, thus avoiding the requirement of matched measurement dimension and grid resolution as in existing approaches. The total field and contrast of each layer's output is supervised in the loss function of VBIM-Net, which guarantees the physical interpretability of variables of the subnetworks. In addition, we design a training scheme with extra noise to enhance the model's stability. Extensive numerical results on synthetic and experimental data both verify the inversion quality, generalization ability, and robustness of the proposed VBIM-Net. This work may provide some new inspiration for the design of efficient field-type DL schemes.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# Gemini & Physical World: 大規模言語モデルはマルチモーダルソーシャルメディアポストから地震の震度を推定できる Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts ( http://arxiv.org/abs/2405.18732v1 ) ライセンス: Link先を確認	S. Mostafa Mousavi, Marc Stogaitis, Tajinder Gadh, Richard M Allen, Alexei Barski, Robert Bosch, Patrick Robertson, Nivetha Thiruverahan, Youngmin Cho,	(参考訳) 本稿では,ソーシャルメディアデータとCCTV映像を用いた地盤揺らぎ強度の推定手法を提案する。マルチモーダル言語モデルであるGemini Pro(Reid et al 2024)モデルを用いて、生成AIと自然言語処理を利用した非構造化データから関連情報を抽出できることを実証する。モデル出力は、MMI(Modified Mercalli Intensity)値の形で、独立した観測データとよく一致している。さらに,ゲミニは,高度な視覚的・聴覚的理解能力の他に,訓練中に獲得したと思われる地震の大きさ,距離,MMI強度の一般的関係の理解の簡易化など,さらなる知識の源泉を生かしていると考えられる。これらの発見は、ジェミニの物理世界に対する一般的な理解の範囲とその現象に関する興味深い疑問を提起する。ゲミニが確立された科学的知識と整合した結果を生成する能力は、ジェミニのようなLLMが地震のような複雑な物理現象の理解を深める可能性を強調している。より具体的には、この研究の結果は、ジェミニのようなLLMが市民の地震学に革命をもたらす可能性を強調し、目撃者によるクラウドソースされたデータの迅速かつ効果的で柔軟な分析を可能にし、地震の影響を評価し、危機的状況認識を提供する。この手法は, 早期警戒システムの改善, 災害対応, 地震発生域全体の回復性向上に大きく貢献する。本研究は,震災対策のためのソーシャルメディアとAIの力を活用するための重要なステップを提供する。 This paper presents a novel approach for estimating the ground shaking intensity using social media data and CCTV footage. Employing the Gemini Pro (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. The ability of Gemini to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# 中国のチェッカーにおける効率的な学習--マルチエージェント強化学習におけるパラメータ共有の比較 Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2405.18733v1 ) ライセンス: Link先を確認	Noah Adhikari, Allen Gu,	(参考訳) 完全パラメータ共有型マルチエージェント強化学習(MARL)は,中国チェッカーの完全情報同種競争において,独立した部分共有アーキテクチャよりも優れることを示す。実験を行うため、可変サイズ6プレーヤチャイナチェッカーという新しいMARL環境を開発した。このカスタム環境はPettingZooで開発され、チェリングジャンプを含むゲームの伝統的なルールをすべてサポートしている。これは私たちの知る限りでは、真のゲームに忠実な中国のチェッカーの最初の実装です。中国のチェッカーは、その大きな分岐係数と潜在的に無限の地平線のために学ぶのが難しい。我々は、他のRLドメインの複雑なアクション空間から分岐アクション(サブムーブ)の概念を借用する。これにより、作用空間の次元が大幅に減少する。我々の観測空間はAlphaGoにインスパイアされ、情報を符号化するために多くのバイナリゲームボードを3Dアレーに積み重ねている。 PettingZoo環境、トレーニングおよび評価ロジック、分析スクリプトは、 \href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}で見ることができる。 We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive perfect-information homogenous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditional rules of the game including chaining jumps. This is, to the best of our knowledge, the first implementation of Chinese Checkers that remains faithful to the true game. Chinese Checkers is difficult to learn due to its large branching factor and potentially infinite horizons. We borrow the concept of branching actions (submoves) from complex action spaces in other RL domains, where a submove may not end a player's turn immediately. This drastically reduces the dimensionality of the action space. Our observation space is inspired by AlphaGo with many binary game boards stacked in a 3D array to encode information. The PettingZoo environment, training and evaluation logic, and analysis scripts can be found on \href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# PillarHist:ハイト・アウェア・ヒストグラムに基づく量子化対応ピラー特徴エンコーダ PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram ( http://arxiv.org/abs/2405.18734v1 ) ライセンス: Link先を確認	Sifan Zhou, Zhihang Yuan, Dawei Yang, Xubin Wen, Xing Hu, Yuguang Shi, Ziyu Zhao, Xiaobo Lu,	(参考訳) リアルタイムかつ高性能な3Dオブジェクト検出は、自律走行とロボット工学において重要な役割を果たす。最近の柱型3次元物体検出器は、コンパクトな表現と計算オーバーヘッドの低さから注目されており、搭載された配置や量子化に適している。しかし、既存の柱型検出器は、柱特徴符号化(PFE)において、その性能と量子化ポテンシャルを著しく制限する、高さ寸法に沿った情報損失と大きな数値分布差に悩まされている。上記の課題に対処するために,まず,PFE中の異なる入力情報の重要性を明らかにし,その高さ寸法を3次元検出性能向上の鍵となる要素として同定する。そこで本研究では,PillarHistという高さ対応の柱特徴エンコーダを提案する。具体的には、ピラーヒストは1つの柱内の異なる高さの点の離散分布を統計する。このシンプルで効果的な設計は、PFEの計算オーバーヘッドを大幅に減らしながら、高さの寸法に沿った情報を大幅に保存する。一方、PillarHistは、PFE入力の算術的分布を安定範囲に制限し、量子化に親しみやすいようにしている。特に、PillarHistはPFEステージ内でのみ動作してパフォーマンスを向上し、複雑な操作を導入することなく、既存の柱ベースのメソッドへのシームレスな統合を可能にする。大規模な実験は効率と性能の両面でPillarHistの有効性を示している。 Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along height dimension and large numerical distribution difference during pillar feature encoding (PFE), which severely limits their performance and quantization potential. To address above issue, we first unveil the importance of different input information during PFE and identify the height dimension as a key factor in enhancing 3D detection performance. Motivated by this observation, we propose a height-aware pillar feature encoder named PillarHist. Specifically, PillarHist statistics the discrete distribution of points at different heights within one pillar. This simple yet effective design greatly preserves the information along the height dimension while significantly reducing the computation overhead of the PFE. Meanwhile, PillarHist also constrains the arithmetic distribution of PFE input to a stable range, making it quantization-friendly. Notably, PillarHist operates exclusively within the PFE stage to enhance performance, enabling seamless integration into existing pillar-based methods without introducing complex operations. Extensive experiments show the effectiveness of PillarHist in terms of both efficiency and performance.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# WLC-Net: 頑健で高速な深層学習木葉分類法 WLC-Net: a robust and fast deep-learning wood-leaf classification method ( http://arxiv.org/abs/2405.18737v1 ) ライセンス: Link先を確認	Hanlong Li, Pei Wang, Yuhan Wu, Jing Ren, Yuhang Gao, Lingyun Zhang, Mingtai Zhang, Wenxin Chen,	(参考訳) 木葉分類は, 地中レーザースキャン(TLS)点群の解析と推定において必須かつ基本的な前提条件であり, 胸の高さ(DBH), 地中バイオマス(AGB), 木体積などの重要な測定値を含む。これに対処するため, 木葉分類ネットワーク(WLC-Net), 木葉分類ネットワーク(WLC-Net)を導入し, 木葉と葉点を木点クラウド内で区別する深層学習モデルを提案する。WLC-Netは, 線形性を固有の特徴として組み込んだ線形性の向上, 入力出力フレームワークの最適化, 遠心解析手法の最適化などにより, 3種の樹木データセットを用いて評価を行った。さらに、WLC-Netは様々なツリーポイントクラウドに強い適用性を示し、さらなる最適化を約束している。 Wood-leaf classification is an essential and fundamental prerequisite in the analysis and estimation of forest attributes from terrestrial laser scanning (TLS) point clouds,including critical measurements such as diameter at breast height(DBH),above-ground biomass(AGB),wood volume.To address this,we introduce the Wood-Leaf Classification Network(WLC-Net),a deep learning model derived from PointNet++,designed to differentiate between wood and leaf points within tree point clouds.WLC-Net enhances classification accuracy,completeness,and speed by incorporating linearity as an inherent feature,refining the input-output framework,and optimizing the centroid sampling technique.WLC-Net was trained and assessed using three distinct tree species datasets,comprising a total of 102 individual tree point clouds:21 Chinese ash trees,21 willow trees,and 60 tropical trees.For comparative evaluation,five alternative methods,including PointNet++,DGCNN,Krishna Moorthy's method,LeWoS, and Sun's method,were also applied to these datasets.The classification accuracy of all six methods was quantified using three metrics:overall accuracy(OA),mean Intersection over Union(mIoU),and F1-score.Across all three datasets,WLC-Net demonstrated superior performance, achieving OA scores of 0.9778, 0.9712, and 0.9508;mIoU scores of 0.9761, 0.9693,and 0.9141;and F1-scores of 0.8628, 0.7938,and 0.9019,respectively.The time costs of WLC-Net were also recorded to evaluate the efficiency.The average processing time was 102.74s per million points for WLC-Net.In terms of visual inspect,accuracy evaluation and efficiency evaluation,the results suggest that WLC-Net presents a promising approach for wood-leaf classification,distinguished by its high accuracy. In addition,WLC-Net also exhibits strong applicability across various tree point clouds and holds promise for further optimization.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# 多モードLDMにおける逆画像検索キュースパラメトリックメモリ Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs ( http://arxiv.org/abs/2405.18740v1 ) ライセンス: Link先を確認	Jialiang Xu, Michael Moor, Jure Leskovec,	(参考訳) 近年のMLLM(Multimodal large language model)の顕著な進歩にもかかわらず、GPT-4スイートのような最先端のモデルは、知識集約的なタスクに苦戦している。これを解決するために、逆画像検索(Reverse Image Retrieval、RIR)拡張生成について検討する。 RIRは、GPT-4Vの知識集約型視覚質問応答(VQA)を37-43%、GPT-4 Turboを25-27%、GPT-4oを18-20%改善する。驚いたことに、RIRはモデルが自身の世界知識によりよくアクセスするのに役立ちます。具体的には、RIR拡張は、クエリへの直接応答を必ずしも含まない視覚的およびテキスト的手がかりを提供することによって有効であることを示す。また,RIRがパフォーマンスを損なうようなケースを解明し,人的評価を行う。最後に、RIRを使用することによる全体的なアドバンテージは、RIRをデフォルト設定であるアプローチよりも優れたパフォーマンスを実現するために、RIRを使用するエージェントを選択することが難しくなることに気付きます。 Despite impressive advances in recent multimodal large language models (MLLMs), state-of-the-art models such as from the GPT-4 suite still struggle with knowledge-intensive tasks. To address this, we consider Reverse Image Retrieval (RIR) augmented generation, a simple yet effective strategy to augment MLLMs with web-scale reverse image search results. RIR robustly improves knowledge-intensive visual question answering (VQA) of GPT-4V by 37-43%, GPT-4 Turbo by 25-27%, and GPT-4o by 18-20% in terms of open-ended VQA evaluation metrics. To our surprise, we discover that RIR helps the model to better access its own world knowledge. Concretely, our experiments suggest that RIR augmentation helps by providing further visual and textual cues without necessarily containing the direct answer to a query. In addition, we elucidate cases in which RIR can hurt performance and conduct a human evaluation. Finally, we find that the overall advantage of using RIR makes it difficult for an agent that can choose to use RIR to perform better than an approach where RIR is the default setting.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# Genshin: 大規模言語モデルによる自然言語処理のための汎用シールド Genshin: General Shield for Natural Language Processing with Large Language Models ( http://arxiv.org/abs/2405.18741v1 ) ライセンス: Link先を確認	Xiao Peng, Tao Liu, Ying Wang,	(参考訳) ChatGPT、Gemini、LLaMAのような大規模言語モデル(LLM)が最近流行し、無数のドメインでかなりの進歩と一般化能力を示している。しかし、LSMはより大きなブラックボックスが不透明度を悪化させ、解釈可能性はほとんどない。 LLMの本質に埋め込まれた不確実性と不透明性は、金融詐欺やフィッシングなどの高額な領域への適用を制限する。現在のアプローチは、主に後方解釈可能なアルゴリズムによる従来のテキスト分類に依存しており、システムの防御を壊すために多種多様な敵のサンプルを作成する攻撃者に悩まされ、ユーザーは効率と堅牢性の間のトレードオフを強制する。この問題に対処するために,LLMを防御的なワンタイムプラグインとして活用する,Genshin(大規模言語モデル付き自然言語処理一般シールド)と呼ばれる新しいカスケーディングフレームワークを提案する。テキストを新しい、あるいは構造的なものに変えようとするLLMのほとんどのアプリケーションとは異なり、源信はLLMを使ってテキストを元の状態に復元する。玄信は、LLMの一般化可能性、中央モデルの識別、単純モデルの解釈可能性を組み合わせることを目的としている。感傷的分析とスパム検出の課題に対する実験により,現在の中央値モデルに致命的な欠陥がみられ,LLMの回復能力が向上し,ゲンシンが効果的かつ効果的であることが確認された。アブレーション研究では、いくつかの興味深い観察を発掘した。第4パラダイムから派生したツールである LLM ディフェンダーを用いて, BERT の最適マスクレート 15% を NLP の第3パラダイムに再現した。さらに、LLMを潜在的な敵ツールとして使用する場合、攻撃者は意味的にほとんど損失のない効果的な攻撃を実行することができる。 Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive one-time plug-ins. Unlike most applications of LLMs that try to transform text into something new or structural, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discrimination of the median model, and the interpretability of the simple model. Our experiments on the task of sentimental analysis and spam detection have shown fatal flaws of the current median models and exhilarating results on LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. In our ablation study, we unearth several intriguing observations. Utilizing the LLM defender, a tool derived from the 4th paradigm, we have reproduced BERT's 15% optimal mask rate results in the 3rd paradigm of NLP. Additionally, when employing the LLM as a potential adversarial tool, attackers are capable of executing effective attacks that are nearly semantically lossless.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# 文法的誘導による音楽的フレーズ分割 Musical Phrase Segmentation via Grammatical Induction ( http://arxiv.org/abs/2405.18742v1 ) ライセンス: Link先を確認	Reed Perkins, Dan Ventura,	(参考訳) 入力シーケンスから文脈自由文法を推論するアルゴリズムのクラスである文法的帰納アルゴリズムを用いた音楽句のセグメンテーションの課題について概説する。様々な音楽的視点の組み合わせを用いて、3つのデータセット上での5つの文法的帰納アルゴリズムの性能を解析する。実験の結果, LONGESTFIRSTアルゴリズムは, 3つのデータセットのすべてで最高のF1スコアを達成し, 時間的視点を含む入力エンコーディングが最高のパフォーマンスをもたらすことがわかった。 We outline a solution to the challenge of musical phrase segmentation that uses grammatical induction algorithms, a class of algorithms which infer a context-free grammar from an input sequence. We analyze the performance of five grammatical induction algorithms on three datasets using various musical viewpoint combinations. Our experiments show that the LONGESTFIRST algorithm achieves the best F1 scores across all three datasets and that input encodings that include the duration viewpoint result in the best performance.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# PermLLM: WAN下での3秒以内の大規模言語モデルのプライベート推論 PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN ( http://arxiv.org/abs/2405.18744v1 ) ライセンス: Link先を確認	Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng,	(参考訳) ChatGPTの出現は、大きな言語モデル(LLM)時代の到来を表している。 LLMは様々な分野でそのパワーを実証するが、ユーザーのクエリがモデルプロバイダに送られると、深刻なプライバシー上の懸念が生じる。一方、ユーザーのデバイスにLSMをデプロイすると、すべてのモデルデータがリークされる。セキュアなマルチパーティ計算(MPC)に基づく既存の手法は、モデルのパラメータとユーザクエリの両方のプライバシを保護することができた。しかし、1つのトークンだけを生成するには、ギガバイトのデータ転送と数分を要するため、ほとんどの現実世界アプリケーションでは実用的ではない。セキュアなランダムな置換を用いた非線形関数の評価を高速化するPermLLMを提案する。 PermLLMは、最適化された秘密共有プロトコルと同型暗号化とともに、既存のMPCソリューションよりも大幅に高速な10ms RTTと1Gbpsのネットワーク設定の下で、約3s/tokenの速度でChatGLM-6Bモデルの双方向のプライベート推論を実現する。 The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed to protect both the privacy of the model parameters and user queries. However, they require gigabytes of data transfer and several minutes to generate just one token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Along with the optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token, under a realistic network setting (10ms RTT and 1Gbps bandwidth), which is magnitudes faster than existing MPC solutions.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# PanoNormal:単眼の室内360度表面の正規化 PanoNormal: Monocular Indoor 360° Surface Normal Estimation ( http://arxiv.org/abs/2405.18745v1 ) ライセンス: Link先を確認	Kun Huang, Fanglue Zhang, Neil Dodgson,	(参考訳) 等角面上の球面歪みの存在は、表面正規推定のような高密度回帰コンピュータビジョンタスクにおいて明らかな課題である。畳み込みニューラルネットワーク(CNN)の最近の進歩は、球面の歪みを緩和しようとするが、多くの場合、その固定された受容野のために、全体構造を効果的に捉えるのに不足する。一方、視覚変換器(ViT)は、グローバルな自己アテンション機構を通じて長距離依存関係を確立するのに優れるが、局所的な詳細を保存する際の制限に直面する。 CNN と ViT の強度を組み合わせた 360{\deg} 画像のための単分子面正規推定アーキテクチャである \textit{PanoNormal} を紹介する。具体的には,球面特徴分布を考慮した多段階のグローバル自己注意方式を採用し,シーンの包括的理解を高めた。実験結果から,本手法は複数の一般的な360{\deg}単分子データセットにまたがる最先端性能を実現することができることがわかった。コードとモデルはリリースされる。 The presence of spherical distortion on the Equirectangular image is an acknowledged challenge in dense regression computer vision tasks, such as surface normal estimation. Recent advances in convolutional neural networks (CNNs) strive to mitigate spherical distortion but often fall short in capturing holistic structures effectively, primarily due to their fixed receptive field. On the other hand, vision transformers (ViTs) excel in establishing long-range dependencies through a global self-attention mechanism, yet they encounter limitations in preserving local details. We introduce \textit{PanoNormal}, a monocular surface normal estimation architecture designed for 360{\deg} images, which combines the strengths of CNNs and ViTs. Specifically, we employ a multi-level global self-attention scheme with the consideration of the spherical feature distribution, enhancing the comprehensive understanding of the scene. Our experimental results demonstrate that our approach achieves state-of-the-art performance across multiple popular 360{\deg} monocular datasets. The code and models will be released.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# STIQ: 信頼できないクラウドからの量子ニューラルネットワークのトレーニングと推論の保護 STIQ: Safeguarding Training and Inferencing of Quantum Neural Networks from Untrusted Cloud ( http://arxiv.org/abs/2405.18746v1 ) ライセンス: Link先を確認	Satwik Kundu, Swaroop Ghosh,	(参考訳) 現在の量子クラウドプロバイダが課している高コストは、量子リソースの増大の必要性と相まって、潜在的に信頼できないプロバイダからのより安価なクラウドベースの量子サービスの台頭を動機付ける可能性がある。これらの信頼できないプラットフォームに量子ニューラルネットワーク(QNN)などの量子モデルをデプロイしたり、ホストしたりすると、無数のセキュリティ上の懸念が生じ、最も重要なのがモデル盗難である。この脆弱性は、トレーニングや推論中にクラウドプロバイダがこれらの回路に完全にアクセスすることに由来する。そこで本研究では,このようなクラウドベースの敵に対するQNNの保護を目的とした,新たなアンサンブルベースの戦略であるSTIQを紹介する。提案手法は,同一または異なるプラットフォーム上でホストする2つの異なるQNNを同時に訓練し,各ネットワークが難解な出力を出力することにより,クラウド環境内で動作している敵に対して,個々のQNNを非効率に処理する。しかし、これらの出力が(集約関数を使って)局所的に結合されると、正しい結果が明らかになる。様々なQNNやデータセットにわたる広範な実験を通じて、我々の手法は、計算オーバーヘッドの合計で$$\leq 2\times$を犠牲にして、個別にホストされたモデルの精度と損失を最大66%まで効果的に隠蔽することが証明された。しかし、このトレードオフは、クラウドベースの環境でのQNNのセキュリティ強化と整合性のために、信頼できない敵に対して支払う小さな価格である。また、実際の127量子ビットのIBM\_Sherbrookeハードウェア上でSTIQの実用的応用を実証し、STIQが最大60%の難読化を実現し、非難読化モデルに匹敵する性能を併せ持つことを示した。 The high expenses imposed by current quantum cloud providers, coupled with the escalating need for quantum resources, may incentivize the emergence of cheaper cloud-based quantum services from potentially untrusted providers. Deploying or hosting quantum models, such as Quantum Neural Networks (QNNs), on these untrusted platforms introduces a myriad of security concerns, with the most critical one being model theft. This vulnerability stems from the cloud provider's full access to these circuits during training and/or inference. In this work, we introduce STIQ, a novel ensemble-based strategy designed to safeguard QNNs against such cloud-based adversaries. Our method innovatively trains two distinct QNNs concurrently, hosting them on same or different platforms, in a manner that each network yields obfuscated outputs rendering the individual QNNs ineffective for adversaries operating within cloud environments. However, when these outputs are combined locally (using an aggregate function), they reveal the correct result. Through extensive experiments across various QNNs and datasets, our technique has proven to effectively masks the accuracy and losses of the individually hosted models by upto 76\%, albeit at the expense of $\leq 2\times$ increase in the total computational overhead. This trade-off, however, is a small price to pay for the enhanced security and integrity of QNNs in a cloud-based environment prone to untrusted adversaries. We also demonstrated STIQ's practical application by evaluating it on real 127-qubit IBM\_Sherbrooke hardware, showing that STIQ achieves up to 60\% obfuscation, with combined performance comparable to an unobfuscated model.	翻訳日:2024-05-30 18:58:09 公開日:2024-05-29
# 抗体モデルのためのSARS-CoV-2相互作用データセットとVHH系列コーパス A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models ( http://arxiv.org/abs/2405.18749v1 ) ライセンス: Link先を確認	Hirofumi Tsuruta, Hiroyuki Yamazaki, Ryota Maeda, Ryotaro Tamura, Akihiro Imura,	(参考訳) 抗体は、有害な異物を取り除くために免疫系によって生産される重要なタンパク質であり、ヒト疾患の治療において重要な治療薬となっている。抗体治療の発見を加速するため, 抗体配列を用いた言語モデル構築への関心が高まっている。しかし,ラベル付きデータセットの不足により,事前学習した言語モデルの抗体発見への適用性は十分に評価されていない。 AVIDa-SARS-CoV-2は重症急性呼吸器症候群ウイルス2(SARS-CoV-2)スパイクタンパク質に免疫された2つのアルパサから得られた重鎖抗体(VHH)相互作用の抗原可変ドメインを特徴とするデータセットである。 AVIDa-SARS-CoV-2は、デルタおよびOmicron変異体のような12のSARS-CoV-2変異体への多様なVHH配列の結合または非結合を示すバイナリラベルを含む。さらに,VHHCorpus-2Mは,200万以上のVHH配列を含む,抗体言語モデルの事前学習データセットである。 VHHCorpus-2Mおよび既存の一般タンパク質および抗体特異的言語モデルを用いたVHHBERTを用いたSARS-CoV-2-VHH結合予測のためのベンチマーク結果を報告する。これらの結果は,AVIDa-SARS-CoV-2が結合予測のための抗体言語モデルの表現能力を評価するための貴重なベンチマークを提供し,AI駆動型抗体発見の開発を容易にすることを確認する。データセットはhttps://datasets.cognanous.comで公開されている。 Antibodies are crucial proteins produced by the immune system to eliminate harmful foreign substances and have become pivotal therapeutic agents for treating human diseases. To accelerate the discovery of antibody therapeutics, there is growing interest in constructing language models using antibody sequences. However, the applicability of pre-trained language models for antibody discovery has not been thoroughly evaluated due to the scarcity of labeled datasets. To overcome these limitations, we introduce AVIDa-SARS-CoV-2, a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Furthermore, we release VHHCorpus-2M, a pre-training dataset for antibody language models, containing over two million VHH sequences. We report benchmark results for predicting SARS-CoV-2-VHH binding using VHHBERT pre-trained on VHHCorpus-2M and existing general protein and antibody-specific pre-trained language models. These results confirm that AVIDa-SARS-CoV-2 provides valuable benchmarks for evaluating the representation capabilities of antibody language models for binding prediction, thereby facilitating the development of AI-driven antibody discovery. The datasets are available at https://datasets.cognanous.com.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# T2V-Turbo:Mixed Reward Feedbackによるビデオ一貫性モデルの高品質化 T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback ( http://arxiv.org/abs/2405.18750v1 ) ライセンス: Link先を確認	Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang,	(参考訳) 拡散に基づくテキスト・ツー・ビデオ(T2V)モデルは大きな成功を収めたが、反復サンプリングプロセスの遅いサンプリング速度によって妨げられ続けている。この課題に対処するために、サンプル品質のコストにもかかわらず、高速な推論を容易にするために一貫性モデルが提案されている。本稿では,ビデオ一貫性モデル(VCM)の品質ボトルネックを解消し,高速かつ高品質なビデオ生成を実現することを目的としている。本稿では,T2V-Turboについて述べる。このT2V-Turboは,様々な報酬モデルから得られるフィードバックを,事前学習したT2Vモデルの一貫性蒸留(CD)プロセスに統合する。特に、CD損失の計算から自然に生じる単一ステップ世代に関連する報酬を直接最適化し、反復サンプリングプロセスを通じて勾配の逆伝播によるメモリ制約を効果的に回避する。興味深いことに、我々のT2V-Turboの4段階の世代は、Gen-2とPikaを抜いてVBenchで最高スコアを達成した。さらに,T2V-Turboの4ステップ世代は,教師モデルから得られた50ステップのDDIMサンプルよりも好まれ,ビデオ生成品質を向上しつつ,10倍以上の加速を示すことが確認された。 Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve $\textbf{both fast and high-quality video generation}$. We introduce T2V-Turbo, which integrates feedback from a mixture of differentiable reward models into the consistency distillation (CD) process of a pre-trained T2V model. Notably, we directly optimize rewards associated with single-step generations that arise naturally from computing the CD loss, effectively bypassing the memory constraints imposed by backpropagating gradients through an iterative sampling process. Remarkably, the 4-step generations from our T2V-Turbo achieve the highest total score on VBench, even surpassing Gen-2 and Pika. We further conduct human evaluations to corroborate the results, validating that the 4-step generations from our T2V-Turbo are preferred over the 50-step DDIM samples from their teacher models, representing more than a tenfold acceleration while improving video generation quality.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# 条件付きバッチ正規化を用いた補助タスク変調によるマルチモーダルメタラーニングの限界について On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization ( http://arxiv.org/abs/2405.18751v1 ) ライセンス: Link先を確認	Jordi Armengol-Estapé, Vincent Michalski, Ramnath Kumar, Pierre-Luc St-Charles, Doina Precup, Samira Ebrahimi Kahou,	(参考訳) 少ないショット学習は、少数の例から見れば、新しいタスクに対処できる表現を学習することを目的としている。近年の研究では、クロスモーダル学習は、数発の分類において表現を改善することが示されている。より具体的に言えば、言語は視覚学習を導くのに使える豊富なモダリティである。本研究では, 分類器, 補助ネットワーク, ブリッジネットワークという3つのコンポーネントから構成される, 数ショット学習のためのマルチモーダルアーキテクチャを実験する。分類器が主分類タスクを実行する間、補助ネットワークは同じ入力から言語表現を予測することを学習し、ブリッジネットワークは、補助ネットワークの高レベルな特徴を条件付きバッチ正規化を用いて、少数ショット分類器の層に対する変調パラメータに変換する。このブリッジは、言語と視覚の間の軽量なセマンティックアライメントの形式を奨励し、分類器に役立てるべきである。しかし、2つの一般的な数ショット分類ベンチマークに対する提案されたアプローチを評価すると、そのことが分かる。 a) 改善はベンチマーク全体にわたって再現されず、 b)ブリッジネットワークによって導入された計算とパラメータの追加による改善。言語表現を用いたマルチモーダルなメタラーニングにおける今後の研究に対する洞察と提言に貢献する。 Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# 再現性危機の解決--認証ロバスト性検証の事例から Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness ( http://arxiv.org/abs/2405.18753v1 ) ライセンス: Link先を確認	Richard H. Moulton, Gary A. McCully, John D. Hastings,	(参考訳) 再現性は科学的研究の基盤であり、検証、拡張、進歩を可能にする。しかし、ソフトウェアと依存関係の急速に進化する性質は、特に複雑なコードベースと特殊なツールキットが使用されるディープニューラルネットワークの対角的堅牢性のような分野において、研究結果を再現する上で大きな課題を生んでいる。本稿では,VeriGauge ツールキットを用いた "SoK: Certified Robustness for Deep Neural Networks" における検証結果の検証を試みる。ドキュメント化された方法論に従えば、古い依存関係や利用できない依存関係、バージョンコンフリクト、ドライバの不互換性など、多くのソフトウェアとハードウェアの互換性の問題が発生した。元の結果のサブセットを走らせることができたが、これらの技術的障害と試験結果のわずかな相違により、様々な検証手法の実証的堅牢な精度に関する重要な発見が発覚した。この実践的な経験は、再現可能性の欠如が科学的完全性を脅かし、進歩を妨げる敵の堅牢性研究に支障をきたす再現性危機に光を当てている。本稿では,コンテナ化やソフトウェア保存,包括的なドキュメントプラクティスといった潜在的なソリューションを提案する。さらに、再現可能な研究のための堅牢なフレームワークを開発するために、研究コミュニティ内でのコラボレーションと標準化の取り組みの必要性を強調している。本研究は, 再現性危機に先立ち, 科学的再現性に関する現在進行中の談話に貢献することを目的としており, 研究成果の信頼性と妥当性を, 敵の堅牢性だけでなく, セキュリティ・技術研究全般において保証するベストプラクティスを提唱する。 Reproducibility is a cornerstone of scientific research, enabling validation, extension, and progress. However, the rapidly evolving nature of software and dependencies poses significant challenges to reproducing research results, particularly in fields like adversarial robustness for deep neural networks, where complex codebases and specialized toolkits are utilized. This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit. Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities. While a subset of the original results could be run, key findings related to the empirical robust accuracy of various verification methods proved elusive due to these technical obstacles, as well as slight discrepancies in the test results. This practical experience sheds light on the reproducibility crisis afflicting adversarial robustness research, where a lack of reproducibility threatens scientific integrity and hinders progress. The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices. Furthermore, it highlights the need for collaboration and standardization efforts within the research community to develop robust frameworks for reproducible research. By addressing the reproducibility crisis head-on, this work aims to contribute to the ongoing discourse on scientific reproducibility and advocate for best practices that ensure the reliability and validity of research findings within not only adversarial robustness, but security and technology research as a whole.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# GIST: 異種データ要約のためのGreedy Independent Set Thresholding GIST: Greedy Independent Set Thresholding for Diverse Data Summarization ( http://arxiv.org/abs/2405.18754v1 ) ライセンス: Link先を確認	Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo,	(参考訳) 本稿では,機械学習,例えばデータサンプリング,特徴選択など,多種多様な応用が可能な,min-distance various data summarization(\textsf{MDDS}$)と呼ばれる新しいサブセット選択タスクを提案する。計量空間における点の集合が与えられたとき、ゴールは、点の全体の効用と、任意の選択された点間の最小距離を、制約$\|S\| \le k$の条件で捉える多様性項を組み合わせた目的を最大化することである。例えば、このポイントは、深層ニューラルネットワークから抽出された画像の学習された埋め込みなど、データサンプリング問題におけるトレーニング例に対応できる。この研究は$\texttt{GIST}$アルゴリズムを示し、bicriteria greedyアルゴリズムで一連の最大独立集合問題を近似することで$\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$を達成する。また、任意の$\varepsilon > 0$に対して、相補的な$(\frac{2}{3}+\varepsilon)$-近似の硬度も証明する。最後に,$\texttt{GIST}$が合成データに対して$\textsf{MDDS}$の既存の手法よりも優れており,実世界の画像分類実験ではImageNetの単発サブセット選択について検討する。 We propose a novel subset selection task called min-distance diverse data summarization ($\textsf{MDDS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points, subject to the constraint $\|S\| \le k$. For example, the points may correspond to training examples in a data sampling problem, e.g., learned embeddings of images extracted from a deep neural network. This work presents the $\texttt{GIST}$ algorithm, which achieves a $\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove a complementary $(\frac{2}{3}+\varepsilon)$-hardness of approximation, for any $\varepsilon > 0$. Finally, we provide an empirical study that demonstrates $\texttt{GIST}$ outperforms existing methods for $\textsf{MDDS}$ on synthetic data, and also for a real-world image classification experiment the studies single-shot subset selection for ImageNet.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# 確率的コントラスト型連続学習 Provable Contrastive Continual Learning ( http://arxiv.org/abs/2405.18756v1 ) ライセンス: Link先を確認	Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang,	(参考訳) 継続的な学習には、動的データ分散による漸進的なタスクの学習が必要である。これまでに, コントラスト損失と蒸留損失を併用して連続学習の訓練を行うことで, 高い性能が得られることが確認されている。しかし、私たちの知る限りでは、この対照的な連続的な学習フレームワークは、説得力のある理論的な説明を欠いている。本研究では,このギャップを理論的性能保証の確立によって埋める。これは,モデルの性能が,対照的な連続学習フレームワークにおいて,従来のタスクのトレーニング損失によっていかに境界づけられているかを明らかにする。我々の理論的な説明は、事前学習が継続的な学習に役立つという考えをさらに支持する。これらの保証の理論的解析から着想を得て、異なるタスクに適応蒸留係数を用いるCILAと呼ばれる新しいコントラスト型連続学習アルゴリズムを提案する。これらの蒸留係数は、従来の平均蒸留損失と平均コントラスト損失との比で容易に計算できる。提案手法は,標準ベンチマークの精度を向上し,新しい最先端性能を実現する。 Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by training losses of previous tasks in the contrastive continual learning framework. Our theoretical explanations further support the idea that pre-training can benefit continual learning. Inspired by our theoretical analysis of these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. These distillation coefficients are easily computed by the ratio between average distillation losses and average contrastive losses from previous tasks. Our method shows great improvement on standard benchmarks and achieves new state-of-the-art performance.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# ベイズ原理で継続的に学ぶこと Learning to Continually Learn with the Bayesian Principle ( http://arxiv.org/abs/2405.18758v1 ) ライセンス: Link先を確認	Soochan Lee, Hyeonseong Jeon, Jaehyeon Son, Gunhee Kim,	(参考訳) ディープラーニングの現代において、継続学習の研究は主に、非定常的なデータストリーム上で確率的勾配勾配のニューラルネットワークをトレーニングする際の忘れを緩和することに焦点を当てている。一方、統計機械学習の古典的な文献では、多くのモデルは、バッチトレーニングと同じ学習結果をもたらす逐次ベイズ更新規則を持つ。しかし、それらはしばしば複雑な実世界のデータをモデル化するのに非常に単純である。本研究では、ニューラルネットワークの強力な表現力と、忘れることに対する単純な統計モデルの堅牢性を組み合わせたメタラーニングパラダイムを採用する。我々の新しいメタ連続学習フレームワークでは、連続学習は理想的な逐次ベイズ更新規則を介して統計モデルでのみ行われ、ニューラルネットワークは生データと統計モデルをブリッジするためにメタ学習される。ニューラルネットワークは継続学習中に固定されているため、破滅的な忘れ物から保護されている。このアプローチはパフォーマンスを大幅に向上するだけでなく、優れたスケーラビリティも発揮します。このアプローチはドメインに依存しないモデルに依存しないため、幅広い問題に適用でき、既存のモデルアーキテクチャと容易に統合できる。 In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as the batch training, i.e., they are completely immune to catastrophic forgetting. However, they are often overly simple to model complex real-world data. In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting. In our novel meta-continual learning framework, continual learning takes place only in statistical models via ideal sequential Bayesian update rules, while neural networks are meta-learned to bridge the raw data and the statistical models. Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting. This approach not only achieves significantly improved performance but also exhibits excellent scalability. Since our approach is domain-agnostic and model-agnostic, it can be applied to a wide range of problems and easily integrated with existing model architectures.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# FDQN: ゲーム自動化のための柔軟なQ-ネットワークフレームワーク FDQN: A Flexible Deep Q-Network Framework for Game Automation ( http://arxiv.org/abs/2405.18761v1 ) ライセンス: Link先を確認	Prabhath Reddy Gujavarthy,	(参考訳) 強化学習では、特にドメインがリアルタイムのオンラインインタラクションやWebゲームのような適応戦略を必要とする場合、動的環境における高次元的、迅速な意思決定を自動化することがしばしば困難である。本研究は,CNNを用いて高次元センサデータをリアルタイムに処理し,異なるゲーム環境のさまざまなアクション空間にモデルアーキテクチャを動的に適用し,さまざまなAtariゲームやChrome Dinoゲームにおいて,以前のベースラインモデルをベースラインとして向上させるという,この課題に対処可能な,最先端のフレキシブルQネットワーク(FDQN)フレームワークを提案する。 epsilon-greedyポリシを使うことで、パフォーマンス向上のために新しい学習と利用のバランスを効果的に保ち、フレームワークのコア部分に触れることなく、他のHTMLベースのゲームに容易に適応できるモジュール構造で設計されている。 FDQNフレームワークは、実験室条件下でよく定義されたタスクをうまく解決できることが実証されているが、より重要なことは、より困難な現実世界のケースへの潜在的な応用を議論し、自動化されたゲームプレイ以降のさらなる探索の出発点として機能することである。 In reinforcement learning, it is often difficult to automate high-dimensional, rapid decision-making in dynamic environments, especially when domains require real-time online interaction and adaptive strategies such as web-based games. This work proposes a state-of-the-art Flexible Deep Q-Network (FDQN) framework that can address this challenge with a selfadaptive approach that is processing high-dimensional sensory data in realtime using a CNN and dynamically adapting the model architecture to varying action spaces of different gaming environments and outperforming previous baseline models in various Atari games and the Chrome Dino game as baselines. Using the epsilon-greedy policy, it effectively balances the new learning and exploitation for improved performance, and it has been designed with a modular structure that it can be easily adapted to other HTML-based games without touching the core part of the framework. It is demonstrated that the FDQN framework can successfully solve a well-defined task in a laboratory condition, but more importantly it also discusses potential applications to more challenging real-world cases and serve as the starting point for future further exploration into automated game play and beyond.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# Inpaint Biases: 正確な画像生成と不偏画像生成のための道 Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation ( http://arxiv.org/abs/2405.18762v1 ) ライセンス: Link先を確認	Jiyoon Myung, Jihyeon Park,	(参考訳) 本稿では、訓練データセットにほとんど表現されない、あるいは欠落している非伝統的な概念を正確にレンダリングする際の高度なテキスト・画像モデルの限界について検討する。これらの制限が、これらのモデルの創造的可能性を限定するだけでなく、ステレオタイプを補強するリスクも生じさせる。これらの課題に対処するために,ユーザ定義マスクとインペイント技術を用いたInpaint Biasesフレームワークを導入し,特に新規あるいは不正確なオブジェクトに対して,画像生成の精度を向上させる。実験的な検証を通じて、このフレームワークが生成した画像の忠実度をユーザの意図に大きく改善し、それによってモデルの創造能力を拡大し、バイアスを緩和するリスクを緩和することを示す。本研究は,創造的表現のための非バイアスで汎用的なツールとして,テキスト・ツー・イメージ・モデルの進歩に寄与する。 This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts which are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# BCIにおける脳波データを用いたジェネリック表現学習のための大脳モデル Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI ( http://arxiv.org/abs/2405.18765v1 ) ライセンス: Link先を確認	Wei-Bang Jiang, Li-Ming Zhao, Bao-Liang Lu,	(参考訳) 現在の脳波に基づくディープラーニングモデルは、典型的には、脳-コンピュータ相互作用(BCI)における特定のデータセットや応用のために設計されており、モデルの規模を制限し、知覚能力と一般化性を低下させる。近年,Large Language Models (LLMs) はテキスト処理において前例のない成功を収めており,Large EEG Models (LEMs) の機能を探究している。我々は,LEMが脳波データセットの異なるタスクタイプの制限を突破し,教師なし事前学習を通じて脳波信号の普遍的知覚能力を得ることを期待している。次に、モデルは異なる下流タスクのために微調整できる。しかし、テキストデータと比較すると、EEGデータセットのボリュームは概して小さく、フォーマットは様々である。例えば、電極のミスマッチ数、不等長データサンプル、様々なタスク設計、低信号対雑音比がある。これらの課題を克服するため、我々はLarge Brain Model (LaBraM) と呼ばれる脳波の統一基盤モデルを提案する。 LaBraMは、EEG信号をEEGチャネルパッチにセグメント化することで、データセット間の学習を可能にする。ベクトル量子化されたニューラルスペクトル予測は、連続的な生のEEGチャネルパッチをコンパクトなニューラルコードにエンコードするセマンティックにリッチなニューラルトークンを訓練するために使用される。次に、マスクされたEEGチャネルパッチの元のニューラルコードを予測することにより、ニューラルトランスフォーマーを事前訓練する。 LaBraMは、約20のデータセットから約2500時間のさまざまなEEG信号で事前トレーニングされ、複数の異なる下流タスクで検証された。異常検出,事象型分類,感情認識,歩行予測実験の結果,LaBraMはそれぞれの分野でSOTA法よりも優れていた。私たちのコードはhttps://github.com/935963004/LaBraM.comで公開されています。 The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# RNA Flow:逆フォールディングフローマッチングによるRNA構造とシーケンス設計 RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching ( http://arxiv.org/abs/2405.18768v1 ) ライセンス: Link先を確認	Divya Nori, Wengong Jin,	(参考訳) 多様な生物学的応用におけるRNA工学の重要性の高まりにより、構造に基づくRNA設計のためのAI手法の開発への関心が高まっている。拡散モデルはタンパク質設計において優れているが、RNAに適応させることは、RNAのコンフォメーションの柔軟性と大きな構造予測モデルを微調整する計算コストにより、新しい課題をもたらす。そこで本研究では,タンパク質条件のRNA配列構造設計のためのフローマッチングモデルであるRNAFlowを提案する。そのデノナイジングネットワークはRNA逆フォールディングモデルと事前訓練されたRosettaFold2NAネットワークを統合し、RNA配列と構造を生成する。構造記述過程における逆折り畳みの統合により,構造予測ネットワークの修正によるトレーニングの簡易化が可能となる。我々は、動的RNAコンフォメーションをモデル化するために、推論されたコンフォメーションアンサンブルに条件付けすることで、逆折り畳みモデルをさらに強化する。タンパク質条件のRNA構造と配列生成タスクの評価は、既存のRNA設計手法に対するRNAFlowの優位性を示している。 The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# OUS:Scene-Guided Dynamic Facial Expression Recognition OUS: Scene-Guided Dynamic Facial Expression Recognition ( http://arxiv.org/abs/2405.18769v1 ) ライセンス: Link先を確認	Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, Yating Li, Wenqiang Zhang,	(参考訳) 動的顔表情認識(DFER)は情緒的コンピューティングには不可欠であるが、シーンコンテキストの影響を見落としていることが多い。人間のアノテータは、環境手がかりやボディランゲージなど、さまざまな角度から感情を統合するのが一般的であるのに対して、既存のDFERメソッドでは、シーンを、顔情報にのみ焦点をあてて、フィルタリングが必要なノイズとして考える傾向があります。これを「剛性認知問題」と呼ぶ。 Rigid Cognitive Problemは、いくつかのサンプルにおいて、アノテーションの認識とモデルの間に相違をもたらす可能性がある。感情の人間の認知パラダイムとより緊密に一致させるために,情景DFER法(OUS)の総合的理解を提案する。 OUSはシーンと顔の特徴を効果的に統合し、DFERのシーン固有の感情的知識を組み合わせる。 DFERフィールドにおける2つの大きなデータセットであるDFEWとFERV39kに関する大規模な実験は、ousが既存の手法よりも大幅に優れていることを示した。 Rigid Cognitive Problemを解析することにより、ousはシーンコンテキストと感情表現の複雑な関係をうまく理解し、現実世界のシナリオにおける人間の感情的理解と密接に一致させる。 Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. The Rigid Cognitive Problem can lead to discrepancies between the cognition of annotators and models in some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS effectively integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# ビジュアル・ランゲージ・アタックに対する防御のための多対多関係の活用 Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks ( http://arxiv.org/abs/2405.18770v1 ) ライセンス: Link先を確認	Futa Waseda, Antonio Tejero-de-Pablos,	(参考訳) 近年の研究では、視覚言語(VL)モデルが画像テキスト検索(ITR)の敵攻撃に対して脆弱であることが示されている。しかし、既存のVLモデルの防衛戦略は、画像とテキストの同時操作を考慮しないゼロショット画像分類と、複数の方法で単一の画像を記述することができるITR固有の多対多(N:N)の性質に重点を置いている。そこで本研究では,ITRのVLモデルに対する敵攻撃に対する防衛戦略を初めて検討した。特に,敵の強靭性を高めるため,IMRにおけるN:N関係の活用に着目する。列車データ中の1対1画像テキストペアに対して, 対角訓練は容易にオーバーフィットするが, 1対1(N:N)/多対1(N:1)画像テキストペアを作成するための多様な拡張技術は, VLモデルの対角的ロバスト性を大幅に向上させることができることがわかった。さらに, 画像・テキスト・ペアのアライメントは, 防御戦略の有効性に不可欠であり, 不適切な拡張はモデルの性能を低下させる可能性があることを示す。そこで本研究では,IMRにおけるN:N関係を利用した新たな防衛戦略を提案し,基本拡張と生成モデルに基づく拡張を用いて,多種多様かつ高整合なN:Nペアを効果的に生成する。この研究は、VLタスクにおける敵の攻撃を防御する新しい視点を提供し、将来の研究のための新たな研究方向を開く。 Recent studies have revealed that vision-language (VL) models are vulnerable to adversarial attacks for image-text retrieval (ITR). However, existing defense strategies for VL models primarily focus on zero-shot image classification, which do not consider the simultaneous manipulation of image and text, as well as the inherent many-to-many (N:N) nature of ITR, where a single image can be described in numerous ways, and vice versa. To this end, this paper studies defense strategies against adversarial attacks on VL models for ITR for the first time. Particularly, we focus on how to leverage the N:N relationship in ITR to enhance adversarial robustness. We found that, although adversarial training easily overfits to specific one-to-one (1:1) image-text pairs in the train data, diverse augmentation techniques to create one-to-many (1:N) / many-to-one (N:1) image-text pairs can significantly improve adversarial robustness in VL models. Additionally, we show that the alignment of the augmented image-text pairs is crucial for the effectiveness of the defense strategy, and that inappropriate augmentations can even degrade the model's performance. Based on these findings, we propose a novel defense strategy that leverages the N:N relationship in ITR, which effectively generates diverse yet highly-aligned N:N pairs using basic augmentations and generative model-based augmentations. This work provides a novel perspective on defending against adversarial attacks in VL tasks and opens up new research directions for future work.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# 環境制約付き最大被覆問題に対する信頼性のある微分制約の展開 Evolving Reliable Differentiating Constraints for the Chance-constrained Maximum Coverage Problem ( http://arxiv.org/abs/2405.18772v1 ) ライセンス: Link先を確認	Saba Sadeghi Ahouei, Jacob de Nobel, Aneta Neumann, Thomas Bäck, Frank Neumann,	(参考訳) チャンス制約問題(英語版)は、小さな確率で破ることができる制約の確率的成分を含む。確率制約が反復探索アルゴリズムの性能に与える影響について検討し,確率制約のあるグラフにおける古典的最大被覆問題について検討する。我々のゴールは、アルゴリズムの性能が期待されるだけでなく、高い信頼性で著しく異なるグラフに対する信頼性の高い確率制約設定を進化させることである。これにより、異なるタイプのアルゴリズムが、異なるタイプの制約設定にどのように対処できるかを学習し、理解し、自動アルゴリズム選択をサポートすることができる。本研究では、2つの確率探索アルゴリズムの性能を高い信頼性で区別する確率制約セットを提供する進化的アルゴリズムを開発する。まず、(1+1)〜EAの適合度関数として従来の近似比を用いてインスタンスを進化させ、信頼性のあるインスタンスを生成するのに不適切であることを示す。そこで本研究では,性能比の分散を考慮した2つのアルゴリズムの性能差を計算するための新しい尺度を提案する。実験の結果,本手法は性能比の不安定性の問題の解決に成功しており,様々なアルゴリズムの性能に大きく違いがあるような,信頼性の高い確率制約セットの進化に繋がることがわかった。 Chance-constrained problems involve stochastic components in the constraints which can be violated with a small probability. We investigate the impact of different types of chance constraints on the performance of iterative search algorithms and study the classical maximum coverage problem in graphs with chance constraints. Our goal is to evolve reliable chance constraint settings for a given graph where the performance of algorithms differs significantly not just in expectation but with high confidence. This allows to better learn and understand how different types of algorithms can deal with different types of constraint settings and supports automatic algorithm selection. We develop an evolutionary algorithm that provides sets of chance constraints that differentiate the performance of two stochastic search algorithms with high confidence. We initially use traditional approximation ratio as the fitness function of (1+1)~EA to evolve instances, which shows inadequacy to generate reliable instances. To address this issue, we introduce a new measure to calculate the performance difference for two algorithms, which considers variances of performance ratios. Our experiments show that our approach is highly successful in solving the instability issue of the performance ratios and leads to evolving reliable sets of chance constraints with significantly different performance for various types of algorithms.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# LLaMA-Reg : LLaMA 2による医用画像の無監督登録 LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration ( http://arxiv.org/abs/2405.18774v1 ) ライセンス: Link先を確認	Mingrui Ma, Yu Yang,	(参考訳) 医用画像登録は, 医用画像解析において重要な課題である。本稿では,事前訓練された大言語モデルを用いた医用画像登録手法を提案する。事前訓練された大言語モデルを用いて、医用画像の深い特徴を登録モデルにエンコードすることで、医用画像登録タスクにおける大言語モデルの可能性を示す画像登録精度を効果的に向上させることができる。デュアルエンコーダを用いて、画像ペアの深い特徴抽出を行い、事前訓練された大言語モデルに特徴を入力します。登録タスクに大規模な言語モデルを適応させるためには、登録モデルにおいて大きな言語モデルの重みを凍結し、大きな言語モデルを微調整するためにアダプタを利用する。 (a)大きな言語モデルコンピューティングの前に、視覚トークンを言語空間にマッピングする。 b) 大規模言語モデルから視覚空間へ出力されるモデル化された言語トークンを計画する。提案手法は,微調整された大言語モデルから出力する特徴と,各エンコーダ層から出力する特徴とを組み合わせて,デコーダの登録に必要な変形場を徐々に生成する。登録作業における大きな予測モデルの有効性を実証するため, 膝・脳MRI実験を行い, 最先端の結果を得た。 Medical image registration is an essential topic in medical image analysis. In this paper, we propose a method for medical image registration using a pretrained large language model. We find that using the pretrained large language model to encode deep features of the medical images in the registration model can effectively improve image registration accuracy, indicating the great potential of the large language model in medical image registration tasks. We use dual encoders to perform deep feature extraction on image pairs and then input the features into the pretrained large language model. To adapt the large language model to our registration task, the weights of the large language model are frozen in the registration model, and an adapter is utilized to fine-tune the large language model, which aims at (a) mapping the visual tokens to the language space before the large language model computing, (b) project the modeled language tokens output from the large language model to the visual space. Our method combines output features from the fine-tuned large language model with the features output from each encoder layer to gradually generate the deformation fields required for registration in the decoder. To demonstrate the effectiveness of the large prediction model in registration tasks, we conducted experiments on knee and brain MRI and achieved state-of-the-art results.	翻訳日:2024-05-30 18:48:25 公開日:2024-05-29
# LMO-DP:微分プライベート微調整(大規模)言語モデルのランダム化機構の最適化 LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models ( http://arxiv.org/abs/2405.18776v1 ) ライセンス: Link先を確認	Qin Yang, Meisam Mohammad, Han Wang, Ali Payani, Ashish Kundu, Kai Shu, Yan Yan, Yuan Hong,	(参考訳) 大規模訓練済みの大規模言語モデルのための厳密なプライバシを確保するために,DP-SGDとその変種を識別的にプライベートな確率勾配Descent (DP-SGD) が提案されている。しかし、特により強力なプライバシー体制(例えば、プライバシー予算$\epsilon < 3$)では、過度に勾配を乱し、精度を低下させるガウスのメカニズムに大きく依存している。このような制約に対処するため、我々はLMO-DP(Language Model-based Optimal Differential Privacy)メカニズムを提案する。これは、強力なプライバシ体制(例えば、$0.1\leq \epsilon<3$)であっても、高度に微調整された(大規模)言語モデルの厳密な構成を可能にするための第一歩である。さらに,提案手法は,雑音の大きさを著しく低減するサブ最適DPを効率的に導出する,新しいオフライン最適雑音探索法を提案する。例えば、SST-2データセット上の細調整のRoBERTa-large(300万のパラメータを持つ)は、ガウス機構(例えば、小さな$\epsilon$と$\delta$に対して$\sim 50\%$)を大幅に上回ることによって、92.20%の精度($\epsilon=0.3$, $\delta=10^{-10}$)を達成することができる。また,GPT-2におけるテキスト生成タスクについても同様の知見が得られた。最後に、私たちの知る限り、LMO-DPは強力な差分プライバシー保証を持つLlama-2を正確に微調整する最初のソリューションでもある。コードは間もなくリリースされ、要求に応じて利用可能になる。 Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $\epsilon < 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1\leq \epsilon<3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large (with 300M parameters) on the SST-2 dataset can achieve an accuracy of 92.20% (given $\epsilon=0.3$, $\delta=10^{-10}$) by drastically outperforming the Gaussian mechanism (e.g., $\sim 50\%$ for small $\epsilon$ and $\delta$). We also draw similar findings on the text generation tasks on GPT-2. Finally, to our best knowledge, LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees. The code will be released soon and available upon request.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# SPABA: 最適サンプル複素性を実現する単一ループ確率確率確率的二値アルゴリズム SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity ( http://arxiv.org/abs/2405.18777v1 ) ライセンス: Link先を確認	Tianshu Chu, Dachuan Xu, Wei Yao, Jin Zhang,	(参考訳) 機械学習における大規模ネスト最適化問題に対する確率的双レベル最適化手法は広く研究されているが、双レベル最適化を解くための最適複雑性境界がシングルレベル最適化と同一であるかどうかには疑問が残る。 SPABAは, (Li et al , 2021) における非凸最適化のための PAGE 法のバイレベル設定への適応であり, 有限サムおよび期待設定の両方において最適なサンプル複雑性を実現することができる。 PAGEを実装する際に,確率的双レベルと単一レベルの最適化の間に複雑性解析のギャップがないことを証明し,SPABAの最適性を示す。特に、 (Dagr\'eou et al , 2022) の結果によって示されるように、SGD や SAGA のような他の確率勾配推定器を実装する際には、複雑性解析のギャップが存在する可能性がある。 SPABAに加えて、我々は、我々の収束率と複雑性解析を利用して、最先端のサンプル複雑性結果に適合または改善する、他のいくつかのシングルループ確率的二値アルゴリズムを提案する。数値実験により提案手法の実用的な性能を実証した。 While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagr\'eou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms, that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 大規模言語モデルにおけるバイアスの定量化 Quantitative Certification of Bias in Large Language Models ( http://arxiv.org/abs/2405.18780v1 ) ライセンス: Link先を確認	Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh,	(参考訳) 大規模言語モデル(LLM)は、社会的バイアスを示し、ステレオタイプをサポートする応答を生成することができる。しかし、従来のベンチマークでは、大量のプロンプトにスケールできないため、LLMバイアスを徹底的に評価するには不十分であり、保証は提供されない。そこで本稿では,提案する新たな認証フレームワークであるQuaCer-B(Quantitative Certification of Bias)を提案する。証明書は、分布からサンプリングされた機密属性を含むプロンプトの集合に対して、LSMからバイアス応答を得る確率に関する高信頼境界から構成される。与えられた分布から様々な接頭辞を引いたプロンプトに対するLLMのバイアス認証について説明する。ランダムなトークンシーケンス,手動のジェイルブレイクの混合,およびLDMの埋め込み空間におけるジェイルブレイクの分布について検討し,そのバイアスを証明した。我々は、人気のあるLLMをQuaCer-Bで認証し、それらのバイアスに関する新しい洞察を示す。 Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it can not scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework QuaCer-B (Quantitative Certification of Bias) that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. A certificate consists of high-confidence bounds on the probability of obtaining biased responses from the LLM for any set of prompts containing sensitive attributes, sampled from a distribution. We illustrate the bias certification in LLMs for prompts with various prefixes drawn from given distributions. We consider distributions of random token sequences, mixtures of manual jailbreaks, and jailbreaks in the LLM's embedding space to certify its bias. We certify popular LLMs with QuaCer-B and present novel insights into their biases.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 変圧器におけるアテンションマスクとレイヤーノームの役割について On the Role of Attention Masks and LayerNorm in Transformers ( http://arxiv.org/abs/2405.18781v1 ) ライセンス: Link先を確認	Xinyi Wu, Amir Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie,	(参考訳) 自己注意(Self-attention)は、トランスフォーマーの鍵となるメカニズムであり、現代の基礎モデルの基本的な構成要素である。近年の研究では、深度が増加し、モデル表現率が制限され、モデル深度がさらに活用されるにつれて、純粋な自己意識がランク崩壊の度合いの上昇に悩まされることが示されている。しかし、既存の階位崩壊に関する文献は、階位崩壊問題を緩和するかもしれない変圧器の他の重要な要素を見落としている。本稿では,アテンションマスクとレイヤー正規化(LayerNorm)の影響を考慮した,自己注意下でのランク崩壊の一般解析を行う。特に、純粋なマスク付き注意は依然としてランク1部分空間への指数的崩壊に悩まされているが、局所マスク付き注意は崩壊速度を確実に遅くすることができる。 LayerNorm との自己アテンションの場合、ある値行列のクラスにおいて、ランク 1 の部分空間の崩壊が指数関数的に起こることを示す。しかし、非自明な反例の構築により、値行列の適切な選択により、列の一般類はランク 1 の部分空間に収束せず、LayerNorm の自己注意力学は1 とフルの任意のランクのリッチな平衡集合を同時に持つことができる。我々の結果は、LayerNormが自己注意のランク崩壊に何の役割も果たさないという以前の仮説を否定し、LayerNormとの自己意識が、当初考えられていたよりもはるかに表現力があり、多角的な非線形力学系を構成することを示唆している。 Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# プラグ・アンド・プレイプリミティブとしての拡散モデルを用いた原理的確率的イメージング Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors ( http://arxiv.org/abs/2405.18782v1 ) ライセンス: Link先を確認	Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman,	(参考訳) 拡散モデル (DM) は, 複雑な画像分布をモデル化する際, ベイジアン逆問題の解法として, 表現的画像先行を導出した。しかし、既存のDMベースの手法の多くは、生成過程の近似に依拠して異なる逆問題に一般化し、ベイズフレームワーク内で定義された対象の後方から逸脱する不正確なサンプル分布をもたらす。このような近似を避けつつ、DMの生成能力を活用するために、ガウスの擬似問題の後部サンプリングに還元して一般逆問題に対する後部サンプリングを行うマルコフ連鎖モンテカルロアルゴリズムを提案する。重要なことは、一般のDM定式化を統一インターフェースとして活用することで、最先端のDMを厳格に解決することができる。提案手法が実世界のブラックホールイメージング問題を含む6つの逆問題(3つの線形問題と3つの非線形問題)に対して有効であることを示す。実験結果から,提案手法は既存の DM 画像逆解析法と比較して,より高精度な再構成と後方推定が可能であることが示唆された。 Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 動的トンネル法による変分量子アルゴリズムの大域的最適化 Global optimization in variational quantum algorithms via dynamic tunneling method ( http://arxiv.org/abs/2405.18783v1 ) ライセンス: Link先を確認	Seung Park, Kyunghyun Baek, Seungjin Lee, Mahn-Soo Choi,	(参考訳) 動的トンネル流れを利用した変分量子アルゴリズムのグローバル最適化ルーチンを提案する。もともと、局所最小値の周辺で勾配に基づく最適化器が収集した情報を活用するために設計されたもので、従来の動的トンネル流を量子状態の距離測定に応用し、量子状態のパラメトリゼーションから生じる外在的縮退の問題を解消する。パラメータ空間上のユークリッド距離測定に基づく従来の動的トンネル法と比較しながら, 横フィールドイジングモデルに対する変分量子固有解法に適用し, ルーチンの性能を実証する。 We present a global optimization routine for the variational quantum algorithms, which utilizes the dynamic tunneling flow. Originally designed to leverage information gathered by a gradient-based optimizer around local minima, we adapt the conventional dynamic tunneling flow to exploit the distance measure of quantum states, resolving issues of extrinsic degeneracy arising from the parametrization of quantum states. Our global optimization algorithm is applied to the variational quantum eigensolver for the transverse-field Ising model to demonstrate the performance of our routine while comparing it with the conventional dynamic tunneling method, which is based on the Euclidean distance measure on the parameter space.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# LP-3DGS:3Dガウス平滑化の学習 LP-3DGS: Learning to Prune 3D Gaussian Splatting ( http://arxiv.org/abs/2405.18784v1 ) ライセンス: Link先を確認	Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan,	(参考訳) 近年, 3D Gaussian Splatting (3DGS) は, 高品質かつ高速なレンダリング速度のため, 新規ビュー合成 (NVS) の主流手法の1つとなっている。しかし、ポイントベースのシーン表現として、3DGSはシーンに適合する多数のガウスを発生させ、高いメモリ使用率をもたらす可能性がある。提案された改善には、経験的および予め設定されたプルーニング比または重要スコア閾値のいずれかが必要である。このようなハイパーパラメータは、各シーンのレンダリング品質を維持しながら、最大プルーニング比率を最適化し達成するために、複数のラウンドのトレーニングを必要とする。本研究では,学習から実践までの3DGS(Learning-to-prune 3DGS,LP-3DGS)を提案する。従来のストレートスルー推定器(STE)法を用いて2次元マスク勾配を近似する代わりに,Gumbel-Sigmoid法を用いてマスク機能を再設計し,既存の3DGSのトレーニングプロセスと差別化・互換性を持たせる。大規模な実験により、LP-3DGSは効率的かつ高品質な良好なバランスを保っていることが示されている。 Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset pruning ratio or importance score threshold to prune the point cloud. Such hyperparamter requires multiple rounds of training to optimize and achieve the maximum pruning ratio, while maintaining the rendering quality for each scene. In this work, we propose learning-to-prune 3DGS (LP-3DGS), where a trainable binary mask is applied to the importance score that can find optimal pruning ratio automatically. Instead of using the traditional straight-through estimator (STE) method to approximate the binary mask gradient, we redesign the masking function to leverage the Gumbel-Sigmoid method, making it differentiable and compatible with the existing training process of 3DGS. Extensive experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 量子回路コンパイルにおける最適マルチビットパスフィニングのための高速かつ適応的なアルゴリズム A Fast and Adaptable Algorithm for Optimal Multi-Qubit Pathfinding in Quantum Circuit Compilation ( http://arxiv.org/abs/2405.18785v1 ) ライセンス: Link先を確認	Gary J Mooney,	(参考訳) 量子コンピューティングは、様々な研究や産業分野において、複雑で古典的に難解な問題をシミュレートし、解決する能力を大幅に強化する可能性がある。しかし、我々は現在、デバイスが比較的小さく、かなりのノイズレベルに悩まされており、大規模な計算を禁止している、ノイズの多い中間量子(NISQ)時代にいる。この状態以降の量子的優位性を達成するためには、クビットデコヒーレンスと2量子ゲートからのノイズの影響を最小限に抑えることが不可欠である。直近のアプローチは、回路を物理デバイスにマッピングする量子回路コンパイルプロセスの最適化を改善することで、ノイズの多いゲートと回路実行時間を短縮することである。この研究は、量子回路のコンパイルマッピング問題における臨界サブルーチンとして、マルチキュービットパスフィンディングに焦点を当てている。回路SWAPゲート深さに対して量子ハードウェア上の量子ビットを最適にナビゲートする二進整数線形計画法を用いてモデル化したアルゴリズムを導入するとともに,蓄積したゲート誤差を最適化し,様々な問題修正に柔軟に適用することができる。このマルチキュービットパスフィンディングアルゴリズムは、ゲートエラーペナルティ、SWAP運動制約、およびソースおよびターゲットキュービット位置とキュービットチームの構成可能なアレンジメントを考慮に入れている。我々は、様々な量子ハードウェアレイアウトのアルゴリズムをベンチマークし、計算ランタイム、解SWAP深さ、累積SWAPゲート誤差率などの特性を評価した。結果は、現在の量子デバイスにおけるアルゴリズムの実践的ランタイムを示し、その効果を様々なハードウェア構成で比較し、将来の量子ハードウェア設計に対する洞察を与える。 Quantum computing has the potential to significantly enhance our ability to simulate and solve complex, classically intractable problems across various fields of research and industry. However, we are currently in the noisy intermediate-scale quantum (NISQ) era, where devices are relatively small and suffer from substantial noise levels, prohibiting large-scale computations. To achieve any quantum advantage in this regime and beyond, it is crucial to minimise the impact of noise from qubit decoherence and two-qubit gates. A direct approach is to improve the optimisation of quantum circuit compilation processes that map circuits onto physical devices, thereby reducing noisy gates and circuit execution times. This work focuses on multi-qubit pathfinding as a critical subroutine within the quantum circuit compilation mapping problem. We introduce an algorithm, modelled using binary integer linear programming, that navigates qubits on quantum hardware optimally with respect to circuit SWAP-gate depth, while also optimising for accumulated gate errors and can be flexibly adapted to various problem modifications. This multi-qubit pathfinding algorithm incorporates considerations for gate-error penalties, SWAP movement constraints, and configurable arrangements of source and target qubit locations and qubit teams. We have benchmarked the algorithm across a variety of quantum hardware layouts, assessing properties such as computational runtimes, solution SWAP depths, and accumulated SWAP-gate error rates. The results demonstrate the algorithm's practical runtimes on current quantum devices and compare its effectiveness across different hardware configurations, providing insights for future quantum hardware design.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# MOKD:最適化カーネル依存性の最大化によるFew-shot分類のためのクロスドメインファインタニング MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence ( http://arxiv.org/abs/2405.18786v1 ) ライセンス: Link先を確認	Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, Bo Han,	(参考訳) クロスドメインな小ショット分類において、 'emph{nearest centroid classifier} (NCC) は、サンプルと各クラスのプロトタイプの類似性を測定することで、少数ショット分類を行うことができる計量空間を構築するために表現を学ぶことを目的としている。 NCCの背後にある直感は、各サンプルは他のクラスのサンプルから押し離されながら、その標本が属するクラスセントロイドに近づくことである。しかし,本論文では,異なるクラスからの2つのサンプルの NCC 学習表現に高い類似性があることが判明した。この問題に対処するために、与えられたタスクのラベル付きデータで示されるクラスタ構造にマッチするクラス固有の表現の集合を学習するために、二段階最適化フレームワークである 'emph{maximizing Optimization kernel dependency} (MOKD) を提案する。特に、MOKDは最初に \emph{Hilbert-Schmidt Independence criterion} (HSIC) で採用されているカーネルを最適化し、より正確に依存を捉えることができる最適化されたカーネルHSIC (opt-HSIC) を得る。次に、オプトHSICに関する最適化問題に対処し、表現とラベル間の依存を同時に最大化し、全てのサンプル間の依存を最小限に抑える。 Meta-Datasetに関する大規模な実験により、MOKDは、ほとんどの場合、目に見えないドメインでのより優れた一般化性能を達成できるだけでなく、より良いデータ表現クラスタを学習できることが示されている。 MOKDのプロジェクトリポジトリは以下の通りである。 \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}。 In cross-domain few-shot classification, \emph{nearest centroid classifier} (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, \emph{maximizing optimized kernel dependence} (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in \emph{Hilbert-Schmidt independence criterion} (HSIC) to obtain the optimized kernel HSIC (opt-HSIC) that can capture the dependence more precisely. Then, an optimization problem regarding the opt-HSIC is addressed to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD can not only achieve better generalization performance on unseen domains in most cases but also learn better data representation clusters. The project repository of MOKD is available at: \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# マルチスケールDeep Feature Statistics を用いたオピニオン・ウインドウ・ブラインド画像品質評価 Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics ( http://arxiv.org/abs/2405.18790v1 ) ライセンス: Link先を確認	Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang,	(参考訳) 深層学習に基づく手法はブラインド画像品質評価(BIQA)の分野に大きな影響を与えてきたが、これらの手法は多量の人間の評価データを用いたトレーニングを必要とすることが多い。対照的に、従来の知識に基づく手法は訓練に費用対効果があるが、人間の視覚的知覚に沿った特徴を効果的に抽出する際の課題に直面している。これらのギャップを埋めるために、我々は、事前学習された視覚モデルから統計解析モデルへの深い特徴を、意見不明なBIQA(OU-BIQA)を達成するためのマルチスケールDeep Feature Statistics(MDFS)モデルに統合し、人間のレーティングデータへの依存をなくし、トレーニング効率を著しく改善することを提案する。具体的には、事前訓練された視覚モデルからパッチワイドなマルチスケール特徴を抽出し、その後、多変量ガウスモデル(MVG)に組み込む。テスト画像から派生したMVGモデルと、高品質な画像集合から派生したベンチマークMVGモデルとの距離を定量化して最終品質スコアを決定する。各種データセットを用いた総合的な実験の結果,提案モデルでは,最先端のBIQAモデルと比較して,人間の視覚知覚との整合性が良好であることが示された。さらに、多様なターゲット固有のBIQAタスク間での一般化性の向上を示す。私たちのコードは、https://github.com/eezkni/MDFSで利用可能です。 Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models, which are subsequently fitted into a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets show that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 決定論的RLポリティクスのインサンプルオフポリティ評価のためのカーネルメトリック学習 Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies ( http://arxiv.org/abs/2405.18792v1 ) ライセンス: Link先を確認	Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim,	(参考訳) 連続行動空間を有する環境における強化学習(RL)のための決定論的目標政策のオフ・ポリティクス評価(OPE)を検討する。 OPEの重要サンプリングは一般的に用いられるが,行動方針が目標方針から著しく逸脱した場合には,高いばらつきに悩まされる。この問題に対処するため、OPEに関する最近の研究は、重要な再サンプリングを伴うインサンプルラーニングを提案している。しかし、これらのアプローチは連続的な作用空間に対する決定論的対象ポリシーには適用できない。この制限に対処するために、カーネルを用いた決定論的ターゲットポリシーを緩和し、アクション値関数の推定時間差更新ベクトルの総平均二乗誤差を最小限に抑えるカーネルメトリクスを学習し、アクション値関数をポリシー評価に使用する。この緩和による推定誤差のバイアスと分散を導出し、最適なカーネル計量に対する解析解を提供する。種々のテスト領域を用いた実証実験において,カーネルを用いたサンプル内学習を用いたOPEは,他のベースラインよりも精度が大幅に向上することを示した。 We consider off-policy evaluation (OPE) of deterministic target policies for reinforcement learning (RL) in environments with continuous action spaces. While it is common to use importance sampling for OPE, it suffers from high variance when the behavior policy deviates significantly from the target policy. In order to address this issue, some recent works on OPE proposed in-sample learning with importance resampling. Yet, these approaches are not applicable to deterministic target policies for continuous action spaces. To address this limitation, we propose to relax the deterministic target policy using a kernel and learn the kernel metrics that minimize the overall mean squared error of the estimated temporal difference update vector of an action value function, where the action value function is used for policy evaluation. We derive the bias and variance of the estimation error due to this relaxation and provide analytic solutions for the optimal kernel metric. In empirical studies using various test domains, we show that the OPE with in-sample learning using the kernel with optimized metric achieves significantly improved accuracy than other baselines.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 距離空間における適応的離散化に基づく非エポゾディック強化学習 Adaptive Discretization-based Non-Episodic Reinforcement Learning in Metric Spaces ( http://arxiv.org/abs/2405.18793v1 ) ライセンス: Link先を確認	Avik Kar, Rahul Singh,	(参考訳) 状態作用空間を距離空間とし、遷移核と報酬をリプシッツ関数とするリプシッツ MDP に対する非絶対強化学習について検討する。計算効率の良い UCB-based algorithm, $\textit{ZoRL-}\epsilon$ は、状態-作用空間を適応的に離散化し、それらの後悔が $\epsilon$-optimal policy に対して $\mathcal{O}(\epsilon^{-(2d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)} として有界であることを示し、$d^\epsilon_z$ は $\epsilon$-zooming dimension である。対照的に、MDP の固定離散化にバニラ $\textit{UCRL-}2$ を使用する場合、後悔 w.r.t. a $\epsilon$-optimal policy scales as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)}$ として、d^\epsilon_z \ll d$ のとき、適応性は巨大になる。連続 MDP の大きな族に対する「一様良い」アルゴリズムの絶対的後悔は、少なくとも$\Omega(\log{(T)})$として漸近的にスケールする。適応的な離散化は、エピソード RL において $\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$ regret をもたらすことが示されているが、$d_z \to d$ as $T \to \infty$ であるから、持続時間が $T$ で増加する一定期間のエピソードを用いてこれを非エピソードケースに拡張しようとする試みは無駄である。現在の研究は、非エポゾディックRLに対する適応性ゲインを得る方法を示している。理論的結果は、連続的な状態作用空間を持つシステムに対する$\textit{ZoRL-}\epsilon$ と '$\textit{UCRL-C}$' の固定離散化に基づく $\textit{UCRL-}2$ の性能を比較する2つのシステムのシミュレーションによって支持される。 We study non-episodic Reinforcement Learning for Lipschitz MDPs in which state-action space is a metric space, and the transition kernel and rewards are Lipschitz functions. We develop computationally efficient UCB-based algorithm, $\textit{ZoRL-}\epsilon$ that adaptively discretizes the state-action space and show that their regret as compared with $\epsilon$-optimal policy is bounded as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)})$, where $d^\epsilon_z$ is the $\epsilon$-zooming dimension. In contrast, if one uses the vanilla $\textit{UCRL-}2$ on a fixed discretization of the MDP, the regret w.r.t. a $\epsilon$-optimal policy scales as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)})$ so that the adaptivity gains are huge when $d^\epsilon_z \ll d$. Note that the absolute regret of any 'uniformly good' algorithm for a large family of continuous MDPs asymptotically scales as at least $\Omega(\log{(T)})$. Though adaptive discretization has been shown to yield $\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$ regret in episodic RL, an attempt to extend this to the non-episodic case by employing constant duration episodes whose duration increases with $T$, is futile since $d_z \to d$ as $T \to \infty$. The current work shows how to obtain adaptivity gains for non-episodic RL. The theoretical results are supported by simulations on two systems where the performance of $\textit{ZoRL-}\epsilon$ is compared with that of '$\textit{UCRL-C}$,' the fixed discretization-based extension of $\textit{UCRL-}2$ for systems with continuous state-action spaces.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 参照アドバンテージ分解によるフェデレーションQ-Learning: ほぼ最適回帰と対数通信コスト Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost ( http://arxiv.org/abs/2405.18795v1 ) ライセンス: Link先を確認	Zhong Zheng, Haochen Zhang, Lingzhou Xue,	(参考訳) 本稿では,表在的マルコフ決定過程におけるモデル自由連合強化学習について考察する。中央サーバの協調の下で、複数のエージェントが協調して環境を探索し、生データを共有せずに最適なポリシーを学ぶ。フェデレートされたQ-ラーニングアルゴリズムの最近の進歩は、通信コストの低いほぼ直線的後悔のスピードアップを実現しているにもかかわらず、既存のアルゴリズムは情報バウンドよりも過度な後悔しか達成していない。本稿では,FedQ-Advantageと呼ばれる新しいモデルフリーなフェデレーションQ-ラーニングアルゴリズムを提案する。提案アルゴリズムは,分散低減のための参照アドバンテージ分解を利用して,エージェントとサーバ間の同期と,イベントによって引き起こされるポリシー更新という,2つの異なるメカニズムの下で動作する。本アルゴリズムは対数通信コストの低減だけでなく,時間的地平線が十分に大きい場合と比較して,対数係数とほぼ直線的後悔速度に制限された情報に到達し,ほぼ最適に後悔することを示す。 In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication cost, existing algorithms only attain suboptimal regrets compared to the information bound. We propose a novel model-free federated Q-learning algorithm, termed FedQ-Advantage. Our algorithm leverages reference-advantage decomposition for variance reduction and operates under two distinct mechanisms: synchronization between the agents and the server, and policy update, both triggered by events. We prove that our algorithm not only requires a lower logarithmic communication cost but also achieves an almost optimal regret, reaching the information bound up to a logarithmic factor and near-linear regret speedup compared to its single-agent counterpart when the time horizon is sufficiently large.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 物体学習型畳み込みニューラルネットワークによる顔処理 Face processing emerges from object-trained convolutional neural networks ( http://arxiv.org/abs/2405.18800v1 ) ライセンス: Link先を確認	Zhenhua Zhao, Ji Chen, Zhicheng Lin, Haojiang Ying,	(参考訳) 顔処理はドメイン固有の神経認知機構に依存するのか、それともドメイン全般の物体認識機構に依存しているのかは、長い間議論されてきた。これらの仮説をヒトで直接テストすることは、顔と物体の両方に広範囲に露出するため、難しいことが証明されている。ここでは、顔の露出なしに訓練できる畳み込みニューラルネットワーク(CNN)の最近の進歩に乗じて、これらの仮説を体系的に検証する。ドメイン・ジェネラル・メカニズムは、顔に特別な事前トレーニングを加えることなく、顔処理がニューラルネットワークから現れることを実証している。その結果、私たちはCNNを物体だけに訓練し、顔の認識と表現能力、および顔のように見える物体(顔のパレドリア刺激)をテストしました。文字制限のため、詳細は付加されたpdfを参照してください。 Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks (CNNs) that can be trained without face exposure (i.e., pre-trained weights). Domain-general mechanism accounts posit that face processing can emerge from a neural network without specialized pre-training on faces. Consequently, we trained CNNs solely on objects and tested their ability to recognize and represent faces as well as objects that look like faces (face pareidolia stimuli).... Due to the character limits, for more details see in attached pdf	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# SketchTriplet: 自己監督型Sketch-Text- Image Triplet生成 SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation ( http://arxiv.org/abs/2405.18801v1 ) ライセンス: Link先を確認	Zhenbei Wu, Qiang Wang, Jie Yang,	(参考訳) フリーハンドスケッチの不足は、難しい問題である。大規模なスケッチデータセットの出現にもかかわらず、これらのデータセットは主に単一のオブジェクトレベルでのスケッチで構成されている。シーンスケッチ用の大規模なペアデータセットは引き続き欠如している。本稿では,既存のシーンスケッチに依存しないシーンスケッチ自動生成手法を提案し,シーンスケッチへの単一オブジェクトスケッチの変換を可能にする。そこで本研究では,ベクトルスケッチキャプションとスケッチセマンティック展開のための手法を提案する。さらに,マルチモーダルな知覚制約を融合したスケッチ生成ネットワークを設計し,ゼロショット画像・スケッチダウンストリームタスクに適用し,実験検証による最先端性能の実証を行う。最後に,提案したスケッチ・ツー・スケッチ生成手法を利用して,シーン・スケッチを中心にした大規模データセットをコントリビュートする。本研究は,スケッチベース画像検索およびスケッチ制御画像合成タスクにおいて,既存のモデルの性能を大幅に向上させることができることを確認した。データセットとコードを公開します。 The scarcity of free-hand sketch presents a challenging problem. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multi-modal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets. Our research confirms that this dataset can significantly enhance the capabilities of existing models in sketch-based image retrieval and sketch-controlled image synthesis tasks. We will make our dataset and code publicly available.	翻訳日:2024-05-30 18:38:40 公開日:2024-05-29
# 更新ダイジェストと投票に基づく防衛を用いたフェデレーション学習におけるセキュリティとプライバシの強化 Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense ( http://arxiv.org/abs/2405.18802v1 ) ライセンス: Link先を確認	Wenjie Li, Kai Fan, Jingyuan Zhang, Hui Li, Wei Yang Bryan Lim, Qiang Yang,	(参考訳) Federated Learning(FL)は、データ所有者がデータのローカライズを維持しながら、モデルの共同トレーニングを可能にする、有望なプライバシ保護機械学習パラダイムである。その可能性にもかかわらず、FLはクライアントとサーバの両方の信頼性に関する課題に直面している。本稿では,分散学習環境におけるビザンチン攻撃に対するプライバシー保護と耐性の重要な問題に対処する,新しいフレームワークである \underline{\textbf{F}}ederated \underline{\textbf{L}}earning with \underline{\textbf{U}}pdate \underline{\textbf{D}}igest (FLUD)を紹介する。 FLUDは、$\mathsf{LinfSample}$メソッドという革新的なアプローチを採用している。このダイジェストにより、サーバは共有距離行列を計算し、セキュアなマルチパーティ計算(SMPC)に関連するオーバーヘッドを3桁に減らし、良質な更新と悪質な更新を効果的に区別することができる。さらにFLUDは、通信ラウンドを最小化するために最適化されたSMPCプロトコルを使用するプライバシー保護、投票ベースの防衛メカニズムを統合している。包括的実験では、通信の低さと実行時のオーバーヘッドを伴いながら、ビザンチンの敵に対するFLUDの有効性を実証した。 FLUDは、分散環境におけるセキュアで信頼性の高いFLのためのスケーラブルなフレームワークを提供する。 Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named \underline{\textbf{F}}ederated \underline{\textbf{L}}earning with \underline{\textbf{U}}pdate \underline{\textbf{D}}igest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $l_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# ニューラルネットワークにおけるセミリング活性化 Semiring Activation in Neural Networks ( http://arxiv.org/abs/2405.18805v1 ) ライセンス: Link先を確認	Bart M. N. Smets, Peter D. Donker, Jim W. Portegies, Remco Duits,	(参考訳) ニューラルネットワークでの使用に適したセミリングに基づいて、トレーニング可能な非線形演算子のクラスを導入する。これらの作用素は、ニューラルネットワークにおける活性化関数を持つ線形作用素の伝統的な交替を一般化する。セミリング(英: Semiring)は、線形性の一般化された表記を記述する代数的構造であり、ニューラルネットワークに含まれる訓練可能な作用素の範囲を大きく広げている。実際、最大または最小プール演算は、固定された核を持つ熱帯半環の畳み込みである。トレーニング可能なセミリング演算子の活性化関数を置き換える実験を行い、これらが完全に接続されただけでなく畳み込みニューラルネットワーク(ConvNeXt)にも適用可能であることを示す。本稿では,従来のアクティベーション関数をトレーニング可能なセミリングアクティベーションに置き換えることの課題と,そのトレードオフについて論じる。 We introduce a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions in neural networks. Semirings are algebraic structures that describe a generalised notation of linearity, greatly expanding the range of trainable operators that can be included in neural networks. In fact, max- or min-pooling operations are convolutions in the tropical semiring with a fixed kernel. We perform experiments where we replace the activation functions for trainable semiring-based operators to show that these are viable operations to include in fully connected as well as convolutional neural networks (ConvNeXt). We discuss some of the challenges of replacing traditional activation functions with trainable semiring activations and the trade-offs of doing so.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# BRACTIVE:人間の視覚脳学習における脳活動的アプローチ BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning ( http://arxiv.org/abs/2405.18808v1 ) ライセンス: Link先を確認	Xuan-Bac Nguyen, Hojin Jang, Xin Li, Samee U. Khan, Pawan Sinha, Khoa Luu,	(参考訳) 人間の脳は、非常に効率的な処理ユニットであり、その仕組みを理解することによって、機械学習における新しいアルゴリズムとアーキテクチャを刺激することができる。本研究では,脳活動ネットワーク(BRACTIVE)という,人間の視覚脳を研究するためのトランスフォーマーベースのアプローチを紹介する。 BRACTIVEの主な目的は、被験者の視覚的特徴をfMRI信号を介して対応する脳表現と整合させることである。これにより、被験者の脳の関心領域(ROI)を特定できます。従来の脳研究手法とは異なり、1つの被験者のROIしか識別できず、被験者数によって制限されているが、BRACTIVEは自動的に複数の被験者とROIに識別を拡張している。実験の結果, BRACTIVEは, 顔や身体選択領域などの興味のある領域を効果的に同定し, 神経科学的な所見と整合し, 様々な対象カテゴリーに適用可能であることが示された。さらに重要なのは、人間の視覚的脳活動を利用して、ディープニューラルネットワークを誘導することで、さまざまなベンチマークのパフォーマンスが向上することです。これは、神経科学と機械知能研究の両方においてBRACTIVEの可能性を促進する。 The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROI) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. It encourages the potential of BRACTIVE in both neuroscience and machine intelligence studies.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# UniPTS: 熟練したポストトレーニングスパシティのための統一フレームワーク UniPTS: A Unified Framework for Proficient Post-Training Sparsity ( http://arxiv.org/abs/2405.18810v1 ) ライセンス: Link先を確認	Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji,	(参考訳) Post-training Sparsity (PTS)は、必要な限られたデータで効率的なネットワークスパシティを追求する、最近登場した道である。しかし、既存のPSS手法は、データセット全体を通してスパースネットワークをリトレーニングする従来の手法と比較して、特に高空間比で性能が著しく低下している。本稿では,従来のスパシティの性能をPSSの文脈に大きく変化させる3つの基本因子を変換することで,この相違を解消しようとする。特に本研究は,(1)高密度ネットワークからスパースネットワークへの効率的な知識伝達を促進するベースデケイド・スパシティーの目的から成っている。 2) PTS の小型キャリブレーションに過度な適合を回避しつつ,最適空間分布を推定する探索アルゴリズムについて検討した。 (3) トレーニング安定性を確保しつつ, 空間構造を包括的に最適化することを目的とした, 事前の側面を前提としたダイナミックスパーストレーニングの実施。提案するフレームワークはUniPTSと呼ばれ,既存のPTS手法よりも広範なベンチマークで優れていることが検証されている。図示として、最近提案されたレシピであるPOTのパフォーマンスを3.9%から68.6%に向上させ、ImageNet上でResNet-50を90%の間隔でプルーニングする。論文のコードはhttps://github.com/xjjxmu/UniPTS.comで公開しています。 Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# MindSemantix:脳-言語モデルによる脳視覚体験の解読 MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model ( http://arxiv.org/abs/2405.18812v1 ) ライセンス: Link先を確認	Ziqi Ren, Jie Li, Xuetong Xue, Xin Li, Fan Yang, Zhicheng Jiao, Xinbo Gao,	(参考訳) fMRIで捉えた脳の活動を通して人間の視覚体験を解読することは、神経科学研究の分野における魅力的な最先端の課題である。観察画像自体を単に予測するのではなく、脳活動を意味のあるキャプションにデコードすることで、視覚情報の高レベルな解釈と要約が可能になり、現実の状況における応用の柔軟性が自然に向上する。本研究では,脳活動における視覚的に誘発される意味的内容の理解を可能にする,新しいマルチモーダルフレームワークであるMindSemantixを紹介する。私たちのMindSemantixは、LLMを脳活動分析に織り込み、シームレスでエンドツーエンドのBrain-Language Modelを構築することで、より理想的な脳キャプションパラダイムを探求しています。脳の応答から意味情報を効果的に捉えるために,脳Q-Formerをコアアーキテクチャとして利用するBrain-Text Transformerを提案する。トレーニング済みの脳エンコーダと凍結LDMを統合して、脳ビジョン言語を多モードでアライメントし、堅牢な脳-言語対応を確立する。神経表現の一般化性を高めるために,脳エンコーダを自己教師付き学習技術を用いて,大規模・クロスオブジェクトfMRIデータセット上で事前訓練する。 MindSemantixは、刺激再構成のような下流脳のデコードタスクに、より実現可能性を提供します。 MindSemantixのキャプションにより、私たちのフレームワークは、安定拡散のような高度な生成モデルと統合し、脳の視覚的知覚を理解することを促進する。 MindSemantixは、脳の活動から派生した視覚的および意味的な情報に深く根ざした高品質なキャプションを生成する。このアプローチは、先行技術よりも相当に定量的に改善されている。私たちのコードは解放されます。 Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge in the field of neuroscience research. Compared to merely predicting the viewed image itself, decoding brain activity into meaningful captions provides a higher-level interpretation and summarization of visual information, which naturally enhances the application flexibility in real-world situations. In this work, we introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity. Our MindSemantix explores a more ideal brain captioning paradigm by weaving LLMs into brain activity analysis, crafting a seamless, end-to-end Brain-Language Model. To effectively capture semantic information from brain responses, we propose Brain-Text Transformer, utilizing a Brain Q-Former as its core architecture. It integrates a pre-trained brain encoder with a frozen LLM to achieve multi-modal alignment of brain-vision-language and establish a robust brain-language correspondence. To enhance the generalizability of neural representations, we pre-train our brain encoder on a large-scale, cross-subject fMRI dataset using self-supervised learning techniques. MindSemantix provides more feasibility to downstream brain decoding tasks such as stimulus reconstruction. Conditioned by MindSemantix captioning, our framework facilitates this process by integrating with advanced generative models like Stable Diffusion and excels in understanding brain visual perception. MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity. This approach has demonstrated substantial quantitative improvements over prior art. Our code will be released.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 反復的破壊軌道マッチングによる線形逆問題に対するフロー優先法 Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching ( http://arxiv.org/abs/2405.18816v1 ) ライセンス: Link先を確認	Yasi Zhang, Peiyu Yu, Yaxuan Zhu, Yingshan Chang, Feng Gao, Ying Nian Wu, Oscar Leong,	(参考訳) フローマッチングに基づく生成モデルは、高解像度画像合成において、その単純さと優れた性能のために大きな注目を集めている。変数の即時変化式を利用することで、学習フローから直接画像可能性を計算することができ、逆問題などの下流タスクの先行候補として候補を魅了することができる。特に、そのような画像確率をMAP推定問題に組み込むことが自然なアプローチである。しかし、大きな障害は、ODEソルバをバックプロパゲートする必要があるため、ログのような計算が遅いことにある。本研究では,MAP推定器を効率的に近似し,様々な線形逆問題の解法を提案する。我々のアルゴリズムは、MAPの目的を関数評価の数である$N$ ``local MAP'の目的の和で近似できるという観察によって数学的に正当化されている。ツイーディの公式を利用することで、これらの目的を逐次最適化するために勾配ステップを実行できることを示す。我々は,超解法,デブロアリング,インペインティング,圧縮センシングなどの線形逆問題に対するアプローチを検証し,フローマッチングに基づく他の手法よりも優れていることを示す。 Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis. By leveraging the instantaneous change-of-variables formula, one can directly compute image likelihoods from a learned flow, making them enticing candidates as priors for downstream tasks such as inverse problems. In particular, a natural approach would be to incorporate such image probabilities in a maximum-a-posteriori (MAP) estimation problem. A major obstacle, however, lies in the slow computation of the log-likelihood, as it requires backpropagating through an ODE solver, which can be prohibitively slow for high-dimensional problems. In this work, we propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems. Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ ``local MAP'' objectives, where $N$ is the number of function evaluations. By leveraging Tweedie's formula, we show that we can perform gradient steps to sequentially optimize these objectives. We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing, and demonstrate that we can outperform other methods based on flow matching.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 効率的な持続型位相最適化のための微分型補間法 Diffeomorphic interpolation for efficient persistence-based topological optimization ( http://arxiv.org/abs/2405.18820v1 ) ライセンス: Link先を確認	Mathieu Carriere, Marc Theveneau, Théo Lacombe,	(参考訳) トポロジカルデータ分析(TDA)は、構造化オブジェクトから定量的トポロジカル記述子を抽出するパイプラインを提供する。これにより、ある対象が幾らかの位相的性質を示す範囲を主張する位相的損失函数の定義が可能になる。これらの損失は、トポロジカル最適化勾配降下ルーチンの実行に使用できる。勾配は極めてスパースである傾向があるが、一般に損失関数は入力対象のごくわずかな座標にのみ依存するので、実際は劇的に遅い最適化スキームが得られるので、点雲の位相最適化の中心的なケースに着目して、微分型補間を用いてこの制限を克服し、スパース勾配を空間全体上で定義された滑らかなベクトル場に、量子化リプシッツ定数で変換する。特に,本手法は,TDAで日常的に使用されるサブサンプリング手法と効率的に組み合わせることで,サブサンプル上で計算された勾配から導出される微分同相法を用いて,全入力オブジェクトの座標を更新し,前例のないスケールで点雲の位相最適化を行うことができることを示す。最後に,ブラックボックスオートエンコーダ(AE)正則化に対する我々のアプローチの妥当性を示すとともに,固定,事前学習,ブラックボックスAEモデルに関連する潜在空間のトポロジ的事前適用を目標とし,微分型フローの学習を一度に行うことができ,線形時間で新たなデータに再適用可能であることを示す(ただし,バニラトポロジ的最適化はスクラッチから再実行する必要がある)。さらに、フローを反転させることで、トポロジ的に最適化された潜在空間を直接サンプリングすることでデータを生成することができ、モデルのより優れた解釈可能性が得られる。 Topological Data Analysis (TDA) provides a pipeline to extract quantitative topological descriptors from structured objects. This enables the definition of topological loss functions, which assert to what extent a given object exhibits some topological properties. These losses can then be used to perform topological optimizationvia gradient descent routines. While theoretically sounded, topological optimization faces an important challenge: gradients tend to be extremely sparse, in the sense that the loss function typically depends on only very few coordinates of the input object, yielding dramatically slow optimization schemes in practice.Focusing on the central case of topological optimization for point clouds, we propose in this work to overcome this limitation using diffeomorphic interpolation, turning sparse gradients into smooth vector fields defined on the whole space, with quantifiable Lipschitz constants. In particular, we show that our approach combines efficiently with subsampling techniques routinely used in TDA, as the diffeomorphism derived from the gradient computed on a subsample can be used to update the coordinates of the full input object, allowing us to perform topological optimization on point clouds at an unprecedented scale. Finally, we also showcase the relevance of our approach for black-box autoencoder (AE) regularization, where we aim at enforcing topological priors on the latent spaces associated to fixed, pre-trained, black-box AE models, and where we show thatlearning a diffeomorphic flow can be done once and then re-applied to new data in linear time (while vanilla topological optimization has to be re-run from scratch). Moreover, reverting the flow allows us to generate data by sampling the topologically-optimized latent space directly, yielding better interpretability of the model.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 自由な毒性検出 Toxicity Detection for Free ( http://arxiv.org/abs/2405.18822v1 ) ライセンス: Link先を確認	Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner,	(参考訳) 現在のLSMは一般に安全要件に従うように調整されており、有害なプロンプトを拒否する傾向がある。しかし、LSMは有害なプロンプトを拒絶したり、過度に注意し、良心的な例を拒否することができない。さらに、最先端の毒性検知器は、低いFPRで低いTPRを持ち、有害な例が稀な現実世界のアプリケーションに高いコストをもたらす。本稿では,LSM自体から直接抽出した情報を用いて有害なプロンプトを検出するMULI(Moduleration Using LLM Introspection)について検討する。代替拒絶反応の分布と第1応答トークンのロジットの分布において,良性と有毒なプロンプトの間に有意な差が認められた。特定の開始トークンのロバストに基づく玩具モデルでは、トレーニングや追加の計算コストを必要とせず、信頼性の高い性能が得られることを示す。我々は、複数の測定値の下でSOTA検出器をはるかに上回る、第1応答トークンロジットのスパースロジスティック回帰モデルを用いて、よりロジスティックな検出器を構築する。 Current LLMs are generally aligned to follow safety requirements and tend to refuse toxic prompts. However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples. In addition, state-of-the-art toxicity detectors have low TPRs at low FPR, incurring high costs in real-world applications where toxic examples are rare. In this paper, we explore Moderation Using LLM Introspection (MULI), which detects toxic prompts using the information extracted directly from LLMs themselves. We found significant gaps between benign and toxic prompts in the distribution of alternative refusal responses and in the distribution of the first response token's logits. These gaps can be used to detect toxicities: We show that a toy model based on the logits of specific starting tokens gets reliable performance, while requiring no training or additional computational cost. We build a more robust detector using a sparse logistic regression model on the first response token logits, which greatly exceeds SOTA detectors under multiple metrics.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# エネルギーシステムにおける強化学習はなぜ説明を必要とするのか Why Reinforcement Learning in Energy Systems Needs Explanations ( http://arxiv.org/abs/2405.18823v1 ) ライセンス: Link先を確認	Hallah Shahid Butt, Benjamin Schäfer,	(参考訳) 経済発展に伴い、インフラの複雑さは劇的に増大した。同様に、化石燃料から再生可能エネルギー源への移行によって、正確な予測と予測だけでなく、予測のプロセスの理解にも役立つようなシステムが必要である。人工知能と機械学習技術は、エネルギーセクターのさまざまな問題に対する優れたソリューションを見つけるのに役立っている。しかし、強化学習のような最先端技術の使用は驚くべきことではない。本稿では,エネルギーシステムにおける強化技術の適用と,これらのモデルの説明がいかに役立つかについて論じる。 With economic development, the complexity of infrastructure has increased drastically. Similarly, with the shift from fossil fuels to renewable sources of energy, there is a dire need for such systems that not only predict and forecast with accuracy but also help in understanding the process of predictions. Artificial intelligence and machine learning techniques have helped in finding out wellperforming solutions to different problems in the energy sector. However, the usage of state-of-the-art techniques like reinforcement learning is not surprisingly convincing. This paper discusses the application of reinforcement techniques in energy systems and how explanations of these models can be helpful	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# グラフニューラルネットワークに対するラベル伝搬に基づくノード注入攻撃 Node Injection Attack Based on Label Propagation Against Graph Neural Network ( http://arxiv.org/abs/2405.18824v1 ) ライセンス: Link先を確認	Peican Zhu, Zechen Pan, Keke Tang, Xiaodong Cui, Jinhuan Wang, Qi Xuan,	(参考訳) グラフニューラルネットワーク(GNN)は,ノード分類,リンク予測,グラフ分類など,さまざまなグラフ学習タスクにおいて顕著な成功を収めている。 GNNの成功の鍵は、近隣の集約による効果的な構造情報表現にある。しかし、攻撃者は偽ノードを注入することで容易に集約プロセスを摂動でき、グラフインジェクション攻撃に対してGNNが脆弱であることを明らかにする。既存のグラフインジェクション攻撃法は主に、ラベル伝搬による近隣の集約プロセスを見下ろしながら、古典的な特徴集約プロセスの損傷に焦点を当てている。このギャップを埋めるために,ノード分類タスクに対してグラフ注入攻撃を行うラベル伝搬型グローバルインジェクションアタック(LPGIA)を提案する。具体的には,ラベル伝播の観点から集約プロセスを解析し,グラフ注入攻撃問題をグローバルなインジェクションラベル特異性攻撃問題に変換する。この問題を解決するため、LPGIAはラベル伝搬に基づく戦略を用いて、注入ノードに接続されたノードの組み合わせを最適化する。次に、LPGIAは機能マッピングを利用して、注入されたノードの悪意ある機能を生成する。代表的GNNに対する広範な実験において、LPGIAは様々なデータセットにおいて、これまでの最も優れたインジェクション攻撃法よりも優れており、その優位性と転送性を示している。 Graph Neural Network (GNN) has achieved remarkable success in various graph learning tasks, such as node classification, link prediction and graph classification. The key to the success of GNN lies in its effective structure information representation through neighboring aggregation. However, the attacker can easily perturb the aggregation process through injecting fake nodes, which reveals that GNN is vulnerable to the graph injection attack. Existing graph injection attack methods primarily focus on damaging the classical feature aggregation process while overlooking the neighborhood aggregation process via label propagation. To bridge this gap, we propose the label-propagation-based global injection attack (LPGIA) which conducts the graph injection attack on the node classification task. Specifically, we analyze the aggregation process from the perspective of label propagation and transform the graph injection attack problem into a global injection label specificity attack problem. To solve this problem, LPGIA utilizes a label propagation-based strategy to optimize the combinations of the nodes connected to the injected node. Then, LPGIA leverages the feature mapping to generate malicious features for injected nodes. In extensive experiments against representative GNNs, LPGIA outperforms the previous best-performing injection attack method in various datasets, demonstrating its superiority and transferability.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 最小資源を持つ量子ビット上のデバイス非依存次元リークヌルテスト Device-independent dimension leakage null test on qubits with minimal resources ( http://arxiv.org/abs/2405.18827v1 ) ライセンス: Link先を確認	Tomasz Rybotycki, Tomasz Białecki, Josep Batle, Adam Bednorz,	(参考訳) 我々は、デバイス独立であり、最小限の異なる実験を必要とするキュービットの2レベル空間のヌルテストを構築する。ほとんどの量子ビットは10以上の標準偏差でテストに失敗する。脱コヒーレンスや位相シフトといった一般的な技術的欠陥に対するテストの堅牢さは、逸脱の起源が既知の効果を超えていることを示している。 We construct a null test of the two-level space of a qubit, which is both device independent and needs a minimal number of different experiments. We demonstrate its feasibility on IBM Quantum, with most qubits failing the test by more than 10 standard deviations. The robustness of the test against common technical imperfections, like decoherence and phase shifts, and supposedly negligible leakage, indicates that the origin of deviations is beyond known effects.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# CHANI: 生体吸入によるニューロンの相関に基づくホークス凝集 CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration ( http://arxiv.org/abs/2405.18828v1 ) ライセンス: Link先を確認	Sophie Jaffard, Samuel Vaiter, Patricia Reynaud-Bouret,	(参考訳) 本研究の目的は,生物学にインスパイアされたニューラルネットワークが,局所的な変換のみによって分類タスクを学習できることを数学的に証明することである。そこで本研究では,CHANI(Correlation-based Hawkes Aggregation of Neurons with Bio-Inspiration)と呼ばれるスパイキングニューラルネットワークを提案する。シナプス重みはエキスパートアグリゲーションアルゴリズムによって更新され、局所的で単純な学習ルールを提供する。ネットワークが平均的かつ漸近的に学習できることを証明することができたのです。さらに、ネットワークが複数のクラスをエンコードし、中間層内の同じニューロンを複数のクラスで活性化できるという意味で、神経集合を自動生成することを示し、合成データセット上で数値シミュレーションを行った。この理論的なアプローチは、生物学的にインスパイアされたネットワークの従来の実証的な検証とは対照的であり、局所的な学習規則によって神経細胞が複雑な概念を表現できるアセンブリを形成する方法を理解するための道を開く。 The present work aims at proving mathematically that a neural network inspired by biology can learn a classification task thanks to local transformations only. In this purpose, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neurons activity is modeled by Hawkes processes. Synaptic weights are updated thanks to an expert aggregation algorithm, providing a local and simple learning rule. We were able to prove that our network can learn on average and asymptotically. Moreover, we demonstrated that it automatically produces neuronal assemblies in the sense that the network can encode several classes and that a same neuron in the intermediate layers might be activated by more than one class, and we provided numerical simulations on synthetic dataset. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 3次元視覚質問応答ベンチマークによるゼロショットGPT-4Vの性能評価 Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks ( http://arxiv.org/abs/2405.18831v1 ) ライセンス: Link先を確認	Simranjit Singh, Georgios Pavlakos, Dimitrios Stamoulis,	(参考訳) 基礎モデルの文脈における3次元視覚質問回答(VQA)問題の「修正」に関心があるため、これらの新しいパラダイムが既存の閉語彙データセットにどのように影響するかを評価することが不可欠である。本稿では,基礎モデル(GPT-4 Vision and GPT-4)のゼロショット性能を,確立された3次元VQAベンチマーク,すなわち3D-VQAとScanQAで評価する。我々は,従来のモデリング手法と比較して,GPTに基づくエージェントの性能を文脈化するための調査を行う。我々は,GPTをベースとしたエージェントが,クローズドボキャブラリのアプローチと同等に機能することを発見した。我々の研究は,閉鎖語彙設定において,「盲」モデルが驚くほど強いベースラインを確立するという最近の知見を裏付けるものである。エージェントは,テキストのテキストグラウンド化によって,シーン固有の語彙から大きな利益を享受できることを実証する。これまでのベースラインと事前比較を行うことで、マルチモーダルな3Dベンチマークを改良するためのコミュニティの継続的な取り組みを知らせることを期待します。 As interest in "reformulating" the 3D Visual Question Answering (VQA) problem in the context of foundation models grows, it is imperative to assess how these new paradigms influence existing closed-vocabulary datasets. In this case study, we evaluate the zero-shot performance of foundational models (GPT-4 Vision and GPT-4) on well-established 3D VQA benchmarks, namely 3D-VQA and ScanQA. We provide an investigation to contextualize the performance of GPT-based agents relative to traditional modeling approaches. We find that GPT-based agents without any fine-tuning perform on par with the closed vocabulary approaches. Our findings corroborate recent results that "blind" models establish a surprisingly strong baseline in closed-vocabulary settings. We demonstrate that agents benefit significantly from scene-specific vocabulary via in-context textual grounding. By presenting a preliminary comparison with previous baselines, we hope to inform the community's ongoing efforts to refine multi-modal 3D benchmarks.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# MoNDE: 大規模スパースモデルのためのニアデータエキスパートの混合 MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models ( http://arxiv.org/abs/2405.18832v1 ) ライセンス: Link先を確認	Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim,	(参考訳) MoE(Mixture-of-Experts)の大規模言語モデル(LLM)は、GPUメモリ容量を超えることが多いメモリ要件を持ち、二次記憶から専門計算のためのGPUへのコストのかかるパラメータ移動を必要とする。そこで本研究では,MoE LLM推論を効率的に実現するニアデータ・コンピューティング・ソリューションであるMixture of Near-Data Experts(MoNDE)を提案する。 MoNDEは、$\textit{hot}$専門家だけをGPUに転送し、残りの$\textit{cold}$専門家をホストメモリデバイス内で計算することで、MoEパラメータの運動量を削減している。大規模な専門家パラメータの転送を小さなアクティベーションに置き換えることで、MoNDEは通信効率のよいMoE推論を可能にし、エンコーダとデコーダの両方で既存のパラメータオフロードフレームワークを大幅に高速化する。 Mixture-of-Experts (MoE) large language models (LLM) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the $\textit{hot}$ experts to the GPU, while computing the remaining $\textit{cold}$ experts inside the host memory device. By replacing the transfers of massive expert parameters with the ones of small activations, MoNDE enables far more communication-efficient MoE inference, thereby resulting in substantial speedups over the existing parameter offloading frameworks for both encoder and decoder operations.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# Do Finetti: 交換可能なデータに対する因果効果について Do Finetti: On Causal Effects for Exchangeable Data ( http://arxiv.org/abs/2405.18836v1 ) ライセンス: Link先を確認	Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf,	(参考訳) データをi.d.d.(独立で同一の分散)でない環境での因果効果の推定について検討する。我々は、独立因果関係の仮定を満たす交換可能なデータに焦点を当てる。従来の因果効果推定フレームワーク(例:構造因果モデルとdo-calculus)は、通常、i.d.データに制限され、複数の環境データに自然に発生する、より一般的な交換可能な生成プロセスに拡張されない。このギャップに対処するために、我々は、交換可能なデータのための一般化されたフレームワークを開発し、我々の設定における因果効果の同定と推定を容易にする、切り離された因果分解公式を導入する。潜在的な応用を説明するために、我々は因果的P\'olya urnモデルを導入し、介入が交換可能なデータ設定においてどのように影響を伝播するかを示す。最後に,マルチ環境データから因果探索と効果推定を同時に行うアルゴリズムを開発した。 We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal P\'olya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.	翻訳日:2024-05-30 18:28:55 公開日:2024-05-29
# 量子回路パラメータ化のための変圧器 Transformer for Parameterized Quantum Circuits Expressibility Prediction ( http://arxiv.org/abs/2405.18837v1 ) ライセンス: Link先を確認	Fei Zhang, Jie Li, Zhimin He, Haozhen Situ,	(参考訳) 特定の問題に対する指数関数的に高速な計算により、近年は量子コンピューティングに大きな注目を集めている。変分量子アルゴリズム(VQA)は、量子コンピューティングを実装する上で重要な手法であり、適切なタスク固有アンサッツにより、VQAの量子優位性を効果的に向上させることができる。しかし、膨大な検索スペースは、最適なタスク固有のアンサッツを見つけるのを困難にしている。ヒルベルト空間を効果的に探索するために量子状態の多様性を定量化する表現性は、一方のアンザッツが他方よりも優れているかどうかを評価するために用いられる。本研究では,パラメータ化量子回路の表現性予測におけるトランスフォーマーモデルの有効性について検討した。ゲートワイズ法により生成されるノイズレス回路を含む2つのデータセットを構築し, 量子ビット, ゲート数, 深さで変動する。回路はグラフに変換され、その表現性はKL分割と相対KL分割を用いて計算される。 Transformerモデルはこれらのデータセットに基づいてトレーニングされ、回路特性と表現性の間の複雑な関係をキャプチャする。 5つの評価指標を算出し, 実験結果から, 種々の表現可能性計算法において, 訓練されたモデルが高い性能とロバスト性を達成することを示す。本研究は、効率的な量子回路設計のアイデアを提供し、量子アーキテクチャ探索法の進歩に寄与することができる。 With the exponentially faster computation for certain problems, quantum computing has garnered significant attention in recent years. Variational Quantum Algorithm (VQA) is a crucial method to implement quantum computing, and an appropriate task-specific ansatz can effectively enhance the quantum advantage of VQAs. However, the vast search space makes it challenging to find the optimal task-specific ansatz. Expressibility, quantifying the diversity of quantum states to explore the Hilbert space effectively, can be used to evaluate whether one ansatz is superior than another. This study investigates the effectiveness of the Transformer model in predicting the expressibility of parameterized quantum circuits. We construct two datasets containing noiseless circuits generated by the gatewise method, varying in qubits, gate numbers and depths. The circuits are transformed into graphs, and then their expressibility are calculated using KL-divergence and Relative KL-divergence. A Transformer model is trained on these datasets to capture the intricate relationships between circuit characteristics and expressibility. Five evaluation metrics are calculated, and experimental results demonstrate that the trained model achieves high performance and robustness across various expressibility calculation methods. This research provides ideas for efficient quantum circuit design and can contribute to the advancement of quantum architecture search methods.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# 自動走行車開発におけるヒューマンファクター管理に必要な戦略 Requirements Strategy for Managing Human Factors in Automated Vehicle Development ( http://arxiv.org/abs/2405.18838v1 ) ライセンス: Link先を確認	Amna Pir Muhammad, Alessia Knauss, Eric Knauss, Jonas Bärgman,	(参考訳) 人的要因(HF)の知識の統合は、自動車両(AV)のような安全クリティカルなシステムを開発する際に重要である。 HFの知識がAV開発プロセスを通して継続的に考慮されることを保証することは、これらの先進的なシステムの有効性、安全性、受け入れなど、いくつかの理由において不可欠である。しかしながら、アジャイル開発における要件としてHFを含めることは難しい。最近、アジャイル開発における要件エンジニアリングの課題に対処するための要件戦略が提案されている。 AVのアジャイル開発におけるHF要件の調査に,要件戦略の概念をレンズとして適用することにより,本論文は3つの領域に到達した。 a)HF要件の所有権及び責任 b)HF要件及び情報モデルの構造及び c)HF要求に係る作業及び特徴フローの定義グローバルな自動車産業の専門家との13の半構造化インタビューに基づいて、これらの3分野について質的な洞察を提供する。インタビュアーが共有する多様な視点と経験は、洞察に富んだ見解を提供し、業界内でHFを統合するための各領域の潜在的なソリューション空間について、実世界の実践と戦略を強調するのに役立ちました。 The integration of human factors (HF) knowledge is crucial when developing safety-critical systems, such as automated vehicles (AVs). Ensuring that HF knowledge is considered continuously throughout the AV development process is essential for several reasons, including efficacy, safety, and acceptance of these advanced systems. However, it is challenging to include HF as requirements in agile development. Recently, Requirements Strategies have been suggested to address requirements engineering challenges in agile development. By applying the concept of Requirements Strategies as a lens to the investigation of HF requirements in agile development of AVs, this paper arrives at three areas for investigation: a) ownership and responsibility for HF requirements, b) structure of HF requirements and information models, and c) definition of work and feature flows related to HF requirements. Based on 13 semi-structured interviews with professionals from the global automotive industry, we provide qualitative insights in these three areas. The diverse perspectives and experiences shared by the interviewees provide insightful views and helped to reason about the potential solution spaces in each area for integrating HF within the industry, highlighting the real-world practices and strategies used.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# MEGA:人間のメッシュ回復のためのマスケ生成オートエンコーダ MEGA: Masked Generative Autoencoder for Human Mesh Recovery ( http://arxiv.org/abs/2405.18839v1 ) ライセンス: Link先を確認	Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer,	(参考訳) 単一のRGB画像からのHuman Mesh Recovery(HMR)は、類似した2D投影が複数の3D解釈に対応できるため、非常に曖昧な問題である。しかしながら、ほとんどのHMR法はこの曖昧さを無視し、関連する不確実性を考慮せずに単一の予測を行う。いくつかのアプローチは、人間のメッシュの分布を生成し、複数の予測のサンプリングを可能にするが、それらのうちの1つの予測を行う際に、最新の単一出力モデルと競合するものは存在しない。本研究は,マスク生成モデルに基づく新しい手法を提案する。人間のポーズと形状をトークン化することにより、HMRタスクを入力画像に条件付けられた離散トークンのシーケンスを生成するものとして定式化する。画像と部分的ヒューマンメッシュトークンシーケンスから人間のメッシュを復元するために訓練された MaskEd Generative Autoencoder であるMEGA を紹介する。画像が与えられた場合、フレキシブルな生成方式により、決定論的モードで1つの人間のメッシュを予測したり、確率論的モードで複数の人間のメッシュを生成できる。 MEGAにより、複数の出力を提案し、予測の不確実性を評価することができる。 In-the-wildベンチマークの実験により、MEGAは決定論的および確率的モードにおける最先端のパフォーマンスを達成し、単一出力および複数出力のアプローチより優れていることが示された。 Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as similar 2D projections can correspond to multiple 3D interpretations. Nevertheless, most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. MEGA enables us to propose multiple outputs and to evaluate the uncertainty of the predictions. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# 開語彙セマンティックセグメンテーションのための超球面空間におけるパラメータ効率の微調整 Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation ( http://arxiv.org/abs/2405.18840v1 ) ライセンス: Link先を確認	Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen,	(参考訳) オープンボキャブラリセマンティックセグメンテーションは、画像中の各ピクセルに任意のテキスト記述をラベル付けしようとする。ビジョン言語基盤モデル、特にCLIPは、最近、オープン語彙機能を取得するための強力なツールとして登場した。しかし、ピクセルレベルの予測能力を備えた微調整のCLIPは、しばしば3つの問題に悩まされる。 1 計算コストが高いこと。 2)CLIPとCLIPの2つの性質の相違 3) 目に見えないカテゴリーにおける一般化能力の低下。これらの問題に対処するため, 2つのCLIPモダリティに対して超球面空間で実施される対称パラメータ効率細調整(PEFT)戦略を提案する。具体的には、PEFT戦略は、全ての学習可能な行列のうち、効率的なブロック対角学習可能な変換行列と二重相互関係通信モジュールによって達成される。 PEFT戦略は2つのCLIPモダリティと対称に行われるので、それら間のミスアライメントが軽減される。さらに,CLIPテキストエンコーダにおいて,超球面エネルギーの原理に従ってPEFTに新たな制約を適用する。すなわち,微調整時の超球面エネルギーの最小化は,CLIPテキストエンコーダが提供する一般化能力の破壊を防止するため,元のパラメータ空間の内在的構造を保存する。様々なベンチマークにおいて、H-CLIPは、CLIPの総パラメータの約4%を更新するだけで、新しいSOTAオープン語彙セマンティックセマンティックセグメンテーション結果を達成することが示された。 Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers three issues: 1) high computational cost, 2) misalignment between the two inherent modalities of CLIP, and 3) degraded generalization ability on unseen categories. To address these issues, we propose H-CLIP a symmetrical parameter-efficient fine-tuning (PEFT) strategy conducted in hyperspherical space for both of the two CLIP modalities. Specifically, the PEFT strategy is achieved by a series of efficient block-diagonal learnable transformation matrices and a dual cross-relation communication module among all learnable matrices. Since the PEFT strategy is conducted symmetrically to the two CLIP modalities, the misalignment between them is mitigated. Furthermore, we apply an additional constraint to PEFT on the CLIP text encoder according to the hyperspherical energy principle, i.e., minimizing hyperspherical energy during fine-tuning preserves the intrinsic structure of the original parameter space, to prevent the destruction of the generalization ability offered by the CLIP text encoder. Extensive evaluations across various benchmarks show that H-CLIP achieves new SOTA open-vocabulary semantic segmentation results while only requiring updating approximately 4% of the total parameters of CLIP.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# 自動走行車開発におけるヒューマンファクターの管理--課題と実践に向けて Managing Human Factors in Automated Vehicle Development: Towards Challenges and Practices ( http://arxiv.org/abs/2405.18841v1 ) ライセンス: Link先を確認	Amna Pir Muhammad, Eric Knauss, Jonas Bärgman, Alessia Knauss,	(参考訳) 技術的複雑さと社会的影響のため、自動走行車(AV)の開発は、現在の自動車工学の実践に挑戦する。研究は、人的要因(HF)の知識を安全かつ受け入れられるために、AVを開発する際に考慮することが重要であることを示している。本研究は,自動車産業におけるアジャイルAV開発におけるHF要求の実践と課題について考察する。我々は、HFの専門家やAVエンジニアを含むスウェーデンの自動車会社から10人の業界専門家にインタビューした。半構造化インタビューの質的な分析に基づいて、HFの知識をアジャイルAV開発に伝達し、取り入れるための現在のアプローチと関連する課題について論じる。私たちの発見は、HFの知識をアジャイルなAV開発に効果的に組み込む上で重要な問題について、将来の研究に集中するのに役立ちます。 Due to the technical complexity and social impact, automated vehicle (AV) development challenges the current state of automotive engineering practice. Research shows that it is important to consider human factors (HF) knowledge when developing AVs to make them safe and accepted. This study explores the current practices and challenges of the automotive industries for incorporating HF requirements during agile AV development. We interviewed ten industry professionals from several Swedish automotive companies, including HF experts and AV engineers. Based on our qualitative analysis of the semi-structured interviews, a number of current approaches for communicating and incorporating HF knowledge into agile AV development and associated challenges are discussed. Our findings may help to focus future research on issues that are critical to effectively incorporate HF knowledge into agile AV development.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# 野生における記述的画像品質評価 Descriptive Image Quality Assessment in the Wild ( http://arxiv.org/abs/2405.18842v1 ) ライセンス: Link先を確認	Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Tianfan Xue, Chao Dong,	(参考訳) 視覚言語モデル(VLM)の急速な進歩により、VLMベースの画像品質評価(IQA)は、画像品質を言語的に記述し、人間の表現と整合し、IQAタスクの多面的な性質を捉えようとしている。しかし、現在の方法はまだ実用には程遠い。まず、事前の作業は特定のサブタスクや設定に絞られ、多様な現実世界のアプリケーションと一致しない。第二に、データセットのカバレッジ、スケール、品質に制限があるため、パフォーマンスは準最適である。これらの課題を克服するために、野生における画像品質評価(DepictQA-Wild)を紹介する。本手法は,評価タスクと比較タスク,簡潔かつ詳細な応答,完全参照,非参照シナリオを含む多機能IQAタスクパラダイムを含む。そこで本研究では,データ品質を向上する基盤トラスインフォームドデータセット構築手法を導入し,短時間のジョイントフレームワークの下でデータセットを495Kにスケールアップする。そこで我々はDQ-495Kという,包括的で大規模で高品質なデータセットを構築した。また、画像の解像度をトレーニング中に保持し、解像度に関する品質問題に対処し、低品質の応答をフィルタリングするのに有用な信頼性スコアを推定する。実験結果から,DepictQA-Wildは従来のスコアベース手法,VLMモデル以前のIQAモデル,歪み識別,即時評価,推論タスクにおいて独自のGPT-4Vよりも優れていた。我々の優位性は、Webダウンロードされた画像の評価や、モデル処理された画像のランク付けなど、現実世界のアプリケーションによってさらに確認される。データセットとコードはhttps://depictqa.github.io/depictqa-wild/でリリースされる。 With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-world applications. Second, their performance is sub-optimal due to limitations in dataset coverage, scale, and quality. To overcome these challenges, we introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, full-reference and non-reference scenarios. We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework. Consequently, we construct a comprehensive, large-scale, and high-quality dataset, named DQ-495K. We also retain image resolution during training to better handle resolution-related quality issues, and estimate a confidence score that is helpful to filter out low-quality responses. Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. Our advantages are further confirmed by real-world applications including assessing the web-downloaded images and ranking model-processed images. Datasets and codes will be released in https://depictqa.github.io/depictqa-wild/.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# データ駆動型機械故障検出:総合的レビュー Data-driven Machinery Fault Detection: A Comprehensive Review ( http://arxiv.org/abs/2405.18843v1 ) ライセンス: Link先を確認	Dhiraj Neupane, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal,	(参考訳) 先進的な製造の時代には、機械の故障をできるだけ早く診断し、安全で効率的な運転を保証することがこれまで以上に重要になりました。産業用ビッグデータの急増とセンシング・計算技術の進歩により、機械/深度学習アプローチに基づくデータ駆動型機械学習故障診断(MFD)ソリューションが製造業においてユビキタスに利用されている。多くの関連ソリューションが提案され、多くの論文でレビューされている産業応用において、故障した機械信号のタイムリーかつ正確に識別することが不可欠である。 MFDに関する多くのソリューションとレビューが利用可能であるにもかかわらず、既存の作品はいくつかの側面を欠いていることが多い。利用可能な文献の多くは、特定の種類の機器や分析方法に集中しているため、幅広い製造環境において適用性に制限がある。さらに、ノイズの多いデータ処理、適切な特徴の選択、新しい障害や予期せぬ障害に対応するモデルの適用など、データ駆動型アプローチの実装に関わる課題に関する議論は、表面的あるいは完全に見落とされがちである。そこで本調査では, 各種機械故障の検出・診断にさまざまな機械学習アプローチを用いた記事の総合的なレビュー, 強度と限界の強調, 条件に基づく解析に使用される手法のレビュー, 利用可能な機械故障データセットの総合的な検討, 今後の研究者に対して, これらのアプローチをMFDに使用しながら直面する可能性のある課題について紹介し, それらの問題を緩和するための潜在的ソリューションを推奨する。今後の研究の見通しは、この分野をより深く理解するためにも指摘されている。この記事は、研究者の助けとなり、この分野のさらなる発展に貢献してくれると信じている。 In this era of advanced manufacturing, it's now more crucial than ever to diagnose machine faults as early as possible to guarantee their safe and efficient operation. With the massive surge in industrial big data and advancement in sensing and computational technologies, data-driven Machinery Fault Diagnosis (MFD) solutions based on machine/deep learning approaches have been used ubiquitously in manufacturing. Timely and accurately identifying faulty machine signals is vital in industrial applications for which many relevant solutions have been proposed and are reviewed in many articles. Despite the availability of numerous solutions and reviews on MFD, existing works often lack several aspects. Most of the available literature has limited applicability in a wide range of manufacturing settings due to their concentration on a particular type of equipment or method of analysis. Additionally, discussions regarding the challenges associated with implementing data-driven approaches, such as dealing with noisy data, selecting appropriate features, and adapting models to accommodate new or unforeseen faults, are often superficial or completely overlooked. Thus, this survey provides a comprehensive review of the articles using different types of machine learning approaches for the detection and diagnosis of various types of machinery faults, highlights their strengths and limitations, provides a review of the methods used for condition-based analyses, comprehensively discusses the available machinery fault datasets, introduces future researchers to the possible challenges they have to encounter while using these approaches for MFD and recommends the probable solutions to mitigate those problems. The future research prospects are also pointed out for a better understanding of the field. We believe this article will help researchers and contribute to the further development of the field.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# Wikiコントリビュータのシミュレーション、モデリング、分類:「善」、「悪」、そして「悪」 Simulation, Modelling and Classification of Wiki Contributors: Spotting The Good, The Bad, and The Ugly ( http://arxiv.org/abs/2405.18845v1 ) ライセンス: Link先を確認	Silvia García Méndez, Fátima Leal, Benedita Malheiro, Juan Carlos Burguillo Rial, Bruno Veloso, Adriana E. Chis, Horacio González Vélez,	(参考訳) データクラウドソーシング(Data crowdsourcing)は、自発的なコントリビュータのグループが、ニュース、コメント、メディアから知識、分類に至るまで、非常に関連性の高いデータをプラットフォームに提供する、データ取得プロセスである。通常、ユーザ生成データストリームを処理して、wiki、コラボレーティブマップ、eコマースサイト、ソーシャルネットワークなどのポピュラーなサービスを提供し、洗練する。しかしながら、このモナス・オペランディは、敵対的環境における意図しないデータ操作に関する深刻な懸念を提起する。本稿では,人間と非人間(ロボット)を自動的に識別するためのシミュレーション,モデリング,分類手法を提案する。 WikiVoyageをテストベッドとして利用することで,実データと合成データの両方からなるクラスバランスのデータストリームを使用することで,分類者の信頼性と品質を大幅に向上させることが証明された。以上の結果から,本手法は良性ボットと良性ボットと,最大92%の分類精度を持つヒトコントリビュータを区別できることがわかった。 Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage - a free worldwide wiki travel guide open to contribution from the general public - as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# アジャイルにおける要求戦略の定義 - デザインサイエンス研究 Defining Requirements Strategies in Agile: A Design Science Research Study ( http://arxiv.org/abs/2405.18847v1 ) ライセンス: Link先を確認	Amna Pir Muhammad, Eric Knauss, Odzaya Batsaikhan, Nassiba El Haskouri, Yi-Chun Lin, Alessia Knauss,	(参考訳) 調査によると、現在アジャイル開発で直面している課題の多くは、要件エンジニアリングに関連している。デザインサイエンスの研究に基づいて、未定義の要求戦略からアジャイル開発で生じる重要な課題を考察する。これらの課題に対処し、要求戦略の重要な構成要素を合成する潜在的な方法を模索する。我々のデザインサイエンス研究は、通信技術、セキュリティサービス、自動車の分野における3つの産業ケースで、複数のケーススタディに基づいています。具体的な課題とワークフローを理解するために,計20回のインタビュー,2回のワークショップ,2回の参加者観察,各事例の文書分析を頼りにしました。いずれの場合も、プロセスマネージャや経験豊富なエンジニアとのコラボレーションにおいて、要件戦略を定義します。この経験から、私たちはアジャイル開発における要件戦略を定義するためのガイドラインを抽出します。 Research shows that many of the challenges currently encountered with agile development are related to requirements engineering. Based on design science research, this paper investigates critical challenges that arise in agile development from an undefined requirements strategy. We explore potential ways to address these challenges and synthesize the key building blocks of requirements strategies. Our design science research rests on a multiple case study with three industrial cases in the domains of communication technology, security services, and automotive. We relied on a total of 20 interviews, two workshops, participant observation in two cases, and document analysis in each of the cases to understand concrete challenges and workflows. In each case, we define a requirements strategy in collaboration with process managers and experienced engineers. From this experience, we extract guidelines for defining requirements strategies in agile development.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# コンテキストコントラストによる異常検出 Anomaly Detection by Context Contrasting ( http://arxiv.org/abs/2405.18848v1 ) ライセンス: Link先を確認	Alain Ryser, Thomas M. Sutter, Alexander Marx, Julia E. Vogt,	(参考訳) 異常検出は、標準から逸脱するサンプルを特定することに焦点を当てる。画像などの高次元データを扱う場合、異常パターンを検出するための重要な要件は、トレーニング中に見られる通常の概念をキャプチャする低次元表現を学習することである。近年の自己教師型学習の進歩は、この点において大きな可能性を秘めている。しかし、最も成功した自己教師付き異常検出法の多くは、異常の構造に関する事前知識を前提として、訓練中に合成異常を利用する。しかし、多くの現実世界のアプリケーションでは、目に見えないデータから何を期待すべきかは分かっていません。本研究では、通常のトレーニングデータを通常のプロパティを保持しながら異なるコンテキストに設定し、異なる視点からデータを観察することで、この問題に対処するCon2を提案する。その結果、見つからない通常のデータは学習した文脈表現に固執するが、異常は起こらないため、トレーニング中に異常について何も知らないまま検出できる。提案手法は,より現実的な医療環境において,潜在的な異常に関する知識が不足している場合に,より優れたパフォーマンスを示すとともに,様々なベンチマーク上での最先端のパフォーマンスを実現することを実証した。 Anomaly Detection focuses on identifying samples that deviate from the norm. When working with high-dimensional data such as images, a crucial requirement for detecting anomalous patterns is learning lower-dimensional representations that capture normal concepts seen during training. Recent advances in self-supervised learning have shown great promise in this regard. However, many of the most successful self-supervised anomaly detection methods assume prior knowledge about the structure of anomalies and leverage synthetic anomalies during training. Yet, in many real-world applications, we do not know what to expect from unseen data, and we can solely leverage knowledge about normal data. In this work, we propose Con2, which addresses this problem by setting normal training data into distinct contexts while preserving its normal properties, letting us observe the data from different perspectives. Unseen normal data consequently adheres to learned context representations while anomalies fail to do so, letting us detect them without any knowledge about anomalies during training. Our experiments demonstrate that our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting, where knowledge about potential anomalies is often scarce.	翻訳日:2024-05-30 18:19:11 公開日:2024-05-29
# SFANet:気象予報のための空間周波数アテンションネットワーク SFANet: Spatial-Frequency Attention Network for Weather Forecasting ( http://arxiv.org/abs/2405.18849v1 ) ライセンス: Link先を確認	Jiaze Wang, Hao Chen, Hongcan Xu, Jinpeng Li, Bowen Wang, Kun Shao, Furui Liu, Huaxi Chen, Guangyong Chen, Pheng-Ann Heng,	(参考訳) 天気予報は様々な分野において重要な役割を担い、意思決定とリスク管理を推進している。しかし、伝統的な手法は、特に高解像度データの存在下で、気象系の複雑な力学を捉えるのに苦労することが多い。本稿では、これらの課題に対処し、時空間天気予報の精度を高めるために設計された新しいディープラーニングフレームワークである空間周波数注意ネットワーク(SFANet)を提案する。既存の手法の限界からインスピレーションを得て,高度なトークンミキシングとアテンション機構をシームレスに統合する革新的なアプローチを提案する。プールと空間混合の両戦略を活用することにより、SFANetは高次元時空間列の処理を最適化し、成分間関係情報を保存し、広範囲の長距離関係をモデル化する。機能統合をさらに強化するため,我々は空間周波数アテンションモジュールを導入し,複雑な相互モーダル相関を捉える。 SEVIR (Storm EVent ImageRy) と ICAR (Institute for Climate and Application Research) - El Ni\~{n}o Southern Oscillation (ENSO) の2つの異なるデータセットに対する広範な実験的評価は、SFANetの顕著な性能を示している。特に,SFANetは,降水パターンの予測やEl Ni\~{n}oイベントの予測に習熟し,最先端の手法に対する大幅な進歩を実現している。 Weather forecasting plays a critical role in various sectors, driving decision-making and risk management. However, traditional methods often struggle to capture the complex dynamics of meteorological systems, particularly in the presence of high-resolution data. In this paper, we propose the Spatial-Frequency Attention Network (SFANet), a novel deep learning framework designed to address these challenges and enhance the accuracy of spatiotemporal weather prediction. Drawing inspiration from the limitations of existing methodologies, we present an innovative approach that seamlessly integrates advanced token mixing and attention mechanisms. By leveraging both pooling and spatial mixing strategies, SFANet optimizes the processing of high-dimensional spatiotemporal sequences, preserving inter-component relational information and modeling extensive long-range relationships. To further enhance feature integration, we introduce a novel spatial-frequency attention module, enabling the model to capture intricate cross-modal correlations. Our extensive experimental evaluation on two distinct datasets, the Storm EVent ImageRy (SEVIR) and the Institute for Climate and Application Research (ICAR) - El Ni\~{n}o Southern Oscillation (ENSO) dataset, demonstrates the remarkable performance of SFANet. Notably, SFANet achieves substantial advancements over state-of-the-art methods, showcasing its proficiency in forecasting precipitation patterns and predicting El Ni\~{n}o events.	翻訳日:2024-05-30 18:19:10 公開日:2024-05-29
# LetsMap: セマンティックなBEVマッピングのための教師なし表現学習 LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping ( http://arxiv.org/abs/2405.18852v1 ) ライセンス: Link先を確認	Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav Valada,	(参考訳) Semantic Bird's Eye View (BEV) マップは、自律運転における様々な意思決定タスクに対する強い排他的推論を備えたリッチな表現を提供する。しかしながら、ほとんどのBEVマッピングアプローチでは、大量の人間による注釈付きBEV基底真理データに依存する、完全に教師付き学習パラダイムを採用している。本研究では,この制限に対処するため,単眼正面視(FV)画像からのセマンティックなBEVマップをラベル効率よく生成する,教師なし表現学習手法を提案する。提案手法は,2つの解離したニューラルパスを教師なしの方法で利用し,BEV内の少数のラベルのみを用いたセマンティックBEVマッピングのタスクを微調整することで,シーン幾何学とシーン意味論を独立に推論するネットワークを事前訓練する。本研究では,FV画像の空間的・時間的整合性を利用して,シーン表現を符号化する新しい時間的マスク付きオートエンコーダの定式化に依存しながら,シーン形状を学習する。 KITTI-360 と nuScenes データセットの大規模な評価は,BEV ラベルの 1% しか使用せず,追加ラベル付きデータも使用せず,我々のアプローチが既存の最先端アプローチと同等であることを示している。 Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.	翻訳日:2024-05-30 18:19:10 公開日:2024-05-29
# Snapshot Spectral Imaging Face Anti-Spoofingにおけるコントラスト学習の促進 Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-Spoofing ( http://arxiv.org/abs/2405.18853v1 ) ライセンス: Link先を確認	Chuanbiao Song, Yan Hong, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang,	(参考訳) 本研究は,顔認識システムにおける顔の偽造防止機能を強化することを目的とした,最先端の再均衡型コントラスト学習戦略を明らかにし,印刷写真や高現実的なシリコンマスクやラテックスマスクによる課題への対処に焦点を当てた。ハイパースペクトル画像の提供にSnapshot Spectral Imaging技術を活用するHySpeFASデータセットを活用することで,データ再サンプリングによるクラスレベルのコントラスト学習と,革新的なリアルタイムリウェイト技術との調和を実現した。この方法は、データセットの不均衡を効果的に軽減し、アイデンティティ関連のバイアスを低減する。 CVPR 2024のChalearn Snapshot Spectral Imaging Face Anti-spoofing Challengeにおいて,HySpeFASデータセット上での平均分類誤差率(ACER)は前例のない0.0000\%に達した。 This study reveals a cutting-edge re-balanced contrastive learning strategy aimed at strengthening face anti-spoofing capabilities within facial recognition systems, with a focus on countering the challenges posed by printed photos, and highly realistic silicone or latex masks. Leveraging the HySpeFAS dataset, which benefits from Snapshot Spectral Imaging technology to provide hyperspectral images, our approach harmonizes class-level contrastive learning with data resampling and an innovative real-face oriented reweighting technique. This method effectively mitigates dataset imbalances and reduces identity-related biases. Notably, our strategy achieved an unprecedented 0.0000\% Average Classification Error Rate (ACER) on the HySpeFAS dataset, ranking first at the Chalearn Snapshot Spectral Imaging Face Anti-spoofing Challenge on CVPR 2024.	翻訳日:2024-05-30 18:19:10 公開日:2024-05-29
# SSGA-Net: 自律運転のためのステップワイドグローバルローカルアグリゲーションネットワーク SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving ( http://arxiv.org/abs/2405.18857v1 ) ライセンス: Link先を確認	Yiming Cui, Cheng Han, Dongfang Liu,	(参考訳) 視覚に基づく知覚は、自動運転の鍵となるモジュールである。これらの視覚的認識タスクの中で、ビデオオブジェクト検出は、高速な動きや複数のポーズによって生じる特徴劣化のため、主要かつ困難なタスクである。現在のモデルは、通常、隣接するフレームから特徴を集約してタスクヘッドのオブジェクト表現を強化し、より正確な予測を生成する。性能は向上するが、これらの手法は将来のフレームの情報に依存し、高い計算複雑性に悩まされる。一方、アグリゲーションプロセスは、推論時間中に再構成できない。これらの問題により、既存のモデルのほとんどがオンラインアプリケーションでは利用できない。これらの問題を解決するために、段階的に空間的にグローバルな集約ネットワークを導入する。提案するモデルは,主に3つの部分を含む。多段階のステップワイドネットワークは、前段階からの予測とオブジェクト表現を徐々に洗練する。空間的グローバル・ローカル・アグリゲーションは、隣接するフレームからの局所情報と現在のフレームからのグローバル・セマンティクスを融合させ、特徴劣化を解消する。ダイナミックアグリゲーション戦略は、リファインメント結果に基づいて早期にアグリゲーションプロセスを停止し、冗長性を除去し、効率を向上する。 ImageNet VIDベンチマークの大規模な実験により、提案モデルの有効性と効率が検証された。 Visual-based perception is the key module for autonomous driving. Among those visual perception tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current models usually aggregate features from the neighboring frames to enhance the object representations for the task heads to generate more accurate predictions. Though getting better performance, these methods rely on the information from the future frames and suffer from high computational complexity. Meanwhile, the aggregation process is not reconfigurable during the inference time. These issues make most of the existing models infeasible for online applications. To solve these problems, we introduce a stepwise spatial global-local aggregation network. Our proposed models mainly contain three parts: 1). Multi-stage stepwise network gradually refines the predictions and object representations from the previous stage; 2). Spatial global-local aggregation fuses the local information from the neighboring frames and global semantics from the current frame to eliminate the feature degradation; 3). Dynamic aggregation strategy stops the aggregation process early based on the refinement results to remove redundancy and improve efficiency. Extensive experiments on the ImageNet VID benchmark validate the effectiveness and efficiency of our proposed models.	翻訳日:2024-05-30 18:19:10 公開日:2024-05-29
# ドメインにインスパイアされたシャープネス-ドメインシフトによる最小化 Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts ( http://arxiv.org/abs/2405.18861v1 ) ライセンス: Link先を確認	Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang,	(参考訳) 本稿ではドメインシフト下での最適化のためのドメインインスパイアされたシャープネス認識最小化(DISAM)アルゴリズムを提案する。これは、異なる領域にまたがるSAMの不整合収束度によって動機付けられ、特定の領域に対する最適化バイアスを引き起こし、したがって全体の収束を損なう。この問題に対処するために、我々は、(十分に)最適化されていないドメインに対する圧倒的な(不十分な)摂動を防ぐために、シャープネス推定におけるドメインレベルの収束一貫性を検討する。特に、DIAMは、ドメイン損失の分散を最小化する制約を導入し、摂動発生における弾性勾配校正を可能にする: 1つのドメインが平均レベル \textit{w.r.t.} の損失以上に最適化されると、そのドメインへの勾配摂動は自動的に弱くなる。このメカニズムの下では、不整合収束が生じると、理論上disAMがより高速な総合収束を実現し、原理的に一般化を向上できることが示される。様々な領域一般化ベンチマークの広範囲な実験は、最先端の手法よりもDIAMの方が優れていることを示している。さらに、パラメータ効率の良い微調整と事前学習モデルの組み合わせにより、DIAMの優れた効率性を示す。ソースコードはhttps://github.com/MediaBrain-SJTU/DISAMで公開されている。 This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows the elastic gradient calibration in perturbation generation: when one domain is optimized above the averaging level \textit{w.r.t.} loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with the pretraining models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.	翻訳日:2024-05-30 18:19:10 公開日:2024-05-29
# 単一分子分光法による新しいビュー合成のためのニューラルラジアンス場 Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy ( http://arxiv.org/abs/2405.18863v1 ) ライセンス: Link先を確認	Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki,	(参考訳) 腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下手術は腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下手術の診断に有用である。この目的を達成するための典型的な方法は、構造移動(SfM)やポアソン表面の再構成を含む従来の3D再構成技術を統合することである。これらの手法は、点雲やメッシュなどの明示的な3D表現を生成し、新しい視点から画像のレンダリングを可能にする。しかし、胃内の低テクスチュア領域と非ランベルト領域の存在は、しばしば、点雲とメッシュのノイズと不完全な再構成をもたらし、高品質の画像レンダリングの達成を妨げる。本稿では,ニューラルラジアンスフィールド(NeRF)の新しい手法をモノクロガストロスコープデータに適用し,新しい視点に向けて光現実像を合成する。単眼胃内視鏡の局所領域における視線間隔による性能劣化に対処するため, 既設点雲からの幾何先行をNeRFのトレーニングに組み込んだ。近年のNeRF法と比較すると,胃内の新しい視点からの高忠実度画像のレンダリングは質的,定量的に行われている。 Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of the images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, which introduces a novel geometry-based loss to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach showcases high-fidelity image renderings from novel viewpoints within the stomach both qualitatively and quantitatively.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 最適マルチモーダル埋め込み空間のトポロジ的展望 Topological Perspectives on Optimal Multimodal Embedding Spaces ( http://arxiv.org/abs/2405.18867v1 ) ライセンス: Link先を確認	Abdul Aziz A. B, A. B Abdul Rahim,	(参考訳) マルチモーダルモデル開発における最近の進歩は、テキスト・ツー・イメージ生成の領域におけるパラダイムシフトを浮き彫りにした。これらの進歩の中で、CLIPは、テキスト情報と視覚情報を一体化された潜在空間内にエンコードする高度なオートエンコーダである、顕著な成果として際立っている。本稿では,CLIPと最近のCLOOBの比較分析について述べる。これらのモデルによって構築された埋め込み空間内での複雑な区別を明らかにするために、トポロジカルデータ解析を用いる。提案手法は,モダリティギャップドライバ,高次元と低次元の両方に存在するクラスタリング構造,および各埋め込み空間を形成する上で,次元崩壊が果たす重要な役割を包括的に検討することを含む。実証実験は、様々な文脈シナリオにおける下流性能に関する分析の影響を裏付けるものである。本研究は,CLIP と CLOOB の比較効果を生かし,それぞれの長所と短所に関する知見を提供し,マルチモーダルモデル研究のさらなる洗練と発展のための基盤を提供することを目的としている。 Recent strides in multimodal model development have ignited a paradigm shift in the realm of text-to-image generation. Among these advancements, CLIP stands out as a remarkable achievement which is a sophisticated autoencoder adept at encoding both textual and visual information within a unified latent space. This paper delves into a comparative analysis between CLIP and its recent counterpart, CLOOB. To unravel the intricate distinctions within the embedding spaces crafted by these models, we employ topological data analysis. Our approach encompasses a comprehensive examination of the modality gap drivers, the clustering structures existing across both high and low dimensions, and the pivotal role that dimension collapse plays in shaping their respective embedding spaces. Empirical experiments substantiate the implications of our analyses on downstream performance across various contextual scenarios. Through this investigation, we aim to shed light on the nuanced intricacies that underlie the comparative efficacy of CLIP and CLOOB, offering insights into their respective strengths and weaknesses, and providing a foundation for further refinement and advancement in multimodal model research.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# ハーモニックトラップ電位内に置かれた重力波検出器における量子重力信号 Quantum gravity signatures in gravitational wave detectors placed inside a harmonic trap potential ( http://arxiv.org/abs/2405.18868v1 ) ライセンス: Link先を確認	Soham Sen, Sunandan Gangopadhyay, Sukanta Bhattacharyya,	(参考訳) 本研究は,高調波トラップ内にのみ設置された偏光と入ってくる重力波と相互作用する重力波の一般重力波検出器について考察する。このモデルは、重力波の共鳴検出器の記述とよく一致する。よく知られた検出器-重力波相互作用のシナリオは、検出器を量子力学的に扱う半古典的な手法を用いるが、重力波は古典的なレベルで考えることができる。解析では、重力波の摂動の離散モード分解を用いて、重力波と調和振動子に対応する運動量演算子の位置と運動量演算子を含むハミルトニアンを導出する。そして、初期状態から未知の最終状態に移行するための高調波振動子-重力波テンソル積状態の遷移確率を計算した。重力波のエネルギーフラックス関係を用いて、全エネルギーを検出器の初期状態における重力子数の組み合わせとして考えると、共鳴吸収の場合の遷移確率は半古典吸収の場合と全く同じ解析形式を取る。本モデルでは, 半古典的アンカルージに完全に欠落した単一重力子の自然放出を観測した。したがって、これは線型化された量子重力の直接的なシグネチャを与える。 In this work, we consider a general gravitational wave detector of gravitational wave interacting with an incoming gravitational wave carrying plus polarization only placed inside a harmonic trap. This model can be well acquainted with the description of a resonant detector of gravitational wave as well. The well known detector-gravitational wave interaction scenario uses the method of a semi classical approach where the detector is treated quantum mechanically but the gravitational wave is considered at a classical level. In our analysis, we use a discrete mode decomposition of the gravitational wave perturbation which results in a Hamiltonian involving the position and momentum operators corresponding to the gravitational wave and the harmonic oscillator. We have then calculated the transition probability for the harmonic oscillator-gravitational wave tensor product state for going from an initial state to some unknown final state. Using the energy flux relation of the gravitational waves, we observe that if we consider the total energy as a combination of the number of gravitons in the initial state of the detector then the transition probability for the resonant absorption case scenario takes the analytical form which is exactly similar to the semi-classical absorption case. In case of the emission scenario, we observe a spontaneous emission of a single graviton which was completely absent in the semi-classical analouge of this model. This therefore gives a direct signature of linearized quantum gravity.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# データ駆動型電力管理に向けて - マルチリージョン同調データと知識グラフ Towards Data-Driven Electricity Management: Multi-Region Harmonized Data and Knowledge Graph ( http://arxiv.org/abs/2405.18869v1 ) ライセンス: Link先を確認	Vid Hanžel, Blaž Bertalanič, Carolina Fortuna,	(参考訳) 人口増加と技術進歩により、世界的な電力消費が増加し、結果としてCO2排出量も増加している。住宅セクターは世界の電力消費の25%を占めており、快適さを犠牲にすることなく効率を高め、CO2フットプリントを減らす大きな可能性を秘めている。しかし、複数の地域をまたがる家庭レベルでの一様消費データが欠如していることは、大規模な研究や堅牢な多地域モデル開発を妨げる。本稿では、公開されているソースからコンパイルされ、均一なフォーマットで提示されるマルチリージョンデータセットについて紹介する。このデータにより、デアグリゲーション、需要予測、アプライアンスON/OFF分類などの機械学習タスクが可能になる。さらに,家庭の電力消費を特徴付けるRDF知識グラフを開発し,Wikidata や DBpedia などのオープンな知識ベースとの相互運用性とセマンティッククエリを実現するための,家庭関連プロパティとコンテキストを関連づける。この構造化されたデータは、さまざまな利害関係者にデータ駆動型ポリシーやビジネス開発について通知するために利用することができる。 Due to growing population and technological advances, global electricity consumption, and consequently also CO2 emissions are increasing. The residential sector makes up 25% of global electricity consumption and has great potential to increase efficiency and reduce CO2 footprint without sacrificing comfort. However, a lack of uniform consumption data at the household level spanning multiple regions hinders large-scale studies and robust multi-region model development. This paper introduces a multi-region dataset compiled from publicly available sources and presented in a uniform format. This data enables machine learning tasks such as disaggregation, demand forecasting, appliance ON/OFF classification, etc. Furthermore, we develop an RDF knowledge graph that characterizes the electricity consumption of the households and contextualizes it with household related properties enabling semantic queries and interoperability with other open knowledge bases like Wikidata and DBpedia. This structured data can be utilized to inform various stakeholders towards data-driven policy and business development.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# LLMはマインドタスクの高次理論上での成人人間のパフォーマンスを達成する LLMs achieve adult human performance on higher-order theory of mind tasks ( http://arxiv.org/abs/2405.18870v1 ) ライセンス: Link先を確認	Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar,	(参考訳) 本稿では,大規模言語モデル (LLM) が高次心の理論 (ToM) をいかに発展させたかを検討する。本稿では、手書きテストスイートであるMulti-Order Theory of Mind Q&Aを導入し、5つのLCMのパフォーマンスと、新たに集まった成人のベンチマークを比較することによって、以前の作業の上に構築する。 GPT-4とFlan-PaLMは、ToMタスク全体において、成人レベルおよびほぼ成人レベルに到達し、GPT-4は6次推定で成人レベルを超えることが判明した。以上の結果から,ToM能力を実現するためのモデルサイズと微調整の間には相互作用があることが示唆された。高次ToMが幅広い協調的かつ競争的な人間の行動に果たす役割を考えると、これらの発見はユーザ向けLLMアプリケーションに重大な影響を及ぼす。 This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# DFAMiner:ラベル付きサンプルから最小限のDFAを分離する DFAMiner: Mining minimal separating DFAs from labelled samples ( http://arxiv.org/abs/2405.18871v1 ) ライセンス: Link先を確認	Daniele Dell'Erba, Yong Li, Sven Schewe,	(参考訳) ラベル付きサンプルから最小限の決定論的有限オートマトン(DFA)を分離する受動的学習ツールであるDFAMinerを提案する。分離オートマトンは、通常モデルチェックで一般的に発生する興味深いオートマトンクラスであり、パリティゲーム解決の基本的な問題に関心を寄せている。まず,3値のDFA(3DFA)を,通常の語彙順に付与されたラベル付きサンプルの集合からインクリメンタルに構築する,単純で線形な時間的アルゴリズムを提案する。この3DFAは、ラベル付き例を正確に認識できるように、状態を受け入れ、拒否すると同時に、不注意な状態も受け入れている。そこで我々は,SAT問題を最小化して構築したオートマトンを最小化することにより,ラベル付きサンプルの最小分離DFAをマイニングするツールを開発した。実験的な評価の結果,本ツールはサンプルから最小限のDFAを学習するための標準ベンチマークにおいて,最先端のツールよりも優れていた。 DFAを分離する効率的な構築の進歩は、DFAMinerが最大7色までの単純な言語に対して最適な分離オートマトンを作成できることを示すために、パリティゲーム解決の低い境界を見つけることにも繋がる。将来的には、データ構造の改善が期待できるだろう。 We propose DFAMiner, a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples. Separating automata are an interesting class of automata that occurs generally in regular model checking and has raised interest in foundational questions of parity game solving. We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA (3DFA) from a set of labelled samples given in the usual lexicographical order. This 3DFA has accepting and rejecting states as well as don't-care states, so that it can exactly recognise the labelled examples. We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems. Empirical evaluation shows that our tool outperforms current state-of-the-art tools significantly on standard benchmarks for learning minimal separating DFAs from samples. Progress in the efficient construction of separating DFAs can also lead to finding the lower bound of parity game solving, where we show that DFAMiner can create optimal separating automata for simple languages with up to 7 colours. Future improvements might offer inroads to better data structures.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# トレーニング可能な特徴マッチングアテンションネットワークに基づく単一画像超解像 Single image super-resolution based on trainable feature matching attention network ( http://arxiv.org/abs/2405.18872v1 ) ライセンス: Link先を確認	Qizhou Chen, Qing Shao,	(参考訳) 畳み込みニューラルネットワーク(CNN)は近年,画像の超解像(SR)に広く利用されている。様々な技術は、CNN構造を変更したり、改善された自己認識機構を取り入れることでSR性能を向上させる。興味深いことに、これらの進歩は共通の特徴を共有している。高周波の詳細を明示的に学習する代わりに、特徴マップの自分自身の要素の重み付けされた和を利用して、畳み込みや非局所的といった、暗黙的な特徴処理モードを学ぶ。対照的に、初期の辞書ベースのアプローチは、Low-Resolution (LR)機能にマッチして再構築するために、特徴分解を明示的に学習する。この分析に基づいて、この明示的な特徴学習をCNNにマージし、その表現能力を増強するために、トレーニング可能な特徴マッチング(TFM)を導入する。 TFMでは、トレーニング可能な機能セットが統合され、特徴マッチングを通じて画像のトレーニングから機能を明示的に学習する。さらに,提案するトレーニング可能な特徴マッチング注意ネットワーク(TFMAN)に非局所的およびチャネル的注意を組み込むことにより,SR性能をさらに向上する。本研究では,非局所演算の計算要求を軽減するため,SRNL (Same-size-divided Region-level Non-Local) と呼ばれる簡易な変種を提案する。 SRNLは入力特徴写像から一様に分割されたブロック上で非局所計算を並列に行う。 TFMとSRNLの有効性は、アブレーション研究とモジュール探索を通じて検証される。パラメータ利用を最適化するために、TFMANのバックボーンとして繰り返し畳み込みネットワークを用いる。ベンチマークデータセットに関する総合的な実験により、TFMANはパラメータを減らしながら、ほとんどの比較において優れた結果が得られることが示された。コードはhttps://github.com/qizhou000/tfman.comから入手できる。 Convolutional Neural Networks (CNNs) have been widely employed for image Super-Resolution (SR) in recent years. Various techniques enhance SR performance by altering CNN structures or incorporating improved self-attention mechanisms. Interestingly, these advancements share a common trait. Instead of explicitly learning high-frequency details, they learn an implicit feature processing mode that utilizes weighted sums of a feature map's own elements for reconstruction, akin to convolution and non-local. In contrast, early dictionary-based approaches learn feature decompositions explicitly to match and rebuild Low-Resolution (LR) features. Building on this analysis, we introduce Trainable Feature Matching (TFM) to amalgamate this explicit feature learning into CNNs, augmenting their representation capabilities. Within TFM, trainable feature sets are integrated to explicitly learn features from training images through feature matching. Furthermore, we integrate non-local and channel attention into our proposed Trainable Feature Matching Attention Network (TFMAN) to further enhance SR performance. To alleviate the computational demands of non-local operations, we propose a streamlined variant called Same-size-divided Region-level Non-Local (SRNL). SRNL conducts non-local computations in parallel on blocks uniformly divided from the input feature map. The efficacy of TFM and SRNL is validated through ablation studies and module explorations. We employ a recurrent convolutional network as the backbone of our TFMAN to optimize parameter utilization. Comprehensive experiments on benchmark datasets demonstrate that TFMAN achieves superior results in most comparisons while using fewer parameters. The code is available at https://github.com/qizhou000/tfman.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# クエリとキーは常に関連しているか? : トランスフォーマー波動関数のケーススタディ Are queries and keys always relevant? A case study on Transformer wave functions ( http://arxiv.org/abs/2405.18874v1 ) ライセンス: Link先を確認	Riccardo Rende, Luciano Loris Viteritti,	(参考訳) ドット製品アテンションメカニズムは、元々自然言語処理(NLP)タスク用に設計されたもので、現代のトランスフォーマーの基盤となっている。文中の単語ペア間の意味的関係を、クエリとキー間の類似性を計算することによって、適切にキャプチャする。本研究では、量子多体スピンハミルトニアンの基底状態に近似する変動波関数のパラメトリゼーションの特定の領域において、トランスフォーマーの適合性について検討する。具体的には、格子上の量子量体系の分野における一般的なベンチマークである2次元の$J_1$-$J_2$Heisenbergモデルで数値シミュレーションを行う。標準的な注意機構の性能と、クエリやキーを省略した簡易バージョンを比較し、位置のみに依存することで、計算コストとパラメータ使用量の削減を図り、競合する結果を得る。さらに,標準アテンション機構によって生成されたアテンションマップの解析により,最適化の終了時に,アテンション重みが効果的に入力非依存となることを示す。解析計算により解析結果をサポートし、大規模システムの研究において、なぜクエリとキーが注意機構から省略されるのかを物理的に把握する。興味深いことに、同じ引数を長い入力文の制限で NLP ドメインに拡張することができる。 The dot product attention mechanism, originally designed for natural language processing (NLP) tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum-many body systems on lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insights of why queries and keys should be, in principle, omitted from the attention mechanism when studying large systems. Interestingly, the same arguments can be extended to the NLP domain, in the limit of long input sentences.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 地域・グローバル・リコースにおける対物的メタルル Counterfactual Metarules for Local and Global Recourse ( http://arxiv.org/abs/2405.18875v1 ) ライセンス: Link先を確認	Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso,	(参考訳) T-CRExは局所的およびグローバル的対実的説明(CE)のための新しいモデルに依存しない手法である。ツリーベースのサロゲートモデルを活用して、モデル行動のグローバルな分析と、ユーザのための多様なリコメンデーションオプションの両方を提供する、最適な領域を示す'メタル'とともに、カウンターファクトルールを学ぶ。実験により、T-CRExは、CEデシデラタにおける既存のルールベースベースラインよりも、桁違いに高速に動作しながら、より優れたアグリゲーション性能を達成することが示された。 We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that T-CREx achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# ブロックチェーン生態系の公平性について On Fairness Concerns in the Blockchain Ecosystem ( http://arxiv.org/abs/2405.18876v1 ) ライセンス: Link先を確認	Johnnatan Messias Peixoto Afonso,	(参考訳) ブロックチェーンは、分散化と透明性を促進することで、銀行や金融といった中央集権的なセクターに革命をもたらした。ブロックチェーンでは、情報は参加者やアプリケーションが発行したトランザクションを通じて送信される。マイナーは、より高いインセンティブや手数料を持つものを優先して、ブロックインクルージョンのために準備中の取引を選択し、注文し、検証する。トランザクションを含む順序は、ブロックチェーンの最終状態に影響を与える可能性がある。さらに、ブロックチェーン上で動作するアプリケーションは、中核機能を変更するための意思決定力を分散化するためのガバナンスプロトコルに依存することが多い。これらの変更は、参加者がこれらのアプリケーションとどのようにやりとりするかに影響を与える可能性がある。 1つのトークンが1つの投票に等しいため、複数のトークンを持つ参加者は、提案された変更を支持し、拒否する投票力が高い。この投票力が分散される範囲は疑問の余地があり、少数の保有者の間で集中している場合、ガバナンス攻撃につながる可能性がある。この論文では、BitcoinとEthereumのブロックチェーンを監査し、トランザクションの優先順位を決定するためにマイナーが続く規範を調査します。また、参加者間で投票力が公平に分散されているかどうかを評価するために、Commenなどの分散型ガバナンスプロトコルを監査する。私たちの発見は、ブロックチェーンと分散アプリケーションの将来的な発展に重大な影響を与えます。 Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain final state. Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have a higher voting power to support or reject the proposed changes. The extent to which this voting power is distributed is questionable and if highly concentrated among a few holders can lead to governance attacks. In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining the transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 連続製品グラフニューラルネットワーク Continuous Product Graph Neural Networks ( http://arxiv.org/abs/2405.18877v1 ) ライセンス: Link先を確認	Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo,	(参考訳) 複数のグラフ上に定義されたマルチドメインデータの処理は、計算機科学における様々な実践的応用において大きな可能性を秘めている。しかし、現在の手法は主に離散グラフフィルタリングに限られている。グラフ上のテンソル偏微分方程式(TPDEG)は、既存の離散的方法論の限界に対処しながら、複数の相互作用グラフにまたがる構造化データをモデル化するための原則的なフレームワークを提供する。本稿では,PTDEGの自然な解として現れるCITRUS(Continuous Product Graph Neural Networks)を紹介する。 CITRUSは、カルト系グラフ製品からの連続熱核の分離性を利用して、グラフスペクトル分解を効率的に実装する。本研究では,CITRUSの安定性と過度な平滑化特性を,ドメイン固有のグラフ摂動とグラフスペクトルの影響に応答して完全に理論的に解析する。我々は、CITRUSをよく知られた交通・気象時空間予測データセットで評価し、既存の手法よりも優れた性能を示す。 Processing multidomain data defined on multiple graphs holds significant potential in various practical applications in computer science. However, current methods are mostly limited to discrete graph filtering operations. Tensorial partial differential equations on graphs (TPDEGs) provide a principled framework for modeling structured data across multiple interacting graphs, addressing the limitations of the existing discrete methodologies. In this paper, we introduce Continuous Product Graph Neural Networks (CITRUS) that emerge as a natural solution to the TPDEG. CITRUS leverages the separability of continuous heat kernels from Cartesian graph products to efficiently implement graph spectral decomposition. We conduct thorough theoretical analyses of the stability and over-smoothing properties of CITRUS in response to domain-specific graph perturbations and graph spectra effects on the performance. We evaluate CITRUS on well-known traffic and weather spatiotemporal forecasting datasets, demonstrating superior performance over existing approaches.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 医療応用のための多人数計算によるデータイミューテーションのプライバシ保護 Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications ( http://arxiv.org/abs/2405.18878v1 ) ライセンス: Link先を確認	Julia Jentsch, Ali Burak Ünal, Şeyma Selcan Mağara, Mete Akgün,	(参考訳) 欠落したデータの処理は機械学習では不可欠だが、多くのデータセットにはエラーや非レスポンスによるギャップが含まれている。単純だが不十分なリストワイズ削除のような従来の方法とは異なり、この文献はより高度で効果的な方法を提供し、サンプルのサイズと精度を改善している。しかし、これらの方法はデータセット全体にアクセスする必要があり、これはデータが複数のソースに分散される際のプライバシー規制とは矛盾する。特に医療と医療の分野では、そのようなアクセスは患者に関する機密情報を明らかにする。本研究は,セキュアなマルチパーティ計算を用いた機密データに対するプライバシ保護計算手法に対処し,当事者の機密情報を明らかにすることなく,セキュアな計算を可能にする。本研究では,プライバシ保護手法を用いて,平均,中央値,回帰値,kNNの計算手法を考案した。患者データの保護の重要性を考慮した医療・医療分野を特に対象とし,糖尿病データセット上での方法を示す。糖尿病データセットの実験では、プライバシ保護の計算手法の正しさが検証され、最も誤差が大きいのは3ドル(約3,300円)の10^{-3}$の平文法でした。また,本手法のスケーラビリティをさまざまなサンプルに適用し,実際の医療問題への適用性を示した。分析の結果,全手法がサンプル数と線形にスケールできることが判明した。 kNNを除いて、我々のメソッドのランタイムは、大きなデータセットに使用できることを示している。 Handling missing data is crucial in machine learning, but many datasets contain gaps due to errors or non-response. Unlike traditional methods such as listwise deletion, which are simple but inadequate, the literature offers more sophisticated and effective methods, thereby improving sample size and accuracy. However, these methods require accessing the whole dataset, which contradicts the privacy regulations when the data is distributed among multiple sources. Especially in the medical and healthcare domain, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation, enabling secure computations without revealing any party's sensitive information. In this study, we realized the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains considering the significance of protection of the patient data, showcasing our methods on a diabetes dataset. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$, closely matching plaintext methods. We also analyzed the scalability of our methods to varying numbers of samples, showing their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtime of all our methods indicates that they can be utilized for large datasets.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 時空間予測の効率性:因果グラフ処理ニューラルネットワーク Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks ( http://arxiv.org/abs/2405.18879v1 ) ライセンス: Link先を確認	Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo,	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ内のノードとして表されるセンサ(または他の測定方法)間の関係帰納バイアスを活用することにより、時空間予測の高度化を実現している。しかし、現在の手法はしばしばリカレントニューラルネットワーク(RNN)に依存しており、ランタイムとメモリ使用の増加につながっている。さらに、これらの手法は典型的には1ホップの近傍で機能し、受容野の減少を悪化させる。因果グラフプロセス(CGP)は、パラメータの削減とメモリ消費の最小化のために、MLP層の代わりにグラフフィルタを使用する代替手段を提供する。本稿では,CGP と GNN を組み合わせた非線形モデルである Causal Graph Process Neural Network (CGProNet) を紹介する。 CGProNetは高階グラフフィルタを採用し、より少ないパラメータでモデルを最適化し、メモリ使用量を削減し、実行効率を向上させる。本稿では,CGProNetの理論的および実験的安定性解析について概説する。合成および実データに関する実験は、CGProNetの優れた効率性を示し、競合予測性能を維持しながら、メモリと時間要求を最小限に抑える。 Graph Neural Networks (GNNs) have advanced spatiotemporal forecasting by leveraging relational inductive biases among sensors (or any other measuring scheme) represented as nodes in a graph. However, current methods often rely on Recurrent Neural Networks (RNNs), leading to increased runtimes and memory use. Moreover, these methods typically operate within 1-hop neighborhoods, exacerbating the reduction of the receptive field. Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of MLP layers to reduce parameters and minimize memory consumption. This paper introduces the Causal Graph Process Neural Network (CGProNet), a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency. We present a comprehensive theoretical and experimental stability analysis, highlighting key aspects of CGProNet. Experiments on synthetic and real data demonstrate CGProNet's superior efficiency, minimizing memory and time requirements while maintaining competitive forecasting performance.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# EventZoom: 強化されたニューロモーフィックビジョンのためのイベントベースのデータ拡張への進歩的なアプローチ EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision ( http://arxiv.org/abs/2405.18880v1 ) ライセンス: Link先を確認	Yiting Dong, Xiang He, Guobin Shen, Dongcheng Zhao, Yang Li, Yi Zeng,	(参考訳) Dynamic Vision Sensors (DVS)がキャプチャしたイベントデータは、従来のビデオキャプチャとは異なる視覚処理にユニークなアプローチを提供し、動的およびリアルタイムシナリオにおけるその効率を示す。高時間分解能や低エネルギー消費といったアドバンテージにもかかわらず、イベントデータの適用は、データセットのサイズと多様性の制限による課題に直面している。これを解決するために、イベントデータ用に特別に設計されたデータ拡張戦略であるEventZoomを開発しました。 EventZoomは、時間と空間をインテリジェントにブレンドして、その信頼性を維持しながらデータの多様性と複雑さを高める、プログレッシブな時間戦略を採用している。本手法は,複雑な動的シーンを扱うアルゴリズムの適応性とロバスト性を向上させることを目的としている。教師付き学習,半教師付き学習,教師なし学習を含む,さまざまな教師付き学習フレームワークを対象に,EventZoomを実験的に検証した。以上の結果から,EventZoomはさまざまな学習環境において,強力なイベントベースデータ拡張ツールとしての有効性と適用性を確認するとともに,他のデータ拡張手法を一貫して上回ることを示す。 Event data captured by Dynamic Vision Sensors (DVS) offers a unique approach to visual processing that differs from traditional video capture, showcasing its efficiency in dynamic and real-time scenarios. Despite advantages such as high temporal resolution and low energy consumption, the application of event data faces challenges due to limited dataset size and diversity. To address this, we developed EventZoom -- a data augmentation strategy specifically designed for event data. EventZoom employs a progressive temporal strategy that intelligently blends time and space to enhance the diversity and complexity of the data while maintaining its authenticity. This method aims to improve the quality of data for model training and enhance the adaptability and robustness of algorithms in handling complex dynamic scenes. We have experimentally validated EventZoom across various supervised learning frameworks, including supervised, semi-supervised, and unsupervised learning. Our results demonstrate that EventZoom consistently outperforms other data augmentation methods, confirming its effectiveness and applicability as a powerful event-based data augmentation tool in diverse learning settings.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# 直接雑音最適化を用いた拡散モデルの調整自由配向 Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization ( http://arxiv.org/abs/2405.18881v1 ) ライセンス: Link先を確認	Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang,	(参考訳) 本研究では,人間の嗜好改善など,下流タスクの具体的目的を表す連続報酬関数を用いた拡散モデルのアライメント問題に焦点をあてる。アライメント問題の主目的は、生成したサンプルが目標報酬関数を最大化するように拡散モデルで学習した分布を調整することである。拡散モデルのサンプリング過程において, 直接雑音最適化 (DNO) と呼ばれる新しいアライメント手法を提案する。設計上、DNOはチューニング不要で、生成中にオンライン形式でアライメントが発生するため、プロンプトに依存しない。我々は、DNOの理論的性質を厳密に研究し、また、微分不可能な報酬関数を扱う変種を提案する。さらに,DNO の素直な実装は,最適化されたサンプルが高い報酬を得られるが,事前学習された分布をサポートできない,不当な分配報酬ハック問題に悩まされることも見いだした。この問題を解決するために,古典的高次元統計理論を活用し,確率正規化によるDNO損失の増大を提案する。我々は、人間のフィードバックデータに基づいて訓練された複数の人気報酬関数について広範な実験を行い、提案したDNOアプローチが、最先端の報酬スコアと高画質を、すべて生成に適切な時間予算で達成できることを実証した。 In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.	翻訳日:2024-05-30 18:09:15 公開日:2024-05-29
# DecomCAM: 分解と統合によるサリエンシマップを越えた拡張 DecomCAM: Advancing Beyond Saliency Maps through Decomposition and Integration ( http://arxiv.org/abs/2405.18882v1 ) ライセンス: Link先を確認	Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Linlin Yang, Bo Fan, Jilong Zhong, Juan Zhang, Baochang Zhang,	(参考訳) 複雑な深層ネットワーク、特に事前訓練された視覚言語モデル(VLM)の解釈は、非常に難しい課題である。現在のクラスアクティベーションマップ(CAM)手法では、モデルの意思決定基準を明らかにする領域が強調されているが、明確なサリエンシマップと詳細な解釈容易性は欠如している。このギャップを埋めるために,チャネル活性化マップから共有パターンを抽出する新しい分解・積分法であるDecomCAMを提案する。特異値分解を利用して、DecomCAMはクラス識別活性化マップを直交サブサービスマップ(OSSM)に分解し、ターゲット概念への貢献に基づいて統合する。 6つのベンチマークでの大規模な実験により、DecomCAMは正確な位置決めに優れるだけでなく、解釈可能性と計算効率のバランスを最適化できることがわかった。さらなる分析により、OSSMは識別可能なオブジェクトコンポーネントと相関し、モデルの推論のきめ細かい理解を促進することが判明した。これにより、DecomCAMは高度なディープラーニングモデルの微妙な解釈のための潜在的なツールとして位置づけられる。コードはhttps://github.com/CapricornGuang/DecomCAMで利用可能である。 Interpreting complex deep networks, notably pre-trained vision-language models (VLMs), is a formidable challenge. Current Class Activation Map (CAM) methods highlight regions revealing the model's decision-making basis but lack clear saliency maps and detailed interpretability. To bridge this gap, we propose DecomCAM, a novel decomposition-and-integration method that distills shared patterns from channel activation maps. Utilizing singular value decomposition, DecomCAM decomposes class-discriminative activation maps into orthogonal sub-saliency maps (OSSMs), which are then integrated together based on their contribution to the target concept. Extensive experiments on six benchmarks reveal that DecomCAM not only excels in locating accuracy but also achieves an optimizing balance between interpretability and computational efficiency. Further analysis unveils that OSSMs correlate with discernible object components, facilitating a granular understanding of the model's reasoning. This positions DecomCAM as a potential tool for fine-grained interpretation of advanced deep learning models. The code is avaible at https://github.com/CapricornGuang/DecomCAM.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# イオントラップ誘起ac磁場のキャラクタリゼーション Characterization of ion-trap-induced ac-magnetic fields ( http://arxiv.org/abs/2405.18883v1 ) ライセンス: Link先を確認	Manoj K. Joshi, Milena Guevara-Bertsch, Florian Kranzl, Rainer Blatt, Christian F. Roos,	(参考訳) 高周波イオントラップの非平衡電流によって生じる発振磁場は、精密分光実験に有害な遷移周波数シフトとサイドバンド遷移を誘導する。本稿では、2光子分光法に基づいて、直流バイアス磁場を変更したり、トラップRF電力を変更したりすることなく、rf誘起磁場の強度と方向を決定する手法について述べる。この技術は、狭い直線幅の遷移を特徴とする閉じ込められたイオン実験にも容易に適用できる。 The oscillating magnetic field produced by unbalanced currents in radio-frequency ion traps induces transition frequency shifts and sideband transitions that can be harmful to precision spectroscopy experiments. Here, we describe a methodology, based on two-photon spectroscopy, for determining both the strength and direction of rf-induced magnetic fields without modifying any DC magnetic bias field or changing any trap RF power. The technique is readily applicable to any trapped-ion experiment featuring narrow linewidth transitions.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 汎用ブラックボックス離散最適化のためのミックス・オブ・エクササイズ学習 Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization ( http://arxiv.org/abs/2405.18884v1 ) ライセンス: Link先を確認	Shengcai Liu, Zhiyuan Wang, Yew-Soon Ong, Xin Yao, Ke Tang,	(参考訳) 実世界のアプリケーションは様々な個別の最適化問題を含む。これらの問題のそれぞれに特別なオプティマイザを設計することは困難であり、通常、かなりのドメイン知識と人間の努力を必要とする。したがって、幅広い問題に対するオフザシェルフツールとしての汎用オプティマイザの開発は、長年の研究目標となっている。この記事では、完全なデータ駆動学習最適化(L2O)アプローチによってトレーニングされた、新しい汎用神経オプティマイザであるMEGOを紹介する。 MEGOは、トレーニング問題の解決から経験を学習したエキスパートの混合物で構成されており、バイナリ決定変数による最適化問題の基盤モデルと見なすことができる。解決すべき問題を提示すると、MEGOは関連する専門家モデルを選択して高品質なソリューションを生成する。 MEGOは、スタンドアロンのサンプル効率最適化器として、あるいは既存の検索メソッドと組み合わせて、初期ソリューションジェネレータとして使用することができる。 MEGOの一般性は、3つの古典的な問題クラスと3つの問題クラスを含む6つの問題クラスで検証されている。 MEGOは古典的な問題クラスのみに訓練され、6つの問題クラスすべてで非常によく機能し、ソリューションの品質と効率の両面で広く使われている汎用オプティマイザをはるかに上回っている。 MEGOは特定の最先端のオプティマイザを超越する場合もある。さらに、MEGOは問題間の類似度尺度を提供し、問題分類の新しい視点をもたらす。 L2Oを通した汎用オプティマイザの追求において、MEGOは最初の重要な一歩である。 Real-world applications involve various discrete optimization problems. Designing a specialized optimizer for each of these problems is challenging, typically requiring significant domain knowledge and human efforts. Hence, developing general-purpose optimizers as an off-the-shelf tool for a wide range of problems has been a long-standing research target. This article introduces MEGO, a novel general-purpose neural optimizer trained through a fully data-driven learning-to-optimize (L2O) approach. MEGO consists of a mixture-of-experts trained on experiences from solving training problems and can be viewed as a foundation model for optimization problems with binary decision variables. When presented with a problem to solve, MEGO actively selects relevant expert models to generate high-quality solutions. MEGO can be used as a standalone sample-efficient optimizer or in conjunction with existing search methods as an initial solution generator. The generality of MEGO is validated across six problem classes, including three classic problem classes and three problem classes arising from real-world applications in compilers, network analysis, and 3D reconstruction. Trained solely on classic problem classes, MEGO performs very well on all six problem classes, significantly surpassing widely used general-purpose optimizers in both solution quality and efficiency. In some cases, MEGO even surpasses specialized state-of-the-art optimizers. Additionally, MEGO provides a similarity measure between problems, yielding a new perspective for problem classification. In the pursuit of general-purpose optimizers through L2O, MEGO represents an initial yet significant step forward.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 低ランク・低精度分解を用いた大規模言語モデル圧縮 Compressing Large Language Models using Low Rank and Low Precision Decomposition ( http://arxiv.org/abs/2405.18886v1 ) ライセンス: Link先を確認	Rajarshi Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci,	(参考訳) 現在、LLM(Large Language Models)の禁止サイズは、メモリ制約のあるエッジデバイスへのデプロイを困難にしている。このアルゴリズムは、重量行列 $\mathbf{W}$ の固有の低ランク構造を利用して、低ランクで低精度な分解を $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$ として近似することで、新しい学習後 LLM 圧縮アルゴリズムである $\rm CALDERA$ を導入する。ここで、$\mathbf{L}$ と $\mathbf{R}$ は低いランク因子であり、$\mathbf{Q}$, $\mathbf{L}$ と $\mathbf{R}$ のエントリは量子化される。モデルを各層に$\mathbf{Q} + \mathbf{L}\mathbf{R}$分解を代入して圧縮し、圧縮されたモデルのゼロショット性能を評価する。さらに、$\mathbf{L}$ と $\mathbf{R}$ は容易にローランク適応が可能となり、ゼロショット性能が向上する。 $\rm CALDERA$ はこの分解を最適化問題 $\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$ として定式化し、$\mathbf{X}$ はキャリブレーションデータである。ランク制約回帰フレームワークを用いて,$\rm CALDERA$の近似誤差に関する理論的上限を設定し,目標ランクと量子化ビット予算の影響を分析して,圧縮率とモデル性能のトレードオフについて検討した。その結果、LlaMa-$2$$7$B/$70$BとLlaMa-$3$8$Bの圧縮は、パラメータあたり2.5ドル以下という既存のトレーニング後のLCM圧縮技術より優れていることが示された。実装は以下の通りである。 \href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}。 The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy them on memory-constrained edge devices. This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, consequently enhancing the zero-shot performance. $\rm CALDERA$ obtains this decomposition by formulating it as an optimization problem $\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$, where $\mathbf{X}$ is the calibration data, and $\mathbf{Q}, \mathbf{L}, \mathbf{R}$ are constrained to be representable using low-precision formats. Theoretical upper bounds on the approximation error of $\rm CALDERA$ are established using a rank-constrained regression framework, and the tradeoff between compression ratio and model performance is studied by analyzing the impact of target rank and quantization bit budget. Results illustrate that compressing LlaMa-$2$ $7$B/$70$B and LlaMa-$3$ $8$B models obtained using $\rm CALDERA$ outperforms existing post-training LLM compression techniques in the regime of less than $2.5$ bits per parameter. The implementation is available at: \href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 深層強化学習に基づく住宅におけるプライバシ・コストのトレードオフを伴う積極的負荷形成戦略 Proactive Load-Shaping Strategies with Privacy-Cost Trade-offs in Residential Households based on Deep Reinforcement Learning ( http://arxiv.org/abs/2405.18888v1 ) ライセンス: Link先を確認	Ruichang Zhang, Youcheng Sun, Mustafa A. Mustafa,	(参考訳) スマートメーターはエネルギー管理と効率を高める上で重要な役割を担いますが、エネルギー消費パターンを通じて詳細なユーザー行動を明らかにすることで、プライバシー上の懸念を生じさせます。近年の研究では、コストのバランスを保ちながらユーザのプライバシを保護するため、バッテリ支援型ロードシェイピング技術の開発に焦点が当てられている。本稿では,攻撃者を誤解させるような人工的な負荷シグネチャを積極的に生成することにより,ユーザのプライバシを保護するために設計された,深層強化学習に基づくロードシェイピングアルゴリズム(PLS-DQN)を提案する。我々は,提案アルゴリズムを非侵入負荷監視(NILM)の敵に対して評価する。その結果,本手法は実際のエネルギー利用パターンを効果的に隠蔽するだけでなく,コスト効率を維持しつつユーザのプライバシを向上させる上で,最先端の手法よりも優れていることがわかった。 Smart meters play a crucial role in enhancing energy management and efficiency, but they raise significant privacy concerns by potentially revealing detailed user behaviors through energy consumption patterns. Recent scholarly efforts have focused on developing battery-aided load-shaping techniques to protect user privacy while balancing costs. This paper proposes a novel deep reinforcement learning-based load-shaping algorithm (PLS-DQN) designed to protect user privacy by proactively creating artificial load signatures that mislead potential attackers. We evaluate our proposed algorithm against a non-intrusive load monitoring (NILM) adversary. The results demonstrate that our approach not only effectively conceals real energy usage patterns but also outperforms state-of-the-art methods in enhancing user privacy while maintaining cost efficiency.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 生成AIの暖房と使用状況の認識について On Perception of Prevalence of Cheating and Usage of Generative AI ( http://arxiv.org/abs/2405.18889v1 ) ライセンス: Link先を確認	Roman Denkin,	(参考訳) 本報告では,学生の不正行為の頻度に対する教員の認識と,生成AIが学術的整合性に与える影響について検討する。データは、ウプサラ大学情報工学科の教員の匿名調査を通じて収集され、2004年から2023年までの不正調査に関する機関統計とともに分析された。その結果、教師は一般的に不正行為を一般的とはみなさないが、ジェネレーティブAIのアクセシビリティのため、その発生率が増加しているという強い信念が持たれている。ほとんどの教師は、不正行為とAIの使用法を同一視していないが、学生の間で広く使われていることを認めている。さらに、教師の認識は不正傾向の客観的なデータと一致し、学業不条理の進化する風景に対する認識を浮き彫りにしている。 This report investigates the perceptions of teaching staff on the prevalence of student cheating and the impact of Generative AI on academic integrity. Data was collected via an anonymous survey of teachers at the Department of Information Technology at Uppsala University and analyzed alongside institutional statistics on cheating investigations from 2004 to 2023. The results indicate that while teachers generally do not view cheating as highly prevalent, there is a strong belief that its incidence is increasing, potentially due to the accessibility of Generative AI. Most teachers do not equate AI usage with cheating but acknowledge its widespread use among students. Furthermore, teachers' perceptions align with objective data on cheating trends, highlighting their awareness of the evolving landscape of academic dishonesty.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 局所的推定グローバル摂動はフェデレートシャープネス認識最小化のための局所摂動よりも優れている Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization ( http://arxiv.org/abs/2405.18890v1 ) ライセンス: Link先を確認	Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang,	(参考訳) フェデレートラーニング(FL)では、クライアント間のマルチステップ更新とデータの異質性により、よりシャープなミニマでロスランドスケープが発生し、結果のグローバルモデルの性能が低下する。一般的なフェデレーションアプローチは、シャープネス認識最小化(SAM)を局所的なトレーニングに組み込んでこの問題を軽減する。しかし、局所的な損失景観は、異種環境におけるグローバルな損失景観の平坦さを正確に反映するものではなく、結果として、局所的なシャープネスを最小化し、クライアントデータに対する摂動を計算することは、FLにおけるSAMの有効性と集中的なトレーニングとを一致させることができない。この課題を解決するために,FedLESAMを提案する。FedLESAMは,クライアント側におけるグローバルな摂動方向を,前回のアクティブラウンドと現在のラウンドのグローバルモデルの違いとして局所的に推定するアルゴリズムである。改善された品質に加えて、FedLESAMはイテレーション毎に一度だけバックプロパゲーションを実行するため、フェデレートされたSAMベースのアプローチを高速化する。理論的には、一貫した摂動を保証することによって、元のFedSAMよりもわずかに密接な境界を証明している。実験的に,フェデレートされた4つのベンチマークデータセットを3つの分割戦略で包括的に実験し,FedLESAMの優れた性能と効率を実証した。 In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degenerating the performance of the resulted global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on client side as the difference between global models received in the previous active and current rounds. Besides the improved quality, FedLESAM also speed up federated SAM-based approaches since it only performs once backpropagation in each iteration. Theoretically, we prove a slightly tighter bound than its original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# Few-Shot Testing: 1つのベイズ試験ベクトルを用いた膜深部ニューラルネットワークの不確かさの推定 Few-Shot Testing: Estimating Uncertainty of Memristive Deep Neural Networks Using One Bayesian Test Vector ( http://arxiv.org/abs/2405.18894v1 ) ライセンス: Link先を確認	Soyed Tuhin Ahmed, Mehdi Tahoori,	(参考訳) ニューラルネットワーク(NN)のようなディープラーニングアルゴリズムのパフォーマンスは、最近大幅に向上し、多くのドメインで最先端のパフォーマンスを達成することができる。しかし、メモリと計算リソースの制約のため、エッジデバイスにNNを実装するのは難しい作業である。したがって、メモリメモリ(CIM)などのハードウェアアクセラレータは、行列ベクトル乗算(行列ベクトル乗算)など、最も一般的な演算を高速化するために開発された。しかし、固有のデバイス特性、温度などの外部環境要因、未熟な製造プロセスにより、メムリスタは製造や実行中に発生する欠陥や変動など、様々な非理想に悩まされる。その結果、モデルによる予測に完全な信頼が欠如している。本稿では,デバイス非イデアルの存在下でハードウェアアクセラレーターが行うNN予測の信頼性を向上させるために,memristor-based CIMハードウェア上で実装されたNNのモデル不確かさを推定できるベイズテストベクトル生成フレームワークを提案する。従来の点推定試験ベクトル生成法と比較して,本手法は異なるモデル次元でより一般化可能であり,ハードウェアに1つのベイズベクトルだけを格納する必要がある。提案手法は, 異なるモデル次元, タスク, 故障率, 変動ノイズに基づいて評価し, メモリオーバーヘッドを0.024$ MB に抑えながら, 常に100\% のカバレッジを達成可能であることを示す。 The performance of deep learning algorithms such as neural networks (NNs) has increased tremendously recently, and they can achieve state-of-the-art performance in many domains. However, due to memory and computation resource constraints, implementing NNs on edge devices is a challenging task. Therefore, hardware accelerators such as computation-in-memory (CIM) with memristive devices have been developed to accelerate the most common operations, i.e., matrix-vector multiplication. However, due to inherent device properties, external environmental factors such as temperature, and an immature fabrication process, memristors suffer from various non-idealities, including defects and variations occurring during manufacturing and runtime. Consequently, there is a lack of complete confidence in the predictions made by the model. To improve confidence in NN predictions made by hardware accelerators in the presence of device non-idealities, in this paper, we propose a Bayesian test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware. Compared to the conventional point estimate test vector generation method, our method is more generalizable across different model dimensions and requires storing only one test Bayesian vector in the hardware. Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise to show that it can consistently achieve $100\%$ coverage with only $0.024$ MB of memory overhead.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 経験的方程式開発のための単位認識型遺伝的プログラミング Unit-Aware Genetic Programming for the Development of Empirical Equations ( http://arxiv.org/abs/2405.18896v1 ) ライセンス: Link先を確認	Julia Reuter, Viktor Martinek, Roland Herzog, Sanaz Mostaghim,	(参考訳) 経験方程式を開発する際には、ドメインの専門家はこれらを正確で物理的法則に従うように要求する。しばしば、未知の単位を持つ定数は方程式とともに発見される。従来の単位認識型遺伝的プログラミング(GP)アプローチは、未決定単位の未知定数を含む場合には使用できない。本稿では,未知の単位を「ジョーカー」として伝播させ,単位違反の大きさを返却する次元解析手法を提案する。本稿では,GPアルゴリズムの次元解析を統合するために,エボリューティブカリング,修復機構,多目的アプローチという3つの手法を提案する。基礎的真理を持つデータセットの実験は、エボリューティブ・カリングの同等の性能を示し、次元解析なしでベースラインへの多目的アプローチを示す。根拠のないデータセットの大規模な分析により、単位認識アルゴリズムは、単位依存の解を生成する一方で、精度の低い犠牲しか生じないことが明らかとなった。全体として、単元型経験方程式を開発するための有望な新しいアプローチを提示した。 When developing empirical equations, domain experts require these to be accurate and adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as ''jokers'' and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis in the GP algorithm. Experiments on datasets with ground truth demonstrate comparable performance of evolutive culling and the multi-objective approach to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms make only low sacrifices in accuracy, while producing unit-adherent solutions. Overall, we presented a promising novel approach for developing unit-adherent empirical equations.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# MLAE:パラメータ効率の良いファインチューニングのためのマスク付きLoRAエキスパート MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2405.18897v1 ) ライセンス: Link先を確認	Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao,	(参考訳) 大規模事前学習モデルの完全微調整に要する広範囲なパラメータ更新による課題に対して,ローランド適応(LoRA)を例として,パラメータ効率のよい微調整(PEFT)法が出現している。 LoRAは微調整のプロセスを単純化するが、低ランク行列における一定の冗長性に苦しむ可能性があり、単にランクを上げることによる有効性は限られている。これらの問題に対処するため、自然な考え方は、低ランク行列の学習プロセスの独立性と多様性を高めることである。そこで我々は,マスクの概念をPEFTに適用する革新的な手法であるMasked LoRA Experts (MLAE)を提案する。本手法は,低ランク行列を独立したランク1サブマトリクス,すなわち 'experts' に変換するセル分解戦略を取り入れ,独立性を向上する。さらに、これらの専門家を訓練中に選択的に活性化する二項マスク行列を導入し、専門家レベルのドロップアウト戦略に基づいて、より多様で異方性のある学習を促進する。本研究により, この選択的活性化は, 性能の向上だけでなく, MLAE間のパラメータ類似性を顕著に低下させ, パラメータ数を増加させると共に, モデルの品質を著しく向上させると共に, より多様な知識獲得を促進することが明らかとなった。注目すべきことに、MLAEはVTAB-1kベンチマークで平均78.8%、FGVCベンチマークで90.9%の精度で新しいSOTA性能を実現し、優れた性能を示している。私たちのコードはhttps://github.com/jie040109/MLAEで利用可能です。 In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. Therefore, we propose Masked LoRA Experts (MLAE), an innovative approach that applies the concept of masking to PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or ``experts'', thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training to promote more diverse and anisotropic learning, based on expert-level dropout strategies. Our investigations reveal that this selective activation not only enhances performance but also fosters a more diverse acquisition of knowledge with a marked decrease in parameter similarity among MLAE, significantly boosting the quality of the model while barely increasing the parameter count. Remarkably, MLAE achieves new SOTA performance with an average accuracy score of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, demonstrating superior performance. Our code is available at https://github.com/jie040109/MLAE.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# スペクトルの忠実度と空間的エンハンスメント:衛星画像のためのパンシャープ技術の評価とカスケード Spectral Fidelity and Spatial Enhancement: An Assessment and Cascading of Pan-Sharpening Techniques for Satellite Imagery ( http://arxiv.org/abs/2405.18900v1 ) ライセンス: Link先を確認	Abdul Aziz A. B, A. B Abdul Rahim,	(参考訳) 本研究は, スペクトルの忠実度と空間的エンハンスメントの重要な側面に着目し, 衛星画像のパンシャーピング技術に関する包括的評価を行う。リモートセンシングにおける情報的アルゴリズム選択の必要性から,既存手法の詳細な比較分析により,新たなカスケード・構造化評価フレームワークが提案されている。研究結果は,空間分解能の増強とともに,スペクトル精度が約88%の複雑なトレードオフを浮き彫りにした。この研究は、パンシャーピングの実践的意義に光を当て、リモートセンシングアプリケーションにおけるスペクトル面と空間面の両方の重要性を強調している。様々なパンシャーピングアルゴリズムは、その性能の全体像を提供するために体系的に採用され、その能力と限界のより深い理解に寄与した。 This research presents a comprehensive assessment of pan-sharpening techniques for satellite imagery, focusing on the critical aspects of spectral fidelity and spatial enhancement. Motivated by the need for informed algorithm selection in remote sensing, A novel cascaded and structured evaluation framework has been proposed with a detailed comparative analysis of existing methodologies. The research findings underscore the intricate trade-offs between spectral accuracy of about 88\% with spatial resolution enhancement. The research sheds light on the practical implications of pan-sharpening and emphasizes the significance of both spectral and spatial aspects in remote sensing applications. Various pan-sharpening algorithms were systematically employed to provide a holistic view of their performance, contributing to a deeper understanding of their capabilities and limitations.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# デファリングシステム評価のための因果的枠組み A Causal Framework for Evaluating Deferring Systems ( http://arxiv.org/abs/2405.18902v1 ) ライセンス: Link先を確認	Filippo Palomba, Andrea Pugnana, José Manuel Alvarez, Salvatore Ruggieri,	(参考訳) 定義システムは教師付き機械学習(ML)モデルを拡張し、予測を人間の専門家に延期する。しかし、遅延戦略がシステム精度に与える影響を評価することは、まだ見過ごされている領域である。本稿では、因果レンズによる遅延システムの評価により、このギャップを埋める。我々は、因果推論の潜在的な結果フレームワークと延期システムとを関連づける。これにより,遅延戦略の因果的影響が予測精度に与える影響を明らかにすることができる。 2つのシナリオを区別する。最初の例では、遅延したインスタンスに対して、人間とMLモデルの予測の両方にアクセスできます。そのような場合、遅延したインスタンスとそれらの集合に対する個々の因果効果を特定できる。第2のシナリオでは、遅延したインスタンスに対して、人間の予測しか利用できない。この場合、回帰不連続設計を利用して局所因果効果を推定できる。文献からの7つの遅延システムの合成および実データに対するアプローチを実証的に評価した。 Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we can access both the human and the ML model predictions for the deferred instances. In such a case, we can identify the individual causal effects for deferred instances and aggregates of them. In the second scenario, only human predictions are available for the deferred instances. In this case, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 厳密なスコーリング規則による言語生成 Language Generation with Strictly Proper Scoring Rules ( http://arxiv.org/abs/2405.18906v1 ) ライセンス: Link先を確認	Chenze Shao, Fandong Meng, Yijin Liu, Jie Zhou,	(参考訳) 最大推定(MLE)に基づく言語生成は,テキスト生成の基本的なアプローチとなっている。最大確率推定は通常、統計決定理論における対数スコアとしても知られる対数類似損失を最小化することによって行われる。対数スコアは、モデルが真の確率を報告したときにのみ期待されるスコアが最大化されるような、誠実な予測を促進するという意味では、厳密には適切である。多くの厳密な適切な採点規則が存在するが、対数スコアは観察されたサンプルの確率にのみ依存する唯一の局所採点ルールであり、自然テキストの指数的に大きなサンプル空間を扱うことができる。本研究では,非局所的なスコアリングルールを用いた言語モデリングを可能にするため,スコアリングルールを言語生成に適用するための簡単な戦略を提案する。この戦略を活用することで、対数スコアの代替として、2つの古典的な厳密なスコアルールであるブライアスコアと球面スコアを用いて言語生成モデルを訓練する。実験結果から, 他のハイパーパラメータを調整せずに損失関数を置換するだけで, モデル生成能力が大幅に向上することが示唆された。さらに、LLaMA-7BやLLaMA-13Bのような大きな言語モデル(LLM)にも拡張可能である。ソースコード: \url{https://github.com/shaochenze/ScoringRulesLM}。 Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the logarithmic score in statistical decision theory. The logarithmic score is strictly proper in the sense that it encourages honest forecasts, where the expected score is maximized only when the model reports true probabilities. Although many strictly proper scoring rules exist, the logarithmic score is the only local scoring rule among them that depends exclusively on the probability of the observed sample, making it capable of handling the exponentially large sample space of natural text. In this work, we propose a straightforward strategy for adapting scoring rules to language generation, allowing for language modeling with any non-local scoring rules. Leveraging this strategy, we train language generation models using two classic strictly proper scoring rules, the Brier score and the Spherical score, as alternatives to the logarithmic score. Experimental results indicate that simply substituting the loss function, without adjusting other hyperparameters, can yield substantial improvements in model's generation capabilities. Moreover, these improvements can scale up to large language models (LLMs) such as LLaMA-7B and LLaMA-13B. Source code: \url{https://github.com/shaochenze/ScoringRulesLM}.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# クロスドメインデータによるシンガポールのパーキング可用性予測 - 新しいデータセットとデータ駆動アプローチ Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach ( http://arxiv.org/abs/2405.18910v1 ) ライセンス: Link先を確認	Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang,	(参考訳) 車両の増加は、効率的な駐車スペース管理の必要性を強調している。リアルタイムパーキングアベイラビリティ(PA)の予測は、交通渋滞とそれに伴う社会問題を軽減するのに役立つ。本研究では,シンガポールにおける今後のPAを,様々な領域の複雑な要因で総合的に予測することを目的としている。 1)新しいデータセット: シンガポールの1,687の駐車場から1年分のPAデータを含む,さまざまな空間的・時間的要因に富んだデータセットについて紹介する。 2)データ駆動アプローチ: 数千の駐車場にまたがる将来のPAを集合的かつ効率的に予測する,新しいディープラーニングフレームワークであるDeepPAを提案する。 (3) 大規模実験と展開: DeepPAは既存の先進モデルと比較して最大3時間予測の予測誤差を9.2%削減することを示した。さらに,DeepPAを実践的なWebベースプラットフォームに実装し,ドライバーを支援するリアルタイムPA予測と,シンガポールの知事に都市計画を通知する。データセットとソースコードはhttps://github.com/yoshall/SINPAで公開しています。 The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domains. The contributions in this paper are listed as follows: (1) A New Dataset: We introduce the \texttt{SINPA} dataset, containing a year's worth of PA data from 1,687 parking lots in Singapore, enriched with various spatial and temporal factors. (2) A Data-Driven Approach: We present DeepPA, a novel deep-learning framework, to collectively and efficiently predict future PA across thousands of parking lots. (3) Extensive Experiments and Deployment: DeepPA demonstrates a 9.2% reduction in prediction error for up to 3-hour forecasts compared to existing advanced models. Furthermore, we implement DeepPA in a practical web-based platform to provide real-time PA predictions to aid drivers and inform urban planning for the governors in Singapore. We release the dataset and source code at https://github.com/yoshall/SINPA.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 能動学習とモデル選択の相乗化による対人テスト時間適応の探索 Exploring Human-in-the-Loop Test-Time Adaptation by Synergizing Active Learning and Model Selection ( http://arxiv.org/abs/2405.18911v1 ) ライセンス: Link先を確認	Yushu Li, Yongyi Su, Xulei Yang, Kui Jia, Xun Xu,	(参考訳) 既存のテスト時間適応(TTA)アプローチは、ラベルのないテストデータストリームでモデルに適応することが多い。近年の研究では,Human-In-the-Loop Test-Time Adaptation (HILTTA)と呼ばれる,限定的な人間のアノテーションを導入することで,仮説を緩和した。既存のHILTTAの焦点は、ラベル付けする最も情報に富むサンプル、すなわちアクティブラーニングの選択にある。本研究では,超パラメータに敏感なTTAの落とし穴を動機として,能動的学習とモデル選択の相乗化によるHILTTAへのアプローチを提案する。具体的には、まず人間のアノテーション(能動的学習)のサンプルを選択し、次にラベル付きデータを用いて最適なハイパーパラメータ(モデル選択)を選択する。サンプル選択戦略は、アクティブラーニングとモデル選択の目的とのバランスを考慮し、サンプルを選択するために調整される。提案手法は,最先端のHILTTA手法やストリームベースのアクティブラーニング手法よりも優れている,既製のTTA手法と互換性があることを4つのTTAデータセットで実証した。重要な点として,本提案手法は,市販のTTA方式で常に最悪の過度パラメータの選択を防止できる。ソースコードは公開時に公開される。 Existing test-time adaptation (TTA) approaches often adapt models with the unlabeled testing data stream. A recent attempt relaxed the assumption by introducing limited human annotation, referred to as Human-In-the-Loop Test-Time Adaptation (HILTTA) in this study. The focus of existing HILTTA lies on selecting the most informative samples to label, a.k.a. active learning. In this work, we are motivated by a pitfall of TTA, i.e. sensitive to hyper-parameters, and propose to approach HILTTA by synergizing active learning and model selection. Specifically, we first select samples for human annotation (active learning) and then use the labeled data to select optimal hyper-parameters (model selection). A sample selection strategy is tailored for choosing samples by considering the balance between active learning and model selection purposes. We demonstrate on 4 TTA datasets that the proposed HILTTA approach is compatible with off-the-shelf TTA methods which outperform the state-of-the-art HILTTA methods and stream-based active learning methods. Importantly, our proposed method can always prevent choosing the worst hyper-parameters on all off-the-shelf TTA methods. The source code will be released upon publication.	翻訳日:2024-05-30 17:59:30 公開日:2024-05-29
# 土壌水分予測のためのスマート農業における時系列基盤モデルの導入 Leveraging Time-Series Foundation Models in Smart Agriculture for Soil Moisture Forecasting ( http://arxiv.org/abs/2405.18913v1 ) ライセンス: Link先を確認	Boje Deforce, Bart Baesens, Estefanía Serral Asensio,	(参考訳) 近年、自然言語処理とコンピュータビジョンの基礎モデルが急増し、様々な領域におけるイノベーションが加速した。この進歩に触発されて、スマート農業における時系列予測の基礎モデルの可能性を探る。具体的には、土壌水ポテンシャル(\psi_\mathrm{soil}$)を予測するため、土壌水の状態(SOTA)時系列基盤モデルである$\texttt{TimeGPT}$という新しい応用法を提案する。伝統的に、このタスクは幅広い入力変数に依存する。我々は$\psi_\mathrm{soil}$'s ability to forecast $\psi_\mathrm{soil}$ in:$i$) a zero-shot setting,$ii$) 歴史的$\psi_\mathrm{soil}$ Measurement,$iii$) 細調整された設定を探索し、モデルに外因性変数を追加する。我々は$\texttt{TimeGPT}$のパフォーマンスを、$\psi_\mathrm{soil}$を予測するための確立されたSOTAベースラインモデルと比較する。我々の結果は、$\texttt{TimeGPT}$が、歴史的な$\psi_\mathrm{soil}$データのみを使用して、競合予測精度を達成し、農業アプリケーションに対するその顕著な可能性を強調していることを示している。本研究は、伝統的に大規模なデータ収集やドメインの専門知識に依存したタスクの予測を可能にすることにより、農業における持続的開発のための時系列モデル構築の道を開くものである。 The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a novel application of $\texttt{TimeGPT}$, a state-of-the-art (SOTA) time-series foundation model, to predict soil water potential ($\psi_\mathrm{soil}$), a key indicator of field water status that is typically used for irrigation advice. Traditionally, this task relies on a wide array of input variables. We explore $\psi_\mathrm{soil}$'s ability to forecast $\psi_\mathrm{soil}$ in: ($i$) a zero-shot setting, ($ii$) a fine-tuned setting relying solely on historic $\psi_\mathrm{soil}$ measurements, and ($iii$) a fine-tuned setting where we also add exogenous variables to the model. We compare $\texttt{TimeGPT}$'s performance to established SOTA baseline models for forecasting $\psi_\mathrm{soil}$. Our results demonstrate that $\texttt{TimeGPT}$ achieves competitive forecasting accuracy using only historical $\psi_\mathrm{soil}$ data, highlighting its remarkable potential for agricultural applications. This research paves the way for foundation time-series models for sustainable development in agriculture by enabling forecasting tasks that were traditionally reliant on extensive data collection and domain expertise.	翻訳日:2024-05-30 17:49:44 公開日:2024-05-29
# 信頼に満ちた結束に向けて:大規模言語モデルがレゾネーターを橋渡ししている Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners ( http://arxiv.org/abs/2405.18915v1 ) ライセンス: Link先を確認	Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao,	(参考訳) 大言語モデル(LLM)は、深刻な不信の連鎖(CoT)問題に悩まされる。従来の研究では測定と説明が試みられていたが、CoTの内部での詳細な分析は欠如しており、すべての推論コンポーネント間の相互作用を共同で考慮していない。本稿では,まずCoTステップの粒度におけるCoT忠実度問題について検討し,集中的推論と分散推論という2つの推論パラダイムを同定し,信頼度との関係を明らかにする。その後,文脈,CoT,回答の因果関係に関する共同分析を行った。その結果、LLMが回答を予測すると、文脈からCoTに欠けている正しい情報を思い出すことができ、不誠実な問題を引き起こすことが証明された。最後に,この問題を緩和するための推論ブリッジ手法を提案する。そこでは属性法を用いてCoT生成のヒントとして情報をリコールし,その意味的一貫性と属性スコアに基づいてノイズの多いCoTをフィルタリングする。大規模な実験は、我々のアプローチが不誠実なCoT問題を効果的に軽減することを示した。 Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues. Previous work attempts to measure and explain it but lacks in-depth analysis within CoTs and does not consider the interactions among all reasoning components jointly. In this paper, we first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms: centralized reasoning and distributed reasoning, and find their relationship with faithfulness. Subsequently, we conduct a joint analysis of the causal relevance among the context, CoT, and answer during reasoning. The result proves that, when the LLM predicts answers, it can recall correct information missing in the CoT from the context, leading to unfaithfulness issues. Finally, we propose the inferential bridging method to mitigate this issue, in which we use the attribution method to recall information as hints for CoT generation and filter out noisy CoTs based on their semantic consistency and attribution scores. Extensive experiments demonstrate that our approach effectively alleviates the unfaithful CoT problem.	翻訳日:2024-05-30 17:49:44 公開日:2024-05-29
# 対実データ増大を考慮した因果行動の影響 Causal Action Influence Aware Counterfactual Data Augmentation ( http://arxiv.org/abs/2405.18917v1 ) ライセンス: Link先を確認	Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, Georg Martius,	(参考訳) オフラインデータは、ロボットに複雑な振る舞いを教えるための価値と実践的なリソースである。理想的には、学習エージェントは、利用可能なデモンストレーションの不足によって制約されるべきではない。しかし、現実のシナリオの複雑さは通常、ニューラルネットワークポリシーが素早い相関関係を拾い上げ、非因果関係を学ぶのを防ぐために大量のデータを必要とします。 CAIACは、オンライン環境のインタラクションにアクセスすることなく、固定データセットから実現可能な合成遷移を生成できるデータ拡張手法である。因果的影響を定量化するための原則的手法を利用することで、データセット内の独立軌跡間の状態空間の$\it{action}$-unffected部分を交換することで、反ファクト的推論を行うことができる。これは、分散シフトに対するオフライン学習アルゴリズムのロバスト性を大幅に向上させることを実証的に示す。 Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\it{action}$-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.	翻訳日:2024-05-30 17:49:44 公開日:2024-05-29
# 小惑星帯における低推力移動の計算 : 天体力学操作と機械学習アプローチの比較 Computing low-thrust transfers in the asteroid belt, a comparison between astrodynamical manipulations and a machine learning approach ( http://arxiv.org/abs/2405.18918v1 ) ライセンス: Link先を確認	Giacomo Acciarini, Laurent Beauregard, Dario Izzo,	(参考訳) 低推力軌道は、小惑星帯のミッションにおける科学的出力とコスト効率の最適化に重要な役割を果たしている。高スラスト移動とは異なり、低スラスト軌道は複雑な最適制御問題を解く必要がある。この複雑さは、軌道力学の複雑さによって訪れた小惑星の数とともに指数関数的に増加する。文献では、解析的および機械学習技術を含む、完全な最適化なしに低推力転送を近似する手法が提案されている。本研究では,新しい解析近似を提案し,その精度と性能を機械学習手法と比較する。解析的近似は軌道理論を利用して軌道のコストを推定するが、機械学習はよりブラックボックスなアプローチを採用し、ニューラルネットワークを利用して様々な属性に基づいて最適な移動を予測する。私たちは、時間と燃料の最適制御問題を解決することで、約300万回の転送のデータセットを構築します。このデータベース上の2つの手法の比較は、特に長い転送において、機械学習の優位性を明らかにしている。多変量移動のような課題にもかかわらず、どちらのアプローチも、多くの小惑星を含む軌道のデータベース上で、最終的な質量誤差において数パーセント以内の精度を維持している。この研究は、小惑星帯におけるミッション機会の効率的な探索に寄与し、様々な近似戦略の強さと限界についての洞察を提供する。 Low-thrust trajectories play a crucial role in optimizing scientific output and cost efficiency in asteroid belt missions. Unlike high-thrust transfers, low-thrust trajectories require solving complex optimal control problems. This complexity grows exponentially with the number of asteroids visited due to orbital mechanics intricacies. In the literature, methods for approximating low-thrust transfers without full optimization have been proposed, including analytical and machine learning techniques. In this work, we propose new analytical approximations and compare their accuracy and performance to machine learning methods. While analytical approximations leverage orbit theory to estimate trajectory costs, machine learning employs a more black-box approach, utilizing neural networks to predict optimal transfers based on various attributes. We build a dataset of about 3 million transfers, found by solving the time and fuel optimal control problems, for different time of flights, which we also release open-source. Comparison between the two methods on this database reveals the superiority of machine learning, especially for longer transfers. Despite challenges such as multi revolution transfers, both approaches maintain accuracy within a few percent in the final mass errors, on a database of trajectories involving numerous asteroids. This work contributes to the efficient exploration of mission opportunities in the asteroid belt, providing insights into the strengths and limitations of different approximation strategies.	翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

Title

Authors

Abstract

論文公表日・翻訳日

# 早期軽度認知障害の解剖学的バイオマーカー同定のための機械学習アプローチ

A Machine Learning Approach for Identifying Anatomical Biomarkers of Early Mild Cognitive Impairment ( http://arxiv.org/abs/2407.00040v1 )

ライセンス: Link先を確認

Alwani Liyana Ahmad, Jose Sanchez-Bornot, Roberto C. Sotero, Damien Coyle, Zamzuri Idris, Ibrahima Faye,

(参考訳) アルツハイマー病(英語: Alzheimer's Disease、AD)は、認知機能や運動機能に障害を与え、高齢化に主に影響を及ぼす進行性神経変性疾患である。磁気共鳴画像(MRI)のようなアクセス可能な方法でADを早期に検出することは、疾患の進行を停止または遅らせるための効果的な介入を開発するために不可欠である。本研究の目的は、MRIベースのバイオマーカーを選択し、個人を健康的なコントロール(HC)と不安定なコントロール(uHC)に分類する機械学習手法を網羅的に分析することである。この研究は、アルツハイマー病ニューロインフォマティクスイニシアチブ(ADNI)とOASIS-3(Open Access Series of Imaging Studies)のMRIデータを利用しており、HCとuHCの両方の参加者に焦点を当てている。この研究は、バランスの取れたデータセットとバランスの取れていないデータセットの分類法をテストすることで、不均衡なデータの課題に対処し、多項式回帰を用いてデータを調和させて、年齢、性別、頭蓋内容積などのニュアンス変数を緩和する。その結果、Gaussian Naive Bayes と RusBoost の分類器は最適な性能を示し、それぞれ ADNI データセット上で76.46% と 72.48% の精度を達成した。 OASIS-3データセットでは、Kernel Naive BayesとRusBoostは64.66%から75.71%のアキュラシーを発生させ、年齢に合わせたデータセットをさらに改善した。後角皮質、海馬、外側心室、外側眼窩前頭皮質などの脳領域は、早期の認知機能低下の過程で大きな影響が認められる。小さなサンプルサイズのような制限にもかかわらず、この研究の調和化アプローチはバイオマーカーの選択の堅牢性を高め、MRIを用いた早期AD検出のための半自動機械学習パイプラインの可能性を示している。

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder that primarily affects the aging population by impairing cognitive and motor functions. Early detection of AD through accessible methodologies like magnetic resonance imaging (MRI) is vital for developing effective interventions to halt or slow the disease's progression. This study aims to perform a comprehensive analysis of machine learning techniques for selecting MRI-based biomarkers and classifying individuals into healthy controls (HC) and unstable controls (uHC) who later show mild cognitive impairment within five years. The research utilizes MRI data from the Alzheimer's Disease Neuroinformatics Initiative (ADNI) and the Open Access Series of Imaging Studies 3 (OASIS-3), focusing on both HC and uHC participants. The study addresses the challenges of imbalanced data by testing classification methods on balanced and unbalanced datasets, and harmonizes data using polynomial regression to mitigate nuisance variables like age, gender, and intracranial volume. Results indicate that Gaussian Naive Bayes and RusBoost classifiers shows an optimal performance, achieving accuracies of up to 76.46% and 72.48% respectively on the ADNI dataset. For the OASIS-3 dataset, Kernel Naive Bayes and RusBoost yield accuracies ranging from 64.66% to 75.71%, improving further in age-matched datasets. Brain regions like the entorhinal cortex, hippocampus, lateral ventricle, and lateral orbitofrontal cortex are identified as significantly impacted during early cognitive decline. Despite limitations such as small sample sizes, the study's harmonization approach enhances the robustness of biomarker selection, suggesting the potential of this semi-automatic machine learning pipeline for early AD detection using MRI.

翻訳日:2024-07-22 22:38:24 公開日:2024-05-29

# cryoSPHERE:Cryo EMからの単一粒子の不均一な再構成

cryoSPHERE: Single-particle heterogeneous reconstruction from cryo EM ( http://arxiv.org/abs/2407.01574v1 )

ライセンス: Link先を確認

Gabriel Ducrocq, Lukas Grunewald, Sebastian Westenhoff, Fredrik Lindsten,

(参考訳) タンパク質の3次元構造は、その機能を決定する上で重要な役割を果たす。 AlphaFoldのような手法はアミノ酸配列のみに基づくタンパク質構造予測に革命をもたらした。しかし、タンパク質はしばしば複数の異なるコンフォメーションに出現し、完全なコンフォメーション分布を解決することは極めて重要である。単一粒子の低温電子顕微鏡(cryo EM)は、与えられたタンパク質の多数の画像をキャプチャする強力なツールであり、しばしば異なるコンフォーメーション(粒子として参照)を持つ。しかし、この画像はタンパク質の非常にノイズの多い投射であり、Cryo EM再構成の伝統的な方法は、1つまたは数個のコンフォメーションの回復に限られている。本稿では,AlphaFoldのタンパク質構造を入力として利用する深層学習手法であるCryoSPHEREを紹介する。この定式化は、単一のタンパク質構造の有意義な再構成を取り戻すのに十分な制約をもたらすことが示されている。異種再建の現況に対して一貫した改善が見られた例を3例に挙げる。

The three-dimensional structure of a protein plays a key role in determining its function. Methods like AlphaFold have revolutionized protein structure prediction based only on the amino-acid sequence. However, proteins often appear in multiple different conformations, and it is highly relevant to resolve the full conformational distribution. Single-particle cryo-electron microscopy (cryo EM) is a powerful tool for capturing a large number of images of a given protein, frequently in different conformations (referred to as particles). The images are, however, very noisy projections of the protein, and traditional methods for cryo EM reconstruction are limited to recovering a single, or a few, conformations. In this paper, we introduce cryoSPHERE, a deep learning method that takes as input a nominal protein structure, e.g. from AlphaFold, learns how to divide it into segments, and how to move these as approximately rigid bodies to fit the different conformations present in the cryo EM dataset. This formulation is shown to provide enough constraints to recover meaningful reconstructions of single protein structures. This is illustrated in three examples where we show consistent improvements over the current state-of-the-art for heterogeneous reconstruction.

翻訳日:2024-07-22 22:18:55 公開日:2024-05-29

# Hebbian Learningを用いたホップフィールドネットワークのプロトタイプ解析

Prototype Analysis in Hopfield Networks with Hebbian Learning ( http://arxiv.org/abs/2407.03342v1 )

ライセンス: Link先を確認

Hayden McAlister, Anthony Robins, Lech Szymanski,

(参考訳) ホップフィールドネットワークにおけるプロトタイプ形成について論じる。通常、高い相関状態を持つヘビアン学習は、劣化したメモリ性能をもたらす。このような学習は、未学習状態が大きな相関した状態のサブセットの代表として出現し、能力難を緩和するプロトタイプ形成に繋がることを示す。このプロセスは、人間の認知におけるプロトタイプ学習と類似している。本稿では,心理学,統計物理学,計算機科学からの貢献を網羅した,連想記憶におけるプロトタイプ学習の実質的な文献レビューを行う。理論的な観点からプロトタイプの生成を分析し,学習用プロトタイプのサンプル数,これらのサンプルのノイズ数,非サンプル状態の数に基づいて,これらの状態に対する安定性条件を導出する。安定性条件は、安定性変化の要因としてプロトタイプ状態に対する安定性の確率を構築するために用いられる。また、従来のネットワーク分析と類似点があり、プロトタイプのキャパシティを見つけることができます。標準ヘビアン学習を用いた単純なホップフィールドネットワークを用いた実験により,プロトタイプ形成に対するこれらの期待を裏付ける。我々は実験を、複数のプロトタイプでデータに基づいて訓練されたホップフィールドネットワークに拡張し、同時に複数のプロトタイプを安定化できるネットワークを見つける。複数のプロトタイプ状態のアトラクションの流域を測定し,実例数と実例の一致により,アトラクタ強度が増大することを示した。我々はプロトタイプ状態の安定性と支配性をこれらの状態のエネルギープロファイルと結びつけ、特にプロファイル形状をターゲット状態や他の刺激状態と比較する。

We discuss prototype formation in the Hopfield network. Typically, Hebbian learning with highly correlated states leads to degraded memory performance. We show this type of learning can lead to prototype formation, where unlearned states emerge as representatives of large correlated subsets of states, alleviating capacity woes. This process has similarities to prototype learning in human cognition. We provide a substantial literature review of prototype learning in associative memories, covering contributions from psychology, statistical physics, and computer science. We analyze prototype formation from a theoretical perspective and derive a stability condition for these states based on the number of examples of the prototype presented for learning, the noise in those examples, and the number of non-example states presented. The stability condition is used to construct a probability of stability for a prototype state as the factors of stability change. We also note similarities to traditional network analysis, allowing us to find a prototype capacity. We corroborate these expectations of prototype formation with experiments using a simple Hopfield network with standard Hebbian learning. We extend our experiments to a Hopfield network trained on data with multiple prototypes and find the network is capable of stabilizing multiple prototypes concurrently. We measure the basins of attraction of the multiple prototype states, finding attractor strength grows with the number of examples and the agreement of examples. We link the stability and dominance of prototype states to the energy profile of these states, particularly when comparing the profile shape to target states or other spurious states.

翻訳日:2024-07-22 22:09:04 公開日:2024-05-29

# 近代ホップフィールドネットワークにおけるロバスト性向上とハイパーパラメータ選択

Improved Robustness and Hyperparameter Selection in Modern Hopfield Networks ( http://arxiv.org/abs/2407.08742v1 )

ライセンス: Link先を確認

Hayden McAlister, Anthony Robins, Lech Szymanski,

(参考訳) 現代のホップフィールドネットワークは、よりシャープな相互作用関数を許容することによって、古典的なホップフィールドネットワークを一般化する。これにより、近くの学習されたアトラクションが互いに干渉しないため、自己連想記憶としてのネットワークの容量が増大する。しかし、ネットワークの実装は、メモリベクトルとプローブベクトルのドット積に大きな指数を適用することに依存している。データの次元が大きければ、計算は非常に大きくなり、実際の実装で浮動小数点数を使用する場合の問題が発生する。この問題を詳細に記述し、元のネットワーク記述を変更して問題を緩和し、更新やトレーニング中にネットワークのダイナミクスを変更することはないことを示す。また、現在のホップフィールドネットワークにおけるハイパーパラメータ選択を大幅に改善し、相互作用頂点への依存を取り除き、元のネットワークのように相互作用頂点に大きく変化しない最適なハイパーパラメータ領域が得られることを示した。

The modern Hopfield network generalizes the classical Hopfield network by allowing for sharper interaction functions. This increases the capacity of the network as an autoassociative memory as nearby learned attractors will not interfere with one another. However, the implementation of the network relies on applying large exponents to the dot product of memory vectors and probe vectors. If the dimension of the data is large the calculation can be very large and result in problems when using floating point numbers in a practical implementation. We describe this problem in detail, modify the original network description to mitigate the problem, and show the modification will not alter the networks' dynamics during update or training. We also show our modification greatly improves hyperparameter selection for the modern Hopfield network, removing the dependence on the interaction vertex and resulting in an optimal region of hyperparameters that does not significantly change with the interaction vertex as it does in the original network.

翻訳日:2024-07-22 13:48:17 公開日:2024-05-29

# 看護計画の最適化:医療機関のサプライチェーンアプローチ

Optimizing Nurse Scheduling: A Supply Chain Approach for Healthcare Institutions ( http://arxiv.org/abs/2407.11195v1 )

ライセンス: Link先を確認

Jubin Thomas,

(参考訳) 組織を管理する場合、プランナーは多くの困難なシナリオに直面します。このような例では、直感や管理的経験のみに頼るだけでは十分ではなく、定量的なアプローチが必要である。この需要は、厳重なスケールと制約の複雑さが重大な課題を引き起こすビッグデータの時代においてさらに強調される。そこで本研究の目的は,組織管理における重要な課題である人事スケジューリングの基盤となる枠組みを提供することである。具体的には,契約義務や強制休業期間などの要因によって複雑化した作業である,スタッフのシフト割り当ての最適化に焦点をあてる。さらに、現在の状況は、様々な産業で従業員不足が頻発していることが特徴であり、多くの組織では、それに対応するための効率的で信頼性の高い管理ツールが欠如している。したがって, 医療環境における人事スケジューリングの課題である, 介護者のロスター問題に特に注目が集まっている。これらの問題は、単一の医療施設が数百人の看護師を雇う可能性があることや、適切なスタッフレベルや夜勤の休息といった厳しい制約が課せられることを考えると、様々な変数が特徴である。さらに、新型コロナウイルス(COVID-19)のパンデミックは、医療機関のスタッフの課題を悪化させ、従業員ニーズを正確に評価し、危機状況下での効果的な運用のためのシフト割り当てを最適化することの重要性を浮き彫りにした。

When managing an organization, planners often encounter numerous challenging scenarios. In such instances, relying solely on intuition or managerial experience may not suffice, necessitating a quantitative approach. This demand is further accentuated in the era of big data, where the sheer scale and complexity of constraints pose significant challenges. Therefore, the aim of this study is to provide a foundational framework for addressing personnel scheduling, a critical issue in organizational management. Specifically, we focus on optimizing shift assignments for staff, a task fraught with complexities due to factors such as contractual obligations and mandated rest periods. Moreover, the current landscape is characterized by frequent employee shortages across various industries, with many organizations lacking efficient and dependable management tools to address them. Therefore, our attention is particularly drawn to the nurse rostering problem, a personnel scheduling challenge prevalent in healthcare settings. These issues are characterized by a multitude of variables, given that a single healthcare facility may employ hundreds of nurses, alongside stringent constraints such as the need for adequate staffing levels and rest periods postnight shifts. Furthermore, the ongoing COVID19 pandemic has exacerbated staffing challenges in healthcare institutions, underlining the importance of accurately assessing staffing needs and optimizing shift allocations for effective operation amidst crisis situations.

翻訳日:2024-07-22 12:00:08 公開日:2024-05-29

# オープンソースライセンスの変更と取り消しについて

On the modification and revocation of open source licences ( http://arxiv.org/abs/2407.13064v1 )

ライセンス: Link先を確認

Paul Gagnon, Misha Benjamin, Justine Gauthier, Catherine Regis, Jenny Lee, Alexei Nordell-Markovits,

(参考訳) 歴史的に、オープンソースライセンス下で資料がリリースされると、オープンソースへのコミットメントは無効とみなされてきた。本稿では,オープンソースコントリビュータがユーザを強制する権利のサブセットの作成について論じる。 (i)最新のモデルの更新。 (二新たな利用制限を受理すること、又は三ソフトウェアの使用を全面的に停止すること。これは従来のオープンソースアプローチから逸脱するものの、オープンソースAIモデルに関連する法的、評判、道徳的なリスクは、下流の使用をもっとコントロールできるコントリビュータを正当化する可能性がある。最近の法律改正により、あるケースでは、オープンソースコントリビュータの責任への扉が開かれた。著者らは、下流のユーザがバイアスやガードレール回避、あるいは彼らのコントリビューションに対する敵攻撃といった問題に対処するアップデートを確実に実施できることを、コントリビュータは歓迎するだろうと考えている。最後に、このライセンスカテゴリがRAILライセンスとどのように相互作用するか、OSSプラットフォームやスキャニングツールといった主要な利害関係者による運用と採用の方法について述べる。

Historically, open source commitments have been deemed irrevocable once materials are released under open source licenses. In this paper, the authors argue for the creation of a subset of rights that allows open source contributors to force users to (i) update to the most recent version of a model, (ii) accept new use case restrictions, or even (iii) cease using the software entirely. While this would be a departure from the traditional open source approach, the legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses. Recent legislative changes have also opened the door to liability of open source contributors in certain cases. The authors believe that contributors would welcome the ability to ensure that downstream users are implementing updates that address issues like bias, guardrail workarounds or adversarial attacks on their contributions. Finally, this paper addresses how this license category would interplay with RAIL licenses, and how it should be operationalized and adopted by key stakeholders such as OSS platforms and scanning tools.

翻訳日:2024-07-22 08:18:00 公開日:2024-05-29

# 因果グラフ検証に向けたPrompt-based vs. Fine-Tuned LLMs

Prompt-based vs. Fine-tuned LLMs Toward Causal Graph Verification ( http://arxiv.org/abs/2406.16899v1 )

ライセンス: Link先を確認

Yuni Susanti, Nina Holsmoelle,

(参考訳) 本研究の目的は,テキストソースを用いた因果グラフの自動検証に自然言語処理(NLP)技術を適用することである。因果グラフは、しばしば教師なし因果発見法から派生し、人間の専門家による手作業による評価を必要とする。 NLP技術、すなわちBERTやChatGPTのような大規模言語モデル(LLM)は、テキストコンテキストに基づいてノードペア間の因果関係を観測できるかどうかを予測することによって、結果の因果グラフを検証できる可能性がある。本研究では,(1)因果関係分類タスクに微調整された事前学習言語モデル,(2)プロンプトベースLPMの2種類のNLPモデルの性能を比較した。プロンプトベースのLLMが様々なタスクに対して比較的うまく機能する以前の研究とは対照的に、バイオメディカルおよびオープンドメインのデータセットに関する予備実験では、微調整されたモデルはプロンプトベースのLLMよりも優れており、F1スコアは最大20.5ポイント向上している。コードと事前処理されたデータセットをリポジトリで共有しました。

This work aims toward an application of natural language processing (NLP) technology for automatic verification of causal graphs using text sources. A causal graph is often derived from unsupervised causal discovery methods and requires manual evaluation from human experts. NLP technologies, i.e., Large Language Models (LLMs) such as BERT and ChatGPT, can potentially be used to verify the resulted causal graph by predicting if causal relation can be observed between node pairs based on the textual context. In this work, we compare the performance of two types of NLP models: (1) Pre-trained language models fine-tuned for causal relation classification task and, (2) prompt-based LLMs. Contrasted to previous studies where prompt-based LLMs work relatively well over a set of diverse tasks, preliminary experiments on biomedical and open-domain datasets suggest that the fine-tuned models far outperform the prompt-based LLMs, up to 20.5 points improvement of F1 score. We shared the code and the pre-processed datasets in our repository.

翻訳日:2024-07-01 06:41:31 公開日:2024-05-29

# 平等な説明可能なAIを進化させる学際的専門知識

Interdisciplinary Expertise to Advance Equitable Explainable AI ( http://arxiv.org/abs/2406.18563v1 )

ライセンス: Link先を確認

Chloe R. Bennett, Heather Cole-Lewis, Stephanie Farquhar, Naama Haamel, Boris Babenko, Oran Lang, Mat Fleck, Ilana Traynis, Charles Lau, Ivor Horn, Courtney Lyles,

(参考訳) 人工知能(AI)の分野は、健康と医療に急速に影響している。従来の研究は、データ代表性やモデルパフォーマンスに厳格な注意を払って、エクイティを推し進め、バイアスを減らす必要性を明確に示した。しかし、社会疫学のベストプラクティスと健康のエクイティを活用してAIの説明可能性を向上させる機会もあり、見いだされた協会の仮説の策定に役立てることができる。本稿では、説明可能なAI(XAI)に注目し、複数の視点からAIモデルの説明を議論し、批判的に評価し、将来の研究のバイアスと方向性の領域を特定するための学際的専門家パネルレビューのためのフレームワークを記述する。我々は,学際的専門家パネルの重要性を強調し,歴史的かつ文脈的に理解された,より正確で公平な解釈を創出する。学際的なパネルディスカッションは、バイアスを減らし、潜在的な共同創設者を特定し、文献にギャップがある追加研究の機会を特定するのに役立つ。これらの洞察は、AIモデルの改善の機会を示唆する。

The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persists for populations who face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance to advance equity and reduce bias. However, there is an opportunity to also improve the explainability of AI by leveraging best practices of social epidemiology and health equity to help us develop hypotheses for associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel to produce more accurate, equitable interpretations which are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where there are gaps in the literature. In turn, these insights can suggest opportunities for AI model improvement.

翻訳日:2024-07-01 06:00:20 公開日:2024-05-29

# 回転平均化:サイクルグラフにおける原始双対法と閉形

Rotation Averaging: A Primal-Dual Method and Closed-Forms in Cycle Graphs ( http://arxiv.org/abs/2406.18564v1 )

ライセンス: Link先を確認

Gabriel Moreira, Manuel Marques, João Paulo Costeira,

(参考訳) 幾何的再構成の土台である回転平均化(英語版)は、それらの間の測定された相対方向の集合を最適に説明する絶対回転の集合を求める。回転の同期は、バンドル調整や動きからの構造化の不可欠な部分であるだけでなく、視覚的同時ローカライゼーションやマッピングにも応用され、反復型ソルバの初期化やカメラネットワークキャリブレーションにも応用されている。しかし、この最適化問題は非凸と高次元の両方である。本稿では,最大推定点からこの問題に対処し,2倍のコントリビューションを行う。まず、広く受け入れられているスペクトル初期化を動機とした、新しい原始双対法を考案した。さらに、サイクルグラフトポロジにおける平均回転点の定常点を特徴付け、スペクトルグラフ理論におけるこの結果の文脈化を行う。提案手法を複数の設定でベンチマークし、双対性理論を用いて解を証明し、精度と性能を著しく向上させる。

A cornerstone of geometric reconstruction, rotation averaging seeks the set of absolute rotations that optimally explains a set of measured relative orientations between them. In addition to being an integral part of bundle adjustment and structure-from-motion, the problem of synchronizing rotations also finds applications in visual simultaneous localization and mapping, where it is used as an initialization for iterative solvers, and camera network calibration. Nevertheless, this optimization problem is both non-convex and high-dimensional. In this paper, we address it from a maximum likelihood estimation standpoint and make a twofold contribution. Firstly, we set forth a novel primal-dual method, motivated by the widely accepted spectral initialization. Further, we characterize stationary points of rotation averaging in cycle graphs topologies and contextualize this result within spectral graph theory. We benchmark the proposed method in multiple settings and certify our solution via duality theory, achieving a significant gain in precision and performance.

翻訳日:2024-07-01 06:00:20 公開日:2024-05-29

# Chiribella et al., Phys. Lett. 132 (2024) 190201, arXiv:2301.10885

Comment on Chiribella et al., Phys. Rev. Lett. 132 (2024) 190201, arXiv:2301.10885 ( http://arxiv.org/abs/2406.04363v1 )

ライセンス: Link先を確認

Robert B. Griffiths,

(参考訳) Chiribella et al の論文 'Bell Nonlocality in Classical Systems Coexisting with Other System Types' は、非可換な量子プロジェクタを無視した方法で量子文脈における 'classical' を定義し、したがってヒルベルト量子量子理論と矛盾しない。

The article `Bell Nonlocality in Classical Systems Coexisting with Other System Types' by Chiribella et al. defines `classical' in a quantum context in a way that ignores noncommuting quantum projectors, and is hence inconsistent with Hilbert-space quantum theory.

翻訳日:2024-06-23 14:05:12 公開日:2024-05-29

# オートマタ・シ・アプリカティイの論文

Sisteme Hibride de Invatare Automata si Aplicatii ( http://arxiv.org/abs/2406.11870v1 )

ライセンス: Link先を確認

Eduard Hogea, Darian Onchis,

(参考訳) 本稿では、分類と回帰のために、ディープニューラルネットワークアプローチとニューロシンボリックアプローチを提案する。 Logic Tensor Networksに基づくニューロシンボリック予測モデルは、警告または攻撃と呼ばれる悪い接続の特徴と通常の接続の特徴を説明すると同時に、識別することができる。提案するハイブリッドシステムは、経験を通じて、深層ニューラルネットワークによる独自の改善能力と、象徴的な人工知能アプローチによって提供される結果の解釈可能性の両方を取り入れている。ハイブリッドシステムへのシフトの必要性を正当化するために、高密度ニューラルネットワークとニューロシンボリックネットワークの詳細な説明、実装、比較を行う。関連する比較のために、同じデータセットをトレーニングに使用し、結果のメトリクスを比較した。結果のメトリクスのレビューでは、どちらの手法も予測モデルに類似した精度を持つが、Logic Tensor Networksはデータよりもインタラクティブな精度と推論の推論も可能である。また、過度な緩和やスケーラビリティの問題といった他の利点や欠点も議論されている。

In this paper, a deep neural network approach and a neuro-symbolic one are proposed for classification and regression. The neuro-symbolic predictive models based on Logic Tensor Networks are capable of discriminating and in the same time of explaining the characterization of bad connections, called alerts or attacks, and of normal connections. The proposed hybrid systems incorporate both the ability of deep neural networks to improve on their own through experience and the interpretability of the results provided by symbolic artificial intelligence approach. To justify the need for shifting towards hybrid systems, explanation, implementation, and comparison of the dense neural network and the neuro-symbolic network is performed in detail. For the comparison to be relevant, the same datasets were used in training and the metrics resulted have been compared. A review of the resulted metrics shows that while both methods have similar precision in their predictive models, with Logic Tensor Networks being also possible to have interactive accuracy and deductive reasoning over data. Other advantages and disadvantages such as overfitting mitigation and scalability issues are also further discussed.

翻訳日:2024-06-23 13:24:48 公開日:2024-05-29

# Pretrained Mobility Transformer: 人体移動のための基礎モデル

Pretrained Mobility Transformer: A Foundation Model for Human Mobility ( http://arxiv.org/abs/2406.02578v1 )

ライセンス: Link先を確認

Xinhua Wu, Haoyu He, Yanchao Wang, Qi Wang,

(参考訳) ユビキタスなモバイルデバイスは、個人が都市空間を詳細にナビゲートし利用する方法を明らかにする、膨大な量の位置情報ベースのサービスデータを生成している。本研究では,都市空間と人間の移動性を理解するための基礎モデルを構築するために,これらの広範囲な未ラベルのユーザトラジェクトリを利用する。本稿では, ユーザトラジェクトリを自己回帰的に処理し, 地理的領域をトークンに変換し, 空間的および時間的情報をこれらの表現内に埋め込むためのトランスフォーマアーキテクチャを利用する, PMT (textbf{M}obility \textbf{T}ransformer) を提案する。 2ヶ月間に3つの大都市圏で実施された実験は、PMTが地域の地理的・社会的なデコグラフィー特性を捉える能力を示している。提案したPMTは、次の位置予測、軌道計算、軌道生成など、様々な下流タスクにまたがる。これらの結果は、都市空間機能と個人の移動性嗜好に関する新たな洞察を提供する、人間の移動性の複雑なパターンの復号化におけるPMTの能力と有効性を支持する。

Ubiquitous mobile devices are generating vast amounts of location-based service data that reveal how individuals navigate and utilize urban spaces in detail. In this study, we utilize these extensive, unlabeled sequences of user trajectories to develop a foundation model for understanding urban space and human mobility. We introduce the \textbf{P}retrained \textbf{M}obility \textbf{T}ransformer (PMT), which leverages the transformer architecture to process user trajectories in an autoregressive manner, converting geographical areas into tokens and embedding spatial and temporal information within these representations. Experiments conducted in three U.S. metropolitan areas over a two-month period demonstrate PMT's ability to capture underlying geographic and socio-demographic characteristics of regions. The proposed PMT excels across various downstream tasks, including next-location prediction, trajectory imputation, and trajectory generation. These results support PMT's capability and effectiveness in decoding complex patterns of human mobility, offering new insights into urban spatial functionality and individual mobility preferences.

翻訳日:2024-06-09 15:49:54 公開日:2024-05-29

# 効率的な数値計算のためのオープンソースフレームワーク

An Open-Source Framework for Efficient Numerically-Tailored Computations ( http://arxiv.org/abs/2406.02579v1 )

ライセンス: Link先を確認

Louis Ledoux, Marc Casas,

(参考訳) 本稿では,効率的な行列行列乗算(MMM)を容易にするために設計された多用途オープンソースフレームワークを提案する。このフレームワークは2つの主要なコントリビューションを提供している: 1つは、算術データパス生成のための微調整された自動パイプラインで、高度にカスタマイズ可能なシストリックなMMMカーネルを実現する。このフレームワークは、人工知能(AI)推論や海面高度(SSH)計算など、さまざまな数値要件を示す多様なハイパフォーマンスコンピューティング(HPC)ワークロードに対して、エネルギーコスト当たりの精度を体系的に向上させる。 AI推論では、ResNet18、ResNet34、ResNet50、DenseNet121、DenseNet161、DenseNet169、VGG11という最先端のニューラルネットワークモデルを、2つのデータセット、2つのコンピュータフォーマット、27の異なる中間演算データパスと共に検討する。 IEEE754-32の3.3\times$とResNet50のImageNet推論中のBfloat16の1.4\times$の3.3\times$である。これは従来の浮動小数点演算器(FPU)に匹敵する8.2.3\%と8.6\%の精度を維持しながら達成される。 SSH計算の文脈では、FPUにおける従来の2倍精度演算と4倍精度演算の精度を上回る2倍精度の単語を用いて、完全再現可能な結果を得る。提案手法は, IEEE754-64 と IEEE754-128 と比較して, SSH の計算精度を最低で 5\times$ と $27\times$ で向上させ, 結果として 5.6\times$ と $115.1\times$ の計算精度の向上を実現した。

We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of $3.3\times$ for IEEE754-32 and $1.4\times$ for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of $82.3\%$ and $86\%$, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of $5\times$ and $27\times$ compared to IEEE754-64 and IEEE754-128, respectively, resulting in $5.6\times$ and $15.1\times$ improvements in accuracy per power cost.

翻訳日:2024-06-09 15:49:54 公開日:2024-05-29

# ディープニューラルネットワークとしてのカオスダイナミクスの爆発

Exploiting Chaotic Dynamics as Deep Neural Networks ( http://arxiv.org/abs/2406.02580v1 )

ライセンス: Link先を確認

Shuhong Liu, Nozomi Akashi, Qingyao Huang, Yasuo Kuniyoshi, Kohei Nakajima,

(参考訳) カオスは、非線形性および初期状態に対する感度から生じる複素ダイナミクスを示す。これらの特徴は、高度な計算応用のポテンシャルを裏付ける表現性の深さを示唆している。しかし、情報処理にカオス力学を効果的に活用するための戦略は、ほとんど解明されていない。本研究では,様々な最先端の深層ニューラルネットワークでカオスの本質を見出すことができることを示した。この啓示から着想を得た本研究では,カオス力学を直接活用して深層学習アーキテクチャを提案する。我々のアプローチは、異なるカオスシステムにまたがって体系的に評価される。すべての場合において、我々のフレームワークは精度、収束速度、効率の点で従来のディープニューラルネットワークに優れた結果をもたらす。さらに,本手法では,過渡的カオス形成の活発な役割を見出した。この研究は、情報処理において長年見過ごされてきたカオスの統合のための新しい経路を提供し、機械学習とニューロモルフィック計算の領域におけるカオス力学の将来的な融合に関する洞察を提供する。

Chaos presents complex dynamics arising from nonlinearity and a sensitivity to initial states. These characteristics suggest a depth of expressivity that underscores their potential for advanced computational applications. However, strategies to effectively exploit chaotic dynamics for information processing have largely remained elusive. In this study, we reveal that the essence of chaos can be found in various state-of-the-art deep neural networks. Drawing inspiration from this revelation, we propose a novel method that directly leverages chaotic dynamics for deep learning architectures. Our approach is systematically evaluated across distinct chaotic systems. In all instances, our framework presents superior results to conventional deep neural networks in terms of accuracy, convergence speed, and efficiency. Furthermore, we found an active role of transient chaos formation in our scheme. Collectively, this study offers a new path for the integration of chaos, which has long been overlooked in information processing, and provides insights into the prospective fusion of chaotic dynamics within the domains of machine learning and neuromorphic computation.

翻訳日:2024-06-09 15:49:54 公開日:2024-05-29

# ε$-Optimally Solving Zero-Sum POSGs

$ε$-Optimally Solving Zero-Sum POSGs ( http://arxiv.org/abs/2406.00054v1 )

ライセンス: Link先を確認

Erwan Escudie, Matthia Sabatelli, Jilles Dibangoye,

(参考訳) ゼロサム部分可観測確率ゲーム (zs-POSGs) の解法は、元のゲームを占領マルコフゲームと呼ばれる新しいゲームに埋め込む。この再構成により、zs-POSGを解くためにベルマンの最適性原理を適用することができる。しかし、現在のソリューションを改善するには、指数関数的に多くの潜在的な制約を持つ線形プログラムを解く必要があり、このアプローチのスケーラビリティを著しく制限する。本稿では、この制限を克服するために、最適値関数の新たな一様連続性特性を利用する。まず、最適性を損なうことなく、最新の更新ルールよりも計算効率の良い新しい演算子を構築する。特に、現在の解を改善するには、指数関数的な制約の減少を伴う線形プログラムが必要となる。また,各領域における保証を維持しつつ,既存の手法のスケーラビリティを向上させる点ベースの値反復アルゴリズムについても示す。

A recent method for solving zero-sum partially observable stochastic games (zs-POSGs) embeds the original game into a new one called the occupancy Markov game. This reformulation allows applying Bellman's principle of optimality to solve zs-POSGs. However, improving a current solution requires solving a linear program with exponentially many potential constraints, which significantly restricts the scalability of this approach. This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation. We first construct a new operator that is computationally more efficient than the state-of-the-art update rules without compromising optimality. In particular, improving a current solution now involves a linear program with an exponential drop in constraints. We then also show that point-based value iteration algorithms utilizing our findings improve the scalability of existing methods while maintaining guarantees in various domains.

翻訳日:2024-06-06 08:53:00 公開日:2024-05-29

# 文脈と時間知覚的長期記憶を用いた会話エージェントの実現に向けて

Toward Conversational Agents with Context and Time Sensitive Long-term Memory ( http://arxiv.org/abs/2406.00057v1 )

ライセンス: Link先を確認

Nick Alonso, Tomás Figliolia, Anthony Ndirango, Beren Millidge,

(参考訳) 近年,長期記憶を持つ会話エージェントへの関心が高まっており,検索強化生成(RAG)を用いた言語モデルの開発が急速に進んでいる。最近まで、RAGに関するほとんどの研究は、長文の会話の情報ではなく、ウィキペディアのような巨大なテキストデータベースからの情報検索に重点を置いてきた。本稿では,データベースの静的検索と比較して,長文形式の会話データからの効果的な検索が2つの問題に直面していることを論じる。 1)時間/イベントベースのクエリで、会話イベントの時間や順序(例えば、火曜日の第3回会話)に基づいて、モデルが過去の会話に関する情報を取得する必要がある。 2) 周囲の会話コンテキストを理解する必要があるあいまいなクエリ。これらの課題に対処できるRAGベースのエージェントをより良く開発するために、私たちは、最近の長文でシミュレートされた会話のデータセットの上に構築された、あいまいで時間的な質問の新しいデータセットを作成し、標準RAGベースのアプローチがそのような質問を不十分に扱うことを実証する。そこで我々は,連鎖型検索手法,標準ベクトルデータベース検索,問合せを曖昧にするためのプロンプト手法を組み合わせた新しい検索モデルを開発し,これらの課題を解決するための現在の手法よりも大幅に改善されていることを示す。この新しいデータセットとより高度なRAGエージェントは、重要なベンチマークとして機能し、さまざまなAIアプリケーションで使用可能な、効果的なメモリ拡張会話エージェントへと踏み込むことができると考えています。

There has recently been growing interest in conversational agents with long-term memory which has led to the rapid development of language models that use retrieval-augmented generation (RAG). Until recently, most work on RAG has focused on information retrieval from large databases of texts, like Wikipedia, rather than information from long-form conversations. In this paper, we argue that effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval: 1) time/event-based queries, which requires the model to retrieve information about previous conversations based on time or the order of a conversational event (e.g., the third conversation on Tuesday), and 2) ambiguous queries that require surrounding conversational context to understand. To better develop RAG-based agents that can deal with these challenges, we generate a new dataset of ambiguous and time-based questions that build upon a recent dataset of long-form, simulated conversations, and demonstrate that standard RAG based approaches handle such questions poorly. We then develop a novel retrieval model which combines chained-of-table search methods, standard vector-database retrieval, and a prompting method to disambiguate queries, and demonstrate that this approach substantially improves over current methods at solving these tasks. We believe that this new dataset and more advanced RAG agent can act as a key benchmark and stepping stone towards effective memory augmented conversational agents that can be used in a wide variety of AI applications.

翻訳日:2024-06-06 08:53:00 公開日:2024-05-29

# Conveyor: ツール部分実行を備えた効率的なツール対応LDM

Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution ( http://arxiv.org/abs/2406.00059v1 )

ライセンス: Link先を確認

Yechen Xu, Xinhao Kong, Tingjun Chen, Danyang Zhuo,

(参考訳) 大規模言語モデル(LLM)サービスワークロードの複雑さは、ChatGPTプラグインのような外部ツール呼び出しとの統合によって大幅に増大している。本稿では, LLMデコーディングと並行してツール部分実行を行う要求に対して, 効率的なLLMサービスを実現するための新たな機会を特定する。この目的のために、外部ツールを含む要求を処理するために最適化された効率的なLLMサービスシステムであるConveyorを設計する。ツール開発者がLCMサービスシステムに部分的な実行機会を公開するための新しいインターフェースと、部分的なツール実行を容易にする要求スケジューラを導入する。ツールの部分的な実行は、要求完了のレイテンシを最大38.8%改善することを示した。

The complexity of large language model (LLM) serving workloads has substantially increased due to the integration with external tool invocations, such as ChatGPT plugins. In this paper, we identify a new opportunity for efficient LLM serving for requests that trigger tools: tool partial execution alongside LLM decoding. To this end, we design Conveyor, an efficient LLM serving system optimized for handling requests involving external tools. We introduce a novel interface for tool developers to expose partial execution opportunities to the LLM serving system and a request scheduler that facilitates partial tool execution. Our results demonstrate that tool partial execution can improve request completion latency by up to 38.8%.

翻訳日:2024-06-06 08:53:00 公開日:2024-05-29

# 言語モデルのカスケード・アウェア・トレーニング

Cascade-Aware Training of Language Models ( http://arxiv.org/abs/2406.00060v1 )

ライセンス: Link先を確認

Congchao Wang, Sean Augenstein, Keith Rush, Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Aditya Krishna Menon, Alec Go,

(参考訳) サービスコストとレイテンシの削減は、ビジネスアプリケーションに言語モデル(LM)を配置する上で、基本的な懸念事項である。これを解決するために、LMのカスケードは、より単純なクエリのためにより小さなモデルを条件付きで使用する効果的なソリューションを提供する。カスケードシステムは一般に独立に訓練されたモデルで構築され、訓練中にカスケードされたLMの推論時間相互作用を考慮するという利点を無視している。本稿では,カスケードの性能トレードオフを最適化する手法として,カスケード対応トレーニング(CAT)を提案する。我々は,小規模なLMをカスケードや下流の能力において,その位置を意識して訓練することで,推定時間の利点を得る。提案手法の有効性を,SuperGLUE,WMT22,FLAN2021データセットの60以上のLMタスクで示す。

Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employ smaller models for simpler queries. Cascaded systems are typically built with independently trained models, neglecting the advantages of considering inference-time interactions of the cascaded LMs during training. In this paper, we present cascade-aware training(CAT), an approach to optimizing the overall quality-cost performance tradeoff of a cascade of LMs. We achieve inference-time benefits by training the small LM with awareness of its place in a cascade and downstream capabilities. We demonstrate the value of the proposed method with over 60 LM tasks of the SuperGLUE, WMT22, and FLAN2021 datasets.

翻訳日:2024-06-06 08:53:00 公開日:2024-05-29

# STAT: トレーニング後の変圧器の収縮

STAT: Shrinking Transformers After Training ( http://arxiv.org/abs/2406.00061v1 )

ライセンス: Link先を確認

Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle,

(参考訳) 本稿では,変換器モデルに微調整を伴わない簡単なアルゴリズムSTATを提案する。 STATは、次の層の重みを補正して精度を保ちながら、注意頭とニューロンの両方をネットワークから排除する。ネットワーク内の各層ブロックは、ネットワーク構造を保存する一連の基本行列分解を用いて圧縮される。われわれのアルゴリズムは、BERTを圧縮するのに数分を要し、単一のGPUを用いて7Bパラメータを持つモデルを圧縮するのに3時間もかからない。わずか数百のデータ例を使用して、STATはネットワークの出力を保存し、既存の勾配のないプルーニング法を改善する。優れた微調整を含む手法とさえ競合する。本稿では, GLUE, Squad, WikiText2 などのベンチマークを用いて, BERT, DistilBERT, Llama-2 などのエンコーダアーキテクチャとデコーダアーキテクチャの両方に適用した。

We present STAT: a simple algorithm to prune transformer models without any fine-tuning. STAT eliminates both attention heads and neurons from the network, while preserving accuracy by calculating a correction to the weights of the next layer. Each layer block in the network is compressed using a series of principled matrix factorizations that preserve the network structure. Our entire algorithm takes minutes to compress BERT, and less than three hours to compress models with 7B parameters using a single GPU. Using only several hundred data examples, STAT preserves the output of the network and improves upon existing gradient-free pruning methods. It is even competitive with methods that include significant fine-tuning. We demonstrate our method on both encoder and decoder architectures, including BERT, DistilBERT, and Llama-2 using benchmarks such as GLUE, Squad, WikiText2.

翻訳日:2024-06-06 08:43:16 公開日:2024-05-29

# 臨床テキスト匿名化のための大規模言語モデルの可能性 : 比較研究

Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study ( http://arxiv.org/abs/2406.00062v1 )

ライセンス: Link先を確認

David Pissarra, Isabel Curioso, João Alveira, Duarte Pereira, Bruno Ribeiro, Tomás Souper, Vasco Gomes, André V. Carreiro, Vitor Rolla,

(参考訳) 自動臨床テキスト匿名化は、患者のプライバシーと安全性を確保しつつ、二次的使用のためにテキスト健康データを広く共有する可能性を秘めている。文学において多くの複雑で理論的に成功した匿名化解の提案にもかかわらず、これらの手法は依然として欠陥がある。そのため、医療機関はデータへのオープンアクセスを望んでいない。近年のLarge Language Models (LLM) の開発は、様々なタスクを遂行する能力を考えると、この分野をさらに発展させる有望な機会となっている。本稿では,LLMによる生成匿名化の課題に適した6つの新しい評価指標を提案する。さらに, LLM法の比較研究を行い, 2つのベースライン法との比較を行った。本研究は,臨床テキストの信頼性の高い匿名化に向けて,LCMを用いたモデルを構築した。

Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage while assuring patient privacy and safety. Despite the proposal of many complex and theoretically successful anonymization solutions in literature, these techniques remain flawed. As such, clinical institutions are still reluctant to apply them for open access to their data. Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field, given their capability to perform various tasks. This paper proposes six new evaluation metrics tailored to the challenges of generative anonymization with LLMs. Moreover, we present a comparative study of LLM-based methods, testing them against two baseline techniques. Our results establish LLM-based models as a reliable alternative to common approaches, paving the way toward trustworthy anonymization of clinical text.

翻訳日:2024-06-06 08:43:16 公開日:2024-05-29

# システム2レコメンダ:時間的ポイント・プロシースによる勧告システムにおける実用性とエンゲージメントの両立

System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes ( http://arxiv.org/abs/2406.01611v1 )

ライセンス: Link先を確認

Arpit Agarwal, Nicolas Usunier, Alessandro Lazaric, Maximilian Nickel,

(参考訳) リコメンダーシステムは現代の人間の体験の重要な部分であり、その影響は、食べる食べ物から読むニュースまで様々である。しかし、レコメンデーションプラットフォームがユーザ目標とどの程度一致しているかについては、まだ議論がある。この議論を刺激する中核的な問題は、プラットフォームがコンテンツを最適化するために使用している主要な指標である、いいね!、シェア、ウォッチタイムなどのエンゲージメント信号に基づいてユーザユーティリティを推測する、という課題である。これは、ユーザーがユーティリティ駆動の意思決定プロセス(System-2と呼ぶ)、例えば、それらに関連するニュースを読むことは、しばしば衝動的意思決定プロセス(System-1と呼ぶ)、例えばクリックベイトニュースに費やす時間によって構成されるためである。その結果、観測されたエンゲージメントがユーティリティ駆動なのかインパルス駆動なのかを推測することは困難である。本稿では、エンゲージメント信号ではなく、プラットフォームへのリターン確率に基づいてユーザユーティリティを推測する、リコメンデータシステムに対する新しいアプローチを提案する。私たちの直感は、ユーザーがユーティリティを作成すれば、長期的にはプラットフォームに戻る傾向がありますが、ユーティリティを追加しない純粋なエンゲージメント駆動インタラクションは、短期的にはユーザリターンに影響を与えるかもしれませんが、持続的な効果はありません。本稿では,過去のコンテンツインタラクションが,自己興奮型ホークスプロセスに基づくユーザの到着率に影響を及ぼす生成モデルを提案する。これらのプラットフォームへの到着率は、System-1とSystem-2の両方の決定プロセスの組み合わせである。 System-2の到着強度は実用性に依存するが、System-1の到着強度は即時的な満足度に依存し、急速に消滅する傾向にある。そこで本研究では,システム1とシステム2のアンタングルを解消し,ユーザ利用によるコンテンツ最適化を可能にすることを解析的に示す。提案手法の有効性を実証するために, 合成データの実験を行った。

Recommender systems are an important part of the modern human experience whose influence ranges from the food we eat to the news we read. Yet, there is still debate as to what extent recommendation platforms are aligned with the user goals. A core issue fueling this debate is the challenge of inferring a user utility based on engagement signals such as likes, shares, watch time etc., which are the primary metric used by platforms to optimize content. This is because users utility-driven decision-processes (which we refer to as System-2), e.g., reading news that are relevant for them, are often confounded by their impulsive decision-processes (which we refer to as System-1), e.g., spend time on click-bait news. As a result, it is difficult to infer whether an observed engagement is utility-driven or impulse-driven. In this paper we explore a new approach to recommender systems where we infer user utility based on their return probability to the platform rather than engagement signals. Our intuition is that users tend to return to a platform in the long run if it creates utility for them, while pure engagement-driven interactions that do not add utility, may affect user return in the short term but will not have a lasting effect. We propose a generative model in which past content interactions impact the arrival rates of users based on a self-exciting Hawkes process. These arrival rates to the platform are a combination of both System-1 and System-2 decision processes. The System-2 arrival intensity depends on the utility and has a long lasting effect, while the System-1 intensity depends on the instantaneous gratification and tends to vanish rapidly. We show analytically that given samples it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility. We conduct experiments on synthetic data to demonstrate the effectiveness of our approach.

翻訳日:2024-06-05 21:31:36 公開日:2024-05-29

# 量子力学から解釈まで、理想的な量子計測を教える

Teaching ideal quantum measurement, from dynamics to interpretation ( http://arxiv.org/abs/2405.20353v1 )

ライセンス: Link先を確認

Armen E. Allahverdyan, Roger Balian, Theo M. Nieuwenhuizen,

(参考訳) 本稿では, 実験系Sと装置Aとの相互作用の動的過程として解析された理想的な測度に関する大学院コースについて, 量子統計力学によって記述した。装置A=M+Bは、マクロな測定装置Mと浴槽Bを含み、測定の理想性の要件により、分離された化合物系S+M+Bのハミルトニアンを特定することができる。結果として生じる力学方程式は、単純なモデルに対して解くことができる。保存法は、切り離しと登録という2つの独立した緩和機構を含むことが示されている。 M と B の大きい大きさで正当化される近似が必要である。 S+A の最終密度行列 $\hat{\cal D}(t_f)$ は平衡形式を持つ。これは、測定の大規模な実行の結果を世界規模で記述している。測定問題、すなわち$\hat{\cal D}(t_f)$から個々の物理特性を抽出すると、その不明瞭さが実行の部分集合に関連付けられた部分に分割されることから生じる。この曖昧さに対処するため、各ランは、M の異なるポインタ値 $A_i$ で終わると仮定する。ボルンの法則は、試験された可観測物の保存法則から生じ、S. Von Neumann の初期状態から M の最終的な表示の出現頻度を表す。我々は、非可換観測値の測定を解析する際に、$q$-probabilitiesと$q$-correlationsという用語を提唱する。これらの考え方は、異なるタイプのコースに適応することができる。

We present a graduate course on ideal measurements, analyzed as dynamical processes of interaction between the tested system S and an apparatus A, described by quantum statistical mechanics. The apparatus A=M+B involves a macroscopic measuring device M and a bath B. The requirements for ideality of the measurement allow us to specify the Hamiltonian of the isolated compound system S+M+B. The resulting dynamical equations may be solved for simple models. Conservation laws are shown to entail two independent relaxation mechanisms: truncation and registration. Approximations, justified by the large size of M and of B, are needed. The final density matrix $\hat{\cal D}(t_f)$ of S+A has an equilibrium form. It describes globally the outcome of a large set of runs of the measurement. The measurement problem, i.e., extracting physical properties of individual runs from $\hat{\cal D}(t_f)$, then arises due to the ambiguity of its splitting into parts associated with subsets of runs. To deal with this ambiguity, we postulate that each run ends up with a distinct pointer value $A_i$ of the macroscopic M. This is compatible with the principles of quantum mechanics. Born's rule then arises from the conservation law for the tested observable; it expresses the frequency of occurrence of the final indications $A_i$ of M in terms of the initial state of S. Von Neumann's reduction amounts to updating of information due to selection of $A_i$. We advocate the terms $q$-probabilities and $q$-correlations when analyzing measurements of non-commuting observables. These ideas may be adapted to different types of courses.

翻訳日:2024-06-03 18:44:15 公開日:2024-05-29

# アコースティックエミッションとディープトランスファー学習による溶接継手の条件モニタリングについて:一般化, 正規損失および超収束

On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence ( http://arxiv.org/abs/2405.20887v1 )

ライセンス: Link先を確認

Emmanuel Ramasso, Rafael de O. Teloli, Romain Marcel,

(参考訳) 本稿では, 畳み込みニューラルネットワーク(CNN)を用いた深部伝達学習を用いて, 超音波放射を用いたボルト接合部の状態をモニタリングする。ボルト構造は多くのメカニカルシステムにおいて重要な要素であり、その状態を監視する能力は、効果的な構造的健康モニタリングに不可欠である。 3本のボルトで接続された2本の細いビームからなるOrION-AEベンチマークを用いて,本手法の性能評価を行った。この構造から得られたデータは、連続ウェーブレット変換を用いて音響放射データストリームを画像に変換し、事前学習したCNNを用いて特徴抽出と復調を行う。本実験では, ボルトの締め付けレベルを推定するために, 単センサと多センサフュージョンを比較し, 実・前フィルタデータを用いた性能評価を行った。特に,CNNに基づく移動学習の一般化機能に着目し,不正確な予測を根本事実に近づいた場合,不正確な予測を過度に減らし,誤分類誤りを助長する順序的損失関数について検討した。ネットワーク構成や学習速度スケジューラについても検討し,ネットワーク間の複数イテレーションにおいて高い分類精度を実現する。さらに,CNNを用いた伝達学習の音響放射によるボルト状構造物監視の一般化能力について,訓練中に必要となる事前情報量について検証した。

This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor the condition of bolted joints using acoustic emissions. Bolted structures are critical components in many mechanical systems, and the ability to monitor their condition status is crucial for effective structural health monitoring. We evaluated the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts, where highly noisy acoustic emission measurements were taken to detect changes in the applied tightening torque of the bolts. The data used from this structure is derived from the transformation of acoustic emission data streams into images using continuous wavelet transform, and leveraging pretrained CNNs for feature extraction and denoising. Our experiments compared single-sensor versus multiple-sensor fusion for estimating the tightening level (loosening) of bolts and evaluated the use of raw versus prefiltered data on the performance. We particularly focused on the generalization capabilities of CNN-based transfer learning across different measurement campaigns and we studied ordinal loss functions to penalize incorrect predictions less severely when close to the ground truth, thereby encouraging misclassification errors to be in adjacent classes. Network configurations as well as learning rate schedulers are also investigated, and super-convergence is obtained, i.e., high classification accuracy is achieved in a few number of iterations with different networks. Furthermore, results demonstrate the generalization capabilities of CNN-based transfer learning for monitoring bolted structures by acoustic emission with varying amounts of prior information required during training.

翻訳日:2024-06-03 14:08:24 公開日:2024-05-29

# 原子核ノルム規則化マトリックスの相対誤差境界解析

Relative Error Bound Analysis for Nuclear Norm Regularized Matrix Completion ( http://arxiv.org/abs/1504.06817v2 )

ライセンス: Link先を確認

Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou,

(参考訳) 本稿では,核ノルム正規化行列の完備化に対する相対誤差を,フルランク行列の完備化に着目して開発する。対象行列のトップ固有空間が不整合であるという仮定の下で、未知行列の最良の低ランク近似を回復する相対上界を導出する。複数の研究がフルランク行列補完の回復誤差の分析に費やされているが、その誤差境界は通常加法的であり、完全な回復ケースを得ることができず、より一般的には固有値の歪んだ分布を利用するのが困難である。本分析は, 正規化定式化の最適条件と, 低ランク行列完備化の既定保証に基づく。我々の知る限りでは、これは行列完備化の正規化された定式化のために証明された最初の相対的境界である。

In this paper, we develop a relative error bound for nuclear norm regularized matrix completion, with the focus on the completion of full-rank matrices. Under the assumption that the top eigenspaces of the target matrix are incoherent, we derive a relative upper bound for recovering the best low-rank approximation of the unknown matrix. Although multiple works have been devoted to analyzing the recovery error of full-rank matrix completion, their error bounds are usually additive, making it impossible to obtain the perfect recovery case and more generally difficult to leverage the skewed distribution of eigenvalues. Our analysis is built upon the optimality condition of the regularized formulation and existing guarantees for low-rank matrix completion. To the best of our knowledge, this is the first relative bound that has been proved for the regularized formulation of matrix completion.

翻訳日:2024-06-02 18:32:50 公開日:2024-05-29

# 言語横断・文字レベルニューラルな形態的タグ付け

Cross-lingual, Character-Level Neural Morphological Tagging ( http://arxiv.org/abs/1708.09157v4 )

ライセンス: Link先を確認

Ryan Cotterell, Georg Heigold,

(参考訳) 一般的なNLPタスクであっても、多くの言語では十分な監視ができない。そこで本研究では,高リソース言語と低リソース言語に対する形態的タグ付けを予測するために,文字レベルのリカレントなニューラルタグをトレーニングするトランスファーラーニング手法について検討する。複数の関連言語間の共同文字表現の学習は、高リソース言語から低リソース言語への知識伝達を成功させ、モノリンガルモデルの精度を最大30%向上させる。

Even for common NLP tasks, sufficient supervision is not available in many languages -- morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.

翻訳日:2024-06-02 14:47:20 公開日:2024-05-29

# オープンオントロジースロット充填のための弾性CRF

Elastic CRFs for Open-ontology Slot Filling ( http://arxiv.org/abs/1811.01331v2 )

ライセンス: Link先を確認

Yinpei Dai, Yichi Zhang, Hong Liu, Zhijian Ou, Yi Huang, Junlan Feng,

(参考訳) スロットフィリングはタスク指向の対話システムにおいて重要なコンポーネントであり、(ユーザ)発話をスロットと呼ばれるセマンティックな概念にパースするために使用される。オントロジーはスロットの集合と各スロットが取ることのできる値によって定義される。スロットフィリングをシーケンスラベリングタスクとして扱う最も広く使われているプラクティスは、2つの主な欠点に悩まされている。まず、オントロジーは通常事前に定義され、固定されているため、目に見えないスロットの新しいラベルを検出できない。第二に、スロットラベルのワンホット符号化は、スロットと類似のセマンティクスとの相関を無視するので、異なるドメインで学んだ知識を共有することは困難である。これらの問題に対処するために,各スロットは自然言語記述の埋め込みによって表現され,CRF層でモデル化される弾性条件付きランダムフィールド(eCRF)と呼ばれる新しいモデルを提案する。スロットに対する言語記述が利用可能であれば、eCRFによって新しいスロット値を検出することができる。実験の結果,eCRFはドメイン内タスクとクロスドメインタスクの両方において既存のモデルよりも優れており,特に未確認スロットや値の予測において優れていることがわかった。

Slot filling is a crucial component in task-oriented dialog systems that is used to parse (user) utterances into semantic concepts called slots. An ontology is defined by the collection of slots and the values that each slot can take. The most widely used practice of treating slot filling as a sequence labeling task suffers from two main drawbacks. First, the ontology is usually pre-defined and fixed and therefore is not able to detect new labels for unseen slots. Second, the one-hot encoding of slot labels ignores the correlations between slots with similar semantics, which makes it difficult to share knowledge learned across different domains. To address these problems, we propose a new model called elastic conditional random field (eCRF), where each slot is represented by the embedding of its natural language description and modeled by a CRF layer. New slot values can be detected by eCRF whenever a language description is available for the slot. In our experiment, we show that eCRFs outperform existing models in both in-domain and cross-domain tasks, especially in predicting unseen slots and values.

翻訳日:2024-06-02 14:47:20 公開日:2024-05-29

# 量子ニューラルネットワークの一般化研究

Generalization Study of Quantum Neural Network ( http://arxiv.org/abs/2006.02388v2 )

ライセンス: Link先を確認

JinZhe Jiang, Xin Zhang, Chen Li, YaQian Zhao, RenGang Li,

(参考訳) 一般化はニューラルネットワークの重要な特徴であり、それについて多くの研究がなされている。近年、量子コンプ・ティング(quantum compu-ting)の発展に伴い、新たな機会がもたらされる。本稿では,量子ゲートによって構築された量子ニューラルネットワークのクラスについて検討した。このモデルでは、特徴データをまずヒルベルト空間の量子状態にマッピングし、その上にユニタリ進化を実装し、最後に量子状態の即時測定によって分類結果を得ることができた。四項ニューラルネットワークにおける全ての演算はユニタリであるため、パラメータはヒルベルト空間の超球面を構成する。従来のニューラルネットワークと比較すると、パラメータ空間はフラットである。したがって、局所的な最適化に陥ることは容易ではなく、量子ニューラルネットワークはより一般化されている。提案手法を検証するため,提案手法を3つの公開データセット上で評価した。

Generalization is an important feature of neural network, and there have been many studies on it. Recently, with the development of quantum compu-ting, it brings new opportunities. In this paper, we studied a class of quantum neural network constructed by quantum gate. In this model, we mapped the feature data to a quantum state in Hilbert space firstly, and then implement unitary evolution on it, in the end, we can get the classification result by im-plement measurement on the quantum state. Since all the operations in quan-tum neural networks are unitary, the parameters constitute a hypersphere of Hilbert space. Compared with traditional neural network, the parameter space is flatter. Therefore, it is not easy to fall into local optimum, which means the quantum neural networks have better generalization. In order to validate our proposal, we evaluated our model on three public datasets, the results demonstrated that our model has better generalization than the classical neu-ral network with the same structure.

翻訳日:2024-06-02 14:47:20 公開日:2024-05-29

# ChexNetで事前訓練したResNet50を用いたコロナ病に対応するX線画像の転写学習アプローチ

Transfer learning approach to Classify the X-ray image that corresponds to corona disease Using ResNet50 pretrained by ChexNet ( http://arxiv.org/abs/2105.08382v2 )

ライセンス: Link先を確認

Mahyar Bolhassani,

(参考訳) コロナウイルスは世界中の人々に悪影響を及ぼした。コビッド19ウイルスと、肺炎やインフルエンザなどの他の呼吸器疾患との間には共通の症状がある。そのため, 早期診断は, 患者を救うだけでなく, 感染拡大を防ぐためにも重要である。最も信頼性の高い診断方法の1つは、肺のX線画像によるものである。深層学習アプローチの助けを借りて,感染した肺の状態を知るための深層モデルを教えることができる。したがって、新しいサンプルをCovid19感染患者であるかどうかの分類が可能である。このプロジェクトでは、ImageNetデータセットとCheXNetデータセットによって事前訓練されたResNet50に基づいて、ディープモデルをトレーニングする。 Kaggle氏が導入した不均衡なCoronaHack Chest X-Rayデータセットに基づいて、バイナリとマルチクラスの分類を適用した。また,Focal lossとCross Entropy lossの比較を行った。

Coronavirus adversely has affected people worldwide. There are common symptoms between the Covid19 virus disease and other respiratory diseases like pneumonia or Influenza. Therefore, diagnosing it fast is crucial not only to save patients but also to prevent it from spreading. One of the most reliant methods of diagnosis is through X-ray images of a lung. With the help of deep learning approaches, we can teach the deep model to learn the condition of an affected lung. Therefore, it can classify the new sample as if it is a Covid19 infected patient or not. In this project, we train a deep model based on ResNet50 pretrained by ImageNet dataset and CheXNet dataset. Based on the imbalanced CoronaHack Chest X-Ray dataset introducing by Kaggle we applied both binary and multi-class classification. Also, we compare the results when using Focal loss and Cross entropy loss.

翻訳日:2024-06-01 00:29:19 公開日:2024-05-29

# マニピュレーションとピアメカニズム:サーベイ

Manipulation and Peer Mechanisms: A Survey ( http://arxiv.org/abs/2210.01984v3 )

ライセンス: Link先を確認

Matthew Olckers, Toby Walsh,

(参考訳) ピアメカニズムでは、賞の競争相手も誰が勝つかを決定する。各競技者には、賞のランク、成績、候補者の指名を依頼することができる。この賞は、金融援助、コースグレード、会議での賞などの価値があり得るため、競技者は、その仕組みを操作する誘惑を受けることができる。ピアメカニズムの操作を防止または回避するためのアプローチを調査する。我々はいくつかの重要な研究課題を特定して調査を締めくくった。

In peer mechanisms, the competitors for a prize also determine who wins. Each competitor may be asked to rank, grade, or nominate peers for the prize. Since the prize can be valuable, such as financial aid, course grades, or an award at a conference, competitors may be tempted to manipulate the mechanism. We survey approaches to prevent or discourage the manipulation of peer mechanisms. We conclude our survey by identifying several important research challenges.

翻訳日:2024-06-01 00:22:17 公開日:2024-05-29

# LoopDraw: 形状合成と編集のためのループベース自己回帰モデル

LoopDraw: a Loop-Based Autoregressive Model for Shape Synthesis and Editing ( http://arxiv.org/abs/2212.04981v2 )

ライセンス: Link先を確認

Nam Anh Dinh, Haochen Wang, Greg Shakhnarovich, Rana Hanocka,

(参考訳) 幾何学の普遍的な3次元表現は存在しないが、点雲、メッシュ、暗黙の関数、ボクセルなど多くの代替品がある。本研究では, 断面閉ループの列を用いて, 形状を表現するための新しい, 説得力のある代替手段を提案する。すべての平面にまたがるループは、自己回帰的な形状の合成と編集に活用する組織階層を形成します。ループは基底形状の非局所的な記述であり、単純なループ操作(シフトなど)は幾何学に大きな構造変化をもたらす。これは、点雲の点や三角形メッシュの三角形のような局所原始的な操作とは対照的である。さらに、ループは直感的かつ自然なプリミティブであり、コンピュータとユーザの両方の形状を解析し、編集するものであることを実証する。

There is no settled universal 3D representation for geometry with many alternatives such as point clouds, meshes, implicit functions, and voxels to name a few. In this work, we present a new, compelling alternative for representing shapes using a sequence of cross-sectional closed loops. The loops across all planes form an organizational hierarchy which we leverage for autoregressive shape synthesis and editing. Loops are a non-local description of the underlying shape, as simple loop manipulations (such as shifts) result in significant structural changes to the geometry. This is in contrast to manipulating local primitives such as points in a point cloud or a triangle in a triangle mesh. We further demonstrate that loops are intuitive and natural primitive for analyzing and editing shapes, both computationally and for users.

翻訳日:2024-06-01 00:12:24 公開日:2024-05-29

# 最適2倍ロバスト推定のためのニュアンス関数チューニング

Nuisance Function Tuning for Optimal Doubly Robust Estimation ( http://arxiv.org/abs/2212.14857v2 )

ライセンス: Link先を確認

Sean McGrath, Rajarshi Mukherjee,

(参考訳) 二重頑健な汎函数の推定子は、平均処理効果汎関数に対する確率スコアと条件結果平均のような2つの複素ニュアンス関数を推定することに依存する。因果推論と条件付き独立性試験の文献にまたがる応用を目撃した二重頑健な非パラメトリック関数に対して、ニュアンス関数を最適収束率で推定する方法の問題点を考察する。いくつかのプラグイン型推定器とワンステップ型推定器に対して、ニュアンス関数推定器の異なるチューニングパラメータ選択と興味関数の最適推定率に関するサンプル分割戦略の相互作用を述べる。これらの各推定器および各サンプル分割戦略について、興味の関数に対する最適収束率を得るために、低規則性条件下でニュアンス関数推定器をアンダースムースすることの必要性を示す。適切なニュアンス関数チューニングとサンプル分割戦略により、これらの推定器のいくつかは、ニュアンス関数のすべてのH\"古い滑らか度クラスにおいて収束の最小値を達成することができることを示す。

Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in type estimators and a one-step type estimator, we illustrate the interplay between different tuning parameter choices for the nuisance function estimators and sample splitting strategies on the optimal rate of estimating the functional of interest. For each of these estimators and each sample splitting strategy, we show the necessity to undersmooth the nuisance function estimators under low regularity conditions to obtain optimal rates of convergence for the functional of interest. By performing suitable nuisance function tuning and sample splitting strategies, we show that some of these estimators can achieve minimax rates of convergence in all H\"older smoothness classes of the nuisance functions.

翻訳日:2024-06-01 00:12:24 公開日:2024-05-29

# 音源とターゲット埋め込みの混合による配電シフトへのわずかな適応

Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings ( http://arxiv.org/abs/2305.14521v3 )

ライセンス: Link先を確認

Yihao Xue, Ali Payani, Yu Yang, Baharan Mirzasoleiman,

(参考訳) トレーニング済みの機械学習モデルは、新しいターゲット環境にデプロイされた場合、分散シフトに適応する必要がある。対象分布からラベル付きデータを取得する場合、ターゲット分布からのサンプルを少数含む少数ショット適応が必須となる。そこで本研究では,MixProを提案する。 MixProはまず、トレーニング済みの大規模なデータと、ターゲットとする少数のデータとを混合(直線的に組み合わせ)することによって、比較的大きなデータセットを生成する。このプロセスは、小さなターゲットデータ中の特定のノイズを緩和しながら、ソースとターゲットの両方の重要な特徴を保存します。そして、混合埋め込み上に線形分類器を訓練し、小さなターゲットデータを過度に適合させることなく、モデルを目標分布に効果的に適応させる。理論的には、従来の方法よりもMixProの利点を実証する。実験の結果,MixPro がベースラインを最大 7 % 上回る性能を示し,対象とする例は 2-4 例に留まった。

Pretrained machine learning models need to be adapted to distribution shifts when deployed in new target environments. When obtaining labeled data from the target distribution is expensive, few-shot adaptation with only a few examples from the target distribution becomes essential. In this work, we propose MixPro, a lightweight and highly data-efficient approach for few-shot adaptation. MixPro first generates a relatively large dataset by mixing (linearly combining) pre-trained embeddings of large source data with those of the few target examples. This process preserves important features of both source and target distributions, while mitigating the specific noise in the small target data. Then, it trains a linear classifier on the mixed embeddings to effectively adapts the model to the target distribution without overfitting the small target data. Theoretically, we demonstrate the advantages of MixPro over previous methods. Our experiments, conducted across various model architectures on 8 datasets featuring different types of distribution shifts, reveal that MixPro can outperform baselines by up to 7\%, with only 2-4 target examples.

翻訳日:2024-06-01 00:12:24 公開日:2024-05-29

# SecureFalcon: LLMによるソフトウェア脆弱性の自動検出はまだ存在するか?

SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? ( http://arxiv.org/abs/2307.06616v2 )

ライセンス: Link先を確認

Mohamed Amine Ferrag, Ammar Battah, Norbert Tihanyi, Ridhi Jain, Diana Maimut, Fatima Alwahedi, Thierry Lestable, Narinderjit Singh Thandi, Abdechakour Mechri, Merouane Debbah, Lucas C. Cordeiro,

(参考訳) ソフトウェアの脆弱性は、クラッシュ、データ損失、セキュリティ侵害など、数多くの問題を引き起こす可能性がある。これらの問題は品質を著しく侵害し、ソフトウェアアプリケーションやシステムの市場採用に悪影響を及ぼす可能性がある。静的解析のような従来のバグ修正手法は、しばしば偽陽性を生成する。 FV(Formal Verification)の形式である境界モデルチェックは、静的アナライザよりも正確な結果を提供するが、かなりのリソースを必要とし、開発者の生産性を著しく損なう。機械学習(ML)は、FVメソッドに匹敵する精度を達成でき、ほぼリアルタイムで人気のあるインスタントコード補完フレームワークで使用できるか? 本稿では,Falcon-40Bモデルから派生した1億1100万のパラメータしか持たない,ソフトウェア脆弱性の分類に適した,革新的なモデルアーキテクチャであるSecureFalconを紹介する。最高のパフォーマンスを達成するため、FormAIデータセットとFalconVulnDBという2つのデータセットを使用してモデルをトレーニングしました。 FalconVulnDBは、最近のパブリックデータセット、すなわちSySeVRフレームワーク、Draper VDISC、Bigvul、Diversevul、SARD Juliet、ReVealデータセットの組み合わせである。これらのデータセットには、CWE-119、CWE-120、CWE-476、CWE-122、CWE-190、CWE-121、CWE-78、CWE-787、CWE-20、CWE-762など、最も危険なソフトウェア脆弱性が含まれている。 SecureFalconはバイナリ分類で94%の精度、マルチクラス化で最大92%、即時CPU推論時間を実現している。 BERT、RoBERTa、CodeBERT、および従来のMLアルゴリズムといった既存のモデルよりも優れており、ソフトウェアの脆弱性検出とインスタントコード補完フレームワークの境界を推し進めることを約束している。

Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks.

翻訳日:2024-06-01 00:02:40 公開日:2024-05-29

# 単体及び多体量子カオスにおける局所レベル間隔の統計

Statistics of local level spacings in single- and many-body quantum chaos ( http://arxiv.org/abs/2308.06766v2 )

ライセンス: Link先を確認

Peng Tian, Roman Riser, Eugene Kanzieper,

(参考訳) 局所的なレベルの間隔の概念を導入し、確率行列理論のアプローチでそれらの統計を研究する。無限次元のランダム行列の極限において、平均局所間隔の普遍列と、量子系の大域対称性とその内部-カオスまたは正則-力学を一意に識別するそれらの比を決定する。これらの発見は、単体および多体量子系を監視するための新しい枠組みを提供するもので、リーマンゼータ関数の零点、不合理な矩形ビリヤードのスペクトル、サハデフ・イェーキタエフ・ハミルトンの多体スペクトルの数値実験によって裏付けられている。

We introduce a notion of local level spacings and study their statistics within a random-matrix-theory approach. In the limit of infinite-dimensional random matrices, we determine universal sequences of mean local spacings and of their ratios which uniquely identify the global symmetries of a quantum system and its internal -- chaotic or regular -- dynamics. These findings, which offer a new framework to monitor single- and many-body quantum systems, are corroborated by numerical experiments performed for zeros of the Riemann zeta function, spectra of irrational rectangular billiards and many-body spectra of the Sachdev-Ye-Kitaev Hamiltonians.

翻訳日:2024-06-01 00:02:40 公開日:2024-05-29

# 高速かつレグレトな最適アーム同定法:基本極限と低複雑さアルゴリズム

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms ( http://arxiv.org/abs/2309.00591v3 )

ライセンス: Link先を確認

Qining Zhang, Lei Ying,

(参考訳) 本稿では,2つの目的を持つ確率的マルチアーマッド帯域(MAB)問題について考察する。一最適な腕の素早い識別及びコミットメント (ii)$T$連続ラウンドの一連の報酬最大化。それぞれの目的は、個別によく研究されているが、つまり、最高の腕の識別である。 (i)と後悔の最小化 (ii) 実用的重要性にもかかわらず, 両目的の同時実現は未解決の問題である。本稿では,これら2つの目的を達成することを目的とした,emph{Regret Optimal Best Arm Identification} (ROBAI)を紹介する。事前に決定された停止時間と適応的な停止時間の両方でROBAIを解くため、EOCPとその変種と呼ばれるアルゴリズムをそれぞれ提示する。これはガウスと一般の包帯において漸近的最適後悔を達成できるだけでなく、事前決定された停止時間を持つ$\mathcal{O}(\log T)$ラウンドと適応的な停止時間を持つ$\mathcal{O}(\log^2 T)$ラウンドにおいて最適なアームにコミットする。さらに、ROBAIのコミットメント時間(サンプル複雑性と同等)の低い境界を特徴付け、EOCPとその変種が予め決定された停止時間に最適なサンプルであり、適応的な停止時間にほぼ最適であることを示す。数値的な結果は、我々の理論解析を裏付け、古典的 UCB アルゴリズムによってもたらされる興味深い「過剰探索」現象を明らかにし、EOCP は UCB よりもはるかに早く探索を中止しているにもかかわらず、より少ない後悔、すなわち $\mathcal{O}(\log T)$ 対 $\mathcal{O}(T)$ である。

This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces \emph{Regret Optimal Best Arm Identification} (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present an algorithm called EOCP and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in $\mathcal{O}(\log T)$ rounds with pre-determined stopping time and $\mathcal{O}(\log^2 T)$ rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to the sample complexity) of ROBAI, showing that EOCP and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting "over-exploration" phenomenon carried by classic UCB algorithms, such that EOCP has smaller regret even though it stops exploration much earlier than UCB, i.e., $\mathcal{O}(\log T)$ versus $\mathcal{O}(T)$, which suggests over-exploration is unnecessary and potentially harmful to system performance.

翻訳日:2024-05-31 23:52:32 公開日:2024-05-29

# Hi Model, generating 'nice' without 'good' is not bad as generate 'rice'!

Hi Model, generating 'nice' instead of 'good' is not as bad as generating 'rice'! Towards Context and Semantic Infused Dialogue Generation Loss Function and Evaluation Metric ( http://arxiv.org/abs/2309.05804v2 )

ライセンス: Link先を確認

Abhisek Tiwari, Muhammed Sinan, Kaushik Roy, Amit Sheth, Sriparna Saha, Pushpak Bhattacharyya,

(参考訳) 過去20年間、対話モデリングは、単純なルールベースの応答からパーソナライズされ説得力のある応答生成へと、大きな進歩を遂げてきた。しかし,これらの進歩にもかかわらず,対話生成の目的関数と評価指標はいまだに停滞している。これらの語彙ベースのメトリクス、例えばクロスエントロピーとBLEUには2つの重要な制限がある。 (a)意味的考慮のない単語間マッチング:「ニセ」と「ライス」を「良い」に生成できないために同じクレジットを割り当てる (b) 生成された応答を評価するための欠落したコンテキスト属性: 生成された応答が進行中の対話コンテキストに関連しているとしても、コーパスで提供される金の発話にマッチしないよう罰せられることがある。本稿では,これらの制約を包括的に検討し,Semantic Infused Contextualized diaLogue (SemTextualLogue)ロス関数と呼ばれる新たな損失関数を提案する。また、文脈と意味的関連性の両方を取り入れて、Dialuationと呼ばれる評価指標を定式化する。課題指向シナリオとオープンドメインシナリオを含む2つの対話コーパス上で,事前学習モデルと事前学習モデルの両方を実験した。 SemTextualLoguelossで訓練した対話生成モデルは,従来のクロスエントロピー損失関数よりも優れた性能を示した。その結果、対話生成モデルの効果的な訓練は、意味論と文脈を取り入れることに大きく依存していることが判明した。このパターンは、従来のメトリクスと比較して、文脈と意味の両方の考慮が人間の評価と強く相関する、導入されたダイアリュージョンメトリックにも反映されている。

Over the past two decades, dialogue modeling has made significant strides, moving from simple rule-based responses to personalized and persuasive response generation. However, despite these advancements, the objective functions and evaluation metrics for dialogue generation have remained stagnant. These lexical-based metrics, e.g., cross-entropy and BLEU, have two key limitations: (a) word-to-word matching without semantic consideration: It assigns the same credit for failure to generate "nice" and "rice" for "good", (b) missing context attribute for evaluating the generated response: Even if a generated response is relevant to the ongoing dialogue context, it may still be penalized for not matching the gold utterance provided in the corpus. In this paper, we first investigate these limitations comprehensively and propose a new loss function called Semantic Infused Contextualized diaLogue (SemTextualLogue) loss function. We also formulate an evaluation metric called Dialuation, incorporating both context and semantic relevance. We experimented with both non-pretrained and pre-trained models on two dialogue corpora, encompassing task-oriented and open-domain scenarios. We found that the dialogue generation models trained with SemTextualLogueloss attained superior performance compared to the traditional cross-entropy loss function. The findings establish that the effective training of a dialogue generation model hinges significantly on incorporating semantics and context. This pattern is also mirrored in the introduced Dialuation metric, where the consideration of both context and semantics correlates more strongly with human evaluation compared to traditional metrics.

翻訳日:2024-05-31 23:52:32 公開日:2024-05-29

# 深層学習における不特定性回避のためのループポーラリティ解析

Loop Polarity Analysis to Avoid Underspecification in Deep Learning ( http://arxiv.org/abs/2309.10211v2 )

ライセンス: Link先を確認

Donald Martin, Jr., David Kinney,

(参考訳) ディープラーニングは、データの複雑なパターンを検出するための強力なテクニックセットである。しかし、そのプロセスの因果構造が過小評価されると、深層学習モデルは脆くなり、データ生成プロセスの分布の変化に対する堅牢性に欠ける。本稿では,データ生成プロセスの因果構造を特定するツールとしてループ極性解析を応用し,深層学習パイプラインにおけるシステム構造とシステム挙動の関係について,より堅牢な理解を符号化する。 SIRモデルに基づくシミュレートされた流行データを用いて、システムを構成する異なるフィードバックループの極性を測定することで、ニューラルネットワークのより堅牢な推論を実現し、ディープラーニングモデルのアウト・オブ・ディストリビューション性能を改善し、システム力学にインスパイアされたアプローチを機械学習開発パイプラインに注入する方法を実証する。

Deep learning is a powerful set of techniques for detecting complex patterns in data. However, when the causal structure of that process is underspecified, deep learning models can be brittle, lacking robustness to shifts in the distribution of the data-generating process. In this paper, we turn to loop polarity analysis as a tool for specifying the causal structure of a data-generating process, in order to encode a more robust understanding of the relationship between system structure and system behavior within the deep learning pipeline. We use simulated epidemic data based on an SIR model to demonstrate how measuring the polarity of the different feedback loops that compose a system can lead to more robust inferences on the part of neural networks, improving the out-of-distribution performance of a deep learning model and infusing a system-dynamics-inspired approach into the machine learning development pipeline.

翻訳日:2024-05-31 23:52:32 公開日:2024-05-29

# 非エルゴードカイラル量子力学のためのライダーベルクプラットフォーム

Rydberg platform for non-ergodic chiral quantum dynamics ( http://arxiv.org/abs/2309.12392v2 )

ライセンス: Link先を確認

Riccardo J. Valencia-Tortora, Nicola Pancotti, Michael Fleischhauer, Hannes Bernien, Jamir Marino,

(参考訳) 本稿では,右(または左)の原子が励起された場合にのみ,原子が状態を変化させることができる方向の反ブロッケード条件により,リドベルク原子のキラル相互作用を工学的に制御する機構を提案する。提案手法のスケーラビリティにより,一方向キャラクタを持つ運動的制約付きモデルの多体ダイナミクスを探索することができる。我々は、原子に作用する2つの駆動場の強度を単に調整することで、傷跡、閉じ込め、あるいは局在化を通じて非エルゴード的挙動を観察する。我々は、我々のメカニズムが古典的なノイズの存在下でどのように持続し、相互作用におけるキラリティの度合いを調整できるかを議論し、中性原子配列を用いた指向性、強い相関の量子力学のフロンティアに向けて展開する。

We propose a mechanism for engineering chiral interactions in Rydberg atoms via a directional antiblockade condition, where an atom can change its state only if an atom to its right (or left) is excited. The scalability of our scheme enables us to explore the many-body dynamics of kinetically constrained models with unidirectional character. We observe non-ergodic behavior via either scars, confinement, or localization, upon simply tuning the strength of two driving fields acting on the atoms. We discuss how our mechanism persists in the presence of classical noise and how the degree of chirality in the interactions can be tuned, opening towards the frontier of directional, strongly correlated, quantum mechanics using neutral atoms arrays.

翻訳日:2024-05-31 23:52:32 公開日:2024-05-29

# Think, Act, and Ask: オープンワールドの対話型パーソナライズされたロボットナビゲーション

Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation ( http://arxiv.org/abs/2310.07968v4 )

ライセンス: Link先を確認

Yinpei Dai, Run Peng, Sikai Li, Joyce Chai,

(参考訳) Zero-Shot Object Navigation (ZSON)は、エージェントが未知の環境でオープン語彙オブジェクトへナビゲートすることを可能にする。 ZSONの既存の研究は主に、汎用オブジェクトクラスを見つけるための個別の命令に従うことに焦点を当てており、自然言語の相互作用の利用や、ユーザ固有のオブジェクトを特定する複雑さを無視している。これらの制限に対処するために、ZIPON(Zero-shot Interactive Personalized Object Navigation)を導入する。 ZIPON を解決するために,Large Language Models (LLM) を用いた Open-woRld Interactive persOnalized Navigation (ORION) と呼ばれる新しいフレームワークを提案する。実験の結果,ユーザフィードバックを活用できる対話型エージェントの性能は著しく向上した。しかし,タスク完了とナビゲーションとインタラクションの効率のバランスが良好であることは,すべての方法において依然として困難である。さらに,多様なユーザフィードバックフォームがエージェントのパフォーマンスに与える影響について,さらなる知見を提供する。コードはhttps://github.com/sled-group/navchat.comで入手できる。

Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments. The existing works of ZSON mainly focus on following individual instructions to find generic object classes, neglecting the utilization of natural language interaction and the complexities of identifying user-specific objects. To address these limitations, we introduce Zero-shot Interactive Personalized Object Navigation (ZIPON), where robots need to navigate to personalized goal objects while engaging in conversations with users. To solve ZIPON, we propose a new framework termed Open-woRld Interactive persOnalized Navigation (ORION), which uses Large Language Models (LLMs) to make sequential decisions to manipulate different modules for perception, navigation and communication. Experimental results show that the performance of interactive agents that can leverage user feedback exhibits significant improvement. However, obtaining a good balance between task completion and the efficiency of navigation and interaction remains challenging for all methods. We further provide more findings on the impact of diverse user feedback forms on the agents' performance. Code is available at https://github.com/sled-group/navchat.

翻訳日:2024-05-31 23:52:32 公開日:2024-05-29

# 情報理論に基づく等価性の発見に向けて

Towards Information Theory-Based Discovery of Equivariances ( http://arxiv.org/abs/2310.16555v3 )

ライセンス: Link先を確認

Hippolyte Charvin, Nicola Catenacci Volpi, Daniel Polani,

(参考訳) 対称性の存在は、システムに厳密な制約のセットを課す。この制約された構造により、インテリジェントなエージェントがそのようなシステムと対話し、システムの対称性を内部化して情報処理によって学習と一般化の効率を大幅に改善することができる。並行して、複雑性に制約のある学習と行動の原則モデルが、情報理論の手法の利用を増大させる。ここでは、これら2つの視点を統合して、情報理論レンズがシステムの対称性の効果を「見る」ことができるかどうかを理解したい。そこで本研究では,学習と情報制約を考慮した適応行動に関する多くの原則研究において,生産的基盤として機能するインフォメーション・ボトルネック(Information Bottleneck)の新たな変種を提案する。離散的な場合と特定の技術的前提の下では)我々の手法は対称性と情報パロジニーのある種の双対性を定式化していることを示す。この情報理論的処理は、さらに「粗さ」が対応する最適圧縮によって保存される入力出力相互情報の量によって測定される「ソフト」同値の原理的概念を示唆する。この新たな概念は、有界有理性(bounded rationality)の分野と神経表現における対称性の研究の間の橋渡しを提供する。この枠組みは、(現実的かつソフトな)同値を自動的に発見することを可能にする。

The presence of symmetries imposes a stringent set of constraints on a system. This constrained structure allows intelligent agents interacting with such a system to drastically improve the efficiency of learning and generalization, through the internalisation of the system's symmetries into their information-processing. In parallel, principled models of complexity-constrained learning and behaviour make increasing use of information-theoretic methods. Here, we wish to marry these two perspectives and understand whether and in which form the information-theoretic lens can "see" the effect of symmetries of a system. For this purpose, we propose a novel variant of the Information Bottleneck principle, which has served as a productive basis for many principled studies of learning and information-constrained adaptive behaviour. We show (in the discrete case and under a specific technical assumption) that our approach formalises a certain duality between symmetry and information parsimony: namely, channel equivariances can be characterised by the optimal mutual information-preserving joint compression of the channel's input and output. This information-theoretic treatment furthermore suggests a principled notion of "soft" equivariance, whose "coarseness" is measured by the amount of input-output mutual information preserved by the corresponding optimal compression. This new notion offers a bridge between the field of bounded rationality and the study of symmetries in neural representations. The framework may also allow (exact and soft) equivariances to be automatically discovered.

翻訳日:2024-05-31 23:42:43 公開日:2024-05-29

# 量子電子回路の統一線形応答理論

Unified linear response theory of quantum electronic circuits ( http://arxiv.org/abs/2310.17399v2 )

ライセンス: Link先を確認

L. Peri, M. Benito, C. J. B. Ford, M. F. Gonzalez-Zalba,

(参考訳) 有限周波数での多レベル量子系の電気応答のモデル化は、典型的には2つの不完全パラダイムの文脈で行われてきた。 (i)任意の周波数で有効であるが、動的損失を無視する入出力理論及び (II)半古典理論は、よくダイナミックな散逸効果を捉えるが、低周波数でのみ正確である。ここでは、任意の周波数に有効な統一理論を開発し、緩和と強調によってもたらされる量子的振る舞いと非一意的効果の両方を捉える。この理論により、マルチレベルシステムは、エネルギー準位にのみ依存する共振性RCC回路の普遍的な小信号等価回路モデルによって記述できる。我々は,2重量子ドット電荷量子ビットとマヨラナ量子ビットに適用し,アディバティックから共鳴,コヒーレントから非コヒーレントまで連続的にシステムを記述する能力を示し,量子状態の読み出しを改善するための新しい現実的な実験を提案する。我々のモデルは、ハイブリッド量子古典回路の設計を容易にし、量子ビット制御と量子状態の読み出しをシミュレーションする。

Modelling the electrical response of multi-level quantum systems at finite frequency has been typically performed in the context of two incomplete paradigms: (i) input-output theory, which is valid at any frequency but neglects dynamic losses, and (ii) semiclassical theory, which captures well dynamic dissipation effects but is only accurate at low frequencies. Here, we develop a unifying theory, valid for arbitrary frequencies, that captures both the quantum behaviour and the non-unitary effects introduced by relaxation and dephasing. The theory allows a multi-level system to be described by a universal small-signal equivalent circuit model, a resonant RLC circuit, whose topology only depends on the number of energy levels. We apply our model to a double quantum-dot charge qubit and a Majorana qubit, showing the capability to continuously describe the systems from adiabatic to resonant and from coherent to incoherent, suggesting new and realistic experiments for improved quantum state readout. Our model will facilitate the design of hybrid quantum-classical circuits and the simulation of qubit control and quantum state readout.

翻訳日:2024-05-31 23:42:43 公開日:2024-05-29

# 量子状態の安定化器分解を最適化するための基底

Bases for optimising stabiliser decompositions of quantum states ( http://arxiv.org/abs/2311.17384v2 )

ライセンス: Link先を確認

Nadish de Silva, Ming Yin, Sergii Strelchuk,

(参考訳) スタビライザー状態は量子計算理論において中心的な役割を果たす。例えば、最も一般的な量子誤り訂正スキームの計算基底状態を符号化するのに使用される。任意量子状態は多くの安定化器分解(安定化器状態の重ね合わせとして表される方法)を許容する。安定化器分解の構造を理解することは、短期量子コンピュータの検証とシミュレーションに重要な応用をもたらす。我々は、$n$-qubit 安定化状態の線型依存のベクトル空間を導入し、研究する。これらの空間はベクトルを含む標準基底を持ち、その大きさは指数関数的に$n$で成長する。定数サイズ3の線形依存のエレガントな基底を構築する。我々のスパース基底は、まずすべての$n$-qubit安定化状態の辞書をコンパイルせずに計算できる。我々は既存の手法よりも多くの量子ビットの状態の安定化度を明示的に計算するためにそれらを利用する。最後に、魔法状態の安定化器ランクに関する理論的境界を改善するための将来の応用について述べる。

Stabiliser states play a central role in the theory of quantum computation. For example, they are used to encode computational basis states in the most common quantum error correction schemes. Arbitrary quantum states admit many stabiliser decompositions: ways of being expressed as a superposition of stabiliser states. Understanding the structure of stabiliser decompositions has significant applications in verifying and simulating near-term quantum computers. We introduce and study the vector space of linear dependencies of $n$-qubit stabiliser states. These spaces have canonical bases containing vectors whose size grows exponentially in $n$. We construct elegant bases of linear dependencies of constant size three. Critically, our sparse bases can be computed without first compiling a dictionary of all $n$-qubit stabiliser states. We utilise them to explicitly compute the stabiliser extent of states of more qubits than is feasible with existing techniques. Finally, we delineate future applications to improving theoretical bounds on the stabiliser rank of magic states.

翻訳日:2024-05-31 23:32:48 公開日:2024-05-29

# 古典的フレネット・サーレット装置から量子力学的進化の曲率とねじりまで : 第1報定常ハミルトニアス

From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part I. Stationary Hamiltonians ( http://arxiv.org/abs/2311.18458v3 )

ライセンス: Link先を確認

Paul M. Alsing, Carlo Cafaro,

(参考訳) 三次元ユークリッド空間における空間曲線のフレネット・セレット装置が曲線の局所幾何学を決定することが知られている。特に、Frenet-Serret 装置は曲線の曲率やねじれを含む重要な幾何学的不変量を特定する。また、量子情報科学において、物理系に関する量子情報を符号化する量子状態を巧みに操作する際には、低複雑性と高効率が重要な特徴であることが認識されている。本稿では,動的に変化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化について,幾何学的視点を提案する。具体的には、シュロディンガー方程式を指定した定常ハミルトニアンの下で一元的に進化する平行輸送された純粋量子状態によって追跡される射影ヒルベルト空間における量子軌道に対するフレネット・セレット装置の量子バージョンを提案する。提案した定数曲率係数は, 接ベクトルと状態ベクトルとの共変微分の大きさ2乗で与えられる。提案した定数ねじれ係数は、接ベクトルと状態ベクトルの両方に直交する接ベクトルの共変微分の射影の大きさの2乗で定義される。トーション係数は、量子曲線のねじれの便利な測度を提供する。注目すべきは、提案した曲率とねじり係数が文献に存在するものと一致するが、全く異なる方法で導入されることである。

It is known that the Frenet-Serret apparatus of a space curve in three-dimensional Euclidean space determines the local geometry of curves. In particular, the Frenet-Serret apparatus specifies important geometric invariants, including the curvature and the torsion of a curve. It is also acknowledged in quantum information science that low complexity and high efficiency are essential features to achieve when cleverly manipulating quantum states that encode quantum information about a physical system. In this paper, we propose a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by dynamically evolving state vectors. Specifically, we propose a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced by a parallel-transported pure quantum state evolving unitarily under a stationary Hamiltonian specifying the Schrodinger equation. Our proposed constant curvature coefficient is given by the magnitude squared of the covariant derivative of the tangent vector to the state vector and represents a useful measure of the bending of the quantum curve. Our proposed constant torsion coefficient, instead, is defined in terms of the magnitude squared of the projection of the covariant derivative of the tangent vector, orthogonal to both the tangent vector and the state vector. The torsion coefficient provides a convenient measure of the twisting of the quantum curve. Remarkably, we show that our proposed curvature and torsion coefficients coincide with those existing in the literature, although introduced in a completely different manner...

翻訳日:2024-05-31 23:32:48 公開日:2024-05-29

# 古典的フレネット・サーレット装置から量子力学的進化の曲率とねじりまで : 第2報非定常ハミルトニアス

From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part II. Nonstationary Hamiltonians ( http://arxiv.org/abs/2311.18463v3 )

ライセンス: Link先を確認

Paul M. Alsing, Carlo Cafaro,

(参考訳) 非定常ハミルトニアンの下で進化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化に関する幾何学的視点を示す。具体的には、定常ハミルトニアンに対する既存の幾何学的視点を頼りに、時変曲率とねじれ係数の両方が重要な役割を果たす時間依存量子力学シナリオへの我々の理論的構成の一般化について議論する。具体的には、シュロディンガーの進化方程式を指定した時間依存ハミルトニアンの下で、並列輸送された純粋量子状態によって一元的に進化したヒルベルト空間における量子軌道に対するフレネット・セレット装置の量子バージョンを示す。時変曲率係数は、接ベクトルの共変微分を状態ベクトルに二乗して指定し、量子曲線の曲げを測定する。時間変化のねじれ係数は、接ベクトルの共変微分の状態ベクトルへの射影の大きさの2乗、接ベクトルと状態ベクトルに直交し、さらに量子曲線のねじれを測定することによって与えられる。時間変化の設定は、統計的観点からよりリッチな構造を示す。例えば、時間に依存しない構成とは異なり、一般化された分散の概念は非定常ハミルトニアンの下で進化する量子状態によってトレースされた曲線のねじれの定義において非自明に現れる。我々の構成の意義を物理的に説明するために、正弦波振動時間依存電位によって指定された、正確に可溶な時間依存の2状態ラビ問題に適用する。

We present a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by state vectors evolving under nonstationary Hamiltonians. Specifically, relying on the existing geometric viewpoint for stationary Hamiltonians, we discuss the generalization of our theoretical construct to time-dependent quantum-mechanical scenarios where both time-varying curvature and torsion coefficients play a key role. Specifically, we present a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced out by a parallel-transported pure quantum state evolving unitarily under a time-dependent Hamiltonian specifying the Schrodinger evolution equation. The time-varying curvature coefficient is specified by the magnitude squared of the covariant derivative of the tangent vector to the state vector and measures the bending of the quantum curve. The time-varying torsion coefficient, instead, is given by the magnitude squared of the projection of the covariant derivative of the tangent vector to the state vector, orthogonal to the tangent vector and state vector and, in addition, measures the twisting of the quantum curve. We find that the time-varying setting exhibits a richer structure from a statistical standpoint. For instance, unlike the time-independent configuration, we find that the notion of generalized variance enters nontrivially in the definition of the torsion of a curve traced out by a quantum state evolving under a nonstationary Hamiltonian. To physically illustrate the significance of our construct, we apply it to an exactly soluble time-dependent two-state Rabi problem specified by a sinusoidal oscillating time-dependent potential...

翻訳日:2024-05-31 23:32:48 公開日:2024-05-29

# MonoNPHM:モノクロビデオからの動的頭部再構成

MonoNPHM: Dynamic Head Reconstruction from Monocular Videos ( http://arxiv.org/abs/2312.06740v2 )

ライセンス: Link先を確認

Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos, Martin Rünz, Lourdes Agapito, Matthias Nießner,

(参考訳) モノクラーRGBビデオからの動的3次元頭部再構成のためのモノクラーニューラルパラメトリックヘッドモデル(MonoNPHM)を提案する。そこで本研究では,ニューラルパラメトリックモデル上でのテクスチャ場をパラメータ化する潜在外観空間を提案する。我々は、RGBからの勾配が逆レンダリング中の潜時幾何学符号に効果的に影響を及ぼすような、下層の幾何学と相関する予測色値を制約する。表現空間の表現能力を高めるために,超次元で後方変形場を拡大し,位相的に困難な表現における色や幾何学的表現を改善する。先行学習としてMonoNPHMを用いて,符号付き距離場に基づくボリュームレンダリングを用いた3次元頭部再構成の課題にアプローチする。後ろ向きの変形場を数値的に反転させることで,我々の正準幾何学的表現と密接な結びつきを持つ顔アンカーポイントを用いたランドマークロスを組み込んだ。単眼RGBビデオからの動的顔再構成の課題を評価するために,カジュアルな条件下でのKinectシークエンスを20個記録する。 MonoNPHMはすべてのベースラインを大きなマージンで上回り、RGBトラッキングを通じて容易にアクセス可能なニューラルパラメトリック顔モデルに向けた重要なステップとなる。

We present Monocular Neural Parametric Head Models (MonoNPHM) for dynamic 3D head reconstructions from monocular RGB videos. To this end, we propose a latent appearance space that parameterizes a texture field on top of a neural parametric model. We constrain predicted color values to be correlated with the underlying geometry such that gradients from RGB effectively influence latent geometry codes during inverse rendering. To increase the representational capacity of our expression space, we augment our backward deformation field with hyper-dimensions, thus improving color and geometry representation in topologically challenging expressions. Using MonoNPHM as a learned prior, we approach the task of 3D head reconstruction using signed distance field based volumetric rendering. By numerically inverting our backward deformation field, we incorporated a landmark loss using facial anchor points that are closely tied to our canonical geometry representation. To evaluate the task of dynamic face reconstruction from monocular RGB videos we record 20 challenging Kinect sequences under casual conditions. MonoNPHM outperforms all baselines with a significant margin, and makes an important step towards easily accessible neural parametric face models through RGB tracking.

翻訳日:2024-05-31 23:32:48 公開日:2024-05-29

# 物質波の交叉ウィグナー関数によるグーイ位相と量子干渉

Gouy phase and quantum interference with cross-Wigner functions for matter-waves ( http://arxiv.org/abs/2401.00083v2 )

ライセンス: Link先を確認

Lucas S. Marinho, Pedro R. Dieguez, Carlos H. S. Vieira, Irismar G. da Paz,

(参考訳) グーイ相は、古典的な電磁波から物質波、量子光学まで、様々な波の現象を正確に記述するために不可欠である。本研究では, 相互ウィグナー変換に基づく位相空間法を用いて, 相関ガウス波パケットによって特徴付けられる物質波の進化における空間的および時間的干渉を解析する。まず、その自由進化を伴う初期関数の交叉ウィグナーと、二重スリット配置による進化について考察する。グローバルなグーイ位相を取得する波動関数と異なり、クロスウィグナーは進化時間が異なるため、グーイ位相差を取得する。以上の結果から,時間的干渉の正確な記述には時間的様相が重要であることが示唆された。さらに、物質波を用いた二重スリット実験において、空間強度干渉項からクロスウィグナーを再構成するためのウィグナー関数に基づく手法を提案する。

The Gouy phase is essential for accurately describing various wave phenomena, ranging from classical electromagnetic waves to matter waves and quantum optics. In this work, we employ phase-space methods based on the cross-Wigner transformation to analyze spatial and temporal interference in the evolution of matter waves characterized initially by a correlated Gaussian wave packet. First, we consider the cross-Wigner of the initial function with its free evolution, and second for the evolution through a double-slit arrangement. Different from the wave function which acquires a global Gouy phase, we find that the cross-Wigner acquires a Gouy phase difference due to different evolution times. The results suggest that temporal like-Gouy phases are important for an accurate description of temporal interference. Furthermore, we propose a technique based on the Wigner function to reconstruct the cross-Wigner from the spatial intensity interference term in a double-slit experiment with matter waves.

翻訳日:2024-05-31 23:23:04 公開日:2024-05-29

# GA-SmaAt-GNet: 極端沈殿用生成逆小注意GNet

GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting ( http://arxiv.org/abs/2401.09881v2 )

ライセンス: Link先を確認

Eloy Reulen, Siamak Mehrkanoon,

(参考訳) 近年、データ駆動モデリングのアプローチは、気象予報など様々な気象学の分野で大きな注目を集めている。しかし、これらの手法は極度の気象条件を扱う際にしばしば困難に直面する。そこで本研究では, 降水量抑制のためのGA-SmaAt-GNetモデルを提案する。このモデルは独自のSmaAt-GNetジェネレータを備えており、成功しているSmaAt-UNetアーキテクチャを拡張し、降水マスク(二値降水マップ)を統合して予測精度を高めることができる。さらに、GA-SmaAt-GNetはPix2Pixアーキテクチャにインスパイアされた注意増強された識別器を組み込んでいる。この革新的なフレームワークは、複数のデータソースを使用して、生成的な降水今ストリーミングする方法を舗装する。オランダの実際の降水データを用いて,SmaAt-GNetとGA-SmaAt-GNetの性能を評価し,他のモデルと比較して,全体的な性能および極端な降水イベントに対する顕著な改善を明らかにした。具体的には,夏と秋の降水強度が平均的にピークである場合,本アーキテクチャは主な性能向上を示す。さらに,GA-SmaAt-GNetモデルと降水データセットの不確実性解析を行い,その予測能力について考察する。最後に、Grad-CAMを用いてモデルの予測を視覚的に説明し、ネットワーク全体の入力アクティベーションの領域をハイライトするアクティベーションヒートマップを生成する。

In recent years, data-driven modeling approaches have gained significant attention across various meteorological applications, particularly in weather forecasting. However, these methods often face challenges in handling extreme weather conditions. In response, we present the GA-SmaAt-GNet model, a novel generative adversarial framework for extreme precipitation nowcasting. This model features a unique SmaAt-GNet generator, an extension of the successful SmaAt-UNet architecture, capable of integrating precipitation masks (binarized precipitation maps) to enhance predictive accuracy. Additionally, GA-SmaAt-GNet incorporates an attention-augmented discriminator inspired by the Pix2Pix architecture. This innovative framework paves the way for generative precipitation nowcasting using multiple data sources. We evaluate the performance of SmaAt-GNet and GA-SmaAt-GNet using real-life precipitation data from the Netherlands, revealing notable improvements in overall performance and for extreme precipitation events compared to other models. Specifically, our proposed architecture demonstrates its main performance gain in summer and autumn, when precipitation intensity is typically at its peak. Furthermore, we conduct uncertainty analysis on the GA-SmaAt-GNet model and the precipitation dataset, providing insights into its predictive capabilities. Finally, we employ Grad-CAM to offer visual explanations of our model's predictions, generating activation heatmaps that highlight areas of input activation throughout the network.

翻訳日:2024-05-31 23:23:04 公開日:2024-05-29

# ログアクセス不要のブラックボックス大言語モデル強化のためのスケッチガイド付き制約付き復号法

Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access ( http://arxiv.org/abs/2401.09967v2 )

ライセンス: Link先を確認

Saibo Geng, Berkay Döner, Chris Wendler, Martin Josifoski, Robert West,

(参考訳) 制約付き復号化(Constrained decoding)は、言語モデル出力の制約を強制するテクニックで、再訓練やアーキテクチャの変更なしにテキスト生成を制御する手段を提供する。しかしながら、そのアプリケーションは一般的に、ユーザーが次のトーケン分布(通常はソフトマックスロジットを介して)にアクセスできるモデルに限定されており、ブラックボックスの大規模言語モデル(LLM)で制限される。本稿では,ブラックボックスLLMのロジットにアクセスせずに動作するブラックボックスLLMの制約付き復号法であるスケッチ誘導制約復号法(SGCD)を提案する。 SGCDは、ローカルにホストされた補助モデルを使用して、制約のないブラックボックスLSMの出力を洗練し、この初期出力を「スケッチ」として効果的に処理し、さらなる実験を行う。このアプローチは、従来のロジットベースのテクニックを補完するものであり、完全なモデルの透明性が利用できない設定で制約付きデコードの適用を可能にする。本研究では,複雑なNLPタスクに対するブラックボックスLLMの有用性と柔軟性をいかに向上させるかを示す。

Constrained decoding, a technique for enforcing constraints on language model outputs, offers a way to control text generation without retraining or architectural modifications. Its application is, however, typically restricted to models that give users access to next-token distributions (usually via softmax logits), which poses a limitation with blackbox large language models (LLMs). This paper introduces sketch-guided constrained decoding (SGCD), a novel approach to constrained decoding for blackbox LLMs, which operates without access to the logits of the blackbox LLM. SGCD utilizes a locally hosted auxiliary model to refine the output of an unconstrained blackbox LLM, effectively treating this initial output as a "sketch" for further elaboration. This approach is complementary to traditional logit-based techniques and enables the application of constrained decoding in settings where full model transparency is unavailable. We demonstrate the efficacy of SGCD through experiments in closed information extraction and constituency parsing, showing how it enhances the utility and flexibility of blackbox LLMs for complex NLP tasks.

翻訳日:2024-05-31 23:23:04 公開日:2024-05-29

# Chem-FINESE:テキスト再構成によるファインショット要素抽出の検証

Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction ( http://arxiv.org/abs/2401.10189v4 )

ライセンス: Link先を確認

Qingyun Wang, Zixuan Zhang, Hongxiang Li, Xuan Liu, Jiawei Han, Huimin Zhao, Heng Ji,

(参考訳) 化学領域における微粒な数発の実体抽出は、2つの固有の課題に直面している。第一に、一般領域におけるエンティティ抽出タスクと比較して、化学論文からの文は、通常より多くのエンティティを含む。さらに、エンティティ抽出モデルは通常、長い尾型のエンティティを抽出することが困難である。本稿では,これら2つの課題に対処するために,シークエンス・ツー・シーケンス(seq2seq)をベースとした複数ショットのエンティティ抽出手法であるChem-FINESEを提案する。私たちのChem-FINESEは、入力文から名前付きエンティティを抽出するSeq2seqエンティティ抽出器と、抽出されたエンティティから元の入力文を再構成するSeq2seqセルフバリデーションモジュールの2つのコンポーネントを備えている。優れたエンティティ抽出システムがエンティティを忠実に抽出する必要があるという事実にインスパイアされた新しい自己検証モジュールは、エンティティ抽出結果を活用して元の入力文を再構築する。さらに,抽出過程における過剰コピーを減らすために,新たなコントラスト損失を設計する。最後に、ChemNERスキーマでドメインの専門家によって注釈付けされた、新しいきめ細かい化学エンティティ抽出データセットであるChemNER+をリリースする。 ChemNER+とCHEMETのデータセットによる数ショット設定の実験では、新たに提案したフレームワークは、それぞれ8.26%と6.84%の絶対F1スコアゲインに寄与している。

Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Moreover, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Our Chem-FINESE has two components: a seq2seq entity extractor to extract named entities from the input sentence and a seq2seq self-validation module to reconstruct the original input sentence from extracted entities. Inspired by the fact that a good entity extraction system needs to extract entities faithfully, our new self-validation module leverages entity extraction results to reconstruct the original input sentence. Besides, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset that is annotated by domain experts with the ChemNER schema. Experiments in few-shot settings with both ChemNER+ and CHEMET datasets show that our newly proposed framework has contributed up to 8.26% and 6.84% absolute F1-score gains respectively.

翻訳日:2024-05-31 23:23:04 公開日:2024-05-29

# ループ変換器を用いたグラフアルゴリズムのシミュレーション

Simulation of Graph Algorithms with Looped Transformers ( http://arxiv.org/abs/2402.01107v2 )

ライセンス: Link先を確認

Artur Back de Luca, Kimon Fountoulakis,

(参考訳) ニューラルネットワークを用いたグラフアルゴリズムの実行は、最近、有望な経験的進歩のために大きな関心を集めている。このことは、ニューラルネットワークが推論ステップをリレーショナルデータで再現する方法について、さらなる理解を動機付けている。本研究では,理論的な観点から,グラフ上のアルゴリズムをシミュレートするトランスフォーマーネットワークの能力について検討する。私たちが使用しているアーキテクチャは、グラフと相互作用する追加の注意頭を持つループ変換器です。我々は,このアーキテクチャがDijkstraの最短経路,Breadth- and Depth-First Search,Kosarajuの強結合成分,および複数のアルゴリズムを同時にシミュレーションできることを示す。ネットワーク内のパラメータ数は入力グラフのサイズによって増加しないため、ネットワークは上記のアルゴリズムを任意のグラフに対してシミュレートすることができる。この性質にもかかわらず、有限精度による解のシミュレーションには限界がある。最後に,付加的なアテンションヘッドを利用する場合のチューリング完全度を一定幅で示す。

The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.

翻訳日:2024-05-31 23:13:17 公開日:2024-05-29

# 固有スペクトルによるカーネルリッジレス回帰におけるオーバーフィッティングの特徴

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum ( http://arxiv.org/abs/2402.01297v3 )

ライセンス: Link先を確認

Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius,

(参考訳) 固定された入力次元に対するオーバーパラメータ化された状態において、カーネルリッジレス回帰(KRR)に対する既存の非漸近的テストエラー境界を強化するために、カーネル行列の条件数に対する新しい境界を導出する。多項式スペクトル減衰を持つ核に対しては、以前の研究から境界を回復し、指数減衰に対しては、我々の境界は非自明で新規である。私たちの貢献は2つあります。一ガウス以下の設計の前提の下で、過度に誘惑された過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過度な過大な過度の現象を厳格に証明し、文献の既存のギャップを埋めること。 (II) 従来のガウス設計の前提を用いてKRR一般化を近似することに対する懸念を提起し, この特徴の独立性が, 過度な適合性を保証する上で重要な役割を担っていることを確認した。

We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.

翻訳日:2024-05-31 23:13:17 公開日:2024-05-29

# S2malloc: 統計的に安全なアロケータ

S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More ( http://arxiv.org/abs/2402.01894v2 )

ライセンス: Link先を確認

Ruizhe Wang, Meng Xu, N. Asokan,

(参考訳) ヒープメモリへの攻撃、メモリオーバーフロー、ダブルおよび無効なフリー、UAF(Use-after-free)、および様々なヒープ・スプレー技術は増加を続けている。既存のエントロピーベースの安全なメモリアロケータは、これらの攻撃ベクトルのほとんど全てに対して統計的に防御する。彼らはUAF攻撃に対する防御を主張するが、その設計は(失敗に終わった)試みを検出するように調整されていない。このため、このエントロピーベースの保護に打ち勝つために、攻撃者はヒープスプレーの可能性を秘め、同じ攻撃を繰り返すだけで成功の可能性がさらに向上する。 S2mallocを導入し、他のセキュリティ保証を妥協したり、大幅な性能上のオーバーヘッドを発生させることなく、UAF-attempt検出を強化することを目的としている。これを実現するために、UAFの試みを検知する自由ブロックカナリア(FBC)、攻撃者が被害者のオブジェクトを正確に上書きするのを阻止するランダムインブロックオフセット(RIO)、攻撃者のアドレスに基づいてブロックサイズを推定するランダムバッグレイアウト(RBL)の3つの革新的な構成を用いる。私たちはそれを示します (a) RIOオフセットのオブジェクトサイズを25%保存することにより、攻撃者が同じポインタを再利用した場合は8バイトのカナリアが69%の保護率を提供し、攻撃者が64バイトのオブジェクトをターゲットとするUAF攻撃に対して、他の攻撃に対して同等またはそれ以上のセキュリティ保証を持たずに、96%の保護率を提供する。 (b) S2mallocは実用的であり、PARSECでの実行時のオーバーヘッドはわずか2.8%、SPECでは11.5%である。最先端のエントロピーベースのアロケータと比較して、S2mallocはさらなる性能オーバーヘッドを発生させることなくUAF保護を改善する。 UAFを緩和するアロケータと比較して、S2mallocは、オーバーヘッドを大幅に低減するために、保護の失敗の極小確率で取引する。

Attacks on heap memory, encompassing memory overflow, double and invalid free, use-after-free (UAF), and various heap spraying techniques are ever-increasing. Existing entropy-based secure memory allocators provide statistical defenses against virtually all of these attack vectors. Although they claim protections against UAF attacks, their designs are not tailored to detect (failed) attempts. Consequently, to beat this entropy-based protection, an attacker can simply launch the same attack repeatedly with the potential use of heap spraying to further improve their chance of success. We introduce S2malloc, aiming to enhance UAF-attempt detection without compromising other security guarantees or introducing significant performance overhead. To achieve this, we use three innovative constructs in secure allocator design: free block canaries (FBC) to detect UAF attempts, random in-block offset (RIO) to stop the attacker from accurately overwriting the victim object, and random bag layout (RBL) to impede attackers from estimating the block size based on its address. We show that (a) by reserving 25% of the object size for the RIO offset, an 8-byte canary offers a 69% protection rate if the attacker reuses the same pointer and 96% protection rate if the attacker does not, against UAF exploitation attempts targeting a 64 bytes object, with equal or higher security guarantees against all other attacks; and (b) S2malloc is practical, with only a 2.8% run-time overhead on PARSEC and an 11.5% overhead on SPEC. Compared to state-of-the-art entropy-based allocators, S2malloc improves UAF-protection without incurring additional performance overhead. Compared to UAF-mitigating allocators, S2malloc trades off a minuscule probability of failed protection for significantly lower overhead.

翻訳日:2024-05-31 23:13:17 公開日:2024-05-29

# ニューラルコントラクトのダイナミクスを学習する - 線形化の拡張とグローバルな保証

Learning Neural Contracting Dynamics: Extended Linearization and Global Guarantees ( http://arxiv.org/abs/2402.08090v3 )

ライセンス: Link先を確認

Sean Jaffe, Alexander Davydov, Deniz Lapsekili, Ambuj Singh, Francesco Bullo,

(参考訳) 学習力学系におけるグローバルな安定性と堅牢性を保証することは、不確実性に直面したシステムの健全性を保証するために不可欠である。拡張線形化契約力学(ELCD)は,グローバルな契約性を保証するニューラルネットワークベースの力学系である。 ELCDの鍵となる特徴は、非線形ベクトル場の拡張線型化のパラメトリゼーションである。最も基本的な形では、ELCDは保証される。 (i)グローバルに指数関数的に安定する (二)均衡縮小、及び (三)世界規模のメートル法に関する契約データ空間におけるより一般的なメトリクスに対する縮約を可能にするため、データ空間と潜在空間の間の微分同相を訓練し、潜在空間における縮約を強制し、データ空間における大域的縮約性を保証する。我々は,高次元LASA,マルチリンク振り子,ローゼンブロックデータセット上でのELCDの性能を示す。

Global stability and robustness guarantees in learned dynamical systems are essential to ensure well-behavedness of the systems in the face of uncertainty. We present Extended Linearized Contracting Dynamics (ELCD), the first neural network-based dynamical system with global contractivity guarantees in arbitrary metrics. The key feature of ELCD is a parametrization of the extended linearization of the nonlinear vector field. In its most basic form, ELCD is guaranteed to be (i) globally exponentially stable, (ii) equilibrium contracting, and (iii) globally contracting with respect to some metric. To allow for contraction with respect to more general metrics in the data space, we train diffeomorphisms between the data space and a latent space and enforce contractivity in the latent space, which ensures global contractivity in the data space. We demonstrate the performance of ELCD on the high dimensional LASA, multi-link pendulum, and Rosenbrock datasets.

翻訳日:2024-05-31 21:05:54 公開日:2024-05-29

# GradSafe: 安全臨界勾配解析によるLCMの脱獄プロンプト検出

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis ( http://arxiv.org/abs/2402.13494v2 )

ライセンス: Link先を確認

Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong,

(参考訳) 大規模言語モデル(LLM)は、脱獄プロンプトからの脅威に直面している。 jailbreakプロンプトを検出する既存の方法は、主にオンラインモデレーションAPIまたは微調整LDMである。しかし、これらの戦略は、広範囲でリソース集約的なデータ収集とトレーニングプロセスを必要とすることが多い。本研究では, LLMにおける安全クリティカルパラメータの勾配を精査し, 脱獄プロンプトを効果的に検出するGradSafeを提案する。 LLMのジェイルブレイクに対する損失の勾配は、コンプライアンス応答と組み合わせることで、特定の安全クリティカルパラメータに類似したパターンを示す。対照的に、安全なプロンプトは異なる勾配パターンをもたらす。この観測に基づいて、GradSafeは、(コンプライアンス対応を備えた)プロンプトから勾配を分析して、Jailbreakプロンプトを正確に検出する。 Llama Guardは、大規模なデータセットによる微調整によってジェイルブレイクのプロンプトを検出するが、Llama-2にさらなるトレーニングなしで適用されたGradSafeは、Llama Guardより優れていた。 ToxicChatとXSTestで評価したように、この優れたパフォーマンスはゼロショットとアダプションの両方のシナリオで一貫しています。ソースコードはhttps://github.com/xyq7/GradSafeで入手できる。

Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of safety-critical parameters in LLMs. Our method is grounded in a pivotal observation: the gradients of an LLM's loss for jailbreak prompts paired with compliance response exhibit similar patterns on certain safety-critical parameters. In contrast, safe prompts lead to different gradient patterns. Building on this observation, GradSafe analyzes the gradients from prompts (paired with compliance responses) to accurately detect jailbreak prompts. We show that GradSafe, applied to Llama-2 without further training, outperforms Llama Guard, despite its extensive finetuning with a large dataset, in detecting jailbreak prompts. This superior performance is consistent across both zero-shot and adaptation scenarios, as evidenced by our evaluations on ToxicChat and XSTest. The source code is available at https://github.com/xyq7/GradSafe.

翻訳日:2024-05-31 20:54:36 公開日:2024-05-29

# UNITS: 統合マルチタスク時系列モデル

UNITS: A Unified Multi-Task Time Series Model ( http://arxiv.org/abs/2403.00131v2 )

ライセンス: Link先を確認

Shanghua Gao, Teddy Koker, Owen Queen, Thomas Hartvigsen, Theodoros Tsiligkaridis, Marinka Zitnik,

(参考訳) 時系列モデルの進歩は、従来のディープラーニング手法から、事前訓練された基礎モデルへのシフトを促している。事前訓練されたトランスフォーマーと再プログラムされたテキストベースのLCMは、最先端の結果を報告するが、最高のパフォーマンスのアーキテクチャはタスクによって大きく異なり、しばしばモデルは時系列予測のみに焦点を当てるなど、限られた範囲を持つ。予測的および生成的時系列タスクを単一のフレームワークで統一するモデルは、達成が困難なままである。タスクトークン化を用いたマルチタスク時系列モデルUniTSを導入し,予測および生成タスクを単一モデル内で表現する。 UniTSは、ユニバーサル時系列表現を得るために設計された改良されたトランスフォーマーブロックを利用する。この設計は、多種多様な動的パターン、サンプリングレート、時間スケールを持つ、多種多様なマルチドメイン事前トレーニングデータセットから、多くの下流データセットへの転送可能性を誘導する。人間の活動センサー、医療、エンジニアリング、ファイナンスドメインにまたがる38のデータセットに対して、UniTSモデルは、12の予測モデル、20の分類モデル、18の異常検出モデル、および16の計算モデルに対して好意的に機能する。 UniTSは、新しいデータドメインやタスクを評価する際に、効果的な数ショットと迅速な学習機能を示す。従来のシングルタスク設定では、UniTSは強いタスク特化時系列モデルより優れている。ソースコードとデータセットはhttps://github.com/mims-harvard/UniTSで公開されている。

Advances in time series models are driving a shift from conventional deep learning methods to pre-trained foundational models. While pre-trained transformers and reprogrammed text-based LLMs report state-of-the-art results, the best-performing architectures vary significantly across tasks, and models often have limited scope, such as focusing only on time series forecasting. Models that unify predictive and generative time series tasks under a single framework remain challenging to achieve. We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. UniTS leverages a modified transformer block designed to obtain universal time series representations. This design induces transferability from a heterogeneous, multi-domain pre-training dataset-often with diverse dynamic patterns, sampling rates, and temporal scales-to many downstream datasets, which can also be diverse in task specifications and data domains. Across 38 datasets spanning human activity sensors, healthcare, engineering, and finance domains, UniTS model performs favorably against 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including repurposed text-based LLMs. UniTS demonstrates effective few-shot and prompt learning capabilities when evaluated on new data domains and tasks. In the conventional single-task setting, UniTS outperforms strong task-specialized time series models. The source code and datasets are available at https://github.com/mims-harvard/UniTS.

翻訳日:2024-05-31 20:54:36 公開日:2024-05-29

# IOI:非参照画像とビデオ品質メトリクスに対する可視的ワンイテレーション・アドバイザリアタック

IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics ( http://arxiv.org/abs/2403.05955v2 )

ライセンス: Link先を確認

Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin,

(参考訳) 非参照画像とビデオ品質のメトリクスは、ビデオ処理ベンチマークで広く使われている。ビデオアタックによる学習ベースのメトリクスの堅牢性は、広く研究されていない。成功したことに加えて、ビデオ処理ベンチマークで使用できる攻撃は、高速で受け入れがたいものである必要がある。 Invisible One-Iteration (IOI) は参照画像やビデオ品質の指標に反する攻撃である。対象および主観的テストにより,画像とビデオのデータセットを用いた8つの先行手法との比較を行った。本手法は,攻撃性能と速度を同等に保ちながら,攻撃された各種メトリックアーキテクチャの視覚的品質に優れていた。私たちはGitHubでコードを公開しました。

No-reference image- and video-quality metrics are widely used in video processing benchmarks. The robustness of learning-based metrics under video attacks has not been widely studied. In addition to having success, attacks that can be employed in video processing benchmarks must be fast and imperceptible. This paper introduces an Invisible One-Iteration (IOI) adversarial attack on no reference image and video quality metrics. We compared our method alongside eight prior approaches using image and video datasets via objective and subjective tests. Our method exhibited superior visual quality across various attacked metric architectures while maintaining comparable attack success and speed. We made the code available on GitHub: https://github.com/katiashh/ioi-attack.

翻訳日:2024-05-31 20:44:52 公開日:2024-05-29

# 分子を解釈可能な文法のランダムウォークとして表現する

Representing Molecules as Random Walks Over Interpretable Grammars ( http://arxiv.org/abs/2403.08147v2 )

ライセンス: Link先を確認

Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik,

(参考訳) 分子発見の最近の研究は、主に小さな薬物のような分子に焦点が当てられ、同様に材料設計において適切な技術を持たない多くの重要な応用が残されている。これらの応用は、既知のサブ構造を用いて慎重に設計されるサンプルが少なく、より複雑な分子構造に依存していることが多い。本稿では,設計基盤となるモチーフを特徴とする階層設計空間を明示的に記述したグラフ文法を用いて,そのような分子を表現・推論するためのデータ効率・解釈可能なモデルを提案する。本稿では,分子生成と特性予測の両方を容易にする設計空間上のランダムウォークという,新しい表現を提案する。本研究では, 予測分子の性能, 効率, 合成可能性の観点から, 既存の手法に対する明確な優位性を実証し, 提案手法の化学的解釈可能性に関する詳細な知見を提供する。

Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representing and reasoning over such molecules in terms of graph grammars that explicitly describe the hierarchical design space featuring motifs to be the design basis. We present a novel representation in the form of random walks over the design space, which facilitates both molecule generation and property prediction. We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability.

翻訳日:2024-05-31 20:44:52 公開日:2024-05-29

# メンタルヘルスのための大規模言語モデル:システムレビュー

Large Language Model for Mental Health: A Systematic Review ( http://arxiv.org/abs/2403.15401v2 )

ライセンス: Link先を確認

Zhijun Guo, Alvina Lai, Johan Hilge Thygesen, Joseph Farrington, Thomas Keen, Kezhi Li,

(参考訳) 大規模言語モデル(LLM)は、デジタルヘルスの潜在的な応用に対して大きな注目を集めている一方、メンタルヘルスへの応用は、現在進行中の議論の対象となっている。本研究は, 早期スクリーニング, デジタル介入, 臨床応用の強さと限界に着目し, 精神保健におけるLSMの使用状況を評価することを目的とする。 PRISMAガイドラインに従って, PubMed, IEEE Xplore, Scopus, JMIRをキーワードとして検索した。非英語記事を除く2017年1月1日から2023年12月31日までの記事を掲載した。 30項目が評価され, テキストによる精神疾患と自殺的思考検出(n=12), メンタルヘルス会話エージェント(n=5), その他のメンタルヘルスにおけるLSMの応用と評価(n=13。 LLMは、メンタルヘルスの問題を検知し、アクセス可能で非スティグマタイズされたeヘルスサービスを提供する上で、かなりの効果を発揮する。しかし、現在の臨床使用に伴うリスクは、彼らの利益を上回る可能性がある。この研究は、専門家によって注釈付けされた多言語データセットの欠如、生成されたコンテンツの正確性と信頼性に関する懸念、LCMの「ブラックボックス」の性質による解釈可能性の課題、永続的な倫理的ジレンマなど、いくつかの重要な問題を明らかにしている。これには、明確な倫理的枠組みの欠如、データのプライバシーへの懸念、セラピストと患者の双方によるLSMへの過度な信頼の可能性が含まれており、従来の医療行為を損なう可能性がある。これらの問題にもかかわらず、LSMの急速な開発は、新たな臨床支援としての可能性を強調し、この分野における継続的な研究と開発の必要性を強調している。

Large language models (LLMs) have attracted significant attention for potential applications in digital health, while their application in mental health is subject to ongoing debate. This systematic review aims to evaluate the usage of LLMs in mental health, focusing on their strengths and limitations in early screening, digital interventions, and clinical applications. Adhering to PRISMA guidelines, we searched PubMed, IEEE Xplore, Scopus, and the JMIR using keywords: 'mental health OR mental illness OR mental disorder OR psychiatry' AND 'large language models'. We included articles published between January 1, 2017, and December 31, 2023, excluding non-English articles. 30 articles were evaluated, which included research on mental illness and suicidal ideation detection through text (n=12), usage of LLMs for mental health conversational agents (CAs) (n=5), and other applications and evaluations of LLMs in mental health (n=13). LLMs exhibit substantial effectiveness in detecting mental health issues and providing accessible, de-stigmatized eHealth services. However, the current risks associated with the clinical use might surpass their benefits. The study identifies several significant issues: the lack of multilingual datasets annotated by experts, concerns about the accuracy and reliability of the content generated, challenges in interpretability due to the 'black box' nature of LLMs, and persistent ethical dilemmas. These include the lack of a clear ethical framework, concerns about data privacy, and the potential for over-reliance on LLMs by both therapists and patients, which could compromise traditional medical practice. Despite these issues, the rapid development of LLMs underscores their potential as new clinical aids, emphasizing the need for continued research and development in this area.

翻訳日:2024-05-31 20:35:08 公開日:2024-05-29

# 巨大な言語モデルでさえ、間違った理由を正すのが難しい

Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons ( http://arxiv.org/abs/2403.17760v2 )

ライセンス: Link先を確認

Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, Lori Levin,

(参考訳) 本稿では,NLPの観点から,トークンの区別のみに基づいて包括性を識別するモデルを最小化し,GPT-4とLlama 2が強いバイアスで失敗する可能性を示す,大きな語彙重なりを持つNLIのための小さな挑戦データセットを提案する。そして、この失敗を説明するために、さらに挑戦的なサブタスクを作成します。計算言語学の観点から、曲面特徴によって区別できない3種類の形容詞を持つ構成群を同定する。これにより, LLM のこれらの構造に対する理解を様々な方法で探究することが可能となり, 両者の区別に様々な方法で失敗し, それらの意味を適切に表現したり, 語彙的特徴を捉えたりすることができないことが示唆された。

In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort to explain this failure. From a Computational Linguistics perspective, we identify a group of constructions with three classes of adjectives which cannot be distinguished by surface features. This enables us to probe for LLM's understanding of these constructions in various ways, and we find that they fail in a variety of ways to distinguish between them, suggesting that they don't adequately represent their meaning or capture the lexical properties of phrasal heads.

翻訳日:2024-05-31 20:35:08 公開日:2024-05-29

# Serpent: マルチスケール構造化状態空間モデルによるスケーラブルで効率的な画像復元

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models ( http://arxiv.org/abs/2403.17902v2 )

ライセンス: Link先を確認

Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi,

(参考訳) 効率的な画像復元アーキテクチャの計算構築ブロックのランドスケープは、畳み込み処理と様々な注意機構の組み合わせによって支配されている。しかし、畳み込みフィルタは効率的ではあるが本質的に局所的であるため、画像内の長距離依存関係のモデリングに苦慮している。対照的に、注意は任意の画像領域間のグローバルな相互作用を捉えるのに優れるが、画像次元の二次的なコストに悩まされる。本研究では,最近の状態空間モデル(SSM)とマルチスケール信号処理を組み合わせた高解像度画像復元のための効率的なアーキテクチャであるSerpentを提案する。もともとシーケンスモデリングのために導入されたSSMは、入力サイズが好適な線形スケーリングで、グローバルな受容場を維持することができる。本稿では,従来の信号処理原理に着想を得た新しい階層型アーキテクチャを提案し,入力画像をシーケンスの集合に変換し,マルチスケールで処理する。実験結果から,Serpentはコンピュート・オブ・ザ・アーティファクト(FLOPSの最大150ドル分の削減)と最大5ドル分のGPUメモリを必要とすると同時に,コンピュート・オブ・ザ・アーティファクトに匹敵する再現性を実現することができることを示した。 Serpentによって達成された効率向上は、特に高解像度で顕著である。

The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques, while requiring orders of magnitude less compute (up to $150$ fold reduction in FLOPS) and a factor of up to $5\times$ less GPU memory while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions.

翻訳日:2024-05-31 20:35:08 公開日:2024-05-29

# DeFT:効率的な木構造LPM推論のためのFlashツリーアテンションによるデコーディング

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ( http://arxiv.org/abs/2404.00242v2 )

ライセンス: Link先を確認

Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin,

(参考訳) LLMとのツリー構造相互作用の需要が高まる中、木構造推論に適したIO対応木注目アルゴリズムであるDeFT(Decoding with Flash Tree-Attention)を導入する。従来のシーケンスベースのデコーディングとは異なり、ツリー構造化デコーディングは、自己整合性、少数ショットプロンプト、マルチステップ推論、マルチモデル/ヘッド調整など、現代的なタスク要件に適合する。しかし、既存のシーケンスベースの推論システムは木構造デコードには適していないため、計算、メモリフットプリント、メモリアクセスの冗長性が低下し、推論効率が低下する。この課題に対処するために、DeFTは、メモリ効率の低いメモリフットプリントによるメモリ効率の低い注意計算を、(1)QKVの作成:木分割によるKV誘導グループ化戦略を提案し、GPUグローバルメモリとオンチップ共有メモリ間のKVキャッシュのメモリ読み込み/書き込みを最小化しながら、GPUリソース利用を最適化する;(2)注意計算:各QKVグループの部分的注意を融合カーネルで計算し、ツリートポロジーを意識したグローバルリコメンデーション戦略を用いて最終注目を得る。注意計算中に73-99%のKVキャッシュIOと100%のIOを減らし(例えば、Softmax)、DeFTは3つの実用的なツリーベースワークロードで2.52/3.82倍の高速化を実現している。

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference. Unlike traditional sequence-based decoding, tree-structured decoding better accommodates modern task requirements, including self-consistency, few-shot prompting, multi-step reasoning, and multi-model/head coordination. However, existing sequence-based inference systems are ill-suited for tree-structured decoding, resulting in redundancy in computation, memory footprints, and memory access, thereby undermining inference efficiency. To address this challenge, DeFT maintains memory-efficient attention calculation with low memory footprints through two key stages: (1) QKV Preparation: We propose a KV-Guided Grouping Strategy with Tree Split to intelligently group QKV, optimizing GPU resource utilization while minimizing memory reads/writes for KV cache between GPU global memory and on-chip shared memory; (2)Attention Calculation: We compute partial attention of each QKV group in a fused kernel and employ a Tree-topology-aware Global Reduction strategy to obtain final attention. By reducing 73-99% KV cache IO and nearly 100% IO for partial results during attention calculation (e.g., Softmax), DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads: namely, few-shot prompting, multi-step reasoning, and speculative decoding, over state-of-the-art attention algorithms.

翻訳日:2024-05-31 20:35:08 公開日:2024-05-29

# 音声基礎モデルの大規模評価

A Large-Scale Evaluation of Speech Foundation Models ( http://arxiv.org/abs/2404.09385v2 )

ライセンス: Link先を確認

Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee,

(参考訳) ファンデーションモデルパラダイムは、共有ファンデーションモデルを利用して、さまざまなタスクに対して最先端(SOTA)のパフォーマンスを実現し、下流固有のモデリングとデータアノテーションを最小限にする必要がある。このアプローチは自然言語処理(NLP)分野において極めて重要であることが証明されている。しかし、音声処理コミュニティには、このパラダイムを体系的に探求するための同様の設定が欠けている。本研究では,音声処理の汎用性能ベンチマーク (SUPERB) を構築し,このパラダイムの有効性について検討する。凍結基盤モデルを用いてSUPERBにおける音声処理タスクに対処する統合マルチタスクフレームワークを提案する。この結果とコミュニティの投稿とを組み合わせることで,基礎モデルパラダイムがスピーチに有望であること,マルチタスクフレームワークがシンプルかつ効果的であること,そして最も優れた基礎モデルが,ほとんどのSUPERBタスク間での競争的一般化性を示していること,などが確認できる。再現性と拡張性のために、決定論的ベンチマークを可能にし、オンラインのリーダーボードによる結果共有を可能にし、コミュニティ主導のベンチマークデータベースを通じてコラボレーションを促進し、新しい開発サイクルをサポートする長期的なプラットフォームを開発しました。最後に,SUPERBと音声基礎モデルの詳細な理解を目的とした一連の分析を行い,モデル内のタスク間の情報フロー,重み付きベンチマークプロトコルの正確性,ベンチマークの統計的意義と堅牢性などについて述べる。

The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.

翻訳日:2024-05-31 20:25:21 公開日:2024-05-29

# インシデント応答GPT:生成人工知能を用いた交通事故対応計画の作成

IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence ( http://arxiv.org/abs/2404.18550v2 )

ライセンス: Link先を確認

Artur Grigorev, Adriana-Simona Mihaita Khaled Saleh, Yuming Ou,

(参考訳) 道路事故による交通渋滞は、都市環境において大きな課題となり、汚染、経済的な損失、交通渋滞が増大する。これらのインシデントを効果的に管理することは、その悪影響を軽減するために不可欠であるが、都市交通システムの複雑さと潜在的なインシデントの多様性は、かなりの障害を表している。本稿では,迅速な情報提供,適応可能な交通事故対応計画を提供することで,交通管理当局を支援する革新的なソリューションであるインシデントレスGPTを紹介する。生成型AIプラットフォームをリアルタイムトラフィックインシデントレポートと運用ガイドラインに統合することにより,交通インシデントに対応する意思決定プロセスの合理化を目指す。この研究は、交通管理におけるAIの展開に関わる重要な課題に対処する。都市交通ネットワークの複雑さの克服、リアルタイムな意思決定能力の確保、地方法と規制の整合、AI駆動システムに対する公的な受け入れの確保などだ。事故報告のテキスト分析、交通シミュレーションによるAIレコメンデーションの検証、透明で検証されたAIシステムの実装の組み合わせを通じて、IncidenceResponseGPTは、トラフィックフローを最適化し、交通インシデントに直面した混雑を低減するための有望なアプローチを提供する。この作業は、交通管理当局、緊急対応チーム、自治体など、都市交通管理とインシデント管理のすべての統合的なステークホルダーにも及んでいる。本研究は,交通事故の迅速解決だけでなく,都市交通システムへの全体的な影響を最小限に抑える枠組みを開発することを目的としている。

Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents. The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.

翻訳日:2024-05-31 20:25:21 公開日:2024-05-29

# LLMの理解には統計的一般化以上のものが必要だ

Understanding LLMs Requires More Than Statistical Generalization ( http://arxiv.org/abs/2405.01964v2 )

ライセンス: Link先を確認

Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár,

(参考訳) この10年、ディープラーニング理論における花の咲く研究が「なぜディープラーニングは一般化するのか?」と答えようとしている。パースペクティブの強力なシフトは、補間系における過度にパラメトリケートされたモデルの研究という、この進歩を早めた。本稿では, LLMの望ましい性質のいくつかは, 良好な統計一般化の結果ではなく, 別々に理論的な説明を必要とするため, もう一つの視点シフトが原因であると主張する。我々の中心的な議論は、AR確率モデルは本質的には識別不可能である、という観察に依存している。我々は,(1)ゼロショット規則外挿の非識別性,(2)文脈内学習の近似的非識別性,(3)微視的学習の非識別性という3つのケーススタディを通じて,非識別性が実際的関連性を持つ理由を考察した。我々は, LLM関連一般化対策, 伝達可能性, 誘導バイアスに着目した有望な研究方向性を概観する。

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.

翻訳日:2024-05-31 20:25:21 公開日:2024-05-29

# 宇宙における機械学習: 搭載MLモデルの放射能に対するロバスト性の調査

Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation ( http://arxiv.org/abs/2405.02642v2 )

ライセンス: Link先を確認

Kevin Lange, Federico Fontana, Francesco Rossi, Mattia Varile, Giovanni Apruzzese,

(参考訳) 現代の宇宙船はますます機械学習(ML)に依存している。しかし、宇宙の物理機器は、放射線などの様々な自然の危険にさらされており、コンピュータ装置の正しい操作を阻害する可能性がある。自然に誘発される欠陥がML関連ハードウェアに損傷をもたらすことを示す証拠は数多くあるが、宇宙用途のMLモデルに対する放射の影響は十分に研究されていない。これは問題であり、これらの自然現象によってMLモデルがどのように影響を受けるかを理解していないため、放射耐性MLソフトウェアを開発する上で「どこから始めるか」は不確実である。 ML研究者として、私たちはこのジレンマに取り組みます。機械学習を専門とするスペースインダストリー実践者と組むことで,最先端技術に関するリフレクティブな分析を行う。本研究は, 宇宙船用MLモデルに対する自然災害の影響について, 先行研究が徹底的に検証しなかった事実を提示する。そして、"負の結果"を通して、いくつかの既存のオープンソース技術は、衛星におけるMLのいくつかの応用に対する放射の影響を研究するために、研究者によってはほとんど利用できないことを示す。建設的なステップとして、我々は現在のフレームワークを活用して、放射誘発断層に対するクラウド検出のための実用的なMLモデルのロバスト性を評価するための簡単な実験を行った。我々の評価は、すべての欠点が、いくつかの先行研究で主張されているような破壊的なものではないことを明らかにしている。私たちのリソースを一般公開することで、宇宙耐性MLモデルの開発を先導するために、研究者が宇宙船にアクセスできる足場を提供しています。

Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.

翻訳日:2024-05-31 20:15:18 公開日:2024-05-29

# 知識グラフに基づくニューラルシンボリックシステムの研究

Exploring knowledge graph-based neural-symbolic system from application perspective ( http://arxiv.org/abs/2405.03524v4 )

ライセンス: Link先を確認

Shenzhe Zhu, Shengxiang Sun,

(参考訳) 人工知能(AI)とディープニューラルネットワークの進歩は、視覚とテキスト処理に大きな進歩をもたらした。しかし、AIシステムにおける人間のような推論と解釈可能性を達成することは、依然として大きな課題である。ニューラルネットワークをシンボリックシステムと統合するNeural-Symbolicパラダイムは、より解釈可能なAIへの有望な経路を提供する。このパラダイムの中では、知識グラフ(KG)が重要であり、相互接続された実体や関係を通じて知識を表現する構造的かつ動的な方法を提供する。本稿では、KGに基づくニューラルシンボリック統合の最近の進歩について、ニューラルネットワークの論理的知識による推論と解釈可能性の向上(Symbol for Neural)、ニューラルネットワーク手法(Neural for Symbol)によるシンボリックシステムの完全性と正確性の改善(Neural for Symbol)、ハイブリッドニューラルシンボリック統合(Hybrid Neural-Symbolic Integration)におけるそれらの組み合わせ適用の促進という、3つのカテゴリの統合をサポートする方法について検討する。最新のトレンドを強調し、Neural-Symbolic AIにおける今後の研究方向を提案する。

Advancements in Artificial Intelligence (AI) and deep neural networks have driven significant progress in vision and text processing. However, achieving human-like reasoning and interpretability in AI systems remains a substantial challenge. The Neural-Symbolic paradigm, which integrates neural networks with symbolic systems, presents a promising pathway toward more interpretable AI. Within this paradigm, Knowledge Graphs (KG) are crucial, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, typically as triples (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, examining how it supports integration in three categories: enhancing the reasoning and interpretability of neural networks with symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes future research directions in Neural-Symbolic AI.

翻訳日:2024-05-31 20:15:18 公開日:2024-05-29

# 耳に耳を傾ける:雑音のある音声をターゲットに

Look Once to Hear: Target Speech Hearing with Noisy Examples ( http://arxiv.org/abs/2405.06289v3 )

ライセンス: Link先を確認

Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota,

(参考訳) 混み合った環境では、人間の脳はターゲット話者からのスピーチに集中することができる。本稿では,この能力を実現するための新しいインテリジェントな聴取システムを提案する。ナイーブなアプローチは、ターゲット話者を登録するためにクリーンな音声サンプルを必要とすることである。しかしこれは、クリーンな例を得ることは現実のシナリオでは困難であり、ユニークなユーザーインターフェイスの問題を生み出すため、聞き取り可能なアプリケーションドメインとうまく一致しない。本稿では,対象話者を数秒間観察して,目標話者の単一,短く,雑音の多いバイノーラルな例を捉える,最初の登録インタフェースを提案する。このノイズのある例は、干渉する話者や雑音の存在下での音声抽出の登録と後続の音声抽出に使用される。本システムでは,5秒未満の雑音の入出力音声を用いて7.01dBの信号品質向上を実現し,6.24msで8ミリ秒の音声チャンクを処理可能である。本研究は,屋内および屋外のマルチパス環境における実世界の静的・移動型話者への一般化を実証するものである。最後に、ノイズの多い例の登録インターフェースは、クリーンな例に比べてパフォーマンスの劣化を起こさないが、便利でユーザフレンドリーである。一歩後退して、人工知能による人間の聴覚知覚を高めるための重要な一歩を踏み出した。 https://github.com/vb000/LookOnceToHear.com/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/ s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s/s

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing to ignore all interfering speech and noise, but the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is however not well aligned with the hearable application domain since obtaining a clean example is challenging in real world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms of audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing the human auditory perception with artificial intelligence. We provide code and data at: https://github.com/vb000/LookOnceToHear.

翻訳日:2024-05-31 20:15:18 公開日:2024-05-29

# URDFormer: 実世界の画像から人工シミュレーション環境を構築するパイプライン

URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images ( http://arxiv.org/abs/2405.11656v2 )

ライセンス: Link先を確認

Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta,

(参考訳) 視覚的にも身体的にも現実的にもシミュレーションシーンを構築することは、ロボット工学からコンピュータビジョンまで、領域における実践的な関心の問題である。この問題は、大規模なデータハングリー学習手法が物理的な意思決定システムのための新たなトレーニングデータソースを求める研究者によってさらに重要になっている。しかし、シミュレーションモデルの構築は依然として手作業で行われていることが多い。グラフィックデザイナとシミュレーションエンジニアは、事前に定義された資産を使って、リアルな動的およびキネマティックな特性を持つリッチなシーンを構築する。これは、データ駆動型ロボット制御に必要な一般化特性を達成するために、少数のシーンにスケールする可能性があるが、我々は「自然」キネマティック構造と動的構造を完備した、多数の現実的なシーンを合成できるパイプラインが必要である。この問題に対処するため、我々は自然画像から構造を推論しシミュレーションシーンを生成するモデルを開発し、Webスケールのデータセットからスケーラブルなシーン生成を可能にした。これらのイメージ・トゥ・シミュレートモデルをトレーニングするために、現実的な画像から完全なシーンモデルへのマッピング、逆問題のモデル化を可能にするペア化トレーニングデータを生成するために、制御可能なテキスト・ツー・イメージ生成モデルをどのように利用できるかを示す。このパラダイムによって、セマンティックおよび物理リアリズムを用いたシミュレーションにおいて、大規模なシーンデータセットを構築することができることを示す。本稿では,実世界の画像から機械的・動的構造を表現したシミュレーションシーンを生成し,ロボット制御ポリシのトレーニングに使用する統合エンドツーエンドパイプラインを提案する。そして、オブジェクトの操作のようなタスクのために、現実世界にしっかりとデプロイします。そこで本研究は,シミュレーション環境を大規模に生成するためのパイプラインと,ロバストなロボット制御ポリシをトレーニングする統合システムの両方を提供する。

Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by hand. A graphic designer and a simulation engineer work with predefined assets to construct rich scenes with realistic dynamic and kinematic properties. While this may scale to small numbers of scenes, to achieve the generalization properties that are required for data-driven robotic control, we require a pipeline that is able to synthesize large numbers of realistic scenes, complete with 'natural' kinematic and dynamic structures. To attack this problem, we develop models for inferring structure and generating simulation scenes from natural images, allowing for scalable scene generation from web-scale datasets. To train these image-to-simulation models, we show how controllable text-to-image generative models can be used in generating paired training data that allows for modeling of the inverse problem, mapping from realistic images back to complete scene models. We show how this paradigm allows us to build large datasets of scenes in simulation with semantic and physical realism. We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images and use these for training robotic control policies. We then robustly deploy in the real world for tasks like articulated object manipulation. In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies in the resulting environments.

翻訳日:2024-05-31 20:15:18 公開日:2024-05-29

# 自己改善による大規模視覚言語モデルにおける視覚言語モダリティアライメントの強化

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement ( http://arxiv.org/abs/2405.15973v2 )

ライセンス: Link先を確認

Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao,

(参考訳) 大規模視覚言語モデル(LVLM)は、特定のデータセットに対する視覚指導による様々な視覚的質問応答および推論タスクにおいて印象的な結果を得た。しかし、視覚的モダリティと言語的モダリティの整合性を改善する余地は依然として大きい。このアライメントを強化するには、通常、その能力と品質に大きく依存する外部モデルやデータが必要である。本稿では,自己改善による視覚的・言語的モダリティの整合性を向上し,外部モデルやデータの必要性を解消するフレームワークであるSIMAを提案する。 SIMAは、既存のビジョンインストラクションチューニングデータセットからのプロンプトを活用して、自己生成応答を生成し、コンテキスト内自己批判機構を使用して、優先順位調整のためのレスポンスペアを選択する。重要なイノベーションは、コンテキスト内自己批判プロセス中に3つの視覚メトリクスを導入し、画像の理解を深める応答の選択においてLVLMを導くことである。 14の幻覚と総合的なベンチマークの実験を通して、SIMAは全てのベンチマークでモデル性能を向上するだけでなく、過去のアプローチよりも優れたモダリティアライメントを実現することを示した。

Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the needs for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which can guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.

翻訳日:2024-05-31 20:05:24 公開日:2024-05-29

# 医用テキストデータの要約におけるオープンソース言語モデルの比較分析

Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data ( http://arxiv.org/abs/2405.16295v3 )

ライセンス: Link先を確認

Yuhao Chen, Zhimu Wang, Bo Wen, Farhana Zulkernine,

(参考訳) 医療ノートや対話における構造化されていないテキストには、豊富な情報が含まれている。近年のLarge Language Models (LLMs) の進歩は、非構造化テキストデータに対する回答および要約タスクにおいて優れた性能を示し、従来のテキスト解析手法よりも優れている。しかし、医学図表のような分野固有のデータに対して、異なるLCMの性能を客観的に評価し報告する科学的研究は文献に欠けている。 GPT-4 をアセスメントとして,医療要約タスクにおける Llama2 や Mistral などのオープンソース LLM の性能評価手法を提案する。 LLMの定量的評価に対する革新的なアプローチは、品質管理を可能にし、特定のタスクに有効なLLMの選択を支援し、デジタルヘルスにおける知識発見を促進する。

Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.

翻訳日:2024-05-31 19:55:33 公開日:2024-05-29

# 重要なことを記憶する: マルチトラバースからの創発的シーン分解

Memorize What Matters: Emergent Scene Decomposition from Multitraverse ( http://arxiv.org/abs/2405.17187v2 )

ライセンス: Link先を確認

Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. Alvarez,

(参考訳) 人間は自然に永久的な要素の記憶を保持するが、短命の瞬間はしばしば記憶のひび割れを乗り越える。この選択的保持は、ロボット知覚、局所化、マッピングに不可欠である。ロボットにこの能力を付与するために,3次元ガウスマッピング(3DGM)を導入する。 3DGMは、同じ領域から複数のRGBビデオをガウスベースの環境マップに変換し、同時に2D短命なオブジェクトセグメンテーションを実行する。私たちのキーとなる観察は、オブジェクトが頻繁に変化する間、環境は横断的に一貫しているということです。これにより、環境オブジェクトの分解を実現するために、繰り返し発生するトラバーサルからの自己超越を活用できる。より具体的には、3DGMは、堅牢な微分可能なレンダリング問題としてマルチトラバース環境マッピングを定式化し、環境のピクセルとオブジェクトをそれぞれインレーヤとアウトレーヤとして扱う。頑健な特徴蒸留, 特徴残量マイニング, 頑健な最適化を用いて, 3DGMは人間の介入なしに2次元分割と3次元マッピングを共同で行う。 We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and Neural rendering。本手法の有効性と可能性を検証した。

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

翻訳日:2024-05-31 19:55:33 公開日:2024-05-29

# 乱流の大規模渦シミュレーションのためのデータ駆動クロージャモデルの誤差解析

A note on the error analysis of data-driven closure models for large eddy simulations of turbulence ( http://arxiv.org/abs/2405.17612v2 )

ライセンス: Link先を確認

Dibyajyoti Chakraborty, Shivam Barwey, Hong Zhang, Romit Maulik,

(参考訳) 本研究では,データ駆動型乱流閉鎖モデルを用いて,流れの軌跡予測における誤差伝搬の数学的定式化を行う。大渦シミュレーション予測の予測状態がサブサンプル直接数値シミュレーションの予測状態に近くなければならないという仮定の下で,データ駆動クロージャモデルを利用する場合の予測誤差の上限を求める。また、この誤差は、時間ステップサイズと、クロージャを用いて最初のワンステップエラーを増幅する役割を担っているヤコビアンに大きく影響されることも示している。また, この誤差は, 閉包定式化のジャコビアンの影響を受けやすいシステムヤコビアンの上界とロールアウト時間で指数関数的に伝播することを示した。これらの知見は、同定されたエラーバウンド項に基づくMLモデルの新たな正規化手法の開発を可能にし、その堅牢性を改善し、エラーの伝播を低減する。

In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.

翻訳日:2024-05-31 19:55:33 公開日:2024-05-29

# ハイブリッドな選好最適化:補助的目的による直接選好最適化の強化

Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives ( http://arxiv.org/abs/2405.17956v2 )

ライセンス: Link先を確認

Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu,

(参考訳) 大規模言語モデル(LLM)の整合性を確保するため、先行研究は人間フィードバック(RLHF)や直接選好最適化(DPO)による強化学習を活用している。 DPOは、最大推定に基づいてより単純なフレームワークを提供するが、LLM設計者の好みに応じて、言語モデルをチューニングし、非微分可能および非バイナリ目的を容易に最大化する能力に妥協する(例えば、より単純な言語を使用したり、特定の有害なコンテンツを最小化するなど)。これらは、ユーザの好みと一致せず、バイナリの好みデータによって引き付けられることもない。本稿では,DPOの簡易性と性能をRLの一般化性に活かすために,DPOとRLHFのハイブリッドアプローチを提案する。 DPOの暗黙的な報酬分解に対する単純な拡張により、LLM をチューニングすることで、オフライン RL を用いて任意の補助報酬の集合を最大化することができる。提案手法であるHybrid Preference Optimization (HPO) は, ユーザの嗜好と補助的設計目的の両方に効果的に一般化できると同時に, 様々な課題のあるベンチマークやモデルサイズでアライメント性能を保っていることを示す。

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to tune language models to easily maximize non-differentiable and non-binary objectives according to the LLM designer's preferences (e.g., using simpler language or minimizing specific kinds of harmful content). These may neither align with user preferences nor even be able to be captured tractably by binary preference data. To leverage the simplicity and performance of DPO with the generalizability of RL, we propose a hybrid approach between DPO and RLHF. With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards using offline RL. The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives, while preserving alignment performance across a range of challenging benchmarks and model sizes.

翻訳日:2024-05-31 19:45:41 公開日:2024-05-29

# 効果的な崩壊理論としての因果フェルミオン系

Causal Fermion Systems as an Effective Collapse Theory ( http://arxiv.org/abs/2405.19254v1 )

ライセンス: Link先を確認

Felix Finster, Johannes Kleiner, Claudio F. Paganini,

(参考訳) 非相対論的極限において、因果フェルミオン系は効果的な崩壊理論をもたらすことが示されている。 Schr\\odinger方程式に対する非線形および確率的補正項は因果作用原理から導かれる。統計作用素の力学は、Kossakowski-Lindblad形式の決定論的方程式によって記述される。さらに、量子状態はボルンの規則と互換性のある動的崩壊を起こす。有効モデルは連続自発局所化モデルと類似しているが、確率積分の保存法則と顕微鏡長スケール$\ell_{\min}$の時間的非局所性により異なる。

It is shown that, in the non-relativistic limit, causal fermion systems give rise to an effective collapse theory. The nonlinear and stochastic correction terms to the Schr\"odinger equation are derived from the causal action principle. The dynamics of the statistical operator is described by a deterministic equation of Kossakowski-Lindblad form. Moreover, the quantum state undergoes a dynamical collapse compatible with Born's rule. The effective model has similarities with the continuous spontaneous localization model, but differs from it by a conservation law for the probability integral as well as a non-locality in time on a microscopic length scale $\ell_{\min}$.

翻訳日:2024-05-31 19:45:41 公開日:2024-05-29

# O(\sqrt{T})= Regret を用いた線形二次レギュレータ学習のための近似トンプソンサンプリング

Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret ( http://arxiv.org/abs/2405.19380v1 )

ライセンス: Link先を確認

Yeoneung Kim, Gihun Kim, Insoon Yang,

(参考訳) 本稿では,線形二次レギュレータ(LQR)を改良したベイズ的残差値$O(\sqrt{T})$で学習する近似トンプソンサンプリングアルゴリズムを提案する。本手法では,Langevin の動的特性と簡単な励起機構を巧みに設計したプレコンディショナーを用いる。励振信号は、プレコンディショナーの最小固有値を時間とともに増加させ、近似した後方サンプリングプロセスを加速させることを示す。さらに,本アルゴリズムにより生成された近似後縁部の非自明な濃度特性を同定する。これらの性質により、システム状態のモーメントを束縛し、文献でよく使われるパラメータ集合に対する非現実的な制限的仮定を伴わずに$O(\sqrt{T})$ regret boundを達成できる。

We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Moreover, we identify nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without the unrealistic restrictive assumptions on parameter sets that are often used in the literature.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 反モニー洗浄のためのネットワーク分析 -系統的な文献レビューと実験的評価-

Network Analytics for Anti-Money Laundering -- A Systematic Literature Review and Experimental Evaluation ( http://arxiv.org/abs/2405.19383v1 )

ライセンス: Link先を確認

Bruno Deprez, Toon Vanderschueren, Wouter Verbeke, Bart Baesens, Tim Verdonck,

(参考訳) マネーロンダリングは、違法な活動の資金提供によって社会を負担する、広範囲にわたる課題を提示する。マネーロンダリングをより効果的に戦い、検出するために、ネットワーク情報の利用がますます検討され、マネーロンダリングには必ずしも相互接続されたパーティが伴うことを悪用している。これにより、反マネーロンダリング(AML)のためのネットワーク分析(NA)に関する文献が急増した。しかし、文献は断片化されており、既存の作品の包括的な概要が欠落している。これにより、適用可能なメソッドとその比較検出能力の限定的な理解がもたらされる。そこで本稿では,文献の大規模かつ体系的なレビューを行う。我々は、Web of ScienceとScopusデータベースの97の論文を特定し分析し、その結果、Bockel-Rickermannらの詐欺分析フレームワークによるアプローチの分類結果を得た。さらに,一様セットアップにおける顕著なNA手法の性能評価と比較を行うための総合的な実験フレームワークを提案する。このフレームワークは一般公開されているEllipticデータセットに適用され、手動機能エンジニアリング、ランダムウォークベースのメソッド、ディープラーニングGNNを実装している。ネットワーク分析により,グラフニューラルネットワークを用いたAMLモデルの予測能力が向上し,最良の結果が得られた。研究者や実践者がこれらの結果を拡張し、プロプライエタリなデータで実験できるように、実験フレームワークのオープンソース実装が提供されている。そこで我々は,AMLにおけるネットワーク分析の分析と評価に向けて,標準化されたアプローチを推進することを目的としている。

Money laundering presents a pervasive challenge, burdening society by financing illegal activities. To more effectively combat and detect money laundering, the use of network information is increasingly being explored, exploiting that money laundering necessarily involves interconnected parties. This has lead to a surge in literature on network analytics (NA) for anti-money laundering (AML). The literature, however, is fragmented and a comprehensive overview of existing work is missing. This results in limited understanding of the methods that may be applied and their comparative detection power. Therefore, this paper presents an extensive and systematic review of the literature. We identify and analyse 97 papers in the Web of Science and Scopus databases, resulting in a taxonomy of approaches following the fraud analytics framework of Bockel-Rickermann et al.. Moreover, this paper presents a comprehensive experimental framework to evaluate and compare the performance of prominent NA methods in a uniform setup. The framework is applied on the publicly available Elliptic data set and implements manual feature engineering, random walk-based methods, and deep learning GNNs. We conclude from the results that network analytics increases the predictive power of the AML model with graph neural networks giving the best results. An open source implementation of the experimental framework is provided to facilitate researchers and practitioners to extend upon these results and experiment on proprietary data. As such, we aim to promote a standardised approach towards the analysis and evaluation of network analytics for AML.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# VLEOシミュレーションのためのニューラルネットワーク:熱圏モデリングのためのサーモネットの導入

NeuralODEs for VLEO simulations: Introducing thermoNET for Thermosphere Modeling ( http://arxiv.org/abs/2405.19384v1 )

ライセンス: Link先を確認

Dario Izzo, Giacomo Acciarini, Francesco Biscani,

(参考訳) 本研究では,衛星軌道伝搬における熱圏密度を表現するために,微分可能な計算量を削減した新しいニューラルアーキテクチャ「サーモネット」を提案する。運動方程式の右側にニューラルネットワークが現れるため、結果として生じる衛星力学はニューラルノード(NeuralODE)によって制御され、その完全に微分可能な性質によって特徴付けられる。ネットワークパラメータの効率的なトレーニングは、2つの異なるアプローチによって行われる。最初のアプローチでは、ネットワークは宇宙船の動力学とは独立して訓練を行い、JB-08やNRLMSISE-00といった地上の真理モデルに対して純粋な回帰処理を行う。第2のパラダイムでは、ネットワークパラメータは観測されたダイナミクスに基づいて学習され、ODEの感度によって適応する。どちらの場合も、結果はフレキシブルでコンパクトな熱圏密度モデルであり、軌道予測の精度を維持しながら数値伝播効率を大幅に向上させる。

We introduce a novel neural architecture termed thermoNET, designed to represent thermospheric density in satellite orbital propagation using a reduced amount of differentiable computations. Due to the appearance of a neural network on the right-hand side of the equations of motion, the resulting satellite dynamics is governed by a NeuralODE, a neural Ordinary Differential Equation, characterized by its fully differentiable nature, allowing the derivation of variational equations (hence of the state transition matrix) and facilitating its use in connection to advanced numerical techniques such as Taylor-based numerical propagation and differential algebraic techniques. Efficient training of the network parameters occurs through two distinct approaches. In the first approach, the network undergoes training independently of spacecraft dynamics, engaging in a pure regression task against ground truth models, including JB-08 and NRLMSISE-00. In the second paradigm, network parameters are learned based on observed dynamics, adapting through ODE sensitivities. In both cases, the outcome is a flexible, compact model of the thermosphere density greatly enhancing numerical propagation efficiency while maintaining accuracy in the orbital predictions.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 10年ぶりのビデオ異常検出:調査と展望

Video Anomaly Detection in 10 Years: A Survey and Outlook ( http://arxiv.org/abs/2405.19387v1 )

ライセンス: Link先を確認

Moshira Abdalla, Sajid Javed, Muaz Al Radi, Anwaar Ulhaq, Naoufel Werghi,

(参考訳) ビデオ異常検出(VAD)は、監視、医療、環境監視といった様々な領域において非常に重要である。多くの調査では従来のVAD手法に重点を置いているが、特定のアプローチや新たなトレンドを探求する深みを欠いていることが多い。この調査では、従来の教師付きトレーニングパラダイムを超えて、弱教師付き、自己監督型、教師なしのアプローチを包含する、ディープラーニングベースのVADを調査している。このレビューの顕著な特徴は、大規模なデータセット、特徴抽出、学習方法、損失関数、正規化、異常スコア予測を含む、VADパラダイムの中核的な課題の調査である。さらに,視覚言語モデル(VLM)をVADの強力な特徴抽出器として検討した。 VLMは視覚データをビデオからテキスト記述や音声言語と統合し、異常検出に不可欠なシーンの微妙な理解を可能にする。これらの課題に対処し、今後の研究方向性を提案することにより、複雑な実世界のシナリオにおいて、VLMの能力を活用した堅牢で効率的なVADシステムの開発を促進することを目的としている。この包括的分析は、既存の知識ギャップを埋め、研究者に貴重な洞察を与え、VAD研究の将来形成に貢献しようとしている。

Video anomaly detection (VAD) holds immense importance across diverse domains such as surveillance, healthcare, and environmental monitoring. While numerous surveys focus on conventional VAD methods, they often lack depth in exploring specific approaches and emerging trends. This survey explores deep learning-based VAD, expanding beyond traditional supervised training paradigms to encompass emerging weakly supervised, self-supervised, and unsupervised approaches. A prominent feature of this review is the investigation of core challenges within the VAD paradigms including large-scale datasets, features extraction, learning methods, loss functions, regularization, and anomaly score prediction. Moreover, this review also investigates the vision language models (VLMs) as potent feature extractors for VAD. VLMs integrate visual data with textual descriptions or spoken language from videos, enabling a nuanced understanding of scenes crucial for anomaly detection. By addressing these challenges and proposing future research directions, this review aims to foster the development of robust and efficient VAD systems leveraging the capabilities of VLMs for enhanced anomaly detection in complex real-world scenarios. This comprehensive analysis seeks to bridge existing knowledge gaps, provide researchers with valuable insights, and contribute to shaping the future of VAD research.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 統一的変分法による2次元電子ガスの基底状態相

Ground state phases of the two-dimension electron gas with a unified variational approach ( http://arxiv.org/abs/2405.19397v1 )

ライセンス: Link先を確認

Conor Smith, Yixiao Chen, Ryan Levy, Yubo Yang, Miguel A. Morales, Shiwei Zhang,

(参考訳) 2次元電子ガス(2DEG)は基本的なモデルであり、近年の2次元物質の実験的および理論的研究の進展により関心が高まりつつある。 2DEGの基底状態の現在の理解は、異なる相に対する異なるアンサーゼの変分比較に基づいて、量子モンテカルロ計算に依存する。我々は, メッセージパス型ニューラル量子状態アーキテクチャを用いた一般的な逆流型波動関数である単一変分アンサッツを用いて, 密度範囲全体の統一的な記述を行う。変分最適化は、前回の最良の結果よりも低い基底状態エネルギーをもたらす。ウィグナー結晶(WC)相への遷移は、現在信じられているよりも低い密度の rs = 37 +/- 1 で自動的に起こる。液体とWC相の間、同じアンザッツと変分探索は、幅広い密度の中間状態の存在を強く示唆し、短距離ネマティックスピン相関が強化された。

The two-dimensional electron gas (2DEG) is a fundamental model, which is drawing increasing interest because of recent advances in experimental and theoretical studies of 2D materials. Current understanding of the ground state of the 2DEG relies on quantum Monte Carlo calculations, based on variational comparisons of different ansatze for different phases. We use a single variational ansatz, a general backflow-type wave function using a message-passing neural quantum state architecture, for a unified description across the entire density range. The variational optimization consistently leads to lower ground-state energies than previous best results. Transition into a Wigner crystal (WC) phase occurs automatically at rs = 37 +/- 1, a density lower than currently believed. Between the liquid and WC phases, the same ansatz and variational search strongly suggest the existence of intermediate states in a broad range of densities, with enhanced short-range nematic spin correlations.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 大N場理論のニューラルスケーリング法則:リッジレス限界を超えた解法モデル

Neural Scaling Laws From Large-N Field Theory: Solvable Model Beyond the Ridgeless Limit ( http://arxiv.org/abs/2405.19398v1 )

ライセンス: Link先を確認

Zhengkang Zhang,

(参考訳) ニューラルネットワークに基づく多くの機械学習モデルはスケーリング法則を示しており、その性能はモデルのサイズとトレーニングデータセットに関するパワー法則としてスケールする。我々は、Maloney, Roberts, Sully が最近提案したモデルにおいて、ニューラルスケーリング法則を研究するための簡易な設定を提供するために、大N場の理論法を用いる。本手法は, モデル動作の正則化に不可欠であるリッジパラメータの非ゼロ値に対して, 後者の論文の結果を拡張した。新たなより正確なスケーリング法則の獲得に加えて、モデルとトレーニングデータセットサイズの間の対称性を説明するダイアグラムレベルでの双対性変換も発見する。同じ双対性は、量子場理論をシミュレートするニューラルネットワークを設計する最近の試みの根底にある。

Many machine learning models based on neural networks exhibit scaling laws: their performance scales as power laws with respect to the sizes of the model and training data set. We use large-N field theory methods to solve a model recently proposed by Maloney, Roberts and Sully which provides a simplified setting to study neural scaling laws. Our solution extends the result in this latter paper to general nonzero values of the ridge parameter, which are essential to regularize the behavior of the model. In addition to obtaining new and more precise scaling laws, we also uncover a duality transformation at the diagrams level which explains the symmetry between model and training data set sizes. The same duality underlies recent efforts to design neural networks to simulate quantum field theories.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 直接絡み合い生成のためのパリティ依存状態伝達

Parity-dependent state transfer for direct entanglement generation ( http://arxiv.org/abs/2405.19408v1 )

ライセンス: Link先を確認

Federico A. Roy, João H. Romeiro, Leon Koch, Ivan Tsitsilin, Johannes Schirk, Niklas J. Glaser, Niklas Bruckmoser, Malay Singh, Franz X. Haslbeck, Gerhard B. P. Huber, Gleb Krylov, Achim Marx, Frederik Pfeiffer, Christian M. F. Schneider, Christian Schweizer, Florian Wallner, David Bunch, Lea Richard, Lasse Södergren, Klaus Liegener, Max Werninghaus, Stefan Filipp,

(参考訳) 量子情報技術が進歩するにつれ、スケーリングと接続性の課題に直面している。特に、遠方の量子ビット間の接続の必要性と、効率的な絡み合いの生成の必要性の2つが技術的実装から独立している。完全状態移動(Perfect State Transfer)は、キュービット格子の遠いノード間の量子状態の時間的最適転送と、最も近い近傍結合のみを実現する技術であり、デバイス接続性を改善する重要なツールを提供する。重要なことに、転送プロトコルは効果的なパリティに依存しない非局所的な相互作用をもたらし、その効用を絡み合った状態の効率的な生成にまで拡張する。ここでは, 超伝導量子ビットの連鎖上での完全状態移動と多ビット絡みの発生を実験的に実証する。このシステムは6つの固定周波数トランスモンキュービットから構成されており、結合はパラメトリックドライブを介して制御される。すべての結合を同時に活性化し、個々の振幅と周波数をエンジニアリングすることにより、最大6キュービットのパーフェクトステートトランスファーを実装し、異なる初期状態に対するそれぞれの単一励起ダイナミクスを観察する。次に、このプロトコルを複数の励起の存在下で適用し、そのパリティに依存した性質を検証する。最後に、この特性を利用して、単一転送操作のみを用いてマルチキュービットグリーンバーガー・ホーネ・ザイリンガー状態を作成し、その効率的な絡み合い生成への応用を実証する。

As quantum information technologies advance they face challenges in scaling and connectivity. In particular, two necessities remain independent of the technological implementation: the need for connectivity between distant qubits and the need for efficient generation of entanglement. Perfect State Transfer is a technique which realises the time optimal transfer of a quantum state between distant nodes of qubit lattices with only nearest-neighbour couplings, hence providing an important tool to improve device connectivity. Crucially, the transfer protocol results in effective parity-dependent non-local interactions, extending its utility to the efficient generation of entangled states. Here, we experimentally demonstrate Perfect State Transfer and the generation of multi-qubit entanglement on a chain of superconducting qubits. The system consists of six fixed-frequency transmon qubits connected by tunable couplers, where the couplings are controlled via parametric drives. By simultaneously activating all couplings and engineering their individual amplitudes and frequencies, we implement Perfect State Transfer on up to six qubits and observe the respective single-excitation dynamics for different initial states. We then apply the protocol in the presence of multiple excitations and verify its parity-dependent property, where the number of excitations within the chain controls the phase of the transferred state. Finally, we utilise this property to prepare a multi-qubit Greenberger-Horne-Zeilinger state using only a single transfer operation, demonstrating its application for efficient entanglement generation.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 物質のk局所量子相の安定性について

On stability of k-local quantum phases of matter ( http://arxiv.org/abs/2405.19412v1 )

ライセンス: Link先を確認

Ali Lavasani, Michael J. Gullans, Victor V. Albert, Maissam Barkeshli,

(参考訳) 現在のトポロジカル位相の理論の枠組みは、幾何学的に局所的な相互作用を持つ系の熱力学的極限に基づいている。自然な疑問は、幾何学的局所性の制約を緩和し、それをより弱いグラフ理論の$k$-局所性の概念に置き換えるならば、物質相の概念がどの程度明確に定義されているかである。この問題に対処するためのステップとして、一般的な量子的低密度パリティチェック符号に対応するハミルトンの摂動に対するエネルギーギャップの安定性を分析し、Bravyi と Hastings [Commun. Math. Phys. 307, 609 (2011)] の仕事を延長する。主な結果のまとめとして、もしある定数 $\varepsilon_1,\varepsilon_2>0$ が存在して、相互作用グラフ上の半径 $r の球の大きさ $\Gamma(r)$ が$\Gamma(r) = O(\exp(r^{1-\varepsilon_1}))$ を満たすと、半径 $r\le\rho^\ast = O(\log(n)^{1+\varepsilon_2})$ の局所基底状態は局所的な摂動に対して安定となる。これは、$D$-次元ユークリッドの場合よりもほぼ指数関数的に改善され、$\Gamma(r) = O(r^D)$ と $\rho^\ast = O(n^\alpha)$ は、ある$\alpha > 0$ である。従うアプローチは、$\varepsilon_1 = 0$を持つ有限レートqLDPC符号の安定性を証明できない。局所ハミルトニアンは広い零温度エントロピーを持つことができるので、熱力学の第3法則の意味を論じる。

The current theoretical framework for topological phases of matter is based on the thermodynamic limit of a system with geometrically local interactions. A natural question is to what extent the notion of a phase of matter remains well-defined if we relax the constraint of geometric locality, and replace it with a weaker graph-theoretic notion of $k$-locality. As a step towards answering this question, we analyze the stability of the energy gap to perturbations for Hamiltonians corresponding to general quantum low-density parity-check codes, extending work of Bravyi and Hastings [Commun. Math. Phys. 307, 609 (2011)]. A corollary of our main result is that if there exist constants $\varepsilon_1,\varepsilon_2>0$ such that the size $\Gamma(r)$ of balls of radius $r$ on the interaction graph satisfy $\Gamma(r) = O(\exp(r^{1-\varepsilon_1}))$ and the local ground states of balls of radius $r\le\rho^\ast = O(\log(n)^{1+\varepsilon_2})$ are locally indistinguishable, then the energy gap of the associated Hamiltonian is stable against local perturbations. This gives an almost exponential improvement over the $D$-dimensional Euclidean case, which requires $\Gamma(r) = O(r^D)$ and $\rho^\ast = O(n^\alpha)$ for some $\alpha > 0$. The approach we follow falls just short of proving stability of finite-rate qLDPC codes, which have $\varepsilon_1 = 0$; we discuss some strategies to extend the result to these cases. We discuss implications for the third law of thermodynamics, as $k$-local Hamiltonians can have extensive zero-temperature entropy.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# VisTA-SR:農業用低速度撮像カメラの精度と分解能の向上

VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture ( http://arxiv.org/abs/2405.19413v1 )

ライセンス: Link先を確認

Heesup Yun, Sassoum Lo, Christine H. Diepenbrock, Brian N. Bailey, J. Mason Earles,

(参考訳) 熱カメラは、植物温度の非侵襲的な測定を可能にするため、農業研究において重要なツールである。低コストのサーマルカメラを利用することで、農業研究と生産におけるサーマルイメージング導入の障壁を低くすることができる。本稿では,農業用低コスト熱画像カメラの温度精度と画質を改善するためのアプローチを提案する。コンピュータビジョン技術、特にディープラーニングネットワークの進歩を活用して、RGBと熱画像を組み合わせた低解像度サーマルカメラの能力を高めるために、$\textbf{Vis}$ual \&$\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement)という手法を提案する。この研究には、温度測定の校正と検証、ペア画像データセットの取得、農業用熱画像に適したディープラーニングネットワークの開発が含まれる。本研究は,農業領域における画像強調の課題に対処し,高分解能産業用カメラに代わる低コスト熱カメラの可能性を探るものである。実験により, 農業における温度精度と画像のシャープ性の向上に本手法が有効であることを示し, よりアクセスしやすく, 効率的な熱イメージングソリューションの確立を図った。

Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called $\textbf{VisTA-SR}$ ($\textbf{Vis}$ual \& $\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement) that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 許容性による安全:高速かつ安全な強化学習のためのシールド構築

Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning ( http://arxiv.org/abs/2405.19414v1 )

ライセンス: Link先を確認

Alexander Politowicz, Sahisnu Mazumder, Bing Liu,

(参考訳) 実生活問題に対する強化学習(RL)ソリューションの設計は依然として大きな課題である。主な関心領域は安全である。シールドディング」は、ユーザ定義の安全仕様を安全なエージェント動作に変換することで、RLの安全性を強制する一般的な手法である。しかし、これらの手法は、極端な学習遅延に悩まされ、問題のモデルや安全なドメインの設計に広範囲な人的努力を必要とするか、事前計算を必要とする。本稿では,安全と遮蔽構造に対処する新しい許容性に基づく枠組みを提案する。許容性はもともと、RLトレーニング効率を改善するための最適な解決策にはならない(許容不可能な)動作を排除するために設計された。本論文は,安全性を本枠組みに自然に組み込むことが可能であること,すなわち,安全性を含む許容範囲を延長することにより,安全性と効率の向上を両立できることを示す。 3つの標準RLアプリケーションを用いた実験評価は, 提案手法の有効性を示す。

Designing Reinforcement Learning (RL) solutions for real-life problems remains a significant challenge. A major area of concern is safety. "Shielding" is a popular technique to enforce safety in RL by turning user-defined safety specifications into safe agent behavior. However, these methods either suffer from extreme learning delays, demand extensive human effort in designing models and safe domains in the problem, or require pre-computation. In this paper, we propose a new permissibility-based framework to deal with safety and shield construction. Permissibility was originally designed for eliminating (non-permissible) actions that will not lead to an optimal solution to improve RL training efficiency. This paper shows that safety can be naturally incorporated into this framework, i.e. extending permissibility to include safety, and thereby we can achieve both safety and improved efficiency. Experimental evaluation using three standard RL applications shows the effectiveness of the approach.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 生成的類似性を持つコントラスト学習を用いたヒト誘導的ビアーゼをキャプチャする空間の学習

Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases ( http://arxiv.org/abs/2405.19420v1 )

ライセンス: Link先を確認

Raja Marjieh, Sreejan Kumar, Declan Campbell, Liyi Zhang, Gianluca Bencomo, Jake Snell, Thomas L. Griffiths,

(参考訳) 人間は、少数の例から学び、感覚データから有用な情報を抽象化するために、強い帰納バイアスに頼る。機械学習モデルにそのようなバイアスを注入することで、数ショットの学習、堅牢性、アライメントなど、さまざまなベンチマークのパフォーマンスが向上することが示されている。しかし、人間の類似性判断のような心理的に豊かなトレーニングデータがスケールするにはコストがかかるため、目標を達成するための効果的なトレーニング手順を見つけることは困難である。ここでは,2つのデータポイントが同一分布からサンプリングされた場合の類似性を考えるベイズ的類似性の概念を導入することで,この問題に対処する。この尺度は確率的プログラムを含む複雑な生成過程に適用できる。生成的類似性は, 特定の帰納的バイアスを表現する空間埋め込みの学習を可能にするため, 正確な形状を抽出可能な場合でも, 対照的な学習目標を定義するのに有効であることを示す。本研究では, 幾何学的形状の帰納的バイアスを捕捉し, 確率的プログラムによってパラメータ化される抽象的描画スタイルをよりよく識別するために, 提案手法の有用性を実証する。

Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data. Instilling such biases in machine learning models has been shown to improve their performance on various benchmarks including few-shot learning, robustness, and alignment. However, finding effective training procedures to achieve that goal can be challenging as psychologically-rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by introducing a Bayesian notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable, enabling learning of spatial embeddings that express specific inductive biases. We demonstrate the utility of our approach by showing how it can be used to capture human inductive biases for geometric shapes, and to better distinguish different abstract drawing styles that are parameterized by probabilistic programs.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# バイスタブル画像を用いた視覚言語モデルの評価

Evaluating Vision-Language Models on Bistable Images ( http://arxiv.org/abs/2405.19423v1 )

ライセンス: Link先を確認

Artemis Panagopoulou, Coby Melkin, Chris Callison-Burch,

(参考訳) ビスタブル・イメージ(ビスタブル・イメージ、英: Bistable image、または不明瞭または可逆的イメージ)は、2つの異なる解釈で見ることができる視覚刺激を示すが、観察者は同時に見ることはできない。本研究では,バイスタブル画像を用いた視覚言語モデルについて,これまでで最も広範な検討を行った。私たちは手動で29枚のバイスタブル画像と関連するラベルを集め、明るさ、色調、回転で116種類の操作を行ないました。 6つのモデルアーキテクチャにまたがる分類タスクと生成タスクにおいて,12種類のモデルを評価した。以上の結果から,Idefics 系と LLaVA1.5-13b 系のモデルを除くと,各モデル間の解釈の相違が顕著であり,画像操作下での差は最小であり,画像回転の例外は少ないことが明らかとなった。さらに、モデル嗜好を人間と比較し、モデルが人間と同じ連続性バイアスを示しておらず、しばしば人間の初期解釈から分岐していることを指摘した。また,プロンプトの変動や同義語ラベルの使用の影響についても検討し,これらの要因が画像操作よりもモデル解釈に大きく影響することを発見した。すべてのコードとデータはオープンソースである。

Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected them to 116 different manipulations in brightness, tint, and rotation. We evaluated twelve different models in both classification and generative tasks across six model architectures. Our findings reveal that, with the exception of models from the Idefics family and LLaVA1.5-13b, there is a pronounced preference for one interpretation over another among the models, and minimal variance under image manipulations, with few exceptions on image rotations. Additionally, we compared the model preferences with humans, noting that the models do not exhibit the same continuity biases as humans and often diverge from human initial interpretations. We also investigated the influence of variations in prompts and the use of synonymous labels, discovering that these factors significantly affect model interpretations more than image manipulations showing a higher influence of the language priors on bistable image interpretations compared to image-text training data. All code and data is open sourced.

翻訳日:2024-05-31 19:35:56 公開日:2024-05-29

# 拡散政策攻撃者:拡散政策に対する敵対的攻撃の作法

Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies ( http://arxiv.org/abs/2405.19424v1 )

ライセンス: Link先を確認

Yipu Chen, Haotian Xue, Yongxin Chen,

(参考訳) 拡散モデル (DM) は行動クローニング (BC) の有望なアプローチとして出現している。 DMに基づく拡散ポリシー(DP)は、BCのパフォーマンスを新たな高さまで向上させ、様々なタスクにまたがる堅牢な有効性を証明し、その固有の柔軟性と実装の容易さを兼ね備えている。政策創出の基盤としてDPの採用が増加しているにもかかわらず、安全の重大な問題は未解決のままである。過去の試みでは、深い政策ネットワークをターゲットとしていたが、DPは拡散モデルを政策ネットワークとして使用し、連鎖構造とランダム性によって従来の手法による攻撃を効果的に行なわなかった。本稿では, 敵のシナリオを導入し, オフライン・オンライン攻撃を包含し, グローバル・パッチベースの攻撃を行うことにより, DPの安全性に関する包括的検討を行う。 DP-Attackerは、上記すべてのシナリオにまたがる効果的な敵攻撃を構築できるアルゴリズム群である。我々は、様々な操作タスクにわたる事前訓練された拡散ポリシーに対する攻撃を行う。実験により,DP-Attackerは全てのシナリオにおいてDPの成功率を大幅に低下させることができることを示した。特にオフラインのシナリオでは、DP-Attackerはすべてのフレームに適用可能な高度に転送可能な摂動を生成することができる。さらに、環境に適用された場合、効果的にモデルを欺くような逆の物理的パッチの作成について説明する。ビデオの結果は以下のとおりだ。

Diffusion models (DMs) have emerged as a promising approach for behavior cloning (BC). Diffusion policies (DP) based on DMs have elevated BC performance to new heights, demonstrating robust efficacy across diverse tasks, coupled with their inherent flexibility and ease of implementation. Despite the increasing adoption of DP as a foundation for policy generation, the critical issue of safety remains largely unexplored. While previous attempts have targeted deep policy networks, DP used diffusion models as the policy network, making it ineffective to be attacked using previous methods because of its chained structure and randomness injected. In this paper, we undertake a comprehensive examination of DP safety concerns by introducing adversarial scenarios, encompassing offline and online attacks, and global and patch-based attacks. We propose DP-Attacker, a suite of algorithms that can craft effective adversarial attacks across all aforementioned scenarios. We conduct attacks on pre-trained diffusion policies across various manipulation tasks. Through extensive experiments, we demonstrate that DP-Attacker has the capability to significantly decrease the success rate of DP for all scenarios. Particularly in offline scenarios, DP-Attacker can generate highly transferable perturbations applicable to all frames. Furthermore, we illustrate the creation of adversarial physical patches that, when applied to the environment, effectively deceive the model. Video results are put in: https://sites.google.com/view/diffusion-policy-attacker.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 言語モデルエージェントのための適応型会話内チーム構築

Adaptive In-conversation Team Building for Language Model Agents ( http://arxiv.org/abs/2405.19425v1 )

ライセンス: Link先を確認

Linxin Song, Jiale Liu, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang,

(参考訳) 複数の大規模言語モデル(LLM)エージェントを活用することは、複雑なタスクに取り組む上で有望なアプローチであることを示している。タスクが与えられたら、効果的に解決するためのLLMエージェントのチームをどのように構築すればよいか? 私たちの新しい適応型チーム構築パラダイムは、Captain Agentという新しいエージェント設計を通じて実現された柔軟なソリューションを提供します。タスク解決プロセスの各ステップごとに動的にチームを編成し、管理し、ネストしたグループの会話とリフレクションを利用して、多様な専門知識を確保し、ステレオタイプ的なアウトプットを防ぐ。柔軟性がありながら構造化された問題解決のアプローチを可能にし、冗長性を低減し、出力の多様性を高めるのに役立つ。 6つの実世界のシナリオに対する総合的な評価では、エージェントは21.94%の精度で既存のマルチエージェントメソッドを著しく上回り、タスク固有のプロンプトエンジニアリングを必要とせずに優れたパフォーマンスを提供する。

Leveraging multiple large language model (LLM) agents has shown to be a promising approach for tackling complex tasks, while the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. It dynamically forms and manages teams for each step of a task-solving process, utilizing nested group conversations and reflection to ensure diverse expertise and prevent stereotypical outputs. It allows for a flexible yet structured approach to problem-solving and can help reduce redundancy and enhance output diversity. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods with 21.94% improvement in average accuracy, providing outstanding performance without requiring task-specific prompt engineering.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 深層学習による経口読解頻度の評価

Deep Learning for Assessment of Oral Reading Fluency ( http://arxiv.org/abs/2405.19426v1 )

ライセンス: Link先を確認

Mithilesh Vaidya, Binaya Kumar Sahoo, Preeti Rao,

(参考訳) 読み流しの評価はリテラシープログラムの重要な要素であり、早期教育介入の指導と監視に役立っている。教員が実施する演習のリソース集約性を考えると,口頭読みの音声記録を操作できる自動ツールの開発は,客観的かつ高度にスケーラブルなソリューションとして魅力的である。精度、レート、表現力などの複雑な側面は、読み流しの人間の判断を下す。そこで本研究では,人間専門家がラベル付けした物語テキストの子どもの音声記録の学習データセットのエンドツーエンドモデリングについて検討する。事前訓練されたwav2vec2.0モデルは、ラベル付きデータの限られた量による課題を軽減する可能性から採用されている。本報告では, 学習した語彙・音響・韻律的特徴の組込みが, 読み流しの知覚に重要であることを明らかにする。

Reading fluency assessment is a critical component of literacy programmes, serving to guide and monitor early education interventions. Given the resource intensive nature of the exercise when conducted by teachers, the development of automatic tools that can operate on audio recordings of oral reading is attractive as an objective and highly scalable solution. Multiple complex aspects such as accuracy, rate and expressiveness underlie human judgements of reading fluency. In this work, we investigate end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts. The pre-trained wav2vec2.0 model is adopted due its potential to alleviate the challenges from the limited amount of labeled data. We report the performance of a number of system variations on the relevant measures, and also probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 量子ヒストリーにおける空間と時間相関

Space and time correlations in quantum histories ( http://arxiv.org/abs/2405.19427v1 )

ライセンス: Link先を確認

Leonardo Castellani, Anna Gabetti,

(参考訳) 一般化された量子ヒストリーの定式化は、同じ履歴密度行列の異なるトレースを取ることによって、空間と時間相関の対称的な処理を可能にする。この枠組みで空間的・時間的絡み合いをどう特徴づけるかを思い出す。静的複合システムのケケットに履歴状態をマッピングする操作プロトコルが提示される。例えば、我々のアプローチでは、Leggett-Garg と temporal CHSH の不等式がどのように破られるかを示す。

The formalism of generalized quantum histories allows a symmetrical treatment of space and time correlations, by taking different traces of the same history density matrix. We recall how to characterize spatial and temporal entanglement in this framework. An operative protocol is presented, to map a history state into the ket of a static composite system. We show, by examples, how the Leggett-Garg and the temporal CHSH inequalities can be violated in our approach.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# コンフォーマル再帰的特徴除去

Conformal Recursive Feature Elimination ( http://arxiv.org/abs/2405.19429v1 )

ライセンス: Link先を確認

Marcos López-De-Castro, Alberto García-Galindo, Rubén Armañanzas,

(参考訳) 従来の統計手法とは異なり、コンフォーマル予測(CP)はデータの交換可能性のみに基づいて、個々の予測に関連する有効かつ正確な信頼レベルを決定することができる。本稿では,CP フレームワークを利用した新しい特徴選択手法を提案する。提案したCRFE(Conformal Recursive Feature Elimination)は,データセットの非整合性を高める機能を特定し,再帰的に削除する。また、CRFEの自動停止基準と、機能のサブセット間の一貫性を測定するための新しいインデックスも提示する。 CRFE選択は、データの複数のパーティションを用いて、複数のマルチクラスデータセット上の古典的再帰的特徴除去(RFE)手法と比較される。その結果、CRFEはデータセットの半分でRFEを明らかに上回り、残りの半分では同様のパフォーマンスを実現していることがわかった。自動停止基準は、分類性能を計算せずに有効かつ非冗長な機能のサブセットを提供する。

Unlike traditional statistical methods, Conformal Prediction (CP) allows for the determination of valid and accurate confidence levels associated with individual predictions based only on exchangeability of the data. We here introduce a new feature selection method that takes advantage of the CP framework. Our proposal, named Conformal Recursive Feature Elimination (CRFE), identifies and recursively removes features that increase the non-conformity of a dataset. We also present an automatic stopping criterion for CRFE, as well as a new index to measure consistency between subsets of features. CRFE selections are compared to the classical Recursive Feature Elimination (RFE) method on several multiclass datasets by using multiple partitions of the data. The results show that CRFE clearly outperforms RFE in half of the datasets, while achieving similar performance in the rest. The automatic stopping criterion provides subsets of effective and non-redundant features without computing any classification performance.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 合意を超えて:言語的インフォームド・カウンセリングに基づく自動評価手法の合理化

Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals ( http://arxiv.org/abs/2405.19433v1 )

ライセンス: Link先を確認

Yupei Wang, Renfen Hu, Zhe Zhao,

(参考訳) 現在の自動エッセイスコアリング(AES)手法は、ヒトのレイカーと高い一致を示しているが、それらのスコアリングメカニズムは十分に解明されていない。提案手法は,Large Language Models (LLMs) によって支援された対実的介入を用いて,エッセイ評価において,BERT のようなモデルは主に文レベルの特徴に焦点をあてる一方で,LLM は慣習,言語複雑性,組織に順応するものであり,より包括的アライメントとルーブリックのスコアリングとの整合性を示す。さらに、LLMはフィードバック中の反事実的介入を識別することができる。我々のアプローチは、ニューラルネットワークAES手法の理解を改善し、モデル駆動決定における透明性を求める他の領域にも適用できる。コードとデータはGitHubでリリースされる。

While current automated essay scoring (AES) methods show high agreement with human raters, their scoring mechanisms are not fully explored. Our proposed method, using counterfactual intervention assisted by Large Language Models (LLMs), reveals that when scoring essays, BERT-like models primarily focus on sentence-level features, while LLMs are attuned to conventions, language complexity, as well as organization, indicating a more comprehensive alignment with scoring rubrics. Moreover, LLMs can discern counterfactual interventions during feedback. Our approach improves understanding of neural AES methods and can also apply to other domains seeking transparency in model-driven decisions. The codes and data will be released at GitHub.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 一般化滑らか性下における多目的最適化の収束性について

On the Convergence of Multi-objective Optimization under Generalized Smoothness ( http://arxiv.org/abs/2405.19440v1 )

ライセンス: Link先を確認

Qi Zhang, Peiyao Xiao, Kaiyi Ji, Shaofeng Zou,

(参考訳) 多目的最適化(MOO)はマルチタスク学習など様々な分野で注目を集めている。最近の研究は、理論的な分析を伴う効果的なアルゴリズムを提供しているが、それらは標準の$L$-smoothや、リカレントニューラルネットワーク(RNN)やトランスフォーマーのようなニューラルネットワークには不満足な境界段階の仮定によって制限されている。本稿では、より一般的で現実的な$\ell$-smooth損失関数の研究を行い、$\ell$は勾配ノルムの一般非減少関数である。目的物間の最小改善を最大化する競合回避(CA)方向を近似した,$\ell$-smooth MOO問題,一般化されたSmooth Multi-objective Gradient descent (GSMGrad) とその確率的変種であるStochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad) の2つの新しいシングルループアルゴリズムを開発した。両アルゴリズムの総合収束解析を行い, 平均CA距離を保証した$\epsilon$-accurate Pareto定常点(すなわち, 更新方向とCA方向のギャップ)に全反復で収束することを示し, 完全$\mathcal{O}(\epsilon^{-2})$と$\mathcal{O}(\epsilon^{-4})$サンプルは決定論的および確率的設定にそれぞれ必要である。私たちのアルゴリズムは、より多くのサンプルを使用して、各イテレーションにおいてより厳密な$\epsilon$-level CA距離を保証することができます。また,GSMGradと同等の性能保証を達成しつつ,一定の時間と空間のみを用いてGSMGrad-FAという実用的なGSMGradの変種を提案する。提案手法の有効性を検証し,提案手法の有効性を検証した。

Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which are typically unsatisfactory for neural networks, such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems, Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad), which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. Our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we propose a practical variant of GSMGrad named GSMGrad-FA using only constant-level time and space, while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 動き平均化による大規模DSM登録

Large-scale DSM registration via motion averaging ( http://arxiv.org/abs/2405.19442v1 )

ライセンス: Link先を確認

Ningli Xu, Rongjun Qin,

(参考訳) 広域デジタルサーフェスモデル(DSM)の生成には、多数の個人と部分的に重複したDSMを登録する必要がある。これは、複数のDSMからの多くの観測が考慮された場合、メモリオーバーフローを引き起こすため、典型的な登録アルゴリズムでは難しい問題となる。逐次登録アルゴリズムは計算を著しく削減できるが、特に小さな重なり合ったペアに対して脆弱であり、大きなエラーの蓄積につながる。本研究では,DSM間の相対的なポーズを表すエッジを持つシーングラフを構築するために,ペアワイズDSMを登録する,動き平均化問題としてDSM登録タスクを構築する新しいソリューションを提案する。具体的には、大きなDSMのグリッド構造に基づいて、新しい近接探索法を用いてペアワイズ登録を行う。シーングラフは,O(N)複雑性の極めて高速な動き平均アルゴリズムを用いて最適化可能である(Nは画像数を指す)。高分解能衛星由来DSMの評価は、計算と精度を著しく向上させる。

Generating wide-area digital surface models (DSMs) requires registering a large number of individual, and partially overlapped DSMs. This presents a challenging problem for a typical registration algorithm, since when a large number of observations from these multiple DSMs are considered, it may easily cause memory overflow. Sequential registration algorithms, although can significantly reduce the computation, are especially vulnerable for small overlapped pairs, leading to a large error accumulation. In this work, we propose a novel solution that builds the DSM registration task as a motion averaging problem: pair-wise DSMs are registered to build a scene graph, with edges representing relative poses between DSMs. Specifically, based on the grid structure of the large DSM, the pair-wise registration is performed using a novel nearest neighbor search method. We show that the scene graph can be optimized via an extremely fast motion average algorithm with O(N) complexity (N refers to the number of images). Evaluation of high-resolution satellite-derived DSM demonstrates significant improvement in computation and accuracy.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# MathChat: マルチターンインタラクションにおける数学的推論と指導のベンチマーク

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions ( http://arxiv.org/abs/2405.19444v1 )

ライセンス: Link先を確認

Zhenwen Liang, Dian Yu, Wenhao Yu, Wenlin Yao, Zhihan Zhang, Xiangliang Zhang, Dong Yu,

(参考訳) 大規模言語モデル(LLM)は数学的な問題解決において、特に一ターンの質問応答形式において顕著な能力を示した。しかし、現実のシナリオでは、多ターンや対話的な情報交換を必要とする数学的な質問応答がしばしば必要であり、これらのタスクにおけるLLMの性能はまだ未定である。本稿では,LLMを幅広い数学的タスクの範囲にわたって評価するための総合的なベンチマークであるMathChatを紹介する。これらのタスクは、マルチターン相互作用とオープンエンド生成におけるモデルの能力を評価するために構成される。我々は,MathChatベンチマークにおける様々なSOTA LLMの性能評価を行い,これらのモデルが一ターン質問応答において優れている一方で,持続的な推論と対話理解を必要とする複雑なシナリオにおいて,それらの性能は著しく低下していることを示した。マルチターンタスクやオープンエンドタスクに直面する既存のLLMの制限に対処するため,LLMファインタニングのための合成対話型数学データセットであるMathChatSyncを開発した。実験結果は、MathChatsyncのような多種多様な対話型インストラクションチューニングデータセットでLLMをトレーニングする必要性を強調している。本研究は, LLMの多元数理推論能力向上に向けた有望な方向性の1つを概説し, 対話型数学的問題解決や実世界の応用に長けているLCMの開発を推し進めるものであると考えている。

Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires multi turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a comprehensive benchmark specifically designed to evaluate LLMs across a broader spectrum of mathematical tasks. These tasks are structured to assess the models' abilities in multiturn interactions and open ended generation. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding. To address the above limitations of existing LLMs when faced with multiturn and open ended tasks, we develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations. Experimental results emphasize the need for training LLMs with diverse, conversational instruction tuning datasets like MathChatsync. We believe this work outlines one promising direction for improving the multiturn mathematical reasoning abilities of LLMs, thus pushing forward the development of LLMs that are more adept at interactive mathematical problem solving and real world applications.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# FourierMamba: イメージデライニングのためのステートスペースモデルとフーリエラーニング統合

FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining ( http://arxiv.org/abs/2405.19450v1 )

ライセンス: Link先を確認

Dong Li, Yidi Liu, Xueyang Fu, Senyan Xu, Zheng-Jun Zha,

(参考訳) Image derainingは雨が降る画像から雨の跡を取り除き、透明な背景を復元することを目的としている。現在、フーリエ変換を用いたいくつかの研究は、降雨を捉える前に有効な周波数として機能するため、画像の劣化に有効であることが証明されている。しかし、画像に低周波と高周波の依存性があるにもかかわらず、これらのフーリエ法は、学習手順の整合性に異なる周波数の相関を利用することは稀であり、画像デラリニングにおける周波数情報の完全利用を制限している。あるいは、最近登場したMamba手法は、様々な領域(例えば、空間的、時間的)における相関をモデル化するための効果と効率を描いており、異なる周波数を相関付けるために、探索されていないフーリエ空間にMambaを導入することは、画像のデライニングを改善するのに役立つと論じている。これにより,FourierMambaという新たなフレームワークが提案され,Fourier空間におけるMambaとのイメージデベリングが実現された。フーリエマムバのコアは、フーリエ空間における周波数順序のユニークな配置に依拠し、低周波順序形式は空間次元(軸に配置されていない)とチャネル次元(軸に配置されている)で異なる形で表される。そこで我々は、空間次元とチャネル次元のフーリエ空間情報を異なる設計で関連付けるフーリエマンバを設計する。具体的には、空間次元フーリエ空間において、周波数をスキャンして低周波数から高周波数に並べ替えることで、周波数間の接続を秩序的に関連付けるジグザグ符号を導入する。

Image deraining aims to remove rain streaks from rainy images and restore clear backgrounds. Currently, some research that employs the Fourier transform has proved to be effective for image deraining, due to it acting as an effective frequency prior for capturing rain streaks. However, despite there exists dependency of low frequency and high frequency in images, these Fourier-based methods rarely exploit the correlation of different frequencies for conjuncting their learning procedures, limiting the full utilization of frequency information for image deraining. Alternatively, the recently emerged Mamba technique depicts its effectiveness and efficiency for modeling correlation in various domains (e.g., spatial, temporal), and we argue that introducing Mamba into its unexplored Fourier spaces to correlate different frequencies would help improve image deraining. This motivates us to propose a new framework termed FourierMamba, which performs image deraining with Mamba in the Fourier space. Owning to the unique arrangement of frequency orders in Fourier space, the core of FourierMamba lies in the scanning encoding of different frequencies, where the low-high frequency order formats exhibit differently in the spatial dimension (unarranged in axis) and channel dimension (arranged in axis). Therefore, we design FourierMamba that correlates Fourier space information in the spatial and channel dimensions with distinct designs. Specifically, in the spatial dimension Fourier space, we introduce the zigzag coding to scan the frequencies to rearrange the orders from low to high frequencies, thereby orderly correlating the connections between frequencies; in the channel dimension Fourier space with arranged orders of frequencies in axis, we can directly use Mamba to perform frequency correlation and improve the channel information representation.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 遮蔽クラッツァーポテンシャルとその変種について

On the screened Kratzer potential and its variants ( http://arxiv.org/abs/2405.19451v1 )

ライセンス: Link先を確認

Francisco M. Fernández,

(参考訳) 最近提案された2原子分子の振動-回転スペクトルとその熱力学特性は欠陥を示す。これらのポテンシャルのパラメータ $D_e $ と $r_e$ は、それぞれ解離エネルギーと平衡結合長ではないことが容易に示せる。私たちは、その間違いをシンプルで一般的な方法で克服する方法を示します。

We argue that several potentials proposed recently for the analysis of the vibrational-rotational spectra of diatomic molecules and their thermodynamic properties exhibit a flaw. One can easily show that the parameters $D_e $ and $r_e$ in those potentials are not the dissociation energy and equilibrium bond length, respectively, as the proposers believe. We show how to overcome the mistake in a simple and quite general way.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# ゲイター:現実世界の四足歩行のための一貫した表現学習

Gaitor: Learning a Unified Representation Across Gaits for Real-World Quadruped Locomotion ( http://arxiv.org/abs/2405.19452v1 )

ライセンス: Link先を確認

Alexander L. Mitchell, Wolfgang Merkt, Aristotelis Papatheodorou, Ioannis Havoutis, Ingmar Posner,

(参考訳) 現在の四足歩行の最先端は、地形を横断する頑丈な動きを生み出すことができるが、望ましいロボット軌道をトロットやクロールのような個別の移動スキルに分割する必要がある。対照的に、本研究では、歩行タイプと特徴の連続的なブレンディングを可能にする四足歩行の単一統一表現を学習する可能性を示す。本稿では,運動スキルのゆがみのある表現を学習し,トレーニング中に見られるすべての歩行タイプに共通する情報を共有するGaitorを提案する。学習した表現に現れる構造は、異なる歩行タイプ間の位相相関を符号化できることで解釈可能である。これらは連続的な歩行遷移を生み出すために利用することができる。また、フットスイング特性はアンタングル化され、直接対応可能である。この構造化された潜在表現で動作する初歩的な地形符号化と学習プランナーとともに、Gaitorは、不均一な地形に反応しながら、所望の歩行タイプやユーザの特徴を含む動作コマンドを受信することができる。我々はANYmal Cプラットフォーム上でのシミュレーションと実世界の両方の設定でGaitorを評価した。我々の知る限りでは、これは複数の歩行に対して統一的で解釈可能な潜在表現を学習する最初の仕事であり、実際の四足歩行ロボット上で異なる移動モード間でオンデマンドで連続的にブレンドする結果となった。

The current state-of-the-art in quadruped locomotion is able to produce robust motion for terrain traversal but requires the segmentation of a desired robot trajectory into a discrete set of locomotion skills such as trot and crawl. In contrast, in this work we demonstrate the feasibility of learning a single, unified representation for quadruped locomotion enabling continuous blending between gait types and characteristics. We present Gaitor, which learns a disentangled representation of locomotion skills, thereby sharing information common to all gait types seen during training. The structure emerging in the learnt representation is interpretable in that it is found to encode phase correlations between the different gait types. These can be leveraged to produce continuous gait transitions. In addition, foot swing characteristics are disentangled and directly addressable. Together with a rudimentary terrain encoding and a learned planner operating in this structured latent representation, Gaitor is able to take motion commands including desired gait type and characteristics from a user while reacting to uneven terrain. We evaluate Gaitor in both simulated and real-world settings on the ANYmal C platform. To the best of our knowledge, this is the first work learning such a unified and interpretable latent representation for multiple gaits, resulting in on-demand continuous blending between different locomotion modes on a real quadruped robot.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# 誤り耐性スプリットフッド学習のためのスプリットポイントの最適化

Optimizing Split Points for Error-Resilient SplitFed Learning ( http://arxiv.org/abs/2405.19453v1 )

ライセンス: Link先を確認

Chamani Shiranthika, Parvaneh Saeedi, Ivan V. Bajić,

(参考訳) フェデレート・ラーニング(FL)、スプリット・ラーニング(SL)、スプリット・フェデレート・ラーニング(Split Federated Learning)といった分散学習の最近の進歩は、機械学習の可能性を広げている。 SplitFedは、FL内の個々のクライアントの計算負担を最小限に抑え、プライバシーを維持しながらSLを並列化する。本研究では,SplitFedのパケット損失に対するモデル分割点のレジリエンスについて検討した。 SplitFedのパラメータアグリゲーション戦略について検討し、各ポイントでモデルを分割する影響を調べる。実験はヒト胚画像分割作業で行われ、より深い分割点の統計的に有意な利点を明らかにした。

Recent advancements in decentralized learning, such as Federated Learning (FL), Split Learning (SL), and Split Federated Learning (SplitFed), have expanded the potentials of machine learning. SplitFed aims to minimize the computational burden on individual clients in FL and parallelize SL while maintaining privacy. This study investigates the resilience of SplitFed to packet loss at model split points. It explores various parameter aggregation strategies of SplitFed by examining the impact of splitting the model at different points-either shallow split or deep split-on the final global model performance. The experiments, conducted on a human embryo image segmentation task, reveal a statistically significant advantage of a deeper split point.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# Deep Grokking: ディープニューラルネットワークはより一般化するのか?

Deep Grokking: Would Deep Neural Networks Generalize Better? ( http://arxiv.org/abs/2405.19454v1 )

ライセンス: Link先を確認

Simin Fan, Razvan Pascanu, Martin Jaggi,

(参考訳) グラッキング現象に関する最近の研究は、ニューラルネットワークのトレーニング力学と一般化挙動の複雑さを照らしている。グロキング(Grokking)とは、ネットワークがトレーニングセットに完全に適合する拡張オーバーフィッティングフェーズの後に発生する、テストセット上でのネットワークの一般化精度の急激な上昇を指す。既存の研究は主に2層MLPや1層トランスフォーマーのような浅層ネットワークに焦点を当てているが、我々はディープネットワーク(例えば12層MLP)のグラッキングについて検討する。我々はこの現象を実証的に再現し、深層ニューラルネットワークが浅いものよりもグラッキングの影響を受けやすいことを発見した。一方,テスト精度が2次サージを示すMLPモデルの深さを増大させると,浅層モデルにはほとんど見られない,興味深い多段階一般化現象が観測される。さらに,特徴量の減少と,グルーキング時の過度適合から一般化段階への位相遷移の間の説得力のある対応を明らかにする。さらに,多段階一般化現象は特徴ランクの二重発色パターンとよく一致していることがわかった。これらの観測から、内部特徴ランクは重量ノルムと比較してモデルの一般化挙動のより有望な指標となる可能性が示唆された。我々の研究は、ディープニューラルネットワークに潜入し、特徴ランクと一般化性能の関係を調査する最初のものであると信じています。

Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focus on shallow networks such as 2-layer MLP and 1-layer Transformer, we explore grokking on deep networks (e.g. 12-layer MLP). We empirically replicate the phenomenon and find that deep neural networks can be more susceptible to grokking than its shallower counterparts. Meanwhile, we observe an intriguing multi-stage generalization phenomenon when increase the depth of the MLP model where the test accuracy exhibits a secondary surge, which is scarcely seen on shallow models. We further uncover compelling correspondences between the decreasing of feature ranks and the phase transition from overfitting to the generalization stage during grokking. Additionally, we find that the multi-stage generalization phenomenon often aligns with a double-descent pattern in feature ranks. These observations suggest that internal feature rank could serve as a more promising indicator of the model's generalization behavior compared to the weight-norm. We believe our work is the first one to dive into grokking in deep neural networks, and investigate the relationship of feature rank and generalization performance.

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# スタートアップ評価パイプラインの自動化 - スタートアップ成功予測フレームワーク(SSFF)

An Automated Startup Evaluation Pipeline: Startup Success Forecasting Framework (SSFF) ( http://arxiv.org/abs/2405.19456v1 )

ライセンス: Link先を確認

Xisen Wang, Yigit Ihlamur,

(参考訳) スタートアップを初期段階で評価することは、専門家による詳細な分析を必要とする複雑なタスクである。このプロセスを大規模に自動化することは、ビジネスに大きな影響を与えるが、固有の複雑さは課題を引き起こす。本稿では、従来の機械学習と高度な言語モデルを組み合わせた新しい自動化システムであるStartup Success Forecasting Framework(SSFF)を導入することで、この課題に対処する。このインテリジェントエージェントベースのアーキテクチャは、分析をエンドツーエンドで実行するベンチャーキャピタリストのように、推論、行動、合成、決定するために設計されています。 SSFFは3つの主な部分で構成されている。 - 予測ブロック: ランダムな森林とニューラルネットワークを使用して予測を行う。外部知識ブロック:外部ソースからリアルタイム情報を取得する。このフレームワークは、創業者とスタートアップの説明に関する最小限の入力データを必要とし、外部リソースからの追加データでそれを強化し、すべて自動化された方法で高精度に詳細な分析を行う。

Evaluating startups in their early stages is a complex task that requires detailed analysis by experts. While automating this process on a large scale can significantly impact businesses, the inherent complexity poses challenges. This paper addresses this challenge by introducing the Startup Success Forecasting Framework (SSFF), a new automated system that combines traditional machine learning with advanced language models. This intelligent agent-based architecture is designed to reason, act, synthesize, and decide like a venture capitalist to perform the analysis end-to-end. The SSFF is made up of three main parts: - Prediction Block: Uses random forests and neural networks to make predictions. - Analyst Block: Simulates VC analysis scenario and uses SOTA prompting techniques - External Knowledge Block: Gathers real-time information from external sources. This framework requires minimal input data about the founder and startup description, enhances it with additional data from external resources, and performs a detailed analysis with high accuracy, all in an automated manner

翻訳日:2024-05-31 19:26:02 公開日:2024-05-29

# MemControl:自動パラメータ選択による医療拡散モデルにおける記憶の緩和

MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection ( http://arxiv.org/abs/2405.19458v1 )

ライセンス: Link先を確認

Raman Dutt, Pedro Sanchez, Ondrej Bohdal, Sotirios A. Tsaftaris, Timothy Hospedales,

(参考訳) 拡散モデルは、トレーニング分布を忠実に反映した画像を生成する際、顕著な能力を示す。しかし、これらのモデルはデータの記憶をトレーニングする傾向があり、特に医用画像のような敏感な分野において、プライバシー、倫理、法的な懸念を生じさせる。我々は、暗記は深層モデルの過度パラメータ化によって引き起こされると仮定し、微調整時のモデルキャパシティの正規化が効果的な緩和戦略である可能性を示唆した。パラメータ効率のよい微調整(PEFT)手法は、特定のパラメータを選択的に更新することでキャパシティ制御に有望なアプローチを提供する。しかし、生成品質と記憶のバランスをとる学習可能なパラメータの最適なサブセットを見つけることは、いまだ解明されていない。この課題に対処するために、記憶と生成品質の指標を報酬として利用することにより、自動パラメータ選択をガイドする二段階最適化フレームワークを提案する。我々のフレームワークは、生成記憶トレードオフを満たすために更新すべき最適パラメータセットをうまく識別する。我々は,医用画像生成の特定のタスクに対する実験を行い,モデルパラメータの0.019%を微調整することで,既存の最先端のトレーニング時間緩和戦略を上回りました。さらに、我々のフレームワークを通じて得られた戦略は、異なるデータセットやドメイン間で転送可能であることを示す。提案するフレームワークは,大規模なデータセットに対してスケーラブルであり,報酬関数の選択に非依存である。最後に、我々のフレームワークと既存のアプローチを組み合わせることで、さらなる記憶の緩和を実現できることを示す。

Diffusion models show a remarkable ability in generating images that closely mirror the training distribution. However, these models are prone to training data memorization, leading to significant privacy, ethical, and legal concerns, particularly in sensitive fields such as medical imaging. We hypothesize that memorization is driven by the overparameterization of deep models, suggesting that regularizing model capacity during fine-tuning could be an effective mitigation strategy. Parameter-efficient fine-tuning (PEFT) methods offer a promising approach to capacity control by selectively updating specific parameters. However, finding the optimal subset of learnable parameters that balances generation quality and memorization remains elusive. To address this challenge, we propose a bi-level optimization framework that guides automated parameter selection by utilizing memorization and generation quality metrics as rewards. Our framework successfully identifies the optimal parameter set to be updated to satisfy the generation-memorization tradeoff. We perform our experiments for the specific task of medical image generation and outperform existing state-of-the-art training-time mitigation strategies by fine-tuning as few as 0.019% of model parameters. Furthermore, we show that the strategies learned through our framework are transferable across different datasets and domains. Our proposed framework is scalable to large datasets and agnostic to the choice of reward functions. Finally, we show that our framework can be combined with existing approaches for further memorization mitigation.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# クラスタリングに基づくドメイン一般化のための検証スプリット

Clustering-Based Validation Splits for Domain Generalisation ( http://arxiv.org/abs/2405.19461v1 )

ライセンス: Link先を確認

Andrea Napoli, Paul White,

(参考訳) 本稿では,ドメインシフトによるモデル選択の問題について考察する。この設定では、トレーニングセットと検証セットの間の最大平均誤差(MMD)が、選択されたモデルの一般化可能性を高めることが提案される。この目的を最大化するカーネルk平均クラスタリングに基づくデータ分割アルゴリズムを提案する。このアルゴリズムは線形プログラミングを利用して分割のサイズ、ラベル、(任意に)群分布を制御し、収束を保証する。このテクニックは、ドメイン一般化(DG)と教師なしドメイン適応(UDA)タスクの両方において、さまざまなデータセットとトレーニングアルゴリズムの代替分割戦略を一貫して上回る。分析はまた、トレーニングと検証セットの間のMDDが、テスト領域の精度と強いランク関連(\rho=0.63$)であることを示し、このアプローチの有効性をさらに裏付けている。

This paper considers the problem of model selection under domain shift. In this setting, it is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models. A data splitting algorithm based on kernel k-means clustering, which maximises this objective, is presented. The algorithm leverages linear programming to control the size, label, and (optionally) group distributions of the splits, and comes with convergence guarantees. The technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation (DG) and unsupervised domain adaptation (UDA) tasks. Analysis also shows the MMD between the training and validation sets to be strongly rank-correlated ($\rho=0.63$) with test domain accuracy, further substantiating the validity of this approach.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# クリティカルラーニング期間: 効率的なデータ処理のための早期トレーニングダイナミクスを活用する

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning ( http://arxiv.org/abs/2405.19462v1 )

ライセンス: Link先を確認

Everlyn Asiko Chimoto, Jay Gala, Orevaoghene Ahia, Julia Kreutzer, Bruce A. Bassett, Sara Hooker,

(参考訳) ニューラルマシン翻訳モデルは、非常にデータと計算能力が高い。しかし、全てのデータポイントがモデルトレーニングと一般化に等しく寄与するわけではない。低値のデータポイントを取り除くためのデータプルーニングは、モデルの性能を大幅に低下させることなく、計算予算を大幅に削減する利点がある。本稿では、初期モデルトレーニングのダイナミクスを活用して、モデル性能の最も関連性の高いデータポイントを識別する新しいデータプルーニング手法であるチェックポイントアクロスタイム(CAT)を提案する。我々は、COMET-QE、LASER、LaBSEなど、いくつかのデータプルーニング技術に対してCATをベンチマークする。 CAT は Indo-European 言語のベンチマークを複数のテストセットで上回ります。英語-ドイツ語、英語-フランス語、英語-スワヒリの翻訳タスクに適用すると、CATはトレーニングデータの最大50%をプルーニングしながら、完全なデータセットを使用するのに匹敵するパフォーマンスを達成する。我々は、CATが選択したデータポイントを検査し、それよりも長い文や、ユニークな単語や稀な単語が好まれる傾向にあることを示す。

Neural Machine Translation models are extremely data and compute-hungry. However, not all data points contribute equally to model training and generalization. Data pruning to remove the low-value data points has the benefit of drastically reducing the compute budget without significant drop in model performance. In this paper, we propose a new data pruning technique: Checkpoints Across Time (CAT), that leverages early model training dynamics to identify the most relevant data points for model performance. We benchmark CAT against several data pruning techniques including COMET-QE, LASER and LaBSE. We find that CAT outperforms the benchmarks on Indo-European languages on multiple test sets. When applied to English-German, English-French and English-Swahili translation tasks, CAT achieves comparable performance to using the full dataset, while pruning up to 50% of training data. We inspect the data points that CAT selects and find that it tends to favour longer sentences and sentences with unique or rare words.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# ストリームデータを用いた楽器可変回帰の確率的最適化アルゴリズム

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data ( http://arxiv.org/abs/2405.19463v1 )

ライセンス: Link先を確認

Xuxing Chen, Abhishek Roy, Yifan Hu, Krishnakumar Balasubramanian,

(参考訳) 本研究では,条件付き確率最適化問題として問題を見極め,インストゥルメンタル変数回帰のためのアルゴリズムを開発し,解析する。最小二乗変数回帰の文脈では、我々のアルゴリズムは行列逆転やミニバッチを必要とせず、ストリーミングデータを用いて機器変数回帰を行うための完全なオンラインアプローチを提供する。真のモデルが線型であれば、次数 $\mathcal{O}(\log T/T)$ と $\mathcal{O}(1/T^{1-\iota})$ がそれぞれ$\iota>0$ となる。重要なことは、2サンプルのオラクルが利用可能である場合、我々は共同創設者と機器変数の関係を明示的にモデル化し、推定することを避け、この問題をミニマックス最適化問題として再定義することに基づく最近の研究に対するアプローチの利点を実証する。理論的結果を裏付ける数値実験が提供される。

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$, respectively under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between confounder and the instrumental variables, demonstrating the benefit of the proposed approach over recent works based on reformulating the problem as minimax optimization problems. Numerical experiments are provided to corroborate the theoretical results.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# スマートシティディジタル双生児のための生成AIを活用する:データ、シナリオ、3D都市モデル、都市デザインの自律的生成に関する調査

Leveraging Generative AI for Smart City Digital Twins: A Survey on the Autonomous Generation of Data, Scenarios, 3D City Models, and Urban Designs ( http://arxiv.org/abs/2405.19464v1 )

ライセンス: Link先を確認

Haowen Xu, Femi Omitaomu, Soheil Sabri, Xiao Li, Yongze Song,

(参考訳) 先進的な情報、コミュニケーション、およびコンピューティング技術を統合することで、現代の都市のデジタルトランスフォーメーションは、効率的で持続可能な都市管理のためのデータ駆動型スマートシティアプリケーションの時代を象徴している。それらの効果にもかかわらず、これらのアプリケーションは、異なる都市サブシステムを監視し、特徴付けるために、大量の高次元およびマルチドメインデータに頼り、データ品質と可用性によって制限されたアプリケーション領域における課題を示し、また、都市シナリオの生成や代替案の設計にコストがかかる。ディープラーニングの新たな研究領域として、生成人工知能(AI)モデルは、データとコード生成における独自の価値を実証している。本稿では, 交通・移動管理, エネルギーシステム運用, 建築・インフラ管理, 都市デザインなど, 都市部におけるスマートシティの領域における課題に対処するために, 生成型AI技術と都市デジタルツインの革新的な統合を検討することを目的とする。調査は、一般的な生成AIモデルとその応用分野の導入から始まり、続いて、生成AI技術の自律的能力を活用した既存の都市科学応用の構造化されたレビューで始まった。 (a)都市モニタリングと予測分析を促進するためのデータ拡張 b) 合成データ及びシナリオ生成 (c)自動3D都市モデリング、及び (d)都市デザインと最適化の創出。このレビューに基づいて、スマートシティのより信頼性が高く、スケーラブルで、自動化された管理のために、生成可能なAIモデルを次世代の都市デジタルツインに統合する潜在的な機会と技術的戦略について論じる。

The digital transformation of modern cities by integrating advanced information, communication, and computing technologies has marked the epoch of data-driven smart city applications for efficient and sustainable urban management. Despite their effectiveness, these applications often rely on massive amounts of high-dimensional and multi-domain data for monitoring and characterizing different urban sub-systems, presenting challenges in application areas that are limited by data quality and availability, as well as costly efforts for generating urban scenarios and design alternatives. As an emerging research area in deep learning, Generative Artificial Intelligence (AI) models have demonstrated their unique values in data and code generation. This survey paper aims to explore the innovative integration of generative AI techniques and urban digital twins to address challenges in the realm of smart cities in various urban sectors, such as transportation and mobility management, energy system operations, building and infrastructure management, and urban design. The survey starts with the introduction of popular generative AI models with their application areas, followed by a structured review of the existing urban science applications that leverage the autonomous capability of the generative AI techniques to facilitate (a) data augmentation for promoting urban monitoring and predictive analytics, (b) synthetic data and scenario generation, (c) automated 3D city modeling, and (d) generative urban design and optimization. Based on the review, this survey discusses potential opportunities and technical strategies that integrate generative AI models into the next-generation urban digital twins for more reliable, scalable, and automated management of smart cities.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# RAP: Sparse-and-Correlated Adapterを用いた効率的なテキストビデオ検索

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter ( http://arxiv.org/abs/2405.19465v1 )

ライセンス: Link先を確認

Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li,

(参考訳) Text-Video Retrieval (TVR)は、関連するビデオコンテンツと自然言語クエリを連携させることを目的としている。現在までに、ほとんどの最先端のTVR手法は、大規模な事前学習された視覚言語モデル(例えばCLIP)に基づいて、画像からビデオへの変換学習を学習している。しかし、TVR用にトレーニング済みのモデルを完全に微調整することは、非常に高価な計算コストを発生させる。そこで本研究では,テキスト・ビデオ検索の高速化を図るため,テキスト・ビデオ検索の手法をRAP (sparse-andcorrelated AdaPter) を用いて提案する。テキスト・ビデオのシナリオに適合するため,RAPには時間的間隔と相関性という2つの欠かせない特徴が備わっている。具体的には,凍結したCLIPバックボーンから画像毎の特徴を改良する低ランク変調モジュールを提案する。さらに、まずトップレスポンシブな視覚パッチを選択し、学習可能な時間とパッチのオフセットによる相関モデリングを強化する非同期な自己認識機構を導入する。 4つのTVRデータセットに対する大規模な実験により、RAPは完全な微調整や他のパラメータ効率の良い微調整方法と比較して、優れた、または同等のパフォーマンスを達成することが示された。

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning based on large-scale pre-trained visionlanguage models (e.g., CLIP). However, fully fine-tuning these pre-trained models for TVR incurs prohibitively expensive computation costs. To this end, we propose to conduct efficient text-video Retrieval with a sparse-andcorrelated AdaPter (RAP), i.e., fine-tuning the pre-trained model with a few parameterized layers. To accommodate the text-video scenario, we equip our RAP with two indispensable characteristics: temporal sparsity and correlation. Specifically, we propose a low-rank modulation module to refine the per-image features from the frozen CLIP backbone, which accentuates salient frames within the video features while alleviating temporal redundancy. Besides, we introduce an asynchronous self-attention mechanism that first selects the top responsive visual patches and augments the correlation modeling between them with learnable temporal and patch offsets. Extensive experiments on four TVR datasets demonstrate that RAP achieves superior or comparable performance compared to the fully fine-tuned counterpart and other parameter-efficient fine-tuning methods.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 自己回帰生成による後方サンプリング

Posterior Sampling via Autoregressive Generation ( http://arxiv.org/abs/2405.19466v1 )

ライセンス: Link先を確認

Kelly W Zhang, Tiffany, Cai, Hongseok Namkoong, Daniel Russo,

(参考訳) 知的エージェントは不確実性を理解し、それを解決するために積極的に情報を集める必要がある。本稿では,大規模な履歴データから帯域幅アルゴリズムを学習するための新しいフレームワークを提案する。まず、過去のデータを用いて自己回帰モデルを事前訓練し、繰り返しフィードバック/リワードの順序を予測する(例えば、時間とともに異なるユーザに対して表示されるニュース記事に対する応答)。正確な予測を行うために、モデルは、リッチなアクション特徴(例:記事の見出し)と、より多くの報酬が集められるにつれて信念を研ぐ方法(例:各記事が推奨されるようにクリックする)に基づいて、暗黙的に情報事前を学習する。意思決定時には、各アクションに対して想像された報酬の列を自動で(インプット)サンプリングし、最大平均的な報酬でアクションを選択する。ヒューリスティックとは程遠いが、我々のアプローチはトンプソンサンプリング(学習前の学習)の実装であり、注目すべき活発な探索アルゴリズムである。我々は,事前学習の損失がオンライン意思決定性能を直接制御できることを証明し,事前学習された言語モデルのエンドツーエンド微調整を統合してニュース記事の見出しテキストを処理し,パフォーマンスを向上させるニューズレコメンデーションタスクにおいて,我々のフレームワークを実証する。

Real-world decision-making requires grappling with a perpetual lack of data as environments change; intelligent agents must comprehend uncertainty and actively gather information to resolve it. We propose a new framework for learning bandit algorithms from massive historical data, which we demonstrate in a cold-start recommendation problem. First, we use historical data to pretrain an autoregressive model to predict a sequence of repeated feedback/rewards (e.g., responses to news articles shown to different users over time). In learning to make accurate predictions, the model implicitly learns an informed prior based on rich action features (e.g., article headlines) and how to sharpen beliefs as more rewards are gathered (e.g., clicks as each article is recommended). At decision-time, we autoregressively sample (impute) an imagined sequence of rewards for each action, and choose the action with the largest average imputed reward. Far from a heuristic, our approach is an implementation of Thompson sampling (with a learned prior), a prominent active exploration algorithm. We prove our pretraining loss directly controls online decision-making performance, and we demonstrate our framework on a news recommendation task where we integrate end-to-end fine-tuning of a pretrained language model to process news article headline text to improve performance.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 機械学習におけるデータ最小化原理

The Data Minimization Principle in Machine Learning ( http://arxiv.org/abs/2405.19471v1 )

ライセンス: Link先を確認

Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto,

(参考訳) データ最小化の原則は、不正使用、不正アクセス、データ漏洩の可能性を最小化するために収集、処理、保持されたデータの量を減らすことを目的としている。プライバシ・バイ・デザインの原則に則って、データ最小化はさまざまなグローバルなデータ保護規制によって支持されている。しかし、厳密な定式化が欠如しているため、その実践的な実装は依然として課題である。本稿では,このギャップに対処し,その法的定義に基づくデータ最小化のための最適化フレームワークを提案する。その後、データ最小化の実行にいくつかの最適化アルゴリズムを適用し、最小化目標への準拠やユーザのプライバシへの影響の観点から包括的な評価を行う。我々の分析は、データ最小化のプライバシー期待と実際のプライバシー利益のミスマッチを強調し、現実のプライバシーリスクの複数の側面を考慮に入れたアプローチの必要性を強調している。

The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for approaches that account for multiple facets of real-world privacy risks.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 基礎モデルの時代への参加

Participation in the age of foundation models ( http://arxiv.org/abs/2405.19479v1 )

ライセンス: Link先を確認

Harini Suresh, Emily Tseng, Meg Young, Mary L. Gray, Emma Pierson, Karen Levy,

(参考訳) ファンデーションモデルの能力に対する関心と投資の高まりは、そのようなシステムを幅広い公共サービスに影響を与えるように位置づけてきた。これらの機会の他に、これらのシステムが既存の権力不均衡を緩和し、疎外化コミュニティに不均衡な害をもたらすリスクがある。参加型アプローチは、利害関係者に代理店や意思決定権限を貸すことを約束する。しかし、参加型AI/MLにおける既存のアプローチは、一般的にコンテキストに深く根ざしている。私たちの論文はこの質問を尋問する。まず、ファンデーションモデルへの参加を取り入れる既存の試みについて検討する。参加者と規模の間の緊張を強調し、インパクトのあるコミュニティが、普遍的な適用を意図した基盤モデルを有意義に形成することは困難であることを示す。これに対し、我々は、より局所的でアプリケーション指向の、有意義な参加機会を特定する、参加型ファンデーションモデルのための青写真を開発する。基盤」レイヤに加えて、我々のフレームワークでは、ステークホルダーが基盤となるドメインに対して共通の技術基盤、規範、ガバナンスを開発する「下層」層と、影響のあるコミュニティが特定の下流タスクに基盤モデルを使用することを形作る「下層」層を提案しています。中間の「下層」層は、検討すべき潜在的な害の範囲を包含し、議論と介入のためのより具体的な道筋をコミュニティに与えている。同時に、関連するユースケースにまたがって入力をスケールすることで、重複を回避する。臨床医療,金融サービス,ジャーナリズムの3つのケーススタディを通じて,この多層モデルが基盤層にのみ介入するよりも,より有意義な参加機会を生み出すかを説明する。

Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the "foundation" layer, our framework proposes the "subfloor'' layer, in which stakeholders develop shared technical infrastructure, norms and governance for a grounded domain, and the "surface'' layer, in which affected communities shape the use of a foundation model for a specific downstream task. The intermediate "subfloor'' layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# テンソルSVDのための量子アルゴリズム

Quantum Algorithms for tensor-SVD ( http://arxiv.org/abs/2405.19485v1 )

ライセンス: Link先を確認

Jezer Jojo, Ankit Khandelwal, M Girish Chandra,

(参考訳) 量子コンピューティングへの応用の有望な領域は線形代数問題である。本研究では,2つの新しい量子t-SVD (tensor-SVD)アルゴリズムを提案する。最初のアルゴリズムは、主にコンテキスト認識レコメンデーションシステムのための量子t-SVDアルゴリズムを提案した以前の研究に基づいている。しかし、新しいアルゴリズムは、元の欠点に対処し、修正しようとしており、既存の作業と根本的に異なるアプローチである。提案する2番目のアルゴリズムは、既知の変分量子SVDアルゴリズムに基づくハイブリッド変分法を用いる。

A promising area of applications for quantum computing is in linear algebra problems. In this work, we introduce two new quantum t-SVD (tensor-SVD) algorithms. The first algorithm is largely based on previous work that proposed a quantum t-SVD algorithm for context-aware recommendation systems. The new algorithm however seeks to address and fix certain drawbacks to the original, and is fundamentally different in its approach compared to the existing work. The second algorithm proposed uses a hybrid variational approach largely based on a known variational quantum SVD algorithm.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 大規模データのためのオンライン非パラメトリック教師付き学習

Online Nonparametric Supervised Learning for Massive Data ( http://arxiv.org/abs/2405.19486v1 )

ライセンス: Link先を確認

Mohamed Chaouch, Omama M. Al-Hamed,

(参考訳) 単純さ、計算コストの低さ、データ要求の面での利点にもかかわらず、線形判別分析、二次判別分析、ロジスティック回帰のようなパラメトリック機械学習アルゴリズムは、線形性、通常課される正規分布と高次元性に対する特徴の整合性、といった深刻な欠点に悩まされている。特徴制約の線形性と正規性を克服するバッチカーネルベースの非パラメトリック分類器は、教師付き分類問題の興味深い代替手段である。しかし、それは『次元の計算』に苦しめられている。この問題は、ビッグデータ時代の爆発的なサンプルサイズによって緩和できるが、大規模データサイズはデータの保存と分類器の計算にいくつかの課題をもたらす。これらの課題により、古典的な非パラメトリック分類器はもはや適用されない。これにより,非パラメトリック分類器の大規模化とストリーミングデータフレームワークのリアルタイム計算に適応した高速なアルゴリズムを開発することができる。このオンライン分類器は2つのステップを含む。まず、計算コストを非常に低く抑えるために、オンラインの原理成分分析について検討する。そして、確率近似アルゴリズムを適用し、非パラメトリック分類器のリアルタイム計算を得る。提案手法は、リアルタイムな胎児の健康モニタリングによく使用される機械学習アルゴリズムと比較して評価・比較する。研究によると、オフライン(またはバッチ)だけでなく、オンライン分類器もランダムな森林アルゴリズムと競合する。さらに、オンライン分類器はオフライン分類器と比較して、最良のトレードオフ精度/計算コストを与えることを示した。

Despite their benefits in terms of simplicity, low computational cost and data requirement, parametric machine learning algorithms, such as linear discriminant analysis, quadratic discriminant analysis or logistic regression, suffer from serious drawbacks including linearity, poor fit of features to the usually imposed normal distribution and high dimensionality. Batch kernel-based nonparametric classifier, which overcomes the linearity and normality of features constraints, represent an interesting alternative for supervised classification problem. However, it suffers from the ``curse of dimension". The problem can be alleviated by the explosive sample size in the era of big data, while large-scale data size presents some challenges in the storage of data and the calculation of the classifier. These challenges make the classical batch nonparametric classifier no longer applicable. This motivates us to develop a fast algorithm adapted to the real-time calculation of the nonparametric classifier in massive as well as streaming data frameworks. This online classifier includes two steps. First, we consider an online principle components analysis to reduce the dimension of the features with a very low computation cost. Then, a stochastic approximation algorithm is deployed to obtain a real-time calculation of the nonparametric classifier. The proposed methods are evaluated and compared to some commonly used machine learning algorithms for real-time fetal well-being monitoring. The study revealed that, in terms of accuracy, the offline (or Batch), as well as, the online classifiers are good competitors to the random forest algorithm. Moreover, we show that the online classifier gives the best trade-off accuracy/computation cost compared to the offline classifier.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 大規模言語モデルに基づく全二重音声対話方式

A Full-duplex Speech Dialogue Scheme Based On Large Language Models ( http://arxiv.org/abs/2405.19487v1 )

ライセンス: Link先を確認

Peng Wang, Songshuo Lu, Yaohua Tang, Sijie Yan, Yuanjun Xiong, Wei Xia,

(参考訳) 本稿では,シームレスな対話が可能な生成対話システムについて述べる。これは、知覚モジュール、運動関数モジュール、および2つの状態を持つ単純な有限状態マシン(ニューラルFSMと呼ばれる)の概念を認識するために慎重に整列された大きな言語モデル(LLM)に基づいている。知覚機能モジュールと運動機能モジュールは同時に動作し、システムは同時にユーザの声を聴くことができる。 LLMは、問い合わせ応答のためのテキストトークンを生成し、神経FSMに制御トークンを出力することにより、応答、待機、または中断を開始するための自律的な決定を行う。 LLMのこれらのタスクはすべて、リアルタイムに対話のシリアライズされたビュー上で次のトークン予測として実行される。実生活のインタラクションをシミュレーションした自動品質評価では,LLMベースの半二重対話システムと比較して,平均会話応答遅延を3倍以上に削減し,500ミリ秒以内の応答を50%以上と評価した。 LLMをわずか80億のパラメータで実行すると、音声による対話において最も有効な商用LLMよりも8%高い割り込み精度を示す。

We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the concept of a simple finite state machine (called neural FSM) with two states. The perception and motor function modules operate simultaneously, allowing the system to simultaneously speak and listen to the user. The LLM generates textual tokens for inquiry responses and makes autonomous decisions to start responding to, wait for, or interrupt the user by emitting control tokens to the neural FSM. All these tasks of the LLM are carried out as next token prediction on a serialized view of the dialogue in real-time. In automatic quality evaluations simulating real-life interaction, the proposed system reduces the average conversation response latency by more than 3 folds compared with LLM-based half-duplex dialogue systems while responding within less than 500 milliseconds in more than 50% of evaluated interactions. Running a LLM with only 8 billion parameters, our system exhibits a 8% higher interruption precision rate than the best available commercial LLM for voice-based dialogue.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# Total Segmentator MRI : MRI画像における59個の解剖学的構造の配列非依存的セグメンテーション

TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images ( http://arxiv.org/abs/2405.19492v1 )

ライセンス: Link先を確認

Tugba Akinci D'Antonoli, Lucas K. Berger, Ashraya K. Indrakanti, Nathan Vishwanathan, Jakob Weiß, Matthias Jung, Zeynep Berkarda, Alexander Rau, Marco Reisert, Thomas Küstner, Alexandra Walter, Elmar M. Merkle, Martin Segeroth, Joshy Cyriac, Shan Yang, Jakob Wasserthal,

(参考訳) 目的:MR画像のほとんどの解剖学的構造をMRシーケンスから独立して自動的かつ堅牢に分割できる,オープンソースで使いやすいセグメンテーションモデルを開発すること。材料と方法:本研究では,TotalSegmentatorをMR画像に拡張した。 298個のMRIスキャンと227個のCTスキャンを用いて59個の解剖学的構造(20の臓器,18の骨,11の筋肉,7の血管,3の組織型)を分類した。 MRとCTの画像は、通常の臨床研究からランダムにサンプリングされ、現実世界のデータセット(年齢、病理、スキャナー、身体部分、シーケンス、コントラスト、エコー時間、反復時間、フィールド強度、スライス厚さ、サイト)を表す。我々は,このデータセット上でnnU-Netセグメンテーションアルゴリズムを訓練し,モデルの性能を評価するためにDice類似度係数(Dice)を算出した。結果: Dice スコアは 0.824 (CI: 0.801, 0.842) であった。このモデルは、他の2つの公開セグメンテーションモデル(Dice score, 0.824 vs 0.762; p<0.001 and 0.762 versus 0.542; p<0.001)を大きく上回った。 The CT image test set of the original TotalSegmentator paper, almost match the performance of the original TotalSegmentator (Dice score, 0.960 versus 0.970; p<0.001)。結論:本モデルでは,TotalSegmentatorをMR画像に拡張する。アノテーション付きデータセット(https://zenodo.org/doi/10.5281/zenodo.11367004)とオープンソースツールキット(https://www.github.com/wasserth/TotalSegmentator)が公開されている。

Purpose: To develop an open-source and easy-to-use segmentation model that can automatically and robustly segment most major anatomical structures in MR images independently of the MR sequence. Materials and Methods: In this study we extended the capabilities of TotalSegmentator to MR images. 298 MR scans and 227 CT scans were used to segment 59 anatomical structures (20 organs, 18 bones, 11 muscles, 7 vessels, 3 tissue types) relevant for use cases such as organ volumetry, disease characterization, and surgical planning. The MR and CT images were randomly sampled from routine clinical studies and thus represent a real-world dataset (different ages, pathologies, scanners, body parts, sequences, contrasts, echo times, repetition times, field strengths, slice thicknesses and sites). We trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients (Dice) to evaluate the model's performance. Results: The model showed a Dice score of 0.824 (CI: 0.801, 0.842) on the test set, which included a wide range of clinical data with major pathologies. The model significantly outperformed two other publicly available segmentation models (Dice score, 0.824 versus 0.762; p<0.001 and 0.762 versus 0.542; p<0.001). On the CT image test set of the original TotalSegmentator paper it almost matches the performance of the original TotalSegmentator (Dice score, 0.960 versus 0.970; p<0.001). Conclusion: Our proposed model extends the capabilities of TotalSegmentator to MR images. The annotated dataset (https://zenodo.org/doi/10.5281/zenodo.11367004) and open-source toolkit (https://www.github.com/wasserth/TotalSegmentator) are publicly available.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 後方刺激ブリルアン散乱による光学的絡み合い

Optomechanical entanglement induced by backward stimulated Brillouin scattering ( http://arxiv.org/abs/2405.19494v1 )

ライセンス: Link先を確認

P. Djorwé, A. H. Abdel-Aty, K. S. Nisar, S. G. Nana Engo,

(参考訳) 本稿では,ロバストな光学的絡み合いを生成する手法を提案する。このスキームは、光学構造内にホストされる後方刺激ブリルアン散乱(BSBS)プロセスに基づいている。我々のベンチマークシステムは、BSBS(放射圧)プロセスを介して2つの光学モードに結合された音響(機械)モードで構成されている。有効機械結合の適度な値に対して、BSBSは比較的弱い絡み合いを誘導する。この絡み合いは、機械的結合強度が十分に強い場合、少なくとも1桁まで大きく強化される。生成した絡み合いは熱ゆらぎに対して十分に堅牢である。我々の研究は、BSBS効果に基づくエンタングルメント生成の新しいスキームを提供し、マイクロ波やハイブリッド光学構造に拡張することができる。このような生成された絡み合った状態は、量子情報処理、量子センシング、量子コンピューティングに利用できる。

We propose a scheme to generate robust optomechanical entanglement. This scheme is based on a Backward Stimulated Brillouin Scattering (BSBS) process, which is hosted within an optomechanical structure. Our benchmark system consists of an acoustic (mechanical) mode coupled to two optical modes through the BSBS (radiation pressure) process. For a moderate values of the effective mechanical coupling, the BSBS induces a relatively weak entanglement. This entanglement is greatly enhanced, for at least up to one order of magnitude, when the mechanical coupling strength is strong enough. The generated entanglement is robust enough against thermal fluctuation. Our work provides a new scheme for entanglement generation based on BSBS effect, and can be extended to microwaves and hybrid optomechanical structures. Such a generated entangled states can be used for quantum information processing, quantum sensing, and quantum computing.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# Qiskit Code Assistant: 量子コンピューティングコードを生成するためのLLMのトレーニング

Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code ( http://arxiv.org/abs/2405.19495v1 )

ライセンス: Link先を確認

Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito,

(参考訳) Code Large Language Models (Code LLMs) は強力なツールとして登場し、コーディングプロセスの自動化とアプリケーション構築に必要な時間と労力の削減によって、ソフトウェア開発の世界に革命をもたらした。本稿では,量子コンピューティングの分野を専門とする Code LLM のトレーニングに焦点をあてる。まず、古典的なプログラミング手法や言語とは大きく異なる量子コンピューティングプログラミングのユニークなニーズについて議論する。量子コンピューティングに特化したコードLLMは、量子コンピューティングと量子情報理論の基本的な理解を必要とする。しかし、利用可能な量子コードサンプルの不足と、継続的なデータセット更新を必要とする急速に進化するフィールドは、重大な課題を呈している。さらに,Qiskitライブラリを用いた高品質な量子コードを生成するために,コードLLMをトレーニングする作業についても論じる。本研究は, 訓練用LLMの様々な側面と訓練条件, および現在のモデルで得られた結果について検討することを含む。我々のモデルを評価するために、我々は、Qiskitを用いた量子コンピューティングプログラミングの分野に特化して設計された一連のテストを含むHumanEvalに似たカスタムベンチマークを開発した。以上の結果から,我々のモデルは量子コンピューティングタスクにおける既存の最先端モデルよりも優れていたことが示唆された。また、コード提案の例を示し、私たちのモデルを他の関連するコードLLMと比較します。最後に、量子コンピューティングの計算科学者、研究者、実践者に対して、Code LLMsの潜在的なメリットについて論じる。この状況に関係のあるさまざまな機能や今後の作業についても検討しています。

Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum computing and quantum information theory. However, the scarcity of available quantum code examples and the rapidly evolving field, which necessitates continuous dataset updates, present significant challenges. Moreover, we discuss our work on training Code LLMs to produce high-quality quantum code using the Qiskit library. This work includes an examination of the various aspects of the LLMs used for training and the specific training conditions, as well as the results obtained with our current models. To evaluate our models, we have developed a custom benchmark, similar to HumanEval, which includes a set of tests specifically designed for the field of quantum computing programming using Qiskit. Our findings indicate that our model outperforms existing state-of-the-art models in quantum computing tasks. We also provide examples of code suggestions, comparing our model to other relevant code LLMs. Finally, we introduce a discussion on the potential benefits of Code LLMs for quantum computing computational scientists, researchers, and practitioners. We also explore various features and future work that could be relevant in this context.

翻訳日:2024-05-31 19:16:17 公開日:2024-05-29

# 未ペアデータを用いた音声ドメイン転送のためのガウス流橋

Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data ( http://arxiv.org/abs/2405.19497v1 )

ライセンス: Link先を確認

Eloi Moliner, Sebastian Braun, Hannes Gamper,

(参考訳) オーディオドメイン転送(Audio domain transfer)とは、元のコンテンツを保持しながら、異なるドメインの特性に合わせて音声信号を変更するプロセスである。本稿では,生成モデルにおけるガウス流橋の可能性について検討する。提案フレームワークは,2つの決定論的確率フローの一連の実装を通じて,音声信号の異なる分布間の伝達問題に対処する。提案フレームワークは,対象領域の特定の側面を定義する連続制御変数を通じて,対象分布特性の操作を容易にする。特に、このアプローチはペアの例をトレーニングに頼ってはいません。音声内容の一貫性を維持する上での課題に対処するため,チャンクをベースとしたデータサンプルとノイズの最適輸送結合を含むトレーニング戦略を推奨する。教師なし手法と確立されたベースラインを比較すると,残響や歪み操作のタスクにおいて,競争性能が向上することがわかった。この研究で得られた興味深い結果は、さらなる探査の可能性を浮き彫りにしている。

Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for this problem. The presented framework addresses the transport problem across different distributions of audio signals through the implementation of a series of two deterministic probability flows. The proposed framework facilitates manipulation of the target distribution properties through a continuous control variable, which defines a certain aspect of the target domain. Notably, this approach does not rely on paired examples for training. To address identified challenges on maintaining the speech content consistent, we recommend a training strategy that incorporates chunk-based minibatch Optimal Transport couplings of data samples and noise. Comparing our unsupervised method with established baselines, we find competitive performance in tasks of reverberation and distortion manipulation. Despite encoutering limitations, the intriguing results obtained in this study underscore potential for further exploration.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 機械心理学:人工汎用知能研究の促進のための非公理推論システムと操作条件の統合

Machine Psychology: Integrating Operant Conditioning with the Non-Axiomatic Reasoning System for Advancing Artificial General Intelligence Research ( http://arxiv.org/abs/2405.19498v1 )

ライセンス: Link先を確認

Robert Johansson,

(参考訳) 本稿では,機械心理学という学際的枠組みを導入し,操作的学習心理学から特定の人工知能モデル,非公理推論システム(NARS)を融合させ,人工知能(AGI)の研究を強化する。この枠組みの中核的な前提は、適応は生物学的および人工知能の両方にとって不可欠であり、操作的な条件付け原理によって理解できるということである。本研究は,OpenNARS for Applications (ONA) を用いた3つの操作的学習タスクを通して,本手法を評価する。単純な識別タスクでは、NARSは迅速な学習を示し、トレーニングとテストの段階で完全な精度を達成した。タスク条件が逆転した際の動作の調整に成功し、NARSの適応性を示した。条件付き識別タスクにおいて、NARSは複雑な学習シナリオを効果的に処理し、条件付きキューに基づいて複雑な仮説を形成し、活用することで高い精度を達成する。これらの知見は適応型AGIシステム構築のフレームワークとしてのオペラントコンディショニングの活用を支援する。 NARSの知識と資源不足の条件下での運用能力は、感覚運動の推論能力と相まって、AGIの堅牢なモデルとして確立されている。 Machine Psychologyフレームワークは、継続的学習やゴール駆動行動といった自然知性の要素を取り入れることで、現実世界のアプリケーションにスケーラブルで柔軟なアプローチを提供する。今後の研究は、強化されたNARSシステム、より高度なタスクを使用して、このフレームワークを多種多様な複雑な課題に適用し、人間レベルのAIの開発をさらに進展させるべきである。

This paper introduces an interdisciplinary framework called Machine Psychology, which merges principles from operant learning psychology with a specific Artificial Intelligence model, the Non-Axiomatic Reasoning System (NARS), to enhance Artificial General Intelligence (AGI) research. The core premise of this framework is that adaptation is crucial to both biological and artificial intelligence and can be understood through operant conditioning principles. The study assesses this approach via three operant learning tasks using OpenNARS for Applications (ONA): simple discrimination, changing contingencies, and conditional discrimination tasks. In the simple discrimination task, NARS demonstrated rapid learning, achieving perfect accuracy during both training and testing phases. The changing contingencies task showcased NARS's adaptability, as it successfully adjusted its behavior when task conditions were reversed. In the conditional discrimination task, NARS handled complex learning scenarios effectively, achieving high accuracy by forming and utilizing intricate hypotheses based on conditional cues. These findings support the application of operant conditioning as a framework for creating adaptive AGI systems. NARS's ability to operate under conditions of insufficient knowledge and resources, coupled with its sensorimotor reasoning capabilities, establishes it as a robust model for AGI. The Machine Psychology framework, by incorporating elements of natural intelligence such as continuous learning and goal-driven behavior, offers a scalable and flexible approach for real-world applications. Future research should investigate using enhanced NARS systems, more advanced tasks, and applying this framework to diverse, complex challenges to further progress the development of human-level AI.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 勝利のためのモメンタム:異種環境における協調的強化学習

Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments ( http://arxiv.org/abs/2405.19499v1 )

ライセンス: Link先を確認

Han Wang, Sihong He, Zhili Zhang, Fei Miao, James Anderson,

(参考訳) 我々は、フェデレート強化学習(FRL)問題を探り、N$エージェントが共通の方針を、軌跡データを共有せずに共同で学習する。これまで、既存のFRL作業は、主に同じまたは‘類似’環境で動作するエージェントに焦点を当ててきた。対照的に、我々の問題設定は、任意に大きな環境不均一性を可能にします。完全に異なる環境における平均性能を最大化する最適ポリシーを得るために,FedSVRPG-MとFedHAPG-Mの2つのアルゴリズムを提案する。既存の結果とは対照的に, 運動量機構を利用するFedSVRPG-MとFedHAPG-Mは, 環境の不均一性に関わらず, 平均性能関数の定常点に正確に収束できることを実証した。さらに、分散還元法やヘッセン近似の利点を取り入れることで、両アルゴリズムは、$\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$のサンプル複雑性を特徴とする最先端の収束結果が得られる。特に,本アルゴリズムはエージェント数に関して線形収束の高速化を享受し,共通ポリシーの発見におけるエージェント間の協調のメリットを強調している。

We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# MDS-ViTNet:視覚変換器による視線追跡の精度予測の改善

MDS-ViTNet: Improving saliency prediction for Eye-Tracking with Vision Transformer ( http://arxiv.org/abs/2405.19501v1 )

ライセンス: Link先を確認

Polezhaev Ignat, Goncharenko Igor, Iurina Natalya,

(参考訳) 本稿では、視覚的サリエンシ予測や視線追跡を改善するため、MDS-ViTNet(Multi Decoder Saliency by Vision Transformer Network)と呼ばれる新しい手法を提案する。このアプローチは、マーケティング、医療、ロボティクス、小売など、さまざまな分野において大きな可能性を秘めている。本稿では、従来のImageNetバックボーンを超えて、Vision Transformerを利用するネットワークアーキテクチャを提案する。フレームワークはエンコーダ-デコーダ構造を採用し、エンコーダはSwinトランスフォーマーを使用して最も重要な機能を効率的に埋め込む。このプロセスにはTransfer Learningメソッドが含まれており、Vision TransformerのレイヤはEncoder Transformerで変換され、CNN Decoderにシームレスに統合される。この手法は、元の入力画像からの情報損失を最小限に抑える。デコーダは2つの異なる注意マップを生成するためにデュアルデコーダを利用するマルチデコーダ技術を採用している。これらの写像はその後、追加のCNNモデルを介して特異出力に結合される。我々のトレーニングモデルMDS-ViTNetは、いくつかのベンチマークで最先端の結果を得る。さらなるコラボレーションを促進するために、コードやモデル、データセットを一般向けに公開するつもりです。

In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction or eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed most important features. This process involves a Transfer Learning method, wherein layers from the Vision Transformer are converted by the Encoder Transformer and seamlessly integrated into a CNN Decoder. This methodology ensures minimal information loss from the original input image. The decoder employs a multi-decoding technique, utilizing dual decoders to generate two distinct attention maps. These maps are subsequently combined into a singular output via an additional CNN model. Our trained model MDS-ViTNet achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets accessible to the public.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 低温における2000サイト光ツイーザアレイにおける単一原子の再配置

Rearrangement of single atoms in a 2000-site optical tweezers array at cryogenic temperatures ( http://arxiv.org/abs/2405.19503v1 )

ライセンス: Link先を確認

Grégoire Pichard, Desiree Lim, Etienne Bloch, Julien Vaneecloo, Lilian Bourachot, Gert-Jan Both, Guillaume Mériaux, Sylvain Dutartre, Richard Hostein, Julien Paris, Bruno Ximenez, Adrien Signoles, Antoine Browaeys, Thierry Lahaye, Davide Dreon,

(参考訳) 我々は6Kの低温環境下で2088箇所までの光学的ツイーザからなる1つのルビジウム原子のトラップについて報告する。我々のアプローチは、真空だが室温の顕微鏡目的と、捕捉された原子の低温環境を確保するために目的が突き出ている窓のない熱シールドとの併用に依拠する。効率的なトラップを行うのに十分な光学パワーを達成するために、我々は2つのレーザーをわずかに異なる波長で組み合わせる。設計の性能と限界について論じる。最後に、フィールドプログラマブルゲートアレイによって制御される移動光学式ツイーザを用いた828原子ターゲットアレイの原子間アレンジメントを実証する。

We report on the trapping of single rubidium atoms in large arrays of optical tweezers comprising up to 2088 sites in a cryogenic environment at 6 K. Our approach relies on the use of microscope objectives that are in-vacuum but at room temperature, in combination with windowless thermal shields into which the objectives are protruding to ensure a cryogenic environment for the trapped atoms. To achieve enough optical power for efficient trapping, we combine two lasers at slightly different wavelengths. We discuss the performance and limitations of our design. Finally, we demonstrate atom-by-atom rearrangement of an 828-atom target array using moving optical tweezers controlled by a field-programmable gate array.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 任意遅延を伴う時間変数ネットワークにおける分散最適化

Decentralized Optimization in Time-Varying Networks with Arbitrary Delays ( http://arxiv.org/abs/2405.19513v1 )

ライセンス: Link先を確認

Tomas Ortega, Hamid Jafarkhani,

(参考訳) 通信遅延によるネットワークの分散最適化問題を考察する。そのようなネットワークの例としては、協調機械学習、センサーネットワーク、マルチエージェントシステムなどがある。通信遅延を模倣するため、ネットワークに仮想非計算ノードを追加し、有向グラフを生成する。このことは、有向グラフ上の分散最適化ソリューションの調査を動機付けている。既存のソリューションでは、ノードはアウト学位を知っていると仮定し、適用性は制限される。この制限を克服するために、DT-GOと呼ばれる新しいゴシップベースのアルゴリズムを導入する。このアルゴリズムは一般的な有向ネットワーク、例えば遅延や限定的な認知能力を持つネットワークに適用できる。我々は凸目標と非凸目標の両方に対して収束率を導出し、このアルゴリズムが集中確率勾配Descentと同じ複雑さのオーダーを達成することを示す。言い換えれば、グラフ位相と遅延の影響は高次項に限られる。さらに、時間変化のネットワークトポロジに対応するために分析を拡張します。理論的知見を裏付ける数値シミュレーションが提案されている。

We consider a decentralized optimization problem for networks affected by communication delays. Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems. To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs. This motivates investigating decentralized optimization solutions on directed graphs. Existing solutions assume nodes know their out-degrees, resulting in limited applicability. To overcome this limitation, we introduce a novel gossip-based algorithm, called DT-GO, that does not need to know the out-degrees. The algorithm is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities. We derive convergence rates for both convex and non-convex objectives, showing that our algorithm achieves the same complexity order as centralized Stochastic Gradient Descent. In other words, the effects of the graph topology and delays are confined to higher-order terms. Additionally, we extend our analysis to accommodate time-varying network topologies. Numerical simulations are provided to support our theoretical findings.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 無線周波数での視覚認識の実現

Enabling Visual Recognition at Radio Frequency ( http://arxiv.org/abs/2405.19516v1 )

ライセンス: Link先を確認

Haowen Lai, Gaoxiang Luo, Yifei Liu, Mingmin Zhao,

(参考訳) 本稿では、光信号に挑戦する条件に対する耐性を提供しつつ、RF分解能をLiDARに近づける新しいRFイメージングシステムであるPanoRadarを紹介する。当社のLiDAR対応3D画像は, 表面正規推定, セマンティックセグメンテーション, オブジェクト検出など, 無線周波数での様々な視覚認識タスクを初めて実現した。 PanoRadarは、回転するシングルチップmmWaveレーダーと、新しい信号処理と機械学習アルゴリズムを組み合わせて、周囲の高解像度な3D画像を生成する。本システムはロボットの動きを正確に推定し,高密度の合成アンテナ網によるコヒーレントイメージングを可能にする。また、高い方位分解能を利用して、学習に基づく手法で高分解能を向上させる。さらに、PanoRadarは2D畳み込みによる3D学習に取り組み、RF信号のユニークな特性のために課題に対処する。以上の結果から,パノラダルの12棟の建物における堅牢な性能が示された。

This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 中距離を超える大気・海洋予測のためのハイブリッド・マシンラーニング/物理モデルの可能性を探る

Exploring the Potential of Hybrid Machine-Learning/Physics-Based Modeling for Atmospheric/Oceanic Prediction Beyond the Medium Range ( http://arxiv.org/abs/2405.19518v1 )

ライセンス: Link先を確認

Dhruvit Patel, Troy Arcomano, Brian Hunt, Istvan Szunyogh, Edward Ott,

(参考訳) 本稿では、機械学習(ML)と従来の物理モデルを組み合わせたハイブリッドモデリング手法の可能性について検討する。短距離・中距離気象予測のためのアプローチを検証したArcomano et al(2022年)と、気候モデリングの可能性を調査したArcomano et al(2023年)の作業を拡張した。論文の予測実験に使用するハイブリッドモデルは,低分解能で簡易なパラメータ化大気一般循環モデル(AGCM)SPEEDYに基づいている。 SPEEDYのハイブリッド化された予後変数に加えて、現在のモデルには3つのMLベースの予後変数がある。そのうちの1つは6~hの累積降水であり、もう1つは海面温度であり、もう1つは海深300mの層の熱量である。このモデルには、エルニーニョのサイクルと、季節によって3～7ヶ月の降水を伴う世界的なテレコネクションを予測する能力がある。このモデルはケルビン波やロスビー波、MJOに伴う降水の赤道変動を捉えている。赤道域の降水量の予測は東太平洋で15日、西太平洋で11.5日である。このモデルは空間分解能が低いが、これらのタスクには高解像度で純粋に物理ベースの従来の運用予測モデルに匹敵する予測技術がある。

This paper explores the potential of a hybrid modeling approach that combines machine learning (ML) with conventional physics-based modeling for weather prediction beyond the medium range. It extends the work of Arcomano et al. (2022), which tested the approach for short- and medium-range weather prediction, and the work of Arcomano et al. (2023), which investigated its potential for climate modeling. The hybrid model used for the forecast experiments of the paper is based on the low-resolution, simplified parameterization atmospheric general circulation model (AGCM) SPEEDY. In addition to the hybridized prognostic variables of SPEEDY, the current version of the model has three purely ML-based prognostic variables. One of these is 6~h cumulative precipitation, another is the sea surface temperature, while the third is the heat content of the top 300 m deep layer of the ocean. The model has skill in predicting the El Ni\~no cycle and its global teleconnections with precipitation for 3-7 months depending on the season. The model captures equatorial variability of the precipitation associated with Kelvin and Rossby waves and MJO. Predictions of the precipitation in the equatorial region have skill for 15 days in the East Pacific and 11.5 days in the West Pacific. Though the model has low spatial resolution, for these tasks it has prediction skill comparable to what has been published for high-resolution, purely physics-based, conventional operational forecast models.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 低リソース医療質問応答のための2層検索拡張フレームワーク:Redditデータを用いた概念実証

Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data ( http://arxiv.org/abs/2405.19519v1 )

ライセンス: Link先を確認

Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker,

(参考訳) Retrieval augmented generation(RAG)は、生成モデル出力を制限し、関連するインコンテキストテキストを提供することで幻覚の可能性を軽減する能力を提供する。生成的大言語モデル(LLM)は、文脈が有限であるため、トークンの数を組み込むことができるため、答えを生成するための知識の量を制限することができる。本稿では,クエリに着目した回答生成のための2層RAGフレームワークを提案する。評価は,資源制約設定における2層フレームワークの有効性を示し,研究者がユーザからリアルタイムに近いデータを得ることを可能にする。

Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for query-focused answer generation and evaluate a proof-of-concept for this framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. The evaluations demonstrate the effectiveness of the two-layer framework in resource constrained settings to enable researchers in obtaining near real-time data from users.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 困難を伴うクラウドソーシング:不均一項目に対するベイズ評価モデル

Crowdsourcing with Difficulty: A Bayesian Rating Model for Heterogeneous Items ( http://arxiv.org/abs/2405.19521v1 )

ライセンス: Link先を確認

Seong Woo Han, Ozan Adıgüzel, Bob Carpenter,

(参考訳) 応用統計学と機械学習では、訓練に使用される「金の標準」はしばしば偏りがあり、ほとんど常にうるさい。 DawidとSkeneの人気の高いクラウドソーシングモデルは、レーダ(コーダ、アノテータ)の感度と特異性を調整するが、トレーニングのために収集されたレーティングデータの分布特性を捉えることができず、それがトレーニングのバイアスとなる。本研究では,難易度,識別性,推測可能性に項目レベルの影響を加えることで,コンセンサスカテゴリを推測できる汎用的な測定エラーモデルを提案する。さらに、これらのモデルのバイモーダル後部を制限し、(必要であれば許容)敵のレーダを避ける方法を示す。このモデルが後方予測チェックに適合するかどうかを検証し, ベイジアンによる$\chi^2$検定の類似性を検証した。 Dawid と Skene のモデルは適合試験の良さによって無視されるが、アイテムの不均一性を調整する新しいモデルは拒否されない。我々は,2つのよく研究されたデータセット,歯科用X線撮影におけるケーリーのバイナリ・レーティング・データ,および自然言語による含意について述べる。

In applied statistics and machine learning, the "gold standards" used for training are often biased and almost always noisy. Dawid and Skene's justifiably popular crowdsourcing model adjusts for rater (coder, annotator) sensitivity and specificity, but fails to capture distributional properties of rating data gathered for training, which in turn biases training. In this study, we introduce a general purpose measurement-error model with which we can infer consensus categories by adding item-level effects for difficulty, discriminativeness, and guessability. We further show how to constrain the bimodal posterior of these models to avoid (or if necessary, allow) adversarial raters. We validate our model's goodness of fit with posterior predictive checks, the Bayesian analogue of $\chi^2$ tests. Dawid and Skene's model is rejected by goodness of fit tests, whereas our new model, which adjusts for item heterogeneity, is not rejected. We illustrate our new model with two well-studied data sets, binary rating data for caries in dental X-rays and implication in natural language.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# 人工知能インデックスレポート2024

Artificial Intelligence Index Report 2024 ( http://arxiv.org/abs/2405.19522v1 )

ライセンス: Link先を確認

Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Jack Clark,

(参考訳) 2024指数は今まででもっとも包括的な指標であり、AIが社会に影響を及ぼした影響がより顕著になることはなかった重要な瞬間に到達した。今年は、AIの技術的進歩、技術に対する大衆の認識、開発を取り巻く地政学のダイナミクスなど、重要なトレンドをより広範囲にカバーする範囲を広げました。このエディションでは、AIトレーニングコストに関する新たな見積、責任あるAIの状況に関する詳細な分析、AIが科学と医学に与える影響に関するまったく新しい章が導入されている。 AIインデックスは、人工知能(AI)に関連するデータを追跡、照合、蒸留、可視化する。私たちのミッションは、政策立案者、研究者、幹部、ジャーナリスト、一般大衆に対して、AIの複雑な分野に関するより徹底的で厳密な理解を深めるために、偏見のない、厳格に審査された、広範囲にソースされたデータを提供することです。 AIインデックスは、人工知能に関するデータと洞察の最も信頼性が高く権威のある情報源の1つとして、世界的に認識されている。ニューヨーク・タイムズ、ブルームバーグ、ガーディアンなどの主要新聞では、何百もの学術的引用を集めており、米国、英国、欧州連合の高レベルの政策立案者によって参照されている。今年のエディションは、サイズ、スケール、スコープのすべての旧版を上回り、AIが私たちの人生で持つ重要性が増していることを反映している。

The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI's impact on science and medicine. The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights on artificial intelligence. Previous editions have been cited in major newspapers, including the The New York Times, Bloomberg, and The Guardian, have amassed hundreds of academic citations, and been referenced by high-level policymakers in the United States, the United Kingdom, and the European Union, among other places. This year's edition surpasses all previous ones in size, scale, and scope, reflecting the growing significance that AI is coming to hold in all of our lives.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# ポイント・プロセス・ラーニングとTaccs-Fiksel 推定の特殊な場合の比較

Comparison of Point Process Learning and its special case Takacs-Fiksel estimation ( http://arxiv.org/abs/2405.19523v1 )

ライセンス: Link先を確認

Julia Jansson, Ottmar Cronie,

(参考訳) 最近、Cronie et al (2024)はポイントプロセスのクロスバリデーションの概念と、ポイントプロセス学習(PPL)と呼ばれる新しい統計方法論を導入した。 PPLでは、ポイントプロセス/パターンをトレーニングと検証セットに分割し、パラメトリドのパパンガルー条件強度によって後者を前者から予測する。モデルパラメータは点過程予測誤差を最小化することで推定され、この概念はPPLの2番目のビルディングブロックとして導入された。 PPLは、Gibsハードコアプロセスのカーネル強度推定とパラメータ推定の両方において、最先端技術よりも優れていることを示した。後者の場合、最先端技術は擬似的類似度推定によって表される。本稿では,PPLとTaccs-Fiksel推定の関係について検討する。本稿では, 特定の損失関数を持つPLPが, クロスバリデーション体制を離脱する傾向にある場合, 特定の損失関数を持つPLPをTakacs-Fiksel推定に漸近的に還元するという意味では, PPLの特別な場合であることを示す。さらに、PPLは重み関数によって与えられるある種のハイパーパラメータを伴い、予測誤差が期待値ゼロであることを保証する。重み関数は一般ギブスモデルに対して明示的だが難解な形式をとることを示す。そこで本研究では,実際の重量関数を推定するための異なる手法を提案する。一般のPPLセットアップが特殊ケースであるTakacs-Fiksel推定と比較してどのように動作するかを評価するため、一般的なGibsモデルでは損失関数やハイパーパラメータが得られ、PPLは平均二乗誤差でTakacs-Fiksel推定を著しく上回る。ここで、ハイパーパラメータは、クロスバリデーションパラメータと重み関数の推定値である。

Recently, Cronie et al. (2024) introduced the notion of cross-validation for point processes and a new statistical methodology called Point Process Learning (PPL). In PPL one splits a point process/pattern into a training and a validation set, and then predicts the latter from the former through a parametrised Papangelou conditional intensity. The model parameters are estimated by minimizing a point process prediction error; this notion was introduced as the second building block of PPL. It was shown that PPL outperforms the state-of-the-art in both kernel intensity estimation and estimation of the parameters of the Gibbs hard-core process. In the latter case, the state-of-the-art was represented by pseudolikelihood estimation. In this paper we study PPL in relation to Takacs-Fiksel estimation, of which pseudolikelihood is a special case. We show that Takacs-Fiksel estimation is a special case of PPL in the sense that PPL with a specific loss function asymptotically reduces to Takacs-Fiksel estimation if we let the cross-validation regime tend to leave-one-out cross-validation. Moreover, PPL involves a certain type of hyperparameter given by a weight function which ensures that the prediction errors have expectation zero if and only if we have the correct parametrisation. We show that the weight function takes an explicit but intractable form for general Gibbs models. Consequently, we propose different approaches to estimate the weight function in practice. In order to assess how the general PPL setup performs in relation to its special case Takacs-Fiksel estimation, we conduct a simulation study where we find that for common Gibbs models we can find loss functions and hyperparameters so that PPL typically outperforms Takacs-Fiksel estimation significantly in terms of mean square error. Here, the hyperparameters are the cross-validation parameters and the weight function estimate.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# AIリスクマネジメントは安全とセキュリティの両方を取り入れるべきである

AI Risk Management Should Incorporate Both Safety and Security ( http://arxiv.org/abs/2405.19524v1 )

ライセンス: Link先を確認

Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal,

(参考訳) 安全に整合した言語モデルにおけるセキュリティ脆弱性の暴露、例えば、敵攻撃に対する感受性は、AIの安全性とAIのセキュリティの間の複雑な相互作用に光を当てている。現在、2つの規律はAIリスク管理という大まかな目標の下にまとめられているが、それらは歴史的に別々に進化し、異なる視点を生み出している。そこで,本稿では,AIリスクマネジメントの利害関係者が,安全と安全の間のニュアンス,シナジー,相互作用を意識し,主に効果的で全体論的リスク軽減アプローチを考案するために,両分野の視点を明確かつ考慮しなくてはならないことを主張する。残念なことに、このビジョンは「安全」と「安全」の基本的な概念の定義が矛盾し、コミュニティ全体でのコンセンサスが欠如しているため、しばしば難解である。 AIのリスク管理はますます学際的になってきており、この問題は特に健全だ。この概念的課題を踏まえ、我々は、コミュニティ間の共通理解と効果的なコラボレーションを促進することを目的として、AIの安全性とAIのセキュリティの違いと相互作用を明らかにする統一された参照フレームワークを導入する。

The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise mostly effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# ビデオオブジェクトセグメンテーションにおける領域一般化のためのサブネットの動的成長木を用いた生涯学習

Lifelong Learning Using a Dynamically Growing Tree of Sub-networks for Domain Generalization in Video Object Segmentation ( http://arxiv.org/abs/2405.19525v1 )

ライセンス: Link先を確認

Islam Osman, Mohamed S. Shehata,

(参考訳) 現在の最先端のビデオオブジェクトセグメンテーションモデルは、大量のラベル付きトレーニングデータセットを用いた教師あり学習を用いて大きな成功を収めている。しかし、これらのモデルは単一のソースドメインを使用してトレーニングされ、同じソースドメインからサンプルされたビデオを使用して評価される。これらのモデルが異なる対象領域からサンプリングされたビデオを用いて評価されると、それらの性能はドメインの一般化が貧弱なため著しく低下する。本稿では,マルチドメインソースから効果的に学習するサブネットワーク(DGT)の動的成長木を提案する。 DGTは、学習済みのドメインを忘れることなく、モデルが新しいドメインから継続的に効果的に学習することを可能にする、新しい生涯学習技術を使用している。したがって、モデルはドメイン外のビデオに一般化することができる。提案手法は,シングルソース・イン・ドメイン(従来のビデオ・オブジェクト・セグメンテーション),マルチソース・イン・ドメイン,マルチソース・アウト・オブ・ドメイン・ビデオ・オブジェクト・セグメンテーションを用いて評価する。 DGTの結果は、DAVIS16データセットとDAVIS17データセットでそれぞれ0.2%と3.5%という、単一ソースのドメイン内パフォーマンス向上を示している。しかし、DGTをドメイン内マルチソースを用いて評価すると、この結果は最先端のビデオオブジェクトセグメンテーションや他の生涯学習技術と比較して優れた性能を示し、Fスコアの平均的なパフォーマンスは6.9%、破滅的最小化は6.9%向上した。最後に、ドメイン外実験では、DGTのパフォーマンスは、それぞれ1ショットと5ショットの最先端よりも2.7%、4%向上している。

Current state-of-the-art video object segmentation models have achieved great success using supervised learning with massive labeled training datasets. However, these models are trained using a single source domain and evaluated using videos sampled from the same source domain. When these models are evaluated using videos sampled from a different target domain, their performance degrades significantly due to poor domain generalization, i.e., their inability to learn from multi-domain sources simultaneously using traditional supervised learning. In this paper, We propose a dynamically growing tree of sub-networks (DGT) to learn effectively from multi-domain sources. DGT uses a novel lifelong learning technique that allows the model to continuously and effectively learn from new domains without forgetting the previously learned domains. Hence, the model can generalize to out-of-domain videos. The proposed work is evaluated using single-source in-domain (traditional video object segmentation), multi-source in-domain, and multi-source out-of-domain video object segmentation. The results of DGT show a single source in-domain performance gain of 0.2% and 3.5% on the DAVIS16 and DAVIS17 datasets, respectively. However, when DGT is evaluated using in-domain multi-sources, the results show superior performance compared to state-of-the-art video object segmentation and other lifelong learning techniques with an average performance increase in the F-score of 6.9% with minimal catastrophic forgetting. Finally, in the out-of-domain experiment, the performance of DGT is 2.7% and 4% better than state-of-the-art in 1 and 5-shots, respectively.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# モーションプリミティブを用いたリアルタイムロボット支援ハンドオブジェクトインタラクション

Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives ( http://arxiv.org/abs/2405.19531v1 )

ライセンス: Link先を確認

Mingqi Yuan, Huijiang Wang, Kai-Fung Chu, Fumiya Iida, Bo Li, Wenjun Zeng,

(参考訳) 人工知能(AI)の進歩は、人間-ロボットインタラクション(HRI)技術の進化を促している。しかし、特に人間との物理的接触を必要とするタスクにおいて、シームレスな相互作用を達成する上で重要な課題が残っている。これらの課題は、人間の行動の正確なリアルタイム認識、ロボットの適応制御アルゴリズム、人間とロボットの動きの効果的な調整の必要性から生じる。本稿では,動的ロボット支援ハンドオブジェクトインタラクション(HOI)に着目した物理HRIの強化手法を提案する。提案手法は,ロボットとロボットの協調を支援するために,ポーズ推定,適応ロボット制御,モーションプリミティブを統合した。具体的には,動作プリミティブモデル(MPM)が人間の手の動きをロボット動作に変換するように設計された,単一のRGB画像から手の動きをリアルタイムに3Dモデリングするトランスフォーマーベースのアルゴリズムを用いる。ロボットのアクション実装は、継続的に更新された3Dハンドモデルを使用して動的に微調整される。リングウェアリングタスクを含む実験的な検証は、リアルタイムな動作に適応し、正確なタスク実行を支援するシステムの有効性を実証する。

Advances in artificial intelligence (AI) have been propelling the evolution of human-robot interaction (HRI) technologies. However, significant challenges remain in achieving seamless interactions, particularly in tasks requiring physical contact with humans. These challenges arise from the need for accurate real-time perception of human actions, adaptive control algorithms for robots, and the effective coordination between human and robotic movements. In this paper, we propose an approach to enhancing physical HRI with a focus on dynamic robot-assisted hand-object interaction (HOI). Our methodology integrates hand pose estimation, adaptive robot control, and motion primitives to facilitate human-robot collaboration. Specifically, we employ a transformer-based algorithm to perform real-time 3D modeling of human hands from single RGB images, based on which a motion primitives model (MPM) is designed to translate human hand motions into robotic actions. The robot's action implementation is dynamically fine-tuned using the continuously updated 3D hand models. Experimental validations, including a ring-wearing task, demonstrate the system's effectiveness in adapting to real-time movements and assisting in precise task executions.

翻訳日:2024-05-31 19:06:28 公開日:2024-05-29

# マルチMarginal Matching Gapによる複数表現の対比

Contrasting Multiple Representations with the Multi-Marginal Matching Gap ( http://arxiv.org/abs/2405.19532v1 )

ライセンス: Link先を確認

Zoe Piran, Michal Klein, James Thornton, Marco Cuturi,

(参考訳) 複数の(k\geq 3$)ビューやモダリティを通して見ることができる複雑なオブジェクトの有意義な表現を学習することは、機械学習における中核的なタスクである。既存のメソッドでは、元々はペアビュー用に意図された損失を使用し、$\tfrac12k(k-1)$ロスペアをインスタンス化するか、あるいは、‘textit{one vs. average-of-rest}戦略に従って、最小の埋め込みを使用することで、$k$ビューに拡張している。我々はM3G(Multi-marginal matching gap)を提案し、M3G(Multi-marginal optimal transport)理論からツールを借りてすべての$k$ビューを同時に組み込む。それぞれのビューが$k$-tuplesに変換され、その後$k$-tuplesに変換されると、損失は、$n$ ground-truth $k$-tuplesとMM-OTポリマッチングコストをマッチングするコストとを比較します。 MM-OT問題の指数関数的複雑性$O(n^k$)は恐ろしいように思えるが、その問題に対するシンクホーンアルゴリズムの適切な一般化は、例えば、$k=3\sim 6$ view に6.4~\sim128$ のミニバッチを用いてスケールできることを実験で示す。本実験は、自己教師型タスクとマルチモーダル型タスクの両方において、ペアワイズ損失のマルチビュー拡張よりも性能が向上したことを示す。

Learning meaningful representations of complex objects that can be seen through multiple ($k\geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12k(k-1)$ loss-pairs, or by using reduced embeddings, following a \textit{one vs. average-of-rest} strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n\times k$ vectors. While the exponential complexity $O(n^k$) of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k=3\sim 6$ views using mini-batches of size $64~\sim128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 選好学習アルゴリズムは選好ランキングを学習しない

Preference Learning Algorithms Do Not Learn Preference Rankings ( http://arxiv.org/abs/2405.19534v1 )

ライセンス: Link先を確認

Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho,

(参考訳) 優先学習アルゴリズム(例えば、RLHFやDPO)は、LLMを操り、人間に好まれる世代を生成するために頻繁に使われていますが、その内部動作に対する私たちの理解は限定的です。そこで本研究では,選好学習モデルを用いて,好ましくない出力よりも好ましくない出力により高い確率を割り当てる従来の知恵を,$\textit{ ranking accuracy}$で測定した。驚いたことに、ほとんどの最先端の選好調整モデルでは、一般的な選好データセットでは60%未満のランキング精度が得られる。さらに、DPO や RLHF の目的を完璧に最適化すれば、優先順位調整 LLM が達成できるという $\textit{idealized ranking accuracy}$ を導出する。我々は既存のモデルが有意な$\textit{alignment gap}$ -- $\textit{i.e.}$を示すことを示した。提案手法は,参照モデルにおける微妙なランク付け誤りの修正に経験的かつ理論的に不適なDPO目的に起因し,与えられた選好データポイントの学習の難しさを定量化するための単純かつ効率的な公式を導出する。最後に、評価精度は、モデルが目的の基準モデルに近い場合に、経験的に人気の高い利率指標と強く相関し、オン・ポリティ(例えば、RLHF)とオフ・ポリティ(例えば、DPO)の選好学習アルゴリズムの違いにさらに光を当てることを示した。

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via $\textit{ranking accuracy}$. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the $\textit{idealized ranking accuracy}$ that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant $\textit{alignment gap}$ -- $\textit{i.e.}$, a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen correlations for teleporting collective spin state in a two-dimensional trapion crystal

Generating Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen correlations for teleporting collective spin states in a two dimensional trapped ion crystal ( http://arxiv.org/abs/2405.19536v1 )

ライセンス: Link先を確認

Muhammad Miskeen Khan, Edwin Chaparro, Bhuvanesh Sundar, Allison Carter, John Bollinger, Klaus Molmer, Ana Maria Rey,

(参考訳) 我々は、エンジニアEinstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen(EPR)相関に対する絡み合い資源としてのフォノン$\unicode{x2013}$介在相互作用の利用を提案し、2$\unicode{x2013}$次元イオン結晶における集合スピン状態のテレポーテーションを行う。我々は、異なる核スピン度に対応するサブシステム間の連続可変量子テレポーテーションプロトコルをエミュレートする。それぞれにおいて、量子状態は電子スピンの度合いで符号化され、結晶の振動モードと結合する。スピンコヒーレント状態とその相変位変種、絡み合ったスピンスクイーズ状態、およびディック状態の高忠実テレポーテーションは、数十イオンから数百イオンの配列における現実的な実験条件に対して可能であることを示す。

We propose the use of phonon$\unicode{x2013}$mediated interactions as an entanglement resource to engineer Einstein$\unicode{x2013}$Podolsky$\unicode{x2013}$Rosen (EPR) correlations and to perform teleportation of collective spin states in two$\unicode{x2013}$dimensional ion crystals. We emulate continuous variable quantum teleportation protocols between subsystems corresponding to different nuclear spin degrees of freedom. In each of them, a quantum state is encoded in an electronic spin degree of freedom that couples to the vibrational modes of the crystal. We show that high fidelity teleportation of spin-coherent states and their phase-displaced variant, entangled spin-squeezed states, and Dicke states, is possible for realistic experimental conditions in arrays from a few tens to a few hundred ions.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# パラメータ化量子回路の最適複雑性

Optimal complexity of parameterized quantum circuits ( http://arxiv.org/abs/2405.19537v1 )

ライセンス: Link先を確認

Guilherme Ilário Correr, Pedro C. Azado, Diogo O. Soares-Pinto, Gabriel Carlo,

(参考訳) パラメータ化量子回路は、NISQ時代の領域における量子変分アルゴリズムの発展に重要な役割を果たしている。さまざまなタスクを実行する実際の能力を知ることが,その上で最も重要なのです。それらを普遍ランダム回路の原型クラスと比較することにより、ハール測度によって定義される漸近的複雑性へのアプローチはより速く、それに到達するためのゲートが少なくなることが判明した。このためにトポロジーが重要視されている。メジャー化基準は、表現可能性と平均的絡み合いを補完するツールとして証明されている。

Parameterized quantum circuits play a key role for the development of quantum variational algorithms in the realm of the NISQ era. Knowing their actual capability of performing different kinds of tasks is then of the utmost importance. By comparing them with a prototypical class of universal random circuits we have found that their approach to the asymptotic complexity defined by the Haar measure is faster, needing less gates to reach it. Topology has been revealed crucial for this. The majorization criterion has proven as a relevant complementary tool to the expressibility and the mean entanglement.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# CheXpert Plus:何百もの放射線学のテキスト、画像、患者

CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients ( http://arxiv.org/abs/2405.19538v1 )

ライセンス: Link先を確認

Pierre Chambon, Jean-Benoit Delbrouck, Thomas Sounack, Shih-Cheng Huang, Zhihong Chen, Maya Varma, Steven QH Truong, Chu The Chuong, Curtis P. Langlotz,

(参考訳) 5年前にCheXpertの最初の論文がリリースされて以来、CheXpertは最も広く使われ、引用された臨床AIデータセットの1つになった。ビジョン言語モデルの出現は、CheXpertイメージに関連するレポートの共有要求の高まりを招き、人口統計データを取得することへのAIフェアネス研究者の関心が高まった。これを解決するため、CheXpert Plusは、放射線学の分野におけるその後のすべての機械学習タスクに対するモデルのスケーリング、パフォーマンス、堅牢性、公平性を高めるために公開された、新しい放射線学データソースのコレクションとして機能する。 CheXpert Plusは、放射線学で公開された最大のテキストデータセットで、合計で3600万のテキストトークンがあり、1300万のインプレッショントークンが含まれている。私たちの知る限りでは、これは放射線学における最大のテキスト識別の取り組みであり、ほぼ100万PHIが匿名化されている。大規模な英語ペアデータセットが放射線学でリリースされたのは2回目であり、これにより初めて大規模なクロスインスティテュートトレーニングが可能になる。全てのレポートは、DICOMフォーマットの高品質な画像と組み合わせられ、様々な臨床および社会経済的グループを含む多数の画像と患者のメタデータ、および多くの病理ラベルとRadGraphアノテーションが提供される。このデータセットは、放射線科医のさらなる支援と医療改善に役立つAIモデルの研究を促進することを願っている。 https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 モデルは以下のURLで利用可能である。

Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 大規模配電系統における低エントロピー結合の計算

Computing Low-Entropy Couplings for Large-Support Distributions ( http://arxiv.org/abs/2405.19540v1 )

ライセンス: Link先を確認

Samuel Sokota, Dylan Sam, Christian Schroeder de Witt, Spencer Compton, Jakob Foerster, J. Zico Kolter,

(参考訳) 最小エントロピー結合(MEC、Minimum-Entropy coupling)とは、最小エントロピーを持つ最小エントロピーを求める過程であり、因果関係やステガノグラフィーなどの分野で応用されている。しかし、既存のアルゴリズムは、大容量の分布に対して計算的に抽出可能であるか、特定の分布タイプに制限されているか、ハイパーパラメータの選択に敏感である。この研究は、従来の反復MEC(IMEC)アプローチを一般化されたパーティションベースの形式主義に統一することで、これらの制限に対処する。この枠組みから任意の離散分布を処理できる新しいIMECアルゴリズムであるARIMECを導出し、最適化されたハイパーパラメータ設定に対してIMECを堅牢にする方法を提案する。これらの革新はIMECの言語モデルを用いた高スループットステガノグラフィーへの応用を促進する。私たちのコードベースはhttps://github.com/ssokota/mec で公開されています。

Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec .

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# Aモード超音波信号の動的復号化による解剖学的領域認識とリアルタイム骨追跡法

Anatomical Region Recognition and Real-time Bone Tracking Methods by Dynamically Decoding A-Mode Ultrasound Signals ( http://arxiv.org/abs/2405.19542v1 )

ライセンス: Link先を確認

Bangyu Lan, Stefano Stramigioli, Kenan Niu,

(参考訳) 整形外科と補綴ロボットの運動解析には正確な骨追跡が不可欠である。従来の方法(例えば皮膚マーカー)は軟部組織のアーティファクトであり、手術で使用される骨のピンは、追加の外傷や感染のリスクをもたらす。エレクトロミオグラフィー(EMG)では、関節角度を直接測定できないため、運動学的推定のための複雑なアルゴリズムが必要である。これらの問題に対処するため、Aモード超音波による追跡は非侵襲的で安全な代替手段として提案されている。しかし、この手法は、受信した超音波信号を処理する際にピーク検出の精度が限られている。本稿では,Aモード超音波信号を用いた解剖学的領域認識と骨追跡のための深層学習手法を提案する。このアルゴリズムは、同時に骨追跡を行い、Aモード超音波トランスデューサが置かれた解剖学的領域を特定することができる。これは、カスケードされたU-Netのすべてのエンコーディング層とデコード層の間の完全な接続を含み、骨のピークを持つ可能性が高い信号領域のみに焦点を合わせ、ピークの正確な位置を特定し、信号の解剖学的領域を分類する。実験では, 関節周囲の解剖学的領域に対する動的追跡条件下で, 解剖学的領域の分類において97%の精度, 約0.5$\pm$1mmの精度を示した。一般に, 超音波が付加機能として付加された解剖学的領域の精度と認識において, 従来の手法を超える大きな可能性を示す。

Accurate bone tracking is crucial for kinematic analysis in orthopedic surgery and prosthetic robotics. Traditional methods (e.g., skin markers) are subject to soft tissue artifacts, and the bone pins used in surgery introduce the risk of additional trauma and infection. For electromyography (EMG), its inability to directly measure joint angles requires complex algorithms for kinematic estimation. To address these issues, A-mode ultrasound-based tracking has been proposed as a non-invasive and safe alternative. However, this approach suffers from limited accuracy in peak detection when processing received ultrasound signals. To build a precise and real-time bone tracking approach, this paper introduces a deep learning-based method for anatomical region recognition and bone tracking using A-mode ultrasound signals, specifically focused on the knee joint. The algorithm is capable of simultaneously performing bone tracking and identifying the anatomical region where the A-mode ultrasound transducer is placed. It contains the fully connection between all encoding and decoding layers of the cascaded U-Nets to focus only on the signal region that is most likely to have the bone peak, thus pinpointing the exact location of the peak and classifying the anatomical region of the signal. The experiment showed a 97% accuracy in the classification of the anatomical regions and a precision of around 0.5$\pm$1mm under dynamic tracking conditions for various anatomical areas surrounding the knee joint. In general, this approach shows great potential beyond the traditional method, in terms of the accuracy achieved and the recognition of the anatomical region where the ultrasound has been attached as an additional functionality.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 最適双対化による大規模言語モデルのワンショット安全アライメント

One-Shot Safety Alignment for Large Language Models via Optimal Dualization ( http://arxiv.org/abs/2405.19544v1 )

ライセンス: Link先を確認

Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding,

(参考訳) LLM(Large Language Models, 大規模言語モデル)を取り巻く安全性の懸念が高まり、その利便性と安全性を同時に向上するために、様々な人間の好みに合わせる必要がある。有望なアプローチは、RLHF(Reinforcement Learning from Human Feedback)を通じて安全性の制約を実施することである。このような制約付きRLHFでは、一般的なラグランジアンベースの原始双対ポリシー最適化手法は計算コストが高く、しばしば不安定である。本稿では,制約付きアライメントを等価な非制約アライメント問題に還元する双対化の観点を提案する。我々は、閉形式を持つ滑らかで凸な双対函数を事前に最適化する。このショートカットは、煩雑な原始二重ポリシー反復の必要性を排除し、計算負担を大幅に低減し、訓練安定性を向上させる。我々の戦略はモデルベースと嗜好ベースのシナリオ(それぞれMoCANとPeCAN)の2つの実践的アルゴリズムに導かれる。幅広い実験により,本手法の有効性が示された。

The growing safety concerns surrounding Large Language Models (LLMs) raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, common Lagrangian-based primal-dual policy optimization methods are computationally expensive and often unstable. This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. We do so by pre-optimizing a smooth and convex dual function that has a closed form. This shortcut eliminates the need for cumbersome primal-dual policy iterations, thus greatly reducing the computational burden and improving training stability. Our strategy leads to two practical algorithms in model-based and preference-based scenarios (MoCAN and PeCAN, respectively). A broad range of experiments demonstrate the effectiveness of our methods.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# マルチモーダルコントラスト学習のためのCLIPLosとノルムに基づくデータ選択法

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning ( http://arxiv.org/abs/2405.19547v1 )

ライセンス: Link先を確認

Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du,

(参考訳) データ選択は、大規模なビジュアル言語モデル(例えば、CLIP)、特に騒がしいWebキュレートデータセットにおいて、中心的な問題として現れている。 3つの主要なデータ選択アプローチは、(1)外部の非CLIPモデルを活用してデータ選択を支援すること、(2)元々のOpenAI CLIPモデルよりも高品質なデータを選択するのに効果的であるCLIPスタイルの埋め込みモデルをトレーニングすること、(3)特定のモデルプロパティを必要とせずにCLIP埋め込みに適用可能なより良いメトリクスや戦略を設計すること(例えば、CLIPScoreは人気のあるメトリックである)である。最初の2つのアプローチは広く研究されているが、第3のアプローチは未調査のままである。本稿では,2つの新しい手法を提案することによって,第3のアプローチを推し進める。まず,1つのサンプルから2つのモダリティのアライメントのみを考慮する古典的なCLIPスコアの代わりに,1つのサンプルとその対照的なペア間のアライメントを追加するCLIPロスインスパイア法であるnegCLIPLossを導入する。第二に、下流タスクが分かっている場合、事前学習データと対象データとの類似性を測定するために、ノルムシムという新しい基準ベースの指標を提案する。我々は、データ選択ベンチマークDataComp~\cite{gadre2023datacomp}でメソッドをテストする。 OpenAIのCLIP-L/14のみを使用した最高のベースラインと比較すると,ImageNet-1kでは5.3倍,38ダウンストリーム評価タスクでは2.8倍の改善を実現している。さらに、negCLIPLossとNormSimはどちらも既存の技術と互換性がある。現在のベストメソッドDFN~\cite{fang2023data} とHYPE~\cite{kim2024hype} を組み合わせることで、ダウンストリームタスクにおける平均パフォーマンスを0.9\%向上させ、新しい最先端を実現することができます。

Data selection has emerged as a core issue for large-scale visual-language model pretaining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data than the original OpenAI CLIP model, and (3) designing better metrics or strategies universally applicable to any CLIP embedding without requiring specific model properties (e.g., CLIPScore is one popular metric). While the first two approaches have been extensively studied, the third remains under-explored. In this paper, we advance the third approach by proposing two new methods. Firstly, instead of classical CLIP scores that only consider the alignment between two modalities from a single sample, we introduce negCLIPLoss, a CLIP loss-inspired method that adds the alignment between one sample and its contrastive pairs as an extra normalization term for better quality measurement. Secondly, when downstream tasks are known, we propose a new norm-based metric, NormSim, to measure the similarity between pretraining data and target data. We test our methods on the data selection benchmark, DataComp~\cite{gadre2023datacomp}. Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3\% improvement on ImageNet-1k and a 2.8\% improvement on 38 downstream evaluation tasks. Moreover, both negCLIPLoss and NormSim are compatible with existing techniques. By combining our methods with the current best methods DFN~\cite{fang2023data} and HYPE~\cite{kim2024hype}, we can boost average performance on downstream tasks by 0.9\%, achieving a new state-of-the-art.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# RLeXplore: 本質的な動機付け強化学習における加速研究

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning ( http://arxiv.org/abs/2405.19548v1 )

ライセンス: Link先を確認

Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng,

(参考訳) 外部報酬は、特定のタスクにおける強化学習(RL)エージェントを効果的に導くことができる。しかしながら、外在的な報酬は、設計やアノテーションに必要な人的労力のために、複雑な環境でしばしば不足する。この制限は、補助的かつ高密度な信号を提供し、エージェントが教師なしの方法で学習できるようにする本質的な報酬の必要性を浮き彫りにする。様々な本質的な報酬の定式化が提案されているが、その実装と最適化の詳細は不十分であり、標準化が欠如しているため、研究の進展を妨げている。このギャップに対処するため、我々はRLeXploreを紹介した。RLeXploreは8つの最先端固有の報酬アルゴリズムの信頼性のある実装を提供する統一的で高度にモジュール化されたプラグイン・アンド・プレイのフレームワークである。さらに、重要な実装の詳細を特定し、本質的な動機付けRLにおける適切な標準プラクティスを確立するための詳細な研究を行う。 RLeXploreのソースコードはhttps://github.com/RLE-Foundation/RLeXploreで公開されている。

Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# パスワード同期モデルによるストレス試験能力の回復

Stress-Testing Capability Elicitation With Password-Locked Models ( http://arxiv.org/abs/2405.19550v1 )

ライセンス: Link先を確認

Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger,

(参考訳) 大規模言語モデル(LLM)の安全性を決定するためには、AI開発者は危険な能力を評価する必要がある。しかし、単純なプロンプト戦略はLLMの全機能を引き出すのに失敗することが多い。より堅牢に機能を引き出す1つの方法は、タスクを完了させるためにLLMを微調整することです。本稿では,微調整による誘引が能力を引き出すのに十分である条件について検討する。これを実現するために、パスワードロックモデル(LLM)を導入し、その機能の一部を意図的に隠蔽するように微調整する。具体的には、これらのLSMは、プロンプトにパスワードが存在する場合にのみこれらの能力を示すように訓練され、それ以外はより弱いLSMを模倣する。パスワードロックされたモデルは、パスワードを使わずにこれらのパスワードロックされた機能を引き出すことができるかどうかをテストすることによって、機能を引き出す新しい方法を可能にする。いくつかの高品質なデモは、パスワードでロックされた機能を完全に引き出すのに十分であることがわかった。より驚くべきことに、微調整は、同じパスワードや異なるパスワードを使ってロックされた他の機能を引き出すことができる。さらに、評価のみが利用可能であり、実演ではない場合、強化学習のようなアプローチは、しばしば能力を引き出すことができる。全体としては、ファインチューニングは現在のモデルの隠れた能力を引き出す効果的な方法であるが、高品質なデモンストレーションが得られない場合、例えばモデルの(隠された)能力が人間のデモの能力を上回る場合など、信頼性が低い可能性があることを示唆している。

To determine the safety of large language models (LLMs), AI developers must be able to assess their dangerous capabilities. But simple prompting strategies often fail to elicit an LLM's full capabilities. One way to elicit capabilities more robustly is to fine-tune the LLM to complete the task. In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. To do this, we introduce password-locked models, LLMs fine-tuned such that some of their capabilities are deliberately hidden. Specifically, these LLMs are trained to exhibit these capabilities only when a password is present in the prompt, and to imitate a much weaker LLM otherwise. Password-locked models enable a novel method of evaluating capabilities elicitation methods, by testing whether these password-locked capabilities can be elicited without using the password. We find that a few high-quality demonstrations are often sufficient to fully elicit password-locked capabilities. More surprisingly, fine-tuning can elicit other capabilities that have been locked using the same password, or even different passwords. Furthermore, when only evaluations, and not demonstrations, are available, approaches like reinforcement learning are still often able to elicit capabilities. Overall, our findings suggest that fine-tuning is an effective method of eliciting hidden capabilities of current models, but may be unreliable when high-quality demonstrations are not available, e.g. as may be the case when models' (hidden) capabilities exceed those of human demonstrators.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# ソフト分解を用いた連続モンテカルロのマルチモーダル分布の収束境界

Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition ( http://arxiv.org/abs/2405.19553v1 )

ライセンス: Link先を確認

Holden Lee, Matheau Santana-Gijzen,

(参考訳) 逐次モンテカルロ法(SMC)アルゴリズムで得られたサンプルの実証測度の下での関数$f$の分散に関するバウンダリを,大域マルコフ連鎖混合力学よりも局所的に依存する時間的複雑さで証明する。 SMCはマルコフ・チェイン・モンテカルロ (MCMC) 法であり、既知の分布からN$の粒子を引いてから始まり、各インスタンスでマルコフ連鎖を滑らかにするために再サンプリングする。原則として、SMCは多モード性から問題を緩和しようとする。しかし、SMCのほとんどの理論的保証は、一様条件でのみ効率的である大域的な混合時間境界を仮定することで得られる。局所MCMCダイナミクスにのみ依存する混合時間を用いて,真のマルチモーダル設定でバウンダリが得られることを示す。

We prove bounds on the variance of a function $f$ under the empirical measure of the samples obtained by the Sequential Monte Carlo (SMC) algorithm, with time complexity depending on local rather than global Markov chain mixing dynamics. SMC is a Markov Chain Monte Carlo (MCMC) method, which starts by drawing $N$ particles from a known distribution, and then, through a sequence of distributions, re-weights and re-samples the particles, at each instance applying a Markov chain for smoothing. In principle, SMC tries to alleviate problems from multi-modality. However, most theoretical guarantees for SMC are obtained by assuming global mixing time bounds, which are only efficient in the uni-modal setting. We show that bounds can be obtained in the truly multi-modal setting, with mixing times that depend only on local MCMC dynamics.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 離散分布のクラスタリング混合:Mitraのアルゴリズムについて

Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm ( http://arxiv.org/abs/2405.19559v1 )

ライセンス: Link先を確認

Mohamed Seif, Yanxi Chen,

(参考訳) 本稿では、一般的な離散混合分布モデルを分類するためのMitraのアルゴリズム \cite{mitra2008clustering} の洗練された解析を行う。このアルゴリズムはスペクトルクラスタリング \cite{mcsherry 2001spectral} に基づいて構築され、確率分布に対して魅力的な条件を提供する。この分析は,確率的ブロックモデルに2分割するようにモデルを調整し,より洗練された条件を与える。その結果, 分離条件の改善が得られた。

In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined conditions. Compared to those derived in \cite{mitra2008clustering}, our improved separation conditions are obtained.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# Quo Vadis ChatGPT : 大規模言語モデルから大規模知識モデルへ

Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models ( http://arxiv.org/abs/2405.19561v1 )

ライセンス: Link先を確認

Venkat Venkatasubramanian, Arijit Chakraborty,

(参考訳) 自然言語処理や画像合成といったアプリケーションにおけるトランスフォーマーベースの生成ニューラルネットワークアーキテクチャを用いたChatGPTやその他の大規模言語モデル(LLM)の驚くべき成功は、プロセスシステム工学(PSE)の潜在的な可能性に多くの研究者が興奮している。これらの分野でのLLMのほぼ人間的なパフォーマンスは非常に印象的であり、驚き、そして大きなブレークスルーです。その機能は、ドキュメントの最初のドラフトを書くこと、コード記述補助、テキストの要約など、特定のタスクで非常に役立ちます。しかし、その成功は、詳細なドメイン知識の欠如のために、まだ推論、計画、説明ができないため、非常に科学的領域において限られている。これは化学工学などの分野において、物理や化学(および生物学)の基本法則、構成的関係、材料、プロセス、システムに関する高度な技術知識によって支配される問題である。純粋にデータ駆動機械学習はすぐに使えるが、科学と工学の分野におけるAIの長期的な成功は、第一原理と技術的な知識を効果的に活用するハイブリッドAIシステムの開発に依存する。我々はこれらのハイブリッドAIシステムをLKM(Large Knowledge Models)と呼んでいる。本稿では,化学工学におけるこのようなシステム開発における課題と機会について論じる。

The startling success of ChatGPT and other large language models (LLMs) using transformer-based generative neural network architecture in applications such as natural language processing and image synthesis has many researchers excited about potential opportunities in process systems engineering (PSE). The almost human-like performance of LLMs in these areas is indeed very impressive, surprising, and a major breakthrough. Their capabilities are very useful in certain tasks, such as writing first drafts of documents, code writing assistance, text summarization, etc. However, their success is limited in highly scientific domains as they cannot yet reason, plan, or explain due to their lack of in-depth domain knowledge. This is a problem in domains such as chemical engineering as they are governed by fundamental laws of physics and chemistry (and biology), constitutive relations, and highly technical knowledge about materials, processes, and systems. Although purely data-driven machine learning has its immediate uses, the long-term success of AI in scientific and engineering domains would depend on developing hybrid AI systems that use first principles and technical knowledge effectively. We call these hybrid AI systems Large Knowledge Models (LKMs), as they will not be limited to only NLP-based techniques or NLP-like applications. In this paper, we discuss the challenges and opportunities in developing such systems in chemical engineering.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 選択的説明

Selective Explanations ( http://arxiv.org/abs/2405.19562v1 )

ライセンス: Link先を確認

Lucas Monteiro Paes, Dennis Wei, Flavio P. Calmon,

(参考訳) 特徴属性法は、重要点を入力特徴に割り当てることで、ブラックボックス機械学習(ML)モデルを説明する。これらの手法は大規模MLモデルでは計算コストがかかる。この課題に対処するために、機械学習モデルに1つの推論だけで特徴帰属スコアを予測するためのトレーニングを行う、償却説明書の開発への取り組みが増えている。その効率にもかかわらず、償却された説明者は不正確な予測や誤解を招く説明を生み出すことができる。本稿では,新しい特徴帰属法である選択的説明法を提案する。 (i)償却説明書が品質の低い説明を生成するときの検出 (二)最初の推測による説明という手法を用いて、これらの説明を改善する。我々の選択的説明法は,説明を受けるサンプルのごく一部を初期推定で特定し,償却説明書と高品質な説明書とのギャップを埋める基本的方法を提供する。

Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there has been increasing efforts to develop amortized explainers, where a machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. In this paper, we propose selective explanations, a novel feature attribution method that (i) detects when amortized explainers generate low-quality explanations and (ii) improves these explanations using a technique called explanations with initial guess. Our selective explanation method allows practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers and their high-quality counterparts.

翻訳日:2024-05-31 18:56:18 公開日:2024-05-29

# 大規模言語モデルにおける未学習の気候誤報

Unlearning Climate Misinformation in Large Language Models ( http://arxiv.org/abs/2405.19563v1 )

ライセンス: Link先を確認

Michael Fore, Simranjit Singh, Chaehong Lee, Amritanshu Pandey, Antonios Anastasopoulos, Dimitrios Stamoulis,

(参考訳) 気候変動に関する誤報は、人類にとって最も深刻な脅威の1つに対処する上で、重要な障害となっている。本稿では,気候情報に関する大規模言語モデル(LLM)の事実的精度について検討する。実/偽ラベル付きQ&Aデータを用いて、気候変動問題に対する真正な応答を生成できるオープンソースモデルを比較し、気候変動に関する主張を微調整し評価する。本研究は, 意図的に偽の気候情報に汚染されたモデルの検出可能性について検討し, 他領域におけるモデル応答の精度には影響しないことを示した。さらに, 未学習アルゴリズム, 微調整, 検索・拡張生成(RAG)の有効性を, 気候変動トピックに基づく現実的なLLMの有効性と比較した。評価の結果, 未学習アルゴリズムは, プライバシの文脈における非効率性を示唆する以前の知見にもかかわらず, 曖昧な概念的主張に対して有効であることが明らかとなった。これらの知見は、より現実的に信頼性の高いLLMの開発を導くことを目的としており、誤情報攻撃に対するLLMの安全性を確保するための追加作業の必要性を強調している。

Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information. Using true/false labeled Q&A data for fine-tuning and evaluating LLMs on climate-related claims, we compare open-source models, assessing their ability to generate truthful responses to climate change questions. We investigate the detectability of models intentionally poisoned with false climate information, finding that such poisoning may not affect the accuracy of a model's responses in other domains. Furthermore, we compare the effectiveness of unlearning algorithms, fine-tuning, and Retrieval-Augmented Generation (RAG) for factually grounding LLMs on climate change topics. Our evaluation reveals that unlearning algorithms can be effective for nuanced conceptual claims, despite previous findings suggesting their inefficacy in privacy contexts. These insights aim to guide the development of more factually reliable LLMs and highlight the need for additional work to secure LLMs against misinformation attacks.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# 2次元Rydberg原子アレイにおける雑音耐性パリティ制御ゲートによる量子誤差検出

Quantum error detection with noise-resilient parity-controlled gate in two-dimensional Rydberg atom arrays ( http://arxiv.org/abs/2405.19564v1 )

ライセンス: Link先を確認

F. Q. Guo, S. L. Su, Weibin Li, X. Q. Shao,

(参考訳) 量子誤り検出は主に量子情報処理の基本的な操作である量子ビットパリティの正確な測定に依存する。本稿では,2次元Rydberg原子配列内の量子誤差を検出するためのレジリエントパリティ制御ゲートを提案する。本手法は, 補助原子の動的進化を追跡することにより, 仮想励起制御原子の偶数パリティと奇数パリティの識別を可能にする。 Rydberg状態とRydberg状態の間のスピン交換双極子相互作用と単光子と2光子駆動を用いることで、Rydberg-parity測定を従来の方法と比較して大幅に高速化する。本稿では,3ビット繰り返し符号,ZZZ$およびXXXX$の安定化器を特徴とする標準曲面符号,およびXZZX$構成の回転曲面符号について検討し,単発読み出しによる安定化器の測定を容易にする。我々は,Rydberg状態間の望ましくない相互作用,原子位置のゆらぎ,劣化ノイズ,レーザー振幅の不均一性などの潜在的な実験的不完全性を考慮して,我々の戦略の有効性を評価するために,徹底的な数値シミュレーションを行った。特に、パリティメータの物理機構の信頼性と利点の確保に重点を置いている。これらの結果は、我々のプロトコルの堅牢性と生存性を確認し、近い将来、Rydberg原子系を用いた量子エラー検出の有望な候補として位置づけた。

Quantum error detection relies primarily on precise measurement of qubit parity, a fundamental operation in quantum information processing. Here, we introduce a resilient parity-controlled gate tailored for detecting quantum errors within a 2D Rydberg atom array. Our method enables the discrimination between even and odd parities of virtually excited control atoms by tracking the dynamic evolution of an auxiliary atom. Using spin-exchange dipolar interactions of Rydberg states and single- and two-photon driving between ground states and Rydberg states, our method speeds up Rydberg-parity measurements by a large amount compared to previous methods. In practical application, we explore three-qubit repetition codes, standard surface codes featuring stabilizers in the forms $ZZZZ$ and $XXXX$, as well as rotated surface codes in the $XZZX$ configuration, facilitating the measurement of stabilizers with a single-shot readout. We carry out thorough numerical simulations to evaluate the feasibility of our strategy, considering potential experimental imperfections such as undesired interactions between Rydberg states, fluctuations in atomic positions, dephasing noise, and laser amplitude inhomogeneities. Particular emphasis is placed on ensuring the reliability and advantages of the physical mechanisms of the parity meter. These results affirm the robustness and viability of our protocol, positioning it as a promising candidate for quantum error detection employing the Rydberg atom system in the foreseeable future.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# Dr-LLaVA:シンボリック臨床接地による視覚指導

Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding ( http://arxiv.org/abs/2405.19567v1 )

ライセンス: Link先を確認

Shenghuan Sun, Gregory M. Goldgof, Alexander Schubert, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed Alaa,

(参考訳) VLM(Vision-Language Models)は、医用画像を分析し、自然言語の相互作用に関わり、診断や治療のタスクを支援することで、臨床医を支援する。しかしながら、VLMはしばしば「幻覚的」な振る舞いを示し、文脈的マルチモーダル情報に基づかないテキスト出力を生成する。この課題は医学領域において特に顕著であり、VLM出力は単一の相互作用において正確であるだけでなく、マルチターン会話を通して臨床推論や診断経路と一致している。そこで本研究では,臨床推論のシンボル表現を用いて,VLMを医用知識に根ざしたアライメントアルゴリズムを提案する。これらの表現は利用されます i) GPT-4誘導視覚指導チューニングデータを大規模に生成し、臨床とVLMの会話と臨床推論のデモンストレーションをシミュレートし、 (II)臨床医とVLMの相互作用を通じてVLM世代の臨床的妥当性を評価する自動報酬関数を作成する。提案アルゴリズムは, トレーニングデータ生成や報酬モデル構築における人間の関与を排除し, 人間のフィードバックによる標準的な強化学習(RLHF)と比較してコストを低減させる。本アルゴリズムを用いて骨髄病理のスライド解析に最適化された対話型VLMであるDr-LLaVAを開発し,マルチターン医療会話において高い性能を示した。

Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we do not only require VLM outputs to be accurate in single interactions but also to be consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. For this purpose, we propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge. These representations are utilized to (i) generate GPT-4-guided visual instruction tuning data at scale, simulating clinician-VLM conversations with demonstrations of clinical reasoning, and (ii) create an automatic reward function that evaluates the clinical validity of VLM generations throughout clinician-VLM interactions. Our algorithm eliminates the need for human involvement in training data generation or reward model construction, reducing costs compared to standard reinforcement learning with human feedback (RLHF). We apply our alignment algorithm to develop Dr-LLaVA, a conversational VLM finetuned for analyzing bone marrow pathology slides, demonstrating strong performance in multi-turn medical conversations.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# インクリメンタル・フォーショット・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマンティック・セマン

Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation ( http://arxiv.org/abs/2405.19568v1 )

ライセンス: Link先を確認

Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding,

(参考訳) インクリメンタルなFew-shot Semantic Segmentation(iFSS)の目的は、トレーニング済みのセグメンテーションモデルを、古いトレーニングデータにアクセスせずに、アノテートされた少数のイメージを通じて、新しいクラスに拡張することだ。新たなクラスを漸進的に学習する過程で、古いクラスのデータ分布が破壊され、破滅的な忘れがもたらされる。一方、新しいクラスにはサンプルがほとんどないため、新しいクラスの満足できる表現を学習することは不可能である。 iFSS問題に対して、OINetと呼ばれる背景埋め込み空間 \textbf{O}rganization とプロトタイプ \textbf{I}nherit Network を提案する。具体的には、ベースクラスをトレーニングする場合、OINetはバックグラウンドで複数の分類ヘッドを使用し、複数のサブクラスプロトタイプを設定して、潜在する新規クラスに埋め込みスペースを予約する。そこで本研究では,新しいクラスを段階的に学習する過程で,現在学習している新しいクラスに最もよく適合するサブクラスプロトタイプを選択し,選択したプロトタイプの埋め込み空間を継承する戦略を提案する。この操作により、ベースクラスの分布に影響を与えることなく、少数のサンプルを使用して、新しいクラスを埋め込みスペースに登録することができる。 Pascal-VOC と COCO の結果,OINet は新たな最先端を実現している。

The goal of incremental Few-shot Semantic Segmentation (iFSS) is to extend pre-trained segmentation models to new classes via few annotated images without access to old training data. During incrementally learning novel classes, the data distribution of old classes will be destroyed, leading to catastrophic forgetting. Meanwhile, the novel classes have only few samples, making models impossible to learn the satisfying representations of novel classes. For the iFSS problem, we propose a network called OINet, i.e., the background embedding space \textbf{O}rganization and prototype \textbf{I}nherit Network. Specifically, when training base classes, OINet uses multiple classification heads for the background and sets multiple sub-class prototypes to reserve embedding space for the latent novel classes. During incrementally learning novel classes, we propose a strategy to select the sub-class prototypes that best match the current learning novel classes and make the novel classes inherit the selected prototypes' embedding space. This operation allows the novel classes to be registered in the embedding space using few samples without affecting the distribution of the base classes. Results on Pascal-VOC and COCO show that OINet achieves a new state of the art.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# 組立およびブールプリミティブによる凸分解の改善

Improved Convex Decomposition with Ensembling and Boolean Primitives ( http://arxiv.org/abs/2405.19569v1 )

ライセンス: Link先を確認

Vaibhav Vavilala, Florian Kluger, Seemandhar Jain, Bodo Rosenhahn, David Forsyth,

(参考訳) プリミティブ(幾何学的に単純な形状で構造を正確に抽象化する)の観点でシーンを記述することは、確立されたビジョン問題である。異なるシーンは異なる数のプリミティブを必要とし、プリミティブは強く相互作用するが、提案されたソリューションは推論時に評価することができる。最先端の手法は、一定数のプリミティブからなる開始点を予測するための学習された回帰手順と、幾何を洗練させ、冗長プリミティブを除去する降下法を含む。手法は深度, 正常予測, シーンセグメンテーションの精度で評価される。本稿では,精度の大幅な向上が期待できることを示す。 (a)少数の負の原始体を取り入れて b) さまざまなレグレッション手順をまとめる。組み立ては予測される各スタートポイントを精錬し、損失を埋め合わせることでベストを選択する。標準データセットにおける大規模な実験により、負のプリミティブが多数の画像で有用であることが確認され、我々の精巧な選択選択戦略がより優れていることが確認され、適合問題が非常に難しいことが確認された。

Describing a scene in terms of primitives -- geometrically simple shapes that offer a parsimonious but accurate abstraction of structure -- is an established vision problem. This is a good model of a difficult fitting problem: different scenes require different numbers of primitives and primitives interact strongly, but any proposed solution can be evaluated at inference time. The state of the art method involves a learned regression procedure to predict a start point consisting of a fixed number of primitives, followed by a descent method to refine the geometry and remove redundant primitives. Methods are evaluated by accuracy in depth and normal prediction and in scene segmentation. This paper shows that very significant improvements in accuracy can be obtained by (a) incorporating a small number of negative primitives and (b) ensembling over a number of different regression procedures. Ensembling is by refining each predicted start point, then choosing the best by fitting loss. Extensive experiments on a standard dataset confirm that negative primitives are useful in a large fraction of images, and that our refine-then-choose strategy outperforms choose-then-refine, confirming that the fitting problem is very difficult.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# 高速拡散インバージョンによるブラインド画像復元

Blind Image Restoration via Fast Diffusion Inversion ( http://arxiv.org/abs/2405.19572v1 )

ライセンス: Link先を確認

Hamadi Chihaoui, Abdelhak Lemkhenter, Paolo Favaro,

(参考訳) 近年,事前学習した拡散モデルを用いて画像復元(IR)課題を解決するための様々な手法が提案されている。しかし、これらの手法の多くは、IRタスクの劣化演算子が完全に知られていると仮定している。さらに、これらの手法の共通する特徴は、劣化した入力画像との整合性を満たすために拡散サンプリングプロセスを変更することである。この選択は、最近、準最適であることが示され、復元された画像がデータ多様体から逸脱する原因となった。これらの問題に対処するため,高速拡散インバージョン (BIRD) を用いたブラインド画像復元法を提案する。復元された画像がデータ多様体上に配置されることを保証するため,事前学習した拡散モデルに基づく新しいサンプリング手法を提案する。提案手法の鍵となる考え方は、初期ノイズがサンプリングされると、逆サンプリングを変更すること、すなわち、中間ラテントを全て変更しないことである。これは究極的には、入力ノイズの空間における最適化問題としてIRタスクをキャストすることと同値である。さらに, 完全にローリングされていない拡散モデルの逆転に伴う計算コストを軽減するために, これらのモデルの本質的能力を活用して, 大きな時間ステップを用いて前方拡散過程をスキップする。画像復元作業におけるBIRDの有効性を実験的に検証し,それらすべてに対して最先端の性能を実現することを示す。私たちのコードはhttps://github.com/hamadichihaoui/BIRD.comで公開されています。

Recently, various methods have been proposed to solve Image Restoration (IR) tasks using a pre-trained diffusion model leading to state-of-the-art performance. However, most of these methods assume that the degradation operator in the IR task is completely known. Furthermore, a common characteristic among these approaches is that they alter the diffusion sampling process in order to satisfy the consistency with the degraded input image. This choice has recently been shown to be sub-optimal and to cause the restored image to deviate from the data manifold. To address these issues, we propose Blind Image Restoration via fast Diffusion inversion (BIRD) a blind IR method that jointly optimizes for the degradation model parameters and the restored image. To ensure that the restored images lie onto the data manifold, we propose a novel sampling technique on a pre-trained diffusion model. A key idea in our method is not to modify the reverse sampling, i.e., not to alter all the intermediate latents, once an initial noise is sampled. This is ultimately equivalent to casting the IR task as an optimization problem in the space of the input noise. Moreover, to mitigate the computational cost associated with inverting a fully unrolled diffusion model, we leverage the inherent capability of these models to skip ahead in the forward diffusion process using large time steps. We experimentally validate BIRD on several image restoration tasks and show that it achieves state of the art performance on all of them. Our code is available at https://github.com/hamadichihaoui/BIRD.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# ハウサ映画レビューにおけるアスペクト・ポラリティ分類のための深部畳み込みニューラルネットワークモデル

A Deep Convolutional Neural Network-based Model for Aspect and Polarity Classification in Hausa Movie Reviews ( http://arxiv.org/abs/2405.19575v1 )

ライセンス: Link先を確認

Umar Ibrahim, Abubakar Yakubu Zandam, Fatima Muhammad Adam, Aminu Musa,

(参考訳) Aspect-based Sentiment Analysis (ABSA) は、感情のニュアンスを理解するのに不可欠である。本稿では,感情分析研究においてあまり表現されない言語であるハウサ映画レビューにおいて,アスペクトと極性分類に適した新しいDeep Convolutional Neural Network(CNN)モデルを提案する。 Hausa ABSAデータセットが作成され、リソースの可用性に大きなギャップを埋める。このデータセットは、TF-IDF変換のためにsci-kit-learnを使用して前処理され、手動で注釈付きアスペクトレベルの特徴オントロジーワードと感情極性割り当てを含む。提案モデルでは,CNNとアスペクトワード予測の注意機構を組み合わせることで,文脈情報と感情の極性を活用する。アスペクト項抽出の91%、感情極性分類の92%の精度で、モデルは従来のマシンモデルより優れており、特定のアスペクトや感情に関する洞察を提供する。本研究は、ABSA研究、特に表現不足言語において、異文化間言語研究に影響を及ぼす。

Aspect-based Sentiment Analysis (ABSA) is crucial for understanding sentiment nuances in text, especially across diverse languages and cultures. This paper introduces a novel Deep Convolutional Neural Network (CNN)-based model tailored for aspect and polarity classification in Hausa movie reviews, an underrepresented language in sentiment analysis research. A comprehensive Hausa ABSA dataset is created, filling a significant gap in resource availability. The dataset, preprocessed using sci-kit-learn for TF-IDF transformation, includes manually annotated aspect-level feature ontology words and sentiment polarity assignments. The proposed model combines CNNs with attention mechanisms for aspect-word prediction, leveraging contextual information and sentiment polarities. With 91% accuracy on aspect term extraction and 92% on sentiment polarity classification, the model outperforms traditional machine models, offering insights into specific aspects and sentiments. This study advances ABSA research, particularly in underrepresented languages, with implications for cross-cultural linguistic research.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# 情報システムマネジメントの転換 : ディジタルエンジニアリング統合のための参照モデル

Transforming Information Systems Management: A Reference Model for Digital Engineering Integration ( http://arxiv.org/abs/2405.19576v1 )

ライセンス: Link先を確認

John Bonar, John Hastings,

(参考訳) デジタルエンジニアリングの実践は、情報保証とシステムライフサイクル管理を改善するために、重要かつ未使用のポテンシャルを提供する。本稿では、モデルベースのエンジニアリング、デジタルスレッド、統合された製品ライフサイクルといった機能が、一般的なフレームワークのギャップにどのように対処できるかを検討する。参照モデルは、参照情報システムにデジタルエンジニアリング技術を適用し、トレーサビリティ、リスク可視性、正確性、統合性を示す。モデルは、ビューをまたいだ権威的な要素を再利用しながら、戦略的ニーズと要求とアーキテクチャを結びつける。モデルの分析は、デジタルエンジニアリングがコンプライアンス、監視、変更管理、リスクアセスメントのギャップを埋めることを示している。デジタルエンジニアリングの採用が目的であることは、サイバーセキュリティ、オペレーション、サービス配信、システムガバナンスを、包括的なデジタルシステム表現を通じて変革する可能性があることを示している。この研究は、組織がインフラを近代化し、デジタルトランスフォーメーションを追求するときに、情報システムにデジタル工学の応用を成熟させる基盤を提供する。

Digital engineering practices offer significant yet underutilized potential for improving information assurance and system lifecycle management. This paper examines how capabilities like model-based engineering, digital threads, and integrated product lifecycles can address gaps in prevailing frameworks. A reference model demonstrates applying digital engineering techniques to a reference information system, exhibiting enhanced traceability, risk visibility, accuracy, and integration. The model links strategic needs to requirements and architecture while reusing authoritative elements across views. Analysis of the model shows digital engineering closes gaps in compliance, monitoring, change management, and risk assessment. Findings indicate purposeful digital engineering adoption could transform cybersecurity, operations, service delivery, and system governance through comprehensive digital system representations. This research provides a foundation for maturing application of digital engineering for information systems as organizations modernize infrastructure and pursue digital transformation.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# スピン系における安定化器レニーエントロピーのための非平衡量子モンテカルロアルゴリズム

Non-equilibrium Quantum Monte Carlo Algorithm for Stabilizer Rényi Entropy in Spin Systems ( http://arxiv.org/abs/2405.19577v1 )

ライセンス: Link先を確認

Zejun Liu, Bryan K. Clark,

(参考訳) 量子マジック(英: Quantum magic, nonstabilizerness)は、安定化状態を持つ古典的なシミュラビリティに関する量子系の重要な特徴である。本研究では,量子魔法の尺度の1つである安定化器R'enyiエントロピーを,サインプロブレム自由ハミルトニアンを持つスピン系で計算するための,新しい,効率的なアルゴリズムを提案する。このアルゴリズムは、2つの分割関数のアンサンブル間の作業の経路積分の量子モンテカルロシミュレーションに基づいており、全ての空間次元と温度に適用される。このアルゴリズムは, 有限温度と零温度の両方で1次元および2次元の逆場Isingモデル上で実演し, テンソルネットワークに基づくアルゴリズムと定量的に一致することを示す。さらに,計算コストを解析し,解析的および数値的証拠の両方をシステムサイズの多項式として提供する。

Quantum magic, or nonstabilizerness, provides a crucial characterization of quantum systems, regarding the classical simulability with stabilizer states. In this work, we propose a novel and efficient algorithm for computing stabilizer R\'enyi entropy, one of the measures for quantum magic, in spin systems with sign-problem free Hamiltonians. This algorithm is based on the quantum Monte Carlo simulation of the path integral of the work between two partition function ensembles and it applies to all spatial dimensions and temperatures. We demonstrate this algorithm on the one and two dimensional transverse field Ising model at both finite and zero temperatures and show the quantitative agreements with tensor-network based algorithms. Furthermore, we analyze the computational cost and provide both analytical and numerical evidences for it to be polynomial in system size.

翻訳日:2024-05-31 18:46:29 公開日:2024-05-29

# EvaGaussian:Blurry画像からのガウス散乱を支援するイベントストリーム

EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images ( http://arxiv.org/abs/2405.20224v1 )

ライセンス: Link先を確認

Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian,

(参考訳) 3次元ガウススプラッティング(3D-GS)は、3次元シーン再構成と新しいビュー合成において例外的な機能を示した。しかし、その訓練は高品質でシャープな画像と正確なカメラポーズに大きく依存している。これらの要件を満たすことは、高速移動カメラや長時間露光を必要とする低照度環境において、モーションブルーのイメージが一般的に遭遇する、理想的でない現実世界のシナリオでは難しい。これらの課題に対処するために,イベントストリーム支援ガウシアンスプラッティング(EvaGaussians)を紹介した。これは,イベントカメラがキャプチャしたイベントストリームを統合して,ぼやけた画像から高品質な3D-GSを再構築する,新たなアプローチである。イベントカメラによって提供される高時間分解能とダイナミックレンジを利用して、イベントストリームを利用して、動きブル画像の形成過程を明示的にモデル化し、3D-GSの劣化する再構築を導く。露出時間中に3D-GSパラメータを共同で最適化し,カメラモーショントラジェクトリを復元することにより,複雑なテクスチャの詳細を持つ高忠実度新規ビューの獲得を確実なものにすることができる。提案手法を網羅的に評価し,従来の最先端のデブロアレンダリング手法と比較した。定性的・定量的な比較は, ぼやけた画像から細部を復元し, 高忠実なノベルビューを創出する上で, 既存の手法を超越していることを示す。

3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light environments that require long exposure times. To address these challenges, we introduce Event Stream Assisted Gaussian Splatting (EvaGaussians), a novel approach that integrates event streams captured by an event camera to assist in reconstructing high-quality 3D-GS from blurry images. Capitalizing on the high temporal resolution and dynamic range offered by the event camera, we leverage the event streams to explicitly model the formation process of motion-blurred images and guide the deblurring reconstruction of 3D-GS. By jointly optimizing the 3D-GS parameters and recovering camera motion trajectories during the exposure time, our method can robustly facilitate the acquisition of high-fidelity novel views with intricate texture details. We comprehensively evaluated our method and compared it with previous state-of-the-art deblurring rendering methods. Both qualitative and quantitative comparisons demonstrate that our method surpasses existing techniques in restoring fine details from blurry images and producing high-fidelity novel views.

翻訳日:2024-05-31 13:29:24 公開日:2024-05-29

# ViTGAN:視覚変換器を用いたガン訓練

ViTGAN: Training GANs with Vision Transformers ( http://arxiv.org/abs/2107.04589v2 )

ライセンス: Link先を確認

Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu,

(参考訳) 近年、視覚変換器(ViT)は、視覚固有の誘導バイアスを少なくしながら、画像認識に競争力を発揮している。本稿では,このような性能を画像生成に拡張できるかどうかについて検討する。この目的のために、我々はViTアーキテクチャをGAN(Generative Adversarial Network)に統合する。 ViT差別者に対しては、既存のGANの正規化手法が自己注意と不適切な相互作用をし、トレーニング中に深刻な不安定を生じさせることが観察された。この問題を解決するために、我々は、VTを用いたGANのトレーニングのための新しい正規化手法をいくつか導入する。 ViTジェネレータに対しては,収束を容易にするため,潜在層と画素マッピング層のアーキテクチャ選択について検討する。実証的に、我々のアプローチはViTGANと呼ばれ、CIFAR-10、CelebA、LSUN寝室という3つのデータセット上で、主要なCNNベースのGANモデルに匹敵する性能を実現している。

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# MRCpy: 最小リスク分類のためのライブラリ

MRCpy: A Library for Minimax Risk Classifiers ( http://arxiv.org/abs/2108.01952v4 )

ライセンス: Link先を確認

Kartheek Bondugula, Verónica Álvarez, José I. Segovia-Martín, Aritz Pérez, Santiago Mazuelas,

(参考訳) 教師付き分類のためのライブラリは、機械学習手法の幅広い使用を可能にしている。既存のライブラリであるScikit-learn、 Caret、mlpackは、古典的な経験的リスク最小化(ERM)アプローチに基づいた技術を実装している。我々は,ロバストリスク最小化(RRM)アプローチに基づいて,ミニマックスリスク分類器 (MRC) を実装したPythonライブラリ MRCpy を提案する。このライブラリは、性能保証を提供し、高次元での効率的な学習を可能にし、分散シフトに適応できる複数の MRC を提供する。 MRCpyはオブジェクト指向のアプローチに従い、Scikit-learnのような人気のあるPythonライブラリの標準に準拠している。ソースコードはGPL-3.0ライセンスでhttps://github.com/MachineLearningBCAM/MRCpyで入手できる。

Libraries for supervised classification have enabled the wide-spread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The library offers multiple variants of MRCs that can provide performance guarantees, enable efficient learning in high dimensions, and adapt to distribution shifts. MRCpy follows an object-oriented approach and adheres to the standards of popular Python libraries, such as scikit-learn, facilitating readability and easy usage together with a seamless integration with other libraries. The source code is available under the GPL-3.0 license at https://github.com/MachineLearningBCAM/MRCpy.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# PARIS:睡眠の質を高めるためのパーソナライズされたアクティビティ勧告

PARIS: Personalized Activity Recommendation for Improving Sleep Quality ( http://arxiv.org/abs/2110.13745v2 )

ライセンス: Link先を確認

Meghna Singh, Saksham Goel, Abhiraj Mohan, Jaideep Srivastava,

(参考訳) 睡眠の質は、人々の身体的および精神的な健康に大きな影響を与えます。睡眠不足の人は、身体的・精神的苦痛、活動制限、不安、痛みを報告しやすい。さらに、ここ数年、アクティビティ監視やヘルストラッキングのためのアプリケーションやデバイスが爆発的に増えている。これらのウェアラブルデバイスから収集された信号は、睡眠品質の研究と改善に利用することができる。本稿では,身体活動と睡眠の質の関係を利用して,機械学習技術を用いて睡眠改善を支援する方法を提案する。通常、人々はいくつかの行動モードを持ち、その生体機能は分割できる。アクティビティデータに基づいて時系列クラスタリングを行うと、特定の対象に対して最も明白な行動モードと相関するクラスタセンターが見つかる。アクティビティのレシピは、各クラスタ内の各動作モードに対して、適切な睡眠品質のために生成される。これらのアクティビティレシピは、日々のルーチン中に被験者にリラックスした、激しいアクティビティの混合を提案するためのアクティビティ推奨エンジンに供給される。推奨は、睡眠の質の向上を目的とし、年齢、性別、体重指数(BMI)、安静時心拍数など、被験者のライフスタイルの制約に基づいてさらにパーソナライズされる。これは、心拍数を下げたり、睡眠の全体的な品質を改善したりといった長期的な健康目標に役立ちます。

The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking. Signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we utilize the relationship between physical activity and sleep quality to find ways of assisting people improve their sleep using machine learning techniques. People usually have several behavior modes that their bio-functions can be divided into. Performing time series clustering on activity data, we find cluster centers that would correlate to the most evident behavior modes for a specific subject. Activity recipes are then generated for good sleep quality for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine for suggesting a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints, i.e. their age, gender, body mass index (BMI), resting heart rate, etc, with the objective of the recommendation being the improvement of that night's quality of sleep. This would in turn serve a longer-term health objective, like lowering heart rate, improving the overall quality of sleep, etc.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# 交換対称コインを持つ2つの量子ウォーカー間の空間的絡み合い

Spatial entanglement between two quantum walkers with exchange symmetric coins ( http://arxiv.org/abs/2203.00873v2 )

ライセンス: Link先を確認

Ibrahim Yahaya Muhammad, Tanapat Deesuwan, Sikarin Yoo-Kong, Suwat Tangwancharoen, Monsit Tanasittikosol,

(参考訳) 本研究では、2つのコイン状態間の初期および最終交換対称性が、2つの量子ウォーカー間の空間的絡み合いのダイナミクスにどのように影響するかを検討する。特に、初期状態が反対称であり、コインの最終的な測定結果が対称的な結果をもたらすとき、硬貨がいつ測定されたかに関わらず、すべての初期絡み合いは空間的な自由度に転送される。逆に、最終的な結果が反対称であれば、空間絡み合いは減衰振動を示し、周期(T$)はコイン演算子パラメータ(\theta$)に逆比例する。これらの挙動は対称初期状態に対して逆転する。さらに,選択後の結果に対称性がない場合,初期状態に関わらず,同じ空間的絡み合い減衰を観測する。我々の研究結果は、量子ウォークにおける対称性が絡み合いのダイナミクスにどのように影響するかを明らかにし、量子技術への応用に対する潜在的な洞察を提供する。

We investigate how the initial and final exchange symmetries between the two-coin states influence the spatial entanglement dynamics between the two corresponding quantum walkers. Notably, when the initial state is anti-symmetric and the final measurement on the coins yields symmetric outcomes, all the initial entanglement will be transferred to the spatial degrees of freedom, regardless of when the coins are measured. Conversely, if the final outcomes are anti-symmetric, the spatial entanglement exhibits damped oscillation with a period ($T$) being inversely proportional to the coin operator parameter ($\theta$). These behaviours are reversed for symmetric initial states. Moreover, we also observe the same spatial entanglement damping regardless of the initial state when the post-selected results lack symmetry. Our findings reveal how symmetries affect the entanglement dynamics in quantum walks, offering potential insights for applications in quantum technology.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# 双方向オンライン市場のための動的マッチングバンド

Dynamic Matching Bandit For Two-Sided Online Markets ( http://arxiv.org/abs/2205.03699v3 )

ライセンス: Link先を確認

Yuantong Li, Chi-hua Wang, Guang Cheng, Will Wei Sun,

(参考訳) 両面のオンラインマッチングプラットフォームは、様々な市場で採用されている。しかし、現在の市場でのエージェントの好みは通常暗黙的で未知であるため、データから学ぶ必要がある。意思決定プロセスに関わる動的側情報の増加に伴い、現代のオンラインマッチング手法では、コンテキスト情報に基づいてエージェントのシフト好みを追跡する能力が要求される。これにより,この動的オンラインマッチング問題とコンテキスト情報との新たな枠組みが提案され,マッチング決定における動的嗜好が実現される。既存の作業はオンラインマッチングと静的な嗜好に重点を置いているが、これは不十分である。本稿では,この問題に適応する動的マッチング帯域幅アルゴリズムを提案する。提案する動的マッチングアルゴリズムの鍵となる要素は、統計的保証を伴う選好ランクのオンライン推定である。理論的には,提案した動的マッチングアルゴリズムは,高い確率でエージェント-最適安定マッチング結果を提供する。特に、対数的後悔上限$\mathcal{O}(\log(T))$を証明し、対応するインスタンス依存の後悔下限を構築する。実験では、動的マッチングアルゴリズムは、様々な選好スキーム、文脈の次元、報奨雑音レベル、文脈変動レベルに対して堅牢であることを示し、求職市場への適用により、提案手法の実用性をさらに実証する。

Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side information involved in the decision process, modern online matching methodology demands the capability to track shifting preferences for agents based on contextual information. This motivates us to propose a novel framework for this dynamic online matching problem with contextual information, which allows for dynamic preferences in matching decisions. Existing works focus on online matching with static preferences, but this is insufficient: the two-sided preference changes as soon as one side's contextual information updates, resulting in non-static matching. In this paper, we propose a dynamic matching bandit algorithm to adapt to this problem. The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee. Theoretically, we show that the proposed dynamic matching algorithm delivers an agent-optimal stable matching result with high probability. In particular, we prove a logarithmic regret upper bound $\mathcal{O}(\log(T))$ and construct a corresponding instance-dependent matching regret lower bound. In the experiments, we demonstrate that dynamic matching algorithm is robust to various preference schemes, dimensions of contexts, reward noise levels, and context variation levels, and its application to a job-seeking market further demonstrates the practical usage of the proposed method.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# 単一原子からの絡み合った多光子グラフ状態の効率的な生成

Efficient generation of entangled multi-photon graph states from a single atom ( http://arxiv.org/abs/2205.12736v2 )

ライセンス: Link先を確認

Philip Thomas, Leonardo Ruscio, Olivier Morin, Gerhard Rempe,

(参考訳) 絡み合いは強力な概念であり、科学的・技術的進歩の可能性を秘めている。現代の研究の中心は、絡み合った状態の生成と制御を少数のキュービットから多くのキュービットに拡張し、それらをデコヒーレンスから保護することである。これらのクビットキャリアは自然に頑丈で操作が容易であるため、光子は顕著な役割を担っている。しかし、フォトニックエンタングルメントを作成するための最も成功した技術は本質的に確率的であり、従って拡張性に厳しい制限が課せられる。ここでは、空洞内に単一のメモリ原子を持つ決定論的プロトコルを実装することにより、これらを回避する。調整原子軌道回転による単一光子放射をインターリーブし,グリーンベルガー・ホルン・ゼリンジャー状態の最大14光子と最大12光子の線形クラスター状態のそれぞれ76(6)%,56(4)%を効率よく成長させる。光子1個あたり43.18(7)%のソース対検出効率のおかげで、これらの大きな状態は1分間に1度、以前のどの実験よりも桁違いに速く測定できる。将来的には、この速度をさらに高めることができ、このスキームは空洞内の2つの原子に拡張したり、量子力学的に結合して高次元のクラスター状態を生成することができる。フォトニックエンタングルメント生成の確率的スキームによって生じる限界を克服し、我々はスケーラブルな計測に基づく量子計算と通信の方法を提案する。

Entanglement is a powerful concept with an enormous potential for scientific and technological advances. A central focus in modern research is to extend the generation and control of entangled states from few to many qubits, and protect them against decoherence. Optical photons play a prominent role as these qubit carriers are naturally robust and easy to manipulate. However, the most successful technique to date for creating photonic entanglement is inherently probabilistic and therefore subject to severe scalability limitations. Here we avoid these by implementing a deterministic protocol with a single memory atom in a cavity. We interleave controlled single-photon emissions with tailored atomic qubit rotations to efficiently grow Greenberger-Horne-Zeilinger states of up to 14 photons and linear cluster states of up to 12 photons with a fidelity lower bounded by 76(6)% and 56(4)%, respectively. Thanks to a source-to-detection efficiency of 43.18(7)% per photon we measure these large states about once every minute, orders of magnitude faster than in any previous experiment. In the future, this rate could be increased even further, the scheme could be extended to two atoms in a cavity, or several sources could be quantum mechanically coupled, to generate higher-dimensional cluster states. Overcoming the limitations encountered by probabilistic schemes for photonic entanglement generation, our results may offer a way towards scalable measurement-based quantum computation and communication.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# ツイスト付き等変微分(TED)K-理論における任意の位相次数

Anyonic Topological Order in Twisted Equivariant Differential (TED) K-Theory ( http://arxiv.org/abs/2206.13563v2 )

ライセンス: Link先を確認

Hisham Sati, Urs Schreiber,

(参考訳) 等変K-理論による非相互作用性結晶性トポロジカル絶縁相の分類は広く受け入れられているが、それ故に、トポロジカルブレイド量子ゲートを支持する位相的に秩序づけられた基底状態を持つ位相への一般化は、広く開放されている。それとは対照的に、相互作用しない位相を分類するK-理論の成功は、相互作用する位相秩序のK-理論的な分類を先導するものとして暗黙的に認識され、代わりに他の提案の混合が検討されている。しかしながら、K-理論だけが価電子の実際の物理学と密接に結びついており、自己整合性は、他の任意の提案がK-理論に結び付けなければならないことを要求している。ここでは、結晶のブリルアントーラス orbi-オリエンティフォールド内の結節点の補間における点の構成空間のツイスト等変微分(TED)K理論により、特に相互作用する2d半金属において、対称性の保護/エンハンスSU(2)-アノニックトポロジー次数の分類について詳細に議論する。特に、(1) 位相的 2d 半金属相 modulo 大域質量項は、直交点の補集合の平坦な微分等式 K-理論によって分類される; (2) n-電子相互作用相は、ブリルアントーラスの n 個の点の構成空間のK-理論によって分類される; (3) 等式 K-電子の「インナー局所系」による幾分無視されたねじれは、フェルミ粒子を任意の量子aに変換するChen, Wilczeck, Witten & Halperin (1989) の効果的な「実」ゲージ相互作用を反映する; (4) 誘導されたSu(2)-電子相互作用相は、相互作用束の構成空間の相互作用のチャーンクラスに反映される、という主張を論じる。

While the classification of non-interacting crystalline topological insulator phases by equivariant K-theory has become widely accepted, its generalization to anyonic interacting phases -- hence to phases with topologically ordered ground states supporting topological braid quantum gates -- has remained wide open. On the contrary, the success of K-theory with classifying non-interacting phases seems to have tacitly been perceived as precluding a K-theoretic classification of interacting topological order; and instead a mix of other proposals has been explored. However, only K-theory connects closely to the actual physics of valence electrons; and self-consistency demands that any other proposal must connect to K-theory. Here we provide a detailed argument for the classification of symmetry protected/enhanced su(2)-anyonic topological order, specifically in interacting 2d semi-metals, by the twisted equivariant differential (TED) K-theory of configuration spaces of points in the complement of nodal points inside the crystal's Brillouin torus orbi-orientifold. We argue, in particular, that: (1) topological 2d semi-metal phases modulo global mass terms are classified by the flat differential twisted equivariant K-theory of the complement of the nodal points; (2) n-electron interacting phases are classified by the K-theory of configuration spaces of n points in the Brillouin torus; (3) the somewhat neglected twisting of equivariant K-theory by "inner local systems" reflects the effective "fictitious" gauge interaction of Chen, Wilczeck, Witten & Halperin (1989), which turns fermions into anyonic quanta; (4) the induced su(2)-anyonic topological order is reflected in the twisted Chern classes of the interacting valence bundle over configuration space, constituting the hypergeometric integral construction of monodromy braid representations.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# ポテンシャルエネルギー損失による分断学習の保護

Protecting Split Learning by Potential Energy Loss ( http://arxiv.org/abs/2210.09617v2 )

ライセンス: Link先を確認

Fei Zheng, Chaochao Chen, Lingjuan Lyu, Xinyi Fu, Xing Fu, Weiqiang Wang, Xiaolin Zheng, Jianwei Yin,

(参考訳) 実践的なプライバシー保護学習手法として、スプリットラーニングは学術や産業で注目を集めている。しかし、中間結果がトレーニングと推論中に共有されるため、そのセキュリティは常に疑問視されている。本稿では,分割学習の前方埋め込みによるプライバシー漏洩に着目した。具体的には、フォワード埋め込みにはラベルに関する情報が多すぎるため、攻撃者はいくつかのラベル付きサンプルを使用してトップモデルを微調整するか、クラスタリングのような教師なしのアタックを実行して、フォワード埋め込みから真のラベルを推測することができる。このようなプライバシリークを防止するため,同じクラスの埋め込みを決定境界に向かって押し進めることで,フォワード埋め込みをより複雑にするための潜在的なエネルギー損失を提案する。したがって、攻撃者が前方埋め込みから学ぶことは困難である。実験の結果,本手法は細調整攻撃とクラスタリング攻撃の両方の性能を著しく低下させることがわかった。

As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent such kind of privacy leakage, we propose the potential energy loss to make the forward embeddings become more 'complicated', by pushing embeddings of the same class towards the decision boundary. Therefore, it is hard for the attacker to learn from the forward embeddings. Experiment results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# 教師なし領域適応による画像編集のためのGANインバージョン

GAN Inversion for Image Editing via Unsupervised Domain Adaptation ( http://arxiv.org/abs/2211.12123v2 )

ライセンス: Link先を確認

Siyu Xing, Chen Gong, Hewei Guo, Xiao-Yu Zhang, Xinwen Hou, Yu Liu,

(参考訳) 既存のGANインバージョン手法は、より一般的な低品質(LQ)入力に苦しむ一方で、高品質(HQ)イメージの再構築に優れる。この問題に対処するために、HQおよびLQ画像の効果的な逆変換と編集のために、Unsupervised Domain Adaptation (UDA) をインバージョンプロセス、すなわち UDA-inversion として提案する。未ペアのHQイメージをソースドメインとして、LQイメージを未ラベルのターゲットドメインとして、対象ドメインの損失値がソースドメインの損失によって上界となるという理論的保証と、2つのドメイン間の差を測定する新しい差分関数を導入する。その後、この上限を最小化してHQおよびLQ画像の正確な潜時符号を得る。これにより、HQ画像の構成的表現を自然に学習し、監督なしでLQ画像に変換することができる。 UDA-InversionはFFHQデータセットで22.14のPSNRを実現し、教師付きメソッドと互換性がある。

Existing GAN inversion methods work brilliantly in reconstructing high-quality (HQ) images while struggling with more common low-quality (LQ) inputs in practical application. To address this issue, we propose Unsupervised Domain Adaptation (UDA) in the inversion process, namely UDA-inversion, for effective inversion and editing of both HQ and LQ images. Regarding unpaired HQ images as the source domain and LQ images as the unlabeled target domain, we introduce a theoretical guarantee: loss value in the target domain is upper-bounded by loss in the source domain and a novel discrepancy function measuring the difference between two domains. Following that, we can only minimize this upper bound to obtain accurate latent codes for HQ and LQ images. Thus, constructive representations of HQ images can be spontaneously learned and transformed into LQ images without supervision. UDA-Inversion achieves a better PSNR of 22.14 on FFHQ dataset and performs comparably to supervised methods.

翻訳日:2024-05-31 02:51:07 公開日:2024-05-29

# 確率的文脈 MDP のためのエルダーベースレグレット

Eluder-based Regret for Stochastic Contextual MDPs ( http://arxiv.org/abs/2211.14932v3 )

ライセンス: Link先を確認

Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour,

(参考訳) 本稿では,確率的マルコフ決定過程(CMDP)における後悔最小化のためのE-UC$^3$RLアルゴリズムを提案する。このアルゴリズムは、実現可能な関数クラスと \emph{offline} 最小二乗およびログ損失回帰オラクルへのアクセスという最小の仮定の下で機能する。我々のアルゴリズムは効率が良く(効率的なオフライン回帰オラクルを仮定すると)、$ \widetilde{O}(H^3 \sqrt{T |S| |A|d_{\mathrm{E}}(\mathcal{P}) \log (|\mathcal{F}||\mathcal{P}|/\delta) )} の後悔を保証する。我々の知る限り、我々のアルゴリズムは、一般的なオフライン関数近似設定の下で動作しているCMDPに対する、最初の効率的かつレート最適後悔最小化アルゴリズムである。さらに、エルダー次元を別の興味を持つような一般有界測度に拡張する。

We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of realizable function class and access to \emph{offline} least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $ \widetilde{O}(H^3 \sqrt{T |S| |A|d_{\mathrm{E}}(\mathcal{P}) \log (|\mathcal{F}| |\mathcal{P}|/ \delta) )}) , $ with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, $\mathcal{P}$ and $\mathcal{F}$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively, and $d_{\mathrm{E}}(\mathcal{P})$ is the Eluder dimension of $\mathcal{P}$ w.r.t the Hellinger distance. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting. In addition, we extend the Eluder dimension to general bounded metrics which may be of separate interest.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# Quota と Complementary Preferences Constraint による両面競合型マッチング推奨市場

Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints ( http://arxiv.org/abs/2301.10230v3 )

ライセンス: Link先を確認

Yuantong Li, Guang Cheng, Xiaowu Dai,

(参考訳) 本稿では,エージェントの嗜好が未知であり,データから学習しなければならないような,相補的な嗜好とクォータ制約を伴う双方向のオンラインマッチング市場の課題に対処する,新たなレコメンデーションアルゴリズムを提案する。混合クォータと相補的な選好制約の存在は、マッチングプロセスの不安定性を招き、この問題を解決するのが難しくなる。この課題を克服するために、バンドレート学習フレームワークとして問題を定式化し、マルチエージェントマルチタイプトンプソンサンプリング(MMTS)アルゴリズムを提案する。このアルゴリズムは、トンプソンサンプリングの強度と新しい二重マッチング手法を組み合わせて、安定したマッチング結果を提供する。我々の理論的分析は、MMTSが安定性を達成できることを示すものであり、全会社のクォータ$Q$、利用可能なタイプワーカーの最大サイズの平方根である$\sqrt{K_{\max}T}}とタイムホライズン$T$に対して線形性を示す高い確率で、合計$\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-Bayesian regretを持つ。さらに,様々な環境下でのMMTSの有効性についてもシミュレーション研究を行った。実験で使われたコードを提供しています。

In this paper, we propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints, where agents' preferences are unknown a priori and must be learned from data. The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process, making this problem challenging to solve. To overcome this challenge, we formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm. The algorithm combines the strengths of Thompson Sampling for exploration with a new double matching technique to provide a stable matching outcome. Our theoretical analysis demonstrates the effectiveness of MMTS as it can achieve stability and has a total $\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-Bayesian regret with high probability, which exhibits linearity with respect to the total firm's quota $Q$, the square root of the maximum size of available type workers $\sqrt{K_{\max}}$ and time horizon $T$. In addition, simulation studies also demonstrate MMTS's effectiveness in various settings. We provide code used in our experiments \url{https://github.com/Likelyt/Double-Matching}.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# アルゴリズム設計型ニューラルネットワーク(ADANNs):パラメトリック偏微分方程式の高次深部演算子学習

Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations ( http://arxiv.org/abs/2302.03286v2 )

ライセンス: Link先を確認

Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger,

(参考訳) 本稿では,パラメータ偏微分方程式(PDE)に関連する近似演算子に対する新しいディープラーニング手法を提案する。特に、特定の近似問題に適した特定のANN初期化スキームとともに、特定の人工知能ニューラルネットワーク(ANN)アーキテクチャを設計する新しい戦略を導入する。提案手法では,高速な古典的数値近似手法と深層演算子学習手法を併用する。具体的には、既存のANNアーキテクチャのカスタマイズされた適応と、これらのANNアーキテクチャの特別な初期化を導入し、初期化時に、ANNが検討された近似問題に対して、選択された最適化された古典的数値アルゴリズムを忠実に模倣するようにした。得られたANNアーキテクチャとその初期化スキームは、数値アルゴリズムや文学からの一般的なディープラーニング手法に強く影響を受けており、その意味では、アルゴリズム設計されたニューラルネットワーク(ADANN)として、アルゴリズムで作成された初期化スキームとともに導入されたANNを参照する。いくつかのパラメトリックPDEの場合,提案手法を数値的に検証する。検証された数値例では、ADANN手法は既存の近似アルゴリズムと、文献からのディープラーニングの方法論を著しく上回っている。

In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptions of existing ANN architectures together with specialized initializations for these ANN architectures so that at initialization we have that the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing traditional approximation algorithms as well as existing deep operator learning methodologies from the literature.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# プライベート推定におけるサブセットベースインスタンス最適性

Subset-Based Instance Optimality in Private Estimation ( http://arxiv.org/abs/2303.01262v3 )

ライセンス: Link先を確認

Travis Dick, Alex Kulesza, Ziteng Sun, Ananda Theertha Suresh,

(参考訳) 本稿では,差分プライベート推定アルゴリズムのインスタンス最適性を新たに定義する。私たちの定義では、データセットの$D$に対して同時に競合する最適なアルゴリズムと、最高のプライベートベンチマークアルゴリズムが必要です。 (a)事前にD$を知っており、 (b) は$D$の大規模なサブセット上での最悪のパフォーマンスによって評価される。つまり、潜在的に極端な点が$D$に加算された場合、ベンチマークアルゴリズムはうまく動作しない。これにより、ベンチマークは以前の作業で提案されたベンチマークよりも大幅に強くなります。それにもかかわらず、実際の評価されたデータセットに対して、手段、量子化、および$\ell_p$-norm最小化を含む幅広いデータセット特性のクラスを推定する際に、インスタンス最適性の概念を達成するプライベートアルゴリズムを構築する方法を示す。具体的には,提案アルゴリズムが既存アルゴリズムの漸近的性能と同時に一致するか,あるいは超えていることを示す。

We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# 安全-帯域制限下における中本合意のトレードオフ-

Security--Throughput Tradeoff of Nakamoto Consensus under Bandwidth Constraints ( http://arxiv.org/abs/2303.09113v3 )

ライセンス: Link先を確認

Lucianna Kiffer, Joachim Neu, Srivatsan Sridhar, Aviv Zohar, David Tse,

(参考訳) セキュリティとパフォーマンスのトレードオフの古典的な問題を再考する: 限られた能力を持つノードのネットワークが与えられると、特定のブロックの生産速度に対して、敵の力の何パーセントがNCであるのか? 制限された通信や計算資源といった現実的な制約を捉えない有界遅延モデルであるため、中本プロトコルの最先端分析ではこの問題に答えられていない。境界帯域モデルを用いて,PoW中本コンセンサスに対するセキュリティ・パフォーマンストレードオフを改良した解析手法を開発した。このモデルでは,従来の有界遅延モデルとは対照的に,中本氏の私的攻撃はもはや最悪の攻撃ではなく,帯域幅の制限によるネットワーク混雑を悪用したティーシング戦略と呼ばれる新たな攻撃戦略が著しく悪化していることが示されている。 PoSでは、同化ブロックは、非常に低いブロック生成率を除いて、従来のPoS Nakamotoコンセンサスプロトコルの安全性を損なうため、混雑を悪化させる可能性がある。このような均等なスパムに対処するため、我々はBlanking NC (BlaNC) と呼ぶPoS NCプロトコルの変種を提示し、PoW NCと同じレジリエンスを実現する。

For Nakamoto's longest-chain consensus protocol, whose proof-of-work (PoW) and proof-of-stake (PoS) variants power major blockchains such as Bitcoin and Cardano, we revisit the classic problem of the security-performance tradeoff: Given a network of nodes with limited capacities, against what fraction of adversary power is Nakamoto consensus (NC) secure for a given block production rate? State-of-the-art analyses of Nakamoto's protocol fail to answer this question because their bounded-delay model does not capture realistic constraints such as limited communication- and computation-resources. We develop a new analysis technique to prove a refined security-performance tradeoff for PoW Nakamoto consensus in a bounded-bandwidth model. In this model, we show that, in contrast to the classic bounded-delay model, Nakamoto's private attack is no longer the worst attack, and a new attack strategy we call the teasing strategy, that exploits the network congestion caused by limited bandwidth, is strictly worse. In PoS, equivocating blocks can exacerbate congestion, making the traditional PoS Nakamoto consensus protocol insecure except at very low block production rates. To counter such equivocation spamming, we present a variant of the PoS NC protocol we call Blanking NC (BlaNC), which achieves the same resilience as PoW NC.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# 拡散モデルのセマンティック潜在空間における解釈的方向の発見

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models ( http://arxiv.org/abs/2303.11073v2 )

ライセンス: Link先を確認

René Haas, Inbar Huberman-Spiegelglas, Rotem Mulayoff, Stella Graßhof, Sami S. Brandt, Tomer Michaeli,

(参考訳) Denoising Diffusion Models (DDM) はGenerative Adversarial Networks (GAN) と強力な競合関係にある。しかし、画像合成や編集に広く用いられているにもかかわらず、その潜在空間はいまだによく理解されていない。近年,「$h$-space」とよばれるDDMのセマンティック潜在空間が,GANを連想させる形でセマンティック画像編集を容易にすることが示されている。 h$-スペースは拡散過程のすべての時間ステップでDDMのデノイザーのボトルネックアクティベーションから成り立っている。本稿では,h-spaceの特性について検討し,その中に意味のある意味的方向を求めるための新しい手法を提案する。まず、事前訓練されたDDMにおける解釈可能な意味方向を明らかにするための教師なし手法の研究から始める。具体的には,グローバル潜伏方向が潜伏空間の主成分として現れることを示す。さらに,遅延符号のデノイザWr.t.のヤコビアンのスペクトル解析により,画像固有の意味方向を検出する新しい手法を提案する。次に、非条件DDMにおける教師付き方式で方向を求めることで分析を拡張した。実画像のラベル付きデータセットか、ドメイン固有の属性分類器で生成されたサンプルにアノテートすることで、そのような方向を見つけることができることを示す。さらに、簡単な線形射影により、検出された方向を意味的に切り離す方法を示す。私たちのアプローチは、アーキテクチャの変更、テキストベースのガイダンス、CLIPベースの最適化、モデル微調整を必要とせずに適用できます。

Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs). However, despite their widespread use in image synthesis and editing applications, their latent space is still not as well understood. Recently, a semantic latent space for DDMs, coined `$h$-space', was shown to facilitate semantic image editing in a way reminiscent of GANs. The $h$-space is comprised of the bottleneck activations in the DDM's denoiser across all timesteps of the diffusion process. In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it. We start by studying unsupervised methods for revealing interpretable semantic directions in pretrained DDMs. Specifically, we show that global latent directions emerge as the principal components in the latent space. Additionally, we provide a novel method for discovering image-specific semantic directions by spectral analysis of the Jacobian of the denoiser w.r.t. the latent code. Next, we extend the analysis by finding directions in a supervised fashion in unconditional DDMs. We demonstrate how such directions can be found by relying on either a labeled data set of real images or by annotating generated samples with a domain-specific attribute classifier. We further show how to semantically disentangle the found direction by simple linear projection. Our approaches are applicable without requiring any architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# グラフ上のランダム逆問題:分散オンライン学習

Random Inverse Problems Over Graphs: Decentralized Online Learning ( http://arxiv.org/abs/2303.11789v5 )

ライセンス: Link先を確認

Tao Li, Xiwei Zhang,

(参考訳) ネットワークグラフ上の分散ランダム逆問題のフレームワークをオンライン測定で構築し,分散化されたオンライン学習アルゴリズムを提案する。これはヒルベルト空間における分散パラメータ推定と、再現されたカーネルヒルベルト空間(RKHS-LMS)における最小平均平方問題を統一する。我々は、アルゴリズムの収束を、L2有界なマルティンゲール差項を持つヒルベルト空間における不均一なランダム差分方程式のクラスにおける漸近安定性に変換し、ヒルベルト空間におけるL2-漸近安定性理論を開発する。ネットワークグラフが連結され、フォワード作用素の列が励起条件の無限次元時空間持続性を満たすならば、全てのノードの見積もりは平均二乗であり、ほぼ確実に一致している。さらに,RKHSにおける非定常および非独立なオンラインデータストリームに基づく分散オンライン学習アルゴリズムを提案し,ランダム入力データによって誘導される演算子が励振条件の無限次元時空間持続性を満たす場合,そのアルゴリズムが平均二乗でほぼ確実に整合であることを証明した。

We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2 -asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# SoftED:時系列イベント検出のソフト評価基準

SoftED: Metrics for Soft Evaluation of Time Series Event Detection ( http://arxiv.org/abs/2304.00439v2 )

ライセンス: Link先を確認

Rebecca Salles, Janio Lima, Rafaelli Coutinho, Esther Pacitti, Florent Masseglia, Reza Akbarinia, Chao Chen, Jonathan Garibaldi, Fabio Porto, Eduardo Ogasawara,

(参考訳) 時系列イベント検出法は,検出精度にのみ焦点をあてた標準分類基準によって評価される。しかし、事象を検出する不正確さは、しばしば、隣り合う検出に反映される前のまたは遅れた影響から生じる。これらの検出は、必要なアクションをトリガーしたり、不満足な結果を軽減するのに役立ちます。この文脈では、現在のメトリクスは不十分であり、イベント検出のコンテキストには不十分である。近隣の検知に対する時間と時間的寛容の両方の概念を取り入れたメトリクスの需要がある。本稿では,イベント検出手法のソフトアセスメントのために設計された,新しいメトリクスセットであるSoftEDメトリクスを紹介する。これにより、検出精度と、その検出がイベントを表す程度の両方を評価することができる。彼らは、通常の分類指標と比較して、時間的寛容性を36倍以上の実験に取り入れ、イベントと代表的検出を関連付けることにより、イベント検出の評価を改善した。 SoftEDメトリクスは、検出評価とメソッド選択への貢献を示すドメインスペシャリストによって検証された。

Time series event detection methods are evaluated mainly by standard classification metrics that focus solely on detection accuracy. However, inaccuracy in detecting an event can often result from its preceding or delayed effects reflected in neighboring detections. These detections are valuable to trigger necessary actions or help mitigate unwelcome consequences. In this context, current metrics are insufficient and inadequate for the context of event detection. There is a demand for metrics that incorporate both the concept of time and temporal tolerance for neighboring detections. This paper introduces SoftED metrics, a new set of metrics designed for soft evaluating event detection methods. They enable the evaluation of both detection accuracy and the degree to which their detections represent events. They improved event detection evaluation by associating events and their representative detections, incorporating temporal tolerance in over 36\% of experiments compared to the usual classification metrics. SoftED metrics were validated by domain specialists that indicated their contribution to detection evaluation and method selection.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# 複合組織均質化のためのニューラルネットワークトランスモデル

A Neural Network Transformer Model for Composite Microstructure Homogenization ( http://arxiv.org/abs/2304.07877v2 )

ライセンス: Link先を確認

Emil Pitz, Kishore Pochiraju,

(参考訳) 複合組織における不均一性と不確実性は、厳密にモデル化された場合の計算ボトルネックか、応力場における解の不正確さと、近似された場合の故障予測につながる。任意の構造と非線形構造を解析するのに適する手法は存在するが、その計算コストは大規模構造解析において実用的ではない。サロゲートモデル (Surrogate Model) またはリダクションオーダーモデル (ROM) は、一般的に効率を高めるが、通常は単一のマイクロ構造でキャリブレーションされる。森田中法のような均質化法は、幅広い構成特性に対して急速な均質化をもたらす。しかし、位相における応力やひずみ平均化のような仮定を単純化することは、ミクロ構造における決定論的および確率的バリエーションの両方を考慮すべきである。本稿では,様々なミクロ構造や構成成分の知識を捉えるトランスフォーマーニューラルネットワークアーキテクチャについて述べる。エラストプラストマトリックス内の線形弾性繊維の任意の合成ミクロ構造のイメージや抽象化が与えられた場合、トランスフォーマーネットワークは、履歴依存、非線形、均質化されたストレス-ひずみ応答を予測する。主成分分析 (PCA) を用いた2点統計計算と, 畳み込みニューラルネットワーク (CNN) を用いたオートエンコーダを用いた。どちらの手法も、均質化物質応答を正確に予測する。開発されたトランスニューラルネットワークは、様々なミクロ構造に一般化可能で拡張可能な、ミクロ構造間翻訳の効率的な手段を提供する。本稿では,サイクリングおよびランダム負荷下でのネットワークアーキテクチャ,データ生成のトレーニングとテスト,パフォーマンスについて述べる。

Heterogeneity and uncertainty in a composite microstructure lead to either computational bottlenecks if modeled rigorously or to solution inaccuracies in the stress field and failure predictions if approximated. Although methods suitable for analyzing arbitrary and non-linear microstructures exist, their computational cost makes them impractical to use in large-scale structural analysis. Surrogate models or Reduced Order Models (ROMs) commonly enhance efficiencies but are typically calibrated with a single microstructure. Homogenization methods, such as the Mori-Tanaka method, offer rapid homogenization for a wide range of constituent properties. However, simplifying assumptions, like stress and strain averaging in phases, render the consideration of both deterministic and stochastic variations in microstructure infeasible. This paper illustrates a transformer neural network architecture that captures the knowledge of various microstructures and constituents, enabling it to function as a computationally efficient homogenization surrogate model. Given an image or an abstraction of an arbitrary composite microstructure of linearly elastic fibers in an elastoplastic matrix, the transformer network predicts the history-dependent, non-linear, and homogenized stress-strain response. Two methods for encoding microstructure features were tested: calculating two-point statistics using Principal Component Analysis (PCA) for dimensionality reduction and employing an autoencoder with a Convolutional Neural Network (CNN). Both methods accurately predict the homogenized material response. The developed transformer neural network offers an efficient means for microstructure-to-property translation, generalizable and extendable to a variety of microstructures. The paper describes the network architecture, training and testing data generation, and performance under cycling and random loadings.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# PLIP:人物表現学習のための言語画像事前学習

PLIP: Language-Image Pre-training for Person Representation Learning ( http://arxiv.org/abs/2305.08386v2 )

ライセンス: Link先を確認

Jialong Zuo, Jiahao Hong, Feng Zhang, Changqian Yu, Hanyu Zhou, Changxin Gao, Nong Sang, Jingdong Wang,

(参考訳) 言語イメージ事前学習は、一般的なドメインにおける強力な表現を学習するための効果的なテクニックである。しかし、直接人体表現学習を行う場合、これらの一般的な事前学習法は不満足な性能に悩まされる。理由は、批判的な人物の特徴、すなわちきめ細かい属性やアイデンティティを無視するからである。この問題に対処するために,PLIPと呼ばれる人物表現学習のための新しい言語画像事前学習フレームワークを提案する。具体的には、3つのプレテキストタスクを精巧に設計する。 1) テキスト誘導画像のカラー化は,人物関連画像領域と微粒なカラー部分のテキストフレーズとの対応性を確立することを目的としている。 2【画像誘導属性予測】は、画像中の人物の微粒な属性情報をマイニングすることを目的とする。 3) アイデンティティベースのVision-Language Contrastは、インスタンスレベルではなく、アイデンティティレベルでのクロスモーダル表現の相関を目指している。さらに,事前トレーニングフレームワークを実装するために,SynTH-PEDESという画像テキストペアを用いた大規模人物データセットを構築し,テキストアノテーションを自動生成する。我々は、SynTH-PEDES上でPLIPを事前訓練し、下流の人中心のタスクにまたがってモデルを評価する。 PLIPはこれらのタスクの既存のメソッドを大幅に改善するだけでなく、ゼロショットやドメインの一般化設定でも優れた機能を示している。コード、データセット、重み付けは~\url{https://github.com/Zplusdragon/PLIP} でリリースされる。

Language-image pre-training is an effective technique for learning powerful representations in general domains. However, when directly turning to person representation learning, these general pre-training methods suffer from unsatisfactory performance. The reason is that they neglect critical person-related characteristics, i.e., fine-grained attributes and identities. To address this issue, we propose a novel language-image pre-training framework for person representation learning, termed PLIP. Specifically, we elaborately design three pretext tasks: 1) Text-guided Image Colorization, aims to establish the correspondence between the person-related image regions and the fine-grained color-part textual phrases. 2) Image-guided Attributes Prediction, aims to mine fine-grained attribute information of the person body in the image; and 3) Identity-based Vision-Language Contrast, aims to correlate the cross-modal representations at the identity level rather than the instance level. Moreover, to implement our pre-train framework, we construct a large-scale person dataset with image-text pairs named SYNTH-PEDES by automatically generating textual annotations. We pre-train PLIP on SYNTH-PEDES and evaluate our models by spanning downstream person-centric tasks. PLIP not only significantly improves existing methods on all these tasks, but also shows great ability in the zero-shot and domain generalization settings. The code, dataset and weights will be released at~\url{https://github.com/Zplusdragon/PLIP}

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# 物理インフォームドニューラルネットワークの効率的な誤り証明

Efficient Error Certification for Physics-Informed Neural Networks ( http://arxiv.org/abs/2305.10157v2 )

ライセンス: Link先を確認

Francisco Eiras, Adel Bibi, Rudy Bunel, Krishnamurthy Dj Dvijotham, Philip Torr, M. Pawan Kumar,

(参考訳) 最近の研究は、物理情報ニューラルネットワーク(PINN)が偏微分方程式(PDE)を効率的に解くことができるという有望な証拠を提供している。しかし、従来の研究では、時空間領域をまたいだPINNの最悪の残差(数値解法の耐性に類似した尺度)を保証できなかった。実世界のアプリケーションでは、有限の点集合に対するテストは、異なる集合におけるパフォーマンスが著しく悪化する可能性があるため、配置のための十分な基盤とみなすことはできない。この問題を緩和するため、PINNの継続的な適用性ドメインに対するエラーベースの条件を保証します。 PINN残差エラーをバインドする汎用的で効率的でスケーラブルなポストトレーニングフレームワークである$\partial$-CROWNを導入する。本稿では,古典的に研究されている2つのPINN(Burgers' と Schr\odinger' の方程式)と,Allan-Cahn と Diffusion-Sorption の2つの実世界の応用において,より難易度の高い証明を得る上での有効性を示す。

Recent work provides promising evidence that Physics-Informed Neural Networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish guaranteed error-based conditions for PINNs over their continuous applicability domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PINNs - Burgers' and Schr\"odinger's equations -, and two more challenging ones with real-world applications - the Allan-Cahn and Diffusion-Sorption equations.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# UP5:Fairness-Aware RecommendationのためのUnbiased Foundation Model

UP5: Unbiased Foundation Model for Fairness-aware Recommendation ( http://arxiv.org/abs/2305.12090v2 )

ライセンス: Link先を確認

Wenyue Hua, Yingqiang Ge, Shuyuan Xu, Jianchao Ji, Yongfeng Zhang,

(参考訳) LLM(Large Language Models)のような基礎モデルの最近の進歩は、それらをRecommender Systems(RS)の最前線へと押し上げている。実用性にもかかわらず、LSMが社会的ステレオタイプを必然的に持続させ、不当な勧告をもたらすのではないかという懸念が高まっている。本論文は,多くのユーザが意思決定や需要充足のために考えるように,RSにとって公平性は不可欠であるため,性別や年齢などの特定の敏感な特徴に公平であるように推奨するレコメンデーションに対して,ユーザ側の公正性に焦点をあてる。本稿では,LLM ベースのレコメンデーションモデルにおいて,T5 と LLaMA のバックボーンをベースとしたレコメンデーションモデルが示す不公平さの程度について検討し,LLM ベースのレコメンデーションモデルにおけるユーザの公平な扱いを促進するための適切な方法について議論する。フェアネスを意識したLLMレコメンデーションのための新しいCFP法をUnbiased Foundation mOdels(UFO)に導入する。実世界の2つのデータセットであるMovieLens-1MとInsurationで実験を行い、マッチングベースとシーケンシャルベースの両方のフェアネス対応レコメンデーションモデルと比較した。その結果,CFPは高い公正度でより優れたレコメンデーション性能が得られることがわかった。データとコードはhttps://github.com/agiresearch/UP5.comで公開されている。

Recent advances in Foundation Models such as Large Language Models (LLMs) have propelled them to the forefront of Recommender Systems (RS). Despite their utility, there is a growing concern that LLMs might inadvertently perpetuate societal stereotypes, resulting in unfair recommendations. Since fairness is critical for RS as many users take it for decision-making and demand fulfillment, this paper focuses on user-side fairness for LLM-based recommendation where the users may require a recommender system to be fair on specific sensitive features such as gender or age. In this paper, we dive into the extent of unfairness exhibited by LLM-based recommender models based on both T5 and LLaMA backbones, and discuss appropriate methods for promoting equitable treatment of users in LLM-based recommendation models. We introduce a novel Counterfactually-Fair-Prompt (CFP) method towards Unbiased Foundation mOdels (UFO) for fairness-aware LLM-based recommendation. Experiments are conducted on two real-world datasets, MovieLens-1M and Insurance, and compared with both matching-based and sequential-based fairness-aware recommendation models. Results show that CFP achieves better recommendation performance with a high level of fairness. Data and code are open-sourced at https://github.com/agiresearch/UP5.

翻訳日:2024-05-31 02:41:05 公開日:2024-05-29

# InstructVid2Vid:自然言語による制御可能なビデオ編集

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions ( http://arxiv.org/abs/2305.12328v2 )

ライセンス: Link先を確認

Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang,

(参考訳) InstructVid2Vidは人間の言語指導による映像編集のためのエンドツーエンド拡散方式である。我々のアプローチは、自然言語ディレクティブによって案内される映像操作を強化し、サンプルごとの微調整や逆変換の必要性を排除します。提案したInstructVid2Vidモデルは、予め訓練された画像生成モデルであるStable Diffusionを変更して、ビデオフレームの時間依存シーケンスを生成する。異なるモデルの集合的インテリジェンスを活用することで、私たちは、実世界のシナリオでデータを収集するよりコスト効率の良い代替手段として、ビデオインストラクション三脚に富んだトレーニングデータセットを構築しました。生成したビデオ内の連続したフレーム間のコヒーレンスを高めるために、フレーム間一貫性損失を提案し、トレーニングプロセス中にそれを組み込む。推論段階におけるマルチモーダル分類器フリーガイダンスにより、生成されたビデオは、入力されたビデオと付随する命令の両方に共鳴することができる。実験結果から,InstructVid2Vidは高品質で時間的コヒーレントなビデオを生成し,属性編集や背景変更,スタイル転送などの多様な編集を行うことができることがわかった。これらの結果は,提案手法の汎用性と有効性を示すものである。

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion. The proposed InstructVid2Vid model modifies a pretrained image generation model, Stable Diffusion, to generate a time-dependent sequence of video frames. By harnessing the collective intelligence of disparate models, we engineer a training dataset rich in video-instruction triplets, which is a more cost-efficient alternative to collecting data in real-world scenarios. To enhance the coherence between successive frames within the generated videos, we propose the Inter-Frames Consistency Loss and incorporate it during the training process. With multimodal classifier-free guidance during the inference stage, the generated videos is able to resonate with both the input video and the accompanying instructions. Experimental results demonstrate that InstructVid2Vid is capable of generating high-quality, temporally coherent videos and performing diverse edits, including attribute editing, background changes, and style transfer. These results underscore the versatility and effectiveness of our proposed method.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# ランジュバンモンテカルロアルゴリズムによる非対数・非平滑サンプリング

Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms ( http://arxiv.org/abs/2305.15988v2 )

ライセンス: Link先を確認

Tim Tsz-Kit Lau, Han Liu, Thomas Pock,

(参考訳) マルチモーダル性のため,低次元でもしばしば困難であるガウス混合系(例えば,ガウス混合系)からの近似サンプリング問題について検討する。我々はマルコフ連鎖モンテカルロ法(MCMC)を用いてこの課題を実行することに集中する。さらに, 近位MCMC法が開発されている2つの非平滑症例にも関心がある。 (i)非滑らかな前者は、ガウス混合とみなす。 (II)ラプラシアン混合分布このような非滑らかで非log-concaveサンプリングタスクは、ベイズ推論や画像のデコンボリューションのような逆問題の画像化への幅広い応用から生じる。我々は,最もよく用いられるLangevin Monte Carloアルゴリズムの性能を比較するために数値シミュレーションを行う。

We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases for which a large class of proximal MCMC methods have been developed: (i) a nonsmooth prior is considered with a Gaussian mixture likelihood; (ii) a Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling tasks arise from a wide range of applications to Bayesian inference and imaging inverse problems such as image deconvolution. We perform numerical simulations to compare the performance of most commonly used Langevin Monte Carlo algorithms.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# ベイズ原理によるニューラル付加モデルの改善

Improving Neural Additive Models with Bayesian Principles ( http://arxiv.org/abs/2305.16905v4 )

ライセンス: Link先を確認

Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin,

(参考訳) ニューラル加算モデル(NAM)は、個別の加算サブネットワークにおける入力特徴を扱うことにより、ディープニューラルネットワークの透明性を高める。しかし、それらは校正された不確実性を提供し、関連する特徴や相互作用の選択を可能にする固有のメカニズムを欠いている。ベイズの観点から NAM にアプローチすることで、我々はこれらを3つの主要な方法で拡張する。 a) 個別の付加的部分ネットワークに信頼性のある期間を提供することロ経験的ベイズ手続により特徴の暗黙的な選択を行うための限界確率を推定すること。 c) 微調整されたモデルにおける二階相互作用の候補としての機能対のランク付けを容易にすること。特にLaplace-approximated NAMs (LA-NAMs) を開発した。

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 自己教師付き学習のための行列情報理論

Matrix Information Theory for Self-Supervised Learning ( http://arxiv.org/abs/2305.17326v6 )

ライセンス: Link先を確認

Yifan Zhang, Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan,

(参考訳) 最大エントロピー符号化フレームワークは、SimSiam、Barlow Twins、MECといった多くの非コントラスト学習手法に対して統一的な視点を提供する。このフレームワークに着想を得たMatrix-SSLは,行列情報理論を利用して最大エントロピー符号化損失を行列均一性損失として解釈する手法である。さらに、Matrix-SSLは、行列アライメント損失をシームレスに取り込み、異なる分岐に共分散行列を直接アライメントすることで、最大エントロピー符号化法を強化する。実験結果から, Matrix-SSLは, 線形評価条件下でのImageNetデータセットや, 伝達学習タスクのためのMS-COCO上で, 最先端の手法よりも優れていることがわかった。具体的には,MS-COCO上で伝達学習を行う場合,MoCo v2やBYOLといった従来のSOTA手法よりも3.3%向上し,800エポックの事前学習に比べて400エポックに留まった。また,行列クロスエントロピー損失を用いた7Bモデルを微調整し,標準クロスエントロピー損失に対するGSM8Kデータセットのマージンを3.1%とすることで,表現学習を言語モデリングシステムに導入する。コードはhttps://github.com/yifanzhang-pro/Matrix-SSLで公開されている。

The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# フローガイド型ナノスケールローカライゼーションの設計空間の展望

Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization ( http://arxiv.org/abs/2305.18493v2 )

ライセンス: Link先を確認

Filip Lemic, Gerard Calvo Bartra, Arnau Brosa López, Jorge Torres Gómez, Jakob Struye, Falko Dressler, Sergi Abadal, Xavier Costa Perez,

(参考訳) Terahertz(THz)をベースとした無線通信機能を備えたナノデバイスは、ヒトの血流内におけるフロー誘導局在のプライマーを提供する。このようなローカライゼーションは、知覚された事象の場所をイベント自体に割り当てることを可能にし、早期かつ正確な診断の線に沿って精度の高い医療の恩恵を与え、コストと侵襲性を低減させる。フロー誘導型ローカライゼーションはまだ初歩的な段階であり、この問題を対象とする研究はごくわずかである。それにもかかわらず、提案手法の性能評価は、通常、単一の性能指標に沿って、そのようなスケール(例えば、ナノデバイスの限られたエネルギー)と、そのような困難な環境(例えば、体内のTHz伝搬の極端減衰)で関係する様々な側面を無視する非標準化方法で既に実施されている。このように、これらの評価は現実主義のレベルが低く、客観的に比較することはできない。この問題に対処するために、我々はシナリオの環境およびスケールに関連する特質を説明し、その精度や信頼性などの不均一なパフォーマンス指標に沿って、最先端のフロー誘導型ローカライゼーションアプローチの2つの性能を評価する。

Nanodevices with Terahertz (THz)-based wireless communication capabilities are providing a primer for flow-guided localization within the human bloodstreams. Such localization is allowing for assigning the locations of sensed events with the events themselves, providing benefits in precision medicine along the lines of early and precise diagnostics, and reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric, and ignoring various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and for such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared in an objective way. Toward addressing this issue, we account for the environmental and scale-related peculiarities of the scenario and assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics such as the accuracy and reliability of localization.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# Vocos: 高品質音声合成のための時間領域とフーリエベースニューラルボコーダのギャップを埋める

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis ( http://arxiv.org/abs/2306.00814v3 )

ライセンス: Link先を確認

Hubert Siuzdak,

(参考訳) ニューラルヴォコーディングの最近の進歩は、主に時間領域で動作するジェネレーティブ・アドバイサル・ネットワーク(GAN)によって駆動される。このアプローチは有効であるが、時間周波数表現による帰納バイアスを無視し、再帰的かつ計算集約的なアップサンプリング操作をもたらす。フーリエに基づく時間周波数表現は、人間の聴覚知覚とより正確に一致し、その計算のために確立された高速アルゴリズムの恩恵を受ける、魅力的な代替手段である。それでも、複雑な値を持つ分光器の直接再構成は歴史的に問題であり、主に位相回復の問題が原因である。本研究は、フーリエスペクトル係数を直接生成する新しいモデルであるVocosを提示することで、このギャップを埋めようとしている。我々の評価で示されているように、Vocosは音質の最先端に適合するだけでなく、計算効率も大幅に向上し、時間-ドメインのニューラル・ヴォコーディング・アプローチに比べて処理速度が大幅に向上する。ソースコードとモデルの重み付けはhttps://github.com/gemelo-ai/vocos.comでオープンソース化された。

Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time-domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in reduntant and computionally-intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefitting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that directly generates Fourier spectral coefficients. Vocos not only matches the state-of-the-art in audio quality, as demonstrated in our evaluations, but it also substantially improves computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. The source code and model weights have been open-sourced at https://github.com/gemelo-ai/vocos.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 交流場センサとしてのフロケット時間結晶

Floquet time-crystals as sensors of AC fields ( http://arxiv.org/abs/2306.03927v4 )

ライセンス: Link先を確認

Fernando Iemini, Rosario Fazio, Anna Sanpera,

(参考訳) 離散時間結晶で示される長距離空間秩序と時間秩序は、極端に弱い信号を感知する際に有利な特性となる。本稿では、弱いACフィールドの量子センサとしての性能について検討し、量子フィッシャー情報測定を用いて、長時間の尋問を可能としながら、ショットノイズ限界を克服できることを実証する。このようなシステムでは、集団間相互作用はノイズに対してその力学を安定させ、不完全をプロトコル化するのに十分堅牢である。

The long range spatial and temporal ordering displayed by discrete time crystals, can become advantageous properties when used for sensing extremely weak signals. Here, we investigate their performance as quantum sensors of weak AC-fields and demonstrate, using the quantum Fisher information measure, that they can overcome the shot noise limit while allowing long interrogation times. In such systems, collective interactions stabilize their dynamics against noise making them robust enough to protocol imperfections.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 確率的崩壊: より単純なサブネットに向けたSGDダイナミクスのグラディエントノイズの抽出方法

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks ( http://arxiv.org/abs/2306.04251v3 )

ライセンス: Link先を確認

Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli,

(参考訳) 本研究では,より単純なサブネットワークに過度に表現的ネットワークを駆動する確率勾配降下(SGD)の強い暗黙バイアスを明らかにし,独立パラメータの数を劇的に削減し,一般化を改善する。このバイアスを明らかにするために、SGD によって修正されないパラメータ空間の不変集合や部分集合を同定する。我々は、より単純な(スパースまたはローランクの)サブネットワークに対応する不変集合の2つのクラスに焦点を合わせ、モダンアーキテクチャに一般的に現れる。解析により、SGDはこれらの単純不変集合に対する確率的誘引性の性質を示すことが明らかとなった。我々は,不変量集合の周囲のロスランドスケープの曲率と,確率勾配によってもたらされる雑音との競合に基づいて,確率的誘引性の十分な条件を確立する。顕著なことに、ノイズのレベルが増大すると魅力が増し、サドルポイントや列車損失の局所的な最大値に関連する魅力的な不変集合が出現する。我々は、訓練されたディープニューラルネットワークにおける魅力的な不変集合の存在を経験的に観察し、SGDのダイナミクスがしばしば消滅または冗長なニューロンを持つ単純なサブネットに崩壊することを示す。さらに、この確率的崩壊の単純化プロセスが、線形教師学生フレームワークの一般化にどう役立つかを実証する。最後に、この分析を通じて、長期にわたる学習率の高い早期学習が、その後の一般化に有効である理由を機械的に説明する。

In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler (sparse or low-rank) subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# Decom--CAM: tell me you see, in details! Feature-Level Interpretation via Decomposition Class Activation Map

Decom--CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map ( http://arxiv.org/abs/2306.04644v2 )

ライセンス: Link先を確認

Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Juan Zhang, Xuan Gong, Baochang Zhang,

(参考訳) ディープラーニングの解釈は非常に難しい問題です。クラスアクティベーションマップ(CAM)は、オブジェクトの位置を強調することによって深層モデルの予測を解釈するために広く使われているが、決定を行うためにモデルが使用する健全な機能についての洞察を得られていない。さらに、既存の評価プロトコルは、解釈可能性のパフォーマンスとモデルの判断品質の相関を見落とし、より根本的な問題を提示します。本稿では,2段階の解法である分解クラス活性化マップ(Decom-CAM)を提案する。 Decom-CAMは、特異値分解を用いて中間活性化写像を直交的特徴に分解し、それらの積分により塩分マップを生成する。特徴の直交性により、CAMは局所的な特徴を捉え、入力画像の目、鼻、顔などの意味的要素を特定できるため、深いモデル解釈にとってより有益である。包括的比較を保証するため、分類精度の結果に基づいてデータセットをサブセットに分割し、各サブセットの解釈可能性性能を別々に評価することで、新しい評価プロトコルを導入する。以上の結果から,Decom-CAMは,すべてのレベルの分類精度において,より高精度な精度マップを生成することにより,最先端の手法を著しく上回ることを示す。機能レベルの解釈可能性のアプローチと組み合わせることで、ディープニューラルネットワークの意思決定プロセスを理解するための新たな方向性の道を開くことができる。

Interpretation of deep learning remains a very challenging problem. Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions. Furthermore, existing evaluation protocols often overlook the correlation between interpretability performance and the model's decision quality, which presents a more fundamental issue. This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM), which offers a feature-level interpretation of the model's prediction. Decom-CAM decomposes intermediate activation maps into orthogonal features using singular value decomposition and generates saliency maps by integrating them. The orthogonality of features enables CAM to capture local features and can be used to pinpoint semantic components such as eyes, noses, and faces in the input image, making it more beneficial for deep model interpretation. To ensure a comprehensive comparison, we introduce a new evaluation protocol by dividing the dataset into subsets based on classification accuracy results and evaluating the interpretability performance on each subset separately. Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly by generating more precise saliency maps across all levels of classification accuracy. Combined with our feature-level interpretability approach, this paper could pave the way for a new direction for understanding the decision-making process of deep neural networks.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 量子ビット準備・測定シナリオにおける半対称情報完全測定の自己検査

Self-testing of semisymmetric informationally complete measurements in a qubit prepare-and-measure scenario ( http://arxiv.org/abs/2306.07248v4 )

ライセンス: Link先を確認

Gábor Drótos, Károly F. Pál, Tamás Vértesi,

(参考訳) 自己検査は量子システムを認証するための強力な方法である。当初、デバイス非依存(DI)設定で提案されていたセルフテストは、セミデバイス非依存(セミDI)設定に緩和された。本研究では,1パラメータファミリーに属する特定の種類の非射影量子ビット測定を,半DI準備・測定(PM)シナリオを用いて自己検査することに焦点を当てた。興味深いことに,これまでに発見された最も単純なPMシナリオは,4つの準備と4つの測定のみで,第4の測定を自己検査するためのものである。この測定は 4-アウトカムな非射影作用素評価測度 (POVM) であり、Geng et al [Phys. Rev. Lett. 126, 100401 (2021)] によって導入された半対称情報完備(半SIC) POVM のクラスに該当する。そこで我々は,PMシナリオにおけるセミDI自己検査の分析手法を開発した。我々の結果は、潜在的に最小限の PM シナリオ内で超極小の qubit POVM を自己テストする方法を開拓する。

Self-testing is a powerful method for certifying quantum systems. Initially proposed in the device-independent (DI) setting, self-testing has since been relaxed to the semi-device-independent (semi-DI) setting. In this study, we focus on the self-testing of a specific type of non-projective qubit measurements belonging to a one-parameter family, using the semi-DI prepare-and-measure (PM) scenario. Remarkably, we identify the simplest PM scenario discovered so far, involving only four preparations and four measurements, for self-testing the fourth measurement. This particular measurement is a four-outcome non-projective positive operator-valued measure (POVM) and falls in the class of semisymmetric informationally complete (semi-SIC) POVMs introduced by Geng et al. [Phys. Rev. Lett. 126, 100401 (2021)]. To achieve this, we develop analytical techniques for semi-DI self-testing in the PM scenario. Our results shall pave the way towards self-testing any extremal qubit POVM within a potentially minimal PM scenario.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# ニューラルサーフェスレンダリングによるクリーニングシーンにおけるAny-View 6DoFロボットグラスピングの学習

Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering ( http://arxiv.org/abs/2306.07392v4 )

ライセンス: Link先を確認

Snehal Jauhri, Ishikaa Lunawat, Georgia Chalvatzaki,

(参考訳) 現実世界のロボット操作において重要な課題は、追加のシーン探索を必要とせずに、あらゆる視点から散らばったシーンのオブジェクトを効果的につかむ6DoFである。この研究は、グルーピングをレンダリングとして再解釈し、6DoFグルーピング検出のための新しい方法であるNeuGraspNetを導入し、ニューラルボリューム表現とサーフェスレンダリングの進歩を活用する。ロボットのエンドエフェクタと物体の表面との相互作用を符号化し、共同学習により局所的な物体表面をレンダリングし、共有特徴空間における把握機能を学習する。このアプローチでは、グローバルな(シーンレベルの)特徴をつかむために、局所的な(グラフレベルの)神経表面の特徴をつかむために使用します。これにより、部分的に観察されたシーンであっても、有効で完全に暗黙的な6DoFのクオリティ予測が可能になる。 NeuGraspNetは、モバイル操作のシナリオに共通するランダムな視点で動作し、既存の暗黙的および半単純的把握方法より優れている。この手法の現実的な適用性は、オープンで散らばった空間をつかむ移動マニピュレータロボットで実証されている。 Project website at https://sites.google.com/view/neugraspnet

A significant challenge for real-world robotic manipulation is the effective 6DoF grasping of objects in cluttered scenes from any single viewpoint without the need for additional scene exploration. This work reinterprets grasping as rendering and introduces NeuGraspNet, a novel method for 6DoF grasp detection that leverages advances in neural volumetric representations and surface rendering. It encodes the interaction between a robot's end-effector and an object's surface by jointly learning to render the local object surface and learning grasping functions in a shared feature space. The approach uses global (scene-level) features for grasp generation and local (grasp-level) neural surface features for grasp evaluation. This enables effective, fully implicit 6DoF grasp quality prediction, even in partially observed scenes. NeuGraspNet operates on random viewpoints, common in mobile manipulation scenarios, and outperforms existing implicit and semi-implicit grasping methods. The real-world applicability of the method has been demonstrated with a mobile manipulator robot, grasping in open, cluttered spaces. Project website at https://sites.google.com/view/neugraspnet

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# DiffAug:ロバスト分類器の訓練のためのディフューズ・アンド・ディネーズ強化

DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers ( http://arxiv.org/abs/2306.09192v2 )

ライセンス: Link先を確認

Chandramouli Sastry, Sri Harsha Dumpala, Sageev Oore,

(参考訳) DiffAugは、画像分類器を訓練するための、単純で効率的な拡散に基づく拡張手法である。与えられた例にDiffAugを適用すると、1つの前方拡散ステップと1つの逆拡散ステップからなる。 ResNet-50アーキテクチャとVision Transformerアーキテクチャの両方を用いて、DiffAugで訓練された分類器を網羅的に評価し、コバリアレートシフトに対するロバスト性の向上、検証された逆精度、および分布検出における単一ステップ逆拡散の驚くべき効果を実証する。 DiffAugをAugMixやDeepAugmentのような他の拡張と組み合わせると、さらなる堅牢性の向上が示されます。最後に、このアプローチに基づいて分類器誘導拡散を改善する。 (i)分類器の一般化 (二)勾配品質(知覚アライメントの改善)、及び (iii)画像生成性能。そこで本稿では,新たなデータを必要としない頑健さを向上し,既存の拡張アプローチを効果的に補完する,計算効率のよいトレーニング手法を提案する。

We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step. Using both ResNet-50 and Vision Transformer architectures, we comprehensively evaluate classifiers trained with DiffAug and demonstrate the surprising effectiveness of single-step reverse diffusion in improving robustness to covariate shifts, certified adversarial accuracy and out of distribution detection. When we combine DiffAug with other augmentations such as AugMix and DeepAugment we demonstrate further improved robustness. Finally, building on this approach, we also improve classifier-guided diffusion wherein we observe improvements in: (i) classifier-generalization, (ii) gradient quality (i.e., improved perceptual alignment) and (iii) image generation performance. We thus introduce a computationally efficient technique for training with improved robustness that does not require any additional data, and effectively complements existing augmentation approaches.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 非対称初期状態からの電荷ゆらぎのダイナミクス

Dynamics of charge fluctuations from asymmetric initial states ( http://arxiv.org/abs/2306.12404v3 )

ライセンス: Link先を確認

Bruno Bertini, Katja Klobas, Mario Collura, Pasquale Calabrese, Colin Rylands,

(参考訳) 保存電荷密度は、量子多体系において非常に特殊な観測可能量であり、建設によって力学に関する情報を符号化する。したがって、それらの進化は一般的な観測可能なものよりもはるかに単純な解釈であり、任意の時間にシステムの状態に関する普遍的な情報を返すことが期待されている。ここでは、電荷非対称初期状態で準備された系における保存されたU(1)電荷のゆらぎのダイナミクスについて検討する。与えられたサブシステムの電荷ゆらぎを、切り刻まれた電荷のフルカウント統計と、電荷の対称性セクターに解決されたサブシステムと残りの部分の間の量子的絡み合いを用いて特徴づける。初期状態が空間において均質であるとしても、電荷揺らぎは初期状態の電荷非対称性に起因する有効不均一性を生成することを示す。この観測により、この問題を不均一な電荷対称状態上の電荷ゆらぎにマッピングし、最近開発された時空双対性アプローチを用いてそれを扱う。相互作用可能なシステムに対する処理を専門にすることで、時空双対性アプローチと一般化された流体力学を組み合わせて明確な予測を求める。

Conserved-charge densities are very special observables in quantum many-body systems as, by construction, they encode information about the dynamics. Therefore, their evolution is expected to be of much simpler interpretation than that of generic observables and to return universal information on the state of the system at any given time. Here we study the dynamics of the fluctuations of conserved U(1) charges in systems that are prepared in charge-asymmetric initial states. We characterise the charge fluctuations in a given subsystem using the full-counting statistics of the truncated charge and the quantum entanglement between the subsystem and the rest resolved to the symmetry sectors of the charge. We show that, even though the initial states considered are homogeneous in space, the charge fluctuations generate an effective inhomogeneity due to the charge-asymmetric nature of the initial states. We use this observation to map the problem into that of charge fluctuations on inhomogeneous, charge-symmetric states and treat it using a recently developed space-time duality approach. Specialising the treatment to interacting integrable systems we combine the space-time duality approach with generalised hydrodynamics to find explicit predictions.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 非エルミート型低温原子ポンプによるき裂線形応答

Kinked linear response from non-Hermitian cold-atom pumping ( http://arxiv.org/abs/2306.13139v2 )

ライセンス: Link先を確認

Fang Qin, Ruizhe Shen, Linhu Li, Ching Hua Lee,

(参考訳) 非エルミート系と非エルミート系が指数関数的に局所化された皮膚モードを持つことはよく知られている。しかし,本研究では, 急激な物理的衝動の欠如にもかかわらず, 非ハーモニティ性は量子気体の半古典波パケット軌道において, 急激で顕著なキツネを生じさせることがわかった。これは、すべての物理的カップリングが局所的であっても、非エルミートポンピングから過小評価された内在的非局所性に由来するため、不連続なバンド幾何学やベリー曲率につながるバンド構造における謎的な特異性をもたらす。具体的には,超低温原子配置におけるキンク応答の実現に焦点をあてる。実測実験では, 物理原子雲の微調整をせずに応答キンクを観察できるようなレーザー誘起損失を有する2次元光学格子の超低温原子配置を提案する。以上の結果から,非エルミチアン励起による非エルミチアン励起による特異な非単調な挙動が示され,超低温原子プラットフォーム上での非エルミチアン動力学研究の新たな道筋が示唆された。

It is well known that non-Hermitian, non-reciprocal systems may harbor exponentially localized skin modes. However, in this work, we find that, generically, non-Hermiticity gives rise to abrupt and prominent kinks in the semi-classical wave packet trajectories of quantum gases, despite the absence of sudden physical impulses. This physically stems from a hitherto underappreciated intrinsic non-locality from non-Hermitian pumping, even if all physical couplings are local, thereby resulting in enigmatic singularities in the band structure that lead to discontinuous band geometry and Berry curvature. Specifically, we focus on the realization of the kinked response in an ultracold atomic setup. For a concrete experimental demonstration, we propose an ultracold atomic setup in a two-dimensional optical lattice with laser-induced loss such that response kinks can be observed without fine-tuning in the physical atomic cloud dynamics. Our results showcase unique non-monotonic behavior from non-Hermitian pumping beyond the non-Hermitian skin effect and suggest new avenues for investigating non-Hermitian dynamics on ultracold atomic platforms.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# 高励起状態の断熱時間進化

Adiabatic time evolution of highly excited states ( http://arxiv.org/abs/2306.13967v3 )

ライセンス: Link先を確認

Hadi Yarloo, Hua-Chen Zhang, Anne E. B. Nielsen,

(参考訳) 量子システムの断熱時間進化は、状態の準備から計算の単純化、トポロジカル変換、最適化や量子コンピューティングに至るまで、広く使われているツールである。断熱時間進化は一般にギャップのある基底状態に対してうまく機能するが、保護エネルギーギャップが欠如しているスペクトルの中央の熱状態に対してはうまく機能しない。ここでは、特定の種類の高励起状態である量子多体傷が、エネルギーギャップの保護がないにもかかわらず断熱的な時間進化に適していることが示される。テンソルネットワークと2次元分数的量子ホールモデルから構築された2つのかなり異なるモデルを考えると、必要な最終断熱忠実度が約0.99のとき、量子不足は断熱力学に関してギャップ付き基底状態と類似する。 1次元モデルの傷痕状態が断熱的に変換できる最大速度は、一般的な熱と障害駆動の局所化状態の両方に対して指数関数的に減少するのに対し、システムサイズによるパワー則として減少する。傾斜速度が一定で非常に低い場合、一元性からの距離のずれはスカー状態のランプ速度と線形にスケールするが、ギャップのある地盤状態の2次的な変化は見いだされる。したがって、0.9999以上のような、必要となる断熱係数が非常に高い場合には、ギャップのある基底状態がより良く動作する。傷跡から漏れ出す2つのメカニズムを特定し,その結果を説明する。単一で孤立した基底状態の操作は量子的応用では一般的であるが、傷跡状態の断熱的進化は、単一のシステムで同時に基底状態のような状態の塔全体を操作できる柔軟性を提供する。

Adiabatic time evolution of quantum systems is a widely used tool with applications ranging from state preparation through simplifications of computations and topological transformations to optimization and quantum computing. Adiabatic time evolution generally works well for gapped ground states, but not for thermal states in the middle of the spectrum that lack a protecting energy gap. Here we show that quantum many-body scars -- a particular type of highly excited states -- are suitable for adiabatic time evolution despite the absence of a protecting energy gap. Considering two rather different models, namely a one-dimensional model constructed from tensor networks and a two-dimensional fractional quantum Hall model with anyons, we find that the quantum scars perform similarly to gapped ground states with respect to adiabatic dynamics when the required final adiabatic fidelity is around 0.99. The maximum speed at which the scar state of the one-dimensional model can be adiabatically transformed decreases as a power law with system size, as opposed to exponentially for both generic thermal and disorder-driven localized states. At constant and very low ramp speed, we find that the deviation of the fidelity from unity scales linearly with ramp speed for scar states, but quadratically for gapped ground states. The gapped ground states hence perform better when the required adiabatic fidelities are very high, such as 0.9999 and above. We identify two mechanisms for leakage out of the scar state and use them to explain our results. While manipulating a single, isolated ground state is common in quantum applications, adiabatic evolution of scar states provides the flexibility to manipulate an entire tower of ground-state-like states simultaneously in a single system.

翻訳日:2024-05-31 02:31:12 公開日:2024-05-29

# ノイズの多い量子木:補正なしの無限保護

Noisy Quantum Trees: Infinite Protection Without Correction ( http://arxiv.org/abs/2306.14294v3 )

ライセンス: Link先を確認

Shiv Akshar Yadavalli, Iman Marvian,

(参考訳) 我々は,木構造を持つ量子ネットワークについて研究し,そこでは根から葉まで情報を伝達する。ネットワーク内の各ノードにおいて、受信されたキュービットは、新しいアンシラ量子ビットと一元的に相互作用し、その後、各キュービットはノイズチャネルを介して次のレベルで別のノードに送信される。したがって,木深度が大きくなるにつれて,情報の非局在化によって達成されるノイズの可逆効果と,そのようなノイズに対する保護との競合が生じる。古典的な設定では、各ノードが入力ビットを複数の出力ビットにコピーするだけで、このモデルは広範に応用できる木上の放送や再構成の問題として研究されている。本研究では,この問題の量子バージョンについて検討する。本稿では,各ノードにおけるCliffordエンコーダについて検討し,各エッジに単一キュービットのPauliノイズチャネルとともに,入力キュービットを安定化器コードに符号化する。このようなノイズの多い量子木は、新鮮な(低エントロピー)アンシラ量子ビットのストリームにアクセスすることができるが、誤り訂正を行うことができないシナリオを記述している。したがって、それらは量子的フォールトトレランスに関して異なる視点を提供する。さらに、連結符号のエンコーダ内のノイズの影響を記述するための有用なモデルを提供する。距離やエンコーダの特性といった符号の性質に依存する特定の雑音閾値を超えると、情報は木の深さとともに指数関数的に減衰する。一方、ある効率的な復号器の研究により、距離d>=2の符号と十分小さい(しかしゼロでない)雑音に対して、古典的な情報と絡み合いは無限の深さの雑音木に伝播することを示した。実際、これは、各ノードに特定の2-qubitエンコーダを持つバイナリツリーでさえも当てはまり、受信したキュービットは、距離 d=1 のバイナリ反復符号で符号化される。

We study quantum networks with tree structures, in which information propagates from a root to leaves. At each node in the network, the received qubit unitarily interacts with fresh ancilla qubits, after which each qubit is sent through a noisy channel to a different node in the next level. Therefore, as the tree depth grows, there is a competition between the irreversible effect of noise and the protection against such noise achieved by delocalization of information. In the classical setting, where each node simply copies the input bit into multiple output bits, this model has been studied as the broadcasting or reconstruction problem on trees, which has broad applications. In this work, we study the quantum version of this problem. We consider a Clifford encoder at each node that encodes the input qubit in a stabilizer code, along with a single qubit Pauli noise channel at each edge. Such noisy quantum trees describe a scenario in which one has access to a stream of fresh (low-entropy) ancilla qubits, but cannot perform error correction. Therefore, they provide a different perspective on quantum fault tolerance. Furthermore, they provide a useful model for describing the effect of noise within the encoders of concatenated codes. We prove that above certain noise thresholds, which depend on the properties of the code such as its distance, as well as the properties of the encoder, information decays exponentially with the depth of the tree. On the other hand, by studying certain efficient decoders, we prove that for codes with distance d>=2 and for sufficiently small (but non-zero) noise, classical information and entanglement propagate over a noisy tree with infinite depth. Indeed, we find that this remains true even for binary trees with certain 2-qubit encoders at each node, which encode the received qubit in the binary repetition code with distance d=1.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# テキストアノテーションのためのオープンソースLCM:モデル設定と微調整のための実践的ガイド

Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning ( http://arxiv.org/abs/2307.02179v2 )

ライセンス: Link先を確認

Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Mohammadmasiha Zahedivafa, Juan Diego Bermeo, Maria Korobeynikova, Fabrizio Gilardi,

(参考訳) 本稿では、政治科学研究に典型的なテキスト分類タスクにおけるオープンソースのLarge Language Models(LLM)の性能について検討する。姿勢・話題・関連分類などの課題を調べることで,テキスト分析におけるLLMの使用に関する情報的判断を学者に指導することを目指す。具体的には、ニュース記事やつぶやきデータセットを用いたテキストアノテーションタスクにおいて、ゼロショットと微調整の両方のLDMの評価を行う。解析の結果、微調整によりオープンソースのLCMの性能が向上し、ゼロショットのGPT-3.5やGPT-4に匹敵する結果が得られるが、微調整のGPT-3.5には遅れが生じる。さらに,注釈付きテキストを比較的控えめな量で微調整を施すことが,少人数の訓練より望ましいことを確認した。この結果から,微調整されたオープンソース LLM は,幅広いテキストアノテーションアプリケーションに効果的に展開できることが示唆された。テキストアノテーションへのLLMの適用を容易にするPythonノートを提供する。

This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# 収束保証を用いたフェアネスを考慮したフェデレーションミニマックス最適化

Fairness-aware Federated Minimax Optimization with Convergence Guarantee ( http://arxiv.org/abs/2307.04417v3 )

ライセンス: Link先を確認

Gerry Windiarto Mohamad Dunda, Shenghui Song,

(参考訳) フェデレートラーニング(FL)はそのプライバシー保護機能のためにかなりの注目を集めている。それでも、ユーザデータ管理の自由の欠如は、モデルが人種や性別などのセンシティブな要因に偏っている、グループフェアネスの問題につながる可能性がある。そこで本研究では,FLにおけるグループフェアネス問題に明示的に対処するために,拡張ラグランジアン法(FFALM)を用いたフェアフェデレーション平均化アルゴリズムを提案する。具体的には、トレーニング目標に公正性制約を課し、制約付き最適化問題の最小化を解消する。次に、FFALMの収束率の理論上界を導出する。 FFALMの公正性向上効果は,CelebA および UTKFace データセットにおいて,統計的に重大な不均一性の存在下で実証的に示された。

Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models are biased towards sensitive factors such as race or gender. To tackle this issue, this paper proposes a novel algorithm, fair federated averaging with augmented Lagrangian method (FFALM), designed explicitly to address group fairness issues in FL. Specifically, we impose a fairness constraint on the training objective and solve the minimax reformulation of the constrained optimization problem. Then, we derive the theoretical upper bound for the convergence rate of FFALM. The effectiveness of FFALM in improving fairness is shown empirically on CelebA and UTKFace datasets in the presence of severe statistical heterogeneity.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# マルチモーダルからモノモーダルリンパ腫サブタイプモデルへの知識伝達のための視覚トランスフォーマーに基づくフレームワーク

A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models ( http://arxiv.org/abs/2308.01328v3 )

ライセンス: Link先を確認

Bilel Guetarni, Feryal Windal, Halim Benhabiles, Marianne Petit, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard,

(参考訳) リンパ腫の亜型を決定することは、生存率を高めるために患者治療を改善するための重要なステップである。この文脈では、遺伝子発現技術に依存している既存のゴールドスタンダード診断法は非常に高価で時間を要するため、アクセシビリティが低下する。 IHC(免疫組織化学)技術に基づく代替診断法は(WHOによって推奨されている)存在するが、同様の制限に悩まされており、精度は低い。ディープラーニングモデルを用いた全体スライド画像(WSI)解析は、既存の方法に対する費用対効果と迅速な代替手段を提供する、がん診断の有望な可能性を示している。本研究では,高分解能WSIからDLBCL(Diffuse Large B-Cell Lymphoma)癌サブタイプを識別するための視覚トランスフォーマーベースのフレームワークを提案する。この目的のために、様々なWSIモダリティから分類器モデルを訓練するためのマルチモーダルアーキテクチャを導入する。次に、知識蒸留プロセスを通じてこのモデルを活用し、モノモーダル分類器の学習を効率的に導く。 157例のリンパ腫データセットを用いて検討したところ, モノモーダル分類モデルの有望な性能が得られた。さらに,本実験データから推定したパワーロー曲線から,有意な患者数からのトレーニングデータが増えると,IHC技術との競合診断精度が向上する可能性が示唆された。さらに,外部乳癌データセット(BCIデータセット)に関する追加実験により,本フレームワークの有効性を確認した。

Determining lymphoma subtypes is a crucial step for better patient treatment targeting to potentially increase their survival chances. In this context, the existing gold standard diagnosis method, which relies on gene expression technology, is highly expensive and time-consuming, making it less accessibility. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. Whole Slide Image (WSI) analysis using deep learning models has shown promising potential for cancer diagnosis, that could offer cost-effective and faster alternatives to existing methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we introduce a multi-modal architecture to train a classifier model from various WSI modalities. We then leverage this model through a knowledge distillation process to efficiently guide the learning of a mono-modal classifier. Our experimental study conducted on a lymphoma dataset of 157 patients shows the promising performance of our mono-modal classification model, outperforming six recent state-of-the-art methods. In addition, the power-law curve, estimated on our experimental data, suggests that with more training data from a reasonable number of additional patients, our model could achieve competitive diagnosis accuracy with IHC technologies. Furthermore, the efficiency of our framework is confirmed through an additional experimental study on an external breast cancer dataset (BCI dataset).

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# GPLaSDI:Deep Autoencoderによるガウス過程に基づく解釈可能な遅延空間ダイナミクスの同定

GPLaSDI: Gaussian Process-based Interpretable Latent Space Dynamics Identification through Deep Autoencoder ( http://arxiv.org/abs/2308.05882v3 )

ライセンス: Link先を確認

Christophe Bonneville, Youngsoo Choi, Debojyoti Ghosh, Jonathan L. Belof,

(参考訳) 偏微分方程式(PDE)の数値解法は困難であり、計算コストも高い。これにより、精度は高いがフルオーダーモデル(FOM)よりも高速な低階モデル(ROM)が開発された。近年、機械学習の進歩により、LaSDI(Latent Space Dynamics Identification)のような非線形射影法が作成できるようになった。 LaSDIはオートエンコーダを用いて全階PDEソリューションを潜在空間にマッピングし、潜在空間力学を管理するODEのシステムを学ぶ。縮小潜時空間におけるODEシステムの補間と解法により、予測潜時空間力学をデコーダに供給することにより、高速かつ正確なROM予測を行うことができる。本稿では,遅延空間ODE補間のためのガウス過程(GP)に依存する新しいLaSDIベースのフレームワークであるGPLaSDIを紹介する。 GPを使うことには2つの大きな利点がある。まず、ROM予測に対する不確実性の定量化を可能にする。第二に、この予測の不確実性を活用することで、追加のトレーニングデータポイントの厳選による効率的な適応トレーニングが可能になる。このアプローチは、基礎となるPDEの事前知識を必要としない。したがって、GPLaSDI は本質的に非侵入的であり、既知の PDE やその残余のない問題に適用することができる。本稿では,バーガース方程式,プラズマ物理学におけるブラソフ方程式,熱バブル問題に対する我々のアプローチの有効性を実証する。提案手法は, 最大7%の相対誤差で200～10万倍の高速化を実現する。

Numerically solving partial differential equations (PDEs) can be challenging and computationally expensive. This has led to the development of reduced-order models (ROMs) that are accurate but faster than full order models (FOMs). Recently, machine learning advances have enabled the creation of non-linear projection methods, such as Latent Space Dynamics Identification (LaSDI). LaSDI maps full-order PDE solutions to a latent space using autoencoders and learns the system of ODEs governing the latent space dynamics. By interpolating and solving the ODE system in the reduced latent space, fast and accurate ROM predictions can be made by feeding the predicted latent space dynamics into the decoder. In this paper, we introduce GPLaSDI, a novel LaSDI-based framework that relies on Gaussian process (GP) for latent space ODE interpolations. Using GPs offers two significant advantages. First, it enables the quantification of uncertainty over the ROM predictions. Second, leveraging this prediction uncertainty allows for efficient adaptive training through a greedy selection of additional training data points. This approach does not require prior knowledge of the underlying PDEs. Consequently, GPLaSDI is inherently non-intrusive and can be applied to problems without a known PDE or its residual. We demonstrate the effectiveness of our approach on the Burgers equation, Vlasov equation for plasma physics, and a rising thermal bubble problem. Our proposed method achieves between 200 and 100,000 times speed-up, with up to 7% relative error.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# 勾配支配確率最適化のための均質化手法

A Homogenization Approach for Gradient-Dominated Stochastic Optimization ( http://arxiv.org/abs/2308.10630v3 )

ライセンス: Link先を確認

Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye,

(参考訳) グラディエント優位性は強い凸性よりも弱い条件であるが、非凸最適化においても大域収束を十分に保証する。この特性は、機械学習、強化学習(RL)、および運用管理に広く応用されている。本稿では,最近提案された均質化アプローチに基づく確率関数の確率的等質二階降下法(SHSODM)を提案する。理論的には, サンプルの複雑性解析を行い, さらに分散低減手法を取り入れた拡張結果を示す。以上の結果から,SHSODMは勾配優先確率最適化法において,立方正則化を伴わない他の2次法で達成される最もよく知られたサンプル複雑性と一致した。理論的には、ホモジェナイゼーションアプローチはニュートン型システムではなく、各イテレーションにおいて極端固有ベクトル問題を解くことのみに依存しているため、我々の手法は、不条件問題におけるより安価な計算コストとロバスト性を利用することができる。いくつかのRLタスクに関する数値実験は、SHSODMの他のオフ・ザ・シェルフ法と比較して優れた性能を示す。

Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient dominance property based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis, and further present an enhanced result by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization but without cubic regularization. Empirically, since the homogenization approach only relies on solving extremal eigenvector problem at each iteration instead of Newton-type system, our methods gain the advantage of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# 可視マーカーを用いた無布地変形性表面再構成

Textureless Deformable Surface Reconstruction with Invisible Markers ( http://arxiv.org/abs/2308.13678v2 )

ライセンス: Link先を確認

Xinyuan Li, Yu Guo, Yubei Tu, Yu Ji, Yanchen Liu, Jinwei Ye, Changxi Zheng,

(参考訳) 変形可能な表面をテクスチャをほとんどあるいは全く持たずに再構築・追跡することは、長年にわたる課題となっている。基本的に、課題は、クロスイメージ対応を確立するための特徴を欠いた、テクスチャのない表面から生じている。本研究では, 物体の表面性状を積極的に高め, 3次元表面再構成と対応追跡を容易にする新しい種類のマーカーを提案する。我々のマーカーは蛍光染料でできており、紫外線(UV)の下でのみ可視であり、通常の照明条件下では見えない。マーカーを活用することで,紫外光と可視光の下での表面変形を時間多重的に捉えるマルチカメラシステムを設計する。紫外線の下では、物体のマーカーが表面のテクスチャを豊かにし、高品質な3D形状の再構築と追跡を可能にする。可視光の下では、マーカーは見えなくなり、物体の元々の触れられていない外観を捉えることができます。我々は,手振り,表情,手振り布,物体間相互作用など,さまざまな困難な場面で実験を行った。これらすべてのケースにおいて、我々のシステムは堅牢で高品質な3D再構成と追跡を実現できることを実証する。

Reconstructing and tracking deformable surface with little or no texture has posed long-standing challenges. Fundamentally, the challenges stem from textureless surfaces lacking features for establishing cross-image correspondences. In this work, we present a novel type of markers to proactively enrich the object's surface features, and thereby ease the 3D surface reconstruction and correspondence tracking. Our markers are made of fluorescent dyes, visible only under the ultraviolet (UV) light and invisible under regular lighting condition. Leveraging the markers, we design a multi-camera system that captures surface deformation under the UV light and the visible light in a time multiplexing fashion. Under the UV light, markers on the object emerge to enrich its surface texture, allowing high-quality 3D shape reconstruction and tracking. Under the visible light, markers become invisible, allowing us to capture the object's original untouched appearance. We perform experiments on various challenging scenes, including hand gestures, facial expressions, waving cloth, and hand-object interaction. In all these cases, we demonstrate that our system is able to produce robust, high-quality 3D reconstruction and tracking.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# シャーロック・ホームズは夢を演じない:社会科学と生命科学におけるエビデンス理論の重要性

Sherlock Holmes Doesn't Play Dice: The significance of Evidence Theory for the Social and Life Sciences ( http://arxiv.org/abs/2309.03222v2 )

ライセンス: Link先を確認

V. L. Raju Chinthalapati, Guido Fioretti,

(参考訳) エビデンス理論 (Demster-Shafer Theory, Belief Functions Theory) はデータ融合においてますます使われてきているが、社会科学と生命科学におけるその可能性はしばしば、その特徴に対する認識の欠如によって曖昧になっている。この論文では、事象が成立する恐れから生じる不確実性を、エビデンス理論が表現できると強調する。対照的に、確率論は意思決定者が現在検討している可能性に制限されなければならない。次に,確率論の様々なバージョンに対するベイズの理論と,デンプスター・シェーファーの組合せルールがどのように関連しているかを説明し,情報理論のどの応用をエビデンス理論によって拡張できるかについて議論する。最後に、我々の主張を、監査演習に現れる部分的に重なり合う、部分的に矛盾する解を理解するためにエビデンス理論が用いられる例で説明する。

While Evidence Theory (Demster-Shafer Theory, Belief Functions Theory) is being increasingly used in data fusion, its potentialities in the Social and Life Sciences are often obscured by lack of awareness of its distinctive features. With this paper we stress that Evidence Theory can express the uncertainty deriving from the fear that events may materialize, that one has not been able to figure out. By contrast, Probability Theory must limit itself to the possibilities that a decision-maker is currently envisaging. Subsequently, we illustrate how Dempster-Shafer's combination rule relates to Bayes' Theorem for various versions of Probability Theory and discuss which applications of Information Theory can be enhanced by Evidence Theory. Finally, we illustrate our claims with an example where Evidence Theory is used to make sense of the partially overlapping, partially contradictory solutions that appear in an auditing exercise.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# ハイブリッド量子最適化を用いた分散スケジューリングによる需要側応答のインセンティブ

Incentivising Demand Side Response through Discount Scheduling using Hybrid Quantum Optimization ( http://arxiv.org/abs/2309.05502v2 )

ライセンス: Link先を確認

David Bucher, Jonas Nüßlein, Corey O'Meara, Ivan Angelov, Benedikt Wimmer, Kumar Ghosh, Giorgio Cortiana, Claudia Linnhoff-Popien,

(参考訳) デマンドサイド・レスポンス(Demand Side Response, DSR)は、消費者が電気需要の管理に積極的に参加できるようにする戦略である。高需要時のグリッドの歪みを緩和し、(再生可能な)電力資源のよりバランスよく効率的な利用を促進することを目的としている。我々は、ディスカウントスケジューリングを通じてDSRを実装し、地域エネルギー混合がより再生可能エネルギーで構成されている時間に電力消費パターンを調整するために消費者に個別の価格インセンティブを提供する。個々の顧客消費に対する割引を調整するため、ディスカウントスケジューリング問題(DSP)は大規模な組合せ最適化タスクとなる。そこで我々は,D-WaveのLeap Hybrid Cloudを用いたハイブリッド量子コンピューティングアプローチを採用した。従来の混合整数オプティマイザであるGurobiに対して,固定実行時のソリューション品質と,割引割当の公平性の観点から,Leapをベンチマークした。さらに,サブルーチンを動作させる量子コンピュータや古典コンピュータに適用したDSPの大規模分解アルゴリズムを提案する。実世界のデータから生成された合成データを用いて,古典的分解法は,最大3200人の消費者に最適な総合的な「新規解法」品質を与えるが,ハイブリッド量子アプローチは消費者に均等に分散したディスカウントを提供する。

Demand Side Response (DSR) is a strategy that enables consumers to actively participate in managing electricity demand. It aims to alleviate strain on the grid during high demand and promote a more balanced and efficient use of (renewable) electricity resources. We implement DSR through discount scheduling, which involves offering discrete price incentives to consumers to adjust their electricity consumption patterns to times when their local energy mix consists of more renewable energy. Since we tailor the discounts to individual customers' consumption, the Discount Scheduling Problem (DSP) becomes a large combinatorial optimization task. Consequently, we adopt a hybrid quantum computing approach, using D-Wave's Leap Hybrid Cloud. We benchmark Leap against Gurobi, a classical Mixed Integer optimizer in terms of solution quality at fixed runtime and fairness in terms of discount allocation. Furthermore, we propose a large-scale decomposition algorithm/heuristic for the DSP, applied with either quantum or classical computers running the subroutines, which significantly reduces the problem size while maintaining solution quality. Using synthetic data generated from real-world data, we observe that the classical decomposition method obtains the best overall \newp{solution quality for problem sizes up to 3200 consumers, however, the hybrid quantum approach provides more evenly distributed discounts across consumers.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# RGB-Dビデオからの物理に基づく剛体物体追跡と摩擦フィルタ

Physics-Based Rigid Body Object Tracking and Friction Filtering From RGB-D Videos ( http://arxiv.org/abs/2309.15703v2 )

ライセンス: Link先を確認

Rama Krishna Kandukuri, Michael Strecke, Joerg Stueckler,

(参考訳) 感覚観察による物体の相互作用の物理に基づく理解は、拡張現実やロボット工学において必須の能力である。シミュレーションと制御のためにシーンのプロパティをキャプチャすることができる。本稿では,RGB-D画像から剛体物体を3次元で追跡し,物体の物理的特性を推定する,リアル・トゥ・シムのための新しい手法を提案する。我々は,任意のメッシュ形状の接触と摩擦をモデル化できる拡張カルマンフィルタの状態遷移モデルとして,微分可能な物理シミュレーションを用いて,物理的に妥当な軌道を推定する。提案手法は, 位置, 向き, 速度をフィルタし, 同時に物体の摩擦係数を推定できることを実証する。我々は,単一物体と衝突物体の合成画像列における様々なスライディングシナリオに対するアプローチを分析する。また、実世界のデータセットに対する我々のアプローチを実証し、評価する。我々は,この新たな問題設定と手法との比較において,今後の研究を促進するために,新しいベンチマークデータセットを公開している。

Physics-based understanding of object interactions from sensory observations is an essential capability in augmented reality and robotics. It enables to capture the properties of a scene for simulation and control. In this paper, we propose a novel approach for real-to-sim which tracks rigid objects in 3D from RGB-D images and infers physical properties of the objects. We use a differentiable physics simulation as state-transition model in an Extended Kalman Filter which can model contact and friction for arbitrary mesh-based shapes and in this way estimate physically plausible trajectories. We demonstrate that our approach can filter position, orientation, velocities, and concurrently can estimate the coefficient of friction of the objects. We analyze our approach on various sliding scenarios in synthetic image sequences of single objects and colliding objects. We also demonstrate and evaluate our approach on a real-world dataset. We make our novel benchmark datasets publicly available to foster future research in this novel problem setting and comparison with our method.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# 従来のNo-Go理論を克服する - 複数のアクセスチャネルにおける量子アドバンテージ

Overcoming Traditional No-Go Theorems: Quantum Advantage in Multiple Access Channels ( http://arxiv.org/abs/2309.17263v2 )

ライセンス: Link先を確認

Ananya Chakraborty, Sahil Gopalkrishna Naik, Edwin Peter Lobo, Ram Krishna Patra, Samrat Sen, Mir Alimuddin, Amit Mukherjee, Manik Banik,

(参考訳) マルチノード構成の領域へのポイント・ツー・ポイント通信モデルの拡張は、インターネットや通信ネットワークにおける多くのアプリケーションを見つける。ここでは、Multiple Access Channel(MAC)と呼ばれる、一般的に遭遇するネットワーク構成において、量子通信の新たな利点を確立する。 MACは複数の遠隔送信者で構成され、それぞれのメッセージを共通の受信機に送信する。量子超デンス符号化プロトコルとは異なり、ここで報告される利点は送信者と受信者の絡み合いを引き起こすことなく実現される。特に、そのような利点は、1つの送信者と1つの受信者を含む伝統的なポイント・ツー・ポイントの通信では達成不可能であり、そこではホレヴォとフランケル・ワイナーのノー・ゴー定理の制限が成立する。 MAC設定内では、この独特な利点は、受信機が複数の送信機から受信した量子システムを同時に復号するユニークな能力を通じて実現される。興味深いことに、MAC設計のいくつかは、プゼー=バレット=ルドルフの定理や「絡みのない非局所性」の概念など、もともと全く異なる目的のために探求された量子基礎の様々な構成からインスピレーションを得ている。ネットワーク通信における直接的な応用の他に、提示された量子優位性は「入力なしの量子非局所性」の概念と深いつながりを示唆し、絡み合った測定の半デバイス非依存的な認証の可能性を秘めている。

Extension of point-to-point communication model to the realm of multi-node configurations finds a plethora of applications in internet and telecommunication networks. Here, we establish a novel advantage of quantum communication in a commonly encountered network configuration known as the Multiple Access Channel (MAC). A MAC consists of multiple distant senders aiming to send their respective messages to a common receiver. Unlike the quantum superdense coding protocol, the advantage reported here is realized without invoking entanglement between the senders and the receiver. Notably, such an advantage is unattainable in traditional point-to-point communication involving one sender and one receiver, where the limitations imposed by the Holevo and Frankel Weiner no-go theorems come into play. Within the MAC setup, this distinctive advantage materializes through the receiver's unique ability to simultaneously decode the quantum systems received from multiple senders. Intriguingly, some of our MAC designs draw inspiration from various other constructs in quantum foundations, such as the Pusey-Barrett-Rudolph theorem and the concept of `nonlocality without entanglement', originally explored for entirely different purposes. Beyond its immediate applications in network communication, the presented quantum advantage hints at a profound connection with the concept of `quantum nonlocality without inputs' and holds the potential for semi-device-independent certification of entangled measurements.

翻訳日:2024-05-31 02:21:25 公開日:2024-05-29

# 自己指導型学習における情報の流れ

Information Flow in Self-Supervised Learning ( http://arxiv.org/abs/2309.17281v3 )

ライセンス: Link先を確認

Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan, Yifan Zhang,

(参考訳) 本稿では,2つのデュアルブランチ(シームズアーキテクチャ)自己教師型学習アプローチ,すなわちBarlow Twinsとスペクトルコントラスト学習を,行列相互情報のレンズを通して包括的に分析する。これらの手法の損失関数は,行列の相互情報と行列の関節エントロピーの両方を暗黙的に最適化することを証明する。この知見は、特にMAEやU-MAEといった単一ブランチアルゴリズムのカテゴリについて、相互情報と共同エントロピーがエントロピーとなる分野をさらに探求するきっかけとなる。この直感に基づいて,行列ベースのエントロピー推定を正規化器として活用し,U-MAEを特別に仮定する新しい手法であるマトリックス変分マスク付きオートエンコーダ(M-MAE)を導入する。実験的な評価は、M-MAEの有効性を最先端の手法と比較し、線形プローブのViT-Baseが3.9%改善され、微細チューニングのViT-Largeが1%改善された。

In this paper, we conduct a comprehensive analysis of two dual-branch (Siamese architecture) self-supervised learning approaches, namely Barlow Twins and spectral contrastive learning, through the lens of matrix mutual information. We prove that the loss functions of these methods implicitly optimize both matrix mutual information and matrix joint entropy. This insight prompts us to further explore the category of single-branch algorithms, specifically MAE and U-MAE, for which mutual information and joint entropy become the entropy. Building on this intuition, we introduce the Matrix Variational Masked Auto-Encoder (M-MAE), a novel method that leverages the matrix-based estimation of entropy as a regularizer and subsumes U-MAE as a special case. The empirical evaluations underscore the effectiveness of M-MAE compared with the state-of-the-art methods, including a 3.9% improvement in linear probing ViT-Base, and a 1% improvement in fine-tuning ViT-Large, both on ImageNet.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 連続したコントラスト音声言語理解

Continual Contrastive Spoken Language Understanding ( http://arxiv.org/abs/2310.02699v2 )

ライセンス: Link先を確認

Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj,

(参考訳) 近年、ニューラルネットワークは様々な分野において顕著な進歩を見せており、音声処理は例外ではない。しかし、この分野における最近のブレークスルーは、大規模なデータセットと膨大なコンピューティングリソースを使用した広範なオフライントレーニングを必要とする。残念なことに、これらのモデルは、新しいタスクを継続的に学習する際に、以前取得した知識を維持するのに苦労している。本稿では,クラスインクリメンタルラーニング(CIL)設定における音声言語理解のためのシーケンス・ツー・シーケンス学習モデルの問題点を考察し,経験的リプレイとコントラスト学習の組み合わせに依存するCIL手法であるCOCONUTを提案する。 COCONUTは、リハーサルサンプルのみに適用された標準的な教師付きコントラスト損失の修正版を通じて、同じクラスからより近いサンプルを引き出し、他のクラスをプッシュすることで、学習された表現を保存する。さらに,音声とテキストの特徴を整列させることにより,モデルが新たなデータの識別的表現をより学習するのに役立つマルチモーダル・コントラッシブ・ロスを利用する。また, 比較的損失の強さと, 蒸留に使用する教師・学生アーキテクチャを組み合わせるために, 異なるコントラスト的設計について検討した。確立された2つのSLUデータセットに対する実験により,提案手法の有効性とベースラインに対する大幅な改善が示された。また,COCONUTをデコーダ側で動作させるメソッドと組み合わせることで,さらなるメトリクス改善が期待できることを示す。

Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 抗レラクゼーション被覆および緩衝ガス充填アルカリ気相細胞における光蓄積の比較研究

Comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali vapor cells ( http://arxiv.org/abs/2310.03726v3 )

ライセンス: Link先を確認

Marin Ðujić, D. Buhin, N. Šantić, D. Aumiler, T. Ban,

(参考訳) 熱ルビジウム蒸気中における電磁誘導透過 (EIT) を用いた反ラキセーション被覆および緩衝ガス充填アルカリ気相セルの光蓄積特性の比較検討を行った。バッファーガスを充填したセルを使用することで, 保存時間と効率が, 抗レラクテーション被覆セルに比べて$10倍向上した。この知見は将来の電界展開可能な高性能量子メモリの開発に寄与する。

We perform a comparative study of light storage in antirelaxation-coated and buffer-gas-filled alkali-vapor cells using electromagnetically induced transparency (EIT) in warm rubidium vapor. The use of a buffer-gas-filled cell resulted in $\approx$10-fold improvement in storage time and efficiency compared to antirelaxation coated cells. Our findings contribute to the development of future field-deployable high-performance quantum memories.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 主成分分析のための非接触ホテルリングのデフレの誤り伝播について

On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis ( http://arxiv.org/abs/2310.04283v2 )

ライセンス: Link先を確認

Fangshuo Liao, Junhyung Lyle Kim, Cruz Barnum, Anastasios Kyrillidis,

(参考訳) 主成分分析(PCA)は、データセットの分散を最もよく表す、いわゆる主成分によって分散される部分空間を見つけることを目的としている。デフレレーション法は、最も重要でないものから始まり、重要でないものに向かって、個別の主成分を逐次発見する一般的なメタアルゴリズムである。しかし、デフレが進むにつれて、主成分の不正確な推定による数値誤差は、その逐次的性質により伝播する。本稿では,不正確なHotellingのデフレ手法の誤差伝搬を数学的に特徴づける。先頭の固有ベクトルを見つけるためのサブルーチンが抽象的で、様々なアルゴリズムを表現できる場合の$i)$と、サブルーチンとしてパワーイテレーションが使用される場合の$ii)の2つのシナリオを考えます。後者の場合、電力反復による追加方向情報により、サブルーチン非依存の場合よりも厳密な誤差が得られる。どちらのシナリオでも、エラーがどのように進行し、その後の主成分推定に影響を及ぼすかを明確に特徴付ける。

Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the imprecise estimation of principal components propagate due to its sequential nature. This paper mathematically characterizes the error propagation of the inexact Hotelling's deflation method. We consider two scenarios: $i)$ when the sub-routine for finding the leading eigenvector is abstract and can represent various algorithms; and $ii)$ when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than the sub-routine agnostic case. For both scenarios, we explicitly characterize how the errors progress and affect subsequent principal component estimations.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 近似量子誤り訂正符号の複雑さと順序

Complexity and order in approximate quantum error-correcting codes ( http://arxiv.org/abs/2310.04710v2 )

ライセンス: Link先を確認

Jinmin Yi, Weicheng Ye, Daniel Gottesman, Zi-Wen Liu,

(参考訳) 量子回路の複雑度と近似量子誤差補正(AQEC)特性の厳密な関係を確立し,格子系を含む全次元および幾何学的シナリオを網羅する。そこで本研究では,AQECの最適精度と密接に関連しているサブシステム分散(subsystem variance)と呼ばれるコードパラメータについて紹介する。我々の重要な発見は、サブシステムの分散が$O(k/n)$しきい値以下であれば、コードサブ空間の任意の状態は特定の回路の複雑さの低い境界に従わなければならないということです。この結果に基づいて、サブスペース間の境界として$O(k/n)$を提案し、AQECコードとしてカウントすべきではない。 AQECのこの理論は、多体量子系の量子複雑性と秩序を理解するための汎用的なフレームワークを提供し、多体および高エネルギー物理学において顕著な重要性を持つ、特にトポロジカル秩序と臨界量子系の広い物理シナリオに対する新しい洞察を提供する。我々は、大まかに$O(1/n)$は、非自明な量子秩序に関連する特徴に対するサブシステム分散の共通で、物理的に重要な ` `scaling threshold''' を表す、様々な観点から観察する。

We establish rigorous connections between quantum circuit complexity and approximate quantum error correction (AQEC) properties, covering both all-to-all and geometric scenarios including lattice systems. To this end, we introduce a type of code parameter that we call subsystem variance, which is closely related to the optimal AQEC precision. Our key finding is that if the subsystem variance is below an $O(k/n)$ threshold then any state in the code subspace must obey certain circuit complexity lower bounds, which identify nontrivial ``phases'' of codes. Based on our results, we propose $O(k/n)$ as a boundary between subspaces that should and should not count as AQEC codes. This theory of AQEC provides a versatile framework for understanding the quantum complexity and order of many-body quantum systems, offering new insights for wide-ranging physical scenarios, in particular topological order and critical quantum systems which are of outstanding importance in many-body and high energy physics. We observe from various different perspectives that roughly $O(1/n)$ represents a common, physically significant ``scaling threshold'' of subsystem variance for features associated with nontrivial quantum order.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 連続的グローバル最適化に基づくParFam --(ニューラルガイド付き)シンボリック回帰

ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization ( http://arxiv.org/abs/2310.05537v3 )

ライセンス: Link先を確認

Philipp Scholl, Katharina Bieker, Hillary Hauger, Gitta Kutyniok,

(参考訳) 記号回帰(SR)の問題は、物理法則の特定や、与えられたデータから金融市場の振舞いを記述する数学的方程式の導出など、多くの異なる応用で生じる。 SRの問題に対処するためには様々な方法があり、しばしば遺伝的プログラミングに基づいている。しかしながら、これらの手法は通常複雑であり、様々なハイパーパラメータを含む。本稿では,ParFamを用いて離散的記号回帰問題を連続的に変換する手法を提案する。グローバルオプティマイザと組み合わせることで,SR問題に対処するための高効率な手法が提案される。我々はParFamの表現率を理論的に解析し、SRベンチマークスーツSRBenchに基づく広範な数値実験によりその性能を実証し、最先端の結果が得られたことを示す。さらに、ParFamをガイドするために、事前訓練されたトランスフォーマーネットワークDL-ParFamを組み込んだ拡張を行い、最適化プロセスを最大2等級高速化する。私たちのコードと結果はhttps://github.com/Philipp238/parfam.comで確認できます。

The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suit SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two magnitudes. Our code and results can be found at https://github.com/Philipp238/parfam.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# InstructRetro: Retrieval-Augmented Pretrainingのインストラクションチューニング

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining ( http://arxiv.org/abs/2310.07713v3 )

ライセンス: Link先を確認

Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro,

(参考訳) 自動回帰型大言語モデル~(LLM)の検索による事前学習は、外部データベースを活用することにより、より複雑で現実的な正確性を示す。しかし、既存の事前学習によるLLMのサイズは制限されている(例えば、Retroは7.5Bパラメータを持つ)ため、命令チューニングとゼロショットの一般化の有効性が制限されている。本稿では,検索を前提としたLLMとしては最大規模のRetro 48Bを紹介する。具体的には、12兆のトークンから検索することで、Retro拡張法を用いて、さらに1000億のトークンに43BのGPTモデルを事前訓練し続けます。特に、得られた基盤モデルであるRetro 48Bは、1.2TトークンでトレーニングされたGPT 43Bを、わずか2.58%のGPU時間で上回っており、この手法のスケーリング可能性を示している。 Retroでのインストラクションチューニングの後、InstructRetroは幅広いゼロショットタスクにおいて、命令チューニングされたGPTよりも大幅に改善されていることを示す。具体的には、InstructRetroの平均的な改善は、8つの短い形式QAにまたがるGPTよりも7%、長い形式QAに10%、そして3つの要約タスクに16%である。驚いたことに、InstructRetroアーキテクチャからエンコーダを廃止し、デコーダのバックボーンを直接使用でき、同等の結果が得られます。提案手法は, 学習前の検索を継続し, より優れたGPTデコーダを得るための有望な方向を示すものである。私たちのコードとチェックポイントは、https://huggingface.co/nvidia/retro-48b-instruct-4k.comで公開されています。

Pretraining auto-regressive large language models~(LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. Specifically, we continue to pretrain a 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. Notably, the obtained foundation model, Retro 48B, largely outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the significant scaling potential of the method. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA and reading comprehension tasks, 10% over GPT across 4 challenging long-form QA tasks, and 16% over GPT across 3 summarization tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. Our results highlight the promising direction to obtain a better GPT decoder through continued pretraining with retrieval before instruction tuning. Our code and checkpoints are publicly available at: https://huggingface.co/nvidia/retro-48b-instruct-4k.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 量子技術のための自己参照光位相雑音解析器

A self-referenced optical phase noise analyzer for quantum technologies ( http://arxiv.org/abs/2310.08258v2 )

ライセンス: Link先を確認

Robert Freund, Christian D. Marciniak, Thomas Monz,

(参考訳) 第二世代の量子技術は、工学化された量子システムを利用して古典的な代替品より優れていることを目標としている。量子的優位性を実現するために必要なコヒーレンスを維持するには、ホストシステムが被るノイズの詳細な知識と制御が必要である。パワースペクトル密度によるノイズプロセスの特徴付けは、科学や技術を通して日常的に行われ、必要なタスクとなる。例えば、主要な量子技術プラットフォームにおける位相ノイズパワースペクトルを決定することは、多くの位相ノイズアナライザの範囲外か、あるいは違法に高価である。本研究では,量子技術応用のためのコスト効率の高い光位相ノイズアナライザを提示し,特徴付ける。この設定を用いて、729\ \rm{nm}$に近い2つのライン幅のウルトラ安定振動子を比較し、これらを基準として、この測定装置で達成されたノイズフロアを、制限とトレードオフに焦点をあてて決定し、議論する。この実装において達成されたノイズフロアは、低コストで全ストック構成であり、低複雑さの位相ノイズアナライザであり、商用製品と比較して好適である。このセットアップは、多くのコンポーネントメーカーがそうであるように、より安定した参照や運用量子システムをセンサーとして使用せずに、特にアプリケーションを見つけることができる。

Second generation quantum technologies aim to outperform classical alternatives by utilizing engineered quantum systems. Maintaining the coherence required to enable any quantum advantage requires detailed knowledge and control over the noise the hosting system is subjected to. Characterizing noise processes via their power spectral density is routinely done throughout science and technology and can be a demanding task. Determining the phase noise power spectrum in leading quantum technology platforms, for example, can be either outside the reach of many phase noise analyzers, or be prohibitively expensive. In this work, we present and characterize a cost-effective optical phase noise analyzer for quantum technology applications. Using this setup we compare two $\approx1\ \rm{Hz}$ linewidth ultra-stable oscillators near $729\ \rm{nm}$, using them as references to determine and discuss the noise floor achieved in this measurement apparatus with a focus on limitations and their tradeoffs. The achieved noise floor in this implementation of a low-cost, all-stock component, low-complexity phase noise analyzer compares favourably to commercial offerings. This setup can find application in particular without a more stable reference or operational quantum system as sensor as would be the case for many component manufacturers.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 実世界ツインフィールド量子鍵分布における位相ノイズ

Phase Noise in Real-World Twin-Field Quantum Key Distribution ( http://arxiv.org/abs/2310.08621v3 )

ライセンス: Link先を確認

Gianluca Bertaina, Cecilia Clivati, Simone Donadello, Carlo Liorni, Alice Meda, Salvatore Virzì, Marco Gramegna, Marco Genovese, Filippo Levi, Davide Calonico, Massimiliano Dispenza, Ivo Pietro Degiovanni,

(参考訳) ツインフィールド量子鍵分布(TF-QKD)プロトコルの現実実装におけるノイズ源の影響について,光子源からの位相雑音に着目して検討した。この研究は、鍵レートの決定におけるレーザー品質、ネットワークトポロジー、繊維長、アームバランス、検出器性能の役割を強調している。注目すべきは、主要なTF-QKDプロトコルが、異なるメカニズムにもかかわらず位相ノイズの影響を受けていることである。本研究は,2つ以上の波長幅のレーザーと位相制御技術によるデューティサイクルの改善を実証し,高精度時間/周波数分布サービスによる潜在的な相乗効果を明らかにする。統合と小型化に向けて進化するウルトラスタブルレーザーは、既存のネットワーク上でのアジャイルTF-QKD実装を約束する。位相ノイズと実用的な制約に適切に対処することで、いくつかの国で開発中の量子通信インフラの安全な長距離リンクを確立するのに不可欠な、一貫した鍵レート予測、プロトコルの選択、レイアウト設計が可能になる。

The impact of noise sources in real-world implementations of Twin-Field Quantum Key Distribution (TF-QKD) protocols is investigated, focusing on phase noise from photon sources and connecting fibers. This work emphasizes the role of laser quality, network topology, fiber length, arm balance, and detector performance in determining key rates. Remarkably, it reveals that the leading TF-QKD protocols are similarly affected by phase noise despite different mechanisms. This study demonstrates duty cycle improvements of over a factor of two through narrow-linewidth lasers and phase-control techniques, highlighting the potential synergy with high-precision time/frequency distribution services. Ultrastable lasers, evolving toward integration and miniaturization, offer promise for agile TF-QKD implementations on existing networks. Properly addressing phase noise and practical constraints allows for consistent key rate predictions, protocol selection, and layout design, crucial for establishing secure long-haul links for the Quantum Communication Infrastructures under development in several countries.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# CLIPによるインクリメンタルオブジェクト検出

Incremental Object Detection with CLIP ( http://arxiv.org/abs/2310.08815v2 )

ライセンス: Link先を確認

Ziyue Huang, Yupeng He, Qingjie Liu, Yunhong Wang,

(参考訳) インクリメンタルな分類タスクとは対照的に、インクリメンタルな検出タスクは、複数の連続学習段階にわたって異なるラベル付き境界ボックスを持つことができるため、データのあいまいさの存在によって特徴付けられる。この現象は、しばしばモデルが新しいクラスを効果的に学習する能力を損なう。しかし、既存の研究はモデルの前方互換性にはあまり注意を払わず、漸進的な学習に適していることを制限している。この障害を克服するために、CLIPのような視覚言語モデルを用いて、異なるクラスセットのテキスト特徴埋め込みを生成することを提案する。次に、段階的なシナリオをシミュレートするために、早期の学習段階において利用できない新しいクラスを置き換えるために、スーパークラスを使用します。最後に、CLIP画像エンコーダを用いて、潜在的なオブジェクトを正確に識別する。そこで我々は,この微妙に認識された検出ボックスを擬似アノテーションとしてトレーニングプロセスに組み込むことにより,検出性能をさらに向上させる。我々は,PASCAL VOC 2007データセットを用いた様々な漸進的な学習環境に対するアプローチを評価し,そのアプローチは,特に新クラスの認識において最先端の手法よりも優れていることを示す。

In contrast to the incremental classification task, the incremental detection task is characterized by the presence of data ambiguity, as an image may have differently labeled bounding boxes across multiple continuous learning stages. This phenomenon often impairs the model's ability to effectively learn new classes. However, existing research has paid less attention to the forward compatibility of the model, which limits its suitability for incremental learning. To overcome this obstacle, we propose leveraging a visual-language model such as CLIP to generate text feature embeddings for different class sets, which enhances the feature space globally. We then employ super-classes to replace the unavailable novel classes in the early learning stage to simulate the incremental scenario. Finally, we utilize the CLIP image encoder to accurately identify potential objects. We incorporate the finely recognized detection boxes as pseudo-annotations into the training process, thereby further improving the detection performance. We evaluate our approach on various incremental learning settings using the PASCAL VOC 2007 dataset, and our approach outperforms state-of-the-art methods, particularly for recognizing the new classes.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# ポイントワイド相互情報プロファイルの特性と推定について

On the Properties and Estimation of Pointwise Mutual Information Profiles ( http://arxiv.org/abs/2310.10240v2 )

ライセンス: Link先を確認

Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx,

(参考訳) ポイントワイド相互情報プロファイル(ポイントワイド相互情報プロファイル、英: pointwise mutual information profile)は、与えられた確率変数のペアに対するポイントワイド相互情報の分布である。その重要な性質の1つは、期待値がこれらの確率変数間の相互情報であることである。本稿では,多変量正規分布の分布を解析的に記述し,モンテカルロ法を用いてその分布を正確に推定できる分布の新たなファミリーであるベンドとミキシングモデルを導入する。次に、ベンドモデルとミキシングモデルを用いて、既存の相互情報推定器の限界を調査し、変分推定器で使用される神経評論家の行動を調べ、実験的な外乱が相互情報推定に与える影響を理解する方法を示す。最後に,ベンドモデルとミキシングモデルを用いて相互情報のモデルベースベイズ推定を行い,不確実性定量化が必要な領域専門知識の問題に適合することを示す。

The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables. One of its important properties is that its expected value is precisely the mutual information between these random variables. In this paper, we analytically describe the profiles of multivariate normal distributions and introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods. We then show how Bend and Mix Models can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how Bend and Mix Models can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# Self-Pro: グラフニューラルネットワークのためのセルフプロンプトとチューニングフレームワーク

Self-Pro: Self-Prompt and Tuning Framework for Graph Neural Networks ( http://arxiv.org/abs/2310.10362v2 )

ライセンス: Link先を確認

Chenghua Gong, Xiang Li, Jianxiang Yu, Cheng Yao, Jiaqi Tan, Chengcheng Yu,

(参考訳) グラフはWebアプリケーションにとって重要なモデリングツールとなり、グラフニューラルネットワーク(GNN)はグラフ表現学習において大きな成功を収めた。しかし、彼らの演技は大量の監督に大きく依存している。近年, 'pre-train, fine-tune'' はラベル依存や一般化の貧弱な問題に対処するパラダイムとなっている。しかし、事前学習戦略はホモフィリーなグラフとヘテロフィリーなグラフで異なり、様々な下流タスクの目的も異なる。これにより、プリテキストとダウンストリームタスクの間にギャップが生じ、結果として‘負の転送’が発生し、パフォーマンスが低下する。自然言語処理の素早い学習にインスパイアされた多くの研究は、ギャップを埋め、事前訓練されたモデルを完全に活用する。しかし、グラフプロンプトの既存の方法はホモフィリーに調整されており、グラフ上の固有のヘテロフィリーを無視している。一方、そのほとんどはランダムに初期化されたプロンプトに依存しており、安定性に悪影響を及ぼす。そこで本研究では,モデルとデータ自体に基づくグラフのプロンプトフレームワークであるSelf-Promptを提案する。まず,非対称なグラフのコントラスト学習を前文として導入し,前文と下流タスクの目的を整合させる。次に,プリトレーニングをセルフアダプタとして再利用し,タスク適応のためのグラフ自体に基づいたセルフプロンプトを導入する。最後に、11のベンチマークデータセットに対する広範な実験を行い、その優位性を実証する。私たちは \url{https://github.com/gongchenghua/Self-Pro} でコードを提供しています。

Graphs have become an important modeling tool for Web applications, and graph neural networks (GNNs) have achieved great success in graph representation learning. However, their performance heavily relies on a large amount of supervision. Recently, ``pre-train, fine-tune'' has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in ``negative transfer'' and poor performance. Inspired by prompt learning in natural language processing, many studies turn to bridge the gap and fully leverage the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impact on the stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning as pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from pre-training as the self adapter and introduce self-prompts based on graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our codes at \url{https://github.com/gongchenghua/Self-Pro}.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# ACES: 自動生成モデルによる多言語プログラミングパズルの生成

ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models ( http://arxiv.org/abs/2310.10692v4 )

ライセンス: Link先を確認

Julien Pourcel, Cédric Colas, Gaia Molinaro, Pierre-Yves Oudeyer, Laetitia Teodorescu,

(参考訳) 新しく興味深い問題を発明する能力は、イノベーション、芸術、科学を駆動する人間の知能の驚くべき特徴である。そこで我々は,Pythonプログラミングパズルの文脈において,最先端の生成モデルのパワーを活用して,難解で解決可能な問題を多種多様に生成することを目的として,このプロセスを自動化する手法を提案する。本質的に動機づけられた文献に触発されて、自動コーデック検索(ACES)は、発生した問題の多様性と難易度を共同で最適化する。 LLM生成セマンティックディスクリプタの空間における問題(例えば、文字列操作、動的プログラミングなど)を表現し、その難しさをLlama-3-70Bの成功率の線形化関数として経験的に測定する。 ACESは、以前生成された問題をコンテキスト内例として用いて、ターゲットセマンティック記述子(ゴール指向探索)の多様性を達成するために、大きな言語モデルに難しい問題を生成するように反復的に促す。 ACESは、ベースラインメソッドが生み出す問題よりも多様性があり、また既存のPythonプログラミングベンチマークで見られる問題よりも、11の最先端コード LLM で平均して3倍難しい問題を生成する。

The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically motivated literature, Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems. We represent problems in a space of LLM-generated semantic descriptors describing the programming skills required to solve them (e.g. string manipulation, dynamic programming, etc.) and measure their difficulty empirically as a linearly decreasing function of the success rate of Llama-3-70B, a state-of-the-art LLM problem solver. ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors (goal-directed exploration) using previously generated problems as in-context examples. ACES generates problems that are more diverse and more challenging than problems produced by baseline methods and three times more challenging than problems found in existing Python programming benchmarks on average across 11 state-of-the-art code LLMs.

翻訳日:2024-05-31 02:11:35 公開日:2024-05-29

# 3D-GPT:大規模言語モデルを用いた手続き型3Dモデリング

3D-GPT: Procedural 3D Modeling with Large Language Models ( http://arxiv.org/abs/2310.12945v2 )

ライセンス: Link先を確認

Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould,

(参考訳) 効率的な自動コンテンツ作成の追求において、変更可能なパラメータとルールベースのシステムを活用する手続き生成が有望なアプローチとして現れている。それにもかかわらず、規則、アルゴリズム、パラメータの深い理解を必要とする複雑な性質を考えると、それは要求に満ちた努力かもしれない。作業負荷を低減するため,命令駆動型3Dモデリングのための大規模言語モデル~(LLM)を利用するフレームワークである3D-GPTを導入する。 3D-GPTは、3Dモデリングタスクをアクセス可能なセグメントに分割し、各タスクにアプエージェントを割り当てる。 3D-GPTは、タスクディスパッチエージェント、概念化エージェント、モデリングエージェントの3つのコアエージェントを統合する。彼らは共同で2つの目標を達成する。まず、簡潔なシーン記述を強化し、後続の命令に基づいてテキストを動的に適応させながら、それらを詳細な形式に進化させる。第二に、プロシージャ生成を統合し、リッチテキストからパラメータ値を抽出し、3Dソフトウェアに精通してアセット生成を行う。我々の実証調査では、3D-GPTが解釈し、指示を実行し、信頼性の高い結果を提供するだけでなく、人間デザイナーと効果的に協力することを確認した。さらに、Blenderとシームレスに統合され、拡張された操作可能性のロックが解除される。本研究は3次元モデリングにおけるLLMの可能性を強調し,シーン生成とアニメーションの今後の進歩のための基本的なフレームワークを提供する。

In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models~(LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 大規模言語モデルはなぜ正しい連鎖を生成するのか?

Why Can Large Language Models Generate Correct Chain-of-Thoughts? ( http://arxiv.org/abs/2310.13571v3 )

ライセンス: Link先を確認

Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar,

(参考訳) 本稿では,大規模言語モデル(LLM)の能力について述べる。本研究では,LLMを効果的に誘導し,コヒーレントな思考連鎖を生成する方法について検討する。これを実現するために,自然言語生成に適した2階層階層型グラフィカルモデルを提案する。この枠組み内では、真の言語に由来するものと比較して、LLM生成された思考の連鎖の可能性を測る魅力的な幾何学的収束率を確立する。本研究は,LLMが推論スキルを要求されるタスクのパフォーマンス向上を説明する上で,(潜在的に)正しい思考系列を生成できることを理論的に正当化するものである。

This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts (potentially) explaining performance gains in tasks demanding reasoning skills.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 光の超放射性と回転量子流体からの絡み合い

Entanglement from superradiance and rotating quantum fluids of light ( http://arxiv.org/abs/2310.16031v3 )

ライセンス: Link先を確認

Adrià Delhom, Killian Guerrero, Paula Calizaya, Kévin Falque, Alberto Bramati, Anthony J. Brady, Maxime J. Jacquet, Ivan Agullo,

(参考訳) 超放射光による放射の増幅は、多くの物理系で観測される普遍的な現象である。超放射能散乱は、コヒーレント状態を含む異なる入力状態の絡み合いを生じさせ、この現象の本質的な量子的性質を確立することを実証する。これらの概念を実験に適用するために,光の偏光流体の散逸ダイナミクスにより動的に安定な地平線のないエルゴリージョンを構築する新しい手法を提案する。我々は,安定なエルゴリージョンの作成を実証するために,システムを数値シミュレーションする。その後,本システム内での回転超放射能について検討し,主に絡み合いの発生と,その拡張の可能性について検討した。この方法では、入力状態を自由に制御できる最先端の実験において、回転超放射による量子放出の研究が可能である。

The amplification of radiation by superradiance is a universal phenomenon observed in numerous physical systems. We demonstrate that superradiant scattering generates entanglement for different input states, including coherent states, thereby establishing the inherently quantum nature of this phenomenon. To put these concepts to the test, we propose a novel approach to create horizonless ergoregions, which are nonetheless dynamically stable thanks to the dissipative dynamics of a polaritonic fluid of light. We numerically simulate the system to demonstrate the creation of a stable ergoregion. Subsequently, we investigate rotational superradiance within this system, with a primary focus on entanglement generation and the possibilities for its enhancement using current techniques. Our methods permit the investigation of quantum emission by rotational superradiance in state-of-the-art experiments, in which the input state can be controlled at will.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 3時間拘束型アクター臨界および拘束型自然アクター臨界アルゴリズムの有限時間解析

Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms ( http://arxiv.org/abs/2310.16363v3 )

ライセンス: Link先を確認

Prashansa Panda, Shalabh Bhatnagar,

(参考訳) アクター批判法は、特に状態-作用空間が大きい場合に、広範囲の強化学習タスクに多大な応用を見出した。本稿では,不等式制約を含む制約付きマルコフ決定過程(C-MDP)の関数近似を用いたアクター評論家および自然なアクター評論家アルゴリズムについて考察し,これらのアルゴリズムを非i.d(マルコフアン)環境で非漸近解析する。目的関数と制約関数の両方が所定コスト関数の政策依存の長期平均となるような長期平均コスト基準を考察する。ラグランジュ乗算器法を用いて不等式制約を扱う。これらのアルゴリズムが性能(ラグランジュ)関数$L(\theta,\gamma)$の1次定常点(すなわち $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$)を見つけることが保証されていることを証明している。また、3つの異なるセーフティガイム環境の実験結果も示す。

Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on three different Safety-Gym environments.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 原子アンサンブルにおける弱場励起以外の光散乱特性

Light scattering properties beyond weak-field excitation in atomic ensembles ( http://arxiv.org/abs/2310.17106v2 )

ライセンス: Link先を確認

Chung-Hsien Wang, Nai-Yu Tsai, Yi-Cheng Wang, H. H. Jen,

(参考訳) 大型原子系の光学的性質の研究において、線形結合方程式による系のダイナミクスを単純化するために弱いレーザー駆動がしばしば仮定される。本稿では,原子アンサンブルの光散乱特性について,累積膨張法を用いて検討する。定常方程式に高次相関を漸進的に組み込むことにより、完全な密度行列を解いた正確な解と比較して精度を向上することができる。分析の結果, 弱い双極子-双極子相互作用 (DDI) の段階において, 1次展開は光深度に対する良好な予測を導出し, より密度の高い原子配置は高次相関を考慮する必要があることがわかった。入射光の強度が増加すると、原子飽和効果が顕著になり、光透過性、エネルギーシフト、崩壊速度が著しく変化する。この飽和現象は、弱い駆動条件下でもサブラジアント原子配列にまで広がり、線形モデルからはかなり逸脱する。本研究は,線形モデルに対する平均場モデルを精度と計算複雑性の両立を図ったものである。しかし、このような光物質相互作用系におけるヒルベルト空間の指数関数的増加により理論的に難しいため、大きくて密度の高い原子系における高次累積物の役割は依然として不明である。

In the study of optical properties of large atomic system, a weak laser driving is often assumed to simplify the system dynamics by linearly coupled equations. Here, we investigate the light scattering properties of atomic ensembles beyond weak-field excitation through the cumulant expansion method. By progressively incorporating higher-order correlations into the steady-state equations, an enhanced accuracy can be achieved in comparison to the exact solutions from solving a full density matrix. Our analysis reveals that, in the regime of weak dipole-dipole interaction (DDI), the first-order expansion yields satisfactory predictions for optical depth, while denser atomic configurations necessitate consideration of higher-order correlations. As the intensity of incident light increases, atom saturation effects become noticeable, giving rise to significant changes in light transparency, energy shift, and decay rate. This saturation phenomenon extends to subradiant atom arrays even under weak driving conditions, leading to substantial deviations from the linear model. Our findings demonstrate the mean-field models as good extensions to linear models as it balances both accuracy and computational complexity. However, the crucial role of higher-order cumulants in large and dense atom systems remains unclear, since it is challenging theoretically owing to the exponentially increasing Hilbert space in such light-matter interacting systems.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# セマンティック通信を利用した無線AI生成コンテンツ(AIGC)プロビジョニングフレームワーク

A Wireless AI-Generated Content (AIGC) Provisioning Framework Empowered by Semantic Communication ( http://arxiv.org/abs/2310.17705v2 )

ライセンス: Link先を確認

Runze Cheng, Yao Sun, Dusit Niyato, Lan Zhang, Lei Zhang, Muhammad Ali Imran,

(参考訳) 生成型AIアプリケーションは、多種多様な高品質なAI生成コンテンツ(AIGC)を作成することで、最近、巨大なユーザベースに対応している。モバイルデバイスの普及とモバイルトラフィックの急速な増加により、無線通信ネットワークによる高品質なAIGCサービスへのユビキタスなアクセスが、未来の方向になりつつある。しかし、不安定なチャネル、帯域幅の限られたリソース、不均一な分散計算リソースを備えた無線ネットワークにおいて、適切なAIGCサービスを提供することは困難である。これらの課題に対処するために、セムコムを用いたセマンティック通信(セムコム)によるAIGC(セムAIGC)生成および送信フレームワークを提案する。具体的には、セマンティックエンコーダとデコーダに拡散モデルを統合することで、ワークロード調整可能なトランシーバを設計し、エッジおよびローカルでの計算資源利用の調整を可能にする。さらに、リソースを意識したwOrk lOad Trade-off(ROOT)スキームを考案し、トランスシーバの負荷適応決定をインテリジェントに行い、動的無線チャンネル条件やサービス要件に応じたコンテンツを生成し、送信し、微調整する。提案するSemAIGCフレームワークは,従来の手法に比べてレイテンシとコンテンツ品質が優れていることがシミュレーションによって検証された。

Generative AI applications have been recently catering to a vast user base by creating diverse and high-quality AI-generated content (AIGC). With the proliferation of mobile devices and rapid growth of mobile traffic, providing ubiquitous access to high-quality AIGC services via wireless communication networks is becoming the future direction. However, it is challenging to provide qualified AIGC services in wireless networks with unstable channels, limited bandwidth resources, and unevenly distributed computational resources. To tackle these challenges, we propose a semantic communication (SemCom)-empowered AIGC (SemAIGC) generation and transmission framework, where only semantic information of the content rather than all the binary bits should be generated and transmitted by using SemCom. Specifically, SemAIGC integrates diffusion models within the semantic encoder and decoder to design a workload-adjustable transceiver thereby allowing adjustment of computational resource utilization in edge and local. In addition, a Resource-aware wOrk lOad Trade-off (ROOT) scheme is devised to intelligently make workload adaptation decisions for the transceiver, thus efficiently generating, transmitting, and fine-tuning content as per dynamic wireless channel conditions and service requirements. Simulations verify the superiority of our proposed SemAIGC framework in terms of latency and content quality compared to conventional approaches.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# モデル適応によるデバイアスアルゴリズム

Debiasing Algorithm through Model Adaptation ( http://arxiv.org/abs/2310.18913v4 )

ライセンス: Link先を確認

Tomasz Limisiewicz, David Mareček, Tomáš Musil,

(参考訳) 大規模言語モデルは、ますます増え続けるタスクの解決策になりつつある。しかし、能力の増大に伴い、モデルはトレーニングデータに存在するバイアスやステレオタイプから生じる急激な相関に依存する傾向にある。本研究は,言語モデルにおけるジェンダーバイアスの検出と緩和のための新しい手法を提案する。因果解析を行い、問題のあるモデル成分を同定し、フィードフォワードの中間層が最も偏りを伝達しやすいことを明らかにする。解析結果に基づいて,これらの層の重み行列に線形射影を適用することにより,モデルに介入する。提案手法であるDAMAは,下流タスクにおけるモデルの性能を維持しながら,様々な指標によって測定されるバイアスを著しく低減する。我々はLLaMAの最先端性能を再訓練する手法とモデルのためのコードをリリースし、バイアスを著しく低減した。

Large language models are becoming the go-to solution for the ever-growing number of tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data. This work proposes a novel method for detecting and mitigating gender bias in language models. We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias. Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers. Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. We release code for our method and models, which retrain LLaMA's state-of-the-art performance while being significantly less biased.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# MIST:メンバーシップ不変のサブスペーストレーニングによるメンバーシップ推論攻撃の回避

MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training ( http://arxiv.org/abs/2311.00919v2 )

ライセンス: Link先を確認

Jiacheng Li, Ninghui Li, Bruno Ribeiro,

(参考訳) メンバー推論(MI)攻撃では、敵は機械学習(ML)モデルをトレーニングするためにインスタンスが使用されているかどうかを判断しようとする。 MI攻撃は、プライベートデータを使用してMLモデルをトレーニングする際の大きなプライバシー上の懸念事項である。文献におけるほとんどのMI攻撃は、MLモデルがトレーニングデータに適合するように訓練されているという事実を生かし、トレーニングインスタンスに非常に少ない損失をもたらす。したがって、ほとんどのMI攻撃に対する防御は、モデルのトレーニングデータへの適合性を低下させようとする。しかし、一般的には精度が低下する。トレーニングインスタンスがMI攻撃に対する脆弱性の度合いが異なることを観察する。ほとんどのインスタンスは、トレーニングに含まれていない場合でも、損失が小さくなります。これらのインスタンスでは、モデルをMI攻撃の心配なしにうまく適合させることができる。効果的な防御は、MI攻撃に弱いインスタンスを(暗黙的に)特定し、過度な適合を避ける必要がある。大きな課題は、効率的なトレーニングプロセスにおいて、そのような効果をどのように達成するかである。表現学習における2つの新たな進歩を生かして,MI攻撃を防御する新しいメンバーシップ・不変部分空間訓練(MIST)手法を提案する。 MISTは、他のインスタンスに大きな影響を与えることなく、脆弱性のあるインスタンスの過度な適合を避ける。我々は、MISTと他の様々なSOTAMI防衛を、いくつかのSOTAMI攻撃と比較し、広範囲にわたる実験的研究を行った。 MISTは他の防御よりも優れており、テスト精度は最小限に抑えられる。

In Member Inference (MI) attacks, the adversary try to determine whether an instance is used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoids overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 再利用を学ぶ:知識スコープの制限と拒否メカニズムを通じて、大規模言語モデルをより制御可能で信頼性の高いものにする

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism ( http://arxiv.org/abs/2311.01041v3 )

ライセンス: Link先を確認

Lang Cao,

(参考訳) 大きな言語モデル(LLM)は印象的な言語理解と生成能力を示し、様々な領域にわたる幅広い質問に答えることを可能にする。しかし、これらのモデルは欠陥がなく、しばしばエラーや誤報を含む応答を生成する。これらの不正確さは、一般に幻覚と呼ばれ、多くのシナリオでLLMを信頼できない、さらには使用できないようにしている。本稿では,LLMにおける幻覚の問題を,特に質問応答の文脈において緩和することに焦点を当てる。全ての質問に答える代わりに、私たちはLLMにエラーを避けるために難しい質問に答えることを拒否するように指示する拒絶メカニズムを探求する。そこで我々は,L2R(Learning to Refuse)と呼ばれるシンプルで効果的な解を提案する。これを実現するため、構造化知識ベースを用いてLLMの世界のすべての理解を表現し、追跡可能な金の知識を提供する。この知識基盤はLLMとは分離されており、当初は空だった。検証済みの知識で満たされ、徐々に拡張される。 LLMがドメイン外の質問に遭遇すると、システムはその知識の範囲を認識し、その質問に答えられるかどうかを判断する。さらに,LLMの知識ベースを自動的かつ効率的に拡張する手法を提案する。定性的かつ定量的な分析により,LLMの可制御性と信頼性が向上することが実証された。

Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty. It can be filled with validated knowledge and progressively expanded. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# HetCAN:デュアルレベル認識を備えた異種グラフカスケード注意ネットワーク

HetCAN: A Heterogeneous Graph Cascade Attention Network with Dual-Level Awareness ( http://arxiv.org/abs/2311.03275v2 )

ライセンス: Link先を確認

Zeyuan Zhao, Qingqing Ge, Anfeng Cheng, Yiding Liu, Xiang Li, Shuaiqiang Wang,

(参考訳) 異種グラフニューラルネットワーク(HGNN)は、最近、現実世界のアプリケーションでユビキタスな異種グラフをモデリングする際、顕著な能力を示した。ヘテロジニアスグラフの既存の手法の多くは、複数の畳み込み層や注意層を積み重ねてノード埋め込みを学習しており、これはノードレベルの側面から高次情報をキャプチャすると考えられる。しかし、異種グラフの異なる種類のノードには多様な特徴があり、また特徴レベルの側面から高次情報を含むノードの特徴間の相互作用を捉えることも必要である。さらに、ほとんどのメソッドは、まずノードの機能を同じ低次元空間にマッピングすることで整列するが、この方法でノードの型情報を失う可能性がある。本稿では,複数のカスケードブロックからなる新規なヘテロジニアスグラフカスケード注意ネットワーク(HetCAN)を提案する。各カスケードブロックは、タイプアウェアエンコーダとディメンションアウェアエンコーダの2つのコンポーネントを含む。具体的には、タイプ認識エンコーダはノード型情報の損失を補償し、グラフの不均一性をフル活用することを目的としている。次元認識エンコーダは、ノード特徴間の相互作用をキャプチャすることで、特徴レベルの高次情報を学ぶことができる。これらのコンポーネントの助けを借りて、HetCANはノードの特徴、グラフの不均一性、およびノード埋め込みにおけるグラフ構造に関する情報を包括的にエンコードすることができる。大規模な実験は、HetCANが先進的な競争相手よりも優れていることを示し、その効率性と堅牢性を示している。

Heterogeneous graph neural networks(HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Most existing methods for heterogeneous graphs mainly learn node embeddings by stacking multiple convolutional or attentional layers, which can be considered as capturing the high-order information from node-level aspect. However, different types of nodes in heterogeneous graphs have diverse features, it is also necessary to capture interactions among node features, namely the high-order information from feature-level aspect. In addition, most methods first align node features by mapping them into one same low-dimensional space, while they may lose some type information of nodes in this way. To address these problems, in this paper, we propose a novel Heterogeneous graph Cascade Attention Network (HetCAN) composed of multiple cascade blocks. Each cascade block includes two components, the type-aware encoder and the dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and aims to make full use of graph heterogeneity. The dimension-aware encoder is able to learn the feature-level high-order information by capturing the interactions among node features. With the assistance of these components, HetCAN can comprehensively encode information of node features, graph heterogeneity and graph structure in node embeddings. Extensive experiments demonstrate the superiority of HetCAN over advanced competitors and also exhibit its efficiency and robustness.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# 無限大データの最小二乗クラスタリングのための高性能ハイブリッドアルゴリズム

High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data ( http://arxiv.org/abs/2311.04517v3 )

ライセンス: Link先を確認

Ravil Mussabayev, Rustam Mussabayev,

(参考訳) 本稿では,Infinitely Tall Data (MSSC-ITD) の最小階数クラスタリング(Minimum Sum-of-Squares Clustering of Infinitely Tall Data, MSC-ITD)という,クラスタリング問題の新しい定式化と,その有効解に対するハイブリッド並列手法の革新的な集合であるHPClustを提案する。現代の高性能コンピューティング技術を利用することで、HPClustは、有効性、計算効率、拡張性といった主要なクラスタリング指標を強化する。 MapReduceフレームワークによる処理時間を短縮するバニラデータ並列処理とは対照的に,本手法では,マルチストラテジーな競合協調並列処理と,目的関数ランドスケープの複雑な特性を活用して,優れた性能を実現する。スケールに苦しむ他のアルゴリズムとは異なり、当社のアルゴリズムは本質的に並列であり、スケーラビリティと並列性の向上によるソリューション品質の向上、中小データセット用に設計された高度なアルゴリズムよりも優れています。 4つの並列戦略を特徴とするHPClustの評価は,従来の手法や最先端手法よりも優れた性能を示す。これらの結果から,並列処理はクラスタリング効率を向上するだけでなく,精度も向上することが示された。さらに、計算効率とクラスタリング品質のバランスについて検討し、データセットの詳細とリソース可用性に基づいた最適な並列戦略に関する洞察を提供する。本研究はクラスタリングアルゴリズムにおける並列性についての理解を深め,MSSC-ITD に対して,高度な並列アプローチの厳密なハイブリッド化が最適な結果をもたらすことを示す。合成データに関する実験は、HPClustの異常なスケーラビリティとノイズに対する堅牢性をさらに確認した。

This paper introduces a novel formulation of the clustering problem, namely the Minimum Sum-of-Squares Clustering of Infinitely Tall Data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive-cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism, and outperforming even advanced algorithms designed for small and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on synthetic data further confirm HPClust's exceptional scalability and robustness to noise.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# PINE:シークレット共有ベクトルの効率的なノルム境界検証

PINE: Efficient Norm-Bound Verification for Secret-Shared Vectors ( http://arxiv.org/abs/2311.10237v2 )

ライセンス: Link先を確認

Guy N. Rothblum, Eran Omri, Junye Chen, Kunal Talwar,

(参考訳) 高次元ベクトルのセキュアアグリゲーションは、フェデレートされた統計学と学習における基本的なプリミティブである。 PRIOのような2サーバシステムは、秘密共有ベクトルのスケーラブルな集約を可能にする。敵のクライアントは集約を操作しようとするかもしれないので、それぞれの(秘密の共有された)コントリビューションが適切に形成されていることを保証することが重要です。本研究では、各寄与ベクトルがユークリッドノルムに有界であることを保証するという、重要かつよく研究された目標に焦点を当てる。有界ノルム寄与を保証するための既存のプロトコルは、大きな通信オーバーヘッドを発生させるか、ノルム境界の近似的な検証しかできない。通信オーバーヘッドの少ない正確な標準検証を可能にする新しいプロトコルであるPrivate Inexpensive Norm Enforcement (PINE)を提案する。高次元ベクトルの場合、従来の16-32倍のオーバヘッドに比べて通信オーバヘッドは数パーセントである。

Secure aggregation of high-dimensional vectors is a fundamental primitive in federated statistics and learning. A two-server system such as PRIO allows for scalable aggregation of secret-shared vectors. Adversarial clients might try to manipulate the aggregate, so it is important to ensure that each (secret-shared) contribution is well-formed. In this work, we focus on the important and well-studied goal of ensuring that each contribution vector has bounded Euclidean norm. Existing protocols for ensuring bounded-norm contributions either incur a large communication overhead, or only allow for approximate verification of the norm bound. We propose Private Inexpensive Norm Enforcement (PINE): a new protocol that allows exact norm verification with little communication overhead. For high-dimensional vectors, our approach has a communication overhead of a few percent, compared to the 16-32x overhead of previous approaches.

翻訳日:2024-05-31 00:10:23 公開日:2024-05-29

# Bayesian Neural Networks: Min-Max Game Framework

Bayesian Neural Networks: A Min-Max Game Framework ( http://arxiv.org/abs/2311.11126v2 )

ライセンス: Link先を確認

Junping Hong, Ercan Engin Kuruoglu,

(参考訳) 本稿では,ゲーム理論を定式化したベイズニューラルネットワーク(BNN)と最大符号化速度歪み損失を用いたディープニューラルネットワークのロバスト性と雑音解析について予備的検討を行う。 BNNは深層学習にある程度の堅牢性を提供しており、ミニマックス法はベイズ法を補助する自然な保守的な方法であった。最近の閉ループ転写ニューラルネットワークに触発されて、決定論的ニューラルネットワーク$f$とサンプリングネットワーク$f + \xi$または$f + r*\xi$の間のゲーム理論を介してBNNを定式化する。従来のBNNと比較すると、BNNは中心$f$とサンプリングポイント$f + r*\xi$の間の一定のギャップ内で解空間を学習し、以前のBNNと比較して意味のある事前設定を持つ保守的な選択である。さらに、$f$ と $f + r*\xi$ の間の最小点は、十分に訓練されたモデル $f$ で、部分空間次元が十分に大きいときに安定となる。これにより、$f$は、たとえ$f$が真のデータを数回繰り返してオンライントレーニングしているとしても、予測レベルよりもサブスペース内の配布外データやノイズデータを認識する確率が高い。これまでのところ、我々の実験はMNISTとFashion MNISTのデータセットに限られており、現実的なデータセットと複雑なニューラルネットワークモデルを用いたさらなる実験は、上記の議論を検証するために実装されるべきである。

This paper is a preliminary study of the robustness and noise analysis of deep neural networks via a game theory formulation Bayesian Neural Networks (BNN) and the maximal coding rate distortion loss. BNN has been shown to provide some robustness to deep learning, and the minimax method used to be a natural conservative way to assist the Bayesian method. Inspired by the recent closed-loop transcription neural network, we formulate the BNN via game theory between the deterministic neural network $f$ and the sampling network $f + \xi$ or $f + r*\xi$. Compared with previous BNN, BNN via game theory learns a solution space within a certain gap between the center $f$ and the sampling point $f + r*\xi$, and is a conservative choice with a meaningful prior setting compared with previous BNN. Furthermore, the minimum points between $f$ and $f + r*\xi$ become stable when the subspace dimension is large enough with a well-trained model $f$. With these, the model $f$ can have a high chance of recognizing the out-of-distribution data or noise data in the subspace rather than the prediction level, even if $f$ is in online training after a few iterations of true data. So far, our experiments are limited to MNIST and Fashion MNIST data sets, more experiments with realistic data sets and complicated neural network models should be implemented to validate the above arguments.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# 量子系における局所純度蒸留-純度と絡み合いの相補性を探る

Local Purity Distillation in Quantum Systems: Exploring the Complementarity Between Purity and Entanglement ( http://arxiv.org/abs/2311.11820v2 )

ライセンス: Link先を確認

Ray Ganardi, Piotr Masajada, Moein Naseri, Alexander Streltsov,

(参考訳) 量子力学と量子絡み合いは、量子情報科学において重要な関係を持つ2つの重要な量子資源理論を表す。その重要性にもかかわらず、この2つの理論の複雑な関係は未だ完全には理解されていない。ここでは、特に局所冷却過程の文脈において、絡み合いと熱力学の相互作用を掘り下げる。ギブス保存型ローカル操作と古典通信の枠組みを導入・開発する。本フレームワークでは,リモートパーティがローカルシステムを地上状態に効果的に冷却できる戦略を探求する。我々の分析は、量子状態の1つのコピーのみがアクセス可能なシナリオに重点を置いており、理想的な性能は、これらの制約の下で達成可能な基底状態への可能な限りの忠実さによって定義される。局所冷却は局所純度の抽出と一致し, 完全縮退した局所ハミルトン系システムに着目する。この文脈では、局所純度抽出の効率性とシステムに存在する絡み合いの度合いとの間に強力なリンクを確立する。さらに、多くの関連するシナリオにおいて、最適性能は半定値プログラミング手法によって正確に決定できることを実証する。本研究は, 絡み込み検出・推定技術など, 様々な実用化への扉を開くものである。我々は、有界な絡み合い状態のクラスに対する絡み合いの量を評価することによってこれを実証する。

Quantum thermodynamics and quantum entanglement represent two pivotal quantum resource theories with significant relevance in quantum information science. Despite their importance, the intricate relationship between these two theories is still not fully understood. Here, we delve into the interplay between entanglement and thermodynamics, particularly in the context of local cooling processes. We introduce and develop the framework of Gibbs-preserving local operations and classical communication. Within this framework, we explore strategies enabling remote parties to effectively cool their local systems to the ground state. Our analysis is centered on scenarios where only a single copy of a quantum state is accessible, with the ideal performance defined by the highest possible fidelity to the ground state achievable under these constraints. We focus on systems with fully degenerate local Hamiltonians, where local cooling aligns with the extraction of local purity. In this context, we establish a powerful link between the efficiency of local purity extraction and the degree of entanglement present in the system, a concept we define as purity-entanglement complementarity. Moreover, we demonstrate that in many pertinent scenarios, the optimal performance can be precisely determined through semidefinite programming techniques. Our findings open doors to various practical applications, including techniques for entanglement detection and estimation. We demonstrate this by evaluating the amount of entanglement for a class of bound entangled states.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# 創発的医用画像評価のための特徴抽出 : 進化する傾向に対する新たな証拠

Feature Extraction for Generative Medical Imaging Evaluation: New Evidence Against an Evolving Trend ( http://arxiv.org/abs/2311.13717v3 )

ライセンス: Link先を確認

McKell Woodland, Austin Castelo, Mais Al Taie, Jessica Albuquerque Marques Silva, Mohamed Eltaher, Frank Mohn, Alexander Shieh, Austin Castelo, Suprateek Kundu, Joshua P. Yung, Ankit B. Patel, Kristy K. Brock,

(参考訳) Fr'echet Inception Distance (FID)は、合成画像の品質を評価するために広く用いられている指標である。 ImageNetベースの特徴抽出装置に依存しており、医療画像に適用可能であるかどうかは不明だ。最近のトレンドは、医用画像で訓練された特徴抽出器を通して、医用画像にFIDを適用することである。本研究では,ImageNetをベースとした抽出器がRadImageNetよりも人間の判断に整合していることを示すことで,この実践に挑戦する。我々は,Fr'echet distances (FDs) を用いた4つの医用画像モダリティと4つのデータ拡張技術を用いた16のStyleGAN2ネットワークの評価を行った。視覚的チューリングテストによる人的判断と比較したところ,ImageNetをベースとした抽出機が人的判断と整合性のあるランキングを作成したのに対し,ImageNetをトレーニングしたSwaV抽出機から抽出したFDは専門家による評価と有意な相関を示した。対照的に、RadImageNetベースのランキングは不安定であり、人間の判断とは矛盾していた。以上の結果から,医用画像抽出装置はFDを本質的に改善せず,信頼性を損なうことさえできないという新たな証拠が得られた。私たちのコードはhttps://github.com/mckellwoodland/fid-med-eval.comで利用可能です。

Fr\'echet Inception Distance (FID) is a widely used metric for assessing synthetic image quality. It relies on an ImageNet-based feature extractor, making its applicability to medical imaging unclear. A recent trend is to adapt FID to medical imaging through feature extractors trained on medical images. Our study challenges this practice by demonstrating that ImageNet-based extractors are more consistent and aligned with human judgment than their RadImageNet counterparts. We evaluated sixteen StyleGAN2 networks across four medical imaging modalities and four data augmentation techniques with Fr\'echet distances (FDs) computed using eleven ImageNet or RadImageNet-trained feature extractors. Comparison with human judgment via visual Turing tests revealed that ImageNet-based extractors produced rankings consistent with human judgment, with the FD derived from the ImageNet-trained SwAV extractor significantly correlating with expert evaluations. In contrast, RadImageNet-based rankings were volatile and inconsistent with human judgment. Our findings challenge prevailing assumptions, providing novel evidence that medical image-trained feature extractors do not inherently improve FDs and can even compromise their reliability. Our code is available at https://github.com/mckellwoodland/fid-med-eval.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# Polyak モメンタムを呈し, 大型カタパルトによる発火性小腫の発見

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults ( http://arxiv.org/abs/2311.15051v3 )

ライセンス: Link先を確認

Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun,

(参考訳) ポリアクの運動量による勾配降下は、現代の機械や深層学習で広く使われているが、訓練軌道に対するその影響の具体的な理解はいまだ解明されていない。本研究では, 線形対角線ネットワークや非線形ニューラルネットワークの場合, 学習率の高い運動量勾配は大きなカタパルトを呈し, 勾配勾配よりもはるかに平坦なミニマに向かって反復することを示した。大カタパルトは自己安定化効果(Damian et al , 2023)の運動量"延長"によって引き起こされると仮定する。我々は、単純なおもちゃの例と線形対角ネットワークの仮説を支持する実証的な証拠で、我々の仮説を理論的、実証的に支持する。

Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains elusive. In this work, we empirically show that for linear diagonal networks and nonlinear neural networks, momentum gradient descent with a large learning rate displays large catapults, driving the iterates towards much flatter minima than those found by gradient descent. We hypothesize that the large catapult is caused by momentum "prolonging" the self-stabilization effect (Damian et al., 2023). We provide theoretical and empirical support for our hypothesis in a simple toy example and empirical evidence supporting our hypothesis for linear diagonal networks.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# 連続測定による空洞結合原子アンサンブルのスピンスクイーズ生成の解析

Analysis of spin-squeezing generation in cavity-coupled atomic ensembles with continuous measurements ( http://arxiv.org/abs/2311.15725v3 )

ライセンス: Link先を確認

A. Caprotti, M. Barbiero, M. G. Tarallo, M. G. Genoni, G. Bertaina,

(参考訳) 我々は3レベル原子の光キャビティへの結合と透過キャビティ場の連続量子測定によるスピンスクイーズ状態の生成を分析し、原子アンサンブルの進化をモニタリングする。解析処理と顕微鏡シミュレーションにより,原子数$N$で大きなスピンスクイーズを実現できることを示した。しかし、いくつかの文献とは対照的に、最適なアプローチで提案される継続的なフィードバックなしにハイゼンベルクのスケーリングが得られないことを明確にする。実際、断熱キャビティ除去近似と大きな$N$制限では、スピンスクイーズに対して$N^{-2/3}$、対応するプロトコル長に対して$N^{-1/3}$のスケーリング挙動が見つかる。これらの結果はブロッホ球の曲率を考えることでのみ得られるが、これは集合スピン作用素をその赤道に直交的に線型化することで不正確な予測が得られるからである。完全なシミュレーションにより, スピンスクイーズ生成がシステムパラメータにどのように依存するかを特徴付けるとともに, キャビティ充填のダイナミクスと徐々に混合して, メトロジー上の優位性が失われるまで, 悪いキャビティ状態から逸脱する。最後に、このスピンスクイーズプロトコルの最先端光時計への応用について論じる。

We analyze the generation of spin-squeezed states via coupling of three-level atoms to an optical cavity and continuous quantum measurement of the transmitted cavity field in order to monitor the evolution of the atomic ensemble. Using analytical treatment and microscopic simulations of the dynamics, we show that one can achieve significant spin squeezing, favorably scaling with the number of atoms $N$. However, contrary to some previous literature, we clarify that it is not possible to obtain Heisenberg scaling without the continuous feedback that is proposed in optimal approaches. In fact, in the adiabatic cavity removal approximation and large $N$ limit, we find the scaling behavior $N^{-2/3}$ for spin squeezing and $N^{-1/3}$ for the corresponding protocol duration. These results can be obtained only by considering the curvature of the Bloch sphere, since linearizing the collective spin operators tangentially to its equator yields inaccurate predictions. With full simulations, we characterize how spin-squeezing generation depends on the system parameters and departs from the bad cavity regime, by gradually mixing with cavity-filling dynamics until metrological advantage is lost. Finally, we discuss the relevance of this spin-squeezing protocol to state-of-the-art optical clocks.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# スパイにおけるホログラフィックエンタングルメントエントロピー

A Holographic Entanglement Entropy at Spi ( http://arxiv.org/abs/2311.16056v2 )

ライセンス: Link先を確認

Abir Ghosh, Chethan Krishnan,

(参考訳) 場の量子論における部分領域に対する有限エンタングルメントエントロピーを定義するには、2つの論理的に独立なスケール、すなわち部分領域のサイズを制御するIRスケールとUVカットオフが必要である。 AdS/CFTでは、IRスケールはAdS長尺、UVカットオフはバルクラジアルカットオフ、サブリージョンは無次元の角度で指定される。これは、AdS/CFTにおける龍高柳表面とその領域を決定するデータである。漸近的に平坦な空間には、空間無限大(spi)に関連付けることのできる ``spi-部分領域' という概念が存在すると論じる。幾何的にAdS部分領域とは全く異なるが、この角度データはスピの2分割として解釈できる重要な特徴を持っている。したがって、スパイス領域に関連するRT面の面積は、AdS/CFTのように、この二分割の下でのバルク状態の還元密度行列の絡み合いエントロピーと解釈できる。対称スパイサブリージョンでは、これらのRT面は漸近カウサルダイヤモンドの腰である。空の平坦空間では、それらはリンドラー地平線に還元され、カッシーニ、フエルタ、マイヤーズのAdS-リンドラー地平線に類似する。これらの結果は、空空間のスクリーンに固定された最小曲面に関する以前の研究と結びついており、また、ブラックホールがバルクにある場合の議論を一般化している。スパイス領域としてのブラックホール RT の表面の位相は様々であり、AdS のブラックホール (小、大) の位相と自然に結合する。重要な観測は、放射状カットオフは平らな空間におけるIRスケールと関連しており、実際には紫外線の発散は存在しないということである。これは、サブAdSスケールにおいてホログラフィック双対性はIR/IR対応であり、自由度は局所QFTのそれではなく長弦のものであるという以前の提案と一致している。弦はもちろん、UV有限である。

Defining finite entanglement entropy for a subregion in quantum field theory requires the introduction of two logically independent scales: an IR scale that controls the size of the subregion, and a UV cut-off. In AdS/CFT, the IR scale is the AdS lengthscale, the UV cut-off is the bulk radial cut-off, and the subregion is specified by dimensionless angles. This is the data that determines Ryu-Takayanagi surfaces and their areas in AdS/CFT. We argue that in asymptotically flat space there exists the notion of a ``spi-subregion" that one can associate to spatial infinity (spi). Even though geometrically quite different from an AdS subregion, this angle data has the crucial feature that it allows an interpretation as a bi-partitioning of spi. Therefore, the area of the RT surface associated to the spi-subregion can be interpreted as the entanglement entropy of the reduced density matrix of the bulk state under this bi-partition, as in AdS/CFT. For symmetric spi-subregions, these RT surfaces are the waists of Asymptotic Causal Diamonds. In empty flat space they reduce to Rindler horizons, and are analogues of the AdS-Rindler horizons of Casini, Huerta \& Myers. We connect these results to previous work on minimal surfaces anchored to screens in empty space, but also generalize the discussion to the case where there are black holes in the bulk. The phases of black hole RT surfaces as the spi-subregion is varied, naturally connect with those of black holes (small and large) in AdS. A key observation is that the radial cut-off is associated to an IR scale in flat space -- and in fact there are no UV divergences. We argue that this is consistent with previous suggestions that in sub-AdS scales the holographic duality is an IR/IR correspondence and that the degrees of freedom are {\em not} those of a local QFT, but those of long strings. Strings are of course, famously UV finite.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# FedAL: 敵対的学習によって実現されたブラックボックスのフェデレーション知識蒸留

FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning ( http://arxiv.org/abs/2311.16584v2 )

ライセンス: Link先を確認

Pengchao Han, Xingyan Shi, Jianwei Huang,

(参考訳) 知識蒸留(KD)は、異なるモデルアーキテクチャを持ち、ローカルデータやモデルパラメータを他と共有しない分散クライアント間の協調学習を可能にする。各クライアントは、フェデレートされたKDとして知られるターゲットとして、すべてのクライアントモデルの平均モデル出力/機能を使用して、ローカルモデルを更新する。しかし、クライアントのローカルモデルが不均一なローカルデータセットでトレーニングされている場合、既存のフェデレーションKDメソッドはよく機能しないことが多い。本稿では,クライアント間のデータ不均一性に対処するために,Adversarial Learning (FedAL) によって実現されたフェデレーション知識の蒸留を提案する。まず、データの不均一性に起因するクライアント間の局所モデル出力のばらつきを軽減するため、サーバはクライアント間のコンセンサスモデル出力をクライアントと差別者間のmin-maxゲームを介してクライアント間のコンセンサスモデル出力を達成するために、クライアントのローカルモデルトレーニングを誘導する識別器として機能する。さらに、クライアントの不均一なローカルデータのために、クライアントのローカルトレーニングとグローバルな知識伝達の間に破滅的な忘れが生じる可能性がある。この課題に向けて、我々は、クライアントが他者へ知識を転送/学習する能力を保証するため、ローカルトレーニングとグローバルナレッジトランスファーの両方において、予測の少ない正規化を設計する。実験により,FedALとその変異体は,他の連合KDベースラインよりも高い精度が得られることが示された。

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data and model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, known as federated KD. However, existing federated KD methods often do not perform well when clients' local models are trained with heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the local model output divergence across clients caused by data heterogeneity, the server acts as a discriminator to guide clients' local model training to achieve consensus model outputs among clients through a min-max game between clients and the discriminator. Moreover, catastrophic forgetting may happen during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. Towards this challenge, we design the less-forgetting regularization for both local training and global knowledge transfer to guarantee clients' ability to transfer/learn knowledge to/from others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# アンチエイリアスレンダリングのためのマルチスケール3次元ガウススプラッティング

Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering ( http://arxiv.org/abs/2311.17089v2 )

ライセンス: Link先を確認

Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee,

(参考訳) 3Dガウシアンは最近、3D再構成とレンダリングの非常に効率的な表現として現れた。高精細度で高精細度で高精細度でレンダリングするが、低精細度や遠方のカメラ位置でレンダリングすると大幅に劣化する。低解像度または遠距離レンダリングにおいて、画像の画素サイズは、スティングされた各3Dガウスの画面サイズと比較してニキスト周波数以下になり、エイリアス効果をもたらす。レンダリングは1ピクセルあたりのよりスプティングされたガウスアンの連続したアルファブレンディングによって劇的に遅くなる。これらの問題に対処するために,異なるスケールでガウスを維持できるマルチスケール3次元ガウススプラッティングアルゴリズムを提案する。高解像度画像はより小さなガウスでレンダリングされ、低解像度画像はより小さなガウスでレンダリングされる。同様のトレーニング時間で,本アルゴリズムは,1次元ガウス分割よりも4$\times$-128$\times$スケールレンダリングで,13\%-66\% PSNRと160\%-2400\%のレンダリング速度を達成できる。私たちのコードや他の結果は、プロジェクトのWebサイトhttps://jokeryan.github.io/projects/ms-gs/で公開されています。

3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite its high rendering quality and speed at high resolutions, they both deteriorate drastically when rendered at lower resolutions or from far away camera position. During low resolution or far away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian and leads to aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13\%-66\% PSNR and 160\%-2400\% rendering speed improvement at 4$\times$-128$\times$ scale rendering on Mip-NeRF360 dataset compared to the single scale 3D Gaussian splitting. Our code and more results are available on our project website https://jokeryan.github.io/projects/ms-gs/

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# Brainformer:fMRIによる人間の視覚脳機能とマシンビジョンモデル

Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI ( http://arxiv.org/abs/2312.00236v2 )

ライセンス: Link先を確認

Xuan-Bac Nguyen, Xin Li, Pawan Sinha, Samee U. Khan, Khoa Luu,

(参考訳) 人間の知覚は、信念を形成し、現実を理解する上で重要な役割を果たす。脳機能のより深い理解は、新しいディープニューラルネットワークの開発につながるだろう。本研究では,人間の知覚システムにおける機能的磁気共鳴イメージング(fMRI)パターンを機械学習の観点から解析するための,単純で効果的なトランスフォーマーベースフレームワークであるBrainformerを提案する。具体的には,fMRI信号を用いて脳活動パターンを探索するマルチスケールfMRI変換器を提案する。このアーキテクチャは、高次元fMRI信号符号化のためのシンプルだが効率的なモジュールを含み、3D Voxels Embeddingと呼ばれる新しい埋め込み技術が組み込まれている。第2に、脳の関心領域の機能からインスピレーションを得て、脳fMRI誘導損失と呼ばれる新しい損失関数を導入する。この損失関数は、fMRIデータを用いてディープニューラルネットワークのこれらの領域からの脳活動パターンを模倣する。この研究は、人間の知覚からニューラルネットワークへ知識を伝達する先進的なアプローチを導入する。実験により、fMRI情報を活用することで、様々な画像認識タスクにおいて、マシンビジョンモデルがState-of-the-Artメソッドに匹敵する結果が得られることを示した。

Human perception plays a vital role in forming beliefs and understanding reality. A deeper understanding of brain functionality will lead to the development of novel deep neural networks. In this work, we introduce a novel framework named Brainformer, a straightforward yet effective Transformer-based framework, to analyze Functional Magnetic Resonance Imaging (fMRI) patterns in the human perception system from a machine-learning perspective. Specifically, we present the Multi-scale fMRI Transformer to explore brain activity patterns through fMRI signals. This architecture includes a simple yet efficient module for high-dimensional fMRI signal encoding and incorporates a novel embedding technique called 3D Voxels Embedding. Secondly, drawing inspiration from the functionality of the brain's Region of Interest, we introduce a novel loss function called Brain fMRI Guidance Loss. This loss function mimics brain activity patterns from these regions in the deep neural network using fMRI data. This work introduces a prospective approach to transfer knowledge from human perception to neural networks. Our experiments demonstrate that leveraging fMRI information allows the machine vision model to achieve results comparable to State-of-the-Art methods in various image recognition tasks.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# ImputeFormer: 一般化可能な時空間インプットのための低ランク変換器

ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation ( http://arxiv.org/abs/2312.01728v3 )

ライセンス: Link先を確認

Tong Nie, Guoyang Qin, Wei Ma, Yuewen Mei, Jian Sun,

(参考訳) データ不足は、特に時空間データのモデリングにおいて、科学と工学の両方のタスクにおいて広範囲にわたる問題である。この問題はデータ駆動型ソリューションに貢献するために多くの研究を惹きつける。既存の計算ソリューションには、主に低ランクモデルとディープラーニングモデルが含まれる。前者は一般的な構造上の先行を前提としているが、モデル能力は限られている。後者は表現性の健全な特徴を持っているが、基礎となる時空間構造についての事前の知識が欠けている。 2つのパラダイムの強みを生かして、強い帰納バイアスと高いモデル表現率のバランスをとるために、低ランク化誘起変換器を実証する。時空間データの固有構造を活用することにより、バランスの取れた信号-雑音表現を学習し、様々な計算問題に対して一般化することができる。交通流,太陽エネルギー,スマートメーター,空気品質など,異種データセットの精度,効率,汎用性において,その優位性を示す。実証結果の証明は、低ランク性のような時系列プリミティブを組み込むことで、広範囲の時空間計算問題にアプローチする一般化可能なモデルの開発を大幅に促進できるという強い信念を与える。

Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem attracts many studies to contribute to data-driven solutions. Existing imputation solutions mainly include low-rank models and deep learning models. The former assumes general structural priors but has limited model capacity. The latter possesses salient features of expressivity but lacks prior knowledge of the underlying spatiotemporal structures. Leveraging the strengths of both two paradigms, we demonstrate a low rankness-induced Transformer to achieve a balance between strong inductive bias and high model expressivity. The exploitation of the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it generalizable for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and versatility in heterogeneous datasets, including traffic flow, solar energy, smart meters, and air quality. Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rankness, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# FaultFormer: 適応型ベアリング故障分類のための事前学習用変換器

FaultFormer: Pretraining Transformers for Adaptable Bearing Fault Classification ( http://arxiv.org/abs/2312.02380v3 )

ライセンス: Link先を確認

Anthony Zhou, Amir Barati Farimani,

(参考訳) グローバルな消費の増大は、ディープラーニングのスマート製造や機械の健康モニタリングへの重要な応用を動機付けてきた。特に、振動データの解析は、軸受欠陥の検出により予測保守に関する有意義な洞察を抽出する大きな可能性を秘めている。ディープラーニングは、これらの機械的故障を予測する強力な方法だが、新しいタスクやデータセットへの一般化性に欠け、高価なラベル付き機械的データを必要とする。本稿では,トランスモデルに基づく自己教師型事前学習および微調整フレームワークを提案することで,この問題に対処する。特に、トランスモデルを用いて、最先端の精度に到達するための異なるトークン化とデータ拡張戦略について検討する。さらに、振動信号に対する自己教師付きマスクプリトレーニングとその低データ状態、タスク適応、データセット適応への応用を実証する。事前トレーニングは、不足した未確認のトレーニングサンプルのパフォーマンス向上と、事前トレーニングディストリビューション以外の障害クラスを微調整する際のパフォーマンス向上を可能にする。さらに、事前訓練されたトランスフォーマーは、数ショットで異なるデータセットに一般化できることが示されている。このパラダイムでは、異なるベアリング、障害、機械からラベル付けされていないデータに基づいてモデルを事前訓練し、特定の製造ニーズに合った、新しいデータ共有アプリケーションに素早くデプロイすることが可能になる。

The growth of global consumption has motivated important applications of deep learning to smart manufacturing and machine health monitoring. In particular, analyzing vibration data offers great potential to extract meaningful insights into predictive maintenance by the detection of bearing faults. Deep learning can be a powerful method to predict these mechanical failures; however, they lack generalizability to new tasks or datasets and require expensive, labeled mechanical data. We address this by presenting a novel self-supervised pretraining and fine-tuning framework based on transformer models. In particular, we investigate different tokenization and data augmentation strategies to reach state-of-the-art accuracies using transformer models. Furthermore, we demonstrate self-supervised masked pretraining for vibration signals and its application to low-data regimes, task adaptation, and dataset adaptation. Pretraining is able to improve performance on scarce, unseen training samples, as well as when fine-tuning on fault classes outside of the pretraining distribution. Furthermore, pretrained transformers are shown to be able to generalize to a different dataset in a few-shot manner. This introduces a new paradigm where models can be pretrained on unlabeled data from different bearings, faults, and machinery and quickly deployed to new, data-scarce applications to suit specific manufacturing needs.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# マシンビジョンセラピー:マルチモーダルな大規模言語モデルでは、文脈内学習による視覚的ロバスト性を高めることができる

Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning ( http://arxiv.org/abs/2312.02546v2 )

ライセンス: Link先を確認

Zhuo Huang, Chang Liu, Yinpeng Dong, Hang Su, Shibao Zheng, Tongliang Liu,

(参考訳) Contrastive Language-Image Pre-Training (CLIP) のような視覚モデルは、優れた一般化性能を示すが、そのゼロショットのロバスト性は、微調整なしではout-of-Distribution (OOD) のシナリオで制限されている。一般的に行われるように、人間の監督を好ましく提供するのではなく、強力な視覚的理解能力を持つマルチモーダルな大規模言語モデル(MLLM)を利用することができる。しかし、MLLMはタスクの不整合性により視覚問題に苦しむことが示され、その利用を妨げている。本稿では,MLLMを効果的に活用して,視覚モデルからノイズ予測を補正するマシンビジョンセラピーを提案する。復調ラベルを微調整することにより、教師なしの方法で学習モデルの性能を高めることができる。不整合性問題を解決するために,視覚タスクをMLLMと整合させる新しいDICL戦略を提案する。具体的には、あるクラスが他のクラスと混同される確率を推定する遷移行列を推定することにより、最も確率の高いノイズクラスから正しい例と間違った例を含む命令を構築することができる。このような命令は、ICL能力を持つ任意のMLLMにおいて、視覚モデルの誤った予測を検出し、修正するのに役立つ。 ImageNet、WILDS、DomainBed、その他のOODデータセットに関する広範な実験を通じて、本手法の定量的かつ定性的な効果を慎重に検証する。私たちのコードはhttps://github.com/tmllab/Machine_Vision_Therapyで利用可能です。

Although vision models such as Contrastive Language-Image Pre-Training (CLIP) show impressive generalization performance, their zero-shot robustness is still limited under Out-of-Distribution (OOD) scenarios without fine-tuning. Instead of undesirably providing human supervision as commonly done, it is possible to take advantage of Multi-modal Large Language Models (MLLMs) that hold powerful visual understanding abilities. However, MLLMs are shown to struggle with vision problems due to the incompatibility of tasks, thus hindering their utilization. In this paper, we propose to effectively leverage MLLMs to conduct Machine Vision Therapy which aims to rectify the noisy predictions from vision models. By fine-tuning with the denoised labels, the learning model performance can be boosted in an unsupervised manner. To solve the incompatibility issue, we propose a novel Denoising In-Context Learning (DICL) strategy to align vision tasks with MLLMs. Concretely, by estimating a transition matrix that captures the probability of one class being confused with another, an instruction containing a correct exemplar and an erroneous one from the most probable noisy class can be constructed. Such an instruction can help any MLLMs with ICL ability to detect and rectify incorrect predictions of vision models. Through extensive experiments on ImageNet, WILDS, DomainBed, and other OOD datasets, we carefully validate the quantitative and qualitative effectiveness of our method. Our code is available at https://github.com/tmllab/Machine_Vision_Therapy.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# 自己監督型事前訓練とカスタマイズ型ファインチューニングを用いた変圧器によるレーンレンダリングの知的異常検出

Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning ( http://arxiv.org/abs/2312.04398v2 )

ライセンス: Link先を確認

Yongqi Dong, Xingmin Lu, Ruohan Li, Wei Song, Bart van Arem, Haneen Farah,

(参考訳) デジタルマップを使った急成長するナビゲーションサービスは、ドライバーにとって非常に便利だ。それでも、レーンレンダリングマップ画像における異常の存在は、しばしば潜在的な危険をもたらし、そのような異常は人間の運転者に誤解を与え、結果として安全でない運転条件に寄与する。そこで本論文では,データ前処理,マスク付き画像モデリング(MiM)手法による自己教師型事前学習,ラベル平滑化によるクロスエントロピーベース損失を用いた微調整,そして後処理により,最先端のディープラーニング技術,特にトランスフォーマーモデルを用いた4相パイプラインを提案する。提案したパイプラインの有効性を検証した各種実験を行った。その結果,提案パイプラインはレーンレンダリング画像異常検出において優れた性能を示し,特にMiMを用いた自己教師付き事前学習は,全体のトレーニング時間を著しく短縮し,検出精度を大幅に向上させることができることがわかった。例えば、Uniform Maskingを自己教師付きプレトレーニング(Swin-Trans-UM)として使用すると、94.77%の精度が得られ、AUCスコアは0.9743となり、プレトレーニングのない純粋なSwin Transformer(Swin-Trans)は94.01%、AUCは0.9498となった。微調整のエポックは、オリジナルの280から41に劇的に縮小された。結論として,MiMや他の先進的なディープラーニング技術を用いた自己教師付き事前学習を取り入れたパイプラインが,デジタルナビゲーションシステムにおけるレーンレンダリング画像異常検出の精度と効率を高めるための堅牢なソリューションとして登場した。

The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# AIベースのリアクティブシステムによるサイバー攻撃の対処 - 全体論と今後の展望

Tackling Cyberattacks through AI-based Reactive Systems: A Holistic Review and Future Vision ( http://arxiv.org/abs/2312.06229v2 )

ライセンス: Link先を確認

Sergio Bernardez Molina, Pantaleone Nespoli, Félix Gómez Mármol,

(参考訳) 情報技術(IT)の利用が、今日の世界で指数的な成長を遂げていることは否定できない。このデジタルトランスフォーメーションは、サイバー犯罪の領域において、数多くのセキュリティ上の課題を引き起こしている。こうした脅威に応えて、公共部門と民間部門はITセキュリティ対策の強化を優先している。セキュリティ上の懸念が高まる中、人工知能(AI)はサイバーセキュリティの世界で注目を集めている。本稿では,AIによる脅威応答システムの最近の進歩を包括的に調査する。私たちの知る限り、AI反応ドメインに関する最新の調査は2017年に実施された。それ以来、多くの文献が出版されており、レビューする価値がある。最先端の反応系に関する包括的調査では、複数の値を持つ5つの重要な特徴が同定され、異なる研究間の均質な比較が促進された。さらに、記事収集の厳密な方法論を通じて、この分野で最も関係のある22の出版物が選択されている。その後、これらの出版物は、識別された特徴を用いて詳細な分析の対象となり、論文間の重要な関係を明らかにする包括的な概要を生成できるようになった。これらの関係は、文学における潜在的なギャップの同定とともに、論文でさらに詳しく説明され、将来的な貢献を導く可能性がある。これらの潜在的なギャップを指摘し、具体的な提案を通じて可能な開発領域を提案することで、合計7つの研究課題が特定されている。

There is no denying that the use of Information Technology (IT) is undergoing exponential growth in today's world. This digital transformation has also given rise to a multitude of security challenges, notably in the realm of cybercrime. In response to these growing threats, public and private sectors have prioritized the strengthening of IT security measures. In light of the growing security concern, Artificial Intelligence (AI) has gained prominence within the cybersecurity landscape. This paper presents a comprehensive survey of recent advancements in AI-driven threat response systems. To the best of our knowledge, the most recent survey covering the AI reaction domain was conducted in 2017. Since then, considerable literature has been published, and therefore, it is worth reviewing it. In this comprehensive survey of the state of the art reaction systems, five key features with multiple values have been identified, facilitating a homogeneous comparison between the different works. In addition, through a meticulous methodology of article collection, the 22 most relevant publications in the field have been selected. Then each of these publications has been subjected to a detailed analysis using the features identified, which has allowed for the generation of a comprehensive overview revealing significant relationships between the papers. These relationships are further elaborated in the paper, along with the identification of potential gaps in the literature, which may guide future contributions. A total of seven research challenges have been identified, pointing out these potential gaps and suggesting possible areas of development through concrete proposals.

翻訳日:2024-05-31 00:00:32 公開日:2024-05-29

# テキストガイドでリアル世界のイメージをデノイング

Tell Me What You See: Text-Guided Real-World Image Denoising ( http://arxiv.org/abs/2312.10191v2 )

ライセンス: Link先を確認

Erez Yosef, Raja Giryes,

(参考訳) ノイズセンサによる画像再構成は難しい問題である。多くの解決策が提案されているが、主なアプローチは、シーンのノイズの真の統計をモデル化すると共に、優れた自然像を事前に学習することである。非常に低い照明条件下では、そのようなアプローチは通常不十分であり、例えば、複数のキャプチャーを使用するという形で追加情報が必要である。我々は,シーンを撮影する撮影者が容易に行えるように,シーンを事前に記述する代替手段として提案する。画像生成における拡散モデルの成功に触発されて,テキスト誘導拡散モデルを用いて,画像キャプション情報の追加は,合成画像と実画像の両方において,画像の復調と再構成を著しく改善することを示す。

Image reconstruction from noisy sensor measurements is a challenging problem. Many solutions have been proposed for it, where the main approach is learning good natural images prior along with modeling the true statistics of the noise in the scene. In the presence of very low lighting conditions, such approaches are usually not enough, and additional information is required, e.g., in the form of using multiple captures. We suggest as an alternative to add a description of the scene as prior, which can be easily done by the photographer capturing the scene. Inspired by the remarkable success of diffusion models for image generation, using a text-guided diffusion model we show that adding image caption information significantly improves image denoising and reconstruction on both synthetic and real-world images.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# SICsとTriangle Group$(3,3,3)$

SICs and the Triangle Group $(3,3,3)$ ( http://arxiv.org/abs/2312.13400v2 )

ライセンス: Link先を確認

Danylo Yakymenko,

(参考訳) 対称情報完備な正値測度 (SICs for short) がすべての次元に存在するという問題は、ザウナー予想として知られており、今日まで残っている。既知のSICの例のほとんどは、ワイル・ハイゼンベルク群の作用の軌道として構成されている。このような場合、SICは、ワイル・ハイゼンベルク群の自己同型を定義する、いわゆる正準三次ユニタリの下で不変であるようである。ここでは、これらの位数 3 個のユニタリが三角形群 $(3,3,3)$ の射影ユニタリ表現に現れることを示す。このような表現の完全な記述と、正準三次ユニタリの構造に関する結果を得るためにどのように使用できるかを示す。特に、任意の正準位数 3 のユニタリが、次元 $d>3$ が素数であれば、ザウナーのユニタリに共役であるという事実を証明する別の方法を示す。

The problem of existence of symmetric informationally-complete positive operator-valued measures (SICs for short) in every dimension is known as Zauner's conjecture and remains open to this day. Most of the known SIC examples are constructed as an orbit of the Weyl-Heisenberg group action. It appears that in these cases SICs are invariant under the so-called canonical order-three unitaries, which define automorphisms of the Weyl-Heisenberg group. In this note, we show that those order-three unitaries appear in projective unitary representations of the triangle group $(3,3,3)$. We give a full description of such representations and show how it can be used to obtain results about the structure of canonical order-three unitaries. In particular, we present an alternative way of proving the fact that any canonical order-three unitary is conjugate to Zauner's unitary if the dimension $d>3$ is prime.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 平面符号による二項符号の結合

Concatenating Binomial Codes with the Planar Code ( http://arxiv.org/abs/2312.14390v2 )

ライセンス: Link先を確認

Juliette Soule, Andrew C. Doherty, Arne L. Grimsmo,

(参考訳) 回転対称ボソニック符号(英: Rotation symmetric bosonic codes)は、特に超伝導量子ビット実験において、量子ビットの振動度に魅力的な符号化法である。これらのコードはかなりの損失とデファス化を許容するが、大規模なデバイスを実現するためには、より高いレベルのコードと組み合わせる必要がある。耐故障性量子計算のための計測に基づくスキームにおいて,これらの符号と平面符号の整合性について検討する。本稿では,基本レベルエンコーディングとしての二項符号に着目し,様々な種類の測定プロトコルにおいて損失を受ける場合のブレークフェアポイントを推定する。これらの符号は光子損失誤差に耐性があるが、ゲート演算と測定には平均光子数と位相分解能の両方を必要とする。二項符号量子ビットを用いた平面符号の優れた性能を得るために、適応位相測定、最大量子状態推定、重み付き最小重み復号法を実装する必要がある。

Rotation symmetric bosonic codes are an attractive encoding for qubits into oscillator degrees of freedom, particularly in superconducting qubit experiments. While these codes can tolerate considerable loss and dephasing, they will need to be combined with higher level codes to achieve large-scale devices. We investigate concatenating these codes with the planar code in a measurement-based scheme for fault-tolerant quantum computation. We focus on binomial codes as the base level encoding, and estimate break-even points for such encodings under loss for various types of measurement protocol. These codes are more resistant to photon loss errors, but require both higher mean photon numbers and higher phase resolution for gate operations and measurements. We find that it is necessary to implement adaptive phase measurements, maximum likelihood quantum state inference, and weighted minimum weight decoding to obtain good performance for a planar code using binomial code qubits.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 多項集合に対する排他的有限時間相関関数:量子輸送と熱力学の理論的枠組みの連結

Exact finite-time correlation functions for multi-terminal setups: Connecting theoretical frameworks for quantum transport and thermodynamics ( http://arxiv.org/abs/2312.15065v3 )

ライセンス: Link先を確認

Gianmichele Blasi, Shishir Khandelwal, Géraldine Haack,

(参考訳) オープン量子系の輸送は、量子マスター方程式、散乱行列、ハイゼンベルク運動方程式など、様々な理論的な枠組みを通して探索することができる。フレームワークの選択は、相互作用の存在、システムと環境の間の結合強度、定常状態か過渡状態かといった要因に依存する。既存の文献はこれらのフレームワークを独立して扱い、統一された視点を欠いている。我々の研究は、電圧と温度の偏りの下での2段階のセットアップにおいて、最小レベルの量子ドットモデルを用いて、これらのアプローチの役割と現状を明らかにすることで、このギャップに対処する。粒子およびエネルギー電流の解析式と定常状態と過渡状態の両方における変動を導出する。ハイゼンベルク方程式の厳密な結果は、それぞれの妥当性条件の中で散乱行列やマスター方程式のアプローチと一致していることが示される。重要なことは、Heisenberg や散乱行列による弱カップリングにおけるマスター方程式の適用性を任意の結合強度でブリッジし、弱カップリング限界のプロトコルを確立することである。

Transport in open quantum systems can be explored through various theoretical frameworks, including the quantum master equation, scattering matrix, and Heisenberg equation of motion. The choice of framework depends on factors such as the presence of interactions, the coupling strength between the system and environment, and whether the focus is on steady-state or transient regimes. Existing literature treats these frameworks independently, lacking a unified perspective. Our work addresses this gap by clarifying the role and status of these approaches using a minimal single-level quantum dot model in a two-terminal setup under voltage and temperature biases. We derive analytical expressions for particle and energy currents and their fluctuations in both steady-state and transient regimes. Exact results from the Heisenberg equation are shown to align with scattering matrix and master equation approaches within their respective validity regimes. Crucially, we establish a protocol for the weak-coupling limit, bridging the applicability of master equations at weak-coupling with Heisenberg or scattering matrix approaches at arbitrary coupling strength.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# ファジィドライバ生成のためのプロンプトファジィ

Prompt Fuzzing for Fuzz Driver Generation ( http://arxiv.org/abs/2312.17677v2 )

ライセンス: Link先を確認

Yunlong Lyu, Yuxuan Xie, Peng Chen, Hao Chen,

(参考訳) 高品質なファジィドライバを構築するには時間を要するだけでなく、ライブラリの深い理解も必要です。しかし、最先端のファズドライバ自動生成技術は期待に届かなかった。消費者コードから派生したファジィドライバは深い州に到達できるが、カバー範囲は限られている。逆に、解釈ファジィはほとんどのAPI呼び出しを探索できるが、大規模な検索空間内では数多くの試行が必要である。 PromptFuzzは,未知のライブラリコードを探索するために,ファジィドライバを反復的に生成するファジィ処理を行う,カバーガイド付ファジィファジィである。ファジィファジィ処理におけるファジィドライバのAPI使用法を検討するために,命令型プログラム生成,誤プログラム検証,カバレッジ誘導型プロンプト突然変異,制約付きファジィスケジューリングなど,いくつかの重要な手法を提案する。 PromptFuzzを実装し,14の現実世界のライブラリで評価した。 OSS-FuzzとHopper(最先端のファズドライバ生成ツール)と比較して、PromptFuzzが生成したファズドライバはそれぞれOSS-FuzzとHopperのブランチカバレッジの1.61倍と1.63倍を達成した。さらに、PromptFuzzが生成したファズドライバは、合計49回のクラッシュのうち33回の真に新しいバグを検出し、そのうち30回のバグがそれぞれのコミュニティによって確認されている。

Crafting high-quality fuzz drivers not only is time-consuming but also requires a deep understanding of the library. However, the state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep states, they have limited coverage. Conversely, interpretative fuzzing can explore most API calls but requires numerous attempts within a large search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program validation, coverage-guided prompt mutation, and constrained fuzzer scheduling. We implemented PromptFuzz and evaluated it on 14 real-world libraries. Compared with OSS-Fuzz and Hopper (the state-of-the-art fuzz driver generation tool), fuzz drivers generated by PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than those by OSS-Fuzz and Hopper, respectively. Moreover, the fuzz drivers generated by PromptFuzz detected 33 genuine, new bugs out of a total of 49 crashes, out of which 30 bugs have been confirmed by their respective communities.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# タブラルデータの自動モデル選択

Automated Model Selection for Tabular Data ( http://arxiv.org/abs/2401.00961v2 )

ライセンス: Link先を確認

Avinash Amballa, Gayathri Akkinapalli, Manas Madine, Naga Pavana Priya Yarrabolu, Przemyslaw A. Grabowicz,

(参考訳) 表形式のデータセットの形式で構造化されたデータには、異なる、離散的な特徴が含まれており、個々の重要度とターゲットに対する相対的重要性は様々である。 1つ以上の機能の組み合わせは、単純な個々の機能コントリビューションよりも予測的かつ有意義なものです。 Rの混合効果線形モデルライブラリは、モデル設計においてそのようなインタラクティブな機能の組み合わせを提供することができる。しかし、多くの特徴とそこから選択できる相互作用を考えると、モデル選択は指数関数的に難しいタスクとなる。計算コストを小さく保ちながら特徴的相互作用を取り入れた表型データセットの予測のためのモデル選択プロセスを自動化することを目的としている。このフレームワークには、優先順位ベースのランダムグリッド検索とグレディ検索という、2つの異なる機能選択のアプローチが含まれている。優先性に基づくアプローチでは、事前確率を用いて特徴組合せを効率的に探索し、探索を誘導する。 Greedyメソッドは、その影響に基づいて機能を追加したり削除したりすることで、反復的にソリューションを構築します。合成実験は、予測的特徴の組み合わせを効果的に捉える能力を示す。

Structured data in the form of tabular datasets contain features that are distinct and discrete, with varying individual and relative importances to the target. Combinations of one or more features may be more predictive and meaningful than simple individual feature contributions. R's mixed effect linear models library allows users to provide such interactive feature combinations in the model design. However, given many features and possible interactions to select from, model selection becomes an exponentially difficult task. We aim to automate the model selection process for predictions on tabular datasets incorporating feature interactions while keeping computational costs small. The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method. The Priority-based approach efficiently explores feature combinations using prior probabilities to guide the search. The Greedy method builds the solution iteratively by adding or removing features based on their impact. Experiments on synthetic demonstrate the ability to effectively capture predictive feature combinations.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 変動創発予測のためのグラフニューラルネットワークのダイナミクスに基づく特徴増強

Dynamics-based Feature Augmentation of Graph Neural Networks for Variant Emergence Prediction ( http://arxiv.org/abs/2401.03390v2 )

ライセンス: Link先を確認

Majd Al Aawar, Srikar Mutnuri, Mansooreh Montazerin, Ajitesh Srivastava,

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックで、新型ウイルスの出現が大きな要因となっている。 1つ以上の国で新しい変種が出現すると、他の国は、その潜在的な到着に備えてその拡散を監視する。新しい変種の影響と流行のピークのタイミングは、変種がいつ到着するかに大きく依存する。新たな変種の普及を予測するための現在の手法は統計モデルに依存しているが、これらの手法は、新しい変種が既に関心のある領域に到達し、有意な有病率を持つ場合にのみ有効である。既存の変種が特定のリージョンにいつ到着するかを予測できますか? この問題に対処するために,変量力学インフォームドグラフニューラルネット(GNN)アプローチを提案する。まず、大規模な流行モデルに適用可能な地域(国)のペア間での変動有病率のダイナミクスを導出する。このダイナミクスは、GNNに特定の機能を導入する動機となっている。提案した動的インフォームドGNNは,現在普及している物理インフォームドニューラルネットワーク(PINN)のフレームワークを含む,すべてのベースラインより優れていることを示す。そこで本研究では,87か国,36変種を対象に,ユーザ定義モデルの予測性能を評価するベンチマークツールを提案する。

During the COVID-19 pandemic, a major driver of new surges has been the emergence of new variants. When a new variant emerges in one or more countries, other nations monitor its spread in preparation for its potential arrival. The impact of the new variant and the timings of epidemic peaks in a country highly depend on when the variant arrives. The current methods for predicting the spread of new variants rely on statistical modeling, however, these methods work only when the new variant has already arrived in the region of interest and has a significant prevalence. Can we predict when a variant existing elsewhere will arrive in a given region? To address this question, we propose a variant-dynamics-informed Graph Neural Network (GNN) approach. First, we derive the dynamics of variant prevalence across pairs of regions (countries) that apply to a large class of epidemic models. The dynamics motivate the introduction of certain features in the GNN. We demonstrate that our proposed dynamics-informed GNN outperforms all the baselines, including the currently pervasive framework of Physics-Informed Neural Networks (PINNs). To advance research in this area, we introduce a benchmarking tool to assess a user-defined model's prediction performance across 87 countries and 36 variants.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# SynHing: グラフ学習と説明のための合成異種情報ネットワーク生成

SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation ( http://arxiv.org/abs/2401.04133v2 )

ライセンス: Link先を確認

Ming-Yi Hong, Yi-Hsiang Huang, Shao-En Lin, You-Chen Teng, Chih-Yu Wang, Che Lin,

(参考訳) グラフニューラルネットワーク(GNN)は、コミュニティ分析やレコメンデーションシステムなど、さまざまな領域でグラフ構造を記述している。 GNNの解釈がますます重要になるにつれて、堅牢なベースラインと拡張グラフデータセットの需要は、特に異種情報ネットワーク(HIN)の文脈において強調される。そこで我々はSynHingを紹介した。SynHingはグラフ学習と説明の強化を目的としたSynthetic Heterogeneous Information Network Generationの新しいフレームワークである。 SynHingは、ターゲットHINの主要なモチーフを体系的に識別し、クラスタ内およびクラスタ間マージモジュールによるボトムアップ生成プロセスを使用する。この過程は、後処理技術によって補われ、合成HINが元のグラフの構造的および統計的性質を密接に反映することを保証する。重要な点として、SynHingはGNN説明モデルの評価、説明可能な合成HIN生成のための新しい標準の設定、複雑なネットワークにおける解釈可能な機械学習の進歩に寄与する、地道なモチーフを提供する。

Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 拡散モデルに基づく効率的な画像分解ネットワーク

Efficient Image Deblurring Networks based on Diffusion Models ( http://arxiv.org/abs/2401.05907v2 )

ライセンス: Link先を確認

Kang Chen, Yuanjie Liu,

(参考訳) 本稿では,デフォーカスデブロアリングのためのスライディングウインドウモデル,Swintormerについて述べる。この方法は拡散モデルを用いて遅延前の特徴を生成し、より詳細な画像の復元を支援する。さらに、スライドウィンドウ戦略を適用することで、推論効率を高めるために、特殊なTransformerブロックが組み込まれている。この新しいアプローチの採用により、イテレーション毎にMAC(Multiply-Accumulate Operations)が大幅に削減され、メモリ要件が大幅に削減された。現在のGRL法と比較して、Swintormerモデルはメモリ容量に依存する計算負荷を140.35 GMACsから8.02 GMACsに大幅に削減し、デフォーカスのデフォーカスを27.04 dBから27.07 dBに改善した。この革新的な技術は、メモリ制限されたデバイス上での高解像度画像の処理を可能にし、潜在的なアプリケーションシナリオを大幅に広げる。この記事では、それぞれのネットワークモジュールが最終的なパフォーマンスにどのように貢献するかを網羅的に調査するアブレーションスタディをまとめて紹介する。ソースコードとモデルは、以下のWebサイトで利用可能になる。

This article presents a sliding window model for defocus deblurring, named Swintormer, which achieves the best performance to date with remarkably low memory usage. This method utilizes a diffusion model to generate latent prior features, aiding in the restoration of more detailed images. Additionally, by adapting the sliding window strategy, it incorporates specialized Transformer blocks to enhance inference efficiency. The adoption of this new approach has led to a substantial reduction in Multiply-Accumulate Operations (MACs) per iteration, drastically cutting down memory requirements. In comparison to the currently leading GRL method, our Swintormer model significantly reduces the computational load that must depend on memory capacity, from 140.35 GMACs to 8.02 GMACs, while improving the Peak Signal-to-Noise Ratio (PSNR) for defocus deblurring from 27.04 dB to 27.07 dB. This innovative technique enables the processing of higher resolution images on memory-limited devices, vastly broadening potential application scenarios. The article wraps up with an ablation study, offering a comprehensive examination of how each network module contributes to the final performance.The source code and model will be available at the following website: https://github.com/bnm6900030/swintormer.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 大規模言語モデルのためのコードシミュレーションの課題

Code Simulation Challenges for Large Language Models ( http://arxiv.org/abs/2401.09074v3 )

ライセンス: Link先を確認

Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Samuele Marro, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge,

(参考訳) 多くの推論、計画、問題解決タスクは本質的なアルゴリズムの性質を共有している。この研究は、Large Language Models (LLM) がいかにコーディングとアルゴリズムタスクをシミュレートし、そのようなアルゴリズム推論タスクにおける一般的な機能についての洞察を提供するかを研究する。我々は、直線プログラムのベンチマーク、クリティカルパスを含むコード、近似命令および冗長命令を導入する。さらに,アルゴリズムのソートとネストループによるLLMのシミュレーション能力を評価し,ルーチンの計算複雑性がLLMの実行をシミュレートする能力に直接影響を与えることを示す。最も強力なLCMは比較的強力なシミュレーション能力を示すが、このプロセスは脆弱であり、パターン認識に大きく依存しており、記憶の影響を受けている。本稿では,コンパイラの計算パターンを行/追従することによって,LLMにコード実行行をシミュレートするように指示する,既成の計算処理手法であるChain of Simulation(CoSm)を提案する。 CoSmは、シミュレーション性能を改善しながら、LLMの記憶と浅いパターン認識を効率的に行う。コードシミュレーションにおけるCoSmの成功は、他の一般的なシミュレーション推論タスクにインスピレーションを与えるものだと考えている。

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine simulation reasoning tasks.

翻訳日:2024-05-30 23:50:38 公開日:2024-05-29

# 大規模言語モデル時代の進化的計算:サーベイとロードマップ

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap ( http://arxiv.org/abs/2401.10034v3 )

ライセンス: Link先を確認

Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, Kay Chen Tan,

(参考訳) 大規模言語モデル (LLMs) は自然言語処理に革命をもたらしただけでなく、様々な分野にも進出し、人工知能への大きな一歩を踏み出した。 LLMと進化的アルゴリズム(EA)の相互作用は、目的や方法論の違いにもかかわらず、複雑な問題に適用可能性の共通の追求を共有している。一方、EAは、ブラックボックス設定下でのLLMのさらなる拡張のための最適化フレームワークを提供し、柔軟性のあるグローバル検索能力を持つLLMに権限を与えることができる。一方、LLMに固有の豊富なドメイン知識により、EAはよりインテリジェントな検索を行うことができる。さらに、LLMのテキスト処理と生成能力は、幅広いタスクにまたがってEAをデプロイするのに役立ちます。本稿では,これらの相補的優位性に基づいて,相互インスピレーションを LLM 強化 EA と EA 強化 LLM の2つの主要経路に分類する,徹底的なレビューと,先進的なロードマップを提供する。コード生成、ソフトウェア工学、ニューラルアーキテクチャ探索、および様々な生成タスクを含む様々なシナリオにおいて、LLMとEAの相補性を実証するために、いくつかの統合されたシナジー手法が導入された。 LLM時代のEA研究に焦点をあてた最初の総合的なレビューとして、本論文はLLMとEAの協調可能性を理解するための基礎的な足場を提供する。特定された課題と今後の方向性は、研究者や実践者が、最適化と人工知能の進歩を推進し、この革新的なコラボレーションの可能性を最大限に解き放つためのガイダンスを提供する。私たちは、関連する論文をインデックスするGitHubリポジトリを作成しました。

Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can provide an optimization framework for LLM's further enhancement under black-box settings, empowering LLM with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EA to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on the EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence. We have created a GitHub repository to index the relevant papers: https://github.com/wuxingyu-ai/LLM4EC.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 交通流予測のためのハイブリッド時変グラフニューラルネットワーク

A novel hybrid time-varying graph neural network for traffic flow forecasting ( http://arxiv.org/abs/2401.10155v3 )

ライセンス: Link先を確認

Benao Dai, Baolin Ye, Lingxi Li,

(参考訳) これらの課題を克服するために,交通流予測のためのハイブリッド時変グラフニューラルネットワーク(HTVGNN)を提案する。まず、時間変化マスク強化に基づく新しいマルチアテンション機構を報告し、トラフィックネットワーク内の異なるトラフィックノード間の動的時間依存性をより正確にモデル化した。次に,道路ネットワークにおける異なる交通ノード間の静的および動的空間的関連を同時に学習するグラフ学習手法を提案する。一方、時間変化グラフの学習能力を高めるために、各時間ステップで学習したグラフを結合するグラフ学習機構が設計された。最後に,提案手法の有効性を4つの実データを用いて実証した。シミュレーションの結果,HTVGNNは最先端の時空間グラフニューラルネットワークモデルと比較して予測精度が優れていることがわかった。さらに、このアブレーション実験により、結合グラフ学習機構がHTVGNNの長期予測性能を効果的に向上できることを確認した。

In order to overcome these challenges, we have proposed a novel hybrid time-varying graph neural network (HTVGNN) for traffic flow prediction. Firstly, a novel time-aware multi-attention mechanism based on time-varying mask enhancement was reported to more accurately model the dynamic temporal dependencies among distinct traffic nodes in the traffic network. Secondly, we have proposed a novel graph learning strategy to concurrently learn both static and dynamic spatial associations between different traffic nodes in road networks. Meanwhile, in order to enhance the learning ability of time-varying graphs, a coupled graph learning mechanism was designed to couple the graphs learned at each time step. Finally, the effectiveness of the proposed method HTVGNN was demonstrated with four real data sets. Simulation results revealed that HTVGNN achieves superior prediction accuracy compared to the state of the art space-time graph neural network models. Additionally, the ablation experiment verifies that the coupled graph learning mechanism can effectively improve the long-term prediction performance of HTVGNN.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 知的障害者の表情認識における深層学習の有効性の評価

Assessing the Efficacy of Deep Learning Approaches for Facial Expression Recognition in Individuals with Intellectual Disabilities ( http://arxiv.org/abs/2401.11877v2 )

ライセンス: Link先を確認

F. Xavier Gaya-Morey, Silvia Ramis, Jose M. Buades-Rubio, Cristina Manresa-Yee,

(参考訳) 表情認識は、ユーザの感情状態を識別する能力を持つ社会ロボットを付与する手段として重要視されている。社会ロボティクスの使用には、家庭、介護施設、保育所など様々な設定が含まれており、幅広い利用者に利用されている。しかし,知的障害者の表情認識の直接的利用は,本研究ではまだ研究されていない。この目的を達成するために、知的障害を持つ個人がいないデータセットの集合や、そのような個人を特徴とするデータセットを含む、さまざまなアプローチで12の畳み込みニューラルネットワークのセットをトレーニングする。本研究の結果は, 知的障害者, 知的障害者, および, 知的障害者の間で, 表情に有意な差異が認められた。注目すべきことに,本研究では,各利用者の表情を効果的に扱えるように調整したユーザ固有の訓練手法により,この集団内での表情認識の必要性が示された。

Facial expression recognition has gained significance as a means of imparting social robots with the capacity to discern the emotional states of users. The use of social robotics includes a variety of settings, including homes, nursing homes or daycare centers, serving to a wide range of users. Remarkable performance has been achieved by deep learning approaches, however, its direct use for recognizing facial expressions in individuals with intellectual disabilities has not been yet studied in the literature, to the best of our knowledge. To address this objective, we train a set of 12 convolutional neural networks in different approaches, including an ensemble of datasets without individuals with intellectual disabilities and a dataset featuring such individuals. Our examination of the outcomes, both the performance and the important image regions for the models, reveals significant distinctions in facial expressions between individuals with and without intellectual disabilities, as well as among individuals with intellectual disabilities. Remarkably, our findings show the need of facial expression recognition within this population through tailored user-specific training methodologies, which enable the models to effectively address the unique expressions of each user.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 厳格なAI監査にはブラックボックスアクセスが不十分

Black-Box Access is Insufficient for Rigorous AI Audits ( http://arxiv.org/abs/2401.14446v3 )

ライセンス: Link先を確認

Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell,

(参考訳) AIシステムの外部監査は、AIガバナンスの重要なメカニズムとして、ますます認識されている。しかし、監査の有効性は監査人に与えられるアクセスの程度に依存する。最近の最先端のAIシステムの監査は、主にブラックボックスアクセスに依存しており、監査官はシステムに問い合わせて出力を観察することしかできない。しかしながら、システムの内部動作(例えば重量、アクティベーション、勾配)へのホワイトボックスアクセスは、監査人がより強力な攻撃を行い、モデルをより徹底的に解釈し、微調整を行うことを可能にする。一方、トレーニングやデプロイメント情報(方法論、コード、ドキュメンテーション、データ、デプロイメントの詳細、内部評価からの発見など)への外部アクセスは、監査人が開発プロセスを精査し、より対象とする評価を設計できるようにします。本稿では,ブラックボックス監査の限界と,ホワイトボックス監査とアウトサイドボックス監査の利点について検討する。また、これらの監査を最小限のセキュリティリスクで実施するための技術的、物理的、法的保護についても論じる。その結果,(1)監査員が使用するアクセスと手法に関する透明性は,監査結果を適切に解釈するには必要であり,(2)ブラックボックスのみよりも,ホワイトボックスとアウト・ザ・ボックスのアクセスの方がかなり精査できることがわかった。

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 深層学習とオープンアース観測データを用いたグローバル氷河マッピングに向けて

Towards Global Glacier Mapping with Deep Learning and Open Earth Observation Data ( http://arxiv.org/abs/2401.15113v2 )

ライセンス: Link先を確認

Konstantin A. Maslov, Claudio Persello, Thomas Schellenberger, Alfred Stein,

(参考訳) 正確な地球規模の氷河マッピングは、気候変動の影響を理解するために重要である。その重要性にもかかわらず、世界規模での自動氷河マッピングはほとんど未調査のままである。本稿では、このギャップに対処し、畳み込み変換型ディープラーニングモデルであるGlaViTU(GlaViTU)を提案する。空間的, 時間的, クロスセンサーの一般化を評価することで, 従来観測されていなかった画像に対して, 我々の最善策は >0.85 の団結を達成し, 高山アジアなどの破片の多い地域では >0.75 まで低下し, クリーンアイスが支配する地域では >0.90 まで上昇することを示す。面積と距離の偏差の点での人間の専門家の不確実性に対する比較検証は、GlaViTUのパフォーマンス、アプローチ、あるいは専門家レベルのデラインの整合性を強調している。合成開口レーダデータ、すなわち後方散乱と干渉コヒーレンスを追加することで、利用可能なすべての領域の精度が向上する。氷河の度合いの調整された信頼性が報告され、予測はより信頼性が高く解釈可能である。また、世界中の氷河の9%をカバーするベンチマークデータセットもリリースしました。本研究は, 自動多時期・グローバル氷河マッピングへの取り組みを支援する。

Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite imagery. Assessing the spatial, temporal and cross-sensor generalisation shows that our best strategy achieves intersection over union >0.85 on previously unobserved images in most cases, which drops to >0.75 for debris-rich areas such as High-Mountain Asia and increases to >0.90 for regions dominated by clean ice. A comparative validation against human expert uncertainties in terms of area and distance deviations underscores GlaViTU performance, approaching or matching expert-level delineation. Adding synthetic aperture radar data, namely, backscatter and interferometric coherence, increases the accuracy in all regions where available. The calibrated confidence for glacier extents is reported making the predictions more reliable and interpretable. We also release a benchmark dataset that covers 9% of glaciers worldwide. Our results support efforts towards automated multitemporal and global glacier mapping.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# マルコフ連鎖モンテカルロの並列アフィン変換チューニング

Parallel Affine Transformation Tuning of Markov Chain Monte Carlo ( http://arxiv.org/abs/2401.16567v2 )

ライセンス: Link先を確認

Philip Schär, Michael Habeck, Daniel Rudolf,

(参考訳) マルコフ連鎖モンテカルロサンプリング器の性能は、その共分散構造、確率質量の位置、尾の挙動などのターゲット分布の性質に強く依存する。本研究では, サンプル空間の単射アフィン変換を用いて, 対象分布の特性を向上し, 変換された空間内を走行するサンプリング器の性能を向上する。特に,サンプリング中のアフィン変換を適応的に学習するフレキシブルでユーザフレンドリなスキームを提案する。さらに,本手法とギブシアン極スライスサンプリングを組み合わせることで,実世界のデータに基づいて,比較的低い計算コストで高品質なサンプルを作成できることを示す。

The performance of Markov chain Monte Carlo samplers strongly depends on the properties of the target distribution such as its covariance structure, the location of its probability mass and its tail behavior. We explore the use of bijective affine transformations of the sample space to improve the properties of the target distribution and thereby the performance of samplers running in the transformed space. In particular, we propose a flexible and user-friendly scheme for adaptively learning the affine transformation during sampling. Moreover, the combination of our scheme with Gibbsian polar slice sampling is shown to produce samples of high quality at comparatively low computational cost in several settings based on real-world data.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 位置:ベイジアンディープラーニングは大規模AIの時代に必要である

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI ( http://arxiv.org/abs/2402.00809v3 )

ライセンス: Link先を確認

Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang,

(参考訳) ディープラーニング研究の現在の状況では、大規模な画像と言語データセットを含む教師付きタスクにおいて、高い予測精度を達成することに重点が置かれている。しかし、より広い視点から見れば、不確実性、活動的かつ継続的な学習、科学的なデータなど、見落とされがちなメトリクス、タスク、データタイプが、注意を喚起する。 Bayesian Deep Learning(BDL)は,これらのさまざまな設定にまたがってメリットを提供する,有望な道の1つである。本稿では,BDLが深層学習の能力を高めることができることを示唆する。 BDLの強みを再考し、既存の課題を認識し、これらの障害に対処するためのエキサイティングな研究方法を強調します。今後の議論は、大規模ファンデーションモデルをBDLと組み合わせて、その潜在能力を最大限に活用する方法に焦点を当てている。

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# スパイクニューラルネットワークによる効率的な時系列予測

Efficient and Effective Time-Series Forecasting with Spiking Neural Networks ( http://arxiv.org/abs/2402.01533v2 )

ライセンス: Link先を確認

Changze Lv, Yansen Wang, Dongqi Han, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li,

(参考訳) 生物学的ニューロンのスパイク行動にインスパイアされたスパイキングニューラルネットワーク(SNN)は、時間的データの複雑さを捉えるためのユニークな経路を提供する。しかし,SNNを時系列予測に適用することは,効率的な時間的アライメントの難しさ,符号化プロセスの複雑さ,モデル選択のための標準ガイドラインの欠如などにより困難である。本稿では,時間情報処理におけるスパイクニューロンの効率を活かした時系列予測タスクにおけるSNNの枠組みを提案する。一連の実験を通して,提案手法が従来の時系列予測手法に匹敵する,あるいは優れた結果をもたらすことを示す。さらに,SNNの時系列データにおける時間的依存性を捉える能力を評価するための詳細な解析実験を行い,時間的データの複雑なダイナミクスをモデル化する上で,そのニュアンスな強度と有効性について貴重な知見を提供する。本研究は, SNNの普及に寄与し, 時系列予測タスクの代替として, より生物学的にインスパイアされ, 時間的に意識された予測モデルを開発するための経路を提供する。私たちのコードはhttps://github.com/microsoft/SeqSNNで公開されています。

Spiking neural networks (SNNs), inspired by the spiking behavior of biological neurons, provide a unique pathway for capturing the intricacies of temporal data. However, applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. In this paper, we propose a framework for SNNs in time-series forecasting tasks, leveraging the efficiency of spiking neurons in processing temporal information. Through a series of experiments, we demonstrate that our proposed SNN-based approaches achieve comparable or superior results to traditional time-series forecasting methods on diverse benchmarks with much less energy consumption. Furthermore, we conduct detailed analysis experiments to assess the SNN's capacity to capture temporal dependencies within time-series data, offering valuable insights into its nuanced strengths and effectiveness in modeling the intricate dynamics of temporal data. Our study contributes to the expanding field of SNNs and offers a promising alternative for time-series forecasting tasks, presenting a pathway for the development of more biologically inspired and temporally aware forecasting models. Our code is available at https://github.com/microsoft/SeqSNN.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 大規模言語モデルのためのガードレールの構築

Building Guardrails for Large Language Models ( http://arxiv.org/abs/2402.01822v2 )

ライセンス: Link先を確認

Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang,

(参考訳) 大規模言語モデル(LLM)が私たちの日常生活に統合されるにつれて、リスクの特定と緩和が不可欠です。 LLMの入力や出力をフィルタリングするガードレールは、コアセーフガード技術として登場した。このポジションペーパーでは、現在のオープンソースソリューション(Llama Guard, Nvidia NeMo, Guardrails AI)を詳しく調べ、より完全なソリューションを構築するための課題と道筋について論じる。従来の研究から強固な証拠を引用し,様々なLLMアプリケーションにおける多様な文脈の包括的考察に基づいて,LLMのガードレール構築のための体系的アプローチを提唱する。我々は,複数の専門分野のチームと共同で,正確な技術的要件の特定,要求の複雑さを受け入れるための高度なニューラルシンボリック実装の探索,最終製品の品質を保証するための検証とテストの開発などを通じて,社会工学的手法を採用することを提案する。

As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# グラフニューラルネットワークのための分布内プロキシグラフの生成

Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks ( http://arxiv.org/abs/2402.02036v2 )

ライセンス: Link先を確認

Zhuomin Chen, Jiaxing Zhang, Jingchao Ni, Xiaoting Li, Yuchen Bian, Md Mezbahul Islam, Ananda Mohan Mondal, Hua Wei, Dongsheng Luo,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフデータ処理においてビルディングブロックとなり、重要な領域で広く応用されている。高度なアプリケーションにGNNをデプロイする必要性の高まりは、意思決定プロセスにおけるユーザ説明可能性を必要としている。 GNNの説明可能性のための一般的なパラダイムは、ラベルを元のグラフと比較することで説明可能な部分グラフを特定することである。この課題は、トレーニングセットの元々のグラフから説明可能なサブグラフのセットへの相当な分布シフトにより、ラベルの正確な予測ができないため、困難である。そこで本研究では,学習データの分布を示す説明可能な部分グラフのプロキシグラフを生成する手法を提案する。本稿では,グラフ生成器を用いてプロキシグラフを生成するパラメトリック手法を提案する。情報理論に基づく新たなトレーニング目的は、プロキシグラフがトレーニングデータの分布に従属するだけでなく、説明的要因も保持するように設計されている。このような生成されたプロキシグラフは、説明可能な部分グラフのラベルの予測を確実に近似するために使用することができる。提案手法は, GNNのより正確な説明が可能であることを示す。

Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate our method achieves more accurate explanations for GNNs.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 限界を超えて:大規模言語モデルにおける文脈長を拡張する手法の調査

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models ( http://arxiv.org/abs/2402.02244v3 )

ライセンス: Link先を確認

Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi,

(参考訳) 近年,大規模言語モデル (LLM) は,文脈理解,論理的推論への関与,応答の生成など,顕著な能力を示している。しかし、これは厳密な計算とメモリの要求を犠牲にして達成され、長い入力シーケンスを効果的にサポートする能力を妨げる。本調査は,LLMのシーケンス長を延長するために考案された最近の手法と手法を包括的にレビューし,長文理解の能力を高めるものである。特に、計算要求の比例的な増加を回避しつつ、より長いシーケンスの処理を強化するために設計された、修正された位置符号化や変更された注意機構などのアーキテクチャ変更を含む幅広い手法をレビューし、分類する。本研究で検討した多種多様な手法は, LLMの異なる位相,すなわちトレーニング, 微調整, 推論に利用することができる。これにより、LLMは拡張シーケンスを効率的に処理できる。今後の研究の方向性を示唆する上で,LLMの継続的な進歩におけるシーケンス長の重要性を浮き彫りにした上で,現行の方法論の限界について論じる。

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies is discussed in the last section along with the suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer ( http://arxiv.org/abs/2402.02464v3 )

ライセンス: Link先を確認

Zhangyang Gao, Daize Dong, Cheng Tan, Jun Xia, Bozhen Hu, Stan Z. Li,

(参考訳) 非ユークリッドグラフを純粋言語やユークリッドベクトルとしてモデル化することは可能か。非ユークリッド性はグラフモデリングにおいて長期にわたる課題を提起している。最近のグラフニューラルネットワークとグラフ変換器はユークリッドベクトルとしてグラフを符号化しようとするが、元のグラフをベクトルから復元することは依然として困難である。本稿では,非ユークリッドグラフをユークリッド空間の学習可能なグラフワードに変換するGraph2Seqエンコーダと,グラフワードから元のグラフを再構成して情報等価性を確保するGraphGPTデコーダを紹介する。 1) 事前学習したGraph2Seqはグラフ表現学習に優れ、8/9ドルのグラフ分類と回帰タスクで最先端の結果が得られる。 2) 事前訓練したグラフGPTは強力なグラフ生成器として機能し, 少数ショットグラフ生成と条件グラフ生成の両方を実行する強力な能力によって実証された。 (3) Graph2Seq+GraphGPT は、既知の非ユークリッド問題を克服し、ユークリッド空間におけるグラフの効果的な混合を可能にする。 (4)エッジ中心の事前学習フレームワークであるGraphsGPTは、グラフドメインタスクにおいて、表現と生成の両方において優れた効果を示す。コードは \href{https://github.com/A4Bio/GraphsGPT}{GitHub} で公開されている。

Can we model Non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The Non-Euclidean property have posed a long term challenge in graph modeling. Despite recent graph neural networks and graph transformers efforts encoding graphs as Euclidean vectors, recovering the original graph from vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring an Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and yield some interesting findings: (1) The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on $8/9$ graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, demonstrated by its strong ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation. Code is available at \href{https://github.com/A4Bio/GraphsGPT}{GitHub}.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 難解なギブズサンプリング

Diffusive Gibbs Sampling ( http://arxiv.org/abs/2402.03008v5 )

ライセンス: Link先を確認

Wenlin Chen, Mingtian Zhang, Brooks Paige, José Miguel Hernández-Lobato, David Barber,

(参考訳) 従来のマルコフ・チェイン・モンテカルロ法(MCMC)のマルチモーダル分布に対する不適切な混合は、ベイズ推論や分子動力学のような実践的応用において重要な課題である。そこで本稿では,ディフューシブギブズサンプリング(Diffusive Gibbs Sampling, DiGS)を提案する。 DiGSは拡散モデルにおける最近の発展を統合し、ガウスの畳み込みを利用して元の空間の孤立モードをブリッジする補助ノイズ分布を作成し、ギブスサンプリングを用いて両方の空間からサンプルを交互に描画する。新規なメトロポリス・ウィスティン・ギブス法は, サンプリング工程における混合性を高めるために提案されている。 DiGSは、並列テンパリングのような最先端の手法よりも、マルチモーダル分布をサンプリングするためのより優れた混合特性を示し、ガウス、ベイズニューラルネットワーク、分子動力学の混合を含む様々なタスクにおける性能を大幅に改善した。

The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# IMUSE:IMUベースの表情キャプチャ

IMUSE: IMU-based Facial Expression Capture ( http://arxiv.org/abs/2402.03944v2 )

ライセンス: Link先を確認

Youjia Wang, Yiwen Wu, Hengan Zhou, Hongyang Lin, Xingyue Peng, Yingwenqi Jiang, Yingsheng Zhu, Guanpeng Long, Yatu Zhang, Jingya Wang, Lan Xu, Jingyi Yu,

(参考訳) 顔の動きのキャプチャーと分析では、支配的なソリューションは一般的に、プライバシーを保護できず、閉塞に対して脆弱な視覚的手がかりに基づいている。慣性測定ユニット (IMU) は救難の可能性を秘めているが、主にフルボディのモーションキャプチャーに採用されている。本稿では,このギャップを埋めるためにIMUSICを提案する。これは純粋IMU信号を用いた表情キャプチャの新しい経路であり,従来の視覚的解とはかなり離れているため,IMUSICのキーデザインは三部作である。我々はまず、解剖学的に駆動されるIMU配置スキームを伴って、顔の撮影に適したマイクロIMUを設計する。そして、多様な表情とパフォーマンスのために、リッチなIMU/視覚信号を提供する新しいIMU-ARKitデータセットをコントリビュートする。このようなユニークなマルチモダリティは、IMUベースの顔行動分析のような将来の方向性に大きな可能性をもたらします。さらに、IMU-ARKitを用いて、純IMU信号から顔のブレンドシェープパラメータを正確に予測する強力なベースライン手法を提案する。具体的には、この新たなトラッキングタスクのための2段階のトレーニング戦略を備えたTransformer拡散モデルを調整する。 IMUSICフレームワークは,視覚的手法が乱れ,同時にユーザのプライバシを保護するシナリオにおいて,正確な顔認証を行うことができる。 IMUSICアプローチの有効性を検証するため,IMU構成と技術コンポーネントの両方について広範な実験を行った。特に、IMUSICは、プライバシー保護の顔キャプチャー、隠蔽に対するハイブリッドキャプチャー、視覚的手がかりによってしばしば見えない微小な顔の動きの検出など、様々な可能性と斬新な応用を可能にしている。私たちは、私たちのコミュニティで顔のキャプチャと分析の可能性をさらに強化するために、データセットと実装をリリースします。

For facial motion capture and analysis, the dominated solutions are generally based on visual cues, which cannot protect privacy and are vulnerable to occlusions. Inertial measurement units (IMUs) serve as potential rescues yet are mainly adopted for full-body motion capture. In this paper, we propose IMUSIC to fill the gap, a novel path for facial expression capture using purely IMU signals, significantly distant from previous visual solutions.The key design in our IMUSIC is a trilogy. We first design micro-IMUs to suit facial capture, companion with an anatomy-driven IMU placement scheme. Then, we contribute a novel IMU-ARKit dataset, which provides rich paired IMU/visual signals for diverse facial expressions and performances. Such unique multi-modality brings huge potential for future directions like IMU-based facial behavior analysis. Moreover, utilizing IMU-ARKit, we introduce a strong baseline approach to accurately predict facial blendshape parameters from purely IMU signals. Specifically, we tailor a Transformer diffusion model with a two-stage training strategy for this novel tracking task. The IMUSIC framework empowers us to perform accurate facial capture in scenarios where visual methods falter and simultaneously safeguard user privacy. We conduct extensive experiments about both the IMU configuration and technical components to validate the effectiveness of our IMUSIC approach. Notably, IMUSIC enables various potential and novel applications, i.e., privacy-protecting facial capture, hybrid capture against occlusions, or detecting minute facial movements that are often invisible through visual cues. We will release our dataset and implementations to enrich more possibilities of facial capture and analysis in our community.

翻訳日:2024-05-30 23:40:54 公開日:2024-05-29

# 学習アルゴリズムによるより柔軟なPAC-Bayesianメタラーニング

More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms ( http://arxiv.org/abs/2402.04054v2 )

ライセンス: Link先を確認

Hossein Zakerinia, Amin Behjati, Christoph H. Lampert,

(参考訳) PAC-Bayesian理論を用いたメタラーニング手法の研究のための新しいフレームワークを提案する。これまでの作業に比べて大きな利点は、タスク間の知識の伝達を実現する方法において、柔軟性を高めることである。従来のアプローチでは、モデル上の事前分布を学習することで、間接的にしか実現できなかった。対照的に、新しい一般化は、メタ学習のプロセスが将来のタスクに使用されるべき学習アルゴリズムを学習するよりも、はるかに直接的に表現できることを証明している。フレームワークの柔軟性は、幅広いメタ学習メカニズムを分析したり、新しいメカニズムを設計したりするのに適しています。理論的貢献以外は、我々のフレームワークが実用的なメタ学習メカニズムの予測品質を改善することを実証的に示しています。

We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory. Its main advantage over previous work is that it allows for more flexibility in how the transfer of knowledge between tasks is realized. For previous approaches, this could only happen indirectly, by means of learning prior distributions over models. In contrast, the new generalization bounds that we prove express the process of meta-learning much more directly as learning the learning algorithm that should be used for future tasks. The flexibility of our framework makes it suitable to analyze a wide range of meta-learning mechanisms and even design new mechanisms. Other than our theoretical contributions we also show empirically that our framework improves the prediction quality in practical meta-learning mechanisms.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# モダリティギャップ全体での検索によるマルチモーダル非教師付きドメイン一般化

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap ( http://arxiv.org/abs/2402.04416v2 )

ライセンス: Link先を確認

Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis,

(参考訳) ドメイン一般化 (Domain Generalization, DG) は、共有ラベル空間の仮定の下で、1つ以上のソースドメインを利用するテストドメインを見えないように一般化するモデルを学習する重要な問題である。しかし、ほとんどのDG手法は、ターゲットのラベル空間における豊富なソースデータへのアクセスを前提としている。この設定では、微調整中にタスク非依存の未ラベルソースデータセットを使用する、unsupervised domain generalization (MUDG) のマルチモーダルバージョンに取り組む。私たちのフレームワークは、ソースデータセットとターゲットタスクの関係を明示的に想定していません。代わりに、ソースデータセットを、共同ビジョン言語空間で正確かつ効率的に検索できるという前提にのみ依存する。 MUDG設定で3つのコントリビューションを行います。まず,テキストクエリと粗い量子化に使用される画像セントロイドとの間の距離が大きいため,近接した近接探索が低リコールに悩まされることを理論的に示す。そこで我々は,画像空間の代わりにクエリ空間にセントロイドを格納することで,近傍のリコールを改善する単純なクラスタリングアルゴリズムであるペアk-meansを提案する。第2に、ゼロショット精度を改善し、検索した画像データを多様化するために、ターゲットラベルに対する適応的なテキスト拡張方式を提案する。最後に、下流目標精度をさらに向上させるため、2つの単純だが効果的なコンポーネントを提示する。我々は、それぞれのベンチマークで、最先端の名前のみの転送、ソースフリーDG、ゼロショット(ZS)の手法と比較し、20種類のデータセットで一貫した精度の向上を示す。コードは:https://github.com/Chris210634/mudg

Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# グラフ上の確率測定のための一般ソボレフ輸送

Generalized Sobolev Transport for Probability Measures on a Graph ( http://arxiv.org/abs/2402.04516v2 )

ライセンス: Link先を確認

Tam Le, Truyen Nguyen, Kenji Fukumizu,

(参考訳) グラフ距離空間上での測度に対する最適輸送(OT)問題について検討する。最近、Le et al (2022) はグラフ構造を利用し、高速な計算のために閉形式表現を生成する OT の変種、すなわち Sobolev transport (ST) を提案する。しかし、ST は定義の中の $L^p$ の幾何構造と本質的に結合しているため、他の先行構造に対して ST を利用するのは自明ではない。対照的に、古典的なOTは、基礎となるコスト関数を変更することによって、様々な幾何学構造に適応する柔軟性を持つ。重要な例はOrlicz-Wasserstein (OW) であり、これは \emph{Orlicz 幾何構造を利用して$L^p$構造を超えて動く。標準的な$p$オーダーのWassersteinと比較して、OWは、特定の機械学習アプローチを著しく前進させるのに役立ちます。それでもOWは、2レベル最適化の定式化により、計算に新たな課題を提起している。本研究では,Orlicz構造に対する特定の凸関数のクラスを利用して,一般化ソボレフ輸送(GST)を提案する。 GSTはSTを特別な場合として包含し、$L^p$幾何を超える事前構造に利用できる。 OWに関して、OWの複雑な2段階最適化問題とは異なり、GSTを計算するために単変量最適化問題を単に解くだけでよいことを示す。 GSTはOWよりも数桁高速であることを示す。さらに、文書分類におけるGSTの利点と、トポロジカルデータ解析におけるいくつかの課題について、予備的な証拠を提供する。

We study the optimal transport (OT) problem for measures supported on a graph metric space. Recently, Le et al. (2022) leverage the graph structure and propose a variant of OT, namely Sobolev transport (ST), which yields a closed-form expression for a fast computation. However, ST is essentially coupled with the $L^p$ geometric structure within its definition which makes it nontrivial to utilize ST for other prior structures. In contrast, the classic OT has the flexibility to adapt to various geometric structures by modifying the underlying cost function. An important instance is the Orlicz-Wasserstein (OW) which moves beyond the $L^p$ structure by leveraging the \emph{Orlicz geometric structure}. Comparing to the usage of standard $p$-order Wasserstein, OW remarkably helps to advance certain machine learning approaches. Nevertheless, OW brings up a new challenge on its computation due to its two-level optimization formulation. In this work, we leverage a specific class of convex functions for Orlicz structure to propose the generalized Sobolev transport (GST). GST encompasses the ST as its special case, and can be utilized for prior structures beyond the $L^p$ geometry. In connection with the OW, we show that one only needs to simply solve a univariate optimization problem to compute the GST, unlike the complex two-level optimization problem in OW. We empirically illustrate that GST is several-order faster than the OW. Moreover, we provide preliminary evidences on the advantages of GST for document classification and for several tasks in topological data analysis.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# ラベルシフトロバストテスト時間適応のためのチャネル選択正規化

Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation ( http://arxiv.org/abs/2402.04958v2 )

ライセンス: Link先を確認

Pedro Vianna, Muawiz Chaudhary, Paria Mehrbod, An Tang, Guy Cloutier, Guy Wolf, Michael Eickenberg, Eugene Belilovsky,

(参考訳) ディープニューラルネットワークは多くの異なるタスクに有用な応用があるが、その性能はデータ分散の変化によって大きく影響を受ける可能性がある。例えば、バイオメディカル分野では、トレーニングとテストデータセット間のデータ(異なるマシン、人口)の変化によってパフォーマンスが影響を受ける可能性がある。実世界のシナリオに対するロバストさと一般化を保証するため、最近、推論中に新しいデータ分布にモデルを調整するためのアプローチとしてテスト時間適応法が研究されている。テスト時のバッチ正規化は、ドメインシフトベンチマークで魅力的なパフォーマンスを達成した、シンプルで一般的な方法である。テストバッチのバッチ正規化統計を再計算することで実装される。これまでの研究は、トレーニングデータと同じラベル分布を持つテストデータによる分析に重点を置いてきた。しかし、多くの実用的な応用において、この手法はラベルの分布シフトに弱いため、時には破滅的な失敗を引き起こすことがある。これにより、デプロイにテスト時間適応手法を適用するリスクが生じる。本稿では、ディープネットワークにおけるチャネルのみを選択的に適応させ、ラベルシフトに敏感な劇的な適応を最小化することで、この問題に対処することを提案する。 1) 後続のネットワーク層はラベルシフトに敏感であり,(2) 個々の特徴は特定のクラスに敏感である。提案手法をCIFAR10-C, Imagenet-C, 脂肪肝診断の3つの分類課題に適用し, 共変量およびラベル分布の変化について検討した。提案手法は,TTAの利点を生かしつつ,他の手法に共通する障害のリスクを大幅に低減するとともに,ハイパーパラメータの選択に頑健であることを示す。

Deep neural networks have useful applications in many different tasks, however their performance can be severely affected by changes in the data distribution. For example, in the biomedical field, their performance can be affected by changes in the data (different machines, populations) between training and test datasets. To ensure robustness and generalization to real-world scenarios, test-time adaptation has been recently studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that achieved compelling performance on domain shift benchmarks. It is implemented by recalculating batch normalization statistics on test batches. Prior work has focused on analysis with test data that has the same label distribution as the training data. However, in many practical applications this technique is vulnerable to label distribution shifts, sometimes producing catastrophic failure. This presents a risk in applying test time adaptation methods in deployment. We propose to tackle this challenge by only selectively adapting channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. Our selection scheme is based on two principles that we empirically motivate: (1) later layers of networks are more sensitive to label shift (2) individual features can be sensitive to specific classes. We apply the proposed technique to three classification tasks, including CIFAR10-C, Imagenet-C, and diagnosis of fatty liver, where we explore both covariate and label distribution shifts. We find that our method allows to bring the benefits of TTA while significantly reducing the risk of failure common in other methods, while being robust to choice in hyperparameters.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# 原理的優先度ベイズ最適化

Principled Preferential Bayesian Optimization ( http://arxiv.org/abs/2402.05367v2 )

ライセンス: Link先を確認

Wenjie Xu, Wenbin Wang, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones,

(参考訳) 優先ベイズ最適化 (BO) の問題について検討し, 2つの候補解に対してのみ好みのフィードバックでブラックボックス関数を最適化することを目的とする。確率比のアイデアに触発されて、選好フィードバックのみを用いてブラックボックス関数の信頼度セットを構築する。この問題を解くために,効率的な計算手法を用いた楽観的アルゴリズムを開発した。この境界により、予測された最良の解を、保証された収束率で報告するスキームを設計することができる。ガウス過程, 標準試験関数, 熱的快適性最適化問題のサンプル実験結果から, 提案手法は, 既往の最先端ヒューリスティックよりも安定に, あるいは競争的な性能を達成できることが示唆された。

We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the total cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# 言語エージェント強化のためのエントロピー規則化トークンレベルポリシー最適化

Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement ( http://arxiv.org/abs/2402.06700v3 )

ライセンス: Link先を確認

Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen,

(参考訳) 大規模言語モデル(LLM)は、対話的な意思決定タスクにおいてインテリジェントなエージェントとして期待されている。伝統的なアプローチは、しばしば厳密に設計されたプロンプト、高品質な例、文脈内学習、教師付き微調整(RLHF)のための追加の報酬モデルに依存する。強化学習(Reinforcement Learning, RL)は、タスク固有の環境に直接関与することで、これらの依存関係を克服するLLMの動的代替手段を提供する。それでも、大きなハードルに直面している。 1) 探索を必要とする指数的に広大な活動空間から生じる不安定性 2)行動レベルの報酬信号に基づいてトークン単位のクレジットを割り当てることの課題は,報酬の最大化とコーパスデータの正確なモデル化の相違をもたらす。これらの課題に対応するために,トークンレベルでLLMを最適化するためのエントロピー拡張RL法であるEntropy-Regularized Token-level Policy Optimization (ETPO)を導入する。 ETPOの中心となるのは、RLプロセスと言語モデリングの原則を調和させるように設計された、新しいソフトなベルマンアップデートです。この手法は、Q関数の更新を粗いアクションレベルの視点からより粒度の細かいトークンレベルの視点へ分解し、最適化整合性の理論的証明に裏付ける。重要なことに、この分解は行動探索において線形時間の複雑さを生じさせる。我々は,データサイエンスコード生成を多段階対話タスクのシリーズとしてモデル化するシミュレーション環境におけるETPOの有効性を評価する。トークンレベルの分解とPPO法の適用の動機について、より詳細な予備研究については、arXiv:2405.15821を参照してください。

Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging directly with task-specific environments. Nonetheless, it faces significant hurdles: 1) instability stemming from the exponentially vast action space requiring exploration; 2) challenges in assigning token-level credit based on action-level reward signals, resulting in discord between maximizing rewards and accurately modeling corpus data. In response to these challenges, we introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. At the heart of ETPO is our novel per-token soft Bellman update, designed to harmonize the RL process with the principles of language modeling. This methodology decomposes the Q-function update from a coarse action-level view to a more granular token-level perspective, backed by theoretical proof of optimization consistency. Crucially, this decomposition renders linear time complexity in action exploration. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks; results underline ETPO's potential as a robust method for refining the interactive decision-making capabilities of language agents. For a more detailed preliminary work describing our motivation for token-level decomposition and applying it in PPO methods, please refer to arXiv:2405.15821.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# 力学系における実験設計のためのネスティング粒子フィルタ

Nesting Particle Filters for Experimental Design in Dynamical Systems ( http://arxiv.org/abs/2402.07868v4 )

ライセンス: Link先を確認

Sahel Iqbal, Adrien Corenflos, Simo Särkkä, Hany Abdulsamad,

(参考訳) 本稿では,リスクに敏感な政策最適化として定式化した非交換可能データに対するベイズ実験設計手法を提案する。 Inside-Out SMC$^2$ algorithm, a nested sequential Monte Carlo technique to inferimal design, and embed it into a Particle Markov chain Monte Carlo framework to perform gradient-based policy amortization。提案手法は, コントラスト推定器に頼らないため, 他のアモータイズされた実験設計手法と異なる。一連の力学系の数値検証は,他の最先端戦略と比較して,本手法の有効性を示す。

In this paper, we propose a novel approach to Bayesian experimental design for non-exchangeable data that formulates it as risk-sensitive policy optimization. We develop the Inside-Out SMC$^2$ algorithm, a nested sequential Monte Carlo technique to infer optimal designs, and embed it into a particle Markov chain Monte Carlo framework to perform gradient-based policy amortization. Our approach is distinct from other amortized experimental design techniques, as it does not rely on contrastive estimators. Numerical validation on a set of dynamical systems showcases the efficacy of our method in comparison to other state-of-the-art strategies.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# 提案的満足度のための少ないデータからより良い表現を学習する

Learning Better Representations From Less Data For Propositional Satisfiability ( http://arxiv.org/abs/2402.08365v2 )

ライセンス: Link先を確認

Mohamed Ghanem, Frederik Schmitt, Julian Siber, Bernd Finkbeiner,

(参考訳) NP完全問題に対するニューラルネットワークのトレーニングは通常、非常に大量のトレーニングデータを必要とし、しばしば出力の正確性を保証するために計算に高価なシンボル検証器と結合する必要がある。本稿では,NP完全問題であるNeuResについて述べる。証明書駆動のトレーニングとエキスパートのイテレーションを組み合わせることで、私たちのモデルは、分類のみのためにトレーニングされたモデルよりも優れた表現を学びます。 NeuRes は証明システムとして命題分解を使い、満足できない証明を生成し、真理の割り当てを満足させる過程を加速し、両方の可能性を並列に探索する。そこで本研究では,新しい節を導出するために,動的公式の埋め込みから句のペアを自動回帰的に選択するアテンションベースアーキテクチャを提案する。さらに、我々は、モデル生成証明が、より長い教師の証明を、新たな基礎的真実として徐々に置き換える専門家の反復を採用する。これにより、先進的な解法によって生成される証明のデータセットを、余分なガイダンスなしでトレーニング後に約32%削減できる。このことは、NeuResがその自己改善ワークフローのために教師アルゴリズムの最適性によって制限されないことを示している。このモデルでは,NuroSATよりも,正しく分類された例と証明された例の両方において,はるかに優れた性能が得られることを示す。

Training neural networks on NP-complete problems typically demands very large amounts of training data and often needs to be coupled with computationally expensive symbolic verifiers to ensure output correctness. In this paper, we present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability, being the quintessential NP-complete problem. By combining certificate-driven training and expert iteration, our model learns better representations than models trained for classification only, with a much higher data efficiency -- requiring orders of magnitude less training data. NeuRes employs propositional resolution as a proof system to generate proofs of unsatisfiability and to accelerate the process of finding satisfying truth assignments, exploring both possibilities in parallel. To realize this, we propose an attention-based architecture that autoregressively selects pairs of clauses from a dynamic formula embedding to derive new clauses. Furthermore, we employ expert iteration whereby model-generated proofs progressively replace longer teacher proofs as the new ground truth. This enables our model to reduce a dataset of proofs generated by an advanced solver by ~32% after training on it with no extra guidance. This shows that NeuRes is not limited by the optimality of the teacher algorithm owing to its self-improving workflow. We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# マルコフ決定過程による遷移制約ベイズ最適化

Transition Constrained Bayesian Optimization via Markov Decision Processes ( http://arxiv.org/abs/2402.08406v2 )

ライセンス: Link先を確認

Jose Pablo Folch, Calvin Tsay, Robert M Lee, Behrang Shafei, Weronika Ormaniec, Andreas Krause, Mark van der Wilk, Ruth Misener, Mojmír Mutný,

(参考訳) ベイズ最適化はブラックボックス関数を最適化する手法である。従来は、検索スペースを任意にクエリできる設定に重点を置いていた。しかし、現実の多くの問題は、この柔軟性を提供していない。特に、次のクエリの検索空間は、以前のものに依存しているかもしれない。物理科学において、局所的な運動の制約、特定の変数の単調性、測定の正確性に影響を与える遷移といった形で生じる。いずれにせよ、そのような移行の制約は計画の形式を必要とする。この研究はマルコフ決定過程の枠組みを通じて古典的ベイズ最適化を拡張した。我々は,地平線全体に向けて計画する政策を得るため,強化学習を用いて実用機能の抽出可能な線形化を反復的に解決する。これは、政策空間における取得関数の最適化と平行である。結果として得られる政策は歴史に依存し、マルコフ的でない可能性がある。本稿では, 化学反応器最適化, 情報経路計画, 機械校正, その他の合成例の応用例を紹介する。

Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends classical Bayesian optimization via the framework of Markov Decision Processes. We iteratively solve a tractable linearization of our utility function using reinforcement learning to obtain a policy that plans ahead for the entire horizon. This is a parallel to the optimization of an acquisition function in policy space. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, machine calibration, and other synthetic examples.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# SLEB: 冗長性検証によるLLMのストリーム化と変圧器ブロックの除去

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ( http://arxiv.org/abs/2402.09025v2 )

ライセンス: Link先を確認

Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim,

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクにおいて非常に効果的であることが証明されている。しかし、それらの多数のパラメータは、実践的なデプロイに重大な課題を生じさせる。 LLMのサイズと複雑さを減らすことを目的とした技術であるPruningは、ネットワークから冗長なコンポーネントを取り除くことで潜在的なソリューションを提供する。プルーニングの約束にもかかわらず、既存の手法は、かなりエンドツーエンドのLSM推論スピードアップを達成するのに苦労することが多い。本稿では、冗長なトランスブロックを排除し、LCMを合理化するための新しいアプローチであるSLEBを紹介する。 LLMは隣接するブロックの出力間に高い類似性を有するブロックレベルの冗長性を示すため、我々は変圧器ブロックをプルーニングの基本単位として選択する。この選択により、LLMの処理速度を効果的に向上できる。実験結果から,SLEBはLLM推論を高速化し,高いパープレキシティと精度を維持しつつ,従来のLLMプルーニング法よりも優れており,SLEBはLLMの効率を高めるための有望な技術であることが示された。コードは、https://github.com/jiwonsong-dev/SLEB.comで入手できる。

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# 連続多変量分布の普遍生成モデルとしてのパラメータ化量子回路

Parameterized quantum circuits as universal generative models for continuous multivariate distributions ( http://arxiv.org/abs/2402.09848v2 )

ライセンス: Link先を確認

Alice Barthe, Michele Grossi, Sofia Vallecorsa, Jordi Tura, Vedran Dunjko,

(参考訳) パラメータ化量子回路は、回帰、分類、生成タスクにおける機械学習モデルの基盤として広く使われている。教師付き学習では、その表現性は徹底的に研究され、いくつかの普遍性特性が証明されている。しかし、量子生成モデリングの場合、特に連続変数上の分布をモデル化するタスクでは、ほとんど知られていない。本研究では,サンプルモデルを用いた予測値の抽出を行う。このようなモデルは、古典的なランダムデータがアップロードされた量子回路から、固定可観測物のセットの期待値を出力する。多変量分布の生成のための変分量子アルゴリズムの普遍性を証明する。必要最小のキュービット数と必要最小限の必要な測定量とを接続し、普遍性を許容し、厳密な境界を証明できる様々なアーキテクチャを探索する。我々の結果は、生成的モデリングタスクにおける将来の量子回路の設計を導くのに役立つかもしれない。

Parameterized quantum circuits have been extensively used as the basis for machine learning models in regression, classification, and generative tasks. For supervised learning, their expressivity has been thoroughly investigated and several universality properties have been proven. However, in the case of quantum generative modelling, much less is known, especially when the task is to model distributions over continuous variables. In this work, we elucidate expectation value sampling-based models. Such models output the expectation values of a set of fixed observables from a quantum circuit into which classical random data has been uploaded. We prove the universality of such variational quantum algorithms for the generation of multivariate distributions. We explore various architectures which allow universality and prove tight bounds connecting the minimal required qubit number, and the minimal required number of measurements needed. Our results may help guide the design of future quantum circuits in generative modelling tasks.

翻訳日:2024-05-30 23:31:04 公開日:2024-05-29

# LLMs as Bridges:Reformulating Grounded Multimodal Named Entity Recognition

LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition ( http://arxiv.org/abs/2402.09989v4 )

ライセンス: Link先を確認

Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan,

(参考訳) Grounded Multimodal Named Entity Recognition (GMNER) は、名前付きエンティティ、エンティティタイプ、および対応する視覚領域を識別することを目的とした、初期段階のマルチモーダルタスクである。 GMNERタスクは2つの難しい特性を示す。 1) ソーシャルメディアにおける画像テキストペア間の相関が弱かったため, 名前付きエンティティのかなりの部分が接地不能となった。 2) 類似したタスク(例えば,句の局所化,表現理解の参照など)でよく用いられる粗粒度参照表現と細粒度名前付きエンティティとの区別がある。本稿では,大規模な言語モデル(LLM)を接続ブリッジとして活用することにより,GMNERをMNER-VE-VGタスクに再構成する統合フレームワークであるRiVEGを提案する。この改革は2つの利点をもたらす。 1) MNERの最適性能を維持し, 地域特徴を事前に抽出するためにオブジェクト検出手法を用いることの必要性を排除し, 既存のGMNER手法の2つの大きな限界に自然に対処する。 2) エンティティ拡張表現とビジュアルエンタテインメント(VE)モジュールの導入により,ビジュアルグラウンド(VG)とエンティティグラウンド(EG)が統合される。これによってRiVEGは,現在のあるいは将来的なマルチモーダル事前トレーニングモデルのVisual EntailmentとVisual Grounding機能を,懸命に継承することが可能になります。大規模な実験により、RiVEGは既存のGMNERデータセットの最先端の手法より優れており、全3つのサブタスクで10.65%、6.21%、および8.83%の絶対的なリードを達成している。

Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging properties: 1) The weak correlation between image-text pairs in social media results in a significant portion of named entities being ungroundable. 2) There exists a distinction between coarse-grained referring expressions commonly used in similar tasks (e.g., phrase localization, referring expression comprehension) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge. This reformulation brings two benefits: 1) It maintains the optimal MNER performance and eliminates the need for employing object detection methods to pre-extract regional features, thereby naturally addressing two major limitations of existing GMNER methods. 2) The introduction of entity expansion expression and Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG). It enables RiVEG to effortlessly inherit the Visual Entailment and Visual Grounding capabilities of any current or prospective multimodal pretraining models. Extensive experiments demonstrate that RiVEG outperforms state-of-the-art methods on the existing GMNER dataset and achieves absolute leads of 10.65%, 6.21%, and 8.83% in all three subtasks.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# DDPMインバージョンを用いたゼロショット非教師付きテキスト音声編集

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion ( http://arxiv.org/abs/2402.10009v4 )

ライセンス: Link先を確認

Hila Manor, Tomer Michaeli,

(参考訳) 大規模な事前学習モデルを用いて、ゼロショットで信号を編集する手法は、最近画像領域で急速に進歩している。しかし、この波はまだオーディオ領域に届いていない。本稿では,DDPMインバージョンと事前学習拡散モデルを用いた2つの音声信号のゼロショット編集手法について検討する。まず、ZEro-shot Text-based Audio (ZETA) 編集を画像領域から採用する。第2のZEro-shot UnSupervized (ZEUS) 編集は、意味論的に意味のある編集方向を監督なしで発見するための新しいアプローチである。音楽信号に適用すると、特定の楽器の参加の制御からメロディの即興演奏まで、音楽的に興味深い変更が多岐にわたることが分かる。サンプルとコードはhttps://hilamanor.github.io/AudioEditing/ で確認できる。

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found in https://hilamanor.github.io/AudioEditing/ .

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# PointMamba: ポイントクラウド分析のためのシンプルな状態空間モデル

PointMamba: A Simple State Space Model for Point Cloud Analysis ( http://arxiv.org/abs/2402.10739v4 )

ライセンス: Link先を確認

Dingkang Liang, Xin Zhou, Wei Xu, Xingkui Zhu, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Xiang Bai,

(参考訳) トランスフォーマーは、優れたグローバルモデリング能力のために、ポイントクラウド分析タスクの基本的なアーキテクチャの1つになっています。しかし、注意機構は2次複雑さを持つため、大域的モデリングに訴える線形複雑化法の設計を行うことができる。本稿では,最近の代表的状態空間モデル(SSM)であるMambaを,NLPからポイントクラウド解析タスクへ移行したPointMambaを提案する。従来のトランスフォーマーとは異なり、PointMambaは線形複雑性アルゴリズムを採用し、グローバルなモデリング能力を示しながら計算コストを大幅に削減する。具体的には、空間充填曲線を有効点トークン化に利用し、非常に単純で非階層的なマンバエンコーダをバックボーンとして採用する。総合的な評価では、PointMambaは複数のデータセットで優れたパフォーマンスを実現し、GPUメモリ使用量とFLOPを大幅に削減している。本研究は,3次元視覚関連課題におけるSSMの可能性を明らかにするとともに,今後の研究に有効なマンバベースラインを提示する。コードはhttps://github.com/LMD0311/PointMambaで入手できる。

Transformers have become one of the foundational architectures in point cloud analysis tasks due to their excellent global modeling ability. However, the attention mechanism has quadratic complexity, making the design of a linear complexity method with global modeling appealing. In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs. Specifically, our method leverages space-filling curves for effective point tokenization and adopts an extremely simple, non-hierarchical Mamba encoder as the backbone. Comprehensive evaluations demonstrate that PointMamba achieves superior performance across multiple datasets while significantly reducing GPU memory usage and FLOPs. This work underscores the potential of SSMs in 3D vision-related tasks and presents a simple yet effective Mamba-based baseline for future research. The code is available at https://github.com/LMD0311/PointMamba.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# RAG-Driver:マルチモーダル大言語モデルにおける検索強化型インコンテキスト学習による汎用運転説明

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model ( http://arxiv.org/abs/2402.10828v2 )

ライセンス: Link先を確認

Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd,

(参考訳) 私たちは、しばしば不透明なAIメソッドを使用するロボットを信頼する必要があります。彼らは私たち自身を説明する必要があり、彼らの説明を信頼する必要があります。この点において、説明責任は、特に複雑な自律運転において、エンドユーザー間の透明性と受け入れを促進するために、信頼できる自律的意思決定において重要な役割を担っている。近年のMLLM(Multi-Modal Large Language Model)の進歩は、自然言語の説明とともに制御予測を生成することにより、駆動エージェントとしての説明可能性を高める有望な可能性を示している。しかし、高価なアノテーションコストと異なるデータセット間のドメインギャップによる厳しいデータ不足は、堅牢で汎用的なシステムの開発を極めて難しい課題にしている。さらに,MLLMの厳格に高価なトレーニング要件と破滅的忘れの未解決問題により,展開後の一般性はさらに制限された。これらの課題に対処するために,提案するRAG-Driverは,高能率,説明性,一般化可能な自律運転にコンテキスト内学習を活用する,検索強化型マルチモーダルな大規模言語モデルである。 RAG-Driverが運転動作の説明,正当化,制御信号の予測を行う上で,最先端の性能を発揮することを実証的に検証した。さらに重要なのは、さらなる訓練をすることなく、目に見えない環境に例外的なゼロショットの一般化能力を示すことだ。

We need to trust robots that use often opaque AI methods. They need to explain themselves to us, and we need to trust their explanation. In this regard, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets makes the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLM and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstration, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training endeavours.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# ベイズ最適化によるペロブスカイト実験からの物理材料パラメータ抽出

Physics-based material parameters extraction from perovskite experiments via Bayesian optimization ( http://arxiv.org/abs/2402.11101v4 )

ライセンス: Link先を確認

Hualin Zhan, Viqar Ahmad, Azul Mayon, Grace Tabi, Anh Dinh Bui, Zhuofeng Li, Daniel Walter, Hieu Nguyen, Klaus Weber, Thomas White, Kylie Catchpole,

(参考訳) 実験的分析からペロブスカイトの物質パラメータを抽出する能力は、光電気・光電子応用の合理的な設計に不可欠である。しかし、この分析の難しさは、理論モデルの複雑さとペロブスカイトの材料パラメータの数によって著しく増加する。ここでは、キャリアのドリフト拡散と動的欠陥占有を含む複雑なフル物理モデルに基づいて、過渡発光実験から有機金属ペロブスカイト半導体の8つの基本材料パラメータを抽出できる解析プラットフォームを開発するためにベイズ最適化を用いる。熱劣化の例としては、キャリヤ移動率とトラップアシスト組換え係数が顕著に低下し、欠陥エネルギー準位はほぼ変化していないことが示されている。キャリヤ移動率の低下は, パーロブスカイト型太陽電池の熱劣化に対する全体的な影響を減少させ, 補充係数の低下が補充係数の増大に影響を及ぼすにもかかわらず, 補充係数を減少させることによって支配することができる。将来、このプラットフォームは他の実験や実験の組み合わせに便利に適用でき、半導体材料の発見と最適化を加速する。

The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# 大規模言語モデルのための知識境界のベンチマーク:モデル評価の異なる視点

Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation ( http://arxiv.org/abs/2402.11493v2 )

ライセンス: Link先を確認

Xunjian Yin, Xu Zhang, Jie Ruan, Xiaojun Wan,

(参考訳) 近年,多種多様なタスクにおいて顕著な性能を達成し,大規模言語モデルの開発において大きな進歩を遂げている。言語モデルの知識能力を評価するため,従来の研究では,質問応答ペアに基づくベンチマークが多数提案されている。我々は,言語モデルがアクティベートに敏感であるため,不確定な質問や限定的な言い回しをクエリとして評価することは,信頼性が高く,包括的ではないと論じる。そこで本稿では,言語モデル内での素早い知識と素早い知識の両方を包含する,知識境界という新しい概念を導入する。知識境界は言語モデル評価の迅速な感度を回避し、より信頼性と堅牢性を高める。与えられたモデルの知識境界を探索するために,各知識に対して最適なプロンプトを識別する新しいアルゴリズムである,セマンティック制約付き予測勾配降下法を提案する。実験では,既存の手法と比較して知識境界計算におけるアルゴリズムの性能が優れていることを示した。さらに,知識境界を持つ複数の領域における複数の言語モデルの能力を評価する。

In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed lots of benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or limited paraphrases as the query, since language models are sensitive to prompt. Therefore, we introduce a novel concept named knowledge boundary to encompass both prompt-agnostic and prompt-sensitive knowledge within language models. Knowledge boundary avoids prompt sensitivity in language model evaluations, rendering them more dependable and robust. To explore the knowledge boundary for a given model, we propose projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. Experiments demonstrate a superior performance of our algorithm in computing the knowledge boundary compared to existing methods. Furthermore, we evaluate the ability of multiple language models in several domains with knowledge boundary.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# ワッサーシュタイン分布ロバストモデルに対する普遍的一般化保証

Universal generalization guarantees for Wasserstein distributionally robust models ( http://arxiv.org/abs/2402.11981v2 )

ライセンス: Link先を確認

Tam Le, Jérôme Malick,

(参考訳) 分散ロバストな最適化は、堅牢な機械学習モデルをトレーニングし、データの不確実性と分散シフトをキャプチャする魅力的な方法として登場した。最近の統計分析により、ワッサーシュタイン曖昧性集合から構築されたロバストモデルが優れた一般化を保証することが証明され、次元性の呪いが破られる。しかし、これらの結果は特定の場合、近似のコスト、あるいは実際は検証が難しい仮定の下で得られる。対照的に、この記事では、輸送コスト関数や損失関数、潜在的に凸や非平滑性を含むすべての実例をカバーする正確な一般化を保証する。例えば、私たちの結果は制限的な仮定を必要とせず、ディープラーニングに適用されます。この結果は,非平滑解析法と古典的濃度解析法を組み合わせた新しい証明手法によって達成される。我々のアプローチは、(二重)正則化を含む分布的に頑健な問題をワッサーシュタイン/シンクホーンの最近のバージョンに拡張するのに十分である。

Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models built from Wasserstein ambiguity sets have nice generalization guarantees, breaking the curse of dimensionality. However, these results are obtained in specific cases, at the cost of approximations, or under assumptions difficult to verify in practice. In contrast, we establish, in this article, exact generalization guarantees that cover all practical cases, including any transport cost function and any loss function, potentially non-convex and nonsmooth. For instance, our result applies to deep learning, without requiring restrictive assumptions. We achieve this result through a novel proof technique that combines nonsmooth analysis rationale with classical concentration results. Our approach is general enough to extend to the recent versions of Wasserstein/Sinkhorn distributionally robust problems that involve (double) regularizations.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# 多発性対数的ミニマックス後悔を伴うリニアバンディット

Linear bandits with polylogarithmic minimax regret ( http://arxiv.org/abs/2402.12042v2 )

ライセンス: Link先を確認

Josep Lumbreras, Marco Tomamichel,

(参考訳) 本研究では,未知ベクトルに近づいた単位球上の動作を選択すると,下ガウス雑音パラメータが線形に消滅する線形確率帯域の雑音モデルについて検討する。我々は,この問題に対するアルゴリズムを導入し,時間軸で$\log^3(T)$,時間軸で$T$と,典型的な帯域幅アルゴリズムに対するこの後悔の平方根スケーリングとは対照的に,ミニマックス後悔のスケーリングを$\log^3(T)$とする。我々の戦略は、重み付けされた最小二乗推定に基づいて、固有値関係 $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time steps $t$ をノイズモデルとは独立で、独立した関心を持つような幾何学的議論を通じて達成する。これにより、各時間ステップにおける期待された後悔を$O(\frac1{t})$の順番で厳格に制御することができ、累積的後悔の対数的スケーリングにつながります。

We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# 医療用多言語言語モデルの構築に向けて

Towards Building Multilingual Language Model for Medicine ( http://arxiv.org/abs/2402.13963v3 )

ライセンス: Link先を確認

Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie,

(参考訳) オープンソースの多言語医療言語モデルの開発は、様々な地域から幅広い言語的に多様な聴衆に利益をもたらすことができる。まず、MMedCと呼ばれる6つの主要言語を含む約25.5Bのトークンを含む多言語医療コーパスを構築し、さらに、多言語医療LLMの開発を監視するために、MMedBenchと呼ばれる有理性を備えた多言語医療多言語質問応答ベンチマークを提案し、第3に、MMedCで訓練された他の自動回帰型言語モデルとともに、ベンチマーク上で多数のオープンソースの大規模言語モデル(LLM)を評価した。我々の最終モデルであるMMed-Llama 3は、8Bパラメータしか持たないが、GPT-4に匹敵するようなMMedBenchおよび英語ベンチマークの他のすべてのオープンソースモデルと比較して、優れた性能が得られる。そこで本研究では,多言語医療用LLMの開発を支援するための大規模コーパス,ベンチマーク,一連のモデルを提案する。

The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions. To promote this domain, we present contributions from the following: First, we construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages, termed as MMedC, enabling auto-regressive domain adaptation for general LLMs; Second, to monitor the development of multilingual medical LLMs, we propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench; Third, we have assessed a number of open-source large language models (LLMs) on our benchmark, along with those further auto-regressive trained on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In conclusion, in this work, we present a large-scale corpus, a benchmark and a series of models to support the development of multilingual medical LLMs.

翻訳日:2024-05-30 23:21:18 公開日:2024-05-29

# EOS決定の観点からのマルチモーダル幻覚の緩和

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective ( http://arxiv.org/abs/2402.14545v2 )

ライセンス: Link先を確認

Zihao Yue, Liang Zhang, Qin Jin,

(参考訳) 大規模なマルチモーダルモデル(LMM)は、視覚的な入力に存在しないコンテンツを生成するため、しばしば多モーダル幻覚に悩まされる。本稿では,この問題の新たなアングルを探究する:過度に詳細なトレーニングデータにより,モデルが生成をタイムリーに終了する能力が損なわれ,視覚的知覚限界を超えて出力が継続する。特殊な終末トークンであるEOSを用いて、モデルがどのように生成を終了させるかを調べることで、生成したテキストと画像を比較してシーケンス全体の完全性を評価する。この観察は、モデルが過度に長い出力を避けるために、その視覚的知覚に基づいて適切なEOS決定を行う固有の可能性を持っていることを示唆している。このような可能性を活用するために,モデルが正規指導データから学習することで幻覚を減らすことができる訓練目標と,有害な訓練データがモデル幻覚を悪化させるのを防ぐためのデータフィルタリング戦略の2つの手法を検討する。どちらの手法も追加のデータや知識を必要とせずにLMMの幻覚性能を大幅に向上させる。

Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to terminate generation with EOS, the special end-of-sentence token, we find that the model assesses the completeness of the entire sequence by comparing the generated text with the image. This observation suggests that the model possesses an inherent potential of making proper EOS decisions based on its visual perception to avoid overly lengthy outputs. To take advantage of such potential, we explore two methods to mitigate multimodal hallucinations: a training objective that enables the model to reduce hallucinations by learning from regular instruction data, and a data filtering strategy to prevent harmful training data from exacerbating model hallucinations. Both methods significantly improve the hallucination performance of LMMs, without requiring any additional data or knowledge.

翻訳日:2024-05-30 23:21:17 公開日:2024-05-29

# 対人訓練における不均一規則化の再考 : ロバスト性・精度トレードオフの改善

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off ( http://arxiv.org/abs/2402.14648v2 )

ライセンス: Link先を確認

Futa Waseda, Ching-Chun Chang, Isao Echizen,

(参考訳) 敵の訓練は、敵の例(AE)を防衛する最先端のアプローチであるが、正確さを犠牲にして高い堅牢性が達成されるロバストネス・精度のトレードオフに悩まされている。本研究では,このトレードオフを緩和するために,潜在表現の不等式正規化を活用し,識別的かつ逆向きに不変表現を学習する。非分散正規化を伴う表現学習における2つの主要な課題を解析し、(1)不分散損失と分類目的との「段階的な衝突」が最適下収束をもたらすこと、(2)クリーン入力と逆入力の分散分布から生じる混合分布問題について分析する。これらの問題に対処するため,非対称的非分散損失と停止段階演算と予測器を組み込んだ非対称表現正規化逆行訓練(AR-AT)と,混合分布問題を解決するための分割バッチノーム(BN)構造を提案する。本手法は,識別能力を犠牲にすることなく,逆不変表現を学習することにより,ロバスト性・精度のトレードオフを大幅に改善する。さらに,本研究の知見が知識蒸留に基づく防衛手法との関連性について考察し,それらの相対的成功の深い理解に寄与する。

Although adversarial training has been the state-of-the-art approach to defend against adversarial examples (AEs), it suffers from a robustness-accuracy trade-off, where high robustness is achieved at the cost of clean accuracy. In this work, we leverage invariance regularization on latent representations to learn discriminative yet adversarially invariant representations, aiming to mitigate this trade-off. We analyze two key issues in representation learning with invariance regularization: (1) a "gradient conflict" between invariance loss and classification objectives, leading to suboptimal convergence, and (2) the mixture distribution problem arising from diverged distributions of clean and adversarial inputs. To address these issues, we propose Asymmetrically Representation-regularized Adversarial Training (AR-AT), which incorporates asymmetric invariance loss with stop-gradient operation and a predictor to improve the convergence, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative ability. Furthermore, we discuss the relevance of our findings to knowledge-distillation-based defense methods, contributing to a deeper understanding of their relative successes.

翻訳日:2024-05-30 23:21:17 公開日:2024-05-29

# ダブルIウォーターマーク : LLMファインチューニングのためのモデル著作権保護

Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning ( http://arxiv.org/abs/2402.14883v2 )

ライセンス: Link先を確認

Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li,

(参考訳) さまざまなアプリケーションをサポートするために、ビジネスオーナーにとって一般的で効率的なアプローチは、LLMオーナやクラウドサーバが提供するAPIを通じて、トレーニング済みのLLMを微調整するための貴重なデータセットを活用している。しかし、このプロセスはモデル誤用のかなりのリスクを伴い、ビジネスオーナーに深刻な経済的影響をもたらす可能性がある。したがって、LLM微調整中にこれらのカスタマイズされたモデルの著作権を保護することは、緊急の現実的な要件となっているが、そのような保護を提供するための既存のソリューションは限られている。このプレス問題に対処するため、「ダブルI透かし」と呼ばれる新しい透かし手法を提案する。具体的には、インストラクションチューニングデータに基づいて、2種類のバックドアデータパラダイムを導入し、それぞれインストラクションと入力をトリガーとする。 LLMの学習機能を活用して、データセットにカスタマイズされたバックドアサンプルを組み込むことにより、細調整中に特定の透かし情報をカスタマイズされたモデルに効果的に注入することで、商業シナリオにおける透かしの注入と検証が容易になる。提案手法を各種微調整法で評価し, その無害性, 頑健性, 独特性, 不受容性, 妥当性を定量的および定性的な分析により検証した。

To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these customized models during LLM fine-tuning has become an urgent practical requirement, but there are limited existing solutions to provide such protection. To tackle this pressing issue, we propose a novel watermarking approach named ``Double-I watermark''. Specifically, based on the instruct-tuning data, two types of backdoor data paradigms are introduced with trigger in the instruction and the input, respectively. By leveraging LLM's learning capability to incorporate customized backdoor samples into the dataset, the proposed approach effectively injects specific watermarking information into the customized model during fine-tuning, which makes it easy to inject and verify watermarks in commercial scenarios. We evaluate the proposed "Double-I watermark" under various fine-tuning methods, demonstrating its harmlessness, robustness, uniqueness, imperceptibility, and validity through both quantitative and qualitative analyses.

翻訳日:2024-05-30 23:21:17 公開日:2024-05-29

# 実用性保証によるデータの公平性の達成

Achievable Fairness on Your Data With Utility Guarantees ( http://arxiv.org/abs/2402.17106v2 )

ライセンス: Link先を確認

Muhammad Faaiz Taufiq, Jean-Francois Ton, Yang Liu,

(参考訳) 機械学習のフェアネスでは、異なるセンシティブなグループ間の格差を最小限に抑えるトレーニングモデルはしばしば精度を低下させる。このトレードオフの深刻さは、本質的にデータセットの不均衡やバイアスといったデータセット特性に依存しているため、多様なデータセット間で均一な公平性要件を使用することは疑問の余地が残る。これを解決するために、厳密な統計的保証を背景として、個々のデータセットに適合する公平性-正確性トレードオフ曲線を近似する計算効率の良い手法を提案する。 You-Only-Train-Once(YOTO)フレームワークを利用することで、トレードオフ曲線を近似する際に複数のモデルを訓練する際の計算負担を軽減する。そこで本研究では,推定誤差による誤った結論を避けつつ,モデルフェアネスを監査する堅牢な枠組みを実践者に提供し,評価の不確実性を定量化する手法を提案する。我々の実験は、表形式(例えば、アダルト)、画像(CelebA)、言語(Jigsaw)データセットにまたがるものであり、我々のアプローチは、様々なデータモダリティで達成可能な最適トレードオフを確実に定量化するだけでなく、SOTAフェアネス法における準最適性の検出にも役立ちます。

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

翻訳日:2024-05-30 23:21:17 公開日:2024-05-29

# 大規模言語モデルの学習自由長期スケーリング

Training-Free Long-Context Scaling of Large Language Models ( http://arxiv.org/abs/2402.17463v2 )

ライセンス: Link先を確認

Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong,

(参考訳) 大規模言語モデル(LLM)によるコヒーレントテキストの処理と生成能力は,入力トークンの数が事前学習期間を超えると著しく低下する。 Llama2 70Bは100k以上のトークンのコンテキストウィンドウを連続的なトレーニングなしでサポートできる。長いシーケンスの注意計算をチャンクベースのモジュールに分解することで、DCAは同じチャンク(Intra-Chunk)と異なるチャンク(Inter-Chunk)内のトークンの相対的な位置情報を効果的にキャプチャし、Flash Attentionとシームレスに統合する。 DCAは、その印象的な補間機能に加えて、微調整されたモデルに匹敵する、あるいはそれ以上に優れた、実用的な長期コンテキストタスクのパフォーマンスを実現している。プロプライエタリモデルと比較すると,トレーニングフリーの70Bモデルでは,gpt-3.5-16kのパフォーマンスの94%を達成しています。この作業で使用されるすべてのコードとデータは、 \url{https://github.com/HKUNLP/ChunkLlama} でリリースされる。

The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length. Given the expensive overhead of finetuning large-scale models with longer sequences, we propose Dual Chunk Attention (DCA), which enables Llama2 70B to support context windows of more than 100k tokens without continual training. By decomposing the attention computation for long sequences into chunk-based modules, DCA manages to effectively capture the relative positional information of tokens within the same chunk (Intra-Chunk) and across distinct chunks (Inter-Chunk), as well as integrates seamlessly with Flash Attention. In addition to its impressive extrapolation capability, DCA achieves performance on practical long-context tasks that is comparable to or even better than that of finetuned models. When compared with proprietary models, our training-free 70B model attains 94% of the performance of gpt-3.5-16k, indicating it is a viable open-source alternative. All code and data used in this work are released at \url{https://github.com/HKUNLP/ChunkLlama}.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# ジョブショップスケジューリング問題の解決のための双方向グラフ注意ネットワークを用いたトポロジ表現の学習

Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem ( http://arxiv.org/abs/2402.17606v2 )

ライセンス: Link先を確認

Cong Zhang, Zhiguang Cao, Yaoxin Wu, Wen Song, Jing Sun,

(参考訳) 既存の学習に基づくジョブショップスケジューリング問題(JSSP)の解法は、通常、非方向グラフに適した既製のGNNモデルを使用し、解離グラフ(DG)のリッチで有意義なトポロジ構造を無視する。本稿では,このアテンション機構に基づく新しいGNNアーキテクチャである,トポロジ対応双方向グラフアテンションネットワーク(TBGAT)を提案し,JSSPをローカル検索フレームワークに組み込む。具体的には、TBGATは、それぞれ前方と後方のビューからDGを埋め込み、ビューの異なるトポロジに従ってメッセージが伝播し、グラフの注意を通して集約される。そこで本稿では,DGの前方および後方トポロジ的ソートを計算するためのメッセージパス機構に基づく新しい演算子を提案する。さらに,TBGATはジョブ数とマシン数に線形計算の複雑さがあることを理論的および実験的に示し,本手法の実用的価値を高めた。さらに、5つの合成データセットと7つの古典的なベンチマークに関する広範な実験により、TBGATは広い範囲のニューラルネットワークよりも大きなマージンで、新しいSOTA結果を達成することが示された。すべてのコードとデータはhttps://github.com/zcaicaros/TBGAT.comで公開されている。

Existing learning-based methods for solving job shop scheduling problems (JSSP) usually use off-the-shelf GNN models tailored to undirected graphs and neglect the rich and meaningful topological structures of disjunctive graphs (DGs). This paper proposes the topology-aware bidirectional graph attention network (TBGAT), a novel GNN architecture based on the attention mechanism, to embed the DG for solving JSSP in a local search framework. Specifically, TBGAT embeds the DG from a forward and a backward view, respectively, where the messages are propagated by following the different topologies of the views and aggregated via graph attention. Then, we propose a novel operator based on the message-passing mechanism to calculate the forward and backward topological sorts of the DG, which are the features for characterizing the topological structures and exploited by our model. In addition, we theoretically and experimentally show that TBGAT has linear computational complexity to the number of jobs and machines, respectively, strengthening our method's practical value. Besides, extensive experiments on five synthetic datasets and seven classic benchmarks show that TBGAT achieves new SOTA results by outperforming a wide range of neural methods by a large margin. All the code and data are publicly available online at https://github.com/zcaicaros/TBGAT.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# バランシング法:拡散モデルにおける分散誘導型デバイアス

Balancing Act: Distribution-Guided Debiasing in Diffusion Models ( http://arxiv.org/abs/2402.18206v3 )

ライセンス: Link先を確認

Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, R. Venkatesh Babu,

(参考訳) 拡散モデル(DM)は、前例のない画像生成能力を持つ強力な生成モデルとして登場した。これらのモデルは、データ拡張とクリエイティブなアプリケーションに広く利用されている。しかし、DMはトレーニングデータセットに存在するバイアスを反映する。これは特に、DMが1つのサブグループと他のグループ(例えば、女性と男性)を優先する顔の文脈において関係している。本研究では,追加データやモデル再学習に頼ることなく,DMを劣化させる手法を提案する。具体的には,生成した画像を所定の属性分布に従うように強制する分散誘導法を提案する。これを実現するために、UNetを識別する潜在機能には、リッチな階層的セマンティクスが備わっており、デバイアス発生を誘導するためにも同様に活用できる、という重要な洞察に基づいて構築する。 ADP(Attribute Distribution Predictor)をトレーニングします - 潜伏した特徴を属性の分布にマッピングする小さなmlpです。 ADPは、既存の属性分類器から生成された擬似ラベルで訓練される。 ADPを用いた配電誘導により,公平な生成が可能となる。提案手法は, 単一/複数属性間のバイアスを低減し, 非条件およびテキスト条件拡散モデルにおいて, ベースラインのマージンを著しく上回る。さらに、生成されたデータとトレーニングセットを再バランスさせることにより、フェア属性分類器をトレーニングする下流タスクを提案する。

Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in the training datasets. This is especially concerning in the context of faces, where the DM prefers one demographic subgroup vs others (eg. female vs male). In this work, we present a method for debiasing DMs without relying on additional data or model retraining. Specifically, we propose Distribution Guidance, which enforces the generated images to follow the prescribed attribute distribution. To realize this, we build on the key insight that the latent features of denoising UNet hold rich demographic semantics, and the same can be leveraged to guide debiased generation. We train Attribute Distribution Predictor (ADP) - a small mlp that maps the latent features to the distribution of attributes. ADP is trained with pseudo labels generated from existing attribute classifiers. The proposed Distribution Guidance with ADP enables us to do fair generation. Our method reduces bias across single/multiple attributes and outperforms the baseline by a significant margin for unconditional and text-conditional diffusion models. Further, we present a downstream task of training a fair attribute classifier by rebalancing the training set with our generated data.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 量子コンピュータにおける部分微分方程式の変分量子シミュレーションにおける境界処理

Boundary Treatment for Variational Quantum Simulations of Partial Differential Equations on Quantum Computers ( http://arxiv.org/abs/2402.18619v2 )

ライセンス: Link先を確認

Paul Over, Sergio Bengoechea, Thomas Rung, Francesco Clerici, Leonardo Scandurra, Eugene de Villiers, Dieter Jaksch,

(参考訳) 本稿では, 2次偏微分方程式で表される初期境界値問題を解くための変分量子アルゴリズムを提案する。このアプローチでは、現在のノイズの多い中間スケール量子時代の量子コンピュータに適した、ハイブリッドな古典/量子ハードウェアを使用する。偏微分方程式は、まずモジュラー制御-状態演算子(アンザッツ)で最適制御問題に変換される。最適化器が必要とする目的関数とその導関数は、アンシラ量子ビットを測定して量子コンピュータ上で効率よく評価でき、最適化手順は古典的なハードウェアを用いる。この研究の焦点は境界条件の処理であり、補正手法を用いて量子ハードウェアの特性に合わせて調整される。この目的のために、偏微分方程式の境界条件と離散項はユニタリ演算の列に分解され、その後量子ゲートにコンパイルされる。量子ハードウェアを古典的にエミュレートすることにより、2階偏微分方程式に対して、アプローチの精度とゲートの複雑さを評価する。例としては、様々なディリクレ条件、ノイマン条件、ロビン条件と組み合わせてスカラー特性に対する定常かつ非定常な拡散輸送方程式がある。このフレキシブルアプローチの結果は、関連する量子回路の量子ビット数において、顕著なポリログの複雑さのスケーリングと組み合わせて、堅牢な振る舞いと強い予測精度を示す。残る課題は最適化手順を高速化する適応的なアンザッツ戦略を指す。

The paper presents a variational quantum algorithm to solve initial-boundary value problems described by second-order partial differential equations. The approach uses hybrid classical/quantum hardware that is well suited for quantum computers of the current noisy intermediate-scale quantum era. The partial differential equation is initially translated into an optimal control problem with a modular control-to-state operator (ansatz). The objective function and its derivatives required by the optimizer can efficiently be evaluated on a quantum computer by measuring an ancilla qubit, while the optimization procedure employs classical hardware. The focal aspect of the study is the treatment of boundary conditions, which is tailored to the properties of the quantum hardware using a correction technique. For this purpose, the boundary conditions and the discretized terms of the partial differential equation are decomposed into a sequence of unitary operations and subsequently compiled into quantum gates. The accuracy and gate complexity of the approach are assessed for second-order partial differential equations by classically emulating the quantum hardware. The examples include steady and unsteady diffusive transport equations for a scalar property in combination with various Dirichlet, Neumann, or Robin conditions. The results of this flexible approach display a robust behavior and a strong predictive accuracy in combination with a remarkable polylog complexity scaling in the number of qubits of the involved quantum circuits. Remaining challenges refer to adaptive ansatz strategies that speed up the optimization procedure.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 拡散モデルによる顔スワップ

Face Swap via Diffusion Model ( http://arxiv.org/abs/2403.01108v2 )

ライセンス: Link先を確認

Feifei Wang,

(参考訳) 本稿では,2つのポートレート画像間の顔交換のための拡散モデルに基づくフレームワークを提案する。基本フレームワークは3つのコンポーネント(IP-Adapter、ControlNet、Stable Diffusionのインパインティングパイプライン)で構成され、それぞれ顔の特徴符号化、マルチ条件生成、顔インパインティングである。さらに、顔面誘導最適化とCodeFormerベースのブレンディングを導入して、生成品質をさらに改善します。具体的には、最近の軽量化手法(DreamBooth-LoRA)に取り組み、アイデンティティの整合性を保証する。 1) 情報源の同一性を表すために稀な識別子 "sks" を用いて, 2) 画像の特徴をテキストの特徴のように各横断層に注入する。次に、安定拡散の強い塗装能力を活用し、ターゲットポートレートのキャニー画像と顔検出アノテーションを条件として利用し、ContorlNetの生成をガイドし、ソースポートレートとターゲットポートレートを整列させる。さらに顔のアライメントを補正するため、サンプル生成時のテキスト埋め込みを最適化するために顔誘導損失を追加する。コードは、https://github.com/somuchtome/Faceswap.comで入手できる。

This technical report presents a diffusion model based framework for face swapping between two portrait images. The basic framework consists of three components, i.e., IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, for face feature encoding, multi-conditional generation, and face inpainting respectively. Besides, I introduce facial guidance optimization and CodeFormer based blending to further improve the generation quality. Specifically, we engage a recent light-weighted customization method (i.e., DreamBooth-LoRA), to guarantee the identity consistency by 1) using a rare identifier "sks" to represent the source identity, and 2) injecting the image features of source portrait into each cross-attention layer like the text features. Then I resort to the strong inpainting ability of Stable Diffusion, and utilize canny image and face detection annotation of the target portrait as the conditions, to guide ContorlNet's generation and align source portrait with the target portrait. To further correct face alignment, we add the facial guidance loss to optimize the text embedding during the sample generation. The code is available at: https://github.com/somuchtome/Faceswap

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# WebCiteS: Citationsを用いた中国語Web検索結果の分散クエリ焦点要約(Attributed Query-Focused Summarization)

WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations ( http://arxiv.org/abs/2403.01774v2 )

ライセンス: Link先を確認

Haolin Deng, Chang Wang, Xin Li, Dezhang Yuan, Junlang Zhan, Tianhua Zhou, Jin Ma, Jun Gao, Ruifeng Xu,

(参考訳) 大規模言語モデル(LLM)における属性の強化は重要な課題である。実現可能なアプローチの1つは、LLMが世代をサポートする外部ソースを引用できるようにすることである。しかし、この領域の既存のデータセットと評価方法には、依然として顕著な制限がある。本研究では、属性付きクエリ中心要約(AQFS)のタスクを定式化し、7kの人称注釈の要約を引用した中国語データセットであるWebCiteSを提示する。 WebCiteSは、実際のユーザクエリとWeb検索結果から派生したもので、モデルのトレーニングと評価のための貴重なリソースを提供する。帰属評価における先行研究は、起伏誤差と引用誤差を区別しない。また、複数のソースから部分的なサポートを引き出す文の自動検証にも不足している。これらの課題に対処するために、詳細なメトリクスを開発し、自動評価器が文を細かな検証のためにサブステートに分解できるようにする。 WebCiteSのオープンソースモデルとプロプライエタリモデルの両方を包括的に評価することは、LLMが正しく引用する上で直面する課題を浮き彫りにして、さらなる改善の必要性を浮き彫りにしている。データセットとコードは、この決定的な分野のさらなる研究を促進するために、オープンソース化される。

Enhancing the attribution in large language models (LLMs) is a crucial task. One feasible approach is to enable LLMs to cite external sources that support their generations. However, existing datasets and evaluation methods in this domain still exhibit notable limitations. In this work, we formulate the task of attributed query-focused summarization (AQFS) and present WebCiteS, a Chinese dataset featuring 7k human-annotated summaries with citations. WebCiteS derives from real-world user queries and web search results, offering a valuable resource for model training and evaluation. Prior works in attribution evaluation do not differentiate between groundedness errors and citation errors. They also fall short in automatically verifying sentences that draw partial support from multiple sources. We tackle these issues by developing detailed metrics and enabling the automatic evaluator to decompose the sentences into sub-claims for fine-grained verification. Our comprehensive evaluation of both open-source and proprietary models on WebCiteS highlights the challenge LLMs face in correctly citing sources, underscoring the necessity for further improvement. The dataset and code will be open-sourced to facilitate further research in this crucial field.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 空気質推論のための時空間ニューラルネットワーク

Spatio-Temporal Field Neural Networks for Air Quality Inference ( http://arxiv.org/abs/2403.02354v2 )

ライセンス: Link先を確認

Yutong Feng, Qiongyan Wang, Yutong Xia, Junlin Huang, Siru Zhong, Kun Wang, Shifen Cheng, Yuxuan Liang,

(参考訳) 空気質推定問題は、限られた観測地点からの履歴データを利用して、未知の場所で空気質指数を推定することを目的としている。ステーションのメンテナンスコストの高さによるデータの分散性を考慮すると、優れた推論アルゴリズムはコストを効果的に削減し、データの粒度を改善できる。時空間グラフニューラルネットワークはこの問題に対して優れた進歩を遂げているが、非ユークリッドおよび離散データ構造モデリングではそのポテンシャルが制限されている。本研究では、新しいモデルである時空間ニューラルネットワークとそれに対応する新しいフレームワークであるピラミッド推論を提案することにより、2つの異なる時空間的視点、フィールド、グラフを組み合わせるための最初の試みを行う。広範にわたる実験により,中国本土の大気質推定において,提案モデルと枠組みの優位性を実証した。

The air quality inference problem aims to utilize historical data from a limited number of observation sites to infer the air quality index at an unknown location. Considering the sparsity of data due to the high maintenance cost of the stations, good inference algorithms can effectively save the cost and refine the data granularity. While spatio-temporal graph neural networks have made excellent progress on this problem, their non-Euclidean and discrete data structure modeling of reality limits its potential. In this work, we make the first attempt to combine two different spatio-temporal perspectives, fields and graphs, by proposing a new model, Spatio-Temporal Field Neural Network, and its corresponding new framework, Pyramidal Inference. Extensive experiments validate that our model achieves state-of-the-art performance in nationwide air quality inference in the Chinese Mainland, demonstrating the superiority of our proposed model and framework.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# アクティブな統計的推測

Active Statistical Inference ( http://arxiv.org/abs/2403.03208v2 )

ライセンス: Link先を確認

Tijana Zrnic, Emmanuel J. Candès,

(参考訳) アクティブ・ラーニングの概念に着想を得て,機械学習支援データ収集を用いた統計的推論のためのアクティブ・推論法を提案。収集可能なラベルの数に関する予算を仮定すると、この方法論は機械学習モデルを使用して、どのデータポイントがラベルにとって最も有益なものかを識別し、予算を効果的に活用する。モデルは不確実性を示すデータポイントに対してラベルの収集を優先順位付けし、自信のあるモデルの予測に依存する。アクティブ推論は、ブラックボックス機械学習モデルを利用し、データ分散を処理しながら、確実に妥当な信頼区間と仮説テストを構成する。キーポイントは、非適応的に収集されたデータに依存する既存のベースラインよりもはるかに少ないサンプルで同じレベルの精度を達成することである。これは、同じ数のサンプルに対して、アクティブ推論はより小さな信頼区間とより強力なp値を可能にすることを意味する。我々は、世論調査、国勢調査分析、およびプロテオミクスからデータセットに対するアクティブな推測を評価する。

Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 一から多へ:言語モデルにおける毒性緩和の範囲を広げる

From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models ( http://arxiv.org/abs/2403.03893v2 )

ライセンス: Link先を確認

Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis,

(参考訳) これまで、言語モデルにおける毒性の緩和は、ほぼ完全に単一言語設定に焦点が当てられていた。言語モデルが多言語機能を取り入れているため、私たちの安全対策はペースを保ちます。この研究ギャップを認識し,本手法は,複数の言語が提示する複雑さに対処するため,従来の毒性緩和の範囲を広げるものである。言語間で十分なアノテートされたデータセットがないため、私たちは翻訳データを用いて緩和手法を評価し、強化する。また,静的かつ連続的な毒性緩和シナリオにおいて,検索強化手法に対する微調整緩和手法の比較を行った。これにより,翻訳品質と言語間移動が毒性軽減に及ぼす影響を検討することができる。また、モデルのサイズとデータ量がこれらの緩和努力の成功にどのように影響するかについても検討する。本研究は,9つの言語を網羅し,多種多様な言語族と資源利用のレベルを表現している。総合的な実験を通じて、多言語毒性緩和の複雑さに関する洞察を提供し、価値ある洞察を提供し、このますます重要な分野における将来の研究の道を開く。コードとデータはhttps://github.com/for-ai/goodtriever.comで公開されている。

To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and the cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, offering valuable insights and paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 文脈偏見におけるロバスト感情認識

Robust Emotion Recognition in Context Debiasing ( http://arxiv.org/abs/2403.05963v2 )

ライセンス: Link先を確認

Dingkang Yang, Kun Yang, Mingcheng Li, Shunli Wang, Shuaibing Wang, Lihua Zhang,

(参考訳) 文脈認識型感情認識(CAER)は、近年、制約のない環境における感情コンピューティング技術の実践的応用を高めている。メインストリームCAER法は多様な文脈と主観的特徴からアンサンブル表現を抽出し,対象者の感情状態を知覚する。進歩にもかかわらず、最大の課題は、コンテキストバイアスの干渉によるものである。有害なバイアスは、モデルに背景のコンテキストと感情のラベルの間の急激な相関に頼らざるを得ない。本稿では,このような問題に対処するために,反現実的感情推論(CLEF)フレームワークを提案する。具体的には、まず一般化因果グラフを定式化し、CAERの変数間の因果関係を分離する。因果グラフに続いて、CLEFはコンテキストバイアスによって引き起こされる副作用を捉えるために、非侵襲的なコンテキストブランチを導入している。提案手法では, 実測結果と実測結果とを比較して, 全体因果効果から直接文脈効果を排除し, バイアス緩和と頑健な予測を行う。モデルに依存しないフレームワークとして、CLEFは既存のメソッドに簡単に統合でき、一貫したパフォーマンス向上をもたらす。

Context-aware emotion recognition (CAER) has recently boosted the practical applications of affective computing techniques in unconstrained environments. Mainstream CAER methods invariably extract ensemble representations from diverse contexts and subject-centred characteristics to perceive the target person's emotional state. Despite advancements, the biggest challenge remains due to context bias interference. The harmful bias forces the models to rely on spurious correlations between background contexts and emotion labels in likelihood estimation, causing severe performance bottlenecks and confounding valuable context priors. In this paper, we propose a counterfactual emotion inference (CLEF) framework to address the above issue. Specifically, we first formulate a generalized causal graph to decouple the causal relationships among the variables in CAER. Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias. During the inference, we eliminate the direct context effect from the total causal effect by comparing factual and counterfactual outcomes, resulting in bias mitigation and robust prediction. As a model-agnostic framework, CLEF can be readily integrated into existing methods, bringing consistent performance gains.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 画像復元のための拡散浄化を伴うデカップリングデータ整合性

Decoupled Data Consistency with Diffusion Purification for Image Restoration ( http://arxiv.org/abs/2403.06054v5 )

ライセンス: Link先を確認

Xiang Li, Soo Min Kwon, Ismail R. Alkhouri, Saiprasad Ravishankar, Qing Qu,

(参考訳) 拡散モデルは最近、データ分布をモデル化する能力に優れ、幅広い画像復元タスクに優れており、強力な生成前駆体として注目を集めている。画像復元の問題を解決するために,拡散モデルの逆サンプリングプロセスに追加の確率勾配ステップを組み込むことで,データ一貫性を実現する手法が多数存在する。しかし、さらなる勾配のステップは、計算オーバーヘッドが大きくなり、推論時間が増大するにつれて、現実の実用的な応用に挑戦する。また、データ一貫性ステップの数は、逆サンプリングステップの数によって制限されるため、加速拡散モデルサンプリング器を使用する際のさらなる困難が生じる。本研究では,データ整合性から逆処理を分離することにより,これらの問題に対処する新しい拡散型画像復元法を提案する。本手法は,データの整合性を維持するための再構成フェーズと,拡散浄化による事前処理を行う精製フェーズの交互化を含む。我々の手法は多目的性を示し、潜在空間における効率的な問題解決に高い適応性を与える。さらに、一貫性モデルを統合することで、多数のサンプリングステップの必要性を低減する。提案手法の有効性は,画像のデノイング,デブロアリング,インペイント,超解像など,画像修復作業における総合的な実験を通じて検証される。

Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# 有限温度混合状態のSj$\ddot{\text{o}}$qvist量子幾何テンソル

Sj$\ddot{\text{o}}$qvist quantum geometric tensor of finite-temperature mixed states ( http://arxiv.org/abs/2403.06944v2 )

ライセンス: Link先を確認

Zheng Zhou, Xu-Yang Hou, Xin Wang, Jia-Chen Tang, Hao Guo, Chih-Chun Chien,

(参考訳) 量子幾何学テンソル(QGT)は、量子状態の局所的な幾何学的性質と関連する位相情報を明らかにする。ここで、QGTの有限温度での混合量子状態への一般化は、Sj$\ddot{\text{o}}$qvist 距離に基づいて展開される。結果の Sj$\ddot{\text{o}}$qvist QGT は密度行列の個々のスペクトルレベルのゲージ変換の下で不変である。ピタゴラスのような関係は距離とゲージ変換を結び、平行輸送条件の役割を明らかにする。 QGTの真の部分は自然にフィッシャー・ラオ計量とフビニ・スタディ計量の和に分解され、量子距離への異なる寄与を区別することができる。 QGTの虚部はベリー曲率の重み付け和に比例し、ある条件下での混合状態の幾何学的位相をもたらす。本稿では,QGTの温度依存性を説明するために,異なる次元の3つの例を示す。

The quantum geometric tensor (QGT) reveals local geometric properties and associated topological information of quantum states. Here a generalization of the QGT to mixed quantum states at finite temperatures based on the Sj$\ddot{\text{o}}$qvist distance is developed. The resulting Sj$\ddot{\text{o}}$qvist QGT is invariant under gauge transformations of individual spectrum levels of the density matrix. A Pythagorean-like relation connects the distances and gauge transformations, which clarifies the role of the parallel-transport condition. The real part of the QGT naturally decomposes into a sum of the Fisher-Rao metric and Fubini-Study metric, allowing a distinction between different contributions to the quantum distance. The imaginary part of the QGT is proportional to a weighted summation of the Berry curvatures, which leads to a geometric phase for mixed states under certain conditions. We present three examples of different dimensions to illustrate the temperature dependence of the QGT and a discussion on possible implications.

翻訳日:2024-05-30 23:11:33 公開日:2024-05-29

# SemGauss-SLAM:Dense Semantic Gaussian Splatting SLAM

SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM ( http://arxiv.org/abs/2403.07494v3 )

ライセンス: Link先を確認

Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang,

(参考訳) 本稿では,3次元ガウス表現を用いた高密度セマンティックSLAMシステムSemGauss-SLAMを提案する。本システムでは,3次元ガウス表現にセマンティックな特徴を組み込んで,環境の空間的レイアウト内に意味情報をエンコードすることで,正確なセマンティックなシーン表現を実現する。さらに、3次元ガウス表現の更新のための特徴レベル損失を提案し、3次元ガウス最適化のための高レベルガイダンスを可能にする。さらに,3次元ガウス表現とカメラポーズの協調最適化に多フレーム意味型アソシエーションを活用することで,追跡における累積ドリフトを低減し,セマンティック再構築精度を向上させるために,セマンティックインフォームドバンドルアライメントを導入し,低ドリフトトラッキングと正確なマッピングを実現する。我々のSemGauss-SLAM法は,ReplicaおよびScanNetデータセット上でのマッピングと追跡の精度の観点から,既存の放射場に基づくSLAM法よりも優れた性能を示すとともに,高精度なセマンティックセマンティックセマンティックセグメンテーションと密集セマンティックマッピングの優れた機能を示す。

We propose SemGauss-SLAM, a dense semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging multi-frame semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to low-drift tracking and accurate mapping. Our SemGauss-SLAM method demonstrates superior performance over existing radiance field-based SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in high-precision semantic segmentation and dense semantic mapping.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# Itergent (2+1)D Topological Order from Iterative (1+1)D gauging

Emergent (2+1)D topological orders from iterative (1+1)D gauging ( http://arxiv.org/abs/2403.07575v2 )

ライセンス: Link先を確認

Jose Garre Rubio,

(参考訳) ゲージは、既存の大域対称性をローカライズするためにゲージ場を導入し、その結果、再びゲージできるゲージ場上の双対大域対称性をもたらす。スピン鎖上のゲージ過程をアベリア群対称性で反復し、2次元格子にゲージ場を配置することにより、局所対称性は任意のアベリア群に対して$XZX$-codeの安定化子となる。ゲージマップをツイストすることで、奇数の小冊子項に違反し、融合によって移動双極子励起が生じるような、陽イオンを明示的に閉じ込める新しい符号を得る。我々の構成は、初期(1+1)D大域対称系の異なる量子位相をとることによって、任意のギャップ付き境界を自然に実現している。提案手法は,より低次元のトポロジコードを得るための新しい経路を確立し,そのギャップ境界とテンソルネットワーク表現を同定する。

Gauging introduces gauge fields in order to localize an existing global symmetry, resulting in a dual global symmetry on the gauge fields that can be gauged again. By iterating the gauging process on spin chains with Abelian group symmetries and arranging the gauge fields in a 2D lattice, the local symmetries become the stabilizer of the $XZZX$-code for any Abelian group. By twisting the gauging map we obtain new codes that explicitly confine anyons, which violate an odd number of plaquette terms and whose fusion results in mobile dipole excitations. Our construction naturally realizes any gapped boundary by taking different quantum phases of the initial (1+1)D globally symmetric system. Our method establishes a new route to obtain higher dimensional topological codes from lower ones, to identify their gapped boundaries and their tensor network representations.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# アダプティブ・アダプティブ・アダプティブ・アダプティブ・アダプティブ・アダプティブ・トレーニング(ACAT)の導入によるMLロバストネスの促進

Introducing Adaptive Continuous Adversarial Training (ACAT) to Enhance ML Robustness ( http://arxiv.org/abs/2403.10461v2 )

ライセンス: Link先を確認

Mohamed elShehaby, Aditya Kotha, Ashraf Matrawy,

(参考訳) 敵の訓練は、敵の攻撃に対する機械学習(ML)モデルの堅牢性を高める。しかし、ネットワーク/サイバーセキュリティ領域におけるラベル付きトレーニングデータと敵のトレーニングデータを取得することは困難かつコストがかかる。そこで,本論文では,現実に検出された逆データを用いて,連続学習セッション中に,逆学習サンプルをモデルに統合するAdaptive Continuous Adversarial Training (ACAT)を紹介する。 SPAM検出データセットによる実験結果から、ACATは従来のプロセスと比較して逆サンプル検出に要する時間を短縮することを示した。さらに, MLに基づくSPAMフィルタの精度は, 3回のトレーニング後に69%から88%に向上した。

Adversarial training enhances the robustness of Machine Learning (ML) models against adversarial attacks. However, obtaining labeled training and adversarial training data in network/cybersecurity domains is challenging and costly. Therefore, this letter introduces Adaptive Continuous Adversarial Training (ACAT), a method that integrates adversarial training samples into the model during continuous learning sessions using real-world detected adversarial data. Experimental results with a SPAM detection dataset demonstrate that ACAT reduces the time required for adversarial sample detection compared to traditional processes. Moreover, the accuracy of the under-attack ML-based SPAM filter increased from 69% to over 88% after just three retraining sessions.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 先行学習によるフローベース生成超解法モデルの構築

Boosting Flow-based Generative Super-Resolution Models via Learned Prior ( http://arxiv.org/abs/2403.10988v3 )

ライセンス: Link先を確認

Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang, Hao-Wei Chen, Roy Tseng, Chien Feng, Chun-Yi Lee,

(参考訳) フローベース超解像(SR)モデルは、高品質な画像を生成する際に驚くべき能力を示した。しかし、これらの手法は、グリッドアーティファクト、爆発する逆数、固定サンプリング温度による最適以下の結果など、画像生成においていくつかの課題に直面している。これらの問題を克服するために、フローベースSRモデルの推論フェーズに先立って学習された条件を導入する。この前者は,低解像度画像上に条件付き潜在モジュールによって予測された潜時符号であり,フローモデルによりSR画像に変換される。我々のフレームワークは、アーキテクチャや事前訓練された重量を変更することなく、現代のフローベースSRモデルとシームレスに統合するように設計されている。提案手法の有効性を,広範囲な実験とアブレーション解析により評価した。提案するフレームワークは,フローベースSRモデルに固有のすべての問題に対処し,様々なSRシナリオにおける性能を向上させる。私たちのコードは、https://github.com/liyuantsao/BFSRで利用可能です。

Flow-based super-resolution (SR) models have demonstrated astonishing capabilities in generating high-quality images. However, these methods encounter several challenges during image generation, such as grid artifacts, exploding inverses, and suboptimal results due to a fixed sampling temperature. To overcome these issues, this work introduces a conditional learned prior to the inference phase of a flow-based SR model. This prior is a latent code predicted by our proposed latent module conditioned on the low-resolution image, which is then transformed by the flow model into an SR image. Our framework is designed to seamlessly integrate with any contemporary flow-based SR model without modifying its architecture or pre-trained weights. We evaluate the effectiveness of our proposed framework through extensive experiments and ablation analyses. The proposed framework successfully addresses all the inherent issues in flow-based SR models and enhances their performance in various SR scenarios. Our code is available at: https://github.com/liyuantsao/BFSR

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 学習不要損失に基づく拡散誘導の理解と改善

Understanding and Improving Training-free Loss-based Diffusion Guidance ( http://arxiv.org/abs/2403.12404v2 )

ライセンス: Link先を確認

Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li,

(参考訳) 事前訓練された拡散モデルにさらなる制御を加えることが、コンピュータビジョン、強化学習、科学のためのAIなど、ますます人気のある研究領域となっている。近年,クリーンな画像に事前学習したオフ・ザ・シェルフネットワークを用いて,学習自由損失に基づくガイダンスを提案する研究がいくつかある。このアプローチは、拡散誘導の無料ランチを提供するように見えるユニバーサル制御フォーマットのゼロショット条件生成を可能にする。本稿では,トレーニングフリーガイダンスの理解を深め,その限界を克服することを目的としている。我々は,学習自由指導を最適化の観点から支援する理論解析を行い,それを分類者に基づく(または分類者なし)指導と区別する。これらの欠点を解明するために, 学習自由指導が逆勾配の影響を受けやすいことを理論的に証明し, 分類器指導と比較して緩やかな収束率を示す。次に,その限界を克服するために,理論的理論的根拠と実証的証拠を伴って,一連の手法を導入する。画像と動きの生成実験により,これらの手法の有効性が確認された。

Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of training-free guidance, as well as overcome its limitations. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients and exhibits slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 量子オンサガー関係

Quantum Onsager relations ( http://arxiv.org/abs/2403.12896v3 )

ライセンス: Link先を確認

Mankei Tsang,

(参考訳) 量子情報幾何学を用いて、定常状態に近い開系の力学をモデル化するOnsagerレート方程式の量子一般化を導出する。一般化された方程式は、力のフレキシブルな定義と、エントロピー生成の従来の定義を超えて、統計的発散測度と量子フィッシャー情報化測度を包含する。また、オープン量子系に対する時間反転と詳細バランスの一般的な概念を提案して、輸送テンソルに対する量子オンサーガー-カシミール関係を導出する。結果は、統計力学とパラメータ推定理論の間に顕著な関連性を確立する。

Using quantum information geometry, I derive quantum generalizations of the Onsager rate equations, which model the dynamics of an open system near a steady state. The generalized equations hold for a flexible definition of the forces as well as a large class of statistical divergence measures and quantum-Fisher-information metrics beyond the conventional definition of entropy production. I also derive quantum Onsager-Casimir relations for the transport tensors by proposing a general concept of time reversal and detailed balance for open quantum systems. The results establish a remarkable connection between statistical mechanics and parameter estimation theory.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# LLM埋め込みによるテキストクラスタリング

Text clustering with LLM embeddings ( http://arxiv.org/abs/2403.15112v2 )

ライセンス: Link先を確認

Alina Petukhova, Joao P. Matos-Carvalho, Nuno Fachada,

(参考訳) テキストクラスタリングは、デジタルコンテンツの増加を組織化する上で重要なアプローチであり、分類されていないデータに隠されたパターンを構造化し見つけるのに役立つ。本研究では,大規模言語モデル(LLM)におけるテキスト埋め込みとクラスタリングアルゴリズムの違いが,テキストデータセットのクラスタリングに与える影響について検討した。組込みがクラスタリング結果にどのように影響するか, 要約による次元還元による役割, 組込みサイズ調整について, 一連の実験を行った。その結果、LLM埋め込みは構造化言語のニュアンスを捉えるのに優れており、BERTは性能において軽量な選択肢を導いていることがわかった。さらに,組込み次元の増大や要約手法はクラスタリング効率を均一に向上させるものではないことが判明し,これらの手法が実生活モデルで使用するためには慎重な分析が必要であることが示唆された。これらの結果は、テキストクラスタリングアプリケーションにおいて、ニュアンス付きテキスト表現の必要性と計算可能性との複雑なバランスを浮き彫りにする。本研究は, 従来のテキストクラスタリングフレームワークを拡張し, LLMからの埋め込みを組み込むことで, 方法論改善の道を切り開くとともに, 各種テキスト解析における新たな手法を開拓する。

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. In this research, we investigated how different textual embeddings - particularly those used in large language models (LLMs) - and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and embedding size adjustment. Results reveal that LLM embeddings excel at capturing the nuances of structured language, while BERT leads the lightweight options in performance. In addition, we find that increasing embedding dimensionality and summarisation techniques do not uniformly improve clustering efficiency, suggesting that these strategies require careful analysis to use in real-life models. These results highlight a complex balance between the need for nuanced text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, thereby paving the way for improved methodologies and opening new avenues for future research in various types of textual analysis.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# MSCoTDet:マルチスペクトルペデストリアン検出のための言語駆動型マルチモーダルフュージョン

MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection ( http://arxiv.org/abs/2403.15209v2 )

ライセンス: Link先を確認

Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro,

(参考訳) RGBと熱モダリティの相補的な情報により, マルチスペクトル歩行者検出は, 概日適用にとって魅力的である。しかしながら、現在のモデルは、特に統計的に偏ったデータセットから得られたモダリティバイアスのために、特定のケース(例えば、熱障害のある歩行者)で歩行者を検出することができないことが多い。本稿では,Large Language Models (LLMs) を用いた多スペクトル歩行者検出におけるモダリティバイアスの緩和について検討する。そこで我々は,マルチスペクトル・チェーン・オブ・ソート(MSCoT)のプロンプト戦略を設計し,LLMがマルチスペクトル歩行者検出を行うように促す。さらに,MSCoTプロンプトをマルチスペクトル歩行者検出に統合するMSCoTDet(Multispectral Chain-of-Thought Detection)フレームワークを提案する。この目的のために我々は,MSCoTの出力を融合させる言語駆動型マルチモーダルフュージョン (LMF) 戦略を設計し,視覚に基づくマルチスペクトル歩行者検出モデルの検出結果に即した。大規模な実験により、MSCoTDetはモダリティバイアスを効果的に軽減し、多スペクトル歩行者検出を改善することが検証された。

Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in certain cases (e.g., thermal-obscured pedestrians), particularly due to the modality bias learned from statistically biased datasets. In this paper, we investigate how to mitigate modality bias in multispectral pedestrian detection using Large Language Models (LLMs). Accordingly, we design a Multispectral Chain-of-Thought (MSCoT) prompting strategy, which prompts the LLM to perform multispectral pedestrian detection. Moreover, we propose a novel Multispectral Chain-of-Thought Detection (MSCoTDet) framework that integrates MSCoT prompting into multispectral pedestrian detection. To this end, we design a Language-driven Multi-modal Fusion (LMF) strategy that enables fusing the outputs of MSCoT prompting with the detection results of vision-based multispectral pedestrian detection models. Extensive experiments validate that MSCoTDet effectively mitigates modality biases and improves multispectral pedestrian detection.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 量子/古典系と量子軌道に対するマルコフ力学

Markovian dynamics for a quantum/classical system and quantum trajectories ( http://arxiv.org/abs/2403.16065v2 )

ライセンス: Link先を確認

Alberto Barchielli,

(参考訳) 量子軌道法は、数値計算の出発点としてオープンシステム理論において使われ、連続した時間で量子システムのモニタリングを記述するために用いられる。ここでは、この手法を拡張して、量子/古典ハイブリッドシステムのダイナミクスに対する一般的なアプローチを開発する。 2つの結合確率微分方程式を用いることで、古典的成分と、それぞれ固有の力学を持ち、互いに相互作用する量子的成分を記述することができる。数学的に厳密な構成は、マルコフの合同力学を持ち、量子成分のヒルベルト空間上の有界作用素のみを含むという制限の下で与えられる。重要な特徴は、相互作用が量子成分から古典成分への情報のフローを許容するならば、必然的に力学は散逸的であることである。また、この理論は、純粋に量子の場合において量子力学半群に還元され、純粋に古典的な場合においてリウヴィル方程式とコルモゴロフ-フォッカー-プランク方程式を含む適切なハイブリッド力学半群とどのように結びついているかを示す。さらに、この半群は、提案された確率力学をハイブリッドマスター方程式に基づく他の様々な提案と比較することができる。いくつかの単純な例は、説明できる様々な物理的な振る舞いを示すために構築されており、特に隠れ絡みを示すモデルが導入されている。

Quantum trajectory techniques have been used in the theory of open systems as a starting point for numerical computations and to describe the monitoring of a quantum system in continuous time. Here we extend this technique and use it to develop a general approach to the dynamics of quantum/classical hybrid systems. By using two coupled stochastic differential equations, we can describe a classical component and a quantum one which have their own intrinsic dynamics and which interact with each other. A mathematically rigorous construction is given, under the restriction of having a Markovian joint dynamics and of involving only bounded operators on the Hilbert space of the quantum component. An important feature is that, if the interaction allows for a flow of information from the quantum component to the classical one, necessarily the dynamics is dissipative. We show also how this theory is connected to a suitable hybrid dynamical semigroup, which reduces to a quantum dynamical semigroup in the purely quantum case and includes Liouville and Kolmogorov-Fokker-Plank equations in the purely classical case. Moreover, this semigroup allows to compare the proposed stochastic dynamics with various other proposals based on hybrid master equations. Some simple examples are constructed in order to show the variety of physical behaviours which can be described; in particular, a model presenting hidden entanglement is introduced.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# SegICL:医療画像におけるセグメンテーション強化のためのマルチモーダルインコンテキスト学習フレームワーク

SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging ( http://arxiv.org/abs/2403.16578v3 )

ライセンス: Link先を確認

Lingdong Shen, Fangxin Shang, Xiaoshuang Huang, Yehui Yang, Haifeng Huang, Shiming Xiang,

(参考訳) 医用画像のセグメンテーションの分野では、アウト・オブ・ディストリビューション(OOD)のセグメンテーションタスクを費用対効果で扱うことが大きな課題である。ユニバーサルセグメンテーションモデル(Universal segmentation model)は、医療画像の様々なモダリティを一般化することを目的としたソリューションである。少ないショットの学習セグメンテーション法は、典型的にはデータの特定のモダリティのために設計されており、他のモダリティで使用するために直接転送することはできない。そこで我々は,画像セグメンテーションにIn-Context Learning(ICL)を活用する新しいアプローチであるSegICLを紹介した。既存の方法とは異なり、SegICLはテキスト誘導セグメンテーションを採用し、小さなイメージマスクペアでコンテキスト内学習を行う機能を備えており、OODタスク(OODモダリティとデータセットを含む)のスクラッチや微調整からモデルをトレーニングする必要がなくなる。 OODタスクにおけるショット数とセグメンテーション性能の正の相関を示す。ショット供給時のセグメンテーション性能はゼロショット設定時の性能の約1.5倍である。これは、SegICLがコンテキスト情報に基づく新しいセグメンテーションタスクに効果的に対処していることを示している。さらに、SegICLはOODおよび分散タスクのメインストリームモデルに匹敵するパフォーマンスを示す。私たちのコードは、論文レビューの後にリリースされます。

In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models is a solution, which aim to generalize across the diverse modality of medical images, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL has the capability to employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experimental demonstrates a positive correlation between the number of shots and segmentation performance on OOD tasks. The performance of segmentation when provided thre-shots is approximately 1.5 times better than the performance in a zero-shot setting. This indicates that SegICL effectively address new segmentation tasks based on contextual information. Additionally, SegICL also exhibits comparable performance to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 都市VLP:都市域プロファイリングのためのマルチグラニュラリティビジョンランゲージ準備

UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Region Profiling ( http://arxiv.org/abs/2403.16831v2 )

ライセンス: Link先を確認

Xixuan Hao, Wei Chen, Yibo Yan, Siru Zhong, Kun Wang, Qingsong Wen, Yuxuan Liang,

(参考訳) 都市域のプロファイリングは、人口動態、インフラ、経済活動などの特徴を保存しつつ、都市の低次元的な表現を学習することを目的としている。しかし、一般的な事前訓練されたモデル、特に衛星画像に依存しているモデルは、二重課題に直面している。第一に、衛星データからマクロレベルのパターンのみに集中させることは、ある場所での建築詳細などの微妙な詳細さの欠如による偏見をもたらす可能性があり、第二に、事前訓練されたモデルにおける解釈可能性の欠如は、都市計画の透明な証拠を提供する上での有用性を制限している。これらの問題に対処して、ビジョン・ランゲージ事前学習に基づくUrbanVLPという新しいフレームワークを考案した。我々のUrbanVLPは、マクロ(サテライト)レベルとマイクロ(ストリートビュー)レベルの複数粒度情報をシームレスに統合し、事前訓練されたモデルの制限を克服します。さらに、自動テキスト生成と校正を導入し、都市画像の高品質なテキスト記述を作成することにより、下流アプリケーションにおける解釈可能性を高める。 6つの都市指標予測タスクで実施された厳密な実験は、その優れた性能を示している。

Urban region profiling aims to learn a low-dimensional representation of a given urban area while preserving its characteristics, such as demographics, infrastructure, and economic activities, for urban planning and development. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place.Secondly, the lack of interpretability in pretrained models limits their utility in providing transparent evidence for urban planning. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro (street-view) levels, overcoming the limitations of prior pretrained models. Moreover, it introduces automatic text generation and calibration, elevating interpretability in downstream applications by producing high-quality text descriptions of urban imagery. Rigorous experiments conducted across six urban indicator prediction tasks underscore its superior performance.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 関係探索による大規模言語モデルを用いた曖昧なエンティティマッチング

Disambiguate Entity Matching using Large Language Models through Relation Discovery ( http://arxiv.org/abs/2403.17344v2 )

ライセンス: Link先を確認

Zezhou Huang,

(参考訳) エンティティマッチングは、ファジィ結合や重複解消といったタスクの中心にある、データ統合とクリーニングにおいて重要な課題である。従来のアプローチでは、編集距離やJaccardの類似性、最近では、GPTのような大規模言語モデル(LLM)の進歩を含む組み込みやディープニューラルネットワークなど、ファジィな項表現の克服に重点を置いてきた。しかし、エンティティマッチングにおける中核的な課題は、特に外部データベースとの統合において「マッチ」を構成するものを定義することの曖昧さにまで及んでいる。この曖昧さは、実体間の詳細と粒度の異なるレベルから生じ、正確な一致を複雑にする。本稿では,意味的類似性を純粋に識別するアプローチから,マッチングにおけるあいまいさの解消に不可欠なエンティティ間の「関係」を理解し定義するアプローチを提案する。本手法では,タスクに関連する一連の関係を事前に定義することにより,類似性のスペクトルをより効率的にナビゲートすることができる。

Entity matching is a critical challenge in data integration and cleaning, central to tasks like fuzzy joins and deduplication. Traditional approaches have focused on overcoming fuzzy term representations through methods such as edit distance, Jaccard similarity, and more recently, embeddings and deep neural networks, including advancements from large language models (LLMs) like GPT. However, the core challenge in entity matching extends beyond term fuzziness to the ambiguity in defining what constitutes a "match," especially when integrating with external databases. This ambiguity arises due to varying levels of detail and granularity among entities, complicating exact matches. We propose a novel approach that shifts focus from purely identifying semantic similarities to understanding and defining the "relations" between entities as crucial for resolving ambiguities in matching. By predefining a set of relations relevant to the task at hand, our method allows analysts to navigate the spectrum of similarity more effectively, from exact matches to conceptually related entities.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# マルチビュー特徴抽出のための量子加速クロスレグレッションアルゴリズム

Quantum accelerated cross regression algorithm for multiview feature extraction ( http://arxiv.org/abs/2403.17444v2 )

ライセンス: Link先を確認

Hai-Ling Liu, Ya-Qian Zhao, Ren-Gang Li, Xin Zhang,

(参考訳) マルチビュー特徴抽出(MvFE)は、機械学習、画像処理、その他の分野に広く応用されている。大規模高次元データを扱う場合、MvFEにより古典コンピュータの性能は深刻な問題に直面し、高価な行列計算を行う。この課題に対処するために、MvFEのための量子加速クロスレグレッションアルゴリズムを提案する。 1) MvFE の分野における量子コンピューティングのギャップを埋める MvFE の量子バージョンアルゴリズムを提案し、(2) 量子アルゴリズムは対象データ行列のブロックエンコーディングを構築するように設計され、ブロックエンコーディングフレームワークに基づく最適なハミルトンシミュレーション技術を使用して、対象データ行列の量子シミュレーションを効率的に実現することができる。提案手法は,アルゴリズムのシミュレーション誤差への依存を低減し,アルゴリズム性能を向上させる。(3)古典的アルゴリズムと比較して,提案アルゴリズムは,データ点数,データ点の次元,ビューデータ数において多項式加速度を有する。

Multi-view Feature Extraction (MvFE) has wide applications in machine learning, image processing and other fields. When dealing with massive high-dimensional data, the performance of classical computer faces severe challenges due to MvFE involves expensive matrix calculation. To address this challenge, a quantum-accelerated cross-regression algorithm for MvFE is proposed. The main contributions are as follows:(1) a quantum version algorithm for MvFE is proposed for the first time, filling the gap of quantum computing in the field of MvFE;(2) a quantum algorithm is designed to construct the block-encoding of the target data matrix, so that the optimal Hamiltonian simulation technology based on the block-encoding framework can be used to efficiently realize the quantum simulation of the target data matrix. This approach reduces the dependence of the algorithm's on simulation errors to enhance algorithm performance;(3) compared with the classical counterpart algorithm, the proposed quantum algorithm has a polynomial acceleration in the number of data points, the dimension of data points and the number of view data.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# マスクオートエンコーダはPDE学習者である

Masked Autoencoders are PDE Learners ( http://arxiv.org/abs/2403.17728v2 )

ライセンス: Link先を確認

Anthony Zhou, Amir Barati Farimani,

(参考訳) 偏微分方程式(PDE)に対するニューラルソルバは、高速で正確な物理解を生成する大きな可能性を持っているが、その実用性は、その一般化性によって制限されている。 PDEは幅広いスケールで進化し、様々な振る舞いを示す。これらの現象を予測するには、様々な係数、境界条件、解像度、方程式を含む様々な入力の学習表現が必要となる。一般化可能なPDEモデリングへのステップとして,物理問題に対するマスク付き事前学習を適用する。 PDEを横断する自己教師型学習によって、マスク付きオートエンコーダは異種物理学を統合し、意味のある潜在表現を学習し、この空間で潜在PDE算術を実行することができる。さらに,マスク付きプレトレーニングによりPDE係数の回帰とPDE特徴の分類が向上することが実証された。最後に、学習した潜在表現にニューラルソルバを条件付けすることで、様々な係数、離散化、境界条件、および目に見えないPDEにおけるタイムステッピングと超分解能のパフォーマンスを向上させることができる。マスク付きプレトレーニングは、大規模でラベルなし、異質なデータセットにまたがる統一的な方法として現れて、大規模に潜在物理学を学ぶことを願っている。

Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients, boundary conditions, resolutions, or even equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for physics problems. Through self-supervised learning across PDEs, masked autoencoders can consolidate heterogeneous physics to learn meaningful latent representations and perform latent PDE arithmetic in this space. Furthermore, we demonstrate that masked pretraining can improve PDE coefficient regression and the classification of PDE features. Lastly, conditioning neural solvers on learned latent representations can improve time-stepping and super-resolution performance across a variety of coefficients, discretizations, or boundary conditions, as well as on unseen PDEs. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.

翻訳日:2024-05-30 23:01:49 公開日:2024-05-29

# 単語マッチングを超える: 構文が機械翻訳における文脈内例選択を改善した

Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation ( http://arxiv.org/abs/2403.19285v2 )

ライセンス: Link先を確認

Chenming Tang, Zhixiang Wang, Yunfang Wu,

(参考訳) In-context Learning (ICL) は、大規模言語モデル (LLM) の時代において、あるタスクに対して LLM のパワーを誘発するいくつかの例が示される、流行の促進戦略である。情報のある例をどうやって選ぶかは、未解決の問題である。機械翻訳(MT)のテキスト内サンプル選択は、構文レベルの深い知識を無視しつつ、表面的な単語レベルの特徴に重点を置いている。本稿では,ポリノミアル距離を用いた依存関係木間の構文的類似性を計算し,構文に基づくMTの例選択手法を提案する。さらに,単語レベルと構文レベルの両方の基準で選択された例を組み合わせたアンサンブル戦略を提案する。英語と6の共通言語による実験結果から,文法はMTのICLを効果的に向上し,12の翻訳方向のうち11のCOMETスコアが最も高い。

In-context learning (ICL) is the trending prompting strategy in the era of large language models (LLMs), where a few examples are demonstrated to evoke LLMs' power for a given task. How to select informative examples remains an open issue. Previous works on in-context example selection for machine translation (MT) focus on superficial word-level features while ignoring deep syntax-level knowledge. In this paper, we propose a syntax-based in-context example selection method for MT, by computing the syntactic similarity between dependency trees using Polynomial Distance. In addition, we propose an ensemble strategy combining examples selected by both word-level and syntax-level criteria. Experimental results between English and 6 common languages indicate that syntax can effectively enhancing ICL for MT, obtaining the highest COMET scores on 11 out of 12 translation directions.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 前向きパスのみを用いたテスト時間モデル適応

Test-Time Model Adaptation with Only Forward Passes ( http://arxiv.org/abs/2404.01650v2 )

ライセンス: Link先を確認

Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao,

(参考訳) テストタイム適応は、トレーニング済みのモデルを、潜在的に分布シフトのある未確認テストサンプルに適応させるのに有効であることが証明されている。しかし、現実のシナリオでは、モデルは通常、リソース制限されたデバイス(例えばFPGA)にデプロイされ、しばしば量子化され、アクセラレーションのための非修飾パラメータでハードコードされる。既存のメソッドは、サポートされないかもしれないモデル更新の計算集約的なバックプロパゲーションに大きく依存しているため、多くの場合、実現不可能である。そこで本研究では,テスト時間フォワード最適化適応法(FOA)を提案する。 FOAでは、微分自由共分散行列適応進化戦略を用いて、新たに追加されたプロンプト(モデルの入力として)のみを学習する。この戦略をオンラインの教師なし環境下で安定的に動作させるため、テスト学習統計の不一致とモデル予測エントロピーを測定して、新しい適合度関数を考案する。さらに、シフトテストサンプルのモデルアクティベーションを直接調整し、ソーストレーニング領域と整合させ、適応性能をさらに向上させるアクティベーションシフト方式を設計する。 FOAはバックプロパゲーションやモデルウェイトを変更することなく、量子化された8ビットのViT上で動作し、32ビットの32ビットのViTでは勾配ベースのTENTより優れ、ImageNet-Cでは最大24倍のメモリ削減を実現している。

Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating that may be not supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to solely learn a newly added prompt (as model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function by measuring test-training statistic discrepancy and model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, making them align with the source training domain, thereby further enhancing adaptation performance. Without using any backpropagation and altering model weights, FOA runs on quantized 8-bit ViT outperforms gradient-based TENT on full-precision 32-bit ViT, while achieving an up to 24-fold memory reduction on ImageNet-C.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 文脈内学習のデコンストラクタ:破壊によるプロンプト理解

Deconstructing In-Context Learning: Understanding Prompts via Corruption ( http://arxiv.org/abs/2404.02054v2 )

ライセンス: Link先を確認

Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky,

(参考訳) 大きな言語モデル(LLMs)から$``$learn in context$"$は、提供されたプロンプトに基づいて、その使用が爆発的に増加し、ChatGPT、Claude、BardといったAIアシスタントの普及につながった。これらのAIアシスタントは、人間のフィードバックを使用するアライメント技術によって、マイナーな迅速な修正に対して堅牢であることが知られている。対照的に、彼らがバックボーンとして使用する基礎となる事前訓練されたLSMは、この点において脆いことが知られている。高品質のバックボーンモデルの構築は依然として中心的な課題であり、その品質を評価するための一般的なアプローチは、ほとんどショット評価を行うことである。このような評価は、マイナーな迅速な修正に非常に敏感であることや、特定のインコンテキストの例を選択することで有名である。これまでの研究では、プロンプトの異なる要素の変更がモデルのパフォーマンスにどのように影響するかを調べてきた。しかし、これらの初期の研究は特定のプロンプト属性の限られた数に集中する傾向があり、しばしば矛盾する結果を生んだ。さらに、以前の研究では、パラメータが150億未満のモデルに焦点を当てたり、GPT-3やPaLMのようなブラックボックスモデルのみを精査し、複製を困難にしていた。本研究では,全プロンプトをタスク記述,デモインプット,ラベル,インラインインストラクションの4つのコンポーネントに分解する。これらの要素の構造的・意味的腐敗がモデル性能に及ぼす影響について検討する。分類と生成タスクをカバーする10のデータセットを用いて,1.5Bから70Bのモデルについて検討した。プロンプト内の繰り返しテキストはモデル性能を向上し、より大きなモデル($30B)はプロンプトのセマンティクスにより敏感であることがわかった。最後に、実演にタスクとインライン命令を追加することで、意味的に破損してもモデル性能が向上することが観察された。

The ability of large language models (LLMs) to $``$learn in context$"$ based on the provided prompt has led to an explosive growth in their use, culminating in the proliferation of AI assistants such as ChatGPT, Claude, and Bard. These AI assistants are known to be robust to minor prompt modifications, mostly due to alignment techniques that use human feedback. In contrast, the underlying pre-trained LLMs they use as a backbone are known to be brittle in this respect. Building high-quality backbone models remains a core challenge, and a common approach to assessing their quality is to conduct few-shot evaluation. Such evaluation is notorious for being highly sensitive to minor prompt modifications, as well as the choice of specific in-context examples. Prior work has examined how modifying different elements of the prompt can affect model performance. However, these earlier studies tended to concentrate on a limited number of specific prompt attributes and often produced contradictory results. Additionally, previous research either focused on models with fewer than 15 billion parameters or exclusively examined black-box models like GPT-3 or PaLM, making replication challenging. In the present study, we decompose the entire prompt into four components: task description, demonstration inputs, labels, and inline instructions provided for each demonstration. We investigate the effects of structural and semantic corruptions of these elements on model performance. We study models ranging from 1.5B to 70B in size, using ten datasets covering classification and generation tasks. We find that repeating text within the prompt boosts model performance, and bigger models ($\geq$30B) are more sensitive to the semantics of the prompt. Finally, we observe that adding task and inline instructions to the demonstrations enhances model performance even when the instructions are semantically corrupted.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# ポイントクラウド映像表現学習のためのPDEモデリングの検討

On Exploring PDE Modeling for Point Cloud Video Representation Learning ( http://arxiv.org/abs/2404.04720v2 )

ライセンス: Link先を確認

Zhuoxu Huang, Zhenkun Fan, Tao Xu, Jungong Han,

(参考訳) 複雑な構造と秩序のない空間配置のため、ポイントクラウドビデオ表現学習は困難である。従来の手法はフレーム・ツー・フレームの相関やポイント・ワイド対応追跡に苦慮している。近年、偏微分方程式(PDE)は、特定の制約の中で空間的時間的データ情報を均一に解く新しい視点を提供する。有形点対応の追跡は依然として困難であるが,PDE解決問題としてポイントクラウド映像表現学習の形式化を提案する。 PDEは時間とともに空間形状の変形を解くために使用される流体解析にインスパイアされ、時間的情報によって影響を受ける空間点の変動を解決するためにPDEを用いている。空間的時間的相関をモデル化することにより、時間的特徴と空間的変動を規則化し、ポイントクラウドビデオにおける表現学習を強化することを目指す。我々は、PointNetライクなエンコーダとPDE解決モジュールで構成されるMotion PointNetを紹介する。当初,空間変動の初期状態をモデル化する軽量で効果的なエンコーダを構築した。その後,PDE分解モジュールをパラメータ化潜在空間で開発し,ポイントクラウドビデオに固有の時空間相関に対処する。 PDEの解法は、特徴分布の変換において重要なコントラスト学習構造により導かれ、洗練され、ポイントクラウドビデオデータ内の特徴表現が最適化される。注目すべきは、Motion PointNetがMSRAction-3Dデータセットで97.52%という驚くべき精度を達成したことです。

Point cloud video representation learning is challenging due to complex structures and unordered spatial arrangement. Traditional methods struggle with frame-to-frame correlations and point-wise correspondence tracking. Recently, partial differential equations (PDE) have provided a new perspective in uniformly solving spatial-temporal data information within certain constraints. While tracking tangible point correspondence remains challenging, we propose to formalize point cloud video representation learning as a PDE-solving problem. Inspired by fluid analysis, where PDEs are used to solve the deformation of spatial shape over time, we employ PDE to solve the variations of spatial points affected by temporal information. By modeling spatial-temporal correlations, we aim to regularize spatial variations with temporal features, thereby enhancing representation learning in point cloud videos. We introduce Motion PointNet composed of a PointNet-like encoder and a PDE-solving module. Initially, we construct a lightweight yet effective encoder to model an initial state of the spatial variations. Subsequently, we develop our PDE-solving module in a parameterized latent space, tailored to address the spatio-temporal correlations inherent in point cloud video. The process of solving PDE is guided and refined by a contrastive learning structure, which is pivotal in reshaping the feature distribution, thereby optimizing the feature representation within point cloud video data. Remarkably, our Motion PointNet achieves an impressive accuracy of 97.52% on the MSRAction-3D dataset, surpassing the current state-of-the-art in all aspects while consuming minimal resources (only 0.72M parameters and 0.82G FLOPs).

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 2024年エヴァラティンのラテンパイプ:ラテンのモルフォシンタクティック分析

ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin ( http://arxiv.org/abs/2404.05839v2 )

ライセンス: Link先を確認

Milan Straka, Jana Straková, Federica Gamba,

(参考訳) 我々は、EvaLatin 2024 Dependency Parsing 共有タスクの受賞申請である LatinPipe を提示する。本システムでは, 基本および大型の事前学習型LMの微調整による連結と, 係り受け解析と形態解析の両方を共同で学習する形態学用ドット積アテンションヘッド, ソフトマックス分類ヘッドから構成される。これは、より統一されたアノテーションスタイルを達成するために、アノテーションの追加調和を利用して、利用可能な7つのラテンコーパスからのサンプリングによって訓練される。微調整の前に、凍結重量のあるいくつかの初期エポックでシステムを訓練する。また、Transformer(s)上にBiLSTMレイヤを積み重ねることで、局所的な相対的コンテキスト化も追加します。最後に、7つのランダムにインスタンス化されたネットワークから出力された確率分布を最終提出のためにアンサンブルする。コードはhttps://github.com/ufal/evalatin2024-latinpipeで公開されている。

We present LatinPipe, the winning submission to the EvaLatin 2024 Dependency Parsing shared task. Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology to jointly learn both dependency parsing and morphological analysis. It is trained by sampling from seven publicly available Latin corpora, utilizing additional harmonization of annotations to achieve a more unified annotation style. Before fine-tuning, we train the system for a few initial epochs with frozen weights. We also add additional local relative contextualization by stacking the BiLSTM layers on top of the Transformer(s). Finally, we ensemble output probability distributions from seven randomly instantiated networks for the final submission. The code is available at https://github.com/ufal/evalatin2024-latinpipe.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# CoVoMix:人間のような多話者会話のためのゼロショット音声生成の改善

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations ( http://arxiv.org/abs/2404.06690v2 )

ライセンス: Link先を確認

Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng,

(参考訳) ゼロショット音声合成(TTS)モデリングの最近の進歩は、高忠実で多様な音声を生成するために大きな進歩をもたらした。しかし、対話生成は、音声における人間のような自然性を達成するとともに、引き続き課題である。本稿では,ゼロショット,ヒューマンライク,マルチスピーカ,マルチラウンド音声生成のための新しいモデルであるCoVoMix: Conversational Voice Mixture Generationを紹介する。 CoVoMixはまず対話テキストを個別のトークンの複数のストリームに変換する。これらのトークンストリームは、フローマッチングベースの音響モデルに入力され、混合メル-スペクトログラムを生成する。最後に、HiFi-GANモデルを用いて音声波形を生成する。さらに、対話モデリングと生成の有効性を測定するための総合的なメトリクスセットを考案する。実験の結果,CoVoMixは自然性やコヒーレンスにおいて人間に似た対話を生成できるだけでなく,複数の話者が複数ラウンドの会話を行うことができることがわかった。これは、ある話者の発話が他の話者の介在物や笑いとシームレスに混合される単一のチャンネルで生成された事例によって例示され、後者が注意深いリスナーとしての役割を示す。オーディオサンプルはhttps://aka.ms/covomix.comで入手できる。

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. This is exemplified by instances generated in a single channel where one speaker's utterance is seamlessly mixed with another's interjections or laughter, indicating the latter's role as an attentive listener. Audio samples are available at https://aka.ms/covomix.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 有限マルコフ長による混合状態量子相の安定性

Stability of mixed-state quantum phases via finite Markov length ( http://arxiv.org/abs/2404.07251v2 )

ライセンス: Link先を確認

Shengqi Sang, Timothy H. Hsieh,

(参考訳) ハミルトン基底状態の量子相の場合、エネルギーギャップは、ギャップが有限である限り、位相の安定性を保証する上で中心的な役割を果たす。混合状態相と遷移を特徴づける等しく重要な量として,量子条件相互情報(CMI)が指数関数的に減衰する長さ尺度であるマルコフ長を提案する。局所リンドブラディアンの下で進化する状態について、マルコフの長さが進化に沿って有限であるならば、それは同じ相のままであり、つまり、前者の進化を逆転できる別の準局所リンドブラディアン進化が存在するということである。この診断をデコヒーレンスに基づくトーリックコードに適用し,マルコフ長はデコヒーレンス遷移以外の至る所で有限であることを示す。この場合、CMIはランダム結合イジングモデルにおける点欠陥の自由エネルギーコストにマッピングできる。これは混合状態相転移が陰極性転移と一致することを示唆し、準局所復号チャネルも示唆している。

For quantum phases of Hamiltonian ground states, the energy gap plays a central role in ensuring the stability of the phase as long as the gap remains finite. We propose Markov length, the length scale at which the quantum conditional mutual information (CMI) decays exponentially, as an equally essential quantity characterizing mixed-state phases and transitions. For a state evolving under a local Lindbladian, we argue that if its Markov length remains finite along the evolution, then it remains in the same phase, meaning there exists another quasi-local Lindbladian evolution that can reverse the former one. We apply this diagnostic to toric code subject to decoherence and show that the Markov length is finite everywhere except at its decodability transition, at which it diverges. CMI in this case can be mapped to the free energy cost of point defects in the random bond Ising model. This implies that the mixed state phase transition coincides with the decodability transition and also suggests a quasi-local decoding channel.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# Sketch-Plan-Generalize:帰納的一般化可能な空間概念の連続的なFew-Shot学習

Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts ( http://arxiv.org/abs/2404.07774v2 )

ライセンス: Link先を確認

Namasivayam Kalithasan, Sachit Sachdeva, Himanshu Gaurav Singh, Vishal Bindal, Arnav Tuli, Gurarmaan Singh Panjeta, Divyanshu Aggarwal, Rohan Paul, Parag Singla,

(参考訳) 本研究の目的は,高層タワーの帰納的構成として,空間概念の帰納的一般化を学習できるようにすることである。人間の実演を前提として、観測されたインスタンスを説明するsuccinct ${ program}$表現を推論する学習アーキテクチャを模索する。さらに、このアプローチは、異なる大きさの新規構造や、以前に学習された概念の階層的な構成として表される複雑な構造に誘導的に一般化すべきである。事前訓練された大きな(視覚的な)言語モデルのコード生成機能と純粋にニューラルモデルを使用する既存のアプローチは、a-prioriが目にしない複雑な概念への一般化が不十分であることを示している。私たちのキーとなる洞察は、帰納的概念学習を要因とすることです。 (i)${\it Sketch:}$新しい概念の粗いシグネチャを検出して推測する (ii)${\it Plan:}$ MCTS search over grounded action sequences (iii)${\it Generalize:}$ordered planをインダクティブプログラムとして抽象化する。私たちのパイプラインは、一般化とモジュラーの再利用を促進し、継続的な概念学習を可能にします。提案手法は,大規模言語モデル(LLM)のコード生成能力と基底的ニューラルネットワーク表現の利点を組み合わせることで,LLMとニューラルオンリーのアプローチに関連する複雑な構造を構築するタスクにおいて,より強力な帰納的一般化を示すニューラルシンボリックプログラムを実現する。さらに、後続の具体化指導のための学習概念を用いた推論と計画能力を示す。

Our goal is to enable embodied agents to learn inductively generalizable spatial concepts, e.g., learning staircase as an inductive composition of towers of increasing height. Given a human demonstration, we seek a learning architecture that infers a succinct ${program}$ representation that explains the observed instance. Additionally, the approach should generalize inductively to novel structures of different sizes or complex structures expressed as a hierarchical composition of previously learned concepts. Existing approaches that use code generation capabilities of pre-trained large (visual) language models, as well as purely neural models, show poor generalization to a-priori unseen complex concepts. Our key insight is to factor inductive concept learning as (i) ${\it Sketch:}$ detecting and inferring a coarse signature of a new concept (ii) ${\it Plan:}$ performing MCTS search over grounded action sequences (iii) ${\it Generalize:}$ abstracting out grounded plans as inductive programs. Our pipeline facilitates generalization and modular reuse, enabling continual concept learning. Our approach combines the benefits of the code generation ability of large language models (LLM) along with grounded neural representations, resulting in neuro-symbolic programs that show stronger inductive generalization on the task of constructing complex structures in relation to LLM-only and neural-only approaches. Furthermore, we demonstrate reasoning and planning capabilities with learned concepts for embodied instruction following.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 検索クエリのセマンティックなドメイン内商品識別

Semantic In-Domain Product Identification for Search Queries ( http://arxiv.org/abs/2404.09091v2 )

ライセンス: Link先を確認

Sanat Sharma, Jayant Kumar, Twisha Naik, Zhaoyu Lu, Arvind Srikantan, Tracy Holloway King,

(参考訳) 検索クエリにおける正確な製品識別と暗黙の製品識別は、特にAdobeのような50以上の製品を持ち、数百のツールにまたがるクエリをカバーしている企業において、ユーザーエクスペリエンスの向上に不可欠である。本研究では,ユーザ行動データから製品分類器を学習するための新しい手法を提案する。私たちのセマンティックモデルでは、デプロイされた表面におけるCTRの相対的な改善(クリックスルーレート)が25%以上、ヌルレートが50%以上減少し、アプリカードが2倍増加し、製品の可視性が向上しました。

Accurate explicit and implicit product identification in search queries is critical for enhancing user experiences, especially at a company like Adobe which has over 50 products and covers queries across hundreds of tools. In this work, we present a novel approach to training a product classifier from user behavioral data. Our semantic model led to >25% relative improvement in CTR (click through rate) across the deployed surfaces; a >50% decrease in null rate; a 2x increase in the app cards surfaced, which helps drive product visibility.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 効率的な量子力学のための排他的あるいはエンコードされた代数構造

Exclusive-or encoded algebraic structure for efficient quantum dynamics ( http://arxiv.org/abs/2404.09312v2 )

ライセンス: Link先を確認

Lukas Broers, Ludwig Mathey,

(参考訳) 本稿では,多体2レベル量子系の代数構造を捉える形式的手法を提案する。この形式主義は、対応するリー代数の元の列挙指標のバイナリ表現に基づいている。その代数の任意の大きな要素の作用は、ビット単位の排他的操作に還元される。この形式主義は自然に多体密度作用素のスパース表現を生成し、そのサイズは動的トランケーション法によって制御される。我々は、この形式主義がリアルタイム進化、消散的リンドブラッド作用、想像的時間進化、および射影的測定プロセスにどのように適用されるかを実証する。量子力学計算のこのアプローチは、密度演算子の非零成分の数と線形に近似する。この排他的あるいは表現的量子代数をORQAと呼ぶ。概念実証として、最大22の2レベルシステムに対する最大独立集合問題に対する量子アニール過程をシミュレートすることで、この形式性の数値的な実証を行う。

We propose a formalism that captures the algebraic structure of many-body two-level quantum systems, and directly motivates an efficient numerical method. This formalism is based on the binary representation of the enumeration-indices of the elements of the corresponding Lie algebra. The action of arbitrarily large elements of that algebra reduces to a few bit-wise exclusive-or operations. This formalism naturally produces sparse representations of many-body density operators, the size of which we control through a dynamic truncation method. We demonstrate how this formalism applies to real-time evolution, dissipative Lindblad action, imaginary-time evolution, and projective measurement processes. We find that this approach to calculating quantum dynamics scales close to linearly with the number of non-zero components in the density operator. We refer to this exclusive-or represented quantum algebra as ORQA. As a proof of concept, we provide a numerical demonstration of this formalism by simulating quantum annealing processes for the maximum independent set problem for up to 22 two-level systems.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# AudioProtoPNet:鳥音分類のための解釈可能なディープラーニングモデル

AudioProtoPNet: An interpretable deep learning model for bird sound classification ( http://arxiv.org/abs/2404.10420v2 )

ライセンス: Link先を確認

René Heinrich, Bernhard Sick, Christoph Scholz,

(参考訳) 近年、鳥類の多様性を監視するための深層学習モデルが提案されている。これらのモデルは音響信号を解析することにより高精度に鳥種を検出することができる。しかし、従来のディープラーニングアルゴリズムは、意思決定プロセスに関する洞察を提供するブラックボックスモデルである。鳥類学者のようなドメインの専門家にとって、これらのモデルは効率的であるだけでなく、補助ツールとして使われるために解釈可能であることが重要である。本研究では,そのモデルアーキテクチャによる固有解釈性を提供する音声分類に,Prototypeal Part Network (ProtoPNet) を適用した。本手法は,特徴抽出のためのConvNeXtバックボーンアーキテクチャに基づいて,訓練データのスペクトログラムを用いて各鳥類の原型パターンを学習する。新しいデータの分類は、これらのプロトタイプを潜在空間で比較することで行われ、同時にモデルの判断に対する理解しやすい説明を提供する。異なる地理的領域の鳥種を表す7つの異なるデータセットを用いて,本モデルの性能評価を行った。実験の結果, 平均AUROCは0.82, 平均cmAPは0.37となり, 鳥の音響分類における最先端のブラックボックスモデルに匹敵する結果を得た。そこで本研究は, 生物音響鳥類分類の困難な課題においても, 強力かつ解釈可能な深層学習モデルを開発して, ドメインの専門家に貴重な洞察を提供することを実証する。

Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaption of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 大規模言語モデルによる規範的要件の運用

Normative Requirements Operationalization with Large Language Models ( http://arxiv.org/abs/2404.12335v2 )

ライセンス: Link先を確認

Nick Feng, Lina Marsso, S. Getir Yaman, Isobel Standen, Yesugen Baatartogtokh, Reem Ayad, Victória Oldemburgo de Mello, Bev Townsend, Hanne Bartels, Ana Cavalcanti, Radu Calinescu, Marsha Chechik,

(参考訳) 規範的な非機能要件は、社会的、法的、倫理的、共感的、文化的規範の違反を避けるために、システムが観察しなければならない制約を規定する。これらの要件は一般的に、異なる専門知識や優先順位を持つ非技術者の利害関係者(倫理学者、弁護士、社会科学者など)によって定義されるため、その整合性と一貫性の確保は非常に困難である。近年の研究では、規則として規範的要件を規定するためにドメイン固有の言語を使用して、一貫性を形式的なメソッドで分析できるという課題に対処している。本稿では,システム機能の抽象表現間の意味的関係を抽出するために,大規模言語モデルを用いた補完的アプローチを提案する。これらの関係は、しばしば非技術的利害関係者(例えば、常識やドメイン知識に基づいて)によって暗黙的に仮定され、規範的要求の一貫性を引き出して分析するための自動推論技術を強化するために使用される。実世界のケーススタディを通じて,規範的要件の導出と運用へのアプローチの有効性を示す。

Normative non-functional requirements specify constraints that a system must observe in order to avoid violations of social, legal, ethical, empathetic, and cultural norms. As these requirements are typically defined by non-technical system stakeholders with different expertise and priorities (ethicists, lawyers, social scientists, etc.), ensuring their well-formedness and consistency is very challenging. Recent research has tackled this challenge using a domain-specific language to specify normative requirements as rules whose consistency can then be analysed with formal methods. In this paper, we propose a complementary approach that uses Large Language Models to extract semantic relationships between abstract representations of system capabilities. These relations, which are often assumed implicitly by non-technical stakeholders (e.g., based on common sense or domain knowledge), are then used to enrich the automated reasoning techniques for eliciting and analyzing the consistency of normative requirements. We show the effectiveness of our approach to normative requirements elicitation and operationalization through a range of real-world case studies.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# ドメイン固有の質問応答のための検索補助生成

Retrieval Augmented Generation for Domain-specific Question Answering ( http://arxiv.org/abs/2404.14760v2 )

ライセンス: Link先を確認

Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte,

(参考訳) 質問応答(QA)は,大規模言語モデルの高度開発において重要な応用となっている。質問応答のための一般的な訓練済みの大規模言語モデルは、金融、医療、教育、顧客サービスといった特定の分野の知識や用語を適切に理解するために訓練されていない。ドメイン固有の理解をより良くするために、私たちはAdobe製品のための社内質問回答システムを構築しました。本稿では,大規模問合せデータベースをコンパイルする新しいフレームワークを提案し,大規模言語モデルの検索対応微調整手法を開発した。我々は,レトリバーの微調整が最終世代に大きな改善をもたらすことを示す。我々の全体的なアプローチは、文脈的接地のための最新の検索情報を維持しながら、世代間の幻覚を減らす。

Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# GSM8K で >97% を達成する: 問題を深く理解することで LLM が数学語問題により良い解をもたらす

Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems ( http://arxiv.org/abs/2404.14963v3 )

ライセンス: Link先を確認

Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, Dacheng Tao,

(参考訳) CoT(Chain-of-Thought)のプロンプトにより、さまざまな推論タスクにわたるLLM(Large Language Models)のパフォーマンスが向上した。しかし、CoTは複雑な数学用語の問題を扱うのに不足しており、通常、意味的誤解エラー、計算エラー、ステップミスエラーという3つの落とし穴に悩まされる。従来の研究では、計算エラーとステップミスエラーに対処するが、LLMの性能を制限する主要な要因である意味的誤解の誤りを無視する。そこで本研究では,LLMの数学的問題解決能力を改善するために,意味的誤りに対処するシンプルな解法であるDeeply Understanding the Problems (DUP)を提案する。提案手法の核心は, LLMが問題を深く理解し, より良い推論に使用する重要な問題解決情報を抽出することを奨励することである。 10種類の多変量推論ベンチマークによる大規模な実験により、我々のDUP法は、他の手法よりもずっと優れています。さらに奨励的に、DUPはGSM8Kベンチマークで新しいSOTA結果を達成し、精度は97.1%である。

Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing errors, but neglect the semantic misunderstanding errors, which is the major factor limiting the LLMs' performance. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms the other counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% under zero-shot setting.

翻訳日:2024-05-30 22:52:03 公開日:2024-05-29

# 医療産業における大規模言語モデル応用の評価に関する総合的研究

A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry ( http://arxiv.org/abs/2404.15777v4 )

ライセンス: Link先を確認

Yining Huang, Keke Tang, Meilian Chen, Boyuan Wang,

(参考訳) 2017年のTransformerアーキテクチャの開始以来、GPTやBERTのような大規模言語モデル(LLM)は大幅に進化し、言語理解と生成の高度な能力を持つ様々な産業に影響を与えた。これらのモデルは、医療分野を変革する可能性を示し、その効果的かつ倫理的な展開を保証するための特別な評価フレームワークの必要性を強調している。この包括的調査は、医療におけるLSMの広範な適用と必要な評価を概説し、医療の成果を高める上で、その能力を完全に活用するための実証的検証の重要性を強調した。本調査は,臨床環境,医療用テキストデータ処理,研究,教育,公衆衛生への意識といった分野におけるLCM応用の詳細な分析を行うために構成されている。まず,臨床診断,医用テキストデータ処理,情報検索,データ分析,教育コンテンツ生成などのタスクにおける評価結果に基づいて,様々な医療応用におけるLCMの役割を探求することから始める。その後のセクションでは、モデル、評価者、比較実験を含む、採用される評価方法とメトリクスについて包括的な議論がなされている。さらに,これらの評価に用いたベンチマークとデータセットについて検討し,質問応答,要約,情報抽出,バイオインフォマティクス,情報検索,総合ベンチマークなどのタスクのベンチマークを分類した記述を提供する。この構造は、医療領域におけるLSMの有効性、正確性、ユーザビリティ、倫理的整合性についてどのように評価されるか、徹底的に理解することを保証する。はぁ...。

Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in various medical applications, detailing their evaluation based on performance in tasks such as clinical diagnosis, medical text data processing, information retrieval, data analysis, and educational content generation. The subsequent sections offer a comprehensive discussion on the evaluation methods and metrics employed, including models, evaluators, and comparative experiments. We further examine the benchmarks and datasets utilized in these evaluations, providing a categorized description of benchmarks for tasks like question answering, summarization, information extraction, bioinformatics, information retrieval and general comprehensive benchmarks. This structure ensures a thorough understanding of how LLMs are assessed for their effectiveness, accuracy, usability, and ethical alignment in the medical domain. ...

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# REBEL:Regressing Relative Rewardsによる強化学習

REBEL: Reinforcement Learning via Regressing Relative Rewards ( http://arxiv.org/abs/2404.16767v2 )

ライセンス: Link先を確認

Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun,

(参考訳) 元々は連続的な制御問題のために開発されたが、PPO(Proximal Policy Optimization)は、生成モデルの微調整を含む様々な強化学習(RL)応用のワークホースとして登場した。残念ながら、PPOは安定収束を可能にするために複数のヒューリスティック(例えば、値ネットワーク、クリップ)を必要としており、これらのコンポーネントの正確な実装に敏感であることで有名である。これに対し、我々は後退して、生成モデルの時代における最小限のRLアルゴリズムがどのようなものになるのかを尋ねる。本稿では、ポリシー最適化の問題をきれいに軽減し、2つの完了間の相対報酬をプロンプトに回帰させ、極めて軽量な実装を可能にするアルゴリズムREBELを提案する。理論的には、自然ポリシーグラディエントのような基本的RLアルゴリズムはREBELの変種と見なせることが証明され、RLの文献における収束とサンプルの複雑さの観点から最も強力な理論的保証と一致させることができる。 REBELはまた、オフラインデータをきれいに組み込んで、実際によく見られる非推移的な好みを扱うように拡張することもできる。経験的に、REBELは言語モデリングと画像生成に統一的なアプローチを提供し、PPOやDPOに近い性能で、PPOよりも実装が簡単で、計算効率が良い。 Llama-3-8B-インストラクションを微調整すると、REBELはAlpacaEval 2.0、MT-Bench、Open LLM Leaderboardで高いパフォーマンスを達成した。

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative reward between two completions to a prompt in terms of the policy, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and be extended to handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally efficient than PPO. When fine-tuning Llama-3-8B-Instruct, REBEL achieves strong performance in AlpacaEval 2.0, MT-Bench, and Open LLM Leaderboard.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# ブロックチェーンを使用したパブリッククラウドにおけるデータ共有の緩和

Mitigating Data Sharing in Public Cloud using Blockchain ( http://arxiv.org/abs/2404.16872v2 )

ライセンス: Link先を確認

Patil Pratik Vijaykumar, Prerna Tulsiani, Dr. Sunil Mane,

(参考訳) パブリック・クラウド・コンピューティングは、ビジネスの運営方法に変化をもたらしたため、現代のITインフラの基本的な部分となっている。しかし、クラウドセキュリティの懸念は、データ保護、共有、アクセス制御に関連する新たなリスクと課題をもたらす。ブロックチェーンとクラウドのシナジスティックな統合は、大きな可能性を秘めている。ブロックチェーンの分散台帳は、中央集権的な権威への依存を減らすため、透明性、不変性、効率性を保証する。これを受けて、当社のフレームワークは、データ権利、データ共有、データバリデーションといった重要な側面を備えた、クラウド内のセキュアなデータエコシステムを提案しています。また、このアプローチはデータマイグレーションの必要性をなくすことで、相互運用性とスケーラビリティを向上させることを目指している。これにより、既存のパブリッククラウドベースのシステムが、信頼性の強化とクラウドデータの非再検討を容易にブロックチェーンをデプロイできるようになる。

Public Cloud Computing has become a fundamental part of modern IT infrastructure as its adoption has transformed the way businesses operate. However, cloud security concerns introduce new risks and challenges related to data protection, sharing, and access control. A synergistic integration of blockchain with the cloud holds immense potential. Blockchain's distributed ledger ensures transparency, immutability, and efficiency as it reduces the reliance on centralized authorities. Motivated by this, our framework proposes a secure data ecosystem in the cloud with the key aspects being Data Rights, Data Sharing, and Data Validation. Also, this approach aims to increase its interoperability and scalability by eliminating the need for data migration. This will ensure that existing public cloud-based systems can easily deploy blockchain enhancing trustworthiness and non-repudiation of cloud data.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# 会員推論攻撃に対するセンターベース緩和学習

Center-Based Relaxed Learning Against Membership Inference Attacks ( http://arxiv.org/abs/2404.17674v2 )

ライセンス: Link先を確認

Xingli Fang, Jung-Eun Kim,

(参考訳) メンバーシップ推論攻撃(MIA)は現在、主要なプライバシ攻撃戦略の1つと考えられており、その防御機構も広く検討されている。しかしながら、既存の防御アプローチと、パフォーマンスとデプロイメントコストの理想的なモデルとの間にはまだギャップがあります。特に,モデルのプライバシ脆弱性は,モデルのデータ記憶能力と一般化能力のギャップと密接に相関していることがわかった。そこで本研究では,任意の分類モデルに適応し,最小限あるいは不要なモデル一般化性を犠牲にすることで,プライバシ保護を提供する,CRL(Central-based relaxed learning)と呼ばれるアーキテクチャに依存しない新たな学習パラダイムを提案する。我々はCRLがメンバーデータと非メンバーデータの一貫性をよりよく維持できることを強調する。標準分類データセットに関する広範な実験を通じて、モデルキャパシティやデータコストを必要とせずに、このアプローチが同等のパフォーマンスを示すことを実証的に示す。

Membership inference attacks (MIAs) are currently considered one of the main privacy attack strategies, and their defense mechanisms have also been extensively explored. However, there is still a gap between the existing defense approaches and ideal models in performance and deployment costs. In particular, we observed that the privacy vulnerability of the model is closely correlated with the gap between the model's data-memorizing ability and generalization ability. To address this, we propose a new architecture-agnostic training paradigm called center-based relaxed learning (CRL), which is adaptive to any classification model and provides privacy preservation by sacrificing a minimal or no loss of model generalizability. We emphasize that CRL can better maintain the model's consistency between member and non-member data. Through extensive experiments on standard classification datasets, we empirically show that this approach exhibits comparable performance without requiring additional model capacity or data costs.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# GRAMMAR:ドメイン特化検索拡張言語モデルの評価のための基礎的およびモジュール的手法

GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model ( http://arxiv.org/abs/2404.19232v4 )

ライセンス: Link先を確認

Xinzhe Li, Ming Liu, Shang Gao,

(参考訳) Retrieval-augmented Generation (RAG) システムは、ドメイン固有の知識ベースを問うために、様々な産業で活発に研究され、展開されている。しかし、これらのシステムを評価することは、ドメイン固有のクエリの不足とそれに対応する基礎的な真実、そして障害の原因を診断するための体系的なアプローチの欠如など、ユニークな課題を示す。これらの課題に対処するために、GRAMMAR(GRounded and Modular Methodology for Assessment of RAG)という2つの要素からなる評価フレームワークを導入する。 1)リレーショナルデータベースとLLMを活用して,スケーラブルな問合せ対を効率的に生成するデータ生成プロセス。この方法では、言語的バリエーションからクエリロジックを分離し、デバッグ機能を増強する。 2)知識ギャップと堅牢性を区別し,欠陥モジュールの識別を可能にする評価フレームワーク。我々の経験的結果は、モデル脆弱性を正確に識別するために、現在の基準フリー評価手法の限界とGRAMMARの信頼性を裏付けるものである。

Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs. This method facilitates the separation of query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# TRAMBA: 携帯・ウェアラブルプラットフォーム上での音声・骨伝導音声の高分解能・高機能化のためのハイブリッドトランスフォーマとマンバアーキテクチャ

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms ( http://arxiv.org/abs/2405.01242v3 )

ライセンス: Link先を確認

Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia,

(参考訳) 本稿では,モバイルおよびウェアラブルプラットフォームに適した音響・骨伝導音声強調のためのハイブリッドトランスフォーマーTRAMBAとMambaアーキテクチャを提案する。骨伝導音声強調は、モバイルおよびウェアラブルプラットフォームで採用されるには、いくつかの理由から非現実的である。 i) データ収集は労働集約的であり,その結果,不足する。 (II)数百MBのメモリフットプリントを持つ最先端モデルと資源制約システムに適した手法の間には,性能ギャップが存在する。 TRAMBAを振動に基づくセンシングに適応させるため、広範に利用できる音声音声データセットを用いてTRAMBAを事前訓練する。そして、少量の骨伝導データで微調整を行う。 TRAMBAは、PESQが最大7.3%、STOIが1.8%、メモリフットプリントが桁違いに小さく、推論速度が最大465倍である。我々はTRAMBAを実システムに統合し、TRAMBAを示す i)データサンプリングや送信を少なくすることで、ウェアラブルのバッテリ寿命を最大160%向上させる。 (ii) 雑音の多い環境下では, 放送音声よりも高品質な音声を生成する。 (iii)メモリフットプリントは20.0MB未満である。

We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# ロバスト平均化による正規化Q-ラーニング

Regularized Q-learning through Robust Averaging ( http://arxiv.org/abs/2405.02201v2 )

ライセンス: Link先を確認

Peter Schmitt-Förster, Tobias Sutter,

(参考訳) 本稿では,既存のQラーニング手法の弱点を原則的に解決する,2RA Qラーニングと呼ばれる新しいQラーニング変種を提案する。そのような弱点の1つは、制御できない、しばしばパフォーマンスが低下する、基礎となる推定バイアスである。本稿では,最大予測値項に対する分布的に頑健な推定器を提案し,提案した推定バイアスのレベルを正確に制御する。分布的に堅牢な推定器は、提案アルゴリズムがWatkinsのQ-learningに匹敵する計算コストを持つようなクローズドフォームの解を認めている。表の場合, 2RA Q-learning は最適方針に収束し, その漸近平均二乗誤差を解析する。最後に,理論的な知見を裏付ける様々な設定の数値実験を行い,既存の手法よりも2RA Q-learningが優れていることを示す。

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# ResNCT:CT尿中ネフロート相画像の深層学習モデル

ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography ( http://arxiv.org/abs/2405.04629v2 )

ライセンス: Link先を確認

Syed Jamal Safdar Gardezi, Lucas Aronson, Peter Wawrzyn, Hongkun Yu, E. Jason Abel, Daniel D. Shapiro, Meghan G. Lubner, Joshua Warner, Giuseppe Toia, Lu Mao, Pallavi Tiwari, Andrew L. Wentland,

(参考訳) 目的:CT urography(CTU)検査における腎画像合成のためのトランスフォーマーに基づく深層学習モデルの開発と評価を行う。資料と方法: この振り返り研究は地方機関審査委員会によって承認された。深層学習モデル開発のための3相CT尿路撮影を行った119例(平均SD年齢:65ドル:12歳:75/44男性/女性)のデータセットを作成した。各患者の3段階はアフィン登録アルゴリズムで一致した。ネフロート相CT画像合成(ResNCT)のための残留トランスフォーマモデル(Residual transformer model)を開発した。合成画像は、ピーク信号対雑音比(PSNR)、構造類似度指数(SSIM)、正規化クロス相関係数(NCC)、平均絶対誤差(MAE)、ルート平均二乗誤差(RMSE)など、複数の性能指標を用いて評価した。結果: ResNCTモデルは非コントラストおよび尿路画像入力から合成腎画像を生成することに成功した。地上の真相のネフローグラフィー画像では、モデルによって合成された画像は、高いPSNR (27.8$\pm$ 2.7 dB)、SSIM (0.88$\pm$ 0.05)、NAC (0.98$\pm$ 0.02)、低いMAE (0.02$\pm$ 0.005)、RMSE (0.042$\pm$ 0.016) を達成した。結論: ResNCT モデルにより, 地上の真理画像と高い類似性を有するネフロート相CT画像が合成された。 ResNCT モデルでは,CTU 試験において33% の放射線線量減少による腎症相の獲得を除去する手段を提供する。

Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, that were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# LangCell: 細胞アイデンティティ理解のためのLanguage-Cell事前トレーニング

LangCell: Language-Cell Pre-training for Cell Identity Understanding ( http://arxiv.org/abs/2405.06708v3 )

ライセンス: Link先を確認

Suyuan Zhao, Jiahuan Zhang, Yizhen Luo, Yushuai Wu, Zaiqing Nie,

(参考訳) 細胞識別は、細胞の種類、経路情報、疾患情報など、細胞の様々な意味的側面を包含しており、生物学者がその生物学的特性を理解するのに不可欠である。細胞型アノテートなどの転写学的データから細胞識別を理解することは、生体情報学において重要な課題となっている。これらのセマンティックな側面は人間の専門家によって決定されるため、単一セルとラベルペアによって提供される監視信号なしで、AIモデルが細胞アイデンティティ理解タスクを効果的に実行することは不可能である。このタスクに現在使用されているシングルセル事前訓練言語モデル(PLM)は、単一のモダリティ、トランスクリプトミクスデータのみに基づいて訓練され、セルアイデンティティの知識の理解が欠如している。結果として、望ましいセマンティックラベルでラベル付きデータを欠いている場合には、ダウンストリームタスクや苦労のために微調整される必要がある。この問題に対処するために,事前学習期間中に単一セルデータと自然言語の統一表現を構築し,セルアイデンティティに関連する洞察を直接組み込むという,革新的な手法を提案する。より具体的には、最初のLanguage-Cell事前トレーニングフレームワークであるLangCellを紹介します。 LangCellは、セルアイデンティティ情報に富んだテキストを利用して、クロスモーダルな知識の深い理解を得る。異なるベンチマークで実施された実験の結果、LangCellはゼロショットのセル識別理解シナリオで効果的に機能する唯一のシングルセルPLMであり、また、少数ショットと微調整のセル識別理解シナリオで既存のモデルよりも大幅に優れていることが示された。

Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, have become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# LLMで作ったブラックボックスの解説は、逆向きに役に立つ

LLM-Generated Black-box Explanations Can Be Adversarially Helpful ( http://arxiv.org/abs/2405.06800v2 )

ライセンス: Link先を確認

Rohan Ajwani, Shashidhar Reddy Javaji, Frank Rudzicz, Zining Zhu,

(参考訳) 大規模言語モデル(LLM)は,デジタルアシスタントとして機能することで,複雑な問題の解決と理解を支援する重要なツールになりつつある。 LLMは、これらの問題の入力と出力のみを与えられた場合、すなわち `black-box'' アプローチで、説得力のある説明を生成することができる。しかし、我々の研究はこのアプローチに結びついている隠れたリスクを明らかにし、それを*逆助力(adversarial helpness)*と呼ぶ。 LLMの説明が間違った答えを正しく見せると、これは起こります。本稿では,この問題が人間だけでなく,LLM評価者にも影響を及ぼすことを示す。より深く掘り下げて、LLMが採用する主要な説得戦略を特定し、検証する。以上の結果から,これらのモデルでは,質問の再フレーミング,信頼度の向上,ミスリードした回答を信頼できる光で表現するためのチェリーピッキングエビデンスなどの戦略が採用されていることが明らかとなった。 LLMが逆向きに有用な説明を生成する際に複雑な構造的知識をナビゲートできるかどうかを調べるため、グラフをナビゲートして特別なタスクを作成する。ほとんどのLSMは、単純なグラフに沿った代替経路を見つけることができず、それらの誤解を招く説明は複雑な知識を用いた論理的推論によってのみ生成されるものではないことを示唆している。これらの知見は,ブラックボックスの説明設定の限界に光を当て,LLMの安全利用に関するアドバイスを提供する。

Large Language Models (LLMs) are becoming vital tools that help us solve and understand complex problems by acting as digital assistants. LLMs can generate convincing explanations, even when only given the inputs and outputs of these problems, i.e., in a ``black-box'' approach. However, our research uncovers a hidden risk tied to this approach, which we call *adversarial helpfulness*. This happens when an LLM's explanations make a wrong answer look right, potentially leading people to trust incorrect solutions. In this paper, we show that this issue affects not just humans, but also LLM evaluators. Digging deeper, we identify and examine key persuasive strategies employed by LLMs. Our findings reveal that these models employ strategies such as reframing the questions, expressing an elevated level of confidence, and cherry-picking evidence to paint misleading answers in a credible light. To examine if LLMs are able to navigate complex-structured knowledge when generating adversarially helpful explanations, we create a special task based on navigating through graphs. Most LLMs are not able to find alternative paths along simple graphs, indicating that their misleading explanations aren't produced by only logical deductions using complex knowledge. These findings shed light on the limitations of the black-box explanation setting and allow us to provide advice on the safe usage of LLMs.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# 緩やかな非定常過程からの因果推論

Causal Inference from Slowly Varying Nonstationary Processes ( http://arxiv.org/abs/2405.06902v2 )

ライセンス: Link先を確認

Kang Du, Yu Xiang,

(参考訳) 制限構造因果モデル(SCM)フレームワークによる観測データからの因果推論は、非ガウス性や非線形性などのデータ生成機構による原因と効果の非対称性に大きく依存する。この手法は定常時系列に適応できるが、非定常時系列から因果関係を推定することは難しい課題である。本研究では,時間変化フィルタと定常雑音による制約付きSCMを新たに提案し,非定常性から非定常性への非対称性を利用して,二変量およびネットワーク設定の因果同定を行う。本稿では,2変量進化スペクトルの強力な推定値を利用して,ゆっくりと変化するプロセスに効率的な手順を提案する。提案手法の有効性を示すために,高次および非滑らかなフィルタを含む各種合成および実データセットの評価を行った。

Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# PeRFlow:Universal Plug-and-Play AcceleratorとしてのPiecewise Rectified Flow

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator ( http://arxiv.org/abs/2405.07510v3 )

ライセンス: Link先を確認

Hanshu Yan, Xingchao Liu, Jiachun Pan, Jun Hao Liew, Qiang Liu, Jiashi Feng,

(参考訳) 拡散モデルを高速化するフローベース手法であるPecewise Rectified Flow(PeRFlow)を提案する。 PeRFlowは、生成フローのサンプリングプロセスを複数の時間ウィンドウに分割し、リフロー操作を通じて各間隔の軌跡を直線化し、断片的な線形フローに近づく。 PeRFlowは数ステップの世代で優れたパフォーマンスを達成する。さらに、専用のパラメータ化を通じて、PeRFlowモデルは事前訓練された拡散モデルから知識を継承する。このように、トレーニングは高速に収束し、得られたモデルは、事前訓練された拡散モデルに基づいて様々なワークフローと互換性のある普遍的なプラグアンドプレイアクセラレータとして機能する、有利な転送能力を示す。トレーニングと推論のためのコードも公開されている。 https://github.com/magic-research/piecewise-rectified-flow

We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, the training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. Codes for training and inference are publicly released. https://github.com/magic-research/piecewise-rectified-flow

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# AnoVox: 自動運転におけるマルチモーダル異常検出ベンチマーク

AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving ( http://arxiv.org/abs/2405.07865v2 )

ライセンス: Link先を確認

Daniel Bogdoll, Iramm Hamdard, Lukas Namgyu Rößler, Felix Geisler, Muhammed Bayram, Felix Wang, Jan Imhof, Miguel de Campos, Anushervon Tabarov, Yitian Yang, Hanno Gottschalk, J. Marius Zöllner,

(参考訳) 自動運転車のスケールアップは、道路上のまれな物体のような異常に対処する能力に大きく依存している。このような状況に対処するためには、そもそも異常を検出する必要がある。自動走行の異常検出はここ数年で大きな進歩を遂げてきたが、カメラデータに強く焦点を絞った設計の悪いベンチマークに悩まされている。本研究では,自動運転におけるANOmaly検出のための最大のベンチマークであるAnoVoxを提案する。 AnoVoxは、大規模なマルチモーダルセンサーデータと空間的VOXel地上真実を組み込んでおり、使用済みセンサとは無関係な方法の比較を可能にしている。正規性の形式的定義を提案し,従順なトレーニングデータセットを提供する。 AnoVoxは、コンテンツと時間的異常の両方を含む最初のベンチマークである。

The scale-up of autonomous vehicles depends heavily on their ability to deal with anomalies, such as rare objects on the road. In order to handle such situations, it is necessary to detect anomalies in the first place. Anomaly detection for autonomous driving has made great progress in the past years but suffers from poorly designed benchmarks with a strong focus on camera data. In this work, we propose AnoVox, the largest benchmark for ANOmaly detection in autonomous driving to date. AnoVox incorporates large-scale multimodal sensor data and spatial VOXel ground truth, allowing for the comparison of methods independent of their used sensor. We propose a formal definition of normality and provide a compliant training dataset. AnoVox is the first benchmark to contain both content and temporal anomalies.

翻訳日:2024-05-30 22:42:17 公開日:2024-05-29

# オープンソース生成AIのリスクと機会

Risks and Opportunities of Open-Source Generative AI ( http://arxiv.org/abs/2405.08597v3 )

ライセンス: Link先を確認

Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster,

(参考訳) Generative AI(Gen AI)の応用は、科学や医学、教育など、さまざまな分野に革命をもたらすことが期待されている。こうした地震的な変化の可能性は、この技術の潜在的なリスクについて活発に議論を巻き起こし、特にAI開発をリードする大手テック企業からの厳しい規制を要求した。この規制は、オープンソースの生成AIの誕生する分野を危険にさらす可能性がある。 Gen AI開発のための3段階のフレームワーク(近、中、長期)を使用して、現在利用可能なもの(中、中)と、より大きな機能(長期)を備えたオープンソース生成AIモデルのリスクと機会を分析します。全体として、オープンソースのGen AIの利点は、そのリスクを上回っている、と私たちは主張する。そのため、我々は、モデル、トレーニング、評価データのオープンソース化を奨励し、オープンソースの生成AIに関連するリスクを管理するための一連の推奨とベストプラクティスを提供します。

Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 特徴融合ネットワークを用いた人・機械用スケーラブル画像符号化

Scalable Image Coding for Humans and Machines Using Feature Fusion Network ( http://arxiv.org/abs/2405.09152v3 )

ライセンス: Link先を確認

Takahiro Shindo, Taiju Watanabe, Yui Tatsumi, Hiroshi Watanabe,

(参考訳) 画像認識モデルがより普及するにつれて、機械や人間のスケーラブルなコーディング方法がより重要になる。画像認識モデルの応用例としては、交通監視と農業管理がある。これらのユースケースでは、スケーラブルな符号化手法が有効であることが証明される。人間や機械の既存の画像圧縮手法は、これらの要件をある程度満たしている。しかし,これらの圧縮法は特定の画像認識モデルにのみ有効である。本稿では,多数の画像認識モデルと互換性のある人や機械を対象とした,学習に基づくスケーラブルな画像符号化手法を提案する。我々は,機械用画像圧縮モデルと圧縮モデルを組み合わせて,人間の画像復号を容易にするための追加情報を提供する。これらの圧縮モデルの特徴は、効率的な画像圧縮を実現するために、特徴融合ネットワークを用いて融合される。本手法では,特徴融合ネットワークにおいて,異なるサイズの特徴の組み合わせを可能とし,パラメータ数を削減するために,付加的な情報圧縮モデルを調整する。提案手法では,パラメータ数を削減しつつ,画像圧縮モデルを効率よく組み合わせることを確認する。さらに、デコードされた画像の品質とビットレートの観点から画像圧縮性能を評価することにより、提案手法の有効性を実証する。

As image recognition models become more prevalent, scalable coding methods for machines and humans gain more importance. Applications of image recognition models include traffic monitoring and farm management. In these use cases, the scalable coding method proves effective because the tasks require occasional image checking by humans. Existing image compression methods for humans and machines meet these requirements to some extent. However, these compression methods are effective solely for specific image recognition models. We propose a learning-based scalable image coding method for humans and machines that is compatible with numerous image recognition models. We combine an image compression model for machines with a compression model, providing additional information to facilitate image decoding for humans. The features in these compression models are fused using a feature fusion network to achieve efficient image compression. Our method's additional information compression model is adjusted to reduce the number of parameters by enabling combinations of features of different sizes in the feature fusion network. Our approach confirms that the feature fusion network efficiently combines image compression models while reducing the number of parameters. Furthermore, we demonstrate the effectiveness of the proposed scalable coding method by evaluating the image compression performance in terms of decoded image quality and bitrate.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# スピン対称性と熱的利用密度汎関数理論

Spin Symmetry in Thermally-Assisted-Occupation Density Functional Theory ( http://arxiv.org/abs/2405.09187v2 )

ライセンス: Link先を確認

Yu-Yang Wang, Jeng-Da Chai,

(参考訳) マルチ参照(MR)特性を持つ電子系では、従来の交換相関(xc)エネルギー汎関数を持つコーン・シャム密度汎関数論(KS-DFT)は誤ったスピン密度と関連する性質をもたらす。例えば、H2 解離の場合、KS-DFT で機能する同じ xc エネルギーで得られるスピン制限およびスピン非制限の解は、スピン非制限の解における非物理的スピン対称性の破れ効果を、はっきりと異なるものにすることができる。近年, 熱共役密度汎関数理論 (TAO-DFT) は, 実測温度を適切に選択した場合に, 上記のスピン対称性の破れを解消することが示されている。本研究では, TAO-DFTに基づく応答理論を開発し, 十分に高温のTAO-DFTがMR系の非物理的スピン対称性の破れを解消できることを示した。さらに, H2, N2, He2, Ne2の解離, およびねじれたエチレンの解離に対して, 種々の架空の温度のTAO-DFT計算を行った。

For electronic systems with multi-reference (MR) character, Kohn-Sham density functional theory (KS-DFT) with the conventional exchange-correlation (xc) energy functionals can lead to incorrect spin densities and related properties. For example, for H2 dissociation, the spin-restricted and spin-unrestricted solutions obtained with the same xc energy functional in KS-DFT can be distinctly different, yielding the unphysical spin-symmetry breaking effects in the spin-unrestricted solutions. Recently, thermally-assisted-occupation density functional theory (TAO-DFT) has been shown to resolve the aforementioned spin-symmetry breaking, when the fictitious temperature is properly chosen. In this work, a response theory based on TAO-DFT is developed to demonstrate that TAO-DFT with a sufficiently large fictitious temperature can always resolve the unphysical spin-symmetry breaking in MR systems. To further support this, TAO-DFT calculations with various fictitious temperatures are performed for the dissociation of H2, N2, He2, and Ne2 as well as the twisted ethylene.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# ホップ代数からの一般化クラスター状態:非可逆対称性とホップテンソルネットワーク表現

Generalized cluster states from Hopf algebras: non-invertible symmetry and Hopf tensor network representation ( http://arxiv.org/abs/2405.09277v2 )

ライセンス: Link先を確認

Zhian Jia,

(参考訳) クラスタ状態は、測定ベースの量子計算(MBQC)にとって重要なリソースである。対称性保護トポロジカル秩序(SPT)を示すため、トポロジカルフェーズの研究にも重要な役割を果たしている。ホップ代数に基づくクラスター状態の構成について述べる。有限群値quditをホップ代数値quditに一般化し、ホップ代数の正則作用に基づく一般化されたパウリ-X作用素を導入し、ホップ代数上の既約表現作用に基づく一般化されたパウリ-Z作用素を導入することにより、ホップ量子の包括的理論を開発する。ホップ四重項に対して非可逆対称性が自然に現れることを示す。その後、クラスタグラフと呼ばれる二部グラフに対して、同一性状態と自明な表現状態はそれぞれ偶数頂点と奇数頂点に割り当てる。エッジアンタングルを制御された正規動作として導入し、ホップクラスター状態の一般的な構成を提供する。エッジエンタングルの可換性を確保するために,任意の三角形多様体に対してクラスタ格子を構築する手法を提案する。構築を説明する例として,1dクラスタ状態の例を例に挙げる。これはSPT相の有望な候補として機能するので、このシナリオのためにギャップ付きハミルトニアンを構築し、その非可逆対称性に関する詳細な議論を掘り下げる。また,1dクラスタ状態モデルが準1dホップ量子二重モデルと等価であることを示す。また、構造定数のテンソル表現とホップ代数の弦図形を統合することでホップクラスター状態のホップテンソルネットワーク表現を導入する。

Cluster states are crucial resources for measurement-based quantum computation (MBQC). It exhibits symmetry-protected topological (SPT) order, thus also playing a crucial role in studying topological phases. We present the construction of cluster states based on Hopf algebras. By generalizing the finite group valued qudit to a Hopf algebra valued qudit and introducing the generalized Pauli-X operator based on the regular action of the Hopf algebra, as well as the generalized Pauli-Z operator based on the irreducible representation action on the Hopf algebra, we develop a comprehensive theory of Hopf qudits. We demonstrate that non-invertible symmetry naturally emerges for Hopf qudits. Subsequently, for a bipartite graph termed the cluster graph, we assign the identity state and trivial representation state to even and odd vertices, respectively. Introducing the edge entangler as controlled regular action, we provide a general construction of Hopf cluster states. To ensure the commutativity of the edge entangler, we propose a method to construct a cluster lattice for any triangulable manifold. We use the 1d cluster state as an example to illustrate our construction. As this serves as a promising candidate for SPT phases, we construct the gapped Hamiltonian for this scenario and delve into a detailed discussion of its non-invertible symmetries. We also show that the 1d cluster state model is equivalent to the quasi-1d Hopf quantum double model. We also introduce the Hopf tensor network representation of Hopf cluster states by integrating the tensor representation of structure constants with the string diagrams of the Hopf algebra.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 短距離光学系における機械学習:包括的調査

Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey ( http://arxiv.org/abs/2405.09557v2 )

ライセンス: Link先を確認

Chen Shao, Elias Giacoumidis, Syed Moktacim Billah, Shi Li, Jialei Li, Prashasti Sahu, Andre Richter, Tobias Kaefer, Michael Faerber,

(参考訳) 近年,様々な直接検出・自己整合型短距離通信アプリケーションにおける機械学習アルゴリズムの利用について,広範な研究が進められている。これらのアプリケーションには、帯域幅要求予測、信号品質監視、障害検出、トラフィック予測、デジタル信号処理(DSP)に基づく等化など、幅広いタスクが含まれている。汎用的なアプローチとして、機械学習は、決定論的手法が不足する可能性のある光学系ネットワークにおける確率現象に対処する能力を示す。しかし、DSP等化アルゴリズムの場合、その性能改善はしばしば限界であり、特にパッシブ光ネットワーク(PON)のようなコストに敏感な短距離通信シナリオでは、その複雑さは著しく高い。時間的依存を捕捉し、不規則パターンや非線形パターンを効果的に処理し、変動時間間隔を調節する。本稿では,短距離通信における機械学習技術の応用について概説する。特に、機械学習信号処理に使用される時系列手法の新たな分類法を導入し、構造化された分類フレームワークを提供する。我々の分類学は、現在の時系列法を、伝統的な方法、フーリエ畳み込みに基づく方法、トランスフォーマーに基づくモデル、時系列畳み込みネットワークの4つのグループに分類する。最後に、この急速に発展する分野における今後の研究の方向性を強調し、ハードウェア実装に関連する複雑さを軽減するための具体的な解決策を概説する。我々は,複雑性問題に対処して,短時間の光通信システムにおいて,より実用的で効率的な機械学習アプローチの展開の道を開くことを目的としている。

In recent years, extensive research has been conducted to explore the utilization of machine learning algorithms in various direct-detected and self-coherent short-reach communication applications. These applications encompass a wide range of tasks, including bandwidth request prediction, signal quality monitoring, fault detection, traffic prediction, and digital signal processing (DSP)-based equalization. As a versatile approach, machine learning demonstrates the ability to address stochastic phenomena in optical systems networks where deterministic methods may fall short. However, when it comes to DSP equalization algorithms, their performance improvements are often marginal, and their complexity is prohibitively high, especially in cost-sensitive short-reach communications scenarios such as passive optical networks (PONs). They excel in capturing temporal dependencies, handling irregular or nonlinear patterns effectively, and accommodating variable time intervals. Within this extensive survey, we outline the application of machine learning techniques in short-reach communications, specifically emphasizing their utilization in high-bandwidth demanding PONs. Notably, we introduce a novel taxonomy for time-series methods employed in machine learning signal processing, providing a structured classification framework. Our taxonomy categorizes current time series methods into four distinct groups: traditional methods, Fourier convolution-based methods, transformer-based models, and time-series convolutional networks. Finally, we highlight prospective research directions within this rapidly evolving field and outline specific solutions to mitigate the complexity associated with hardware implementations. We aim to pave the way for more practical and efficient deployment of machine learning approaches in short-reach optical communication systems by addressing complexity concerns.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# ガウス状態に対するGKLSベクトル場ダイナミクス

GKLS Vector Field Dynamics for Gaussian States ( http://arxiv.org/abs/2405.10282v2 )

ライセンス: Link先を確認

Hans Cruz-Prado, Octavio Castaños, Giuseppe Marmo, Francisco Nettel,

(参考訳) ガウス状態によって記述された系に対するGKLS生成器に付随するベクトル場を構築する。このベクトル場は作用素の代数の双対空間上で定義され、位置と運動量の2次作用素に制限される。 GKLS動力学は分解原理、すなわち、このベクトル場を3つの部分、保守的ハミルトン成分、勾配的成分、チェ・クラウスベクトル場に分解できることを示した。最後の2つの用語は、散逸に関連する「摂動」と見なされている。散逸項の異なる調和振動子に対する例を示す。

We construct the vector field associated with the GKLS generator for systems described by Gaussian states. This vector field is defined on the dual space of the algebra of operators, restricted to operators quadratic in position and momentum. It is shown that the GKLS dynamics accepts a decomposition principle, that is, this vector field can be decomposed in three parts, a conservative Hamiltonian component, a gradient-like and a Choi-Krauss vector field. The last two terms are considered a "perturbation" associated with dissipation. Examples are presented for a harmonic oscillator with different dissipation terms.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 下次収束は弱凸関数に部分微分収束をもたらす:一様速度保証を伴う

Subgradient Convergence Implies Subdifferential Convergence on Weakly Convex Functions: With Uniform Rates Guarantees ( http://arxiv.org/abs/2405.10289v3 )

ライセンス: Link先を確認

Feng Ruan,

(参考訳) 非平滑で非凸確率最適化では、集団リスクにアプローチする際のサンプル平均推定値の定常点を解析するために、部分微分写像の均一収束を理解することが重要である。しかし、この収束を特徴づけることは依然として根本的な課題である。この研究は、経験的リスクが集団リスクに収束するにつれて、部分微分写像の均一収束と下次写像の均一収束を結びつけることによって、新しい視点を導入する。確率的弱凸対象に対しては、任意の開集合において、級数(対応する部分微分集合から任意に選択される)の収束に関する一様有界は、ハウスドルフ計量によって測られる部分微分集合自体の収束に関する一様有界となることを証明している。この手法を用いて,確率凸合成対象の偏微分集合に対する一様収束率を導出する。我々の結果は、Hausdorff計量において、集団と有限サンプル部分微分が連続である必要があるが、それでも厳密な収束速度を提供する、文学における主要な分布仮定に頼らない。これらの保証は、有限サンプル内のそのような目的の非滑らかな風景に対する新たな洞察をもたらす。

In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk as they approach the population risk. Yet, characterizing this convergence remains a fundamental challenge. This work introduces a novel perspective by connecting the uniform convergence of subdifferential mappings to that of subgradient mappings as empirical risk converges to the population risk. We prove that, for stochastic weakly-convex objectives, and within any open set, a uniform bound on the convergence of subgradients -- chosen arbitrarily from the corresponding subdifferential sets -- translates to a uniform bound on the convergence of the subdifferential sets itself, measured by the Hausdorff metric. Using this technique, we derive uniform convergence rates for subdifferential sets of stochastic convex-composite objectives. Our results do not rely on key distributional assumptions in the literature, which require the population and finite sample subdifferentials to be continuous in the Hausdorff metric, yet still provide tight convergence rates. These guarantees lead to new insights into the nonsmooth landscapes of such objectives within finite samples.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 放射線診断の自動化 : 最近の進歩を振り返って

Automated Radiology Report Generation: A Review of Recent Advances ( http://arxiv.org/abs/2405.10842v2 )

ライセンス: Link先を確認

Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi,

(参考訳) 医療画像部門の需要が高まる中、放射線技師がタイムリーで正確なレポートを配信する能力に負担がかかっている。人工知能の最近の技術進歩は、自動放射線学レポート生成(ARRG)に大きな可能性を示し、研究の爆発を引き起こした。本稿では,現代ARRG手法の方法論的考察を行う。 (i)可用性、サイズ、採用率などの特性に基づくデータセットの評価。二コントラスト学習、強化学習等の深層学習訓練方法を検討すること。 3) CNNとトランスフォーマーモデルのバリエーションを含む最先端のモデルアーキテクチャを探求すること。四マルチモーダル入力及び知識グラフによる臨床知識の統合に関するアウトライン技術及び (v) 一般的に適用されるNLP測定値や質的臨床評価を含む, 現行モデル評価手法の精査を行った。さらに、レビューされたモデルの定量的結果を分析し、トップパフォーマンスモデルを調べ、さらなる洞察を求める。最後に、潜在的な新しい方向が強調され、他の放射線学的モダリティから追加のデータセットが採用され、将来の発展の重要な領域として予測される評価方法が改善された。

Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# CC-GPX:Common Crawlによる高品質アノテート地理空間データの抽出

CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl ( http://arxiv.org/abs/2405.11039v2 )

ライセンス: Link先を確認

Ilya Ilyankou, Meihui Wang, James Haworth, Stefano Cavazzi,

(参考訳) Common Crawl (CC) コーパスは2008年以来9.5ペタバイト以上のデータを含む最大のオープンウェブクローリングデータセットである。データセットは、大規模な言語モデルのトレーニングに役立ち、(望ましくない)コンテンツのために研究され、より小さなドメイン固有のデータセットのために蒸留されている。しかし、我々の知る限りでは、注釈付き地理空間データの源としてCCを用いる研究は行われていない。本稿では,CC で発見された GPX ファイルから注釈付きユーザ生成トラックを抽出する効率的なパイプラインと,最新の CC リリース6 から,人文記述と MultiLineString ベクトルデータのペア化によるマルチモーダルデータセットを提案する。このデータセットは、人々のアウトドアアクティビティパターン、アウトドアエクスペリエンスについて話す方法、軌跡生成やアノテーションモデルの開発に使用することができる。再現可能なコードはGitHubで入手可能です。

The Common Crawl (CC) corpus is the largest open web crawl dataset containing 9.5+ petabytes of data captured since 2008. The dataset is instrumental in training large language models, and as such it has been studied for (un)desirable content, and distilled for smaller, domain-specific datasets. However, to our knowledge, no research has been dedicated to using CC as a source of annotated geospatial data. In this paper, we introduce an efficient pipeline to extract annotated user-generated tracks from GPX files found in CC, and the resulting multimodal dataset with 1,416 pairings of human-written descriptions and MultiLineString vector data from the 6 most recent CC releases. The dataset can be used to study people's outdoor activity patterns, the way people talk about their outdoor experiences, and for developing trajectory generation or track annotation models. Our reproducible code is available on GitHub: https://github.com/ilyankou/cc-gpx

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 効率的なRow-wise Attentionを用いた高分解能マルチビュー拡散

Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention ( http://arxiv.org/abs/2405.11616v2 )

ライセンス: Link先を確認

Peng Li, Yuan Liu, Xiaoxiao Long, Feihu Zhang, Cheng Lin, Mengfei Li, Xingqun Qi, Shanghang Zhang, Wenhan Luo, Ping Tan, Wenping Wang, Qifeng Liu, Yike Guo,

(参考訳) 本稿では,単一視点画像から高解像度のマルチビュー画像を生成する新しい多視点拡散法であるEra3Dを紹介する。マルチビュー生成の大幅な進歩にもかかわらず、既存の手法はカメラ前のミスマッチ、非効率性、解像度の低さに悩まされ、結果として画質の悪いマルチビュー画像となる。具体的には、入力画像は予め定義されたカメラタイプ、例えば焦点距離が一定である視点カメラに従わなければならないと仮定し、仮定が失敗すると歪んだ形状になる。さらに、それらが採用するフルイメージや高密度なマルチビューの注目は、画像解像度が増大するにつれて、計算複雑性の爆発的な爆発を引き起こす。仮定と現実のギャップを埋めるために、Era3Dはまず拡散型カメラ予測モジュールを提案し、入力画像の焦点長と高さを推定し、形状歪みのない画像を生成する。さらに,多視点拡散の先駆的先行を強制するために,行ワイドアテンションと呼ばれるシンプルだが効率的なアテンション層が用いられ,効率的なクロスビュー情報融合が実現されている。その結果、最先端の手法と比較して、Era3Dは最大512*512解像度の高品質なマルチビュー画像を生成し、計算複雑性を12倍に削減した。総合的な実験により、Era3Dは様々な単一ビューの入力画像から高品質で詳細な3Dメッシュを再構築でき、ベースラインのマルチビュー拡散法よりも大幅に優れていることが示された。プロジェクトページ: https://penghtyx.github.io/Era3D/。

In this paper, we introduce Era3D, a novel multiview diffusion method that generates high-resolution multiview images from a single-view image. Despite significant advancements in multiview generation, existing methods still suffer from camera prior mismatch, inefficacy, and low resolution, resulting in poor-quality multiview images. Specifically, these methods assume that the input images should comply with a predefined camera type, e.g. a perspective camera with a fixed focal length, leading to distorted shapes when the assumption fails. Moreover, the full-image or dense multiview attention they employ leads to an exponential explosion of computational complexity as image resolution increases, resulting in prohibitively expensive training costs. To bridge the gap between assumption and reality, Era3D first proposes a diffusion-based camera prediction module to estimate the focal length and elevation of the input image, which allows our method to generate images without shape distortions. Furthermore, a simple but efficient attention layer, named row-wise attention, is used to enforce epipolar priors in the multiview diffusion, facilitating efficient cross-view information fusion. Consequently, compared with state-of-the-art methods, Era3D generates high-quality multiview images with up to a 512*512 resolution while reducing computation complexity by 12x times. Comprehensive experiments demonstrate that Era3D can reconstruct high-quality and detailed 3D meshes from diverse single-view input images, significantly outperforming baseline multiview diffusion methods. Project page: https://penghtyx.github.io/Era3D/.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# Track Anything Rapter (TAR)

Track Anything Rapter(TAR) ( http://arxiv.org/abs/2405.11655v2 )

ライセンス: Link先を確認

Tharun V. Puthanveettil, Fnu Obaid ur Rahman,

(参考訳) 物体追跡はコンピュータビジョンにおける基本的なタスクであり、交通監視、ロボット工学、自律走行車追跡など、様々な領域にまたがる幅広い実用的応用がある。本研究の目的は,テキスト,画像,クリックなどのユーザが提供するマルチモーダルクエリに基づいて,関心のあるオブジェクトを検出し,セグメンテーションし,追跡することを目的とした,TAR(Track Anything Rapter)と呼ばれる高度な航空車両システムを開発することである。 TARは、DINO、CLIP、SAMといった最先端の事前訓練モデルを使用して、クエリされたオブジェクトの相対的なポーズを推定する。トラッキング問題はVisual Servoingタスクとしてアプローチされており、UAVは高度なモーションプランニングと制御アルゴリズムを通じてオブジェクトに一貫してフォーカスすることができる。我々は、これらの基礎モデルとカスタムの高レベル制御アルゴリズムの統合によって、カスタムビルドされたPX4 Autopilot対応のVoxl2 M500ドローンに、高度に安定して正確なトラッキングシステムを構築する方法を紹介する。追従アルゴリズムの性能を検証するために,Vicon ベースの基底真理と比較した。さらに,オクルージョンを含むシナリオにおける追跡支援における基礎モデルの信頼性を評価する。最後に、クリック、バウンディングボックス、イメージテンプレートなど、複数のモードでシームレスに機能するモデルの能力をテストし、検証する。

Object tracking is a fundamental task in computer vision with broad practical applications across various domains, including traffic monitoring, robotics, and autonomous vehicle tracking. In this project, we aim to develop a sophisticated aerial vehicle system known as Track Anything Rapter (TAR), designed to detect, segment, and track objects of interest based on user-provided multimodal queries, such as text, images, and clicks. TAR utilizes cutting-edge pre-trained models like DINO, CLIP, and SAM to estimate the relative pose of the queried object. The tracking problem is approached as a Visual Servoing task, enabling the UAV to consistently focus on the object through advanced motion planning and control algorithms. We showcase how the integration of these foundational models with a custom high-level control algorithm results in a highly stable and precise tracking system deployed on a custom-built PX4 Autopilot-enabled Voxl2 M500 drone. To validate the tracking algorithm's performance, we compare it against Vicon-based ground truth. Additionally, we evaluate the reliability of the foundational models in aiding tracking in scenarios involving occlusions. Finally, we test and validate the model's ability to work seamlessly with multiple modalities, such as click, bounding box, and image templates.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 3峡谷貯水池における地すべり感受性マッピングのための統計的・機械学習・深層学習モデルの解釈可能性

Interpretability of Statistical, Machine Learning, and Deep Learning Models for Landslide Susceptibility Mapping in Three Gorges Reservoir Area ( http://arxiv.org/abs/2405.11762v2 )

ライセンス: Link先を確認

Cheng Chen, Lei Fan,

(参考訳) 地すべり感受性マッピング(LSM)は,高リスク領域の特定と予防戦略の実施に不可欠である。本研究では,地すべりの感受性予測における統計的,機械学習(ML),深層学習(DL)モデルの解釈可能性について検討した。これは、地すべりに統計的に関係のある19の要因の包括的セットと、地すべりを誘発する直接に関連する9の要因の専用のセットの2種類の入力因子を組み込むことによって達成される。モデル性能がLSMの重要な指標であることを考えると、解釈可能性に関する調査は、考慮されたモデル間でのLSMの精度の評価と比較を自然に行ないます。本研究では、畳み込みニューラルネットワークモデルが最も精度が高く(19因子0.8447、0.8048、9因子 0.8048)、一方Extreme Gradient Boosting and Support Vector Machineは、従来の統計モデルよりも優れた予測能力を示した。これらの結果から,DLアルゴリズムと高度MLアルゴリズムは,入力要因と地すべりの発生との複雑な関係を効果的に捉えることができることがわかった。しかし、予測の解釈性は、特に19の要因のより広いセットを使用する場合、様々なモデルで異なっていた。 SHAP、LIME、DeepLIFTといった説明法も解釈結果のバリエーションをもたらしている。 19因子からなる包括的集合を用いることで予測精度は向上したが、モデル解釈における複雑さと矛盾が導入された。予測力は犠牲になったが、様々なモデルにまたがるより一貫した重要な要因によって証明され、フィールド調査レポートの調査結果と一致していたように、9つの要因の専用セットに焦点をあてることで解釈可能性を高めた。

Landslide susceptibility mapping (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set of 19 contributing factors that are statistically relevant to landslides, as well as a dedicated set of 9 triggering factors directly associated with triggering landslides. Given that model performance is a crucial metric in LSM, our investigations into interpretability naturally involve assessing and comparing LSM accuracy across different models considered. In our investigation, the convolutional neural network model achieved the highest accuracy (0.8447 with 19 factors; 0.8048 with 9 factors), while Extreme Gradient Boosting and Support Vector Machine also demonstrated strong predictive capabilities, outperforming conventional statistical models. These findings indicate that DL and sophisticated ML algorithms can effectively capture the complex relationships between input factors and landslide occurrence. However, the interpretability of predictions varied among different models, particularly when using the broader set of 19 contributing factors. Explanation methods like SHAP, LIME, and DeepLIFT also led to variations in interpretation results. Using a comprehensive set of 19 contributing factors improved prediction accuracy but introduced complexities and inconsistency in model interpretations. Focusing on a dedicated set of 9 triggering factors sacrificed some predictive power but enhanced interpretability, as evidenced by more consistent key factors identified across various models and alignment with the findings of field investigation reports....

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# データアノテーションの効率的・統計的品質推定法について

On Efficient and Statistical Quality Estimation for Data Annotation ( http://arxiv.org/abs/2405.11919v2 )

ライセンス: Link先を確認

Jan-Christoph Klie, Juan Haladjian, Marc Kirchner, Rahul Nair,

(参考訳) アノテーション付きデータセットは、教師付き機械学習モデルをトレーニング、評価、比較、生産化するための重要な要素である。したがって、アノテーションが高品質であることは必須である。彼らの創造のためには、優れた品質管理とそれによる信頼性の高い品質見積が必要である。そして、アノテーション処理中に品質が不十分な場合には、修正措置を講じて改善することができる。品質評価は、専門家が手動でインスタンスを正しくも正しくもラベル付けすることで行われることが多い。しかし、アノテーション付きのインスタンスをチェックするのはコストがかかる傾向にある。したがって、実際には、通常はサブセットのみを検査するが、大部分は正当化や統計的なパワーを考慮せずに選択され、多くの場合は比較的小さい。しかし、小さなサンプルサイズに基づく推定は、誤り率の不正確な値につながる可能性がある。不要な大規模なサンプルサイズの使用には、例えばアノテーションの追加など、もっと多くの費用がかかる可能性がある。そこで我々はまず,アノテーションの誤り率を推定するのに必要となる最小限のサンプルサイズを見つけるために,信頼区間の使い方を詳細に記述する。次に, 誤り率推定の代替として, 受入サンプリングを適用することで, 同じ統計的保証を提供しながら, 必要なサンプルサイズを最大50%削減できることを示す。

Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. Quality estimation is often performed by having experts manually label instances as correct or incorrect. But checking all annotated instances tends to be expensive. Therefore, in practice, usually only subsets are inspected; sizes are chosen mostly without justification or regard to statistical power and more often than not, are relatively small. Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate. Using unnecessarily large sample sizes costs money that could be better spent, for instance on more annotations. Therefore, we first describe in detail how to use confidence intervals for finding the minimal sample size needed to estimate the annotation error rate. Then, we propose applying acceptance sampling as an alternative to error rate estimation We show that acceptance sampling can reduce the required sample sizes up to 50% while providing the same statistical guarantees.

翻訳日:2024-05-30 22:32:31 公開日:2024-05-29

# 単一画像の学習:マルチモーダル大言語モデルにおける効率的な機械学習

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models ( http://arxiv.org/abs/2405.12523v2 )

ライセンス: Link先を確認

Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi,

(参考訳) 機械学習は、機械学習モデルにエンコードされたプライベートまたはセンシティブな情報を削除することによって、忘れられる権利を持つ個人に権限を与える。しかし、Multimodal Large Language Models (MLLM) にMUを効果的に適用できるかは、特にリークされた概念の視覚的データを忘れるシナリオにおいて不確実である。この課題を克服するために, 複数ステップで単一の画像を微調整することで, 概念の視覚的認識を解き放つための, SIU (Single Image Unlearning) を提案する。 SIUは2つの重要な側面から構成される。 i)多面的微調整データの構築。我々は,忘れられる概念の微調整データを構築するための4つの目標を導入する。 (二)共同訓練損失概念の視覚的認識を同期的に忘れ,MLLMの実用性を維持するために,Cross Entropy Lossと組み合わせた新しいDual Masked KL-divergence Lossを用いてMLLMを微調整する。本手法と並行して,MLLMにおけるMUの新しいベンチマークであるMMUBenchを確立し,その評価のためのメトリクスの集合を導入する。 MMUBench の実験結果から,SIU は既存手法の性能を大幅に上回っていることがわかった。さらに,SIUは侵入的メンバーシップ推論攻撃や脱獄攻撃を回避できることがわかった。私たちの知る限りでは、MLLMでMUを初めて探求しています。近い将来、コードとベンチマークをリリースします。

Machine unlearning empowers individuals with the `right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for few steps. SIU consists of two key aspects: (i) Constructing Multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Jointly training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods. Furthermore, we surprisingly find that SIU can avoid invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-29

# AIのタッチを見つける: LLM対応のスパンをテキストで識別する

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text ( http://arxiv.org/abs/2405.12689v2 )

ライセンス: Link先を確認

Yafu Li, Zhilin Wang, Leyang Cui, Wei Bi, Shuming Shi, Yue Zhang,

(参考訳) AI生成テキスト検出は、強力な言語モデルが人間レベルの生成に近づくにつれ、注目を集めている。限定的な作業は、(部分的には)AIパラフレーズテキストの検出に費やされている。しかし、AIパラフレーズは、テキストの洗練と多様性のための様々なアプリケーションシナリオで一般的に使用される。そこで本研究では,パラフレーズ付きテキストスパン検出(PTD)という新たな検出フレームワークを提案し,テキスト内のパラフレーズ付きテキストスパンを同定する。テキストレベルの検出とは異なり、PTDは全文を取り込み、各文にパラフレーズ度を示すスコアを割り当てる。パラフレーズ付きテキストスパン検出のための専用データセットであるPASTEDを構築した。 In-distriionとout-of-distriionの結果は、AIパラフレーズテキストスパンの同定におけるPTDモデルの有効性を示す。統計的およびモデル解析は、パラフレーズ付きテキストの周囲の文脈の重要な役割を説明する。広範な実験により、PTDモデルは多種多様なパラフレージングプロンプトと複数のパラフレージングテキストスパンに一般化できることが示されている。私たちはリソースをhttps://github.com/Linzwcs/PASTEDでリリースします。

AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Different from text-level detection, PTD takes in the full text and assigns each of the sentences with a score indicating the paraphrasing degree. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-29

# 検索広告戦略の最適化:強化広告ランキングと入札のための強化強化学習と一般第二価格オークションの統合

Optimizing Search Advertising Strategies: Integrating Reinforcement Learning with Generalized Second-Price Auctions for Enhanced Ad Ranking and Bidding ( http://arxiv.org/abs/2405.13381v2 )

ライセンス: Link先を確認

Chang Zhou, Yang Zhao, Jin Cao, Yi Shen, Xiaoling Cui, Chiyu Cheng,

(参考訳) 本稿では,Eコマースプラットフォームにおける広告ランキングと入札機構に着目し,検索広告における戦略的最適化手法の統合について検討する。強化学習と進化戦略の組み合わせを用いて,多様なユーザインタラクションに適応し,広告主コスト,ユーザ関連性,プラットフォーム収益のバランスを最適化する動的モデルを提案する。提案手法は,広告の配置精度とコスト効率を大幅に向上させ,実際のシナリオにおけるモデルの適用性を示すものである。

This paper explores the integration of strategic optimization methods in search advertising, focusing on ad ranking and bidding mechanisms within E-commerce platforms. By employing a combination of reinforcement learning and evolutionary strategies, we propose a dynamic model that adjusts to varying user interactions and optimizes the balance between advertiser cost, user relevance, and platform revenue. Our results suggest significant improvements in ad placement accuracy and cost efficiency, demonstrating the model's applicability in real-world scenarios.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-29

# 物理AIハイブリッドモデリングによる天気予報の微粒化

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling ( http://arxiv.org/abs/2405.13796v3 )

ライセンス: Link先を確認

Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai,

(参考訳) データ駆動人工知能(AI)モデルは、特に中距離や近距離での天気予報において大きな進歩を遂げている。しかし、ほとんどのデータ駆動の天気予報モデルは、時間次元の微細な物理的進化ではなく、データマッピングの学習に焦点を当てたブラックボックスシステムである。その結果、データセットの時間スケールの制限により、これらのモデルはより詳細な時間スケールでの予測を妨げている。本稿では,天気予報をトレーニングデータセットを超える細粒度テンポラルスケールに一般化する物理AIハイブリッドモデル(WeatherGFT)を提案する。具体的には、小さな時間スケール(例えば300秒)で物理進化をシミュレートするために慎重に設計されたPDEカーネルを使用し、学習可能なルータと並列ニューラルネットワークを用いてバイアス補正を行う。さらに、異なるリードタイムでのモデルの一般化を促進するためのリードタイムアウェアトレーニングフレームワークを導入する。物理AIモジュールの重み解析は、物理学が大きな進化をし、AIが適応的に修正を行うことを示している。大規模な実験により、WeatherGFTは時間単位のデータセットでトレーニングされ、複数のリードタイムで最先端のパフォーマンスを達成し、30分間の予測を一般化する能力を示している。

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets prevent these models from forecasting at finer time scales. This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales beyond training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale (e.g., 300 seconds) and use a parallel neural networks with a learnable router for bias correction. Furthermore, we introduce a lead time-aware training framework to promote the generalization of the model at different lead times. The weight analysis of physics-AI modules indicates that physics conducts major evolution while AI performs corrections adaptively. Extensive experiments show that WeatherGFT trained on an hourly dataset, achieves state-of-the-art performance across multiple lead times and exhibits the capability to generalize 30-minute forecasts.

翻訳日:2024-05-30 22:22:47 公開日:2024-05-29

# マルチモーダル大言語モデルにおける視覚的推論補充のための投機的プロンプト

Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models ( http://arxiv.org/abs/2405.13872v2 )

ライセンス: Link先を確認

Qiji Zhou, Ruochen Zhou, Zike Hu, Panzhong Lu, Siyang Gao, Yue Zhang,

(参考訳) CoT(Chain-of-Thought)と関連する合理性に基づく研究の最近の進歩は、複雑な推論タスクにおけるLarge Language Models(LLM)の性能を大幅に向上させた。 MLLM(Multimodal Large Language Models)の進化に伴い、複雑なマルチモーダル推論問題に対処する能力の向上が重要なフロンティアとなっている。しかし、CoTにマルチモーダルな論理を組み込むことは、まだ十分には研究されていない。本稿では,MLLMの視覚的合理性を段階的に抽出する,IoT(Image-of-Thought)プロンプト手法を提案する。具体的には、IoTプロンプトは入力画像と質問に基づいて重要な視覚情報抽出操作を自動的に設計することができる。視覚情報リファインメントの各ステップは、複雑な視覚的推論問題に対する回答をサポートする特定の視覚的理性を特定する。テキストCoT以外にも、IoTは視覚的およびテキスト的合理性を利用して、MLLMが複雑なマルチモーダル情報を理解するのに役立つ。 IoTプロンプトは、さまざまなMLLMのさまざまな視覚的理解タスクにおいて、ゼロショットの視覚的推論性能を改善した。さらに、IoTによって生成されたステップバイステップの視覚的特徴説明は、視覚的推論プロセスを解明し、大規模マルチモーダルモデルの認知過程の分析を支援する。

Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT has yet to be thoroughly investigated. We propose the Image-of-Thought (IoT) prompting method, which helps MLLMs to extract visual rationales step-by-step. Specifically, IoT prompting can automatically design critical visual information extraction operations based on the input images and questions. Each step of visual information refinement identifies specific visual rationales that support answers to complex visual reasoning questions. Beyond the textual CoT, IoT simultaneously utilizes visual and textual rationales to help MLLMs understand complex multimodal information. IoT prompting has improved zero-shot visual reasoning performance across various visual understanding tasks in different MLLMs. Moreover, the step-by-step visual feature explanations generated by IoT prompting elucidate the visual reasoning process, aiding in analyzing the cognitive processes of large multimodal models

翻訳日:2024-05-30 22:22:47 公開日:2024-05-29

# 実現可能なコンセプトセットジェネレータによる高速説明可能性

Fast Explainability via Feasible Concept Sets Generator ( http://arxiv.org/abs/2405.18664v1 )

ライセンス: Link先を確認

Deng Pan, Nuno Moniz, Nitesh Chawla,

(参考訳) 長年のジレンマは、一般的な適用性と推論速度という、より広範な説明方法の適用を防止する。一方、既存のモデルに依存しない説明法は、説明すべき予測モデルについて最小限の事前推定を行う。それでも、モデルの振る舞いを近似するために、伝播やバックプロパゲーションを通じてモデルに追加のクエリが必要であるため、推論が遅くなり、時間に敏感なタスクでの使用が妨げられる。一方で、低コストで高速な推論を実現するためのモデルに依存した様々な説明が提案されている。本研究では,モデルに依存しないアプローチの普遍性とモデル固有のアプローチの効率とのギャップを,予測モデルの構造を仮定せずに新たなフレームワークを提案し,推論時に高い効率を達成し,リアルタイムな説明を可能にすることによって橋渡しする。これを実現するために、まず、人間の理解可能な概念の集合を通して説明を定義し、最小限の概念集合を通してモデル予測を解明する枠組みを提案する。第二に、最小限の可能な集合生成器が予測モデルに付随する説明として学習できることを示し、予測のための説明を生成する。最後に、実時間推論を容易にしながら、堅牢な説明を提供する新しいモデルに依存しない手法を実装することにより、この枠組みを検証する。我々の主張は包括的な実験によって裏付けられ、我々のアプローチの有効性と効率を強調している。

A long-standing dilemma prevents the broader application of explanation methods: general applicability and inference speed. On the one hand, existing model-agnostic explanation methods usually make minimal pre-assumptions about the prediction models to be explained. Still, they require additional queries to the model through propagation or back-propagation to approximate the models' behaviors, resulting in slow inference and hindering their use in time-sensitive tasks. On the other hand, various model-dependent explanations have been proposed that achieve low-cost, fast inference but at the expense of limiting their applicability to specific model structures. In this study, we bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches by proposing a novel framework without assumptions on the prediction model's structures, achieving high efficiency during inference and allowing for real-time explanations. To achieve this, we first define explanations through a set of human-comprehensible concepts and propose a framework to elucidate model predictions via minimal feasible concept sets. Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model, generating explanations for predictions. Finally, we validate this framework by implementing a novel model-agnostic method that provides robust explanations while facilitating real-time inference. Our claims are substantiated by comprehensive experiments, highlighting the effectiveness and efficiency of our approach.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# Zipper: モダリティを再利用するための多層デコーダアーキテクチャ

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities ( http://arxiv.org/abs/2405.18669v1 )

ライセンス: Link先を確認

Vicky Zayats, Peter Chen, Melissa Merrari, Dirk Padfield,

(参考訳) 複数の生成基盤モデル、特に異なるモダリティで訓練されたモデルを統合することは、その部分の総和よりも大きい何かに重大な課題をもたらす。 2つの主要なハードルは、整列データ(同様の意味を持つが異なるモダリティで表現される概念)の可用性と、ドメイン間の生成タスクにおいて、元のユニモーダル能力を損なうことなく、効果的にユニモーダル表現を活用することである。本稿では,これらの問題に対処する多目的デコーダアーキテクチャであるZipperを提案する。音声とテキストのモダリティを融合させる実験では,限定されたテキスト音声データを持つシナリオにおいて,提案アーキテクチャが極めて競合的に機能することを示した。また,本モデルでは,対応する変調塔(e.g.テキスト)を凍結することにより,単調(e.g.テキスト・テキスト生成)生成性能を選択的に維持する柔軟性を示す。出力モダリティがテキストである自動音声認識(ASR)のようなクロスモーダルタスクにおいて、テキストバックボーンの凍結が無視可能な性能劣化をもたらすことを示す。出力モダリティが音声であるTTS(text-to-Speech Generation)のようなクロスモーダルなタスクでは、事前訓練された音声バックボーンを使用することで、ベースラインよりも優れたパフォーマンスが得られることを示す。

Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but is expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain generative tasks, without compromising their original unimodal capabilities. We propose Zipper, a multi-tower decoder architecture that addresses these concerns by using cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders. In our experiments fusing speech and text modalities, we show the proposed architecture performs very competitively in scenarios with limited aligned text-speech data. We also showcase the flexibility of our model to selectively maintain unimodal (e.g., text-to-text generation) generation performance by freezing the corresponding modal tower (e.g. text). In cross-modal tasks such as automatic speech recognition (ASR) where the output modality is text, we show that freezing the text backbone results in negligible performance degradation. In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# リレーショナルデータベースへの微分プライベート合成データの適用

Adapting Differentially Private Synthetic Data to Relational Databases ( http://arxiv.org/abs/2405.18670v1 )

ライセンス: Link先を確認

Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan,

(参考訳) 既存の差分プライベート(DP)合成データ生成機構は、典型的には単一ソーステーブルを仮定する。実際には、データは複数のテーブルに分散し、テーブルにまたがる関係を持つことが多い。本稿では,既存のDP機構と組み合わせて合成関係データベースを生成するアルゴリズムを提案する。本アルゴリズムは,参照整合性を維持しつつ,低次辺分布の近似誤差を最小限に抑えるために,個々の合成表間の関係を反復的に洗練する。最後に,提案アルゴリズムのDPと理論的実用性を保証する。

Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce the first-of-its-kind algorithm that can be combined with any existing DP mechanisms to generate synthetic relational databases. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity. Finally, we provide both DP and theoretical utility guarantees for our algorithm.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 透かしのカウンターファクトな説明

Watermarking Counterfactual Explanations ( http://arxiv.org/abs/2405.18671v1 )

ライセンス: Link先を確認

Hangzhi Guo, Amulya Yadav,

(参考訳) 説明可能な人工知能(XAI)の分野は、現代の機械学習(ML)モデルを支える意思決定プロセスについてエンドユーザに説明を提供する技術に焦点を当てている。 XAIテクニックの広大な宇宙では、予測結果に悪影響を及ぼす個々のエンドユーザーに対して、容易に理解しやすく、行動可能な(あるいは対照的な)ケースを提供することによって、MLモデルの予測を説明するために、反ファクトリアル(CF)の説明がエンドユーザによって好まれることが多い。しかし、最近の研究では、実世界のアプリケーションでCFの説明を使用する際の重大なセキュリティ上の懸念が示されている。特に、悪意のある敵はCFの説明を利用して、プロプライエタリなMLモデルに対してクエリ効率の良いモデル抽出攻撃を行うことができる。本稿では,不許可なモデル抽出攻撃(CF説明に依存する)の検出に利用することができるモデル非依存型透かしフレームワーク(CF説明に透かしを追加する)を提案する。提案するフレームワークは,2段階の最適化問題を解くことで,生成したCF説明に識別不能な透かしを埋め込むことにより,これらのCF説明に依存する将来のモデル抽出攻撃を,Null hypothesis important testing (NHST) スキームを用いて検出し,これらの埋め込み透かしが生成されたCF説明の品質を損なわないことを保証する。我々は,本フレームワークの性能を,実世界のさまざまなデータセット,CF説明手法,モデル抽出手法で評価し,透かしを用いたCF説明を用いてトレーニングした抽出MLモデルを正確に識別するために,透かし検出システムを使用することを実証した。我々の研究は、現実世界のアプリケーションでCFの説明を安全に採用するための道を開いた。

The field of Explainable Artificial Intelligence (XAI) focuses on techniques for providing explanations to end-users about the decision-making processes that underlie modern-day machine learning (ML) models. Within the vast universe of XAI techniques, counterfactual (CF) explanations are often preferred by end-users as they help explain the predictions of ML models by providing an easy-to-understand & actionable recourse (or contrastive) case to individual end-users who are adversely impacted by predicted outcomes. However, recent studies have shown significant security concerns with using CF explanations in real-world applications; in particular, malicious adversaries can exploit CF explanations to perform query-efficient model extraction attacks on proprietary ML models. In this paper, we propose a model-agnostic watermarking framework (for adding watermarks to CF explanations) that can be leveraged to detect unauthorized model extraction attacks (which rely on the watermarked CF explanations). Our novel framework solves a bi-level optimization problem to embed an indistinguishable watermark into the generated CF explanation such that any future model extraction attacks that rely on these watermarked CF explanations can be detected using a null hypothesis significance testing (NHST) scheme, while ensuring that these embedded watermarks do not compromise the quality of the generated CF explanations. We evaluate this framework's performance across a diverse set of real-world datasets, CF explanation methods, and model extraction techniques, and show that our watermarking detection system can be used to accurately identify extracted ML models that are trained using the watermarked CF explanations. Our work paves the way for the secure adoption of CF explanations in real-world applications.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 微視的画像分類のためのLLMに基づく階層的概念分解

LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification ( http://arxiv.org/abs/2405.18672v1 )

ライセンス: Link先を確認

Renyi Qu, Mark Yatskar,

(参考訳) 視覚言語タスクの解釈可能なモデルの最近の進歩は、競争的な性能を達成したが、大きな言語モデル(LLM)からの非構造化テキスト出力に依存しているため、その解釈可能性に悩まされることがしばしばある。これはランダム性を導入し、AIシステムの安全性問題に対処するために不可欠な透明性と信頼性の両方を損なう。本稿では,構造化概念解析によるモデル解釈可能性の向上を目的とした新しいフレームワークである‘texttt{Hi-CoDe}(階層概念分解)’を紹介する。 1)入力画像を視覚概念の階層構造に分解し,視覚概念木を形成する。 2) CLIPから派生した概念特化機能を利用する単純な線形分類器のアンサンブルを用いて分類を行う。我々のアプローチは、最先端のモデルの性能だけでなく、意思決定プロセスに対する明確な洞察を提供し、さまざまな概念の重要性を強調することによって透明性も向上します。これにより、潜在的な障害モードを詳細に分析し、モデルコンパクト性を向上させることができるため、精度を損なうことなく、新しいベンチマークを解釈可能である。

Recent advancements in interpretable models for vision-language tasks have achieved competitive performance; however, their interpretability often suffers due to the reliance on unstructured text outputs from large language models (LLMs). This introduces randomness and compromises both transparency and reliability, which are essential for addressing safety issues in AI systems. We introduce \texttt{Hi-CoDe} (Hierarchical Concept Decomposition), a novel framework designed to enhance model interpretability through structured concept analysis. Our approach consists of two main components: (1) We use GPT-4 to decompose an input image into a structured hierarchy of visual concepts, thereby forming a visual concept tree. (2) We then employ an ensemble of simple linear classifiers that operate on concept-specific features derived from CLIP to perform classification. Our approach not only aligns with the performance of state-of-the-art models but also advances transparency by providing clear insights into the decision-making process and highlighting the importance of various concepts. This allows for a detailed analysis of potential failure modes and improves model compactness, therefore setting a new benchmark in interpretability without compromising the accuracy.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# ベイズ忠実データ同化のためのディープベイズフィルタ

Deep Bayesian Filter for Bayes-faithful Data Assimilation ( http://arxiv.org/abs/2405.18674v1 )

ライセンス: Link先を確認

Yuta Tarumi, Keisuke Fukuda, Shin-ichi Maeda,

(参考訳) 非線形状態空間モデルの状態推定は難しい課題である。既存の同化法は主に、真の後部が必然的にガウス的でないような物理的空間上のガウス的後部を仮定する。非線形状態空間モデル(SSM)のデータ同化のためのディープベイズフィルタ(DBF)を提案する。 DBFは、新しい潜伏変数 $h_t$ を新しい潜伏変数 (``fancy'') 空間上に構築し、観測を $o_t$ に同化する。周辺一空想空間上の状態遷移を直線的に制限すること (ii) ガウス逆観測作用素 $q(h_t|o_t)$ を学習すると、後部は常に DBF に対してガウス的である。非常に特筆すべきは、後部の構造化設計は、時間ステップでモンテカルロサンプリングエラーを蓄積することなく、後部の再帰的な計算のための解析公式を提供することである。 DBF はガウス逆観測作用素 $q(h_t|o_t)$ とその他の潜在 SSM パラメータ(例えば、ダイナミックス行列)を求める。実験の結果,DBFは様々なタスクや条件下でモデルベースアプローチや潜在同化手法よりも優れていた。

State estimation for nonlinear state space models is a challenging task. Existing assimilation methodologies predominantly assume Gaussian posteriors on physical space, where true posteriors become inevitably non-Gaussian. We propose Deep Bayesian Filtering (DBF) for data assimilation on nonlinear state space models (SSMs). DBF constructs new latent variables $h_t$ on a new latent (``fancy'') space and assimilates observations $o_t$. By (i) constraining the state transition on fancy space to be linear and (ii) learning a Gaussian inverse observation operator $q(h_t|o_t)$, posteriors always remain Gaussian for DBF. Quite distinctively, the structured design of posteriors provides an analytic formula for the recursive computation of posteriors without accumulating Monte-Carlo sampling errors over time steps. DBF seeks the Gaussian inverse observation operators $q(h_t|o_t)$ and other latent SSM parameters (e.g., dynamics matrix) by maximizing the evidence lower bound. Experiments show that DBF outperforms model-based approaches and latent assimilation methods in various tasks and conditions.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# Zero-to-Hero:アテンションマップフィルタリングによるゼロショット新規ビュー合成の実現

Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering ( http://arxiv.org/abs/2405.18677v1 )

ライセンス: Link先を確認

Ido Sobol, Chenfeng Xu, Or Litany,

(参考訳) 単一のソースイメージに基づいて任意のビューからリアルなイメージを生成することは、電子商取引から没入型仮想体験に至るまで、コンピュータビジョンにおいて重要な課題である。拡散モデル、特にZero-1-to-3モデルの最近の進歩は、可塑性ビュー、ビデオ、および3Dモデルを生成するために広く採用されている。しかし、これらのモデルは、新しいビュー生成における矛盾と不確実性、特に視点の変化に苦慮している。本研究では,Zero-1-to-3のデノナイズ過程において,注目マップを操作することによってビュー合成を向上させる新しいテストタイム手法であるZero-to-Heroを提案する。偏極化過程と確率勾配降下(SGD)の類似性を引き出すことにより,注目マップを集約し,生成信頼性と信頼性を向上させるフィルタリング機構を実装した。このプロセスは、再訓練や重要な計算資源を必要とせず、幾何的整合性を改善する。さらに、ソースビューからの情報を統合するために自己認識機構を変更し、形状歪みを低減する。これらのプロセスは、特別なサンプリングスケジュールによってさらにサポートされます。実験結果から, 分布域外オブジェクトの多種多様な集合に対して, 忠実度と整合性に大きな改善が認められた。

Generating realistic images from arbitrary views based on a single source image remains a significant challenge in computer vision, with broad applications ranging from e-commerce to immersive virtual experiences. Recent advancements in diffusion models, particularly the Zero-1-to-3 model, have been widely adopted for generating plausible views, videos, and 3D models. However, these models still struggle with inconsistencies and implausibility in new views generation, especially for challenging changes in viewpoint. In this work, we propose Zero-to-Hero, a novel test-time approach that enhances view synthesis by manipulating attention maps during the denoising process of Zero-1-to-3. By drawing an analogy between the denoising process and stochastic gradient descent (SGD), we implement a filtering mechanism that aggregates attention maps, enhancing generation reliability and authenticity. This process improves geometric consistency without requiring retraining or significant computational resources. Additionally, we modify the self-attention mechanism to integrate information from the source view, reducing shape distortions. These processes are further supported by a specialized sampling schedule. Experimental results demonstrate substantial improvements in fidelity and consistency, validated on a diverse set of out-of-distribution objects.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# Vim-F: 周波数領域での学習から得られる視覚状態空間モデル

Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain ( http://arxiv.org/abs/2405.18679v1 )

ライセンス: Link先を確認

Juntao Zhang, Kun Bian, Peng Cheng, Wenbo An, Jianning Liu, Jun Zhou,

(参考訳) 近年、Mambaディープラーニングモデルとして知られる効率的なハードウェア対応設計を持つステートスペースモデル(SSM)は、言語理解のような長いシーケンスのモデリングにおいて大きな進歩を遂げている。したがって、SSMに基づく効率的で汎用的な視覚バックボーンの構築は有望な方向である。従来の畳み込みニューラルネットワーク(CNN)やビジョントランスフォーマー(ViT)と比較して、ビジョン・マンバ(ViM)メソッドのパフォーマンスは、まだ完全に競合していない。 SSMが画像データを処理するために、ViMは一般的に2D画像を1Dシーケンスに平らにし、必然的にいくつかの2Dローカル依存関係を無視し、グローバルな視点から空間的関係を解釈するモデルの能力を弱める。我々は、Fast Fourier Transform (FFT) を用いて特徴マップのスペクトルを取得し、元の特徴マップに追加し、VIMが周波数領域と空間領域の両方で統一された視覚表現をモデル化できるようにする。周波数領域情報の導入により、ViMはスキャン中にグローバルな受容野を持つことができる。周波数領域と空間領域の両方で純粋なマンバエンコーダとスキャンを利用するVim-Fと呼ばれる新しいモデルを提案する。さらに,Vim-F への位置埋め込みの必要性を疑問視し,Vim-F における位置埋め込みの必要性を考察した。最後に、Vim-Fのパッチ埋め込みを再設計し、より局所的な相関を捉えるために畳み込みステムを活用し、Vim-Fの性能をさらに向上させる。コードは以下の通り: \url{https://github.com/yws-wxs/Vim-F}。

In recent years, State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences such as language understanding. Therefore, building efficient and general-purpose visual backbones based on SSMs is a promising direction. Compared to traditional convolutional neural networks (CNNs) and Vision Transformers (ViTs), the performance of Vision Mamba (ViM) methods is not yet fully competitive. To enable SSMs to process image data, ViMs typically flatten 2D images into 1D sequences, inevitably ignoring some 2D local dependencies, thereby weakening the model's ability to interpret spatial relationships from a global perspective. We use Fast Fourier Transform (FFT) to obtain the spectrum of the feature map and add it to the original feature map, enabling ViM to model a unified visual representation in both frequency and spatial domains. The introduction of frequency domain information enables ViM to have a global receptive field during scanning. We propose a novel model called Vim-F, which employs pure Mamba encoders and scans in both the frequency and spatial domains. Moreover, we question the necessity of position embedding in ViM and remove it accordingly in Vim-F, which helps to fully utilize the efficient long-sequence modeling capability of ViM. Finally, we redesign a patch embedding for Vim-F, leveraging a convolutional stem to capture more local correlations, further improving the performance of Vim-F. Code is available at: \url{https://github.com/yws-wxs/Vim-F}.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 高次元近傍探索のためのナビゲート可能なグラフ:構築と限界

Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits ( http://arxiv.org/abs/2405.18680v1 )

ライセンス: Link先を確認

Haya Diwan, Jinrui Gou, Cameron Musco, Christopher Musco, Torsten Suel,

(参考訳) 近年,グラフに基づく近接探索手法への関心が高まっており,その多くが高次元点集合上のナビゲート可能なグラフの構築に重点を置いている。グラフは、任意の開始ノードから任意の目標ノードへ、所定の距離関数に従って目的地に最も近い隣人へ常に移動する、欲求ルーティング戦略を用いて、うまく移動することができればナビゲート可能である。完全なグラフは任意の点集合に対してナビゲート可能であるが、アプリケーションにとって重要な問題はスペーサーグラフを構築することができるかどうかである。この問題は低次元においてかなりよく理解されているが、高次元の点集合に対する最初の上界と下界のいくつかを確立する。まず、任意の次元の任意の$n$点に対して平均次数$O(\sqrt{n \log n })$のナビゲート可能なグラフを構築するための単純かつ効率的な方法を与える。ユークリッド計量の$O(\log n)$次元の下でも、任意の$\alpha < 1/2$に対して平均級数$O(n^{\alpha})$のナビゲート可能なグラフが存在しない。我々の下界は二項確率変数の鋭い反集中境界に依存しており、これはランダムな点の集合の近傍が著しく重複しないことを示すために使われる。

There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n })$ for any set of $n$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 組合せ最適化のためのランダムキーGRASP

A random-key GRASP for combinatorial optimization ( http://arxiv.org/abs/2405.18681v1 )

ライセンス: Link先を確認

Antonio A. Chaves, Mauricio G. C. Resende, Ricardo M. A. Silva,

(参考訳) 本稿ではランダムキーオプティマイザ(RKO)パラダイムを用いた問題独立なGRASPメタヒューリスティックを提案する。 GRASP (greedy randomized Adaptive search procedure) は、半グレディな構成手順を繰り返し適用し、その後局所的な探索手順を施すメタヒューリスティックな組合せ最適化法である。すべてのイテレーションで見つかる最良のソリューションは、GRASPのソリューションとして返される。 Continuous GRASP (C-GRASP) は、ユニットハイパーキューブの継続的な最適化のためのGRASPの拡張である。ランダムキー最適化器(RKO)は、ランダムキーのベクトルを用いて、組合せ最適化問題の解を符号化する。デコーダを使用して、ランダムキーのベクトルによって符号化されたソリューションを評価する。ランダムキーGRASPは、デコーダを用いてユニットハイパーキューブの点を評価するC-GRASPである。問題非依存のコンポーネントと問題依存のデコーダからなるランダムキーGRASPについて述べる。概念実証として、ランダムキーGRASPは、旅行セールスマン問題、ハブのツリー配置問題、スタイナー三重被覆問題、ノード容量グラフ分割問題、ジョブシークエンシングとツール切替問題という5つのNPハード組合せ最適化問題でテストされる。

This paper proposes a problem-independent GRASP metaheuristic using the random-key optimizer (RKO) paradigm. GRASP (greedy randomized adaptive search procedure) is a metaheuristic for combinatorial optimization that repeatedly applies a semi-greedy construction procedure followed by a local search procedure. The best solution found over all iterations is returned as the solution of the GRASP. Continuous GRASP (C-GRASP) is an extension of GRASP for continuous optimization in the unit hypercube. A random-key optimizer (RKO) uses a vector of random keys to encode a solution to a combinatorial optimization problem. It uses a decoder to evaluate a solution encoded by the vector of random keys. A random-key GRASP is a C-GRASP where points in the unit hypercube are evaluated employing a decoder. We describe random key GRASP consisting of a problem-independent component and a problem-dependent decoder. As a proof of concept, the random-key GRASP is tested on five NP-hard combinatorial optimization problems: traveling salesman problem, tree of hubs location problem, Steiner triple covering problem, node capacitated graph partitioning problem, and job sequencing and tool switching problem.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# GPTは医学的理解を再定義できるか? 医学的機械読解におけるGPTの評価

Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension ( http://arxiv.org/abs/2405.18682v1 )

ライセンス: Link先を確認

Shubham Vatsal, Ayush Singh,

(参考訳) 大規模言語モデル(LLM)は、異なる領域における多くのタスクにおいて顕著なパフォーマンスを示している。しかし, 閉本バイオメディカル機械読解術(MRC)の成績は, 深く評価されていない。本研究は,4つの閉本バイオメディカルMCCベンチマークにおけるGPTの評価である。従来のプロンプト手法を実験し、新しいプロンプト手法を導入する。 LLM固有の検索問題のいくつかを解決するため,従来のRAGセットアップにおいて,ベクトルデータベースを用いて重要なチャンクを検索する必要性を緩和するImplicit Retrieval Augmented Generation (RAG) というプロンプト戦略を提案する。さらに,本手法による自然言語生成の質的評価について報告する。その結果、我々の新しいプロンプト技術は、4つのデータセットのうち2つで最高のパフォーマンスを得ることができ、残りの2つにランク付けできることがわかった。実験により、ゼロショット設定でもGPTのような現代のLLMは教師付きモデルよりも優れており、2つのベンチマークで新たなSoTA(State-of-the-art)結果が得られた。

Large language models (LLMs) have shown remarkable performance on many tasks in different domains. However, their performance in closed-book biomedical machine reading comprehension (MRC) has not been evaluated in depth. In this work, we evaluate GPT on four closed-book biomedical MRC benchmarks. We experiment with different conventional prompting techniques as well as introduce our own novel prompting method. To solve some of the retrieval problems inherent to LLMs, we propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases to retrieve important chunks in traditional RAG setups. Moreover, we report qualitative assessments on the natural language generation outputs from our approach. The results show that our new prompting technique is able to get the best performance in two out of four datasets and ranks second in rest of them. Experiments show that modern-day LLMs like GPT even in a zero-shot setting can outperform supervised models, leading to new state-of-the-art (SoTA) results on two of the benchmarks.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 半群正規化を用いた時間連続ネットワークによる画像登録のための差分型学習

Learning Diffeomorphism for Image Registration with Time-Continuous Networks using Semigroup Regularization ( http://arxiv.org/abs/2405.18684v1 )

ライセンス: Link先を確認

Mohammadjavad Matinkia, Nilanjan Ray,

(参考訳) DIR(Diffomorphic Image registration)は、3次元画像解析において重要な課題であり、画像間の変形を保存するトポロジーを見つけることを目的としている。フローマップ微分方程式の解を微分同相変形として焦点をあてた最近の手法では、離散時間ステップと様々な正規化項を使い、ヤコビアンの負の行列式をペナル化し、解ベクトル場の滑らかさを課す。本稿では, 時間連続体における微分同相を, 正規化項が少なく, 付加的な統合を伴わないような, 微分同相な3次元画像登録のための新しい学習手法を提案する。フローマップの基本特性の1つとして、半群特性を正規化の唯一の形式として利用し、一対のイメージ間の時間的に連続な微分同相流を保証する。この特性を活用することで、トレーニングと評価の両方において、さらなる正規化項の必要性が軽減され、スケーリングとスキャアリング統合が不要になる。時間連続微分同相を実現するために、拡散モデルでよく用いられる手法である時間埋め込みユニペットを用いる。提案手法は, 連続時間間隔における微分同相性を保証することにより, より良い登録結果が得られることを示す。 2つの公開データセット(OASISとCANDI)に対する実験結果は、学習に基づく手法と最適化に基づく手法の両方よりも、我々のモデルの方が優れていることを示す。

Diffeomorphic image registration (DIR) is a critical task in 3D medical image analysis, aimed at finding topology preserving deformations between pairs of images. Focusing on the solution of the flow map differential equation as the diffeomorphic deformation, recent methods use discrete timesteps along with various regularization terms to penalize the negative determinant of Jacobian and impose smoothness of the solution vector field. In this paper, we propose a novel learning-based approach for diffeomorphic 3D-image registration which finds the diffeomorphisms in the time continuum with fewer regularization terms and no additional integration. As one of the fundamental properties of flow maps, we exploit the semigroup property as the only form of regularization, ensuring temporally continuous diffeomorphic flows between pairs of images. Leveraging this property, our method alleviates the need for additional regularization terms and scaling and squaring integration during both training and evaluation. To achieve time-continuous diffeomorphisms, we employ time-embedded UNets, a technique commonly utilized in diffusion models. The proposed method reveals that ensuring diffeomorphism in a continuous time interval leads to better registration results. Experimental results on two public datasets (OASIS and CANDI) demonstrate the superiority of our model over both learning-based and optimization-based methods.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 学習密度比による拒絶

Rejection via Learning Density Ratios ( http://arxiv.org/abs/2405.18686v1 )

ライセンス: Link先を確認

Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen,

(参考訳) 拒絶による分類は、モデルを予測しないことを許容する学習パラダイムとして現れます。主なアプローチは、典型的な損失関数を増大させることで教師付き学習パイプラインを変更することである。そこで我々は,事前学習したモデルの性能を最大化する理想的なデータ分布を求める。これは損失リスクの最適化を通じて$ \phi$-divergence regularization 項で定式化することができる。この理想的な分布を通して、この分布とデータ分布の密度比を利用して拒絶判定を行うことができる。私たちは、$ \phi $-divergencesが$ \alpha $-divergenceのファミリーによって指定される設定に焦点を当てます。私たちのフレームワークはクリーンでノイズの多いデータセットで実証的にテストされています。

Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $ \phi$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $ \phi $-divergences are specified by the family of $ \alpha $-divergence. Our framework is tested empirically over clean and noisy datasets.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 家庭用ロボティクスの強化:効率的なトレーニングとパフォーマンス向上のための対話型強化学習

Advancing Household Robotics: Deep Interactive Reinforcement Learning for Efficient Training and Enhanced Performance ( http://arxiv.org/abs/2405.18687v1 )

ライセンス: Link先を確認

Arpita Soni, Sujatha Alla, Suresh Dodda, Hemanth Volikatla,

(参考訳) 家庭内ロボットが家事を行う市場は、こうしたロボットが日常の責任を和らげるにつれて成長している。国内ロボットは一般的に、労働者を解雇したとしてしばしば批判される産業ロボットとは対照的に、人間の労働を緩和する役割で歓迎されている。しかし、これらのロボットが家事を行う前には、周囲の認識、意思決定、人間の行動の把握など、いくつかの小さな活動に精通する必要がある。強化学習(Reinforcement Learning, RL)は、ロボットが自分の環境と対話し、報酬を最大限にするために自分の行動を最適化する方法を学ぶための、重要なロボティクス技術として登場した。しかし、Deep Reinforcement Learningの目標は、RLとニューラルネットワークを組み合わせることで、現実の環境でより複雑で継続的なアクションステートスペースに対処することだ。 DeepRLの有効性は、対話的なフィードバックを通じてさらに強化され、トレーナーがロボットの学習プロセスを高速化するためのリアルタイムガイダンスを提供する。それにもかかわらず、現在の手法には欠点があり、すなわち、同じ条件下で繰り返し学習される指導の一時的な適用である。そこで本研究では,永続的なルールベースシステムを利用したDeep Interactive Reinforcement Learningを通じて,情報とアドバイスを保存・再利用する新しい手法を提案する。この方法は訓練プロセスを短縮するだけでなく、インストラクターが実行しなければならない反復回数を減らす。本研究は,家庭用ロボットの開発を推進し,学習者としての有効性と効率を向上させる可能性を秘めている。

The market for domestic robots made to perform household chores is growing as these robots relieve people of everyday responsibilities. Domestic robots are generally welcomed for their role in easing human labor, in contrast to industrial robots, which are frequently criticized for displacing human workers. But before these robots can carry out domestic chores, they need to become proficient in several minor activities, such as recognizing their surroundings, making decisions, and picking up on human behaviors. Reinforcement learning, or RL, has emerged as a key robotics technology that enables robots to interact with their environment and learn how to optimize their actions to maximize rewards. However, the goal of Deep Reinforcement Learning is to address more complicated, continuous action-state spaces in real-world settings by combining RL with Neural Networks. The efficacy of DeepRL can be further augmented through interactive feedback, in which a trainer offers real-time guidance to expedite the robot's learning process. Nevertheless, the current methods have drawbacks, namely the transient application of guidance that results in repeated learning under identical conditions. Therefore, we present a novel method to preserve and reuse information and advice via Deep Interactive Reinforcement Learning, which utilizes a persistent rule-based system. This method not only expedites the training process but also lessens the number of repetitions that instructors will have to carry out. This study has the potential to advance the development of household robots and improve their effectiveness and efficiency as learners.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# 適応的経験推定による効率的選好型強化学習

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation ( http://arxiv.org/abs/2405.18688v1 )

ライセンス: Link先を確認

Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han,

(参考訳) 評価に基づく強化学習(PbRL)は、報酬工学を使わずにトレーニングエージェントに優れた能力を示す。しかしながら、PbRLの顕著な制限は、人間のフィードバックへの依存である。この依存は、価値/政治学習と組み合わせた正確な報酬学習を必要とする学習ループに起因しており、かなりの数のサンプルを必要とする。学習ループを強化するために,ラベルスムース化とポリシー規則化を併用した効率的なPbRL法であるSEERを提案する。ラベルスムーシングは、人間の嗜好ラベルをスムースにすることで報酬モデルの過度な適合を減らす。さらに、現在のリプレイメモリからサポートされた状態-アクションペアを使用して、保守的な推定値$\widehat{Q}$をブートストラップし、過大評価バイアスを緩和し、ポリシー学習規則化に利用します。オンラインとオフラインの両方で、さまざまな複雑なタスクに対する実験結果から、我々のアプローチがフィードバック効率を向上し、最先端の手法を大きなマージンで上回ることを示した。アブレーション研究により、SEERは以前の研究と比べてより正確なQ-関数を達成することが明らかとなった。

Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques. Label smoothing reduces overfitting of the reward model by smoothing human preference labels. Additionally, we bootstrap a conservative estimate $\widehat{Q}$ using well-supported state-action pairs from the current replay memory to mitigate overestimation bias and utilize it for policy learning regularization. Our experimental results across a variety of complex tasks, both in online and offline settings, demonstrate that our approach improves feedback efficiency, outperforming state-of-the-art methods by a large margin. Ablation studies further reveal that SEER achieves a more accurate Q-function compared to prior work.

翻訳日:2024-05-30 21:13:51 公開日:2024-05-29

# DeepHGNN:階層的関連多変量時系列のグラフニューラルネットワークによる予測手法の検討

DeepHGNN: Study of Graph Neural Network based Forecasting Methods for Hierarchically Related Multivariate Time Series ( http://arxiv.org/abs/2405.18693v1 )

ライセンス: Link先を確認

Abishek Sriramulu, Nicolas Fourrier, Christoph Bergmeir,

(参考訳) グラフニューラルネットワーク(GNN)は,特に系列内時間相関と系列間関係を同時に考慮する能力について,予測領域において大きな注目を集めている。本稿では,複雑な階層構造における予測を目的とした新しい階層型GNN(DeepHGNN)フレームワークを提案する。 DeepHGNNのユニークな点は、その革新的なグラフベースの階層的補間とエンドツーエンドの和解機構にある。このアプローチは、様々な階層レベルの予測精度とコヒーレンスを保証し、それらをまたいだ信号を共有し、階層的な予測において重要な課題に対処する。階層的時系列における重要な洞察は、予測可能性のレベル間でのばらつきであり、上位レベルは通常より予測可能なコンポーネントを提示する。 DeepHGNNは、すべての階層レベルの知識をプールし、活用することで、この洞察に基づいて、全体的な予測精度を向上させる。複数の最先端モデルに対する包括的評価セットにより,DeepHGNNの優れた性能が確認された。本研究は,DeepHGNNが予測精度を大幅に向上させる効果を実証するだけでなく,階層的時系列予測におけるグラフベースの手法の理解にも寄与する。

Graph Neural Networks (GNN) have gained significant traction in the forecasting domain, especially for their capacity to simultaneously account for intra-series temporal correlations and inter-series relationships. This paper introduces a novel Hierarchical GNN (DeepHGNN) framework, explicitly designed for forecasting in complex hierarchical structures. The uniqueness of DeepHGNN lies in its innovative graph-based hierarchical interpolation and an end-to-end reconciliation mechanism. This approach ensures forecast accuracy and coherence across various hierarchical levels while sharing signals across them, addressing a key challenge in hierarchical forecasting. A critical insight in hierarchical time series is the variance in forecastability across levels, with upper levels typically presenting more predictable components. DeepHGNN capitalizes on this insight by pooling and leveraging knowledge from all hierarchy levels, thereby enhancing the overall forecast accuracy. Our comprehensive evaluation set against several state-of-the-art models confirm the superior performance of DeepHGNN. This research not only demonstrates DeepHGNN's effectiveness in achieving significantly improved forecast accuracy but also contributes to the understanding of graph-based methods in hierarchical time series forecasting.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# 収束保証者によるスペクトルリスク安全強化学習

Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees ( http://arxiv.org/abs/2405.18698v1 )

ライセンス: Link先を確認

Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh,

(参考訳) リスク対応型強化学習(RCRL)の分野は,リスク対策に基づく制約を明示的に扱うことにより,最悪のシナリオの可能性を効果的に低減するために開発されている。しかし、リスク尺度の非線形性は収束性と最適性を達成することを困難にしている。非線形性によって引き起こされる困難を克服するために,スペクトルリスク尺度制約付きRLアルゴリズム,スペクトルリスク制約付きポリシー最適化(SRCPO)を提案する。双レベル最適化構造では、外部問題はリスク測度から導出される双対変数を最適化することであり、内部問題はこれらの双対変数が与えられたときの最適ポリシーを見つけることである。提案手法は,我々の知る限り,表の設定における最適収束を保証する最初の方法である。さらに,提案手法は連続制御タスク上で評価され,制約を満たす他のRCRLアルゴリズムの中で最高の性能を示した。

The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO), a bilevel optimization approach that utilizes the duality of spectral risk measures. In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy given these dual variables. The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting. Furthermore, the proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# シーン認識型ニューラルヒューマンモーション予測のためのマルチコンディション潜時拡散ネットワーク

Multi-Condition Latent Diffusion Network for Scene-Aware Neural Human Motion Prediction ( http://arxiv.org/abs/2405.18700v1 )

ライセンス: Link先を確認

Xuehao Gao, Yang Yang, Yang Wu, Shaoyi Du, Auo-Jun Qi,

(参考訳) 3次元の人間の動きを推定することは、人間の活動を理解し、その人の意図を分析するなど、多くの応用において基本である。人間の動きを予測するために多くの実りある努力がなされてきたが、ほとんどのアプローチはポーズ駆動の予測に焦点を合わせ、文脈環境から離れて人間の動きを推測することで、シーン内の身体の位置運動を残している。しかし、現実世界の人間の動きはゴール指向であり、周囲のシーンの空間的レイアウトの影響を強く受けている。本稿では,従来の3次元体の動きと現在の3次元シーンのコンテキストに基づいて,人間の動作予測タスクを多条件共同推論問題として再構成するマルチコンディション潜伏拡散ネットワーク(MCLD)を提案する。具体的には、MCLDは、原動列上での関節分布を直接モデル化する代わりに、後続の埋め込み空間内で条件拡散プロセスを実行し、過去の体の動きと現在のシーン条件の埋め込みから将来の人間の動き埋め込みへの相互マッピングを特徴付ける。大規模人間の動き予測データセットに関する大規模な実験により、我々のMCLDは、現実的および多種多様な予測に関する最先端の手法よりも大幅に改善されていることが示された。

Inferring 3D human motion is fundamental in many applications, including understanding human activity and analyzing one's intention. While many fruitful efforts have been made to human motion prediction, most approaches focus on pose-driven prediction and inferring human motion in isolation from the contextual environment, thus leaving the body location movement in the scene behind. However, real-world human movements are goal-directed and highly influenced by the spatial layout of their surrounding scenes. In this paper, instead of planning future human motion in a 'dark' room, we propose a Multi-Condition Latent Diffusion network (MCLD) that reformulates the human motion prediction task as a multi-condition joint inference problem based on the given historical 3D body motion and the current 3D scene contexts. Specifically, instead of directly modeling joint distribution over the raw motion sequences, MCLD performs a conditional diffusion process within the latent embedding space, characterizing the cross-modal mapping from the past body movement and current scene context condition embeddings to the future human motion embedding. Extensive experiments on large-scale human motion prediction datasets demonstrate that our MCLD achieves significant improvements over the state-of-the-art methods on both realistic and diverse predictions.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# FocSAM: セグメンテーションにおけるフォーカスされたオブジェクトを深く掘り下げる

FocSAM: Delving Deeply into Focused Objects in Segmenting Anything ( http://arxiv.org/abs/2405.18706v1 )

ライセンス: Link先を確認

You Huang, Zongyu Lan, Liujuan Cao, Xianming Lin, Shengchuan Zhang, Guannan Jiang, Rongrong Ji,

(参考訳) Segment Anything Model (SAM)はセグメンテーションモデルの注目すべきマイルストーンであり、その堅牢なゼロショット機能と多様なプロンプトを扱う能力によって強調されている。 SAMは、インタラクティブセグメンテーションを、大きなエンコーダによるイメージ前処理と、軽量デコーダによるインタラクティブ推論に分離し、効率的なリアルタイムパフォーマンスを保証するパイプラインに従う。しかしSAMは、このパイプライン上での挑戦的なサンプルの安定性の問題に直面している。これらの問題は2つの主な要因から生じる。まず、画像前処理は、画像レベルのズームイン戦略を用いてSAMを動的に無効にし、インタラクション中にターゲットオブジェクトに再フォーカスする。第二に、軽量デコーダは画像埋め込みと対話的な情報を十分に統合するのに苦労する。これら2つの制限に対処するため、我々は2つの重要な側面に基づいてパイプラインを再設計したFocSAMを提案する。まず,Dwin-MSA(Dynamic Window Multi-head Self-Attention)を提案する。 Dwin-MSAは対象オブジェクトの周囲の注意計算をローカライズし、最小の計算オーバーヘッドでオブジェクト関連の埋め込みを強化する。次に,Pixel-wise Dynamic ReLU (P-DyReLU) を提案する。実験的に、FocSAMはSAMのインタラクティブセグメンテーション性能を向上し、既存の最先端の手法をセグメンテーション品質に適合させ、CPU上での推論時間の5.6%しか必要としない。

The Segment Anything Model (SAM) marks a notable milestone in segmentation models, highlighted by its robust zero-shot capabilities and ability to handle diverse prompts. SAM follows a pipeline that separates interactive segmentation into image preprocessing through a large encoder and interactive inference via a lightweight decoder, ensuring efficient real-time performance. However, SAM faces stability issues in challenging samples upon this pipeline. These issues arise from two main factors. Firstly, the image preprocessing disables SAM from dynamically using image-level zoom-in strategies to refocus on the target object during interaction. Secondly, the lightweight decoder struggles to sufficiently integrate interactive information with image embeddings. To address these two limitations, we propose FocSAM with a pipeline redesigned on two pivotal aspects. First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object. Dwin-MSA localizes attention computations around the target object, enhancing object-related embeddings with minimal computational overhead. Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks that have significant impacts on the overall segmentation results. Experimentally, FocSAM augments SAM's interactive segmentation performance to match the existing state-of-the-art method in segmentation quality, requiring only about 5.6% of this method's inference time on CPUs.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# ベクトルエッジコンピューティングにおける適応的・並列的フェデレーション学習

Adaptive and Parallel Split Federated Learning in Vehicular Edge Computing ( http://arxiv.org/abs/2405.18707v1 )

ライセンス: Link先を確認

Xianke Qiang, Zheng Chang, Yun Hu, Lei Liu, Timo Hamalainen,

(参考訳) 車両エッジインテリジェンス(VEI)は、車両エッジコンピューティング(VEC)システムに人工知能(AI)を収容することで、将来のインテリジェントトランスポートシステムを実現するための有望なパラダイムである。フェデレーテッド・ラーニング(FL)は、VEIにおける車両データのプライバシを保護しつつ、局所的に協調的なモデルトレーニングとアグリゲーションを促進する基本的な技術の1つである。しかし、従来のFLは、車両の不均一性に適応し、資源に制約のある車両で大規模なモデルを訓練し、重量プライバシーの漏洩をモデル化する際の課題に直面している。一方、スプリットラーニング(SL)は、モデルワイトリークのリスクを軽減し、車両上でのトレーニング負荷を解放する、有望な協調学習フレームワークとして提案されている。 SLは、モデル全体を車両側モデルとEC側モデルに分割することで、車両とエッジクラウド(EC)の間のモデルを順次訓練する。本研究では、SLとFLの利点を組み合わせて、ベクトルエッジコンピューティング(ASFV)のための適応分割フェデレート学習スキームを開発する。 ASFVスキームはモデルを適応的に分割し、移動体選択と資源配分を考慮したトレーニングプロセスを並列化する。非独立で同一の分散データを用いて行った広範囲なシミュレーションにより、提案手法は既存のベンチマークと比較してトレーニングの遅延を著しく低減し、ネットワークのダイナミクスや車両の移動性に適応することを示した。

Vehicular edge intelligence (VEI) is a promising paradigm for enabling future intelligent transportation systems by accommodating artificial intelligence (AI) at the vehicular edge computing (VEC) system. Federated learning (FL) stands as one of the fundamental technologies facilitating collaborative model training locally and aggregation, while safeguarding the privacy of vehicle data in VEI. However, traditional FL faces challenges in adapting to vehicle heterogeneity, training large models on resource-constrained vehicles, and remaining susceptible to model weight privacy leakage. Meanwhile, split learning (SL) is proposed as a promising collaborative learning framework which can mitigate the risk of model wights leakage, and release the training workload on vehicles. SL sequentially trains a model between a vehicle and an edge cloud (EC) by dividing the entire model into a vehicle-side model and an EC-side model at a given cut layer. In this work, we combine the advantages of SL and FL to develop an Adaptive Split Federated Learning scheme for Vehicular Edge Computing (ASFV). The ASFV scheme adaptively splits the model and parallelizes the training process, taking into account mobile vehicle selection and resource allocation. Our extensive simulations, conducted on non-independent and identically distributed data, demonstrate that the proposed ASFV solution significantly reduces training latency compared to existing benchmarks, while adapting to network dynamics and vehicles' mobility.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# リコメンダシステムのための特徴インタラクション選択のための認知的進化学習

Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems ( http://arxiv.org/abs/2405.18708v1 )

ライセンス: Link先を確認

Runlong Yu, Qixiang Shao, Qi Liu, Huan Liu, Enhong Chen,

(参考訳) 機能相互作用の選択は、商業レコメンデータシステムにおける基本的な問題である。ほとんどのアプローチは、専門家の指導の下で、同じ事前定義された操作によって、すべての特徴と相互作用を均等に列挙します。 1) アーキテクチャがタスクやデータに不適応であるため、モデルの学習能力を保証することはできない; (2) 機能やインタラクションが不要なノイズをもたらし、トレーニングプロセスが複雑になる可能性がある。本稿では,タスクガイダンスに基づく適切な操作,特徴,インタラクションを選択するために,モデルを適応的に進化させることを目的とする。自然生物の進化と機能に触発されて,認知能力は多様な環境下で反応し,生き残ることができる生物の特性を指す,新しい『textsl{Cognitive EvoLutionary Learning』(CELL)フレームワークを提案する。これは3つの段階、すなわちDNA探索、ゲノム探索、モデル機能から構成される。特に、モデルとタスクの関係を生物と自然環境の関係とみなすならば、機能対の相互作用は二重鎖DNAに類似し、関連する特徴と相互作用はゲノムに類似することができる。この線に沿って、自然選択のための生物の生存率をシミュレートするために、操作、特徴、相互作用に関するモデルの適合度を診断する。 Cellはさまざまなタスクやデータに対して,さまざまなモデルに適応的に進化し,実践者がオフザシェルフモデルにアクセスできることを示す。 4つの実世界のデータセットに対する大規模な実験は、細胞が最先端のベースラインを大幅に上回っていることを示している。また、我々は、セルが特徴対の予め定義された相互作用パターンを一貫して発見できることを確認するために、合成実験を行う。

Feature interaction selection is a fundamental problem in commercial recommender systems. Most approaches equally enumerate all features and interactions by the same pre-defined operation under expert guidance. Their recommendation is unsatisfactory sometimes due to the following issues: (1)~They cannot ensure the learning abilities of models because their architectures are poorly adaptable to tasks and data; (2)~Useless features and interactions can bring unnecessary noise and complicate the training process. In this paper, we aim to adaptively evolve the model to select appropriate operations, features, and interactions under task guidance. Inspired by the evolution and functioning of natural organisms, we propose a novel \textsl{Cognitive EvoLutionary Learning (CELL)} framework, where cognitive ability refers to a property of organisms that allows them to react and survive in diverse environments. It consists of three stages, i.e., DNA search, genome search, and model functioning. Specifically, if we regard the relationship between models and tasks as the relationship between organisms and natural environments, interactions of feature pairs can be analogous to double-stranded DNA, of which relevant features and interactions can be analogous to genomes. Along this line, we diagnose the fitness of the model on operations, features, and interactions to simulate the survival rates of organisms for natural selection. We show that CELL can adaptively evolve into different models for different tasks and data, which enables practitioners to access off-the-shelf models. Extensive experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines. Also, we conduct synthetic experiments to ascertain that CELL can consistently discover the pre-defined interaction patterns for feature pairs.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# FP8とリターン:LLMトレーニングの安定性に及ぼす高精度化の効果の定量化

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability ( http://arxiv.org/abs/2405.18710v1 )

ライセンス: Link先を確認

Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee,

(参考訳) 大規模言語モデル(LLM)事前学習に伴う膨大な計算コストは、プロセスの高速化のために、精度の低い浮動小数点表現に大きな関心を惹き付けている。その結果、BrainFloat16(BF16)の精度は、近年のアクセラレーターにハードウェアサポートが組み込まれているLCMトレーニングのデファクトスタンダードとなった。 FP8が最近導入された最新のプロセッサでは、この傾向はさらに進んでいる。しかしながら、BF16より安定でないことが判明したFP16の以前の経験は、FP8がFP16よりも少ないビットでも、LCMトレーニングに費用対効果があるかどうかという懸念を提起している。我々は、コスト効率を高めるために、高精度トレーニングスキームは、高精度トレーニングスキームと同等のトレーニング安定性とハイパーパラメータ感度を持つ必要があると論じる。しかし、現在利用可能なFP8訓練方法は、経済的代替品としての使用を可能にするには不十分であることがわかった。これにより、ランダムシード間の堅牢性や学習率の観点から、低精度LDMトレーニングの安定性を検討することができる。そこで本研究では,自動回帰言語モデルにおける損失ランドスケープのシャープネスを定量化するための新しい評価手法と指標を提案する。浮動小数点表現におけるインクリメンタルビット削減をシミュレートすることにより,表現力とトレーニング安定性の関係を解析し,今後の研究を支援する。

The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent accelerators. This trend has gone even further in the latest processors, where FP8 has recently been introduced. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8, with even fewer bits than FP16, can be a cost-effective option for LLM training. We argue that reduced-precision training schemes must have similar training stability and hyperparameter sensitivities to their higher-precision counterparts in order to be cost-effective. However, we find that currently available methods for FP8 training are not robust enough to allow their use as economical replacements. This prompts us to investigate the stability of reduced-precision LLM training in terms of robustness across random seeds and learning rates. To this end, we propose new evaluation techniques and a new metric for quantifying loss landscape sharpness in autoregressive language models. By simulating incremental bit reductions in floating-point representations, we analyze the relationship between representational power and training stability with the intent of aiding future research into the field.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# 内部整合性を持つ言語モデルにおける推論の校正

Calibrating Reasoning in Language Models with Internal Consistency ( http://arxiv.org/abs/2405.18711v1 )

ライセンス: Link先を確認

Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li,

(参考訳) 大型言語モデル (LLM) は様々な推論タスクにおいて印象的な能力を示しており、例えばチェーン・オブ・シント (CoT) のような技術によって助けられ、言語化された推論が引き起こされる。しかし、LSMは明らかな誤りや矛盾のあるテキストを生成することが多く、それらが堅牢に処理し、生成した有理性を利用する能力に疑問を呈している。本研究では, 内部表現のレンズによるLLMにおけるCoT推論について検討し, これらの表現が生成した有理数にどのように影響するかに着目した。予備分析の結果、生成した有理値が解答精度を向上させる一方で、中間層におけるモデルの内部表現と最終層における表現との間に矛盾が生じ、それらの推論プロセスの信頼性を損なう可能性が示唆された。そこで本研究では,中間層から復号された遅延予測の一致を検証し,モデルの信頼性の尺度として内部整合性を提案する。異なるモデルとデータセットにわたる広範な実験研究により、内部の一貫性が正しい推論経路と間違った推論経路を効果的に区別することを示した。そこで本研究では,内部整合性の高い高重み付け推論経路によるCoT推論の校正手法を提案する。さらなる分析により、レイヤ間の注意パターンとフィードフォワードモジュールが明らかになり、内部の不整合の出現に関する洞察が得られる。本研究は, LLMの自己評価に内部表現を用いることの可能性を示すものである。

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate CoT reasoning in LLMs through the lens of internal representations, focusing on how these representations are influenced by generated rationales. Our preliminary analysis reveals that while generated rationales improve answer accuracy, inconsistencies emerge between the model's internal representations in middle layers and those in final layers, potentially undermining the reliability of their reasoning processes. To address this, we propose internal consistency as a measure of the model's confidence by examining the agreement of latent predictions decoded from intermediate layers. Extensive empirical studies across different models and datasets demonstrate that internal consistency effectively distinguishes between correct and incorrect reasoning paths. Motivated by this, we propose a new approach to calibrate CoT reasoning by up-weighting reasoning paths with high internal consistency, resulting in a significant boost in reasoning performance. Further analysis uncovers distinct patterns in attention and feed-forward modules across layers, providing insights into the emergence of internal inconsistency. In summary, our results demonstrate the potential of using internal representations for self-evaluation of LLMs.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# NeRF on-the-go: Exploiting Uncertainity for Distractor-free NeRFs in the Wild

NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild ( http://arxiv.org/abs/2405.18715v1 )

ライセンス: Link先を確認

Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng,

(参考訳) ニューラルネットワーク(Neural Radiance Fields、NeRF)は、静的なシーンのマルチビュー画像からフォトリアリスティックなビューを合成することに成功したが、動いた物体、影、照明変更などの邪魔をする動的な現実世界環境では課題に直面している。既存の手法は、制御された環境と低い閉塞率を管理するが、特に高い閉塞シナリオ下では、レンダリング品質が不足する。本稿では,手軽にキャプチャされた画像列のみから,複雑なシーンにおける新規ビューのロバストな合成を可能にする,シンプルで効果的なNeRF On-the-goを提案する。不確実性に陥りつつも,本手法は捕集に支配的であったとしても,効率的に散逸を除去するだけでなく,顕著に高速な収束速度を実現する。様々な場面における総合的な実験を通して,本手法は最先端技術よりも顕著に改善されていることを示す。この進歩は、多様な動的現実世界のアプリケーションにおいて、NeRFの新しい道を開く。

Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusion scenarios. In this paper, we introduce NeRF On-the-go, a simple yet effective approach that enables the robust synthesis of novel views in complex, in-the-wild scenes from only casually captured image sequences. Delving into uncertainty, our method not only efficiently eliminates distractors, even when they are predominant in captures, but also achieves a notably faster convergence speed. Through comprehensive experiments on various scenes, our method demonstrates a significant improvement over state-of-the-art techniques. This advancement opens new avenues for NeRF in diverse and dynamic real-world applications.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# SketchDeco: カラーでB&Wスケッチをデコレート

SketchDeco: Decorating B&W Sketches with Colour ( http://arxiv.org/abs/2405.18716v1 )

ライセンス: Link先を確認

Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song,

(参考訳) 本稿では,色彩の普遍的な幼児期活動とそのデザイン・ストーリーボードへの応用から着想を得た,色彩のスケッチ化のための新しいアプローチを提案する。精度と利便性のバランスを考慮し,直感的なユーザコントロールを実現するために,地域マスクとカラーパレットを活用し,手動のカラーアサインやテキストプロンプトの制限をクリアする。 ControlNetとステージ生成を戦略的に組み合わせ、安定拡散v1.5を導入し、BLIP-2テキストプロンプトを活用することにより、忠実な画像生成とユーザ指向のカラー化を容易にする。局所的およびグローバルな一貫性の課題に対処するため,我々は,インバージョンスキーム,ガイド付きサンプリング,スケーリング係数を持つ自己保持機構などの発明的なソリューションを採用している。このツールは、高速でトレーニングのないだけでなく、消費者向けのNvidia RTX 4090 Super GPUとも互換性がある。 Project Page: \url{https://chaitron.github.io/SketchDeco/}

This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the limitations of textual prompts. By strategically combining ControlNet and staged generation, incorporating Stable Diffusion v1.5, and leveraging BLIP-2 text prompts, our methodology facilitates faithful image generation and user-directed colourisation. Addressing challenges of local and global consistency, we employ inventive solutions such as an inversion scheme, guided sampling, and a self-attention mechanism with a scaling factor. The resulting tool is not only fast and training-free but also compatible with consumer-grade Nvidia RTX 4090 Super GPUs, making it a valuable asset for both creative professionals and enthusiasts in various fields. Project Page: \url{https://chaitron.github.io/SketchDeco/}

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# ベイジアンパースケーションによる効率的なモデル非依存アライメント

Efficient Model-agnostic Alignment via Bayesian Persuasion ( http://arxiv.org/abs/2405.18718v1 )

ライセンス: Link先を確認

Fengshuo Bai, Mingzhi Wang, Zhaowei Zhang, Boyuan Chen, Yinda Xu, Ying Wen, Yaodong Yang,

(参考訳) 近年の大規模言語モデル(LLM)の進歩により,LLMと人間の意図との合意を維持するための効果的な手法としてアライメントが出現している。現在の手法は、主に監視ファインチューニング(SFT)や人間からのフィードバックからの強化学習(RLHF)を通じて直接訓練される。本稿では,より小さなモデルを用いてブラックボックスの大規模モデルをコーディネートする効率的な手法について検討し,モデルに依存しない軽量ベイズパーステンションアライメントフレームワークを提案する。我々はこの問題を,小型モデルの観点からの信号処理戦略の最適化として定式化する。説得プロセスでは、小さなモデル(アドバイザ)が情報項目(すなわち状態)を観察し、大きなモデル(Receiver)を説得して、改善された応答を引き出す。その後、受信者は、入力、アドバイザからの信号、および情報項目に関する更新された信念に基づいて応答を生成する。筆者らは,本フレームワークを用いてトレーニングを行うことで,様々なタスクにおいて,各種受信者の性能を大幅に向上させることができることを示した。理論的には,我々の説得の枠組みを解析し,助言者の後悔に上限を与え,最適なシグナル伝達戦略を学習する上での有効性を確認した。実験の結果, GPT-2は様々なモデルの性能を著しく向上し, 数学的推論能力は16.1%, コード生成能力は13.7%向上した。ベイズパーステンションの観点からアライメントフレームワークを再考するための最初のステップを、私たちの作業が提供してくれることを願っています。

With recent advancements in large language models (LLMs), alignment has emerged as an effective technique for keeping LLMs consensus with human intent. Current methods primarily involve direct training through Supervised Fine-tuning (SFT) or Reinforcement Learning from Human Feedback (RLHF), both of which require substantial computational resources and extensive ground truth data. This paper explores an efficient method for aligning black-box large models using smaller models, introducing a model-agnostic and lightweight Bayesian Persuasion Alignment framework. We formalize this problem as an optimization of the signaling strategy from the small model's perspective. In the persuasion process, the small model (Advisor) observes the information item (i.e., state) and persuades large models (Receiver) to elicit improved responses. The Receiver then generates a response based on the input, the signal from the Advisor, and its updated belief about the information item. Through training using our framework, we demonstrate that the Advisor can significantly enhance the performance of various Receivers across a range of tasks. We theoretically analyze our persuasion framework and provide an upper bound on the Advisor's regret, confirming its effectiveness in learning the optimal signaling strategy. Our Empirical results demonstrates that GPT-2 can significantly improve the performance of various models, achieving an average enhancement of 16.1% in mathematical reasoning ability and 13.7% in code generation. We hope our work can provide an initial step toward rethinking the alignment framework from the Bayesian Persuasion perspective.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# コンテキスト位置エンコーディング: 重要なものを数えることを学ぶ

Contextual Position Encoding: Learning to Count What's Important ( http://arxiv.org/abs/2405.18719v1 )

ライセンス: Link先を確認

Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar,

(参考訳) 注意機構はLarge Language Models (LLM) の重要なコンポーネントであり、シーケンス内のトークン同士の対話を可能にするが、順序不変である。 PE(Incorporating position encoding)は、i-thトークンへの出席など、位置ごとの対応を可能にする。しかし、現在のPE法ではトークンカウントを用いて位置を導出しているため、i-th文への出席など、より高度な抽象レベルに一般化することはできない。本論文では,モデルによって決定される特定のトークンにのみ位置を増設することにより,コンテキスト上で位置を条件付けることのできる新しい位置符号化手法であるコンテキスト位置符号化(CoPE)を提案する。これにより、$i$-thの特定の単語、名詞、文への出席など、より一般的な位置アドレス付けが可能になる。一般的な位置埋め込みがフェールした場合,CoPEは選択コピー,カウント,フリップフロップといったタスクを解くことができ,言語モデリングやコーディングタスクの難易度を改善することができることを示す。

The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# 視覚言語ナビゲーションのための大規模モデルによる修正可能なランドマーク発見

Correctable Landmark Discovery via Large Models for Vision-Language Navigation ( http://arxiv.org/abs/2405.18721v1 )

ライセンス: Link先を確認

Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang,

(参考訳) Vision-Language Navigation (VLN) は、ターゲット位置に到達するために、エージェントが言語命令に従う必要がある。ナビゲーションを成功させる重要な要因は、指導で暗示されるランドマークを様々な視覚的観察と整合させることである。しかしながら、以前のVLNエージェントは、限られたナビゲーションデータから学習し、十分なオープンワールドアライメント知識がないため、特に探索されていないシーンでは正確なモダリティアライメントを実行できない。本研究では,Currectable LaNdmark DiScOvery と呼ばれる新しい VLN パラダイムをLarge ModEls (CONSOLE) 経由で提案する。 CONSOLEでは、2つの大きなモデルChatGPTとCLIPに基づく新しい修正可能なランドマーク発見スキームを導入することで、VLNをオープンワールドシーケンシャルなランドマーク発見問題として捉えた。具体的には、ChatGPTを使用して、豊かなオープンワールドのランドマークコモンセンスを提供し、これらのコモンセンスに基づいてCLIP駆動のランドマーク発見を行う。視覚的制約の欠如による前者の騒音を軽減するため,学習可能な共起スコアリングモジュールを導入し,実際の観測結果に基づいて各共起の重要度を補正し,正確なランドマーク発見を行う。我々はさらに、異なるVLNエージェントとエレガントな組み合わせのための観察強化戦略を設計し、修正されたランドマーク特徴を用いて行動決定のための観察機能を得る。複数の人気のあるVLNベンチマーク(R2R、REVERIE、R4R、RxR)の大規模な実験結果から、強力なベースラインよりもCONSOLEの顕著な優位性が確認された。特に,我々のCONSOLEは,目に見えないシナリオにおいて,R2RとR4Rの最先端結果を確立している。コードはhttps://github.com/expectorlin/CONSOLEで入手できる。

Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. Especially, our CONSOLE establishes the new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# コンフォーマルデプレッション予測

Conformal Depression Prediction ( http://arxiv.org/abs/2405.18723v1 )

ライセンス: Link先を確認

Yonghong Li, Shan Qu, Xiuzhuang Zhou,

(参考訳) 深層学習に基づく既存の抑うつ認識手法は将来性を示すが、それらの実践的応用は信頼性の欠如によって妨げられ、深層モデルはしばしば「textit{black box}」モデルとして展開されるため、モデル予測の信頼性については不透明である。うつ病認識のようなリスクの高い臨床応用では、意思決定において不確実性定量化が不可欠である。本稿では,共形予測(CP)に基づく不確実性定量化手法である共形抑うつ予測(CDP)を導入する。 CDPはプラグ・アンド・プレイのモジュールで、モデルの再トレーニングも、うつ病データ分布の仮定も必要としない。 CDPは、入力毎の性能保証ではなく、全ての入力に対して平均的な性能保証を提供するため、近似条件付き共形予測法であるCDP-ACCを提案する。 CDP-ACCは、まず、近傍緩和により予測分布を推定し、次に、ネストシーケンスを構成することで、各入力に対してより厳密な予測間隔を提供する等角スコア関数を導入する。 AVEC 2013 と AVEC 2014 データセットにおける不確実性定量化のうつ病認識への応用と CDP と CDP-ACC の有効性と優位性を実証的に実証した。

While existing depression recognition methods based on deep learning show promise, their practical application is hindered by the lack of trustworthiness, as these deep models are often deployed as \textit{black box} models, leaving us uncertain about the confidence of the model predictions. For high-risk clinical applications like depression recognition, uncertainty quantification is essential in decision-making. In this paper, we introduce conformal depression prediction (CDP), a depression recognition method with uncertainty quantification based on conformal prediction (CP), giving valid confidence intervals with theoretical coverage guarantees for the model predictions. CDP is a plug-and-play module that requires neither model retraining nor an assumption about the depression data distribution. As CDP provides only an average performance guarantee across all inputs rather than per-input performance guarantee, we propose CDP-ACC, an improved conformal prediction with approximate conditional coverage. CDP-ACC firstly estimates the prediction distribution through neighborhood relaxation, and then introduces a conformal score function by constructing nested sequences, so as to provide tighter prediction interval for each specific input. We empirically demonstrate the application of uncertainty quantification in depression recognition, and the effectiveness and superiority of CDP and CDP-ACC on the AVEC 2013 and AVEC 2014 datasets

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# 多ラベル特性予測のための階層型プロンプトを用いた微分分子表現の適応

Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction ( http://arxiv.org/abs/2405.18724v1 )

ライセンス: Link先を確認

Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu, Wen Zhang,

(参考訳) 分子特性の正確な予測は、薬物発見の分野において重要である。しかし、既存の手法では、実世界の分子が通常複数の特性ラベルを持つという事実を完全には考慮していない。したがって、分子表現学習モデルは、タスク間の多粒性相関情報を考慮した微分分子表現を生成する必要がある。この目的のために,階層型プロンプト分子表現学習フレームワーク (HiPM) を導入し,タスク認識プロンプトを通じて分子表現におけるタスクの差分表現を強化し,ラベル間の共有情報を用いて異なるタスク間の負の伝達を緩和する。 HiPMは主に、分子表現エンコーダ(MRE)とタスク・アウェア・プロンプタ(TAP)の2つのコアコンポーネントで構成されている。 MREは、原子レベルとモチーフレベルの分子的特徴を捉えるために、階層的メッセージパッシングネットワークアーキテクチャを使用し、TAPは、集約的階層的クラスタリングを使用して、タスクの親和性と特異性を反映したプロンプトツリーを構築し、モデルがマルチラベルプロパティ予測の複雑さを効果的に処理できるようにする。大規模な実験により、HiPMは様々なマルチラベルデータセットにまたがって最先端のパフォーマンスを達成し、マルチラベル分子表現学習の新しい視点を提供する。

Accurate prediction of molecular properties is critical in the field of drug discovery. However, existing methods do not fully consider the fact that molecules in the real world usually possess multiple property labels, and complex high-order relationships may exist among these labels. Therefore, molecular representation learning models should generate differential molecular representations that consider multi-granularity correlation information among tasks. To this end, our research introduces a Hierarchical Prompted Molecular Representation Learning Framework (HiPM), which enhances the differential expression of tasks in molecular representations through task-aware prompts, and utilizes shared information among labels to mitigate negative transfer between different tasks. HiPM primarily consists of two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). The MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atomic and motif levels, while the TAP uses agglomerative hierarchical clustering to build a prompt tree that reflects the affinity and distinctiveness of tasks, enabling the model to effectively handle the complexity of multi-label property predictions. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a new perspective on multi-label molecular representation learning.

翻訳日:2024-05-30 21:04:06 公開日:2024-05-29

# 根拠のないモバイルクラウドセンシングデータの質を高めることは可能か?

Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth? ( http://arxiv.org/abs/2405.18725v1 )

ライセンス: Link先を確認

Jiajie Li, Bo Gu, Shimin Gong, Zhou Su, Mohsen Guizani,

(参考訳) モバイル・クラウドセンシング(MCS)は、様々な領域で顕著なトレンドとなっている。しかし、モバイルユーザ(MU)が送信したセンシングデータの質を保証することは、依然として複雑で困難な問題である。この課題に対処するためには、低品質なセンシングデータを検出し、MCSシステムの正常な操作を妨害する可能性のある悪意のあるMUを特定するための高度な手法が必要である。そこで本稿では,低品質データを高品質データからセンシングタスクで分離可能なPRBTD(Predict- and reputation-based truth discovery)フレームワークを提案する。まず、相関に着目した時空間変換ネットワークを用いて、入力センシングデータの基礎的真実を予測する。そして、予測結果に基づいて特徴としてデータの検知誤差を抽出し、データ間の影響を計算する。最後に、評価に基づく真理探索(TD)モジュールを設計し、低品質なデータとその意味を識別する。 MUが送信したセンシングデータを考えると、PRBTDは重いノイズでデータを排除し、悪意のあるMUを高精度に識別することができる。 PRBTDは同定精度とデータ品質向上の点で既存の手法よりも優れていた。

Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this article proposes a prediction- and reputation-based truth discovery (PRBTD) framework, which can separate low-quality data from high-quality data in sensing tasks. First, we apply a correlation-focused spatial-temporal transformer network to predict the ground truth of the input sensing data. Then, we extract the sensing errors of the data as features based on the prediction results to calculate the implications among the data. Finally, we design a reputation-based truth discovery (TD) module for identifying low-quality data with their implications. Given sensing data submitted by MUs, PRBTD can eliminate the data with heavy noise and identify malicious MUs with high accuracy. Extensive experimental results demonstrate that PRBTD outperforms the existing methods in terms of identification accuracy and data quality enhancement.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# 聴覚処理経路の逆転:fMRIによる粗大な音像再構成

Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI ( http://arxiv.org/abs/2405.18726v1 )

ライセンス: Link先を確認

Che Liu, Changde Du, Xiaoyu Chen, Huiguang He,

(参考訳) 低レベルの音響特徴から高レベルの意味理解に音を変換する人間の聴覚システムの階層的処理からインスピレーションを得て,新しい粗大な音声再構成手法を提案する。非侵襲的機能的磁気共鳴画像(fMRI)データを活用することで,聴覚処理の逆経路を再現する。 CLAPを用いてfMRIデータを低次元のセマンティック空間に粗くデコードし、続いてセマンティック特徴によって導かれる高次元AudioMAE潜在空間に細粒度デコードする。これらの微細な神経機能は、潜在拡散モデル(LDM)によるオーディオ再構成の条件として機能する。 Brain2Sound、Brain2Music、Brain2Speechの3つの公開fMRIデータセットに対する検証は、FD、FAD、KLといったメトリクスで最先端のパフォーマンスを示す、スタンドアローンの微細なアプローチよりも粗大な復号法の方が優れていることを示す。さらに,復号化時に意味的プロンプトを用いることで,意味的特徴が最適でない場合に,再構成音声の品質を向上させる。多様な刺激にまたがるモデルの多角性を示すことは、脳から音声への普遍的な枠組みとしての可能性を浮き彫りにしている。本研究は,人間の聴覚システムの理解に寄与し,神経復号法と音声再構成法の境界を推し進める。

Drawing inspiration from the hierarchical processing of the human auditory system, which transforms sound from low-level acoustic features to high-level semantic understanding, we introduce a novel coarse-to-fine audio reconstruction method. Leveraging non-invasive functional Magnetic Resonance Imaging (fMRI) data, our approach mimics the inverse pathway of auditory processing. Initially, we utilize CLAP to decode fMRI data coarsely into a low-dimensional semantic space, followed by a fine-grained decoding into the high-dimensional AudioMAE latent space guided by semantic features. These fine-grained neural features serve as conditions for audio reconstruction through a Latent Diffusion Model (LDM). Validation on three public fMRI datasets-Brain2Sound, Brain2Music, and Brain2Speech-underscores the superiority of our coarse-to-fine decoding method over stand-alone fine-grained approaches, showcasing state-of-the-art performance in metrics like FD, FAD, and KL. Moreover, by employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal. The demonstrated versatility of our model across diverse stimuli highlights its potential as a universal brain-to-audio framework. This research contributes to the comprehension of the human auditory system, pushing boundaries in neural decoding and audio reconstruction methodologies.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# CtrlA: プローブ誘導制御による適応型検索拡張生成

CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control ( http://arxiv.org/abs/2405.18727v1 )

ライセンス: Link先を確認

Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu,

(参考訳) 大規模言語モデル(LLM)の幻覚を、検索された外部知識で緩和するための有望な解決策として、検索拡張世代(RAG)が出現している。 Adaptive RAGは、検索の必要性を動的に評価し、外部知識と内部知識のバランスをとることによって、このアプローチを強化する。しかし,既存の適応RAG法は,LLMの言語的,あるいは確率的フィードバックに頼って,要求に基づく検索を主に実現し,慎重に構築したデータセットを直接微調整することで,信頼性の低い検索要求決定,高コスト化,および準最適応答生成を実現している。 CtrlAと呼ばれる効果的なプローブ誘導適応RAGフレームワークを導入することで、LCMの内部状態を探索し、そのような問題を緩和する試みを初めて提示する。具体的には、CtrlAは、LLMの内部状態を監視し、信頼度を評価するための信頼プローブと、LLMの表現を操作することによってLCMの振舞いを調節する。実験により、CtrlAは様々なタスクにおいて既存の適応RAG法よりも優れていることが示され、正直な制御によりLLMを効果的に誠実にすることができ、信頼性監視が検索トリガの有望な指標であることが証明された。私たちのコードはhttps://github.com/HSLiu-Initial/CtrlA.git.comで公開されています。

Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying on superficially verbalize-based or probability-based feedback of LLMs, or directly fine-tuning LLMs via carefully crafted datasets, resulting in unreliable retrieval necessity decisions, heavy extra costs, and sub-optimal response generation. We present the first attempts to delve into the internal states of LLMs to mitigate such issues by introducing an effective probe-guided adaptive RAG framework, termed CtrlA. Specifically, CtrlA employs an honesty probe to regulate the LLM's behavior by manipulating its representations for increased honesty, and a confidence probe to monitor the internal states of LLM and assess confidence levels, determining the retrieval necessity during generation. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks, the honesty control can effectively make LLMs more honest and confidence monitoring is proven to be a promising indicator of retrieval trigger. Our codes are available at https://github.com/HSLiu-Initial/CtrlA.git.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# オフライン強化学習のための優先アクション最適化拡散法

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning ( http://arxiv.org/abs/2405.18729v1 )

ライセンス: Link先を確認

Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li, Dongjiang Li, Zecui Zeng, Lei Sun, Yue Chen, Xuelong Wei, Lusong Li, Xiaodong He,

(参考訳) オフライン強化学習(RL)は、以前に収集したデータセットから最適なポリシーを学ぶことを目的としている。近年、その強力な表現能力により、拡散モデルはオフラインのRL問題に対するポリシーモデルとして大きな可能性を示している。しかし、拡散ポリシーに基づく以前のオフラインRLアルゴリズムは、一般的にポリシーを改善するために重み付き回帰を採用する。このアプローチは、収集されたアクションのみを使用してポリシーを最適化し、Q値に敏感であり、さらなるパフォーマンス向上の可能性を制限する。そこで本研究では,オフラインRLのための新たな優先作用最適化拡散ポリシーを提案する。特に、表現的条件拡散モデルを用いて、行動ポリシーの多様な分布を表現する。一方、拡散モデルに基づいて、同じ行動分布内にある好ましい行動が、評論家関数を介して自動的に生成される。さらに, 騒音優先行動に適応し, 安定した訓練を行うことにより, 政策改善を図るために, 雑音優先選択最適化を設計する。特にKitchenやAntMazeのような粗末な報酬タスクにおいて,従来のオフラインRL手法と比較して,提案手法が競争力や優れた性能を提供することを示す。さらに,提案手法の有効性を実証的に証明した。

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach optimizes the policy only using the collected actions and is sensitive to Q-values, which limits the potential for further performance enhancement. To this end, we propose a novel preferred-action-optimized diffusion policy for offline RL. In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy. Meanwhile, based on the diffusion model, preferred actions within the same behavior distribution are automatically generated through the critic function. Moreover, an anti-noise preference optimization is designed to achieve policy improvement by using the preferred actions, which can adapt to noise-preferred actions for stable training. Extensive experiments demonstrate that the proposed method provides competitive or superior performance compared to previous state-of-the-art offline RL methods, particularly in sparse reward tasks such as Kitchen and AntMaze. Additionally, we empirically prove the effectiveness of anti-noise preference optimization.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# VBIM-Net:逆散乱問題に対する変分独立反復ネットワーク

VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems ( http://arxiv.org/abs/2405.18731v1 )

ライセンス: Link先を確認

Ziqing Xing, Zhaoyang Zhang, Zirui Chen, Yusong Wang, Haoran Ma, Zhun Wei, Gang Bao,

(参考訳) 近年,逆散乱問題 (ISP) の解法として,フィールド型反復法と深層学習 (DL) 技術を統合する可能性が高まっている。本稿では,VBIM-Netという新しい変動型ボーン・イテレーティブ・ネットワークを提案する。提案するVBIM-Netは,複数のサブネットワーク層による変動ボルン反復法(VBIM)における全電界とコントラストの交互更新をエミュレートする。我々は,各サブネットワークにコントラスト変動の計算を組み込み,散乱体残差を近似コントラスト変動に変換し,U-Netで拡張することにより,既存のアプローチのように一致した測定寸法とグリッド解像度の要求を回避する。各層の出力の総体とコントラストは、サブネットの変数の物理的解釈性を保証するVBIM-Netの損失関数に制御される。さらに、モデルの安定性を高めるために、余分なノイズを伴うトレーニングスキームを設計する。提案したVBIM-Netのインバージョン品質,一般化能力,ロバスト性を検証した。この研究は、効率的なフィールド型DLスキームの設計に新たなインスピレーションを与えるかもしれない。

Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating updates of the total electric field and the contrast in the variational Born iterative method (VBIM) by multiple layers of subnetworks. We embed the calculation of the contrast variation into each of the subnetworks, converting the scattered field residual into an approximate contrast variation and then enhancing it by a U-Net, thus avoiding the requirement of matched measurement dimension and grid resolution as in existing approaches. The total field and contrast of each layer's output is supervised in the loss function of VBIM-Net, which guarantees the physical interpretability of variables of the subnetworks. In addition, we design a training scheme with extra noise to enhance the model's stability. Extensive numerical results on synthetic and experimental data both verify the inversion quality, generalization ability, and robustness of the proposed VBIM-Net. This work may provide some new inspiration for the design of efficient field-type DL schemes.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# Gemini & Physical World: 大規模言語モデルはマルチモーダルソーシャルメディアポストから地震の震度を推定できる

Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts ( http://arxiv.org/abs/2405.18732v1 )

ライセンス: Link先を確認

S. Mostafa Mousavi, Marc Stogaitis, Tajinder Gadh, Richard M Allen, Alexei Barski, Robert Bosch, Patrick Robertson, Nivetha Thiruverahan, Youngmin Cho,

(参考訳) 本稿では,ソーシャルメディアデータとCCTV映像を用いた地盤揺らぎ強度の推定手法を提案する。マルチモーダル言語モデルであるGemini Pro(Reid et al 2024)モデルを用いて、生成AIと自然言語処理を利用した非構造化データから関連情報を抽出できることを実証する。モデル出力は、MMI(Modified Mercalli Intensity)値の形で、独立した観測データとよく一致している。さらに,ゲミニは,高度な視覚的・聴覚的理解能力の他に,訓練中に獲得したと思われる地震の大きさ,距離,MMI強度の一般的関係の理解の簡易化など,さらなる知識の源泉を生かしていると考えられる。これらの発見は、ジェミニの物理世界に対する一般的な理解の範囲とその現象に関する興味深い疑問を提起する。ゲミニが確立された科学的知識と整合した結果を生成する能力は、ジェミニのようなLLMが地震のような複雑な物理現象の理解を深める可能性を強調している。より具体的には、この研究の結果は、ジェミニのようなLLMが市民の地震学に革命をもたらす可能性を強調し、目撃者によるクラウドソースされたデータの迅速かつ効果的で柔軟な分析を可能にし、地震の影響を評価し、危機的状況認識を提供する。この手法は, 早期警戒システムの改善, 災害対応, 地震発生域全体の回復性向上に大きく貢献する。本研究は,震災対策のためのソーシャルメディアとAIの力を活用するための重要なステップを提供する。

This paper presents a novel approach for estimating the ground shaking intensity using social media data and CCTV footage. Employing the Gemini Pro (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. The ability of Gemini to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# 中国のチェッカーにおける効率的な学習--マルチエージェント強化学習におけるパラメータ共有の比較

Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning ( http://arxiv.org/abs/2405.18733v1 )

ライセンス: Link先を確認

Noah Adhikari, Allen Gu,

(参考訳) 完全パラメータ共有型マルチエージェント強化学習(MARL)は,中国チェッカーの完全情報同種競争において,独立した部分共有アーキテクチャよりも優れることを示す。実験を行うため、可変サイズ6プレーヤチャイナチェッカーという新しいMARL環境を開発した。このカスタム環境はPettingZooで開発され、チェリングジャンプを含むゲームの伝統的なルールをすべてサポートしている。これは私たちの知る限りでは、真のゲームに忠実な中国のチェッカーの最初の実装です。中国のチェッカーは、その大きな分岐係数と潜在的に無限の地平線のために学ぶのが難しい。我々は、他のRLドメインの複雑なアクション空間から分岐アクション(サブムーブ)の概念を借用する。これにより、作用空間の次元が大幅に減少する。我々の観測空間はAlphaGoにインスパイアされ、情報を符号化するために多くのバイナリゲームボードを3Dアレーに積み重ねている。 PettingZoo環境、トレーニングおよび評価ロジック、分析スクリプトは、 \href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}で見ることができる。

We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive perfect-information homogenous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditional rules of the game including chaining jumps. This is, to the best of our knowledge, the first implementation of Chinese Checkers that remains faithful to the true game. Chinese Checkers is difficult to learn due to its large branching factor and potentially infinite horizons. We borrow the concept of branching actions (submoves) from complex action spaces in other RL domains, where a submove may not end a player's turn immediately. This drastically reduces the dimensionality of the action space. Our observation space is inspired by AlphaGo with many binary game boards stacked in a 3D array to encode information. The PettingZoo environment, training and evaluation logic, and analysis scripts can be found on \href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# PillarHist:ハイト・アウェア・ヒストグラムに基づく量子化対応ピラー特徴エンコーダ

PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram ( http://arxiv.org/abs/2405.18734v1 )

ライセンス: Link先を確認

Sifan Zhou, Zhihang Yuan, Dawei Yang, Xubin Wen, Xing Hu, Yuguang Shi, Ziyu Zhao, Xiaobo Lu,

(参考訳) リアルタイムかつ高性能な3Dオブジェクト検出は、自律走行とロボット工学において重要な役割を果たす。最近の柱型3次元物体検出器は、コンパクトな表現と計算オーバーヘッドの低さから注目されており、搭載された配置や量子化に適している。しかし、既存の柱型検出器は、柱特徴符号化(PFE)において、その性能と量子化ポテンシャルを著しく制限する、高さ寸法に沿った情報損失と大きな数値分布差に悩まされている。上記の課題に対処するために,まず,PFE中の異なる入力情報の重要性を明らかにし,その高さ寸法を3次元検出性能向上の鍵となる要素として同定する。そこで本研究では,PillarHistという高さ対応の柱特徴エンコーダを提案する。具体的には、ピラーヒストは1つの柱内の異なる高さの点の離散分布を統計する。このシンプルで効果的な設計は、PFEの計算オーバーヘッドを大幅に減らしながら、高さの寸法に沿った情報を大幅に保存する。一方、PillarHistは、PFE入力の算術的分布を安定範囲に制限し、量子化に親しみやすいようにしている。特に、PillarHistはPFEステージ内でのみ動作してパフォーマンスを向上し、複雑な操作を導入することなく、既存の柱ベースのメソッドへのシームレスな統合を可能にする。大規模な実験は効率と性能の両面でPillarHistの有効性を示している。

Real-time and high-performance 3D object detection plays a critical role in autonomous driving and robotics. Recent pillar-based 3D object detectors have gained significant attention due to their compact representation and low computational overhead, making them suitable for onboard deployment and quantization. However, existing pillar-based detectors still suffer from information loss along height dimension and large numerical distribution difference during pillar feature encoding (PFE), which severely limits their performance and quantization potential. To address above issue, we first unveil the importance of different input information during PFE and identify the height dimension as a key factor in enhancing 3D detection performance. Motivated by this observation, we propose a height-aware pillar feature encoder named PillarHist. Specifically, PillarHist statistics the discrete distribution of points at different heights within one pillar. This simple yet effective design greatly preserves the information along the height dimension while significantly reducing the computation overhead of the PFE. Meanwhile, PillarHist also constrains the arithmetic distribution of PFE input to a stable range, making it quantization-friendly. Notably, PillarHist operates exclusively within the PFE stage to enhance performance, enabling seamless integration into existing pillar-based methods without introducing complex operations. Extensive experiments show the effectiveness of PillarHist in terms of both efficiency and performance.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# WLC-Net: 頑健で高速な深層学習木葉分類法

WLC-Net: a robust and fast deep-learning wood-leaf classification method ( http://arxiv.org/abs/2405.18737v1 )

ライセンス: Link先を確認

Hanlong Li, Pei Wang, Yuhan Wu, Jing Ren, Yuhang Gao, Lingyun Zhang, Mingtai Zhang, Wenxin Chen,

(参考訳) 木葉分類は, 地中レーザースキャン(TLS)点群の解析と推定において必須かつ基本的な前提条件であり, 胸の高さ(DBH), 地中バイオマス(AGB), 木体積などの重要な測定値を含む。これに対処するため, 木葉分類ネットワーク(WLC-Net), 木葉分類ネットワーク(WLC-Net)を導入し, 木葉と葉点を木点クラウド内で区別する深層学習モデルを提案する。WLC-Netは, 線形性を固有の特徴として組み込んだ線形性の向上, 入力出力フレームワークの最適化, 遠心解析手法の最適化などにより, 3種の樹木データセットを用いて評価を行った。さらに、WLC-Netは様々なツリーポイントクラウドに強い適用性を示し、さらなる最適化を約束している。

Wood-leaf classification is an essential and fundamental prerequisite in the analysis and estimation of forest attributes from terrestrial laser scanning (TLS) point clouds,including critical measurements such as diameter at breast height(DBH),above-ground biomass(AGB),wood volume.To address this,we introduce the Wood-Leaf Classification Network(WLC-Net),a deep learning model derived from PointNet++,designed to differentiate between wood and leaf points within tree point clouds.WLC-Net enhances classification accuracy,completeness,and speed by incorporating linearity as an inherent feature,refining the input-output framework,and optimizing the centroid sampling technique.WLC-Net was trained and assessed using three distinct tree species datasets,comprising a total of 102 individual tree point clouds:21 Chinese ash trees,21 willow trees,and 60 tropical trees.For comparative evaluation,five alternative methods,including PointNet++,DGCNN,Krishna Moorthy's method,LeWoS, and Sun's method,were also applied to these datasets.The classification accuracy of all six methods was quantified using three metrics:overall accuracy(OA),mean Intersection over Union(mIoU),and F1-score.Across all three datasets,WLC-Net demonstrated superior performance, achieving OA scores of 0.9778, 0.9712, and 0.9508;mIoU scores of 0.9761, 0.9693,and 0.9141;and F1-scores of 0.8628, 0.7938,and 0.9019,respectively.The time costs of WLC-Net were also recorded to evaluate the efficiency.The average processing time was 102.74s per million points for WLC-Net.In terms of visual inspect,accuracy evaluation and efficiency evaluation,the results suggest that WLC-Net presents a promising approach for wood-leaf classification,distinguished by its high accuracy. In addition,WLC-Net also exhibits strong applicability across various tree point clouds and holds promise for further optimization.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# 多モードLDMにおける逆画像検索キュースパラメトリックメモリ

Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs ( http://arxiv.org/abs/2405.18740v1 )

ライセンス: Link先を確認

Jialiang Xu, Michael Moor, Jure Leskovec,

(参考訳) 近年のMLLM(Multimodal large language model)の顕著な進歩にもかかわらず、GPT-4スイートのような最先端のモデルは、知識集約的なタスクに苦戦している。これを解決するために、逆画像検索(Reverse Image Retrieval、RIR)拡張生成について検討する。 RIRは、GPT-4Vの知識集約型視覚質問応答(VQA)を37-43%、GPT-4 Turboを25-27%、GPT-4oを18-20%改善する。驚いたことに、RIRはモデルが自身の世界知識によりよくアクセスするのに役立ちます。具体的には、RIR拡張は、クエリへの直接応答を必ずしも含まない視覚的およびテキスト的手がかりを提供することによって有効であることを示す。また,RIRがパフォーマンスを損なうようなケースを解明し,人的評価を行う。最後に、RIRを使用することによる全体的なアドバンテージは、RIRをデフォルト設定であるアプローチよりも優れたパフォーマンスを実現するために、RIRを使用するエージェントを選択することが難しくなることに気付きます。

Despite impressive advances in recent multimodal large language models (MLLMs), state-of-the-art models such as from the GPT-4 suite still struggle with knowledge-intensive tasks. To address this, we consider Reverse Image Retrieval (RIR) augmented generation, a simple yet effective strategy to augment MLLMs with web-scale reverse image search results. RIR robustly improves knowledge-intensive visual question answering (VQA) of GPT-4V by 37-43%, GPT-4 Turbo by 25-27%, and GPT-4o by 18-20% in terms of open-ended VQA evaluation metrics. To our surprise, we discover that RIR helps the model to better access its own world knowledge. Concretely, our experiments suggest that RIR augmentation helps by providing further visual and textual cues without necessarily containing the direct answer to a query. In addition, we elucidate cases in which RIR can hurt performance and conduct a human evaluation. Finally, we find that the overall advantage of using RIR makes it difficult for an agent that can choose to use RIR to perform better than an approach where RIR is the default setting.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# Genshin: 大規模言語モデルによる自然言語処理のための汎用シールド

Genshin: General Shield for Natural Language Processing with Large Language Models ( http://arxiv.org/abs/2405.18741v1 )

ライセンス: Link先を確認

Xiao Peng, Tao Liu, Ying Wang,

(参考訳) ChatGPT、Gemini、LLaMAのような大規模言語モデル(LLM)が最近流行し、無数のドメインでかなりの進歩と一般化能力を示している。しかし、LSMはより大きなブラックボックスが不透明度を悪化させ、解釈可能性はほとんどない。 LLMの本質に埋め込まれた不確実性と不透明性は、金融詐欺やフィッシングなどの高額な領域への適用を制限する。現在のアプローチは、主に後方解釈可能なアルゴリズムによる従来のテキスト分類に依存しており、システムの防御を壊すために多種多様な敵のサンプルを作成する攻撃者に悩まされ、ユーザーは効率と堅牢性の間のトレードオフを強制する。この問題に対処するために,LLMを防御的なワンタイムプラグインとして活用する,Genshin(大規模言語モデル付き自然言語処理一般シールド)と呼ばれる新しいカスケーディングフレームワークを提案する。テキストを新しい、あるいは構造的なものに変えようとするLLMのほとんどのアプリケーションとは異なり、源信はLLMを使ってテキストを元の状態に復元する。玄信は、LLMの一般化可能性、中央モデルの識別、単純モデルの解釈可能性を組み合わせることを目的としている。感傷的分析とスパム検出の課題に対する実験により,現在の中央値モデルに致命的な欠陥がみられ,LLMの回復能力が向上し,ゲンシンが効果的かつ効果的であることが確認された。アブレーション研究では、いくつかの興味深い観察を発掘した。第4パラダイムから派生したツールである LLM ディフェンダーを用いて, BERT の最適マスクレート 15% を NLP の第3パラダイムに再現した。さらに、LLMを潜在的な敵ツールとして使用する場合、攻撃者は意味的にほとんど損失のない効果的な攻撃を実行することができる。

Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive one-time plug-ins. Unlike most applications of LLMs that try to transform text into something new or structural, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discrimination of the median model, and the interpretability of the simple model. Our experiments on the task of sentimental analysis and spam detection have shown fatal flaws of the current median models and exhilarating results on LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. In our ablation study, we unearth several intriguing observations. Utilizing the LLM defender, a tool derived from the 4th paradigm, we have reproduced BERT's 15% optimal mask rate results in the 3rd paradigm of NLP. Additionally, when employing the LLM as a potential adversarial tool, attackers are capable of executing effective attacks that are nearly semantically lossless.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# 文法的誘導による音楽的フレーズ分割

Musical Phrase Segmentation via Grammatical Induction ( http://arxiv.org/abs/2405.18742v1 )

ライセンス: Link先を確認

Reed Perkins, Dan Ventura,

(参考訳) 入力シーケンスから文脈自由文法を推論するアルゴリズムのクラスである文法的帰納アルゴリズムを用いた音楽句のセグメンテーションの課題について概説する。様々な音楽的視点の組み合わせを用いて、3つのデータセット上での5つの文法的帰納アルゴリズムの性能を解析する。実験の結果, LONGESTFIRSTアルゴリズムは, 3つのデータセットのすべてで最高のF1スコアを達成し, 時間的視点を含む入力エンコーディングが最高のパフォーマンスをもたらすことがわかった。

We outline a solution to the challenge of musical phrase segmentation that uses grammatical induction algorithms, a class of algorithms which infer a context-free grammar from an input sequence. We analyze the performance of five grammatical induction algorithms on three datasets using various musical viewpoint combinations. Our experiments show that the LONGESTFIRST algorithm achieves the best F1 scores across all three datasets and that input encodings that include the duration viewpoint result in the best performance.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# PermLLM: WAN下での3秒以内の大規模言語モデルのプライベート推論

PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN ( http://arxiv.org/abs/2405.18744v1 )

ライセンス: Link先を確認

Fei Zheng, Chaochao Chen, Zhongxuan Han, Xiaolin Zheng,

(参考訳) ChatGPTの出現は、大きな言語モデル(LLM)時代の到来を表している。 LLMは様々な分野でそのパワーを実証するが、ユーザーのクエリがモデルプロバイダに送られると、深刻なプライバシー上の懸念が生じる。一方、ユーザーのデバイスにLSMをデプロイすると、すべてのモデルデータがリークされる。セキュアなマルチパーティ計算(MPC)に基づく既存の手法は、モデルのパラメータとユーザクエリの両方のプライバシを保護することができた。しかし、1つのトークンだけを生成するには、ギガバイトのデータ転送と数分を要するため、ほとんどの現実世界アプリケーションでは実用的ではない。セキュアなランダムな置換を用いた非線形関数の評価を高速化するPermLLMを提案する。 PermLLMは、最適化された秘密共有プロトコルと同型暗号化とともに、既存のMPCソリューションよりも大幅に高速な10ms RTTと1Gbpsのネットワーク設定の下で、約3s/tokenの速度でChatGLM-6Bモデルの双方向のプライベート推論を実現する。

The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed to protect both the privacy of the model parameters and user queries. However, they require gigabytes of data transfer and several minutes to generate just one token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Along with the optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token, under a realistic network setting (10ms RTT and 1Gbps bandwidth), which is magnitudes faster than existing MPC solutions.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# PanoNormal:単眼の室内360度表面の正規化

PanoNormal: Monocular Indoor 360° Surface Normal Estimation ( http://arxiv.org/abs/2405.18745v1 )

ライセンス: Link先を確認

Kun Huang, Fanglue Zhang, Neil Dodgson,

(参考訳) 等角面上の球面歪みの存在は、表面正規推定のような高密度回帰コンピュータビジョンタスクにおいて明らかな課題である。畳み込みニューラルネットワーク(CNN)の最近の進歩は、球面の歪みを緩和しようとするが、多くの場合、その固定された受容野のために、全体構造を効果的に捉えるのに不足する。一方、視覚変換器(ViT)は、グローバルな自己アテンション機構を通じて長距離依存関係を確立するのに優れるが、局所的な詳細を保存する際の制限に直面する。 CNN と ViT の強度を組み合わせた 360{\deg} 画像のための単分子面正規推定アーキテクチャである \textit{PanoNormal} を紹介する。具体的には,球面特徴分布を考慮した多段階のグローバル自己注意方式を採用し,シーンの包括的理解を高めた。実験結果から,本手法は複数の一般的な360{\deg}単分子データセットにまたがる最先端性能を実現することができることがわかった。コードとモデルはリリースされる。

The presence of spherical distortion on the Equirectangular image is an acknowledged challenge in dense regression computer vision tasks, such as surface normal estimation. Recent advances in convolutional neural networks (CNNs) strive to mitigate spherical distortion but often fall short in capturing holistic structures effectively, primarily due to their fixed receptive field. On the other hand, vision transformers (ViTs) excel in establishing long-range dependencies through a global self-attention mechanism, yet they encounter limitations in preserving local details. We introduce \textit{PanoNormal}, a monocular surface normal estimation architecture designed for 360{\deg} images, which combines the strengths of CNNs and ViTs. Specifically, we employ a multi-level global self-attention scheme with the consideration of the spherical feature distribution, enhancing the comprehensive understanding of the scene. Our experimental results demonstrate that our approach achieves state-of-the-art performance across multiple popular 360{\deg} monocular datasets. The code and models will be released.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# STIQ: 信頼できないクラウドからの量子ニューラルネットワークのトレーニングと推論の保護

STIQ: Safeguarding Training and Inferencing of Quantum Neural Networks from Untrusted Cloud ( http://arxiv.org/abs/2405.18746v1 )

ライセンス: Link先を確認

Satwik Kundu, Swaroop Ghosh,

(参考訳) 現在の量子クラウドプロバイダが課している高コストは、量子リソースの増大の必要性と相まって、潜在的に信頼できないプロバイダからのより安価なクラウドベースの量子サービスの台頭を動機付ける可能性がある。これらの信頼できないプラットフォームに量子ニューラルネットワーク(QNN)などの量子モデルをデプロイしたり、ホストしたりすると、無数のセキュリティ上の懸念が生じ、最も重要なのがモデル盗難である。この脆弱性は、トレーニングや推論中にクラウドプロバイダがこれらの回路に完全にアクセスすることに由来する。そこで本研究では,このようなクラウドベースの敵に対するQNNの保護を目的とした,新たなアンサンブルベースの戦略であるSTIQを紹介する。提案手法は,同一または異なるプラットフォーム上でホストする2つの異なるQNNを同時に訓練し,各ネットワークが難解な出力を出力することにより,クラウド環境内で動作している敵に対して,個々のQNNを非効率に処理する。しかし、これらの出力が(集約関数を使って)局所的に結合されると、正しい結果が明らかになる。様々なQNNやデータセットにわたる広範な実験を通じて、我々の手法は、計算オーバーヘッドの合計で$$\leq 2\times$を犠牲にして、個別にホストされたモデルの精度と損失を最大66%まで効果的に隠蔽することが証明された。しかし、このトレードオフは、クラウドベースの環境でのQNNのセキュリティ強化と整合性のために、信頼できない敵に対して支払う小さな価格である。また、実際の127量子ビットのIBM\_Sherbrookeハードウェア上でSTIQの実用的応用を実証し、STIQが最大60%の難読化を実現し、非難読化モデルに匹敵する性能を併せ持つことを示した。

The high expenses imposed by current quantum cloud providers, coupled with the escalating need for quantum resources, may incentivize the emergence of cheaper cloud-based quantum services from potentially untrusted providers. Deploying or hosting quantum models, such as Quantum Neural Networks (QNNs), on these untrusted platforms introduces a myriad of security concerns, with the most critical one being model theft. This vulnerability stems from the cloud provider's full access to these circuits during training and/or inference. In this work, we introduce STIQ, a novel ensemble-based strategy designed to safeguard QNNs against such cloud-based adversaries. Our method innovatively trains two distinct QNNs concurrently, hosting them on same or different platforms, in a manner that each network yields obfuscated outputs rendering the individual QNNs ineffective for adversaries operating within cloud environments. However, when these outputs are combined locally (using an aggregate function), they reveal the correct result. Through extensive experiments across various QNNs and datasets, our technique has proven to effectively masks the accuracy and losses of the individually hosted models by upto 76\%, albeit at the expense of $\leq 2\times$ increase in the total computational overhead. This trade-off, however, is a small price to pay for the enhanced security and integrity of QNNs in a cloud-based environment prone to untrusted adversaries. We also demonstrated STIQ's practical application by evaluating it on real 127-qubit IBM\_Sherbrooke hardware, showing that STIQ achieves up to 60\% obfuscation, with combined performance comparable to an unobfuscated model.

翻訳日:2024-05-30 18:58:09 公開日:2024-05-29

# 抗体モデルのためのSARS-CoV-2相互作用データセットとVHH系列コーパス

A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models ( http://arxiv.org/abs/2405.18749v1 )

ライセンス: Link先を確認

Hirofumi Tsuruta, Hiroyuki Yamazaki, Ryota Maeda, Ryotaro Tamura, Akihiro Imura,

(参考訳) 抗体は、有害な異物を取り除くために免疫系によって生産される重要なタンパク質であり、ヒト疾患の治療において重要な治療薬となっている。抗体治療の発見を加速するため, 抗体配列を用いた言語モデル構築への関心が高まっている。しかし,ラベル付きデータセットの不足により,事前学習した言語モデルの抗体発見への適用性は十分に評価されていない。 AVIDa-SARS-CoV-2は重症急性呼吸器症候群ウイルス2(SARS-CoV-2)スパイクタンパク質に免疫された2つのアルパサから得られた重鎖抗体(VHH)相互作用の抗原可変ドメインを特徴とするデータセットである。 AVIDa-SARS-CoV-2は、デルタおよびOmicron変異体のような12のSARS-CoV-2変異体への多様なVHH配列の結合または非結合を示すバイナリラベルを含む。さらに,VHHCorpus-2Mは,200万以上のVHH配列を含む,抗体言語モデルの事前学習データセットである。 VHHCorpus-2Mおよび既存の一般タンパク質および抗体特異的言語モデルを用いたVHHBERTを用いたSARS-CoV-2-VHH結合予測のためのベンチマーク結果を報告する。これらの結果は,AVIDa-SARS-CoV-2が結合予測のための抗体言語モデルの表現能力を評価するための貴重なベンチマークを提供し,AI駆動型抗体発見の開発を容易にすることを確認する。データセットはhttps://datasets.cognanous.comで公開されている。

Antibodies are crucial proteins produced by the immune system to eliminate harmful foreign substances and have become pivotal therapeutic agents for treating human diseases. To accelerate the discovery of antibody therapeutics, there is growing interest in constructing language models using antibody sequences. However, the applicability of pre-trained language models for antibody discovery has not been thoroughly evaluated due to the scarcity of labeled datasets. To overcome these limitations, we introduce AVIDa-SARS-CoV-2, a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Furthermore, we release VHHCorpus-2M, a pre-training dataset for antibody language models, containing over two million VHH sequences. We report benchmark results for predicting SARS-CoV-2-VHH binding using VHHBERT pre-trained on VHHCorpus-2M and existing general protein and antibody-specific pre-trained language models. These results confirm that AVIDa-SARS-CoV-2 provides valuable benchmarks for evaluating the representation capabilities of antibody language models for binding prediction, thereby facilitating the development of AI-driven antibody discovery. The datasets are available at https://datasets.cognanous.com.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# T2V-Turbo:Mixed Reward Feedbackによるビデオ一貫性モデルの高品質化

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback ( http://arxiv.org/abs/2405.18750v1 )

ライセンス: Link先を確認

Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang,

(参考訳) 拡散に基づくテキスト・ツー・ビデオ(T2V)モデルは大きな成功を収めたが、反復サンプリングプロセスの遅いサンプリング速度によって妨げられ続けている。この課題に対処するために、サンプル品質のコストにもかかわらず、高速な推論を容易にするために一貫性モデルが提案されている。本稿では,ビデオ一貫性モデル(VCM)の品質ボトルネックを解消し,高速かつ高品質なビデオ生成を実現することを目的としている。本稿では,T2V-Turboについて述べる。このT2V-Turboは,様々な報酬モデルから得られるフィードバックを,事前学習したT2Vモデルの一貫性蒸留(CD)プロセスに統合する。特に、CD損失の計算から自然に生じる単一ステップ世代に関連する報酬を直接最適化し、反復サンプリングプロセスを通じて勾配の逆伝播によるメモリ制約を効果的に回避する。興味深いことに、我々のT2V-Turboの4段階の世代は、Gen-2とPikaを抜いてVBenchで最高スコアを達成した。さらに,T2V-Turboの4ステップ世代は,教師モデルから得られた50ステップのDDIMサンプルよりも好まれ,ビデオ生成品質を向上しつつ,10倍以上の加速を示すことが確認された。

Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve $\textbf{both fast and high-quality video generation}$. We introduce T2V-Turbo, which integrates feedback from a mixture of differentiable reward models into the consistency distillation (CD) process of a pre-trained T2V model. Notably, we directly optimize rewards associated with single-step generations that arise naturally from computing the CD loss, effectively bypassing the memory constraints imposed by backpropagating gradients through an iterative sampling process. Remarkably, the 4-step generations from our T2V-Turbo achieve the highest total score on VBench, even surpassing Gen-2 and Pika. We further conduct human evaluations to corroborate the results, validating that the 4-step generations from our T2V-Turbo are preferred over the 50-step DDIM samples from their teacher models, representing more than a tenfold acceleration while improving video generation quality.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# 条件付きバッチ正規化を用いた補助タスク変調によるマルチモーダルメタラーニングの限界について

On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization ( http://arxiv.org/abs/2405.18751v1 )

ライセンス: Link先を確認

Jordi Armengol-Estapé, Vincent Michalski, Ramnath Kumar, Pierre-Luc St-Charles, Doina Precup, Samira Ebrahimi Kahou,

(参考訳) 少ないショット学習は、少数の例から見れば、新しいタスクに対処できる表現を学習することを目的としている。近年の研究では、クロスモーダル学習は、数発の分類において表現を改善することが示されている。より具体的に言えば、言語は視覚学習を導くのに使える豊富なモダリティである。本研究では, 分類器, 補助ネットワーク, ブリッジネットワークという3つのコンポーネントから構成される, 数ショット学習のためのマルチモーダルアーキテクチャを実験する。分類器が主分類タスクを実行する間、補助ネットワークは同じ入力から言語表現を予測することを学習し、ブリッジネットワークは、補助ネットワークの高レベルな特徴を条件付きバッチ正規化を用いて、少数ショット分類器の層に対する変調パラメータに変換する。このブリッジは、言語と視覚の間の軽量なセマンティックアライメントの形式を奨励し、分類器に役立てるべきである。しかし、2つの一般的な数ショット分類ベンチマークに対する提案されたアプローチを評価すると、そのことが分かる。 a) 改善はベンチマーク全体にわたって再現されず、 b)ブリッジネットワークによって導入された計算とパラメータの追加による改善。言語表現を用いたマルチモーダルなメタラーニングにおける今後の研究に対する洞察と提言に貢献する。

Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# 再現性危機の解決--認証ロバスト性検証の事例から

Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness ( http://arxiv.org/abs/2405.18753v1 )

ライセンス: Link先を確認

Richard H. Moulton, Gary A. McCully, John D. Hastings,

(参考訳) 再現性は科学的研究の基盤であり、検証、拡張、進歩を可能にする。しかし、ソフトウェアと依存関係の急速に進化する性質は、特に複雑なコードベースと特殊なツールキットが使用されるディープニューラルネットワークの対角的堅牢性のような分野において、研究結果を再現する上で大きな課題を生んでいる。本稿では,VeriGauge ツールキットを用いた "SoK: Certified Robustness for Deep Neural Networks" における検証結果の検証を試みる。ドキュメント化された方法論に従えば、古い依存関係や利用できない依存関係、バージョンコンフリクト、ドライバの不互換性など、多くのソフトウェアとハードウェアの互換性の問題が発生した。元の結果のサブセットを走らせることができたが、これらの技術的障害と試験結果のわずかな相違により、様々な検証手法の実証的堅牢な精度に関する重要な発見が発覚した。この実践的な経験は、再現可能性の欠如が科学的完全性を脅かし、進歩を妨げる敵の堅牢性研究に支障をきたす再現性危機に光を当てている。本稿では,コンテナ化やソフトウェア保存,包括的なドキュメントプラクティスといった潜在的なソリューションを提案する。さらに、再現可能な研究のための堅牢なフレームワークを開発するために、研究コミュニティ内でのコラボレーションと標準化の取り組みの必要性を強調している。本研究は, 再現性危機に先立ち, 科学的再現性に関する現在進行中の談話に貢献することを目的としており, 研究成果の信頼性と妥当性を, 敵の堅牢性だけでなく, セキュリティ・技術研究全般において保証するベストプラクティスを提唱する。

Reproducibility is a cornerstone of scientific research, enabling validation, extension, and progress. However, the rapidly evolving nature of software and dependencies poses significant challenges to reproducing research results, particularly in fields like adversarial robustness for deep neural networks, where complex codebases and specialized toolkits are utilized. This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit. Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities. While a subset of the original results could be run, key findings related to the empirical robust accuracy of various verification methods proved elusive due to these technical obstacles, as well as slight discrepancies in the test results. This practical experience sheds light on the reproducibility crisis afflicting adversarial robustness research, where a lack of reproducibility threatens scientific integrity and hinders progress. The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices. Furthermore, it highlights the need for collaboration and standardization efforts within the research community to develop robust frameworks for reproducible research. By addressing the reproducibility crisis head-on, this work aims to contribute to the ongoing discourse on scientific reproducibility and advocate for best practices that ensure the reliability and validity of research findings within not only adversarial robustness, but security and technology research as a whole.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# GIST: 異種データ要約のためのGreedy Independent Set Thresholding

GIST: Greedy Independent Set Thresholding for Diverse Data Summarization ( http://arxiv.org/abs/2405.18754v1 )

ライセンス: Link先を確認

Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo,

(参考訳) 本稿では,機械学習,例えばデータサンプリング,特徴選択など,多種多様な応用が可能な,min-distance various data summarization(\textsf{MDDS}$)と呼ばれる新しいサブセット選択タスクを提案する。計量空間における点の集合が与えられたとき、ゴールは、点の全体の効用と、任意の選択された点間の最小距離を、制約$|S| \le k$の条件で捉える多様性項を組み合わせた目的を最大化することである。例えば、このポイントは、深層ニューラルネットワークから抽出された画像の学習された埋め込みなど、データサンプリング問題におけるトレーニング例に対応できる。この研究は$\texttt{GIST}$アルゴリズムを示し、bicriteria greedyアルゴリズムで一連の最大独立集合問題を近似することで$\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$を達成する。また、任意の$\varepsilon > 0$に対して、相補的な$(\frac{2}{3}+\varepsilon)$-近似の硬度も証明する。最後に,$\texttt{GIST}$が合成データに対して$\textsf{MDDS}$の既存の手法よりも優れており,実世界の画像分類実験ではImageNetの単発サブセット選択について検討する。

We propose a novel subset selection task called min-distance diverse data summarization ($\textsf{MDDS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points, subject to the constraint $|S| \le k$. For example, the points may correspond to training examples in a data sampling problem, e.g., learned embeddings of images extracted from a deep neural network. This work presents the $\texttt{GIST}$ algorithm, which achieves a $\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove a complementary $(\frac{2}{3}+\varepsilon)$-hardness of approximation, for any $\varepsilon > 0$. Finally, we provide an empirical study that demonstrates $\texttt{GIST}$ outperforms existing methods for $\textsf{MDDS}$ on synthetic data, and also for a real-world image classification experiment the studies single-shot subset selection for ImageNet.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# 確率的コントラスト型連続学習

Provable Contrastive Continual Learning ( http://arxiv.org/abs/2405.18756v1 )

ライセンス: Link先を確認

Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang,

(参考訳) 継続的な学習には、動的データ分散による漸進的なタスクの学習が必要である。これまでに, コントラスト損失と蒸留損失を併用して連続学習の訓練を行うことで, 高い性能が得られることが確認されている。しかし、私たちの知る限りでは、この対照的な連続的な学習フレームワークは、説得力のある理論的な説明を欠いている。本研究では,このギャップを理論的性能保証の確立によって埋める。これは,モデルの性能が,対照的な連続学習フレームワークにおいて,従来のタスクのトレーニング損失によっていかに境界づけられているかを明らかにする。我々の理論的な説明は、事前学習が継続的な学習に役立つという考えをさらに支持する。これらの保証の理論的解析から着想を得て、異なるタスクに適応蒸留係数を用いるCILAと呼ばれる新しいコントラスト型連続学習アルゴリズムを提案する。これらの蒸留係数は、従来の平均蒸留損失と平均コントラスト損失との比で容易に計算できる。提案手法は,標準ベンチマークの精度を向上し,新しい最先端性能を実現する。

Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by training losses of previous tasks in the contrastive continual learning framework. Our theoretical explanations further support the idea that pre-training can benefit continual learning. Inspired by our theoretical analysis of these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. These distillation coefficients are easily computed by the ratio between average distillation losses and average contrastive losses from previous tasks. Our method shows great improvement on standard benchmarks and achieves new state-of-the-art performance.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# ベイズ原理で継続的に学ぶこと

Learning to Continually Learn with the Bayesian Principle ( http://arxiv.org/abs/2405.18758v1 )

ライセンス: Link先を確認

Soochan Lee, Hyeonseong Jeon, Jaehyeon Son, Gunhee Kim,

(参考訳) ディープラーニングの現代において、継続学習の研究は主に、非定常的なデータストリーム上で確率的勾配勾配のニューラルネットワークをトレーニングする際の忘れを緩和することに焦点を当てている。一方、統計機械学習の古典的な文献では、多くのモデルは、バッチトレーニングと同じ学習結果をもたらす逐次ベイズ更新規則を持つ。しかし、それらはしばしば複雑な実世界のデータをモデル化するのに非常に単純である。本研究では、ニューラルネットワークの強力な表現力と、忘れることに対する単純な統計モデルの堅牢性を組み合わせたメタラーニングパラダイムを採用する。我々の新しいメタ連続学習フレームワークでは、連続学習は理想的な逐次ベイズ更新規則を介して統計モデルでのみ行われ、ニューラルネットワークは生データと統計モデルをブリッジするためにメタ学習される。ニューラルネットワークは継続学習中に固定されているため、破滅的な忘れ物から保護されている。このアプローチはパフォーマンスを大幅に向上するだけでなく、優れたスケーラビリティも発揮します。このアプローチはドメインに依存しないモデルに依存しないため、幅広い問題に適用でき、既存のモデルアーキテクチャと容易に統合できる。

In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as the batch training, i.e., they are completely immune to catastrophic forgetting. However, they are often overly simple to model complex real-world data. In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting. In our novel meta-continual learning framework, continual learning takes place only in statistical models via ideal sequential Bayesian update rules, while neural networks are meta-learned to bridge the raw data and the statistical models. Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting. This approach not only achieves significantly improved performance but also exhibits excellent scalability. Since our approach is domain-agnostic and model-agnostic, it can be applied to a wide range of problems and easily integrated with existing model architectures.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# FDQN: ゲーム自動化のための柔軟なQ-ネットワークフレームワーク

FDQN: A Flexible Deep Q-Network Framework for Game Automation ( http://arxiv.org/abs/2405.18761v1 )

ライセンス: Link先を確認

Prabhath Reddy Gujavarthy,

(参考訳) 強化学習では、特にドメインがリアルタイムのオンラインインタラクションやWebゲームのような適応戦略を必要とする場合、動的環境における高次元的、迅速な意思決定を自動化することがしばしば困難である。本研究は,CNNを用いて高次元センサデータをリアルタイムに処理し,異なるゲーム環境のさまざまなアクション空間にモデルアーキテクチャを動的に適用し,さまざまなAtariゲームやChrome Dinoゲームにおいて,以前のベースラインモデルをベースラインとして向上させるという,この課題に対処可能な,最先端のフレキシブルQネットワーク(FDQN)フレームワークを提案する。 epsilon-greedyポリシを使うことで、パフォーマンス向上のために新しい学習と利用のバランスを効果的に保ち、フレームワークのコア部分に触れることなく、他のHTMLベースのゲームに容易に適応できるモジュール構造で設計されている。 FDQNフレームワークは、実験室条件下でよく定義されたタスクをうまく解決できることが実証されているが、より重要なことは、より困難な現実世界のケースへの潜在的な応用を議論し、自動化されたゲームプレイ以降のさらなる探索の出発点として機能することである。

In reinforcement learning, it is often difficult to automate high-dimensional, rapid decision-making in dynamic environments, especially when domains require real-time online interaction and adaptive strategies such as web-based games. This work proposes a state-of-the-art Flexible Deep Q-Network (FDQN) framework that can address this challenge with a selfadaptive approach that is processing high-dimensional sensory data in realtime using a CNN and dynamically adapting the model architecture to varying action spaces of different gaming environments and outperforming previous baseline models in various Atari games and the Chrome Dino game as baselines. Using the epsilon-greedy policy, it effectively balances the new learning and exploitation for improved performance, and it has been designed with a modular structure that it can be easily adapted to other HTML-based games without touching the core part of the framework. It is demonstrated that the FDQN framework can successfully solve a well-defined task in a laboratory condition, but more importantly it also discusses potential applications to more challenging real-world cases and serve as the starting point for future further exploration into automated game play and beyond.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# Inpaint Biases: 正確な画像生成と不偏画像生成のための道

Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation ( http://arxiv.org/abs/2405.18762v1 )

ライセンス: Link先を確認

Jiyoon Myung, Jihyeon Park,

(参考訳) 本稿では、訓練データセットにほとんど表現されない、あるいは欠落している非伝統的な概念を正確にレンダリングする際の高度なテキスト・画像モデルの限界について検討する。これらの制限が、これらのモデルの創造的可能性を限定するだけでなく、ステレオタイプを補強するリスクも生じさせる。これらの課題に対処するために,ユーザ定義マスクとインペイント技術を用いたInpaint Biasesフレームワークを導入し,特に新規あるいは不正確なオブジェクトに対して,画像生成の精度を向上させる。実験的な検証を通じて、このフレームワークが生成した画像の忠実度をユーザの意図に大きく改善し、それによってモデルの創造能力を拡大し、バイアスを緩和するリスクを緩和することを示す。本研究は,創造的表現のための非バイアスで汎用的なツールとして,テキスト・ツー・イメージ・モデルの進歩に寄与する。

This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts which are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# BCIにおける脳波データを用いたジェネリック表現学習のための大脳モデル

Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI ( http://arxiv.org/abs/2405.18765v1 )

ライセンス: Link先を確認

Wei-Bang Jiang, Li-Ming Zhao, Bao-Liang Lu,

(参考訳) 現在の脳波に基づくディープラーニングモデルは、典型的には、脳-コンピュータ相互作用(BCI)における特定のデータセットや応用のために設計されており、モデルの規模を制限し、知覚能力と一般化性を低下させる。近年,Large Language Models (LLMs) はテキスト処理において前例のない成功を収めており,Large EEG Models (LEMs) の機能を探究している。我々は,LEMが脳波データセットの異なるタスクタイプの制限を突破し,教師なし事前学習を通じて脳波信号の普遍的知覚能力を得ることを期待している。次に、モデルは異なる下流タスクのために微調整できる。しかし、テキストデータと比較すると、EEGデータセットのボリュームは概して小さく、フォーマットは様々である。例えば、電極のミスマッチ数、不等長データサンプル、様々なタスク設計、低信号対雑音比がある。これらの課題を克服するため、我々はLarge Brain Model (LaBraM) と呼ばれる脳波の統一基盤モデルを提案する。 LaBraMは、EEG信号をEEGチャネルパッチにセグメント化することで、データセット間の学習を可能にする。ベクトル量子化されたニューラルスペクトル予測は、連続的な生のEEGチャネルパッチをコンパクトなニューラルコードにエンコードするセマンティックにリッチなニューラルトークンを訓練するために使用される。次に、マスクされたEEGチャネルパッチの元のニューラルコードを予測することにより、ニューラルトランスフォーマーを事前訓練する。 LaBraMは、約20のデータセットから約2500時間のさまざまなEEG信号で事前トレーニングされ、複数の異なる下流タスクで検証された。異常検出,事象型分類,感情認識,歩行予測実験の結果,LaBraMはそれぞれの分野でSOTA法よりも優れていた。私たちのコードはhttps://github.com/935963004/LaBraM.comで公開されています。

The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# RNA Flow:逆フォールディングフローマッチングによるRNA構造とシーケンス設計

RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching ( http://arxiv.org/abs/2405.18768v1 )

ライセンス: Link先を確認

Divya Nori, Wengong Jin,

(参考訳) 多様な生物学的応用におけるRNA工学の重要性の高まりにより、構造に基づくRNA設計のためのAI手法の開発への関心が高まっている。拡散モデルはタンパク質設計において優れているが、RNAに適応させることは、RNAのコンフォメーションの柔軟性と大きな構造予測モデルを微調整する計算コストにより、新しい課題をもたらす。そこで本研究では,タンパク質条件のRNA配列構造設計のためのフローマッチングモデルであるRNAFlowを提案する。そのデノナイジングネットワークはRNA逆フォールディングモデルと事前訓練されたRosettaFold2NAネットワークを統合し、RNA配列と構造を生成する。構造記述過程における逆折り畳みの統合により,構造予測ネットワークの修正によるトレーニングの簡易化が可能となる。我々は、動的RNAコンフォメーションをモデル化するために、推論されたコンフォメーションアンサンブルに条件付けすることで、逆折り畳みモデルをさらに強化する。タンパク質条件のRNA構造と配列生成タスクの評価は、既存のRNA設計手法に対するRNAFlowの優位性を示している。

The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# OUS:Scene-Guided Dynamic Facial Expression Recognition

OUS: Scene-Guided Dynamic Facial Expression Recognition ( http://arxiv.org/abs/2405.18769v1 )

ライセンス: Link先を確認

Xinji Mai, Haoran Wang, Zeng Tao, Junxiong Lin, Shaoqi Yan, Yan Wang, Jing Liu, Jiawen Yu, Xuan Tong, Yating Li, Wenqiang Zhang,

(参考訳) 動的顔表情認識(DFER)は情緒的コンピューティングには不可欠であるが、シーンコンテキストの影響を見落としていることが多い。人間のアノテータは、環境手がかりやボディランゲージなど、さまざまな角度から感情を統合するのが一般的であるのに対して、既存のDFERメソッドでは、シーンを、顔情報にのみ焦点をあてて、フィルタリングが必要なノイズとして考える傾向があります。これを「剛性認知問題」と呼ぶ。 Rigid Cognitive Problemは、いくつかのサンプルにおいて、アノテーションの認識とモデルの間に相違をもたらす可能性がある。感情の人間の認知パラダイムとより緊密に一致させるために,情景DFER法(OUS)の総合的理解を提案する。 OUSはシーンと顔の特徴を効果的に統合し、DFERのシーン固有の感情的知識を組み合わせる。 DFERフィールドにおける2つの大きなデータセットであるDFEWとFERV39kに関する大規模な実験は、ousが既存の手法よりも大幅に優れていることを示した。 Rigid Cognitive Problemを解析することにより、ousはシーンコンテキストと感情表現の複雑な関係をうまく理解し、現実世界のシナリオにおける人間の感情的理解と密接に一致させる。

Dynamic Facial Expression Recognition (DFER) is crucial for affective computing but often overlooks the impact of scene context. We have identified a significant issue in current DFER tasks: human annotators typically integrate emotions from various angles, including environmental cues and body language, whereas existing DFER methods tend to consider the scene as noise that needs to be filtered out, focusing solely on facial information. We refer to this as the Rigid Cognitive Problem. The Rigid Cognitive Problem can lead to discrepancies between the cognition of annotators and models in some samples. To align more closely with the human cognitive paradigm of emotions, we propose an Overall Understanding of the Scene DFER method (OUS). OUS effectively integrates scene and facial features, combining scene-specific emotional knowledge for DFER. Extensive experiments on the two largest datasets in the DFER field, DFEW and FERV39k, demonstrate that OUS significantly outperforms existing methods. By analyzing the Rigid Cognitive Problem, OUS successfully understands the complex relationship between scene context and emotional expression, closely aligning with human emotional understanding in real-world scenarios.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# ビジュアル・ランゲージ・アタックに対する防御のための多対多関係の活用

Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks ( http://arxiv.org/abs/2405.18770v1 )

ライセンス: Link先を確認

Futa Waseda, Antonio Tejero-de-Pablos,

(参考訳) 近年の研究では、視覚言語(VL)モデルが画像テキスト検索(ITR)の敵攻撃に対して脆弱であることが示されている。しかし、既存のVLモデルの防衛戦略は、画像とテキストの同時操作を考慮しないゼロショット画像分類と、複数の方法で単一の画像を記述することができるITR固有の多対多(N:N)の性質に重点を置いている。そこで本研究では,ITRのVLモデルに対する敵攻撃に対する防衛戦略を初めて検討した。特に,敵の強靭性を高めるため,IMRにおけるN:N関係の活用に着目する。列車データ中の1対1画像テキストペアに対して, 対角訓練は容易にオーバーフィットするが, 1対1(N:N)/多対1(N:1)画像テキストペアを作成するための多様な拡張技術は, VLモデルの対角的ロバスト性を大幅に向上させることができることがわかった。さらに, 画像・テキスト・ペアのアライメントは, 防御戦略の有効性に不可欠であり, 不適切な拡張はモデルの性能を低下させる可能性があることを示す。そこで本研究では,IMRにおけるN:N関係を利用した新たな防衛戦略を提案し,基本拡張と生成モデルに基づく拡張を用いて,多種多様かつ高整合なN:Nペアを効果的に生成する。この研究は、VLタスクにおける敵の攻撃を防御する新しい視点を提供し、将来の研究のための新たな研究方向を開く。

Recent studies have revealed that vision-language (VL) models are vulnerable to adversarial attacks for image-text retrieval (ITR). However, existing defense strategies for VL models primarily focus on zero-shot image classification, which do not consider the simultaneous manipulation of image and text, as well as the inherent many-to-many (N:N) nature of ITR, where a single image can be described in numerous ways, and vice versa. To this end, this paper studies defense strategies against adversarial attacks on VL models for ITR for the first time. Particularly, we focus on how to leverage the N:N relationship in ITR to enhance adversarial robustness. We found that, although adversarial training easily overfits to specific one-to-one (1:1) image-text pairs in the train data, diverse augmentation techniques to create one-to-many (1:N) / many-to-one (N:1) image-text pairs can significantly improve adversarial robustness in VL models. Additionally, we show that the alignment of the augmented image-text pairs is crucial for the effectiveness of the defense strategy, and that inappropriate augmentations can even degrade the model's performance. Based on these findings, we propose a novel defense strategy that leverages the N:N relationship in ITR, which effectively generates diverse yet highly-aligned N:N pairs using basic augmentations and generative model-based augmentations. This work provides a novel perspective on defending against adversarial attacks in VL tasks and opens up new research directions for future work.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# 環境制約付き最大被覆問題に対する信頼性のある微分制約の展開

Evolving Reliable Differentiating Constraints for the Chance-constrained Maximum Coverage Problem ( http://arxiv.org/abs/2405.18772v1 )

ライセンス: Link先を確認

Saba Sadeghi Ahouei, Jacob de Nobel, Aneta Neumann, Thomas Bäck, Frank Neumann,

(参考訳) チャンス制約問題(英語版)は、小さな確率で破ることができる制約の確率的成分を含む。確率制約が反復探索アルゴリズムの性能に与える影響について検討し,確率制約のあるグラフにおける古典的最大被覆問題について検討する。我々のゴールは、アルゴリズムの性能が期待されるだけでなく、高い信頼性で著しく異なるグラフに対する信頼性の高い確率制約設定を進化させることである。これにより、異なるタイプのアルゴリズムが、異なるタイプの制約設定にどのように対処できるかを学習し、理解し、自動アルゴリズム選択をサポートすることができる。本研究では、2つの確率探索アルゴリズムの性能を高い信頼性で区別する確率制約セットを提供する進化的アルゴリズムを開発する。まず、(1+1)〜EAの適合度関数として従来の近似比を用いてインスタンスを進化させ、信頼性のあるインスタンスを生成するのに不適切であることを示す。そこで本研究では,性能比の分散を考慮した2つのアルゴリズムの性能差を計算するための新しい尺度を提案する。実験の結果,本手法は性能比の不安定性の問題の解決に成功しており,様々なアルゴリズムの性能に大きく違いがあるような,信頼性の高い確率制約セットの進化に繋がることがわかった。

Chance-constrained problems involve stochastic components in the constraints which can be violated with a small probability. We investigate the impact of different types of chance constraints on the performance of iterative search algorithms and study the classical maximum coverage problem in graphs with chance constraints. Our goal is to evolve reliable chance constraint settings for a given graph where the performance of algorithms differs significantly not just in expectation but with high confidence. This allows to better learn and understand how different types of algorithms can deal with different types of constraint settings and supports automatic algorithm selection. We develop an evolutionary algorithm that provides sets of chance constraints that differentiate the performance of two stochastic search algorithms with high confidence. We initially use traditional approximation ratio as the fitness function of (1+1)~EA to evolve instances, which shows inadequacy to generate reliable instances. To address this issue, we introduce a new measure to calculate the performance difference for two algorithms, which considers variances of performance ratios. Our experiments show that our approach is highly successful in solving the instability issue of the performance ratios and leads to evolving reliable sets of chance constraints with significantly different performance for various types of algorithms.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# LLaMA-Reg : LLaMA 2による医用画像の無監督登録

LLaMA-Reg: Using LLaMA 2 for Unsupervised Medical Image Registration ( http://arxiv.org/abs/2405.18774v1 )

ライセンス: Link先を確認

Mingrui Ma, Yu Yang,

(参考訳) 医用画像登録は, 医用画像解析において重要な課題である。本稿では,事前訓練された大言語モデルを用いた医用画像登録手法を提案する。事前訓練された大言語モデルを用いて、医用画像の深い特徴を登録モデルにエンコードすることで、医用画像登録タスクにおける大言語モデルの可能性を示す画像登録精度を効果的に向上させることができる。デュアルエンコーダを用いて、画像ペアの深い特徴抽出を行い、事前訓練された大言語モデルに特徴を入力します。登録タスクに大規模な言語モデルを適応させるためには、登録モデルにおいて大きな言語モデルの重みを凍結し、大きな言語モデルを微調整するためにアダプタを利用する。 (a)大きな言語モデルコンピューティングの前に、視覚トークンを言語空間にマッピングする。 b) 大規模言語モデルから視覚空間へ出力されるモデル化された言語トークンを計画する。提案手法は,微調整された大言語モデルから出力する特徴と,各エンコーダ層から出力する特徴とを組み合わせて,デコーダの登録に必要な変形場を徐々に生成する。登録作業における大きな予測モデルの有効性を実証するため, 膝・脳MRI実験を行い, 最先端の結果を得た。

Medical image registration is an essential topic in medical image analysis. In this paper, we propose a method for medical image registration using a pretrained large language model. We find that using the pretrained large language model to encode deep features of the medical images in the registration model can effectively improve image registration accuracy, indicating the great potential of the large language model in medical image registration tasks. We use dual encoders to perform deep feature extraction on image pairs and then input the features into the pretrained large language model. To adapt the large language model to our registration task, the weights of the large language model are frozen in the registration model, and an adapter is utilized to fine-tune the large language model, which aims at (a) mapping the visual tokens to the language space before the large language model computing, (b) project the modeled language tokens output from the large language model to the visual space. Our method combines output features from the fine-tuned large language model with the features output from each encoder layer to gradually generate the deformation fields required for registration in the decoder. To demonstrate the effectiveness of the large prediction model in registration tasks, we conducted experiments on knee and brain MRI and achieved state-of-the-art results.

翻訳日:2024-05-30 18:48:25 公開日:2024-05-29

# LMO-DP:微分プライベート微調整(大規模)言語モデルのランダム化機構の最適化

LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models ( http://arxiv.org/abs/2405.18776v1 )

ライセンス: Link先を確認

Qin Yang, Meisam Mohammad, Han Wang, Ali Payani, Ashish Kundu, Kai Shu, Yan Yan, Yuan Hong,

(参考訳) 大規模訓練済みの大規模言語モデルのための厳密なプライバシを確保するために,DP-SGDとその変種を識別的にプライベートな確率勾配Descent (DP-SGD) が提案されている。しかし、特により強力なプライバシー体制(例えば、プライバシー予算$\epsilon < 3$)では、過度に勾配を乱し、精度を低下させるガウスのメカニズムに大きく依存している。このような制約に対処するため、我々はLMO-DP(Language Model-based Optimal Differential Privacy)メカニズムを提案する。これは、強力なプライバシ体制(例えば、$0.1\leq \epsilon<3$)であっても、高度に微調整された(大規模)言語モデルの厳密な構成を可能にするための第一歩である。さらに,提案手法は,雑音の大きさを著しく低減するサブ最適DPを効率的に導出する,新しいオフライン最適雑音探索法を提案する。例えば、SST-2データセット上の細調整のRoBERTa-large(300万のパラメータを持つ)は、ガウス機構(例えば、小さな$\epsilon$と$\delta$に対して$\sim 50\%$)を大幅に上回ることによって、92.20%の精度($\epsilon=0.3$, $\delta=10^{-10}$)を達成することができる。また,GPT-2におけるテキスト生成タスクについても同様の知見が得られた。最後に、私たちの知る限り、LMO-DPは強力な差分プライバシー保証を持つLlama-2を正確に微調整する最初のソリューションでもある。コードは間もなくリリースされ、要求に応じて利用可能になる。

Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $\epsilon < 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1\leq \epsilon<3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large (with 300M parameters) on the SST-2 dataset can achieve an accuracy of 92.20% (given $\epsilon=0.3$, $\delta=10^{-10}$) by drastically outperforming the Gaussian mechanism (e.g., $\sim 50\%$ for small $\epsilon$ and $\delta$). We also draw similar findings on the text generation tasks on GPT-2. Finally, to our best knowledge, LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees. The code will be released soon and available upon request.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# SPABA: 最適サンプル複素性を実現する単一ループ確率確率確率的二値アルゴリズム

SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity ( http://arxiv.org/abs/2405.18777v1 )

ライセンス: Link先を確認

Tianshu Chu, Dachuan Xu, Wei Yao, Jin Zhang,

(参考訳) 機械学習における大規模ネスト最適化問題に対する確率的双レベル最適化手法は広く研究されているが、双レベル最適化を解くための最適複雑性境界がシングルレベル最適化と同一であるかどうかには疑問が残る。 SPABAは, (Li et al , 2021) における非凸最適化のための PAGE 法のバイレベル設定への適応であり, 有限サムおよび期待設定の両方において最適なサンプル複雑性を実現することができる。 PAGEを実装する際に,確率的双レベルと単一レベルの最適化の間に複雑性解析のギャップがないことを証明し,SPABAの最適性を示す。特に、 (Dagr\'eou et al , 2022) の結果によって示されるように、SGD や SAGA のような他の確率勾配推定器を実装する際には、複雑性解析のギャップが存在する可能性がある。 SPABAに加えて、我々は、我々の収束率と複雑性解析を利用して、最先端のサンプル複雑性結果に適合または改善する、他のいくつかのシングルループ確率的二値アルゴリズムを提案する。数値実験により提案手法の実用的な性能を実証した。

While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagr\'eou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms, that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 大規模言語モデルにおけるバイアスの定量化

Quantitative Certification of Bias in Large Language Models ( http://arxiv.org/abs/2405.18780v1 )

ライセンス: Link先を確認

Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza Ziyadi, Rahul Gupta, Gagandeep Singh,

(参考訳) 大規模言語モデル(LLM)は、社会的バイアスを示し、ステレオタイプをサポートする応答を生成することができる。しかし、従来のベンチマークでは、大量のプロンプトにスケールできないため、LLMバイアスを徹底的に評価するには不十分であり、保証は提供されない。そこで本稿では,提案する新たな認証フレームワークであるQuaCer-B(Quantitative Certification of Bias)を提案する。証明書は、分布からサンプリングされた機密属性を含むプロンプトの集合に対して、LSMからバイアス応答を得る確率に関する高信頼境界から構成される。与えられた分布から様々な接頭辞を引いたプロンプトに対するLLMのバイアス認証について説明する。ランダムなトークンシーケンス,手動のジェイルブレイクの混合,およびLDMの埋め込み空間におけるジェイルブレイクの分布について検討し,そのバイアスを証明した。我々は、人気のあるLLMをQuaCer-Bで認証し、それらのバイアスに関する新しい洞察を示す。

Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it can not scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework QuaCer-B (Quantitative Certification of Bias) that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. A certificate consists of high-confidence bounds on the probability of obtaining biased responses from the LLM for any set of prompts containing sensitive attributes, sampled from a distribution. We illustrate the bias certification in LLMs for prompts with various prefixes drawn from given distributions. We consider distributions of random token sequences, mixtures of manual jailbreaks, and jailbreaks in the LLM's embedding space to certify its bias. We certify popular LLMs with QuaCer-B and present novel insights into their biases.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 変圧器におけるアテンションマスクとレイヤーノームの役割について

On the Role of Attention Masks and LayerNorm in Transformers ( http://arxiv.org/abs/2405.18781v1 )

ライセンス: Link先を確認

Xinyi Wu, Amir Ajorlou, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie,

(参考訳) 自己注意(Self-attention)は、トランスフォーマーの鍵となるメカニズムであり、現代の基礎モデルの基本的な構成要素である。近年の研究では、深度が増加し、モデル表現率が制限され、モデル深度がさらに活用されるにつれて、純粋な自己意識がランク崩壊の度合いの上昇に悩まされることが示されている。しかし、既存の階位崩壊に関する文献は、階位崩壊問題を緩和するかもしれない変圧器の他の重要な要素を見落としている。本稿では,アテンションマスクとレイヤー正規化(LayerNorm)の影響を考慮した,自己注意下でのランク崩壊の一般解析を行う。特に、純粋なマスク付き注意は依然としてランク1部分空間への指数的崩壊に悩まされているが、局所マスク付き注意は崩壊速度を確実に遅くすることができる。 LayerNorm との自己アテンションの場合、ある値行列のクラスにおいて、ランク 1 の部分空間の崩壊が指数関数的に起こることを示す。しかし、非自明な反例の構築により、値行列の適切な選択により、列の一般類はランク 1 の部分空間に収束せず、LayerNorm の自己注意力学は1 とフルの任意のランクのリッチな平衡集合を同時に持つことができる。我々の結果は、LayerNormが自己注意のランク崩壊に何の役割も果たさないという以前の仮説を否定し、LayerNormとの自己意識が、当初考えられていたよりもはるかに表現力があり、多角的な非線形力学系を構成することを示唆している。

Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# プラグ・アンド・プレイプリミティブとしての拡散モデルを用いた原理的確率的イメージング

Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors ( http://arxiv.org/abs/2405.18782v1 )

ライセンス: Link先を確認

Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman,

(参考訳) 拡散モデル (DM) は, 複雑な画像分布をモデル化する際, ベイジアン逆問題の解法として, 表現的画像先行を導出した。しかし、既存のDMベースの手法の多くは、生成過程の近似に依拠して異なる逆問題に一般化し、ベイズフレームワーク内で定義された対象の後方から逸脱する不正確なサンプル分布をもたらす。このような近似を避けつつ、DMの生成能力を活用するために、ガウスの擬似問題の後部サンプリングに還元して一般逆問題に対する後部サンプリングを行うマルコフ連鎖モンテカルロアルゴリズムを提案する。重要なことは、一般のDM定式化を統一インターフェースとして活用することで、最先端のDMを厳格に解決することができる。提案手法が実世界のブラックホールイメージング問題を含む6つの逆問題(3つの線形問題と3つの非線形問題)に対して有効であることを示す。実験結果から,提案手法は既存の DM 画像逆解析法と比較して,より高精度な再構成と後方推定が可能であることが示唆された。

Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation compared to existing DM-based imaging inverse methods.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 動的トンネル法による変分量子アルゴリズムの大域的最適化

Global optimization in variational quantum algorithms via dynamic tunneling method ( http://arxiv.org/abs/2405.18783v1 )

ライセンス: Link先を確認

Seung Park, Kyunghyun Baek, Seungjin Lee, Mahn-Soo Choi,

(参考訳) 動的トンネル流れを利用した変分量子アルゴリズムのグローバル最適化ルーチンを提案する。もともと、局所最小値の周辺で勾配に基づく最適化器が収集した情報を活用するために設計されたもので、従来の動的トンネル流を量子状態の距離測定に応用し、量子状態のパラメトリゼーションから生じる外在的縮退の問題を解消する。パラメータ空間上のユークリッド距離測定に基づく従来の動的トンネル法と比較しながら, 横フィールドイジングモデルに対する変分量子固有解法に適用し, ルーチンの性能を実証する。

We present a global optimization routine for the variational quantum algorithms, which utilizes the dynamic tunneling flow. Originally designed to leverage information gathered by a gradient-based optimizer around local minima, we adapt the conventional dynamic tunneling flow to exploit the distance measure of quantum states, resolving issues of extrinsic degeneracy arising from the parametrization of quantum states. Our global optimization algorithm is applied to the variational quantum eigensolver for the transverse-field Ising model to demonstrate the performance of our routine while comparing it with the conventional dynamic tunneling method, which is based on the Euclidean distance measure on the parameter space.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# LP-3DGS:3Dガウス平滑化の学習

LP-3DGS: Learning to Prune 3D Gaussian Splatting ( http://arxiv.org/abs/2405.18784v1 )

ライセンス: Link先を確認

Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan,

(参考訳) 近年, 3D Gaussian Splatting (3DGS) は, 高品質かつ高速なレンダリング速度のため, 新規ビュー合成 (NVS) の主流手法の1つとなっている。しかし、ポイントベースのシーン表現として、3DGSはシーンに適合する多数のガウスを発生させ、高いメモリ使用率をもたらす可能性がある。提案された改善には、経験的および予め設定されたプルーニング比または重要スコア閾値のいずれかが必要である。このようなハイパーパラメータは、各シーンのレンダリング品質を維持しながら、最大プルーニング比率を最適化し達成するために、複数のラウンドのトレーニングを必要とする。本研究では,学習から実践までの3DGS(Learning-to-prune 3DGS,LP-3DGS)を提案する。従来のストレートスルー推定器(STE)法を用いて2次元マスク勾配を近似する代わりに,Gumbel-Sigmoid法を用いてマスク機能を再設計し,既存の3DGSのトレーニングプロセスと差別化・互換性を持たせる。大規模な実験により、LP-3DGSは効率的かつ高品質な良好なバランスを保っていることが示されている。

Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset pruning ratio or importance score threshold to prune the point cloud. Such hyperparamter requires multiple rounds of training to optimize and achieve the maximum pruning ratio, while maintaining the rendering quality for each scene. In this work, we propose learning-to-prune 3DGS (LP-3DGS), where a trainable binary mask is applied to the importance score that can find optimal pruning ratio automatically. Instead of using the traditional straight-through estimator (STE) method to approximate the binary mask gradient, we redesign the masking function to leverage the Gumbel-Sigmoid method, making it differentiable and compatible with the existing training process of 3DGS. Extensive experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 量子回路コンパイルにおける最適マルチビットパスフィニングのための高速かつ適応的なアルゴリズム

A Fast and Adaptable Algorithm for Optimal Multi-Qubit Pathfinding in Quantum Circuit Compilation ( http://arxiv.org/abs/2405.18785v1 )

ライセンス: Link先を確認

Gary J Mooney,

(参考訳) 量子コンピューティングは、様々な研究や産業分野において、複雑で古典的に難解な問題をシミュレートし、解決する能力を大幅に強化する可能性がある。しかし、我々は現在、デバイスが比較的小さく、かなりのノイズレベルに悩まされており、大規模な計算を禁止している、ノイズの多い中間量子(NISQ)時代にいる。この状態以降の量子的優位性を達成するためには、クビットデコヒーレンスと2量子ゲートからのノイズの影響を最小限に抑えることが不可欠である。直近のアプローチは、回路を物理デバイスにマッピングする量子回路コンパイルプロセスの最適化を改善することで、ノイズの多いゲートと回路実行時間を短縮することである。この研究は、量子回路のコンパイルマッピング問題における臨界サブルーチンとして、マルチキュービットパスフィンディングに焦点を当てている。回路SWAPゲート深さに対して量子ハードウェア上の量子ビットを最適にナビゲートする二進整数線形計画法を用いてモデル化したアルゴリズムを導入するとともに,蓄積したゲート誤差を最適化し,様々な問題修正に柔軟に適用することができる。このマルチキュービットパスフィンディングアルゴリズムは、ゲートエラーペナルティ、SWAP運動制約、およびソースおよびターゲットキュービット位置とキュービットチームの構成可能なアレンジメントを考慮に入れている。我々は、様々な量子ハードウェアレイアウトのアルゴリズムをベンチマークし、計算ランタイム、解SWAP深さ、累積SWAPゲート誤差率などの特性を評価した。結果は、現在の量子デバイスにおけるアルゴリズムの実践的ランタイムを示し、その効果を様々なハードウェア構成で比較し、将来の量子ハードウェア設計に対する洞察を与える。

Quantum computing has the potential to significantly enhance our ability to simulate and solve complex, classically intractable problems across various fields of research and industry. However, we are currently in the noisy intermediate-scale quantum (NISQ) era, where devices are relatively small and suffer from substantial noise levels, prohibiting large-scale computations. To achieve any quantum advantage in this regime and beyond, it is crucial to minimise the impact of noise from qubit decoherence and two-qubit gates. A direct approach is to improve the optimisation of quantum circuit compilation processes that map circuits onto physical devices, thereby reducing noisy gates and circuit execution times. This work focuses on multi-qubit pathfinding as a critical subroutine within the quantum circuit compilation mapping problem. We introduce an algorithm, modelled using binary integer linear programming, that navigates qubits on quantum hardware optimally with respect to circuit SWAP-gate depth, while also optimising for accumulated gate errors and can be flexibly adapted to various problem modifications. This multi-qubit pathfinding algorithm incorporates considerations for gate-error penalties, SWAP movement constraints, and configurable arrangements of source and target qubit locations and qubit teams. We have benchmarked the algorithm across a variety of quantum hardware layouts, assessing properties such as computational runtimes, solution SWAP depths, and accumulated SWAP-gate error rates. The results demonstrate the algorithm's practical runtimes on current quantum devices and compare its effectiveness across different hardware configurations, providing insights for future quantum hardware design.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# MOKD:最適化カーネル依存性の最大化によるFew-shot分類のためのクロスドメインファインタニング

MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence ( http://arxiv.org/abs/2405.18786v1 )

ライセンス: Link先を確認

Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, Bo Han,

(参考訳) クロスドメインな小ショット分類において、 'emph{nearest centroid classifier} (NCC) は、サンプルと各クラスのプロトタイプの類似性を測定することで、少数ショット分類を行うことができる計量空間を構築するために表現を学ぶことを目的としている。 NCCの背後にある直感は、各サンプルは他のクラスのサンプルから押し離されながら、その標本が属するクラスセントロイドに近づくことである。しかし,本論文では,異なるクラスからの2つのサンプルの NCC 学習表現に高い類似性があることが判明した。この問題に対処するために、与えられたタスクのラベル付きデータで示されるクラスタ構造にマッチするクラス固有の表現の集合を学習するために、二段階最適化フレームワークである 'emph{maximizing Optimization kernel dependency} (MOKD) を提案する。特に、MOKDは最初に \emph{Hilbert-Schmidt Independence criterion} (HSIC) で採用されているカーネルを最適化し、より正確に依存を捉えることができる最適化されたカーネルHSIC (opt-HSIC) を得る。次に、オプトHSICに関する最適化問題に対処し、表現とラベル間の依存を同時に最大化し、全てのサンプル間の依存を最小限に抑える。 Meta-Datasetに関する大規模な実験により、MOKDは、ほとんどの場合、目に見えないドメインでのより優れた一般化性能を達成できるだけでなく、より良いデータ表現クラスタを学習できることが示されている。 MOKDのプロジェクトリポジトリは以下の通りである。 \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}。

In cross-domain few-shot classification, \emph{nearest centroid classifier} (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, \emph{maximizing optimized kernel dependence} (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in \emph{Hilbert-Schmidt independence criterion} (HSIC) to obtain the optimized kernel HSIC (opt-HSIC) that can capture the dependence more precisely. Then, an optimization problem regarding the opt-HSIC is addressed to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD can not only achieve better generalization performance on unseen domains in most cases but also learn better data representation clusters. The project repository of MOKD is available at: \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# マルチスケールDeep Feature Statistics を用いたオピニオン・ウインドウ・ブラインド画像品質評価

Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics ( http://arxiv.org/abs/2405.18790v1 )

ライセンス: Link先を確認

Zhangkai Ni, Yue Liu, Keyan Ding, Wenhan Yang, Hanli Wang, Shiqi Wang,

(参考訳) 深層学習に基づく手法はブラインド画像品質評価(BIQA)の分野に大きな影響を与えてきたが、これらの手法は多量の人間の評価データを用いたトレーニングを必要とすることが多い。対照的に、従来の知識に基づく手法は訓練に費用対効果があるが、人間の視覚的知覚に沿った特徴を効果的に抽出する際の課題に直面している。これらのギャップを埋めるために、我々は、事前学習された視覚モデルから統計解析モデルへの深い特徴を、意見不明なBIQA(OU-BIQA)を達成するためのマルチスケールDeep Feature Statistics(MDFS)モデルに統合し、人間のレーティングデータへの依存をなくし、トレーニング効率を著しく改善することを提案する。具体的には、事前訓練された視覚モデルからパッチワイドなマルチスケール特徴を抽出し、その後、多変量ガウスモデル(MVG)に組み込む。テスト画像から派生したMVGモデルと、高品質な画像集合から派生したベンチマークMVGモデルとの距離を定量化して最終品質スコアを決定する。各種データセットを用いた総合的な実験の結果,提案モデルでは,最先端のBIQAモデルと比較して,人間の視覚知覚との整合性が良好であることが示された。さらに、多様なターゲット固有のBIQAタスク間での一般化性の向上を示す。私たちのコードは、https://github.com/eezkni/MDFSで利用可能です。

Deep learning-based methods have significantly influenced the blind image quality assessment (BIQA) field, however, these methods often require training using large amounts of human rating data. In contrast, traditional knowledge-based methods are cost-effective for training but face challenges in effectively extracting features aligned with human visual perception. To bridge these gaps, we propose integrating deep features from pre-trained visual models with a statistical analysis model into a Multi-scale Deep Feature Statistics (MDFS) model for achieving opinion-unaware BIQA (OU-BIQA), thereby eliminating the reliance on human rating data and significantly improving training efficiency. Specifically, we extract patch-wise multi-scale features from pre-trained vision models, which are subsequently fitted into a multivariate Gaussian (MVG) model. The final quality score is determined by quantifying the distance between the MVG model derived from the test image and the benchmark MVG model derived from the high-quality image set. A comprehensive series of experiments conducted on various datasets show that our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models. Furthermore, it shows improved generalizability across diverse target-specific BIQA tasks. Our code is available at: https://github.com/eezkni/MDFS

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 決定論的RLポリティクスのインサンプルオフポリティ評価のためのカーネルメトリック学習

Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies ( http://arxiv.org/abs/2405.18792v1 )

ライセンス: Link先を確認

Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim,

(参考訳) 連続行動空間を有する環境における強化学習(RL)のための決定論的目標政策のオフ・ポリティクス評価(OPE)を検討する。 OPEの重要サンプリングは一般的に用いられるが,行動方針が目標方針から著しく逸脱した場合には,高いばらつきに悩まされる。この問題に対処するため、OPEに関する最近の研究は、重要な再サンプリングを伴うインサンプルラーニングを提案している。しかし、これらのアプローチは連続的な作用空間に対する決定論的対象ポリシーには適用できない。この制限に対処するために、カーネルを用いた決定論的ターゲットポリシーを緩和し、アクション値関数の推定時間差更新ベクトルの総平均二乗誤差を最小限に抑えるカーネルメトリクスを学習し、アクション値関数をポリシー評価に使用する。この緩和による推定誤差のバイアスと分散を導出し、最適なカーネル計量に対する解析解を提供する。種々のテスト領域を用いた実証実験において,カーネルを用いたサンプル内学習を用いたOPEは,他のベースラインよりも精度が大幅に向上することを示した。

We consider off-policy evaluation (OPE) of deterministic target policies for reinforcement learning (RL) in environments with continuous action spaces. While it is common to use importance sampling for OPE, it suffers from high variance when the behavior policy deviates significantly from the target policy. In order to address this issue, some recent works on OPE proposed in-sample learning with importance resampling. Yet, these approaches are not applicable to deterministic target policies for continuous action spaces. To address this limitation, we propose to relax the deterministic target policy using a kernel and learn the kernel metrics that minimize the overall mean squared error of the estimated temporal difference update vector of an action value function, where the action value function is used for policy evaluation. We derive the bias and variance of the estimation error due to this relaxation and provide analytic solutions for the optimal kernel metric. In empirical studies using various test domains, we show that the OPE with in-sample learning using the kernel with optimized metric achieves significantly improved accuracy than other baselines.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 距離空間における適応的離散化に基づく非エポゾディック強化学習

Adaptive Discretization-based Non-Episodic Reinforcement Learning in Metric Spaces ( http://arxiv.org/abs/2405.18793v1 )

ライセンス: Link先を確認

Avik Kar, Rahul Singh,

(参考訳) 状態作用空間を距離空間とし、遷移核と報酬をリプシッツ関数とするリプシッツ MDP に対する非絶対強化学習について検討する。計算効率の良い UCB-based algorithm, $\textit{ZoRL-}\epsilon$ は、状態-作用空間を適応的に離散化し、それらの後悔が $\epsilon$-optimal policy に対して $\mathcal{O}(\epsilon^{-(2d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)} として有界であることを示し、$d^\epsilon_z$ は $\epsilon$-zooming dimension である。対照的に、MDP の固定離散化にバニラ $\textit{UCRL-}2$ を使用する場合、後悔 w.r.t. a $\epsilon$-optimal policy scales as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)}$ として、d^\epsilon_z \ll d$ のとき、適応性は巨大になる。連続 MDP の大きな族に対する「一様良い」アルゴリズムの絶対的後悔は、少なくとも$\Omega(\log{(T)})$として漸近的にスケールする。適応的な離散化は、エピソード RL において $\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$ regret をもたらすことが示されているが、$d_z \to d$ as $T \to \infty$ であるから、持続時間が $T$ で増加する一定期間のエピソードを用いてこれを非エピソードケースに拡張しようとする試みは無駄である。現在の研究は、非エポゾディックRLに対する適応性ゲインを得る方法を示している。理論的結果は、連続的な状態作用空間を持つシステムに対する$\textit{ZoRL-}\epsilon$ と '$\textit{UCRL-C}$' の固定離散化に基づく $\textit{UCRL-}2$ の性能を比較する2つのシステムのシミュレーションによって支持される。

We study non-episodic Reinforcement Learning for Lipschitz MDPs in which state-action space is a metric space, and the transition kernel and rewards are Lipschitz functions. We develop computationally efficient UCB-based algorithm, $\textit{ZoRL-}\epsilon$ that adaptively discretizes the state-action space and show that their regret as compared with $\epsilon$-optimal policy is bounded as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)})$, where $d^\epsilon_z$ is the $\epsilon$-zooming dimension. In contrast, if one uses the vanilla $\textit{UCRL-}2$ on a fixed discretization of the MDP, the regret w.r.t. a $\epsilon$-optimal policy scales as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)})$ so that the adaptivity gains are huge when $d^\epsilon_z \ll d$. Note that the absolute regret of any 'uniformly good' algorithm for a large family of continuous MDPs asymptotically scales as at least $\Omega(\log{(T)})$. Though adaptive discretization has been shown to yield $\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$ regret in episodic RL, an attempt to extend this to the non-episodic case by employing constant duration episodes whose duration increases with $T$, is futile since $d_z \to d$ as $T \to \infty$. The current work shows how to obtain adaptivity gains for non-episodic RL. The theoretical results are supported by simulations on two systems where the performance of $\textit{ZoRL-}\epsilon$ is compared with that of '$\textit{UCRL-C}$,' the fixed discretization-based extension of $\textit{UCRL-}2$ for systems with continuous state-action spaces.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 参照アドバンテージ分解によるフェデレーションQ-Learning: ほぼ最適回帰と対数通信コスト

Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost ( http://arxiv.org/abs/2405.18795v1 )

ライセンス: Link先を確認

Zhong Zheng, Haochen Zhang, Lingzhou Xue,

(参考訳) 本稿では,表在的マルコフ決定過程におけるモデル自由連合強化学習について考察する。中央サーバの協調の下で、複数のエージェントが協調して環境を探索し、生データを共有せずに最適なポリシーを学ぶ。フェデレートされたQ-ラーニングアルゴリズムの最近の進歩は、通信コストの低いほぼ直線的後悔のスピードアップを実現しているにもかかわらず、既存のアルゴリズムは情報バウンドよりも過度な後悔しか達成していない。本稿では,FedQ-Advantageと呼ばれる新しいモデルフリーなフェデレーションQ-ラーニングアルゴリズムを提案する。提案アルゴリズムは,分散低減のための参照アドバンテージ分解を利用して,エージェントとサーバ間の同期と,イベントによって引き起こされるポリシー更新という,2つの異なるメカニズムの下で動作する。本アルゴリズムは対数通信コストの低減だけでなく,時間的地平線が十分に大きい場合と比較して,対数係数とほぼ直線的後悔速度に制限された情報に到達し,ほぼ最適に後悔することを示す。

In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication cost, existing algorithms only attain suboptimal regrets compared to the information bound. We propose a novel model-free federated Q-learning algorithm, termed FedQ-Advantage. Our algorithm leverages reference-advantage decomposition for variance reduction and operates under two distinct mechanisms: synchronization between the agents and the server, and policy update, both triggered by events. We prove that our algorithm not only requires a lower logarithmic communication cost but also achieves an almost optimal regret, reaching the information bound up to a logarithmic factor and near-linear regret speedup compared to its single-agent counterpart when the time horizon is sufficiently large.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 物体学習型畳み込みニューラルネットワークによる顔処理

Face processing emerges from object-trained convolutional neural networks ( http://arxiv.org/abs/2405.18800v1 )

ライセンス: Link先を確認

Zhenhua Zhao, Ji Chen, Zhicheng Lin, Haojiang Ying,

(参考訳) 顔処理はドメイン固有の神経認知機構に依存するのか、それともドメイン全般の物体認識機構に依存しているのかは、長い間議論されてきた。これらの仮説をヒトで直接テストすることは、顔と物体の両方に広範囲に露出するため、難しいことが証明されている。ここでは、顔の露出なしに訓練できる畳み込みニューラルネットワーク(CNN)の最近の進歩に乗じて、これらの仮説を体系的に検証する。ドメイン・ジェネラル・メカニズムは、顔に特別な事前トレーニングを加えることなく、顔処理がニューラルネットワークから現れることを実証している。その結果、私たちはCNNを物体だけに訓練し、顔の認識と表現能力、および顔のように見える物体(顔のパレドリア刺激)をテストしました。文字制限のため、詳細は付加されたpdfを参照してください。

Whether face processing depends on unique, domain-specific neurocognitive mechanisms or domain-general object recognition mechanisms has long been debated. Directly testing these competing hypotheses in humans has proven challenging due to extensive exposure to both faces and objects. Here, we systematically test these hypotheses by capitalizing on recent progress in convolutional neural networks (CNNs) that can be trained without face exposure (i.e., pre-trained weights). Domain-general mechanism accounts posit that face processing can emerge from a neural network without specialized pre-training on faces. Consequently, we trained CNNs solely on objects and tested their ability to recognize and represent faces as well as objects that look like faces (face pareidolia stimuli).... Due to the character limits, for more details see in attached pdf

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# SketchTriplet: 自己監督型Sketch-Text- Image Triplet生成

SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation ( http://arxiv.org/abs/2405.18801v1 )

ライセンス: Link先を確認

Zhenbei Wu, Qiang Wang, Jie Yang,

(参考訳) フリーハンドスケッチの不足は、難しい問題である。大規模なスケッチデータセットの出現にもかかわらず、これらのデータセットは主に単一のオブジェクトレベルでのスケッチで構成されている。シーンスケッチ用の大規模なペアデータセットは引き続き欠如している。本稿では,既存のシーンスケッチに依存しないシーンスケッチ自動生成手法を提案し,シーンスケッチへの単一オブジェクトスケッチの変換を可能にする。そこで本研究では,ベクトルスケッチキャプションとスケッチセマンティック展開のための手法を提案する。さらに,マルチモーダルな知覚制約を融合したスケッチ生成ネットワークを設計し,ゼロショット画像・スケッチダウンストリームタスクに適用し,実験検証による最先端性能の実証を行う。最後に,提案したスケッチ・ツー・スケッチ生成手法を利用して,シーン・スケッチを中心にした大規模データセットをコントリビュートする。本研究は,スケッチベース画像検索およびスケッチ制御画像合成タスクにおいて,既存のモデルの性能を大幅に向上させることができることを確認した。データセットとコードを公開します。

The scarcity of free-hand sketch presents a challenging problem. Despite the emergence of some large-scale sketch datasets, these datasets primarily consist of sketches at the single-object level. There continues to be a lack of large-scale paired datasets for scene sketches. In this paper, we propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch, enabling the transformation of single-object sketches into scene sketches. To accomplish this, we introduce a method for vector sketch captioning and sketch semantic expansion. Additionally, we design a sketch generation network that incorporates a fusion of multi-modal perceptual constraints, suitable for application in zero-shot image-to-sketch downstream task, demonstrating state-of-the-art performance through experimental validation. Finally, leveraging our proposed sketch-to-sketch generation method, we contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets. Our research confirms that this dataset can significantly enhance the capabilities of existing models in sketch-based image retrieval and sketch-controlled image synthesis tasks. We will make our dataset and code publicly available.

翻訳日:2024-05-30 18:38:40 公開日:2024-05-29

# 更新ダイジェストと投票に基づく防衛を用いたフェデレーション学習におけるセキュリティとプライバシの強化

Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense ( http://arxiv.org/abs/2405.18802v1 )

ライセンス: Link先を確認

Wenjie Li, Kai Fan, Jingyuan Zhang, Hui Li, Wei Yang Bryan Lim, Qiang Yang,

(参考訳) Federated Learning(FL)は、データ所有者がデータのローカライズを維持しながら、モデルの共同トレーニングを可能にする、有望なプライバシ保護機械学習パラダイムである。その可能性にもかかわらず、FLはクライアントとサーバの両方の信頼性に関する課題に直面している。本稿では,分散学習環境におけるビザンチン攻撃に対するプライバシー保護と耐性の重要な問題に対処する,新しいフレームワークである \underline{\textbf{F}}ederated \underline{\textbf{L}}earning with \underline{\textbf{U}}pdate \underline{\textbf{D}}igest (FLUD)を紹介する。 FLUDは、$\mathsf{LinfSample}$メソッドという革新的なアプローチを採用している。このダイジェストにより、サーバは共有距離行列を計算し、セキュアなマルチパーティ計算(SMPC)に関連するオーバーヘッドを3桁に減らし、良質な更新と悪質な更新を効果的に区別することができる。さらにFLUDは、通信ラウンドを最小化するために最適化されたSMPCプロトコルを使用するプライバシー保護、投票ベースの防衛メカニズムを統合している。包括的実験では、通信の低さと実行時のオーバーヘッドを伴いながら、ビザンチンの敵に対するFLUDの有効性を実証した。 FLUDは、分散環境におけるセキュアで信頼性の高いFLのためのスケーラブルなフレームワークを提供する。

Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named \underline{\textbf{F}}ederated \underline{\textbf{L}}earning with \underline{\textbf{U}}pdate \underline{\textbf{D}}igest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $l_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# ニューラルネットワークにおけるセミリング活性化

Semiring Activation in Neural Networks ( http://arxiv.org/abs/2405.18805v1 )

ライセンス: Link先を確認

Bart M. N. Smets, Peter D. Donker, Jim W. Portegies, Remco Duits,

(参考訳) ニューラルネットワークでの使用に適したセミリングに基づいて、トレーニング可能な非線形演算子のクラスを導入する。これらの作用素は、ニューラルネットワークにおける活性化関数を持つ線形作用素の伝統的な交替を一般化する。セミリング(英: Semiring)は、線形性の一般化された表記を記述する代数的構造であり、ニューラルネットワークに含まれる訓練可能な作用素の範囲を大きく広げている。実際、最大または最小プール演算は、固定された核を持つ熱帯半環の畳み込みである。トレーニング可能なセミリング演算子の活性化関数を置き換える実験を行い、これらが完全に接続されただけでなく畳み込みニューラルネットワーク(ConvNeXt)にも適用可能であることを示す。本稿では,従来のアクティベーション関数をトレーニング可能なセミリングアクティベーションに置き換えることの課題と,そのトレードオフについて論じる。

We introduce a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions in neural networks. Semirings are algebraic structures that describe a generalised notation of linearity, greatly expanding the range of trainable operators that can be included in neural networks. In fact, max- or min-pooling operations are convolutions in the tropical semiring with a fixed kernel. We perform experiments where we replace the activation functions for trainable semiring-based operators to show that these are viable operations to include in fully connected as well as convolutional neural networks (ConvNeXt). We discuss some of the challenges of replacing traditional activation functions with trainable semiring activations and the trade-offs of doing so.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# BRACTIVE:人間の視覚脳学習における脳活動的アプローチ

BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning ( http://arxiv.org/abs/2405.18808v1 )

ライセンス: Link先を確認

Xuan-Bac Nguyen, Hojin Jang, Xin Li, Samee U. Khan, Pawan Sinha, Khoa Luu,

(参考訳) 人間の脳は、非常に効率的な処理ユニットであり、その仕組みを理解することによって、機械学習における新しいアルゴリズムとアーキテクチャを刺激することができる。本研究では,脳活動ネットワーク(BRACTIVE)という,人間の視覚脳を研究するためのトランスフォーマーベースのアプローチを紹介する。 BRACTIVEの主な目的は、被験者の視覚的特徴をfMRI信号を介して対応する脳表現と整合させることである。これにより、被験者の脳の関心領域(ROI)を特定できます。従来の脳研究手法とは異なり、1つの被験者のROIしか識別できず、被験者数によって制限されているが、BRACTIVEは自動的に複数の被験者とROIに識別を拡張している。実験の結果, BRACTIVEは, 顔や身体選択領域などの興味のある領域を効果的に同定し, 神経科学的な所見と整合し, 様々な対象カテゴリーに適用可能であることが示された。さらに重要なのは、人間の視覚的脳活動を利用して、ディープニューラルネットワークを誘導することで、さまざまなベンチマークのパフォーマンスが向上することです。これは、神経科学と機械知能研究の両方においてBRACTIVEの可能性を促進する。

The human brain is a highly efficient processing unit, and understanding how it works can inspire new algorithms and architectures in machine learning. In this work, we introduce a novel framework named Brain Activation Network (BRACTIVE), a transformer-based approach to studying the human visual brain. The main objective of BRACTIVE is to align the visual features of subjects with corresponding brain representations via fMRI signals. It allows us to identify the brain's Regions of Interest (ROI) of the subjects. Unlike previous brain research methods, which can only identify ROIs for one subject at a time and are limited by the number of subjects, BRACTIVE automatically extends this identification to multiple subjects and ROIs. Our experiments demonstrate that BRACTIVE effectively identifies person-specific regions of interest, such as face and body-selective areas, aligning with neuroscience findings and indicating potential applicability to various object categories. More importantly, we found that leveraging human visual brain activity to guide deep neural networks enhances performance across various benchmarks. It encourages the potential of BRACTIVE in both neuroscience and machine intelligence studies.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# UniPTS: 熟練したポストトレーニングスパシティのための統一フレームワーク

UniPTS: A Unified Framework for Proficient Post-Training Sparsity ( http://arxiv.org/abs/2405.18810v1 )

ライセンス: Link先を確認

Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji,

(参考訳) Post-training Sparsity (PTS)は、必要な限られたデータで効率的なネットワークスパシティを追求する、最近登場した道である。しかし、既存のPSS手法は、データセット全体を通してスパースネットワークをリトレーニングする従来の手法と比較して、特に高空間比で性能が著しく低下している。本稿では,従来のスパシティの性能をPSSの文脈に大きく変化させる3つの基本因子を変換することで,この相違を解消しようとする。特に本研究は,(1)高密度ネットワークからスパースネットワークへの効率的な知識伝達を促進するベースデケイド・スパシティーの目的から成っている。 2) PTS の小型キャリブレーションに過度な適合を回避しつつ,最適空間分布を推定する探索アルゴリズムについて検討した。 (3) トレーニング安定性を確保しつつ, 空間構造を包括的に最適化することを目的とした, 事前の側面を前提としたダイナミックスパーストレーニングの実施。提案するフレームワークはUniPTSと呼ばれ,既存のPTS手法よりも広範なベンチマークで優れていることが検証されている。図示として、最近提案されたレシピであるPOTのパフォーマンスを3.9%から68.6%に向上させ、ImageNet上でResNet-50を90%の間隔でプルーニングする。論文のコードはhttps://github.com/xjjxmu/UniPTS.comで公開しています。

Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# MindSemantix:脳-言語モデルによる脳視覚体験の解読

MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model ( http://arxiv.org/abs/2405.18812v1 )

ライセンス: Link先を確認

Ziqi Ren, Jie Li, Xuetong Xue, Xin Li, Fan Yang, Zhicheng Jiao, Xinbo Gao,

(参考訳) fMRIで捉えた脳の活動を通して人間の視覚体験を解読することは、神経科学研究の分野における魅力的な最先端の課題である。観察画像自体を単に予測するのではなく、脳活動を意味のあるキャプションにデコードすることで、視覚情報の高レベルな解釈と要約が可能になり、現実の状況における応用の柔軟性が自然に向上する。本研究では,脳活動における視覚的に誘発される意味的内容の理解を可能にする,新しいマルチモーダルフレームワークであるMindSemantixを紹介する。私たちのMindSemantixは、LLMを脳活動分析に織り込み、シームレスでエンドツーエンドのBrain-Language Modelを構築することで、より理想的な脳キャプションパラダイムを探求しています。脳の応答から意味情報を効果的に捉えるために,脳Q-Formerをコアアーキテクチャとして利用するBrain-Text Transformerを提案する。トレーニング済みの脳エンコーダと凍結LDMを統合して、脳ビジョン言語を多モードでアライメントし、堅牢な脳-言語対応を確立する。神経表現の一般化性を高めるために,脳エンコーダを自己教師付き学習技術を用いて,大規模・クロスオブジェクトfMRIデータセット上で事前訓練する。 MindSemantixは、刺激再構成のような下流脳のデコードタスクに、より実現可能性を提供します。 MindSemantixのキャプションにより、私たちのフレームワークは、安定拡散のような高度な生成モデルと統合し、脳の視覚的知覚を理解することを促進する。 MindSemantixは、脳の活動から派生した視覚的および意味的な情報に深く根ざした高品質なキャプションを生成する。このアプローチは、先行技術よりも相当に定量的に改善されている。私たちのコードは解放されます。

Deciphering the human visual experience through brain activities captured by fMRI represents a compelling and cutting-edge challenge in the field of neuroscience research. Compared to merely predicting the viewed image itself, decoding brain activity into meaningful captions provides a higher-level interpretation and summarization of visual information, which naturally enhances the application flexibility in real-world situations. In this work, we introduce MindSemantix, a novel multi-modal framework that enables LLMs to comprehend visually-evoked semantic content in brain activity. Our MindSemantix explores a more ideal brain captioning paradigm by weaving LLMs into brain activity analysis, crafting a seamless, end-to-end Brain-Language Model. To effectively capture semantic information from brain responses, we propose Brain-Text Transformer, utilizing a Brain Q-Former as its core architecture. It integrates a pre-trained brain encoder with a frozen LLM to achieve multi-modal alignment of brain-vision-language and establish a robust brain-language correspondence. To enhance the generalizability of neural representations, we pre-train our brain encoder on a large-scale, cross-subject fMRI dataset using self-supervised learning techniques. MindSemantix provides more feasibility to downstream brain decoding tasks such as stimulus reconstruction. Conditioned by MindSemantix captioning, our framework facilitates this process by integrating with advanced generative models like Stable Diffusion and excels in understanding brain visual perception. MindSemantix generates high-quality captions that are deeply rooted in the visual and semantic information derived from brain activity. This approach has demonstrated substantial quantitative improvements over prior art. Our code will be released.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 反復的破壊軌道マッチングによる線形逆問題に対するフロー優先法

Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching ( http://arxiv.org/abs/2405.18816v1 )

ライセンス: Link先を確認

Yasi Zhang, Peiyu Yu, Yaxuan Zhu, Yingshan Chang, Feng Gao, Ying Nian Wu, Oscar Leong,

(参考訳) フローマッチングに基づく生成モデルは、高解像度画像合成において、その単純さと優れた性能のために大きな注目を集めている。変数の即時変化式を利用することで、学習フローから直接画像可能性を計算することができ、逆問題などの下流タスクの先行候補として候補を魅了することができる。特に、そのような画像確率をMAP推定問題に組み込むことが自然なアプローチである。しかし、大きな障害は、ODEソルバをバックプロパゲートする必要があるため、ログのような計算が遅いことにある。本研究では,MAP推定器を効率的に近似し,様々な線形逆問題の解法を提案する。我々のアルゴリズムは、MAPの目的を関数評価の数である$N$ ``local MAP'の目的の和で近似できるという観察によって数学的に正当化されている。ツイーディの公式を利用することで、これらの目的を逐次最適化するために勾配ステップを実行できることを示す。我々は,超解法,デブロアリング,インペインティング,圧縮センシングなどの線形逆問題に対するアプローチを検証し,フローマッチングに基づく他の手法よりも優れていることを示す。

Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis. By leveraging the instantaneous change-of-variables formula, one can directly compute image likelihoods from a learned flow, making them enticing candidates as priors for downstream tasks such as inverse problems. In particular, a natural approach would be to incorporate such image probabilities in a maximum-a-posteriori (MAP) estimation problem. A major obstacle, however, lies in the slow computation of the log-likelihood, as it requires backpropagating through an ODE solver, which can be prohibitively slow for high-dimensional problems. In this work, we propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems. Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ ``local MAP'' objectives, where $N$ is the number of function evaluations. By leveraging Tweedie's formula, we show that we can perform gradient steps to sequentially optimize these objectives. We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing, and demonstrate that we can outperform other methods based on flow matching.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 効率的な持続型位相最適化のための微分型補間法

Diffeomorphic interpolation for efficient persistence-based topological optimization ( http://arxiv.org/abs/2405.18820v1 )

ライセンス: Link先を確認

Mathieu Carriere, Marc Theveneau, Théo Lacombe,

(参考訳) トポロジカルデータ分析(TDA)は、構造化オブジェクトから定量的トポロジカル記述子を抽出するパイプラインを提供する。これにより、ある対象が幾らかの位相的性質を示す範囲を主張する位相的損失函数の定義が可能になる。これらの損失は、トポロジカル最適化勾配降下ルーチンの実行に使用できる。勾配は極めてスパースである傾向があるが、一般に損失関数は入力対象のごくわずかな座標にのみ依存するので、実際は劇的に遅い最適化スキームが得られるので、点雲の位相最適化の中心的なケースに着目して、微分型補間を用いてこの制限を克服し、スパース勾配を空間全体上で定義された滑らかなベクトル場に、量子化リプシッツ定数で変換する。特に,本手法は,TDAで日常的に使用されるサブサンプリング手法と効率的に組み合わせることで,サブサンプル上で計算された勾配から導出される微分同相法を用いて,全入力オブジェクトの座標を更新し,前例のないスケールで点雲の位相最適化を行うことができることを示す。最後に,ブラックボックスオートエンコーダ(AE)正則化に対する我々のアプローチの妥当性を示すとともに,固定,事前学習,ブラックボックスAEモデルに関連する潜在空間のトポロジ的事前適用を目標とし,微分型フローの学習を一度に行うことができ,線形時間で新たなデータに再適用可能であることを示す(ただし,バニラトポロジ的最適化はスクラッチから再実行する必要がある)。さらに、フローを反転させることで、トポロジ的に最適化された潜在空間を直接サンプリングすることでデータを生成することができ、モデルのより優れた解釈可能性が得られる。

Topological Data Analysis (TDA) provides a pipeline to extract quantitative topological descriptors from structured objects. This enables the definition of topological loss functions, which assert to what extent a given object exhibits some topological properties. These losses can then be used to perform topological optimizationvia gradient descent routines. While theoretically sounded, topological optimization faces an important challenge: gradients tend to be extremely sparse, in the sense that the loss function typically depends on only very few coordinates of the input object, yielding dramatically slow optimization schemes in practice.Focusing on the central case of topological optimization for point clouds, we propose in this work to overcome this limitation using diffeomorphic interpolation, turning sparse gradients into smooth vector fields defined on the whole space, with quantifiable Lipschitz constants. In particular, we show that our approach combines efficiently with subsampling techniques routinely used in TDA, as the diffeomorphism derived from the gradient computed on a subsample can be used to update the coordinates of the full input object, allowing us to perform topological optimization on point clouds at an unprecedented scale. Finally, we also showcase the relevance of our approach for black-box autoencoder (AE) regularization, where we aim at enforcing topological priors on the latent spaces associated to fixed, pre-trained, black-box AE models, and where we show thatlearning a diffeomorphic flow can be done once and then re-applied to new data in linear time (while vanilla topological optimization has to be re-run from scratch). Moreover, reverting the flow allows us to generate data by sampling the topologically-optimized latent space directly, yielding better interpretability of the model.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 自由な毒性検出

Toxicity Detection for Free ( http://arxiv.org/abs/2405.18822v1 )

ライセンス: Link先を確認

Zhanhao Hu, Julien Piet, Geng Zhao, Jiantao Jiao, David Wagner,

(参考訳) 現在のLSMは一般に安全要件に従うように調整されており、有害なプロンプトを拒否する傾向がある。しかし、LSMは有害なプロンプトを拒絶したり、過度に注意し、良心的な例を拒否することができない。さらに、最先端の毒性検知器は、低いFPRで低いTPRを持ち、有害な例が稀な現実世界のアプリケーションに高いコストをもたらす。本稿では,LSM自体から直接抽出した情報を用いて有害なプロンプトを検出するMULI(Moduleration Using LLM Introspection)について検討する。代替拒絶反応の分布と第1応答トークンのロジットの分布において,良性と有毒なプロンプトの間に有意な差が認められた。特定の開始トークンのロバストに基づく玩具モデルでは、トレーニングや追加の計算コストを必要とせず、信頼性の高い性能が得られることを示す。我々は、複数の測定値の下でSOTA検出器をはるかに上回る、第1応答トークンロジットのスパースロジスティック回帰モデルを用いて、よりロジスティックな検出器を構築する。

Current LLMs are generally aligned to follow safety requirements and tend to refuse toxic prompts. However, LLMs can fail to refuse toxic prompts or be overcautious and refuse benign examples. In addition, state-of-the-art toxicity detectors have low TPRs at low FPR, incurring high costs in real-world applications where toxic examples are rare. In this paper, we explore Moderation Using LLM Introspection (MULI), which detects toxic prompts using the information extracted directly from LLMs themselves. We found significant gaps between benign and toxic prompts in the distribution of alternative refusal responses and in the distribution of the first response token's logits. These gaps can be used to detect toxicities: We show that a toy model based on the logits of specific starting tokens gets reliable performance, while requiring no training or additional computational cost. We build a more robust detector using a sparse logistic regression model on the first response token logits, which greatly exceeds SOTA detectors under multiple metrics.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# エネルギーシステムにおける強化学習はなぜ説明を必要とするのか

Why Reinforcement Learning in Energy Systems Needs Explanations ( http://arxiv.org/abs/2405.18823v1 )

ライセンス: Link先を確認

Hallah Shahid Butt, Benjamin Schäfer,

(参考訳) 経済発展に伴い、インフラの複雑さは劇的に増大した。同様に、化石燃料から再生可能エネルギー源への移行によって、正確な予測と予測だけでなく、予測のプロセスの理解にも役立つようなシステムが必要である。人工知能と機械学習技術は、エネルギーセクターのさまざまな問題に対する優れたソリューションを見つけるのに役立っている。しかし、強化学習のような最先端技術の使用は驚くべきことではない。本稿では,エネルギーシステムにおける強化技術の適用と,これらのモデルの説明がいかに役立つかについて論じる。

With economic development, the complexity of infrastructure has increased drastically. Similarly, with the shift from fossil fuels to renewable sources of energy, there is a dire need for such systems that not only predict and forecast with accuracy but also help in understanding the process of predictions. Artificial intelligence and machine learning techniques have helped in finding out wellperforming solutions to different problems in the energy sector. However, the usage of state-of-the-art techniques like reinforcement learning is not surprisingly convincing. This paper discusses the application of reinforcement techniques in energy systems and how explanations of these models can be helpful

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# グラフニューラルネットワークに対するラベル伝搬に基づくノード注入攻撃

Node Injection Attack Based on Label Propagation Against Graph Neural Network ( http://arxiv.org/abs/2405.18824v1 )

ライセンス: Link先を確認

Peican Zhu, Zechen Pan, Keke Tang, Xiaodong Cui, Jinhuan Wang, Qi Xuan,

(参考訳) グラフニューラルネットワーク(GNN)は,ノード分類,リンク予測,グラフ分類など,さまざまなグラフ学習タスクにおいて顕著な成功を収めている。 GNNの成功の鍵は、近隣の集約による効果的な構造情報表現にある。しかし、攻撃者は偽ノードを注入することで容易に集約プロセスを摂動でき、グラフインジェクション攻撃に対してGNNが脆弱であることを明らかにする。既存のグラフインジェクション攻撃法は主に、ラベル伝搬による近隣の集約プロセスを見下ろしながら、古典的な特徴集約プロセスの損傷に焦点を当てている。このギャップを埋めるために,ノード分類タスクに対してグラフ注入攻撃を行うラベル伝搬型グローバルインジェクションアタック(LPGIA)を提案する。具体的には,ラベル伝播の観点から集約プロセスを解析し,グラフ注入攻撃問題をグローバルなインジェクションラベル特異性攻撃問題に変換する。この問題を解決するため、LPGIAはラベル伝搬に基づく戦略を用いて、注入ノードに接続されたノードの組み合わせを最適化する。次に、LPGIAは機能マッピングを利用して、注入されたノードの悪意ある機能を生成する。代表的GNNに対する広範な実験において、LPGIAは様々なデータセットにおいて、これまでの最も優れたインジェクション攻撃法よりも優れており、その優位性と転送性を示している。

Graph Neural Network (GNN) has achieved remarkable success in various graph learning tasks, such as node classification, link prediction and graph classification. The key to the success of GNN lies in its effective structure information representation through neighboring aggregation. However, the attacker can easily perturb the aggregation process through injecting fake nodes, which reveals that GNN is vulnerable to the graph injection attack. Existing graph injection attack methods primarily focus on damaging the classical feature aggregation process while overlooking the neighborhood aggregation process via label propagation. To bridge this gap, we propose the label-propagation-based global injection attack (LPGIA) which conducts the graph injection attack on the node classification task. Specifically, we analyze the aggregation process from the perspective of label propagation and transform the graph injection attack problem into a global injection label specificity attack problem. To solve this problem, LPGIA utilizes a label propagation-based strategy to optimize the combinations of the nodes connected to the injected node. Then, LPGIA leverages the feature mapping to generate malicious features for injected nodes. In extensive experiments against representative GNNs, LPGIA outperforms the previous best-performing injection attack method in various datasets, demonstrating its superiority and transferability.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 最小資源を持つ量子ビット上のデバイス非依存次元リークヌルテスト

Device-independent dimension leakage null test on qubits with minimal resources ( http://arxiv.org/abs/2405.18827v1 )

ライセンス: Link先を確認

Tomasz Rybotycki, Tomasz Białecki, Josep Batle, Adam Bednorz,

(参考訳) 我々は、デバイス独立であり、最小限の異なる実験を必要とするキュービットの2レベル空間のヌルテストを構築する。ほとんどの量子ビットは10以上の標準偏差でテストに失敗する。脱コヒーレンスや位相シフトといった一般的な技術的欠陥に対するテストの堅牢さは、逸脱の起源が既知の効果を超えていることを示している。

We construct a null test of the two-level space of a qubit, which is both device independent and needs a minimal number of different experiments. We demonstrate its feasibility on IBM Quantum, with most qubits failing the test by more than 10 standard deviations. The robustness of the test against common technical imperfections, like decoherence and phase shifts, and supposedly negligible leakage, indicates that the origin of deviations is beyond known effects.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# CHANI: 生体吸入によるニューロンの相関に基づくホークス凝集

CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration ( http://arxiv.org/abs/2405.18828v1 )

ライセンス: Link先を確認

Sophie Jaffard, Samuel Vaiter, Patricia Reynaud-Bouret,

(参考訳) 本研究の目的は,生物学にインスパイアされたニューラルネットワークが,局所的な変換のみによって分類タスクを学習できることを数学的に証明することである。そこで本研究では,CHANI(Correlation-based Hawkes Aggregation of Neurons with Bio-Inspiration)と呼ばれるスパイキングニューラルネットワークを提案する。シナプス重みはエキスパートアグリゲーションアルゴリズムによって更新され、局所的で単純な学習ルールを提供する。ネットワークが平均的かつ漸近的に学習できることを証明することができたのです。さらに、ネットワークが複数のクラスをエンコードし、中間層内の同じニューロンを複数のクラスで活性化できるという意味で、神経集合を自動生成することを示し、合成データセット上で数値シミュレーションを行った。この理論的なアプローチは、生物学的にインスパイアされたネットワークの従来の実証的な検証とは対照的であり、局所的な学習規則によって神経細胞が複雑な概念を表現できるアセンブリを形成する方法を理解するための道を開く。

The present work aims at proving mathematically that a neural network inspired by biology can learn a classification task thanks to local transformations only. In this purpose, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neurons activity is modeled by Hawkes processes. Synaptic weights are updated thanks to an expert aggregation algorithm, providing a local and simple learning rule. We were able to prove that our network can learn on average and asymptotically. Moreover, we demonstrated that it automatically produces neuronal assemblies in the sense that the network can encode several classes and that a same neuron in the intermediate layers might be activated by more than one class, and we provided numerical simulations on synthetic dataset. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 3次元視覚質問応答ベンチマークによるゼロショットGPT-4Vの性能評価

Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks ( http://arxiv.org/abs/2405.18831v1 )

ライセンス: Link先を確認

Simranjit Singh, Georgios Pavlakos, Dimitrios Stamoulis,

(参考訳) 基礎モデルの文脈における3次元視覚質問回答(VQA)問題の「修正」に関心があるため、これらの新しいパラダイムが既存の閉語彙データセットにどのように影響するかを評価することが不可欠である。本稿では,基礎モデル(GPT-4 Vision and GPT-4)のゼロショット性能を,確立された3次元VQAベンチマーク,すなわち3D-VQAとScanQAで評価する。我々は,従来のモデリング手法と比較して,GPTに基づくエージェントの性能を文脈化するための調査を行う。我々は,GPTをベースとしたエージェントが,クローズドボキャブラリのアプローチと同等に機能することを発見した。我々の研究は,閉鎖語彙設定において,「盲」モデルが驚くほど強いベースラインを確立するという最近の知見を裏付けるものである。エージェントは,テキストのテキストグラウンド化によって,シーン固有の語彙から大きな利益を享受できることを実証する。これまでのベースラインと事前比較を行うことで、マルチモーダルな3Dベンチマークを改良するためのコミュニティの継続的な取り組みを知らせることを期待します。

As interest in "reformulating" the 3D Visual Question Answering (VQA) problem in the context of foundation models grows, it is imperative to assess how these new paradigms influence existing closed-vocabulary datasets. In this case study, we evaluate the zero-shot performance of foundational models (GPT-4 Vision and GPT-4) on well-established 3D VQA benchmarks, namely 3D-VQA and ScanQA. We provide an investigation to contextualize the performance of GPT-based agents relative to traditional modeling approaches. We find that GPT-based agents without any fine-tuning perform on par with the closed vocabulary approaches. Our findings corroborate recent results that "blind" models establish a surprisingly strong baseline in closed-vocabulary settings. We demonstrate that agents benefit significantly from scene-specific vocabulary via in-context textual grounding. By presenting a preliminary comparison with previous baselines, we hope to inform the community's ongoing efforts to refine multi-modal 3D benchmarks.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# MoNDE: 大規模スパースモデルのためのニアデータエキスパートの混合

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models ( http://arxiv.org/abs/2405.18832v1 )

ライセンス: Link先を確認

Taehyun Kim, Kwanseok Choi, Youngmock Cho, Jaehoon Cho, Hyuk-Jae Lee, Jaewoong Sim,

(参考訳) MoE(Mixture-of-Experts)の大規模言語モデル(LLM)は、GPUメモリ容量を超えることが多いメモリ要件を持ち、二次記憶から専門計算のためのGPUへのコストのかかるパラメータ移動を必要とする。そこで本研究では,MoE LLM推論を効率的に実現するニアデータ・コンピューティング・ソリューションであるMixture of Near-Data Experts(MoNDE)を提案する。 MoNDEは、$\textit{hot}$専門家だけをGPUに転送し、残りの$\textit{cold}$専門家をホストメモリデバイス内で計算することで、MoEパラメータの運動量を削減している。大規模な専門家パラメータの転送を小さなアクティベーションに置き換えることで、MoNDEは通信効率のよいMoE推論を可能にし、エンコーダとデコーダの両方で既存のパラメータオフロードフレームワークを大幅に高速化する。

Mixture-of-Experts (MoE) large language models (LLM) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the $\textit{hot}$ experts to the GPU, while computing the remaining $\textit{cold}$ experts inside the host memory device. By replacing the transfers of massive expert parameters with the ones of small activations, MoNDE enables far more communication-efficient MoE inference, thereby resulting in substantial speedups over the existing parameter offloading frameworks for both encoder and decoder operations.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# Do Finetti: 交換可能なデータに対する因果効果について

Do Finetti: On Causal Effects for Exchangeable Data ( http://arxiv.org/abs/2405.18836v1 )

ライセンス: Link先を確認

Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf,

(参考訳) データをi.d.d.(独立で同一の分散)でない環境での因果効果の推定について検討する。我々は、独立因果関係の仮定を満たす交換可能なデータに焦点を当てる。従来の因果効果推定フレームワーク(例:構造因果モデルとdo-calculus)は、通常、i.d.データに制限され、複数の環境データに自然に発生する、より一般的な交換可能な生成プロセスに拡張されない。このギャップに対処するために、我々は、交換可能なデータのための一般化されたフレームワークを開発し、我々の設定における因果効果の同定と推定を容易にする、切り離された因果分解公式を導入する。潜在的な応用を説明するために、我々は因果的P\'olya urnモデルを導入し、介入が交換可能なデータ設定においてどのように影響を伝播するかを示す。最後に,マルチ環境データから因果探索と効果推定を同時に行うアルゴリズムを開発した。

We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal P\'olya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.

翻訳日:2024-05-30 18:28:55 公開日:2024-05-29

# 量子回路パラメータ化のための変圧器

Transformer for Parameterized Quantum Circuits Expressibility Prediction ( http://arxiv.org/abs/2405.18837v1 )

ライセンス: Link先を確認

Fei Zhang, Jie Li, Zhimin He, Haozhen Situ,

(参考訳) 特定の問題に対する指数関数的に高速な計算により、近年は量子コンピューティングに大きな注目を集めている。変分量子アルゴリズム(VQA)は、量子コンピューティングを実装する上で重要な手法であり、適切なタスク固有アンサッツにより、VQAの量子優位性を効果的に向上させることができる。しかし、膨大な検索スペースは、最適なタスク固有のアンサッツを見つけるのを困難にしている。ヒルベルト空間を効果的に探索するために量子状態の多様性を定量化する表現性は、一方のアンザッツが他方よりも優れているかどうかを評価するために用いられる。本研究では,パラメータ化量子回路の表現性予測におけるトランスフォーマーモデルの有効性について検討した。ゲートワイズ法により生成されるノイズレス回路を含む2つのデータセットを構築し, 量子ビット, ゲート数, 深さで変動する。回路はグラフに変換され、その表現性はKL分割と相対KL分割を用いて計算される。 Transformerモデルはこれらのデータセットに基づいてトレーニングされ、回路特性と表現性の間の複雑な関係をキャプチャする。 5つの評価指標を算出し, 実験結果から, 種々の表現可能性計算法において, 訓練されたモデルが高い性能とロバスト性を達成することを示す。本研究は、効率的な量子回路設計のアイデアを提供し、量子アーキテクチャ探索法の進歩に寄与することができる。

With the exponentially faster computation for certain problems, quantum computing has garnered significant attention in recent years. Variational Quantum Algorithm (VQA) is a crucial method to implement quantum computing, and an appropriate task-specific ansatz can effectively enhance the quantum advantage of VQAs. However, the vast search space makes it challenging to find the optimal task-specific ansatz. Expressibility, quantifying the diversity of quantum states to explore the Hilbert space effectively, can be used to evaluate whether one ansatz is superior than another. This study investigates the effectiveness of the Transformer model in predicting the expressibility of parameterized quantum circuits. We construct two datasets containing noiseless circuits generated by the gatewise method, varying in qubits, gate numbers and depths. The circuits are transformed into graphs, and then their expressibility are calculated using KL-divergence and Relative KL-divergence. A Transformer model is trained on these datasets to capture the intricate relationships between circuit characteristics and expressibility. Five evaluation metrics are calculated, and experimental results demonstrate that the trained model achieves high performance and robustness across various expressibility calculation methods. This research provides ideas for efficient quantum circuit design and can contribute to the advancement of quantum architecture search methods.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# 自動走行車開発におけるヒューマンファクター管理に必要な戦略

Requirements Strategy for Managing Human Factors in Automated Vehicle Development ( http://arxiv.org/abs/2405.18838v1 )

ライセンス: Link先を確認

Amna Pir Muhammad, Alessia Knauss, Eric Knauss, Jonas Bärgman,

(参考訳) 人的要因(HF)の知識の統合は、自動車両(AV)のような安全クリティカルなシステムを開発する際に重要である。 HFの知識がAV開発プロセスを通して継続的に考慮されることを保証することは、これらの先進的なシステムの有効性、安全性、受け入れなど、いくつかの理由において不可欠である。しかしながら、アジャイル開発における要件としてHFを含めることは難しい。最近、アジャイル開発における要件エンジニアリングの課題に対処するための要件戦略が提案されている。 AVのアジャイル開発におけるHF要件の調査に,要件戦略の概念をレンズとして適用することにより,本論文は3つの領域に到達した。 a)HF要件の所有権及び責任 b)HF要件及び情報モデルの構造及び c)HF要求に係る作業及び特徴フローの定義グローバルな自動車産業の専門家との13の半構造化インタビューに基づいて、これらの3分野について質的な洞察を提供する。インタビュアーが共有する多様な視点と経験は、洞察に富んだ見解を提供し、業界内でHFを統合するための各領域の潜在的なソリューション空間について、実世界の実践と戦略を強調するのに役立ちました。

The integration of human factors (HF) knowledge is crucial when developing safety-critical systems, such as automated vehicles (AVs). Ensuring that HF knowledge is considered continuously throughout the AV development process is essential for several reasons, including efficacy, safety, and acceptance of these advanced systems. However, it is challenging to include HF as requirements in agile development. Recently, Requirements Strategies have been suggested to address requirements engineering challenges in agile development. By applying the concept of Requirements Strategies as a lens to the investigation of HF requirements in agile development of AVs, this paper arrives at three areas for investigation: a) ownership and responsibility for HF requirements, b) structure of HF requirements and information models, and c) definition of work and feature flows related to HF requirements. Based on 13 semi-structured interviews with professionals from the global automotive industry, we provide qualitative insights in these three areas. The diverse perspectives and experiences shared by the interviewees provide insightful views and helped to reason about the potential solution spaces in each area for integrating HF within the industry, highlighting the real-world practices and strategies used.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# MEGA:人間のメッシュ回復のためのマスケ生成オートエンコーダ

MEGA: Masked Generative Autoencoder for Human Mesh Recovery ( http://arxiv.org/abs/2405.18839v1 )

ライセンス: Link先を確認

Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer,

(参考訳) 単一のRGB画像からのHuman Mesh Recovery(HMR)は、類似した2D投影が複数の3D解釈に対応できるため、非常に曖昧な問題である。しかしながら、ほとんどのHMR法はこの曖昧さを無視し、関連する不確実性を考慮せずに単一の予測を行う。いくつかのアプローチは、人間のメッシュの分布を生成し、複数の予測のサンプリングを可能にするが、それらのうちの1つの予測を行う際に、最新の単一出力モデルと競合するものは存在しない。本研究は,マスク生成モデルに基づく新しい手法を提案する。人間のポーズと形状をトークン化することにより、HMRタスクを入力画像に条件付けられた離散トークンのシーケンスを生成するものとして定式化する。画像と部分的ヒューマンメッシュトークンシーケンスから人間のメッシュを復元するために訓練された MaskEd Generative Autoencoder であるMEGA を紹介する。画像が与えられた場合、フレキシブルな生成方式により、決定論的モードで1つの人間のメッシュを予測したり、確率論的モードで複数の人間のメッシュを生成できる。 MEGAにより、複数の出力を提案し、予測の不確実性を評価することができる。 In-the-wildベンチマークの実験により、MEGAは決定論的および確率的モードにおける最先端のパフォーマンスを達成し、単一出力および複数出力のアプローチより優れていることが示された。

Human Mesh Recovery (HMR) from a single RGB image is a highly ambiguous problem, as similar 2D projections can correspond to multiple 3D interpretations. Nevertheless, most HMR methods overlook this ambiguity and make a single prediction without accounting for the associated uncertainty. A few approaches generate a distribution of human meshes, enabling the sampling of multiple predictions; however, none of them is competitive with the latest single-output model when making a single prediction. This work proposes a new approach based on masked generative modeling. By tokenizing the human pose and shape, we formulate the HMR task as generating a sequence of discrete tokens conditioned on an input image. We introduce MEGA, a MaskEd Generative Autoencoder trained to recover human meshes from images and partial human mesh token sequences. Given an image, our flexible generation scheme allows us to predict a single human mesh in deterministic mode or to generate multiple human meshes in stochastic mode. MEGA enables us to propose multiple outputs and to evaluate the uncertainty of the predictions. Experiments on in-the-wild benchmarks show that MEGA achieves state-of-the-art performance in deterministic and stochastic modes, outperforming single-output and multi-output approaches.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# 開語彙セマンティックセグメンテーションのための超球面空間におけるパラメータ効率の微調整

Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation ( http://arxiv.org/abs/2405.18840v1 )

ライセンス: Link先を確認

Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen,

(参考訳) オープンボキャブラリセマンティックセグメンテーションは、画像中の各ピクセルに任意のテキスト記述をラベル付けしようとする。ビジョン言語基盤モデル、特にCLIPは、最近、オープン語彙機能を取得するための強力なツールとして登場した。しかし、ピクセルレベルの予測能力を備えた微調整のCLIPは、しばしば3つの問題に悩まされる。 1 計算コストが高いこと。 2)CLIPとCLIPの2つの性質の相違 3) 目に見えないカテゴリーにおける一般化能力の低下。これらの問題に対処するため, 2つのCLIPモダリティに対して超球面空間で実施される対称パラメータ効率細調整(PEFT)戦略を提案する。具体的には、PEFT戦略は、全ての学習可能な行列のうち、効率的なブロック対角学習可能な変換行列と二重相互関係通信モジュールによって達成される。 PEFT戦略は2つのCLIPモダリティと対称に行われるので、それら間のミスアライメントが軽減される。さらに,CLIPテキストエンコーダにおいて,超球面エネルギーの原理に従ってPEFTに新たな制約を適用する。すなわち,微調整時の超球面エネルギーの最小化は,CLIPテキストエンコーダが提供する一般化能力の破壊を防止するため,元のパラメータ空間の内在的構造を保存する。様々なベンチマークにおいて、H-CLIPは、CLIPの総パラメータの約4%を更新するだけで、新しいSOTAオープン語彙セマンティックセマンティックセグメンテーション結果を達成することが示された。

Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers three issues: 1) high computational cost, 2) misalignment between the two inherent modalities of CLIP, and 3) degraded generalization ability on unseen categories. To address these issues, we propose H-CLIP a symmetrical parameter-efficient fine-tuning (PEFT) strategy conducted in hyperspherical space for both of the two CLIP modalities. Specifically, the PEFT strategy is achieved by a series of efficient block-diagonal learnable transformation matrices and a dual cross-relation communication module among all learnable matrices. Since the PEFT strategy is conducted symmetrically to the two CLIP modalities, the misalignment between them is mitigated. Furthermore, we apply an additional constraint to PEFT on the CLIP text encoder according to the hyperspherical energy principle, i.e., minimizing hyperspherical energy during fine-tuning preserves the intrinsic structure of the original parameter space, to prevent the destruction of the generalization ability offered by the CLIP text encoder. Extensive evaluations across various benchmarks show that H-CLIP achieves new SOTA open-vocabulary semantic segmentation results while only requiring updating approximately 4% of the total parameters of CLIP.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# 自動走行車開発におけるヒューマンファクターの管理--課題と実践に向けて

Managing Human Factors in Automated Vehicle Development: Towards Challenges and Practices ( http://arxiv.org/abs/2405.18841v1 )

ライセンス: Link先を確認

Amna Pir Muhammad, Eric Knauss, Jonas Bärgman, Alessia Knauss,

(参考訳) 技術的複雑さと社会的影響のため、自動走行車(AV)の開発は、現在の自動車工学の実践に挑戦する。研究は、人的要因(HF)の知識を安全かつ受け入れられるために、AVを開発する際に考慮することが重要であることを示している。本研究は,自動車産業におけるアジャイルAV開発におけるHF要求の実践と課題について考察する。我々は、HFの専門家やAVエンジニアを含むスウェーデンの自動車会社から10人の業界専門家にインタビューした。半構造化インタビューの質的な分析に基づいて、HFの知識をアジャイルAV開発に伝達し、取り入れるための現在のアプローチと関連する課題について論じる。私たちの発見は、HFの知識をアジャイルなAV開発に効果的に組み込む上で重要な問題について、将来の研究に集中するのに役立ちます。

Due to the technical complexity and social impact, automated vehicle (AV) development challenges the current state of automotive engineering practice. Research shows that it is important to consider human factors (HF) knowledge when developing AVs to make them safe and accepted. This study explores the current practices and challenges of the automotive industries for incorporating HF requirements during agile AV development. We interviewed ten industry professionals from several Swedish automotive companies, including HF experts and AV engineers. Based on our qualitative analysis of the semi-structured interviews, a number of current approaches for communicating and incorporating HF knowledge into agile AV development and associated challenges are discussed. Our findings may help to focus future research on issues that are critical to effectively incorporate HF knowledge into agile AV development.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# 野生における記述的画像品質評価

Descriptive Image Quality Assessment in the Wild ( http://arxiv.org/abs/2405.18842v1 )

ライセンス: Link先を確認

Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Tianfan Xue, Chao Dong,

(参考訳) 視覚言語モデル(VLM)の急速な進歩により、VLMベースの画像品質評価(IQA)は、画像品質を言語的に記述し、人間の表現と整合し、IQAタスクの多面的な性質を捉えようとしている。しかし、現在の方法はまだ実用には程遠い。まず、事前の作業は特定のサブタスクや設定に絞られ、多様な現実世界のアプリケーションと一致しない。第二に、データセットのカバレッジ、スケール、品質に制限があるため、パフォーマンスは準最適である。これらの課題を克服するために、野生における画像品質評価(DepictQA-Wild)を紹介する。本手法は,評価タスクと比較タスク,簡潔かつ詳細な応答,完全参照,非参照シナリオを含む多機能IQAタスクパラダイムを含む。そこで本研究では,データ品質を向上する基盤トラスインフォームドデータセット構築手法を導入し,短時間のジョイントフレームワークの下でデータセットを495Kにスケールアップする。そこで我々はDQ-495Kという,包括的で大規模で高品質なデータセットを構築した。また、画像の解像度をトレーニング中に保持し、解像度に関する品質問題に対処し、低品質の応答をフィルタリングするのに有用な信頼性スコアを推定する。実験結果から,DepictQA-Wildは従来のスコアベース手法,VLMモデル以前のIQAモデル,歪み識別,即時評価,推論タスクにおいて独自のGPT-4Vよりも優れていた。我々の優位性は、Webダウンロードされた画像の評価や、モデル処理された画像のランク付けなど、現実世界のアプリケーションによってさらに確認される。データセットとコードはhttps://depictqa.github.io/depictqa-wild/でリリースされる。

With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-world applications. Second, their performance is sub-optimal due to limitations in dataset coverage, scale, and quality. To overcome these challenges, we introduce Depicted image Quality Assessment in the Wild (DepictQA-Wild). Our method includes a multi-functional IQA task paradigm that encompasses both assessment and comparison tasks, brief and detailed responses, full-reference and non-reference scenarios. We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework. Consequently, we construct a comprehensive, large-scale, and high-quality dataset, named DQ-495K. We also retain image resolution during training to better handle resolution-related quality issues, and estimate a confidence score that is helpful to filter out low-quality responses. Experimental results demonstrate that DepictQA-Wild significantly outperforms traditional score-based methods, prior VLM-based IQA models, and proprietary GPT-4V in distortion identification, instant rating, and reasoning tasks. Our advantages are further confirmed by real-world applications including assessing the web-downloaded images and ranking model-processed images. Datasets and codes will be released in https://depictqa.github.io/depictqa-wild/.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# データ駆動型機械故障検出:総合的レビュー

Data-driven Machinery Fault Detection: A Comprehensive Review ( http://arxiv.org/abs/2405.18843v1 )

ライセンス: Link先を確認

Dhiraj Neupane, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal,

(参考訳) 先進的な製造の時代には、機械の故障をできるだけ早く診断し、安全で効率的な運転を保証することがこれまで以上に重要になりました。産業用ビッグデータの急増とセンシング・計算技術の進歩により、機械/深度学習アプローチに基づくデータ駆動型機械学習故障診断(MFD)ソリューションが製造業においてユビキタスに利用されている。多くの関連ソリューションが提案され、多くの論文でレビューされている産業応用において、故障した機械信号のタイムリーかつ正確に識別することが不可欠である。 MFDに関する多くのソリューションとレビューが利用可能であるにもかかわらず、既存の作品はいくつかの側面を欠いていることが多い。利用可能な文献の多くは、特定の種類の機器や分析方法に集中しているため、幅広い製造環境において適用性に制限がある。さらに、ノイズの多いデータ処理、適切な特徴の選択、新しい障害や予期せぬ障害に対応するモデルの適用など、データ駆動型アプローチの実装に関わる課題に関する議論は、表面的あるいは完全に見落とされがちである。そこで本調査では, 各種機械故障の検出・診断にさまざまな機械学習アプローチを用いた記事の総合的なレビュー, 強度と限界の強調, 条件に基づく解析に使用される手法のレビュー, 利用可能な機械故障データセットの総合的な検討, 今後の研究者に対して, これらのアプローチをMFDに使用しながら直面する可能性のある課題について紹介し, それらの問題を緩和するための潜在的ソリューションを推奨する。今後の研究の見通しは、この分野をより深く理解するためにも指摘されている。この記事は、研究者の助けとなり、この分野のさらなる発展に貢献してくれると信じている。

In this era of advanced manufacturing, it's now more crucial than ever to diagnose machine faults as early as possible to guarantee their safe and efficient operation. With the massive surge in industrial big data and advancement in sensing and computational technologies, data-driven Machinery Fault Diagnosis (MFD) solutions based on machine/deep learning approaches have been used ubiquitously in manufacturing. Timely and accurately identifying faulty machine signals is vital in industrial applications for which many relevant solutions have been proposed and are reviewed in many articles. Despite the availability of numerous solutions and reviews on MFD, existing works often lack several aspects. Most of the available literature has limited applicability in a wide range of manufacturing settings due to their concentration on a particular type of equipment or method of analysis. Additionally, discussions regarding the challenges associated with implementing data-driven approaches, such as dealing with noisy data, selecting appropriate features, and adapting models to accommodate new or unforeseen faults, are often superficial or completely overlooked. Thus, this survey provides a comprehensive review of the articles using different types of machine learning approaches for the detection and diagnosis of various types of machinery faults, highlights their strengths and limitations, provides a review of the methods used for condition-based analyses, comprehensively discusses the available machinery fault datasets, introduces future researchers to the possible challenges they have to encounter while using these approaches for MFD and recommends the probable solutions to mitigate those problems. The future research prospects are also pointed out for a better understanding of the field. We believe this article will help researchers and contribute to the further development of the field.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# Wikiコントリビュータのシミュレーション、モデリング、分類:「善」、「悪」、そして「悪」

Simulation, Modelling and Classification of Wiki Contributors: Spotting The Good, The Bad, and The Ugly ( http://arxiv.org/abs/2405.18845v1 )

ライセンス: Link先を確認

Silvia García Méndez, Fátima Leal, Benedita Malheiro, Juan Carlos Burguillo Rial, Bruno Veloso, Adriana E. Chis, Horacio González Vélez,

(参考訳) データクラウドソーシング(Data crowdsourcing)は、自発的なコントリビュータのグループが、ニュース、コメント、メディアから知識、分類に至るまで、非常に関連性の高いデータをプラットフォームに提供する、データ取得プロセスである。通常、ユーザ生成データストリームを処理して、wiki、コラボレーティブマップ、eコマースサイト、ソーシャルネットワークなどのポピュラーなサービスを提供し、洗練する。しかしながら、このモナス・オペランディは、敵対的環境における意図しないデータ操作に関する深刻な懸念を提起する。本稿では,人間と非人間(ロボット)を自動的に識別するためのシミュレーション,モデリング,分類手法を提案する。 WikiVoyageをテストベッドとして利用することで,実データと合成データの両方からなるクラスバランスのデータストリームを使用することで,分類者の信頼性と品質を大幅に向上させることが証明された。以上の結果から,本手法は良性ボットと良性ボットと,最大92%の分類精度を持つヒトコントリビュータを区別できることがわかった。

Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage - a free worldwide wiki travel guide open to contribution from the general public - as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# アジャイルにおける要求戦略の定義 - デザインサイエンス研究

Defining Requirements Strategies in Agile: A Design Science Research Study ( http://arxiv.org/abs/2405.18847v1 )

ライセンス: Link先を確認

Amna Pir Muhammad, Eric Knauss, Odzaya Batsaikhan, Nassiba El Haskouri, Yi-Chun Lin, Alessia Knauss,

(参考訳) 調査によると、現在アジャイル開発で直面している課題の多くは、要件エンジニアリングに関連している。デザインサイエンスの研究に基づいて、未定義の要求戦略からアジャイル開発で生じる重要な課題を考察する。これらの課題に対処し、要求戦略の重要な構成要素を合成する潜在的な方法を模索する。我々のデザインサイエンス研究は、通信技術、セキュリティサービス、自動車の分野における3つの産業ケースで、複数のケーススタディに基づいています。具体的な課題とワークフローを理解するために,計20回のインタビュー,2回のワークショップ,2回の参加者観察,各事例の文書分析を頼りにしました。いずれの場合も、プロセスマネージャや経験豊富なエンジニアとのコラボレーションにおいて、要件戦略を定義します。この経験から、私たちはアジャイル開発における要件戦略を定義するためのガイドラインを抽出します。

Research shows that many of the challenges currently encountered with agile development are related to requirements engineering. Based on design science research, this paper investigates critical challenges that arise in agile development from an undefined requirements strategy. We explore potential ways to address these challenges and synthesize the key building blocks of requirements strategies. Our design science research rests on a multiple case study with three industrial cases in the domains of communication technology, security services, and automotive. We relied on a total of 20 interviews, two workshops, participant observation in two cases, and document analysis in each of the cases to understand concrete challenges and workflows. In each case, we define a requirements strategy in collaboration with process managers and experienced engineers. From this experience, we extract guidelines for defining requirements strategies in agile development.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# コンテキストコントラストによる異常検出

Anomaly Detection by Context Contrasting ( http://arxiv.org/abs/2405.18848v1 )

ライセンス: Link先を確認

Alain Ryser, Thomas M. Sutter, Alexander Marx, Julia E. Vogt,

(参考訳) 異常検出は、標準から逸脱するサンプルを特定することに焦点を当てる。画像などの高次元データを扱う場合、異常パターンを検出するための重要な要件は、トレーニング中に見られる通常の概念をキャプチャする低次元表現を学習することである。近年の自己教師型学習の進歩は、この点において大きな可能性を秘めている。しかし、最も成功した自己教師付き異常検出法の多くは、異常の構造に関する事前知識を前提として、訓練中に合成異常を利用する。しかし、多くの現実世界のアプリケーションでは、目に見えないデータから何を期待すべきかは分かっていません。本研究では、通常のトレーニングデータを通常のプロパティを保持しながら異なるコンテキストに設定し、異なる視点からデータを観察することで、この問題に対処するCon2を提案する。その結果、見つからない通常のデータは学習した文脈表現に固執するが、異常は起こらないため、トレーニング中に異常について何も知らないまま検出できる。提案手法は,より現実的な医療環境において,潜在的な異常に関する知識が不足している場合に,より優れたパフォーマンスを示すとともに,様々なベンチマーク上での最先端のパフォーマンスを実現することを実証した。

Anomaly Detection focuses on identifying samples that deviate from the norm. When working with high-dimensional data such as images, a crucial requirement for detecting anomalous patterns is learning lower-dimensional representations that capture normal concepts seen during training. Recent advances in self-supervised learning have shown great promise in this regard. However, many of the most successful self-supervised anomaly detection methods assume prior knowledge about the structure of anomalies and leverage synthetic anomalies during training. Yet, in many real-world applications, we do not know what to expect from unseen data, and we can solely leverage knowledge about normal data. In this work, we propose Con2, which addresses this problem by setting normal training data into distinct contexts while preserving its normal properties, letting us observe the data from different perspectives. Unseen normal data consequently adheres to learned context representations while anomalies fail to do so, letting us detect them without any knowledge about anomalies during training. Our experiments demonstrate that our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting, where knowledge about potential anomalies is often scarce.

翻訳日:2024-05-30 18:19:11 公開日:2024-05-29

# SFANet:気象予報のための空間周波数アテンションネットワーク

SFANet: Spatial-Frequency Attention Network for Weather Forecasting ( http://arxiv.org/abs/2405.18849v1 )

ライセンス: Link先を確認

Jiaze Wang, Hao Chen, Hongcan Xu, Jinpeng Li, Bowen Wang, Kun Shao, Furui Liu, Huaxi Chen, Guangyong Chen, Pheng-Ann Heng,

(参考訳) 天気予報は様々な分野において重要な役割を担い、意思決定とリスク管理を推進している。しかし、伝統的な手法は、特に高解像度データの存在下で、気象系の複雑な力学を捉えるのに苦労することが多い。本稿では、これらの課題に対処し、時空間天気予報の精度を高めるために設計された新しいディープラーニングフレームワークである空間周波数注意ネットワーク(SFANet)を提案する。既存の手法の限界からインスピレーションを得て,高度なトークンミキシングとアテンション機構をシームレスに統合する革新的なアプローチを提案する。プールと空間混合の両戦略を活用することにより、SFANetは高次元時空間列の処理を最適化し、成分間関係情報を保存し、広範囲の長距離関係をモデル化する。機能統合をさらに強化するため,我々は空間周波数アテンションモジュールを導入し,複雑な相互モーダル相関を捉える。 SEVIR (Storm EVent ImageRy) と ICAR (Institute for Climate and Application Research) - El Ni\~{n}o Southern Oscillation (ENSO) の2つの異なるデータセットに対する広範な実験的評価は、SFANetの顕著な性能を示している。特に,SFANetは,降水パターンの予測やEl Ni\~{n}oイベントの予測に習熟し,最先端の手法に対する大幅な進歩を実現している。

Weather forecasting plays a critical role in various sectors, driving decision-making and risk management. However, traditional methods often struggle to capture the complex dynamics of meteorological systems, particularly in the presence of high-resolution data. In this paper, we propose the Spatial-Frequency Attention Network (SFANet), a novel deep learning framework designed to address these challenges and enhance the accuracy of spatiotemporal weather prediction. Drawing inspiration from the limitations of existing methodologies, we present an innovative approach that seamlessly integrates advanced token mixing and attention mechanisms. By leveraging both pooling and spatial mixing strategies, SFANet optimizes the processing of high-dimensional spatiotemporal sequences, preserving inter-component relational information and modeling extensive long-range relationships. To further enhance feature integration, we introduce a novel spatial-frequency attention module, enabling the model to capture intricate cross-modal correlations. Our extensive experimental evaluation on two distinct datasets, the Storm EVent ImageRy (SEVIR) and the Institute for Climate and Application Research (ICAR) - El Ni\~{n}o Southern Oscillation (ENSO) dataset, demonstrates the remarkable performance of SFANet. Notably, SFANet achieves substantial advancements over state-of-the-art methods, showcasing its proficiency in forecasting precipitation patterns and predicting El Ni\~{n}o events.

翻訳日:2024-05-30 18:19:10 公開日:2024-05-29

# LetsMap: セマンティックなBEVマッピングのための教師なし表現学習

LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping ( http://arxiv.org/abs/2405.18852v1 )

ライセンス: Link先を確認

Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav Valada,

(参考訳) Semantic Bird's Eye View (BEV) マップは、自律運転における様々な意思決定タスクに対する強い排他的推論を備えたリッチな表現を提供する。しかしながら、ほとんどのBEVマッピングアプローチでは、大量の人間による注釈付きBEV基底真理データに依存する、完全に教師付き学習パラダイムを採用している。本研究では,この制限に対処するため,単眼正面視(FV)画像からのセマンティックなBEVマップをラベル効率よく生成する,教師なし表現学習手法を提案する。提案手法は,2つの解離したニューラルパスを教師なしの方法で利用し,BEV内の少数のラベルのみを用いたセマンティックBEVマッピングのタスクを微調整することで,シーン幾何学とシーン意味論を独立に推論するネットワークを事前訓練する。本研究では,FV画像の空間的・時間的整合性を利用して,シーン表現を符号化する新しい時間的マスク付きオートエンコーダの定式化に依存しながら,シーン形状を学習する。 KITTI-360 と nuScenes データセットの大規模な評価は,BEV ラベルの 1% しか使用せず,追加ラベル付きデータも使用せず,我々のアプローチが既存の最先端アプローチと同等であることを示している。

Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.

翻訳日:2024-05-30 18:19:10 公開日:2024-05-29

# Snapshot Spectral Imaging Face Anti-Spoofingにおけるコントラスト学習の促進

Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-Spoofing ( http://arxiv.org/abs/2405.18853v1 )

ライセンス: Link先を確認

Chuanbiao Song, Yan Hong, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang,

(参考訳) 本研究は,顔認識システムにおける顔の偽造防止機能を強化することを目的とした,最先端の再均衡型コントラスト学習戦略を明らかにし,印刷写真や高現実的なシリコンマスクやラテックスマスクによる課題への対処に焦点を当てた。ハイパースペクトル画像の提供にSnapshot Spectral Imaging技術を活用するHySpeFASデータセットを活用することで,データ再サンプリングによるクラスレベルのコントラスト学習と,革新的なリアルタイムリウェイト技術との調和を実現した。この方法は、データセットの不均衡を効果的に軽減し、アイデンティティ関連のバイアスを低減する。 CVPR 2024のChalearn Snapshot Spectral Imaging Face Anti-spoofing Challengeにおいて,HySpeFASデータセット上での平均分類誤差率(ACER)は前例のない0.0000\%に達した。

This study reveals a cutting-edge re-balanced contrastive learning strategy aimed at strengthening face anti-spoofing capabilities within facial recognition systems, with a focus on countering the challenges posed by printed photos, and highly realistic silicone or latex masks. Leveraging the HySpeFAS dataset, which benefits from Snapshot Spectral Imaging technology to provide hyperspectral images, our approach harmonizes class-level contrastive learning with data resampling and an innovative real-face oriented reweighting technique. This method effectively mitigates dataset imbalances and reduces identity-related biases. Notably, our strategy achieved an unprecedented 0.0000\% Average Classification Error Rate (ACER) on the HySpeFAS dataset, ranking first at the Chalearn Snapshot Spectral Imaging Face Anti-spoofing Challenge on CVPR 2024.

翻訳日:2024-05-30 18:19:10 公開日:2024-05-29

# SSGA-Net: 自律運転のためのステップワイドグローバルローカルアグリゲーションネットワーク

SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving ( http://arxiv.org/abs/2405.18857v1 )

ライセンス: Link先を確認

Yiming Cui, Cheng Han, Dongfang Liu,

(参考訳) 視覚に基づく知覚は、自動運転の鍵となるモジュールである。これらの視覚的認識タスクの中で、ビデオオブジェクト検出は、高速な動きや複数のポーズによって生じる特徴劣化のため、主要かつ困難なタスクである。現在のモデルは、通常、隣接するフレームから特徴を集約してタスクヘッドのオブジェクト表現を強化し、より正確な予測を生成する。性能は向上するが、これらの手法は将来のフレームの情報に依存し、高い計算複雑性に悩まされる。一方、アグリゲーションプロセスは、推論時間中に再構成できない。これらの問題により、既存のモデルのほとんどがオンラインアプリケーションでは利用できない。これらの問題を解決するために、段階的に空間的にグローバルな集約ネットワークを導入する。提案するモデルは,主に3つの部分を含む。多段階のステップワイドネットワークは、前段階からの予測とオブジェクト表現を徐々に洗練する。空間的グローバル・ローカル・アグリゲーションは、隣接するフレームからの局所情報と現在のフレームからのグローバル・セマンティクスを融合させ、特徴劣化を解消する。ダイナミックアグリゲーション戦略は、リファインメント結果に基づいて早期にアグリゲーションプロセスを停止し、冗長性を除去し、効率を向上する。 ImageNet VIDベンチマークの大規模な実験により、提案モデルの有効性と効率が検証された。

Visual-based perception is the key module for autonomous driving. Among those visual perception tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current models usually aggregate features from the neighboring frames to enhance the object representations for the task heads to generate more accurate predictions. Though getting better performance, these methods rely on the information from the future frames and suffer from high computational complexity. Meanwhile, the aggregation process is not reconfigurable during the inference time. These issues make most of the existing models infeasible for online applications. To solve these problems, we introduce a stepwise spatial global-local aggregation network. Our proposed models mainly contain three parts: 1). Multi-stage stepwise network gradually refines the predictions and object representations from the previous stage; 2). Spatial global-local aggregation fuses the local information from the neighboring frames and global semantics from the current frame to eliminate the feature degradation; 3). Dynamic aggregation strategy stops the aggregation process early based on the refinement results to remove redundancy and improve efficiency. Extensive experiments on the ImageNet VID benchmark validate the effectiveness and efficiency of our proposed models.

翻訳日:2024-05-30 18:19:10 公開日:2024-05-29

# ドメインにインスパイアされたシャープネス-ドメインシフトによる最小化

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts ( http://arxiv.org/abs/2405.18861v1 )

ライセンス: Link先を確認

Ruipeng Zhang, Ziqing Fan, Jiangchao Yao, Ya Zhang, Yanfeng Wang,

(参考訳) 本稿ではドメインシフト下での最適化のためのドメインインスパイアされたシャープネス認識最小化(DISAM)アルゴリズムを提案する。これは、異なる領域にまたがるSAMの不整合収束度によって動機付けられ、特定の領域に対する最適化バイアスを引き起こし、したがって全体の収束を損なう。この問題に対処するために、我々は、(十分に)最適化されていないドメインに対する圧倒的な(不十分な)摂動を防ぐために、シャープネス推定におけるドメインレベルの収束一貫性を検討する。特に、DIAMは、ドメイン損失の分散を最小化する制約を導入し、摂動発生における弾性勾配校正を可能にする: 1つのドメインが平均レベル \textit{w.r.t.} の損失以上に最適化されると、そのドメインへの勾配摂動は自動的に弱くなる。このメカニズムの下では、不整合収束が生じると、理論上disAMがより高速な総合収束を実現し、原理的に一般化を向上できることが示される。様々な領域一般化ベンチマークの広範囲な実験は、最先端の手法よりもDIAMの方が優れていることを示している。さらに、パラメータ効率の良い微調整と事前学習モデルの組み合わせにより、DIAMの優れた効率性を示す。ソースコードはhttps://github.com/MediaBrain-SJTU/DISAMで公開されている。

This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows the elastic gradient calibration in perturbation generation: when one domain is optimized above the averaging level \textit{w.r.t.} loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with the pretraining models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.

翻訳日:2024-05-30 18:19:10 公開日:2024-05-29

# 単一分子分光法による新しいビュー合成のためのニューラルラジアンス場

Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy ( http://arxiv.org/abs/2405.18863v1 )

ライセンス: Link先を確認

Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki,

(参考訳) 腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下手術は腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下腹腔鏡下手術の診断に有用である。この目的を達成するための典型的な方法は、構造移動(SfM)やポアソン表面の再構成を含む従来の3D再構成技術を統合することである。これらの手法は、点雲やメッシュなどの明示的な3D表現を生成し、新しい視点から画像のレンダリングを可能にする。しかし、胃内の低テクスチュア領域と非ランベルト領域の存在は、しばしば、点雲とメッシュのノイズと不完全な再構成をもたらし、高品質の画像レンダリングの達成を妨げる。本稿では,ニューラルラジアンスフィールド(NeRF)の新しい手法をモノクロガストロスコープデータに適用し,新しい視点に向けて光現実像を合成する。単眼胃内視鏡の局所領域における視線間隔による性能劣化に対処するため, 既設点雲からの幾何先行をNeRFのトレーニングに組み込んだ。近年のNeRF法と比較すると,胃内の新しい視点からの高忠実度画像のレンダリングは質的,定量的に行われている。

Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, such as point clouds and meshes, thereby enabling the rendering of the images from novel viewpoints. However, the existence of low-texture and non-Lambertian regions within the stomach often results in noisy and incomplete reconstructions of point clouds and meshes, hindering the attainment of high-quality image rendering. In this paper, we apply the emerging technique of neural radiance fields (NeRF) to monocular gastroscopic data for synthesizing photo-realistic images for novel viewpoints. To address the performance degradation due to view sparsity in local regions of monocular gastroscopy, we incorporate geometry priors from a pre-reconstructed point cloud into the training of NeRF, which introduces a novel geometry-based loss to both pre-captured observed views and generated unobserved views. Compared to other recent NeRF methods, our approach showcases high-fidelity image renderings from novel viewpoints within the stomach both qualitatively and quantitatively.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 最適マルチモーダル埋め込み空間のトポロジ的展望

Topological Perspectives on Optimal Multimodal Embedding Spaces ( http://arxiv.org/abs/2405.18867v1 )

ライセンス: Link先を確認

Abdul Aziz A. B, A. B Abdul Rahim,

(参考訳) マルチモーダルモデル開発における最近の進歩は、テキスト・ツー・イメージ生成の領域におけるパラダイムシフトを浮き彫りにした。これらの進歩の中で、CLIPは、テキスト情報と視覚情報を一体化された潜在空間内にエンコードする高度なオートエンコーダである、顕著な成果として際立っている。本稿では,CLIPと最近のCLOOBの比較分析について述べる。これらのモデルによって構築された埋め込み空間内での複雑な区別を明らかにするために、トポロジカルデータ解析を用いる。提案手法は,モダリティギャップドライバ,高次元と低次元の両方に存在するクラスタリング構造,および各埋め込み空間を形成する上で,次元崩壊が果たす重要な役割を包括的に検討することを含む。実証実験は、様々な文脈シナリオにおける下流性能に関する分析の影響を裏付けるものである。本研究は,CLIP と CLOOB の比較効果を生かし,それぞれの長所と短所に関する知見を提供し,マルチモーダルモデル研究のさらなる洗練と発展のための基盤を提供することを目的としている。

Recent strides in multimodal model development have ignited a paradigm shift in the realm of text-to-image generation. Among these advancements, CLIP stands out as a remarkable achievement which is a sophisticated autoencoder adept at encoding both textual and visual information within a unified latent space. This paper delves into a comparative analysis between CLIP and its recent counterpart, CLOOB. To unravel the intricate distinctions within the embedding spaces crafted by these models, we employ topological data analysis. Our approach encompasses a comprehensive examination of the modality gap drivers, the clustering structures existing across both high and low dimensions, and the pivotal role that dimension collapse plays in shaping their respective embedding spaces. Empirical experiments substantiate the implications of our analyses on downstream performance across various contextual scenarios. Through this investigation, we aim to shed light on the nuanced intricacies that underlie the comparative efficacy of CLIP and CLOOB, offering insights into their respective strengths and weaknesses, and providing a foundation for further refinement and advancement in multimodal model research.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# ハーモニックトラップ電位内に置かれた重力波検出器における量子重力信号

Quantum gravity signatures in gravitational wave detectors placed inside a harmonic trap potential ( http://arxiv.org/abs/2405.18868v1 )

ライセンス: Link先を確認

Soham Sen, Sunandan Gangopadhyay, Sukanta Bhattacharyya,

(参考訳) 本研究は,高調波トラップ内にのみ設置された偏光と入ってくる重力波と相互作用する重力波の一般重力波検出器について考察する。このモデルは、重力波の共鳴検出器の記述とよく一致する。よく知られた検出器-重力波相互作用のシナリオは、検出器を量子力学的に扱う半古典的な手法を用いるが、重力波は古典的なレベルで考えることができる。解析では、重力波の摂動の離散モード分解を用いて、重力波と調和振動子に対応する運動量演算子の位置と運動量演算子を含むハミルトニアンを導出する。そして、初期状態から未知の最終状態に移行するための高調波振動子-重力波テンソル積状態の遷移確率を計算した。重力波のエネルギーフラックス関係を用いて、全エネルギーを検出器の初期状態における重力子数の組み合わせとして考えると、共鳴吸収の場合の遷移確率は半古典吸収の場合と全く同じ解析形式を取る。本モデルでは, 半古典的アンカルージに完全に欠落した単一重力子の自然放出を観測した。したがって、これは線型化された量子重力の直接的なシグネチャを与える。

In this work, we consider a general gravitational wave detector of gravitational wave interacting with an incoming gravitational wave carrying plus polarization only placed inside a harmonic trap. This model can be well acquainted with the description of a resonant detector of gravitational wave as well. The well known detector-gravitational wave interaction scenario uses the method of a semi classical approach where the detector is treated quantum mechanically but the gravitational wave is considered at a classical level. In our analysis, we use a discrete mode decomposition of the gravitational wave perturbation which results in a Hamiltonian involving the position and momentum operators corresponding to the gravitational wave and the harmonic oscillator. We have then calculated the transition probability for the harmonic oscillator-gravitational wave tensor product state for going from an initial state to some unknown final state. Using the energy flux relation of the gravitational waves, we observe that if we consider the total energy as a combination of the number of gravitons in the initial state of the detector then the transition probability for the resonant absorption case scenario takes the analytical form which is exactly similar to the semi-classical absorption case. In case of the emission scenario, we observe a spontaneous emission of a single graviton which was completely absent in the semi-classical analouge of this model. This therefore gives a direct signature of linearized quantum gravity.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# データ駆動型電力管理に向けて - マルチリージョン同調データと知識グラフ

Towards Data-Driven Electricity Management: Multi-Region Harmonized Data and Knowledge Graph ( http://arxiv.org/abs/2405.18869v1 )

ライセンス: Link先を確認

Vid Hanžel, Blaž Bertalanič, Carolina Fortuna,

(参考訳) 人口増加と技術進歩により、世界的な電力消費が増加し、結果としてCO2排出量も増加している。住宅セクターは世界の電力消費の25%を占めており、快適さを犠牲にすることなく効率を高め、CO2フットプリントを減らす大きな可能性を秘めている。しかし、複数の地域をまたがる家庭レベルでの一様消費データが欠如していることは、大規模な研究や堅牢な多地域モデル開発を妨げる。本稿では、公開されているソースからコンパイルされ、均一なフォーマットで提示されるマルチリージョンデータセットについて紹介する。このデータにより、デアグリゲーション、需要予測、アプライアンスON/OFF分類などの機械学習タスクが可能になる。さらに,家庭の電力消費を特徴付けるRDF知識グラフを開発し,Wikidata や DBpedia などのオープンな知識ベースとの相互運用性とセマンティッククエリを実現するための,家庭関連プロパティとコンテキストを関連づける。この構造化されたデータは、さまざまな利害関係者にデータ駆動型ポリシーやビジネス開発について通知するために利用することができる。

Due to growing population and technological advances, global electricity consumption, and consequently also CO2 emissions are increasing. The residential sector makes up 25% of global electricity consumption and has great potential to increase efficiency and reduce CO2 footprint without sacrificing comfort. However, a lack of uniform consumption data at the household level spanning multiple regions hinders large-scale studies and robust multi-region model development. This paper introduces a multi-region dataset compiled from publicly available sources and presented in a uniform format. This data enables machine learning tasks such as disaggregation, demand forecasting, appliance ON/OFF classification, etc. Furthermore, we develop an RDF knowledge graph that characterizes the electricity consumption of the households and contextualizes it with household related properties enabling semantic queries and interoperability with other open knowledge bases like Wikidata and DBpedia. This structured data can be utilized to inform various stakeholders towards data-driven policy and business development.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# LLMはマインドタスクの高次理論上での成人人間のパフォーマンスを達成する

LLMs achieve adult human performance on higher-order theory of mind tasks ( http://arxiv.org/abs/2405.18870v1 )

ライセンス: Link先を確認

Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar,

(参考訳) 本稿では,大規模言語モデル (LLM) が高次心の理論 (ToM) をいかに発展させたかを検討する。本稿では、手書きテストスイートであるMulti-Order Theory of Mind Q&Aを導入し、5つのLCMのパフォーマンスと、新たに集まった成人のベンチマークを比較することによって、以前の作業の上に構築する。 GPT-4とFlan-PaLMは、ToMタスク全体において、成人レベルおよびほぼ成人レベルに到達し、GPT-4は6次推定で成人レベルを超えることが判明した。以上の結果から,ToM能力を実現するためのモデルサイズと微調整の間には相互作用があることが示唆された。高次ToMが幅広い協調的かつ競争的な人間の行動に果たす役割を考えると、これらの発見はユーザ向けLLMアプリケーションに重大な影響を及ぼす。

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# DFAMiner:ラベル付きサンプルから最小限のDFAを分離する

DFAMiner: Mining minimal separating DFAs from labelled samples ( http://arxiv.org/abs/2405.18871v1 )

ライセンス: Link先を確認

Daniele Dell'Erba, Yong Li, Sven Schewe,

(参考訳) ラベル付きサンプルから最小限の決定論的有限オートマトン(DFA)を分離する受動的学習ツールであるDFAMinerを提案する。分離オートマトンは、通常モデルチェックで一般的に発生する興味深いオートマトンクラスであり、パリティゲーム解決の基本的な問題に関心を寄せている。まず,3値のDFA(3DFA)を,通常の語彙順に付与されたラベル付きサンプルの集合からインクリメンタルに構築する,単純で線形な時間的アルゴリズムを提案する。この3DFAは、ラベル付き例を正確に認識できるように、状態を受け入れ、拒否すると同時に、不注意な状態も受け入れている。そこで我々は,SAT問題を最小化して構築したオートマトンを最小化することにより,ラベル付きサンプルの最小分離DFAをマイニングするツールを開発した。実験的な評価の結果,本ツールはサンプルから最小限のDFAを学習するための標準ベンチマークにおいて,最先端のツールよりも優れていた。 DFAを分離する効率的な構築の進歩は、DFAMinerが最大7色までの単純な言語に対して最適な分離オートマトンを作成できることを示すために、パリティゲーム解決の低い境界を見つけることにも繋がる。将来的には、データ構造の改善が期待できるだろう。

We propose DFAMiner, a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples. Separating automata are an interesting class of automata that occurs generally in regular model checking and has raised interest in foundational questions of parity game solving. We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA (3DFA) from a set of labelled samples given in the usual lexicographical order. This 3DFA has accepting and rejecting states as well as don't-care states, so that it can exactly recognise the labelled examples. We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems. Empirical evaluation shows that our tool outperforms current state-of-the-art tools significantly on standard benchmarks for learning minimal separating DFAs from samples. Progress in the efficient construction of separating DFAs can also lead to finding the lower bound of parity game solving, where we show that DFAMiner can create optimal separating automata for simple languages with up to 7 colours. Future improvements might offer inroads to better data structures.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# トレーニング可能な特徴マッチングアテンションネットワークに基づく単一画像超解像

Single image super-resolution based on trainable feature matching attention network ( http://arxiv.org/abs/2405.18872v1 )

ライセンス: Link先を確認

Qizhou Chen, Qing Shao,

(参考訳) 畳み込みニューラルネットワーク(CNN)は近年,画像の超解像(SR)に広く利用されている。様々な技術は、CNN構造を変更したり、改善された自己認識機構を取り入れることでSR性能を向上させる。興味深いことに、これらの進歩は共通の特徴を共有している。高周波の詳細を明示的に学習する代わりに、特徴マップの自分自身の要素の重み付けされた和を利用して、畳み込みや非局所的といった、暗黙的な特徴処理モードを学ぶ。対照的に、初期の辞書ベースのアプローチは、Low-Resolution (LR)機能にマッチして再構築するために、特徴分解を明示的に学習する。この分析に基づいて、この明示的な特徴学習をCNNにマージし、その表現能力を増強するために、トレーニング可能な特徴マッチング(TFM)を導入する。 TFMでは、トレーニング可能な機能セットが統合され、特徴マッチングを通じて画像のトレーニングから機能を明示的に学習する。さらに,提案するトレーニング可能な特徴マッチング注意ネットワーク(TFMAN)に非局所的およびチャネル的注意を組み込むことにより,SR性能をさらに向上する。本研究では,非局所演算の計算要求を軽減するため,SRNL (Same-size-divided Region-level Non-Local) と呼ばれる簡易な変種を提案する。 SRNLは入力特徴写像から一様に分割されたブロック上で非局所計算を並列に行う。 TFMとSRNLの有効性は、アブレーション研究とモジュール探索を通じて検証される。パラメータ利用を最適化するために、TFMANのバックボーンとして繰り返し畳み込みネットワークを用いる。ベンチマークデータセットに関する総合的な実験により、TFMANはパラメータを減らしながら、ほとんどの比較において優れた結果が得られることが示された。コードはhttps://github.com/qizhou000/tfman.comから入手できる。

Convolutional Neural Networks (CNNs) have been widely employed for image Super-Resolution (SR) in recent years. Various techniques enhance SR performance by altering CNN structures or incorporating improved self-attention mechanisms. Interestingly, these advancements share a common trait. Instead of explicitly learning high-frequency details, they learn an implicit feature processing mode that utilizes weighted sums of a feature map's own elements for reconstruction, akin to convolution and non-local. In contrast, early dictionary-based approaches learn feature decompositions explicitly to match and rebuild Low-Resolution (LR) features. Building on this analysis, we introduce Trainable Feature Matching (TFM) to amalgamate this explicit feature learning into CNNs, augmenting their representation capabilities. Within TFM, trainable feature sets are integrated to explicitly learn features from training images through feature matching. Furthermore, we integrate non-local and channel attention into our proposed Trainable Feature Matching Attention Network (TFMAN) to further enhance SR performance. To alleviate the computational demands of non-local operations, we propose a streamlined variant called Same-size-divided Region-level Non-Local (SRNL). SRNL conducts non-local computations in parallel on blocks uniformly divided from the input feature map. The efficacy of TFM and SRNL is validated through ablation studies and module explorations. We employ a recurrent convolutional network as the backbone of our TFMAN to optimize parameter utilization. Comprehensive experiments on benchmark datasets demonstrate that TFMAN achieves superior results in most comparisons while using fewer parameters. The code is available at https://github.com/qizhou000/tfman.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# クエリとキーは常に関連しているか? : トランスフォーマー波動関数のケーススタディ

Are queries and keys always relevant? A case study on Transformer wave functions ( http://arxiv.org/abs/2405.18874v1 )

ライセンス: Link先を確認

Riccardo Rende, Luciano Loris Viteritti,

(参考訳) ドット製品アテンションメカニズムは、元々自然言語処理(NLP)タスク用に設計されたもので、現代のトランスフォーマーの基盤となっている。文中の単語ペア間の意味的関係を、クエリとキー間の類似性を計算することによって、適切にキャプチャする。本研究では、量子多体スピンハミルトニアンの基底状態に近似する変動波関数のパラメトリゼーションの特定の領域において、トランスフォーマーの適合性について検討する。具体的には、格子上の量子量体系の分野における一般的なベンチマークである2次元の$J_1$-$J_2$Heisenbergモデルで数値シミュレーションを行う。標準的な注意機構の性能と、クエリやキーを省略した簡易バージョンを比較し、位置のみに依存することで、計算コストとパラメータ使用量の削減を図り、競合する結果を得る。さらに,標準アテンション機構によって生成されたアテンションマップの解析により,最適化の終了時に,アテンション重みが効果的に入力非依存となることを示す。解析計算により解析結果をサポートし、大規模システムの研究において、なぜクエリとキーが注意機構から省略されるのかを物理的に把握する。興味深いことに、同じ引数を長い入力文の制限で NLP ドメインに拡張することができる。

The dot product attention mechanism, originally designed for natural language processing (NLP) tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations on the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum-many body systems on lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insights of why queries and keys should be, in principle, omitted from the attention mechanism when studying large systems. Interestingly, the same arguments can be extended to the NLP domain, in the limit of long input sentences.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 地域・グローバル・リコースにおける対物的メタルル

Counterfactual Metarules for Local and Global Recourse ( http://arxiv.org/abs/2405.18875v1 )

ライセンス: Link先を確認

Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso,

(参考訳) T-CRExは局所的およびグローバル的対実的説明(CE)のための新しいモデルに依存しない手法である。ツリーベースのサロゲートモデルを活用して、モデル行動のグローバルな分析と、ユーザのための多様なリコメンデーションオプションの両方を提供する、最適な領域を示す'メタル'とともに、カウンターファクトルールを学ぶ。実験により、T-CRExは、CEデシデラタにおける既存のルールベースベースラインよりも、桁違いに高速に動作しながら、より優れたアグリゲーション性能を達成することが示された。

We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that T-CREx achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# ブロックチェーン生態系の公平性について

On Fairness Concerns in the Blockchain Ecosystem ( http://arxiv.org/abs/2405.18876v1 )

ライセンス: Link先を確認

Johnnatan Messias Peixoto Afonso,

(参考訳) ブロックチェーンは、分散化と透明性を促進することで、銀行や金融といった中央集権的なセクターに革命をもたらした。ブロックチェーンでは、情報は参加者やアプリケーションが発行したトランザクションを通じて送信される。マイナーは、より高いインセンティブや手数料を持つものを優先して、ブロックインクルージョンのために準備中の取引を選択し、注文し、検証する。トランザクションを含む順序は、ブロックチェーンの最終状態に影響を与える可能性がある。さらに、ブロックチェーン上で動作するアプリケーションは、中核機能を変更するための意思決定力を分散化するためのガバナンスプロトコルに依存することが多い。これらの変更は、参加者がこれらのアプリケーションとどのようにやりとりするかに影響を与える可能性がある。 1つのトークンが1つの投票に等しいため、複数のトークンを持つ参加者は、提案された変更を支持し、拒否する投票力が高い。この投票力が分散される範囲は疑問の余地があり、少数の保有者の間で集中している場合、ガバナンス攻撃につながる可能性がある。この論文では、BitcoinとEthereumのブロックチェーンを監査し、トランザクションの優先順位を決定するためにマイナーが続く規範を調査します。また、参加者間で投票力が公平に分散されているかどうかを評価するために、Commenなどの分散型ガバナンスプロトコルを監査する。私たちの発見は、ブロックチェーンと分散アプリケーションの将来的な発展に重大な影響を与えます。

Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain final state. Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have a higher voting power to support or reject the proposed changes. The extent to which this voting power is distributed is questionable and if highly concentrated among a few holders can lead to governance attacks. In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining the transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 連続製品グラフニューラルネットワーク

Continuous Product Graph Neural Networks ( http://arxiv.org/abs/2405.18877v1 )

ライセンス: Link先を確認

Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo,

(参考訳) 複数のグラフ上に定義されたマルチドメインデータの処理は、計算機科学における様々な実践的応用において大きな可能性を秘めている。しかし、現在の手法は主に離散グラフフィルタリングに限られている。グラフ上のテンソル偏微分方程式(TPDEG)は、既存の離散的方法論の限界に対処しながら、複数の相互作用グラフにまたがる構造化データをモデル化するための原則的なフレームワークを提供する。本稿では,PTDEGの自然な解として現れるCITRUS(Continuous Product Graph Neural Networks)を紹介する。 CITRUSは、カルト系グラフ製品からの連続熱核の分離性を利用して、グラフスペクトル分解を効率的に実装する。本研究では,CITRUSの安定性と過度な平滑化特性を,ドメイン固有のグラフ摂動とグラフスペクトルの影響に応答して完全に理論的に解析する。我々は、CITRUSをよく知られた交通・気象時空間予測データセットで評価し、既存の手法よりも優れた性能を示す。

Processing multidomain data defined on multiple graphs holds significant potential in various practical applications in computer science. However, current methods are mostly limited to discrete graph filtering operations. Tensorial partial differential equations on graphs (TPDEGs) provide a principled framework for modeling structured data across multiple interacting graphs, addressing the limitations of the existing discrete methodologies. In this paper, we introduce Continuous Product Graph Neural Networks (CITRUS) that emerge as a natural solution to the TPDEG. CITRUS leverages the separability of continuous heat kernels from Cartesian graph products to efficiently implement graph spectral decomposition. We conduct thorough theoretical analyses of the stability and over-smoothing properties of CITRUS in response to domain-specific graph perturbations and graph spectra effects on the performance. We evaluate CITRUS on well-known traffic and weather spatiotemporal forecasting datasets, demonstrating superior performance over existing approaches.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 医療応用のための多人数計算によるデータイミューテーションのプライバシ保護

Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications ( http://arxiv.org/abs/2405.18878v1 )

ライセンス: Link先を確認

Julia Jentsch, Ali Burak Ünal, Şeyma Selcan Mağara, Mete Akgün,

(参考訳) 欠落したデータの処理は機械学習では不可欠だが、多くのデータセットにはエラーや非レスポンスによるギャップが含まれている。単純だが不十分なリストワイズ削除のような従来の方法とは異なり、この文献はより高度で効果的な方法を提供し、サンプルのサイズと精度を改善している。しかし、これらの方法はデータセット全体にアクセスする必要があり、これはデータが複数のソースに分散される際のプライバシー規制とは矛盾する。特に医療と医療の分野では、そのようなアクセスは患者に関する機密情報を明らかにする。本研究は,セキュアなマルチパーティ計算を用いた機密データに対するプライバシ保護計算手法に対処し,当事者の機密情報を明らかにすることなく,セキュアな計算を可能にする。本研究では,プライバシ保護手法を用いて,平均,中央値,回帰値,kNNの計算手法を考案した。患者データの保護の重要性を考慮した医療・医療分野を特に対象とし,糖尿病データセット上での方法を示す。糖尿病データセットの実験では、プライバシ保護の計算手法の正しさが検証され、最も誤差が大きいのは3ドル(約3,300円)の10^{-3}$の平文法でした。また,本手法のスケーラビリティをさまざまなサンプルに適用し,実際の医療問題への適用性を示した。分析の結果,全手法がサンプル数と線形にスケールできることが判明した。 kNNを除いて、我々のメソッドのランタイムは、大きなデータセットに使用できることを示している。

Handling missing data is crucial in machine learning, but many datasets contain gaps due to errors or non-response. Unlike traditional methods such as listwise deletion, which are simple but inadequate, the literature offers more sophisticated and effective methods, thereby improving sample size and accuracy. However, these methods require accessing the whole dataset, which contradicts the privacy regulations when the data is distributed among multiple sources. Especially in the medical and healthcare domain, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation, enabling secure computations without revealing any party's sensitive information. In this study, we realized the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains considering the significance of protection of the patient data, showcasing our methods on a diabetes dataset. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$, closely matching plaintext methods. We also analyzed the scalability of our methods to varying numbers of samples, showing their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtime of all our methods indicates that they can be utilized for large datasets.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 時空間予測の効率性:因果グラフ処理ニューラルネットワーク

Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks ( http://arxiv.org/abs/2405.18879v1 )

ライセンス: Link先を確認

Aref Einizade, Fragkiskos D. Malliaros, Jhony H. Giraldo,

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ内のノードとして表されるセンサ(または他の測定方法)間の関係帰納バイアスを活用することにより、時空間予測の高度化を実現している。しかし、現在の手法はしばしばリカレントニューラルネットワーク(RNN)に依存しており、ランタイムとメモリ使用の増加につながっている。さらに、これらの手法は典型的には1ホップの近傍で機能し、受容野の減少を悪化させる。因果グラフプロセス(CGP)は、パラメータの削減とメモリ消費の最小化のために、MLP層の代わりにグラフフィルタを使用する代替手段を提供する。本稿では,CGP と GNN を組み合わせた非線形モデルである Causal Graph Process Neural Network (CGProNet) を紹介する。 CGProNetは高階グラフフィルタを採用し、より少ないパラメータでモデルを最適化し、メモリ使用量を削減し、実行効率を向上させる。本稿では,CGProNetの理論的および実験的安定性解析について概説する。合成および実データに関する実験は、CGProNetの優れた効率性を示し、競合予測性能を維持しながら、メモリと時間要求を最小限に抑える。

Graph Neural Networks (GNNs) have advanced spatiotemporal forecasting by leveraging relational inductive biases among sensors (or any other measuring scheme) represented as nodes in a graph. However, current methods often rely on Recurrent Neural Networks (RNNs), leading to increased runtimes and memory use. Moreover, these methods typically operate within 1-hop neighborhoods, exacerbating the reduction of the receptive field. Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of MLP layers to reduce parameters and minimize memory consumption. This paper introduces the Causal Graph Process Neural Network (CGProNet), a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency. We present a comprehensive theoretical and experimental stability analysis, highlighting key aspects of CGProNet. Experiments on synthetic and real data demonstrate CGProNet's superior efficiency, minimizing memory and time requirements while maintaining competitive forecasting performance.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# EventZoom: 強化されたニューロモーフィックビジョンのためのイベントベースのデータ拡張への進歩的なアプローチ

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision ( http://arxiv.org/abs/2405.18880v1 )

ライセンス: Link先を確認

Yiting Dong, Xiang He, Guobin Shen, Dongcheng Zhao, Yang Li, Yi Zeng,

(参考訳) Dynamic Vision Sensors (DVS)がキャプチャしたイベントデータは、従来のビデオキャプチャとは異なる視覚処理にユニークなアプローチを提供し、動的およびリアルタイムシナリオにおけるその効率を示す。高時間分解能や低エネルギー消費といったアドバンテージにもかかわらず、イベントデータの適用は、データセットのサイズと多様性の制限による課題に直面している。これを解決するために、イベントデータ用に特別に設計されたデータ拡張戦略であるEventZoomを開発しました。 EventZoomは、時間と空間をインテリジェントにブレンドして、その信頼性を維持しながらデータの多様性と複雑さを高める、プログレッシブな時間戦略を採用している。本手法は,複雑な動的シーンを扱うアルゴリズムの適応性とロバスト性を向上させることを目的としている。教師付き学習,半教師付き学習,教師なし学習を含む,さまざまな教師付き学習フレームワークを対象に,EventZoomを実験的に検証した。以上の結果から,EventZoomはさまざまな学習環境において,強力なイベントベースデータ拡張ツールとしての有効性と適用性を確認するとともに,他のデータ拡張手法を一貫して上回ることを示す。

Event data captured by Dynamic Vision Sensors (DVS) offers a unique approach to visual processing that differs from traditional video capture, showcasing its efficiency in dynamic and real-time scenarios. Despite advantages such as high temporal resolution and low energy consumption, the application of event data faces challenges due to limited dataset size and diversity. To address this, we developed EventZoom -- a data augmentation strategy specifically designed for event data. EventZoom employs a progressive temporal strategy that intelligently blends time and space to enhance the diversity and complexity of the data while maintaining its authenticity. This method aims to improve the quality of data for model training and enhance the adaptability and robustness of algorithms in handling complex dynamic scenes. We have experimentally validated EventZoom across various supervised learning frameworks, including supervised, semi-supervised, and unsupervised learning. Our results demonstrate that EventZoom consistently outperforms other data augmentation methods, confirming its effectiveness and applicability as a powerful event-based data augmentation tool in diverse learning settings.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# 直接雑音最適化を用いた拡散モデルの調整自由配向

Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization ( http://arxiv.org/abs/2405.18881v1 )

ライセンス: Link先を確認

Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang, Mingyi Hong, Fan Wang, Tsung-Hui Chang,

(参考訳) 本研究では,人間の嗜好改善など,下流タスクの具体的目的を表す連続報酬関数を用いた拡散モデルのアライメント問題に焦点をあてる。アライメント問題の主目的は、生成したサンプルが目標報酬関数を最大化するように拡散モデルで学習した分布を調整することである。拡散モデルのサンプリング過程において, 直接雑音最適化 (DNO) と呼ばれる新しいアライメント手法を提案する。設計上、DNOはチューニング不要で、生成中にオンライン形式でアライメントが発生するため、プロンプトに依存しない。我々は、DNOの理論的性質を厳密に研究し、また、微分不可能な報酬関数を扱う変種を提案する。さらに,DNO の素直な実装は,最適化されたサンプルが高い報酬を得られるが,事前学習された分布をサポートできない,不当な分配報酬ハック問題に悩まされることも見いだした。この問題を解決するために,古典的高次元統計理論を活用し,確率正規化によるDNO損失の増大を提案する。我々は、人間のフィードバックデータに基づいて訓練された複数の人気報酬関数について広範な実験を行い、提案したDNOアプローチが、最先端の報酬スコアと高画質を、すべて生成に適切な時間予算で達成できることを実証した。

In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.

翻訳日:2024-05-30 18:09:15 公開日:2024-05-29

# DecomCAM: 分解と統合によるサリエンシマップを越えた拡張

DecomCAM: Advancing Beyond Saliency Maps through Decomposition and Integration ( http://arxiv.org/abs/2405.18882v1 )

ライセンス: Link先を確認

Yuguang Yang, Runtang Guo, Sheng Wu, Yimi Wang, Linlin Yang, Bo Fan, Jilong Zhong, Juan Zhang, Baochang Zhang,

(参考訳) 複雑な深層ネットワーク、特に事前訓練された視覚言語モデル(VLM)の解釈は、非常に難しい課題である。現在のクラスアクティベーションマップ(CAM)手法では、モデルの意思決定基準を明らかにする領域が強調されているが、明確なサリエンシマップと詳細な解釈容易性は欠如している。このギャップを埋めるために,チャネル活性化マップから共有パターンを抽出する新しい分解・積分法であるDecomCAMを提案する。特異値分解を利用して、DecomCAMはクラス識別活性化マップを直交サブサービスマップ(OSSM)に分解し、ターゲット概念への貢献に基づいて統合する。 6つのベンチマークでの大規模な実験により、DecomCAMは正確な位置決めに優れるだけでなく、解釈可能性と計算効率のバランスを最適化できることがわかった。さらなる分析により、OSSMは識別可能なオブジェクトコンポーネントと相関し、モデルの推論のきめ細かい理解を促進することが判明した。これにより、DecomCAMは高度なディープラーニングモデルの微妙な解釈のための潜在的なツールとして位置づけられる。コードはhttps://github.com/CapricornGuang/DecomCAMで利用可能である。

Interpreting complex deep networks, notably pre-trained vision-language models (VLMs), is a formidable challenge. Current Class Activation Map (CAM) methods highlight regions revealing the model's decision-making basis but lack clear saliency maps and detailed interpretability. To bridge this gap, we propose DecomCAM, a novel decomposition-and-integration method that distills shared patterns from channel activation maps. Utilizing singular value decomposition, DecomCAM decomposes class-discriminative activation maps into orthogonal sub-saliency maps (OSSMs), which are then integrated together based on their contribution to the target concept. Extensive experiments on six benchmarks reveal that DecomCAM not only excels in locating accuracy but also achieves an optimizing balance between interpretability and computational efficiency. Further analysis unveils that OSSMs correlate with discernible object components, facilitating a granular understanding of the model's reasoning. This positions DecomCAM as a potential tool for fine-grained interpretation of advanced deep learning models. The code is avaible at https://github.com/CapricornGuang/DecomCAM.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# イオントラップ誘起ac磁場のキャラクタリゼーション

Characterization of ion-trap-induced ac-magnetic fields ( http://arxiv.org/abs/2405.18883v1 )

ライセンス: Link先を確認

Manoj K. Joshi, Milena Guevara-Bertsch, Florian Kranzl, Rainer Blatt, Christian F. Roos,

(参考訳) 高周波イオントラップの非平衡電流によって生じる発振磁場は、精密分光実験に有害な遷移周波数シフトとサイドバンド遷移を誘導する。本稿では、2光子分光法に基づいて、直流バイアス磁場を変更したり、トラップRF電力を変更したりすることなく、rf誘起磁場の強度と方向を決定する手法について述べる。この技術は、狭い直線幅の遷移を特徴とする閉じ込められたイオン実験にも容易に適用できる。

The oscillating magnetic field produced by unbalanced currents in radio-frequency ion traps induces transition frequency shifts and sideband transitions that can be harmful to precision spectroscopy experiments. Here, we describe a methodology, based on two-photon spectroscopy, for determining both the strength and direction of rf-induced magnetic fields without modifying any DC magnetic bias field or changing any trap RF power. The technique is readily applicable to any trapped-ion experiment featuring narrow linewidth transitions.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 汎用ブラックボックス離散最適化のためのミックス・オブ・エクササイズ学習

Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization ( http://arxiv.org/abs/2405.18884v1 )

ライセンス: Link先を確認

Shengcai Liu, Zhiyuan Wang, Yew-Soon Ong, Xin Yao, Ke Tang,

(参考訳) 実世界のアプリケーションは様々な個別の最適化問題を含む。これらの問題のそれぞれに特別なオプティマイザを設計することは困難であり、通常、かなりのドメイン知識と人間の努力を必要とする。したがって、幅広い問題に対するオフザシェルフツールとしての汎用オプティマイザの開発は、長年の研究目標となっている。この記事では、完全なデータ駆動学習最適化(L2O)アプローチによってトレーニングされた、新しい汎用神経オプティマイザであるMEGOを紹介する。 MEGOは、トレーニング問題の解決から経験を学習したエキスパートの混合物で構成されており、バイナリ決定変数による最適化問題の基盤モデルと見なすことができる。解決すべき問題を提示すると、MEGOは関連する専門家モデルを選択して高品質なソリューションを生成する。 MEGOは、スタンドアロンのサンプル効率最適化器として、あるいは既存の検索メソッドと組み合わせて、初期ソリューションジェネレータとして使用することができる。 MEGOの一般性は、3つの古典的な問題クラスと3つの問題クラスを含む6つの問題クラスで検証されている。 MEGOは古典的な問題クラスのみに訓練され、6つの問題クラスすべてで非常によく機能し、ソリューションの品質と効率の両面で広く使われている汎用オプティマイザをはるかに上回っている。 MEGOは特定の最先端のオプティマイザを超越する場合もある。さらに、MEGOは問題間の類似度尺度を提供し、問題分類の新しい視点をもたらす。 L2Oを通した汎用オプティマイザの追求において、MEGOは最初の重要な一歩である。

Real-world applications involve various discrete optimization problems. Designing a specialized optimizer for each of these problems is challenging, typically requiring significant domain knowledge and human efforts. Hence, developing general-purpose optimizers as an off-the-shelf tool for a wide range of problems has been a long-standing research target. This article introduces MEGO, a novel general-purpose neural optimizer trained through a fully data-driven learning-to-optimize (L2O) approach. MEGO consists of a mixture-of-experts trained on experiences from solving training problems and can be viewed as a foundation model for optimization problems with binary decision variables. When presented with a problem to solve, MEGO actively selects relevant expert models to generate high-quality solutions. MEGO can be used as a standalone sample-efficient optimizer or in conjunction with existing search methods as an initial solution generator. The generality of MEGO is validated across six problem classes, including three classic problem classes and three problem classes arising from real-world applications in compilers, network analysis, and 3D reconstruction. Trained solely on classic problem classes, MEGO performs very well on all six problem classes, significantly surpassing widely used general-purpose optimizers in both solution quality and efficiency. In some cases, MEGO even surpasses specialized state-of-the-art optimizers. Additionally, MEGO provides a similarity measure between problems, yielding a new perspective for problem classification. In the pursuit of general-purpose optimizers through L2O, MEGO represents an initial yet significant step forward.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 低ランク・低精度分解を用いた大規模言語モデル圧縮

Compressing Large Language Models using Low Rank and Low Precision Decomposition ( http://arxiv.org/abs/2405.18886v1 )

ライセンス: Link先を確認

Rajarshi Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci,

(参考訳) 現在、LLM(Large Language Models)の禁止サイズは、メモリ制約のあるエッジデバイスへのデプロイを困難にしている。このアルゴリズムは、重量行列 $\mathbf{W}$ の固有の低ランク構造を利用して、低ランクで低精度な分解を $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$ として近似することで、新しい学習後 LLM 圧縮アルゴリズムである $\rm CALDERA$ を導入する。ここで、$\mathbf{L}$ と $\mathbf{R}$ は低いランク因子であり、$\mathbf{Q}$, $\mathbf{L}$ と $\mathbf{R}$ のエントリは量子化される。モデルを各層に$\mathbf{Q} + \mathbf{L}\mathbf{R}$分解を代入して圧縮し、圧縮されたモデルのゼロショット性能を評価する。さらに、$\mathbf{L}$ と $\mathbf{R}$ は容易にローランク適応が可能となり、ゼロショット性能が向上する。 $\rm CALDERA$ はこの分解を最適化問題 $\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$ として定式化し、$\mathbf{X}$ はキャリブレーションデータである。ランク制約回帰フレームワークを用いて,$\rm CALDERA$の近似誤差に関する理論的上限を設定し,目標ランクと量子化ビット予算の影響を分析して,圧縮率とモデル性能のトレードオフについて検討した。その結果、LlaMa-$2$$7$B/$70$BとLlaMa-$3$8$Bの圧縮は、パラメータあたり2.5ドル以下という既存のトレーニング後のLCM圧縮技術より優れていることが示された。実装は以下の通りである。 \href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}。

The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy them on memory-constrained edge devices. This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, consequently enhancing the zero-shot performance. $\rm CALDERA$ obtains this decomposition by formulating it as an optimization problem $\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$, where $\mathbf{X}$ is the calibration data, and $\mathbf{Q}, \mathbf{L}, \mathbf{R}$ are constrained to be representable using low-precision formats. Theoretical upper bounds on the approximation error of $\rm CALDERA$ are established using a rank-constrained regression framework, and the tradeoff between compression ratio and model performance is studied by analyzing the impact of target rank and quantization bit budget. Results illustrate that compressing LlaMa-$2$ $7$B/$70$B and LlaMa-$3$ $8$B models obtained using $\rm CALDERA$ outperforms existing post-training LLM compression techniques in the regime of less than $2.5$ bits per parameter. The implementation is available at: \href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 深層強化学習に基づく住宅におけるプライバシ・コストのトレードオフを伴う積極的負荷形成戦略

Proactive Load-Shaping Strategies with Privacy-Cost Trade-offs in Residential Households based on Deep Reinforcement Learning ( http://arxiv.org/abs/2405.18888v1 )

ライセンス: Link先を確認

Ruichang Zhang, Youcheng Sun, Mustafa A. Mustafa,

(参考訳) スマートメーターはエネルギー管理と効率を高める上で重要な役割を担いますが、エネルギー消費パターンを通じて詳細なユーザー行動を明らかにすることで、プライバシー上の懸念を生じさせます。近年の研究では、コストのバランスを保ちながらユーザのプライバシを保護するため、バッテリ支援型ロードシェイピング技術の開発に焦点が当てられている。本稿では,攻撃者を誤解させるような人工的な負荷シグネチャを積極的に生成することにより,ユーザのプライバシを保護するために設計された,深層強化学習に基づくロードシェイピングアルゴリズム(PLS-DQN)を提案する。我々は,提案アルゴリズムを非侵入負荷監視(NILM)の敵に対して評価する。その結果,本手法は実際のエネルギー利用パターンを効果的に隠蔽するだけでなく,コスト効率を維持しつつユーザのプライバシを向上させる上で,最先端の手法よりも優れていることがわかった。

Smart meters play a crucial role in enhancing energy management and efficiency, but they raise significant privacy concerns by potentially revealing detailed user behaviors through energy consumption patterns. Recent scholarly efforts have focused on developing battery-aided load-shaping techniques to protect user privacy while balancing costs. This paper proposes a novel deep reinforcement learning-based load-shaping algorithm (PLS-DQN) designed to protect user privacy by proactively creating artificial load signatures that mislead potential attackers. We evaluate our proposed algorithm against a non-intrusive load monitoring (NILM) adversary. The results demonstrate that our approach not only effectively conceals real energy usage patterns but also outperforms state-of-the-art methods in enhancing user privacy while maintaining cost efficiency.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 生成AIの暖房と使用状況の認識について

On Perception of Prevalence of Cheating and Usage of Generative AI ( http://arxiv.org/abs/2405.18889v1 )

ライセンス: Link先を確認

Roman Denkin,

(参考訳) 本報告では,学生の不正行為の頻度に対する教員の認識と,生成AIが学術的整合性に与える影響について検討する。データは、ウプサラ大学情報工学科の教員の匿名調査を通じて収集され、2004年から2023年までの不正調査に関する機関統計とともに分析された。その結果、教師は一般的に不正行為を一般的とはみなさないが、ジェネレーティブAIのアクセシビリティのため、その発生率が増加しているという強い信念が持たれている。ほとんどの教師は、不正行為とAIの使用法を同一視していないが、学生の間で広く使われていることを認めている。さらに、教師の認識は不正傾向の客観的なデータと一致し、学業不条理の進化する風景に対する認識を浮き彫りにしている。

This report investigates the perceptions of teaching staff on the prevalence of student cheating and the impact of Generative AI on academic integrity. Data was collected via an anonymous survey of teachers at the Department of Information Technology at Uppsala University and analyzed alongside institutional statistics on cheating investigations from 2004 to 2023. The results indicate that while teachers generally do not view cheating as highly prevalent, there is a strong belief that its incidence is increasing, potentially due to the accessibility of Generative AI. Most teachers do not equate AI usage with cheating but acknowledge its widespread use among students. Furthermore, teachers' perceptions align with objective data on cheating trends, highlighting their awareness of the evolving landscape of academic dishonesty.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 局所的推定グローバル摂動はフェデレートシャープネス認識最小化のための局所摂動よりも優れている

Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization ( http://arxiv.org/abs/2405.18890v1 )

ライセンス: Link先を確認

Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang,

(参考訳) フェデレートラーニング(FL)では、クライアント間のマルチステップ更新とデータの異質性により、よりシャープなミニマでロスランドスケープが発生し、結果のグローバルモデルの性能が低下する。一般的なフェデレーションアプローチは、シャープネス認識最小化(SAM)を局所的なトレーニングに組み込んでこの問題を軽減する。しかし、局所的な損失景観は、異種環境におけるグローバルな損失景観の平坦さを正確に反映するものではなく、結果として、局所的なシャープネスを最小化し、クライアントデータに対する摂動を計算することは、FLにおけるSAMの有効性と集中的なトレーニングとを一致させることができない。この課題を解決するために,FedLESAMを提案する。FedLESAMは,クライアント側におけるグローバルな摂動方向を,前回のアクティブラウンドと現在のラウンドのグローバルモデルの違いとして局所的に推定するアルゴリズムである。改善された品質に加えて、FedLESAMはイテレーション毎に一度だけバックプロパゲーションを実行するため、フェデレートされたSAMベースのアプローチを高速化する。理論的には、一貫した摂動を保証することによって、元のFedSAMよりもわずかに密接な境界を証明している。実験的に,フェデレートされた4つのベンチマークデータセットを3つの分割戦略で包括的に実験し,FedLESAMの優れた性能と効率を実証した。

In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degenerating the performance of the resulted global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on client side as the difference between global models received in the previous active and current rounds. Besides the improved quality, FedLESAM also speed up federated SAM-based approaches since it only performs once backpropagation in each iteration. Theoretically, we prove a slightly tighter bound than its original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# Few-Shot Testing: 1つのベイズ試験ベクトルを用いた膜深部ニューラルネットワークの不確かさの推定

Few-Shot Testing: Estimating Uncertainty of Memristive Deep Neural Networks Using One Bayesian Test Vector ( http://arxiv.org/abs/2405.18894v1 )

ライセンス: Link先を確認

Soyed Tuhin Ahmed, Mehdi Tahoori,

(参考訳) ニューラルネットワーク(NN)のようなディープラーニングアルゴリズムのパフォーマンスは、最近大幅に向上し、多くのドメインで最先端のパフォーマンスを達成することができる。しかし、メモリと計算リソースの制約のため、エッジデバイスにNNを実装するのは難しい作業である。したがって、メモリメモリ(CIM)などのハードウェアアクセラレータは、行列ベクトル乗算(行列ベクトル乗算)など、最も一般的な演算を高速化するために開発された。しかし、固有のデバイス特性、温度などの外部環境要因、未熟な製造プロセスにより、メムリスタは製造や実行中に発生する欠陥や変動など、様々な非理想に悩まされる。その結果、モデルによる予測に完全な信頼が欠如している。本稿では,デバイス非イデアルの存在下でハードウェアアクセラレーターが行うNN予測の信頼性を向上させるために,memristor-based CIMハードウェア上で実装されたNNのモデル不確かさを推定できるベイズテストベクトル生成フレームワークを提案する。従来の点推定試験ベクトル生成法と比較して,本手法は異なるモデル次元でより一般化可能であり,ハードウェアに1つのベイズベクトルだけを格納する必要がある。提案手法は, 異なるモデル次元, タスク, 故障率, 変動ノイズに基づいて評価し, メモリオーバーヘッドを0.024$ MB に抑えながら, 常に100\% のカバレッジを達成可能であることを示す。

The performance of deep learning algorithms such as neural networks (NNs) has increased tremendously recently, and they can achieve state-of-the-art performance in many domains. However, due to memory and computation resource constraints, implementing NNs on edge devices is a challenging task. Therefore, hardware accelerators such as computation-in-memory (CIM) with memristive devices have been developed to accelerate the most common operations, i.e., matrix-vector multiplication. However, due to inherent device properties, external environmental factors such as temperature, and an immature fabrication process, memristors suffer from various non-idealities, including defects and variations occurring during manufacturing and runtime. Consequently, there is a lack of complete confidence in the predictions made by the model. To improve confidence in NN predictions made by hardware accelerators in the presence of device non-idealities, in this paper, we propose a Bayesian test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware. Compared to the conventional point estimate test vector generation method, our method is more generalizable across different model dimensions and requires storing only one test Bayesian vector in the hardware. Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise to show that it can consistently achieve $100\%$ coverage with only $0.024$ MB of memory overhead.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 経験的方程式開発のための単位認識型遺伝的プログラミング

Unit-Aware Genetic Programming for the Development of Empirical Equations ( http://arxiv.org/abs/2405.18896v1 )

ライセンス: Link先を確認

Julia Reuter, Viktor Martinek, Roland Herzog, Sanaz Mostaghim,

(参考訳) 経験方程式を開発する際には、ドメインの専門家はこれらを正確で物理的法則に従うように要求する。しばしば、未知の単位を持つ定数は方程式とともに発見される。従来の単位認識型遺伝的プログラミング(GP)アプローチは、未決定単位の未知定数を含む場合には使用できない。本稿では,未知の単位を「ジョーカー」として伝播させ,単位違反の大きさを返却する次元解析手法を提案する。本稿では,GPアルゴリズムの次元解析を統合するために,エボリューティブカリング,修復機構,多目的アプローチという3つの手法を提案する。基礎的真理を持つデータセットの実験は、エボリューティブ・カリングの同等の性能を示し、次元解析なしでベースラインへの多目的アプローチを示す。根拠のないデータセットの大規模な分析により、単位認識アルゴリズムは、単位依存の解を生成する一方で、精度の低い犠牲しか生じないことが明らかとなった。全体として、単元型経験方程式を開発するための有望な新しいアプローチを提示した。

When developing empirical equations, domain experts require these to be accurate and adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as ''jokers'' and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis in the GP algorithm. Experiments on datasets with ground truth demonstrate comparable performance of evolutive culling and the multi-objective approach to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms make only low sacrifices in accuracy, while producing unit-adherent solutions. Overall, we presented a promising novel approach for developing unit-adherent empirical equations.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# MLAE:パラメータ効率の良いファインチューニングのためのマスク付きLoRAエキスパート

MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2405.18897v1 )

ライセンス: Link先を確認

Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao,

(参考訳) 大規模事前学習モデルの完全微調整に要する広範囲なパラメータ更新による課題に対して,ローランド適応(LoRA)を例として,パラメータ効率のよい微調整(PEFT)法が出現している。 LoRAは微調整のプロセスを単純化するが、低ランク行列における一定の冗長性に苦しむ可能性があり、単にランクを上げることによる有効性は限られている。これらの問題に対処するため、自然な考え方は、低ランク行列の学習プロセスの独立性と多様性を高めることである。そこで我々は,マスクの概念をPEFTに適用する革新的な手法であるMasked LoRA Experts (MLAE)を提案する。本手法は,低ランク行列を独立したランク1サブマトリクス,すなわち 'experts' に変換するセル分解戦略を取り入れ,独立性を向上する。さらに、これらの専門家を訓練中に選択的に活性化する二項マスク行列を導入し、専門家レベルのドロップアウト戦略に基づいて、より多様で異方性のある学習を促進する。本研究により, この選択的活性化は, 性能の向上だけでなく, MLAE間のパラメータ類似性を顕著に低下させ, パラメータ数を増加させると共に, モデルの品質を著しく向上させると共に, より多様な知識獲得を促進することが明らかとなった。注目すべきことに、MLAEはVTAB-1kベンチマークで平均78.8%、FGVCベンチマークで90.9%の精度で新しいSOTA性能を実現し、優れた性能を示している。私たちのコードはhttps://github.com/jie040109/MLAEで利用可能です。

In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. Therefore, we propose Masked LoRA Experts (MLAE), an innovative approach that applies the concept of masking to PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or ``experts'', thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training to promote more diverse and anisotropic learning, based on expert-level dropout strategies. Our investigations reveal that this selective activation not only enhances performance but also fosters a more diverse acquisition of knowledge with a marked decrease in parameter similarity among MLAE, significantly boosting the quality of the model while barely increasing the parameter count. Remarkably, MLAE achieves new SOTA performance with an average accuracy score of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, demonstrating superior performance. Our code is available at https://github.com/jie040109/MLAE.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# スペクトルの忠実度と空間的エンハンスメント:衛星画像のためのパンシャープ技術の評価とカスケード

Spectral Fidelity and Spatial Enhancement: An Assessment and Cascading of Pan-Sharpening Techniques for Satellite Imagery ( http://arxiv.org/abs/2405.18900v1 )

ライセンス: Link先を確認

Abdul Aziz A. B, A. B Abdul Rahim,

(参考訳) 本研究は, スペクトルの忠実度と空間的エンハンスメントの重要な側面に着目し, 衛星画像のパンシャーピング技術に関する包括的評価を行う。リモートセンシングにおける情報的アルゴリズム選択の必要性から,既存手法の詳細な比較分析により,新たなカスケード・構造化評価フレームワークが提案されている。研究結果は,空間分解能の増強とともに,スペクトル精度が約88%の複雑なトレードオフを浮き彫りにした。この研究は、パンシャーピングの実践的意義に光を当て、リモートセンシングアプリケーションにおけるスペクトル面と空間面の両方の重要性を強調している。様々なパンシャーピングアルゴリズムは、その性能の全体像を提供するために体系的に採用され、その能力と限界のより深い理解に寄与した。

This research presents a comprehensive assessment of pan-sharpening techniques for satellite imagery, focusing on the critical aspects of spectral fidelity and spatial enhancement. Motivated by the need for informed algorithm selection in remote sensing, A novel cascaded and structured evaluation framework has been proposed with a detailed comparative analysis of existing methodologies. The research findings underscore the intricate trade-offs between spectral accuracy of about 88\% with spatial resolution enhancement. The research sheds light on the practical implications of pan-sharpening and emphasizes the significance of both spectral and spatial aspects in remote sensing applications. Various pan-sharpening algorithms were systematically employed to provide a holistic view of their performance, contributing to a deeper understanding of their capabilities and limitations.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# デファリングシステム評価のための因果的枠組み

A Causal Framework for Evaluating Deferring Systems ( http://arxiv.org/abs/2405.18902v1 )

ライセンス: Link先を確認

Filippo Palomba, Andrea Pugnana, José Manuel Alvarez, Salvatore Ruggieri,

(参考訳) 定義システムは教師付き機械学習(ML)モデルを拡張し、予測を人間の専門家に延期する。しかし、遅延戦略がシステム精度に与える影響を評価することは、まだ見過ごされている領域である。本稿では、因果レンズによる遅延システムの評価により、このギャップを埋める。我々は、因果推論の潜在的な結果フレームワークと延期システムとを関連づける。これにより,遅延戦略の因果的影響が予測精度に与える影響を明らかにすることができる。 2つのシナリオを区別する。最初の例では、遅延したインスタンスに対して、人間とMLモデルの予測の両方にアクセスできます。そのような場合、遅延したインスタンスとそれらの集合に対する個々の因果効果を特定できる。第2のシナリオでは、遅延したインスタンスに対して、人間の予測しか利用できない。この場合、回帰不連続設計を利用して局所因果効果を推定できる。文献からの7つの遅延システムの合成および実データに対するアプローチを実証的に評価した。

Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we can access both the human and the ML model predictions for the deferred instances. In such a case, we can identify the individual causal effects for deferred instances and aggregates of them. In the second scenario, only human predictions are available for the deferred instances. In this case, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 厳密なスコーリング規則による言語生成

Language Generation with Strictly Proper Scoring Rules ( http://arxiv.org/abs/2405.18906v1 )

ライセンス: Link先を確認

Chenze Shao, Fandong Meng, Yijin Liu, Jie Zhou,

(参考訳) 最大推定(MLE)に基づく言語生成は,テキスト生成の基本的なアプローチとなっている。最大確率推定は通常、統計決定理論における対数スコアとしても知られる対数類似損失を最小化することによって行われる。対数スコアは、モデルが真の確率を報告したときにのみ期待されるスコアが最大化されるような、誠実な予測を促進するという意味では、厳密には適切である。多くの厳密な適切な採点規則が存在するが、対数スコアは観察されたサンプルの確率にのみ依存する唯一の局所採点ルールであり、自然テキストの指数的に大きなサンプル空間を扱うことができる。本研究では,非局所的なスコアリングルールを用いた言語モデリングを可能にするため,スコアリングルールを言語生成に適用するための簡単な戦略を提案する。この戦略を活用することで、対数スコアの代替として、2つの古典的な厳密なスコアルールであるブライアスコアと球面スコアを用いて言語生成モデルを訓練する。実験結果から, 他のハイパーパラメータを調整せずに損失関数を置換するだけで, モデル生成能力が大幅に向上することが示唆された。さらに、LLaMA-7BやLLaMA-13Bのような大きな言語モデル(LLM)にも拡張可能である。ソースコード: \url{https://github.com/shaochenze/ScoringRulesLM}。

Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the logarithmic score in statistical decision theory. The logarithmic score is strictly proper in the sense that it encourages honest forecasts, where the expected score is maximized only when the model reports true probabilities. Although many strictly proper scoring rules exist, the logarithmic score is the only local scoring rule among them that depends exclusively on the probability of the observed sample, making it capable of handling the exponentially large sample space of natural text. In this work, we propose a straightforward strategy for adapting scoring rules to language generation, allowing for language modeling with any non-local scoring rules. Leveraging this strategy, we train language generation models using two classic strictly proper scoring rules, the Brier score and the Spherical score, as alternatives to the logarithmic score. Experimental results indicate that simply substituting the loss function, without adjusting other hyperparameters, can yield substantial improvements in model's generation capabilities. Moreover, these improvements can scale up to large language models (LLMs) such as LLaMA-7B and LLaMA-13B. Source code: \url{https://github.com/shaochenze/ScoringRulesLM}.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# クロスドメインデータによるシンガポールのパーキング可用性予測 - 新しいデータセットとデータ駆動アプローチ

Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach ( http://arxiv.org/abs/2405.18910v1 )

ライセンス: Link先を確認

Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang,

(参考訳) 車両の増加は、効率的な駐車スペース管理の必要性を強調している。リアルタイムパーキングアベイラビリティ(PA)の予測は、交通渋滞とそれに伴う社会問題を軽減するのに役立つ。本研究では,シンガポールにおける今後のPAを,様々な領域の複雑な要因で総合的に予測することを目的としている。 1)新しいデータセット: シンガポールの1,687の駐車場から1年分のPAデータを含む,さまざまな空間的・時間的要因に富んだデータセットについて紹介する。 2)データ駆動アプローチ: 数千の駐車場にまたがる将来のPAを集合的かつ効率的に予測する,新しいディープラーニングフレームワークであるDeepPAを提案する。 (3) 大規模実験と展開: DeepPAは既存の先進モデルと比較して最大3時間予測の予測誤差を9.2%削減することを示した。さらに,DeepPAを実践的なWebベースプラットフォームに実装し,ドライバーを支援するリアルタイムPA予測と,シンガポールの知事に都市計画を通知する。データセットとソースコードはhttps://github.com/yoshall/SINPAで公開しています。

The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domains. The contributions in this paper are listed as follows: (1) A New Dataset: We introduce the \texttt{SINPA} dataset, containing a year's worth of PA data from 1,687 parking lots in Singapore, enriched with various spatial and temporal factors. (2) A Data-Driven Approach: We present DeepPA, a novel deep-learning framework, to collectively and efficiently predict future PA across thousands of parking lots. (3) Extensive Experiments and Deployment: DeepPA demonstrates a 9.2% reduction in prediction error for up to 3-hour forecasts compared to existing advanced models. Furthermore, we implement DeepPA in a practical web-based platform to provide real-time PA predictions to aid drivers and inform urban planning for the governors in Singapore. We release the dataset and source code at https://github.com/yoshall/SINPA.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 能動学習とモデル選択の相乗化による対人テスト時間適応の探索

Exploring Human-in-the-Loop Test-Time Adaptation by Synergizing Active Learning and Model Selection ( http://arxiv.org/abs/2405.18911v1 )

ライセンス: Link先を確認

Yushu Li, Yongyi Su, Xulei Yang, Kui Jia, Xun Xu,

(参考訳) 既存のテスト時間適応(TTA)アプローチは、ラベルのないテストデータストリームでモデルに適応することが多い。近年の研究では,Human-In-the-Loop Test-Time Adaptation (HILTTA)と呼ばれる,限定的な人間のアノテーションを導入することで,仮説を緩和した。既存のHILTTAの焦点は、ラベル付けする最も情報に富むサンプル、すなわちアクティブラーニングの選択にある。本研究では,超パラメータに敏感なTTAの落とし穴を動機として,能動的学習とモデル選択の相乗化によるHILTTAへのアプローチを提案する。具体的には、まず人間のアノテーション(能動的学習)のサンプルを選択し、次にラベル付きデータを用いて最適なハイパーパラメータ(モデル選択)を選択する。サンプル選択戦略は、アクティブラーニングとモデル選択の目的とのバランスを考慮し、サンプルを選択するために調整される。提案手法は,最先端のHILTTA手法やストリームベースのアクティブラーニング手法よりも優れている,既製のTTA手法と互換性があることを4つのTTAデータセットで実証した。重要な点として,本提案手法は,市販のTTA方式で常に最悪の過度パラメータの選択を防止できる。ソースコードは公開時に公開される。

Existing test-time adaptation (TTA) approaches often adapt models with the unlabeled testing data stream. A recent attempt relaxed the assumption by introducing limited human annotation, referred to as Human-In-the-Loop Test-Time Adaptation (HILTTA) in this study. The focus of existing HILTTA lies on selecting the most informative samples to label, a.k.a. active learning. In this work, we are motivated by a pitfall of TTA, i.e. sensitive to hyper-parameters, and propose to approach HILTTA by synergizing active learning and model selection. Specifically, we first select samples for human annotation (active learning) and then use the labeled data to select optimal hyper-parameters (model selection). A sample selection strategy is tailored for choosing samples by considering the balance between active learning and model selection purposes. We demonstrate on 4 TTA datasets that the proposed HILTTA approach is compatible with off-the-shelf TTA methods which outperform the state-of-the-art HILTTA methods and stream-based active learning methods. Importantly, our proposed method can always prevent choosing the worst hyper-parameters on all off-the-shelf TTA methods. The source code will be released upon publication.

翻訳日:2024-05-30 17:59:30 公開日:2024-05-29

# 土壌水分予測のためのスマート農業における時系列基盤モデルの導入

Leveraging Time-Series Foundation Models in Smart Agriculture for Soil Moisture Forecasting ( http://arxiv.org/abs/2405.18913v1 )

ライセンス: Link先を確認

Boje Deforce, Bart Baesens, Estefanía Serral Asensio,

(参考訳) 近年、自然言語処理とコンピュータビジョンの基礎モデルが急増し、様々な領域におけるイノベーションが加速した。この進歩に触発されて、スマート農業における時系列予測の基礎モデルの可能性を探る。具体的には、土壌水ポテンシャル(\psi_\mathrm{soil}$)を予測するため、土壌水の状態(SOTA)時系列基盤モデルである$\texttt{TimeGPT}$という新しい応用法を提案する。伝統的に、このタスクは幅広い入力変数に依存する。我々は$\psi_\mathrm{soil}$'s ability to forecast $\psi_\mathrm{soil}$ in:$i$) a zero-shot setting,$ii$) 歴史的$\psi_\mathrm{soil}$ Measurement,$iii$) 細調整された設定を探索し、モデルに外因性変数を追加する。我々は$\texttt{TimeGPT}$のパフォーマンスを、$\psi_\mathrm{soil}$を予測するための確立されたSOTAベースラインモデルと比較する。我々の結果は、$\texttt{TimeGPT}$が、歴史的な$\psi_\mathrm{soil}$データのみを使用して、競合予測精度を達成し、農業アプリケーションに対するその顕著な可能性を強調していることを示している。本研究は、伝統的に大規模なデータ収集やドメインの専門知識に依存したタスクの予測を可能にすることにより、農業における持続的開発のための時系列モデル構築の道を開くものである。

The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a novel application of $\texttt{TimeGPT}$, a state-of-the-art (SOTA) time-series foundation model, to predict soil water potential ($\psi_\mathrm{soil}$), a key indicator of field water status that is typically used for irrigation advice. Traditionally, this task relies on a wide array of input variables. We explore $\psi_\mathrm{soil}$'s ability to forecast $\psi_\mathrm{soil}$ in: ($i$) a zero-shot setting, ($ii$) a fine-tuned setting relying solely on historic $\psi_\mathrm{soil}$ measurements, and ($iii$) a fine-tuned setting where we also add exogenous variables to the model. We compare $\texttt{TimeGPT}$'s performance to established SOTA baseline models for forecasting $\psi_\mathrm{soil}$. Our results demonstrate that $\texttt{TimeGPT}$ achieves competitive forecasting accuracy using only historical $\psi_\mathrm{soil}$ data, highlighting its remarkable potential for agricultural applications. This research paves the way for foundation time-series models for sustainable development in agriculture by enabling forecasting tasks that were traditionally reliant on extensive data collection and domain expertise.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 信頼に満ちた結束に向けて:大規模言語モデルがレゾネーターを橋渡ししている

Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners ( http://arxiv.org/abs/2405.18915v1 )

ライセンス: Link先を確認

Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao,

(参考訳) 大言語モデル(LLM)は、深刻な不信の連鎖(CoT)問題に悩まされる。従来の研究では測定と説明が試みられていたが、CoTの内部での詳細な分析は欠如しており、すべての推論コンポーネント間の相互作用を共同で考慮していない。本稿では,まずCoTステップの粒度におけるCoT忠実度問題について検討し,集中的推論と分散推論という2つの推論パラダイムを同定し,信頼度との関係を明らかにする。その後,文脈,CoT,回答の因果関係に関する共同分析を行った。その結果、LLMが回答を予測すると、文脈からCoTに欠けている正しい情報を思い出すことができ、不誠実な問題を引き起こすことが証明された。最後に,この問題を緩和するための推論ブリッジ手法を提案する。そこでは属性法を用いてCoT生成のヒントとして情報をリコールし,その意味的一貫性と属性スコアに基づいてノイズの多いCoTをフィルタリングする。大規模な実験は、我々のアプローチが不誠実なCoT問題を効果的に軽減することを示した。

Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues. Previous work attempts to measure and explain it but lacks in-depth analysis within CoTs and does not consider the interactions among all reasoning components jointly. In this paper, we first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms: centralized reasoning and distributed reasoning, and find their relationship with faithfulness. Subsequently, we conduct a joint analysis of the causal relevance among the context, CoT, and answer during reasoning. The result proves that, when the LLM predicts answers, it can recall correct information missing in the CoT from the context, leading to unfaithfulness issues. Finally, we propose the inferential bridging method to mitigate this issue, in which we use the attribution method to recall information as hints for CoT generation and filter out noisy CoTs based on their semantic consistency and attribution scores. Extensive experiments demonstrate that our approach effectively alleviates the unfaithful CoT problem.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 対実データ増大を考慮した因果行動の影響

Causal Action Influence Aware Counterfactual Data Augmentation ( http://arxiv.org/abs/2405.18917v1 )

ライセンス: Link先を確認

Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, Georg Martius,

(参考訳) オフラインデータは、ロボットに複雑な振る舞いを教えるための価値と実践的なリソースである。理想的には、学習エージェントは、利用可能なデモンストレーションの不足によって制約されるべきではない。しかし、現実のシナリオの複雑さは通常、ニューラルネットワークポリシーが素早い相関関係を拾い上げ、非因果関係を学ぶのを防ぐために大量のデータを必要とします。 CAIACは、オンライン環境のインタラクションにアクセスすることなく、固定データセットから実現可能な合成遷移を生成できるデータ拡張手法である。因果的影響を定量化するための原則的手法を利用することで、データセット内の独立軌跡間の状態空間の$\it{action}$-unffected部分を交換することで、反ファクト的推論を行うことができる。これは、分散シフトに対するオフライン学習アルゴリズムのロバスト性を大幅に向上させることを実証的に示す。

Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\it{action}$-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 小惑星帯における低推力移動の計算 : 天体力学操作と機械学習アプローチの比較

Computing low-thrust transfers in the asteroid belt, a comparison between astrodynamical manipulations and a machine learning approach ( http://arxiv.org/abs/2405.18918v1 )

ライセンス: Link先を確認

Giacomo Acciarini, Laurent Beauregard, Dario Izzo,

(参考訳) 低推力軌道は、小惑星帯のミッションにおける科学的出力とコスト効率の最適化に重要な役割を果たしている。高スラスト移動とは異なり、低スラスト軌道は複雑な最適制御問題を解く必要がある。この複雑さは、軌道力学の複雑さによって訪れた小惑星の数とともに指数関数的に増加する。文献では、解析的および機械学習技術を含む、完全な最適化なしに低推力転送を近似する手法が提案されている。本研究では,新しい解析近似を提案し,その精度と性能を機械学習手法と比較する。解析的近似は軌道理論を利用して軌道のコストを推定するが、機械学習はよりブラックボックスなアプローチを採用し、ニューラルネットワークを利用して様々な属性に基づいて最適な移動を予測する。私たちは、時間と燃料の最適制御問題を解決することで、約300万回の転送のデータセットを構築します。このデータベース上の2つの手法の比較は、特に長い転送において、機械学習の優位性を明らかにしている。多変量移動のような課題にもかかわらず、どちらのアプローチも、多くの小惑星を含む軌道のデータベース上で、最終的な質量誤差において数パーセント以内の精度を維持している。この研究は、小惑星帯におけるミッション機会の効率的な探索に寄与し、様々な近似戦略の強さと限界についての洞察を提供する。

Low-thrust trajectories play a crucial role in optimizing scientific output and cost efficiency in asteroid belt missions. Unlike high-thrust transfers, low-thrust trajectories require solving complex optimal control problems. This complexity grows exponentially with the number of asteroids visited due to orbital mechanics intricacies. In the literature, methods for approximating low-thrust transfers without full optimization have been proposed, including analytical and machine learning techniques. In this work, we propose new analytical approximations and compare their accuracy and performance to machine learning methods. While analytical approximations leverage orbit theory to estimate trajectory costs, machine learning employs a more black-box approach, utilizing neural networks to predict optimal transfers based on various attributes. We build a dataset of about 3 million transfers, found by solving the time and fuel optimal control problems, for different time of flights, which we also release open-source. Comparison between the two methods on this database reveals the superiority of machine learning, especially for longer transfers. Despite challenges such as multi revolution transfers, both approaches maintain accuracy within a few percent in the final mass errors, on a database of trajectories involving numerous asteroids. This work contributes to the efficient exploration of mission opportunities in the asteroid belt, providing insights into the strengths and limitations of different approximation strategies.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# GLANCE: 現実的説明責任のためのNutshellにおけるグローバルアクション

GLANCE: Global Actions in a Nutshell for Counterfactual Explainability ( http://arxiv.org/abs/2405.18921v1 )

ライセンス: Link先を確認

Ioannis Emiris, Dimitris Fotakis, Giorgos Giannopoulos, Dimitrios Gunopulos, Loukas Kavouras, Kleopatra Markou, Eleni Psaroudaki, Dimitrios Rontogiannis, Dimitris Sacharidis, Nikolaos Theologitis, Dimitrios Tomaras, Konstantinos Tsopelas,

(参考訳) 複雑な機械学習モデルを理解し、デバッグし、監査するための重要なツールとして、カウンターファクトの説明が登場した。グローバルな対実的説明可能性を提供するため、局所的な説明の要約を構築し、簡潔性、対実的効果、対実的コスト又はインスタンスに課される負担のトレードオフを提供する。本研究では,グローバルな反事実を識別する問題を簡潔に定式化し,パレート支配からインスピレーションを得て,解を比較するための原則的基準を確立する。本稿では,クラスタリングと決定木をキーコンポーネントとして用いて,入力空間全体あるいは特定のパーティションのグローバルな対策を見つけるという課題に対処する,革新的なアルゴリズムを導入する。さらに,問題の様々な事例を考慮し,提案したアルゴリズムを最先端の手法と比較し,総合的な実験評価を行う。その結果,意味的かつ解釈可能なグローバルな対実的説明を生成するアルゴリズムの一貫性が強調された。

Counterfactual explanations have emerged as an important tool to understand, debug, and audit complex machine learning models. To offer global counterfactual explainability, state-of-the-art methods construct summaries of local explanations, offering a trade-off among conciseness, counterfactual effectiveness, and counterfactual cost or burden imposed on instances. In this work, we provide a concise formulation of the problem of identifying global counterfactuals and establish principled criteria for comparing solutions, drawing inspiration from Pareto dominance. We introduce innovative algorithms designed to address the challenge of finding global counterfactuals for either the entire input space or specific partitions, employing clustering and decision trees as key components. Additionally, we conduct a comprehensive experimental evaluation, considering various instances of the problem and comparing our proposed algorithms with state-of-the-art methods. The results highlight the consistent capability of our algorithms to generate meaningful and interpretable global counterfactual explanations.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 対象の復号化の観点からの翻訳問題の理解と対応

Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective ( http://arxiv.org/abs/2405.18922v1 )

ライセンス: Link先を確認

Chenze Shao, Fandong Meng, Jiali Zeng, Jie Zhou,

(参考訳) ニューラルネットワーク翻訳(NMT)はここ数年で顕著な進歩を遂げてきた。しかし、現在最先端のNTTシステムでは、過翻訳と過翻訳の2つの課題が残っている。本研究では,NMTにおけるアンダートランスレーションの根本原因を詳細に分析し,デコード目的の観点から解説する。ビーム探索の目的を最適化するために、モデルは自信の薄い単語を無視する傾向があり、翻訳の過度な現象につながる。それに対応して、モデルが文末予測(EOS)に自信を持つことは、翻訳が下がったときに減少し、翻訳されていない候補者にとって軽度なペナルティとなる。この分析に基づいて、我々は、EOSをアンダートランスレーションの検知器として予測する自信を生かし、アンダートランスレーションのリスクの高い候補をペナルティ化する自信に基づくペナルティを強化することを提案する。合成データと実世界のデータの両方で実験した結果,本手法は下書き変換された出力を正確に検出し,修正することが可能であり,他の正しい翻訳にはほとんど影響しないことがわかった。

Neural Machine Translation (NMT) has made remarkable progress over the past years. However, under-translation and over-translation remain two challenging problems in state-of-the-art NMT systems. In this work, we conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective. To optimize the beam search objective, the model tends to overlook words it is less confident about, leading to the under-translation phenomenon. Correspondingly, the model's confidence in predicting the End Of Sentence (EOS) diminishes when under-translation occurs, serving as a mild penalty for under-translated candidates. Building upon this analysis, we propose employing the confidence of predicting EOS as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation. Experiments on both synthetic and real-world data show that our method can accurately detect and rectify under-translated outputs, with minor impact on other correct translations.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# MDIW-13: スクリプト識別のための新しい多言語・多スクリプトデータベースとベンチマーク

MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification ( http://arxiv.org/abs/2405.18924v1 )

ライセンス: Link先を確認

Miguel A. Ferrer, Abhijit Das, Moises Diaz, Aythami Morales, Cristina Carmona-Duarte, Umapada Pal,

(参考訳) スクリプト識別は、多言語および多言語環境において、手書きと文書解析を含むアプリケーションにおいて重要な役割を果たす。また、人間の認知と深く結びついている。本稿では,アラビア文字,ベンガル文字,グジャラート文字,グルムクヒ文字,デバナガリ文字,日本語,カナダ文字,マラヤラム文字,オリヤ文字,ローマ文字,タミル文字,テルグ文字,タイ文字など,多種多様なスクリプトから収集された文書をベンチマークする。データセットは、地元の新聞や手書きの手紙からスキャンされた1,135件の文書と、異なるネイティブライターのメモで構成されている。さらに、これらの文書は、それぞれデータセットにおいて合計13,979行と86,655行からなる行と単語に区分される。簡単なベンチマークは、手作りとディープラーニングの手法で提案されている。ベンチマークには、文書、行、単語レベルの結果と、印刷および手書きの文書が含まれている。また、文書/行/ワードレベルに依存しず、印刷/手書き文字に依存しないスクリプト識別の結果も与えられる。新しい多言語データベースは、新しいスクリプト識別子を作成することが期待されており、手書きおよび印刷されたサンプルの識別や、3つのベンチマークの報告された結果に基づいて、将来のスクリプト識別研究の基盤となる様々な課題が提示される。

Script identification plays a vital role in applications that involve handwriting and document analysis within a multi-script and multi-lingual environment. Moreover, it exhibits a profound connection with human cognition. This paper provides a new database for benchmarking script identification algorithms, which contains both printed and handwritten documents collected from a wide variety of scripts, such as Arabic, Bengali (Bangla), Gujarati, Gurmukhi, Devanagari, Japanese, Kannada, Malayalam, Oriya, Roman, Tamil, Telugu, and Thai. The dataset consists of 1,135 documents scanned from local newspaper and handwritten letters as well as notes from different native writers. Further, these documents are segmented into lines and words, comprising a total of 13,979 and 86,655 lines and words, respectively, in the dataset. Easy-to-go benchmarks are proposed with handcrafted and deep learning methods. The benchmark includes results at the document, line, and word levels with printed and handwritten documents. Results of script identification independent of the document/line/word level and independent of the printed/handwritten letters are also given. The new multi-lingual database is expected to create new script identifiers, present various challenges, including identifying handwritten and printed samples and serve as a foundation for future research in script identification based on the reported results of the three benchmarks.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# Federated Continual Learningがオンラインに: モダリティ非依存のクラスインクリメンタルラーニングのための不確実性を活用する

Federated Continual Learning Goes Online: Leveraging Uncertainty for Modality-Agnostic Class-Incremental Learning ( http://arxiv.org/abs/2405.18925v1 )

ライセンス: Link先を確認

Giuseppe Serra, Florian Buettner,

(参考訳) より現実的でダイナミックな問題をモデル化する能力を考えると、FCL(Federated Continual Learning)は近年ますます研究されている。この設定でよく見られる問題は、いわゆる破滅的な忘れことであり、学習モデルは、以前に学んだ知識を忘れながら、より最近のタスクに集中する傾向にある。 FCLの現在のアプローチの大半は、そのような問題を解決するための生成的ソリューションを提案している。しかし、この設定では複数のトレーニングエポックを必要とするため、データセットをローカルに保存し、時間とともに変更するオフライン設定を暗示する。さらに,提案手法は視覚タスクのみに特化している。これらの制限を克服するため、我々は、新しいデータが1回しか処理できないミニバッチのストリームに到着するオンラインシナリオに対処する、新しいモダリティに依存しないアプローチを提案する。破滅的な記憶を解決するために,不確実性を考慮したメモリベースアプローチを提案する。特に,Bregman Information (BI) に基づく推定器を用いて,サンプルレベルでのモデルの分散を計算することを提案する。予測の不確実性の尺度を用いて, 特定の特徴を持つサンプルを抽出し, モデルの再学習により, 現実的な環境下での忘れ込み効果を低減させる手法の可能性を示す。

Given the ability to model more realistic and dynamic problems, Federated Continual Learning (FCL) has been increasingly investigated recently. A well-known problem encountered in this setting is the so-called catastrophic forgetting, for which the learning model is inclined to focus on more recent tasks while forgetting the previously learned knowledge. The majority of the current approaches in FCL propose generative-based solutions to solve said problem. However, this setting requires multiple training epochs over the data, implying an offline setting where datasets are stored locally and remain unchanged over time. Furthermore, the proposed solutions are tailored for vision tasks solely. To overcome these limitations, we propose a new modality-agnostic approach to deal with the online scenario where new data arrive in streams of mini-batches that can only be processed once. To solve catastrophic forgetting, we propose an uncertainty-aware memory-based approach. In particular, we suggest using an estimator based on the Bregman Information (BI) to compute the model's variance at the sample level. Through measures of predictive uncertainty, we retrieve samples with specific characteristics, and - by retraining the model on such samples - we demonstrate the potential of this approach to reduce the forgetting effect in realistic settings.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 光学制御イオンによるキラル量子加熱と冷却

Chiral quantum heating and cooling with an optically controlled ion ( http://arxiv.org/abs/2405.18927v1 )

ライセンス: Link先を確認

Jin-Tao Bu, Jian-Qi Zhang, Ge-Yi Ding, Jia-Chong Li, Jia-Wei Zhang, Bin Wang, Wen-Qiang Ding, Wen-Fei Yuan, Liang Chen, Qi Zhong, Ali Keçebaş, Şahin K. Özdemir, Fei Zhou, Hui Jing, Mang Feng,

(参考訳) 量子熱エンジンと冷凍機は開量子系であり、非エルミート形式を用いて力学をよく理解することができる。非休眠性の顕著な特徴は例外点(EP)の存在であり、これは閉じた量子系にはない。古典的なシステムでは、ループがEPを含むか否かに関わらず、EP近傍の動的囲みがキラルモード変換をもたらすことが示されている。ここでは、量子ジャンプと関連する雑音の影響を含むリウヴィリアEP(LEP)の近傍で動的囲みが行われる場合、これは量子システムにも有効であることを示す。 LEP近傍に閉じたループを動的に囲むことで、最初のキラル量子加熱・冷却であるポールトラップ超低温イオンを用いて実証した。量子熱機関(量子冷凍機)のキラリティーと熱放出(吸収)とサイクル方向が関連しているのを目撃する。実験の結果, 断熱・断熱だけでなく, ランダウ・ツェナー・シュタッケルベルク過程が動的循環において重要な役割を担っていることが明らかとなった。我々の観測は、非エルミート系におけるキラルおよびトポロジカルな特徴のさらなる理解に寄与し、キラル性と量子熱力学の関係を探求する方法を舗装する。

Quantum heat engines and refrigerators are open quantum systems, whose dynamics can be well understood using a non-Hermitian formalism. A prominent feature of non-Hermiticity is the existence of exceptional points (EPs), which has no counterpart in closed quantum systems. It has been shown in classical systems that dynamical encirclement in the vicinity of an EP, whether the loop includes the EP or not, could lead to chiral mode conversion. Here, we show that this is valid also for quantum systems when dynamical encircling is performed in the vicinity of their Liouvillian EPs (LEPs) which include the effects of quantum jumps and associated noise - an important quantum feature not present in previous works. We demonstrate, using a Paul-trapped ultracold ion, the first chiral quantum heating and refrigeration by dynamically encircling a closed loop in the vicinity of an LEP. We witness the cycling direction to be associated with the chirality and heat release (absorption) of the quantum heat engine (quantum refrigerator). Our experiments have revealed that not only the adiabaticity-breakdown but also the Landau-Zener-St\"uckelberg process play an essential role during dynamic encircling, resulting in chiral thermodynamic cycles. Our observations contributes to further understanding of chiral and topological features in non-Hermitian systems and pave a way to exploring the relation between chirality and quantum thermodynamics.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# 汚染された未ラベルデータに対する深い正のラベル付き異常検出

Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data ( http://arxiv.org/abs/2405.18929v1 )

ライセンス: Link先を確認

Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Yuuki Yamanaka,

(参考訳) ラベルなしデータに加えて,少量の異常データを用いて,異常検出の性能向上を目的とした半教師付き異常検出が注目されている。既存の半教師付きアプローチでは、ラベルなしデータは概ね正常であると仮定する。彼らは異常検知器を訓練し、ラベルなしデータの異常スコアを最小化し、異常データの異常スコアを最大化する。しかし実際には、ラベルのないデータはしばしば異常によって汚染される。これにより、異常スコアの最大化効果が弱まり、検出性能が向上するのを防ぐことができる。この問題を解決するために,正の未ラベル学習に基づく正の未ラベルオートエンコーダと,オートエンコーダのような異常検出器を提案する。提案手法では, ラベルなしおよび異常データを用いて, 正規データに対する異常スコアを近似することができる。したがって、ラベル付き正規データなしでは、異常検出器を訓練することで、正常データに対する異常スコアを最小化し、異常データに対してそれらを最大化することができる。また,本手法はDeepSVDDなどの各種異常検出装置にも適用可能である。各種データセットを用いた実験により,本手法は既存手法よりも優れた検出性能が得られることが示された。

Semi-supervised anomaly detection, which aims to improve the performance of the anomaly detector by using a small amount of anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that unlabeled data are mostly normal. They train the anomaly detector to minimize the anomaly scores for the unlabeled data, and to maximize those for the anomaly data. However, in practice, the unlabeled data are often contaminated with anomalies. This weakens the effect of maximizing the anomaly scores for anomalies, and prevents us from improving the detection performance. To solve this problem, we propose the positive-unlabeled autoencoder, which is based on positive-unlabeled learning and the anomaly detector such as the autoencoder. With our approach, we can approximate the anomaly scores for normal data using the unlabeled and anomaly data. Therefore, without the labeled normal data, we can train the anomaly detector to minimize the anomaly scores for normal data, and to maximize those for the anomaly data. In addition, our approach is applicable to various anomaly detectors such as the DeepSVDD. Experiments on various datasets show that our approach achieves better detection performance than existing approaches.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# EntProp: 精度とロバスト性を改善するための高エントロピープロパゲーション

EntProp: High Entropy Propagation for Improving Accuracy and Robustness ( http://arxiv.org/abs/2405.18931v1 )

ライセンス: Link先を確認

Shohei Enomoto,

(参考訳) ディープニューラルネットワーク(DNN)は、優れたパフォーマンスにもかかわらず、トレーニング中のものとは異なる、配布外ドメインの一般化に苦慮している。実践的な応用においては、DNNが標準精度と配布外領域に対する堅牢性の両方を持つことが重要である。これら2つの改善を両立させる手法の1つは、補助バッチ正規化層(ABN)を介して混合分布を伴う非絡み合い学習である。このテクニックはクリーンで変換されたサンプルを異なるドメインとして扱い、DNNは混合ドメインからより良い機能を学ぶことができる。しかし、エントロピーに基づいてサンプルの領域を区別すると、いくつかの変換されたサンプルはクリーンなサンプルと同じドメインから引き出され、これらのサンプルは完全に異なるドメインではないことが分かる。クリーンな試料とは全く異なる領域から引き出された試料を生成するために, 清潔な高エントロピー試料を変換して, 分布域からはるかに離れた分布域外試料をより多く生成する, という仮説を立てた。この仮説に基づいて,ABNを用いたネットワークに高エントロピーのサンプルを供給する高エントロピー伝搬~(EntProp)を提案する。エントロピーを増大させ,サンプルを分布内領域からさらに遠ざける2つの手法,すなわちデータ拡張と自由敵対的トレーニングを導入する。これらの技術は追加の訓練費用を必要としない。実験の結果,EntPropはベースライン法よりもトレーニングコストの低い標準精度とロバスト性を実現していることがわかった。特にEntPropは、小さなデータセットのトレーニングに非常に効果的です。

Deep neural networks (DNNs) struggle to generalize to out-of-distribution domains that are different from those in training despite their impressive performance. In practical applications, it is important for DNNs to have both high standard accuracy and robustness against out-of-distribution domains. One technique that achieves both of these improvements is disentangled learning with mixture distribution via auxiliary batch normalization layers (ABNs). This technique treats clean and transformed samples as different domains, allowing a DNN to learn better features from mixed domains. However, if we distinguish the domains of the samples based on entropy, we find that some transformed samples are drawn from the same domain as clean samples, and these samples are not completely different domains. To generate samples drawn from a completely different domain than clean samples, we hypothesize that transforming clean high-entropy samples to further increase the entropy generates out-of-distribution samples that are much further away from the in-distribution domain. On the basis of the hypothesis, we propose high entropy propagation~(EntProp), which feeds high-entropy samples to the network that uses ABNs. We introduce two techniques, data augmentation and free adversarial training, that increase entropy and bring the sample further away from the in-distribution domain. These techniques do not require additional training costs. Our experimental results show that EntProp achieves higher standard accuracy and robustness with a lower training cost than the baseline methods. In particular, EntProp is highly effective at training on small datasets.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# ランダムフォレスト実装による異常検出のためのマロズライクな基準

A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation ( http://arxiv.org/abs/2405.18932v1 )

ライセンス: Link先を確認

Gaoxiang Zhao, Lu Wang, Xiaoqiang Wang,

(参考訳) 異常信号検出の有効性は、特定のモデルに依存する固有の不確実性によって著しく損なわれる可能性がある。モデル平均法の枠組みの下では,集中損失関数が極めて不均衡なデータの分類に寄与するような,複数のモデルの集約における重み付けを選択するための新しい基準を提案する。この戦略は従来の投票法を置き換えることでランダムフォレストアルゴリズムにさらに統合される。提案手法をネットワーク侵入を含む様々な領域にわたるベンチマークデータセット上で評価した。提案手法は, 典型的な損失関数を用いた平均化モデルを上回るだけでなく, 精度とロバスト性の観点から, 一般的な異常検出アルゴリズムを超越することを示す。

The effectiveness of anomaly signal detection can be significantly undermined by the inherent uncertainty of relying on one specified model. Under the framework of model average methods, this paper proposes a novel criterion to select the weights on aggregation of multiple models, wherein the focal loss function accounts for the classification of extremely imbalanced data. This strategy is further integrated into Random Forest algorithm by replacing the conventional voting method. We have evaluated the proposed method on benchmark datasets across various domains, including network intrusion. The findings indicate that our proposed method not only surpasses the model averaging with typical loss functions but also outstrips common anomaly detection algorithms in terms of accuracy and robustness.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# LSPI:サイズ近傍経路同定に基づく不均一グラフニューラルネットワーク分類アルゴリズム

LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification ( http://arxiv.org/abs/2405.18933v1 )

ライセンス: Link先を確認

Yufei Zhaoa, Shiduo Wanga, Hua Duana,

(参考訳) 既存のヘテロジニアスグラフニューラルネットワークアルゴリズム(HGNN)は、ヘテロジニアスグラフ(ヘテロジニアス情報ネットワーク(HIN)とも呼ばれる)に含まれる豊富なセマンティック情報をキャプチャするために、メタパスに依存しているが、これらのHGNNのほとんどは、機能集約の異なる方法に焦点を当て、メタパス自体の特性を無視している。本稿では3つの一般的なデータ集合におけるメタパスについて検討し、異なるメタパスによって接続される隣人の数に大きな違いがあることを見出した。同時に、大きなボルパスに含まれる騒音情報は、モデル性能に悪影響を及ぼす。そこで本稿では,大小近傍経路Iden tification (LSPI) に基づく異種グラフニューラルネットワークの分類と集約アルゴリズムを提案する。 LSPIは、まず、パス判別器を通じて、メタパスを大小隣の経路に分割し、大きな隣の経路におけるノイズ干渉問題を低減するために、トポロジと特徴の両方からより類似度の高い隣のノードを選択し、小さな隣の経路を通り、異なるグラフ畳み込み成分を介して大きな隣の経路をフィルタリングする。集約を行い、異なるサブグラフの下で特徴情報を取得し、LSPIはサブグラフレベルの注意を使って異なるサブグラフの下で特徴情報を融合して最終ノード埋め込みを生成する。最後に, 大規模実験により提案手法の優越性を検証し, エクスペイメントによる大規模隣接経路に留置すべきノード数について提案する。完全な再現可能なコードAdnデータは、https://github.com/liuhua811/LSPIAで公開された。

Existing heterogeneous graph neural network algorithms (HGNNs) mostly rely on meta-paths to capture the rich semantic information contained in heterogeneous graphs (also known as heterogeneous information networks (HINs)), but most of these HGNNs focus on different ways of feature aggre gation and ignore the properties of the meta-paths themselves. This paper studies meta-paths in three commonly used data sets and finds that there are huge differences in the number of neighbors connected by different meta paths. At the same time, the noise information contained in large neigh bor paths will have an adverse impact on model performance. Therefore, this paper proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Iden tification(LSPI). LSPI firstly divides the meta-paths into large and small neighbor paths through the path discriminator , and in order to reduce the noise interference problem in large neighbor paths, LSPI selects neighbor nodes with higher similarity from both topology and feature perspectives, and passes small neighbor paths and filtered large neighbor paths through different graph convolution components. Aggregation is performed to obtain feature information under different subgraphs, and then LSPI uses subgraph level attention to fuse the feature information under different subgraphs to generate the final node embedding. Finally this paper verifies the superiority of the method through extensive experiments and also gives suggestions on the number of nodes to be retained in large neighbor paths through exper iments. The complete reproducible code adn data has been published at: https://github.com/liuhua811/LSPIA.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# Kestrel: 部分認識3次元視覚言語理解のためのポイントグラウンドマルチモーダルLLM

Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding ( http://arxiv.org/abs/2405.18937v1 )

ライセンス: Link先を確認

Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny,

(参考訳) 3次元MLLMは大きな進歩を遂げているが、それらは物体やシーンの理解に限られており、部分レベルの3次元空間構造を理解するのに苦労している。本稿では,3次元MLLMをパート認識で活用し,パートレベルでの3次元オブジェクトの解釈とセグメンテーションのグラウンディングを向上する,新しいアプローチを示すケストレルを紹介する。その重要性にも拘わらず、現在の状況には、この機能を補完し評価するタスクやデータセットが欠けている。そこで本研究では,(1)部分認識ポイントグラウンディング(Part-Aware Point Grounding),(2)部分認識ポイントグラウンドドキャプション(Part-Aware Point Grounded Captioning),(2)部分認識ポイントグラウンドドキャプション(Part-Aware Point Grounded Captioning)という2つの新しいタスクを提案する。これらの課題の学習と評価を支援するために,3DCoMPaT Grounded Instructions Dataset(3DCoMPaT-GRIN)を紹介する。 3DCoMPaT-GRINバニラは、789k個の部分認識点雲分離マスク三重項から構成され、部分認識セグメンテーショングラウンドディングのMLLMの能力を評価するために使用される。 3DCoMPaT-GRIN Grounded Captionは107kのパート対応のクラウドインストラクショングラウンド付きキャプショントレーレットを含み、MLLMのパート対応言語理解とセグメンテーショングラウンド機能の両方を評価する。導入したタスク,データセット,ケストレルは,人間の認知と3次元MLLMのギャップを埋めるための予備的な取り組みである。 3DCoMPaT-GRINの大規模な実験により、ケストレルは既存の3DMLLMには存在しないユーザ指定セグメンテーションマスクを生成できることが示されている。そこでケストレルは、3Dオブジェクトの理解とセグメンテーションの基盤を評価するためのベンチマークを構築した。 Project page at https://feielysia.github.io/Kestrel.github.io/

While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, representing a novel approach that empowers 3D MLLMs with part-aware understanding, enabling better interpretation and segmentation grounding of 3D objects at the part level. Despite its significance, the current landscape lacks tasks and datasets that endow and assess this capability. Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, the model provides a detailed caption that includes part-level descriptions and their corresponding masks. To support learning and evaluating for these tasks, we introduce 3DCoMPaT Grounded Instructions Dataset (3DCoMPaT-GRIN). 3DCoMPaT-GRIN Vanilla, comprising 789k part-aware point cloud-instruction-segmentation mask triplets, is used to evaluate MLLMs' ability of part-aware segmentation grounding. 3DCoMPaT-GRIN Grounded Caption, containing 107k part-aware point cloud-instruction-grounded caption triplets, assesses both MLLMs' part-aware language comprehension and segmentation grounding capabilities. Our introduced tasks, dataset, and Kestrel represent a preliminary effort to bridge the gap between human cognition and 3D MLLMs, i.e., the ability to perceive and engage with the environment at both global and part levels. Extensive experiments on the 3DCoMPaT-GRIN show that Kestrel can generate user-specified segmentation masks, a capability not present in any existing 3D MLLM. Kestrel thus established a benchmark for evaluating the part-aware language comprehension and segmentation grounding of 3D objects. Project page at https://feielysia.github.io/Kestrel.github.io/

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# HLOB - 制限順序書における情報持続性と構造

HLOB -- Information Persistence and Structure in Limit Order Books ( http://arxiv.org/abs/2405.18938v1 )

ライセンス: Link先を確認

Antonio Briola, Silvia Bartolucci, Tomaso Aste,

(参考訳) 本稿では,制約順序書の中間価格変化予測のための新しい大規模ディープラーニングモデルを紹介し,それをHLOBと呼ぶ。この建築 (i)情報フィルタリングネットワーク(Triangulated Maximally Filtered Graph)によって符号化された情報を利用して、ボリュームレベルの深い非自明な依存性構造を明らかにする。 (II) ホモロジカル畳み込みニューラルネットワークの画期的なクラスからインスピレーションを得て, 基礎となるシステムの複雑さに対処する決定論的設計選択を保証する。我々は、NASDAQ取引所で取引された15株を含む3つの実世界の制限順序ブックデータセット上の9つの最先端ディープラーニング代替案に対して、我々のモデルを検証し、HLOBが最先端アーキテクチャを上回るシナリオを体系的に特徴づける。当社のアプローチは,高頻度金融市場におけるミクロ構造モデリングと深層学習に基づく予測とのギャップを狭めるとともに,情報空間の空間分布と,予測地平線の増大に伴う劣化に新たな光を当てるものである。

We introduce a novel large-scale deep learning model for Limit Order Book mid-price changes forecasting, and we name it `HLOB'. This architecture (i) exploits the information encoded by an Information Filtering Network, namely the Triangulated Maximally Filtered Graph, to unveil deeper and non-trivial dependency structures among volume levels; and (ii) guarantees deterministic design choices to handle the complexity of the underlying system by drawing inspiration from the groundbreaking class of Homological Convolutional Neural Networks. We test our model against 9 state-of-the-art deep learning alternatives on 3 real-world Limit Order Book datasets, each including 15 stocks traded on the NASDAQ exchange, and we systematically characterize the scenarios where HLOB outperforms state-of-the-art architectures. Our approach sheds new light on the spatial distribution of information in Limit Order Books and on its degradation over increasing prediction horizons, narrowing the gap between microstructural modeling and deep learning-based forecasting in high-frequency financial markets.

翻訳日:2024-05-30 17:49:44 公開日:2024-05-29

# Stance-Neutral Recommendation における内容非依存的モデレーション

Content-Agnostic Moderation for Stance-Neutral Recommendation ( http://arxiv.org/abs/2405.18941v1 )

ライセンス: Link先を確認

Nan Li, Bo Kang, Tijl De Bie,

(参考訳) パーソナライズされたレコメンデーションシステムは、しばしばユーザーをより極端なコンテンツへと誘導し、意見の偏りを悪化させる。これらの効果を緩和する(コンテンツ認識)モデレーションが提案されているが、そのようなアプローチは言論の自由と情報の自由を危険にさらす。この問題に対処するために、偏光低減のための代替アプローチとして、emph{content-agnostic}モデレーションの実現可能性を提案し、検討する。コンテンツに依存しないモデレーションは、実際のコンテンツが適格化されることに頼らない。我々は、コンテンツに依存しないモデレーションが完全に汎用的な環境で動作することを保証できないことを理論的に確立する。しかし,実証可能な仮定で効果的に実現できることがしばしば示されている。本稿では,コンテンツ機能に頼らずに,コンテンツレコメンデータからのリコメンデーションを変更する2つの新しいコンテンツ非依存モデレーション手法を提案する。制御実験におけるコンテンツに依存しないモデレーションの可能性を評価するため,ユーザセット,レコメンデーションシステム,モデレーションアプローチを用いて,システムのクローズドループ動作を分析するシミュレーション環境を構築した。この環境での総合的な実験を通して、提案手法は、様々なデータシナリオにおいて、スタント中立性を著しく向上させ、高いレコメンデーション品質を維持することを示します。この結果から,直接コンテンツ情報なしでのスタンス中立性の実現は実現可能であるだけでなく,ユーザのエンゲージメントを著しく低下させることなく,よりバランスのとれた情報的レコメンデーションシステムを開発する上でも有効であることが示唆された。

Personalized recommendation systems often drive users towards more extreme content, exacerbating opinion polarization. While (content-aware) moderation has been proposed to mitigate these effects, such approaches risk curtailing the freedom of speech and of information. To address this concern, we propose and explore the feasibility of \emph{content-agnostic} moderation as an alternative approach for reducing polarization. Content-agnostic moderation does not rely on the actual content being moderated, arguably making it less prone to forms of censorship. We establish theoretically that content-agnostic moderation cannot be guaranteed to work in a fully generic setting. However, we show that it can often be effectively achieved in practice with plausible assumptions. We introduce two novel content-agnostic moderation methods that modify the recommendations from the content recommender to disperse user-item co-clusters without relying on content features. To evaluate the potential of content-agnostic moderation in controlled experiments, we built a simulation environment to analyze the closed-loop behavior of a system with a given set of users, recommendation system, and moderation approach. Through comprehensive experiments in this environment, we show that our proposed moderation methods significantly enhance stance neutrality and maintain high recommendation quality across various data scenarios. Our results indicate that achieving stance neutrality without direct content information is not only feasible but can also help in developing more balanced and informative recommendation systems without substantially degrading user engagement.