Fugu-MT: Translations of arXiv Papers

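This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.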

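The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any outcome arising from the use of this site (including all information and translations).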

Papers with a publication date of 20240318.

Title	Authors	Abstract	Publication date / translation date
# A Disease Labeler for Chinese Chest X-Ray Report Generation ( http://arxiv.org/abs/2404.16852v1 ) License: see linked page	Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou	In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metric is commonly used to evaluate the similarity between generated and ground-truth reports, while the clinical accuracy and effectiveness of the generated reports rely on an accurate disease labeler (classifier). To address these issues, this study proposes a disease labeler tailored for the generation of Chinese chest X-ray reports. This labeler leverages a dual BERT architecture to handle diagnostic reports and clinical information separately and constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to enhance text classification performance. Utilizing this disease labeler, a Chinese chest X-ray report dataset comprising 51,262 report samples was established. Finally, experiments and analyses were conducted on a subset of expert-annotated Chinese chest X-ray reports, validating the effectiveness of the proposed disease labeler.	Translated: 2024-07-01 11:39:16 Published: 2024-03-18
# Expectation Entropy as a Password Strength Metric ( http://arxiv.org/abs/2404.16853v1 ) License: see linked page	Khan Reaz, Gerhard Wunder	The classical combinatorics-based password strength formula provides a result in tens of bits, whereas the NIST Entropy Estimation Suite gives a result between 0 and 1 for Min-entropy. In this work, we present a newly developed metric, Expectation entropy, that can be applied to estimate the strength of any random or random-like password. Expectation entropy provides the strength of a password on the same scale as an entropy estimation tool. An 'Expectation entropy' of a certain value, for example 0.4, means that an attacker has to exhaustively search at least 40% of the total number of guesses to find the password.	Translated: 2024-07-01 11:39:16 Published: 2024-03-18
# Financial Table Extraction in Image Documents ( http://arxiv.org/abs/2405.05260v1 ) License: see linked page	William Watson, Bo Liu	Table extraction has long been a pervasive problem in financial services. This is more challenging in the image domain, where content is locked behind a cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR, and sequence modeling provide the necessary heavy lifting to achieve impressive results. This paper presents an end-to-end pipeline for identifying, extracting and transcribing tabular content in image documents, while retaining the original spatial relations with high fidelity.	Translated: 2024-07-01 10:40:42 Published: 2024-03-18
# 3D Holistic OR Anonymization ( http://arxiv.org/abs/2405.05261v1 ) License: see linked page	Tony Danjun Wang	We propose a novel method that leverages 3D information to automatically anonymize multi-view RGB-D video recordings of operating rooms (OR). Our anonymization method preserves the original data distribution by replacing the faces in each image with different faces so that the data remains suitable for further downstream tasks. In contrast to established anonymization methods, our approach localizes faces in 3D space first rather than in 2D space. Each face is then anonymized by reprojecting a different face back into each camera view, ultimately replacing the original faces in the resulting images. Furthermore, we introduce a multi-view RGB-D dataset, captured during a real operation in which experienced surgeons perform laparoscopic surgery on an animal subject (swine), which encapsulates typical characteristics of ORs. Finally, we present experimental results evaluated on that dataset, showing that leveraging 3D data can achieve better face localization in OR images and generate more realistic faces than the current state of the art. There has been, to our knowledge, no prior work that addresses the anonymization of multi-view OR recordings, nor 2D face localization that leverages 3D information.	Translated: 2024-07-01 10:40:42 Published: 2024-03-18
# Communication protocols and QECC from the perspective of TQFT, Part II: QECCs as spacetimes ( http://arxiv.org/abs/2405.12364v1 ) License: see linked page	Chris Fields, James F. Glazebrook, Antonino Marciano	Topological quantum field theories (TQFTs) provide a general, minimal-assumption language for describing quantum-state preparation and measurement. They therefore provide a general language in which to express multi-agent communication protocols, e.g. local operations, classical communication (LOCC) protocols. In the accompanying Part I, we construct LOCC protocols using TQFT, and show that LOCC protocols induce quantum error-correcting codes (QECCs) on the agent-environment boundary. Such QECCs can be regarded as implementing or inducing the emergence of spacetimes on such boundaries. Here we investigate this connection between inter-agent communication and spacetime, exploiting different realizations of TQFT. We delve into TQFTs that support spin-networks as computational systems on their boundaries: these are known as topological quantum neural networks (TQNNs). TQNNs, which have a natural representation as tensor networks, implement QECC. We recognize the HaPPY code as a paradigmatic example. We then show how generic QECCs, as bulk-boundary codes, induce effective spacetimes. The effective spatial and temporal separations that take place in QECC enable LOCC protocols between spatially separated observers. We then consider the implementation of QECCs in BF and Chern-Simons theories, and show that QECC-induced spacetimes provide the classical redundancy required for LOCC. Finally, we consider topological M-theory as an implementation of QECC in higher spacetime dimensions.	Translated: 2024-07-01 08:39:42 Published: 2024-03-18
# Accelerating Matrix Factorization by Dynamic Pruning for Fast Recommendation ( http://arxiv.org/abs/2404.04265v1 ) License: see linked page	Yining Wu, Shengyu Duan, Gaole Sai, Chenhong Cao, Guobing Zou	Matrix factorization (MF) is a widely used collaborative filtering (CF) algorithm for recommendation systems (RSs), due to its high prediction accuracy, great flexibility and high efficiency in big data processing. However, with the dramatically increased number of users/items in current RSs, the computational complexity of training an MF model increases substantially. Many existing works have accelerated MF, by either putting in additional computational resources or utilizing parallel systems, introducing a large cost. In this paper, we propose algorithmic methods to accelerate MF without requiring any additional computational resources. Specifically, we observe fine-grained structured sparsity in the decomposed feature matrices when considering a certain threshold. The fine-grained structured sparsity causes a large number of unnecessary operations during both matrix multiplication and latent factor update, increasing the computational time of the MF training process. Based on this observation, we first propose to rearrange the feature matrices based on joint sparsity, which potentially makes a latent vector with a smaller index more dense than that with a larger index. The feature matrix rearrangement is intended to limit the error caused by the subsequent pruning process. We then propose to prune the insignificant latent factors by an early stopping process during both matrix multiplication and latent factor update. The pruning process is dynamically performed according to the sparsity of the latent factors for different users/items, to accelerate the process. The experiments show that our method can achieve speedups of 1.2-1.65x, with up to 20.08% error increase, compared with the conventional MF training process. We also prove the proposed methods are applicable considering different hyperparameters including optimizer, optimization strategy and initialization method.	Translated: 2024-04-14 13:21:48 Published: 2024-03-18
# HomoGenius: a Foundation Model of Homogenization for Rapid Prediction of Effective Mechanical Properties using Neural Operators ( http://arxiv.org/abs/2404.07943v1 ) License: see linked page	Yizheng Wang, Xiang Li, Ziming Yan, Yuqing Du, Jinshuai Bai, Bokai Liu, Timon Rabczuk, Yinghua Liu	Homogenization is an essential tool for studying multiscale physical phenomena. However, traditional numerical homogenization, heavily reliant on finite element analysis, incurs extensive computational costs, particularly in handling complex geometries, materials, and high-resolution problems. To address these limitations, we propose a numerical homogenization model based on operator learning: HomoGenius. The proposed model can quickly provide homogenization results for arbitrary geometries, materials, and resolutions, increasing the efficiency by a factor of 80 compared to traditional numerical homogenization methods. We validate the effectiveness of our model in predicting the effective elastic modulus on periodic materials (TPMS: Triply Periodic Minimal Surface), including complex geometries, various Poisson's ratios and elastic moduli, and different resolutions for training and testing. The results show that our model possesses high precision, super efficiency, and learning capability.	Translated: 2024-04-14 13:03:36 Published: 2024-03-18
# Usability and Performance Analysis of Embedded Development Environment for On-device Learning ( http://arxiv.org/abs/2404.07948v1 ) License: see linked page	Enzo Scaffi, Antoine Bonneau, Frédéric Le Mouël, Fabien Mieyeville	This research empirically examines embedded development tools viable for on-device TinyML implementation. The research evaluates development tools with various abstraction levels on resource-constrained IoT devices, from basic hardware manipulation to deployment of minimalistic ML training. The analysis encompasses memory usage, energy consumption, and performance metrics during model training and inference, as well as the usability of the different solutions. Arduino Framework offers ease of implementation but with increased energy consumption compared to the native option, while RIOT OS exhibits efficient energy consumption despite higher memory utilization with equivalent ease of use. The absence of certain critical functionalities like DVFS directly integrated into the OS highlights limitations for fine hardware control.	Translated: 2024-04-14 13:03:36 Published: 2024-03-18
# Reinforcement Learning with Generalizable Gaussian Splatting ( http://arxiv.org/abs/2404.07950v1 ) License: see linked page	Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu	An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations have several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.	Translated: 2024-04-14 13:03:36 Published: 2024-03-18
# Decoding Multilingual Topic Dynamics and Trend Identification through ARIMA Time Series Analysis on Social Networks: A Novel Data Translation Framework Enhanced by LDA/HDP Models ( http://arxiv.org/abs/2403.15445v1 ) License: see linked page	Samawel Jaballi, Azer Mahjoubi, Manar Joundy Hazar, Salah Zrigui, Henri Nicolas, Mounir Zrigui	In this study, we present a novel methodology adept at decoding multilingual topic dynamics and identifying communication trends during crises. We focus on dialogues within Tunisian social networks during the Coronavirus Pandemic and other notable themes like sports and politics. We start by aggregating a varied multilingual corpus of comments relevant to these subjects. This dataset undergoes rigorous refinement during data preprocessing. We then introduce our No-English-to-English Machine Translation approach to handle linguistic differences. Empirical tests of this method showed high accuracy and F1 scores, highlighting its suitability for linguistically coherent tasks. Delving deeper, advanced modeling techniques, specifically LDA and HDP models, are employed to extract pertinent topics from the translated content. This leads to applying ARIMA time series analysis to decode evolving topic trends. Applying our method to a multilingual Tunisian dataset, we effectively identified key topics mirroring public sentiment. Such insights prove vital for organizations and governments striving to understand public perspectives during crises. Compared to standard approaches, our model performs better, as confirmed by metrics like Coherence Score, U-mass, and Topic Coherence. Additionally, an in-depth assessment of the identified topics revealed notable thematic shifts in discussions, with our trend identification indicating impressive accuracy, backed by RMSE-based analysis.	Translated: 2024-04-01 02:54:20 Published: 2024-03-18
# Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression ( http://arxiv.org/abs/2403.15447v1 ) License: see linked page	Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li	Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inference. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to significantly reduce trustworthiness. This increased risk cannot be uncovered by looking at benign performance alone, which in turn mandates comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Models and code are available at https://decoding-comp-trust.github.io/.	Translated: 2024-04-01 02:54:20 Published: 2024-03-18
# What is Wrong with End-to-End Learning for Phase Retrieval? ( http://arxiv.org/abs/2403.15448v1 ) License: see linked page	Wenjie Zhang, Yuxiang Wan, Zhong Zhuang, Ju Sun	For nonlinear inverse problems that are prevalent in imaging science, symmetries in the forward model are common. When data-driven deep learning approaches are used to solve such problems, these intrinsic symmetries can cause substantial learning difficulties. In this paper, we explain how such difficulties arise and, more importantly, how to overcome them by preprocessing the training set before any learning, i.e., symmetry breaking. We take far-field phase retrieval (FFPR), which is central to many areas of scientific imaging, as an example and show that symmetry breaking can substantially improve data-driven learning. We also formulate the mathematical principle of symmetry breaking.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech ( http://arxiv.org/abs/2403.15449v1 ) License: see linked page	Ghadi Alyahya, Abeer Aldayel	Examining the factors that counter-speech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies assess the emotion-based factors used in counter-speech, such as emotion-empathy, offensiveness, and level of hostility. To better understand the counter-speech used in conversational interactions, this study distills persuasion modes into reason, emotion, and credibility and then evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) conversation interactions concerning racism, sexism, and religion. The evaluation covers the distinct behaviors of human versus generated counter-speech. We also assess the interplay between the replies' stance and each mode of persuasion in the counter-speech. Notably, we observe nuanced differences in the counter-speech persuasion modes for open and closed interactions, especially on the topic level, with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The generated counter-speech tends to exhibit an emotional persuasion mode, while human counters lean towards using reasoning. Furthermore, our study shows that reason as a persuasion mode tends to obtain more supportive replies than do other persuasion types. The findings highlight the potential of incorporating persuasion modes into studies about countering hate speech, as these modes can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counter-speech.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# Loops On Retrieval Augmented Generation (LoRAG) ( http://arxiv.org/abs/2403.15450v1 ) License: see linked page	Ayush Thakur, Rashmi Vashisth	This paper presents Loops On Retrieval Augmented Generation (LoRAG), a new framework designed to enhance the quality of retrieval-augmented text generation through the incorporation of an iterative loop mechanism. The architecture integrates a generative model, a retrieval mechanism, and a dynamic loop module, allowing for iterative refinement of the generated text through interactions with relevant information retrieved from the input context. Experimental evaluations on benchmark datasets demonstrate that LoRAG surpasses existing state-of-the-art models in terms of BLEU score, ROUGE score, and perplexity, showcasing its effectiveness in achieving both coherence and relevance in generated text. The qualitative assessment further illustrates LoRAG's capability to produce contextually rich and coherent outputs. This research contributes valuable insights into the potential of iterative loops in mitigating challenges in text generation, positioning LoRAG as a promising advancement in the field.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# Towards Enabling FAIR Dataspaces Using Large Language Models ( http://arxiv.org/abs/2403.15451v1 ) License: see linked page	Benedikt T. Arnold, Johannes Theissen-Lipp, Diego Collarana, Christoph Lange, Sandra Geisler, Edward Curry, Stefan Decker	Dataspaces have recently gained adoption across various sectors, including traditionally less digitized domains such as culture. Leveraging Semantic Web technologies helps to make dataspaces FAIR, but their complexity poses a significant challenge to the adoption of dataspaces and increases their cost. The advent of Large Language Models (LLMs) raises the question of how these models can support the adoption of FAIR dataspaces. In this work, we demonstrate the potential of LLMs in dataspaces with a concrete example. We also derive a research agenda for exploring this emerging field.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# What Are Tools Anyway? A Survey from the Language Model Perspective ( http://arxiv.org/abs/2403.15452v1 ) License: see linked page	Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig	Language models (LMs) are powerful, yet mostly suited to text generation tasks. Tools have substantially enhanced their performance for tasks that require complex skills. However, many works adopt the term "tool" in different ways, raising the question: What is a tool anyway? Subsequently, where and how do tools help LMs? In this survey, we provide a unified definition of tools as external programs used by LMs, and perform a systematic review of LM tooling scenarios and approaches. Grounded in this review, we empirically study the efficiency of various tooling methods by measuring their required compute and performance gains on various benchmarks, and highlight some challenges and potential future research in the field.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# Span-Oriented Information Extraction -- A Unifying Perspective on Information Extraction ( http://arxiv.org/abs/2403.15453v1 ) License: see linked page	Yifan Ding, Michael Yankoski, Tim Weninger	Information Extraction refers to a collection of tasks within Natural Language Processing (NLP) that identify sub-sequences within text and their labels. These tasks have been used for many years to extract relevant information and to link free text to structured data. However, the heterogeneity among information extraction tasks impedes progress in this area. We therefore offer a unifying perspective centered on what we define to be spans in text. We then re-orient these seemingly incongruous tasks into this unified perspective and re-present the wide assortment of information extraction tasks as variants of the same basic Span-Oriented Information Extraction task.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# Emotion Detection with Transformers: A Comparative Study ( http://arxiv.org/abs/2403.15454v1 ) License: see linked page	Mahdi Rezapour	In this study, we explore the application of transformer-based models for emotion classification on text data. We train and evaluate several pre-trained transformer variants on the Emotion dataset. The paper also analyzes some factors that influence the performance of the model, such as the fine-tuning of the transformer layer, the trainability of the layer, and the preprocessing of the text data. Our analysis reveals that commonly applied techniques like removing punctuation and stop words can hinder model performance. This might be because transformers' strength lies in understanding contextual relationships within text. Elements like punctuation and stop words can still convey sentiment or emphasis, and removing them might disrupt this context.	Translated: 2024-04-01 02:44:33 Published: 2024-03-18
# テキストストリーム中の微調整文のサンプリング法の改善 Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams ( http://arxiv.org/abs/2403.15455v1 ) ライセンス: Link先を確認	Cristiano Mesquita Garcia, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr, Jean Paul Barddal,	(参考訳) インターネット上でのテキストデータの拡散は、組織や企業がサービスや製品に関する世論を監視できるユニークな機会である。このようなデータの高速な生成を考えると、シーケンシャルに到着し、潜在的に無限のテキストストリームを処理するテキストストリームマイニング設定は、従来のバッチ学習よりも適していることが多い。事前トレーニングされた言語モデルは、ストリーミング環境で高品質なテキストベクトル化機能に一般的に使用されるが、コンセプトドリフト(データ分散が時間とともに変化し、モデルのパフォーマンスに悪影響を及ぼす現象)に適応するための課題に直面している。本研究は,概念ドリフトの問題に対処するため,選択的な微調整言語モデルの設計した7つのテキストサンプリング手法の有効性について検討し,性能劣化を軽減した。これらの手法がSBERTモデルの微調整に与える影響を, 4つの異なる損失関数を用いて正確に評価する。マクロF1スコアと経過時間に着目した評価では、2つのテキストストリームデータセットとインクリメンタルSVM分類器を用いて性能をベンチマークする。以上の結果から,ソフトマックスの損失とバッチ・オール・トリプレットの損失はテキストストリームの分類に特に有効であることが示唆された。特に,提案したWordPieceToken比サンプリング法は,識別された損失関数により性能を著しく向上させ,ベースライン結果を上回った。 The proliferation of textual data on the Internet presents a unique opportunity for institutions and companies to monitor public opinion about their services and products. Given the rapid generation of such data, the text stream mining setting, which handles sequentially arriving, potentially infinite text streams, is often more suitable than traditional batch learning. While pre-trained language models are commonly employed for their high-quality text vectorization capabilities in streaming contexts, they face challenges adapting to concept drift - the phenomenon where the data distribution changes over time, adversely affecting model performance. Addressing the issue of concept drift, this study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models, thereby mitigating performance degradation. We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions. Our evaluation, focused on Macro F1-score and elapsed time, employs two text stream datasets and an incremental SVM classifier to benchmark performance. 
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification, demonstrating that larger sample sizes generally correlate with improved macro F1-scores. Notably, our proposed WordPieceToken ratio sampling method significantly enhances performance with the identified loss functions, surpassing baseline results.	翻訳日:2024-04-01 02:44:33 公開日:2024-03-18
# 気象リアナリシスデータを用いた高層天体における上向き雷の時空間リスク評価 Spatio-seasonal risk assessment of upward lightning at tall objects using meteorological reanalysis data ( http://arxiv.org/abs/2403.18853v1 ) ライセンス: Link先を確認	Isabell Stucke, Deborah Morgenstern, Georg J. Mayr, Thorsten Simon, Achim Zeileis, Gerhard Diendorfer, Wolfgang Schulz, Hannes Pichler,	(参考訳) 本研究は,アルプス東部とその周辺地域における高層天体の雷害について検討し,上向きの雷害のリスクを評価する。 ULの長期電流が大きな損傷を与える可能性があるため、ULは特に風力タービンに脅威を与える。現在のリスク評価手法は、気象条件の影響を見落とし、ULリスクを過小評価する可能性がある。そこで本研究では,ガイスベルクタワー(オーストリア)で測定されたUL値と大規模気象変数(35ドル)との関係を,機械学習手法であるランダムフォレストを用いて解析した。これらのうち、風速10mでの大規模な上昇速度、風速、方向、および雲物理学の変数が最も多くの情報に貢献している。ランダム森林は1 km$^2$の解像度で、調査領域全体でULのリスクを予測する。強風と高地による上向きのたわみが組み合わさった強風は、ULリスクを増大させる。 ULリスクと高リスク領域の日周期は季節的に変化する。冬はアルプス山脈の北と北東に集中し、北から南に広がるため、過渡期と夏の間は北イタリアに影響を及ぼす。このモデルは冬に最も良く、高い天体で観測された雷で観測されたピークと一致したULリスクが最も高い。最高濃度はアルプス山脈の北にあり、ほとんどの風力タービンが位置しており、雷活動全体の増加につながっている。雷密度は高い天体の雷の指標として不十分であるため、総合的な気象情報はULリスク評価に不可欠である。 This study investigates lightning at tall objects and evaluates the risk of upward lightning (UL) over the eastern Alps and its surrounding areas. While uncommon, UL poses a threat, especially to wind turbines, as the long-duration current of UL can cause significant damage. Current risk assessment methods overlook the impact of meteorological conditions, potentially underestimating UL risks. Therefore, this study employs random forests, a machine learning technique, to analyze the relationship between UL measured at Gaisberg Tower (Austria) and $35$ larger-scale meteorological variables. Of these, the larger-scale upward velocity, wind speed and direction at 10 meters and cloud physics variables contribute most information. The random forests predict the risk of UL across the study area at a 1 km$^2$ resolution. Strong near-surface winds combined with upward deflection by elevated terrain increase UL risk. The diurnal cycle of the UL risk as well as high-risk areas shift seasonally. 
They are concentrated north/northeast of the Alps in winter due to prevailing northerly winds, and expanding southward, impacting northern Italy in the transitional and summer months. The model performs best in winter, with the highest predicted UL risk coinciding with observed peaks in measured lightning at tall objects. The highest concentration is north of the Alps, where most wind turbines are located, leading to an increase in overall lightning activity. Comprehensive meteorological information is essential for UL risk assessment, as lightning densities are a poor indicator of lightning at tall objects.	翻訳日:2024-04-01 02:25:04 公開日:2024-03-18
# リンク予測による有向基準引用推薦とランキング Directed Criteria Citation Recommendation and Ranking Through Link Prediction ( http://arxiv.org/abs/2403.18855v1 ) ライセンス: Link先を確認	William Watson, Lawrence Yong,	(参考訳) 新しい文書に話題的あるいは文脈的に関連がある可能性のある文書を既存の文献から自動的に提示するための代理として、リンク予測を検討する。本モデルでは,引用ネットワーク内のノードとして表現される各文書の意味を符号化するために,トランスフォーマーベースのグラフ埋め込みを用いる。我々のモデルが生成するセマンティック表現は、推薦タスクやランキングタスクにおいて、他のコンテンツベースの手法よりも優れうることを示す。これは、あらゆる矛盾の可能性を最小限に抑えるために、文書が互いに適切に引用することが重要である領域における引用グラフを探索するための全体論的アプローチを提供する。 We explore link prediction as a proxy for automatically surfacing documents from existing literature that might be topically or contextually relevant to a new document. Our model uses transformer-based graph embeddings to encode the meaning of each document, presented as a node within a citation network. We show that the semantic representations that our model generates can outperform other content-based methods in recommendation and ranking tasks. This provides a holistic approach to exploring citation graphs in domains where it is critical that these documents properly cite each other, so as to minimize the possibility of any inconsistencies.	翻訳日:2024-04-01 02:25:04 公開日:2024-03-18
# Shift Aggregate Extract Networks Shift Aggregate Extract Networks ( http://arxiv.org/abs/1703.05537v2 ) ライセンス: Link先を確認	Francesco Orsini, Daniele Baracchi, Paolo Frasconi,	(参考訳) 大規模グラフの効率的な表現を学習するために,階層分解に基づくアーキテクチャを導入する。我々のフレームワークは、カーネルメソッドで使用される古典的なR分解を拡張し、ネストした部分関係を可能にする。入力グラフのテンプレートを直接アンロールする再帰的ニューラルネットワークとは異なり、ニューラルネットワークテンプレートを分解階層上にアンロールすることで、一般的にソーシャルネットワークグラフを特徴付ける高次変動に対処することができる。深い階層的な分解は、対称性を利用して空間と時間の複雑さを減らす手法である領域圧縮にも適用可能である。我々は、我々のアプローチが、大規模なソーシャルネットワークデータセット上で最先端のグラフ分類手法より優れていると同時に、小さな化学生物学的なベンチマークデータセットに対して競争力があることを実証的に示す。 We introduce an architecture based on deep hierarchical decompositions to learn effective representations of large graphs. Our framework extends classic R-decompositions used in kernel methods, enabling nested part-of-part relations. Unlike recursive neural networks, which unroll a template on input graphs directly, we unroll a neural network template over the decomposition hierarchy, allowing us to deal with the high degree variability that typically characterize social network graphs. Deep hierarchical decompositions are also amenable to domain compression, a technique that reduces both space and time complexity by exploiting symmetries. We show empirically that our approach is able to outperform current state-of-the-art graph classification methods on large social network datasets, while at the same time being competitive on small chemobiological benchmark datasets.	翻訳日:2024-03-26 00:17:07 公開日:2024-03-18
# MDU-Net:バイオメディカルイメージセグメンテーションのためのマルチスケール高密度接続U-Net MDU-Net: Multi-scale Densely Connected U-Net for biomedical image segmentation ( http://arxiv.org/abs/1812.00352v3 ) ライセンス: Link先を確認	Jiawei Zhang, Yuzhen Jin, Jilan Xu, Xiaowei Xu, Yanchun Zhang,	(参考訳) バイオメディカルイメージセグメンテーションは、定量的分析、臨床診断、医療介入において中心的な役割を果たす。完全畳み込みネットワーク (FCN) と U-Net により、ディープ畳み込みネットワーク (DNN) はバイオメディカルイメージセグメンテーションの応用に多大な貢献をしている。本稿では,U字型アーキテクチャのデコーダであるエンコーダに対して,3つの異なるMDC(Multi-scale dense connection)を提案する。 3つの密接な接続に基づいて,バイオメディカルイメージセグメンテーションのためのマルチスケール密接なU-Net(MDU-Net)を提案する。 MDU-Netは、隣のフィーチャーマップを高層と低層の両方から異なるスケールで直接融合させ、現在のレイヤにおけるフィーチャの伝搬を強化する。入力と出力に近い層間の接続が短いマルチスケールの高密度接続は、さらに深いU-Netを可能にする。さらに,高密度接続におけるポテンシャル過適合を緩和し,さらにセグメンテーション性能を向上させるために量子化を導入する。提案手法をMICCAI 2015 Gland Segmentation (GlaS) データセット上で評価した。 3つのMDCはU-Netのパフォーマンスを最大1.8%改善し、MICCAI GlandデータセットではテストAでは3.5%向上した。一方、量子化を伴うMDU-Netは、明らかに元のU-Netのセグメンテーション性能を改善する。 Biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. In the light of the fully convolutional networks (FCN) and U-Net, deep convolutional networks (DNNs) have made significant contributions to biomedical image segmentation applications. In this paper, we propose three different multi-scale dense connections (MDC) for the encoder, the decoder of U-shaped architectures, and across them. Based on three dense connections, we propose a multi-scale densely connected U-Net (MDU-Net) for biomedical image segmentation. MDU-Net directly fuses the neighboring feature maps with different scales from both higher layers and lower layers to strengthen feature propagation in the current layer. Multi-scale dense connections, which contain shorter connections between layers close to the input and output, also make a much deeper U-Net possible. Besides, we introduce quantization to alleviate the potential overfitting in dense connections, and further improve the segmentation performance. 
We evaluate our proposed model on the MICCAI 2015 Gland Segmentation (GlaS) dataset. The three MDC improve U-Net performance by up to 1.8% on test A and 3.5% on test B in the MICCAI Gland dataset. Meanwhile, the MDU-Net with quantization obviously improves the segmentation performance of original U-Net.	翻訳日:2024-03-26 00:17:07 公開日:2024-03-18
# 大規模言語モデルのためのコンパイラ生成フィードバック Compiler generated feedback for Large Language Models ( http://arxiv.org/abs/2403.14714v1 ) ライセンス: Link先を確認	Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather,	(参考訳) 我々は,LLVMアセンブリのコードサイズを最適化するために,コンパイラフィードバックを備えたLarge Language Modelを用いたコンパイラ最適化において,新しいパラダイムを導入する。このモデルは、最適化されていないLLVM IRを入力として取り、最適化されたIR、最適な最適化パス、最適化されていないIRと最適化されたIRの両方の命令数を生成する。そして、生成された最適化パスで入力をコンパイルし、予測された命令数が正しいか、生成されたIRがコンパイル可能か、コンパイルされたコードに対応しているかを評価する。このフィードバックを LLM に返して,コードを最適化する新たな機会を与える。このアプローチにより、オリジナルのモデルに対して-Ozを上回る改善がさらに0.53%加わる。フィードバックでより多くの情報を追加するのは直感的であるように思えるが、単純なサンプリング技術は10以上のサンプルが与えられた場合、はるかに高いパフォーマンスを達成する。 We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly. The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs. Then we compile the input with the generated optimization passes and evaluate whether the predicted instruction count is correct and whether the generated IR is compilable and corresponds to the compiled code. We provide this feedback to the LLM and give it another chance to optimize the code. This approach adds an extra 0.53% improvement over -Oz to the original model. Even though adding more information with feedback seems intuitive, simple sampling techniques achieve much higher performance given 10 or more samples.	翻訳日:2024-03-25 21:41:26 公開日:2024-03-18
# コンピューティングにおける参加拡大の進展を可視化する:コンテキストの価値 Visualizing Progress in Broadening Participation in Computing: The Value of Context ( http://arxiv.org/abs/2403.14708v1 ) ライセンス: Link先を確認	Valerie Barr, Carla E. Brodley, Manuel A. Pérez-Quiñones,	(参考訳) 米国内でのコンピューティングの表現に関する懸念は、参加を広げるために多くの活動を促している。これらの取り組みの影響の評価と、実際に対処されている「プロブレム」の明確な評価は、計算学の学位を持つ学生数の比率として各人口の表現を考察する最も一般的なデータ分析の性質によって制限されている。この単一のメトリクスの使用は、参加活動の拡大の影響を適切に評価することはできない。第一に、このアプローチは、連邦が指定した性別、人種、民族集団の総数と相対比率の点で、学部生の人口人口の変化を説明できない。第二の問題は、コンピューティング(BPC)への参加の拡大に関する文献の大多数が、学生の交叉アイデンティティに関するデータを省略して、性別や人種、民族に関するデータを報告していることである。これにより、データと私たちがフィールドとして直面している課題の両方を正しく理解できません。本稿では,BPCの取り組みに対する影響を追跡するために,いくつかの異なるアプローチを提案する。推奨事項は3つあります。 1)コホートに基づく分析は,コンピュータにおける学生のエンゲージメントを正確に示すために用いるべきである。 2 分野全体としては、常に交叉データを報告する基準を採用する必要がある。 3)大学人口統計学の文脈は、CS部門がコンピューティングへの参加を拡大するためにどれだけうまく行っているかを考える際に重要であり、その中には、コンピューティングの地域人口統計学に影響を及ぼす大学人口動態の経年変化の分析も含まれる。 Concerns about representation in computing within the U.S. have driven numerous activities to broaden participation. Assessment of the impact of these efforts and, indeed, a clear assessment of the actual "problem" being addressed are limited by the nature of the most common data analysis which looks at the representation of each population as a percentage of the number of students graduating with a degree in computing. This use of a single metric cannot adequately assess the impact of broadening participation efforts. First, this approach fails to account for changing demographics of the undergraduate population in terms of overall numbers and relative proportion of the Federally designated gender, race, and ethnicity groupings. A second issue is that the majority of literature on broadening participation in computing (BPC) reports data on gender or on race/ethnicity, omitting data on students' intersectional identities. This leads to an incorrect understanding of both the data and the challenges we face as a field. 
In this paper we present several different approaches to tracking the impact of BPC efforts. We make three recommendations: 1) cohort-based analysis should be used to accurately show student engagement in computing; 2) the field as a whole needs to adopt the norm of always reporting intersectional data; 3) university demographic context matters when looking at how well a CS department is doing to broaden participation in computing, including longitudinal analysis of university demographic shifts that impact the local demographics of computing.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# 気候Q&A: 気候科学者と一般大衆のギャップを埋める ClimateQ&A: Bridging the gap between climate scientists and the general public ( http://arxiv.org/abs/2403.14709v1 ) ライセンス: Link先を確認	Natalia De La Calzada, Théo Alves Da Costa, Annabelle Blangero, Nicolas Chesneau,	(参考訳) 本研究では,気候変動と生物多様性の喪失に関する世論を,ClimateQ&Aプラットフォームに対する質問の分析によって調査する。 ClimateQ&Aは、IPCCおよびIPBESレポートから14,000ページ以上の科学文献に基づいたクエリ応答にLLMを使用する会話エージェントである。 2023年3月にオンライン公開されたこのツールは、主にフランスの聴衆から3万以上の質問を集めた。そのチャットボットインタフェースは、自然に関する質問の自由な定式化を可能にする。その主な目的は自然科学をよりアクセスしやすくすることであるが、質問とそのテーマの収集と分析を可能にする。クローズドな質問を含む従来の調査とは異なり、この新手法は自然に関する個別の質問に対する新たな視点を提供する。 3,425の質問でNLPクラスタリングアルゴリズムを実行すると、気候変動や生物多様性の喪失が個人(例えば、居住地や休暇、消費習慣)に与える影響や、自然(例えば、輸送や食品の選択)に対する行動の具体的な影響について、25.8%が大きな質問をしていることがわかった。このことは、従来の調査手法が既存の知識ギャップを全て特定する訳ではなく、IPCCとIPBESのレポートにのみ依存することは、気候と生物多様性に関する個々の質問に対処するものではなく、これらの問題に対する公衆の理解と行動に影響を与える可能性があることを示唆している。 ※「気候変化」・「生物多様性喪失」の傘語として「自然」を用いる。 This research paper investigates public views on climate change and biodiversity loss by analyzing questions asked to the ClimateQ&A platform. ClimateQ&A is a conversational agent that uses LLMs to respond to queries based on over 14,000 pages of scientific literature from the IPCC and IPBES reports. Launched online in March 2023, the tool has gathered over 30,000 questions, mainly from a French audience. Its chatbot interface allows for the free formulation of questions related to nature. While its main goal is to make nature science more accessible, it also allows for the collection and analysis of questions and their themes. Unlike traditional surveys involving closed questions, this novel method offers a fresh perspective on individual interrogations about nature. Running NLP clustering algorithms on a sample of 3,425 questions, we find that a significant 25.8% inquire about how climate change and biodiversity loss will affect them personally (e.g., where they live or vacation, their consumption habits) and the specific impacts of their actions on nature (e.g., transportation or food choices). 
This suggests that traditional methods of surveying may not identify all existing knowledge gaps, and that relying solely on IPCC and IPBES reports may not address all individual inquiries about climate and biodiversity, potentially affecting public understanding and action on these issues. *we use 'nature' as an umbrella term for 'climate change' and 'biodiversity loss'	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# ディスレクシアの学生支援のためのレコメンデーションモデルの利用 Use of recommendation models to provide support to dyslexic students ( http://arxiv.org/abs/2403.14710v1 ) ライセンス: Link先を確認	Gianluca Morciano, José Manuel Alcalde-Llergo, Andrea Zingoni, Enrique Yeguas-Bolivar, Juri Taborri, Giuseppe Calabrò,	(参考訳) ディスレクシアは、最も広く見られる特異的学習障害であり、さまざまな認知領域を著しく損なう。これは、学習過程において、ディスレクシアの学生に悪影響を及ぼす。したがって、これらの学生に特定の支援を与える必要がある。さらに、障害によって生じる問題は、学生ごとに大きく異なる可能性があるため、このようなサポートは高度にパーソナライズされなければならない。本研究では, ディスレクシアの学生に最も適した支援ツールを提案するために, AIを活用する可能性について検討した。これを実現するために、私たちは、個人の好みを検出し、最も適切な提案を提供することを目的とした、機械学習の分野であるレコメンデーションアルゴリズムを頼りにしました。そこで我々は,3つの協調フィルタリング推薦モデル,すなわちアイテムベース,ユーザベース,および重み付きハイブリッドモデルを実装し,最も多く利用されている支援戦略とデジタルツールに関する自己評価質問紙によって収集した,1237名の学生の情報からなる大規模データベース上で,その性能について検討した。各レコメンデーションモデルは、ピアソン相関、ユークリッド距離、コサイン類似度という3つの異なる類似度指標で試験された。その結果,レコメンデーションシステムは,全員に最適なヘルプツールや戦略を提案する上で極めて有効であることがわかった。このことは、提案手法が成功し、ディスレクシアの学生を支援するための新しい効果的な方法として利用できることを示している。 Dyslexia is the most widespread specific learning disorder and significantly impairs different cognitive domains. This, in turn, negatively affects dyslexic students during their learning path. Therefore, specific support must be given to these students. In addition, such support must be highly personalized, since the problems generated by the disorder can be very different from one student to another. In this work, we explored the possibility of using AI to suggest the most suitable supporting tools for dyslexic students, so as to provide targeted help that can be of real utility. To do this, we relied on recommendation algorithms, a branch of machine learning that aims to detect personal preferences and provide the most suitable suggestions. 
We hence implemented and trained three collaborative-filtering recommendation models, namely an item-based, a user-based and a weighted-hybrid model, and studied their performance on a large database of 1237 students' information, collected with a self-evaluating questionnaire regarding all the most used supporting strategies and digital tools. Each recommendation model was tested with three different similarity metrics, namely Pearson correlation, Euclidean distance and Cosine similarity. The obtained results showed that a recommendation system is highly effective in suggesting the optimal help tools/strategies for everyone. This demonstrates that the proposed approach is successful and can be used as a new and effective methodology to support students with dyslexia.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# リング検出のための対人AI Human-in-the-Loop AI for Cheating Ring Detection ( http://arxiv.org/abs/2403.14711v1 ) ライセンス: Link先を確認	Yong-Siang Shih, Manqian Liao, Ruidong Liu, Mirza Basim Baig,	(参考訳) 近年,アクセシビリティのため,オンライン試験が普及している。しかし、オンライン試験の安全性、特に悪質な試験受験者が合格するのを助けるプロの不正行為の文脈において、いくつかの懸念が持ち上がっており、いわゆる「チーティングリング」を形成している。本稿では,これらの不正なリングを検知し,阻止するように設計された,ループ型AI不正なリング検出システムを提案する。我々は、この人間のループAIシステムの基盤となる論理を概説し、不正者検出の目的を達成するための設計原則を探求する。さらに、AIシステムに関連する意図しないリスクを軽減することを目的として、その性能と公平性を評価するために使用される方法論について説明する。システムの設計と開発はResponsible AI(RAI)標準に準拠し、開発プロセス全体を通して倫理的考察が統合されることを保証する。 Online exams have become popular in recent years due to their accessibility. However, some concerns have been raised about the security of the online exams, particularly in the context of professional cheating services aiding malicious test takers in passing exams, forming so-called "cheating rings". In this paper, we introduce a human-in-the-loop AI cheating ring detection system designed to detect and deter these cheating rings. We outline the underlying logic of this human-in-the-loop AI system, exploring its design principles tailored to achieve its objectives of detecting cheaters. Moreover, we illustrate the methodologies used to evaluate its performance and fairness, aiming to mitigate the unintended risks associated with the AI system. The design and development of the system adhere to Responsible AI (RAI) standards, ensuring that ethical considerations are integrated throughout the entire development process.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# 官僚的生産性のためのAI: 1億4300万の英国政府の取引を自動化するAIの可能性の測定 AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions ( http://arxiv.org/abs/2403.14712v1 ) ライセンス: Link先を確認	Vincent J. Straub, Youmna Hashem, Jonathan Bright, Satyam Bhagwanani, Deborah Morgan, John Francis, Saba Esnaashari, Helen Margetts,	(参考訳) 現在政府内では、複雑だが反復的な官僚的タスクの自動化によって、人工知能が公共サービスの生産性を向上させる可能性について、かなりの興奮がある。ここでは、英国中央政府における市民向きの官僚的意思決定手順の規模をマッピングし、AIによる自動化の可能性を評価することによって、この機会の規模を探る。我々は、英国中央政府が年間約10億件の市民向け取引を約400のサービスで行っており、そのうち約1億4300万が複雑な反復取引であると見積もっている。これらの複雑なトランザクションの84%は高度に自動化可能であると見積もっています。また、政府のサービスによる取引量の推定モデルも開発し、政府が取引量の計測に時間を費やすのを避けるための手段を提供する。最後に、政府が提供するサービスの種類には高いオーバオーバがあることが分かりました。つまり、自動化の取り組みは、時間とともに進化する可能性のあるサービス自身ではなく、一般的な手順に重点を置くべきです。全体として、我々の研究は、現代政府の構造と機能、そしてそれが人工知能の時代にどのように進化するかについて、新しい視点を示します。 There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time consuming transaction volume measurements. 
Finally, we find that there is high turnover in the types of services government provide, meaning that automation efforts should focus on general procedures rather than services themselves which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# 観測不能な条件下での公正さの監査 Auditing Fairness under Unobserved Confounding ( http://arxiv.org/abs/2403.14713v1 ) ライセンス: Link先を確認	Yewon Byun, Dylan Sam, Michael Oberst, Zachary C. Lipton, Bryan Wilder,	(参考訳) 意思決定システムにおける根本的な問題は、人口統計上の不平等の存在である。しかしながら、不平等は定量化が困難であり、特に我々の株式の概念がリスク(例えば、それ無しで死ぬ人に対する治療への平等なアクセス)のような難しい概念に依存している場合である。このような不平等を監査するには、個々のリスクを正確に測定する必要がある。これらの観測不能物が明らかな相違を「説明」する場合、過渡状態または過渡状態の不等式が成立する可能性がある。本稿では, リスク要因がすべて観察されているという仮定を排除した場合でも, 緩やかに, あるいは(当然のことながら) 高いリスクの個人間でのアロケーション率に情報的限界を与えることができることを示す。我々は、現実の多くの設定(例えば、新しい治療の導入)において、いかなるアロケーションよりも前の期間のデータを持ち、不偏のリスク見積を導出するという事実を利用する。筆者らは,Paxlovidの患者への配当に関する現実的な研究において,我々の枠組みの有効性を実証し,観察された人種的不平等は,重要な観察された同種種と同一の強度を持つ未観察の共同設立者によって説明できないことを発見した。 A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any allocation, to derive unbiased estimates of risk. We demonstrate the effectiveness of our framework on a real-world study of Paxlovid allocation to COVID-19 patients, finding that observed racial inequity cannot be explained by unobserved confounders of the same strength as important observed covariates.	翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
# 脳-コンピュータインタフェースのための軽量ベクトル記号アーキテクチャのスケジューリング知識獲得 Scheduled Knowledge Acquisition on Lightweight Vector Symbolic Architectures for Brain-Computer Interfaces ( http://arxiv.org/abs/2403.13844v1 ) ライセンス: Link先を確認	Yejia Liu, Shijin Duan, Xiaolin Xu, Shaolei Ren,	(参考訳) Brain-Computer Interface (BCI) は通常、ユーザがタイムリーなフィードバックを提供するために、軽量でリアルタイムに応答できるように設計されている。古典的特徴エンジニアリングは計算効率は高いが精度は低いが、最近のニューラルネットワーク(DNN)は精度を向上するが、計算コストが高く、レイテンシが高い。有望な代替として、ベクトル記号アーキテクチャ(VSA)に基づく低次元計算(LDC)分類器は、古典的な特徴工学手法よりも小さいモデルサイズで精度が高い。しかし、その精度は現代のDNNと比べても遅れており、複雑な脳信号を処理することは困難である。小モデルの精度を向上させるため、知識蒸留は一般的な方法である。しかし、教師と生徒のモデルの蒸留レベルを一定に保つことは、成長する学生にとって、その進歩的な学習段階において最善の方法ではないかもしれない。そこで本研究では,カリキュラムデータに基づく簡易な知識蒸留手法を提案し,学生が授業モデルから徐々に知識を構築できるようにし,それを$\alpha$スケジューラで制御する。一方,LDC/VSAを学生モデルとして採用し,低レイテンシを必要とする小型BCIデバイスにおいて,デバイス上での推論効率を向上させる。実験結果から,本手法は他の手法に比べて精度とハードウェア効率のトレードオフが良好であることが示された。 Brain-Computer interfaces (BCIs) are typically designed to be lightweight and responsive in real-time to provide users timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas the recent neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) classifier based on vector symbolic architecture (VSA), achieves small model size yet higher accuracy than classical feature engineering methods. However, its accuracy still lags behind that of modern DNNs, making it challenging to process complex brain signals. To improve the accuracy of a small model, knowledge distillation is a popular method. However, maintaining a constant level of distillation between the teacher and student models may not be the best way for a growing student during its progressive learning stages. In this work, we propose a simple scheduled knowledge distillation method based on curriculum data order to enable the student to gradually build knowledge from the teacher model, controlled by an $\alpha$ scheduler. 
Meanwhile, we employ the LDC/VSA as the student model to enhance the on-device inference efficiency for tiny BCI devices that demand low latency. The empirical results have demonstrated that our approach achieves better tradeoff between accuracy and hardware efficiency compared to other methods.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# 目に見えないものをよりよく見る: インクリメンタルゼロショット異常診断のための広深さ混合アンチフォッティングフレームワーク Learning to better see the unseen: Broad-Deep Mixed Anti-Forgetting Framework for Incremental Zero-Shot Fault Diagnosis ( http://arxiv.org/abs/2403.13845v1 ) ライセンス: Link先を確認	Jiancheng Zhao, Jiaqi Yue, Chunhui Zhao,	(参考訳) ゼロショット断層診断(ZSFD)は、人間の専門家によってラベル付けされた断層特性を予測することによって、目に見えない断層を識別することができる。我々はまず,ZSFDの産業プロセスの継続的な変化,すなわち新たな障害カテゴリや属性に適応するモデルの能力に対処する上で,これまで学んだ診断能力を忘れてはならない,という要求を認識した。既存のZSFDパラダイムは、産業シナリオにおけるトレーニングデータのストリームの進化から学べないという問題を克服するために、従来のZSFDパラダイムと一般化ZSFDパラダイムの両方にカテゴリインクリメントと属性インクリメントを組み込んだインクリメンタルZSFD(IZSFD)パラダイムが最初に提案されている。 IZSFDを実現するために,新しい障害カテゴリや属性から学習することを目的とした,広域混合型アンチフォゲッティングフレームワーク(BDMAFF)を提案する。忘れる問題に対処するため、BDMAFFは2つの観点から得られた知識、すなわち特徴と属性のプロトタイプを効果的に蓄積する。特徴記憶は、アンチフォッゲッティングトレーニング戦略を用いた深層生成モデルにより確立され、歴史的カテゴリの生成品質が監視され維持される。診断モデルは、生成モデルから生成されたサンプルの助けを借りてUNSEEN断層をSEEする。属性プロトタイプメモリは、広範学習システムにインスパイアされた診断モデルによって確立される。従来の漸進的学習アルゴリズムとは異なり、BDMAFFは診断モデルのメモリ駆動反復更新戦略を導入し、過去のトレーニングサンプルをすべて保存することなく、新しい障害や属性を学習できるようにする。提案手法の有効性は,実油圧システムとテネシー・イーストマンベンチマークプロセスによって検証される。 Zero-shot fault diagnosis (ZSFD) is capable of identifying unseen faults via predicting fault attributes labeled by human experts. We first recognize the demand of ZSFD to deal with continuous changes in industrial processes, i.e., the model's ability to adapt to new fault categories and attributes while avoiding forgetting the diagnosis ability learned previously. To overcome the issue that the existing ZSFD paradigm cannot learn from evolving streams of training data in industrial scenarios, the incremental ZSFD (IZSFD) paradigm is proposed for the first time, which incorporates category increment and attribute increment for both traditional ZSFD and generalized ZSFD paradigms. To achieve IZSFD, we present a broad-deep mixed anti-forgetting framework (BDMAFF) that aims to learn from new fault categories and attributes. 
To tackle the issue of forgetting, BDMAFF effectively accumulates previously acquired knowledge from two perspectives: features and attribute prototypes. The feature memory is established through a deep generative model that employs anti-forgetting training strategies, ensuring the generation quality of historical categories is supervised and maintained. The diagnosis model SEEs the UNSEEN faults with the help of generated samples from the generative model. The attribute prototype memory is established through a diagnosis model inspired by the broad learning system. Unlike traditional incremental learning algorithms, BDMAFF introduces a memory-driven iterative update strategy for the diagnosis model, which allows the model to learn new faults and attributes without requiring the storage of all historical training samples. The effectiveness of the proposed method is verified by a real hydraulic system and the Tennessee-Eastman benchmark process.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# グラフ最大復号情報を用いたクラスタリング手法 A Clustering Method with Graph Maximum Decoding Information ( http://arxiv.org/abs/2403.13846v1 ) ライセンス: Link先を確認	Xinrun Xu, Manying Lv, Yurong Wu, Zhanbiao Lian, Zhiming Ding, Jin Yan, Shan Jiang,	(参考訳) グラフモデルに基づくクラスタリング手法は,様々な知識領域にまたがる適用性に注目が集まっている。他の関連するアプリケーションとシームレスに統合する適応性は、グラフモデルに基づくクラスタリング分析に、データセット内で「自然な関連」や「グラフ構造」を堅牢に抽出する能力を与え、データポイント間の関係のモデリングを容易にする。その有効性にもかかわらず、グラフベースモデルを用いた現在のクラスタリング手法は、ノード間のランダムウォークアクセスとデータ内の組込み構造情報に関連する不確実性を見落としている。このギャップに対処するために, CMDI と呼ばれるグラフベースモデル内でのデコード情報の最大化のためのクラスタリング手法を提案する。 CMDIは、グラフ構造抽出とグラフ頂点分割という2つのフェーズからなるクラスタリングプロセスに、2次元構造情報理論を革新的に組み入れている。 CMDI内では、グラフ分割は抽象的なクラスタリング問題として再構成され、最大復号情報を利用して、頂点へのランダムな訪問に関連する不確実性を最小限に抑える。 3つの実世界のデータセットに対する実証的な評価は、CMDIが古典的ベースライン法よりも優れており、より優れた復号化情報比(DI-R)を示すことを示している。さらにCMDIは,特に事前知識(PK)を考慮した場合,高い効率性を示す。これらの結果から,デコード情報の品質と計算効率を向上させるCMDIの有効性が示され,グラフベースのクラスタリング解析において貴重なツールとして位置づけられた。 The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of relationships between data points. Despite its efficacy, the current clustering method utilizing the graph-based model overlooks the uncertainty associated with random walk access between nodes and the embedded structural information in the data. To address this gap, we present a novel Clustering method for Maximizing Decoding Information within graph-based models, named CMDI. CMDI innovatively incorporates two-dimensional structural information theory into the clustering process, consisting of two phases: graph structure extraction and graph vertex partitioning. 
Within CMDI, graph partitioning is reformulated as an abstract clustering problem, leveraging maximum decoding information to minimize uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets demonstrate that CMDI outperforms classical baseline methods, exhibiting a superior decoding information ratio (DI-R). Furthermore, CMDI showcases heightened efficiency, particularly when considering prior knowledge (PK). These findings underscore the effectiveness of CMDI in enhancing decoding information quality and computational efficiency, positioning it as a valuable tool in graph-based clustering analyses.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# ガウス混合モデルによるドメイン適応のための最適輸送 Optimal Transport for Domain Adaptation through Gaussian Mixture Models ( http://arxiv.org/abs/2403.13847v1 ) ライセンス: Link先を確認	Eduardo Fernandes Montesuma, Fred Maurice Ngolè Mboula, Antoine Souloumiac,	(参考訳) 本稿では,最適輸送による領域適応について検討する。本稿では,ガウス混合モデルを用いてデータ分布をモデル化する手法を提案する。この戦略により、等価な離散的な問題を通じて連続的な最適輸送を解くことができる。最適なトランスポートソリューションは、ソースとターゲットドメインの混合コンポーネントのマッチングを提供します。このマッチングから、ドメイン間でデータポイントをマッピングしたり、ソースドメインコンポーネントからターゲットドメインへラベルを転送したりできます。断層診断における2つの領域適応ベンチマークを用いて,本手法の最先端性能を示す。 In this paper we explore domain adaptation through optimal transport. We propose a novel approach, where we model the data distributions through Gaussian mixture models. This strategy allows us to solve continuous optimal transport through an equivalent discrete problem. The optimal transport solution gives us a matching between source and target domain mixture components. From this matching, we can map data points between domains, or transfer the labels from the source domain components towards the target domain. We experiment with 2 domain adaptation benchmarks in fault diagnosis, showing that our methods have state-of-the-art performance.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# 差分プライベートかつ正確なルールリストの学習のための平滑感度 Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists ( http://arxiv.org/abs/2403.13848v1 ) ライセンス: Link先を確認	Timothée Ly, Julien Ferry, Marie-José Huguet, Sébastien Gambs, Ulrich Aivodji,	(参考訳) Differentially-private (DP) メカニズムは、結果として生じるモデルをプライバシリークから保護するために、機械学習アルゴリズムの設計に組み込むことができるが、これはしばしば精度の大幅な低下を伴う。本稿では,Gini不純度の平滑感度を確立し,それを利用してDP貪欲ルールリストアルゴリズムを提案することによって,ルールリストモデルにおけるこのトレードオフを改善することを目的とする。特に, 理論解析および実験結果から, 平滑感度を組み込んだDPルールリストは, グローバルな感度に基づく他のDPフレームワークを用いたモデルよりも精度が高いことが示された。 Differentially-private (DP) mechanisms can be embedded into the design of a machine learning algorithm to protect the resulting model against privacy leakage, although this often comes with a significant loss of accuracy. In this paper, we aim at improving this trade-off for rule list models by establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedy rule list algorithm. In particular, our theoretical analysis and experimental results demonstrate that the DP rule list models integrating smooth sensitivity have higher accuracy than those using other DP frameworks based on global sensitivity.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# グラフの解明: グラフニューラルネットワークとグラフ生成 Graphs Unveiled: Graph Neural Networks and Graph Generation ( http://arxiv.org/abs/2403.13849v1 ) ライセンス: Link先を確認	László Kovács, Ali Jlidi,	(参考訳) 機械学習におけるホットトピックの1つは、GNNの分野である。グラフデータの複雑さは、既存の機械学習アルゴリズムに重大な課題を課している。近年,グラフデータに対する深層学習手法の拡張に関する研究が盛んに行われている。本稿はサーベイであり,グラフニューラルネットワーク(GNN)の包括的な概要を紹介する。様々な領域にわたるグラフニューラルネットワークの適用について論じる。最後に,GNNの高度な分野としてグラフ生成を紹介する。 One of the hot topics in machine learning is the field of GNNs. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. This paper presents a survey, providing a comprehensive overview of Graph Neural Networks (GNNs). We discuss the applications of graph neural networks across various domains. Finally, we present an advanced field in GNNs: graph generation.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# 物理認識とパラメータ拡散誘導による時空間流体力学モデリング Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance ( http://arxiv.org/abs/2403.13850v1 ) ライセンス: Link先を確認	Hao Wu, Fan Xu, Yifan Duan, Ziwei Niu, Weiyan Wang, Gaofeng Lu, Kun Wang, Yuxuan Liang, Yang Wang,	(参考訳) 本稿では,地球科学分野における時空間流体力学モデリングのための2段階のフレームワークST-PADを提案する。上流の段階では、時間的進化特性を持つベクトル量子化再構成モジュールを設計し、一般的な物理制約を導入することで、平衡パラメータ分布と弾力パラメータ分布を確保する。下流の段階では、パラメータを含む拡散確率ネットワークを用いて、様々な物理装置におけるパラメータの知覚によりモデルの一般化能力を高めながら、流体の高品質な将来状態を生成する。複数のベンチマークデータセットに対する大規模な実験により、ST-PADフレームワークの有効性とロバスト性が確認され、ST-PADは流体力学のモデリングと予測において、特に局所的な表現を効果的に取得し、OOD世代において大きな優位性を維持する上で、現在の主流モデルよりも優れていることを示した。 This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics, ensuring balanced and resilient parameter distribution by introducing general physical constraints. In the downstream stage, a diffusion probability network involving parameters is utilized to generate high-quality future states of fluids, while enhancing the model's generalization ability by perceiving parameters in various physical setups. Extensive experiments on multiple benchmark datasets have verified the effectiveness and robustness of the ST-PAD framework, which showcase that ST-PAD outperforms current mainstream models in fluid dynamics modeling and prediction, especially in effectively capturing local representations and maintaining significant advantages in OOD generations.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# ニューラルネットワークを用いた医療用デジタル双生児の制御 Control of Medical Digital Twins with Artificial Neural Networks ( http://arxiv.org/abs/2403.13851v1 ) ライセンス: Link先を確認	Lucas Böttcher, Luis L. Fonseca, Reinhard C. Laubenbacher,	(参考訳) パーソナライズドメディカルの目的は、患者固有の特徴に対する介入を調整することである。この目的のための重要な技術は、医療用デジタルツイン、ヒト生物学の計算モデルであり、患者固有のデータを時間とともに収集するパーソナライズされ、動的に更新することができる。免疫系のような人間の生物学の特定の側面は、微分方程式のような物理学に基づくモデルでは容易には捉えられない。代わりに、それらはしばしばマルチスケール、確率的、ハイブリッドである。これは、そのようなモデルに容易に適用できない既存のモデルベースの制御と最適化アプローチに挑戦する。自動微分法やニューラルネットワーク制御法の最近の進歩は、複雑な制御問題に対処する上で有望である。しかし、これらのアプローチの生体医療システムへの応用は、まだ初期段階にある。この研究は、医療用デジタルツインを制御する代替アプローチとして、動的インフォームドニューラルネットワークコントローラを導入している。この手法の第一のユースケースとして、バイオメディシンにおける多用途で一般的なモデリングプラットフォームであるエージェントベースモデルに焦点が当てられている。提案手法の有効性を実証し,2種類のエージェントモデルを用いた他の手法と比較した。ここで紹介される方法の関連性は、医療用デジタル双生児以外にも、他の複雑な力学系にも及んでいる。 The objective of personalized medicine is to tailor interventions to an individual patient's unique characteristics. A key technology for this purpose involves medical digital twins, computational models of human biology that can be personalized and dynamically updated to incorporate patient-specific data collected over time. Certain aspects of human biology, such as the immune system, are not easily captured with physics-based models, such as differential equations. Instead, they are often multi-scale, stochastic, and hybrid. This poses a challenge to existing model-based control and optimization approaches that cannot be readily applied to such models. Recent advances in automatic differentiation and neural-network control methods hold promise in addressing complex control problems. However, the application of these approaches to biomedical systems is still in its early stages. This work introduces dynamics-informed neural-network controllers as an alternative approach to control of medical digital twins. As a first use case for this method, the focus is on agent-based models, a versatile and increasingly common modeling platform in biomedicine. 
The effectiveness of the proposed neural-network control method is illustrated and benchmarked against other methods with two widely-used agent-based model types. The relevance of the method introduced here extends beyond medical digital twins to other complex dynamical systems.	翻訳日:2024-03-22 18:28:52 公開日:2024-03-18
# ChildCIフレームワーク:年齢検出のためのコンピュータインタラクションによる子どもの運動・認知発達の分析 ChildCI Framework: Analysis of Motor and Cognitive Development in Children-Computer Interaction for Age Detection ( http://arxiv.org/abs/2204.04236v3 ) ライセンス: Link先を確認	Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Jaime Herreros-Rodriguez,	(参考訳) 本稿では、近年のChildCIフレームワークで提案されている様々なテストについて包括的分析を行い、子どもの神経運動と認知発達をよりよく理解する可能性を示し、e-Healthやe-Learningといった他の研究分野にも応用できる可能性を示した。特に,子どもがモバイル機器と対話する際の運動・認知的側面に関連する,100以上のグローバルな特徴のセットを提案し,その一部を文献から収集し,適応させた。さらに, 運動と認知行動に基づいて, 児童年齢群検出の課題に対する実験結果を含む, 特徴集合の頑健性と識別力について分析した。本研究では,2つの異なるシナリオを考察する。 i) 単一テストシナリオ、および ii) 複数テストシナリオ。 93%以上の正確性は、公開されているChildCIdb_v1データベース(18ヶ月から8歳までの400人以上)を用いて達成され、子どもの年齢とモバイルデバイスとのインタラクションの関連性が高いことが証明された。 This article presents a comprehensive analysis of the different tests proposed in the recent ChildCI framework, proving its potential for generating a better understanding of children's neuromotor and cognitive development over time, as well as their possible application in other research areas such as e-Health and e-Learning. In particular, we propose a set of over 100 global features related to motor and cognitive aspects of children's interaction with mobile devices, some of them collected and adapted from the literature. Furthermore, we analyse the robustness and discriminative power of the proposed feature set including experimental results for the task of children age group detection based on their motor and cognitive behaviours. Two different scenarios are considered in this study: i) single-test scenario, and ii) multiple-test scenario. Results over 93% accuracy are achieved using the publicly available ChildCIdb_v1 database (over 400 children from 18 months to 8 years old), proving the high correlation of children's age with the way they interact with mobile devices.	翻訳日:2024-03-21 23:26:53 公開日:2024-03-18
# HyperVQ:双曲空間におけるMLRに基づくベクトル量子化 HyperVQ: MLR-based Vector Quantization in Hyperbolic Space ( http://arxiv.org/abs/2403.13015v1 ) ライセンス: Link先を確認	Nabarun Goswami, Yusuke Mukuta, Tatsuya Harada,	(参考訳) トークン化されたデータを扱うモデルの成功は、特に非離散的なデータを含む視覚や聴覚タスクに適用する場合に、効果的なトークン化手法の需要が高まっている。最も一般的なトークン化手法の1つはベクトル量子化(VQ)である。典型的には、VQ変動オートコーダ(VQVAE)は、データのトークン化表現への変換を訓練する。しかしながら、VQVAEは再構築目的で訓練されているため、埋め込みがうまく切り離されているという制約はなく、差別的なタスクでそれらを使用する上で重要な側面である。近年,表現学習における双曲空間の利点を実証する研究がいくつかある。双曲空間は、指数的体積成長と階層的および構造化されたデータをモデル化する固有の能力により、コンパクトな潜在表現を誘導する。本研究では,ベクトル量子化(HyperVQ)における双曲空間の利用について検討し,VQVAEで使用されるユークリッドK平均クラスタリングとは対照的に,双曲多項ロジスティック回帰(MLR)問題としてVQ演算を定式化する。広範にわたる実験により,ハイパーVQは,識別的タスクにおいてVQより優れ,非常に不整合な潜在空間を学習しながら,再構成や生成作業において相容れない性能を示す。 The success of models operating on tokenized data has led to an increased demand for effective tokenization methods, particularly when applied to vision or auditory tasks, which inherently involve non-discrete data. One of the most popular tokenization methods is Vector Quantization (VQ), a key component of several recent state-of-the-art methods across various domains. Typically, a VQ Variational Autoencoder (VQVAE) is trained to transform data to and from its tokenized representation. However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks. Recently, several works have demonstrated the benefits of utilizing hyperbolic spaces for representation learning. Hyperbolic spaces induce compact latent representations due to their exponential volume growth and inherent ability to model hierarchical and structured data. In this work, we explore the use of hyperbolic spaces for vector quantization (HyperVQ), formulating the VQ operation as a hyperbolic Multinomial Logistic Regression (MLR) problem, in contrast to the Euclidean K-Means clustering used in VQVAE. 
Through extensive experiments, we demonstrate that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
# Impart: 知覚不能で効果的なラベル付きバックドアアタック Impart: An Imperceptible and Effective Label-Specific Backdoor Attack ( http://arxiv.org/abs/2403.13017v1 ) ライセンス: Link先を確認	Jingke Zhao, Zan Wang, Yongwei Wang, Lanjun Wang,	(参考訳) バックドア攻撃は、実際のセキュリティクリティカルなシナリオに深刻な脅威を課すことが示されている。以前の研究は高い攻撃の成功率を達成することができるが、実際には脅威を著しく減少させるような被害者モデルにアクセスするか、ステルスネスで視覚的に目立たせることが必要になる。さらに、異なる毒のサンプルが異なる標的ラベル(すなわちオール・ツー・オールのセッティング)を持つというシナリオにおいて、攻撃の成功率を改善する余地もある。本研究では,攻撃者が被害者モデルにアクセスできないシナリオにおいて,Impartという新たな非知覚的バックドアアタック・フレームワークを提案する。具体的には、オール・ツー・オール・セッティングの攻撃能力を高めるために、まずラベル固有の攻撃を提案する。そこで本研究では, イメージ特徴のターゲットラベルと一致した摂動を代理モデルにより生成する手法を提案する。このようにして、生成した有毒画像にターゲットクラスに関する知識を付加し、攻撃能力を著しく向上させる。 Backdoor attacks have been shown to impose severe threats to real security-critical scenarios. Although previous works can achieve high attack success rates, they either require access to victim models which may significantly reduce their threats in practice, or perform visually noticeable in stealthiness. Besides, there is still room to improve the attack success rates in the scenario that different poisoned samples may have different target labels (a.k.a., the all-to-all setting). In this study, we propose a novel imperceptible backdoor attack framework, named Impart, in the scenario where the attacker has no access to the victim model. Specifically, in order to enhance the attack capability of the all-to-all setting, we first propose a label-specific attack. Different from previous works which try to find an imperceptible pattern and add it to the source image as the poisoned image, we then propose to generate perturbations that align with the target label in the image feature by a surrogate model. In this way, the generated poisoned images are attached with knowledge about the target class, which significantly enhances the attack capability.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
# 特異値分解による見えないバックドア攻撃 Invisible Backdoor Attack Through Singular Value Decomposition ( http://arxiv.org/abs/2403.13018v1 ) ライセンス: Link先を確認	Wenmin Chen, Xiaowei Xu,	(参考訳) さまざまなドメインでディープラーニングが広く適用されるようになると、そのセキュリティに対する懸念は大幅に高まっている。これらのうち、バックドア攻撃はディープニューラルネットワーク(DNN)に深刻なセキュリティ上の脅威をもたらす。近年、ニューラルネットワークに対するバックドア攻撃はますます洗練され、隠れた無許可の機能やトリガーを埋め込むことによってモデルのセキュリティと信頼性を損なうことを目的としている。トリガーを知覚されにくくするため、様々な目に見えないバックドア攻撃が提案されている。しかし,その多くが空間領域の視認性しか考慮していないため,近年の防衛手法による有害画像の検出が容易である。これらの課題に対処するために,本稿ではDEBAと呼ばれる目に見えないバックドア攻撃を提案する。 DEBAは、Singular Value Decomposition(SVD)の数学的特性を活用して、トレーニングフェーズ中に知覚できないバックドアをモデルに埋め込むことで、特定のトリガー条件下で事前に定義された悪意のある振る舞いを示す。具体的には、まず画像上でSVDを実行し、次に、トリガー画像のマイナーな特徴をクリーン画像の特徴に置き換え、それらをトリガーとして使用して、攻撃の有効性を保証する。画像全体に小さな特徴が散在しているため、清潔な画像の主要な特徴が保存され、清潔な画像とは視覚的に区別できない。広汎な実験的評価により, DEBAは高い知覚品質を維持し, 有毒画像に対する高い攻撃成功率を保ち, 極めて有効であることが示された。さらに, 既存の防衛対策におけるDEBAの性能評価を行い, これらの防衛対策の効果を著しく回避し, 抵抗することができることを示した。 With the widespread application of deep learning across various domains, concerns about its security have grown significantly. Among these, backdoor attacks pose a serious security threat to deep neural networks (DNNs). In recent years, backdoor attacks on neural networks have become increasingly sophisticated, aiming to compromise the security and trustworthiness of models by implanting hidden, unauthorized functionalities or triggers, leading to misleading predictions or behaviors. To make triggers less perceptible, various invisible backdoor attacks have been proposed. However, most of them only consider invisibility in the spatial domain, making it easy for recent defense methods to detect the generated toxic images. To address these challenges, this paper proposes an invisible backdoor attack called DEBA. DEBA leverages the mathematical properties of Singular Value Decomposition (SVD) to embed imperceptible backdoors into models during the training phase, thereby causing them to exhibit predefined malicious behavior under specific trigger conditions. 
Specifically, we first perform SVD on images, and then replace the minor features of trigger images with those of clean images, using them as triggers to ensure the effectiveness of the attack. As minor features are scattered throughout the entire image, the major features of clean images are preserved, making poisoned images visually indistinguishable from clean ones. Extensive experimental evaluations demonstrate that DEBA is highly effective, maintaining high perceptual quality and a high attack success rate for poisoned images. Furthermore, we assess the performance of DEBA under existing defense measures, showing that it is robust and capable of significantly evading and resisting the effects of these defense measures.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
# ASOP: クラウドベースのIoTサービスのためのセキュアでセキュアなデバイスオンボードプロトコル ASOP: A Sovereign and Secure Device Onboarding Protocol for Cloud-based IoT Services ( http://arxiv.org/abs/2403.13020v1 ) ライセンス: Link先を確認	Khan Reaz, Gerhard Wunder,	(参考訳) 既存の高圧デバイス搭載プロセスは、IoT(Internet of Things)の約束と可能性を妨げる。様々なデバイスメーカーやワーキンググループによるいくつかの試みの後でも、広く採用されている標準ソリューションは実現しなかった。 Fast Identity Online (FIDO) Allianceによる最新の試みでは、マスマーケットIoT顧客のためのゼロタッチソリューションが約束されているが、その負担は中間サプライチェーン(すなわち、すべてのデバイスに対して‘Ownership Voucher’と呼ばれるキーとデジタルシグネチャを管理するためのインフラストラクチャを維持する必要がある)に転送される。この仕様はドメイン名システム(DNS)サーバーの概念を模倣した 'Rendezvous Server' に依存している。これは本質的には、Denial of Service(DoS)攻撃や相関攻撃を含む、DNSに関連する既存の攻撃シナリオを復活させることを意味する。 Ownership Voucherは、一部の中間サプライチェーンエージェントが悪意を持って行動し、所有権の移転を拒絶したり、間違ったキーで署名したりするリスクを生じさせる。さらに、この仕様における弱い楕円曲線SECP256r1/SECP384r1(NIST P-256/384としても知られる)の故意な使用は、疑問を提起する。私たちは、デバイスメーカー、サプライチェーン、クラウドサービスプロバイダを盲目的に信頼することなく、IoTデバイス用の主権とセキュアなデバイスオンボードプロトコルであるASOPを紹介します。 ASOPプロトコルは、ユーザが所有する認証者の助けを借りて、IoTデバイスをクラウドサーバに搭載することを可能にする。本稿では,プロトコルの事前開発とその高レベル記述について概説する。我々の 'zero-trust' と ' Human-in-the-loop' アプローチは、デバイス所有者がサードパーティのインフラストラクチャの恩恵を受けていないことを保証し、最近標準化されたポスト量子暗号スイート (CRYSTALS) を使用してコネクションとメッセージを保護する。 The existing high-friction device onboarding process hinders the promise and potentiality of Internet of Things (IoT). Even after several attempts by various device manufacturers and working groups, no widely adopted standard solution came to fruition. The latest attempt by Fast Identity Online (FIDO) Alliance promises a zero touch solution for mass market IoT customers, but the burden is transferred to the intermediary supply chain (i.e. they have to maintain infrastructure for managing keys and digital signatures called `Ownership Voucher' for all devices). The specification relies on a `Rendezvous Server' mimicking the notion of Domain Name System (DNS) server'. This essentially means resurrecting all existing possible attack scenarios associated with DNS, which include Denial of Service (DoS) attack, and Correlation attack. 
`Ownership Voucher' poses the risk that some intermediary supply chain agents may act maliciously and reject the transfer of ownership or sign with a wrong key. Furthermore, the deliberate use of the weak elliptic curve SECP256r1/SECP384r1 (also known as NIST P-256/384) in the specification raises questions. We introduce ASOP: a sovereign and secure device onboarding protocol for IoT devices without blindly trusting the device manufacturer, supply chain, and cloud service provider. The ASOP protocol allows onboarding an IoT device to a cloud server with the help of an authenticator owned by the user. This paper outlines the preliminary development of the protocol and its high-level description. Our `zero-trust' and `human-in-the-loop' approach guarantees that the device owner does not remain at the mercy of third-party infrastructures, and it utilises recently standardized post-quantum cryptographic suite (CRYSTALS) to secure connection and messages.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
# 説明可能なコンセプトドリフトでサイバーセキュリティ攻撃を防ぐ Thwarting Cybersecurity Attacks with Explainable Concept Drift ( http://arxiv.org/abs/2403.13023v1 ) ライセンス: Link先を確認	Ibrahim Shaer, Abdallah Shami,	(参考訳) サイバーセキュリティ攻撃は、自律システムの運用に重大な脅威をもたらす。特に影響を受けているのは、スマートビルの暖房、換気、空調(HVAC)システムで、センサーが収集したデータと、キャプチャデータを使用した機械学習(ML)モデルに依存する。したがって、これらのセンサーの読み方を変える攻撃は、住民の快適性とエネルギー削減の目標に影響を与えるHVACシステムの運用に深刻な影響を与える可能性がある。このような攻撃は、MLモデルに供給されるオンラインデータ配布の変化を誘発し、トレーニングとデータ配布のテストにおける類似性の基本的な前提を侵害する可能性がある。これにより、概念ドリフト(CD)と呼ばれる現象によってモデル予測精度が低下し、入力特徴と対象変数の関係が変化する。 CDに対処するには、ターゲット緩和戦略を適用するためにドリフトの源を特定する必要がある。本稿では, ドリフト特徴を特定するための特徴ドリフト記述(FDE)モジュールを提案する。 FDEは自動エンコーダ(AE)を利用して回帰ディープラーニング(DL)モデルの第一層の活性化を再構築し、その潜在表現を見つける。ドリフトを検出すると、ドリフトデータの各特徴をトレーニングデータから代表データに置き換える。ミンコフスキー距離は、変化したドリフトデータと元のトレーニングデータとのばらつきを測定するために使用される。その結果,FDE はドリフト特性の85.77 % を同定し,CD 現象下での DL 適応法での有用性を示した。その結果、FDE法は、サイバーセキュリティ攻撃を阻止するための漂流の特徴を識別するための効果的な戦略である。 Cyber-security attacks pose a significant threat to the operation of autonomous systems. Particularly impacted are the Heating, Ventilation, and Air Conditioning (HVAC) systems in smart buildings, which depend on data gathered by sensors and Machine Learning (ML) models using the captured data. As such, attacks that alter the readings of these sensors can severely affect the HVAC system operations impacting residents' comfort and energy reduction goals. Such attacks may induce changes in the online data distribution being fed to the ML models, violating the fundamental assumption of similarity in training and testing data distribution. This leads to a degradation in model prediction accuracy due to a phenomenon known as Concept Drift (CD) - the alteration in the relationship between input features and the target variable. Addressing CD requires identifying the source of drift to apply targeted mitigation strategies, a process termed drift explanation. This paper proposes a Feature Drift Explanation (FDE) module to identify the drifting features. 
FDE utilizes an Auto-encoder (AE) that reconstructs the activation of the first layer of the regression Deep Learning (DL) model and finds their latent representations. When a drift is detected, each feature of the drifting data is replaced by its representative counterpart from the training data. The Minkowski distance is then used to measure the divergence between the altered drifting data and the original training data. The results show that FDE successfully identifies 85.77 % of drifting features and showcases its utility in the DL adaptation method under the CD phenomenon. As a result, the FDE method is an effective strategy for identifying drifting features towards thwarting cyber-security attacks.	翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
# 1枚の画像からタスクを発見・幻覚化させる計画(動画) See, Imagine, Plan: Discovering and Hallucinating Tasks from a Single Image ( http://arxiv.org/abs/2403.13438v1 ) ライセンス: Link先を確認	Chenyang Ma, Kai Lu, Ta-Ying Cheng, Niki Trigoni, Andrew Markham,	(参考訳) 人間は、現在の世界で世界を認識し、理解するだけでなく、すぐに知覚できる以上の将来のシナリオを思い描くことができる。この深い人間の能力に似て、ゼロショットのタスク幻覚を導入します -- 未知の環境やオブジェクトを含むシーンの1つのRGBイメージを考えると、私たちのモデルは潜在的なタスクを特定し、ビデオとして実現された鮮やかな物語の中でそれらの実行を想像できます。動的相互作用のためのVLMと物体軌道のための3次元モーションプランニングを組み込んだ,シーンの分解,理解,再構築を段階的に向上するモジュールパイプラインを開発した。我々のモデルは、機械と人間の両方が理解できる現実的で魅力的な視覚結果を示すタスクビデオによって、多様なタスクを発見できる。 Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/ Humans can not only recognize and understand the world in its current state but also envision future scenarios that extend beyond immediate perception. To resemble this profound human capacity, we introduce zero-shot task hallucination -- given a single RGB image of any scene comprising unknown environments and objects, our model can identify potential tasks and imagine their execution in a vivid narrative, realized as a video. We develop a modular pipeline that progressively enhances scene decomposition, comprehension, and reconstruction, incorporating VLM for dynamic interaction and 3D motion planning for object trajectories. Our model can discover diverse tasks, with the generated task videos demonstrating realistic and compelling visual outcomes that are understandable by both machines and humans. Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/	翻訳日:2024-03-21 17:28:32 公開日:2024-03-18
# 説明可能な自然言語処理のための局所的解釈:サーベイ Local Interpretations for Explainable Natural Language Processing: A Survey ( http://arxiv.org/abs/2103.11072v3 ) ライセンス: Link先を確認	Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon,	(参考訳) 過去10年間で深層学習技術が様々な分野に普及するにつれて、ブラックボックスモデルの不透明性に対する不満が高まり、ディープラーニングモデルの透明性に焦点が当てられるようになった。本研究では,機械翻訳や感情分析など,自然言語処理(NLP)タスクにおけるディープニューラルネットワークの解釈可能性を改善するための様々な手法について検討する。本研究のはじめに,解釈可能性という用語の定義とその諸側面について,包括的に議論する。本調査で収集・要約された手法は,局所的な解釈にのみ関連しており,具体的には3つのカテゴリに分けられる。 1) 関連する入力特徴を通してモデルの予測を解釈すること。 2) 自然言語の説明による解釈 3)モデルと単語表現の隠された状態の探索。 As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are only associated with local interpretation and are specifically divided into three categories: 1) interpreting the model's predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.	翻訳日:2024-03-21 04:02:20 公開日:2024-03-18
# DCVNet:高速光フローのための拡張コストボリュームネットワーク DCVNet: Dilated Cost Volume Networks for Fast Optical Flow ( http://arxiv.org/abs/2103.17271v2 ) ライセンス: Link先を確認	Huaizu Jiang, Erik Learned-Miller,	(参考訳) コストボリュームは、2つの入力画像間での対応の類似性を捉え、最先端の光学的流れのアプローチにおいて重要な要素である。コストボリュームを構築するために通信をサンプリングする場合、大きな近傍半径が大きな変位に対処するために必要であり、かなりの計算負担が伴う。この問題に対処するため、コストボリュームの粗大な処理または再帰処理が通常採用され、小さな半径の局所的な近傍での対応サンプリングが十分である。本稿では,小型・大規模の変位を同時に捉えるために,異なる拡張係数を持つコストボリュームを構築した代替案を提案する。スキップ接続を有するU-Netを用いて、拡張コストのボリュームを、光学的フローを得るために、可能なすべての変位の間の補間重みに変換する。提案したモデルDCVNetは,単純なフィードフォワード方式で1回だけコストボリュームを処理し,シーケンシャルな処理戦略に依存しない。 DCVNetは、既存のアプローチに匹敵する精度を取得し、リアルタイム推論(ミッドエンドの1080ti GPUで30fps)を達成する。コードとモデルの重みは https://github.com/neu-vi/ezflow で公開されている。 The cost volume, capturing the similarity of possible correspondences across two input images, is a key ingredient in state-of-the-art optical flow approaches. When sampling correspondences to build the cost volume, a large neighborhood radius is required to deal with large displacements, introducing a significant computational burden. To address this, coarse-to-fine or recurrent processing of the cost volume is usually adopted, where correspondence sampling in a local neighborhood with a small radius suffices. In this paper, we propose an alternative by constructing cost volumes with different dilation factors to capture small and large displacements simultaneously. A U-Net with skip connections is employed to convert the dilated cost volumes into interpolation weights between all possible captured displacements to get the optical flow. Our proposed model DCVNet only needs to process the cost volume once in a simple feedforward manner and does not rely on the sequential processing strategy. DCVNet obtains comparable accuracy to existing approaches and achieves real-time inference (30 fps on a mid-end 1080ti GPU). The code and model weights are available at https://github.com/neu-vi/ezflow.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
# 捕捉されたイオン量子コンピュータのための簡易Mølmer-Sørensenゲート A simplified Mølmer-Sørensen gate for the trapped ion quantum computer ( http://arxiv.org/abs/2112.07855v4 ) ライセンス: Link先を確認	Hiroo Azuma,	(参考訳) トラップされたイオン量子コンピュータで使用されるMølmer-Sørensen(MS)ゲートの簡易化について論じる。元のMSゲートは、2つのイオンにバイクロマチックコヒーレント光電場を同時に照射することで実装されている。本稿では, 単色コヒーレント光電場を個別に照射することにより, 2つのイオンの分離可能な状態をベル状態に変換する方法を提案する。この点が,元のMSゲートに対する本手法の利点である。提案するゲートの実行時間の長さは,元のMSゲートの時間に匹敵するが,数値計算により,提案ゲートはフォノンの熱ゆらぎの影響を受けにくいことが示されている。絡み合いを発生できるが、熱ゆらぎに非常に弱い単純な2イオンゲートの別の例を示すことで、単純化したMSゲートが通常よりもマークされていることを示す。 We discuss how to simplify the Mølmer-Sørensen (MS) gate, which is used for the trapped ion quantum computer. The original MS gate is implemented by illuminating two ions with bichromatic coherent light fields separately at the same time. In this paper, we propose a method for transforming a separable state of two ions into one of the Bell states by illuminating the two ions with monochromatic coherent light fields individually; this is the advantage of our scheme over the original MS gate. The length of the execution time of our proposed gate is comparable to that of the original MS gate; however, numerical calculations show that our proposed gate is weakly sensitive to thermal fluctuations of the phonons. By giving another example of a simple two-ion gate that can generate entanglement but is strongly vulnerable to thermal fluctuations, we show that our simplified MS gate is more marked than usual.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
# テスト可能なHTML5 Canvas問題の分類 A Taxonomy of Testable HTML5 Canvas Issues ( http://arxiv.org/abs/2201.07351v5 ) ライセンス: Link先を確認	Finlay Macklon, Markos Viggiato, Natalia Romanova, Chris Buzon, Dale Paas, Cor-Paul Bezemer,	(参考訳) HTML5<canvas>は、Webアプリケーションで高品質なグラフィックを表示するために広く使われている。しかし、<canvas>アプリケーションを構築するのに必要なWeb、GUI、ビジュアルテクニックの組み合わせは、テストやデバッグツールの欠如とともに、そのようなアプリケーションの開発を非常に困難にしています。本稿では,<canvas>アプリケーションのテストに関する今後の研究の方向付けを支援するために,テスト可能な<canvas>問題の分類について述べる。まず,HTML5<canvas>を使用する123のオープンソースGitHubプロジェクトから,2,403件の<canvas>関連イシューレポートを抽出した。第2に,無作為に抽出した332件の報告を手作業で分類することで分類を構築した。手動分類では、視覚やパフォーマンスの問題など、テスト可能な<canvas>問題の5つの幅広いカテゴリを特定しました。視覚的な問題は最も頻繁(35%)であり、パフォーマンス上の問題は比較的稀(5%)であることがわかった。また、<canvas>上に視覚的に現れるテスト可能な<canvas>問題の多くが、実際にはWebアプリケーションの他のコンポーネントによって引き起こされていることもわかりました。テスト可能な<canvas>問題の分類は,<canvas>問題とテストの今後の研究に有効である。 The HTML5 <canvas> is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build <canvas> applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing <canvas> applications, in this paper we present a taxonomy of testable <canvas> issues. First, we extracted 2,403 <canvas>-related issue reports from 123 open-source GitHub projects that use the HTML5 <canvas>. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable <canvas> issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable <canvas> issues that present themselves visually on the <canvas> are actually caused by other components of the web application. Our taxonomy of testable <canvas> issues can be used to steer future research into <canvas> issues and testing.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
# 密度行列を用いた量子密度推定:量子異常検出への応用 Quantum density estimation with density matrices: Application to quantum anomaly detection ( http://arxiv.org/abs/2201.10006v5 ) ライセンス: Link先を確認	Diego H. Useche, Oscar A. Bustos-Brinez, Joseph A. Gallego-Mejia, Fabio A. González,	(参考訳) 密度推定は統計学と機械学習の中心的なタスクである。この問題は、観測されたデータセットに最もよく適合する基礎となる確率密度関数を決定することを目的としている。応用例としては、統計的推論、教師なし学習、異常検出などがある。その関連性にもかかわらず、量子コンピューティングの密度推定への応用を探求する研究はほとんどない。本稿では,Q-DEMDEと呼ばれる新しい量子古典密度行列密度推定モデルを提案する。量子ハードウェアを用いて、混合量子状態によるトレーニングデータの確率分布を構築する。量子コンピュータ上でのスペクトル分解から混合密度行列の期待値を推定するアルゴリズムを提案する。さらに,本手法の量子古典的異常検出への応用について述べる。我々は、量子シミュレータと実量子コンピュータ上の異なるデータセット上で、量子ランダムおよび量子適応フーリエ特徴を用いた密度推定モデルの評価を行った。この研究の重要な成果は、現在の量子コンピュータ上で高い性能で密度推定と異常検出を行うことが可能であることを示すことである。 Density estimation is a central task in statistics and machine learning. This problem aims to determine the underlying probability density function that best aligns with an observed data set. Some of its applications include statistical inference, unsupervised learning, and anomaly detection. Despite its relevance, few works have explored the application of quantum computing to density estimation. In this article, we present a novel quantum-classical density matrix density estimation model, called Q-DEMDE, based on the expected values of density matrices and a novel quantum embedding called quantum Fourier features. The method uses quantum hardware to build probability distributions of training data via mixed quantum states. As a core subroutine, we propose a new algorithm to estimate the expected value of a mixed density matrix from its spectral decomposition on a quantum computer. In addition, we present an application of the method for quantum-classical anomaly detection. We evaluated the density estimation model with quantum random and quantum adaptive Fourier features on different data sets on a quantum simulator and a real quantum computer. 
An important result of this work is to show that it is possible to perform density estimation and anomaly detection with high performance on present-day quantum computers.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
# 効率的な推論のための多段視覚変換器 Multi-Tailed Vision Transformer for Efficient Inference ( http://arxiv.org/abs/2203.01587v3 ) ライセンス: Link先を確認	Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu,	(参考訳) 近年、視覚変換器(ViT)は画像認識において有望な性能を達成し、様々な視覚タスクにおいて、徐々に強力なバックボーンとして機能している。 Transformerのシーケンシャル入力を満たすために、ViTのテールはまず各画像を一定長さの視覚トークンのシーケンスに分割する。次に、以下の自己注意層がトークン間のグローバルな関係を構築し、下流のタスクに有用な表現を生成する。実証的には、より多くのトークンで画像を表現することでパフォーマンスが向上するが、トークンの数に対する自己認識層の2次計算の複雑さは、ViTの推論の効率に深刻な影響を及ぼす可能性がある。計算量削減のために、トランスフォーマーエンコーダで不定形トークンを段階的にプルーニングする手法がいくつかあるが、トランスフォーマーが触れない前にトークンの数を残している。実際、Transformerエンコーダの入力として、以下の計算コストを直接削減できるトークンが少ない。本稿では,MT-ViT(Multi-Tailed Vision Transformer)を提案する。 MT-ViTは、以下のTransformerエンコーダのために異なる長さの視覚シーケンスを生成するために複数のテールを採用する。テール予測器を導入し、どのテールが最も効率的に正確な予測を行うかを決定する。どちらのモジュールも、Gumbel-Softmaxのトリックでエンドツーエンドで最適化されている。 ImageNet-1Kの実験では、MT-ViTは精度を低下させることなくFLOPの大幅な削減を実現し、他の比較手法よりも精度とFLOPの両面で優れていた。 Recently, Vision Transformer (ViT) has achieved promising performance in image recognition and gradually serves as a powerful backbone in various vision tasks. To satisfy the sequential input of Transformer, the tail of ViT first splits each image into a sequence of visual tokens with a fixed length. Then the following self-attention layers constructs the global relationship between tokens to produce useful representation for the downstream tasks. Empirically, representing the image with more tokens leads to better performance, yet the quadratic computational complexity of self-attention layer to the number of tokens could seriously influence the efficiency of ViT's inference. For computational reduction, a few pruning methods progressively prune uninformative tokens in the Transformer encoder, while leaving the number of tokens before the Transformer untouched. In fact, fewer tokens as the input for the Transformer encoder can directly reduce the following computational cost. In this spirit, we propose a Multi-Tailed Vision Transformer (MT-ViT) in the paper. 
MT-ViT adopts multiple tails to produce visual sequences of different lengths for the following Transformer encoder. A tail predictor is introduced to decide which tail is the most efficient for a given image while still producing an accurate prediction. Both modules are optimized in an end-to-end fashion with the Gumbel-Softmax trick. Experiments on ImageNet-1K demonstrate that MT-ViT achieves a significant reduction in FLOPs with no degradation in accuracy and outperforms the other compared methods in both accuracy and FLOPs.	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
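The Gumbel-Softmax trick mentioned in the MT-ViT abstract can be illustrated with a minimal, framework-free sketch. It only shows how a relaxed one-hot choice over candidate tails could be sampled. The function names, the example logits, and the temperature are assumptions made for illustration and are not taken from the paper; real end-to-end training would additionally need an autodiff framework.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample a relaxed one-hot vector over the candidate tails.

    logits: unnormalized scores, one per tail (hypothetical predictor output).
    tau: temperature; smaller values push the sample toward one-hot.
    """
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform; clamp u away from 0.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed scores.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def pick_tail(soft_sample):
    """Hard choice (e.g. at inference): index of the largest entry."""
    return max(range(len(soft_sample)), key=soft_sample.__getitem__)
```

Lowering `tau` makes the sampled vector closer to a hard selection, which is the usual trade-off between gradient quality and discreteness when this relaxation is used.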
# インタラクション・レプリカ:人間と物体の相互作用の追跡と人間の動きからのシーン変化 Interaction Replica: Tracking Human-Object Interaction and Scene Changes From Human Motion ( http://arxiv.org/abs/2205.02830v4 ) ライセンス: Link先を確認	Vladimir Guzov, Julian Chibane, Riccardo Marin, Yannan He, Yunus Saracoglu, Torsten Sattler, Gerard Pons-Moll,	(参考訳) 私たちの世界は静的ではなく、人間は自然に環境の変化を引き起こします。人間によって引き起こされる変化をモデル化することは、デジタル双生児、例えば、共有物理仮想空間(メタバース)とロボット工学の文脈で構築するために不可欠である。このような新興アプリケーションを広く採用するためには、対話を捉えるためのセンサーのセットアップは、エキスパートでないユーザにとって安価で使いやすいものにする必要がある。すなわち、対話は、外部カメラやオブジェクトトラッカーに頼らず、カメラとIMUセンサーの組み合わせのような単純なエゴ中心のセンサーによってキャプチャされ、モデル化されるべきである。しかし、私たちの知る限りでは、このようなエゴ中心のセンサー設定を通じて人間とシーンのインタラクションをモデル化する難しい問題に対処する作業は存在しない。本稿は、シーンにおける人間の視覚的位置決めと、IMUデータからの人間とシーンの相互作用に関する接触に基づく推論を組み合わせることによって、文学におけるこのギャップを埋める。興味深いことに、インタラクションの視覚的な観察がなくても、人間とシーンの接触や相互作用が人間のポーズシーケンスから現実的に予測できることが示される。我々の手法であるiReplica(Interaction Replica)は,没入型仮想空間における将来のAR/VR応用や,人間のように振る舞うためのトレーニングマシンに必要な,人間との相互作用の自我中心的なキャプチャと動的シーンのモデリングに向けた重要な第一歩である。私たちのコード、データ、モデルはプロジェクトのページ(http://virtual humans.mpi-inf.mpg.de/ireplica/)で公開されています。 Our world is not static and humans naturally cause changes in their environments through interactions, e.g., opening doors or moving furniture. Modeling changes caused by humans is essential for building digital twins, e.g., in the context of shared physical-virtual spaces (metaverses) and robotics. In order for widespread adoption of such emerging applications, the sensor setup used to capture the interactions needs to be inexpensive and easy-to-use for non-expert users. I.e., interactions should be captured and modeled by simple ego-centric sensors such as a combination of cameras and IMU sensors, not relying on any external cameras or object trackers. Yet, to the best of our knowledge, no work tackling the challenging problem of modeling human-scene interactions via such an ego-centric sensor setup exists. 
This paper closes this gap in the literature by developing a novel approach that combines visual localization of humans in the scene with contact-based reasoning about human-scene interactions from IMU data. Interestingly, we can show that even without visual observations of the interactions, human-scene contacts and interactions can be realistically predicted from human pose sequences. Our method, iReplica (Interaction Replica), is an essential first step towards the egocentric capture of human interactions and modeling of dynamic scenes, which is required for future AR/VR applications in immersive virtual universes and for training machines to behave like humans. Our code, data and model are available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/	翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
# ランダムな直交分解と深層学習によるデジタルツインデータモデリング Digital Twin Data Modelling by Randomized Orthogonal Decomposition and Deep Learning ( http://arxiv.org/abs/2206.08659v2 ) ライセンス: Link先を確認	Diana Alina Bistrian, Omer San, Ionel Michael Navon,	(参考訳) デジタルツインは、元のプロセスの振る舞いを反映する主な特徴を持つ代理モデルである。複雑性を低減したデジタルツインモデルと動的処理を関連付けることは、CPU時間とハードウェアのコストを削減した精度で動的処理をタイムスケールにマッピングする上で、大きな利点となる。本稿では,流体の効率的なデジタルツインモデルを作成するための新しい枠組みを提案する。我々は、Krylovに基づく動的モード分解と適切な直交分解を組み合わせ、最も影響力のあるモードの選択を上回る新しいアルゴリズムを提案する。我々は,SVD経験的直交分解法に対してランダム化された直交分解アルゴリズムがいくつかの利点を与え,多目的最適化問題の射影誤差を軽減できることを証明した。我々は,デジタルツインモデルのリアルタイム適応キャリブレーションを行うために,最先端の人工知能ディープラーニング(DL)を巻き込み,忠実度を増大させる。出力は流体力学の高忠実なデジタルツインデータモデルであり、複雑さの低減の利点がある。複雑化を伴う3つの波動現象の数値シミュレーションにおいて,新しいモデリングツールについて検討した。本研究は,時間シミュレーション応答特性研究を含む数値的精度と計算効率の観点から,新たなデジタルツインデータモデルの性能を徹底的に評価する。 A digital twin is a surrogate model whose main feature is to mirror the behavior of the original process. Associating the dynamical process with a digital twin model of reduced complexity has the significant advantage of mapping the dynamics with high accuracy and reduced CPU-time and hardware costs over timescales on which the process changes significantly and is therefore difficult to explore. This paper introduces a new framework for creating efficient digital twin models of fluid flows. We introduce a novel algorithm that combines the advantages of Krylov based dynamic mode decomposition with proper orthogonal decomposition and outperforms the selection of the most influential modes. We prove that the randomized orthogonal decomposition algorithm provides several advantages over SVD empirical orthogonal decomposition methods and mitigates the projection error by formulating a multiobjective optimization problem. We employ state-of-the-art Deep Learning (DL) to perform a real-time adaptive calibration of the digital twin model, with increasing fidelity. 
The output is a high-fidelity digital twin data model of the fluid flow dynamics, with the advantage of reduced complexity. The new modelling tools are investigated in the numerical simulation of three wave phenomena of increasing complexity. We show that the outputs are consistent with the original source data. We perform a thorough assessment of the performance of the new digital twin data models in terms of numerical accuracy and computational efficiency, including a time simulation response feature study.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# 多層ケークのフェアディビジョン Fair Division of Multi-layered Cakes ( http://arxiv.org/abs/2208.00726v2 ) ライセンス: Link先を確認	Mohammad Azharuddin Sanpui,	(参考訳) 複数層ケーキの切断について検討し, 連続性と実現可能性という2つの制約の下で, エージェント群に多数の分割可能な資源(ケーキ層)を適切に割り当てる方法について検討した。まず,'a pair of knives' と呼ばれる多層ケーキに新しい計算モデルを導入する。そして,新しい計算モデルを用いて,2つのエージェントと2つのレイヤに対して,正確なマルチアロケーションが存在することを示す。本研究では,3個以上のエージェントに対して,3層ケーキ上に有意かつ連続的な比例多重配置の計算手順を実証する。最後に,任意の数$n\geq 2^a3$のエージェントと2^a3$のレイヤに対して,$a$が任意の正の整数であるような比例割当を計算する手法を開発した。 We consider multi-layered cake cutting in order to fairly allocate numerous divisible resources (layers of cake) among a group of agents under two constraints: contiguity and feasibility. We first introduce a new computational model in a multi-layered cake named ``a pair of knives''. Then, we show the existence of an exact multi-allocation for two agents and two layers using the new computational model. We demonstrate the computation procedure of a feasible and contiguous proportional multi-allocation over a three-layered cake for more than three agents. Finally, we develop a technique for computing proportional allocations for any number $n\geq 2^a3$ of agents and $2^a3$ layers, where $a$ is any positive integer.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# マルチビヘイビア勧告における公正のための因果介入 Causal Intervention for Fairness in Multi-behavior Recommendation ( http://arxiv.org/abs/2209.04589v2 ) ライセンス: Link先を確認	Xi Wang, Wenjie Wang, Wenge Rong, Fuli Feng, Chuantao Yin, Zhang Xiong,	(参考訳) レコメンダシステムは、クリックやクリック後の動作(例えば、いいね! しかし、これらの行動は必然的に人気バイアスを示し、不公平な問題を引き起こします。 1) 類似品質の品目については、より人気の高い品目が露出しやすくなり、 2) 人気度が低い人気商品の方が露出が大きくなる可能性がある。人気バイアスを緩和する既存の作業は、偏見を盲目的に排除し、アイテムの品質の影響を無視する。異なるユーザ行動(例えば変換率)の関係は、実際にはアイテムの品質を反映している、と我々は主張する。そこで本稿では,不公平な問題に対処するため,複数のユーザの行動を考慮した人気バイアスを軽減することを提案する。本研究では,多行動レコメンデーションにおけるインタラクション生成手法の背景にある因果関係について検討する。特に、私たちはこう発見しています。 1)アイテムの人気は、露出したアイテムとユーザーのクリック後のインタラクションの共創者であり、最初の不公平につながる。 2) 隠れた共同設立者(例えば、商品生産者の評判)は、商品の人気と品質の両方に影響を与え、2番目の不公平をもたらす。これらの問題点を解消するため,共同設立者によるバックドア経路の抑制にバックドア調整を利用する因果効果を推定する因果枠組みを提案する。推論段階では、人気のネガティブな効果を排除し、品質のよい効果を推薦に活用する。 2つの実世界のデータセット実験により,提案手法の有効性が検証された。 Recommender systems usually learn user interests from various user behaviors, including clicks and post-click behaviors (e.g., like and favorite). However, these behaviors inevitably exhibit popularity bias, leading to some unfairness issues: 1) for items with similar quality, more popular ones get more exposure; and 2) even worse the popular items with lower popularity might receive more exposure. Existing work on mitigating popularity bias blindly eliminates the bias and usually ignores the effect of item quality. We argue that the relationships between different user behaviors (e.g., conversion rate) actually reflect the item quality. Therefore, to handle the unfairness issues, we propose to mitigate the popularity bias by considering multiple user behaviors. In this work, we examine causal relationships behind the interaction generation procedure in multi-behavior recommendation. 
Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, leading to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness. To alleviate these confounding issues, we propose a causal framework to estimate the causal effect, which leverages backdoor adjustment to block the backdoor paths caused by the confounders. In the inference stage, we remove the negative effect of popularity and utilize the good effect of quality for recommendation. Experiments on two real-world datasets validate the effectiveness of our proposed framework, which enhances fairness without sacrificing recommendation accuracy.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# ニューラルネットワークのグラフニューラルモデリング Graph Neural Modeling of Network Flows ( http://arxiv.org/abs/2209.05208v3 ) ライセンス: Link先を確認	Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi,	(参考訳) 基盤となるインフラが効果的に利用されるようにトラフィックを分散するネットワークフロー問題は、輸送や物流においてユビキタスである。その中でも, 汎用マルチコモディティ・ネットワーク・フロー(MCNF)問題は, リンクの有効利用を実現しつつ, 複数のソースとシンク間の異なるサイズの複数のフローの分散を懸念している。データ駆動最適化の魅力により、これらの問題はグラフ学習法によってますますアプローチされてきた。本稿では,Per-Edge Weights (PEW) と呼ばれるネットワークフロー問題に対する新しいグラフ学習アーキテクチャを提案する。この方法はグラフアテンションネットワーク上に構築され、各リンクに沿って明確にパラメトリケートされたメッセージ関数を使用する。提案手法を,サービスプロバイダの17ドルのトポロジと2ドルのルーティングスキームを用いて,インターネットフロールーティングケーススタディを通じて広く評価した。本稿では,グローバルメッセージ関数がルーティングを不必要に制約するアーキテクチャに対して,PEWが実質的な利得が得られることを示す。 MLPが他の標準アーキテクチャと競合していることもわかっています。さらに,データ駆動型フロールーティングにおけるグラフ構造と予測性能の関係を解析する。 Network flow problems, which involve distributing traffic such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Among them, the general Multi-Commodity Network Flow (MCNF) problem concerns the distribution of multiple flows of different sizes between several sources and sinks, while achieving effective utilization of the links. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. In this paper, we propose a novel graph learning architecture for network flow problems called Per-Edge Weights (PEW). This method builds on a Graph Attention Network and uses distinctly parametrized message functions along each link. We extensively evaluate the proposed solution through an Internet flow routing case study using $17$ Service Provider topologies and $2$ routing schemes. We show that PEW yields substantial gains over architectures whose global message function constrains the routing unnecessarily. We also find that an MLP is competitive with other standard architectures. 
Furthermore, we analyze the relationship between graph structure and predictive performance for data-driven routing of flows, an aspect that has not been considered by existing work in the area.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# 機械学習を用いた予測不確実性推定の検討 A review of predictive uncertainty estimation with machine learning ( http://arxiv.org/abs/2209.08307v2 ) ライセンス: Link先を確認	Hristos Tyralis, Georgia Papacharalampous,	(参考訳) 機械学習モデルの予測と予測は、エンドユーザーに伝達される情報の量を増やすことを目的として、確率分布の形式をとるべきである。学術・産業における確率的予測と機械学習モデルによる予測の応用は、ますます頻繁になってきているが、関連する概念や手法は、全分野の全体観の下で形式化され、構造化されていない。本稿では,機械学習アルゴリズムによる予測不確実性推定の話題と,確率的予測を評価するための関連する指標(一貫性スコアリング関数と適切なスコアリングルール)について概説する。このレビューでは、最近の機械学習アルゴリズム(位置、スケール、形状、ランダムな森林、強化とディープラーニングアルゴリズムの一般化された付加的モデルを含む)への早期統計(ベイズ統計または量子回帰に基づく線形回帰と時系列モデル)の導入から、自然により柔軟である期間をカバーしている。最新の進歩は、より複雑なアルゴリズムに適用された基本的な概念に基づいているため、ユーザのニーズに合わせて新しいアルゴリズムを開発する方法についての理解を深める。材料を分類し、研究のホットトピックとなっている課題について議論することで、結論付けます。 Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of the progress in the field, expedites our understanding on how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. 
We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# ヒューマンAI意思決定における説明・公正・適切な信頼 Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making ( http://arxiv.org/abs/2209.11812v5 ) ライセンス: Link先を確認	Jakob Schoeffer, Maria De-Arteaga, Niklas Kuehl,	(参考訳) 本研究では,特徴に基づく説明がAIによる意思決定の分配的公正性に及ぼす影響について検討する。また、人間の公正感とAIレコメンデーションへの依存によって、どのような効果が媒介されるかについても検討する。以上の結果から,説明が公正感に影響を及ぼし,人間によるAI推奨に固執する傾向が示唆された。しかし、このような説明は、人間が正しいAIレコメンデーションと不正なAIレコメンデーションを識別することができない。代わりに、AIレコメンデーションの正確性に関わらず、それらが依存に影響を与える可能性があることを示す。説明がタスク非関連で、センシティブな属性と明らかに関連している特徴を強調している場合、これは、性別のステレオタイプに合わせてAIの推奨に反するオーバーライドを引き起こす。一方、説明がタスク関連性を示す場合、これはステレオタイプ整列エラーを強化する信頼行動を引き起こす。これらの結果は、特徴に基づく説明は分配的公正性を改善するための信頼性のあるメカニズムではないことを示唆している。 In this work, we study the effects of feature-based explanations on distributive fairness of AI-assisted decisions, specifically focusing on the task of predicting occupations from short textual bios. We also investigate how any effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, we see that such explanations do not enable humans to discern correct and incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations that align with gender stereotypes. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism to improve distributive fairness.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# 地域的・地域的カウンターファクトルール:要約とロバストな論説 Local and Regional Counterfactual Rules: Summarized and Robust Recourses ( http://arxiv.org/abs/2209.14568v3 ) ライセンス: Link先を確認	Salim I. Amoukou, Nicolas J. B Brunel,	(参考訳) Counterfactual Explanations (CE)は、安定性の確保、複数のCEの合成、信頼性とスパーシリティの保証など、いくつかの未解決課題に直面している。より実践的な視点から見ると、最近の研究[Pawelczyk et al , 2022] は、規定された反ファクト・リコースが個人によって正しく実施されていないことを示し、ほとんどの最先端のCEアルゴリズムが、このノイズの多い環境で失敗する可能性が非常に高いことを示した。これらの問題に対処するため,各観測値に対して局所的反実律を緩やかに付与し,高い確率で決定を変更できる範囲の値を与える確率的枠組みを提案する。これらの規則は、様々な反事実的説明の要約として機能し、堅牢な論説をもたらす。さらに、これらの局所ルールを地域反事実ルールに集約し、データのサブグループに対する共有リコースを識別する。我々の地域・地域ルールはRandom Forestアルゴリズムから導かれており、高密度領域のレコースを選択することにより、データ分布に対する統計的保証と忠実度を提供する。さらに、我々のルールは、まず、決定を変更できる確率の高い最小の変数群を選択するため、疎い。我々は, 標準CEと最近の同様の試みと比較して, 対実ルールの有効性を検証する実験を行った。当社のメソッドはPythonパッケージとして利用可能です。 Counterfactual Explanations (CE) face several unresolved challenges, such as ensuring stability, synthesizing multiple CEs, and providing plausibility and sparsity guarantees. From a more practical point of view, recent studies [Pawelczyk et al., 2022] show that the prescribed counterfactual recourses are often not implemented exactly by individuals and demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation, providing rules that give a range of values capable of changing decisions with high probability. These rules serve as a summary of diverse counterfactual explanations and yield robust recourses. We further aggregate these local rules into a regional counterfactual rule, identifying shared recourses for subgroups of the data. Our local and regional rules are derived from the Random Forest algorithm, which offers statistical guarantees and fidelity to data distribution by selecting recourses in high-density regions. 
Moreover, our rules are sparse as we first select the smallest set of variables having a high probability of changing the decision. We have conducted experiments to validate the effectiveness of our counterfactual rules in comparison to standard CE and recent similar attempts. Our methods are available as a Python package.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
# ゼロ階ハードThresholding: Gradient Error vs. Expansivity Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity ( http://arxiv.org/abs/2210.05279v2 ) ライセンス: Link先を確認	William de Vazelhes, Hualin Zhang, Huimin Wu, Xiao-Tong Yuan, Bin Gu,	(参考訳) $\ell_0$制約付き最適化は、特に高次元問題において機械学習において一般的である。厳密な勾配勾配降下はこの問題を解決する主要な手法である。しかし、目的関数の1階勾配は、ゼロ階勾配(ZO)が良い代理となるような実世界の多くの問題において計算するために、利用できないか高価なかのいずれかである。残念なことに、ZO勾配がハードThresholding演算子と機能するかどうかはまだ未解決の問題である。そこで,本稿では,制約付きブラックボックス確率最適化問題に焦点をあて,新しいランダムサポートサンプリングを用いた一般ZO勾配推定器を用いた確率的ゼロ階勾配ハードスレッディング(SZOHT)アルゴリズムを提案する。標準仮定の下でSZOHTの収束解析を行う。重要なことは、ZO推定器の偏差とハードThresholding演算子の膨張率との矛盾を明らかにし、ZO勾配におけるランダムな方向の数の理論的最小値を提供する。さらに,SZOHTの問合せ複雑性は,異なる条件下での次元性に依存しないか,あるいは弱く依存していることがわかった。最後に,ポートフォリオ最適化問題およびブラックボックス攻撃に対する本手法の有用性について述べる。 $\ell_0$ constrained optimization is prevalent in machine learning, particularly for high-dimensional problems, because it is a fundamental approach to achieve sparse learning. Hard-thresholding gradient descent is a dominant technique to solve this problem. However, first-order gradients of the objective function may be either unavailable or expensive to calculate in a lot of real-world problems, where zeroth-order (ZO) gradients could be a good surrogate. Unfortunately, whether ZO gradients can work with the hard-thresholding operator is still an unsolved problem. To solve this puzzle, in this paper, we focus on the $\ell_0$ constrained black-box stochastic optimization problems, and propose a new stochastic zeroth-order gradient hard-thresholding (SZOHT) algorithm with a general ZO gradient estimator powered by a novel random support sampling. We provide the convergence analysis of SZOHT under standard assumptions. Importantly, we reveal a conflict between the deviation of ZO estimators and the expansivity of the hard-thresholding operator, and provide a theoretical minimal value of the number of random directions in ZO gradients. 
In addition, we find that the query complexity of SZOHT is independent of, or only weakly dependent on, the dimensionality under different settings. Finally, we illustrate the utility of our method on a portfolio optimization problem as well as black-box adversarial attacks.	翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
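The two ingredients named in the SZOHT abstract, a hard-thresholding operator and a zeroth-order gradient estimate built from random directions with random support, can be sketched as follows. This is a rough illustration under simplifying assumptions, not the paper's estimator: the scaling constants are omitted, and `zo_gradient`, `szoht_step`, and all default parameter values are hypothetical names and choices.

```python
import random


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def zo_gradient(f, x, num_dirs=20, support=2, mu=1e-4, rng=random):
    """Finite-difference gradient estimate that uses function values only.

    Each random unit direction is nonzero on a random subset of `support`
    coordinates. Scaling constants are omitted (an assumption of this sketch),
    so the result should be read as an approximate descent direction.
    """
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        idx = rng.sample(range(d), support)
        u = [0.0] * d
        for i in idx:
            u[i] = rng.gauss(0.0, 1.0)
        norm = sum(v * v for v in u) ** 0.5 or 1.0
        u = [v / norm for v in u]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / mu  # directional derivative estimate
        for i in idx:
            grad[i] += slope * u[i] / num_dirs
    return grad


def szoht_step(f, x, k, lr=0.1, **zo_kwargs):
    """One iteration: zeroth-order gradient step, then projection to k-sparse."""
    g = zo_gradient(f, x, **zo_kwargs)
    return hard_threshold([xi - lr * gi for xi, gi in zip(x, g)], k)
```

Every iterate returned by `szoht_step` has at most `k` nonzero entries, which is how the $\ell_0$ constraint is enforced; the number of random directions `num_dirs` is the quantity for which the abstract reports a theoretical minimum.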
# Copula Conformal Prediction for Multi-step Time Series Forecasting Copula Conformal Prediction for Multi-step Time Series Forecasting ( http://arxiv.org/abs/2212.03281v4 ) ライセンス: Link先を確認	Sophia Sun, Rose Yu,	(参考訳) 正確な不確実性測定は、堅牢で信頼性の高い機械学習システムを構築するための重要なステップである。コンフォーマル予測(Conformal prediction)は、その実装の容易さ、統計的カバレッジ保証、および基礎となる予測器の汎用性で人気のある、分布のない不確実性定量化アルゴリズムである。しかし、時系列に対する既存の共形予測アルゴリズムは、時間的依存を考慮せずに単段階予測に制限される。本稿では,多変量・多段階時系列予測のためのCopula Conformal Predictionアルゴリズム,CopulaCPTSを提案する。我々はCopulaCPTSが有限標本妥当性を保証することを証明した。いくつかの合成および実世界の多変量時系列データセットにおいて、CopulaCPTSは既存の手法よりも多段階予測タスクに対してより校正され、鋭い信頼区間を生成することを示す。 Accurate uncertainty measurement is a key step to building robust and reliable machine learning systems. Conformal prediction is a distribution-free uncertainty quantification algorithm popular for its ease of implementation, statistical coverage guarantees, and versatility for underlying forecasters. However, existing conformal prediction algorithms for time series are limited to single-step prediction without considering the temporal dependency. In this paper, we propose a Copula Conformal Prediction algorithm for multivariate, multi-step Time Series forecasting, CopulaCPTS. We prove that CopulaCPTS has finite sample validity guarantee. On several synthetic and real-world multivariate time series datasets, we show that CopulaCPTS produces more calibrated and sharp confidence intervals for multi-step prediction tasks than existing techniques.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-18
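For context on the finite-sample coverage guarantee mentioned above, here is a minimal sketch of standard single-step split conformal prediction. It is the baseline idea only and does not implement CopulaCPTS or its copula-based treatment of multiple time steps; the function names and the use of absolute residuals as the nonconformity score are assumptions chosen for illustration.

```python
import math


def conformal_radius(calibration_residuals, alpha=0.1):
    """Radius q so that [prediction - q, prediction + q] covers the truth with
    probability at least 1 - alpha when the data are exchangeable."""
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for the requested level.
        return math.inf
    scores = sorted(abs(r) for r in calibration_residuals)
    return scores[rank - 1]


def interval(prediction, radius):
    """Symmetric prediction interval around a point forecast."""
    return (prediction - radius, prediction + radius)
```

The `(n + 1)` correction in the rank is what gives the finite-sample guarantee; applying this independently at each forecast step ignores temporal dependency, which is the limitation the abstract says CopulaCPTS addresses.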
# Tsallis KLダイバージェンスを用いた一般化Munchausen強化学習 Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence ( http://arxiv.org/abs/2301.11476v4 ) ライセンス: Link先を確認	Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White,	(参考訳) 強化学習における多くの方策最適化手法は、方策の急激な変化を防ぐために、直前の方策に対するカルバック・ライブラー(KL)ダイバージェンスを組み込んでいる。このアイデアは、保守的方策反復(Conservative Policy Iteration)に関する先駆的な論文で最初に提案され、TRPOやMunchausen Value Iteration (MVI)といったアルゴリズムがその近似を与えている。我々は、定義に$q$-対数を用いる一般化KLダイバージェンスであるTsallis KLダイバージェンスを調査することで、この研究の流れを継続する。このアプローチは厳密な一般化であり、$q = 1$ は標準のKLダイバージェンスに対応し、$q > 1$ は様々な新しい選択肢を提供する。我々は、Tsallis KLの下で学習される方策のタイプを特徴付け、$q > 1$が有益となりうる場合を動機付ける。 Tsallis KL正則化を組み込んだ実用的なアルゴリズムを得るために、我々はKL正則化を組み込む最も単純なアプローチの一つであるMVIを拡張する。この一般化されたMVI($q$)は、35のアタリゲームにおいて標準MVI($q = 1$)よりも大幅に改善されていることを示す。 Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence, called the Tsallis KL divergence, which uses the $q$-logarithm in its definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporating KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.	翻訳日:2024-03-21 01:51:05 公開日:2024-03-18
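The statement that $q = 1$ recovers the standard KL divergence can be checked numerically with a small sketch. Conventions for the $q$-logarithm differ between papers; the form used below is one common choice and is an assumption here, so it may not match the paper's exact definition or its indexing of $q$. The function names are hypothetical.

```python
import math


def log_q(x, q):
    """q-logarithm in one common convention: (x**(1-q) - 1) / (1-q).

    The limit q -> 1 is the natural logarithm, handled as a special case.
    """
    if q == 1:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)


def tsallis_kl(p, r, q):
    """D_q(p || r) = -sum_i p_i * log_q(r_i / p_i) for discrete distributions.

    Terms with p_i == 0 contribute nothing and are skipped.
    """
    return -sum(pi * log_q(ri / pi, q) for pi, ri in zip(p, r) if pi > 0)
```

Under this convention, `tsallis_kl(p, r, 1)` equals the usual `sum(p_i * log(p_i / r_i))`, and `q = 2` reduces to `sum(p_i**2 / r_i) - 1`, which shows how larger `q` penalizes the policy ratio polynomially rather than logarithmically.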
# 蒸留における学生・教師の逸脱について--不服従は報われるか? On student-teacher deviations in distillation: does it pay to disobey? ( http://arxiv.org/abs/2301.12923v3 ) ライセンス: Link先を確認	Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar,	(参考訳) 知識蒸留(KD)は「学生」ネットワークのテスト精度を向上させるために広く使われており、訓練された「教師」ネットワークのソフトな確率を模倣するように訓練されている。しかし、近年の研究では、教師の確率に合うように訓練されているにもかかわらず、生徒は教師の確率から大きく逸脱するだけでなく、パフォーマンスにおいて教師よりも優れていることが示されている。私たちの研究は、この一見パラドックス的な観察を和解することを目的としています。具体的には、学生と教師の偏差の正確な性質を特徴付け、それがより良い一般化とどのように共起しうるかについて議論する。まず、画像と言語データの実験を通して、これらの確率偏差が教師の信頼度を体系的に誇張する学生に対応することを確認した。次に、理論上かつ経験的に、いくつかの単純な設定で別の誇張の形式を確立する: KDは、データのトップ固有方向に沿って高速に収束する際に、勾配降下の暗黙のバイアスを誇張する。最後に、これらの2つの観測を結びつける。我々は、KDの誇張バイアスが同時に両方の結果をもたらすことを実証する。 a) 信頼度の誇張 b) 学生の一般化が向上し, 明らかなパラドックスに対する解決法が提供される。我々の分析は、KDにおける勾配降下の役割を考慮し、理論的および経験的両方の環境において誇張されたバイアス効果を示すことによって、既存の理論と実践をより近づける。 Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outperform the teacher. Our work aims to reconcile this seemingly paradoxical observation. Specifically, we characterize the precise nature of the student-teacher deviations, and argue how they can co-occur with better generalization. First, through experiments on image and language data, we identify that these probability deviations correspond to the student systematically exaggerating the confidence levels of the teacher. Next, we theoretically and empirically establish another form of exaggeration in some simple settings: KD exaggerates the implicit bias of gradient descent in converging faster along the top eigendirections of the data. 
Finally, we tie these two observations together: we demonstrate that the exaggerated bias of KD can simultaneously result in both (a) the exaggeration of confidence and (b) the improved generalization of the student, thus offering a resolution to the apparent paradox. Our analysis brings existing theory and practice closer by considering the role of gradient descent in KD and by demonstrating the exaggerated bias effect in both theoretical and empirical settings.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
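The training signal described in the distillation abstract, where a student is fit to a teacher's soft probabilities, is commonly written as a KL divergence between temperature-softened distributions. The sketch below shows that generic loss only; the temperature value and function names are assumptions, and it does not reproduce the paper's analysis of exaggerated confidence.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp((z - peak) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between the softened output distributions."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s) if ti > 0)


def confidence(logits):
    """Top-class probability, the quantity the abstract says students exaggerate."""
    return max(softmax(logits))
```

The loss is zero only when the two softened distributions coincide, so a student whose `confidence` exceeds the teacher's on the same input has, by construction, a nonzero loss on that input.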
# DCEM-PINN:固体力学の深い相補的エネルギー法 DCEM-PINNs: A deep complementary energy method for solid mechanics ( http://arxiv.org/abs/2302.01538v6 ) ライセンス: Link先を確認	Yizheng Wang, Jia Sun, Timon Rabczuk, Yinghua Liu,	(参考訳) 近年、ディープラーニングの急速な進歩は、特に固体力学の領域で偏微分方程式(PDE)を解く際に、様々な分野に大きな影響を与え、ニューラルネットワークの顕著な近似能力の恩恵を受けている。 PDEの解決において、物理情報ニューラルネットワーク(PINN)とDeep Energy Method(DEM)が注目されている。最小ポテンシャルエネルギーと相補エネルギーの原理は、固体力学における2つの重要な変分原理である。しかし、よく知られたDeep Energy Method (DEM) は最小ポテンシャルエネルギーの原理に基づいているが、最小補完エネルギーの重要な形態は欠いている。このギャップを埋めるために、最小補間エネルギーの原理に基づく深部補間エネルギー法(DCEM)を提案する。 DCEMの出力関数は、本質的に平衡方程式を満たす応力関数である。本稿では,Prandtl と Airy の応力関数を用いて数値計算を行い,典型的な機械的問題をモデル化する際,DCEM と既存の PINN と DEM のアルゴリズムを比較した。以上の結果から,DCEMはDEMよりも応力精度と効率が優れており,理論的解析や数値シミュレーションによって支持される複雑な変位境界条件に対処する上で有利であることが示唆された。我々はDCEMをDCEM-Plus(DCEM-P)に拡張し、偏微分方程式を満たす項を追加する。さらに,演算子学習と物理方程式を組み合わせることで,Deep complementary energy operator method (DCEM-O)を提案する。当初,我々は高忠実度数値結果を用いてDCEM-Oを訓練し,補完エネルギーを取り入れた。 DCEM-PとDCEM-Oは、DCEMの精度と効率をさらに高める。 In recent years, the rapid advancement of deep learning has significantly impacted various fields, particularly in solving partial differential equations (PDEs) in the realm of solid mechanics, benefiting greatly from the remarkable approximation capabilities of neural networks. In solving PDEs, Physics-Informed Neural Networks (PINNs) and the Deep Energy Method (DEM) have garnered substantial attention. The principle of minimum potential energy and complementary energy are two important variational principles in solid mechanics. However, the well-known Deep Energy Method (DEM) is based on the principle of minimum potential energy, but there lacks the important form of minimum complementary energy. To bridge this gap, we propose the deep complementary energy method (DCEM) based on the principle of minimum complementary energy. The output function of DCEM is the stress function, which inherently satisfies the equilibrium equation. 
We present numerical results using the Prandtl and Airy stress functions, and compare DCEM with existing PINNs and DEM algorithms when modeling representative mechanical problems. The results demonstrate that DCEM outperforms DEM in terms of stress accuracy and efficiency and has an advantage in dealing with complex displacement boundary conditions, which is supported by theoretical analyses and numerical simulations. We extend DCEM to DCEM-Plus (DCEM-P), adding terms that satisfy partial differential equations. Furthermore, we propose a deep complementary energy operator method (DCEM-O) by combining operator learning with physical equations. Initially, we train DCEM-O using high-fidelity numerical results and then incorporate complementary energy. DCEM-P and DCEM-O further enhance the accuracy and efficiency of DCEM.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# 同時音楽生成と分離のためのマルチソース拡散モデル Multi-Source Diffusion Models for Simultaneous Music Generation and Separation ( http://arxiv.org/abs/2302.02257v4 ) ライセンス: Link先を確認	Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà,	(参考訳) 本研究では、文脈を共有するソースの結合確率密度のスコアを学習することにより、音楽合成と音源分離の両方が可能な拡散ベース生成モデルを定義する。古典的な全推論タスク(例えば、ミックスを生成し、ソースを分離する)とともに、ソース計算の部分生成タスクも導入し、実験し、ソースのサブセットを生成する(例えば、ドラムとうまく連携するピアノトラックを再生する)。さらに,ディラック確率関数に基づく分離タスクの新しい推論手法を提案する。我々は,音楽音源分離のための標準データセットであるSlakh2100でモデルをトレーニングし,生成設定における定性的な結果を提供し,音源分離設定における競合定量的結果を示す。本手法は,生成タスクと分離タスクの両方を扱える単一モデルの最初の例である。 In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task based on Dirac likelihood functions. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the source separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# ニューラルコードモデル解釈のための因果論に向けて Toward a Theory of Causation for Interpreting Neural Code Models ( http://arxiv.org/abs/2302.03788v2 ) ライセンス: Link先を確認	David N. Palacio, Nathan Cooper, Alvaro Rodriguez, Kevin Moran, Denys Poshyvanyk,	(参考訳) コードのニューラル言語モデル、すなわちニューラルコードモデル(Neural Code Models、NCM)は、研究プロトタイプから商用開発ツールへと急速に進歩している。そのため、そのようなモデルの能力と限界を理解することが重要になっている。しかしながら、これらのモデルの能力は通常、実際のパフォーマンスの一部だけを明らかにする自動メトリクスを使用して測定される。一般的には、NCMのパフォーマンスは有望であるように思われるが、現在、そのようなモデルがどのように決定を下すかについては不明な点が多い。そこで本研究では,モデル予測を説明可能な NCM 固有のポストホック解釈法である $do_{code}$ を紹介する。 $do_{code}$は、プログラミング言語指向の説明を可能にする因果推論に基づいている。 $do_{code}$の理論的基盤は、異なるモデル特性を探索するために拡張可能であるが、モデル挙動の説明をプログラミング言語の性質に基礎づけることで、疑似相関の影響を軽減することを目的とした具体的なインスタンス化を提供する。 $do_{code}$の実用的メリットを実証するために,2つの人気のあるディープラーニングアーキテクチャと10のNCMに関するケーススタディを実行することで,我々のフレームワークが提供できる洞察について説明する。このケーススタディの結果から,対象としたNCMはコード構文の変化に敏感であることが示された。 BERTライクなモデルを除いて、我々のNCMはすべて、他のプログラミング言語の構造と比べて、より少ない交絡バイアスで、コードのブロックに関連するトークン(例えば、角括弧、丸括弧、セミコロン)を予測することを統計的に学習する。これらの知見は、NCMにおける交絡バイアスの検出と除去の促進に有用な方法としての$do_{code}$の可能性を示している。 Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces $do_{code}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $do_{code}$ is based upon causal inference to enable programming language-oriented explanations. 
While the theoretical underpinnings of $do_{code}$ are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of $do_{code}$, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias as compared to other programming language constructs. These insights demonstrate the potential of $do_{code}$ as a useful method to detect and facilitate the elimination of confounding bias in NCMs.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# 深さ・意味を考慮したマルチモーダルドメイン翻訳:LiDAR点雲から3次元パノラマカラー画像を生成する Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds ( http://arxiv.org/abs/2302.07661v4 ) ライセンス: Link先を確認	Tiago Cortinhal, Eren Erdal Aksoy,	(参考訳) 本研究は,LiDARとカメラセンサのマルチモーダル構成によるクロスドメイン画像・画像変換のための,深度とセマンティックスを考慮した新しい条件生成モデルTITAN-Nextを提案する。提案モデルでは,シーンセマンティクスを中間レベル表現として活用し,シーンセグメントのみに依存して生のLiDAR点雲をRGB-Dカメラ画像に変換する。我々は、これがこの種の最初のフレームワークであり、フェールセーフなメカニズムを提供し、ターゲット画像領域で利用可能なデータを増強するなど、自動運転車に実践的な応用があると主張している。提案モデルは,大規模かつ挑戦的なSemantic-KITTIデータセットで評価され,実験結果から,IoUの観点で元のTITAN-Netや他の強力なベースラインを23.7%の差で大きく上回っていることがわかった。 This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by solely relying on semantic scene segments. We claim that this is the first framework of its kind and it has practical applications in autonomous vehicles such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7$\%$ margin in terms of IoU.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# 多構成価結合理論への量子アルゴリズム的アプローチ:解釈可能な回路設計からの考察 A Quantum Algorithmic Approach to Multiconfigurational Valence Bond Theory: Insights from Interpretable Circuit Design ( http://arxiv.org/abs/2302.10660v2 ) ライセンス: Link先を確認	Jakob S. Kottmann, Francesco Scala,	(参考訳) 量子コンピュータ上でのフェルミオン基底状態の効率的な作成方法が求められており、近年様々な技術が開発されている。膨大な数のメソッドがあるにもかかわらず、どのメソッドがどのシステムによく機能するかは未だに不明である。本研究では,多構成価結合波動関数を最適化するために,解釈可能な回路設計と効果的な基本手法を組み合わせる。選択されたモデルシステムに基づいて、これがいかに説明可能な性能をもたらすかを示す。提案手法は, 実効ベースのサイズや, 関連回路の個々の量子資源の観点から, 関連手法よりも優れていることを示す。 Efficient ways to prepare fermionic ground states on quantum computers are in high demand and different techniques have been developed over the last years. Despite having a vast set of methods, it is still unclear which method performs well for which system. In this work, we combine interpretable circuit designs with an effective basis approach in order to optimize a multiconfigurational valence bond wavefunction. Based on selected model systems, we show how this leads to explainable performance. We demonstrate that the developed methodology outperforms related methods in terms of the size of the effective basis as well as individual quantum resources for the involved circuits.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# Magnushammer: 前提選択に対するトランスフォーマーベースのアプローチ Magnushammer: A Transformer-Based Approach to Premise Selection ( http://arxiv.org/abs/2303.04488v3 ) ライセンス: Link先を確認	Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu,	(参考訳) 本稿では,自動定理証明における重要な推論課題である前提選択に対する新しいアプローチを提案する。伝統的に、このタスクには広範なドメイン知識とエンジニアリングの努力に依存するシンボリックメソッドが適用される。対照的に、この研究は、トランスフォーマーアーキテクチャによる対照学習が、エンジニアリングオーバーヘッドを伴わずに、関連する前提のより高品質な検索を実現できることを実証している。我々の手法であるMagnushammerは、対話型定理証明において最も先進的で広く使われている自動化ツールであるSledgehammerを上回る。 PISA と miniF2F のベンチマークにおいて、Magnushammer はそれぞれ59.5%(Sledgehammerは38.3%)と34.0%(同20.9%)の成功率を達成している。Magnushammerを言語モデルに基づく自動定理証明器と組み合わせることにより、PISAベンチマークにおいて4倍少ないパラメータで、最先端の証明成功率を57.0%から71.0%にさらに改善する。さらに,(証明状態,関連する前提)ペアのテキスト表現を含む,前提選択のための新しいデータセットを開発し,オープンソース化する。私たちの知る限りでは、これは利用可能な最大の前提選択データセットであり、Isabelle証明アシスタント向けとしては最初のものである。 This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms the most advanced and widely used automation tool in interactive theorem proving called Sledgehammer. On the PISA and miniF2F benchmarks Magnushammer achieves $59.5\%$ (against $38.3\%$) and $34.0\%$ (against $20.9\%$) success rates, respectively. By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4$x fewer parameters. 
Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# EHRDiff:拡散モデルによるリアルなEHR合成の探索 EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models ( http://arxiv.org/abs/2303.05656v2 ) ライセンス: Link先を確認	Hongyi Yuan, Songchi Zhou, Sheng Yu,	(参考訳) 電子健康記録(EHR)には、精密医療システムの開発のための貴重な資源として、豊富な生物医学情報が含まれている。しかしながら、プライバシに関する懸念は、研究者のための高品質で大規模なEHRデータへのアクセスを制限し、方法論の発展を妨げている。近年の研究では、生成的モデリング技術による現実的なEHRデータの合成が試みられ、提案手法の大半は、生成的敵対的ネットワーク(GAN)とそのEHR合成のバリエーションに依存している。 GANに基づく手法はEHRデータの生成における最先端性能を実現するが、これらの手法は訓練が困難であり、モード崩壊の傾向にある。近年, 画像生成において拡散モデルにより最先端の性能が確立されているが, EHRデータ合成における有効性は未解明のままである。本研究では, EHRデータ合成における拡散モデルの可能性について検討し, 新たな手法である EHRDiff を提案する。広範な実験を通じて、EHRDiffは、合成されたEHRデータのための新しい最先端の品質を確立し、一方でプライベート情報を保護する。 Electronic health records (EHR) contain a wealth of biomedical information, serving as valuable resources for the development of precision medicine systems. However, privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers, impeding progress in methodological development. Recent research has delved into synthesizing realistic EHR data through generative modeling techniques, where a majority of proposed methods relied on generative adversarial networks (GAN) and their variants for EHR synthesis. Despite GAN-based methods attaining state-of-the-art performance in generating EHR data, these approaches are difficult to train and prone to mode collapse. Recently introduced in generative modeling, diffusion models have established cutting-edge performance in image generation, but their efficacy in EHR data synthesis remains largely unexplored. In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff. Through extensive experiments, EHRDiff establishes new state-of-the-art quality for synthetic EHR data, protecting private information in the meanwhile.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# Tag2Text:イメージタグによる視覚言語モデルの誘導 Tag2Text: Guiding Vision-Language Model via Image Tagging ( http://arxiv.org/abs/2303.05657v3 ) ライセンス: Link先を確認	Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang,	(参考訳) 本稿では,視覚言語モデルに画像タグ付けを導入して視覚言語特徴の学習を誘導する,視覚言語事前学習(VLP)フレームワークであるTag2Textについて述べる。対象タグを手動でラベル付けするか,あるいは性能に限界のあるオフザシェルフ検出器で自動的に検出する従来の手法とは対照的に,本手法では画像ペアリングテキストから解析したタグを用いて画像タグを明示的に学習し,視覚言語モデルに強力な意味的ガイダンスを提供する。このように、Tag2Textは、画像とテキストのペアに応じて、大規模なアノテーションのない画像タグを利用でき、オブジェクトを超えてより多様なタグカテゴリを提供する。結果として、Tag2Textは、完全な教師付きモデルに匹敵するゼロショットのパフォーマンスで、基礎的なイメージタグ付けモデルの能力を示す。さらに、タグ付けガイダンスを活用することで、Tag2Textは生成ベースとアライメントベースの両方のタスクにおける視覚言語モデルの性能を効果的に向上する。幅広いダウンストリームベンチマークにおいて、Tag2Textは、同様のモデルサイズとデータスケールで最先端の結果を達成し、提案したタグ付けガイダンスの有効性を実証する。コード、デモ、事前訓練されたモデルは https://github.com/xinyu1205/recognize-anything で入手できる。 This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features. In contrast to prior works which utilize object tags either manually labeled or automatically detected with an off-the-shelf detector with limited performance, our approach explicitly learns an image tagger using tags parsed from image-paired text and thus provides a strong semantic guidance to vision-language models. In this way, Tag2Text can utilize large-scale annotation-free image tags in accordance with image-text pairs, and provides more diverse tag categories beyond objects. As a result, Tag2Text demonstrates the ability of a foundational image tagging model, with superior zero-shot performance even comparable to fully supervised models. Moreover, by leveraging the tagging guidance, Tag2Text effectively enhances the performance of vision-language models on both generation-based and alignment-based tasks. 
Across a wide range of downstream benchmarks, Tag2Text achieves state-of-the-art results with similar model sizes and data scales, demonstrating the efficacy of the proposed tagging guidance. Code, demo and pre-trained models are available at https://github.com/xinyu1205/recognize-anything.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# 多レベル原子の秩序配列におけるディッケ超放射 Dicke superradiance in ordered arrays of multilevel atoms ( http://arxiv.org/abs/2304.00093v2 ) ライセンス: Link先を確認	Stuart J. Masson, Jacob P. Covey, Sebastian Will, Ana Asenjo-Garcia,	(参考訳) 反転した原子アンサンブルでは、光子を媒介とする相互作用がディッケ超放射を引き起こす。これは多体崩壊の一形態であり、エネルギーが光子バーストとして急速に放出される。元々は点状のアンサンブルで研究されていたが、粒子間距離が一定の上限以下であれば、この現象は拡張された秩序系でも持続する。ここでは, ストロンチウムやイッテルビウムなどのアルカリ土類(様)原子の秩序配列を用いて, 現実的な実験環境下でのディッケ超放射について検討する。このような原子は、その内部構造により長波長の遷移に比べて短い原子間距離でのトラップが可能であり、集団的に増強された散逸相互作用の可能性をもたらすため、光と物質の相互作用にエキサイティングな新しい機会を与える。その複雑な電子構造にもかかわらず、これらの原子種の2次元配列は達成可能な格子定数に対して多体超放射を示すはずであることを示す。さらに、超放射は遷移を実効的に「閉じ」、多準位原子はより2準位的になる。これは、雪崩的な崩壊がほとんどの光子の放出を支配的な遷移へと導き、微細構造とゼーマン分岐によって決まる単一原子の崩壊比を上回るためである。我々の研究は、アルカリ土類原子を量子光学光源として、また多体散逸ダイナミクスを探求するためのプラットフォームとして活用するための重要なステップである。 In inverted atomic ensembles, photon-mediated interactions give rise to Dicke superradiance, a form of many-body decay that results in a rapid release of energy as a photon burst. While originally studied in pointlike ensembles, this phenomenon persists in extended ordered systems if the inter-particle distance is below a certain bound. Here, we investigate Dicke superradiance in a realistic experimental setting using ordered arrays of alkaline-earth(-like) atoms, such as strontium and ytterbium. Such atoms offer exciting new opportunities for light-matter interactions as their internal structure allows for trapping at short interatomic distances compared to their long-wavelength transitions, providing the potential for collectively enhanced dissipative interactions. Despite their intricate electronic structure, we show that two-dimensional arrays of these atomic species should exhibit many-body superradiance for achievable lattice constants. Moreover, superradiance effectively "closes" transitions, such that multilevel atoms become more two-level like. 
This occurs because the avalanchelike decay funnels the emission of most photons into the dominant transition, overcoming the single-atom decay ratios dictated by their fine structure and Zeeman branching. Our work represents an important step in harnessing alkaline-earth atoms as quantum optical sources and as platforms to explore many-body dissipative dynamics.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# DeforestVis: サロゲート決定スタンプを用いた機械学習モデルの動作解析 DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps ( http://arxiv.org/abs/2304.00133v4 ) ライセンス: Link先を確認	Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren,	(参考訳) 機械学習(ML)モデルの複雑さが増大し、異なる(そして重要な)ドメインでの応用が増加するにつれて、より解釈可能で信頼性の高いMLに対する強い需要がある。そのようなモデルを直接、モデルに依存しない、解釈する方法は、ルールセットや決定ツリーのような代理モデルを訓練することである。しかし、ルールセットは非常に長くなり、多くのif-else文があり、複雑なMLモデルを正確にエミュレートすると決定木深さが急速に増加する。このような場合、どちらのアプローチも、モデル解釈可能性を持つユーザを目標とする中核的な目標達成に失敗する可能性がある。これを解決するために,Adaptive Boosting (AdaBoost) 技術で生成された一段決定切り株(一段決定木)を提供することにより,複雑なMLモデルの振る舞いを要約する視覚解析ツールであるDeforestVisを提案する。 DeforestVisは、より多くの切り株をインクリメンタルに生成し、決定を正当化するために重み付けされた切り株を使った属性ベースの説明を作成し、1つ以上の切り株間のトレーニングインスタンス割り当てに対するルールオーバーライドの影響を分析することで、複雑さとフィデリティのトレードオフを探索するのに役立つ。独立したテストセットでは、手動のルール変更の有効性を監視し、ケースバイケース分析に基づいて仮説を形成することができる。 DeforestVisの適用性と有用性について,2つのユースケースと,データアナリストとモデル開発者とのエキスパートインタビューで紹介する。 As the complexity of machine learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model-agnostic, way to interpret such models is to train surrogate models-such as rule sets and decision trees-that sufficiently approximate the original ones while being simpler and easier-to-explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal-providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that offers summarization of the behaviour of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. 
DeforestVis helps users to explore the complexity versus fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analysing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.	翻訳日:2024-03-21 01:40:47 公開日:2024-03-18
# 神経集団動態と幾何学の解釈可能な統計的表現 Interpretable statistical representations of neural population dynamics and geometry ( http://arxiv.org/abs/2304.03376v3 ) ライセンス: Link先を確認	Adam Gosztolai, Robert L. Peach, Alexis Arnaudon, Mauricio Barahona, Pierre Vandergheynst,	(参考訳) 多くの行動課題におけるニューロンの集団のダイナミクスは、低次元多様体上で進化する。しかし、行動情報に明示的に依存することなく、解釈可能で、かつ個体や条件をまたいで一貫してデコード可能な潜在表現を神経記録から発見することは依然として困難である。本稿では,局所的動的特徴の統計分布に基づく非線形力学のデータ駆動表現のための,完全に教師なしの幾何学的深層学習フレームワークMARBLEを紹介する。非線形力学系およびリカレントニューラルネットワークによるin silicoの例と、霊長類やげっ歯類からのin vivo記録の両方を用いて、MARBLEは、決定閾値、運動学、内部状態などの大域的システム変数の観点から高い解釈性を持つ潜在表現を推測できることを示した。また、MARBLE表現はニューラルネットワークや動物間で一貫性があることを示し、認知計算の比較や普遍デコーダの訓練に使用することができる。広範なベンチマークによって、教師なしのMARBLEは、現在の教師付きアプローチに匹敵する、あるいははるかに優れた、最高水準の動物内および動物間デコーディング精度を、行動ラベルを必要とせずに提供することを示す。この結果から,神経ダイナミクスの時間的情報とともに多様体構造を用いることで,より優れた復号アルゴリズムを開発し,実験間でデータを同化するための共通の枠組みが得られることが示唆された。 The dynamics of neuron populations during many behavioural tasks evolve on low-dimensional manifolds. However, it remains challenging to discover latent representations from neural recordings that are interpretable and consistently decodable across individuals and conditions without explicitly relying on behavioural information. Here, we introduce MARBLE, a fully unsupervised geometric deep learning framework for the data-driven representation of non-linear dynamics based on statistical distributions of local dynamical features. Using both in silico examples from non-linear dynamical systems and recurrent neural networks and in vivo recordings from primates and rodents, we demonstrate that MARBLE can infer latent representations that are highly interpretable in terms of global system variables such as decision-thresholds, kinematics or internal states. We also show that MARBLE representations are consistent across neural networks and animals so that they can be used to compare cognitive computations or train universal decoders. 
Through extensive benchmarking, we show that unsupervised MARBLE provides best-in-class within- and across-animal decoding accuracy, comparable to or significantly better than current supervised approaches, yet without the need for behavioural labels. Our results suggest that using the manifold structure in conjunction with the temporal information of neural dynamics provides a common framework to develop better decoding algorithms and assimilate data across experiments.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# StillFast: 短期オブジェクトインタラクション予測のためのエンドツーエンドアプローチ StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation ( http://arxiv.org/abs/2304.03959v2 ) ライセンス: Link先を確認	Francesco Ragusa, Giovanni Maria Farinella, Antonino Furnari,	(参考訳) 予測問題は、人間の位置の予測、手や物体の軌跡の予測、行動の予測、人間と物体の相互作用など、さまざまな側面を考慮して研究されてきた。本稿では,オブジェクト間相互作用の短期的予測問題をエゴセントリックな視点から検討し,新たなエンドツーエンドアーキテクチャであるStillFastを提案する。提案手法は静止画像と映像を同時に処理し、次のアクティブなオブジェクトを検出して位置を定め、将来のインタラクションを記述する動詞を予測し、いつ対話が始まるかを決定する。大規模エゴセントリックデータセットEGO4Dの実験結果から,提案手法は課題に対する最先端のアプローチよりも優れていた。本手法は,EGO4D短期オブジェクトインタラクション予測課題2022において,第1位にランクされている。コードと詳細については、プロジェクトのWebページを参照してください。 Anticipation problem has been studied considering different aspects such as predicting humans' locations, predicting hands and objects trajectories, and forecasting actions and human-object interactions. In this paper, we studied the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video detecting and localizing next-active objects, predicting the verb which describes the future interaction and determining when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D show that our method outperformed state-of-the-art approaches on the considered task. Our method is ranked first in the public leaderboard of the EGO4D short term object interaction anticipation challenge 2022. Please see the project web page for code and additional details: https://iplab.dmi.unict.it/stillfast/.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# 深部ニューラルネットワークの不確かさ校正におけるテスト時間増大へのアプローチ Approaching Test Time Augmentation in the Context of Uncertainty Calibration for Deep Neural Networks ( http://arxiv.org/abs/2304.05104v2 ) ライセンス: Link先を確認	Pedro Conde, Tiago Barros, Rui L. Lopes, Cristiano Premebida, Urbano J. Nunes,	(参考訳) Deep Neural Networksの台頭により、機械学習システムは、現在、多くの現実世界のアプリケーションにおいてユビキタスであり、信頼性の高いモデルを必要としている。これは、そのようなシステムの正確性だけでなく、予測の不確実性についても徹底的に検討する必要がある。そこで我々は,画像分類のための深部モデルの不確実性校正を改善するために,テスト時間増大に基づく新しい手法(M-ATTAとV-ATTAという2つの変種)を提案する。適応的な重み付けシステムを利用することで、M/V-ATTAはモデルの精度に影響を与えることなく不確実性校正を改善する。これらの手法の性能は、不確実性の校正に関連する様々な指標を考慮し、その頑健さを実証することによって評価される。 CIFAR-10, CIFAR-100, Aerial Image Dataset, および分布シフト下の2つの異なるシナリオで得られた実験結果は, 提案手法がいくつかの最先端のポストホックキャリブレーション法より優れていることを示している。さらに,本手法は,分布外サンプルの予測エントロピーも改善した。 M/V-ATTA コード:https://github.com/pedrormconde/MV-ATTA With the rise of Deep Neural Networks, machine learning systems are nowadays ubiquitous in a number of real-world applications, which bears the need for highly reliable models. This requires a thorough look not only at the accuracy of such systems, but also at their predictive uncertainty. Hence, we propose a novel technique (with two different variations, named M-ATTA and V-ATTA) based on test time augmentation, to improve the uncertainty calibration of deep models for image classification. By leveraging an adaptive weighting system, M/V-ATTA improves uncertainty calibration without affecting the model's accuracy. The performance of these techniques is evaluated by considering diverse metrics related to uncertainty calibration, demonstrating their robustness. Empirical results, obtained on CIFAR-10, CIFAR-100, Aerial Image Dataset, as well as in two different scenarios under distribution-shift, indicate that the proposed methods outperform several state-of-the-art post-hoc calibration techniques. Furthermore, the methods proposed also show improvements in terms of predictive entropy on out-of-distribution samples. 
Code for M/V-ATTA available at: https://github.com/pedrormconde/MV-ATTA	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# Farm3D:2D拡散の蒸留による関節を持つ3D動物の学習 Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion ( http://arxiv.org/abs/2304.10535v2 ) ライセンス: Link先を確認	Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi,	(参考訳) 本稿では,事前に訓練された2次元拡散ベースの画像生成器からの「無料」の仮想的な教師信号のみに頼って,関節を持つ物体のカテゴリ別3次元再構成器を学習する手法Farm3Dを提案する。最近のアプローチでは、ある物体カテゴリの単一ビュー画像の集合から、任意の物体インスタンスの3次元形状、アルベド、照明、視点を予測する単眼ネットワークを学習することができる。しかし、これらのアプローチは、入手コストの高い、手作業でキュレーションされたクリーンなトレーニングデータに大きく依存している。本稿では, Stable Diffusionなどの画像生成器を用いて, 十分にクリーンで追加の手作業によるキュレーションを必要としない合成トレーニングデータを生成し,そのような再構成ネットワークをゼロから学習できるようにするフレームワークを提案する。さらに,拡散モデルをスコアとして組み込んで学習プロセスを強化する。このアイデアは、視点や照明などの再構成の特定の側面をランダム化し、再構成された3Dオブジェクトの仮想ビューを生成し、2Dネットワークに結果の画像の品質を評価させることで、再構成器にフィードバックを提供するというものである。テキストプロンプトごとに単一の3Dアセットを生成する蒸留ベースの研究とは異なり、本手法では、実画像か生成画像かを問わず任意の画像から、制御可能な3Dアセットを1回のフォワードパスで数秒のうちに出力できる単眼再構成ネットワークが得られる。我々のネットワークは、単眼再構成を含む分析にも、ビデオゲームのようなリアルタイムアプリケーション向けの関節付きアセットを生成する合成にも利用できる。 We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object category. However, these approaches heavily rely on manually curated clean training data, which are expensive to obtain. We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch. Additionally, we incorporate the diffusion model as a score to enhance the learning process. The idea involves randomizing certain aspects of the reconstruction, such as viewpoint and illumination, generating virtual views of the reconstructed 3D object, and allowing the 2D network to assess the quality of the resulting image, thus providing feedback to the reconstructor. 
Unlike work based on distillation, which produces a single 3D asset for each textual prompt, our approach yields a monocular reconstruction network capable of outputting a controllable 3D asset from any given image, whether real or generated, in a single forward pass in a matter of seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# 非エルミートスピンレスBHZ様モデルにおけるパーシング皮膚効果 Parsing skin effect in a non-Hermitian spinless BHZ-like model ( http://arxiv.org/abs/2304.12723v2 ) ライセンス: Link先を確認	Dipendu Halder, Saurabh Basu,	(参考訳) この研究は、スピンレスベルネヴィグ・ヒューズ・チャン(BHZ)のような1次元のモデルにおける非エルミート皮膚効果(NHSE)を包括的に研究する。非相互ホッピング振幅を持つシステムはNHSEを示すと一般的に信じられている。しかし, システム内のNHSEやその変異の存在を復号するためには, より詳細な解析が必要である。オービタルホッピング用語に非相反性を含めることによって,従来のNHSEや双方向NHSEの存在や,驚くべきNHSEの欠如が示唆される。位相特性と(両直交)バルク境界対応は、(複素)ベリー位相、巻数、エッジモードの空間的局在の計算によって列挙され、そこで生じる位相遷移が強調される。さらに、非エルミートモデルの構造的議論を促進するために、結果をPT対称および非PT対称のケースに分割し、この2つを比較した。 This work comprehensively investigates the non-Hermitian skin effect (NHSE) in a spinless Bernevig-Hughes-Zhang (BHZ)-like model in one dimension. It is generally believed that a system with non-reciprocal hopping amplitudes demonstrates NHSE. However, we show that there are exceptions, and more in-depth analyses are required to decode the presence of NHSE or its variants in a system. The fascinating aspects of our findings, depending on the inclusion of non-reciprocity in the inter-orbital hopping terms, concede the existence of conventional or bi-directional NHSE and even a surprising absence of NHSE. The topological properties and the (bi-orthogonal) bulk-boundary correspondence, enumerated via computation of the (complex) Berry phase, the winding number, and spatial localization of the edge modes, highlight the topological phase transitions occurring therein. Further, to facilitate a structured discussion of the non-Hermitian model, we split the results into PT symmetric and non-PT symmetric cases with a view to comparing the two.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# BCQQ: 周期データ再アップロードによるバッチ制約量子Q-Learning BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading ( http://arxiv.org/abs/2305.00905v2 ) ライセンス: Link先を確認	Maniraman Periyasamy, Marc Hölle, Marco Wiedmann, Daniel D. Scherer, Axel Plinge, Christopher Mutschler,	(参考訳) 深層強化学習(DRL)は、しばしば大量のデータと環境の相互作用を必要とし、トレーニングプロセスに時間がかかる。バッチRLでは、エージェントは環境の相互作用を伴わずに、事前にコンパイルされたデータセットにのみトレーニングされる。量子コンピューティングの最近の進歩は、量子モデルは古典的手法に比べて訓練に必要なデータが少ないことを示唆している。本稿では、離散バッチ制約深度Q-ラーニング(BCQ)アルゴリズムにおいて、VQCを関数近似器として利用するバッチRLアルゴリズムを提案する。さらに,データエンコーディング層における入力変数の順序を周期的にシフトさせることにより,新しいデータ再ロード方式を導入する。我々は,OpenAI CartPole環境におけるアルゴリズムの有効性を評価し,その性能を従来のニューラルネットワークに基づく離散BCQと比較した。 Deep reinforcement learning (DRL) often requires a large number of data and environment interactions, making the training process time-consuming. This challenge is further exacerbated in the case of batch RL, where the agent is trained solely on a pre-collected dataset without environment interactions. Recent advancements in quantum computing suggest that quantum models might require less data for training compared to classical methods. In this paper, we investigate this potential advantage by proposing a batch RL algorithm that utilizes VQC as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm. Additionally, we introduce a novel data re-uploading scheme by cyclically shifting the order of input variables in the data encoding layers. We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to the classical neural network-based discrete BCQ.	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# AI支援による語用論的アノテーションの可能性評価:謝罪の事例 Assessing the potential of AI-assisted pragmatic annotation: The case of apologies ( http://arxiv.org/abs/2305.08339v4 ) ライセンス: Link先を確認	Danni Yu, Luyang Li, Hang Su, Matteo Fuoli,	(参考訳) 品詞タグ付けや意味タグ付けなど、ある種の言語アノテーションは高精度で自動化できる。しかし、語彙形式への直接的な対応を持たない複雑な語用論的・談話的特徴については、依然として手動のアノテーションが必要である。この手動のプロセスは時間がかかり誤りも生じやすく、コーパス言語学における機能から形式へのアプローチのスケーラビリティを制限している。そこで本研究では,大規模言語モデル(LLM)を用いた語用論的・談話的コーパスアノテーションの自動化について検討する。局所文法の枠組みに基づき、英語における謝罪の構成要素のアノテーションについて、ChatGPT、Bingチャットボット、および人間のコーダを比較する。 BingチャットボットはChatGPTより優れており、その精度は人間のコーダに近づいた。これらの結果から,AIは語用論的・談話的コーパスアノテーションの支援にうまく活用でき、プロセスをより効率的かつスケーラブルにできることが示唆された。キーワード:言語アノテーション、機能から形式へのアプローチ、大規模言語モデル、局所文法解析、Bingチャットボット、ChatGPT Certain forms of linguistic annotation, like part-of-speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores automating pragma-discursive corpus annotation using large language models (LLMs). We compare ChatGPT, the Bing chatbot, and a human coder in annotating apology components in English based on the local grammar framework. We find that the Bing chatbot outperformed ChatGPT, with accuracy approaching that of a human coder. These results suggest that AI can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient and scalable. Keywords: linguistic annotation, function-to-form approaches, large language models, local grammar analysis, Bing chatbot, ChatGPT	翻訳日:2024-03-21 01:30:29 公開日:2024-03-18
# ハングタイムHAR: Wrist-Worn慣性センサを用いたバスケットボール活動認識のためのベンチマークデータセット Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition using Wrist-Worn Inertial Sensors ( http://arxiv.org/abs/2305.13124v2 ) ライセンス: Link先を確認	Alexander Hoelzemann, Julia Lee Romero, Marius Bock, Kristof Van Laerhoven, Qin Lv,	(参考訳) バスケットボールのトレーニングやドリル,ゲームなどの特定の設定のために,手首のセンサーを用いた身体活動認識手法を評価するためのベンチマークデータセットを提案する。バスケットボール活動は手首に装着した慣性センサーによる計測に適しており、そのようなスポーツ関連アクティビティを検出するシステムは、ゲーム分析、ガイド付きトレーニング、および身体的活動追跡への応用に利用することができる。このデータセットは、バスケットボールのトレーニングセッションとフルゲームの両方で、計24人の選手が手首に慣性センサーを装着した2つの国(米国とドイツ)のチームで記録された。このデータセットの特徴としては,2つの国で記録された試合ルールやスタイルの文化的差異による固有の差異や,以前のバスケットボール経験の面では異質であるため,スポーツスキルのレベルが異なることが挙げられる。いくつかの時系列分析でデータセットの特徴を概説し、2つの最先端ディープラーニングアーキテクチャを用いたベースライン分類性能研究について報告する。 We present a benchmark dataset for evaluating physical human activity recognition methods from wrist-worn sensors, for the specific setting of basketball training, drills, and games. Basketball activities lend themselves well for measurement by wrist-worn inertial sensors, and systems that are able to detect such sport-relevant activities could be used in applications toward game analysis, guided training, and personal physical activity tracking. The dataset was recorded for two teams from separate countries (USA and Germany) with a total of 24 players who wore an inertial sensor on their wrist, during both repetitive basketball training sessions and full games. Particular features of this dataset include an inherent variance through cultural differences in game rules and styles as the data was recorded in two countries, as well as different sport skill levels, since the participants were heterogeneous in terms of prior basketball experience. We illustrate the dataset's features in several time-series analyses and report on a baseline classification performance study with two state-of-the-art deep learning architectures.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
# Dual Expectile-Quantile Regressionを用いた分布強化学習 Distributional Reinforcement Learning with Dual Expectile-Quantile Regression ( http://arxiv.org/abs/2305.16877v2 ) ライセンス: Link先を確認	Sami Jullien, Romain Deffayet, Jean-Michel Renders, Paul Groth, Maarten de Rijke,	(参考訳) 分布強化学習(RL)は,リターンの完全な分布を近似し,環境サンプルをよりよく活用できるため,複数のベンチマークで有用であることが証明されている。非対称な$L_1$損失に基づく、分布RLへの一般的な分位点回帰アプローチは、任意のリターン分布を柔軟かつ効果的に学習する方法を提供する。実際には、分位点回帰のためにより効率的なハイブリッド非対称$L_1$-$L_2$ Huber損失を使用することで、しばしば改善される。しかしそうすると分布推定の保証は失われ, 推定分布が急速に平均へ崩壊することが実証的に観察される。実際、エクスペクタイル回帰に対応する非対称$L_2$損失は、分布時間差分学習にはそのままでは利用できない。本研究は,$L_2$ベースの学習の効率性を動機として,リターン分布のエクスペクタイルと分位点を同時に学習し,リターンの完全な分布の推定を保ちながら効率的な学習を可能にすることを提案する。提案手法が正しいリターン分布を近似的に学習することを証明し, トイ例および大規模設定で実用的な実装をベンチマークする。 Atari ベンチマークでは,200M のトレーニングフレームの後に Huber ベースの IQN-1 ベースラインの性能に匹敵しつつ,分布の崩壊を回避し,リターンの完全な分布の推定を保持する。 Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and makes better use of environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, hybrid asymmetric $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our approach approximately learns the correct return distribution, and we benchmark a practical implementation on a toy example and at scale. 
On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
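The abstract above contrasts asymmetric $L_1$ (quantile) losses with asymmetric $L_2$ (expectile) losses. The following is a minimal illustrative sketch of those two loss functions only, not the authors' implementation; the function names, the sample values, and the grid search are assumptions chosen for illustration. At level 0.5 the quantile loss is minimized by the median and the expectile loss by the mean.

```python
def pinball_loss(u, tau):
    """Asymmetric L1 (quantile) loss for a residual u = target - prediction."""
    return (tau if u >= 0 else 1.0 - tau) * abs(u)


def expectile_loss(u, tau):
    """Asymmetric L2 (expectile) loss for a residual u = target - prediction."""
    return (tau if u >= 0 else 1.0 - tau) * u * u


def argmin_on_grid(loss, samples, tau, grid):
    """Return the grid point that minimizes the summed loss over the samples."""
    return min(grid, key=lambda q: sum(loss(x - q, tau) for x in samples))


# Hypothetical returns with one outlier (illustrative values only).
samples = [0.0, 1.0, 2.0, 3.0, 10.0]
grid = [i / 100 for i in range(0, 1001)]

median_estimate = argmin_on_grid(pinball_loss, samples, 0.5, grid)
mean_estimate = argmin_on_grid(expectile_loss, samples, 0.5, grid)
print(median_estimate, mean_estimate)
```

The outlier moves the 0.5-expectile (the mean) but not the 0.5-quantile (the median), which illustrates that the two statistics carry different information about the same return distribution.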
# アウト・オブ・ディストリビューション検出のためのセマンティック・ロール・ラベルリング Semantic Role Labeling Guided Out-of-distribution Detection ( http://arxiv.org/abs/2305.18026v2 ) ライセンス: Link先を確認	Jinan Zou, Maihao Guo, Yu Tian, Yuhao Lin, Haiyao Cao, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi,	(参考訳) 自然言語処理における予期せぬドメインシフトのインスタンスを特定することは、現実世界のアプリケーションでは不可欠である。従来の作業では,1つのグローバルな特徴を埋め込んで文を表現することで,アウト・オブ・ディストリビューション(OOD)インスタンスを識別していた。現在のOOD法が直面しているもうひとつの大きな課題は、有効な低次元の文表現を学習して、意味論的にIn-distriion(ID)データに類似したハードなOODインスタンスを特定することである。本稿では,文の異なる引数と全文のグローバルな特徴表現から,意味的役割ラベル付け(SRL)を導出した意味的役割ラベル付け(SRLOOD)を分離し,抽出し,学習する,意味的役割ラベル付け(Semantic Role Labeling Guided Out-of-distriion Detection, SRLOOD)と呼ばれる新しい教師なしOOD検出手法を提案する。また,SRLの抽出した役割を予測することにより,グローバルな特徴学習を強化するために,新たな自己教師型アプローチも導入されている。その結果,4つのOODベンチマークにおいてSOTA性能が得られ,本手法の有効性が示唆された。コードは \url{https://github.com/cytai/SRLOOD} を通じて公開されている。 Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the out-of-distribution (OOD) instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sentence representations to identify the hard OOD instances that are semantically similar to the in-distribution (ID) data. In this paper, we propose a new unsupervised OOD detection method, namely Semantic Role Labeling Guided Out-of-distribution Detection (SRLOOD), that separates, extracts, and learns the semantic role labeling (SRL) guided fine-grained local feature representations from different arguments of a sentence and the global feature representations of the full sentence using a margin-based contrastive loss. A novel self-supervised approach is also introduced to enhance such global-local feature learning by predicting the SRL extracted role. 
The resulting model achieves SOTA performance on four OOD benchmarks, indicating the effectiveness of our approach. The code is publicly accessible at https://github.com/cytai/SRLOOD.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
# DeepSolo++:多言語テキストスポッティングのための明示的なポイントを持つトランスフォーマーデコーダ DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting ( http://arxiv.org/abs/2305.19957v2 ) ライセンス: Link先を確認	Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao,	(参考訳) エンドツーエンドのテキストスポッティングは、シーンテキストの検出と認識を統一されたフレームワークに統合することを目的としている。 2つのサブタスクの関係に対処することは、効果的なスポッターの設計において重要な役割を果たす。 Transformerベースの手法はヒューリスティックな後処理を排除しているが、サブタスク間の相乗効果とトレーニング効率の低下に悩まされている。さらに、追加のスクリプト識別タスクを必要とする多言語テキストスポッティングの探索も見落としている。本稿では,DeepSolo++について述べる。DeepSolo++は単純なDETRライクなベースラインで,テキスト検出,認識,スクリプト識別を単独で行う1つのデコーダを同時に行うことができる。技術的には、各テキストインスタンスに対して、文字シーケンスを順序付けられたポイントとして表現し、学習可能な明示的なポイントクエリでそれらをモデル化します。単一デコーダを渡すと、ポイントクエリは必要なテキストセマンティクスと場所を符号化するので、非常に単純な予測ヘッドを並列で中心線、境界線、スクリプト、およびテキストの信頼性にさらにデコードすることができる。さらに、文字クラス、言語タイプ、タスクの観点から、驚くほど優れた拡張性を示す。一方,本手法は,英語のシーンだけでなく,複雑なフォント構造と中国語などの1000レベルの文字クラスで書き起こしを習得する。一方、私たちのDeepSolo++は、以前の方法と比較して、より簡単なトレーニングパイプラインで、追加で導入されたスクリプト識別タスクにおいて、より良いパフォーマンスを実現しています。さらに、私たちのモデルは行アノテーションとも互換性があり、ポリゴンよりもアノテーションコストがはるかに低い。コードは \url{https://github.com/ViTAE-Transformer/DeepSolo} で公開されている。 End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. Besides, they overlook the exploring on multilingual text spotting which requires an extra script identification task. In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. 
After passing through a single decoder, the point queries encode the requisite text semantics and locations, and can thus be further decoded into the center line, boundary, script, and confidence of text via very simple prediction heads in parallel. Furthermore, we show the surprisingly good extensibility of our method in terms of character class, language type, and task. On the one hand, our method not only performs well in English scenes but also masters transcription with complex font structures and thousand-level character classes, such as Chinese. On the other hand, our DeepSolo++ achieves better performance on the additionally introduced script identification task with a simpler training pipeline compared with previous methods. In addition, our models are also compatible with line annotations, which require much less annotation cost than polygons. The code is available at https://github.com/ViTAE-Transformer/DeepSolo.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
# BeyondPixels: ニューラルネットワークの進化の概観 BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields ( http://arxiv.org/abs/2306.03000v3 ) ライセンス: Link先を確認	AKM Shahariar Azad Rabby, Chengcui Zhang,	(参考訳) ニューラルレンダリングは、古典的なコンピュータグラフィックスと機械学習のアイデアを組み合わせて、現実世界の観察から画像を合成する。 NeRF(Neural Radiance Fieldsの略)は、AIアルゴリズムを使用して2D画像から3Dオブジェクトを生成する最近のイノベーションである。補間アプローチを活用することで、NeRFは複雑なシーンの新しい3D再構成ビューを生成することができる。 3Dシーンの形状を直接復元する代わりに、NeRFは「放射場」と呼ばれる体積表現を生成し、関連する3D空間内のすべての点について色と密度を生成できる。 NeRFの幅広い魅力と不明瞭さは、このトピックに関する既存の研究を包括的に調査することが不可欠である。 3Dレンダリングに関する以前の調査は、主に従来のコンピュータビジョンベースの、あるいはディープラーニングベースのアプローチに焦点を当てていたが、NeRFの可能性について議論する人はごくわずかである。しかし、これらの調査は主にNeRFの初期の貢献に焦点を合わせており、その潜在能力を探求していない。 NeRFは、その能力と限界について継続的に研究されている比較的新しい技術である。この調査は最近のNeRFの進歩を概観し、特に新規なビュー合成の分野において、それらのアーキテクチャ設計に従って分類する。 Neural rendering combines ideas from classical computer graphics and machine learning to synthesize images from real-world observations. NeRF, short for Neural Radiance Fields, is a recent innovation that uses AI algorithms to create 3D objects from 2D images. By leveraging an interpolation approach, NeRF can produce new 3D reconstructed views of complicated scenes. Rather than directly restoring the whole 3D scene geometry, NeRF generates a volumetric representation called a ``radiance field,'' which is capable of creating color and density for every point within the relevant 3D space. The broad appeal and notoriety of NeRF make it imperative to examine the existing research on the topic comprehensively. While previous surveys on 3D rendering have primarily focused on traditional computer vision-based or deep learning-based approaches, only a handful of them discuss the potential of NeRF. However, such surveys have predominantly focused on NeRF's early contributions and have not explored its full potential. NeRF is a relatively new technique continuously being investigated for its capabilities and limitations. 
This survey reviews recent advances in NeRF and categorizes them according to their architectural designs, especially in the field of novel view synthesis.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
# 一般化可能なロボットマニピュレーションのための転移基礎モデル Transferring Foundation Models for Generalizable Robotic Manipulation ( http://arxiv.org/abs/2306.05716v4 ) ライセンス: Link先を確認	Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang,	(参考訳) 現実世界における汎用ロボット操作エージェントの一般化能力の向上は、長い間大きな課題であった。既存のアプローチは、RT-1データセットのようなコストと時間を要する大規模なロボットデータの収集に依存していることが多い。しかし、データの多様性が不十分なため、これらのアプローチは一般的に、新しいオブジェクトと多様な環境を持つオープンドメインシナリオにおける能力を制限することに悩まされる。本稿では,インターネット規模の基盤モデルによって生成された言語推論セグメンテーションマスクを,ロボット操作タスクの条件付けに効果的に活用する新しいパラダイムを提案する。視覚基盤モデルから導かれる意味的・幾何学的・時間的相関をエンド・ツー・エンドのポリシーモデルに組み込んだマスクのモダリティを組み込むことにより,本手法はオブジェクトのポーズを効果的かつ堅牢に知覚し,新しいオブジェクトインスタンス,セマンティックカテゴリ,目に見えない背景を含むサンプル効率のよい一般化学習を可能にする。まず、複数のタスクにまたがる自然言語要求を基盤とする基礎モデルを紹介します。第2に,実画像とオブジェクトマスクを処理する模倣学習に基づく2ストリーム2Dポリシーモデルを構築し,局所的な認識方式でロボットの動作を予測する。フランカ・エミカのロボットアームを用いた大規模な実世界実験により,提案したパラダイムとポリシーアーキテクチャの有効性が実証された。デモは提出されたビデオで見ることができ、より包括的なデモはlink1またはlink2で見ることができます。 Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data which is costly and time-consuming, such as the RT-1 dataset. However, due to insufficient diversity of data, these approaches typically suffer from limiting their capability in open-domain scenarios with new objects and diverse environments. In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models, to condition robot manipulation tasks. By integrating the mask modality, which incorporates semantic, geometric, and temporal correlation priors derived from vision foundation models, into the end-to-end policy model, our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning, including new object instances, semantic categories, and unseen backgrounds. 
We first introduce a series of foundation models to ground natural language demands across multiple tasks. Second, we develop a two-stream 2D policy model based on imitation learning, which processes raw images and object masks to predict robot actions in a local-global perception manner. Extensive real-world experiments conducted on a Franka Emika robot arm demonstrate the effectiveness of our proposed paradigm and policy architecture. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.	翻訳日:2024-03-21 01:20:39 公開日:2024-03-18
# 対戦トレーニングは非ゼロサムゲームとしてキャストされるべきである Adversarial Training Should Be Cast as a Non-Zero-Sum Game ( http://arxiv.org/abs/2306.11035v2 ) ライセンス: Link先を確認	Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher,	(参考訳) ディープニューラルネットワークの敵対的脆弱性を解決するための1つの顕著なアプローチは、敵対的トレーニングの2つのプレイヤーゼロサムパラダイムであり、予測者は敵対的に選択されたデータの摂動に対して訓練される。このアプローチの約束にもかかわらず、このパラダイムに基づくアルゴリズムは十分なロバストネスのレベルを示さず、ロバストオーバーフィッティングのような病理学的行動に悩まされている。この欠点を理解するために、まず、敵対的学習アルゴリズムでよく使われる代理に基づく緩和が、訓練された分類器の堅牢性に関するすべての保証を無効にすることを示す。この落とし穴の特定は、対戦訓練の非ゼロサム二段階の新たな定式化を通知し、各プレイヤーは異なる目的関数を最適化する。我々の定式化は、単純なアルゴリズムの枠組みを生み出し、場合によっては最先端の攻撃よりも優れ、標準的な敵の訓練アルゴリズムに匹敵する堅牢性を達成し、頑強なオーバーフィッティングに苦しむことはない。 One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the commonly used surrogate-based relaxation used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
# 線形識別学習における周波数効果 Frequency effects in Linear Discriminative Learning ( http://arxiv.org/abs/2306.11044v2 ) ライセンス: Link先を確認	Maria Heitmeier, Yu-Ying Chuang, Seth D. Axen, R. Harald Baayen,	(参考訳) 単語頻度は、ほとんどの語彙処理タスクにおいて強力な予測器である。したがって、どんな単語認識モデルでも、単語の周波数効果がどのように生じるかを考慮する必要がある。識別辞書モデル (DLM; Baayen et al , 2018a, 2019) は、単語の形式とその意味を線形にマッピングした語彙処理をモデル化する。これまでのところ、これらのマッピングは、誤り駆動学習(英語版)によって段階的に得られるか、あるいは全ての単語が最適に学習される理論的な学習状態(EL)をモデル化する、効率的だが周波数に依存しない計算コストの高いプロセスである。本研究では, 形式と意味の効率よく, 周波数インフォームドマッピングが実現可能であることを示す(周波数インフォームド学習; FIL)。 FILは計算コストをはるかに安くしながら、インクリメンタルな解をよく近似していることが分かりました。 FILは比較的低い型と高いトークン精度を示し、モデルが日々の生活の中で話者が遭遇するほとんどのワードトークンを正しく処理できることを示した。我々は、オランダのLexicon Project (Keuleers et al , 2010) において、FILを用いて反応時間をモデル化し、FILが周波数と反応時間の平均の間のS字型関係を適切に予測するが、低頻度語に対する反応時間のばらつきを過小評価する。 FILは,マンダリン中国語(Lee, 2007)の聴覚語彙決定タスクにおいて,ELと比較してプライミング効果を考慮しやすくしている。最後に, CHILDES (Brown, 1973; Demuth et al , 2006) の順序データを用いて, FIL と漸進学習を用いて得られた写像を比較した。写像は高い相関性を持つが、FILでは単語順序効果に基づくニュアンスの一部が失われる。本研究は,学習モデルにおける周波数効果を効率的にシミュレートする方法を示し,認知モデルにおける低頻度単語の最適な説明法について疑問を投げかけるものである。 Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic solution modelling the theoretical endstate of learning (EL) where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. 
FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL. Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
# ポケット特異的分子生成と実験のための関数群に基づく拡散 Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration ( http://arxiv.org/abs/2306.13769v3 ) ライセンス: Link先を確認	Haitao Lin, Yufei Huang, Odin Zhang, Lirong Wu, Siyuan Li, Zhiyuan Chen, Stan Z. Li,	(参考訳) 近年、標的タンパク質のポケットの構造から分子を生成するためにAIによる薬物設計法が提案されている。その多くは原子準位に基づく手法であり、原子を基本成分とみなし、原子の位置と型を生成する。しかし、この方法では複雑な構造を持つ現実的な断片を生成することは困難である。そこで我々はD3FGを提案する。D3FGはポケット固有の分子の生成と実験のための機能群に基づく拡散モデルである。 D3FGは分子を、剛体として定義される官能基と質量点としてのリンカーの2つのカテゴリに分解する。そしてこの2種類の成分は、リガンドとタンパク質の相互作用を強化する複雑な断片を形成することができる。具体的には、拡散過程において、D3FGは、成分の位置、向き、タイプのデータ分布を事前分布に拡散させ、生成過程において、設計された同変グラフニューラルネットワークでパラメータ化して、3変数からノイズを徐々に除去する。実験では, より現実的な3次元構造, タンパク質標的に対する競合親和性, 薬物特性の良好な分子を生成できる。さらに、D3FGは分子の発見の新たな課題の解決策として、既存のリガンドと標的タンパク質のホットスポットに基づいて高い親和性を持つ分子を生成することができる。 In recent years, AI-assisted drug design methods have been proposed to generate molecules given the pockets' structures of target proteins. Most of them are atom-level-based methods, which consider atoms as basic components and generate atom positions and types. In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration. D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points. And the two kinds of components can together form complicated fragments that enhance ligand-protein interactions. To be specific, in the diffusion process, D3FG diffuses the data distribution of the positions, orientations, and types of the components into a prior distribution; In the generative process, the noise is gradually removed from the three variables by denoisers parameterized with designed equivariant graph neural networks. 
In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties. In addition, as a solution to the new task of molecule elaboration, D3FG can generate molecules with high affinities based on existing ligands and the hotspots of target proteins.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
# DNABERT-2:多種ゲノムの効率的な基盤モデルとベンチマーク DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome ( http://arxiv.org/abs/2306.15006v2 ) ライセンス: Link先を確認	Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, Han Liu,	(参考訳) ゲノムの言語的複雑さを解読することは生物学における重要な問題であり、DNABERTやNucleotide Transformerといった事前訓練された基盤モデルがこの領域で大きな進歩を遂げている。既存の研究は、その単純さから、A、T、C、Gの固定長の並びであるk-merをゲノム言語のトークンとして用いることに大きく依存している。しかし、k-merのトークン化によって引き起こされる計算とサンプルの非効率性は、大規模なゲノム基盤モデルの開発における主要な障害であると我々は主張する。我々はゲノムのトークン化に関する概念的・実証的な知見を示し,それに基づいてk-merのトークン化をByte Pair Encoding (BPE) に置き換えることを提案する。これは統計に基づくデータ圧縮アルゴリズムで,コーパス内で最も頻繁に共起するゲノムセグメントを反復的にマージすることでトークンを構築する。我々は、BPEがk-merトークン化の限界を克服するだけでなく、重複しないトークン化の計算効率からも恩恵を受けることを示す。これらの知見に基づき,DNABERT-2を導入する。DNABERT-2は効率的なトークナイザを採用し,複数の戦略を用いて入力長の制約を克服し,時間とメモリの消費を低減し,モデルの能力を向上させた改良版のゲノム基盤モデルである。さらに、ゲノム理解のための包括的で標準化されたベンチマークが欠如していることを、公正な比較分析のもう一つの重要な障害とみなす。これに対応するために、GUE(Genome Understanding Evaluation)という包括的な多種ゲノム分類データセットを提案する。このデータセットは、$9$個のタスクにわたる$36$個の異なるデータセットを統合したもので、入力長は$70$から$10000$である。 GUEベンチマークでの包括的な実験を通じて、DNABERT-2は、パラメータ数が$21 \times$少なく、事前学習のGPU時間が約$92 \times$少ないにもかかわらず、最先端モデルに匹敵する性能を達成することを実証する。 Decoding the linguistic intricacies of the genome is a crucial problem in biology, and pre-trained foundational models such as DNABERT and Nucleotide Transformer have made significant strides in this area. Existing works have largely hinged on k-mer, fixed-length permutations of A, T, C, and G, as the token of the genome language due to its simplicity. However, we argue that the computation and sample inefficiencies introduced by k-mer tokenization are primary obstacles in developing large genome foundational models. We provide conceptual and empirical insights into genome tokenization, building on which we propose to replace k-mer tokenization with Byte Pair Encoding (BPE), a statistics-based data compression algorithm that constructs tokens by iteratively merging the most frequent co-occurring genome segment in the corpus. 
We demonstrate that BPE not only overcomes the limitations of k-mer tokenization but also benefits from the computational efficiency of non-overlapping tokenization. Based on these insights, we introduce DNABERT-2, a refined genome foundation model that adapts an efficient tokenizer and employs multiple strategies to overcome input length constraints, reduce time and memory expenditure, and enhance model capability. Furthermore, we identify the absence of a comprehensive and standardized benchmark for genome understanding as another significant impediment to fair comparative analysis. In response, we propose the Genome Understanding Evaluation (GUE), a comprehensive multi-species genome classification dataset that amalgamates $36$ distinct datasets across $9$ tasks, with input lengths ranging from $70$ to $10000$. Through comprehensive experiments on the GUE benchmark, we demonstrate that DNABERT-2 achieves comparable performance to the state-of-the-art model with $21 \times$ fewer parameters and approximately $92 \times$ less GPU time in pre-training.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
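The abstract above describes BPE as iteratively merging the most frequent co-occurring segments in the corpus. The following is a toy sketch of that merge loop on two short DNA strings; it is not the DNABERT-2 tokenizer, and the function names, the example sequences, and the merge count are assumptions chosen for illustration.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Count adjacent token pairs across all sequences; return the most common."""
    counts = Counter()
    for tokens in sequences:
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(corpus, num_merges):
    """Start from single nucleotides and apply `num_merges` greedy merges."""
    sequences = [list(seq) for seq in corpus]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(tokens, pair) for tokens in sequences]
    return merges, sequences


# Hypothetical two-sequence corpus (illustrative only).
merges, tokenized = learn_bpe(["ATATCG", "ATATAT"], num_merges=2)
print(merges)     # [('A', 'T'), ('AT', 'AT')]
print(tokenized)  # [['ATAT', 'C', 'G'], ['ATAT', 'AT']]
```

Unlike overlapping k-mers, the resulting tokens do not overlap, so each sequence becomes shorter after every merge.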
# 深層学習による新型コロナウイルス研究のためのソーシャルメディア情報検索の合理化 Streamlining Social Media Information Retrieval for COVID-19 Research with Deep Learning ( http://arxiv.org/abs/2306.16001v3 ) ライセンス: Link先を確認	Yining Hua, Jiageng Wu, Shixu Lin, Minghui Li, Yujie Zhang, Dinah Foer, Siwen Wang, Peilin Zhou, Jie Yang, Li Zhou,	(参考訳) 目的:ソーシャルメディアベースの公衆衛生研究は疫病の監視に不可欠であるが、ほとんどの研究はキーワードマッチングを伴う関連コーパスを特定している。本研究は,口語医学辞典の整理過程を合理化するシステムを開発した。我々は、新型コロナウイルス関連ツイートからUMLS-coloquial symptom dictionaryを算出し、そのパイプラインを概念実証として示す。方法:2020年2月1日から2022年4月30日までの新型コロナウイルス関連のツイートが使用された。パイプラインには、ツイート中の症状を検出する名前付きエンティティ認識モジュール、検出されたエンティティを集約するエンティティ正規化モジュール、エンティティを統一医療言語システムの概念に反復的にマッピングするマッピングモジュールの3つのモジュールが含まれている。最終的な辞書からランダムな500個のエンティティサンプルを抽出し、精度検証を行った。さらに, 先行研究から, 辞書を予め定義された辞書と比較するために, 症状頻度分布解析を行った。結果: ツイートから498,480のユニークな症状を抽出した。プリプロセッシングは18,226まで減少する。最終辞書には、966 UMLSの概念にマッピングできる症状の38,175のユニークな表現が含まれている(精度=95%)。症状分布分析の結果,我々の辞書はより多くの症状を検知し,不安やうつ病などの精神疾患の同定に有効であることが判明した。結論: この研究は, ソーシャルメディアデータから症状レキシコンをキュレートするための, 新たな体系的パイプラインを実装することによって, 公衆衛生研究を推進している。医療専門家によって検証された最終レキシコンの高精度さは、この手法が膨大な量の構造化されていないソーシャルメディアデータを、多様な地域・地域景観にまたがる行動可能な医学的洞察に確実に解釈し分類する可能性を強調している。 Objective: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword-matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a UMLS-colloquial symptom dictionary from COVID-19-related tweets as proof of concept. Methods: COVID-19-related tweets from February 1, 2020, to April 30, 2022 were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System concepts. A random 500 entity sample were drawn from the final dictionary for accuracy validation. 
Additionally, we conducted a symptom frequency distribution analysis to compare our dictionary to a pre-defined lexicon from previous research. Results: We identified 498,480 unique symptom entity expressions from the tweets. Pre-processing reduces the number to 18,226. The final dictionary contains 38,175 unique expressions of symptoms that can be mapped to 966 UMLS concepts (accuracy = 95%). Symptom distribution analysis found that our dictionary detects more symptoms and is effective at identifying psychiatric disorders like anxiety and depression, often missed by pre-defined lexicons. Conclusions: This study advances public health research by implementing a novel, systematic pipeline for curating symptom lexicons from social media data. The final lexicon's high accuracy, validated by medical professionals, underscores the potential of this methodology to reliably interpret and categorize vast amounts of unstructured social media data into actionable medical insights across diverse linguistic and regional landscapes.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
# T-MARS:テキスト特徴学習による視覚表現の改善 T-MARS: Improving Visual Representations by Circumventing Text Feature Learning ( http://arxiv.org/abs/2307.03132v2 ) ライセンス: Link先を確認	Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan,	(参考訳) 大規模なWebソースのマルチモーダルデータセットは、汎用的な視覚表現を学習し、コンピュータビジョンの最先端を推し進め、ゼロショットと少数ショットの認識に革命をもたらす、数多くの新しい方法に力を入れている。実践者が直面する決定の1つは、たとえ何であれ、より大きくなったデータセットをどのようにキュレートするかだ。例えば、LAION-5Bデータセットの作成者は、CLIPの類似度スコアが指定された閾値を超えたイメージキャプチャペアのみを保持することを選択した。本稿では,LAIONの画像の40%近くが字幕と重なるテキストを含んでいるという観察を動機とした,最新のデータフィルタリング手法を提案する。直感的には、このようなデータは視覚的特徴を学習するのではなく、光学的文字認識を行うモデルにインセンティブを与えるため、無駄になる可能性がある。しかし、視覚的特徴を含む画像を(重なり合うテキストに加えて)捨ててしまうため、こうしたデータを全て取り除くのは無駄になる可能性がある。私たちのシンプルでスケーラブルなアプローチであるT-MARS(Text Masking and Re-Scoring)は、テキストが残りの視覚的特徴を支配しているペアのみをフィルタリングします。実験的に、T-MARSは、DataCompの"medium scale"(データフィルタリングベンチマーク)において、ImageNetの6.5%、VTABの4.7%のマージンで、トップランクの手法より優れている。さらに, 2M から 64M までのデータプールサイズを系統的に評価した結果,T-MARS による精度向上はデータや計算が指数関数的に大きくなるにつれて線形的に増加することが示された。コードはhttps://github.com/locuslab/T-MARSで入手できる。 Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. 
However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.	翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
# 動的グラフのためのディープラーニング:モデルとベンチマーク Deep learning for dynamic graphs: models and benchmarks ( http://arxiv.org/abs/2307.06104v3 ) ライセンス: Link先を確認	Alessio Gravina, Davide Bacciu,	(参考訳) 近年,Deep Graph Networks (DGNs) の研究が進展し,グラフ上の学習領域が成熟した。この研究分野の成長にもかかわらず、まだ解決されていない重要な課題が残っている。具体的には、時間とともに進化する相互接続された実体からなる現実世界のシステムにおいて、予測タスクに適したDGNを作ることが強く求められている。動的グラフの領域における研究を促進することを目的として、まず、時間情報と空間情報の両方を学習する手法の最近の進歩を調査し、動的グラフの表現学習領域における現在の最先端技術を包括的に概観する。第二に、ノードレベルおよびエッジレベルのタスクにおいて,最も一般的な提案手法の公正な性能比較を行い,すべての手法に厳密なモデル選択と評価を適用することで、新しいアーキテクチャとアプローチを評価するための健全なベースラインを確立する。 Recent progress in research on Deep Graph Networks (DGNs) has led to a maturation of the domain of learning on graphs. Despite the growth of this research field, there are still important challenges that are yet unsolved. Specifically, there is a pressing need to make DGNs suitable for predictive tasks on real-world systems of interconnected entities, which evolve over time. With the aim of fostering research in the domain of dynamic graphs, we first survey recent advances in learning both temporal and spatial information, providing a comprehensive overview of the current state of the art in the domain of representation learning for dynamic graphs. Second, we conduct a fair performance comparison among the most popular proposed approaches on node- and edge-level tasks, leveraging rigorous model selection and assessment for all the methods, thus establishing a sound baseline for evaluating new architectures and approaches.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# 1次元導波路QEDシステムにおける多重サイドバンド干渉による単一原子増幅 Single-Atom Amplification Assisted by Multiple Sideband Interference in 1D Waveguide QED Systems ( http://arxiv.org/abs/2307.11174v2 ) ライセンス: Link先を確認	Kuan-Ting Lin, Ting Hsu, Fahad Aziz, Yu-Chen Lin, Ping-Yi Wen, Io-Chun Hoi, Guin-Dar Lin,	(参考訳) 本研究では1次元導波路量子電磁力学系における複数のRabiサイドバンドコヒーレンスから生じる信号増幅に関する理論的研究を行う。我々は半無限導波路を用いて、強いコヒーレントマイクロ波場で非調和多準位トランズモンを駆動し、プローブ信号を導入して散乱挙動を調べる。その結果,特定の共鳴条件下で信号増幅が生じることがわかり,これまで文献で報告されたものより微細な構造を示すスペクトルが得られた。この増幅の背後にあるメカニズムを解明するために、強い駆動場が存在する場合の複数のドレスト・サイドバンドを明示的に考慮するモデルを開発した。このモデルからプローブ信号の反射振幅を導出する。特に,増幅は反転分布によって生じる場合もあれば,反転分布がなくても複数のサイドバンドの建設的干渉によって生じる場合もあることが示された。さらに、量子ビットの位相緩和(デフェージング)が増幅プロセスにどのように影響するかについても検討する。 This study conducts a theoretical investigation into the signal amplification arising from multiple Rabi sideband coherence within a one-dimensional waveguide quantum electrodynamics system. We utilize a semi-infinite waveguide to drive an anharmonic multi-level transmon with a strong coherent microwave field, examining the scattering behavior by introducing a probe signal. Our findings reveal signal amplification under specific resonant conditions, presenting spectra that reveal finer details than previously documented in the literature. To elucidate the mechanisms behind this amplification, we develop a model that explicitly accounts for multiple dressed sidebands in the presence of a strong driving field. From this model, we derive the reflection amplitude of the probe signal. Notably, our results indicate that amplification can occur due to either population inversion or, in some instances, through the constructive interference of multiple sidebands even in the absence of population inversion. Additionally, we explore how qubit dephasing impacts the amplification process.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# 局所精製密度演算子と局所測定を用いたスケーラブル量子状態トモグラフィ Scalable Quantum State Tomography with Locally Purified Density Operators and Local Measurements ( http://arxiv.org/abs/2307.16381v2 ) ライセンス: Link先を確認	Yuchen Guo, Shuo Yang,	(参考訳) 量子システムを理解することは、量子ハードウェアとソフトウェアの性能の評価、および量子制御と量子センシングの探索において重要である。量子状態の効率的な表現により、最小限の測定で量子状態トモグラフィを実現することができる。本研究では,局所精製密度演算子による混合状態のテンソルネットワーク表現を用い,局所的な測定のみを必要とする古典的データ後処理アルゴリズムを用いる,状態トモグラフィの新しいアプローチを提案する。 1次元の純粋状態と混合状態,および最大$8\times 8$の2次元純粋状態の数値シミュレーションにより,提案手法の効率,精度,ロバスト性を実証した。 IBM と Quafu Quantum プラットフォームでの実験はこれらの数値シミュレーションを補完する。本研究は,テンソルネットワーク形式を用いた2次元システムの量子状態トモグラフィに新たな道を開く。 Understanding quantum systems is of significant importance for assessing the performance of quantum hardware and software, as well as exploring quantum control and quantum sensing. An efficient representation of quantum states enables realizing quantum state tomography with minimal measurements. In this study, we propose a new approach to state tomography that uses tensor network representations of mixed states through locally purified density operators and employs a classical data postprocessing algorithm requiring only local measurements. Through numerical simulations of one-dimensional pure and mixed states and two-dimensional pure states up to size $8\times 8$, we demonstrate the efficiency, accuracy, and robustness of our proposed methods. Experiments on the IBM and Quafu Quantum platforms complement these numerical simulations. Our study opens new avenues in quantum state tomography for two-dimensional systems using tensor network formalism.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# 高周波における半導体量子ドットの断熱を超えた量子アドミタンス:反射測定をポーラロンダイナミクスとして再考 Beyond-adiabatic Quantum Admittance of a Semiconductor Quantum Dot at High Frequencies: Rethinking Reflectometry as Polaron Dynamics ( http://arxiv.org/abs/2307.16725v5 ) ライセンス: Link先を確認	L. Peri, G. A. Oakes, L. Cochrane, C. J. B. Ford, M. F. Gonzalez-Zalba,	(参考訳) 動的に動作する半導体量子ドットは、量子センサやコンピュータなどの多くの量子技術の基礎となっている。したがって、マイクロ波周波数での電気特性のモデル化は、より大きな電子回路での性能をシミュレートするために不可欠である。そこで我々は,コヒーレント光子浴の影響下で電荷貯留層にトンネル結合した量子ドットのアドミタンスを得るために,自己整合型量子マスター方程式の定式化を開発する。既知の半古典的(熱的)極限に加えて,貯留層との結合の増大による寿命広がり領域,およびフォトニックドライブの振幅の増大によるパワー広がり領域への遷移を捉えたアドミタンスの一般表現を得る。さらに,光子が媒介する2つの新しい領域,すなわちQD状態のドレッシングによって決まるFloquet広がりと,システム内の光子損失によって決まる広がりについて述べる。本研究の結果は,QDの高周波挙動を幅広い極限でシミュレートする方法を提供し,過去の実験を記述し,QD-光子相互作用の新たな探索を提案する。 Semiconductor quantum dots operated dynamically are the basis of many quantum technologies such as quantum sensors and computers. Hence, modelling their electrical properties at microwave frequencies becomes essential to simulate their performance in larger electronic circuits. Here, we develop a self-consistent quantum master equation formalism to obtain the admittance of a quantum dot tunnel-coupled to a charge reservoir under the effect of a coherent photon bath. We find a general expression for the admittance that captures the well-known semiclassical (thermal) limit, along with the transition to lifetime and power broadening regimes due to the increased coupling to the reservoir and amplitude of the photonic drive, respectively. Furthermore, we describe two new photon-mediated regimes: Floquet broadening, determined by the dressing of the QD states, and broadening determined by photon loss in the system. Our results provide a method to simulate the high-frequency behaviour of QDs in a wide range of limits, describe past experiments, and propose novel explorations of QD-photon interactions.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# 知覚CLIP:コンテキストの推論と条件付けによる視覚的分類 PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts ( http://arxiv.org/abs/2308.01313v3 ) ライセンス: Link先を確認	Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang,	(参考訳) CLIPのような視覚言語モデルは、様々な視覚概念や自然言語の記述を理解する能力のため、ゼロショット画像分類で広く使われている。しかし、より優れたパフォーマンスを達成するために、CLIPの先例のない人間的な理解能力をフル活用する方法は、まだ未解決の課題である。本論文は,物体の視覚的知覚過程からインスピレーションを得たもので,まず,前景の物体を背景から分離し,その情報に基づいて対象物を分類する,文脈的属性(背景,方向など)を推定する。このことから,CLIPを文脈属性で提供することにより,ゼロショット画像の分類が向上し,スプリアス機能への依存が軽減されることがわかった。また、CLIP自体が画像から属性を合理的に推測できることも観察します。そこで本研究では,トレーニング不要で2段階のゼロショット分類手法PerceptionCLIPを提案する。画像が与えられたら、まずコンテキスト属性(例えば、背景)を推論し、それに基づいてオブジェクト分類条件を実行する。実験の結果,PerceptionCLIPはより優れた一般化,グループロバスト性,相互運用性を実現することがわかった。私たちのコードはhttps://github.com/umd-huang-lab/perceptionCLIPで利用可能です。 Vision-language models like CLIP are widely used in zero-shot image classification due to their ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better performance is still an open question. This paper draws inspiration from the human visual perception process: when classifying an object, humans first infer contextual attributes (e.g., background and orientation) which help separate the foreground object from the background, and then classify the object based on this information. Inspired by it, we observe that providing CLIP with contextual attributes improves zero-shot image classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. 
Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interoperability. Our code is available at https://github.com/umd-huang-lab/perceptionCLIP	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# DIG In:地理多様性指標を用いた画像生成の差異評価 DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity ( http://arxiv.org/abs/2308.06198v3 ) ライセンス: Link先を確認	Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano,	(参考訳) 最近のテキスト・ツー・イメージ生成システムによる前例のないフォトリアリスティックな結果と、プラグ・アンド・プレイのコンテンツ生成ソリューションとしての利用の増加により、それらの潜在的なバイアスを理解することが不可欠である。本研究では,世界からオブジェクトを生成するように促されたテキスト・ツー・イメージ生成システムの現実性,多様性,迅速な生成一貫性を評価するための3つの指標を提案する。視覚コンテンツ作成システムの構築に向けた重要なステップとして,地理的格差の自動的かつ効率的なベンチマークを可能にすることで,このようなシステムの広範な影響の質的分析を補完する。提案した指標を用いて,現在最先端のビジュアルコンテンツ生成システムにおける潜在的な地理的バイアスを分析し,(1) モデルがアフリカや西アジアに向けて欧州よりも現実性や世代多様性が低いこと,(2) 地理的情報によって生成した画像の一貫性と多様性の促進にコストがかかること,(3) モデルが他のオブジェクトよりも領域レベルの格差が大きいこと,などを見出した。おそらく最も興味深いのは、画像生成品質の進歩は、現実世界の地理的表現のコストがかかることを示唆している。包括的評価は、視覚コンテンツ制作のポジティブな体験を確保するための重要なステップである。 The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. 
Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.	翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
# 解毒剤の強化: ポイズニング攻撃に対するポイントワイズ認証の改善 Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks ( http://arxiv.org/abs/2308.07553v2 ) ライセンス: Link先を確認	Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein,	(参考訳) ポイズニング攻撃は、トレーニングコーパスに小さな変更を加えることで、モデルの振る舞いに不釣り合いに大きな影響を及ぼす可能性がある。特定のポイズニング攻撃に対する防御は存在するが、一般的には何の保証も与えず、新たな攻撃によって破られる可能性が残る。対照的に、認証された防御(Certified Defences)は最悪の場合の振る舞いを調べることで、有限個のトレーニングサンプルを変更する敵対的攻撃に対するサンプルの堅牢性を保証することを可能にし、これはポイントワイズ認証として知られている。これを実現するために、差分プライバシーとサンプリングガウス機構(Sampled Gaussian Mechanism)の両方を利用して、有限個の汚染サンプルに対する各テストインスタンスの予測の不変性を保証する。その結果、我々のモデルは、従来の認証の2倍以上の大きさの敵対的ロバスト性を保証する。 Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they in general do not provide any guarantees, leaving them potentially countered by novel attacks. In contrast, by examining worst-case behaviours Certified Defences make it possible to provide guarantees of the robustness of a sample against adversarial attacks modifying a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure the invariance of prediction for each testing instance against finite numbers of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# ユークリッド関数の最適化に関するワープ幾何情報 Warped geometric information on the optimisation of Euclidean functions ( http://arxiv.org/abs/2308.08305v2 ) ライセンス: Link先を確認	Marcelo Hartmann, Bernardo Williams, Hanlin Yu, Mark Girolami, Alessandro Barp, Arto Klami,	(参考訳) 多くの機械学習タスクにおける損失関数や統計的推論における確率分布の対数といった、潜在的に高次元ユークリッド空間で定義される実数値関数を最適化する基本的なタスクを考える。我々はリーマン幾何学の概念を用いてユークリッド空間上の函数の最適化問題を、歪んだ計量を持つリーマン多様体に再定義し、その多様体に沿った函数の最適性を求める。探索領域に選択された歪んだ計量は、多様体上の測地線曲線に付随する最適な探索方向を計算しやすくする計算フレンドリーな計量テンソルを誘導する。測地線に沿った最適化の実行は一般に不可能であることが知られているが、この特定の多様体ではテイラー近似を3階まで解析的に導出できることが示される。一般に、これらの測地線曲線への近似は多様体上には属さないが、多様体にそれらを引き戻すのに適した退化写像を構築する。したがって、近似測地線曲線に沿って効率的に最適化できる。関連する理論を網羅し、実用的な最適化アルゴリズムを記述し、挑戦的な最適化ベンチマークのコレクション上でそれを実証的に評価する。提案アルゴリズムは測地学の3次近似を用いており、収束するまでの反復数で標準ユークリッド勾配法よりも優れている傾向にある。 We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use Riemannian geometry notions to redefine the optimisation problem of a function on the Euclidean space to a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computational friendly metric-tensor for which optimal search directions associated with geodesic curves on the manifold becomes easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third-order. In general these approximations to the geodesic curve will not lie on the manifold, however we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. 
We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using 3rd-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in term of number of iterations until convergence.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# ユーザ反応予測のための時間的関心ネットワーク Temporal Interest Network for User Response Prediction ( http://arxiv.org/abs/2308.08487v2 ) ライセンス: Link先を確認	Haolin Zhou, Junwei Pan, Xinyi Zhou, Xihua Chen, Jie Jiang, Xiaofeng Gao, Guihai Chen,	(参考訳) オンラインディスプレイ広告のような産業レコメンデーションシステムでは,ユーザ反応の予測が不可欠である。レコメンデーションモデルのすべての機能の中で、ユーザの振る舞いが最も重要になります。多くの研究で、ユーザの行動は、行動と候補者の間の意味的あるいは時間的相関から、候補項目に対するユーザの関心を反映していることが明らかになっている。論文はそれぞれの相関関係を個別に検討しているが、研究者はまだそれらを意味的・時間的相関関係(意味的・時間的相関関係)と組み合わせて分析していない。我々はこの相関を経験的に測定し、直感的で頑健なパターンを観察する。そして、いくつかの人気ユーザー関心モデルを調べ、驚くべきことに、誰もそのような相関関係をうまく学ばないということに気付きました。このギャップを埋めるために,行動と対象間の意味的時間的相関を同時に捉えるための時間的関心ネットワーク(TIN)を提案する。これを実現するために,意味的エンコーディングに加えて,対象を意識したテンポラルエンコーディングを組み込んで行動や対象を表現する。さらに,ターゲット認識とターゲット認識表現を配置して,意味的・時間的相関を捉えることで,明示的な4方向インタラクションを行う。我々は2つの人気のある公開データセットに対して総合的な評価を行い、提案したTINはGAUCにおいてそれぞれ0.43%、0.29%で最高のパフォーマンスのベースラインを上回ります。 Tencentの広告プラットフォームにおけるオンラインA/Bテストでは、TINは1.65%のコストリフトと1.93%のGMVリフトを達成した。 2023年10月から運用に成功し、WeChat Momentsのトラフィックを処理した。コードをhttps://github.com/zhouxy1003/TINでリリースしました。 User response prediction is essential in industrial recommendation systems, such as online display advertising. Among all the features in recommendation models, user behaviors are among the most critical. Many works have revealed that a user's behavior reflects her interest in the candidate item, owing to the semantic or temporal correlation between behaviors and the candidate. While the literature has individually examined each of these correlations, researchers have yet to analyze them in combination, that is, the semantic-temporal correlation. We empirically measure this correlation and observe intuitive yet robust patterns. We then examine several popular user interest models and find that, surprisingly, none of them learn such correlation well. To fill this gap, we propose a Temporal Interest Network (TIN) to capture the semantic-temporal correlation simultaneously between behaviors and the target. 
We achieve this by incorporating target-aware temporal encoding, in addition to semantic encoding, to represent behaviors and the target. Furthermore, we conduct explicit 4-way interaction by deploying target-aware attention and target-aware representation to capture both semantic and temporal correlation. We conduct comprehensive evaluations on two popular public datasets, and our proposed TIN outperforms the best-performing baselines by 0.43% and 0.29% on GAUC, respectively. During online A/B testing in Tencent's advertising platform, TIN achieves 1.65% cost lift and 1.93% GMV lift over the base model. It has been successfully deployed in production since October 2023, serving the WeChat Moments traffic. We have released our code at https://github.com/zhouxy1003/TIN.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# サウジアラビアにおけるGoogle個人アカウント保有者のプライバシー認識と行動 Privacy Perceptions and Behaviors of Google Personal Account Holders in Saudi Arabia ( http://arxiv.org/abs/2308.10148v3 ) ライセンス: Link先を確認	Eman Alashwali, Lorrie Faith Cranor,	(参考訳) 西洋社会ではプライバシーに関する認識や行動が研究されているが、非西洋社会におけるこれらの問題についてはほとんど分かっていない。このギャップを埋めるために、私たちはサウジアラビアのGoogle個人アカウント保有者30人に、Googleが保存する活動データに関するプライバシーの認識と行動についてインタビューした。我々の研究は、GoogleがユーザーのWeb & App Activity、Location History、YouTube Historyを保存するかどうか、またどのように保存するかをユーザーが制御できるGoogleのActivity Controlsに焦点を当てている。我々の結果によると、ほとんどの参加者はGoogleのデータプラクティスやActivity Controlsについてある程度の認識を持っているが、多くは曖昧な認識しか持っておらず、大多数は利用可能なコントロールを使用したことがない。参加者が保存された自分の活動データを見たとき、多くの人は保存されていた内容に驚いた。多くの参加者は、提供されるサービスを改善するためにGoogleがデータを使用することを容認しているが、大多数は広告目的でのデータ使用を容認できないと考えている。サウジアラビアの参加者は、プライバシーに関する意識、態度、好み、懸念、行動において、米国の研究で見られたものと同様の傾向とパターンを示している。我々の結果は以下の必要性を強調している。 1) アカウント登録時にプライバシー設定をユーザーに知らせ,設定についてユーザーにリマインドし,プライバシー設定に対する意識を高める技術の改善。 2) 多くのユーザーが設定変更を思いとどまる原因となっているコストを削減するための,プライバシー設定インタフェースの改善。 3) 非西洋文化におけるプライバシーの懸念を探るさらなる研究。 While privacy perceptions and behaviors have been investigated in Western societies, little is known about these issues in non-Western societies. To bridge this gap, we interviewed 30 Google personal account holders in Saudi Arabia about their privacy perceptions and behaviors regarding the activity data that Google saves about them. Our study focuses on Google's Activity Controls, which enable users to control whether, and how, Google saves their Web & App Activity, Location History, and YouTube History. Our results show that although most participants have some level of awareness about Google's data practices and the Activity Controls, many have only vague awareness, and the majority have not used the available controls. When participants viewed their saved activity data, many were surprised by what had been saved. While many participants find Google's use of their data to improve the services provided to them acceptable, the majority find the use of their data for ad purposes unacceptable. 
We observe that our Saudi participants exhibit similar trends and patterns in privacy awareness, attitudes, preferences, concerns, and behaviors to what has been found in studies in the US. Our results emphasize the need for: 1) improved techniques to inform users about privacy settings during account sign-up, to remind users about their settings, and to raise awareness about privacy settings; 2) improved privacy setting interfaces to reduce the costs that deter many users from changing the settings; and 3) further research to explore privacy concerns in non-Western cultures.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# 物体検出における不確かさの校正評価のための理論的・実践的枠組み A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection ( http://arxiv.org/abs/2309.00464v2 ) ライセンス: Link先を確認	Pedro Conde, Rui L. Lopes, Cristiano Premebida,	(参考訳) ディープニューラルネットワークの普及により、機械学習システムは様々な現実世界のアプリケーションにますます存在感を増している。その結果,多くの領域において信頼性の高いモデルに対する需要が高まっており,深層学習の将来を考える上で,不確実性校正の問題が重要である。これは、自律運転、ロボット工学、医療診断などの安全上重要な応用に一般的に存在する物体検出システムを考えると特に当てはまる。そこで本研究では,不確実性校正の文脈において,物体検出システムを評価するための理論的,実践的な枠組みを提案する。これは、異なる形式的定義を通じてこの概念の新しい包括的定式化と、そのような理論の基礎から派生した3つの新しい評価指標を含む。提案した不確実性校正指標のロバスト性は, 一連の代表的な実験を通して示される。 The proliferation of Deep Neural Networks has resulted in machine learning systems becoming increasingly more present in various real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, making the problem of uncertainty calibration pivotal when considering the future of deep learning. This is especially true when considering object detection systems, that are commonly present in safety-critical applications such as autonomous driving, robotics and medical diagnosis. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. This encompasses a new comprehensive formulation of this concept through distinct formal definitions, and also three novel evaluation metrics derived from such theoretical foundation. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# LeBenchmark 2.0: フランス語の自己教師型表現のための標準化され、再現可能で拡張されたフレームワーク LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech ( http://arxiv.org/abs/2309.05472v2 ) ライセンス: Link先を確認	Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier,	(参考訳) 自己教師付き学習(SSL)は、コンピュータビジョンや自然言語処理など、多くの異なる領域において前例のない改善がなされている。現在のドメイン関連のタスクのほとんどは、事前トレーニングされたモデルでアプローチされているため、音声処理はSSLから大幅に恩恵を受けています。この研究は、SSL対応のフランス語音声技術の評価と構築のためのオープンソースのフレームワークであるLeBenchmark 2.0を紹介している。これには、最大14,000時間のヘテロジニアスなスピーチを含む文書化された大規模で異質なコーパス、コミュニティと共有される2600万から10億の学習可能なパラメータを含む10のトレーニング済みSSL wav2vec 2.0モデル、既存のベンチマークを補完する6つの下流タスクからなる評価プロトコルが含まれる。 LeBenchmark 2.0はまた、凍結した下流モデルと微調整された下流モデル、タスクに依存しないモデルとタスク固有の事前訓練モデル、および大規模モデルトレーニングの炭素フットプリントに関する議論を含む、スピーチのための事前訓練されたSSLモデルに関するユニークな視点を提示する。全体として、フランス語の14,000時間でトレーニングされた新しいモデルは、マルチリンガルと以前のLeBenchmark SSLモデルよりも優れていたが、事前トレーニングには最大4倍のエネルギーが必要だった。 Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. 
LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark but also required up to four times more energy for pre-training.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# (ほぼ)量子ベル不等式とデバイス非依存の応用 (Almost-)Quantum Bell Inequalities and Device-Independent Applications ( http://arxiv.org/abs/2309.06304v3 ) ライセンス: Link先を確認	Yuan Liu, Ho Yiu Chung, Ravishankar Ramanathan,	(参考訳) 近年、量子ベル不等式の導出を通じた量子相関集合の境界の研究が注目を集めている。これはツィレルソン問題と関連しており、デバイス非依存(DI)情報処理に重要な応用がある。しかし、量子ベル不等式を決定することは非常に難しい課題として知られ、孤立した例しか知られていない。本稿では、(ほぼ)量子ベル不等式の族を提示し、3つの基礎的応用およびDI応用に焦点を当てる。第一に、ノンシグナリング境界上の量子相関は、弱い乱数源からのDIランダムネス抽出において重要である。2人のプレイヤーがそれぞれ2つのk値測定を持つ実用的なベルシナリオにおいて、量子境界が次元 $\leq 4k-4$ のノンシグナリングポリトープの非局所的な面から分離されることを示す量子ベル不等式を導出し、従来の結果を拡張する。その直接の副産物として、量子系およびほぼ量子相関に対するオーマンの合意定理の一般的な証明を与える。これは、オーマンの合意定理が、認識論(epistemics)の文脈において、一般的なノンシグナリング理論から量子論とほぼ量子相関の両方を選び出す妥当な物理原理であることを意味する。第二に、2人のプレイヤーがm個の2値測定を持つシナリオにおける量子ベル不等式の族を提示する。これは2量子ビットのシングレットと2m個の測定の自己検証(self-test)に用いることができる。興味深いことに、この結果はTsirelson-Landau-Masanesによって発見された m=2 の結果を一般化し、最先端のDIRAに対する改善を示す。最後に、我々の量子ベル不等式を用いて、量子相関集合を特徴づける情報理論的原理である「非局所計算における優位性なし」の原理の一般形を導出する。これにより、これまでに知られている中で最も精密な量子境界の特徴づけを与える。 Investigations of the boundary of the quantum correlation set through the derivation of quantum Bell inequalities have gained increased attention in recent years, which are related to Tsirelson's problem and have significant applications in DI information processing. However, determining quantum Bell inequalities is a notoriously difficult task and only isolated examples are known. In this paper, we present families of (almost-)quantum Bell inequalities and highlight three foundational and DI applications. Firstly, quantum correlations on the non-signaling boundary are crucial in the DI randomness extraction from weak sources. In the practical Bell scenario of two players with two k-outcome measurements, we derive quantum Bell inequalities that show a separation of the quantum boundary from nonlocal faces of the non-signaling polytope of dimension $\leq 4k-4$, extending previous results. 
As an immediate by-product of this, we give a general proof of Aumann's Agreement theorem for quantum systems and the almost-quantum correlations, which implies Aumann's agreement theorem is a reasonable physical principle in the context of epistemics to pick out both quantum theory and almost-quantum correlations from general no-signaling theories. Secondly, we present a family of quantum Bell inequalities in the two players with m binary measurements scenarios, that serve to self-test the two-qubit singlet and 2m measurements. Interestingly, this claim generalizes the result for m=2 discovered by Tsirelson-Landau-Masanes and shows an improvement over the state-of-the-art DIRA. Lastly, we use our quantum Bell inequalities to derive the general form of the principle of no advantage in nonlocal computation, which is an information-theoretic principle that serves to characterize the quantum correlation set. With this, we provide the most precise characterization of the quantum boundary known so far.	翻訳日:2024-03-21 00:50:27 公開日:2024-03-18
# 変分量子ダイナミクスのためのオーバーヘッド制約付き回路ニッティング Overhead-constrained circuit knitting for variational quantum dynamics ( http://arxiv.org/abs/2309.07857v2 ) ライセンス: Link先を確認	Gian Gentinetta, Friederike Metz, Giuseppe Carleo,	(参考訳) 大規模量子系のダイナミクスをシミュレーションすることは、量子力学的現象をより深く理解するための困難かつ重要な課題である。量子コンピュータはそのようなシミュレーションを高速化する大きな可能性を秘めているが、その実用化は依然として限られた規模と広範なノイズによって妨げられている。本研究では,回路ニッティングを用いて大規模な量子系を,それぞれ個別のデバイスでシミュレート可能な小さなサブシステムに分割することで,これらの課題に対処する手法を提案する。系の時間発展は射影変分量子ダイナミクス(PVQD)アルゴリズムによって記述され,変分量子回路のパラメータに制約を加えることで,回路ニッティング方式が課すサンプリングオーバーヘッドを制御可能な範囲に保つ。強く相関したスピンからなるブロックが複数,互いに弱くエンタングルした量子スピン系で本手法を検証し,サンプリングオーバーヘッドを管理可能な範囲に保ちながらダイナミクスを正確にシミュレートできることを示す。さらに,同じ手法で長距離ゲートを切断することにより回路深さを削減できることを示す。 Simulating the dynamics of large quantum systems is a formidable yet vital pursuit for obtaining a deeper understanding of quantum mechanical phenomena. While quantum computers hold great promise for speeding up such simulations, their practical application remains hindered by limited scale and pervasive noise. In this work, we propose an approach that addresses these challenges by employing circuit knitting to partition a large quantum system into smaller subsystems that can each be simulated on a separate device. The evolution of the system is governed by the projected variational quantum dynamics (PVQD) algorithm, supplemented with constraints on the parameters of the variational quantum circuit, ensuring that the sampling overhead imposed by the circuit knitting scheme remains controllable. We test our method on quantum spin systems with multiple weakly entangled blocks each consisting of strongly correlated spins, where we are able to accurately simulate the dynamics while keeping the sampling overhead manageable. Further, we show that the same method can be used to reduce the circuit depth by cutting long-ranged gates.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# Beta Divergencesを用いたDeep Non negative Matrix Factorization Deep Nonnegative Matrix Factorization with Beta Divergences ( http://arxiv.org/abs/2309.08249v3 ) ライセンス: Link先を確認	Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta, Nicolas Gillis,	(参考訳) ディープ非負行列因子化(Deep Non negative Matrix Factorization, ディープNMF)は、最近、異なるスケールで複数の特徴層を抽出する貴重な手法として登場した。しかし、既存のディープNMFモデルとアルゴリズムは、主に最小二乗誤差に基づく評価を中心にしており、多様なデータセットの近似の質を評価するのに最適な指標ではないかもしれない。例えば、オーディオ信号やドキュメントなどのデータタイプを扱う場合、$\beta$-divergencesの方がより適切な代替手段を提供すると広く認識されている。本稿では,Kullback-Leiblerの発散に着目し,$\beta$-divergencesを用いて深部NMFの新しいモデルとアルゴリズムを開発する。その後,これらの手法を,顔の特徴抽出,文書コレクション内の話題の同定,ハイパースペクトル画像内の資料の同定に応用した。 Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
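As a hedged illustration of the $\beta$-divergences named in the deep NMF entry above: the sketch below uses the standard textbook definition of the element-wise $\beta$-divergence, with the Kullback-Leibler case at $\beta = 1$ and the Itakura-Saito case at $\beta = 0$. The function names `beta_divergence` and `total_divergence` are hypothetical, and this is not the authors' algorithm; it only shows the kind of error measure that replaces least squares.

```python
import math


def beta_divergence(x: float, y: float, beta: float) -> float:
    """Element-wise beta-divergence d_beta(x | y) for strictly positive x, y."""
    if x <= 0 or y <= 0:
        raise ValueError("this sketch assumes strictly positive inputs")
    if beta == 1:
        # Limit beta -> 1: (generalized) Kullback-Leibler divergence.
        return x * math.log(x / y) - x + y
    if beta == 0:
        # Limit beta -> 0: Itakura-Saito divergence.
        return x / y - math.log(x / y) - 1
    # General case; beta = 2 gives half the squared Euclidean distance.
    numerator = x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)
    return numerator / (beta * (beta - 1))


def total_divergence(X, Y, beta: float) -> float:
    """Sum of element-wise divergences over two equally shaped nested lists."""
    return sum(
        beta_divergence(x, y, beta)
        for row_x, row_y in zip(X, Y)
        for x, y in zip(row_x, row_y)
    )
```

With $\beta = 2$ the measure reduces to half the squared error, which is why least squares appears as one member of the family rather than a separate criterion.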
# 言語モデリングは圧縮である Language Modeling Is Compression ( http://arxiv.org/abs/2309.10668v2 ) ライセンス: Link先を確認	Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness,	(参考訳) 予測モデルが可逆(ロスレス)圧縮器に変換でき,その逆も成り立つことは,古くから知られている。ちなみに、近年、機械学習コミュニティは、ますます大きく強力な自己教師付き(言語)モデルのトレーニングに重点を置いている。これらの大規模言語モデルは印象的な予測能力を示すため、強力な圧縮器となる素地がある。本研究では,予測問題を圧縮の観点から捉えることを提唱し,大規模(基盤)モデルの圧縮能力を評価する。大規模言語モデルは強力な汎用予測器であり、圧縮の視点はスケーリング則、トークン化、文脈内学習に関する新しい洞察を与えることを示す。例えば、Chinchilla 70Bは、主にテキストで訓練されているにもかかわらず、ImageNetのパッチを元のサイズの43.4%、LibriSpeechのサンプルを16.4%に圧縮し、それぞれPNG(58.5%)やFLAC(30.3%)といったドメイン固有の圧縮器を上回る。最後に、予測と圧縮の等価性により、任意の圧縮器(gzipなど)を用いて条件付き生成モデルを構築できることを示す。 It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
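The prediction-compression link in the entry above can be sketched in a few lines: a symbol predicted with probability $p$ costs about $-\log_2 p$ bits under an ideal arithmetic coder, so a better predictor yields a shorter code. The sketch below is only an illustration under stated assumptions: the predictor is a Laplace-smoothed byte-frequency model (a stand-in chosen here, not the models evaluated in the paper), and `adaptive_code_length_bits` and `compression_rate` are hypothetical names.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length in bits of `data` under an adaptive order-0 byte model.

    Each byte costs -log2 p(byte | counts seen so far); an arithmetic coder
    can approach this total up to a small constant overhead.
    """
    counts = Counter()
    total_bits = 0.0
    for position, byte in enumerate(data):
        # Add-one (Laplace) smoothing over the 256 possible byte values.
        probability = (counts[byte] + 1) / (position + 256)
        total_bits += -math.log2(probability)
        counts[byte] += 1
    return total_bits


def compression_rate(data: bytes) -> float:
    """Compressed size divided by raw size (lower is better)."""
    if not data:
        raise ValueError("compression rate is undefined for empty input")
    return adaptive_code_length_bits(data) / (8 * len(data))
```

For example, `compression_rate(b"a" * 1000)` falls well below 1 because the model quickly learns the repeated byte, while input in which every byte is new to the model stays above 1. Percentages such as those quoted in the abstract are rates of this compressed-over-raw kind.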
# フュージョンスピン鎖上の量子セルオートマトンの指数 An index for quantum cellular automata on fusion spin chains ( http://arxiv.org/abs/2309.10961v2 ) ライセンス: Link先を確認	Corey Jones, Junhwi Lim,	(参考訳) 1次元量子セルオートマトン(QCA)のGNVW指数を部分因子のジョーンズ指数によって解釈すると、より一般的な抽象スピン鎖上のQCAに対して定義される指数の一般化が得られる。これにはフュージョンスピン鎖が含まれ、これは大域的(圏論的/MPO)対称性の下で不変な局所作用素として、また2次元トポロジカル符号の境界作用素として現れる。フュージョン圏 $\mathbf{Fib}$ から構成されるフュージョンスピン鎖に対して、この指数が有限深さ回路を法としたQCAの群の完全不変量であることを示す。 Interpreting the GNVW index for 1D quantum cellular automata (QCA) in terms of the Jones index for subfactors leads to a generalization of the index defined for QCA on more general abstract spin chains. These include fusion spin chains, which arise as the local operators invariant under a global (categorical/MPO) symmetry, and as the boundary operators of 2D topological codes. We show that for the fusion spin chains built from the fusion category $\mathbf{Fib}$, the index is a complete invariant for the group of QCA modulo finite depth circuits.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# 1次元を超える次元における自由フェルミオンの測定誘起相転移 Measurement-induced phase transition for free fermions above one dimension ( http://arxiv.org/abs/2309.12405v3 ) ライセンス: Link先を確認	Igor Poboiko, Igor V. Gornyi, Alexander D. Mirlin,	(参考訳) $d>1$ 次元の自由フェルミオン模型に対する測定誘起エンタングルメント相転移の理論を構築する。臨界点は,粒子数の2次キュムラントとエンタングルメントエントロピーが $\ell^{d-1} \ln \ell$ でスケールするギャップレス相と,$\ell^{d-1}$ でスケールする面積則相とを分ける。ここで $\ell$ はサブシステムのサイズである。この問題は、$R\to 1$ とした $d+1$ 次元のSU($R$)レプリカ非線形シグマ模型にマッピングされる。繰り込み群解析を用いて,$\epsilon \ll 1$ とした $d = 1+ \epsilon$ で正当化される1ループ近似で臨界指数を計算する。さらに、正方格子上の $d=2$ 模型における転移の数値的研究を行い、臨界点を数値的に決定し、相関長の臨界指数を $\nu \approx 1.4$ と推定する。 A theory of the measurement-induced entanglement phase transition for free-fermion models in $d>1$ dimensions is developed. The critical point separates a gapless phase with $\ell^{d-1} \ln \ell$ scaling of the second cumulant of the particle number and of the entanglement entropy and an area-law phase with $\ell^{d-1}$ scaling, where $\ell$ is a size of the subsystem. The problem is mapped onto an SU($R$) replica non-linear sigma model in $d+1$ dimensions, with $R\to 1$. Using renormalization-group analysis, we calculate critical indices in one-loop approximation justified for $d = 1+ \epsilon$ with $\epsilon \ll 1$. Further, we carry out a numerical study of the transition for a $d=2$ model on a square lattice, determine numerically the critical point, and estimate the critical index of the correlation length, $\nu \approx 1.4$.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# どこまで行くのか?人間とAIのコラボレーションから見たデータストーリーテリングツールを理解する Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration ( http://arxiv.org/abs/2309.15723v2 ) ライセンス: Link先を確認	Haotian Li, Yun Wang, Huamin Qu,	(参考訳) データストーリーテリングは、データの洞察を伝えるのに強力ですが、多様なスキルと人間の創造者によるかなりの努力が必要です。近年の研究では、人工知能(AI)がデータストーリーテリングにおいて人間を支援し、強化する可能性について広く研究されている。しかし、人間とAIのコラボレーションの観点からデータストーリーテリングツールを理解するための体系的なレビューがないため、研究者は人間の利点とAIの利点を促進し、その欠点を緩和する既存のコラボレーションツール設計を反映することを妨げている。本稿では, ストーリーテリング・ワークフローの段階, 分析, 計画, 実装, コミュニケーション, クリエータ, アシスタント, オプティマイザ, レビュアーなど, それぞれの段階における人間とAIの役割について検討した。分析を通じて,既存のツールの共通的なコラボレーションパターンを認識し,これらのパターンから学んだ教訓を要約し,データストーリーテリングにおける人間とAIのコラボレーション研究の機会について説明する。 Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there lacks a systematic review to understand data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigated existing tools with a framework from two perspectives: the stages in the storytelling workflow where a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# 騒音の多い農業環境における3次元再構築 : ビュープランニングのためのベイズ最適化の視点 3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning ( http://arxiv.org/abs/2310.00145v2 ) ライセンス: Link先を確認	Athanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson, Georgios B. Giannakis, Nikolaos Papanikolopoulos,	(参考訳) 3D再構築は、農業、水中、都市環境など、さまざまな実践的な環境において大きな影響を与えているため、ロボット工学の基本的な課題である。このタスクはビュープランニング(VP)を通じて行うことができ、これは視覚情報を最大化する位置に一定の数のカメラを最適に配置し、その結果の3D再構成を改善することを目的としている。しかし,実世界のほとんどの環境では,既存の環境騒音が3次元再構成の性能に大きな影響を及ぼす可能性がある。そこで本研究では, 閉形式表現を必要とせず, 既存の環境騒音を考慮に入れたVPの幾何的再構成品質関数を提案する。目的関数を解析的に表現することができないため,ノイズの存在下での高精度な3次元再構成のための適応ベイズ最適化アルゴリズムが提案される。騒音の多い農業環境における数値実験は, 少数のカメラを用いた3次元再構築手法の利点を実証するものである。 3D reconstruction is a fundamental task in robotics that gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, that accounts for the existing noise of the environment, without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# Error Norm Truncation:テキスト生成モデルにおけるデータノイズの存在下でのロバストトレーニング Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models ( http://arxiv.org/abs/2310.00840v2 ) ライセンス: Link先を確認	Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray,	(参考訳) テキスト生成モデルは、トレーニングデータのエラーに対して脆弱であることがよく知られている。Webクロールされた大量のデータが広く利用できるようになる中で,ノイズの多い大量のWebクロールテキストで訓練されたモデルの堅牢性をどのように向上できるだろうか。本研究では,ノイズの多いデータを切り捨てる,標準的な学習目的関数に対する頑健な拡張手法であるError Norm Truncation (ENT)を提案する。データ品質の推定に負の対数尤度損失のみを用いる手法と比較して、本手法は、従来の研究で見落とされがちな非ターゲットトークンの分布を考慮することで、より正確な推定を行う。言語モデリング,機械翻訳,テキスト要約にわたる包括的な実験を通じて,テキスト生成モデルにENTを組み込むことで,標準的な学習や従来のソフト・ハードトランケーション法よりも生成品質が向上することを示す。さらに,本手法は,機械翻訳において最も有害な2種類のノイズに対するモデルのロバスト性を向上させ,データに最大50%のノイズを加えた場合に,MLEベースラインに対して2以上のBLEUポイントの向上をもたらすことを示す。 Text generation models are notoriously vulnerable to errors in the training data. With the wide-spread availability of massive amounts of web-crawled data becoming more commonplace, how can we enhance the robustness of models trained on a massive amount of noisy web-crawled text? In our work, we propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data. Compared to methods that only uses the negative log-likelihood loss to estimate data quality, our method provides a more accurate estimation by considering the distribution of non-target tokens, which is often overlooked by previous work. Through comprehensive experiments across language modeling, machine translation, and text summarization, we show that equipping text generation models with ENT improves generation quality over standard training and previous soft and hard truncation methods. Furthermore, we show that our method improves the robustness of models against two of the most detrimental types of noise in machine translation, resulting in an increase of more than 2 BLEU points over the MLE baseline when up to 50% of noise is added to the data.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
# ImageNet-OOD: 現代のアウト・オブ・ディストリビューション検出アルゴリズムの解読 ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms ( http://arxiv.org/abs/2310.01755v2 ) ライセンス: Link先を確認	William Yang, Byron Zhang, Olga Russakovsky,	(参考訳) アウト・オブ・ディストリビューション(OOD)検出のタスクは、定義が曖昧なことで悪名高い。初期の研究は「セマンティックシフト(semantic shift)」とも呼ばれるラベル変更データ分布シフトを識別することを目的とした、新しいクラス検出に焦点を当てていた。しかし、最近の研究は、障害検出に焦点を当て、OOD評価フレームワークを拡張して、ラベル保存データ分布シフト("covariate shift"とも呼ばれる)を考慮に入れている。興味深いことに、この新たな枠組みの下では、これまで最先端と見なされていた複雑なOOD検出器が、単純な最大ソフトマックス確率ベースラインと同じような、あるいはさらに悪い性能を発揮する。最新のOOD検出器は実際に何を検出しているのか? OOD検出アルゴリズムの振る舞いを解読するには、セマンティックシフトと共変量シフトを分離する評価データセットが必要である。本研究では,共変量シフトの干渉を最小限に抑えるクリーンなセマンティックシフトデータセットであるImageNet-OODを提案する。総合的な実験を通して、OOD検出器は意味的シフトよりも共変量シフトに敏感であることが示され、最近のOOD検出アルゴリズムのセマンティックシフト検出に対する利点は最小限である。我々のデータセットと分析は、将来のOOD検出器の設計を導く上で重要な洞察を提供する。 The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate shift." Intriguingly, under this new framework, complex OOD detectors that were previously considered state-of-the-art now perform similarly to, or even worse than the simple maximum softmax probability baseline. This raises the question: what are the latest OOD detectors actually detecting? Deciphering the behavior of OOD detection algorithms requires evaluation datasets that decouple semantic shift and covariate shift. To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. Through comprehensive experiments, we show that OOD detectors are more sensitive to covariate shift than to semantic shift, and the benefits of recent OOD detection algorithms on semantic shift detection are minimal. 
Our dataset and analyses provide important insights for guiding the design of future OOD detectors.	翻訳日:2024-03-21 00:40:38 公開日:2024-03-18
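The maximum softmax probability baseline named in the ImageNet-OOD abstract is simple enough to sketch. The following is a minimal illustration, not the authors' evaluation code; the function names and the `threshold` of 0.5 are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability: the classifier's confidence in its
    top class, used directly as an in-distribution score."""
    return max(softmax(logits))


def is_ood(logits, threshold=0.5):
    """Flag an input as out-of-distribution when confidence is low.
    The threshold is a hypothetical value; in practice it is tuned."""
    return msp_score(logits) < threshold


confident = msp_score([10.0, 0.0, 0.0])  # peaked prediction
uncertain = msp_score([1.0, 1.0, 1.0])   # uniform prediction over 3 classes
```

Because the score depends only on classifier confidence, it reacts to any input that lowers confidence, whether the label set changed or only the image statistics did, which is the distinction the dataset is built to probe.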
# $\mathcal{B}$-Coder: プログラム合成のための価値に基づく深層強化学習 $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis ( http://arxiv.org/abs/2310.03173v2 ) ライセンス: Link先を確認	Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang,	(参考訳) プログラム合成は,問題仕様,特に文脈における自然言語記述から,正確な実行可能プログラムを作成することを目的としている。近年,大規模言語モデル(LLM)とともに強化学習(RL)の能力を活用し,コード生成能力を大幅に向上させている。 RLの応用は機能的正当性を直接最適化することに焦点を当て、従来の教師付き手法よりも有利である。ポリシーに基づくRL法は、プログラム合成のためのRLに関する文献を支配しているが、プログラム合成タスクの性質は、値ベースの方法と自然な整合性を示唆している。これは、人間のプログラマによって開発されたプログラムや歴史的なサンプルを含む、豊富なオフポリティプログラムの収集と、自動単体テストによる生成プログラムの直接的な検証から来ており、報酬は容易に得られることを意味する。ポリシーベースのアルゴリズムの優位性から、我々の研究は価値ベースのアプローチの実現可能性を探究し、$\mathcal{B}$-Coder(ベルマン・コーダ)の開発に繋がる。しかし,プログラム合成に固有の膨大な検索空間のために,価値に基づく学習手法が課題を呈している。そこで本研究では,事前学習されたLMと保守的なベルマン演算子を用いたRLエージェントの初期化プロトコルを導入し,学習の複雑さを低減した。さらに、学習した値関数を、生成したプログラムを後処理する双対戦略として活用する方法を実証する。実証評価では,ポリシベースの手法と比較して,最先端性能を実現するための$\mathcal{B}$-Coderの能力を実証した。注目すべきことに、この成果は最小限の報酬工学努力で達成され、報酬設計とは無関係に価値に基づくRLの有効性を強調している。 Program synthesis aims to create accurate, executable programs from problem specifications, specifically from natural language descriptions in our context. Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs), significantly enhancing code generation capabilities. The application of RL focuses on directly optimizing for functional correctness, offering an advantage over conventional supervised methods. Despite policy-based RL methods dominating the literature on RL for program synthesis, the nature of program synthesis tasks hints at a natural alignment with value-based methods. This stems from the rich collection of off-policy programs, including those developed by human programmers and also historical samples, coupled with the straightforward verification of generated programs through automated unit testing, meaning rewards are easy to obtain. 
Diverging from the dominant use of policy-based algorithms, our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder (pronounced Bellman coder). Yet, training value-based methods presents challenges due to the enormous search space inherent to program synthesis. To this end, we introduce an initialization protocol for RL agents utilizing pre-trained LMs and a conservative Bellman operator to reduce training complexities. Moreover, we demonstrate how to leverage the learned value functions as a dual strategy to post-process generated programs. Our empirical evaluations demonstrated $\mathcal{B}$-Coder's capability in achieving state-of-the-art performance when compared to policy-based methods. Remarkably, this achievement is reached with minimal reward engineering effort, highlighting the effectiveness of value-based RL, independent of reward designs.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-18
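To make the value-based framing in the $\mathcal{B}$-Coder abstract concrete, here is a toy tabular sketch of the standard Bellman optimality backup. It is not the paper's conservative Bellman operator, whose details the abstract does not give. The tiny "program" MDP is hypothetical: states are partial token strings, actions are next tokens, and a reward of 1 stands in for passing the unit tests.

```python
def bellman_backup(q, transitions, gamma=1.0):
    """One synchronous sweep of the standard Bellman optimality backup.

    transitions maps (state, action) -> (reward, next_state), where
    next_state is None when the episode (the program) is complete.
    """
    new_q = {}
    for (state, action), (reward, next_state) in transitions.items():
        if next_state is None:
            target = reward
        else:
            # Best value currently assigned to any action in the next state.
            next_values = [v for (s, _), v in q.items() if s == next_state]
            target = reward + gamma * max(next_values, default=0.0)
        new_q[(state, action)] = target
    return new_q


# Hypothetical two-token programs; only "a" followed by "x" passes the tests.
transitions = {
    ("", "a"): (0.0, "a"),
    ("", "b"): (0.0, "b"),
    ("a", "x"): (1.0, None),
    ("a", "y"): (0.0, None),
    ("b", "x"): (0.0, None),
    ("b", "y"): (0.0, None),
}

q = {key: 0.0 for key in transitions}
for _ in range(2):  # two sweeps suffice for this depth-2 problem
    q = bellman_backup(q, transitions)
```

After the sweeps, the terminal reward has propagated back to the first token, so the learned values can rank partial programs, which is the kind of post-processing use the abstract mentions.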
# 困難に適応した軌道マッチングによるロスレスデータセット蒸留に向けて Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ( http://arxiv.org/abs/2310.05773v2 ) ライセンス: Link先を確認	Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, Yang You,	(参考訳) Dataset Distillationの最終的な目標は、この合成セットでトレーニングされたモデルが、完全な実際のデータセットでトレーニングされたモデルと同等に機能するように、小さな合成データセットを合成することである。これまでのデータセット蒸留法は, 合成試料の総数が極端に少ない場合にのみ, 従来の方法が有効であることから, 完全に損失のない目標に達していない。このような少数のサンプルに十分な情報しか含められないため、真のロスレスなデータセット蒸留を実現するためには、合成データセットのサイズが大きくなるにつれて有効である蒸留法を開発する必要があると考えられる。本研究では,そのようなアルゴリズムを提示し,既存の手法が大規模で高品質な合成集合を生成できない理由を解明する。現在の最先端の手法は、軌道マッチングに依存するか、あるいは合成データを最適化して、実際のデータと同様の長期トレーニングダイナミクスを誘導する。実験により, 一致する軌道(早期または後期)の訓練段階が, 蒸留データセットの有効性に大きく影響していることが判明した。具体的には、(教師ネットワークが容易なパターンを学習する)初期の軌跡は、必要な情報を配布する事例が少ないため、低カルディナリティの合成セットとしてうまく機能する。逆に、(教師ネットワークがハードパターンを学習する)後期軌道は、必要な複雑なパターンを表現するのに十分なサンプルがあるため、より大きな合成セットに対してより良い信号を提供する。そこで本研究では,生成したパターンの難易度を合成データセットのサイズに合わせることを提案する。そこで我々は, トラジェクトリーマッチング法を大規模合成データセットに拡張し, ロスレスなデータセット蒸留を初めて達成した。コードと蒸留データセットはhttps://gzyaftermath.github.io/DATMで入手できる。 The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a model trained on this synthetic set will perform equally well as a model trained on the full, real dataset. Until now, no method of Dataset Distillation has reached this completely lossless goal, in part due to the fact that previous methods only remain effective when the total number of synthetic samples is extremely small. Since only so much information can be contained in such a small number of samples, it seems that to achieve truly lossless dataset distillation, we must develop a distillation method that remains effective as the size of the synthetic dataset grows. In this work, we present such an algorithm and elucidate why existing methods fail to generate larger, high-quality synthetic sets. Current state-of-the-art methods rely on trajectory-matching, or optimizing the synthetic data to induce similar long-term training dynamics as the real data. 
We empirically find that the training stage of the trajectories we choose to match (i.e., early or late) greatly affects the effectiveness of the distilled dataset. Specifically, early trajectories (where the teacher network learns easy patterns) work well for a low-cardinality synthetic set since there are fewer examples wherein to distribute the necessary information. Conversely, late trajectories (where the teacher network learns hard patterns) provide better signals for larger synthetic sets since there are now enough samples to represent the necessary complex patterns. Based on our findings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset. In doing so, we successfully scale trajectory matching-based methods to larger synthetic datasets, achieving lossless dataset distillation for the very first time. Code and distilled datasets are available at https://gzyaftermath.github.io/DATM.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-18
# 性能保証付きユニットコミット予測器:サポートベクトルマシン分類器 Unit Commitment Predictor With a Performance Guarantee: A Support Vector Machine Classifier ( http://arxiv.org/abs/2310.08601v2 ) ライセンス: Link先を確認	Farzaneh Pourahmadi, Jalal Kazempour,	(参考訳) システムオペレータは通常、計算の限られた時間枠内で大規模な単位コミットメント問題を解決する必要がある。本稿では,従来の単位のオン/オフ決定を学習し,予測することにより,システムオペレーターがソルバーをウォームスタートさせ,計算を著しく高速化できる可能性を示す。予測のために、線形およびカーネル化されたサポートベクタマシン分類器を訓練し、適切に正規化され、分散的に堅牢な分類器に変換された場合、サンプル外の性能保証を提供する。単位コミットメント問題に対して、混合整数二階コーン問題を解く。 IEEE 6-および118-busテストシステムに基づく結果,正規化を適切に行うカーネル化されたSVMは他の分類器よりも優れた性能を示し,計算時間を1.7倍に短縮した。さらに、厳密な計算限界が存在する場合、ウォームスタートなしの単位コミットメント問題は最適解から遠く離れている一方、ウォームスタートした場合は時間限界内で(ほぼ)最適に解ける。 The system operators usually need to solve large-scale unit commitment problems within a limited time frame for computation. This paper provides a pragmatic solution, showing how by learning and predicting the on/off commitment decisions of conventional units, there is a potential for system operators to warm start their solver and speed up their computation significantly. For the prediction, we train linear and kernelized support vector machine classifiers, providing an out-of-sample performance guarantee if properly regularized, converting to distributionally robust classifiers. For the unit commitment problem, we solve a mixed-integer second-order cone problem. Our results based on the IEEE 6- and 118-bus test systems show that the kernelized SVM with proper regularization outperforms other classifiers, reducing the computational time by a factor of 1.7. In addition, if there is a tight computational limit, while the unit commitment problem without warm start is far away from the optimal solution, its warm-started version can be solved to (near) optimality within the time limit.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-18
# 翻訳の文脈的リファインメント:文レベルと文書レベルの後編集のための大規模言語モデル Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing ( http://arxiv.org/abs/2310.14855v2 ) ライセンス: Link先を確認	Sai Koneru, Miriam Exel, Matthias Huck, Jan Niehues,	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクでかなりの成功を収めてきたが、ニューラル機械翻訳(NMT)では、まだ最先端のパフォーマンスを達成できていない。それでも、広範囲の理解と文脈処理を必要とするタスクにおける重要なパフォーマンスは、翻訳の可能性を示している。これらの能力を活かすために, LLMをMTに用いることを検討し, 最近のパラメータ効率の良い微調整技術を探る。驚くべきことに、私たちの最初の実験では、翻訳目的の微調整がパフォーマンスの低下につながることもわかりました。そこで本研究では,LLMを直接翻訳者ではなく自動後編集者 (APE) として適応するアプローチを提案する。長いシーケンスを処理・生成するLLMの卓越した能力に基づいて、文書レベルの翻訳へのアプローチの拡張も提案する。 APEにLow-Rank-Adapterの微調整を適用することで、文レベルと文書レベルの両方のメトリクスが大幅に改善され、ドメイン外データへの一般化が期待できることを示す。最も顕著なのは、ContraProテストセットで89%という最先端の精度を実現したことであり、このテストセットは特に、英語からドイツ語への翻訳において、代名詞のあいまいさを解消する能力を評価するものである。最後に、参照コンテキストが利用可能となる文書レベルの翻訳を手作業で後編集する実践シナリオについて検討する。ここでは、人間の修正を活用することで、後続の翻訳に必要な編集回数を大幅に削減できることを実証する(手動フィードバックを統合するInteractive Demoは、https://huggingface.co/spaces/skoneru/contextual_refinement_endeを参照)。 Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on the LLM's exceptional ability to process and generate lengthy sequences, we also propose extending our approach to document-level translation. 
We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (Interactive Demo for integrating manual feedback can be found here: https://huggingface.co/spaces/skoneru/contextual_refinement_ende).	翻訳日:2024-03-21 00:30:47 公開日:2024-03-18
# 3次元マスク付きオートエンコーダを用いたMRIスキャンのプライバシー保護 Privacy Protection in MRI Scans Using 3D Masked Autoencoders ( http://arxiv.org/abs/2310.15778v3 ) ライセンス: Link先を確認	Lennart Alexander Van der Goten, Kevin Smith,	(参考訳) MRIスキャンは貴重な医療情報を提供するが、保護すべき機密情報や個人識別情報も含む。 MRIメタデータは容易にサニタイズされるが、MRI画像データは患者の頭部の高現実的な3Dヴィジュアライゼーションをレンダリングする情報を含んでいるため、データベースを相互参照することで、悪意あるアクターが被検体を特定できるため、プライバシのリスクである。データ匿名化と非識別化は、個人の個人情報のプライバシーと機密性の確保に関係している。従来のMRIの非識別方法は、特定のスキャンからプライバシーに敏感な部分(目、鼻など)を取り除く。これは、下流の分析を狂わせかねないドメインシフトを招くという代償を伴う。本研究では,マスク付きオートエンコーダを用いて部品を除去する代わりに,顔のリモデリング(例えば顔の変更)によって顔を識別するCP-MAEを提案する。 CP-MAEは、ダウンストリームタスクのパフォーマンスと非識別の観点から、以前のアプローチよりも優れています。我々の方法では、解像度が最大$256^3$までの高忠実度スキャンを合成できるが、従来の手法では$128^3$であるのに対し、ボクセルの数は8倍に増加する。 MRI scans provide valuable medical information, however they also contain sensitive and personally identifiable information that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains information to render highly-realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification are concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g. eyes, nose etc.) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. In this work, we propose CP-MAE, a model that de-identifies the face by remodeling it (e.g. changing the face) rather than by removing parts using masked autoencoders. CP-MAE outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize high-fidelity scans of resolution up to $256^3$ (compared to $128^3$ with previous approaches), which constitutes an eight-fold increase in the number of voxels.	翻訳日:2024-03-21 00:30:47 公開日:2024-03-18
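The eight-fold figure quoted in the CP-MAE abstract follows directly from the two cubic resolutions. As a short worked check, using only the numbers stated in the abstract:

```latex
\[
\frac{256^3}{128^3} = \left(\frac{256}{128}\right)^{3} = 2^{3} = 8,
\qquad
256^3 = 16{,}777{,}216 \ \text{voxels}, \quad 128^3 = 2{,}097{,}152 \ \text{voxels}.
\]
```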
# IIDウェイトを超えて:スパースと低ランクのディープニューラルネットワークもガウス的プロセスである Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes ( http://arxiv.org/abs/2310.16597v3 ) ライセンス: Link先を確認	Thiziri Nait-Saada, Alireza Naderi, Jared Tanner,	(参考訳) 無限に広いニューラルネットワークは、ディープラーニングに現れる多くの現象の理解を可能にする、有用で管理可能な数学的モデルであることが証明されている。例えば、ランダムディープネットワークをガウス過程に収束させることで、活性化関数とネットワークウェイトの選択がトレーニング力学にどのように影響するかを厳密に分析することができる。本稿では, Matthews et al (2018) の初歩的な証明を, IID や直交重みの確立した事例を含むより大規模な初期重量分布(PSEUDO-IID と呼ぶ)に拡張するとともに, 計算速度向上のために, 新たな低ランクで構造化されたスパースな設定を行う。また,PSEUDO-IID分布を初期化した完全接続型・畳み込み型ネットワークは,その分散により有効に等価であることを示す。この結果を用いて、ニューラルネットワークの幅広いクラスに対してEdge-of-Chaosを識別し、トレーニングを強化するために臨界度で調整することができる。さらに、ベイズニューラルネットワークの後方分布をこれらの様々な初期化スキームで引き出せるようにしている。 The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training. Moreover, they enable the posterior distribution of Bayesian Neural Networks to be tractable across these various initialization schemes.	翻訳日:2024-03-21 00:20:56 公開日:2024-03-18
# AIによる意思決定におけるインタラクションパターンの分類 : 体系的なレビューから Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review ( http://arxiv.org/abs/2310.19778v3 ) ライセンス: Link先を確認	Catalina Gomez, Sue Min Cho, Shichang Ke, Chien-Ming Huang, Mathias Unberath,	(参考訳) 意思決定支援システムにおける人工知能(AI)の活用は、しばしばアルゴリズムの出力と人間の期待の一致を見越して、技術進歩に不相応に焦点を合わせてきた。人間中心の視点は、既存のプロセスとのシームレスな統合のためにAIソリューションを設計することで、この懸念を緩和しようとする。 AIが人間を助けるために提供すべき情報を決定することは不可欠である。しかし、情報がどのように提示されるか、例えば、レコメンデーションのシーケンスと解釈のソリケーションは、人間とAIの間の複雑な相互作用が出現する可能性があるため、同様に重要である。実証的研究は、ドメイン間の人間とAIのダイナミクスを評価してきたが、人間とAIのインタラクションプロトコルの共通語彙は欠如している。インタラクションデザインのより慎重な考察を促進するために,人間とAIのインタラクションの様々なモードを規定するインタラクションパターンの分類を導入する。本稿では,AIによる意思決定文献の体系的レビューの結果を要約し,アプリケーションドメイン間でのインタラクションのトレンドと機会を105記事から抽出する。現在のインタラクションは、単純化されたコラボレーションパラダイムによって支配されており、真のインタラクティブな機能はほとんどサポートされません。我々の分類学は、意思決定におけるAIとの相互作用を理解するツールを提供し、コミュニケーション、信頼性、コラボレーションを明確化するための相互作用設計を育む。 Leveraging Artificial Intelligence (AI) in decision support systems has disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. A human-centered perspective attempts to alleviate this concern by designing AI solutions for seamless integration with existing processes. Determining what information AI should provide to aid humans is vital, a concept underscored by explainable AI's efforts to justify AI predictions. However, how the information is presented, e.g., the sequence of recommendations and solicitation of interpretations, is equally crucial as complex interactions may emerge between humans and AI. While empirical studies have evaluated human-AI dynamics across domains, a common vocabulary for human-AI interaction protocols is lacking. To promote more deliberate consideration of interaction designs, we introduce a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. 
We summarize the results of a systematic review of AI-assisted decision making literature and identify trends and opportunities in existing interactions across application domains from 105 articles. We find that current interactions are dominated by simplistic collaboration paradigms, leading to little support for truly interactive functionality. Our taxonomy offers a tool to understand interactivity with AI in decision-making and foster interaction designs for achieving clear communication, trustworthiness, and collaboration.	翻訳日:2024-03-21 00:20:56 公開日:2024-03-18
# 因果介入による移動予測ネットワークの行動への影響の解明 Revealing behavioral impact on mobility prediction networks through causal interventions ( http://arxiv.org/abs/2311.11749v2 ) ライセンス: Link先を確認	Ye Hong, Yanan Xin, Simon Dirmeier, Fernando Perez-Cruz, Martin Raubal,	(参考訳) ディープニューラルネットワークは、モビリティ予測タスクにますます活用されているが、その複雑な内部動作は、特にモビリティ行動の様々な側面が予測にどのように影響するかを理解する際に、解釈可能性に課題をもたらす。本研究では、次の位置予測のために設計されたニューラルネットワークに対する移動関連要因の影響を評価するための因果介入フレームワークを紹介する。これを実現するために,個別の移動モデルを用いて,データ生成プロセスに介入して,合成位置情報シーケンスを生成し,動作のダイナミクスを制御する。移動度測定値を用いて介入位置列を評価し、よく訓練されたネットワークに入力し、性能変動を分析する。その結果, 異なる移動行動を伴う位置列の生成の有効性が示され, 多様な空間的・時間的変化のシミュレーションが容易となった。これらの変化は、次の位置予測ネットワークのパフォーマンス変動をもたらし、位置遷移のシーケンシャルなパターン、新しい位置を探索する確率、人口と個人レベルの位置選択の好みなど、重要な移動行動要因の影響を明らかにする。得られた知見は、モビリティ予測ネットワークの現実的な応用に重要な価値を持ち、このフレームワークは、モビリティアプリケーションにおけるニューラルネットワークの解釈可能性と堅牢性を高めるための因果推論の利用を促進することが期待されている。 Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction -- a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to generate synthetic location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness in producing location sequences with distinct mobility behaviors, thereby facilitating the simulation of diverse yet realistic spatial and temporal changes. 
These changes result in performance fluctuations in next location prediction networks, revealing impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold significant value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference for enhancing the interpretability and robustness of neural networks in mobility applications.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-18
# 質と量:ファッションデザインにおけるテキストと画像の合成のための何百万もの高品質な画像 Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design ( http://arxiv.org/abs/2311.12067v3 ) ライセンス: Link先を確認	Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan,	(参考訳) AIとファッションデザインの融合は、有望な研究分野として浮上している。しかし、衣料品や試着段階に関する広範な相互関連データが欠如しているため、この領域におけるAIの潜在能力は損なわれている。これに対応するために、我々は、数年にわたる厳格な取り組みの産物であるFashion-Diffusionデータセットを提示する。このデータセットは、最初のもので、100万以上の高品質なファッションイメージで構成され、詳細なテキスト記述と組み合わせている。さまざまな地理的な場所と文化的背景から得られたデータセットは、世界的なファッショントレンドをカプセル化している。この画像には、衣服や人間に関連する細かい属性が刻まれており、ファッションデザインプロセスを単純化してテキスト・ツー・イメージ(T2I)タスクにしている。 Fashion-Diffusionデータセットは、高品質なテキストイメージペアと多様なヒューマンガーメントペアを提供するだけでなく、人間に関する大規模なリソースとしても機能し、T2I世代の研究を促進する。さらに、T2Iに基づくファッションデザイン分野の標準化を促進するために、ファッションデザインモデルの性能を評価するために、複数のデータセットからなる新しいベンチマークを提案する。この研究は、AI駆動のファッションデザインの領域における大きな飛躍であり、この分野における将来の研究のための新しい標準を確立している。 The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. 
Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-18
# 学習したフォワード演算子の逆問題 Inverse Problems with Learned Forward Operators ( http://arxiv.org/abs/2311.12528v2 ) ライセンス: Link先を確認	Simon Arridge, Andreas Hauptmann, Yury Korolev,	(参考訳) 逆問題の解決にはフォワード演算子の知識が必要だが、正確なモデルは計算コストがかかるため、復元品質を損なわないより安価な変種が望まれる。本章は、2つの異なるパラダイムに従う学習前方演算子による逆問題における再構成手法についてレビューする。 1つ目は、フォワード演算子に完全に依存せず、トレーニングデータにまたがる部分空間に対する制限を学習する。射影による正規化の枠組みは、再構成を見つけるために使われる。 2つ目は、測定プロセスの物理の単純化されたモデルを使用し、モデルの修正を学習するためにトレーニングデータのみに依存する。これら2つのアプローチの理論を数値的に比較する。両方のメソッドは、フォワード演算子だけでなく、アジョイントのためにもトレーニングデータを必要とする。 Solving inverse problems requires the knowledge of the forward operator, but accurate models can be computationally expensive and hence cheaper variants that do not compromise the reconstruction quality are desired. This chapter reviews reconstruction methods in inverse problems with learned forward operators that follow two different paradigms. The first one is completely agnostic to the forward operator and learns its restriction to the subspace spanned by the training data. The framework of regularisation by projection is then used to find a reconstruction. The second one uses a simplified model of the physics of the measurement process and only relies on the training data to learn a model correction. We present the theory of these two approaches and compare them numerically. A common theme emerges: both methods require, or at least benefit from, training data not only for the forward operator, but also for its adjoint.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-18
# RLIF: 強化学習としてのインタラクティブな模倣学習 RLIF: Interactive Imitation Learning as Reinforcement Learning ( http://arxiv.org/abs/2311.12996v2 ) ライセンス: Link先を確認	Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine,	(参考訳) 強化学習手法は、自動スキル獲得のための強力なフレームワークを提供するが、ロボット工学のような分野における実践的な学習ベースの制御問題に対して、模倣学習はより便利でアクセスしやすい代替手段を提供することが多い。特に, DAggerなどのインタラクティブな模倣学習手法では, 最適に近い専門家にオンラインで介入を依頼して, ナイーブな行動クローニングを悩ませる分布シフトの問題に対処するための修正データを収集することで, 手動で指定した報酬関数や完全な強化学習手法の他の構成要素を必要とせずに, 理論と実践の両方で優れた性能を得ることができる。 本稿では,対話型模倣学習と類似するが,さらに実践的な仮定の下で,非政治強化学習がパフォーマンス向上を実現する方法について検討する。提案手法は,ユーザ介入信号を用いた強化学習を報奨として利用する。このことは、インタラクティブな模倣学習において介入する専門家がほぼ最適であるべきだという仮定を緩和し、アルゴリズムが潜在的に最適でない人間の専門家よりも改善される行動を学ぶことを可能にする。また,RL法とDAggerを統一的に解析するためのフレームワークも提供し,本手法の非漸近的サンプル複雑性境界だけでなく,両手法の最適下界の漸近的解析について述べる。次に,実世界のロボットビジョンに基づく操作タスクと同様に,高次元連続制御シミュレーションベンチマークの課題に対する評価を行った。結果は,特に介入する専門家が最適でない場合には,DAggerのようなアプローチよりも優れていることを示す。コードとビデオはプロジェクトのWebサイト(https://rlif-page.github.io)で見ることができる。 Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correction data for addressing the distributional shift challenges that afflict naive behavioral cloning, can enjoy good performance both in theory and practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. 
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger; for which we present the asymptotic analysis of the suboptimal gap for both methods as well as the non-asymptotic sample complexity bound of our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io	翻訳日:2024-03-21 00:11:07 公開日:2024-03-18
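The phrase "user intervention signals themselves as rewards" in the RLIF abstract can be illustrated with a small relabeling sketch. The abstract does not state the reward values, so this assumes a penalty of -1 on a step that triggers an intervention and 0 otherwise; the function name, the dictionary keys and the toy trajectory are all hypothetical.

```python
def relabel_with_interventions(trajectory, penalty=-1.0):
    """Turn a logged trajectory into (state, action, reward) tuples, using
    only whether the human intervened after each step as the reward signal.
    No task-specific reward function is consulted."""
    transitions = []
    for step in trajectory:
        reward = penalty if step["intervened"] else 0.0
        transitions.append((step["state"], step["action"], reward))
    return transitions


# Hypothetical log: the expert took over right after the second action.
trajectory = [
    {"state": 0, "action": "forward", "intervened": False},
    {"state": 1, "action": "left", "intervened": True},
    {"state": 2, "action": "forward", "intervened": False},
]

relabeled = relabel_with_interventions(trajectory)
```

The relabeled tuples could then be fed to any off-policy learner, which under these assumptions would learn to avoid the actions that prompt interventions rather than to copy the expert's corrections.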
# D-SCo:単眼ハンドヘルド物体再構成のためのデュアルストリーム条件拡散 D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction ( http://arxiv.org/abs/2311.14189v2 ) ライセンス: Link先を確認	Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari,	(参考訳) 単一のRGB画像からハンドヘルドオブジェクトを再構築することは、コンピュータビジョンにおいて難しい課題である。決定論的モデリングのパラダイムを利用する先行研究とは対照的に、この問題の確率論的性質を考慮に入れた点群デノイジング拡散モデルを用いる。中核部では,単眼ハンドヘルドオブジェクト再構成(D-SCo)のための重心固定型二重ストリーム条件拡散を導入し,二つの課題に対処した。まず,物体の重心のずれを回避するため,手による制約を用いた新しい重心固定パラダイムを用い,拡散・逆過程の安定性と特徴投影の精度を向上させる。第2に,新しい手オブジェクトセマンティック埋め込みによる手オブジェクトのセマンティックな相互作用を意味的かつ幾何学的にモデル化し,手対象領域の再構築性能を向上させるために,デュアルストリームデノイザを導入する。 合成データセットObManと、HO3D、MOW、DexYCBの3つの実世界のデータセットの実験は、我々のアプローチが他の最先端の手法を全て超えることを示した。コードはリリースされる。 Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. At its core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction (D-SCo), tackling two predominant challenges. First, to prevent the object centroid from deviating, we utilize a novel hand-constrained centroid fixing paradigm, enhancing the stability of diffusion and reverse processes and the precision of feature projection. Second, we introduce a dual-stream denoiser to semantically and geometrically model hand-object interactions with a novel unified hand-object semantic embedding, enhancing the reconstruction performance of the hand-occluded region of the object. Experiments on the synthetic ObMan dataset and three real-world datasets HO3D, MOW and DexYCB demonstrate that our approach can surpass all other state-of-the-art methods. Code will be released.	翻訳日:2024-03-21 00:11:07 公開日:2024-03-18
# 時系列予測のためのモジュールニューラルネットワーク:注意を用いた解釈可能性と特徴選択 Modular Neural Networks for Time Series Forecasting: Interpretability and Feature Selection using Attention ( http://arxiv.org/abs/2311.16834v3 ) ライセンス: Link先を確認	Qiqi Su, Christos Kloukinas, Artur d'Avila Garcez,	(参考訳) 多変量時系列は、医療や気象学から生命科学まで、多くの応用がある。ディープラーニングモデルは時系列で優れた予測性能を示してきたが、彼らは「ブラックボックス」か非解釈可能であると批判されてきた。本稿では,構築によって解釈可能な多変量時系列予測のための新しいモジュール型ニューラルネットワークモデルを提案する。リカレントニューラルネットワークはデータ内の時間的依存関係を学習し、アテンションベースの特徴選択コンポーネントは最も関連性の高い特徴を選択し、時間的依存関係の学習に使用される冗長な特徴を抑制する。モジュール型のディープネットワークは、選択した機能から独立してトレーニングされ、ユーザーが機能がどのように結果に影響を与えるかを示し、モデルを解釈できる。実験結果から,本手法は,時系列タスクの回帰と分類の両方において,最先端の非解釈可能な手法であるLSTM,XGBoostに匹敵する予測性能を達成し,最先端の解釈可能なニューラル付加モデル(NAM)およびそれらのバリエーションより優れていることが示された。 Multivariate time series have many applications, from healthcare and meteorology to life science. Although deep learning models have shown excellent predictive performance for time series, they have been criticised for being "black-boxes" or non-interpretable. This paper proposes a novel modular neural network model for multivariate time series prediction that is interpretable by construction. A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features and suppresses redundant features used in the learning of the temporal dependencies. A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable. Experimental results show that this approach can outperform state-of-the-art interpretable Neural Additive Models (NAM) and variations thereof in both regression and classification of time series tasks, achieving a predictive performance that is comparable to the top non-interpretable methods for time series, LSTM and XGBoost.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-18
# 多項信念ネットワーク Multinomial belief networks ( http://arxiv.org/abs/2311.16909v2 ) ライセンス: Link先を確認	H. C. Donker, D. Neijzen, J. de Jong, G. A. Lunter,	(参考訳) 機械学習に対するベイズ的アプローチは、不確実性を定量化したり、観察の欠如に対処したり、サンプルが不足したり、データがスパースである場合に魅力的である。これらの全ては、医療データを分析する際に一般的に適用される。これらの解析的要求に対処するために,ネットワークの重みと隠れた単位の両方をディリクレ分布とする多項数データの深部生成モデルを提案する。 Gibbsサンプリング手順は、Zhou-Cong-Chenモデルに類似した一連の拡張関係を利用する。本モデルは,小さな手書き数字と癌DNA変異の大規模な実験データセットに適用し,そのモデルが生物学的に意味のあるメタシグナチャを完全データ駆動で抽出できることを示す。 A Bayesian approach to machine learning is attractive when we need to quantify uncertainty, deal with missing observations, when samples are scarce, or when the data is sparse. All of these commonly apply when analysing healthcare data. To address these analytical requirements, we propose a deep generative model for multinomial count data where both the weights and hidden units of the network are Dirichlet distributed. A Gibbs sampling procedure is formulated that takes advantage of a series of augmentation relations, analogous to the Zhou-Cong-Chen model. We apply the model to small handwritten digits, and a large experimental dataset of DNA mutations in cancer, and we show how the model is able to extract biologically meaningful meta-signatures in a fully data-driven way.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-18
# COLE:多層・編集可能なグラフィクス設計のための階層型生成フレームワーク COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design ( http://arxiv.org/abs/2311.16974v2 ) ライセンス: Link先を確認	Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo,	(参考訳) 15世紀から進化してきたグラフィックデザインは、広告において重要な役割を担っている。高品質な設計を作成するには、設計指向の計画、推論、レイヤワイズ生成が必要である。 GPT-4を既存のデザインテンプレートと統合して独自のGPTを構築するCanvaGPTとは異なり、本研究ではこれらの課題に包括的に対処するために設計された階層型生成フレームワークであるCOLEシステムを紹介する。このCOLEシステムは、曖昧な意図のプロンプトを高品質な多層グラフィック設計に変換すると同時に、ユーザ入力に基づく柔軟な編集をサポートする。このような入力の例としては、久石の演奏会の「ポスターをデザインする」などの指示がある。重要な洞察は、テキスト・デザイン生成の複雑なタスクを単純なサブタスクの階層に分解することであり、それぞれが協調して動作する専門モデルによって対処される。これらのモデルの結果は、結合的な最終的な出力を生成するために統合される。我々の階層的なタスク分解は、複雑なプロセスを合理化し、生成信頼性を大幅に向上させることができる。我々のCOLEシステムは、複数の微調整されたLarge Language Model(LLM)、Large Multimodal Model(LMM)、Diffusion Models(DM)から構成される。さらに,ユーザ意図から高品質なグラフィックデザインを生成する上で,既存の手法よりもCOLEシステムの方が優れていることを示すために,DESIGNINTENTIONベンチマークを構築した。最後に、生成した多層グラフィック画像のフレキシブルな編集を支援するCanvaのような多層画像編集ツールを提案する。我々はCOLEシステムを、より複雑で多層的なグラフィックデザイン生成タスクに今後取り組むための重要なステップとして捉えている。 Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. The creation of high-quality designs demands design-oriented planning, reasoning, and layer-wise generation. Unlike the recent CanvaGPT, which integrates GPT-4 with existing design templates to build a custom GPT, this paper introduces the COLE system - a hierarchical generation framework designed to comprehensively address these challenges. This COLE system can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input. 
Examples of such input might include directives like ``design a poster for Hisaishi's concert.'' The key insight is to dissect the complex task of text-to-design generation into a hierarchy of simpler sub-tasks, each addressed by specialized models working collaboratively. The results from these models are then consolidated to produce a cohesive final output. Our hierarchical task decomposition can streamline the complex process and significantly enhance generation reliability. Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text. Furthermore, we construct the DESIGNINTENTION benchmark to demonstrate the superiority of our COLE system over existing methods in generating high-quality graphic designs from user intent. Last, we present a Canva-like multi-layered image editing tool to support flexible editing of the generated multi-layered graphic design images. We perceive our COLE system as an important step towards addressing more complex and multi-layered graphic design generation tasks in the future.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-18
# 超振動のキャラクタリゼーションと定量化のための一提案 A proposal to characterize and quantify superoscillations ( http://arxiv.org/abs/2311.17703v2 ) ライセンス: Link先を確認	Yu Li, José Polo-Gómez, Eduardo Martín-Martínez,	(参考訳) 超振動関数の形式的定義を示す。これまでに提案された定義の限界について議論し、超振動挙動の全域をカバーしていないことを示す。本稿では,従来の定義を含まないよく知られた超振動関数の例を用いて,提案手法の適合性を実証する。 We present a formal definition of superoscillating function. We discuss the limitations of previously proposed definitions and illustrate that they do not cover the full gamut of superoscillatory behaviours. We demonstrate the suitability of the new proposal with several examples of well-known superoscillating functions that were not encompassed by previous definitions.	翻訳日:2024-03-21 00:01:19 公開日:2024-03-18
# SlimSAM: 0.1%のデータでSegment Anythingがスリムになる SlimSAM: 0.1% Data Makes Segment Anything Slim ( http://arxiv.org/abs/2312.05284v3 ) ライセンス: Link先を確認	Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang,	(参考訳) SAM(Segment Anything Model)を圧縮するための現在のアプローチは優れた結果をもたらすが、スクラッチから新しいネットワークをトレーニングするために広範なデータを必要とする。従来のプルーニング技術を用いることで、データ要求を大幅に削減できるが、性能の低下に悩まされる。このトレードオフに対処するため、本研究では、極めて少ないトレーニングデータで優れた性能を達成するデータ効率のよいSAM圧縮手法であるSlimSAMを導入する。 SlimSAMの本質は、極めて限られたトレーニングデータと極めて高いプルーニング率の下で知識継承を効果的に強化する、交互スリム化(alternate slimming)フレームワークにある。従来の手法とは異なり、我々のフレームワークは、分離された個別のサブ構造を交互にプルーニングおよび蒸留することによって、モデルを段階的に圧縮する。また、プルーニングの目的と訓練目標との不整合に対処し、プルーニング後の蒸留を促進するため、Disturbed Taylor pruningも提案する。 SlimSAMは、既存のどの圧縮手法と比べても10分の1以下のトレーニングデータしか必要とせずに、大幅な性能向上を実現する。オリジナルのSAMと比較しても、SlimSAMはパラメータ数をわずか1.4% (9.1M)、MACを0.8% (23G) に削減し、SAMトレーニングデータの0.1% (10k) しか必要としないにもかかわらず、それに迫る性能を達成する。コードはhttp://github.com/czg1225/SlimSAMで入手できる。 Current approaches for compressing the Segment Anything Model (SAM) yield commendable results, yet necessitate extensive data to train a new network from scratch. Employing conventional pruning techniques can remarkably reduce data requirements but would suffer from a degradation in performance. To address this challenging trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with extremely less training data. The essence of SlimSAM is encapsulated in the alternate slimming framework which effectively enhances knowledge inheritance under severely limited training data availability and exceptional pruning ratio. Diverging from prior techniques, our framework progressively compresses the model by alternately pruning and distilling distinct, decoupled sub-structures. Disturbed Taylor pruning is also proposed to address the misalignment between the pruning objective and training target, thereby boosting the post-distillation after pruning. SlimSAM yields significant performance improvements while demanding over 10 times less training data than any other existing compression methods. 
Even when compared to the original SAM, SlimSAM achieves approaching performance while reducing parameter counts to merely 1.4% (9.1M), MACs to 0.8% (23G), and requiring only 0.1% (10k) of the SAM training data. The code is available at http://github.com/czg1225/SlimSAM.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-18
# 量子ユーティリティの強化:超伝導量子コンピュータ上での大規模量子スピンチェーンのシミュレーション Enhancing quantum utility: simulating large-scale quantum spin chains on superconducting quantum computers ( http://arxiv.org/abs/2312.12427v2 ) ライセンス: Link先を確認	Talal Ahmed Chowdhury, Kwangmin Yu, Mahmud Ashraf Shamim, M. L. Kabir, Raza Sabbir Sufian,	(参考訳) 量子スピンのフラストレーション-$\frac{1}{2}$反強磁性ハイゼンベルクスピンチェーンの量子シミュレーションを、100の量子ビットを持つ実超伝導量子コンピュータにおいて、最も近い隣り合う$(J_1)$とnext-nearest-neighbor$(J_2)$の交換相互作用で行う。特に,IBMの超伝導量子コンピュータにおける近接する隣り合う相互作用と,隣り合う隣り合う隣り合う相互作用を持つハミルトニアンを初めて実装し,一階のトロッタライゼーションを用いてスピンチェーンの時間発展を行う。さらに, 近接交換相互作用のみを含む等方性ハイゼンベルクスピンチェーンの2次トロッタライゼーションの新規実装により, 最大100量子ビットの範囲で観測可能なスタッガー磁化の期待値の精密測定が可能となった。どちらの場合も、初期量子ビット数とは無関係に、各トロッターステップの回路深さが一定になる。超伝導量子コンピュータを用いた大規模量子システムの期待値の正確な測定の実証は、多体量子システムの様々な特性を研究するためのこれらの装置の量子ユーティリティを規定する。これは、フォールトトレランス量子時代以前の量子システムをシミュレートする際の古典的よりも量子上の優位性を達成するための足掛かりとなるだろう。 We present the quantum simulation of the frustrated quantum spin-$\frac{1}{2}$ antiferromagnetic Heisenberg spin chain with competing nearest-neighbor $(J_1)$ and next-nearest-neighbor $(J_2)$ exchange interactions in the real superconducting quantum computer with qubits ranging up to 100. In particular, we implement, for the first time, the Hamiltonian with the next-nearest neighbor exchange interaction in conjunction with the nearest neighbor interaction on IBM's superconducting quantum computer and carry out the time evolution of the spin chain by employing first-order Trotterization. Furthermore, our novel implementation of second-order Trotterization for the isotropic Heisenberg spin chain, involving only nearest-neighbor exchange interaction, enables precise measurement of the expectation values of staggered magnetization observable across a range of up to 100 qubits. Notably, in both cases, our approach results in a constant circuit depth in each Trotter step, independent of the initial number of qubits. 
Our demonstration of the accurate measurement of expectation values for the large-scale quantum system using superconducting quantum computers designates the quantum utility of these devices for investigating various properties of many-body quantum systems. This will be a stepping stone to achieving the quantum advantage over classical ones in simulating quantum systems before the fault tolerance quantum era.	翻訳日:2024-03-20 23:51:29 公開日:2024-03-18
# HD-Painter:拡散モデルによる高解像度かつプロンプトに忠実なテキストガイド画像インペインティング HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models ( http://arxiv.org/abs/2312.14091v3 ) ライセンス: Link先を確認	Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi,	(参考訳) テキスト・ツー・イメージの拡散モデルが前例のない成功を収めたことを受けて、テキストガイド画像インペインティングの最近の進展は、極めて現実的で視覚的にも妥当な結果をもたらしている。しかし、現在のテキスト・ツー・イメージ・インペインティングモデルには、特にインペインティング領域とユーザプロンプトとの整合性の向上や高解像度インペインティングの実行において、なお大きな改善の余地がある。そこで我々は、プロンプトに正確に従い、高解像度の画像インペインティングへ一貫してスケールする、学習不要(training-free)のアプローチであるHD-Painterを導入する。この目的のために、プロンプト情報によって自己注意スコアを強化し、テキストとの整合性がより高い生成を実現するPrompt-Aware Introverted Attention (PAIntA) レイヤを設計する。プロンプトとの整合性をさらに改善するために、事後(post-hoc)サンプリング戦略をDDIMの一般形にシームレスに統合し、分布外の潜在変数シフトを防止するRASG(Reweighting Attention Score Guidance)機構を導入する。さらに、HD-Painterは、インペインティング用にカスタマイズされた特殊な超解像技術を導入することで、より大きなスケールへの拡張を可能にし、最大2K解像度の画像の欠落領域を補完できる。実験の結果、HD-Painterは複数のメトリクスとユーザスタディにおいて、既存の最先端アプローチを定量的にも定性的にも上回ることが示された。コードは、https://github.com/Picsart-AI-Research/HD-Painterで公開されている。 Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts and performing high-resolution inpainting. Therefore, we introduce HD-Painter, a training free approach that accurately follows prompts and coherently scales to high resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores by prompt information resulting in better text aligned generations. To further improve the prompt coherence we introduce the Reweighting Attention Score Guidance (RASG) mechanism seamlessly integrating a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. 
Moreover, HD-Painter allows extension to larger scales by introducing a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter	翻訳日:2024-03-20 23:51:29 公開日:2024-03-18
# 固有値探索と勾配降下のための量子アルゴリズムの改良 Improved Quantum Algorithms for Eigenvalues Finding and Gradient Descent ( http://arxiv.org/abs/2312.14786v2 ) ライセンス: Link先を確認	Nhat A. Nghiem, Tzu-Chieh Wei,	(参考訳) ブロック符号化は、量子アルゴリズムの統一的な枠組みを成す、最近開発された量子信号処理における重要な要素である。量子信号処理は当初、探索、振幅推定、ハミルトニアンシミュレーションといったいくつかの問題において資源利用の簡素化と最適化のために示されたが、その能力はこれらにとどまらず、新しい量子アルゴリズムを考案するための未開拓の可能性を提供する。本稿では、ブロック符号化を利用して、これまでに提案された2つの量子アルゴリズム、すなわち最大固有値推定と量子勾配降下を大幅に改良する。高度な手順を伴う従来の研究とは異なり、ユニタリブロック符号化を用いた本研究の結果は、初等的な演算だけでも、これらの新しい量子アルゴリズムが元のアルゴリズムに存在する主要なスケーリング要因を排除できることを示している。これにより、複雑な計算問題に対処できる、はるかに効率的な量子アルゴリズムが得られる。さらに、提案手法を逆行列計算や複数固有値の推定など、異なる文脈に拡張する方法を示す。 Block encoding is a key ingredient in the recently developed quantum signal processing that forms a unifying framework for quantum algorithms. Initially showcased for simplifying and optimizing resource utilization in several problems, such as searching, amplitude estimation, and Hamiltonian simulation, the capabilities of the quantum signal processing go beyond these and offer untapped potential for devising new quantum algorithms. In this article, we utilize block encoding to substantially enhance two previously proposed quantum algorithms: largest eigenvalue estimation and quantum gradient descent. Unlike previous works that involve sophisticated procedures, our findings, using the unitary block encoding, demonstrate that even with elementary operations, these new quantum algorithms can eliminate major scaling factors present in their original counterparts. This yields much more efficient quantum algorithms capable of tackling complex computational problems with remarkable efficiency. Furthermore, we show how to extend our proposed method to different contexts, including matrix inversion and multiple eigenvalues estimation.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 連続時間における集合列の確率論的モデリング Probabilistic Modeling for Sequences of Sets in Continuous-Time ( http://arxiv.org/abs/2312.15045v3 ) ライセンス: Link先を確認	Yuxin Chang, Alex Boyd, Padhraic Smyth,	(参考訳) ニューラルマークされた時間点過程は、連続時間イベントデータに対する統計パラメトリックモデルの既存のツールボックスに重要な付加物である。これらのモデルは、各イベントが1つのアイテム(ひとつのタイプのイベントまたは"マーク")に関連付けられたシーケンスに役立ちます。本研究では,任意の強度に基づくリカレント・ニューラルポイント・プロセス・モデルと互換性のある,設定値データを連続的にモデル化するための一般的なフレームワークを開発する。さらに,このようなモデルを用いて「$B$の前に観測されるアイテム$A$の確率」などの確率的クエリに応答できる推論手法を開発した。このようなクエリに対する正確な答えの計算は、問題設定の連続的な性質と、各事象の潜在的な結果の組合せ的に大きな空間の両方のために、一般的にはニューラルネットワークにとって難解である。そこで,本研究では,4つの実世界のデータセットを用いた体系的な実験を通じて,直接サンプリングよりも高次サンプリングの精度向上を図り,セットベースのシーケンスを問合せするための重要サンプリング手法のクラスを開発する。また、このフレームワークを用いて1段階の予測を伴わない確率を用いてモデル選択を行う方法について説明する。 Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. 
We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 生成的拡散事前知識による実世界ブラインド顔復元に向けて Towards Real-World Blind Face Restoration with Generative Diffusion Prior ( http://arxiv.org/abs/2312.15736v2 ) ライセンス: Link先を確認	Xiaoxu Chen, Jingfan Tan, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaochun Cao,	(参考訳) ブラインド顔復元はコンピュータビジョンにおける重要な課題であり、応用範囲の広さから大きな注目を集めている。従来の研究は主に顔の事前知識(prior)を利用して顔画像を復元しており、高品質な結果を示している。しかし、有限のデータから得られる事前知識は限られているため、忠実な顔の詳細を生成することは依然として難しい問題である。本研究では、事前学習済みのStable Diffusionをブラインド顔復元に活用する可能性を探る。低画質の顔画像から特徴を効果的に抽出し、事前学習済みStable Diffusionの生成的事前知識を用いて現実的かつ忠実な顔の詳細を復元できるよう設計されたBFRffusionを提案する。さらに、人種、性別、年齢といった属性のバランスがとれたプライバシ保護顔データセットであるPFHQを構築する。このデータセットは、ブラインド顔復元ネットワークをトレーニングするための実行可能な代替手段として機能し、実際の顔データセットに通常伴うプライバシーとバイアスの懸念に効果的に対処する。広範な実験を通じて、我々のBFRffusionがブラインド顔復元のための合成および実世界の公開テストデータセットの両方で最先端の性能を達成すること、およびPFHQデータセットがブラインド顔復元ネットワークのトレーニングに利用可能なリソースであることを示す。コード、事前学習済みモデル、データセットは https://github.com/chenxx89/BFRffusion で公開されている。 Blind face restoration is an important task in computer vision and has gained significant attention due to its wide-range applications. Previous works mainly exploit facial priors to restore face images and have demonstrated high-quality results. However, generating faithful facial details remains a challenging problem due to the limited prior knowledge obtained from finite data. In this work, we delve into the potential of leveraging the pretrained Stable Diffusion for blind face restoration. We propose BFRffusion which is thoughtfully designed to effectively extract features from low-quality face images and could restore realistic and faithful facial details with the generative prior of the pretrained Stable Diffusion. In addition, we build a privacy-preserving face dataset called PFHQ with balanced attributes like race, gender, and age. This dataset can serve as a viable alternative for training blind face restoration networks, effectively addressing privacy and bias concerns usually associated with the real face datasets. 
Through an extensive series of experiments, we demonstrate that our BFRffusion achieves state-of-the-art performance on both synthetic and real-world public testing datasets for blind face restoration and our PFHQ dataset is an available resource for training blind face restoration networks. The codes, pretrained models, and dataset are released at https://github.com/chenxx89/BFRffusion.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 境界注意: 高騒音下で境界を局所化する学習 Boundary Attention: Learning to Localize Boundaries under High Noise ( http://arxiv.org/abs/2401.00935v2 ) ライセンス: Link先を確認	Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler,	(参考訳) 我々は、境界注意と呼ばれるメカニズムを用いて、曲線、コーナー、ジャンクションを含む明示的な境界を推論する微分可能モデルを提案する。境界アテンション(バウンダリアテンション)とは、画像内のすべての重なり合うパッチにおいて、局所境界構造の非ラスタライズされた記述を規定する変数のフィールドを、高密度かつ繰り返し適用する境界アテンション演算である。ボトムアップ方式で動作し、サブピクセルのエッジローカライゼーションやエッジリンクの古典的な手法に似ているが、より高次元的な局所境界構造の記述、設計ではなく学習される空間整合性の概念、エンドツーエンドで微分可能な操作のシーケンスがある。我々は、簡単な合成データを用いてモデルを訓練し、低照度でノイズの少ない写真を用いて評価する。提案手法は, 実センサノイズにより劣化した自然画像に一般化し, 他の最先端手法が故障した場合, ますますノイズの多い条件下で一貫した境界を予測できる。 We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention. Boundary attention is a boundary-aware local attention operation that, when applied densely and repeatedly, progressively refines a field of variables that specify an unrasterized description of the local boundary structure in every overlapping patch within an image. It operates in a bottom-up fashion, similar to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional description of local boundary structure, a notion of spatial consistency that is learned instead of designed, and a sequence of operations that is end-to-end differentiable. We train our model using simple synthetic data and then evaluate it using photographs that were captured under low-light conditions with variable amounts of noise. We find that our method generalizes to natural images corrupted by real sensor noise, and predicts consistent boundaries under increasingly noisy conditions where other state-of-the-art methods fail.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 大言語モデルにおけるゼロショット抽象要約の再検討 : 位置バイアスの観点から Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias ( http://arxiv.org/abs/2401.01989v3 ) ライセンス: Link先を確認	Anshuman Chhabra, Hadi Askari, Prasant Mohapatra,	(参考訳) 本研究では, 位置バイアスを測定することで, 大規模言語モデル(LLM)におけるゼロショット抽象的要約を特徴づけ, 研究し, 従来研究されていたより制限的な鉛バイアス現象の一般的な定式化として提案する。位置バイアスは入力テキストの特定の部分からの情報を不当に優先するモデルの傾向を捉え、望ましくない振る舞いをもたらす。 GPT 3.5-Turbo, Llama-2, Dolly-v2 などの複数の LLM モデルにおける位置バイアスと,Pegasus や BART などの最先端のエンコーダデコーダ・デコーダ抽象要約モデルについて検討した。その結果,ゼロショット要約タスクにおけるモデルの性能と位置バイアスに関する新たな洞察と議論につながった。 We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model unfairly prioritizing information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLM models such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 図形シンプレクティック代数 Graphical Symplectic Algebra ( http://arxiv.org/abs/2401.07914v3 ) ライセンス: Link先を確認	Robert I. Booth, Titouan Carette, Cole Comfort,	(参考訳) 任意の体上のアフィンラグランジアン関係および余等方的(coisotropic)関係のダガーコンパクトプロップに対して、完全な表示(プレゼンテーション)を与える。これは、アフィン拘束された古典力学系と奇素数次元のスタビライザー量子回路の両方に対する、統一的なグラフィカル言語の族を提供する。この目的のために、無向彩色グラフの特定のクラスによってアフィンラグランジアン関係を表す。合成系について推論するために、これらのグラフの頂点自体がグラフで彩色される、強力でスケーラブルな記法を導入する。スタビライザー量子力学の設定において、このスケーラブルな記法はグラフ状態の極めて簡潔な記述を与え、グラフ状態は「phased spider fusion」によって合成できる。同様に、電気回路という古典力学的設定においては、相反回路網のインピーダンス行列が本質的に同じ方法で表されることを示す。 We give complete presentations for the dagger-compact props of affine Lagrangian and coisotropic relations over an arbitrary field. This provides a unified family of graphical languages for both affinely constrained classical mechanical systems, as well as odd-prime-dimensional stabiliser quantum circuits. To this end, we present affine Lagrangian relations by a particular class of undirected coloured graphs. In order to reason about composite systems, we introduce a powerful scalable notation where the vertices of these graphs are themselves coloured by graphs. In the setting of stabiliser quantum mechanics, this scalable notation gives an extremely concise description of graph states, which can be composed via "phased spider fusion." Likewise, in the classical mechanical setting of electrical circuits, we show that impedance matrices for reciprocal networks are presented in essentially the same way.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 確率論的ランベルト問題の解:最適質量輸送、シュレーディンガー橋および反応拡散PDEとの接続 Solution of the Probabilistic Lambert Problem: Connections with Optimal Mass Transport, Schrödinger Bridge and Reaction-Diffusion PDEs ( http://arxiv.org/abs/2401.07961v3 ) ライセンス: Link先を確認	Alexis M. H. Teter, Iman Nodozi, Abhishek Halder,	(参考訳) ランベルト問題は、重力場の下で速度制御により、所定の飛行時間内に、与えられた初期位置から与えられた終端位置へ宇宙船を移動させる問題である。位置ベクトルに関する端点制約の知識を、それぞれの同時確率密度関数の知識に置き換えた、ランベルト問題の確率的変種を考える。端点の同時確率密度制約を伴うランベルト問題は、一般化された最適質量輸送(OMT)問題であることを示し、それによってこの古典的な天体力学の問題を、現代の確率制御と確率的機械学習において急成長している研究領域と結びつける。この新たな接続により、確率的ランベルト問題に対する解の存在と一意性を厳密に確立できる。同じ接続は、拡散正則化により、すなわち OMT と Schrödinger bridge problem (SBP) とのさらなる接続を利用することで、確率的ランベルト問題を数値的に解くのにも役立つ。これはまた、加法的な動的プロセスノイズを伴う確率的ランベルト問題が実際には一般化されたSBPであり、本研究で行うように、いわゆる「シュレーディンガー因子」を用いて数値的に解けることを示している。得られた解析が、非線形重力ポテンシャルが反応速度として現れる反応拡散PDEの境界結合系を解くことに帰着することを説明する。そのための新しいアルゴリズムを提案し、例示的な数値結果を示す。我々の解析とアルゴリズムの枠組みは非パラメトリックであり、統計的な近似(例えば、ガウス性、最初の数個のモーメント、混合分布あるいは指数型分布族、十分統計量の有限次元性)も、動的な近似(例えば、テイラー級数)も行わない。 Lambert's problem concerns with transferring a spacecraft from a given initial to a given terminal position within prescribed flight time via velocity control subject to a gravitational force field. We consider a probabilistic variant of the Lambert problem where the knowledge of the endpoint constraints in position vectors are replaced by the knowledge of their respective joint probability density functions. We show that the Lambert problem with endpoint joint probability density constraints is a generalized optimal mass transport (OMT) problem, thereby connecting this classical astrodynamics problem with a burgeoning area of research in modern stochastic control and stochastic machine learning. This newfound connection allows us to rigorously establish the existence and uniqueness of solution for the probabilistic Lambert problem. 
The same connection also helps to numerically solve the probabilistic Lambert problem via diffusion regularization, i.e., by leveraging further connection of the OMT with the Schrödinger bridge problem (SBP). This also shows that the probabilistic Lambert problem with additive dynamic process noise is in fact a generalized SBP, and can be solved numerically using the so-called Schrödinger factors, as we do in this work. We explain how the resulting analysis leads to solving a boundary-coupled system of reaction-diffusion PDEs where the nonlinear gravitational potential appears as the reaction rate. We propose novel algorithms for the same, and present illustrative numerical results. Our analysis and the algorithmic framework are nonparametric, i.e., we make neither statistical (e.g., Gaussian, first few moments, mixture or exponential family, finite dimensionality of the sufficient statistic) nor dynamical (e.g., Taylor series) approximations.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# AI適応画像ラベリングにおけるコンフォーマル予測セットの有用性の評価 Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling ( http://arxiv.org/abs/2401.08876v5 ) ライセンス: Link先を確認	Dongping Zhang, Angelos Chatzimparmpas, Negar Kamali, Jessica Hullman,	(参考訳) ディープ・ニューラル・ネットワークはより一般的に高い領域に展開されるため、ブラックボックスの性質は不確実な定量化を困難にしている。本稿では,AIが推奨する意思決定における不確実性を表現するために,特定のカバレッジで予測セットを生成する手法の分布自由クラスである共形予測セットの提示の効果について検討する。大規模なオンライン実験を通じて、共形予測セットの有用性と、AIが推奨する画像ラベリングのためのTop-1およびTop-k予測の表示を比較した。事前登録された分析では,精度の予測セットの有用性はタスクの難易度に応じて変化し,Top-1やTop-kの表示と同等以上の精度で画像を容易に表示できる一方で,特にセットサイズが小さい場合には,人間にアウト・オブ・ディストリビューション(OOD)画像のラベル付けを支援できる予測セットが優れていることがわかった。本研究は,共形予測セットの実践的課題を実証的に特定し,実世界の意思決定に組み込む方法について考察した。 As deep neural networks are more commonly deployed in high-stakes domains, their black-box nature makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets--a distribution-free class of methods for generating prediction sets with specified coverage--to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-1 and Top-k predictions for AI-advised image labeling. In a pre-registered analysis, we find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-1 and Top-k displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint practical challenges of conformal prediction sets and provide implications on how to incorporate them for real-world decision-making.	翻訳日:2024-03-20 23:41:33 公開日:2024-03-18
# 自動ファクトチェックのためのクレーム検出:単言語・多言語・言語横断研究に関する調査 Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research ( http://arxiv.org/abs/2401.11969v3 ) ライセンス: Link先を確認	Rrubaa Panchendrarajan, Arkaitz Zubiaga,	(参考訳) オンラインプラットフォーム上での誤情報拡散の増加により、過去数十年間、ファクトチェックの自動化が大きな注目を集めている。これはしばしば、(i) オンラインプラットフォーム上で流通する文のうち、検証を必要とするクレームを構成するものの検出と、それに続く (ii) これらのクレームの検証プロセス、という一連のタスクとして実行される。本調査は前者に焦点を当て、ファクトチェックを必要とするクレームを検出するための既存の取り組みを、特に多言語のデータと手法に重点を置いて論じる。これは困難だが実りの多い方向であり、問題そのものが極めて難しいため、既存の手法は人間の性能にはまだ遠く及ばない。特に、複数の言語やモダリティで表現された情報が複数のソーシャルプラットフォームにまたがって拡散することから、誤情報に対抗するためのより一般化された解決策が求められる。多言語の誤情報に着目し、既存の多言語クレーム検出研究を包括的に調査する。最先端の多言語クレーム検出研究を、問題の3つの重要な要因である検証可能性、優先度、類似性に分類して示す。さらに、既存の多言語データセットの詳細な概要と課題を示し、今後の発展の可能性を提案する。 Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey focuses on the former, by discussing existing efforts towards detecting claims needing fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction where existing methods are yet far from matching human performance due to the profoundly challenging nature of the issue. Especially, the dissemination of information across multiple social platforms, articulated in multiple languages and modalities demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of existing multilingual claim detection research. We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity. 
Further, we present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 機械学習とシンボリック手法の融合:自然言語処理へのハイブリッドアプローチに関する調査 Synergizing Machine Learning & Symbolic Methods: A Survey on Hybrid Approaches to Natural Language Processing ( http://arxiv.org/abs/2401.11972v2 ) ライセンス: Link先を確認	Rrubaa Panchendrarajan, Arkaitz Zubiaga,	(参考訳) 機械学習とシンボリックアプローチの進歩は、自然言語処理(NLP)におけるその強みと弱点を裏付けている。機械学習のアプローチはデータのパターンを特定するのに強力だが、コモンセンスとNLPタスクに必要な事実知識の学習には不足することが多い。一方、記号的手法は知識に富んだデータを表現するのに優れている。しかし、彼らは動的データに適応し、知識を一般化するのに苦労している。これら2つのパラダイムをハイブリッドアプローチでブリッジすることで、強みを保ちながら両方の弱点を緩和することができる。近年の研究は、様々なNLPタスクにおいて有望な結果を示しながら、この連合の長所を誇示している。本稿では,NLPにおけるハイブリッドアプローチの概要について述べる。具体的には、自然言語理解、生成、推論を必要とする幅広いNLPタスクに使用される最先端のハイブリッドアプローチについて検討する。さらに,NLPのハイブリッド手法として利用可能な既存の資源と課題と今後の方向性について論じ,今後の研究のロードマップを提供する。 The advancement of machine learning and symbolic approaches have underscored their strengths and weaknesses in Natural Language Processing (NLP). While machine learning approaches are powerful in identifying patterns in data, they often fall short in learning commonsense and the factual knowledge required for the NLP tasks. Meanwhile, the symbolic methods excel in representing knowledge-rich data. However, they struggle to adapt dynamic data and generalize the knowledge. Bridging these two paradigms through hybrid approaches enables the alleviation of weaknesses in both while preserving their strengths. Recent studies extol the virtues of this union, showcasing promising results in a wide range of NLP tasks. In this paper, we present an overview of hybrid approaches used for NLP. Specifically, we delve into the state-of-the-art hybrid approaches used for a broad spectrum of NLP tasks requiring natural language understanding, generation, and reasoning. Furthermore, we discuss the existing resources available for hybrid approaches for NLP along with the challenges and future directions, offering a roadmap for future research avenues.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 超伝導量子ビットにおけるコヒーレント2レベル系の離散電荷状態の観測 Observation of discrete charge states of a coherent two-level system in a superconducting qubit ( http://arxiv.org/abs/2401.12183v2 ) ライセンス: Link先を確認	Bao-Jie Liu, Ying-Ying Wang, Tal Sheffer, Chen Wang,	(参考訳) 我々は、オフセット電荷に敏感な超伝導トランスモン量子ビットに強く結合したコヒーレントな誘電体2レベル系(TLS)の離散電荷状態の観測を報告する。遷移周波数が2.9GHz、緩和時間が3msを超える2つのTLS固有状態に関連する0.072$e$のオフセット電荷を測定する。強分散領域と共鳴領域での測定を組み合わせることで、TLS-量子ビット相互作用の横結合と縦結合の両方を定量化する。さらにTLS遷移と準粒子トンネル力学の同時追跡を行うが、本質的な相関は見つからない。本研究は、マイクロ波周波数のTLSが低周波電荷雑音の発生源となることを示す。 We report observations of discrete charge states of a coherent dielectric two-level system (TLS) that is strongly coupled to an offset-charge-sensitive superconducting transmon qubit. We measure an offset charge of 0.072$e$ associated with the two TLS eigenstates, which have a transition frequency of 2.9 GHz and a relaxation time exceeding 3 ms. Combining measurements in the strong dispersive and resonant regime, we quantify both transverse and longitudinal couplings of the TLS-qubit interaction. We further perform joint tracking of TLS transitions and quasiparticle tunneling dynamics but find no intrinsic correlations. This study demonstrates microwave-frequency TLS as a source of low-frequency charge noise.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 人間のフィードバックによる機械翻訳の改善--リワードモデルによる品質評価の探索 Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model ( http://arxiv.org/abs/2401.12873v3 ) ライセンス: Link先を確認	Zhiwei He, Xing Wang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang, Shuming Shi, Zhaopeng Tu,	(参考訳) 報酬モデルにおける人間の嗜好の不十分なモデリングは、人間のフィードバックを活用して翻訳品質を向上させる上で大きな障害となる。幸いなことに、ある翻訳の品質を基準なしに予測する品質評価(QE)は、過去2年間に人間の評価と顕著に一致している。本研究では,QEモデルを報酬モデルとして活用し,フィードバックトレーニングにおける人間の嗜好を予測する可能性について検討する。まず,QEに基づくフィードバックトレーニングにおいて,翻訳品質が低下する中で,報酬の増大として現れる過度な最適化問題を同定した。この問題を検証し,QEモデルの脆弱性は誤訳に対して高い報奨を与える可能性があり,過度な最適化と誤りの伝播をもたらすと論じる。この問題に対処するために、ヒューリスティックなルールを用いて誤った翻訳を検知し、報酬のスコアにペナルティ項を割り当てる、単純で効果的な手法を採用する。実験の結果,提案したQEに基づくフィードバックトレーニングは,様々な設定において一貫した,重要な改善を達成し,さらに人間の嗜好研究を通じて検証された。続く分析では、QEに基づくフィードバックトレーニングの高効率性を実証し、少量のモノリンガルデータにより、より大きな並列コーパスを用いたシステムより優れていることを示す。私たちのコードは、https://github.com/zwhe99/FeedbackMTで利用可能です。 Insufficient modeling of human preferences within the reward model is a major obstacle for leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect the incorrect translations and assigns a penalty term to the reward scores of them. 
Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: using only a small amount of monolingual data, it outperforms systems trained on larger parallel corpora. Our code is available at: https://github.com/zwhe99/FeedbackMT	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 時間依存力学学習におけるリッチフロー誘導オートエンコーダ Ricci flow-guided autoencoders in learning time-dependent dynamics ( http://arxiv.org/abs/2401.14591v5 ) ライセンス: Link先を確認	Andrew Gracyk,	(参考訳) 本稿では, 時間的非線形力学, 特に偏微分方程式 (PDE) を学習するための多様体ベースオートエンコーダ法を提案する。これは、物理学的インフォームドな設定でリッチフローをシミュレートすることで実現でき、また、リッチフローが経験的に達成されるように、多様体の量と一致させることができる。我々の方法論では、多様体は訓練手順の一部として学習されるので、理想的な測地は識別されうるが、進化は静的な方法よりも共役な潜在表現を同時に引き起こす。本稿では,周期性やランダム性,分布内誤差,外挿シナリオなどの望ましい特徴を包含するPDEを用いた数値実験について述べる。 We present a manifold-based autoencoder method for learning nonlinear dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by simulating Ricci flow in a physics-informed setting, and manifold quantities can be matched so that Ricci flow is empirically achieved. With our methodology, the manifold is learned as part of the training procedure, so ideal geometries may be discerned, while the evolution simultaneously induces a more accommodating latent representation over static methods. We present our method on a range of numerical experiments consisting of PDEs that encompass desirable characteristics such as periodicity and randomness, remarking error on in-distribution and extrapolation scenarios.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 組合せ最適化のための注意に基づく強化学習:ジョブショップスケジューリング問題への応用 Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem ( http://arxiv.org/abs/2401.16580v2 ) ライセンス: Link先を確認	Jaejin Lee, Seho Kee, Mani Janakiram, George Runger,	(参考訳) ジョブショップスケジューリング問題は組合せ最適化問題の重要かつ複雑な側面を表しており、これは伝統的に正確な解法または近似解法によって解決されてきた。しかし、現実の問題の複雑さのために、これらのソリューションの実践的な応用がしばしば挑戦される。近似解法を利用する場合であっても、近似解を特定するのに必要な時間は禁じられ、導出された解は一般に新しい問題に適用できない。本研究では,ジョブショップスケジューリング問題に特化して設計された,革新的な注意力に基づく強化学習手法を提案する。この方法は、ポリシー勾配強化学習アプローチと、改良されたトランスフォーマーアーキテクチャを統合する。この研究の鍵となる発見は、提案手法で訓練を受けた学習者が、初期訓練セットに含まれない大規模問題に再利用できることである。さらに,本手法が最近の研究結果を上回り,一般に実施されているヒューリスティックルールを上回ることを示す実証的証拠が得られた。このことから,本手法は,求人スケジューリング問題における今後の研究・実践の道筋として有望なものであることが示唆された。 Job shop scheduling problems represent a significant and complex facet of combinatorial optimization problems, which have traditionally been addressed through either exact or approximate solution methodologies. However, the practical application of these solutions is often challenged due to the complexity of real-world problems. Even when utilizing an approximate solution approach, the time required to identify a near-optimal solution can be prohibitively extensive, and the solutions derived are generally not applicable to new problems. This study proposes an innovative attention-based reinforcement learning method specifically designed for the category of job shop scheduling problems. This method integrates a policy gradient reinforcement learning approach with a modified transformer architecture. A key finding of this research is the ability of our trained learners within the proposed method to be repurposed for larger-scale problems that were not part of the initial training set. Furthermore, empirical evidence demonstrates that our approach surpasses the results of recent studies and outperforms commonly implemented heuristic rules. 
This suggests that our method offers a promising avenue for future research and practical application in the field of job shop scheduling problems.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# 実行可能なコードアクションはより良いLLMエージェントを引き出す Executable Code Actions Elicit Better LLM Agents ( http://arxiv.org/abs/2402.01030v2 ) ライセンス: Link先を確認	Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji,	(参考訳) 大きな言語モデル(LLM)エージェントは、ツールの呼び出しやロボットの制御など、幅広いアクションを実行することができ、現実世界の課題に取り組む大きな可能性を示している。 LLMエージェントは、通常、事前に定義されたフォーマットでJSONやテキストを生成することでアクションを生成するよう促される。これは通常、制約されたアクション空間(例えば、事前に定義されたツールの範囲)と柔軟性の制限(例えば、複数のツールを組み合わせることができない)によって制限される。この研究は、実行可能なPythonコードを使用して、LLMエージェントのアクションを統一されたアクション空間(CodeAct)に統合することを提案する。 Pythonインタプリタと統合されたCodeActは、コードアクションを実行し、マルチターンインタラクションを通じて、新しい観察に応じて以前のアクションを動的に修正したり、新しいアクションを発行したりすることができる。 API-Bankと新たにキュレートされたベンチマークにおける17のLLMの広範な分析は、CodeActが広く使われている代替手法を上回っていることを示している(成功率が最大20%高い)。 CodeActのパフォーマンス向上は、解釈可能なコードを実行し、自然言語を使ってユーザとコラボレーションすることで、環境と対話するオープンソースのLLMエージェントを構築する動機となります。この目的のために,CodeAct を用いた 7k のマルチターンインタラクションからなる命令チューニングデータセット CodeActInstruct を収集する。本稿では,既存のデータと組み合わせることで,汎用的な能力を損なうことなく,エージェント指向タスクのモデルを改善できることを示す。 Llama2とMistralから微調整されたCodeActAgentはPythonインタプリタと統合されており、既存のライブラリを使用して高度なタスク(例えばモデルトレーニング)を実行し、自律的に自己デバッグするように設計されている。 Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). 
The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset, CodeActInstruct, that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with a Python interpreter and is tailored to perform sophisticated tasks (e.g., model training) using existing libraries and to self-debug autonomously.	翻訳日:2024-03-20 23:31:36 公開日:2024-03-18
# LHRS-Bot:VGI強化大規模マルチモーダル言語モデルを用いたリモートセンシング LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model ( http://arxiv.org/abs/2402.02544v3 ) ライセンス: Link先を確認	Dilxat Muhtar, Zhenshi Li, Feng Gu, Xueliang Zhang, Pengfeng Xiao,	(参考訳) 大規模言語モデル(LLM)の革命的能力は、マルチモーダルな大規模言語モデル(MLLM)の道を切り開き、様々な専門分野にまたがる多様な応用を育んでいる。しかし、リモートセンシング(RS)分野では、最近のMLLMでは、多様な地形やRS画像の様々な物体が適切に考慮されていない。このギャップを埋めるために、大規模なRS画像テキストデータセットであるLHRS-Alignと情報的RS固有の命令データセットであるLHRS-Instructを構築し、大規模なボランティア地理情報(VGI)とグローバルに利用可能なRS画像を活用する。この基盤の上に構築されたLHRS-Botは、新しい多段階視覚言語アライメント戦略とカリキュラム学習手法により、RS画像理解に適したMLLMである。さらに、RS画像理解におけるMLLMの能力を徹底的に評価するベンチマークであるLHRS-Benchを紹介する。総合的な実験により、LHRS-BotはRS画像の深い理解と、RS領域内でニュアンス推論を行う能力を示すことが示された。 The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
# パラメータフリー確率最適化はどのくらい自由か? How Free is Parameter-Free Stochastic Optimization? ( http://arxiv.org/abs/2402.03126v2 ) ライセンス: Link先を確認	Amit Attia, Tomer Koren,	(参考訳) パラメータフリー確率最適化の問題について,パラメータフリーな手法が存在するかどうかを問うとともに,パラメータフリーな手法と競合する収束率を求める。既存のパラメータフリーなメソッドは、確率的勾配ノルム上の境界、最小値への距離上の境界など、真の問題パラメータに関するいくつかの非自明な知識を必要とするため、 `partially'' パラメータフリーとみなすことができる。非凸環境では、単純なハイパーパラメータ探索技術により、より洗練された最先端のアルゴリズムより優れたパラメータフリーな手法が実現されることを示す。また,弱雑音条件下では,雑音関数値にアクセス可能な凸設定でも同様の結果が得られる。最後に、確率勾配にのみアクセスすると、完全にパラメータフリーな確率凸最適化が実現不可能な下界を確立し、(部分的には)下界で示される極限までパラメータフリーな方法を提案する。 We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, do fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered ``partially'' parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms, a bound on the distance to a minimizer, etc. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming only access to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
# 2層ネットワークにおける勾配降下法のためのバッチ再利用の利点:情報指数と跳躍指数の呪いを破る The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents ( http://arxiv.org/abs/2402.03220v2 ) ライセンス: Link先を確認	Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala,	(参考訳) マルチインデックスターゲット関数を学習する際の2層ニューラルネットワークのトレーニングダイナミクスについて検討する。本稿では,バッチを複数回再利用するマルチパス勾配降下(GD)に着目し,シングルパス勾配降下と比べて,どの関数が学習可能かという結論を大きく変えることを示す。特に、有限ステップサイズのマルチパスGDは、目標関数の情報指数 (Ben Arous et al., 2021) と跳躍指数 (Abbe et al., 2023) によって与えられる勾配流とシングルパスGDの限界を克服する。バッチを再利用することで、階段特性 (Abbe et al., 2021) を満足しない関数に対しても、ネットワークはわずか2ステップで目標部分空間との重なりを達成することを示す。有限時間で効率的に学習される関数の(広い)クラスを特徴づける。この結果の証明は、動的平均場理論(DMFT)の分析に基づいている。さらに、重みの低次元射影の動的過程の閉形式記述と、その理論を説明する数値実験について述べる。 We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of the Dynamical Mean-Field Theory (DMFT). We further provide a closed-form description of the dynamical process of the low-dimensional projections of the weights, and numerical experiments illustrating the theory.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
# Retrieve to Explain: 言語モデルによるエビデンス駆動予測 Retrieve to Explain: Evidence-driven Predictions with Language Models ( http://arxiv.org/abs/2402.04068v2 ) ライセンス: Link先を確認	Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil,	(参考訳) マシンラーニングモデル、特に言語モデルは、イントロスペクションが難しいことで有名です。ブラックボックスモデルは、モデルトレーニングと有害バイアスの両方の問題を隠蔽することができる。ヒューマン・イン・ザ・ループのプロセスでは、不透明な予測は信頼の欠如を招き、効果的に実行してもモデルへの影響を制限する。これらの問題に対処するために、Retrieve to Explain (R2E)を紹介します。 R2Eは検索に基づく言語モデルであり、文書コーパスのエビデンスに基づいた研究質問に対して、最終的な予測に対する証拠の相対的重要性を特定するためにシェープリー値を使用する。 R2Eは、再訓練することなく新しいエビデンスに適応し、自然言語へのテンプレート化を通じて構造化データを組み込むことができる。本研究は,本モデルが臨床治験結果を予測するための業界標準遺伝学的アプローチよりも優れていることを示す。 Machine learning models, particularly language models, are notoriously difficult to introspect. Black-box models can mask both issues in model training and harmful biases. For human-in-the-loop processes, opaque predictions can drive lack of trust, limiting a model's impact even when it performs effectively. To address these issues, we introduce Retrieve to Explain (R2E). R2E is a retrieval-based language model that prioritizes amongst a pre-defined set of possible answers to a research question based on the evidence in a document corpus, using Shapley values to identify the relative importance of pieces of evidence to the final prediction. R2E can adapt to new evidence without retraining, and incorporate structured data through templating into natural language. We assess on the use case of drug target identification from published scientific literature, where we show that the model outperforms an industry-standard genetics-based approach on predicting clinical trial outcomes.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
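The Retrieve to Explain entry above mentions Shapley values for ranking the importance of pieces of evidence. The following is only a minimal sketch of exact Shapley attribution in general, not the R2E implementation: the scoring function `toy_score`, the passage names, and the weights are hypothetical and were chosen so the result can be checked by hand.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible ordering (feasible only for a handful of players)."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical toy scorer: each evidence passage adds a fixed amount to the
# prediction score, plus a bonus when passages "a" and "b" appear together.
WEIGHTS = {"a": 0.2, "b": 0.1, "c": 0.05}


def toy_score(coalition):
    score = sum(WEIGHTS[p] for p in coalition)
    if {"a", "b"} <= coalition:
        score += 0.1  # interaction bonus, split evenly between "a" and "b"
    return score


phi = shapley_values(list(WEIGHTS), toy_score)
```

In this toy case the attributions sum to the score of the full evidence set, and the interaction bonus is shared equally by the two passages that create it. Enumerating all orderings grows factorially, so practical systems typically rely on sampling or other approximations.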
# 目の前に隠れている:脆弱な患者集団に対する検出不能な敵対的バイアス攻撃 Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations ( http://arxiv.org/abs/2402.05713v2 ) ライセンス: Link先を確認	Pranav Kulkarni, Andrew Chan, Nithya Navarathna, Skylar Chan, Paul H. Yi, Vishwa S. Parekh,	(参考訳) 放射線学における人工知能(AI)の拡散は、深層学習(DL)モデルが脆弱な患者集団に対する臨床バイアスを悪化させるリスクに光を当てている。従来の文献では、訓練されたDLモデルが示すバイアスの定量化に焦点が当てられていたが、人口統計学的に標的化された、DLモデルに対する敵対的バイアス攻撃とその臨床環境への影響は、医用画像研究の未調査分野として残されている。本研究は,人口統計学的に標的化されたラベル中毒攻撃が,DLモデルに検出不能な過小診断バイアスをもたらすことを実証する。複数のパフォーマンス指標と、性別、年齢、およびそれらの交差サブグループなどの人口統計グループにわたる結果は、敵対的バイアス攻撃が、モデル全体の性能に影響を与えることなく対象グループのモデル性能を低下させることで、対象グループにおけるバイアスに対して高い選択性を示すことを示している。さらに、敵対的バイアス攻撃は、外部データセットで評価しても予測バイアスを伝播する偏ったDLモデルをもたらすことが示された。 The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and their implications in the clinical environment remain an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
# チューニング不要確率最適化 Tuning-Free Stochastic Optimization ( http://arxiv.org/abs/2402.07793v2 ) ライセンス: Link先を確認	Ahmed Khaled, Chi Jin,	(参考訳) 大規模な機械学習の問題は、ハイパーパラメータチューニングのコストをより禁忌なものにする。これにより、自分自身をオンザフライでチューニングできるアルゴリズムの必要性が生まれます。我々は,最適化アルゴリズムの最適化性能を,関連する問題パラメータのゆるいヒントのみを与えられた多対数因子に適合させる「チューニングフリー」アルゴリズムの概念を定式化する。本稿では,SGD(Stochastic Gradient Descent)を最適に調整したアルゴリズムについて考察する。最適化領域が有界である場合、SGDのチューニング不要なマッチングが可能であり、既存のアルゴリズムによって実現可能であることを示す。凸や滑らかなリプシッツ関数を非有界領域上で最小化するタスクでは、チューニング不要な最適化は不可能である。非有界領域でもチューニング不要な最適化が可能となる条件について論じる。特に,最近提案されたDoGアルゴリズムとDoWGアルゴリズムは,ノイズ分布が十分に良好な場合,チューニング不要であることを示す。滑らかで潜在的に非凸な関数の定常点を求めるタスクに対して、チューニングされたSGDの最もよく知られた高確率収束率と、追加の多対数コストで一致するSGDの変種を与える。しかし、調整されたSGDの最適収束率を高い確率で一致させるアルゴリズムが存在しないことを示す不確実性結果も提示する。 Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). When the domain of optimization is bounded, we show tuning-free matching of SGD is possible and achieved by several existing algorithms. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show that the recently proposed DoG and DoWG algorithms are tuning-free when the noise distribution is sufficiently well-behaved. For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost. 
However, we also give an impossibility result that shows no algorithm can hope to match the optimal expected convergence rate for tuned SGD with high probability.	翻訳日:2024-03-20 23:21:52 公開日:2024-03-18
# CiMNet:DNNアーキテクチャとコンピュート・イン・メモリハードウェアの構成を共同で最適化する CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware ( http://arxiv.org/abs/2402.11780v2 ) ライセンス: Link先を確認	Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney,	(参考訳) 近年の大規模ディープニューラルネットワークの需要増加に伴い、コンピューティングインメモリ(CiM)は、Von-Neumanアーキテクチャを制約する帯域幅とオンチップの相互接続ボトルネックを緩和する重要なソリューションとして浮上した。しかし、CiMハードウェアの構築は、異なるインタフェースにおけるキャッシュサイズとメモリ帯域幅の特定のメモリ階層が、テンソル次元や演算強度などのニューラルネットワークの属性と理想的に一致しない可能性があるため、最適化された性能の低いシステムに繋がる。ニューラルネットワークサーチ(NAS)技術は、所定のハードウェアメトリック予算(例えば、DNNの実行時間やレイテンシ)に対して効率的なサブネットワークを提供するのに成功しているが、ハードウェア構成は凍結され、しばしば与えられた予算に対して最適なサブネットワークを提供する。本稿では,CiMアーキテクチャのための最適なサブネットワークとハードウェア構成を共同で検索するフレームワークであるCiMNetを提案する。提案フレームワークは、サブネットワークの性能と、帯域幅、処理要素サイズ、メモリサイズを含むCiMハードウェア構成の選択との間の複雑な相互作用を理解することができる。 CNNとTransformerファミリーの異なるモデルアーキテクチャに関する実験は、CiMNetが協調最適化サブネットワークとCiMハードウェア構成を見つける上で有効であることを実証している。具体的には、ImageNetの分類精度をベースラインのViT-Bと同等にするために、モデルアーキテクチャのみを最適化するとパフォーマンスが1.7倍に向上し、モデルアーキテクチャとハードウェア構成の両方を最適化すると3.1倍に向上する。 With the recent growth in demand for large-scale deep neural networks, compute in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain Von-Neuman architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to any neural network's attributes such as tensor dimension and arithmetic intensity, thus leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), it assumes the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. 
In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures creating a Pareto optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both CNN and Transformer families demonstrate the efficacy of the CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for similar ImageNet classification accuracy as baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x while optimizing for both the model architecture and hardware configuration increases it by 3.1x.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-18
# MVDiffusion++:単一またはスパースビューの3次元オブジェクト再構成のための高密度・高解像度マルチビュー拡散モデル MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction ( http://arxiv.org/abs/2402.12712v2 ) ライセンス: Link先を確認	Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan,	(参考訳) 本稿では,カメラポーズのない1枚または数枚の画像から,オブジェクトの高密度かつ高解像度のビューを合成する,3次元オブジェクト再構成のためのニューラルアーキテクチャMVDiffusion++を提案する。 MVDiffusion++は2つの驚くほどシンプルなアイデアで優れた柔軟性とスケーラビリティを実現します。 1) 2次元潜在特徴間の標準的な自己注意が、カメラポーズ情報を明示的に使用せずに、任意の数の条件ビューおよび生成ビューにまたがる3次元の一貫性を学習する「ポーズフリーアーキテクチャ」。 2)「ビュードロップアウト戦略」は、トレーニング中にかなりの数の出力ビューを捨て、トレーニング時のメモリフットプリントを削減し、テスト時に高密度で高解像度なビュー合成を可能にする。我々はObjaverseをトレーニングに使用し、Google Scanned Objectsを標準的な新規ビュー合成と3D再構成のメトリクスによる評価に使用し、MVDiffusion++は現在の最先端技術よりも大幅に優れています。また,MVDiffusion++とテキスト・ツー・イメージ生成モデルを組み合わせることで,テキスト・ツー・3Dアプリケーションの例を示す。プロジェクトのページはhttps://mvdiffusion-plusplus.github.ioにある。 This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time. We use the Objaverse for training and the Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model. 
The project page is at https://mvdiffusion-plusplus.github.io.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-18
# 視覚的位置認識のための事前学習モデルのシームレス適応に向けて Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition ( http://arxiv.org/abs/2402.14505v2 ) ライセンス: Link先を確認	Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan,	(参考訳) 近年の研究では、大規模データを用いた汎用的な視覚学習タスクで事前訓練された視覚モデルが、幅広い視覚知覚問題に有用な特徴表現を提供する可能性が示されている。しかし、視覚的位置認識(VPR)において、事前訓練された基礎モデルを活用する試みはほとんど行われていない。モデル事前学習とVPRのタスク間のトレーニング目標とデータに固有の違いがあるため、どのようにギャップを埋め、VPRのための事前訓練されたモデルの能力を完全に解き放つかは、依然として対処すべき重要な問題である。そこで本研究では,VPRのための事前学習モデルのシームレスな適応を実現する新しい手法を提案する。具体的には、地域を識別するための有意義なランドマークに焦点を当てたグローバル・ローカル両方の特徴を得るために、グローバル・ローカル両方の適応を効率的に実現するためのハイブリッド適応法を設計し、事前訓練されたモデルを調整することなく軽量アダプタのみをチューニングする。また,有効適応の導出として,局所的マッチングに適切な局所的特徴が生成され,再ランク付けに要する時間的空間的検証を回避できる相互近接局所的特徴損失を提案する。実験結果から,本手法は訓練データとトレーニング時間が少なくて最先端の手法より優れており,RANSACによる空間的検証を行う2段階VPR法では,約3%の検索実行時間しか利用できないことがわかった。 MSLSチャレンジリーダーボード(応募時点で)で1位にランクインしている。コードはhttps://github.com/Lu-Feng/SelaVPRで公開されている。 Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model pre-training and VPR, how to bridge the gap and fully unleash the capability of pre-trained models for VPR is still a key issue to address. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method to achieve both global and local adaptation efficiently, in which only lightweight adapters are tuned without adjusting the pre-trained model. 
In addition, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses only about 3% of the retrieval runtime of two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-18
# マルチカット多面体の切削面と立方体面 Cut Facets and Cube Facets of Lifted Multicut Polytopes ( http://arxiv.org/abs/2402.16814v2 ) ライセンス: Link先を確認	Lucas Fabian Naumann, Jannik Irmai, Shengxian Zhao, Bjoern Andres,	(参考訳) 昇降型マルチカット問題はコンピュータビジョンの分野で様々な応用がある。線形プログラミングに基づく厳密なアルゴリズムは、持ち上げられたマルチカットポリトープを理解する必要がある。最近の進歩にもかかわらず、これらのポリトープに関する基本的な2つの疑問は未解決のままである: どの低い立方体不等式がファセットを定義するのか、どの不等式がファセットを定義するのか? 本稿では, 必要な条件, 十分かつ効率的に決定可能な条件を確立することで, 最初の質問に答える。第2の質問に向けて、カット不等式のファセット定義性を決定することはNPハードであることを示す。これにより、昇降型マルチカットポリトープの正準面の解析が完了する。 The lifted multicut problem has diverse applications in the field of computer vision. Exact algorithms based on linear programming require an understanding of lifted multicut polytopes. Despite recent progress, two fundamental questions about these polytopes have remained open: Which lower cube inequalities define facets, and which cut inequalities define facets? In this article, we answer the first question by establishing conditions that are necessary, sufficient and efficiently decidable. Toward the second question, we show that deciding facet-definingness of cut inequalities is NP-hard. This completes the analysis of canonical facets of lifted multicut polytopes.	翻訳日:2024-03-20 23:12:03 公開日:2024-03-18
# 次の単語を予測する:人間はこのタスクに不確実性を示し、言語モデル______ Predict the Next Word: Humans exhibit uncertainty in this task and language models _____ ( http://arxiv.org/abs/2402.17527v2 ) ライセンス: Link先を確認	Evgenia Ilia, Wilker Aziz,	(参考訳) 言語モデル (LM) は、人間の生成したテキストに確率を割り当てるよう訓練された統計モデルである。このように、人間の言語的多様性をよく表すかどうかを疑問視することは妥当である。この形式の統計評価は、受理性判定(人的評価)や堅牢な自動プロキシ(非自明な)を必要とするため、通過レベルでの実施が困難である。しかしながら、ある文脈が与えられた単語レベルでは、LMからのサンプルは、利用可能なコンテキストの代替の単一単語継続の事前記録されたデータセットと正確なマッチングによって評価することができる。我々は,この事実を生かし,人間(特に英語話者の集団)が「次の単語予測」タスクで示す多様性を再現するLMの能力を評価する。これは、テキスト分類の文脈において、Baan et al (2022) は、人間の不確実性に対するキャリブレーション(キャリブレーション)と呼んだ。我々は、GPT2、BLOOM、ChatGPTを評価し、人間の不確実性に対するキャリブレーションがかなり低いことを発見した。また, 予測校正誤差(ECE)の誤りを反映し, コミュニティに対して, この設定でそれに頼ることを推奨する。 Language models (LMs) are statistical models trained to assign probability to human-generated text. As such, it is reasonable to question whether they approximate linguistic variability exhibited by humans well. This form of statistical assessment is difficult to perform at the passage level, for it requires acceptability judgements (i.e., human evaluation) or a robust automated proxy (which is non-trivial). At the word level, however, given some context, samples from an LM can be assessed via exact matching against a prerecorded dataset of alternative single-word continuations of the available context. We exploit this fact and evaluate the LM's ability to reproduce variability that humans (in particular, a population of English speakers) exhibit in the 'next word prediction' task. This can be seen as assessing a form of calibration, which, in the context of text classification, Baan et al. (2022) termed calibration to human uncertainty. We assess GPT2, BLOOM and ChatGPT and find that they exhibit fairly low calibration to human uncertainty. We also verify the failure of expected calibration error (ECE) to reflect this, and as such, advise the community against relying on it in this setting.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
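The entry above refers to expected calibration error (ECE), which the authors advise against relying on in their setting. For readers unfamiliar with the metric, this is a minimal sketch of the standard equal-width binned ECE. It does not reproduce the paper's analysis; the bin count and the example inputs are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average over bins of |accuracy - mean confidence|.

    confidences: predicted probabilities in [0, 1] for the chosen answers.
    correct:     1 if the corresponding prediction was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Illustrative inputs: two predictions at 0.9 (both right) and
# two at 0.6 (one right, one wrong).
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

ECE compares confidence with correctness against a single reference answer, so it has no way to represent contexts where several continuations are plausible to human readers.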
# ランダム位相演算子とサブグラフ位相演算子を用いたQAOA QAOA with random and subgraph phase operators ( http://arxiv.org/abs/2402.18412v2 ) ライセンス: Link先を確認	Anthony Wilkie, Igor Gaidai, James Ostrowski, Rebekah Herrman,	(参考訳) 量子近似最適化アルゴリズム(QAOA)は、組合せ最適化問題を解くのに使える有望な量子アルゴリズムである。通常のQAOAアンザッツは、コストとミキサーハミルトンの交互に応用される。本研究では,従来のコストであるハミルトン演算子以外のハミルトニアンの使用がQAOAの性能に与える影響について検討する。 p = 1のカスタム位相演算子を持つQAOAの期待値式を導出し、これらのカスタム位相演算子のうちいくつかが元のアルゴリズムよりも高い近似比を達成できることを数値的に示す。テストされた全てのグラフのうち、ランダムなカスタム位相演算子の0.036\%、サブグラフのカスタム位相演算子の75.9\%、三角形を除去したカスタム位相演算子の95.1\%、最大次エッジを除去したカスタム位相演算子の93.9\%は、元のQAOA実装よりも高い近似比を持つ。この発見は、QAOAの性能をさらに向上させるために、より良い位相演算子を設計できるかどうかという疑問を提起する。 The quantum approximate optimization algorithm (QAOA) is a promising quantum algorithm that can be used to approximately solve combinatorial optimization problems. The usual QAOA ansatz consists of an alternating application of the cost and mixer Hamiltonians. In this work, we study how using Hamiltonians other than the usual cost Hamiltonian, dubbed custom phase operators, can affect the performance of QAOA. We derive an expected value formula for QAOA with custom phase operators at p = 1 and show numerically that some of these custom phase operators can achieve higher approximation ratio than the original algorithm implementation. Out of all the graphs tested, 0.036\% of the random custom phase operators, 75.9\% of the subgraph custom phase operators, 95.1\% of the triangle-removed custom phase operators, and 93.9\% of the maximal degree edge-removed custom phase operators have a higher approximation ratio than the original QAOA implementation. This finding opens up the question of whether better phase operators can be designed to further improve the performance of QAOA.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# 鏡ライブラリー:低次元のディープニューラルネットは反射特性を持つ凸ラッソモデルである A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features ( http://arxiv.org/abs/2403.01046v2 ) ライセンス: Link先を確認	Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci,	(参考訳) 1次元データ上でニューラルネットワークをトレーニングすることは、凸ラッソ問題を固定的、明示的に定義された特徴の辞書行列で解くのと等価であることを示す。特定の辞書はアクティベーションと深さに依存する。分割線形アクティベーションを持つ2層ネットワーク,最大4層までの細いReLUネットワーク,符号アクティベーションと任意の深さを持つ長方形およびツリーネットワークについて検討する。興味深いことに、ReLUネットワークでは、第4のレイヤが、自分自身に関するトレーニングデータのリフレクションを表す機能を生成する。 Lasso表現は、グローバルに最適なネットワークとソリューションランドスケープに洞察を与える。 We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in ReLU networks, a fourth layer creates features that represent reflections of training data about themselves. The Lasso representation sheds insight to globally optimal networks and the solution landscape.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# ParallelPARC: 自然言語アナロジーを生成するためのスケーラブルなパイプライン ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies ( http://arxiv.org/abs/2403.01139v2 ) ライセンス: Link先を確認	Oren Sultan, Yonatan Bitton, Ron Yosef, Dafna Shahaf,	(参考訳) アナロジー作成は人間の認知の中心であり、新しい状況への適応を可能にするが、現在のAIシステムにはまだ欠けている能力である。現在、ほとんどのアナロジーデータセットは単純なアナロジー(例:単語のアナロジー)に焦点を当てており、複雑なタイプのアナロジーを含むデータセットは通常手作業でキュレーションされ、非常に小さい。これが計算論的アナロジーの進歩を妨げていると我々は考えている。本研究では,最先端のLarge Language Models (LLM) を利用したデータ生成パイプラインであるParallelPARC (Parallel Paragraph Creator) を設計し,複雑な段落ベースのアナロジーと,単純なものと難しいものの両方のディストラクタを作成する。パイプラインを実演し、科学的プロセス間のアナロジーのデータセットであるProPara-Logyを作成する。人によって検証されたゴールドセットと、自動生成されたシルバーセットを公開する。我々は、LLMと人間のアナロジー認識を二択および多肢選択の設定でテストし、軽い指導の後、人間が最良のモデルより優れていること(約13%の差)を示した。シルバーセットがモデルのトレーニングに役立つことを実証する。最後に、難しいディストラクタはLLMを混乱させるが、人間は混乱させないことを示す。私たちのパイプラインが、この新興分野の研究を促進することを願っています。 Analogy-making is central to human cognition, allowing us to adapt to novel situations -- an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets including complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy. In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator), leveraging state-of-the-art Large Language Models (LLMs) to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate our pipeline and create ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold-set, validated by humans, and a silver-set, generated automatically. We test LLMs' and humans' analogy recognition in binary and multiple-choice settings, and find that humans outperform the best models (~13% gap) after light supervision. We demonstrate that our silver-set is useful for training models. Lastly, we show challenging distractors confuse LLMs, but not humans. We hope our pipeline will encourage research in this emerging field.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# TPLLM: 事前訓練された大規模言語モデルに基づく交通予測フレームワーク TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models ( http://arxiv.org/abs/2403.02221v2 ) ライセンス: Link先を確認	Yilong Ren, Yue Chen, Shuai Liu, Boyue Wang, Haiyang Yu, Zhiyong Cui,	(参考訳) 交通予測は、インテリジェントトランスポーテーションシステム(ITS)のパービューにおいて重要な側面を占めており、高精度な予測の達成は、効率的な交通管理に重要な意味を持つ。ディープラーニング駆動型トラフィック予測モデルの精度は、通常、トレーニングデータの量が増加するにつれて、上昇傾向を呈する。しかしながら、トラフィックのための包括的な時空間データセットの調達は、主にデータ収集と保持に関連する実質的なコストから生じる、課題によって引き起こされることが多い。したがって,歴史的交通量に制限のある地域で,正確な予測と優れた一般化能力を達成できるモデルを開発することは難しい問題である。近年の先進的な大規模言語モデル (LLM) は, クロスモダリティの知識伝達や数発の学習において, 極めて優れた能力を発揮している。 LLMを利用した新しい交通予測フレームワークであるTPLLMを導入する。本稿では,畳み込みニューラルネットワーク(CNN)に基づくシーケンス埋め込み層とグラフ畳み込みニューラルネットワーク(GCN)に基づくグラフ埋め込み層を構築し,それぞれにシーケンスの特徴と空間的特徴を抽出する。これらは後にLLMに適した入力を形成するために統合される。低ランク適応(LoRA)ファインチューニングアプローチをTPLLMに適用することにより,効率的な学習と計算要求の最小化を実現する。実世界の2つのデータセットの実験では、TPLLMはフルサンプルと少数ショットの予測シナリオの両方で高い性能を示し、歴史的交通量の少ない地域でのITSの開発を効果的に支援している。 Traffic prediction constitutes a pivotal facet within the purview of Intelligent Transportation Systems (ITS), and the attainment of highly precise predictions holds profound significance for efficacious traffic management. The precision of prevailing deep learning-driven traffic prediction models typically sees an upward trend with a rise in the volume of training data. However, the procurement of comprehensive spatiotemporal datasets for traffic is often fraught with challenges, primarily stemming from the substantial costs associated with data collection and retention. Consequently, developing a model that can achieve accurate predictions and good generalization ability in areas with limited historical traffic data is a challenging problem. It is noteworthy that the rapidly advancing pretrained Large Language Models (LLMs) of recent years have demonstrated exceptional proficiency in cross-modality knowledge transfer and few-shot learning. 
Recognizing the sequential nature of traffic data, similar to language, we introduce TPLLM, a novel traffic prediction framework leveraging LLMs. In this framework, we construct a sequence embedding layer based on Convolutional Neural Networks (CNNs) and a graph embedding layer based on Graph Convolutional Networks (GCNs) to extract sequence features and spatial features, respectively. These are subsequently integrated to form inputs that are suitable for LLMs. A Low-Rank Adaptation (LoRA) fine-tuning approach is applied to TPLLM, thereby facilitating efficient learning and minimizing computational demands. Experiments on two real-world datasets demonstrate that TPLLM exhibits commendable performance in both full-sample and few-shot prediction scenarios, effectively supporting the development of ITS in regions with scarce historical traffic data.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# フォノン急速断熱路による閉じ込められたイオンの冷却 Cooling trapped ions with phonon rapid adiabatic passage ( http://arxiv.org/abs/2403.02315v2 ) ライセンス: Link先を確認	M. I. Fabrikant, P. Lauria, I. S. Madjarov, W. C. Burton, R. T. Sutherland,	(参考訳) 量子電荷結合デバイス(QCCD)コンピュータアーキテクチャの最近のデモでは、回路時間は冷却によって支配される。マルチイオン結晶の運動モードでは、冷却剤イオンの関与が低いため、マグニチュードのオーダーが他の結晶よりも長くかかる。ここでは, 直接冷却よりも短い時間スケールで, 選択モードの熱集団をコヒーレントに交換することにより, この問題を回避する新しい手法を, フォノン急速断熱通路 (phrap) と呼ぶ。断熱的急速通過とは対照的に, 急速冷却モードと直流電場を用いた急速冷却モードを準静的に結合する。結晶が断熱的に横切られないようにすると、ほぼ完全なフォノン集団交換結果が得られる。我々はこれを2イオン結晶上で実証し、全ての放射状モードの間接的な地中冷却を、直接冷却と比較して桁違いの速度アップを達成することを示した。また、この手法の電位と制御磁場のゆらぎを捕捉する感度が低いことを示し、さらにn~200の温度からサブクアンタ温度を達成できることを見出した。 In recent demonstrations of the quantum charge-coupled device (QCCD) computer architecture, circuit times are dominated by cooling. Some motional modes of multi-ion crystals take orders-of-magnitude longer to cool than others because of low coolant ion participation. Here we demonstrate a new technique, which we call phonon rapid adiabatic passage (phrap), that avoids this issue by coherently exchanging the thermal populations of selected modes on timescales short compared to direct cooling. Analogous to adiabatic rapid passage, we quasi-statically couple these slow-cooling modes with fast-cooling ones using DC electric fields. When the crystal is then adiabatically ramped through the resultant avoided crossing, nearly-complete phonon population exchange results. We demonstrate this on two-ion crystals, and show the indirect ground-state cooling of all radial modes--achieving an order of magnitude speedup compared to direct cooling. We also show the technique's insensitivity to trap potential and control field fluctuations, and find that it still achieves sub-quanta temperatures starting as high as n~200.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# Polyak Momentumを用いた非凸確率合成最適化 Non-Convex Stochastic Composite Optimization with Polyak Momentum ( http://arxiv.org/abs/2403.02967v2 ) ライセンス: Link先を確認	Yuan Gao, Anton Rodomanov, Sebastian U. Stich,	(参考訳) 確率的近位勾配法は、広く使われている確率的勾配勾配勾配法(SGD)の強力な一般化であり、機械学習において多くの応用を見出した。しかし、この手法は確率ノイズが顕著な非凸条件(すなわち、小さなバッチサイズまたは境界バッチサイズのみを使用する場合)に収束しないことが知られている。本稿では,ポリーク運動量を持つ確率的近位勾配法に着目した。本手法は,バッチサイズに関係なく,非凸合成最適化問題に対して最適収束率が得られることを示す。さらに, 合成最適化におけるポリアクモーメントの分散低減効果を厳密に解析し, 近似ステップが不正確に解ける場合にも収束することを示す。最後に, 理論的結果を検証する数値実験を行った。 The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is notoriously known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e. when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results.	翻訳日:2024-03-20 23:01:00 公開日:2024-03-18
# DeepCRE:AI駆動のクロスドラッグ反応評価を通じてドラッグR&Dを変える DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation ( http://arxiv.org/abs/2403.03768v3 ) ライセンス: Link先を確認	Yushuai Wu, Ting Zhang, Hao Zhou, Hainan Wu, Hanwen Sunchu, Lei Hu, Xiaofang Chen, Suyuan Zhao, Gaochao Liu, Chao Sun, Jiahuan Zhang, Yizhen Luo, Peng Liu, Zaiqing Nie, Yushuai Wu,	(参考訳) 治療応用と薬物研究・開発(R&D)の分野はどちらも重大な課題に直面している。その理由の1つは、薬物R&Dの後期におけるクロスドラッグ反応評価(CRE)の不十分さである。 in-silico CREモデルは有望な解決策をもたらすが、既存の方法論はターゲットや細胞ラインレベルなどの薬物R&Dの初期段階に限られており、臨床成功率に制限がある。本稿では、薬物研究開発の後期において、CREを効果的に予測する先駆的なAIモデルであるDeepCREを紹介する。 DeepCREは、患者レベルのCREの平均パフォーマンス改善を17.7%、表示レベルのCREを5倍に向上させることで、より正確なパーソナライズされた治療予測と、表示に対する薬価評価を改善することで、既存のベストモデルより優れている。さらに、DeepCREは、5/8の大腸癌オルガノイドの2つの承認された薬物のコンパレータセットよりもはるかに効果の高い6つの薬物候補を同定した。このことは、DeepCREが治療効果を増強した薬物候補のスペクトルを体系的に発見する能力を示し、薬物R&Dを変換する可能性を強調している。 The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models bring a promising solution, existing methodologies are restricted to early stages of drug R&D, such as target and cell-line levels, offering limited improvement to clinical success rates. Herein, we introduce DeepCRE, a pioneering AI model designed to predict CRE effectively in the late stages of drug R&D. DeepCRE outperforms the existing best models by achieving an average performance improvement of 17.7% in patient-level CRE, and a 5-fold increase in indication-level CRE, facilitating more accurate personalized treatment predictions and better pharmaceutical value assessment for indications, respectively. 
Furthermore, DeepCRE has identified a set of six drug candidates that show significantly greater effectiveness than a comparator set of two approved drugs in 5/8 colorectal cancer organoids. This demonstrates the capability of DeepCRE to systematically uncover a spectrum of drug candidates with enhanced therapeutic effects, highlighting its potential to transform drug R&D.	翻訳日:2024-03-20 20:59:05 公開日:2024-03-18
# 周期変調による地中キラル電流 Ground-state chiral current via periodic modulation ( http://arxiv.org/abs/2403.06688v2 ) ライセンス: Link先を確認	Shuyue Wang, Wuji Zhang, Chunfang Sun, Chunfeng Wu, Gangcheng Wang,	(参考訳) 本研究では,光子を介するDzyaloshinskii-Moriya相互作用を設計し,量子場と古典場によって駆動される3レベル原子に基づく基底状態キラル電流をエミュレートする。我々は、励起状態の有限寿命から生じる課題に対処できる、2レベル系の効果的なジアロシンスキー・モリヤ相互作用を導出するために、断熱除去技術を用いる。さらに,原子基底状態に対する周期変調の実装により,所望のダイナミクスを実現することができる。また、適切な駆動周波数と位相を選択することで、三状態および多状態キラル電流を得ることができる。また、トグルリングフレームに基づいて、他のコンポーネントに対してDzyaloshinskii-Moriya相互作用を設計する。数値シミュレーションの結果,提案手法によって完全に信頼性の高い基底状態のキラル電流が生成され,量子状態遷移と将来の量子ネットワークの発展の可能性が開けることが示唆された。 In this study, we engineer the Dzyaloshinskii-Moriya interaction mediated by photons to emulate ground-state chiral current based on three-level atoms driven by quantum and classical fields. We employ adiabatic elimination techniques to derive an effective Dzyaloshinskii-Moriya interaction Hamiltonian of two-level systems, which can address the challenges arising from the finite lifetime of excited states. Furthermore, we can ensure to achieve desired dynamics through the implementation of periodic modulation on the atomic ground states. Besides, three-state and multi-state chiral current can be obtained by choosing appropriate driving frequencies and phases. We also design the Dzyaloshinskii-Moriya interaction for the other components based on a toggling frame. The numerical simulation results further indicate that our proposal can generate a perfectly reliable ground-state chiral current and open up possibilities for quantum state transfer and the development of future quantum networks.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-18
# ACFIX:スマートコントラクトにおけるアクセス制御脆弱性のコンテキストアウェア修復のための共通RBACプラクティスによるLLM指導 ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts ( http://arxiv.org/abs/2403.06838v2 ) ライセンス: Link先を確認	Lyuye Zhang, Kaixuan Li, Kairan Sun, Daoyuan Wu, Ye Liu, Haoye Tian, Yang Liu,	(参考訳) スマートコントラクトは、アクセス制御(AC)脆弱性が特に重要な、さまざまなセキュリティ問題の影響を受けやすい。既存の研究では複数の検出ツールが提案されているが、スマートコントラクトにおけるAC脆弱性の自動的かつ適切な修復は依然として課題である。通常テンプレートベースのアプローチで固定される、既存の修復ツールで一般的にサポートされている脆弱性タイプとは異なり、ACの主な障害は、人間レベルのインテリジェンスを必要とするタスクである適切なパッチコードを生成するためのAC関連のソースコードの長いリストの中で、適切な役割やパーミッションを特定することである。大規模言語モデル(LLM)の最近の進歩を生かして、最先端のGPT-4モデルを採用し、ACFIXと呼ばれる新しいアプローチで拡張する。重要な洞察は、コード機能の主要なカテゴリに共通するACプラクティスをマイニングし、それを使って、同様の機能でコードを修正するのにLLMをガイドできるということです。この目的のために、ACFIXはオフラインとオンラインの両方のフェーズを含む。まず、オフラインフェーズにおいて、ACFIXは344,251のオンチェーン契約から共通ロールベースアクセス制御(RBAC)のプラクティスの分類をマイニングし、上位1000組から49のロールパーミッションペアを分類する。第2に、ACFIXは、契約全体にわたるAC関連要素を追跡し、このコンテキスト情報とChain-of-Thoughtパイプラインを使用して、対象契約に対する最も適切なロールパーミッションペアを特定し、その後、適切なパッチを生成する。このパッチは有効性と有効性をチェックする。 ACFIXを評価するために、118個の実世界のAC脆弱性のベンチマークデータセットを構築し、ACFIXが94.92%の修正に成功したことを明らかにした。これは、ベースラインの GPT-4 に比べて大幅に改善され、52.54% しか達成されなかった。 Smart contracts are susceptible to various security issues, among which access control (AC) vulnerabilities are particularly critical. While existing research has proposed multiple detection tools, the automatic and appropriate repair of AC vulnerabilities in smart contracts remains a challenge. Unlike commonly supported vulnerability types by existing repair tools, such as reentrancy, which are usually fixed by template-based approaches, the main obstacle of AC lies in identifying the appropriate roles or permissions amid a long list of non-AC-related source code to generate proper patch code, a task that demands human-level intelligence. Leveraging recent advancements in large language models (LLMs), we employ the state-of-the-art GPT-4 model and enhance it with a novel approach called ACFIX. 
The key insight is that we can mine common AC practices for major categories of code functionality and use them to guide LLMs in fixing code with similar functionality. To this end, ACFIX involves both offline and online phases. First, during the offline phase, ACFIX mines a taxonomy of common Role-based Access Control (RBAC) practices from 344,251 on-chain contracts, categorizing 49 role-permission pairs from the top 1,000 pairs mined. Second, during the online phase, ACFIX tracks AC-related elements across the contract and uses this context information along with a Chain-of-Thought pipeline to guide LLMs in identifying the most appropriate role-permission pair for the subject contract and subsequently generating a suitable patch. This patch will then undergo a validity and effectiveness check. To evaluate ACFIX, we built the first benchmark dataset of 118 real-world AC vulnerabilities, and our evaluation revealed that ACFIX successfully repaired 94.92% of them. This represents a significant improvement compared to the baseline GPT-4, which achieved only 52.54%.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-18
# エネルギー準位交差による相転移検出のための等変変量量子固有解法 Equivariant Variational Quantum Eigensolver to detect Phase Transitions through Energy Level Crossings ( http://arxiv.org/abs/2403.07100v2 ) ライセンス: Link先を確認	Giulio Crognaletti, Giovanni Di Bartolomeo, Michele Vischi, Luciano Loris Viteritti,	(参考訳) レベル分光は、異なる量子相を示す遷移点を特定するための強力な方法である。各量子相は励起状態の特徴的な配列を示すため、低い励起状態の間のエネルギー準位の交差は、相転移点を推定する信頼できる手段を与える。変分量子固有解法のような手法は、量子コンピューティングを用いて相互作用するシステムの基底状態を近似するのに有用であるが、低エネルギーの励起を捉えることは依然として困難である。本研究では,鎖上の$J_1$-$J_2$ハイゼンベルク模型における一重項および三重項励起状態(その転移点の特徴づけに不可欠である)を正確に記述するために,全スピンと並進対称性を保持する同変量子回路を導入する。さらに、ノイズが変分状態に与える影響を評価し、ゼロノイズ外挿法のような従来の緩和技術が、その物理的特性を確実に回復することを示す。 Level spectroscopy stands as a powerful method for identifying the transition point that delineates distinct quantum phases. Since each quantum phase exhibits a characteristic sequence of excited states, the crossing of energy levels between low-lying excited states offers a reliable means to estimate the phase transition point. While approaches like the Variational Quantum Eigensolver are useful for approximating ground states of interacting systems using quantum computing, capturing low-energy excitations remains challenging. In our study, we introduce an equivariant quantum circuit that preserves the total spin and the translational symmetry to accurately describe singlet and triplet excited states in the $J_1$-$J_2$ Heisenberg model on a chain, which are crucial for characterizing its transition point. Additionally, we assess the impact of noise on the variational state, showing that conventional mitigation techniques like Zero Noise Extrapolation reliably restore its physical properties.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-18
# Lindbladian SYKにおけるオペレータサイズの成長 Operator size growth in Lindbladian SYK ( http://arxiv.org/abs/2403.07115v2 ) ライセンス: Link先を確認	Jiasheng Liu, Rene Meyer, Zhuo-Yu Xian,	(参考訳) 我々は,Lindbladian SYKモデルにおいて,$q$-body相互作用項とリニアジャンプ項を有限散逸強度で有する演算子サイズの増大について検討した。演算子のサイズと分布を有限の$q$で計算し、解析的に大きめの$q$で計算する。散逸的な(生産的な)ジャンプ項では、サイズはマヨラナフェルミオンの数の半分よりも小さい(大きい)値に収束する。弱い散逸では、作用素の大きさの進化は二次的-指数的-プラトーな振る舞いを示す。プラトー値は、大きな$q$制限における相互作用のカップリングと線形ジャンプ項の比によって決定される。演算子のサイズ分布は、単体の場合と対照的に、遅くとも有限サイズ領域で局所化されている。さらに,有限散逸時の演算子サイズ濃度を示す演算子展開の時間非依存直交基底も導出した。最後に、演算子サイズ成長の不確実性関係が大きな$q$で飽和していることが観察され、散逸を伴う演算子サイズ成長の古典力学が導かれる。 We investigate the growth of operator size in the Lindbladian SYK model with $q$-body interaction terms and linear jump terms at finite dissipation strength. We compute the operator size as well as its distribution numerically at finite $q$ and analytically at large $q$. With dissipative (productive) jump terms, the size converges to a value smaller (larger) than half the number of Majorana fermions. At weak dissipation, the evolution of operator size displays a quadratic-exponential-plateau behavior. The plateau value is determined by the ratios between the coupling of the interaction and the linear jump term in the large $q$ limit. The operator size distribution remains localized in the finite size region even at late times, contrasting with the unitary case. Moreover, we also derived the time-independent orthogonal basis for operator expansion which exhibits the operator size concentration at finite dissipation. Finally, we observe that the uncertainty relation for operator size growth is saturated at large $q$, leading to a classical dynamics of the operator size growth with dissipation.	翻訳日:2024-03-20 20:59:04 公開日:2024-03-18
# STREAM:ビデオ生成モデルのための時空間評価と分析基準 STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models ( http://arxiv.org/abs/2403.09669v2 ) ライセンス: Link先を確認	Pum Jun Kim, Seojun Kim, Jaejun Yoo,	(参考訳) 画像生成モデルは、様々な評価指標からの包括的なガイダンスによって支援され、現実的で多様な画像の生成に大きな進歩をもたらした。しかし、現在のビデオ生成モデルは、改善のための洞察を提供するツールが限られている短いビデオクリップを生成するのに苦労している。現在のビデオ評価指標は、ビデオのユニークな特徴を過小評価するビデオ埋め込みネットワークで埋め込みを切り替えることによって、画像メトリクスの単純な適応である。解析の結果,広範に使用されているFrechet Video Distance(FVD)はビデオの時間的自然性よりも空間的側面に重点を置いていることが判明した。さらに、人間の評価からかなりの不安定性と分岐を示す。そこで本稿では,空間的側面と時間的側面を独立に評価するためのビデオ評価基準STREAMを提案する。この機能は様々な視点から動画生成モデルの包括的解析と評価を可能にする。我々はSTREAMがビデオの視覚的品質と時間的品質の両方に効果的な評価ツールを提供し、ビデオ生成モデルの改善領域に関する洞察を提供することを示す分析的および実験的証拠を提供する。我々の知る限り、STREAMはビデオの時間的側面と空間的側面を別々に評価できる最初の評価指標である。私たちのコードはhttps://github.com/pro2nit/STREAMで公開されています。 Image generative models have made significant progress in generating realistic and diverse images, supported by comprehensive guidance from various evaluation metrics. However, current video generative models struggle to generate even short video clips, with limited tools that provide insights for improvements. Current video evaluation metrics are simple adaptations of image metrics by switching the embeddings with video embedding networks, which may underestimate the unique characteristics of video. Our analysis reveals that the widely used Frechet Video Distance (FVD) has a stronger emphasis on the spatial aspect than the temporal naturalness of video and is inherently constrained by the input size of the embedding networks used, limiting it to 16 frames. Additionally, it demonstrates considerable instability and diverges from human evaluations. To address the limitations, we propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects. This feature allows comprehensive analysis and evaluation of video generative models from various perspectives, unconstrained by video length. 
We provide analytical and experimental evidence demonstrating that STREAM provides an effective evaluation tool for both the visual and temporal quality of videos, offering insights into areas of improvement for video generative models. To the best of our knowledge, STREAM is the first evaluation metric that can separately assess the temporal and spatial aspects of videos. Our code is available at https://github.com/pro2nit/STREAM.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# Touch-GS:3Dガウシアン・スプレイティングを監督するビジュアル触覚 Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting ( http://arxiv.org/abs/2403.09875v2 ) ライセンス: Link先を確認	Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III,	(参考訳) 本研究では,光学式触覚センサを用いた3次元ガウス撮影(3DGS)シーンの監視手法を提案する。光触覚センサはロボティクスにおいて操作やオブジェクト表現に広く利用されているが、光学触覚センサのデータは直接3DGSシーンを監督するには適していない。我々の表現は、ガウス的プロセス・インプリシット・サーフェスを利用してオブジェクトを暗黙的に表現し、多くのタッチを統一された表現と不確実性を組み合わせた。このモデルを2段階のプロセスで整列した単眼深度推定ネットワークにマージし、奥行きカメラと粗い整列を行い、タッチデータに合わせて微調整する。各トレーニング画像に対して,本手法は対応する融合深度と不確実性マップを生成する。この追加情報を利用することで、3DGSシーンモデルのトレーニングのための新たな損失関数である分散重み付き深度教師付き損失を提案する。我々は、DenseTact光触覚センサとRealSense RGB-Dカメラを利用して、不透明で透明な物体だけでなく、数ビューのシーン合成において、触覚と視覚の組み合わせが視覚や触覚よりも定量的に質的に良い結果をもたらすことを示す。プロジェクトページはhttp://armlabstanford.github.io/touch-gsでご覧ください。 In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. 
We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# 言語モデルマージのためのフィッシャーマスクノード Fisher Mask Nodes for Language Model Merging ( http://arxiv.org/abs/2403.09891v2 ) ライセンス: Link先を確認	Thennal D K, Ganesh Nathan, Suchithra M S,	(参考訳) 微調整された事前訓練モデルは、下流のパフォーマンスにおいて大きな利点をもたらす。 BERTなどの事前学習モデルの自然言語処理におけるユビキタスな性質は、タスク固有の微調整モデルの普及にも繋がった。これらのモデルは一般的に1つのタスクのみをうまく実行するので、マルチタスクのシナリオでは追加のトレーニングやアンサンブルが必要になる。モデルマージの増大する分野は、複数のタスク固有のモデルを単一のマルチタスクモデルに組み合わせるという課題に対処するソリューションを提供する。本研究では, トランスフォーマーのモデルマージ手法について紹介し, 従来のフィッシャー重み付き平均化における知見と, モデルプルーニングにおけるフィッシャー情報の利用について考察した。トランスフォーマーアーキテクチャにおけるマスクノードのフィッシャー情報を利用して,計算効率のよい重み付け手法を提案する。提案手法は, BERT シリーズの各種モデルにおいて, 計算コストのごく一部において, フルスケールのフィッシャー重み付け平均性能を上回り, ベースライン性能は+6.5 まで向上し, 最大速度は57.4倍に向上した。本研究は,現在のマルチタスク学習環境における本手法の有効性を実証し,新しいモデルアーキテクチャや学習シナリオに対するスケーラビリティと適応性を提案する。 Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a regular and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging in a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup of 57.4x in the biggest model. 
Our results prove the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# 見えないデータに対するAI大腸内視鏡モデルの一般化予測 Predicting Generalization of AI Colonoscopy Models to Unseen Data ( http://arxiv.org/abs/2403.09920v2 ) ライセンス: Link先を確認	Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg,	(参考訳) 背景: 臨床実践におけるAI大腸内視鏡アルゴリズムの一般化性は, より広く採用するために重要である。しかし、現在、目に見えないデータのパフォーマンスを評価する技術は、高価で時間集約的なラベルを必要とする。方法:"Masked Siamese Network"(MSN)を用いて,未知のデータ中の新しい現象を同定し,ポリプ検出器の性能を予測する。 MSNは、ラベルなしでポリプ画像のマスクされた領域を予測するように訓練されている。本研究は,イスラエルのデータのみに基づいてMSNを訓練し,日本の大腸内視鏡検査(354本,128時間)において,未確認の技術である狭帯域画像(NBI)および色素内視鏡(CE)を検出する能力をテストする。また,MSNは日本のデータに基づいて訓練を受けていないものの,両国の大腸内視鏡検査におけるポリープのコンピュータ支援検出(CADe)の性能を予測する能力についても検証した。結果: ラベル不要のFrechet距離を用いると、NBI と CE は日本白色光 (bootstrapped z-test, \|z\| > 496, p < 10^-8 for both) よりイスラエル白色光に似ていない。 MSNは99%の精度でNBIを検出し、白色光のみでトレーニングされているにもかかわらず、CEを我々のヒューリスティックより高精度に予測し(精度90%対79%)、ノイズの多いラベルに対して堅牢な唯一の方法である。 MSNは、ドメイン内のイスラエルおよびドメイン外の日本の大腸内視鏡検査におけるCADeポリープ検出性能(それぞれr=0.79、0.37)を予測している。日本における検出性能の訓練例が少数あれば、MSNによる日本の性能予測は改善される(r=0.56)。結論: 本手法は臨床データの分布変化を同定し, ラベルなしで未知データに対するCADe検出性能を予測できる。当社の自己監督型アプローチは、病院間の違いや、データがトレーニング時から有意にシフトした場合など、実際のデータとトレーニングの違いを検出するのに役立ちます。 MSNは大腸内視鏡以外の医療画像領域にも応用できる可能性がある。 Background: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. Methods: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked-out regions of polyp images, without any labels. We test MSN's ability to be trained on data only from Israel and detect unseen techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), on colonoscopies from Japan (354 videos, 128 hours). 
We also test MSN's ability to predict performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan. Results: MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, \|z\| > 496, p < 10^-8 for both) using the label-free Frechet distance. MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r=0.79, 0.37 respectively). With a few examples of Japan detector performance to train on, MSN prediction of Japan performance improves (r=0.56). Conclusion: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can aid in detecting when data in practice differs from training data, such as between hospitals or when data has meaningfully shifted from training. MSN has potential for application to medical image domains beyond colonoscopy.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# PITA:物理式軌道オートエンコーダ PITA: Physics-Informed Trajectory Autoencoder ( http://arxiv.org/abs/2403.11728v1 ) ライセンス: Link先を確認	Johannes Fischer, Kevin Rösch, Martin Lauer, Christoph Stiller,	(参考訳) 安全クリティカルなアプリケーションにおけるロボットシステムの検証には、起こりそうもない稀なエッジケースを含む多くのシナリオでのテストが必要であり、シミュレーションでのテストで現実世界のテストを補完する必要がある。生成モデルは、学習したラテントスペースでサンプリングすることで、エッジケースシナリオを生成するために生成されたデータで現実世界のデータセットを拡張するために使用することができる。オートエンコーダは、低次元の中間表現から入力データを再構成することを学ぶことで、特定の領域の潜在表現を学習することができる。しかし、そのような軌道は必ずしも物理的に可算であるとは限らないが、通常は入力軌道に存在しないノイズを含んでいる。そこで本研究では,物理力学モデルをオートエンコーダの損失関数に組み込んだ新しい物理インフォームド・トラジェクトリ・オートエンコーダ(PITA)アーキテクチャを提案する。この結果、入力軌跡を再構成するだけでなく、物理モデルにも従属する滑らかな軌跡が得られる。車両軌道の実際のデータセット上でPITAを評価し、その性能を通常のオートエンコーダと最先端のアクション空間オートエンコーダと比較する。 Validating robotic systems in safety-critical applications requires testing in many scenarios, including rare edge cases that are unlikely to occur, so real-world testing must be complemented with testing in simulation. Generative models can be used to augment real-world datasets with generated data to produce edge case scenarios by sampling in a learned latent space. Autoencoders can learn said latent representation for a specific domain by learning to reconstruct the input data from a lower-dimensional intermediate representation. However, the resulting trajectories are not necessarily physically plausible, but instead typically contain noise that is not present in the input trajectory. To resolve this issue, we propose the novel Physics-Informed Trajectory Autoencoder (PITA) architecture, which incorporates a physical dynamics model into the loss function of the autoencoder. This results in smooth trajectories that not only reconstruct the input trajectory but also adhere to the physical model. We evaluate PITA on a real-world dataset of vehicle trajectories and compare its performance to a normal autoencoder and a state-of-the-art action-space autoencoder.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# 古典的なプランニングドメインのための一般的なポリシーを学ぶ: C$_2$を超える Learning General Policies for Classical Planning Domains: Getting Beyond C$_2$ ( http://arxiv.org/abs/2403.11734v1 ) ライセンス: Link先を確認	Simon Ståhlberg, Blai Bonet, Hector Geffner,	(参考訳) 計画領域全体にわたる一般的なポリシーを学習するためのGNNベースのアプローチは、$C_2$の表現力、すなわち2つの変数を持つ一階述語論理とカウントによって制限される。この制限は、$k=3$の$k$-GNNに移行し、オブジェクトの埋め込みを三つ組の埋め込みに置き換えることで克服できる。しかし、$3$-GNNは$C_3$の表現力を持つ一方で、$C_2$に制限された$1$-および$2$-GNNとは違い、メッセージ交換に4乗時間、埋め込みに3乗空間を要するため非現実的である。本稿では,リレーショナルGNNのパラメータ化バージョンを紹介する。 $t$が無限大であるとき、R-GNN[$t$]は埋め込みのための二次空間のみを用いて$3$-GNNを近似する。 $t=1$や$t=2$のような$t$の低い値の場合、R-GNN[$t$]は、より少ないメッセージを交換することで、より弱い近似を実現するが、興味深いことに、いくつかの計画領域で必要とされる$C_3$の特徴をしばしば得ることができる。さらに、新しいR-GNN[$t$]アーキテクチャは、入力状態のみに適切な変換を施した元のR-GNNアーキテクチャである。実験結果から、R-GNN[$1$]とR-GNN[$2$]は、通常のR-GNNよりも、また同じく$3$-GNNを近似するエッジトランスフォーマーよりも明らかな性能向上を示している。 GNN-based approaches for learning general policies across planning domains are limited by the expressive power of $C_2$, namely, first-order logic with two variables and counting. This limitation can be overcome by transitioning to $k$-GNNs, for $k=3$, wherein object embeddings are substituted with triplet embeddings. Yet, while $3$-GNNs have the expressive power of $C_3$, unlike $1$- and $2$-GNNs that are confined to $C_2$, they require quartic time for message exchange and cubic space for embeddings, rendering them impractical. In this work, we introduce a parameterized version of relational GNNs. When $t$ is infinity, R-GNN[$t$] approximates $3$-GNNs using only quadratic space for embeddings. For lower values of $t$, such as $t=1$ and $t=2$, R-GNN[$t$] achieves a weaker approximation by exchanging fewer messages, yet interestingly, often yields the $C_3$ features required in several planning domains. Furthermore, the new R-GNN[$t$] architecture is the original R-GNN architecture with a suitable transformation applied to the input states only. Experimental results illustrate the clear performance gains of R-GNN[$1$] and R-GNN[$2$] over plain R-GNNs, and also over edge transformers that also approximate $3$-GNNs.	翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
# LSKNet: A Foundation Lightweight Backbone for Remote Sensing ( http://arxiv.org/abs/2403.11735v1 ) License: check the linked page	Yuxuan Li, Xiang Li, Yimain Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang,	Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote sensing objects may be mistakenly recognized without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not been previously explored in remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection and semantic segmentation benchmarks. Our comprehensive analysis further validates the significance of the identified priors and the effectiveness of LSKNet. The code is available at https://github.com/zcablii/LSKNet.	Translated: 2024-03-20 20:39:33 Published: 2024-03-18
# Post-Quantum Cryptography: Securing Digital Communication in the Quantum Era ( http://arxiv.org/abs/2403.11741v1 ) License: check the linked page	Dr. G S Mamatha, Namya Dimri, Rasha Sinha,	The advent of quantum computing poses a profound threat to traditional cryptographic systems, exposing vulnerabilities that compromise the security of digital communication channels reliant on RSA, ECC, and similar classical encryption methods. Quantum algorithms, notably Shor's algorithm, exploit the inherent computational power of quantum computers to efficiently solve mathematical problems underlying these cryptographic schemes. In response, post-quantum cryptography (PQC) has emerged as a critical field aimed at developing resilient cryptographic algorithms impervious to quantum attacks. This paper delineates the vulnerabilities of classical cryptographic systems to quantum attacks, elucidates the principles of quantum computing, and introduces various PQC algorithms such as lattice-based cryptography, code-based cryptography, hash-based cryptography, and multivariate polynomial cryptography. Highlighting the importance of PQC in securing digital communication amidst quantum computing advancements, this research underscores its pivotal role in safeguarding data integrity, confidentiality, and authenticity in the face of emerging quantum threats.	Translated: 2024-03-20 20:39:33 Published: 2024-03-18
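As a hedged illustration of the hash-based family named in this abstract, the sketch below implements a Lamport one-time signature in Python. It is a textbook construction chosen for illustration, not a scheme taken from the paper. The function names (`keygen`, `digest_bits`, `sign`, `verify`) are hypothetical, and a key pair must sign only one message.

```python
import hashlib
import secrets


def keygen(bits=256):
    """Return (private_key, public_key) for one Lamport signature."""
    # Private key: two random 32-byte secrets for every bit of the message digest.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(bits)]
    # Public key: the SHA-256 hash of every secret.
    pk = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in sk]
    return sk, pk


def digest_bits(message: bytes):
    """Hash the message and return its 256 digest bits, most significant first."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> (7 - i)) & 1 for byte in digest for i in range(8)]


def sign(sk, message: bytes):
    """Reveal, for each digest bit, the secret that corresponds to its value."""
    return [sk[i][bit] for i, bit in enumerate(digest_bits(message))]


def verify(pk, message: bytes, signature):
    """Check that every revealed secret hashes to the matching public value."""
    bits = digest_bits(message)
    if len(signature) != len(bits):
        return False
    return all(
        hashlib.sha256(secret).digest() == pk[i][bit]
        for i, (secret, bit) in enumerate(zip(signature, bits))
    )
```

Security rests only on the preimage resistance of the hash function, which is why this family is regarded as a candidate against quantum attacks; the cost is large keys and strictly one-time use.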
# PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks ( http://arxiv.org/abs/2403.11743v1 ) License: check the linked page	Philip Matthias Winter, Maria Wimmer, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Gaia Romana De Paolis, Johannes Novotny, Sophia Ulonska, Katja Bühler,	In this work we address flexibility in deep learning by means of transductive reasoning. For adaptation to new tasks or new data, existing methods typically involve tuning of learnable parameters or even complete re-training from scratch, rendering such approaches inflexible in practice. We argue that the notion of separating computation from memory by means of transduction can act as a stepping stone for solving these issues. We therefore propose PARMESAN (parameter-free memory search and transduction), a scalable transduction method which leverages a memory module for solving dense prediction tasks. At inference, hidden representations in memory are searched to find corresponding examples. In contrast to other methods, PARMESAN learns simply by modifying the memory content, without requiring any continuous training or fine-tuning of learnable parameters. Our method is compatible with commonly used neural architectures and canonically transfers to 1D, 2D, and 3D grid-based data. We demonstrate the capabilities of our approach on complex tasks such as continual and few-shot learning. PARMESAN learns up to 370 times faster than common baselines while being on par in terms of predictive performance, knowledge retention, and data-efficiency.	Translated: 2024-03-20 20:39:33 Published: 2024-03-18
# Embedded Named Entity Recognition using Probing Classifiers ( http://arxiv.org/abs/2403.11747v1 ) License: check the linked page	Nicholas Popovič, Michael Färber,	Extracting semantic information from generated text is a useful tool for applications such as automated fact checking or retrieval augmented generation. Currently, this requires either separate models during inference, which increases computational cost, or destructive fine-tuning of the language model. Instead, we propose directly embedding information extraction capabilities into pre-trained language models using probing classifiers, enabling efficient simultaneous text generation and information extraction. For this, we introduce an approach called EMBER and show that it enables named entity recognition in decoder-only language models without fine-tuning them and while incurring minimal additional computational cost at inference time. Specifically, our experiments using GPT-2 show that EMBER maintains high token generation rates during streaming text generation, with only a negligible decrease in speed of around 1% compared to a 43.64% slowdown measured for a baseline using a separate NER model. Code and data are available at https://github.com/nicpopovic/EMBER.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Relational Representation Learning Network for Cross-Spectral Image Patch Matching ( http://arxiv.org/abs/2403.11751v1 ) License: check the linked page	Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Dou Quan, Zelin Shi,	Recently, feature relation learning has drawn widespread attention in cross-spectral image patch matching. However, existing related research focuses on extracting diverse relations between image patch features and neglects the intrinsic feature representations of individual image patches. Therefore, an innovative relational representation learning idea is proposed for the first time, which simultaneously focuses on sufficiently mining the intrinsic features of individual image patches and the relations between image patch features. Based on this, we construct a lightweight Relational Representation Learning Network (RRL-Net). Specifically, we innovatively construct an autoencoder to fully characterize the individual intrinsic features, and introduce a Feature Interaction Learning (FIL) module to extract deep-level feature relations. To further mine individual intrinsic features, a lightweight Multi-dimensional Global-to-Local Attention (MGLA) module is constructed to enhance the global feature extraction of individual image patches and capture local dependencies within global features. By incorporating the MGLA module, we further explore the feature extraction network and construct an Attention-based Lightweight Feature Extraction (ALFE) network. In addition, we propose a Multi-Loss Post-Pruning (MLPP) optimization strategy, which greatly promotes network optimization while avoiding increases in parameters and inference time. Extensive experiments demonstrate that our RRL-Net achieves state-of-the-art (SOTA) performance on multiple public datasets. Our code will be made public later.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems ( http://arxiv.org/abs/2403.11752v1 ) License: check the linked page	Aditya Narayan Sankaran, Vigneshwaran Shankaran, Sampath Lonka, Rajesh Sharma,	Rhymes and poems are a powerful medium for transmitting cultural norms and societal roles. However, the pervasive existence of gender stereotypes in these works perpetuates biased perceptions and limits the scope of individuals' identities. Past works have shown that stereotyping and prejudice emerge in early childhood, and developmental research on causal mechanisms is critical for understanding and controlling stereotyping and prejudice. This work contributes by gathering a dataset of rhymes and poems to identify gender stereotypes and by proposing a model that identifies gender bias with 97% accuracy. Gender stereotypes were rectified using a Large Language Model (LLM), and its effectiveness was evaluated in a comparative survey against rectifications by human educators. To summarize, this work highlights the pervasive nature of gender stereotypes in literary works and reveals the potential of LLMs to rectify gender stereotypes. This study raises awareness and promotes inclusivity within artistic expressions, making a significant contribution to the discourse on gender equality.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation ( http://arxiv.org/abs/2403.11757v1 ) License: check the linked page	Jun Yu, Wangyuan Zhu, Jichao Zhu,	In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration," "Amusement," "Determination," "Empathic Pain," "Excitement," and "Joy").	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Demystifying the DAO Governance Process ( http://arxiv.org/abs/2403.11758v1 ) License: check the linked page	Junjie Ma, Muhui Jiang, Jinan Jiang, Xiapu Luo, Yufeng Hu, Yajin Zhou, Qi Wang, Fengwei Zhang,	The Decentralized Autonomous Organization (DAO) has become a popular governance solution for decentralized applications (dApps) to achieve decentralized governance. In a DAO, no single entity can arbitrarily control the dApps without approval from the majority of members. However, despite its advantages, the DAO has also been targeted by several attacks, leading to the loss of millions of dollars. In this paper, we first provide an overview of the DAO governance process within the blockchain. Next, we identify the issues within three components of the governance process: Governance Contract, Documentation, and Proposal. Each of these components is vulnerable to issues that could potentially result in substantial financial losses. We then develop automated methods to detect the above issues. To investigate the issues within the existing DAO ecosystem, we construct a state-of-the-art dataset that includes 16,427 DAOs, 183 pieces of documentation, and 122,307 proposals across 9 different blockchains. Our analysis reveals that a majority of DAO developers and members have not given sufficient attention to these issues, especially in the area of proposals. The result shows that over 60% of the examined proposals fail to provide a consistent description and code for their members, highlighting a significant gap in ensuring transparency within the DAO governance process. For a better DAO governance ecosystem, DAO developers and members can utilize the methods to identify and address issues within the governance process.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Why E.T. Can't Phone Home: A Global View on IP-based Geoblocking at VoWiFi ( http://arxiv.org/abs/2403.11759v1 ) License: check the linked page	Gabriel Karl Gegenhuber, Philipp Frenzel, Edgar Weippl,	In current cellular network generations (4G, 5G) the IMS (IP Multimedia Subsystem) plays an integral role in terminating voice calls and short messages. Many operators use VoWiFi (Voice over Wi-Fi, also Wi-Fi calling) as an alternative network access technology to complement their cellular coverage in areas where no radio signal is available (e.g., rural territories or shielded buildings). In a mobile world where customers regularly traverse national borders, this can be used to avoid expensive international roaming fees while journeying overseas, since VoWiFi calls are usually invoiced at domestic rates. To not lose this revenue stream, some operators block access to the IMS for customers staying abroad. This work evaluates the current deployment status of VoWiFi among worldwide operators and analyzes existing geoblocking measures on the IP layer. We show that a substantial share (IPv4: 14.6%, IPv6: 65.2%) of operators implement geoblocking at the DNS- or VoWiFi protocol level, and highlight severe drawbacks in terms of emergency calling service availability.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
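To make the DNS-level geoblocking mentioned in this abstract more concrete, here is a minimal Python sketch. It assumes the usual 3GPP naming convention for the VoWiFi gateway (the ePDG), in which the hostname is derived from the operator's mobile country code (MCC) and mobile network code (MNC). The abstract does not describe the authors' measurement method, so the helper names `epdg_fqdn` and `resolves` and the example codes are illustrative assumptions only.

```python
import socket


def epdg_fqdn(mcc: str, mnc: str) -> str:
    """Build the operator's ePDG hostname from its MCC and MNC (assumed 3GPP convention)."""
    if not (mcc.isdigit() and len(mcc) == 3):
        raise ValueError("MCC must be exactly three digits")
    if not (mnc.isdigit() and len(mnc) in (2, 3)):
        raise ValueError("MNC must be two or three digits")
    # Two-digit MNCs are left-padded with a zero to three digits.
    return f"epdg.epc.mnc{mnc.zfill(3)}.mcc{mcc}.pub.3gppnetwork.org"


def resolves(hostname: str) -> bool:
    """Return True if the local resolver returns at least one address for the name."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False
```

Calling `resolves(epdg_fqdn("232", "01"))` from vantage points in different countries and comparing the answers is one plausible way to spot DNS-based blocking; a failed lookup alone does not prove geoblocking, since the operator may simply not offer VoWiFi.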
# 3R-INN: How to be climate friendly while consuming/delivering videos? ( http://arxiv.org/abs/2403.11760v1 ) License: check the linked page	Zoubida Ameur, Claire-Hélène Demarty, Daniel Menard, Olivier Le Meur,	The consumption of a video requires a considerable amount of energy during the various stages of its life-cycle. With a billion hours of video consumed daily, this contributes significantly to greenhouse gas emissions. Therefore, reducing the end-to-end carbon footprint of the video chain, while preserving the quality of experience at the user side, is of high importance. To contribute in an impactful manner, we propose 3R-INN, a single light invertible network that does three tasks at once: given a high-resolution grainy image, it Rescales it to a lower resolution, Removes film grain and Reduces its power consumption when displayed. Providing such minimum viable quality content contributes to reducing the energy consumption during encoding, transmission, decoding and display. 3R-INN also offers the possibility to restore either the high-resolution grainy original image or a grain-free version, thanks to its invertibility and the disentanglement of the high frequency, and without transmitting auxiliary data. Experiments show that, while enabling significant energy savings for encoding (78%), decoding (77%) and rendering (5% to 20%), 3R-INN outperforms state-of-the-art film grain synthesis and energy-aware methods and achieves state-of-the-art performance on the rescaling task on different test-sets.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation ( http://arxiv.org/abs/2403.11761v1 ) License: check the linked page	Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada,	Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# EmpowerAbility: A portal for employment & scholarships for differently-abled ( http://arxiv.org/abs/2403.11769v1 ) License: check the linked page	Himanshu Raj, Shubham Kumar, Dr. J Kalaivani,	The internet has become a vital resource for job seekers in today's technologically advanced world, particularly for those with impairments. They mainly rely on internet resources to find jobs that fit their particular requirements and skill set. The efficacy of this process frequently varies: although some disabled candidates receive prompt responses and job offers, others find it difficult to traverse the intricate world of job portals. This discrepancy results from a typical error: a failure to completely comprehend and utilize the accessibility features and functions that can significantly expedite and simplify the job search process for people with impairments. This project is a job and scholarship portal that empowers individuals with diverse abilities. Through inspiring success stories, user-centric features, and practical opportunities, it fosters resilience and inclusivity while reshaping narratives. This platform's dual-pronged strategy instills pride and offers real-world solutions, making a lasting impact on the lives it touches.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Modality-Agnostic fMRI Decoding of Vision and Language ( http://arxiv.org/abs/2403.11771v1 ) License: check the linked page	Mitja Nikolaus, Milad Mozafari, Nicholas Asher, Leila Reddy, Rufin VanRullen,	Previous studies have shown that it is possible to map brain activation data of subjects viewing images onto the feature representation space of not only vision models (modality-specific decoding) but also language models (cross-modal decoding). In this work, we introduce and use a new large-scale fMRI dataset (~8,500 trials per subject) of people watching both images and text descriptions of such images. This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing, irrespective of the modality (image or text) in which the stimulus is presented. We train and evaluate such decoders to map brain signals onto stimulus representations from a large range of publicly available vision, language and multimodal (vision+language) models. Our findings reveal that (1) modality-agnostic decoders perform as well as (and sometimes even better than) modality-specific decoders; (2) modality-agnostic decoders mapping brain data onto representations from unimodal models perform as well as decoders relying on multimodal representations; and (3) while language and low-level visual (occipital) brain regions are best at decoding text and image stimuli, respectively, high-level visual (temporal) regions perform well on both stimulus types.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention ( http://arxiv.org/abs/2403.11772v1 ) License: check the linked page	Pierre Guetschel, Thomas Moreau, Michael Tangermann,	Motivated by the challenge of seamless cross-dataset transfer in EEG signal processing, this article presents an exploratory study on the use of Joint Embedding Predictive Architectures (JEPAs). In recent years, self-supervised learning has emerged as a promising approach for transfer learning in various domains. However, its application to EEG signals remains largely unexplored. In this article, we introduce Signal-JEPA for representing EEG recordings, which includes a novel domain-specific spatial block masking strategy and three novel architectures for downstream classification. The study is conducted on a 54-subject dataset, and the downstream performance of the models is evaluated on three different BCI paradigms: motor imagery, ERP and SSVEP. Our study provides preliminary evidence for the potential of JEPAs in EEG signal encoding. Notably, our results highlight the importance of spatial filtering for accurate downstream classification and reveal that the length of the pre-training examples, but not the mask size, influences downstream performance.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# DVN-SLAM: Dynamic Visual Neural SLAM Based on Local-Global Encoding ( http://arxiv.org/abs/2403.11776v1 ) License: check the linked page	Wenhua Wu, Guangming Wang, Ting Deng, Sebastian Aegidius, Stuart Shanks, Valerio Modugno, Dimitrios Kanoulas, Hesheng Wang,	Recent research on Simultaneous Localization and Mapping (SLAM) based on implicit representation has shown promising results in indoor environments. However, there are still some challenges: the limited scene representation capability of implicit encodings, the uncertainty in the rendering process from implicit representations, and the disruption of consistency by dynamic objects. To address these challenges, we propose a real-time dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM. To improve the scene representation capability, we introduce a local-global fusion neural implicit representation that enables the construction of an implicit map while considering both global structure and local details. To tackle uncertainties arising from the rendering process, we design an information concentration loss for optimization, aiming to concentrate scene information on object surfaces. The proposed DVN-SLAM achieves competitive performance in localization and mapping across multiple datasets. More importantly, DVN-SLAM demonstrates robustness in dynamic scenes, a trait that sets it apart from other NeRF-based methods.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms ( http://arxiv.org/abs/2403.11778v1 ) License: check the linked page	Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan,	Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. Executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performances compared to ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt ( http://arxiv.org/abs/2403.11780v1 ) License: check the linked page	Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao,	Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables control over singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io .	Translated: 2024-03-20 20:29:45 Published: 2024-03-18
# Infinite-ID: ID-semantics Decoupling Paradigmによるアイデンティティ保存型パーソナライゼーション Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm ( http://arxiv.org/abs/2403.11781v1 ) ライセンス: Link先を確認	Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, Bin Li,	(参考訳) テキスト・ツー・イメージ生成のための拡散モデルの最近の進歩を反映して、アイデンティティ保存されたパーソナライゼーションは、単一の参照画像で特定のアイデンティティを正確に把握する上で大きな進歩を遂げた。しかし、既存の手法は、主にテキスト埋め込み空間に参照画像を統合するため、画像とテキスト情報の複雑な絡み合いが生じ、アイデンティティの忠実さとセマンティック一貫性の両立が困難になる。この課題に対処するために、アイデンティティ保存パーソナライゼーションのためのID-セマンティック・デカップリングパラダイムであるInfinite-IDを提案する。具体的には、拡散モデルの元のテキスト・クロス・アテンション・モジュールを非活性化しながら、十分なID情報を取得するために、追加のイメージ・クロス・アテンション・モジュールを組み込んだアイデンティティ・エンハンス・トレーニングを導入する。これにより、画像ストリームは、テキスト入力からの干渉を緩和しつつ、参照画像によって提供されるアイデンティティを忠実に表現することを保証する。さらに,2つのストリームをシームレスにマージするために,混合アテンションモジュールとAdaIN平均演算を組み合わせた機能相互作用機構を導入する。このメカニズムは、アイデンティティとセマンティック一貫性の完全性を高めるだけでなく、生成された画像のスタイルを便利に制御できる。原画像生成とスタイル画像生成の双方に対する大規模な実験結果から,提案手法の優れた性能が示された。 Drawing on recent advancements in diffusion models for text-to-image generation, identity-preserved personalization has made significant progress in accurately capturing specific identities with just a single reference image. However, existing methods primarily integrate reference images within the text embedding space, leading to a complex entanglement of image and text information, which poses challenges for preserving both identity fidelity and semantic consistency. To tackle this challenge, we propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization. Specifically, we introduce identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information while deactivating the original text cross-attention module of the diffusion model. This ensures that the image stream faithfully represents the identity provided by the reference image while mitigating interference from textual input. 
Additionally, we introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams. This mechanism not only enhances the fidelity of identity and semantic consistency but also enables convenient control over the styles of the generated images. Extensive experimental results on both raw photo generation and style image generation demonstrate the superior performance of our proposed method.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
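The Infinite-ID abstract names an "AdaIN-mean operation" without defining it. As a rough illustration only, the sketch below shows standard adaptive instance normalization (AdaIN) on a flat feature list, followed by a mean-only variant. Reading "AdaIN-mean" as the mean-only variant is an assumption, and the function names `adain` and `adain_mean` are hypothetical, not taken from the paper.

```python
from statistics import fmean, pstdev

def adain(content, style, eps=1e-5):
    """Standard AdaIN: normalize the content features, then apply the style mean and std."""
    c_mu, c_sigma = fmean(content), pstdev(content)
    s_mu, s_sigma = fmean(style), pstdev(style)
    return [s_sigma * (x - c_mu) / (c_sigma + eps) + s_mu for x in content]

def adain_mean(content, style):
    """Assumed mean-only variant: shift the content mean onto the style mean."""
    shift = fmean(style) - fmean(content)
    return [x + shift for x in content]

content = [1.0, 2.0, 3.0]      # toy features from one stream
style = [10.0, 14.0, 18.0]     # toy features from the other stream
aligned = adain(content, style)
shifted = adain_mean(content, style)
```

The mean-only variant leaves the spread of the content features untouched and only moves their center, which is one plausible way to merge two streams without rescaling either.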
# ガウス過程による選好と選択から学ぶチュートリアル A tutorial on learning from preferences and choices with Gaussian Processes ( http://arxiv.org/abs/2403.11782v1 ) ライセンス: Link先を確認	Alessio Benavoli, Dario Azzimonti,	(参考訳) 推奨モデリングは、経済学、決定理論、機械学習、統計学の交差点にある。個人の好みを理解し、どのように選択するかを理解することで、期待にぴったり合う製品を構築することができ、幅広い領域にわたってより効率的でパーソナライズされたアプリケーションを実現することができます。本チュートリアルの目的は,ガウス的プロセス(GP)による嗜好学習のための包括的で包括的な枠組みを提示し,理性原理(経済学や意思決定理論など)を学習プロセスにシームレスに組み込む方法を示すことである。このフレームワークは、確率関数を適切に調整することにより、ランダムなユーティリティモデル、識別の限界、およびオブジェクトとラベルの両方に矛盾する複数のユーティリティを持つシナリオを含む嗜好学習モデルの構築を可能にする。このチュートリアルは、既存の文献の特定のギャップに対処する新しいGPベースのモデルを同時に導入しながら、確立された研究の上に構築されている。 Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations, paving the way for more efficient and personalised applications across a wide range of domains. The objective of this tutorial is to present a cohesive and comprehensive framework for preference learning with Gaussian Processes (GPs), demonstrating how to seamlessly incorporate rationality principles (from economics and decision theory) into the learning process. By suitably tailoring the likelihood function, this framework enables the construction of preference learning models that encompass random utility models, limits of discernment, and scenarios with multiple conflicting utilities for both object- and label-preference. This tutorial builds upon established research while simultaneously introducing some novel GP-based models to address specific gaps in the existing literature.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# mqdtfit: 経験的マルチチャネル量子欠陥計算のためのPython関数のコレクション mqdtfit: A collection of Python functions for empirical multichannel quantum defect calculations ( http://arxiv.org/abs/2403.11783v1 ) ライセンス: Link先を確認	R. M. Potvliege,	(参考訳) この論文で配布されるPython関数は、複雑な原子の励起束縛状態を記述する多チャネル量子欠陥理論モデルのパラメータを計算するのに使うことができる。これらのパラメータは、ユーザが提供した実験データにモデルをフィッティングすることで得られる。理論の2つの主要な定式化、すなわちモデルのパラメータが固有チャネル量子欠陥と変換行列の集合であるもの、およびこれらのパラメータがリアクタンス行列の要素であるものをサポートする。この配布物は、理論エネルギーレベルを計算し、混合係数とチャネル分率を計算し、Lu-Fanoプロットを生成するプログラムを含む。 The Python functions distributed with this article can be used for calculating the parameters of multichannel quantum defect theory models describing excited bound states of complex atoms. These parameters are obtained by fitting a model to experimental data provided by the user. The two main formulations of the theory are supported, namely the one in which the parameters of the model are a set of eigenchannel quantum defects and a transformation matrix, and the one where these parameters are the elements of a reactance matrix. The distribution includes programs for calculating theoretical energy levels, calculating mixing coefficients and channel fractions, and producing Lu-Fano plots.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# ForzaETH Race Stack - 完全商用オフザシェルハードウェア上での大規模自動ヘッド・ツー・ヘッドレース ForzaETH Race Stack - Scaled Autonomous Head-to-Head Racing on Fully Commercial off-the-Shelf Hardware ( http://arxiv.org/abs/2403.11784v1 ) ライセンス: Link先を確認	Nicolas Baumann, Edoardo Ghignone, Jonas Kühne, Niklas Bastuck, Jonathan Becker, Nadine Imholz, Tobias Kränzlin, Tian Yi Lim, Michael Lötscher, Luca Schwarzenbach, Luca Tognoni, Christian Vogt, Andrea Carron, Michele Magno,	(参考訳) ロボット工学における自律的なレースは、信頼性とリアルタイムな意思決定の必要性と、高速なダイナミクスを組み合わせる。このようなレースはソフトウェアとハードウェアを限界まで押し上げるが、既存のフルシステムソリューションの多くは複雑でカスタムなハードウェアとソフトウェアを必要とする。これにより再現性が制限され、機械、電気、ロボティクスの分野における総合的な専門知識を持つ、よく調達された研究所で、進歩と複製が実現可能である。自律性領域に関心がある研究者は、これらの分野の1つで部分的な経験しか持たないため、親しみと統合にかなりの時間を費やす必要がある。 ForzaETH Race Stackは、F1TENTHのために設計された自動運転レーシングソフトウェアプラットフォームを提供することで、このギャップに対処する。このアプローチは、自律レースの競争的側面を強化し、この分野における研究開発のためのアクセス可能なプラットフォームを提供する。 ForzaETH Race Stackはモジュラリティと運用上の使いやすさを念頭に設計されており、トラックの摩擦やレイアウトといった様々な環境条件へのカスタマイズと適応性を実現している。タイムトリアルレースとヘッド・ツー・ヘッドレースの両方を扱えるスタックは、公式のF1TENTH国際大会で複数回優勝し、その有効性、堅牢性、適応性を示した。 Autonomous racing in robotics combines high-speed dynamics with the necessity for reliability and real-time decision-making. While such racing pushes software and hardware to their limits, many existing full-system solutions necessitate complex, custom hardware and software, and usually focus on Time-Trials rather than full unrestricted Head-to-Head racing, due to financial and safety constraints. This limits their reproducibility, making advancements and replication feasible mostly for well-resourced laboratories with comprehensive expertise in mechanical, electrical, and robotics fields. Researchers interested in the autonomy domain but with only partial experience in one of these fields, need to spend significant time with familiarization and integration. 
The ForzaETH Race Stack addresses this gap by providing an autonomous racing software platform designed for F1TENTH, a 1:10 scaled Head-to-Head autonomous racing competition, which simplifies replication by using commercial off-the-shelf hardware. This approach enhances the competitive aspect of autonomous racing and provides an accessible platform for research and development in the field. The ForzaETH Race Stack is designed with modularity and operational ease of use in mind, allowing customization and adaptability to various environmental conditions, such as track friction and layout. Capable of handling both Time-Trials and Head-to-Head racing, the stack has demonstrated its effectiveness, robustness, and adaptability in the field by winning the official F1TENTH international competition multiple times.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# 事前学習大言語モデルを用いたハイパーリレーショナル知識グラフの構築 Construction of Hyper-Relational Knowledge Graphs Using Pre-Trained Large Language Models ( http://arxiv.org/abs/2403.11786v1 ) ライセンス: Link先を確認	Preetha Datta, Fedor Vitiugin, Anastasiia Chizhikova, Nitin Sawhney,	(参考訳) 包括的知識グラフの構築にはハイパーリレーションの抽出が不可欠だが,このタスクには限定的な教師付き手法が存在する。このギャップに対処するために,OpenAIのGPT-3.5モデルを用いたゼロショットプロンプトベースの手法を導入し,テキストからハイパーリレーショナルな知識を抽出する。モデルとベースラインを比較して,0.77のリコールで有望な結果を得た。現在、精度は低いが、モデル出力の詳細な分析により、この分野における今後の研究の道筋が明らかになっている。 Extracting hyper-relations is crucial for constructing comprehensive knowledge graphs, but there are limited supervised methods available for this task. To address this gap, we introduce a zero-shot prompt-based method using OpenAI's GPT-3.5 model for extracting hyper-relational knowledge from text. Comparing our model with a baseline, we achieved promising results, with a recall of 0.77. Although our precision is currently lower, a detailed analysis of the model outputs has uncovered potential pathways for future research in this area.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# EMIE-MAP:明示的メッシュと暗黙的符号化に基づく大規模路面再構成 EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding ( http://arxiv.org/abs/2403.11789v1 ) ライセンス: Link先を確認	Wenhua Wu, Qi Wang, Guangming Wang, Junping Wang, Tiankun Zhao, Yang Liu, Dongchao Gao, Zhe Liu, Hesheng Wang,	(参考訳) 道路路面の再構築は自動運転システムにおいて重要な役割を担い、道路路面の認識と高精度マッピングを可能にする。近年,特にシーンテクスチャのリアルなレンダリングにおいて,ニューラル暗黙符号化はシーン表現において顕著な成果を上げている。しかし、大規模なシーンの幾何学的情報を直接表現する上での課題に直面している。そこで我々は,明示的メッシュと暗黙的符号化に基づく大規模道路表面再構築手法であるEMIE-MAPを提案する。道路形状は明示的なメッシュで表現され、各頂点は色と意味情報を表す暗黙のエンコーディングを格納する。道路の高架化を最適化することの難しさを克服するために,多層パーセプトロン(MLP)に基づく軌道に基づく高架化初期化と高架化残差学習手法を導入する。さらに,暗黙のエンコーディングとマルチカメラカラーMPPデコーディングを用いることで,シーンの物理的特性とカメラ特性を別々にモデル化し,サラウンドビューを異なるカメラモデルに適合させる。本手法は,様々な現実の難易度シナリオにおいて,顕著な路面復元性能を実現する。 Road surface reconstruction plays a vital role in autonomous driving systems, enabling road lane perception and high-precision mapping. Recently, neural implicit encoding has achieved remarkable results in scene representation, particularly in the realistic rendering of scene textures. However, it faces challenges in directly representing geometric information for large-scale scenes. To address this, we propose EMIE-MAP, a novel method for large-scale road surface reconstruction based on explicit mesh and implicit encoding. The road geometry is represented using explicit mesh, where each vertex stores implicit encoding representing the color and semantic information. To overcome the difficulty in optimizing road elevation, we introduce a trajectory-based elevation initialization and an elevation residual learning method based on Multi-Layer Perceptron (MLP). Additionally, by employing implicit encoding and multi-camera color MLPs decoding, we achieve separate modeling of scene physical properties and camera characteristics, allowing surround-view reconstruction compatible with different camera models. 
Our method achieves remarkable road surface reconstruction performance in a variety of real-world challenging scenarios.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# Deep Medial Voxels: 解剖学的形状モデリングのためのメディア軸近似の学習 Deep Medial Voxels: Learned Medial Axis Approximations for Anatomical Shape Modeling ( http://arxiv.org/abs/2403.11790v1 ) ライセンス: Link先を確認	Antonio Pepe, Richard Schussnig, Jianning Li, Christina Gsaxner, Dieter Schmalstieg, Jan Egger,	(参考訳) 画像ボリュームからの形状再構成は、医用画像解析において繰り返し必要となる。一般的なワークフローはセグメンテーションステップから始まり、慎重に後処理とアドホックなメッシュアルゴリズムが続く。このシーケンスは時間を要する可能性があるため、ニューラルネットワークはテンプレートの変形によって形状を再構築するように訓練される。これらのネットワークは手動による介入なしに最先端の結果をもたらすが、これまでのところ、個体間のトポロジ的多様性がほとんどない解剖学的形状で評価されてきた。対照的に、他の研究は、メッシュ化と視覚化に複数の利点がある暗黙の形状モデルを学ぶことを好んでいる。我々の研究は、画像の体積からトポロジカルな骨格を忠実に近似した半単純表現であるディープ・メディカル・ボクセルを導入し、最終的に畳み込み面による形状復元へと導いた。再現技術は,可視化と計算機シミュレーションの両方の可能性を示している。 Shape reconstruction from imaging volumes is a recurring need in medical image analysis. Common workflows start with a segmentation step, followed by careful post-processing and, finally, ad hoc meshing algorithms. As this sequence can be time-consuming, neural networks are trained to reconstruct shapes through template deformation. These networks deliver state-of-the-art results without manual intervention, but, so far, they have primarily been evaluated on anatomical shapes with little topological variety between individuals. In contrast, other works favor learning implicit shape models, which have multiple benefits for meshing and visualization. Our work follows this direction by introducing deep medial voxels, a semi-implicit representation that faithfully approximates the topological skeleton from imaging volumes and eventually leads to shape reconstruction via convolution surfaces. Our reconstruction technique shows potential for both visualization and computer simulations.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# PAON:パデ近似を用いた新しいニューロンモデル PAON: A New Neuron Model using Padé Approximants ( http://arxiv.org/abs/2403.11791v1 ) ライセンス: Link先を確認	Onur Keleş, A. Murat Tekalp,	(参考訳) 畳み込みニューラルネットワーク(CNN)は古典的なマカロック・ピッツニューロンモデルに基づいて構築されている。いくつかの研究者は、二次ニューロン、一般化された操作ニューロン、生成ニューロン、スーパーニューロンを含む強化されたニューロンモデルを提案しており、ポイントワイド活性化関数によって提供されるものよりも強い非線形性を持っている。また、Pade近似を一般化活性化関数として使う提案もある。本稿では,異なる順序の多項式の比として超越関数の最適数学的近似であるPade近似にインスパイアされた,Padeニューロン(Paons)と呼ばれる新しいニューロンモデルを紹介する。 Paonsは、他のすべての提案されたニューロンモデルのスーパーセットであることを示す。したがって、既知のCNNモデルの基本ニューロンは、Paonsに置き換えられる。本稿では、よく知られたResNetをPaonsによって構築されたPadeNetに拡張し、そのコンセプトを実証する。単一画像超解像タスクにおける実験により,PadeNetsは競合するアーキテクチャよりも優れた結果が得られることが示された。 Convolutional neural networks (CNN) are built upon the classical McCulloch-Pitts neuron model, which is essentially a linear model, where the nonlinearity is provided by a separate activation function. Several researchers have proposed enhanced neuron models, including quadratic neurons, generalized operational neurons, generative neurons, and super neurons, with stronger nonlinearity than that provided by the pointwise activation function. There has also been a proposal to use Pade approximation as a generalized activation function. In this paper, we introduce a brand new neuron model called Pade neurons (Paons), inspired by the Pade approximants, which is the best mathematical approximation of a transcendental function as a ratio of polynomials with different orders. We show that Paons are a super set of all other proposed neuron models. Hence, the basic neuron in any known CNN model can be replaced by Paons. In this paper, we extend the well-known ResNet to PadeNet (built by Paons) to demonstrate the concept. Our experiments on the single-image super-resolution task show that PadeNets can obtain better results than competing architectures.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
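For readers unfamiliar with the idea behind the PAON entry, a Padé approximant represents a function as a ratio of two polynomials. The sketch below evaluates the well-known [2/2] Padé approximant of exp(x) around zero. It illustrates the approximant only, not the neuron model of the paper, and the helper names `polyval` and `pade` are hypothetical.

```python
import math

def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients in ascending order (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def pade(num_coeffs, den_coeffs, x):
    """Evaluate the rational function P(x) / Q(x)."""
    return polyval(num_coeffs, x) / polyval(den_coeffs, x)

# [2/2] Padé approximant of exp(x): (1 + x/2 + x^2/12) / (1 - x/2 + x^2/12)
EXP_NUM = [1.0, 0.5, 1.0 / 12.0]
EXP_DEN = [1.0, -0.5, 1.0 / 12.0]

approx = pade(EXP_NUM, EXP_DEN, 0.5)
exact = math.exp(0.5)
```

A learned version would treat the numerator and denominator coefficients as trainable parameters, which is presumably where the extra nonlinearity comes from; the abstract does not give those details.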
# SETA:ドメイン・ジェネリゼーションのためのセマンティック・アウェア・トークン強化 SETA: Semantic-Aware Token Augmentation for Domain Generalization ( http://arxiv.org/abs/2403.11792v1 ) ライセンス: Link先を確認	Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao,	(参考訳) ドメイン一般化(DG)は、ターゲットドメインにアクセスすることなく、ドメインシフトに対するモデル堅牢性を高めることを目的としている。 DGのメソッドの一般的なカテゴリはデータ拡張であり、ドメインシフトをシミュレートする仮想サンプルの生成に焦点を当てている。しかし、DGの既存の拡張技術は、主に畳み込みニューラルネットワーク(CNN)向けに調整されており、トークンベースのアーキテクチャ、すなわちビジョントランスフォーマー(ViT)と多層パーセプトロン(MLP)モデルでの探索が限られている。本稿では,従来のCNNによる拡張手法がトークンベースモデルに与える影響について検討し,その性能が最適であることを明らかにする。この問題に対処するため,Semantic-Aware Token Augmentation (SETA)法を提案する。 SETAは、グローバルな形状の特徴を保持しつつ、局所的なエッジキューを摂動させることでトークンの特徴を変換し、形状情報のモデル学習を強化する。モデルの一般化能力をさらに高めるため,DGにおける2つの最先端スタイル拡張手法と組み合わせて,2種類のスタイルのバリエーションを導入する。本手法について理論的考察を行い,一般化リスク境界の低減効果を示す。 5つのベンチマークの総合的な実験により、本手法は様々なViTおよびMPPアーキテクチャでSOTA性能を実現することが証明された。私たちのコードはhttps://github.com/lingeringlight/SETAで公開されています。 Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptrons (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models, revealing their performance is suboptimal due to the lack of incentivizing the model to learn holistic shape information. To tackle the issue, we propose the SEmantic-aware Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby enhancing the model learning of shape information. 
To further enhance the generalization ability of the model, we introduce two stylized variants of our method combined with two state-of-the-art style augmentation methods in DG. We provide a theoretical insight into our method, demonstrating its effectiveness in reducing the generalization risk bound. Comprehensive experiments on five benchmarks prove that our method achieves SOTA performances across various ViT and MLP architectures. Our code is available at https://github.com/lingeringlight/SETA.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# 大規模言語モデルの推論能力:抽象と推論コーパスの詳細な分析 Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus ( http://arxiv.org/abs/2403.11793v1 ) ライセンス: Link先を確認	Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim,	(参考訳) 大規模言語モデル(LLM)の推論能力を評価する既存の手法は結果中心であり,推論プロセスの評価が困難である。プロセス中心の方法で大規模言語モデルの推論と文脈理解能力を評価するために,ARCデータセットを用いた新しい手法を提案する。 ARCは問題解決のために厳密な論理構造を必要としており、モデル推論能力と人間の比較を容易にするベンチマークである。実験の結果、大きな言語モデルは推論能力が弱いが、論理的一貫性、構成性、生産性の点でまだ遅れていることが明らかとなった。実験では,LLMの推論能力を強調し,人間レベルの推論を実現するための開発経路を提案する。 The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach using the Abstract and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs, proposing development paths for achieving human-level reasoning.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# 低コストプライバシ対応分散型学習 Low-Cost Privacy-Aware Decentralized Learning ( http://arxiv.org/abs/2403.11795v1 ) ライセンス: Link先を確認	Sayan Biswas, Davide Frey, Romaric Gaudel, Anne-Marie Kermarrec, Dimitri Lerévérend, Rafael Pires, Rishi Sharma, François Taïani,	(参考訳) 本稿では、モデルトレーニングプロセス中に各モデル更新に相関ノイズを追加することに依存する、新しいプライバシ対応分散学習(DL)アルゴリズムであるZIP-DLを紹介する。この手法により、付加されたノイズはその相関関係により集約過程中にほぼ打ち消し合い、モデル精度への影響を最小限に抑えることができる。さらに、ZIP-DLはノイズキャンセリングのために複数の通信ラウンドを必要としないため、プライバシ保護と通信オーバーヘッドの共通トレードオフに対処する。本稿では,収束速度とプライバシの両方を理論的に保証し,ZIP-DLを実用シナリオに適用可能にする。本研究は,ZIP-DLが脆弱性と精度の最良のトレードオフを達成していることを示す。特にZIP-DLは、(i)ベースラインDLと比較してリンク性攻撃の有効性を最大52ポイント低下させ、(ii)メンバーシップ推論攻撃において、プライバシー保護型の競合手法と比べて同一の脆弱性に対して最大37ポイント高い精度を達成する。 This paper introduces ZIP-DL, a novel privacy-aware decentralized learning (DL) algorithm that relies on adding correlated noise to each model update during the model training process. This technique ensures that the added noise almost neutralizes itself during the aggregation process due to its correlation, thus minimizing the impact on model accuracy. In addition, ZIP-DL does not require multiple communication rounds for noise cancellation, addressing the common trade-off between privacy protection and communication overhead. We provide theoretical guarantees for both convergence speed and privacy, thereby making ZIP-DL applicable to practical scenarios. Our extensive experimental study shows that ZIP-DL achieves the best trade-off between vulnerability and accuracy. In particular, ZIP-DL (i) reduces the effectiveness of a linkability attack by up to 52 points compared to baseline DL, and (ii) achieves up to 37 more accuracy points for the same vulnerability under membership inference attacks against a privacy-preserving competitor.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
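The noise-cancellation idea in the ZIP-DL entry can be illustrated with a toy example. The abstract does not say how the noise is correlated, so the sketch below assumes the simplest case: noise terms that are re-centered to sum to zero, which cancel exactly when the noisy updates are averaged. The function name `zero_sum_noise` and all values are hypothetical.

```python
import random

def zero_sum_noise(n, scale, rng):
    """Draw n Gaussian noise terms, then re-center them so they sum to zero."""
    raw = [rng.gauss(0.0, scale) for _ in range(n)]
    mean = sum(raw) / n
    return [r - mean for r in raw]

rng = random.Random(0)
updates = [0.2, -0.1, 0.4, 0.3]   # one scalar model update per outgoing message
noise = zero_sum_noise(len(updates), scale=1.0, rng=rng)
noisy = [u + z for u, z in zip(updates, noise)]

true_avg = sum(updates) / len(updates)
noisy_avg = sum(noisy) / len(noisy)   # matches true_avg up to floating-point error
```

Each individual message is perturbed, yet the aggregate is unchanged. The abstract says the noise "almost" neutralizes itself, so exact cancellation here is a simplification.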
# OpenOcc: Occupancy Representationによるオープン語彙3Dシーン再構築 OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation ( http://arxiv.org/abs/2403.11796v1 ) ライセンス: Link先を確認	Haochen Jiang, Yueming Xu, Yihan Zeng, Hang Xu, Wei Zhang, Jianfeng Feng, Li Zhang,	(参考訳) 3D再構成は、移動ロボットの自律ナビゲーション分野で広く利用されている。しかし、以前の研究では、人間のインタラクションや視覚ナビゲーションといった高度なタスクを制限する、オープンワールドのシーン理解能力のない基本的な幾何学構造しか提供できない。さらに、従来の3Dシーン理解アプローチでは、高価なラベル付き3Dデータセットを使用して、単一のタスクのためにモデルをトレーニングしている。このように、ゼロショットシーン理解による幾何学的再構築、すなわちオープンな3次元理解と再構築は、将来の移動ロボットの発展に不可欠である。本稿では,3次元シーン再構成とオープン語彙理解をニューラルラディアンス場と統合する新しいフレームワークであるOpenOccを提案する。シーンの幾何学的構造を占有表現でモデル化し,ゼロショット推論のためのボリュームレンダリングを用いて,事前学習した開語彙モデルを3次元言語フィールドに蒸留する。さらに, 蒸留特性における不整合測定による言語表現の退化を解消するために, セマンティック・アウェア・アウェア・インシュレイト・プロポーザル (SCP) 法が提案されている。実験結果から,本手法は3次元シーン理解タスクにおいて,特に小型・長距離オブジェクトにおいて,競争性能が向上することが示された。 3D reconstruction has been widely used in autonomous navigation fields of mobile robotics. However, the former research can only provide the basic geometry structure without the capability of open-world scene understanding, limiting advanced tasks like human interaction and visual navigation. Moreover, traditional 3D scene understanding approaches rely on expensive labeled 3D datasets to train a model for a single task with supervision. Thus, geometric reconstruction with zero-shot scene understanding i.e. Open vocabulary 3D Understanding and Reconstruction, is crucial for the future development of mobile robots. In this paper, we propose OpenOcc, a novel framework unifying the 3D scene reconstruction and open vocabulary understanding with neural radiance fields. We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference. 
Furthermore, a novel semantic-aware confidence propagation (SCP) method has been proposed to relieve the issue of language field representation degeneracy caused by inconsistent measurements in distilled features. Experimental results show that our approach achieves competitive performance in 3D scene understanding tasks, especially for small and long-tail objects.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# パスワードを忘れた人は誰か? アカウントのリカバリがリスクベースの認証と出会う Is It Really You Who Forgot the Password? When Account Recovery Meets Risk-Based Authentication ( http://arxiv.org/abs/2403.11798v1 ) ライセンス: Link先を確認	Andre Büttner, Andreas Thue Pedersen, Stephan Wiefling, Nils Gruschka, Luigi Lo Iacono,	(参考訳) リスクベースの認証(RBA)は、ユーザアカウントを不正な乗っ取りから保護するためにオンラインサービスで使用される。 RBAは一般的に、ログインコンテキストの特徴的属性が既知の値から逸脱した場合に、不審なログインの試みを示すコンテキスト的特徴を使用する。 RBAと認証における異常検出に関するこれまでの研究は、主にログインプロセスに焦点を当ててきた。しかし、最近の攻撃は認証プロセスの他の部分、特にアカウント回復機能における脆弱性を明らかにしている。したがって、総合的な認証セキュリティを確保するためには、アカウント回復の文脈における異常検出の使用も検討する必要がある。本研究は,野生におけるリスクベース会計回復(RBAR)を調査するための最初の研究である。 RBARを5つの著名なオンラインサービス(RBA)で採用した事例を分析した。調査の結果、Google、LinkedIn、AmazonでのRBARの使用が確認されました。さらに、これらのサービスの様々なRBARメカニズムに関する洞察を提供し、それらに対する多要素認証の影響を探る。この結果をもとに,RBARの課題に対する最初の成熟度モデルを構築した。当社の目標は、開発者、管理者、政策立案者がRBARを最初に理解することを支援し、この方向へのさらなる研究を促進することです。 Risk-based authentication (RBA) is used in online services to protect user accounts from unauthorized takeover. RBA commonly uses contextual features that indicate a suspicious login attempt when the characteristic attributes of the login context deviate from known and thus expected values. Previous research on RBA and anomaly detection in authentication has mainly focused on the login process. However, recent attacks have revealed vulnerabilities in other parts of the authentication process, specifically in the account recovery function. Consequently, to ensure comprehensive authentication security, the use of anomaly detection in the context of account recovery must also be investigated. This paper presents the first study to investigate risk-based account recovery (RBAR) in the wild. We analyzed the adoption of RBAR by five prominent online services (that are known to use RBA). Our findings confirm the use of RBAR at Google, LinkedIn, and Amazon. Furthermore, we provide insights into the different RBAR mechanisms of these services and explore the impact of multi-factor authentication on them. 
Based on our findings, we create a first maturity model for RBAR challenges. The goal of our work is to help developers, administrators, and policy-makers gain an initial understanding of RBAR and to encourage further research in this direction.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# Counting-Stars: 長期の大規模言語モデルを評価するためのシンプルで効率的で合理的な戦略 Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models ( http://arxiv.org/abs/2403.11802v1 ) ライセンス: Link先を確認	Mingyang Song, Mao Zheng, Xuan Luo,	(参考訳) 最近の研究は、堅牢なLarge Language Models(LLMs)の開発に集中しているが、適切な評価戦略が欠けているため、LLM(例えばChatGPTやKimiChat)の長文処理能力とパフォーマンスについてはあまり知られていない。このギャップに対処するために、長文LLMを新しいベンチマークであるCounting-Starsとして評価するための、シンプルで効率的で合理的な戦略を提案する。 Counting-Starsは、LLMが長いコンテキストにおける長い依存関係を完全に理解し、キャプチャし、タスクを完了するためにコンテキスト全体にまたがる複数のエビデンスにまたがる依存性を収集できるように設計されている。計数星に基づいて, GPT-4 Turbo と Kimi Chat の2つの長文 LLM の評価実験を行った。実験の結果, GPT-4 Turbo と Kimi Chat は, 4K から 128K までの長い文脈で高い性能を示した。さらに,LLM処理長コンテキストの動作に関する2つの興味深い分析を行った。 While recent research endeavors have concentrated on developing Large Language Models (LLMs) with robust long-context capabilities, due to the lack of appropriate evaluation strategies, relatively little is known about the long-context processing abilities and performance of leading LLMs (e.g., ChatGPT and KimiChat). To address this gap, we propose a simple, efficient, and reasonable strategy for evaluating long-context LLMs as a new benchmark, named Counting-Stars. The Counting-Stars is designed to require LLMs to fully understand and capture long dependencies in long contexts and be able to collect inter-dependency across multiple pieces of evidence spanning the entire context to finish the task. Based on the Counting-Stars, we conduct experiments to evaluate the two leading long-context LLMs, i.e., GPT-4 Turbo and Kimi Chat. The experimental results indicate that GPT-4 Turbo and Kimi Chat achieve significant performance in the long context from 4K to 128K. We further present two intriguing analyses regarding the behavior of LLMs processing long context.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
# パーソナライズされた脳腫瘍切片に対するフェデレーションモード特異的エンコーダとマルチモーダルアンカー Federated Modality-specific Encoders and Multimodal Anchors for Personalized Brain Tumor Segmentation ( http://arxiv.org/abs/2403.11803v1 ) ライセンス: Link先を確認	Qian Dai, Dong Wei, Hong Liu, Jinghan Sun, Liansheng Wang, Yefeng Zheng,	(参考訳) 医用画像解析のための既存のフェデレートラーニング (FL) 法の多くは、モーダル内不均一性のみを考慮し、マルチモーダルイメージングへの応用に限定している。実際には、一部のFL参加者が完全な画像モダリティのサブセットしか持たないことは珍しくなく、すべての参加者のデータに基づいてグローバルモデルを効果的に訓練するための課題として、モーダル間不均一性(inter-modal heterogeneity)を呈している。さらに、各参加者は、このようなシナリオでFLからローカルデータの特徴に合わせたパーソナライズされたモデルを得ることを期待している。本研究では,2つの並列問題に同時に対処するため,FedMEMA(FedMEMA)とFedMEMA(FedMEMA)を組み合わせた新しいFLフレームワークを提案する。とりわけ、FedMEMAは、まずはモーダル間の不均一性を考慮するために、各モーダルに排他的エンコーダを使用している。一方、エンコーダは参加者によって共有されるが、デコーダは個々のニーズに合わせてパーソナライズされる。具体的には、フルモーダルデータを持つサーバは、フュージョンデコーダを使用して、すべてのモダリティ固有のエンコーダから表現を集約およびヒューズし、モダリティをブリッジして、バックプロパゲーションを介してエンコーダを最適化する。一方、融合マルチモーダル表現から複数のアンカーを抽出し、エンコーダパラメータに加えてクライアントに分散する。一方、不完全なモダリティを持つクライアントは、スケールしたドット積のクロスアテンションを通じて、グローバルなフルモーダルアンカーに対する不完全なモダリティ表現をキャリブレーションし、現在のモダリティの表現を適用しながら、不完全なモダリティによる情報損失を補う。 FedMEMAは、マルチモーダル脳腫瘍セグメンテーションのためのBraTS 2020ベンチマークで検証されている。その結果、マルチモーダルかつパーソナライズされたFLの様々な最新手法よりも優れており、その新規設計が有効であることがわかった。私たちのコードは利用可能です。 Most existing federated learning (FL) methods for medical image analysis only considered intramodal heterogeneity, limiting their applicability to multimodal imaging applications. In practice, it is not uncommon that some FL participants only possess a subset of the complete imaging modalities, posing inter-modal heterogeneity as a challenge to effectively training a global model on all participants' data. In addition, each participant would expect to obtain a personalized model tailored for its local data characteristics from the FL in such a scenario. In this work, we propose a new FL framework with federated modality-specific encoders and multimodal anchors (FedMEMA) to simultaneously address the two concurrent issues. 
Above all, FedMEMA employs an exclusive encoder for each modality to account for the inter-modal heterogeneity in the first place. In the meantime, while the encoders are shared by the participants, the decoders are personalized to meet individual needs. Specifically, a server with full-modal data employs a fusion decoder to aggregate and fuse representations from all modality-specific encoders, thus bridging the modalities to optimize the encoders via backpropagation reversely. Meanwhile, multiple anchors are extracted from the fused multimodal representations and distributed to the clients in addition to the encoder parameters. On the other end, the clients with incomplete modalities calibrate their missing-modal representations toward the global full-modal anchors via scaled dot-product cross-attention, making up the information loss due to absent modalities while adapting the representations of present ones. FedMEMA is validated on the BraTS 2020 benchmark for multimodal brain tumor segmentation. Results show that it outperforms various up-to-date methods for multimodal and personalized FL and that its novel designs are effective. Our code is available.	翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
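The calibration step in the FedMEMA entry relies on scaled dot-product cross-attention, i.e. softmax(QK^T / sqrt(d)) V. The sketch below implements that standard operation in plain Python, with client features as queries and anchors as both keys and values. How the paper actually wires queries, keys, and values is not stated in the abstract, so that mapping and every name and number here are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

local_repr = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical client features (queries)
anchors = [[4.0, 0.0], [0.0, 4.0]]      # hypothetical global anchors (keys and values)
calibrated = cross_attention(local_repr, anchors, anchors)
```

Each calibrated row is a convex combination of the anchors, weighted toward the anchor most similar to the local feature.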
# LLMの意思決定はどこまで進んでいるか? マルチエージェント環境におけるLLMのゲーム能力の評価 How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments ( http://arxiv.org/abs/2403.11807v1 ) ライセンス: Link先を確認	Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu,	(参考訳) 様々な種類の能力を必要とする複雑なタスクである意思決定は、LLM(Large Language Models)を評価するための優れたフレームワークを提供する。本研究では, LLMの意思決定能力について, 十分に確立された分野であるゲーム理論のレンズを用いて検討した。 2人以上のエージェントが同時に参加するゲームに特化しています。次に,従来の8種類のマルチエージェントゲームを含むGAMA-Benchを紹介した。これらのゲームにおいて,モデルの性能を定量的に評価するためのスコアリング方式を設計する。 GAMA-Benchを用いて, LLMの堅牢性, 一般化可能性, 拡張戦略について検討する。その結果, GPT-3.5はロバスト性に満足するが, 一般化性は比較的限定的であることがわかった。しかし、その性能はChain-of-Thoughtのようなアプローチによって改善できる。さらに,様々なLLMに対して評価を行い,GAMA-Bench 上で GPT-4 が他のモデルより優れ,スコアが 72.5 であることを確認した。さらに、GPT-3.5(0613, 1106, 0125)の3回にまたがるスコアは、各更新でモデルのインテリジェンスに顕著な進歩を示した。コードと実験結果は https://github.com/CUHK-ARISE/GAMABench で公開されている。 Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce our framework, GAMA-Bench, including eight classical multi-agent games. We design a scoring scheme to assess a model's performance in these games quantitatively. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfying robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on GAMA-Bench, achieving a score of 72.5. 
Moreover, the increasingly higher scores across the three iterations of GPT-3.5 (0613, 1106, 0125) demonstrate marked advancements in the model's intelligence with each update. The code and experimental results are made publicly available via https://github.com/CUHK-ARISE/GAMABench.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# ViT適応のためのパラメータと推論効率を考慮した動的チューニング Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation ( http://arxiv.org/abs/2403.11808v1 ) ライセンス: Link先を確認	Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You,	(参考訳) 既存のパラメータ効率細調整(PEFT)法は、パラメータ効率を向上させることでビジョントランスフォーマー(ViT)適応において大きな成功を収めた。しかし、適応時の推論効率向上の探索はいまだに未定である。これにより、トレーニング済みのViTモデルのより広範な適用が制限される。本稿では,パラメータと推論効率を両立させる新しい手法である動的チューニング(DyT)を提案する。具体的には,軽量なアダプタモジュールの他に,重要度が低いトークンを区別するトークンディスペンサーを提案し,後者が元のブロックを動的にスキップし,推論時の冗長な計算を低減させる。さらに、DyTのベストプラクティスを見つけるために、複数の設計変種を探索する。最後に,Mix-of-experts (MoE) 機構に着想を得て,適応性能をさらに向上する拡張アダプタを提案する。画像/映像認識やセマンティックセグメンテーションなど,様々なタスクでDyTを検証する。例えば、DyT は既存の PEFT 法と同等またはそれ以上のパフォーマンスを達成し、VTAB-1K ベンチマークでは FLOP の 71%-85% しか実行していない。 Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, the exploration of enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using the lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing the redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. 
For instance, DyT achieves comparable or even superior performance compared to existing PEFT methods while evoking only 71%-85% of their FLOPs on the VTAB-1K benchmark.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
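The token-skipping idea in the Dynamic Tuning entry can be sketched as a routing rule: tokens judged informative pass through the expensive block, and the rest bypass it. In the toy version below the importance scores are fixed numbers rather than the output of a learned dispatcher, and applying the adapter to every token is an assumption. The name `dynamic_block`, the threshold, and the stand-in `block` and `adapter` functions are all hypothetical.

```python
def dynamic_block(tokens, scores, block, adapter, threshold=0.5):
    """Route each token: informative ones go through the block, the rest skip it."""
    out, block_calls = [], 0
    for tok, score in zip(tokens, scores):
        if score >= threshold:
            out.append(block(tok) + adapter(tok))   # full path plus adapter
            block_calls += 1
        else:
            out.append(tok + adapter(tok))          # identity path plus adapter
    return out, block_calls

tokens = [1.0, 2.0, 3.0, 4.0]     # toy scalar "tokens"
scores = [0.9, 0.2, 0.7, 0.1]     # toy importance scores
outputs, block_calls = dynamic_block(
    tokens, scores,
    block=lambda x: 2.0 * x,      # stand-in for the original (expensive) block
    adapter=lambda x: 0.5 * x,    # stand-in for the lightweight adapter
)
```

Here only two of the four tokens invoke the block, which is the source of the inference savings the abstract reports in terms of FLOPs.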
# LLMのためのメタファー理解チャレンジデータセット Metaphor Understanding Challenge Dataset for LLMs ( http://arxiv.org/abs/2403.11810v1 ) ライセンス: Link先を確認	Xiaoyu Tong, Rochelle Choenni, Martha Lewis, Ekaterina Shutova,	(参考訳) 自然言語のメタファーは、類推や分類のような基本的な認知過程の反映であり、日常のコミュニケーションに深く根ざしている。したがってメタファー理解は、大きな言語モデル(LLM)にとって不可欠なタスクである。 LLMのメタファー理解能力を評価するために,メタファー理解課題データセット(MUNCH)をリリースする。このデータセットは、メタファーの使用を含む文に対して10k以上のパラフレーズと、不適応パラフレーズを含む1.5kのインスタンスを提供する。不適応パラフレーズは、モデルが本当に完全な比喩解釈を行うか、むしろ語彙的類似性に頼るかを決定するための制御として慎重に選択された。アクトと不適応のパラフレーズはすべて手動で注釈付けされた。比喩文は4つのジャンル(学術、ニュース、フィクション、会話)にまたがる自然な比喩をカバーし、それぞれ異なるレベルのノベルティを示す。 LLaMA と GPT-3.5 の実験により、MUNCH は LLM にとって困難な課題であることが示された。データセットはhttps://github.com/xiaoyuisrain/metaphor-understanding-challengeで自由にアクセスできる。 Metaphors in natural language are a reflection of fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases. The inapt paraphrases were carefully selected to serve as control to determine whether the model indeed performs full metaphor interpretation or rather resorts to lexical similarity. All apt and inapt paraphrases were manually annotated. The metaphorical sentences cover natural metaphor uses across 4 genres (academic, news, fiction, and conversation), and they exhibit different levels of novelty. Experiments with LLaMA and GPT-3.5 demonstrate that MUNCH presents a challenging task for LLMs. The dataset is freely accessible at https://github.com/xiaoyuisrain/metaphor-understanding-challenge.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# Aerial Lifting:Aerial Imageryによるニューラルアーバンセマンティックとビルのリフティング Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery ( http://arxiv.org/abs/2403.11812v1 ) ライセンス: Link先を確認	Yuqi Zhang, Guanying Chen, Jiaxing Chen, Shuguang Cui,	(参考訳) 本稿では,3次元にノイズの多い2次元ラベルを持ち上げることで,都市規模のセマンティックスとビルレベルのインスタンスセグメンテーションを実現するためのニューラルラジアンスフィールド手法を提案する。これは2つの主な理由から難しい問題である。第一に、都市空撮画像のオブジェクトは、建物、車、道路など、相当な大きさのバリエーションを示しており、正確な2Dセグメンテーションの課題となっている。第2に,既存のセグメンテーション法によって生成された2Dラベルは,特に空中画像の場合,シーン全体のごく一部しか撮影できない場合,多視点不整合問題に悩まされる。これらの制限を克服するために、我々はまず、異なる高度から予測されるラベルを組み合わせて、異なる大きさのオブジェクトのセグメンテーションを強化するスケール適応型セマンティックラベル融合戦略を導入し、NeRFの新規なビュー合成機能を活用する。次に,2次元のインスタンスラベルにおける多視点不整合問題を緩和するために,3次元シーン表現に基づく新しいクロスビューインスタンスラベルグループ化戦略を導入する。さらに,多視点再構成深度を生かして,再構成放射場の幾何学的品質を向上し,セグメンテーション結果が向上した。複数の実世界の都市規模データセットの実験により、我々のアプローチは既存の手法よりも優れており、その有効性を強調している。 We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D. This is a challenging problem due to two primary reasons. Firstly, objects in urban aerial images exhibit substantial variations in size, including buildings, cars, and roads, which pose a significant challenge for accurate 2D segmentation. Secondly, the 2D labels generated by existing segmentation methods suffer from the multi-view inconsistency problem, especially in the case of aerial images, where each image captures only a small portion of the entire scene. To overcome these limitations, we first introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes by combining labels predicted from different altitudes, harnessing the novel-view synthesis capabilities of NeRF. We then introduce a novel cross-view instance label grouping strategy based on the 3D scene representation to mitigate the multi-view inconsistency problem in the 2D instance labels. 
Furthermore, we exploit multi-view reconstructed depth priors to improve the geometric quality of the reconstructed radiance field, resulting in enhanced segmentation results. Experiments on multiple real-world urban-scale datasets demonstrate that our approach outperforms existing methods, highlighting its effectiveness.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# HVDistill: 教師なしハイブリッドビュー蒸留による画像からポイントクラウドへの知識伝達 HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation ( http://arxiv.org/abs/2403.11817v1 ) ライセンス: Link先を確認	Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang,	(参考訳) 本稿では,HVDistillと呼ばれるハイブリッドビューベースの知識蒸留フレームワークについて,事前学習したイメージ・ネットワークを用いて,教師なしの方法でポイント・クラウド・ニューラルネットの特徴学習を指導する。 RGBカメラとLiDARセンサの幾何学的関係を利用して、画像平面ビューと鳥眼ビューの両方に基づく2つのモードの対応性を確立し、表現学習を容易にする。特に、画像平面対応は、点雲を投影することで単純に得ることができ、鳥視対応は、投影された点雲の監督によって予測された深さで3次元空間に画素を持ち上げることで達成できる。画像教師ネットワークは、画像平面ビューからリッチなセマンティクスを提供し、一方、鳥眼ビューから幾何学的情報を取得する。実際、この2つのビューのイメージ特徴は、互いに自然に補完し合い、ともに点群の生徒ネットワークの学習した特徴表現を改善することができる。さらに、自己教師付き2Dネットワークでは、HVDistillは2Dアノテーションも3Dアノテーションも必要としない。我々は、nuScenesデータセット上のモデルを事前トレーニングし、評価のためにnuScenes、SemanticKITTI、KITTIデータセット上の下流タスクに転送する。その結果,本手法はスクラッチからトレーニングしたベースラインよりも一貫した改善を実現し,既存のスキームをはるかに上回っていることがわかった。コードはgit@github.com:zhangsha1024/HVDistill.gitで入手できる。 We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, the correspondence between the two modalities based on both image-plane view and bird-eye view can be established, which facilitates representation learning. Specifically, the image-plane correspondences can be simply obtained by projecting the point clouds, while the bird-eye-view correspondences can be achieved by lifting pixels to the 3D space with the predicted depths under the supervision of projected point clouds. The image teacher networks provide rich semantics from the image-plane view and meanwhile acquire geometric information from the bird-eye view. 
Indeed, image features from the two views naturally complement each other and together can ameliorate the learned feature representation of the point cloud student networks. Moreover, with a self-supervised pre-trained 2D network, HVDistill requires neither 2D nor 3D annotations. We pre-train our model on nuScenes dataset and transfer it to several downstream tasks on nuScenes, SemanticKITTI, and KITTI datasets for evaluation. Extensive experimental results show that our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes. Codes are available at git@github.com:zhangsha1024/HVDistill.git.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# TCNet:軌道と関連地域からの連続手話認識 TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions ( http://arxiv.org/abs/2403.11818v1 ) ライセンス: Link先を確認	Hui Lu, Albert Ali Salah, Ronald Poppe,	(参考訳) 連続手話認識(CSLR)における鍵となる課題は、ビデオ入力から長時間にわたる空間的相互作用を効率的に捉えることである。この課題に対処するために,トラジェクトリや相関領域からの時空間情報を効果的にモデル化するハイブリッドネットワークTCNetを提案する。 TCNetのトラジェクトリモジュールは、フレームを連続的な視覚トークンからなる整列トラジェクトリに変換する。さらに、クエリトークンに対しては、トラジェクトリに沿って自己アテンションが学習される。これにより,動作中の特定の領域の指の動きなどの微細な時空間パターンにも注目できる。 TCNetの相関モジュールは、無関係なフレーム領域をフィルタリングする新しいダイナミックアテンション機構を使用している。さらに、相関領域から動的キー値トークンを各クエリに割り当てる。どちらの革新も計算コストとメモリを大幅に削減する。 PHOENIX14, PHOENIX14-T, CSL, CSL-Dailyの4つの大規模データセットの実験を行った。我々の結果は,TCNetが常に最先端のパフォーマンスを達成していることを示している。例えば、PHOENIX14とPHOENIX14-Tの単語誤り率をそれぞれ1.5%、1.0%改善する。 A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuous visual tokens. In addition, for a query token, self-attention is learned along the trajectory. As such, our network can also focus on fine-grained spatio-temporal patterns, such as finger movements, of a specific region in motion. TCNet's correlation module uses a novel dynamic attention mechanism that filters out irrelevant frame regions. Additionally, it assigns dynamic key-value tokens from correlated regions to each query. Both innovations significantly reduce the computation cost and memory. We perform experiments on four large-scale datasets: PHOENIX14, PHOENIX14-T, CSL, and CSL-Daily, respectively. Our results demonstrate that TCNet consistently achieves state-of-the-art performance. For example, we improve over the previous state-of-the-art by 1.5% and 1.0% word error rate on PHOENIX14 and PHOENIX14-T, respectively.	
翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# 画像合成におけるテキストの評価 : 画像品質指標の調査と分類 Evaluating Text to Image Synthesis: Survey and Taxonomy of Image Quality Metrics ( http://arxiv.org/abs/2403.11821v1 ) ライセンス: Link先を確認	Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam, Timo Ropinski,	(参考訳) 近年のテキスト・画像合成の進歩は,基礎モデルによる言語と視覚の組み合わせを利用して実現されている。これらのモデルは、World Wide Webや他の大規模データベースから得られた膨大な量のテキストイメージペアに基づいて事前訓練されている。テキストと画像間のコンテンツアライメントを確保するために高品質な画像生成の需要がシフトするにつれて、人間の判断を模倣する新たな評価指標が開発されてきた。このように、研究者たちは、テキストと画像のコンポジションアライメントの品質尺度として、視覚言語モデルの合成性とそれらの組み合わさを研究するために、ますます複雑なアノテーションを持つデータセットを集め始めている。本稿では,既存のテキスト・画像評価指標の概要を概観し,これらの指標を分類するための新しい分類法を提案する。また,テキストから画像への合成モデルを品質や人為的嗜好に最適化する手法について議論する前に,頻繁なテキスト画像ベンチマークデータセットのレビューを行った。最終的に、テキスト・ツー・イメージの評価を改善するためのガイドラインを導き、オープンな課題と現在の制限について議論する。 Recent advances in text-to-image synthesis have been enabled by exploiting a combination of language and vision through foundation models. These models are pre-trained on tremendous amounts of text-image pairs sourced from the World Wide Web or other large-scale databases. As the demand for high-quality image generation shifts towards ensuring content alignment between text and image, novel evaluation metrics have been developed with the aim of mimicking human judgments. Thus, researchers have started to collect datasets with increasingly complex annotations to study the compositionality of vision-language models and their incorporation as a quality measure of compositional alignment between text and image contents. In this work, we provide a comprehensive overview of existing text-to-image evaluation metrics and propose a new taxonomy for categorizing these metrics. We also review frequently adopted text-image benchmark datasets before discussing techniques to optimize text-to-image synthesis models towards quality and human preferences. Ultimately, we derive guidelines for improving text-to-image evaluation and discuss the open challenges and current limitations.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# CapsLorentzNet: 物理にインスパイアされた機能とグラフの畳み込みの統合 CapsLorentzNet: Integrating Physics Inspired Features with Graph Convolution ( http://arxiv.org/abs/2403.11826v1 ) ライセンス: Link先を確認	Rameswar Sahu,	(参考訳) 高度な機械学習技術の出現により、オブジェクトのタグ付けが大幅に進歩した。本稿では,広い範囲のグラフニューラルネットワーク(GNN)アーキテクチャと互換性のある新しいアーキテクチャ変更を導入することにより,この分野をさらに進める。本手法は,標準GNNにおける従来の復号ブロックを置き換えるカプセル層の統合を提唱する。これらのカプセルは、ベクターアクティベーションを持つニューロンのグループである。これらのベクトルの向きは、研究中の物体がカプセルで表されるクラスに属するかどうかを特徴づける大きさで、研究中の物体の重要な特性を表している。さらに、カプセルネットワークは再構成機構による正規化を取り入れ、専門家が設計した高レベルな特徴をシームレスに分析に統合することを容易にする。クォークグルーオンタギングにおけるLorentzNetアーキテクチャによるアーキテクチャの有用性について検討した。ここでは、LorentzNetの復号ブロックをカプセル化復号ブロックに置き換え、結果のアーキテクチャをCapsLorentzNetと呼ぶ。我々の新しいアーキテクチャはクォークグルーオンタギングタスクにおいてローレンツネットの性能を20%向上させることができる。 With the advent of advanced machine learning techniques, boosted object tagging has witnessed significant progress. In this article, we take this field further by introducing novel architectural modifications compatible with a wide array of Graph Neural Network (GNN) architectures. Our approach advocates for integrating capsule layers, replacing the conventional decoding blocks in standard GNNs. These capsules are a group of neurons with vector activations. The orientation of these vectors represents important properties of the objects under study, with their magnitude characterizing whether the object under study belongs to the class represented by the capsule. Moreover, capsule networks incorporate a regularization by reconstruction mechanism, facilitating the seamless integration of expert-designed high-level features into the analysis. We have studied the usefulness of our architecture with the LorentzNet architecture for quark-gluon tagging. Here, we have replaced the decoding block of LorentzNet with a capsulated decoding block and have called the resulting architecture CapsLorentzNet. Our new architecture can enhance the performance of LorentzNet by 20% for the quark-gluon tagging task.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# 距離推定による音事象の検出と位置推定 Sound Event Detection and Localization with Distance Estimation ( http://arxiv.org/abs/2403.11827v1 ) ライセンス: Link先を確認	Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros,	(参考訳) 音事象検出と局所化(SELD)は、音事象とその対応方向(DOA)を識別する複合タスクである。この課題には多くの応用があり、近年広く研究されているが、音源位置に関する完全な情報の提供には失敗している。本稿では,タスクを音事象検出,距離推定による局所化(3D SELD)に拡張することで,この問題を克服する。本研究では,SELDコア内に距離推定を統合する2つの方法について検討する。これは,問題を個別のモデル出力で処理するマルチタスクアプローチと,マルチACCDOA法を距離情報を含むように拡張したシングルタスクアプローチである。 STARSS23: Sony-TAU Realistic Space Soundscapes 2023。さらに,距離推定部に関連する損失関数について実験を行った。以上の結果から,音事象検出やDOA推定における性能劣化を伴わずに3D SELDを行うことが可能であることが示唆された。 Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA). While this task has numerous applications and has been extensively researched in recent years, it fails to provide full information about the sound source position. In this paper, we overcome this problem by extending the task to Sound Event Detection, Localization with Distance Estimation (3D SELD). We study two ways of integrating distance estimation within the SELD core - a multi-task approach, in which the problem is tackled by a separate model output, and a single-task approach obtained by extending the multi-ACCDOA method to include distance information. We investigate both methods for the Ambisonic and binaural versions of STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023. Moreover, our study involves experiments on the loss function related to the distance estimation part. Our results show that it is possible to perform 3D SELD without any degradation of performance in sound event detection and DOA estimation.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# グラフニューラルネットワークを用いたネットワーク侵入検知システムにおける問題空間構造逆攻撃 Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural Networks ( http://arxiv.org/abs/2403.11830v1 ) ライセンス: Link先を確認	Andrea Venturi, Dario Stabili, Mirco Marchetti,	(参考訳) 機械学習(ML)アルゴリズムは、ネットワーク侵入検知システム(NIDS)をサポートするためにますます人気が高まっている。それにもかかわらず、大規模な研究により、敵攻撃に対する脆弱性が示されており、その性能を損なうことを目的としたモデルの入力に微妙な摂動が伴っている。最近の提案では、グラフニューラルネットワーク(GNN)を有効活用して、侵入による構造パターンにもとづいて、検出ロバスト性の向上を図っている。しかし、GNNベースのNIDSの採用は、新しいタイプのリスクをもたらす。本稿では,ネットワーク侵入検知におけるGNNに適した敵攻撃の最初の形式化を提案する。さらに,現実のシナリオにおいて,実行可能な構造攻撃を行うためには,攻撃者が考慮すべき問題空間の制約を概説し,モデル化する。最終的な貢献として、我々は、最先端のGNNベースのNIDSに対して提案された攻撃を開始するための広範な実験的キャンペーンを実施している。本研究は, 古典的特徴に基づく攻撃に対するモデルの堅牢性の向上と, 構造的攻撃に対する感受性を強調した。 Machine Learning (ML) algorithms have become increasingly popular for supporting Network Intrusion Detection Systems (NIDS). Nevertheless, extensive research has shown their vulnerability to adversarial attacks, which involve subtle perturbations to the inputs of the models aimed at compromising their performance. Recent proposals have effectively leveraged Graph Neural Networks (GNN) to produce predictions based also on the structural patterns exhibited by intrusions to enhance the detection robustness. However, the adoption of GNN-based NIDS introduces new types of risks. In this paper, we propose the first formalization of adversarial attacks specifically tailored for GNN in network intrusion detection. Moreover, we outline and model the problem space constraints that attackers need to consider to carry out feasible structural attacks in real-world scenarios. As a final contribution, we conduct an extensive experimental campaign in which we launch the proposed attacks against state-of-the-art GNN-based NIDS. Our findings demonstrate the increased robustness of the models against classical feature-based adversarial attacks, while highlighting their susceptibility to structure-based attacks.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator ( http://arxiv.org/abs/2403.11833v1 ) ライセンス: Link先を確認	Javad Rafiei Asl, Mohammad H. Rafiei, Manar Alohaly, Daniel Takabi,	(参考訳) マシンラーニングモデルは、悪意ある構築されたAdversarial Examples(AEs)に対して脆弱である。 AEsで機械学習モデルをトレーニングすることで、敵攻撃に対する堅牢性と安定性が向上する。高品質のAEを生産するモデルを開発することが不可欠である。このようなモデルの開発は、自然言語処理(NLP)においてコンピュータビジョンのような分野よりもはるかに遅い。本稿では,SSCAE(Semantic, Syntactic, and Context-aware natural language AEs generator)と呼ばれる実用的で効率的な敵対的攻撃モデルを提案する。 SSCAEは重要な単語を特定し、マスク付き言語モデルを使用して、初期の置換セットを生成する。次に、2つのよく知られた言語モデルを用いて、意味的および構文的特性の観点から初期集合を評価する。本稿では,(1)より効率的な摂動を捉えるダイナミックしきい値,(2)高品質なAEを生成するための局所的な貪欲探索について紹介する。ブラックボックスの手法として、SSCAEは、意味的一貫性とソース言語の構文的および文法的要求を保った、人間には知覚できない、コンテキスト対応のAEを生成する。提案したSSCAEモデルの有効性と優位性について,15種類の比較実験とパラメータ最適化のための広範囲な感度解析を行った。 SSCAEは、より低いクエリ数と同等の摂動率で高いセマンティック一貫性を維持しながら、すべての実験で既存のモデルよりも優れています。 Machine learning models are vulnerable to maliciously crafted Adversarial Examples (AEs). Training a machine learning model with AEs improves its robustness and stability against adversarial attacks. It is essential to develop models that produce high-quality AEs. Developing such models has been much slower in natural language processing (NLP) than in areas such as computer vision. This paper introduces a practical and efficient adversarial attack model called SSCAE for Semantic, Syntactic, and Context-aware natural language AEs generator. SSCAE identifies important words and uses a masked language model to generate an early set of substitutions. Next, two well-known language models are employed to evaluate the initial set in terms of semantic and syntactic characteristics. We introduce (1) a dynamic threshold to capture more efficient perturbations and (2) a local greedy search to generate high-quality AEs. 
As a black-box method, SSCAE generates humanly imperceptible and context-aware AEs that preserve semantic consistency and the source language's syntactical and grammatical requirements. The effectiveness and superiority of the proposed SSCAE model are illustrated with fifteen comparative experiments and extensive sensitivity analysis for parameter optimization. SSCAE outperforms the existing models in all experiments while maintaining a higher semantic consistency with a lower query number and a comparable perturbation rate.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# インコンテクスト学習と構成一般化の関係の理解に向けて Towards Understanding the Relationship between In-context Learning and Compositional Generalization ( http://arxiv.org/abs/2403.11834v1 ) ライセンス: Link先を確認	Sungjun Han, Sebastian Padó,	(参考訳) 構成一般化の原理によれば、複素表現の意味は、その部分の意味とそれらがどのように結合されるかの関数として理解することができる。この原理は人間の言語処理に不可欠であり、また、アウト・オブ・ディストリビューションデータに直面したNLPモデルにも不可欠である。しかし、トランスフォーマーを含む多くのニューラルネットワークモデルは、構成一般化に苦しむことが示されている。本稿では,モデルに文脈内学習を強制することは,構成一般化を促進する帰納的バイアスをもたらすと仮定する。この仮説をテストするために、通常の学習を非常に難しい設定で因果変換器を訓練し、トレーニングインスタンスとシャッフルインスタンスラベルの異なる順序で提示する。これは、データセットから達成可能な、可能な数発の学習問題のすべてについて、モデルをトレーニングすることに対応する。しかし、このモデルは、初期の例を利用して、後の例(例えば、文脈内学習)に一般化することで、タスクを解くことができる。データセット、SCAN、COGS、GeoQueryの評価では、この方法でトレーニングされたモデルは、実際に合成の一般化の改善を示している。このことは、一般化のための帰納的バイアスとして、文脈内学習問題の有用性を示している。 According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language processing and also, arguably, for NLP models in the face of out-of-distribution data. However, many neural network models, including Transformers, have been shown to struggle with compositional generalization. In this paper, we hypothesize that forcing models to in-context learn can provide an inductive bias to promote compositional generalization. To test this hypothesis, we train a causal Transformer in a setting that renders ordinary learning very difficult: we present it with different orderings of the training instance and shuffle instance labels. This corresponds to training the model on all possible few-shot learning problems attainable from the dataset. The model can solve the task, however, by utilizing earlier examples to generalize to later ones (i.e. in-context learning). In evaluations on the datasets, SCAN, COGS, and GeoQuery, models trained in this manner indeed show improved compositional generalization. 
This indicates the usefulness of in-context learning problems as an inductive bias for generalization.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# Agent3D-Zero: ゼロショット3D理解のためのエージェント Agent3D-Zero: An Agent for Zero-shot 3D Understanding ( http://arxiv.org/abs/2403.11835v1 ) ライセンス: Link先を確認	Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang,	(参考訳) 3Dの現実世界を理解する能力は、人工知能にとって重要なマイルストーンだ。現在の一般的なプラクティスは、大規模言語モデル(LLM)を3Dデータとテキストで微調整し、3D理解を可能にすることです。有効性にもかかわらず、これらのアプローチは本来、利用可能な3Dデータのスケールと多様性によって制限される。また,本研究では,ゼロショット方式で3Dシーン理解を実現する革新的な3DエージェントフレームワークであるAgent3D-Zeroを紹介する。アプローチの本質は、人間がどのように3Dシーンを理解しようとするかに触発されて、複数の画像からの洞察を理解し、合成するプロセスとして、3Dシーン知覚の課題を再認識することに集中している。本稿では,この概念を統合することで,3次元理解のための視点を積極的に選択・分析することで,大規模視覚言語モデル(VLM)を利用する新しい手法を提案する。具体的には、入力された3Dシーンが与えられた場合、Agent3D-Zeroはまず、カスタムデザインの視覚的プロンプトで鳥眼視画像を処理し、次に視点を選択して、基礎となる知識を観察し、要約する。 Agent3D-Zeroの独特な利点は、視覚的プロンプトの導入である。広範囲な実験により, 多様な3D環境を理解する上で, 提案手法の有効性が示された。 The ability to understand and reason the 3D real world is a crucial milestone towards artificial general intelligence. The current common practice is to finetune Large Language Models (LLMs) with 3D data and texts to enable 3D understanding. Despite their effectiveness, these approaches are inherently limited by the scale and diversity of the available 3D data. Alternatively, in this work, we introduce Agent3D-Zero, an innovative 3D-aware agent framework addressing the 3D scene understanding in a zero-shot manner. The essence of our approach centers on reconceptualizing the challenge of 3D scene perception as a process of understanding and synthesizing insights from multiple images, inspired by how our human beings attempt to understand 3D scenes. By consolidating this idea, we propose a novel way to make use of a Large Visual Language Model (VLM) via actively selecting and analyzing a series of viewpoints for 3D understanding. Specifically, given an input 3D scene, Agent3D-Zero first processes a bird's-eye view image with custom-designed visual prompts, then iteratively chooses the next viewpoints to observe and summarize the underlying knowledge. 
A distinctive advantage of Agent3D-Zero is the introduction of novel visual prompts, which significantly unleash the VLMs' ability to identify the most informative viewpoints and thus facilitate observing 3D scenes. Extensive experiments demonstrate the effectiveness of the proposed framework in understanding diverse and previously unseen 3D environments.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# 安全と高品質のアウトプットの確保: 言語モデルに対するガイドラインライブラリアプローチ Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models ( http://arxiv.org/abs/2403.11838v1 ) ライセンス: Link先を確認	Yi Luo, Zhenghao Lin, Yuhao Zhang, Jiashuo Sun, Chen Lin, Chengjin Xu, Xiangdong Su, Yelong Shen, Jian Guo, Yeyun Gong,	(参考訳) 大きな言語モデル(LLM)は印象的な能力を示すだけでなく、バイアスのあるコンテンツ生成やプライバシの問題といったリスクも提示する。現在のアライメント手法の1つは、原則駆動の統合を含んでいるが、手作業によるルールの不正確さと、安全トレーニングのないモデルにおけるリスク認識の不十分さから生じる課題に直面している。これらの問題に対処するために,2段階のアプローチである Guide-Align を導入する。当初,安全訓練モデルでは潜在的なリスクを特定し,様々な入力に対する特定のガイドラインを定式化し,入力誘導検索のためのガイドラインとモデルの包括的ライブラリを構築した。その後、検索モデルは、新しい入力と関連するガイドラインを関連付け、応答生成におけるLCMを誘導し、安全で高品質な出力を保証し、人間の値と整合する。追加のオプションステージでは、第2ステージで実装されたプロセスを通じて生成された、新しい整列データセットでモデルを微調整する。本手法は,多様な入力に対応するためのガイドラインをカスタマイズし,ガイドラインライブラリのきめ細かい粒度と包括性を向上する。さらに、軽量検索モデルにより、安全訓練されたLLMの安全性に関する専門知識を取り入れている。当社のアプローチを3つのベンチマークで評価し,LLMのセキュリティと品質の大幅な向上を実証した。特に、微調整されたモデルであるRaradorは、パラメータが13億であっても、GPT-3.5-turboより優れ、アライメント能力はGPT-4より優れています。 Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, thereby establishing a comprehensive library of guidelines and models for input-guidelines retrieval. Subsequently, the retrieval model correlates new inputs with pertinent guidelines, guiding LLMs in response generation to ensure safe and high-quality outputs, thus aligning with human values. An additional optional stage involves fine-tuning a model with new well-aligned datasets generated through the process implemented in the second stage. 
Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the fine-grainedness and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model. We evaluated our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.	翻訳日:2024-03-20 20:10:10 公開日:2024-03-18
# 知識指導型機械学習の強化手法としての多項目比較 Multi-Criteria Comparison as a Method of Advancing Knowledge-Guided Machine Learning ( http://arxiv.org/abs/2403.11840v1 ) ライセンス: Link先を確認	Jason L. Harman, Jaelle Scheuerman,	(参考訳) 本稿では,AI/MLモデルの評価に適用可能な一般化可能なモデル評価手法について述べる。心理学・決定科学における予測競争から発展し、複数の科学的、理論的、実践的な基準にまたがる様々なタイプと構造の候補モデル群を評価する。基準スコアの正規ランキングは、計算社会選択の分野からの投票規則を用いて評価され、総合的な評価において、異なる尺度とモデルのタイプの比較を可能にする。さらなる利点と応用について論じる。 This paper describes a generalizable model evaluation method that can be adapted to evaluate AI/ML models across multiple criteria including core scientific principles and more practical outcomes. Emerging from prediction competitions in Psychology and Decision Science, the method evaluates a group of candidate models of varying type and structure across multiple scientific, theoretic, and practical criteria. Ordinal ranking of criteria scores are evaluated using voting rules from the field of computational social choice and allow the comparison of divergent measures and types of models in a holistic evaluation. Additional advantages and applications are discussed.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# オフラインデータ構築のためのメディエータを用いた悲観的因果強化学習 Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data ( http://arxiv.org/abs/2403.11841v1 ) ライセンス: Link先を確認	Danyang Wang, Chengchun Shi, Shikai Luo, Will Wei Sun,	(参考訳) 実世界のシナリオでは、ランダム化実験から収集されたデータセットは、時間と予算の制限のため、サイズによって制限されることが多い。結果として、大規模な観測データセットを活用することは、高品質な政策学習を実現するためのより魅力的な選択肢となる。しかし、既存のオフライン強化学習(RL)手法の多くは、観測データコンテキストにおいてしばしば保持されない非確立性と肯定性の2つの重要な仮定に依存している。これらの課題を認識し,新しいポリシー学習アルゴリズム PESsimistic CAusal Learning (PESCAL) を提案する。また, 提案手法では, 前方基準に基づくメディエータ変数を用いて, 境界バイアスを除去し, また, 候補ポリシーによって誘導される行動分布と観測データを生成する行動ポリシーの分布シフトに対処する悲観的原理を採用する。我々のキーとなる観察は、系の力学に作用の作用を媒介する補助変数を組み込むことで、Q関数の代わりにメディエータ分布関数の下限を学習し、分散シフトの問題を部分的に緩和するのに十分であるということである。この知見は,推定Q-関数に対する逐次不確実性定量化の課題を回避することによって,我々のアルゴリズムを著しく単純化する。さらに,提案するアルゴリズムの理論的保証とシミュレーションによる有効性の実証,および主要な配車プラットフォームからのオフラインデータセットを利用した実環境実験も提供する。 In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions--unconfoundedness and positivity--which frequently do not hold in observational data contexts. Recognizing these challenges, we propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL). We utilize the mediator variable based on front-door criterion to remove the confounding bias; additionally, we adopt the pessimistic principle to address the distributional shift between the action distributions induced by candidate policies, and the behavior policy that generates the observational data. 
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm, by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# ファジィラフショケット距離の分類 Fuzzy Rough Choquet Distances for Classification ( http://arxiv.org/abs/2403.11843v1 ) ライセンス: Link先を確認	Adnan Theerens, Chris Cornelis,	(参考訳) 本稿では,ファジィラフセットに基づく新しいチョケット距離を提案する。提案手法は,ファジィ粗集合理論から受信した属性情報とチョーケ積分の柔軟性を組み合わせたものである。このアプローチは、データ内の非線形関係を順応的にキャプチャし、条件属性の判断属性に対する相互作用を認め、より柔軟で正確な距離をもたらすように設計されている。我々は、距離に基づく分類アプローチ(例えば、k-アネレスト近傍)に特に重点を置いて、機械学習の文脈におけるその応用を探求する。本論文は,2つのファジィ粗度に基づく正の領域に基づく測度について検討する。さらに,ファジィ粗集合論から導かれる測度をモノトナイズする2つの手法を探索し,これらをチョーケ積分で用いるのに適したものにし,それらの相違について検討する。 This paper introduces a novel Choquet distance using fuzzy rough set based measures. The proposed distance measure combines the attribute information received from fuzzy rough set theory with the flexibility of the Choquet integral. This approach is designed to adeptly capture non-linear relationships within the data, acknowledging the interplay of the conditional attributes towards the decision attribute and resulting in a more flexible and accurate distance. We explore its application in the context of machine learning, with a specific emphasis on distance-based classification approaches (e.g. k-nearest neighbours). The paper examines two fuzzy rough set based measures that are based on the positive region. Moreover, we explore two procedures for monotonizing the measures derived from fuzzy rough set theory, making them suitable for use with the Choquet integral, and investigate their differences.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 制約付き学習問題の最適解法 Near-Optimal Solutions of Constrained Learning Problems ( http://arxiv.org/abs/2403.11844v1 ) ライセンス: Link先を確認	Juan Elenter, Luiz F. O. Chamon, Alejandro Ribeiro,	(参考訳) 機械学習システムが広く採用されるにつれて、その振る舞いを縮める必要性がますます顕在化している。これは、堅牢性、安全性、公正性要件を満たすモデルの開発に向けた最近の進歩によって証明されている。これらの要件は、制約付き学習問題の定式化によって(一般化保証付きで)課せられ、二重登頂アルゴリズムによって取り組めます。しかし、これらのアルゴリズムは客観的な値に収束するが、凸のない設定であっても、結果が実現可能であることは保証できない。そのためにはすべての反復をランダム化する必要があるが、これは事実上現代のアプリケーションでは現実的ではない。それでも、最終的なイテレーションは、実際にうまく機能することが観察されている。本研究では、凸性の欠如にもかかわらず、最適双対変数に付随するラグランジアン最小値の制約違反を特徴付けることにより、理論と実践の間のこのギャップに対処する。これを実現するために,非凸有限次元制約学習問題を凸関数問題のパラメトリゼーションとみなすことができる。本結果から,2つの手法の実現可能性の問題を効果的に緩和し,従来の2つの学習の実証的成功に光を当てることが示唆された。フェアラーニングの課題について,本研究の成果を概説する。 With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern applications. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. 
We illustrate our findings in fair learning tasks.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# GraphBEV:マルチモーダル3Dオブジェクト検出のためのロバストなBEV特徴アライメントを目指して GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection ( http://arxiv.org/abs/2403.11848v1 ) ライセンス: Link先を確認	Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia, Feiyang Jia, Li Wang,	(参考訳) LiDARとカメラ情報をBird's-Eye-View(BEV)表現に統合することは、自動運転における3Dオブジェクト検出の重要な側面として現れている。しかし,既存の手法は,LiDARとカメラセンサの不正確な校正関係の影響を受けやすい。このような不正確さは、カメラブランチの深さ推定の誤差をもたらし、最終的にLiDARとカメラBEVの特徴の不一致を引き起こす。本研究では,Graph BEVと呼ばれる堅牢な融合フレームワークを提案する。不正確なポイントクラウドプロジェクションによるエラーに対処するため、グラフマッチングを介して近傍を考慮した深度特徴を利用するLocal Alignモジュールを導入する。さらに,LiDARとカメラBEVの特徴の不一致を是正するGlobal Alignモジュールを提案する。Graph BEVフレームワークは,nuscenes検証セットにおいてmAP 70.1%を達成し,BEV Fusionを1.6%上回る最先端のパフォーマンスを実現している。重要な点として、Graph BEVは、ミスアライメントノイズのある条件下で、BEV Fusionを8.3%上回っている。 Integrating LiDAR and camera information into Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to the inaccurate calibration relationship between LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called Graph BEV. Addressing errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via Graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our Graph BEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEV Fusion by 1.6% on the nuscenes validation set. Importantly, our Graph BEV outperforms BEV Fusion by 8.3% under conditions with misalignment noise.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 表面電子に由来するナノスケールカシミール力軟化 Nanoscale Casimir force softening originated from surface electrons ( http://arxiv.org/abs/2403.11849v1 ) ライセンス: Link先を確認	Hewan Zhang, Kun Ding,	(参考訳) 真空場と量子物質の強い結合はナノスケールで発生し、光-物質相互作用の地平線を広げる。真空場の展示としてのナノスケールカシミール力は、マイクロンカシミール力では無視できない量子特性による表面電子の影響を必然的に経験する。そこで我々は,カシミール力に対する表面電子の寄与を目的かつ微妙に含む典型的な実験構成に対処する3次元等角写像法を開発した。本手法により,表面電子は材料や結晶面に依存するナノスケールカシミール力を増強または抑制できることを明らかにした。この機構はカシミール力軟化であり、カシミール相互作用で見られる距離を効果的に変化させる表面電子から生じる。本研究は, 表面電子と真空場との相互作用を浮き彫りにするだけでなく, ナノスケールの揺らぎ型問題に関する理論的および実験的研究のレシピを提供する。 Strong coupling between vacuum fields and quantum matter occurs at the nanoscale and broadens the horizon of light-matter interaction. Nanoscale Casimir force, as an exhibition of vacuum fields, inevitably experiences the influence of surface electrons due to their quantum character, which are ignorable in micron Casimir force. Here, we develop a three-dimensional conformal map method to tackle typical experimental configurations with surface electron contributions to Casimir force purposely and delicately included. Based on this method, we reveal that surface electrons can either enhance or suppress the nanoscale Casimir force, depending on materials and crystal facets. The mechanism is demonstrated to be the Casimir force softening, which results from surface electrons effectively altering the distance seen by the Casimir interaction. Our findings not only highlight the interaction between surface electrons and vacuum fields but also provide a recipe for theoretical and experimental investigation of nanoscale fluctuation-type problems.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 長距離にわたりコヒーレント攻撃に対して安全な片側DI-QKD One-sided DI-QKD secure against coherent attacks over long distances ( http://arxiv.org/abs/2403.11850v1 ) ライセンス: Link先を確認	Michele Masini, Shubhayan Sarkar,	(参考訳) 量子鍵配送(Quantum Key Distribution, QKD)は、証明可能なセキュアな通信を可能にする技術であるが、デバイスの特性評価に課題があり、潜在的なセキュリティリスクをもたらす。デバイス独立(DI) QKDプロトコルは、デバイスに関する仮定を最小限にすることでこの問題を克服するが、高い検出効率(量子状態を検出する実験装置の能力)を必要とするため距離が限られている。したがって、デバイスに関する現実的な仮定に基づき、かつ長距離で実装可能な量子鍵配送プロトコルを見つけることが望ましい。本研究では,当事者ごとに2つの測定を用いる片側DI QKD方式を検討し,信頼できない側の検出効率が50.1%を超える範囲までコヒーレント攻撃に対して安全であることを示す。これは、2つの信頼できない測定を持つプロトコルに対して達成可能な理論上の限界にほぼ等しい。興味深いことに、状態のソースを信頼できない側の近くに置くことで、我々のプロトコルは標準QKDプロトコルに匹敵する距離にわたって安全であることも示す。 Quantum Key Distribution (QKD) is a technique enabling provable secure communication but faces challenges in device characterization, posing potential security risks. Device-Independent (DI) QKD protocols overcome this issue by making minimal device assumptions but are limited in distance because they require high detection efficiencies, which refer to the ability of the experimental setup to detect quantum states. It is thus desirable to find quantum key distribution protocols that are based on realistic assumptions on the devices as well as implementable over long distances. In this work, we consider a one-sided DI QKD scheme with two measurements per party and show that it is secure against coherent attacks up to detection efficiencies greater than 50.1% specifically on the untrusted side. This is almost the theoretical limit achievable for protocols with two untrusted measurements. Interestingly, we also show that, by placing the source of states close to the untrusted side, our protocol is secure over distances comparable to standard QKD protocols.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 改良型デ・フィネッティ・リダクションを用いた光量子鍵配送のポストセレクション法 Postselection technique for optical Quantum Key Distribution with improved de Finetti reductions ( http://arxiv.org/abs/2403.11851v1 ) ライセンス: Link先を確認	Shlok Nahar, Devashish Tupkary, Yuming Zhao, Norbert Lütkenhaus, Ernest Tan,	(参考訳) ポストセレクション手法は、コヒーレント攻撃に対する量子鍵配送プロトコルの安全性を証明するための重要な証明手法である。本研究では,光量子鍵配送プロトコルにポストセレクション手法を厳密に適用するために,複数のステップを踏む。まず, 元のポストセレクション論文の技術的欠陥を修正することで, ポストセレクション手法を厳密な数学的基礎の上に置く。次に,固定された周辺状態を持つデ・フィネッティ・リダクションを用いて,ポストセレクション手法の適用範囲をprepare-and-measureプロトコルに拡張する。第3に、ソースにタグを付けることで、decoy-stateプロトコルにポストセレクション手法をどのように利用できるかを示す。最後に, フラッグステート・スカッシャーの新たな変種を開発することにより, ポストセレクション手法の適用範囲を現実的な光学装置に拡張する。また,既存のデ・フィネッティ・リダクションを改良し,ポストセレクション手法の使用がキーレートに与える影響を低減した。これらの改善は、より一般に他の量子情報処理タスクにも適用できる。本研究の適用性を示す例として,タイムビン符号化三状態プロトコルに本結果を適用する。我々は,ポストセレクション手法が,コヒーレント攻撃に対する他のすべての既知の証明手法よりも優れていることを観察した。 The postselection technique is an important proof technique for proving the security of quantum key distribution protocols against coherent attacks. In this work, we go through multiple steps to rigorously apply the postselection technique to optical quantum key distribution protocols. First, we place the postselection technique on a rigorous mathematical foundation by fixing a technical flaw in the original postselection paper. Second, we extend the applicability of the postselection technique to prepare-and-measure protocols by using a de Finetti reduction with a fixed marginal. Third, we show how the postselection technique can be used for decoy-state protocols by tagging the source. Finally, we extend the applicability of the postselection technique to realistic optical setups by developing a new variant of the flag-state squasher. We also improve existing de Finetti reductions, which reduce the effect of using the postselection technique on the key rate. These improvements can be more generally applied to other quantum information processing tasks. 
As an example to demonstrate the applicability of our work, we apply our results to the time-bin encoded three-state protocol. We observe that the postselection technique performs better than all other known proof techniques against coherent attacks.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# denoiSplit: 画像分割と教師なしデノイジングを同時に行う手法 denoiSplit: a method for joint image splitting and unsupervised denoising ( http://arxiv.org/abs/2403.11854v1 ) ライセンス: Link先を確認	Ashesh Ashesh, Florian Jug,	(参考訳) 本研究では,新しい分析課題,すなわちセマンティック画像分割と教師なしデノイジングを同時に行うという課題に対処する手法であるdenoiSplitを提案する。この二重アプローチは蛍光顕微鏡において重要な応用を持つ。蛍光顕微鏡ではセマンティック画像分割が重要な用途を持つ一方で、ノイズが一般に画像内容の下流解析を妨げるためである。画像分割とは、画像を識別可能なセマンティック構造に分解することである。この課題に対する現在の最先端の手法は画像ノイズの存在下で苦戦し、意図せずノイズも予測出力に分配してしまうことを示す。ここで提案する手法は、教師なしデノイジングのサブタスクを統合することで、画像ノイズに対処できる。この統合により、顕著かつ現実的なレベルの撮像ノイズが存在する場合でも、セマンティックな画像アンミキシングが改善される。denoiSplitの重要な革新は、専用に定式化されたノイズモデルの使用と、我々が訓練する高次元階層型潜在空間に対するKLダイバージェンス損失の適切な調整である。実世界の顕微鏡画像において,4つのタスクにわたるdenoiSplitの性能を示す。さらに,定性的かつ定量的な評価を行い,既存のベンチマークと結果を比較することで,denoiSplit,すなわち2つの適切なノイズモデルを用いてセマンティック分割とデノイジングを同時に行う単一のVariational Splitting Encoder-Decoder(VSE)ネットワークの有効性を示す。 In this work we present denoiSplit, a method to tackle a new analysis task, i.e. the challenge of joint semantic image splitting and unsupervised denoising. This dual approach has important applications in fluorescence microscopy, where semantic image splitting has important applications but noise does generally hinder the downstream analysis of image content. Image splitting involves dissecting an image into its distinguishable semantic structures. We show that the current state-of-the-art method for this task struggles in the presence of image noise, inadvertently also distributing the noise across the predicted outputs. The method we present here can deal with image noise by integrating an unsupervised denoising sub-task. This integration results in improved semantic image unmixing, even in the presence of notable and realistic levels of imaging noise. A key innovation in denoiSplit is the use of specifically formulated noise models and the suitable adjustment of KL-divergence loss for the high-dimensional hierarchical latent space we are training. We showcase the performance of denoiSplit across 4 tasks on real-world microscopy images. 
Additionally, we perform qualitative and quantitative evaluations and compare results to existing benchmarks, demonstrating the effectiveness of using denoiSplit: a single Variational Splitting Encoder-Decoder (VSE) Network using two suitable noise models to jointly perform semantic splitting and denoising.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 結晶特性予測のための完全かつ効率的なグラフ変換器 Complete and Efficient Graph Transformers for Crystal Material Property Prediction ( http://arxiv.org/abs/2403.11857v1 ) ライセンス: Link先を確認	Keqiang Yan, Cong Fu, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji,	(参考訳) 結晶構造は、3次元空間の正則格子に沿って繰り返される原始単位セル内の原子塩基によって特徴づけられる。結晶の周期性と無限の性質は、幾何学グラフ表現学習に固有の課題を提起する。具体的には、結晶の完全な幾何学的情報を効果的に捉え、キラル結晶を扱うグラフを構築することは、未解決で困難な問題である。本稿では, 単位細胞の周期的パターンを利用して各原子の格子に基づく表現を確立し, 結晶の効率的かつ表現力のあるグラフ表現を実現する手法を提案する。さらに,結晶材料に特化して設計されたSE(3)トランスであるComFormerを提案する。 ComFormerには、ユークリッド距離と角度の不変な幾何学的記述子を使用するiComFormerと、等変ベクトル表現を使用するeComFormerの2つの変種が含まれている。実験により,ComFormer変種が広く使用されている3つの結晶ベンチマークにおいて,様々なタスクにおいて精度良く予測できることが実証された。私たちのコードはAIRSライブラリ(https://github.com/divelab/AIRS)の一部として公開されています。 Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an unsolved and challenging problem. In this paper, we introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom, enabling efficient and expressive graph representations of crystals. Furthermore, we propose ComFormer, a SE(3) transformer designed specifically for crystalline materials. ComFormer includes two variants; namely, iComFormer that employs invariant geometric descriptors of Euclidean distances and angles, and eComFormer that utilizes equivariant vector representations. Experimental results demonstrate the state-of-the-art predictive accuracy of ComFormer variants on various tasks across three widely-used crystal benchmarks. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 評価指標としてのGPT-4:農業における害虫管理における大規模言語モデルの評価 GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture ( http://arxiv.org/abs/2403.11858v1 ) ライセンス: Link先を確認	Shanglong Yang, Zhipeng Yuan, Shunbao Li, Ruoling Peng, Kang Liu, Po Yang,	(参考訳) 人工知能(AI)の急速に発展する分野では、農業、特に害虫管理における大規模言語モデル(LLM)の適用は、いまだに初期段階にある。我々は,OpenAIのGenerative Pre-trained Transformer(GPT)シリーズやGoogleのFLANシリーズなど,LLMsが生み出す害虫管理アドバイスの内容を評価することで,その実現可能性を証明することを目的とした。農業アドバイスの文脈固有の性質を考えると、LLMが生成するテキストの品質を自動的に測定または定量化することは重要な課題である。我々は, GPT-4 を評価指標として, コヒーレンス, 論理的一貫性, 頻度, 妥当性, 包括性, 実行性について, 生成した内容を評価する革新的な手法を提案した。さらに,収穫閾値データに基づくエキスパートシステムをベースラインとして統合し,農作物に生息する害虫が管理行動をとるかどうかの実態的精度のスコアを得る。各モデルのスコアは、最終的なスコアを得るためにパーセンテージによって重み付けされた。その結果, GPT-3.4 と GPT-4 はほとんどの評価カテゴリーにおいて FLAN モデルより優れていた。さらに、ドメイン固有の知識を含む指導ベースのプロンプトの使用は、農耕において有効なツールとしてLLMsが有効であることが証明され、精度は72%となり、害虫管理の提案を行う上でのLLMsの有効性が示された。 In the rapidly evolving field of artificial intelligence (AI), the application of large language models (LLMs) in agriculture, particularly in pest management, remains nascent. We aimed to prove the feasibility by evaluating the content of the pest management advice generated by LLMs, including the Generative Pre-trained Transformer (GPT) series from OpenAI and the FLAN series from Google. Considering the context-specific properties of agricultural advice, automatically measuring or quantifying the quality of text generated by LLMs becomes a significant challenge. We proposed an innovative approach, using GPT-4 as an evaluator, to score the generated content on Coherence, Logical Consistency, Fluency, Relevance, Comprehensibility, and Exhaustiveness. Additionally, we integrated an expert system based on crop threshold data as a baseline to obtain scores for Factual Accuracy on whether pests found in crop fields should take management action. Each model's score was weighted by percentage to obtain a final score. 
The results showed that GPT-3.4 and GPT-4 outperform the FLAN models in most evaluation categories. Furthermore, the use of instruction-based prompting containing domain-specific knowledge proved the feasibility of LLMs as an effective tool in agriculture, with an accuracy rate of 72%, demonstrating LLMs' effectiveness in providing pest management suggestions.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# SAML V2.0 Web Browser SSO 標準の自動セキュリティ解析に向けて - POST/Artifact のユースケース Towards automated formal security analysis of SAML V2.0 Web Browser SSO standard - the POST/Artifact use case ( http://arxiv.org/abs/2403.11859v1 ) ライセンス: Link先を確認	Zvonimir Hartl, Ante Đerek,	(参考訳) シングルサインオン(SSO)プロトコルは、複数のオンラインサービスに対する統一ログインによるユーザ認証を合理化し、ユーザビリティとセキュリティを改善している。最も一般的なSSOプロトコルフレームワークの1つであるSecurity Assertion Markup Language V2.0 (SAML) Web SSO Profileは、主に政府、教育、エンタープライズ環境で20年以上使われてきた。ミッションクリティカルな性質にもかかわらず、Web SSO Profileの特定の配置と構成のみが公式に分析されている。本稿では,POST/Artifact Bindingsのユースケースを用いて,SAML V2.0 SP-initiated SSOの総合的なセキュリティ解析を行うことにより,このギャップを埋めようとしている。特定のデプロイメントや構成に集中するのではなく、標準で許可された多くの異なるデプロイメントをキャプチャすることを目標として、仕様をしっかりとフォローしています。モデリングと解析は,暗号化のシンボリックモデルにおけるセキュリティプロトコルの自動検証のための最先端ツールであるTamarin proverを用いて行われる。技術的には、ユースケースのメタモデルを構築し、8つの異なるプロトコルの変種にインスタンス化します。 Tamarinの証明器を使って、これらのプロトコルの変種に対して、いくつかの重要なセキュリティ特性を正式に検証し、特定の欠点と潜在的な脆弱性を特定します。 Single Sign-On (SSO) protocols streamline user authentication with a unified login for multiple online services, improving usability and security. One of the most common SSO protocol frameworks - the Security Assertion Markup Language V2.0 (SAML) Web SSO Profile - has been in use for more than two decades, primarily in government, education and enterprise environments. Despite its mission-critical nature, only certain deployments and configurations of the Web SSO Profile have been formally analyzed. This paper attempts to bridge this gap by performing a comprehensive formal security analysis of the SAML V2.0 SP-initiated SSO with POST/Artifact Bindings use case. Rather than focusing on a specific deployment and configuration, we closely follow the specification with the goal of capturing many different deployments allowed by the standard. Modeling and analysis is performed using Tamarin prover - state-of-the-art tool for automated verification of security protocols in the symbolic model of cryptography. 
Technically, we build a meta-model of the use case that we instantiate to eight different protocol variants. Using the Tamarin prover, we formally verify a number of critical security properties for those protocol variants, while identifying certain drawbacks and potential vulnerabilities.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# 熱画像を用いたマルチモーダルニューラルシーン表現の探索 Exploring Multi-modal Neural Scene Representations With Applications on Thermal Imaging ( http://arxiv.org/abs/2403.11865v1 ) ライセンス: Link先を確認	Mert Özer, Maximilian Weiherer, Martin Hundhausen, Bernhard Egger,	(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、RGB画像のセットでトレーニングする場合の新規ビュー合成タスクにおいて、新たなデファクト標準へと急速に発展した。本稿では,マルチモーダル学習の文脈において,NeRFなどのニューラルシーン表現を包括的に評価する。具体的には,RGB以外の第2のモダリティをNeRFに組み込むための4つの戦略を提示する。(1) 両方のモダリティで独立にスクラッチからトレーニングすること,(2) RGBで事前トレーニングし第2のモダリティで微調整すること,(3) 第2のブランチを追加すること,(4) 追加モダリティの(色)値を予測する別コンポーネントを追加すること,である。熱画像はラジオシティの点でRGBと大きく異なり,ニューラルシーン表現への統合が難しいため,第2のモダリティとして選択した。提案した戦略の評価のために,6つの一般的なオブジェクトと合計約360枚のRGB画像および熱画像からなる,新しい公開マルチビューデータセットThermalMixを収集した。データキャプチャに先立ってクロスモダリティ校正を行い、RGB画像と熱画像の高品質なアライメントを実現した。以上の結果から,NeRFに第2のブランチを追加する方法が,熱画像の新規ビュー合成において最も優れ,かつRGBでも説得力のある結果をもたらすことが判明した。最後に、我々の分析が近赤外画像や深度マップなど他のモダリティにも一般化することを示す。プロジェクトページ: https://mert-o.github.io/ThermalNeRF/ Neural Radiance Fields (NeRFs) quickly evolved as the new de-facto standard for the task of novel view synthesis when trained on a set of RGB images. In this paper, we conduct a comprehensive evaluation of neural scene representations, such as NeRFs, in the context of multi-modal learning. Specifically, we present four different strategies of how to incorporate a second modality, other than RGB, into NeRFs: (1) training from scratch independently on both modalities; (2) pre-training on RGB and fine-tuning on the second modality; (3) adding a second branch; and (4) adding a separate component to predict (color) values of the additional modality. We chose thermal imaging as second modality since it strongly differs from RGB in terms of radiosity, making it challenging to integrate into neural scene representations. For the evaluation of the proposed strategies, we captured a new publicly available multi-view dataset, ThermalMix, consisting of six common objects and about 360 RGB and thermal images in total. 
We employ cross-modality calibration prior to data capturing, leading to high-quality alignments between RGB and thermal images. Our findings reveal that adding a second branch to NeRF performs best for novel view synthesis on thermal images while also yielding compelling results on RGB. Finally, we also show that our analysis generalizes to other modalities, including near-infrared images and depth maps. Project page: https://mert-o.github.io/ThermalNeRF/.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# ガウススプラッティングによるビュー一貫性3次元編集 View-Consistent 3D Editing with Gaussian Splatting ( http://arxiv.org/abs/2403.11868v1 ) ライセンス: Link先を確認	Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang,	(参考訳) 3D Gaussian Splatting (3DGS)の出現は、3D編集に革命をもたらし、効率よく高忠実なレンダリングを提供し、正確な局所的な操作を可能にした。現在、拡散ベースの2D編集モデルを用いて、マルチビューレンダリング画像を修正し、3DGSモデルの編集をガイドしている。しかし、このアプローチは多視点不整合の重要な問題に直面しており、誘導画像はビュー間で大きな相違を示し、モード崩壊と3DGSの視覚的アーティファクトをもたらす。この目的のために、3DGSをシームレスに画像編集プロセスに組み込む新しいフレームワークであるView-Consistent Editing (VcEdit)を導入する。 VcEditには、Cross-attention Consistency ModuleとEditing Consistency Moduleという2つの革新的な一貫性モジュールがある。これらの一貫性モジュールを反復的なパターンに組み込むことで、VcEditは多視点不整合の問題を解決し、様々な場面で高品質な3DGS編集を容易にする。 The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes.	翻訳日:2024-03-20 20:00:12 公開日:2024-03-18
# IDF-CR:リモートセンシング画像における分割統治型雲除去のための反復拡散過程 IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images ( http://arxiv.org/abs/2403.11870v1 ) ライセンス: Link先を確認	Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin,	(参考訳) 深層学習技術は、光学リモートセンシング画像から雲を除去する効果を実証している。畳み込みニューラルネットワーク(CNN)は、雲除去タスクにおいて支配的な地位を占めている。しかし、畳み込み演算に固有の制限により、CNNが対処できる雲の遮蔽はごく一部にとどまる。近年、拡散モデルは、その強力な生成能力により、画像生成と再構成において最先端(SOTA)の性能を達成している。拡散モデルの急速な発展に触発されて、我々は雲除去のための反復拡散過程(IDF-CR)を初めて提示する。これは強力な生成能力を示し、成分ごとの分割統治型雲除去を実現する。 IDF-CRはピクセル空間雲除去モジュール(Pixel-CR)と潜在空間反復ノイズ拡散ネットワーク(IND)から構成される。具体的には、IDF-CRはピクセル空間と潜在空間に対処する2段階のモデルに分けられる。 2段階のモデルは、予備的な雲の低減から精緻な細部の改良への戦略的移行を促進する。ピクセル空間の段階では、Pixel-CRが雲画像の処理を開始し、準最適な雲除去結果を生成して、拡散モデルに雲除去の事前知識を与える。潜在空間の段階では、拡散モデルが低品質の雲除去結果を高品質のクリーンな出力に変換する。 ControlNetを実装することでStable Diffusionを改良する。さらに,拡散モデルに教師なしの反復ノイズ改良(INR)モジュールを導入し,予測されたノイズの分布を最適化することで,高度な詳細回復を向上する。我々のモデルは、光学リモートセンシングデータセット上で、画像再構成や光学リモートセンシング雲除去を含む他のSOTA手法を上回る性能を示す。 Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of-the-art (SOTA) proficiency in image generation and reconstruction due to their formidable generative capabilities. Inspired by the rapid development of diffusion models, we first present an iterative diffusion process for cloud removal (IDF-CR), which exhibits strong generative capabilities to achieve component divide-and-conquer cloud removal. IDF-CR consists of a pixel space cloud removal module (Pixel-CR) and a latent space iterative noise diffusion network (IND). Specifically, IDF-CR is divided into two-stage models that address pixel space and latent space. 
The two-stage model facilitates a strategic transition from preliminary cloud reduction to meticulous detail refinement. In the pixel space stage, Pixel-CR initiates the processing of cloudy images, yielding a suboptimal cloud removal result that provides the diffusion model with prior cloud removal knowledge. In the latent space stage, the diffusion model transforms low-quality cloud removal into high-quality clean output. We refine Stable Diffusion by implementing ControlNet. In addition, an unsupervised iterative noise refinement (INR) module is introduced for the diffusion model to optimize the distribution of the predicted noise, thereby enhancing advanced detail recovery. Our model outperforms other SOTA methods, including image reconstruction and optical remote-sensing cloud removal methods, on optical remote-sensing datasets.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# ニューラルネットワークの実トロピカル幾何学 The Real Tropical Geometry of Neural Networks ( http://arxiv.org/abs/2403.11871v1 ) ライセンス: Link先を確認	Marie-Charlotte Brandenburg, Georg Loho, Guido Montúfar,	(参考訳) トロピカル有理関数,すなわち2つの凸区分線形関数の差の符号として定義される二値分類器を考える。 ReLUニューラルネットワークのパラメータ空間は、トロピカル有理関数のパラメータ空間内の半代数的集合として含まれる。我々は、このパラメータ空間の2つの異なる細分の研究を開始する。1つは決定境界の組合せ型が固定される半代数的集合への細分であり、もう1つはデータセットの分割の組合せ構造を捉える多面体扇への細分である。 0/1損失関数の劣位集合はこの分類扇の部分扇として現れ、等位集合が必ずしも連結でないことを示す。分類扇を,(i) 幾何学的には活性化ポリトープの法線扇として,(ii) 組合せ論的には関連する二部グラフの性質のリストを通じて,有向マトロイドおよびトロピカル有向マトロイドのコベクトル公理との類比で記述する。本研究は,超曲面の正のトロピカル化やトロピカル半代数的集合など,実トロピカル幾何学で確立された構造を観察することにより,ニューラルネットワークとトロピカル幾何学の関係を拡張・精緻化するものである。 We consider a binary classifier defined as the sign of a tropical rational function, that is, as the difference of two convex piecewise linear functions. The parameter space of ReLU neural networks is contained as a semialgebraic set inside the parameter space of tropical rational functions. We initiate the study of two different subdivisions of this parameter space: a subdivision into semialgebraic sets, on which the combinatorial type of the decision boundary is fixed, and a subdivision into a polyhedral fan, capturing the combinatorics of the partitions of the dataset. The sublevel sets of the 0/1-loss function arise as subfans of this classification fan, and we show that the level-sets are not necessarily connected. We describe the classification fan i) geometrically, as normal fan of the activation polytope, and ii) combinatorially through a list of properties of associated bipartite graphs, in analogy to covector axioms of oriented matroids and tropical oriented matroids. Our findings extend and refine the connection between neural networks and tropical geometry by observing structures established in real tropical geometry, such as positive tropicalizations of hypersurfaces and tropical semialgebraic sets.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# NuGraph2:ニュートリノ物理イベント再構築のためのグラフニューラルネットワーク NuGraph2: A Graph Neural Network for Neutrino Physics Event Reconstruction ( http://arxiv.org/abs/2403.11872v1 ) ライセンス: Link先を確認	V Hewes, Adam Aurisano, Giuseppe Cerati, Jim Kowalkowski, Claire Lee, Wei-keng Liao, Daniel Grzenda, Kaushal Gumpula, Xiaohe Zhang,	(参考訳) 液体アルゴン時間射影チャンバー(LArTPC)検出器技術は、粒子相互作用に関する豊富な高解像度情報を提供し、その情報を最大限に活用するには高度な自動再構成技術が必要である。本稿では、LArTPC検出器におけるシミュレーションニュートリノ相互作用の低レベル再構成のためのグラフニューラルネットワーク(GNN)であるNuGraph2について述べる。 MicroBooNE検出器幾何学におけるシミュレートされたニュートリノ相互作用は、平面部分グラフ上にノードを形成する各検出器面にエネルギー沈着を持つ不均一グラフとして記述される。このネットワークは、バックグラウンドフィルタリングとセマンティックラベリングをこれらのグラフノード上で実行し、98.0\%の効率で一次物理相互作用に関連するものを識別し、94.9\%の効率で粒子タイプに従ってラベル付けする。このネットワークは、複数の2次元表現にまたがる検出器オブザーバブルを直接運用するが、これらの表現間の一貫性を促進するために3Dコンテキスト認識機構を利用する。モデル推論はCPUでは0.12 s/event、GPUでは0.005 s/eventである。このアーキテクチャはニュートリノ物理学における粒子再構成のための汎用的なソリューションとして設計されており、幅広い検出器技術に展開する可能性がある。 Liquid Argon Time Projection Chamber (LArTPC) detector technology offers a wealth of high-resolution information on particle interactions, and leveraging that information to its full potential requires sophisticated automated reconstruction techniques. This article describes NuGraph2, a Graph Neural Network (GNN) for low-level reconstruction of simulated neutrino interactions in a LArTPC detector. Simulated neutrino interactions in the MicroBooNE detector geometry are described as heterogeneous graphs, with energy depositions on each detector plane forming nodes on planar subgraphs. The network utilizes a multi-head attention message-passing mechanism to perform background filtering and semantic labelling on these graph nodes, identifying those associated with the primary physics interaction with 98.0\% efficiency and labelling them according to particle type with 94.9\% efficiency. The network operates directly on detector observables across multiple 2D representations, but utilizes a 3D-context-aware mechanism to encourage consistency between these representations. 
Model inference takes 0.12 s/event on a CPU, and 0.005 s/event batched on a GPU. This architecture is designed to be a general-purpose solution for particle reconstruction in neutrino physics, with the potential for deployment across a broad range of detector technologies, and offers a core convolution engine that can be leveraged for a variety of tasks beyond the two described in this article.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# CO3: 生成的対話型クエリ書き換えのための低リソースコントラスト協調トレーニング CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite ( http://arxiv.org/abs/2403.11873v1 ) ライセンス: Link先を確認	Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Renjun Hu, Zengming Zhang, Feijun Jiang, Wai Lam,	(参考訳) 生成的クエリ書き換えは、会話履歴を用いて再構成されたクエリを生成するが、取得コストの高いゴールド書き換えペアに大きく依存する。近年,この課題に対して少数ショット学習の人気が高まっているが,これらの手法はデータサイズが限られているため固有のノイズに敏感である。さらに、どちらの試みも、トレーニングとテストケースの間に言語スタイルのシフトがある場合、パフォーマンスの低下に直面する。そこで本研究では,ノイズと言語スタイルのシフトの両方に対して頑健な,低リソースな生成的対話型クエリ書き換えについて検討する。中心となる考え方は、大量のラベルなしデータを使用して、対照的協調学習(contrastive co-training)のパラダイムを通じてさらなる改善を行うことである。具体的には、2つの双対モデル(RewriterとSimplifier)を協調的にトレーニングし、それぞれが擬似ラベルによる追加ガイダンスを提供して、互いを反復的に強化する。また、データ拡張を伴う対照学習を活用することで、モデルがノイズよりも真に価値のある情報に注意を払えるようにする。大規模な実験は、少数ショットとゼロショットの両方のシナリオで、我々のモデルの優位性を実証する。また、言語スタイルのシフトに遭遇した際のモデルのより優れた一般化能力も検証する。 Generative query rewrite generates reconstructed query rewrites using the conversation history but relies heavily on gold rewrite pairs that are expensive to obtain. Recently, few-shot learning is gaining increasing popularity for this task, whereas these methods are sensitive to the inherent noise due to limited data size. Besides, both attempts face performance degradation when there is a language style shift between training and testing cases. To this end, we study low-resource generative conversational query rewrite that is robust to both noise and language style shift. The core idea is to utilize massive unlabeled data to make further improvements via a contrastive co-training paradigm. Specifically, we co-train two dual models (namely Rewriter and Simplifier) such that each of them provides extra guidance through pseudo-labeling for enhancing the other in an iterative manner. We also leverage contrastive learning with data augmentation, which enables our model to pay more attention to the truly valuable information than the noise. Extensive experiments demonstrate the superiority of our model under both few-shot and zero-shot scenarios. 
We also verify the better generalization ability of our model when encountering language style shift.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# ダイナミックビジョンセンサを用いた高速無人航空機の実時間検出に向けて Towards Real-Time Fast Unmanned Aerial Vehicle Detection Using Dynamic Vision Sensors ( http://arxiv.org/abs/2403.11875v1 ) ライセンス: Link先を確認	Jakub Mandula, Jonas Kühne, Luca Pascarella, Michele Magno,	(参考訳) 無人航空機(UAV)は民間や軍事用途で人気を集めている。しかし、制限区域への制御されていないアクセスは、プライバシーとセキュリティを脅かす。したがって、UAVの防止と検出は、機密性と安全性を保証するために重要である。主にレーダーに基づく能動走査は最も精度の高い技術の1つであるが、受動的検査(例えば物体認識)よりも高価で汎用性が低い場合がある。ダイナミックビジョンセンサー(Dynamic Vision Sensor, DVS)は、高速移動シーンにおけるタイムスタンプ付きの画素レベルの明るさ変化を利用する、バイオインスパイアされたイベントベースの視覚モデルであり、低遅延物体検出によく適応する。本稿では,高速移動するドローンの検出を可能にする組み込みシステムであるF-UAV-D(Fast Unmanned Aerial Vehicle Detector)を提案する。特に、リアルタイム・低消費電力構成におけるRGBカメラの代替としてDVSを利用するためのセットアップを提案する。提案手法は,DVSの高ダイナミックレンジ(HDR)と背景抑制を活用し,様々な高速移動ドローンを用いて訓練すると,低照度や高速移動シーンなどの最適でない環境条件下でRGB入力より優れる。結果から,F-UAV-Dは (i) 平均15W未満の消費電力でドローンを検出し,(ii) エッジコンピュータのCPUとGPUノードを活用してリアルタイム推論(50ms未満)を実行できることを示す。 Unmanned Aerial Vehicles (UAVs) are gaining popularity in civil and military applications. However, uncontrolled access to restricted areas threatens privacy and security. Thus, prevention and detection of UAVs are pivotal to guarantee confidentiality and safety. Although active scanning, mainly based on radars, is one of the most accurate technologies, it can be expensive and less versatile than passive inspections, e.g., object recognition. Dynamic vision sensors (DVS) are bio-inspired event-based vision models that leverage timestamped pixel-level brightness changes in fast-moving scenes that adapt well to low-latency object detection. This paper presents F-UAV-D (Fast Unmanned Aerial Vehicle Detector), an embedded system that enables fast-moving drone detection. In particular, we propose a setup to exploit DVS as an alternative to RGB cameras in a real-time and low-power configuration. Our approach leverages the high-dynamic range (HDR) and background suppression of DVS and, when trained with various fast-moving drones, outperforms RGB input in suboptimal ambient conditions such as low illumination and fast-moving scenes. 
Our results show that F-UAV-D can (i) detect drones using less than 15 W on average and (ii) perform real-time inference (i.e., <50 ms) by leveraging the CPU and GPU nodes of our edge computer.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# 自己監督型高分解能オフロードマッピングのためのディープベイズフュージョン Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping ( http://arxiv.org/abs/2403.11876v1 ) ライセンス: Link先を確認	Shubhra Aich, Wenshan Wang, Parv Maheshwari, Matthew Sivaprakasam, Samuel Triest, Cherie Ho, Jason M. Gregory, John G. Rogers III, Sebastian Scherer,	(参考訳) 資源が制限されたオフロード車両の感度の制限は、信頼性の高いオフロード自律性に重大な課題をもたらす。この制限を克服するため、我々は将来的な情報(すなわち、将来の融合)を自己監督のために融合する一般的な枠組みを提案する。近年のアプローチでは、この未来の情報を手作りのヒューリスティックスと共に活用して、ターゲットとする下流タスクを直接監督している(例えば、トラバーサビリティ推定)。しかし,本稿では,高分解能(画素あたり2cm)のBEVマップを将来の融合を通じて自己監督的に作成し,より長い範囲の予測のために下流のタスクに使用できる,より一般的な開発ラインを選択する。この目的のために、まず、RGB/高さの生のスパースとノイズの多い入力とマップベースの高密度ラベルのペアを含む高解像度のフューチャーフュージョンデータセットを作成する。次に,特に遠位領域における知覚情報のノイズや空間性に対応するため,バニラ畳み込みネットワークへのベイズフィルタの効率よく実現する機構を設計する。我々のベイズ構造は、SOTA生成モデルからアイデアを取り入れ、遠位領域における高品質なBEVマップを効果的に予測する。将来の融合データセットにおける完了の質と下流タスクの双方に対する広範囲な評価は、我々のアプローチの可能性を示している。 The limited sensing resolution of resource-constrained off-road vehicles poses significant challenges towards reliable off-road autonomy. To overcome this limitation, we propose a general framework based on fusing the future information (i.e. future fusion) for self-supervision. Recent approaches exploit this future information alongside the hand-crafted heuristics to directly supervise the targeted downstream tasks (e.g. traversability estimation). However, in this paper, we opt for a more general line of development - time-efficient completion of the highest resolution (i.e. 2cm per pixel) BEV map in a self-supervised manner via future fusion, which can be used for any downstream tasks for better longer range prediction. To this end, first, we create a high-resolution future-fusion dataset containing pairs of (RGB / height) raw sparse and noisy inputs and map-based dense labels. 
Next, to accommodate the noise and sparsity of the sensory information, especially in the distal regions, we design an efficient realization of the Bayes filter onto the vanilla convolutional network via the recurrent mechanism. Equipped with the ideas from SOTA generative models, our Bayesian structure effectively predicts high-quality BEV maps in the distal regions. Extensive evaluation on both the quality of completion and downstream task on our future-fusion dataset demonstrates the potential of our approach.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# 第4世代暖房グリッドにおける学習型熱流の効率的な訓練 Efficient Training of Learning-Based Thermal Power Flow for 4th Generation District Heating Grids ( http://arxiv.org/abs/2403.11877v1 ) ライセンス: Link先を確認	Andreas Bott, Mario Beykirch, Florian Steinke,	(参考訳) ヒートパワーフロー(TPF)は、複数の分散型熱源とメッシュグリッド構造を有する第4世代地域熱グリッドにおいて、様々な制御目的のために重要なタスクである。 TPFの計算、すなわち、所定の供給および需要値に対する温度、圧力、質量フローからなるグリッド状態を決定することは、非線形熱グリッド方程式を解くことによって古典的に行われるが、ニューラルネットワークのような学習モデルを用いて、桁違いに高速化することができる。本稿では,必要な供給と需要を網羅した,十分に大規模なトレーニングデータセットを生成するための,新しい,効率的な手法を提案する。提案手法は,供給と需要の値をサンプリングする代わりに,発電機および消費者のマスフロー上のプロキシ分布からトレーニング例を生成し,ヒートグリッド方程式の解法に必要なイテレーションを省略する。正確には、わずかに異なるトレーニング例は、元のトレーニング分布を表すために重み付けすることができる。提案手法は, トレーニングサンプルの信頼性を損なうことなく, サンプリングと需要値を直接比較して, トレーニングセット生成時間を2桁の規模で削減できることを示す。さらに, トレーニングデータセットを用いたTPFの学習は, サンプルレス, 物理対応のトレーニングアプローチを著しく上回ることを示した。 Thermal power flow (TPF) is an important task for various control purposes in 4th generation district heating grids with multiple decentral heat sources and meshed grid structures. Computing the TPF, i.e., determining the grid state consisting of temperatures, pressures, and mass flows for given supply and demand values, is classically done by solving the nonlinear heat grid equations, but can be sped up by orders of magnitude using learned models such as neural networks. We propose a novel, efficient scheme to generate a sufficiently large training data set covering relevant supply and demand values. Instead of sampling supply and demand values, our approach generates training examples from a proxy distribution over generator and consumer mass flows, omitting the iterations needed for solving the heat grid equations. The exact, but slightly different, training examples can be weighted to represent the original training distribution. We show with simulations for typical grid structures that the new approach can reduce training set generation times by two orders of magnitude compared to sampling supply and demand values directly, without loss of relevance for the training samples. 
Moreover, learning TPF with a training data set is shown to outperform sample-free, physics-aware training approaches significantly.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# InTeX:Unified Depth-Aware Inpaintingによるインタラクティブテキスト・テクスチャ合成 InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting ( http://arxiv.org/abs/2403.11878v1 ) ライセンス: Link先を確認	Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu,	(参考訳) テキスト・ツー・テクスチャ合成は、最近のテキスト・ツー・イメージ・モデルの発展により、3Dコンテンツ作成の新たなフロンティアとなった。既存の手法は主に、事前訓練された深度認識拡散と塗装モデルの組み合わせを採用するが、それらは3Dの不整合や制限された制御可能性などの欠点を示す。これらの課題に対処するために,インタラクティブなテキスト・テクスチャ合成のための新しいフレームワークであるInteXを紹介する。 1) InteXはユーザフレンドリーなインタフェースを備えており、合成プロセス全体を通して対話や制御が容易であり、地域固有の塗り替えや正確なテクスチャ編集を可能にしている。 2) 深度情報と塗布手がかりを統合し, 3D の不整合を効果的に軽減し, 生成速度を向上する統合深度認識塗装モデルを構築した。大規模な実験を通じて,本フレームワークはテキストからテクスチャへの合成に実用的かつ効果的であることが証明され,高品質な3Dコンテンツ作成の道を開いた。 Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InteX, a novel framework for interactive text-to-texture synthesis. 1) InteX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# 単相多面体融合による情緒的偏見の予測 Unimodal Multi-Task Fusion for Emotional Mimicry Prediciton ( http://arxiv.org/abs/2403.11879v1 ) ライセンス: Link先を確認	Tobias Hallmen, Fabian Deuser, Norbert Oswald, Elisabeth André,	(参考訳) 本研究では,第6回ワークショップおよび感情行動分析コンペティションにおける情緒的不安度(EMI)推定の方法論を提案する。提案手法では,包括的ポッドキャストデータセットで事前学習したWav2Vec 2.0フレームワークを利用して,言語的およびパラ言語的要素を含む幅広い音声特徴を抽出する。我々は,グローバルな平均ベクトルと個々の特徴を統合する融合手法により特徴表現を強化し,分析にグローバルな文脈的洞察を導入する。さらに,Wav2Vec 2.0モデルから,事前学習したValence-arousal-dominance (VAD)モジュールを組み込んだ。我々の融合では、音声データの時間的効率的な分析にLong Short-Term Memory (LSTM) アーキテクチャを採用している。提案手法は,提供された音声データのみを利用することで,確立されたベースラインよりも大幅に改善されたことを示す。 In this study, we propose a methodology for the Emotional Mimicry Intensity (EMI) Estimation task within the context of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our approach leverages the Wav2Vec 2.0 framework, pre-trained on a comprehensive podcast dataset, to extract a broad range of audio features encompassing both linguistic and paralinguistic elements. We enhance feature representation through a fusion technique that integrates individual features with a global mean vector, introducing global contextual insights into our analysis. Additionally, we incorporate a pre-trained valence-arousal-dominance (VAD) module from the Wav2Vec 2.0 model. Our fusion employs a Long Short-Term Memory (LSTM) architecture for efficient temporal analysis of audio data. Utilizing only the provided audio data, our approach demonstrates significant improvements over the established baseline.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# ReGenNet:人間の行動反応合成を目指して ReGenNet: Towards Human Action-Reaction Synthesis ( http://arxiv.org/abs/2403.11882v1 ) ライセンス: Link先を確認	Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng,	(参考訳) 人間は周囲の環境と常に対話する。現在のヒト中心生成モデルは、静的なシーンやオブジェクトとプラスティックに相互作用する人間の合成に重点を置いている一方、ユビキタスな因果的人間と人間の相互作用のための動的ヒトの行動-反応合成は、あまり研究されていない。人間と人間の相互作用は、原子間相互作用期のアクターや反応器と非対称であると見なすことができる。本稿では,人間と人間の相互作用の非対称性,動的性,同期性,詳細性を包括的に分析し,人間の行動に条件付けされた人間の反応を生成するための,最初のマルチセットヒト行動-反応合成ベンチマークを提案する。まず,NTU120,InterHuman,Chi3Dデータセットに対して,対話シーケンスのアクター・リアクター順序をアノテートすることを提案する。それらに基づいて,ReGenNetと呼ばれるトランスフォーマーデコーダアーキテクチャを用いた拡散型生成モデルを提案する。定量的および定性的な結果から,本手法はベースラインと比較して即時かつ妥当な人間の反応を生成でき,アクターの動きや視点の変化を一般化できることが示された。 Humans constantly interact with their surrounding environments. Current human-centric generative models mainly focus on synthesizing humans plausibly interacting with static scenes and objects, while the dynamic human action-reaction synthesis for ubiquitous causal human-human interactions is less explored. Human-human interactions can be regarded as asymmetric with actors and reactors in atomic interaction periods. In this paper, we comprehensively analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions and propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions. To begin with, we propose to annotate the actor-reactor order of the interaction sequences for the NTU120, InterHuman, and Chi3D datasets. Based on them, a diffusion-based generative model with a Transformer decoder architecture called ReGenNet together with an explicit distance-based interaction loss is proposed to predict human reactions in an online manner, where the future states of actors are unavailable to reactors. 
Quantitative and qualitative results show that our method can generate instant and plausible human reactions compared to the baselines, and can generalize to unseen actor motions and viewpoint changes.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# QueryAgent: 環境フィードバックに基づく自己補正による信頼性と効率的な推論フレームワーク QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction ( http://arxiv.org/abs/2403.11886v1 ) ライセンス: Link先を確認	Xiang Huang, Sitao Cheng, Shanshan Huang, Jiayu Shen, Yong Xu, Chaoyun Zhang, Yuzhong Qu,	(参考訳) 意味解析にLarge Language Models(LLM)を使うことは、大きな成功を収めた。しかし,幻覚に遭遇した場合,既存の手法は信頼性や効率性に乏しいことが判明した。本稿では,質問を段階的に解決し,段階的に自己補正を行うQueryAgentというフレームワークを用いて,これらの課題に対処する。環境フィードバックに基づく自己補正手法ERASERを提案する。従来のアプローチとは異なり、ERASERは中間段階の豊かな環境フィードバックを活用して、必要に応じて選択的で差別化された自己補正を行う。実験の結果、QueryAgentはGrailQAとGraphQのサンプルを7.0と15.0のF1で1つだけ使って、以前のいくつかのショットメソッドを特に上回っている。さらに,ランタイムやクエリオーバヘッド,API呼び出しコストなど,効率性の面で優れています。 ERASERを活用することで、AgentBenchという別のベースラインを約10ポイント改善し、我々のアプローチの強い転送可能性を明らかにする。 Employing Large Language Models (LLMs) for semantic parsing has achieved remarkable success. However, we find existing methods fall short in terms of reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step-by-step and performs step-wise self-correction. We introduce an environmental feedback-based self-correction method called ERASER. Unlike traditional approaches, ERASER leverages rich environmental feedback in the intermediate steps to perform selective and differentiated self-correction only when necessary. Experimental results demonstrate that QueryAgent notably outperforms all previous few-shot methods using only one example on GrailQA and GraphQ by 7.0 and 15.0 F1. Moreover, our approach exhibits superiority in terms of efficiency, including runtime, query overhead, and API invocation costs. By leveraging ERASER, we further improve another baseline (i.e., AgentBench) by approximately 10 points, revealing the strong transferability of our approach.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# SuperLoRA:多層アテンションモジュールのパラメータ効率の良い統一適応 SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules ( http://arxiv.org/abs/2403.11887v1 ) ライセンス: Link先を確認	Xiangyu Chen, Jing Liu, Ye Wang, Pu Perry Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino,	(参考訳) 低ランク適応(LoRA)とその変種は、自然言語処理のための大規模言語モデルやコンピュータビジョンのための拡散モデルなど、微調整された大型モデルに広く用いられている。本稿では、異なるパラメータ設定で実現可能な、異なるLoRA変異を統一および拡張する、SuperLoRAと呼ばれる一般化されたフレームワークを提案する。 SuperLoRAは、グループ化、折り畳み、シャッフル、プロジェクション、テンソルファクタリングを導入し、他のLoRAの亜種と比較して高い柔軟性を提供し、特に極小パラメータの状況において、トランスファーラーニングタスクの優れたパフォーマンスを示す。 Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# KnFu: 効果的な知識融合 KnFu: Effective Knowledge Fusion ( http://arxiv.org/abs/2403.11892v1 ) ライセンス: Link先を確認	S. Jamal Seyedmohammadi, S. Kawa Atapour, Jamshid Abouei, Arash Mohammadi,	(参考訳) フェデレートラーニング(FL)は、従来の集中型ラーニングのアプローチに代わる顕著な代替手段として登場した。一般的にFLは、複数のローカルノードにわたる機械学習(ML)モデルの協調トレーニングを可能にし、さまざまなデータセットを活用しながら、データのプライバシとセキュリティを確保する、分散化されたアプローチである。しかし、従来のFLは勾配反転攻撃の影響を受けやすく、局所モデルに一様アーキテクチャを限定的に適用しており、非IID局所データセットによるモデルの不均一性(モデルドリフト)に悩まされている。これらの課題を緩和するために、新しいFKD(Federated Knowledge Distillation)パラダイムが登場した。 FKDはKD(Knowledge Distillation)の概念に基づいて開発され、大きく訓練された教師モデルの知識を軽量の学生モデルに抽出し、伝達する。しかし、FKDはモデルドリフトの問題に直面している。直感的には、すべての知識が局所ノード間のデータ固有の多様性のために普遍的に有用であるとは限らない。これにより、各クライアントの知識の他人に対する妥当性と有効性を評価し、有害な知識の伝播を防止する革新的なメカニズムが求められます。そこで,本研究では,各クライアントに対するセマンティックな隣人の効果的な知識のみを融合させるため,局所モデルの知識を評価するための実効的知識融合(KnFu)アルゴリズムを提案する。 KnFuは、各クライアントに対してパーソナライズされた効果的な知識融合スキームであり、集約フェーズの前に異なるローカルモデルの知識の有効性を分析する。提案したKnFuの有効性を示すMNISTとCIFAR10データセットの総合的な実験を行った。この研究の重要な結論は、大規模でヘテロジニアスなローカルデータセットを持つシナリオでは、知識融合ベースのソリューションよりも局所的なトレーニングが望ましい、ということである。 Federated Learning (FL) has emerged as a prominent alternative to the traditional centralized learning approach. Generally speaking, FL is a decentralized approach that allows for collaborative training of Machine Learning (ML) models across multiple local nodes, ensuring data privacy and security while leveraging diverse datasets. Conventional FL, however, is susceptible to gradient inversion attacks, restrictively enforces a uniform architecture on local models, and suffers from model heterogeneity (model drift) due to non-IID local datasets. To mitigate some of these challenges, the new paradigm of Federated Knowledge Distillation (FKD) has emerged. FKD is developed based on the concept of Knowledge Distillation (KD), which involves extraction and transfer of a large and well-trained teacher model's knowledge to lightweight student models. FKD, however, still faces the model drift issue. 
Intuitively speaking, not all knowledge is universally beneficial due to the inherent diversity of data among local nodes. This calls for innovative mechanisms to evaluate the relevance and effectiveness of each client's knowledge for others, to prevent propagation of adverse knowledge. In this context, the paper proposes Effective Knowledge Fusion (KnFu) algorithm that evaluates knowledge of local models to only fuse semantic neighbors' effective knowledge for each client. The KnFu is a personalized effective knowledge fusion scheme for each client, that analyzes effectiveness of different local models' knowledge prior to the aggregation phase. Comprehensive experiments were performed on MNIST and CIFAR10 datasets illustrating effectiveness of the proposed KnFu in comparison to its state-of-the-art counterparts. A key conclusion of the work is that in scenarios with large and highly heterogeneous local datasets, local training could be preferable to knowledge fusion-based solutions.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# マルチパーティネットワークにおける量子コーディネート率 Quantum Coordination Rates in Multi-Partite Networks ( http://arxiv.org/abs/2403.11893v1 ) ライセンス: Link先を確認	Hosen Nator, Uzi Pereg,	(参考訳) 最適調整速度は、マルチパーティ量子ネットワークの3つの一次設定で決定され、複数のパーティ間の共同量子状態をシミュレートするために必要となる最小限のリソースを特徴付ける。本研究では,(1)狭い絡み合いを持つカスケードネットワーク,(2)1つの送信機と2つの受信機からなる放送ネットワーク,(3)2つの送信機と1つの受信機を備えた多重アクセスネットワークについて検討する。我々は,各設定において,漸近的に達成可能なコミュニケーションと絡み合い率について,必要かつ十分な条件を確立する。最後に、量子戦略を持つ非局所ゲームにおいて、結果が意味することを示す。 The optimal coordination rates are determined in three primary settings of multi-partite quantum networks, thus characterizing the minimal resources required in order to simulate a joint quantum state among multiple parties. We study the following models: (1) a cascade network with limited entanglement, (2) a broadcast network, which consists of a single sender and two receivers, (3) a multiple-access network with two senders and a single receiver. We establish the necessary and sufficient conditions on the asymptotically-achievable communication and entanglement rates in each setting. At last, we show the implications of our results on nonlocal games with quantum strategies.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# 医療における自然言語処理のための説明可能なディープラーニングから解釈可能なディープラーニングへ:現実からどのくらい遠いのか? From explainable to interpretable deep learning for natural language processing in healthcare: how far from reality? ( http://arxiv.org/abs/2403.11894v1 ) ライセンス: Link先を確認	Guangming Huang, Yunfei Long, Yingya Li, Giorgos Papanastasiou,	(参考訳) ディープラーニング(DL)は、さまざまな自然言語処理(NLP)タスクに対処することで、医療研究を大幅に強化している。しかし、DLベースのNLP手法の複雑さの増大は、信頼性の高い意思決定のために、透明性のあるモデル解釈可能性(少なくとも説明可能性)を必要とする。本研究は, 医療用NLPにおける説明可能な, 解釈可能なDLについて, 徹底的なスコーピングレビューを行った。 XAI(eXplainable and Interpretable Artificial Intelligence)という用語は、XAIとIAIを区別するために導入された。メソッドはさらに、その機能(モデル、インプット、アウトプットベース)とスコープ(ローカル、グローバル)に基づいて分類された。分析の結果,注意機構がIAIの主流であったことが判明した。また、IAI は XAI に対してますます利用されている。主要な課題は、ほとんどのXIAIが"グローバル"なモデリングプロセス、ベストプラクティスの欠如、体系的な評価とベンチマークの必要性を探求していないことである。パーソナライズ医療におけるマルチモーダルXIAIの強化や、DLと因果推論の併用など、重要な機会が得られた。我々の議論は、LLMとドメイン固有の小さなモデルへのXIAIの統合を奨励する。我々のレビューは、医療における本質的なIAIの改善と複雑なNLPの関与に向けて、さらなる研究とベンチマークを刺激することができる。 Deep learning (DL) has substantially enhanced healthcare research by addressing various natural language processing (NLP) tasks. Yet, the increasing complexity of DL-based NLP methods necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review on explainable and interpretable DL in healthcare NLP. The term "XIAI" (eXplainable and Interpretable Artificial Intelligence) was introduced to distinguish XAI from IAI. Methods were further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms were the most dominant emerging IAI. Moreover, IAI is increasingly used against XAI. The major challenges identified are that most XIAI do not explore "global" modeling processes, the lack of best practices, and the unmet need for systematic evaluation and benchmarks. 
Important opportunities were raised such as using "attention" to enhance multi-modal XIAI for personalized medicine and combine DL with causal reasoning. Our discussion encourages the integration of XIAI in LLMs and domain-specific smaller models. Our review can stimulate further research and benchmarks toward improving inherent IAI and engaging complex NLP in healthcare.	翻訳日:2024-03-20 19:50:22 公開日:2024-03-18
# 機械翻訳におけるジェンダーバイアスのマーカーとドライバの検討 Investigating Markers and Drivers of Gender Bias in Machine Translations ( http://arxiv.org/abs/2403.11896v1 ) ライセンス: Link先を確認	Peter J Barclay, Ashkan Sami,	(参考訳) 大規模言語モデル(LLM)におけるインプシット・ジェンダーバイアスは、十分に文書化された問題であり、自動翻訳に導入されたジェンダーの影響は、現実世界のバイアスを持続させることができる。しかし、一部のLLMはヒューリスティックスやポストプロセッシングを使ってそのようなバイアスを隠蔽し、調査を困難にしている。本稿では,従来の56のソフトウェアエンジニアリングタスクを繰り返し翻訳する際に発生するバイアスをDeepL翻訳APIを用いて,逆翻訳によるLLMのバイアスについて検討する。それぞれの文は「she」から始まり、最初は「genderless」中間言語に翻訳され、次に英語に戻す。先行研究は,(1)フィンランド語,インドネシア語,エストニア語,トルコ語,ハンガリー語という5つの中間言語を対象とした結果の比較,(2)先行研究に見られた個々の代名詞の過度な解釈を避けつつ,反復翻訳で示唆される性別の変動を評価するための新しい指標の提案,(3)バイアスを駆動する文の特徴を調査すること,(4)3つのタイムラプスデータセットの結果を比較してアプローチの再現性を確立すること,の4つの方法によって拡張される。いくつかの言語は3つのゆるいグループに分類されるが、そのパターンはグループによって異なる。また,文中に出現する主動詞は,翻訳における意味のあるジェンダーの要因である可能性が示唆された。さらに,本研究では,DeepL翻訳APIの動作に明らかな変化があるにも関わらず,結果の再現性が良好であることが確認された。これらの結果から,バックトランスレーション法は,言語モデルにおけるバイアスに関するさらなる洞察を与えることができることがわかった。 Implicit gender bias in Large Language Models (LLMs) is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with 'she', and is translated first into a 'genderless' intermediate language then back into English; we then examine pronoun-choice in the back-translated texts. 
We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns, apparent in earlier work; (3) by investigating sentence features that drive bias; (4) and by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in language models.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# ケーブルプラグ用ビゾタクティルプレトレーニング Visuo-Tactile Pretraining for Cable Plugging ( http://arxiv.org/abs/2403.11898v1 ) ライセンス: Link先を確認	Abraham George, Selam Gano, Pranav Katragadda, Amir Barati Farimani,	(参考訳) 触覚情報は微粒な操作にとって重要なツールである。人間として、私たちは私たちの環境の物体を理解するために触覚情報に大きく依存しています。操作タスクの実行だけでなく、これらのタスクの実行方法の学習にもタッチを使用します。したがって、人間や超人的なパフォーマンスで操作作業の完了を学習できるロボットエージェントを作成するためには、触覚情報をスキル実行とスキル学習の両方に適切に組み込む必要がある。本稿では,複雑なタスクの性能向上のために,触覚情報を模倣学習プラットフォームに組み込む方法について検討する。そのために、細粒度の視触覚サーボに依存する巧妙な操作タスクであるUSBケーブルを差し込むという課題に取り組む。触覚情報を模倣学習フレームワークに組み込むことで、ロボットエージェントにUSBケーブルを接続するように訓練することが可能になります。さらに, 触覚情報を用いて非触覚エージェントの訓練を行う方法についても検討した。その結果, 触覚情報による事前学習により, 非触覚エージェントの性能が著しく向上し, ビジュオ触覚エージェントと同等のレベルに達することが示唆された。デモビデオとコードベースへのアクセスについては、プロジェクトのWebサイトを参照してください。 Tactile information is a critical tool for fine-grain manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on complex tasks. To do this, we tackle the challenge of plugging in a USB cable, a dexterous manipulation task that relies on fine-grain visuo-tactile servoing. By incorporating tactile information into imitation learning frameworks, we are able to train a robotic agent to plug in a USB cable - a first for imitation learning. Additionally, we explore how tactile information can be used to train non-tactile agents through a contrastive-loss pretraining process. 
Our results show that by pretraining with tactile information, the performance of a non-tactile agent can be significantly improved, reaching a level on par with visuo-tactile agents. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-cable-plugging/home	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# GNeRP:雑音偏光先行した反射物体のガウス誘導ニューラル再構成 GNeRP: Gaussian-guided Neural Reconstruction of Reflective Objects with Noisy Polarization Priors ( http://arxiv.org/abs/2403.11899v1 ) ライセンス: Link先を確認	LI Yang, WU Ruizheng, LI Jiyong, CHEN Ying-cong,	(参考訳) ニューラルレイディアンス場(NeRF)からの学習面は、MVS(Multi-View Stereo)の話題となった。近年のサインド・ディスタンス・ファンクション (SDF) を用いた手法は, ランバートのシーンの正確な3次元形状を復元する能力を示した。しかし、反射シーンにおけるそれらの結果は、特異な放射率と複雑な幾何学の絡み合いにより満足できない。そこで本研究では,SDF分野における正規表現のガウス的表現を提案する。偏光の先行によって監督されるこの表現は、鏡面反射の背後にある幾何学の学習をガイドし、既存の方法よりも多くの詳細を捉えている。さらに,偏光前処理のノイズ問題を緩和する最適化プロセスにおける重み付け戦略を提案する。設計の有効性を検証するため,様々な形状の反射シーンで偏光情報と地中真理メッシュをキャプチャする。また,PANDORAデータセット上でのフレームワークの評価を行った。提案手法は,反射シーンにおける既存の3次元再構成法よりも大きなマージンで優れていることを示す。 Learning surfaces from neural radiance field (NeRF) became a rising topic in Multi-View Stereo (MVS). Recent Signed Distance Function (SDF)-based methods demonstrated their ability to reconstruct accurate 3D shapes of Lambertian scenes. However, their results on reflective scenes are unsatisfactory due to the entanglement of specular radiance and complicated geometry. To address the challenges, we propose a Gaussian-based representation of normals in SDF fields. Supervised by polarization priors, this representation guides the learning of geometry behind the specular reflection and captures more details than existing methods. Moreover, we propose a reweighting strategy in the optimization process to alleviate the noise issue of polarization priors. To validate the effectiveness of our design, we capture polarimetric information, and ground truth meshes in additional reflective scenes with various geometry. We also evaluated our framework on the PANDORA dataset. Comparisons prove our method outperforms existing neural 3D reconstruction methods in reflective scenes by a large margin.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# Larimar: エピソードメモリ制御を備えた大規模言語モデル Larimar: Large Language Models with Episodic Memory Control ( http://arxiv.org/abs/2403.11901v1 ) ライセンス: Link先を確認	Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen,	(参考訳) LLM(Large Language Models)に格納された知識の効率的かつ正確な更新は、今日の最も急進的な研究課題の1つである。本稿では,Larimarについて述べる。Larimarは,分散エピソードメモリを用いてLLMを拡張するための,脳にインスパイアされた新しいアーキテクチャである。 Larimarのメモリは、計算コストのかかるリトレーニングや微調整を必要とせずに、動的でワンショットの知識更新を可能にする。複数のファクト編集ベンチマークの実験結果から、Larimarは、挑戦的なシーケンシャルな編集セットアップであっても、最も競争力のあるベースラインに匹敵する精度を達成できたが、ベースLLMに依存して4～10倍のスピードアップを実現している。さらに、Larimarを用いた選択的な事実認識と入力コンテキスト長の一般化のメカニズムを提案し、その効果を示す。 Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 4-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting and input context length generalization with Larimar and show their effectiveness.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# クレーム分解の概観 A Closer Look at Claim Decomposition ( http://arxiv.org/abs/2403.11903v1 ) ライセンス: Link先を確認	Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, Benjamin Van Durme,	(参考訳) 生成したテキストがより一般的になるにつれて、このようなテキストが外部の知識ソースによってどれだけ支持されているかを評価することがますます重要である。テキストサポートを評価するための多くのアプローチは、信頼された参照に対して得られる個々のサブ文にテキストを分解する方法に依存している。本稿では,最近提案されたFActScore などの評価手法が,各種のクレーム分解方法,特に LLM に基づく手法にどのような影響を及ぼすかを検討した。この感度は、エラーが計量の分解ステップからもたらされるとしても、テキストを生成するモデルに対して、そのようなメトリクスが全体的なテキストサポートであるから生じます。分解品質を測定するために,DecompScore と呼ぶ FActScore の適応を導入する。そこで我々は,Bertrand Russell の論理原子論とネオダビッドソン意味論に触発された分解を生成するための LLM ベースの手法を提案し,その分解品質を従来の方法よりも向上したことを示した。 As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based methods -- affect the result of an evaluation approach such as the recently proposed FActScore, finding that it is sensitive to the decomposition method used. This sensitivity arises because such metrics attribute overall textual support to the model that generated the text even though error can also come from the metric's decomposition step. To measure decomposition quality, we introduce an adaptation of FActScore, which we call DecompScore. We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics and demonstrate its improved decomposition quality over previous methods.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# CICLe: 大規模多型食品リスク分類のためのコンフォーマル・インコンテクスト学習 CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification ( http://arxiv.org/abs/2403.11904v1 ) ライセンス: Link先を確認	Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren,	(参考訳) 汚染された食品や成体の食品は、人間の健康に重大なリスクをもたらす。トレーニング用のラベル付きWebテキストセットが与えられたら、機械学習と自然言語処理を適用して、そのようなリスクを自動的に検出することができる。我々は,公開食品リコール発表を記述した7,546の短いテキストのデータセットを公開している。各テキストは、2つの粒度レベル(粗さと微妙さ)で手動でラベル付けされる。データセットとベンチマークナイーブ、従来型、トランスフォーマーモデルについて説明する。分析の結果,tf-idf表現に基づくロジスティック回帰は,低サポートのクラスではRoBERTaとXLM-Rより優れていた。最後に,異なるプロンプト戦略について議論し,コンフォーマル予測に基づくLLM-in-the-loopフレームワークを提案する。 Contaminated or adulterated food poses a substantial risk to human health. Given sets of labeled web texts for training, Machine Learning and Natural Language Processing can be applied to automatically detect such risks. We publish a dataset of 7,546 short texts describing public food recall announcements. Each text is manually labeled, on two granularity levels (coarse and fine), for food products and hazards that the recall corresponds to. We describe the dataset and benchmark naive, traditional, and Transformer models. Based on our analysis, Logistic Regression based on a tf-idf representation outperforms RoBERTa and XLM-R on classes with low support. Finally, we discuss different prompting strategies and present an LLM-in-the-loop framework, based on Conformal Prediction, which boosts the performance of the base classifier while reducing energy consumption compared to normal prompting.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# Tur[k]ingBench: Webエージェントのチャレンジベンチマーク Tur[k]ingBench: A Challenge Benchmark for Web Agents ( http://arxiv.org/abs/2403.11905v1 ) ライセンス: Link先を確認	Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi,	(参考訳) 最近のチャットボットは、生のテキスト形式で理解し、コミュニケーションする能力を発揮している。しかし、世界は原文以上のものが存在する。例えば、人間が長い時間をウェブページで過ごし、そこではテキストが他のモダリティと連動し、タスクは様々な複雑な相互作用の形で達成される。最先端のマルチモーダルモデルはそのような複雑な領域に一般化できるのか? この問題に対処するために、TurkingBenchという、マルチモーダルコンテキストによるテキスト命令を含むWebページとして定式化されたタスクのベンチマークを導入する。人工的に合成されたWebページを利用する既存の作業とは異なり、ここでは、さまざまなアノテーションのために、もともとクラウドソーシングワーカーのために設計された、自然なHTMLページを使用します。各タスクのHTML命令は、さまざまな値(クラウドソーシングタスクから得られる)でインスタンス化され、タスクの新しいインスタンスを形成します。このベンチマークには158タスクに分散した32.2Kインスタンスが含まれている。さらに,TurkingBenchの評価を容易にするために,チャットボットの応答をWebページの修正(テキストボックスの変更,ラジオの確認など)に結びつける評価フレームワークを開発した。本ベンチマークでは,言語のみ,視覚のみ,レイアウトのみ,およびそれらの組み合わせを含む最先端モデルの性能を評価する。以上の結果から,これらのモデルではランダムな確率よりもはるかに優れた性能が得られたが,改善の余地は十分にあることがわかった。このベンチマークによって、Webベースのエージェントの評価と開発が促進されることを願っています。 Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such complex domains? To address this question, we introduce TurkingBench, a benchmark of tasks formulated as web pages containing textual instructions with multi-modal context. Unlike existing work which employs artificially synthesized web pages, here we use natural HTML pages that were originally designed for crowdsourcing workers for various annotation purposes. The HTML instructions of each task are also instantiated with various values (obtained from the crowdsourcing tasks) to form new instances of the task. This benchmark contains 32.2K instances distributed across 158 tasks. 
Additionally, to facilitate the evaluation on TurkingBench, we develop an evaluation framework that connects the responses of chatbots to modifications on web pages (modifying a text box, checking a radio, etc.). We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark. Our findings reveal that these models perform significantly better than random chance, yet considerable room exists for improvement. We hope this benchmark will help facilitate the evaluation and development of web-based agents.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# Distill2Explain:エネルギー応用制御系における説明可能な強化学習のための微分可能な決定木 Distill2Explain: Differentiable decision trees for explainable reinforcement learning in energy application controllers ( http://arxiv.org/abs/2403.11907v1 ) ライセンス: Link先を確認	Gargya Gokhale, Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder,	(参考訳) エネルギー移行プロセスにおける重要な要素として、需要側の柔軟性が重要になっている。世界の最終エネルギー消費の約25%を占める住宅セクターは、エネルギーの柔軟性の重要な(潜在的)源である。しかし、この柔軟性を解き放つには、(1)異なる住宅に容易にスケールできる、(2)メンテナンスが容易、(3)エンドユーザにとって理解しやすいコントロールフレームワークを開発する必要がある。そのようなタスクの潜在的な制御フレームワークは、データ駆動型制御、特にモデルフリー強化学習(RL)である。このようなRLベースのコントローラは、環境と対話し、データに基づいて純粋に学習し、人間の介入を最小限に抑えて、優れた制御ポリシーを学習する。しかし、説明性に欠けており、ユーザーの受け入れを妨げている。さらに、住宅資産の限られたハードウェア能力はハードルとなる(例えば、ディープニューラルネットワークを使用する)。これらの課題を克服するために、微分可能な決定木を用いて説明可能なRLポリシーを得る新しい方法を提案する。政策蒸留アプローチを用いて、標準的なRLベースのコントローラを模倣するためにこれらの異なる決定木を訓練し、データ駆動型で説明しやすい決定木ベースの制御ポリシーを導出する。概念実証として,バッテリベース家庭用エネルギー管理システムにおける提案手法の性能と説明可能性について検討し,エネルギーコストの低減を図る。このユースケースでは,提案手法がベースラインルールベースのポリシーを20～25%上回り,シンプルで説明可能な制御ポリシーを提供する。さらに、これらの説明可能なポリシーを標準のRLポリシーと比較し、この説明可能性の増加に伴うパフォーマンストレードオフについて検討する。 Demand-side flexibility is gaining importance as a crucial element in the energy transition process. Accounting for about 25% of final energy consumption globally, the residential sector is an important (potential) source of energy flexibility. However, unlocking this flexibility requires developing a control framework that (1) easily scales across different houses, (2) is easy to maintain, and (3) is simple to understand for end-users. A potential control framework for such a task is data-driven control, specifically model-free reinforcement learning (RL). Such RL-based controllers learn a good control policy by interacting with their environment, learning purely based on data and with minimal human intervention. Yet, they lack explainability, which hampers user acceptance. Moreover, limited hardware capabilities of residential assets form a hurdle (e.g., using deep neural networks). 
To overcome both those challenges, we propose a novel method to obtain explainable RL policies by using differentiable decision trees. Using a policy distillation approach, we train these differentiable decision trees to mimic standard RL-based controllers, leading to a decision tree-based control policy that is data-driven and easy to explain. As a proof-of-concept, we examine the performance and explainability of our proposed approach in a battery-based home energy management system to reduce energy costs. For this use case, we show that our proposed approach can outperform baseline rule-based policies by about 20-25%, while providing simple, explainable control policies. We further compare these explainable policies with standard RL policies and examine the performance trade-offs associated with this increased explainability.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
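The idea of a differentiable decision tree distilled from a teacher policy can be sketched as follows. This is a minimal illustration and not the authors' implementation: the depth-2 layout, the names `soft_tree_policy` and `distillation_loss`, and the `temperature` parameter are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_tree_policy(x, w, b, leaf_logits, temperature=1.0):
    """Action distribution of a depth-2 soft decision tree (hypothetical layout).

    x: (d,) state, w: (3, d) and b: (3,) parameters of the three internal
    nodes, leaf_logits: (4, n_actions) logits stored at the four leaves.
    """
    # Probability of branching right at each internal node; a hard tree
    # would use a step function here, which has no useful gradient.
    p = sigmoid((w @ x + b) / temperature)
    # Probability of reaching each leaf (left = 1 - p, right = p).
    leaf_prob = np.array([
        (1 - p[0]) * (1 - p[1]),
        (1 - p[0]) * p[1],
        p[0] * (1 - p[2]),
        p[0] * p[2],
    ])
    # Softmax over actions at every leaf.
    z = leaf_logits - leaf_logits.max(axis=1, keepdims=True)
    leaf_actions = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Mixture of leaf distributions weighted by path probability.
    return leaf_prob @ leaf_actions


def distillation_loss(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student), one common choice of distillation objective."""
    return float(np.sum(teacher_probs * (np.log(teacher_probs + eps)
                                         - np.log(student_probs + eps))))
```

Because every operation is smooth in `w`, `b` and `leaf_logits`, the loss can be minimised with gradient descent against the outputs of a standard RL controller; lowering `temperature` afterwards pushes the tree towards hard, readable if-then rules.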
# RoGUENeRF: NeRF用ロバストな幾何型ユニバーサルエンハンサー RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF ( http://arxiv.org/abs/2403.11909v1 ) ライセンス: Link先を確認	Sibi Catley-Chandar, Richard Shaw, Gregory Slabaugh, Eduardo Perez-Pellitero,	(参考訳) ニューラルレンダリングの最近の進歩は、高光写実性3Dシーンの再構築と新しいビュー合成を可能にしている。この進歩にもかかわらず、現在の最先端の手法は、放射界の低周波バイアスや不正確なカメラキャリブレーションなどの要因により、高周波詳細の再構築に苦慮している。この問題を緩和するための1つのアプローチは、レンダリング後のイメージを強化することである。 2Dエンハンサーは、いくつかの詳細を回復するために事前訓練することができるが、シーン幾何学には依存せず、画像劣化の新しい分布に容易に一般化することができない。逆に、既存の3Dエンハンサーは、近隣のトレーニング画像からの細部を一般化可能な方法で転送することができるが、不正確なカメラキャリブレーションに悩まされ、幾何学的誤差を描画画像に伝達することができる。両パラダイムの長所を生かしたニューラルレンダリングエンハンサーであるRoGUENeRFを提案する。本手法は,3次元アライメントと幾何認識融合により,近隣のトレーニング画像からの情報を活用するとともに,一般エンハンサーを学習するための事前訓練を行う。本手法は, 幾何整合性を維持しながら高周波テクスチャを復元すると共に, 不正確なカメラキャリブレーションにも頑健である。例えば、現実世界の360v2データセット上で、MipNeRF360のPSNRを0.63dB、Nerfactoを1.34dB改善する。 Recent advances in neural rendering have enabled highly photorealistic 3D scene reconstruction and novel view synthesis. Despite this progress, current state-of-the-art methods struggle to reconstruct high frequency detail, due to factors such as a low-frequency bias of radiance fields and inaccurate camera calibration. One approach to mitigate this issue is to enhance images post-rendering. 2D enhancers can be pre-trained to recover some detail but are agnostic to scene geometry and do not easily generalize to new distributions of image degradation. Conversely, existing 3D enhancers are able to transfer detail from nearby training images in a generalizable manner, but suffer from inaccurate camera calibration and can propagate errors from the geometry into rendered images. We propose a neural rendering enhancer, RoGUENeRF, which exploits the best of both paradigms. Our method is pre-trained to learn a general enhancer while also leveraging information from nearby training images via robust 3D alignment and geometry-aware fusion. 
Our approach restores high-frequency textures while maintaining geometric consistency and is also robust to inaccurate camera calibration. We show that RoGUENeRF substantially enhances the rendering quality of a wide range of neural rendering baselines, e.g. improving the PSNR of MipNeRF360 by 0.63dB and Nerfacto by 1.34dB on the real world 360v2 dataset.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# 分散型協調運転における単エージェントアクター批判 Single-Agent Actor Critic for Decentralized Cooperative Driving ( http://arxiv.org/abs/2403.11914v1 ) ライセンス: Link先を確認	Shengchao Yan, Lukas König, Wolfram Burgard,	(参考訳) 自動運転車(AV)を取り入れたアクティブな交通管理は、渋滞の低減と交通の流れの強化を約束する。しかし、現実のアプリケーションのためのアルゴリズムを開発するには、連続的なトラフィックフローと部分的な可観測性によって生じる課題に対処する必要がある。このギャップを埋めて、より分散化に向けての積極的な交通管理の分野を推し進めるために、単エージェント強化学習を用いて自律走行車のための分散型協調運転ポリシーを学習することを目的とした、新しい非対称アクター批判モデルを導入する。提案手法では,マスキングを用いたアテンションニューラルネットワークを用いて,現実の交通流の動的性質と部分観測可能性を扱う。各種交通シナリオのベースラインコントローラに対する広範囲な評価を通じて,道路システム内の多様なボトルネック箇所における交通流改善の可能性を示す。さらに、交通規制に厳格に従う自動運転車の保守的な運転行動に関わる課題についても検討する。実験の結果,提案する協調政策は,安全を損なうことなく,潜在的な交通の減速を緩和できることが示された。 Active traffic management incorporating autonomous vehicles (AVs) promises a future with diminished congestion and enhanced traffic flow. However, developing algorithms for real-world application requires addressing the challenges posed by continuous traffic flow and partial observability. To bridge this gap and advance the field of active traffic management towards greater decentralization, we introduce a novel asymmetric actor-critic model aimed at learning decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning. Our approach employs attention neural networks with masking to handle the dynamic nature of real-world traffic flow and partial observability. Through extensive evaluations against baseline controllers across various traffic scenarios, our model shows great potential for improving traffic flow at diverse bottleneck locations within the road system. Additionally, we explore the challenge associated with the conservative driving behaviors of autonomous vehicles that adhere strictly to traffic regulations. The experiment results illustrate that our proposed cooperative policy can mitigate potential traffic slowdowns without compromising safety.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# 多言語文埋め込みを用いた適応的バイリンガルアライディング Adaptative Bilingual Aligning Using Multilingual Sentence Embedding ( http://arxiv.org/abs/2403.11921v1 ) ライセンス: Link先を確認	Olivier Kraif,	(参考訳) 本稿では,AIlignと呼ばれる適応的ビット情報アライメントシステムを提案する。このアライダは文の埋め込みに依存して、並列性が断片的で厳密に単調ではないテキストであってもアライメントパスを導くことのできる信頼できるアンカーポイントを抽出する。いくつかのデータセットに対する実験では、AIlignが準線形複雑性を持つ最先端技術に匹敵する結果が得られることを示した。さらに、AIlignは、VecalignやBertalignのような最近のシステムとは異なり、並列性と単調性の性質が局所的にのみ満足されるテキストを扱うことができる。 In this paper, we present an adaptive bitextual alignment system called AIlign. This aligner relies on sentence embeddings to extract reliable anchor points that can guide the alignment path, even for texts whose parallelism is fragmentary and not strictly monotonic. In an experiment on several datasets, we show that AIlign achieves results equivalent to the state of the art, with quasi-linear complexity. In addition, AIlign is able to handle texts whose parallelism and monotonicity properties are only satisfied locally, unlike recent systems such as Vecalign or Bertalign.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# 多レベルアクター・クリティックによる平均報酬RLにおける混合時間オラクルなしの大域的最適性 Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic ( http://arxiv.org/abs/2403.11925v1 ) ライセンス: Link先を確認	Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha,	(参考訳) 平均報酬強化学習の文脈では、混合時間（固定された方策の下でマルコフ連鎖が定常分布に到達するのに必要な時間の尺度）に関するオラクル知識の要求が、方策勾配法の大域的収束にとって大きな課題となる。この要件は、大きな状態空間を持つ環境での混合時間推定の困難さと費用が原因で特に問題となる。この制限に対処するために,マルチレベルモンテカルロ勾配推定器を組み込んだマルチレベルアクタ・クリティカル(MAC)フレームワークを検討する。提案手法では, 混合時間の知識への依存を効果的に緩和する。さらに,本手法は先行研究と比較して,$\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$の厳密な依存性を示す。 2次元グリッドワールドの目標到達航法実験により,MACが従来のPG法よりも高い報酬を得られることを示す。 In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for average-reward MDPs global convergence. Furthermore, our approach exhibits the tightest-available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ relative to prior work. With a 2D gridworld goal-reaching navigation experiment, we demonstrate that MAC achieves higher reward than a previous PG-based method for average reward, Parameterized Policy Gradient with Advantage Estimation (PPGAE), especially in cases with relatively small training sample budget restricting trajectory length.	
翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
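A Multi-level Monte Carlo estimator of the kind named in the abstract above can be sketched as below. This follows the commonly used form with a geometrically distributed level and a truncation length; whether the paper uses exactly this form is an assumption, and `mlmc_estimate`, `sample_fn` and `t_max` are hypothetical names chosen for the example.

```python
import numpy as np


def mlmc_estimate(sample_fn, t_max, rng):
    """One MLMC estimate built from samples along a single trajectory.

    sample_fn(n) is assumed to return n consecutive (possibly correlated)
    samples, e.g. per-step gradient estimates; t_max caps the rollout length.
    """
    # Draw the level J ~ Geometric(1/2) on {1, 2, ...}.
    j = int(rng.geometric(0.5))
    within_budget = 2 ** j <= t_max
    n = 2 ** j if within_budget else 1
    samples = np.asarray(sample_fn(n), dtype=float)

    # Level-0 estimate: the first sample alone.
    estimate = samples[0]
    if within_budget:
        fine = samples[: 2 ** j].mean(axis=0)          # average of 2^J samples
        coarse = samples[: 2 ** (j - 1)].mean(axis=0)  # average of 2^(J-1) samples
        # Reweighted correction term; skipped when the level exceeds the budget.
        estimate = estimate + (2 ** j) * (fine - coarse)
    return estimate
```

The point of the construction is that the rollout length is random and short on average, so no mixing-time value has to be supplied in advance to decide how long each trajectory should be.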
# LayerDiff:Layer-Collaborative Diffusion Modelによるテキスト誘導多層合成画像の探索 LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model ( http://arxiv.org/abs/2403.11929v1 ) ライセンス: Link先を確認	Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei Zhang, Hang Xu,	(参考訳) 拡散ベースの生成モデルによってテキストプロンプトが与えられると、高品質な画像を生成することに成功したが、以前の作業では画像全体を直接生成するが、オブジェクト指向の操作能力は提供できない。プロのグラフィックデザインやデジタルアートのようなより広範なリアルなアプリケーションをサポートするために、画像は複数の層で頻繁に作成され、操作され、柔軟性とコントロールが向上する。そこで本稿では,テキスト誘導,多層化,構成可能な画像合成のためのレイヤ協調拡散モデルであるLayerDiffを提案する。構成可能な画像は、背景層、前景層の集合、および各前景要素のための関連するマスク層からなる。これを実現するため、LayerDiffはレイヤ間のパターンをキャプチャするために複数のレイヤ協調アテンションモジュールを組み込んだレイヤベースの生成パラダイムを導入した。具体的には、層間アテンションモジュールは層間の情報交換と学習を促進するように設計され、テキスト誘導イントラアテンションモジュールは層固有のプロンプトを組み込んで各層に対して特定のコンテンツ生成を指示する。レイヤ固有のプロンプト強化モジュールは、グローバルプロンプトから詳細なテキストキューをキャプチャする。さらに、自己マスク誘導サンプリング戦略により、多層画像を生成するモデルの能力をさらに解き放つ。また、既存の知覚モデルと生成モデルを統合して、高品質でテキストプロンプされた多層画像の大規模なデータセットを生成するパイプラインを提案する。大規模な実験により,従来の全画像生成手法に匹敵する高画質の多層画像が生成可能であることが示された。さらにLayerDiffは、レイヤ固有の画像編集やスタイル転送など、幅広いコントロール可能な生成アプリケーションを可能にする。 Despite the success of generating high-quality images given any text prompts by diffusion-based generative models, prior works directly generate the entire images, but cannot provide object-wise manipulation capability. To support wider real applications like professional graphic design and digital artistry, images are frequently created and manipulated in multiple layers to offer greater flexibility and control. Therefore in this paper, we propose a layer-collaborative diffusion model, named LayerDiff, specifically designed for text-guided, multi-layered, composable image synthesis. The composable image consists of a background layer, a set of foreground layers, and associated mask layers for each foreground element. 
To enable this, LayerDiff introduces a layer-based generation paradigm incorporating multiple layer-collaborative attention modules to capture inter-layer patterns. Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer. A layer-specific prompt-enhanced module better captures detailed textual cues from the global prompt. Additionally, a self-mask guidance sampling strategy further unleashes the model's ability to generate multi-layered images. We also present a pipeline that integrates existing perceptual and generative models to produce a large dataset of high-quality, text-prompted, multi-layered images. Extensive experiments demonstrate that our LayerDiff model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods. Moreover, LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# ニュートラル原子量子プロセッサを用いたグラフアルゴリズム Graph Algorithms with Neutral Atom Quantum Processors ( http://arxiv.org/abs/2403.11931v1 ) ライセンス: Link先を確認	Constantin Dalyac, Lucas Leclerc, Louis Vignoli, Mehdi Djellabi, Wesley da Silva Coelho, Bruno Ximenez, Alexandre Dareau, Davide Dreon, Vincent E. Elfving, Adrien Signoles, Louis-Paul Henry, Loïc Henriet,	(参考訳) ニュートラル原子技術は、量子アルゴリズムを実行するための最前線のプラットフォームとして位置づけられ、理論と実験の進歩を着実に証明してきた。この技術のユニークな利点の1つは、qubitレジスタのジオメトリをショットからショットに再構成できることである。このユニークな機能は、複雑な最適化と機械学習タスクの解決に重大な結果をもたらす、ハードウェアレベルでグラフ構造化問題のネイティブな埋め込みを可能にする。量子ビットを駆動することで、グラフ複素特性を保持する処理された量子状態を生成することができる。これらの状態は、問題への直接的な解決策や、ハイブリッド量子古典的スキームのリソースとして利用することができる。本稿では、中性原子量子処理ユニット(QPU)上で動作するグラフ問題に対する量子アルゴリズムの進歩を概観し、最近導入された埋め込みと問題解決技術について議論する。さらに、中性原子QPUのスケーラビリティ、制御可能性、計算繰り返し率の向上に重点を置いて、ハードウェアの継続的な進歩を明らかにした。 Neutral atom technology has steadily demonstrated significant theoretical and experimental advancements, positioning itself as a front-runner platform for running quantum algorithms. One unique advantage of this technology lies in the ability to reconfigure the geometry of the qubit register, from shot to shot. This unique feature makes possible the native embedding of graph-structured problems at the hardware level, with profound consequences for the resolution of complex optimization and machine learning tasks. By driving qubits, one can generate processed quantum states which retain graph complex properties. These states can then be leveraged to offer direct solutions to problems or as resources in hybrid quantum-classical schemes. In this paper, we review the advancements in quantum algorithms for graph problems running on neutral atom Quantum Processing Units (QPUs), and discuss recently introduced embedding and problem-solving techniques. In addition, we clarify ongoing advancements in hardware, with an emphasis on enhancing the scalability, controllability and computation repetition rate of neutral atom QPUs.	翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
# 高エネルギー物理画像分類:ジェットの応用に関する調査 High-energy physics image classification: A Survey of Jet Applications ( http://arxiv.org/abs/2403.11934v1 ) ライセンス: Link先を確認	Hamza Kheddar, Yassine Himeur, Abbes Amira, Rachik Soualah,	(参考訳) 近年、高エネルギー物理(HEP)実験や現象学研究の分野では、機械学習(ML)とその専門分野である深層学習(DL)が統合されている。この調査は、様々なDLアプローチの範囲内で、これらの応用を包括的に評価する。本論文の最初のセグメントでは,様々な粒子物理学のタイプを包含する基礎について紹介し,素粒子物理を学習モデルと組み合わせて評価するための基準を確立する。その後、HEP画像、アクセス可能なデータセット、事前処理技術の詳細な詳細、特徴抽出と選択の方法などを表現するための包括的な分類法が提示される。その後、HEP画像に合わせて利用可能な人工知能(AI)モデルを探索し、ジェット粒子に関するHEP画像分類を精査する。本総説では, ML と DL が提案する最先端技術 (SOTA) について深く検討し, HEP 調査の意義について考察した。この議論は、ジェットタグ、ジェットトラッキング、粒子分類など、特定の応用をかなり詳細に掘り下げている。本調査は, DL方法論に基づくHEPの現状に関する分析から, 今後の研究課題と今後の課題を包括する。 In recent times, the fields of high-energy physics (HEP) experimentation and phenomenological studies have seen the integration of machine learning (ML) and its specialized branch, deep learning (DL). This survey offers a comprehensive assessment of these applications within the realm of various DL approaches. The initial segment of the paper introduces the fundamentals encompassing diverse particle physics types and establishes criteria for evaluating particle physics in tandem with learning models. Following this, a comprehensive taxonomy is presented for representing HEP images, encompassing accessible datasets, intricate details of preprocessing techniques, and methods of feature extraction and selection. Subsequently, the focus shifts to an exploration of available artificial intelligence (AI) models tailored to HEP images, along with a concentrated examination of HEP image classification pertaining to Jet particles. Within this review, a profound investigation is undertaken into distinct ML and DL proposed state-of-the art (SOTA) techniques, underscoring their implications for HEP inquiries. The discussion delves into specific applications in substantial detail, including Jet tagging, Jet tracking, particle classification, and more. 
The survey culminates with an analysis concerning the present status of HEP grounded in DL methodologies, encompassing inherent challenges and prospective avenues for future research endeavors.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# ハイパーカラー化:ハイパースペクトル画像再構成のための空間的疎雑音スペクトル手がかりの伝搬 HyperColorization: Propagating spatially sparse noisy spectral clues for reconstructing hyperspectral images ( http://arxiv.org/abs/2403.11935v1 ) ライセンス: Link先を確認	M. Kerem Aydin, Qi Guo, Emma Alexander,	(参考訳) ハイパースペクトルカメラは、空間分解能のトレードオフに挑戦しており、同じ露光時間で撮影されたRGB写真よりもショットノイズの影響を受けやすい。ここでは、グレースケールのガイド画像と空間的にスパースなスペクトル手がかりから、ハイパースペクトル画像を再構成するカラー化アルゴリズムを提案する。提案アルゴリズムは,ハイパースペクトル画像の様々なスペクトル次元に一般化し,低ランク空間におけるカラー化が計算時間とショットノイズの影響を減少させることを示す。頑健性を高めるため,ガイド付きサンプリング,エッジ認識フィルタリング,次元推定手法を取り入れた。提案手法は,SSIM,PSNR,GFC,EMDなどの様々な性能指標において過去のアルゴリズムを上回り,ハイパースペクトル画像品質を特徴付ける指標として分析する。これらの知見は、ウィスキーやプッシュブルームスキャナーで得られた試料から高スペクトル像を再構成することにより、時空間分解能トレードオフを克服するための有望な手段を提供するとともに、ハイブリッド空間分光イメージングシステムを提供する。 Hyperspectral cameras face challenging spatial-spectral resolution trade-offs and are more affected by shot noise than RGB photos taken over the same total exposure time. Here, we present a colorization algorithm to reconstruct hyperspectral images from a grayscale guide image and spatially sparse spectral clues. We demonstrate that our algorithm generalizes to varying spectral dimensions for hyperspectral images, and show that colorizing in a low-rank space reduces compute time and the impact of shot noise. To enhance robustness, we incorporate guided sampling, edge-aware filtering, and dimensionality estimation techniques. Our method surpasses previous algorithms in various performance metrics, including SSIM, PSNR, GFC, and EMD, which we analyze as metrics for characterizing hyperspectral image quality. Collectively, these findings provide a promising avenue for overcoming the time-space-wavelength resolution trade-off by reconstructing a dense hyperspectral image from samples obtained by whisk or push broom scanners, as well as hybrid spatial-spectral computational imaging systems.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# AIによる頸部がん検診 AI-Assisted Cervical Cancer Screening ( http://arxiv.org/abs/2403.11936v1 ) ライセンス: Link先を確認	Kanchan Poudel, Lisasha Poudel, Prabin Raj Shakya, Atit Poudel, Archana Shrestha, Bishesh Khanal,	(参考訳) 低所得国や中所得国(LMIC)では、好まれるが利用できない専門家である婦人科医の代わりに、看護師によるスクリーニングキャンプや一次・地域医療センターがしばしば実施されている。テストの主観的な性質に対処するため、カメラやスマートフォンを統合した様々なハンドヘルドデバイスが、最近、VIA中の頚部画像をキャプチャし、遠隔医療やAIモデルによる意思決定を支援するために研究されている。 AIモデルを提案するほとんどの研究は、特定のデバイス、デジタルカメラ、スマートフォンから収集された画像の比較的少数を振り返りに使用している。資源制約されたキャンプ設定におけるVIA中の品質画像取得の課題とプロトコルは、しばしば見過ごされがちである。本稿では,異なる統合デバイスを購入する必要のない,堅牢なスマートフォンベースのAI支援システムを構築するための,エンド・ツー・エンドの設計プロセスについて述べる。資源制約のある環境での高品質な画像取得のためのプロトコル,キャンプ,前処理パイプライン,深層学習に基づく分類モデルのトレーニングと評価において,看護師が実施するVIA中の1,430人の女性から収集したデータセット。我々の研究は、容易に利用可能なスマートフォンと適切なプロトコルが、VIAテストに必要な詳細でcervixイメージをキャプチャできることを示し、深層学習に基づく分類モデルは、VIAスクリーニングにおける看護師を支援するための有望な結果を提供し、リソース制約された設定における大規模データ収集と検証の方向性を提供する。 Visual Inspection with Acetic Acid (VIA) remains the most feasible cervical cancer screening test in resource-constrained settings of low- and middle-income countries (LMICs), where it is often performed in screening camps or primary/community health centers by nurses instead of the preferred but unavailable expert gynecologist. To address the highly subjective nature of the test, various handheld devices integrating cameras or smartphones have been recently explored to capture cervical images during VIA and aid decision-making via telemedicine or AI models. Most studies proposing AI models retrospectively use a relatively small number of already collected images from specific devices, digital cameras, or smartphones; the challenges and protocol for quality image acquisition during VIA in resource-constrained camp settings, challenges in getting gold standard, data imbalance, etc. are often overlooked. 
We present a novel approach and describe the end-to-end design process to build a robust smartphone-based AI-assisted system that does not require buying a separate integrated device: the proposed protocol for quality image acquisition in resource-constrained settings, dataset collected from 1,430 women during VIA performed by nurses in screening camps, preprocessing pipeline, and training and evaluation of a deep-learning-based classification model aimed to identify (pre)cancerous lesions. Our work shows that the readily available smartphones and a suitable protocol can capture the cervix images with the required details for the VIA test well; the deep-learning-based classification model provides promising results to assist nurses in VIA screening; and provides a direction for large-scale data collection and validation in resource-constrained settings.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 畳み込み層に対する Roesser 型の状態空間表現 State space representations of the Roesser type for convolutional layers ( http://arxiv.org/abs/2403.11938v1 ) ライセンス: Link先を確認	Patricia Pauli, Dennis Gramlich, Frank Allgöwer,	(参考訳) 制御理論の観点からは、畳み込み層(ニューラルネットワーク)は2-D(またはN-D)線形時間不変力学系である。畳み込みカーネルによる畳み込み層の通常の表現は、そのインパルス応答による力学系の表現に対応する。しかし、制御理論からの多くの解析ツール、例えば線型行列の不等式は状態空間表現を必要とする。この理由から、我々は、2次元畳み込み層に対して$c_\mathrm{in}r_1 + c_\mathrm{out}r_2$個の状態を持つRoesser型の状態空間表現を明示的に与える。ここで、$c_\mathrm{in}$/$c_\mathrm{out}$は層の入力/出力チャネルの数であり、$r_1$/$r_2$は畳み込みカーネルの幅/長さを特徴づける。この表現は$c_\mathrm{in} = c_\mathrm{out}$に対して最小であることが示されている。さらに、拡張、ストライド、N-D畳み込みのための状態空間表現を構築する。 From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
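The statement that the convolution kernel plays the role of an impulse response can be checked numerically. The sketch below is an illustration only, restricted to a single input and output channel with a causal indexing convention; `conv2d_causal` is a hypothetical helper, and the Roesser state space construction itself is not reproduced here.

```python
import numpy as np


def conv2d_causal(u, kernel):
    """y[i, j] = sum over (a, b) of kernel[a, b] * u[i - a, j - b].

    Inputs outside the array are treated as zero (single channel).
    """
    h, w = u.shape
    r1, r2 = kernel.shape
    y = np.zeros((h, w))
    for a in range(r1):
        for b in range(r2):
            # Shift the input by (a, b) and weight it by the kernel tap.
            y[a:, b:] += kernel[a, b] * u[: h - a, : w - b]
    return y


kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

# A unit impulse at the origin of a 4x4 input.
impulse = np.zeros((4, 4))
impulse[0, 0] = 1.0

# The response to the impulse reproduces the kernel in the top-left corner.
response = conv2d_causal(impulse, kernel)
```

Since the layer is linear and shift-invariant, this response determines the output for every input, which is why an equivalent state space realization of the same input-output map can be sought.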
# マルチステップの逆は必要なだけではない Multistep Inverse Is Not All You Need ( http://arxiv.org/abs/2403.11940v1 ) ライセンス: Link先を確認	Alexander Levine, Peter Stone, Amy Zhang,	(参考訳) 実世界の制御環境では、観測空間は不要に高次元であり、時間関連ノイズにさらされることが多い。しかし、制御可能なシステムの力学は、しばしば生の観測の力学よりもはるかに単純である。したがって、観測空間を制御関連変数のより単純な空間にマッピングするエンコーダを学ぶことが望ましい。本研究では,Efroni et al (2022) が最初に提案したEx-BMDPモデルについて考察する。 Lamb et al (2022) は、エンコーダを学習し、そのような問題の観測から完全な行動依存潜在状態表現を抽出する「AC状態」法を提案する。 AC-Stateは、パス内の最初のアクションを予測するために、パス内の最初の状態と最後の状態のエンコーディングを使用する、多段階逆法である。しかし、AC-Stateがエージェント制御可能因子の正しい潜在表現を学習できないケースを特定する。そこで我々は,多段階逆予測と潜在前方モデルを組み合わせた新しいアルゴリズムACDFを提案する。 ACDFは、多数のEx-BMDPモデルに対して、アクション依存の潜在状態エンコーダを正しく推論することが保証されている。ニューラルネットワークを用いたエンコーダを用いた高次元環境だけでなく, 数値シミュレーションによる表形式のEx-BMDPに対するACDFの有効性を実証する。コードは https://github.com/midi-lab/acdf で入手できる。 In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) proposes the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. 
We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations; as well as high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 微分決定木を用いた説明可能な強化学習に基づく家庭エネルギー管理システム Explainable Reinforcement Learning-based Home Energy Management Systems using Differentiable Decision Trees ( http://arxiv.org/abs/2403.11947v1 ) ライセンス: Link先を確認	Gargya Gokhale, Bert Claessens, Chris Develder,	(参考訳) エネルギーの継続的な移行により、需要側の柔軟性は、グリッドのサポートと持続可能なエネルギー源のさらなる統合を可能にするため、現代の電力網の重要な側面となっている。従来の供給源の他に、住宅セクターは太陽光発電、家庭用バッテリー、EVの採用の増加によって、柔軟性を損なう主要な供給源となっている。しかし、家庭のエネルギー消費を効果的に管理し、多様な住宅で容易にスケーラブルにしながら、利用者の快適性を維持するためのコントロールフレームワークが必要であるため、この住宅の柔軟性の解放は困難である。我々は、この課題に対処し、微分可能な決定木を用いた強化学習に基づくアプローチを導入することを目指している。このアプローチは、データ駆動強化学習のスケーラビリティと(微分可能)決定木の説明可能性を統合する。これにより、さまざまな家庭に容易に適応できるコントローラが実現され、エンドユーザに説明可能なシンプルなコントロールポリシが提供され、ユーザの受け入れがさらに向上します。概念実証として,家庭内エネルギー管理問題を用いて提案手法を解析し,その性能を市販のルールベースベースラインと標準ニューラルネットワークベースRLコントローラと比較した。本研究により,提案手法の性能は標準のRLコントローラに匹敵するものであり,日常的なコスト削減の点において,ベースラインコントローラを約20%上回り,説明が容易であることを示す。 With the ongoing energy transition, demand-side flexibility has become an important aspect of the modern power grid for providing grid support and allowing further integration of sustainable energy sources. Besides traditional sources, the residential sector is another major and largely untapped source of flexibility, driven by the increased adoption of solar PV, home batteries, and EVs. However, unlocking this residential flexibility is challenging as it requires a control framework that can effectively manage household energy consumption, and maintain user comfort while being readily scalable across different, diverse houses. We aim to address this challenging problem and introduce a reinforcement learning-based approach using differentiable decision trees. This approach integrates the scalability of data-driven reinforcement learning with the explainability of (differentiable) decision trees. This leads to a controller that can be easily adapted across different houses and provides a simple control policy that can be explained to end-users, further improving user acceptance. 
As a proof-of-concept, we analyze our method using a home energy management problem, comparing its performance with commercially available rule-based baseline and standard neural network-based RL controllers. Through this preliminary study, we show that the performance of our proposed method is comparable to standard RL-based controllers, outperforming baseline controllers by ~20% in terms of daily cost savings while being straightforward to explain.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 空間曲率における非線形性を符号化する力学系学習 Learning Dynamical Systems Encoding Non-Linearity within Space Curvature ( http://arxiv.org/abs/2403.11948v1 ) ライセンス: Link先を確認	Bernardo Fichera, Aude Billard,	(参考訳) 動的システム(DS)は、ロボット制御のための高レベルポリシーを効果的かつ強力に形成する手段である。それらは、駆動ベクトル場の安定性を確保しながら、堅牢で反応性の高い制御を提供する。現実のシナリオの複雑さが増大するにつれ、DSはより高度な非線形性を必要とし、障害物のような環境条件の変化に適応する能力も必要となる。 DSの現在の学習戦略は、しばしばトレードオフを伴い、学習されたDSの能力を高めるために、安定性保証かオフラインの計算効率を犠牲にする。環境変化に対するオンラインの地域適応は考慮されないか、別の問題として扱われる。本稿では,学習したDSの複雑性を,トレーニングや安定性保証において効率を損なうことなく向上させる手法を提案する。さらに,初期学習されたDSの非線形性と環境の変化によって生じる任意の局所的非線形性とをシームレスに統合するための統一的なアプローチを提案する。本稿では,ロボット制御のための漸近的に安定な非線形DSを学習するための幾何学的アプローチを提案する。各DSは、潜在多様体上の調和減衰振動子としてモデル化される。多様体のユークリッド埋め込み表現を学習することにより、我々のアプローチは空間の曲率内のDSの非線形性を符号化する。多様体の明示的な埋め込み表現を持つことで、空間の局所的な変形を直接誘導することによって障害物回避を示すことができる。まず,合成ベクトル場の2次元学習と,実環境における3次元ロボットのエンドエフェクタ動作の学習の2つのシナリオを通して,方法論の有効性を実証する。 Dynamical Systems (DS) are an effective and powerful means of shaping high-level policies for robotics control. They provide robust and reactive control while ensuring the stability of the driving vector field. The increasing complexity of real-world scenarios necessitates DS with a higher degree of non-linearity, along with the ability to adapt to potential changes in environmental conditions, such as obstacles. Current learning strategies for DSs often involve a trade-off, sacrificing either stability guarantees or offline computational efficiency in order to enhance the capabilities of the learned DS. Online local adaptation to environmental changes is either not taken into consideration or treated as a separate problem. In this paper, our objective is to introduce a method that enhances the complexity of the learned DS without compromising efficiency during training or stability guarantees. Furthermore, we aim to provide a unified approach for seamlessly integrating the initially learned DS's non-linearity with any local non-linearities that may arise due to changes in the environment. 
We propose a geometrical approach to learn asymptotically stable non-linear DS for robotics control. Each DS is modeled as a harmonic damped oscillator on a latent manifold. By learning the manifold's Euclidean embedded representation, our approach encodes the non-linearity of the DS within the curvature of the space. Having an explicit embedded representation of the manifold allows us to showcase obstacle avoidance by directly inducing local deformations of the space. We demonstrate the effectiveness of our methodology through two scenarios: first, the 2D learning of synthetic vector fields, and second, the learning of 3D robotic end-effector motions in real-world settings.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 決定論的に生成されたフォトニックグラフ状態の融合 Fusion of deterministically generated photonic graph states ( http://arxiv.org/abs/2403.11950v1 ) ライセンス: Link先を確認	Philip Thomas, Leonardo Ruscio, Olivier Morin, Gerhard Rempe,	(参考訳) 絡み合いは、量子物理学の謎的な概念から、量子技術の鍵となる要素へと進化してきた。これは古典物理学と矛盾する測定結果の相関を説明し、個々の量子ビットの小さな集合で広く研究されてきた。マルチパーティの絡み合った状態はゲートベースの量子計算プロトコルで構築され、より広い視点から見れば、測定ベースの量子情報処理の主資源として提案された。後者は、グラフによって記述された多ビットの絡み合った状態の元アンテ生成を必要とする。ベル状態や線形クラスタ状態のような小さなグラフ状態は光子で生成されているが、提案された量子コンピューティングと量子ネットワークアプリケーションでは、プログラム可能な方法でそのような状態がより大きくより強力な状態に融合する必要がある。ここではこの目的を達成するために、2つの個別に対応可能な原子を1つの光共振器に採用する。最大8キュービットのリングおよびツリーグラフ状態は、絡み合いトポロジーを反映した名前であり、個々の原子によって放出されるフォトニック状態から効率的に融合する。融合過程自体は、2つの原子の間に空洞補助ゲートを用いる。我々の技術は原則として、より多くの量子ビットに対してスケーラブルであり、例えば将来の量子インターネットにおけるメモリレス量子リピータへの決定的なステップである。 Entanglement has evolved from an enigmatic concept of quantum physics to a key ingredient of quantum technology. It explains correlations between measurement outcomes that contradict classical physics, and has been widely explored with small sets of individual qubits. Multi-partite entangled states build up in gate-based quantum-computing protocols, and, from a broader perspective, were proposed as the main resource for measurement-based quantum-information processing. The latter requires the ex-ante generation of a multi-qubit entangled state described by a graph. Small graph states such as Bell or linear cluster states have been produced with photons, but the proposed quantum computing and quantum networking applications require fusion of such states into larger and more powerful states in a programmable fashion. Here we achieve this goal by employing two individually addressable atoms in one optical resonator. Ring and tree graph states with up to eight qubits, with the names reflecting the entanglement topology, are efficiently fused from the photonic states emitted by the individual atoms. 
The fusion process itself employs a cavity-assisted gate between the two atoms. Our technique is in principle scalable to even larger numbers of qubits, and is the decisive step towards, for instance, a memory-less quantum repeater in a future quantum internet.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 卓越への旅としてのエストニアのオープン・ガバメント・データ開発の探究--オープンデータ提供における地方自治体の進展を明らかにする Exploring Estonia's Open Government Data Development as a Journey towards Excellence: Unveiling the Progress of Local Governments in Open Data Provision ( http://arxiv.org/abs/2403.11952v1 ) ライセンス: Link先を確認	Katrin Rajamäe-Soosaar, Anastasija Nikiforova,	(参考訳) エストニアは、デジタル国家または電子国家という世界的名声を持っている。しかし、デジタルガバナンスの成功にもかかわらず、この国はオープン・ガバメント・データ(OGD)の領域で課題に直面してきた。一方で、2020年以降の様々なオープンデータランキングに反映されているように、OGDエコシステムは大きく進歩し、近年はトレンドセッターの一つとして認識されている。本稿では,エストニアのOGD開発の発展と位置づけについて,国レベルと地方レベルの両方を対象に,さまざまな指標の統合分析,エストニアのOGDポータルからの一次データ,詳細な文献レビューを通じて検討する。この調査は、エストニアが全国レベルのオープンデータエコシステムを進歩させたことを示しており、これは主にOGDポータルのユーザビリティ向上と法改正によるものである。しかし、地方レベルはそれほど発展しておらず、地方自治体はOGDの提供で遅れを取っている。文献レビューは、エストニアとヨーロッパの地方オープンデータに焦点を当てた先行研究の欠如を浮き彫りにし、市町村のOGDの障壁と促進要因を探究する将来の研究の必要性を強調している。この研究は、エストニアのOGDランドスケープにおけるダイナミックな旅の微妙な理解に寄与し、持続可能なオープンデータエコシステムを確立するうえでの成果と、さらなる注意を要する領域の両方に光を当てている。 Estonia has a global reputation of a digital state or e-country. However, despite the success in digital governance, the country has faced challenges in the realm of Open Government Data (OGD) area, with significant advancements in its OGD ecosystem, as reflected in various open data rankings from 2020 and onwards, in the recent years being recognized among trend-setters. This paper aims to explore the evolution and positioning of Estonia's OGD development, encompassing national and local levels, through an integrated analysis of various indices, primary data from the Estonian OGD portal, and a thorough literature review. The research shows that Estonia has made progress in the national level open data ecosystem, primarily due to improvements in the OGD portal usability and legislation amendments. However, the local level is not as developed, with local governments lagging behind in OGD provision. The literature review highlights the lack of previous research focusing on Estonian and European local open data, emphasizing the need for future studies to explore the barriers and enablers of municipal OGD. This study contributes to a nuanced understanding of Estonia's dynamic journey in the OGD landscape, shedding light on both achievements and areas warranting further attention for establishing a sustainable open data ecosystem.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# 3次元CT画像におけるCOVID-19検出の進歩 Advancing COVID-19 Detection in 3D CT Scans ( http://arxiv.org/abs/2403.11953v1 ) ライセンス: Link先を確認	Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen,	(参考訳) より正確なCOVID-19の診断を行うために,本研究では,単純で効果的なモデルを提案する。まず,3次元CTスキャンの特徴を分析して非肺部位を除去し,モデルが病変関連領域に集中できるようにするとともに計算コストを低減する。我々はResNeSt50を強力な特徴抽出器として使用し、COVID-19に特有の事前知識を持つ事前学習済みの重みで初期化する。本モデルは,第4回COV19Dコンペティションチャレンジ$\mathrm{I}$の検証セットで0.94のマクロF1スコアを達成し,ベースラインを16%上回った。これは、COVID-19症例と非COVID-19症例を区別する有効性を示しており、COVID-19検出の堅牢な方法となっている。 To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# ディープラーニングによる言語進化 Language Evolution with Deep Learning ( http://arxiv.org/abs/2403.11958v1 ) ライセンス: Link先を確認	Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub,	(参考訳) 計算モデリングは言語の出現の研究において重要な役割を担っている。それは、シミュレートされた制御環境内で構造化言語が出現するきっかけとなる条件と学習過程をシミュレートすることを目的としている。エージェントベースシステム,ベイズエージェント,遺伝的アルゴリズム,ルールベースシステムなど,言語の起源を調べるために,いくつかの手法が用いられている。この章では、最近機械学習の分野に革命をもたらした別の種類の計算モデル、ディープ・ラーニング・モデルについて論じる。この章では、深層・強化学習法の基本概念を紹介し、言語の出現をシミュレートするための有用性を要約する。また、現実的なシミュレーションを構築するための重要な発見、制限、最近の試みについても論じている。この章は、言語進化を研究するツールとしてディープラーニングの導入を求めている言語学者や認知科学者を対象としている。 Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based systems. This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models. The chapter introduces the basic concepts of deep and reinforcement learning methods and summarizes their helpfulness for simulating language emergence. It also discusses the key findings, limitations, and recent attempts to build realistic simulations. This chapter targets linguists and cognitive scientists seeking an introduction to deep learning as a tool to investigate language evolution.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# IVAC-P2L:不規則な繰り返しの事前知識によるビデオアクションカウントの強化 IVAC-P2L: Enhancing Video Action Counting through Irregular Repetition Priors ( http://arxiv.org/abs/2403.11959v1 ) ライセンス: Link先を確認	Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang,	(参考訳) ビデオアクションカウント(Video Action Counting, VAC)は、ビデオ内の反復行動を定量化することで、スポーツ、フィットネス、日々の活動を分析するうえで重要である。しかしながら、従来のVAC手法は、中断や周期長の変動など、アクション反復の複雑さを見落としている。本研究は,IVAC(Irregular Video Action Counting)と呼ばれる新しいアプローチを導入することで,その欠点に対処する。IVACはビデオにおける不規則な反復パターンのモデリングを優先し、これをサイクル間一貫性(Inter-cycle Consistency)とサイクル・インターバル不整合(Cycle-interval Inconsistency)の2つの主要な側面で定義する。サイクル間一貫性は、サイクルセグメントの時空間表現における均一性を保証し、サイクル内の動作の均一性を表す。サイクル・インターバル不整合は、その固有の内容の違いに基づいて、サイクルセグメントとインターバルを区別することの重要性を強調している。これらの原則をカプセル化するために,独自のプル・プッシュ・ロス(P2L)機構によって支えられた一貫性モジュールと不整合モジュールを含む新しい方法論を提案する。IVAC-P2Lモデルでは、サイクルセグメントの特徴間のコヒーレンスを促進するためにプルロスを、サイクルセグメントの特徴とインターバルセグメントの特徴を明確に区別するためにプッシュロスを適用している。RepCountデータセットで実施された実証評価では、IVAC-P2LモデルがVACタスク性能の新たなベンチマークを打ち立てることが示されている。さらに、このモデルは様々なビデオコンテンツに対して優れた適応性と汎化性を示し、データセット固有の最適化を必要とせずに、UCFRepとCountixという2つの追加データセット上で既存のモデルを上回る。これらの結果は,ビデオにおける不規則な繰り返しに対処するためのアプローチの有効性を確認し,ビデオ分析と理解のさらなる進歩の道を開くものである。 Video Action Counting (VAC) is crucial in analyzing sports, fitness, and everyday activities by quantifying repetitive actions in videos. However, traditional VAC methods have overlooked the complexity of action repetitions, such as interruptions and the variability in cycle duration. Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC). IVAC prioritizes modeling irregular repetition patterns in videos, which we define through two primary aspects: Inter-cycle Consistency and Cycle-interval Inconsistency. Inter-cycle Consistency ensures homogeneity in the spatial-temporal representations of cycle segments, signifying action uniformity within cycles. Cycle-interval inconsistency highlights the importance of distinguishing between cycle segments and intervals based on their inherent content differences. To encapsulate these principles, we propose a new methodology that includes consistency and inconsistency modules, supported by a unique pull-push loss (P2L) mechanism. The IVAC-P2L model applies a pull loss to promote coherence among cycle segment features and a push loss to clearly distinguish features of cycle segments from interval segments. Empirical evaluations conducted on the RepCount dataset demonstrate that the IVAC-P2L model sets a new benchmark in VAC task performance. Furthermore, the model demonstrates exceptional adaptability and generalization across various video contents, outperforming existing models on two additional datasets, UCFRep and Countix, without the need for dataset-specific optimization. These results confirm the efficacy of our approach in addressing irregular repetitions in videos and pave the way for further advancements in video analysis and understanding.	翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
# CASPER: 時空間時系列補完のための因果性を考慮した時空間グラフニューラルネットワーク CASPER: Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation ( http://arxiv.org/abs/2403.11960v1 ) ライセンス: Link先を確認	Baoyu Jing, Dawei Zhou, Kan Ren, Carl Yang,	(参考訳) 時空間時系列は人間の活動とその影響を理解する基礎であり、通常は異なる場所に配置された監視センサーを通して収集される。収集されたデータは通常、さまざまな障害のために欠損値を含んでおり、データ分析に大きな影響を及ぼす。欠損値を補完するために、多くの手法が導入されている。特定のデータポイントを復元する場合、ほとんどの既存手法は、因果関係の有無にかかわらず、そのポイントに関連するすべての情報を考慮する傾向にある。データ収集の過程では、例えば時系列のバックグラウンドノイズや、構築されたセンサネットワーク内の非因果的なショートカットエッジなど、未知の交絡因子が含まれることは避けられない。これらの交絡因子は、入力と出力の間にバックドアパスを開く可能性があり、言い換えれば、入力と出力の間に非因果的な相関を生じさせる。これらの非因果的な相関を過度に利用すると、過学習を招き、モデルをノイズに対して脆弱にする可能性がある。本稿では,まず因果的視点から時空間時系列補完を再考し,入力,出力,埋め込み,交絡因子の間の因果関係を示す。次に、フロントドア調整を通じて交絡因子をブロックする方法を示す。フロントドア調整の結果に基づき,新しい時空間因果注意 (Spatiotemporal Causal Attention, SCA) と Prompt Based Decoder (PBD) を含むCausality-Aware SPatiotEmpoRal graph Neural Network (CASPER) を導入する。PBDは交絡因子の影響を低減でき、SCAは埋め込み間のスパースな因果関係を発見できる。理論的解析によると、SCAは勾配の値に基づいて因果関係を発見する。実世界の3つのデータセット上でCasperを評価し,実験結果から,Casperはベースラインよりも優れ,因果関係を効果的に発見できることが示された。 Spatiotemporal time series is the foundation of understanding human activities and their impacts, which is usually collected via monitoring sensors placed at different locations. The collected data usually contains missing values due to various failures, which have significant impact on data analysis. To impute the missing values, a lot of methods have been introduced. When recovering a specific data point, most existing methods tend to take into consideration all the information relevant to that point regardless of whether they have a cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths between the input and output, in other words, they establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could result in overfitting and make the model vulnerable to noises. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective, which shows the causal relationships among the input, output, embeddings and confounders. Next, we show how to block the confounders via the frontdoor adjustment. Based on the results of the frontdoor adjustment, we introduce a novel Causality-Aware SPatiotEmpoRal graph neural network (CASPER), which contains a novel Spatiotemporal Causal Attention (SCA) and a Prompt Based Decoder (PBD). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper outperforms the baselines and effectively discovers causal relationships.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 運動補償によるイベントベースビデオ再構成の強化 Enhanced Event-Based Video Reconstruction with Motion Compensation ( http://arxiv.org/abs/2403.11961v1 ) ライセンス: Link先を確認	Siying Liu, Pier Luigi Dragotti,	(参考訳) イベントベースのビデオ再構成のためのディープニューラルネットワークは、解釈可能性の欠如と高いメモリ要求に悩まされることが多い。CISTA-LSTCと呼ばれる軽量ネットワークが最近導入され、アーキテクチャの体系的設計により高品質な再構成が達成できることが示されている。しかし、入力信号と出力再構成フレームが同じスパース表現を共有するというモデリング仮定は、動きによる変位を無視している。そこで本研究では,入力強度フレームとスパース符号をワーピングすることで,再構成品質を向上させることを提案する。CISTA-Flowネットワークは、動き補償のためのフローネットワークとCISTA-LSTCを統合することで構築される。このシステムはイベントのみに依存し、予測されたフローが再構成を助け、再構成されたフレームがフロー推定を容易にする。この組み合わせシステムのための反復的なトレーニングフレームワークも導入する。結果から,本手法は最先端の再構成精度を達成し,同時に信頼性の高い高密度フロー推定を提供することが示された。さらに,本モデルは異なるフローネットワークを統合できる柔軟性を示し,さらなる性能向上の可能性を示唆している。 Deep neural networks for event-based video reconstruction often suffer from a lack of interpretability and have high memory demands. A lightweight network called CISTA-LSTC has recently been introduced showing that high-quality reconstruction can be achieved through the systematic design of its architecture. However, its modelling assumption that input signals and output reconstructed frame share the same sparse representation neglects the displacement caused by motion. To address this, we propose warping the input intensity frames and sparse codes to enhance reconstruction quality. A CISTA-Flow network is constructed by integrating a flow network with CISTA-LSTC for motion compensation. The system relies solely on events, in which predicted flow aids in reconstruction and then reconstructed frames are used to facilitate flow estimation. We also introduce an iterative training framework for this combined system. Results demonstrate that our approach achieves state-of-the-art reconstruction accuracy and simultaneously provides reliable dense flow estimation. Furthermore, our model exhibits flexibility in that it can integrate different flow networks, suggesting its potential for further performance enhancement.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 有界密度比を超えた転移学習 Transfer Learning Beyond Bounded Density Ratios ( http://arxiv.org/abs/2403.11963v1 ) ライセンス: Link先を確認	Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis,	(参考訳) 学習アルゴリズムがあるソース分布$P$からデータを収集するが、異なるターゲット分布$Q$に対して良好に機能する必要があるという、転移学習の基本的な問題を研究する。標準的な測度変換の議論は、密度比$dQ/dP$が有界であるときに転移学習が成立することを意味する。しかし、Kpotufe と Martinet (COLT, 2018) と Hanneke と Kpotufe (NeurIPS, 2019) による示唆に富む先行研究は、比率$dQ/dP$が非有界であるにもかかわらず転移学習が可能なケースを実証している。本研究では,低次多項式推定器のクラスにおける転移学習に着目した。我々の主な結果は、領域 $\mathbb{R}^n$ 上の一般的な転移不等式であり、低次多項式に対する非自明な転移学習が非常に穏やかな仮定の下で可能であることを証明し、$dQ/dP$ が有界であるという古典的な仮定をはるかに超えている。例えば、$Q$ が対数凹測度であり、逆比 $dP/dQ$ が有界である場合、常に適用される。不等式の適用性を実証するため,(1) $dQ/dP$ が無限大となる古典的な切断回帰の設定、(2) トランスフォーマーによる線形関数のインコンテキスト学習における、より最近の分布外汎化の設定、において新たな結果を得た。また、Boolean Hypercube $\{-1,1\}^n$ 上での転移不等式の離散版を与え、Abbe, Bengio, Lotfi, Rizk (ICML, 2023) による最近の Generalization on the Unseen 問題との関係について研究する。我々の主要な概念的貢献は、$Q$ の下での推定器の誤差 $\widehat{f}-f^*$ の最大影響度 $\mathrm{I}_{\max}(\widehat{f}-f^*)$ が転移可能性の十分条件として働くことである。すなわち、$\mathrm{I}_{\max}(\widehat{f}-f^*)$ が適切に有界であれば、ブール領域上で転移が可能である。 We study the fundamental problem of transfer learning where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning happens when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded, but transfer learning is possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in the settings of: (1) the classical truncated regression setting, where $dQ/dP$ equals infinity, and (2) the more recent out-of-distribution generalization setting for in-context learning linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean Hypercube $\{-1,1\}^n$, and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability; when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# ニューラルネットワーク回帰のための設計による確率的校正 Probabilistic Calibration by Design for Neural Network Regression ( http://arxiv.org/abs/2403.11964v1 ) ライセンス: Link先を確認	Victor Dheur, Souhaib Ben Taieb,	(参考訳) 多くの実世界のアプリケーションにおいて、回帰問題に対して校正された鋭いニューラルネットワーク予測分布を生成することは、最適な意思決定に不可欠である。ニューラルネットワークの誤校正問題に対処するために、トレーニング後に予測を調整するポストホック法や、トレーニング中に作用する正則化法など、キャリブレーションを改善するための様々な方法が提案されている。ポストホック法は正則化法に比べてキャリブレーションの改善が大きいことが示されているが, ポストホックのステップはモデルトレーニングから完全に独立している。本稿では,分位点再校正トレーニング(Quantile Recalibration Training)と呼ばれる新しいエンド・ツー・エンドのモデルトレーニング手法を導入し,ポストホック校正を追加パラメータなしで直接トレーニングプロセスに統合する。また,本手法や他のポストホック法および正則化法を特殊な場合として含む統一アルゴリズムを提案する。本研究では,57個の表形式回帰データセットを用いた大規模実験を行い,キャリブレーションを維持しながら予測精度が向上することを示す。また,提案手法における異なる成分の意義を評価するためのアブレーション研究や,ベースモデルと異なるハイパーパラメータが予測精度に与える影響の詳細な分析を行った。 Generating calibrated and sharp neural network predictive distributions for regression problems is essential for optimal decision-making in many real-world applications. To address the miscalibration issue of neural networks, various methods have been proposed to improve calibration, including post-hoc methods that adjust predictions after training and regularization methods that act during training. While post-hoc methods have shown better improvement in calibration compared to regularization methods, the post-hoc step is completely independent of model training. We introduce a novel end-to-end model training procedure called Quantile Recalibration Training, integrating post-hoc calibration directly into the training process without additional parameters. We also present a unified algorithm that includes our method and other post-hoc and regularization methods, as particular cases. We demonstrate the performance of our method in a large-scale experiment involving 57 tabular regression datasets, showcasing improved predictive accuracy while maintaining calibration. We also conduct an ablation study to evaluate the significance of different components within our proposed method, as well as an in-depth analysis of the impact of the base model and different hyperparameters on predictive accuracy.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 軌道予測のためのインフォームドスペクトル正規化ガウス過程 Informed Spectral Normalized Gaussian Processes for Trajectory Prediction ( http://arxiv.org/abs/2403.11966v1 ) ライセンス: Link先を確認	Christian Schlauch, Christian Wirth, Nadja Klein,	(参考訳) パラメータの事前分布は、インフォームド学習のために専門家の事前知識や世界の知識を表現するエレガントな方法を提供する。従来の研究は、そのような情報的事前分布を用いて確率的深層学習(DL)モデルを正則化することで、その性能とデータ効率が向上することを示している。しかし、確率的DLモデルで一般的に用いられるサンプリングベースの近似は、複数の推論パスと長いトレーニング時間を必要とするため、計算コストがかかる可能性がある。有望な代替手段は、スペクトル正規化ガウス過程(SNGP)のような計算効率のよい最終層カーネル近似である。本稿では,以前のタスクから学習した事前知識を表す情報的事前分布の利用を可能にする,正則化に基づく新しいSNGPの継続学習手法を提案する。提案手法は確立された手法に基づいており,リハーサルメモリやパラメータ拡張を必要としない。本研究では, 走行可能性に関する事前知識を統合することで、インフォームドSNGPモデルを自動運転における軌道予測問題に適用する。2つの公開データセットにおいて、トレーニングデータが減少する状況およびロケーション間での性能を調査し、非インフォームドベースラインおよびインフォームドベースラインに対するデータ効率とロケーション転移に対するロバスト性の向上を実証する。 Prior parameter distributions provide an elegant way to represent prior expert and world knowledge for informed learning. Previous work has shown that using such informative priors to regularize probabilistic deep learning (DL) models increases their performance and data-efficiency. However, commonly used sampling-based approximations for probabilistic DL models can be computationally expensive, requiring multiple inference passes and longer training times. Promising alternatives are compute-efficient last layer kernel approximations like spectral normalized Gaussian processes (SNGPs). We propose a novel regularization-based continual learning method for SNGPs, which enables the use of informative priors that represent prior knowledge learned from previous tasks. Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion. We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge. On two public datasets, we investigate its performance under diminishing training data and across locations, and thereby demonstrate an increase in data-efficiency and robustness to location-transfers over non-informed and informed baselines.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 強相互作用する超伝導回路格子におけるサイト分解電流の探索 Probing Site-Resolved Current in Strongly Interacting Superconducting Circuit Lattices ( http://arxiv.org/abs/2403.11967v1 ) ライセンス: Link先を確認	Botao Du, Ramya Suresh, Santiago López, Jeremy Cadiente, Ruichao Ma,	(参考訳) 輸送測定は、超伝導から分数量子ホール効果まで、凝縮物質現象を理解するための基礎となる。同様に、これらは量子シミュレーターで合成量子物質を探索するための強力なツールとなり得る。ここでは超伝導回路格子内のその場粒子電流の測定を実演し、コヒーレントな格子とバス結合格子の両方における輸送の研究に応用する。本手法は,二重井戸ポテンシャルにおける制御されたトンネリングを用いて電流をオンサイト密度にマッピングし,サイト分解された電流と電流統計を明らかにする。強相互作用するBose-Hubbard格子を異なる格子充填で準備し、多体状態が超流動からモット絶縁体へ遷移するにつれて電流統計が変化する様子を観測する。さらに、調整可能な粒子源およびドレインとして機能する、設計された駆動散逸バスに格子を結合させることにより、非平衡電流ダイナミクスを調べる。離散的な伝導チャネルにおける定常電流と、相互作用に支援された輸送を観測する。これらの結果は超伝導回路における微視的量子輸送を研究するための多用途プラットフォームを確立する。 Transport measurements are fundamental for understanding condensed matter phenomena, from superconductivity to the fractional quantum Hall effect. Analogously, they can be powerful tools for probing synthetic quantum matter in quantum simulators. Here we demonstrate the measurement of in-situ particle current in a superconducting circuit lattice and apply it to study transport in both coherent and bath-coupled lattices. Our method utilizes controlled tunneling in a double-well potential to map current to on-site density, revealing site-resolved current and current statistics. We prepare a strongly interacting Bose-Hubbard lattice at different lattice fillings, and observe the change in current statistics as the many-body states transition from superfluid to Mott insulator. Furthermore, we explore non-equilibrium current dynamics by coupling the lattice to engineered driven-dissipative baths that serve as tunable particle source and drain. We observe steady-state current in discrete conduction channels and interaction-assisted transport. These results establish a versatile platform to investigate microscopic quantum transport in superconducting circuits.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# クラシファイアフリーガイダンスを用いた条件付き拡散モデルの解明:シャープな統計理論 Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory ( http://arxiv.org/abs/2403.11968v1 ) ライセンス: Link先を確認	Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen,	(参考訳) 条件付き拡散モデルは現代の画像合成の基礎となり、計算生物学や強化学習などの分野に広く応用されている。これらの応用において、条件付き拡散モデルには、様々な条件情報、例えばプロンプト入力が組み込まれ、サンプル生成を所望の特性に導く。経験的成功にもかかわらず、条件付き拡散モデルの理論はほとんど欠落している。本稿では, 条件付き拡散モデルを用いた分布推定のシャープな統計理論を提示することにより, このギャップを埋める。解析の結果,データ分布の滑らかさに適応し,ミニマックス下界に一致するサンプル複雑性の上界が得られた。我々の理論展開の鍵は条件付きスコア関数の近似結果にあり、これは新しい拡散テイラー近似手法に依拠している。さらに,強化学習におけるモデルベース遷移カーネル推定,逆問題の求解,報酬条件付きサンプル生成など,多種多様な応用における条件付き拡散モデルの性能を解明するうえで、この統計理論が有用であることを示した。 Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompt input, to guide the sample generation towards desired properties. Despite the empirical success, theory of conditional diffusion models is largely missing. This paper bridges this gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models. Our analysis yields a sample complexity bound that adapts to the smoothness of the data distribution and matches the minimax lower bound. The key to our theoretical development lies in an approximation result for the conditional score function, which relies on a novel diffused Taylor approximation technique. Moreover, we demonstrate the utility of our statistical theory in elucidating the performance of conditional diffusion models across diverse applications, including model-based transition kernel estimation in reinforcement learning, solving inverse problems, and reward conditioned sample generation.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 深い熱化とヒルベルト空間エルゴード性における最大エントロピー原理 A Maximum Entropy Principle in Deep Thermalization and in Hilbert-Space Ergodicity ( http://arxiv.org/abs/2403.11970v1 ) ライセンス: Link先を確認	Daniel K. Mark, Federica Surace, Andreas Elben, Adam L. Shaw, Joonhee Choi, Gil Refael, Manuel Endres, Soonwon Choi,	(参考訳) 量子多体系において自然に現れる純粋状態のアンサンブルによって示される普遍的な統計特性について報告する。具体的には、次の2つのクラスの状態アンサンブルを考える。i) ユニタリ発展の下での量子状態の時間軌道によって形成されるもの、ii) 補系に対して行われる部分的かつ局所的な射影測定によって得られる小サブシステムの量子状態によって形成されるもの。これらのケースはそれぞれ「ヒルベルト空間エルゴード性」と「深熱化」の現象を例示している。どちらの場合も、結果として得られるアンサンブルは単純な原理によって定義される。すなわち、純粋状態の分布は、エネルギー保存のような制約や熱化によって課される実効的な制約の下で最大エントロピーを持つ。我々は、アンサンブルの全ての統計モーメントに対して明示的な公式を導出し、広く受け入れられた仮定の下でそのような普遍性の必要十分条件を証明し、実験における測定可能な帰結を記述することによって、この原理の定量的なシグネチャを提示し、数値的に検証する。我々はさらに、この普遍性の情報理論的含意について論じる。我々のアンサンブルは最大の情報量を持ちながら、調べることが最大限に困難であり、自然界に現れる一般的な量子状態アンサンブルが情報を可能な限り強く隠す(スクランブルする)ことを立証する。この結果は、ヒルベルト空間エルゴード性の概念を時間に依存しないハミルトニアン力学へ、深熱化の概念を無限温度から有限の有効温度へと一般化する。我々の研究は、統計および情報理論のツールを用いて量子力学の普遍的挙動を特徴づけ、理解するための新しい視点を提示している。 We report universal statistical properties displayed by ensembles of pure states that naturally emerge in quantum many-body systems. Specifically, two classes of state ensembles are considered: those formed by i) the temporal trajectory of a quantum state under unitary evolution or ii) the quantum states of small subsystems obtained by partial, local projective measurements performed on their complements. These cases respectively exemplify the phenomena of "Hilbert-space ergodicity" and "deep thermalization." In both cases, the resultant ensembles are defined by a simple principle: the distributions of pure states have maximum entropy, subject to constraints such as energy conservation, and effective constraints imposed by thermalization. We present and numerically verify quantifiable signatures of this principle by deriving explicit formulae for all statistical moments of the ensembles; proving the necessary and sufficient conditions for such universality under widely-accepted assumptions; and describing their measurable consequences in experiments. We further discuss information-theoretic implications of the universality: our ensembles have maximal information content while being maximally difficult to interrogate, establishing that generic quantum state ensembles that occur in nature hide (scramble) information as strongly as possible. Our results generalize the notions of Hilbert-space ergodicity to time-independent Hamiltonian dynamics and deep thermalization from infinite to finite effective temperature. Our work presents new perspectives to characterize and understand universal behaviors of quantum dynamics using statistical and information theoretic tools.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# ヒルベルト空間エルゴード性からの普遍的ゆらぎと雑音学習 Universal fluctuations and noise learning from Hilbert-space ergodicity ( http://arxiv.org/abs/2403.11971v1 ) ライセンス: Link先を確認	Adam L. Shaw, Daniel K. Mark, Joonhee Choi, Ran Finkelstein, Pascal Scholl, Soonwon Choi, Manuel Endres,	(参考訳) 熱平衡に達するシステムはユビキタスである。古典系では、この現象は一般に位相空間のエルゴード性を通じて統計的に理解されるが、これを量子系へ翻訳することは長年の関心事である。最近では、孤立した大域的量子状態が利用可能な状態空間を一様に探索するという、ヒルベルト空間エルゴード性と呼ばれる量子版のエルゴード性の概念が提案されている。ここでは、実験的なRydberg量子シミュレータと様々な数値モデルを用いてこのプロセスのシグネチャを観測し、その後、環境と相互作用する局所量子系の場合へと一般化する。環境が相補的なサブシステムである閉じた系では、バス(浴)のサイズが大きくなるにつれて、観測量が大きな量子ゆらぎから小さなガウス的ゆらぎへと移行する、滑らかな量子-古典遷移を予測し観測する。この遷移は、有限温度の系、遍歴粒子を持つ系、ランダム回路を含む幅広い系の間で定量的なレベルで普遍的である。次に、外部環境とノイズを伴って相互作用する開放系の場合を考察する。相関誤差を含むほぼ任意のノイズチャネルの下での観測量の統計を予測し、これにより連続的なハミルトニアン時間発展とデジタルランダム回路の両方において候補となる誤差モデルを識別できる。最終的に、我々の結果は量子力学におけるエルゴード性の役割を明らかにし、基礎的にも実用的にも重要な帰結をもたらす。 Systems reaching thermal equilibrium are ubiquitous. For classical systems, this phenomenon is typically understood statistically through ergodicity in phase space, but translating this to quantum systems is a long-standing problem of interest. Recently a quantum notion of ergodicity has been proposed, namely that isolated, global quantum states uniformly explore their available state space, dubbed Hilbert-space ergodicity. Here we observe signatures of this process with an experimental Rydberg quantum simulator and various numerical models, before generalizing to the case of a local quantum system interacting with its environment. For a closed system, where the environment is a complementary subsystem, we predict and observe a smooth quantum-to-classical transition in that observables progress from large, quantum fluctuations to small, Gaussian fluctuations as the bath size grows. This transition is universal on a quantitative level amongst a wide range of systems, including those at finite temperature, those with itinerant particles, and random circuits. Then, we consider the case of an open system interacting noisily with an external environment. We predict the statistics of observables under largely arbitrary noise channels including those with correlated errors, allowing us to discriminate candidate error models both for continuous Hamiltonian time evolution and for digital random circuits. Ultimately our results clarify the role of ergodicity in quantum dynamics, with fundamental and practical consequences.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 量子場論における量子参照フレーム、測定スキームおよび局所代数の型 Quantum reference frames, measurement schemes and the type of local algebras in quantum field theory ( http://arxiv.org/abs/2403.11973v1 ) ライセンス: Link先を確認	Christopher J. Fewster, Daan W. Janssen, Leon Deryck Loveridge, Kasia Rejzner, James Waldron,	(参考訳) 本研究では、相対論的量子測定理論と量子参照フレーム(QRF)を組み合わせ、対称性を持つ背景上の量子場の局所的な測定をQRFに対して相対的に行う操作的フレームワークを開発する。これにより、時空の等長変換群の自然な作用の下で不変である、量子場と参照フレームの観測量の結合代数が得られる。量子参照フレームの適切なクラスに対して、この代数は接合積によってパラメータ化される。量子場が良い熱的性質(ある非零温度でのKMS状態の存在によって表される)を持つならば、モジュラー理論を用いて不変代数が半有限トレースを持つことを示すことができる。さらに、量子参照フレームが同じ温度で(KMSウェイトの存在によって表される)良好な熱的挙動を持つ場合、このトレースは有限である。物理的観測量の不変代数が $\textnormal{II}_1$ 型因子となるための正確な条件を与える。この結果はChandrasekaran, Longo, Penington and Witten [JHEP 2023, 82 (2023)] の最近の研究に基づいており、これらの知見の重要な数学的一般化と、そのモデルのより洗練された操作的理解の両方を与える。 We develop an operational framework, combining relativistic quantum measurement theory with quantum reference frames (QRFs), in which local measurements of a quantum field on a background with symmetries are performed relative to a QRF. This yields a joint algebra of quantum-field and reference-frame observables that is invariant under the natural action of the group of spacetime isometries. For the appropriate class of quantum reference frames, this algebra is parameterised in terms of crossed products. Provided that the quantum field has good thermal properties (expressed by the existence of a KMS state at some nonzero temperature), one can use modular theory to show that the invariant algebra admits a semifinite trace. If furthermore the quantum reference frame has good thermal behaviour (expressed by the existence of a KMS weight) at the same temperature, this trace is finite. We give precise conditions for the invariant algebra of physical observables to be a type $\textnormal{II}_1$ factor. Our results build upon recent work of Chandrasekaran, Longo, Penington and Witten [JHEP 2023, 82 (2023)], providing both a significant mathematical generalisation of these findings and a refined operational understanding of their model.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# OUCopula:OU-UWF画像に基づく近視スクリーニング用バイチャネルマルチラベルコピュラ拡張アダプタベースCNN OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images ( http://arxiv.org/abs/2403.11974v1 ) ライセンス: Link先を確認	Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherien C. Liu, Xingtao Zhou,	(参考訳) 最先端の超広視野(UWF)眼底撮影を用いた近視スクリーニングは、眼科的転帰にとって重要となる可能性がある。現在,眼科と深層学習(DL)の学際的な研究は,主に単眼画像を用いた疾患分類と診断に重点を置いており,Oculus Uterque (OU, 両眼)の同時モデリングと予測をほとんど無視している。OU間の複雑な関係と、(連続値の)結果ラベル(等価球面度数と眼軸長)の間の高い相関に着想を得て、複数の臨床スコアを同時予測するための、OU UWF眼底画像を用いたコピュラ拡張アダプタ畳み込みニューラルネットワーク(CNN)学習の枠組み(OUCopula)を提案する。我々は,(1)高い相関性と不均一性の両方を持つ2チャネル画像入力を(同一のバックボーンネットワークを共有し,アダプタを用いてチャネル単位の差分をパラメータ化することにより)受け取ることができ,(2)連続出力ラベル間の相関情報を(コピュラを用いて)組み込むことができる、新しい2チャネルマルチラベルCNNを設計する。実験により、OUCopulaはバックボーンモデルと比較して、近視スコア予測において満足な性能を発揮することを示す。さらに、OUCopulaは単眼入力用に構築されたモデルの性能をはるかに上回ることができる。また,本研究は、2チャネルモデルをマルチチャネルパラダイムに拡張できる可能性や,OUCopulaが様々なバックボーンCNNにまたがって一般化できる可能性も示唆している。 Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN that can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 非拘束3次元運動モデルを用いた単眼カメラによる歩行者追跡 Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model ( http://arxiv.org/abs/2403.11978v1 ) ライセンス: Link先を確認	Jan Krejčí, Oliver Kost, Ondřej Straka, Jindřich Duník,	(参考訳) 歩行者追跡のための第一原理単目的モデルを提案する。移動物体の広さは、歩行者の高さなどの3次元の既知の統計によって説明できると仮定される。提案したモデルでは3次元の物体の動きを共通の地上面に制約する必要はない。このモデルのための非線形フィルタは、無人カルマンフィルタ(UKF)を用いて実装され、公開されているMOT-17データセットを用いてテストされる。提案手法は, 2次元画像に投影された場合, 完全な結果を維持しつつ, 3次元で有望な結果が得られる。さらに、推定誤差の共分散は真に一致する。従来の手法とは異なり、導入されたモデルパラメータは便利な意味を持ち、問題に対して容易に調整できる。 A first-principle single-object model is proposed for pedestrian tracking. It is assumed that the extent of the moving object can be described via known statistics in 3D, such as pedestrian height. The proposed model thus need not constrain the object motion in 3D to a common ground plane, which is usual in 3D visual tracking applications. A nonlinear filter for this model is implemented using the unscented Kalman filter (UKF) and tested using the publicly available MOT-17 dataset. The proposed solution yields promising results in 3D while maintaining perfect results when projected into the 2D image. Moreover, the estimation error covariance matches the true one. Unlike conventional methods, the introduced model parameters have convenient meaning and can readily be adjusted for a problem.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
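As a hedged illustration of the unscented transform that underlies the UKF mentioned in the abstract above, the sketch below propagates a Gaussian depth estimate through the pinhole relation between a pedestrian's physical height and its height in the image. The focal length, pedestrian height, and scaling parameter `kappa` are illustrative assumptions rather than values from the paper, and this is not the authors' filter.

```python
import math

FOCAL_PX = 1000.0  # hypothetical focal length in pixels
HEIGHT_M = 1.7     # hypothetical pedestrian height in metres


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using three sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


def image_height(depth_m):
    """Pinhole projection: image height (pixels) of an object HEIGHT_M tall at depth_m."""
    return FOCAL_PX * HEIGHT_M / depth_m


if __name__ == "__main__":
    # Depth believed to be 10 m with variance 1 m^2.
    print(unscented_transform_1d(10.0, 1.0, image_height))
```

Because the projection is nonlinear in depth, the propagated mean differs from simply projecting the mean depth; for a linear function the transform reproduces the exact mean and variance.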
# クリーンラベルの毒殺対策としての拡散脱臭 Diffusion Denoising as a Certified Defense against Clean-label Poisoning ( http://arxiv.org/abs/2403.11981v1 ) ライセンス: Link先を確認	Sanghyun Hong, Nicholas Carlini, Alexey Kurakin,	(参考訳) クリーンラベル中毒に対する認証された防御策を提示する。これらの攻撃は、トレーニングデータに$p$-normの有界対向摂動を含む少数の毒サンプル(例、1%)を注入して、テストタイム入力のターゲットの誤分類を誘導する。 $denoized$$smoothing$によって達成された対向ロバスト性に着想を得て、オフ・ザ・シェルフ拡散モデルが、改ざんしたトレーニングデータをどのように浄化するかを示す。 7件のクリーンラベル中毒に対する我々の防御を広範囲に検証し、その攻撃成功率を0-16%に抑え、テスト時間の精度は無視できない程度に低下した。我々は,我々の防衛をクリーンラベル中毒に対する既存の対策と比較し,攻撃成功率を最も低くし,最良のモデルユーティリティを提供することを示す。以上の結果から,より強力なクリーンラベル攻撃の開発に向けた今後の取り組みの必要性と,これらの攻撃を評価するための強力なベースラインとして,我々の認定された実用的防御を活用できることが浮き彫りにされた。 We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 生成テキストモデルを用いた学生による授業評価のための定性的コードブックの作成 Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching ( http://arxiv.org/abs/2403.11984v1 ) ライセンス: Link先を確認	Andrew Katz, Mitchell Gerhardt, Michelle Soledad,	(参考訳) フィードバックは改善の重要な側面です。残念なことに、複数のソースからの多くのフィードバックがある場合、情報を実用的な洞察に抽出することは困難です。教育者にとって重要なフィードバック源である教育評価(SET)について考察する。授業中の動作についてインストラクターに洞察を与えることができる。 SETのコレクションは、管理者がコースやプログラム全体の信号として役立つ。しかし、数年間にわたる高学歴や行政記録のように大規模に、SETの量は分析を困難にしている。本稿では,自然言語処理(NLP)と大規模言語モデル(LLM)を用いたSETの解析手法について述べる。大規模公立大学から5,000SETのコーパスに適用し,本手法を実証する。提案手法は,SETを抽出,埋め込み,クラスタ化,要約して表現するテーマを識別するために利用できることを示す。より一般的に、この研究はNLP技術とLLMを組み合わせてSETのコードブックを生成する方法を示している。本稿では,本手法が授業や研究環境において,SETやその他の学生の書き方を分析することの意義について論じる。 Feedback is a critical aspect of improvement. Unfortunately, when there is a lot of feedback from multiple sources, it can be difficult to distill the information into actionable insights. Consider student evaluations of teaching (SETs), which are important sources of feedback for educators. They can give instructors insights into what worked during a semester. A collection of SETs can also be useful to administrators as signals for courses or entire programs. However, on a large scale as in high-enrollment courses or administrative records over several years, the volume of SETs can render them difficult to analyze. In this paper, we discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs). We demonstrate the method by applying it to a corpus of 5,000 SETs from a large public university. We show that the method can be used to extract, embed, cluster, and summarize the SETs to identify the themes they express. More generally, this work illustrates how to use the combination of NLP techniques and LLMs to generate a codebook for SETs. We conclude by discussing the implications of this method for analyzing SETs and other types of student writing in teaching and research settings.	
翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# GetMesh: 高品質なメッシュ生成と操作のための制御可能なモデル GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation ( http://arxiv.org/abs/2403.11990v1 ) ライセンス: Link先を確認	Zhaoyang Lyu, Ben Fei, Jinyi Wang, Xudong Xu, Ya Zhang, Weidong Yang, Bo Dai,	(参考訳) Meshは様々な産業アプリケーションにおける3Dアセットの基本的な表現であり、プロのソフトウェアによって広く支持されている。しかし、その不規則な構造のため、メッシュの生成と操作はしばしば時間と労力がかかる。本稿では,メッシュ生成と異なるカテゴリ間の操作のための,高度に制御可能な生成モデルGetMeshを提案する。さまざまなポイントを潜在表現として取り、それらをトリプレーン表現として再編成することで、GetMeshはリッチでシャープな詳細を持つメッシュを生成し、単一カテゴリとマルチカテゴリの両方を上回るパフォーマンスを実現している。さらに、グローバル/ローカルメッシュトポロジの変更、メッシュ部品の追加/削除、カテゴリ間のメッシュ部品の結合といった、従来のメッシュ生成モデルでは達成できなかった生成プロセスのきめ細かい制御も、遅延点の数、位置、特徴を調整することで、直感的に、効率よく、堅牢に実現できる。プロジェクトページは https://getmesh.github.io。 Mesh is a fundamental representation of 3D assets in various industrial applications, and is widely supported by professional software. However, due to its irregular structure, mesh creation and manipulation are often time-consuming and labor-intensive. In this paper, we propose a highly controllable generative model, GetMesh, for mesh generation and manipulation across different categories. By taking a varying number of points as the latent representation, and re-organizing them as a triplane representation, GetMesh generates meshes with rich and sharp details, outperforming both single-category and multi-category counterparts. Moreover, it also enables fine-grained control over the generation process that previous mesh generative models cannot achieve, where changing global/local mesh topologies, adding/removing mesh parts, and combining mesh parts across categories can be intuitively, efficiently, and robustly accomplished by adjusting the number, positions or features of latent points. The project page is https://getmesh.github.io.	翻訳日:2024-03-20 19:20:58 公開日:2024-03-18
# 変分量子固有解法の成功指標としてのハミルトン-再構成距離 Hamiltonian-reconstruction distance as a success metric for the Variational Quantum Eigensolver ( http://arxiv.org/abs/2403.11995v1 ) ライセンス: Link先を確認	Leo Joon Il Moon, Mandar M. Sohoni, Michael A. Shimizu, Praveen Viswanathan, Kevin Zhang, Eun-Ah Kim, Peter L. McMahon,	(参考訳) 変分量子固有解法(VQE)は、量子シミュレーションのためのハイブリッド量子古典的アルゴリズムである。ハミルトンの基底状態を見つけるための他のヒューリスティックアルゴリズムと同様に、VQEの課題は、真の基底状態と基底状態エネルギーが未知のとき、アルゴリズムの出力解が真の基底状態にどの程度近いかを知ることである。これは、誤った早期終了を避けたいVQEのような反復アルゴリズムにおいて特に重要である。ハミルトニアン再構成の最近の発展 - 固有状態が与えられるハミルトニアンの推定 - は、ハミルトン固有解問題に対する変分解の質を評価するために、計量を与えることができる。この計量は、真の基底状態や基底状態エネルギーを知ることなく、基底状態への変分解の近接性を評価することができる。数値シミュレーションやクラウドベースのトラップイオン量子コンピュータでの実証では、一次元横フィールドイシング(11 qubits)と2次元J1-J2横フィールドイシング(6 qubits)のスピン問題の両方の場合、ハミルトン再構成距離は、VQEが基底状態を発見していないかどうかを示す有益な指標となる。我々の実験では、VQE反復の関数としてのエネルギープラトーがVQEアルゴリズムの誤った早期停止をもたらす可能性があるが、ハミルトン-再構成距離が正しく繰り返し続けることを示唆するケースを含む。ハミルトン-再構成距離は、VQE溶液と真の基底状態の間の忠実度と有用な相関関係を持つ。我々の研究は、ハミルトニアン-再構成距離が、実際にノイズの多い量子プロセッサを含むVQEの成功を評価するのに役立つことを示唆している。 The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm for quantum simulation that can be run on near-term quantum hardware. A challenge in VQE -- as well as any other heuristic algorithm for finding ground states of Hamiltonians -- is to know how close the algorithm's output solution is to the true ground state, when the true ground state and ground-state energy are unknown. This is especially important in iterative algorithms, such as VQE, where one wants to avoid erroneous early termination. Recent developments in Hamiltonian reconstruction -- the inference of a Hamiltonian given an eigenstate -- give a metric that can be used to assess the quality of a variational solution to a Hamiltonian-eigensolving problem. This metric can assess the proximity of the variational solution to the ground state without any knowledge of the true ground state or ground-state energy. 
In numerical simulations and in demonstrations on a cloud-based trapped-ion quantum computer, we show that for examples of both one-dimensional transverse-field-Ising (11 qubits) and two-dimensional J1-J2 transverse-field-Ising (6 qubits) spin problems, the Hamiltonian-reconstruction distance gives a helpful indication of whether VQE has found the ground state yet. Our experiments included cases where the energy plateaus as a function of the VQE iteration, which could have resulted in erroneous early stopping of the VQE algorithm, but where the Hamiltonian-reconstruction distance correctly suggests continuing to iterate. We find that the Hamiltonian-reconstruction distance has a useful correlation with the fidelity between the VQE solution and the true ground state. Our work suggests that the Hamiltonian-reconstruction distance may be a useful tool for assessing success in VQE, including on noisy quantum processors in practice.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# 生成的知識抽出、グラフベース表現、マルチモーダル・インテリジェントグラフ推論による科学的発見の高速化 Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning ( http://arxiv.org/abs/2403.11996v1 ) ライセンス: Link先を確認	Markus J. Buehler,	(参考訳) 生成人工知能 (AI) を用いて, 生物材料の領域における1000の科学論文を詳細なオントロジ知識グラフに変換し, その本質的に無スケールな性質を明らかにした。ノード類似度と相互中心性の組合せランキングに基づく異種概念間のグラフトラバースパス検出を用いて,クエリに応答し,知識のギャップを識別し,前例のない素材設計とその動作を提案する。ある比較では、生体材料とベートーヴェン第9交響楽団の詳細な構造的類似を明らかにし、同型写像を通して複雑さの共有パターンを強調した。このアルゴリズムはさらに、カンディンスキーのコンポジションVII(英語版)の絵画から抽出された原理とグラフサンプリングの結合合成を取り入れた革新的な階層的な菌糸体を創り出し、結果として得られる合成物はカオスと秩序のバランスを反映し、調整可能なポロシティ、機械的強度、複雑なパターン化された化学機能化などの特徴を持つ。我々は、物理的、生物学的、芸術的な領域にまたがる他の同型を解明し、ポストモダン哲学に共鳴する不純物と物質フラックスのニュアンスなオントロジーを明らかにし、これらの相互接続を階層的な枠組みに配置する。本研究は,従来の階層的パラダイムを超越した実体の動的,文脈依存的な相互作用を明らかにし,個々の構成要素の意義とシステム内のゆらぎ的関係を強調した。我々の予測は、従来の生成型AI手法よりもはるかに高い斬新さ、技術的詳細、爆発能力を達成する。このアプローチは、発見を容易にする隠れた接続を明らかにすることによって、イノベーションのための広く有用なフレームワークを確立する。 Using generative Artificial Intelligence (AI), we transformed a set of 1,000 scientific papers in the area of biological materials into detailed ontological knowledge graphs, revealing their inherently scale-free nature. Using graph traversal path detection between dissimilar concepts based on combinatorial ranking of node similarity and betweenness centrality, we reveal deep insights into unprecedented interdisciplinary relationships that can be used to answer queries, identify gaps in knowledge, and propose never-before-seen material designs and their behaviors. One comparison revealed detailed structural parallels between biological materials and Beethoven's 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. 
The algorithm further created an innovative hierarchical mycelium-based composite that incorporates joint synthesis of graph sampling with principles extracted from Kandinsky's Composition VII painting, where the resulting composite reflects a balance of chaos and order, with features like adjustable porosity, mechanical strength, and complex patterned chemical functionalization. We uncover other isomorphisms across physical, biological, and artistic spheres, revealing a nuanced ontology of immanence and material flux that resonates with postmodern philosophy, and positions these interconnections within a heterarchical framework. Our findings reveal the dynamic, context-dependent interplay of entities beyond traditional hierarchical paradigms, emphasizing the significant role of individual components and their fluctuative relationships within the system. Our predictions achieve a far higher degree of novelty, technical detail and explorative capacity than conventional generative AI methods. The approach establishes a widely useful framework for innovation by revealing hidden connections that facilitate discovery.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# リカレントニューラルネットワーク重み行列の有用な表現法 Learning Useful Representations of Recurrent Neural Network Weight Matrices ( http://arxiv.org/abs/2403.11998v1 ) ライセンス: Link先を確認	Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber,	(参考訳) リカレントニューラルネットワーク(Recurrent Neural Networks、RNN)は、汎用並列シーケンスコンピュータである。 RNNのプログラムはその重み行列である。下流タスクと同様に、RNN分析を容易にするRNN重みの有用な表現をどうやって学習するか? メカニスティックなアプローチは、その振る舞いを予測するためにRNNの重みを直接調べるが、機能主義的なアプローチは、その全体的な機能、特に入出力マッピングを分析する。我々は、RNN重みに対するいくつかの力学的アプローチを検討し、RNNに対して置換同変のDeep Weight Space層を適用する。我々の2つの新しい機能主義者は、入力を探索することでRNNの重みから情報を'インターロゲート'することで抽出する。機能主義的アプローチがRNNの振る舞いを決定するのに役立つリッチな表現を生成できる条件を示す理論的枠組みを開発する。 RNN重み表現学習のための最初の2つの'モデル動物園'データセットを作成し、リリースする。 1つは形式言語のクラスの生成モデルで構成され、もう1つは逐次処理されたMNIST桁の分類器である。我々は,エミュレーションに基づく自己教師付き学習技術を用いて,複数の下流アプリケーション上で異なるRNN重み符号化技術を比較し,評価する。もっとも難しいのは、RNNがトレーニングした正確なタスクを予測することであり、機能主義者のアプローチは明らかに優位性を示している。 Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality -- specifically, its input-output mapping. We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. We develop a theoretical framework that demonstrates conditions under which the functionalist approach can generate rich representations that help determine RNN behavior. We create and release the first two 'model zoo' datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other one of classifiers of sequentially processed MNIST digits. 
With the help of an emulation-based self-supervised learning technique we compare and evaluate the different RNN weight encoding techniques on multiple downstream applications. On the most challenging one, namely predicting which exact task the RNN was trained on, functionalist approaches show clear superiority.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# HIRI-ViT:高分解能入力を用いた拡張型視覚変換器 HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs ( http://arxiv.org/abs/2403.11999v1 ) ライセンス: Link先を確認	Ting Yao, Yehao Li, Yingwei Pan, Tao Mei,	(参考訳) Vision Transformer(ViT)とConvolution Neural Network(CNN)のハイブリッドディープモデルは、ビジョンタスクの強力なバックボーンクラスとして登場した。このようなハイブリッドバックボーンの入力解像度のスケールアップは、モデル容量を自然に強化するが、必然的に、二次的にスケールする重い計算コストに悩まされる。代わりに、HIgh-Resolution Inputs(HIRI-ViT)を組み込んだ新しいハイブリッドバックボーンを提案し、高解像度入力に適した4段のViTから5段のViTにアップグレードする。 HIRI-ViTは、典型的なCNN操作を2つの並列CNNブランチにコスト効率よく分解するという基本的な考え方に基づいている。 1つの高分解能分岐は入力として第一の高分解能特徴を直接取り込むが、畳み込み演算は少ない。他の低解像度ブランチは、まずダウンサンプリングを行い、その後、そのような低解像度機能に対してより畳み込み演算を利用する。認識タスク(ImageNet-1Kデータセット)と高密度予測タスク(COCOおよびADE20Kデータセット)の両方の実験は、HIRI-ViTの優位性を実証している。 HIRI-ViTは448$\times$448の入力でImageNet上で84.3%の最高のTop-1精度を実現し、224$\times$224の入力で、iFormer-Sの83.4%を0.9%改善した。 The hybrid deep models of Vision Transformer (ViT) and Convolution Neural Network (CNN) have emerged as a powerful class of backbones for vision tasks. Scaling up the input resolution of such hybrid backbones naturally strengthens model capacity, but inevitably suffers from heavy computational cost that scales quadratically. Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (namely HIRI-ViT), that upgrades the prevalent four-stage ViT to a five-stage ViT tailored for high-resolution inputs. HIRI-ViT is built upon the seminal idea of decomposing the typical CNN operations into two parallel CNN branches in a cost-efficient manner. One high-resolution branch directly takes primary high-resolution features as inputs, but uses fewer convolution operations. The other low-resolution branch first performs down-sampling and then utilizes more convolution operations over such low-resolution features. Experiments on both the recognition task (ImageNet-1K dataset) and dense prediction tasks (COCO and ADE20K datasets) demonstrate the superiority of HIRI-ViT. 
More remarkably, under comparable computational cost ($\sim$5.0 GFLOPs), HIRI-ViT achieves the best published Top-1 accuracy to date of 84.3% on ImageNet with 448$\times$448 inputs, an absolute improvement of 0.9% over the 83.4% of iFormer-S with 224$\times$224 inputs.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
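To make the cost argument in the HIRI-ViT abstract concrete, here is a minimal back-of-the-envelope sketch. It counts multiply-accumulate operations for plain 3x3 convolutions and compares a single high-resolution stack with a two-branch split (few convolutions at full resolution, more after down-sampling). The channel width, layer counts, and down-sampling factor are hypothetical choices for illustration, not the configuration used in the paper.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulate count of one stride-1, same-padding convolution layer."""
    return height * width * c_in * c_out * kernel * kernel


def single_branch_cost(side, channels, n_convs):
    """Cost of n_convs convolutions applied at full resolution (side x side)."""
    return n_convs * conv_macs(side, side, channels, channels)


def two_branch_cost(side, channels, n_high, n_low, down=2):
    """Cost of n_high full-resolution convs plus n_low convs after down-sampling."""
    high = n_high * conv_macs(side, side, channels, channels)
    low = n_low * conv_macs(side // down, side // down, channels, channels)
    return high + low


if __name__ == "__main__":
    # Doubling the input side length quadruples the cost of a fixed stack.
    print(single_branch_cost(448, 64, 4) / single_branch_cost(224, 64, 4))
    # Hypothetical split: 1 conv at 448x448 and 3 convs at 224x224.
    print(two_branch_cost(448, 64, 1, 3) / single_branch_cost(448, 64, 4))
```

Under these assumed layer counts, the split costs 1.75 convolution-equivalents at full resolution instead of 4, which is the kind of saving the two-branch decomposition is aiming for.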
# Notochord: リアルタイムMIDIパフォーマンスのための柔軟な確率モデル Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance ( http://arxiv.org/abs/2403.12000v1 ) ライセンス: Link先を確認	Victor Shepardson, Jack Armitage, Thor Magnusson,	(参考訳) 深層学習に基づく音楽データの確率論的モデルは、ますます現実的な結果を生み出し、多くの種類の創造的ワークフローに入ることを約束している。しかし、パフォーマンスの面ではほとんど研究されていないため、ユーザアクションの結果は通常、瞬時に感じるべきである。このような研究を可能にするために、構造化イベントのシーケンスの深い確率モデルであるNotochordを設計し、Lakh MIDIデータセット上でそのインスタンスをトレーニングした。我々の確率的定式化により、サブイベントレベルでの解釈可能な介入が可能となり、1つのモデルがステアブルジェネレーション、調和、機械即興、可能性に基づくインタフェースを含む多様なインタラクティブな音楽機能のためのバックボーンとして機能する。 NotochordはポリフォニックおよびマルチトラックMIDIを生成し、10ミリ秒未満のレイテンシで入力に応答する。トレーニングコード、モデルチェックポイント、インタラクティブな例がオープンソースソフトウェアとして提供されている。 Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# DreamMotion:ゼロショットビデオ編集のための時空間自己相似スコア蒸留 DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing ( http://arxiv.org/abs/2403.12002v1 ) ライセンス: Link先を確認	Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye,	(参考訳) テキスト駆動拡散に基づくビデオ編集は、実際の動きを確立するという、画像編集の文献で遭遇しない独特な課題を提示する。既存のビデオ編集手法とは異なり,本研究では,通常の逆拡散過程を回避し,すでに自然な動きを示すビデオから最適化を開始するために,スコア蒸留サンプリングに焦点を当てる。分析の結果, ビデオスコア蒸留は, ターゲットテキストで示される新しいコンテンツを効果的に導入できる一方で, 重要な構造や動きのずれを引き起こす可能性があることがわかった。これに対抗するために,本研究では,原ビデオと編集ビデオの時空間自己相似性をスコア蒸留中にマッチングすることを提案する。スコア蒸留の応用により,本手法はモデル非依存であり,カスケードおよび非カスケードビデオ拡散フレームワークにも適用可能である。先行手法との比較により,従来の構造と動きを正確に保ちながら外観を変化させる上で,その優位性を示す。 Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video score distillation can effectively introduce new content indicated by target text, it can also cause significant structure and motion deviation. To counteract this, we propose to match space-time self-similarities of the original video and the edited video during the score distillation. Thanks to the use of score distillation, our approach is model-agnostic, which can be applied for both cascaded and non-cascaded video diffusion frameworks. Through extensive comparisons with leading methods, our approach demonstrates its superiority in altering appearances while accurately preserving the original structure and motion.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
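The self-similarity matching idea in the DreamMotion abstract can be sketched in a few lines. The example below builds a cosine self-similarity matrix over a set of feature vectors (standing in for space-time features of a video) and penalises the mean squared difference between the matrices of two videos. The feature representation and the squared-error form of the penalty are assumptions made for illustration; the abstract does not specify the exact loss used by the authors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def self_similarity(features):
    """Pairwise cosine similarities among all feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]


def self_similarity_loss(feats_a, feats_b):
    """Mean squared difference between two self-similarity matrices of equal size."""
    sim_a, sim_b = self_similarity(feats_a), self_similarity(feats_b)
    n = len(sim_a)
    total = sum((sim_a[i][j] - sim_b[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


if __name__ == "__main__":
    original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    restyled = [[2.0, 0.0], [0.0, 3.0], [5.0, 5.0]]   # same directions, new magnitudes
    collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # structure destroyed
    print(self_similarity_loss(original, restyled))
    print(self_similarity_loss(original, collapsed))
```

A penalty of this shape ignores changes that leave the relations among features intact, while penalising edits that alter how locations relate to one another, which matches the stated goal of changing appearance while keeping structure and motion.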
# GenView: 自己指導型学習のための事前学習型生成モデルによるビュー品質向上 GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning ( http://arxiv.org/abs/2403.12003v1 ) ライセンス: Link先を確認	Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang,	(参考訳) 自己教師付き学習は、ラベルのないデータから高品質な表現を取得することに成功している。広範に採用されているコントラスト学習フレームワークは、同じ画像から生じるポジティブビュー間の距離を最小化し、不変表現を学習することを目的としている。しかし、既存の正のビューを構築する技術は手動の変換に強く依存しており、結果として多様性が限られ、潜在的に偽の正のペアが生まれる。これらの課題に対処するため、GenViewは、セマンティクスを保ちながら、事前学習された生成モデルのパワーを活用するポジティブビューの多様性を高める制御可能なフレームワークである。可変性を導入しながら本質的な意味の保存を確保するため,サンプリング中の雑音レベルを動的に調整する適応ビュー生成手法を開発した。さらに,前景の類似性と背景の多様性を両立させることにより,正の対の質を評価する品質駆動型コントラスト損失を導入する。この損失は、私たちが構築する高品質な正ペアを優先し、低品質なペアの影響を低減し、生成モデルやアグレッシブなデータ拡張によってもたらされる潜在的な意味的不整合を軽減します。肯定的なビュー品質の改善と品質主導のコントラスト損失のおかげで、GenViewはさまざまなタスクにわたる自己教師型学習を大幅に改善した。例えば、GenViewはImageNetの線形/半教師付き分類でMoCov2のパフォーマンスを2.5%/2.2%改善している。さらに、GenViewは、Laion400MやImageNet21KでImageNetデータセットをナレーション的に拡張するよりも、はるかに優れたパフォーマンスを実現している。コードはhttps://github.com/xiaojieli0903/genview.comから入手できる。 Self-supervised learning has achieved remarkable success in acquiring high-quality representations from unlabeled data. The widely adopted contrastive learning framework aims to learn invariant representations by minimizing the distance between positive views originating from the same image. However, existing techniques to construct positive views highly rely on manual transformations, resulting in limited diversity and potentially false positive pairs. To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics. We develop an adaptive view generation method that dynamically adjusts the noise level in sampling to ensure the preservation of essential semantic meaning while introducing variability. 
Additionally, we introduce a quality-driven contrastive loss, which assesses the quality of positive pairs by considering both foreground similarity and background diversity. This loss prioritizes the high-quality positive pairs we construct while reducing the influence of low-quality pairs, thereby mitigating potential semantic inconsistencies introduced by generative models and aggressive data augmentation. Thanks to the improved positive view quality and the quality-driven contrastive loss, GenView significantly improves self-supervised learning across various tasks. For instance, GenView improves MoCov2 performance by 2.5%/2.2% on ImageNet linear/semi-supervised classification. Moreover, GenView even performs much better than naively augmenting the ImageNet dataset with Laion400M or ImageNet21K. Code is available at https://github.com/xiaojieli0903/genview.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# 機械学習における信頼の可視化:2023年のフィールドの現状 Visualization for Trust in Machine Learning Revisited: The State of the Field in 2023 ( http://arxiv.org/abs/2403.12005v1 ) ライセンス: Link先を確認	Angelos Chatzimparmpas, Kostiantyn Kucher, Andreas Kerren,	(参考訳) 説明可能な信頼性のある機械学習のための可視化は、医療、金融、バイオインフォマティクスなど、さまざまな応用分野における情報可視化と視覚分析において、最も重要な研究分野の1つである。 2020年、200のテクニックからなる最先端のレポートの後、可視化技術に関する査読された論文を継続的に収集し、119のカテゴリからなる以前に確立された分類スキーマに基づいて分類し、オンラインサーベイブラウザで542のテクニックの収集を行った。本稿では,2023年秋以降のこのデータセットの新たな分析結果について報告し,機械学習における可視化利用に関するトレンド,洞察,8つのオープン課題について論じる。我々の結果は、過去3年間に機械学習モデルの信頼性を高めるための可視化技術の急成長傾向を裏付けるもので、可視化は一般的なモデル説明可能性の手法の改善や、新しいディープラーニングアーキテクチャのチェックに役立ちます。 Visualization for explainable and trustworthy machine learning remains one of the most important and heavily researched fields within information visualization and visual analytics with various application domains, such as medicine, finance, and bioinformatics. After our 2020 state-of-the-art report comprising 200 techniques, we have persistently collected peer-reviewed articles describing visualization techniques, categorized them based on the previously established categorization schema consisting of 119 categories, and provided the resulting collection of 542 techniques in an online survey browser. In this survey article, we present the updated findings of new analyses of this dataset as of fall 2023 and discuss trends, insights, and eight open challenges for using visualizations in machine learning. Our results corroborate the rapidly growing trend of visualization techniques for increasing trust in machine learning models in the past three years, with visualization found to help improve popular model explainability methods and check new deep learning architectures, for instance.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# SV3D:潜時ビデオ拡散を用いた単一画像からの新しい多視点合成と3次元生成 SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion ( http://arxiv.org/abs/2403.12008v1 ) ライセンス: Link先を確認	Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani,	(参考訳) 安定ビデオ3D(SV3D) - 3Dオブジェクトの周囲の高解像度・画像・マルチビュー生成のための潜時ビデオ拡散モデルを提案する。最近の3D生成技術は、新しいビュー合成(NVS)と3D最適化のために2D生成モデルを適応させる手法を提案する。しかし、これらの手法は、限られた視点や一貫性のないNVSのいずれかのため、いくつかの欠点があり、3次元オブジェクト生成の性能に影響を及ぼす。本研究では,新たな多視点合成と3D生成に画像間拡散モデルを適用するSV3Dを提案する。また,SV3DとそのNVS出力を画像から3D生成に利用するための改良された3D最適化手法を提案する。 2Dと3Dのメトリクスを持つ複数のデータセットの大規模な実験結果とユーザスタディは、SV3DのNVSにおける最先端のパフォーマンスと、以前の作業と比較して3D再構成を実証している。 We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation propose techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation. In this work, we propose SV3D that adapts image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS. We also propose improved 3D optimization techniques to use SV3D and its NVS outputs for image-to-3D generation. Extensive experimental results on multiple datasets with 2D and 3D metrics as well as user study demonstrate SV3D's state-of-the-art performance on NVS as well as 3D reconstruction compared to prior works.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# ビデオMV:大容量映像生成モデルに基づく連続マルチビュー生成 VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model ( http://arxiv.org/abs/2403.12010v1 ) ライセンス: Link先を確認	Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang,	(参考訳) テキストやシングルイメージのプロンプトに基づいてマルチビュー画像を生成することは、3Dコンテンツを作成する上で重要な機能である。このトピックに関する2つの基本的な質問は、トレーニングに使用するデータと、マルチビューの一貫性を保証する方法です。本稿では,両質問に基礎的貢献を行う新しい枠組みを紹介する。トレーニングのために2次元拡散モデルからの画像を利用するのと異なり、市販のビデオ生成モデルから微調整された密集した一貫した多視点生成モデルを提案する。映像生成モデルからのイメージは、フレームの一貫性を強制するために時間モジュールを使用するため、マルチビュー生成に適している。さらに、これらのモデルをトレーニングするために使用されるビデオデータセットは多種多様であり、列車の微調整領域のギャップを減らしている。マルチビューの整合性を高めるために,まずフィードフォワード再構成モジュールを用いてグローバルな3Dモデルを得る3D-Aware Denoising Samplingを導入し,次に,グローバルな3Dモデルから描画された画像をデノージングサンプリングループに効果的に巻き込むサンプリング戦略を適用し,最終画像のマルチビュー整合性を改善する。副産物として、このモジュールはまた、数秒で3Dガウスアンによって表される3Dアセットを作成する高速な方法を提供する。当社のアプローチでは24の濃密なビューを生成して,最先端のアプローチ(4GPU時間と数千GPU時間)よりもはるかに高速に,視覚的品質と一貫性を両立することが可能です。さらに微調整を行うことで、既存の最先端手法よりも定量的メトリクスと視覚効果の両面で優れる。プロジェクトページは aigc3d.github.io/MVMV。 Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. 
To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# HOIDiffusion:リアルな3Dハンドオブジェクトインタラクションデータを生成する HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data ( http://arxiv.org/abs/2403.12011v1 ) ライセンス: Link先を確認	Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang,	(参考訳) 3Dハンドオブジェクトのインタラクションデータは、データ収集プロセスのスケールアップにおけるハードウェア上の制約のため、ほとんどありません。本稿では,現実的かつ多様な3次元ハンドオブジェクトインタラクションデータを生成するためのHOIDiffusionを提案する。本モデルは,3次元手対象幾何学構造とテキスト記述を画像合成の入力として用いた条件拡散モデルである。これは、構造とスタイルの入力を非交互に指定できるため、より制御可能で現実的な合成を提供する。 HOIDiffusionは、大規模な自然画像と数枚の人間の実演で事前訓練された拡散モデルを活用することで訓練される。制御可能な画像合成以外にも、生成した3Dデータを用いて6次元オブジェクトのポーズ推定を学習し、認識システムの改善にその効果を示す。プロジェクトページ: https://mq-zhang1.github.io/HOIDiffusion 3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more controllable and realistic synthesis as we can specify the structure and style inputs in a disentangled manner. HOIDiffusion is trained by leveraging a diffusion model pre-trained on large-scale natural images and a few 3D human demonstrations. Beyond controllable image synthesis, we adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems. Project page: https://mq-zhang1.github.io/HOIDiffusion	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# リー群に対するKineetic Langevin Monte Carloの収束性 Convergence of Kinetic Langevin Monte Carlo on Lie groups ( http://arxiv.org/abs/2403.12012v1 ) ライセンス: Link先を確認	Lingkai Kong, Molei Tao,	(参考訳) リー群上で定義される関数を最適化するための明示的で運動量に基づく力学は、変分最適化や左自明化といった手法に基づいて最近構築された。我々は、ポテンシャル関数が多様体上に存在するにもかかわらず、運動量変数がユークリッドであるという利点を生かして、最適化力学をサンプリング力学に変換するために、トラクタブルノイズを適切に加える。次に,Lie群MCMCサンプリング器を提案し,その結果の速度論的ラージビン型サンプリングダイナミクスを微妙に判別する。リー群構造は、この離散化によって正確に保存される。連続力学と離散サンプリング器の両方に対する明示的な収束率を持つ指数収束は、W2距離の下で証明される。リー群のコンパクト性とポテンシャル関数の測地的L-滑らか性のみが必要である。我々の知る限りでは、これは曲線空間上での動力学ランゲヴィンの初めての収束結果であり、凸性を必要としない最初の定量的結果である。 Explicit, momentum-based dynamics for optimizing functions defined on Lie groups was recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the momentum variable is Euclidean despite that the potential function lives on a manifold. We then propose a Lie-group MCMC sampler, by delicately discretizing the resulting kinetic-Langevin-type sampling dynamics. The Lie group structure is exactly preserved by this discretization. Exponential convergence with explicit convergence rate for both the continuous dynamics and the discrete sampler are then proved under W2 distance. Only compactness of the Lie group and geodesically L-smoothness of the potential function are needed. To the best of our knowledge, this is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires no convexity or, at least not explicitly, any common relaxation such as isoperimetry.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
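As a toy illustration of the ingredients named in the abstract above (a Euclidean momentum variable, a position that lives on a Lie group, and a discretization that keeps the position on the group), the sketch below runs a kinetic-Langevin-type sampler on SO(2). The rotation is stored as (cos θ, sin θ) and updated by multiplying with the group exponential of the momentum. The Euler-type splitting, step size, friction, and the potential U(θ) = -cos θ are assumptions chosen for brevity; this is not the discretization analysed by the authors.

```python
import math
import random


def grad_potential(c, s):
    """Left-trivialized gradient of U(theta) = -cos(theta), i.e. sin(theta)."""
    return s


def kinetic_langevin_so2(steps, h=0.05, gamma=1.0, seed=0):
    """Sample rotations (cos, sin) with a simple kinetic Langevin scheme on SO(2)."""
    rng = random.Random(seed)
    c, s = 1.0, 0.0  # start at the identity rotation
    xi = 0.0         # momentum in the (one-dimensional, Euclidean) Lie algebra
    samples = []
    for _ in range(steps):
        # Momentum step: friction, force, and Gaussian noise.
        noise = math.sqrt(2.0 * gamma * h) * rng.gauss(0.0, 1.0)
        xi += -h * gamma * xi - h * grad_potential(c, s) + noise
        # Position step: g <- g * exp(h * xi), a product of two rotations.
        dc, ds = math.cos(h * xi), math.sin(h * xi)
        c, s = c * dc - s * ds, s * dc + c * ds
        samples.append((c, s))
    return samples


if __name__ == "__main__":
    draws = kinetic_langevin_so2(2000)
    print(max(abs(c * c + s * s - 1.0) for c, s in draws))
```

Because every position update is a product of rotations, the iterates stay on the group up to floating-point rounding, with no projection step needed.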
# GeoWizard: 単一画像からの3次元幾何推定のための拡散優先事項の解放 GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image ( http://arxiv.org/abs/2403.12013v1 ) ライセンス: Link先を確認	Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long,	(参考訳) 幾何学的属性,例えば深さ,正規度を単一画像から推定するための新しい生成基盤モデルであるGeoWizardを紹介する。この領域ではすでに重要な研究が行われているが、公開データセットの多様性と品質の低さにより、進歩は著しく制限されている。結果として、以前の作品は限られたシナリオに制約されるか、幾何学的詳細を捉えることができないことに悩まされる。本稿では、従来の識別モデル(例えば、CNN、トランスフォーマー)とは対照的に、生成モデルは本質的に不適切な問題に効果的に対処できることを実証する。さらに,拡散前処理の活用により,資源利用の一般化,詳細な保存,効率性が著しく向上することが示唆された。具体的には,従来の安定拡散モデルを拡張して,両表現間の相互情報交換と高整合性を実現する。より重要なことは、様々なシーンの複雑なデータ分布を異なるサブディストリビューションに分離する、単純かつ効果的な戦略を提案することである。この戦略により,我々のモデルは異なるシーンレイアウトを認識でき,顕著な忠実さで3次元幾何学を捉えることができる。 GeoWizardは、ゼロショット深度と通常の予測のための新しいベンチマークを設定し、3D再構成、2Dコンテンツ作成、新しい視点合成など、多くの下流アプリケーションを大幅に強化した。 We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenarios or suffer from the inability to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address the inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. Specifically, we extend the original stable diffusion model to jointly predict depth and normal, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. 
This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.	翻訳日:2024-03-20 19:11:08 公開日:2024-03-18
# EnvGen: 身体性エージェントを訓練するためのLLMによる環境の生成と適応 EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents ( http://arxiv.org/abs/2403.12014v1 ) ライセンス: Link先を確認	Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal,	(参考訳) インタラクションを通じた身体性学習のための近年のSOTAアプローチでは,環境における次のステップを決定するために,大規模言語モデル(LLM)を直接エージェントとして採用している。世界知識と推論能力のため、LLMエージェントは強化学習(RL)に基づく従来のより小さなエージェントよりも高い性能を達成するが、LLMを頻繁に呼び出すのは遅くて高価である。 LLMをエージェントとして直接利用する代わりに、LLMの推論能力を使用して、より小さな身体性RLエージェントが苦手とする有用なスキルを学ぶのに役立つトレーニング環境を適応的に作成できるだろうか? 本稿では,この問いに取り組むための新しいフレームワークであるEnvGenを提案する。まず LLM に,エージェントが異なるタスクを並列に素早く学習できるような訓練環境を生成するように促す。具体的には、LLMには、エージェントが学習すべきタスク記述とシミュレーターの目的が与えられ、その後、環境設定(例えば、異なる地形、エージェントに与えられるアイテムなど)のセットを生成するように要求される。次に、元の環境とLLM生成環境を混合した環境で小さなRLエージェントを訓練する。次に, エージェントのパフォーマンスという形でLLMにフィードバックを提供することにより,LLMが生成した環境を継続的に適応させ, エージェントが苦手とするスキルを徐々に向上させる。 Crafter および Heist 環境での総合的な実験により,EnvGen の有用性を実証する。我々は、EnvGenで訓練された小さなRLエージェントが、GPT-4エージェントを含むSOTAメソッドより優れており、長期的な(long-horizon)タスクをかなり高速に学習できることを発見した。我々は、LLMがトレーニング環境を適応させ、RLエージェントの苦手なスキルを時間とともに改善する様子を定性的に示す。加えて、EnvGen は LLM コールを少数(例えば、合計 4 回)しか使用しないのに対して、LLM エージェントは数千回の LLM コールを必要とするため、かなり効率的である。最後に、設計選択に関する詳細なアブレーション研究について述べる。 Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. First, we prompt an LLM to generate training environments that allow agents to quickly learn different tasks in parallel. 
Concretely, the LLM is given the task description and simulator objectives that the agents should learn and is then asked to generate a set of environment configurations (e.g., different terrains, items given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We show qualitatively how the LLM adapts training environments to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of LLM calls. Lastly, we present detailed ablation studies for our design choices.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 潜伏拡散蒸留による高速高分解能画像合成 Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation ( http://arxiv.org/abs/2403.12015v1 ) ライセンス: Link先を確認	Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach,	(参考訳) 拡散モデルは画像合成とビデオ合成の進歩の主要因であるが、推論速度の遅さに悩まされている。最近導入された逆拡散蒸留(ADD)のように、蒸留法は、固定された事前訓練されたDINOv2識別器に依存するため、高価で困難な最適化を犠牲にして、モデルを多段式から単段式にシフトすることを目的としている。 ADDの限界を克服する新しい蒸留法であるLADD(Latent Adversarial Diffusion Distillation)を導入する。ピクセルベースのADDとは対照的に、LADDは事前訓練された潜伏拡散モデルから生成的特徴を利用する。このアプローチは、訓練を単純化し、性能を向上し、高分解能マルチアスペクト比画像合成を可能にする。 LADDを安定拡散3 (8B) に適用し, 4つの無誘導サンプリングステップのみを用いて, 最先端のテキスト・画像生成装置の性能に適合する高速モデルSD3-Turboを得る。さらに,そのスケーリング動作を体系的に検討し,画像編集やインペイントなどの様々な応用においてLADDの有効性を示す。 Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD) aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach overcoming the limitations of ADD. In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models. This approach simplifies training and enhances performance, enabling high-resolution multi-aspect ratio image synthesis. We apply LADD to Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. Moreover, we systematically investigate its scaling behavior and demonstrate LADD's effectiveness in various applications such as image editing and inpainting.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 逆強化学習としての教師ありファインチューニング Supervised Fine-Tuning as Inverse Reinforcement Learning ( http://arxiv.org/abs/2403.12017v1 ) ライセンス: Link先を確認	Hao Sun,	(参考訳) 大規模言語モデル(LLM)のアライメントに対する一般的なアプローチは、通常、人間やAIのフィードバックに依存し、特定のタイプの嗜好データセットへのアクセスを前提としている。本研究では,このようなデータセットの有効性に疑問を呈し,専門家による実演とのアライメントがより現実的である様々なシナリオを探索する。実演データセットを用いてLLMをアライメントする問題を定式化するための逐次的意思決定フレームワークを構築した。逆強化学習と模倣学習から洞察を得た上で,LLMアライメントタスクにおけるダイバージェンス最小化のための様々なアプローチを提案する。分析では、これらの異なるアプローチの質量被覆(mass-covering)とモード探索(mode-seeking)の挙動を強調した。あわせて,古典的な教師あり微調整法の長所と短所を考察し,異なる方法が有効となるシナリオについて詳述した。 The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore various scenarios where alignment with expert demonstrations proves more realistic. We build a sequential decision-making framework to formulate the problem of aligning LLMs using demonstration datasets. Drawing insights from inverse reinforcement learning and imitation learning, we introduce various approaches for divergence minimization in the LLM alignment tasks. Our analysis highlights the mass-covering and mode-seeking behaviors of these different approaches. Inclusively, we examine the pros and cons of the classical supervised fine-tuning method, elaborating on scenarios where different methods shine.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 一般化された波多野・ネルソンモデルにおける任意の順序の例外点 Exceptional points of any order in a generalized Hatano-Nelson model ( http://arxiv.org/abs/2403.12018v1 ) ライセンス: Link先を確認	Julius T. Gohsrich, Jacob Fauman, Flore K. Kunst,	(参考訳) 例外点(EP)は真に非エルミート(NH)退化であり、行列が欠陥となる。そのようなEPの順序は、結合固有ベクトルの数によって与えられる。一方、ほとんどの研究は、$N\leq4$-dimensional NH Bloch Hamiltonians における$N$th-order EPの研究に焦点を当てている。一方, NHスキン効果を示すモデルでは, システムサイズでスケールする順序のEPの存在が指摘されている。本稿では,新しいタイプのEPを紹介し,システムサイズにスケールしない任意の順序でEPを実現する方法を提案する。より長距離ホッピングを持つパラダイム的ハタノ・ネルソンモデルの一般化版を導入する。この系に存在するEPは、顕著な物理的特徴を示す:それらの関連する固有状態は、いくつかの部位に局在し、NH皮膚効果を示す。さらに、EPはホッピング強度の一般的な摂動や、特定の形態のオンサイト障害に対して堅牢である。 Exceptional points (EPs) are truly non-Hermitian (NH) degeneracies where matrices become defective. The order of such an EP is given by the number of coalescing eigenvectors. On the one hand, most work focusses on studying $N$th-order EPs in $N\leq4$-dimensional NH Bloch Hamiltonians. On the other hand, some works have remarked on the existence of EPs of orders scaling with systems size in models exhibiting the NH skin effect. In this letter, we introduce a new type of EP and provide a recipe on how to realize EPs of arbitrary order not scaling with system size. We introduce a generalized version of the paradigmatic Hatano-Nelson model with longer-range hoppings. The EPs existing in this system show remarkable physical features: Their associated eigenstates are localized on a subset of sites and are exhibiting the NH skin effect. Furthermore, the EPs are robust against generic perturbations in the hopping strengths as well as against a specific form of on-site disorder.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# LN3Diff:高速3次元生成のためのスケーラブル潜在ニューラルネットワーク拡散 LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation ( http://arxiv.org/abs/2403.12019v1 ) ライセンス: Link先を確認	Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy,	(参考訳) ニューラルレンダリングの分野は、生成モデルと微分可能なレンダリング技術の進歩により、大きな進歩をみせた。 2次元拡散は成功したが、統一された3次元拡散パイプラインは依然として未解決のままである。本稿では,このギャップに対処し,高速で高品質で汎用的な条件付き3D生成を可能にするLN3Diffという新しいフレームワークを提案する。提案手法では,3次元アーキテクチャと可変オートエンコーダ(VAE)を用いて,入力画像を構造化されたコンパクトな3次元ラテント空間に符号化する。潜伏剤は、トランスフォーマーベースのデコーダによって、高容量の3Dニューラルフィールドに復号される。この3D対応潜伏空間上での拡散モデルをトレーニングすることにより,ShapeNetの3D生成における最先端性能を実現し,各データセットにおけるモノラルな3D再構成と条件付き3D生成において優れた性能を示す。さらに、既存の3次元拡散法を推論速度で上回り、インスタンスごとの最適化を必要としない。提案するLN3Diffは3次元生成モデリングの大幅な進歩を示し、3次元視覚およびグラフィックスタスクにおける様々な応用を約束する。 The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 4つの書記体系の探索と標準化によるホッキエン双方向翻訳の強化 Enhancing Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems ( http://arxiv.org/abs/2403.12024v1 ) ライセンス: Link先を確認	Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai,	(参考訳) 機械翻訳は主に高リソース言語(HRL)に重点を置いているが、台湾ホッキエンのような低リソース言語(LRL)は比較的研究が進んでいない。本研究は,台湾ホッキエンと,繁体字中国語(マンダリン)および英語との間の双方向翻訳モデルを開発することにより,このギャップを解消することを目的とする。台湾ホッキエン漢字と繁体字中国語の正書法的類似性を活用するために,繁体字中国語に特化した事前訓練済みのLLaMA2-7Bモデルを用いる。本研究の総合的な実験は,台湾ホッキエンの様々な書記体系間,および台湾ホッキエンと他のHRLとの間の翻訳タスクを含む。限定的な単言語コーパスの使用によっても,モデルの台湾ホッキエン能力がさらに向上することが判明した。そして、翻訳モデルを用いて、台湾ホッキエンのすべての書記体系をホッキエン漢字に標準化し、さらなる性能向上を実現した。さらに,逆翻訳とGPT-4を併用した評価手法を導入し,LRLにおいても信頼性の高い翻訳品質評価を実現する。この研究は台湾ホッキエンの資源ギャップを狭めることに寄与し、LLaMA 2に基づく事前学習と微調整の利点と限界を実証的に研究している。 Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. This study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Taiwanese Hokkien Han and Traditional Mandarin Chinese. Our comprehensive experiments involve translation tasks across various writing systems of Taiwanese Hokkien and between Taiwanese Hokkien and other HRLs. We find that the use of a limited monolingual corpus further improves the model's Taiwanese Hokkien capabilities. We then utilize our translation model to standardize all Taiwanese Hokkien writing systems into Hokkien Han, resulting in further performance improvements. Additionally, we introduce an evaluation method incorporating back-translation and GPT-4 to ensure reliable translation quality assessment even for LRLs. 
The study contributes to narrowing the resource gap for Taiwanese Hokkien and empirically investigates the advantages and limitations of pre-training and fine-tuning based on LLaMA 2.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 大規模言語モデルにおけるヘルスエクイティ・ハームとバイアスに対するツールボックス A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models ( http://arxiv.org/abs/2403.12025v1 ) ライセンス: Link先を確認	Stephen R. Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomasev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal, Preeti Singh, Akeiylah Dewitt, Philip Mansfield, Sushant Prakash, Katherine Heller, Alan Karthikesalingam, Christopher Semturs, Joelle Barral, Greg Corrado, Yossi Matias, Jamila Smith-Loud, Ivor Horn, Karan Singhal,	(参考訳) 大規模言語モデル(LLM)は、複雑な健康情報を提供するという大きな約束を持っているが、健康格差を悪化させる可能性がある。エクイティ関連モデル失敗の信頼性評価は、ヘルスエクイティを促進するシステムを開発するための重要なステップである。本研究は,医学的質問に対するLLM生成の長期的回答において,株式関連害を沈降させる可能性を秘めたバイアスを克服し,Med-PaLM 2を用いて経験的ケーススタディを実施し,その結果,これまでで最大の人的評価研究となった。 EquityMedQAは、手動で計算し、LLMで生成した質問を敵対的クエリに富んだ7つの新たにリリースしたデータセットの集合である。我々の人間評価フレームワークとデータセット設計プロセスは、反復的な参加的アプローチと、Med-PaLM 2の逆クエリに対するバイアスの可能性を検証している。実験的な研究を通じて,複数の評価ルーブリックデザインと多様なレーダグループを活用する徹底的な評価プロトコルと組み合わせることで,より狭い評価アプローチによって見逃される可能性のあるバイアスを表面化することを発見した。我々の経験は、多様な評価手法を使うことの重要性と、様々なバックグラウンドや専門知識のラウンダーを巻き込むことの重要性を浮き彫りにしている。我々は、我々のフレームワークが特定のバイアスの種類を特定することはできるが、AIシステムの展開が同等の健康結果を促進するかどうかを全体論的に評価することは十分ではないことを強調する。より広いコミュニティがこれらのツールや手法を活用して、誰もがアクセス可能で公平な医療を促進するLLMの共通の目標を実現することを願っています。 Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest human evaluation study in this area to date. 
Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven newly-released datasets comprising both manually-curated and LLM-generated questions enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of possible biases in Med-PaLM 2 answers to adversarial queries. Through our empirical study, we find that the use of a collection of datasets curated through a variety of methodologies, coupled with a thorough evaluation protocol that leverages multiple assessment rubric designs and diverse rater groups, surfaces biases that may be missed via narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. We emphasize that while our framework can identify specific forms of bias, it is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes. We hope the broader community leverages and builds on these tools and methods towards realizing a shared goal of LLMs that promote accessible and equitable healthcare for all.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# FlexCap: 画像にリッチ、ローカライズ、フレキシブルなキャプションを生成する FlexCap: Generating Rich, Localized, and Flexible Captions in Images ( http://arxiv.org/abs/2403.12026v1 ) ライセンス: Link先を確認	Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar,	(参考訳) 様々な長さの領域固有の記述を生成できる汎用的な$\textit{flexible-captioning}$ Vision-Language Model (VLM)を導入する。モデルであるFlexCapは、入力バウンディングボックスのための長さ条件付きキャプションを生成するように訓練されており、これにより、簡潔なオブジェクトラベルから詳細なキャプションまで、その出力の情報密度を制御できる。これを実現するために、キャプション付き画像から、長さの異なる画像領域記述の大規模なトレーニングデータセットを作成する。この柔軟なカプセル化機能には、いくつかの価値のある応用がある。まず、FlexCapはVisual Genomeデータセットの高密度キャプションタスクにおいて優れたパフォーマンスを示す。第二に、視覚的質問応答(VQA)システムはFlexCapを利用して、大きな言語モデルへの入力として局所化された記述を生成することができる。得られたシステムは、多数のVQAデータセット上で最先端のゼロショット性能を達成する。また、FlexCapを使った$\textit{localize-then-describe}$アプローチは、他のVLMによる$\textit{describe-then-localize}$アプローチよりも、オープンなオブジェクト検出に優れていることを示す。我々は,プレフィックス条件付けによって様々な視覚情報を抽出するFlexCapの特徴を強調した。最後に、画像ラベリング、オブジェクト属性認識、ビジュアルダイアログといったタスクにおいてFlexCapの幅広い適用性を質的に示す。プロジェクトWebページ: https://flex-cap.github.io 。 We introduce a versatile $\textit{flexible-captioning}$ vision-language model (VLM) capable of generating region-specific descriptions of varying lengths. The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions. To achieve this we create large-scale training datasets of image region descriptions of varying length, starting from captioned images. This flexible-captioning capability has several valuable applications. First, FlexCap demonstrates superior performance in dense captioning tasks on the Visual Genome dataset. Second, a visual question answering (VQA) system can be built by employing FlexCap to generate localized descriptions as inputs to a large language model. The resulting system achieves state-of-the-art zero-shot performance on a number of VQA datasets. 
We also demonstrate that a $\textit{localize-then-describe}$ approach with FlexCap can be better at open-ended object detection than a $\textit{describe-then-localize}$ approach with other VLMs. We highlight a novel characteristic of FlexCap, which is its ability to extract diverse visual information through prefix conditioning. Finally, we qualitatively demonstrate FlexCap's broad applicability in tasks such as image labeling, object attribute recognition, and visual dialog. Project webpage: https://flex-cap.github.io .	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# Pixelsからインサイトへ:大規模基盤モデルの時代における自動チャート理解に関する調査 From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models ( http://arxiv.org/abs/2403.12027v1 ) ライセンス: Link先を確認	Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji,	(参考訳) グラフ形式のデータの可視化は、データ分析において重要な役割を担い、重要な洞察を提供し、情報的な意思決定を支援する。自動チャート理解は、近年の大規模基盤モデルの台頭とともに、大きな進歩をみせている。大規模言語モデル(LLM)のような基礎モデルは、様々な自然言語処理(NLP)タスクに革命をもたらし、チャート理解タスクにますます応用されている。本稿では,これらの基礎モデルの文脈におけるチャート理解の最近の展開,課題,今後の方向性について概観する。この論文は、チャート理解、問題定式化の概要、およびチャート理解タスクの研究に不可欠な基本的な構成要素について議論することから始まる。タスクとデータセットの節では、チャート理解の中で様々なタスクを探索し、それらの評価指標と、チャートとテキストのインプットの両方のソースについて議論する。次に、分類ベースと生成ベースの両方のアプローチと、チャート理解性能を高めるツール拡張技術を含むモデリング戦略について検討する。さらに、各タスクの最先端性能について論じ、その性能を改善する方法について論じる。課題と今後の方向性は専用のセクションで対処され、ドメイン固有のチャート、評価への取り組みの欠如、エージェント指向の設定などの課題が強調される。本研究は,大規模基盤モデルを活用したチャート理解における今後の研究に有用な洞察と方向性を提供するものである。この論文で言及された研究は、新しい研究とともに、次のように継続的に更新される。 Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models (LLMs), have revolutionized various natural language processing (NLP) tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. The paper begins by defining chart understanding, outlining problem formulations, and discussing fundamental building blocks crucial for studying chart understanding tasks. In the section on tasks and datasets, we explore various tasks within chart understanding and discuss their evaluation metrics and sources of both charts and textual inputs. 
Modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance of each task and discuss how we can improve the performance. Challenges and future directions are addressed in a dedicated section, highlighting issues such as domain-specific charts, lack of efforts in evaluation, and agent-oriented settings. This survey paper serves to provide valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# Ultraman:超高速かつ高精細な単一画像からの3D人体再構成 Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail ( http://arxiv.org/abs/2403.12028v1 ) ライセンス: Link先を確認	Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao,	(参考訳) 3D人体再構成はコンピュータビジョンの分野における課題である。従来の方法は、しばしば時間がかかり、人体の詳細な外観を捉えるのが困難である。本論文では,1枚の画像からテクスチャ付きの3次元人体モデルを高速に再構成する手法である \emph{Ultraman} を提案する。既存の技術と比較すると, \emph{Ultraman} は高品質なテクスチャの詳細を保存しながら, 再構成速度と精度を大幅に向上させる。本稿では,幾何学的再構成,テクスチャ生成,テクスチャマッピングの3つの部分からなる,人体再構成のための新しい枠組みを提案する。まず、単一の画像から3次元の人体形状を正確に抽出するメッシュ再構成フレームワークを使用する。同時に,一つの画像に基づいて人体の多視点一貫した画像を生成する手法を提案する。最終的に、これらをテクスチャの細部を最適化し、再構成時の色の整合性を確保する新しいテクスチャマッピング手法と組み合わせる。広範な実験や評価を通じて,各種標準データセット上での \emph{Ultraman} の優れた性能を示す。さらに、\emph{Ultraman} は人体のレンダリング品質と速度の点で最先端の手法よりも優れている。この記事が受理され次第、コードとデータを公開する。 3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called \emph{Ultraman} for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, \emph{Ultraman} greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts, geometric reconstruction, texture generation and texture mapping. Firstly, a mesh reconstruction framework is used, which accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate a multi-view consistent image of the human body based on a single image. This is finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of \emph{Ultraman} on various standard datasets. 
In addition, \emph{Ultraman} outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# Align and Distill: ドメイン適応型オブジェクト検出の統一と改善 Align and Distill: Unifying and Improving Domain Adaptive Object Detection ( http://arxiv.org/abs/2403.12029v1 ) ライセンス: Link先を確認	Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, Grant Van Horn,	(参考訳) オブジェクト検出器は、トレーニングセットと異なるデータに対して性能が低下することが多い。ドメイン適応オブジェクト検出(DAOD)手法は近年,この問題に対処する上で大きな成果を上げている。残念ながら、過去の結果を疑問視させ、さらなる進歩を妨げるような、体系的なベンチマークの落とし穴を特定した。 (a)性能の低いベースラインによる性能の過大評価、(b)手法の透明な比較を妨げる一貫性のない実装方法、(c)時代遅れのバックボーンとベンチマークの多様性の欠如による一般性の欠如。我々はこれらの問題に対処するために以下を導入する。 (1) DAOD手法の比較と今後の開発を支援する統一的なベンチマーク・実装フレームワークであるAlign and Distill(ALDI)、(2) ベンチマークの落とし穴に対処するDAODのための公正かつ現代的なトレーニングおよび評価プロトコル、(3) 多様な実世界データでの評価を可能にする新しいDAODベンチマークデータセットであるCFC-DAOD、(4) 最先端の結果を大差で達成する新しい手法であるALDI++。 ALDI++は、従来の最先端手法を、Cityscapes to Foggy Cityscapesで+3.5 AP50、Sim10k to Cityscapes(公正なベースラインを上回る唯一の手法)で+5.7 AP50、CFC Kenai to Channelで+2.0 AP50上回る。我々のフレームワーク、データセット、最先端の手法はDAODにとって重要なリセットを提供し、将来の研究の強力な基盤を提供する。コードとデータは以下で入手できる。 https://github.com/justinkay/aldi および https://github.com/visipedia/caltech-fish-counting 。 Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. 
We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +2.0 AP50 on CFC Kenai to Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# 事前学習モデルに基づくクラスインクリメンタル学習のための拡張可能なサブスペースアンサンブル Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning ( http://arxiv.org/abs/2403.12030v1 ) ライセンス: Link先を確認	Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, De-Chuan Zhan,	(参考訳) CIL(Class-Incremental Learning)は、学習システムにおいて、忘れずに新しいクラスを継続的に学習することを必要とする。 CILにおけるPTM(Pre-Trained Models)の強いパフォーマンスにもかかわらず、重要な問題は続く。ネットワークの過剰な変更は忘れを引き起こすが、最小限の調整は新しいクラスに不適当である。その結果,従来の知識を損なうことなく,効率的なモデル更新方法を見出すことが望まれる。本稿では,PTMベースのCILのためのExpAndable Subspace Ensemble (EASE)を提案する。コンフリクトのないモデル更新を可能にするため、タスク固有のサブスペースを作成することを目的として、新しいタスクごとに異なる軽量アダプタモジュールをトレーニングする。これらのアダプタは高次元の特徴空間にまたがり、複数の部分空間をまたいだ共同決定を可能にする。データが進化するにつれて、拡張サブスペースは古いクラス分類器を新しいステージ空間と互換性のないものにする。それに対応して、古いクラスのインスタンスを使わずに、古いクラスの新機能を合成する意味誘導型プロトタイプ補完戦略を設計する。 7つのベンチマークデータセットに対する大規模な実験は、EASEの最先端のパフォーマンスを検証する。コードは、https://github.com/sun-hailong/CVPR24-Easeで入手できる。 Class-Incremental Learning (CIL) requires a learning system to continually learn new classes without forgetting. Despite the strong performance of Pre-Trained Models (PTMs) in CIL, a critical issue persists: learning new classes often results in the overwriting of old ones. Excessive modification of the network causes forgetting, while minimal adjustments lead to an inadequate fit for new classes. As a result, it is desired to figure out a way of efficient model updating without harming former knowledge. In this paper, we propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL. To enable model updating without conflict, we train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces. These adapters span a high-dimensional feature space, enabling joint decision-making across multiple subspaces. As data evolves, the expanding subspaces render the old class classifiers incompatible with new-stage spaces. Correspondingly, we design a semantic-guided prototype complement strategy that synthesizes old classes' new features without using any old class instance. 
Extensive experiments on seven benchmark datasets verify EASE's state-of-the-art performance. Code is available at: https://github.com/sun-hailong/CVPR24-Ease	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# ROUTERBENCH:マルチLLMルーティングシステムのベンチマーク ROUTERBENCH: A Benchmark for Multi-LLM Routing System ( http://arxiv.org/abs/2403.12031v1 ) ライセンス: Link先を確認	Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay,	(参考訳) 大規模言語モデル(LLM)のアプリケーションの範囲が拡大し続けるにつれ、効果的なサービングソリューションの需要がますます重要になっている。 LLMの汎用性にもかかわらず、特にパフォーマンスとコストのバランスをとる場合、すべてのタスクやアプリケーションに最適な単一のモデルは存在しない。この制限により、個々のLLMの制約を克服するために、様々なモデルの強みを組み合わせたLLMルーティングシステムの開発が進んだ。しかし,LLMルータの性能評価のための標準ベンチマークが欠如していることは,この分野の進歩を妨げている。このギャップを埋めるために、我々は、LLMルーティングシステムの有効性を体系的に評価する新しい評価フレームワークであるROUTERBENCHと、代表的なLLMによる40万5千件以上の推論結果からなる包括的なデータセットを提示し、ルーティング戦略の開発を支援する。さらに、LLMルーティングの理論的フレームワークを提案し、ROUTERBENCHを通して様々なルーティングアプローチの比較分析を行い、評価フレームワークにおけるそれらの可能性と限界を明らかにする。この作業は、LLMルーティングシステムの開発を形式化し、前進させるだけでなく、その評価基準を設定し、よりアクセスしやすく、経済的に実行可能なLLMデプロイメントの道を開く。コードとデータは https://github.com/withmartian/routerbench で公開されている。 As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. Yet, the absence of a standardized benchmark for evaluating the performance of LLM routers hinders progress in this area. To bridge this gap, we present ROUTERBENCH, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies. 
We further propose a theoretical framework for LLM routing, and deliver a comparative analysis of various routing approaches through ROUTERBENCH, highlighting their potentials and limitations within our evaluation framework. This work not only formalizes and advances the development of LLM routing systems but also sets a standard for their assessment, paving the way for more accessible and economically viable LLM deployments. The code and data are available at https://github.com/withmartian/routerbench.	翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
# HiKER-SG:階層的知識によるロバストなシーングラフ生成 HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation ( http://arxiv.org/abs/2403.12033v1 ) ライセンス: Link先を確認	Ce Zhang, Simon Stepputtis, Joseph Campbell, Katia Sycara, Yaqi Xie,	(参考訳) 視覚的なシーンを理解することは、自律運転、ロボティクス、その他の視覚に基づくアプローチなど、多くの下流タスクの先駆けとなる。しかし、既存の多くのアプローチでは、霧、雪、煙のような現実世界の汚職や、太陽フレアや水滴のような不均一な摂動が欠如していると仮定している。そこで本研究では,視覚ゲノムデータセット上でのプロシージャ的に生成された気象汚染やその他の変換を含む新しいSGGベンチマークを提案する。さらに,階層的知識向上型ロバストシーングラフ生成(HiKER-SGG)を導入し,このような困難な環境下でのシーングラフ生成の強力なベースラインを提供する。中心となるHiKER-SGGは階層的な知識グラフを用いて予測を粗い初期推定から詳細な予測へと洗練する。広汎な実験では、非破壊画像上でのHKER-SGGは、ゼロショット方式で優れた性能を示すだけでなく、非破壊SGGタスクにおける最先端の手法よりも優れた性能を示す。コードはhttps://github.com/zhangce01/HiKER-SGGで入手できる。 Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach enabling the ability to reason over visual data is Scene Graph Generation (SGG); however, many existing approaches assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), providing a strong baseline for scene graph generation under such challenging setting. At its core, HiKER-SGG utilizes a hierarchical knowledge graph in order to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG does not only demonstrate superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks. Code is available at https://github.com/zhangce01/HiKER-SGG.	翻訳日:2024-03-20 18:51:34 公開日:2024-03-18
# VFusion3D:ビデオ拡散モデルからスケーラブルな3D生成モデルを学ぶ VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models ( http://arxiv.org/abs/2403.12034v1 ) ライセンス: Link先を確認	Junlin Han, Filippos Kokkinos, Philip Torr,	(参考訳) 本稿では,事前学習ビデオ拡散モデルを用いたスケーラブルな3次元生成モデル構築のための新しいパラダイムを提案する。基礎3D生成モデルの開発における主要な障害は、3Dデータの可用性の制限である。画像、テキスト、ビデオとは異なり、3Dデータは容易にアクセスできず、入手が困難である。この結果、他の種類のデータと比較すると、大きな差が生じる。そこで本研究では,3次元データの知識源として,大量のテキスト,画像,ビデオで訓練されたビデオ拡散モデルを提案する。微調整により多視点生成能力を解放することにより、大規模な合成多視点データセットを生成し、フィードフォワード3D生成モデルを訓練する。提案するモデルであるVFusion3Dは,約3Mの合成マルチビューデータに基づいてトレーニングされ,単一の画像から1秒で3Dアセットを生成し,現在のSOTAフィードフォワード3D生成モデルと比較して優れた性能が得られる。 This paper presents a novel paradigm for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 70% of the time.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# CoCoCo: 一貫性,可制御性,コンパチビリティ向上のためのテキストガイド型ビデオインペインティングの改善 CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility ( http://arxiv.org/abs/2403.12035v1 ) ライセンス: Link先を確認	Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang,	(参考訳) 近年のビデオ生成の進歩は目覚ましいが、既存の多くの手法は一貫性とテキスト・ビデオの整合性に悩まされている。さらに、テキスト誘導画像の塗布において、よく探索された領域とは対照的な、テキスト誘導ビデオ塗布の効果的な技術が欠如している。そこで本稿では, 一貫性, 制御性, 互換性を向上する新しいテキスト誘導型映像インパインティングモデルを提案する。具体的には、動作の一貫性を維持するためのシンプルだが効率的なモーションキャプチャモジュールを導入し、ランダムな領域選択の代わりにインスタンス対応の領域選択を設計し、テキストによる制御性を向上し、新しい戦略を用いて、パーソナライズされたモデルをCoCoCoモデルに注入し、モデル互換性を向上させる。大規模な実験により,我々のモデルは高品質なビデオクリップを生成できることが判明した。一方,本モデルでは,動作の整合性,テキスト制御性,モデル互換性が向上している。詳細は[cococozibojia.github.io](cococozibojia.github.io]に示されている。 Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility. Specifically, we introduce a simple but efficient motion capture module to preserve motion consistency, and design an instance-aware region selection instead of a random region selection to obtain better textual controllability, and utilize a novel strategy to inject some personalized models into our CoCoCo model and thus obtain better model compatibility. Extensive experiments show that our model can generate high-quality video clips. Meanwhile, our model shows better motion consistency, textual controllability and model compatibility. More details are shown in [cococozibojia.github.io](cococozibojia.github.io).	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# テキスト・ツー・イメージモデルを用いたワンステップ画像翻訳 One-Step Image Translation with Text-to-Image Models ( http://arxiv.org/abs/2403.12036v1 ) ライセンス: Link先を確認	Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu,	(参考訳) 本研究では,既存の条件拡散モデルの2つの制限に対処する: 反復的復調過程による推論速度の遅いことと,モデル微調整のためのペアデータへの依存である。これらの課題に対処するために,敵対的な学習目的を通じて,新しいタスクやドメインに単一ステップ拡散モデルを適用するための一般的な手法を提案する。具体的には,バニラ遅延拡散モデルの様々なモジュールを,訓練可能な小重量の単一エンドツーエンドジェネレータネットワークに統合し,オーバーフィッティングを低減しつつ,入力画像構造を保存できる能力を向上する。筆者らのモデルであるCycleGAN-Turboは, 日中変換や霧, 雪, 雨などの気象効果の付加・除去など, 様々な場面翻訳タスクにおいて, 既存のGANベースおよび拡散ベースの手法よりも優れていた。私たちはこのメソッドをペア設定に拡張し、Sketch2PhotoのControl-NetやEdge2Imageのような最近の作業と同等ですが、シングルステップの推論が可能です。本研究は, 単段階拡散モデルが, GAN学習目的の強力なバックボーンとして機能することを示唆している。私たちのコードとモデルはhttps://github.com/GaParmar/img2img-turbo.comで公開されています。 In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like Control-Net for Sketch2Photo and Edge2Image, but with a single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. 
Our code and models are available at https://github.com/GaParmar/img2img-turbo.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# 深部機能地図を用いたゼロショット画像特徴センサス Zero-Shot Image Feature Consensus with Deep Functional Maps ( http://arxiv.org/abs/2403.12038v1 ) ライセンス: Link先を確認	Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas,	(参考訳) 対応は、生成的および識別的なタスクのために訓練された大規模な視覚モデルから生じる。これは、特徴格子上の最も近い隣人を用いて、一対のイメージ間の対応マップの計算によって明らかにされ、ベンチマークされている。既存の作業は、異なるレイヤやネットワークの特徴を組み合わせるなど、異なるソースからの機能を慎重に混合することで、これらの対応マップの品質向上を図っている。より優れた対応戦略が可能であることを指摘し、対応フィールドに直接構造を課す関数写像について述べる。この単純な数学的ツールを用いて、画素空間から関数空間への対応問題を解き、グローバルに一貫性のある写像を直接最適化する。本手法は,学習対象の大規模視覚モデルに埋め込まれた知識をよりよく反映し,よりスムーズなだけでなく,より正確に対応できることを示す。我々の手法は、様々な密接な対応タスクに新たな最先端を設定できる。また,キーポイント対応やアベイランスマップの転送にも有効であることを示す。 Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. Existing work has attempted to improve the quality of these correspondence maps by carefully mixing features from different sources, such as by combining the features of different layers or networks. We point out that a better correspondence strategy is available, which directly imposes structure on the correspondence field: the functional map. Wielding this simple mathematical tool, we lift the correspondence problem from the pixel space to the function space and directly optimize for mappings that are globally coherent. We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying. Our approach sets a new state-of-the-art on various dense correspondence tasks. We also demonstrate our effectiveness in keypoint correspondence and affordance map transfer.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# 1枚以下の画像にデータセットを蒸留する Distilling Datasets Into Less Than One Image ( http://arxiv.org/abs/2403.12040v1 ) ライセンス: Link先を確認	Asaf Shul, Eliahu Horwitz, Yedid Hoshen,	(参考訳) データセット蒸留は、データセットをはるかに小さなデータセットに圧縮することで、蒸留データセットでトレーニングされたモデルが高い精度を達成することを目的としている。現在の方法では、K を正の整数とするK 蒸留画像の予算に対する蒸留分類精度を最大化するものである。本稿では,データセットの蒸留の境界を1クラス当たりのイメージ以下に圧縮する。意味のある量は、クラス当たりの蒸留画像数ではなく、データセット当たりの蒸留画素数であることに気付くことが重要である。そこで,Poster Dataset Distillation (PoDD)を提案する。ポスターアプローチは、トレーニングイメージと学習可能なラベルを作成するための新しい技術ソリューションを動機付けている。本手法は,従来の1つのイメージ・パー・クラスを用いた手法と比較して,1クラス当たりのイメージ・パー・クラス以下で同等あるいは優れた性能を実現することができる。具体的には, CIFAR-10, CIFAR-100, CUB200に対して, 0.3画像単位の精度で新しい最先端性能を実現する。 Dataset distillation aims to compress a dataset into a much smaller one so that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy for a budget of K distilled images-per-class, where K is a positive integer. In this paper, we push the boundaries of dataset distillation, compressing the dataset into less than an image-per-class. It is important to realize that the meaningful quantity is not the number of distilled images-per-class but the number of distilled pixels-per-dataset. We therefore propose Poster Dataset Distillation (PoDD), a new approach that distills the entire original dataset into a single poster. The poster approach motivates new technical solutions for creating training images and learnable labels. Our method can achieve comparable or better performance with less than an image-per-class compared to existing methods that use one image-per-class. Specifically, our method establishes a new state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 using as little as 0.3 images-per-class.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# ビデオオブジェクトセグメンテーション参照のための事前学習型テキスト・ビデオ拡散モデルの検討 Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation ( http://arxiv.org/abs/2403.12042v1 ) ライセンス: Link先を確認	Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua,	(参考訳) 本稿では,ビデオ理解タスクのための事前学習されたテキスト・ツー・ビデオ(T2V)拡散モデルから生成された視覚表現について検討する。我々は、事前訓練された生成的T2Vモデルから学習した潜伏表現が、豊かな意味論と一貫性のある時間的対応をカプセル化し、ビデオ理解を自然に促進する、という仮説を立てる。我々の仮説は古典的参照ビデオオブジェクトセグメンテーション(R-VOS)タスクによって検証される。固定事前訓練されたT2Vモデル上に構築されたコンポーネントを専用に設計した新しいフレームワークである ``VD-IT'' を導入する。具体的には、VD-ITはテキスト情報を条件入力として使用し、正確な時間的インスタンスマッチングのための時間間のセマンティック一貫性を保証する。さらに、画像トークンを補足的なテキスト入力として組み込んで、詳細かつニュアンスなマスクを生成する機能セットを充実させ、標準のガウスノイズの代わりに、余分なノイズ予測モジュールを用いて映像特有のノイズを予測し、特徴の忠実さを保ち、セグメンテーション品質を高めることを提案する。広範にわたる実験により,ビデオバックボーン(例えばビデオスウィントランスフォーマー)に画像・ビデオ前タスクを事前訓練した固定型T2V拡散モデルが,意味的アライメントと時間的整合性を維持する可能性が示唆された。既存の標準ベンチマークでは、我々のVD-ITは、多くの最先端の手法を超越して、非常に競争力のある結果を得る。コードは \url{https://github.com/buxiangzhiren/VD-IT} で入手できる。 In this paper, we explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding. Our hypothesis is validated through the classic referring video object segmentation (R-VOS) task. We introduce a novel framework, termed ``VD-IT'', tailored with dedicatedly designed components built upon a fixed pretrained T2V model. Specifically, VD-IT uses textual information as a conditional input, ensuring semantic consistency across time for precise temporal instance matching. 
It further incorporates image tokens as supplementary textual inputs, enriching the feature set to generate detailed and nuanced masks. Besides, instead of using the standard Gaussian noise, we propose to predict the video-specific noise with an extra noise prediction module, which can help preserve the feature fidelity and elevate segmentation quality. Through extensive experiments, we surprisingly observe that fixed generative T2V diffusion models, unlike commonly used video backbones (e.g., Video Swin Transformer) pretrained with discriminative image/video pre-tasks, exhibit better potential to maintain semantic alignment and temporal consistency. On existing standard benchmarks, our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods. The code will be available at https://github.com/buxiangzhiren/VD-IT.	翻訳日:2024-03-20 18:51:33 公開日:2024-03-18
# AIは人間がより良い判断を下すのに役立つか? 実験的な評価のための方法論的枠組み Does AI help humans make better decisions? A methodological framework for experimental evaluation ( http://arxiv.org/abs/2403.12108v1 ) ライセンス: Link先を確認	Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin,	(参考訳) データ駆動型アルゴリズムに基づく人工知能(AI)の利用は今日の社会で広く普及している。しかし、多くの場合、特に利害関係が高い場合、人間は最終的な決定を下す。したがって、重要な疑問は、AIが人間単独やAI単独と比較して、人間によるより良い意思決定を支援するかどうかである。本稿では,新たな方法論の枠組みを導入し,追加の仮定を伴わずにこの疑問に実験的に答えられるようにした。我々は、基準となる潜在的な結果に基づいて、標準分類基準を用いて正しい意思決定を行う意思決定者の能力を測定する。我々は、AI生成レコメンデーションの提供が、人間が最終決定を下すケースでランダム化される、単盲の実験的設計を考える。この実験的な設計の下で、人間単独、AI支援付きの人間、AI単独の3つの代替意思決定システムの性能を比較する方法を示す。提案手法を,事前リスク評価器のランダム化制御試験から得られたデータに適用する。 AIレコメンデーションは、キャッシュベイルを課す裁判官の決定の分類精度を向上しない。我々の分析は、AIが単独で行う決定は、AI支援の有無にかかわらず、人間の決定よりも一般的に悪い結果をもたらすことを示している。最後に、AIレコメンデーションは、非白人の逮捕者に対して、白人の逮捕者よりも頻繁に現金の保釈を課す傾向にある。 The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI alone. We introduce a new methodological framework that can be used to answer this question experimentally with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems--human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. 
We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
# GCAM : 食品の微粒化認識におけるガウス的・因果的アテンションモデル GCAM: Gaussian and causal-attention model of food fine-grained recognition ( http://arxiv.org/abs/2403.12109v1 ) ライセンス: Link先を確認	Guohang Zhuang, Yue Hu, Tianxing Yan, JiaZhan Gao,	(参考訳) 現在、ほとんどの食品認識は、分類の深層学習に依存している。しかしながら、これらのアプローチは視覚的に類似した食品サンプルを効果的に区別することに苦慮し、食品認識におけるきめ細かい問題に対処する必要性を強調している。これらの課題を緩和するため,細粒度物体認識のためのガウス的・因果的アテンションモデルの導入を提案し,特に対象領域におけるガウス的特徴の獲得を訓練し,続いて対象領域から細粒度特徴の抽出を行い,対象領域の特徴マッピング機能の向上を図る。不均一なデータ分布から生じるデータドリフトに対処するために、我々は反実的推論アプローチを採用する。対物的介入を用いて、学習した画像注意機構がネットワーク予測に与える影響を分析し、より詳細な画像認識のためのより有用な注意重みをネットワークが取得できるようにする。最後に,各種モジュール間のトレーニング安定性のバランスをとるための学習可能な損失戦略を設計し,最終的な目標認識の精度を向上する。我々は,この4つのデータセットに対して,GCAMがETH-FOOD101, UECFOOD256, Vireo-FOOD172データセットの最先端手法を超えることを実験的に示した。さらに,本手法は,CUB-200データセットの最先端性能も達成する。 Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition. In particular, we train to obtain Gaussian features over target regions, followed by the extraction of fine-grained features from the objects, thereby enhancing the feature mapping capabilities of the target regions. To counteract data drift resulting from uneven data distributions, we employ a counterfactual reasoning approach. By using counterfactual interventions, we analyze the impact of the learned image attention mechanism on network predictions, enabling the network to acquire more useful attention weights for fine-grained image recognition. Finally, we design a learnable loss strategy to balance training stability across various modules, ultimately improving the accuracy of the final target recognition. 
We validate our approach on four relevant datasets, demonstrating excellent performance on all of them. We experimentally show that GCAM surpasses state-of-the-art methods on the ETH-FOOD101, UECFOOD256, and Vireo-FOOD172 datasets. Furthermore, our approach also achieves state-of-the-art performance on the CUB-200 dataset.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
# 2つの熱浴と相互作用するボソニック系の電流と効率 Current and efficiency of bosonic systems interacting with two thermal reservoirs ( http://arxiv.org/abs/2403.12112v1 ) ライセンス: Link先を確認	Jayarshi Bhattacharya, Sunandan Gangopadhyay, Gautam Gangopadhyay,	(参考訳) 本稿では,異なる温度で2つの熱浴と相互作用する中心系からなるボソニック系の電流と効率のダイナミクスについて検討する。系の密度行列の時間発展を記述するマスター方程式を導出し, 成分間の相互作用とエネルギー移動を考慮した。システム内のボソンの流れを表す電流を定量化し, システムのパラメータと温度依存性を解析する。定常状態においては,エネルギー伝達過程の効率性を表す式を導出した。解析の結果,温度依存性や量子補正係数などの量子効果がエネルギー伝達効率に大きな影響を及ぼすことが示された。特に、高温では、量子システムの効率がカルノー効率よりも大きいことが観察される。この分析から得られた知見は、エネルギー利用の最適化が不可欠である量子コンピューティングやエネルギー収穫など、様々な分野に影響を及ぼす可能性がある。 This paper investigates the dynamics of current and efficiency in a bosonic system consisting of a central system interacting with two reservoirs at different temperatures. We derive a master equation describing the time evolution of the density matrix of the system, accounting for the interactions and energy transfer between the components. We quantify the current, representing the flow of bosons through the system, and analyse its dependence on the system's parameters and the temperatures of the thermal reservoirs. In the steady-state regime, we derive an expression for the efficiency of the energy transfer process. Our analysis shows that quantum effects, such as the dependence on temperature and the quantum correction factor, can significantly impact energy transfer efficiency. In particular, we observe that at high temperatures, the efficiency of the quantum system is greater than the Carnot efficiency. The insights gained from this analysis may have implications in various fields, including quantum computing and energy harvesting, where optimising energy utilisation is crucial.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
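For reference, the Carnot efficiency that the abstract compares against is the standard bound for a cycle operating between a hot and a cold reservoir. The temperatures in the worked example below are illustrative assumptions and are not taken from the paper.

```latex
\eta_{\mathrm{C}} = 1 - \frac{T_{\mathrm{c}}}{T_{\mathrm{h}}},
\qquad
T_{\mathrm{h}} = 400\,\mathrm{K},\; T_{\mathrm{c}} = 300\,\mathrm{K}
\;\Rightarrow\;
\eta_{\mathrm{C}} = 1 - \frac{300}{400} = 0.25
```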
# 自律鉄道システムの安全分析--SACRED手法の紹介 Safety Analysis of Autonomous Railway Systems: An Introduction to the SACRED Methodology ( http://arxiv.org/abs/2403.12114v1 ) ライセンス: Link先を確認	Josh Hunter, John McDermid, Simon Burton,	(参考訳) 鉄道産業は、自律性と機械学習(ML)の導入をますます求めているため、いくつかの疑問が浮かび上がっている。このようなシステムや技術に対して、どうやって安全性を確保することができるのか? この新たな技術分野における現在の安全基準の適用性はどのようなものか? システムを安全に分類するための重要な指標は何ですか? 現在、鉄道における安全分析は、既存の技術の故障モードを反映しており、対照的に、自動化の分析の主な関心事は、通常平均的な性能である。このような純粋に統計的にMLのパフォーマンスを測定するアプローチは制限されている。これらの課題に対処するため、我々は、初期安全ケースを作成し、自律システムにとって重要な安全基準を決定する安全方法論であるSACREDを紹介した。 SACREDの開発は、ベルリンで提案されたGoA-4ライトレールシステムによって動機付けられている。 As the railway industry increasingly seeks to introduce autonomy and machine learning (ML), several questions arise. How can safety be assured for such systems and technologies? What is the applicability of current safety standards within this new technological landscape? What are the key metrics to classify a system as safe? Currently, safety analysis for the railway reflects the failure modes of existing technology; in contrast, the primary concern of analysis of automation is typically average performance. Such purely statistical approaches to measuring ML performance are limited, as they may overlook classes of situations that may occur rarely but in which the function performs consistently poorly. To combat these difficulties we introduce SACRED, a safety methodology for producing an initial safety case and determining important safety metrics for autonomous systems. The development of SACRED is motivated by the proposed GoA-4 light-rail system in Berlin.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
# 深層学習は多専門観測者と比較してコブ角測定を自動化する Deep learning automates Cobb angle measurement compared with multi-expert observers ( http://arxiv.org/abs/2403.12115v1 ) ライセンス: Link先を確認	Keyu Li, Hanxue Gu, Roy Colglazier, Robert Lark, Elizabeth Hubbard, Robert French, Denise Smith, Jikai Zhang, Erin McCrum, Anthony Catanzano, Joseph Cao, Leah Waldman, Maciej A. Mazurowski, Benjamin Alman,	(参考訳) 変形につながる異常な脊椎曲率を特徴とする強皮症は、効果的な診断と管理のために正確な評価方法を必要とする。コブ角 (Cobb angle) は、傾斜した椎骨間の曲率を測定する、広く使われているスコリシス定量法である。しかし、Cobbのアングルの手動測定は時間がかかり、労働集約的であり、大きなオブザーバ間およびオブザーバ内変動を伴っている。これらの課題に対処するために、既存の自動化手法で見られる解釈可能性の欠如に対処するため、私たちは、Cobb角を正確に測定するだけでなく、これらの測定の明確な可視化を提供する、完全に自動化されたソフトウェアを作成しました。このソフトウェアは、ディープニューラルネットワークに基づくスピーン領域の検出とセグメンテーション、スピーン中心線同定、最も傾いた脊椎のピンポイント、オリジナル画像上のコブ角の直接可視化を統合している。専門家7人の評価結果と比較すると、我々のアルゴリズムはコブ角度の4.17度の平均偏差を示し、特に手作業による平均可読値の5.16度を上回った。また,0.96以上のクラス内相関係数 (ICC) と0.944以上のピアソン相関係数 (Pearson correlation coefficient) も達成した。総合的な読者調査と統計分析を通じて、このアルゴリズムは専門家の読者と高いコンセンサスを確保するだけでなく、評価中の解釈可能性や再現性を高めることができると信じている。臨床応用には非常に有望であり、より正確なスコリオーシスの評価と診断を医師に支援し、患者のケアを改善する可能性がある。 Scoliosis, a prevalent condition characterized by abnormal spinal curvature leading to deformity, requires precise assessment methods for effective diagnosis and management. The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. Yet, manual measuring of Cobb angles is time-consuming and labor-intensive, fraught with significant interobserver and intraobserver variability. To address these challenges and the lack of interpretability found in certain existing automated methods, we have created fully automated software that not only precisely measures the Cobb angle but also provides clear visualizations of these measurements. 
This software integrates deep neural network-based spine region detection and segmentation, spine centerline identification, pinpointing the most significantly tilted vertebrae, and direct visualization of Cobb angles on the original images. Upon comparison with the assessments of 7 expert readers, our algorithm exhibited a mean deviation in Cobb angle measurements of 4.17 degrees, notably surpassing the manual approach's average intra-reader discrepancy of 5.16 degrees. The algorithm also achieved intra-class correlation coefficients (ICC) exceeding 0.96 and Pearson correlation coefficients above 0.944, reflecting robust agreement with expert assessments and superior measurement reliability. Through the comprehensive reader study and statistical analysis, we believe this algorithm not only ensures a higher consensus with expert readers but also enhances interpretability and reproducibility during assessments. It holds significant promise for clinical application, potentially aiding physicians in more accurate scoliosis assessment and diagnosis, thereby improving patient care.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
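The final geometric step, turning two tilted vertebral endplate lines into an angle, can be sketched as follows. This is only the textbook angle-between-two-lines computation under the assumption that each endplate is given as a pair of image points; it is not the authors' pipeline, and the function names are hypothetical.

```python
import math


def line_angle_deg(p, q):
    """Orientation of the line through points p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle_deg(upper, lower):
    """Acute angle between two endplate lines, each given as ((x1, y1), (x2, y2)).

    The result is independent of the order in which the two points of a line
    are listed, because orientations are compared modulo 180 degrees.
    """
    diff = abs(line_angle_deg(*upper) - line_angle_deg(*lower)) % 180.0
    return min(diff, 180.0 - diff)


if __name__ == "__main__":
    upper_endplate = ((0.0, 0.0), (1.0, 0.0))  # horizontal line (illustrative)
    lower_endplate = ((0.0, 0.0), (1.0, 1.0))  # line tilted by 45 degrees
    print(round(cobb_angle_deg(upper_endplate, lower_endplate), 6))
```

Comparing such a value against several readers' measurements is what quantities like the mean deviation reported above summarize.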
# 自己決定型バイオインスパイアされた標的を用いた教師なしエンドツーエンドトレーニング Unsupervised End-to-End Training with a Self-Defined Bio-Inspired Target ( http://arxiv.org/abs/2403.12116v1 ) ライセンス: Link先を確認	Dongshu Liu, Jérémie Laydevant, Adrien Pontlevy, Damien Querlioz, Julie Grollier,	(参考訳) 現在の教師なし学習法は、自己教師付き学習のような深層学習技術によるエンドツーエンドの訓練、高い計算要求、ヘビアン学習のようなバイオインスパイアされたアプローチを用いた層間学習、あるいは教師付き学習とは相容れない局所学習規則を用いる。どちらのアプローチも、疎結合な計算リソースに依存し、教師なしと教師なしの学習フェーズの交互化による大きな恩恵を受けるエッジAIハードウェアには問題があり、環境から広く利用可能なラベルなしのデータとラベル付きトレーニングデータセットを活用する。この課題を解決するために,ネットワークの最終層でWinner-Take-All (WTA) の選択性を利用する「自己定義目標」を導入し,生物学的にインスパイアされたホメオスタシス機構による正規化を補完する。このアプローチはフレームワークに依存しず、グローバル(バックプロパゲーション)とローカル(平衡伝播)の学習ルールの両方と互換性があり、MNISTデータセット上で97.6%のテスト精度を達成する。さらに,隠蔽層を組み込むことで,学習方法の分類精度と品質が向上し,エンド・ツー・エンドの教師なし学習の利点が示されることを示した。半教師付き学習に拡張して、データ可用性に応じてターゲットを動的に調整し、600個のラベル付きMNISTサンプルで96.6%の精度で達成する。この結果は、豊富なラベル付きデータ可用性から不要なシナリオにおける、"教師なしのターゲット"戦略の有効性と柔軟性を強調します。 Current unsupervised learning methods depend on end-to-end training via deep learning techniques such as self-supervised learning, with high computational requirements, or employ layer-by-layer training using bio-inspired approaches like Hebbian learning, using local learning rules incompatible with supervised learning. Both approaches are problematic for edge AI hardware that relies on sparse computational resources and would strongly benefit from alternating between unsupervised and supervised learning phases - thus leveraging widely available unlabeled data from the environment as well as labeled training datasets. To solve this challenge, in this work, we introduce a 'self-defined target' that uses Winner-Take-All (WTA) selectivity at the network's final layer, complemented by regularization through biologically inspired homeostasis mechanism. This approach, framework-agnostic and compatible with both global (Backpropagation) and local (Equilibrium propagation) learning rules, achieves a 97.6% test accuracy on the MNIST dataset. 
Furthermore, we demonstrate that incorporating a hidden layer enhances classification accuracy and the quality of learned features across all training methods, showcasing the advantages of end-to-end unsupervised training. Extending to semi-supervised learning, our method dynamically adjusts the target according to data availability, reaching a 96.6% accuracy with just 600 labeled MNIST samples. This result highlights our 'unsupervised target' strategy's efficacy and flexibility in scenarios ranging from abundant to no labeled data availability.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
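One way to picture a Winner-Take-All target with a homeostasis-style regularizer is the sketch below. The abstract does not give the exact rule, so everything here is an assumption: the per-unit `penalties`, the update `rate`, and both function names are hypothetical and only illustrate the general idea of a network defining its own one-hot target.

```python
def wta_target(outputs, penalties):
    """One-hot target for the output unit with the largest penalised activation."""
    scores = [o - p for o, p in zip(outputs, penalties)]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(scores))]


def update_penalties(penalties, target, rate=0.1):
    """Assumed homeostasis step: raise the winner's penalty, lower the others.

    Units that win more often than the uniform share become harder to select,
    which discourages a single unit from winning for every input.
    """
    share = 1.0 / len(target)
    return [p + rate * (t - share) for p, t in zip(penalties, target)]


if __name__ == "__main__":
    penalties = [0.0, 0.0, 0.0]
    target = wta_target([0.2, 0.9, 0.1], penalties)
    penalties = update_penalties(penalties, target)
    print(target, penalties)
```

The resulting one-hot vector can then be fed to whichever learning rule is in use as if it were a label, which is what makes such a target compatible with both global and local training schemes.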
# T細胞応答予測のための伝達学習 Transfer Learning for T-Cell Response Prediction ( http://arxiv.org/abs/2403.12117v1 ) ライセンス: Link先を確認	Josua Stadelmaier, Brandon Malone, Ralf Eggeling,	(参考訳) 特定の特定のペプチドに対するT細胞応答の予測について検討し、特に、パーソナライズされたがんワクチンの開発に向けた重要なステップとなる可能性がある。モデルは、T細胞応答に関連する特定のペプチド特性よりも、ソース生物のようなペプチド源の一般的な特性を学習する。本稿では,T細胞応答予測のためのトランスフォーマーモデルを用いて,膨らませた予測性能の危険性は理論上ではなく,実際に発生することを示す。そこで本研究では,ドメイン認識評価手法を提案する。次に、多領域構造とショートカット学習を扱うために、異なるトランスファー学習手法について研究する。さらに,本研究の最終モデルは,ヒトペプチドに対するT細胞応答を予測するために,既存の最先端のアプローチよりも優れていることを示す。 We study the prediction of T-cell response for specific given peptides, which could, among other applications, be a crucial step towards the development of personalized cancer vaccines. It is a challenging task due to limited, heterogeneous training data featuring a multi-domain structure; such data entail the danger of shortcut learning, where models learn general characteristics of peptide sources, such as the source organism, rather than specific peptide characteristics associated with T-cell response. Using a transformer model for T-cell response prediction, we show that the danger of inflated predictive performance is not merely theoretical but occurs in practice. Consequently, we propose a domain-aware evaluation scheme. We then study different transfer learning techniques to deal with the multi-domain structure and shortcut learning. We demonstrate a per-source fine tuning approach to be effective across a wide range of peptide sources and further show that our final model outperforms existing state-of-the-art approaches for predicting T-cell responses for human peptides.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
# すべての非キラルアーベル位相に対する低オーバヘッド非クリフォードトポロジカルトトレラント回路 Low-overhead non-Clifford topological fault-tolerant circuits for all non-chiral abelian topological phases ( http://arxiv.org/abs/2403.12119v1 ) ライセンス: Link先を確認	Andreas Bauer,	(参考訳) 本稿では,任意のアーベル非キラル位相を積極的に誤り訂正したフォールトトレラントメモリとして実現した幾何的局所回路群を提案する。これらの回路は、セルコホモロジーと高次カップ生成物を通して表現される離散的不動点経路積分の1-形式対称性から構成される。私たちが使用する具体的な経路積分は、ねじれた量子二重モデルの時空表現である三次元セルレーション上のアーベル的ジクグラーフ・ウィッテン状態和である。結果として得られた回路は、(キューディット)安定化トーリック符号のシンドローム抽出回路に基づいており、そこでは 'twist'' を実装した非クリフォード位相ゲートを挿入する。トーリック符号に対するオーバーヘッドは、ねじれたアーベル位相の既知の構成とは対照的に、適度である。また,測度に基づくトポロジカル量子計算やフロッケ符号のような(量子)トーリック符号相の他のアーキテクチャは,位相ゲートに富み,ツイスト量子双対を実装できることを示した。さらなる結果として、1-形式対称固定点回路と呼ぶ位相回路の非常に一般的なクラスに対して、任意の局所雑音(非パウリノイズを含む)の下での耐故障性を証明する。この概念は、安定化トーリック符号、サブシステムトーリック符号、測定に基づくトポロジカル量子計算、または(CSS)ハニカムフロッケ符号と同様に、この論文の回路を統一する。また,本手法が特定の非アベリア位相に対する耐故障回路の構築にどのように適用できるかを示す。付録では、任意のセルレーション上の高次カップ積の式を定義するための明示的な組合せ手順を提示する。 We propose a family of explicit geometrically local circuits realizing any abelian non-chiral topological phase as an actively error-corrected fault-tolerant memory. These circuits are constructed from measuring 1-form symmetries in discrete fixed-point path integrals, which we express through cellular cohomology and higher-order cup products. The specific path integral we use is the abelian Dijkgraaf-Witten state sum on a 3-dimensional cellulation, which is a spacetime representation of the twisted quantum double model. The resulting circuits are based on a syndrome extraction circuit of the (qudit) stabilizer toric code, into which we insert non-Clifford phase gates that implement the ``twist''. The overhead compared to the toric code is moderate, in contrast to known constructions for twisted abelian phases. We also show that other architectures for the (qudit) toric code phase, like measurement-based topological quantum computation or Floquet codes, can be enriched with phase gates to implement twisted quantum doubles instead of their untwisted versions. 
As a further result, we prove fault tolerance under arbitrary local (including non-Pauli) noise for a very general class of topological circuits that we call 1-form symmetric fixed-point circuits. This notion unifies the circuits in this paper as well as the stabilizer toric code, subsystem toric code, measurement-based topological quantum computation, or the (CSS) honeycomb Floquet code. We also demonstrate how our method can be adapted to construct fault-tolerant circuits for specific non-Abelian phases. In the appendix we present an explicit combinatorial procedure to define formulas for higher cup products on arbitrary cellulations, which might be interesting in its own right to the TQFT and topological-phases community.	翻訳日:2024-03-20 18:41:45 公開日:2024-03-18
# DistClassiPyを用いた光曲線分類:新しい距離ベース分類器 Light Curve Classification with DistClassiPy: a new distance-based classifier ( http://arxiv.org/abs/2403.12120v1 ) ライセンス: Link先を確認	Siddharth Chaini, Ashish Mahabal, Ajit Kembhavi, Federica B. Bianco,	(参考訳) シントロピック・スカイサーベイの台頭は、時間領域天文学におけるビッグデータの時代に始まり、データ科学と機械学習が天体の研究に欠かせないツールとなった。ツリーベース(例えばランダムフォレスト)とディープラーニングモデルは、この分野の現在の標準を表している。物体の分類に異なる距離の測定値を用いる方法について検討する。そこで我々はDistClassiPyという距離メートル法に基づく新しい分類器を開発した。距離メトリクスの直接利用は、時間領域天文学では研究されていないアプローチであるが、距離に基づく手法は、分類結果の解釈可能性を高め、計算コストを減少させるのに役立つ。特に、異なるクラスの天体間の距離を比較することで、変光星の光曲線を分類する。 10級の6,000個の変光星のカタログに応用した18の距離測定値を用いて,分類と次元の減少を実証した。この分類器は最先端の性能に適合するが, 計算要求が低く, 解釈性も向上していることを示す。 DistClassiPyをオープンソースにしてhttps://pypi.org/project/distclassipy/でアクセスできるようにした。 The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. Tree-based (e.g. Random Forests) and deep learning models represent the current standard in the field. We explore the use of different distance metrics to aid in the classification of objects. For this, we developed a new distance metric based classifier called DistClassiPy. The direct use of distance metrics is an approach that has not been explored in time-domain astronomy, but distance-based methods can aid in increasing the interpretability of the classification result and decrease the computational costs. In particular, we classify light curves of variable stars by comparing the distances between objects of different classes. Using 18 distance metrics applied to a catalog of 6,000 variable stars in 10 classes, we demonstrate classification and dimensionality reduction. We show that this classifier meets state-of-the-art performance but has lower computational requirements and improved interpretability. 
We have made DistClassiPy open-source and accessible at https://pypi.org/project/distclassipy/ with the goal of broadening its applications to other classification scenarios within and beyond astronomy.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
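The idea of classifying objects by comparing distances to classes, as described in the abstract above, can be illustrated with a minimal sketch. This is not the DistClassiPy API; the function names (`fit_centroids`, `predict`), the nearest-centroid rule, and the toy feature vectors are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cityblock(u, v):
    """City-block (L1) distance between two equal-length feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def fit_centroids(X, y):
    """Return one mean feature vector (centroid) per class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x, metric=euclidean):
    """Assign x to the class whose centroid is closest under the chosen metric."""
    return min(centroids, key=lambda label: metric(x, centroids[label]))


# Hypothetical two-feature "light curve" summaries for two classes.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["A", "A", "B", "B"]
centroids = fit_centroids(X, y)
```

Swapping the `metric` argument is the only change needed to compare how different distance metrics behave on the same data, which is the kind of comparison the abstract describes across 18 metrics.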
# エニオン部分転置を用いたエニオン系のエンタングルメントの特徴づけ Characterizing the Entanglement of Anyonic Systems using the Anyonic Partial Transpose ( http://arxiv.org/abs/2403.12121v1 ) ライセンス: Link先を確認	Nico Kirchner, Wonjune Choi, Frank Pollmann,	(参考訳) 混合量子状態のエンタングルメントは、部分転置とそれに対応するエンタングルメント測度である対数ネガティビティを用いて定量化できる。近年、部分転置の概念は、交換統計がボソンやフェルミオンの場合を超えるエキゾチックな準粒子であるエニオンの系にまで拡張されている。このエニオン部分転置の基本的な性質を調べ、まずフェルミオン系という特別な場合に適用すると、境界マヨラナフェルミオンが存在するか否かに応じてフェルミオン部分転置またはそのツイストされた変種に帰着することを明らかにする。基底状態の性質に着目すると、エニオン部分転置は、共形場理論が予測するギャップレス系の正しいエンタングルメントスケーリングと、位相的に自明な相と非自明な相の間の相転移の両方を捉えることが分かる。非可換エニオンと二分割幾何については、部分転置の固有値、いわゆるネガティビティスペクトルに豊かな多重項構造を見いだし、電荷分解および不均衡分解されたネガティビティの両方を定義できる可能性を明らかにする。 Entanglement of mixed quantum states can be quantified using the partial transpose and its corresponding entanglement measure, the logarithmic negativity. Recently, the notion of partial transpose has been extended to systems of anyons, which are exotic quasiparticles whose exchange statistics go beyond the bosonic and fermionic case. Studying the fundamental properties of this anyonic partial transpose, we first reveal that when applied to the special case of fermionic systems, it can be reduced to the fermionic partial transpose or its twisted variant depending on whether or not a boundary Majorana fermion is present. Focusing on ground state properties, we find that the anyonic partial transpose captures both the correct entanglement scaling for gapless systems, as predicted by conformal field theory, and the phase transition between a topologically trivial and a nontrivial phase. For non-abelian anyons and the bipartition geometry, we find a rich multiplet structure in the eigenvalues of the partial transpose, the so-called negativity spectrum, and reveal the possibility of defining both a charge- and an imbalance-resolved negativity.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# ニューラルネットワークの同変表現学習のためのグラフニューラルネットワーク Graph Neural Networks for Learning Equivariant Representations of Neural Networks ( http://arxiv.org/abs/2403.12143v1 ) ライセンス: Link先を確認	Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang,	(参考訳) 他のニューラルネットワークのパラメータを処理するニューラルネットワークは、暗黙のニューラルネットワーク表現の分類、ニューラルネットワークの重みの生成、一般化エラーの予測など、さまざまな分野のアプリケーションを見つける。しかし、既存のアプローチは、ニューラルネットワークの固有の置換対称性を見落としているか、あるいは、ネットワークアーキテクチャ自体の影響を無視しながら、均等性を達成するために複雑な重み付けパターンに依存している。本研究では,ニューラルネットワークをパラメータの計算グラフとして表現することを提案する。そこで本研究では,ニューラルネットワークグラフを多種多様なアーキテクチャでエンコードする単一モデルを提案する。本稿では,暗黙のニューラル表現の分類と編集,一般化性能の予測,最適化の学習など,幅広いタスクにおける本手法の有効性について述べる。ソースコードはhttps://github.com/mkofinas/neural-graphsで公開されている。 Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance, while ignoring the impact of the network architecture itself. In this work, we propose to represent neural networks as computational graphs of parameters, which allows us to harness powerful graph neural networks and transformers that preserve permutation symmetry. Consequently, our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations, predicting generalization performance, and learning to optimize, while consistently outperforming state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neural-graphs.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# Syn-QA2:合成QAデータセットを用いたロングテール質問における誤った仮定の評価 Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets ( http://arxiv.org/abs/2403.12145v1 ) ライセンス: Link先を確認	Ashwin Daswani, Rohan Sawant, Najoung Kim,	(参考訳) 情報探索型の質問に含まれる誤った仮定(または誤った前提)に対する感度は、堅牢な質問応答(QA)システムにとって重要である。近年の研究では、自然に生じる質問における誤った仮定が現在のモデルにとって課題であり、生成的QAと単純な検出タスクの両方で性能が低いことが示されている(Kim et al. 2023)。しかし、既存の研究は自然に生じる質問に焦点を当てているため、可能な質問の分布のロングテールにおけるモデル挙動の分析にギャップが生じている。この目的のために、合成的に生成した2つのQAデータセットからなるSyn-(QA)$^2$を導入する。一方はWikidataの関係を摂動して生成し、他方はHotpotQA(Yang et al. 2018)を摂動して生成する。さまざまな大規模言語モデルの評価から得られた知見は次の3点である。(1)QAにおける誤った仮定は難しく、先行研究の結果と一致する。(2)二値検出タスクは、生成的QA自体の難しさと比べても難しく、これは問題の言語的構造に起因する可能性がある。(3)検出タスクは、自然に生じる質問よりもロングテールの質問でより難しく、我々の合成データセットと生成手法の有用性を示している。 Sensitivity to false assumptions (or false premises) in information-seeking questions is critical for robust question-answering (QA) systems. Recent work has shown that false assumptions in naturally occurring questions pose challenges to current models, with low performance on both generative QA and simple detection tasks (Kim et al. 2023). However, the focus of existing work on naturally occurring questions leads to a gap in the analysis of model behavior on the long tail of the distribution of possible questions. To this end, we introduce Syn-(QA)$^2$, a set of two synthetically generated QA datasets: one generated using perturbed relations from Wikidata, and the other by perturbing HotpotQA (Yang et al. 2018). Our findings from evaluating a range of large language models are threefold: (1) false assumptions in QA are challenging, echoing the findings of prior work, (2) the binary detection task is challenging even compared to the difficulty of generative QA itself, possibly due to the linguistic structure of the problem, and (3) the detection task is more challenging with long-tail questions compared to naturally occurring questions, highlighting the utility of our synthetic datasets and generation method.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 有限トポロジカル格子モデルに対する厳密な反断熱駆動 Exact counterdiabatic driving for finite topological lattice models ( http://arxiv.org/abs/2403.12150v1 ) ライセンス: Link先を確認	Callum W. Duncan,	(参考訳) 断熱プロトコルは状態準備スキームでしばしば用いられるが、瞬時固有状態間の遷移が指数関数的に抑制されるよう、ゆっくり変化するハミルトニアンで系を駆動する必要がある。反断熱駆動(counterdiabatic driving)は、非断熱励起を打ち消す追加項を瞬時固有状態から計算して加えることで、断熱プロトコルを高速化する技術である。しかし、このアプローチは完全な固有スペクトルの知識を必要とするため、反断熱駆動の厳密な解析形は、調和振動子や横磁場イジングモデルなど一部の問題でのみ知られている。本研究では、この問題の集合を、開境界条件と任意のオンサイトポテンシャル、トンネル項、格子サイズを持つ1次元非相互作用格子モデルの一般的な族を含むように拡張する。トポロジカル絶縁体などに現れる束縛状態やギャップ内状態を含め、格子モデルのすべての状態に対してこのアプローチを定式化する。また、動的状態を特定の状態に留めるよう調整した、標的型の反断熱駆動項も導出する。一例として、Su-Schrieffer-Heegerモデルのトポロジカルなエッジ状態を用いた状態転送を考える。導出された解析的な反断熱駆動ハミルトニアンは、多体格子モデルにおける制御プロトコルの設計に役立てたり、格子モデルの非平衡特性を探索したりするために利用できる。 Adiabatic protocols are often employed in state preparation schemes but require the system to be driven by a slowly varying Hamiltonian so that transitions between instantaneous eigenstates are exponentially suppressed. Counterdiabatic driving is a technique to speed up adiabatic protocols by including additional terms calculated from the instantaneous eigenstates that counter diabatic excitations. However, this approach requires knowledge of the full eigenspectrum meaning that the exact analytical form of counterdiabatic driving is only known for a subset of problems, e.g., the harmonic oscillator and transverse field Ising model. We extend this subset of problems to include the general family of one-dimensional non-interacting lattice models with open boundary conditions and arbitrary on-site potential, tunnelling terms, and lattice size. We formulate this approach for all states of lattice models, including bound and in-gap states which appear, e.g., in topological insulators. We also derive targeted counterdiabatic driving terms which are tailored to enforce the dynamical state to remain in a specific state.
As an example, we consider state transfer using the topological edge states of the Su-Schrieffer-Heeger model. The derived analytical counterdiabatic driving Hamiltonian can be utilised to inform control protocols in many-body lattice models or to probe the non-equilibrium properties of lattice models.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
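For orientation, the textbook form of the counterdiabatic term that the abstract above refers to can be written as follows. This is the standard general expression for a non-degenerate spectrum, not the paper's lattice-specific result; the symbols $H_0(t)$, $|n(t)\rangle$ and $E_n(t)$ (the driven Hamiltonian, its instantaneous eigenstates and eigenvalues) are notation assumed here.

```latex
% Total Hamiltonian: original driving plus the counterdiabatic correction
H(t) = H_0(t) + H_{\mathrm{CD}}(t)

% Counterdiabatic term built from the instantaneous eigenstates |n(t)>
H_{\mathrm{CD}}(t)
  = i\hbar \sum_{n} \Big( |\partial_t n\rangle\langle n|
      - \langle n|\partial_t n\rangle \, |n\rangle\langle n| \Big)
  = i\hbar \sum_{m \neq n}
      \frac{|m\rangle\langle m|\,\partial_t H_0\,|n\rangle\langle n|}{E_n - E_m}
```

The second form makes explicit why the full eigenspectrum is needed: every pair of levels $(m, n)$ contributes, weighted by the inverse of their energy gap.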
# ゼロショットオブジェクト状態分類のための知識グラフへの大言語モデルからのドメイン特化コンテンツの利用 Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification ( http://arxiv.org/abs/2403.12151v1 ) ライセンス: Link先を確認	Filippos Gouidis, Katerina Papantoniou, Konstantinos Papoutsakis Theodore Patkos, Antonis Argyros, Dimitris Plexousakis,	(参考訳) ドメイン固有の知識は、幅広いビジョンタスクへの対処に大きく貢献する。しかし、そのような知識の創出には相当な人的労働力と時間的コストが伴う。本研究では,Large Language Models (LLMs) のセマンティック埋め込みによるドメイン固有情報の生成と提供の可能性について検討する。これを実現するために、LLMは知識グラフと事前訓練されたセマンティックベクターを、ビジョンベースのゼロショットオブジェクト状態分類タスクのコンテキストで使用するパイプラインに統合される。広範囲なアブレーション研究を通じて, LLMの挙動を徹底的に検討した。その結果,LLMをベースとした組込みと汎用的な事前学習型組込みを組み合わせることで,大幅な性能向上が期待できることがわかった。このアブレーション研究から得られた知見を引用し、競合するモデルとの比較分析を行い、提案手法により達成された最先端の性能を明らかにする。 Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the Vision-based Zero-shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 心エコー図による左室駆出率の自動ニューラルネットワーク予測法の開発 Development of Automated Neural Network Prediction for Echocardiographic Left ventricular Ejection Fraction ( http://arxiv.org/abs/2403.12152v1 ) ライセンス: Link先を確認	Yuting Zhang, Boyang Liu, Karina V. Bunting, David Brind, Alexander Thorley, Andreas Karwath, Wenqi Lu, Diwei Zhou, Xiaoxia Wang, Alastair R. Mobley, Otilia Tica, Georgios Gkoutos, Dipak Kotecha, Jinming Duan,	(参考訳) 左室駆出率(LVEF)の心エコー計測は、心不全(HF)患者の診断と分類の基礎となる。本稿では、LVEFを自動的かつ正確に定量化するために、深層ニューラルネットワークとアンサンブル学習に基づく新しいパイプライン手法を提案する。パイプラインでは、まずAtrous Convolutional Neural Network (ACNN) を訓練して左心室(LV)をセグメンテーションし、次に楕円体単一平面モデルに基づくarea-length法を用いてLVEF値を計算した。この計算には、改良したJeffrey法を用いてセグメンテーションから得たLV面積と、新しいアンサンブル学習モデルから得たLV長が入力として必要である。パイプラインの精度をさらに向上させるため、自動ピーク検出アルゴリズムを用いて拡張末期フレームと収縮末期フレームを識別し、人為的ミスの問題を回避した。その後、単一心拍のLVEF値をすべての心周期にわたって平均し、最終的なLVEFを得た。この手法は、10,030件の心エコー図を含むオープンソースデータセットを用いて開発され、内部検証された。LVEF予測のPearson相関係数は専門家による解析との比較で0.83(p<0.001)であり、駆出率の低下した心不全(HFrEF; LVEF<40%)の分類における受信者動作特性曲線下面積(AUROC)は0.98(95%信頼区間0.97〜0.99)であった。200件の心エコー図からなる外部データセットでは、HFrEF評価のAUCは0.90(95%信頼区間0.88〜0.91)に達した。本研究は、ニューラルネットワークによるLVEFの自動計算が、時間のかかるフレーム単位の手動評価で心収縮機能を評価する専門医に匹敵することを示している。 The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF). In order to quantify LVEF automatically and accurately, this paper proposes a new pipeline method based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first trained to segment the left ventricle (LV), before employing the area-length formulation based on the ellipsoid single-plane model to calculate LVEF values.
This formulation required inputs of LV area, derived from segmentation using an improved Jeffrey's method, as well as LV length, derived from a novel ensemble learning model. To further improve the pipeline's accuracy, an automated peak detection algorithm was used to identify end-diastolic and end-systolic frames, avoiding issues with human error. Subsequently, single-beat LVEF values were averaged across all cardiac cycles to obtain the final LVEF. This method was developed and internally validated in an open-source dataset containing 10,030 echocardiograms. The Pearson's correlation coefficient was 0.83 for LVEF prediction compared to expert human analysis (p<0.001), with a subsequent area under the receiver operator curve (AUROC) of 0.98 (95% confidence interval 0.97 to 0.99) for categorisation of HF with reduced ejection (HFrEF; LVEF<40%). In an external dataset with 200 echocardiograms, this method achieved an AUC of 0.90 (95% confidence interval 0.88 to 0.91) for HFrEF assessment. This study demonstrates that an automated neural network-based calculation of LVEF is comparable to expert clinicians performing time-consuming, frame-by-frame manual evaluation of cardiac systolic function.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
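The last stage of the pipeline described above (area and length in, averaged LVEF out) can be sketched as follows. The volume formula is the standard single-plane area-length expression V = 8A²/(3πL); the function names, the input layout, and the example measurements are hypothetical and do not come from the paper, and the segmentation, length-estimation and peak-detection stages are assumed to have already produced the per-beat areas and lengths.

```python
import math


def lv_volume_area_length(area_cm2, length_cm):
    """Single-plane area-length (ellipsoid) LV volume in mL: V = 8 A^2 / (3 pi L)."""
    if area_cm2 <= 0 or length_cm <= 0:
        raise ValueError("area and length must be positive")
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_lvef(beats):
    """Average single-beat LVEF over all cardiac cycles.

    beats: iterable of (ed_area, ed_length, es_area, es_length) tuples,
    one per cycle, in cm^2 and cm (hypothetical input layout).
    """
    values = []
    for ed_area, ed_length, es_area, es_length in beats:
        edv = lv_volume_area_length(ed_area, ed_length)
        esv = lv_volume_area_length(es_area, es_length)
        values.append(ejection_fraction(edv, esv))
    return sum(values) / len(values)


def is_hfref(lvef_percent, threshold=40.0):
    """HFrEF flag using the LVEF < 40% cut-off quoted in the abstract."""
    return lvef_percent < threshold


# Made-up measurements for two beats of one patient.
example_beats = [(30.0, 8.0, 20.0, 7.0), (30.0, 8.0, 20.0, 7.0)]
example_lvef = mean_lvef(example_beats)
```

Averaging over beats happens on the ejection fractions rather than on the volumes, which matches the abstract's description of averaging single-beat LVEF values across cardiac cycles.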
# マルチエージェント経路探索に適用した解集合プログラミングにおけるルーティングとスケジューリング:予備報告 Routing and Scheduling in Answer Set Programming applied to Multi-Agent Path Finding: Preliminary Report ( http://arxiv.org/abs/2403.12153v1 ) ライセンス: Link先を確認	Roland Kaminski, Torsten Schaub, Tran Cao Son, Jiří Švancara, Philipp Wanko,	(参考訳) 本稿では、解集合プログラミング(ASP)におけるルーティングとスケジューリングの代替手法を提案し、マルチエージェント経路探索の文脈でそれらを検討する。その考え方は、アクションやフルーエントに付随する時間ステップではなく、半順序によって時間の流れを捉えることである。これにより、計画の長さに固定の上界を設ける必要もなくなる。その代償として、同じアクションやフルーエントの複数回の出現がもはや区別できないため、時間的軌跡(の一部)は非巡回でなければならない。このアプローチはルーティングのモデリングにおいて興味深い代替手段を提供する一方、スケジューリングにおいては、きめ細かいタイミングをASPで現実的に表現できないため、他に選択肢がない。これに対して半順序は、非巡回性制約や差分制約といった外部手段で効率的に処理できる。我々はこのアイデアを形式的に詳述し、そこから得られるいくつかのASPエンコーディングを提示する。最後に、実験的分析によりその有効性を示す。 We present alternative approaches to routing and scheduling in Answer Set Programming (ASP), and explore them in the context of Multi-agent Path Finding. The idea is to capture the flow of time in terms of partial orders rather than time steps attached to actions and fluents. This also abolishes the need for fixed upper bounds on the length of plans. The trade-off for this avoidance is that (parts of) temporal trajectories must be acyclic, since multiple occurrences of the same action or fluent cannot be distinguished anymore. While this approach provides an interesting alternative for modeling routing, it is without alternative for scheduling since fine-grained timings cannot be represented in ASP in a feasible way. This is different for partial orders that can be efficiently handled by external means such as acyclicity and difference constraints. We formally elaborate upon this idea and present several resulting ASP encodings. Finally, we demonstrate their effectiveness via an empirical analysis.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# ThermoNeRF: 熱画像の新規視点合成のためのマルチモーダルNeural Radiance Fields ThermoNeRF: Multimodal Neural Radiance Fields for Thermal Novel View Synthesis ( http://arxiv.org/abs/2403.12154v1 ) ライセンス: Link先を確認	Mariam Hassan, Florent Forest, Olga Fink, Malcolm Mielle,	(参考訳) 熱環境の再構築は、建築物のエネルギー消費分析や非破壊検査など、幅広い分野にわたる応用の可能性を秘めている。しかし、既存の手法は通常、密なシーン計測を必要とし、多くの場合RGB画像に頼って3次元形状を再構成し、熱情報は再構成後に投影される。この2段階の戦略は熱画像のテクスチャの欠如のために採用されているが、再構成された物体の形状および温度と実際のシーンとの間に不一致をもたらす可能性がある。この課題に対処するため、シーンの新しいRGBビューと熱ビューを同時にレンダリングできる、Neural Radiance Fieldsに基づく新しいマルチモーダル手法ThermoNeRFを提案する。熱画像のテクスチャの欠如を克服するため、対になったRGB画像と熱画像を用いてシーン密度を学習し、色と温度の情報は別々のネットワークで推定する。さらに、シーン再構成に利用可能なRGB+熱データセットの不足を緩和する新しいデータセットThermoScenesを紹介する。実験結果から、ThermoNeRFは平均絶対誤差1.5°Cで正確な熱画像合成を達成し、最先端のNeRF手法であるNerfactoでRGB+熱データを連結して用いた場合と比べて50%以上の改善を示した。 Thermal scene reconstruction exhibits great potential for applications across a broad spectrum of fields, including building energy consumption analysis and non-destructive testing. However, existing methods typically require dense scene measurements and often rely on RGB images for 3D geometry reconstruction, with thermal information being projected post-reconstruction. This two-step strategy, adopted due to the lack of texture in thermal images, can lead to disparities between the geometry and temperatures of the reconstructed objects and those of the actual scene. To address this challenge, we propose ThermoNeRF, a novel multimodal approach based on Neural Radiance Fields, capable of rendering new RGB and thermal views of a scene jointly. To overcome the lack of texture in thermal images, we use paired RGB and thermal images to learn scene density, while distinct networks estimate color and temperature information. Furthermore, we introduce ThermoScenes, a new dataset to palliate the lack of available RGB+thermal datasets for scene reconstruction.
Experimental results validate that ThermoNeRF achieves accurate thermal image synthesis, with an average mean absolute error of 1.5°C, an improvement of over 50% compared to using concatenated RGB+thermal data with Nerfacto, a state-of-the-art NeRF method.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# ディリクレ混合モデルにおける効率的なKLダイバージェンス推定のための変分法 Variational Approach for Efficient KL Divergence Estimation in Dirichlet Mixture Models ( http://arxiv.org/abs/2403.12158v1 ) ライセンス: Link先を確認	Samyajoy Pal, Christian Heumann,	(参考訳) 本研究は、組成データのクラスタリングに不可欠なディリクレ混合モデル(DMM)におけるKullback-Leibler (KL) ダイバージェンスの効率的な推定に取り組む。DMMの重要性にもかかわらず、KLダイバージェンスの解析的に扱いやすい解はこれまで得られていない。過去のアプローチは計算負荷の高いモンテカルロ法に依存しており、これが新しい変分法を導入する動機となった。本手法は閉形式の解を与え、迅速なモデル比較と頑健な推定評価のための計算効率を大幅に向上させる。実データとシミュレーションデータを用いた検証により、従来のモンテカルロ法に基づく手法よりも優れた効率と精度が示され、多様なDMMの迅速な探索と組成データの統計解析の進展に新たな道を開く。 This study tackles the efficient estimation of Kullback-Leibler (KL) Divergence in Dirichlet Mixture Models (DMM), crucial for clustering compositional data. Despite the significance of DMMs, obtaining an analytically tractable solution for KL Divergence has proven elusive. Past approaches relied on computationally demanding Monte Carlo methods, motivating our introduction of a novel variational approach. Our method offers a closed-form solution, significantly enhancing computational efficiency for swift model comparisons and robust estimation evaluations. Validation using real and simulated data showcases its superior efficiency and accuracy over traditional Monte Carlo-based methods, opening new avenues for rapid exploration of diverse DMM models and advancing statistical analyses of compositional data.	翻訳日:2024-03-20 18:31:46 publication:2024-03-18
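To make the closed-form idea in the abstract above concrete, the following sketch writes out two standard ingredients. The first equation is the well-known KL divergence between two single Dirichlet distributions. The second is a variational approximation for mixtures of the kind originally proposed for Gaussian mixtures by Hershey and Olsen; whether the paper uses exactly this form is an assumption, and the mixture notation ($\pi_a$, $\omega_b$, $f_a$, $g_b$) is introduced here only for illustration.

```latex
% KL divergence between two Dirichlet densities, with
% \alpha_0 = \sum_k \alpha_k, \beta_0 = \sum_k \beta_k, and \psi the digamma function
\mathrm{KL}\big(\mathrm{Dir}(\alpha)\,\|\,\mathrm{Dir}(\beta)\big)
  = \ln\Gamma(\alpha_0) - \sum_{k}\ln\Gamma(\alpha_k)
  - \ln\Gamma(\beta_0) + \sum_{k}\ln\Gamma(\beta_k)
  + \sum_{k}(\alpha_k - \beta_k)\big(\psi(\alpha_k) - \psi(\alpha_0)\big)

% Variational approximation for mixtures
% f = \sum_a \pi_a f_a  and  g = \sum_b \omega_b g_b  (components f_a, g_b are Dirichlet)
D_{\mathrm{var}}(f\,\|\,g)
  = \sum_{a} \pi_a \,
    \ln \frac{\sum_{a'} \pi_{a'} \exp\!\big(-\mathrm{KL}(f_a\,\|\,f_{a'})\big)}
             {\sum_{b} \omega_b \exp\!\big(-\mathrm{KL}(f_a\,\|\,g_b)\big)}
```

Because every term on the right-hand side is a pairwise component divergence with a closed form, no sampling is required, which is the source of the efficiency gain over Monte Carlo estimation.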
# 金融市場におけるリーダーの声の影響:NASDAQ, NSE, その他に関する実証的深層学習調査 Effect of Leaders Voice on Financial Market: An Empirical Deep Learning Expedition on NASDAQ, NSE, and Beyond ( http://arxiv.org/abs/2403.12161v1 ) ライセンス: Link先を確認	Arijit Das, Tanmoy Nandi, Prasanta Saha, Suman Das, Saronyo Mukherjee, Sudip Kumar Naskar, Diganta Saha,	(参考訳) 株価、株式、金、石油、相互資金といった金融市場は、ニュースやソーシャルメディアへの投稿の影響を受けている。本研究では、さまざまな分野のリーダーのTwitterハンドルのNLP分析に基づいて、金融市場の動向を予測するために、ディープラーニングに基づくモデルを提案する。財務要素の歴史的データだけでなく、歴史的データとTwitterのようなソーシャルメディアのニュースや投稿を組み合わせることで、金融市場を予測できるモデルが、この研究の主目的である。その結果、実質的な改善が示される。現在の作品の主な特徴は- a) Twitterハンドルと金融コンポーネントのモデルを生成することができる完全に一般化されたアルゴリズムを提案すること。ロ株価に対するつぶやき効果の時間窓の予測 c) トレンドを予測するために複数のTwitterハンドルの効果を分析すること。近年の同様の分野における最新の研究の発見、研究ギャップの発見、分析と予測に必要なデータ収集のための詳細な調査が行われている。 State-of-the-artアルゴリズムが提案され,環境との完全な実装が提案されている。金融市場における Twitter データの NLP 分析を考慮した結果改善の洞察に富んだ傾向を示す。インドとアメリカの金融市場は、将来他の市場が取られるように、現在の作業で探索されている。本研究の社会的・経済的影響をまとめる。 Financial market like the price of stock, share, gold, oil, mutual funds are affected by the news and posts on social media. In this work deep learning based models are proposed to predict the trend of financial market based on NLP analysis of the twitter handles of leaders of different fields. There are many models available to predict financial market based on only the historical data of the financial component but combining historical data with news and posts of the social media like Twitter is the main objective of the present work. Substantial improvement is shown in the result. The main features of the present work are- a) proposing completely generalized algorithm which is able to generate models for any twitter handle and any financial component, b) predicting the time window for a tweets effect on a stock price c) analyzing the effect of multiple twitter handles for predicting the trend. 
A detailed survey is done to find the latest work in this field in recent years, identify the research gap, and collect the required data for analysis and prediction. A state-of-the-art algorithm is proposed and a complete implementation with its environment is given. An insightful trend of the result improvement considering the NLP analysis of Twitter data on financial market components is shown. The Indian and USA financial markets are explored in the present work, whereas other markets can be taken up in future. The socio-economic impact of the present work is discussed in the conclusion.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 計画分析による知的実行 Intelligent Execution through Plan Analysis ( http://arxiv.org/abs/2403.12162v1 ) ライセンス: Link先を確認	Daniel Borrajo, Manuela Veloso,	(参考訳) インテリジェントなロボットは計画を作成し実行する必要がある。現実の環境の複雑さに対処するために、計画は世界についていくつかの仮定をする。計画を実行する場合、通常、仮定は満たされない。ほとんどの研究は、この事実のネガティブな影響と実行失敗後の再計画の使用に焦点を当てている。代わりに私たちは、ポジティブな影響や、より良い計画を見つける機会に重点を置いています。計画する際、提案手法はこれらの機会を見つけ、保存する。その後、実行中に、監視システムは、スクラッチから計画を立て直すのではなく、知覚に集中し、計画を修正するために使用することができる。いくつかのパラダイム的なロボットタスクの実験は、アプローチが標準的な計画戦略よりも優れていることを示す。 Intelligent robots need to generate and execute plans. In order to deal with the complexity of real environments, planning makes some assumptions about the world. When executing plans, the assumptions are usually not met. Most works have focused on the negative impact of this fact and the use of replanning after execution failures. Instead, we focus on the positive impact, or opportunities to find better plans. When planning, the proposed technique finds and stores those opportunities. Later, during execution, the monitoring system can use them to focus perception and repair the plan, instead of replanning from scratch. Experiments in several paradigmatic robotic tasks show how the approach outperforms standard replanning strategies.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 超伝導回路の損失に及ぼすニオブ薄膜構造の影響 The effect of niobium thin film structure on losses in superconducting circuits ( http://arxiv.org/abs/2403.12164v1 ) ライセンス: Link先を確認	Maxwell Drimmer, Sjoerd Telkamp, Felix L. Fischer, Ines C. Rodrigues, Clemens Todt, Filip Krizek, Dominik Kriegner, Christoph Müller, Werner Wegscheider, Yiwen Chu,	(参考訳) 超伝導マイクロ波回路の性能は超伝導膜と基板の材料特性に強く影響される。表面処理の重要性や表面酸化物の影響を理解するために研究が進んでいるが、マイクロ波損失に対する超伝導膜構造の影響は、まだ完全には理解されていない。本研究では, 結晶特性の異なるニオブ共振器のマイクロ波特性とその表面トポグラフィーについて検討した。我々は、Nb結晶配向と表面トポグラフィーが室温と975Kの間で基板温度を変化させることで変化する一連のマグネトロンスパッタ薄膜を解析した。成長系列(550K)の中間温度条件下で成長したフィルムにおいて,結晶ドメインの優先順序と低表面粗さの両方を示すフィルムにおいて,最も高い品質因子を観察した。さらに, 共振器の温度依存性を解析し, Nb膜中の準粒子密度がニオブ結晶構造と粒界の存在によってどのように影響を受けるかを知る。その結果, 超伝導薄膜の結晶構造と共振器による損失機構の関連性が強調され, 薄膜成膜時の温度の適度な変化が結果として生じる品質要因に大きく影響することが示唆された。 The performance of superconducting microwave circuits is strongly influenced by the material properties of the superconducting film and substrate. While progress has been made in understanding the importance of surface preparation and the effect of surface oxides, the complex effect of superconductor film structure on microwave losses is not yet fully understood. In this study, we investigate the microwave properties of niobium resonators with different crystalline properties and related surface topographies. We analyze a series of magnetron sputtered films in which the Nb crystal orientation and surface topography are changed by varying the substrate temperatures between room temperature and 975 K. The lowest-loss resonators that we measure have quality factors of over one million at single-photon powers, among the best ever recorded using the Nb on sapphire platform. We observe the highest quality factors in films grown at an intermediate temperature regime of the growth series (550 K) where the films display both preferential ordering of the crystal domains and low surface roughness. 
Furthermore, we analyze the temperature-dependent behavior of our resonators to learn about how the quasiparticle density in the Nb film is affected by the niobium crystal structure and the presence of grain boundaries. Our results stress the connection between the crystal structure of superconducting films and the loss mechanisms suffered by the resonators and demonstrate that even a moderate change in temperature during thin film deposition can significantly affect the resulting quality factors.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 少数の力:コアセット選択によるデータ再重み付けの高速化と強化 The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection ( http://arxiv.org/abs/2403.12166v1 ) ライセンス: Link先を確認	Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu,	(参考訳) 機械学習のタスクが進化し続けるにつれて、より大きなデータセットを集め、ますます大きなモデルを訓練する傾向が続いている。これは精度の向上につながったが、計算コストを持続不可能なレベルにまで押し上げた。そこで本研究は、この分野で長く続く課題である計算効率とモデル精度の微妙なバランスをとることを目的とする。再重み付けにコアサブセット選択を用いることで、計算時間とモデル性能の両方を効果的に最適化する新しい手法を提案する。戦略的に選択されたコアセットに着目することで外れ値の影響を効率よく最小化できるため、本手法は頑健な表現を提供する。再較正された重みは、データセット全体に写し戻され、伝播される。実験結果は本手法の有効性を裏付け、モデル訓練のためのスケーラブルで高精度な解法としての可能性を示している。 As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We introduce a novel method that employs core subset selection for reweighting, effectively optimizing both computational time and model performance. By focusing on a strategically selected coreset, our approach offers a robust representation, as it efficiently minimizes the influence of outliers. The re-calibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results substantiate the effectiveness of this approach, underscoring its potential as a scalable and precise solution for model training.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# 医用画像分類のためのディープラーニングモデルの一般化 Generalizing deep learning models for medical image classification ( http://arxiv.org/abs/2403.12167v1 ) ライセンス: Link先を確認	Matta Sarah, Lamard Mathieu, Zhang Philippe, Alexandre Le Guilcher, Laurent Borderie, Béatrice Cochener, Gwenolé Quellec,	(参考訳) 多くのDeep Learning(DL)モデルが、医療実践のさまざまな側面を再形成することを約束する医療画像分析アプリケーションのために開発されている。医療機関がそれを採用することを奨励するDLモデル検証と実装の進歩にもかかわらず、いくつかの根本的な疑問が残る:DLモデルは一般化可能であるか? DLモデルのパフォーマンスが低下する原因は何でしょう? DLモデルのパフォーマンス低下を克服するには? 医療機器のアップデート、新しい画像ワークフロー、患者人口や人口の変化など、複数の要因により、時間とともにこのドリフトが引き起こされるため、医療データは動的でドメインシフトの傾向にある。本稿では,DLに基づく分類モデルの一般化手法の最近の展開を概観する。また、評価プロトコルやベンチマークの改善の必要性など今後の課題についても論じ、医用画像分類のための堅牢で一般化されたモデルを実現するための今後の発展を構想する。 Numerous Deep Learning (DL) models have been developed for a large spectrum of medical image analysis applications, which promises to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, some fundamental questions remain: are the DL models capable of generalizing? What causes a drop in DL model performances? How to overcome the DL model performance drop? Medical data are dynamic and prone to domain shift, due to multiple factors such as updates to medical equipment, new imaging workflow, and shifts in patient demographics or populations can induce this drift over time. In this paper, we review recent developments in generalization methods for DL-based classification models. We also discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.	翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
# EasyJailbreak: 大規模言語モデルをジェイルブレイクするための統一フレームワーク EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models ( http://arxiv.org/abs/2403.12171v1 ) ライセンス: Link先を確認	Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang,	(参考訳) 大規模言語モデル(LLM)のセキュリティ脆弱性の特定と緩和には、脱獄攻撃が不可欠である。セーフガードをバイパスし、禁止されたアウトプットを引き出すように設計されている。しかし、さまざまなjailbreakメソッドに大きな違いがあるため、コミュニティで利用可能な標準実装フレームワークは存在せず、包括的なセキュリティ評価が制限されている。本稿では,LLMに対するジェイルブレイク攻撃の構築と評価を容易にする統合フレームワークであるEasyJailbreakを紹介する。 Selector、Mutator、Constraint、Evaluatorの4つのコンポーネントを使ってJailbreak攻撃を構築する。このモジュラーフレームワークは、研究者が新しいコンポーネントと既存のコンポーネントの組み合わせから簡単に攻撃を構築できる。今のところ、EasyJailbreakは11の異なるjailbreakメソッドをサポートし、幅広いLLMのセキュリティ検証を容易にする。 10の異なるLSMで検証した結果、さまざまなジェイルブレイク攻撃で平均60%の攻撃確率で重大な脆弱性が判明した。特に、GPT-3.5-TurboやGPT-4のような先進モデルでさえ、それぞれ平均攻撃成功率(ASR)が57%、33%である。我々は、Webプラットフォーム、PyPIパブリッシュパッケージ、スクリーンキャストビデオ、実験的なアウトプットなど、研究者のための豊富なリソースをリリースした。 Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs. It builds jailbreak attacks using four components: Selector, Mutator, Constraint, and Evaluator. This modular framework enables researchers to easily construct attacks from combinations of novel and existing components. So far, EasyJailbreak supports 11 distinct jailbreak methods and facilitates the security validation of a broad spectrum of LLMs. 
Our validation across 10 distinct LLMs reveals a significant vulnerability, with an average breach probability of 60% under various jailbreaking attacks. Notably, even advanced models like GPT-3.5-Turbo and GPT-4 exhibit average Attack Success Rates (ASR) of 57% and 33%, respectively. We have released a wealth of resources for researchers, including a web platform, PyPI published package, screencast video, and experimental outputs.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# 骨格型ビデオ異常検出のためのグラフ-Jigsaw条件拡散モデル Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection ( http://arxiv.org/abs/2403.12172v1 ) ライセンス: Link先を確認	Ali Karami, Thi Kieu Khanh Ho, Narges Armanfard,	(参考訳) スケルトンに基づくビデオ異常検出(SVAD)はコンピュータビジョンにおいて重要な課題である。異常パターンや事象を正確に識別することで、オペレーターは不審な行為を迅速に検出し、安全性を高めることができる。これを達成するためには、身体レベルと地域レベルの両方において、人間の動きを包括的に理解することが必要である。しかし、既存の研究はこれらの重要な性質を同時に解決することができない。本稿では,SVADに関連する課題を克服するため,Skeleton-based Video Anomaly Detection (GiCiSAD) のためのグラフ-Jigsaw条件付き拡散モデル(Graph-Jigsaw Conditioned Diffusion Model)を提案する。 GiCiSADは3つの新しいモジュールで構成されている。グラフアテンションベースの予測モジュールはデータ固有の時空間的依存関係をキャプチャし、グラフレベルのJigsaw Puzzle Makerモジュールは正常な動きと異常な動きの間の微妙な領域レベルの不一致を区別し、グラフベースの条件拡散モデルは人間の動きの幅広いスペクトルを生成する。広く使われている4つの骨格ベースのビデオデータセットの大規模な実験により、GiCiSADはトレーニングパラメータが大幅に少ない既存のメソッドよりも優れており、新しい最先端技術として確立されている。 Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. 
Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# TnT-LLM:大規模言語モデルを用いた大規模テキストマイニング TnT-LLM: Text Mining at Scale with Large Language Models ( http://arxiv.org/abs/2403.12173v1 ) ライセンス: Link先を確認	Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan,	(参考訳) 非構造化テキストを構造化され有意義な形式に変換し、有用なカテゴリラベルで整理することは、下流の分析と応用のためのテキストマイニングの基本的なステップである。しかしながら、ラベル分類法やテキストベースのラベル分類器を構築するための既存の方法の多くは、ドメインの専門知識と手作業によるキュレーションに大きく依存しているため、そのプロセスは高価で時間を要する。ラベル空間が不特定であり、大規模なデータアノテーションが利用できない場合、これは特に困難である。本稿では,これらの課題を大規模言語モデル (LLM) を用いて解決する。 TnT-LLM は LLM を利用した2段階のフレームワークで,任意のユースケースに対して最小限の人的労力でラベル生成と割り当てのプロセスを自動化する。第1フェーズでは,ラベル分類を反復的に生成・洗練するゼロショット多段階推論手法を導入する。第2フェーズでは、LLMをトレーニングサンプルを生成するデータラベルとして使用し、軽量な教師付き分類器を確実に構築、デプロイ、大規模に提供できるようにします。我々は、オープンドメインチャットベースの検索エンジンであるBing Copilot(旧Bing Chat)のユーザ意図と会話ドメインの分析にTnT-LLMを適用した。 TnT-LLMは、最先端のベースラインと比較すると、より正確で関連性の高いラベル分類を生成でき、大規模分類における精度と効率のバランスが良好であることを示す。また、現実のアプリケーションにおける大規模テキストマイニングにLLMを使うことの課題と機会に関する実践的経験と洞察を共有します。 Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. 
In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# エンド・ツー・エンド自動運転における説明可能な人工知能の安全性への影響 Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving ( http://arxiv.org/abs/2403.12176v1 ) ライセンス: Link先を確認	Shahin Atakishiyev, Mohammad Salameh, Randy Goebel,	(参考訳) エンド・ツー・エンドの学習パイプラインは、主にディープラーニングの進歩、大規模トレーニングデータセットの可用性、統合センサーデバイスの改善により、高度自動運転車の継続的な開発におけるパラダイムシフトを徐々に生み出している。しかし、現代の学習手法によるリアルタイム意思決定における解釈可能性の欠如は、ユーザの信頼を阻害し、そのような車両の普及と商業化を妨げる。さらに、これらの車両が交通事故に巻き込まれたり、事故を起こしたりする場合には、この問題が悪化する。このような欠点は、社会的および法的観点から深刻な安全上の懸念を提起する。したがって、車両自動化の安全性を実現するためには、エンド・ツー・エンドの自動運転における説明可能性が不可欠である。しかしながら、自動運転の安全性と説明可能性の側面は、今日の最先端研究において概して別々に研究されてきた。本稿では,これらのトピック間のギャップを埋め,次の研究課題に答えることを目指す。説明は,いつ,どのように自動運転の安全性を向上させられるのか? そこでまず,自動運転における確立された安全性技術と最先端の説明可能性技術を再検討する。さらに,3つの重要なケーススタディを提示し,自動運転の安全性向上における説明の重要な役割を示す。最後に,我々の実証的調査について述べ,車両の自律性に対する安全性と透明性を保証するうえでの,実用的な説明可能AI手法の潜在的な価値,限界,注意点を明らかにする。 The end-to-end learning pipeline is gradually creating a paradigm shift in the ongoing development of highly autonomous vehicles, largely due to advances in deep learning, the availability of large-scale training datasets, and improvements in integrated sensor devices. However, a lack of interpretability in real-time decisions with contemporary learning methods impedes user trust and attenuates the widespread deployment and commercialization of such vehicles. Moreover, the issue is exacerbated when these cars are involved in or cause traffic accidents. Such a drawback raises serious safety concerns from societal and legal perspectives. Consequently, explainability in end-to-end autonomous driving is essential to enable the safety of vehicular automation. However, the safety and explainability aspects of autonomous driving have generally been investigated disjointly by researchers in today's state of the art. In this paper, we aim to bridge the gaps between these topics and seek to answer the following research question: When and how can explanations improve the safety of autonomous driving? 
In this regard, we first revisit established safety techniques and state-of-the-art explainability techniques in autonomous driving. Furthermore, we present three critical case studies and show the pivotal role of explanations in enhancing self-driving safety. Finally, we describe our empirical investigation and reveal the potential value, limitations, and caveats of practical explainable AI methods in assuring safety and transparency for vehicle autonomy.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# 仮想2モードスクイーズ表現としての真空ラビ分割:周波数シフトからスクイーズパラメータを抽出する Vacuum Rabi splitting as a manifestation of virtual two-mode squeezing: Extracting the squeezing parameters from frequency shifts ( http://arxiv.org/abs/2403.12177v1 ) ライセンス: Link先を確認	Karol Gietka,	(参考訳) 真空ラビ分裂は、原子の共振周波数と原子が存在する空洞の対称的な分裂に依存する。この研究において、真空ラビ分裂は仮想的な2モードのスクイーズ現象の顕在化であると主張している。仮想励起のスクイーズパラメータと物理モードの周波数シフトの関連性を確立する。この目的のために、Dickeモデルと相互作用する2つの調和振動子のマッピングを用い、素モードと物理的モードの枠組みで解析する。最後に、そのような量子場の仮想的スクイーズもまた、場の量子論において役割を果たすかもしれないことを示唆する。 Vacuum Rabi splitting relies on symmetrical splitting of the common resonance frequency of atoms and the cavity in which the atoms reside. In this work, we argue that vacuum Rabi splitting is a manifestation of virtual light-matter two-mode squeezing. We establish a connection between squeezing parameters of virtual excitations and frequency shifts of the physical modes. To this end, we use the mapping between the Dicke model and two interacting harmonic oscillators, which we analyze in the framework of bare and physical modes. Finally, we suggest that such virtual squeezing of quantum fields might also play a role in quantum field theories.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
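As a hedged illustration of the frequency shifts mentioned in the abstract above, the sketch below computes the normal-mode frequencies of two resonant harmonic oscillators coupled through their position quadratures, H = ω a†a + ω b†b + g (a + a†)(b + b†) with ħ = 1. This Hamiltonian, the symbols `omega` and `g`, and the function name `normal_mode_frequencies` are assumptions chosen for illustration and are not taken from the paper. Diagonalizing the quadratic form gives ω± = sqrt(ω² ± 2gω), which reduces to the familiar splitting ω ± g when g is much smaller than ω.

```python
import math
from typing import Tuple


def normal_mode_frequencies(omega: float, g: float) -> Tuple[float, float]:
    """Normal-mode frequencies of two resonant oscillators with x-x coupling.

    Assumed Hamiltonian (hbar = 1, unit masses):
        H = omega*a†a + omega*b†b + g*(a + a†)*(b + b†)
    In position coordinates the potential reads
        V = 0.5*omega**2*(x1**2 + x2**2) + 2*g*omega*x1*x2,
    so the squared mode frequencies are omega**2 ± 2*g*omega.
    """
    if omega <= 0 or abs(g) >= omega / 2:
        raise ValueError("need omega > 0 and |g| < omega/2 for a stable lower mode")
    lower = math.sqrt(omega**2 - 2 * abs(g) * omega)
    upper = math.sqrt(omega**2 + 2 * abs(g) * omega)
    return lower, upper


if __name__ == "__main__":
    omega, g = 1.0, 0.01
    lower, upper = normal_mode_frequencies(omega, g)
    print(f"splitting        = {upper - lower:.6f}")
    print(f"weak-coupling 2g = {2 * g:.6f}")
    print(f"shift of the mean frequency = {(upper + lower) / 2 - omega:.2e}")
```

The small negative shift of the mean frequency is a second-order effect of the counter-rotating terms in this assumed model; how the paper maps such shifts to squeezing parameters is not reproduced here.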
# 施設配置メカニズム設計のためのMACアドバイス MAC Advice for Facility Location Mechanism Design ( http://arxiv.org/abs/2403.12181v1 ) ライセンス: Link先を確認	Zohar Barak, Anupam Gupta, Inbal Talgam-Cohen,	(参考訳) 予測付きアルゴリズムは、従来の最悪ケース解析を超える方法として、施設配置の変種を含むさまざまな領域で近年注目を集めている。我々は、$n$人のエージェントが戦略的であり、自身の位置を偽って報告する可能性がある$k$施設配置メカニズム設計問題を研究する。予測が$k$個の最適施設位置に対して与えられる従来のモデルとは異なり、我々は各エージェントの位置に対する$n$個の予測を受け取る。しかし、これらの予測は「ほとんど(mostly)」かつ「近似的に(approximately)」正しい(略してMAC)にすぎない。すなわち、予測された位置のうち$\delta$の割合は任意に誤っていてもよく、残りの予測は$\varepsilon$の誤差の範囲で正しければよい。誤差の独立性は仮定しない。このような予測によって、耐戦略的(strategyproof)な施設配置に対する現在の最良の限界を上回ることができるだろうか? 点集合の$1$-メディアン(幾何中央値)は改変(corruption)に対して自然にロバストであることを示し、これによりMAC予測を用いた単一施設配置のアルゴリズムが得られる。我々はこのロバスト性の結果を、$k$施設の場合の「バランスの取れた」変種に拡張する。バランス条件がなければ、直線上の$k=2$施設の設定であってもロバスト性は完全に崩壊することを示す。この「バランスの取れていない」設定に対して、予測を使用しないLu et al. [2010] の既知の最良の結果を上回る、真実申告を保証する(truthful)ランダム化メカニズムを考案する。その過程で、(第1の施設の位置がすでに固定されている場合の)「第2の」施設配置問題を導入する。$1$-メディアン、より一般には$k$-メディアンのロバスト性に関する我々の知見は、ロバスト統計における古典的な破綻点(breakdown point)の結果の定量版として、独立した興味の対象となりうる。 Algorithms with predictions have attracted much attention in recent years across various domains, including variants of facility location, as a way to surpass traditional worst-case analyses. We study the $k$-facility location mechanism design problem, where the $n$ agents are strategic and might misreport their location. Unlike previous models, where predictions are for the $k$ optimal facility locations, we receive $n$ predictions, one for the location of each agent. However, these predictions are only "mostly" and "approximately" correct (or MAC for short) -- i.e., some $\delta$-fraction of the predicted locations are allowed to be arbitrarily incorrect, and the remainder of the predictions are allowed to be correct up to an $\varepsilon$-error. We make no assumption on the independence of the errors. Can such predictions allow us to beat the current best bounds for strategyproof facility location? 
We show that the $1$-median (geometric median) of a set of points is naturally robust under corruptions, which leads to an algorithm for single-facility location with MAC predictions. We extend the robustness result to a "balanced" variant of the $k$ facilities case. Without balancedness, we show that robustness completely breaks down, even for the setting of $k=2$ facilities on a line. For this "unbalanced" setting, we devise a truthful random mechanism that outperforms the best known result of Lu et al. [2010], which does not use predictions. En route, we introduce the problem of "second" facility location (when the first facility's location is already fixed). Our findings on the robustness of the $1$-median and more generally $k$-medians may be of independent interest, as quantitative versions of classic breakdown-point results in robust statistics.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
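To make the robustness claim above concrete, here is a minimal sketch for the one-dimensional, single-facility case, where the 1-median is the ordinary median. The data, the helper names `social_cost` and `mac_predictions`, and the corruption pattern are illustrative assumptions: a small fraction of predictions is replaced by an arbitrary value and the rest are perturbed by at most `eps`. The sketch only compares where the median and the mean of the predictions would place a facility; it ignores strategic reports and is not the paper's mechanism.

```python
import statistics


def social_cost(facility, locations):
    """Sum of distances from every agent to the facility (on a line)."""
    return sum(abs(facility - x) for x in locations)


def mac_predictions(true_locations, corrupted_indices, arbitrary_value, eps):
    """Build 'mostly and approximately correct' predictions.

    Predictions at corrupted_indices are replaced by arbitrary_value;
    all others are shifted by +eps or -eps (alternating by index).
    """
    predictions = []
    for i, x in enumerate(true_locations):
        if i in corrupted_indices:
            predictions.append(arbitrary_value)
        else:
            predictions.append(x + (eps if i % 2 == 0 else -eps))
    return predictions


if __name__ == "__main__":
    true_locations = [1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0]
    # 20% of the predictions are arbitrarily wrong, the rest are off by 0.1.
    predictions = mac_predictions(true_locations, {0, 1}, 1e6, 0.1)

    optimal = social_cost(statistics.median(true_locations), true_locations)
    at_median = social_cost(statistics.median(predictions), true_locations)
    at_mean = social_cost(statistics.mean(predictions), true_locations)

    print(f"optimal cost:                 {optimal:.2f}")
    print(f"cost at median of predictions: {at_median:.2f}")
    print(f"cost at mean of predictions:   {at_mean:.2f}")
```

As long as fewer than half of the predictions are corrupted, the median of the predictions stays between the smallest and largest uncorrupted prediction, whereas a single corrupted prediction can move the mean arbitrarily far.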
# ニューラルネットワークによるRKHS関数の近似 Approximation of RKHS Functionals by Neural Networks ( http://arxiv.org/abs/2403.12187v1 ) ライセンス: Link先を確認	Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo,	(参考訳) 時系列や画像などの関数データの豊富さによって、そのようなデータをニューラルネットワークに統合し、関数空間からR(関数)へのマップを学習することへの関心が高まっている。本稿では,ニューラルネットワークを用いたカーネルヒルベルト空間(RKHS)における関数の近似について検討する。我々は、RKHS上の関数の近似の普遍性を確立する。具体的には、逆多重四元数、ガウス、ソボレフのカーネルによって誘導されるものに対して明示的な誤差境界を導出する。さらに、ニューラルネットワークが一般化された汎関数線形モデルにおける回帰マップを正確に近似できることを証明し、機能回帰に本研究の成果を適用した。関数学習に関する既存の研究は、事前定義された基底関数のセットを含む統合型基底関数の拡張を必要とする。 RKHSの直交射影を補間することにより,基本関数展開の代替として点評価を用いることで,提案するネットワークはよりシンプルになる。 Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation of functionals on the RKHS's. Specifically, we derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels. Moreover, we apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps in generalized functional linear models. Existing works on functional learning require integration-type basis function expansions with a set of pre-specified basis functions. By leveraging the interpolating orthogonal projections in RKHS's, our proposed network is much simpler in that we use point evaluations to replace basis function expansions.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# PETScML:Scientific Machine Learningにおける回帰学習問題の2次解法 PETScML: Second-order solvers for training regression problems in Scientific Machine Learning ( http://arxiv.org/abs/2403.12188v1 ) ライセンス: Link先を確認	Stefano Zampini, Umberto Zerbinati, George Turkiyyah, David Keyes,	(参考訳) 近年,計算機科学や工学の応用による深層学習技術を用いた分析ツールとして,科学機械学習が出現するのを目撃している。これらの手法のコアとなるのは、ニューラルネットワークの実現を学ぶための教師付きトレーニングアルゴリズムである。しかし、ディープラーニングの実践とは違って、科学的な機械学習のトレーニング問題では、スムーズなデータの量が多くなり、経験的リスク関数のキャラクタリゼーションが向上し、制約のない最適化のための従来の解法に適している。我々は,Portable and Extensible Toolkit for Scientific計算上に構築された軽量なソフトウェアフレームワークを導入し,ディープラーニングソフトウェアと非制約最小化のための従来の解法とのギャップを埋める。我々は,幅広い科学的機械学習手法とテストケースのサロゲートモデルを学習する際に,回帰タスクから生じる一般化誤差を改善するために,ヘッセンのガウス・ニュートン近似に基づく信頼領域法の有効性を実証的に実証した。 L-BFGSや不正確なニュートンを含む従来の二階解法は、コストや精度の観点からも、サロゲートモデルの検証に使用される適応的な一階解法と比較して好意的に比較した。 In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods. However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable and Extensible Toolkit for Scientific computation to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. 
We empirically demonstrate the superior efficacy of a trust region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line-search, compare favorably, either in terms of cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# POLARトラバースデータセット:極端な照明条件下での月極域地形の走破を模擬したステレオカメラ画像のデータセット The POLAR Traverse Dataset: A Dataset of Stereo Camera Images Simulating Traverses across Lunar Polar Terrain under Extreme Lighting Conditions ( http://arxiv.org/abs/2403.12194v1 ) ライセンス: Link先を確認	Margaret Hansen, Uland Wong, Terrence Fong,	(参考訳) POLAR Traverse Datasetを提案する。これは、極域の照明条件下にある月面に似た地形の高忠実度ステレオペア画像のデータセットであり、直線的なトラバース(走破)を模擬するよう設計されている。カメラの高さとピッチが異なる個々のトラバースの画像は、レゴリス模擬物を敷き詰めて月の南極域の地形を模した形状にしたテストベッド上で、吊り下げたステレオバーを移動させながら1m間隔で記録された。グラウンドトゥルースの形状とカメラ位置の情報も記録されている。このデータセットは、月の極域環境での利用に向けて、ビジュアルオドメトリなど、ステレオまたは単眼カメラ画像に依存するソフトウェアアルゴリズムの開発とテストに用いること、ならびに月の極域で予想される照明条件についての知見を提供することを目的としている。 We present the POLAR Traverse Dataset: a dataset of high-fidelity stereo pair images of lunar-like terrain under polar lighting conditions designed to simulate a straight-line traverse. Images from individual traverses with different camera heights and pitches were recorded at 1 m intervals by moving a suspended stereo bar across a test bed filled with regolith simulant and shaped to mimic lunar south polar terrain. Ground truth geometry and camera position information was also recorded. This dataset is intended for developing and testing software algorithms that rely on stereo or monocular camera images, such as visual odometry, for use in the lunar polar environment, as well as to provide insight into the expected lighting conditions in lunar polar regions.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# レンズのシフト:大規模言語モデルを用いたnpmエコシステム内のマルウェアの検出 Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models ( http://arxiv.org/abs/2403.12196v1 ) ライセンス: Link先を確認	Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams,	(参考訳) Gartner 2022のレポートは、世界中の組織の45%が2025年までにソフトウェアサプライチェーンの攻撃に遭遇すると予想しており、コミュニティと国家の利益のためにソフトウェアサプライチェーンのセキュリティを改善する緊急性を強調している。現在のマルウェア検出技術は、良性パッケージとマルウェアパッケージをフィルタリングすることで手動でレビューするのに役立つが、偽陽性率が高く、自動化サポートが限られている。したがって、マルウェア検出技術は、正確かつ最小限の偽陽性結果に対する高度な、より自動化されたアプローチの恩恵を受けることができる。本研究の目的は,大規模言語モデル(LLM)の実証研究を通じて,セキュリティアナリストによる悪意のあるパッケージの特定を支援し,npmエコシステムにおける潜在的なマルウェアを検出することである。本稿では,ChatGPTの反復的自己修正とゼロショットロールプレイチェーンを用いた多段階意思決定マルウェア検出ワークフローであるSocketAI Scannerを提案する。我々は,5,115 npmパッケージ(そのうち2,180は悪意がある)を調査し,静的解析ツールを用いてGPT-3およびGPT-4モデルのベースライン比較を行った。誤分類警告率の低いGPTモデルでは有望な結果が得られた。ベースライン比較では, 25%以上の精度, 15%以上のF1スコアにおいて, 静的解析よりも顕著な改善が見られた。 GPT-3モデルの精度は91%, F1スコアは94%であった。 GPT-4は精度(99%)とF1(97%)が優れており、GPT-3は費用対効果のバランスを示す。 The Gartner 2022 report predicts that 45% of organizations worldwide will encounter software supply chain attacks by 2025, highlighting the urgency to improve software supply chain security for community and national interests. Current malware detection techniques aid in the manual review process by filtering benign and malware packages, yet such techniques have high false-positive rates and limited automation support. Therefore, malware detection techniques could benefit from advanced, more automated approaches for accurate and minimally false-positive results. The goal of this study is to assist security analysts in identifying malicious packages through the empirical study of large language models (LLMs) to detect potential malware in the npm ecosystem. We present SocketAI Scanner, a multi-stage decision-maker malware detection workflow using iterative self-refinement and zero-shot-role-play-Chain of Thought (CoT) prompting techniques for ChatGPT. 
We studied 5,115 npm packages (of which 2,180 are malicious) and performed a baseline comparison of the GPT-3 and GPT-4 models with a static analysis tool. Our findings showed promising results for GPT models with low misclassification alert rates. Our baseline comparison demonstrates a notable improvement over static analysis in precision scores above 25% and F1 scores above 15%. We attained precision and F1 scores of 91% and 94%, respectively, for the GPT-3 model. Overall, GPT-4 demonstrates superior performance in precision (99%) and F1 (97%) scores, while GPT-3 presents a cost-effective balance between performance and expenditure.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
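The precision and F1 figures quoted above are linked by a simple identity, which the following sketch spells out. It defines precision, recall, and F1 from confusion-matrix counts and solves F1 = 2PR/(P + R) for the recall R. The function names are hypothetical, and any recall derived from the rounded percentages in the abstract is only a rough back-of-the-envelope estimate, not a figure reported by the authors.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def implied_recall(precision, f1):
    """Solve F1 = 2*P*R / (P + R) for R, giving R = F1*P / (2*P - F1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


if __name__ == "__main__":
    # Rounded values from the abstract; the derived recall is an estimate only.
    for name, precision, f1 in [("GPT-3", 0.91, 0.94), ("GPT-4", 0.99, 0.97)]:
        print(f"{name}: implied recall is roughly {implied_recall(precision, f1):.3f}")
```

Because the inputs are rounded to whole percentages, the implied recall can be off by a point or more; the sketch is meant to show how the metrics relate, not to add a result.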
# E2F-Net:StyleGANラテント・スペースによるアイ・ツー・フェイス・インペインティング E2F-Net: Eyes-to-Face Inpainting via StyleGAN Latent Space ( http://arxiv.org/abs/2403.12197v1 ) ライセンス: Link先を確認	Ahmad Hassanpour, Fatemeh Jamalbafrani, Bian Yang, Kiran Raja, Raymond Veldhuis, Julian Fierrez,	(参考訳) 顔画像の欠落または損傷した領域を復元する技術である顔インペインティングは、遮蔽のある状況での顔認識や、低品質な撮影画像の解析といった応用において極めて重要である。この処理では、現実的な見た目の画像を生成するだけでなく、個人のアイデンティティ特性も保持する必要がある。本論文の目的は、提案する新しいGAN(敵対的生成ネットワーク)ベースのモデル「Eyes-to-Face Network (E2F-Net)」により、眼周囲領域が与えられたときに顔全体をインペインティングすること(eyes-to-face)である。提案手法は、2つの専用エンコーダを用いて眼周囲領域からアイデンティティ特徴と非アイデンティティ特徴を抽出する。抽出された特徴は、事前訓練されたStyleGANジェネレータの潜在空間にマッピングされ、追加の訓練なしに、その最先端の性能と豊かで多様かつ表現力の高い潜在空間の恩恵を受ける。さらに、GANインバージョンのための新しい最適化手法を用いて潜在空間における最適なコードを探索し、StyleGANの出力を改善する。E2F-Netが必要とする訓練は最小限であり、副次的な利点として計算量が削減される。広範な実験を通して、本手法は訓練と教師信号に要する労力が大幅に少ないにもかかわらず、顔全体を高品質に再構成し、現在の手法を上回ることを示す。提案手法の訓練と検証のために、よく知られた公開顔データセットに基づいて7つのeyes-to-faceデータセットを生成した。コードとデータセットは公開されている。 Face inpainting, the technique of restoring missing or damaged regions in facial images, is pivotal for applications like face recognition in occluded scenarios and image analysis with poor-quality captures. This process not only needs to produce realistic visuals but also preserve individual identity characteristics. The aim of this paper is to inpaint a face given the periocular region (eyes-to-face) through a proposed new Generative Adversarial Network (GAN)-based model called Eyes-to-Face Network (E2F-Net). The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders. The extracted features are then mapped to the latent space of a pre-trained StyleGAN generator to benefit from its state-of-the-art performance and its rich, diverse and expressive latent space without any additional training. We further improve the StyleGAN output to find the optimal code in the latent space using a new optimization technique for GAN inversion. 
Our E2F-Net requires only minimal training, which reduces the computational complexity as a secondary benefit. Through extensive experiments, we show that our method successfully reconstructs the whole face with high quality, surpassing current techniques, despite requiring significantly less training and supervision effort. We have generated seven eyes-to-face datasets based on well-known public face datasets for training and verifying our proposed methods. The code and datasets are publicly available.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# FLex:ステレオ内視鏡映像のための姿勢と動的ラジアンスフィールドの同時最適化 FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos ( http://arxiv.org/abs/2403.12198v1 ) ライセンス: Link先を確認	Florian Philipp Stilz, Mert Asim Karaoglu, Felix Tristram, Nassir Navab, Benjamin Busam, Alexander Ladikos,	(参考訳) 内視鏡シーンの再構成は、術後分析から教育訓練まで、様々な医療応用にとって重要な要素である。ニューラルレンダリングは最近、組織が変形する内視鏡シーンの再構成で有望な結果を示している。しかし、これまでの設定は、静止した内視鏡や限定的な変形に制限されるか、内視鏡カメラの姿勢情報を得るための外部追跡装置を必要としていた。FLexでは、変形する組織からなる非常に動的な環境の中を内視鏡が移動するという困難な設定に取り組む。複数の重なり合う4次元ニューラルラジアンスフィールド(NeRF)への暗黙的なシーン分割と、再構成とカメラ姿勢をゼロから同時に最適化する漸進的最適化手法を提案する。これにより使いやすさが向上し、5,000フレーム以上の手術ビデオを処理できるよう再構成能力を時間方向に拡張できる。これは最先端手法と比べて10倍以上の改善であり、外部追跡情報にも依存しない。StereoMISデータセットでの広範な評価により、FLexは競争力のある姿勢精度を維持しながら、新規視点合成の品質を大幅に向上させることが示された。 Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. Neural rendering has recently shown promising results in endoscopic reconstruction with deforming tissue. However, prior setups have been restricted to a static endoscope or limited deformation, or have required an external tracking device to retrieve the pose of the endoscopic camera. With FLex we address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves the ease of use and allows reconstruction capabilities to scale in time to surgical videos of 5,000 frames and more, an improvement of more than ten times over the state of the art, while being agnostic to external tracking information. Extensive evaluations on the StereoMIS dataset show that FLex significantly improves the quality of novel view synthesis while maintaining competitive pose accuracy.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# 機械学習プロジェクトにおけるCI/CDパイプラインの進化に関する実証分析 Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects ( http://arxiv.org/abs/2403.12199v1 ) ライセンス: Link先を確認	Alaa Houerbi, Chadha Siala, Alexis Tucker, Dhia Elhaq Rzig, Foyzul Hassan,	(参考訳) 機械学習(ML)の人気が高まり、他のソフトウェアアーティファクトとのMLコンポーネントの統合が増加し、Travis CIやGitHub Actionsなどの継続的インテグレーションとデリバリ(CI/CD)ツールが利用されるようになった。このようなCI/CD構成とサービスは、プロジェクトのライフサイクル中に同期を必要とする。従来のソフトウェアシステムにおけるCI/CD構成とサービスの使い方について、いくつかの研究が議論された。しかしながら、MLプロジェクトでのCI/CD構成とサービスの変更に関する知識は限られている。この知識ギャップを埋めるために、この研究は、MLソフトウェアシステムにおけるCI/CD構成の進化に関する最初の経験的分析を示す。我々は508のオープンソースMLプロジェクトから収集された343のコミットを手動で分析し、MLプロジェクトにおいて一般的なCI/CD構成変更カテゴリを特定し、CI/CDとMLコンポーネントの14の共変更の分類法を考案した。さらに, 頻繁なCI/CD構成変更パターンを15,634コミットで識別するCI/CD構成変更クラスタリングツールを開発した。さらに、CI/CD構成を変更するML開発者の専門知識を測定しました。この分析から、コミットの61.8%がビルドポリシーの変更と、一般的なオープンソースプロジェクトと比較してパフォーマンスと保守性に関する最小限の変更を含んでいることがわかった。さらに、共進化分析では、CI/CD構成が、依存関係の直接包摂や標準化されたテストフレームワークの使用の欠如といった悪いプラクティスのために、不要に変更されたことが判明した。推奨外の設定とジェネリックビルド言語への依存による変更パターンの分析を通じて、さらに多くのプラクティスが見つかった。最後に、私たちの開発者の専門知識分析は、経験豊富な開発者がCI/CD構成を変更する傾向にあることを示唆しています。 The growing popularity of machine learning (ML) and the integration of ML components with other software artifacts has led to the use of continuous integration and delivery (CI/CD) tools, such as Travis CI, GitHub Actions, etc. that enable faster integration and testing for ML projects. Such CI/CD configurations and services require synchronization during the life cycle of the projects. Several works discussed how CI/CD configuration and services change during their usage in traditional software systems. However, there is very limited knowledge of how CI/CD configuration and services change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves for ML software systems. 
We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories in ML projects and devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy, whereas changes related to performance and maintainability are minimal compared to general open-source projects. Additionally, the co-evolution analysis identified that CI/CD configurations, in many cases, changed unnecessarily due to bad practices such as the direct inclusion of dependencies and a lack of standardized testing frameworks. The change pattern analysis revealed further bad practices, including the use of deprecated settings and reliance on a generic build language. Finally, our developer expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# 人間と機械の機能の合成学習 Compositional learning of functions in humans and machines ( http://arxiv.org/abs/2403.12201v1 ) ライセンス: Link先を確認	Yanli Zhou, Brenden M. Lake, Adina Williams,	(参考訳) 機能を学び、構成する能力は、人間の効率的な学習と推論の基礎となり、既知の調理プロセスから新しい料理を作るといった柔軟な一般化を可能にします。関数の逐次連鎖以外にも、既存の言語学文献では、人間が相互作用する関数によってより複雑な構成を把握できることが示されている。視覚領域の調査を拡大し、様々な相互作用条件下での合成機能を用いた学習・推論において、人間とニューラルネットワークモデルの能力を探究する機能学習パラダイムを開発した。個々の機能に関する短いトレーニングの後、人間の参加者は、2つの学習された機能を構成する上で評価され、第1の関数の応用が第2の関数を適用するコンテキストを作成したり削除したりするインスタンスを含む4つの主要な相互作用タイプをカバーする方法が検討された。以上の結果から,人間は相互作用条件にまたがる新しい視覚機能合成をゼロショットで一般化することができ,文脈変化に対する感受性を示すことが示唆された。同じタスクにおけるニューラルネットワークモデルとの比較により、合成性(MLC)アプローチのメタラーニングを通じて、標準的なシーケンス対シーケンス変換器は、構成関数における人間の一般化パターンを模倣することができることが明らかになった。 The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. 
A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
# DeCoTR:2Dおよび3Dアテンションによる深度補完 DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions ( http://arxiv.org/abs/2403.12202v1 ) ライセンス: Link先を確認	Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli,	(参考訳) 本稿では,2次元と3次元の両方の注意を生かして,反復的な空間伝搬を必要とせず,高精度な深度補完を実現する手法を提案する。具体的には、まず、ボトルネックにおける2次元特徴に注意を向け、接続をスキップすることで、ベースライン畳み込み深度補完モデルを強化する。これにより、この単純なネットワークの性能が向上し、最新の複雑なトランスフォーマーベースモデルと同等に設定できる。このネットワークからの初期深度と特徴を活用して、2D機能を引き上げて3Dポイントクラウドを形成し、3Dポイントトランスフォーマーを構築し、モデルが3D幾何学的特徴を明示的に学習し、活用できるようにする。さらに,本論文では,学習を改善する点群処理の正規化手法を提案し,棚から点変圧器を直接使用するよりも精度が向上した。さらに、ダウンサンプリングされたポイントクラウド機能に対するグローバルな注意を取り入れ、計算可能でありながら、長距離コンテキストを可能にする。提案手法であるDeCoTRを,NYU Depth V2 や KITTI を含む確立された深度補完ベンチマークで評価した結果,新しい最先端性能が得られた。さらに、ScanNetおよびDDADベンチマークでゼロショット評価を行い、DeCoTRが既存のアプローチよりも優れた一般化性を有することを示した。 In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. 
We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.	翻訳日:2024-03-20 18:21:58 公開日:2024-03-18
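The step of lifting 2D features into a 3D point cloud rests on back-projecting pixels with their depths. Below is a minimal sketch of standard pinhole back-projection. The intrinsics `fx`, `fy`, `cx`, `cy`, the function name `backproject_depth`, and the use of plain nested lists are assumptions made for illustration; the sketch says nothing about how DeCoTR itself implements the lifting or attaches features to points.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth is a list of rows; depth[v][u] is the depth at pixel column u, row v.
    Entries that are zero or negative are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth at this pixel
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    sparse_depth = [
        [0.0, 2.0],
        [1.0, 4.0],
    ]
    for point in backproject_depth(sparse_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(point)
```

In a depth completion setting the input depth is sparse, so skipping missing pixels as above yields a point cloud with one point per valid measurement; a completed dense depth map would yield one point per pixel.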
# ビジョンベースのアジャイルフライトのための模倣によるブートストラップ強化学習 Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight ( http://arxiv.org/abs/2403.12203v1 ) ライセンス: Link先を確認	Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza,	(参考訳) 視覚に基づく自律型ドローンレースにおいて、強化学習(RL)の有効性と模倣学習(IL)の効率を組み合わせる。我々は、明示的な状態推定なしで視覚入力を直接処理することに焦点を当てる。RLは、試行錯誤を通じて複雑なコントローラを学習するための一般的なフレームワークを提供するが、視覚入力の高次元性のため、サンプル効率と計算要求に関する課題に直面している。逆に、ILは視覚的なデモンストレーションから効率よく学習できるが、デモンストレーションの品質によって制限され、共変量シフトのような問題に直面する。これらの制約を克服するために、RLとILの利点を組み合わせた新しいトレーニングフレームワークを提案する。本フレームワークは、特権状態情報を用いた教師方策の初期訓練、ILを用いた生徒方策への蒸留、性能制約付きの適応的RL微調整の3段階からなる。シミュレーション環境と実環境の両方での実験により、我々の手法は、明示的な状態推定を行わず視覚情報のみを用いてクアッドロータにレースコースを飛行させる際に、ILやRL単独よりも優れた性能とロバスト性を達成できることが示された。 We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges regarding sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL demonstrates efficiency in learning from visual demonstrations but is limited by the quality of those demonstrations and faces issues like covariate shift. To overcome these limitations, we propose a novel training framework combining the advantages of RL and IL. Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning. Our experiments in both simulated and real-world environments demonstrate that our approach achieves better performance and robustness than IL or RL alone in navigating a quadrotor through a racing course using only visual information without explicit state estimation.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# BACQ - 量子コンピューティングのためのアプリケーション指向ベンチマーク BACQ - Application-oriented Benchmarks for Quantum Computing ( http://arxiv.org/abs/2403.12205v1 ) ライセンス: Link先を確認	Frédéric Barbaresco, Laurent Rioux, Christophe Labreuche, Michel Nowak, Noé Olivier, Damien Nicolazic, Olivier Hess, Anne-Lise Guilmin, Robert Wang, Tanguy Sassolas, Stéphane Louise, Kyrylo Snizhko, Grégoire Misguich, Alexia Auffèves, Robert Whitney, Emmanuelle Vergnaud, Félicien Schopfer,	(参考訳) フランスの国家量子戦略の一部である、量子技術の測定・標準・評価に関する国家プログラムMetriQs-Franceの支援を受け、BACQプロジェクトは量子コンピューティングのアプリケーション指向ベンチマークに取り組んでいる。THALES、EVIDEN(Atos傘下)、CEA、CNRS、TERATEC、LNEからなるコンソーシアムは、業界ユーザにとって有意義な、基準となる性能評価指標を確立することを目的としている。 With the support of the national program on measurements, standards, and evaluation of quantum technologies MetriQs-France, a part of the French national quantum strategy, the BACQ project is dedicated to application-oriented benchmarks for quantum computing. The consortium, which brings together THALES, EVIDEN (an Atos business), CEA, CNRS, TERATEC, and LNE, aims to establish reference performance evaluation criteria that are meaningful for industry users.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# データフィッティングに有用なコンパクト表現 Useful Compact Representations for Data-Fitting ( http://arxiv.org/abs/2403.12206v1 ) ライセンス: Link先を確認	Johannes J. Brust,	(参考訳) 2階微分の情報が得られない最小化問題に対して、ヘッセ行列を推定する手法は非常に効果的である。しかし、従来の手法は密行列を生成するため、大規模な問題では実用的でない。記憶制限付きのコンパクト表現は、密な配列を低ランク表現で表し、大規模な決定論的問題に対するソフトウェア実装の最先端技術となっている。我々は、ベクトルの選択によってパラメータ化され、特別な選択に対しては既存のよく知られた公式に帰着する新しいコンパクト表現を開発する。本研究では, 大規模固有値計算, テンソル因子分解, 非線形回帰に対するコンパクト表現の有効性を示す。 For minimization problems without second-derivative information, methods that estimate Hessian matrices can be very effective. However, conventional techniques generate dense matrices that are prohibitive for large problems. Limited-memory compact representations express the dense arrays in terms of a low-rank representation and have become the state of the art for software implementations on large deterministic problems. We develop new compact representations that are parameterized by a choice of vectors and that reduce to existing well-known formulas for special choices. We demonstrate effectiveness of the compact representations for large eigenvalue computations, tensor factorizations and nonlinear regressions.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# サイバー影響操作における合成画像生成 : 創発的脅威? Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat? ( http://arxiv.org/abs/2403.12207v1 ) ライセンス: Link先を確認	Melanie Mathys, Marco Willi, Michael Graber, Raphael Meier,	(参考訳) 人工知能(AI)の進化は、デジタルコンテンツ生成の変革を触媒し、サイバー・インフルエンス・オペレーションに深く影響している。本報告では, 合成画像の作成において, 拡散モデルなどの生成的深層学習モデルの可能性と限界について述べる。我々は、これらのツールのアクセシビリティ、実用性、出力品質と、それらが詐欺、影響、転倒の脅威シナリオに与える影響を批判的に評価する。このレポートは、いくつかの仮説的サイバー影響操作に関するコンテンツを生成し、脅威アクターに対するこれらのAI駆動手法の現在の能力と限界を実証している。生成モデルはイラストや非現実的画像の制作に優れるが、人間の指導による洗練の必要性と計算資源によって制限された、説得力のある写真リアルコンテンツを作成することは依然として大きな課題である。我々の調査は、技術進歩と誤用の可能性の微妙なバランスを浮き彫りにして、進行中の研究、防衛機構、多分野連携、政策開発への推奨を促している。これらの勧告は、特にサイバー影響の文脈において、情報の完全性に対するリスクを保護しながら、ポジティブな影響に対するAIの可能性を活用することを目的としている。 The evolution of artificial intelligence (AI) has catalyzed a transformation in digital content generation, with profound implications for cyber influence operations. This report delves into the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images. We critically assess the accessibility, practicality, and output quality of these tools and their implications in threat scenarios of deception, influence, and subversion. Notably, the report generates content for several hypothetical cyber influence operations to demonstrate the current capabilities and limitations of these AI-driven methods for threat actors. While generative models excel at producing illustrations and non-realistic imagery, creating convincing photo-realistic content remains a significant challenge, limited by computational resources and the necessity for human-guided refinement. Our exploration underscores the delicate balance between technological advancement and its potential for misuse, prompting recommendations for ongoing research, defense mechanisms, multi-disciplinary collaboration, and policy development. 
These recommendations aim to leverage AI's potential for positive impact while safeguarding against its risks to the integrity of information, especially in the context of cyber influence.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 効率的な強化学習のための制御リアプノフ関数の分解 Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning ( http://arxiv.org/abs/2403.12210v1 ) ライセンス: Link先を確認	Antonio Lopez, David Fridovich-Keil,	(参考訳) 強化学習(RL)を用いた最近の手法は、未知の環境で知的エージェントの訓練に成功している。しかし、RLは現実世界のロボティクスのシナリオでは広く適用されていない。これは、現在の最先端のRL手法が特定のタスクを学習するために大量のデータを必要とし、実世界のアプリケーションでエージェントを配備してデータを収集すると不合理なコストが発生するためである。本稿では、制御リアプノフ関数(CLF)を導入してRLの報酬関数を再形成する既存研究を基礎とする。この手法はサンプル複雑性を低減することが示されている。ただし、この定式化にはシステムのCLFが既知である必要があり、一般的な手法が存在しないため、適切なCLFを特定することはしばしば困難である。既存研究はハミルトン・ヤコビ到達可能性手順を通じて低次元のCLFを計算できる。しかし、この種の手法は高次元システムでは計算困難となる。我々はこの問題に対し、システム分解技術を用いて、分解制御リアプノフ関数(DCLF)と呼ぶものを計算することで対処する。計算されたDCLFを報酬形成に使用し、RL性能が向上することを示す。複数の例を通して、本手法が、最先端のSoft Actor-Criticアルゴリズムが必要とする実世界データの半分未満で、クアッドコプターの着陸に成功する方策を見つけることを示す。 Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications. In this paper, we build from existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which is demonstrated to reduce the sample complexity. Still, this formulation requires knowing a CLF of the system, but due to the lack of a general method, it is often a challenge to identify a suitable CLF. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. 
Through multiple examples, we demonstrate the effectiveness of this approach, where our method finds a policy to successfully land a quadcopter with less than half the amount of real-world data required by the state-of-the-art Soft Actor-Critic algorithm.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 欠損を伴う縦断的マルチモーダル・マルチビュー予測のための統一モデル A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness ( http://arxiv.org/abs/2403.12211v1 ) ライセンス: Link先を確認	Boqi Chen, Junier Oliva, Marc Niethammer,	(参考訳) 医療記録は、画像、テキスト、表形式の情報など、様々なモダリティから構成されることが多い。すべてのモダリティを統合することで患者の状態の全体像が得られ、それらを縦断的に分析することで疾患の進行をよりよく理解できる。しかし、現実世界の縦断的な医療記録には課題がある。 1) 患者は特定の時点のデータの一部または全部を欠くことがあり、 2) ある期間にすべての患者について特定のモダリティやビューが欠如している可能性がある。本研究では,欠損を伴う縦断的マルチモーダル・マルチビュー(MMMV)予測のための統一モデルを提案する。提案手法は、入力として任意の数の時点を受け付け、利用可能なデータをすべて活用することを目的としている。 Osteoarthritis Initiative(OAI)の変形性膝関節症データセットを用いて、将来の時点における痛みとKellgren-Lawrence grade(KLG)の予測に関する広範な実験を行った。我々は,統一モデルの結果を,トレーニングと評価において同一のモダリティとビューの組み合わせを使用する個別のモデルと比較することによって、本手法の有効性を示す。また、時間的データを拡張することの利点を示し、異なるタスクにおける各モダリティ/ビューの重要性をより深く理解するための事後分析を提供する。 Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient's condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific timepoint, and 2) certain modalities or views might be absent for all patients during a particular period. In this work, we introduce a unified model for longitudinal multi-modal multi-view (MMMV) prediction with missingness. Our method allows as many timepoints as desired for input, and aims to leverage all available data, regardless of their availability. We conduct extensive experiments on the knee osteoarthritis dataset from the Osteoarthritis Initiative (OAI) for pain and Kellgren-Lawrence grade (KLG) prediction at a future timepoint. We demonstrate the effectiveness of our method by comparing results from our unified model to specific models that use the same modality and view combinations during training and evaluation. 
We also show the benefit of having extended temporal data and provide post-hoc analysis for a deeper understanding of each modality/view's importance for different tasks.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 名前付きエンティティ認識の評価:ブラジル企業の決算説明会書き起こしにおける単言語および多言語トランスフォーマーモデルの比較分析 Evaluating Named Entity Recognition: Comparative Analysis of Mono- and Multilingual Transformer Models on Brazilian Corporate Earnings Call Transcriptions ( http://arxiv.org/abs/2403.12212v1 ) ライセンス: Link先を確認	Ramon Abilio, Guilherme Palermo Coelho, Ana Estela Antunes da Silva,	(参考訳) 名前付きエンティティ認識(NER)は、テキスト文書から情報を抽出する自然言語処理技術である。しかし、NERに関する既存の研究の多くは英語の文書を中心にしており、ポルトガル語の金融ドメインに合わせたデータセットの入手可能性にはギャップが残されている。本研究は、ブラジルの銀行の決算説明会の書き起こしから抽出したポルトガル語テキストに着目し、金融分野におけるNERの必要性に対処するものである。 384件の書き起こしからなる包括的データセットを整備し、アノテーションに弱教師あり手法を活用することにより,ポルトガル語で訓練された単言語モデル(BERTimbau, PTT5)と多言語モデル(mBERT, mT5)の性能評価を行った。特に,トークン分類タスクをテキスト生成問題として再定式化し,T5モデルの微調整と評価を可能にする新しい手法を提案する。モデルの微調整に続いて、テストデータセット上で性能指標と誤り指標を用いた評価を行う。以上の結果から,BERTベースモデルはT5ベースモデルより一貫して優れていた。さらに,多言語モデル同士は同程度のマクロF1スコアを示したが,BERTimbauはPTT5よりも優れた性能を示した。 PTT5 と mT5 が生成した文の手動分析では、元の文と生成された文の間の類似度は 0.89 から 1.0 の範囲であった。しかし、両モデルとも金額やパーセンテージの値を改変するなどの不一致を示す重大なエラーが発生しており、金融分野における正確性と整合性の重要性が浮き彫りになった。これらの課題にもかかわらず、提案手法によりPTT5とmT5はそれぞれ98.52%と98.85%という印象的なマクロF1スコアを達成した。さらに,本研究は,モデル間の推論におけるメモリ消費と時間消費の顕著な相違を明らかにした。 Named Entity Recognition (NER) is a Natural Language Processing technique for extracting information from textual documents. However, much of the existing research on NER has been centered around English-language documents, leaving a gap in the availability of datasets tailored to the financial domain in Portuguese. This study addresses the need for NER within the financial domain, focusing on Portuguese-language texts extracted from earnings call transcriptions of Brazilian banks. By curating a comprehensive dataset comprising 384 transcriptions and leveraging weak supervision techniques for annotation, we evaluate the performance of monolingual models trained on Portuguese (BERTimbau and PTT5) and multilingual models (mBERT and mT5). Notably, we introduce a novel approach that reframes the token classification task as a text generation problem, enabling fine-tuning and evaluation of T5 models. 
Following the fine-tuning of the models, we conduct an evaluation on the test dataset, employing performance and error metrics. Our findings reveal that BERT-based models consistently outperform T5-based models. Furthermore, while the multilingual models exhibit comparable macro F1-scores, BERTimbau demonstrates superior performance over PTT5. A manual analysis of sentences generated by PTT5 and mT5 unveils a degree of similarity ranging from 0.89 to 1.0, between the original and generated sentences. However, critical errors emerge as both models exhibit discrepancies, such as alterations to monetary and percentage values, underscoring the importance of accuracy and consistency in the financial domain. Despite these challenges, PTT5 and mT5 achieve impressive macro F1-scores of 98.52% and 98.85%, respectively, with our proposed approach. Furthermore, our study sheds light on notable disparities in memory and time consumption for inference across the models.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 二乗和によるプライベートなグラフォン推定 Private graphon estimation via sum-of-squares ( http://arxiv.org/abs/2403.12213v1 ) ライセンス: Link先を確認	Hongjie Chen, Jingqiu Ding, Tommaso d'Orsi, Yiding Hua, Chih-Hung Liu, David Steurer,	(参考訳) 確率的ブロックモデルの学習とグラフォン推定について、任意の定数個のブロックに対して多項式時間で動作する、最初の純粋なノード差分プライベートアルゴリズムを開発した。統計的有用性の保証は、これらの問題に対する従来最良の情報理論的(指数時間)ノードプライベート機構のそれと一致する。このアルゴリズムは、ブロック数に依存するレベルの二乗和緩和で定義されるスコア関数に対する指数メカニズムに基づいている。結果の主な要素は,(1)二重確率行列のポリトープ上の2次最適化によるブロックグラフォン間距離の特徴づけ,(2)任意のポリトープ上の多項式最適化に対する一般的な二乗和収束結果,(3)二乗和アルゴリズムパラダイムの一部としてスコア関数のリプシッツ拡張を行うための一般的アプローチである。 We develop the first pure node-differentially-private algorithms for learning stochastic block models and for graphon estimation with polynomial running time for any constant number of blocks. The statistical utility guarantees match those of the previous best information-theoretic (exponential-time) node-private mechanisms for these problems. The algorithm is based on an exponential mechanism for a score function defined in terms of a sum-of-squares relaxation whose level depends on the number of blocks. The key ingredients of our results are (1) a characterization of the distance between the block graphons in terms of a quadratic optimization over the polytope of doubly stochastic matrices, (2) a general sum-of-squares convergence result for polynomial optimization over arbitrary polytopes, and (3) a general approach to perform Lipschitz extensions of score functions as part of the sum-of-squares algorithmic paradigm.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 導波路における多重化量子状態転送 Multiplexed quantum state transfer in waveguides ( http://arxiv.org/abs/2403.12222v1 ) ライセンス: Link先を確認	Guillermo F. Peñas, Ricardo Puebla, Juan José García-Ripoll,	(参考訳) 本稿では、QEDセットアップにおいて量子情報の記憶と操作を最大化する方法を示すテストベッドとして機能する、量子ネットワークの現実的な導波路実装について考察する。波束工学と量子状態転送プロトコルを用いて2つの手法を解析する。まず、時間領域における直交光子の族を提案し、設計する。これらの光子は異なる標的量子ビットとの選択的相互作用を可能にする。しかし、共振ノードを用いたモード多重化はクロストーク効果によって大きく損なわれる。これが第2のアプローチ、すなわち周波数多重化の動機となる。ここでは、導波路を通る周波数多重化の限界について検討し、所定の帯域内で異なる周波数の光子をホストし、忠実に伝送する能力を解析する。我々は1光子および2光子の詳細なシミュレーションを行い、現実的な条件下でのコヒーレント量子状態転送プロトコルの忠実度に関する理論的境界を提供する。この結果から, 最先端の実験では, フォールトトレラント量子コンピューティングの要求を満たす大域的忠実度で数十個の多重化光子を利用できることが示された。ただし、これは単一光子の忠実度に関する条件が満たされる場合に限られる。 In this article, we consider a realistic waveguide implementation of a quantum network that serves as a testbed to show how to maximize the storage and manipulation of quantum information in QED setups. We analyze two approaches using wavepacket engineering and quantum state transfer protocols. First, we propose and design a family of orthogonal photons in the time domain. These photons allow for a selective interaction with distinct targeted qubits. Yet, mode multiplexing employing resonant nodes is largely spoiled by cross-talk effects. This motivates the second approach, namely, frequency multiplexing. Here we explore the limits of frequency multiplexing through the waveguide, analyzing its capabilities to host and faithfully transmit photons of different frequencies within a given bandwidth. We perform detailed one- and two-photon simulations and provide theoretical bounds for the fidelity of coherent quantum state transfer protocols under realistic conditions. Our results show that state-of-the-art experiments can employ dozens of multiplexed photons with global fidelities fulfilling the requirements imposed by fault-tolerant quantum computing. This is with the caveat that the conditions for single-photon fidelity are met.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# FloodCastを用いた大規模洪水モデリングと予測 Large-scale flood modeling and forecasting with FloodCast ( http://arxiv.org/abs/2403.12226v1 ) ライセンス: Link先を確認	Qingsong Xu, Yilei Shi, Jonathan Bamber, Chaojun Ouyang, Xiao Xiang Zhu,	(参考訳) 大規模流体力学モデルは通常、高い計算コストをもたらすだけでなく、固定解像度の空間格子とモデルパラメータに依存する。これにより、洪水の隆起を正確に予測し、時限危険警報を発する能力が制限される。本研究では,FloodCastという大規模に動作可能な高速で安定,高精度,解像度不変,幾何適応的な洪水モデリングおよび予測フレームワークを構築した。このフレームワークは、マルチ衛星観測と流体力学モデリングの2つの主要なモジュールから構成されている。マルチ衛星観測モジュールでは,大規模洪水予測におけるマルチ衛星観測の可能性をフル活用するために,リアルタイムな教師なし変化検出法と降雨処理・解析ツールが提案されている。流体力学モデリングモジュールでは、物理インフォームドニューラルネットワークにおけるデータトレーニングの要件がなく、フーリエニューラル演算子による高速で正確で解像度不変のアーキテクチャを特徴とする幾何適応型物理インフォームドニューラルソルバ(GeoPINS)が導入された。 GeoPINSは、一般的なPDEにおいて、正規および不規則なドメインにまたがる印象的なパフォーマンスを示す。大規模洪水モデルにおいて,GeoPINS を用いた長期時間系列と広域空間領域を扱うためのシーケンス・ツー・シーケンスのGeoPINS モデルを提案する。次に,2022年パキスタン洪水における様々な洪水予測手法を評価するために,ベンチマークデータセットを構築した。最後に, 時空間下降時の浸水範囲, 深さ, 移動可能性の3次元的検証を行った。従来の流体力学とシークエンス・ツー・シークエンス(Sequence-to-Sequence)のGeoPINSは、SARに基づく洪水深度データと比較すると、シークエンス・ツー・シークエンス・ジオPINSは予測誤差が小さく、従来の流体力学よりも優れていた。 Large-scale hydrodynamic models generally rely on fixed-resolution spatial grids and model parameters as well as incurring a high computational cost. This limits their ability to accurately forecast flood crests and issue time-critical hazard warnings. In this work, we build a fast, stable, accurate, resolution-invariant, and geometry-adaptative flood modeling and forecasting framework that can perform at large scales, namely FloodCast. The framework comprises two main modules: multi-satellite observation and hydrodynamic modeling. In the multi-satellite observation module, a real-time unsupervised change detection method and a rainfall processing and analysis tool are proposed to harness the full potential of multi-satellite observations in large-scale flood prediction. 
In the hydrodynamic modeling module, a geometry-adaptive physics-informed neural solver (GeoPINS) is introduced, benefiting from the absence of a requirement for training data in physics-informed neural networks and featuring a fast, accurate, and resolution-invariant architecture with Fourier neural operators. GeoPINS demonstrates impressive performance on popular PDEs across regular and irregular domains. Building upon GeoPINS, we propose a sequence-to-sequence GeoPINS model to handle long-term temporal series and extensive spatial domains in large-scale flood modeling. Next, we establish a benchmark dataset in the 2022 Pakistan flood to assess various flood prediction methods. Finally, we validate the model in three dimensions - flood inundation range, depth, and transferability of spatiotemporal downscaling. Traditional hydrodynamics and sequence-to-sequence GeoPINS exhibit exceptional agreement during high water levels, while comparative assessments with SAR-based flood depth data show that sequence-to-sequence GeoPINS outperforms traditional hydrodynamics, with smaller prediction errors.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 分析-評価-創造:ビジュアルプログラミング領域における計算論的思考と問題解決の評価 Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains ( http://arxiv.org/abs/2403.12227v1 ) ライセンス: Link先を確認	Ahana Ghosh, Liina Malva, Adish Singla,	(参考訳) 計算論的思考(CT)と問題解決のスキルは、世界中のK-8の学校カリキュラムに統合されつつある。その結果、これらのスキルに関する生徒の習熟度を測定するための信頼性の高い評価手法を開発する必要性が高まっている。近年の研究では、様々なCTの概念や実践にわたってこれらのスキルを評価する試験、特に心理測定的検証と大規模研究での利用を可能にする多肢選択式の項目に基づく試験が提案されている。実用上の関連性にもかかわらず、これらの試験は、実世界の場面でCTと問題解決を適用する際に重要な能力である、生徒の計算的創造性を測定する方法に限界がある。本研究では,ブルームの分類学における3つの高次の認知レベル、すなわち分析(Analyze)、評価(Evaluate)、創造(Create)に焦点を当てた新しい試験であるACEを開発した。 ACEは、これら3つのレベルにまたがる7x3の多肢選択式項目の多様なセットで構成されており、初歩的なブロックベースのビジュアルプログラミングに基づいている。 10校の3～7年生371名を対象とした調査により,ACEの心理測定特性を評価した。いくつかの心理測定分析フレームワークに基づいて,ACEの信頼性と妥当性を確認した。 Code.org による Hour of Code: Maze Challenge の成績と ACE における生徒の成績との間にも正の相関関係が認められた。 Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide. Consequently, there is a growing need to develop reliable assessments for measuring students' proficiency in these skills. Recent works have proposed tests for assessing these skills across various CT concepts and practices, in particular, based on multi-choice items enabling psychometric validation and usage in large-scale studies. Despite their practical relevance, these tests are limited in how they measure students' computational creativity, a crucial ability when applying CT and problem solving in real-world settings. In our work, we have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's Taxonomy, i.e., Analyze, Evaluate, and Create. ACE comprises a diverse set of 7x3 multi-choice items spanning these three levels, grounded in elementary block-based visual programming. We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools. Based on several psychometric analysis frameworks, our results confirm the reliability and validity of ACE. 
Our study also shows a positive correlation between students' performance on ACE and performance on Hour of Code: Maze Challenge by Code.org.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 画像偽造解析のための物体マスク誘導型フュージョントランス Fusion Transformer with Object Mask Guidance for Image Forgery Analysis ( http://arxiv.org/abs/2403.12229v1 ) ライセンス: Link先を確認	Dimitrios Karageorgiou, Giorgos Kordopatis-Zilos, Symeon Papadopoulos,	(参考訳) 本研究では,様々な法医学的信号から情報を抽出し,ロバストな画像フォージェリ検出とローカライゼーションを実現するための融合トランスフォーマーネットワークであるOMG-Fuserを紹介する。我々のアプローチは、任意の数の法定信号で動作することができ、その分析にオブジェクト情報を利用することができます。そこで我々は,物体の注意機構によって誘導される変圧器からなる法医学信号ストリームを設計し,同一の物体を表すパッチを関連付ける。このようにして、画像からオブジェクトレベルの情報を取り込む。各法医学信号は、その特異性に適応する異なるストリームによって処理される。その後、トークン融合変換器は任意の数のネットワークストリームの出力を効率よく集約し、各画像パッチに対して融合表現を生成する。これらの表現は最終的に、画像パッチ間の固有の関係をキャプチャする長距離依存変換器によって処理される。提案手法上の2つの融合変種を評価する。 (i)複数の画像鑑定アルゴリズムの出力を融合するスコアレベル融合と (ii)低レベルの法医学的痕跡を直接融合する特徴レベルの融合。どちらの変種も画像偽造検出とローカライゼーションのための7つのデータセットの最先端性能を超えており、F1の相対的な平均改善は12.1%と20.4%である。我々のネットワークは、伝統的で新しい偽造攻撃に対する堅牢性を示し、スクラッチからトレーニングを受けることなく、新しい信号で拡張することができる。 In this work, we introduce OMG-Fuser, a fusion transformer-based network designed to extract information from various forensic signals to enable robust image forgery detection and localization. Our approach can operate with an arbitrary number of forensic signals and leverages object information for their analysis -- unlike previous methods that rely on fusion schemes with few signals and often disregard image semantics. To this end, we design a forensic signal stream composed of a transformer guided by an object attention mechanism, associating patches that depict the same objects. In that way, we incorporate object-level information from the image. Each forensic signal is processed by a different stream that adapts to its peculiarities. Subsequently, a token fusion transformer efficiently aggregates the outputs of an arbitrary number of network streams and generates a fused representation for each image patch. These representations are finally processed by a long-range dependencies transformer that captures the intrinsic relations between the image patches. 
We assess two fusion variants on top of the proposed approach: (i) score-level fusion that fuses the outputs of multiple image forensics algorithms and (ii) feature-level fusion that fuses low-level forensic traces directly. Both variants exceed state-of-the-art performance on seven datasets for image forgery detection and localization, with a relative average improvement of 12.1% and 20.4% in terms of F1. Our network demonstrates robustness against traditional and novel forgery attacks and can be expanded with new signals without training from scratch.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# ハードサンプルのメタラーニングによる一般化の改善 Improving Generalization via Meta-Learning on Hard Samples ( http://arxiv.org/abs/2403.12236v1 ) ライセンス: Link先を確認	Nishant Jain, Arun S. Suggala, Pradeep Shenoy,	(参考訳) 教師付き学習に対する学習再重み付け(LRW)アプローチでは、代表検証データセットのパフォーマンスを最大化するために、最適化基準を使用してトレーニングインスタンスの重み付けを割り当てる。 LRWトレーニングで使用される検証セットを最適化し、分類器の一般化を改善する。特に、検証集合における分類の難しいインスタンスの使用は、理論上の関係と、一般化の強い経験的証拠の両方を持つことを示す。このメタ最適化モデルを学習するための効率的なアルゴリズムと、注意深い比較研究のための単純なトレインツースヒューリスティックを提供する。簡単な検証データを持つLRWは、ハードな検証データを持つLRWよりも一貫して悪い性能を示し、メタ最適化問題の妥当性を確立した。提案アルゴリズムは,データセットやドメインシフトの課題(Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDSなど)に対して,VIT-BをImagenet上で使用する場合の約1%のゲインで,幅広いベースラインを達成している。また、LRWトレーニングにおける検証のための自然なハード例(Imagenet-R / Imagenet-A)を使用することで、クリーンかつ自然なテストインスタンスの性能が1-2%向上することを示す。 2次解析により、LRWフレームワークにおけるハード検証データを使用することで、テストデータのマージンが向上し、経験的ゲインの基礎となるメカニズムが示唆された。本研究は,メタ学習を教師付き学習コンテキストでメタ学習に最適化するための新たな研究の方向性を開くと信じている。 Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using VIT-B on Imagenet. 
We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 資源制約型IoT環境における効率的なトランスフォーマーベースハイパーパラメータ最適化 Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments ( http://arxiv.org/abs/2403.12237v1 ) ライセンス: Link先を確認	Ibrahim Shaer, Soodeh Nikan, Abdallah Shami,	(参考訳) ハイパーパラメータ最適化(HPO)プロセスは、最も性能の高い畳み込みニューラルネットワーク(CNN)を見つけるために不可欠である。 HPOの自動化プロセスは、大きな計算フットプリントと透明性の欠如を特徴としており、いずれも資源制約のあるIoT(Internet of Things)環境では重要な要因である。本稿では,トランスフォーマーアーキテクチャとアクター・クリティック強化学習(RL)モデルを組み合わせた新しい手法であるTRL-HPOを提案することで、これらの問題に対処する。 TRL-HPOは、並列化と層の段階的な生成を可能にするマルチヘッドアテンションを備えている。これらの仮定は、MNISTデータセット上でTRL-HPOを評価し、CNNモデルをスクラッチから構築する最先端のアプローチと比較することによって、実証的に裏付けられる。結果は、TRL-HPOが同じ時間枠内でこれらの手法の分類結果を6.8%上回ることを示し、HPOプロセスにおけるTRL-HPOの効率性を実証している。結果の分析から, 全結合層の積み重ねが性能劣化の主要因であることが特定された。本稿は,資源制約環境下でのRLベースのHPOプロセスを改善するための新たな方向性を示す。 The hyper-parameter optimization (HPO) process is imperative for finding the best-performing Convolutional Neural Networks (CNNs). The automation process of HPO is characterized by its sizable computational footprint and its lack of transparency, both important factors in a resource-constrained Internet of Things (IoT) environment. In this paper, we address these problems by proposing a novel approach that combines a transformer architecture with an actor-critic Reinforcement Learning (RL) model, TRL-HPO, equipped with multi-headed attention that enables parallelization and progressive generation of layers. These assumptions are founded empirically by evaluating TRL-HPO on the MNIST dataset and comparing it with state-of-the-art approaches that build CNN models from scratch. The results show that TRL-HPO outperforms the classification results of these approaches by 6.8% within the same time frame, demonstrating the efficiency of TRL-HPO for the HPO process. The analysis of the results attributes the performance degradation mainly to stacking fully connected layers. This paper identifies new avenues for improving RL-based HPO processes in resource-constrained environments.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 6Gセキュリティにおける大規模言語モデル - 課題と機会 Large language models in 6G security: challenges and opportunities ( http://arxiv.org/abs/2403.12239v1 ) ライセンス: Link先を確認	Tri Nguyen, Huong Nguyen, Ahmad Ijaz, Saeid Sheikhi, Athanasios V. Vasilakos, Panos Kostakos,	(参考訳) 教育や医療などの分野におけるジェネレーティブAI(GenAI)とLarge Language Models(LLMs)の急速な統合は、テクノロジーの大幅な進歩を象徴している。しかし、この成長は、ほとんど未調査の側面、すなわちセキュリティ上の脆弱性につながっている。オフラインおよびオンラインモデル、さまざまなツール、ブラウザプラグイン、サードパーティアプリケーションを含むエコシステムが拡大を続けるにつれ、攻撃面が大幅に拡大し、セキュリティ侵害の可能性も拡大する。 6Gやランドスケープを超えて拡張されたこれらの拡張は、敵が悪意ある目的のためにLSMを操作するための新たな道を提供する。我々は,LLMのセキュリティ面に,潜在的な敵の立場から焦点をあてる。我々は,その目的と方法論を解明し,既知のセキュリティの弱点を詳細に分析することを目的としている。これには包括的脅威分類の開発が含まれ、様々な敵の行動を分類する。また、我々の研究は、防衛チーム(ブルーチームとしても知られる)によるサイバーセキュリティ活動にLLMがどのように統合されるかに焦点を当てます。 LLMとブロックチェーン技術間のシナジーの可能性を探り、この組み合わせが次世代の完全自律型セキュリティソリューションの開発にどのように寄与するかを検討します。このアプローチは、コンピュータ連続体全体にわたって統一されたサイバーセキュリティ戦略を確立することを目的としており、デジタルセキュリティインフラストラクチャ全体の強化を目的としている。 The rapid integration of Generative AI (GenAI) and Large Language Models (LLMs) in sectors such as education and healthcare have marked a significant advancement in technology. However, this growth has also led to a largely unexplored aspect: their security vulnerabilities. As the ecosystem that includes both offline and online models, various tools, browser plugins, and third-party applications continues to expand, it significantly widens the attack surface, thereby escalating the potential for security breaches. These expansions in the 6G and beyond landscape provide new avenues for adversaries to manipulate LLMs for malicious purposes. We focus on the security aspects of LLMs from the viewpoint of potential adversaries. We aim to dissect their objectives and methodologies, providing an in-depth analysis of known security weaknesses. This will include the development of a comprehensive threat taxonomy, categorizing various adversary behaviors. Also, our research will concentrate on how LLMs can be integrated into cybersecurity efforts by defense teams, also known as blue teams. 
We will explore the potential synergy between LLMs and blockchain technology, and how this combination could lead to the development of next-generation, fully autonomous security solutions. This approach aims to establish a unified cybersecurity strategy across the entire computing continuum, enhancing overall digital security infrastructure.	翻訳日:2024-03-20 18:12:11 公開日:2024-03-18
# 参照ベースのメトリクスは質問生成において自らを反証する Reference-based Metrics Disprove Themselves in Question Generation ( http://arxiv.org/abs/2403.12242v1 ) ライセンス: Link先を確認	Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang,	(参考訳) BLEUやBERTScoreのような参照ベースのメトリクスは、質問生成(QG)を評価するために広く使われている。本研究では、SQuADやHotpotQAなどのQGベンチマークにおいて、人手による参照を用いても参照ベースのメトリクスの有効性を保証できないことを示す。ほとんどのQGベンチマークには参照が1つしかないため、我々はアノテーションプロセスを再現し、もう1つの参照を収集した。優れたメトリクスは、人間が検証した質問を、生成された質問と同等以上に評価することが期待される。しかし、新たに収集した参照に対する参照ベースのメトリクスの結果は、メトリクス自体を反証した。本研究では,大規模言語モデルを用いて,自然さ,回答可能性,複雑さなどの多次元基準からなる参照不要のメトリクスを提案する。これらの基準は単一の参照質問の構文や意味に制約されず、メトリクスは多様な参照セットを必要としない。実験の結果、我々のメトリクスは高品質な質問と欠陥のある質問を正確に区別し、人間の判断との一致において最先端の性能を達成することがわかった。 Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric is expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# ゼロショットマルチタスク幻覚検出 Zero-Shot Multi-task Hallucination Detection ( http://arxiv.org/abs/2403.12244v1 ) ライセンス: Link先を確認	Patanjali Bhamidipati, Advaith Malladi, Manish Shrivastava, Radhika Mamidi,	(参考訳) 近年,大規模言語モデルの広範囲な活用は,テキスト生成の品質評価やタスク関連性評価において,ロバストな評価手法の重要性を浮き彫りにしている。これは、生成したテキストがソースへの忠実さに欠け、評価基準から逸脱する、モデルにおける創発的条件である幻覚として知られる一般的な問題を明らかにしている。本研究では,幻覚を正式に定義し,この定義と,モデル出力がタスクおよびサンプル固有の入力を含意するという仮定を利用して,ゼロショット設定における定量的検出のための枠組みを提案する。幻覚検出では, モデル認識(model-aware)設定で0.78, モデル非依存(model-agnostic)設定で0.61の精度が得られた。特に、我々のソリューションは計算効率を保ち、他のSOTAアプローチよりも必要な計算資源がはるかに少なく、軽量で圧縮されたモデルへの傾向に合致している。 In recent studies, the extensive utilization of large language models has underscored the importance of robust evaluation methodologies for assessing text generation quality and relevance to specific tasks. This has revealed a prevalent issue known as hallucination, an emergent condition in the model where generated text lacks faithfulness to the source and deviates from the evaluation criteria. In this study, we formally define hallucination and propose a framework for its quantitative detection in a zero-shot setting, leveraging our definition and the assumption that model outputs entail task- and sample-specific inputs. In detecting hallucinations, our solution achieves an accuracy of 0.78 in a model-aware setting and 0.61 in a model-agnostic setting. Notably, our solution maintains computational efficiency, requiring far fewer computational resources than other SOTA approaches, aligning with the trend towards lightweight and compressed models.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 寄生サーカス: ゴールデンフリーPCB検証の実現可能性について Parasitic Circus: On the Feasibility of Golden Free PCB Verification ( http://arxiv.org/abs/2403.12252v1 ) ライセンス: Link先を確認	Maryam Saadat Safa, Patrick Schaumont, Shahin Tajik,	(参考訳) プリント回路基板(PCB)は電子システムの不可欠な部分である。したがって、サプライチェーン攻撃(例えば、改ざん、偽造)の存在下でその物理的な完全性を検証することは、非常に重要である。近年,PCBの電力供給ネットワーク(PDN)のインピーダンス特性に基づく改ざん検出技術が,検出範囲が基板全体に及ぶこと、非侵襲性、低コスト性により注目されている。他の物理的検証方法と同様に、これらの手法はシグネチャ比較のために物理的なゴールデンサンプルの存在に依存している。しかし、ゴールデンシグネチャ抽出のための物理的なゴールデンサンプルを入手することは、多くの現実のシナリオでは実現不可能である。そこで本研究では,物理的なゴールデンサンプルを不要にし,PCB設計ファイルから得られるシミュレーションによるゴールデンシグネチャに置き換える可能性を検討する。社内設計のPCB上で広範囲なシミュレーションと測定を行うことにより,PCBコンポーネントの寄生インピーダンスが,検証の成功に重要な役割を果たすことを示す。得られた結果と統計指標を用いて,シミュレーションと測定から得られたシグネチャの差異を緩和できることを示す。 Printed circuit boards (PCBs) are an integral part of electronic systems. Hence, verifying their physical integrity in the presence of supply chain attacks (e.g., tampering and counterfeiting) is of utmost importance. Recently, tamper detection techniques grounded in impedance characterization of PCB's Power Delivery Network (PDN) have gained prominence due to their global detection coverage, non-invasiveness, and low cost. Similar to other physical verification methods, these techniques rely on the existence of a physical golden sample for signature comparisons. However, having access to a physical golden sample for golden signature extraction is not feasible in many real-world scenarios. In this work, we assess the feasibility of eliminating a physical golden sample and replacing it with a simulated golden signature obtained from the PCB design files. By performing extensive simulation and measurements on an in-house designed PCB, we demonstrate how the parasitic impedance of the PCB components plays a major role in reaching a successful verification. Based on the obtained results and using statistical metrics, we show that we can mitigate the discrepancy between collected signatures from simulation and measurements.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 生成ディープラーニングを用いた適応LPDレーダ波形設計 Adaptive LPD Radar Waveform Design with Generative Deep Learning ( http://arxiv.org/abs/2403.12254v1 ) ライセンス: Link先を確認	Matthew R. Ziemann, Christopher A. Metzler,	(参考訳) 本研究では,その動作環境に溶け込む低被探知確率(LPD)レーダ波形を適応的に生成する新しい学習ベースの手法を提案する。私たちの波形は、測距とセンシングに有効でありながら、周囲の無線周波数(RF)の背景と区別できない分布に従うように設計されています。そのために、教師なしの敵対的学習フレームワークを用いる。生成ネットワークは、生成した波形を背景から区別するために最適化された批評家ネットワークを混乱させるように設計された波形を生成する。生成した波形がセンシングに依然として有効であることを保証するために、生成した波形に対するあいまいさ関数に基づく損失を導入し、最小化する。本研究では, 別途学習した検出ニューラルネットワークを用いて, 生成した波形の単一パルス被探知性を従来のLPD波形と比較することで, 提案手法の性能を評価した。提案手法では,被探知性を最大90%低減するLPD波形を生成できると同時に,あいまいさ関数(センシング)特性を向上できることがわかった。私たちのフレームワークは、被探知性とセンシング性能をトレードオフするメカニズムも提供しています。 We propose a novel, learning-based method for adaptively generating low probability of detection (LPD) radar waveforms that blend into their operating environment. Our waveforms are designed to follow a distribution that is indistinguishable from the ambient radio frequency (RF) background -- while still being effective at ranging and sensing. To do so, we use an unsupervised, adversarial learning framework; our generator network produces waveforms designed to confuse a critic network, which is optimized to differentiate generated waveforms from the background. To ensure our generated waveforms are still effective for sensing, we introduce and minimize an ambiguity function-based loss on the generated waveforms. We evaluate the performance of our method by comparing the single-pulse detectability of our generated waveforms with traditional LPD waveforms using a separately trained detection neural network. We find that our method can generate LPD waveforms that reduce detectability by up to 90% while simultaneously offering improved ambiguity function (sensing) characteristics. Our framework also provides a mechanism to trade-off detectability and sensing performance.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 遷移金属ジカルコゲナイドモアレ超格子におけるウィグナー分子超結晶-ボトムアップアプローチからの教訓- Wigner-molecule supercrystal in transition-metal dichalcogenide moiré superlattices: Lessons from the bottom-up approach ( http://arxiv.org/abs/2403.12262v1 ) ライセンス: Link先を確認	Constantine Yannouleas, Uzi Landman,	(参考訳) 二重井戸型モアレ量子ドット(MQD)における$N=4$個のフェルミオン型電荷担体の少数体問題は、整数充填$\nu > 1$を持つ遷移金属ジカルコゲナイド(TMD)モアレ超格子における分子超結晶の形成を調べるボトムアップ戦略の第一歩であり、フル構成相互作用(FCI)計算による大規模な厳密対角化を用いて厳密に解かれる。よく用いられるスピン・空間非制限ハートリー・フォック(sS-UHF)の平均場解との比較解析は、ドット間クーロン相互作用の影響を適切に記述するうえでのUHF法(単独で用いた場合)の限界を示す。特に、$\nu=2$ に対して、各 MQD 内の厳密な電荷密度 (CD) は、スライディング・ウィグナー分子 (WM) で見られたように、完全に孤立した MQD に特徴的なリング状の形状を(幅広い関連パラメータに対して)保持することが明確に示される。この深く量子力学的な振る舞いは、向きが固定され、よく局在したダンベル二量体のみを描くUHFのCDとは対照的である。さらに、sS-UHFで破れたパリティ対称性の回復から導かれ、FCI計算によるCDと一致する改良CDを導入し、sS-UHFの結果を補正するための平均場を超える方法論的ロードマップを示唆する。 $\nu=2$ のモアレTMD超格子の場合の結論は、孤立MQDにおけるスライディングWMに関連する整数充填を持つすべての場合に拡張されると推測される。 $\nu=3$の場合は、孤立MQDにおけるピン留めされたWMと関連しており、例外である。 The few-body problem for $N=4$ fermionic charge carriers in a double-well moiré quantum dot (MQD), representing the first step in a bottom-up strategy to investigate formation of molecular supercrystals in transition metal dichalcogenide (TMD) moiré superlattices with integral fillings, $\nu > 1$, is solved exactly by employing large-scale exact-diagonalization via full configuration interaction (FCI) computations. A comparative analysis with the mean-field solutions of the often used spin-and-space unrestricted Hartree Fock (sS-UHF) demonstrates the limitations of the UHF method (by itself) to provide a proper description of the influence of the interdot Coulomb interaction. In particular, it is explicitly shown for $\nu=2$ that the exact charge densities (CDs) within each MQD retain the ring-like shape characteristic (for a wide range of relevant parameters) of a fully isolated MQD, as was found for sliding Wigner molecules (WMs). 
This deeply quantum-mechanical behavior contrasts sharply with the UHF CDs that portray solely orientationally pinned and well localized dumbbell dimers. An improved CD, which agrees with the FCI-calculated one, derived from the restoration of the sS-UHF broken parity symmetries is further introduced, suggesting a beyond-mean-field methodological roadmap for correcting the sS-UHF results. It is conjectured that the conclusions for the $\nu=2$ moiré TMD superlattice case extend to all cases with integral fillings that are associated with sliding WMs in isolated MQDs. The case of $\nu=3$, associated with a pinned WM in isolated MQDs, is an exception.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# インクルージョンの促進 : 支援技術の共同設計に向けてコミュニティをつなぐ地域イニシアティブ Fostering Inclusion: A Regional Initiative Uniting Communities to Co-Design Assistive Technologies ( http://arxiv.org/abs/2403.12263v1 ) ライセンス: Link先を確認	Katharina Schmermbeck, Oliver Ott, Lennart Ralfs, Robert Weidner,	(参考訳) 障害者は社会のあらゆる領域で差別やアクセスの欠如に直面していることが多い。支援技術の入手しやすさと適切性の向上は、参加や自立を容易にするための道を開くことができるが、社会の一部としての障害の認識と受容は避けられない。提示された地域的取り組みは、障害のある人々、学生、研究者、協会をまとめることによって、これらの課題に対処しようとするものである。大学における様々な講義形式において、学生は障害のある人と支援技術を共同設計する。 1年間の実践の後、我々はこのイニシアチブと、支援技術の開発およびエイブリズム(障害者差別)の緩和への影響を振り返る。参加者や他の関係者を対象に,13回の半構造化インタビューを実施し,分析した。すべての共同設計プロジェクトが講義期間中に完了したわけではない。それにもかかわらず、プロジェクトは次学期以降も継続されるため、参加者は共同設計のアプローチと正しい方向への歩みを評価した。インタビュー対象者は、参加者にとって、障害や内面化されたエイブリズム的な思い込みについての認識を高め知識を広げるうえで、このイニシアチブが重要であると強調した。我々は、具体的な支援技術の実現、アクセシビリティのギャップの解消、より包括的な社会の育成に向けて、コラボレーション、継続性、および公的なアウトリーチが最も重要であると結論付けている。 People with disabilities often face discrimination and lack of access in all areas of society. While improving the affordability and appropriateness of assistive technologies can pave the way for easier participation and independence, awareness and acceptance of disability as part of society are inevitable. The presented regional initiative strives to tackle these problems by bringing together people with disabilities, students, researchers, and associations. During different lecture formats at the university, students co-design assistive technologies with people with disabilities. After one year in practice, we reflect on the initiative and its impact on assistive technology development and mitigation of ableism. We conducted and analyzed thirteen semi-structured interviews with participants and other involved stakeholders. Not all co-design projects were finished within the time of a lecture. Participants nevertheless appreciated the co-design approach and steps in the right direction as projects are continued in upcoming semesters. 
Interviewees highlighted the initiative's importance in raising awareness and broadening knowledge regarding disability and internalized ableist assumptions for those participating. We conclude that collaboration, continuity, and public outreach are most important to work towards tangible assistive technologies, bridging accessibility gaps, and fostering a more inclusive society.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# データ効率の良いコントラスト言語画像事前学習:量よりもデータ品質を優先する Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity ( http://arxiv.org/abs/2403.12267v1 ) ライセンス: Link先を確認	Siddharth Joshi, Arnav Jain, Ali Payani, Baharan Mirzasoleiman,	(参考訳) 大規模な画像キャプションデータセット上でのCLIP(Contrastive Language-Image Pre-training)は、目覚ましいゼロショットの一般化を実現する表現を学習する。しかし、そのようなモデルは大量の事前学習データを必要とする。事前トレーニングデータの品質向上は、ボリュームの増加よりもCLIPのパフォーマンス向上にはるかに有効であることが示されている。それでも、証明可能な形で最もよく一般化するトレーニングデータの小さなサブセットを見つけることは、未解決の問題のままである。本稿では,CLIPのための初の理論的に厳密なデータ選択法を提案する。全データの画像とキャプションの相互共分散を密に保存する部分集合は、証明可能な形でより優れた一般化性能を達成することを示す。 ConceptualCaptions3MとConceptualCaptions12Mの広範な実験により、 \method\が発見したサブセットは、ImageNetとそのシフトしたバージョンにおいて、次に良いベースラインの2.7倍以上および1.4倍以上の精度を達成することが示された。さらに,我々のサブセットでは,11の下流データセットの平均精度が次に良いベースラインの1.5倍になることを示す。コードは https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip で入手できる。 Contrastive Language-Image Pre-training (CLIP) on large-scale image-caption datasets learns representations that can achieve remarkable zero-shot generalization. However, such models require a massive amount of pre-training data. Improving the quality of the pre-training data has been shown to be much more effective in improving CLIP's performance than increasing its volume. Nevertheless, finding small subsets of training data that provably generalize the best has remained an open question. In this work, we propose the first theoretically rigorous data selection method for CLIP. We show that subsets that closely preserve the cross-covariance of the images and captions of the full data provably achieve a superior generalization performance. Our extensive experiments on ConceptualCaptions3M and ConceptualCaptions12M demonstrate that subsets found by \method\ achieve over 2.7x and 1.4x the accuracy of the next best baseline on ImageNet and its shifted versions. Moreover, we show that our subsets obtain 1.5x the average accuracy across 11 downstream datasets, of the next best baseline. 
The code is available at: https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# ウィグナー関数の可聴化(ソニフィケーション)--強い光-物質相互作用のケーススタディ Sonification of Wigner functions: case study of intense light-matter interactions ( http://arxiv.org/abs/2403.12269v1 ) ライセンス: Link先を確認	Reiko Yamada, Antoine Reserbat-Plantey, Eloy Piñol, Maciej Lewenstein,	(参考訳) 量子力学において、ウィグナー関数 $\rho_W(\textbf{r},\textbf{p})$ は位相空間表現として機能し、量子系の位置 $\textbf{r}$ と運動量 $\textbf{p}$ の両方に関する情報を取得する。ウィグナー関数は観測量の期待値の計算、量子系のダイナミクスの検討、コヒーレンスと相関の解析を容易にする。したがって、例えばソニフィケーション技術を用いて、量子システムを直感的に表現するためのツールとして機能するかもしれない。本稿では,前回のプロジェクトにおける実験戦略を要約し,その成果に基づく新しいアプローチについて述べる。提案手法は,特定のウィグナー関数をその背後にある量子状態,ダイナミクス,およびソースに対応付けることを重視し,量子現象の直観的理解と解釈を高めることを目的として,ソニフィケーションとスコアリングのプロセスを洗練することを目指している。 In quantum mechanics, the Wigner function $\rho_W(\textbf{r},\textbf{p})$ serves as a phase-space representation, capturing information about both the position $\textbf{r}$ and momentum $\textbf{p}$ of a quantum system. The Wigner function facilitates the calculation of expectation values of observables, examination of quantum system dynamics, and analysis of coherence and correlations. Therefore, it might serve as a tool to express quantum systems intuitively, for example, by using sonification techniques. This paper summarizes the experimental strategies employed in a previous project and delineates a new approach based on its outcomes. Emphasizing the attribution of specific Wigner functions to their underlying quantum states, dynamics, and sources, our proposed methodology seeks to refine the sonification and scoring process, aiming to enhance intuitive understanding and interpretation of quantum phenomena.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 相関電子波動関数の変動補間によって実現される高速かつ正確な非断熱的分子動力学 Fast and accurate nonadiabatic molecular dynamics enabled through variational interpolation of correlated electron wavefunctions ( http://arxiv.org/abs/2403.12275v1 ) ライセンス: Link先を確認	Kemal Atalar, Yannic Rath, Rachel Crespo-Otero, George H. Booth,	(参考訳) 本研究では, 固有ベクトル継続の概念に基づいて, 平均フィールドコストで化学空間を通した多体波動関数の訓練セットを, 厳密かつ滑らかに補間する効率的な多状態法を開発した。推定された状態は、異なる核ジオメトリの多体基底間で伝達される訓練状態の変分最適線形結合として表される。モデルから解析的多状態力と非断熱的結合が非断熱的分子動力学に適用可能であることを示す。このことは、光励起された28原子水素鎖の非断熱的分子動力学に応用し、結果として生じる核運動が驚くほど複雑になる。異なるジオメトリーにおける低エネルギー相関電子構造からのトレーニング状態の22個のDMRG計算で、12,000ジオメトリーにおける多状態エネルギー, 力および非断熱結合ベクトルを、ブルート力アプローチでは実現できない分子軌道のアンサンブルに沿った高精度な収束性で推定する。これにより、正確な単一点相関電子構造法と光誘起分子動力学の関連性の時間スケールの間に時間スケールを橋渡しするルートが開かれる。 We build on the concept of eigenvector continuation to develop an efficient multi-state method for the rigorous and smooth interpolation of a small training set of many-body wavefunctions through chemical space at mean-field cost. The inferred states are represented as variationally optimal linear combinations of the training states transferred between the many-body basis of different nuclear geometries. We show that analytic multi-state forces and nonadiabatic couplings from the model enable application to nonadiabatic molecular dynamics, developing an active learning scheme to ensure a compact and systematically improvable training set. This culminates in application to the nonadiabatic molecular dynamics of a photoexcited 28-atom hydrogen chain, with surprising complexity in the resulting nuclear motion. With just 22 DMRG calculations of training states from the low-energy correlated electronic structure at different geometries, we infer the multi-state energies, forces and nonadiabatic coupling vectors at 12,000 geometries with provable convergence to high accuracy along an ensemble of molecular trajectories, which would not be feasible with a brute force approach. 
This opens up a route to bridge the timescales between accurate single-point correlated electronic structure methods and timescales of relevance for photo-induced molecular dynamics.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 確率的丸めは縦長行列を暗黙的に正則化する Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices ( http://arxiv.org/abs/2403.12278v1 ) ライセンス: Link先を確認	Gregory Dexter, Christos Boutsikas, Linkai Ma, Ilse C. F. Ipsen, Petros Drineas,	(参考訳) 機械学習や大規模ディープニューラルネットワークモデルの訓練における確率的丸めの普及を動機として、列よりも行がはるかに多い実行列 $\mathbf{A}$ の最近接値への確率的丸めを考える。確率的に丸められた行列の最小特異値が、$\mathbf{A}$ がランク落ちにどれほど近くても、さらには $\mathbf{A}$ がランク落ちしていても、高い確率でゼロから十分に離れていることを示す新しい理論的証拠を、広範な実験的評価とともに提供する。言い換えれば、確率的丸めは縦長の行列 $\mathbf{A}$ を \textit{暗黙的に正則化} し、丸められた行列はフル列ランクを持つ。我々の証明は、ランダム行列理論の強力な結果と、確率的丸め誤差は低次元の列空間に集中しないという考え方を利用している。 Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding \textit{implicitly regularizes} tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
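The claim in the stochastic rounding abstract above can be illustrated with a toy experiment. The following is a minimal sketch, not the authors' code: it assumes rounding to the integer grid, uses a rank-deficient 1000 x 2 matrix with two identical columns, and computes the smallest singular value from the 2 x 2 Gram matrix. All function names are hypothetical.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x down or up to an integer; P(round up) equals the fractional part."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


def smallest_singular_value_two_cols(rows):
    """Smallest singular value of an n x 2 matrix via its 2 x 2 Gram matrix."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    d = sum(r[1] * r[1] for r in rows)
    # Eigenvalues of [[a, b], [b, d]] are mean +/- radius; the smaller one
    # is the square of the smallest singular value.
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return math.sqrt(max(mean - radius, 0.0))


rng = random.Random(0)
n = 1000
# Rank-deficient tall-and-thin matrix: both columns are identical.
column = [rng.uniform(0.0, 10.0) for _ in range(n)]
A = [(v, v) for v in column]

# Deterministic round-to-nearest keeps the columns identical (still rank 1).
A_nearest = [(round(u), round(v)) for u, v in A]
# Stochastic rounding perturbs each entry independently.
A_stochastic = [(stochastic_round(u, rng), stochastic_round(v, rng)) for u, v in A]

sigma_nearest = smallest_singular_value_two_cols(A_nearest)
sigma_stochastic = smallest_singular_value_two_cols(A_stochastic)
print("round-to-nearest:", sigma_nearest)
print("stochastic rounding:", sigma_stochastic)
```

In this toy setting the round-to-nearest matrix stays rank deficient, while the stochastically rounded one has a smallest singular value clearly above zero. This matches the qualitative statement of the abstract, not its quantitative bounds.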
# FinLlama: アルゴリズムトレーディングアプリケーションのためのファイナンシャルインセンティブ分類 FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications ( http://arxiv.org/abs/2403.12285v1 ) ライセンス: Link先を確認	Thanos Konstantinidis, Giorgos Iacovides, Mingxue Xu, Tony G. Constantinides, Danilo Mandic,	(参考訳) 市場の動きやトレーダーの判断に影響を及ぼす金融ニュースは、オンラインでもいくつか出ている。これは正確な感情分析の必要性を強調し、適切なアルゴリズムトレーディング技術を持つことに加えて、より詳細なトレーディング決定を下す必要がある。標準的なレキシコンベースの感情アプローチは、財政的な決定を補助する力を示している。しかし、文脈の感度や単語の順序に関する問題に悩まされていることが知られている。 LLM(Large Language Models)もこの文脈で使用することができるが、財務に特化せず、重要な計算資源を必要とする傾向がある。 Llama 2 7Bの基礎モデルに基づく新たなアプローチを導入し,その生成特性と包括的言語操作のメリットを享受する。これは、Llama2 7Bモデルを教師付き財務感情分析データのごく一部に微調整することで、金融レキシコンとコンテキストの複雑さを共同で処理し、さらにニューラルネットワークに基づく決定機構を組み込むことによって達成される。このようなジェネレータ分類スキームはFinLlamaと呼ばれ、感情の原子価を分類するだけでなく、その強さを定量化するために訓練されている。補足すれば、LoRAによるパラメータ効率の良い微調整の実装は、トレーニング可能なパラメータを最適化し、精度を犠牲にすることなく、計算とメモリの要求を最小限に抑えることができる。シミュレーションの結果は、FinLlamaがポートフォリオ管理の強化と市場リターンの向上のためのフレームワークを提供する能力を示している。これらの結果は、不安定な期間や予測不可能な市場イベントであっても、高いレジリエンスを示すハイリターンポートフォリオを構築するためのFinLlamaの能力の基盤となっている。 There are multiple sources of financial news online which influence market movements and trader's decisions. This highlights the need for accurate sentiment analysis, in addition to having appropriate algorithmic trading techniques, to arrive at better informed trading decisions. Standard lexicon based sentiment approaches have demonstrated their power in aiding financial decisions. However, they are known to suffer from issues related to context sensitivity and word ordering. Large Language Models (LLMs) can also be used in this context, but they are not finance-specific and tend to require significant computational resources. To facilitate a finance specific LLM framework, we introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation. 
This is achieved by fine-tuning the Llama2 7B model on a small portion of supervised financial sentiment analysis data, so as to jointly handle the complexities of financial lexicon and context, and further equipping it with a neural network based decision mechanism. Such a generator-classifier scheme, referred to as FinLlama, is trained not only to classify the sentiment valence but also quantify its strength, thus offering traders a nuanced insight into financial news articles. Complementing this, the implementation of parameter-efficient fine-tuning through LoRA optimises trainable parameters, thus minimising computational and memory requirements, without sacrificing accuracy. Simulation results demonstrate the ability of the proposed FinLlama to provide a framework for enhanced portfolio management decisions and increased market returns. These results underpin the ability of FinLlama to construct high-return portfolios which exhibit enhanced resilience, even during volatile periods and unpredictable market events.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 3次元解剖学的セグメンテーションにおけるスライス伝播不確かさの推定と解析 Estimation and Analysis of Slice Propagation Uncertainty in 3D Anatomy Segmentation ( http://arxiv.org/abs/2403.12290v1 ) ライセンス: Link先を確認	Rachaell Nihalaani, Tushar Kataria, Jadie Adams, Shireen Y. Elhabian,	(参考訳) 3次元解剖学的セグメンテーションの監視手法は優れた性能を示すが、アノテートされたデータの可用性によって制限されることが多い。この制限により、利用可能な無注釈データの豊富さと相まって、自己監督的なアプローチへの関心が高まっている。スライス伝播は、スライス登録を自己監督タスクとして活用し、最小限の監督で完全な解剖学的セグメンテーションを実現する自己監督的アプローチとして登場した。このアプローチによって、ドメインの専門知識、時間、およびセグメンテーションネットワークのトレーニングに必要な完全なアノテーション付きデータセット構築に伴うコストが大幅に削減される。しかし、この決定論的ネットワークによる監視の削減へのシフトは、特により正確な教師付きアプローチと比較して、予測の信頼性と信頼性に関する懸念を提起する。この問題に対処するため,キャリブレーションされた不確実性定量化(UQ)をスライス伝播法に統合し,モデルの予測信頼性と信頼性レベルについて考察する。不確実性対策を取り入れることで、自己管理アプローチに対するユーザの信頼感を高め、実用的な適用性を向上させる。 5つのUQ法を用いて3次元腹部分割のための3つのデータセットについて実験を行った。その結果,UQの導入はモデルの信頼性だけでなくセグメンテーションの精度も向上することがわかった。さらに, エンドユーザーにはすぐには明らかでないかもしれないスライス伝播手法の様々な障害モードを明らかにした。本研究は,スライス伝播法の精度と信頼性を向上させるため,新しい研究手法を開拓する。 Supervised methods for 3D anatomy segmentation demonstrate superior performance but are often limited by the availability of annotated data. This limitation has led to a growing interest in self-supervised approaches in tandem with the abundance of available un-annotated data. Slice propagation has emerged as an self-supervised approach that leverages slice registration as a self-supervised task to achieve full anatomy segmentation with minimal supervision. This approach significantly reduces the need for domain expertise, time, and the cost associated with building fully annotated datasets required for training segmentation networks. However, this shift toward reduced supervision via deterministic networks raises concerns about the trustworthiness and reliability of predictions, especially when compared with more accurate supervised approaches. To address this concern, we propose the integration of calibrated uncertainty quantification (UQ) into slice propagation methods, providing insights into the model's predictive reliability and confidence levels. 
Incorporating uncertainty measures enhances user confidence in self-supervised approaches, thereby improving their practical applicability. We conducted experiments on three datasets for 3D abdominal segmentation using five UQ methods. The results illustrate that incorporating UQ improves not only model trustworthiness, but also segmentation accuracy. Furthermore, our analysis reveals various failure modes of slice propagation methods that might not be immediately apparent to end-users. This study opens up new research avenues to improve the accuracy and trustworthiness of slice propagation methods.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# パルス励起および連続波励起によるh-BN量子エミッタの光子統計解析 Photon statistics analysis of h-BN quantum emitters with pulsed and continuous-wave excitation ( http://arxiv.org/abs/2403.12291v1 ) ライセンス: Link先を確認	Hamidreza Akbari, Pankaj K. Jha, Kristina Malinowski, Benjamin E. C. Koltenbah, Harry A. Atwater,	(参考訳) ヘキサゴナル窒化ホウ素(h-BN)量子エミッタの量子光子統計について,マンデルQパラメータを解析して報告する。我々は,h-BN量子エミッタのマンデルQパラメータを,様々な温度およびポンプ出力条件下で測定した。パルス励起では、-0.002のマンデルQと連続波励起(CW)により、このパラメータは-0.0025に達する。低温がマンデルQに与える影響を調べた結果,光子統計は温度とともに弱く変化することがわかった。励起2レベルエミッタモデルからの自然放出の計算により, 実験光子収集効率を考慮した場合, マンデルQパラメータと測定値との良好な一致を示す。最後に、乱数生成の例による量子応用におけるマンデルQの有用性を説明し、この方法によるランダムビットの生成速度に対するマンデルQの効果を分析する。 We report on the quantum photon statistics of hexagonal boron nitride (h-BN) quantum emitters by analyzing the Mandel Q parameter. We have measured the Mandel Q parameter for h-BN quantum emitters under various temperatures and pump power excitation conditions. Under pulsed excitation we can achieve a Mandel Q of -0.002 and under continuous-wave (CW) excitation this parameter can reach -0.0025. We investigate the effect of cryogenic temperatures on Mandel Q and conclude that the photon statistics vary weakly with temperature. Through calculation of spontaneous emission from an excited two-level emitter model, we demonstrate good agreement between measured and calculated Mandel Q parameter when accounting for the experimental photon collection efficiency. Finally, we illustrate the usefulness of Mandel Q in quantum applications by the example of random number generation and analyze the effect of Mandel Q on the speed of generating random bits via this method.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
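For readers unfamiliar with the Mandel Q parameter used in the h-BN abstract above, it is defined as $Q = (\langle n^2\rangle - \langle n\rangle^2 - \langle n\rangle)/\langle n\rangle$, so Q = 0 for Poissonian light and Q < 0 for sub-Poissonian light. The sketch below is an illustration under an idealized assumption that is not taken from the paper: a perfect single-photon emitter detected with overall efficiency `eta` gives at most one count per pulse, which yields Q = -eta. The function names and the value of `eta` are hypothetical.

```python
import math


def mandel_q_from_counts(counts):
    """Mandel Q from a list of photon counts per time bin (population variance)."""
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return (variance - mean) / mean


def mandel_q_from_distribution(probs):
    """Mandel Q from a photon-number distribution, where probs[n] = P(n photons)."""
    mean = sum(n * p for n, p in enumerate(probs))
    second_moment = sum(n * n * p for n, p in enumerate(probs))
    return (second_moment - mean ** 2 - mean) / mean


# Idealized single-photon source seen through a lossy channel (assumption):
# each pulse gives one detected photon with probability eta, otherwise none.
eta = 0.002
q_single_photon = mandel_q_from_distribution([1.0 - eta, eta])

# Poissonian reference (coherent light), truncated at 60 photons.
lam = 2.0
poisson = [math.exp(-lam) * lam ** n / math.factorial(n) for n in range(60)]
q_poisson = mandel_q_from_distribution(poisson)

print("lossy single-photon source:", q_single_photon)
print("Poissonian light:", q_poisson)
```

Under this idealization a small negative Q is what one would expect at low collection efficiency. This is consistent with the abstract's remark that agreement with the model requires accounting for the experimental photon collection efficiency.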
# DALL-E 2における構成的な構文と意味論の比較検討 A Comparative Investigation of Compositional Syntax and Semantics in DALL-E 2 ( http://arxiv.org/abs/2403.12294v1 ) ライセンス: Link先を確認	Elliot Murphy, Jill de Villiers, Sofia Lucero Morales,	(参考訳) 本研究は,幼児の理解テストでも用いられる言語プロンプトの意味を,DALL-E 2がどの程度うまく視覚的に表現するかを比較検討した。文法知識の基本的構成要素を表す文を,2～7歳の英語を話す数百人の子どもを対象に用いられ,元の項目レベルのデータを収集していた評価テストから抽出した。 DALL-E 2は、大人9人の審査員が得点するために、これらのプロンプトを5回与え、アイテムごとに20の漫画を制作した。その結果,最年少(2歳)においても,DALL-E 2生成画像が子供の意味的精度に合致する条件はみられなかった。 DALL-E 2 は、可逆的な形式で適切な役割を割り当てることに失敗した; 子どもに与えられたものより容易な対比的プロンプトにもかかわらず否定で失敗した; 間違った名詞に形容詞を割り当てることがしばしばあった; 受動文の暗黙の動作主を無視した。この研究は、DALL-E 2には構成的な文表現が明らかに存在しないことを示唆している。 In this study we compared how well DALL-E 2 visually represented the meaning of linguistic prompts also given to young children in comprehension tests. Sentences representing fundamental components of grammatical knowledge were selected from assessment tests used with several hundred English-speaking children aged 2-7 years for whom we had collected original item-level data. DALL-E 2 was given these prompts five times to generate 20 cartoons per item, for 9 adult judges to score. Results revealed no conditions in which DALL-E 2-generated images matched the semantic accuracy of children, even at the youngest age (2 years). DALL-E 2 failed to assign the appropriate roles in reversible forms; it failed on negation despite an easier contrastive prompt than the children received; it often assigned the adjective to the wrong noun; it ignored implicit agents in passives. This work points to a clear absence of compositional sentence representations for DALL-E 2.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
# 偽カバレッジ率制御による情報共形予測セットの選択 Selecting informative conformal prediction sets with false coverage rate control ( http://arxiv.org/abs/2403.12295v1 ) ライセンス: Link先を確認	Ulysse Gazin, Ruth Heller, Ariane Marandon, Etienne Roquain,	(参考訳) 回帰と分類を含む教師付き学習において、コンフォメーション法は、任意の機械学習予測器に対して有限サンプルカバレッジで結果/ラベルの予測セットを提供する。このような予測セットが選択プロセスの後に現れる場合を考える。選択過程は、選択された予測セットが、明確に定義された意味で「形式的」であることが要求される。予測ラベルセットや予測間隔を十分に小さくしたり、null値を除外したり、あるいは他の適切な「モノトーン」制約に従う場合にのみ、分析者が情報的とみなすような分類と回帰の両方について検討する。本研究は,様々なアプリケーションへの関心を多岐にわたってカバーするが,提案したサンプルに対して偽カバレッジ率(FCR)を制御しながら,このような情報的共形予測セットを構築するための統一的なフレームワークを開発する。選択後の共形予測セットは、この分野における最近の文献の焦点となっているが、InfoSPとInfoSCOPと呼ばれる新しい手順は、情報的予測セットにFCR制御を提供する最初の方法である。提案手法の有効性を実データおよびシミュレーションデータに示す。 In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite sample coverage for any machine learning predictors. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be `informative' in a well defined sense. We consider both the classification and regression settings where the analyst may consider as informative only the sample with prediction label sets or prediction intervals small enough, excluding null values, or obeying other appropriate `monotone' constraints. While this covers many settings of possible interest in various applications, we develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature in the field, the new introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first ones providing FCR control for informative prediction sets. We show the usefulness of our resulting procedures on real and simulated data.	翻訳日:2024-03-20 18:02:18 公開日:2024-03-18
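As background for the conformal prediction abstract above, here is a minimal sketch of standard split conformal intervals followed by a naive "informative" selection that keeps only intervals excluding a null value. It is not the InfoSP or InfoSCOP procedure, whose details are not given in the abstract. The residuals, predictions, and function names are made up for illustration, and `alpha=0.5` is chosen only to keep the toy example small.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def prediction_intervals(point_predictions, q):
    """Symmetric intervals [prediction - q, prediction + q]."""
    return [(p - q, p + q) for p in point_predictions]


def select_excluding_null(intervals, null_value=0.0):
    """Naive 'informative' selection: keep intervals that exclude the null value."""
    return [(lo, hi) for lo, hi in intervals if not (lo <= null_value <= hi)]


# Hypothetical absolute residuals |y - prediction| on a calibration set.
calibration_residuals = [0.3, 1.2, 0.7, 0.5, 2.0, 0.9, 0.4, 1.1, 0.6]
q = conformal_quantile(calibration_residuals, alpha=0.5)

intervals = prediction_intervals([-3.0, 0.2, 1.5], q)
selected = select_excluding_null(intervals)
print("quantile:", q)
print("selected intervals:", selected)
```

Reporting only the selected intervals at the original level may not keep the false coverage rate on the selected sample under control. That selection effect is the problem the abstract says its procedures are designed to address.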
# 大規模言語モデルを用いて臨床ノートから物質使用障害の重症度を抽出する:ゼロショット学習アプローチ Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach ( http://arxiv.org/abs/2403.12297v1 ) ライセンス: Link先を確認	Maria Mahbub, Gregory M. Dams, Sudarshan Srinivasan, Caitlin Rizy, Ioana Danciu, Jodie Trafton, Kathryn Knight,	(参考訳) 物質使用障害 (SUD) は、健康や社会に有害な影響を及ぼすため大きな懸念となっている。 SUDの識別と治療は、重症度、共同決定要因(例えば、離脱症状)、健康の社会的決定要因など、様々な要因に依存している。国際疾病分類(ICD-10)のようなアメリカの保険会社が使用している既存の診断コード体系では、特定の診断の粒度が不足しているが、臨床医はこの粒度(精神障害の診断・統計マニュアル(DSM-5)の分類に見られるようなもの)を臨床ノートに補足的な非構造化テキストとして追加する。従来の自然言語処理(NLP)手法は、このような多様な臨床言語を正確に解析する際の限界に直面している。大規模言語モデル(LLM)は、多様な言語パターンに適応することで、これらの課題を克服することが期待される。本研究は,臨床ノートから様々なSUD診断の重症度関連情報を抽出するためのLLMの応用について検討した。注意深く作成したプロンプトと後処理技術を用いた,LLMのゼロショット学習によるワークフローを提案する。オープンソース LLM である Flan-T5 を用いた実験により,ルールベースアプローチよりも優れたリコールを示す。11カテゴリのSUD診断に着目し,重症度情報抽出におけるLLMの有効性を示し,SUD患者のリスク評価と治療計画の改善に寄与する。 Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but clinicians will add this granularity (as that found within the Diagnostic and Statistical Manual of Mental Disorders classification or DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large Language Models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. 
We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared to the rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# R3DS:パノラマシーン理解のためのリアルな3Dシーン R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding ( http://arxiv.org/abs/2403.12301v1 ) ライセンス: Link先を確認	Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang,	(参考訳) 本稿では,Matterport3Dパノラマのリアルなシーン配置を反映した合成3Dシーンの3Dシーンデータセットについて紹介する。以前の研究と比較すると、R3DSはパノラマにおける現実世界の観測に関連付けられたオブジェクトを持つ、より完全で人口密度の高いシーンを持っている。 R3DSはまた、各シーンに対するオブジェクトサポート階層とオブジェクトセット(例えばダイニングテーブルの周りの同じ椅子)も提供する。 R3DSには、100以上のカテゴリから3,784個のCADモデルで表される19Kオブジェクトが含まれている。パノラマシーン理解作業におけるR3DSの有効性を示す。以下に示す。 1) R3DS のトレーニングは、より良い一般化を可能にする。 2)R3DSで訓練されたサポート関係予測は、ヒューリスティックに計算されたサポートよりも性能を向上させる。 3) R3DSはパノラマシーン理解に関する今後の研究に挑戦的なベンチマークを提供する。 We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining table) for each scene. Overall, R3DS contains 19K objects represented by 3,784 distinct CAD models from over 100 object categories. We demonstrate the effectiveness of R3DS on the Panoramic Scene Understanding task. We find that: 1) training on R3DS enables better generalization; 2) support relation prediction trained with R3DS improves performance compared to heuristically calculated support; and 3) R3DS offers a challenging benchmark for future work on panoramic scene understanding.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 超次元グラフ分類を用いた分子分類 Molecular Classification Using Hyperdimensional Graph Classification ( http://arxiv.org/abs/2403.12307v1 ) ライセンス: Link先を確認	Pere Verges, Igor Nunes, Mike Heddes, Tony Givargis, Alexandru Nicolau,	(参考訳) 我々の研究は超次元コンピューティング(HDC)を活用したグラフ学習の革新的アプローチを導入している。グラフは情報伝達のための広く受け入れられた手法であり,その学習における利用が注目されている。これは、グラフ表現からの学習が重要な役割を果たすケモインフォマティクスの分野において顕著である。この領域における重要な応用は、様々な分子構造にまたがるがん細胞の同定である。本稿では,グラフニューラルネットワーク (GNN) やWeisfeiler-Lehmanグラフカーネル (WL) のような最先端のモデルと同等のAUC(曲線下面積)を示すHDCベースのモデルを提案する。さらに,従来提案された超次元計算によるグラフ学習法よりも優れている。さらに、GNNやWLモデルと比較して、トレーニングフェーズで40倍の高速化、推論時間で15倍の改善を実現している。このことはHDCベースの手法の有効性を裏付けるだけでなく、高速かつ資源効率の高いグラフ学習の可能性も強調している。 Our work introduces an innovative approach to graph learning by leveraging Hyperdimensional Computing. Graphs serve as a widely embraced method for conveying information, and their utilization in learning has gained significant attention. This is notable in the field of chemoinformatics, where learning from graph representations plays a pivotal role. An important application within this domain involves the identification of cancerous cells across diverse molecular structures. We propose an HDC-based model that demonstrates comparable Area Under the Curve results when compared to state-of-the-art models like Graph Neural Networks (GNNs) or the Weisfeiler-Lehman graph kernel (WL). Moreover, it outperforms previously proposed hyperdimensional computing graph learning methods. Furthermore, it achieves noteworthy speed enhancements, boasting a 40x acceleration in the training phase and a 15x improvement in inference time compared to GNN and WL models. This not only underscores the efficacy of the HDC-based method, but also highlights its potential for expedited and resource-efficient graph learning.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 自動微分によるファジィシステム最適化-ファジィRを事例として Gradient-based Fuzzy System Optimisation via Automatic Differentiation -- FuzzyR as a Use Case ( http://arxiv.org/abs/2403.12308v1 ) ライセンス: Link先を確認	Chao Chen, Christian Wagner, Jonathan M. Garibaldi,	(参考訳) ファジィセットとシステムは、導入以来、モデリング、知識表現、推論における汎用性で知られる研究の重要領域となり、文脈説明可能なAIの中でその可能性が高まっている。ファジィシステムの応用は多岐にわたるが、機械学習の観点からの設計は比較的進歩していない。言い換えれば、ニューラルネットワークのような表現は、トレーニングメカニズムや利用可能なツール、特に勾配降下の進歩と組み合わせて、計算性能の向上によって引き起こされる学習能力のブームから恩恵を受けているが、ファジィシステム設計への影響は限られている。本稿では,ファジィシステム設計者の複雑な微分計算から自由なファジィシステム設計へ向け,特にニューラルネットワーク学習における自動微分に焦点をあて,ファジィシステムの機能的・説明可能性面により焦点をあてる。本稿では,ファジィ・システム設計の将来の可能性について論じ,ファジィ・システム設計における現在のファジィ推論システムの実装を,自動微分ツールセットの強力な機能を活用するためにどのように調整できるかを示すユースケースを紹介する。 Since their introduction, fuzzy sets and systems have become an important area of research known for its versatility in modelling, knowledge representation and reasoning, and increasingly its potential within the context explainable AI. While the applications of fuzzy systems are diverse, there has been comparatively little advancement in their design from a machine learning perspective. In other words, while representations such as neural networks have benefited from a boom in learning capability driven by an increase in computational performance in combination with advances in their training mechanisms and available tool, in particular gradient descent, the impact on fuzzy system design has been limited. In this paper, we discuss gradient-descent-based optimisation of fuzzy systems, focussing in particular on automatic differentiation--crucial to neural network learning--with a view to free fuzzy system designers from intricate derivative computations, allowing for more focus on the functional and explainability aspects of their design. 
As a starting point, we present a use case in FuzzyR which demonstrates how current fuzzy inference system implementations can be adjusted to leverage powerful features of automatic differentiation tool sets, discussing its potential for the future of fuzzy system design.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 世界モデルによる遅延観測からの強化学習 Reinforcement Learning from Delayed Observations via World Models ( http://arxiv.org/abs/2403.12309v1 ) ライセンス: Link先を確認	Armin Karamzade, Kyungmin Kim, Montek Kalsi, Roy Fox,	(参考訳) 標準的な強化学習設定では、エージェントは通常、実行後にアクションの効果について即時にフィードバックを受けます。しかし、実際には、この仮定は物理的制約のために成り立たず、RLアルゴリズムの性能に大きな影響を及ぼす可能性がある。本稿では,部分的に観測可能な環境下での観測遅延に対処することに焦点を当てる。本稿では、過去の観測と学習のダイナミクスを統合することに成功している世界モデルを活用して、観測遅延を処理することを提案する。遅延POMDPを世界モデルで遅延MDPに還元することで,既存手法が観測性能を低下させる場合や,観測可能性の低下にともなって急速に劣化するおそれのある部分的可観測性を効果的に処理できる。実験の結果,提案手法の1つが,素朴なモデルベースアプローチを最大で30パーセント上回ることが示唆された。さらに,視覚的インプットに基づく遅延環境において,視覚的観察における遅延認識強化学習を初めて示す手法について検討した。 In standard Reinforcement Learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of RL algorithms. In this paper, we focus on addressing observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or even degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to 30%. Moreover, we evaluate our methods on delayed environments with visual inputs, for the first time showcasing delay-aware reinforcement learning on visual observations.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 3Dビジョンセンサーに基づく双方向自動人数カウンターのプロトタイプ Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D ( http://arxiv.org/abs/2403.12310v1 ) ライセンス: Link先を確認	Benjamín Ojeda-Magaña, Rubén Ruelas, José Guadalupe Robledo-Hernández, Víctor Manuel Rangel-Cobián, Fernando López Aguilar-Hernández,	(参考訳) RGB-Dセンサーとしても知られる3Dセンサーは、深度画像を利用して、各ピクセルがカメラから物体までの距離を計測し、構造化光や飛行時間などの原理を用いる。人工視覚の進歩により、物体の動きを伴わずにリアルタイムな物体検出が可能な安価な3Dカメラが、情報深度で2Dカメラを上回っている。これらのカメラは、様々な色や反射率の物体を識別でき、照明の変化の影響を受けない。プロトタイプはRGB-Dセンサーを使って、スタジアムや空港などの空間におけるセキュリティと監視を支援する。リアルタイムの占有率を決定し、緊急時に重要な最大容量をチェックする。このシステムには、RealSense D415奥行きカメラと、人物をカウントするオブジェクト検出アルゴリズムを実行するミニコンピュータと、身元確認用の2Dカメラが含まれている。このシステムは統計解析をサポートし、C++、Python、PHPとOpenCVを画像処理に使用し、会場の占有状況を監視するための包括的なアプローチを実証している。 3D sensors, also known as RGB-D sensors, utilize depth images where each pixel measures the distance from the camera to objects, using principles like structured light or time-of-flight. Advances in artificial vision have led to affordable 3D cameras capable of real-time object detection without object movement, surpassing 2D cameras in information depth. These cameras can identify objects of varying colors and reflectivities and are less affected by lighting changes. The described prototype uses RGB-D sensors for bidirectional people counting in venues, aiding security and surveillance in spaces like stadiums or airports. It determines real-time occupancy and checks against maximum capacity, crucial during emergencies. The system includes a RealSense D415 depth camera and a mini-computer running object detection algorithms to count people and a 2D camera for identity verification. The system supports statistical analysis and uses C++, Python, and PHP with OpenCV for image processing, demonstrating a comprehensive approach to monitoring venue occupancy.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# プライバシ保護フェデレーション学習におけるLoRAの改善 Improving LoRA in Privacy-preserving Federated Learning ( http://arxiv.org/abs/2403.12313v1 ) ライセンス: Link先を確認	Youbang Sun, Zitao Li, Yaliang Li, Bolin Ding,	(参考訳) ローランク適応(ローランク適応、LoRA)は、学習済み言語モデルにおけるタスク固有パラメータ効率の微調整(PEFT)手法の1つである。 LoRAは、凍結事前訓練された各モデルモジュールの上部に2つの訓練可能な階数分解行列の積を注入する。しかし、プライバシー保護連合学習(FL)の設定に適用した場合、次の事実によりLoRAは不安定になる可能性がある。 1)データの不均一性とマルチステップローカル更新の影響は無視できない。 2)差分プライバシー(DP)を保証するために勾配の更新を強制する付加雑音を増幅し得る。 3) 最終性能はハイパーパラメータの影響を受けやすい。これらの現象に繋がる重要な要因は、ローカルクライアントによって2つの低ランク行列を共同で最適化し、中央サーバによって個別に集約する、という不一致である。そこで本稿では,これらの課題を緩和し,さらに細調整 LLM の通信コストを半減させるため,LoRA の効率的かつ効率的なバージョンである Federated Freeze A LoRA (FFA-LoRA) を提案する。 FFA-LoRAの基本的な考え方は、ランダムに初期化された非ゼロ行列を修正し、ゼロ初期化行列を微調整することである。 LoRAと比較すると、FFA-LoRAはプライバシー保護FLの実用的および理論的利点によって動機付けられている。 FFA-LoRAは,様々なFLタスクにおいて,バニラロラよりも計算効率が良く,より一貫した性能を提供することを示した。 Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models for its good performance and computational efficiency. LoRA injects a product of two trainable rank decomposition matrices over the top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable due to the following facts: 1) the effects of data heterogeneity and multi-step local updates are non-negligible, 2) additive noise enforced on updating gradients to guarantee differential privacy (DP) can be amplified and 3) the final performance is susceptible to hyper-parameters. A key factor leading to these phenomena is the discordance between jointly optimizing the two low-rank matrices by local clients and separately aggregating them by the central server. Thus, this paper proposes an efficient and effective version of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges and further halve the communication cost of federated fine-tuning LLMs. 
The core idea of FFA-LoRA is to fix the randomly initialized non-zero matrices and only fine-tune the zero-initialized matrices. Compared to LoRA, FFA-LoRA is motivated by practical and theoretical benefits in privacy-preserved FL. Our experiments demonstrate that FFA-LoRA provides more consistent performance with better computational efficiency over vanilla LoRA in various FL tasks.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# OpenEval: 機能、アライメント、安全性にまたがる中国のLLMのベンチマーク OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety ( http://arxiv.org/abs/2403.12316v1 ) ライセンス: Link先を確認	Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, Ling Shi, Junhui Zhang, Xinmeng Ji, Tingting Cui, Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong,	(参考訳) 中国語大言語モデル(LLM)の急速な開発は、効率的なLLM評価に大きな課題をもたらす。現在のイニシアチブは、中国のLLMを評価するための新しいベンチマークや評価プラットフォームを導入しているが、これらの多くは主に機能に重点を置いており、通常は潜在的なアライメントや安全性の問題を見落としている。このギャップに対処するため、我々は、能力、アライメント、安全性にまたがって中国のLLMをベンチマークする評価テストベッドであるOpenEvalを紹介した。機能評価には,NLPタスク,ディシプリナリーナレッジ,コモンセンス推論,数学的推論という4つのサブディメンジョンから中国語LLMを評価するための12のベンチマークデータセットを含む。アライメント評価のために、OpenEvalには、中国のLLMが出力するバイアス、攻撃性、不正性を調べる7つのデータセットが含まれている。高度なLCMの安全性、特に予測されるリスク(例えば、電力探索、自己認識)を評価するために、6つのデータセットを含む。これらのベンチマークに加えて、我々は、OpenEvalが中国のLLMの開発と一致しているか、あるいは、中国のLLMの開発をガイドする最先端のベンチマークデータセットを提供することができるように、段階的な公開評価とベンチマーク更新戦略を実装した。最初の公開評価では、オープンソースモデルとプロプライエタリモデルの両方を含む7Bから72Bパラメータにまたがる、さまざまな中国のLLMをテストしました。評価の結果,中国のLLMは特定のタスクにおいて顕著な性能を示したが,コモンセンス推論,アライメント,安全性といった幅広い側面に注意を向けるべきであることが示唆された。 The rapid development of Chinese large language models (LLMs) poses big challenges for efficient LLM evaluation. While current initiatives have introduced new benchmarks or evaluation platforms for assessing Chinese LLMs, many of these focus primarily on capabilities, usually overlooking potential alignment and safety issues. To address this gap, we introduce OpenEval, an evaluation testbed that benchmarks Chinese LLMs across capability, alignment and safety. For capability assessment, we include 12 benchmark datasets to evaluate Chinese LLMs from 4 sub-dimensions: NLP tasks, disciplinary knowledge, commonsense reasoning and mathematical reasoning. For alignment assessment, OpenEval contains 7 datasets that examine the bias, offensiveness and illegality in the outputs yielded by Chinese LLMs. 
To evaluate safety, especially anticipated risks (e.g., power-seeking, self-awareness) of advanced LLMs, we include 6 datasets. In addition to these benchmarks, we have implemented a phased public evaluation and benchmark update strategy to ensure that OpenEval is in line with the development of Chinese LLMs or even able to provide cutting-edge benchmark datasets to guide the development of Chinese LLMs. In our first public evaluation, we have tested a range of Chinese LLMs, spanning from 7B to 72B parameters, including both open-source and proprietary models. Evaluation results indicate that while Chinese LLMs have shown impressive performance in certain tasks, more attention should be directed towards broader aspects such as commonsense reasoning, alignment, and safety.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# EffiPerception: さまざまな知覚タスクのための効率的なフレームワーク EffiPerception: an Efficient Framework for Various Perception Tasks ( http://arxiv.org/abs/2403.12317v1 ) ライセンス: Link先を確認	Xinhao Xiang, Simon Dräger, Jiawei Zhang,	(参考訳) 精度とメモリのトレードオフは、コンピュータビジョン認識タスクにおいて常に考慮すべき優先事項である。これまでは主に、効果的なデータ拡張、特徴抽出、学習戦略など、単一のあるいは小さなタスクに重点を置いてきた。提案されたモデルの性能は、特定の知覚タスクやデータセットに依存する可能性がある。共通学習パターンの探索とモジュールの堅牢性の向上を目的として,EffiPerceptionフレームワークを提案する。 2Dオブジェクト検出、3Dオブジェクト検出、2Dインスタンスセグメンテーション、3Dポイントクラウドセグメンテーション。全体として、このフレームワークは3つの部分から構成される: (1) 効率の良い特徴エクストラクタで、各モダリティの入力特徴を抽出する。 2) 効率的なレイヤ、機能表現をさらに処理するプラグイン・プラグイン・アウト・レイヤ、そして、ノイズの多い提案を練りながらコア学習情報を集約する。 (3)8ビットオプティマイザであるEffiOptimは,計算コストをさらに削減し,性能安定性を向上する。 KITTI、セマンティクス-KITTI、COCOデータセットの大規模な実験により、EffiPerceptionは4つの検出およびセグメンテーションタスクにおいて、従来よく検討されていた方法と比較して、大幅な精度とメモリ全体のパフォーマンス向上を示すことが示された。 The accuracy-speed-memory trade-off is always the priority to consider for several computer vision perception tasks. Previous methods mainly focus on a single or small couple of these tasks, such as creating effective data augmentation, feature extractor, learning strategies, etc. These approaches, however, could be inherently task-specific: their proposed model's performance may depend on a specific perception task or a dataset. Targeting to explore common learning patterns and increasing the module robustness, we propose the EffiPerception framework. It could achieve great accuracy-speed performance with relatively low memory cost under several perception tasks: 2D Object Detection, 3D Object Detection, 2D Instance Segmentation, and 3D Point Cloud Segmentation. Overall, the framework consists of three parts: (1) Efficient Feature Extractors, which extract the input features for each modality. (2) Efficient Layers, plug-in plug-out layers that further process the feature representation, aggregating core learned information while pruning noisy proposals. (3) The EffiOptim, an 8-bit optimizer to further cut down the computational cost and facilitate performance stability. 
Extensive experiments on the KITTI, semantic-KITTI, and COCO datasets revealed that EffiPerception could show great accuracy-speed-memory overall performance increase within the four detection and segmentation tasks, in comparison to earlier, well-respected methods.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 近似同値比:ニューラルネットワークトレーニングの強化のための前向きで並列なフレームワーク Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training ( http://arxiv.org/abs/2403.12320v1 ) ライセンス: Link先を確認	Zeliang Zhang, Jinyang Jiang, Zhuo Liu, Susan Liang, Yijie Peng, Chenliang Xu,	(参考訳) ニューラルネットワークトレーニングにおけるバックプロパゲーションの効率的かつ生物学的に可能な代替手段は、高い計算複雑性や、より深いネットワークにスケーラビリティを制限するニューラルネットワークに関する追加の仮定といった問題のために、依然として課題である。確率比法は、有望な勾配推定戦略を提供するが、特に推定分散を低減するために複数のデータのコピーをデプロイする場合、メモリ消費に制約される。本稿では,勾配推定における計算およびメモリ要求を軽減するために,LR法を近似する手法を提案する。 LRを用いた後方通過時の自然な並列性を利用して、前方パスと後方パスの両方をパイプライン化し、特殊なハードウェア上での計算により適した高性能なトレーニング戦略を提供する。広範囲にわたる実験は、ニューラルネットワークトレーニングにおける近似手法の有効性を実証している。この研究は、高速ニューラルネットワークトレーニングの実現における可能性比法の可能性を強調し、さらなる探索の道筋を示唆している。 Efficient and biologically plausible alternatives to backpropagation in neural network training remain a challenge due to issues such as high computational complexity and additional assumptions about neural networks, which limit scalability to deeper networks. The likelihood ratio method offers a promising gradient estimation strategy but is constrained by significant memory consumption, especially when deploying multiple copies of data to reduce estimation variance. In this paper, we introduce an approximation technique for the likelihood ratio (LR) method to alleviate computational and memory demands in gradient estimation. By exploiting the natural parallelism during the backward pass using LR, we further provide a high-performance training strategy, which pipelines both the forward and backward pass, to make it more suitable for the computation on specialized hardware. Extensive experiments demonstrate the effectiveness of the approximation technique in neural network training. This work underscores the potential of the likelihood ratio method in achieving high-performance neural network training, suggesting avenues for further exploration.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 組込みデバイスを用いた超次元計算による経皮アルコール濃度の高次検出 Enhanced Detection of Transdermal Alcohol Levels Using Hyperdimensional Computing on Embedded Devices ( http://arxiv.org/abs/2403.12323v1 ) ライセンス: Link先を確認	Manuel E. Segura, Pere Verges, Justin Tian Jin Chen, Ramesh Arangott, Angela Kristine Garcia, Laura Garcia Reynoso, Alexandru Nicolau, Tony Givargis, Sergio Gago-Masague,	(参考訳) アルコール摂取は個人の健康に重大な影響を与え、消費が過度になるとさらに顕著な結果をもたらす。より健康的な飲酒習慣を促進するアプローチの1つは、重度の飲酒エピソード中に酔っぱらいを示すタイムリーな通知を送る、ジャスト・イン・タイムの介入を実装することである。しかし、介入機構の複雑さや侵襲性は、実際に使用することを妨げる可能性がある。これまでの研究では、大量の飲酒エピソードを分類するために、収集されたモーションデータと従来の機械学習(ML)アルゴリズムを使用して、モバイルデバイスの非現実的な精度と計算効率でこの問題に取り組んできた。その結果、私たちは、スマートフォン、スマートウェアラブル、IoTデプロイメントに実用的なジャスト・イン・タイムの介入アプローチを設計するために、Hyperdimensional Computing(HDC)を使用することを選択しました。 HDCはリアルタイムセンサデータの処理を効率的に行うことが証明されたフレームワークである。このアプローチには、低レイテンシ、最小消費電力、高並列性など、いくつかのメリットがある。我々は、様々なHDC符号化設計を探求し、それらを様々なHDC学習モデルと組み合わせて、モバイルデバイスに最適な、実現可能なアプローチを作成する。以上の結果より, 精度は89 %であり, 現状よりも12 % 向上していることが明らかとなった。 Alcohol consumption has a significant impact on individuals' health, with even more pronounced consequences when consumption becomes excessive. One approach to promoting healthier drinking habits is implementing just-in-time interventions, where timely notifications indicating intoxication are sent during heavy drinking episodes. However, the complexity or invasiveness of an intervention mechanism may deter an individual from using them in practice. Previous research tackled this challenge using collected motion data and conventional Machine Learning (ML) algorithms to classify heavy drinking episodes, but with impractical accuracy and computational efficiency for mobile devices. Consequently, we have elected to use Hyperdimensional Computing (HDC) to design a just-in-time intervention approach that is practical for smartphones, smart wearables, and IoT deployment. HDC is a framework that has proven results in processing real-time sensor data efficiently. 
This approach offers several advantages, including low latency, minimal power consumption, and high parallelism. We explore various HDC encoding designs and combine them with various HDC learning models to create an optimal and feasible approach for mobile devices. Our findings indicate an accuracy rate of 89%, which represents a substantial 12% improvement over the current state-of-the-art.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 行列積状態と正確なフロケ量子スカーのタンジェント宇宙生成 Tangent space generators of matrix product states and exact Floquet quantum scars ( http://arxiv.org/abs/2403.12325v1 ) ライセンス: Link先を確認	Marko Ljubotina, Elena Petrova, Norbert Schuch, Maksym Serbyn,	(参考訳) 量子シミュレータの進歩は、量子多体系における効率的な状態準備を支援する理論的枠組みの開発を動機付けている。一般に、時間に依存したカップリングによるユニタリ進化を通じて、対象の絡み合った状態を作成することは難しい課題であり、解の存在とその性質についてはほとんど知られていない。本研究では連続的なユニタリ進化を通じて行列積状態(MPS)を作成するための構成的アプローチを開発する。我々は、与えられたMPSの特定の方向に沿ってその接空間の進化を正確に実装する演算子の明示的な構成を提供する。この作用素は有限範囲の局所項の和として記述できるが、一般には非エルミート的である。力学の非エルミート生成器の明示的な構成に基づき、所望のMPS進化を実装する演算子のエルミート列の存在と、演算子範囲で指数関数的に減少する誤差を実証する。この構成は、変換不変なMPS多様体における明示的な周期軌道上でベンチマークされる。 Floquetユニタリが軌道の1つの周期でダイナミクスを発生し、熱化固有状態の海に埋め込まれた近似MPSのような固有状態が特徴的であることを実証した。これらの結果から,本システムの構築は,多体系の状態準備と制御に有用であるだけでなく,Floquetスカーへの一般的な経路 – スペクトルに正確なMPS固有状態を持つ準局所生成体を周期的に駆動するモデルを提供する。 The advancement of quantum simulators motivates the development of a theoretical framework to assist with efficient state preparation in quantum many-body systems. Generally, preparing a target entangled state via unitary evolution with time-dependent couplings is a challenging task and very little is known about the existence of solutions and their properties. In this work we develop a constructive approach for preparing matrix product states (MPS) via continuous unitary evolution. We provide an explicit construction of the operator which exactly implements the evolution of a given MPS along a specified direction in its tangent space. This operator can be written as a sum of local terms of finite range, yet it is in general non-Hermitian. Relying on the explicit construction of the non-Hermitian generator of the dynamics, we demonstrate the existence of a Hermitian sequence of operators that implements the desired MPS evolution with the error which decreases exponentially with the operator range. The construction is benchmarked on an explicit periodic trajectory in a translationally invariant MPS manifold. 
We demonstrate that the Floquet unitary generating the dynamics over one period of the trajectory features an approximate MPS-like eigenstate embedded among a sea of thermalizing eigenstates. These results show that our construction is useful not only for state preparation and control of many-body systems, but also provides a generic route towards Floquet scars -- periodically driven models with quasi-local generators of dynamics that have exact MPS eigenstates in their spectrum.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# 学習可能なプロンプトを用いたテキスト・画像生成モデルにおける望ましくない概念の除去 Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts ( http://arxiv.org/abs/2403.12326v1 ) ライセンス: Link先を確認	Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung,	(参考訳) 生成モデルは、テキスト記述から視覚的に印象的なコンテンツを生成する素晴らしい可能性を示している。しかし、これらのモデルをフィルタリングされていないインターネットデータでトレーニングすると、学習のリスクが生じ、著作権や非倫理的コンテンツのような望ましくない概念が伝播する。本稿では,学習可能なプロンプトをクロスアテンションモジュールに組み込むことで,テキスト・画像生成モデルから望ましくない概念を除去する手法を提案する。この学習可能なプロンプトは、望ましくない概念の知識をそれに移し、これらの概念のモデルパラメータと対応するテキスト入力への依存を減らすために追加記憶として機能する。このような知識がプロンプトに伝達されるため、これらの望ましくない概念を根絶することはより安定し、他の概念に最小限の負の影響を与える。本研究では,本手法の安定拡散モデルにおける有効性を示すとともに,非不要な要素を保存しつつ,不要な内容の除去という観点から,最先端の消去手法よりも優れていることを示す。 Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image generative models by incorporating a learnable prompt into the cross-attention module. This learnable prompt acts as additional memory to transfer the knowledge of undesirable concepts into it and reduce the dependency of these concepts on the model parameters and corresponding textual inputs. Because of this knowledge transfer into the prompt, erasing these undesirable concepts is more stable and has minimal negative impact on other concepts. We demonstrate the effectiveness of our method on the Stable Diffusion model, showcasing its superiority over state-of-the-art erasure methods in terms of removing undesirable content while preserving other unrelated elements.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# GT-Rain Single Image Deraining Challenge Report GT-Rain Single Image Deraining Challenge Report ( http://arxiv.org/abs/2403.12327v1 ) ライセンス: Link先を確認	Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng,	(参考訳) 本報告では,CVPR 2023のUG2+ワークショップにおいて,単一画像デライニングにおけるGT-Rainチャレンジの結果についてレビューする。本コンペティションの目的は、現実のシナリオにおける雨天現象の研究、新しい現実の雨天画像データセットの提供、および、実際の画像上での単一画像デコレーション手法の開発をさらに進めるための革新的なアイデアの創出である。送信はGT-Rainデータセットでトレーニングされ、15の追加シーンからなるデータセットの拡張で評価された。 GT-Rainのシーンは、雨が止まった後、実際の雨と地面の真実のイメージで構成されている。 275人の参加者がチャレンジに登録され、55人が最終テストフェーズに出場した。 This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained on the GT-Rain dataset and evaluated on an extension of the dataset consisting of 15 additional scenes. Scenes in GT-Rain are comprised of real rainy image and ground truth image captured moments after the rain had stopped. 275 participants were registered in the challenge and 55 competed in the final testing phase.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# テキストストリーム中のドリフトの生成方法 Methods for Generating Drift in Text Streams ( http://arxiv.org/abs/2403.12328v1 ) ライセンス: Link先を確認	Cristiano Mesquita Garcia, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr, Jean Paul Barddal,	(参考訳) システムと個人は継続的にデータを生成します。インターネット上では、人々は自分の知識、感情、意見を共有し、サービスや製品に関するレビューを提供する。これらのテキストデータから自動的に学習することで、組織や機関に洞察を与え、例えば財務的影響を防止できる。テキストデータから時間の経過とともに学習するには、機械学習システムは概念の漂流を考慮しなければならない。コンセプトドリフトは、実世界のデータセットで頻繁に発生する現象であり、時間とともにデータ分布の変化に対応する。例えば、感情の変化や単語の意味が時間とともに調整されるときに、概念の漂流が起こる。概念ドリフトは現実世界のアプリケーションでは頻繁に見られるが、ラベル付きドリフトを持つベンチマークデータセットは文献ではまれである。このギャップを埋めるため,本論文では,ラベル付きドリフトを用いたデータセット作成を容易にする4つのテキストドリフト生成手法を提案する。これらの手法はYelpとAirbnbのデータセットに適用され、ストリームマイニングパラダイムに関するインクリメンタルな分類器を使用して、ドリフトから回復する能力を評価する。その結果、ドリフトの直後に全てのメソッドのパフォーマンスが劣化し、インクリメンタルSVMは、精度とマクロF1スコアに関する前のパフォーマンスレベルを実行および回復するのに最も高速であることがわかった。 Systems and individuals produce data continuously. On the Internet, people share their knowledge, sentiments, and opinions, provide reviews about services and products, and so on. Automatically learning from these textual data can provide insights to organizations and institutions, thus preventing financial impacts, for example. To learn from textual data over time, the machine learning system must account for concept drift. Concept drift is a frequent phenomenon in real-world datasets and corresponds to changes in data distribution over time. For instance, a concept drift occurs when sentiments change or a word's meaning is adjusted over time. Although concept drift is frequent in real-world applications, benchmark datasets with labeled drifts are rare in the literature. To bridge this gap, this paper provides four textual drift generation methods to ease the production of datasets with labeled drifts. These methods were applied to Yelp and Airbnb datasets and tested using incremental classifiers respecting the stream mining paradigm to evaluate their ability to recover from the drifts. 
Results show that all methods have their performance degraded right after the drifts, and the incremental SVM is the fastest to run and recover the previous performance levels regarding accuracy and Macro F1-Score.	翻訳日:2024-03-20 17:52:34 公開日:2024-03-18
# データ拡張によるニューラルネットワークの後方不確かさの定量化 Posterior Uncertainty Quantification in Neural Networks using Data Augmentation ( http://arxiv.org/abs/2403.12729v1 ) ライセンス: Link先を確認	Luhuan Wu, Sinead Williamson,	(参考訳) 本稿では,予測フレームワークを通じてディープラーニングにおける不確実性定量化の問題にアプローチし,予測できない将来のデータの予測分布に関する仮定を定め,モデルパラメータの不確かさを捉える。この観点から、深層化(Lakshminarayanan et al , 2017)は、将来のデータが既存の観測でのみサポートされることを前提に、基本的に誤った仕様のモデルクラスであることを示す。この制限に対処するため,一般的なデータ拡張手法を用いて,より現実的な予測分布を構築する手法であるMixupMPを提案する。 MixupMPは深層アンサンブルの代替として機能し、各アンサンブルメンバーはこの予測分布からランダムなシミュレーションに基づいて訓練される。最近提案されたマーティンゲイル後部の枠組み(Fong et al , 2023)に基づいて、MixupMPは暗黙的に定義されたベイズ後部のサンプルを返す。実験により,MixupMPは既存のベイズ的手法や非ベイズ的手法と比較して,様々な画像分類データセットにおいて優れた予測性能と不確かさの定量化を実現していることが示された。 In this paper, we approach the problem of uncertainty quantification in deep learning through a predictive framework, which captures uncertainty in model parameters by specifying our assumptions about the predictive distribution of unseen future data. Under this view, we show that deep ensembling (Lakshminarayanan et al., 2017) is a fundamentally mis-specified model class, since it assumes that future data are supported on existing observations only -- a situation rarely encountered in practice. To address this limitation, we propose MixupMP, a method that constructs a more realistic predictive distribution using popular data augmentation techniques. MixupMP operates as a drop-in replacement for deep ensembles, where each ensemble member is trained on a random simulation from this predictive distribution. Grounded in the recently-proposed framework of Martingale posteriors (Fong et al., 2023), MixupMP returns samples from an implicitly defined Bayesian posterior. Our empirical analysis showcases that MixupMP achieves superior predictive performance and uncertainty quantification on various image classification datasets, when compared with existing Bayesian and non-Bayesian approaches.	翻訳日:2024-03-20 14:03:59 公開日:2024-03-18
# スペクトルインバージョンによる超高速パルスの超解像 Super-resolution of ultrafast pulses via spectral inversion ( http://arxiv.org/abs/2403.12746v1 ) ライセンス: Link先を確認	Michał Lipka, Michał Parniak,	(参考訳) 古典分光の分解能限界は、複素電磁場の位相に含まれる情報を活用する量子インスピレーション法によって超えることができる。空間イメージングにおけるそれらの実装は広く議論され、実証されてきたが、スペクトル領域の実装は少ない。広帯域光(10～100GHz)を対象とした分光超解像法を実験的に実証し,画像反転干渉計のスペクトル領域アナログに基づく。原理実証実験において、等輝度の2つの非コヒーレントスペクトル特徴と、コヒーレンス時間当たりの光子との小さな分離を推定するパラダイム的問題を考察した。漸近的推定理論の根拠として、スペクトル直接像に対する2倍以上の改善は、所定の推定値の分散に必要な資源(光子)の観点から示される。この装置は、電気光学タイムレンズとインバージョンを実装したパッシブスペクトル分散器を備えた、アクティブに安定化されたマッハ・ツェンダー型干渉計に基づいている。このように、このメソッドはオンチップの統合、優れたスケーラビリティ、さらにモードソートなどのアプリケーションを実現する。 The resolution limits of classical spectroscopy can be surpassed by quantum-inspired methods leveraging the information contained in the phase of the complex electromagnetic field. Their counterpart in spatial imaging has been widely discussed and demonstrated; however, the spectral-domain implementations are scarce. We experimentally demonstrate a spectroscopic super-resolution method aimed at broadband light (10s to 100s of GHz), and based on the spectral-domain analog of image inversion interferometry. In a proof-of-principle experiment, we study the paradigmatic problem of estimating a small separation between two incoherent spectral features of equal brightness, with a small number of photons per coherence time. On the grounds of asymptotic estimation theory, more than a 2-fold improvement over the spectral direct imaging is demonstrated in terms of required resources (photons) for a given estimator variance. The setup is based on an actively stabilized Mach-Zehnder-type interferometer with electro-optic time lenses and passive spectral dispersers implementing the inversion. As such, the method promises on-chip integration, good scalability, and further applications, e.g., mode sorting.	翻訳日:2024-03-20 14:03:58 公開日:2024-03-18
# N-Modal Contrastive LossesとTrimodal空間におけるソーシャルメディアデータへの応用 N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space ( http://arxiv.org/abs/2403.12747v1 ) ライセンス: Link先を確認	William Theisen, Walter Scheirer,	(参考訳) コンフリクトダイナミクスのソーシャルメディアの展望は、ますますマルチモーダル化している。 CLIPのようなモデルアーキテクチャの最近の進歩により、研究者はテキストのモダリティと画像の共有潜在空間における相互作用を研究することができるようになった。しかし、CLIPモデルでは、投稿中のモダリティが2つを超えると、ソーシャルメディア上の状況に対処できない。ソーシャルメディアのダイナミクスは、テキストと画像の相互作用を理解するだけでなく、ビデオも理解する必要があることが多い。本稿では,任意のモダリティを許容するコントラッシブ・ロス関数の拡張について検討し,ソーシャルメディア上でのトリモーダル・スペースにおけるその有用性を示す。 CLIPを3次元に拡張することで、3つのモダリティがすべて存在するソーシャルメディアの風景(より一般的な状況)の理解をさらに助長することができる。我々は、新たに収集された3つのモダリティを含むTelegramポストの公開データセットを使用して、2つのOSINTシナリオにおけるトリモーダルモデルの有用性を実証する。トリモーダルCLIPモデルはこれまで検討されてきたが(ソーシャルメディアデータにはないが)、新しいクアッドモーダルCLIPモデルも提示する。このモデルは、テキスト、画像、ビデオ、オーディオ間の相互作用を学ぶことができる。クアッドモーダルモデルに対する検索における最新技術ベースラインの新たな結果を示す。 The social media landscape of conflict dynamics has grown increasingly multi-modal. Recent advancements in model architectures such as CLIP have enabled researchers to begin studying the interplay between the modalities of text and images in a shared latent space. However, CLIP models fail to handle situations on social media when the number of modalities present in a post exceeds two. Social media dynamics often require understanding the interplay between not only text and images, but video as well. In this paper we explore an extension of the contrastive loss function to allow for any number of modalities, and demonstrate its usefulness in trimodal spaces on social media. By extending CLIP into three dimensions we can further aid the understanding of social media landscapes where all three modalities are present (an increasingly common situation). 
We use a newly collected public data set of Telegram posts containing all three modalities to train, and then demonstrate the usefulness of, a trimodal model in two OSINT scenarios: classifying a social media artifact post as either pro-Russian or pro-Ukrainian and identifying which account a given artifact originated from. While trimodal CLIP models have been explored before (though not on social media data), we also display a novel quadmodal CLIP model. This model can learn the interplay between text, image, video, and audio. We demonstrate new state-of-the-art baseline results on retrieval for quadmodal models moving forward.	翻訳日:2024-03-20 14:03:58 公開日:2024-03-18
# NovelQA: 長距離の小説質問応答のためのベンチマーク NovelQA: A Benchmark for Long-Range Novel Question Answering ( http://arxiv.org/abs/2403.12766v1 ) ライセンス: Link先を確認	Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, Yue Zhang,	(参考訳) 大規模言語モデル(LLM)の急速な進歩は、特に長文情報の理解と処理において、自然言語処理における新たなフロンティアを導入している。しかしながら、これらのモデルの長期コンテキスト能力の評価は、現在のベンチマークの限界のため、依然として課題である。このギャップに対処するために,拡張テキストでLLMの能力をテストするためのベンチマークであるNovelQAを紹介する。NovelQAは英語の小説から作られており、複雑さ、長さ、物語のコヒーレンスを独特にブレンドしており、LLMの深いテキスト理解を評価するのに理想的なツールである。本稿では,NovelQAの設計と構築について述べる。 NovelQA上でのLong-context LLMの評価では、特にマルチホップ推論、詳細指向の質問、および10万以上のトークンを持つ非常に長い入力で直面する課題に、モデルのパフォーマンスに関する重要な洞察が示されています。その結果、LLMのさらなる進歩が、長期の文脈理解と計算文学研究を改善する必要性を浮き彫りにした。 The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs with extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types. Our evaluation of Long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long input with more than 100,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension and computational literary studies.	翻訳日:2024-03-20 13:53:54 公開日:2024-03-18
# ANIM:1枚のRGB-D画像からの人体再構成のための正確なニューラルインシシットモデル ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image ( http://arxiv.org/abs/2403.10357v2 ) ライセンス: Link先を確認	Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung,	(参考訳) 人間の形状学習の最近の進歩は、ニューラル暗黙のモデルが、限られた数のビューから、さらには単一のRGB画像から3次元の人間の表面を生成するのに効果的であることを示している。しかし、既存の単眼アプローチは、顔、手、布のしわなどの細かい幾何学的詳細を回復するのに依然として苦労している。また、カメラの光学軸に沿って歪んだジオメトリーをもたらす奥行きの曖昧さも容易に生じやすい。本稿では,単視点RGB-D画像から任意の3次元人体形状を復元する新しい手法であるANIMを導入することにより,復元過程に深度観測を取り入れることのメリットを検討する。本モデルでは, 深度情報を活用し, 空間的関係を捉え, 奥行きの曖昧さを緩和するために, 多解像度画素整列とボクセル整列の両特徴から幾何学的詳細を学習する。再建面上に位置する点の符号付き距離場推定の精度を向上させる深度スーパービジョン戦略を導入することにより、再建形状の質をさらに向上する。実験によると、ANIMはRGB、表面正規、ポイントクラウド、RGB-Dデータを入力として使用する最先端の作業よりも優れている。さらに、コンシューマグレードのRGB-Dカメラと組み合わせた高品質なスキャンを含む新しいマルチモーダルデータセットであるANIM-Realと、ANIMを微調整するためのプロトコルを導入し、現実世界の人間の捕獲から高品質な再構築を可能にする。 Recent progress in human shape learning shows that neural implicit models are effective in generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as face, hands or cloth wrinkles. They are also easily prone to depth ambiguities that result in distorted geometries along the camera optical axis. In this paper, we explore the benefits of incorporating depth observations in the reconstruction process by introducing ANIM, a novel method that reconstructs arbitrary 3D human shapes from single-view RGB-D images with an unprecedented level of accuracy. Our model learns geometric details from both multi-resolution pixel-aligned and voxel-aligned features to leverage depth information and enable spatial relationships, mitigating depth ambiguities. 
We further enhance the quality of the reconstructed shape by introducing a depth-supervision strategy, which improves the accuracy of the signed distance field estimation of points that lie on the reconstructed surface. Experiments demonstrate that ANIM outperforms state-of-the-art works that use RGB, surface normals, point cloud or RGB-D data as input. In addition, we introduce ANIM-Real, a new multi-modal dataset comprising high-quality scans paired with consumer-grade RGB-D camera, and our protocol to fine-tune ANIM, enabling high-quality reconstruction from real-world human capture.	翻訳日:2024-03-20 12:54:38 公開日:2024-03-18
# 安全ケース - 高度なAIシステムの安全性を正当化する方法 Safety Cases: How to Justify the Safety of Advanced AI Systems ( http://arxiv.org/abs/2403.10462v2 ) ライセンス: Link先を確認	Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen,	(参考訳) AIシステムがより高度化するにつれ、企業や規制機関は、トレーニングとデプロイが安全かどうかという難しい決定を下すことになる。これらの決定に備えて、我々は、AIシステムが大惨事を引き起こす可能性が低いという構造化された根拠である、開発者がどのようにして「安全ケース」を作ることができるかを調査する。我々は、安全ケースの組織化のための枠組みを提案し、安全を正当化するための4つのカテゴリについて議論する:大惨事を引き起こすことができないこと、十分な強力な制御手段、危害を引き起こす能力に拘わらず信頼感、そしてAIシステムがより強力なものになった場合、信頼できるAIアドバイザに言及する。我々は、各カテゴリにおける議論の具体的な例を評価し、AIシステムが安全にデプロイ可能であることを正当化するために、議論をどのように組み合わせるかを概説する。 As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is a structured rationale that AI systems are unlikely to cause a catastrophe. We propose a framework for organizing a safety case and discuss four categories of arguments to justify safety: total inability to cause a catastrophe, sufficiently strong control measures, trustworthiness despite capability to cause harm, and -- if AI systems become much more powerful -- deference to credible AI advisors. We evaluate concrete examples of arguments in each category and outline how arguments could be combined to justify that AI systems are safe to deploy.	翻訳日:2024-03-20 12:54:38 公開日:2024-03-18
# ディープラーニングによる制御可能なデータ生成 Controllable Data Generation by Deep Learning: A Review ( http://arxiv.org/abs/2207.09542v6 ) ライセンス: Link先を確認	Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Zhaohui Qin, Liang Zhao,	(参考訳) 分子設計や画像編集,音声合成などの重要な応用が注目されている。従来の手作りのアプローチは、専門的な経験と集中的な人間の努力に大きく依存しているが、効率的で効率的なデータ生成をサポートするための科学的知識と低スループットの不足に悩まされている。近年,深層学習の進歩は,データの表現と特性を表現的手法で学習する機会を生み出している。このような能力は、データの構造的パターンと機能的特性の間の相互関係を決定する新しい方法を提供する。この記事では、制御可能な深層データ生成として知られるこの将来性のある研究領域について、体系的なレビューを行う。まず、この記事は潜在的な課題を提起し、予備機能を提供します。次に、制御可能な深層データ生成を正式に定義し、様々な技術に関する分類を提案し、この特定の領域における評価指標を要約する。その後、制御可能な深層データ生成のエキサイティングな応用を紹介し、既存の研究を実験的に分析し比較する。最後に、制御可能な深層データ生成の将来的な方向性を強調し、潜在的な5つの課題を特定します。 Designing and generating new data under targeted properties has been attracting various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches heavily rely on expertise experience and intensive human efforts, yet still suffer from the insufficiency of scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning has created the opportunity for expressive methods to learn the underlying representation and properties of data. Such capability provides new ways of determining the mutual relationship between the structural patterns and functional properties of the data and leveraging such relationships to generate structural data, given the desired properties. This article is a systematic review that explains this promising research area, commonly known as controllable deep data generation. First, the article raises the potential challenges and provides preliminaries. Then the article formally defines controllable deep data generation, proposes a taxonomy on various techniques and summarizes the evaluation metrics in this specific domain. After that, the article introduces exciting applications of controllable deep data generation, experimentally analyzes and compares existing works. 
Finally, this article highlights the promising future directions of controllable deep data generation and identifies five potential challenges.	翻訳日:2024-03-20 06:58:04 公開日:2024-03-18
# 想像すらできないことについて、私たちは何を知りうるだろうか? What can we know about that which we cannot even imagine? ( http://arxiv.org/abs/2208.03886v5 ) ライセンス: Link先を確認	David H. Wolpert,	(参考訳) このエッセイでは一連の問いを検討する。最初の問いは、知能一般の生物学的機能、特に人間の知能の認知的補綴に関するものである。これらは、おそらく人類がこれまでに開発した最も重要な認知補綴物である、人間の言語に関する問いに繋がる。人間の言語にカプセル化された認知力を称賛するのが伝統的であるが、ここでは人間の言語がいかに恐ろしいほど制限されているか、そして、言語で強化されているにもかかわらず、私たちの認知能力がどれほど制限されているかを強調する。これは、究極的には人間の言語で定式化されている人間の数学もまた深く制限されているのではないか、という問いに繋がる。そして、これらの問いを組み合わせることで、このエッセイの指導的関心、すなわち、想像すらできないものについて私たちは一体何を見分けることができるのか、という問いに対して、部分的で、ある種の、横道からの答えを提示する。 In this essay I will consider a sequence of questions. The first questions concern the biological function of intelligence in general, and cognitive prostheses of human intelligence in particular. These will lead into questions concerning human language, perhaps the most important cognitive prosthesis humanity has ever developed. While it is traditional to rhapsodize about the cognitive power encapsulated in human language, I will emphasize how horribly limited human language is - and therefore how limited our cognitive abilities are, despite their being augmented with language. This will lead to questions of whether human mathematics, being ultimately formulated in terms of human language, is also deeply limited. I will then combine these questions to pose a partial, sort-of, sideways answer to the guiding concern of this essay: what can we ever discern about that which we cannot even conceive?	翻訳日:2024-03-20 06:58:04 公開日:2024-03-18
# 自己監督型単眼深度推定におけるロバストなクロスビュー整合性について On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation ( http://arxiv.org/abs/2209.08747v3 ) ライセンス: Link先を確認	Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, Dacheng Tao,	(参考訳) 自己教師付き単眼深度推定(SS-MDE)において、例えば、光度整合性や3次元点雲の整合性について検討することで、顕著な進展が見られた。しかし、照明のバラツキ、オクルージョン、テクスチャのない領域、移動物体に非常に弱いため、様々な場面を扱えるほど頑丈ではない。この課題に対処するため,本稿では2種類の堅牢なクロスビュー整合性について検討する。第一に、隣接するフレーム間の空間オフセットフィールドは、変形可能なアライメントにより、隣接するフレームから参照フレームを再構成し、Depth Feature Alignment(DFA)ロスを介して時間深度特徴を整列させる。次に、基準フレームとその近傍フレームの3D点雲を算出してボクセル空間に変換し、ボクセルの点密度を算出し、ボクセル密度アライメント(VDA)損失を介して整列させる。このようにして、SS-MDEの深度特徴空間と3次元ボクセル空間の時間的コヒーレンスを利用して、「ポイント・ツー・ポイント」アライメントパラダイムを「リージョン・ツー・リージョン」パラダイムにシフトする。光度整合性損失や剛性点雲のアライメント損失と比較して、DFAとVDAの損失は、深い特徴の強い表現力と上記の課題に対するボクセル密度の高い耐性のため、より堅牢である。いくつかのアウトドアベンチマークの実験結果から,本手法は最先端技術より優れていることが示された。大規模なアブレーション研究と分析は、特に挑戦的な場面において、提案された損失の有効性を検証した。コードとモデルはhttps://github.com/sunnyHelen/RCVC-deepth.comで公開されている。 Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, they are very vulnerable to illumination variance, occlusions, texture-less regions, as well as moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. Firstly, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, which is used to align the temporal depth features via a Depth Feature Alignment (DFA) loss. Secondly, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is calculated and aligned via a Voxel Density Alignment (VDA) loss. 
In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to the "region-to-region" one. Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation study and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.	翻訳日:2024-03-20 06:58:04 公開日:2024-03-18
# MechProNet: 金属添加物製造における機械的特性の機械学習予測 MechProNet: Machine Learning Prediction of Mechanical Properties in Metal Additive Manufacturing ( http://arxiv.org/abs/2209.12605v2 ) ライセンス: Link先を確認	Parand Akbari, Masoud Zamani, Amir Mostafaei,	(参考訳) 金属添加物製造(MAM)における機械的特性の予測は、印刷部品の性能と信頼性を確保するとともに、特定の用途に適合するために不可欠である。しかし、MAMプロセスの機械的特性を推定する実験は手間がかかり高価であり、しばしば特定の材料やプロセスに限られる。機械学習(ML)手法は、処理パラメータと材料特性に基づいて機械特性を予測するための、より柔軟で費用効率の良いアプローチを提供する。本研究では,機械的特性を予測するためのMLモデルをベンチマークするための包括的なフレームワークを提案する。我々は,多種多様な資料から90以上のMAM記事およびデータシートから,140種類のMAMデータシートを含む広範な実験データセットを収集した。このデータセットは、MAM処理条件、機械、材料、および、降伏強度、究極の引張強さ、弾性率、伸長、硬さ、表面粗さなどの機械的特性に関する情報を含む。本フレームワークは,機械的特性を予測するための総合的な学習フレームワークを構築するために,MAMに特有な物理認識処理,調整可能なMLモデル,および調整された評価指標を組み込んだ。さらに、機械的特性に対するMLモデルの予測値の解明と解釈を行うための説明可能なAI手法、特にSHAP解析について検討する。さらに,データ駆動型明示モデルを開発し,処理パラメータと材料特性に基づいて機械的特性を推定し,従来のMLモデルと比較して高い解釈性を示した。 Predicting mechanical properties in metal additive manufacturing (MAM) is essential for ensuring the performance and reliability of printed parts, as well as their suitability for specific applications. However, conducting experiments to estimate mechanical properties in MAM processes can be laborious and expensive, and they are often limited to specific materials and processes. Machine learning (ML) methods offer a more flexible and cost-effective approach to predicting mechanical properties based on processing parameters and material properties. In this study, we introduce a comprehensive framework for benchmarking ML models for predicting mechanical properties. We compiled an extensive experimental dataset from over 90 MAM articles and data sheets from a diverse range of sources, encompassing 140 different MAM data sheets. This dataset includes information on MAM processing conditions, machines, materials, and resulting mechanical properties such as yield strength, ultimate tensile strength, elastic modulus, elongation, hardness, and surface roughness. 
Our framework incorporates physics-aware featurization specific to MAM, adjustable ML models, and tailored evaluation metrics to construct a comprehensive learning framework for predicting mechanical properties. Additionally, we explore the Explainable AI method, specifically SHAP analysis, to elucidate and interpret the predicted values of ML models for mechanical properties. Furthermore, data-driven explicit models were developed to estimate mechanical properties based on processing parameters and material properties, offering enhanced interpretability compared to conventional ML models.	翻訳日:2024-03-20 06:58:04 公開日:2024-03-18
# オーディオ・ビジュアル同期のためのマルチモーダルTransformer蒸留 Multimodal Transformer Distillation for Audio-Visual Synchronization ( http://arxiv.org/abs/2210.15563v3 ) ライセンス: Link先を確認	Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, Jyh-Shing Roger Jang,	(参考訳) 音声と視覚の同期は、ビデオ中の口の動きと音声が同期しているかどうかを判断することを目的としている。 VocaLiSTは、マルチモーダルTransformerを組み込んで、音声と視覚の相互作用情報をモデル化することで、最先端のパフォーマンスを実現する。しかし、それは高い計算資源を必要とし、現実世界のアプリケーションには実用的ではない。本稿では、提案するマルチモーダルTransformer蒸留(MTD)損失によって訓練されるMTDVocaLiSTモデルを提案する。 MTD損失により、MTDVocaLiSTモデルはVocaLiSTのTransformerにおけるクロスアテンション分布と値関係を深く模倣することができる。さらに、すべての層にわたる相互作用情報を完全に活用するために、不確実性重み付けを利用する。提案手法は2つの側面で有効である。蒸留法の観点からは、MTD損失は他の強力な蒸留ベースラインを上回る。蒸留モデルの性能の観点からは、 1) MTDVocaLiSTは、同程度のサイズのSOTAモデルであるSyncNetとPerfect Matchをそれぞれ15.65%と3.35%上回る。 2) MTDVocaLiSTはVocaLiSTのモデルサイズを83.52%削減しつつ、同様の性能を維持している。 Audio-visual synchronization aims to determine whether the mouth movements and speech in the video are synchronized. VocaLiST reaches state-of-the-art performance by incorporating multimodal Transformers to model audio-visual interaction information. However, it requires high computing resources, making it impractical for real-world applications. This paper proposes an MTDVocaLiST model, which is trained by our proposed multimodal Transformer distillation (MTD) loss. MTD loss enables MTDVocaLiST model to deeply mimic the cross-attention distribution and value-relation in the Transformer of VocaLiST. Additionally, we harness uncertainty weighting to fully exploit the interaction information across all layers. Our proposed method is effective in two aspects: From the distillation method perspective, MTD loss outperforms other strong distillation baselines. From the distilled model's performance perspective: 1) MTDVocaLiST outperforms similar-size SOTA models, SyncNet, and Perfect Match models by 15.65% and 3.35%; 2) MTDVocaLiST reduces the model size of VocaLiST by 83.52%, while still maintaining similar performance.	翻訳日:2024-03-20 06:58:04 公開日:2024-03-18
# 統一多角性アライメントを用いたロバスト領域適応物体検出 Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment ( http://arxiv.org/abs/2301.00371v2 ) ライセンス: Link先を確認	Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling,	(参考訳) ドメイン適応検出は、ターゲットドメイン上の検出器の一般化を改善することを目的としている。 2つのドメイン間の特徴分布の相違を低減するため、近年のアプローチでは、逆学習によって異なる粒度の特徴アライメントによってドメイン適応を実現している。しかし、複数の粒度とアライメントの異なる特徴、劣化検出の関係を無視する。これに対処するため,ドメイン不変な特徴学習のためのMGA(Multiple-granularity alignment)に基づく検出フレームワークを導入する。鍵となるのは、ピクセルレベル、インスタンスレベル、カテゴリレベルなど、さまざまな粒度の依存関係を同時にエンコードして、2つのドメインをアライメントすることだ。具体的には,画素レベルの特徴をベースとして,まずOmni-scale gated fusion (OSGF) モジュールを開発し,大規模コンボリューションを持つインスタンスの識別表現を集約し,堅牢なマルチスケール検出を実現する。さらに、複数の粒度判別器を導入し、ソースまたはターゲットドメイン、サンプルの異なる粒度がどこから来ているかを特定する。注意すべき点として、MGAは異なるカテゴリのインスタンス識別性を利用するだけでなく、2つのドメイン間のカテゴリ整合性を利用して検出する。さらに、モデル更新のためのモデルアセスメントを探索し、擬似ラベルを改善し、局所的な不整合問題を緩和し、検出ロバスト性を高める適応指数移動平均(AEMA)戦略を提案する。複数のドメイン適応シナリオに関する大規模な実験は、FCOSやFaster R-CNN検出器の他のアプローチよりもMGAの方が優れていることを検証している。コードはhttps://github.com/tiankongzhang/MGAでリリースされる。 Domain adaptive detection aims to improve the generalization of detectors on target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. 
Besides, we introduce multi-granularity discriminators to identify where, either source or target domains, different granularities of samples come from. Note that, MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.	翻訳日:2024-03-20 06:48:15 公開日:2024-03-18
# ストリーミング自己教師付き音声表現学習のための低レイテンシアテンションモジュール A low latency attention module for streaming self-supervised speech representation learning ( http://arxiv.org/abs/2302.13451v2 ) ライセンス: Link先を確認	Jianbo Ma, Siqi Pan, Deepak Chandran, Andrea Fanelli, Richard Cartwright,	(参考訳) Transformerはディープラーニングの基本的な構成要素であり、アテンションメカニズムはTransformerのコアコンポーネントである。自己教師付き音声表現学習(SSRL)は、Transformerアーキテクチャの一般的なユースケースである。Transformerの非因果的(acausal)な挙動のため、SSRLにおけるTransformerの使用は主に非因果的な応用に焦点が当てられてきた。しかし、音声処理のようなメディア処理の問題にはリアルタイムの解決が必要である。本稿では,SSRLアーキテクチャを低い演算量およびメモリ要求でトレーニングし,低くかつ固定されたレイテンシでリアルタイム推論を可能にするアテンションモジュールの実装について述べる。本稿で提案するアテンションモジュールは,ストリーミングアテンション (SA) と低レイテンシストリーミングアテンション (LLSA) の2つのコンポーネントを含む。 SAは効率的なストリーミングSSRL実装の提案であり,LLSAはマスク付き非因果アテンション(MAA)などの他のストリーミングアテンションアーキテクチャにおけるレイテンシの蓄積問題を解決し,複数層を積み重ねた場合でもレイテンシが1層分に等しいことを保証する。本稿では,自動音声認識をダウンストリームタスクとするストリーミングSSRLをトレーニングすることにより,ここでは非因果アテンション(AA)と呼ぶ通常の(バニラ)アテンション,SA,LLSAの比較分析を行う。 librispeech-clean-100でのトレーニングとlibrispeech-test-cleanでのテストでは,低レイテンシアテンションモジュールの単語誤り率(WER)は5.84%であり,MAA(WER=13.82%)よりも大幅に改善した。また、我々の実装は推論のレイテンシを1.92秒から0.16秒に短縮する。提案する低レイテンシモジュールは,従来の非因果Transformerの利点の多くを保ちつつ,リアルタイムストリーミングアプリケーションに適用可能なレイテンシ特性も実現している。 The transformer is a fundamental building block in deep learning, and the attention mechanism is the transformer's core component. Self-supervised speech representation learning (SSRL) represents a popular use-case for the transformer architecture. Due to transformers' acausal behavior, the use of transformers for SSRL has been predominantly focused on acausal applications. However, several media processing problems, such as speech processing, require real-time solutions. In this paper, we present an implementation of the attention module that enables training of SSRL architectures with low compute and memory requirements, while allowing real-time inference with low and fixed latency. The attention module proposed in this paper includes two components, streaming attention (SA) and low-latency streaming attention (LLSA).
The SA represents our proposal for an efficient streaming SSRL implementation, while the LLSA solves the latency build-up problem of other streaming attention architectures, such as the masked acausal attention (MAA), guaranteeing a latency equal to one layer even when multiple layers are stacked. We present a comparative analysis between the vanilla attention, which we will refer here as acausal attention (AA), the SA, and the LLSA, by training a streaming SSRL with automatic speech recognition as downstream task. When training on librispeech-clean-100 and testing on librispeech-test-clean, our low-latency attention module has a word error rate (WER) of 5.84%, which represents a significant improvement over the MAA (WER = 13.82%). Our implementation also reduces the inference latency from 1.92 to 0.16 seconds. The proposed low-latency module preserves many of the benefits of conventional acausal transformers, but also enables latency characteristics that make it applicable to real-time streaming applications.	翻訳日:2024-03-20 06:38:27 公開日:2024-03-18
# TQFTから見た通信プロトコルとQECC : その1:TQFTからのLOCCプロトコルとQECCの構築 Communication protocols and QECCs from the perspective of TQFT, Part I: Constructing LOCC protocols and QECCs from TQFTs ( http://arxiv.org/abs/2303.16461v2 ) ライセンス: Link先を確認	Chris Fields, James F. Glazebrook, Antonino Marciano,	(参考訳) トポロジカル量子場理論(TQFT)は、量子状態の準備と測定を記述するための一般的な最小推定言語を提供する。そのため、マルチエージェント通信プロトコル、例えばローカル操作、古典通信(LOCC)プロトコルを表現する汎用言語を提供する。ここでは、TQFTを用いてLOCCプロトコルを構築し、LOCCプロトコルが一般に量子誤り訂正符号(QECC)を誘導することを示す。量子ダーウィンとベル/EPRの実験によって記述されたマルチサーバシナリオを用いて、これらのLOCC誘発QECCがエンタングルメントを古典的冗長性に変換する方法を示す。第II部では,このようなQECCを,相互作用系間の境界における時空の出現,実行,誘導とみなすことができることを示す。本稿では,BF理論とチャーン・サイモンズ理論を用いて,エージェント間通信と時空の関係を考察し,位相的M理論を用いて検討する。 Topological quantum field theories (TQFTs) provide a general, minimal-assumption language for describing quantum-state preparation and measurement. They therefore provide a general language in which to express multi-agent communication protocols, e.g. local operations, classical communication (LOCC) protocols. Here we construct LOCC protocols using TQFT, and show that LOCC protocols generically induce quantum error-correcting codes (QECCs). Using multi-observer scenarios described by quantum Darwinism and Bell/EPR experiments as examples, we show how these LOCC-induced QECCs effectively convert entanglement into classical redundancy. In the accompanying Part II, we show that such QECCs can be regarded as implementing, or inducing the emergence of, spacetimes on the boundaries between interacting systems. We investigate this connection between inter-agent communication and spacetime using BF and Chern-Simons theories, and then using topological M-theory.	翻訳日:2024-03-20 06:38:27 公開日:2024-03-18
# エネルギー誘導型エントロピーニューラル最適輸送 Energy-guided Entropic Neural Optimal Transport ( http://arxiv.org/abs/2304.06094v4 ) ライセンス: Link先を確認	Petr Mokrov, Alexander Korotin, Alexander Kolesov, Nikita Gushchin, Evgeny Burnaev,	(参考訳) エネルギーベースのモデル(EBM)は、機械学習コミュニティで数十年にわたって知られている。2000年代に遡るEBMに関する先駆的研究以来、エネルギーポテンシャル(非正規化尤度関数)を用いて生成モデリング問題を解決する効率的な方法が数多く存在する。対照的に、最適輸送(OT)、特にニューラルOTソルバの領域は、はるかに研究が少なく、少数の最近の研究に限られている(損失関数としてOTを利用し、OTマップ自体をモデル化しないWGANベースのアプローチを除く)。本研究では,EBMとエントロピー正則化OTのギャップを埋める。本稿では,前者の最近の発展と技術的改善を活用して,後者を豊かにするための新しい方法論を提案する。理論的な観点から、我々の手法の一般化境界を証明する。実践面では,トイ2Dおよび画像領域における適用性を検証する。拡張性を示すために、事前訓練されたStyleGANを本手法に組み込み、高解像度のAFHQ $512\times 512$ のunpaired I2I変換に適用する。簡単のため、我々はエネルギー誘導型エントロピーOT手法のバックボーンとして単純な短期および長期実行(short- and long-run)のEBMを選択し、より洗練されたEBMの適用は将来の研究に残す。私たちのコードは、https://github.com/PetrMokrov/Energy-guided-Entropic-OTで利用可能である。 Energy-based models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works devoted to EBMs dating back to the noughties, there have been a lot of efficient methods which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited to a few recent works (excluding WGAN-based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. From the theoretical perspective, we prove generalization bounds for our technique. In practice, we validate its applicability in toy 2D and image domains. To showcase the scalability, we empower our method with a pre-trained StyleGAN and apply it to high-res AFHQ $512\times 512$ unpaired I2I translation.
For simplicity, we choose simple short- and long-run EBMs as a backbone of our Energy-guided Entropic OT approach, leaving the application of more sophisticated EBMs for future research. Our code is available at: https://github.com/PetrMokrov/Energy-guided-Entropic-OT	翻訳日:2024-03-20 06:38:27 公開日:2024-03-18
# BackCache: キャッシュラインの排除によるコンテントベースのキャッシュタイムアタックの軽減 BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions ( http://arxiv.org/abs/2304.10268v4 ) ライセンス: Link先を確認	Quancheng Wang, Xige Zhang, Han Wang, Yuzhe Gu, Ming Tang,	(参考訳) キャッシュはCPUとメモリ間の速度差を減らし、最新のプロセッサの性能を改善するために使われる。しかし、攻撃者は競合ベースのキャッシュタイミング攻撃を使用して、慎重に設計されたキャッシュ消去セットを通じて被害者プロセスから機密情報を盗むことができる。また、L1データキャッシュ攻撃は広く利用されており、プライバシーと機密性の重大な脅威となる。既存のハードウェアベースの対策は、主にキャッシュパーティショニング、ランダム化、キャッシュラインのフラッシングに重点を置いている。本稿では、キャッシュミスではなくキャッシュヒットを常に達成し、L1データキャッシュに対する競合ベースのキャッシュタイミング攻撃を緩和する、新しいハードウェア・ソフトウェアの共同設計であるBackCacheを提案する。 BackCacheは、解放されたキャッシュラインをL1データキャッシュから完全に連想的なバックアップキャッシュに配置して、排除を隠蔽する。 BackCacheのセキュリティを改善するために,ランダムに使用される代用ポリシー(RURP)と動的バックアップキャッシュリサイズ機構を導入する。 BackCacheの有効性を示すための理論的セキュリティ分析も提示する。 gem5シミュレータによる評価では,OSカーネル,シングルスレッド,マルチスレッドのベンチマークにおいて,BackCacheはパフォーマンスを1.33%,7.34%,7.59%低下させることができる。 Caches are used to reduce the speed differential between the CPU and memory to improve the performance of modern processors. However, attackers can use contention-based cache timing attacks to steal sensitive information from victim processes through carefully designed cache eviction sets. And L1 data cache attacks are widely exploited and pose a significant privacy and confidentiality threat. Existing hardware-based countermeasures mainly focus on cache partitioning, randomization, and cache line flushing, which unfortunately either incur high overhead or can be circumvented by sophisticated attacks. In this paper, we propose a novel hardware-software co-design called BackCache with the idea of always achieving cache hits instead of cache misses to mitigate contention-based cache timing attacks on the L1 data cache. BackCache places the evicted cache lines from the L1 data cache into a fully-associative backup cache to hide the evictions. To improve the security of BackCache, we introduce a randomly used replacement policy (RURP) and a dynamic backup cache resizing mechanism. 
We also present a theoretical security analysis to demonstrate the effectiveness of BackCache. Our evaluation on the gem5 simulator shows that BackCache can degrade the performance by 1.33%, 7.34%, and 7.59% for OS kernel, single-thread, and multi-thread benchmarks, respectively.	翻訳日:2024-03-20 06:28:31 公開日:2024-03-18
# Tram: ソースコード要約のためのトークンレベルの検索強化メカニズム Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization ( http://arxiv.org/abs/2305.11074v2 ) ライセンス: Link先を確認	Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang,	(参考訳) プログラムの機能を記述するヒューマン可読テキストの自動生成は、ソースコードの要約の意図である。この分野では、ニューラルネットワークモデルは大きなパフォーマンスを達成するが、外部知識にアクセスできないため制限されている。この制限に対処するために、新たなトレンドは、ニューラルネットワークと外部知識を検索方法で組み合わせることである。従来はエンコーダ側の文レベルの検索パラダイムに頼っていた。しかし、このパラダイムは粗く、ノイズが充満しており、デコーダ側の高品質なサマリトークンを直接利用できない。本稿では、エンコーダ側ではなくデコーダ側で、より微細なトークンレベルの検索強化機構(Tram)を提案し、ニューラルネットワークの性能を高め、要約を生成する際により低周波のトークンを生成する。さらに、文脈的コード意味論の獲得におけるトークンレベルの検索の課題を克服するために、コード意味論を個々の要約トークンに統合することを提案する。広範囲な実験と人的評価の結果,トークンレベルの検索強化アプローチにより,性能が大幅に向上し,解釈性も向上した。 Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although neural language models achieve significant performance in this field, they are limited by their inability to access external knowledge. To address this limitation, an emerging trend is combining neural models with external knowledge through retrieval methods. Previous methods have relied on the sentence-level retrieval paradigm on the encoder side. However, this paradigm is coarse-grained, noise-filled and cannot directly take advantage of the high-quality retrieved summary tokens on the decoder side. In this paper, we propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side rather than the encoder side to enhance the performance of neural models and produce more low-frequency tokens in generating summaries. Furthermore, to overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens. The results of extensive experiments and human evaluation show that our token-level retrieval-augmented approach significantly improves performance and is more interpretable.	
翻訳日:2024-03-20 06:28:31 公開日:2024-03-18
# LLM-CXR:CXR画像理解・生成のための命令型LCM LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation ( http://arxiv.org/abs/2305.11490v5 ) ライセンス: Link先を確認	Suhyeon Lee, Won Jun Kim, Jinho Chang, Jong Chul Ye,	(参考訳) LLMの印象的な発展に続いて、マルチモーダル推論と視覚IOを可能にするために、LLMにおける視覚言語アライメントが活発に研究されている。この研究の方向性は、医用画像解析と生成が、視覚的特徴と事前知識の組み合わせに基づく推論から成り立っているため、医用画像に特に関係している。近年の多くの研究は、画像処理ネットワークとLLM間の情報ブリッジとして機能するアダプタネットワークのトレーニングに重点を置いている。これは、胸部X線(CXR)などの医用画像の理解と生成が、正確な視覚的および言語に基づく推論だけでなく、2つのモダリティ間のより親密なマッピングを必要とするため、医療領域において特に重要である。そこで本稿では, 双方向画像とテキスト生成のためのトランスフォーマとVQ-GANの組み合わせに関する以前の研究から着想を得て, テキストのみに事前学習したLLMを指導し, 医用画像の視覚言語能力を得る手法を開発した。具体的には、事前学習されたLLMの既存の質問回答と指示追従能力を活用して、画像入力に関する質問に答えるよう指示し、左右対称に、画像ベースのテキスト生成とテキストベースの画像生成を含む多様なタスクでLLMをチューニングすることにより、所定のクエリに適したテキストと画像応答を出力する。提案手法で学習したLLM-CXRモデルでは,CXR理解タスクと生成タスクの両方において画像テキストのアライメントが向上する一方で,従来開発されたより狭い範囲のタスクを実行するモデルに比べて小型であることを示す。コードはhttps://github.com/hyn2028/llm-cxrにある。 Following the impressive development of LLMs, vision-language alignment in LLMs is actively being researched to enable multimodal reasoning and visual IO. This direction of research is particularly relevant to medical imaging because medical image analysis and generation consist of reasoning based on a combination of visual features and prior knowledge. Many recent works have focused on training adapter networks that serve as an information bridge between image processing networks and LLMs; but presumably, in order to achieve maximum reasoning potential of LLMs on visual information as well, visual and language features should be allowed to interact more freely. This is especially important in the medical domain because understanding and generating medical images such as chest X-rays (CXR) require not only accurate visual and language-based reasoning but also a more intimate mapping between the two modalities. 
Thus, taking inspiration from previous work on the transformer and VQ-GAN combination for bidirectional image and text generation, we build upon this approach and develop a method for instruction-tuning an LLM pre-trained only on text to gain vision-language capabilities for medical images. Specifically, we leverage a pretrained LLM's existing question-answering and instruction-following abilities to teach it to understand visual inputs by instructing it to answer questions about image inputs and, symmetrically, output both text and image responses appropriate to a given query by tuning the LLM with diverse tasks that encompass image-based text-generation and text-based image-generation. We show that our model, LLM-CXR, trained in this approach shows better image-text alignment in both CXR understanding and generation tasks while being smaller in size compared to previously developed models that perform a narrower range of tasks. The code is at https://github.com/hyn2028/llm-cxr.	翻訳日:2024-03-20 06:28:31 公開日:2024-03-18
# 有限データを用いた擬似生成モデルの学習のための位相データ拡張 Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data ( http://arxiv.org/abs/2305.12681v2 ) ライセンス: Link先を確認	Yuta Mimura,	(参考訳) 生成モデルは、現実的なイメージの作成に優れていますが、トレーニングのための広範なデータセットへの依存は、特にデータ収集がコストがかかる、あるいは難しい領域において、大きな課題を示します。現在のデータ効率の手法はGANアーキテクチャに重点を置いており、他の生成モデルの訓練にギャップを残している。本研究は,データ分散の変化を伴わずに,限られたデータシナリオでのトレーニングを最適化することで,このギャップに対処する新しい手法として「フェーズドデータ拡張」を紹介した。本手法は,学習段階を通じて拡張強度を制限することにより,限られたデータから学習するモデルの能力を高め,忠実性を維持する。提案手法は,PixelCNNとVQ-VAE-2を統合したモデルに適用し,様々なデータセットにおける定量評価と定性評価の両方において優れた性能を示す。これは、可能性に基づくモデルの効率的なトレーニングにおいて重要な一歩であり、GANだけでなく、データ拡張技術の有用性も拡張している。 Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges, especially in domains where data collection is costly or challenging. Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models. Our study introduces "phased data augmentation" as a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution. By limiting the augmentation intensity throughout the learning phases, our method enhances the model's ability to learn from limited data, thus maintaining fidelity. Applied to a model integrating PixelCNNs with VQ-VAE-2, our approach demonstrates superior performance in both quantitative and qualitative evaluations across diverse datasets. This represents an important step forward in the efficient training of likelihood-based models, extending the usefulness of data augmentation techniques beyond just GANs.	翻訳日:2024-03-20 04:32:24 公開日:2024-03-18
# 凸結合によるロバスト性検証のための表現的損失 Expressive Losses for Verified Robustness via Convex Combinations ( http://arxiv.org/abs/2305.13991v3 ) ライセンス: Link先を確認	Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio Lomuscio,	(参考訳) 検証された対向ロバスト性のためにネットワークをトレーニングするためには、摂動領域に対する最悪の損失を過度に近似することが一般的であり、その結果、標準的な性能を犠牲にして検証可能なネットワークが得られる。最近の研究で示されているように、敵のトレーニングと過剰近似を慎重に結合することで、精度と堅牢性の間のトレードオフをより良く得ることができる。損失関数の表現性は,下界と上界のトレードオフの範囲を1つのパラメータ(オーバー近似係数)を通して最悪の場合の損失に拡大する能力として形式化され,最先端の性能を達成するための鍵となる。本仮説を裏付けるために,敵攻撃とIPP境界の凸結合により得られた自明な表現的損失は,その概念的単純さにもかかわらず,様々な状況において最先端の結果をもたらすことを示す。本稿では, 過近似係数と異なる表現的損失に対する性能プロファイルの関係を詳細に解析し, 表現性は不可欠であるが, 最悪の場合の損失のより優れた近似は, 必ずしも優れた強靭性-精度トレードオフに関係しないことを示した。 In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.	翻訳日:2024-03-20 04:32:24 公開日:2024-03-18
# Learning t-doped stabilizer states ( http://arxiv.org/abs/2305.15398v4 ) License: see link	Lorenzo Leone, Salvatore F. E. Oliviero, Alioscia Hamma	In this paper, we present a learning algorithm for states obtained from computational basis states by Clifford circuits doped with a finite number $t$ of $T$-gates. The algorithm learns an exact tomographic description of $t$-doped stabilizer states in terms of Pauli observables. This is possible because such states are countable and form a discrete set. To tackle the problem, we introduce a novel algebraic framework for $t$-doped stabilizer states, which extends beyond $T$-gates and includes doping with any kind of local non-Clifford gate. The algorithm requires resources of complexity $\text{poly}(n,2^t)$ and exhibits an exponentially small probability of failure.	Translated: 2024-03-20 04:32:24 Published: 2024-03-18
# Diffeomorphic Mesh Deformation via Efficient Optimal Transport for Cortical Surface Reconstruction ( http://arxiv.org/abs/2305.17555v3 ) License: see link	Tung Le, Khai Nguyen, Shanlin Sun, Kun Han, Nhat Ho, Xiaohui Xie	Mesh deformation plays a pivotal role in many 3D vision tasks including dynamic simulations, rendering, and reconstruction. However, defining an efficient discrepancy between predicted and target meshes remains an open problem. A prevalent approach in current deep learning is the set-based approach, which measures the discrepancy between two surfaces by comparing two randomly sampled point-clouds from the two meshes using the Chamfer pseudo-distance. Nevertheless, the set-based approach still has limitations, such as the lack of a theoretical guarantee for choosing the number of points in sampled point-clouds, and the pseudo-metricity and quadratic complexity of the Chamfer divergence. To address these issues, we propose a novel metric for learning mesh deformation. The metric is defined by the sliced Wasserstein distance on meshes represented as probability measures that generalize the set-based approach.
By leveraging probability measure space, we gain flexibility in encoding meshes using diverse forms of probability measures, such as continuous, empirical, and discrete measures via varifold representation. After encoding meshes as probability measures, we can compare them using the sliced Wasserstein distance, which is an effective optimal transport distance with linear computational complexity and can provide a fast statistical rate for approximating the surface of meshes. To this end, we employ a neural ordinary differential equation (ODE) to deform the input surface into the target shape by modeling the trajectories of the points on the surface. Our experiments on cortical surface reconstruction demonstrate that our approach surpasses other competing methods in multiple datasets and metrics.	Translated: 2024-03-20 04:32:24 Published: 2024-03-18
# Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo ( http://arxiv.org/abs/2305.18246v2 ) License: see link	Haque Ishfaq, Qingfeng Lan, Pan Xu, A. Rupam Mahmood, Doina Precup, Anima Anandkumar, Kamyar Azizzadenesheli	We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distribution, which is not a good surrogate in most practical settings. We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo, an efficient type of Markov Chain Monte Carlo (MCMC) method. Our method only needs to perform noisy gradient descent updates to learn the exact posterior distribution of the Q function, which makes our approach easy to deploy in deep RL. We provide a rigorous theoretical analysis for the proposed method and demonstrate that, in the linear Markov decision process (linear MDP) setting, it has a regret bound of $\tilde{O}(d^{3/2}H^{3/2}\sqrt{T})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $T$ is the total number of steps. We apply this approach to deep RL by using the Adam optimizer to perform gradient updates.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.	Translated: 2024-03-20 04:32:24 Published: 2024-03-18
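The "noisy gradient descent updates" mentioned in the entry above can be sketched with the textbook Langevin Monte Carlo step, which adds Gaussian noise scaled by the step size and an inverse temperature to a plain gradient step. This is a minimal sketch under that standard formulation; the function name `langevin_step` and the parameter names `step` and `inv_temp` are assumptions for illustration, and the paper's algorithm (including its Adam-based variant) may differ in detail.

```python
import math
import random
from typing import List


def langevin_step(w: List[float], grad: List[float], step: float,
                  inv_temp: float, rng: random.Random) -> List[float]:
    """One Langevin Monte Carlo update on a parameter vector:

        w <- w - step * grad + sqrt(2 * step / inv_temp) * xi,  xi ~ N(0, I)

    `grad` is the gradient of the loss at `w`. As inv_temp grows, the
    noise vanishes and the update reduces to plain gradient descent.
    """
    noise_scale = math.sqrt(2.0 * step / inv_temp)
    return [wi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
            for wi, gi in zip(w, grad)]
```

Iterating this step on a loss whose gradient is supplied by the caller produces approximate samples from a distribution proportional to `exp(-inv_temp * loss)`, which is how a posterior over Q-function parameters could be sampled without a Gaussian approximation.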
# On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing ( http://arxiv.org/abs/2306.05524v2 ) License: see link	Zeyan Liu, Zijun Yao, Fengjun Li, Bo Luo	With ChatGPT under the spotlight, utilizing large language models (LLMs) to assist academic writing has drawn a significant amount of debate in the community. In this paper, we aim to present a comprehensive study of the detectability of ChatGPT-generated content within the academic literature, particularly focusing on the abstracts of scientific papers, to offer holistic support for the future development of LLM applications and policies in academia. Specifically, we first present GPABench2, a benchmarking dataset of over 2.8 million comparative samples of human-written, GPT-written, GPT-completed, and GPT-polished abstracts of scientific writing in computer science, physics, and humanities and social sciences. Second, we explore the methodology for detecting ChatGPT content. We start by examining the unsatisfactory performance of existing ChatGPT detection tools and the challenges faced by human evaluators (including more than 240 researchers or students).
We then test hand-crafted linguistic feature models as a baseline and develop a deep neural framework named CheckGPT to better capture the subtle and deep semantic and linguistic patterns in ChatGPT-written literature. Last, we conduct comprehensive experiments to validate the proposed CheckGPT framework in each benchmarking task over different disciplines. To evaluate the detectability of ChatGPT content, we conduct extensive experiments on the transferability, prompt engineering, and robustness of CheckGPT.	Translated: 2024-03-20 04:32:24 Published: 2024-03-18
# Open Brain AI. Automatic Language Assessment ( http://arxiv.org/abs/2306.06693v2 ) License: see link	Charalambos Themistocleous	Language assessment plays a crucial role in diagnosing and treating individuals with speech, language, and communication disorders caused by neurogenic conditions, whether developmental or acquired. However, current assessment methods are manual, laborious, and time-consuming to administer and score, causing additional patient stress. To address these challenges, we developed Open Brain AI (https://openbrainai.com). This computational platform harnesses innovative AI techniques, namely machine learning, natural language processing, large language models, and automatic speech-to-text transcription, to automatically analyze multilingual spoken and written speech productions. This paper discusses the development of Open Brain AI, the AI language processing modules, and the linguistic measurements of discourse macro-structure and micro-structure. The fast and automatic analysis of language alleviates the burden on clinicians, enabling them to streamline their workflow and allocate more time and resources to direct patient care. Open Brain AI is freely accessible, empowering clinicians to conduct critical data analyses and give more attention and resources to other critical aspects of therapy and treatment.	Translated: 2024-03-20 04:22:24 Published: 2024-03-18
# MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators ( http://arxiv.org/abs/2306.10900v2 ) License: see link	Yaqi Zhang, Di Huang, Bin Liu, Shixiang Tang, Yan Lu, Lu Chen, Lei Bai, Qi Chu, Nenghai Yu, Wanli Ouyang	Generating realistic human motion from given action descriptions has experienced significant advancements because of the emerging requirement of digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of the control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, for generating consecutive human motions by treating multimodal signals as special input tokens in large language models (LLMs). Specifically, we first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction to ask the LLMs to generate the motion answer. Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters. To the best of our knowledge, MotionGPT is the first method to generate human motion by multimodal control signals, which we hope can shed light on this new direction.
Visit our webpage at https://qiqiapink.github.io/MotionGPT/.	Translated: 2024-03-20 04:22:24 Published: 2024-03-18
# Pluggable Neural Machine Translation Models via Memory-augmented Adapters ( http://arxiv.org/abs/2307.06029v2 ) License: see link	Yuzhuang Xu, Shuo Wang, Peng Li, Xuebo Liu, Xiaolong Wang, Weidong Liu, Yang Liu	Although neural machine translation (NMT) models perform well in the general domain, it remains rather challenging to control their generation behavior to satisfy the requirements of different users. Given the expensive training cost and the data scarcity challenge of learning a new model from scratch for each user requirement, we propose a memory-augmented adapter to steer pretrained NMT models in a pluggable manner. Specifically, we construct a multi-granular memory based on the user-provided text samples and propose a new adapter architecture to combine the model representations and the retrieved results. We also propose a training strategy using memory dropout to reduce spurious dependencies between the NMT model and the memory. We validate our approach on both style- and domain-specific experiments, and the results indicate that our method can outperform several representative pluggable baselines.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
# In-context Autoencoder for Context Compression in a Large Language Model ( http://arxiv.org/abs/2307.06945v3 ) License: see link	Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei	We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, it is fine-tuned on instruction data for producing desirable responses to various prompts. Experiments demonstrate that our lightweight ICAE, introducing about 1% additional parameters, effectively achieves $4\times$ context compression based on Llama, offering advantages in both improved latency and GPU memory cost during inference, and offering interesting insights into memorization as well as potential for scalability. These promising results imply a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications in addressing the long context problem and suggesting further research in LLM context management. Our data, code and models are available at https://github.com/getao/icae.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
# Neural Architecture Retrieval ( http://arxiv.org/abs/2307.07919v2 ) License: see link	Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu	With the increasing number of new neural architecture designs and substantial existing neural architectures, it becomes difficult for researchers to situate their contributions compared with existing neural architectures or establish the connections between their designs and other relevant ones. To discover similar neural architectures in an efficient and automatic manner, we define a new problem, Neural Architecture Retrieval, which retrieves a set of existing neural architectures that have similar designs to the query neural architecture. Existing graph pre-training strategies cannot address the computational graph in neural architectures due to the graph size and motifs. To tackle these issues, we propose to divide the graph into motifs, which are used to rebuild the macro graph, and introduce multi-level contrastive learning to achieve accurate graph representation learning. Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of our algorithm. A dataset containing 12k real-world network architectures, together with their embeddings, is built for neural architecture retrieval.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
# Properties of Discrete Sliced Wasserstein Losses ( http://arxiv.org/abs/2307.10352v3 ) License: see link	Eloi Tanguy, Rémi Flamary, Julie Delon	The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence.
Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
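To make the Monte-Carlo approximation $\mathcal{E}_p$ in the entry above concrete, the sketch below estimates $\mathrm{SW}_2^2$ between two uniform discrete measures with the same number of points: it draws $p$ random directions, projects both supports onto each direction, sorts the projections, and averages the squared differences. It relies on the standard fact that the one-dimensional 2-Wasserstein distance between equal-size uniform measures pairs sorted points. The function name `sliced_wasserstein_sq` and its signature are assumptions for illustration, not code from the paper.

```python
import math
import random
from typing import Sequence


def sliced_wasserstein_sq(Y: Sequence[Sequence[float]],
                          Z: Sequence[Sequence[float]],
                          p: int, rng: random.Random) -> float:
    """Monte-Carlo estimate of SW_2^2 between the uniform discrete measures
    supported on the rows of Y and Z (n points each, dimension d),
    using p random projection directions."""
    n, d = len(Y), len(Y[0])
    if len(Z) != n:
        raise ValueError("Y and Z must contain the same number of points")
    total = 0.0
    for _ in range(p):
        # Draw a direction uniformly on the unit sphere by normalising a Gaussian.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        theta = [x / norm for x in g]
        # Project both point sets onto the direction and sort the projections.
        proj_y = sorted(sum(t * c for t, c in zip(theta, pt)) for pt in Y)
        proj_z = sorted(sum(t * c for t, c in zip(theta, pt)) for pt in Z)
        # 1D W_2^2 between equal-size uniform measures: match sorted points.
        total += sum((a - b) ** 2 for a, b in zip(proj_y, proj_z)) / n
    return total / p
```

Viewed as a function of `Y` with `Z` and the sampled directions held fixed, this quantity is the kind of energy that a stochastic gradient method would minimise; the sorting step is what makes it piecewise smooth rather than smooth.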
# Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses ( http://arxiv.org/abs/2307.11714v3 ) License: see link	Eloi Tanguy	Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
# Online learning in bandits with predicted context ( http://arxiv.org/abs/2307.13916v3 ) License: see link	Yongyi Guo, Ziping Xu, Susan Murphy	We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-vanishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations. We further demonstrate the benefits of the proposed approach in simulation environments based on synthetic and real digital intervention datasets.	Translated: 2024-03-20 04:12:33 Published: 2024-03-18
# A Unifying Generator Loss Function for Generative Adversarial Networks ( http://arxiv.org/abs/2308.07233v3 ) License: see link	Justin Veiner, Fady Alajaji, Bahman Gharesifard	A unifying $\alpha$-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN), which uses a canonical (or classical) discriminator loss function such as the one in the original GAN (VanillaGAN) system. The generator loss function is based on a symmetric class probability estimation type function, $\mathcal{L}_\alpha$, and the resulting GAN system is termed $\mathcal{L}_\alpha$-GAN. Under an optimal discriminator, it is shown that the generator's optimization problem consists of minimizing a Jensen-$f_\alpha$-divergence, a natural generalization of the Jensen-Shannon divergence, where $f_\alpha$ is a convex function expressed in terms of the loss function $\mathcal{L}_\alpha$. It is also demonstrated that this $\mathcal{L}_\alpha$-GAN problem recovers as special cases a number of GAN problems in the literature, including VanillaGAN, Least Squares GAN (LSGAN), Least $k$th order GAN (L$k$GAN) and the recently introduced $(\alpha_D,\alpha_G)$-GAN with $\alpha_D=1$.
Finally, experiments are conducted on three datasets, MNIST, CIFAR-10, and Stacked MNIST, to illustrate the performance of various examples of the $\mathcal{L}_\alpha$-GAN system.	Translated: 2024-03-20 04:02:28 Published: 2024-03-18
# OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models ( http://arxiv.org/abs/2308.13137v3 ) License: see link	Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo	Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLMs, they hand-craft quantization parameters, leading to low performance, especially in extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (\textbf{OmniQuant}) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters.
OmniQuant comprises two innovative components: Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LWC modulates the extreme values of weights by optimizing the clipping threshold. Meanwhile, LET tackles activation outliers by shifting the challenge of quantization from activations to weights. Operating within a differentiable framework using block-wise error minimization, OmniQuant can optimize the quantization process efficiently for both weight-only and weight-activation quantization. For instance, the LLaMA-2 model family (7B-70B) can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples. Extensive experiments validate OmniQuant's superior performance across diverse quantization configurations such as W4A4 (4-bit weight, 4-bit activation), W6A6, W4A16, W3A16, and W2A16. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices. Code is available at \url{https://github.com/OpenGVLab/OmniQuant}.	Translated: 2024-03-20 04:02:28 Published: 2024-03-18
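The idea of modulating extreme weight values through a clipping threshold, as described in the entry above, can be illustrated with plain uniform quantization over a shrunken range. The sketch below is an assumption-laden illustration rather than the OmniQuant implementation: the function name `quantize_with_clipping` and the factors `gamma` and `beta` (scaling the maximum and minimum weight) are hypothetical, the factors are fixed by hand here rather than learned, and no block-wise error minimization is performed.

```python
from typing import List


def quantize_with_clipping(weights: List[float], n_bits: int,
                           gamma: float = 1.0, beta: float = 1.0) -> List[float]:
    """Uniformly quantise `weights` to n_bits after shrinking the range to
    [beta * min(w), gamma * max(w)], then return the dequantised values.

    gamma and beta stand in for clipping strengths; values below 1 clip
    outlier weights so the remaining ones get a finer grid.
    """
    levels = 2 ** n_bits - 1
    hi, lo = gamma * max(weights), beta * min(weights)
    step = (hi - lo) / levels
    if step <= 0.0:
        raise ValueError("clipped range must have positive width")
    zero_point = -round(lo / step)
    out = []
    for w in weights:
        q = round(w / step) + zero_point
        q = min(max(q, 0), levels)  # clamp to the integer grid [0, levels]
        out.append((q - zero_point) * step)
    return out
```

For example, with weights `[-1.0, -0.2, 0.3, 2.0]`, 2 bits, and both factors set to 0.5, the two outliers are clipped to the ends of the range `[-0.5, 1.0]` while the inner weights land on a grid with spacing 0.5. In a learned setting, the two factors would be tuned to minimise reconstruction error instead of being chosen by hand.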
# DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields ( http://arxiv.org/abs/2309.08927v2 ) License: see link	Nicolas Schischka, Hannah Schieber, Mert Asim Karaoglu, Melih Görgülü, Florian Grötzner, Alexander Ladikos, Daniel Roth, Nassir Navab, Benjamin Busam	The accurate reconstruction of dynamic scenes with neural radiance fields is significantly dependent on the estimation of camera poses. Widely used structure-from-motion pipelines encounter difficulties in accurately tracking the camera trajectory when faced with separate dynamics of the scene content and the camera movement. To address this challenge, we propose DynaMoN. DynaMoN utilizes semantic segmentation and generic motion masks to handle dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis. Our novel iterative learning scheme switches between training the NeRF and updating the pose parameters for an improved reconstruction and trajectory estimation quality. The proposed pipeline shows significant acceleration of the training process. We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D and the BONN RGB-D Dynamic dataset. DynaMoN improves over the state-of-the-art both in terms of reconstruction quality and trajectory accuracy. We plan to make our code public to enhance research in this area.	Translated: 2024-03-20 04:02:28 Published: 2024-03-18
# Choice-75: A Dataset on Decision Branching in Script Learning ( http://arxiv.org/abs/2309.11737v2 ) License: see link	Zhaoyi Joey Hou, Li Zhang, Chris Callison-Burch	Script learning studies how stereotypical events unfold, enabling machines to reason about narratives with implicit information. Previous works mostly consider a script as a linear sequence of events while ignoring the potential branches that arise due to people's circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given descriptive scenarios, containing 75 scripts and more than 600 scenarios. We also present preliminary results with current large language models (LLMs). Although they demonstrate overall decent performance, there is still notable headroom in hard scenarios.	Translated: 2024-03-20 03:52:43 Published: 2024-03-18
# Effective versus Floquet theory for the Kerr parametric oscillator ( http://arxiv.org/abs/2309.12516v3 ) License: see link	Ignacio García-Mata, Rodrigo G. Cortiñas, Xu Xiao, Jorge Chávez-Carlos, Victor S. Batista, Lea F. Santos, Diego A. Wisniacki	Parametric gates and processes engineered from the perspective of the static effective Hamiltonian of a driven system are central to quantum technology. However, the perturbative expansions used to derive static effective models may not be able to efficiently capture all the relevant physics of the original system. In this work, we investigate the conditions for the validity of the usual low-order static effective Hamiltonian used to describe a Kerr oscillator under a squeezing drive. This system is of fundamental and technological interest. In particular, it has been used to stabilize Schrödinger cat states, which have applications for quantum computing. We compare the states and energies of the effective static Hamiltonian with the exact Floquet states and quasi-energies of the driven system and determine the parameter regime where the two descriptions agree. Our work brings to light the physics that is left out by ordinary static effective treatments and that can be explored by state-of-the-art experiments.	Translated: 2024-03-20 03:52:43 Published: 2024-03-18
# 本当に負例か? 複数の信頼できるビデオプール上での自然言語ビデオローカライゼーション性能の評価 Is it Really Negative? Evaluating Natural Language Video Localization Performance on Multiple Reliable Videos Pool ( http://arxiv.org/abs/2309.16701v2 ) ライセンス: Link先を確認	Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung,	(参考訳) 近年のマルチメディアコンテンツの普及に伴い、複数のビデオから与えられた自然言語クエリにマッチするビデオモーメントの検出を目的としたビデオコーパスモーメント検索(VCMR)が重要な問題となっている。しかし、既存のVCMR研究は、特定のクエリとペアになっていないビデオをすべて負例とみなしており、負のビデオセットを構築する際に偽陰性が含まれる可能性を無視しているため、大きな制限がある。本稿では,正と負のビデオを誤って区別する可能性を軽減しつつ,巨大なビデオ集合内でビデオフレームをローカライズすることを目的としたMVMR(Massive Videos Moment Retrieval)タスクを提案する。このタスクのために,既存のビデオモーメント検索データセットにテキストおよび視覚的セマンティックマッチング評価手法を適用する自動データセット構築フレームワークを提案し,3つのMVMRデータセットを導入する。さらに,信頼性が高く情報量の多い負例を選択的に識別し,MVMRタスク上でのモデルの堅牢性を向上する,交差方向のコントラスト学習を用いた強力な手法CroCsを提案する。導入したデータセットでの実験の結果,既存のビデオモーメント検索モデルは負の映像フレームによって容易に邪魔されるが,本モデルは顕著な性能を示した。 With the explosion of multimedia content in recent years, Video Corpus Moment Retrieval (VCMR), which aims to detect a video moment that matches a given natural language query from multiple videos, has become a critical problem. However, existing VCMR studies have a significant limitation since they have regarded all videos not paired with a specific query as negative, neglecting the possibility of including false negatives when constructing the negative video set. In this paper, we propose an MVMR (Massive Videos Moment Retrieval) task that aims to localize video frames within a massive video set, mitigating the possibility of falsely distinguishing positive and negative videos. For this task, we suggest an automatic dataset construction framework by employing textual and visual semantic matching evaluation methods on the existing video moment search datasets and introduce three MVMR datasets. To solve the MVMR task, we further propose a strong method, CroCs, which employs cross-directional contrastive learning that selectively identifies the reliable and informative negatives, enhancing the robustness of a model on the MVMR task. 
Experimental results on the introduced datasets reveal that existing video moment search models are easily distracted by negative video frames, whereas our model shows significant performance.	翻訳日:2024-03-20 03:52:43 公開日:2024-03-18
# ランダム化平滑化のためのリプシッツ-分散-マージントレードオフ The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing ( http://arxiv.org/abs/2309.16883v4 ) ライセンス: Link先を確認	Blaise Delattre, Alexandre Araujo, Quentin Barthélemy, Alexandre Allauzen,	(参考訳) ディープニューラルネットワークの現実的な応用は、ノイズの多い入力や敵対的攻撃に直面した場合、その不安定な予測によって妨げられる。この文脈における認証半径は、モデルの堅牢性の重要な指標である。しかし、認証半径を伴う効率的な分類器を設計するにはどうすればよいのか? ランダム化平滑化は、滑らかでロバストな分類器を得るために、入力へのノイズ注入に頼ることで、有望なフレームワークを提供する。本稿ではまず,ランダム化平滑化の推定手続きにおけるモンテカルロサンプリングによって生じる分散が,分類器の他の2つの重要な性質であるリプシッツ定数とマージンと密接に相互作用することを示す。より正確には、我々の研究は、平滑化された分類器と経験的分散の両方に対する基底分類器のリプシッツ定数の二重の影響を強調している。認証されたロバスト半径を増やすために,基底分類器のロジットを確率ベクトルに変換する別の方法を導入し,分散-マージントレードオフを活用する。我々は、ランダム化平滑化のために強化されたリプシッツ境界とともに、ベルンシュタインの集中不等式を利用する。実験の結果,現在の最先端手法と比較して認証精度が著しく向上した。新たな認証手法により、事前学習モデルをランダム化平滑化とともに使用することが可能となり、ゼロショット方式で現在の認証半径を効果的に改善できる。 Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However, how can one design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, i.e., its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. To increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. 
Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.	翻訳日:2024-03-20 03:52:43 公開日:2024-03-18
# 制約のない確率的CCA:マルチビューと自己教師あり学習の統合 Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning ( http://arxiv.org/abs/2310.01012v3 ) ライセンス: Link先を確認	James Chapman, Lennie Wells, Ana Lawry Aguila,	(参考訳) 正準相関分析(CCA)の手法群は多視点学習の基礎となる。正則化線形CCA法は部分最小二乗法 (PLS) を一般化するものとみなすことができ、一般化固有値問題 (GEP) フレームワークによって統一できる。しかし、これらの線形手法の古典的アルゴリズムは大規模データに対して計算上実行不可能である。 Deep CCAへの拡張は有望だが、現在のトレーニング手順は遅く、複雑である。まず、GEPの上位部分空間を特徴付ける、制約のない新しい目的関数を提案する。我々のコアコントリビューションは、確率的PLS、確率的CCA、Deep CCAのための高速アルゴリズムのファミリーであり、対応するCCAの目的関数に確率的勾配降下法(SGD)を適用するだけで得られる。我々のアルゴリズムは、すべての標準CCAおよびDeep CCAベンチマークにおいて、従来の最先端手法よりもはるかに高速な収束を示し、より高い相関を回復する。これらの改善により、33,000人以上の個人と50万の特徴量からなる英国バイオバンクの非常に大きなバイオメディカルデータセットに対して、この種のものとしては初となるPLS分析を行うことができる。最後に,CIFAR-10 と CIFAR-100 において,最小限のハイパーパラメータチューニングで 'CCA-family' Self-Supervised Learning (SSL) 手法に匹敵する性能を達成し,これらの手法と古典的な CCA との関係を明らかにするための理論も示す。 The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. 
Finally, we apply our algorithms to match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.	翻訳日:2024-03-20 03:42:41 公開日:2024-03-18
# 複数の音声言語に対する音声発話ペアを用いたゼロリソース符号切替音声ベンチマーク Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages ( http://arxiv.org/abs/2310.03018v3 ) ライセンス: Link先を確認	Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-yi Lee,	(参考訳) 自己教師型音声エンコーダの符号切替機能を直接評価するゼロリソース符号切替型音声ベンチマークを提案する。本稿では,音声エンコーダのコードスイッチング能力がゼロリソース方式でどのように評価できるかを示すために,離散単位に基づく言語モデリングのベースラインシステムを紹介する。我々の実験は、Wav2vec 2.0、HuBERT、XLSRなど、よく知られた音声エンコーダを含む。事前学習言語とモデルサイズがベンチマーク性能に与える影響について検討する。特に,XLSRで実証した多言語事前学習による音声エンコーダは,コードスイッチングシナリオにおける単言語変種(Wav2vec 2.0, HuBERT)よりも優れているが,コードスイッチング言語能力の改善の余地は十分にある。 We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, XLSR, etc. We examine the impact of pre-training languages and model size on benchmark performance. Notably, though our results demonstrate that speech encoders with multilingual pre-training, exemplified by XLSR, outperform monolingual variants (Wav2vec 2.0, HuBERT) in code-switching scenarios, there is still substantial room for improvement in their code-switching linguistic abilities.	翻訳日:2024-03-20 03:42:41 公開日:2024-03-18
# 点雲発生による近面サンプリングによるニューラルラジアンス場の改善 Improving Neural Radiance Field using Near-Surface Sampling with Point Cloud Generation ( http://arxiv.org/abs/2310.04152v2 ) ライセンス: Link先を確認	Hye Bin Yoo, Hyun Min Han, Sung Soo Hwang, Il Yong Chun,	(参考訳) ニューラル放射場(NeRF)は、3次元の3次元空間における点を抽出し、その存在と色確率を推定する新しいビュー合成法である。 NeRFの欠点は、多くの3Dポイントをサンプリングするため、長い訓練時間を必要とすることである。さらに、1つのサンプルが隠蔽領域から、あるいは物体が存在しないような空間へ向けられた場合、NeRFのレンダリング品質を劣化させることができる。これらの問題は3次元シーンの形状を推定することで解決できる。本稿では,NeRFのレンダリング品質を向上させるため,表面近傍のサンプリングフレームワークを提案する。提案手法は,トレーニングセットの深度画像を用いて3次元物体の表面を推定し,その周辺でのみサンプリングを行う。新たな視点の深度情報を得るために,3次元点雲生成法と点雲から投影された深度を簡易に精錬する方法を提案する。実験結果から,提案手法は,元のNeRFと3種類の最先端NeRFと比較して,レンダリング品質を著しく向上させることができることがわかった。また,提案手法により,NeRFモデルのトレーニング時間を大幅に短縮することができる。 Neural radiance field (NeRF) is an emerging view synthesis method that samples points in a three-dimensional (3D) space and estimates their existence and color probabilities. The disadvantage of NeRF is that it requires a long training time since it samples many 3D points. In addition, if one samples points from occluded regions or in the space where an object is unlikely to exist, the rendering quality of NeRF can be degraded. These issues can be solved by estimating the geometry of 3D scene. This paper proposes a near-surface sampling framework to improve the rendering quality of NeRF. To this end, the proposed method estimates the surface of a 3D object using depth images of the training set and sampling is performed around there only. To obtain depth information on a novel view, the paper proposes a 3D point cloud generation method and a simple refining method for projected depth from a point cloud. Experimental results show that the proposed near-surface sampling NeRF framework can significantly improve the rendering quality, compared to the original NeRF and three different state-of-the-art NeRF. In addition, one can significantly accelerate the training time of a NeRF model with the proposed near-surface sampling framework.	
翻訳日:2024-03-20 03:32:38 公開日:2024-03-18
# Toolink:オープンソースモデル上でのチェーン・オブ・ソルビングによるツールキットの作成と利用の連携 Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model ( http://arxiv.org/abs/2310.05155v2 ) ライセンス: Link先を確認	Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu,	(参考訳) 大規模言語モデル(LLM)は、ツールの利用において顕著な進歩を示しているが、そのクローズドソースの性質と高い推論コストは適応性に制限を与えており、より小さなオープンソースモデルを活用する有効な方法が必要とされている。本稿では、まずツールキットを作成し、次にチェーン・オブ・ソルビング(CoS)アプローチを通じてツールの計画と呼び出しを統合することでタスク解決を行う包括的フレームワークであるToolinkを紹介する。まず、ChatGPT上で、モデルの創造性とCoS能力を活用する上でのToolinkの有効性を検証する。その後、ツール使用用に設計されたチェーン・オブ・ソルビングのデータセットであるCoS-GPTをキュレートし、LLaMA-7Bモデルを微調整する。その結果、高度なツールプランニングとツールコール機能を備えた強力なオープンソースモデルであるLLaMA-CoSが得られた。 BIG-benchの多様なタスクによる評価では、そのCoS能力はChatGPTの能力と一致し、その性能はチェーン・オブ・ソート・アプローチを上回っている。さらなる研究は、LLaMA-CoSの未知のタスクへの一般化を強調し、ターゲットタスクに明示的に合わせて作られていないツールキットを使用する能力を示し、現実のシナリオにおける堅牢性を確認している。 Large Language Models (LLMs) have demonstrated remarkable progress in utilizing tools, but their closed-source nature and high inference costs pose limitations on their adaptability, necessitating a valid method that leverages smaller, open-sourced models. In this paper, we introduce Toolink, a comprehensive framework that performs task-solving by first creating a toolkit and then integrating the planning and calling of tools through a chain-of-solving (CoS) approach. We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT. Subsequently, we curate CoS-GPT, a chain-of-solving dataset designed for tool-using, and finetune the LLaMA-7B model. It results in LLaMA-CoS, a powerful open-source model with advanced tool-planning and tool-calling capabilities. Evaluation of diverse tasks from BIG-bench demonstrates its CoS ability matches that of ChatGPT while its performance surpasses the chain-of-thought approach. 
Further studies highlight the generalization of LLaMA-CoS to unseen tasks and showcase its capability in using toolkits not explicitly tailored for the target task, affirming its robustness in real-world scenarios.	翻訳日:2024-03-20 03:32:38 公開日:2024-03-18
# 測定に基づく虚時間発展を用いたAKLT状態の効率的作成 Efficient preparation of the AKLT State with Measurement-based Imaginary Time Evolution ( http://arxiv.org/abs/2310.06031v2 ) ライセンス: Link先を確認	Tianqi Chen, Tim Byrnes,	(参考訳) 量子状態の準備は、量子シミュレーション、量子計測、量子コンピューティングなどの応用において、量子情報科学のいくつかの領域で重要な役割を果たす。しかし、一般に状態の準備には、確率的な性質やその他の理由により、問題の大きさに対して指数関数的にスケールするリソースが必要であるため、そのようなモデルの研究は困難である。本稿では,測定に基づく虚時間発展(MITE)アプローチを用いて,Affleck-Lieb-Kennedy-Tasaki (AKLT) モデルの基底状態を決定論的に作成する手法を提案する。 AKLT状態の特殊な性質を生かして,MITE法を用いて効率的に作成可能であることを示す。局所射影列の収束に基づく推定と、MITEアルゴリズムの直接的な発展に基づく推定は、AKLTサイトの数に関して一定のスケーリングを示唆しており、これは収束に関する素朴な推定に対する指数関数的な改善である。提案手法は、キュービットベースのシミュレータと互換性があり、回路再コンパイルに変分量子アルゴリズムを用いることで、MITEに必要な測定演算子は、デフォルトのQiskit法で得られるものに比べてはるかに浅い回路深さの回路で十分近似できることを示す。 Quantum state preparation plays a crucial role in several areas of quantum information science, in applications such as quantum simulation, quantum metrology and quantum computing. However, typically state preparation requires resources that scale exponentially with the problem size, due to their probabilistic nature or otherwise, making studying such models challenging. In this article, we propose a method to prepare the ground state of the Affleck-Lieb-Kennedy-Tasaki (AKLT) model deterministically using a measurement-based imaginary time evolution (MITE) approach. By taking advantage of the special properties of the AKLT state, we show that it can be prepared efficiently using the MITE approach. Estimates based on the convergence of a sequence of local projections, as well as direct evolution of the MITE algorithm suggest a constant scaling with respect to the number of AKLT sites, which is an exponential improvement over the naive estimate for convergence. We show that the procedure is compatible with qubit-based simulators, and show that using a variational quantum algorithm for circuit recompilation, the measurement operator required for MITE can be well approximated by a circuit with a much shallower circuit depth compared with the one obtained using the default Qiskit method.	翻訳日:2024-03-20 03:32:38 公開日:2024-03-18
# EC-Depth:挑戦シーンにおける自己教師付き単眼深度推定の整合性を探る EC-Depth: Exploring the consistency of self-supervised monocular depth estimation in challenging scenes ( http://arxiv.org/abs/2310.08044v2 ) ライセンス: Link先を確認	Ziyang Song, Ruijie Zhu, Chuxin Wang, Jiacheng Deng, Jianfeng He, Tianzhu Zhang,	(参考訳) 自律走行とロボット工学の分野では、自己監督された単眼深度推定が重要である。しかし、既存の手法は一般的に標準的なデータセットでトレーニングされ、テストされ、雨の日のような現実世界のアプリケーションで発生する様々な有害な状況の影響を見越す。その結果、これらの手法がこれらの難解なシナリオを扱うのに苦労していることがよく観察される。この問題に対処するため,我々は,頑健な深さ推定を実現するための新しい自己教師型2段階フレームワークであるEC-Depthを提案する。第1段階では、信頼性の高い監督を標準から挑戦的な場面に広めるために、奥行き整合正則化を提案する。第2段階では、平均教師パラダイムを採用し、新しい一貫性に基づく擬似ラベルフィルタリング戦略を提案し、擬似ラベルの品質を改善し、モデルの精度と堅牢性を改善する。提案手法は, KITTI, KITTI-C, DrivingStereo, NuScenes-Nightベンチマークにおいて, 既存の最先端手法を超越して, 高精度かつ一貫した深度予測を実現する。 Self-supervised monocular depth estimation holds significant importance in the fields of autonomous driving and robotics. However, existing methods are typically trained and tested on standard datasets, overlooking the impact of various adverse conditions prevalent in real-world applications, such as rainy days. As a result, it is commonly observed that these methods struggle to handle these challenging scenarios. To address this issue, we present EC-Depth, a novel self-supervised two-stage framework to achieve a robust depth estimation. In the first stage, we propose depth consistency regularization to propagate reliable supervision from standard to challenging scenes. In the second stage, we adopt the Mean Teacher paradigm and propose a novel consistency-based pseudo-label filtering strategy to improve the quality of pseudo-labels, further improving both the accuracy and robustness of our model. Extensive experiments demonstrate that our method achieves accurate and consistent depth predictions in both standard and challenging scenarios, surpassing existing state-of-the-art methods on KITTI, KITTI-C, DrivingStereo, and NuScenes-Night benchmarks.	翻訳日:2024-03-20 03:32:38 公開日:2024-03-18
# 単語からエクササイズからウェルネスへ:Farsi Chatbot for Self-Attachment Technique From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique ( http://arxiv.org/abs/2310.09362v2 ) ライセンス: Link先を確認	Sina Elahimanesh, Shayan Salehi, Sara Zahedi Movahed, Lisa Alazraki, Ruoyu Hu, Abbas Edalat,	(参考訳) 社会的孤立とうつ病や不安の高まりを特徴とするポストパンデミック時代以降、デジタル心理療法に基づく会話エージェントは、伝統的なセラピーセッションよりも重要な役割を担っている。本研究では,Farsiにおける音声対応型チャットボットを開発し,アタッチメント理論に基づく,新規で自己管理型,包括的心理学的手法であるセルフアタッチメント(SAT)を通じてユーザを誘導する。我々のチャットボットは,会話を通してユーザ入力を理解し,対話フローチャートをナビゲートするために,ルールベースのモジュールと分類ベースのモジュールの動的配列を使用し,ユーザの感情や精神状態に依存する適切なSAT演習を推奨する。特に、6000以上の発話のデータセットを収集し、ユーザの感情を12クラスに分類する新しい感情分析モジュールを92%以上の精度で開発する。会話の新規化とエンゲージメントを維持するために、チャットボットの応答は、Farsi GPT-2と強化学習アプローチの助けを借りて作成した大量の発話データセットから検索されるので、人間のアノテーションは最小限である。私たちのチャットボットはSAT Teacherという質問応答モジュールも提供しています。最後に,ボットのユーザインタフェースとしてクロスプラットフォームアプリケーションを設計する。チャットボットとの対話を合計2000回以上行ったN=52人のボランティアを対象に,このプラットフォームを10日間の人間実験で評価した。その結果,ほとんどのユーザ(75%),72%がインタラクションの後に気分が良くなり,74%がSAT教師のパフォーマンスに満足していたことが示唆された。 In the wake of the post-pandemic era, marked by social isolation and surging rates of depression and anxiety, conversational agents based on digital psychotherapy can play an influential role compared to traditional therapy sessions. In this work, we develop a voice-capable chatbot in Farsi to guide users through Self-Attachment (SAT), a novel, self-administered, holistic psychological technique based on attachment theory. Our chatbot uses a dynamic array of rule-based and classification-based modules to comprehend user input throughout the conversation and navigates a dialogue flowchart accordingly, recommending appropriate SAT exercises that depend on the user's emotional and mental state. In particular, we collect a dataset of over 6,000 utterances and develop a novel sentiment-analysis module that classifies user sentiment into 12 classes, with accuracy above 92%. 
To keep the conversation novel and engaging, the chatbot's responses are retrieved from a large dataset of utterances created with the aid of Farsi GPT-2 and a reinforcement learning approach, thus requiring minimal human annotation. Our chatbot also offers a question-answering module, called SAT Teacher, to answer users' questions about the principles of Self-Attachment. Finally, we design a cross-platform application as the bot's user interface. We evaluate our platform in a ten-day human study with N=52 volunteers from the non-clinical population, who have had over 2,000 dialogues in total with the chatbot. The results indicate that the platform was engaging to most users (75%), 72% felt better after the interactions, and 74% were satisfied with the SAT Teacher's performance.	翻訳日:2024-03-20 03:32:38 公開日:2024-03-18
# Bongard-OpenWorld: 現実の世界における自由形式の視覚概念のためのFew-Shot推論 Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World ( http://arxiv.org/abs/2310.10207v5 ) ライセンス: Link先を確認	Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang,	(参考訳) Bongard-OpenWorldは、マシンビジョンのための実世界の少数ショット推論を評価するための新しいベンチマークである。これは古典的なボンガード問題(BP)に由来する: 2つの画像集合(正と負)が与えられたとき、モデルは、正の集合の画像にのみ描写される視覚概念を帰納することによって、クエリ画像が属する集合を識別する必要がある。我々のベンチマークは、元のBPの少数ショット概念帰納を継承し、新たに2つの課題を追加している。 1) オープンワールドの自由形式概念: Bongard-OpenWorldの視覚概念は、対象カテゴリーから抽象的な視覚属性、常識的な事実知識まで、オープン語彙からの用語のユニークな構成である。 2) 実世界の画像: 多くの類似ベンチマークで使用される合成図とは対照的である。私たちの調査では、Bongard-OpenWorldは、現在の少数ショット推論アルゴリズムに対して、すでに重大な課題を課している。さらに,最近導入されたLarge Language Models (LLMs) とVision-Language Models (VLMs) が,VLMを直接探索すること、およびVLMとLLMを対話型推論方式で組み合わせることで,この課題をどの程度解決できるかについても検討する。ボンガード問題に対する人間の問題解決過程をエミュレートするために,LLMとVLMを論理的推論と調和させる,ニューロシンボリック推論アプローチも考案した。しかし、最良の学習者でも64%の精度にとどまる一方で、人間の参加者は容易に91%に到達するため、これらのアプローチはいずれも人間と機械のギャップを埋めるには至らなかった。 Bongard-OpenWorldが、現在のビジュアルインテリジェンスの限界をよりよく理解する助けとなり、より強力な少数ショットの視覚推論機能を備えたビジュアルエージェントの研究を後押しすることを期待している。 We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): Given two sets of images (positive and negative), the model needs to identify the set that query images belong to by inducing the visual concepts, which is exclusively depicted by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding the two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. 
In our exploration, Bongard-OpenWorld already imposes a significant challenge to current few-shot reasoning algorithms. We further investigate to which extent the recently introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can solve our task, by directly probing VLMs, and combining VLMs and LLMs in an interactive reasoning scheme. We even conceived a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems. However, none of these approaches manage to close the human-machine gap, as the best learner achieves 64% accuracy while human participants easily reach 91%. We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# 画像編集のためのオブジェクト認識インバージョンと再組み立て Object-aware Inversion and Reassembly for Image Editing ( http://arxiv.org/abs/2310.12149v2 ) ライセンス: Link先を確認	Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen,	(参考訳) 元のプロンプトとターゲットのプロンプトを比較することで、対象と対応する編集対象からなる多数の編集ペアを得ることができる。既存の編集方法は、入力画像に対する忠実性を保ちながら、編集性を確保するため、通常、入力画像全体をノイズの潜在表現に投影する固定数の反転ステップを伴い、続いてターゲットプロンプトによってガイドされる復調処理を行う。しかし,編集難易度が異なるため,最適な編集結果を得るための倒立ステップの数が異なることが判明した。そのため、現在の文献では、特に複数の編集ペアを自然画像で処理する場合に、一定数の反転ステップに依存するため、準最適生成品質が得られる。そこで本研究では,オブジェクトレベルの微粒化編集を実現するために,オブジェクト認識型インバージョン・リアセンブラ(OIR)と呼ばれる新たな画像編集パラダイムを提案する。具体的には、ターゲットの編集可能性と非編集領域の忠実度を共同で考慮し、各編集ペアの最適反転ステップを決定する新しい検索指標を設計する。画像の編集時に各編集ペアに対して最適な反転ステップを見つけるために,検索基準を用いる。次に、これらの編集ペアを別々に編集し、概念ミスマッチを避ける。その後、各編集結果と非編集領域をシームレスに統合し、最終的な編集画像を得るための追加の組立ステップを提案する。提案手法の有効性を体系的に評価するために,OIRBenchと呼ばれる2つのデータセットを収集した。実験により, 対象の形状, 色, 材料, カテゴリ等の編集において, 特に多目的の編集シナリオにおいて, 優れた性能が得られることが示された。 By comparing the original and target prompts, we can obtain numerous editing pairs, each comprising an object and its corresponding editing target. To allow editability while maintaining fidelity to the input image, existing editing methods typically involve a fixed number of inversion steps that project the whole input image to its noisier latent representation, followed by a denoising process guided by the target prompt. However, we find that the optimal number of inversion steps for achieving ideal editing results varies significantly among different editing pairs, owing to varying editing difficulties. Therefore, the current literature, which relies on a fixed number of inversion steps, produces sub-optimal generation quality, especially when handling multiple editing pairs in a natural image. To this end, we propose a new image editing paradigm, dubbed Object-aware Inversion and Reassembly (OIR), to enable object-level fine-grained editing. 
Specifically, we design a new search metric, which determines the optimal inversion steps for each editing pair, by jointly considering the editability of the target and the fidelity of the non-editing region. We use our search metric to find the optimal inversion step for each editing pair when editing an image. We then edit these editing pairs separately to avoid concept mismatch. Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image. To systematically evaluate the effectiveness of our method, we collect two datasets called OIRBench for benchmarking single- and multi-object editing, respectively. Experiments demonstrate that our method achieves superior performance in editing object shapes, colors, materials, categories, etc., especially in multi-object editing scenarios.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# MARVEL:大規模可変速度限界に対するマルチエージェント強化学習 MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits ( http://arxiv.org/abs/2310.12359v2 ) ライセンス: Link先を確認	Yuhang Zhang, Marcos Quinones-Grueiro, Zhiyao Zhang, Yanbing Wang, William Barbour, Gautam Biswas, Daniel Work,	(参考訳) 可変速度制限 (VSL) 制御は, リアルタイム交通条件に応じて速度制限を動的に調整することにより, 交通安全を向上し, 世界展開に期待できる高速道路交通管理戦略として機能する。これまでのVSL制御アルゴリズムの多くはルールベースであり、様々な複雑なトラフィックシナリオ下での一般化性に欠けていた。そこで本研究では,実環境に配置したハイウェイ廊下における大規模VSL制御のための新しいフレームワークであるMARVEL(Multi-Agent Reinforcement-learning for large-scale Variable spEed Limits)を提案する。 MARVELは、現実世界で観測可能な感覚情報のみを状態入力として利用し、交通条件、安全性、移動性への適応性を組み込んだ報酬構造を通して学習することにより、マルチエージェント協調を可能にする。全てのVSLエージェント間のパラメータ共有により、提案するフレームワークは、多くのエージェントで廊下をカバーするためにスケールする。ポリシーは、顕微鏡的な交通シミュレーション環境で訓練され、高速道路の短い伸びに焦点が当てられ、8つのVSLエージェントが7マイルにわたっている。テストのために、これらのポリシーは、アメリカTNのナッシュビル近郊17マイルのI-24に34のVSLエージェントを配置した、より広範なネットワークに適用される。 MARVELをベースとした手法は、制御不能なシナリオと比較して交通安全を63.4%改善し、I-24にデプロイされた最先端のアルゴリズムと比較して58.6%の交通移動率向上を実現している。さらに,エージェントの意思決定過程を検証し,異なる交通条件下での学習方針を検討するために,説明可能性分析を行う。最後に、I-24から収集した実世界のデータを用いたシミュレーションに基づく実験から得られたポリシーの応答を検証し、その展開能力について説明する。 Variable Speed Limit (VSL) control acts as a promising highway traffic management strategy with worldwide deployment, which can enhance traffic safety by dynamically adjusting speed limits according to real-time traffic conditions. Most of the deployed VSL control algorithms so far are rule-based, lacking generalizability under varying and complex traffic scenarios. In this work, we propose MARVEL (Multi-Agent Reinforcement-learning for large-scale Variable spEed Limits), a novel framework for large-scale VSL control on highway corridors with real-world deployment settings. MARVEL utilizes only sensing information observable in the real world as state input and learns through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, thereby enabling multi-agent coordination. 
With parameter sharing among all VSL agents, the proposed framework scales to cover corridors with many agents. The policies are trained in a microscopic traffic simulation environment, focusing on a short freeway stretch with 8 VSL agents spanning 7 miles. For testing, these policies are applied to a more extensive network with 34 VSL agents spanning 17 miles of I-24 near Nashville, TN, USA. MARVEL-based method improves traffic safety by 63.4% compared to the no control scenario and enhances traffic mobility by 58.6% compared to a state-of-the-practice algorithm that has been deployed on I-24. Besides, we conduct an explainability analysis to examine the decision-making process of the agents and explore the learned policy under different traffic conditions. Finally, we test the response of the policy learned from the simulation-based experiments with real-world data collected from I-24 and illustrate its deployment capability.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# Transformerにおける加算の理解 Understanding Addition in Transformers ( http://arxiv.org/abs/2310.13121v8 ) ライセンス: Link先を確認	Philip Quirke, Fazl Barez,	(参考訳) Transformersのような機械学習モデルの内部動作を理解することは、安全で倫理的な使用に不可欠である。本稿では,n桁整数加算を行うために訓練された1層トランスフォーマーモデルの包括的解析を行う。我々の分析結果は,モデルがタスクを個々の桁に対応する並列ストリームに分割し,桁内の位置ごとに異なるアルゴリズムを用いることを示唆している。さらに,高い損失を特徴とする稀なシナリオを特定し,その説明を行う。モデルのアルゴリズムを徹底的に解明することにより、その機能に関する新たな洞察を提供する。これらの知見は厳密なテストと数学的モデリングを通じて検証され、モデル理解と解釈可能性の幅広い分野に寄与する。我々のアプローチは、より複雑なタスクと多層トランスフォーマーモデルを分析するための扉を開く。 Understanding the inner workings of machine learning models like Transformers is vital for their safe and ethical use. This paper provides a comprehensive analysis of a one-layer Transformer model trained to perform n-digit integer addition. Our findings suggest that the model dissects the task into parallel streams dedicated to individual digits, employing varied algorithms tailored to different positions within the digits. Furthermore, we identify a rare scenario characterized by high loss, which we explain. By thoroughly elucidating the model's algorithm, we provide new insights into its functioning. These findings are validated through rigorous testing and mathematical modeling, thereby contributing to the broader fields of model understanding and interpretability. Our approach opens the door for analyzing more complex tasks and multi-layer Transformer models.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# グラッピング支援の意義 Value of Assistance for Grasping ( http://arxiv.org/abs/2310.14402v2 ) ライセンス: Link先を確認	Mohammad Masarwy, Yuval Goshen, David Dovrat, Sarah Keren,	(参考訳) 複数の現実的な環境では、ロボットはその正確なポーズを知らずに物体をつかむことを任務とし、そのポーズを確率論的に推定し、つかむ方法を決定する。我々は、つかむ前に物体の観察をロボットに提供することが可能な設定をサポートするが、この可能性には制限があり、どの感知アクションが最も有用かを決定する必要がある。本決定は,ロボットのタスク完了能力に対する期待効果を評価するために,VOA(Value of Assistance)尺度を提供することによって支援する。シミュレーションおよび実世界の協調的把握設定における提案手法の評価を行った。 In multiple realistic settings, a robot is tasked with grasping an object without knowing its exact pose and relies on a probabilistic estimation of the pose to decide how to attempt the grasp. We support settings in which it is possible to provide the robot with an observation of the object before a grasp is attempted but this possibility is limited and there is a need to decide which sensing action would be most beneficial. We support this decision by offering a novel Value of Assistance (VOA) measure for assessing the expected effect a specific observation will have on the robot's ability to complete its task. We evaluate our suggested measure in simulated and real-world collaborative grasping settings.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# HallusionBench:大規模視覚言語モデルにおける言語幻覚と視覚錯覚の高度な診断スイート HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models ( http://arxiv.org/abs/2310.14566v4 ) ライセンス: Link先を確認	Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou,	(参考訳) 本稿では,画像コンテキスト推論評価のための総合ベンチマークであるHallusionBenchを紹介する。このベンチマークは、視覚データのニュアンスに富んだ理解と解釈を重視することで、GPT-4V(Vision)、Gemini Pro Vision、Claude 3、LLaVA-1.5といった先進的な大規模視覚言語モデル(LVLM)に対して大きな課題を提示している。このベンチマークは、1129の質問と組み合わせた346の画像で構成されており、すべて人間の専門家によって細心の注意を払って作成されている。我々は,これらの視覚的質問に対して,制御群を確立するための新しい構造を導入する。この構造により、モデルの応答傾向、論理的整合性、および様々な障害モードの定量的解析を行うことができる。 HallusionBenchの評価では、15種類のモデルをベンチマークし、最先端のGPT-4Vによって達成された31.42%の質問対精度を強調した。特に、他の評価モデルは全て16%未満の精度にとどまる。さらに,本分析では,言語幻覚や視覚錯覚など,観察された障害モードを明らかにするだけでなく,これらの落とし穴の理解を深めている。 HallusionBench内の包括的ケーススタディは、LVLMにおける幻覚と錯覚の課題に光を当てた。これらの知見に基づいて,今後の改善の道筋を提案する。ベンチマークとコードベースはhttps://github.com/tianyi-lab/HallusionBenchからアクセスすることができる。 We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HallusionBench, we benchmarked 15 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. 
Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion, but also deepens an understanding of these pitfalls. Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://github.com/tianyi-lab/HallusionBench.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# EmoCLIP:ゼロショット映像表情認識のための視覚言語法 EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition ( http://arxiv.org/abs/2310.16640v2 ) ライセンス: Link先を確認	Niki Maria Foteinopoulou, Ioannis Patras,	(参考訳) 表情認識(FER)は感情コンピューティングにおいて重要な課題であるが、従来の7つの基本的な感情に焦点をあてることで、複雑で拡大し続ける感情スペクトルへの適用性が制限される。動的インザワイルドFERに存在する新しい、未知の感情の問題に対処するため、ゼロショット分類のためのリッチな潜在表現の学習を促進することを目的とした、サンプルレベルのテキスト記述(文脈、表情、感情的手がかりのキャプション)を自然言語の監督として活用する新しい視覚言語モデルを提案する。これをテストするために,サンプルレベル記述に基づいて訓練されたモデルのゼロショット分類を用いて,4つの人気のある動的FERデータセットで評価を行った。以上の結果から,本手法はベースライン法と比較して大きな改善をもたらすことが示唆された。具体的には、ゼロショットビデオFERでは、複数のデータセットにおいて、重み付き平均リコールでCLIPを10%以上、非重み付き平均リコールで5%上回る。さらに、メンタルヘルスの症状推定の下流課題に関するサンプルレベル記述を用いてトレーニングしたネットワークから得られた表現を評価し、最先端の手法と同等以上の性能と、人間の専門家との強い一致を達成した。すなわち、統合失調症症状の重症度推定において、Pearsonの相関係数を最大0.85まで達成し、これは人間の専門家間の一致度に匹敵するものである。コードはhttps://github.com/NickyFot/EmoCLIPで公開されている。 Facial Expression Recognition (FER) is a crucial task in affective computing, but its conventional focus on the seven basic emotions limits its applicability to the complex and expanding emotional spectrum. To address the issue of new and unseen emotions present in dynamic in-the-wild FER, we propose a novel vision-language model that utilises sample-level text descriptions (i.e. captions of the context, expressions or emotional cues) as natural language supervision, aiming to enhance the learning of rich latent representations, for zero-shot classification. To test this, we evaluate using zero-shot classification of the model trained on sample-level descriptions on four popular dynamic FER datasets. Our findings show that this approach yields significant improvements when compared to baseline methods. Specifically, for zero-shot video FER, we outperform CLIP by over 10% in terms of Weighted Average Recall and 5% in terms of Unweighted Average Recall on several datasets. 
Furthermore, we evaluate the representations obtained from the network trained using sample-level descriptions on the downstream task of mental health symptom estimation, achieving performance comparable or superior to state-of-the-art methods and strong agreement with human experts. Namely, we achieve a Pearson's Correlation Coefficient of up to 0.85 on schizophrenia symptom severity estimation, which is comparable to human experts' agreement. The code is publicly available at: https://github.com/NickyFot/EmoCLIP.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# SparseDFF:ワンショットデキスタラスマニピュレーションのためのスパースビュー特徴蒸留 SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation ( http://arxiv.org/abs/2310.16838v2 ) ライセンス: Link先を確認	Qianxu Wang, Haotong Zhang, Congyue Deng, Yang You, Hao Dong, Yixin Zhu, Leonidas Guibas,	(参考訳) 人間は、様々な形状、ポーズ、外観のオブジェクト間で操作能力を転移する優れたスキルを示し、それは異なるインスタンス間の意味的対応を理解することに根ざしている。ロボットに同様の高レベルな理解を持たせるために、我々はSparseDFFを提案する。 SparseDFFは、大規模な2次元視覚モデルを用いて、スパースなRGBD画像から意味的特徴を抽出する3次元シーンのための新しいDFFである。スパースなRGBD画像は、固定カメラ構成の多くのタスクに関連するにもかかわらず、研究が限られている領域である。 SparseDFFは、画像特徴を3Dポイントクラウドにマッピングすることで、器用な操作の効率的なワンショット学習を可能にする、ビュー一貫性のある3D DFFを生成する。 SparseDFFの中核は特徴改善ネットワークであり、ビュー間のコントラスト損失と、特徴の連続性のための点プルーニング機構によって最適化される。これにより、エンドエフェクタパラメータに関する特徴の不一致の最小化が容易になり、デモンストレーションと目標とする操作を橋渡しする。器用なハンドを用いた実世界のシナリオで検証されたSparseDFFは、剛体オブジェクトと変形可能なオブジェクトの両方の操作に有効であることを示し、オブジェクトとシーンのバリエーションをまたいだ高い一般化能力を示す。 Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel DFF for 3D scenes utilizing large 2D vision models to extract semantic features from sparse RGBD images, a domain where research is limited despite its relevance to many tasks with fixed-camera setups. SparseDFF generates view-consistent 3D DFFs, enabling efficient one-shot learning of dexterous manipulations by mapping image features to a 3D point cloud. Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views and a point-pruning mechanism for feature continuity. This facilitates the minimization of feature discrepancies w.r.t. end-effector parameters, bridging demonstrations and target manipulations. Validated in real-world scenarios with a dexterous hand, SparseDFF proves effective in manipulating both rigid and deformable objects, demonstrating significant generalization capabilities across object and scene variations.	翻訳日:2024-03-20 03:22:50 公開日:2024-03-18
# 低ランク適応の表現力 The Expressive Power of Low-Rank Adaptation ( http://arxiv.org/abs/2310.17513v3 ) ライセンス: Link先を確認	Yuchen Zeng, Kangwook Lee,	(参考訳) 重み行列の低ランク適応を利用するパラメータ効率のよい微調整法であるLoRAは,大規模言語モデルや拡散モデルなどの事前学習モデルの微調整手法として広く用いられている。実際に大きな成功を収めたにもかかわらず、LoRAの理論的基盤はほとんど未解明のままである。本稿では,LoRAの表現力を理論的に解析することで,このギャップを埋める第一歩を踏み出す。全結合ニューラルネットワークの場合、LoRAランク $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$ であれば、LoRAは任意のモデル$f$を適応させて、より小さい任意のターゲットモデル$\overline{f}$を正確に表現できることを証明する。また,LoRAランクがしきい値よりも低い場合の近似誤差を定量化する。トランスフォーマーネットワークの場合、任意のモデルが、ランク-$(\frac{\text{embedding size}}{2})$ のLoRAアダプタで同じサイズのターゲットモデルに適応可能であることを示す。 Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when LoRA-rank is lower than the threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.	翻訳日:2024-03-20 03:12:40 公開日:2024-03-18
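The rank condition quoted in the LoRA entry above can be evaluated numerically. The sketch below is only an illustration of that inequality for the fully connected case: the function name `lora_rank_threshold` and the rounding up to an integer rank are assumptions made here, not something stated in the abstract.

```python
def lora_rank_threshold(width_f: int, depth_f: int, depth_target: int) -> int:
    """Smallest integer r with r >= width_f * depth_target / depth_f.

    width_f      : width of the frozen model f
    depth_f      : depth of f
    depth_target : depth of the (smaller) target model f-bar
    Rounding up to an integer is an assumption of this sketch.
    """
    if min(width_f, depth_f, depth_target) <= 0:
        raise ValueError("width and depths must be positive integers")
    # Integer ceiling division, avoiding floating-point rounding.
    return -((-width_f * depth_target) // depth_f)


if __name__ == "__main__":
    # A width-64, depth-8 model adapted to a depth-2 target.
    print(lora_rank_threshold(64, 8, 2))
```

The threshold shrinks as the frozen model gets deeper relative to the target, which matches the form of the inequality in the abstract.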
# TivNe-SLAM:時変ニューラルラジアンス場による動的マッピングと追跡 TivNe-SLAM: Dynamic Mapping and Tracking via Time-Varying Neural Radiance Fields ( http://arxiv.org/abs/2310.18917v4 ) ライセンス: Link先を確認	Chengyao Duan, Zhiliu Yang,	(参考訳) 従来のNeural Radiance Fields(NeRF)をSLAMフレームワークに統合する試みは、静的シーンの仮定に依存するか、地上の真理カメラのポーズを必要とする。本稿では,動的シーンの追跡と再構成を行うための時間変化表現を提案する。まず、追跡プロセスとマッピングプロセスという2つのプロセスが同時にフレームワークで維持されます。トラッキングプロセスでは、全ての入力画像は一様にサンプリングされ、その後、自己監督パラダイムで漸進的に訓練される。マッピングプロセスでは,動的物体を静的な背景から区別するためにモーションマスクを活用し,動的領域からより多くのピクセルをサンプリングする。第二に、両プロセスのパラメータ最適化は2段階からなる: 第1段階は、変形場を標準場に変換するために、時間と3D位置を関連付ける。そして、第2ステージは、標準フィールドの埋め込みと時間を関連付け、色と符号付き距離関数(SDF)を得る。最後に、重なり合う速度に基づく新しいキーフレーム選択戦略を提案する。 2つの合成データセットと1つの実世界のデータセットに対するアプローチを評価する。また,提案手法は既存のNeRF法と比較して,トラッキングとマッピングの両面で競合する結果が得られることを実証した。 Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or require the ground truth camera poses, which impedes their application in real-world scenarios. In this paper, we propose a time-varying representation to track and reconstruct the dynamic scenes. Firstly, two processes, tracking process and mapping process, are simultaneously maintained in our framework. For the tracking process, all input images are uniformly sampled, then progressively trained in a self-supervised paradigm. For the mapping process, we leverage motion masks to distinguish dynamic objects from static background, and sample more pixels from dynamic areas. Secondly, the parameter optimization for both processes consists of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field. And the second stage associates time with the embeddings of canonical field to obtain colors and Signed Distance Function (SDF). Lastly, we propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two synthetic datasets and one real-world dataset. 
And the experiments validate that our method achieves competitive results in both tracking and mapping when compared to existing state-of-the-art NeRF-based methods.	翻訳日:2024-03-20 03:12:40 公開日:2024-03-18
# 生成逆ネットワークの安定トレーニングのためのフラッディング正則化 Flooding Regularization for Stable Training of Generative Adversarial Networks ( http://arxiv.org/abs/2311.00318v2 ) ライセンス: Link先を確認	Iu Yahiro, Takashi Ishida, Naoto Yokoya,	(参考訳) GAN(Generative Adversarial Networks)は画像生成において顕著な性能を示した。しかし、GANトレーニングは不安定性の問題に悩まされている。この問題に対処する主要なアプローチの1つは損失関数を変更することであり、対向損失の種類を変えることに加えて正則化項を用いることが多い。本稿では,対向損失関数を直接正則化することに焦点を当てる。本稿では, 教師付き学習における過学習抑制手法であるフラッディングをGANに適用し, 識別器の損失が過度に低くなるのを直接防ぐ方法を提案する。フラッディングはフラッドレベルを調整する必要があるが, GANに適用した場合, 適切なフラッドレベル設定の範囲は対向損失関数によって決定されることを提案し, これはバイナリクロスエントロピー損失を用いたGANの理論的解析によって支持される。我々は,フラッディングがGAN訓練を安定させ,他の安定化技術と組み合わせることができることを実験的に検証した。また, 識別器の損失がフラッドレベルを下回らないように制限することにより, フラッドレベルがある程度高い場合でも, トレーニングが安定して進行することを示した。 Generative Adversarial Networks (GANs) have shown remarkable performance in image generation. However, GAN training suffers from the problem of instability. One of the main approaches to address this problem is to modify the loss function, often using regularization terms in addition to changing the type of adversarial losses. This paper focuses on directly regularizing the adversarial loss function. We propose a method that applies flooding, an overfitting suppression method in supervised learning, to GANs to directly prevent the discriminator's loss from becoming excessively low. Flooding requires tuning the flood level, but when applied to GANs, we propose that the appropriate range of flood level settings is determined by the adversarial loss function, supported by theoretical analysis of GANs using the binary cross entropy loss. We experimentally verify that flooding stabilizes GAN training and can be combined with other stabilization techniques. We also show that by restricting the discriminator's loss to be no less than the flood level, the training proceeds stably even when the flood level is somewhat high.	翻訳日:2024-03-20 03:12:40 公開日:2024-03-18
# 機械スーパービジョンへのシフト:自動医用画像分割・分類のための注釈効率の良いセミ・セルフ・スーパービジョン学習 Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification ( http://arxiv.org/abs/2311.10319v3 ) ライセンス: Link先を確認	Pranav Singh, Raviteja Chukkapalli, Shravan Chaudhari, Luoyao Chen, Mei Chen, Jinqian Pan, Craig Smuda, Jacopo Cirrone,	(参考訳) 臨床治療の進歩は、大量の注釈付きデータに依存する教師付き学習技術の限界によって、ますます制限されている。アノテーションのプロセスは費用がかかるだけでなく、臨床専門家にかなりの時間を要する。本稿では,S4MI(Self-Supervision and Semi-Supervision for Medical Imaging)パイプラインを導入する。これらの技術はラベリングを必要としない補助的なタスクに携わり、完全に教師された手法に比べて機械の監督のスケーリングを簡素化する。本研究は、これらの手法を3つの異なる医用画像データセット上で評価し、分類と分割作業の有効性を評価する。特に,自己教師付き学習は,すべての評価データセットの分類において,教師付き手法の性能を大幅に上回った。注目すべきは、半教師付きアプローチはセグメンテーションにおいて優れた結果を示し、全データセットで50%少ないラベルを使用しながら、完全な教師付き手法よりも優れた結果を示したことだ。科学コミュニティへのコントリビューションへのコミットメントに合わせて、私たちはS4MIコードを公開して、より広範な適用とこれらの手法のさらなる開発を可能にしました。 Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages the advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. 
Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods.	翻訳日:2024-03-20 03:02:46 公開日:2024-03-18
# 不確かさを解消した自己プロンプト型医用画像分割におけるセグメンテーションモデルの信頼性向上 Enhancing the Reliability of Segment Anything Model for Auto-Prompting Medical Image Segmentation with Uncertainty Rectification ( http://arxiv.org/abs/2311.10529v3 ) ライセンス: Link先を確認	Yichi Zhang, Shiyao Hu, Sijie Ren, Chen Jiang, Yuan Cheng, Yuan Qi,	(参考訳) Segment Anything Model (SAM)は、最近、プロンプト駆動のイメージセグメンテーションタスクの基盤モデルとして登場した。しかし、オリジナルのSAMとその医学的なバリエーションはどちらも、標的構造をスライス・バイ・スライス・マニュアルで指示し、アプリケーションの負担を直接増やす必要がある。 SAMを完全な自動的な方法で自動プロンプトする試みは試みられているが、医療画像の分野ではまだ性能が劣り、信頼性が欠如している。本稿では,医用画像の自動分割の信頼性を高めるための不確実性修正SAMフレームワークUR-SAMを提案する。自動プロンプト生成のためのローカライゼーションフレームワークを構築し,不確実性推定のためのSAMの一連のインプットプロンプトを得るためのプロンプト拡張モジュールと,不確実性推定の分布をさらに活用してセグメンテーション性能を向上させるための不確実性補正モジュールを組み込んだ。 35個の臓器の分節を包含する2つの公開3次元医用データセットの広範囲な実験により, 補足訓練や微調整がなければ, 最大10.7 %, 13.8 %のダイス類似度係数で分節性能を向上し, 手動のプロンプトを伴わない医用画像分節の効率と幅広い機能を示す。 The Segment Anything Model (SAM) has recently emerged as a groundbreaking foundation model for prompt-driven image segmentation tasks. However, both the original SAM and its medical variants require slice-by-slice manual prompting of target structures, which directly increase the burden for applications. Despite attempts of auto-prompting to turn SAM into a fully automatic manner, it still exhibits subpar performance and lacks of reliability especially in the field of medical imaging. In this paper, we propose UR-SAM, an uncertainty rectified SAM framework to enhance the reliability for auto-prompting medical image segmentation. Building upon a localization framework for automatic prompt generation, our method incorporates a prompt augmentation module to obtain a series of input prompts for SAM for uncertainty estimation and an uncertainty-based rectification module to further utilize the distribution of estimated uncertainty to improve the segmentation performance. 
Extensive experiments on two public 3D medical datasets covering the segmentation of 35 organs demonstrate that without supplementary training or fine-tuning, our method further improves the segmentation performance with up to 10.7 % and 13.8 % in dice similarity coefficient, demonstrating efficiency and broad capabilities for medical image segmentation without manual prompting.	翻訳日:2024-03-20 03:02:46 公開日:2024-03-18
# LOSTU: 高速でスケーラブルな不確実性を考慮した三角測量 LOSTU: Fast, Scalable, and Uncertainty-Aware Triangulation ( http://arxiv.org/abs/2311.11171v2 ) ライセンス: Link先を確認	Sébastien Henry, John A. Christian,	(参考訳) この研究は、\texttt{LOSTU}と呼ばれる、非反復的でスケーラブルかつ統計的に最適な三角測量の方法を提案する。再投影($L_2$)誤差を最小化する三角測量アルゴリズムとは異なり、LOSTUはカメラのポーズやパラメータに誤差がある場合でも最尤推定値を提供する。この一般的なフレームワークは、直接線形変換(DLT)や中間点法のような他の三角測量法を位置づけるために用いられる。合成実験により、LOSTU は不確実性を考慮した Levenberg-Marquardt (または類似の) 最適化スキームよりもはるかに高速であり、同等の精度の結果が得られることが示された。最後に、LOSTUは、不確実性を考慮したポーズ推定と組み合わせて逐次再構成に実装され、より良い再構成指標が得られる。 This work proposes a non-iterative, scalable, and statistically optimal way to triangulate called \texttt{LOSTU}. Unlike triangulation algorithms that minimize the reprojection ($L_2$) error, LOSTU will still provide the maximum likelihood estimate when there are errors in camera pose or parameters. This generic framework is used to contextualize other triangulation methods like the direct linear transform (DLT) or the midpoint. Synthetic experiments show that LOSTU can be substantially faster than using uncertainty-aware Levenberg-Marquardt (or similar) optimization schemes, while providing results of comparable precision. Finally, LOSTU is implemented in sequential reconstruction in conjunction with uncertainty-aware pose estimation, where it yields better reconstruction metrics.	翻訳日:2024-03-20 03:02:46 公開日:2024-03-18
# CRISP: クラス認識型モデルプルーニングのためのハイブリッド構造化スパース性 CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning ( http://arxiv.org/abs/2311.14272v2 ) ライセンス: Link先を確認	Shivam Aggarwal, Kuluhan Binici, Tulika Mitra,	(参考訳) 分類タスクのための機械学習パイプラインは、広範囲のクラスで高い精度を達成するために汎用的なモデルを訓練することが多い。しかし、典型的なユーザーが日常的に遭遇するのは限られたクラスだけである。この格差は、ユーザー固有のクラスにフォーカスするようにモデルを調整することで、計算効率を高める機会を提供する。既存の研究は非構造化プルーニングに依存しており、ランダムに分布する非ゼロ値がモデルに生じるため、ハードウェアアクセラレーションには適さない。あるいは、チャネルプルーニングのような構造化プルーニングを用いる方法もあるが、これらは最小限の圧縮しか提供せず、モデルの精度を低下させる可能性がある。本研究では,細粒度のN:M構造化スパース性と粗粒度のブロックスパース性を組み合わせたハイブリッド構造化スパースパターンを利用した新しいプルーニングフレームワークCRISPを提案する。我々のプルーニング戦略は、勾配に基づくクラス対応サリエンシスコアによって導かれ、ユーザ固有のクラスに不可欠な重みを維持できる。 CRISPは、ImageNetとCIFAR-100データセット上のResNet-50、VGG-16、MobileNetV2のような人気モデルに対して、最小限のメモリ消費で高い精度を達成する。さらに、CRISPは、同等の精度を維持しつつ、既存のプルーニング法と比較してレイテンシとエネルギー消費を最大14$\times$削減する。私たちのコードはhttps://github.com/shivmgg/CRISP/で利用可能です。 Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user encounters only a limited selection of classes regularly. This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes. Existing works rely on unstructured pruning, which introduces randomly distributed non-zero values in the model, making it unsuitable for hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only minimal compression and may lead to reduced model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines both fine-grained N:M structured sparsity and coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. 
CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14$\times$ reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.	翻訳日:2024-03-20 03:02:46 公開日:2024-03-18
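The fine-grained N:M structured sparsity mentioned in the CRISP entry above keeps at most N non-zero weights in every group of M consecutive weights. The sketch below illustrates only that pattern on a flat list. It ranks weights by magnitude unless explicit scores are passed; the paper's gradient-based class-aware saliency and its block-sparsity component are not reproduced here, and `nm_prune` with its `scores` parameter is a hypothetical helper.

```python
def nm_prune(weights, n, m, scores=None):
    """Keep the n highest-scoring weights in each group of m; zero the rest.

    weights : flat list of floats whose length is a multiple of m
    scores  : optional per-weight importance (defaults to |w|); a
              class-aware saliency score could be supplied here instead.
    """
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    if scores is None:
        scores = [abs(w) for w in weights]
    pruned = []
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Indices of the n most important weights in this group.
        keep = set(sorted(group, key=lambda i: scores[i], reverse=True)[:n])
        pruned.extend(weights[i] if i in keep else 0.0 for i in group)
    return pruned


if __name__ == "__main__":
    w = [0.1, -0.9, 0.5, 0.2, 0.3, 0.0, -0.4, 0.7]
    print(nm_prune(w, n=2, m=4))  # 2:4 pattern
```

A fixed per-group budget like this is what makes the pattern regular enough for hardware support, in contrast to unstructured pruning.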
# 深層学習における幾何適応勾配降下による一様指数速度での大域的$\mathcal{L}^2$最小化 Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning ( http://arxiv.org/abs/2311.15487v3 ) ライセンス: Link先を確認	Thomas Chen,	(参考訳) 本稿では,Deep Learningネットワークにおける$\mathcal{L}^2$コスト関数の最小化に広く用いられている勾配降下流について考察し、過剰パラメータ化された設定に適応したものと、過少パラメータ化された設定に適応したものの2つの修正版を導入する。どちらも明快で自然な不変な幾何学的意味を持ち、過剰パラメータ化設定ではプルバックベクトルバンドル構造を、過少パラメータ化設定ではプッシュフォワードベクトルバンドル構造を考慮に入れている。過剰パラメータ化の場合、階数条件が成り立つならば、修正勾配降下のすべての軌道が、一様指数収束速度で$\mathcal{L}^2$コストを大域最小に導くことを証明する。これにより、大域最小への任意の所定の近さに対する先験的な停止時間が得られる。後者とサブリーマン幾何学との関係を指摘する。 We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.	翻訳日:2024-03-20 02:52:49 公開日:2024-03-18
# UniRepLKNet: オーディオ、ビデオ、ポイントクラウド、時系列、画像認識のためのユニバーサルパーセプション大カーネル ConvNet UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition ( http://arxiv.org/abs/2311.15599v2 ) ライセンス: Link先を確認	Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan,	(参考訳) 大規模カーネル畳み込みニューラルネットワーク(ConvNets)は、最近広範な研究の注目を集めているが、未解決で重要な2つの問題がさらなる調査を要求している。 1) 既存の大型カーネルのConvNetのアーキテクチャは、従来のConvNetやトランスフォーマーの設計原則に大きく従っているが、大型カーネルのConvNetのアーキテクチャ設計は未完成のままである。 2) 変換器が複数のモダリティを支配してきたため, ConvNets が視覚以外の領域で強い普遍的知覚能力を持つかどうかについても検討が続けられている。本稿では2つの側面から貢献する。 1)大カーネルのConvNet設計のための4つのアーキテクチャガイドラインを提案し,その中核となるのは,それらを小さなカーネルと区別する,大きなカーネルの本質的特性を活用することだ。このようなガイドラインに従って,提案する大カーネルのConvNetは画像認識における主要な性能を示す(画像ネット精度88.0%,ADE20K mIoU55.6%,COCOボックスAP56.4%)。 2) 大規模なカーネルが,本来熟練していないドメインにおいて,ConvNetの例外的なパフォーマンスを解放する鍵となることを発見した。特定のモダリティ関連前処理アプローチを用いて,提案モデルは,アーキテクチャへのモダリティ固有のカスタマイズがなくても,時系列予測や音声認識タスクにおける最先端のパフォーマンスを実現する。すべてのコードとモデルはGitHubとHuggingfaceで公開されている。 Large-kernel convolutional neural networks (ConvNets) have recently received extensive research attention, but two unresolved and critical issues demand further investigation. 1) The architectures of existing large-kernel ConvNets largely follow the design principles of conventional ConvNets or transformers, while the architectural design for large-kernel ConvNets remains under-addressed. 2) As transformers have dominated multiple modalities, it remains to be investigated whether ConvNets also have a strong universal perception ability in domains beyond vision. In this paper, we contribute from two aspects. 1) We propose four architectural guidelines for designing large-kernel ConvNets, the core of which is to exploit the essential characteristics of large kernels that distinguish them from small kernels - they can see wide without going deep. 
Following such guidelines, our proposed large-kernel ConvNet shows leading performance in image recognition (ImageNet accuracy of 88.0%, ADE20K mIoU of 55.6%, and COCO box AP of 56.4%), demonstrating better performance and higher speed than the recent powerful competitors. 2) We discover large kernels are the key to unlocking the exceptional performance of ConvNets in domains where they were originally not proficient. With certain modality-related preprocessing approaches, the proposed model achieves state-of-the-art performance on time-series forecasting and audio recognition tasks even without modality-specific customization to the architecture. All the code and models are publicly available on GitHub and Huggingface.	翻訳日:2024-03-20 02:52:49 公開日:2024-03-18
# 線形逆問題の解法のための深い正則化複合ガウスネットワーク Deep Regularized Compound Gaussian Network for Solving Linear Inverse Problems ( http://arxiv.org/abs/2311.17248v3 ) ライセンス: Link先を確認	Carter Lyons, Raghu G. Raj, Margaret Cheney,	(参考訳) 逆問題に事前情報を組み込むことは、例えば最大事後確率推定によって、堅牢な逆問題解決を容易にする重要な手法である。本稿では,複合ガウス分布(CG)クラスの中で問題固有の統計的事前分布を選択できる、線形逆問題に対する2つの新しいアプローチを考案する。 CGクラスは、スパース性に基づくアプローチを含む信号および画像再構成手法において、よく使われる多くの事前分布を包含する。最初に開発された手法は、一般化複合ガウス最小二乗法(G-CG-LS)と呼ばれる反復アルゴリズムであり、正則化がCG事前分布を課す正則化最小二乗目的関数を最小化する。そして、G-CG-LSをアンロール(展開)することで、2つ目の手法である、事前情報を学習するDR-CG-Netと呼ばれる新しい深層正則化(DR)ニューラルネットワークを構築する。 G-CG-LSの収束特性に関する詳細な計算理論とDR-CG-Netの詳細な数値実験を提供する。 CG事前分布の包括的な性質により、これらの実験は、DR-CG-Netがトモグラフィーや圧縮センシングにおいて、特に訓練データが少ない困難なシナリオにおいて、競合する先行技術よりも優れていることを示している。 Incorporating prior information into inverse problems, e.g. via maximum-a-posteriori estimation, is an important technique for facilitating robust inverse problem solutions. In this paper, we devise two novel approaches for linear inverse problems that permit problem-specific statistical prior selections within the compound Gaussian (CG) class of distributions. The CG class subsumes many commonly used priors in signal and image reconstruction methods including those of sparsity-based approaches. The first method developed is an iterative algorithm, called generalized compound Gaussian least squares (G-CG-LS), that minimizes a regularized least squares objective function where the regularization enforces a CG prior. G-CG-LS is then unrolled, or unfolded, to furnish our second method, which is a novel deep regularized (DR) neural network, called DR-CG-Net, that learns the prior information. A detailed computational theory on convergence properties of G-CG-LS and thorough numerical experiments for DR-CG-Net are provided. Due to the comprehensive nature of the CG prior, these experiments show that DR-CG-Net outperforms competitive prior art methods in tomographic imaging and compressive sensing, especially in challenging low-training scenarios.	翻訳日:2024-03-20 02:52:49 公開日:2024-03-18
# 視覚世界における三角形分布の学習 Learning Triangular Distribution in Visual World ( http://arxiv.org/abs/2311.18605v3 ) ライセンス: Link先を確認	Ping Chen, Xingpeng Zhang, Chengtao Zhou, Dichao Fan, Peng Tu, Le Zhang, Yanlin Qian,	(参考訳) 畳み込みニューラルネットワークは、ラベル分布学習を含む広範な視覚タスクで成功しており、これは通常、非線形の視覚特徴から明確に定義されたラベルへの単射を学習する形式を取る。しかし、特徴間の差異がラベルの差異にどのように写像されるかは曖昧であり、その正しさは保証されていない。これらの問題に対処するために、特徴とラベルの数学的関連性について検討し、ラベル分布学習のための汎用的でシンプルな枠組みを提示する。特徴とラベルの間に単射関数を構築するためのいわゆる三角分布変換(TDT)を提案し、任意の対称的な特徴差がラベル間の差を線形に反映することを保証する。提案したTDTは,各種ラベル分布学習タスクに対処するために,主流のバックボーンネットワークのプラグインとして使用できる。顔の年齢認識, 照明色度推定, 審美性評価の実験は, TDTが先行技術と同等以上の結果を達成することを示した。 Convolution neural network is successful in pervasive vision tasks, including label distribution learning, which usually takes the form of learning an injection from the non-linear visual features to the well-defined labels. However, how the discrepancy between features is mapped to the label discrepancy is ambient, and its correctness is not guaranteed. To address these problems, we study the mathematical connection between feature and its label, presenting a general and simple framework for label distribution learning. We propose a so-called Triangular Distribution Transform (TDT) to build an injective function between feature and label, guaranteeing that any symmetric feature discrepancy linearly reflects the difference between labels. The proposed TDT can be used as a plug-in in mainstream backbone networks to address different label distribution learning tasks. Experiments on Facial Age Recognition, Illumination Chromaticity Estimation, and Aesthetics assessment show that TDT achieves on-par or better results than the prior arts.	翻訳日:2024-03-20 02:52:49 公開日:2024-03-18
# ロバストOOD自己監督型コントラスト心電図表現学習における拡張の包括的評価 A Comprehensive Evaluation of Augmentations for Robust OOD Self-Supervised Contrastive Phonocardiogram Representation Learning ( http://arxiv.org/abs/2312.00502v2 ) ライセンス: Link先を確認	Aristotelis Ballas, Vasileios Papapanagiotou, Christos Diou,	(参考訳) 近年の研究活動が増加しているにもかかわらず、深層学習モデルは医学などいくつかの現実世界では広く受け入れられていない。高品質な注釈付きデータの不足は、しばしば堅牢で一般化可能なモデルの開発を妨げる。 Contrastive Self-Supervised Learning (SSL)は、ラベル付きデータの不足に対する潜在的な解決策を提供する。本研究では,1D phonocardiogram (PCG) サンプルの異常検出のために,信号の一般化表現を学習してコントラッシブSSLを適用することを提案する。具体的には、幅広いオーディオベース拡張の広範な比較評価を行い、複数の下流タスクにまたがる複数のデータセットにおける訓練された分類器の評価を行い、最終的にモデルトレーニングにおける各拡張の影響について報告する。トレーニング分布によっては、完全教師付きモデルの有効性は、未確認データで評価すると最大32%低下し、SSLモデルは最大10%低下し、場合によっては改善可能であることを実験的に実証した。我々は、対照的なSSL事前トレーニングが、医療専門家による時間的および労働集約的なアノテーションプロセスに頼ることなく、見つからないOODデータに一般化可能な堅牢な分類器を提供することを支援することを議論し、実験的に実証した。さらに,提案プロトコルは,モデルトレーニングにおけるその効果の大きさを計算することにより,ロバストPCG信号処理における最も有望かつ適切な拡張点に光を当てる。最後に、新しいアプローチを開発するためのオープンソースのコードベースに加えて、PCG分類のための堅牢なモデルを作成するためのロードマップを研究者や実践者に提供します。 Despite the recent increase in research activity, deep-learning models have not yet been widely accepted in several real-world settings, such as medicine. The shortage of high-quality annotated data often hinders the development of robust and generalizable models, which do not suffer from degraded effectiveness when presented with newly-collected, out-of-distribution (OOD) datasets. Contrastive Self-Supervised Learning (SSL) offers a potential solution to labeled data scarcity, as it takes advantage of unlabeled data to increase model effectiveness and robustness. In this research, we propose applying contrastive SSL for detecting abnormalities in 1D phonocardiogram (PCG) samples by learning a generalized representation of the signal. 
Specifically, we perform an extensive comparative evaluation of a wide range of audio-based augmentations, evaluate trained classifiers on multiple datasets across different downstream tasks, and finally report on the impact of each augmentation in model training. We experimentally demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32% when evaluated on unseen data, while SSL models only lose up to 10% or even improve in some cases. We argue and experimentally demonstrate that, contrastive SSL pretraining can assist in providing robust classifiers which can generalize to unseen, OOD data, without relying on time- and labor-intensive annotation processes by medical experts. Furthermore, the proposed extensive evaluation protocol sheds light on the most promising and appropriate augmentations for robust PCG signal processing, by calculating their effect size on model training. Finally, we provide researchers and practitioners with a roadmap towards producing robust models for PCG classification, in addition to an open-source codebase for developing novel approaches.	翻訳日:2024-03-20 02:52:49 公開日:2024-03-18
# 感情付きGPT-4V:一般化感情認識のためのゼロショットベンチマーク GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition ( http://arxiv.org/abs/2312.04293v3 ) ライセンス: Link先を確認	Zheng Lian, Licai Sun, Haiyang Sun, Kang Chen, Zhuofan Wen, Hao Gu, Bin Liu, Jianhua Tao,	(参考訳) 近年, GPT-4 with Vision (GPT-4V) は様々なタスクにおいて顕著な視覚能力を示したが, 感情認識性能は十分に評価されていない。このギャップを埋めるために、視覚的感情分析、ツイート感情分析、マイクロ表情認識、顔の感情認識、動的顔の感情認識、マルチモーダル感情認識の6つのタスクをカバーする21のベンチマークデータセット上で、GPT-4Vの定量的評価結果を示す。本稿では,これらの課題を総称して「一般化感情認識(GER)」と呼ぶ。実験により, GERタスクにおいて, GPT-4Vが強い視覚的理解能力を示すことが明らかとなった。一方、GPT-4Vは、マルチモーダルな手がかりを統合し、時間的情報を活用する能力を示し、これは感情認識にも重要である。しかし、GPT-4Vは主に一般的なドメイン向けに設計されており、専門知識を必要とするマイクロ表情を認識できないことに注意する必要がある。我々の知る限り、本稿はGERタスクに対するGPT-4Vの初めての定量的評価を提供する。コードをオープンソース化し、後続の研究者に、より多くのタスクやデータセットを含めることで、評価範囲を広げるよう促している。私たちのコードと評価結果は、https://github.com/zeroQiaoba/gpt4v-emotionで公開されています。 Recently, GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated. To bridge this gap, we present the quantitative evaluation results of GPT-4V on 21 benchmark datasets covering 6 tasks: visual sentiment analysis, tweet sentiment analysis, micro-expression recognition, facial emotion recognition, dynamic facial emotion recognition, and multimodal emotion recognition. This paper collectively refers to these tasks as ``Generalized Emotion Recognition (GER)''. Through experimental analysis, we observe that GPT-4V exhibits strong visual understanding capabilities in GER tasks. Meanwhile, GPT-4V shows the ability to integrate multimodal clues and exploit temporal information, which is also critical for emotion recognition. However, it's worth noting that GPT-4V is primarily designed for general domains and cannot recognize micro-expressions that require specialized knowledge. To the best of our knowledge, this paper provides the first quantitative assessment of GPT-4V for GER tasks. 
We have open-sourced the code and encourage subsequent researchers to broaden the evaluation scope by including more tasks and datasets. Our code and evaluation results are available at: https://github.com/zeroQiaoba/gpt4v-emotion.	翻訳日:2024-03-20 02:42:50 公開日:2024-03-18
# スペクトル超解法における学習行動相関--空間スペクトル注意が線形依存と出会う場合- Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence ( http://arxiv.org/abs/2312.12833v2 ) ライセンス: Link先を確認	Hongyuan Wang, Lizhi Wang, Jiang Xu, Chang Chen, Xue Hu, Fenglong Song, Youliang Yan,	(参考訳) 容易に取得可能なRGB画像からハイパースペクトル像(HSI)を復元することを目的としたスペクトル超解像は、計算写真分野への関心が高まっている。スペクトル超解像の重要な側面は、HSI内の相関を利用することである。しかし、既存のTransformerのボトルネックには2つの種類がある。まず、既存のトランスフォーマーは、空間的またはスペクトル的相関を個別に強調し、HSIの3次元特徴を乱し、空間的・スペクトル的相関の統一を阻害する。第二に、既存の自己注意機構は、トークンのペア間の相関を学習することで、常にフルランク相関行列を確立し、複数のトークン間でHSIに広く存在する線形依存を記述することができない。これらの問題に対処するために,スペクトル超解像のための新しい Exhaustive correlation Transformer (ECT) を提案する。まず、空間的に連続的な分割戦略とスペクトル的に不連続な分割戦略を統合することにより、空間的スペクトル相関を統一したスペクトル的不連続な3D分割戦略を提案する。第二に、動的に計算された低ランク依存マップを用いて、複数のトークン間の線形依存をキャプチャする動的低ランクマッピング(DLRM)モデルを提案する。統合された空間スペクトルの注意と線形依存を統合することで、われわれのECTはHSI内での徹底的な相関をモデル化できる。シミュレーションデータと実データの両方を用いた実験結果から,本手法が最先端の性能を達成できることが示唆された。コードと事前訓練されたモデルは後日公開される。 Spectral super-resolution that aims to recover hyperspectral image (HSI) from easily obtainable RGB image has drawn increasing interest in the field of computational photography. The crucial aspect of spectral super-resolution lies in exploiting the correlation within HSIs. However, two types of bottlenecks in existing Transformers limit performance improvement and practical applications. First, existing Transformers often separately emphasize either spatial-wise or spectral-wise correlation, disrupting the 3D features of HSI and hindering the exploitation of unified spatial-spectral correlation. Second, existing self-attention mechanism always establishes full-rank correlation matrix by learning the correlation between pairs of tokens, leading to its inability to describe linear dependence widely existing in HSI among multiple tokens. To address these issues, we propose a novel Exhaustive Correlation Transformer (ECT) for spectral super-resolution. 
First, we propose a Spectral-wise Discontinuous 3D (SD3D) splitting strategy, which models unified spatial-spectral correlation by integrating a spatial-wise continuous splitting strategy with a spectral-wise discontinuous splitting strategy. Second, we propose a Dynamic Low-Rank Mapping (DLRM) model, which captures linear dependence among multiple tokens through a dynamically calculated low-rank dependence map. By integrating unified spatial-spectral attention and linear dependence, our ECT can model exhaustive correlation within HSI. Experimental results on both simulated and real data indicate that our method achieves state-of-the-art performance. Code and pretrained models will be made available later.	翻訳日:2024-03-20 02:32:43 公開日:2024-03-18
# ECAMP: エンティティ中心のコンテキスト対応医療ビジョン言語事前トレーニング ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training ( http://arxiv.org/abs/2312.13316v2 ) ライセンス: Link先を確認	Rongsheng Wang, Qingsong Yao, Haoran Lai, Zhiyang He, Xiaodong Tao, Zihang Jiang, S. Kevin Zhou,	(参考訳) 医学的視覚言語による事前訓練の大幅な進歩にもかかわらず、既存の手法は、放射線学レポートにおける固有の実体固有の文脈と、テキストと画像の間の複雑な相互モーダルな文脈関係をほとんど見落としてきた。このギャップを埋めるために、我々は、よりエンティティ中心でコンテキストに敏感な医療データの解釈を可能にするために設計された、エンティティ中心のコンテキスト対応医療ビジョン言語事前学習(ECAMP)フレームワークを提案する。近年の強力な大規模言語モデルを用いて,医療報告からエンティティ中心のコンテキストを抽出し,ECAMPがテキストのモダリティからより効果的な監視を行えるようにした。さらに、慎重に設計されたエンティティ認識、コンテキスト強化されたマスク付き言語モデリング、コンテキスト誘導された超解像タスクでモデルを事前学習することにより、ECAMPはテキストと画像のモダリティ間の相互作用を著しく改善し、エンティティ中心のコンテキスト特徴を抽出する能力が向上する。さらに、提案するマルチスケールコンテキスト融合設計により、粗い画像表現と細かな画像表現のセマンティック統合が向上し、マルチスケールダウンストリームアプリケーションの性能が向上する。これらのコンポーネントを組み合わせることで、現在の最先端の手法よりも大幅にパフォーマンスが向上し、医療画像におけるクロスモダリティ学習の新たな標準を確立します。コードとモデルはhttps://github.com/ToniChopp/ECAMPで入手できる。 Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent entity-specific context within radiology reports and the complex cross-modality contextual relationships between text and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which is designed to enable a more entity-centered and context-sensitive interpretation of medical data. Utilizing the recent powerful large language model, we distill entity-centered context from medical reports, which enables ECAMP to gain more effective supervision from the text modality. By further pre-training our model with carefully designed entity-aware, context-enhanced masked language modeling and context-guided super-resolution tasks, ECAMP significantly refines the interplay between text and image modalities, leading to an enhanced ability to extract entity-centered contextual features. 
In addition, our proposed multi-scale context fusion design improves the semantic integration of coarse- and fine-level image representations, promoting better performance in multi-scale downstream applications. Combining these components leads to significant performance gains over current state-of-the-art methods and establishes a new standard for cross-modality learning in medical imaging. Extensive experiments on classification, segmentation, and detection tasks across several public datasets demonstrate its effectiveness. Code and models are available at https://github.com/ToniChopp/ECAMP.	翻訳日:2024-03-20 02:32:43 公開日:2024-03-18
# Diffusion Reward: 条件付きビデオ拡散による報酬の学習 Diffusion Reward: Learning Rewards via Conditional Video Diffusion ( http://arxiv.org/abs/2312.14134v2 ) ライセンス: Link先を確認	Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu,	(参考訳) エキスパートビデオから報酬を学習することは、強化学習タスクの意図した振る舞いを特定するための、安価で効果的なソリューションを提供する。本研究では,複雑な視覚的RL問題を解くために、条件付きビデオ拡散モデルを用いてエキスパートビデオから報酬を学習する新しいフレームワークであるDiffusion Rewardを提案する。我々の重要な洞察は、専門家の軌道で条件付けされた場合、生成の多様性が低くなるということである。そこでDiffusion Rewardは、条件付きエントロピーの負値として定式化され、専門家に近い行動の生産的な探索を促進する。視覚入力とスパース報酬を用いたMetaWorldとAdroitの10種類のロボット操作タスクにおいて,本手法の有効性を示す。さらに、Diffusion Rewardは未知のタスクも効果的に解決し、ベースラインの手法を大きく上回った。プロジェクトページとコード:https://diffusion-reward.github.io/ Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from expert videos via conditional video diffusion models for solving complex visual RL problems. Our key insight is that lower generative diversity is observed when conditioned on expert trajectories. Diffusion Reward is accordingly formalized as the negative of the conditional entropy, which encourages productive exploration of expert-like behaviors. We show the efficacy of our method on 10 robotic manipulation tasks from MetaWorld and Adroit with visual input and sparse reward. Moreover, Diffusion Reward can even solve unseen tasks successfully and effectively, largely surpassing baseline methods. Project page and code: https://diffusion-reward.github.io/.	翻訳日:2024-03-20 02:32:43 公開日:2024-03-18
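The idea of a reward defined as a negative conditional entropy can be illustrated with a toy sketch. This is not the paper's estimator, which works with a conditional video diffusion model. Here the predictive distribution over candidate next frames is simply assumed to be a small discrete distribution, and the names `entropy_reward`, `expert_like`, and `off_expert` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(predicted_next_frame_probs):
    """Toy reward: the negative entropy of the conditional prediction."""
    return -entropy(predicted_next_frame_probs)


# Hypothetical predictive distributions over four candidate next frames.
expert_like = [0.97, 0.01, 0.01, 0.01]  # confident prediction, low diversity
off_expert = [0.25, 0.25, 0.25, 0.25]   # uncertain prediction, high diversity

# The low-diversity (expert-like) context receives the larger reward.
print(entropy_reward(expert_like) > entropy_reward(off_expert))
```

Under this assumption, histories that resemble expert behavior yield peaked predictions and therefore a higher reward, which matches the abstract's observation about lower generative diversity on expert trajectories.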
# 露光ブラケットは、画像復元と拡張タスクの統合に必要なもの Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks ( http://arxiv.org/abs/2401.00766v2 ) ライセンス: Link先を確認	Zhilu Zhang, Shuohao Zhang, Renlong Wu, Zifei Yan, Wangmeng Zuo,	(参考訳) 低照度環境では、鮮明な内容の高品質な写真を取得することが非常に望ましいが、難しい。マルチイメージ処理手法(バースト、デュアル露光、マルチ露光画像)はこの問題に対処する上で大きな進歩を遂げているが、通常は特定の復元や強化の問題に焦点を合わせており、マルチイメージの活用には不十分である。マルチ露光画像は,分解,分解,高ダイナミックレンジイメージング,高解像度化に相補的であり,露光ブラケット写真を用いて修復作業と強化作業を統合することを提案する。実世界のペアを集めることの難しさから,まず合成ペアデータを用いてモデルを事前学習し,実世界の未ラベル画像に適応させる手法を提案する。特に,時間変調リカレントネットワーク(TMRNet)と自己教師あり適応手法を提案する。さらに,200の夜間シナリオからペアを合成し,実世界の画像を収集するデータシミュレーションパイプラインを構築した。両データセットの実験から,本手法は最先端のマルチイメージ処理に対して良好に動作することが示された。データセット、コード、事前トレーニングされたモデルはhttps://github.com/cszhilu1998/BracketIREで入手できる。 It is highly desired but challenging to acquire high-quality photos with clear content in low-light environments. Although multi-image processing methods (using burst, dual-exposure, or multi-exposure images) have made significant progress in addressing this issue, they typically focus on specific restoration or enhancement problems, being insufficient in exploiting multi-image. Motivated by that multi-exposure images are complementary in denoising, deblurring, high dynamic range imaging, and super-resolution, we propose to utilize exposure bracketing photography to unify restoration and enhancement tasks in this work. Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data and then adapts it to real-world unlabeled images. In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed. Moreover, we construct a data simulation pipeline to synthesize pairs and collect real-world images from 200 nighttime scenarios. Experiments on both datasets show that our method performs favorably against the state-of-the-art multi-image processing ones. 
The dataset, code, and pre-trained models are available at https://github.com/cszhilu1998/BracketIRE.	翻訳日:2024-03-20 02:32:43 公開日:2024-03-18
# GRAM:マルチページVQAのためのグローバル推論 GRAM: Global Reasoning for Multi-Page VQA ( http://arxiv.org/abs/2401.03411v2 ) ライセンス: Link先を確認	Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Elad Ben Avraham, Aviad Aberdam, Shahar Tsiper, Ron Litman,	(参考訳) トランスフォーマーベースの大規模言語モデルの利用が増加し、長いシーケンスを処理するという課題がもたらされる。ドキュメント視覚的質問応答(DocVQA)では、主要な手法は単一ページの設定に焦点を当て、文書は数百ページに及ぶ。計算量の多い事前学習を必要とせずに,事前学習したシングルページモデルを複数ページ設定にシームレスに拡張するGRAMを提案する。そこで我々は,局所的なページレベルの理解にシングルページエンコーダを活用し,それを文書レベルの指定層や学習可能なトークンで拡張し,グローバルな推論のためにページ間の情報の流れを容易にする。そこで本研究では,新たに導入された文書トークンを利用するためのモデルを提案する。復号化の際に,圧縮変換器(C-Former)を用いた任意の圧縮ステージを導入し,符号化シーケンス長を低減し,品質とレイテンシのトレードオフを可能にする。大規模実験では,多ページDocVQAのベンチマークでGRAMの最先端性能を示し,本手法の有効性を実証した。 The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA), leading methods focus on the single-page setting, while documents can span hundreds of pages. We present GRAM, a method that seamlessly extends pre-trained single-page models to the multi-page setting, without requiring computationally-heavy pretraining. To do so, we leverage a single-page encoder for local page-level understanding, and enhance it with document-level designated layers and learnable tokens, facilitating the flow of information across pages for global reasoning. To enforce our model to utilize the newly introduced document tokens, we propose a tailored bias adaptation method. For additional computational savings during decoding, we introduce an optional compression stage using our compression-transformer (C-Former),reducing the encoded sequence length, thereby allowing a tradeoff between quality and latency. Extensive experiments showcase GRAM's state-of-the-art performance on the benchmarks for multi-page DocVQA, demonstrating the effectiveness of our approach.	翻訳日:2024-03-20 02:32:42 公開日:2024-03-18
# TrustLLM: 大規模言語モデルにおける信頼性 TrustLLM: Trustworthiness in Large Language Models ( http://arxiv.org/abs/2401.05561v4 ) ライセンス: Link先を確認	Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao,	(参考訳) ChatGPTによって実証された大規模言語モデル(LLM)は、その優れた自然言語処理能力にかなりの注目を集めている。にもかかわらず、これらのLSMは、特に信頼性の領域において、多くの課題を呈している。したがって、LSMの信頼性を確保することが重要なトピックである。本稿では, LLMにおける信頼度に関する総合的研究であるTrustLLMを紹介し, 信頼性の異なる側面に対する原則, 確立されたベンチマーク, 主要なLCMに対する信頼度の評価, 分析, オープンチャレンジと今後の方向性について議論する。具体的には,まず,8つの異なる次元にまたがる信頼性の高いLCMの原理を提案する。これらの原則に基づいて、真理性、安全性、公正性、堅牢性、プライバシー、機械倫理を含む6つの次元にわたるベンチマークを確立する。次に、30以上のデータセットからなるTrustLLMの16のメインストリームLCMを評価する。まず,一般に信頼性と実用性(機能的有効性)は肯定的に関連していることが示唆された。第2に,プロプライエタリなLDMは信頼性という点で一般的にオープンソースよりも優れており,広くアクセス可能なオープンソースLMの潜在的なリスクに対する懸念が高まっている。しかし、いくつかのオープンソース LLM はプロプライエタリに非常に近いものである。第三に、一部のLSMは信頼性を示すために過度に調整されている可能性がある点に注意する必要がある。最後に、モデル自体だけでなく、信頼性を支える技術においても透明性を確保することの重要性を強調します。採用されている特定の信頼できる技術を知ることは、その有効性を分析するのに不可欠である。 Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. 
Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
# 政策勾配部分空間の同定 Identifying Policy Gradient Subspaces ( http://arxiv.org/abs/2401.06604v3 ) ライセンス: Link先を確認	Jan Schneider, Pierre Schumacher, Simon Guist, Le Chen, Daniel Häufle, Bernhard Schölkopf, Dieter Büchler,	(参考訳) ポリシー勾配法は、複雑な連続制御タスクを解く大きな可能性を秘めている。それでも、最適化問題内の構造を利用することで、トレーニング効率を向上させることができる。最近の研究は、勾配が低次元でゆっくりと変化する部分空間にあるという事実を活用することで教師あり学習を加速できることを示している。本稿では, この現象を, 様々なベンチマークタスクにおける2つの一般的な政策勾配法に対して, 徹底的に評価する。その結果、強化学習に固有のデータ分布が連続的に変化するにもかかわらず、そのような勾配部分空間の存在が示されている。これらの結果は,パラメータ空間探索の改善や2次最適化の実現を通じて,より効率的な強化学習,例えば,今後の研究に向けた有望な方向性を明らかにしている。 Policy gradient methods hold great potential for solving complex continuous control tasks. Still, their training efficiency can be improved by exploiting structure within the optimization problem. Recent work indicates that supervised learning can be accelerated by leveraging the fact that gradients lie in a low-dimensional and slowly-changing subspace. In this paper, we conduct a thorough evaluation of this phenomenon for two popular deep policy gradient methods on various simulated benchmark tasks. Our results demonstrate the existence of such gradient subspaces despite the continuously changing data distribution inherent to reinforcement learning. These findings reveal promising directions for future work on more efficient reinforcement learning, e.g., through improving parameter-space exploration or enabling second-order optimization.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
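A rough way to see what "gradients lie in a low-dimensional subspace" means is to estimate a dominant direction from past gradients and measure how much of a new gradient it captures. The sketch below uses plain power iteration on the sum of gradient outer products as a stand-in; the paper may define its subspace differently, and the gradient vectors here are made-up illustrative values, not results.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def top_direction(gradients, iterations=100):
    """Power iteration on sum_k g_k g_k^T; returns the dominant unit direction."""
    dim = len(gradients[0])
    v = normalize([1.0] * dim)
    for _ in range(iterations):
        # (sum_k g_k g_k^T) v equals sum_k (g_k . v) g_k
        w = [0.0] * dim
        for g in gradients:
            c = dot(g, v)
            for i in range(dim):
                w[i] += c * g[i]
        v = normalize(w)
    return v


def captured_fraction(gradient, direction):
    """Fraction of the squared gradient norm lying along a unit direction."""
    return dot(gradient, direction) ** 2 / dot(gradient, gradient)


# Hypothetical past gradients that mostly vary along one direction.
past_gradients = [
    [1.0, 1.0, 0.02, 0.0],
    [2.0, 2.1, 0.0, -0.03],
    [-1.5, -1.4, 0.01, 0.02],
]
direction = top_direction(past_gradients)

new_gradient = [0.8, 0.9, 0.05, -0.01]       # similar to the past gradients
unrelated_gradient = [0.0, 0.0, 1.0, 0.0]    # orthogonal to the main direction

print(captured_fraction(new_gradient, direction))
print(captured_fraction(unrelated_gradient, direction))
```

A captured fraction close to 1 for new gradients would indicate that a one-dimensional subspace already explains most of the update, which is the kind of structure that subspace-aware optimizers could exploit.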
# 拡散モデルを用いたキーポイント誘導変形画像マニピュレーション Key-point Guided Deformable Image Manipulation Using Diffusion Model ( http://arxiv.org/abs/2401.08178v2 ) ライセンス: Link先を確認	Seok-Hwan Oh, Guil Jung, Myeong-Gee Kim, Sang-Yun Kim, Young-Min Kim, Hyeon-Jik Lee, Hyuk-Sool Kwon, Hyeon-Min Bae,	(参考訳) 本稿では,オブジェクトのキーポイントを操作することで画像を精密に制御するキーポイント誘導拡散確率モデル(KDM)を提案する。中間出力としてオプティカルフローマップを組み込んだ2段階生成モデルを提案する。これにより、画像とスパースなキーポイントの間の意味的関係について高密度な画素単位の理解が得られ、より現実的な画像生成につながる。さらに、オプティカルフローの統合は、連続画像のフレーム間分散の制御に役立ち、自然な連続画像生成を実現する。KDMは、顔画像生成、人間のポーズ合成、心エコー動画予測など、さまざまなキーポイント条件付き画像合成タスクで評価され、最先端のモデルと比較して一貫性が高くフォトリアリスティックな画像を生成することが示された。 In this paper, we introduce a Key-point-guided Diffusion probabilistic Model (KDM) that gains precise control over images by manipulating the object's key points. We propose a two-stage generative model that incorporates an optical flow map as an intermediate output. This yields a dense pixel-wise understanding of the semantic relation between the image and the sparse key points, leading to more realistic image generation. Additionally, the integration of optical flow helps regulate the inter-frame variance of sequential images, enabling authentic sequential image generation. KDM is evaluated on diverse key-point-conditioned image synthesis tasks, including facial image generation, human pose synthesis, and echocardiography video prediction, demonstrating that it produces more consistent and photo-realistic images than state-of-the-art models.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
# 初期の熱帯性サイクロンの増強に伴う3次元放射パターンの同定 Identifying Three-Dimensional Radiative Patterns Associated with Early Tropical Cyclone Intensification ( http://arxiv.org/abs/2401.09493v3 ) ライセンス: Link先を確認	Frederick Iat-Hin Tam, Tom Beucler, James H. Ruppert Jr,	(参考訳) 雲の放射フィードバックは初期の熱帯性サイクロン(TC)の増強に影響を及ぼすが、既存の診断フレームワークの制限により、非対称または過渡的な放射熱の研究には適さない。本稿では, 実数値シミュレーションTCの表面強度と放射の隠れ関係を学習するための線形変分エンコーダ(VED)を提案する。 VEDモデル入力の制限により、その不確実性を利用して、放射線が強度を高めるためにより重要となる期間を特定することができる。抽出した3次元放射構造を綿密に調べたところ、内核深部対流と浅部雲からの長波放射強制力はともに強度に寄与し、深部対流は全体的に最も影響が大きいことが示唆された。浅層雲の深い対流下風は、ハイヤンの激化に欠かせない。我々の研究は、機械学習が軸対称的あるいは決定論的仮定に頼ることなく熱力学的関係を発見できることを示し、現実的な条件下でTCの強化につながるプロセスの客観的発見への道を開いた。 Cloud radiative feedback impacts early tropical cyclone (TC) intensification, but limitations in existing diagnostic frameworks make them unsuitable for studying asymmetric or transient radiative heating. We propose a linear Variational Encoder-Decoder (VED) to learn the hidden relationship between radiation and the surface intensification of realistic simulated TCs. Limiting VED model inputs enables using its uncertainty to identify periods when radiation has more importance for intensification. A close examination of the extracted 3D radiative structures suggests that longwave radiative forcing from inner core deep convection and shallow clouds both contribute to intensification, with the deep convection having the most impact overall. We find that deep convection downwind of the shallow clouds is critical to the intensification of Haiyan. Our work demonstrates that machine learning can discover thermodynamic-kinematic relationships without relying on axisymmetric or deterministic assumptions, paving the way towards the objective discovery of processes leading to TC intensification in realistic conditions.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
# Hybrid-Task Meta-Learning: スケーラブルで転送可能な帯域割り当てのためのグラフニューラルネットワークアプローチ Hybrid-Task Meta-Learning: A Graph Neural Network Approach for Scalable and Transferable Bandwidth Allocation ( http://arxiv.org/abs/2401.10253v2 ) ライセンス: Link先を確認	Xin Hao, Changyang She, Phee Lep Yeoh, Yuhong Liu, Branka Vucetic, Yonghui Li,	(参考訳) 本稿では,1) ユーザ数に対してスケーラブルであり,2) 非定常無線チャネル,異なるQoS(Quality-of-Service)要件,動的に利用可能なリソースなど,さまざまな通信シナリオに転送可能な,深層学習に基づく帯域割り当てポリシーを開発する。スケーラビリティをサポートするために、帯域割り当てポリシーは、ユーザ数に応じてトレーニングパラメータの数が変化しないグラフニューラルネットワーク(GNN)によって表現される。GNNの一般化を実現するために,GNNの初期パラメータをメタトレーニング中に異なる通信シナリオで訓練するハイブリッドタスクメタ学習(HML)アルゴリズムを開発した。次に、メタテストの間、少数のサンプルを使用して、未知の通信シナリオでGNNを微調整する。シミュレーションの結果、我々のHMLアプローチは、既存のベンチマークと比較して、初期性能を$8.79\%$、サンプリング効率を$73\%$改善できることが示されている。微調整後、我々の近最適GNNベースのポリシーは、反復最適化を用いて得られる最適ポリシーと比較して、はるかに低い推論複雑性でほぼ同じ報酬を達成することができる。 In this paper, we develop a deep learning-based bandwidth allocation policy that is: 1) scalable with the number of users and 2) transferable to different communication scenarios, such as non-stationary wireless channels, different quality-of-service (QoS) requirements, and dynamically available resources. To support scalability, the bandwidth allocation policy is represented by a graph neural network (GNN), with which the number of training parameters does not change with the number of users. To enable the generalization of the GNN, we develop a hybrid-task meta-learning (HML) algorithm that trains the initial parameters of the GNN with different communication scenarios during meta-training. Next, during meta-testing, a few samples are used to fine-tune the GNN with unseen communication scenarios. Simulation results demonstrate that our HML approach can improve the initial performance by $8.79\%$, and sampling efficiency by $73\%$, compared with existing benchmarks. 
After fine-tuning, our near-optimal GNN-based policy can achieve close to the same reward with much lower inference complexity compared to the optimal policy obtained using iterative optimization.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
# CMMMU:中国の大規模多分野マルチモーダル理解ベンチマーク CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark ( http://arxiv.org/abs/2401.11944v2 ) ライセンス: Link先を確認	Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu,	(参考訳) 大規模マルチモーダルモデル(LMM)の性能向上が進むにつれ,LMMの性能評価の必要性が高まっている。さらに、中国語のような非英語の文脈において、LMMの高度な知識と推論能力を評価するには、さらに大きなギャップがある。 CMMMUは、中国における大学レベルの教科知識と意図的推論を必要とするタスクにおいて、LMMを評価するために設計された、中国の大規模多分野マルチモーダル理解(Multimodal Understanding)ベンチマークである。 CMMMUはMMMUのアノテーションと分析パターンにインスパイアされ、厳密に従っている。 CMMMUは、大学試験、クイズ、教科書から12kの質問を手作業で収集し、アート&デザイン、ビジネス、サイエンス、ヘルス&メディカル、人文科学、テクノロジー&エンジニアリングの6つの中核分野をカバーしている。これらの質問は30の被験者に及び、図、図、地図、テーブル、音楽シート、化学構造など、39の非常に異質なイメージタイプで構成されている。 CMMMUは、中国語の文脈における複雑な認識とドメイン固有の知識による推論に焦点を当てている。我々は11個のオープンソースLCMと1つのプロプライエタリなGPT-4V(ision)を評価した。 GPT-4Vでさえ42%の精度しか達成せず、改善の余地が大きいことを示している。 CMMMUは、人工知能の専門家に向けて次世代のLMMを構築するためのコミュニティを強化し、多様な言語コンテキストを提供することでLMMの民主化を促進する。 As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context. CMMMU is inspired by and strictly follows the annotation and analysis pattern of MMMU. CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. 
These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. CMMMU focuses on complex perception and reasoning with domain-specific knowledge in the Chinese context. We evaluate 11 open-source LLMs and one proprietary model, GPT-4V(ision). Even GPT-4V achieves an accuracy of only 42%, indicating substantial room for improvement. CMMMU will help the community build next-generation LMMs towards expert artificial intelligence and promote the democratization of LMMs by providing diverse language contexts.	翻訳日:2024-03-20 02:22:38 公開日:2024-03-18
# 非定常コヒーレント光波の幾何学的位相:動的位相に調和した非線形発展 Geometric phase for a nonstatic coherent light-wave: nonlinear evolution harmonized with the dynamical phase ( http://arxiv.org/abs/2401.12560v2 ) ライセンス: Link先を確認	Jeong Ryeol Choi,	(参考訳) 静環境下で発生する非定常コヒーレント光波の幾何位相の特性を種々の角度から解析した。幾何学的位相は規則的な非線形な方法で変化し、その変化の中心は時間とともに常に増加する。この結果は、周期波の崩壊と膨張が幾何学的位相の進化に与える影響によるものである。このような幾何学的位相と動的位相との調和は、全位相を非定常性の度合いに依存するユニークなパターンで進化させる。総相は、波の非定常性に対する幾何学的位相の強い応答のため、その進化において周期的に析出する、極端な非定常性の場合の特異な挙動を示す。コヒーレント状態の幾何学的位相がフォック状態の位相よりも顕著であることが確認された。波の非定常性が消える単純な場合、幾何相の記述は、もはや周期的変化を起こさないよく知られた従来のものへと回復する。慣れ親しんだ力学相は、ハミルトニアンの期待値にのみ関係しているが、我々が管理した幾何学相は、量子状態の進化における微妙な非定常性差を反映している。 Properties of the geometric phase for a nonstatic coherent light-wave arisen in a static environment are analyzed from various angles. The geometric phase varies in a regular nonlinear way, where the center of its variation increases constantly with time. This consequence is due to the effects of the periodic wave collapse and expansion on the evolution of the geometric phase. Harmonization of such a geometric-phase evolution with the dynamical phase makes the total phase evolve with a unique pattern that depends on the degree of nonstaticity. The total phase exhibits a peculiar behavior for the case of extreme nonstaticity, which is that it precipitates periodically in its evolution, owing to a strong response of the geometric phase to the wave nonstaticity. It is confirmed that the geometric phase in the coherent state is mostly more prominent compared to that in the Fock states. For a simple case where the wave nonstaticity disappears, our description of the geometric phase recovers to the well-known conventional one which no longer undergoes periodical change. While the familiar dynamical phase is just related to the expectation value of the Hamiltonian, the geometric phase that we have managed reflects a delicate nonstaticity difference in the evolution of quantum states.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# SpeechDPR: オープンドメイン音声質問応答のためのエンドツーエンド音声パッセージ検索 SpeechDPR: End-to-End Spoken Passage Retrieval for Open-Domain Spoken Question Answering ( http://arxiv.org/abs/2401.13463v2 ) ライセンス: Link先を確認	Chyi-Jiunn Lin, Guan-Ting Lin, Yung-Sung Chuang, Wei-Lun Wu, Shang-Wen Li, Abdelrahman Mohamed, Hung-yi Lee, Lin-shan Lee,	(参考訳) SQA(Spoken Question Answering)は、与えられた音声パッセージ内から回答区間を見つけることで、機械がユーザの質問に応答するために不可欠である。SQAは、認識エラーや語彙外(OOV)の問題を避けるために、これまでASRなしで実現されてきた。しかし,音声アーカイブから回答を含む可能性のあるパッセージをまず機械が検索する必要がある、オープンドメインSQA(openSQA)という現実的な問題はこれまで考慮されてこなかった。本稿では,openSQA問題の検索コンポーネントのための、最初のエンドツーエンドフレームワークであるSpeech Dense Passage Retriever(SpeechDPR)を提案する。SpeechDPRは、教師なしASR(UASR)とテキスト密検索器(TDR)のカスケードモデルから知識を蒸留することにより、文レベルの意味表現を学習する。人手で書き起こした音声データは不要である。初期実験では、UASRとTDRのカスケードモデルに匹敵する性能を示し、UASRの性能が低い場合には大幅に上回り、この手法が音声認識エラーに対してより堅牢であることを確認した。 Spoken Question Answering (SQA) is essential for machines to reply to a user's question by finding the answer span within a given spoken passage. SQA has previously been achieved without ASR to avoid recognition errors and Out-of-Vocabulary (OOV) problems. However, the real-world problem of Open-domain SQA (openSQA), in which the machine must additionally first retrieve passages that possibly contain the answer from a spoken archive, has not been considered. This paper proposes the first known end-to-end framework, Speech Dense Passage Retriever (SpeechDPR), for the retrieval component of the openSQA problem. SpeechDPR learns a sentence-level semantic representation by distilling knowledge from the cascading model of unsupervised ASR (UASR) and text dense retriever (TDR). No manually transcribed speech data is needed. Initial experiments showed performance comparable to the cascading model of UASR and TDR, and significantly better when UASR was poor, verifying that this approach is more robust to speech recognition errors.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# 変形性注意の蒸留学習による自己教師付きビデオオブジェクトセグメンテーション Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention ( http://arxiv.org/abs/2401.13937v2 ) ライセンス: Link先を確認	Quang-Trung Truong, Duc Thanh Nguyen, Binh-Son Hua, Sai-Kit Yeung,	(参考訳) ビデオオブジェクトセグメンテーションはコンピュータビジョンの基本的な研究課題である。最近の技術は、ビデオシーケンスからのオブジェクト表現学習にしばしば注意機構を適用している。しかし、ビデオデータの時間的変化により、アテンションマップはビデオフレーム間の関心の対象とうまく一致せず、長時間のビデオ処理においてエラーが蓄積される可能性がある。さらに、既存の技術は複雑なアーキテクチャを利用しており、高い計算複雑性を必要としているため、低出力デバイスにビデオオブジェクトのセグメンテーションを統合する能力は制限されている。これらの課題に対処するために,変形性注意の蒸留学習に基づく自己教師型ビデオオブジェクトセグメンテーションを提案する。具体的には、時間的変化に効果的に適用可能な、ビデオオブジェクトセグメンテーションのための軽量なアーキテクチャを考案する。これは、アテンションモジュール内のビデオシーケンスのメモリをキャプチャするキーと値が、フレーム間でフレキシブルな位置を更新する、変形可能なアテンション機構によって実現される。したがって、学習対象表現は空間次元と時間次元の両方に適応する。提案手法は, 変形性アテンションマップを蒸留損失に組み込んだ新しい知識蒸留パラダイムを用いて, 自己指導型アーキテクチャを訓練する。 DAVIS 2016/2017 や YouTube-VOS 2018/2019 などのベンチマークデータセット上で,本手法を質的に定量的に評価し,既存の手法と比較した。実験により,本手法が達成した最先端性能と最適なメモリ使用量により,本手法の優位性を検証した。 Video object segmentation is a fundamental research problem in computer vision. Recent techniques have often applied attention mechanism to object representation learning from video sequences. However, due to temporal changes in the video data, attention maps may not well align with the objects of interest across video frames, causing accumulated errors in long-term video processing. In addition, existing techniques have utilised complex architectures, requiring highly computational complexity and hence limiting the ability to integrate video object segmentation into low-powered devices. To address these issues, we propose a new method for self-supervised video object segmentation based on distillation learning of deformable attention. Specifically, we devise a lightweight architecture for video object segmentation that is effectively adapted to temporal changes. 
This is enabled by a deformable attention mechanism, in which the keys and values that capture the memory of a video sequence in the attention module have flexible locations that are updated across frames. The learnt object representations are thus adaptive to both the spatial and temporal dimensions. We train the proposed architecture in a self-supervised fashion through a new knowledge distillation paradigm where deformable attention maps are integrated into the distillation loss. We qualitatively and quantitatively evaluate our method and compare it with existing methods on benchmark datasets including DAVIS 2016/2017 and YouTube-VOS 2018/2019. Experimental results verify the superiority of our method through its state-of-the-art performance and optimal memory usage.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# マルチモーダルパス:他のモーダルからの無関係データによるトランスフォーマーの改善 Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ( http://arxiv.org/abs/2401.14405v2 ) ライセンス: Link先を確認	Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue,	(参考訳) 音声やポイントクラウドのデータセットを用いたImageNetモデルの改善など、他のモダリティから無関係なデータを用いて、特定のモダリティの変換器を改善することを提案する。対象のモダリティのデータサンプルが他のモダリティとは無関係であることを強調したい。これは、ペア(例えば、CLIP)や異なるモダリティのインターリーブされたデータを利用する他の作業との違いである。目的のモダリティとそれ用に設計されたトランスフォーマーを前提として、他のモダリティのデータで訓練された補助トランスフォーマーを用いて、2つのモデルのコンポーネントを接続し、目的のモダリティのデータを両モデルで処理できるように構成する手法を提案する。このようにして、2つのモードから得られる変換器の普遍的なシーケンス・ツー・シーケンス・モデリング能力を利用する。具体的実装として、モーダリティ特化トークンとタスク特化ヘッドを用いるが、提案手法であるクロスモーダル再パラメータ化(Cross-Modal Re-parameterization)により補助モデルの変圧ブロックを利用する。画像,ポイントクラウド,ビデオ,および音声認識タスクでは,他のモダリティから無関係なデータを用いて,顕著かつ一貫したパフォーマンス向上を観察する。コードとモデルはhttps://github.com/AILab-CVC/M2PTで公開されている。 We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets. We would like to highlight that the data samples of the target modality are irrelevant to the other modalities, which distinguishes our method from other works utilizing paired (e.g., CLIP) or interleaved data of different modalities. We propose a methodology named Multimodal Pathway - given a target modality and a transformer designed for it, we use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models so that data of the target modality can be processed by both models. In this way, we utilize the universal sequence-to-sequence modeling abilities of transformers obtained from two modalities. As a concrete implementation, we use a modality-specific tokenizer and task-specific head as usual but utilize the transformer blocks of the auxiliary model via a proposed method named Cross-Modal Re-parameterization, which exploits the auxiliary weights without any inference costs. 
On the image, point cloud, video, and audio recognition tasks, we observe significant and consistent performance improvements with irrelevant data from other modalities. The code and models are available at https://github.com/AILab-CVC/M2PT.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# ドメインの一般化を理解する:騒音のロバスト性の観点から Understanding Domain Generalization: A Noise Robustness Perspective ( http://arxiv.org/abs/2401.14846v2 ) ライセンス: Link先を確認	Rui Qiao, Bryan Kian Hsiang Low,	(参考訳) ドメイン一般化(DG)のための機械学習アルゴリズムの急速な開発にもかかわらず、既存のDGアルゴリズムが標準ベンチマークにおける古典的経験的リスク最小化(ERM)よりも優れているという明確な実証的証拠はない。この現象をよりよく理解するために,ラベルノイズのレンズによるEMM上のDGアルゴリズムの利点について検討する。具体的には, 有限サンプル解析により, ラベルノイズがERMの急激な相関効果を悪化させ, 一般化を損なうことが明らかとなった。逆に、DGアルゴリズムは、刺激的な相関が存在する場合でも、有限サンプルトレーニング中に暗黙のラベルノイズロバスト性を示すことを示す。このような望ましい性質は、素早い相関を緩和し、合成実験における一般化を改善するのに役立つ。しかし、実世界のベンチマークデータセットに関する追加の包括的な実験は、ラベルノイズの頑健さがEMMよりも優れたパフォーマンスに必ずしも変換されないことを示している。我々は,スプリアス相関から生じるERMの故障モードが,実際にはあまり顕著でないと推測する。 Despite the rapid development of machine learning algorithms for domain generalization (DG), there is no clear empirical evidence that the existing DG algorithms outperform the classic empirical risk minimization (ERM) across standard benchmarks. To better understand this phenomenon, we investigate whether there are benefits of DG algorithms over ERM through the lens of label noise. Specifically, our finite-sample analysis reveals that label noise exacerbates the effect of spurious correlations for ERM, undermining generalization. Conversely, we illustrate that DG algorithms exhibit implicit label-noise robustness during finite-sample training even when spurious correlation is present. Such desirable property helps mitigate spurious correlations and improve generalization in synthetic experiments. However, additional comprehensive experiments on real-world benchmark datasets indicate that label-noise robustness does not necessarily translate to better performance compared to ERM. We conjecture that the failure mode of ERM arising from spurious correlations may be less pronounced in practice.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# 固有値補正によるスペクトルグラフニューラルネットワークの表現力向上 Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction ( http://arxiv.org/abs/2401.15603v2 ) ライセンス: Link先を確認	Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, Tat-Seng Chua,	(参考訳) 近年,多項式フィルタを特徴とするスペクトルグラフニューラルネットワークが注目度を高め,ノード分類などのタスクにおいて顕著な性能を発揮している。これらのモデルは典型的には正規化ラプラス行列の固有値が互いに異なると仮定し、多項式フィルタが高い適合性を持つことを期待する。しかし、本論文は、正規化されたラプラシア行列が繰り返し固有値を持つことを実証的に観察する。さらに、スペクトルグラフニューラルネットワークの表現力を決定する上で、識別可能な固有値の数が重要な役割を担っていることを理論的に証明する。そこで本研究では,繰り返し入力される固有値の制約から多項式フィルタを解放する固有値補正手法を提案する。具体的には、提案した固有値補正戦略により固有値の均一分布が向上し、繰り返し固有値を緩和し、多項式フィルタの適合能力と表現力を向上させる。人工と実世界の両方のデータセットに対する大規模な実験結果から,本手法の優位性が確認された。 In recent years, spectral graph neural networks, characterized by polynomial filters, have garnered increasing attention and have achieved remarkable performance in tasks such as node classification. These models typically assume that eigenvalues for the normalized Laplacian matrix are distinct from each other, thus expecting a polynomial filter to have a high fitting ability. However, this paper empirically observes that normalized Laplacian matrices frequently possess repeated eigenvalues. Moreover, we theoretically establish that the number of distinguishable eigenvalues plays a pivotal role in determining the expressive power of spectral graph neural networks. In light of this observation, we propose an eigenvalue correction strategy that can free polynomial filters from the constraints of repeated eigenvalue inputs. Concretely, the proposed eigenvalue correction strategy enhances the uniform distribution of eigenvalues, thus mitigating repeated eigenvalues, and improving the fitting capacity and expressive power of polynomial filters. Extensive experimental results on both synthetic and real-world datasets demonstrate the superiority of our method.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# LeTO:微分軌道最適化による制約付きビジュモータ政策の学習 LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory Optimization ( http://arxiv.org/abs/2401.17500v2 ) ライセンス: Link先を確認	Zhengtong Xu, Yu She,	(参考訳) 本稿では,可微分軌道最適化による制約付きビジュモータポリシーの学習手法であるLeTOを紹介する。当社のアプローチでは,ニューラルネットワークに微分可能な最適化レイヤを独自に統合しています。最適化層を軌道最適化問題として定式化することにより、モデルが余分なモジュールなしで安全かつ制御された方法でアクションをエンド・ツー・エンドに生成できるようにする。本手法は,訓練過程に制約情報を導入することで,制約の充足,軌道の平滑化,デモンストレーションとの誤差の最小化という訓練目標のバランスをとることができる。この"グレーボックス"メソッドは、最適化に基づく安全性と解釈性を、ニューラルネットワークの強力な表現能力とマージする。シミュレーションおよび実ロボット上でLeTOを定量的に評価する。シミュレーションでは、LeTOは最先端の模倣学習手法に匹敵する成功率を達成するが、生成された軌道は不確実性が少なく、高品質で、より滑らかである。実世界の実験では、制約クリティカルなタスクを処理するためにLeTOをデプロイしました。その結果,最先端の模倣学習手法と比較してLeTOの有効性が示された。コードをhttps://github.com/ZhengtongXu/LeTOでリリースします。 This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to generate actions end-to-end in a safe and controlled fashion without extra modules. Our method allows for the introduction of constraint information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries the optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on a real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories are of less uncertainty, higher quality, and smoother. In real-world experiments, we deployed LeTO to handle constraint-critical tasks. The results show the effectiveness of LeTO compared with state-of-the-art imitation learning approaches. We release our code at https://github.com/ZhengtongXu/LeTO.	翻訳日:2024-03-20 02:12:30 公開日:2024-03-18
# S-Agents: オープンエンド環境における自己組織化エージェント S-Agents: Self-organizing Agents in Open-ended Environments ( http://arxiv.org/abs/2402.04578v3 ) ライセンス: Link先を確認	Jiaqi Chen, Yuxian Jiang, Jiachen Lu, Li Zhang,	(参考訳) 大規模言語モデル(LLM)を活用することで、自律エージェントは大幅に改善され、さまざまなタスクを処理できるようになった。オープンエンド設定では、効率と有効性のためのコラボレーションの最適化は柔軟な調整を必要とする。それにもかかわらず、現在の研究は主に、固定されたタスク指向のワークフローを強調し、エージェント中心の組織構造を見落としている。人間の組織行動からインスピレーションを得て,動的ワークフローのための「エージェントツリー」構造を備えた自己組織化エージェントシステム(S-Agents),情報優先順位のバランスをとる「時間ガラスエージェントアーキテクチャ」,エージェント間の非同期タスク実行を可能にする「非障害物協調」手法を導入する。この構造はエージェントのグループを自律的に調整することができ、人間の介入なしにオープンで動的な環境の課題に効率的に対処することができる。実験の結果,S-AgentsはMinecraft環境において協調的な建築作業と資源収集を行い,その効果を検証した。 Leveraging large language models (LLMs), autonomous agents have significantly improved, gaining the ability to handle a variety of tasks. In open-ended settings, optimizing collaboration for efficiency and effectiveness demands flexible adjustments. Despite this, current research mainly emphasizes fixed, task-oriented workflows and overlooks agent-centric organizational structures. Drawing inspiration from human organizational behavior, we introduce a self-organizing agent system (S-Agents) with a "tree of agents" structure for dynamic workflow, an "hourglass agent architecture" for balancing information priorities, and a "non-obstructive collaboration" method to allow asynchronous task execution among agents. This structure can autonomously coordinate a group of agents, efficiently addressing the challenges of open and dynamic environments without human intervention. Our experiments demonstrate that S-Agents proficiently execute collaborative building tasks and resource collection in the Minecraft environment, validating their effectiveness.	翻訳日:2024-03-20 02:02:06 公開日:2024-03-18
# EcoVal: 機械学習のための効率的なデータ評価フレームワーク EcoVal: An Efficient Data Valuation Framework for Machine Learning ( http://arxiv.org/abs/2402.09288v3 ) ライセンス: Link先を確認	Ayush K Tarun, Vikram S Chundawat, Murari Mandal, Hong Ming Tan, Bowei Chen, Mohan Kankanhalli,	(参考訳) 機械学習ワークフローにおけるデータの価値の定量化は、機械学習イニシアチブにおいて、より戦略的決定を下す上で重要な役割を果たす。機械学習におけるデータバリュエーションのための既存のShapley値ベースのフレームワークは、Shapley値を得るためにモデルの繰り返しトレーニングを必要とするため、計算コストがかかる。本稿では,機械学習モデルにおけるデータの価値を,高速かつ実用的な方法で推定する,効率的なデータアセスメントフレームワークであるEcoValを紹介する。個々のデータサンプルを直接扱う代わりに、類似したデータポイントのクラスタの値を決定します。この値は、全てのメンバークラスタポイントの間でさらに伝播する。本研究では,各データの内在的および外在的値を推定することにより,全体のデータ値を決定することができることを示す。これは、伝統的な自由経済市場において、労働や資本といった要因に基づいて出力の量を推定するために一般的に使用される概念である「生産関数(production function)」としてモデルのパフォーマンスを定式化することで実現される。評価手法の正式な証明を提供し、その性能を加速する原理とメカニズムを解明する。本研究では,本手法の実際の適用性を,分布内データとサンプル外データの両方に対して有効性を示すことによって実証する。この研究は、機械学習モデルにおいて、大規模にスケールした効率的なデータバリュエーションのコア課題の1つに対処する。 Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require a considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an efficient data valuation framework, EcoVal, to estimate the value of data for machine learning models in a fast and practical manner. Instead of directly working with individual data samples, we determine the value of a cluster of similar data points. This value is further propagated amongst all the member cluster points. We show that the overall data value can be determined by estimating the intrinsic and extrinsic value of each data point. This is enabled by formulating the performance of a model as a production function, a concept which is popularly used to estimate the amount of output based on factors like labor and capital in a traditional free economic market. We provide a formal proof of our valuation technique and elucidate the principles and mechanisms that enable its accelerated performance. We demonstrate the real-world applicability of our method by showcasing its effectiveness for both in-distribution and out-of-sample data. This work addresses one of the core challenges of efficient data valuation at scale in machine learning models.	翻訳日:2024-03-20 02:02:06 公開日:2024-03-18
# HyperAgent: 複雑な環境のためのシンプルでスケーラブルで効率的かつ証明可能な強化学習フレームワーク HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments ( http://arxiv.org/abs/2402.10228v3 ) ライセンス: Link先を確認	Yingru Li, Jiawei Xu, Lei Han, Zhi-Quan Luo,	(参考訳) 資源制約下での複雑なタスクを解決するためには、強化学習(RL)エージェントは単純で効率的でスケーラブルで、(1)大きな状態空間と(2)相互作用データの連続的な蓄積に対処する必要がある。ハイパーモデルとインデックスサンプリング方式を特徴とするRLフレームワークHyperAgentを提案する。これにより,共役性を必要とせずに,一般価値関数に関連付けられた事後分布の計算効率の高いインクリメンタル近似と,データ効率のよい行動選択が可能になる。 HyperAgentの実装は簡単で、Double-DQNに必要なモジュールをひとつ追加するだけでよい。 HyperAgentは、大規模なディープRLベンチマークで堅牢なパフォーマンスを提供する最初の方法であり、証明可能なスケーラブルなステップ毎の計算複雑性を実現し、表の仮定の下でサブ線形後悔を実現する。 HyperAgentは、問題のサイズに対して最適にスケールするエピソード数でDeep Seaの困難な探索問題を解決でき、Atariベンチマークにおいてデータと計算の両面で大幅な効率向上を示す。理論解析の核となるのは、逐次事後近似の議論であり、これはジョンソン-リンデンシュトラウスの補題の非自明なマーチンゲール拡張である、逐次ランダム射影のための最初の解析ツールによって可能になった。この研究はRLの理論的および実践的な領域を橋渡しし、RLアルゴリズム設計の新しいベンチマークを確立した。 To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.	翻訳日:2024-03-20 02:02:06 公開日:2024-03-18
# Fabry-Perot微小キャビティにおける量子ドットからのフィルタフリー高性能単一光子放射 Filter-free high-performance single photon emission from a quantum dot in a Fabry-Perot microcavity ( http://arxiv.org/abs/2402.11623v2 ) ライセンス: Link先を確認	Zhixuan Rao, Jiawei Yang, Changkun Song, Mujie Rao, Ziyang Zheng, Luyu Liu, Xuebin Peng, Ying Yu, Siyuan Yu,	(参考訳) 共鳴励起とPurcell-enhanced single quantum dots (QDs) を組み合わせることは、高性能な固体単一光子源を実現するための重要な戦略である。しかし、光子効率を最適化するには、励起レーザーをQDの放出から効果的に分離する際の課題に対処する必要がある。伝統的に、これは偏光フィルタリングを含み、達成可能な偏光方向とフォトニック状態のスケーラビリティを制限する。本研究では, モノリシックファブリペロマイクロキャビティと決定的に結合したQDの空間直交共振励起を用いて, この問題に対処した。膜キャビティ構造を利用して, フィルタのない単一光子共鳴蛍光を実現した。得られた光源は、高い抽出効率が0.87、純度が0.9045(4)、識別性が0.963(4)の単一光子を同時に生成する。 Combining resonant excitation with Purcell-enhanced single quantum dots (QDs) stands out as a prominent strategy for realizing high performance solid-state single photon sources. However, optimizing photon efficiency requires addressing challenges associated with effectively separating the excitation laser from QDs' emission. Traditionally, this involves polarization filtering, which limits the achievable polarization directions and the scalability of photonic states. In this study, we have successfully tackled this challenge by employing spatially-orthogonal resonant excitation of QDs, deterministically coupled to monolithic Fabry-Perot microcavities. Leveraging the membrane cavity structures, we have achieved filter-free single photon resonant fluorescence. The resulting source produces single photons with a simultaneous high extraction efficiency of 0.87, purity of 0.9045(4), and indistinguishability of 0.963(4).	翻訳日:2024-03-20 02:02:06 公開日:2024-03-18
# 自動車運転のための大規模言語モデルに基づくハイブリッド推論 Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving ( http://arxiv.org/abs/2402.13602v3 ) ライセンス: Link先を確認	Mehdi Azarafza, Mojtaba Nayyeri, Charles Steinmetz, Steffen Staab, Achim Rettberg,	(参考訳) 大規模言語モデル(LLM)は、テキストや画像を理解し、人間に似たテキストを生成し、複雑な推論タスクを実行する能力において、大きな注目を集めている。しかし、この先進的な推論を動的状況における意思決定のための自然言語テキストの組み合わせで一般化するには、さらなる探索が必要である。本研究では,LLMが算術的推論と常識的推論の組み合わせ,特に自律運転シナリオにおいてどの程度うまく適応できるかを考察する。 LLMのハイブリッド推論能力は、検出された物体やセンサデータを分析し、運転規則や物理法則を理解し、追加のコンテキストを提供することによって、自律運転を改善することができると仮定する。これは、従来の手法では不十分になりうる、(気象条件による)視界の低い状況での判断のような複雑なシナリオに対処する。我々は,CARLA内の人間生成の真実と比較し,その精度に基づいてLarge Language Models(LLMs)を評価した。その結果、LLMに画像(検出対象物)とセンサーデータを組み合わせると、様々な天候条件下での自動運転車のブレーキやスロットル制御の正確な情報が得られることがわかった。この定式化と回答は自動操縦システムの意思決定に役立てることができる。 Large Language Models (LLMs) have garnered significant attention for their ability to understand text and images, generate human-like text, and perform complex reasoning tasks. However, their ability to generalize this advanced reasoning with a combination of natural language text for decision-making in dynamic situations requires further exploration. In this study, we investigate how well LLMs can adapt and apply a combination of arithmetic and common-sense reasoning, particularly in autonomous driving scenarios. We hypothesize that LLMs' hybrid reasoning abilities can improve autonomous driving by enabling them to analyze detected objects and sensor data, understand driving regulations and physical laws, and offer additional context. This addresses complex scenarios, like decisions in low visibility (due to weather conditions), where traditional methods might fall short. We evaluated Large Language Models (LLMs) based on accuracy by comparing their answers with human-generated ground truth inside CARLA. The results showed that when a combination of images (detected objects) and sensor data is fed into the LLM, it can offer precise information for brake and throttle control in autonomous vehicles across various weather conditions. This formulation and answers can assist in decision-making for auto-pilot systems.	翻訳日:2024-03-20 01:52:05 公開日:2024-03-18
# KorNAT:韓国の社会価値と共通知識のためのLLMアライメントベンチマーク KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge ( http://arxiv.org/abs/2402.13605v3 ) ライセンス: Link先を確認	Jiyoung Lee, Minwoo Kim, Seungho Kim, Junghwan Kim, Seunghyun Won, Hwaran Lee, Edward Choi,	(参考訳) 大きな言語モデル(LLM)が特定の国に効果的に展開されるためには、その国の文化と基本的な知識を理解する必要がある。この目的のために,社会価値アライメントと共通知識アライメントという2つの側面から,LLMと対象国間のアライメントを測定する全国アライメントを導入する。社会的価値のアライメントは、モデルがいかに国家固有の社会的価値を理解するかを評価する一方、共通の知識のアライメントは、モデルが国家に関連する基本的な知識をいかに捉えるかを調べる。我々は韓国と国家の整合性を測定する最初のベンチマークであるKorNATを構築した。社会価値データセットについて,6,174人の韓国人参加者を対象とした大規模調査から,基礎的真理ラベルを得た。共通知識データセットについて,韓国の教科書とGED参照資料に基づくサンプルを構築した。 KorNATには、それぞれ社会的価値と共通知識に関する4Kと6Kの多重選択質問が含まれている。我々のデータセット作成プロセスは、統計的サンプリング理論に基づいて慎重に設計され、複数ラウンドの人間によるレビューを通して洗練されている。 7つのLLM実験の結果, 基準値に適合するモデルはごくわずかであり, さらなる拡張の可能性を示した。 KorNATは、データセットの品質評価を専門とする政府機関による評価を通過させた後、政府の承認を受けた。データセットのサンプルと詳細な評価プロトコルはhttps://selectstar.ai/ko/papers-national-alignmentに記載されている。 For Large Language Models (LLMs) to be effectively deployed in a specific country, they must possess an understanding of the nation's culture and basic knowledge. To this end, we introduce National Alignment, which measures an alignment between an LLM and a targeted country from two aspects: social value alignment and common knowledge alignment. Social value alignment evaluates how well the model understands nation-specific social values, while common knowledge alignment examines how well the model captures basic knowledge related to the nation. We constructed KorNAT, the first benchmark that measures national alignment with South Korea. For the social value dataset, we obtained ground truth labels from a large-scale survey involving 6,174 unique Korean participants. For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials. KorNAT contains 4K and 6K multiple-choice questions for social value and common knowledge, respectively. Our dataset creation process is meticulously designed and based on statistical sampling theory and was refined through multiple rounds of human review. Experimental results for seven LLMs reveal that only a few models met our reference score, indicating a potential for further enhancement. KorNAT has received government approval after passing an assessment conducted by a government-affiliated organization dedicated to evaluating dataset quality. Samples and detailed evaluation protocols of our dataset can be found at https://selectstar.ai/ko/papers-national-alignment	翻訳日:2024-03-20 01:52:05 公開日:2024-03-18
# 機械学習注意モデルを用いた時間バイアス補正 A Temporal Bias Correction using a Machine Learning Attention model ( http://arxiv.org/abs/2402.14169v3 ) ライセンス: Link先を確認	Omer Nivron, Damon J. Wischik, Mathieu Vrac, Emily Shuckburgh,	(参考訳) 気候モデルは、実世界の観測に関して偏りがあり、通常、影響研究の前に校正される必要がある。このようなキャリブレーションを可能にする統計手法の組はバイアス補正(BC)と呼ばれる。しかし、現在のBC法は、連続する時間点間の依存を無視しているため、時間バイアスを調整するのに苦労している。結果として、熱波の持続時間や周波数などの長期的特性を持つ気候統計を正確に修正することはできず、そのような気候統計に関する信頼性の高い影響研究を作成するのがより困難になる。本稿では,時間的バイアスを補正する新しいBC手法を提案する。これは、(i) BCをアルゴリズム的な手順ではなく確率モデルとして捉え直し、(ii) 最先端の機械学習(ML)確率的注意モデルをBCタスクに適応させることで実現される。ナイジェリアのアブジャと日本の東京における熱波持続時間統計のケーススタディにより、現在の気候モデルの出力や代替のBC法と比較して顕著な結果が得られることを示す。 Climate models are biased with respect to real world observations and usually need to be calibrated prior to impact studies. The suite of statistical methods that enable such calibrations is called bias correction (BC). However, current BC methods struggle to adjust for temporal biases, because they disregard the dependence between consecutive time-points. As a result, climate statistics with long-range temporal properties, such as heatwave duration and frequency, cannot be corrected accurately, making it more difficult to produce reliable impact studies on such climate statistics. In this paper, we offer a novel BC methodology to correct for temporal biases. This is made possible by i) re-thinking BC as a probability model rather than an algorithmic procedure, and ii) adapting state-of-the-art machine-learning (ML) probabilistic attention models to fit the BC task. With a case study of heatwave duration statistics in Abuja, Nigeria, and Tokyo, Japan, we show striking results compared to current climate model outputs and alternative BC methods.	翻訳日:2024-03-20 01:52:05 公開日:2024-03-18

# 中国語胸部X線レポート生成のための疾患ラベラー

A Disease Labeler for Chinese Chest X-Ray Report Generation ( http://arxiv.org/abs/2404.16852v1 )

ライセンス: Link先を確認

Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou,

(参考訳) 医療画像解析の分野では、中国語の胸部X線レポートデータセットの不足により、中国語の胸部X線レポートを生成する技術の開発が妨げられている。一方では、中国語の胸部X線レポートデータセットの構築は、専門家による正確な疾患アノテーションに時間と費用がかかることによって制限される。他方では, 単一の自然言語生成指標を用いて, 生成したレポートと正解レポートの類似性を評価するのが一般的であるが, 生成したレポートの臨床的精度と有効性は, 正確な疾患ラベラー(分類器)に依存している。これらの問題に対処するため,本研究は,中国語の胸部X線レポート生成に適した疾患ラベラーを提案する。このラベラーは,診断報告と臨床情報を別々に扱うためにデュアルBERTアーキテクチャを活用し、疾患と身体部位の所属関係に基づく階層的なラベル学習アルゴリズムを構築して、テキスト分類性能を向上させる。この疾患ラベラーを用いて, 51,262件のレポートサンプルからなる中国語の胸部X線レポートデータセットを構築した。最後に、専門家が注釈した中国語の胸部X線レポートのサブセットについて実験と分析を行い、提案した疾患ラベラーの有効性を検証した。

In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metric is commonly used to evaluate the similarity between generated and ground-truth reports, while the clinical accuracy and effectiveness of the generated reports rely on an accurate disease labeler (classifier). To address the issues, this study proposes a disease labeler tailored for the generation of Chinese chest X-ray reports. This labeler leverages a dual BERT architecture to handle diagnostic reports and clinical information separately and constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to enhance text classification performance. Utilizing this disease labeler, a Chinese chest X-ray report dataset comprising 51,262 report samples was established. Finally, experiments and analyses were conducted on a subset of expert-annotated Chinese chest X-ray reports, validating the effectiveness of the proposed disease labeler.

翻訳日:2024-07-01 11:39:16 公開日:2024-03-18
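
The abstract above mentions a hierarchy based on the affiliation between diseases and body parts. The following is a minimal sketch of how such an affiliation can structure predicted labels at inference time. It is not the paper's learning algorithm: the hierarchy, the label names, the max-aggregation rule, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Dict, List

# Illustrative hierarchy only; the paper's actual label set is not given in the abstract.
BODY_PART_TO_DISEASES: Dict[str, List[str]] = {
    "lung": ["pneumonia", "nodule"],
    "pleura": ["pleural_effusion"],
    "heart": ["cardiomegaly"],
}


def body_part_scores(disease_probs: Dict[str, float]) -> Dict[str, float]:
    """Score each body part as the highest probability among its diseases."""
    return {
        part: max(disease_probs.get(d, 0.0) for d in diseases)
        for part, diseases in BODY_PART_TO_DISEASES.items()
    }


def hierarchical_labels(
    disease_probs: Dict[str, float], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Keep disease labels only under body parts whose score reaches the threshold."""
    parts = body_part_scores(disease_probs)
    return {
        part: [
            d for d in BODY_PART_TO_DISEASES[part]
            if disease_probs.get(d, 0.0) >= threshold
        ]
        for part, score in parts.items()
        if score >= threshold
    }


# Hypothetical classifier outputs for one report.
probs = {"pneumonia": 0.82, "nodule": 0.10, "pleural_effusion": 0.64, "cardiomegaly": 0.07}
labels = hierarchical_labels(probs)
print(labels)
```

Grouping the output by body part keeps each disease label attached to its parent, which is the structural idea the abstract describes; how the two levels are trained jointly is left to the paper.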

# パスワード強度指標としての期待エントロピー

Expectation Entropy as a Password Strength Metric ( http://arxiv.org/abs/2404.16853v1 )

ライセンス: Link先を確認

Khan Reaz, Gerhard Wunder,

(参考訳) 古典的な組合せ論に基づくパスワード強度の式は数十ビットという結果を与えるのに対し、NIST Entropy Estimation Suite は Min-Entropy について 0 から 1 までの結果を与える。本研究では,ランダムなパスワードやランダムに近いパスワードの強度を推定できる新たな指標である期待エントロピーを提案する。期待エントロピーは、エントロピー推定ツールと同じ尺度でパスワードの強度を与える。例えば、0.4のようなある値の「期待エントロピー」を持つことは、攻撃者がパスワードを見つけるには、推測の総数の少なくとも40%を網羅的に検索しなければならないことを意味する。

The classical combinatorics-based password strength formula provides a result in tens of bits, whereas the NIST Entropy Estimation Suite gives a result between 0 and 1 for Min-entropy. In this work, we present a newly developed metric, Expectation entropy, that can be applied to estimate the strength of any random or random-like password. Expectation entropy provides the strength of a password on the same scale as an entropy estimation tool. Having an 'Expectation entropy' of a certain value, for example, 0.4, means that an attacker has to exhaustively search at least 40% of the total number of guesses to find the password.

翻訳日:2024-07-01 11:39:16 公開日:2024-03-18
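
To make the two scales in the abstract above concrete, here is a small sketch. It computes the classical combinatorial estimate (length times log2 of the alphabet size, in bits) and the number of guesses that covers a given fraction of the search space, which is how the abstract interprets a value such as 0.4. The Expectation entropy formula itself is not stated in the abstract and is not implemented here; the function names and the example password policy are assumptions.

```python
import math


def combinatorial_bits(length: int, alphabet_size: int) -> float:
    """Classical strength estimate in bits: log2(alphabet_size ** length)."""
    return length * math.log2(alphabet_size)


def guesses_for_fraction(length: int, alphabet_size: int, fraction: float) -> int:
    """Number of guesses needed to cover `fraction` of the full search space."""
    total = alphabet_size ** length
    return math.ceil(total * fraction)


# Hypothetical policy: 8 characters drawn from 26 lowercase letters.
bits = combinatorial_bits(8, 26)
needed = guesses_for_fraction(8, 26, 0.4)
print(f"{bits:.1f} bits; {needed} guesses cover 40% of the space")
```

The first number lives on the "tens of bits" scale, while the fraction passed to `guesses_for_fraction` lives on the 0 to 1 scale that the abstract attributes to entropy estimation tools.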

# 画像文書における財務表抽出

Financial Table Extraction in Image Documents ( http://arxiv.org/abs/2405.05260v1 )

ライセンス: Link先を確認

William Watson, Bo Liu,

(参考訳) テーブルの抽出は、金融サービスにおいて長年にわたり広範囲にわたる問題であった。これは、コンテンツが扱いにくいピクセル形式に閉じ込められている画像領域において、より難しい。幸いなことに、画像セグメンテーション、OCR、シーケンスモデリングのためのディープラーニングの進歩は、印象的な結果を得るために必要な重労働を担ってくれる。本稿では,元の空間的関係を高い忠実度で保持しながら,画像文書中の表形式のコンテンツを特定し,抽出し,文字に起こすためのエンドツーエンドパイプラインを提案する。

Table extraction has long been a pervasive problem in financial services. This is more challenging in the image domain, where content is locked behind a cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR, and sequence modeling provide the necessary heavy lifting to achieve impressive results. This paper presents an end-to-end pipeline for identifying, extracting and transcribing tabular content in image documents, while retaining the original spatial relations with high fidelity.

翻訳日:2024-07-01 10:40:42 公開日:2024-03-18
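
The abstract above stresses retaining spatial relations when transcribing tables. As a rough sketch of one late step in such a pipeline, the code below groups OCR word boxes into rows by vertical position and orders each row left to right. This is not the authors' pipeline: the `(text, x_center, y_center)` input format, the function name, and the 5-pixel tolerance are assumptions made for illustration.

```python
from typing import List, Tuple

# (text, x_center, y_center): a hypothetical, simplified OCR output format.
Word = Tuple[str, float, float]


def words_to_rows(words: List[Word], y_tol: float = 5.0) -> List[List[str]]:
    """Group words into table rows by y-coordinate, then order each row by x."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        # Compare against the first word of the current row; start a new
        # row when the vertical distance exceeds the tolerance.
        if rows and abs(word[2] - rows[-1][0][2]) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[text for text, _, _ in sorted(row, key=lambda w: w[1])] for row in rows]


ocr_words = [
    ("Revenue", 10, 52), ("1,200", 120, 50),
    ("Item", 10, 20), ("Amount", 120, 21),
    ("Cost", 10, 80), ("800", 120, 82),
]
table = words_to_rows(ocr_words)
for row in table:
    print(row)
```

A fixed pixel tolerance is the simplest possible choice; skewed scans or multi-line cells would need something more robust, such as clustering on box overlap.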

# 3次元ホリスティックOR匿名化

3D Holistic OR Anonymization ( http://arxiv.org/abs/2405.05261v1 )

ライセンス: Link先を確認

Tony Danjun Wang,

(参考訳) 本稿では,手術室(OR)のマルチビューRGB-Dビデオ記録を自動的に匿名化するために,3D情報を活用する新しい手法を提案する。本匿名化手法は,各画像の顔を異なる顔に置き換えることで元のデータ分布を保持し,データが下流のタスクに引き続き適したものとなるようにする。確立された匿名化法とは対照的に,本手法は2次元空間ではなく,まず3次元空間の顔の局所化を行う。それぞれの顔は、それぞれのカメラビューに異なる顔を再投影して匿名化され、最終的に結果の画像の元の顔を置き換える。さらに,経験豊富な外科医が動物(ブタ)に対して腹腔鏡下手術を行う実際の手術中に撮影した,ORの典型的特徴を捉えた多視点RGB-Dデータセットを紹介する。最後に,そのデータセットを用いて評価した実験結果から,3次元データを活用することで,現在の最先端手法よりもOR画像における顔の局所化が向上し,よりリアルな顔を生成できることを示した。我々の知る限り、マルチビューOR記録の匿名化に対処する先行研究や、3D情報を利用した2次元顔のローカライゼーションは存在していない。

We propose a novel method that leverages 3D information to automatically anonymize multi-view RGB-D video recordings of operating rooms (OR). Our anonymization method preserves the original data distribution by replacing the faces in each image with different faces so that the data remains suitable for further downstream tasks. In contrast to established anonymization methods, our approach localizes faces in 3D space first rather than in 2D space. Each face is then anonymized by reprojecting a different face back into each camera view, ultimately replacing the original faces in the resulting images. Furthermore, we introduce a multi-view RGB-D dataset, captured during a real operation of experienced surgeons performing laparoscopic surgery on an animal object (swine), which encapsulates typical characteristics of ORs. Finally, we present experimental results evaluated on that dataset, showing that leveraging 3D data can achieve better face localization in OR images and generate more realistic faces than the current state-of-the-art. There has been, to our knowledge, no prior work that addresses the anonymization of multi-view OR recordings, nor 2D face localization that leverages 3D information.

翻訳日:2024-07-01 10:40:42 公開日:2024-03-18
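
The method above localizes faces in 3D and then reprojects into each camera view. The geometric core of that step can be sketched with the standard pinhole camera model, which maps one 3D location to a pixel position in every calibrated view. The camera poses, intrinsics, and face position below are hypothetical values, and the sketch covers only the projection, not face detection or face replacement.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_point(
    point: Vec3,
    rotation: Sequence[Vec3],
    translation: Vec3,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a 3D world point into pixel coordinates with a pinhole camera model."""
    # World -> camera coordinates: X_cam = R @ X_world + t
    x, y, z = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    if z <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then focal lengths and principal point.
    return fx * x / z + cx, fy * y / z + cy


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
face_center = (0.2, -0.1, 2.0)  # hypothetical 3D face location in metres

# Two hypothetical cameras: one at the origin, one shifted 0.5 m along x
# (so its translation is t = -R @ C = (-0.5, 0, 0)).
cameras = {
    "cam0": (identity, (0.0, 0.0, 0.0)),
    "cam1": (identity, (-0.5, 0.0, 0.0)),
}
pixels = {
    name: project_point(face_center, rot, trans, fx=500, fy=500, cx=320, cy=240)
    for name, (rot, trans) in cameras.items()
}
print(pixels)
```

Because every view is derived from the same 3D location, the face positions stay consistent across cameras, which is the benefit the abstract attributes to localizing in 3D first.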

# TQFTから見た通信プロトコルとQECC : その2 時空としてのQECC

Communication protocols and QECC from the perspective of TQFT, Part II: QECCs as spacetimes ( http://arxiv.org/abs/2405.12364v1 )

ライセンス: Link先を確認

Chris Fields, James F. Glazebrook, Antonino Marciano,

(参考訳) トポロジカル量子場理論(TQFT)は、量子状態の準備と測定を記述するための一般的な最小推定言語を提供する。そのため、マルチエージェント通信プロトコル、例えばローカル操作、古典通信(LOCC)プロトコルを表現する汎用言語を提供する。第1部では、TQFTを用いてLOCCプロトコルを構築し、エージェント環境境界上でLOCCプロトコルが量子誤り訂正符号(QECC)を誘導することを示す。そのような QECC は、そのような境界上での時空の出現を実装または誘導すると見なすことができる。本稿では、TQFTの異なる実現法を利用して、エージェント間通信と時空の関係について検討する。計算システムとしてのスピンネットワークのバウンダリをサポートするTQFTを探索する。これらはトポロジカル量子ニューラルネットワーク(TQNN)として知られている。テンソルネットワークとして自然な表現を持つTQNNは、QECCを実装している。私たちは HaPPY コードをパラダイム的な例として認識しています。次に、バルク境界符号としてのQECCが有効時空をいかに引き起こすかを示す。 QECCにおける効果的な空間的および時間的分離は、空間的に分離された観測者間のLOCCプロトコルを可能にする。次に、BF理論およびチャーン・サイモンズ理論におけるQECCの実装を検討し、QECCによる時空がLOCCに必要な古典的冗長性を提供することを示す。最後に、位相的M-理論を高時空次元におけるQECCの実装とみなす。

Topological quantum field theories (TQFTs) provide a general, minimal-assumption language for describing quantum-state preparation and measurement. They therefore provide a general language in which to express multi-agent communication protocols, e.g. local operations and classical communication (LOCC) protocols. In the accompanying Part I, we construct LOCC protocols using TQFT, and show that LOCC protocols induce quantum error-correcting codes (QECCs) on the agent-environment boundary. Such QECCs can be regarded as implementing or inducing the emergence of spacetimes on such boundaries. Here we investigate this connection between inter-agent communication and spacetime, exploiting different realizations of TQFT. We delve into TQFTs that support on their boundaries spin-networks as computational systems: these are known as topological quantum neural networks (TQNNs). TQNNs, which have a natural representation as tensor networks, implement QECC. We recognize the HaPPY code as a paradigmatic example. We then show how generic QECCs, as bulk-boundary codes, induce effective spacetimes. The effective spatial and temporal separations that take place in QECC enable LOCC protocols between spatially separated observers. We then consider the implementation of QECCs in BF and Chern-Simons theories, and show that QECC-induced spacetimes provide the classical redundancy required for LOCC. Finally, we consider topological M-theory as an implementation of QECC in higher spacetime dimensions.

翻訳日:2024-07-01 08:39:42 公開日:2024-03-18

# 高速レコメンデーションのための動的プルーニングによる行列分解の高速化

Accelerating Matrix Factorization by Dynamic Pruning for Fast Recommendation ( http://arxiv.org/abs/2404.04265v1 )

ライセンス: Link先を確認

Yining Wu, Shengyu Duan, Gaole Sai, Chenhong Cao, Guobing Zou,

(参考訳) 行列分解 (MF) は、高い予測精度、優れた柔軟性、ビッグデータ処理における高い効率のために、レコメンデーションシステム (RS) に広く使われている協調フィルタリング (CF) アルゴリズムである。しかし、現在のRSではユーザ/アイテム数が劇的に増加しており、MFモデルをトレーニングする計算の複雑さが大きく増している。既存の多くの研究は、追加の計算資源を投入するか、並列システムを利用することでMFを高速化しているが、大きなコストを伴う。本稿では,追加の計算資源を必要とせずに,MFを高速化するアルゴリズム的手法を提案する。具体的には, あるしきい値を考慮した場合, 分解された特徴行列に細粒度の構造化スパース性が見られることを観察する。この細粒度の構造化スパース性は、行列乗算と潜在因子の更新の両方で大量の不要な演算を引き起こし、MFトレーニングプロセスの計算時間を増加させる。この観察に基づいて,まず結合スパース性に基づいて特徴行列を並べ替えることを提案する。これにより,インデックスの小さい潜在ベクトルが,インデックスの大きい潜在ベクトルよりも密になりやすくなる。特徴行列の並べ替えは、後段のプルーニング処理による誤差を抑えるために行う。次に,行列乗算と潜在因子更新の両方において,早期停止処理によって重要でない潜在因子をプルーニングすることを提案する。プルーニング処理は、異なるユーザ/アイテムに対する潜在因子のスパース性に応じて動的に実行され、処理を高速化する。実験の結果,従来のMFトレーニングプロセスと比較して,最大20.08%の誤差増加で1.2〜1.65倍の高速化が達成できた。また,オプティマイザ,最適化戦略,初期化手法など,異なるハイパーパラメータを考慮しても提案手法が適用可能であることを示す。

Matrix factorization (MF) is a widely used collaborative filtering (CF) algorithm for recommendation systems (RSs), due to its high prediction accuracy, great flexibility and high efficiency in big data processing. However, with the dramatically increased number of users/items in current RSs, the computational complexity of training an MF model has increased substantially. Many existing works have accelerated MF, by either putting in additional computational resources or utilizing parallel systems, introducing a large cost. In this paper, we propose algorithmic methods to accelerate MF without requiring any additional computational resources. Specifically, we observe fine-grained structured sparsity in the decomposed feature matrices when considering a certain threshold. The fine-grained structured sparsity causes a large amount of unnecessary operations during both matrix multiplication and latent factor update, increasing the computational time of the MF training process. Based on the observation, we first propose to rearrange the feature matrices based on joint sparsity, which potentially makes a latent vector with a smaller index more dense than that with a larger index. The feature matrix rearrangement is given to limit the error caused by the later performed pruning process. We then propose to prune the insignificant latent factors by an early stopping process during both matrix multiplication and latent factor update. The pruning process is dynamically performed according to the sparsity of the latent factors for different users/items, to accelerate the process. The experiments show that our method can achieve speedups of 1.2-1.65x, with up to a 20.08% error increase, compared with the conventional MF training process. We also prove the proposed methods are applicable considering different hyperparameters including optimizer, optimization strategy and initialization method.

翻訳日:2024-04-14 13:21:48 公開日:2024-03-18
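
The early-stopping idea in the abstract above can be illustrated on the prediction dot product alone. The sketch below stops accumulating as soon as either latent factor falls below a threshold, under the assumption that the latent dimensions were already rearranged so that insignificant factors tend to sit at larger indices. It is a simplified illustration, not the authors' implementation: the function name, the threshold, and the example vectors are assumptions, and the latent factor update step is not shown.

```python
from typing import Sequence


def pruned_dot(p: Sequence[float], q: Sequence[float], threshold: float) -> float:
    """Dot product that stops at the first insignificant latent factor.

    Assumes the latent dimensions were rearranged beforehand so that
    small-magnitude factors tend to sit at larger indices.
    """
    total = 0.0
    for p_k, q_k in zip(p, q):
        if abs(p_k) < threshold or abs(q_k) < threshold:
            break  # early stop: skip this factor and all remaining ones
        total += p_k * q_k
    return total


# Hypothetical user and item latent vectors, already rearranged.
user = [0.9, -0.7, 0.4, 0.01, 0.002]
item = [0.8, 0.5, -0.3, 0.6, 0.003]

exact = sum(a * b for a, b in zip(user, item))
approx = pruned_dot(user, item, threshold=0.05)
print(f"exact={exact:.6f} approx={approx:.6f} error={abs(exact - approx):.6f}")
```

The trade-off matches the one the abstract reports at scale: fewer multiply-adds in exchange for a bounded approximation error, with the rearrangement step there to keep that error small.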

# HomoGenius:ニューラル演算子を用いた機械的特性の迅速予測のための均質化基礎モデル

HomoGenius: a Foundation Model of Homogenization for Rapid Prediction of Effective Mechanical Properties using Neural Operators ( http://arxiv.org/abs/2404.07943v1 )

ライセンス: Link先を確認

Yizheng Wang, Xiang Li, Ziming Yan, Yuqing Du, Jinshuai Bai, Bokai Liu, Timon Rabczuk, Yinghua Liu,

(参考訳) 均質化(homogenization)は、マルチスケールの物理現象を研究するための重要なツールである。しかし、有限要素解析に大きく依存する従来の数値均質化は、特に複雑な形状、材料、高解像度の問題を扱う際に、膨大な計算コストを必要とする。これらの制約に対処するために,演算子学習に基づく数値均質化モデルHomoGeniusを提案する。提案モデルは,任意の形状,材料,解像度に対する均質化結果を迅速に提供し,従来の数値均質化法と比較して効率を80倍に高める。複雑な形状,さまざまなポアソン比と弾性率,訓練とテストで異なる解像度を含む周期材料(TPMS: Triply Periodic Minimal Surface)について,有効弾性率の予測におけるモデルの有効性を検証した。その結果,本モデルは高精度,超高効率,学習能力を有することがわかった。

Homogenization is an essential tool for studying multiscale physical phenomena. However, traditional numerical homogenization, heavily reliant on finite element analysis, incurs high computational costs, particularly when handling complex geometries, materials, and high-resolution problems. To address these limitations, we propose a numerical homogenization model based on operator learning: HomoGenius. The proposed model can quickly provide homogenization results for arbitrary geometries, materials, and resolutions, increasing the efficiency by a factor of 80 compared to traditional numerical homogenization methods. We validate the effectiveness of our model in predicting the effective elastic modulus of periodic materials (TPMS: Triply Periodic Minimal Surface), including complex geometries, various Poisson's ratios and elastic moduli, and different resolutions for training and testing. The results show that our model possesses high precision, super efficiency, and learning capability.

翻訳日:2024-04-14 13:03:36 公開日:2024-03-18

# オンデバイス学習のための組込み開発環境のユーザビリティと性能解析

Usability and Performance Analysis of Embedded Development Environment for On-device Learning ( http://arxiv.org/abs/2404.07948v1 )

ライセンス: Link先を確認

Enzo Scaffi, Antoine Bonneau, Frédéric Le Mouël, Fabien Mieyeville,

(参考訳) 本研究は,デバイス上でのTinyML実装に有効な組み込み開発ツールを実証的に検討する。この研究は、基本的なハードウェア操作から最小限のMLトレーニングの展開に至るまで、リソース制限されたIoTデバイス上でさまざまな抽象化レベルを持つさまざまな開発ツールを評価する。この分析は、モデルトレーニングおよび推論時のメモリ使用量、エネルギー消費量、パフォーマンスメトリクス、ならびに各ソリューションのユーザビリティを含む。 Arduino Frameworkは実装の容易さを提供するが、ネイティブオプションと比較してエネルギー消費が増加する。一方、RIOT OSは同等の使いやすさでありながら、メモリ使用率は高いものの効率的なエネルギー消費を示す。 DVFSのような特定の重要な機能がOSに直接統合されていないことは、ハードウェアのきめ細かな制御における制限を浮き彫りにしている。

This research empirically examines embedded development tools viable for on-device TinyML implementation. The research evaluates various development tools with various abstraction levels on resource-constrained IoT devices, from basic hardware manipulation to deployment of minimalistic ML training. The analysis encompasses memory usage, energy consumption, and performance metrics during model training and inference and usability of the different solutions. Arduino Framework offers ease of implementation but with increased energy consumption compared to the native option, while RIOT OS exhibits efficient energy consumption despite higher memory utilization with equivalent ease of use. The absence of certain critical functionalities like DVFS directly integrated into the OS highlights limitations for fine hardware control.

翻訳日:2024-04-14 13:03:36 公開日:2024-03-18

# 一般化可能なガウススプレイティングによる強化学習

Reinforcement Learning with Generalizable Gaussian Splatting ( http://arxiv.org/abs/2404.07950v1 )

ライセンス: Link先を確認

Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu,

(参考訳) 優れた表現は強化学習(RL)のパフォーマンス、特に視覚に基づく強化学習において重要である。環境表現の質は学習課題の達成に直接影響を及ぼす。従来の視覚ベースのRLは、画像、点、ボクセル、神経放射場などの環境を表現するために、明示的または暗黙的な方法を使用するのが一般的である。しかし、これらの表現にはいくつかの欠点がある。複雑な局所的な地形を記述することも、見えない場面によく一般化することも、正確な前景マスクを必要とすることもできない。さらに、これらの暗黙的な神経表現は『ブラックボックス』に似たものであり、解釈可能性を大幅に妨げている。 3D Gaussian Splatting (3DGS) は、その明示的なシーン表現と微分可能なレンダリング特性を持ち、再構築と表現方法の革新的変化と見なされている。本稿では、GSRLと呼ばれるRLタスクを表現するための新しい一般化可能なガウス分割フレームワークを提案する。提案手法は,RoboMimic環境での検証により,複数のタスクにおいて他のベースラインよりも優れた結果が得られ,最も難しいタスクのベースラインに比べて10%,44%,15%の性能向上が達成される。この研究は、RLの表現として一般化可能な3DGSを活用する最初の試みである。

An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations have several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.

翻訳日:2024-04-14 13:03:36 公開日:2024-03-18

# ソーシャルネットワーク上でのARIMA時系列解析による多言語トピックダイナミクスとトレンド同定のデコード:LDA/HDPモデルにより強化された新しいデータ翻訳フレームワーク

Decoding Multilingual Topic Dynamics and Trend Identification through ARIMA Time Series Analysis on Social Networks: A Novel Data Translation Framework Enhanced by LDA/HDP Models ( http://arxiv.org/abs/2403.15445v1 )

ライセンス: Link先を確認

Samawel Jaballi, Azer Mahjoubi, Manar Joundy Hazar, Salah Zrigui, Henri Nicolas, Mounir Zrigui,

(参考訳) 本研究では,多言語トピックのダイナミクスの復号化と危機時のコミュニケーション傾向の同定に有効な新しい手法を提案する。われわれは、コロナウイルスパンデミックの間、チュニジアのソーシャルネットワーク内での対話や、スポーツや政治などの有名なテーマに焦点を当てている。まず、これらのテーマに関連するコメントの多言語コーパスを集約することから始めます。このデータセットは、データ前処理中に厳格に洗練される。次に、言語的差異に対処するために、ノー・イングリッシュ・トゥ・イングリッシュ・マシン翻訳手法を導入する。本手法の実証実験では, 高い精度とF1得点を示し, 言語的に整合性のある課題に対する適合性を強調した。より深い高度なモデリング技術、特にLDAとHDPモデルを用いて、翻訳されたコンテンツから関連するトピックを抽出する。これにより、ARIMA時系列分析を適用して、進化するトピックのトレンドをデコードする。提案手法を多言語チュニジアデータセットに適用し,公共の感情を反映した重要なトピックを効果的に同定した。このような洞察は、危機時の公共の視点を理解しようとする組織や政府にとって不可欠である。標準的なアプローチと比較して、私たちのモデルは、Coherence Score、U-mass、Topic Coherenceといったメトリクスで確認されているように、パフォーマンスが優れています。さらに,特定トピックの詳細な評価では,RMSEに基づく分析を背景として,議論の主題的変化が顕著であり,その傾向は印象的な精度を示している。

In this study, we present a novel methodology for decoding multilingual topic dynamics and identifying communication trends during crises. We focus on dialogues within Tunisian social networks during the Coronavirus Pandemic and on other notable themes such as sports and politics. We start by aggregating a varied multilingual corpus of comments relevant to these subjects. This dataset undergoes rigorous refinement during data preprocessing. We then introduce our No-English-to-English Machine Translation approach to handle linguistic differences. Empirical tests of this method showed high accuracy and F1 scores, highlighting its suitability for linguistically coherent tasks. Delving deeper, we employ advanced modeling techniques, specifically LDA and HDP models, to extract pertinent topics from the translated content. We then apply ARIMA time series analysis to decode evolving topic trends. Applying our method to a multilingual Tunisian dataset, we effectively identified key topics mirroring public sentiment. Such insights prove vital for organizations and governments striving to understand public perspectives during crises. Compared to standard approaches, our model performs better, as confirmed by metrics like Coherence Score, U-mass, and Topic Coherence. Additionally, an in-depth assessment of the identified topics revealed notable thematic shifts in discussions, and our trend identification showed high accuracy, backed by RMSE-based analysis.

翻訳日:2024-04-01 02:54:20 公開日:2024-03-18

# 圧縮された信頼の復号:圧縮下における効率的なLLMの信頼性の検討

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression ( http://arxiv.org/abs/2403.15447v1 )

ライセンス: Link先を確認

Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li,

(参考訳) 高機能大言語モデル (LLM) の圧縮は,資源効率のよい推論手法として好まれている。 SoTA(State-of-the-art)圧縮法は、良質なタスク性能の保存において顕著な進歩を誇っているが、安全性と信頼性の点で圧縮の潜在的なリスクは無視されている。本研究は,8つの信頼性次元にわたる5つのSoTA圧縮技術を用いて,3つのLLMを徹底的に評価する。我々の実験は、圧縮と信頼性の間の複雑な相互作用を強調し、興味深いパターンを明らかにします。量子化は現在、効率性と信頼性を同時に達成する上で、プルーニングよりも効果的なアプローチであることが分かっています。例えば、4ビットの量子化モデルでは、元のモデルの信頼性は維持されるが、モデルプルーニングは50%のスパース性でも信頼性を著しく低下させる。さらに、適度なビット範囲内での量子化の導入は、倫理や公正といった特定の信頼性の次元を予想外に改善する可能性がある。逆に、非常に低ビットレベル(3ビット)への極端な量子化は、信頼性を著しく低下させる傾向がある。このリスクの増加は、良心的なパフォーマンスを単独で見るだけでは発見できない。これらの知見は, LLMの実用性, 効率, 信頼性を同時に達成するための実践的勧告を導いた。モデルとコードはhttps://decoding-comp-trust.github.io/で公開されている。

Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation of three (3) leading LLMs using five (5) SoTA compression techniques across eight (8) trustworthiness dimensions. Our experiments highlight the intricate interplay between compression and trustworthiness, revealing some interesting patterns. We find that quantization is currently a more effective approach than pruning in achieving efficiency and trustworthiness simultaneously. For instance, a 4-bit quantized model retains the trustworthiness of its original counterpart, but model pruning significantly degrades trustworthiness, even at 50% sparsity. Moreover, employing quantization within a moderate bit range could unexpectedly improve certain trustworthiness dimensions such as ethics and fairness. Conversely, extreme quantization to very low bit levels (3 bits) tends to significantly reduce trustworthiness. This increased risk cannot be uncovered by looking at benign performance alone, in turn, mandating comprehensive trustworthiness evaluation in practice. These findings culminate in practical recommendations for simultaneously achieving high utility, efficiency, and trustworthiness in LLMs. Models and code are available at https://decoding-comp-trust.github.io/.

翻訳日:2024-04-01 02:54:20 公開日:2024-03-18

# 位相検索におけるエンド・ツー・エンド・ラーニングとは何か?

What is Wrong with End-to-End Learning for Phase Retrieval? ( http://arxiv.org/abs/2403.15448v1 )

ライセンス: Link先を確認

Wenjie Zhang, Yuxiang Wan, Zhong Zhuang, Ju Sun,

(参考訳) 画像科学でよく見られる非線形逆問題に対しては、フォワードモデルの対称性が一般的である。このような問題を解決するためにデータ駆動型ディープラーニングアプローチを使用する場合、本質的な対称性は重大な学習困難を引き起こす可能性がある。本稿では,このような困難がどうして生じるのか,さらに重要なことは,学習前にトレーニングセット,すなわち対称性の破れを前処理して克服する方法を説明する。科学画像の多くの領域において中心的な遠距離位相探索 (FFPR) を例に挙げ, 対称破壊がデータ駆動学習を大幅に改善することを示す。また、対称性の破れの数学的原理を定式化する。

For nonlinear inverse problems that are prevalent in imaging science, symmetries in the forward model are common. When data-driven deep learning approaches are used to solve such problems, these intrinsic symmetries can cause substantial learning difficulties. In this paper, we explain how such difficulties arise and, more importantly, how to overcome them by preprocessing the training set before any learning, i.e., symmetry breaking. We take far-field phase retrieval (FFPR), which is central to many areas of scientific imaging, as an example and show that symmetry breaking can substantially improve data-driven learning. We also formulate the mathematical principle of symmetry breaking.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18
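The symmetry issue described in this abstract can be made concrete with a small numerical check. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses a 1D signal and a naive DFT as a stand-in for the far-field forward model, and the helper `canonical_phase` is a hypothetical toy that removes only the global-phase ambiguity. It shows that a circular shift, a global phase factor, and a conjugate flip all leave the Fourier magnitudes unchanged, so several different signals map to the same measurement.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitudes(x):
    """Fourier magnitudes, the only quantity a phase-retrieval measurement keeps."""
    return [abs(v) for v in dft(x)]

def close(a, b, tol=1e-9):
    """Element-wise comparison that works for real and complex lists."""
    return all(abs(p - q) < tol for p, q in zip(a, b))

def canonical_phase(x):
    """Hypothetical toy symmetry breaking: rotate x so its first entry is real and positive."""
    factor = x[0].conjugate() / abs(x[0])
    return [factor * v for v in x]

x = [1 + 2j, 0.5 - 1j, -2 + 0j, 3 + 1j, 0 + 0.5j]
n = len(x)

shifted = x[2:] + x[:2]                                  # circular translation
rotated = [cmath.exp(0.7j) * v for v in x]               # global phase factor
flipped = [x[(-t) % n].conjugate() for t in range(n)]    # conjugate flip

y = magnitudes(x)
same_shift = close(y, magnitudes(shifted))
same_phase = close(y, magnitudes(rotated))
same_flip = close(y, magnitudes(flipped))

# After the toy canonicalization, x and its rotated copy become the same target.
same_target = close(canonical_phase(x), canonical_phase(rotated))
```

A regression model trained on pairs (magnitudes, signal) would see the same input paired with several different targets unless one representative per symmetry class is chosen first, which is the role the abstract assigns to preprocessing the training set.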

# 無知からの憎しみ! 会話のヘイトスピーチに対する説得モードの蒸留

Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech ( http://arxiv.org/abs/2403.15449v1 )

ライセンス: Link先を確認

Ghadi Alyahya, Abeer Aldayel,

(参考訳) 反音声が使用する要因を調べることは、オンラインでヘイトスピーチに直面する最適な方法を理解することの核心にある。様々な研究は、感情の共感、攻撃性、敵意のレベルなど、カウンタースピーチで使用される感情ベースファクターを評価する。本研究は,会話の対話で使用される対語をより深く理解するために,説得モードを理性,感情,信頼性に抽出し,閉(複数ターン)と開(単ターン)の2種類の会話の相互作用において,人種差別,性差別,宗教に関する会話の相互作用を評価する。評価は、人間と生成された対音声の区別された振る舞いをカバーしている。また,回答の姿勢と,対応音声における各説得態勢の相互関係についても検討した。特に、オープン・クローズド・インタラクション(特にトピックレベルで)に対する反音声の説得モードの微妙な違いを観察し、論点をヘイトコメントを表すための説得モードとして理性を用いる傾向が一般的である。生成された反音声は感情的な説得モードを示す傾向があり、一方で人間のカウンターは推論を用いて傾いている。さらに,本研究は,説得モードとしての理由が,他の説得型よりも支持的な応答を得る傾向にあることを示した。本研究は, ヘイトスピーチを抑える研究に説得モードを取り入れることの可能性を強調し, これらのモードが説明可能性の最適な手段となり, 応答のスタンスをさらに導入するための道筋と, 最適な逆音声を構成するものを評価する上で果たす役割を明らかにする。

Examining the factors that the counter-speech uses is at the core of understanding the optimal methods for confronting hate speech online. Various studies assess the emotional base factor used in counter speech, such as emotion-empathy, offensiveness, and level of hostility. To better understand the counter-speech used in conversational interactions, this study distills persuasion modes into reason, emotion, and credibility and then evaluates their use in two types of conversation interactions: closed (multi-turn) and open (single-turn) conversation interactions concerning racism, sexism, and religion. The evaluation covers the distinct behaviors of human versus generated counter-speech. We also assess the interplay between the replies' stance and each mode of persuasion in the counter-speech. Notably, we observe nuanced differences in the counter-speech persuasion modes for open and closed interactions -- especially on the topic level -- with a general tendency to use reason as a persuasion mode to express the counterpoint to hate comments. The generated counter-speech tends to exhibit an emotional persuasion mode, while human counters lean towards using reasoning. Furthermore, our study shows that reason as a persuasion mode tends to obtain more supportive replies than do other persuasion types. The findings highlight the potential of incorporating persuasion modes into studies about countering hate speech, as these modes can serve as an optimal means of explainability and pave the way for the further adoption of the reply's stance and the role it plays in assessing what comprises the optimal counter-speech.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18

# 検索拡張世代(LoRAG)のループ

Loops On Retrieval Augmented Generation (LoRAG) ( http://arxiv.org/abs/2403.15450v1 )

ライセンス: Link先を確認

Ayush Thakur, Rashmi Vashisth,

(参考訳) 本稿では,反復ループ機構の導入による検索強化テキスト生成の品質向上を目的とした新しいフレームワークであるLoRAGについて述べる。このアーキテクチャは、生成モデル、検索機構、動的ループモジュールを統合し、入力コンテキストから取得した関連情報との相互作用を通じて生成されたテキストを反復的に洗練することができる。ベンチマークデータセットの実験的評価では、LoRAGはBLEUスコア、ROUGEスコア、パープレキシティの点で既存の最先端モデルを超えており、生成されたテキストのコヒーレンスと関連性の両方を達成する上での有効性を示している。質的な評価は、文脈的にリッチで一貫性のある出力を生成するLoRAGの能力をさらに示している。本研究は,テキスト生成における課題の緩和における反復ループの可能性について,LoRAGをこの分野における有望な進歩と位置づけた貴重な知見を提供する。

This paper presents Loops On Retrieval Augmented Generation (LoRAG), a new framework designed to enhance the quality of retrieval-augmented text generation through the incorporation of an iterative loop mechanism. The architecture integrates a generative model, a retrieval mechanism, and a dynamic loop module, allowing for iterative refinement of the generated text through interactions with relevant information retrieved from the input context. Experimental evaluations on benchmark datasets demonstrate that LoRAG surpasses existing state-of-the-art models in terms of BLEU score, ROUGE score, and perplexity, showcasing its effectiveness in achieving both coherence and relevance in generated text. The qualitative assessment further illustrates LoRAG's capability to produce contextually rich and coherent outputs. This research contributes valuable insights into the potential of iterative loops in mitigating challenges in text generation, positioning LoRAG as a promising advancement in the field.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18

# 大規模言語モデルを用いたFAIRデータ空間の実現に向けて

Towards Enabling FAIR Dataspaces Using Large Language Models ( http://arxiv.org/abs/2403.15451v1 )

ライセンス: Link先を確認

Benedikt T. Arnold, Johannes Theissen-Lipp, Diego Collarana, Christoph Lange, Sandra Geisler, Edward Curry, Stefan Decker,

(参考訳) データスペースは、伝統的に文化のようなデジタル化されていない領域を含む、さまざまな分野で採用されている。セマンティックWeb技術を活用することは、データ空間をFAIRにするのに役立つが、その複雑さはデータ空間の採用に重大な課題をもたらし、コストを増大させる。 LLM(Large Language Models)の出現は、これらのモデルがFAIRデータ空間の採用をサポートするにはどうすればよいのかという疑問を提起する。本研究では,データ空間におけるLLMの可能性を具体例で示す。我々はまた、この新興分野を探求するための研究課題も導いた。

Dataspaces have recently gained adoption across various sectors, including traditionally less digitized domains such as culture. Leveraging Semantic Web technologies helps to make dataspaces FAIR, but their complexity poses a significant challenge to the adoption of dataspaces and increases their cost. The advent of Large Language Models (LLMs) raises the question of how these models can support the adoption of FAIR dataspaces. In this work, we demonstrate the potential of LLMs in dataspaces with a concrete example. We also derive a research agenda for exploring this emerging field.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18

# ツールとは何か?言語モデルから見た調査

What Are Tools Anyway? A Survey from the Language Model Perspective ( http://arxiv.org/abs/2403.15452v1 )

ライセンス: Link先を確認

Zhiruo Wang, Zhoujun Cheng, Hao Zhu, Daniel Fried, Graham Neubig,

(参考訳) 言語モデル(LM)は強力だが、主にテキスト生成タスクに向いている。複雑なスキルを必要とするタスクのパフォーマンスを大幅に向上させた。しかしながら,多くの著作では,“ツール”という用語をさまざまな方法で採用している。その後、ツールはどのようにしてLMを助けるのか? 本稿では,LMが使用する外部プログラムとしてツールを統一的に定義し,LMツールのシナリオとアプローチを体系的にレビューする。本レビューに基づいて,様々なベンチマークで必要な計算および性能向上を計測し,様々なツール手法の有効性を実証的に検討し,今後の課題と課題を明らかにする。

Language models (LMs) are powerful, yet they are mostly suited to text generation tasks. Tools have substantially enhanced their performance on tasks that require complex skills. However, many works adopt the term "tool" in different ways, raising the question: What is a tool anyway? Subsequently, where and how do tools help LMs? In this survey, we provide a unified definition of tools as external programs used by LMs, and perform a systematic review of LM tooling scenarios and approaches. Grounded in this review, we empirically study the efficiency of various tooling methods by measuring their required compute and performance gains on various benchmarks, and highlight some challenges and potential future research in the field.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18

# Span-Oriented Information Extraction -- 情報抽出の統一的視点

Span-Oriented Information Extraction -- A Unifying Perspective on Information Extraction ( http://arxiv.org/abs/2403.15453v1 )

ライセンス: Link先を確認

Yifan Ding, Michael Yankoski, Tim Weninger,

(参考訳) 情報抽出(Information Extraction)とは、自然言語処理(NLP)におけるタスクの集合で、テキスト内のサブシーケンスとそのラベルを識別する。これらのタスクは、関連する情報を抽出し、自由テキストを構造化データにリンクするために長年使われてきた。しかし,情報抽出タスクの不均一性は,この分野の進歩を妨げている。したがって、テキストでスパンと定義するものを中心に、統一された視点を提供する。次に、これらの不連続なタスクをこの統一的な視点に再配置し、続いて、情報抽出タスクを、同じ基本的なSpan-Oriented Information Extractionタスクの変種として、広範囲に並べて表現する。

Information Extraction refers to a collection of tasks within Natural Language Processing (NLP) that identifies sub-sequences within text and their labels. These tasks have been used for many years to extract relevant information and to link free text to structured data. However, the heterogeneity among information extraction tasks impedes progress in this area. We therefore offer a unifying perspective centered on what we define to be spans in text. We then re-orient these seemingly incongruous tasks into this unified perspective and then re-present the wide assortment of information extraction tasks as variants of the same basic Span-Oriented Information Extraction task.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18
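To make the span-centered view above tangible, the following minimal sketch represents two different extraction tasks with one shared data structure. The `Span` class, the `find_span` helper, and the label strings (including the made-up knowledge-base identifier) are assumptions for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """A labeled sub-sequence of a text: [start, end) character offsets plus a label."""
    start: int
    end: int
    label: str

    def text(self, source):
        return source[self.start:self.end]

def find_span(source, phrase, label):
    """Hypothetical helper: locate the first occurrence of phrase and wrap it as a Span."""
    start = source.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in source")
    return Span(start, start + len(phrase), label)

sentence = "Marie Curie was born in Warsaw."

# Named entity recognition: the label is an entity type.
ner_spans = [
    find_span(sentence, "Marie Curie", "PERSON"),
    find_span(sentence, "Warsaw", "LOCATION"),
]

# Entity linking: same span boundaries, but the label is an identifier in a
# knowledge base (the identifier below is invented for this example).
linking_spans = [find_span(sentence, "Marie Curie", "KB:marie_curie")]

ner_texts = [s.text(sentence) for s in ner_spans]
```

Under this reading, the tasks differ only in what the label field carries, while span detection is shared.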

# 変圧器による感情検出 : 比較検討

Emotion Detection with Transformers: A Comparative Study ( http://arxiv.org/abs/2403.15454v1 )

ライセンス: Link先を確認

Mahdi Rezapour,

(参考訳) 本研究では,テキストデータを用いた感情分類におけるトランスフォーマーモデルの適用について検討する。我々は、異なる変圧器の変種を用いて、感情データセットを用いて、事前訓練されたトランスフォーマーモデルを訓練し、評価する。また、トランス層の微調整、層の訓練性、テキストデータの事前処理など、モデルの性能に影響を及ぼす要因についても分析する。解析の結果,句読解や停止語といった一般的な手法は,モデルの性能を損なうことが判明した。これは、トランスフォーマーの強みがテキスト内のコンテキスト関係を理解することにあるためかもしれない。句読点や停止語といった要素は、それでも感情や強調を伝達し、それらを取り除くことで、この文脈を混乱させる可能性がある。

In this study, we explore the application of transformer-based models for emotion classification on text data. We train and evaluate several pre-trained transformer models on the Emotion dataset using different variants of transformers. The paper also analyzes some factors that influence the performance of the model, such as the fine-tuning of the transformer layer, the trainability of the layer, and the preprocessing of the text data. Our analysis reveals that commonly applied techniques like removing punctuation and stop words can hinder model performance. This might be because transformers' strength lies in understanding contextual relationships within text. Elements like punctuation and stop words can still convey sentiment or emphasis, and removing them might disrupt this context.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18
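The preprocessing effect described in this abstract can be illustrated with a toy example. The sketch below assumes a small hand-written stop-word list (not the one used in the paper) that, like many standard lists, contains the negation "not". After punctuation and stop words are removed, two sentences with opposite sentiment collapse to the same token sequence.

```python
import string

# Toy stop-word list, an assumption for illustration only.
STOP_WORDS = {"i", "am", "not", "at", "all", "so", "the", "a", "is"}

def strip_punct_and_stop_words(text):
    """Lowercase, drop ASCII punctuation, then drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]

positive = strip_punct_and_stop_words("I am happy!!!")
negative = strip_punct_and_stop_words("I am not happy at all...")

# Both sentences reduce to the same tokens, so the negation and the emphasis
# carried by the punctuation are no longer available to a downstream model.
collapsed = positive == negative
```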

# テキストストリーム中の微調整文のサンプリング法の改善

Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams ( http://arxiv.org/abs/2403.15455v1 )

ライセンス: Link先を確認

Cristiano Mesquita Garcia, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr, Jean Paul Barddal,

(参考訳) インターネット上でのテキストデータの拡散は、組織や企業がサービスや製品に関する世論を監視できるユニークな機会である。このようなデータの高速な生成を考えると、シーケンシャルに到着し、潜在的に無限のテキストストリームを処理するテキストストリームマイニング設定は、従来のバッチ学習よりも適していることが多い。事前トレーニングされた言語モデルは、ストリーミング環境で高品質なテキストベクトル化機能に一般的に使用されるが、コンセプトドリフト(データ分散が時間とともに変化し、モデルのパフォーマンスに悪影響を及ぼす現象)に適応するための課題に直面している。本研究は,概念ドリフトの問題に対処するため,選択的な微調整言語モデルの設計した7つのテキストサンプリング手法の有効性について検討し,性能劣化を軽減した。これらの手法がSBERTモデルの微調整に与える影響を, 4つの異なる損失関数を用いて正確に評価する。マクロF1スコアと経過時間に着目した評価では、2つのテキストストリームデータセットとインクリメンタルSVM分類器を用いて性能をベンチマークする。以上の結果から,ソフトマックスの損失とバッチ・オール・トリプレットの損失はテキストストリームの分類に特に有効であることが示唆された。特に,提案したWordPieceToken比サンプリング法は,識別された損失関数により性能を著しく向上させ,ベースライン結果を上回った。

The proliferation of textual data on the Internet presents a unique opportunity for institutions and companies to monitor public opinion about their services and products. Given the rapid generation of such data, the text stream mining setting, which handles sequentially arriving, potentially infinite text streams, is often more suitable than traditional batch learning. While pre-trained language models are commonly employed for their high-quality text vectorization capabilities in streaming contexts, they face challenges adapting to concept drift - the phenomenon where the data distribution changes over time, adversely affecting model performance. Addressing the issue of concept drift, this study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models, thereby mitigating performance degradation. We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions. Our evaluation, focused on Macro F1-score and elapsed time, employs two text stream datasets and an incremental SVM classifier to benchmark performance. Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification, demonstrating that larger sample sizes generally correlate with improved macro F1-scores. Notably, our proposed WordPieceToken ratio sampling method significantly enhances performance with the identified loss functions, surpassing baseline results.

翻訳日:2024-04-01 02:44:33 公開日:2024-03-18

# 気象リアナリシスデータを用いた高層天体における上向き雷の時空間リスク評価

Spatio-seasonal risk assessment of upward lightning at tall objects using meteorological reanalysis data ( http://arxiv.org/abs/2403.18853v1 )

ライセンス: Link先を確認

Isabell Stucke, Deborah Morgenstern, Georg J. Mayr, Thorsten Simon, Achim Zeileis, Gerhard Diendorfer, Wolfgang Schulz, Hannes Pichler,

(参考訳) 本研究は,アルプス東部とその周辺地域における高層天体の雷害について検討し,上向きの雷害のリスクを評価する。 ULの長期電流が大きな損傷を与える可能性があるため、ULは特に風力タービンに脅威を与える。現在のリスク評価手法は、気象条件の影響を見落とし、ULリスクを過小評価する可能性がある。そこで本研究では,ガイスベルクタワー(オーストリア)で測定されたUL値と大規模気象変数(35ドル)との関係を,機械学習手法であるランダムフォレストを用いて解析した。これらのうち、風速10mでの大規模な上昇速度、風速、方向、および雲物理学の変数が最も多くの情報に貢献している。ランダム森林は1 km$^2$の解像度で、調査領域全体でULのリスクを予測する。強風と高地による上向きのたわみが組み合わさった強風は、ULリスクを増大させる。 ULリスクと高リスク領域の日周期は季節的に変化する。冬はアルプス山脈の北と北東に集中し、北から南に広がるため、過渡期と夏の間は北イタリアに影響を及ぼす。このモデルは冬に最も良く、高い天体で観測された雷で観測されたピークと一致したULリスクが最も高い。最高濃度はアルプス山脈の北にあり、ほとんどの風力タービンが位置しており、雷活動全体の増加につながっている。雷密度は高い天体の雷の指標として不十分であるため、総合的な気象情報はULリスク評価に不可欠である。

This study investigates lightning at tall objects and evaluates the risk of upward lightning (UL) over the eastern Alps and its surrounding areas. While uncommon, UL poses a threat, especially to wind turbines, as the long-duration current of UL can cause significant damage. Current risk assessment methods overlook the impact of meteorological conditions, potentially underestimating UL risks. Therefore, this study employs random forests, a machine learning technique, to analyze the relationship between UL measured at Gaisberg Tower (Austria) and $35$ larger-scale meteorological variables. Of these, the larger-scale upward velocity, wind speed and direction at 10 meters and cloud physics variables contribute most information. The random forests predict the risk of UL across the study area at a 1 km$^2$ resolution. Strong near-surface winds combined with upward deflection by elevated terrain increase UL risk. The diurnal cycle of the UL risk as well as high-risk areas shift seasonally. They are concentrated north/northeast of the Alps in winter due to prevailing northerly winds, and expanding southward, impacting northern Italy in the transitional and summer months. The model performs best in winter, with the highest predicted UL risk coinciding with observed peaks in measured lightning at tall objects. The highest concentration is north of the Alps, where most wind turbines are located, leading to an increase in overall lightning activity. Comprehensive meteorological information is essential for UL risk assessment, as lightning densities are a poor indicator of lightning at tall objects.

翻訳日:2024-04-01 02:25:04 公開日:2024-03-18

# リンク予測による基準緩和勧告とランク付け

Directed Criteria Citation Recommendation and Ranking Through Link Prediction ( http://arxiv.org/abs/2403.18855v1 )

ライセンス: Link先を確認

William Watson, Lawrence Yong,

(参考訳) リンク予測は、新しい文書にトポロジ的あるいは文脈的に関連がある可能性のある既存の文献から自動的に文書を抽出するプロキシとして検討する。本モデルでは,各文書の意味を要約ネットワーク内のノードとして符号化するために,トランスフォーマーベースのグラフ埋め込みを用いる。我々のモデルが生成するセマンティック表現は、推薦タスクやランキングタスクにおいて、他のコンテントベースの手法よりも優れていることを示す。これは、すべての矛盾の可能性を最小限に抑えるために、これらの文書が互いに適切に引用することが重要である領域における引用グラフを探索するための全体論的アプローチを提供する。

We explore link prediction as a proxy for automatically surfacing documents from existing literature that might be topically or contextually relevant to a new document. Our model uses transformer-based graph embeddings to encode the meaning of each document, presented as a node within a citation network. We show that the semantic representations that our model generates can outperform other content-based methods in recommendation and ranking tasks. This provides a holistic approach to exploring citation graphs in domains where it is critical that these documents properly cite each other, so as to minimize the possibility of any inconsistencies.

翻訳日:2024-04-01 02:25:04 公開日:2024-03-18
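The recommendation step above can be sketched as ranking candidate documents by how likely a link to the new document is. The code below is a simplified stand-in under explicit assumptions: the three-dimensional vectors are hand-written rather than transformer-based graph embeddings, and cosine similarity replaces the paper's link prediction model, whose details the abstract does not give.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def recommend(query_vec, corpus, k=2):
    """Return the ids of the k documents whose embeddings score highest against the query."""
    ranked = sorted(corpus.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical document embeddings; ids and values are invented for the example.
corpus = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
    "doc_c": [0.8, 0.3, 0.1],
}
new_doc = [1.0, 0.2, 0.0]

top = recommend(new_doc, corpus)
```

The ranked list plays the role of candidate citations for the new document; in a real system the scoring function would be the trained link predictor.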

# Shift Aggregate Extract Networks

Shift Aggregate Extract Networks ( http://arxiv.org/abs/1703.05537v2 )

ライセンス: Link先を確認

Francesco Orsini, Daniele Baracchi, Paolo Frasconi,

(参考訳) 大規模グラフの効率的な表現を学習するために,階層分解に基づくアーキテクチャを導入する。我々のフレームワークは、カーネルメソッドで使用される古典的なR分解を拡張し、ネストした部分関係を可能にする。入力グラフのテンプレートを直接アンロールする再帰的ニューラルネットワークとは異なり、ニューラルネットワークテンプレートを分解階層上にアンロールすることで、一般的にソーシャルネットワークグラフを特徴付ける高次変動に対処することができる。深い階層的な分解は、対称性を利用して空間と時間の複雑さを減らす手法である領域圧縮にも適用可能である。我々は、我々のアプローチが、大規模なソーシャルネットワークデータセット上で最先端のグラフ分類手法より優れていると同時に、小さな化学生物学的なベンチマークデータセットに対して競争力があることを実証的に示す。

We introduce an architecture based on deep hierarchical decompositions to learn effective representations of large graphs. Our framework extends classic R-decompositions used in kernel methods, enabling nested part-of-part relations. Unlike recursive neural networks, which unroll a template on input graphs directly, we unroll a neural network template over the decomposition hierarchy, allowing us to deal with the high degree variability that typically characterizes social network graphs. Deep hierarchical decompositions are also amenable to domain compression, a technique that reduces both space and time complexity by exploiting symmetries. We show empirically that our approach is able to outperform current state-of-the-art graph classification methods on large social network datasets, while at the same time being competitive on small chemobiological benchmark datasets.

翻訳日:2024-03-26 00:17:07 公開日:2024-03-18

# MDU-Net:バイオメディカルイメージセグメンテーションのためのマルチスケール高密度接続U-Net

MDU-Net: Multi-scale Densely Connected U-Net for biomedical image segmentation ( http://arxiv.org/abs/1812.00352v3 )

ライセンス: Link先を確認

Jiawei Zhang, Yuzhen Jin, Jilan Xu, Xiaowei Xu, Yanchun Zhang,

(参考訳) バイオメディカルイメージセグメンテーションは、定量的分析、臨床診断、医療介入において中心的な役割を果たす。完全畳み込みネットワーク (FCN) と U-Net により、ディープ畳み込みネットワーク (DNN) はバイオメディカルイメージセグメンテーションの応用に多大な貢献をしている。本稿では,U字型アーキテクチャのエンコーダ、デコーダ、およびその両者の間に対して,3つの異なるMDC(Multi-scale dense connection)を提案する。 3つの密接な接続に基づいて,バイオメディカルイメージセグメンテーションのためのマルチスケール密接なU-Net(MDU-Net)を提案する。 MDU-Netは、隣のフィーチャーマップを高層と低層の両方から異なるスケールで直接融合させ、現在のレイヤにおけるフィーチャの伝搬を強化する。入力と出力に近い層間の接続が短いマルチスケールの高密度接続は、さらに深いU-Netを可能にする。さらに,高密度接続におけるポテンシャル過適合を緩和し,さらにセグメンテーション性能を向上させるために量子化を導入する。提案手法をMICCAI 2015 Gland Segmentation (GlaS) データセット上で評価した。 3つのMDCは、MICCAI GlandデータセットにおいてU-NetのパフォーマンスをテストAで最大1.8%、テストBで最大3.5%改善した。一方、量子化を伴うMDU-Netは、明らかに元のU-Netのセグメンテーション性能を改善する。

Biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. In the light of the fully convolutional networks (FCN) and U-Net, deep convolutional networks (DNNs) have made significant contributions to biomedical image segmentation applications. In this paper, we propose three different multi-scale dense connections (MDC) for the encoder, the decoder of U-shaped architectures, and across them. Based on three dense connections, we propose a multi-scale densely connected U-Net (MDU-Net) for biomedical image segmentation. MDU-Net directly fuses the neighboring feature maps with different scales from both higher layers and lower layers to strengthen feature propagation in the current layer. Multi-scale dense connections, which contain shorter connections between layers close to the input and output, also make a much deeper U-Net possible. Besides, we introduce quantization to alleviate the potential overfitting in dense connections, and further improve the segmentation performance. We evaluate our proposed model on the MICCAI 2015 Gland Segmentation (GlaS) dataset. The three MDC improve U-Net performance by up to 1.8% on test A and 3.5% on test B in the MICCAI Gland dataset. Meanwhile, the MDU-Net with quantization obviously improves the segmentation performance of original U-Net.

翻訳日:2024-03-26 00:17:07 公開日:2024-03-18

# コンパイラが生成した大規模言語モデルへのフィードバック

Compiler generated feedback for Large Language Models ( http://arxiv.org/abs/2403.14714v1 )

ライセンス: Link先を確認

Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather,

(参考訳) 我々は,LLVMアセンブリのコードサイズを最適化するために,コンパイラフィードバックを備えたLarge Language Modelを用いたコンパイラ最適化において,新しいパラダイムを導入する。このモデルは、最適化されていないLLVM IRを入力として取り、最適化されたIR、最適な最適化パス、最適化されていないIRと最適化されたIRの両方の命令数を生成する。そして、生成された最適化で入力をコンパイルし、予測された命令数が正しいか評価し、生成されたIRがコンパイル可能で、コンパイルされたコードに対応する。このフィードバックを LLM に返して,コードを最適化する新たな機会を与えています。このアプローチでは、オリジナルのモデルに-Ozよりも0.53%改善されている。フィードバックでより多くの情報を追加するのは直感的であるように思えるが、単純なサンプリング技術は10以上のサンプルが与えられた場合、はるかに高いパフォーマンスを達成する。

We introduce a novel paradigm in compiler optimization powered by Large Language Models with compiler feedback to optimize the code size of LLVM assembly. The model takes unoptimized LLVM IR as input and produces optimized IR, the best optimization passes, and instruction counts of both unoptimized and optimized IRs. Then we compile the input with the generated optimization passes and evaluate whether the predicted instruction count is correct and whether the generated IR is compilable and corresponds to the compiled code. We return this feedback to the LLM and give it another chance to optimize the code. This approach adds an extra 0.53% improvement over -Oz to the original model. Even though adding more information with feedback seems intuitive, simple sampling techniques achieve much higher performance given 10 or more samples.

翻訳日:2024-03-25 21:41:26 公開日:2024-03-18
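The generate, compile, check, and retry cycle described in this abstract can be outlined as a small control loop. The sketch below is hypothetical throughout: `toy_generate` and `toy_compile` stand in for the language model and for an LLVM toolchain, the dictionary keys and the pass placeholder are invented, and the instruction counts are made up. It shows only how the three checks named in the abstract could be turned into feedback for a second attempt.

```python
def check_prediction(prediction, compile_result):
    """Compare what the model claimed with what compilation actually produced."""
    notes = []
    if prediction["predicted_count"] != compile_result["count"]:
        notes.append(
            f"predicted instruction count {prediction['predicted_count']}, "
            f"compiler produced {compile_result['count']}"
        )
    if not compile_result["ir_compiles"]:
        notes.append("generated IR does not compile")
    elif not compile_result["ir_matches"]:
        notes.append("generated IR does not correspond to the compiled code")
    return notes

def optimize_with_feedback(generate, compile_with, source_ir):
    """One generation, one check, and at most one retry that receives the feedback."""
    prediction = generate(source_ir, None)
    notes = check_prediction(prediction, compile_with(source_ir, prediction))
    if not notes:
        return prediction, notes
    return generate(source_ir, notes), notes

def toy_generate(source_ir, feedback):
    """Stand-in for the model: corrects its count once it has seen feedback."""
    count = 5 if feedback is None else 7
    return {"passes": ["<pass list placeholder>"], "predicted_count": count}

def toy_compile(source_ir, prediction):
    """Stand-in for compiling source_ir with the predicted passes."""
    return {"count": 7, "ir_compiles": True, "ir_matches": True}

final, feedback = optimize_with_feedback(
    toy_generate, toy_compile, "; unoptimized IR placeholder"
)
```

The abstract's comparison with sampling suggests an alternative design, drawing many independent predictions and keeping the best one that compiles, which this loop does not cover.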

# コンピューティングにおける参加拡大の進展を可視化する:コンテキストの価値

Visualizing Progress in Broadening Participation in Computing: The Value of Context ( http://arxiv.org/abs/2403.14708v1 )

ライセンス: Link先を確認

Valerie Barr, Carla E. Brodley, Manuel A. Pérez-Quiñones,

(参考訳) 米国内でのコンピューティングの表現に関する懸念は、参加を広げるために多くの活動を促している。これらの取り組みの影響の評価と、実際に対処されている「プロブレム」の明確な評価は、計算学の学位を持つ学生数の比率として各人口の表現を考察する最も一般的なデータ分析の性質によって制限されている。この単一のメトリクスの使用は、参加活動の拡大の影響を適切に評価することはできない。第一に、このアプローチは、連邦が指定した性別、人種、民族集団の総数と相対比率の点で、学部生の人口人口の変化を説明できない。第二の問題は、コンピューティング(BPC)への参加の拡大に関する文献の大多数が、学生の交叉アイデンティティに関するデータを省略して、性別や人種、民族に関するデータを報告していることである。これにより、データと私たちがフィールドとして直面している課題の両方を正しく理解できません。本稿では,BPCの取り組みに対する影響を追跡するために,いくつかの異なるアプローチを提案する。推奨事項は3つあります。 1)コホートに基づく分析は,コンピュータにおける学生のエンゲージメントを正確に示すために用いるべきである。 2 分野全体としては、常に交叉データを報告する基準を採用する必要がある。 3)大学人口統計学の文脈は、CS部門がコンピューティングへの参加を拡大するためにどれだけうまく行っているかを考える際に重要であり、その中には、コンピューティングの地域人口統計学に影響を及ぼす大学人口動態の経年変化の分析も含まれる。

Concerns about representation in computing within the U.S. have driven numerous activities to broaden participation. Assessment of the impact of these efforts and, indeed, a clear assessment of the actual "problem" being addressed are limited by the nature of the most common data analysis which looks at the representation of each population as a percentage of the number of students graduating with a degree in computing. This use of a single metric cannot adequately assess the impact of broadening participation efforts. First, this approach fails to account for changing demographics of the undergraduate population in terms of overall numbers and relative proportion of the Federally designated gender, race, and ethnicity groupings. A second issue is that the majority of literature on broadening participation in computing (BPC) reports data on gender or on race/ethnicity, omitting data on students' intersectional identities. This leads to an incorrect understanding of both the data and the challenges we face as a field. In this paper we present several different approaches to tracking the impact of BPC efforts. We make three recommendations: 1) cohort-based analysis should be used to accurately show student engagement in computing; 2) the field as a whole needs to adopt the norm of always reporting intersectional data; 3) university demographic context matters when looking at how well a CS department is doing to broaden participation in computing, including longitudinal analysis of university demographic shifts that impact the local demographics of computing.

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18
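The difference between the single share metric and an analysis that takes university demographics into account can be shown with a short calculation. The numbers below are invented for illustration and do not come from the paper, and the second function reflects one reading of "cohort-based" (the fraction of a group's graduates who earn a computing degree), which is an assumption about the authors' definition.

```python
def share_of_computing_grads(group_cs_grads, all_cs_grads):
    """The common metric: the group's share among computing graduates."""
    return group_cs_grads / all_cs_grads

def cohort_rate(group_cs_grads, group_all_grads):
    """Assumed cohort-style metric: fraction of the group's graduates who earned a computing degree."""
    return group_cs_grads / group_all_grads

# Made-up figures for one demographic group at one university in two years.
year_1 = {"group_cs": 20, "all_cs": 200, "group_all": 400}
year_2 = {"group_cs": 30, "all_cs": 300, "group_all": 1000}

share_1 = share_of_computing_grads(year_1["group_cs"], year_1["all_cs"])
share_2 = share_of_computing_grads(year_2["group_cs"], year_2["all_cs"])

rate_1 = cohort_rate(year_1["group_cs"], year_1["group_all"])
rate_2 = cohort_rate(year_2["group_cs"], year_2["group_all"])
```

In this toy case the share stays at 10% in both years while the cohort-style rate falls from 5% to 3%, because the group grew much faster in the university as a whole than in computing. This is the kind of demographic shift the abstract says a single metric fails to account for.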

# 気候Q&A: 気候科学者と一般大衆のギャップを埋める

ClimateQ&A: Bridging the gap between climate scientists and the general public ( http://arxiv.org/abs/2403.14709v1 )

ライセンス: Link先を確認

Natalia De La Calzada, Théo Alves Da Costa, Annabelle Blangero, Nicolas Chesneau,

(参考訳) 本研究では,気候変動と生物多様性の喪失に関する世論を,ClimateQ&Aプラットフォームに対する質問の分析によって調査する。 ClimateQ&Aは、IPCCおよびIPBESレポートから14,000ページ以上の科学文献に基づいたクエリ応答にLLMを使用する会話エージェントである。 2023年3月にオンライン公開されたこのツールは、主にフランスの聴衆から3万以上の質問を集めた。そのチャットボットインタフェースは、自然に関する質問の自由な定式化を可能にする*。その主な目的は自然科学をよりアクセスしやすくすることであるが、質問とそのテーマの収集と分析を可能にする。クローズドな質問を含む従来の調査とは異なり、この新手法は自然に関する個別の質問に対する新たな視点を提供する。 3,425の質問でNLPクラスタリングアルゴリズムを実行すると、気候変動や生物多様性の喪失が個人(例えば、居住地や休暇、消費習慣)に与える影響や、自然(例えば、輸送や食品の選択)に対する行動の具体的な影響について、25.8%が大きな質問をしていることがわかった。このことは、従来の調査手法が既存の知識ギャップを全て特定する訳ではなく、IPCCとIPBESのレポートにのみ依存することは、気候と生物多様性に関する個々の質問に対処するものではなく、これらの問題に対する公衆の理解と行動に影響を与える可能性があることを示唆している。 ※「気候変化」・「生物多様性喪失」の傘語として「自然」を用いる。

This research paper investigates public views on climate change and biodiversity loss by analyzing questions asked to the ClimateQ&A platform. ClimateQ&A is a conversational agent that uses LLMs to respond to queries based on over 14,000 pages of scientific literature from the IPCC and IPBES reports. Launched online in March 2023, the tool has gathered over 30,000 questions, mainly from a French audience. Its chatbot interface allows for the free formulation of questions related to nature*. While its main goal is to make nature science more accessible, it also allows for the collection and analysis of questions and their themes. Unlike traditional surveys involving closed questions, this novel method offers a fresh perspective on individual interrogations about nature. Running NLP clustering algorithms on a sample of 3,425 questions, we find that a significant 25.8% inquire about how climate change and biodiversity loss will affect them personally (e.g., where they live or vacation, their consumption habits) and the specific impacts of their actions on nature (e.g., transportation or food choices). This suggests that traditional methods of surveying may not identify all existing knowledge gaps, and that relying solely on IPCC and IPBES reports may not address all individual inquiries about climate and biodiversity, potentially affecting public understanding and action on these issues. *we use 'nature' as an umbrella term for 'climate change' and 'biodiversity loss'

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18

# ディプレックス学生支援のためのレコメンデーションモデルの利用

Use of recommendation models to provide support to dyslexic students ( http://arxiv.org/abs/2403.14710v1 )

ライセンス: Link先を確認

Gianluca Morciano, José Manuel Alcalde-Llergo, Andrea Zingoni, Enrique Yeguas-Bolivar, Juri Taborri, Giuseppe Calabrò,

(参考訳) dyslexiaは、最も広範囲にわたる特定の学習障害であり、認知領域に深刻な障害がある。これは、学習過程において、ディプレックスの学生に悪影響を及ぼす。したがって、これらの学生に特定の支援を与える必要がある。さらに、障害によって生じる問題は、互いに大きく異なる可能性があるため、このようなサポートは高度にパーソナライズされなければならない。本研究では, ディプレックスの学生に最も適した支援ツールを提案するために, AIを活用する可能性について検討した。これを実現するために、私たちは、個人の好みを検出し、最も適切な提案を提供することを目的とした、機械学習の分野であるレコメンデーションアルゴリズムを頼りにしました。そこで我々は,3つの協調フィルタリング推薦モデル,すなわちアイテムベース,ユーザベース,および重み付きハイブリッドモデルを実装し,1237名の学生の情報からなる大規模データベース上で,最も多く利用されている支援戦略とデジタルツールに関する自己評価質問紙を用いて,その性能について検討した。各レコメンデーションモデルは、ピアソン相関、ユークリッド距離、コサイン類似度という3つの異なる類似度指標で試験された。その結果,レコメンデーションシステムは,全員に最適なヘルプツールや戦略を提案する上で極めて有効であることがわかった。このことは、提案手法が成功し、ジストレキシーの学生を支援するための新しい効果的な方法として利用できることを示している。

Dyslexia is the most widespread specific learning disorder and significantly impairs different cognitive domains. This, in turn, negatively affects dyslexic students during their learning path. Therefore, specific support must be given to these students. In addition, such support must be highly personalized, since the problems generated by the disorder can differ greatly from one student to another. In this work, we explored the possibility of using AI to suggest the most suitable supporting tools for dyslexic students, so as to provide targeted help that can be of real utility. To do this, we relied on recommendation algorithms, a branch of machine learning that aims to detect personal preferences and provide the most suitable suggestions. We hence implemented and trained three collaborative-filtering recommendation models, namely an item-based, a user-based and a weighted-hybrid model, and studied their performance on a large database of 1237 students' information, collected with a self-evaluating questionnaire regarding all the most used supporting strategies and digital tools. Each recommendation model was tested with three different similarity metrics, namely Pearson correlation, Euclidean distance and cosine similarity. The obtained results showed that a recommendation system is highly effective in suggesting the optimal help tools/strategies for everyone. This demonstrates that the proposed approach is successful and can be used as a new and effective methodology to support students with dyslexia.
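
The three similarity metrics named in the abstract can be illustrated on two rating vectors. This is a minimal sketch, not the authors' implementation: the function names and the toy vectors are assumptions made for illustration, and a real collaborative-filtering model would also need to handle missing ratings.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    dev_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    dev_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    # A constant vector has no defined correlation; return 0.0 by convention here.
    return cov / (dev_a * dev_b) if dev_a and dev_b else 0.0

def euclidean(a, b):
    """Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical ratings two students gave to the same three support tools.
student_u = [1, 2, 3]
student_v = [2, 4, 6]
print(pearson(student_u, student_v), euclidean(student_u, student_v), cosine(student_u, student_v))
```

Note that Pearson correlation and cosine similarity are scale-invariant, so the two toy students look identical under both, while Euclidean distance still separates them. Differences of this kind are one reason to compare several metrics.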

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18

# リング検出のための対人AI

Human-in-the-Loop AI for Cheating Ring Detection ( http://arxiv.org/abs/2403.14711v1 )

ライセンス: Link先を確認

Yong-Siang Shih, Manqian Liao, Ruidong Liu, Mirza Basim Baig,

(参考訳) 近年,アクセシビリティのため,オンライン試験が普及している。しかし、オンライン試験の安全性、特に悪質な試験受験者が合格するのを助けるプロの不正行為の文脈において、いくつかの懸念が持ち上がっており、いわゆる「チーティングリング」を形成している。本稿では,これらの不正なリングを検知し,阻止するように設計された,ループ型AI不正なリング検出システムを提案する。我々は、この人間のループAIシステムの基盤となる論理を概説し、不正者検出の目的を達成するための設計原則を探求する。さらに、AIシステムに関連する意図しないリスクを軽減することを目的として、その性能と公平性を評価するために使用される方法論について説明する。システムの設計と開発はResponsible AI(RAI)標準に準拠し、開発プロセス全体を通して倫理的考察が統合されることを保証する。

Online exams have become popular in recent years due to their accessibility. However, some concerns have been raised about the security of the online exams, particularly in the context of professional cheating services aiding malicious test takers in passing exams, forming so-called "cheating rings". In this paper, we introduce a human-in-the-loop AI cheating ring detection system designed to detect and deter these cheating rings. We outline the underlying logic of this human-in-the-loop AI system, exploring its design principles tailored to achieve its objectives of detecting cheaters. Moreover, we illustrate the methodologies used to evaluate its performance and fairness, aiming to mitigate the unintended risks associated with the AI system. The design and development of the system adhere to Responsible AI (RAI) standards, ensuring that ethical considerations are integrated throughout the entire development process.

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18

# 官僚的生産性のためのAI: 1億4300万の英国政府の取引を自動化するAIの可能性の測定

AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions ( http://arxiv.org/abs/2403.14712v1 )

ライセンス: Link先を確認

Vincent J. Straub, Youmna Hashem, Jonathan Bright, Satyam Bhagwanani, Deborah Morgan, John Francis, Saba Esnaashari, Helen Margetts,

(参考訳) 現在政府内では、複雑だが反復的な官僚的タスクの自動化によって、人工知能が公共サービスの生産性を向上させる可能性について、かなりの興奮がある。ここでは、英国中央政府における市民向きの官僚的意思決定手順の規模をマッピングし、AIによる自動化の可能性を評価することによって、この機会の規模を探る。我々は、英国中央政府が年間約10億件の市民向け取引を約400のサービスで行っており、そのうち約1億4300万が複雑な反復取引であると見積もっている。これらの複雑なトランザクションの84%は高度に自動化可能であると見積もっています。また、政府のサービスによる取引量の推定モデルも開発し、政府が取引量の計測に時間を費やすのを避けるための手段を提供する。最後に、政府が提供するサービスの種類には高いオーバオーバがあることが分かりました。つまり、自動化の取り組みは、時間とともに進化する可能性のあるサービス自身ではなく、一般的な手順に重点を置くべきです。全体として、我々の研究は、現代政府の構造と機能、そしてそれが人工知能の時代にどのように進化するかについて、新しい視点を示します。

There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time-consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provides, meaning that automation efforts should focus on general procedures rather than services themselves, which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.
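
The person-years figure can be sanity-checked with simple arithmetic. The sketch below is only an illustration. The number of working hours in a person-year (1,650 here) is an assumption, not a value stated in the abstract, and the abstract does not say whether the one-minute saving applies to all 143 million complex transactions or only to the 84% judged highly automatable, so both readings are computed.

```python
def person_years_saved(transactions, minutes_saved_each, hours_per_person_year):
    """Convert a per-transaction time saving into person-years of work."""
    hours_saved = transactions * minutes_saved_each / 60.0
    return hours_saved / hours_per_person_year

COMPLEX_TRANSACTIONS = 143_000_000   # from the abstract
AUTOMATABLE_SHARE = 0.84             # from the abstract
HOURS_PER_PERSON_YEAR = 1650         # assumption: not given in the abstract

all_complex = person_years_saved(COMPLEX_TRANSACTIONS, 1, HOURS_PER_PERSON_YEAR)
automatable_only = person_years_saved(
    COMPLEX_TRANSACTIONS * AUTOMATABLE_SHARE, 1, HOURS_PER_PERSON_YEAR
)
print(round(all_complex), round(automatable_only))
```

Under this assumed working year, both readings land in the low thousands of person-years, the same order of magnitude as the abstract's "approximately 1,200".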

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18

# 観測不能な条件下での公正さの監査

Auditing Fairness under Unobserved Confounding ( http://arxiv.org/abs/2403.14713v1 )

ライセンス: Link先を確認

Yewon Byun, Dylan Sam, Michael Oberst, Zachary C. Lipton, Bryan Wilder,

(参考訳) 意思決定システムにおける根本的な問題は、人口統計上の不平等の存在である。しかしながら、不平等は定量化が困難であり、特に我々の株式の概念がリスク(例えば、それ無しで死ぬ人に対する治療への平等なアクセス)のような難しい概念に依存している場合である。このような不平等を監査するには、個々のリスクを正確に測定する必要がある。これらの観測不能物が明らかな相違を「説明」する場合、過渡状態または過渡状態の不等式が成立する可能性がある。本稿では, リスク要因がすべて観察されているという仮定を排除した場合でも, 緩やかに, あるいは(当然のことながら) 高いリスクの個人間でのアロケーション率に情報的限界を与えることができることを示す。我々は、現実の多くの設定(例えば、新しい治療の導入)において、いかなるアロケーションよりも前の期間のデータを持ち、不偏のリスク見積を導出するという事実を利用する。筆者らは,Paxlovidの患者への配当に関する現実的な研究において,我々の枠組みの有効性を実証し,観察された人種的不平等は,重要な観察された同種種と同一の強度を持つ未観察の共同設立者によって説明できないことを発見した。

A fundamental problem in decision-making systems is the presence of inequity across demographic lines. However, inequity can be difficult to quantify, particularly if our notion of equity relies on hard-to-measure notions like risk (e.g., equal access to treatment for those who would die without it). Auditing such inequity requires accurate measurements of individual risk, which is difficult to estimate in the realistic setting of unobserved confounding. In the case that these unobservables "explain" an apparent disparity, we may understate or overstate inequity. In this paper, we show that one can still give informative bounds on allocation rates among high-risk individuals, even while relaxing or (surprisingly) even when eliminating the assumption that all relevant risk factors are observed. We utilize the fact that in many real-world settings (e.g., the introduction of a novel treatment) we have data from a period prior to any allocation, to derive unbiased estimates of risk. We demonstrate the effectiveness of our framework on a real-world study of Paxlovid allocation to COVID-19 patients, finding that observed racial inequity cannot be explained by unobserved confounders of the same strength as important observed covariates.

翻訳日:2024-03-25 21:31:40 公開日:2024-03-18

# 脳-コンピュータインタフェースのための軽量ベクトル記号アーキテクチャのスケジューリング知識獲得

Scheduled Knowledge Acquisition on Lightweight Vector Symbolic Architectures for Brain-Computer Interfaces ( http://arxiv.org/abs/2403.13844v1 )

ライセンス: Link先を確認

Yejia Liu, Shijin Duan, Xiaolin Xu, Shaolei Ren,

(参考訳) Brain-Computer Interface (BCI) は通常、ユーザがタイムリーなフィードバックを提供するために、軽量でリアルタイムに応答できるように設計されている。古典的特徴エンジニアリングは計算効率は高いが精度は低いが、最近のニューラルネットワーク(DNN)は精度を向上するが、計算コストが高く、レイテンシが高い。有望な代替として、ベクトル記号アーキテクチャ(VSA)に基づく低次元計算(LDC)分類器は、古典的な特徴工学手法よりも小さいモデルサイズで精度が高い。しかし、その精度は現代のDNNと比べても遅れており、複雑な脳信号を処理することは困難である。小モデルの精度を向上させるため、知識蒸留は一般的な方法である。しかし、教師と生徒のモデルの蒸留レベルを一定に保つことは、成長する学生にとって、その進歩的な学習段階において最善の方法ではないかもしれない。そこで本研究では,カリキュラムデータに基づく簡易な知識蒸留手法を提案し,学生が授業モデルから徐々に知識を構築できるようにし,それを$\alpha$スケジューラで制御する。一方,LDC/VSAを学生モデルとして採用し,低レイテンシを必要とする小型BCIデバイスにおいて,デバイス上での推論効率を向上させる。実験結果から,本手法は他の手法に比べて精度とハードウェア効率のトレードオフが良好であることが示された。

Brain-computer interfaces (BCIs) are typically designed to be lightweight and responsive in real time to provide users with timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas recent deep neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) classifier based on vector symbolic architecture (VSA) achieves a small model size yet higher accuracy than classical feature engineering methods. However, its accuracy still lags behind that of modern DNNs, making it challenging to process complex brain signals. To improve the accuracy of a small model, knowledge distillation is a popular method. However, maintaining a constant level of distillation between the teacher and student models may not be the best way for a growing student during its progressive learning stages. In this work, we propose a simple scheduled knowledge distillation method based on curriculum data order to enable the student to gradually build knowledge from the teacher model, controlled by an $\alpha$ scheduler. Meanwhile, we employ the LDC/VSA as the student model to enhance the on-device inference efficiency for tiny BCI devices that demand low latency. The empirical results have demonstrated that our approach achieves a better tradeoff between accuracy and hardware efficiency compared to other methods.
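
The idea of a scheduled distillation weight can be sketched as follows. This is a generic illustration of knowledge distillation with a time-varying $\alpha$, not the paper's method: the linear schedule, its start and end values, the temperature `T`, and the choice of which loss term $\alpha$ multiplies are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax with temperature T."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def alpha_schedule(epoch, total_epochs, start=0.9, end=0.1):
    """Assumed linear schedule: alpha moves from `start` to `end` over training."""
    frac = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * frac

def distillation_loss(student_logits, teacher_logits, labels, alpha, T=2.0):
    """(1 - alpha) * cross-entropy on hard labels + alpha * T^2 * KL(teacher || student)."""
    labels = np.asarray(labels)
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + 1e-12).mean()
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    kl = (q_teacher * (np.log(q_teacher + 1e-12) - np.log(q_student + 1e-12))).sum(axis=-1).mean()
    return (1.0 - alpha) * ce + alpha * (T ** 2) * kl

# Toy batch: 2 samples, 3 classes (values are made up).
student = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
teacher = np.array([[3.0, 0.2, 0.1], [0.1, 2.5, 0.2]])
for epoch in (0, 9):
    a = alpha_schedule(epoch, total_epochs=10)
    print(epoch, a, distillation_loss(student, teacher, [0, 1], a))
```

With this schedule the student leans on the teacher's soft targets early and on the ground-truth labels later. The opposite direction is equally easy to express by swapping `start` and `end`.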

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# 目に見えないものをよりよく見る: インクリメンタルゼロショット異常診断のための広深さ混合アンチフォッティングフレームワーク

Learning to better see the unseen: Broad-Deep Mixed Anti-Forgetting Framework for Incremental Zero-Shot Fault Diagnosis ( http://arxiv.org/abs/2403.13845v1 )

ライセンス: Link先を確認

Jiancheng Zhao, Jiaqi Yue, Chunhui Zhao,

(参考訳) ゼロショット断層診断(ZSFD)は、人間の専門家によってラベル付けされた断層特性を予測することによって、目に見えない断層を識別することができる。我々はまず,ZSFDの産業プロセスの継続的な変化,すなわち新たな障害カテゴリや属性に適応するモデルの能力に対処する上で,これまで学んだ診断能力を忘れてはならない,という要求を認識した。既存のZSFDパラダイムは、産業シナリオにおけるトレーニングデータのストリームの進化から学べないという問題を克服するために、従来のZSFDパラダイムと一般化ZSFDパラダイムの両方にカテゴリインクリメントと属性インクリメントを組み込んだインクリメンタルZSFD(IZSFD)パラダイムが最初に提案されている。 IZSFDを実現するために,新しい障害カテゴリや属性から学習することを目的とした,広域混合型アンチフォゲッティングフレームワーク(BDMAFF)を提案する。忘れる問題に対処するため、BDMAFFは2つの観点から得られた知識、すなわち特徴と属性のプロトタイプを効果的に蓄積する。特徴記憶は、アンチフォッゲッティングトレーニング戦略を用いた深層生成モデルにより確立され、歴史的カテゴリの生成品質が監視され維持される。診断モデルは、生成モデルから生成されたサンプルの助けを借りてUNSEEN断層をSEEする。属性プロトタイプメモリは、広範学習システムにインスパイアされた診断モデルによって確立される。従来の漸進的学習アルゴリズムとは異なり、BDMAFFは診断モデルのメモリ駆動反復更新戦略を導入し、過去のトレーニングサンプルをすべて保存することなく、新しい障害や属性を学習できるようにする。提案手法の有効性は,実油圧システムとテネシー・イーストマンベンチマークプロセスによって検証される。

Zero-shot fault diagnosis (ZSFD) is capable of identifying unseen faults via predicting fault attributes labeled by human experts. We first recognize the demand of ZSFD to deal with continuous changes in industrial processes, i.e., the model's ability to adapt to new fault categories and attributes while avoiding forgetting the diagnosis ability learned previously. To overcome the issue that the existing ZSFD paradigm cannot learn from evolving streams of training data in industrial scenarios, the incremental ZSFD (IZSFD) paradigm is proposed for the first time, which incorporates category increment and attribute increment for both traditional ZSFD and generalized ZSFD paradigms. To achieve IZSFD, we present a broad-deep mixed anti-forgetting framework (BDMAFF) that aims to learn from new fault categories and attributes. To tackle the issue of forgetting, BDMAFF effectively accumulates previously acquired knowledge from two perspectives: features and attribute prototypes. The feature memory is established through a deep generative model that employs anti-forgetting training strategies, ensuring the generation quality of historical categories is supervised and maintained. The diagnosis model SEEs the UNSEEN faults with the help of generated samples from the generative model. The attribute prototype memory is established through a diagnosis model inspired by the broad learning system. Unlike traditional incremental learning algorithms, BDMAFF introduces a memory-driven iterative update strategy for the diagnosis model, which allows the model to learn new faults and attributes without requiring the storage of all historical training samples. The effectiveness of the proposed method is verified by a real hydraulic system and the Tennessee-Eastman benchmark process.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# グラフ最大復号情報を用いたクラスタリング手法

A Clustering Method with Graph Maximum Decoding Information ( http://arxiv.org/abs/2403.13846v1 )

ライセンス: Link先を確認

Xinrun Xu, Manying Lv, Yurong Wu, Zhanbiao Lian, Zhiming Ding, Jin Yan, Shan Jiang,

(参考訳) グラフモデルに基づくクラスタリング手法は,様々な知識領域にまたがる適用性に注目が集まっている。他の関連するアプリケーションとシームレスに統合する適応性は、グラフモデルに基づくクラスタリング分析に、データセット内で「自然な関連」や「グラフ構造」を堅牢に抽出する能力を与え、データポイント間の関係のモデリングを容易にする。その有効性にもかかわらず、グラフベースモデルを用いた現在のクラスタリング手法は、ノード間のランダムウォークアクセスとデータ内の組込み構造情報に関連する不確実性を見落としている。このギャップに対処するために, CMDI と呼ばれるグラフベースモデル内でのデコード情報の最大化のためのクラスタリング手法を提案する。 CMDIは、グラフ構造抽出とグラフ頂点分割という2つのフェーズからなるクラスタリングプロセスに、2次元構造情報理論を革新的に組み入れている。 CMDI内では、グラフ分割は抽象的なクラスタリング問題として再構成され、最大復号情報を利用して、頂点へのランダムな訪問に関連する不確実性を最小限に抑える。 3つの実世界のデータセットに対する実証的な評価は、CMDIが古典的ベースライン法よりも優れており、より優れた復号化情報比(DI-R)を示すことを示している。さらにCMDIは,特に事前知識(PK)を考慮した場合,高い効率性を示す。これらの結果から,デコード情報の品質と計算効率を向上させるCMDIの有効性が示され,グラフベースのクラスタリング解析において貴重なツールとして位置づけられた。

The clustering method based on graph models has garnered increased attention for its widespread applicability across various knowledge domains. Its adaptability to integrate seamlessly with other relevant applications endows the graph model-based clustering analysis with the ability to robustly extract "natural associations" or "graph structures" within datasets, facilitating the modelling of relationships between data points. Despite its efficacy, the current clustering method utilizing the graph-based model overlooks the uncertainty associated with random walk access between nodes and the embedded structural information in the data. To address this gap, we present a novel Clustering method for Maximizing Decoding Information within graph-based models, named CMDI. CMDI innovatively incorporates two-dimensional structural information theory into the clustering process, consisting of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem, leveraging maximum decoding information to minimize uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets demonstrate that CMDI outperforms classical baseline methods, exhibiting a superior decoding information ratio (DI-R). Furthermore, CMDI showcases heightened efficiency, particularly when considering prior knowledge (PK). These findings underscore the effectiveness of CMDI in enhancing decoding information quality and computational efficiency, positioning it as a valuable tool in graph-based clustering analyses.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# ガウス混合モデルによるドメイン適応のための最適輸送

Optimal Transport for Domain Adaptation through Gaussian Mixture Models ( http://arxiv.org/abs/2403.13847v1 )

ライセンス: Link先を確認

Eduardo Fernandes Montesuma, Fred Maurice Ngolè Mboula, Antoine Souloumiac,

(参考訳) 本稿では,最適輸送による領域適応について検討する。本稿では,ガウス混合モデルを用いてデータ分布をモデル化する手法を提案する。この戦略により、等価な離散的な問題を通じて連続的な最適輸送を解くことができる。最適なトランスポートソリューションは、ソースとターゲットドメインの混合コンポーネントのマッチングを提供します。このマッチングから、ドメイン間でデータポイントをマッピングしたり、ソースドメインコンポーネントからターゲットドメインへラベルを転送したりできます。断層診断における2つの領域適応ベンチマークを用いて,本手法の最先端性能を示す。

In this paper, we explore domain adaptation through optimal transport. We propose a novel approach, where we model the data distributions through Gaussian mixture models. This strategy allows us to solve continuous optimal transport through an equivalent discrete problem. The optimal transport solution gives us a matching between source and target domain mixture components. From this matching, we can map data points between domains, or transfer the labels from the source domain components towards the target domain. We experiment with two domain adaptation benchmarks in fault diagnosis, showing that our methods have state-of-the-art performance.
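
To make the "equivalent discrete problem" concrete, the sketch below builds the cost matrix between source and target mixture components. It assumes axis-aligned (diagonal-covariance) Gaussians, for which the squared 2-Wasserstein distance has the closed form $\lVert m_a - m_b \rVert^2 + \lVert \sigma_a - \sigma_b \rVert^2$. Whether the paper uses diagonal covariances is an assumption here, and the function names are hypothetical. Solving the resulting discrete transport problem over the component weights is left out.

```python
import numpy as np

def gaussian_w2_squared(mean_a, std_a, mean_b, std_b):
    """Squared 2-Wasserstein distance between two diagonal-covariance Gaussians."""
    return float(np.sum((mean_a - mean_b) ** 2) + np.sum((std_a - std_b) ** 2))

def component_cost_matrix(means_s, stds_s, means_t, stds_t):
    """Pairwise transport costs between source and target mixture components."""
    cost = np.zeros((len(means_s), len(means_t)))
    for i in range(len(means_s)):
        for j in range(len(means_t)):
            cost[i, j] = gaussian_w2_squared(means_s[i], stds_s[i], means_t[j], stds_t[j])
    return cost

# Toy mixtures: two source components and one target component in 2-D.
means_s = np.array([[0.0, 0.0], [3.0, 4.0]])
stds_s = np.ones((2, 2))
means_t = np.array([[0.0, 0.0]])
stds_t = np.ones((1, 2))
print(component_cost_matrix(means_s, stds_s, means_t, stds_t))
```

A discrete transport plan computed on this matrix, with the mixture weights as marginals, pairs source components with target components, which is the matching the abstract refers to.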

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# 差分生成型かつ正確なルールリストの学習のための平滑感性

Smooth Sensitivity for Learning Differentially-Private yet Accurate Rule Lists ( http://arxiv.org/abs/2403.13848v1 )

ライセンス: Link先を確認

Timothée Ly, Julien Ferry, Marie-José Huguet, Sébastien Gambs, Ulrich Aivodji,

(参考訳) Differentially-private (DP) メカニズムは、結果として生じるモデルをプライバシリークから保護するために、機械学習アルゴリズムの設計に組み込むことができる。本稿では,Giniの不純物のスムーズな感度を確立し,それを利用してDPグリードルリストアルゴリズムを提案することによって,ルールリストモデルのトレードオフを改善することを目的とする。特に, 理論解析および実験結果から, 滑らかな感度を組み込んだDPルールリストは, グローバルな感度に基づく他のDPフレームワークを用いたモデルよりも精度が高いことが示された。

Differentially-private (DP) mechanisms can be embedded into the design of a machine learning algorithm to protect the resulting model against privacy leakage, although this often comes with a significant loss of accuracy. In this paper, we aim at improving this trade-off for rule list models by establishing the smooth sensitivity of the Gini impurity and leveraging it to propose a DP greedy rule list algorithm. In particular, our theoretical analysis and experimental results demonstrate that the DP rule list models integrating smooth sensitivity have higher accuracy than those using other DP frameworks based on global sensitivity.
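
For reference, the Gini impurity that the abstract builds on can be computed from class counts as $1 - \sum_k p_k^2$. The sketch below shows only this non-private quantity. The smooth-sensitivity bound and the noise mechanism derived in the paper are not reproduced here, and the function name is an assumption.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum_k p_k^2 of the samples captured by a rule."""
    total = sum(class_counts)
    if total == 0:
        return 0.0  # convention for an empty rule: no samples, no impurity
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# A rule capturing 5 positives and 5 negatives is maximally impure for two classes;
# a rule capturing only one class is pure.
print(gini_impurity([5, 5]), gini_impurity([10, 0]))
```

A greedy rule list learner would pick, at each step, the rule with the lowest impurity. A DP variant has to add noise calibrated to how much this score can change when one record changes.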

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# グラフ: グラフニューラルネットワークとグラフ生成

Graphs Unveiled: Graph Neural Networks and Graph Generation ( http://arxiv.org/abs/2403.13849v1 )

ライセンス: Link先を確認

László Kovács, Ali Jlidi,

(参考訳) 機械学習におけるホットトピックの1つは、GNNの分野である。グラフデータの複雑さは、既存の機械学習アルゴリズムに重大な課題を課している。近年,グラフデータに対する深層学習手法の拡張に関する研究が盛んに行われている。本稿では,グラフニューラルネットワーク(GNN)の概要を紹介する。様々な領域にわたるグラフニューラルネットワークの適用について論じる。最後に,GNNの高度な分野としてグラフ生成を提案する。

One of the hot topics in machine learning is the field of graph neural networks (GNNs). The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. This paper presents a survey, providing a comprehensive overview of GNNs. We discuss the applications of graph neural networks across various domains. Finally, we present an advanced field in GNNs: graph generation.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# 物理認識とパラメータ拡散誘導による時空間流体力学モデリング

Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance ( http://arxiv.org/abs/2403.13850v1 )

ライセンス: Link先を確認

Hao Wu, Fan Xu, Yifan Duan, Ziwei Niu, Weiyan Wang, Gaofeng Lu, Kun Wang, Yuxuan Liang, Yang Wang,

(参考訳) 本稿では,地球科学分野における時空間流体力学モデリングのための2段階のフレームワークST-PADを提案する。上流の段階では、時間的進化特性を持つベクトル量子化再構成モジュールを設計し、一般的な物理制約を導入することで、平衡パラメータ分布と弾力パラメータ分布を確保する。下流の段階では、パラメータを含む拡散確率ネットワークを用いて、様々な物理装置におけるパラメータの知覚によりモデルの一般化能力を高めながら、流体の高品質な将来状態を生成する。複数のベンチマークデータセットに対する大規模な実験により、ST-PADフレームワークの有効性とロバスト性が確認され、ST-PADは流体力学のモデリングと予測において、特に局所的な表現を効果的に取得し、OOD世代において大きな優位性を維持する上で、現在の主流モデルよりも優れていることを示した。

This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance. In the upstream stage, we design a vector quantization reconstruction module with temporal evolution characteristics, ensuring balanced and resilient parameter distribution by introducing general physical constraints. In the downstream stage, a diffusion probability network involving parameters is utilized to generate high-quality future states of fluids, while enhancing the model's generalization ability by perceiving parameters in various physical setups. Extensive experiments on multiple benchmark datasets have verified the effectiveness and robustness of the ST-PAD framework, which showcase that ST-PAD outperforms current mainstream models in fluid dynamics modeling and prediction, especially in effectively capturing local representations and maintaining significant advantages in OOD generations.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# ニューラルネットワークを用いた医療用デジタル双生児の制御

Control of Medical Digital Twins with Artificial Neural Networks ( http://arxiv.org/abs/2403.13851v1 )

ライセンス: Link先を確認

Lucas Böttcher, Luis L. Fonseca, Reinhard C. Laubenbacher,

(参考訳) パーソナライズドメディカルの目的は、患者固有の特徴に対する介入を調整することである。この目的のための重要な技術は、医療用デジタルツイン、ヒト生物学の計算モデルであり、患者固有のデータを時間とともに収集するパーソナライズされ、動的に更新することができる。免疫系のような人間の生物学の特定の側面は、微分方程式のような物理学に基づくモデルでは容易には捉えられない。代わりに、それらはしばしばマルチスケール、確率的、ハイブリッドである。これは、そのようなモデルに容易に適用できない既存のモデルベースの制御と最適化アプローチに挑戦する。自動微分法やニューラルネットワーク制御法の最近の進歩は、複雑な制御問題に対処する上で有望である。しかし、これらのアプローチの生体医療システムへの応用は、まだ初期段階にある。この研究は、医療用デジタルツインを制御する代替アプローチとして、動的インフォームドニューラルネットワークコントローラを導入している。この手法の第一のユースケースとして、バイオメディシンにおける多用途で一般的なモデリングプラットフォームであるエージェントベースモデルに焦点が当てられている。提案手法の有効性を実証し,2種類のエージェントモデルを用いた他の手法と比較した。ここで紹介される方法の関連性は、医療用デジタル双生児以外にも、他の複雑な力学系にも及んでいる。

The objective of personalized medicine is to tailor interventions to an individual patient's unique characteristics. A key technology for this purpose involves medical digital twins, computational models of human biology that can be personalized and dynamically updated to incorporate patient-specific data collected over time. Certain aspects of human biology, such as the immune system, are not easily captured with physics-based models, such as differential equations. Instead, they are often multi-scale, stochastic, and hybrid. This poses a challenge to existing model-based control and optimization approaches that cannot be readily applied to such models. Recent advances in automatic differentiation and neural-network control methods hold promise in addressing complex control problems. However, the application of these approaches to biomedical systems is still in its early stages. This work introduces dynamics-informed neural-network controllers as an alternative approach to control of medical digital twins. As a first use case for this method, the focus is on agent-based models, a versatile and increasingly common modeling platform in biomedicine. The effectiveness of the proposed neural-network control method is illustrated and benchmarked against other methods with two widely-used agent-based model types. The relevance of the method introduced here extends beyond medical digital twins to other complex dynamical systems.

翻訳日:2024-03-22 18:28:52 公開日:2024-03-18

# ChildCIフレームワーク:年齢検出のためのコンピュータインタラクションによる子どもの運動・認知発達の分析

ChildCI Framework: Analysis of Motor and Cognitive Development in Children-Computer Interaction for Age Detection ( http://arxiv.org/abs/2204.04236v3 )

ライセンス: Link先を確認

Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Julian Fierrez, Jaime Herreros-Rodriguez,

(参考訳) 本稿では、近年のChildCIフレームワークで提案されている様々なテストについて包括的分析を行い、子どもの神経運動と認知発達をよりよく理解する可能性を示し、e-Healthやe-Learningといった他の研究分野にも応用できる可能性を示した。特に,子どもがモバイル機器と対話する際の運動・認知的側面に関連する,100以上のグローバルな特徴のセットを提案し,その一部を文献から収集し,適応させた。さらに, 運動と認知行動に基づいて, 児童年齢群検出の課題に対する実験結果を含む, 特徴集合の頑健性と識別力について分析した。本研究では,2つの異なるシナリオを考察する。一単体テストのシナリオ及び ii) 複数テストシナリオ。 93%以上の正確性は、公開されているChildCIdb_v1データベース(18ヶ月から8歳までの400人以上)を用いて達成され、子どもの年齢とモバイルデバイスとのインタラクションの関連性が高いことが証明された。

This article presents a comprehensive analysis of the different tests proposed in the recent ChildCI framework, proving its potential for generating a better understanding of children's neuromotor and cognitive development over time, as well as their possible application in other research areas such as e-Health and e-Learning. In particular, we propose a set of over 100 global features related to motor and cognitive aspects of children's interaction with mobile devices, some of them collected and adapted from the literature. Furthermore, we analyse the robustness and discriminative power of the proposed feature set, including experimental results for the task of children's age group detection based on their motor and cognitive behaviours. Two different scenarios are considered in this study: i) single-test scenario, and ii) multiple-test scenario. Results over 93% accuracy are achieved using the publicly available ChildCIdb_v1 database (over 400 children from 18 months to 8 years old), proving the high correlation of children's age with the way they interact with mobile devices.

翻訳日:2024-03-21 23:26:53 公開日:2024-03-18

# HyperVQ:双曲空間におけるMLRに基づくベクトル量子化

HyperVQ: MLR-based Vector Quantization in Hyperbolic Space ( http://arxiv.org/abs/2403.13015v1 )

ライセンス: Link先を確認

Nabarun Goswami, Yusuke Mukuta, Tatsuya Harada,

(参考訳) トークン化されたデータを扱うモデルの成功は、特に非離散的なデータを含む視覚や聴覚タスクに適用する場合に、効果的なトークン化手法の需要が高まっている。最も一般的なトークン化手法の1つはベクトル量子化(VQ)である。典型的には、VQ変動オートコーダ(VQVAE)は、データのトークン化表現への変換を訓練する。しかしながら、VQVAEは再構築目的で訓練されているため、埋め込みがうまく切り離されているという制約はなく、差別的なタスクでそれらを使用する上で重要な側面である。近年,表現学習における双曲空間の利点を実証する研究がいくつかある。双曲空間は、指数的体積成長と階層的および構造化されたデータをモデル化する固有の能力により、コンパクトな潜在表現を誘導する。本研究では,ベクトル量子化(HyperVQ)における双曲空間の利用について検討し,VQVAEで使用されるユークリッドK平均クラスタリングとは対照的に,双曲多項ロジスティック回帰(MLR)問題としてVQ演算を定式化する。広範にわたる実験により,ハイパーVQは,識別的タスクにおいてVQより優れ,非常に不整合な潜在空間を学習しながら,再構成や生成作業において相容れない性能を示す。

The success of models operating on tokenized data has led to an increased demand for effective tokenization methods, particularly when applied to vision or auditory tasks, which inherently involve non-discrete data. One of the most popular tokenization methods is Vector Quantization (VQ), a key component of several recent state-of-the-art methods across various domains. Typically, a VQ Variational Autoencoder (VQVAE) is trained to transform data to and from its tokenized representation. However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks. Recently, several works have demonstrated the benefits of utilizing hyperbolic spaces for representation learning. Hyperbolic spaces induce compact latent representations due to their exponential volume growth and inherent ability to model hierarchical and structured data. In this work, we explore the use of hyperbolic spaces for vector quantization (HyperVQ), formulating the VQ operation as a hyperbolic Multinomial Logistic Regression (MLR) problem, in contrast to the Euclidean K-Means clustering used in VQVAE. Through extensive experiments, we demonstrate that HyperVQ performs comparably in reconstruction and generative tasks while outperforming VQ in discriminative tasks and learning a highly disentangled latent space.
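
The Euclidean baseline that the abstract contrasts with can be sketched in a few lines: each latent vector is replaced by its nearest codebook entry, and the entry's index is the token. This illustrates only the standard VQ lookup. The hyperbolic MLR formulation proposed in the paper is not shown, and the function name and toy values are assumptions.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (Euclidean distance)."""
    # Squared distances between every latent (N, D) and every code (K, D): shape (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = dists.argmin(axis=1)
    return tokens, codebook[tokens]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
latents = np.array([[0.1, 0.0], [0.9, 1.2]])
tokens, quantized = quantize(latents, codebook)
print(tokens, quantized)
```

Because the assignment depends only on distance to the codes, nothing in this lookup encourages the codes themselves to separate classes, which is the limitation the abstract points to.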

翻訳日:2024-03-21 21:08:57 公開日:2024-03-18

# Impart: 知覚不能で効果的なラベル付きバックドアアタック

Impart: An Imperceptible and Effective Label-Specific Backdoor Attack ( http://arxiv.org/abs/2403.13017v1 )

ライセンス: Link先を確認

Jingke Zhao, Zan Wang, Yongwei Wang, Lanjun Wang,

(参考訳) バックドア攻撃は、実際のセキュリティクリティカルなシナリオに深刻な脅威を課すことが示されている。以前の研究は高い攻撃の成功率を達成することができるが、実際には脅威を著しく減少させるような被害者モデルにアクセスするか、ステルスネスで視覚的に目立たせることが必要になる。さらに、異なる毒のサンプルが異なる標的ラベル(すなわちオール・ツー・オールのセッティング)を持つというシナリオにおいて、攻撃の成功率を改善する余地もある。本研究では,攻撃者が被害者モデルにアクセスできないシナリオにおいて,Impartという新たな非知覚的バックドアアタック・フレームワークを提案する。具体的には、オール・ツー・オール・セッティングの攻撃能力を高めるために、まずラベル固有の攻撃を提案する。そこで本研究では, イメージ特徴のターゲットラベルと一致した摂動を代理モデルにより生成する手法を提案する。このようにして、生成した有毒画像にターゲットクラスに関する知識を付加し、攻撃能力を著しく向上させる。

Backdoor attacks have been shown to pose severe threats to real security-critical scenarios. Although previous works can achieve high attack success rates, they either require access to victim models, which may significantly reduce their threat in practice, or produce visually noticeable triggers that undermine stealthiness. Besides, there is still room to improve the attack success rates in the scenario where different poisoned samples may have different target labels (a.k.a. the all-to-all setting). In this study, we propose a novel imperceptible backdoor attack framework, named Impart, in the scenario where the attacker has no access to the victim model. Specifically, in order to enhance the attack capability in the all-to-all setting, we first propose a label-specific attack. Different from previous works, which try to find an imperceptible pattern and add it to the source image as the poisoned image, we then propose to generate perturbations that align with the target label in the image feature by a surrogate model. In this way, the generated poisoned images are attached with knowledge about the target class, which significantly enhances the attack capability.

翻訳日:2024-03-21 21:08:57 公開日:2024-03-18

# 特異値分解による見えないバックドア攻撃

Invisible Backdoor Attack Through Singular Value Decomposition ( http://arxiv.org/abs/2403.13018v1 )

ライセンス: Link先を確認

Wenmin Chen, Xiaowei Xu,

(参考訳) さまざまなドメインでディープラーニングが広く適用されるようになると、そのセキュリティに対する懸念は大幅に高まっている。これらのうち、バックドア攻撃はディープニューラルネットワーク(DNN)に深刻なセキュリティ上の脅威をもたらす。近年、ニューラルネットワークに対するバックドア攻撃はますます洗練され、隠れた無許可の機能やトリガーを埋め込むことによってモデルのセキュリティと信頼性を損なうことを目的としている。トリガーを知覚しにくく、知覚できないものにするため、様々な目に見えないバックドア攻撃が提案されている。しかし,その多くが空間領域の視認性しか考慮していないため,近年の防衛手法による有害画像の検出が容易であり,これらの課題に対処するために,DEBAと呼ばれる目に見えないバックドア攻撃を提案する。 DEBAは、Singular Value Decomposition(SVD)の数学的特性を活用して、トレーニングフェーズ中に知覚できないバックドアをモデルに埋め込むことで、特定のトリガー条件下で事前に定義された悪意のある振る舞いを示す。具体的には、まず画像上でSVDを実行し、次に、トリガー画像のマイナーな特徴をクリーン画像の特徴に置き換え、それらをトリガーとして使用して、攻撃の有効性を保証する。画像全体に小さな特徴が散在しているため、清潔な画像の主要な特徴が保存され、清潔な画像とは視覚的に区別できない。広汎な実験的評価により, DEBAは高い知覚品質を維持し, 有毒画像に対する高い攻撃成功率を保ち, 極めて有効であることが示された。さらに, 既存の防衛対策におけるDEBAの性能評価を行い, これらの防衛対策の効果を著しく回避し, 抵抗することができることを示した。

With the widespread application of deep learning across various domains, concerns about its security have grown significantly. Among these, backdoor attacks pose a serious security threat to deep neural networks (DNNs). In recent years, backdoor attacks on neural networks have become increasingly sophisticated, aiming to compromise the security and trustworthiness of models by implanting hidden, unauthorized functionalities or triggers, leading to misleading predictions or behaviors. To make triggers imperceptible, various invisible backdoor attacks have been proposed. However, most of them only consider invisibility in the spatial domain, making it easy for recent defense methods to detect the generated poisoned images. To address these challenges, this paper proposes an invisible backdoor attack called DEBA. DEBA leverages the mathematical properties of Singular Value Decomposition (SVD) to embed imperceptible backdoors into models during the training phase, thereby causing them to exhibit predefined malicious behavior under specific trigger conditions. Specifically, we first perform SVD on images, and then replace the minor features of trigger images with those of clean images, using them as triggers to ensure the effectiveness of the attack. As minor features are scattered throughout the entire image, the major features of clean images are preserved, making poisoned images visually indistinguishable from clean ones. Extensive experimental evaluations demonstrate that DEBA is highly effective, maintaining high perceptual quality and a high attack success rate for poisoned images. Furthermore, we assess the performance of DEBA under existing defense measures, showing that it is robust and capable of significantly evading and resisting the effects of these defense measures.

翻訳日:2024-03-21 21:08:57 公開日:2024-03-18

# ASOP: クラウドベースのIoTサービスのための主権的でセキュアなデバイスオンボードプロトコル

ASOP: A Sovereign and Secure Device Onboarding Protocol for Cloud-based IoT Services ( http://arxiv.org/abs/2403.13020v1 )

ライセンス: Link先を確認

Khan Reaz, Gerhard Wunder,

(参考訳) 既存の高圧デバイス搭載プロセスは、IoT(Internet of Things)の約束と可能性を妨げる。様々なデバイスメーカーやワーキンググループによるいくつかの試みの後でも、広く採用されている標準ソリューションは実現しなかった。 Fast Identity Online (FIDO) Allianceによる最新の試みでは、マスマーケットIoT顧客のためのゼロタッチソリューションが約束されているが、その負担は中間サプライチェーン(すなわち、すべてのデバイスに対して‘Ownership Voucher’と呼ばれるキーとデジタルシグネチャを管理するためのインフラストラクチャを維持する必要がある)に転送される。この仕様はドメイン名システム(DNS)サーバーの概念を模倣した 'Rendezvous Server' に依存している。これは本質的には、Denial of Service(DoS)攻撃や相関攻撃を含む、DNSに関連する既存の攻撃シナリオを復活させることを意味する。 Ownership Voucherは、一部の中間サプライチェーンエージェントが悪意を持って行動し、所有権の移転を拒絶したり、間違ったキーで署名したりするリスクを生じさせる。さらに、この仕様における弱い楕円曲線SECP256r1/SECP384r1(NIST P-256/384としても知られる)の故意な使用は、疑問を提起する。私たちは、デバイスメーカー、サプライチェーン、クラウドサービスプロバイダを盲目的に信頼することなく、IoTデバイス用の主権とセキュアなデバイスオンボードプロトコルであるASOPを紹介します。 ASOPプロトコルは、ユーザが所有する認証者の助けを借りて、IoTデバイスをクラウドサーバに搭載することを可能にする。本稿では,プロトコルの事前開発とその高レベル記述について概説する。我々の 'zero-trust' と ' Human-in-the-loop' アプローチは、デバイス所有者がサードパーティのインフラストラクチャの恩恵を受けていないことを保証し、最近標準化されたポスト量子暗号スイート (CRYSTALS) を使用してコネクションとメッセージを保護する。

The existing high-friction device onboarding process hinders the promise and potential of the Internet of Things (IoT). Even after several attempts by various device manufacturers and working groups, no widely adopted standard solution came to fruition. The latest attempt by the Fast Identity Online (FIDO) Alliance promises a zero touch solution for mass market IoT customers, but the burden is transferred to the intermediary supply chain (i.e. they have to maintain infrastructure for managing keys and digital signatures called `Ownership Voucher' for all devices). The specification relies on a `Rendezvous Server' mimicking the notion of a Domain Name System (DNS) server. This essentially means resurrecting all existing possible attack scenarios associated with DNS, which include Denial of Service (DoS) and correlation attacks. `Ownership Voucher' poses the risk that some intermediary supply chain agents may act maliciously and reject the transfer of ownership or sign with a wrong key. Furthermore, the deliberate use of the weak elliptic curve SECP256r1/SECP384r1 (also known as NIST P-256/384) in the specification raises questions. We introduce ASOP: a sovereign and secure device onboarding protocol for IoT devices without blindly trusting the device manufacturer, supply chain, and cloud service provider. The ASOP protocol allows onboarding an IoT device to a cloud server with the help of an authenticator owned by the user. This paper outlines the preliminary development of the protocol and its high-level description. Our `zero-trust' and `human-in-the-loop' approach guarantees that the device owner does not remain at the mercy of third-party infrastructures, and it utilises the recently standardized post-quantum cryptographic suite (CRYSTALS) to secure connection and messages.

翻訳日:2024-03-21 21:08:57 公開日:2024-03-18

# 説明可能なコンセプトドリフトでサイバーセキュリティ攻撃を防ぐ

Thwarting Cybersecurity Attacks with Explainable Concept Drift ( http://arxiv.org/abs/2403.13023v1 )

ライセンス: Link先を確認

Ibrahim Shaer, Abdallah Shami,

(参考訳) サイバーセキュリティ攻撃は、自律システムの運用に重大な脅威をもたらす。特に影響を受けているのは、スマートビルの暖房、換気、空調(HVAC)システムで、センサーが収集したデータと、キャプチャデータを使用した機械学習(ML)モデルに依存する。したがって、これらのセンサーの読み方を変える攻撃は、住民の快適性とエネルギー削減の目標に影響を与えるHVACシステムの運用に深刻な影響を与える可能性がある。このような攻撃は、MLモデルに供給されるオンラインデータ配布の変化を誘発し、トレーニングとデータ配布のテストにおける類似性の基本的な前提を侵害する可能性がある。これにより、概念ドリフト(CD)と呼ばれる現象によってモデル予測精度が低下し、入力特徴と対象変数の関係が変化する。 CDに対処するには、ターゲット緩和戦略を適用するためにドリフトの源を特定する必要がある。本稿では, ドリフト特徴を特定するための特徴ドリフト記述(FDE)モジュールを提案する。 FDEは自動エンコーダ(AE)を利用して回帰ディープラーニング(DL)モデルの第一層の活性化を再構築し、その潜在表現を見つける。ドリフトを検出すると、ドリフトデータの各特徴をトレーニングデータから代表データに置き換える。ミンコフスキー距離は、変化したドリフトデータと元のトレーニングデータとのばらつきを測定するために使用される。その結果,FDE はドリフト特性の85.77 % を同定し,CD 現象下での DL 適応法での有用性を示した。その結果、FDE法は、サイバーセキュリティ攻撃を阻止するための漂流の特徴を識別するための効果的な戦略である。

Cyber-security attacks pose a significant threat to the operation of autonomous systems. Particularly impacted are the Heating, Ventilation, and Air Conditioning (HVAC) systems in smart buildings, which depend on data gathered by sensors and Machine Learning (ML) models using the captured data. As such, attacks that alter the readings of these sensors can severely affect the HVAC system operations impacting residents' comfort and energy reduction goals. Such attacks may induce changes in the online data distribution being fed to the ML models, violating the fundamental assumption of similarity in training and testing data distribution. This leads to a degradation in model prediction accuracy due to a phenomenon known as Concept Drift (CD) - the alteration in the relationship between input features and the target variable. Addressing CD requires identifying the source of drift to apply targeted mitigation strategies, a process termed drift explanation. This paper proposes a Feature Drift Explanation (FDE) module to identify the drifting features. FDE utilizes an Auto-encoder (AE) that reconstructs the activation of the first layer of the regression Deep Learning (DL) model and finds their latent representations. When a drift is detected, each feature of the drifting data is replaced by its representative counterpart from the training data. The Minkowski distance is then used to measure the divergence between the altered drifting data and the original training data. The results show that FDE successfully identifies 85.77 % of drifting features and showcases its utility in the DL adaptation method under the CD phenomenon. As a result, the FDE method is an effective strategy for identifying drifting features towards thwarting cyber-security attacks.

翻訳日:2024-03-21 21:08:57 公開日:2024-03-18
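
The replace-and-measure step described in the FDE entry above can be sketched in a few lines of Python. This is a hedged illustration only, not the authors' implementation: it works on raw feature means rather than on auto-encoder latent representations, it assumes the training mean is the "representative counterpart" of each feature, and the names `minkowski_distance` and `rank_drifting_features` are hypothetical.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def column_means(rows):
    """Per-feature mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def rank_drifting_features(train_rows, drift_rows, p=2.0):
    """Rank features by how much replacing them shrinks the drift distance.

    Assumption: the training mean stands in for the representative
    training value, and distances are taken in raw feature space.
    """
    train_mean = column_means(train_rows)
    drift_mean = column_means(drift_rows)
    baseline = minkowski_distance(drift_mean, train_mean, p)
    scores = {}
    for j in range(len(train_mean)):
        repaired = list(drift_mean)
        repaired[j] = train_mean[j]  # swap in the training representative
        scores[j] = baseline - minkowski_distance(repaired, train_mean, p)
    # Largest reduction first: these features explain most of the drift.
    return sorted(scores, key=scores.get, reverse=True)
```

A feature whose replacement removes most of the distance to the training data is the most likely source of drift under these assumptions.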

# 見る、想像する、計画する:1枚の画像からのタスクの発見と幻覚化

See, Imagine, Plan: Discovering and Hallucinating Tasks from a Single Image ( http://arxiv.org/abs/2403.13438v1 )

ライセンス: Link先を確認

Chenyang Ma, Kai Lu, Ta-Ying Cheng, Niki Trigoni, Andrew Markham,

(参考訳) 人間は、現在の世界で世界を認識し、理解するだけでなく、すぐに知覚できる以上の将来のシナリオを思い描くことができる。この深い人間の能力に似て、ゼロショットのタスク幻覚を導入します -- 未知の環境やオブジェクトを含むシーンの1つのRGBイメージを考えると、私たちのモデルは潜在的なタスクを特定し、ビデオとして実現された鮮やかな物語の中でそれらの実行を想像できます。動的相互作用のためのVLMと物体軌道のための3次元モーションプランニングを組み込んだ,シーンの分解,理解,再構築を段階的に向上するモジュールパイプラインを開発した。我々のモデルは、機械と人間の両方が理解できる現実的で魅力的な視覚結果を示すタスクビデオによって、多様なタスクを発見できる。 Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/

Humans can not only recognize and understand the world in its current state but also envision future scenarios that extend beyond immediate perception. To resemble this profound human capacity, we introduce zero-shot task hallucination -- given a single RGB image of any scene comprising unknown environments and objects, our model can identify potential tasks and imagine their execution in a vivid narrative, realized as a video. We develop a modular pipeline that progressively enhances scene decomposition, comprehension, and reconstruction, incorporating VLM for dynamic interaction and 3D motion planning for object trajectories. Our model can discover diverse tasks, with the generated task videos demonstrating realistic and compelling visual outcomes that are understandable by both machines and humans. Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/

翻訳日:2024-03-21 17:28:32 公開日:2024-03-18

# 説明可能な自然言語処理のための局所的解釈:サーベイ

Local Interpretations for Explainable Natural Language Processing: A Survey ( http://arxiv.org/abs/2103.11072v3 )

ライセンス: Link先を確認

Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon,

(参考訳) 過去10年間で深層学習技術が様々な分野に普及するにつれて、ブラックボックスモデルの不透明性に対する不満が高まり、ディープラーニングモデルの透明性に焦点が当てられるようになった。本研究では,機械翻訳や感情分析など,自然言語処理(NLP)タスクにおけるディープニューラルネットワークの解釈可能性を改善するための様々な手法について検討する。本研究のはじめに,解釈可能性という用語の定義とその諸側面について,包括的に議論する。本調査で収集・要約された手法は,局所的な解釈にのみ関連しており,具体的には3つのカテゴリに分けられる。 1) 関連する入力特徴を通してモデルの予測を解釈すること。 2) 自然言語の説明による解釈 3)モデルと単語表現の隠された状態の探索。

As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are only associated with local interpretation and are specifically divided into three categories: 1) interpreting the model's predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.

翻訳日:2024-03-21 04:02:20 公開日:2024-03-18

# DCVNet:高速光フローのための拡張コストボリュームネットワーク

DCVNet: Dilated Cost Volume Networks for Fast Optical Flow ( http://arxiv.org/abs/2103.17271v2 )

ライセンス: Link先を確認

Huaizu Jiang, Erik Learned-Miller,

(参考訳) コストボリュームは、2つの入力画像間での対応の類似性を捉え、最先端の光学的流れのアプローチにおいて重要な要素である。コストボリュームを構築するために対応をサンプリングする場合、大きな近傍半径が大きな変位に対処するために必要であり、かなりの計算負担が伴う。この問題に対処するため、コストボリュームの粗から細への処理または再帰処理が通常採用され、小さな半径の局所的な近傍での対応サンプリングが十分である。本稿では,小型・大規模の変位を同時に捉えるために,異なる拡張係数を持つコストボリュームを構築した代替案を提案する。スキップ接続を有するU-Netを用いて、拡張コストのボリュームを、光学的フローを得るために、可能なすべての変位の間の補間重みに変換する。提案したモデルDCVNetは,単純なフィードフォワード方式で1回だけコストボリュームを処理し,シーケンシャルな処理戦略に依存しない。 DCVNetは、既存のアプローチに匹敵する精度を取得し、リアルタイム推論(ミッドエンドの1080ti GPUで30fps)を達成する。コードとモデルの重み付けは https://github.com/neu-vi/ezflow で確認できる。

The cost volume, capturing the similarity of possible correspondences across two input images, is a key ingredient in state-of-the-art optical flow approaches. When sampling correspondences to build the cost volume, a large neighborhood radius is required to deal with large displacements, introducing a significant computational burden. To address this, coarse-to-fine or recurrent processing of the cost volume is usually adopted, where correspondence sampling in a local neighborhood with a small radius suffices. In this paper, we propose an alternative by constructing cost volumes with different dilation factors to capture small and large displacements simultaneously. A U-Net with skip connections is employed to convert the dilated cost volumes into interpolation weights between all possible captured displacements to get the optical flow. Our proposed model DCVNet only needs to process the cost volume once in a simple feedforward manner and does not rely on the sequential processing strategy. DCVNet obtains comparable accuracy to existing approaches and achieves real-time inference (30 fps on a mid-end 1080ti GPU). The code and model weights are available at https://github.com/neu-vi/ezflow.

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
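
To make the idea of a dilated cost volume from the DCVNet entry above concrete, here is a minimal one-dimensional sketch in plain Python. It is an assumption-laden illustration rather than the DCVNet code: real optical flow uses 2-D feature maps and learned features, whereas this version takes two lists of feature vectors along a line, uses a dot product as the similarity, and pads out-of-range positions with zero. The function name `dilated_cost_volume` is hypothetical.

```python
def dilated_cost_volume(feat1, feat2, radius, dilation):
    """Correlate feat1[i] with feat2[i + k * dilation] for k in [-radius, radius].

    feat1, feat2: lists of equal-length feature vectors (one per position).
    Returns (offsets, volume) where volume[i][k] is the dot-product similarity
    at displacement offsets[k]; out-of-range positions score 0.0.
    """
    n = len(feat1)
    offsets = [k * dilation for k in range(-radius, radius + 1)]
    volume = []
    for i in range(n):
        row = []
        for off in offsets:
            j = i + off
            if 0 <= j < n:
                row.append(sum(a * b for a, b in zip(feat1[i], feat2[j])))
            else:
                row.append(0.0)  # zero padding outside the feature map
        volume.append(row)
    return offsets, volume


def multi_dilation_volumes(feat1, feat2, radius, dilations):
    """Build one cost volume per dilation factor, keyed by the factor."""
    return {d: dilated_cost_volume(feat1, feat2, radius, d) for d in dilations}
```

With a fixed radius, a larger dilation reaches larger displacements at the same number of samples, which is the trade-off the abstract describes.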

# 捕捉されたイオン量子コンピュータのための簡易Mølmer-Sørensenゲート

A simplified Mølmer-Sørensen gate for the trapped ion quantum computer ( http://arxiv.org/abs/2112.07855v4 )

ライセンス: Link先を確認

Hiroo Azuma,

(参考訳) トラップされたイオン量子コンピュータで使用されるMolmer-Sorensen(MS)ゲートの簡易化について論じる。元のMSゲートは、2つのイオンにバイクロマチックコヒーレント光電場を同時に照射することで実装されている。本稿では, 単色コヒーレント光電場を個別に照射することにより, 2つのイオンの分離可能な状態をベル状態の1つに変換する方法を提案し、この点が元のMSゲートに対する本手法の利点である。提案するゲートの実行時間の長さは,元のMSゲートの時間に匹敵し,数値計算により,提案ゲートはフォノンの熱ゆらぎに対する感度が低いことが示されている。絡み合いを発生できるが、熱ゆらぎに対して非常に脆弱な単純な2イオンゲートの別の例を示すことで、我々の簡易MSゲートが通常より際立っていることを示す。

We discuss how to simplify the Molmer-Sorensen (MS) gate which is used for the trapped ion quantum computer. The original MS gate is implemented by illuminating two ions with bichromatic coherent light fields separately at the same time. In this paper, we propose a method for transforming a separable state of two ions into one of the Bell states by illuminating the two ions with monochromatic coherent light fields individually and this point is the advantage of our scheme over the original MS gate. The length of the execution time of our proposed gate is comparable to that of the original MS gate, however, numerical calculations show that our proposed gate is weakly sensitive to thermal fluctuations of the phonons. By giving another example of a simple two-ion gate that can generate entanglement but is strongly vulnerable to thermal fluctuations, we show that our simplified MS gate is more marked than usual.

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18

# テスト可能なHTML5 Canvas問題の分類

A Taxonomy of Testable HTML5 Canvas Issues ( http://arxiv.org/abs/2201.07351v5 )

ライセンス: Link先を確認

Finlay Macklon, Markos Viggiato, Natalia Romanova, Chris Buzon, Dale Paas, Cor-Paul Bezemer,

(参考訳) HTML5<canvas>は、Webアプリケーションで高品質なグラフィックを表示するために広く使われている。しかし、<canvas>アプリケーションを構築するのに必要なWeb、GUI、ビジュアルテクニックの組み合わせは、テストやデバッグツールの欠如とともに、そのようなアプリケーションの開発を非常に困難にしています。本稿では,テスト可能な<canvas>問題の分類について述べる。まず,HTML5<canvas>を使用する123のオープンソースプロジェクトから,2,403件の<canvas>関連イシューレポートを抽出した。第2に,無作為な332件の報告を手作業で分類することで分類を構築した。手動分類では、視覚やパフォーマンスの問題など、テスト可能な<canvas>問題の5つの幅広いカテゴリを特定しました。視覚的な問題は最も頻繁(35%)であり、パフォーマンス上の問題は比較的稀(5%)であることがわかった。また、テスト可能な<canvas>問題の多くが、実際にはWebアプリケーションの他のコンポーネントによって引き起こされていることもわかりました。テスト可能な<canvas>問題の分類は,<canvas>問題とテストの今後の研究に有効である。

The HTML5 <canvas> is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build <canvas> applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing <canvas> applications, in this paper we present a taxonomy of testable <canvas> issues. First, we extracted 2,403 <canvas>-related issue reports from 123 open-source GitHub projects that use the HTML5 <canvas>. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable <canvas> issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable <canvas> issues that present themselves visually on the <canvas> are actually caused by other components of the web application. Our taxonomy of testable <canvas> issues can be used to steer future research into <canvas> issues and testing.

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18

# 密度行列を用いた量子密度推定:量子異常検出への応用

Quantum density estimation with density matrices: Application to quantum anomaly detection ( http://arxiv.org/abs/2201.10006v5 )

ライセンス: Link先を確認

Diego H. Useche, Oscar A. Bustos-Brinez, Joseph A. Gallego-Mejia, Fabio A. González,

(参考訳) 密度推定は統計学と機械学習の中心的なタスクである。この問題は、観測されたデータセットに最もよく適合する基礎となる確率密度関数を決定することを目的としている。応用例としては、統計的推論、教師なし学習、異常検出などがある。その関連性にもかかわらず、量子コンピューティングの密度推定への応用を探求する研究はほとんどない。本稿では,Q-DEMDEと呼ばれる新しい量子古典密度行列密度推定モデルを提案する。量子ハードウェアを用いて、混合量子状態によるトレーニングデータの確率分布を構築する。量子コンピュータ上でのスペクトル分解から混合密度行列の期待値を推定するアルゴリズムを提案する。さらに,本手法の量子古典的異常検出への応用について述べる。我々は、量子シミュレータと実量子コンピュータ上の異なるデータセット上で、量子ランダムおよび量子適応フーリエ特徴を用いた密度推定モデルの評価を行った。この研究の重要な成果は、現在の量子コンピュータ上で高い性能で密度推定と異常検出を行うことが可能であることを示すことである。

Density estimation is a central task in statistics and machine learning. This problem aims to determine the underlying probability density function that best aligns with an observed data set. Some of its applications include statistical inference, unsupervised learning, and anomaly detection. Despite its relevance, few works have explored the application of quantum computing to density estimation. In this article, we present a novel quantum-classical density matrix density estimation model, called Q-DEMDE, based on the expected values of density matrices and a novel quantum embedding called quantum Fourier features. The method uses quantum hardware to build probability distributions of training data via mixed quantum states. As a core subroutine, we propose a new algorithm to estimate the expected value of a mixed density matrix from its spectral decomposition on a quantum computer. In addition, we present an application of the method for quantum-classical anomaly detection. We evaluated the density estimation model with quantum random and quantum adaptive Fourier features on different data sets on a quantum simulator and a real quantum computer. An important result of this work is to show that it is possible to perform density estimation and anomaly detection with high performance on present-day quantum computers.

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
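
The density-matrix idea in the Q-DEMDE entry above has a purely classical analogue that may help intuition. The sketch below is not the quantum algorithm and makes several assumptions: it uses classical random Fourier features for a Gaussian kernel with bandwidth parameter `gamma`, builds the averaged outer-product matrix explicitly, and returns an unnormalised score rather than a calibrated density. All function names are hypothetical.

```python
import math
import random


def make_rff(dim_in, n_features, gamma, seed=0):
    """Random Fourier feature map approximating exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the Gaussian spectral samples
    weights = [[rng.gauss(0.0, scale) for _ in range(dim_in)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def phi(x):
        return [norm * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return phi


def fit_density_matrix(train, phi):
    """Average of outer products phi(x) phi(x)^T over the training set."""
    feats = [phi(x) for x in train]
    d = len(feats[0])
    rho = [[0.0] * d for _ in range(d)]
    for f in feats:
        for a in range(d):
            for b in range(d):
                rho[a][b] += f[a] * f[b] / len(feats)
    return rho


def density_score(x, rho, phi):
    """Unnormalised density estimate phi(x)^T rho phi(x)."""
    f = phi(x)
    d = len(f)
    return sum(f[a] * rho[a][b] * f[b] for a in range(d) for b in range(d))
```

For anomaly detection under these assumptions, a point would be flagged when its score falls below a threshold chosen on held-out data.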

# 効率的な推論のための多段視覚変換器

Multi-Tailed Vision Transformer for Efficient Inference ( http://arxiv.org/abs/2203.01587v3 )

ライセンス: Link先を確認

Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu,

(参考訳) 近年、視覚変換器(ViT)は画像認識において有望な性能を達成し、様々な視覚タスクにおいて、徐々に強力なバックボーンとして機能している。 Transformerのシーケンシャル入力を満たすために、ViTのテールはまず各画像を一定長さの視覚トークンのシーケンスに分割する。次に、以下の自己注意層がトークン間のグローバルな関係を構築し、下流のタスクに有用な表現を生成する。実証的には、より多くのトークンで画像を表現することでパフォーマンスが向上するが、トークンの数に対する自己認識層の2次計算の複雑さは、ViTの推論の効率に深刻な影響を及ぼす可能性がある。計算量削減のために、トランスフォーマーエンコーダで不定形トークンを段階的にプルーニングする手法がいくつかあるが、トランスフォーマーが触れない前にトークンの数を残している。実際、Transformerエンコーダの入力として、以下の計算コストを直接削減できるトークンが少ない。本稿では,MT-ViT(Multi-Tailed Vision Transformer)を提案する。 MT-ViTは、以下のTransformerエンコーダのために異なる長さの視覚シーケンスを生成するために複数のテールを採用する。テール予測器を導入し、どのテールが最も効率的に正確な予測を行うかを決定する。どちらのモジュールも、Gumbel-Softmaxのトリックでエンドツーエンドで最適化されている。 ImageNet-1Kの実験では、MT-ViTは精度を低下させることなくFLOPの大幅な削減を実現し、他の比較手法よりも精度とFLOPの両面で優れていた。

Recently, Vision Transformer (ViT) has achieved promising performance in image recognition and gradually serves as a powerful backbone in various vision tasks. To satisfy the sequential input of Transformer, the tail of ViT first splits each image into a sequence of visual tokens with a fixed length. Then the following self-attention layers construct the global relationship between tokens to produce useful representations for the downstream tasks. Empirically, representing the image with more tokens leads to better performance, yet the quadratic computational complexity of self-attention layers with respect to the number of tokens could seriously influence the efficiency of ViT's inference. For computational reduction, a few pruning methods progressively prune uninformative tokens in the Transformer encoder, while leaving the number of tokens before the Transformer untouched. In fact, fewer tokens as the input for the Transformer encoder can directly reduce the following computational cost. In this spirit, we propose a Multi-Tailed Vision Transformer (MT-ViT) in this paper. MT-ViT adopts multiple tails to produce visual sequences of different lengths for the following Transformer encoder. A tail predictor is introduced to decide which tail is the most efficient for the image to produce accurate prediction. Both modules are optimized in an end-to-end fashion, with the Gumbel-Softmax trick. Experiments on ImageNet-1K demonstrate that MT-ViT can achieve a significant reduction in FLOPs with no degradation of accuracy and outperform other compared methods in both accuracy and FLOPs.

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18
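
The MT-ViT entry above mentions the Gumbel-Softmax trick for choosing a tail. The following plain-Python sketch shows only the forward sampling step, under stated assumptions: the predictor's output is given as a list of logits, `tau` is the temperature, and the names `gumbel_softmax` and `pick_tail` are hypothetical. Gradient handling (for example a straight-through estimator in a deep learning framework) is deliberately left out.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed one-hot vector from logits with Gumbel noise."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def pick_tail(logits, tau=1.0, rng=None):
    """Return (index of the selected tail, relaxed probabilities)."""
    probs = gumbel_softmax(logits, tau, rng)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs
```

Lower temperatures push the relaxed vector closer to one-hot, while higher temperatures keep it smoother and easier to optimise.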

# インタラクション・レプリカ:人間と物体の相互作用の追跡と人間の動きからのシーン変化

Interaction Replica: Tracking Human-Object Interaction and Scene Changes From Human Motion ( http://arxiv.org/abs/2205.02830v4 )

ライセンス: Link先を確認

Vladimir Guzov, Julian Chibane, Riccardo Marin, Yannan He, Yunus Saracoglu, Torsten Sattler, Gerard Pons-Moll,

(参考訳) 私たちの世界は静的ではなく、人間は自然に環境の変化を引き起こします。人間によって引き起こされる変化をモデル化することは、デジタルツイン、例えば、共有物理仮想空間(メタバース)とロボット工学の文脈で構築するために不可欠である。このような新興アプリケーションを広く採用するためには、対話を捉えるためのセンサーのセットアップは、エキスパートでないユーザにとって安価で使いやすいものにする必要がある。すなわち、対話は、外部カメラやオブジェクトトラッカーに頼らず、カメラとIMUセンサーの組み合わせのような単純なエゴ中心のセンサーによってキャプチャされ、モデル化されるべきである。しかし、私たちの知る限りでは、このようなエゴ中心のセンサー設定を通じて人間とシーンのインタラクションをモデル化する難しい問題に対処する作業は存在しない。本稿は、シーンにおける人間の視覚的位置決めと、IMUデータからの人間とシーンの相互作用に関する接触に基づく推論を組み合わせることによって、文学におけるこのギャップを埋める。興味深いことに、インタラクションの視覚的な観察がなくても、人間とシーンの接触や相互作用が人間のポーズシーケンスから現実的に予測できることが示される。我々の手法であるiReplica(Interaction Replica)は,没入型仮想空間における将来のAR/VR応用や,人間のように振る舞うためのトレーニングマシンに必要な,人間との相互作用の自我中心的なキャプチャと動的シーンのモデリングに向けた重要な第一歩である。私たちのコード、データ、モデルはプロジェクトのページ(http://virtualhumans.mpi-inf.mpg.de/ireplica/)で公開されています。

Our world is not static and humans naturally cause changes in their environments through interactions, e.g., opening doors or moving furniture. Modeling changes caused by humans is essential for building digital twins, e.g., in the context of shared physical-virtual spaces (metaverses) and robotics. In order for widespread adoption of such emerging applications, the sensor setup used to capture the interactions needs to be inexpensive and easy-to-use for non-expert users. I.e., interactions should be captured and modeled by simple ego-centric sensors such as a combination of cameras and IMU sensors, not relying on any external cameras or object trackers. Yet, to the best of our knowledge, no work tackling the challenging problem of modeling human-scene interactions via such an ego-centric sensor setup exists. This paper closes this gap in the literature by developing a novel approach that combines visual localization of humans in the scene with contact-based reasoning about human-scene interactions from IMU data. Interestingly, we can show that even without visual observations of the interactions, human-scene contacts and interactions can be realistically predicted from human pose sequences. Our method, iReplica (Interaction Replica), is an essential first step towards the egocentric capture of human interactions and modeling of dynamic scenes, which is required for future AR/VR applications in immersive virtual universes and for training machines to behave like humans. Our code, data and model are available on our project page at http://virtualhumans.mpi-inf.mpg.de/ireplica/

翻訳日:2024-03-21 02:10:44 公開日:2024-03-18

# ランダムな直交分解と深層学習によるディジタル双対データモデリング

Digital Twin Data Modelling by Randomized Orthogonal Decomposition and Deep Learning ( http://arxiv.org/abs/2206.08659v2 )

ライセンス: Link先を確認

Diana Alina Bistrian, Omer San, Ionel Michael Navon,

(参考訳) デジタルツインは、元のプロセスの振る舞いを反映する主な特徴を持つ代理モデルである。複雑性を低減したデジタルツインモデルと動的処理を関連付けることは、CPU時間とハードウェアのコストを削減した精度で動的処理をタイムスケールにマッピングする上で、大きな利点となる。本稿では,流体の効率的なディジタル双対モデルを作成するための新しい枠組みを提案する。我々は、Krylovに基づく動的モード分解と適切な直交分解を組み合わせ、最も影響力のあるモードの選択を上回る新しいアルゴリズムを提案する。我々は,SVD経験的直交分解法に対してランダム化された直交分解アルゴリズムがいくつかの利点を与え,多目的最適化問題の射影誤差を軽減できることを証明した。我々は,ディジタル双対モデルのリアルタイム適応キャリブレーションを行うために,最先端の人工知能ディープラーニング(DL)を巻き込み,忠実度を増大させる。出力は流体力学の高忠実なデジタルTWINデータモデルであり、複雑さの低減の利点がある。複雑化を伴う3つの波動現象の数値シミュレーションにおいて,新しいモデリングツールについて検討した。本研究は,時間シミュレーション応答特性研究を含む数値的精度と計算効率の観点から,新たなディジタルツインデータモデルの性能を徹底的に評価する。

A digital twin is a surrogate model whose main feature is to mirror the behavior of the original process. Associating the dynamical process with a digital twin model of reduced complexity has the significant advantage of mapping the dynamics with high accuracy and reduced costs in CPU time and hardware to timescales over which the process changes significantly and is therefore difficult to explore. This paper introduces a new framework for creating efficient digital twin models of fluid flows. We introduce a novel algorithm that combines the advantages of Krylov based dynamic mode decomposition with proper orthogonal decomposition and outperforms the selection of the most influential modes. We prove that the randomized orthogonal decomposition algorithm provides several advantages over SVD empirical orthogonal decomposition methods and mitigates the projection error by formulating a multiobjective optimization problem. We employ state-of-the-art Deep Learning (DL) to perform a real-time adaptive calibration of the digital twin model, with increasing fidelity. The output is a high-fidelity digital twin data model of the fluid flow dynamics, with the advantage of reduced complexity. The new modelling tools are investigated in the numerical simulation of three wave phenomena with increasing complexity. We show that the outputs are consistent with the original source data. We perform a thorough assessment of the performance of the new digital twin data models, in terms of numerical accuracy and computational efficiency, including a time simulation response feature study.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# 多層ケークのフェアディビジョン

Fair Division of Multi-layered Cakes ( http://arxiv.org/abs/2208.00726v2 )

ライセンス: Link先を確認

Mohammad Azharuddin Sanpui,

(参考訳) 複数層ケーキの切断について検討し, 連続性と実現可能性という2つの制約の下で, エージェント群に多数の分割可能な資源(ケーキ層)を適切に割り当てる方法について検討した。まず,'a pair of knives' と呼ばれる多層ケーキに新しい計算モデルを導入する。そして,新しい計算モデルを用いて,2つのエージェントと2つのレイヤに対して,正確なマルチアロケーションが存在することを示す。本研究では,3個以上のエージェントに対して,3層ケーキ上に有意かつ連続的な比例多重配置の計算手順を実証する。最後に,任意の数$n\geq 2^a3$のエージェントと2^a3$のレイヤに対して,$a$が任意の正の整数であるような比例割当を計算する手法を開発した。

We consider multi-layered cake cutting in order to fairly allocate numerous divisible resources (layers of cake) among a group of agents under two constraints: contiguity and feasibility. We first introduce a new computational model in a multi-layered cake named ``a pair of knives''. Then, we show the existence of an exact multi-allocation for two agents and two layers using the new computational model. We demonstrate the computation procedure of a feasible and contiguous proportional multi-allocation over a three-layered cake for more than three agents. Finally, we develop a technique for computing proportional allocations for any number $n\geq 2^a3$ of agents and $2^a3$ layers, where $a$ is any positive integer.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# マルチビヘイビア勧告における公正のための因果介入

Causal Intervention for Fairness in Multi-behavior Recommendation ( http://arxiv.org/abs/2209.04589v2 )

ライセンス: Link先を確認

Xi Wang, Wenjie Wang, Wenge Rong, Fuli Feng, Chuantao Yin, Zhang Xiong,

(参考訳) レコメンダシステムは、クリックやクリック後の動作(例えば、いいね! しかし、これらの行動は必然的に人気バイアスを示し、不公平な問題を引き起こします。 1) 類似品質の品目については、より人気の高い品目が露出しやすくなり、 2) 人気度が低い人気商品の方が露出が大きくなる可能性がある。人気バイアスを緩和する既存の作業は、偏見を盲目的に排除し、アイテムの品質の影響を無視する。異なるユーザ行動(例えば変換率)の関係は、実際にはアイテムの品質を反映している、と我々は主張する。そこで本稿では,不公平な問題に対処するため,複数のユーザの行動を考慮した人気バイアスを軽減することを提案する。本研究では,多行動レコメンデーションにおけるインタラクション生成手法の背景にある因果関係について検討する。特に、私たちはこう発見しています。 1)アイテムの人気は、露出したアイテムとユーザーのクリック後のインタラクションの共創者であり、最初の不公平につながる。 2) 隠れた共同設立者(例えば、商品生産者の評判)は、商品の人気と品質の両方に影響を与え、2番目の不公平をもたらす。これらの問題点を解消するため,共同設立者によるバックドア経路の抑制にバックドア調整を利用する因果効果を推定する因果枠組みを提案する。推論段階では、人気のネガティブな効果を排除し、品質のよい効果を推薦に活用する。 2つの実世界のデータセット実験により,提案手法の有効性が検証された。

Recommender systems usually learn user interests from various user behaviors, including clicks and post-click behaviors (e.g., like and favorite). However, these behaviors inevitably exhibit popularity bias, leading to some unfairness issues: 1) for items with similar quality, more popular ones get more exposure; and 2) even worse, popular items with lower quality might receive more exposure. Existing work on mitigating popularity bias blindly eliminates the bias and usually ignores the effect of item quality. We argue that the relationships between different user behaviors (e.g., conversion rate) actually reflect the item quality. Therefore, to handle the unfairness issues, we propose to mitigate the popularity bias by considering multiple user behaviors. In this work, we examine causal relationships behind the interaction generation procedure in multi-behavior recommendation. Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, leading to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness. To alleviate these confounding issues, we propose a causal framework to estimate the causal effect, which leverages backdoor adjustment to block the backdoor paths caused by the confounders. In the inference stage, we remove the negative effect of popularity and utilize the good effect of quality for recommendation. Experiments on two real-world datasets validate the effectiveness of our proposed framework, which enhances fairness without sacrificing recommendation accuracy.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18
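
For readers unfamiliar with the backdoor adjustment mentioned in the entry above, the standard textbook form is given below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $X$ stands for the treatment (for example, item exposure), $Y$ for the outcome (for example, a post-click interaction), and $Z$ for a confounder such as item popularity that satisfies the backdoor criterion.

$$
P\bigl(Y \mid \mathrm{do}(X = x)\bigr) \;=\; \sum_{z} P\bigl(Y \mid X = x,\, Z = z\bigr)\, P(Z = z)
$$

Averaging over the marginal $P(Z = z)$ instead of the conditional $P(Z = z \mid X = x)$ is what blocks the backdoor path through $Z$.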

# ネットワークフローのグラフニューラルモデリング

Graph Neural Modeling of Network Flows ( http://arxiv.org/abs/2209.05208v3 )

ライセンス: Link先を確認

Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi,

(参考訳) 基盤となるインフラが効果的に利用されるようにトラフィックを分散するネットワークフロー問題は、輸送や物流においてユビキタスである。その中でも, 汎用マルチコモディティ・ネットワーク・フロー(MCNF)問題は, リンクの有効利用を実現しつつ, 複数のソースとシンク間の異なるサイズの複数のフローの分散を懸念している。データ駆動最適化の魅力により、これらの問題はグラフ学習法によってますますアプローチされてきた。本稿では,Per-Edge Weights (PEW) と呼ばれるネットワークフロー問題に対する新しいグラフ学習アーキテクチャを提案する。この方法はグラフアテンションネットワーク上に構築され、各リンクに沿って明確にパラメトリケートされたメッセージ関数を使用する。提案手法を,サービスプロバイダの17ドルのトポロジと2ドルのルーティングスキームを用いて,インターネットフロールーティングケーススタディを通じて広く評価した。本稿では,グローバルメッセージ関数がルーティングを不必要に制約するアーキテクチャに対して,PEWが実質的な利得が得られることを示す。 MLPが他の標準アーキテクチャと競合していることもわかっています。さらに,データ駆動型フロールーティングにおけるグラフ構造と予測性能の関係を解析する。

Network flow problems, which involve distributing traffic such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Among them, the general Multi-Commodity Network Flow (MCNF) problem concerns the distribution of multiple flows of different sizes between several sources and sinks, while achieving effective utilization of the links. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. In this paper, we propose a novel graph learning architecture for network flow problems called Per-Edge Weights (PEW). This method builds on a Graph Attention Network and uses distinctly parametrized message functions along each link. We extensively evaluate the proposed solution through an Internet flow routing case study using $17$ Service Provider topologies and $2$ routing schemes. We show that PEW yields substantial gains over architectures whose global message function constrains the routing unnecessarily. We also find that an MLP is competitive with other standard architectures. Furthermore, we analyze the relationship between graph structure and predictive performance for data-driven routing of flows, an aspect that has not been considered by existing work in the area.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# 機械学習を用いた予測不確実性推定のレビュー

A review of predictive uncertainty estimation with machine learning ( http://arxiv.org/abs/2209.08307v2 )

ライセンス: Link先を確認

Hristos Tyralis, Georgia Papacharalampous,

(参考訳) 機械学習モデルの予測と予測は、エンドユーザーに伝達される情報の量を増やすことを目的として、確率分布の形式をとるべきである。学術・産業における確率的予測と機械学習モデルによる予測の応用は、ますます頻繁になってきているが、関連する概念や手法は、全分野の全体観の下で形式化され、構造化されていない。本稿では,機械学習アルゴリズムによる予測不確実性推定の話題と,確率的予測を評価するための関連する指標(一貫性スコアリング関数と適切なスコアリングルール)について概説する。このレビューでは、最近の機械学習アルゴリズム(位置、スケール、形状、ランダムな森林、強化とディープラーニングアルゴリズムの一般化された付加的モデルを含む)への早期統計(ベイズ統計または量子回帰に基づく線形回帰と時系列モデル)の導入から、自然により柔軟である期間をカバーしている。最新の進歩は、より複雑なアルゴリズムに適用された基本的な概念に基づいているため、ユーザのニーズに合わせて新しいアルゴリズムを開発する方法についての理解を深める。材料を分類し、研究のホットトピックとなっている課題について議論することで、結論付けます。

Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical models (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. Reviewing the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming hot topics of research.
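To make the notion of a consistent scoring function concrete, the sketch below implements the quantile (pinball) loss, a standard scoring function that is consistent for a quantile. It is not taken from the review; the function names `pinball_loss` and `best_constant_quantile` and the toy data in the usage note are illustrative assumptions.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (y > q) is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y: Sequence[float], tau: float) -> float:
    """Return the observed value that minimizes the pinball loss as a constant forecast."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))
```

For the observations 1, 2, ..., 10, the constant forecast that minimizes the loss at `tau = 0.5` is a median (5 or 6), and at `tau = 0.9` it moves to the upper tail (9 or 10). This is the sense in which the loss rewards forecasting the correct quantile.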

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# ヒューマンAI意思決定における説明・公正・適切な信頼

Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making ( http://arxiv.org/abs/2209.11812v5 )

ライセンス: Link先を確認

Jakob Schoeffer, Maria De-Arteaga, Niklas Kuehl,

(参考訳) 本研究では,特徴に基づく説明がAIによる意思決定の分配的公正性に及ぼす影響について検討する。また、人間の公正感とAIレコメンデーションへの依存によって、どのような効果が媒介されるかについても検討する。以上の結果から,説明が公正感に影響を及ぼし,人間によるAI推奨に固執する傾向が示唆された。しかし、このような説明は、人間が正しいAIレコメンデーションと不正なAIレコメンデーションを識別することができない。代わりに、AIレコメンデーションの正確性に関わらず、それらが依存に影響を与える可能性があることを示す。説明がタスク非関連で、センシティブな属性と明らかに関連している特徴を強調している場合、これは、性別のステレオタイプに合わせてAIの推奨に反するオーバーライドを引き起こす。一方、説明がタスク関連性を示す場合、これはステレオタイプ整列エラーを強化する信頼行動を引き起こす。これらの結果は、特徴に基づく説明は分配的公正性を改善するための信頼性のあるメカニズムではないことを示唆している。

In this work, we study the effects of feature-based explanations on distributive fairness of AI-assisted decisions, specifically focusing on the task of predicting occupations from short textual bios. We also investigate how any effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, we see that such explanations do not enable humans to discern correct and incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations that align with gender stereotypes. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism to improve distributive fairness.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# 局所的・地域的な反実仮想ルール:要約された頑健なリコース

Local and Regional Counterfactual Rules: Summarized and Robust Recourses ( http://arxiv.org/abs/2209.14568v3 )

ライセンス: Link先を確認

Salim I. Amoukou, Nicolas J. B Brunel,

(参考訳) Counterfactual Explanations (CE)は、安定性の確保、複数のCEの合成、信頼性とスパーシリティの保証など、いくつかの未解決課題に直面している。より実践的な視点から見ると、最近の研究[Pawelczyk et al , 2022] は、規定された反ファクト・リコースが個人によって正しく実施されていないことを示し、ほとんどの最先端のCEアルゴリズムが、このノイズの多い環境で失敗する可能性が非常に高いことを示した。これらの問題に対処するため,各観測値に対して局所的反実律を緩やかに付与し,高い確率で決定を変更できる範囲の値を与える確率的枠組みを提案する。これらの規則は、様々な反事実的説明の要約として機能し、堅牢な論説をもたらす。さらに、これらの局所ルールを地域反事実ルールに集約し、データのサブグループに対する共有リコースを識別する。我々の地域・地域ルールはRandom Forestアルゴリズムから導かれており、高密度領域のレコースを選択することにより、データ分布に対する統計的保証と忠実度を提供する。さらに、我々のルールは、まず、決定を変更できる確率の高い最小の変数群を選択するため、疎い。我々は, 標準CEと最近の同様の試みと比較して, 対実ルールの有効性を検証する実験を行った。当社のメソッドはPythonパッケージとして利用可能です。

Counterfactual Explanations (CE) face several unresolved challenges, such as ensuring stability, synthesizing multiple CEs, and providing plausibility and sparsity guarantees. From a more practical point of view, recent studies [Pawelczyk et al., 2022] show that the prescribed counterfactual recourses are often not implemented exactly by individuals and demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation, providing rules that give a range of values capable of changing decisions with high probability. These rules serve as a summary of diverse counterfactual explanations and yield robust recourses. We further aggregate these local rules into a regional counterfactual rule, identifying shared recourses for subgroups of the data. Our local and regional rules are derived from the Random Forest algorithm, which offers statistical guarantees and fidelity to data distribution by selecting recourses in high-density regions. Moreover, our rules are sparse as we first select the smallest set of variables having a high probability of changing the decision. We have conducted experiments to validate the effectiveness of our counterfactual rules in comparison to standard CE and recent similar attempts. Our methods are available as a Python package.

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# ゼロ階ハードThresholding: Gradient Error vs. Expansivity

Zeroth-Order Hard-Thresholding: Gradient Error vs. Expansivity ( http://arxiv.org/abs/2210.05279v2 )

ライセンス: Link先を確認

William de Vazelhes, Hualin Zhang, Huimin Wu, Xiao-Tong Yuan, Bin Gu,

(参考訳) $\ell_0$制約付き最適化は、スパース学習を実現するための基本的なアプローチであるため、機械学習、特に高次元問題において広く用いられている。ハードしきい値処理(hard-thresholding)を用いた勾配降下法は、この問題を解く主要な手法である。しかし、実世界の多くの問題では目的関数の1階勾配が利用できないか、計算コストが高く、そのような場合にはゼロ階(ZO)勾配が良い代替となりうる。残念なことに、ZO勾配がハードしきい値演算子と併用できるかどうかは未解決の問題である。そこで本稿では、$\ell_0$制約付きブラックボックス確率最適化問題に焦点をあて、新しいランダムサポートサンプリングに基づく一般的なZO勾配推定器を用いた確率的ゼロ階勾配ハードしきい値法(SZOHT)を提案する。標準的な仮定の下でSZOHTの収束解析を行う。重要な点として、ZO推定器の偏差とハードしきい値演算子の拡大性(expansivity)との間の対立を明らかにし、ZO勾配におけるランダム方向数の理論的な最小値を与える。さらに、SZOHTのクエリ複雑性は、設定に応じて次元に依存しないか、弱くしか依存しないことを示す。最後に、ポートフォリオ最適化問題およびブラックボックス敵対的攻撃において本手法の有用性を示す。

$\ell_0$ constrained optimization is prevalent in machine learning, particularly for high-dimensional problems, because it is a fundamental approach to achieve sparse learning. Hard-thresholding gradient descent is a dominant technique to solve this problem. However, first-order gradients of the objective function may be either unavailable or expensive to calculate in a lot of real-world problems, where zeroth-order (ZO) gradients could be a good surrogate. Unfortunately, whether ZO gradients can work with the hard-thresholding operator is still an unsolved problem. To solve this puzzle, in this paper, we focus on the $\ell_0$ constrained black-box stochastic optimization problems, and propose a new stochastic zeroth-order gradient hard-thresholding (SZOHT) algorithm with a general ZO gradient estimator powered by a novel random support sampling. We provide the convergence analysis of SZOHT under standard assumptions. Importantly, we reveal a conflict between the deviation of ZO estimators and the expansivity of the hard-thresholding operator, and provide a theoretical minimal value of the number of random directions in ZO gradients. In addition, we find that the query complexity of SZOHT is independent or weakly dependent on the dimensionality under different settings. Finally, we illustrate the utility of our method on a portfolio optimization problem as well as black-box adversarial attacks.
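The two ingredients named in the abstract, a hard-thresholding operator and a zeroth-order gradient estimate, can be sketched as follows. This is a minimal illustration and not the authors' SZOHT algorithm: it uses a generic forward-difference estimator with directions drawn uniformly from the unit sphere instead of the paper's random support sampling, and the names `hard_threshold`, `zo_gradient` and `zo_hard_thresholding_descent`, as well as all parameter values, are assumptions made for the example.

```python
import math
import random
from typing import Callable, List

Objective = Callable[[List[float]], float]


def hard_threshold(x: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of x and set all others to zero."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def zo_gradient(f: Objective, x: List[float], num_dirs: int, mu: float,
                rng: random.Random) -> List[float]:
    """Forward-difference gradient estimate averaged over random unit directions.

    Only function values of f are queried; no analytic gradient is needed.
    """
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]  # uniform direction on the unit sphere
        slope = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        for i in range(d):
            # The factor d makes the estimate unbiased for the smoothed gradient.
            grad[i] += d * slope * u[i] / num_dirs
    return grad


def zo_hard_thresholding_descent(f: Objective, x0: List[float], k: int, step: float,
                                 num_dirs: int, mu: float, iters: int,
                                 seed: int = 0) -> List[float]:
    """Iterate x <- H_k(x - step * g_hat), where g_hat is a zeroth-order estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(f, x, num_dirs, mu, rng)
        x = hard_threshold([xi - step * gi for xi, gi in zip(x, g)], k)
    return x
```

The `num_dirs` argument is where the trade-off described in the abstract appears: fewer random directions give a noisier gradient estimate, and thresholding can then select the wrong support. In this sketch the sparsity level `k` is best chosen somewhat larger than the sparsity of the solution being sought.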

翻訳日:2024-03-21 02:00:54 公開日:2024-03-18

# Copula Conformal Prediction for Multi-step Time Series Forecasting

Copula Conformal Prediction for Multi-step Time Series Forecasting ( http://arxiv.org/abs/2212.03281v4 )

ライセンス: Link先を確認

Sophia Sun, Rose Yu,

(参考訳) 正確な不確実性測定は、堅牢で信頼性の高い機械学習システムを構築するための重要なステップである。コンフォーマル予測(Conformal prediction)は、その実装の容易さ、統計的カバレッジ保証、および基礎となる予測器の汎用性で人気のある、分布のない不確実性定量化アルゴリズムである。しかし、時系列に対する既存の共形予測アルゴリズムは、時間的依存を考慮せずに単段階予測に制限される。本稿では,多変量・多段階時系列予測のためのCopula Conformal Predictionアルゴリズム,CopulaCPTSを提案する。我々はCopulaCPTSが有限標本妥当性を保証することを証明した。いくつかの合成および実世界の多変量時系列データセットにおいて、CopulaCPTSは既存の手法よりも多段階予測タスクに対してより校正され、鋭い信頼区間を生成することを示す。

Accurate uncertainty measurement is a key step to building robust and reliable machine learning systems. Conformal prediction is a distribution-free uncertainty quantification algorithm popular for its ease of implementation, statistical coverage guarantees, and versatility for underlying forecasters. However, existing conformal prediction algorithms for time series are limited to single-step prediction without considering the temporal dependency. In this paper, we propose a Copula Conformal Prediction algorithm for multivariate, multi-step Time Series forecasting, CopulaCPTS. We prove that CopulaCPTS has a finite-sample validity guarantee. On several synthetic and real-world multivariate time series datasets, we show that CopulaCPTS produces more calibrated and sharper confidence intervals for multi-step prediction tasks than existing techniques.
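For readers unfamiliar with conformal prediction, the sketch below shows the basic single-output split conformal procedure that the abstract takes as its starting point. It is not CopulaCPTS: there is no copula and no multi-step handling, and the coverage guarantee assumes exchangeable calibration and test data. The function names are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points to certify this level: the interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(cal_true: Sequence[float], cal_pred: Sequence[float],
                             new_pred: float, alpha: float) -> Tuple[float, float]:
    """Prediction interval around new_pred built from absolute calibration residuals."""
    scores = [abs(y - p) for y, p in zip(cal_true, cal_pred)]
    q = conformal_quantile(scores, alpha)
    return new_pred - q, new_pred + q
```

Applying this construction independently at each forecast step yields marginal intervals only. Obtaining a guarantee for the whole multi-step trajectory is the gap that the abstract says CopulaCPTS addresses.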

翻訳日:2024-03-21 01:51:05 公開日:2024-03-18

# Tsallis KL分枝を用いた一般化Munchausen強化学習

Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence ( http://arxiv.org/abs/2301.11476v4 )

ライセンス: Link先を確認

Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White,

(参考訳) 強化学習における多くの方策最適化手法は、方策の急激な変化を防ぐために、直前の方策に対するカルバック・ライブラー(KL)ダイバージェンスを取り入れている。このアイデアは、Conservative Policy Iterationに関する先駆的な論文で最初に提案され、TRPOやMunchausen Value Iteration (MVI)といったアルゴリズムがその近似を与えている。我々はこの流れを引き継ぎ、定義に$q$-対数を用いる一般化KLダイバージェンスであるTsallis KLダイバージェンスを研究する。このアプローチは厳密な一般化であり、$q = 1$ は標準のKLダイバージェンスに対応し、$q > 1$ はさまざまな新しい選択肢を与える。我々は、Tsallis KLの下で学習される方策の種類を特徴付け、$q > 1$が有益となりうる場合を考察する。Tsallis KL正則化を組み込んだ実用的なアルゴリズムを得るために、KL正則化を組み込む最も単純なアプローチの一つであるMVIを拡張する。この一般化されたMVI($q$)は、35のAtariゲームにおいて標準のMVI($q = 1$)よりも大幅な改善を達成することを示す。

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence -- called the Tsallis KL divergence -- which uses the $q$-logarithm in the definition. The approach is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporate KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.
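The claim that $q = 1$ recovers the standard KL divergence can be checked numerically. The sketch below uses one common convention for the $q$-logarithm, $\ln_q x = (x^{1-q} - 1)/(1 - q)$, and defines the divergence as $D_q(p \,\|\, r) = -\sum_i p_i \ln_q(r_i / p_i)$. The paper's exact parametrization may differ, so this convention and the function names are assumptions made for the example.

```python
import math
from typing import Sequence


def q_logarithm(x: float, q: float) -> float:
    """Tsallis q-logarithm (assumed convention); equals the natural log at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_kl(p: Sequence[float], r: Sequence[float], q: float) -> float:
    """D_q(p || r) = -sum_i p_i * ln_q(r_i / p_i), skipping terms where p_i = 0."""
    return -sum(pi * q_logarithm(ri / pi, q) for pi, ri in zip(p, r) if pi > 0.0)
```

Under this convention, `tsallis_kl(p, r, 1.0)` is the usual KL divergence, and `q = 2` gives `sum(p_i**2 / r_i) - 1`, the chi-square divergence. The latter penalizes large probability ratios more heavily than KL does.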

翻訳日:2024-03-21 01:51:05 公開日:2024-03-18

# 蒸留における生徒・教師間の逸脱について:逆らうことは得になるか?

On student-teacher deviations in distillation: does it pay to disobey? ( http://arxiv.org/abs/2301.12923v3 )

ライセンス: Link先を確認

Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar,

(参考訳) 知識蒸留(KD)は「学生」ネットワークのテスト精度を向上させるために広く使われており、訓練された「教師」ネットワークのソフトな確率を模倣するように訓練されている。しかし、近年の研究では、教師の確率に合うように訓練されているにもかかわらず、生徒は教師の確率から大きく逸脱するだけでなく、パフォーマンスにおいて教師よりも優れていることが示されている。私たちの研究は、この一見パラドックス的な観察を和解することを目的としています。具体的には、学生と教師の偏差の正確な性質を特徴付け、どのようにしてより一般化して共起できるかについて議論する。まず、画像と言語データの実験を通して、これらの確率偏差が教師の信頼度を体系的に誇張する学生に対応することを確認した。次に、理論上かつ経験的に、いくつかの単純な設定で別の誇張の形式を確立する: KDは、データのトップ固有方向に沿って高速に収束する際に、勾配降下の暗黙のバイアスを誇張する。最後に、これらの2つの観測を結びつける。我々は、KDの誇張バイアスが同時に両方の結果をもたらすことを実証する。 a) 自信と自信の誇張 b) 学生の一般化が向上し, 明らかなパラドックスに対する解決法が提供される。我々の分析は、KDにおける勾配降下の役割を考慮し、理論的および経験的両方の環境において過大なバイアス効果を示すことによって、既存の理論と実践をより近づける。

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network. Yet, it has been shown in recent work that, despite being trained to fit the teacher's probabilities, the student may not only significantly deviate from the teacher probabilities, but may also outdo the teacher in performance. Our work aims to reconcile this seemingly paradoxical observation. Specifically, we characterize the precise nature of the student-teacher deviations, and argue how they can co-occur with better generalization. First, through experiments on image and language data, we identify that these probability deviations correspond to the student systematically exaggerating the confidence levels of the teacher. Next, we theoretically and empirically establish another form of exaggeration in some simple settings: KD exaggerates the implicit bias of gradient descent in converging faster along the top eigendirections of the data. Finally, we tie these two observations together: we demonstrate that the exaggerated bias of KD can simultaneously result in both (a) the exaggeration of confidence and (b) the improved generalization of the student, thus offering a resolution to the apparent paradox. Our analysis brings existing theory and practice closer by considering the role of gradient descent in KD and by demonstrating the exaggerated bias effect in both theoretical and empirical settings.
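The training signal described in the first sentence, a student fitted to the teacher's soft probabilities, is typically a cross-entropy between the two softmax outputs. The sketch below is a generic illustration of that loss and of what an "exaggerated" student looks like. It is not the authors' experimental code, and the function names and logit values are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      temperature: float = 1.0) -> float:
    """Cross-entropy of the student's soft probabilities against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
```

With teacher logits `[2.0, 1.0, 0.1]`, a student that outputs `[4.0, 2.0, 0.2]` assigns a higher probability to the top class than the teacher does, and it incurs a larger loss than a student that matches the teacher exactly. A deviation of this kind is what the abstract calls exaggerating the teacher's confidence.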

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# DCEM-PINN:固体力学の深い相補的エネルギー法

DCEM-PINNs: A deep complementary energy method for solid mechanics ( http://arxiv.org/abs/2302.01538v6 )

ライセンス: Link先を確認

Yizheng Wang, Jia Sun, Timon Rabczuk, Yinghua Liu,

(参考訳) 近年、ディープラーニングの急速な進歩は、特に固体力学の領域で偏微分方程式(PDE)を解く際に、様々な分野に大きな影響を与え、ニューラルネットワークの顕著な近似能力の恩恵を受けている。 PDEの解決において、物理情報ニューラルネットワーク(PINN)とDeep Energy Method(DEM)が注目されている。最小ポテンシャルエネルギーと相補エネルギーの原理は、固体力学における2つの重要な変分原理である。しかし、よく知られたDeep Energy Method (DEM) は最小ポテンシャルエネルギーの原理に基づいているが、最小補完エネルギーの重要な形態は欠いている。このギャップを埋めるために、最小補間エネルギーの原理に基づく深部補間エネルギー法(DCEM)を提案する。 DCEMの出力関数は、本質的に平衡方程式を満たす応力関数である。本稿では,Prandtl と Airy の応力関数を用いて数値計算を行い,典型的な機械的問題をモデル化する際,DCEM と既存の PINN と DEM のアルゴリズムを比較した。以上の結果から,DCEMはDEMよりも応力精度と効率が優れており,理論的解析や数値シミュレーションによって支持される複雑な変位境界条件に対処する上で有利であることが示唆された。我々はDCEMをDCEM-Plus(DCEM-P)に拡張し、偏微分方程式を満たす項を追加する。さらに,演算子学習と物理方程式を組み合わせることで,Deep complementary energy operator method (DCEM-O)を提案する。当初,我々は高忠実度数値結果を用いてDCEM-Oを訓練し,補完エネルギーを取り入れた。 DCEM-PとDCEM-Oは、DCEMの精度と効率をさらに高める。

In recent years, the rapid advancement of deep learning has significantly impacted various fields, particularly in solving partial differential equations (PDEs) in the realm of solid mechanics, benefiting greatly from the remarkable approximation capabilities of neural networks. In solving PDEs, Physics-Informed Neural Networks (PINNs) and the Deep Energy Method (DEM) have garnered substantial attention. The principles of minimum potential energy and minimum complementary energy are two important variational principles in solid mechanics. However, the well-known DEM is based on the principle of minimum potential energy, and a counterpart based on the principle of minimum complementary energy is lacking. To bridge this gap, we propose the deep complementary energy method (DCEM) based on the principle of minimum complementary energy. The output function of DCEM is the stress function, which inherently satisfies the equilibrium equation. We present numerical results using the Prandtl and Airy stress functions, and compare DCEM with existing PINNs and DEM algorithms when modeling representative mechanical problems. The results demonstrate that DCEM outperforms DEM in terms of stress accuracy and efficiency and has an advantage in dealing with complex displacement boundary conditions, which is supported by theoretical analyses and numerical simulations. We extend DCEM to DCEM-Plus (DCEM-P), adding terms that satisfy partial differential equations. Furthermore, we propose a deep complementary energy operator method (DCEM-O) by combining operator learning with physical equations. Initially, we train DCEM-O using high-fidelity numerical results and then incorporate complementary energy. DCEM-P and DCEM-O further enhance the accuracy and efficiency of DCEM.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# 同時音楽生成と分離のためのマルチソース拡散モデル

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation ( http://arxiv.org/abs/2302.02257v4 )

ライセンス: Link先を確認

Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà,

(参考訳) 本研究では、文脈を共有するソースの結合確率密度のスコアを学習することにより、音楽合成と音源分離の両方が可能な拡散ベース生成モデルを定義する。古典的な全推論タスク(例えば、ミックスを生成し、ソースを分離する)とともに、ソース計算の部分生成タスクも導入し、実験し、ソースのサブセットを生成する(例えば、ドラムとうまく連携するピアノトラックを再生する)。さらに,ディラック確率関数に基づく分離タスクの新しい推論手法を提案する。我々は,音楽音源分離のための標準データセットであるSlakh2100でモデルをトレーニングし,生成設定における定性的な結果を提供し,音源分離設定における競合定量的結果を示す。本手法は,生成タスクと分離タスクの両方を扱える単一モデルの最初の例である。

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e., generating a mixture, separating the sources), we also introduce and experiment on the partial generation task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task based on Dirac likelihood functions. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the source separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# ニューラルコードモデル解釈のための因果論に向けて

Toward a Theory of Causation for Interpreting Neural Code Models ( http://arxiv.org/abs/2302.03788v2 )

ライセンス: Link先を確認

David N. Palacio, Nathan Cooper, Alvaro Rodriguez, Kevin Moran, Denys Poshyvanyk,

(参考訳) コードのニューラル言語モデル(Neural Language Models of Code)、すなわちニューラルコードモデル(Neural Code Models、NCM)は、研究プロトタイプから商用開発ツールへと急速に進歩している。そのため、そのようなモデルの能力と限界を理解することが重要になっている。しかしながら、これらのモデルの能力は通常、実際のパフォーマンスの一部だけを明らかにする自動メトリクスを使用して測定される。一般的には、NCMのパフォーマンスは有望であるように思われるが、現在、そのようなモデルがどのように決定を下すかは不明だ。そこで本研究では、モデル予測を説明できるNCM固有のポストホック解釈法である $do_{code}$ を紹介する。 $do_{code}$ は因果推論に基づいており、プログラミング言語指向の説明を可能にする。 $do_{code}$ の理論的基盤は、異なるモデル特性の探索に拡張可能であるが、本研究では、モデル挙動の説明をプログラミング言語の性質に基づかせることで、疑似相関の影響を軽減することを目的とした具体的なインスタンス化を提供する。 $do_{code}$ の実用的メリットを実証するために、2つの人気のあるディープラーニングアーキテクチャと10のNCMに関するケーススタディを実行し、我々のフレームワークが提供できる洞察について説明する。このケーススタディの結果から、対象としたNCMはコード構文の変化に敏感であることが示された。 BERTライクなモデルを除くすべてのNCMは、他のプログラミング言語の構成要素と比べて、コードのブロックに関連するトークン(ブラケット、括弧、セミコロンなど)をより少ない交絡バイアスで統計的に予測することを学習している。これらの知見は、NCMにおける交絡バイアスの検出と除去に有用な方法としての $do_{code}$ の可能性を示している。

Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces $do_{code}$, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. $do_{code}$ is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of $do_{code}$ are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of $do_{code}$, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias as compared to other programming language constructs. These insights demonstrate the potential of $do_{code}$ as a useful method to detect and facilitate the elimination of confounding bias in NCMs.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# 深さ・意味を考慮したマルチモーダルドメイン翻訳:LiDAR点雲から3次元パノラマカラー画像を生成する

Depth- and Semantics-aware Multi-modal Domain Translation: Generating 3D Panoramic Color Images from LiDAR Point Clouds ( http://arxiv.org/abs/2302.07661v4 )

ライセンス: Link先を確認

Tiago Cortinhal, Eren Erdal Aksoy,

(参考訳) 本研究は,LiDARとカメラセンサのマルチモーダル構成によるクロスドメイン画像・画像変換のための,深度とセマンティックスを考慮した新しい条件生成モデルTITAN-Nextを提案する。提案モデルでは,シーンセマンティクスを中間レベル表現として活用し,セマンティックなシーンセグメントのみに依存して生のLiDAR点雲をRGB-Dカメラ画像に変換する。我々は、これがこの種の最初のフレームワークであり、フェールセーフなメカニズムを提供し、ターゲット画像領域で利用可能なデータを増強するなど、自動運転車に実践的な応用があると主張している。提案モデルは,大規模かつ挑戦的なSemantic-KITTIデータセットで評価され,実験結果から,IoUに関して元のTITAN-Netや他の強力なベースラインを23.7%の差で大きく上回ることが示された。

This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by solely relying on semantic scene segments. We claim that this is the first framework of its kind and it has practical applications in autonomous vehicles such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a margin of $23.7\%$ in terms of IoU.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# 多構成価結合理論への量子アルゴリズム的アプローチ:解釈可能な回路設計からの考察

A Quantum Algorithmic Approach to Multiconfigurational Valence Bond Theory: Insights from Interpretable Circuit Design ( http://arxiv.org/abs/2302.10660v2 )

ライセンス: Link先を確認

Jakob S. Kottmann, Francesco Scala,

(参考訳) 量子コンピュータ上でのフェルミオン基底状態の効率的な作成方法が求められており、近年様々な技術が開発されている。膨大な数のメソッドがあるにもかかわらず、どのメソッドがどのシステムによく機能するかは未だに不明である。本研究では,多構成価結合波動関数を最適化するために,解釈可能な回路設計と効果的な基本手法を組み合わせる。選択されたモデルシステムに基づいて、これがいかに説明可能な性能をもたらすかを示す。提案手法は, 実効ベースのサイズや, 関連回路の個々の量子資源の観点から, 関連手法よりも優れていることを示す。

Efficient ways to prepare fermionic ground states on quantum computers are in high demand and different techniques have been developed over the last years. Despite having a vast set of methods, it is still unclear which method performs well for which system. In this work, we combine interpretable circuit designs with an effective basis approach in order to optimize a multiconfigurational valence bond wavefunction. Based on selected model systems, we show how this leads to explainable performance. We demonstrate that the developed methodology outperforms related methods in terms of the size of the effective basis as well as individual quantum resources for the involved circuits.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# Magnushammer: 前提選択へのトランスフォーマーベースのアプローチ

Magnushammer: A Transformer-Based Approach to Premise Selection ( http://arxiv.org/abs/2303.04488v3 )

ライセンス: Link先を確認

Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak, Bartosz Piotrowski, Albert Qiaochu Jiang, Jin Peng Zhou, Christian Szegedy, Łukasz Kuciński, Piotr Miłoś, Yuhuai Wu,

(参考訳) 本稿では,自動定理証明における重要な推論課題である前提選択に対する新しいアプローチを提案する。伝統的に、このタスクには広範なドメイン知識とエンジニアリングの努力に依存するシンボリックメソッドが適用される。対照的に、本研究は、トランスフォーマーアーキテクチャを用いた対照学習によって、エンジニアリング上のオーバーヘッドなしに、関連する前提をより高品質に検索できることを実証する。我々の手法であるMagnushammerは、対話型定理証明において最も先進的で広く使われている自動化ツールであるSledgehammerを上回る。 PISAとminiF2Fのベンチマークにおいて、Magnushammerはそれぞれ59.5%(Sledgehammerは38.3%)と34.0%(同20.9%)の成功率を達成している。 Magnushammerを言語モデルに基づく自動定理証明器と組み合わせることで、PISAベンチマークにおける最先端の証明成功率を、4倍少ないパラメータで57.0%から71.0%に改善する。さらに,(証明状態,関連する前提)ペアのテキスト表現を含む,前提選択のための新しいデータセットを開発し,オープンソース化する。私たちの知る限りでは、これは利用可能な最大の前提選択データセットであり、Isabelle証明アシスタント向けとしては最初のものである。

This paper presents a novel approach to premise selection, a crucial reasoning task in automated theorem proving. Traditionally, symbolic methods that rely on extensive domain knowledge and engineering effort are applied to this task. In contrast, this work demonstrates that contrastive training with the transformer architecture can achieve higher-quality retrieval of relevant premises, without the engineering overhead. Our method, Magnushammer, outperforms Sledgehammer, the most advanced and widely used automation tool in interactive theorem proving. On the PISA and miniF2F benchmarks, Magnushammer achieves $59.5\%$ (against $38.3\%$) and $34.0\%$ (against $20.9\%$) success rates, respectively. By combining Magnushammer with a language-model-based automated theorem prover, we further improve the state-of-the-art proof success rate from $57.0\%$ to $71.0\%$ on the PISA benchmark using $4\times$ fewer parameters. Moreover, we develop and open source a novel dataset for premise selection, containing textual representations of (proof state, relevant premise) pairs. To the best of our knowledge, this is the largest available premise selection dataset, and the first one for the Isabelle proof assistant.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# EHRDiff:拡散モデルによるリアルなEHR合成の探索

EHRDiff: Exploring Realistic EHR Synthesis with Diffusion Models ( http://arxiv.org/abs/2303.05656v2 )

ライセンス: Link先を確認

Hongyi Yuan, Songchi Zhou, Sheng Yu,

(参考訳) 電子健康記録(EHR)には、精密医療システムの開発のための貴重な資源として、豊富な生物医学情報が含まれている。しかしながら、プライバシに関する懸念は、研究者のための高品質で大規模なEHRデータへのアクセスを制限し、方法論の発展を妨げている。近年の研究では、生成的モデリング技術による現実的なEHRデータの合成が試みられ、提案手法の大半は、生成的敵対的ネットワーク(GAN)とそのEHR合成のバリエーションに依存している。 GANに基づく手法はEHRデータの生成における最先端性能を実現するが、これらの手法は訓練が困難であり、モード崩壊の傾向にある。近年, 画像生成において拡散モデルにより最先端の性能が確立されているが, EHRデータ合成における有効性は未解明のままである。本研究では, EHRデータ合成における拡散モデルの可能性について検討し, 新たな手法である EHRDiff を提案する。広範な実験を通じて、EHRDiffは、合成されたEHRデータのための新しい最先端の品質を確立し、一方でプライベート情報を保護する。

Electronic health records (EHR) contain a wealth of biomedical information, serving as valuable resources for the development of precision medicine systems. However, privacy concerns have resulted in limited access to high-quality and large-scale EHR data for researchers, impeding progress in methodological development. Recent research has delved into synthesizing realistic EHR data through generative modeling techniques, where a majority of proposed methods relied on generative adversarial networks (GAN) and their variants for EHR synthesis. Despite GAN-based methods attaining state-of-the-art performance in generating EHR data, these approaches are difficult to train and prone to mode collapse. Diffusion models, recently introduced in generative modeling, have established cutting-edge performance in image generation, but their efficacy in EHR data synthesis remains largely unexplored. In this study, we investigate the potential of diffusion models for EHR data synthesis and introduce a novel method, EHRDiff. Through extensive experiments, EHRDiff establishes new state-of-the-art quality for synthetic EHR data while protecting private information.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# Tag2Text:イメージタグによる視覚言語モデルの誘導

Tag2Text: Guiding Vision-Language Model via Image Tagging ( http://arxiv.org/abs/2303.05657v3 )

ライセンス: Link先を確認

Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang,

(参考訳) 本稿では,視覚言語事前学習(VLP)フレームワークであるTag2Textについて述べる。対象タグを手動でラベル付けするか,あるいはオフザシェルフ検出器で自動的に検出する従来の手法とは対照的に,本手法では画像ペアリングテキストから解析したタグを用いて画像タグを明示的に学習し,視覚言語モデルに強力な意味的ガイダンスを提供する。このように、Tag2Textは、画像とテキストのペアに応じて、大規模なアノテーションのない画像タグを利用でき、オブジェクトを超えてより多様なタグカテゴリを提供する。結果として、Tag2Textは、完全な教師付きモデルに匹敵するゼロショットのパフォーマンスで、基礎的なイメージタグ付けモデルの能力を示す。さらに、タグ付けガイダンスを活用することで、Tag2Textは世代ベースとアライメントベースの両方のタスクにおける視覚言語モデルの性能を効果的に向上する。幅広いダウンストリームベンチマークにおいて、Tag2Textは、同様のモデルサイズとデータスケールで最先端の結果を達成し、提案したタグ付けガイダンスの有効性を実証する。コード、デモ、事前訓練されたモデルは https://github.com/xinyu1205/recognize-anything で入手できる。

This paper presents Tag2Text, a vision language pre-training (VLP) framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features. In contrast to prior works which utilize object tags either manually labeled or automatically detected with an off-the-shelf detector with limited performance, our approach explicitly learns an image tagger using tags parsed from image-paired text and thus provides a strong semantic guidance to vision-language models. In this way, Tag2Text can utilize large-scale annotation-free image tags in accordance with image-text pairs, and provides more diverse tag categories beyond objects. As a result, Tag2Text demonstrates the ability of a foundational image tagging model, with superior zero-shot performance even comparable to fully supervised models. Moreover, by leveraging the tagging guidance, Tag2Text effectively enhances the performance of vision-language models on both generation-based and alignment-based tasks. Across a wide range of downstream benchmarks, Tag2Text achieves state-of-the-art results with similar model sizes and data scales, demonstrating the efficacy of the proposed tagging guidance. Code, demo and pre-trained models are available at https://github.com/xinyu1205/recognize-anything.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# 多レベル原子の秩序配列におけるディック超放射能

Dicke superradiance in ordered arrays of multilevel atoms ( http://arxiv.org/abs/2304.00093v2 )

ライセンス: Link先を確認

Stuart J. Masson, Jacob P. Covey, Sebastian Will, Ana Asenjo-Garcia,

(参考訳) 逆原子アンサンブルでは、光子を媒介とする相互作用は多体崩壊の形で、光子バーストとしてエネルギーが急速に放出される。元々は点のようなアンサンブルで研究されていたが、粒子間距離が一定の境界以下であれば、この現象は拡張順序系で継続する。ここでは, ストロンチウムやイッテルビウムなどのアルカリ性アース(-様)原子の配列を順序付けして, 現実的な実験環境下でのDicke超放射能について検討する。このような原子は、内部構造が長い波長の遷移に比べて短い原子間距離でトラップできるので、光と物質の相互作用にエキサイティングな新しい機会を与える。その複雑な電子構造にもかかわらず、これらの原子種の2次元配列は達成可能な格子定数に対して多体超放射性を示すことが示される。さらに、超放射能は、マルチレベル原子がより2レベルになるような「クローゼス」遷移を効果的に行う。これは、アバランシェ様の崩壊がほとんどの光子の放出を支配的な遷移に導いており、その微細構造とゼーマン分岐によって予測される単原子の崩壊比を克服しているためである。我々の研究は、アルカリ原子を量子光学源として利用し、多体散逸動力学を探求するためのプラットフォームとして利用するための重要なステップである。

In inverted atomic ensembles, photon-mediated interactions give rise to Dicke superradiance, a form of many-body decay that results in a rapid release of energy as a photon burst. While originally studied in pointlike ensembles, this phenomenon persists in extended ordered systems if the inter-particle distance is below a certain bound. Here, we investigate Dicke superradiance in a realistic experimental setting using ordered arrays of alkaline-earth(-like) atoms, such as strontium and ytterbium. Such atoms offer exciting new opportunities for light-matter interactions as their internal structure allows for trapping at short interatomic distances compared to their long-wavelength transitions, providing the potential for collectively enhanced dissipative interactions. Despite their intricate electronic structure, we show that two-dimensional arrays of these atomic species should exhibit many-body superradiance for achievable lattice constants. Moreover, superradiance effectively ``closes'' transitions, such that multilevel atoms become more two-level like. This occurs because the avalanchelike decay funnels the emission of most photons into the dominant transition, overcoming the single-atom decay ratios dictated by their fine structure and Zeeman branching. Our work represents an important step in harnessing alkaline-earth atoms as quantum optical sources and as platforms to explore many-body dissipative dynamics.

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# DeforestVis: サロゲート決定スタンプを用いた機械学習モデルの動作解析

DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps ( http://arxiv.org/abs/2304.00133v4 )

ライセンス: Link先を確認

Angelos Chatzimparmpas, Rafael M. Martins, Alexandru C. Telea, Andreas Kerren,

(参考訳) 機械学習(ML)モデルの複雑さが増大し、異なる(そして重要な)ドメインでの応用が増加するにつれて、より解釈可能で信頼性の高いMLに対する強い需要がある。そのようなモデルを直接、モデルに依存しない、解釈する方法は、ルールセットや決定ツリーのような代理モデルを訓練することである。しかし、ルールセットは非常に長くなり、多くのif-else文があり、複雑なMLモデルを正確にエミュレートすると決定木深さが急速に増加する。このような場合、どちらのアプローチも、モデル解釈可能性を持つユーザを目標とする中核的な目標達成に失敗する可能性がある。これを解決するために,Adaptive Boosting (AdaBoost) 技術で生成された一段決定切り株(一段決定木)を提供することにより,複雑なMLモデルの振る舞いを要約する視覚解析ツールであるDeforestVisを提案する。 DeforestVisは、より多くの切り株をインクリメンタルに生成し、決定を正当化するために重み付けされた切り株を使った属性ベースの説明を作成し、1つ以上の切り株間のトレーニングインスタンス割り当てに対するルールオーバーライドの影響を分析することで、複雑さとフィデリティのトレードオフを探索するのに役立つ。独立したテストセットでは、手動のルール変更の有効性を監視し、ケースバイケース分析に基づいて仮説を形成することができる。 DeforestVisの適用性と有用性について,2つのユースケースと,データアナリストとモデル開発者とのエキスパートインタビューで紹介する。

As the complexity of machine learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model-agnostic way to interpret such models is to train surrogate models, such as rule sets and decision trees, that sufficiently approximate the original ones while being simpler and easier to explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal: providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that summarizes the behaviour of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. DeforestVis helps users to explore the complexity versus fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analysing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
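
The surrogate idea in the abstract above can be illustrated with a minimal sketch: fit AdaBoost over one-level decision trees (stumps) to the labels produced by a black-box model, then measure how often the surrogate agrees with it (fidelity). This is not the DeforestVis implementation; the `black_box` function, the toy data and all helper names are hypothetical, and the boosting variant is plain discrete AdaBoost with labels in {-1, +1}.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-level tree."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # The stump predicts `pol` when x[j] <= thr and `-pol` otherwise.
                err = sum(wi for wi, row, t in zip(w, X, y)
                          if (pol if row[j] <= thr else -pol) != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost_stumps(X, y, n_stumps):
    """Discrete AdaBoost: returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_stumps):
        stump = fit_stump(X, y, w)
        # Clamp the error so a perfect stump does not cause a division by zero.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified instances, then renormalise.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def fidelity(ensemble, X, black_box_labels):
    """Fraction of instances on which the surrogate agrees with the black box."""
    hits = sum(ensemble_predict(ensemble, row) == t
               for row, t in zip(X, black_box_labels))
    return hits / len(X)

# Hypothetical black box: positive only inside an interval,
# which no single stump can express.
def black_box(row):
    return 1 if 0.3 < row[0] <= 0.7 else -1

X = [[i / 10] for i in range(10)]
labels = [black_box(row) for row in X]
for n_stumps in (1, 3):
    print(n_stumps, fidelity(adaboost_stumps(X, labels, n_stumps), X, labels))
```

Each added stump raises the complexity of the explanation, so comparing fidelity across ensemble sizes is one simple way to look at the complexity versus fidelity trade-off mentioned in the abstract.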

翻訳日:2024-03-21 01:40:47 公開日:2024-03-18

# 神経集団動態と幾何学の解釈可能な統計的表現

Interpretable statistical representations of neural population dynamics and geometry ( http://arxiv.org/abs/2304.03376v3 )

ライセンス: Link先を確認

Adam Gosztolai, Robert L. Peach, Alexis Arnaudon, Mauricio Barahona, Pierre Vandergheynst,

(参考訳) 多くの行動課題におけるニューロン集団のダイナミクスは、低次元多様体上で進化する。しかし、行動情報に明示的に依存することなく、解釈可能で、かつ個体や条件をまたいで一貫してデコード可能な潜在表現を神経記録から発見することは、依然として困難である。本稿では、局所的な動的特徴の統計分布に基づく非線形ダイナミクスのデータ駆動表現のための、完全に教師なしの幾何学的深層学習フレームワークMARBLEを紹介する。非線形力学系およびリカレントニューラルネットワークから得たin silicoの例と、霊長類やげっ歯類からのin vivo記録の両方を用いて、MARBLEは、決定閾値、運動学、内部状態などの大域的システム変数の観点から高度に解釈可能な潜在表現を推定できることを示す。また、MARBLE表現はニューラルネットワーク間や動物間で一貫しており、認知計算の比較や普遍デコーダの訓練に使用できることを示す。広範なベンチマークを通じて、教師なしのMARBLEは、個体内および個体間のデコード精度において最高水準であり、現在の教師ありアプローチに匹敵するか、それを大きく上回りながら、行動ラベルを必要としないことを示す。この結果は、神経ダイナミクスの時間情報とともに多様体構造を用いることで、より優れたデコードアルゴリズムを開発し、実験間でデータを同化するための共通の枠組みが得られることを示唆している。

The dynamics of neuron populations during many behavioural tasks evolve on low-dimensional manifolds. However, it remains challenging to discover latent representations from neural recordings that are interpretable and consistently decodable across individuals and conditions without explicitly relying on behavioural information. Here, we introduce MARBLE, a fully unsupervised geometric deep learning framework for the data-driven representation of non-linear dynamics based on statistical distributions of local dynamical features. Using both in silico examples from non-linear dynamical systems and recurrent neural networks and in vivo recordings from primates and rodents, we demonstrate that MARBLE can infer latent representations that are highly interpretable in terms of global system variables such as decision-thresholds, kinematics or internal states. We also show that MARBLE representations are consistent across neural networks and animals so that they can be used to compare cognitive computations or train universal decoders. Through extensive benchmarking, we show that unsupervised MARBLE provides best-in-class within- and across-animal decoding accuracy, comparable to or significantly better than current supervised approaches, yet without the need for behavioural labels. Our results suggest that using the manifold structure in conjunction with the temporal information of neural dynamics provides a common framework to develop better decoding algorithms and assimilate data across experiments.

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# StillFast: 短期オブジェクトインタラクション予測のためのエンドツーエンドアプローチ

StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation ( http://arxiv.org/abs/2304.03959v2 )

ライセンス: Link先を確認

Francesco Ragusa, Giovanni Maria Farinella, Antonino Furnari,

(参考訳) 予測問題は、人間の位置の予測、手や物体の軌跡の予測、行動の予測、人間と物体の相互作用など、さまざまな側面を考慮して研究されてきた。本稿では,オブジェクト間相互作用の短期的予測問題をエゴセントリックな視点から検討し,新たなエンドツーエンドアーキテクチャであるStillFastを提案する。提案手法は静止画像と映像を同時に処理し、次のアクティブなオブジェクトを検出して位置を定め、将来のインタラクションを記述する動詞を予測し、いつ対話が始まるかを決定する。大規模エゴセントリックデータセットEGO4Dの実験結果から,提案手法は課題に対する最先端のアプローチよりも優れていた。本手法は,EGO4D短期オブジェクトインタラクション予測課題2022において,第1位にランクされている。コードと詳細については、プロジェクトのWebページを参照してください。

The anticipation problem has been studied from different angles, such as predicting humans' locations, predicting hand and object trajectories, and forecasting actions and human-object interactions. In this paper, we studied the short-term object interaction anticipation problem from the egocentric point of view, proposing a new end-to-end architecture named StillFast. Our approach simultaneously processes a still image and a video to detect and localize next-active objects, predict the verb that describes the future interaction, and determine when the interaction will start. Experiments on the large-scale egocentric dataset EGO4D show that our method outperformed state-of-the-art approaches on the considered task. Our method ranked first on the public leaderboard of the EGO4D short-term object interaction anticipation challenge 2022. Please see the project web page for code and additional details: https://iplab.dmi.unict.it/stillfast/.

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# 深層ニューラルネットワークの不確実性校正におけるテスト時データ拡張へのアプローチ

Approaching Test Time Augmentation in the Context of Uncertainty Calibration for Deep Neural Networks ( http://arxiv.org/abs/2304.05104v2 )

ライセンス: Link先を確認

Pedro Conde, Tiago Barros, Rui L. Lopes, Cristiano Premebida, Urbano J. Nunes,

(参考訳) Deep Neural Networksの台頭により、機械学習システムは現在、多くの現実世界のアプリケーションに広く浸透しており、信頼性の高いモデルが必要とされている。そのためには、こうしたシステムの精度だけでなく、予測の不確実性についても徹底的に検討する必要がある。そこで我々は、画像分類のための深層モデルの不確実性校正を改善するために、テスト時データ拡張に基づく新しい手法(M-ATTAとV-ATTAという2つの変種)を提案する。適応的な重み付けシステムを利用することで、M/V-ATTAはモデルの精度に影響を与えることなく不確実性校正を改善する。これらの手法の性能は、不確実性校正に関連する多様な指標を用いて評価し、その頑健性を実証する。 CIFAR-10、CIFAR-100、Aerial Image Dataset、および分布シフト下の2つの異なるシナリオで得られた実験結果は、提案手法がいくつかの最先端のポストホック校正法より優れていることを示している。さらに、提案手法は分布外サンプルに対する予測エントロピーの面でも改善を示す。 M/V-ATTAのコード: https://github.com/pedrormconde/MV-ATTA

With the rise of Deep Neural Networks, machine learning systems are nowadays ubiquitous in a number of real-world applications, which creates the need for highly reliable models. This requires a thorough look not only at the accuracy of such systems, but also at their predictive uncertainty. Hence, we propose a novel technique (with two different variations, named M-ATTA and V-ATTA) based on test time augmentation, to improve the uncertainty calibration of deep models for image classification. By leveraging an adaptive weighting system, M/V-ATTA improves uncertainty calibration without affecting the model's accuracy. The performance of these techniques is evaluated by considering diverse metrics related to uncertainty calibration, demonstrating their robustness. Empirical results, obtained on CIFAR-10, CIFAR-100, Aerial Image Dataset, as well as in two different scenarios under distribution-shift, indicate that the proposed methods outperform several state-of-the-art post-hoc calibration techniques. Furthermore, the methods proposed also show improvements in terms of predictive entropy on out-of-distribution samples. Code for M/V-ATTA available at: https://github.com/pedrormconde/MV-ATTA
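
To make the general idea of weighted test time augmentation concrete, here is a minimal sketch that combines the class probabilities of the original input with those of its augmented views using a weight vector. It is an illustration of the generic technique only, not the M-ATTA or V-ATTA method: the function names, the fixed (rather than learned) weights and the convex-combination form are all assumptions, and unlike the method described in the abstract this sketch does nothing to guarantee that the predicted class stays unchanged.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tta_predict(logits_original, logits_augmented, weights):
    """Weighted average of class probabilities.

    weights[0] applies to the original view, weights[1:] to the augmented
    views, in order. Weights are assumed non-negative with a positive sum.
    """
    views = [softmax(logits_original)] + [softmax(l) for l in logits_augmented]
    if len(weights) != len(views):
        raise ValueError("need one weight per view (original + augmentations)")
    total = sum(weights)
    n_classes = len(views[0])
    return [sum(w * v[k] for w, v in zip(weights, views)) / total
            for k in range(n_classes)]

# Hypothetical logits for one image and two augmented copies of it.
probs = tta_predict([2.0, 0.5, 0.1],
                    [[1.5, 0.9, 0.2], [1.8, 0.2, 0.4]],
                    weights=[0.6, 0.2, 0.2])
print(probs)
```

Averaging over views tends to soften over-confident predictions, which is why schemes of this kind are studied for calibration; how the weights are adapted is exactly the part that a real method has to specify.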

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# Farm3D: 2D拡散の蒸留による関節付き3D動物の学習

Farm3D: Learning Articulated 3D Animals by Distilling 2D Diffusion ( http://arxiv.org/abs/2304.10535v2 )

ライセンス: Link先を確認

Tomas Jakab, Ruining Li, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi,

(参考訳) 本稿では、事前学習済みの2次元拡散ベース画像生成器から得られる「無料」の仮想的な教師信号のみに頼って、関節のある物体に対するカテゴリ固有の3次元再構成器を学習する手法Farm3Dを提案する。最近のアプローチでは、ある物体カテゴリの単一視点画像の集合から、任意の物体の3次元形状、アルベド、照明、視点を予測する単眼ネットワークを学習できる。しかし、これらのアプローチは、入手にコストのかかる、手作業でキュレーションされたクリーンな学習データに大きく依存している。本稿では、Stable Diffusionなどの画像生成器を用いて、十分にクリーンで追加の手作業によるキュレーションを必要としない合成学習データを生成し、そのような再構成ネットワークをゼロから学習できるようにするフレームワークを提案する。さらに、拡散モデルをスコアとして組み込み、学習プロセスを強化する。このアイデアは、視点や照明といった再構成の特定の側面をランダム化し、再構成された3Dオブジェクトの仮想ビューを生成し、2Dネットワークに結果の画像の品質を評価させることで、再構成器にフィードバックを与えるというものである。テキストプロンプトごとに単一の3Dアセットを生成する蒸留ベースの研究とは異なり、本手法は、実画像か生成画像かを問わず任意の画像から、制御可能な3Dアセットを単一のフォワードパスで数秒のうちに出力できる単眼再構成ネットワークを与える。我々のネットワークは、単眼再構成などの解析にも、ビデオゲームのようなリアルタイムアプリケーション向けの関節付きアセットを生成する合成にも利用できる。

We present Farm3D, a method for learning category-specific 3D reconstructors for articulated objects, relying solely on "free" virtual supervision from a pre-trained 2D diffusion-based image generator. Recent approaches can learn a monocular network that predicts the 3D shape, albedo, illumination, and viewpoint of any object occurrence, given a collection of single-view images of an object category. However, these approaches heavily rely on manually curated clean training data, which are expensive to obtain. We propose a framework that uses an image generator, such as Stable Diffusion, to generate synthetic training data that are sufficiently clean and do not require further manual curation, enabling the learning of such a reconstruction network from scratch. Additionally, we incorporate the diffusion model as a score to enhance the learning process. The idea involves randomizing certain aspects of the reconstruction, such as viewpoint and illumination, generating virtual views of the reconstructed 3D object, and allowing the 2D network to assess the quality of the resulting image, thus providing feedback to the reconstructor. Unlike work based on distillation, which produces a single 3D asset for each textual prompt, our approach yields a monocular reconstruction network capable of outputting a controllable 3D asset from any given image, whether real or generated, in a single forward pass in a matter of seconds. Our network can be used for analysis, including monocular reconstruction, or for synthesis, generating articulated assets for real-time applications such as video games.

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# 非エルミートスピンレスBHZ様モデルにおける表皮効果の解析

Parsing skin effect in a non-Hermitian spinless BHZ-like model ( http://arxiv.org/abs/2304.12723v2 )

ライセンス: Link先を確認

Dipendu Halder, Saurabh Basu,

(参考訳) この研究は、スピンレスベルネヴィグ・ヒューズ・チャン(BHZ)のような1次元のモデルにおける非エルミート皮膚効果(NHSE)を包括的に研究する。非相互ホッピング振幅を持つシステムはNHSEを示すと一般的に信じられている。しかし, システム内のNHSEやその変異の存在を復号するためには, より詳細な解析が必要である。オービタルホッピング用語に非相反性を含めることによって,従来のNHSEや双方向NHSEの存在や,驚くべきNHSEの欠如が示唆される。位相特性と(両直交)バルク境界対応は、(複素)ベリー位相、巻数、エッジモードの空間的局在の計算によって列挙され、そこで生じる位相遷移が強調される。さらに、非エルミートモデルの構造的議論を促進するために、結果をPT対称および非PT対称のケースに分割し、この2つを比較した。

This work comprehensively investigates the non-Hermitian skin effect (NHSE) in a spinless Bernevig-Hughes-Zhang (BHZ)-like model in one dimension. It is generally believed that a system with non-reciprocal hopping amplitudes demonstrates NHSE. However, we show that there are exceptions, and more in-depth analyses are required to decode the presence of NHSE or its variants in a system. The fascinating aspects of our findings, depending on the inclusion of non-reciprocity in the inter-orbital hopping terms, concede the existence of conventional or bi-directional NHSE and even a surprising absence of NHSE. The topological properties and the (bi-orthogonal) bulk-boundary correspondence, enumerated via computation of the (complex) Berry phase, the winding number, and spatial localization of the edge modes, highlight the topological phase transitions occurring therein. Further, to facilitate a structured discussion of the non-Hermitian model, we split the results into PT symmetric and non-PT symmetric cases with a view to comparing the two.

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# BCQQ: 周期データ再アップロードによるバッチ制約量子Q-Learning

BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading ( http://arxiv.org/abs/2305.00905v2 )

ライセンス: Link先を確認

Maniraman Periyasamy, Marc Hölle, Marco Wiedmann, Daniel D. Scherer, Axel Plinge, Christopher Mutschler,

(参考訳) 深層強化学習(DRL)は、しばしば大量のデータと環境の相互作用を必要とし、トレーニングプロセスに時間がかかる。バッチRLでは、エージェントは環境の相互作用を伴わずに、事前にコンパイルされたデータセットにのみトレーニングされる。量子コンピューティングの最近の進歩は、量子モデルは古典的手法に比べて訓練に必要なデータが少ないことを示唆している。本稿では、離散バッチ制約深度Q-ラーニング(BCQ)アルゴリズムにおいて、VQCを関数近似器として利用するバッチRLアルゴリズムを提案する。さらに,データエンコーディング層における入力変数の順序を周期的にシフトさせることにより,新しいデータ再ロード方式を導入する。我々は,OpenAI CartPole環境におけるアルゴリズムの有効性を評価し,その性能を従来のニューラルネットワークに基づく離散BCQと比較した。

Deep reinforcement learning (DRL) often requires a large amount of data and many environment interactions, making the training process time-consuming. This challenge is further exacerbated in the case of batch RL, where the agent is trained solely on a pre-collected dataset without environment interactions. Recent advancements in quantum computing suggest that quantum models might require less data for training compared to classical methods. In this paper, we investigate this potential advantage by proposing a batch RL algorithm that uses variational quantum circuits (VQCs) as function approximators within the discrete batch-constraint deep Q-learning (BCQ) algorithm. Additionally, we introduce a novel data re-uploading scheme by cyclically shifting the order of input variables in the data encoding layers. We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to the classical neural network-based discrete BCQ.
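
The cyclic re-uploading scheme mentioned above can be sketched at the level of bookkeeping: for each data encoding layer, the input variables are presented in an order rotated relative to the previous layer. The snippet below only computes those orderings; it does not build a quantum circuit. The function name, the shift direction and the stride of one position per layer are assumptions made for illustration, as the abstract does not specify them.

```python
def cyclic_encoding_orders(features, n_layers, shift=1):
    """Return, for each encoding layer, the input variables in rotated order.

    Layer 0 sees the original order; layer k sees the order rotated left
    by k * shift positions (an assumed convention).
    """
    n = len(features)
    if n == 0:
        raise ValueError("features must not be empty")
    orders = []
    for layer in range(n_layers):
        k = (layer * shift) % n
        orders.append(features[k:] + features[:k])
    return orders

# The four CartPole observation variables, used here as example inputs.
state = ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"]
for layer, order in enumerate(cyclic_encoding_orders(state, n_layers=3)):
    print(layer, order)
```

With such a rotation, each qubit (assuming one input variable per qubit) would receive a different variable in every encoding layer, instead of re-uploading the same variable repeatedly.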

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# AI支援による語用論的アノテーションの可能性評価: 謝罪の事例

Assessing the potential of AI-assisted pragmatic annotation: The case of apologies ( http://arxiv.org/abs/2305.08339v4 )

ライセンス: Link先を確認

Danni Yu, Luyang Li, Hang Su, Matteo Fuoli,

(参考訳) 品詞タグ付けや意味タグ付けなど、言語アノテーションの一部の形式は高い精度で自動化できる。しかし、語彙形式への直接的な対応を持たない複雑な語用論的・談話的特徴については、依然として手動のアノテーションが必要である。この手作業は時間がかかり誤りも生じやすく、コーパス言語学における機能から形式へのアプローチのスケーラビリティを制限している。そこで本研究では、大規模言語モデル(LLM)を用いた語用論的・談話的コーパスアノテーションの自動化について検討する。局所文法の枠組みに基づき、英語における謝罪の構成要素のアノテーションについて、ChatGPT、Bingチャットボット、および人間のコーダを比較する。その結果、BingチャットボットはChatGPTより優れており、その精度は人間のコーダに近づくことがわかった。これらの結果は、AIを語用論的・談話的コーパスアノテーションの支援にうまく活用でき、プロセスをより効率的かつスケーラブルにできることを示唆している。キーワード: 言語アノテーション、機能から形式へのアプローチ、大規模言語モデル、局所文法分析、Bingチャットボット、ChatGPT

Certain forms of linguistic annotation, like part of speech and semantic tagging, can be automated with high accuracy. However, manual annotation is still necessary for complex pragmatic and discursive features that lack a direct mapping to lexical forms. This manual process is time-consuming and error-prone, limiting the scalability of function-to-form approaches in corpus linguistics. To address this, our study explores automating pragma-discursive corpus annotation using large language models (LLMs). We compare ChatGPT, the Bing chatbot, and a human coder in annotating apology components in English based on the local grammar framework. We find that the Bing chatbot outperformed ChatGPT, with accuracy approaching that of a human coder. These results suggest that AI can be successfully deployed to aid pragma-discursive corpus annotation, making the process more efficient and scalable. Keywords: linguistic annotation, function-to-form approaches, large language models, local grammar analysis, Bing chatbot, ChatGPT

翻訳日:2024-03-21 01:30:29 公開日:2024-03-18

# ハングタイムHAR: Wrist-Worn慣性センサを用いたバスケットボール活動認識のためのベンチマークデータセット

Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition using Wrist-Worn Inertial Sensors ( http://arxiv.org/abs/2305.13124v2 )

ライセンス: Link先を確認

Alexander Hoelzemann, Julia Lee Romero, Marius Bock, Kristof Van Laerhoven, Qin Lv,

(参考訳) バスケットボールのトレーニングやドリル,ゲームなどの特定の設定のために,手首のセンサーを用いた身体活動認識手法を評価するためのベンチマークデータセットを提案する。バスケットボール活動は手首に装着した慣性センサーによる計測に適しており、そのようなスポーツ関連アクティビティを検出するシステムは、ゲーム分析、ガイド付きトレーニング、および身体的活動追跡への応用に利用することができる。このデータセットは、バスケットボールのトレーニングセッションとフルゲームの両方で、計24人の選手が手首に慣性センサーを装着した2つの国(米国とドイツ)のチームで記録された。このデータセットの特徴としては,2つの国で記録された試合ルールやスタイルの文化的差異による固有の差異や,以前のバスケットボール経験の面では異質であるため,スポーツスキルのレベルが異なることが挙げられる。いくつかの時系列分析でデータセットの特徴を概説し、2つの最先端ディープラーニングアーキテクチャを用いたベースライン分類性能研究について報告する。

We present a benchmark dataset for evaluating physical human activity recognition methods from wrist-worn sensors, for the specific setting of basketball training, drills, and games. Basketball activities lend themselves well to measurement by wrist-worn inertial sensors, and systems that are able to detect such sport-relevant activities could be used in applications toward game analysis, guided training, and personal physical activity tracking. The dataset was recorded for two teams from separate countries (USA and Germany) with a total of 24 players who wore an inertial sensor on their wrist, during both repetitive basketball training sessions and full games. Particular features of this dataset include an inherent variance through cultural differences in game rules and styles as the data was recorded in two countries, as well as different sport skill levels, since the participants were heterogeneous in terms of prior basketball experience. We illustrate the dataset's features in several time-series analyses and report on a baseline classification performance study with two state-of-the-art deep learning architectures.

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# Dual Expectile-Quantile Regressionを用いた分布強化学習

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression ( http://arxiv.org/abs/2305.16877v2 )

ライセンス: Link先を確認

Sami Jullien, Romain Deffayet, Jean-Michel Renders, Paul Groth, Maarten de Rijke,

(参考訳) 分布強化学習(RL)は、リターンの完全な分布を近似でき、環境サンプルをよりよく活用できるため、複数のベンチマークで有用であることが示されている。非対称な$L_1$損失に基づく、分布RLで一般的な分位点回帰アプローチは、任意のリターン分布を柔軟かつ効果的に学習する方法を提供する。実際には、分位点回帰に、より効率的なハイブリッド非対称$L_1$-$L_2$ Huber損失を用いることで改善されることが多い。しかし、そうすることで分布推定の保証は失われ、推定された分布が急速に平均へと崩壊することが経験的に観察される。実際、エクスペクタイル回帰に対応する非対称$L_2$損失は、分布的時間差分学習にそのまま用いることはできない。本研究では、$L_2$ベースの学習の効率性に動機づけられ、リターン分布のエクスペクタイルと分位点を同時に学習することで、リターンの完全な分布の推定を保ちながら効率的な学習を可能にすることを提案する。提案手法が正しいリターン分布を近似的に学習することを証明し、実用的な実装をトイ例および大規模設定でベンチマークする。 Atariベンチマークでは、提案手法は$200$Mのトレーニングフレーム後にHuberベースのIQN-1ベースラインの性能に匹敵する一方、分布の崩壊を回避し、リターンの完全な分布の推定を保持する。

Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and makes better use of environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, hybrid asymmetric $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our approach approximately learns the correct return distribution, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
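
The two loss families contrasted above can be written down in a few lines. The sketch below shows the standard asymmetric $L_1$ (quantile, or pinball) loss and the asymmetric $L_2$ (expectile) loss, and finds their minimisers over a fixed sample by brute-force grid search. It illustrates the textbook definitions only, not the paper's joint learning scheme; the sample values, the grid and the function names are assumptions.

```python
def quantile_loss(u, tau):
    """Asymmetric L1 (pinball) loss of the residual u = target - prediction."""
    return (tau if u >= 0 else 1.0 - tau) * abs(u)

def expectile_loss(u, tau):
    """Asymmetric L2 loss of the residual u = target - prediction."""
    return (tau if u >= 0 else 1.0 - tau) * u * u

def minimize_on_grid(loss, samples, tau, grid):
    """Return the grid point with the lowest total loss over the samples."""
    return min(grid, key=lambda q: sum(loss(x - q, tau) for x in samples))

# Hypothetical sample of returns with one large outlier.
returns = [0.0, 1.0, 2.0, 3.0, 10.0]
grid = [i / 10 for i in range(0, 101)]

median = minimize_on_grid(quantile_loss, returns, 0.5, grid)
mean = minimize_on_grid(expectile_loss, returns, 0.5, grid)
upper_expectile = minimize_on_grid(expectile_loss, returns, 0.9, grid)
print(median, mean, upper_expectile)
```

At $\tau = 0.5$ the quantile loss recovers the median and the expectile loss recovers the mean, which makes the difference between the two statistics easy to see on a skewed sample such as this one.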

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# 意味役割ラベル付けに導かれた分布外検出

Semantic Role Labeling Guided Out-of-distribution Detection ( http://arxiv.org/abs/2305.18026v2 )

ライセンス: Link先を確認

Jinan Zou, Maihao Guo, Yu Tian, Yuhao Lin, Haiyao Cao, Lingqiao Liu, Ehsan Abbasnejad, Javen Qinfeng Shi,

(参考訳) 自然言語処理において、予期せぬドメインシフトを伴うインスタンスを特定することは、現実世界のアプリケーションで極めて重要である。従来の研究は、文を表現する単一のグローバルな特徴埋め込みを利用して分布外(OOD)インスタンスを識別していたが、これでは微妙なOODパターンをうまく特徴づけられない。現在のOOD手法が直面するもう一つの大きな課題は、分布内(ID)データと意味的に類似した難しいOODインスタンスを識別するための、効果的な低次元の文表現を学習することである。本稿では、Semantic Role Labeling Guided Out-of-distribution Detection(SRLOOD)と呼ばれる新しい教師なしOOD検出手法を提案する。この手法は、マージンベースの対照損失を用いて、文の異なる項から得られる意味役割ラベル付け(SRL)に導かれた細粒度の局所特徴表現と、文全体のグローバルな特徴表現を分離・抽出・学習する。また、SRLで抽出された役割を予測することで、このようなグローバル・ローカルな特徴学習を強化する新しい自己教師ありアプローチも導入する。得られたモデルは4つのOODベンチマークでSOTA性能を達成し、本手法の有効性を示している。コードは https://github.com/cytai/SRLOOD で公開されている。

Identifying unexpected domain-shifted instances in natural language processing is crucial in real-world applications. Previous works identify the out-of-distribution (OOD) instance by leveraging a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge current OOD methods face is learning effective low-dimensional sentence representations to identify the hard OOD instances that are semantically similar to the in-distribution (ID) data. In this paper, we propose a new unsupervised OOD detection method, namely Semantic Role Labeling Guided Out-of-distribution Detection (SRLOOD), that separates, extracts, and learns the semantic role labeling (SRL) guided fine-grained local feature representations from different arguments of a sentence and the global feature representations of the full sentence using a margin-based contrastive loss. A novel self-supervised approach is also introduced to enhance such global-local feature learning by predicting the SRL extracted role. The resulting model achieves SOTA performance on four OOD benchmarks, indicating the effectiveness of our approach. The code is publicly accessible via https://github.com/cytai/SRLOOD.

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# DeepSolo++:多言語テキストスポッティングのための明示的なポイントを持つトランスフォーマーデコーダ

DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting ( http://arxiv.org/abs/2305.19957v2 )

ライセンス: Link先を確認

Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, Dacheng Tao,

(参考訳) エンドツーエンドのテキストスポッティングは、シーンテキストの検出と認識を統一されたフレームワークに統合することを目的としている。 2つのサブタスクの関係に対処することは、効果的なスポッターの設計において重要な役割を果たす。 Transformerベースの手法はヒューリスティックな後処理を排除しているが、サブタスク間の相乗効果の問題と低いトレーニング効率に依然として悩まされている。さらに、追加のスクリプト識別タスクを必要とする多言語テキストスポッティングの探索も見落としている。本稿では、DeepSolo++を提案する。DeepSolo++は単純なDETRライクなベースラインであり、明示的な点を持つ単一のデコーダだけで、テキスト検出、認識、スクリプト識別を同時に行う。技術的には、各テキストインスタンスに対して、文字シーケンスを順序付けられた点として表現し、学習可能な明示的なポイントクエリでそれらをモデル化する。単一のデコーダを通過した後、ポイントクエリは必要なテキストセマンティクスと位置を符号化しているため、非常に単純な予測ヘッドによって、テキストの中心線、境界、スクリプト、信頼度へと並列にデコードできる。さらに、文字クラス、言語タイプ、タスクの観点から、本手法が驚くほど優れた拡張性を持つことを示す。一方では、本手法は英語のシーンで良好に動作するだけでなく、中国語のように複雑なフォント構造と数千規模の文字クラスを持つ言語の書き起こしも習得する。他方では、DeepSolo++は、追加で導入されたスクリプト識別タスクにおいて、従来手法と比べてより簡素なトレーニングパイプラインでより良い性能を達成する。加えて、我々のモデルはラインアノテーションとも互換性があり、これはポリゴンよりもアノテーションコストがはるかに低い。コードは https://github.com/ViTAE-Transformer/DeepSolo で公開されている。

End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Dealing with the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. Besides, they overlook the exploration of multilingual text spotting, which requires an extra script identification task. In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries have encoded the requisite text semantics and locations, and thus can be further decoded to the center line, boundary, script, and confidence of text via very simple prediction heads in parallel. Furthermore, we show the surprisingly good extensibility of our method, in terms of character class, language type, and task. On the one hand, our method not only performs well in English scenes but also masters transcription with complex font structures and character classes numbering in the thousands, such as Chinese. On the other hand, our DeepSolo++ achieves better performance on the additionally introduced script identification task with a simpler training pipeline compared with previous methods. In addition, our models are also compatible with line annotations, which require much less annotation cost than polygons. The code is available at https://github.com/ViTAE-Transformer/DeepSolo.

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# BeyondPixels: Neural Radiance Fieldsの進化に関する包括的レビュー

BeyondPixels: A Comprehensive Review of the Evolution of Neural Radiance Fields ( http://arxiv.org/abs/2306.03000v3 )

ライセンス: Link先を確認

AKM Shahariar Azad Rabby, Chengcui Zhang,

(参考訳) ニューラルレンダリングは、古典的なコンピュータグラフィックスと機械学習のアイデアを組み合わせて、現実世界の観察から画像を合成する。 NeRF(Neural Radiance Fieldsの略)は、AIアルゴリズムを使用して2D画像から3Dオブジェクトを生成する最近のイノベーションである。補間アプローチを活用することで、NeRFは複雑なシーンの新しい3D再構成ビューを生成することができる。 3Dシーンの形状を直接復元する代わりに、NeRFは「放射場」と呼ばれる体積表現を生成し、関連する3D空間内のすべての点について色と密度を生成できる。 NeRFの幅広い魅力と不明瞭さは、このトピックに関する既存の研究を包括的に調査することが不可欠である。 3Dレンダリングに関する以前の調査は、主に従来のコンピュータビジョンベースの、あるいはディープラーニングベースのアプローチに焦点を当てていたが、NeRFの可能性について議論する人はごくわずかである。しかし、これらの調査は主にNeRFの初期の貢献に焦点を合わせており、その潜在能力を探求していない。 NeRFは、その能力と限界について継続的に研究されている比較的新しい技術である。この調査は最近のNeRFの進歩を概観し、特に新規なビュー合成の分野において、それらのアーキテクチャ設計に従って分類する。

Neural rendering combines ideas from classical computer graphics and machine learning to synthesize images from real-world observations. NeRF, short for Neural Radiance Fields, is a recent innovation that uses AI algorithms to create 3D objects from 2D images. By leveraging an interpolation approach, NeRF can produce new 3D reconstructed views of complicated scenes. Rather than directly restoring the whole 3D scene geometry, NeRF generates a volumetric representation called a "radiance field," which is capable of creating color and density for every point within the relevant 3D space. The broad appeal and notoriety of NeRF make it imperative to examine the existing research on the topic comprehensively. While previous surveys on 3D rendering have primarily focused on traditional computer vision-based or deep learning-based approaches, only a handful of them discuss the potential of NeRF. However, such surveys have predominantly focused on NeRF's early contributions and have not explored its full potential. NeRF is a relatively new technique continuously being investigated for its capabilities and limitations. This survey reviews recent advances in NeRF and categorizes them according to their architectural designs, especially in the field of novel view synthesis.
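
As a concrete illustration of how per-point color and density turn into a pixel, the sketch below implements the discrete volume rendering sum commonly used with radiance fields: each sample along a camera ray contributes its color weighted by its opacity and by the transmittance accumulated in front of it. The abstract itself does not spell this formula out, so treat it as background on the standard NeRF formulation; the function name and the sample values are assumptions, and no neural network is involved here.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples ordered from near to far along one ray.

    densities: volume density (sigma) at each sample
    colors:    RGB triple at each sample
    deltas:    distance between consecutive samples
    Returns the rendered RGB value and the remaining transmittance.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Hypothetical ray: empty space, then a semi-transparent red region,
# then a dense blue surface.
rgb, remaining = composite_ray(
    densities=[0.0, 0.8, 50.0],
    colors=[[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    deltas=[0.5, 0.5, 0.5],
)
print(rgb, remaining)
```

Because every step is differentiable with respect to the densities and colors, the same sum can be used to train the field from 2D images by comparing rendered and observed pixels.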

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# 汎化可能なロボットマニピュレーションのための基盤モデルの転用

Transferring Foundation Models for Generalizable Robotic Manipulation ( http://arxiv.org/abs/2306.05716v4 )

ライセンス: Link先を確認

Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, Limin Wang,

(参考訳) 現実世界における汎用ロボット操作エージェントの一般化能力の向上は、長い間大きな課題であった。既存のアプローチは、RT-1データセットのようなコストと時間を要する大規模なロボットデータの収集に依存していることが多い。しかし、データの多様性が不十分なため、これらのアプローチは一般的に、新しいオブジェクトと多様な環境を持つオープンドメインシナリオにおける能力を制限することに悩まされる。本稿では,インターネット規模の基盤モデルによって生成された言語推論セグメンテーションマスクを,ロボット操作タスクの条件付けに効果的に活用する新しいパラダイムを提案する。視覚基盤モデルから導かれる意味的・幾何学的・時間的相関をエンド・ツー・エンドのポリシーモデルに組み込んだマスクのモダリティを組み込むことにより,本手法はオブジェクトのポーズを効果的かつ堅牢に知覚し,新しいオブジェクトインスタンス,セマンティックカテゴリ,目に見えない背景を含むサンプル効率のよい一般化学習を可能にする。まず、複数のタスクにまたがる自然言語要求を基盤とする基礎モデルを紹介します。第2に,実画像とオブジェクトマスクを処理する模倣学習に基づく2ストリーム2Dポリシーモデルを構築し,局所的な認識方式でロボットの動作を予測する。フランカ・エミカのロボットアームを用いた大規模な実世界実験により,提案したパラダイムとポリシーアーキテクチャの有効性が実証された。デモは提出されたビデオで見ることができ、より包括的なデモはlink1またはlink2で見ることができます。

Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge. Existing approaches often rely on collecting large-scale robotic data, such as the RT-1 dataset, which is costly and time-consuming. However, due to insufficient diversity of data, these approaches typically have limited capability in open-domain scenarios with new objects and diverse environments. In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models to condition robot manipulation tasks. By integrating the mask modality, which incorporates semantic, geometric, and temporal correlation priors derived from vision foundation models, into the end-to-end policy model, our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning, including new object instances, semantic categories, and unseen backgrounds. We first introduce a series of foundation models to ground natural language demands across multiple tasks. Secondly, we develop a two-stream 2D policy model based on imitation learning, which processes raw images and object masks to predict robot actions in a local-global perception manner. Extensive real-world experiments conducted on a Franka Emika robot arm demonstrate the effectiveness of our proposed paradigm and policy architecture. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.

翻訳日:2024-03-21 01:20:39 公開日:2024-03-18

# 敵対的訓練は非ゼロ和ゲームとして定式化されるべきである

Adversarial Training Should Be Cast as a Non-Zero-Sum Game ( http://arxiv.org/abs/2306.11035v2 )

ライセンス: Link先を確認

Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher,

(参考訳) ディープニューラルネットワークの敵対的脆弱性を解決するための1つの顕著なアプローチは、敵対的トレーニングの2つのプレイヤーゼロサムパラダイムであり、予測者は敵対的に選択されたデータの摂動に対して訓練される。このアプローチの約束にもかかわらず、このパラダイムに基づくアルゴリズムは十分なロバストネスのレベルを示さず、ロバストオーバーフィッティングのような病理学的行動に悩まされている。この欠点を理解するために、まず、敵対的学習アルゴリズムでよく使われる代理に基づく緩和が、訓練された分類器の堅牢性に関するすべての保証を無効にすることを示す。この落とし穴の特定は、対戦訓練の非ゼロサム二段階の新たな定式化を通知し、各プレイヤーは異なる目的関数を最適化する。我々の定式化は、単純なアルゴリズムの枠組みを生み出し、場合によっては最先端の攻撃よりも優れ、標準的な敵の訓練アルゴリズムに匹敵する堅牢性を達成し、頑強なオーバーフィッティングに苦しむことはない。

One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches and in some cases outperforms state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18

# 線形識別学習における周波数効果

Frequency effects in Linear Discriminative Learning ( http://arxiv.org/abs/2306.11044v2 )

ライセンス: Link先を確認

Maria Heitmeier, Yu-Ying Chuang, Seth D. Axen, R. Harald Baayen,

(参考訳) 単語頻度は、ほとんどの語彙処理タスクにおいて強力な予測器である。したがって、どんな単語認識モデルでも、単語の周波数効果がどのように生じるかを考慮する必要がある。識別辞書モデル (DLM; Baayen et al , 2018a, 2019) は、単語の形式とその意味を線形にマッピングした語彙処理をモデル化する。これまでのところ、これらのマッピングは、誤り駆動学習(英語版)によって段階的に得られるか、あるいは全ての単語が最適に学習される理論的な学習状態(EL)をモデル化する、効率的だが周波数に依存しない計算コストの高いプロセスである。本研究では, 形式と意味の効率よく, 周波数インフォームドマッピングが実現可能であることを示す(周波数インフォームド学習; FIL)。 FILは計算コストをはるかに安くしながら、インクリメンタルな解をよく近似していることが分かりました。 FILは比較的低い型と高いトークン精度を示し、モデルが日々の生活の中で話者が遭遇するほとんどのワードトークンを正しく処理できることを示した。我々は、オランダのLexicon Project (Keuleers et al , 2010) において、FILを用いて反応時間をモデル化し、FILが周波数と反応時間の平均の間のS字型関係を適切に予測するが、低頻度語に対する反応時間のばらつきを過小評価する。 FILは,マンダリン中国語(Lee, 2007)の聴覚語彙決定タスクにおいて,ELと比較してプライミング効果を考慮しやすくしている。最後に, CHILDES (Brown, 1973; Demuth et al , 2006) の順序データを用いて, FIL と漸進学習を用いて得られた写像を比較した。写像は高い相関性を持つが、FILでは単語順序効果に基づくニュアンスの一部が失われる。本研究は,学習モデルにおける周波数効果を効率的にシミュレートする方法を示し,認知モデルにおける低頻度単語の最適な説明法について疑問を投げかけるものである。

Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic solution modelling the theoretical endstate of learning (EL) where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL. Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently, and raise questions about how to best account for low-frequency words in cognitive models.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
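
The contrast between the endstate-of-learning (EL) and frequency-informed (FIL) mappings can be illustrated with ordinary versus weighted least squares. This is a sketch under assumptions, not necessarily the paper's exact formulation: $C$ (form matrix, one row $c_i^\top$ per word), $S$ (meaning matrix, rows $s_i^\top$), $F$ (the linear mapping), $p_i$ (relative frequency of word $i$), and $P = \mathrm{diag}(p_1, \dots, p_n)$ are notation introduced here, and both Gram matrices are assumed invertible.

```latex
% EL: every word type contributes equally to the fit
\hat{F}_{\mathrm{EL}}
  = \arg\min_{F} \lVert C F - S \rVert_F^2
  = (C^\top C)^{-1} C^\top S

% Frequency-informed sketch: word i is weighted by its relative frequency p_i
\hat{F}_{\mathrm{FIL}}
  = \arg\min_{F} \sum_{i=1}^{n} p_i \, \lVert c_i^\top F - s_i^\top \rVert_2^2
  = (C^\top P C)^{-1} C^\top P S
```

Under this weighting, errors on high-frequency words cost more than errors on rare words, which is consistent with the reported pattern of high token accuracy alongside lower type accuracy.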

# ポケット特異的分子生成と実験のための関数群に基づく拡散

Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration ( http://arxiv.org/abs/2306.13769v3 )

ライセンス: Link先を確認

Haitao Lin, Yufei Huang, Odin Zhang, Lirong Wu, Siyuan Li, Zhiyuan Chen, Stan Z. Li,

(参考訳) 近年、標的タンパク質のポケットの構造から分子を生成するためにAIによる薬物設計法が提案されている。その多くは原子準位に基づく手法であり、原子を基本成分とみなし、原子の位置と型を生成する。しかし、この方法では複雑な構造を持つ現実的な断片を生成することは困難である。そこで我々はD3FGを提案する。D3FGはポケット固有の分子の生成と実験のための機能群に基づく拡散モデルである。 D3FGは分子を、剛体として定義される官能基と質量点としてのリンカーの2つのカテゴリに分解する。そしてこの2種類の成分は、リガンドとタンパク質の相互作用を強化する複雑な断片を形成することができる。具体的には、拡散過程において、D3FGは、成分の位置、向き、タイプのデータ分布を事前分布に拡散させ、生成過程において、設計された同変グラフニューラルネットワークでパラメータ化して、3変数からノイズを徐々に除去する。実験では, より現実的な3次元構造, タンパク質標的に対する競合親和性, 薬物特性の良好な分子を生成できる。さらに、D3FGは分子の発見の新たな課題の解決策として、既存のリガンドと標的タンパク質のホットスポットに基づいて高い親和性を持つ分子を生成することができる。

In recent years, AI-assisted drug design methods have been proposed to generate molecules given the pocket structures of target proteins. Most of them are atom-level-based methods, which consider atoms as basic components and generate atom positions and types. In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration. D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points. Together, the two kinds of components can form complicated fragments that enhance ligand-protein interactions. To be specific, in the diffusion process, D3FG diffuses the data distribution of the positions, orientations, and types of the components into a prior distribution; in the generative process, the noise is gradually removed from the three variables by denoisers parameterized with designed equivariant graph neural networks. In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties. Besides, D3FG, as a solution to a new task of molecule elaboration, can generate molecules with high affinities based on existing ligands and the hotspots of target proteins.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18

# DNABERT-2:多種ゲノムの効率的な基盤モデルとベンチマーク

DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome ( http://arxiv.org/abs/2306.15006v2 )

ライセンス: Link先を確認

Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, Han Liu,

(参考訳) ゲノムの言語的な複雑さを解読することは生物学における重要な課題であり、DNABERTやNucleotide Transformerといった事前訓練された基礎モデルがこの領域で大きな進歩を遂げている。既存の研究は、その単純さから、A、T、C、Gの固定長の並びであるk-merをゲノム言語のトークンとして大きく依存している。しかし、k-merのトークン化によって引き起こされる計算とサンプルの非効率性は、大規模なゲノム基盤モデルの開発における主要な障害であると我々は主張する。我々はゲノムのトークン化に関する概念的および実証的な知見を示し、それに基づいてk-merのトークン化をByte Pair Encoding (BPE) に置き換えることを提案する。これは統計に基づくデータ圧縮アルゴリズムで、コーパス内の最も頻繁な共起ゲノムセグメントを反復的にマージすることでトークンを構築する。我々は、BPEがk-merトークン化の限界を克服するだけでなく、重複しないトークン化の計算効率から恩恵を受けることを示した。これらの知見に基づき、DNABERT-2を導入した。DNABERT-2は効率的なトークナイザを採用し、複数の戦略を用いて入力長制約を克服し、時間とメモリ消費を低減し、モデル機能を向上させる。さらに、ゲノム理解のための包括的で標準化されたベンチマークが欠如していることを、公正な比較分析のもう一つの重要な障害とみなす。これに対応するために、GUE(Genome Understanding Evaluation)という総合的な多種ゲノム分類データセットを提案する。このデータセットは9つのタスクにわたる36の異なるデータセットを統合し、入力長は70から10000の範囲である。 GUEベンチマークの総合的な実験を通じて、DNABERT-2は、21倍少ないパラメータと、事前学習において約92倍少ないGPU時間で、最先端モデルに匹敵するパフォーマンスを達成することを実証した。

Decoding the linguistic intricacies of the genome is a crucial problem in biology, and pre-trained foundational models such as DNABERT and Nucleotide Transformer have made significant strides in this area. Existing works have largely hinged on k-mer, fixed-length permutations of A, T, C, and G, as the token of the genome language due to its simplicity. However, we argue that the computation and sample inefficiencies introduced by k-mer tokenization are primary obstacles in developing large genome foundational models. We provide conceptual and empirical insights into genome tokenization, building on which we propose to replace k-mer tokenization with Byte Pair Encoding (BPE), a statistics-based data compression algorithm that constructs tokens by iteratively merging the most frequent co-occurring genome segment in the corpus. We demonstrate that BPE not only overcomes the limitations of k-mer tokenization but also benefits from the computational efficiency of non-overlapping tokenization. Based on these insights, we introduce DNABERT-2, a refined genome foundation model that adapts an efficient tokenizer and employs multiple strategies to overcome input length constraints, reduce time and memory expenditure, and enhance model capability. Furthermore, we identify the absence of a comprehensive and standardized benchmark for genome understanding as another significant impediment to fair comparative analysis. In response, we propose the Genome Understanding Evaluation (GUE), a comprehensive multi-species genome classification dataset that amalgamates $36$ distinct datasets across $9$ tasks, with input lengths ranging from $70$ to $10000$. Through comprehensive experiments on the GUE benchmark, we demonstrate that DNABERT-2 achieves comparable performance to the state-of-the-art model with $21 \times$ fewer parameters and approximately $92 \times$ less GPU time in pre-training.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
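
The contrast between overlapping k-mer tokenization and BPE described in this abstract can be sketched in a few lines. This is a toy illustration, not the DNABERT-2 tokenizer: the function names, the tie-breaking rule, and the tiny corpus are assumptions made here for demonstration.

```python
from collections import Counter


def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Overlapping k-mer tokenization: one token per start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def apply_merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Fuse each non-overlapping occurrence of `pair`, scanning left to right."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn merges by repeatedly fusing the most frequent adjacent token pair."""
    sequences = [list(seq) for seq in corpus]  # start from single nucleotides
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in sequences:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        # Ties are broken lexicographically so the result is deterministic
        # (an assumption of this sketch).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        sequences = [apply_merge(tokens, best) for tokens in sequences]
    return merges


def bpe_tokenize(seq: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(seq)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


if __name__ == "__main__":
    merges = learn_bpe_merges(["ATATATCG", "ATATGC"], num_merges=2)
    print(kmer_tokenize("ATATGC", k=3))  # overlapping tokens, one per position
    print(bpe_tokenize("ATATGC", merges))  # non-overlapping, variable-length tokens
```

A sequence of length n yields n - k + 1 overlapping k-mers, whereas the BPE tokens partition the sequence without overlap, so the token count shrinks as merges accumulate.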

# 深層学習による新型コロナウイルス研究のためのソーシャルメディア情報検索の合理化

Streamlining Social Media Information Retrieval for COVID-19 Research with Deep Learning ( http://arxiv.org/abs/2306.16001v3 )

ライセンス: Link先を確認

Yining Hua, Jiageng Wu, Shixu Lin, Minghui Li, Yujie Zhang, Dinah Foer, Siwen Wang, Peilin Zhou, Jie Yang, Li Zhou,

(参考訳) 目的:ソーシャルメディアベースの公衆衛生研究は疫病の監視に不可欠であるが、ほとんどの研究はキーワードマッチングによって関連コーパスを特定している。本研究は,口語医学辞典の整理過程を合理化するシステムを開発した。我々は、新型コロナウイルス関連ツイートからUMLS-colloquial symptom dictionaryを構築し、そのパイプラインを概念実証として示す。方法:2020年2月1日から2022年4月30日までの新型コロナウイルス関連のツイートが使用された。パイプラインには、ツイート中の症状を検出する名前付きエンティティ認識モジュール、検出されたエンティティを集約するエンティティ正規化モジュール、エンティティを統一医療言語システムの概念に反復的にマッピングするマッピングモジュールの3つのモジュールが含まれている。最終的な辞書からランダムな500個のエンティティサンプルを抽出し、精度検証を行った。さらに, 我々の辞書を先行研究の定義済みレキシコンと比較するために, 症状頻度分布解析を行った。結果: ツイートから498,480のユニークな症状エンティティ表現を特定した。前処理によりその数は18,226に減少する。最終辞書には、966 UMLSの概念にマッピングできる症状の38,175のユニークな表現が含まれている(精度=95%)。症状分布分析の結果,我々の辞書はより多くの症状を検知し,定義済みレキシコンでは見逃されがちな不安やうつ病などの精神疾患の同定に有効であることが判明した。結論: この研究は, ソーシャルメディアデータから症状レキシコンをキュレートするための, 新たな体系的パイプラインを実装することによって, 公衆衛生研究を推進している。医療専門家によって検証された最終レキシコンの高精度さは、この手法が膨大な量の構造化されていないソーシャルメディアデータを、多様な言語的・地域的状況にまたがる行動可能な医学的洞察に確実に解釈し分類する可能性を強調している。

Objective: Social media-based public health research is crucial for epidemic surveillance, but most studies identify relevant corpora with keyword-matching. This study develops a system to streamline the process of curating colloquial medical dictionaries. We demonstrate the pipeline by curating a UMLS-colloquial symptom dictionary from COVID-19-related tweets as proof of concept. Methods: COVID-19-related tweets from February 1, 2020, to April 30, 2022 were used. The pipeline includes three modules: a named entity recognition module to detect symptoms in tweets; an entity normalization module to aggregate detected entities; and a mapping module that iteratively maps entities to Unified Medical Language System concepts. A random sample of 500 entities was drawn from the final dictionary for accuracy validation. Additionally, we conducted a symptom frequency distribution analysis to compare our dictionary to a pre-defined lexicon from previous research. Results: We identified 498,480 unique symptom entity expressions from the tweets. Pre-processing reduces the number to 18,226. The final dictionary contains 38,175 unique expressions of symptoms that can be mapped to 966 UMLS concepts (accuracy = 95%). Symptom distribution analysis found that our dictionary detects more symptoms and is effective at identifying psychiatric disorders like anxiety and depression, often missed by pre-defined lexicons. Conclusions: This study advances public health research by implementing a novel, systematic pipeline for curating symptom lexicons from social media data. The final lexicon's high accuracy, validated by medical professionals, underscores the potential of this methodology to reliably interpret and categorize vast amounts of unstructured social media data into actionable medical insights across diverse linguistic and regional landscapes.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18

# T-MARS:テキスト特徴学習による視覚表現の改善

T-MARS: Improving Visual Representations by Circumventing Text Feature Learning ( http://arxiv.org/abs/2307.03132v2 )

ライセンス: Link先を確認

Pratyush Maini, Sachin Goyal, Zachary C. Lipton, J. Zico Kolter, Aditi Raghunathan,

(参考訳) 大規模なWebソースのマルチモーダルデータセットは、汎用的な視覚表現を学習し、コンピュータビジョンの最先端を推し進め、ゼロショットと少数ショットの認識に革命をもたらす、数多くの新しい方法に力を入れている。実践者が直面する決定の1つは、たとえ何であれ、より大きくなったデータセットをどのようにキュレートするかだ。例えば、LAION-5Bデータセットの作成者は、CLIPの類似度スコアが指定された閾値を超えたイメージキャプチャペアのみを保持することを選択した。本稿では,LAIONの画像の40%近くが字幕と重なるテキストを含んでいるという観察を動機とした,最新のデータフィルタリング手法を提案する。直感的には、このようなデータは視覚的特徴を学習するのではなく、光学的文字認識を行うモデルにインセンティブを与えるため、無駄になる可能性がある。しかし、視覚的特徴を含む画像を(重なり合うテキストに加えて)捨ててしまうため、こうしたデータを全て取り除くのは無駄になる可能性がある。私たちのシンプルでスケーラブルなアプローチであるT-MARS(Text Masking and Re-Scoring)は、テキストが残りの視覚的特徴を支配しているペアのみをフィルタリングします。実験的に、T-MARSは、DataCompの"medium scale"(データフィルタリングベンチマーク)において、ImageNetの6.5%、VTABの4.7%のマージンで、トップランクの手法より優れている。さらに, 2M から 64M までのデータプールサイズを系統的に評価した結果,T-MARS による精度向上はデータや計算が指数関数的に大きくなるにつれて線形的に増加することが示された。コードはhttps://github.com/locuslab/T-MARSで入手できる。

Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.

翻訳日:2024-03-21 01:10:08 公開日:2024-03-18
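
The mask-then-rescore filtering rule described in this abstract can be sketched as follows. This is a minimal illustration under assumptions: `mask_text` and `clip_similarity` are hypothetical callables supplied by the caller (standing in for a text-masking step and a CLIP image-text scorer), the threshold value is arbitrary, and keeping the original unmasked image for training is an assumption of this sketch rather than something the abstract states.

```python
from typing import Callable, Iterable, TypeVar

Image = TypeVar("Image")


def tmars_style_filter(
    pairs: Iterable[tuple[Image, str]],
    mask_text: Callable[[Image], Image],
    clip_similarity: Callable[[Image, str], float],
    threshold: float,
) -> list[tuple[Image, str]]:
    """Keep pairs whose text-masked image still matches the caption."""
    kept = []
    for image, caption in pairs:
        masked = mask_text(image)                 # hypothetical: hide rendered text
        score = clip_similarity(masked, caption)  # hypothetical: image-text score
        if score >= threshold:
            kept.append((image, caption))         # assumed: keep the unmasked image
    return kept


if __name__ == "__main__":
    # Toy stand-ins: an "image" is a dict of visual concepts and rendered words.
    def toy_mask(img):
        return {"visual": img["visual"], "text": set()}

    def toy_sim(img, caption):
        words = set(caption.split())
        return len(words & (img["visual"] | img["text"])) / len(words)

    pairs = [
        ({"visual": {"dog", "grass"}, "text": {"sale"}}, "dog on grass"),
        ({"visual": set(), "text": {"big", "sale"}}, "big sale"),
    ]
    for _, caption in tmars_style_filter(pairs, toy_mask, toy_sim, threshold=0.5):
        print(caption)
```

The second toy pair is dropped because its caption is matched only by the rendered text, while the first survives because visual content alone still supports the caption.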

# 動的グラフのためのディープラーニング:モデルとベンチマーク

Deep learning for dynamic graphs: models and benchmarks ( http://arxiv.org/abs/2307.06104v3 )

ライセンス: Link先を確認

Alessio Gravina, Davide Bacciu,

(参考訳) 近年,Deep Graph Networks (DGNs) の研究が進展し,グラフ上の学習領域が成熟した。この研究分野の成長にもかかわらず、まだ解決されていない重要な課題がまだ残っている。具体的には、時間とともに進化する相互接続された実体の現実的なシステムにおいて、予測タスクに適したDGNを作ることが望まれている。動的グラフの領域における研究を促進することを目的として、まず、時間情報と空間情報の両方を学ぶことの最近の利点を調査し、動的グラフの表現学習領域における現在の最先端技術の概要を概観する。第二に、ノードとエッジレベルのタスクに関する最も一般的な提案手法と比較して、厳密なモデル選択と評価を活用して、新しいアーキテクチャとアプローチを評価するためのサウンドベースラインを確立する。

Recent progress in research on Deep Graph Networks (DGNs) has led to a maturation of the domain of learning on graphs. Despite the growth of this research field, there are still important challenges that are yet unsolved. Specifically, there is a pressing need to make DGNs suitable for predictive tasks on real-world systems of interconnected entities, which evolve over time. With the aim of fostering research in the domain of dynamic graphs, we first survey recent advances in learning both temporal and spatial information, providing a comprehensive overview of the current state-of-the-art in the domain of representation learning for dynamic graphs. Secondly, we conduct a fair performance comparison among the most popular proposed approaches on node and edge-level tasks, leveraging rigorous model selection and assessment for all the methods, thus establishing a sound baseline for evaluating new architectures and approaches.

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18

# 1次元導波路QEDシステムにおける多重サイドバンド干渉による単一原子増幅

Single-Atom Amplification Assisted by Multiple Sideband Interference in 1D Waveguide QED Systems ( http://arxiv.org/abs/2307.11174v2 )

ライセンス: Link先を確認

Kuan-Ting Lin, Ting Hsu, Fahad Aziz, Yu-Chen Lin, Ping-Yi Wen, Io-Chun Hoi, Guin-Dar Lin,

(参考訳) 本研究では1次元導波路量子電磁力学系における複数のRabiサイドバンドコヒーレンスから生じる信号増幅に関する理論的研究を行う。我々は半無限導波路を用いて、強いコヒーレントマイクロ波場で非調和多準位トランスモンを駆動し、プローブ信号を導入して散乱挙動を調べる。その結果、特定の共鳴条件下で信号増幅が起こることが明らかになり、これまで文献で報告されていたよりも微細な構造を示すスペクトルが得られた。この増幅の背後にあるメカニズムを解明するために、強い駆動場が存在する場合に複数のドレスト(dressed)サイドバンドを明示的に考慮するモデルを開発した。このモデルからプローブ信号の反射振幅を導出する。特に、増幅は反転分布によって生じる場合と、反転分布がなくても複数のサイドバンドの建設的干渉によって生じる場合があることが示された。さらに、量子ビットの位相緩和(dephasing)が増幅プロセスにどのように影響するかについても検討する。

This study conducts a theoretical investigation into the signal amplification arising from multiple Rabi sideband coherence within a one-dimensional waveguide quantum electrodynamics system. We utilize a semi-infinite waveguide to drive an anharmonic multi-level transmon with a strong coherent microwave field, examining the scattering behavior by introducing a probe signal. Our findings reveal signal amplification under specific resonant conditions, presenting spectra that reveal finer details than previously documented in the literature. To elucidate the mechanisms behind this amplification, we develop a model that explicitly accounts for multiple dressed sidebands in the presence of a strong driving field. From this model, we derive the reflection amplitude of the probe signal. Notably, our results indicate that amplification can occur due to either population inversion or, in some instances, through the constructive interference of multiple sidebands even in the absence of population inversion. Additionally, we explore how qubit dephasing impacts the amplification process.

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18

# 局所精製密度演算子と局所測定を用いたスケーラブル量子状態トモグラフィ

Scalable Quantum State Tomography with Locally Purified Density Operators and Local Measurements ( http://arxiv.org/abs/2307.16381v2 )

ライセンス: Link先を確認

Yuchen Guo, Shuo Yang,

(参考訳) 量子システムを理解することは、量子ハードウェアとソフトウェアの性能の評価、および量子制御と量子センシングの探索において重要である。量子状態の効率的な表現により、最小限の測定で量子状態トモグラフィを実現することができる。本研究では局所的に精製された密度演算子を通して混合状態のテンソルネットワーク表現を用い,局所的な測定のみを必要とする古典的データ後処理アルゴリズムを用いる状態トモグラフィーの新しいアプローチを提案する。 1次元純状態と2次元純状態の数値シミュレーションにより,提案手法の効率,精度,ロバスト性を実証した。 IBM と Quafu Quantum プラットフォームでの実験はこれらの数値シミュレーションを補完する。本研究では,テンソルネットワーク形式を用いた2次元システムのための量子状態トモグラフィの新たな道を開く。

Understanding quantum systems is of significant importance for assessing the performance of quantum hardware and software, as well as exploring quantum control and quantum sensing. An efficient representation of quantum states enables realizing quantum state tomography with minimal measurements. In this study, we propose a new approach to state tomography that uses tensor network representations of mixed states through locally purified density operators and employs a classical data postprocessing algorithm requiring only local measurements. Through numerical simulations of one-dimensional pure and mixed states and two-dimensional pure states up to size $8\times 8$, we demonstrate the efficiency, accuracy, and robustness of our proposed methods. Experiments on the IBM and Quafu Quantum platforms complement these numerical simulations. Our study opens new avenues in quantum state tomography for two-dimensional systems using tensor network formalism.

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18

# 高頻度半導体量子ドットの断熱的量子アドミタンス:リフレクションメトリーをポラロンダイナミクスとして再考

Beyond-adiabatic Quantum Admittance of a Semiconductor Quantum Dot at High Frequencies: Rethinking Reflectometry as Polaron Dynamics ( http://arxiv.org/abs/2307.16725v5 )

ライセンス: Link先を確認

L. Peri, G. A. Oakes, L. Cochrane, C. J. B. Ford, M. F. Gonzalez-Zalba,

(参考訳) 量子ドットは動的に動作し、量子センサやコンピュータなどの多くの量子技術の基礎となっている。したがって、マイクロ波周波数での電気特性のモデル化は、より大きな電子回路での性能をシミュレートするために不可欠である。そこで我々は,コヒーレント光子浴の効果により,電荷貯留層に結合した量子ドットトンネルの存在感を得るために,自己整合型量子マスター方程式の定式化を開発する。本研究では, フォトニックドライブの共振器と共振器との結合が増大し, 寿命の推移とともに, 既知の半古典的(熱的)限界を捉えたアクセタンスに対する一般表現を求める。さらに,Floquet wideeningはQD状態のドレッシングによって決定され,Floquet wideeningはシステム内の光子損失によって決定される。本研究では,QDの高周波挙動を広範囲に再現し,過去の実験を記述し,新しいQD-光子相互作用の探索法を提案する。

Semiconductor quantum dots operated dynamically are the basis of many quantum technologies such as quantum sensors and computers. Hence, modelling their electrical properties at microwave frequencies becomes essential to simulate their performance in larger electronic circuits. Here, we develop a self-consistent quantum master equation formalism to obtain the admittance of a quantum dot tunnel-coupled to a charge reservoir under the effect of a coherent photon bath. We find a general expression for the admittance that captures the well-known semiclassical (thermal) limit, along with the transition to lifetime and power broadening regimes due to the increased coupling to the reservoir and amplitude of the photonic drive, respectively. Furthermore, we describe two new photon-mediated regimes: Floquet broadening, determined by the dressing of the QD states, and broadening determined by photon loss in the system. Our results provide a method to simulate the high-frequency behaviour of QDs in a wide range of limits, describe past experiments, and propose novel explorations of QD-photon interactions.

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18

# 知覚CLIP:コンテキストの推論と条件付けによる視覚的分類

PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts ( http://arxiv.org/abs/2308.01313v3 )

ライセンス: Link先を確認

Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang,

(参考訳) CLIPのような視覚言語モデルは、様々な視覚概念や自然言語の記述を理解する能力のため、ゼロショット画像分類で広く使われている。しかし、より優れたパフォーマンスを達成するために、CLIPの先例のない人間的な理解能力をフル活用する方法は、まだ未解決の課題である。本論文は,物体の視覚的知覚過程からインスピレーションを得たもので,まず,前景の物体を背景から分離し,その情報に基づいて対象物を分類する,文脈的属性(背景,方向など)を推定する。このことから,CLIPを文脈属性で提供することにより,ゼロショット画像の分類が向上し,スプリアス機能への依存が軽減されることがわかった。また、CLIP自体が画像から属性を合理的に推測できることも観察します。そこで本研究では,トレーニング不要で2段階のゼロショット分類手法PerceptionCLIPを提案する。画像が与えられたら、まずコンテキスト属性(例えば、背景)を推論し、それに基づいてオブジェクト分類条件を実行する。実験の結果,PerceptionCLIPはより優れた一般化,グループロバスト性,相互運用性を実現することがわかった。私たちのコードはhttps://github.com/umd-huang-lab/perceptionCLIPで利用可能です。

Vision-language models like CLIP are widely used in zero-shot image classification due to their ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better performance is still an open question. This paper draws inspiration from the human visual perception process: when classifying an object, humans first infer contextual attributes (e.g., background and orientation) which help separate the foreground object from the background, and then classify the object based on this information. Inspired by it, we observe that providing CLIP with contextual attributes improves zero-shot image classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and interoperability. Our code is available at https://github.com/umd-huang-lab/perceptionCLIP

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18
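
The two-step procedure described in this abstract (infer a contextual attribute, then classify conditioned on it) can be sketched as follows. This is a simplified illustration under assumptions: `similarity` is a hypothetical caller-supplied image-text scorer standing in for CLIP, the prompt templates are invented here, and the hard argmax over contexts is a simplification chosen for brevity.

```python
from typing import Callable, Sequence


def two_step_zero_shot(
    image: object,
    classes: Sequence[str],
    contexts: Sequence[str],
    similarity: Callable[[object, str], float],
) -> tuple[str, str]:
    """Infer a context attribute first, then classify conditioned on it."""
    # Step 1: pick the context description that best matches the image.
    context = max(
        contexts,
        key=lambda z: similarity(image, f"a photo of an object, {z}"),
    )
    # Step 2: score every class with the inferred context in the prompt.
    label = max(
        classes,
        key=lambda y: similarity(image, f"a photo of a {y}, {context}"),
    )
    return label, context


if __name__ == "__main__":
    # Toy stand-in: an "image" is a set of words; similarity counts shared words.
    def toy_sim(image, text):
        return len(image & set(text.replace(",", " ").split()))

    print(
        two_step_zero_shot(
            {"waterbird", "water"},
            classes=["landbird", "waterbird"],
            contexts=["on land", "on water"],
            similarity=toy_sim,
        )
    )
```

Putting the inferred context into the class prompt lets the background be explained by the context phrase, so the class term is less likely to be chosen only because it correlates with the background.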

# DIG In:地理多様性指標を用いた画像生成の差異評価

DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity ( http://arxiv.org/abs/2308.06198v3 )

ライセンス: Link先を確認

Melissa Hall, Candace Ross, Adina Williams, Nicolas Carion, Michal Drozdzal, Adriana Romero Soriano,

(参考訳) 最近のテキスト・ツー・イメージ生成システムによる前例のないフォトリアリスティックな結果と、プラグ・アンド・プレイのコンテンツ生成ソリューションとしての利用の増加により、それらの潜在的なバイアスを理解することが不可欠である。本研究では,世界からオブジェクトを生成するように促されたテキスト・ツー・イメージ生成システムの現実性,多様性,迅速な生成一貫性を評価するための3つの指標を提案する。視覚コンテンツ作成システムの構築に向けた重要なステップとして,地理的格差の自動的かつ効率的なベンチマークを可能にすることで,このようなシステムの広範な影響の質的分析を補完する。提案した指標を用いて,現在最先端のビジュアルコンテンツ生成システムにおける潜在的な地理的バイアスを分析し,(1) モデルがアフリカや西アジアに向けて欧州よりも現実性や世代多様性が低いこと,(2) 地理的情報によって生成した画像の一貫性と多様性の促進にコストがかかること,(3) モデルが他のオブジェクトよりも領域レベルの格差が大きいこと,などを見出した。おそらく最も興味深いのは、画像生成品質の進歩は、現実世界の地理的表現のコストがかかることを示唆している。包括的評価は、視覚コンテンツ制作のポジティブな体験を確保するための重要なステップである。

The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play content creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone.

翻訳日:2024-03-21 01:00:25 公開日:2024-03-18

# 解毒剤の強化: 毒殺攻撃に対するポイントワイズ認証の改善

Enhancing the Antidote: Improved Pointwise Certifications against Poisoning Attacks ( http://arxiv.org/abs/2308.07553v2 )

ライセンス: Link先を確認

Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I. P. Rubinstein,

(参考訳) 毒殺攻撃は、トレーニングコーパスに小さな変更を加えることで、モデル行動に不当に影響を及ぼす可能性がある。特定の毒殺攻撃に対する防御は存在するが、一般的には保証はない。対照的に、最悪の場合の振る舞いを調べることで、認証された防衛は、ポイントワイド認証として知られる限られた数のトレーニングサンプルを変更する敵攻撃に対して、サンプルの堅牢性を保証することができる。これを実現するために、差分プライバシーとサンプリングガウス機構の両方を利用して、有限個の有毒例に対して各テストインスタンスの予測のばらつきを確実にする。そうすることで、我々のモデルは、以前の認証の2倍以上の大きさの敵の堅牢性を保証する。

Poisoning attacks can disproportionately influence model behaviour by making small changes to the training corpus. While defences against specific poisoning attacks do exist, they in general do not provide any guarantees, leaving them potentially countered by novel attacks. In contrast, by examining worst-case behaviours Certified Defences make it possible to provide guarantees of the robustness of a sample against adversarial attacks modifying a finite number of training samples, known as pointwise certification. We achieve this by exploiting both Differential Privacy and the Sampled Gaussian Mechanism to ensure the invariance of prediction for each testing instance against finite numbers of poisoned examples. In doing so, our model provides guarantees of adversarial robustness that are more than twice as large as those provided by prior certifications.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# ユークリッド関数の最適化に関するワープ幾何情報

Warped geometric information on the optimisation of Euclidean functions ( http://arxiv.org/abs/2308.08305v2 )

ライセンス: Link先を確認

Marcelo Hartmann, Bernardo Williams, Hanlin Yu, Mark Girolami, Alessandro Barp, Arto Klami,

(参考訳) 多くの機械学習タスクにおける損失関数や統計的推論における確率分布の対数といった、潜在的に高次元ユークリッド空間で定義される実数値関数を最適化する基本的なタスクを考える。我々はリーマン幾何学の概念を用いてユークリッド空間上の函数の最適化問題を、歪んだ計量を持つリーマン多様体に再定義し、その多様体に沿った函数の最適性を求める。探索領域に選択された歪んだ計量は、多様体上の測地線曲線に付随する最適な探索方向を計算しやすくする計算フレンドリーな計量テンソルを誘導する。測地線に沿った最適化の実行は一般に不可能であることが知られているが、この特定の多様体ではテイラー近似を3階まで解析的に導出できることが示される。一般に、これらの測地線曲線への近似は多様体上には属さないが、多様体にそれらを引き戻すのに適した退化写像を構築する。したがって、近似測地線曲線に沿って効率的に最適化できる。関連する理論を網羅し、実用的な最適化アルゴリズムを記述し、挑戦的な最適化ベンチマークのコレクション上でそれを実証的に評価する。提案アルゴリズムは測地学の3次近似を用いており、収束するまでの反復数で標準ユークリッド勾配法よりも優れている傾向にある。

We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use notions from Riemannian geometry to recast the optimisation of a function on Euclidean space as optimisation on a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimisation along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third order. In general these approximations to the geodesic curve will not lie on the manifold; however, we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimise along the approximate geodesic curves. We cover the related theory, describe a practical optimisation algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using a third-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in terms of the number of iterations until convergence.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# ユーザ反応予測のための時間的関心ネットワーク

Temporal Interest Network for User Response Prediction ( http://arxiv.org/abs/2308.08487v2 )

ライセンス: Link先を確認

Haolin Zhou, Junwei Pan, Xinyi Zhou, Xihua Chen, Jie Jiang, Xiaofeng Gao, Guihai Chen,

(参考訳) オンラインディスプレイ広告のような産業レコメンデーションシステムでは,ユーザ反応の予測が不可欠である。レコメンデーションモデルのすべての機能の中で、ユーザの振る舞いが最も重要になります。多くの研究で、ユーザの行動は、行動と候補者の間の意味的あるいは時間的相関から、候補項目に対するユーザの関心を反映していることが明らかになっている。論文はそれぞれの相関関係を個別に検討しているが、研究者はまだそれらを意味的・時間的相関関係(意味的・時間的相関関係)と組み合わせて分析していない。我々はこの相関を経験的に測定し、直感的で頑健なパターンを観察する。そして、いくつかの人気ユーザー関心モデルを調べ、驚くべきことに、誰もそのような相関関係をうまく学ばないということに気付きました。このギャップを埋めるために,行動と対象間の意味的時間的相関を同時に捉えるための時間的関心ネットワーク(TIN)を提案する。これを実現するために,意味的エンコーディングに加えて,対象を意識したテンポラルエンコーディングを組み込んで行動や対象を表現する。さらに,ターゲット認識とターゲット認識表現を配置して,意味的・時間的相関を捉えることで,明示的な4方向インタラクションを行う。我々は2つの人気のある公開データセットに対して総合的な評価を行い、提案したTINはGAUCにおいてそれぞれ0.43%、0.29%で最高のパフォーマンスのベースラインを上回ります。 Tencentの広告プラットフォームにおけるオンラインA/Bテストでは、TINは1.65%のコストリフトと1.93%のGMVリフトを達成した。 2023年10月から運用に成功し、WeChat Momentsのトラフィックを処理した。コードをhttps://github.com/zhouxy1003/TINでリリースしました。

User response prediction is essential in industrial recommendation systems, such as online display advertising. Among all the features in recommendation models, user behaviors are among the most critical. Many works have revealed that a user's behavior reflects her interest in the candidate item, owing to the semantic or temporal correlation between behaviors and the candidate. While the literature has individually examined each of these correlations, researchers have yet to analyze them in combination, that is, the semantic-temporal correlation. We empirically measure this correlation and observe intuitive yet robust patterns. We then examine several popular user interest models and find that, surprisingly, none of them learn such correlation well. To fill this gap, we propose a Temporal Interest Network (TIN) to capture the semantic-temporal correlation simultaneously between behaviors and the target. We achieve this by incorporating target-aware temporal encoding, in addition to semantic encoding, to represent behaviors and the target. Furthermore, we conduct explicit 4-way interaction by deploying target-aware attention and target-aware representation to capture both semantic and temporal correlation. We conduct comprehensive evaluations on two popular public datasets, and our proposed TIN outperforms the best-performing baselines by 0.43% and 0.29% on GAUC, respectively. During online A/B testing in Tencent's advertising platform, TIN achieves 1.65% cost lift and 1.93% GMV lift over the base model. It has been successfully deployed in production since October 2023, serving the WeChat Moments traffic. We have released our code at https://github.com/zhouxy1003/TIN.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# サウジアラビアにおけるGoogleアカウント保有者のプライバシー認識と行動

Privacy Perceptions and Behaviors of Google Personal Account Holders in Saudi Arabia ( http://arxiv.org/abs/2308.10148v3 )

ライセンス: Link先を確認

Eman Alashwali, Lorrie Faith Cranor,

(参考訳) 西洋社会ではプライバシーの認識や行動が研究されているが、非西洋社会ではこれらの問題についてはほとんど分かっていない。このギャップを埋めるために、私たちはサウジアラビアのGoogleアカウント保有者30人に、Googleが保存した活動データに関するプライバシーの認識と行動についてインタビューした。我々の研究は、ユーザーがWeb & App Activity、Location History、YouTube Historyを保存するかどうか、またどのように保存するかを制御できるGoogleのActivity Controlsに焦点を当てている。我々の結果によると、ほとんどの参加者はGoogleのデータプラクティスやアクティビティコントロールについてある程度の意識を持っているが、多くは曖昧な認識しか持っておらず、大多数は利用可能なコントロールを使用していない。参加者が保存された活動データを見たとき、多くの人が保存されていた内容に驚いた。多くの参加者は、Googleが提供するサービスを改善するためにデータを使用することを容認しているが、大多数は広告目的でデータを使用することを容認できないと考えている。サウジアラビアの参加者は、プライバシー意識、態度、好み、関心、行動において、米国の研究で見られたものと類似した傾向とパターンを示している。我々の結果は以下の必要性を強調している。 1) アカウント登録時にプライバシー設定をユーザに通知し、設定をユーザに再確認させ、プライバシー設定に対する意識を高める技術の改善。 2) 多くのユーザーが設定を変更するのを妨げているコストを削減するための、プライバシー設定インタフェースの改善。 3) 非西洋文化におけるプライバシーに関するさらなる研究。

While privacy perceptions and behaviors have been investigated in Western societies, little is known about these issues in non-Western societies. To bridge this gap, we interviewed 30 Google personal account holders in Saudi Arabia about their privacy perceptions and behaviors regarding the activity data that Google saves about them. Our study focuses on Google's Activity Controls, which enable users to control whether, and how, Google saves their Web & App Activity, Location History, and YouTube History. Our results show that although most participants have some level of awareness about Google's data practices and the Activity Controls, many have only vague awareness, and the majority have not used the available controls. When participants viewed their saved activity data, many were surprised by what had been saved. While many participants find Google's use of their data to improve the services provided to them acceptable, the majority find the use of their data for ad purposes unacceptable. We observe that our Saudi participants exhibit similar trends and patterns in privacy awareness, attitudes, preferences, concerns, and behaviors to what has been found in studies in the US. Our results emphasize the need for: 1) improved techniques to inform users about privacy settings during account sign-up, to remind users about their settings, and to raise awareness about privacy settings; 2) improved privacy setting interfaces to reduce the costs that deter many users from changing the settings; and 3) further research to explore privacy concerns in non-Western cultures.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# 物体検出における不確かさの校正評価のための理論的・実践的枠組み

A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection ( http://arxiv.org/abs/2309.00464v2 )

ライセンス: Link先を確認

Pedro Conde, Rui L. Lopes, Cristiano Premebida,

(参考訳) ディープニューラルネットワークの普及により、機械学習システムは様々な現実世界のアプリケーションにますます存在感を増している。その結果,多くの領域において信頼性の高いモデルに対する需要が高まっており,深層学習の将来を考える上で,不確実性校正の問題が重要である。これは、自律運転、ロボット工学、医療診断などの安全上重要な応用に一般的に存在する物体検出システムを考えると特に当てはまる。そこで本研究では,不確実性校正の文脈において,物体検出システムを評価するための理論的,実践的な枠組みを提案する。これは、異なる形式的定義を通じてこの概念の新しい包括的定式化と、そのような理論の基礎から派生した3つの新しい評価指標を含む。提案した不確実性校正指標のロバスト性は, 一連の代表的な実験を通して示される。

The proliferation of Deep Neural Networks has resulted in machine learning systems becoming increasingly present in various real-world applications. Consequently, there is a growing demand for highly reliable models in many domains, making the problem of uncertainty calibration pivotal when considering the future of deep learning. This is especially true when considering object detection systems, which are commonly present in safety-critical applications such as autonomous driving, robotics and medical diagnosis. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. This encompasses a new comprehensive formulation of this concept through distinct formal definitions, and also three novel evaluation metrics derived from this theoretical foundation. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments.
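
For orientation, the sketch below computes the standard binned expected calibration error (ECE) for a set of scored predictions. It is not one of the three metrics proposed in the paper, whose definitions the abstract does not give. The function name, the bin count, and the idea of treating a detection as "correct" when it matches a ground-truth box are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean gap between confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes to the last bin
        bins[index].append((conf, 1.0 if ok else 0.0))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece
```

A detector that reports 0.9 confidence and is right nine times out of ten scores close to zero here, while one that reports 0.9 and is always wrong scores close to 0.9.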

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# LeBenchmark 2.0: フランス語の自己教師型表現のための標準化され、再現可能で拡張されたフレームワーク

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech ( http://arxiv.org/abs/2309.05472v2 )

ライセンス: Link先を確認

Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier,

(参考訳) 自己教師付き学習(SSL)は、コンピュータビジョンや自然言語処理など、多くの異なる領域において前例のない改善がなされている。現在のドメイン関連のタスクのほとんどは、事前トレーニングされたモデルでアプローチされているため、音声処理はSSLから大幅に恩恵を受けています。この研究は、SSL対応のフランス語音声技術の評価と構築のためのオープンソースのフレームワークであるLeBenchmark 2.0を紹介している。これには、最大14,000時間のヘテロジニアスなスピーチを含む文書化された大規模で異質なコーパス、コミュニティと共有される2600万から10億の学習可能なパラメータを含む10のトレーニング済みSSL wav2vec 2.0モデル、既存のベンチマークを補完する6つの下流タスクからなる評価プロトコルが含まれる。 LeBenchmark 2.0はまた、凍結した下流モデルと微調整された下流モデル、タスクに依存しないモデルとタスク固有の事前訓練モデル、および大規模モデルトレーニングの炭素フットプリントに関する議論を含む、スピーチのための事前訓練されたSSLモデルに関するユニークな視点を提示する。全体として、フランス語の14,000時間でトレーニングされた新しいモデルは、マルチリンガルと以前のLeBenchmark SSLモデルよりも優れていたが、事前トレーニングには最大4倍のエネルギーが必要だった。

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models as well as a discussion on the carbon footprint of large-scale model training. Overall, the newly introduced models trained on 14,000 hours of French speech outperform multilingual and previous LeBenchmark SSL models across the benchmark but also require up to four times more energy for pre-training.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# (ほぼ)量子ベルの不等式とデバイス非依存の応用

(Almost-)Quantum Bell Inequalities and Device-Independent Applications ( http://arxiv.org/abs/2309.06304v3 )

ライセンス: Link先を確認

Yuan Liu, Ho Yiu Chung, Ravishankar Ramanathan,

(参考訳) 近年、量子ベルの不等式の導出による量子相関の境界に関する調査が注目されているが、これはツィレルソンの問題と関連しており、DI情報処理に重要な応用がある。しかし、量子ベルの不等式を決定することは、非常に難しい課題であり、孤立した例のみが知られている。本稿では、(ほぼ)量子ベルの不等式(英語版)のファミリーを提示し、3つの基礎的およびDI的応用に焦点を当てる。第一に、符号なし境界上の量子相関は弱い源からのDIランダム性抽出において重要である。 2つのkアウトカム測定を持つ2人のプレイヤーの現実的なベルのシナリオでは、量子境界が次元$\leq 4k-4$の非局所なポリトープの面から分離されていることを示す量子ベルの不等式が導かれる。直近の副産物として、量子系に対するオーマンの合意定理とほぼ量子相関の一般的な証明を与える。これは、オーマンの合意定理が、一般的な非符号理論から量子理論とほぼ量子相関の両方を選ぶための、疫学の文脈における合理的な物理原理であることを意味する。第二に、m二乗測定シナリオを持つ2人のプレイヤーに量子ベルの不等式(英語版)の族を提示し、2量子ビットのシングルレットと2mの測定を自己検証する。興味深いことに、この主張はTsirelson-Landau-Masanesによって発見された m=2 の結果を一般化し、最先端の DIRA よりも改善されたことを示す。最後に、量子ベルの不等式を用いて、量子相関集合を特徴づける情報理論の原理である非局所計算における優位性の原理の一般形を導出する。これにより、これまでに知られている量子境界の最も正確な特徴を与える。

Investigations of the boundary of the quantum correlation set through the derivation of quantum Bell inequalities have gained increased attention in recent years; they are related to Tsirelson's problem and have significant applications in device-independent (DI) information processing. However, determining quantum Bell inequalities is a notoriously difficult task and only isolated examples are known. In this paper, we present families of (almost-)quantum Bell inequalities and highlight three foundational and DI applications. Firstly, quantum correlations on the non-signaling boundary are crucial in DI randomness extraction from weak sources. In the practical Bell scenario of two players with two k-outcome measurements, we derive quantum Bell inequalities that show a separation of the quantum boundary from nonlocal faces of the non-signaling polytope of dimension $\leq 4k-4$, extending previous results. As an immediate by-product of this, we give a general proof of Aumann's agreement theorem for quantum systems and the almost-quantum correlations, which implies that Aumann's agreement theorem is a reasonable physical principle in the context of epistemics to pick out both quantum theory and almost-quantum correlations from general no-signaling theories. Secondly, we present a family of quantum Bell inequalities in the scenario of two players with m binary measurements, which serve to self-test the two-qubit singlet and 2m measurements. Interestingly, this claim generalizes the result for m=2 discovered by Tsirelson-Landau-Masanes and shows an improvement over the state-of-the-art DIRA. Lastly, we use our quantum Bell inequalities to derive the general form of the principle of no advantage in nonlocal computation, which is an information-theoretic principle that serves to characterize the quantum correlation set. With this, we provide the most precise characterization of the quantum boundary known so far.

翻訳日:2024-03-21 00:50:27 公開日:2024-03-18

# 可変量子力学のためのオーバーヘッド拘束回路編み

Overhead-constrained circuit knitting for variational quantum dynamics ( http://arxiv.org/abs/2309.07857v2 )

ライセンス: Link先を確認

Gian Gentinetta, Friederike Metz, Giuseppe Carleo,

(参考訳) 巨大量子系の力学をシミュレーションすることは、量子力学現象のより深い理解を得るための決定的かつ重要な追求である。量子コンピュータはそのようなシミュレーションを高速化する大きな可能性を秘めているが、その実用化は依然として限られたスケールと広範に広まる騒音によって妨げられている。そこで本研究では,大規模な量子系を個別のデバイスでシミュレート可能な小さなサブシステムに分割する回路編み機を用いて,これらの課題に対処する手法を提案する。システムの進化は、予測された変分量子力学(PVQD)アルゴリズムによって制御され、変分量子回路のパラメータの制約が補われ、回路編み方式によって課されるサンプリングオーバーヘッドが制御可能であることを保証する。我々は,複数の弱い絡み合ったブロックを持つ量子スピン系上で,強く相関したスピンからなる量子スピン系上で実験を行い,サンプリングのオーバーヘッドを管理しつつ,ダイナミックスを正確にシミュレートできることを示した。さらに,長径ゲートを切断することで回路深度を低減できることを示す。

Simulating the dynamics of large quantum systems is a formidable yet vital pursuit for obtaining a deeper understanding of quantum mechanical phenomena. While quantum computers hold great promise for speeding up such simulations, their practical application remains hindered by limited scale and pervasive noise. In this work, we propose an approach that addresses these challenges by employing circuit knitting to partition a large quantum system into smaller subsystems that can each be simulated on a separate device. The evolution of the system is governed by the projected variational quantum dynamics (PVQD) algorithm, supplemented with constraints on the parameters of the variational quantum circuit, ensuring that the sampling overhead imposed by the circuit knitting scheme remains controllable. We test our method on quantum spin systems with multiple weakly entangled blocks each consisting of strongly correlated spins, where we are able to accurately simulate the dynamics while keeping the sampling overhead manageable. Further, we show that the same method can be used to reduce the circuit depth by cutting long-ranged gates.

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# Beta Divergencesを用いたDeep Non negative Matrix Factorization

Deep Nonnegative Matrix Factorization with Beta Divergences ( http://arxiv.org/abs/2309.08249v3 )

ライセンス: Link先を確認

Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta, Nicolas Gillis,

(参考訳) ディープ非負行列因子化(Deep Non negative Matrix Factorization, ディープNMF)は、最近、異なるスケールで複数の特徴層を抽出する貴重な手法として登場した。しかし、既存のディープNMFモデルとアルゴリズムは、主に最小二乗誤差に基づく評価を中心にしており、多様なデータセットの近似の質を評価するのに最適な指標ではないかもしれない。例えば、オーディオ信号やドキュメントなどのデータタイプを扱う場合、$\beta$-divergencesの方がより適切な代替手段を提供すると広く認識されている。本稿では,Kullback-Leiblerの発散に着目し,$\beta$-divergencesを用いて深部NMFの新しいモデルとアルゴリズムを開発する。その後,これらの手法を,顔の特徴抽出,文書コレクション内の話題の同定,ハイパースペクトル画像内の資料の同定に応用した。

Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.
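
As a point of reference, the sketch below shows the classical single-layer building block: multiplicative updates for NMF under the Kullback-Leibler divergence (the $\beta = 1$ case). It is a minimal illustration under assumptions, not the paper's deep model or algorithm. The helper names (`matmul`, `kl_nmf_step`, `random_matrix`) and the small constant `EPS` are hypothetical choices made here.

```python
import math
import random

EPS = 1e-12  # guards divisions and logarithms against exact zeros


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH), the beta = 1 member of the family."""
    total = 0.0
    for v_row, a_row in zip(V, WH):
        for v, a in zip(v_row, a_row):
            if v > 0:
                total += v * math.log(v / (a + EPS))
            total += a - v
    return total


def kl_nmf_step(V, W, H):
    """One pass of multiplicative updates: first H, then W (with the new H)."""
    m, n, r = len(V), len(V[0]), len(H)

    # H <- H * (W^T (V / WH)) / (W^T 1)
    WH = matmul(W, H)
    ratio = [[V[i][j] / (WH[i][j] + EPS) for j in range(n)] for i in range(m)]
    col_sums_W = [sum(W[i][k] for i in range(m)) + EPS for k in range(r)]
    H = [[H[k][j] * sum(W[i][k] * ratio[i][j] for i in range(m)) / col_sums_W[k]
          for j in range(n)] for k in range(r)]

    # W <- W * ((V / WH) H^T) / (1 H^T)
    WH = matmul(W, H)
    ratio = [[V[i][j] / (WH[i][j] + EPS) for j in range(n)] for i in range(m)]
    row_sums_H = [sum(H[k]) + EPS for k in range(r)]
    W = [[W[i][k] * sum(ratio[i][j] * H[k][j] for j in range(n)) / row_sums_H[k]
          for k in range(r)] for i in range(m)]
    return W, H


def random_matrix(rows, cols, rng):
    """Strictly positive random matrix, used only to initialise the factors."""
    return [[rng.uniform(0.1, 1.0) for _ in range(cols)] for _ in range(rows)]
```

Because the updates only multiply by nonnegative ratios, factors that start nonnegative stay nonnegative, and each pass does not increase the KL divergence. A deep variant would stack several such factorizations, which this sketch does not attempt.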

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# 言語モデリングは圧縮である

Language Modeling Is Compression ( http://arxiv.org/abs/2309.10668v2 )

ライセンス: Link先を確認

Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness,

(参考訳) 予測モデルが損失のない圧縮機に変換できることは、長い間確立されてきた。ちなみに、近年、機械学習コミュニティは、ますます大きくて強力な自己監督型(言語)モデルのトレーニングに重点を置いている。これらの大きな言語モデルは印象的な予測能力を示すため、強い圧縮機として十分に配置されている。本研究では,大規模な(基礎)モデルの圧縮能力を評価するとともに,圧縮レンズを通して予測問題を観測することを提唱する。大規模言語モデルは強力な汎用予測器であり、圧縮視点は法則、トークン化、文脈内学習のスケーリングに関する新しい洞察を提供することを示す。例えば、Chinchilla 70Bは、主にテキストで訓練されているが、ImageNetのパッチを43.4%、LibriSpeechのサンプルを16.4%に圧縮し、それぞれPNG(58.5%)やFLAC(30.3%)といったドメイン固有の圧縮機を圧倒している。最後に、予測圧縮等価性により、任意の圧縮器(gzipなど)を用いて条件付き生成モデルを構築することができることを示す。

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
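
The two directions of the equivalence can be sketched with the standard library alone. In the first direction, a predictive model that assigns probability $p$ to the next symbol lets an ideal arithmetic coder spend about $-\log_2 p$ bits on it. In the other, a compressor's output lengths can be turned into a next-symbol distribution. Everything here is an illustrative assumption rather than the paper's setup: the adaptive byte-count model stands in for a language model, `zlib` stands in for gzip, and the function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Bits an ideal arithmetic coder needs under an adaptive order-0 byte model.

    Each byte is predicted from add-one-smoothed counts of the bytes seen so far,
    and costs -log2 of its predicted probability.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, byte in enumerate(data):
        prob = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(prob)
        counts[byte] += 1
    return total_bits


def compression_rate(data: bytes) -> float:
    """Compressed size divided by raw size (lower is better)."""
    return ideal_code_length_bits(data) / (8 * len(data))


def compressor_next_byte_probs(context: bytes, alphabet: bytes) -> dict:
    """Turn zlib output lengths into a distribution over the next byte.

    A candidate whose continuation compresses to n bits gets weight 2 ** -n,
    normalised over the candidate alphabet.
    """
    lengths = {b: 8 * len(zlib.compress(context + bytes([b]), 9)) for b in alphabet}
    shortest = min(lengths.values())
    weights = {b: 2.0 ** -(n - shortest) for b, n in lengths.items()}
    norm = sum(weights.values())
    return {b: w / norm for b, w in weights.items()}
```

A better predictor lowers `ideal_code_length_bits` directly, which is why a strong language model is also a strong compressor. The compressor-derived distribution is coarse because `zlib` lengths move in whole bytes.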

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# 核融合スピン鎖上の量子セルオートマトン

An index for quantum cellular automata on fusion spin chains ( http://arxiv.org/abs/2309.10961v2 )

ライセンス: Link先を確認

Corey Jones, Junhwi Lim,

(参考訳) 1次元量子セルオートマトン(QCA)のGNVW指数を部分因子のジョーンズ指数で解釈すると、より一般的な抽象スピン鎖上のQCAに定義された指数の一般化につながる。融合スピン鎖は、大域(カテゴリー/MPO)対称性の下で局所作用素として不変であり、2Dトポロジカル符号の境界作用素として生じる。融合圏 $\mathbf{Fib}$ から構築された融合スピン鎖に対して、指数はQCA変調有限深さ回路群に対する完全不変量であることを示す。

Interpreting the GNVW index for 1D quantum cellular automata (QCA) in terms of the Jones index for subfactors leads to a generalization of the index defined for QCA on more general abstract spin chains. These include fusion spin chains, which arise as the local operators invariant under a global (categorical/MPO) symmetry, and as the boundary operators of 2D topological codes. We show that for the fusion spin chains built from the fusion category $\mathbf{Fib}$, the index is a complete invariant for the group of QCA modulo finite depth circuits.

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# 1次元上の自由フェルミオンに対する測定誘起相転移

Measurement-induced phase transition for free fermions above one dimension ( http://arxiv.org/abs/2309.12405v3 )

ライセンス: Link先を確認

Igor Poboiko, Igor V. Gornyi, Alexander D. Mirlin,

(参考訳) 自由フェルミオンモデルに対する$d>1$次元における測定誘起エンタングルメント相転移の理論を開発した。臨界点がギャップレス位相を$\ell^{d-1} \ln \ell$スケーリングと$\ell^{d-1}スケールで分離し、$\ell$はサブシステムのサイズである。この問題は、$R\to 1$を持つ$d+1$次元のSU($R$)レプリカ非線型シグマモデルにマッピングされる。正規化群解析を用いて、1ループ近似における臨界指標を$d = 1+ \epsilon$と$\epsilon \ll 1$で計算する。さらに、平方格子上の$d=2$モデルの遷移の数値的研究を行い、臨界点を数値的に決定し、相関長の臨界指標である$\nu \approx 1.4$を推定する。

A theory of the measurement-induced entanglement phase transition for free-fermion models in $d>1$ dimensions is developed. The critical point separates a gapless phase with $\ell^{d-1} \ln \ell$ scaling of the second cumulant of the particle number and of the entanglement entropy and an area-law phase with $\ell^{d-1}$ scaling, where $\ell$ is the size of the subsystem. The problem is mapped onto an SU($R$) replica non-linear sigma model in $d+1$ dimensions, with $R\to 1$. Using renormalization-group analysis, we calculate critical indices in the one-loop approximation justified for $d = 1+ \epsilon$ with $\epsilon \ll 1$. Further, we carry out a numerical study of the transition for a $d=2$ model on a square lattice, determine numerically the critical point, and estimate the critical index of the correlation length, $\nu \approx 1.4$.

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# どこまで行くのか?人間とAIのコラボレーションから見たデータストーリーテリングツールを理解する

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration ( http://arxiv.org/abs/2309.15723v2 )

ライセンス: Link先を確認

Haotian Li, Yun Wang, Huamin Qu,

(参考訳) データストーリーテリングは、データの洞察を伝えるのに強力ですが、多様なスキルと人間の創造者によるかなりの努力が必要です。近年の研究では、人工知能(AI)がデータストーリーテリングにおいて人間を支援し、強化する可能性について広く研究されている。しかし、人間とAIのコラボレーションの観点からデータストーリーテリングツールを理解するための体系的なレビューがないため、研究者は人間の利点とAIの利点を促進し、その欠点を緩和する既存のコラボレーションツール設計を反映することを妨げている。本稿では, ストーリーテリング・ワークフローの段階, 分析, 計画, 実装, コミュニケーション, クリエータ, アシスタント, オプティマイザ, レビュアーなど, それぞれの段階における人間とAIの役割について検討した。分析を通じて,既存のツールの共通的なコラボレーションパターンを認識し,これらのパターンから学んだ教訓を要約し,データストーリーテリングにおける人間とAIのコラボレーション研究の機会について説明する。

Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there is no systematic review of data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigates existing tools with a framework from two perspectives: the stages in the storytelling workflow that a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# 騒音の多い農業環境における3次元再構築 : ビュープランニングのためのベイズ最適化の視点

3D Reconstruction in Noisy Agricultural Environments: A Bayesian Optimization Perspective for View Planning ( http://arxiv.org/abs/2310.00145v2 )

ライセンス: Link先を確認

Athanasios Bacharis, Konstantinos D. Polyzos, Henry J. Nelson, Georgios B. Giannakis, Nikolaos Papanikolopoulos,

(参考訳) 3D再構築は、農業、水中、都市環境など、さまざまな実践的な環境において大きな影響を与えているため、ロボット工学の基本的な課題である。このタスクはビュープランニング(VP)を通じて行うことができ、これは視覚情報を最大化する位置に一定の数のカメラを最適に配置し、その結果の3D再構成を改善することを目的としている。しかし,実世界のほとんどの環境では,既存の環境騒音が3次元再構成の性能に大きな影響を及ぼす可能性がある。そこで本研究では, 閉形式表現を必要とせず, 既存の環境騒音を考慮に入れたVPの幾何的再構成品質関数を提案する。目的関数を解析的に表現することができないため,ノイズの存在下での高精度な3次元再構成のための適応ベイズ最適化アルゴリズムが提案される。騒音の多い農業環境における数値実験は, 少数のカメラを用いた3次元再構築手法の利点を実証するものである。

3D reconstruction is a fundamental task in robotics that has gained attention due to its major impact in a wide variety of practical settings, including agriculture, underwater, and urban environments. This task can be carried out via view planning (VP), which aims to optimally place a certain number of cameras in positions that maximize the visual information, improving the resulting 3D reconstruction. Nonetheless, in most real-world settings, existing environmental noise can significantly affect the performance of 3D reconstruction. To that end, this work advocates a novel geometric-based reconstruction quality function for VP, which accounts for the existing noise of the environment without requiring its closed-form expression. With no analytic expression of the objective function, this work puts forth an adaptive Bayesian optimization algorithm for accurate 3D reconstruction in the presence of noise. Numerical tests on noisy agricultural environments showcase the merits of the proposed approach for 3D reconstruction with even a small number of available cameras.

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# Error Norm Truncation:テキスト生成モデルにおけるデータノイズの存在下でのロバストトレーニング

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models ( http://arxiv.org/abs/2310.00840v2 )

ライセンス: Link先を確認

Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray,

(参考訳) テキスト生成モデルは、トレーニングデータのエラーに対して脆弱であることが知られている。大量のWebcrawledデータが広範に利用可能になれば、巨大なノイズの多いWebcrawledテキストでトレーニングされたモデルの堅牢性をどのように向上できるか? 本研究では,ノイズの多いデータをトラストする標準学習目標に対する頑健な強化手法であるError Norm Truncation (ENT)を提案する。データ品質を推定するために負の対数損失のみを用いる手法と比較して、本手法は、過去の研究で見落とされがちな非ターゲットトークンの分布を考慮し、より正確な推定を行う。言語モデリング,機械翻訳,テキスト要約に関する総合的な実験を通じて,テキスト生成モデルにENTを組み込むことで,標準学習や従来のソフト・ハード・トランケーション法よりも生成品質が向上することを示す。さらに,本手法は,機械翻訳において最も有害な2種類のノイズに対するモデルのロバスト性を向上し,最大50%のノイズが加わった場合に,MLEベースライン上で2以上のBLEU点が増加することを示した。

Text generation models are notoriously vulnerable to errors in the training data. With massive amounts of web-crawled data becoming widely available, how can we enhance the robustness of models trained on a massive amount of noisy web-crawled text? In our work, we propose Error Norm Truncation (ENT), a robust enhancement method to the standard training objective that truncates noisy data. Compared to methods that only use the negative log-likelihood loss to estimate data quality, our method provides a more accurate estimation by considering the distribution of non-target tokens, which is often overlooked by previous work. Through comprehensive experiments across language modeling, machine translation, and text summarization, we show that equipping text generation models with ENT improves generation quality over standard training and previous soft and hard truncation methods. Furthermore, we show that our method improves the robustness of models against two of the most detrimental types of noise in machine translation, resulting in an increase of more than 2 BLEU points over the MLE baseline when up to 50% of noise is added to the data.
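
To make the contrast with loss-only filtering concrete, the sketch below scores each token by the L2 distance between the predicted distribution and the one-hot target, and drops tokens whose score exceeds a threshold before averaging the negative log-likelihood. This is a minimal reading of the idea under assumptions: the threshold value, the function names, and the choice to drop rather than down-weight tokens are hypothetical and may differ from the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def error_norm(probs, target):
    """L2 distance between the predicted distribution and the one-hot target."""
    return math.sqrt(sum((p - (1.0 if i == target else 0.0)) ** 2
                         for i, p in enumerate(probs)))


def truncated_nll(batch_logits, targets, threshold):
    """Mean negative log-likelihood over tokens whose error norm is within threshold."""
    kept = []
    for logits, target in zip(batch_logits, targets):
        probs = softmax(logits)
        if error_norm(probs, target) <= threshold:
            kept.append(-math.log(probs[target]))
    return sum(kept) / len(kept) if kept else 0.0
```

Two predictions that give the target the same probability of 0.1 have identical loss, yet `[0.1, 0.9, 0.0, 0.0]` has a larger error norm than `[0.1, 0.3, 0.3, 0.3]`. The first is confidently wrong about a specific alternative, which is the signal a loss-only criterion cannot see.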

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# ImageNet-OOD: 現代のアウト・オブ・ディストリビューション検出アルゴリズムの解読

ImageNet-OOD: Deciphering Modern Out-of-Distribution Detection Algorithms ( http://arxiv.org/abs/2310.01755v2 )

ライセンス: Link先を確認

William Yang, Byron Zhang, Olga Russakovsky,

(参考訳) アウト・オブ・ディストリビューション(OOD)検出のタスクは、未定義で悪名高い。初期の研究は「セマンティックシフト(semantic shift)」とも呼ばれるラベル変更データ分散シフトを識別することを目的とした、新しいクラス検出に焦点を当てていた。しかし、最近の研究は、障害検出に焦点を当て、OOD評価フレームワークを拡張して、ラベル保存データ分散シフト("covariate shift"とも呼ばれる)を考慮に入れている。興味深いことに、この新たな枠組みの下では、これまで最先端と見なされていた複雑なOOD検出器が、単純な最大ソフトマックス確率ベースラインと同じような、あるいはさらに悪い性能を発揮する。最新のOOD検出器は何が実際に検出されているのか? OOD検出アルゴリズムの振る舞いを解読するには、セマンティックシフトと共変量シフトを分離する評価データセットが必要である。本研究では,共変量シフトの干渉を最小限に抑えるクリーンなセマンティックシフトデータセットであるImageNet-OODを提案する。総合的な実験を通して、OOD検出器は意味的シフトよりも共変量シフトに敏感であることが示され、最近のOOD検出アルゴリズムのセマンティックシフト検出に対する利点は最小限である。我々のデータセットと分析は、将来のOOD検出器の設計を導く上で重要な洞察を提供する。

The task of out-of-distribution (OOD) detection is notoriously ill-defined. Earlier works focused on new-class detection, aiming to identify label-altering data distribution shifts, also known as "semantic shift." However, recent works argue for a focus on failure detection, expanding the OOD evaluation framework to account for label-preserving data distribution shifts, also known as "covariate shift." Intriguingly, under this new framework, complex OOD detectors that were previously considered state-of-the-art now perform similarly to, or even worse than, the simple maximum softmax probability baseline. This raises the question: what are the latest OOD detectors actually detecting? Deciphering the behavior of OOD detection algorithms requires evaluation datasets that decouple semantic shift and covariate shift. To aid our investigations, we present ImageNet-OOD, a clean semantic shift dataset that minimizes the interference of covariate shift. Through comprehensive experiments, we show that OOD detectors are more sensitive to covariate shift than to semantic shift, and the benefits of recent OOD detection algorithms on semantic shift detection are minimal. Our dataset and analyses provide important insights for guiding the design of future OOD detectors.
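
The maximum softmax probability baseline mentioned above fits in a few lines: an input is scored by the classifier's largest class probability, and low scores are flagged as out-of-distribution. The sketch below is illustrative only. The function names and the threshold are assumptions; in practice the threshold would be tuned on held-out data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability: higher means 'looks in-distribution'."""
    return max(softmax(logits))


def is_out_of_distribution(logits, threshold):
    """Flag an input when the classifier's top probability falls below threshold."""
    return msp_score(logits) < threshold
```

Note that this score reacts to anything that lowers classifier confidence, whether the label set has changed (semantic shift) or only the image style has (covariate shift), which is why separating the two requires a dedicated evaluation dataset.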

翻訳日:2024-03-21 00:40:38 公開日:2024-03-18

# $\mathcal{B}$-Coder: プログラム合成のための価値に基づく深層強化学習

$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis ( http://arxiv.org/abs/2310.03173v2 )

ライセンス: Link先を確認

Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang,

(参考訳) プログラム合成は,問題仕様,特に文脈における自然言語記述から,正確な実行可能プログラムを作成することを目的としている。近年,大規模言語モデル(LLM)とともに強化学習(RL)の能力を活用し,コード生成能力を大幅に向上させている。 RLの応用は機能的正当性を直接最適化することに焦点を当て、従来の教師付き手法よりも有利である。ポリシーに基づくRL法は、プログラム合成のためのRLに関する文献を支配しているが、プログラム合成タスクの性質は、値ベースの方法と自然な整合性を示唆している。これは、人間のプログラマによって開発されたプログラムや歴史的なサンプルを含む、豊富なオフポリティプログラムの収集と、自動単体テストによる生成プログラムの直接的な検証から来ており、報酬は容易に得られることを意味する。ポリシーベースのアルゴリズムの優位性から、我々の研究は価値ベースのアプローチの実現可能性を探究し、$\mathcal{B}$-Coder(ベルマン・コーダ)の開発に繋がる。しかし,プログラム合成に固有の膨大な検索空間のために,価値に基づく学習手法が課題を呈している。そこで本研究では,事前学習されたLMと保守的なベルマン演算子を用いたRLエージェントの初期化プロトコルを導入し,学習の複雑さを低減した。さらに、学習した値関数を、生成したプログラムを後処理する双対戦略として活用する方法を実証する。実証評価では,ポリシベースの手法と比較して,最先端性能を実現するための$\mathcal{B}$-Coderの能力を実証した。注目すべきことに、この成果は最小限の報酬工学努力で達成され、報酬設計とは無関係に価値に基づくRLの有効性を強調している。

Program synthesis aims to create accurate, executable programs from problem specifications, specifically from natural language descriptions in our context. Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs), significantly enhancing code generation capabilities. The application of RL focuses on directly optimizing for functional correctness, offering an advantage over conventional supervised methods. Despite policy-based RL methods dominating the literature on RL for program synthesis, the nature of program synthesis tasks hints at a natural alignment with value-based methods. This stems from the rich collection of off-policy programs, including those developed by human programmers and also historical samples, coupled with the straightforward verification of generated programs through automated unit testing, meaning rewards are easy to obtain. Diverging from the dominant use of policy-based algorithms, our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder (pronounced Bellman coder). Yet, training value-based methods presents challenges due to the enormous search space inherent to program synthesis. To this end, we introduce an initialization protocol for RL agents utilizing pre-trained LMs and a conservative Bellman operator to reduce training complexities. Moreover, we demonstrate how to leverage the learned value functions as a dual strategy to post-process generated programs. Our empirical evaluations demonstrated $\mathcal{B}$-Coder's capability in achieving state-of-the-art performance when compared to policy-based methods. Remarkably, this achievement is reached with minimal reward engineering effort, highlighting the effectiveness of value-based RL, independent of reward designs.

翻訳日:2024-03-21 00:30:47 公開日:2024-03-18

# 困難に適応した軌道マッチングによるロスレスデータセット蒸留に向けて

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ( http://arxiv.org/abs/2310.05773v2 )

ライセンス: Link先を確認

Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, Yang You,

(参考訳) Dataset Distillationの最終的な目標は、この合成セットでトレーニングされたモデルが、完全な実際のデータセットでトレーニングされたモデルと同等に機能するように、小さな合成データセットを合成することである。これまでのデータセット蒸留法は, 合成試料の総数が極端に少ない場合にのみ, 従来の方法が有効であることから, 完全に損失のない目標に達していない。このような少数のサンプルに十分な情報しか含められないため、真の損失データセット蒸留を実現するためには、合成データセットのサイズが大きくなるにつれて有効である蒸留法を開発する必要があると考えられる。本研究では,既存の手法が大規模で高品質な合成集合を生成できない理由を解明する。現在の最先端の手法は、軌道マッチングに依存するか、あるいは合成データを最適化して、実際のデータと同様の長期トレーニングダイナミクスを誘導する。実験により, 一致する軌道(早期または後期)の訓練段階が, 蒸留データセットの有効性に大きく影響していることが判明した。具体的には、教師ネットワークが容易にパターンを学習する)初期の軌跡は、必要な情報を配布する事例が少ないため、低カルディナリティの合成セットとしてうまく機能する。逆に、(教師ネットワークがハードパターンを学習する)後期軌道は、必要な複雑なパターンを表現するのに十分なサンプルがあるため、より大きな合成セットに対してより良い信号を提供する。そこで本研究では,生成したパターンの難易度を合成データセットのサイズに合わせることを提案する。そこで我々は, トラジェクトリーマッチング法を大規模合成データセットに拡張し, ロスレスなデータセット蒸留を初めて達成した。コードと蒸留データセットはhttps://gzyaftermath.github.io/DATMで入手できる。

The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a model trained on this synthetic set will perform equally well as a model trained on the full, real dataset. Until now, no method of Dataset Distillation has reached this completely lossless goal, in part due to the fact that previous methods only remain effective when the total number of synthetic samples is extremely small. Since only so much information can be contained in such a small number of samples, it seems that to achieve truly lossless dataset distillation, we must develop a distillation method that remains effective as the size of the synthetic dataset grows. In this work, we present such an algorithm and elucidate why existing methods fail to generate larger, high-quality synthetic sets. Current state-of-the-art methods rely on trajectory-matching, or optimizing the synthetic data to induce similar long-term training dynamics as the real data. We empirically find that the training stage of the trajectories we choose to match (i.e., early or late) greatly affects the effectiveness of the distilled dataset. Specifically, early trajectories (where the teacher network learns easy patterns) work well for a low-cardinality synthetic set since there are fewer examples wherein to distribute the necessary information. Conversely, late trajectories (where the teacher network learns hard patterns) provide better signals for larger synthetic sets since there are now enough samples to represent the necessary complex patterns. Based on our findings, we propose to align the difficulty of the generated patterns with the size of the synthetic dataset. In doing so, we successfully scale trajectory matching-based methods to larger synthetic datasets, achieving lossless dataset distillation for the very first time. Code and distilled datasets are available at https://gzyaftermath.github.io/DATM.
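
The trajectory-matching objective that this line of work builds on can be sketched as a normalized parameter distance: a student trained on the synthetic data from an expert checkpoint should land near a later expert checkpoint. The code below is a minimal sketch under assumptions. Parameters are flat lists of floats, the function names are hypothetical, and the epoch ranges shown for small and large synthetic sets are placeholders, not values from the paper.

```python
import random


def trajectory_matching_loss(student_end, expert_start, expert_target):
    """Squared distance to the expert target, normalised by how far the expert moved."""
    numerator = sum((s - t) ** 2 for s, t in zip(student_end, expert_target))
    denominator = sum((a - t) ** 2 for a, t in zip(expert_start, expert_target))
    return numerator / denominator


def sample_start_epoch(epoch_range, rng=random):
    """Pick which part of the expert trajectory to match.

    Difficulty alignment amounts to choosing the range: an early range for a small
    synthetic set (easy patterns), a later range for a large one (hard patterns).
    """
    low, high = epoch_range
    return rng.randint(low, high)


# Hypothetical ranges, for illustration only.
EARLY_RANGE = (0, 10)   # e.g. very few synthetic images per class
LATE_RANGE = (20, 40)   # e.g. many synthetic images per class
```

A loss of 0 means the student reached the expert's target parameters, and a loss of 1 means it made no net progress from the starting checkpoint.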

翻訳日:2024-03-21 00:30:47 公開日:2024-03-18

# 性能保証付きユニットコミット予測器:サポートベクトルマシン分類器

Unit Commitment Predictor With a Performance Guarantee: A Support Vector Machine Classifier ( http://arxiv.org/abs/2310.08601v2 )

ライセンス: Link先を確認

Farzaneh Pourahmadi, Jalal Kazempour,

(参考訳) システムオペレータは通常、計算の限られた時間枠内で大規模な単位コミットメント問題を解決する必要がある。本稿では,従来の単位のオン/オフ決定を学習し,予測することにより,システムオペレーターが解法を温め,計算を著しく高速化する可能性を示す。予測のために、線形およびカーネル化されたサポートベクタマシン分類器を訓練し、適切に正規化され、分散的に堅牢な分類器に変換された場合、サンプル外の性能保証を提供する。単位コミットメント問題に対して、混合整数二階コーン問題を解く。 IEEE 6-および118-busテストシステムに基づく結果,正規化を適切に行うカーネル化されたSVMは他の分類器よりも優れた性能を示し,計算時間を1.7倍に短縮した。さらに、厳密な計算限界が存在する場合、温暖化開始のない単位コミットメント問題は最適解から遠く離れており、その温暖化開始版は時間限界内で(ほぼ)最適に解ける。

System operators usually need to solve large-scale unit commitment problems within a limited time frame for computation. This paper provides a pragmatic solution, showing how, by learning and predicting the on/off commitment decisions of conventional units, system operators can potentially warm start their solver and speed up their computation significantly. For the prediction, we train linear and kernelized support vector machine classifiers, which, if properly regularized, convert to distributionally robust classifiers and thus provide an out-of-sample performance guarantee. For the unit commitment problem, we solve a mixed-integer second-order cone problem. Our results based on the IEEE 6- and 118-bus test systems show that the kernelized SVM with proper regularization outperforms other classifiers, reducing the computational time by a factor of 1.7. In addition, if there is a tight computational limit, while the unit commitment problem without warm start is far away from the optimal solution, its warm-started version can be solved to (near) optimality within the time limit.

翻訳日:2024-03-21 00:30:47 公開日:2024-03-18

# 翻訳の文脈的リファインメント:文文と文書レベルの後編集のための大規模言語モデル

Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing ( http://arxiv.org/abs/2310.14855v2 )

ライセンス: Link先を確認

Sai Koneru, Miriam Exel, Matthias Huck, Jan Niehues,

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理タスクでかなりの成功を収めてきたが、ニューラルネットワーク翻訳(NMT)では、まだ最先端のパフォーマンスを達成できていない。それでも、広範囲の理解と文脈処理を必要とするタスクにおける重要なパフォーマンスは、翻訳の可能性を示している。これらの能力を活かすために, MT 用 LLM を用いて最近のパラメータ効率向上技術について検討する。驚くべきことに、私たちの最初の実験では、翻訳目的の微調整がパフォーマンスの低下につながることもわかりました。そこで本研究では,LLMを直接翻訳者ではなく自動編集者 (APE) として適応するアプローチを提案する。長いシーケンスを処理・生成するLLMの異常な能力に基づいて、文書レベルの翻訳へのアプローチの拡張も提案する。 APEにローランドアダプタの微調整を適用することで、文レベルと文書レベルの両方のメトリクスが大幅に改善され、ドメイン外データへの一般化が期待できることを示す。最も顕著なのは、ContraProテストセットで89倍の最先端精度を実現し、特に、英語からドイツ語への翻訳において、代名詞のあいまいさを解消する能力を評価することである。最後に、参照コンテキストが利用可能となる文書レベルの翻訳を手作業で後編集する実践シナリオについて検討する。ここでは、人間の修正を活用することで、後続の翻訳に必要な編集回数を大幅に削減できることを実証する(手動フィードバックを統合するInteractive Demoは、https://huggingface.co/spaces/skoneru/contextual_refinement_endeを参照)。

Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their significant performance in tasks demanding a broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation purposes even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on the exceptional ability of LLMs to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy rate of 89% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (Interactive Demo for integrating manual feedback can be found here: https://huggingface.co/spaces/skoneru/contextual_refinement_ende).
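
For readers unfamiliar with low-rank adapters, the general form of the update is sketched below. The abstract does not say which weight matrices are adapted or what rank is used, so the symbols ($W_0$, $B$, $A$, $r$, $d$, $k$) are generic notation assumed here, and any scaling factor is omitted.

```latex
W = W_0 + \Delta W = W_0 + BA,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k).
```

Only $A$ and $B$ are trained while the pretrained weights $W_0$ stay frozen, so the number of trainable parameters per adapted matrix drops from $dk$ to $r(d + k)$.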

翻訳日:2024-03-21 00:30:47 公開日:2024-03-18

# 3次元マスク付きオートエンコーダを用いたMRIスキャンのプライバシー保護

Privacy Protection in MRI Scans Using 3D Masked Autoencoders ( http://arxiv.org/abs/2310.15778v3 )

ライセンス: Link先を確認

Lennart Alexander Van der Goten, Kevin Smith,

(参考訳) MRIスキャンは貴重な医療情報を提供するが、保護すべき機密情報や個人識別情報も含む。 MRIメタデータは容易にサニタイズされるが、MRI画像データは患者の頭部の高現実的な3Dヴィジュアライゼーションをレンダリングする情報を含んでいるため、データベースを相互参照することで、悪意あるアクターが被検体を特定できるため、プライバシのリスクである。データ匿名化と非識別化は、個人の個人情報のプライバシーと機密性の確保に関係している。従来のMRIの非識別方法は、特定のスキャンからプライバシーに敏感な部分(目、鼻など)を取り除く。これは、ダウンストリーム分析をオフにできるドメインシフトの導入に費やされる。本研究では,マスク付きオートエンコーダを用いて部品を除去する代わりに,顔のリモデリング(例えば顔の変更)によって顔を識別するCP-MAEを提案する。 CP-MAEは、ダウンストリームタスクのパフォーマンスと非識別の観点から、以前のアプローチよりも優れています。我々の方法では、解像度が最大256^3$までの高忠実度スキャンを合成できるが、従来の手法では128^3$であるのに対し、ボクセルの数は8倍に増加する。

MRI scans provide valuable medical information; however, they also contain sensitive and personally identifiable information that needs to be protected. Whereas MRI metadata is easily sanitized, MRI image data is a privacy risk because it contains enough information to render highly realistic 3D visualizations of a patient's head, enabling malicious actors to possibly identify the subject by cross-referencing a database. Data anonymization and de-identification are concerned with ensuring the privacy and confidentiality of individuals' personal information. Traditional MRI de-identification methods remove privacy-sensitive parts (e.g., eyes, nose) from a given scan. This comes at the expense of introducing a domain shift that can throw off downstream analyses. In this work, we propose CP-MAE, a model that uses masked autoencoders to de-identify the face by remodeling it (e.g., changing the face) rather than by removing parts. CP-MAE outperforms all previous approaches in terms of downstream task performance as well as de-identification. With our method we are able to synthesize high-fidelity scans of resolution up to $256^3$ -- compared to $128^3$ with previous approaches -- which constitutes an eight-fold increase in the number of voxels.
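
The eight-fold figure follows directly from the two stated resolutions; the short check below spells out the arithmetic.

```latex
\frac{256^3}{128^3} = \left(\frac{256}{128}\right)^{3} = 2^{3} = 8,
\qquad 256^3 = 16{,}777{,}216, \quad 128^3 = 2{,}097{,}152 .
```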

翻訳日:2024-03-21 00:30:47 公開日:2024-03-18

# IIDウェイトを超えて:スパースと低ランクのディープニューラルネットワークもガウス的プロセスである

Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes ( http://arxiv.org/abs/2310.16597v3 )

ライセンス: Link先を確認

Thiziri Nait-Saada, Alireza Naderi, Jared Tanner,

(参考訳) 無限に広いニューラルネットワークは、ディープラーニングに現れる多くの現象の理解を可能にする、有用で管理可能な数学的モデルであることが証明されている。例えば、ランダムディープネットワークをガウス過程に収束させることで、活性化関数とネットワークウェイトの選択がトレーニング力学にどのように影響するかを厳密に分析することができる。本稿では, Matthews et al (2018) の初歩的な証明を, IID や直交重みの確立した事例を含むより大規模な初期重量分布(PSEUDO-IID と呼ぶ)に拡張するとともに, 計算速度向上のために, 新たな低ランクで構造化されたスパースな設定を行う。また,PSEUDO-IID分布を初期化した完全接続型・畳み込み型ネットワークは,その分散により有効に等価であることを示す。この結果を用いて、ニューラルネットワークの幅広いクラスに対してEdge-of-Chaosを識別し、トレーニングを強化するために臨界度で調整することができる。さらに、ベイズニューラルネットワークの後方分布をこれらの様々な初期化スキームで引き出せるようにしている。

The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training. Moreover, they enable the posterior distribution of Bayesian Neural Networks to be tractable across these various initialization schemes.

翻訳日:2024-03-21 00:20:56 公開日:2024-03-18

# AIによる意思決定におけるインタラクションパターンの分類 : 体系的なレビューから

Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review ( http://arxiv.org/abs/2310.19778v3 )

ライセンス: Link先を確認

Catalina Gomez, Sue Min Cho, Shichang Ke, Chien-Ming Huang, Mathias Unberath,

(参考訳) 意思決定支援システムにおける人工知能(AI)の活用は、しばしばアルゴリズムの出力と人間の期待の一致を見越して、技術進歩に不相応に焦点を合わせてきた。人間中心の視点は、既存のプロセスとのシームレスな統合のためにAIソリューションを設計することで、この懸念を緩和しようとする。 AIが人間を助けるために提供すべき情報を決定することは不可欠である。しかし、情報がどのように提示されるか、例えば、レコメンデーションのシーケンスと解釈のソリケーションは、人間とAIの間の複雑な相互作用が出現する可能性があるため、同様に重要である。実証的研究は、ドメイン間の人間とAIのダイナミクスを評価してきたが、人間とAIのインタラクションプロトコルの共通語彙は欠如している。インタラクションデザインのより慎重な考察を促進するために,人間とAIのインタラクションの様々なモードを規定するインタラクションパターンの分類を導入する。本稿では,AIによる意思決定文献の体系的レビューの結果を要約し,アプリケーションドメイン間でのインタラクションのトレンドと機会を105記事から抽出する。現在のインタラクションは、単純化されたコラボレーションパラダイムによって支配されており、真のインタラクティブな機能はほとんどサポートされません。我々の分類学は、意思決定におけるAIとの相互作用を理解するツールを提供し、コミュニケーション、信頼性、コラボレーションを明確化するための相互作用設計を育む。

Leveraging Artificial Intelligence (AI) in decision support systems has disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. A human-centered perspective attempts to alleviate this concern by designing AI solutions for seamless integration with existing processes. Determining what information AI should provide to aid humans is vital, a concept underscored by explainable AI's efforts to justify AI predictions. However, how the information is presented, e.g., the sequence of recommendations and solicitation of interpretations, is equally crucial as complex interactions may emerge between humans and AI. While empirical studies have evaluated human-AI dynamics across domains, a common vocabulary for human-AI interaction protocols is lacking. To promote more deliberate consideration of interaction designs, we introduce a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. We summarize the results of a systematic review of AI-assisted decision making literature and identify trends and opportunities in existing interactions across application domains from 105 articles. We find that current interactions are dominated by simplistic collaboration paradigms, leading to little support for truly interactive functionality. Our taxonomy offers a tool to understand interactivity with AI in decision-making and foster interaction designs for achieving clear communication, trustworthiness, and collaboration.

翻訳日:2024-03-21 00:20:56 公開日:2024-03-18

# 因果介入による移動予測ネットワークの行動への影響の解明

Revealing behavioral impact on mobility prediction networks through causal interventions ( http://arxiv.org/abs/2311.11749v2 )

ライセンス: Link先を確認

Ye Hong, Yanan Xin, Simon Dirmeier, Fernando Perez-Cruz, Martin Raubal,

(参考訳) ディープニューラルネットワークは、モビリティ予測タスクにますます活用されているが、その複雑な内部動作は、特にモビリティ行動の様々な側面が予測にどのように影響するかを理解する際に、解釈可能性に課題をもたらす。本研究では、次の位置予測のために設計されたニューラルネットワークに対する移動関連要因の影響を評価するための因果介入フレームワークを紹介する。これを実現するために,個別の移動モデルを用いて,データ生成プロセスに介入して,合成位置情報シーケンスを生成し,動作のダイナミクスを制御する。移動度測定値を用いて介入位置列を評価し、よく訓練されたネットワークに入力し、性能変動を分析する。その結果, 異なる移動行動を伴う位置列の生成の有効性が示され, 多様な空間的・時間的変化のシミュレーションが容易となった。これらの変化は、次の位置予測ネットワークのパフォーマンス変動をもたらし、位置遷移のシーケンシャルなパターン、新しい位置を探索する確率、人口と個人レベルの位置選択の好みなど、重要な移動行動要因の影響を明らかにする。得られた知見は、モビリティ予測ネットワークの現実的な応用に重要な価値を持ち、このフレームワークは、モビリティアプリケーションにおけるニューラルネットワークの解釈可能性と堅牢性を高めるための因果推論の利用を促進することが期待されている。

Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction -- a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to generate synthetic location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness in producing location sequences with distinct mobility behaviors, thereby facilitating the simulation of diverse yet realistic spatial and temporal changes. These changes result in performance fluctuations in next location prediction networks, revealing impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold significant value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference for enhancing the interpretability and robustness of neural networks in mobility applications.

翻訳日:2024-03-21 00:11:07 公開日:2024-03-18

# 質と量:ファッションデザインにおけるテキストと画像の合成のための何百万もの高品質な画像

Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design ( http://arxiv.org/abs/2311.12067v3 )

ライセンス: Link先を確認

Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan,

(参考訳) AIとファッションデザインの融合は、有望な研究分野として浮上している。しかし、衣料品や試着段階に関する広範な相互関連データが欠如しているため、この領域におけるAIの潜在能力は損なわれている。これに対応するために、我々は、数年にわたる厳格な取り組みの産物であるFashion-Diffusionデータセットを提示する。このデータセットは、最初のもので、100万以上の高品質なファッションイメージで構成され、詳細なテキスト記述と組み合わせている。さまざまな地理的な場所と文化的背景から得られたデータセットは、世界的なファッショントレンドをカプセル化している。この画像には、衣服や人間に関連する細かい属性が刻まれており、ファッションデザインプロセスを単純化してテキスト・ツー・イメージ(T2I)タスクにしている。 Fashion-Diffusionデータセットは、高品質なテキストイメージペアと多様なヒューマンガーメントペアを提供するだけでなく、人間に関する大規模なリソースとしても機能し、T2I世代の研究を促進する。さらに、T2Iに基づくファッションデザイン分野の標準化を促進するために、ファッションデザインモデルの性能を評価するために、複数のデータセットからなる新しいベンチマークを提案する。この研究は、AI駆動のファッションデザインの領域における大きな飛躍であり、この分野における将来の研究のための新しい標準を確立している。

The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.

翻訳日:2024-03-21 00:11:07 公開日:2024-03-18

# 学習したフォワード演算子の逆問題

Inverse Problems with Learned Forward Operators ( http://arxiv.org/abs/2311.12528v2 )

ライセンス: Link先を確認

Simon Arridge, Andreas Hauptmann, Yury Korolev,

(参考訳) 逆問題の解決にはフォワード演算子の知識が必要だが、正確なモデルは計算コストがかかるため、復元品質を損なわないより安価な変種が望まれる。本章は、2つの異なるパラダイムに従う学習前方演算子による逆問題における再構成手法についてレビューする。 1つ目は、フォワード演算子に完全に依存せず、トレーニングデータにまたがる部分空間に対する制限を学習する。射影による正規化の枠組みは、再構成を見つけるために使われる。 2つ目は、測定プロセスの物理の単純化されたモデルを使用し、モデルの修正を学習するためにトレーニングデータのみに依存する。これら2つのアプローチの理論を数値的に比較する。両方のメソッドは、フォワード演算子だけでなく、アジョイントのためにもトレーニングデータを必要とする。

Solving inverse problems requires the knowledge of the forward operator, but accurate models can be computationally expensive and hence cheaper variants that do not compromise the reconstruction quality are desired. This chapter reviews reconstruction methods in inverse problems with learned forward operators that follow two different paradigms. The first one is completely agnostic to the forward operator and learns its restriction to the subspace spanned by the training data. The framework of regularisation by projection is then used to find a reconstruction. The second one uses a simplified model of the physics of the measurement process and only relies on the training data to learn a model correction. We present the theory of these two approaches and compare them numerically. A common theme emerges: both methods require, or at least benefit from, training data not only for the forward operator, but also for its adjoint.

翻訳日:2024-03-21 00:11:07 公開日:2024-03-18

# RLIF: 強化学習としてのインタラクティブな模倣学習

RLIF: Interactive Imitation Learning as Reinforcement Learning ( http://arxiv.org/abs/2311.12996v2 )

ライセンス: Link先を確認

Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine,

(参考訳) 強化学習手法は、自動スキル獲得のための強力なフレームワークを提供するが、ロボット工学のような分野における実践的な学習ベースの制御問題に対して、模倣学習はより便利でアクセスしやすい代替手段を提供することが多い。特に, DAggerなどのインタラクティブな模倣学習手法では, 最適に近い専門家にオンラインで介入を依頼して, ナイーブな行動クローニングを悩ませる分布シフトの問題に対処するための修正データを収集することで, 手動で指定した報酬関数やその他の完全な強化学習手法の構成要素を必要とせずに, 理論と実践の両方で優れた性能を享受できる。本稿では,対話型模倣学習と類似するが,さらに実践的な仮定の下で,オフポリシー強化学習がパフォーマンス向上を実現する方法について検討する。提案手法は,ユーザ介入信号を用いた強化学習を報奨として利用する。このことは、インタラクティブな模倣学習において介入する専門家がほぼ最適であるべきだという仮定を緩和し、アルゴリズムが潜在的に最適でない人間の専門家よりも改善される行動を学ぶことを可能にする。また,RL法とDAggerを統一的に解析するためのフレームワークも提供し,本手法の非漸近的サンプル複雑性境界だけでなく,両手法の最適下界の漸近的解析について述べる。次に,実世界のロボットビジョンに基づく操作タスクと同様に,高次元連続制御シミュレーションベンチマークの課題に対する評価を行った。結果は,特に介入する専門家が最適でない場合には,DAggerのようなアプローチよりも優れていることを示す。コードとビデオはプロジェクトのWebサイト(https://rlif-page.github.io)で見ることができる。

Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correction data for addressing the distributional shift challenges that afflict naive behavioral cloning, can enjoy good performance both in theory and practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger, for which we present an asymptotic analysis of the suboptimality gap for both methods as well as a non-asymptotic sample complexity bound for our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: https://rlif-page.github.io
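
One way to picture "intervention signals as rewards" is to relabel logged transitions before feeding them to an off-policy learner. The sketch below is an assumption-laden illustration: the abstract does not give the reward values or say which step is penalized, so the `Transition` type, the `label_rewards` helper, and the choice of -1 at the intervened step and 0 elsewhere are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    state: int
    action: int
    intervened: bool       # True if the human took over at this step
    reward: float = 0.0    # filled in by label_rewards


def label_rewards(trajectory: List[Transition], penalty: float = -1.0) -> List[Transition]:
    """Return a copy of the trajectory with `penalty` at intervened steps and 0 elsewhere."""
    return [
        Transition(t.state, t.action, t.intervened, penalty if t.intervened else 0.0)
        for t in trajectory
    ]


# Hypothetical three-step rollout in which the expert intervenes at the second step.
trajectory = [Transition(0, 1, False), Transition(1, 0, True), Transition(2, 1, False)]
labeled = label_rewards(trajectory)
rewards = [t.reward for t in labeled]
print(rewards)
```

The relabeled transitions could then go into a replay buffer for any off-policy algorithm; no task-specific reward function is needed in this setup.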

翻訳日:2024-03-21 00:11:07 公開日:2024-03-18

# D-SCo:単分子ハンドヘルド物体再構成のためのデュアルストリーム条件拡散

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction ( http://arxiv.org/abs/2311.14189v2 )

ライセンス: Link先を確認

Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari,

(参考訳) 単一のRGB画像からハンドヘルドオブジェクトを再構築することは、コンピュータビジョンにおいて難しい課題である。決定論的モデリングのパラダイムを利用する先行研究とは対照的に、この問題の確率論的性質を考慮に入れた点雲デノナイズ拡散モデルを用いる。中核部では,単眼ハンドヘルドオブジェクト再構成(D-SCo)のための遠心固定型二重ストリーム条件拡散を導入し,二つの課題に対処した。まず,物体の遠方偏差を回避するため,手拘束型遠方偏差固定パラダイムを用い,拡散・逆過程の安定性と特徴投影の精度を向上させる。第2に,新しい手オブジェクトセマンティック埋め込みによる手オブジェクトのセマンティックな相互作用を意味的かつ幾何学的にモデル化し,手対象領域の再構築性能を向上させるために,デュアルストリームデノイザを導入する。 ObManデータセットと、HO3D、MOW、DexYCBの3つの実世界のデータセットの実験は、我々のアプローチが他の最先端の手法を全て超えることを示した。コードはリリースされる。

Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. At its core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction (D-SCo), tackling two predominant challenges. First, to prevent the object centroid from deviating, we utilize a novel hand-constrained centroid fixing paradigm, enhancing the stability of diffusion and reverse processes and the precision of feature projection. Second, we introduce a dual-stream denoiser to semantically and geometrically model hand-object interactions with a novel unified hand-object semantic embedding, enhancing the reconstruction performance of the hand-occluded region of the object. Experiments on the synthetic ObMan dataset and three real-world datasets HO3D, MOW and DexYCB demonstrate that our approach can surpass all other state-of-the-art methods. Codes will be released.

翻訳日:2024-03-21 00:11:07 公開日:2024-03-18

# 時系列予測のためのモジュールニューラルネットワーク:注意を用いた解釈可能性と特徴選択

Modular Neural Networks for Time Series Forecasting: Interpretability and Feature Selection using Attention ( http://arxiv.org/abs/2311.16834v3 )

ライセンス: Link先を確認

Qiqi Su, Christos Kloukinas, Artur d'Avila Garcez,

(参考訳) 多変量時系列は、医療や気象学から生命科学まで、多くの応用がある。ディープラーニングモデルは時系列で優れた予測性能を示してきたが、彼らは「ブラックボックス」か非解釈可能であると批判されてきた。本稿では,構築によって解釈可能な多変量時系列予測のための新しいモジュール型ニューラルネットワークモデルを提案する。リカレントニューラルネットワークはデータ内の時間的依存関係を学習し、アテンションベースの特徴選択コンポーネントは最も関連性の高い特徴を選択し、時間的依存関係の学習に使用される冗長な特徴を抑制する。モジュール型のディープネットワークは、選択した機能から独立してトレーニングされ、ユーザーが機能がどのように結果に影響を与えるかを示し、モデルを解釈できる。実験結果から,本手法は,時系列タスクの回帰と分類の両方において,最先端の非解釈可能な手法であるLSTM,XGBoostに匹敵する予測性能を達成し,最先端の解釈可能なニューラル付加モデル(NAM)およびそれらのバリエーションより優れていることが示された。

Multivariate time series have many applications, from healthcare and meteorology to life science. Although deep learning models have shown excellent predictive performance for time series, they have been criticised for being "black-boxes" or non-interpretable. This paper proposes a novel modular neural network model for multivariate time series prediction that is interpretable by construction. A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features and suppresses redundant features used in the learning of the temporal dependencies. A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable. Experimental results show that this approach can outperform state-of-the-art interpretable Neural Additive Models (NAM) and variations thereof in both regression and classification of time series tasks, achieving a predictive performance that is comparable to the top non-interpretable methods for time series, LSTM and XGBoost.

翻訳日:2024-03-21 00:01:19 公開日:2024-03-18

# 多項信念ネットワーク

Multinomial belief networks ( http://arxiv.org/abs/2311.16909v2 )

ライセンス: Link先を確認

H. C. Donker, D. Neijzen, J. de Jong, G. A. Lunter,

(参考訳) 機械学習に対するベイズ的アプローチは、不確実性を定量化したり、観察の欠如に対処したり、サンプルが不足したり、データが不足する場合に魅力的である。これらの全ては、医療データを分析する際に一般的に適用される。これらの解析的要求に対処するために,ネットワークの重みと隠れた単位の両方をディリクレ分布とする多項数データの深部生成モデルを提案する。 Gibbsサンプリング手順は、Zhou-Cong-Chenモデルに類似した一連の拡張関係を利用する。本モデルは,手書き小文字と癌DNA変異の大規模な実験データセットに適用し,そのモデルが生物学的に意味のあるメタシグナチャを完全データ駆動で抽出できることを示す。

A Bayesian approach to machine learning is attractive when we need to quantify uncertainty, deal with missing observations, when samples are scarce, or when the data is sparse. All of these commonly apply when analysing healthcare data. To address these analytical requirements, we propose a deep generative model for multinomial count data where both the weights and hidden units of the network are Dirichlet distributed. A Gibbs sampling procedure is formulated that takes advantage of a series of augmentation relations, analogous to the Zhou--Cong--Chen model. We apply the model on small handwritten digits, and a large experimental dataset of DNA mutations in cancer, and we show how the model is able to extract biologically meaningful meta-signatures in a fully data-driven way.

翻訳日:2024-03-21 00:01:19 公開日:2024-03-18

# COLE:多層・編集可能なグラフィクス設計のための階層型生成フレームワーク

COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design ( http://arxiv.org/abs/2311.16974v2 )

ライセンス: Link先を確認

Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo,

(参考訳) 15世紀から進化してきたグラフィックデザインは、広告において重要な役割を担っている。高品質な設計を作成するには、設計指向の計画、推論、レイヤワイズ生成が必要である。 GPT-4を既存のデザインテンプレートと統合して独自のGPTを構築するCanvaGPTとは異なり、本研究ではこれらの課題に包括的に対処するために設計された階層型生成フレームワークであるCOLEシステムを紹介する。このCOLEシステムは、曖昧な意図のプロンプトを高品質な多層グラフィック設計に変換すると同時に、ユーザ入力に基づく柔軟な編集をサポートする。このような入力の例としては、久石の演奏会の「ポスターをデザインする」などの指示がある。重要な洞察は、テキスト・デザイン生成の複雑なタスクを単純なサブタスクの階層に分解することであり、それぞれが協調して動作する専門モデルによって対処される。これらのモデルの結果は、結合的な最終的な出力を生成するために統合される。我々の階層的なタスク分解は、複雑なプロセスを合理化し、生成信頼性を大幅に向上させることができる。我々のCOLEシステムは、複数の微調整されたLarge Language Model(LLM)、Large Multimodal Model(LMM)、Diffusion Models(DM)から構成される。さらに,ユーザ意図から高品質なグラフィックデザインを生成する上で,既存の手法よりもCOLEシステムの方が優れていることを示すために,DESIGNINTENTIONベンチマークを構築した。最後に、生成した多層グラフィック画像のフレキシブルな編集を支援するCanvaのような多層画像編集ツールを提案する。我々はCOLEシステムを、より複雑で多層的なグラフィックデザイン生成タスクに今後取り組むための重要なステップとして捉えている。

Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. The creation of high-quality designs demands design-oriented planning, reasoning, and layer-wise generation. Unlike the recent CanvaGPT, which integrates GPT-4 with existing design templates to build a custom GPT, this paper introduces the COLE system, a hierarchical generation framework designed to comprehensively address these challenges. The COLE system can transform a vague intention prompt into a high-quality multi-layered graphic design, while also supporting flexible editing based on user input. Examples of such input might include directives like "design a poster for Hisaishi's concert." The key insight is to dissect the complex task of text-to-design generation into a hierarchy of simpler sub-tasks, each addressed by specialized models working collaboratively. The results from these models are then consolidated to produce a cohesive final output. Our hierarchical task decomposition can streamline the complex process and significantly enhance generation reliability. Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text. Furthermore, we construct the DESIGNINTENTION benchmark to demonstrate the superiority of our COLE system over existing methods in generating high-quality graphic designs from user intent. Last, we present a Canva-like multi-layered image editing tool to support flexible editing of the generated multi-layered graphic design images. We perceive our COLE system as an important step towards addressing more complex and multi-layered graphic design generation tasks in the future.

翻訳日:2024-03-21 00:01:19 公開日:2024-03-18

# 超振動のキャラクタリゼーションと定量化のための一提案

A proposal to characterize and quantify superoscillations ( http://arxiv.org/abs/2311.17703v2 )

ライセンス: Link先を確認

Yu Li, José Polo-Gómez, Eduardo Martín-Martínez,

(参考訳) 超振動関数の形式的定義を示す。これまでに提案された定義の限界について議論し、超振動挙動の全域をカバーしていないことを示す。本稿では,従来の定義を含まないよく知られた超振動関数の例を用いて,提案手法の適合性を実証する。

We present a formal definition of superoscillating function. We discuss the limitations of previously proposed definitions and illustrate that they do not cover the full gamut of superoscillatory behaviours. We demonstrate the suitability of the new proposal with several examples of well-known superoscillating functions that were not encompassed by previous definitions.

翻訳日:2024-03-21 00:01:19 公開日:2024-03-18

# SlimSAM: 0.1%のデータでセグメンテーションがスリムになる

SlimSAM: 0.1% Data Makes Segment Anything Slim ( http://arxiv.org/abs/2312.05284v3 )

ライセンス: Link先を確認

Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang,

(参考訳) SAM(Segment Anything Model)を圧縮するための現在のアプローチでは、優れた結果が得られるが、スクラッチから新しいネットワークをトレーニングするためには、広範なデータが必要である。従来のプルーニング技術を用いることで、データ要求を大幅に削減できるが、性能の低下に悩まされる。そこで本研究では,SlimSAMというデータ効率のよいSAM圧縮手法を導入する。 SlimSAMの本質は、極めて限られたトレーニングデータ可用性と例外的な刈り取り率の下で、知識継承を効果的に強化する代替スリム化フレームワークにカプセル化されている。従来の手法から切り離された我々のフレームワークは、異なる分離されたサブ構造を交互に刈り取り、蒸留することによって、モデルを段階的に圧縮する。また, プルーニングの目的とトレーニング対象との相違に対処するため, プルーニング後の蒸留を促進させるDisturbed Taylor pruningも提案されている。 SlimSAMは、既存の圧縮方法の10分の1以下のトレーニングデータしか必要としない一方で、大幅なパフォーマンス向上を実現している。オリジナルのSAMと比較しても、SlimSAMはパラメータカウントをわずか1.4% (9.1M)、MACを0.8% (23G)、SAMトレーニングデータの0.1% (10k) に減らしながら、それに迫る性能を達成する。コードはhttp://github.com/czg1225/SlimSAMで入手できる。

Current approaches for compressing the Segment Anything Model (SAM) yield commendable results, yet necessitate extensive data to train a new network from scratch. Employing conventional pruning techniques can remarkably reduce data requirements but would suffer from a degradation in performance. To address this challenging trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with far less training data. The essence of SlimSAM is encapsulated in the alternate slimming framework which effectively enhances knowledge inheritance under severely limited training data availability and exceptional pruning ratio. Diverging from prior techniques, our framework progressively compresses the model by alternately pruning and distilling distinct, decoupled sub-structures. Disturbed Taylor pruning is also proposed to address the misalignment between the pruning objective and training target, thereby boosting the post-distillation after pruning. SlimSAM yields significant performance improvements while demanding over 10 times less training data than any other existing compression methods. Even when compared to the original SAM, SlimSAM approaches its performance while reducing parameter counts to merely 1.4% (9.1M), MACs to 0.8% (23G), and requiring only 0.1% (10k) of the SAM training data. The code is available at http://github.com/czg1225/SlimSAM.

翻訳日:2024-03-20 23:51:29 公開日:2024-03-18

# 量子ユーティリティの強化:超伝導量子コンピュータ上での大規模量子スピンチェーンのシミュレーション

Enhancing quantum utility: simulating large-scale quantum spin chains on superconducting quantum computers ( http://arxiv.org/abs/2312.12427v2 )

ライセンス: Link先を確認

Talal Ahmed Chowdhury, Kwangmin Yu, Mahmud Ashraf Shamim, M. L. Kabir, Raza Sabbir Sufian,

(参考訳) 量子スピンのフラストレーション-$\frac{1}{2}$反強磁性ハイゼンベルクスピンチェーンの量子シミュレーションを、100の量子ビットを持つ実超伝導量子コンピュータにおいて、最も近い隣り合う$(J_1)$とnext-nearest-neighbor$(J_2)$の交換相互作用で行う。特に,IBMの超伝導量子コンピュータにおける近接する隣り合う相互作用と,隣り合う隣り合う隣り合う相互作用を持つハミルトニアンを初めて実装し,一階のトロッタライゼーションを用いてスピンチェーンの時間発展を行う。さらに, 近接交換相互作用のみを含む等方性ハイゼンベルクスピンチェーンの2次トロッタライゼーションの新規実装により, 最大100量子ビットの範囲で観測可能なスタッガー磁化の期待値の精密測定が可能となった。どちらの場合も、初期量子ビット数とは無関係に、各トロッターステップの回路深さが一定になる。超伝導量子コンピュータを用いた大規模量子システムの期待値の正確な測定の実証は、多体量子システムの様々な特性を研究するためのこれらの装置の量子ユーティリティを規定する。これは、フォールトトレランス量子時代以前の量子システムをシミュレートする際の古典的よりも量子上の優位性を達成するための足掛かりとなるだろう。

We present the quantum simulation of the frustrated quantum spin-$\frac{1}{2}$ antiferromagnetic Heisenberg spin chain with competing nearest-neighbor $(J_1)$ and next-nearest-neighbor $(J_2)$ exchange interactions on a real superconducting quantum computer with up to 100 qubits. In particular, we implement, for the first time, the Hamiltonian with the next-nearest-neighbor exchange interaction in conjunction with the nearest-neighbor interaction on IBM's superconducting quantum computer and carry out the time evolution of the spin chain by employing first-order Trotterization. Furthermore, our novel implementation of second-order Trotterization for the isotropic Heisenberg spin chain, involving only nearest-neighbor exchange interaction, enables precise measurement of the expectation values of the staggered magnetization observable across a range of up to 100 qubits. Notably, in both cases, our approach results in a constant circuit depth in each Trotter step, independent of the initial number of qubits. Our demonstration of the accurate measurement of expectation values for the large-scale quantum system using superconducting quantum computers designates the quantum utility of these devices for investigating various properties of many-body quantum systems. This will be a stepping stone to achieving quantum advantage over classical computers in simulating quantum systems before the fault-tolerant quantum era.
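
For orientation, the model and the two product formulas named in the abstract can be written as follows. This is a generic sketch: the sign convention, the boundary conditions, and the way the Hamiltonian is split into non-commuting parts $A$ and $B$ are assumptions made here, since the abstract does not specify them.

```latex
H = J_1 \sum_{i} \mathbf{S}_i \cdot \mathbf{S}_{i+1}
  + J_2 \sum_{i} \mathbf{S}_i \cdot \mathbf{S}_{i+2},
\qquad H = A + B,

e^{-iHt} \approx \left( e^{-iA t/n}\, e^{-iB t/n} \right)^{n}
  \quad \text{(first order, error } O(t^{2}/n)\text{)},

e^{-iHt} \approx \left( e^{-iA t/(2n)}\, e^{-iB t/n}\, e^{-iA t/(2n)} \right)^{n}
  \quad \text{(second order, error } O(t^{3}/n^{2})\text{)}.
```

Here $n$ is the number of Trotter steps; the constant per-step circuit depth mentioned in the abstract refers to the cost of implementing one factor of these products.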

翻訳日:2024-03-20 23:51:29 公開日:2024-03-18

# HD-Painter:拡散モデルによる高分解能・高感度テキストガイド画像

HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models ( http://arxiv.org/abs/2312.14091v3 )

ライセンス: Link先を確認

Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi,

(参考訳) テキスト・ツー・イメージの拡散モデルが前例のない成功を収めたことから, テキスト誘導画像のインペイント化の進展は, 極めて現実的で視覚的にも妥当な結果をもたらしている。しかし、現在のテキスト・ツー・イメージ・インペインティングモデルにおいて、特にユーザプロンプトとインペインティング領域の整合性の向上や高解像度インペインティングの実施において、大きな可能性を秘めている。そこで我々は,HD-Painterを導入し,プロンプトを正確に追従し,高分解能画像インパインティングにコヒーレントにスケールする訓練自由アプローチを提案する。そこで本研究では,Pmpt-Aware Introverted Attention (PAIntA) レイヤを設計し,より優れたテキスト・アライメント・ジェネレーションを実現することで自己注意スコアを向上させる。さらに迅速なコヒーレンスを改善するために,ポストホックサンプリング戦略をDDIMの一般的な形式にシームレスに統合し,非分布潜時シフトを防止するためのRASG(Reweighting Attention Score Guidance)機構を導入する。さらに、HD-Painterは、インペイント用にカスタマイズされた特殊な超解像技術を導入し、最大2K解像度の画像の欠落した領域の完成を可能にすることで、より大きなスケールへの拡張を可能にする。実験の結果,HD-Painterは既存の最先端アプローチを,複数のメトリクスとユーザスタディで定量的かつ質的に超越していることがわかった。コードは、https://github.com/Picsart-AI-Research/HD-Painterで公開されている。

Recent progress in text-guided image inpainting, based on the unprecedented success of text-to-image diffusion models, has led to exceptionally realistic and visually plausible results. However, there is still significant potential for improvement in current text-to-image inpainting models, particularly in better aligning the inpainted area with user prompts and performing high-resolution inpainting. Therefore, we introduce HD-Painter, a training free approach that accurately follows prompts and coherently scales to high resolution image inpainting. To this end, we design the Prompt-Aware Introverted Attention (PAIntA) layer enhancing self-attention scores by prompt information resulting in better text aligned generations. To further improve the prompt coherence we introduce the Reweighting Attention Score Guidance (RASG) mechanism seamlessly integrating a post-hoc sampling strategy into the general form of DDIM to prevent out-of-distribution latent shifts. Moreover, HD-Painter allows extension to larger scales by introducing a specialized super-resolution technique customized for inpainting, enabling the completion of missing regions in images of up to 2K resolution. Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter

翻訳日:2024-03-20 23:51:29 公開日:2024-03-18

# 固有値探索と勾配降下のための量子アルゴリズムの改良

Improved Quantum Algorithms for Eigenvalues Finding and Gradient Descent ( http://arxiv.org/abs/2312.14786v2 )

ライセンス: Link先を確認

Nhat A. Nghiem, Tzu-Chieh Wei,

(参考訳) ブロック符号化は、最近開発された量子アルゴリズムの統一フレームワークを形成する量子信号処理において重要な要素である。当初、探索、振幅推定、ハミルトニアンシミュレーションといったいくつかの問題において資源利用の簡素化と最適化のために示され、量子信号処理の能力はこれらを超え、新しい量子アルゴリズムを考案するための未解決のポテンシャルを提供する。本稿では、ブロック符号化を利用して、これまで提案されていた2つの量子アルゴリズム、最大固有値推定と量子勾配降下を効果的に拡張する。従来の高度な手順を含む研究とは異なり、この発見はユニタリブロック符号化を用いて、初等演算であっても、これらの新しい量子アルゴリズムが元の演算子に存在する大きなスケーリング要因を排除できることを実証している。これにより、複雑な計算問題に驚くほどの効率で対処できるより効率的な量子アルゴリズムが得られる。さらに,提案手法を,行列逆転や多重固有値推定など,異なる文脈に拡張する方法を示す。

Block encoding is a key ingredient in the recently developed quantum signal processing that forms a unifying framework for quantum algorithms. Initially showcased for simplifying and optimizing resource utilization in several problems, such as searching, amplitude estimation, and Hamiltonian simulation, the capabilities of the quantum signal processing go beyond these and offer untapped potential for devising new quantum algorithms. In this article, we utilize block encoding to substantially enhance two previously proposed quantum algorithms: largest eigenvalue estimation and quantum gradient descent. Unlike previous works that involve sophisticated procedures, our findings, using the unitary block encoding, demonstrate that even with elementary operations, these new quantum algorithms can eliminate major scaling factors present in their original counterparts. This yields much more efficient quantum algorithms capable of tackling complex computational problems with remarkable efficiency. Furthermore, we show how to extend our proposed method to different contexts, including matrix inversion and multiple eigenvalues estimation.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 連続時間における集合列の確率論的モデリング

Probabilistic Modeling for Sequences of Sets in Continuous-Time ( http://arxiv.org/abs/2312.15045v3 )

ライセンス: Link先を確認

Yuxin Chang, Alex Boyd, Padhraic Smyth,

(参考訳) ニューラルなマーク付き時間点過程は、連続時間イベントデータに対する統計的パラメトリックモデルの既存のツールボックスに加わった有用な手法である。これらのモデルは、各イベントが単一のアイテム(単一のイベント種別、すなわち「マーク」)に関連付けられた系列には有用であるが、各イベントがアイテムの集合に関連付けられる実用的な状況には適していない。本研究では,任意の強度ベースのリカレント・ニューラル点過程モデルと互換性のある,連続時間における集合値データをモデル化するための一般的なフレームワークを開発する。さらに,系列の履歴を条件として「アイテム$A$がアイテム$B$より前に観測される確率」などの確率的クエリに、このようなモデルを用いて回答できる推論手法を開発する。このようなクエリに対する正確な答えの計算は、問題設定の連続時間的な性質と、各イベントで起こりうる結果の空間が組合せ的に大きいことの両方のために、ニューラルモデルでは一般に困難である。そこで,集合ベースの系列に対するクエリのための重要度サンプリング手法のクラスを開発し,4つの実世界のデータセットを用いた体系的な実験を通じて,直接サンプリングに比べて効率が桁違いに向上することを示す。また、このフレームワークを用いて、1ステップ先予測を伴わない尤度によるモデル選択を行う方法についても説明する。

Neural marked temporal point processes have been a valuable addition to the existing toolbox of statistical parametric models for continuous-time event data. These models are useful for sequences where each event is associated with a single item (a single type of event or a "mark") -- but such models are not suited for the practical situation where each event is associated with a set of items. In this work, we develop a general framework for modeling set-valued data in continuous-time, compatible with any intensity-based recurrent neural point process model. In addition, we develop inference methods that can use such models to answer probabilistic queries such as "the probability of item $A$ being observed before item $B$," conditioned on sequence history. Computing exact answers for such queries is generally intractable for neural models due to both the continuous-time nature of the problem setting and the combinatorially-large space of potential outcomes for each event. To address this, we develop a class of importance sampling methods for querying with set-based sequences and demonstrate orders-of-magnitude improvements in efficiency over direct sampling via systematic experiments with four real-world datasets. We also illustrate how to use this framework to perform model selection using likelihoods that do not involve one-step-ahead prediction.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 生成拡散を先行した実世界ブラインド顔復元に向けて

Towards Real-World Blind Face Restoration with Generative Diffusion Prior ( http://arxiv.org/abs/2312.15736v2 )

ライセンス: Link先を確認

Xiaoxu Chen, Jingfan Tan, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaochun Cao,

(参考訳) ブラインド顔復元はコンピュータビジョンにおける重要な課題であり、応用範囲の広さから大きな注目を集めている。以前の研究は主に顔に関する事前知識を利用して顔画像を復元しており、高品質な結果を示している。しかし、有限のデータから得られる事前知識は限られているため、忠実な顔の詳細を生成することは依然として難しい問題である。本研究では,事前学習済みのStable Diffusionをブラインド顔復元に活用する可能性を探る。低画質の顔画像から特徴を効果的に抽出するように設計され、事前学習済みStable Diffusionの生成的事前知識を用いて現実的かつ忠実な顔の詳細を復元できるBFRffusionを提案する。さらに、人種、性別、年齢といった属性のバランスがとれたプライバシ保護顔データセットであるPFHQを構築する。このデータセットは、ブラインド顔復元ネットワークをトレーニングするための実行可能な代替手段として機能し、実際の顔データセットに通常伴うプライバシーとバイアスの懸念に効果的に対処する。広範な実験を通じて、我々のBFRffusionが、ブラインド顔復元のための合成および実世界の公開テストデータセットの両方で最先端の性能を達成すること、およびPFHQデータセットがブラインド顔復元ネットワークのトレーニングに利用可能なリソースであることを示す。コード、事前学習済みモデル、データセットは https://github.com/chenxx89/BFRffusion で公開されている。

Blind face restoration is an important task in computer vision and has gained significant attention due to its wide-range applications. Previous works mainly exploit facial priors to restore face images and have demonstrated high-quality results. However, generating faithful facial details remains a challenging problem due to the limited prior knowledge obtained from finite data. In this work, we delve into the potential of leveraging the pretrained Stable Diffusion for blind face restoration. We propose BFRffusion which is thoughtfully designed to effectively extract features from low-quality face images and could restore realistic and faithful facial details with the generative prior of the pretrained Stable Diffusion. In addition, we build a privacy-preserving face dataset called PFHQ with balanced attributes like race, gender, and age. This dataset can serve as a viable alternative for training blind face restoration networks, effectively addressing privacy and bias concerns usually associated with the real face datasets. Through an extensive series of experiments, we demonstrate that our BFRffusion achieves state-of-the-art performance on both synthetic and real-world public testing datasets for blind face restoration and our PFHQ dataset is an available resource for training blind face restoration networks. The codes, pretrained models, and dataset are released at https://github.com/chenxx89/BFRffusion.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 境界注意: 高騒音下で境界を局所化する学習

Boundary Attention: Learning to Localize Boundaries under High Noise ( http://arxiv.org/abs/2401.00935v2 )

ライセンス: Link先を確認

Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler,

(参考訳) 我々は、境界アテンションと呼ぶメカニズムを用いて、曲線、コーナー、ジャンクションを含む明示的な境界を推論する微分可能モデルを提案する。境界アテンションは境界を考慮した局所アテンション演算であり、密に繰り返し適用することで、画像内の重なり合うすべてのパッチにおける局所境界構造のラスタライズされていない記述を規定する変数の場を段階的に洗練する。ボトムアップ方式で動作し、サブピクセルのエッジ位置推定やエッジリンクの古典的な手法に似ているが、局所境界構造のより高次元の記述、設計されるのではなく学習される空間的整合性の概念、そしてエンドツーエンドで微分可能な一連の操作を備える。我々は、単純な合成データを用いてモデルを訓練し、低照度条件下でさまざまな量のノイズを伴って撮影された写真を用いて評価する。提案手法は, 実際のセンサノイズで劣化した自然画像に汎化し, ノイズが増大する条件下でも、他の最先端手法が失敗する場面で一貫した境界を予測できることがわかった。

We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention. Boundary attention is a boundary-aware local attention operation that, when applied densely and repeatedly, progressively refines a field of variables that specify an unrasterized description of the local boundary structure in every overlapping patch within an image. It operates in a bottom-up fashion, similar to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional description of local boundary structure, a notion of spatial consistency that is learned instead of designed, and a sequence of operations that is end-to-end differentiable. We train our model using simple synthetic data and then evaluate it using photographs that were captured under low-light conditions with variable amounts of noise. We find that our method generalizes to natural images corrupted by real sensor noise, and predicts consistent boundaries under increasingly noisy conditions where other state-of-the-art methods fail.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 大言語モデルにおけるゼロショット抽象要約の再検討 : 位置バイアスの観点から

Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias ( http://arxiv.org/abs/2401.01989v3 )

ライセンス: Link先を確認

Anshuman Chhabra, Hadi Askari, Prasant Mohapatra,

(参考訳) 本研究では, 位置バイアスを測定することで, 大規模言語モデル(LLM)におけるゼロショット抽象的要約を特徴づけ, 研究する。位置バイアスは、従来の文献で研究されてきた、より限定的なリードバイアス現象の一般的な定式化として我々が提案するものである。位置バイアスは、入力テキストの特定の部分の情報を他の部分よりも不当に優先するモデルの傾向を捉えるものであり、望ましくない振る舞いにつながる。4つの多様な実世界のデータセットを用いた多数の実験を通じて、GPT 3.5-Turbo, Llama-2, Dolly-v2 などの複数の LLM と,Pegasus や BART などの最先端の事前学習済みエンコーダ・デコーダ型抽象要約モデルにおける位置バイアスを調査する。その結果,ゼロショット要約タスクにおけるモデルの性能と位置バイアスに関する新たな洞察と議論が得られた。

We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model unfairly prioritizing information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLM models such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 図形シンプレクティック代数

Graphical Symplectic Algebra ( http://arxiv.org/abs/2401.07914v3 )

ライセンス: Link先を確認

Robert I. Booth, Titouan Carette, Cole Comfort,

(参考訳) 任意の体上のアフィンラグランジアン関係および余等方的関係のなすダガーコンパクトプロップに対して、完全な表示を与える。これは、アフィン制約付きの古典力学系と奇素数次元のスタビライザー量子回路の両方に対する、統一的なグラフィカル言語の族を提供する。この目的のために、無向彩色グラフの特定のクラスによってアフィンラグランジアン関係を表示する。合成系について推論するために,これらのグラフの頂点自体がグラフで彩色される,強力でスケーラブルな表記法を導入する。スタビライザー量子力学の設定において、このスケーラブルな表記はグラフ状態の極めて簡潔な記述を与え、グラフ状態は「phased spider fusion」によって合成できる。同様に、電気回路という古典力学の設定においては、相反回路網のインピーダンス行列が本質的に同じ方法で表示されることを示す。

We give complete presentations for the dagger-compact props of affine Lagrangian and coisotropic relations over an arbitrary field. This provides a unified family of graphical languages for both affinely constrained classical mechanical systems, as well as odd-prime-dimensional stabiliser quantum circuits. To this end, we present affine Lagrangian relations by a particular class of undirected coloured graphs. In order to reason about composite systems, we introduce a powerful scalable notation where the vertices of these graphs are themselves coloured by graphs. In the setting of stabiliser quantum mechanics, this scalable notation gives an extremely concise description of graph states, which can be composed via "phased spider fusion." Likewise, in the classical mechanical setting of electrical circuits, we show that impedance matrices for reciprocal networks are presented in essentially the same way.

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 確率論的ランベルト問題の解:最適質量輸送、シュレーディンガー橋および反応拡散PDEとの接続

Solution of the Probabilistic Lambert Problem: Connections with Optimal Mass Transport, Schrödinger Bridge and Reaction-Diffusion PDEs ( http://arxiv.org/abs/2401.07961v3 )

ライセンス: Link先を確認

Alexis M. H. Teter, Iman Nodozi, Abhishek Halder,

(参考訳) ランベルト問題は、重力場の下での速度制御によって、所定の飛行時間内に、与えられた初期位置から与えられた終端位置へ宇宙船を移動させる問題である。位置ベクトルの端点制約に関する知識を、それぞれの結合確率密度関数に関する知識に置き換えた、ランベルト問題の確率的変種を考える。端点の結合確率密度制約を伴うランベルト問題は一般化された最適質量輸送(OMT)問題であることを示し、これによりこの古典的な宇宙力学の問題を、現代の確率制御と確率的機械学習において急速に発展している研究領域と結びつける。この新たな接続により、確率的ランベルト問題の解の存在と一意性を厳密に確立できる。同じ接続は、拡散正則化によって、すなわち OMT とシュレーディンガー橋問題(Schrödinger bridge problem, SBP)とのさらなる接続を利用して、確率的ランベルト問題を数値的に解くのにも役立つ。これはまた、加法的な動的プロセスノイズを伴う確率的ランベルト問題が実際には一般化されたSBPであり、本研究で行うように、いわゆるシュレーディンガー因子を用いて数値的に解けることを示している。この解析から、非線形重力ポテンシャルが反応速度として現れる、境界で結合した反応拡散PDE系を解くことに帰着することを説明する。そのための新しいアルゴリズムを提案し、例示的な数値結果を示す。我々の解析とアルゴリズムの枠組みはノンパラメトリックであり、統計的な近似(例えば、ガウス分布、最初の数個のモーメント、混合分布または指数型分布族、十分統計量の有限次元性)も動力学的な近似(例えば、テイラー級数)も行わない。

Lambert's problem concerns transferring a spacecraft from a given initial to a given terminal position within prescribed flight time via velocity control subject to a gravitational force field. We consider a probabilistic variant of the Lambert problem where the knowledge of the endpoint constraints in position vectors is replaced by the knowledge of their respective joint probability density functions. We show that the Lambert problem with endpoint joint probability density constraints is a generalized optimal mass transport (OMT) problem, thereby connecting this classical astrodynamics problem with a burgeoning area of research in modern stochastic control and stochastic machine learning. This newfound connection allows us to rigorously establish the existence and uniqueness of solution for the probabilistic Lambert problem. The same connection also helps to numerically solve the probabilistic Lambert problem via diffusion regularization, i.e., by leveraging further connection of the OMT with the Schrödinger bridge problem (SBP). This also shows that the probabilistic Lambert problem with additive dynamic process noise is in fact a generalized SBP, and can be solved numerically using the so-called Schrödinger factors, as we do in this work. We explain how the resulting analysis leads to solving a boundary-coupled system of reaction-diffusion PDEs where the nonlinear gravitational potential appears as the reaction rate. We propose novel algorithms for the same, and present illustrative numerical results. Our analysis and the algorithmic framework are nonparametric, i.e., we make neither statistical (e.g., Gaussian, first few moments, mixture or exponential family, finite dimensionality of the sufficient statistic) nor dynamical (e.g., Taylor series) approximations.
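To give a feel for how a regularized transport problem is solved with a pair of scaling factors, the sketch below runs Sinkhorn iterations for a discrete, static, entropy-regularized optimal transport problem. This is only a toy analogue: it contains no gravitational potential, no dynamics, and no PDE solve, and it is not the algorithm proposed in the paper. The cost matrix, the regularization strength `eps`, and the function name are assumptions chosen for illustration; the vectors `u` and `v` play a role loosely comparable to the Schrödinger factors mentioned above.

```python
import math


def sinkhorn(mu, nu, cost, eps, iters=500):
    """Entropy-regularized optimal transport between discrete marginals.

    mu, nu : lists of positive weights that each sum to 1
    cost   : cost[i][j] of moving mass from source i to target j
    eps    : regularization strength (larger means a more diffuse plan)
    Returns the transport plan as a nested list.
    """
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

As `eps` shrinks the plan concentrates toward an unregularized optimal transport plan, at the price of slower convergence and a risk of numerical underflow in `kernel`.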

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# AIの助言による画像ラベリングにおけるコンフォーマル予測セットの有用性の評価

Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling ( http://arxiv.org/abs/2401.08876v5 )

ライセンス: Link先を確認

Dongping Zhang, Angelos Chatzimparmpas, Negar Kamali, Jessica Hullman,

(参考訳) ディープニューラルネットワークがリスクの高い領域に展開されることが増えるにつれ、そのブラックボックス性が不確実性の定量化を困難にしている。本稿では,AIの助言を受ける意思決定において不確実性を表現するために、共形予測セット(指定したカバレッジを持つ予測セットを生成する、分布によらない手法のクラス)を提示することの効果を検討する。大規模なオンライン実験を通じて、AIの助言による画像ラベリングにおいて、共形予測セットの有用性をTop-1およびTop-k予測の表示と比較する。事前登録した分析では,精度に対する予測セットの有用性はタスクの難易度によって変化することがわかった。容易な画像ではTop-1やTop-kの表示と同等かそれ以下の精度となる一方で、分布外(OOD)画像のラベル付けでは、特にセットサイズが小さい場合に、予測セットが人間を支援するうえで優れている。本研究の結果は,共形預測セットの実用上の課題を実証的に特定し,実世界の意思決定に組み込む方法への示唆を与える。

As deep neural networks are more commonly deployed in high-stakes domains, their black-box nature makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets--a distribution-free class of methods for generating prediction sets with specified coverage--to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-1 and Top-k predictions for AI-advised image labeling. In a pre-registered analysis, we find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-1 and Top-k displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint practical challenges of conformal prediction sets and provide implications on how to incorporate them for real-world decision-making.
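For readers unfamiliar with how a conformal prediction set is formed, here is a minimal sketch of split conformal prediction for classification. The abstract does not say which conformal variant or score the study used, so the nonconformity score (one minus the probability of the true label), the function names, and the data layout are assumptions of this example.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold on held-out data.

    cal_probs  : list of per-class probability lists from any classifier
    cal_labels : true class index for each calibration example
    alpha      : target miscoverage rate (e.g. 0.1 for 90% coverage)
    """
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))   # 1-indexed conformal quantile
    if rank > n:
        return 1.0   # too few calibration points: every label is included
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return every label whose score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]
```

A confident prediction yields a small set and an uncertain one yields a larger set, which is the behaviour the study compares against fixed-size Top-1 and Top-k displays.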

翻訳日:2024-03-20 23:41:33 公開日:2024-03-18

# 自動ファクトチェックのためのクレーム検出:単言語・多言語・言語横断研究に関する調査

Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research ( http://arxiv.org/abs/2401.11969v3 )

ライセンス: Link先を確認

Rrubaa Panchendrarajan, Arkaitz Zubiaga,

(参考訳) オンラインプラットフォーム上での誤情報拡散の増加により,過去数十年間,ファクトチェックの自動化が大きな注目を集めている。これは多くの場合、(i) オンラインプラットフォーム上で流通する文のうち、検証を必要とするクレームに当たるものの検出と、それに続く (ii) それらのクレームの検証プロセスからなる一連のタスクとして実行される。本調査は前者に焦点を当て、ファクトチェックを必要とするクレームを検出するための既存の取り組みを、特に多言語のデータと手法に重点を置いて論じる。これは困難だが実りの多い方向であり、問題の本質的な難しさのために、既存の手法は人間の性能にまだ遠く及ばない。特に、複数の言語やモダリティで表現された情報が複数のソーシャルプラットフォームにまたがって拡散することから、誤情報と戦うためにより一般化された解決策が求められる。多言語の誤情報に着目し,既存の多言語クレーム検出研究を包括的に調査する。最先端の多言語クレーム検出研究を,問題の3つの重要な要因である検証可能性,優先度,類似性に分類して示す。さらに,既存の多言語データセットの詳細な概要と課題を示し,今後の発展の可能性を提案する。

Automated fact-checking has drawn considerable attention over the past few decades due to the increase in the diffusion of misinformation on online platforms. This is often carried out as a sequence of tasks comprising (i) the detection of sentences circulating in online platforms which constitute claims needing verification, followed by (ii) the verification process of those claims. This survey focuses on the former, by discussing existing efforts towards detecting claims needing fact-checking, with a particular focus on multilingual data and methods. This is a challenging and fertile direction where existing methods are yet far from matching human performance due to the profoundly challenging nature of the issue. Especially, the dissemination of information across multiple social platforms, articulated in multiple languages and modalities demands more generalized solutions for combating misinformation. Focusing on multilingual misinformation, we present a comprehensive survey of existing multilingual claim detection research. We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity. Further, we present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 機械学習とシンボリック手法の融合:自然言語処理へのハイブリッドアプローチに関する調査

Synergizing Machine Learning & Symbolic Methods: A Survey on Hybrid Approaches to Natural Language Processing ( http://arxiv.org/abs/2401.11972v2 )

ライセンス: Link先を確認

Rrubaa Panchendrarajan, Arkaitz Zubiaga,

(参考訳) 機械学習とシンボリックアプローチの進歩は、自然言語処理(NLP)におけるその強みと弱点を裏付けている。機械学習のアプローチはデータのパターンを特定するのに強力だが、コモンセンスとNLPタスクに必要な事実知識の学習には不足することが多い。一方、記号的手法は知識に富んだデータを表現するのに優れている。しかし、彼らは動的データに適応し、知識を一般化するのに苦労している。これら2つのパラダイムをハイブリッドアプローチでブリッジすることで、強みを保ちながら両方の弱点を緩和することができる。近年の研究は、様々なNLPタスクにおいて有望な結果を示しながら、この連合の長所を誇示している。本稿では,NLPにおけるハイブリッドアプローチの概要について述べる。具体的には、自然言語理解、生成、推論を必要とする幅広いNLPタスクに使用される最先端のハイブリッドアプローチについて検討する。さらに,NLPのハイブリッド手法として利用可能な既存の資源と課題と今後の方向性について論じ,今後の研究のロードマップを提供する。

The advancement of machine learning and symbolic approaches have underscored their strengths and weaknesses in Natural Language Processing (NLP). While machine learning approaches are powerful in identifying patterns in data, they often fall short in learning commonsense and the factual knowledge required for the NLP tasks. Meanwhile, the symbolic methods excel in representing knowledge-rich data. However, they struggle to adapt dynamic data and generalize the knowledge. Bridging these two paradigms through hybrid approaches enables the alleviation of weaknesses in both while preserving their strengths. Recent studies extol the virtues of this union, showcasing promising results in a wide range of NLP tasks. In this paper, we present an overview of hybrid approaches used for NLP. Specifically, we delve into the state-of-the-art hybrid approaches used for a broad spectrum of NLP tasks requiring natural language understanding, generation, and reasoning. Furthermore, we discuss the existing resources available for hybrid approaches for NLP along with the challenges and future directions, offering a roadmap for future research avenues.

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 超伝導量子ビットにおけるコヒーレント2レベル系の離散電荷状態の観測

Observation of discrete charge states of a coherent two-level system in a superconducting qubit ( http://arxiv.org/abs/2401.12183v2 )

ライセンス: Link先を確認

Bao-Jie Liu, Ying-Ying Wang, Tal Sheffer, Chen Wang,

(参考訳) 我々は、オフセット電荷に敏感な超伝導トランスモン量子ビットに強く結合したコヒーレントな誘電体2準位系(TLS)の離散電荷状態の観測を報告する。遷移周波数が2.9 GHz、緩和時間が3 msを超える2つのTLS固有状態に関連する、0.072$e$のオフセット電荷を測定する。強分散領域と共鳴領域における測定を組み合わせることで、TLSと量子ビットの相互作用の横結合と縦結合の両方を定量化する。さらにTLS遷移と準粒子トンネルダイナミクスの同時追跡を行うが,本質的な相関は見つからない。本研究は、マイクロ波周波数のTLSが低周波電荷ノイズの発生源であることを示す。

We report observations of discrete charge states of a coherent dielectric two-level system (TLS) that is strongly coupled to an offset-charge-sensitive superconducting transmon qubit. We measure an offset charge of 0.072$e$ associated with the two TLS eigenstates, which have a transition frequency of 2.9 GHz and a relaxation time exceeding 3 ms. Combining measurements in the strong dispersive and resonant regime, we quantify both transverse and longitudinal couplings of the TLS-qubit interaction. We further perform joint tracking of TLS transitions and quasiparticle tunneling dynamics but find no intrinsic correlations. This study demonstrates microwave-frequency TLS as a source of low-frequency charge noise.

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 人間のフィードバックによる機械翻訳の改善--リワードモデルによる品質評価の探索

Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model ( http://arxiv.org/abs/2401.12873v3 )

ライセンス: Link先を確認

Zhiwei He, Xing Wang, Wenxiang Jiao, Zhuosheng Zhang, Rui Wang, Shuming Shi, Zhaopeng Tu,

(参考訳) 報酬モデルにおける人間の嗜好の不十分なモデリングは、人間のフィードバックを活用して翻訳品質を向上させる上で大きな障害となる。幸いなことに、ある翻訳の品質を基準なしに予測する品質評価(QE)は、過去2年間に人間の評価と顕著に一致している。本研究では,QEモデルを報酬モデルとして活用し,フィードバックトレーニングにおける人間の嗜好を予測する可能性について検討する。まず,QEに基づくフィードバックトレーニングにおいて,翻訳品質が低下する中で,報酬の増大として現れる過度な最適化問題を同定した。この問題を検証し,QEモデルの脆弱性は誤訳に対して高い報奨を与える可能性があり,過度な最適化と誤りの伝播をもたらすと論じる。この問題に対処するために、ヒューリスティックなルールを用いて誤った翻訳を検知し、報酬のスコアにペナルティ項を割り当てる、単純で効果的な手法を採用する。実験の結果,提案したQEに基づくフィードバックトレーニングは,様々な設定において一貫した,重要な改善を達成し,さらに人間の嗜好研究を通じて検証された。続く分析では、QEに基づくフィードバックトレーニングの高効率性を実証し、少量のモノリンガルデータにより、より大きな並列コーパスを用いたシステムより優れていることを示す。私たちのコードは、https://github.com/zwhe99/FeedbackMTで利用可能です。

Insufficient modeling of human preferences within the reward model is a major obstacle for leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect the incorrect translations and assigns a penalty term to the reward scores of them. Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: it outperforms systems using larger parallel corpora by a small amount of monolingual data. Our code is available at: https://github.com/zwhe99/FeedbackMT
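The penalty idea in the abstract can be pictured with a small sketch. The paper's actual heuristic rules are not given here, so the two rules below (empty output and an implausible length ratio), the thresholds, the penalty size, and the function names are all hypothetical stand-ins, and `qe_score` is assumed to come from some external quality estimation model.

```python
def looks_incorrect(source, translation, min_ratio=0.3, max_ratio=3.0):
    """Hypothetical heuristic: flag empty or badly length-mismatched output."""
    src_len = len(source.split())
    hyp_len = len(translation.split())
    if hyp_len == 0:
        return True
    if src_len == 0:
        return False
    ratio = hyp_len / src_len
    return ratio < min_ratio or ratio > max_ratio


def penalized_reward(qe_score, source, translation, penalty=1.0):
    """Subtract a fixed penalty from the QE score of a flagged translation."""
    if looks_incorrect(source, translation):
        return qe_score - penalty
    return qe_score
```

The intent is that a translation the QE model over-rewards but a cheap rule can flag no longer looks attractive during feedback training, which is one way to counter the overoptimization the authors describe.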

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 時間依存力学学習におけるリッチフロー誘導オートエンコーダ

Ricci flow-guided autoencoders in learning time-dependent dynamics ( http://arxiv.org/abs/2401.14591v5 )

ライセンス: Link先を確認

Andrew Gracyk,

(参考訳) 本稿では, 時間発展する非線形力学, 特に偏微分方程式 (PDE) を学習するための多様体ベースのオートエンコーダ法を提案する。この手法では、多様体である潜在空間がリッチフローに従って発展する。これは、物理情報に基づく設定でリッチフローをシミュレートすることで実現でき、リッチフローが経験的に達成されるように多様体の諸量を一致させることができる。我々の方法論では、多様体が訓練手順の一部として学習されるため、理想的な幾何構造を見いだせる可能性があり、同時にこの発展は、静的な手法よりも適応性の高い潜在表現をもたらす。周期性やランダム性といった望ましい特徴を含むPDEからなる一連の数値実験で本手法を示し、分布内および外挿のシナリオにおける誤差について述べる。

We present a manifold-based autoencoder method for learning nonlinear dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by simulating Ricci flow in a physics-informed setting, and manifold quantities can be matched so that Ricci flow is empirically achieved. With our methodology, the manifold is learned as part of the training procedure, so ideal geometries may be discerned, while the evolution simultaneously induces a more accommodating latent representation over static methods. We present our method on a range of numerical experiments consisting of PDEs that encompass desirable characteristics such as periodicity and randomness, remarking error on in-distribution and extrapolation scenarios.

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 組合せ最適化のための注意に基づく強化学習:ジョブショップスケジューリング問題への応用

Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem ( http://arxiv.org/abs/2401.16580v2 )

ライセンス: Link先を確認

Jaejin Lee, Seho Kee, Mani Janakiram, George Runger,

(参考訳) ジョブショップスケジューリング問題は、組合せ最適化問題の中でも重要かつ複雑な部類であり、従来は厳密解法または近似解法によって扱われてきた。しかし、現実の問題の複雑さのために、これらの解法の実用的な適用はしばしば困難である。近似解法を用いる場合であっても、準最適解を見つけるのに必要な時間は法外に長くなることがあり、得られた解は一般に新しい問題には適用できない。本研究では,ジョブショップスケジューリング問題に特化して設計された,注意機構に基づく新しい強化学習手法を提案する。この手法は、方策勾配による強化学習アプローチと、改良したトランスフォーマーアーキテクチャを統合する。本研究の重要な発見は、提案手法で訓練した学習器が、初期の訓練セットに含まれない大規模な問題に再利用できることである。さらに,本手法が最近の研究の結果を上回り,広く用いられているヒューリスティックルールよりも優れていることを示す実証的証拠が得られた。このことから,本手法は,ジョブショップスケジューリング問題の分野における今後の研究と実用化に向けて有望な道筋を示すものである。

Job shop scheduling problems represent a significant and complex facet of combinatorial optimization problems, which have traditionally been addressed through either exact or approximate solution methodologies. However, the practical application of these solutions is often challenged due to the complexity of real-world problems. Even when utilizing an approximate solution approach, the time required to identify a near-optimal solution can be prohibitively extensive, and the solutions derived are generally not applicable to new problems. This study proposes an innovative attention-based reinforcement learning method specifically designed for the category of job shop scheduling problems. This method integrates a policy gradient reinforcement learning approach with a modified transformer architecture. A key finding of this research is the ability of our trained learners within the proposed method to be repurposed for larger-scale problems that were not part of the initial training set. Furthermore, empirical evidence demonstrates that our approach surpasses the results of recent studies and outperforms commonly implemented heuristic rules. This suggests that our method offers a promising avenue for future research and practical application in the field of job shop scheduling problems.

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# 実行可能なコードアクションはより優れたLLMエージェントを引き出す

Executable Code Actions Elicit Better LLM Agents ( http://arxiv.org/abs/2402.01030v2 )

ライセンス: Link先を確認

Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji,

(参考訳) 大規模言語モデル(LLM)エージェントは、ツールの呼び出しやロボットの制御など幅広いアクションを実行でき、現実世界の課題への取り組みにおいて大きな可能性を示している。LLMエージェントは通常、事前に定義されたフォーマットのJSONやテキストを生成することでアクションを出力するよう促されるが、この方式は通常、制約されたアクション空間(例えば、事前定義されたツールの範囲)と限られた柔軟性(例えば、複数のツールを組み合わせられないこと)によって制限される。本研究は、実行可能なPythonコードを用いて、LLMエージェントのアクションを統一されたアクション空間(CodeAct)に統合することを提案する。Pythonインタプリタと統合されたCodeActは、コードアクションを実行し、マルチターンのインタラクションを通じて、新たな観測に応じて以前のアクションを動的に修正したり、新しいアクションを発行したりできる。API-Bankと新たにキュレートしたベンチマークにおける17のLLMの広範な分析は、CodeActが広く使われている代替手法を上回ること(成功率が最大20%高い)を示している。CodeActの有望な性能は、解釈可能なコードを実行して環境と対話し、自然言語を使ってユーザと協調する、オープンソースのLLMエージェントを構築する動機となる。この目的のために,CodeActを用いた7kのマルチターンインタラクションからなる命令チューニングデータセット CodeActInstruct を収集する。このデータセットを既存のデータと併用することで、汎用的な能力を損なうことなく、エージェント指向タスクにおけるモデルを改善できることを示す。Llama2とMistralから微調整されたCodeActAgentはPythonインタプリタと統合されており、既存のライブラリを使って高度なタスク(例えばモデルの訓練)を実行し、自律的に自己デバッグするように調整されている。

Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
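The core loop of "code as action" can be sketched in a few lines: the agent emits Python source, an interpreter runs it, and whatever is printed (or the traceback, on failure) is returned as the next observation. This is a bare illustration rather than the CodeAct implementation; the function name and the shared `namespace` dictionary are assumptions of this example, and a real system would run the code in a sandbox instead of calling `exec` in-process.

```python
import contextlib
import io
import traceback


def run_code_action(code, namespace):
    """Execute one code action and return the observation text.

    code      : Python source produced by the agent
    namespace : dict that persists variables across turns
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)          # unsafe outside a sandbox
    except Exception:
        # The traceback becomes the observation, so the agent can self-debug.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()
```

Because `namespace` is reused between calls, a later action can build on variables defined by an earlier one, and an error message from one turn can inform the revised action in the next.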

翻訳日:2024-03-20 23:31:36 公開日:2024-03-18

# LHRS-Bot:VGI強化大規模マルチモーダル言語モデルを用いたリモートセンシング

LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model ( http://arxiv.org/abs/2402.02544v3 )

ライセンス: Link先を確認

Dilxat Muhtar, Zhenshi Li, Feng Gu, Xueliang Zhang, Pengfeng Xiao,

(参考訳) 大規模言語モデル(LLM)の革命的能力は、マルチモーダルな大規模言語モデル(MLLM)の道を切り開き、様々な専門分野にまたがる多様な応用を育んでいる。しかし、リモートセンシング(RS)分野では、最近のMLLMでは、多様な地形やRS画像の様々な物体が適切に考慮されていない。このギャップを埋めるために、大規模なRS画像テキストデータセットであるLHRS-Alignと情報的RS固有の命令データセットであるLHRS-Instructを構築し、大規模なボランティア地理情報(VGI)とグローバルに利用可能なRS画像を活用する。この基盤の上に構築されたLHRS-Botは、新しい多段階視覚言語アライメント戦略とカリキュラム学習手法により、RS画像理解に適したMLLMである。さらに、RS画像理解におけるMLLMの能力を徹底的に評価するベンチマークであるLHRS-Benchを紹介する。総合的な実験により、LHRS-BotはRS画像の深い理解と、RS領域内でニュアンス推論を行う能力を示すことが示された。

The revolutionary capabilities of large language models (LLMs) have paved the way for multimodal large language models (MLLMs) and fostered diverse applications across various specialized domains. In the remote sensing (RS) field, however, the diverse geographical landscapes and varied objects in RS imagery are not adequately considered in recent MLLM endeavors. To bridge this gap, we construct a large-scale RS image-text dataset, LHRS-Align, and an informative RS-specific instruction dataset, LHRS-Instruct, leveraging the extensive volunteered geographic information (VGI) and globally available RS images. Building on this foundation, we introduce LHRS-Bot, an MLLM tailored for RS image understanding through a novel multi-level vision-language alignment strategy and a curriculum learning method. Additionally, we introduce LHRS-Bench, a benchmark for thoroughly evaluating MLLMs' abilities in RS image understanding. Comprehensive experiments demonstrate that LHRS-Bot exhibits a profound understanding of RS images and the ability to perform nuanced reasoning within the RS domain.

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# パラメータフリー確率最適化はどのくらい自由か?

How Free is Parameter-Free Stochastic Optimization? ( http://arxiv.org/abs/2402.03126v2 )

ライセンス: Link先を確認

Amit Attia, Tomer Koren,

(参考訳) パラメータフリー確率最適化の問題について検討し,完全にパラメータフリーな手法が存在するかどうか,また存在するならばどのような条件下かを問う。ここでいう完全にパラメータフリーな手法とは,真の問題パラメータに関する重要な知識を必要とせずに,最適に調整された手法と競合する収束率を達成する手法である。既存のパラメータフリーな手法は,確率的勾配ノルムの上界や最小点までの距離の上界など,真の問題パラメータに関する非自明な知識を必要とするため,「部分的に」パラメータフリーであるとしかみなせない。非凸設定では,単純なハイパーパラメータ探索手法により,より洗練された最先端のアルゴリズムを上回る完全にパラメータフリーな手法が得られることを示す。また,雑音を含む関数値にアクセスできる凸設定でも,緩やかな雑音の仮定の下で同様の結果を与える。最後に,確率的勾配へのアクセスのみを仮定した場合に,完全にパラメータフリーな確率的凸最適化を実現不可能とする下界を確立し,この下界が示す限界まで(部分的に)パラメータフリーな手法を提案する。

We study the problem of parameter-free stochastic optimization, inquiring whether, and under what conditions, fully parameter-free methods exist: these are methods that achieve convergence rates competitive with optimally tuned methods, without requiring significant knowledge of the true problem parameters. Existing parameter-free methods can only be considered "partially" parameter-free, as they require some non-trivial knowledge of the true problem parameters, such as a bound on the stochastic gradient norms or a bound on the distance to a minimizer. In the non-convex setting, we demonstrate that a simple hyperparameter search technique results in a fully parameter-free method that outperforms more sophisticated state-of-the-art algorithms. We also provide a similar result in the convex setting with access to noisy function values under mild noise assumptions. Finally, assuming only access to stochastic gradients, we establish a lower bound that renders fully parameter-free stochastic convex optimization infeasible, and provide a method which is (partially) parameter-free up to the limit indicated by our lower bound.

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# 2層ネットワークの勾配降下法におけるバッチ再利用の利点:情報指数と跳躍指数の呪いを破る

The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents ( http://arxiv.org/abs/2402.03220v2 )

ライセンス: Link先を確認

Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala,

(参考訳) マルチインデックスターゲット関数を学習する際の2層ニューラルネットワークのトレーニングダイナミクスについて検討する。本稿では,バッチを複数回再利用するマルチパス勾配降下法(GD)に着目し,シングルパス勾配降下法と比べて,どの関数が学習可能かについての結論が大きく変わることを示す。特に,有限のステップサイズをもつマルチパスGDは,目標関数の情報指数 (Ben Arous et al., 2021) と跳躍指数 (Abbe et al., 2023) によって与えられる勾配流とシングルパスGDの限界を克服する。バッチを再利用することで,階段特性 (Abbe et al., 2021) を満たさない関数に対しても,ネットワークがわずか2ステップで目標部分空間との重なりを達成することを示す。また,有限時間で効率的に学習される(広い)関数のクラスを特徴づける。これらの結果の証明は,動的平均場理論(DMFT)の解析に基づいている。さらに,重みの低次元射影の動的過程の閉形式記述と,理論を例示する数値実験を示す。

We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of the Dynamical Mean-Field Theory (DMFT). We further provide a closed-form description of the dynamical process of the low-dimensional projections of the weights, and numerical experiments illustrating the theory.

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# Retrieve to Explain: 言語モデルによるエビデンス駆動予測

Retrieve to Explain: Evidence-driven Predictions with Language Models ( http://arxiv.org/abs/2402.04068v2 )

ライセンス: Link先を確認

Ravi Patel, Angus Brayne, Rogier Hintzen, Daniel Jaroslawicz, Georgiana Neculae, Dane Corneil,

(参考訳) 機械学習モデル,特に言語モデルは,内部の挙動を調べることが難しいことで知られている。ブラックボックスモデルは,モデルトレーニングの問題と有害なバイアスの両方を覆い隠しうる。ヒューマン・イン・ザ・ループのプロセスでは,不透明な予測は信頼の欠如を招き,モデルが有効に機能している場合でもその影響力を制限する。これらの問題に対処するために,Retrieve to Explain (R2E)を紹介する。R2Eは検索に基づく言語モデルであり,文書コーパス内のエビデンスに基づいて,研究上の問いに対するあらかじめ定義された回答候補に優先順位を付け,シャープレイ値を用いて最終的な予測に対する各エビデンスの相対的重要性を特定する。R2Eは,再訓練することなく新しいエビデンスに適応でき,自然言語へのテンプレート化を通じて構造化データを組み込むことができる。公開された科学文献からの創薬標的同定というユースケースで評価を行い,本モデルが臨床試験結果の予測において業界標準の遺伝学に基づくアプローチを上回ることを示す。

Machine learning models, particularly language models, are notoriously difficult to introspect. Black-box models can mask both issues in model training and harmful biases. For human-in-the-loop processes, opaque predictions can drive lack of trust, limiting a model's impact even when it performs effectively. To address these issues, we introduce Retrieve to Explain (R2E). R2E is a retrieval-based language model that prioritizes amongst a pre-defined set of possible answers to a research question based on the evidence in a document corpus, using Shapley values to identify the relative importance of pieces of evidence to the final prediction. R2E can adapt to new evidence without retraining, and incorporate structured data through templating into natural language. We assess on the use case of drug target identification from published scientific literature, where we show that the model outperforms an industry-standard genetics-based approach on predicting clinical trial outcomes.
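To make the attribution step concrete, here is a minimal sketch of exact Shapley values over a small set of evidence items. It is not R2E's implementation: the scoring function `toy_score`, the evidence names, and the additive weights are all hypothetical, chosen only so the result is easy to check by hand. In a real retrieval model, the value function would be the model's prediction given a subset of retrieved evidence.

```python
from itertools import combinations
from math import factorial


def shapley_values(items, value_fn):
    """Exact Shapley value of each item under value_fn(frozenset_of_items).

    Cost grows as 2^n, so this is only practical for a handful of items.
    """
    n = len(items)
    values = {}
    for item in items:
        others = [x for x in items if x != item]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain `item`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of `item` when added to coalition s.
                total += weight * (value_fn(s | {item}) - value_fn(s))
        values[item] = total
    return values


# Hypothetical evidence items with additive contributions to a prediction score.
EVIDENCE_WEIGHT = {"paper_a": 0.5, "paper_b": 0.25, "paper_c": 0.0}


def toy_score(subset):
    """Stand-in for a model prediction given a subset of evidence (assumption)."""
    return sum(EVIDENCE_WEIGHT[e] for e in subset)


phi = shapley_values(list(EVIDENCE_WEIGHT), toy_score)
for name, value in phi.items():
    print(name, round(value, 6))
```

Because the toy score is additive, each item's Shapley value equals its own weight, and the values sum to the score of the full evidence set; with a non-additive model the same routine would split interaction effects evenly among the items involved.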

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# 目の前に潜む脅威:脆弱な患者集団に対する検出不能な敵対的バイアス攻撃

Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations ( http://arxiv.org/abs/2402.05713v2 )

ライセンス: Link先を確認

Pranav Kulkarni, Andrew Chan, Nithya Navarathna, Skylar Chan, Paul H. Yi, Vishwa S. Parekh,

(参考訳) 放射線医学における人工知能(AI)の普及は,深層学習(DL)モデルが脆弱な患者集団に対する臨床的バイアスを悪化させるリスクを明らかにしている。従来の文献は訓練済みDLモデルが示すバイアスの定量化に焦点を当ててきたが,DLモデルに対する人口統計学的に標的を絞った敵対的バイアス攻撃と,その臨床環境における影響は,医用画像研究において未開拓の分野として残されている。本研究では,人口統計学的に標的を絞ったラベルポイズニング攻撃が,DLモデルに検出不能な過少診断バイアスをもたらしうることを示す。性別,年齢,およびそれらの交差サブグループといった人口統計群と複数の性能指標にわたる結果から,敵対的バイアス攻撃は,モデル全体の性能に影響を与えることなく対象群に対するモデルの性能を低下させることで,対象群におけるバイアスに対して高い選択性を示すことがわかった。さらに,敵対的バイアス攻撃によって生じたバイアスのあるDLモデルは,外部データセットで評価した場合でも予測バイアスを伝播することが示された。

The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and their implications in the clinical environment remain an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# チューニング不要確率最適化

Tuning-Free Stochastic Optimization ( http://arxiv.org/abs/2402.07793v2 )

ライセンス: Link先を確認

Ahmed Khaled, Chi Jin,

(参考訳) 大規模な機械学習の問題では,ハイパーパラメータチューニングのコストがますます法外なものになっている。このため,自らをオンザフライで調整できるアルゴリズムが必要とされている。我々は,関連する問題パラメータに関する大まかなヒントのみが与えられたときに,最適に調整された最適化アルゴリズムの性能に多重対数因子の範囲内で一致しうる「チューニングフリー」アルゴリズムの概念を定式化する。特に,最適に調整された確率的勾配降下法(SGD)に一致しうるアルゴリズムを考察する。最適化領域が有界である場合,SGDへのチューニングフリーな一致は可能であり,いくつかの既存アルゴリズムによって達成されることを示す。凸で滑らかな,またはリプシッツな関数を非有界領域上で最小化するタスクでは,チューニングフリーな最適化は不可能であることを証明する。非有界領域上でもチューニングフリーな最適化が可能となる条件について論じる。特に,最近提案されたDoGアルゴリズムとDoWGアルゴリズムは,ノイズ分布が十分に良い振る舞いをする場合,チューニングフリーであることを示す。滑らかで非凸でありうる関数の停留点を求めるタスクに対しては,調整されたSGDの既知の最良の高確率収束率に,追加の多重対数コストのみで一致するSGDの変種を与える。一方で,調整されたSGDの最適な期待収束率に高確率で一致しうるアルゴリズムは存在しないことを示す不可能性の結果も与える。

Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). When the domain of optimization is bounded, we show tuning-free matching of SGD is possible and achieved by several existing algorithms. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show that the recently proposed DoG and DoWG algorithms are tuning-free when the noise distribution is sufficiently well-behaved. For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost. However, we also give an impossibility result that shows no algorithm can hope to match the optimal expected convergence rate for tuned SGD with high probability.

翻訳日:2024-03-20 23:21:52 公開日:2024-03-18

# CiMNet:DNNアーキテクチャとコンピュート・イン・メモリハードウェアの構成を共同で最適化する

CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware ( http://arxiv.org/abs/2402.11780v2 )

ライセンス: Link先を確認

Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney,

(参考訳) 近年の大規模ディープニューラルネットワークの需要増加に伴い,コンピュート・イン・メモリ(CiM)は,フォン・ノイマン型アーキテクチャを制約する帯域幅とオンチップ相互接続のボトルネックを緩和する有力なソリューションとして浮上している。しかし,各インタフェースにおけるキャッシュサイズやメモリ帯域幅といった特定のメモリ階層が,テンソル次元や演算強度などのニューラルネットワークの属性と理想的に一致するとは限らないため,CiMハードウェアの構築は難しく,準最適で性能の低いシステムにつながる。ニューラルアーキテクチャサーチ(NAS)技術は,所定のハードウェアメトリック予算(例えば,DNNの実行時間やレイテンシ)に対して効率的なサブネットワークを得ることに成功しているが,ハードウェア構成が固定されていることを前提としており,与えられた予算に対して準最適なサブネットワークしか得られないことが多い。本稿では,CiMアーキテクチャのための最適なサブネットワークとハードウェア構成を共同で探索し,下流タスクの精度と実行メトリック(例えば,レイテンシ)のパレート最適フロンティアを構築するフレームワークであるCiMNetを提案する。提案フレームワークは,サブネットワークの性能と,帯域幅,処理要素サイズ,メモリサイズを含むCiMハードウェア構成の選択との間の複雑な相互作用を捉えることができる。CNNとTransformerの両ファミリーの異なるモデルアーキテクチャに対する網羅的な実験により,協調最適化されたサブネットワークとCiMハードウェア構成を見つけるうえでCiMNetが有効であることを示す。具体的には,ベースラインのViT-Bと同程度のImageNet分類精度において,モデルアーキテクチャのみを最適化すると性能が1.7倍に向上(すなわちワークロードの実行時間が短縮)し,モデルアーキテクチャとハードウェア構成の両方を最適化すると3.1倍に向上する。

With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has come up as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may not be ideally matched to any neural network's attributes such as tensor dimension and arithmetic intensity, thus leading to suboptimal and under-performing systems. Despite the success of neural architecture search (NAS) techniques in yielding efficient sub-networks for a given hardware metric budget (e.g., DNN execution time or latency), they assume the hardware configuration to be frozen, often yielding sub-optimal sub-networks for a given budget. In this paper, we present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for CiM architectures, creating a Pareto optimal frontier of downstream task accuracy and execution metrics (e.g., latency). The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices including bandwidth, processing element size, and memory size. Exhaustive experiments on different model architectures from both CNN and Transformer families demonstrate the efficacy of CiMNet in finding co-optimized sub-networks and CiM hardware configurations. Specifically, for similar ImageNet classification accuracy as baseline ViT-B, optimizing only the model architecture increases performance (or reduces workload execution time) by 1.7x while optimizing for both the model architecture and hardware configuration increases it by 3.1x.

翻訳日:2024-03-20 23:12:03 公開日:2024-03-18

# MVDiffusion++:シングル・スパース・ビュー3次元オブジェクト再構成のための高分解能多視点拡散モデル

MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction ( http://arxiv.org/abs/2402.12712v2 )

ライセンス: Link先を確認

Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan,

(参考訳) 本稿では,カメラポーズのない1枚または数枚の画像から,オブジェクトの高密度かつ高解像度のビューを合成する,3次元オブジェクト再構成のためのニューラルアーキテクチャMVDiffusion++を提案する。MVDiffusion++は,2つの驚くほどシンプルなアイデアによって優れた柔軟性とスケーラビリティを実現する。 1) 「ポーズフリーアーキテクチャ」:2次元潜在特徴間の標準的な自己注意が,カメラポーズ情報を明示的に用いることなく,任意の数の条件ビューと生成ビューにわたる3次元の一貫性を学習する。 2) 「ビュードロップアウト戦略」:トレーニング中に相当数の出力ビューを破棄することで,トレーニング時のメモリフットプリントを削減し,テスト時に高密度かつ高解像度のビュー合成を可能にする。トレーニングにはObjaverseを,評価にはGoogle Scanned Objectsを用い,標準的な新規ビュー合成と3D再構成の指標において,MVDiffusion++は現在の最先端手法を大幅に上回る。また,MVDiffusion++とテキストから画像への生成モデルを組み合わせたテキストから3Dへの応用例も示す。プロジェクトページはhttps://mvdiffusion-plusplus.github.ioにある。

This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A "pose-free architecture" where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time. We use Objaverse for training and Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model. The project page is at https://mvdiffusion-plusplus.github.io.

翻訳日:2024-03-20 23:12:03 公開日:2024-03-18

# 視覚的位置認識のための事前学習モデルのシームレス適応に向けて

Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition ( http://arxiv.org/abs/2402.14505v2 )

ライセンス: Link先を確認

Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan,

(参考訳) 近年の研究では,大規模データを用いた汎用的な視覚学習タスクで事前学習された視覚モデルが,幅広い視覚知覚問題に有用な特徴表現を提供できることが示されている。しかし,視覚的位置認識(VPR)において事前学習済みの基盤モデルを活用する試みはほとんど行われていない。モデルの事前学習とVPRの間には学習目的とデータに本質的な違いがあるため,このギャップをどのように埋め,VPRのために事前学習モデルの能力を十分に引き出すかは,依然として取り組むべき重要な課題である。そこで本研究では,VPRのための事前学習モデルのシームレスな適応を実現する新しい手法を提案する。具体的には,場所の識別に有用な顕著なランドマークに注目したグローバル特徴とローカル特徴の両方を得るために,グローバル適応とローカル適応の両方を効率的に実現するハイブリッド適応法を設計する。この方法では,事前学習モデルを調整することなく軽量なアダプタのみをチューニングする。また,効果的な適応を導くために,相互最近傍ローカル特徴損失を提案する。これにより,ローカルマッチングに適した密なローカル特徴が生成され,再ランク付けにおける時間のかかる空間的検証を回避できる。実験の結果,本手法はより少ない学習データと学習時間で最先端の手法を上回り,検索の実行時間はRANSACベースの空間的検証を用いる2段階VPR法の約3%にすぎないことが示された。本手法はMSLSチャレンジのリーダーボードで(投稿時点で)1位にランクされている。コードはhttps://github.com/Lu-Feng/SelaVPRで公開されている。

Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model pre-training and VPR, how to bridge the gap and fully unleash the capability of pre-trained models for VPR is still a key issue to address. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method to achieve both global and local adaptation efficiently, in which only lightweight adapters are tuned without adjusting the pre-trained model. Besides, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses only about 3% of the retrieval runtime of two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.
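The mutual-nearest-neighbor idea behind the local matching step can be sketched in a few lines. This is only an illustration of the matching criterion, not the paper's loss or the SelaVPR code: the function names, the use of cosine similarity, and the toy feature vectors are assumptions. A pair (i, j) counts as a match when local feature j of the candidate image is the most similar to feature i of the query, and feature i is in turn the most similar to j.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def mutual_nearest_neighbors(feats_a, feats_b):
    """Return index pairs (i, j) that are each other's nearest neighbor."""
    sim = [[cosine(u, v) for v in feats_b] for u in feats_a]
    # Best match in B for every feature in A, and vice versa.
    best_b_for_a = [max(range(len(feats_b)), key=lambda j: sim[i][j])
                    for i in range(len(feats_a))]
    best_a_for_b = [max(range(len(feats_a)), key=lambda i: sim[i][j])
                    for j in range(len(feats_b))]
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]


# Toy local features (hypothetical): three from a query image, two from a candidate.
query_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidate_feats = [[0.0, 2.0], [3.0, 0.1]]

matches = mutual_nearest_neighbors(query_feats, candidate_feats)
print(matches, len(matches))
```

In a re-ranking stage, the number of mutual matches can serve as a similarity score between the query and each retrieved candidate, which needs no geometric verification such as RANSAC.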

翻訳日:2024-03-20 23:12:03 公開日:2024-03-18

# リフテッド・マルチカット多面体のカットファセットとキューブファセット

Cut Facets and Cube Facets of Lifted Multicut Polytopes ( http://arxiv.org/abs/2402.16814v2 )

ライセンス: Link先を確認

Lucas Fabian Naumann, Jannik Irmai, Shengxian Zhao, Bjoern Andres,

(参考訳) リフテッド・マルチカット問題は,コンピュータビジョンの分野でさまざまな応用がある。線形計画法に基づく厳密解法には,リフテッド・マルチカット多面体の理解が必要である。最近の進展にもかかわらず,これらの多面体に関する2つの基本的な問いが未解決のままであった。どの下側キューブ不等式がファセットを定義するのか,そしてどのカット不等式がファセットを定義するのか,である。本稿では,必要十分かつ効率的に判定可能な条件を確立することで,第1の問いに答える。第2の問いについては,カット不等式がファセットを定義するかどうかの判定がNP困難であることを示す。これにより,リフテッド・マルチカット多面体の標準的なファセットの解析が完了する。

The lifted multicut problem has diverse applications in the field of computer vision. Exact algorithms based on linear programming require an understanding of lifted multicut polytopes. Despite recent progress, two fundamental questions about these polytopes have remained open: Which lower cube inequalities define facets, and which cut inequalities define facets? In this article, we answer the first question by establishing conditions that are necessary, sufficient and efficiently decidable. Toward the second question, we show that deciding facet-definingness of cut inequalities is NP-hard. This completes the analysis of canonical facets of lifted multicut polytopes.

翻訳日:2024-03-20 23:12:03 公開日:2024-03-18

# 次の単語を予測する:人間はこのタスクに不確実性を示し、言語モデル______

Predict the Next Word: Humans exhibit uncertainty in this task and language models _____ ( http://arxiv.org/abs/2402.17527v2 )

ライセンス: Link先を確認

Evgenia Ilia, Wilker Aziz,

(参考訳) 言語モデル(LM)は,人間が生成したテキストに確率を割り当てるよう訓練された統計モデルである。したがって,人間が示す言語的多様性をうまく近似しているかどうかを問うことは妥当である。この種の統計的評価は,容認性判断(すなわち人手評価)や堅牢な自動プロキシ(その構築は自明ではない)を必要とするため,文章(パッセージ)レベルで行うことは難しい。しかし単語レベルでは,ある文脈が与えられたとき,LMからのサンプルは,その文脈に続く単語の候補をあらかじめ記録したデータセットとの完全一致によって評価できる。我々はこの事実を利用し,人間(特に英語話者の集団)が「次の単語予測」タスクで示す多様性をLMが再現する能力を評価する。これは,テキスト分類の文脈でBaan et al. (2022) が「人間の不確実性に対するキャリブレーション」と呼んだ,キャリブレーションの一形態の評価とみなせる。GPT2,BLOOM,ChatGPTを評価した結果,人間の不確実性に対するキャリブレーションはかなり低いことがわかった。また,期待キャリブレーション誤差(ECE)がこのことを反映できないことを確認し,この設定ではECEに頼らないようコミュニティに勧める。

Language models (LMs) are statistical models trained to assign probability to human-generated text. As such, it is reasonable to question whether they approximate linguistic variability exhibited by humans well. This form of statistical assessment is difficult to perform at the passage level, for it requires acceptability judgements (i.e., human evaluation) or a robust automated proxy (which is non-trivial). At the word level, however, given some context, samples from an LM can be assessed via exact matching against a prerecorded dataset of alternative single-word continuations of the available context. We exploit this fact and evaluate the LM's ability to reproduce variability that humans (in particular, a population of English speakers) exhibit in the 'next word prediction' task. This can be seen as assessing a form of calibration, which, in the context of text classification, Baan et al. (2022) termed calibration to human uncertainty. We assess GPT2, BLOOM and ChatGPT and find that they exhibit fairly low calibration to human uncertainty. We also verify the failure of expected calibration error (ECE) to reflect this, and as such, advise the community against relying on it in this setting.
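For readers unfamiliar with the metric the abstract cautions against, the following is a minimal sketch of the standard binned expected calibration error. It is not the authors' evaluation code; the function name, the equal-width binning, and the toy inputs are assumptions. ECE compares the model's confidence in its top prediction with how often that prediction matches a single reference, so it says nothing about whether the rest of the predicted distribution matches the spread of human continuations.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    confidences: probability assigned to the predicted label, in [0, 1].
    correct: 1 if the prediction matched the reference label, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Toy example: four predictions made with 0.9 confidence, three of them correct.
ece_overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])

# A model that is right half the time at 0.5 confidence scores a perfect 0.0,
# regardless of how it spreads the remaining probability mass.
ece_coin_flip = expected_calibration_error([0.5, 0.5], [1, 0])
print(ece_overconfident, ece_coin_flip)
```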

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# ランダム位相演算子とサブグラフ位相演算子を用いたQAOA

QAOA with random and subgraph phase operators ( http://arxiv.org/abs/2402.18412v2 )

ライセンス: Link先を確認

Anthony Wilkie, Igor Gaidai, James Ostrowski, Rebekah Herrman,

(参考訳) 量子近似最適化アルゴリズム(QAOA)は、組合せ最適化問題を近似的に解くのに使える有望な量子アルゴリズムである。通常のQAOAアンザッツは、コストハミルトニアンとミキサーハミルトニアンを交互に適用することで構成される。本研究では、通常のコストハミルトニアン以外のハミルトニアン(カスタム位相演算子と呼ぶ)を用いることがQAOAの性能に与える影響について検討する。 p = 1のカスタム位相演算子を持つQAOAの期待値式を導出し、これらのカスタム位相演算子のうちいくつかが元のアルゴリズムよりも高い近似比を達成できることを数値的に示す。テストされた全てのグラフのうち、ランダムなカスタム位相演算子の0.036%、サブグラフのカスタム位相演算子の75.9%、三角形を除去したカスタム位相演算子の95.1%、最大次数のエッジを除去したカスタム位相演算子の93.9%は、元のQAOA実装よりも高い近似比を持つ。この発見は、QAOAの性能をさらに向上させるために、より良い位相演算子を設計できるかどうかという疑問を提起する。

The quantum approximate optimization algorithm (QAOA) is a promising quantum algorithm that can be used to approximately solve combinatorial optimization problems. The usual QAOA ansatz consists of an alternating application of the cost and mixer Hamiltonians. In this work, we study how using Hamiltonians other than the usual cost Hamiltonian, dubbed custom phase operators, can affect the performance of QAOA. We derive an expected value formula for QAOA with custom phase operators at p = 1 and show numerically that some of these custom phase operators can achieve a higher approximation ratio than the original algorithm implementation. Out of all the graphs tested, 0.036% of the random custom phase operators, 75.9% of the subgraph custom phase operators, 95.1% of the triangle-removed custom phase operators, and 93.9% of the maximal degree edge-removed custom phase operators have a higher approximation ratio than the original QAOA implementation. This finding opens up the question of whether better phase operators can be designed to further improve the performance of QAOA.

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# 鏡ライブラリー:低次元のディープニューラルネットは反射特性を持つ凸ラッソモデルである

A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features ( http://arxiv.org/abs/2403.01046v2 )

ライセンス: Link先を確認

Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci,

(参考訳) 1次元データ上でニューラルネットワークをトレーニングすることは、凸ラッソ問題を固定的、明示的に定義された特徴の辞書行列で解くのと等価であることを示す。特定の辞書はアクティベーションと深さに依存する。分割線形アクティベーションを持つ2層ネットワーク,最大4層までの細いReLUネットワーク,符号アクティベーションと任意の深さを持つ長方形およびツリーネットワークについて検討する。興味深いことに、ReLUネットワークでは、第4のレイヤが、自分自身に関するトレーニングデータのリフレクションを表す機能を生成する。 Lasso表現は、グローバルに最適なネットワークとソリューションランドスケープに洞察を与える。

We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly, in ReLU networks, a fourth layer creates features that represent reflections of training data about themselves. The Lasso representation offers insight into globally optimal networks and the solution landscape.

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# ParallelPARC: 自然言語アナロジーを生成するためのスケーラブルなパイプライン

ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies ( http://arxiv.org/abs/2403.01139v2 )

ライセンス: Link先を確認

Oren Sultan, Yonatan Bitton, Ron Yosef, Dafna Shahaf,

(参考訳) アナロジーの形成は人間の認知の中心にあり,新しい状況への適応を可能にするが,これは現在のAIシステムにはまだ欠けている能力である。現在,ほとんどのアナロジーデータセットは単純なアナロジー(例:単語のアナロジー)に焦点を当てており,複雑な種類のアナロジーを含むデータセットは通常,手作業で作成された非常に小規模なものである。我々は,このことが計算論的アナロジー研究の進展を妨げていると考える。本研究では,最先端の大規模言語モデル(LLM)を活用したデータ生成パイプラインであるParallelPARC (Parallel Paragraph Creator) を設計し,段落ベースの複雑なアナロジーと,単純なものから難しいものまでのディストラクタ(誤答選択肢)を作成する。このパイプラインを実演し,科学的プロセス間のアナロジーのデータセットであるProPara-Logyを作成する。人間が検証したゴールドセットと,自動生成したシルバーセットを公開する。LLMと人間のアナロジー認識を二択および多肢選択の設定でテストした結果,軽い指導を受けた人間が最良のモデルを上回る(約13%の差)ことがわかった。シルバーセットがモデルの訓練に有用であることも示す。最後に,難しいディストラクタはLLMを混乱させるが,人間は混乱させないことを示す。我々のパイプラインが,この新興分野の研究を促進することを期待する。

Analogy-making is central to human cognition, allowing us to adapt to novel situations -- an ability that current AI systems still lack. Most analogy datasets today focus on simple analogies (e.g., word analogies); datasets including complex types of analogies are typically manually curated and very small. We believe that this holds back progress in computational analogy. In this work, we design a data generation pipeline, ParallelPARC (Parallel Paragraph Creator), leveraging state-of-the-art Large Language Models (LLMs) to create complex, paragraph-based analogies, as well as distractors, both simple and challenging. We demonstrate our pipeline and create ProPara-Logy, a dataset of analogies between scientific processes. We publish a gold-set, validated by humans, and a silver-set, generated automatically. We test LLMs' and humans' analogy recognition in binary and multiple-choice settings, and find that humans outperform the best models (~13% gap) after light supervision. We demonstrate that our silver-set is useful for training models. Lastly, we show challenging distractors confuse LLMs, but not humans. We hope our pipeline will encourage research in this emerging field.

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# TPLLM: 事前訓練された大規模言語モデルに基づく交通予測フレームワーク

TPLLM: A Traffic Prediction Framework Based on Pretrained Large Language Models ( http://arxiv.org/abs/2403.02221v2 )

ライセンス: Link先を確認

Yilong Ren, Yue Chen, Shuai Liu, Boyue Wang, Haiyang Yu, Zhiyong Cui,

(参考訳) 交通予測はインテリジェントトランスポーテーションシステム(ITS)の重要な要素であり,高精度な予測の実現は効率的な交通管理にとって大きな意義を持つ。深層学習に基づく既存の交通予測モデルの精度は,通常,トレーニングデータの量が増えるにつれて向上する。しかし,包括的な時空間交通データセットの入手は,主にデータの収集と保持に伴う多大なコストのために困難であることが多い。したがって,過去の交通データが限られた地域で,正確な予測と優れた汎化能力を達成できるモデルを開発することは難しい問題である。近年急速に進歩している事前学習済みの大規模言語モデル(LLM)は,クロスモダリティの知識伝達や少数ショット学習において極めて優れた能力を示している。交通データが言語と同様に系列的な性質を持つことに着目し,LLMを活用した新しい交通予測フレームワークであるTPLLMを導入する。このフレームワークでは,畳み込みニューラルネットワーク(CNN)に基づくシーケンス埋め込み層とグラフ畳み込みネットワーク(GCN)に基づくグラフ埋め込み層を構築し,それぞれ系列特徴と空間的特徴を抽出する。これらはその後,LLMに適した入力を形成するために統合される。TPLLMに低ランク適応(LoRA)によるファインチューニングを適用することで,効率的な学習を促し,計算負荷を最小限に抑える。実世界の2つのデータセットを用いた実験により,TPLLMはフルサンプルと少数ショットの両方の予測シナリオで優れた性能を示し,過去の交通データが乏しい地域でのITSの発展を効果的に支援することが示された。

Traffic prediction constitutes a pivotal facet within the purview of Intelligent Transportation Systems (ITS), and the attainment of highly precise predictions holds profound significance for efficacious traffic management. The precision of prevailing deep learning-driven traffic prediction models typically sees an upward trend with a rise in the volume of training data. However, the procurement of comprehensive spatiotemporal datasets for traffic is often fraught with challenges, primarily stemming from the substantial costs associated with data collection and retention. Consequently, developing a model that can achieve accurate predictions and good generalization ability in areas with limited historical traffic data is a challenging problem. It is noteworthy that the rapidly advancing pretrained Large Language Models (LLMs) of recent years have demonstrated exceptional proficiency in cross-modality knowledge transfer and few-shot learning. Recognizing the sequential nature of traffic data, similar to language, we introduce TPLLM, a novel traffic prediction framework leveraging LLMs. In this framework, we construct a sequence embedding layer based on Convolutional Neural Networks (CNNs) and a graph embedding layer based on Graph Convolutional Networks (GCNs) to extract sequence features and spatial features, respectively. These are subsequently integrated to form inputs that are suitable for LLMs. A Low-Rank Adaptation (LoRA) fine-tuning approach is applied to TPLLM, thereby facilitating efficient learning and minimizing computational demands. Experiments on two real-world datasets demonstrate that TPLLM exhibits commendable performance in both full-sample and few-shot prediction scenarios, effectively supporting the development of ITS in regions with scarce historical traffic data.

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# フォノン急速断熱路による閉じ込められたイオンの冷却

Cooling trapped ions with phonon rapid adiabatic passage ( http://arxiv.org/abs/2403.02315v2 )

ライセンス: Link先を確認

M. I. Fabrikant, P. Lauria, I. S. Madjarov, W. C. Burton, R. T. Sutherland,

(参考訳) 量子電荷結合デバイス(QCCD)コンピュータアーキテクチャの最近の実証では,回路時間の大半を冷却が占めている。マルチイオン結晶の一部の運動モードは,冷却用イオンの寄与が小さいため,他のモードより冷却に桁違いに長い時間がかかる。本研究では,フォノン急速断熱通過(phrap)と呼ぶ新しい手法を実証する。この手法は,直接冷却に比べて短い時間スケールで,選択したモードの熱分布をコヒーレントに交換することにより,この問題を回避する。断熱急速通過と同様に,直流電場を用いて冷却の遅いモードを冷却の速いモードと準静的に結合させる。その結果生じる擬交差(avoided crossing)を通して結晶を断熱的に掃引すると,ほぼ完全なフォノン分布の交換が起こる。これを2イオン結晶で実証し,すべての径方向モードを間接的に基底状態まで冷却できること,および直接冷却に比べて1桁の高速化を達成できることを示す。また,この手法がトラップポテンシャルと制御場の揺らぎに対して鈍感であることを示し,n~200という高い初期値からでも1量子未満の温度を達成できることを見出した。

In recent demonstrations of the quantum charge-coupled device (QCCD) computer architecture, circuit times are dominated by cooling. Some motional modes of multi-ion crystals take orders-of-magnitude longer to cool than others because of low coolant ion participation. Here we demonstrate a new technique, which we call phonon rapid adiabatic passage (phrap), that avoids this issue by coherently exchanging the thermal populations of selected modes on timescales short compared to direct cooling. Analogous to adiabatic rapid passage, we quasi-statically couple these slow-cooling modes with fast-cooling ones using DC electric fields. When the crystal is then adiabatically ramped through the resultant avoided crossing, nearly-complete phonon population exchange results. We demonstrate this on two-ion crystals, and show the indirect ground-state cooling of all radial modes--achieving an order of magnitude speedup compared to direct cooling. We also show the technique's insensitivity to trap potential and control field fluctuations, and find that it still achieves sub-quanta temperatures starting as high as n~200.

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# Polyak Momentumを用いた非凸確率合成最適化

Non-Convex Stochastic Composite Optimization with Polyak Momentum ( http://arxiv.org/abs/2403.02967v2 )

ライセンス: Link先を確認

Yuan Gao, Anton Rodomanov, Sebastian U. Stich,

(参考訳) 確率的近接勾配法は,広く使われている確率的勾配降下法(SGD)の強力な一般化であり,機械学習において多くの応用がある。しかし,この手法は,確率的ノイズが大きい(すなわち,小さいまたは有界なバッチサイズのみを用いる)非凸設定では収束しないことが知られている。本稿では,Polyakモーメンタムを用いた確率的近接勾配法に着目する。この手法が,バッチサイズに関係なく,非凸複合最適化問題に対して最適な収束率を達成することを証明する。さらに,複合最適化の設定におけるPolyakモーメンタムの分散低減効果を厳密に解析し,近接ステップが不正確にしか解けない場合にもこの手法が収束することを示す。最後に,理論的結果を検証する数値実験を示す。

The stochastic proximal gradient method is a powerful generalization of the widely used stochastic gradient descent (SGD) method and has found numerous applications in Machine Learning. However, it is well known that this method fails to converge in non-convex settings where the stochastic noise is significant (i.e., when only small or bounded batch sizes are used). In this paper, we focus on the stochastic proximal gradient method with Polyak momentum. We prove this method attains an optimal convergence rate for non-convex composite optimization problems, regardless of batch size. Additionally, we rigorously analyze the variance reduction effect of the Polyak momentum in the composite optimization setting and we show the method also converges when the proximal step can only be solved inexactly. Finally, we provide numerical experiments to validate our theoretical results.
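One common way to write a stochastic proximal gradient step with Polyak-style momentum is to keep an exponential moving average of the stochastic gradients and apply the proximal operator to the momentum step. The sketch below follows that form under stated assumptions and may differ from the paper's exact scheme: the zero-initialized momentum buffer, the L1 regularizer (whose proximal operator is soft-thresholding), the step size, and the toy quadratic objective are all illustrative choices. The toy problem is convex so that the answer can be checked by hand, whereas the abstract's results concern the non-convex case.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (illustrative choice of regularizer)."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]


def prox_sgd_polyak(noisy_grad, x0, step, beta, lam, n_steps, rng):
    """Stochastic proximal gradient with a moving-average momentum buffer."""
    x = list(x0)
    m = [0.0] * len(x)  # assumption: momentum buffer starts at zero
    for _ in range(n_steps):
        g = noisy_grad(x, rng)
        # Momentum averages noisy gradients, damping their variance.
        m = [(1.0 - beta) * mi + beta * gi for mi, gi in zip(m, g)]
        # Gradient step along the momentum, then the proximal step.
        x = soft_threshold([xi - step * mi for xi, mi in zip(x, m)], step * lam)
    return x


# Toy smooth part: 0.5 * ||x - TARGET||^2, observed through noisy gradients.
TARGET = [2.0, 0.05]


def noisy_grad(x, rng):
    return [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, TARGET)]


x_hat = prox_sgd_polyak(noisy_grad, [0.0, 0.0], step=0.05, beta=0.1,
                        lam=0.1, n_steps=2000, rng=random.Random(0))
print(x_hat)
```

For this toy problem the exact minimizer is the soft-thresholded target, roughly (1.9, 0), so the iterate should settle near that point with a small residual fluctuation from the gradient noise.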

翻訳日:2024-03-20 23:01:00 公開日:2024-03-18

# DeepCRE:AI駆動のクロスドラッグ反応評価を通じてドラッグR&Dを変える

DeepCRE: Transforming Drug R&D via AI-Driven Cross-drug Response Evaluation ( http://arxiv.org/abs/2403.03768v3 )

ライセンス: Link先を確認

Yushuai Wu, Ting Zhang, Hao Zhou, Hainan Wu, Hanwen Sunchu, Lei Hu, Xiaofang Chen, Suyuan Zhao, Gaochao Liu, Chao Sun, Jiahuan Zhang, Yizhen Luo, Peng Liu, Zaiqing Nie, Yushuai Wu,

(参考訳) 治療への応用と医薬品の研究開発(R&D)の両分野は,大きな課題に直面している。すなわち,治療の領域ではより多くの治療選択肢が求められている一方で,多くの有望な前臨床薬が臨床試験で失敗している。その理由の1つは,医薬品R&Dの後期段階におけるクロスドラッグ反応評価(CRE)が不十分なことである。in-silico CREモデルは有望な解決策となるが,既存の方法論は標的レベルや細胞株レベルといった医薬品R&Dの初期段階に限定されており,臨床成功率の改善は限られている。本稿では,医薬品R&Dの後期段階においてCREを効果的に予測するよう設計された先駆的なAIモデルであるDeepCREを紹介する。DeepCREは,患者レベルのCREで平均17.7%の性能向上,適応症レベルのCREで5倍の向上を達成して既存の最良モデルを上回り,それぞれ,より正確な個別化治療の予測と,適応症に対するより良い医薬品価値評価を可能にする。さらにDeepCREは,8つ中5つの大腸癌オルガノイドにおいて,2つの承認薬からなる比較対照セットよりも有意に高い効果を示す6つの薬物候補のセットを同定した。このことは,治療効果の高い幅広い薬物候補を体系的に発見するDeepCREの能力を示しており,医薬品R&Dを変革する可能性を浮き彫りにしている。

The fields of therapeutic application and drug research and development (R&D) both face substantial challenges, i.e., the therapeutic domain calls for more treatment alternatives, while numerous promising pre-clinical drugs have failed in clinical trials. One of the reasons is the inadequacy of Cross-drug Response Evaluation (CRE) during the late stages of drug R&D. Although in-silico CRE models bring a promising solution, existing methodologies are restricted to early stages of drug R&D, such as target and cell-line levels, offering limited improvement to clinical success rates. Herein, we introduce DeepCRE, a pioneering AI model designed to predict CRE effectively in the late stages of drug R&D. DeepCRE outperforms the existing best models by achieving an average performance improvement of 17.7% in patient-level CRE, and a 5-fold increase in indication-level CRE, facilitating more accurate personalized treatment predictions and better pharmaceutical value assessment for indications, respectively. Furthermore, DeepCRE has identified a set of six drug candidates that show significantly greater effectiveness than a comparator set of two approved drugs in 5/8 colorectal cancer organoids. This demonstrates the capability of DeepCRE to systematically uncover a spectrum of drug candidates with enhanced therapeutic effects, highlighting its potential to transform drug R&D.

翻訳日:2024-03-20 20:59:05 公開日:2024-03-18

# 周期変調による地中キラル電流

Ground-state chiral current via periodic modulation ( http://arxiv.org/abs/2403.06688v2 )

ライセンス: Link先を確認

Shuyue Wang, Wuji Zhang, Chunfang Sun, Chunfeng Wu, Gangcheng Wang,

(参考訳) 本研究では,光子を介するDzyaloshinskii-Moriya相互作用を設計し,量子場と古典場によって駆動される3レベル原子に基づく基底状態キラル電流をエミュレートする。我々は、励起状態の有限寿命から生じる課題に対処できる、2レベル系の効果的なジアロシンスキー・モリヤ相互作用を導出するために、断熱除去技術を用いる。さらに,原子基底状態に対する周期変調の実装により,所望のダイナミクスを実現することができる。また、適切な駆動周波数と位相を選択することで、三状態および多状態キラル電流を得ることができる。また、トグルリングフレームに基づいて、他のコンポーネントに対してDzyaloshinskii-Moriya相互作用を設計する。数値シミュレーションの結果,提案手法によって完全に信頼性の高い基底状態のキラル電流が生成され,量子状態遷移と将来の量子ネットワークの発展の可能性が開けることが示唆された。

In this study, we engineer the Dzyaloshinskii-Moriya interaction mediated by photons to emulate ground-state chiral current based on three-level atoms driven by quantum and classical fields. We employ adiabatic elimination techniques to derive an effective Dzyaloshinskii-Moriya interaction Hamiltonian of two-level systems, which can address the challenges arising from the finite lifetime of excited states. Furthermore, the desired dynamics can be achieved by applying periodic modulation to the atomic ground states. In addition, three-state and multi-state chiral currents can be obtained by choosing appropriate driving frequencies and phases. We also design the Dzyaloshinskii-Moriya interaction for the other components based on a toggling frame. The numerical simulation results further indicate that our proposal can generate a perfectly reliable ground-state chiral current and opens up possibilities for quantum state transfer and the development of future quantum networks.

翻訳日:2024-03-20 20:59:04 公開日:2024-03-18

# ACFIX:スマートコントラクトにおけるアクセス制御脆弱性のコンテキストアウェア修復のための共通RBACプラクティスによるLLM指導

ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts ( http://arxiv.org/abs/2403.06838v2 )

ライセンス: Link先を確認

Lyuye Zhang, Kaixuan Li, Kairan Sun, Daoyuan Wu, Ye Liu, Haoye Tian, Yang Liu,

(参考訳) スマートコントラクトは、アクセス制御(AC)脆弱性が特に重要な、さまざまなセキュリティ問題の影響を受けやすい。既存の研究では複数の検出ツールが提案されているが、スマートコントラクトにおけるAC脆弱性の自動的かつ適切な修復は依然として課題である。通常テンプレートベースのアプローチで固定される、既存の修復ツールで一般的にサポートされている脆弱性タイプとは異なり、ACの主な障害は、人間レベルのインテリジェンスを必要とするタスクである適切なパッチコードを生成するためのAC関連のソースコードの長いリストの中で、適切な役割やパーミッションを特定することである。大規模言語モデル(LLM)の最近の進歩を生かして、最先端のGPT-4モデルを採用し、ACFIXと呼ばれる新しいアプローチで拡張する。重要な洞察は、コード機能の主要なカテゴリに共通するACプラクティスをマイニングし、それを使って、同様の機能でコードを修正するのにLLMをガイドできるということです。この目的のために、ACFIXはオフラインとオンラインの両方のフェーズを含む。まず、オフラインフェーズにおいて、ACFIXは344,251のオンチェーン契約から共通ロールベースアクセス制御(RBAC)のプラクティスの分類をマイニングし、上位1000組から49のロールパーミッションペアを分類する。第2に、ACFIXは、契約全体にわたるAC関連要素を追跡し、このコンテキスト情報とChain-of-Thoughtパイプラインを使用して、対象契約に対する最も適切なロールパーミッションペアを特定し、その後、適切なパッチを生成する。このパッチは有効性と有効性をチェックする。 ACFIXを評価するために、118個の実世界のAC脆弱性のベンチマークデータセットを構築し、ACFIXが94.92%の修正に成功したことを明らかにした。これは、ベースラインの GPT-4 に比べて大幅に改善され、52.54% しか達成されなかった。

Smart contracts are susceptible to various security issues, among which access control (AC) vulnerabilities are particularly critical. While existing research has proposed multiple detection tools, the automatic and appropriate repair of AC vulnerabilities in smart contracts remains a challenge. Unlike commonly supported vulnerability types by existing repair tools, such as reentrancy, which are usually fixed by template-based approaches, the main obstacle of AC lies in identifying the appropriate roles or permissions amid a long list of non-AC-related source code to generate proper patch code, a task that demands human-level intelligence. Leveraging recent advancements in large language models (LLMs), we employ the state-of-the-art GPT-4 model and enhance it with a novel approach called ACFIX. The key insight is that we can mine common AC practices for major categories of code functionality and use them to guide LLMs in fixing code with similar functionality. To this end, ACFIX involves both offline and online phases. First, during the offline phase, ACFIX mines a taxonomy of common Role-based Access Control (RBAC) practices from 344,251 on-chain contracts, categorizing 49 role-permission pairs from the top 1,000 pairs mined. Second, during the online phase, ACFIX tracks AC-related elements across the contract and uses this context information along with a Chain-of-Thought pipeline to guide LLMs in identifying the most appropriate role-permission pair for the subject contract and subsequently generating a suitable patch. This patch will then undergo a validity and effectiveness check. To evaluate ACFIX, we built the first benchmark dataset of 118 real-world AC vulnerabilities, and our evaluation revealed that ACFIX successfully repaired 94.92% of them. This represents a significant improvement compared to the baseline GPT-4, which achieved only 52.54%.

翻訳日:2024-03-20 20:59:04 公開日:2024-03-18

# エネルギー準位交差による相転移検出のための等変変量量子固有解法

Equivariant Variational Quantum Eigensolver to detect Phase Transitions through Energy Level Crossings ( http://arxiv.org/abs/2403.07100v2 )

ライセンス: Link先を確認

Giulio Crognaletti, Giovanni Di Bartolomeo, Michele Vischi, Luciano Loris Viteritti,

(参考訳) レベル分光は、異なる量子相を示す遷移点を特定するための強力な方法である。各量子相は励起状態の特徴的な配列を示すため、低い励起状態の間のエネルギー準位の交差は、相転移点を推定する信頼できる平均を与える。変分量子固有解法のような手法は、量子コンピューティングを用いて相互作用するシステムの基底状態を近似するのに有用であるが、低エネルギーの励起を捉えることは依然として困難である。本研究では,チェーン上の一重項および三重項励起状態を正確に記述するために,全スピンと翻訳対称性を保持する同変量子回路を導入する。さらに、ノイズが変動状態に与える影響を評価し、ゼロノイズ外挿法のような従来の緩和技術が、その物理的特性を確実に回復することを示す。

Level spectroscopy stands as a powerful method for identifying the transition point that delineates distinct quantum phases. Since each quantum phase exhibits a characteristic sequence of excited states, the crossing of energy levels between low-lying excited states offers a reliable means to estimate the phase transition point. While approaches like the Variational Quantum Eigensolver are useful for approximating ground states of interacting systems using quantum computing, capturing low-energy excitations remains challenging. In our study, we introduce an equivariant quantum circuit that preserves the total spin and the translational symmetry to accurately describe singlet and triplet excited states in the $J_1$-$J_2$ Heisenberg model on a chain, which are crucial for characterizing its transition point. Additionally, we assess the impact of noise on the variational state, showing that conventional mitigation techniques like Zero Noise Extrapolation reliably restore its physical properties.

翻訳日:2024-03-20 20:59:04 公開日:2024-03-18
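
The abstract above mentions Zero Noise Extrapolation without saying which variant is used. As a minimal sketch, assuming the Richardson (polynomial) form of the technique, the function below fits the unique polynomial through expectation values measured at amplified noise levels and evaluates it at zero noise. The function name and the sample numbers are hypothetical.

```python
def zero_noise_extrapolate(scales, values):
    """Richardson extrapolation: evaluate the interpolating polynomial at noise scale 0.

    scales: distinct noise amplification factors (e.g. 1, 2, 3)
    values: expectation values measured at those factors
    """
    if len(scales) != len(values) or len(set(scales)) != len(scales):
        raise ValueError("need one value per distinct noise scale")
    estimate = 0.0
    for i, (ci, ei) in enumerate(zip(scales, values)):
        # Lagrange basis polynomial for point i, evaluated at scale 0.
        weight = 1.0
        for j, cj in enumerate(scales):
            if j != i:
                weight *= (0.0 - cj) / (ci - cj)
        estimate += weight * ei
    return estimate
```

With synthetic energies that follow `-1.0 + 0.1 * c + 0.02 * c**2`, the values at scales 1, 2, 3 are -0.88, -0.72, -0.52, and the extrapolation recovers the noiseless value -1.0 exactly because the data are quadratic. Real measurements carry shot noise, which higher-order extrapolation tends to amplify.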

# Lindbladian SYKにおけるオペレータサイズの成長

Operator size growth in Lindbladian SYK ( http://arxiv.org/abs/2403.07115v2 )

ライセンス: Link先を確認

Jiasheng Liu, Rene Meyer, Zhuo-Yu Xian,

(参考訳) 我々は,Lindbladian SYKモデルにおいて,$q$-body相互作用項とリニアジャンプ項を有限散逸強度で有する演算子サイズの増大について検討した。演算子のサイズと分布を有限の$q$で計算し、解析的に大きめの$q$で計算する。散逸的な(生産的な)ジャンプ項では、サイズはマヨラナフェルミオンの数の半分よりも小さい(大きい)値に収束する。弱い散逸では、作用素の大きさの進化は二次的-指数的-プラトーな振る舞いを示す。プラトー値は、大きな$q$制限における相互作用のカップリングと線形ジャンプ項の比によって決定される。演算子のサイズ分布は、単体の場合と対照的に、遅くとも有限サイズ領域で局所化されている。さらに,有限散逸時の演算子サイズ濃度を示す演算子展開の時間非依存直交基底も導出した。最後に、演算子サイズ成長の不確実性関係が大きな$q$で飽和していることが観察され、散逸を伴う演算子サイズ成長の古典力学が導かれる。

We investigate the growth of operator size in the Lindbladian SYK model with $q$-body interaction terms and linear jump terms at finite dissipation strength. We compute the operator size as well as its distribution numerically at finite $q$ and analytically at large $q$. With dissipative (productive) jump terms, the size converges to a value smaller (larger) than half the number of Majorana fermions. At weak dissipation, the evolution of operator size displays a quadratic-exponential-plateau behavior. The plateau value is determined by the ratios between the coupling of the interaction and the linear jump term in the large $q$ limit. The operator size distribution remains localized in the finite size region even at late times, contrasting with the unitary case. Moreover, we also derived the time-independent orthogonal basis for operator expansion which exhibits the operator size concentration at finite dissipation. Finally, we observe that the uncertainty relation for operator size growth is saturated at large $q$, leading to a classical dynamics of the operator size growth with dissipation.

翻訳日:2024-03-20 20:59:04 公開日:2024-03-18

# STREAM:ビデオ生成モデルのための時空間評価と分析基準

STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models ( http://arxiv.org/abs/2403.09669v2 )

ライセンス: Link先を確認

Pum Jun Kim, Seojun Kim, Jaejun Yoo,

(参考訳) 画像生成モデルは、様々な評価指標からの包括的なガイダンスによって支援され、現実的で多様な画像の生成に大きな進歩をもたらした。しかし、現在のビデオ生成モデルは、改善のための洞察を提供するツールが限られている短いビデオクリップを生成するのに苦労している。現在のビデオ評価指標は、ビデオのユニークな特徴を過小評価するビデオ埋め込みネットワークで埋め込みを切り替えることによって、画像メトリクスの単純な適応である。解析の結果,広範に使用されているFrechet Video Distance(FVD)はビデオの時間的自然性よりも空間的側面に重点を置いていることが判明した。さらに、人間の評価からかなりの不安定性と分岐を示す。そこで本稿では,空間的側面と時間的側面を独立に評価するためのビデオ評価基準STREAMを提案する。この機能は様々な視点から動画生成モデルの包括的解析と評価を可能にする。我々はSTREAMがビデオの視覚的品質と時間的品質の両方に効果的な評価ツールを提供し、ビデオ生成モデルの改善領域に関する洞察を提供することを示す分析的および実験的証拠を提供する。我々の知る限り、STREAMはビデオの時間的側面と空間的側面を別々に評価できる最初の評価指標である。私たちのコードはhttps://github.com/pro2nit/STREAMで公開されています。

Image generative models have made significant progress in generating realistic and diverse images, supported by comprehensive guidance from various evaluation metrics. However, current video generative models struggle to generate even short video clips, with limited tools that provide insights for improvements. Current video evaluation metrics are simple adaptations of image metrics by switching the embeddings with video embedding networks, which may underestimate the unique characteristics of video. Our analysis reveals that the widely used Fréchet Video Distance (FVD) has a stronger emphasis on the spatial aspect than the temporal naturalness of video and is inherently constrained by the input size of the embedding networks used, limiting it to 16 frames. Additionally, it demonstrates considerable instability and diverges from human evaluations. To address these limitations, we propose STREAM, a new video evaluation metric uniquely designed to independently evaluate spatial and temporal aspects. This feature allows comprehensive analysis and evaluation of video generative models from various perspectives, unconstrained by video length. We provide analytical and experimental evidence demonstrating that STREAM provides an effective evaluation tool for both visual and temporal quality of videos, offering insights into areas for improvement for video generative models. To the best of our knowledge, STREAM is the first evaluation metric that can separately assess the temporal and spatial aspects of videos. Our code is available at https://github.com/pro2nit/STREAM.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
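
For readers unfamiliar with the Fréchet distance that FVD (discussed in the abstract above) builds on, here is a deliberately simplified sketch. It assumes scalar features, so the covariance matrices reduce to variances; FVD itself operates on high-dimensional video embeddings, and STREAM is a different metric that this example does not reproduce. The function name is hypothetical.

```python
import math
import statistics


def frechet_distance_1d(samples_a, samples_b):
    """Squared Frechet (2-Wasserstein) distance between Gaussians fitted to scalar samples."""
    mean_a = statistics.fmean(samples_a)
    mean_b = statistics.fmean(samples_b)
    var_a = statistics.pvariance(samples_a)
    var_b = statistics.pvariance(samples_b)
    # General form: |mu_a - mu_b|^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)); scalars here.
    return (mean_a - mean_b) ** 2 + var_a + var_b - 2.0 * math.sqrt(var_a * var_b)
```

The distance is zero when both sample sets have the same mean and variance, which also hints at why such a summary can miss structure (for instance temporal ordering) that is not captured by the embedding statistics.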

# Touch-GS:3Dガウシアン・スプレイティングを監督するビジュアル触覚

Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting ( http://arxiv.org/abs/2403.09875v2 )

ライセンス: Link先を確認

Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III,

(参考訳) 本研究では,光学式触覚センサを用いた3次元ガウス撮影(3DGS)シーンの監視手法を提案する。光触覚センサはロボティクスにおいて操作やオブジェクト表現に広く利用されているが、光学触覚センサのデータは直接3DGSシーンを監督するには適していない。我々の表現は、ガウス的プロセス・インプリシット・サーフェスを利用してオブジェクトを暗黙的に表現し、多くのタッチを統一された表現と不確実性を組み合わせた。このモデルを2段階のプロセスで整列した単眼深度推定ネットワークにマージし、奥行きカメラと粗い整列を行い、タッチデータに合わせて微調整する。各トレーニング画像に対して,本手法は対応する融合深度と不確実性マップを生成する。この追加情報を利用することで、3DGSシーンモデルのトレーニングのための新たな損失関数である分散重み付き深度教師付き損失を提案する。我々は、DenseTact光触覚センサとRealSense RGB-Dカメラを利用して、不透明で透明な物体だけでなく、数ビューのシーン合成において、触覚と視覚の組み合わせが視覚や触覚よりも定量的に質的に良い結果をもたらすことを示す。プロジェクトページはhttp://armlabstanford.github.io/touch-gsでご覧ください。

In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
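
The abstract above names a variance weighted depth supervised loss but does not give its formula. The sketch below assumes the simplest reading, an inverse-variance weighted squared error between rendered depth and the fused depth map, so that uncertain pixels contribute less. The function name, the flat-list representation of the maps, and the `eps` stabilizer are all assumptions of this example.

```python
def variance_weighted_depth_loss(pred_depth, fused_depth, variance, eps=1e-6):
    """Mean squared depth error, with each pixel down-weighted by its variance."""
    if not pred_depth or not (len(pred_depth) == len(fused_depth) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for pred, target, var in zip(pred_depth, fused_depth, variance):
        # Confident pixels (small variance) dominate; uncertain ones are discounted.
        total += (pred - target) ** 2 / (var + eps)
    return total / len(pred_depth)
```

Under this assumed form, the same depth error costs eight times less at a pixel with variance 4.0 than at one with variance 0.5, which is the qualitative behavior one would want when fusing sparse touch data with a monocular depth estimate.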

# 言語モデルマージのためのフィッシャーマスクノード

Fisher Mask Nodes for Language Model Merging ( http://arxiv.org/abs/2403.09891v2 )

ライセンス: Link先を確認

Thennal D K, Ganesh Nathan, Suchithra M S,

(参考訳) 微調整された事前訓練モデルは、下流のパフォーマンスにおいて大きな利点をもたらす。 BERTなどの事前学習モデルの自然言語処理におけるユビキタスな性質は、タスク固有の微調整モデルの普及にも繋がった。これらのモデルは一般的に1つのタスクのみをうまく実行するので、マルチタスクのシナリオでは追加のトレーニングやアンサンブルが必要になる。モデルマージの増大する分野は、複数のタスク固有のモデルを単一のマルチタスクモデルに組み合わせるという課題に対処するソリューションを提供する。本研究では, トランスフォーマーのモデルマージ手法について紹介し, 従来のフィッシャー重み付き平均化における知見と, モデルプルーニングにおけるフィッシャー情報の利用について考察した。トランスフォーマーアーキテクチャにおけるマスクノードのフィッシャー情報を利用して,計算効率のよい重み付け手法を提案する。提案手法は, BERT シリーズの各種モデルにおいて, 計算コストのごく一部において, フルスケールのフィッシャー重み付け平均性能を上回り, ベースライン性能は+6.5 まで向上し, 最大速度は57.4倍に向上した。本研究は,現在のマルチタスク学習環境における本手法の有効性を実証し,新しいモデルアーキテクチャや学習シナリオに対するスケーラビリティと適応性を提案する。

Fine-tuning pre-trained models provides significant advantages in downstream performance. The ubiquitous nature of pre-trained models such as BERT and its derivatives in natural language processing has also led to a proliferation of task-specific fine-tuned models. As these models typically only perform one task well, additional training or ensembling is required in multi-task scenarios. The growing field of model merging provides a solution, dealing with the challenge of combining multiple task-specific models into a single multi-task model. In this study, we introduce a novel model merging method for Transformers, combining insights from previous work in Fisher-weighted averaging and the use of Fisher information in model pruning. Utilizing the Fisher information of mask nodes within the Transformer architecture, we devise a computationally efficient weighted-averaging scheme. Our method exhibits a regular and significant performance increase across various models in the BERT family, outperforming full-scale Fisher-weighted averaging in a fraction of the computational cost, with baseline performance improvements of up to +6.5 and a speedup of 57.4x in the biggest model. Our results prove the potential of our method in current multi-task learning environments and suggest its scalability and adaptability to new model architectures and learning scenarios.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
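
The abstract above builds on Fisher-weighted averaging. A minimal sketch of that baseline idea, assuming flat parameter lists and precomputed non-negative Fisher estimates, is shown below. It weights every parameter individually; the paper instead derives its weights from the Fisher information of mask nodes, and how those coarser scores are mapped onto parameters is not stated in the abstract, so that step is not reproduced here.

```python
def fisher_weighted_average(params, fishers, eps=1e-12):
    """Merge models parameter by parameter, weighting each by its Fisher estimate.

    params:  list of per-model parameter lists (all the same length)
    fishers: matching lists of non-negative Fisher information estimates
    """
    merged = []
    for k in range(len(params[0])):
        weight_sum = sum(f[k] for f in fishers)
        if weight_sum <= eps:
            # No model claims this parameter matters: fall back to a plain average.
            merged.append(sum(p[k] for p in params) / len(params))
        else:
            merged.append(sum(f[k] * p[k] for p, f in zip(params, fishers)) / weight_sum)
    return merged
```

For two models with parameters `[1.0, 0.0]` and `[3.0, 4.0]` and Fisher estimates `[1.0, 0.0]` and `[3.0, 0.0]`, the first merged value is pulled toward the second model (2.5) and the second falls back to the plain mean (2.0).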

# 見えないデータに対するAI大腸内視鏡モデルの一般化予測

Predicting Generalization of AI Colonoscopy Models to Unseen Data ( http://arxiv.org/abs/2403.09920v2 )

ライセンス: Link先を確認

Joel Shor, Carson McNeil, Yotam Intrator, Joseph R Ledsam, Hiro-o Yamano, Daisuke Tsurumaru, Hiroki Kayama, Atsushi Hamabe, Koji Ando, Mitsuhiko Ota, Haruei Ogino, Hiroshi Nakase, Kaho Kobayashi, Masaaki Miyo, Eiji Oki, Ichiro Takemasa, Ehud Rivlin, Roman Goldenberg,

(参考訳) 背景: 臨床実践におけるAI大腸内視鏡アルゴリズムの一般化性は, より広く採用するために重要である。しかし、現在、目に見えないデータのパフォーマンスを評価する技術は、高価で時間集約的なラベルを必要とする。方法:"Masked Siamese Network"(MSN)を用いて,未知のデータ中の新しい現象を同定し,ポリプ検出器の性能を予測する。 MSNは、ラベルなしでポリプ画像のマスクされた領域を予測するように訓練されている。本研究は,日本からの大腸内視鏡(354本,128時間)において,イスラエルからのデータのみに基づいてMSNを訓練し,未確認技術,狭帯域画像(NBI)およびクロマトエンドスコープ(CE)を検出する能力をテストする。また,MSNは日本からのデータに基づいて訓練を受けていないものの,両国の大腸粘膜におけるポリープのCAD(Computer Aided Detection)の性能を予測する能力についても検証した。結果: NBI と CE は日本白色光 (bootstrapped z-test, |z| > 496, p < 10^-8 for both) よりイスラエル白色光に似ていない。 MSNは99%の精度でNBIを検出し、ホワイトライトでのみトレーニングされているにもかかわらず、CEが我々のヒューリスティック(90%対79%の精度)より優れていると予測し、ノイズの多いラベルに対して堅牢な唯一の方法である。 MSNは、イスラエル内および日本の植民地内におけるCADポリプ検出性能(それぞれr=0.79、0.37)を予測している。日本における検出性能の訓練例は少ないが、MSNによる日本の性能予測は改善されている(r=0.56)。結論: 臨床データの分布変化を同定し, ラベルなしでCADe検出性能を予測できる。当社の自己監督型アプローチは、病院やデータがトレーニングから有意義に移行したなど、実際のデータとトレーニングの違いを検出するのに役立ちます。 MSNは大腸内視鏡以外の医療画像領域にも応用できる可能性がある。

Background: Generalizability of AI colonoscopy algorithms is important for wider adoption in clinical practice. However, current techniques for evaluating performance on unseen data require expensive and time-intensive labels. Methods: We use a "Masked Siamese Network" (MSN) to identify novel phenomena in unseen data and predict polyp detector performance. MSN is trained to predict masked out regions of polyp images, without any labels. We test MSN's ability to be trained on data only from Israel and detect unseen techniques, narrow-band imaging (NBI) and chromoendoscopy (CE), in colonoscopies from Japan (354 videos, 128 hours). We also test MSN's ability to predict performance of Computer Aided Detection (CADe) of polyps on colonoscopies from both countries, even though MSN is not trained on data from Japan. Results: MSN correctly identifies NBI and CE as less similar to Israel whitelight than Japan whitelight (bootstrapped z-test, |z| > 496, p < 10^-8 for both) using the label-free Fréchet distance. MSN detects NBI with 99% accuracy, predicts CE better than our heuristic (90% vs 79% accuracy) despite being trained only on whitelight, and is the only method that is robust to noisy labels. MSN predicts CADe polyp detector performance on in-domain Israel and out-of-domain Japan colonoscopies (r=0.79, 0.37 respectively). With few examples of Japan detector performance to train on, MSN prediction of Japan performance improves (r=0.56). Conclusion: Our technique can identify distribution shifts in clinical data and can predict CADe detector performance on unseen data, without labels. Our self-supervised approach can aid in detecting when data in practice is different from training, such as between hospitals or when data has meaningfully shifted from training. MSN has potential for application to medical image domains beyond colonoscopy.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18

# PITA:物理式軌道オートエンコーダ

PITA: Physics-Informed Trajectory Autoencoder ( http://arxiv.org/abs/2403.11728v1 )

ライセンス: Link先を確認

Johannes Fischer, Kevin Rösch, Martin Lauer, Christoph Stiller,

(参考訳) 安全クリティカルなアプリケーションにおけるロボットシステムの検証には、起こりそうもない稀なエッジケースを含む多くのシナリオでのテストが必要であり、シミュレーションでのテストで現実世界のテストを補完する必要がある。生成モデルは、学習したラテントスペースでサンプリングすることで、エッジケースシナリオを生成するために生成されたデータで現実世界のデータセットを拡張するために使用することができる。オートエンコーダは、低次元の中間表現から入力データを再構成することを学ぶことで、特定の領域の潜在表現を学習することができる。しかし、そのような軌道は必ずしも物理的に可算であるとは限らないが、通常は入力軌道に存在しないノイズを含んでいる。そこで本研究では,物理力学モデルをオートエンコーダの損失関数に組み込んだ新しい物理インフォームド・トラジェクトリ・オートエンコーダ(PITA)アーキテクチャを提案する。この結果、入力軌跡を再構成するだけでなく、物理モデルにも従属する滑らかな軌跡が得られる。車両軌道の実際のデータセット上でPITAを評価し、その性能を通常のオートエンコーダと最先端のアクション空間オートエンコーダと比較する。

Validating robotic systems in safety-critical applications requires testing in many scenarios including rare edge cases that are unlikely to occur, requiring to complement real-world testing with testing in simulation. Generative models can be used to augment real-world datasets with generated data to produce edge case scenarios by sampling in a learned latent space. Autoencoders can learn said latent representation for a specific domain by learning to reconstruct the input data from a lower-dimensional intermediate representation. However, the resulting trajectories are not necessarily physically plausible, but instead typically contain noise that is not present in the input trajectory. To resolve this issue, we propose the novel Physics-Informed Trajectory Autoencoder (PITA) architecture, which incorporates a physical dynamics model into the loss function of the autoencoder. This results in smooth trajectories that not only reconstruct the input trajectory but also adhere to the physical model. We evaluate PITA on a real-world dataset of vehicle trajectories and compare its performance to a normal autoencoder and a state-of-the-art action-space autoencoder.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
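
To make the idea in the abstract above concrete, the following sketch adds a physics penalty to a reconstruction loss. The abstract does not specify the dynamics model, so this example assumes a one-dimensional kinematic consistency rule (the next position should equal the current position plus velocity times the time step); the function name, the `(position, velocity)` state layout, and the `weight` parameter are hypothetical.

```python
def physics_informed_loss(recon, target, dt, weight=1.0):
    """Reconstruction error plus a penalty for violating x[k+1] = x[k] + v[k] * dt.

    recon, target: lists of (position, velocity) tuples of equal length (at least 2)
    """
    if len(recon) != len(target) or len(recon) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    recon_err = sum(
        (rp - tp) ** 2 + (rv - tv) ** 2
        for (rp, rv), (tp, tv) in zip(recon, target)
    ) / len(recon)
    # Residual of the assumed dynamics, computed only on the reconstructed trajectory.
    residuals = [
        recon[k + 1][0] - recon[k][0] - recon[k][1] * dt
        for k in range(len(recon) - 1)
    ]
    physics_err = sum(r * r for r in residuals) / len(residuals)
    return recon_err + weight * physics_err
```

A reconstruction with a position spike is penalized twice under this loss: once for deviating from the input and once for breaking the assumed dynamics, which is the mechanism that would push a decoder toward smooth, physically consistent outputs.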

# 古典的なプランニングドメインのための一般的なポリシーを学ぶ: C$_2$を超える

Learning General Policies for Classical Planning Domains: Getting Beyond C$_2$ ( http://arxiv.org/abs/2403.11734v1 )

ライセンス: Link先を確認

Simon Ståhlberg, Blai Bonet, Hector Geffner,

(参考訳) 計画領域全体にわたる一般的なポリシーを学習するためのGNNベースのアプローチは、C_2$の表現力、すなわち2つの変数を持つ一階述語論理とカウントによって制限される。この制限は、$k$-GNNs、$k=3$に移行することで克服できる。しかし、$C_3$の表現力を持つGNNは、$C_2$に制限された$C_3$と$2$-GNNとは違い、メッセージ交換の質的時間と埋め込みのためのキュービックスペースは非現実的だ。本稿では,リレーショナルGNNのパラメータ化バージョンを紹介する。 t$が無限大であるとき、R-GNN[$t$]は埋め込みのための二次空間のみを用いて3ドルGNNを近似する。 t=1$や$t=2$のような$t$の低い値の場合、R-GNN[$t$]は、より少ないメッセージを交換することで、より弱い近似を実現します。さらに、新しいR-GNN[$t$]アーキテクチャは、入力状態のみに適切な変換を施した元のR-GNNアーキテクチャである。実験結果から、R-GNN[$1$]とR-GNN[$2$]は、通常のR-GNNよりも明らかな性能向上を示し、また、300ドルに近いエッジトランスも示している。

GNN-based approaches for learning general policies across planning domains are limited by the expressive power of $C_2$, namely, first-order logic with two variables and counting. This limitation can be overcome by transitioning to $k$-GNNs, for $k=3$, wherein object embeddings are substituted with triplet embeddings. Yet, while $3$-GNNs have the expressive power of $C_3$, unlike $1$- and $2$-GNNs that are confined to $C_2$, they require quartic time for message exchange and cubic space for embeddings, rendering them impractical. In this work, we introduce a parameterized version of relational GNNs. When $t$ is infinity, R-GNN[$t$] approximates $3$-GNNs using only quadratic space for embeddings. For lower values of $t$, such as $t=1$ and $t=2$, R-GNN[$t$] achieves a weaker approximation by exchanging fewer messages, yet interestingly, often yields the $C_3$ features required in several planning domains. Furthermore, the new R-GNN[$t$] architecture is the original R-GNN architecture with a suitable transformation applied to the input states only. Experimental results illustrate the clear performance gains of R-GNN[$1$] and R-GNN[$2$] over plain R-GNNs, and also over edge transformers that also approximate $3$-GNNs.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18

# LSKNet: リモートセンシングのための基礎的な軽量バックボーン

LSKNet: A Foundation Lightweight Backbone for Remote Sensing ( http://arxiv.org/abs/2403.11735v1 )

ライセンス: Link先を確認

Yuxuan Li, Xiang Li, Yimain Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang,

(参考訳) リモートセンシング画像は、その固有の複雑さのために、下流のタスクに対して異なる課題を生じさせる。リモートセンシング分類、オブジェクト検出、セマンティックセグメンテーションに多くの研究がなされているが、これらの研究の多くは、リモートセンシングシナリオに埋め込まれた貴重な事前知識を見落としている。このような事前知識は、遠隔センシングオブジェクトが十分に長い範囲のコンテキストを参照せずに誤って認識され、異なるオブジェクトに対して異なる可能性があるため、有用である。本稿では,これらの前提を考察し,軽量なLarge Selective Kernel Network(LSKNet)のバックボーンを提案する。 LSKNetはその大きな空間受容場を動的に調整し、リモートセンシングシナリオにおける様々なオブジェクトの範囲をモデル化する。我々の知る限り、大規模で選択的なカーネル機構は、これまでリモートセンシング画像では研究されていない。我々の軽量LSKNetは、標準リモートセンシング分類、オブジェクト検出、セマンティックセグメンテーションベンチマークに基づいて、最先端のスコアを設定しています。包括的分析により、同定された事前の意義とLSKNetの有効性がさらに検証された。コードはhttps://github.com/zcablii/LSKNetで公開されている。

Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote sensing objects may be mistakenly recognized without referencing a sufficiently long-range context, which can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the ranging context of various objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not been previously explored in remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection and semantic segmentation benchmarks. Our comprehensive analysis further validated the significance of the identified priors and the effectiveness of LSKNet. The code is available at https://github.com/zcablii/LSKNet.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18

# 量子後暗号:量子時代におけるデジタル通信のセキュア化

Post-Quantum Cryptography: Securing Digital Communication in the Quantum Era ( http://arxiv.org/abs/2403.11741v1 )

ライセンス: Link先を確認

Dr. G S Mamatha, Namya Dimri, Rasha Sinha,

(参考訳) 量子コンピューティングの出現は、従来の暗号システムに深刻な脅威をもたらし、RSAやECC、それに類する古典的な暗号化手法に依存するデジタル通信チャネルのセキュリティを侵害する脆弱性を露呈する。量子アルゴリズム、特にショアのアルゴリズムは、量子コンピュータの本質的な計算能力を利用して、これらの暗号スキームの根底にある数学的問題を効率的に解く。これに応えて、量子後暗号(PQC)は、量子攻撃に弱いレジリエントな暗号アルゴリズムの開発を目的とした重要な分野として登場した。本稿では,古典暗号システムの脆弱性を量子攻撃に適用し,量子コンピューティングの原理を解明し,格子ベースの暗号,コードベースの暗号,ハッシュベースの暗号,多変量多項式暗号などの様々なPQCアルゴリズムを導入する。量子コンピューティングの進歩の中でのデジタル通信の確保におけるPQCの重要性を強調して、この研究は、出現する量子脅威に直面したデータ完全性、機密性、および認証を保護する上で、その重要な役割を浮き彫りにしている。

The advent of quantum computing poses a profound threat to traditional cryptographic systems, exposing vulnerabilities that compromise the security of digital communication channels reliant on RSA, ECC, and similar classical encryption methods. Quantum algorithms, notably Shor's algorithm, exploit the inherent computational power of quantum computers to efficiently solve mathematical problems underlying these cryptographic schemes. In response, post-quantum cryptography (PQC) emerged as a critical field aimed at developing resilient cryptographic algorithms impervious to quantum attacks. This paper delineates the vulnerabilities of classical cryptographic systems to quantum attacks, elucidates the principles of quantum computing, and introduces various PQC algorithms such as lattice-based cryptography, code-based cryptography, hash-based cryptography, and multivariate polynomial cryptography. Highlighting the importance of PQC in securing digital communication amidst quantum computing advancements, this research underscores its pivotal role in safeguarding data integrity, confidentiality, and authenticity in the face of emerging quantum threats.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
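
The abstract above lists hash-based cryptography among the post-quantum families without naming a scheme. As an illustrative sketch, the code below implements the textbook Lamport one-time signature, whose security rests only on the hash function. The choice of Lamport and of SHA-256 is an assumption of this example, and the code is a teaching aid: each key pair may sign a single message only, and nothing here is hardened for production use.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Secret key: 256 pairs of random strings; public key: their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def _bits(message: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(sk, message: bytes):
    """Reveal one secret per digest bit; the key must never be reused."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]


def verify(pk, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(
        _h(s) == pk[i][bit]
        for i, (s, bit) in enumerate(zip(signature, _bits(message)))
    )
```

A typical round trip is `sk, pk = keygen()`, then `sig = sign(sk, b"hello")`, then `verify(pk, b"hello", sig)`. Practical hash-based schemes extend this idea with tree structures so that one public key can cover many signatures.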

# PARMESAN:Dense Prediction Taskのためのパラメータフリーメモリ検索とトランスダクション

PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks ( http://arxiv.org/abs/2403.11743v1 )

ライセンス: Link先を確認

Philip Matthias Winter, Maria Wimmer, David Major, Dimitrios Lenis, Astrid Berg, Theresa Neubauer, Gaia Romana De Paolis, Johannes Novotny, Sophia Ulonska, Katja Bühler,

(参考訳) この研究では、トランスダクティブ推論を用いてディープラーニングの柔軟性に対処する。新しいタスクや新しいデータに適応するためには、既存のメソッドは通常、学習可能なパラメータのチューニングや、スクラッチから完全に再トレーニングを含む。計算をメモリからトランスダクション(transduction)で分離するという概念は,これらの問題を解決するためのステップストーンとして機能する,と我々は主張する。そこで我々は,高密度予測タスクを解くためにメモリモジュールを利用するスケーラブルなトランスダクション手法であるPARMESANを提案する。推論では、メモリ内の隠された表現が検索され、対応する例が見つかる。他の方法とは対照的に、PARMESANは、メモリの内容を変更するだけで、継続的なトレーニングや学習可能なパラメータの微調整を必要とせずに学習する。提案手法は一般的なニューラルネットワークと互換性があり、1D, 2D, 3Dグリッドベースのデータにカノニカルに転送する。継続学習や少数ショット学習といった複雑なタスクにおいて,我々のアプローチの能力を実証する。 PARMESANは、予測性能、知識保持、データ効率の点で同等でありながら、一般的なベースラインの最大370倍の速度で学習する。

In this work we address flexibility in deep learning by means of transductive reasoning. For adaptation to new tasks or new data, existing methods typically involve tuning of learnable parameters or even complete re-training from scratch, rendering such approaches inflexible in practice. We argue that the notion of separating computation from memory by means of transduction can act as a stepping stone for solving these issues. We therefore propose PARMESAN (parameter-free memory search and transduction), a scalable transduction method which leverages a memory module for solving dense prediction tasks. At inference, hidden representations in memory are searched to find corresponding examples. In contrast to other methods, PARMESAN learns without the requirement for any continuous training or fine-tuning of learnable parameters simply by modifying the memory content. Our method is compatible with commonly used neural architectures and canonically transfers to 1D, 2D, and 3D grid-based data. We demonstrate the capabilities of our approach at complex tasks such as continual and few-shot learning. PARMESAN learns up to 370 times faster than common baselines while being on par in terms of predictive performance, knowledge retention, and data-efficiency.

翻訳日:2024-03-20 20:39:33 公開日:2024-03-18
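
The core idea in the abstract above, learning by editing memory and predicting by searching it, can be illustrated with a toy nearest-neighbor memory. This is only a sketch under strong simplifications: the real method searches hidden representations produced by a neural network and makes dense (per-location) predictions, whereas this example stores raw feature vectors with a single label each. The class and method names are hypothetical.

```python
from collections import Counter


class TransductiveMemory:
    """Stores (feature, label) pairs; 'learning' is just adding entries."""

    def __init__(self):
        self.features = []
        self.labels = []

    def add(self, feature, label):
        self.features.append(list(feature))
        self.labels.append(label)

    def predict(self, query, k=1):
        """Return the majority label among the k stored features closest to the query."""
        if not self.features:
            raise ValueError("memory is empty")
        ranked = sorted(
            range(len(self.features)),
            key=lambda i: sum((q - f) ** 2 for q, f in zip(query, self.features[i])),
        )
        votes = Counter(self.labels[i] for i in ranked[:k])
        return votes.most_common(1)[0][0]
```

Adding a new class requires only further `add` calls, with no gradient steps, which mirrors the continual and few-shot setting described in the abstract; the linear scan used here would need an approximate search index to scale.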

# 確率分類器を用いた組込み名前付きエンティティ認識

Embedded Named Entity Recognition using Probing Classifiers ( http://arxiv.org/abs/2403.11747v1 )

ライセンス: Link先を確認

Nicholas Popovič, Michael Färber,

(参考訳) 生成したテキストから意味情報を抽出することは、自動事実チェックや検索拡張生成のようなアプリケーションに有用なツールである。現在、これは推論中に別のモデルが必要であり、計算コストを増大させるか、言語モデルの破壊的な微調整を行う。代わりに、探索分類器を用いて事前学習した言語モデルに情報抽出機能を組み込むことにより、効率的なテキスト生成と情報抽出を可能にする。そこで本研究では,EMBERと呼ばれる手法を導入し,デコーダのみの言語モデルにおいて,微調整をせず,推論時に最小限の計算コストを発生させることなく,名前付きエンティティ認識を可能にすることを示す。具体的には,GPT-2 を用いた実験により,EMBER はストリーミングテキスト生成中に高いトークン生成率を維持しており,NER モデルによるベースラインの43.64% の速度低下に対して,約1% の速度低下しか無視できないことがわかった。コードとデータはhttps://github.com/nicpopovic/EMBER.comで公開されている。

Extracting semantic information from generated text is a useful tool for applications such as automated fact checking or retrieval augmented generation. Currently, this requires either separate models during inference, which increases computational cost, or destructive fine-tuning of the language model. Instead, we propose directly embedding information extraction capabilities into pre-trained language models using probing classifiers, enabling efficient simultaneous text generation and information extraction. For this, we introduce an approach called EMBER and show that it enables named entity recognition in decoder-only language models without fine-tuning them and while incurring minimal additional computational cost at inference time. Specifically, our experiments using GPT-2 show that EMBER maintains high token generation rates during streaming text generation, with only a negligible decrease in speed of around 1% compared to a 43.64% slowdown measured for a baseline using a separate NER model. Code and data are available at https://github.com/nicpopovic/EMBER.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# クロススペクトル画像マッチングのための関係表現学習ネットワーク

Relational Representation Learning Network for Cross-Spectral Image Patch Matching ( http://arxiv.org/abs/2403.11751v1 )

ライセンス: Link先を確認

Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Dou Quan, Zelin Shi,

(参考訳) 近年,クロススペクトル画像パッチマッチングにおいて特徴関係学習が注目されている。しかし、既存の研究は、画像パッチの特徴間の多様な関係の抽出に重点を置いており、個々の画像パッチの本質的な特徴表現を十分に無視している。そこで, 画像パッチの内在的特徴と画像パッチの特徴の関係を十分にマイニングすることに焦点を当てた, 革新的リレーショナル表現学習のアイデアを初めて提案する。そこで我々は,軽量リレーショナル表現学習ネットワーク(RRL-Net)を構築した。具体的には、個人固有の特徴を完全に特徴付けるオートエンコーダを革新的に構築し、深い特徴関係を抽出する機能相互作用学習(FIL)モジュールを導入する。さらに個々の固有の特徴をフルマイニングするために,各画像パッチのグローバルな特徴抽出を強化し,グローバル機能内のローカル依存関係をキャプチャする,軽量な多次元グローバル・ローカル・アテンション(MGLA)モジュールを構築した。 MGLAモジュールを組み合わせることで、機能抽出ネットワークをさらに探求し、アテンションに基づく軽量特徴抽出(ALFE)ネットワークを構築する。さらに、パラメータや推論時間の増加を回避しつつ、ネットワーク最適化を大幅に促進するマルチロス後処理(MLPP)最適化戦略を提案する。大規模な実験により、RRL-Netは複数の公開データセット上での最先端(SOTA)性能を達成することが示された。私たちのコードは後で公開されます。

Recently, feature relation learning has drawn widespread attention in cross-spectral image patch matching. However, existing related research focuses on extracting diverse relations between image patch features and ignores sufficient intrinsic feature representations of individual image patches. Therefore, an innovative relational representation learning idea is proposed for the first time, which simultaneously focuses on sufficiently mining the intrinsic features of individual image patches and the relations between image patch features. Based on this, we construct a lightweight Relational Representation Learning Network (RRL-Net). Specifically, we innovatively construct an autoencoder to fully characterize the individual intrinsic features, and introduce a Feature Interaction Learning (FIL) module to extract deep-level feature relations. To further fully mine individual intrinsic features, a lightweight Multi-dimensional Global-to-Local Attention (MGLA) module is constructed to enhance the global feature extraction of individual image patches and capture local dependencies within global features. By combining the MGLA module, we further explore the feature extraction network and construct an Attention-based Lightweight Feature Extraction (ALFE) network. In addition, we propose a Multi-Loss Post-Pruning (MLPP) optimization strategy, which greatly promotes network optimization while avoiding increases in parameters and inference time. Extensive experiments demonstrate that our RRL-Net achieves state-of-the-art (SOTA) performance on multiple public datasets. Our code will be made public later.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# 古典を再考する:韻文・詩におけるジェンダーステレオタイプの特定と是正に関する研究

Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems ( http://arxiv.org/abs/2403.11752v1 )

ライセンス: Link先を確認

Aditya Narayan Sankaran, Vigneshwaran Shankaran, Sampath Lonka, Rajesh Sharma,

(参考訳) 韻律や詩は文化規範や社会的な役割を伝達する強力な媒体である。しかしながら、これらの作品における男女のステレオタイプが広く存在することは、偏見の知覚を永続させ、個人のアイデンティティの範囲を制限する。過去の研究では、幼児期にステレオタイピングと偏見が出現することが示されており、因果的メカニズムに関する発達的研究は、ステレオタイピングと偏見の理解と制御に不可欠である。本研究は,ジェンダーステレオタイプを特定するために韻文と詩のデータセットを収集し,ジェンダーバイアスを97%精度で同定するモデルを提案する。ジェンダーのステレオタイプをLarge Language Model (LLM) を用いて修正し、その効果を人間の教育者に対する比較調査で評価した。要約すると、本研究は文学作品におけるジェンダーステレオタイプの普及性を強調し、ジェンダーステレオタイプを是正するLLMの可能性を明らかにする。本研究は,ジェンダー平等に関する言説に重要な貢献をし,芸術表現におけるインクリシティを高めることを目的としている。

Rhymes and poems are a powerful medium for transmitting cultural norms and societal roles. However, the pervasive existence of gender stereotypes in these works perpetuates biased perceptions and limits the scope of individuals' identities. Past works have shown that stereotyping and prejudice emerge in early childhood, and developmental research on causal mechanisms is critical for understanding and controlling stereotyping and prejudice. This work contributes by gathering a dataset of rhymes and poems to identify gender stereotypes and by proposing a model that identifies gender bias with 97% accuracy. Gender stereotypes were rectified using a Large Language Model (LLM), and its effectiveness was evaluated in a comparative survey against human educator rectifications. To summarize, this work highlights the pervasive nature of gender stereotypes in literary works and reveals the potential of LLMs to rectify gender stereotypes. This study raises awareness and promotes inclusivity within artistic expressions, making a significant contribution to the discourse on gender equality.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# 視聴覚感情模倣強度推定のための効率的な特徴抽出とレイトフュージョン戦略

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation ( http://arxiv.org/abs/2403.11757v1 )

ライセンス: Link先を確認

Jun Yu, Wangyuan Zhu, Jichao Zhu,

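(参考訳) 本稿では,第6回Affective Behavior Analysis in-the-wild(ABAW)コンペティションの一部であるEmotional Mimicry Intensity(EMI)推定チャレンジに対する解法を提案する。EMI推定チャレンジは,あらかじめ定義された感情カテゴリ(「Admiration」「Amusement」「Determination」「Empathic Pain」「Excitement」「Joy」)に基づいてシード動画を評価し,その感情の強度を推定することを目的とする。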

In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration," "Amusement," "Determination," "Empathic Pain," "Excitement," and "Joy").

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# DAOガバナンスプロセスの解明

Demystifying the DAO Governance Process ( http://arxiv.org/abs/2403.11758v1 )

ライセンス: Link先を確認

Junjie Ma, Muhui Jiang, Jinan Jiang, Xiapu Luo, Yufeng Hu, Yajin Zhou, Qi Wang, Fengwei Zhang,

(参考訳) 分散自律組織(DAO)は、分散ガバナンスを実現するために、分散アプリケーション(dApps)のための一般的なガバナンスソリューションになる。 DAOでは、ほとんどのメンバーからの承認なしに、任意のエンティティでdAppsを制御できない。しかし、その優位性にもかかわらず、DAOはいくつかの攻撃の対象にもなっており、数百万ドルが失われている。本稿では,ブロックチェーンにおけるDAOガバナンスプロセスの概要について概説する。次に、ガバナンスプロセスの3つのコンポーネント(ガバナンス契約、文書化、提案)で問題を特定しました。それぞれのコンポーネントは、重大な損失をもたらす可能性のある問題に対して脆弱である。そして、上記の問題を検出する自動手法を開発した。既存のDAOエコシステム内の問題を調べるために、9つの異なるブロックチェーンにわたる16,427のDAO、183のドキュメント、122,307の提案を含む最先端のデータセットを構築しました。分析の結果,DAO開発者やメンバの大多数が,特に提案領域において,これらの問題に十分な注意を払っていないことが明らかとなった。その結果、調査された提案の60%以上は、メンバーに対して一貫した説明とコードを提供しておらず、DAOガバナンスプロセス内で透明性を確保するための大きなギャップを浮き彫りにしている。より良いDAOガバナンスエコシステムのために、DAO開発者とメンバーは、ガバナンスプロセス内の問題を特定し、対処するための方法を利用することができます。

Decentralized Autonomous Organization (DAO) has become a popular governance solution for decentralized applications (dApps) to achieve decentralized governance. In a DAO, no single entity can arbitrarily control the dApps without approval from the majority of members. However, despite their advantages, DAOs have also been targeted by several attacks, leading to the loss of millions of dollars. In this paper, we first provide an overview of the DAO governance process within the blockchain. Next, we identify the issues within three components of the governance process: Governance Contract, Documentation, and Proposal. Each of these components is vulnerable to issues that could potentially result in substantial financial losses. We then develop automated methods to detect these issues. To investigate the issues within the existing DAO ecosystem, we construct a state-of-the-art dataset that includes 16,427 DAOs, 183 documentation, and 122,307 proposals across 9 different blockchains. Our analysis reveals that a majority of DAO developers and members have not given sufficient attention to these issues, especially in the area of proposals. The results show that over 60% of the examined proposals fail to provide a consistent description and code for their members, highlighting a significant gap in ensuring transparency within the DAO governance process. For a better DAO governance ecosystem, DAO developers and members can use these methods to identify and address issues within the governance process.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# なぜE.T.は自宅に電話できないのか - VoWiFiにおけるIPベースのジオブロッキングのグローバルな展望

Why E.T. Can't Phone Home: A Global View on IP-based Geoblocking at VoWiFi ( http://arxiv.org/abs/2403.11759v1 )

ライセンス: Link先を確認

Gabriel Karl Gegenhuber, Philipp Frenzel, Edgar Weippl,

(参考訳) 現在のセルラーネットワーク世代 (4G, 5G) では、IMS (IP Multimedia Subsystem) が音声通話やショートメッセージの終了に重要な役割を果たしている。多くのオペレーターはVoWiFi(Voice over Wi-Fi、Wi-Fi通話)を代替のネットワークアクセス技術として使用し、無線信号がない地域(例えば、農村部やシールドビルなど)での携帯電話の通信を補完する。顧客が国境を定期的に横断するモバイルの世界では、VoWiFiの通話は通常国内レートで請求されるため、海外旅行中に高価な国際ローミング料金を回避できる。この収益源を失わないために、海外に滞在する顧客のためにIMSへのアクセスをブロックするオペレーターもいる。本研究は,グローバルオペレータ間のVoWiFiの現在の展開状況を評価し,IP層上の既存のジオブロッキング対策を解析する。オペレータのかなりのシェア(IPv4: 14.6%、IPv6: 65.2%)がDNSまたはVoWiFiプロトコルレベルでジオブロッキングを実装しており、緊急呼び出しサービスの可用性に関して深刻な欠点を浮き彫りにしている。

In current cellular network generations (4G, 5G) the IMS (IP Multimedia Subsystem) plays an integral role in terminating voice calls and short messages. Many operators use VoWiFi (Voice over Wi-Fi, also Wi-Fi calling) as an alternative network access technology to complement their cellular coverage in areas where no radio signal is available (e.g., rural territories or shielded buildings). In a mobile world where customers regularly traverse national borders, this can be used to avoid expensive international roaming fees while journeying overseas, since VoWiFi calls are usually invoiced at domestic rates. To not lose this revenue stream, some operators block access to the IMS for customers staying abroad. This work evaluates the current deployment status of VoWiFi among worldwide operators and analyzes existing geoblocking measures on the IP layer. We show that a substantial share (IPv4: 14.6%, IPv6: 65.2%) of operators implement geoblocking at the DNS- or VoWiFi protocol level, and highlight severe drawbacks in terms of emergency calling service availability.
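A DNS-level check of the kind the abstract mentions can be sketched as follows. This is not the authors' measurement tooling. The host name pattern follows the operator-identifier ePDG naming convention of 3GPP TS 23.003 as I understand it; the MCC/MNC values are arbitrary examples, and `dns_geoblock_suspected` is a hypothetical heuristic that compares lookups made from two vantage points.

```python
import socket

def epdg_fqdn(mcc: int, mnc: int) -> str:
    """Build an operator ePDG host name; MNC and MCC are zero-padded to three digits."""
    return f"epdg.epc.mnc{mnc:03d}.mcc{mcc:03d}.pub.3gppnetwork.org"

def resolves(hostname: str) -> bool:
    """Return True if the local resolver yields at least one address for hostname."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False

def dns_geoblock_suspected(resolves_at_home: bool, resolves_abroad: bool) -> bool:
    """Hypothetical heuristic: the name resolves in the home country but not abroad."""
    return resolves_at_home and not resolves_abroad

# Example values only; no lookup is performed here.
name = epdg_fqdn(mcc=232, mnc=1)
```

In practice `resolves(name)` would be run from vantage points in different countries and the two results passed to `dns_geoblock_suspected`. Blocking at the VoWiFi protocol level would need a separate check that this sketch does not cover.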

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# 3R-INN: 動画を消費/配信しながら、どのように気候に優しいか?

3R-INN: How to be climate friendly while consuming/delivering videos? ( http://arxiv.org/abs/2403.11760v1 )

ライセンス: Link先を確認

Zoubida Ameur, Claire-Hélène Demarty, Daniel Menard, Olivier Le Meur,

(参考訳) ビデオの消費は、そのライフサイクルの様々な段階でかなりのエネルギーを必要とする。毎日10億時間のビデオが消費され、温室効果ガスの排出に大きく貢献する。したがって、ビデオチェーンの端から端までのカーボンフットプリントを減らすことは、ユーザ側の体験の質を保ちながら、非常に重要である。 3R-INNは,高解像度の粒度画像が与えられた場合,それを低解像度に再スケールし,フィルムグレーンを除去し,表示時の消費電力を低減させる。このような最小限の有効品質のコンテンツを提供することは、符号化、伝送、復号化、表示時のエネルギー消費を減らすことに寄与する。 3R-INNはまた、その可逆性と高周波の絡み合いのおかげで、高解像度のグレーン画像またはグレーンフリー版を復元でき、補助データを送信しない。実験により、符号化(78%)、復号化(77%)、レンダリング(5%から20%)でかなりの省エネ効果が得られたが、3R-INNは最先端のフィルム粒子合成およびエネルギ認識法より優れ、異なるテストセット上の再スケーリングタスクにおける最先端のパフォーマンスが達成された。

The consumption of a video requires a considerable amount of energy during the various stages of its life-cycle. With a billion hours of video consumed daily, this contributes significantly to the greenhouse gas emission. Therefore, reducing the end-to-end carbon footprint of the video chain, while preserving the quality of experience at the user side, is of high importance. To contribute in an impactful manner, we propose 3R-INN, a single light invertible network that does three tasks at once: given a high-resolution grainy image, it Rescales it to a lower resolution, Removes film grain and Reduces its power consumption when displayed. Providing such a minimum viable quality content contributes to reducing the energy consumption during encoding, transmission, decoding and display. 3R-INN also offers the possibility to restore either the high-resolution grainy original image or a grain-free version, thanks to its invertibility and the disentanglement of the high frequency, and without transmitting auxiliary data. Experiments show that, while enabling significant energy savings for encoding (78%), decoding (77%) and rendering (5% to 20%), 3R-INN outperforms state-of-the-art film grain synthesis and energy-aware methods and achieves state-of-the-art performance on the rescaling task on different test-sets.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# BEVCar:BEVマップとオブジェクトセグメンテーションのためのカメラレーダーフュージョン

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation ( http://arxiv.org/abs/2403.11761v1 )

ライセンス: Link先を確認

Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada,

(参考訳) 鳥瞰図(BEV)視点からのセマンティックシーンセグメンテーションは、移動ロボットの計画と意思決定を支えるうえで重要な役割を担っている。最近の視覚のみの手法は性能面で顕著な進歩を示しているが、雨や夜間などの悪条件下ではしばしば性能が低下する。アクティブセンサーはこの課題に対する解決策となるが、LiDARの極めて高いコストが制約となっている。カメラデータと車載レーダーの融合はより安価な代替手段となるが、従来の研究ではあまり注目されてこなかった。本研究では、BEVにおける物体セグメンテーションと地図セグメンテーションを同時に行う新しい手法BEVCarを導入し、この有望な方向性を前進させることを目指す。我々のアプローチの中核となる新規性は、まず生のレーダーデータの点ベース符号化を学習し、それを利用して画像特徴のBEV空間への持ち上げを効率的に初期化する点にある。 nuScenesデータセットで広範な実験を行い、BEVCarが現在の最先端手法を上回ることを示す。さらに、レーダー情報を取り入れることで、厳しい環境条件下での頑健性が大きく向上し、遠方の物体に対するセグメンテーション性能も改善することを示す。今後の研究を促進するため、実験で使用したnuScenesデータセットの天候別分割を、コードおよび学習済みモデルとともに http://bevcar.cs.uni-freiburg.de で提供する。

Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars poses a more inexpensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# EmpowerAbility:障害のある人のための雇用・奨学金ポータル

EmpowerAbility: A portal for employment & scholarships for differently-abled ( http://arxiv.org/abs/2403.11769v1 )

ライセンス: Link先を確認

Himanshu Raj, Shubham Kumar, Dr. J Kalaivani,

(参考訳) インターネットは、技術の進んだ今日の世界において、求職者、とりわけ障害のある人々にとって重要な資源となっている。彼らは自身の要件やスキルに合った仕事を見つけるために、主にインターネット上のリソースに頼っている。障害のある求職者の中には、迅速な返信や採用の申し出を受ける人もいれば、複雑な求人ポータルを使いこなすことが難しい人もおり、このプロセスの有効性にはばらつきがある。この差は、障害のある人の求職活動を大幅に迅速化・簡素化できるアクセシビリティ機能を十分に理解し活用できていないという、よくある誤りから生じる。本プロジェクトは、多様な能力を持つ個人を支援する求人・奨学金ポータルである。励みとなる成功事例、ユーザー中心の機能、実践的な機会を通じて、レジリエンスとインクルーシビティを育み、従来の語られ方を変えていく。このプラットフォームの二本柱の戦略は、誇りを育むとともに現実的な解決策を提供し、関わる人々の生活に持続的な影響を与える。

The internet has become a vital resource for job seekers in today's technologically advanced world, particularly for those with impairments. They mainly rely on internet resources to find jobs that fit their particular requirements and skill set. The efficacy of this process varies: some disabled candidates receive prompt responses and job offers, while others find it difficult to traverse the intricate world of job portals. This discrepancy results from a typical error: a failure to completely comprehend and utilize the accessibility features and functions that can significantly expedite and simplify the job search process for people with impairments. This project is a job and scholarship portal that empowers individuals with diverse abilities. Through inspiring success stories, user-centric features, and practical opportunities, it fosters resilience and inclusivity while reshaping narratives. This platform's dual-pronged strategy instills pride and offers real-world solutions, making a lasting impact on the lives it touches.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# モダリティ非依存 fMRI デコードによる視覚・言語

Modality-Agnostic fMRI Decoding of Vision and Language ( http://arxiv.org/abs/2403.11771v1 )

ライセンス: Link先を確認

Mitja Nikolaus, Milad Mozafari, Nicholas Asher, Leila Reddy, Rufin VanRullen,

(参考訳) 従来の研究では、画像を見ている被験者の脳活動データを、視覚モデルの特徴表現空間(モダリティ特異的デコーディング)だけでなく、言語モデルの特徴表現空間(クロスモーダルデコーディング)にもマッピングできることが示されている。本研究では、画像とそのテキスト記述の両方を見ている人々を対象とした新しい大規模fMRIデータセット(被験者あたり約8,500試行)を導入し、使用する。このデータセットにより、刺激が提示されるモダリティ(画像またはテキスト)に関係なく、被験者が見ている刺激を予測できる単一のデコーダ、すなわちモダリティ非依存デコーダの開発が可能になる。我々はこのようなデコーダを学習・評価し、公開されているさまざまな視覚モデル、言語モデル、マルチモーダル(視覚+言語)モデルの刺激表現に脳信号をマッピングする。その結果、(1) モダリティ非依存デコーダはモダリティ特異的デコーダと同等(場合によってはそれ以上)の性能を示すこと、(2) 単一モダリティモデルの表現に脳データをマッピングするモダリティ非依存デコーダは、マルチモーダル表現に依存するデコーダと同等の性能を示すこと、(3) 言語領域と低次視覚(後頭)領域がそれぞれテキスト刺激と画像刺激の復号に最も適している一方で、高次視覚(側頭)領域は両方の刺激タイプで良好な性能を示すことがわかった。

Previous studies have shown that it is possible to map brain activation data of subjects viewing images onto the feature representation space of not only vision models (modality-specific decoding) but also language models (cross-modal decoding). In this work, we introduce and use a new large-scale fMRI dataset (~8,500 trials per subject) of people watching both images and text descriptions of such images. This novel dataset enables the development of modality-agnostic decoders: a single decoder that can predict which stimulus a subject is seeing, irrespective of the modality (image or text) in which the stimulus is presented. We train and evaluate such decoders to map brain signals onto stimulus representations from a large range of publicly available vision, language and multimodal (vision+language) models. Our findings reveal that: (1) modality-agnostic decoders perform as well as (and sometimes even better than) modality-specific decoders; (2) modality-agnostic decoders mapping brain data onto representations from unimodal models perform as well as decoders relying on multimodal representations; and (3) while language and low-level visual (occipital) brain regions are best at decoding text and image stimuli, respectively, high-level visual (temporal) regions perform well on both stimulus types.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# S-JEPA:動的空間的注意によるシームレスなデータセット間転送に向けて

S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention ( http://arxiv.org/abs/2403.11772v1 )

ライセンス: Link先を確認

Pierre Guetschel, Thomas Moreau, Michael Tangermann,

(参考訳) 本稿では,脳波信号処理におけるシームレスなクロスデータセット転送の課題に触発され,JEPA(Joint Embedding Predictive Architectures)の利用に関する探索的研究を行う。近年,様々な領域におけるトランスファーラーニングにおいて,自己指導型学習が有望なアプローチとして出現している。しかし、脳波信号への応用はいまだに未解明である。本稿では、新しい領域固有の空間ブロックマスキング戦略と、下流分類のための3つの新しいアーキテクチャを含む、脳波記録を表現するためのSignal-JEPAを紹介する。本研究は54-subjectsデータセットを用いて,運動画像,ERP,SSVEPの3つのBCIパラダイムを用いて,モデルの下流性能を評価する。本研究は脳波信号符号化におけるJEPAの可能性に関する予備的証拠を提供する。特に,本研究では,下流分類における空間フィルタリングの重要性を強調し,事前学習例の長さが下流性能に与える影響を明らかにした。

Motivated by the challenge of seamless cross-dataset transfer in EEG signal processing, this article presents an exploratory study on the use of Joint Embedding Predictive Architectures (JEPAs). In recent years, self-supervised learning has emerged as a promising approach for transfer learning in various domains. However, its application to EEG signals remains largely unexplored. In this article, we introduce Signal-JEPA for representing EEG recordings, which includes a novel domain-specific spatial block masking strategy and three novel architectures for downstream classification. The study is conducted on a 54-subject dataset, and the downstream performance of the models is evaluated on three different BCI paradigms: motor imagery, ERP and SSVEP. Our study provides preliminary evidence for the potential of JEPAs in EEG signal encoding. Notably, our results highlight the importance of spatial filtering for accurate downstream classification and reveal an influence of the length of the pre-training examples but not of the mask size on the downstream performance.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# DVN-SLAM:局所・大域符号化に基づく動的ビジュアルニューラルSLAM

DVN-SLAM: Dynamic Visual Neural SLAM Based on Local-Global Encoding ( http://arxiv.org/abs/2403.11776v1 )

ライセンス: Link先を確認

Wenhua Wu, Guangming Wang, Ting Deng, Sebastian Aegidius, Stuart Shanks, Valerio Modugno, Dimitrios Kanoulas, Hesheng Wang,

(参考訳) 暗黙的表現に基づく同時局所化マッピング(SLAM)に関する最近の研究は,屋内環境において有望な成果を示した。しかし、暗黙のエンコーディングのシーン表現能力の制限、暗黙の表現からのレンダリングプロセスの不確実性、動的オブジェクトによる一貫性の破壊など、いくつかの課題がある。これらの課題に対処するため,DVN-SLAM という,局所グロバル融合型ニューラル暗黙表現に基づくリアルタイム動的視覚SLAMシステムを提案する。シーン表現能力を向上させるために,グローバルな構造と局所的な詳細の両方を考慮して暗黙の地図を構築することができる,局所的な融合型ニューラル暗黙の表現を導入する。レンダリング処理から生じる不確実性に対処するため,物体表面のシーン情報に集中して,最適化のための情報集中損失を設計する。提案したDVN-SLAMは、複数のデータセットをまたいだローカライゼーションとマッピングにおいて、競合的な性能を達成する。さらに重要なことは、DVN-SLAMは、他のNeRFベースの方法と異なる特徴である動的シーンの堅牢性を示す。

Recent research on Simultaneous Localization and Mapping (SLAM) based on implicit representation has shown promising results in indoor environments. However, there are still some challenges: the limited scene representation capability of implicit encodings, the uncertainty in the rendering process from implicit representations, and the disruption of consistency by dynamic objects. To address these challenges, we propose a real-time dynamic visual SLAM system based on local-global fusion neural implicit representation, named DVN-SLAM. To improve the scene representation capability, we introduce a local-global fusion neural implicit representation that enables the construction of an implicit map while considering both global structure and local details. To tackle uncertainties arising from the rendering process, we design an information concentration loss for optimization, aiming to concentrate scene information on object surfaces. The proposed DVN-SLAM achieves competitive performance in localization and mapping across multiple datasets. More importantly, DVN-SLAM demonstrates robustness in dynamic scenes, a trait that sets it apart from other NeRF-based methods.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# 通信プラットフォームにおけるリアルタイムディープフェイク音声検出システムの開発に向けて

Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms ( http://arxiv.org/abs/2403.11778v1 )

ライセンス: Link先を確認

Jonat John Mathew, Rakin Ahsan, Sae Furukawa, Jagdish Gautham Krishna Kumar, Huzaifa Pallan, Agamjeet Singh Padda, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan,

(参考訳) ディープフェイクオーディオは、音声ストリームの整合性のためにリアルタイム検出を必要とする通信プラットフォームにおいて、ますます脅威となる。本研究は,従来の非リアルタイム手法と異なり,リアルタイム通信プラットフォームにおける静的ディープフェイク音声検出モデルの適用可能性を評価する。実行可能ソフトウェアはクロスプラットフォーム互換のために開発され、リアルタイム実行が可能である。 ResnetとLCNNアーキテクチャに基づく2つのディープフェイクオーディオ検出モデルは、ASVspoof 2019データセットを使用して実装されており、ASVspoof 2019チャレンジベースラインと比較してベンチマークパフォーマンスが達成されている。本研究は、これらのモデルを強化するための戦略とフレームワークを提案し、通信プラットフォームにおけるリアルタイムディープフェイク音声検出の道を開いた。この研究は、オーディオストリームセキュリティの進歩に寄与し、動的でリアルタイムな通信シナリオにおけるロバストな検出機能を保証する。

Deepfake audio poses a rising threat in communication platforms, necessitating real-time detection for audio stream integrity. Unlike traditional non-real-time approaches, this study assesses the viability of employing static deepfake audio detection models in real-time communication platforms. Executable software is developed for cross-platform compatibility, enabling real-time execution. Two deepfake audio detection models based on ResNet and LCNN architectures are implemented using the ASVspoof 2019 dataset, achieving benchmark performance compared to the ASVspoof 2019 challenge baselines. The study proposes strategies and frameworks for enhancing these models, paving the way for real-time deepfake audio detection in communication platforms. This work contributes to the advancement of audio stream security, ensuring robust detection capabilities in dynamic, real-time communication scenarios.
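One common way to apply a static, fixed-input detector to a live stream is to score overlapping windows as samples arrive. The sketch below shows that pattern only. The window and hop sizes are arbitrary, and `toy_detector` is a hypothetical stand-in rather than the ResNet or LCNN models from the paper, whose streaming strategy the abstract does not specify.

```python
from collections import deque

def sliding_windows(samples, window, hop):
    """Yield the latest `window` samples every `hop` new samples once the buffer is full."""
    buf = deque(maxlen=window)
    since_last = 0
    for s in samples:
        buf.append(s)
        since_last += 1
        if len(buf) == window and since_last >= hop:
            since_last = 0
            yield list(buf)

def toy_detector(chunk):
    """Hypothetical stand-in for a trained model: returns mean absolute amplitude."""
    return sum(abs(x) for x in chunk) / len(chunk)

def score_stream(samples, detector, window, hop):
    """Score each window with a fixed-input detector as the stream progresses."""
    return [detector(chunk) for chunk in sliding_windows(samples, window, hop)]

windows = list(sliding_windows(range(10), window=4, hop=2))
scores = score_stream(range(10), toy_detector, window=4, hop=2)
```

A smaller hop gives lower detection latency at the cost of more model calls per second, which is the main trade-off in a real-time deployment.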

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# Prompt-Singer:自然言語プロンプトによる制御可能な歌声合成

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt ( http://arxiv.org/abs/2403.11780v1 )

ライセンス: Link先を確認

Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao Jin, Zhou Zhao,

(参考訳) 近年の歌声合成法(SVS)は,声質や自然性に優れるが,歌声のスタイル特性を明示的に制御する能力は乏しい。本稿では,歌手の性別,声域,音量を自然言語で制御できる最初のSVS手法であるPrompt-Singerを提案する。マルチスケール階層を持つデコーダのみのトランスフォーマーに基づくモデルアーキテクチャを採用し、メロディ的精度を維持しつつテキスト条件付き声域制御が可能なレンジメロディデカップリングピッチ表現を設計する。さらに,テキスト表現の種類,テキストエンコーダの微調整,データ不足を軽減するための音声データの導入など,さまざまな実験環境についても検討する。実験により,本モデルは良好な制御能力と音質が得られることが示された。オーディオサンプルはhttp://prompt-singer.github.io で公開されている。

Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io.

翻訳日:2024-03-20 20:29:45 公開日:2024-03-18

# Infinite-ID: ID-semantics Decoupling Paradigmによるアイデンティティ保存型パーソナライゼーション

Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm ( http://arxiv.org/abs/2403.11781v1 )

ライセンス: Link先を確認

Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, Bin Li,

(参考訳) テキスト・ツー・イメージ生成のための拡散モデルの最近の進歩を反映して、アイデンティティ保存されたパーソナライゼーションは、単一の参照画像で特定のアイデンティティを正確に把握する上で大きな進歩を遂げた。しかし、既存の手法は、主にテキスト埋め込み空間に参照画像を統合するため、画像とテキスト情報の複雑な絡み合いが生じ、アイデンティティの忠実さとセマンティック一貫性の両立が困難になる。この課題に対処するために、アイデンティティ保存パーソナライゼーションのためのID-セマンティック・デカップリングパラダイムであるInfinite-IDを提案する。具体的には、拡散モデルの元のテキスト・クロス・アテンション・モジュールを非活性化しながら、十分なID情報を取得するために、追加のイメージ・クロス・アテンション・モジュールを組み込んだアイデンティティ・エンハンス・トレーニングを導入する。これにより、画像ストリームは、テキスト入力からの干渉を緩和しつつ、参照画像によって提供されるアイデンティティを忠実に表現することを保証する。さらに,2つのストリームをシームレスにマージするために,混合アテンションモジュールとAdaIN平均演算を組み合わせた機能相互作用機構を導入する。このメカニズムは、アイデンティティとセマンティック一貫性の完全性を高めるだけでなく、生成された画像のスタイルを便利に制御できる。原画像生成とスタイル画像生成の双方に対する大規模な実験結果から,提案手法の優れた性能が示された。

Drawing on recent advancements in diffusion models for text-to-image generation, identity-preserved personalization has made significant progress in accurately capturing specific identities with just a single reference image. However, existing methods primarily integrate reference images within the text embedding space, leading to a complex entanglement of image and text information, which poses challenges for preserving both identity fidelity and semantic consistency. To tackle this challenge, we propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization. Specifically, we introduce identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information while deactivating the original text cross-attention module of the diffusion model. This ensures that the image stream faithfully represents the identity provided by the reference image while mitigating interference from textual input. Additionally, we introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams. This mechanism not only enhances the fidelity of identity and semantic consistency but also enables convenient control over the styles of the generated images. Extensive experimental results on both raw photo generation and style image generation demonstrate the superior performance of our proposed method.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# ガウス過程による選好と選択から学ぶチュートリアル

A tutorial on learning from preferences and choices with Gaussian Processes ( http://arxiv.org/abs/2403.11782v1 )

ライセンス: Link先を確認

Alessio Benavoli, Dario Azzimonti,

(参考訳) 推奨モデリングは、経済学、決定理論、機械学習、統計学の交差点にある。個人の好みを理解し、どのように選択するかを理解することで、期待にぴったり合う製品を構築することができ、幅広い領域にわたってより効率的でパーソナライズされたアプリケーションを実現することができます。本チュートリアルの目的は,ガウス的プロセス(GP)による嗜好学習のための包括的で包括的な枠組みを提示し,理性原理(経済学や意思決定理論など)を学習プロセスにシームレスに組み込む方法を示すことである。このフレームワークは、確率関数を適切に調整することにより、ランダムなユーティリティモデル、識別の限界、およびオブジェクトとラベルの両方に矛盾する複数のユーティリティを持つシナリオを含む嗜好学習モデルの構築を可能にする。このチュートリアルは、既存の文献の特定のギャップに対処する新しいGPベースのモデルを同時に導入しながら、確立された研究の上に構築されている。

Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations, paving the way for more efficient and personalised applications across a wide range of domains. The objective of this tutorial is to present a cohesive and comprehensive framework for preference learning with Gaussian Processes (GPs), demonstrating how to seamlessly incorporate rationality principles (from economics and decision theory) into the learning process. By suitably tailoring the likelihood function, this framework enables the construction of preference learning models that encompass random utility models, limits of discernment, and scenarios with multiple conflicting utilities for both object- and label-preference. This tutorial builds upon established research while simultaneously introducing some novel GP-based models to address specific gaps in the existing literature.
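To make "tailoring the likelihood function" concrete, here is a minimal sketch of one widely used pairwise-preference likelihood, the probit form, in which the probability that item a is preferred to item b depends on the difference of their latent utilities. The utility values, the item names, and the noise scale are made up. In a full GP model the utilities would be latent function values with a Gaussian process prior rather than fixed numbers, and the tutorial's own models may differ from this form.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pref_likelihood(f_a: float, f_b: float, noise_std: float = 1.0) -> float:
    """P(a preferred to b | utilities f_a, f_b) under a probit likelihood."""
    return std_normal_cdf((f_a - f_b) / (math.sqrt(2.0) * noise_std))

def log_likelihood(utilities, preferences, noise_std: float = 1.0) -> float:
    """Sum of log-probabilities over observed (winner, loser) pairs."""
    return sum(
        math.log(pref_likelihood(utilities[winner], utilities[loser], noise_std))
        for winner, loser in preferences
    )

# Hypothetical latent utilities and observed pairwise choices.
utilities = {"tea": 0.8, "coffee": 0.3, "water": -0.5}
observed = [("tea", "coffee"), ("coffee", "water"), ("tea", "water")]
ll = log_likelihood(utilities, observed)
```

Swapping `pref_likelihood` for a different function of the utilities is what changes the model family, while the GP prior over the utilities can stay the same.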

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# mqdtfit: 経験的多チャンネル量子欠陥計算のためのPython関数のコレクション

mqdtfit: A collection of Python functions for empirical multichannel quantum defect calculations ( http://arxiv.org/abs/2403.11783v1 )

ライセンス: Link先を確認

R. M. Potvliege,

(参考訳) 本論文とともに配布されるPython関数は、複雑な原子の励起束縛状態を記述する多チャンネル量子欠陥理論モデルのパラメータの計算に利用できる。これらのパラメータは、ユーザが提供する実験データにモデルをフィッティングすることで得られる。理論の2つの主要な定式化、すなわちモデルのパラメータが固有チャネル量子欠陥の集合と変換行列であるものと、パラメータがリアクタンス行列の要素であるものの両方をサポートする。配布物には、理論的なエネルギー準位の計算、混合係数とチャネル分率の計算、およびLu-Fanoプロットの作成を行うプログラムが含まれる。

The Python functions distributed with this article can be used for calculating the parameters of multichannel quantum defect theory models describing excited bound states of complex atoms. These parameters are obtained by fitting a model to experimental data provided by the user. The two main formulations of the theory are supported, namely the one in which the parameters of the model are a set of eigenchannel quantum defects and a transformation matrix, and the one where these parameters are the elements of a reactance matrix. The distribution includes programs for calculating theoretical energy levels, calculating mixing coefficients and channel fractions, and producing Lu-Fano plots.
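For orientation, the single-channel limit of quantum defect theory reduces to the Rydberg formula E_n = I - R / (n - δ)², where I is the ionization limit, R the Rydberg constant and δ the quantum defect. The sketch below covers only that limit. It does not use or imitate the mqdtfit API; the function names are my own, the ionization limit is a made-up number, and R is the rounded infinite-mass value in cm⁻¹.

```python
import math

def level_energy(n: int, delta: float, ionization_limit: float, rydberg: float) -> float:
    """Single-channel Rydberg formula: E_n = I - R / (n - delta)^2."""
    return ionization_limit - rydberg / (n - delta) ** 2

def quantum_defect(n: int, energy: float, ionization_limit: float, rydberg: float) -> float:
    """Invert the Rydberg formula to recover delta from one measured level."""
    return n - math.sqrt(rydberg / (ionization_limit - energy))

RYDBERG = 109737.0   # cm^-1, rounded infinite-mass value (assumption)
LIMIT = 50000.0      # cm^-1, hypothetical ionization limit

energy_20 = level_energy(20, 0.35, LIMIT, RYDBERG)
delta_back = quantum_defect(20, energy_20, LIMIT, RYDBERG)
```

In the multichannel case a single δ is replaced by the eigenchannel defects and transformation matrix, or by a reactance matrix, and these parameters are fitted to many levels at once rather than inverted from one.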

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# ForzaETH Race Stack - 完全に市販の既製ハードウェア上での縮小スケール自律ヘッド・ツー・ヘッドレース

ForzaETH Race Stack - Scaled Autonomous Head-to-Head Racing on Fully Commercial off-the-Shelf Hardware ( http://arxiv.org/abs/2403.11784v1 )

ライセンス: Link先を確認

Nicolas Baumann, Edoardo Ghignone, Jonas Kühne, Niklas Bastuck, Jonathan Becker, Nadine Imholz, Tobias Kränzlin, Tian Yi Lim, Michael Lötscher, Luca Schwarzenbach, Luca Tognoni, Christian Vogt, Andrea Carron, Michele Magno,

(参考訳) ロボット工学における自律的なレースは、信頼性とリアルタイムな意思決定の必要性と、高速なダイナミクスを組み合わせる。このようなレースはソフトウェアとハードウェアを限界まで押し上げるが、既存のフルシステムソリューションの多くは複雑でカスタムなハードウェアとソフトウェアを必要とする。これにより再現性が制限され、機械、電気、ロボティクスの分野における総合的な専門知識を持つ、よく調達された研究所で、進歩と複製が実現可能である。自律性領域に関心がある研究者は、これらの分野の1つで部分的な経験しか持たないため、親しみと統合にかなりの時間を費やす必要がある。 ForzaETH Race Stackは、F1TENTHのために設計された自動運転レーシングソフトウェアプラットフォームを提供することで、このギャップに対処する。このアプローチは、自律レースの競争的側面を強化し、この分野における研究開発のためのアクセス可能なプラットフォームを提供する。 ForzaETH Race Stackはモジュラリティと運用上の使いやすさを念頭に設計されており、トラックの摩擦やレイアウトといった様々な環境条件へのカスタマイズと適応性を実現している。タイムトリアルレースとヘッド・ツー・ヘッドレースの両方を扱えるスタックは、公式のF1TENTH国際大会で複数回優勝し、その有効性、堅牢性、適応性を示した。

Autonomous racing in robotics combines high-speed dynamics with the necessity for reliability and real-time decision-making. While such racing pushes software and hardware to their limits, many existing full-system solutions necessitate complex, custom hardware and software, and usually focus on Time-Trials rather than full unrestricted Head-to-Head racing, due to financial and safety constraints. This limits their reproducibility, making advancements and replication feasible mostly for well-resourced laboratories with comprehensive expertise in mechanical, electrical, and robotics fields. Researchers interested in the autonomy domain but with only partial experience in one of these fields need to spend significant time on familiarization and integration. The ForzaETH Race Stack addresses this gap by providing an autonomous racing software platform designed for F1TENTH, a 1:10 scaled Head-to-Head autonomous racing competition, which simplifies replication by using commercial off-the-shelf hardware. This approach enhances the competitive aspect of autonomous racing and provides an accessible platform for research and development in the field. The ForzaETH Race Stack is designed with modularity and operational ease of use in mind, allowing customization and adaptability to various environmental conditions, such as track friction and layout. Capable of handling both Time-Trials and Head-to-Head racing, the stack has demonstrated its effectiveness, robustness, and adaptability in the field by winning the official F1TENTH international competition multiple times.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# 事前学習大言語モデルを用いたハイパーリレーショナル知識グラフの構築

Construction of Hyper-Relational Knowledge Graphs Using Pre-Trained Large Language Models ( http://arxiv.org/abs/2403.11786v1 )

ライセンス: Link先を確認

Preetha Datta, Fedor Vitiugin, Anastasiia Chizhikova, Nitin Sawhney,

(参考訳) 包括的知識グラフの構築にはハイパーリレーションの抽出が不可欠だが,このタスクには限定的な教師付き手法が存在する。このギャップに対処するために,OpenAIのGPT-3.5モデルを用いたゼロショットプロンプトベースの手法を導入し,テキストからハイパーリレーショナルな知識を抽出する。モデルとベースラインを比較して,0.77のリコールで有望な結果を得た。現在、精度は低いが、モデル出力の詳細な分析により、この分野における今後の研究の道筋が明らかになっている。

Extracting hyper-relations is crucial for constructing comprehensive knowledge graphs, but there are limited supervised methods available for this task. To address this gap, we introduce a zero-shot prompt-based method using OpenAI's GPT-3.5 model for extracting hyper-relational knowledge from text. Comparing our model with a baseline, we achieved promising results, with a recall of 0.77. Although our precision is currently lower, a detailed analysis of the model outputs has uncovered potential pathways for future research in this area.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
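
The hyper-relational extraction abstract above reports recall and precision over extracted facts. As a minimal sketch of how such scores could be computed, the Python below models a hyper-relational fact as a triple plus a set of qualifier key/value pairs and counts exact matches. The `HyperFact` type, the `fact` helper, the exact-match criterion, and the sample facts are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple


class HyperFact(NamedTuple):
    """A triple (head, relation, tail) plus qualifier key/value pairs (hypothetical type)."""
    head: str
    relation: str
    tail: str
    qualifiers: FrozenSet[Tuple[str, str]]


def fact(head: str, relation: str, tail: str, **qualifiers: str) -> HyperFact:
    """Convenience constructor that freezes the qualifiers so facts are hashable."""
    return HyperFact(head, relation, tail, frozenset(qualifiers.items()))


def precision_recall(predicted: Set[HyperFact], gold: Set[HyperFact]) -> Tuple[float, float]:
    """Exact-match precision and recall; a fact counts only if its qualifiers also match."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


gold = {
    fact("Marie Curie", "award_received", "Nobel Prize in Physics", year="1903"),
    fact("Marie Curie", "award_received", "Nobel Prize in Chemistry", year="1911"),
    fact("Marie Curie", "educated_at", "University of Paris"),
}
predicted = {
    fact("Marie Curie", "award_received", "Nobel Prize in Physics", year="1903"),
    fact("Marie Curie", "award_received", "Nobel Prize in Chemistry", year="1911"),
    # Deliberately wrong qualifier: the triple is right, the year is not.
    fact("Marie Curie", "award_received", "Nobel Prize in Chemistry", year="1903"),
    # Deliberately spurious fact with no qualifiers.
    fact("Marie Curie", "born_in", "Paris"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under exact matching, a correct triple with a wrong qualifier counts as a false positive, which is one way a system can end up with high recall and lower precision.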

# EMIE-MAP:明示的メッシュと暗黙的符号化に基づく大規模路面再構成

EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding ( http://arxiv.org/abs/2403.11789v1 )

ライセンス: Link先を確認

Wenhua Wu, Qi Wang, Guangming Wang, Junping Wang, Tiankun Zhao, Yang Liu, Dongchao Gao, Zhe Liu, Hesheng Wang,

(参考訳) 路面の再構築は自動運転システムにおいて重要な役割を担い、車線の認識と高精度マッピングを可能にする。近年,特にシーンテクスチャのリアルなレンダリングにおいて,ニューラル暗黙符号化はシーン表現において顕著な成果を上げている。しかし、大規模なシーンの幾何学的情報を直接表現する上での課題に直面している。そこで我々は,明示的メッシュと暗黙的符号化に基づく大規模路面再構築手法であるEMIE-MAPを提案する。道路形状は明示的なメッシュで表現され、各頂点は色と意味情報を表す暗黙のエンコーディングを格納する。道路標高の最適化の難しさを克服するために,軌道に基づく標高初期化と,多層パーセプトロン(MLP)に基づく標高残差学習手法を導入する。さらに,暗黙のエンコーディングとマルチカメラカラーMLPデコーディングを用いることで,シーンの物理的特性とカメラ特性を別々にモデル化し,異なるカメラモデルに対応したサラウンドビュー再構成を可能にする。本手法は,様々な現実の難易度の高いシナリオにおいて,顕著な路面復元性能を実現する。

Road surface reconstruction plays a vital role in autonomous driving systems, enabling road lane perception and high-precision mapping. Recently, neural implicit encoding has achieved remarkable results in scene representation, particularly in the realistic rendering of scene textures. However, it faces challenges in directly representing geometric information for large-scale scenes. To address this, we propose EMIE-MAP, a novel method for large-scale road surface reconstruction based on explicit mesh and implicit encoding. The road geometry is represented using explicit mesh, where each vertex stores implicit encoding representing the color and semantic information. To overcome the difficulty in optimizing road elevation, we introduce a trajectory-based elevation initialization and an elevation residual learning method based on Multi-Layer Perceptron (MLP). Additionally, by employing implicit encoding and multi-camera color MLPs decoding, we achieve separate modeling of scene physical properties and camera characteristics, allowing surround-view reconstruction compatible with different camera models. Our method achieves remarkable road surface reconstruction performance in a variety of real-world challenging scenarios.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# Deep Medial Voxels: 解剖学的形状モデリングのためのメディア軸近似の学習

Deep Medial Voxels: Learned Medial Axis Approximations for Anatomical Shape Modeling ( http://arxiv.org/abs/2403.11790v1 )

ライセンス: Link先を確認

Antonio Pepe, Richard Schussnig, Jianning Li, Christina Gsaxner, Dieter Schmalstieg, Jan Egger,

(参考訳) 画像ボリュームからの形状再構成は、医用画像解析において繰り返し必要となる。一般的なワークフローはセグメンテーションステップから始まり、慎重な後処理、そして最後にアドホックなメッシュ生成アルゴリズムが続く。このシーケンスは時間を要する可能性があるため、ニューラルネットワークはテンプレートの変形によって形状を再構築するように訓練される。これらのネットワークは手動による介入なしに最先端の結果をもたらすが、これまでのところ、主に個体間のトポロジ的多様性がほとんどない解剖学的形状で評価されてきた。対照的に、他の研究は、メッシュ生成と視覚化に複数の利点がある暗黙の形状モデルを学ぶことを好んでいる。我々の研究はこの方向に沿って、画像ボリュームからトポロジカルな骨格を忠実に近似する半暗黙的表現であるディープ・メディアル・ボクセル(deep medial voxels)を導入し、最終的に畳み込み曲面による形状再構成へと導く。我々の再構成技術は,可視化と計算機シミュレーションの両方への可能性を示している。

Shape reconstruction from imaging volumes is a recurring need in medical image analysis. Common workflows start with a segmentation step, followed by careful post-processing and, finally, ad hoc meshing algorithms. As this sequence can be time-consuming, neural networks are trained to reconstruct shapes through template deformation. These networks deliver state-of-the-art results without manual intervention, but, so far, they have primarily been evaluated on anatomical shapes with little topological variety between individuals. In contrast, other works favor learning implicit shape models, which have multiple benefits for meshing and visualization. Our work follows this direction by introducing deep medial voxels, a semi-implicit representation that faithfully approximates the topological skeleton from imaging volumes and eventually leads to shape reconstruction via convolution surfaces. Our reconstruction technique shows potential for both visualization and computer simulations.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# PAON:パデ近似を用いた新しいニューロンモデル

PAON: A New Neuron Model using Padé Approximants ( http://arxiv.org/abs/2403.11791v1 )

ライセンス: Link先を確認

Onur Keleş, A. Murat Tekalp,

(参考訳) 畳み込みニューラルネットワーク(CNN)は古典的なマカロック・ピッツニューロンモデルに基づいて構築されている。いくつかの研究者は、二次ニューロン、一般化された操作ニューロン、生成ニューロン、スーパーニューロンを含む強化されたニューロンモデルを提案しており、ポイントワイド活性化関数によって提供されるものよりも強い非線形性を持っている。また、Pade近似を一般化活性化関数として使う提案もある。本稿では,異なる順序の多項式の比として超越関数の最適数学的近似であるPade近似にインスパイアされた,Padeニューロン(Paons)と呼ばれる新しいニューロンモデルを紹介する。 Paonsは、他のすべての提案されたニューロンモデルのスーパーセットであることを示す。したがって、既知のCNNモデルの基本ニューロンは、Paonsに置き換えられる。本稿では、よく知られたResNetをPaonsによって構築されたPadeNetに拡張し、そのコンセプトを実証する。単一画像超解像タスクにおける実験により,PadeNetsは競合するアーキテクチャよりも優れた結果が得られることが示された。

Convolutional neural networks (CNN) are built upon the classical McCulloch-Pitts neuron model, which is essentially a linear model, where the nonlinearity is provided by a separate activation function. Several researchers have proposed enhanced neuron models, including quadratic neurons, generalized operational neurons, generative neurons, and super neurons, with stronger nonlinearity than that provided by the pointwise activation function. There has also been a proposal to use Padé approximation as a generalized activation function. In this paper, we introduce a brand new neuron model called Padé neurons (Paons), inspired by the Padé approximant, which is the best mathematical approximation of a transcendental function as a ratio of polynomials with different orders. We show that Paons are a superset of all other proposed neuron models. Hence, the basic neuron in any known CNN model can be replaced by Paons. In this paper, we extend the well-known ResNet to PadeNet (built by Paons) to demonstrate the concept. Our experiments on the single-image super-resolution task show that PadeNets can obtain better results than competing architectures.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
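
The PAON abstract above builds on Padé approximants, that is, approximating a function by a ratio of two polynomials. As a small illustration of that idea only (not of the Padé neuron itself), the sketch below compares the standard [2/2] Padé approximant of exp(x) with the degree-4 Taylor polynomial, which matches the same number of series terms. The function names are hypothetical and chosen for this example.

```python
import math


def pade_2_2_exp(x: float) -> float:
    """[2/2] Pade approximant of exp(x): a ratio of two quadratic polynomials."""
    numerator = 1 + x / 2 + x * x / 12
    denominator = 1 - x / 2 + x * x / 12  # no real roots, so no poles on the real line
    return numerator / denominator


def taylor_4_exp(x: float) -> float:
    """Degree-4 Taylor polynomial of exp(x) around 0."""
    return sum(x ** k / math.factorial(k) for k in range(5))


x = 1.0
pade_error = abs(pade_2_2_exp(x) - math.exp(x))
taylor_error = abs(taylor_4_exp(x) - math.exp(x))
print(f"Pade [2/2] error at x=1:    {pade_error:.6f}")
print(f"Taylor deg-4 error at x=1:  {taylor_error:.6f}")
```

At x = 1 the rational form has the smaller error of the two, which hints at why a ratio of polynomials can be a more expressive building block than a single polynomial of comparable size.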

# SETA:ドメイン一般化のためのセマンティック・アウェア・トークン拡張

SETA: Semantic-Aware Token Augmentation for Domain Generalization ( http://arxiv.org/abs/2403.11792v1 )

ライセンス: Link先を確認

Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao,

(参考訳) ドメイン一般化(DG)は、ターゲットドメインにアクセスすることなく、ドメインシフトに対するモデル堅牢性を高めることを目的としている。 DGのメソッドの一般的なカテゴリはデータ拡張であり、ドメインシフトをシミュレートする仮想サンプルの生成に焦点を当てている。しかし、DGの既存の拡張技術は、主に畳み込みニューラルネットワーク(CNN)向けに調整されており、トークンベースのアーキテクチャ、すなわちビジョントランスフォーマー(ViT)と多層パーセプトロン(MLP)モデルでの探索が限られている。本稿では,従来のCNNベースの拡張手法がトークンベースモデルに与える影響について検討し,モデルに全体的な形状情報を学習させる動機付けが欠けているため,その性能が最適でないことを明らかにする。この問題に対処するため,SEmantic-aware Token Augmentation (SETA)法を提案する。 SETAは、グローバルな形状の特徴を保持しつつ、局所的なエッジキューを摂動させることでトークンの特徴を変換し、形状情報のモデル学習を強化する。モデルの一般化能力をさらに高めるため,DGにおける2つの最先端スタイル拡張手法と組み合わせて,本手法の2種類のスタイル化バリエーションを導入する。本手法について理論的考察を行い,一般化リスク境界の低減効果を示す。 5つのベンチマークの総合的な実験により、本手法は様々なViTおよびMLPアーキテクチャでSOTA性能を実現することが示された。私たちのコードはhttps://github.com/lingeringlight/SETAで公開されています。

Domain generalization (DG) aims to enhance the model robustness against domain shifts without accessing target domains. A prevalent category of methods for DG is data augmentation, which focuses on generating virtual samples to simulate domain shifts. However, existing augmentation techniques in DG are mainly tailored for convolutional neural networks (CNNs), with limited exploration in token-based architectures, i.e., vision transformer (ViT) and multi-layer perceptrons (MLP) models. In this paper, we study the impact of prior CNN-based augmentation methods on token-based models, revealing their performance is suboptimal due to the lack of incentivizing the model to learn holistic shape information. To tackle the issue, we propose the SEmantic-aware Token Augmentation (SETA) method. SETA transforms token features by perturbing local edge cues while preserving global shape features, thereby enhancing the model learning of shape information. To further enhance the generalization ability of the model, we introduce two stylized variants of our method combined with two state-of-the-art style augmentation methods in DG. We provide a theoretical insight into our method, demonstrating its effectiveness in reducing the generalization risk bound. Comprehensive experiments on five benchmarks prove that our method achieves SOTA performances across various ViT and MLP architectures. Our code is available at https://github.com/lingeringlight/SETA.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# 大規模言語モデルの推論能力:抽象と推論コーパスの詳細な分析

Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus ( http://arxiv.org/abs/2403.11793v1 )

ライセンス: Link先を確認

Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim,

(参考訳) 大規模言語モデル(LLM)の推論能力を評価する既存の手法は結果中心であり,推論プロセスの評価が困難である。プロセス中心の方法で大規模言語モデルの推論と文脈理解能力を評価するために,ARCデータセットを用いた新しい手法を提案する。 ARCは問題解決のために厳密な論理構造を必要としており、モデル推論能力と人間の比較を容易にするベンチマークである。実験の結果、大きな言語モデルは推論能力が弱いが、論理的一貫性、構成性、生産性の点でまだ遅れていることが明らかとなった。実験では,LLMの推論能力を強調し,人間レベルの推論を実現するための開発経路を提案する。

The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach using the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs, proposing development paths for achieving human-level reasoning.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# 低コストプライバシ対応分散型学習

Low-Cost Privacy-Aware Decentralized Learning ( http://arxiv.org/abs/2403.11795v1 )

ライセンス: Link先を確認

Sayan Biswas, Davide Frey, Romaric Gaudel, Anne-Marie Kermarrec, Dimitri Lerévérend, Rafael Pires, Rishi Sharma, François Taïani,

(参考訳) 本稿では、モデルトレーニングプロセス中に各モデル更新に相関ノイズを追加することに依存する、新しいプライバシ対応分散学習(DL)アルゴリズムであるZIP-DLを紹介する。この手法により、付加されたノイズはその相関関係により集約過程中にほぼ打ち消し合い、モデル精度への影響を最小限に抑えることができる。さらに、ZIP-DLはノイズキャンセリングのために複数の通信ラウンドを必要としないため、プライバシ保護と通信オーバーヘッドの共通トレードオフに対処する。本稿では,収束速度とプライバシの両方について理論的保証を与え,ZIP-DLを実用シナリオに適用可能にする。広範な実験により,ZIP-DLが脆弱性と精度の最良のトレードオフを達成していることを示す。特にZIP-DLは、 (i)ベースラインDLと比較してリンク可能性攻撃の有効性を最大52ポイント低下させ、 (ii)メンバーシップ推論攻撃の下で、同一の脆弱性に対して、プライバシー保護型の競合手法より最大37ポイント高い精度を達成する。

This paper introduces ZIP-DL, a novel privacy-aware decentralized learning (DL) algorithm that relies on adding correlated noise to each model update during the model training process. This technique ensures that the added noise almost neutralizes itself during the aggregation process due to its correlation, thus minimizing the impact on model accuracy. In addition, ZIP-DL does not require multiple communication rounds for noise cancellation, addressing the common trade-off between privacy protection and communication overhead. We provide theoretical guarantees for both convergence speed and privacy guarantees, thereby making ZIP-DL applicable to practical scenarios. Our extensive experimental study shows that ZIP-DL achieves the best trade-off between vulnerability and accuracy. In particular, ZIP-DL (i) reduces the effectiveness of a linkability attack by up to 52 points compared to baseline DL, and (ii) achieves up to 37 more accuracy points for the same vulnerability under membership inference attacks against a privacy-preserving competitor.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
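
The ZIP-DL abstract above describes correlated noise that largely cancels when updates are aggregated. The toy sketch below illustrates only that general principle, under strong simplifying assumptions: a single scalar "update", noise terms forced to sum exactly to zero, and plain averaging as the aggregation step. It is not the authors' algorithm, and the function names are hypothetical.

```python
import random
from typing import List


def zero_sum_noise(count: int, sigma: float, rng: random.Random) -> List[float]:
    """Draw `count` Gaussian values, then subtract their mean so they sum to zero."""
    raw = [rng.gauss(0.0, sigma) for _ in range(count)]
    mean = sum(raw) / count
    return [value - mean for value in raw]


def noisy_messages(update: float, count: int, sigma: float, rng: random.Random) -> List[float]:
    """One noisy copy of `update` per recipient; the noise is correlated across copies."""
    return [update + noise for noise in zero_sum_noise(count, sigma, rng)]


rng = random.Random(0)  # fixed seed so the run is repeatable
true_update = 3.0
messages = noisy_messages(true_update, count=5, sigma=10.0, rng=rng)

# Each individual message hides the true value behind large noise ...
print([round(m, 2) for m in messages])
# ... but averaging all of them recovers it, because the noise sums to zero.
recovered = sum(messages) / len(messages)
print(round(recovered, 6))
```

In this toy setting the cancellation is exact; the abstract says only that the noise "almost" neutralizes itself, so the real scheme should be expected to leave a small residual.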

# OpenOcc: Occupancy Representationによるオープン語彙3Dシーン再構築

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation ( http://arxiv.org/abs/2403.11796v1 )

ライセンス: Link先を確認

Haochen Jiang, Yueming Xu, Yihan Zeng, Hang Xu, Wei Zhang, Jianfeng Feng, Li Zhang,

(参考訳) 3D再構成は、移動ロボットの自律ナビゲーション分野で広く利用されている。しかし、従来の研究では、オープンワールドのシーン理解能力のない基本的な幾何学構造しか提供できず、人間とのインタラクションや視覚ナビゲーションといった高度なタスクが制限される。さらに、従来の3Dシーン理解アプローチでは、高価なラベル付き3Dデータセットを使用して、単一のタスクのために教師ありでモデルをトレーニングしている。このように、ゼロショットシーン理解を伴う幾何学的再構築、すなわちオープン語彙の3次元理解と再構築は、将来の移動ロボットの発展に不可欠である。本稿では,3次元シーン再構成とオープン語彙理解をニューラルラディアンス場で統合する新しいフレームワークであるOpenOccを提案する。シーンの幾何学的構造を占有表現でモデル化し,ゼロショット推論のために,ボリュームレンダリングを用いて事前学習したオープン語彙モデルを3次元言語フィールドに蒸留する。さらに, 蒸留された特徴における不整合な測定に起因する言語フィールド表現の退化を緩和するために, 新しいセマンティック・アウェア信頼度伝播 (SCP) 法を提案する。実験結果から,本手法は3次元シーン理解タスクにおいて,特に小型のオブジェクトやロングテールのオブジェクトにおいて,競争力のある性能を達成することが示された。

3D reconstruction has been widely used in autonomous navigation fields of mobile robotics. However, the former research can only provide the basic geometry structure without the capability of open-world scene understanding, limiting advanced tasks like human interaction and visual navigation. Moreover, traditional 3D scene understanding approaches rely on expensive labeled 3D datasets to train a model for a single task with supervision. Thus, geometric reconstruction with zero-shot scene understanding i.e. Open vocabulary 3D Understanding and Reconstruction, is crucial for the future development of mobile robots. In this paper, we propose OpenOcc, a novel framework unifying the 3D scene reconstruction and open vocabulary understanding with neural radiance fields. We model the geometric structure of the scene with occupancy representation and distill the pre-trained open vocabulary model into a 3D language field via volume rendering for zero-shot inference. Furthermore, a novel semantic-aware confidence propagation (SCP) method has been proposed to relieve the issue of language field representation degeneracy caused by inconsistent measurements in distilled features. Experimental results show that our approach achieves competitive performance in 3D scene understanding tasks, especially for small and long-tail objects.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# パスワードを忘れた人は誰か? アカウントのリカバリがリスクベースの認証と出会う

Is It Really You Who Forgot the Password? When Account Recovery Meets Risk-Based Authentication ( http://arxiv.org/abs/2403.11798v1 )

ライセンス: Link先を確認

Andre Büttner, Andreas Thue Pedersen, Stephan Wiefling, Nils Gruschka, Luigi Lo Iacono,

(参考訳) リスクベースの認証(RBA)は、ユーザアカウントを不正な乗っ取りから保護するためにオンラインサービスで使用される。 RBAは一般的に、ログインコンテキストの特徴的属性が既知の(したがって期待される)値から逸脱した場合に、不審なログインの試みを示すコンテキスト的特徴を使用する。 RBAと認証における異常検出に関するこれまでの研究は、主にログインプロセスに焦点を当ててきた。しかし、最近の攻撃は認証プロセスの他の部分、特にアカウント回復機能における脆弱性を明らかにしている。したがって、総合的な認証セキュリティを確保するためには、アカウント回復の文脈における異常検出の使用も検討する必要がある。本研究は,実環境におけるリスクベースのアカウント回復(RBAR)を調査する最初の研究である。 RBAを使用していることが知られている5つの著名なオンラインサービスにおけるRBARの採用状況を分析した。調査の結果、Google、LinkedIn、AmazonでのRBARの使用が確認された。さらに、これらのサービスの様々なRBARメカニズムに関する洞察を提供し、それらに対する多要素認証の影響を探る。この結果をもとに,RBARチャレンジに対する最初の成熟度モデルを構築した。我々の目標は、開発者、管理者、政策立案者がRBARについて初歩的な理解を得ることを支援し、この方向へのさらなる研究を促進することである。

Risk-based authentication (RBA) is used in online services to protect user accounts from unauthorized takeover. RBA commonly uses contextual features that indicate a suspicious login attempt when the characteristic attributes of the login context deviate from known and thus expected values. Previous research on RBA and anomaly detection in authentication has mainly focused on the login process. However, recent attacks have revealed vulnerabilities in other parts of the authentication process, specifically in the account recovery function. Consequently, to ensure comprehensive authentication security, the use of anomaly detection in the context of account recovery must also be investigated. This paper presents the first study to investigate risk-based account recovery (RBAR) in the wild. We analyzed the adoption of RBAR by five prominent online services (that are known to use RBA). Our findings confirm the use of RBAR at Google, LinkedIn, and Amazon. Furthermore, we provide insights into the different RBAR mechanisms of these services and explore the impact of multi-factor authentication on them. Based on our findings, we create a first maturity model for RBAR challenges. The goal of our work is to help developers, administrators, and policy-makers gain an initial understanding of RBAR and to encourage further research in this direction.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# Counting-Stars: 長文コンテキスト大規模言語モデルを評価するためのシンプルで効率的で合理的な戦略

Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models ( http://arxiv.org/abs/2403.11802v1 )

ライセンス: Link先を確認

Mingyang Song, Mao Zheng, Xuan Luo,

(参考訳) 最近の研究は、堅牢な長文コンテキスト能力を備えたLarge Language Models(LLMs)の開発に集中しているが、適切な評価戦略が欠けているため、主要なLLM(例えばChatGPTやKimiChat)の長文コンテキスト処理能力とパフォーマンスについてはあまり知られていない。このギャップに対処するために、長文コンテキストLLMを評価するための、シンプルで効率的で合理的な戦略を、Counting-Starsという新しいベンチマークとして提案する。 Counting-Starsは、LLMが長いコンテキストにおける長い依存関係を完全に理解して捉え、タスクを完了するためにコンテキスト全体にまたがる複数のエビデンス間の相互依存性を収集できることを要求するように設計されている。 Counting-Starsに基づいて, GPT-4 Turbo と Kimi Chat の2つの主要な長文コンテキスト LLM の評価実験を行った。実験の結果, GPT-4 Turbo と Kimi Chat は, 4K から 128K までの長いコンテキストで高い性能を示した。さらに,長いコンテキストを処理するLLMの挙動に関する2つの興味深い分析を示す。

While recent research endeavors have concentrated on developing Large Language Models (LLMs) with robust long-context capabilities, the lack of appropriate evaluation strategies means that relatively little is known about the long-context processing abilities and performance of leading LLMs (e.g., ChatGPT and KimiChat). To address this gap, we propose a simple, efficient, and reasonable strategy for evaluating long-context LLMs as a new benchmark, named Counting-Stars. Counting-Stars is designed to require LLMs to fully understand and capture long dependencies in long contexts and to collect inter-dependency across multiple pieces of evidence spanning the entire context to finish the task. Based on Counting-Stars, we conduct experiments to evaluate the two leading long-context LLMs, i.e., GPT-4 Turbo and Kimi Chat. The experimental results indicate that GPT-4 Turbo and Kimi Chat achieve significant performance in the long context from 4K to 128K. We further present two intriguing analyses regarding the behavior of LLMs processing long context.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18

# パーソナライズされた脳腫瘍セグメンテーションのためのフェデレーションモダリティ固有エンコーダとマルチモーダルアンカー

Federated Modality-specific Encoders and Multimodal Anchors for Personalized Brain Tumor Segmentation ( http://arxiv.org/abs/2403.11803v1 )

ライセンス: Link先を確認

Qian Dai, Dong Wei, Hong Liu, Jinghan Sun, Liansheng Wang, Yefeng Zheng,

(参考訳) 医用画像解析のための既存のフェデレートラーニング (FL) 法の多くは、モーダル内不均一性のみを考慮しており、マルチモーダルイメージングへの適用性が限られている。実際には、一部のFL参加者が完全な画像モダリティのサブセットしか持たないことは珍しくなく、すべての参加者のデータに基づいてグローバルモデルを効果的に訓練する上での課題として、モーダル間不均一性(inter-modal heterogeneity)が生じる。さらに、各参加者は、このようなシナリオでFLからローカルデータの特徴に合わせたパーソナライズされたモデルを得ることを期待している。本研究では,この2つの問題に同時に対処するため,フェデレーションモダリティ固有エンコーダとマルチモーダルアンカーを備えた新しいFLフレームワーク(FedMEMA)を提案する。まず、FedMEMAは、モーダル間の不均一性を考慮するために、各モダリティに専用のエンコーダを使用する。また、エンコーダは参加者間で共有されるが、デコーダは個々のニーズに合わせてパーソナライズされる。具体的には、フルモーダルデータを持つサーバは、フュージョンデコーダを使用して、すべてのモダリティ固有のエンコーダからの表現を集約・融合し、モダリティ間を橋渡しして、バックプロパゲーションを介してエンコーダを最適化する。同時に、融合されたマルチモーダル表現から複数のアンカーを抽出し、エンコーダパラメータに加えてクライアントに配布する。一方、不完全なモダリティを持つクライアントは、スケールドドット積クロスアテンションを通じて、欠損モダリティの表現をグローバルなフルモーダルアンカーに向けてキャリブレーションし、存在するモダリティの表現を適応させながら、欠損モダリティによる情報損失を補う。 FedMEMAは、マルチモーダル脳腫瘍セグメンテーションのためのBraTS 2020ベンチマークで検証されている。その結果、マルチモーダルかつパーソナライズされたFLの様々な最新手法よりも優れており、その新規設計が有効であることがわかった。私たちのコードは利用可能です。

Most existing federated learning (FL) methods for medical image analysis only considered intramodal heterogeneity, limiting their applicability to multimodal imaging applications. In practice, it is not uncommon that some FL participants only possess a subset of the complete imaging modalities, posing inter-modal heterogeneity as a challenge to effectively training a global model on all participants' data. In addition, each participant would expect to obtain a personalized model tailored for its local data characteristics from the FL in such a scenario. In this work, we propose a new FL framework with federated modality-specific encoders and multimodal anchors (FedMEMA) to simultaneously address the two concurrent issues. Above all, FedMEMA employs an exclusive encoder for each modality to account for the inter-modal heterogeneity in the first place. In the meantime, while the encoders are shared by the participants, the decoders are personalized to meet individual needs. Specifically, a server with full-modal data employs a fusion decoder to aggregate and fuse representations from all modality-specific encoders, thus bridging the modalities to optimize the encoders via backpropagation reversely. Meanwhile, multiple anchors are extracted from the fused multimodal representations and distributed to the clients in addition to the encoder parameters. On the other end, the clients with incomplete modalities calibrate their missing-modal representations toward the global full-modal anchors via scaled dot-product cross-attention, making up the information loss due to absent modalities while adapting the representations of present ones. FedMEMA is validated on the BraTS 2020 benchmark for multimodal brain tumor segmentation. Results show that it outperforms various up-to-date methods for multimodal and personalized FL and that its novel designs are effective. Our code is available.

翻訳日:2024-03-20 20:19:57 公開日:2024-03-18
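
The FedMEMA abstract above mentions calibrating client representations toward global anchors via scaled dot-product cross-attention. The sketch below shows the generic operation only, softmax(QK^T / sqrt(d))V, in plain Python with the client features as queries and the anchors as both keys and values. How FedMEMA actually wires this up (projections, heads, dimensions) is not stated in the abstract, so everything beyond the formula is an assumption for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query becomes a weighted mix of the values."""
    dim = len(keys[0])
    outputs = []
    for query in queries:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        weights = softmax(scores)
        mixed = [sum(w * value[c] for w, value in zip(weights, values))
                 for c in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Hypothetical 2-D anchors (as if received from a server) and two client feature vectors.
anchors = [[1.0, 0.0], [0.0, 1.0]]
client_features = [[1.0, 0.0], [0.0, 0.0]]

calibrated = cross_attention(client_features, keys=anchors, values=anchors)
for row in calibrated:
    print([round(x, 4) for x in row])
```

The first client vector is pulled toward the anchor it resembles, while the all-zero vector attends to both anchors equally, which is the behavior one would expect from an attention-based calibration step.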

# LLMの意思決定はどこまで進んでいるか? マルチエージェント環境におけるLLMのゲーム能力の評価

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments ( http://arxiv.org/abs/2403.11807v1 )

ライセンス: Link先を確認

Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu,

(参考訳) 様々な種類の能力を必要とする複雑なタスクである意思決定は、LLM(Large Language Models)を評価するための優れたフレームワークを提供する。本研究では, LLMの意思決定能力について, 十分に確立された分野であるゲーム理論のレンズを用いて検討した。特に、2人を超えるエージェントが同時に参加できるゲームに焦点を当てる。次に,古典的な8種類のマルチエージェントゲームを含むフレームワークGAMA-Benchを紹介する。これらのゲームにおいて,モデルの性能を定量的に評価するためのスコアリング方式を設計する。 GAMA-Benchを用いて, LLMの堅牢性, 一般化可能性, 強化戦略について検討する。その結果, GPT-3.5は満足のいくロバスト性を示すが, 一般化性は比較的限定的であることがわかった。しかし、その性能はChain-of-Thoughtのようなアプローチによって改善できる。さらに,様々なLLMに対して評価を行い,GAMA-Bench 上で GPT-4 が他のモデルより優れ,スコアが 72.5 であることを確認した。さらに、GPT-3.5の3つのバージョン(0613, 1106, 0125)にわたって徐々に高まるスコアは、各更新でモデルのインテリジェンスに顕著な進歩があったことを示している。コードと実験結果はhttps://github.com/CUHK-ARISE/GAMABenchで公開されている。

Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce our framework, GAMA-Bench, including eight classical multi-agent games. We design a scoring scheme to assess a model's performance in these games quantitatively. Through GAMA-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfying robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on GAMA-Bench, achieving a score of 72.5. Moreover, the increasingly higher scores across the three iterations of GPT-3.5 (0613, 1106, 0125) demonstrate marked advancements in the model's intelligence with each update. The code and experimental results are made publicly available via https://github.com/CUHK-ARISE/GAMABench.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# ViT適応のためのパラメータと推論効率を考慮した動的チューニング

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation ( http://arxiv.org/abs/2403.11808v1 )

ライセンス: Link先を確認

Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You,

(参考訳) 既存のパラメータ効率の良いファインチューニング(PEFT)法は、パラメータ効率を向上させることでビジョントランスフォーマー(ViT)適応において大きな成功を収めた。しかし、適応時の推論効率の向上はいまだ十分に探索されていない。これにより、特にモデルの計算コストが大きい場合に、トレーニング済みのViTモデルのより広範な適用が制限される。本稿では,ViT適応におけるパラメータ効率と推論効率の両方を向上させる新しい手法である動的チューニング(DyT)を提案する。具体的には,軽量なアダプタモジュールを用いることに加えて,情報量の多いトークンと重要度の低いトークンを区別するトークンディスパッチャを提案し,後者が元のブロックを動的にスキップできるようにすることで,推論時の冗長な計算を低減する。さらに、DyTのベストプラクティスを見つけるために、複数の設計変種を探索する。最後に,Mixture-of-Experts (MoE) 機構に着想を得て,適応性能をさらに向上する拡張アダプタを導入する。画像/映像認識やセマンティックセグメンテーションなど,様々なタスクでDyTを検証する。例えば、DyT は VTAB-1K ベンチマークにおいて、既存の PEFT 法の 71%-85% の FLOPs しか要さずに、それらと同等またはそれ以上の性能を達成する。

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, the exploration of enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using the lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing the redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves comparable or even superior performance compared to existing PEFT methods while evoking only 71%-85% of their FLOPs on the VTAB-1K benchmark.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# LLMのためのメタファー理解チャレンジデータセット

Metaphor Understanding Challenge Dataset for LLMs ( http://arxiv.org/abs/2403.11810v1 )

ライセンス: Link先を確認

Xiaoyu Tong, Rochelle Choenni, Martha Lewis, Ekaterina Shutova,

(参考訳) 自然言語のメタファーは、類推や分類のような基本的な認知過程の反映であり、日常のコミュニケーションに深く根ざしている。したがってメタファー理解は、大きな言語モデル(LLM)にとって不可欠なタスクである。 LLMのメタファー理解能力を評価するために設計された,メタファー理解チャレンジデータセット(MUNCH)をリリースする。このデータセットは、メタファーの使用を含む文に対する10k以上のパラフレーズと、不適切なパラフレーズを含む1.5kのインスタンスを提供する。不適切なパラフレーズは、モデルが本当に完全なメタファー解釈を行うか、むしろ語彙的類似性に頼るかを判定するための対照として慎重に選択された。適切なパラフレーズと不適切なパラフレーズはすべて手動で注釈付けされた。メタファーを含む文は4つのジャンル(学術、ニュース、フィクション、会話)にまたがる自然なメタファーの使用をカバーし、異なるレベルの新規性を示す。 LLaMA と GPT-3.5 の実験により、MUNCH は LLM にとって困難な課題であることが示された。データセットはhttps://github.com/xiaoyuisrain/metaphor-understanding-challengeで自由にアクセスできる。

Metaphors in natural language are a reflection of fundamental cognitive processes such as analogical reasoning and categorisation, and are deeply rooted in everyday communication. Metaphor understanding is therefore an essential task for large language models (LLMs). We release the Metaphor Understanding Challenge Dataset (MUNCH), designed to evaluate the metaphor understanding capabilities of LLMs. The dataset provides over 10k paraphrases for sentences containing metaphor use, as well as 1.5k instances containing inapt paraphrases. The inapt paraphrases were carefully selected to serve as control to determine whether the model indeed performs full metaphor interpretation or rather resorts to lexical similarity. All apt and inapt paraphrases were manually annotated. The metaphorical sentences cover natural metaphor uses across 4 genres (academic, news, fiction, and conversation), and they exhibit different levels of novelty. Experiments with LLaMA and GPT-3.5 demonstrate that MUNCH presents a challenging task for LLMs. The dataset is freely accessible at https://github.com/xiaoyuisrain/metaphor-understanding-challenge.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# Aerial Lifting:Aerial Imageryによるニューラルアーバンセマンティックとビルのリフティング

Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery ( http://arxiv.org/abs/2403.11812v1 )

ライセンス: Link先を確認

Yuqi Zhang, Guanying Chen, Jiaxing Chen, Shuguang Cui,

(参考訳) 本稿では,3次元にノイズの多い2次元ラベルを持ち上げることで,都市規模のセマンティックスとビルレベルのインスタンスセグメンテーションを実現するためのニューラルラジアンスフィールド手法を提案する。これは2つの主な理由から難しい問題である。第一に、都市空撮画像のオブジェクトは、建物、車、道路など、相当な大きさのバリエーションを示しており、正確な2Dセグメンテーションの課題となっている。第2に,既存のセグメンテーション法によって生成された2Dラベルは,特に空中画像の場合,シーン全体のごく一部しか撮影できない場合,多視点不整合問題に悩まされる。これらの制限を克服するために、我々はまず、異なる高度から予測されるラベルを組み合わせて、異なる大きさのオブジェクトのセグメンテーションを強化するスケール適応型セマンティックラベル融合戦略を導入し、NeRFの新規なビュー合成機能を活用する。次に,2次元のインスタンスラベルにおける多視点不整合問題を緩和するために,3次元シーン表現に基づく新しいクロスビューインスタンスラベルグループ化戦略を導入する。さらに,多視点再構成深度を生かして,再構成放射場の幾何学的品質を向上し,セグメンテーション結果が向上した。複数の実世界の都市規模データセットの実験により、我々のアプローチは既存の手法よりも優れており、その有効性を強調している。

We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images by lifting noisy 2D labels to 3D. This is a challenging problem due to two primary reasons. Firstly, objects in urban aerial images exhibit substantial variations in size, including buildings, cars, and roads, which pose a significant challenge for accurate 2D segmentation. Secondly, the 2D labels generated by existing segmentation methods suffer from the multi-view inconsistency problem, especially in the case of aerial images, where each image captures only a small portion of the entire scene. To overcome these limitations, we first introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes by combining labels predicted from different altitudes, harnessing the novel-view synthesis capabilities of NeRF. We then introduce a novel cross-view instance label grouping strategy based on the 3D scene representation to mitigate the multi-view inconsistency problem in the 2D instance labels. Furthermore, we exploit multi-view reconstructed depth priors to improve the geometric quality of the reconstructed radiance field, resulting in enhanced segmentation results. Experiments on multiple real-world urban-scale datasets demonstrate that our approach outperforms existing methods, highlighting its effectiveness.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# HVDistill: 教師なしハイブリッドビュー蒸留による画像からポイントクラウドへの知識伝達

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation ( http://arxiv.org/abs/2403.11817v1 )

ライセンス: Link先を確認

Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang,

(参考訳) 本稿では,HVDistillと呼ばれるハイブリッドビューベースの知識蒸留フレームワークを提案し,事前学習した画像ネットワークを用いて,教師なしの方法で点群ニューラルネットワークの特徴学習を指導する。 RGBカメラとLiDARセンサの幾何学的関係を利用して、画像平面ビューと鳥瞰ビューの両方に基づく2つのモダリティ間の対応を確立し、表現学習を容易にする。具体的には、画像平面での対応は点群を投影することで簡単に得られ、鳥瞰ビューでの対応は、投影された点群の監督の下で予測された深度を用いて画素を3次元空間に持ち上げることで得られる。画像教師ネットワークは、画像平面ビューからリッチなセマンティクスを提供し、同時に鳥瞰ビューから幾何学的情報を取得する。実際、この2つのビューの画像特徴は自然に互いを補完し、合わせて点群生徒ネットワークが学習する特徴表現を改善することができる。さらに、自己教師付きで事前学習した2Dネットワークを用いることで、HVDistillは2Dアノテーションも3Dアノテーションも必要としない。我々は、nuScenesデータセット上でモデルを事前トレーニングし、評価のためにnuScenes、SemanticKITTI、KITTIデータセット上の複数の下流タスクに転移する。広範な実験の結果,本手法はスクラッチからトレーニングしたベースラインよりも一貫した改善を実現し,既存の手法を大幅に上回ることが示された。コードはgit@github.com:zhangsha1024/HVDistill.gitで入手できる。

We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, the correspondence between the two modalities based on both image-plane view and bird-eye view can be established, which facilitates representation learning. Specifically, the image-plane correspondences can be simply obtained by projecting the point clouds, while the bird-eye-view correspondences can be achieved by lifting pixels to the 3D space with the predicted depths under the supervision of projected point clouds. The image teacher networks provide rich semantics from the image-plane view and meanwhile acquire geometric information from the bird-eye view. Indeed, image features from the two views naturally complement each other and together can ameliorate the learned feature representation of the point cloud student networks. Moreover, with a self-supervised pre-trained 2D network, HVDistill requires neither 2D nor 3D annotations. We pre-train our model on the nuScenes dataset and transfer it to several downstream tasks on nuScenes, SemanticKITTI, and KITTI datasets for evaluation. Extensive experimental results show that our method achieves consistent improvements over the baseline trained from scratch and significantly outperforms the existing schemes. Codes are available at git@github.com:zhangsha1024/HVDistill.git.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# TCNet:軌道と関連地域からの連続手話認識

TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions ( http://arxiv.org/abs/2403.11818v1 )

ライセンス: Link先を確認

Hui Lu, Albert Ali Salah, Ronald Poppe,

(参考訳) 連続手話認識(CSLR)における鍵となる課題は、ビデオ入力から長時間にわたる空間的相互作用を効率的に捉えることである。この課題に対処するために,トラジェクトリや相関領域からの時空間情報を効果的にモデル化するハイブリッドネットワークTCNetを提案する。 TCNetのトラジェクトリモジュールは、フレームを連続的な視覚トークンからなる整列トラジェクトリに変換する。さらに、クエリトークンに対しては、トラジェクトリに沿って自己アテンションが学習される。これにより,動作中の特定の領域の指の動きなどの微細な時空間パターンにも注目できる。 TCNetの相関モジュールは、無関係なフレーム領域をフィルタリングする新しいダイナミックアテンション機構を使用している。さらに、相関領域から動的キー値トークンを各クエリに割り当てる。どちらの革新も計算コストとメモリを大幅に削減する。 PHOENIX14, PHOENIX14-T, CSL, CSL-Dailyの4つの大規模データセットの実験を行った。我々の結果は,TCNetが常に最先端のパフォーマンスを達成していることを示している。例えば、PHOENIX14とPHOENIX14-Tの単語誤り率をそれぞれ1.5%、1.0%改善する。

A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuous visual tokens. In addition, for a query token, self-attention is learned along the trajectory. As such, our network can also focus on fine-grained spatio-temporal patterns, such as finger movements, of a specific region in motion. TCNet's correlation module uses a novel dynamic attention mechanism that filters out irrelevant frame regions. Additionally, it assigns dynamic key-value tokens from correlated regions to each query. Both innovations significantly reduce the computation cost and memory. We perform experiments on four large-scale datasets: PHOENIX14, PHOENIX14-T, CSL, and CSL-Daily, respectively. Our results demonstrate that TCNet consistently achieves state-of-the-art performance. For example, we improve over the previous state-of-the-art by 1.5% and 1.0% word error rate on PHOENIX14 and PHOENIX14-T, respectively.
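
The results above are reported as word error rate (WER). As a reminder of how that metric is usually defined, here is a minimal sketch that computes WER as the word-level edit distance divided by the reference length. The function name `word_error_rate` and the gloss sequences in the usage note are illustrative assumptions, not taken from the paper or its evaluation scripts.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1      # reference token missing in hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(reference)
```

For example, comparing the reference glosses `["A", "B", "C", "D"]` with the hypothesis `["A", "X", "C"]` involves one substitution and one deletion, so the WER is 2 / 4.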

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# テキストから画像への合成の評価 : 画像品質指標の調査と分類

Evaluating Text to Image Synthesis: Survey and Taxonomy of Image Quality Metrics ( http://arxiv.org/abs/2403.11821v1 )

ライセンス: Link先を確認

Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam, Timo Ropinski,

(参考訳) 近年のテキスト・画像合成の進歩は,基礎モデルによる言語と視覚の組み合わせを利用して実現されている。これらのモデルは、World Wide Webや他の大規模データベースから得られた膨大な量のテキストと画像のペアで事前学習されている。高品質な画像生成への要求が、テキストと画像の間の内容の整合性を確保する方向へ移るにつれて、人間の判断を模倣することを目的とした新たな評価指標が開発されてきた。このため、研究者たちは、視覚言語モデルの構成性と、テキストと画像内容の構成的アライメントの品質尺度としてのその活用を研究するために、ますます複雑なアノテーションを持つデータセットを収集し始めている。本稿では,既存のテキスト・画像評価指標を包括的に概観し,これらの指標を分類するための新しい分類法を提案する。また,広く採用されているテキスト画像ベンチマークデータセットをレビューしたうえで,テキストから画像への合成モデルを品質や人間の嗜好に向けて最適化する手法について議論する。最後に、テキスト・画像評価を改善するためのガイドラインを導き、未解決の課題と現在の限界について議論する。

Recent advances in text-to-image synthesis have been enabled by exploiting a combination of language and vision through foundation models. These models are pre-trained on tremendous amounts of text-image pairs sourced from the World Wide Web or other large-scale databases. As the demand for high-quality image generation shifts towards ensuring content alignment between text and image, novel evaluation metrics have been developed with the aim of mimicking human judgments. Thus, researchers have started to collect datasets with increasingly complex annotations to study the compositionality of vision-language models and their incorporation as a quality measure of compositional alignment between text and image contents. In this work, we provide a comprehensive overview of existing text-to-image evaluation metrics and propose a new taxonomy for categorizing these metrics. We also review frequently adopted text-image benchmark datasets before discussing techniques to optimize text-to-image synthesis models towards quality and human preferences. Ultimately, we derive guidelines for improving text-to-image evaluation and discuss the open challenges and current limitations.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# CapsLorentzNet: 物理にインスパイアされた機能とグラフの畳み込みの統合

CapsLorentzNet: Integrating Physics Inspired Features with Graph Convolution ( http://arxiv.org/abs/2403.11826v1 )

ライセンス: Link先を確認

Rameswar Sahu,

(参考訳) 高度な機械学習技術の出現により、オブジェクトのタグ付けが大幅に進歩した。本稿では,広い範囲のグラフニューラルネットワーク(GNN)アーキテクチャと互換性のある新しいアーキテクチャ変更を導入することにより,この分野をさらに進める。本手法は,標準GNNにおける従来の復号ブロックを置き換えるカプセル層の統合を提唱する。これらのカプセルは、ベクターアクティベーションを持つニューロンのグループである。これらのベクトルの向きは、研究中の物体がカプセルで表されるクラスに属するかどうかを特徴づける大きさで、研究中の物体の重要な特性を表している。さらに、カプセルネットワークは再構成機構による正規化を取り入れ、専門家が設計した高レベルな特徴をシームレスに分析に統合することを容易にする。クォークグルーオンタギングにおけるLorentzNetアーキテクチャによるアーキテクチャの有用性について検討した。ここでは、LorentzNetの復号ブロックをカプセル化復号ブロックに置き換え、結果のアーキテクチャをCapsLorentzNetと呼ぶ。我々の新しいアーキテクチャはクォークグルーオンタギングタスクにおいてローレンツネットの性能を20%向上させることができる。

With the advent of advanced machine learning techniques, boosted object tagging has witnessed significant progress. In this article, we take this field further by introducing novel architectural modifications compatible with a wide array of Graph Neural Network (GNN) architectures. Our approach advocates for integrating capsule layers, replacing the conventional decoding blocks in standard GNNs. These capsules are a group of neurons with vector activations. The orientation of these vectors represents important properties of the objects under study, with their magnitude characterizing whether the object under study belongs to the class represented by the capsule. Moreover, capsule networks incorporate a regularization by reconstruction mechanism, facilitating the seamless integration of expert-designed high-level features into the analysis. We have studied the usefulness of our architecture with the LorentzNet architecture for quark-gluon tagging. Here, we have replaced the decoding block of LorentzNet with a capsulated decoding block and have called the resulting architecture CapsLorentzNet. Our new architecture can enhance the performance of LorentzNet by 20% for the quark-gluon tagging task.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# 距離推定による音事象の検出と位置推定

Sound Event Detection and Localization with Distance Estimation ( http://arxiv.org/abs/2403.11827v1 )

ライセンス: Link先を確認

Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros,

(参考訳) 音事象検出と定位(SELD)は、音事象とそれに対応する到来方向(DOA)を識別する複合タスクである。このタスクには多くの応用があり、近年広く研究されているが、音源位置に関する完全な情報は得られない。本稿では,タスクを距離推定を伴う音事象検出と定位(3D SELD)に拡張することで,この問題を克服する。本研究では,SELDの中核に距離推定を統合する2つの方法について検討する。1つは,問題を別個のモデル出力で扱うマルチタスクアプローチであり,もう1つは,マルチACCDOA法を距離情報を含むように拡張したシングルタスクアプローチである。 STARSS23(Sony-TAU Realistic Spatial Soundscapes 2023)のアンビソニック版とバイノーラル版の両方で,両手法を調査する。さらに,距離推定部に関連する損失関数について実験を行った。以上の結果から,音事象検出やDOA推定の性能を劣化させることなく3D SELDを実行できることが示された。

Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA). While this task has numerous applications and has been extensively researched in recent years, it fails to provide full information about the sound source position. In this paper, we overcome this problem by extending the task to Sound Event Detection, Localization with Distance Estimation (3D SELD). We study two ways of integrating distance estimation within the SELD core - a multi-task approach, in which the problem is tackled by a separate model output, and a single-task approach obtained by extending the multi-ACCDOA method to include distance information. We investigate both methods for the Ambisonic and binaural versions of STARSS23: Sony-TAU Realistic Spatial Soundscapes 2023. Moreover, our study involves experiments on the loss function related to the distance estimation part. Our results show that it is possible to perform 3D SELD without any degradation of performance in sound event detection and DOA estimation.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# グラフニューラルネットワークを用いたネットワーク侵入検知システムにおける問題空間構造逆攻撃

Problem space structural adversarial attacks for Network Intrusion Detection Systems based on Graph Neural Networks ( http://arxiv.org/abs/2403.11830v1 )

ライセンス: Link先を確認

Andrea Venturi, Dario Stabili, Mirco Marchetti,

(参考訳) 機械学習(ML)アルゴリズムは、ネットワーク侵入検知システム(NIDS)をサポートするためにますます人気が高まっている。それにもかかわらず、大規模な研究により、敵攻撃に対する脆弱性が示されており、その性能を損なうことを目的としたモデルの入力に微妙な摂動が伴っている。最近の提案では、グラフニューラルネットワーク(GNN)を有効活用して、侵入による構造パターンにもとづいて、検出ロバスト性の向上を図っている。しかし、GNNベースのNIDSの採用は、新しいタイプのリスクをもたらす。本稿では,ネットワーク侵入検知におけるGNNに適した敵攻撃の最初の形式化を提案する。さらに,現実のシナリオにおいて,実行可能な構造攻撃を行うためには,攻撃者が考慮すべき問題空間の制約を概説し,モデル化する。最終的な貢献として、我々は、最先端のGNNベースのNIDSに対して提案された攻撃を開始するための広範な実験的キャンペーンを実施している。本研究は, 古典的特徴に基づく攻撃に対するモデルの堅牢性の向上と, 構造的攻撃に対する感受性を強調した。

Machine Learning (ML) algorithms have become increasingly popular for supporting Network Intrusion Detection Systems (NIDS). Nevertheless, extensive research has shown their vulnerability to adversarial attacks, which involve subtle perturbations to the inputs of the models aimed at compromising their performance. Recent proposals have effectively leveraged Graph Neural Networks (GNN) to produce predictions based also on the structural patterns exhibited by intrusions to enhance the detection robustness. However, the adoption of GNN-based NIDS introduces new types of risks. In this paper, we propose the first formalization of adversarial attacks specifically tailored for GNN in network intrusion detection. Moreover, we outline and model the problem space constraints that attackers need to consider to carry out feasible structural attacks in real-world scenarios. As a final contribution, we conduct an extensive experimental campaign in which we launch the proposed attacks against state-of-the-art GNN-based NIDS. Our findings demonstrate the increased robustness of the models against classical feature-based adversarial attacks, while highlighting their susceptibility to structure-based attacks.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator

SSCAE -- Semantic, Syntactic, and Context-aware natural language Adversarial Examples generator ( http://arxiv.org/abs/2403.11833v1 )

ライセンス: Link先を確認

Javad Rafiei Asl, Mohammad H. Rafiei, Manar Alohaly, Daniel Takabi,

(参考訳) 機械学習モデルは、悪意を持って作られた敵対的サンプル(Adversarial Examples, AE)に対して脆弱である。 AEを用いて機械学習モデルを学習することで、敵対的攻撃に対する堅牢性と安定性が向上する。高品質なAEを生成するモデルを開発することが不可欠である。このようなモデルの開発は、自然言語処理(NLP)ではコンピュータビジョンなどの分野よりもはるかに遅れている。本稿では,SSCAE(Semantic, Syntactic, and Context-aware natural language AEs generator)と呼ばれる実用的かつ効率的な敵対的攻撃モデルを提案する。 SSCAEは重要な単語を特定し、マスク付き言語モデルを使用して、置換候補の初期集合を生成する。次に、2つのよく知られた言語モデルを用いて、意味的および構文的特性の観点から初期集合を評価する。本稿では,(1)より効率的な摂動を捉える動的しきい値,(2)高品質なAEを生成するための局所的な貪欲探索を導入する。ブラックボックス手法として、SSCAEは、意味的一貫性と原言語の構文的・文法的要件を保ちつつ、人間には知覚できない、文脈を考慮したAEを生成する。提案したSSCAEモデルの有効性と優位性は,15種類の比較実験と,パラメータ最適化のための広範な感度解析によって示される。 SSCAEは、より少ないクエリ数と同等の摂動率で、より高い意味的一貫性を維持しながら、すべての実験で既存のモデルを上回る。

Machine learning models are vulnerable to maliciously crafted Adversarial Examples (AEs). Training a machine learning model with AEs improves its robustness and stability against adversarial attacks. It is essential to develop models that produce high-quality AEs. Developing such models has been much slower in natural language processing (NLP) than in areas such as computer vision. This paper introduces a practical and efficient adversarial attack model called SSCAE for Semantic, Syntactic, and Context-aware natural language AEs generator. SSCAE identifies important words and uses a masked language model to generate an early set of substitutions. Next, two well-known language models are employed to evaluate the initial set in terms of semantic and syntactic characteristics. We introduce (1) a dynamic threshold to capture more efficient perturbations and (2) a local greedy search to generate high-quality AEs. As a black-box method, SSCAE generates humanly imperceptible and context-aware AEs that preserve semantic consistency and the source language's syntactical and grammatical requirements. The effectiveness and superiority of the proposed SSCAE model are illustrated with fifteen comparative experiments and extensive sensitivity analysis for parameter optimization. SSCAE outperforms the existing models in all experiments while maintaining a higher semantic consistency with a lower query number and a comparable perturbation rate.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# インコンテクスト学習と構成一般化の関係の理解に向けて

Towards Understanding the Relationship between In-context Learning and Compositional Generalization ( http://arxiv.org/abs/2403.11834v1 )

ライセンス: Link先を確認

Sungjun Han, Sebastian Padó,

(参考訳) 構成一般化の原理によれば、複素表現の意味は、その部分の意味とそれらがどのように結合されるかの関数として理解することができる。この原理は人間の言語処理に不可欠であり、また、アウト・オブ・ディストリビューションデータに直面したNLPモデルにも不可欠である。しかし、トランスフォーマーを含む多くのニューラルネットワークモデルは、構成一般化に苦しむことが示されている。本稿では,モデルに文脈内学習を強制することは,構成一般化を促進する帰納的バイアスをもたらすと仮定する。この仮説をテストするために、通常の学習を非常に難しい設定で因果変換器を訓練し、トレーニングインスタンスとシャッフルインスタンスラベルの異なる順序で提示する。これは、データセットから達成可能な、可能な数発の学習問題のすべてについて、モデルをトレーニングすることに対応する。しかし、このモデルは、初期の例を利用して、後の例(例えば、文脈内学習)に一般化することで、タスクを解くことができる。データセット、SCAN、COGS、GeoQueryの評価では、この方法でトレーニングされたモデルは、実際に合成の一般化の改善を示している。このことは、一般化のための帰納的バイアスとして、文脈内学習問題の有用性を示している。

According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language processing and also, arguably, for NLP models in the face of out-of-distribution data. However, many neural network models, including Transformers, have been shown to struggle with compositional generalization. In this paper, we hypothesize that forcing models to in-context learn can provide an inductive bias to promote compositional generalization. To test this hypothesis, we train a causal Transformer in a setting that renders ordinary learning very difficult: we present it with different orderings of the training instance and shuffle instance labels. This corresponds to training the model on all possible few-shot learning problems attainable from the dataset. The model can solve the task, however, by utilizing earlier examples to generalize to later ones (i.e. in-context learning). In evaluations on the datasets, SCAN, COGS, and GeoQuery, models trained in this manner indeed show improved compositional generalization. This indicates the usefulness of in-context learning problems as an inductive bias for generalization.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# Agent3D-Zero: ゼロショット3D理解のためのエージェント

Agent3D-Zero: An Agent for Zero-shot 3D Understanding ( http://arxiv.org/abs/2403.11835v1 )

ライセンス: Link先を確認

Sha Zhang, Di Huang, Jiajun Deng, Shixiang Tang, Wanli Ouyang, Tong He, Yanyong Zhang,

(参考訳) 3Dの現実世界を理解する能力は、人工知能にとって重要なマイルストーンだ。現在の一般的なプラクティスは、大規模言語モデル(LLM)を3Dデータとテキストで微調整し、3D理解を可能にすることです。有効性にもかかわらず、これらのアプローチは本来、利用可能な3Dデータのスケールと多様性によって制限される。また,本研究では,ゼロショット方式で3Dシーン理解を実現する革新的な3DエージェントフレームワークであるAgent3D-Zeroを紹介する。アプローチの本質は、人間がどのように3Dシーンを理解しようとするかに触発されて、複数の画像からの洞察を理解し、合成するプロセスとして、3Dシーン知覚の課題を再認識することに集中している。本稿では,この概念を統合することで,3次元理解のための視点を積極的に選択・分析することで,大規模視覚言語モデル(VLM)を利用する新しい手法を提案する。具体的には、入力された3Dシーンが与えられた場合、Agent3D-Zeroはまず、カスタムデザインの視覚的プロンプトで鳥眼視画像を処理し、次に視点を選択して、基礎となる知識を観察し、要約する。 Agent3D-Zeroの独特な利点は、視覚的プロンプトの導入である。広範囲な実験により, 多様な3D環境を理解する上で, 提案手法の有効性が示された。

The ability to understand and reason the 3D real world is a crucial milestone towards artificial general intelligence. The current common practice is to finetune Large Language Models (LLMs) with 3D data and texts to enable 3D understanding. Despite their effectiveness, these approaches are inherently limited by the scale and diversity of the available 3D data. Alternatively, in this work, we introduce Agent3D-Zero, an innovative 3D-aware agent framework addressing the 3D scene understanding in a zero-shot manner. The essence of our approach centers on reconceptualizing the challenge of 3D scene perception as a process of understanding and synthesizing insights from multiple images, inspired by how our human beings attempt to understand 3D scenes. By consolidating this idea, we propose a novel way to make use of a Large Visual Language Model (VLM) via actively selecting and analyzing a series of viewpoints for 3D understanding. Specifically, given an input 3D scene, Agent3D-Zero first processes a bird's-eye view image with custom-designed visual prompts, then iteratively chooses the next viewpoints to observe and summarize the underlying knowledge. A distinctive advantage of Agent3D-Zero is the introduction of novel visual prompts, which significantly unleash the VLMs' ability to identify the most informative viewpoints and thus facilitate observing 3D scenes. Extensive experiments demonstrate the effectiveness of the proposed framework in understanding diverse and previously unseen 3D environments.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# 安全と高品質のアウトプットの確保: 言語モデルに対するガイドラインライブラリアプローチ

Ensuring Safe and High-Quality Outputs: A Guideline Library Approach for Language Models ( http://arxiv.org/abs/2403.11838v1 )

ライセンス: Link先を確認

Yi Luo, Zhenghao Lin, Yuhao Zhang, Jiashuo Sun, Chen Lin, Chengjin Xu, Xiangdong Su, Yelong Shen, Jian Guo, Yeyun Gong,

(参考訳) 大規模言語モデル(LLM)は印象的な能力を示す一方で、バイアスのあるコンテンツ生成やプライバシの問題といったリスクも提示する。現在のアライメント手法の1つに原則駆動の統合があるが、手作業で作成したルールの不正確さと、安全訓練を受けていないモデルにおけるリスク認識の不十分さから生じる課題に直面している。これらの問題に対処するために,2段階のアプローチである Guide-Align を導入する。まず,安全訓練されたモデルが潜在的なリスクを特定し,様々な入力に対する具体的なガイドラインを策定することで,包括的なガイドラインライブラリと,入力からガイドラインを検索するためのモデルを構築する。次に、検索モデルが新しい入力と関連するガイドラインを対応付け、応答生成においてLLMを導くことで、安全で高品質な出力を保証し、人間の価値観と整合させる。追加のオプション段階では、第2段階のプロセスで生成された、よく整合した新しいデータセットでモデルを微調整する。本手法は,多様な入力に合わせてガイドラインをカスタマイズし,ガイドラインライブラリのきめ細かさと包括性を向上させる。さらに、軽量な検索モデルを通じて、安全訓練されたLLMの安全性に関する専門知識を取り入れている。我々のアプローチを3つのベンチマークで評価し,LLMの安全性と品質の大幅な向上を実証した。特に、微調整されたモデルであるLabradorは、パラメータ数が130億であっても、GPT-3.5-turboを上回り、アライメント能力ではGPT-4を上回る。

Large Language Models (LLMs) exhibit impressive capabilities but also present risks such as biased content generation and privacy issues. One of the current alignment techniques includes principle-driven integration, but it faces challenges arising from the imprecision of manually crafted rules and inadequate risk perception in models without safety training. To address these, we introduce Guide-Align, a two-stage approach. Initially, a safety-trained model identifies potential risks and formulates specific guidelines for various inputs, thereby establishing a comprehensive library of guidelines and models for input-guidelines retrieval. Subsequently, the retrieval model correlates new inputs with pertinent guidelines, guiding LLMs in response generation to ensure safe and high-quality outputs, thus aligning with human values. An additional optional stage involves fine-tuning a model with new well-aligned datasets generated through the process implemented in the second stage. Our method customizes guidelines to accommodate diverse inputs, thereby enhancing the fine-grainedness and comprehensiveness of the guideline library. Furthermore, it incorporates safety expertise from a safety-trained LLM through a lightweight retrieval model. We evaluated our approach on three benchmarks, demonstrating significant improvements in LLM security and quality. Notably, our fine-tuned model, Labrador, even at 13 billion parameters, outperforms GPT-3.5-turbo and surpasses GPT-4 in alignment capabilities.

翻訳日:2024-03-20 20:10:10 公開日:2024-03-18

# 知識指導型機械学習の強化手法としての多項目比較

Multi-Criteria Comparison as a Method of Advancing Knowledge-Guided Machine Learning ( http://arxiv.org/abs/2403.11840v1 )

ライセンス: Link先を確認

Jason L. Harman, Jaelle Scheuerman,

(参考訳) 本稿では,AI/MLモデルの評価に適用可能な一般化可能なモデル評価手法について述べる。心理学・決定科学における予測競争から発展し、複数の科学的、理論的、実践的な基準にまたがる様々なタイプと構造の候補モデル群を評価する。基準スコアの正規ランキングは、計算社会選択の分野からの投票規則を用いて評価され、総合的な評価において、異なる尺度とモデルのタイプの比較を可能にする。さらなる利点と応用について論じる。

This paper describes a generalizable model evaluation method that can be adapted to evaluate AI/ML models across multiple criteria including core scientific principles and more practical outcomes. Emerging from prediction competitions in Psychology and Decision Science, the method evaluates a group of candidate models of varying type and structure across multiple scientific, theoretic, and practical criteria. Ordinal ranking of criteria scores are evaluated using voting rules from the field of computational social choice and allow the comparison of divergent measures and types of models in a holistic evaluation. Additional advantages and applications are discussed.
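
To make the aggregation step concrete, the sketch below combines per-criterion ordinal rankings with the Borda count, one well-known voting rule from computational social choice. The abstract does not say which voting rules the authors use, so the choice of Borda, the function name `borda_order`, and the model and criterion names in the usage note are all assumptions made for illustration.

```python
from typing import Dict, List


def borda_order(rankings: Dict[str, List[str]]) -> List[str]:
    """Aggregate per-criterion rankings (best model first) with Borda count.

    A model ranked at position p (0-based) among n models earns n - 1 - p
    points for that criterion. Ties in total score are broken by name so
    the result is deterministic.
    """
    scores: Dict[str, int] = {}
    for order in rankings.values():
        n = len(order)
        for position, model in enumerate(order):
            scores[model] = scores.get(model, 0) + (n - 1 - position)
    return sorted(scores, key=lambda model: (-scores[model], model))
```

With the hypothetical rankings `{"accuracy": ["A", "B", "C"], "interpretability": ["C", "A", "B"], "runtime": ["A", "C", "B"]}`, model A collects 5 points, C collects 3, and B collects 1, so the aggregate order is A, C, B. Because only ranks are used, criteria measured on different scales can be combined without normalizing the raw scores.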

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 交絡のあるオフラインデータのためのメディエータを用いた悲観的因果強化学習

Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data ( http://arxiv.org/abs/2403.11841v1 )

ライセンス: Link先を確認

Danyang Wang, Chengchun Shi, Shikai Luo, Will Wei Sun,

(参考訳) 実世界のシナリオでは、ランダム化実験から収集されるデータセットは、時間と予算の制約のため、サイズが限られることが多い。その結果、大規模な観測データセットを活用することが、高品質な方策学習を実現するためのより魅力的な選択肢となる。しかし、既存のオフライン強化学習(RL)手法の多くは、非交絡性と正値性という2つの重要な仮定に依存しており、これらは観測データの文脈ではしばしば成り立たない。これらの課題を踏まえ,新しい方策学習アルゴリズム PESsimistic CAusal Learning (PESCAL) を提案する。提案手法では,フロントドア基準に基づくメディエータ変数を用いて交絡バイアスを除去する。さらに,候補方策によって誘導される行動分布と,観測データを生成する挙動方策との間の分布シフトに対処するために,悲観的原理を採用する。我々の鍵となる観察は、行動がシステムダイナミクスに及ぼす効果を媒介する補助変数を組み込むことで、Q関数の代わりにメディエータ分布関数の下界を学習すれば、分布シフトの問題を部分的に緩和するのに十分であるということである。この知見は,推定Q関数に対する逐次的な不確実性定量化という困難な課題を回避することで,我々のアルゴリズムを大幅に単純化する。さらに,提案するアルゴリズムの理論的保証を与え,シミュレーションおよび主要な配車プラットフォームのオフラインデータセットを用いた実世界実験によってその有効性を実証する。

In real-world scenarios, datasets collected from randomized experiments are often constrained by size, due to limitations in time and budget. As a result, leveraging large observational datasets becomes a more attractive option for achieving high-quality policy learning. However, most existing offline reinforcement learning (RL) methods depend on two key assumptions--unconfoundedness and positivity--which frequently do not hold in observational data contexts. Recognizing these challenges, we propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL). We utilize the mediator variable based on front-door criterion to remove the confounding bias; additionally, we adopt the pessimistic principle to address the distributional shift between the action distributions induced by candidate policies, and the behavior policy that generates the observational data. Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function, to partially mitigate the issue of distributional shift. This insight significantly simplifies our algorithm, by circumventing the challenging task of sequential uncertainty quantification for the estimated Q-function. Moreover, we provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 分類のためのファジィラフ・ショケ距離

Fuzzy Rough Choquet Distances for Classification ( http://arxiv.org/abs/2403.11843v1 )

ライセンス: Link先を確認

Adnan Theerens, Chris Cornelis,

(参考訳) 本稿では,ファジィラフ集合に基づく測度を用いた新しいショケ距離を提案する。提案する距離尺度は,ファジィラフ集合理論から得られる属性情報とショケ積分の柔軟性を組み合わせたものである。このアプローチは、データ内の非線形な関係を的確に捉え、決定属性に対する条件属性間の相互作用を考慮することで、より柔軟で正確な距離をもたらすように設計されている。我々は、距離に基づく分類手法(例えば、k近傍法)に特に重点を置いて、機械学習の文脈におけるその応用を探求する。本論文では,正領域に基づく2つのファジィラフ集合ベースの測度について検討する。さらに,ファジィラフ集合理論から導かれる測度を単調化する2つの手順を探索し,これらをショケ積分で使用できるようにするとともに,それらの違いについて検討する。

This paper introduces a novel Choquet distance using fuzzy rough set based measures. The proposed distance measure combines the attribute information received from fuzzy rough set theory with the flexibility of the Choquet integral. This approach is designed to adeptly capture non-linear relationships within the data, acknowledging the interplay of the conditional attributes towards the decision attribute and resulting in a more flexible and accurate distance. We explore its application in the context of machine learning, with a specific emphasis on distance-based classification approaches (e.g. k-nearest neighbours). The paper examines two fuzzy rough set based measures that are based on the positive region. Moreover, we explore two procedures for monotonizing the measures derived from fuzzy rough set theory, making them suitable for use with the Choquet integral, and investigate their differences.
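
For readers unfamiliar with the Choquet integral, the following minimal sketch evaluates its discrete form for a set of per-attribute values and a monotone set function. It is not the authors' distance: the function name `choquet_integral`, the attribute names, and the toy measures in the usage note are assumptions, and the fuzzy rough set based measures studied in the paper are not implemented here.

```python
from typing import Callable, Dict, FrozenSet


def choquet_integral(
    values: Dict[str, float],
    measure: Callable[[FrozenSet[str]], float],
) -> float:
    """Discrete Choquet integral of non-negative values w.r.t. a monotone measure.

    The attributes are sorted by increasing value; each increment between
    consecutive values is weighted by the measure of the set of attributes
    whose value is at least the current one.
    """
    ordered = sorted(values, key=lambda name: values[name])
    total = 0.0
    previous = 0.0
    for index, name in enumerate(ordered):
        upper_set = frozenset(ordered[index:])
        total += (values[name] - previous) * measure(upper_set)
        previous = values[name]
    return total
```

When the measure is additive (a sum of per-attribute weights), the result reduces to an ordinary weighted sum. A measure that is 1 only on the full attribute set yields the minimum, and a measure that is 1 on every non-empty set yields the maximum. Non-additive measures in between are what allow interactions among attributes to be modeled.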

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 制約付き学習問題の近似最適解

Near-Optimal Solutions of Constrained Learning Problems ( http://arxiv.org/abs/2403.11844v1 )

ライセンス: Link先を確認

Juan Elenter, Luiz F. O. Chamon, Alejandro Ribeiro,

(参考訳) 機械学習システムが広く採用されるにつれて、その振る舞いを制限する必要性がますます明らかになっている。これは、堅牢性、安全性、公平性の要件を満たすモデルの開発に向けた最近の進歩によって裏付けられている。これらの要件は、制約付き学習問題として定式化することで(一般化保証付きで)課すことができ、その問題は双対上昇アルゴリズムによって解くことができる。しかし、これらのアルゴリズムは非凸な設定であっても目的関数値については収束するものの、得られる解が実行可能であることは保証できない。それを保証するにはすべての反復解にわたってランダム化する必要があるが、これは現代のほぼすべてのアプリケーションにおいて非現実的である。それでも、最終反復解は実際にはうまく機能することが観察されている。本研究では、凸性がないにもかかわらず、最適双対変数に付随するラグランジアン最小化解の制約違反を特徴付けることにより、理論と実践の間のこのギャップに対処する。そのために,非凸な有限次元の制約付き学習問題を,凸な関数問題のパラメータ化とみなせるという事実を利用する。本結果は,表現力の高いパラメータ化が双対法における実行可能性の問題を効果的に緩和することを示し,双対学習のこれまでの経験的成功に光を当てるものである。公平な学習のタスクにおいて,本研究の成果を例示する。

With the widespread adoption of machine learning systems, the need to curtail their behavior has become increasingly apparent. This is evidenced by recent advancements towards developing models that satisfy robustness, safety, and fairness requirements. These requirements can be imposed (with generalization guarantees) by formulating constrained learning problems that can then be tackled by dual ascent algorithms. Yet, though these algorithms converge in objective value, even in non-convex settings, they cannot guarantee that their outcome is feasible. Doing so requires randomizing over all iterates, which is impractical in virtually any modern applications. Still, final iterates have been observed to perform well in practice. In this work, we address this gap between theory and practice by characterizing the constraint violation of Lagrangian minimizers associated with optimal dual variables, despite lack of convexity. To do this, we leverage the fact that non-convex, finite-dimensional constrained learning problems can be seen as parametrizations of convex, functional problems. Our results show that rich parametrizations effectively mitigate the issue of feasibility in dual methods, shedding light on prior empirical successes of dual learning. We illustrate our findings in fair learning tasks.
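
The dual ascent scheme referred to above alternates between minimizing the Lagrangian over the primal variable and taking a projected gradient step on the dual variable. The sketch below runs it on a deliberately simple convex toy problem, minimizing (x - 2)^2 subject to x <= 1, where the final iterate is feasible. The toy problem, the function name `dual_ascent`, and the step size are assumptions for illustration; the non-convex learning setting analyzed in the paper is not reproduced here.

```python
from typing import Tuple


def dual_ascent(step: float = 0.5, iterations: int = 200) -> Tuple[float, float]:
    """Dual ascent for: minimize (x - 2)^2 subject to x - 1 <= 0.

    Lagrangian: L(x, lam) = (x - 2)^2 + lam * (x - 1), with lam >= 0.
    Returns the Lagrangian minimizer at the final dual variable, and that
    dual variable.
    """
    lam = 0.0
    for _ in range(iterations):
        # Primal step: closed-form minimizer of L(., lam).
        x = 2.0 - lam / 2.0
        # Dual step: gradient ascent on the constraint slack, projected
        # onto lam >= 0.
        lam = max(0.0, lam + step * (x - 1.0))
    return 2.0 - lam / 2.0, lam
```

On this toy problem the iterates approach x = 1 and lam = 2, so the last iterate satisfies the constraint up to numerical precision. The question the paper addresses is how large the constraint violation of such a Lagrangian minimizer can be once convexity is lost.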

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# GraphBEV:マルチモード3Dオブジェクト検出のためのロバストなBEV機能アライメントを目指して

GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection ( http://arxiv.org/abs/2403.11848v1 )

ライセンス: Link先を確認

Ziying Song, Lei Yang, Shaoqing Xu, Lin Liu, Dongyang Xu, Caiyan Jia, Feiyang Jia, Li Wang,

(参考訳) LiDARとカメラの情報をBird's-Eye-View(BEV)表現に統合することは、自動運転における3Dオブジェクト検出の重要な側面として浮上している。しかし,既存の手法は,LiDARとカメラセンサの間の不正確な校正関係の影響を受けやすい。このような不正確さは、カメラブランチの深度推定に誤差をもたらし、最終的にLiDARとカメラのBEV特徴の不整合を引き起こす。本研究では,GraphBEVと呼ばれる堅牢な融合フレームワークを提案する。不正確な点群投影による誤差に対処するため、グラフマッチングを介して近傍を考慮した深度特徴を利用するLocal Alignモジュールを導入する。さらに,LiDARとカメラのBEV特徴の不整合を修正するGlobal Alignモジュールを提案する。我々のGraphBEVフレームワークは,nuScenes検証セットにおいてmAP 70.1%を達成してBEV Fusionを1.6%上回り,最先端の性能を実現している。重要な点として、GraphBEVは、ミスアライメントノイズのある条件下で、BEV Fusionを8.3%上回る。

Integrating LiDAR and camera information into Bird's-Eye-View (BEV) representation has emerged as a crucial aspect of 3D object detection in autonomous driving. However, existing methods are susceptible to the inaccurate calibration relationship between LiDAR and the camera sensor. Such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a robust fusion framework called Graph BEV. Addressing errors caused by inaccurate point cloud projection, we introduce a Local Align module that employs neighbor-aware depth features via Graph matching. Additionally, we propose a Global Align module to rectify the misalignment between LiDAR and camera BEV features. Our Graph BEV framework achieves state-of-the-art performance, with an mAP of 70.1%, surpassing BEV Fusion by 1.6% on the nuscenes validation set. Importantly, our Graph BEV outperforms BEV Fusion by 8.3% under conditions with misalignment noise.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 表面電子に由来するナノスケールカシミール力軟化

Nanoscale Casimir force softening originated from surface electrons ( http://arxiv.org/abs/2403.11849v1 )

ライセンス: Link先を確認

Hewan Zhang, Kun Ding,

(参考訳) 真空場と量子物質の強い結合はナノスケールで発生し、光-物質相互作用の地平線を広げる。真空場の展示としてのナノスケールカシミール力は、マイクロンカシミール力では無視できない量子特性による表面電子の影響を必然的に経験する。そこで我々は,カシミール力に対する表面電子の寄与を目的かつ微妙に含む典型的な実験構成に対処する3次元等角写像法を開発した。本手法により,表面電子は材料や結晶面に依存するナノスケールカシミール力を増強または抑制できることを明らかにした。この機構はカシミール力軟化であり、カシミール相互作用で見られる距離を効果的に変化させる表面電子から生じる。本研究は, 表面電子と真空場との相互作用を浮き彫りにするだけでなく, ナノスケールの揺らぎ型問題に関する理論的および実験的研究のレシピを提供する。

Strong coupling between vacuum fields and quantum matter occurs at the nanoscale and broadens the horizon of light-matter interaction. Nanoscale Casimir force, as an exhibition of vacuum fields, inevitably experiences the influence of surface electrons due to their quantum character, which are ignorable in micron Casimir force. Here, we develop a three-dimensional conformal map method to tackle typical experimental configurations with surface electron contributions to Casimir force purposely and delicately included. Based on this method, we reveal that surface electrons can either enhance or suppress the nanoscale Casimir force, depending on materials and crystal facets. The mechanism is demonstrated to be the Casimir force softening, which results from surface electrons effectively altering the distance seen by the Casimir interaction. Our findings not only highlight the interaction between surface electrons and vacuum fields but also provide a recipe for theoretical and experimental investigation of nanoscale fluctuation-type problems.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 長距離コヒーレント攻撃に対する片側DI-QKDの安全性

One-sided DI-QKD secure against coherent attacks over long distances ( http://arxiv.org/abs/2403.11850v1 )

ライセンス: Link先を確認

Michele Masini, Shubhayan Sarkar,

(参考訳) 量子鍵配送(Quantum Key Distribution, QKD)は、証明可能な安全性を持つ通信を可能にする技術であるが、デバイスの特性評価に課題があり、潜在的なセキュリティリスクをもたらす。デバイス独立(DI) QKDプロトコルは、デバイスに関する仮定を最小限にすることでこの問題を克服するが、高い検出効率(実験装置が量子状態を検出する能力)を必要とするため、距離が制限される。したがって、デバイスに関する現実的な仮定に基づき、かつ長距離で実装可能な量子鍵配送プロトコルを見つけることが望ましい。本研究では,各当事者が2つの測定を行う片側DI QKD方式を検討し,信頼できない側の検出効率が50.1%を超えていれば,コヒーレント攻撃に対して安全であることを示す。これは、信頼できない測定を2つ持つプロトコルで達成可能な理論上の限界にほぼ等しい。興味深いことに、状態の生成源を信頼できない側の近くに置くことで、我々のプロトコルは標準的なQKDプロトコルに匹敵する距離にわたって安全であることも示す。

Quantum Key Distribution (QKD) is a technique enabling provable secure communication but faces challenges in device characterization, posing potential security risks. Device-Independent (DI) QKD protocols overcome this issue by making minimal device assumptions but are limited in distance because they require high detection efficiencies, which refer to the ability of the experimental setup to detect quantum states. It is thus desirable to find quantum key distribution protocols that are based on realistic assumptions on the devices as well as implementable over long distances. In this work, we consider a one-sided DI QKD scheme with two measurements per party and show that it is secure against coherent attacks up to detection efficiencies greater than 50.1% specifically on the untrusted side. This is almost the theoretical limit achievable for protocols with two untrusted measurements. Interestingly, we also show that, by placing the source of states close to the untrusted side, our protocol is secure over distances comparable to standard QKD protocols.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 改良型デ・フィネッティ・リダクションを用いた光量子鍵分布のポストセレクション法

Postselection technique for optical Quantum Key Distribution with improved de Finetti reductions ( http://arxiv.org/abs/2403.11851v1 )

ライセンス: Link先を確認

Shlok Nahar, Devashish Tupkary, Yuming Zhao, Norbert Lütkenhaus, Ernest Tan,

(参考訳) ポストセレクション手法は、コヒーレント攻撃に対する量子鍵配送プロトコルの安全性を証明するための重要な証明手法である。本研究では,光量子鍵配送プロトコルにポストセレクション手法を厳密に適用するために,複数のステップを踏む。まず, 元のポストセレクション論文の技術的欠陥を修正することで, ポストセレクション手法を厳密な数学的基礎の上に置く。第2に,固定された周辺状態を持つデ・フィネッティ縮約を用いることで,ポストセレクション手法の適用範囲をprepare-and-measureプロトコルに拡張する。第3に、光源にタグ付けすることで、デコイ状態プロトコルにポストセレクション手法をどのように利用できるかを示す。最後に, フラグ状態スカッシャーの新たな変種を開発することにより, ポストセレクション手法の適用範囲を現実的な光学装置に拡張する。また,既存のデ・フィネッティ縮約を改良し,ポストセレクション手法の使用が鍵レートに与える影響を低減する。これらの改良は他の量子情報処理タスクにもより一般的に適用できる。本研究の適用性を示す例として,タイムビン符号化された3状態プロトコルに結果を適用する。ポストセレクション手法は,コヒーレント攻撃に対する他のすべての既知の証明手法よりも優れた性能を示すことを確認した。

The postselection technique is an important proof technique for proving the security of quantum key distribution protocols against coherent attacks. In this work, we go through multiple steps to rigorously apply the postselection technique to optical quantum key distribution protocols. First, we place the postselection technique on a rigorous mathematical foundation by fixing a technical flaw in the original postselection paper. Second, we extend the applicability of the postselection technique to prepare-and-measure protocols by using a de Finetti reduction with a fixed marginal. Third, we show how the postselection technique can be used for decoy-state protocols by tagging the source. Finally, we extend the applicability of the postselection technique to realistic optical setups by developing a new variant of the flag-state squasher. We also improve existing de Finetti reductions, which reduce the effect of using the postselection technique on the key rate. These improvements can be more generally applied to other quantum information processing tasks. As an example to demonstrate the applicability of our work, we apply our results to the time-bin encoded three-state protocol. We observe that the postselection technique performs better than all other known proof techniques against coherent attacks.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# denoiSplit: ジョイントイメージ分割と教師なし denoising の方法

denoiSplit: a method for joint image splitting and unsupervised denoising ( http://arxiv.org/abs/2403.11854v1 )

ライセンス: Link先を確認

Ashesh Ashesh, Florian Jug,

(参考訳) 本研究では、新しい解析課題、すなわちセマンティック画像分割と教師なしデノイジングを同時に行う課題に取り組む手法であるdenoiSplitを提案する。この二重のアプローチは蛍光顕微鏡において重要な応用を持つ。蛍光顕微鏡ではセマンティック画像分割が重要である一方、ノイズが画像内容の下流解析を妨げることが多い。画像分割は、画像を識別可能なセマンティック構造に分解することを指す。この課題に対する現在の最先端手法は画像ノイズの存在下で性能が低下し、意図せずノイズも予測出力に分配してしまうことを示す。本手法は、教師なしデノイジングのサブタスクを統合することで画像ノイズに対処できる。この統合により、顕著かつ現実的なレベルの撮像ノイズが存在する場合でも、セマンティック画像のアンミキシングが改善される。denoiSplitの重要な革新は、特別に定式化したノイズモデルの使用と、学習対象である高次元の階層的潜在空間に対するKLダイバージェンス損失の適切な調整である。実世界の顕微鏡画像における4つのタスクでdenoiSplitの性能を示す。さらに、定性的および定量的な評価を行い、既存のベンチマークと比較することで、denoiSplit、すなわち2つの適切なノイズモデルを用いてセマンティック分割とデノイジングを共同で行う単一の変分分割エンコーダデコーダ(VSE)ネットワークの有効性を実証する。

In this work we present denoiSplit, a method to tackle a new analysis task, i.e. the challenge of joint semantic image splitting and unsupervised denoising. This dual approach has important applications in fluorescence microscopy, where semantic image splitting has important applications but noise does generally hinder the downstream analysis of image content. Image splitting involves dissecting an image into its distinguishable semantic structures. We show that the current state-of-the-art method for this task struggles in the presence of image noise, inadvertently also distributing the noise across the predicted outputs. The method we present here can deal with image noise by integrating an unsupervised denoising sub-task. This integration results in improved semantic image unmixing, even in the presence of notable and realistic levels of imaging noise. A key innovation in denoiSplit is the use of specifically formulated noise models and the suitable adjustment of KL-divergence loss for the high-dimensional hierarchical latent space we are training. We showcase the performance of denoiSplit across 4 tasks on real-world microscopy images. Additionally, we perform qualitative and quantitative evaluations and compare results to existing benchmarks, demonstrating the effectiveness of using denoiSplit: a single Variational Splitting Encoder-Decoder (VSE) Network using two suitable noise models to jointly perform semantic splitting and denoising.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 結晶特性予測のための完全かつ効率的なグラフ変換器

Complete and Efficient Graph Transformers for Crystal Material Property Prediction ( http://arxiv.org/abs/2403.11857v1 )

ライセンス: Link先を確認

Keqiang Yan, Cong Fu, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji,

(参考訳) 結晶構造は、3次元空間の正則格子に沿って繰り返される原始単位セル内の原子塩基によって特徴づけられる。結晶の周期性と無限の性質は、幾何学グラフ表現学習に固有の課題を提起する。具体的には、結晶の完全な幾何学的情報を効果的に捉え、キラル結晶を扱うグラフを構築することは、未解決で困難な問題である。本稿では, 単位細胞の周期的パターンを利用して各原子の格子に基づく表現を確立し, 結晶の効率的かつ表現力のあるグラフ表現を実現する手法を提案する。さらに,結晶材料に特化して設計されたSE(3)トランスであるComFormerを提案する。 ComFormerには、ユークリッド距離と角度の不変な幾何学的記述子を使用するiComFormerと、等変ベクトル表現を使用するeComFormerの2つの変種が含まれている。実験により,ComFormer変種が広く使用されている3つの結晶ベンチマークにおいて,様々なタスクにおいて精度良く予測できることが実証された。私たちのコードはAIRSライブラリ(https://github.com/divelab/AIRS)の一部として公開されています。

Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an unsolved and challenging problem. In this paper, we introduce a novel approach that utilizes the periodic patterns of unit cells to establish the lattice-based representation for each atom, enabling efficient and expressive graph representations of crystals. Furthermore, we propose ComFormer, an SE(3) transformer designed specifically for crystalline materials. ComFormer includes two variants: iComFormer, which employs invariant geometric descriptors of Euclidean distances and angles, and eComFormer, which utilizes equivariant vector representations. Experimental results demonstrate the state-of-the-art predictive accuracy of ComFormer variants on various tasks across three widely-used crystal benchmarks. Our code is publicly available as part of the AIRS library (https://github.com/divelab/AIRS).

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 評価器としてのGPT-4:農業における害虫管理に関する大規模言語モデルの評価

GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture ( http://arxiv.org/abs/2403.11858v1 )

ライセンス: Link先を確認

Shanglong Yang, Zhipeng Yuan, Shunbao Li, Ruoling Peng, Kang Liu, Po Yang,

(参考訳) 人工知能(AI)の急速に発展する分野において、農業、特に害虫管理への大規模言語モデル(LLM)の適用は、いまだ初期段階にある。我々は、OpenAIのGenerative Pre-trained Transformer(GPT)シリーズやGoogleのFLANシリーズなどのLLMが生成する害虫管理アドバイスの内容を評価することで、その実現可能性を示すことを目的とした。農業アドバイスは文脈依存性が強いため、LLMが生成するテキストの品質を自動的に測定・定量化することは重要な課題となる。我々は、GPT-4を評価器として用い、生成された内容を一貫性(Coherence)、論理的整合性(Logical Consistency)、流暢性(Fluency)、関連性(Relevance)、理解しやすさ(Comprehensibility)、網羅性(Exhaustiveness)の観点から採点する革新的な手法を提案した。さらに、作物の閾値データに基づくエキスパートシステムをベースラインとして統合し、圃場で見つかった害虫に対して管理行動をとるべきかどうかに関する事実的正確性(Factual Accuracy)のスコアを得た。各モデルのスコアは、パーセンテージで重み付けして最終スコアとした。その結果、GPT-3.5とGPT-4は、ほとんどの評価カテゴリにおいてFLANモデルを上回った。さらに、ドメイン固有の知識を含む指示ベースのプロンプトの使用により、LLMが農業における有効なツールとなりうることが示され、正解率は72%であり、害虫管理の提案におけるLLMの有効性が示された。

In the rapidly evolving field of artificial intelligence (AI), the application of large language models (LLMs) in agriculture, particularly in pest management, remains nascent. We aimed to prove the feasibility by evaluating the content of the pest management advice generated by LLMs, including the Generative Pre-trained Transformer (GPT) series from OpenAI and the FLAN series from Google. Considering the context-specific properties of agricultural advice, automatically measuring or quantifying the quality of text generated by LLMs becomes a significant challenge. We proposed an innovative approach, using GPT-4 as an evaluator, to score the generated content on Coherence, Logical Consistency, Fluency, Relevance, Comprehensibility, and Exhaustiveness. Additionally, we integrated an expert system based on crop threshold data as a baseline to obtain scores for Factual Accuracy on whether pests found in crop fields should take management action. Each model's score was weighted by percentage to obtain a final score. The results showed that GPT-3.5 and GPT-4 outperform the FLAN models in most evaluation categories. Furthermore, the use of instruction-based prompting containing domain-specific knowledge proved the feasibility of LLMs as an effective tool in agriculture, with an accuracy rate of 72%, demonstrating LLMs' effectiveness in providing pest management suggestions.
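
The weighting step mentioned in the abstract ("each model's score was weighted by percentage") can be sketched as follows. This is only an illustration: the abstract does not state the actual percentages, so the weights and scores below are hypothetical, and `weighted_final_score` is an illustrative name rather than anything taken from the paper.

```python
from typing import Dict


def weighted_final_score(scores: Dict[str, float],
                         weights_percent: Dict[str, float]) -> float:
    """Combine per-category scores into one score using percentage weights.

    Both dictionaries must cover the same categories, and the weights must
    sum to 100. The weights are an assumption made for illustration only.
    """
    if set(scores) != set(weights_percent):
        raise ValueError("scores and weights must cover the same categories")
    if abs(sum(weights_percent.values()) - 100.0) > 1e-9:
        raise ValueError("percentage weights must sum to 100")
    return sum(scores[name] * weights_percent[name] for name in scores) / 100.0


# Hypothetical scores and weights for two of the categories.
example_scores = {"Coherence": 4.0, "Factual Accuracy": 2.0}
example_weights = {"Coherence": 60.0, "Factual Accuracy": 40.0}
final_score = weighted_final_score(example_scores, example_weights)
```

Validating that the weights sum to 100 keeps final scores comparable across models when the category list changes.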

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# SAML V2.0 Web Browser SSO 標準の自動セキュリティ解析に向けて - POST/Artifact のユースケース

Towards automated formal security analysis of SAML V2.0 Web Browser SSO standard - the POST/Artifact use case ( http://arxiv.org/abs/2403.11859v1 )

ライセンス: Link先を確認

Zvonimir Hartl, Ante Đerek,

(参考訳) シングルサインオン(SSO)プロトコルは、複数のオンラインサービスに対する統一ログインによるユーザ認証を合理化し、ユーザビリティとセキュリティを改善している。最も一般的なSSOプロトコルフレームワークの1つであるSecurity Assertion Markup Language V2.0 (SAML) Web SSO Profileは、主に政府、教育、エンタープライズ環境で20年以上使われてきた。ミッションクリティカルな性質にもかかわらず、Web SSO Profileの特定の配置と構成のみが公式に分析されている。本稿では,POST/Artifact Bindingsのユースケースを用いて,SAML V2.0 SP-initiated SSOの総合的なセキュリティ解析を行うことにより,このギャップを埋めようとしている。特定のデプロイメントや構成に集中するのではなく、標準で許可された多くの異なるデプロイメントをキャプチャすることを目標として、仕様をしっかりとフォローしています。モデリングと解析は,暗号化のシンボリックモデルにおけるセキュリティプロトコルの自動検証のための最先端ツールであるTamarin proverを用いて行われる。技術的には、ユースケースのメタモデルを構築し、8つの異なるプロトコルの変種にインスタンス化します。 Tamarinの証明器を使って、これらのプロトコルの変種に対して、いくつかの重要なセキュリティ特性を正式に検証し、特定の欠点と潜在的な脆弱性を特定します。

Single Sign-On (SSO) protocols streamline user authentication with a unified login for multiple online services, improving usability and security. One of the most common SSO protocol frameworks, the Security Assertion Markup Language V2.0 (SAML) Web SSO Profile, has been in use for more than two decades, primarily in government, education and enterprise environments. Despite its mission-critical nature, only certain deployments and configurations of the Web SSO Profile have been formally analyzed. This paper attempts to bridge this gap by performing a comprehensive formal security analysis of the SAML V2.0 SP-initiated SSO with POST/Artifact Bindings use case. Rather than focusing on a specific deployment and configuration, we closely follow the specification with the goal of capturing many different deployments allowed by the standard. Modeling and analysis are performed using the Tamarin prover, a state-of-the-art tool for automated verification of security protocols in the symbolic model of cryptography. Technically, we build a meta-model of the use case that we instantiate to eight different protocol variants. Using the Tamarin prover, we formally verify a number of critical security properties for those protocol variants, while identifying certain drawbacks and potential vulnerabilities.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# 熱画像を用いたマルチモーダルニューラルシーン表現の探索

Exploring Multi-modal Neural Scene Representations With Applications on Thermal Imaging ( http://arxiv.org/abs/2403.11865v1 )

ライセンス: Link先を確認

Mert Özer, Maximilian Weiherer, Martin Hundhausen, Bernhard Egger,

(参考訳) ニューラル・ラジアンス・フィールド(NeRF)は、RGB画像のセットで学習する場合の新規ビュー合成タスクにおいて、新たなデファクト標準として急速に発展した。本稿では、マルチモーダル学習の文脈において、NeRFなどのニューラルシーン表現を包括的に評価する。具体的には、RGB以外の第2のモダリティをNeRFに組み込むための4つの戦略を提示する。(1) 両方のモダリティについて独立にスクラッチから学習する、(2) RGBで事前学習し第2のモダリティで微調整する、(3) 第2のブランチを追加する、(4) 追加モダリティの(色)値を予測する別コンポーネントを追加する、である。熱画像はラジオシティの点でRGBと大きく異なり、ニューラルシーン表現への統合が難しいため、第2のモダリティとして選択した。提案した戦略の評価のために、6つの一般的な物体と合計約360枚のRGB画像および熱画像からなる、公開された新しいマルチビューデータセットThermalMixを収集した。データ取得に先立ってクロスモダリティ校正を行い、RGB画像と熱画像の高品質な位置合わせを実現した。その結果、NeRFに第2のブランチを追加する方法が熱画像の新規ビュー合成に最も優れ、かつRGBでも説得力のある結果をもたらすことが分かった。最後に、本分析が近赤外画像や深度マップなど他のモダリティにも一般化することを示す。プロジェクトページ: https://mert-o.github.io/ThermalNeRF/。

Neural Radiance Fields (NeRFs) quickly evolved as the new de-facto standard for the task of novel view synthesis when trained on a set of RGB images. In this paper, we conduct a comprehensive evaluation of neural scene representations, such as NeRFs, in the context of multi-modal learning. Specifically, we present four different strategies of how to incorporate a second modality, other than RGB, into NeRFs: (1) training from scratch independently on both modalities; (2) pre-training on RGB and fine-tuning on the second modality; (3) adding a second branch; and (4) adding a separate component to predict (color) values of the additional modality. We chose thermal imaging as second modality since it strongly differs from RGB in terms of radiosity, making it challenging to integrate into neural scene representations. For the evaluation of the proposed strategies, we captured a new publicly available multi-view dataset, ThermalMix, consisting of six common objects and about 360 RGB and thermal images in total. We employ cross-modality calibration prior to data capturing, leading to high-quality alignments between RGB and thermal images. Our findings reveal that adding a second branch to NeRF performs best for novel view synthesis on thermal images while also yielding compelling results on RGB. Finally, we also show that our analysis generalizes to other modalities, including near-infrared images and depth maps. Project page: https://mert-o.github.io/ThermalNeRF/.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# ガウススプラッティングによるビュー一貫性3次元編集

View-Consistent 3D Editing with Gaussian Splatting ( http://arxiv.org/abs/2403.11868v1 )

ライセンス: Link先を確認

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang,

(参考訳) 3D Gaussian Splatting (3DGS)の出現は、3D編集に革命をもたらし、効率よく高忠実なレンダリングを提供し、正確な局所的な操作を可能にした。現在、拡散ベースの2D編集モデルを用いて、マルチビューレンダリング画像を修正し、3DGSモデルの編集をガイドしている。しかし、このアプローチは多視点不整合の重要な問題に直面しており、誘導画像はビュー間で大きな相違を示し、モード崩壊と3DGSの視覚的アーティファクトをもたらす。この目的のために、3DGSをシームレスに画像編集プロセスに組み込む新しいフレームワークであるView-Consistent Editing (VcEdit)を導入する。 VcEditには、Cross-attention Consistency ModuleとEditing Consistency Moduleという2つの革新的な一貫性モジュールがある。これらの一貫性モジュールを反復的なパターンに組み込むことで、VcEditは多視点不整合の問題を解決し、様々な場面で高品質な3DGS編集を容易にする。

The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes.

翻訳日:2024-03-20 20:00:12 公開日:2024-03-18

# IDF-CR:リモートセンシング画像における分割統治型雲除去のための反復拡散過程

IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images ( http://arxiv.org/abs/2403.11870v1 )

ライセンス: Link先を確認

Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin,

(参考訳) 深層学習技術は、光学リモートセンシング画像から雲を除去するうえで有効性を実証している。雲除去タスクでは畳み込みニューラルネットワーク(CNN)が主流である。しかし、畳み込み演算に固有の制約により、CNNが対処できる雲の遮蔽はごく一部にとどまる。近年、拡散モデルは、その強力な生成能力により、画像生成と再構成において最先端(SOTA)の性能を達成している。拡散モデルの急速な発展に触発されて、我々は雲除去のための反復拡散過程(IDF-CR)を初めて提示する。これは強力な生成能力を示し、構成要素ごとの分割統治による雲除去を実現する。IDF-CRは、ピクセル空間雲除去モジュール(Pixel-CR)と潜在空間反復ノイズ拡散ネットワーク(IND)から構成される。具体的には、IDF-CRはピクセル空間と潜在空間をそれぞれ扱う2段階のモデルに分けられる。この2段階モデルは、予備的な雲の低減から細部の精緻化への戦略的な移行を促す。ピクセル空間の段階では、Pixel-CRが雲のある画像の処理を開始し、準最適な雲除去結果を生成して、拡散モデルに雲除去の事前知識を与える。潜在空間の段階では、拡散モデルが低品質の雲除去結果を高品質でクリーンな出力に変換する。ControlNetを実装することでStable Diffusionを改良する。さらに、拡散モデルに教師なし反復ノイズ精緻化(INR)モジュールを導入し、予測されたノイズの分布を最適化することで、細部の復元を強化する。我々のモデルは、光学リモートセンシングデータセットにおいて、画像再構成や光学リモートセンシング雲除去を含む他のSOTA手法と比較して最も優れた性能を示す。

Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of-the-art (SOTA) proficiency in image generation and reconstruction due to their formidable generative capabilities. Inspired by the rapid development of diffusion models, we first present an iterative diffusion process for cloud removal (IDF-CR), which exhibits strong generative capabilities to achieve component divide-and-conquer cloud removal. IDF-CR consists of a pixel space cloud removal module (Pixel-CR) and a latent space iterative noise diffusion network (IND). Specifically, IDF-CR is divided into two-stage models that address pixel space and latent space. The two-stage model facilitates a strategic transition from preliminary cloud reduction to meticulous detail refinement. In the pixel space stage, Pixel-CR initiates the processing of cloudy images, yielding a suboptimal cloud removal result that provides the diffusion model with prior cloud removal knowledge. In the latent space stage, the diffusion model transforms low-quality cloud removal into high-quality clean output. We refine Stable Diffusion by implementing ControlNet. In addition, an unsupervised iterative noise refinement (INR) module is introduced for the diffusion model to optimize the distribution of the predicted noise, thereby enhancing advanced detail recovery. Our model performs best compared with other SOTA methods, including image reconstruction and optical remote-sensing cloud removal methods, on the optical remote-sensing datasets.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# ニューラルネットワークのリアルトロピカル幾何学

The Real Tropical Geometry of Neural Networks ( http://arxiv.org/abs/2403.11871v1 )

ライセンス: Link先を確認

Marie-Charlotte Brandenburg, Georg Loho, Guido Montúfar,

(参考訳) 我々は、トロピカル有理関数の符号、すなわち2つの凸区分線形関数の差の符号として定義される二値分類器を考える。ReLUニューラルネットワークのパラメータ空間は、トロピカル有理関数のパラメータ空間内の半代数的集合として含まれる。我々は、このパラメータ空間の2つの異なる分割の研究を開始する。1つは決定境界の組合せ型が固定される半代数的集合への分割であり、もう1つはデータセットの分割の組合せ論を捉える多面体扇への分割である。0/1損失関数の劣位集合はこの分類扇の部分扇として現れ、レベル集合が必ずしも連結でないことを示す。分類扇を、i) 幾何学的には活性化ポリトープの法扇として、ii) 組合せ論的には、有向マトロイドおよびトロピカル有向マトロイドのコベクトル公理に類似した、付随する二部グラフの性質のリストを通じて記述する。本研究の結果は、超曲面の正のトロピカル化やトロピカル半代数的集合など、実トロピカル幾何学で確立された構造を見いだすことにより、ニューラルネットワークとトロピカル幾何学の関係を拡張し精緻化するものである。

We consider a binary classifier defined as the sign of a tropical rational function, that is, as the difference of two convex piecewise linear functions. The parameter space of ReLU neural networks is contained as a semialgebraic set inside the parameter space of tropical rational functions. We initiate the study of two different subdivisions of this parameter space: a subdivision into semialgebraic sets, on which the combinatorial type of the decision boundary is fixed, and a subdivision into a polyhedral fan, capturing the combinatorics of the partitions of the dataset. The sublevel sets of the 0/1-loss function arise as subfans of this classification fan, and we show that the level-sets are not necessarily connected. We describe the classification fan i) geometrically, as normal fan of the activation polytope, and ii) combinatorially through a list of properties of associated bipartite graphs, in analogy to covector axioms of oriented matroids and tropical oriented matroids. Our findings extend and refine the connection between neural networks and tropical geometry by observing structures established in real tropical geometry, such as positive tropicalizations of hypersurfaces and tropical semialgebraic sets.
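
To make the first sentence of the abstract concrete, the following minimal sketch evaluates a classifier given as the sign of a difference of two convex piecewise linear functions, each written as a maximum of finitely many affine pieces. The specific pieces (`F_PIECES`, `G_PIECES`) and function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# An affine piece is (weights, bias). A convex piecewise linear function is
# the maximum over finitely many affine pieces (a tropical polynomial).
AffinePiece = Tuple[Sequence[float], float]


def tropical_polynomial(pieces: List[AffinePiece], x: Sequence[float]) -> float:
    """Evaluate max_i (<w_i, x> + b_i) at the point x."""
    return max(sum(w * xi for w, xi in zip(weights, x)) + bias
               for weights, bias in pieces)


def classify(f_pieces: List[AffinePiece],
             g_pieces: List[AffinePiece],
             x: Sequence[float]) -> int:
    """Return the sign (+1, 0 or -1) of the tropical rational function f - g."""
    value = tropical_polynomial(f_pieces, x) - tropical_polynomial(g_pieces, x)
    return (value > 0) - (value < 0)


# Hypothetical one-dimensional example: f(x) = max(x, 0), g(x) = 0.5.
F_PIECES: List[AffinePiece] = [((1.0,), 0.0), ((0.0,), 0.0)]
G_PIECES: List[AffinePiece] = [((0.0,), 0.5)]
```

In this toy case the decision boundary is the single point x = 0.5, where the two functions agree and the sign is zero.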

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# NuGraph2:ニュートリノ物理イベント再構築のためのグラフニューラルネットワーク

NuGraph2: A Graph Neural Network for Neutrino Physics Event Reconstruction ( http://arxiv.org/abs/2403.11872v1 )

ライセンス: Link先を確認

V Hewes, Adam Aurisano, Giuseppe Cerati, Jim Kowalkowski, Claire Lee, Wei-keng Liao, Daniel Grzenda, Kaushal Gumpula, Xiaohe Zhang,

(参考訳) 液体アルゴン時間射影チャンバー(LArTPC)検出器技術は、粒子相互作用に関する豊富な高解像度情報を提供し、その情報を最大限に活用するには高度な自動再構成技術が必要である。本稿では、LArTPC検出器におけるシミュレーションニュートリノ相互作用の低レベル再構成のためのグラフニューラルネットワーク(GNN)であるNuGraph2について述べる。MicroBooNE検出器ジオメトリにおけるシミュレートされたニュートリノ相互作用は、各検出器面上のエネルギー沈着が平面部分グラフ上のノードを形成する異種グラフとして記述される。このネットワークは、マルチヘッドアテンションによるメッセージパッシング機構を用いて、これらのグラフノードに対してバックグラウンドフィルタリングとセマンティックラベリングを行い、一次物理相互作用に関連するノードを98.0%の効率で識別し、94.9%の効率で粒子タイプに従ってラベル付けする。このネットワークは、複数の2次元表現にまたがる検出器観測量を直接扱うが、これらの表現間の一貫性を促すために3Dコンテキスト認識機構を利用する。モデル推論は、CPUでは0.12 s/event、GPUでのバッチ処理では0.005 s/eventである。このアーキテクチャはニュートリノ物理学における粒子再構成のための汎用的なソリューションとして設計されており、幅広い検出器技術に展開できる可能性があるほか、本稿で述べる2つのタスク以外にもさまざまなタスクに活用できるコア畳み込みエンジンを提供する。

Liquid Argon Time Projection Chamber (LArTPC) detector technology offers a wealth of high-resolution information on particle interactions, and leveraging that information to its full potential requires sophisticated automated reconstruction techniques. This article describes NuGraph2, a Graph Neural Network (GNN) for low-level reconstruction of simulated neutrino interactions in a LArTPC detector. Simulated neutrino interactions in the MicroBooNE detector geometry are described as heterogeneous graphs, with energy depositions on each detector plane forming nodes on planar subgraphs. The network utilizes a multi-head attention message-passing mechanism to perform background filtering and semantic labelling on these graph nodes, identifying those associated with the primary physics interaction with 98.0% efficiency and labelling them according to particle type with 94.9% efficiency. The network operates directly on detector observables across multiple 2D representations, but utilizes a 3D-context-aware mechanism to encourage consistency between these representations. Model inference takes 0.12 s/event on a CPU, and 0.005 s/event batched on a GPU. This architecture is designed to be a general-purpose solution for particle reconstruction in neutrino physics, with the potential for deployment across a broad range of detector technologies, and offers a core convolution engine that can be leveraged for a variety of tasks beyond the two described in this article.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# CO3: 生成的対話型クエリ書き換えのための低リソースコントラスト協調トレーニング

CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite ( http://arxiv.org/abs/2403.11873v1 )

ライセンス: Link先を確認

Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Renjun Hu, Zengming Zhang, Feijun Jiang, Wai Lam,

(参考訳) 生成的クエリ書き換えは、会話履歴を用いて再構成されたクエリを生成するが、入手コストの高い正解の書き換えペアに大きく依存する。近年、このタスクでは少数ショット学習の人気が高まっているが、これらの手法はデータサイズが限られるため固有のノイズに敏感である。さらに、いずれの試みも、訓練時とテスト時の間で言語スタイルのシフトがある場合に性能が低下する。そこで本研究では、ノイズと言語スタイルのシフトの両方に対して頑健な、低リソースの生成的対話型クエリ書き換えについて検討する。中心となる考え方は、大量のラベルなしデータを利用し、コントラスティブな協調学習(co-training)パラダイムを通じてさらなる改善を図ることである。具体的には、2つの双対モデル(RewriterとSimplifier)を協調学習し、それぞれが擬似ラベル付けによって他方に追加のガイダンスを与え、反復的に互いを強化する。また、データ拡張を伴うコントラスティブ学習を活用することで、モデルがノイズよりも真に価値のある情報に注意を向けられるようにする。大規模な実験により、少数ショットとゼロショットの両方のシナリオで、我々のモデルの優位性を実証する。また、言語スタイルのシフトに遭遇した際に、我々のモデルがより優れた汎化能力を持つことも検証する。

Generative query rewrite generates reconstructed query rewrites using the conversation history but relies heavily on gold rewrite pairs that are expensive to obtain. Recently, few-shot learning has been gaining popularity for this task, but these methods are sensitive to the inherent noise due to limited data size. Besides, both approaches face performance degradation when there is a language style shift between training and testing cases. To this end, we study low-resource generative conversational query rewrite that is robust to both noise and language style shift. The core idea is to utilize massive unlabeled data to make further improvements via a contrastive co-training paradigm. Specifically, we co-train two dual models (namely Rewriter and Simplifier) such that each of them provides extra guidance through pseudo-labeling for enhancing the other in an iterative manner. We also leverage contrastive learning with data augmentation, which enables our model to pay more attention to the truly valuable information than to the noise. Extensive experiments demonstrate the superiority of our model under both few-shot and zero-shot scenarios. We also verify the better generalization ability of our model when encountering language style shift.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# ダイナミックビジョンセンサを用いた高速無人航空機の実時間検出に向けて

Towards Real-Time Fast Unmanned Aerial Vehicle Detection Using Dynamic Vision Sensors ( http://arxiv.org/abs/2403.11875v1 )

ライセンス: Link先を確認

Jakub Mandula, Jonas Kühne, Luca Pascarella, Michele Magno,

(参考訳) 無人航空機(UAV)は民間および軍事用途で普及が進んでいる。しかし、制限区域への無秩序な侵入は、プライバシーとセキュリティを脅かす。したがって、UAVの防止と検出は、機密性と安全性を保証するうえで極めて重要である。主にレーダーに基づく能動走査は最も精度の高い技術の一つであるが、物体認識などの受動的検査に比べて高価で汎用性に欠ける場合がある。ダイナミックビジョンセンサー(Dynamic Vision Sensor, DVS)は、高速に動くシーンにおけるタイムスタンプ付きの画素レベルの明るさ変化を利用する、生物に着想を得たイベントベースの視覚モデルであり、低遅延の物体検出によく適している。本稿では、高速移動するドローンの検出を可能にする組込みシステムF-UAV-D(Fast Unmanned Aerial Vehicle Detector)を提案する。特に、リアルタイムかつ低消費電力の構成でRGBカメラの代替としてDVSを利用するためのセットアップを提案する。提案手法は、DVSの高ダイナミックレンジ(HDR)と背景抑制を活用し、様々な高速移動ドローンで訓練すると、低照度や高速移動シーンなどの最適でない環境条件下でRGB入力を上回る。実験結果から、F-UAV-Dは(i)平均15 W未満の消費電力でドローンを検出でき、(ii)エッジコンピュータのCPUとGPUノードを活用することでリアルタイム推論(50 ms未満)を実行できることが示された。

Unmanned Aerial Vehicles (UAVs) are gaining popularity in civil and military applications. However, uncontrolled access to restricted areas threatens privacy and security. Thus, prevention and detection of UAVs are pivotal to guarantee confidentiality and safety. Although active scanning, mainly based on radars, is one of the most accurate technologies, it can be expensive and less versatile than passive inspections, e.g., object recognition. Dynamic vision sensors (DVS) are bio-inspired event-based vision models that leverage timestamped pixel-level brightness changes in fast-moving scenes and adapt well to low-latency object detection. This paper presents F-UAV-D (Fast Unmanned Aerial Vehicle Detector), an embedded system that enables fast-moving drone detection. In particular, we propose a setup to exploit DVS as an alternative to RGB cameras in a real-time and low-power configuration. Our approach leverages the high-dynamic range (HDR) and background suppression of DVS and, when trained with various fast-moving drones, outperforms RGB input in suboptimal ambient conditions such as low illumination and fast-moving scenes. Our results show that F-UAV-D can (i) detect drones by using less than 15 W on average and (ii) perform real-time inference (i.e., <50 ms) by leveraging the CPU and GPU nodes of our edge computer.
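
The "timestamped pixel-level brightness changes" that a DVS reports can be illustrated with a simplified sketch. It compares two intensity frames and emits an event wherever the log-brightness change exceeds a threshold. This is an assumption-laden toy model, not the F-UAV-D pipeline: a real sensor works asynchronously per pixel and keeps its own reference level, and the threshold value and the function name `brightness_change_events` are hypothetical.

```python
import math
from typing import List, Tuple

Event = Tuple[float, int, int, int]  # (timestamp, x, y, polarity)


def brightness_change_events(previous: List[List[float]],
                             current: List[List[float]],
                             timestamp: float,
                             threshold: float = 0.2,
                             eps: float = 1e-6) -> List[Event]:
    """Emit one event per pixel whose log intensity changed by >= threshold.

    Polarity is +1 for a brightness increase and -1 for a decrease.
    Pixels with a smaller change produce no output at all, which is what
    suppresses a static background.
    """
    events: List[Event] = []
    for y, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for x, (prev_value, cur_value) in enumerate(zip(prev_row, cur_row)):
            delta = math.log(cur_value + eps) - math.log(prev_value + eps)
            if abs(delta) >= threshold:
                events.append((timestamp, x, y, 1 if delta > 0 else -1))
    return events
```

Because unchanged pixels emit nothing, the output size scales with scene motion rather than with image resolution.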

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# 自己監督型高分解能オフロードマッピングのためのディープベイズフュージョン

Deep Bayesian Future Fusion for Self-Supervised, High-Resolution, Off-Road Mapping ( http://arxiv.org/abs/2403.11876v1 )

ライセンス: Link先を確認

Shubhra Aich, Wenshan Wang, Parv Maheshwari, Matthew Sivaprakasam, Samuel Triest, Cherie Ho, Jason M. Gregory, John G. Rogers III, Sebastian Scherer,

(参考訳) 資源が制限されたオフロード車両の感度の制限は、信頼性の高いオフロード自律性に重大な課題をもたらす。この制限を克服するため、我々は将来的な情報(すなわち、将来の融合)を自己監督のために融合する一般的な枠組みを提案する。近年のアプローチでは、この未来の情報を手作りのヒューリスティックスと共に活用して、ターゲットとする下流タスクを直接監督している(例えば、トラバーサビリティ推定)。しかし,本稿では,高分解能(画素あたり2cm)のBEVマップを将来の融合を通じて自己監督的に作成し,より長い範囲の予測のために下流のタスクに使用できる,より一般的な開発ラインを選択する。この目的のために、まず、RGB/高さの生のスパースとノイズの多い入力とマップベースの高密度ラベルのペアを含む高解像度のフューチャーフュージョンデータセットを作成する。次に,特に遠位領域における知覚情報のノイズや空間性に対応するため,バニラ畳み込みネットワークへのベイズフィルタの効率よく実現する機構を設計する。我々のベイズ構造は、SOTA生成モデルからアイデアを取り入れ、遠位領域における高品質なBEVマップを効果的に予測する。将来の融合データセットにおける完了の質と下流タスクの双方に対する広範囲な評価は、我々のアプローチの可能性を示している。

The limited sensing resolution of resource-constrained off-road vehicles poses significant challenges towards reliable off-road autonomy. To overcome this limitation, we propose a general framework based on fusing the future information (i.e. future fusion) for self-supervision. Recent approaches exploit this future information alongside the hand-crafted heuristics to directly supervise the targeted downstream tasks (e.g. traversability estimation). However, in this paper, we opt for a more general line of development - time-efficient completion of the highest resolution (i.e. 2cm per pixel) BEV map in a self-supervised manner via future fusion, which can be used for any downstream tasks for better longer range prediction. To this end, first, we create a high-resolution future-fusion dataset containing pairs of (RGB / height) raw sparse and noisy inputs and map-based dense labels. Next, to accommodate the noise and sparsity of the sensory information, especially in the distal regions, we design an efficient realization of the Bayes filter onto the vanilla convolutional network via the recurrent mechanism. Equipped with the ideas from SOTA generative models, our Bayesian structure effectively predicts high-quality BEV maps in the distal regions. Extensive evaluation on both the quality of completion and downstream task on our future-fusion dataset demonstrates the potential of our approach.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# 第4世代地域熱供給グリッドにおける学習ベース熱潮流計算の効率的な訓練

Efficient Training of Learning-Based Thermal Power Flow for 4th Generation District Heating Grids ( http://arxiv.org/abs/2403.11877v1 )

ライセンス: Link先を確認

Andreas Bott, Mario Beykirch, Florian Steinke,

(参考訳) 熱潮流(Thermal Power Flow, TPF)計算は、複数の分散型熱源とメッシュ状のグリッド構造を持つ第4世代地域熱供給グリッドにおいて、様々な制御目的のための重要なタスクである。TPFの計算、すなわち所与の供給値と需要値に対して温度、圧力、質量流量からなるグリッド状態を求めることは、古典的には非線形の熱グリッド方程式を解くことで行われるが、ニューラルネットワークのような学習モデルを用いることで桁違いに高速化できる。本稿では、関連する供給値と需要値を網羅する十分に大規模なトレーニングデータセットを生成するための、新しい効率的な手法を提案する。提案手法は、供給値と需要値をサンプリングする代わりに、熱源と需要家の質量流量上のプロキシ分布からトレーニング例を生成し、熱グリッド方程式の求解に必要な反復計算を省略する。こうして得られる、厳密ではあるがわずかに異なるトレーニング例は、重み付けによって元のトレーニング分布を表すようにできる。典型的なグリッド構造に対するシミュレーションにより、提案手法は、供給値と需要値を直接サンプリングする場合と比べて、トレーニングサンプルの妥当性を損なうことなく、トレーニングセットの生成時間を2桁短縮できることを示す。さらに、トレーニングデータセットを用いたTPFの学習は、サンプルを用いない物理考慮型の学習手法を大きく上回ることを示す。

Thermal power flow (TPF) is an important task for various control purposes in 4th generation district heating grids with multiple decentral heat sources and meshed grid structures. Computing the TPF, i.e., determining the grid state consisting of temperatures, pressures, and mass flows for given supply and demand values, is classically done by solving the nonlinear heat grid equations, but can be sped up by orders of magnitude using learned models such as neural networks. We propose a novel, efficient scheme to generate a sufficiently large training data set covering relevant supply and demand values. Instead of sampling supply and demand values, our approach generates training examples from a proxy distribution over generator and consumer mass flows, omitting the iterations needed for solving the heat grid equations. The exact, but slightly different, training examples can be weighted to represent the original training distribution. We show with simulations for typical grid structures that the new approach can reduce training set generation times by two orders of magnitude compared to sampling supply and demand values directly, without loss of relevance for the training samples. Moreover, learning TPF with a training data set is shown to outperform sample-free, physics-aware training approaches significantly.
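
The idea that examples drawn from a proxy distribution "can be weighted to represent the original training distribution" resembles standard importance weighting. The abstract does not specify the authors' weighting scheme, so the sketch below uses generic self-normalized importance weights as an assumption; `importance_weights`, `target_density`, and `proxy_density` are illustrative names, not the paper's.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def importance_weights(samples: Sequence[T],
                       target_density: Callable[[T], float],
                       proxy_density: Callable[[T], float]) -> List[float]:
    """Return self-normalized weights w_i proportional to p(x_i) / q(x_i).

    p is the distribution the training set should represent and q is the
    proxy distribution the samples were actually drawn from. q must be
    positive wherever a sample was drawn.
    """
    raw = [target_density(s) / proxy_density(s) for s in samples]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical discrete example with two states.
target = {0: 0.25, 1: 0.75}
proxy = {0: 0.5, 1: 0.5}
weights = importance_weights([0, 1], target.__getitem__, proxy.__getitem__)
```

Such weights can then multiply the per-example loss during training so that the weighted proxy samples mimic samples from the target distribution.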

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# InTeX:Unified Depth-Aware Inpaintingによるインタラクティブテキスト・テクスチャ合成

InTeX: Interactive Text-to-texture Synthesis via Unified Depth-aware Inpainting ( http://arxiv.org/abs/2403.11878v1 )

ライセンス: Link先を確認

Jiaxiang Tang, Ruijie Lu, Xiaokang Chen, Xiang Wen, Gang Zeng, Ziwei Liu,

(参考訳) テキスト・ツー・テクスチャ合成は、最近のテキスト・ツー・イメージ・モデルの発展により、3Dコンテンツ作成の新たなフロンティアとなった。既存の手法は主に、事前訓練された深度認識拡散と塗装モデルの組み合わせを採用するが、それらは3Dの不整合や制限された制御可能性などの欠点を示す。これらの課題に対処するために,インタラクティブなテキスト・テクスチャ合成のための新しいフレームワークであるInteXを紹介する。 1) InteXはユーザフレンドリーなインタフェースを備えており、合成プロセス全体を通して対話や制御が容易であり、地域固有の塗り替えや正確なテクスチャ編集を可能にしている。 2) 深度情報と塗布手がかりを統合し, 3D の不整合を効果的に軽減し, 生成速度を向上する統合深度認識塗装モデルを構築した。大規模な実験を通じて,本フレームワークはテキストからテクスチャへの合成に実用的かつ効果的であることが証明され,高品質な3Dコンテンツ作成の道を開いた。

Text-to-texture synthesis has become a new frontier in 3D content creation thanks to the recent advances in text-to-image models. Existing methods primarily adopt a combination of pretrained depth-aware diffusion and inpainting models, yet they exhibit shortcomings such as 3D inconsistency and limited controllability. To address these challenges, we introduce InteX, a novel framework for interactive text-to-texture synthesis. 1) InteX includes a user-friendly interface that facilitates interaction and control throughout the synthesis process, enabling region-specific repainting and precise texture editing. 2) Additionally, we develop a unified depth-aware inpainting model that integrates depth information with inpainting cues, effectively mitigating 3D inconsistencies and improving generation speed. Through extensive experiments, our framework has proven to be both practical and effective in text-to-texture synthesis, paving the way for high-quality 3D content creation.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# 単一モダリティ・マルチタスク融合による感情模倣予測

Unimodal Multi-Task Fusion for Emotional Mimicry Prediction ( http://arxiv.org/abs/2403.11879v1 )

ライセンス: Link先を確認

Tobias Hallmen, Fabian Deuser, Norbert Oswald, Elisabeth André,

(参考訳) 本研究では,第6回ワークショップおよび感情行動分析コンペティションにおける情緒的不安度(EMI)推定の方法論を提案する。提案手法では,包括的ポッドキャストデータセットで事前学習したWav2Vec 2.0フレームワークを利用して,言語的およびパラ言語的要素を含む幅広い音声特徴を抽出する。我々は,グローバルな平均ベクトルと個々の特徴を統合する融合手法により特徴表現を強化し,分析にグローバルな文脈的洞察を導入する。さらに,Wav2Vec 2.0モデルから,事前学習したValence-arousal-dominance (VAD)モジュールを組み込んだ。我々の融合では、音声データの時間的効率的な分析にLong Short-Term Memory (LSTM) アーキテクチャを採用している。提案手法は,提供された音声データのみを利用することで,確立されたベースラインよりも大幅に改善されたことを示す。

In this study, we propose a methodology for the Emotional Mimicry Intensity (EMI) Estimation task within the context of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our approach leverages the Wav2Vec 2.0 framework, pre-trained on a comprehensive podcast dataset, to extract a broad range of audio features encompassing both linguistic and paralinguistic elements. We enhance feature representation through a fusion technique that integrates individual features with a global mean vector, introducing global contextual insights into our analysis. Additionally, we incorporate a pre-trained valence-arousal-dominance (VAD) module from the Wav2Vec 2.0 model. Our fusion employs a Long Short-Term Memory (LSTM) architecture for efficient temporal analysis of audio data. Utilizing only the provided audio data, our approach demonstrates significant improvements over the established baseline.
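
The fusion of per-frame features with a global mean vector can be illustrated with a small sketch. The abstract does not say how the two are combined; concatenation is assumed here, and `fuse_with_global_mean` is a hypothetical helper rather than the authors' code.

```python
def fuse_with_global_mean(frames):
    """Append the mean over all frames to every per-frame feature vector.

    `frames` is a list of equally sized feature vectors (one per time step).
    Concatenation is an assumption; the paper may fuse features differently.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [list(frame) + mean for frame in frames]


# Two time steps with 2-dimensional features (toy values).
frames = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse_with_global_mean(frames)
# Each fused vector now holds the local features followed by the global mean [2.0, 3.0].
```

The fused sequence could then be passed to a temporal model such as an LSTM, so every time step sees both its own features and a summary of the whole clip.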

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# ReGenNet:人間の行動反応合成を目指して

ReGenNet: Towards Human Action-Reaction Synthesis ( http://arxiv.org/abs/2403.11882v1 )

ライセンス: Link先を確認

Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng,

(参考訳) 人間は周囲の環境と常に対話する。現在のヒト中心生成モデルは、静的なシーンやオブジェクトとプラスティックに相互作用する人間の合成に重点を置いている一方、ユビキタスな因果的人間と人間の相互作用のための動的ヒトの行動-反応合成は、あまり研究されていない。人間と人間の相互作用は、原子間相互作用期のアクターや反応器と非対称であると見なすことができる。本稿では,人間と人間の相互作用の非対称性,動的性,同期性,詳細性を包括的に分析し,人間の行動に条件付けされた人間の反応を生成するための,最初のマルチセットヒト行動-反応合成ベンチマークを提案する。まず,NTU120,InterHuman,Chi3Dデータセットに対して,対話シーケンスのアクター・リアクター順序をアノテートすることを提案する。それらに基づいて,ReGenNetと呼ばれるトランスフォーマーデコーダアーキテクチャを用いた拡散型生成モデルを提案する。定量的および定性的な結果から,本手法はベースラインと比較して即時かつ妥当な人間の反応を生成でき,アクターの動きや視点の変化を一般化できることが示された。

Humans constantly interact with their surrounding environments. Current human-centric generative models mainly focus on synthesizing humans plausibly interacting with static scenes and objects, while the dynamic human action-reaction synthesis for ubiquitous causal human-human interactions is less explored. Human-human interactions can be regarded as asymmetric with actors and reactors in atomic interaction periods. In this paper, we comprehensively analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions and propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions. To begin with, we propose to annotate the actor-reactor order of the interaction sequences for the NTU120, InterHuman, and Chi3D datasets. Based on them, a diffusion-based generative model with a Transformer decoder architecture called ReGenNet together with an explicit distance-based interaction loss is proposed to predict human reactions in an online manner, where the future states of actors are unavailable to reactors. Quantitative and qualitative results show that our method can generate instant and plausible human reactions compared to the baselines, and can generalize to unseen actor motions and viewpoint changes.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# QueryAgent: 環境フィードバックに基づく自己補正による信頼性と効率的な推論フレームワーク

QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction ( http://arxiv.org/abs/2403.11886v1 )

ライセンス: Link先を確認

Xiang Huang, Sitao Cheng, Shanshan Huang, Jiayu Shen, Yong Xu, Chaoyun Zhang, Yuzhong Qu,

(参考訳) 意味解析にLarge Language Models(LLM)を使うことは、大きな成功を収めた。しかし,幻覚に遭遇した場合,既存の手法は信頼性や効率性に乏しいことが判明した。本稿では,質問を段階的に解決し,段階的に自己補正を行うQueryAgentというフレームワークを用いて,これらの課題に対処する。環境フィードバックに基づく自己補正手法ERASERを提案する。従来のアプローチとは異なり、ERASERは中間段階の豊かな環境フィードバックを活用して、必要に応じて選択的で差別化された自己補正を行う。実験の結果、QueryAgentはGrailQAとGraphQのサンプルを7.0と15.0のF1で1つだけ使って、以前のいくつかのショットメソッドを特に上回っている。さらに,ランタイムやクエリオーバヘッド,API呼び出しコストなど,効率性の面で優れています。 ERASERを活用することで、AgentBenchという別のベースラインを約10ポイント改善し、我々のアプローチの強い転送可能性を明らかにする。

Employing Large Language Models (LLMs) for semantic parsing has achieved remarkable success. However, we find existing methods fall short in terms of reliability and efficiency when hallucinations are encountered. In this paper, we address these challenges with a framework called QueryAgent, which solves a question step-by-step and performs step-wise self-correction. We introduce an environmental feedback-based self-correction method called ERASER. Unlike traditional approaches, ERASER leverages rich environmental feedback in the intermediate steps to perform selective and differentiated self-correction only when necessary. Experimental results demonstrate that QueryAgent notably outperforms all previous few-shot methods using only one example on GrailQA and GraphQ by 7.0 and 15.0 F1. Moreover, our approach exhibits superiority in terms of efficiency, including runtime, query overhead, and API invocation costs. By leveraging ERASER, we further improve another baseline (i.e., AgentBench) by approximately 10 points, revealing the strong transferability of our approach.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# SuperLoRA:多層アテンションモジュールのパラメータ効率の良い統一適応

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules ( http://arxiv.org/abs/2403.11887v1 )

ライセンス: Link先を確認

Xiangyu Chen, Jing Liu, Ye Wang, Pu Perry Wang, Matthew Brand, Guanghui Wang, Toshiaki Koike-Akino,

(参考訳) 低ランク適応(LoRA)とその変種は、自然言語処理のための大規模言語モデルやコンピュータビジョンのための拡散モデルなど、微調整された大型モデルに広く用いられている。本稿では、異なるパラメータ設定で実現可能な、異なるLoRA変異を統一および拡張する、SuperLoRAと呼ばれる一般化されたフレームワークを提案する。 SuperLoRAは、グループ化、折り畳み、シャッフル、プロジェクション、テンソルファクタリングを導入し、他のLoRAの亜種と比較して高い柔軟性を提供し、特に極小パラメータの状況において、トランスファーラーニングタスクの優れたパフォーマンスを示す。

Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.
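
For orientation, plain LoRA replaces a full weight update with a low-rank product. The sketch below shows only that baseline idea and its parameter count; it does not implement SuperLoRA's grouping, folding, shuffling, projection, or tensor factoring, and all function names are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    inner = len(y)
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_update(w, b, a, scale=1.0):
    """Return W + scale * (B @ A), with B of shape d_out x r and A of shape r x d_in."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters of a rank-r adapter for a d_out x d_in weight."""
    return r * (d_out + d_in)


w = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2 x 2 weight
b = [[1.0], [2.0]]             # d_out x r with r = 1
a = [[3.0, 4.0]]               # r x d_in
updated = lora_update(w, b, a)
```

For a 768 x 768 layer with rank 8, `lora_param_count(768, 768, 8)` gives 12,288 trainable parameters, compared with 589,824 for the full matrix.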

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# KnFu: 効果的な知識融合

KnFu: Effective Knowledge Fusion ( http://arxiv.org/abs/2403.11892v1 )

ライセンス: Link先を確認

S. Jamal Seyedmohammadi, S. Kawa Atapour, Jamshid Abouei, Arash Mohammadi,

(参考訳) フェデレートラーニング(FL)は、従来の集中型ラーニングのアプローチに代わる顕著な代替手段として登場した。一般的にFLは、複数のローカルノードにわたる機械学習(ML)モデルの協調トレーニングを可能にし、さまざまなデータセットを活用しながら、データのプライバシとセキュリティを確保する、分散化されたアプローチである。しかし、従来のFLは勾配反転攻撃の影響を受けやすく、局所モデルに一様アーキテクチャを限定的に適用しており、非IID局所データセットによるモデルの不均一性(モデルドリフト)に悩まされている。これらの課題を緩和するために、新しいFKD(Federated Knowledge Distillation)パラダイムが登場した。 FKDはKD(Knowledge Distillation)の概念に基づいて開発され、大きく訓練された教師モデルの知識を軽量の学生モデルに抽出し、伝達する。しかし、FKDはモデルドリフトの問題に直面している。直感的には、すべての知識が局所ノード間のデータ固有の多様性のために普遍的に有用であるとは限らない。これにより、各クライアントの知識の他人に対する妥当性と有効性を評価し、有害な知識の伝播を防止する革新的なメカニズムが求められます。そこで,本研究では,各クライアントに対するセマンティックな隣人の効果的な知識のみを融合させるため,局所モデルの知識を評価するための実効的知識融合(KnFu)アルゴリズムを提案する。 KnFuは、各クライアントに対してパーソナライズされた効果的な知識融合スキームであり、集約フェーズの前に異なるローカルモデルの知識の有効性を分析する。提案したKnFuの有効性を示すMNISTとCIFAR10データセットの総合的な実験を行った。この研究の重要な結論は、大規模でヘテロジニアスなローカルデータセットを持つシナリオでは、知識融合ベースのソリューションよりも局所的なトレーニングが望ましい、ということである。

Federated Learning (FL) has emerged as a prominent alternative to the traditional centralized learning approach. Generally speaking, FL is a decentralized approach that allows for collaborative training of Machine Learning (ML) models across multiple local nodes, ensuring data privacy and security while leveraging diverse datasets. Conventional FL, however, is susceptible to gradient inversion attacks, restrictively enforces a uniform architecture on local models, and suffers from model heterogeneity (model drift) due to non-IID local datasets. To mitigate some of these challenges, the new paradigm of Federated Knowledge Distillation (FKD) has emerged. FKD is developed based on the concept of Knowledge Distillation (KD), which involves extraction and transfer of a large and well-trained teacher model's knowledge to lightweight student models. FKD, however, still faces the model drift issue. Intuitively speaking, not all knowledge is universally beneficial due to the inherent diversity of data among local nodes. This calls for innovative mechanisms to evaluate the relevance and effectiveness of each client's knowledge for others, to prevent propagation of adverse knowledge. In this context, the paper proposes the Effective Knowledge Fusion (KnFu) algorithm, which evaluates the knowledge of local models so that only the effective knowledge of semantic neighbors is fused for each client. KnFu is a personalized effective knowledge fusion scheme for each client that analyzes the effectiveness of different local models' knowledge prior to the aggregation phase. Comprehensive experiments were performed on MNIST and CIFAR10 datasets illustrating effectiveness of the proposed KnFu in comparison to its state-of-the-art counterparts. A key conclusion of the work is that in scenarios with large and highly heterogeneous local datasets, local training could be preferable to knowledge fusion-based solutions.
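
The Knowledge Distillation idea that FKD builds on can be sketched as a divergence between temperature-softened teacher and student outputs. This is the generic KD loss only, not the KnFu algorithm; the function names and the temperature value are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student already matches the teacher and grows as their predictions diverge. A scheme like the one described above would additionally decide which clients' knowledge is worth distilling from.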

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# マルチパーティネットワークにおける量子コーディネート率

Quantum Coordination Rates in Multi-Partite Networks ( http://arxiv.org/abs/2403.11893v1 )

ライセンス: Link先を確認

Hosen Nator, Uzi Pereg,

(参考訳) 最適調整速度は、マルチパーティ量子ネットワークの3つの一次設定で決定され、複数のパーティ間の共同量子状態をシミュレートするために必要となる最小限のリソースを特徴付ける。本研究では,(1)狭い絡み合いを持つカスケードネットワーク,(2)1つの送信機と2つの受信機からなる放送ネットワーク,(3)2つの送信機と1つの受信機を備えた多重アクセスネットワークについて検討する。我々は,各設定において,漸近的に達成可能なコミュニケーションと絡み合い率について,必要かつ十分な条件を確立する。最後に、量子戦略を持つ非局所ゲームにおいて、結果が意味することを示す。

The optimal coordination rates are determined in three primary settings of multi-partite quantum networks, thus characterizing the minimal resources required in order to simulate a joint quantum state among multiple parties. We study the following models: (1) a cascade network with limited entanglement, (2) a broadcast network, which consists of a single sender and two receivers, (3) a multiple-access network with two senders and a single receiver. We establish the necessary and sufficient conditions on the asymptotically-achievable communication and entanglement rates in each setting. Finally, we show the implications of our results on nonlocal games with quantum strategies.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# 医療における自然言語処理のための説明可能なディープラーニングから解釈可能なディープラーニングへ:現実からどのくらい遠いのか?

From explainable to interpretable deep learning for natural language processing in healthcare: how far from reality? ( http://arxiv.org/abs/2403.11894v1 )

ライセンス: Link先を確認

Guangming Huang, Yunfei Long, Yingya Li, Giorgos Papanastasiou,

(参考訳) ディープラーニング(DL)は、さまざまな自然言語処理(NLP)タスクに対処することで、医療研究を大幅に強化している。しかし、DLベースのNLP手法の複雑さの増大は、信頼性の高い意思決定のために、透明性のあるモデル解釈可能性(少なくとも説明可能性)を必要とする。本研究は, 医療用NLPにおける説明可能な, 解釈可能なDLについて, 徹底的なスコーピングレビューを行った。 XAI(eXplainable and Interpretable Artificial Intelligence)という用語は、XAIとIAIを区別するために導入された。メソッドはさらに、その機能(モデル、インプット、アウトプットベース)とスコープ(ローカル、グローバル)に基づいて分類された。分析の結果,注意機構がIAIの主流であったことが判明した。また、IAI は XAI に対してますます利用されている。主要な課題は、ほとんどのXIAIが"グローバル"なモデリングプロセス、ベストプラクティスの欠如、体系的な評価とベンチマークの必要性を探求していないことである。パーソナライズ医療におけるマルチモーダルXIAIの強化や、DLと因果推論の併用など、重要な機会が得られた。我々の議論は、LLMとドメイン固有の小さなモデルへのXIAIの統合を奨励する。我々のレビューは、医療における本質的なIAIの改善と複雑なNLPの関与に向けて、さらなる研究とベンチマークを刺激することができる。

Deep learning (DL) has substantially enhanced healthcare research by addressing various natural language processing (NLP) tasks. Yet, the increasing complexity of DL-based NLP methods necessitates transparent model interpretability, or at least explainability, for reliable decision-making. This work presents a thorough scoping review on explainable and interpretable DL in healthcare NLP. The term "XIAI" (eXplainable and Interpretable Artificial Intelligence) was introduced to distinguish XAI from IAI. Methods were further categorized based on their functionality (model-, input-, output-based) and scope (local, global). Our analysis shows that attention mechanisms were the most dominant emerging IAI. Moreover, IAI is increasingly used against XAI. The major challenges identified are that most XIAI do not explore "global" modeling processes, the lack of best practices, and the unmet need for systematic evaluation and benchmarks. Important opportunities were raised such as using "attention" to enhance multi-modal XIAI for personalized medicine and combine DL with causal reasoning. Our discussion encourages the integration of XIAI in LLMs and domain-specific smaller models. Our review can stimulate further research and benchmarks toward improving inherent IAI and engaging complex NLP in healthcare.

翻訳日:2024-03-20 19:50:22 公開日:2024-03-18

# 機械翻訳におけるジェンダーバイアスのマーカーとドライバの検討

Investigating Markers and Drivers of Gender Bias in Machine Translations ( http://arxiv.org/abs/2403.11896v1 )

ライセンス: Link先を確認

Peter J Barclay, Ashkan Sami,

(参考訳) 大規模言語モデル(LLM)におけるインプシット・ジェンダーバイアスは、十分に文書化された問題であり、自動翻訳に導入されたジェンダーの影響は、現実世界のバイアスを持続させることができる。しかし、一部のLLMはヒューリスティックスやポストプロセッシングを使ってそのようなバイアスを隠蔽し、調査を困難にしている。本稿では,従来の56のソフトウェアエンジニアリングタスクを繰り返し翻訳する際に発生するバイアスをDeepL翻訳APIを用いて,逆翻訳によるLLMのバイアスについて検討する。それぞれの文は「she」から始まり、最初は「genderless」中間言語に翻訳され、次に英語に戻す。先行研究は,(1)フィンランド語,インドネシア語,エストニア語,トルコ語,ハンガリー語という5つの中間言語を対象とした結果の比較,(2)反復翻訳で示唆される性別の変動を評価するための新しい指標の提案,(2)先行研究における個々の代名詞の過度な解釈を避けること,(3)バイアスを駆動する文の特徴を調査すること,(4)3つのタイムラプスデータセットの結果を比較してアプローチの再現性を確立すること,の5つの方法によって拡張される。いくつかの言語は3つのゆるいグループに分類されるが、そのパターンはグループによって異なる。また,文中に出現する主動詞は,翻訳における意味のあるジェンダーの要因である可能性が示唆された。さらに,本研究では,DeepL翻訳APIの動作に明らかな変化があるにも関わらず,結果の再現性が良好であることが確認された。これらの結果から,バックトランスレーション法は,言語モデルにおけるバイアスに関するさらなる洞察を与えることができることがわかった。

Implicit gender bias in Large Language Models (LLMs) is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with 'she', and is translated first into a 'genderless' intermediate language then back into English; we then examine pronoun choice in the back-translated texts. We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns, apparent in earlier work; (3) by investigating sentence features that drive bias; (4) and by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in language models.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# ケーブルプラグ用ビゾタクティルプレトレーニング

Visuo-Tactile Pretraining for Cable Plugging ( http://arxiv.org/abs/2403.11898v1 )

ライセンス: Link先を確認

Abraham George, Selam Gano, Pranav Katragadda, Amir Barati Farimani,

(参考訳) 触覚情報は微粒な操作にとって重要なツールである。人間として、私たちは私たちの環境の物体を理解するために触覚情報に大きく依存しています。操作タスクの実行だけでなく、これらのタスクの実行方法の学習にもタッチを使用します。したがって、人間や超人的なパフォーマンスで操作作業の完了を学習できるロボットエージェントを作成するためには、触覚情報をスキル実行とスキル学習の両方に適切に組み込む必要がある。本稿では,複雑なタスクの性能向上のために,触覚情報を模倣学習プラットフォームに組み込む方法について検討する。そのために、細粒度ビズータクティルサーブに依存する巧妙な操作タスクであるUSBケーブルを差し込むという課題に取り組む。触覚情報を模倣学習フレームワークに組み込むことで、ロボットエージェントにUSBケーブルを接続するように訓練することが可能になります。さらに, 触覚情報を用いて非触覚エージェントの訓練を行う方法についても検討した。その結果, 触覚情報による事前学習により, 非触覚エージェントの性能が著しく向上し, ビジュオ触覚エージェントと同等のレベルに達することが示唆された。デモビデオとコードベースへのアクセスについては、プロジェクトのWebサイトを参照してください。

Tactile information is a critical tool for fine-grain manipulation. As humans, we rely heavily on tactile information to understand objects in our environments and how to interact with them. We use touch not only to perform manipulation tasks but also to learn how to perform these tasks. Therefore, to create robotic agents that can learn to complete manipulation tasks at a human or super-human level of performance, we need to properly incorporate tactile information into both skill execution and skill learning. In this paper, we investigate how we can incorporate tactile information into imitation learning platforms to improve performance on complex tasks. To do this, we tackle the challenge of plugging in a USB cable, a dexterous manipulation task that relies on fine-grain visuo-tactile servoing. By incorporating tactile information into imitation learning frameworks, we are able to train a robotic agent to plug in a USB cable - a first for imitation learning. Additionally, we explore how tactile information can be used to train non-tactile agents through a contrastive-loss pretraining process. Our results show that by pretraining with tactile information, the performance of a non-tactile agent can be significantly improved, reaching a level on par with visuo-tactile agents. For demonstration videos and access to our codebase, see the project website: https://sites.google.com/andrew.cmu.edu/visuo-tactile-cable-plugging/home

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# GNeRP:雑音偏光先行した反射物体のガウス誘導ニューラル再構成

GNeRP: Gaussian-guided Neural Reconstruction of Reflective Objects with Noisy Polarization Priors ( http://arxiv.org/abs/2403.11899v1 )

ライセンス: Link先を確認

LI Yang, WU Ruizheng, LI Jiyong, CHEN Ying-cong,

(参考訳) ニューラルレイディアンス場(NeRF)からの学習面は、MVS(Multi-View Stereo)の話題となった。近年のサインド・ディスタンス・ファンクション (SDF) を用いた手法は, ランバートのシーンの正確な3次元形状を復元する能力を示した。しかし、反射シーンにおけるそれらの結果は、特異な放射率と複雑な幾何学の絡み合いにより満足できない。そこで本研究では,SDF分野における正規表現のガウス的表現を提案する。偏光の先行によって監督されるこの表現は、鏡面反射の背後にある幾何学の学習をガイドし、既存の方法よりも多くの詳細を捉えている。さらに,偏光前処理のノイズ問題を緩和する最適化プロセスにおける重み付け戦略を提案する。設計の有効性を検証するため,様々な形状の反射シーンで偏光情報と地中真理メッシュをキャプチャする。また,PANDORAデータセット上でのフレームワークの評価を行った。提案手法は,反射シーンにおける既存の3次元再構成法よりも大きなマージンで優れていることを示す。

Learning surfaces from neural radiance fields (NeRF) has become a rising topic in Multi-View Stereo (MVS). Recent Signed Distance Function (SDF)-based methods have demonstrated their ability to reconstruct accurate 3D shapes of Lambertian scenes. However, their results on reflective scenes are unsatisfactory due to the entanglement of specular radiance and complicated geometry. To address these challenges, we propose a Gaussian-based representation of normals in SDF fields. Supervised by polarization priors, this representation guides the learning of geometry behind the specular reflection and captures more details than existing methods. Moreover, we propose a reweighting strategy in the optimization process to alleviate the noise issue of polarization priors. To validate the effectiveness of our design, we capture polarimetric information and ground truth meshes in additional reflective scenes with various geometry. We also evaluated our framework on the PANDORA dataset. Comparisons show that our method outperforms existing neural 3D reconstruction methods in reflective scenes by a large margin.
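
Polarization priors are commonly expressed as the degree and angle of linear polarization, which follow from the linear Stokes parameters. The sketch below uses the standard relations for intensities measured behind polarizers at 0, 45, 90, and 135 degrees; how the paper turns such quantities into supervision is not described in the abstract, and the function names are hypothetical.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # diagonal components
    return s0, s1, s2


def dolp_aolp(s0, s1, s2):
    """Degree and angle (radians) of linear polarization."""
    dolp = math.sqrt(s1 * s1 + s2 * s2) / s0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp


# Fully horizontally polarized light of unit intensity (toy values).
s0, s1, s2 = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
dolp, aolp = dolp_aolp(s0, s1, s2)
```

In real captures these per-pixel values are noisy, especially where the degree of polarization is low, which is the kind of issue a reweighting strategy would need to handle.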

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# Larimar: エピソードメモリ制御を備えた大規模言語モデル

Larimar: Large Language Models with Episodic Memory Control ( http://arxiv.org/abs/2403.11901v1 )

ライセンス: Link先を確認

Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen,

(参考訳) LLM(Large Language Models)に格納された知識の効率的かつ正確な更新は、今日の最も急進的な研究課題の1つである。本稿では,Larimarについて述べる。Larimarは,分散エピソードメモリを用いてLLMを拡張するための,脳にインスパイアされた新しいアーキテクチャである。 Larimarのメモリは、計算コストのかかるリトレーニングや微調整を必要とせずに、動的でワンショットの知識更新を可能にする。複数のファクト編集ベンチマークの実験結果から、Larimarは、挑戦的なシーケンシャルな編集セットアップであっても、最も競争力のあるベースラインに匹敵する精度を達成できたが、ベースLLMに依存して4～10倍のスピードアップを実現している。さらに、Larimarを用いた選択的な事実認識と入力コンテキスト長の一般化のメカニズムを提案し、その効果を示す。

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 4-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting and input context length generalization with Larimar and show their effectiveness.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# クレーム分解の概観

A Closer Look at Claim Decomposition ( http://arxiv.org/abs/2403.11903v1 )

ライセンス: Link先を確認

Miriam Wanner, Seth Ebner, Zhengping Jiang, Mark Dredze, Benjamin Van Durme,

(参考訳) 生成したテキストがより一般的になるにつれて、このようなテキストが外部の知識ソースによってどれだけ支持されているかを評価することがますます重要である。テキストサポートを評価するための多くのアプローチは、信頼された参照に対して得られる個々のサブ文にテキストを分解する方法に依存している。本稿では,最近提案されたFActScore などの評価手法が,各種のクレーム分解方法,特に LLM に基づく手法にどのような影響を及ぼすかを検討した。この感度は、エラーが計量の分解ステップからもたらされるとしても、テキストを生成するモデルに対して、そのようなメトリクスが全体的なテキストサポートであるから生じます。分解品質を測定するために,DecompScore と呼ぶ FActScore の適応を導入する。そこで我々は,Bertrand Russell の論理原子論とネオダビッドソン意味論に触発された分解を生成するための LLM ベースの手法を提案し,その分解品質を従来の方法よりも向上したことを示した。

As generated text becomes more commonplace, it is increasingly important to evaluate how well-supported such text is by external knowledge sources. Many approaches for evaluating textual support rely on some method for decomposing text into its individual subclaims which are scored against a trusted reference. We investigate how various methods of claim decomposition -- especially LLM-based methods -- affect the result of an evaluation approach such as the recently proposed FActScore, finding that it is sensitive to the decomposition method used. This sensitivity arises because such metrics attribute overall textual support to the model that generated the text even though error can also come from the metric's decomposition step. To measure decomposition quality, we introduce an adaptation of FActScore, which we call DecompScore. We then propose an LLM-based approach to generating decompositions inspired by Bertrand Russell's theory of logical atomism and neo-Davidsonian semantics and demonstrate its improved decomposition quality over previous methods.
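
The evaluation pattern described here, decomposing text into subclaims and checking each against a trusted reference, reduces to a simple ratio once the subclaims exist. The sketch below is a deliberately simplified illustration: `support_score` and the exact-match checker are hypothetical stand-ins, whereas a metric like FActScore relies on retrieval and model-based verification.

```python
def support_score(subclaims, is_supported):
    """Fraction of subclaims that the checker judges to be supported."""
    if not subclaims:
        return 0.0
    supported = sum(1 for claim in subclaims if is_supported(claim))
    return supported / len(subclaims)


# Toy reference and a toy decomposition of one generated sentence.
reference_facts = {
    "Marie Curie was born in Warsaw",
    "Marie Curie won two Nobel Prizes",
}
subclaims = [
    "Marie Curie was born in Warsaw",
    "Marie Curie won two Nobel Prizes",
    "Marie Curie was born in 1900",
]
score = support_score(subclaims, lambda claim: claim in reference_facts)
```

Because the score depends directly on which subclaims are produced, a coarser or finer decomposition of the same sentence changes the result, which is the sensitivity the abstract points to.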

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# CICLe: 大規模多型食品リスク分類のためのコンフォーマル・インコンテクスト学習

CICLe: Conformal In-Context Learning for Largescale Multi-Class Food Risk Classification ( http://arxiv.org/abs/2403.11904v1 )

ライセンス: Link先を確認

Korbinian Randl, John Pavlopoulos, Aron Henriksson, Tony Lindgren,

(参考訳) 汚染された食品や成体の食品は、人間の健康に重大なリスクをもたらす。トレーニング用のラベル付きWebテキストセットが与えられたら、機械学習と自然言語処理を適用して、そのようなリスクを自動的に検出することができる。我々は,公開食品リコール発表を記述した7,546の短いテキストのデータセットを公開している。各テキストは、2つの粒度レベル(粗さと微妙さ)で手動でラベル付けされる。データセットとベンチマークナイーブ、従来型、トランスフォーマーモデルについて説明する。分析の結果,tf-idf表現に基づくロジスティック回帰は,低サポートのクラスではRoBERTaとXLM-Rより優れていた。最後に,異なるプロンプト戦略について議論し,コンフォーマル予測に基づくLLM-in-the-loopフレームワークを提案する。

Contaminated or adulterated food poses a substantial risk to human health. Given sets of labeled web texts for training, Machine Learning and Natural Language Processing can be applied to automatically detect such risks. We publish a dataset of 7,546 short texts describing public food recall announcements. Each text is manually labeled, on two granularity levels (coarse and fine), for food products and hazards that the recall corresponds to. We describe the dataset and benchmark naive, traditional, and Transformer models. Based on our analysis, Logistic Regression based on a tf-idf representation outperforms RoBERTa and XLM-R on classes with low support. Finally, we discuss different prompting strategies and present an LLM-in-the-loop framework, based on Conformal Prediction, which boosts the performance of the base classifier while reducing energy consumption compared to normal prompting.
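
Conformal Prediction, on which the proposed framework is based, turns classifier probabilities into prediction sets with a chosen coverage level. The following is a generic split-conformal sketch, not the CICLe pipeline; the calibration scores and hazard labels are invented for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Quantile of calibration nonconformity scores for coverage 1 - alpha.

    Each score is 1 - p(true class) on a held-out calibration example.
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank in sorted order
    if rank > n:
        return float("inf")  # too few calibration points: keep every label
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, qhat):
    """All labels whose nonconformity score 1 - p does not exceed the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= qhat]


cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
qhat = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical class probabilities from a base classifier for one recall text.
probs = {"biological": 0.5, "allergens": 0.45, "foreign bodies": 0.05}
candidates = prediction_set(probs, qhat)
```

One plausible use of such a set, in the spirit of an LLM-in-the-loop design, is to consult the LLM only when the set holds more than one label and to restrict its choices to those candidates.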

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# Tur[k]ingBench: Webエージェントのチャレンジベンチマーク

Tur[k]ingBench: A Challenge Benchmark for Web Agents ( http://arxiv.org/abs/2403.11905v1 )

ライセンス: Link先を確認

Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi,

(参考訳) 最近のチャットボットは、生のテキスト形式で理解し、コミュニケーションする能力を発揮している。しかし、世界は原文以上のものが存在する。例えば、人間が長い時間をウェブページで過ごし、そこではテキストが他のモダリティと連動し、タスクは様々な複雑な相互作用の形で達成される。最先端のマルチモーダルモデルはそのような複雑な領域に一般化できるのか? この問題に対処するために、TurkingBenchという、マルチモーダルコンテキストによるテキスト命令を含むWebページとして定式化されたタスクのベンチマークを導入する。人工的に合成されたWebページを利用する既存の作業とは異なり、ここでは、さまざまなアノテーションのために、もともとクラウドソーシングワーカーのために設計された、自然なHTMLページを使用します。各タスクのHTML命令は、さまざまな値(クラウドソーシングタスクから得られる)でインスタンス化され、タスクの新しいインスタンスを形成します。このベンチマークには158タスクに分散した32.2Kインスタンスが含まれている。さらに,TurkingBenchの評価を容易にするために,チャットボットの応答をWebページの修正(テキストボックスの変更,ラジオの確認など)に結びつける評価フレームワークを開発した。本ベンチマークでは,言語のみ,視覚のみ,レイアウトのみ,およびそれらの組み合わせを含む最先端モデルの性能を評価する。以上の結果から,これらのモデルではランダムな確率よりもはるかに優れた性能が得られたが,改善の余地は十分にあることがわかった。このベンチマークによって、Webベースのエージェントの評価と開発が促進されることを願っています。

Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such complex domains? To address this question, we introduce TurkingBench, a benchmark of tasks formulated as web pages containing textual instructions with multi-modal context. Unlike existing work which employs artificially synthesized web pages, here we use natural HTML pages that were originally designed for crowdsourcing workers for various annotation purposes. The HTML instructions of each task are also instantiated with various values (obtained from the crowdsourcing tasks) to form new instances of the task. This benchmark contains 32.2K instances distributed across 158 tasks. Additionally, to facilitate the evaluation on TurkingBench, we develop an evaluation framework that connects the responses of chatbots to modifications on web pages (modifying a text box, checking a radio, etc.). We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark. Our findings reveal that these models perform significantly better than random chance, yet considerable room exists for improvement. We hope this benchmark will help facilitate the evaluation and development of web-based agents.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# Distill2Explain:エネルギー応用制御系における説明可能な強化学習のための微分可能な決定木

Distill2Explain: Differentiable decision trees for explainable reinforcement learning in energy application controllers ( http://arxiv.org/abs/2403.11907v1 )

ライセンス: Link先を確認

Gargya Gokhale, Seyed Soroush Karimi Madahi, Bert Claessens, Chris Develder,

(参考訳) エネルギー移行プロセスにおける重要な要素として、需要側の柔軟性が重要になっている。世界の最終エネルギー消費の約25%を占める住宅セクターは、エネルギーの柔軟性の重要な(潜在的)源である。しかし、この柔軟性を解き放つには、(1)異なる住宅に容易にスケールできる、(2)メンテナンスが容易、(3)エンドユーザにとって理解しやすいコントロールフレームワークを開発する必要がある。そのようなタスクの潜在的な制御フレームワークは、データ駆動型制御、特にモデルフリー強化学習(RL)である。このようなRLベースのコントローラは、環境と対話し、データに基づいて純粋に学習し、人間の介入を最小限に抑えて、優れた制御ポリシーを学習する。しかし、説明性に欠けており、ユーザーの受け入れを妨げている。さらに、住宅資産の限られたハードウェア能力はハードルとなる(例えば、ディープニューラルネットワークを使用する)。これらの課題を克服するために、微分可能な決定木を用いて説明可能なRLポリシーを得る新しい方法を提案する。政策蒸留アプローチを用いて、標準的なRLベースのコントローラを模倣するためにこれらの異なる決定木を訓練し、データ駆動型で説明しやすい決定木ベースの制御ポリシーを導出する。概念実証として,バッテリベース家庭用エネルギー管理システムにおける提案手法の性能と説明可能性について検討し,エネルギーコストの低減を図る。このユースケースでは,提案手法がベースラインルールベースのポリシーを20～25%上回り,シンプルで説明可能な制御ポリシーを提供する。さらに、これらの説明可能なポリシーを標準のRLポリシーと比較し、この説明可能性の増加に伴うパフォーマンストレードオフについて検討する。

Demand-side flexibility is gaining importance as a crucial element in the energy transition process. Accounting for about 25% of final energy consumption globally, the residential sector is an important (potential) source of energy flexibility. However, unlocking this flexibility requires developing a control framework that (1) easily scales across different houses, (2) is easy to maintain, and (3) is simple to understand for end-users. A potential control framework for such a task is data-driven control, specifically model-free reinforcement learning (RL). Such RL-based controllers learn a good control policy by interacting with their environment, learning purely based on data and with minimal human intervention. Yet, they lack explainability, which hampers user acceptance. Moreover, limited hardware capabilities of residential assets form a hurdle (e.g., using deep neural networks). To overcome both those challenges, we propose a novel method to obtain explainable RL policies by using differentiable decision trees. Using a policy distillation approach, we train these differentiable decision trees to mimic standard RL-based controllers, leading to a decision tree-based control policy that is data-driven and easy to explain. As a proof-of-concept, we examine the performance and explainability of our proposed approach in a battery-based home energy management system to reduce energy costs. For this use case, we show that our proposed approach can outperform baseline rule-based policies by about 20-25%, while providing simple, explainable control policies. We further compare these explainable policies with standard RL policies and examine the performance trade-offs associated with this increased explainability.
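
A differentiable decision tree replaces hard threshold splits with smooth gates so that it can be trained by gradient descent. The sketch below shows a single soft split with two leaves; the feature, leaf values, and function names are hypothetical, and the paper's tree structure and distillation procedure may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_output(x, weights, bias, left_value, right_value):
    """Depth-1 differentiable decision tree.

    The gate sigmoid(w . x + b) is the probability of routing to the left leaf;
    the output blends both leaf values with that probability.
    """
    p_left = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p_left * left_value + (1.0 - p_left) * right_value


# Hypothetical battery controller: x is a normalised price signal,
# leaf values are discharge (-1.0) and charge (+1.0) actions.
undecided = soft_tree_output([0.0], [1.0], 0.0, left_value=-1.0, right_value=1.0)
high_price = soft_tree_output([10.0], [1.0], 0.0, left_value=-1.0, right_value=1.0)
```

After training, a sharp gate can be read as an ordinary rule of the form "if the price is high, discharge", which is what makes such a policy easy to explain.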

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# RoGUENeRF: NeRF用ロバストな幾何型ユニバーサルエンハンサー

RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF ( http://arxiv.org/abs/2403.11909v1 )

ライセンス: Link先を確認

Sibi Catley-Chandar, Richard Shaw, Gregory Slabaugh, Eduardo Perez-Pellitero,

(参考訳) ニューラルレンダリングの最近の進歩は、高光写実性3Dシーンの再構築と新しいビュー合成を可能にしている。この進歩にもかかわらず、現在の最先端の手法は、放射界の低周波バイアスや不正確なカメラキャリブレーションなどの要因により、高周波詳細の再構築に苦慮している。この問題を緩和するための1つのアプローチは、レンダリング後のイメージを強化することである。 2Dエンハンサーは、いくつかの詳細を回復するために事前訓練することができるが、シーン幾何学には依存せず、画像劣化の新しい分布に容易に一般化することができない。逆に、既存の3Dエンハンサーは、近隣のトレーニング画像からの細部を一般化可能な方法で転送することができるが、不正確なカメラキャリブレーションに悩まされ、幾何学的誤差を描画画像に伝達することができる。両パラダイムの長所を生かしたニューラルレンダリングエンハンサーであるRoGUENeRFを提案する。本手法は,3次元アライメントと幾何認識融合により,近隣のトレーニング画像からの情報を活用するとともに,一般エンハンサーを学習するための事前訓練を行う。本手法は, 幾何整合性を維持しながら高周波テクスチャを復元すると共に, 不正確なカメラキャリブレーションにも頑健である。例えば、現実世界の360v2データセット上で、MipNeRF360のPSNRを0.63dB、Nerfactoを1.34dB改善する。

Recent advances in neural rendering have enabled highly photorealistic 3D scene reconstruction and novel view synthesis. Despite this progress, current state-of-the-art methods struggle to reconstruct high frequency detail, due to factors such as a low-frequency bias of radiance fields and inaccurate camera calibration. One approach to mitigate this issue is to enhance images post-rendering. 2D enhancers can be pre-trained to recover some detail but are agnostic to scene geometry and do not easily generalize to new distributions of image degradation. Conversely, existing 3D enhancers are able to transfer detail from nearby training images in a generalizable manner, but suffer from inaccurate camera calibration and can propagate errors from the geometry into rendered images. We propose a neural rendering enhancer, RoGUENeRF, which exploits the best of both paradigms. Our method is pre-trained to learn a general enhancer while also leveraging information from nearby training images via robust 3D alignment and geometry-aware fusion. Our approach restores high-frequency textures while maintaining geometric consistency and is also robust to inaccurate camera calibration. We show that RoGUENeRF substantially enhances the rendering quality of a wide range of neural rendering baselines, e.g. improving the PSNR of MipNeRF360 by 0.63dB and Nerfacto by 1.34dB on the real world 360v2 dataset.
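
The reported gains are in PSNR, which is derived from the mean squared error between a rendered image and its reference. A minimal sketch of the standard definition follows, assuming pixel values flattened into lists and scaled to the range 0 to 1; `psnr` is a hypothetical helper name.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.0]
rendered = [0.1, 0.1]
value = psnr(reference, rendered)  # MSE of 0.01 corresponds to about 20 dB
```

Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.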

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
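
The PSNR gains quoted in the abstract above can be put in context with the standard definition, PSNR = 10 log10(MAX^2 / MSE). The sketch below is an illustration only, not the RoGUENeRF evaluation code: the function names `psnr` and `mse_ratio_for_gain` are hypothetical, and images are assumed to be flat lists of intensities in [0, `max_value`].

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities in [0, max_value].
    """
    if len(reference) != len(rendered):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when the PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)
```

Under this definition, a gain of 0.63 dB corresponds to an MSE about 13.5% lower, and a gain of 1.34 dB to an MSE about 26.5% lower.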

# 分散型協調運転のための単一エージェントActor-Critic

Single-Agent Actor Critic for Decentralized Cooperative Driving ( http://arxiv.org/abs/2403.11914v1 )

ライセンス: Link先を確認

Shengchao Yan, Lukas König, Wolfram Burgard,

(参考訳) 自動運転車(AV)を取り入れたアクティブな交通管理は、渋滞の低減と交通の流れの強化を約束する。しかし、現実のアプリケーションのためのアルゴリズムを開発するには、連続的なトラフィックフローと部分的な可観測性によって生じる課題に対処する必要がある。このギャップを埋めて、より分散化に向けての積極的な交通管理の分野を推し進めるために、単一エージェント強化学習を用いて自律走行車のための分散型協調運転ポリシーを学習することを目的とした、新しい非対称Actor-Criticモデルを導入する。提案手法では,マスキングを用いたアテンションニューラルネットワークを用いて,現実の交通流の動的性質と部分観測可能性を扱う。各種交通シナリオのベースラインコントローラに対する広範囲な評価を通じて,道路システム内の多様なボトルネック箇所における交通流改善の可能性を示す。さらに、交通規制に厳格に従う自動運転車の保守的な運転行動に関わる課題についても検討する。実験の結果,提案する協調方策は,安全を損なうことなく,潜在的な交通の減速を緩和できることが示された。

Active traffic management incorporating autonomous vehicles (AVs) promises a future with diminished congestion and enhanced traffic flow. However, developing algorithms for real-world application requires addressing the challenges posed by continuous traffic flow and partial observability. To bridge this gap and advance the field of active traffic management towards greater decentralization, we introduce a novel asymmetric actor-critic model aimed at learning decentralized cooperative driving policies for autonomous vehicles using single-agent reinforcement learning. Our approach employs attention neural networks with masking to handle the dynamic nature of real-world traffic flow and partial observability. Through extensive evaluations against baseline controllers across various traffic scenarios, our model shows great potential for improving traffic flow at diverse bottleneck locations within the road system. Additionally, we explore the challenge associated with the conservative driving behaviors of autonomous vehicles that adhere strictly to traffic regulations. The experiment results illustrate that our proposed cooperative policy can mitigate potential traffic slowdowns without compromising safety.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
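
One common way to let an attention layer cope with a changing number of observed vehicles is to pad the input to a fixed number of slots and mask the empty ones. The sketch below shows that masking idea with plain scaled dot-product attention. It illustrates the general technique under stated assumptions, not the architecture from the paper; `masked_attention` and the slot layout are hypothetical.

```python
import math


def masked_attention(query, keys, values, mask):
    """Scaled dot-product attention over a padded set of vehicle slots.

    mask[i] is True when slot i holds an observed vehicle; masked slots
    receive zero attention weight, whatever their contents are.
    """
    if not any(mask):
        return [0.0] * len(values[0])  # nothing observed: return a neutral vector
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if visible else -math.inf
        for key, visible in zip(keys, mask)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [
        sum(w / total * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]
```

Because masked slots get a weight of exactly zero, the same fixed-size network can be fed scenes with any number of visible neighbours up to the slot count.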

# 多言語文埋め込みを用いた適応的バイリンガルアライメント

Adaptative Bilingual Aligning Using Multilingual Sentence Embedding ( http://arxiv.org/abs/2403.11921v1 )

ライセンス: Link先を確認

Olivier Kraif,

(参考訳) 本稿では,AIlignと呼ばれる適応的な対訳テキストアライメントシステムを提案する。このアライナは文埋め込みを利用して、並列性が断片的で厳密に単調ではないテキストであってもアライメントパスを導くことのできる信頼できるアンカーポイントを抽出する。いくつかのデータセットに対する実験では、AIlignが準線形の計算量で最先端技術と同等の結果を達成することを示した。さらに、AIlignは、VecalignやBertalignのような最近のシステムとは異なり、並列性と単調性の性質が局所的にのみ満足されるテキストを扱うことができる。

In this paper, we present an adaptive bitextual alignment system called AIlign. This aligner relies on sentence embeddings to extract reliable anchor points that can guide the alignment path, even for texts whose parallelism is fragmentary and not strictly monotonic. In an experiment on several datasets, we show that AIlign achieves results equivalent to the state of the art, with quasi-linear complexity. In addition, AIlign is able to handle texts whose parallelism and monotonicity properties are only satisfied locally, unlike recent systems such as Vecalign or Bertalign.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18

# マルチレベルActor-Criticによる平均報酬RLにおける混合時間オラクルを必要としない大域的最適性

Global Optimality without Mixing Time Oracles in Average-reward RL via Multi-level Actor-Critic ( http://arxiv.org/abs/2403.11925v1 )

ライセンス: Link先を確認

Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha,

(参考訳) 平均報酬強化学習の文脈では、混合時間(固定された方策の下でマルコフ連鎖が定常分布に達するまでに要する時間の尺度)に関するオラクル知識が必要であることが、方策勾配法の大域的収束にとって大きな課題となっている。この要件は、大きな状態空間を持つ環境での混合時間推定の困難さと費用が原因で特に問題となる。この制限に対処するために,マルチレベルモンテカルロ(MLMC)勾配推定器を組み込んだマルチレベルActor-Critic(MAC)フレームワークを検討する。提案手法では,混合時間の知識への依存を効果的に緩和する。さらに,本手法は先行研究と比較して,$\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$という最もタイトな依存性を示す。2次元グリッドワールドの目標到達ナビゲーション実験により,MACが従来のPGベースの手法よりも高い報酬を得られることを示す。

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to reach its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for global convergence in average-reward MDPs. Furthermore, our approach exhibits the tightest-available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$ relative to prior work. With a 2D gridworld goal-reaching navigation experiment, we demonstrate that MAC achieves higher reward than a previous PG-based method for average reward, Parameterized Policy Gradient with Advantage Estimation (PPGAE), especially in cases with a relatively small training sample budget restricting trajectory length.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
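
A widely used form of the multi-level Monte Carlo (MLMC) estimator draws a random level J with P(J = j) = 2^(-j) and corrects a one-sample estimate with a rescaled difference between averages over 2^J and 2^(J-1) samples. The sketch below applies that form to a scalar mean rather than to a policy gradient. Whether MAC uses exactly this variant is an assumption, and `mlmc_estimate`, `sample`, and `t_max` are hypothetical names.

```python
import random


def mlmc_estimate(sample, t_max, rng):
    """One MLMC estimate of a mean from a stream of scalar samples.

    sample: zero-argument callable returning the next sample of the stream.
    t_max:  largest number of samples a single estimate may consume.
    rng:    random.Random instance used to draw the level.
    """
    level = 1
    while rng.random() < 0.5:  # P(level = j) = 2 ** -j
        level += 1
    first = sample()
    estimate = first  # level-0 estimate: a single sample
    n = 2 ** level
    if n <= t_max:  # truncation: skip the correction for over-long levels
        draws = [first] + [sample() for _ in range(n - 1)]
        fine = sum(draws) / n                      # average over 2**level samples
        coarse = sum(draws[: n // 2]) / (n // 2)   # average over the first half
        estimate += n * (fine - coarse)
    return estimate
```

The expected number of samples consumed per call grows only logarithmically with `t_max`, which is one reason estimators of this kind are attractive when long trajectories are costly.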

# LayerDiff:Layer-Collaborative Diffusion Modelによるテキスト誘導多層合成画像の探索

LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model ( http://arxiv.org/abs/2403.11929v1 )

ライセンス: Link先を確認

Runhui Huang, Kaixin Cai, Jianhua Han, Xiaodan Liang, Renjing Pei, Guansong Lu, Songcen Xu, Wei Zhang, Hang Xu,

(参考訳) 拡散ベースの生成モデルによってテキストプロンプトが与えられると、高品質な画像を生成することに成功したが、以前の作業では画像全体を直接生成するが、オブジェクト指向の操作能力は提供できない。プロのグラフィックデザインやデジタルアートのようなより広範なリアルなアプリケーションをサポートするために、画像は複数の層で頻繁に作成され、操作され、柔軟性とコントロールが向上する。そこで本稿では,テキスト誘導,多層化,構成可能な画像合成のためのレイヤ協調拡散モデルであるLayerDiffを提案する。構成可能な画像は、背景層、前景層の集合、および各前景要素のための関連するマスク層からなる。これを実現するため、LayerDiffはレイヤ間のパターンをキャプチャするために複数のレイヤ協調アテンションモジュールを組み込んだレイヤベースの生成パラダイムを導入した。具体的には、層間アテンションモジュールは層間の情報交換と学習を促進するように設計され、テキスト誘導イントラアテンションモジュールは層固有のプロンプトを組み込んで各層に対して特定のコンテンツ生成を指示する。レイヤ固有のプロンプト強化モジュールは、グローバルプロンプトから詳細なテキストキューをキャプチャする。さらに、自己マスク誘導サンプリング戦略により、多層画像を生成するモデルの能力をさらに解き放つ。また、既存の知覚モデルと生成モデルを統合して、高品質でテキストプロンプされた多層画像の大規模なデータセットを生成するパイプラインを提案する。大規模な実験により,従来の全画像生成手法に匹敵する高画質の多層画像が生成可能であることが示された。さらにLayerDiffは、レイヤ固有の画像編集やスタイル転送など、幅広いコントロール可能な生成アプリケーションを可能にする。

Despite the success of generating high-quality images given any text prompts by diffusion-based generative models, prior works directly generate the entire images, but cannot provide object-wise manipulation capability. To support wider real applications like professional graphic design and digital artistry, images are frequently created and manipulated in multiple layers to offer greater flexibility and control. Therefore in this paper, we propose a layer-collaborative diffusion model, named LayerDiff, specifically designed for text-guided, multi-layered, composable image synthesis. The composable image consists of a background layer, a set of foreground layers, and associated mask layers for each foreground element. To enable this, LayerDiff introduces a layer-based generation paradigm incorporating multiple layer-collaborative attention modules to capture inter-layer patterns. Specifically, an inter-layer attention module is designed to encourage information exchange and learning between layers, while a text-guided intra-layer attention module incorporates layer-specific prompts to direct the specific-content generation for each layer. A layer-specific prompt-enhanced module better captures detailed textual cues from the global prompt. Additionally, a self-mask guidance sampling strategy further unleashes the model's ability to generate multi-layered images. We also present a pipeline that integrates existing perceptual and generative models to produce a large dataset of high-quality, text-prompted, multi-layered images. Extensive experiments demonstrate that our LayerDiff model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods. Moreover, LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
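
The layered representation described above (a background layer, foreground layers, and one mask per foreground element) can be flattened into a single image by ordinary back-to-front alpha compositing. The sketch below assumes that compositing rule, single-channel pixels stored as flat lists, and mask values in [0, 1]; the abstract does not state how LayerDiff itself flattens its layers, and `composite` is a hypothetical name.

```python
def composite(background, foregrounds, masks):
    """Flatten layers back to front using per-pixel masks in [0, 1].

    background:  flat list of pixel intensities.
    foregrounds: list of layers, each shaped like the background.
    masks:       one mask per foreground layer (1 = fully foreground).
    """
    image = list(background)
    for layer, mask in zip(foregrounds, masks):
        image = [m * f + (1.0 - m) * p for p, f, m in zip(image, layer, mask)]
    return image
```

Editing one layer (for example swapping a foreground or its mask) and recompositing leaves every other layer untouched, which is the kind of object-wise control the abstract motivates.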

# ニュートラル原子量子プロセッサを用いたグラフアルゴリズム

Graph Algorithms with Neutral Atom Quantum Processors ( http://arxiv.org/abs/2403.11931v1 )

ライセンス: Link先を確認

Constantin Dalyac, Lucas Leclerc, Louis Vignoli, Mehdi Djellabi, Wesley da Silva Coelho, Bruno Ximenez, Alexandre Dareau, Davide Dreon, Vincent E. Elfving, Adrien Signoles, Louis-Paul Henry, Loïc Henriet,

(参考訳) ニュートラル原子技術は、量子アルゴリズムを実行するための最前線のプラットフォームとして位置づけられ、理論と実験の進歩を着実に証明してきた。この技術のユニークな利点の1つは、qubitレジスタのジオメトリをショットからショットに再構成できることである。このユニークな機能は、複雑な最適化と機械学習タスクの解決に重大な結果をもたらす、ハードウェアレベルでグラフ構造化問題のネイティブな埋め込みを可能にする。量子ビットを駆動することで、グラフ複素特性を保持する処理された量子状態を生成することができる。これらの状態は、問題への直接的な解決策や、ハイブリッド量子古典的スキームのリソースとして利用することができる。本稿では、中性原子量子処理ユニット(QPU)上で動作するグラフ問題に対する量子アルゴリズムの進歩を概観し、最近導入された埋め込みと問題解決技術について議論する。さらに、中性原子QPUのスケーラビリティ、制御可能性、計算繰り返し率の向上に重点を置いて、ハードウェアの継続的な進歩を明らかにした。

Neutral atom technology has steadily demonstrated significant theoretical and experimental advancements, positioning itself as a front-runner platform for running quantum algorithms. One unique advantage of this technology lies in the ability to reconfigure the geometry of the qubit register, from shot to shot. This unique feature makes possible the native embedding of graph-structured problems at the hardware level, with profound consequences for the resolution of complex optimization and machine learning tasks. By driving qubits, one can generate processed quantum states which retain graph complex properties. These states can then be leveraged to offer direct solutions to problems or as resources in hybrid quantum-classical schemes. In this paper, we review the advancements in quantum algorithms for graph problems running on neutral atom Quantum Processing Units (QPUs), and discuss recently introduced embedding and problem-solving techniques. In addition, we clarify ongoing advancements in hardware, with an emphasis on enhancing the scalability, controllability and computation repetition rate of neutral atom QPUs.

翻訳日:2024-03-20 19:40:35 公開日:2024-03-18
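
The abstract does not spell out how a graph is embedded in the register. A commonly described picture is the unit-disk graph: atoms are nodes, and two atoms are joined by an edge when they sit closer than the blockade radius, so that both cannot be excited at once. The sketch below builds such a graph from atom positions. It is an illustration under that assumption; the function names, coordinates, and radius are hypothetical.

```python
import math
from itertools import combinations


def unit_disk_edges(positions, radius):
    """Return the edges (i, j), i < j, between atoms closer than radius."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(positions), 2)
        if math.dist(a, b) < radius
    }


def is_independent_set(nodes, edges):
    """True if no two chosen nodes share an edge (no blockade violation)."""
    chosen = set(nodes)
    return not any(i in chosen and j in chosen for i, j in edges)
```

In this picture, moving atoms between shots changes `positions` and therefore the graph, which is the reconfigurability the abstract highlights.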

# 高エネルギー物理画像分類:ジェットの応用に関する調査

High-energy physics image classification: A Survey of Jet Applications ( http://arxiv.org/abs/2403.11934v1 )

ライセンス: Link先を確認

Hamza Kheddar, Yassine Himeur, Abbes Amira, Rachik Soualah,

(参考訳) 近年、高エネルギー物理(HEP)実験や現象学研究の分野では、機械学習(ML)とその専門分野である深層学習(DL)が統合されている。この調査は、様々なDLアプローチの範囲内で、これらの応用を包括的に評価する。本論文の最初のセグメントでは,様々な粒子物理学のタイプを包含する基礎について紹介し,素粒子物理を学習モデルと組み合わせて評価するための基準を確立する。その後、HEP画像、アクセス可能なデータセット、事前処理技術の詳細な詳細、特徴抽出と選択の方法などを表現するための包括的な分類法が提示される。その後、HEP画像に合わせて利用可能な人工知能(AI)モデルを探索し、ジェット粒子に関するHEP画像分類を精査する。本総説では, ML と DL が提案する最先端技術 (SOTA) について深く検討し, HEP 調査の意義について考察した。この議論は、ジェットタグ、ジェットトラッキング、粒子分類など、特定の応用をかなり詳細に掘り下げている。本調査は, DL方法論に基づくHEPの現状に関する分析から, 今後の研究課題と今後の課題を包括する。

In recent times, the fields of high-energy physics (HEP) experimentation and phenomenological studies have seen the integration of machine learning (ML) and its specialized branch, deep learning (DL). This survey offers a comprehensive assessment of these applications within the realm of various DL approaches. The initial segment of the paper introduces the fundamentals encompassing diverse particle physics types and establishes criteria for evaluating particle physics in tandem with learning models. Following this, a comprehensive taxonomy is presented for representing HEP images, encompassing accessible datasets, intricate details of preprocessing techniques, and methods of feature extraction and selection. Subsequently, the focus shifts to an exploration of available artificial intelligence (AI) models tailored to HEP images, along with a concentrated examination of HEP image classification pertaining to Jet particles. Within this review, a profound investigation is undertaken into distinct state-of-the-art (SOTA) ML and DL techniques, underscoring their implications for HEP inquiries. The discussion delves into specific applications in substantial detail, including Jet tagging, Jet tracking, particle classification, and more. The survey culminates with an analysis concerning the present status of HEP grounded in DL methodologies, encompassing inherent challenges and prospective avenues for future research endeavors.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# ハイパーカラー化:ハイパースペクトル画像再構成のための空間的疎雑音スペクトル手がかりの伝搬

HyperColorization: Propagating spatially sparse noisy spectral clues for reconstructing hyperspectral images ( http://arxiv.org/abs/2403.11935v1 )

ライセンス: Link先を確認

M. Kerem Aydin, Qi Guo, Emma Alexander,

(参考訳) ハイパースペクトルカメラは、空間分解能のトレードオフに挑戦しており、同じ露光時間で撮影されたRGB写真よりもショットノイズの影響を受けやすい。ここでは、グレースケールのガイド画像と空間的にスパースなスペクトル手がかりから、ハイパースペクトル画像を再構成するカラー化アルゴリズムを提案する。提案アルゴリズムは,ハイパースペクトル画像の様々なスペクトル次元に一般化し,低ランク空間におけるカラー化が計算時間とショットノイズの影響を減少させることを示す。頑健性を高めるため,ガイド付きサンプリング,エッジ認識フィルタリング,次元推定手法を取り入れた。提案手法は,SSIM,PSNR,GFC,EMDなどの様々な性能指標において過去のアルゴリズムを上回り,ハイパースペクトル画像品質を特徴付ける指標として分析する。これらの知見は、ウィスキーやプッシュブルームスキャナーで得られた試料から高スペクトル像を再構成することにより、時空間分解能トレードオフを克服するための有望な手段を提供するとともに、ハイブリッド空間分光イメージングシステムを提供する。

Hyperspectral cameras face challenging spatial-spectral resolution trade-offs and are more affected by shot noise than RGB photos taken over the same total exposure time. Here, we present a colorization algorithm to reconstruct hyperspectral images from a grayscale guide image and spatially sparse spectral clues. We demonstrate that our algorithm generalizes to varying spectral dimensions for hyperspectral images, and show that colorizing in a low-rank space reduces compute time and the impact of shot noise. To enhance robustness, we incorporate guided sampling, edge-aware filtering, and dimensionality estimation techniques. Our method surpasses previous algorithms in various performance metrics, including SSIM, PSNR, GFC, and EMD, which we analyze as metrics for characterizing hyperspectral image quality. Collectively, these findings provide a promising avenue for overcoming the time-space-wavelength resolution trade-off by reconstructing a dense hyperspectral image from samples obtained by whisk or push broom scanners, as well as hybrid spatial-spectral computational imaging systems.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# AI支援による子宮頸がん検診

AI-Assisted Cervical Cancer Screening ( http://arxiv.org/abs/2403.11936v1 )

ライセンス: Link先を確認

Kanchan Poudel, Lisasha Poudel, Prabin Raj Shakya, Atit Poudel, Archana Shrestha, Bishesh Khanal,

(参考訳) 低所得国や中所得国(LMIC)では、好まれるが利用できない専門家である婦人科医の代わりに、看護師によるスクリーニングキャンプや一次・地域医療センターがしばしば実施されている。テストの主観的な性質に対処するため、カメラやスマートフォンを統合した様々なハンドヘルドデバイスが、最近、VIA中の頚部画像をキャプチャし、遠隔医療やAIモデルによる意思決定を支援するために研究されている。 AIモデルを提案するほとんどの研究は、特定のデバイス、デジタルカメラ、スマートフォンから収集された画像の比較的少数を振り返りに使用している。資源制約されたキャンプ設定におけるVIA中の品質画像取得の課題とプロトコルは、しばしば見過ごされがちである。本稿では,異なる統合デバイスを購入する必要のない,堅牢なスマートフォンベースのAI支援システムを構築するための,エンド・ツー・エンドの設計プロセスについて述べる。資源制約のある環境での高品質な画像取得のためのプロトコル,キャンプ,前処理パイプライン,深層学習に基づく分類モデルのトレーニングと評価において,看護師が実施するVIA中の1,430人の女性から収集したデータセット。我々の研究は、容易に利用可能なスマートフォンと適切なプロトコルが、VIAテストに必要な詳細でcervixイメージをキャプチャできることを示し、深層学習に基づく分類モデルは、VIAスクリーニングにおける看護師を支援するための有望な結果を提供し、リソース制約された設定における大規模データ収集と検証の方向性を提供する。

Visual Inspection with Acetic Acid (VIA) remains the most feasible cervical cancer screening test in resource-constrained settings of low- and middle-income countries (LMICs), where it is often performed in screening camps or primary/community health centers by nurses instead of the preferred but unavailable expert gynecologist. To address the highly subjective nature of the test, various handheld devices integrating cameras or smartphones have been recently explored to capture cervical images during VIA and aid decision-making via telemedicine or AI models. Most studies proposing AI models retrospectively use a relatively small number of already collected images from specific devices, digital cameras, or smartphones; the challenges and protocol for quality image acquisition during VIA in resource-constrained camp settings, challenges in obtaining a gold standard, data imbalance, etc. are often overlooked. We present a novel approach and describe the end-to-end design process to build a robust smartphone-based AI-assisted system that does not require buying a separate integrated device: the proposed protocol for quality image acquisition in resource-constrained settings, a dataset collected from 1,430 women during VIA performed by nurses in screening camps, a preprocessing pipeline, and the training and evaluation of a deep-learning-based classification model aimed at identifying (pre)cancerous lesions. Our work shows that readily available smartphones and a suitable protocol can capture cervix images with the detail required for the VIA test, and that the deep-learning-based classification model provides promising results to assist nurses in VIA screening; it also provides a direction for large-scale data collection and validation in resource-constrained settings.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# 畳み込み層に対する Roesser 型の状態空間表現

State space representations of the Roesser type for convolutional layers ( http://arxiv.org/abs/2403.11938v1 )

ライセンス: Link先を確認

Patricia Pauli, Dennis Gramlich, Frank Allgöwer,

(参考訳) 制御理論の観点からは、(ニューラルネットワークの)畳み込み層は2-D(またはN-D)線形時間不変力学系である。畳み込みカーネルによる畳み込み層の通常の表現は、そのインパルス応答による力学系の表現に対応する。しかし、制御理論からの多くの解析ツール、例えば線形行列不等式を用いるものは状態空間表現を必要とする。このため、我々は2次元畳み込み層に対して、$c_\mathrm{in}r_1 + c_\mathrm{out}r_2$個の状態を持つRoesser型の状態空間表現を明示的に与える。ここで、$c_\mathrm{in}$/$c_\mathrm{out}$は層の入力/出力チャネルの数であり、$r_1$/$r_2$は畳み込みカーネルの幅/長さを特徴づける。この表現は$c_\mathrm{in} = c_\mathrm{out}$に対して最小であることが示されている。さらに、拡張(dilated)畳み込み、ストライド付き畳み込み、N-D畳み込みのための状態空間表現を構築する。

From the perspective of control theory, convolutional layers (of neural networks) are 2-D (or N-D) linear time-invariant dynamical systems. The usual representation of convolutional layers by the convolution kernel corresponds to the representation of a dynamical system by its impulse response. However, many analysis tools from control theory, e.g., involving linear matrix inequalities, require a state space representation. For this reason, we explicitly provide a state space representation of the Roesser type for 2-D convolutional layers with $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states, where $c_\mathrm{in}$/$c_\mathrm{out}$ is the number of input/output channels of the layer and $r_1$/$r_2$ characterizes the width/length of the convolution kernel. This representation is shown to be minimal for $c_\mathrm{in} = c_\mathrm{out}$. We further construct state space representations for dilated, strided, and N-D convolutions.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
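
For orientation, the general Roesser model of a 2-D system splits the state into a horizontal part $x^h$ and a vertical part $x^v$, each propagated along one spatial index. The textbook form is shown below. The symbols $A_{kl}$, $B_k$, $C_k$, and $D$ are generic placeholders rather than the specific matrices constructed in the paper, and how the $c_\mathrm{in}r_1 + c_\mathrm{out}r_2$ states are divided between $x^h$ and $x^v$ is left open here.

$$
\begin{bmatrix} x^h(i+1, j) \\ x^v(i, j+1) \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} x^h(i, j) \\ x^v(i, j) \end{bmatrix}
+
\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} u(i, j),
\qquad
y(i, j)
=
\begin{bmatrix} C_1 & C_2 \end{bmatrix}
\begin{bmatrix} x^h(i, j) \\ x^v(i, j) \end{bmatrix}
+ D\, u(i, j)
$$

Here $u(i, j)$ is the layer input at position $(i, j)$ with $c_\mathrm{in}$ channels, and $y(i, j)$ is the output with $c_\mathrm{out}$ channels.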

# 多段階逆モデルだけでは十分ではない

Multistep Inverse Is Not All You Need ( http://arxiv.org/abs/2403.11940v1 )

ライセンス: Link先を確認

Alexander Levine, Peter Stone, Amy Zhang,

(参考訳) 実世界の制御環境では、観測空間は不要に高次元であり、時間相関ノイズにさらされることが多い。しかし、制御可能なシステムの力学は、しばしば生の観測の力学よりもはるかに単純である。したがって、観測空間を制御関連変数のより単純な空間にマッピングするエンコーダを学ぶことが望ましい。本研究では,Efroni et al. (2022) が最初に提案したEx-BMDPモデルについて考察する。これは、観測が、決定論的に発展する行動依存の潜在状態と、行動に依存しない時間相関ノイズとに分解できる制御問題を定式化したものである。Lamb et al. (2022) は、エンコーダを学習し、そのような問題の観測から完全な行動依存潜在状態表現を抽出する「AC-State」法を提案する。AC-Stateは、パス内の最初のアクションを予測するために、パス内の最初の状態と最後の状態のエンコーディングを使用する、多段階逆法である。しかし、AC-Stateがエージェント制御可能因子の正しい潜在表現を学習できないケースを特定する。そこで我々は,多段階逆予測と潜在前方モデルを組み合わせた新しいアルゴリズムACDFを提案する。ACDFは、多数のEx-BMDPモデルに対して、アクション依存の潜在状態エンコーダを正しく推論することが保証されている。数値シミュレーションによる表形式のEx-BMDPだけでなく,ニューラルネットワークベースのエンコーダを用いた高次元環境においても,ACDFの有効性を実証する。コードは https://github.com/midi-lab/acdf で入手できる。

In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# 微分決定木を用いた説明可能な強化学習に基づく家庭エネルギー管理システム

Explainable Reinforcement Learning-based Home Energy Management Systems using Differentiable Decision Trees ( http://arxiv.org/abs/2403.11947v1 )

ライセンス: Link先を確認

Gargya Gokhale, Bert Claessens, Chris Develder,

(参考訳) エネルギーの継続的な移行により、需要側の柔軟性は、グリッドのサポートと持続可能なエネルギー源のさらなる統合を可能にするため、現代の電力網の重要な側面となっている。従来の供給源の他に、住宅セクターは太陽光発電、家庭用バッテリー、EVの採用の増加によって、柔軟性を損なう主要な供給源となっている。しかし、家庭のエネルギー消費を効果的に管理し、多様な住宅で容易にスケーラブルにしながら、利用者の快適性を維持するためのコントロールフレームワークが必要であるため、この住宅の柔軟性の解放は困難である。我々は、この課題に対処し、微分可能な決定木を用いた強化学習に基づくアプローチを導入することを目指している。このアプローチは、データ駆動強化学習のスケーラビリティと(微分可能)決定木の説明可能性を統合する。これにより、さまざまな家庭に容易に適応できるコントローラが実現され、エンドユーザに説明可能なシンプルなコントロールポリシが提供され、ユーザの受け入れがさらに向上します。概念実証として,家庭内エネルギー管理問題を用いて提案手法を解析し,その性能を市販のルールベースベースラインと標準ニューラルネットワークベースRLコントローラと比較した。本研究により,提案手法の性能は標準のRLコントローラに匹敵するものであり,日常的なコスト削減の点において,ベースラインコントローラを約20%上回り,説明が容易であることを示す。

With the ongoing energy transition, demand-side flexibility has become an important aspect of the modern power grid for providing grid support and allowing further integration of sustainable energy sources. Besides traditional sources, the residential sector is another major and largely untapped source of flexibility, driven by the increased adoption of solar PV, home batteries, and EVs. However, unlocking this residential flexibility is challenging as it requires a control framework that can effectively manage household energy consumption, and maintain user comfort while being readily scalable across different, diverse houses. We aim to address this challenging problem and introduce a reinforcement learning-based approach using differentiable decision trees. This approach integrates the scalability of data-driven reinforcement learning with the explainability of (differentiable) decision trees. This leads to a controller that can be easily adapted across different houses and provides a simple control policy that can be explained to end-users, further improving user acceptance. As a proof-of-concept, we analyze our method using a home energy management problem, comparing its performance with commercially available rule-based baseline and standard neural network-based RL controllers. Through this preliminary study, we show that the performance of our proposed method is comparable to standard RL-based controllers, outperforming baseline controllers by ~20% in terms of daily cost savings while being straightforward to explain.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
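
A differentiable decision tree replaces each hard if/else split with a smooth gate, so the whole tree can be trained by gradient descent while still reading like a rule. The sketch below shows one way to evaluate such a tree with sigmoid gates. It is not the authors' implementation: the node layout, the `battery_policy` example, and all numbers are made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tree_output(node, features):
    """Evaluate a soft decision tree.

    A leaf is a plain number (the action). An inner node is a dict with
    'weights', 'threshold', 'sharpness', 'left' and 'right'; it blends its
    two subtrees with a sigmoid gate instead of a hard if/else.
    """
    if not isinstance(node, dict):
        return float(node)
    score = sum(w * x for w, x in zip(node["weights"], features))
    go_right = sigmoid(node["sharpness"] * (score - node["threshold"]))
    return (go_right * tree_output(node["right"], features)
            + (1.0 - go_right) * tree_output(node["left"], features))


# Hypothetical one-split policy on the electricity price: charge the battery
# (-1) when the price is low, discharge it (+1) when the price is high.
battery_policy = {
    "weights": [1.0], "threshold": 0.30, "sharpness": 50.0,
    "left": -1.0, "right": 1.0,
}
```

With a large `sharpness` the gate approaches a hard threshold, so a trained tree can be read off as a rule such as "discharge when the price exceeds 0.30", which is the explainability argument made in the abstract.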

# 空間曲率における非線形性を符号化する力学系学習

Learning Dynamical Systems Encoding Non-Linearity within Space Curvature ( http://arxiv.org/abs/2403.11948v1 )

ライセンス: Link先を確認

Bernardo Fichera, Aude Billard,

(参考訳) 動的システム(DS)は、ロボット制御のための高レベルポリシーを効果的かつ強力に形成する手段である。それらは、駆動ベクトル場の安定性を確保しながら、堅牢で反応性の高い制御を提供する。現実のシナリオの複雑さが増大するにつれ、DSはより高度な非線形性を必要とし、障害物のような環境条件の変化に適応する能力も必要となる。 DSの現在の学習戦略は、しばしばトレードオフを伴い、学習されたDSの能力を高めるために、安定性保証かオフラインの計算効率を犠牲にする。環境変化に対するオンラインの地域適応は考慮されないか、別の問題として扱われる。本稿では,学習したDSの複雑性を,トレーニングや安定性保証において効率を損なうことなく向上させる手法を提案する。さらに,初期学習されたDSの非線形性と環境の変化によって生じる任意の局所的非線形性とをシームレスに統合するための統一的なアプローチを提案する。本稿では,ロボット制御のための漸近的に安定な非線形DSを学習するための幾何学的アプローチを提案する。各DSは、潜在多様体上の調和減衰振動子としてモデル化される。多様体のユークリッド埋め込み表現を学習することにより、我々のアプローチは空間の曲率内のDSの非線形性を符号化する。多様体の明示的な埋め込み表現を持つことで、空間の局所的な変形を直接誘導することによって障害物回避を示すことができる。まず,合成ベクトル場の2次元学習と,実環境における3次元ロボットのエンドエフェクタ動作の学習の2つのシナリオを通して,方法論の有効性を実証する。

Dynamical Systems (DS) are an effective and powerful means of shaping high-level policies for robotics control. They provide robust and reactive control while ensuring the stability of the driving vector field. The increasing complexity of real-world scenarios necessitates DS with a higher degree of non-linearity, along with the ability to adapt to potential changes in environmental conditions, such as obstacles. Current learning strategies for DSs often involve a trade-off, sacrificing either stability guarantees or offline computational efficiency in order to enhance the capabilities of the learned DS. Online local adaptation to environmental changes is either not taken into consideration or treated as a separate problem. In this paper, our objective is to introduce a method that enhances the complexity of the learned DS without compromising efficiency during training or stability guarantees. Furthermore, we aim to provide a unified approach for seamlessly integrating the initially learned DS's non-linearity with any local non-linearities that may arise due to changes in the environment. We propose a geometrical approach to learn asymptotically stable non-linear DS for robotics control. Each DS is modeled as a harmonic damped oscillator on a latent manifold. By learning the manifold's Euclidean embedded representation, our approach encodes the non-linearity of the DS within the curvature of the space. Having an explicit embedded representation of the manifold allows us to showcase obstacle avoidance by directly inducing local deformations of the space. We demonstrate the effectiveness of our methodology through two scenarios: first, the 2D learning of synthetic vector fields, and second, the learning of 3D robotic end-effector motions in real-world settings.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
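
As a point of reference for the "harmonic damped oscillator" mentioned above, the sketch below integrates such an oscillator in flat Euclidean space, where it converges to its target from any start. This is only the zero-curvature special case: the learned latent manifold and its curvature, which carry the non-linearity in the paper, are not reproduced. The function name, gains, and step size are hypothetical.

```python
def simulate_damped_oscillator(start, target, stiffness=4.0, damping=4.0,
                               dt=0.01, steps=1000):
    """Integrate x'' = -stiffness * (x - target) - damping * x' from rest.

    Returns the final position after `steps` semi-implicit Euler steps.
    """
    position = list(start)
    velocity = [0.0] * len(start)
    for _ in range(steps):
        for d in range(len(position)):
            accel = (-stiffness * (position[d] - target[d])
                     - damping * velocity[d])
            velocity[d] += dt * accel        # update the velocity first
            position[d] += dt * velocity[d]  # then move with the new velocity
    return position
```

The default gains give critical damping (damping = 2 * sqrt(stiffness)), so the state approaches the target without overshoot in this flat-space setting.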

# 決定論的に生成されたフォトニックグラフ状態の融合

Fusion of deterministically generated photonic graph states ( http://arxiv.org/abs/2403.11950v1 )

ライセンス: Link先を確認

Philip Thomas, Leonardo Ruscio, Olivier Morin, Gerhard Rempe,

(参考訳) 絡み合いは、量子物理学の謎的な概念から、量子技術の鍵となる要素へと進化してきた。これは古典物理学と矛盾する測定結果の相関を説明し、個々の量子ビットの小さな集合で広く研究されてきた。多者間の絡み合った状態は、ゲートベースの量子計算プロトコルの中で構築され、より広い視点からは、測定ベースの量子情報処理の主資源として提案されてきた。後者は、グラフによって記述された多量子ビットの絡み合った状態の事前生成を必要とする。ベル状態や線形クラスタ状態のような小さなグラフ状態は光子で生成されているが、提案された量子コンピューティングと量子ネットワークアプリケーションでは、プログラム可能な方法でそのような状態をより大きくより強力な状態に融合する必要がある。ここではこの目的を達成するために、個別にアドレス可能な2つの原子を1つの光共振器に採用する。最大8量子ビットのリングおよびツリーグラフ状態は、絡み合いトポロジーを反映した名前であり、個々の原子によって放出されるフォトニック状態から効率的に融合される。融合過程自体は、2つの原子の間の共振器支援ゲートを用いる。我々の技術は原則として、より多くの量子ビットに対してスケーラブルであり、例えば将来の量子インターネットにおけるメモリレス量子リピータへの決定的なステップである。

Entanglement has evolved from an enigmatic concept of quantum physics to a key ingredient of quantum technology. It explains correlations between measurement outcomes that contradict classical physics, and has been widely explored with small sets of individual qubits. Multi-partite entangled states build up in gate-based quantum-computing protocols and, from a broader perspective, were proposed as the main resource for measurement-based quantum-information processing. The latter requires the ex-ante generation of a multi-qubit entangled state described by a graph. Small graph states such as Bell or linear cluster states have been produced with photons, but the proposed quantum computing and quantum networking applications require fusion of such states into larger and more powerful states in a programmable fashion. Here we achieve this goal by employing two individually addressable atoms in one optical resonator. Ring and tree graph states with up to eight qubits, with the names reflecting the entanglement topology, are efficiently fused from the photonic states emitted by the individual atoms. The fusion process itself employs a cavity-assisted gate between the two atoms. Our technique is in principle scalable to even larger numbers of qubits, and is the decisive step towards, for instance, a memory-less quantum repeater in a future quantum internet.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# エストニアのオープン・ガバメント・データ開発と卓越への旅--オープンデータ提供における地方自治体の進展を明らかにする

Exploring Estonia's Open Government Data Development as a Journey towards Excellence: Unveiling the Progress of Local Governments in Open Data Provision ( http://arxiv.org/abs/2403.11952v1 )

ライセンス: Link先を確認

Katrin Rajamäe-Soosaar, Anastasija Nikiforova,

(参考訳) エストニアは、デジタル国家または電子国家という世界的名声を持っている。しかし、デジタルガバナンスの成功にもかかわらず、この国はオープン・ガバメント・データ(OGD)領域の領域で課題に直面しており、2020年以降の様々なオープン・データランキングに反映されるように、OGDエコシステムに大きな進歩を遂げている。本稿では,エストニアのOGD開発の発展と位置づけについて,さまざまな指標の統合分析,エストニアのOGDポータルからの一次データ,詳細な文献レビューを通じて検討する。この調査は、エストニアが全国レベルのオープンデータエコシステムを進歩させたことを示している。しかし、地方レベルでは発展せず、地方自治体はOGD規定に遅れを取っている。文献レビューは、エストニアとヨーロッパの地方オープンデータに焦点を当てた以前の研究の欠如を強調し、市町村のOGDの障壁と有効性を探究する将来の研究の必要性を強調している。この研究は、エストニアのOGDランドスケープにおけるダイナミックな旅の微妙な理解に寄与し、持続可能なオープンデータエコシステムを確立するためのさらなる注意を喚起する成果と領域の両方に光を当てている。

Estonia has a global reputation as a digital state or e-country. However, despite its success in digital governance, the country has faced challenges in the area of Open Government Data (OGD). Its OGD ecosystem has since advanced significantly, as reflected in various open data rankings from 2020 onwards, and in recent years Estonia has been recognized among the trend-setters. This paper aims to explore the evolution and positioning of Estonia's OGD development, encompassing national and local levels, through an integrated analysis of various indices, primary data from the Estonian OGD portal, and a thorough literature review. The research shows that Estonia has made progress in the national level open data ecosystem, primarily due to improvements in the OGD portal usability and legislation amendments. However, the local level is not as developed, with local governments lagging behind in OGD provision. The literature review highlights the lack of previous research focusing on Estonian and European local open data, emphasizing the need for future studies to explore the barriers and enablers of municipal OGD. This study contributes to a nuanced understanding of Estonia's dynamic journey in the OGD landscape, shedding light on both achievements and areas warranting further attention for establishing a sustainable open data ecosystem.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# 3次元CT画像におけるCOVID-19検出の進歩

Advancing COVID-19 Detection in 3D CT Scans ( http://arxiv.org/abs/2403.11953v1 )

ライセンス: Link先を確認

Qingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen,

(参考訳) より正確な新型コロナウイルスの診断を行うために,本研究では,単純で効果的なモデルを提案する。まず,3次元CTスキャンの特徴を分析し,非肺部位を除去し,病変関連領域に焦点をあてることと計算コストの低減を図る。我々はResNeSt50を強力な特徴抽出器として使用し、新型コロナウイルス特異的な事前知識を持つ事前訓練した重量で初期化する。本モデルは,第4回COV19Dコンペティションチャレンジ$\mathrm{I}$の検証セットで0.94のマクロF1スコアを達成し,ベースラインを16%超えた。これは、新型コロナウイルス(COVID-19)と非新型コロナウイルス(COVID-19)を区別する効果を示しており、新型コロナウイルス検出の堅牢な方法となっている。

To make a more accurate diagnosis of COVID-19, we propose a straightforward yet effective model. Firstly, we analyse the characteristics of 3D CT scans and remove the non-lung parts, facilitating the model to focus on lesion-related areas and reducing computational cost. We use ResNeSt50 as the strong feature extractor, initializing it with pretrained weights which have COVID-19-specific prior knowledge. Our model achieves a Macro F1 Score of 0.94 on the validation set of the 4th COV19D Competition Challenge $\mathrm{I}$, surpassing the baseline by 16%. This indicates its effectiveness in distinguishing between COVID-19 and non-COVID-19 cases, making it a robust method for COVID-19 detection.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18
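
The Macro F1 Score reported above is the unweighted mean of the per-class F1 scores, so the COVID-19 and non-COVID-19 classes count equally regardless of how many scans each has. The sketch below computes it from label lists. It is a generic illustration, not the challenge's scoring script, and `macro_f1` is a hypothetical name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `[1, 1, 0, 0]` and predictions `[1, 0, 0, 0]`, the per-class F1 scores are 2/3 and 4/5, giving a macro F1 of 11/15.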

# ディープラーニングによる言語進化

Language Evolution with Deep Learning ( http://arxiv.org/abs/2403.11958v1 )

ライセンス: Link先を確認

Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub,

(参考訳) 計算モデリングは言語の出現の研究において重要な役割を担っている。それは、シミュレートされた制御環境内で構造化言語が出現するきっかけとなる条件と学習過程をシミュレートすることを目的としている。エージェントベースシステム,ベイズエージェント,遺伝的アルゴリズム,ルールベースシステムなど,言語の起源を調べるために,いくつかの手法が用いられている。この章では、最近機械学習の分野に革命をもたらした別の種類の計算モデル、ディープ・ラーニング・モデルについて論じる。この章では、深層・強化学習法の基本概念を紹介し、言語の出現をシミュレートするための有用性を要約する。また、現実的なシミュレーションを構築するための重要な発見、制限、最近の試みについても論じている。この章は、言語進化を研究するツールとしてディープラーニングの導入を求めている言語学者や認知科学者を対象としている。

Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based systems. This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models. The chapter introduces the basic concepts of deep and reinforcement learning methods and summarizes their helpfulness for simulating language emergence. It also discusses the key findings, limitations, and recent attempts to build realistic simulations. This chapter targets linguists and cognitive scientists seeking an introduction to deep learning as a tool to investigate language evolution.

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# IVAC-P2L:不規則な繰り返しを通したビデオアクションカウントの強化

IVAC-P2L: Enhancing Video Action Counting through Irregular Repetition Priors ( http://arxiv.org/abs/2403.11959v1 )

ライセンス: Link先を確認

Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang,

(参考訳) ビデオアクションカウント (Video Action Counting, VAC) は、ビデオ内の反復動作を定量化することで、スポーツ、フィットネス、日常活動の分析において重要な役割を果たす。しかし、従来の VAC 手法は、中断やサイクル長の変動といった動作反復の複雑さを見落としてきた。本研究は、IVAC (Irregular Video Action Counting) と呼ぶ新しいアプローチを導入することで、この欠点に対処する。IVAC はビデオにおける不規則な反復パターンのモデル化を重視し、それをサイクル間一貫性 (Inter-cycle Consistency) とサイクル・インターバル間不整合 (Cycle-interval Inconsistency) という2つの主要な側面で定義する。サイクル間一貫性は、サイクルセグメントの時空間表現の均質性を保証し、サイクル内の動作が一様であることを表す。サイクル・インターバル間不整合は、内容の本質的な違いに基づいてサイクルセグメントとインターバルを区別することの重要性を強調する。これらの原則を取り込むため、独自のプル・プッシュ損失 (P2L) 機構に支えられた、一貫性モジュールと不整合モジュールを含む新しい方法論を提案する。IVAC-P2L モデルは、サイクルセグメントの特徴間の整合を促すプル損失と、サイクルセグメントの特徴をインターバルセグメントの特徴から明確に区別するプッシュ損失を適用する。RepCount データセットでの実証評価により、IVAC-P2L モデルが VAC タスクの性能において新たなベンチマークを打ち立てることを示す。さらに、本モデルは多様なビデオコンテンツに対して優れた適応性と汎化性を示し、データセット固有の最適化を必要とせずに、UCFRep と Countix という2つの追加データセットで既存モデルを上回る。これらの結果は、ビデオにおける不規則な反復に対処する本アプローチの有効性を裏付け、ビデオ分析と理解のさらなる進歩への道を開くものである。

Video Action Counting (VAC) is crucial in analyzing sports, fitness, and everyday activities by quantifying repetitive actions in videos. However, traditional VAC methods have overlooked the complexity of action repetitions, such as interruptions and the variability in cycle duration. Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC). IVAC prioritizes modeling irregular repetition patterns in videos, which we define through two primary aspects: Inter-cycle Consistency and Cycle-interval Inconsistency. Inter-cycle Consistency ensures homogeneity in the spatial-temporal representations of cycle segments, signifying action uniformity within cycles. Cycle-interval inconsistency highlights the importance of distinguishing between cycle segments and intervals based on their inherent content differences. To encapsulate these principles, we propose a new methodology that includes consistency and inconsistency modules, supported by a unique pull-push loss (P2L) mechanism. The IVAC-P2L model applies a pull loss to promote coherence among cycle segment features and a push loss to clearly distinguish features of cycle segments from interval segments. Empirical evaluations conducted on the RepCount dataset demonstrate that the IVAC-P2L model sets a new benchmark in VAC task performance. Furthermore, the model demonstrates exceptional adaptability and generalization across various video contents, outperforming existing models on two additional datasets, UCFRep and Countix, without the need for dataset-specific optimization. These results confirm the efficacy of our approach in addressing irregular repetitions in videos and pave the way for further advancements in video analysis and understanding.
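
The abstract describes the pull-push loss only in words: pull cycle-segment features together, push them away from interval-segment features. The sketch below is one generic way to write such an objective, not the authors' formulation. The Euclidean distance, the hinge `margin`, the `weight` parameter, and all function names are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pull_loss(cycle_feats):
    """Mean pairwise distance among cycle features (0 when they coincide)."""
    n = len(cycle_feats)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(euclidean(cycle_feats[i], cycle_feats[j]) for i, j in pairs) / len(pairs)


def push_loss(cycle_feats, interval_feats, margin=1.0):
    """Hinge penalty whenever a cycle feature is within `margin` of an interval feature."""
    terms = [max(0.0, margin - euclidean(c, s))
             for c in cycle_feats for s in interval_feats]
    return sum(terms) / len(terms) if terms else 0.0


def pull_push_loss(cycle_feats, interval_feats, margin=1.0, weight=1.0):
    """Hypothetical combined objective: pull term plus weighted push term."""
    return pull_loss(cycle_feats) + weight * push_loss(cycle_feats, interval_feats, margin)


# Toy features (illustrative only): three identical cycle segments.
cycles = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
well_separated = pull_push_loss(cycles, [[1.0, 3.0]])  # interval far from cycles
overlapping = pull_push_loss(cycles, [[1.0, 0.5]])     # interval inside the margin
```

In this toy case the loss is zero when the interval feature lies outside the margin and grows as it approaches the cycle features, which is the qualitative behaviour the abstract attributes to the two terms.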

翻訳日:2024-03-20 19:30:44 公開日:2024-03-18

# Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation

CASPER: Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation ( http://arxiv.org/abs/2403.11960v1 )

ライセンス: Link先を確認

Baoyu Jing, Dawei Zhou, Kan Ren, Carl Yang,

(参考訳) 時空間時系列は人間の活動とその影響を理解するための基礎であり、通常は異なる場所に配置された監視センサーを通じて収集される。収集されたデータは、さまざまな障害のために欠損値を含むことが多く、データ分析に大きな影響を及ぼす。欠損値を補完するために、多くの手法が提案されている。特定のデータ点を復元する際、既存手法の多くは、因果関係の有無にかかわらず、その点に関連するすべての情報を考慮する傾向がある。データ収集の過程では、時系列の背景ノイズや、構築されたセンサネットワーク内の非因果的なショートカットエッジなど、未知の交絡因子が含まれることは避けられない。これらの交絡因子は入力と出力の間にバックドアパスを開く可能性があり、言い換えれば、入力と出力の間に非因果的な相関を生じさせる。こうした非因果的な相関を過度に利用すると、過学習を招き、モデルがノイズに対して脆弱になりうる。本稿では、まず因果的視点から時空間時系列補完を再考し、入力、出力、埋め込み、交絡因子の間の因果関係を示す。次に、フロントドア調整によって交絡因子を遮断する方法を示す。フロントドア調整の結果に基づき、新しい Spatiotemporal Causal Attention (SCA) と Prompt Based Decoder (PBD) を含む Causality-Aware SPatiotEmpoRal graph neural network (CASPER) を導入する。PBD は交絡因子の影響を低減でき、SCA は埋め込み間の疎な因果関係を発見できる。理論解析により、SCA は勾配の値に基づいて因果関係を発見することが明らかになった。実世界の3つのデータセットで Casper を評価し、実験結果から、Casper がベースラインを上回り、因果関係を効果的に発見できることを示す。

Spatiotemporal time series is the foundation of understanding human activities and their impacts, which is usually collected via monitoring sensors placed at different locations. The collected data usually contains missing values due to various failures, which have significant impact on data analysis. To impute the missing values, a lot of methods have been introduced. When recovering a specific data point, most existing methods tend to take into consideration all the information relevant to that point regardless of whether they have a cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths between the input and output, in other words, they establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could result in overfitting and make the model vulnerable to noises. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective, which shows the causal relationships among the input, output, embeddings and confounders. Next, we show how to block the confounders via the frontdoor adjustment. Based on the results of the frontdoor adjustment, we introduce a novel Causality-Aware SPatiotEmpoRal graph neural network (CASPER), which contains a novel Spatiotemporal Causal Attention (SCA) and a Prompt Based Decoder (PBD). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper outperforms the baselines and effectively discovers causal relationships.
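
For reference, the frontdoor adjustment named in the abstract has a standard textbook form. The symbols below are generic and are not taken from the paper: $x$ is the input (treatment), $y$ the output, and $z$ a mediator that carries the effect of $x$ on $y$ and is not directly affected by the unobserved confounders. How CASPER instantiates $z$ is not stated in the abstract; reading it as the embeddings is an assumption.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr)
  = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

The right-hand side uses only observational quantities, which is what allows the confounders to be blocked without observing them.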

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 運動補償によるイベントベースビデオ再構成の強化

Enhanced Event-Based Video Reconstruction with Motion Compensation ( http://arxiv.org/abs/2403.11961v1 )

ライセンス: Link先を確認

Siying Liu, Pier Luigi Dragotti,

(参考訳) イベントベースのビデオ再構成のためのディープニューラルネットワークは、解釈可能性の欠如と高いメモリ要求に悩まされることが多い。 CISTA-LSTCと呼ばれる軽量ネットワークが最近導入され、アーキテクチャの体系的設計により高品質な再構築が達成されている。しかし、入力信号と出力再構成フレームが同じスパース表現を共有するというモデリング仮定は、動きによる変位を無視する。そこで本研究では,入力強度フレームとスパース符号の歪みを補正し,再現性を向上させることを提案する。 CISTA-Flowネットワークは、動き補償のためのフローネットワークとCISTA-LSTCを統合することで構築される。このシステムは、予測フローが再構築に役立ち、フロー推定を容易にするために再構築されたフレームを使用するイベントにのみ依存する。この組み合わせシステムのための反復的なトレーニングフレームワークも導入する。以上の結果から,本手法は最先端の復元精度を達成し,信頼性の高い高密度流れ推定を同時に提供することを示す。さらに,本モデルでは,異なるフローネットワークを統合可能な柔軟性を示し,さらなる性能向上の可能性を示唆している。

Deep neural networks for event-based video reconstruction often suffer from a lack of interpretability and have high memory demands. A lightweight network called CISTA-LSTC has recently been introduced showing that high-quality reconstruction can be achieved through the systematic design of its architecture. However, its modelling assumption that input signals and output reconstructed frame share the same sparse representation neglects the displacement caused by motion. To address this, we propose warping the input intensity frames and sparse codes to enhance reconstruction quality. A CISTA-Flow network is constructed by integrating a flow network with CISTA-LSTC for motion compensation. The system relies solely on events, in which predicted flow aids in reconstruction and then reconstructed frames are used to facilitate flow estimation. We also introduce an iterative training framework for this combined system. Results demonstrate that our approach achieves state-of-the-art reconstruction accuracy and simultaneously provides reliable dense flow estimation. Furthermore, our model exhibits flexibility in that it can integrate different flow networks, suggesting its potential for further performance enhancement.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 境界密度比を超えた伝達学習

Transfer Learning Beyond Bounded Density Ratios ( http://arxiv.org/abs/2403.11963v1 )

ライセンス: Link先を確認

Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis,

(参考訳) 本研究では、学習アルゴリズムがあるソース分布 $P$ からデータを収集する一方で、異なるターゲット分布 $Q$ に対して良好に動作する必要があるという、転移学習の基本問題を考察する。標準的な測度変換の議論により、密度比 $dQ/dP$ が有界であれば転移学習が成立する。しかし、Kpotufe と Martinet (COLT, 2018) および Hanneke と Kpotufe (NeurIPS, 2019) による先行研究は、比 $dQ/dP$ が非有界であっても転移学習が可能な場合があることを示している。本研究では、低次多項式推定器のクラスにおける転移学習に着目する。主結果は領域 $\mathbb{R}^n$ 上の一般的な転移不等式であり、$dQ/dP$ が有界であるという古典的な仮定を大きく超えた、非常に緩やかな仮定の下で、低次多項式に対する非自明な転移学習が可能であることを証明する。例えば、$Q$ が対数凹測度で、逆比 $dP/dQ$ が有界であれば、この不等式は常に適用できる。不等式の適用性を示すため、(1) $dQ/dP$ が無限大となる古典的な打ち切り回帰 (truncated regression) の設定、(2) Transformer による線形関数のインコンテキスト学習における分布外汎化という、より最近の設定において新たな結果を得る。また、ブール超立方体 $\{-1,1\}^n$ 上で転移不等式の離散版を与え、Abbe, Bengio, Lotfi, Rizk (ICML, 2023) による最近の問題 Generalization on the Unseen との関係を調べる。主要な概念的貢献は、$Q$ の下での推定器の誤差 $\widehat{f}-f^*$ の最大影響度 $\mathrm{I}_{\max}(\widehat{f}-f^*)$ が転移可能性の十分条件として働くという点である。すなわち、$\mathrm{I}_{\max}(\widehat{f}-f^*)$ が適切に有界であれば、ブール領域上で転移が可能となる。

We study the fundamental problem of transfer learning where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning happens when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded, but transfer learning is possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in the settings of: (1) the classical truncated regression setting, where $dQ/dP$ equals infinity, and (2) the more recent out-of-distribution generalization setting for in-context learning linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean Hypercube $\{-1,1\}^n$, and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023). Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability; when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.
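
The "standard change of measure argument" mentioned in the abstract can be written in one line. This is the classical bound, not the paper's new inequality. The symbol $\ell$ is introduced here for illustration and denotes any non-negative loss of the estimator; the identity assumes $Q$ is absolutely continuous with respect to $P$.

```latex
\mathbb{E}_{Q}\bigl[\ell\bigr]
  = \mathbb{E}_{P}\!\left[\frac{dQ}{dP}\,\ell\right]
  \le \left\lVert \frac{dQ}{dP} \right\rVert_{\infty} \mathbb{E}_{P}\bigl[\ell\bigr]
```

When $dQ/dP$ is unbounded the right-hand side is infinite and the bound says nothing, which is the regime the paper targets.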

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# ニューラルネットワーク回帰設計による確率的校正

Probabilistic Calibration by Design for Neural Network Regression ( http://arxiv.org/abs/2403.11964v1 )

ライセンス: Link先を確認

Victor Dheur, Souhaib Ben Taieb,

(参考訳) 多くの実世界のアプリケーションにおいて、回帰問題に対して校正され、かつシャープなニューラルネットワーク予測分布を生成することは、最適な意思決定に不可欠である。ニューラルネットワークの誤校正問題に対処するため、トレーニング後に予測を調整するポストホック法や、トレーニング中に作用する正則化法など、キャリブレーションを改善するさまざまな手法が提案されている。ポストホック法は正則化法に比べてキャリブレーションの改善幅が大きいことが示されているが、ポストホックのステップはモデルのトレーニングから完全に独立している。本稿では、Quantile Recalibration Training と呼ぶ新しいエンド・ツー・エンドのモデルトレーニング手法を導入し、追加のパラメータなしにポストホック校正をトレーニングプロセスへ直接統合する。また、本手法と他のポストホック法および正則化法を特殊な場合として含む統一アルゴリズムを提示する。57個の表形式回帰データセットを用いた大規模実験により本手法の性能を示し、キャリブレーションを維持しながら予測精度が向上することを確認する。さらに、提案手法の各構成要素の重要性を評価するアブレーション研究と、ベースモデルおよび各種ハイパーパラメータが予測精度に与える影響の詳細な分析を行う。

Generating calibrated and sharp neural network predictive distributions for regression problems is essential for optimal decision-making in many real-world applications. To address the miscalibration issue of neural networks, various methods have been proposed to improve calibration, including post-hoc methods that adjust predictions after training and regularization methods that act during training. While post-hoc methods have shown better improvement in calibration compared to regularization methods, the post-hoc step is completely independent of model training. We introduce a novel end-to-end model training procedure called Quantile Recalibration Training, integrating post-hoc calibration directly into the training process without additional parameters. We also present a unified algorithm that includes our method and other post-hoc and regularization methods, as particular cases. We demonstrate the performance of our method in a large-scale experiment involving 57 tabular regression datasets, showcasing improved predictive accuracy while maintaining calibration. We also conduct an ablation study to evaluate the significance of different components within our proposed method, as well as an in-depth analysis of the impact of the base model and different hyperparameters on predictive accuracy.
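
To make the post-hoc step concrete, here is a minimal sketch of classical quantile recalibration applied after training, which is the ingredient the abstract says is folded into training. It is not the paper's end-to-end procedure. The Gaussian predictive distribution, the toy calibration data, and all function names are assumptions for illustration.

```python
import bisect
import math


def normal_cdf(y, mean, std):
    """CDF of a Gaussian predictive distribution N(mean, std^2) at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def pit_values(targets, means, stds):
    """Probability integral transform: predicted CDF evaluated at each target."""
    return [normal_cdf(y, m, s) for y, m, s in zip(targets, means, stds)]


def fit_recalibration_map(pits):
    """Return the empirical CDF of the PIT values as a function of a level p."""
    sorted_pits = sorted(pits)

    def recalibrate(p):
        # Fraction of calibration PIT values that are <= p.
        return bisect.bisect_right(sorted_pits, p) / len(sorted_pits)

    return recalibrate


# Toy calibration set (illustrative only): every prediction is N(0, 1).
calib_targets = [-2.0, -1.0, 0.0, 1.0, 2.0]
calib_means = [0.0] * 5
calib_stds = [1.0] * 5
recalibrate = fit_recalibration_map(pit_values(calib_targets, calib_means, calib_stds))


def recalibrated_cdf(y, mean, std):
    """Compose the recalibration map with the model's predicted CDF."""
    return recalibrate(normal_cdf(y, mean, std))
```

If the model were perfectly calibrated, the PIT values would be uniform on [0, 1] and the map would be close to the identity. An end-to-end variant would need a differentiable estimate of this map, which the sketch does not attempt.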

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 軌道予測のためのインフォームドスペクトル正規化ガウス過程

Informed Spectral Normalized Gaussian Processes for Trajectory Prediction ( http://arxiv.org/abs/2403.11966v1 )

ライセンス: Link先を確認

Christian Schlauch, Christian Wirth, Nadja Klein,

(参考訳) 事前パラメータ分布は、インフォームド学習のために専門家の事前知識や世界知識を表現するエレガントな方法を提供する。先行研究は、このような情報的事前分布を用いて確率的深層学習 (DL) モデルを正則化することで、性能とデータ効率が向上することを示している。しかし、確率的 DL モデルに一般的に用いられるサンプリングベースの近似は、複数回の推論パスと長いトレーニング時間を必要とし、計算コストが高くなりうる。有望な代替手段は、スペクトル正規化ガウス過程 (SNGP) のような計算効率のよい最終層カーネル近似である。本稿では、SNGP に対する新しい正則化ベースの継続学習手法を提案し、過去のタスクから学習した事前知識を表す情報的事前分布の利用を可能にする。提案手法は確立された手法に基づいており、リハーサルメモリやパラメータ拡張を必要としない。走行可能性に関する事前知識を統合することで、このインフォームド SNGP モデルを自動運転における軌道予測問題に適用する。2つの公開データセットにおいて、トレーニングデータを減らした場合および異なる場所にまたがる場合の性能を調べ、非インフォームドおよびインフォームドのベースラインに対して、データ効率と場所間転移へのロバスト性が向上することを示す。

Prior parameter distributions provide an elegant way to represent prior expert and world knowledge for informed learning. Previous work has shown that using such informative priors to regularize probabilistic deep learning (DL) models increases their performance and data-efficiency. However, commonly used sampling-based approximations for probabilistic DL models can be computationally expensive, requiring multiple inference passes and longer training times. Promising alternatives are compute-efficient last layer kernel approximations like spectral normalized Gaussian processes (SNGPs). We propose a novel regularization-based continual learning method for SNGPs, which enables the use of informative priors that represent prior knowledge learned from previous tasks. Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion. We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge. On two public datasets, we investigate its performance under diminishing training data and across locations, and thereby demonstrate an increase in data-efficiency and robustness to location-transfers over non-informed and informed baselines.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 強相互干渉型超伝導回路格子におけるサイト分解電流の探索

Probing Site-Resolved Current in Strongly Interacting Superconducting Circuit Lattices ( http://arxiv.org/abs/2403.11967v1 )

ライセンス: Link先を確認

Botao Du, Ramya Suresh, Santiago López, Jeremy Cadiente, Ruichao Ma,

(参考訳) 輸送測定は、超伝導から分数量子ホール効果まで、凝縮系の現象を理解するための基礎である。同様に、輸送測定は量子シミュレータにおける合成量子物質を探るための強力な手段となりうる。ここでは、超伝導回路格子におけるその場 (in-situ) 粒子電流の測定を実証し、コヒーレントな格子と浴に結合した格子の両方における輸送の研究に応用する。本手法は、二重井戸ポテンシャルにおける制御されたトンネリングを用いて電流をオンサイト密度にマッピングし、サイト分解された電流と電流統計を明らかにする。強く相互作用する Bose-Hubbard 格子をさまざまな格子充填率で準備し、多体状態が超流動からモット絶縁体へ移り変わるにつれて電流統計が変化することを観測する。さらに、調整可能な粒子ソースおよびドレインとして機能する、設計された駆動散逸浴に格子を結合することで、非平衡電流ダイナミクスを調べる。離散的な伝導チャネルにおける定常電流と、相互作用に支援された輸送を観測する。これらの結果は、超伝導回路における微視的な量子輸送を調べるための汎用的なプラットフォームを確立する。

Transport measurements are fundamental for understanding condensed matter phenomena, from superconductivity to the fractional quantum Hall effect. Analogously, they can be powerful tools for probing synthetic quantum matter in quantum simulators. Here we demonstrate the measurement of in-situ particle current in a superconducting circuit lattice and apply it to study transport in both coherent and bath-coupled lattices. Our method utilizes controlled tunneling in a double-well potential to map current to on-site density, revealing site-resolved current and current statistics. We prepare a strongly interacting Bose-Hubbard lattice at different lattice fillings, and observe the change in current statistics as the many-body states transition from superfluid to Mott insulator. Furthermore, we explore non-equilibrium current dynamics by coupling the lattice to engineered driven-dissipative baths that serve as tunable particle source and drain. We observe steady-state current in discrete conduction channels and interaction-assisted transport. These results establish a versatile platform to investigate microscopic quantum transport in superconducting circuits.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# クラシファイアフリーガイダンスを用いた未開条件拡散モデル:シャープ統計理論

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory ( http://arxiv.org/abs/2403.11968v1 )

ライセンス: Link先を確認

Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen,

(参考訳) 条件付き拡散モデルは現代の画像合成の基盤であり、計算生物学や強化学習などの分野で広く応用されている。これらの応用において、条件付き拡散モデルはプロンプト入力などのさまざまな条件情報を取り込み、サンプル生成を所望の特性へと導く。経験的な成功にもかかわらず、条件付き拡散モデルの理論はほとんど存在しない。本稿は、条件付き拡散モデルを用いた分布推定に関するシャープな統計理論を提示することで、このギャップを埋める。解析により、データ分布の滑らかさに適応し、ミニマックス下界に一致するサンプル複雑性の境界が得られる。理論展開の鍵は条件付きスコア関数の近似結果にあり、これは新しい拡散テイラー近似 (diffused Taylor approximation) の手法に基づいている。さらに、強化学習におけるモデルベースの遷移カーネル推定、逆問題の求解、報酬条件付きサンプル生成など、多様な応用における条件付き拡散モデルの性能を解明するうえで、本統計理論が有用であることを示す。

Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompt input, to guide the sample generation towards desired properties. Despite the empirical success, theory of conditional diffusion models is largely missing. This paper bridges this gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models. Our analysis yields a sample complexity bound that adapts to the smoothness of the data distribution and matches the minimax lower bound. The key to our theoretical development lies in an approximation result for the conditional score function, which relies on a novel diffused Taylor approximation technique. Moreover, we demonstrate the utility of our statistical theory in elucidating the performance of conditional diffusion models across diverse applications, including model-based transition kernel estimation in reinforcement learning, solving inverse problems, and reward conditioned sample generation.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 深い熱化とヒルベルト空間エルゴディディティにおける最大エントロピー原理

A Maximum Entropy Principle in Deep Thermalization and in Hilbert-Space Ergodicity ( http://arxiv.org/abs/2403.11970v1 )

ライセンス: Link先を確認

Daniel K. Mark, Federica Surace, Andreas Elben, Adam L. Shaw, Joonhee Choi, Gil Refael, Manuel Endres, Soonwon Choi,

(参考訳) 量子多体系において自然に現れる純粋状態のアンサンブルが示す普遍的な統計的性質について報告する。具体的には、次の2種類の状態アンサンブルを考える。i) ユニタリ発展の下での量子状態の時間軌道によって形成されるもの、ii) 補集合に対する部分的な局所射影測定によって得られる小さな部分系の量子状態によって形成されるもの。これらはそれぞれ「ヒルベルト空間エルゴード性」と「深い熱化」の現象を例示している。いずれの場合も、得られるアンサンブルは単純な原理によって定まる。すなわち、純粋状態の分布は、エネルギー保存などの制約や熱化によって課される有効な制約の下で最大エントロピーを持つ。我々は、アンサンブルのすべての統計モーメントに対する明示的な公式を導出し、広く受け入れられた仮定の下でこの普遍性の必要十分条件を証明し、実験で測定可能な帰結を記述することにより、この原理の定量的なシグネチャを提示し、数値的に検証する。さらに、この普遍性の情報理論的含意について論じる。我々のアンサンブルは最大の情報量を持ちながら、調べることが最大限に困難であり、自然界に現れる一般的な量子状態アンサンブルが情報を可能な限り強く隠す（スクランブルする）ことを示している。本結果は、ヒルベルト空間エルゴード性の概念を時間に依存しないハミルトニアン力学へ、深い熱化の概念を無限の有効温度から有限の有効温度へと一般化する。本研究は、統計的および情報理論的な手法を用いて量子力学の普遍的な振る舞いを特徴づけ、理解するための新しい視点を提示する。

We report universal statistical properties displayed by ensembles of pure states that naturally emerge in quantum many-body systems. Specifically, two classes of state ensembles are considered: those formed by i) the temporal trajectory of a quantum state under unitary evolution or ii) the quantum states of small subsystems obtained by partial, local projective measurements performed on their complements. These cases respectively exemplify the phenomena of "Hilbert-space ergodicity" and "deep thermalization." In both cases, the resultant ensembles are defined by a simple principle: the distributions of pure states have maximum entropy, subject to constraints such as energy conservation, and effective constraints imposed by thermalization. We present and numerically verify quantifiable signatures of this principle by deriving explicit formulae for all statistical moments of the ensembles; proving the necessary and sufficient conditions for such universality under widely-accepted assumptions; and describing their measurable consequences in experiments. We further discuss information-theoretic implications of the universality: our ensembles have maximal information content while being maximally difficult to interrogate, establishing that generic quantum state ensembles that occur in nature hide (scramble) information as strongly as possible. Our results generalize the notions of Hilbert-space ergodicity to time-independent Hamiltonian dynamics and deep thermalization from infinite to finite effective temperature. Our work presents new perspectives to characterize and understand universal behaviors of quantum dynamics using statistical and information theoretic tools.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# ヒルベルト空間エルゴディディティからの普遍的ゆらぎと雑音学習

Universal fluctuations and noise learning from Hilbert-space ergodicity ( http://arxiv.org/abs/2403.11971v1 )

ライセンス: Link先を確認

Adam L. Shaw, Daniel K. Mark, Joonhee Choi, Ran Finkelstein, Pascal Scholl, Soonwon Choi, Manuel Endres,

(参考訳) 熱平衡に達する系はいたるところに存在する。古典系では、この現象は通常、位相空間におけるエルゴード性を通じて統計的に理解されるが、これを量子系に移し替えることは長年の関心事である。最近、孤立した大域的な量子状態が利用可能な状態空間を一様に探索するという、ヒルベルト空間エルゴード性と呼ばれる量子的なエルゴード性の概念が提案された。ここでは、実験的な Rydberg 量子シミュレータとさまざまな数値モデルを用いてこの過程のシグネチャを観測し、その後、環境と相互作用する局所量子系の場合へと一般化する。環境が相補的な部分系である閉じた系については、浴のサイズが大きくなるにつれて、オブザーバブルが大きな量子ゆらぎから小さなガウスゆらぎへと移行する、滑らかな量子-古典転移を予測し、観測する。この転移は、有限温度の系、遍歴粒子を持つ系、ランダム回路を含む幅広い系の間で、定量的なレベルで普遍的である。次に、外部環境とノイズを伴って相互作用する開放系の場合を考える。相関誤差を含むほぼ任意のノイズチャネルの下でのオブザーバブルの統計を予測し、連続的なハミルトニアン時間発展とデジタルランダム回路の両方について、候補となる誤差モデルを識別できるようにする。最終的に、本結果は量子ダイナミクスにおけるエルゴード性の役割を明らかにし、基礎的にも実用的にも重要な帰結をもたらす。

Systems reaching thermal equilibrium are ubiquitous. For classical systems, this phenomenon is typically understood statistically through ergodicity in phase space, but translating this to quantum systems is a long-standing problem of interest. Recently a quantum notion of ergodicity has been proposed, namely that isolated, global quantum states uniformly explore their available state space, dubbed Hilbert-space ergodicity. Here we observe signatures of this process with an experimental Rydberg quantum simulator and various numerical models, before generalizing to the case of a local quantum system interacting with its environment. For a closed system, where the environment is a complementary subsystem, we predict and observe a smooth quantum-to-classical transition in that observables progress from large, quantum fluctuations to small, Gaussian fluctuations as the bath size grows. This transition is universal on a quantitative level amongst a wide range of systems, including those at finite temperature, those with itinerant particles, and random circuits. Then, we consider the case of an open system interacting noisily with an external environment. We predict the statistics of observables under largely arbitrary noise channels including those with correlated errors, allowing us to discriminate candidate error models both for continuous Hamiltonian time evolution and for digital random circuits. Ultimately our results clarify the role of ergodicity in quantum dynamics, with fundamental and practical consequences.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 量子場論における量子参照フレーム、測定スキームおよび局所代数の種類

Quantum reference frames, measurement schemes and the type of local algebras in quantum field theory ( http://arxiv.org/abs/2403.11973v1 )

ライセンス: Link先を確認

Christopher J. Fewster, Daan W. Janssen, Leon Deryck Loveridge, Kasia Rejzner, James Waldron,

(参考訳) 本研究では、相対論的量子測定理論と量子参照フレーム (QRF) を組み合わせ、対称性を持つ背景上の量子場の局所測定を QRF に相対的に行う操作的枠組みを構築する。これにより、時空の等長変換群の自然な作用の下で不変な、量子場と参照フレームのオブザーバブルからなる結合代数が得られる。適切なクラスの量子参照フレームに対して、この代数は接合積を用いてパラメータ化される。量子場が良い熱的性質（ある非零温度における KMS 状態の存在によって表される）を持つならば、モジュラー理論を用いて、不変代数が半有限トレースを持つことを示せる。さらに、量子参照フレームが同じ温度で良い熱的振る舞い（KMS 荷重の存在によって表される）を持つ場合、このトレースは有限である。物理的オブザーバブルの不変代数が $\textnormal{II}_1$ 型因子となるための正確な条件を与える。本結果は Chandrasekaran, Longo, Penington and Witten [JHEP 2023, 82 (2023)] の最近の研究に基づいており、その結果の重要な数学的一般化と、彼らのモデルに対するより精密な操作的理解の両方を与える。

We develop an operational framework, combining relativistic quantum measurement theory with quantum reference frames (QRFs), in which local measurements of a quantum field on a background with symmetries are performed relative to a QRF. This yields a joint algebra of quantum-field and reference-frame observables that is invariant under the natural action of the group of spacetime isometries. For the appropriate class of quantum reference frames, this algebra is parameterised in terms of crossed products. Provided that the quantum field has good thermal properties (expressed by the existence of a KMS state at some nonzero temperature), one can use modular theory to show that the invariant algebra admits a semifinite trace. If furthermore the quantum reference frame has good thermal behaviour (expressed by the existence of a KMS weight) at the same temperature, this trace is finite. We give precise conditions for the invariant algebra of physical observables to be a type $\textnormal{II}_1$ factor. Our results build upon recent work of Chandrasekaran, Longo, Penington and Witten [JHEP 2023, 82 (2023)], providing both a significant mathematical generalisation of these findings and a refined operational understanding of their model.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# OUCopula:OU-UWF画像に基づくマイオピアスクリーニング用バイチャネルマルチラベルコプラ拡張アダプタベースCNN

OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images ( http://arxiv.org/abs/2403.11974v1 )

ライセンス: Link先を確認

Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherien C. Liu, Xingtao Zhou,

(参考訳) 最先端の超広角 (UWF) 眼底撮影を用いた近視スクリーニングは、眼科的な転帰にとって重要となる可能性がある。眼科学と深層学習 (DL) の間の現在の学際的研究は、主に単眼画像を用いた疾患の分類と診断に集中しており、Oculus Uterque (OU, 両眼) の同時モデリングと予測をほとんど無視している。OU 間の複雑な関係と、（連続値の）結果ラベル（等価球面度数と眼軸長）の間の高い相関に着想を得て、複数の臨床スコアを同時に予測するために、OU の UWF 眼底画像を用いたコピュラ強化アダプタ畳み込みニューラルネットワーク (CNN) 学習の枠組み (OUCopula) を提案する。(1) 高い相関と不均一性の両方を伴う2チャネルの画像入力を扱え（同一のバックボーンネットワークを共有し、アダプタを用いてチャネルごとの差異をパラメータ化することによる）、(2) 連続値の出力ラベル間の相関情報を取り込める（コピュラを用いることによる）、新しい2チャネル・マルチラベル CNN を設計する。実験により、OUCopula はバックボーンモデルと比較して、近視スコアの予測において満足のいく性能を達成することを示す。さらに、OUCopula は単眼入力用に構築されたモデルの性能を大きく上回りうる。また本研究は、2チャネルモデルをマルチチャネルの枠組みへ拡張できる可能性と、さまざまなバックボーン CNN にわたる OUCopula の汎化可能性も示唆している。

Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN that can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 非拘束3次元運動モデルを用いた単眼カメラによる歩行者追跡

Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model ( http://arxiv.org/abs/2403.11978v1 )

ライセンス: Link先を確認

Jan Krejčí, Oliver Kost, Ondřej Straka, Jindřich Duník,

(参考訳) 歩行者追跡のための第一原理に基づく単一物体モデルを提案する。移動物体の広がりは、歩行者の身長など、3次元での既知の統計量によって記述できると仮定する。したがって、提案モデルでは、3次元視覚追跡で一般的に行われるように、物体の3次元運動を共通の地面平面に拘束する必要がない。このモデルに対する非線形フィルタをアンセンテッドカルマンフィルタ (UKF) を用いて実装し、公開されている MOT-17 データセットで検証する。提案手法は、2次元画像に投影した場合に完全な結果を維持しつつ、3次元でも有望な結果を与える。さらに、推定誤差の共分散は真の共分散と一致する。従来の手法とは異なり、導入したモデルパラメータは分かりやすい意味を持ち、問題に応じて容易に調整できる。

A first-principle single-object model is proposed for pedestrian tracking. It is assumed that the extent of the moving object can be described via known statistics in 3D, such as pedestrian height. The proposed model thus need not constrain the object motion in 3D to a common ground plane, which is usual in 3D visual tracking applications. A nonlinear filter for this model is implemented using the unscented Kalman filter (UKF) and tested using the publicly available MOT-17 dataset. The proposed solution yields promising results in 3D while maintaining perfect results when projected into the 2D image. Moreover, the estimation error covariance matches the true one. Unlike conventional methods, the introduced model parameters have convenient meaning and can readily be adjusted for a problem.
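
The abstract names the unscented Kalman filter without detail. Its core building block, the unscented transform, can be sketched for a scalar state as follows. This illustrates the general technique only, not the paper's filter: the scalar state, the `kappa` scaling parameter, and the pinhole-style measurement function with its focal length and pedestrian height are all assumptions.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mean, var) through f using three sigma points."""
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    transformed = [f(x) for x in sigma_points]
    new_mean = sum(w * y for w, y in zip(weights, transformed))
    new_var = sum(w * (y - new_mean) ** 2 for w, y in zip(weights, transformed))
    return new_mean, new_var


# Sanity check on a linear map, where the transform is exact.
lin_mean, lin_var = unscented_transform_1d(3.0, 4.0, lambda x: 2.0 * x + 1.0)


# Hypothetical nonlinear measurement: image height (pixels) of a pedestrian of
# known height seen at uncertain depth. The values are made up for illustration.
def image_height(depth_m, focal_px=1000.0, person_height_m=1.7):
    return focal_px * person_height_m / depth_m


px_mean, px_var = unscented_transform_1d(10.0, 1.0, image_height)
```

For a linear function the transform reproduces the exact mean and variance. For the nonlinear projection it gives an approximation without requiring derivatives of the measurement function, which is the usual reason for choosing a UKF over an extended Kalman filter.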

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# クリーンラベルの毒殺対策としての拡散脱臭

Diffusion Denoising as a Certified Defense against Clean-label Poisoning ( http://arxiv.org/abs/2403.11981v1 )

ライセンス: Link先を確認

Sanghyun Hong, Nicholas Carlini, Alexey Kurakin,

(参考訳) クリーンラベル中毒攻撃に対する認証付き防御を提示する。これらの攻撃は、$p$-ノルムで有界な敵対的摂動を含む少数の毒サンプル（例: 1%）をトレーニングデータに注入し、テスト時の入力に対して標的型の誤分類を引き起こす。$denoised$ $smoothing$ によって達成される敵対的ロバスト性に着想を得て、既製の拡散モデルが改ざんされたトレーニングデータをどのように浄化できるかを示す。7種類のクリーンラベル中毒攻撃に対して本防御を広範に検証し、テスト時精度の低下を無視できる程度に抑えつつ、攻撃成功率を0-16%に低減した。本防御をクリーンラベル中毒に対する既存の対策と比較し、攻撃成功率を最も大きく下げ、かつ最良のモデル有用性を提供することを示す。以上の結果は、より強力なクリーンラベル攻撃の開発に向けた今後の研究の必要性と、それらの攻撃を評価するための強力なベースラインとして、認証付きでありながら実用的な本防御を用いることの必要性を浮き彫りにする。

We present a certified defense to clean-label poisoning attacks. These attacks work by injecting a small number of poisoning samples (e.g., 1%) that contain $p$-norm bounded adversarial perturbations into the training data to induce a targeted misclassification of a test-time input. Inspired by the adversarial robustness achieved by $denoised$ $smoothing$, we show how an off-the-shelf diffusion model can sanitize the tampered training data. We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in the test time accuracy. We compare our defense with existing countermeasures against clean-label poisoning, showing that the defense reduces the attack success the most and offers the best model utility. Our results highlight the need for future work on developing stronger clean-label attacks and using our certified yet practical defense as a strong baseline to evaluate these attacks.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 生成テキストモデルを用いた学生による授業評価のための定性的コードブックの作成

Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching ( http://arxiv.org/abs/2403.11984v1 )

ライセンス: Link先を確認

Andrew Katz, Mitchell Gerhardt, Michelle Soledad,

(参考訳) フィードバックは改善の重要な側面です。残念なことに、複数のソースからの多くのフィードバックがある場合、情報を実用的な洞察に抽出することは困難です。教育者にとって重要なフィードバック源である教育評価(SET)について考察する。授業中の動作についてインストラクターに洞察を与えることができる。 SETのコレクションは、管理者がコースやプログラム全体の信号として役立つ。しかし、数年間にわたる高学歴や行政記録のように大規模に、SETの量は分析を困難にしている。本稿では,自然言語処理(NLP)と大規模言語モデル(LLM)を用いたSETの解析手法について述べる。大規模公立大学から5,000SETのコーパスに適用し,本手法を実証する。提案手法は,SETを抽出,埋め込み,クラスタ化,要約して表現するテーマを識別するために利用できることを示す。より一般的に、この研究はNLP技術とLLMを組み合わせてSETのコードブックを生成する方法を示している。本稿では,本手法が授業や研究環境において,SETやその他の学生の書き方を分析することの意義について論じる。

Feedback is a critical aspect of improvement. Unfortunately, when there is a lot of feedback from multiple sources, it can be difficult to distill the information into actionable insights. Consider student evaluations of teaching (SETs), which are important sources of feedback for educators. They can give instructors insights into what worked during a semester. A collection of SETs can also be useful to administrators as signals for courses or entire programs. However, on a large scale as in high-enrollment courses or administrative records over several years, the volume of SETs can render them difficult to analyze. In this paper, we discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs). We demonstrate the method by applying it to a corpus of 5,000 SETs from a large public university. We show that the method can be used to extract, embed, cluster, and summarize the SETs to identify the themes they express. More generally, this work illustrates how to use the combination of NLP techniques and LLMs to generate a codebook for SETs. We conclude by discussing the implications of this method for analyzing SETs and other types of student writing in teaching and research settings.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# GetMesh: 高品質なメッシュ生成と操作のための制御可能なモデル

GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation ( http://arxiv.org/abs/2403.11990v1 )

ライセンス: Link先を確認

Zhaoyang Lyu, Ben Fei, Jinyi Wang, Xudong Xu, Ya Zhang, Weidong Yang, Bo Dai,

(参考訳) Meshは様々な産業アプリケーションにおける3Dアセットの基本的な表現であり、プロのソフトウェアによって広く支持されている。しかし、その不規則な構造のため、メッシュの生成と操作はしばしば時間と労力がかかる。本稿では,異なるカテゴリにわたるメッシュの生成と操作のための,高度に制御可能な生成モデルGetMeshを提案する。可変個の点を潜在表現として取り、それらをトリプレーン表現として再編成することで、GetMeshはリッチでシャープな詳細を持つメッシュを生成し、単一カテゴリおよびマルチカテゴリの既存手法の両方を上回るパフォーマンスを実現している。さらに、グローバル/ローカルメッシュトポロジの変更、メッシュ部品の追加/削除、カテゴリ間のメッシュ部品の結合といった、従来のメッシュ生成モデルでは達成できなかった生成プロセスのきめ細かい制御も、潜在点の数、位置、特徴を調整することで、直感的に、効率よく、堅牢に実現できる。プロジェクトページはhttps://getmesh.github.io。

Mesh is a fundamental representation of 3D assets in various industrial applications, and is widely supported by professional software. However, due to its irregular structure, mesh creation and manipulation are often time-consuming and labor-intensive. In this paper, we propose a highly controllable generative model, GetMesh, for mesh generation and manipulation across different categories. By taking a varying number of points as the latent representation, and re-organizing them as a triplane representation, GetMesh generates meshes with rich and sharp details, outperforming both single-category and multi-category counterparts. Moreover, it also enables fine-grained control over the generation process that previous mesh generative models cannot achieve, where changing global/local mesh topologies, adding/removing mesh parts, and combining mesh parts across categories can be intuitively, efficiently, and robustly accomplished by adjusting the number, positions or features of latent points. Project page is https://getmesh.github.io.

翻訳日:2024-03-20 19:20:58 公開日:2024-03-18

# 変分量子固有解法の成功指標としてのハミルトン-再構成距離

Hamiltonian-reconstruction distance as a success metric for the Variational Quantum Eigensolver ( http://arxiv.org/abs/2403.11995v1 )

ライセンス: Link先を確認

Leo Joon Il Moon, Mandar M. Sohoni, Michael A. Shimizu, Praveen Viswanathan, Kevin Zhang, Eun-Ah Kim, Peter L. McMahon,

(参考訳) 変分量子固有解法(VQE)は、量子シミュレーションのためのハイブリッド量子古典的アルゴリズムである。ハミルトンの基底状態を見つけるための他のヒューリスティックアルゴリズムと同様に、VQEの課題は、真の基底状態と基底状態エネルギーが未知のとき、アルゴリズムの出力解が真の基底状態にどの程度近いかを知ることである。これは、誤った早期終了を避けたいVQEのような反復アルゴリズムにおいて特に重要である。ハミルトニアン再構成の最近の発展 - 固有状態が与えられるハミルトニアンの推定 - は、ハミルトン固有解問題に対する変分解の質を評価するために、計量を与えることができる。この計量は、真の基底状態や基底状態エネルギーを知ることなく、基底状態への変分解の近接性を評価することができる。数値シミュレーションやクラウドベースのトラップイオン量子コンピュータでの実証では、一次元横フィールドイシング(11 qubits)と2次元J1-J2横フィールドイシング(6 qubits)のスピン問題の両方の場合、ハミルトン再構成距離は、VQEが基底状態を発見していないかどうかを示す有益な指標となる。我々の実験では、VQE反復の関数としてのエネルギープラトーがVQEアルゴリズムの誤った早期停止をもたらす可能性があるが、ハミルトン-再構成距離が正しく繰り返し続けることを示唆するケースを含む。ハミルトン-再構成距離は、VQE溶液と真の基底状態の間の忠実度と有用な相関関係を持つ。我々の研究は、ハミルトニアン-再構成距離が、実際にノイズの多い量子プロセッサを含むVQEの成功を評価するのに役立つことを示唆している。

The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm for quantum simulation that can be run on near-term quantum hardware. A challenge in VQE -- as well as any other heuristic algorithm for finding ground states of Hamiltonians -- is to know how close the algorithm's output solution is to the true ground state, when the true ground state and ground-state energy are unknown. This is especially important in iterative algorithms, such as VQE, where one wants to avoid erroneous early termination. Recent developments in Hamiltonian reconstruction -- the inference of a Hamiltonian given an eigenstate -- give a metric that can be used to assess the quality of a variational solution to a Hamiltonian-eigensolving problem. This metric can assess the proximity of the variational solution to the ground state without any knowledge of the true ground state or ground-state energy. In numerical simulations and in demonstrations on a cloud-based trapped-ion quantum computer, we show that for examples of both one-dimensional transverse-field-Ising (11 qubits) and two-dimensional J1-J2 transverse-field-Ising (6 qubits) spin problems, the Hamiltonian-reconstruction distance gives a helpful indication of whether VQE has yet found the ground state or not. Our experiments included cases where the energy plateaus as a function of the VQE iteration, which could have resulted in erroneous early stopping of the VQE algorithm, but where the Hamiltonian-reconstruction distance correctly suggests continuing to iterate. We find that the Hamiltonian-reconstruction distance has a useful correlation with the fidelity between the VQE solution and the true ground state. Our work suggests that the Hamiltonian-reconstruction distance may be a useful tool for assessing success in VQE, including on noisy quantum processors in practice.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# 生成的知識抽出、グラフベース表現、マルチモーダル・インテリジェントグラフ推論による科学的発見の高速化

Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning ( http://arxiv.org/abs/2403.11996v1 )

ライセンス: Link先を確認

Markus J. Buehler,

(参考訳) 生成人工知能 (AI) を用いて, 生物材料の領域における1000の科学論文を詳細なオントロジ知識グラフに変換し, その本質的に無スケールな性質を明らかにした。ノード類似度と相互中心性の組合せランキングに基づく異種概念間のグラフトラバースパス検出を用いて,クエリに応答し,知識のギャップを識別し,前例のない素材設計とその動作を提案する。ある比較では、生体材料とベートーヴェン第9交響楽団の詳細な構造的類似を明らかにし、同型写像を通して複雑さの共有パターンを強調した。このアルゴリズムはさらに、カンディンスキーのコンポジションVII(英語版)の絵画から抽出された原理とグラフサンプリングの結合合成を取り入れた革新的な階層的な菌糸体を創り出し、結果として得られる合成物はカオスと秩序のバランスを反映し、調整可能なポロシティ、機械的強度、複雑なパターン化された化学機能化などの特徴を持つ。我々は、物理的、生物学的、芸術的な領域にまたがる他の同型を解明し、ポストモダン哲学に共鳴する不純物と物質フラックスのニュアンスなオントロジーを明らかにし、これらの相互接続を階層的な枠組みに配置する。本研究は,従来の階層的パラダイムを超越した実体の動的,文脈依存的な相互作用を明らかにし,個々の構成要素の意義とシステム内のゆらぎ的関係を強調した。我々の予測は、従来の生成型AI手法よりもはるかに高い斬新さ、技術的詳細、爆発能力を達成する。このアプローチは、発見を容易にする隠れた接続を明らかにすることによって、イノベーションのための広く有用なフレームワークを確立する。

Using generative Artificial Intelligence (AI), we transformed a set of 1,000 scientific papers in the area of biological materials into detailed ontological knowledge graphs, revealing their inherently scale-free nature. Using graph traversal path detection between dissimilar concepts based on combinatorial ranking of node similarity and betweenness centrality, we reveal deep insights into unprecedented interdisciplinary relationships that can be used to answer queries, identify gaps in knowledge, and propose never-before-seen material designs and their behaviors. One comparison revealed detailed structural parallels between biological materials and Beethoven's 9th Symphony, highlighting shared patterns of complexity through isomorphic mapping. The algorithm further created an innovative hierarchical mycelium-based composite that incorporates joint synthesis of graph sampling with principles extracted from Kandinsky's Composition VII painting, where the resulting composite reflects a balance of chaos and order, with features like adjustable porosity, mechanical strength, and complex patterned chemical functionalization. We uncover other isomorphisms across physical, biological, and artistic spheres, revealing a nuanced ontology of immanence and material flux that resonates with postmodern philosophy, and positions these interconnections within a heterarchical framework. Our findings reveal the dynamic, context-dependent interplay of entities beyond traditional hierarchical paradigms, emphasizing the significant role of individual components and their fluctuative relationships within the system. Our predictions achieve a far higher degree of novelty, technical detail and explorative capacity than conventional generative AI methods. The approach establishes a widely useful framework for innovation by revealing hidden connections that facilitate discovery.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# リカレントニューラルネットワーク重み行列の有用な表現法

Learning Useful Representations of Recurrent Neural Network Weight Matrices ( http://arxiv.org/abs/2403.11998v1 )

ライセンス: Link先を確認

Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber,

(参考訳) リカレントニューラルネットワーク(Recurrent Neural Networks、RNN)は、汎用並列シーケンスコンピュータである。 RNNのプログラムはその重み行列である。下流タスクと同様に、RNN分析を容易にするRNN重みの有用な表現をどうやって学習するか? メカニスティックなアプローチは、その振る舞いを予測するためにRNNの重みを直接調べるが、機能主義的なアプローチは、その全体的な機能、特に入出力マッピングを分析する。我々は、RNN重みに対するいくつかの力学的アプローチを検討し、RNNに対して置換同変のDeep Weight Space層を適用する。我々の2つの新しい機能主義者は、入力を探索することでRNNの重みから情報を'インターロゲート'することで抽出する。機能主義的アプローチがRNNの振る舞いを決定するのに役立つリッチな表現を生成できる条件を示す理論的枠組みを開発する。 RNN重み表現学習のための最初の2つの'モデル動物園'データセットを作成し、リリースする。 1つは形式言語のクラスの生成モデルで構成され、もう1つは逐次処理されたMNIST桁の分類器である。我々は,エミュレーションに基づく自己教師付き学習技術を用いて,複数の下流アプリケーション上で異なるRNN重み符号化技術を比較し,評価する。もっとも難しいのは、RNNがトレーニングした正確なタスクを予測することであり、機能主義者のアプローチは明らかに優位性を示している。

Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality -- specifically, its input-output mapping. We consider several mechanistic approaches for RNN weights and adapt the permutation equivariant Deep Weight Space layer for RNNs. Our two novel functionalist approaches extract information from RNN weights by 'interrogating' the RNN through probing inputs. We develop a theoretical framework that demonstrates conditions under which the functionalist approach can generate rich representations that help determine RNN behavior. We create and release the first two 'model zoo' datasets for RNN weight representation learning. One consists of generative models of a class of formal languages, and the other one of classifiers of sequentially processed MNIST digits. With the help of an emulation-based self-supervised learning technique we compare and evaluate the different RNN weight encoding techniques on multiple downstream applications. On the most challenging one, namely predicting which exact task the RNN was trained on, functionalist approaches show clear superiority.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# HIRI-ViT:高分解能入力を用いた拡張型視覚変換器

HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs ( http://arxiv.org/abs/2403.11999v1 )

ライセンス: Link先を確認

Ting Yao, Yehao Li, Yingwei Pan, Tao Mei,

(参考訳) Vision Transformer(ViT)とConvolution Neural Network(CNN)のハイブリッドディープモデルは、ビジョンタスクの強力なバックボーンクラスとして登場した。このようなハイブリッドバックボーンの入力解像度のスケールアップは、モデル容量を自然に強化するが、必然的に、二次的にスケールする重い計算コストに悩まされる。代わりに、HIgh-Resolution Inputs(HIRI-ViT)を組み込んだ新しいハイブリッドバックボーンを提案し、高解像度入力に適した4段のViTから5段のViTにアップグレードする。 HIRI-ViTは、典型的なCNN操作を2つの並列CNNブランチにコスト効率よく分解するという基本的な考え方に基づいている。 1つの高分解能分岐は入力として第一の高分解能特徴を直接取り込むが、畳み込み演算は少ない。他の低解像度ブランチは、まずダウンサンプリングを行い、その後、そのような低解像度機能に対してより畳み込み演算を利用する。認識タスク(ImageNet-1Kデータセット)と高密度予測タスク(COCOおよびADE20Kデータセット)の両方の実験は、HIRI-ViTの優位性を実証している。 HIRI-ViTは448$\times$448の入力でImageNet上で84.3%の最高のTop-1精度を実現し、224$\times$224の入力で、iFormer-Sの83.4%を0.9%改善した。

The hybrid deep models of Vision Transformer (ViT) and Convolution Neural Network (CNN) have emerged as a powerful class of backbones for vision tasks. Scaling up the input resolution of such hybrid backbones naturally strengthens model capacity, but inevitably suffers from heavy computational cost that scales quadratically. Instead, we present a new hybrid backbone with HIgh-Resolution Inputs (namely HIRI-ViT), that upgrades the prevalent four-stage ViT to a five-stage ViT tailored for high-resolution inputs. HIRI-ViT is built upon the seminal idea of decomposing the typical CNN operations into two parallel CNN branches in a cost-efficient manner. One high-resolution branch directly takes primary high-resolution features as inputs, but uses fewer convolution operations. The other low-resolution branch first performs down-sampling and then utilizes more convolution operations over such low-resolution features. Experiments on both recognition task (ImageNet-1K dataset) and dense prediction tasks (COCO and ADE20K datasets) demonstrate the superiority of HIRI-ViT. More remarkably, under comparable computational cost ($\sim$5.0 GFLOPs), HIRI-ViT achieves the best published Top-1 accuracy to date of 84.3% on ImageNet with 448$\times$448 inputs, an absolute improvement of 0.9% over the 83.4% of iFormer-S with 224$\times$224 inputs.
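The cost argument behind the two-branch split can be checked with simple arithmetic. The sketch below is not HIRI-ViT's architecture: the feature-map size, channel count, and number of convolutions per branch are hypothetical values chosen for illustration, and the cost of the down-sampling step itself is ignored. It only shows that convolution cost grows with the square of the side length, and that moving most convolutions to a 2x down-sampled branch reduces the total.

```python
def conv_flops(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulate count of one stride-1, same-padding convolution."""
    return height * width * c_in * c_out * kernel * kernel


def branch_flops(height, width, channels, num_convs):
    """Cost of a stack of identical convolutions at one fixed resolution."""
    return num_convs * conv_flops(height, width, channels, channels)


# Hypothetical sizes, chosen only to keep the arithmetic easy to follow.
H = W = 112
C = 64

# Doubling the side length quadruples the cost of a single convolution.
scaling = conv_flops(2 * H, 2 * W, C, C) / conv_flops(H, W, C, C)

# Baseline: four convolutions, all at high resolution.
single_branch = branch_flops(H, W, C, num_convs=4)

# Two-branch variant: one convolution at full resolution, three after 2x down-sampling.
high_res = branch_flops(H, W, C, num_convs=1)
low_res = branch_flops(H // 2, W // 2, C, num_convs=3)
two_branch = high_res + low_res

print(scaling)                     # 4.0
print(two_branch / single_branch)  # 0.4375
```

With these assumed numbers the two-branch layout costs under half of the baseline while still running one convolution on the full-resolution features.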

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# Notochord: リアルタイムMIDIパフォーマンスのための柔軟な確率モデル

Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance ( http://arxiv.org/abs/2403.12000v1 )

ライセンス: Link先を確認

Victor Shepardson, Jack Armitage, Thor Magnusson,

(参考訳) 深層学習に基づく音楽データの確率論的モデルは、ますます現実的な結果を生み出し、多くの種類の創造的ワークフローに入ることを約束している。しかし、パフォーマンスの面ではほとんど研究されていないため、ユーザアクションの結果は通常、瞬時に感じるべきである。このような研究を可能にするために、構造化イベントのシーケンスの深い確率モデルであるNotochordを設計し、Lakh MIDIデータセット上でそのインスタンスをトレーニングした。我々の確率的定式化により、サブイベントレベルでの解釈可能な介入が可能となり、1つのモデルがステアブルジェネレーション、調和、機械即興、可能性に基づくインタフェースを含む多様なインタラクティブな音楽機能のためのバックボーンとして機能する。 NotochordはポリフォニックおよびマルチトラックMIDIを生成し、10ミリ秒未満のレイテンシで入力に応答する。トレーニングコード、モデルチェックポイント、インタラクティブな例がオープンソースソフトウェアとして提供されている。

Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# DreamMotion:ゼロショットビデオ編集のための時空間自己相似スコア蒸留

DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing ( http://arxiv.org/abs/2403.12002v1 )

ライセンス: Link先を確認

Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye,

(参考訳) テキスト駆動拡散に基づくビデオ編集は、実際の動きを確立するという、画像編集の文献で遭遇しない独特な課題を提示する。既存のビデオ編集手法とは異なり,本研究では,通常の逆拡散過程を回避し,すでに自然な動きを示すビデオから最適化を開始するために,スコア蒸留サンプリングに焦点を当てる。分析の結果, ビデオスコア蒸留は, ターゲットテキストで示される新しいコンテンツを効果的に導入できる一方で, 重要な構造や動きのずれを引き起こす可能性があることがわかった。これに対抗するために,本研究では,原ビデオと編集ビデオの時空間自己相似性をスコア蒸留中にマッチングすることを提案する。スコア蒸留の応用により,本手法はモデル非依存であり,カスケードおよび非カスケードビデオ拡散フレームワークにも適用可能である。先行手法との比較により,従来の構造と動きを正確に保ちながら外観を変化させる上で,その優位性を示す。

Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video score distillation can effectively introduce new content indicated by target text, it can also cause significant structure and motion deviation. To counteract this, we propose to match space-time self-similarities of the original video and the edited video during the score distillation. Thanks to the use of score distillation, our approach is model-agnostic and can be applied to both cascaded and non-cascaded video diffusion frameworks. Through extensive comparisons with leading methods, our approach demonstrates its superiority in altering appearances while accurately preserving the original structure and motion.
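The idea of matching self-similarities can be illustrated with a generic sketch. The abstract does not say which features or which loss DreamMotion uses, so everything below is an assumption: each video is a flat list of feature vectors (one per space-time location), self-similarity is pairwise cosine similarity, and the mismatch is a mean squared difference. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def self_similarity(features):
    """Pairwise cosine similarity between all space-time feature vectors."""
    return [[cosine(u, v) for v in features] for u in features]


def self_similarity_loss(original, edited):
    """Mean squared difference between the two self-similarity matrices."""
    s_orig = self_similarity(original)
    s_edit = self_similarity(edited)
    n = len(original)
    total = sum((s_orig[i][j] - s_edit[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


# Toy features: three space-time locations with two channels each.
original = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rescaled = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]   # features change, relations do not
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # two locations become identical

loss_rescaled = self_similarity_loss(original, rescaled)
loss_collapsed = self_similarity_loss(original, collapsed)
```

In this toy case rescaling every feature leaves the loss at zero up to rounding, while collapsing two distinct locations into one is penalized. That is the behavior a structure-preserving term needs: it ignores changes that keep the relations between locations and reacts to changes in layout.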

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# GenView: 自己指導型学習のための事前学習型生成モデルによるビュー品質向上

GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning ( http://arxiv.org/abs/2403.12003v1 )

ライセンス: Link先を確認

Xiaojie Li, Yibo Yang, Xiangtai Li, Jianlong Wu, Yue Yu, Bernard Ghanem, Min Zhang,

(参考訳) 自己教師付き学習は、ラベルのないデータから高品質な表現を取得することに成功している。広範に採用されているコントラスト学習フレームワークは、同じ画像から生じるポジティブビュー間の距離を最小化し、不変表現を学習することを目的としている。しかし、既存の正のビューを構築する技術は手動の変換に強く依存しており、結果として多様性が限られ、潜在的に偽の正のペアが生まれる。これらの課題に対処するため、GenViewは、セマンティクスを保ちながら、事前学習された生成モデルのパワーを活用するポジティブビューの多様性を高める制御可能なフレームワークである。可変性を導入しながら本質的な意味の保存を確保するため,サンプリング中の雑音レベルを動的に調整する適応ビュー生成手法を開発した。さらに,前景の類似性と背景の多様性を両立させることにより,正の対の質を評価する品質駆動型コントラスト損失を導入する。この損失は、私たちが構築する高品質な正ペアを優先し、低品質なペアの影響を低減し、生成モデルやアグレッシブなデータ拡張によってもたらされる潜在的な意味的不整合を軽減します。肯定的なビュー品質の改善と品質主導のコントラスト損失のおかげで、GenViewはさまざまなタスクにわたる自己教師型学習を大幅に改善した。例えば、GenViewはImageNetの線形/半教師付き分類でMoCov2のパフォーマンスを2.5%/2.2%改善している。さらに、GenViewは、Laion400MやImageNet21KでImageNetデータセットを単純に拡張するよりも、はるかに優れたパフォーマンスを実現している。コードはhttps://github.com/xiaojieli0903/genview から入手できる。

Self-supervised learning has achieved remarkable success in acquiring high-quality representations from unlabeled data. The widely adopted contrastive learning framework aims to learn invariant representations by minimizing the distance between positive views originating from the same image. However, existing techniques to construct positive views highly rely on manual transformations, resulting in limited diversity and potentially false positive pairs. To tackle these challenges, we present GenView, a controllable framework that augments the diversity of positive views leveraging the power of pretrained generative models while preserving semantics. We develop an adaptive view generation method that dynamically adjusts the noise level in sampling to ensure the preservation of essential semantic meaning while introducing variability. Additionally, we introduce a quality-driven contrastive loss, which assesses the quality of positive pairs by considering both foreground similarity and background diversity. This loss prioritizes the high-quality positive pairs we construct while reducing the influence of low-quality pairs, thereby mitigating potential semantic inconsistencies introduced by generative models and aggressive data augmentation. Thanks to the improved positive view quality and the quality-driven contrastive loss, GenView significantly improves self-supervised learning across various tasks. For instance, GenView improves MoCov2 performance by 2.5%/2.2% on ImageNet linear/semi-supervised classification. Moreover, GenView even performs much better than naively augmenting the ImageNet dataset with Laion400M or ImageNet21K. Code is available at https://github.com/xiaojieli0903/genview.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# 機械学習における信頼の可視化:2023年のフィールドの現状

Visualization for Trust in Machine Learning Revisited: The State of the Field in 2023 ( http://arxiv.org/abs/2403.12005v1 )

ライセンス: Link先を確認

Angelos Chatzimparmpas, Kostiantyn Kucher, Andreas Kerren,

(参考訳) 説明可能な信頼性のある機械学習のための可視化は、医療、金融、バイオインフォマティクスなど、さまざまな応用分野における情報可視化と視覚分析において、最も重要な研究分野の1つである。 2020年、200のテクニックからなる最先端のレポートの後、可視化技術に関する査読された論文を継続的に収集し、119のカテゴリからなる以前に確立された分類スキーマに基づいて分類し、オンラインサーベイブラウザで542のテクニックの収集を行った。本稿では,2023年秋以降のこのデータセットの新たな分析結果について報告し,機械学習における可視化利用に関するトレンド,洞察,8つのオープン課題について論じる。我々の結果は、過去3年間に機械学習モデルの信頼性を高めるための可視化技術の急成長傾向を裏付けるもので、可視化は一般的なモデル説明可能性の手法の改善や、新しいディープラーニングアーキテクチャのチェックに役立ちます。

Visualization for explainable and trustworthy machine learning remains one of the most important and heavily researched fields within information visualization and visual analytics with various application domains, such as medicine, finance, and bioinformatics. After our 2020 state-of-the-art report comprising 200 techniques, we have persistently collected peer-reviewed articles describing visualization techniques, categorized them based on the previously established categorization schema consisting of 119 categories, and provided the resulting collection of 542 techniques in an online survey browser. In this survey article, we present the updated findings of new analyses of this dataset as of fall 2023 and discuss trends, insights, and eight open challenges for using visualizations in machine learning. Our results corroborate the rapidly growing trend of visualization techniques for increasing trust in machine learning models in the past three years, with visualization found to help improve popular model explainability methods and check new deep learning architectures, for instance.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# SV3D:潜時ビデオ拡散を用いた単一画像からの新しい多視点合成と3次元生成

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion ( http://arxiv.org/abs/2403.12008v1 )

ライセンス: Link先を確認

Vikram Voleti, Chun-Han Yao, Mark Boss, Adam Letts, David Pankratz, Dmitry Tochilkin, Christian Laforte, Robin Rombach, Varun Jampani,

(参考訳) 安定ビデオ3D(SV3D) - 3Dオブジェクトの周囲の高解像度・画像・マルチビュー生成のための潜時ビデオ拡散モデルを提案する。最近の3D生成技術は、新しいビュー合成(NVS)と3D最適化のために2D生成モデルを適応させる手法を提案する。しかし、これらの手法は、限られた視点や一貫性のないNVSのいずれかのため、いくつかの欠点があり、3次元オブジェクト生成の性能に影響を及ぼす。本研究では,新たな多視点合成と3D生成に画像間拡散モデルを適用するSV3Dを提案する。また,SV3DとそのNVS出力を画像から3D生成に利用するための改良された3D最適化手法を提案する。 2Dと3Dのメトリクスを持つ複数のデータセットの大規模な実験結果とユーザスタディは、SV3DのNVSにおける最先端のパフォーマンスと、以前の作業と比較して3D再構成を実証している。

We present Stable Video 3D (SV3D) -- a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent work on 3D generation proposes techniques to adapt 2D generative models for novel view synthesis (NVS) and 3D optimization. However, these methods have several disadvantages due to either limited views or inconsistent NVS, thereby affecting the performance of 3D object generation. In this work, we propose SV3D, which adapts an image-to-video diffusion model for novel multi-view synthesis and 3D generation, thereby leveraging the generalization and multi-view consistency of the video models, while further adding explicit camera control for NVS. We also propose improved 3D optimization techniques to use SV3D and its NVS outputs for image-to-3D generation. Extensive experimental results on multiple datasets with 2D and 3D metrics as well as a user study demonstrate SV3D's state-of-the-art performance on NVS as well as 3D reconstruction compared to prior works.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# VideoMV: 大規模映像生成モデルに基づく一貫したマルチビュー生成

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model ( http://arxiv.org/abs/2403.12010v1 )

ライセンス: Link先を確認

Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang,

(参考訳) テキストやシングルイメージのプロンプトに基づいてマルチビュー画像を生成することは、3Dコンテンツを作成する上で重要な機能である。このトピックに関する2つの基本的な質問は、トレーニングに使用するデータと、マルチビューの一貫性を保証する方法です。本稿では,両質問に基礎的貢献を行う新しい枠組みを紹介する。トレーニングのために2次元拡散モデルからの画像を利用するのと異なり、市販のビデオ生成モデルから微調整された密集した一貫した多視点生成モデルを提案する。映像生成モデルからのイメージは、フレームの一貫性を強制するために時間モジュールを使用するため、マルチビュー生成に適している。さらに、これらのモデルをトレーニングするために使用されるビデオデータセットは豊富かつ多様であり、学習と微調整の間のドメインギャップを減らしている。マルチビューの整合性を高めるために,まずフィードフォワード再構成モジュールを用いて明示的なグローバル3Dモデルを得る3D-Aware Denoising Samplingを導入し,次に,グローバルな3Dモデルから描画された画像をデノージングサンプリングループに効果的に取り込むサンプリング戦略を適用し,最終画像のマルチビュー整合性を改善する。副産物として、このモジュールはまた、数秒で3Dガウシアンによって表される3Dアセットを作成する高速な方法を提供する。我々のアプローチは24の密なビューを生成でき,同等の視覚的品質と一貫性を保ちながら,最先端のアプローチよりもはるかに高速に学習が収束する(4GPU時間対数千GPU時間)。さらに微調整を行うことで、既存の最先端手法よりも定量的メトリクスと視覚効果の両面で優れる。プロジェクトページは aigc3d.github.io/VideoMV。

Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# HOIDiffusion:リアルな3Dハンドオブジェクトインタラクションデータを生成する

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data ( http://arxiv.org/abs/2403.12011v1 )

ライセンス: Link先を確認

Mengqi Zhang, Yang Fu, Zheng Ding, Sifei Liu, Zhuowen Tu, Xiaolong Wang,

(参考訳) 3Dハンドオブジェクトのインタラクションデータは、データ収集プロセスのスケールアップにおけるハードウェア上の制約のため、ほとんどありません。本稿では,現実的かつ多様な3次元ハンドオブジェクトインタラクションデータを生成するためのHOIDiffusionを提案する。本モデルは,3次元手対象幾何学構造とテキスト記述を画像合成の入力として用いた条件拡散モデルである。これは、構造とスタイルの入力を非交互に指定できるため、より制御可能で現実的な合成を提供する。 HOIDiffusionは、大規模な自然画像と数枚の人間の実演で事前訓練された拡散モデルを活用することで訓練される。制御可能な画像合成以外にも、生成した3Dデータを用いて6次元オブジェクトのポーズ推定を学習し、認識システムの改善にその効果を示す。プロジェクトページ: https://mq-zhang1.github.io/HOIDiffusion

3D hand-object interaction data is scarce due to the hardware constraints in scaling up the data collection process. In this paper, we propose HOIDiffusion for generating realistic and diverse 3D hand-object interaction data. Our model is a conditional diffusion model that takes both the 3D hand-object geometric structure and text description as inputs for image synthesis. This offers a more controllable and realistic synthesis as we can specify the structure and style inputs in a disentangled manner. HOIDiffusion is trained by leveraging a diffusion model pre-trained on large-scale natural images and a few 3D human demonstrations. Beyond controllable image synthesis, we adopt the generated 3D data for learning 6D object pose estimation and show its effectiveness in improving perception systems. Project page: https://mq-zhang1.github.io/HOIDiffusion

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# リー群上のKinetic Langevin Monte Carloの収束性

Convergence of Kinetic Langevin Monte Carlo on Lie groups ( http://arxiv.org/abs/2403.12012v1 )

ライセンス: Link先を確認

Lingkai Kong, Molei Tao,

(参考訳) リー群上で定義される関数を最適化するための明示的で運動量に基づく力学は、変分最適化や左自明化といった手法に基づいて最近構築された。我々は、ポテンシャル関数が多様体上に存在するにもかかわらず、運動量変数がユークリッドであるという利点を生かして、最適化力学をサンプリング力学に変換するために、トラクタブルノイズを適切に加える。次に,得られた運動論的ランジュバン型のサンプリング力学を慎重に離散化することで,リー群MCMCサンプラーを提案する。リー群構造は、この離散化によって正確に保存される。連続力学と離散サンプラーの両方について、明示的な収束率を持つ指数収束がW2距離の下で証明される。リー群のコンパクト性とポテンシャル関数の測地的L-滑らか性のみが必要である。我々の知る限りでは、これは曲がった空間上での運動論的ランジュバンに対する初めての収束結果であり、凸性も、少なくとも明示的には等周性のような一般的な緩和条件も必要としない最初の定量的結果でもある。

Explicit, momentum-based dynamics for optimizing functions defined on Lie groups were recently constructed, based on techniques such as variational optimization and left trivialization. We appropriately add tractable noise to the optimization dynamics to turn it into a sampling dynamics, leveraging the advantageous feature that the momentum variable is Euclidean even though the potential function lives on a manifold. We then propose a Lie-group MCMC sampler, by delicately discretizing the resulting kinetic-Langevin-type sampling dynamics. The Lie group structure is exactly preserved by this discretization. Exponential convergence with an explicit convergence rate for both the continuous dynamics and the discrete sampler is then proved under W2 distance. Only compactness of the Lie group and geodesic L-smoothness of the potential function are needed. To the best of our knowledge, this is the first convergence result for kinetic Langevin on curved spaces, and also the first quantitative result that requires no convexity or, at least not explicitly, any common relaxation such as isoperimetry.
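The structure-preservation claim is easiest to see on the simplest compact Lie group. The sketch below is not the discretization analyzed in the paper. It is a basic splitting scheme on U(1) (unit complex numbers, isomorphic to SO(2)) with a hypothetical potential U(g) = -Re(g), and the step size, friction, and function names are all assumptions made for illustration. The momentum `xi` lives in the one-dimensional, Euclidean Lie algebra, and the position is updated only through the group exponential.

```python
import cmath
import math
import random


def grad_potential(g):
    """Left-trivialized gradient of U(g) = -Re(g) = -cos(theta) on U(1)."""
    return g.imag  # d/dtheta of -cos(theta) is sin(theta)


def kinetic_langevin_step(g, xi, h, gamma, rng):
    """One splitting step: force kick, exact friction-plus-noise, group update."""
    # Momentum lives in the Euclidean Lie algebra, so ordinary arithmetic applies.
    xi = xi - h * grad_potential(g)
    decay = math.exp(-gamma * h)
    xi = decay * xi + math.sqrt(1.0 - decay * decay) * rng.gauss(0.0, 1.0)
    # Position moves by right-multiplying with exp(h * xi); g stays on the group.
    g = g * cmath.exp(1j * h * xi)
    return g, xi


def sample(num_steps=20000, h=0.05, gamma=1.0, seed=0):
    """Run the sampler from the identity element and return the trajectory."""
    rng = random.Random(seed)
    g, xi = 1 + 0j, 0.0
    trajectory = []
    for _ in range(num_steps):
        g, xi = kinetic_langevin_step(g, xi, h, gamma, rng)
        trajectory.append(g)
    return trajectory


trajectory = sample()
max_drift = max(abs(abs(g) - 1.0) for g in trajectory)  # distance from the unit circle
mean_cos = sum(g.real for g in trajectory) / len(trajectory)
```

Because each position update multiplies by a unit-modulus number, the iterate leaves the circle only by floating-point rounding, with no projection step needed. The average of `g.real` should come out positive, since the assumed target density is proportional to exp(cos(theta)) and concentrates near the identity.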

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# GeoWizard: 単一画像からの3次元幾何推定のための拡散優先事項の解放

GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image ( http://arxiv.org/abs/2403.12013v1 )

ライセンス: Link先を確認

Xiao Fu, Wei Yin, Mu Hu, Kaixuan Wang, Yuexin Ma, Ping Tan, Shaojie Shen, Dahua Lin, Xiaoxiao Long,

(参考訳) 幾何学的属性,例えば深さ,正規度を単一画像から推定するための新しい生成基盤モデルであるGeoWizardを紹介する。この領域ではすでに重要な研究が行われているが、公開データセットの多様性と品質の低さにより、進歩は著しく制限されている。結果として、以前の作品は限られたシナリオに制約されるか、幾何学的詳細を捉えることができないことに悩まされる。本稿では、従来の識別モデル(例えば、CNN、トランスフォーマー)とは対照的に、生成モデルは本質的に不適切な問題に効果的に対処できることを実証する。さらに,拡散前処理の活用により,資源利用の一般化,詳細な保存,効率性が著しく向上することが示唆された。具体的には,従来の安定拡散モデルを拡張して,両表現間の相互情報交換と高整合性を実現する。より重要なことは、様々なシーンの複雑なデータ分布を異なるサブディストリビューションに分離する、単純かつ効果的な戦略を提案することである。この戦略により,我々のモデルは異なるシーンレイアウトを認識でき,顕著な忠実さで3次元幾何学を捉えることができる。 GeoWizardは、ゼロショット深度と通常の予測のための新しいベンチマークを設定し、3D再構成、2Dコンテンツ作成、新しい視点合成など、多くの下流アプリケーションを大幅に強化した。

We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes, e.g., depth and normals, from single images. While significant research has already been conducted in this area, the progress has been substantially limited by the low diversity and poor quality of publicly available datasets. As a result, the prior works either are constrained to limited scenarios or suffer from the inability to capture geometric details. In this paper, we demonstrate that generative models, as opposed to traditional discriminative models (e.g., CNNs and Transformers), can effectively address the inherently ill-posed problem. We further show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage. Specifically, we extend the original stable diffusion model to jointly predict depth and normal, allowing mutual information exchange and high consistency between the two representations. More importantly, we propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions. This strategy enables our model to recognize different scene layouts, capturing 3D geometry with remarkable fidelity. GeoWizard sets new benchmarks for zero-shot depth and normal prediction, significantly enhancing many downstream applications such as 3D reconstruction, 2D content creation, and novel viewpoint synthesis.

翻訳日:2024-03-20 19:11:08 公開日:2024-03-18

# EnvGen: 身体性エージェントを訓練するためのLLMによる環境の生成と適応

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents ( http://arxiv.org/abs/2403.12014v1 )

ライセンス: Link先を確認

Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal,

(参考訳) インタラクションを通じた身体性学習(embodied learning)に対する近年のSOTAアプローチでは、環境における次のステップを決定するために、大規模言語モデル(LLM)を直接エージェントとして採用している。世界知識と推論能力のおかげで、LLMエージェントは強化学習(RL)に基づく従来のより小さなエージェントよりも高い性能を達成するが、LLMを頻繁に呼び出すのは遅くて高価である。 LLMをエージェントとして直接利用する代わりに、LLMの推論能力を使って、より小さな身体性RLエージェントが苦手とする有用なスキルを学習できるよう、トレーニング環境を適応的に作成できるだろうか。本稿では、この問いに取り組むための新しいフレームワークであるEnvGenを提案する。まず、エージェントが異なるタスクを並列に素早く学習できるようなトレーニング環境を生成するようLLMに促す。具体的には、LLMには、エージェントが学習すべきタスクの記述とシミュレーターの目的が与えられ、その後、環境設定(例えば、異なる地形、エージェントに与えられるアイテムなど)のセットを生成するように要求される。次に、元の環境とLLM生成環境を混合して小さなRLエージェントを訓練する。さらに、エージェントのパフォーマンスという形でLLMにフィードバックを与えることで、LLMが生成環境を継続的に適応させ、エージェントが苦手とするスキルを徐々に向上させる。 Crafter および Heist 環境での総合的な実験により、EnvGen の有用性を実証する。EnvGenで訓練された小さなRLエージェントは、GPT-4エージェントを含むSOTA手法を上回ることができ、長期的な(long-horizon)タスクを大幅に高速に学習することがわかった。また、LLMがトレーニング環境を適応させ、RLエージェントの苦手なスキルを時間とともに改善していく様子を定性的に示す。加えて、LLM エージェントが数千回の LLM コールを必要とするのに対し、EnvGen は少数(例えば、合計 4 回)の LLM コールしか使用しないため、はるかに効率的である。最後に、設計選択に関する詳細なアブレーション研究について述べる。

Recent SOTA approaches for embodied learning via interaction directly employ large language models (LLMs) as agents to determine the next steps in an environment. Due to their world knowledge and reasoning capabilities, LLM agents achieve stronger performance than previous smaller agents based on reinforcement learning (RL); however, frequently calling LLMs is slow and expensive. Instead of directly employing LLMs as agents, can we use LLMs' reasoning capabilities to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak at? We propose EnvGen, a novel framework to address this question. First, we prompt an LLM to generate training environments that allow agents to quickly learn different tasks in parallel. Concretely, the LLM is given the task description and simulator objectives that the agents should learn and is then asked to generate a set of environment configurations (e.g., different terrains, items given to agents, etc.). Next, we train a small RL agent in a mixture of the original and LLM-generated environments. Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance. We demonstrate the usefulness of EnvGen with comprehensive experiments in Crafter and Heist environments. We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster. We show qualitatively how the LLM adapts training environments to help improve RL agents' weaker skills over time. Additionally, EnvGen is substantially more efficient as it only uses a small number of LLM calls (e.g., 4 in total), whereas LLM agents require thousands of LLM calls. Lastly, we present detailed ablation studies for our design choices.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
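
The adaptive cycle described in the EnvGen abstract above (generate environments, train a small agent on a mix of original and generated environments, feed performance back) can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm_generate_envs` is a hypothetical stand-in for the LLM call, and the skill names, starting success rates, and learning gains are invented for the example.

```python
# Toy sketch of an EnvGen-style loop. All names and numbers are hypothetical.
SKILLS = ["collect_wood", "make_pickaxe", "collect_iron", "defeat_zombie"]

def llm_generate_envs(feedback, n_envs=2):
    """Stand-in for one LLM call: return environment configs that target
    the skills with the lowest success rate in `feedback`."""
    weakest = sorted(SKILLS, key=lambda s: feedback[s])[:n_envs]
    return [{"focus": skill} for skill in weakest]

def train(success, env):
    """Toy 'RL training': practising in an environment raises success rates,
    and the skill the environment focuses on improves faster."""
    focus = env.get("focus")
    for skill in SKILLS:
        gain = 0.20 if skill == focus else 0.02
        success[skill] = min(1.0, success[skill] + gain)

def envgen_loop(n_cycles=4):
    success = {"collect_wood": 0.6, "make_pickaxe": 0.3,
               "collect_iron": 0.1, "defeat_zombie": 0.2}
    llm_calls = 0
    for _ in range(n_cycles):
        envs = llm_generate_envs(success)  # feedback = current performance
        llm_calls += 1                     # one LLM call per cycle
        for env in envs:
            train(success, env)            # LLM-generated environments
        train(success, {"focus": None})    # original environment
    return success, llm_calls

if __name__ == "__main__":
    final_success, calls = envgen_loop()
    print(calls, final_success)
```

In this sketch the stand-in LLM is queried once per cycle, so four cycles cost four calls, which mirrors the "e.g., 4 in total" figure quoted in the abstract; the agent itself never calls the LLM while acting.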

# 潜在敵対的拡散蒸留による高速高解像度画像合成

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation ( http://arxiv.org/abs/2403.12015v1 )

ライセンス: Link先を確認

Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach,

(参考訳) 拡散モデルは画像合成とビデオ合成の進歩の主要因であるが、推論速度の遅さに悩まされている。最近導入された敵対的拡散蒸留(ADD)のような蒸留法は、モデルを多ショット(many-shot)推論から単一ステップ推論へ移行することを目指すが、固定された事前訓練済みDINOv2識別器に依存するため、高価で困難な最適化という代償を伴う。 ADDの限界を克服する新しい蒸留法であるLADD(Latent Adversarial Diffusion Distillation)を導入する。ピクセルベースのADDとは対照的に、LADDは事前訓練された潜在拡散モデルの生成的特徴を利用する。このアプローチは、訓練を単純化し、性能を向上させ、高解像度かつ複数アスペクト比の画像合成を可能にする。 LADDをStable Diffusion 3 (8B) に適用し、ガイダンスなしの4回のサンプリングステップのみで最先端のテキスト画像生成器の性能に匹敵する高速モデルSD3-Turboを得る。さらに、そのスケーリング挙動を体系的に検討し、画像編集やインペインティングなどの様々な応用においてLADDの有効性を示す。

Diffusion models are the main driver of progress in image and video synthesis, but suffer from slow inference speed. Distillation methods, like the recently introduced adversarial diffusion distillation (ADD) aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator. We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach overcoming the limitations of ADD. In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models. This approach simplifies training and enhances performance, enabling high-resolution multi-aspect ratio image synthesis. We apply LADD to Stable Diffusion 3 (8B) to obtain SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. Moreover, we systematically investigate its scaling behavior and demonstrate LADD's effectiveness in various applications such as image editing and inpainting.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# 逆強化学習としての教師ありファインチューニング

Supervised Fine-Tuning as Inverse Reinforcement Learning ( http://arxiv.org/abs/2403.12017v1 )

ライセンス: Link先を確認

Hao Sun,

(参考訳) 大規模言語モデル(LLM)のアライメントに対する一般的なアプローチは、通常、人間やAIのフィードバックに依存し、特定のタイプの選好データセットへのアクセスを前提としている。本研究では、このようなデータセットの有効性に疑問を呈し、専門家による実演(デモンストレーション)とのアライメントの方がより現実的となる様々なシナリオを探る。実演データセットを用いてLLMをアライメントする問題を定式化するために、逐次的意思決定フレームワークを構築する。逆強化学習と模倣学習から洞察を得て、LLMアライメントタスクにおけるダイバージェンス最小化のための様々なアプローチを導入する。分析では、これらの異なるアプローチの質量被覆(mass-covering)とモード探索(mode-seeking)の挙動を明らかにする。あわせて、古典的な教師ありファインチューニング法の長所と短所を検討し、それぞれの手法が力を発揮するシナリオについて詳述する。

The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore various scenarios where alignment with expert demonstrations proves more realistic. We build a sequential decision-making framework to formulate the problem of aligning LLMs using demonstration datasets. Drawing insights from inverse reinforcement learning and imitation learning, we introduce various approaches for divergence minimization in the LLM alignment tasks. Our analysis highlights the mass-covering and mode-seeking behaviors of these different approaches. Inclusively, we examine the pros and cons of the classical supervised fine-tuning method, elaborating on scenarios where different methods shine.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
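
The mass-covering versus mode-seeking contrast mentioned in the abstract above can be made concrete with the two directions of the KL divergence. The abstract does not say which divergences the paper uses, so this is only the standard textbook pair; the symbols $p_E$ (expert demonstration distribution) and $\pi_\theta$ (the model's policy) are notation assumed here for illustration.

```latex
% Forward KL: expectation under the expert data. Minimizing it over \theta
% is equivalent to maximum likelihood, i.e. the usual SFT objective.
\min_\theta \mathrm{KL}\!\left(p_E \,\|\, \pi_\theta\right)
  = \min_\theta \mathbb{E}_{y \sim p_E(\cdot \mid x)}
      \left[\log p_E(y \mid x) - \log \pi_\theta(y \mid x)\right]
  \;\Longleftrightarrow\;
  \max_\theta \mathbb{E}_{y \sim p_E(\cdot \mid x)}
      \left[\log \pi_\theta(y \mid x)\right]

% Reverse KL: expectation under the model's own samples.
\min_\theta \mathrm{KL}\!\left(\pi_\theta \,\|\, p_E\right)
  = \min_\theta \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
      \left[\log \pi_\theta(y \mid x) - \log p_E(y \mid x)\right]
```

The forward direction penalizes $\pi_\theta$ wherever the expert has probability mass that the model lacks, so it tends to spread over all expert modes (mass-covering). The reverse direction penalizes the model for placing mass where the expert has little, so it tends to concentrate on a subset of modes (mode-seeking).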

# 一般化された波多野・ネルソンモデルにおける任意の順序の例外点

Exceptional points of any order in a generalized Hatano-Nelson model ( http://arxiv.org/abs/2403.12018v1 )

ライセンス: Link先を確認

Julius T. Gohsrich, Jacob Fauman, Flore K. Kunst,

(参考訳) 例外点(EP)は真に非エルミート(NH)な縮退であり、そこでは行列が対角化不可能(defective)になる。そのようなEPの次数は、合体する固有ベクトルの数によって与えられる。一方で、ほとんどの研究は、$N\leq4$次元のNHブロッホハミルトニアンにおける$N$次のEPの研究に焦点を当てている。他方で、NH表皮効果を示すモデルにおいて、システムサイズに応じてスケールする次数のEPが存在することを指摘した研究もある。本稿では、新しいタイプのEPを導入し、システムサイズに応じてスケールしない任意の次数のEPを実現する方法を示す。より長距離のホッピングを持つ、典型的な波多野・ネルソンモデルの一般化版を導入する。この系に存在するEPは、顕著な物理的特徴を示す。すなわち、関連する固有状態はサイトの部分集合に局在し、NH表皮効果を示す。さらに、EPはホッピング強度の一般的な摂動や、特定の形のオンサイト乱れに対して堅牢である。

Exceptional points (EPs) are truly non-Hermitian (NH) degeneracies where matrices become defective. The order of such an EP is given by the number of coalescing eigenvectors. On the one hand, most work focusses on studying $N$th-order EPs in $N\leq4$-dimensional NH Bloch Hamiltonians. On the other hand, some works have remarked on the existence of EPs of orders scaling with system size in models exhibiting the NH skin effect. In this letter, we introduce a new type of EP and provide a recipe on how to realize EPs of arbitrary order not scaling with system size. We introduce a generalized version of the paradigmatic Hatano-Nelson model with longer-range hoppings. The EPs existing in this system show remarkable physical features: Their associated eigenstates are localized on a subset of sites and are exhibiting the NH skin effect. Furthermore, the EPs are robust against generic perturbations in the hopping strengths as well as against a specific form of on-site disorder.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
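
As background for the entry above, the "EPs of orders scaling with system size" case can be checked by hand on the standard nearest-neighbour Hatano-Nelson chain. This is not the paper's generalized longer-range model. With open boundaries and fully one-directional hopping, the Hamiltonian is a single nilpotent Jordan block, so all eigenvectors coalesce into one. The sketch below uses plain Python, and the function names are chosen here for illustration.

```python
def hatano_nelson_obc(n, t_right=1, t_left=0):
    """Open n-site chain: t_right hops site j -> j+1, t_left hops j+1 -> j."""
    h = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        h[j + 1][j] = t_right
        h[j][j + 1] = t_left
    return h

def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(h):
    """Smallest k with h^k == 0, or None if no power up to n vanishes."""
    n = len(h)
    power = h
    for k in range(1, n + 1):
        if all(x == 0 for row in power for x in row):
            return k
        power = matmul(power, h)
    return None

if __name__ == "__main__":
    for n in (3, 5, 8):
        # One-directional hopping: index n means one Jordan block of size n.
        print(n, nilpotency_index(hatano_nelson_obc(n)))
```

An n-by-n nilpotent matrix whose nilpotency index equals n has exactly one Jordan block, hence a single eigenvector, so the EP order here grows with the chain length. Restoring reciprocal hopping (`t_left=1`) gives a Hermitian chain, which is diagonalizable and for which the function returns `None`.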

# LN3Diff:高速3次元生成のためのスケーラブルな潜在ニューラルフィールド拡散

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation ( http://arxiv.org/abs/2403.12019v1 )

ライセンス: Link先を確認

Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy,

(参考訳) ニューラルレンダリングの分野は、生成モデルと微分可能レンダリング技術の進歩により、大きな進歩をみせた。 2次元拡散は成功を収めているが、統一された3次元拡散パイプラインはいまだ確立されていない。本稿では、このギャップに対処し、高速で高品質かつ汎用的な条件付き3D生成を可能にするLN3Diffという新しいフレームワークを提案する。提案手法では、3Dを考慮したアーキテクチャと変分オートエンコーダ(VAE)を用いて、入力画像を構造化されたコンパクトな3D潜在空間に符号化する。潜在表現は、トランスフォーマーベースのデコーダによって、高容量の3Dニューラルフィールドに復号される。この3Dを考慮した潜在空間上で拡散モデルを訓練することにより、ShapeNetの3D生成において最先端の性能を達成し、様々なデータセットにおける単眼3D再構成と条件付き3D生成において優れた性能を示す。さらに、推論速度の点で既存の3D拡散法を上回り、インスタンスごとの最適化を必要としない。提案するLN3Diffは3D生成モデリングにおける大きな前進であり、3Dビジョンおよびグラフィックスのタスクにおける様々な応用が期待される。

The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harnesses a 3D-aware architecture and variational autoencoder (VAE) to encode the input image into a structured, compact, and 3D latent space. The latent is decoded by a transformer-based decoder into a high-capacity 3D neural field. Through training a diffusion model on this 3D-aware latent space, our method achieves state-of-the-art performance on ShapeNet for 3D generation and demonstrates superior performance in monocular 3D reconstruction and conditional 3D generation across various datasets. Moreover, it surpasses existing 3D diffusion methods in terms of inference speed, requiring no per-instance optimization. Our proposed LN3Diff presents a significant advancement in 3D generative modeling and holds promise for various applications in 3D vision and graphics tasks.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# 4つの表記体系の探索と標準化によるホッキエン双方向翻訳の強化

Enhancing Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems ( http://arxiv.org/abs/2403.12024v1 )

ライセンス: Link先を確認

Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai,

(参考訳) 機械翻訳は主に高リソース言語(HRL)に重点を置いており、台湾ホッキエン(台湾語)のような低リソース言語(LRL)は比較的研究が進んでいない。本研究は、台湾ホッキエンと、繁体字中国語(Traditional Mandarin Chinese)および英語の双方との間の双方向翻訳モデルを開発することにより、このギャップの解消を目指す。台湾ホッキエンの漢字表記(Hokkien Han)と繁体字中国語の正書法上の類似性を活用するために、繁体字中国語に特化した事前学習済みLLaMA2-7Bモデルを用いる。総合的な実験には、台湾ホッキエンの様々な表記体系の間、および台湾ホッキエンと他のHRLとの間の翻訳タスクが含まれる。限られた単言語コーパスの利用も、モデルの台湾ホッキエン能力をさらに向上させることがわかった。次に、翻訳モデルを用いて台湾ホッキエンのすべての表記体系を漢字表記(Hokkien Han)に標準化し、さらなる性能向上を得た。加えて、逆翻訳とGPT-4を組み合わせた評価手法を導入し、LRLにおいても信頼性の高い翻訳品質評価を可能にする。この研究は台湾ホッキエンの資源ギャップを狭めることに寄与し、LLaMA 2に基づく事前学習と微調整の利点と限界を実証的に調査する。

Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Taiwanese Hokkien are relatively under-explored. This study aims to address this gap by developing a dual translation model between Taiwanese Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Taiwanese Hokkien Han and Traditional Mandarin Chinese. Our comprehensive experiments involve translation tasks across various writing systems of Taiwanese Hokkien and between Taiwanese Hokkien and other HRLs. We find that the use of a limited monolingual corpus further improves the model's Taiwanese Hokkien capabilities. We then utilize our translation model to standardize all Taiwanese Hokkien writing systems into Hokkien Han, resulting in further performance improvements. Additionally, we introduce an evaluation method incorporating back-translation and GPT-4 to ensure reliable translation quality assessment even for LRLs. The study contributes to narrowing the resource gap for Taiwanese Hokkien and empirically investigates the advantages and limitations of pre-training and fine-tuning based on LLaMA 2.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# 大規模言語モデルにおける健康の公平性に関する害とバイアスを表面化するためのツールボックス

A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models ( http://arxiv.org/abs/2403.12025v1 )

ライセンス: Link先を確認

Stephen R. Pfohl, Heather Cole-Lewis, Rory Sayres, Darlene Neal, Mercy Asiedu, Awa Dieng, Nenad Tomasev, Qazi Mamunur Rashid, Shekoofeh Azizi, Negar Rostamzadeh, Liam G. McCoy, Leo Anthony Celi, Yun Liu, Mike Schaekermann, Alanna Walton, Alicia Parrish, Chirag Nagpal, Preeti Singh, Akeiylah Dewitt, Philip Mansfield, Sushant Prakash, Katherine Heller, Alan Karthikesalingam, Christopher Semturs, Joelle Barral, Greg Corrado, Yossi Matias, Jamila Smith-Loud, Ivor Horn, Karan Singhal,

(参考訳) 大規模言語モデル(LLM)は、複雑な健康情報のニーズに応えるという大きな可能性を持つ一方で、害をもたらし、健康格差を悪化させる可能性もある。公平性(エクイティ)に関連するモデルの失敗を確実に評価することは、健康の公平性を促進するシステムを開発するための重要なステップである。本研究では、医学的質問に対するLLM生成の長文回答において、公平性に関連する害を引き起こしうるバイアスを表面化するためのリソースと方法論を提示し、Med-PaLM 2を用いた実証的ケーススタディを実施する。これは、この分野でこれまでで最大の人手評価研究となった。我々の貢献には、LLM生成回答のバイアスを人手で評価するための多因子フレームワークと、EquityMedQAが含まれる。EquityMedQAは、敵対的クエリを多く含むよう、手作業でキュレートした質問とLLMで生成した質問の両方からなる、新たに公開された7つのデータセットの集合である。我々の人手評価フレームワークとデータセット設計プロセスは、反復的な参加型アプローチと、敵対的クエリに対するMed-PaLM 2の回答に生じうるバイアスのレビューに基づいている。実証研究を通じて、多様な方法論でキュレートしたデータセット群を、複数の評価ルーブリック設計と多様な評価者グループを活用する徹底した評価プロトコルと組み合わせることで、より狭い評価アプローチでは見逃されうるバイアスが表面化することを見出した。我々の経験は、多様な評価手法を用いることと、様々な背景や専門知識を持つ評価者を巻き込むことの重要性を浮き彫りにしている。我々のフレームワークは特定の形のバイアスを特定できるが、AIシステムの導入が公平な健康アウトカムを促進するかどうかを全体論的に評価するには十分ではないことを強調する。より広いコミュニティがこれらのツールや手法を活用・発展させ、すべての人にとってアクセスしやすく公平な医療を促進するLLMという共通の目標を実現することを願っている。

Large language models (LLMs) hold immense promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. In this work, we present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and then conduct an empirical case study with Med-PaLM 2, resulting in the largest human evaluation study in this area to date. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases, and EquityMedQA, a collection of seven newly-released datasets comprising both manually-curated and LLM-generated questions enriched for adversarial queries. Both our human assessment framework and dataset design process are grounded in an iterative participatory approach and review of possible biases in Med-PaLM 2 answers to adversarial queries. Through our empirical study, we find that the use of a collection of datasets curated through a variety of methodologies, coupled with a thorough evaluation protocol that leverages multiple assessment rubric designs and diverse rater groups, surfaces biases that may be missed via narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. We emphasize that while our framework can identify specific forms of bias, it is not sufficient to holistically assess whether the deployment of an AI system promotes equitable health outcomes. We hope the broader community leverages and builds on these tools and methods towards realizing a shared goal of LLMs that promote accessible and equitable healthcare for all.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# FlexCap: 画像にリッチ、ローカライズ、フレキシブルなキャプションを生成する

FlexCap: Generating Rich, Localized, and Flexible Captions in Images ( http://arxiv.org/abs/2403.12026v1 )

ライセンス: Link先を確認

Debidatta Dwibedi, Vidhi Jain, Jonathan Tompson, Andrew Zisserman, Yusuf Aytar,

(参考訳) 様々な長さの領域固有の記述を生成できる、汎用的な$\textit{flexible-captioning}$ Vision-Language Model (VLM)を導入する。このモデルFlexCapは、入力バウンディングボックスに対して長さ条件付きのキャプションを生成するように訓練されており、これにより、簡潔なオブジェクトラベルから詳細なキャプションまで、出力の情報密度を制御できる。これを実現するために、キャプション付き画像を出発点として、長さの異なる画像領域記述の大規模なトレーニングデータセットを作成する。この柔軟なキャプション生成能力には、いくつかの価値ある応用がある。第一に、FlexCapはVisual Genomeデータセットの高密度キャプションタスクにおいて優れた性能を示す。第二に、FlexCapを用いて局所化された記述を生成し、それを大規模言語モデルへの入力とすることで、視覚的質問応答(VQA)システムを構築できる。得られたシステムは、複数のVQAデータセットで最先端のゼロショット性能を達成する。また、FlexCapを使った$\textit{localize-then-describe}$アプローチは、他のVLMによる$\textit{describe-then-localize}$アプローチよりも、オープンエンドの物体検出に優れうることを示す。さらに、プレフィックス条件付けによって多様な視覚情報を抽出できるという、FlexCapの新しい特徴を強調する。最後に、画像ラベリング、オブジェクト属性認識、ビジュアルダイアログといったタスクにおけるFlexCapの幅広い適用性を定性的に示す。プロジェクトWebページ: https://flex-cap.github.io 。

We introduce a versatile $\textit{flexible-captioning}$ vision-language model (VLM) capable of generating region-specific descriptions of varying lengths. The model, FlexCap, is trained to produce length-conditioned captions for input bounding boxes, and this allows control over the information density of its output, with descriptions ranging from concise object labels to detailed captions. To achieve this we create large-scale training datasets of image region descriptions of varying length, starting from captioned images. This flexible-captioning capability has several valuable applications. First, FlexCap demonstrates superior performance in dense captioning tasks on the Visual Genome dataset. Second, a visual question answering (VQA) system can be built by employing FlexCap to generate localized descriptions as inputs to a large language model. The resulting system achieves state-of-the-art zero-shot performance on a number of VQA datasets. We also demonstrate a $\textit{localize-then-describe}$ approach with FlexCap can be better at open-ended object detection than a $\textit{describe-then-localize}$ approach with other VLMs. We highlight a novel characteristic of FlexCap, which is its ability to extract diverse visual information through prefix conditioning. Finally, we qualitatively demonstrate FlexCap's broad applicability in tasks such as image labeling, object attribute recognition, and visual dialog. Project webpage: https://flex-cap.github.io .

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# Pixelsからインサイトへ:大規模基盤モデルの時代における自動チャート理解に関する調査

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models ( http://arxiv.org/abs/2403.12027v1 )

ライセンス: Link先を確認

Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji,

(参考訳) チャート形式でのデータの可視化は、データ分析において重要な役割を担い、重要な洞察を提供し、情報に基づく意思決定を支援する。自動チャート理解は、近年の大規模基盤モデルの台頭とともに、大きな進歩をみせている。大規模言語モデル(LLM)のような基盤モデルは、様々な自然言語処理(NLP)タスクに革命をもたらし、チャート理解タスクにもますます応用されている。本稿は、これらの基盤モデルの文脈におけるチャート理解の最近の展開、課題、今後の方向性を包括的に概観するサーベイである。まず、チャート理解を定義し、問題の定式化を概説し、チャート理解タスクの研究に不可欠な基本的構成要素について議論する。タスクとデータセットの節では、チャート理解に含まれる様々なタスクを取り上げ、それらの評価指標と、チャートおよびテキスト入力の出典について議論する。次に、分類ベースと生成ベースの両方のアプローチ、およびチャート理解性能を高めるツール拡張技術を含むモデリング戦略について検討する。さらに、各タスクの最先端性能と、その性能を改善する方法について論じる。課題と今後の方向性は専用の節で扱い、ドメイン固有のチャート、評価に関する取り組みの不足、エージェント指向の設定などの課題を取り上げる。本サーベイは、大規模基盤モデルを活用したチャート理解の今後の研究に有用な洞察と方向性を提供することを目的とする。本論文で言及した研究は、新たに現れる研究とともに、https://github.com/khuangaf/Awesome-Chart-Understanding で継続的に更新される。

Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models (LLMs), have revolutionized various natural language processing (NLP) tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. The paper begins by defining chart understanding, outlining problem formulations, and discussing fundamental building blocks crucial for studying chart understanding tasks. In the section on tasks and datasets, we explore various tasks within chart understanding and discuss their evaluation metrics and sources of both charts and textual inputs. Modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance of each task and discuss how we can improve the performance. Challenges and future directions are addressed in a dedicated section, highlighting issues such as domain-specific charts, lack of efforts in evaluation, and agent-oriented settings. This survey paper serves to provide valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# Ultraman: 超高速かつ高精細な単一画像からの3D人体再構成

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail ( http://arxiv.org/abs/2403.12028v1 )

ライセンス: Link先を確認

Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao,

(参考訳) 3D人体再構成は、コンピュータビジョンの分野における課題であり続けている。従来の方法は時間がかかることが多く、人体の詳細な外観を捉えることも難しい。本論文では、1枚の画像からテクスチャ付きの3D人体モデルを高速に再構成する新しい手法Ultramanを提案する。既存の技術と比較して、Ultramanは高品質なテクスチャの詳細を保ちながら、再構成の速度と精度を大幅に向上させる。人体再構成のための新しいフレームワーク群を提示し、これは幾何学的再構成、テクスチャ生成、テクスチャマッピングの3つの部分からなる。まず、メッシュ再構成フレームワークを使用し、単一の画像から3D人体形状を正確に抽出する。同時に、1枚の画像に基づいて、複数視点で一貫した人体画像を生成する手法を提案する。最後に、これを新しいテクスチャマッピング手法と組み合わせ、テクスチャの細部を最適化し、再構成時の色の一貫性を確保する。広範な実験と評価を通じて、様々な標準データセットにおけるUltramanの優れた性能を示す。さらに、Ultramanは人体のレンダリング品質と速度の点で最先端の手法を上回る。論文が受理され次第、コードとデータを公開する。

3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts: geometric reconstruction, texture generation, and texture mapping. Firstly, a mesh reconstruction framework is used, which accurately extracts 3D human shapes from a single image. At the same time, we propose a method to generate a multi-view consistent image of the human body based on a single image. This is finally combined with a novel texture mapping method to optimize texture details and ensure color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of Ultraman on various standard datasets. In addition, Ultraman outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# Align and Distill: ドメイン適応型オブジェクト検出の統一と改善

Align and Distill: Unifying and Improving Domain Adaptive Object Detection ( http://arxiv.org/abs/2403.12029v1 )

ライセンス: Link先を確認

Justin Kay, Timm Haucke, Suzanne Stathatos, Siqi Deng, Erik Young, Pietro Perona, Sara Beery, Grant Van Horn,

(参考訳) 物体検出器は、トレーニングセットと異なるデータに対しては性能が低下することが多い。ドメイン適応物体検出(DAOD)手法は近年、この課題への対処において優れた結果を示している。残念ながら、我々は過去の結果に疑問を投げかけ、さらなる進歩を妨げる体系的なベンチマーク上の落とし穴を特定した。 (a) 弱いベースラインに起因する性能の過大評価、(b) 手法間の透明な比較を妨げる一貫性のない実装慣行、(c) 時代遅れのバックボーンとベンチマークの多様性の欠如に起因する一般性の欠如、である。これらの問題に対処するため、以下を導入する。 (1) DAOD手法の比較を可能にし、今後の開発を支援する統一的なベンチマークおよび実装フレームワークであるAlign and Distill (ALDI)、(2) ベンチマーク上の落とし穴に対処する、DAODのための公正かつ現代的なトレーニングおよび評価プロトコル、(3) 多様な実世界データでの評価を可能にする新しいDAODベンチマークデータセットであるCFC-DAOD、(4) 大差で最先端の結果を達成する新手法ALDI++。 ALDI++は、Cityscapes to Foggy Cityscapesで+3.5 AP50、Sim10k to Cityscapesで+5.7 AP50(公正なベースラインを上回る手法は我々のもののみ)、CFC Kenai to Channelで+2.0 AP50、従来の最先端を上回る。我々のフレームワーク、データセット、最先端の手法は、DAODに重要な再出発点を与え、将来の研究の強固な基盤を提供する。コードとデータは https://github.com/justinkay/aldi および https://github.com/visipedia/caltech-fish-counting で入手できる。

Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on addressing this challenge. Unfortunately, we identify systemic benchmarking pitfalls that call past results into question and hamper further progress: (a) Overestimation of performance due to underpowered baselines, (b) Inconsistent implementation practices preventing transparent comparisons of methods, and (c) Lack of generality due to outdated backbones and lack of diversity in benchmarks. We address these problems by introducing: (1) A unified benchmarking and implementation framework, Align and Distill (ALDI), enabling comparison of DAOD methods and supporting future development, (2) A fair and modern training and evaluation protocol for DAOD that addresses benchmarking pitfalls, (3) A new DAOD benchmark dataset, CFC-DAOD, enabling evaluation on diverse real-world data, and (4) A new method, ALDI++, that achieves state-of-the-art results by a large margin. ALDI++ outperforms the previous state-of-the-art by +3.5 AP50 on Cityscapes to Foggy Cityscapes, +5.7 AP50 on Sim10k to Cityscapes (where ours is the only method to outperform a fair baseline), and +2.0 AP50 on CFC Kenai to Channel. Our framework, dataset, and state-of-the-art method offer a critical reset for DAOD and provide a strong foundation for future research. Code and data are available: https://github.com/justinkay/aldi and https://github.com/visipedia/caltech-fish-counting.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# 事前学習モデルに基づくクラスインクリメンタル学習のための拡張可能なサブスペースアンサンブル

Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning ( http://arxiv.org/abs/2403.12030v1 )

ライセンス: Link先を確認

Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye, De-Chuan Zhan,

(参考訳) クラスインクリメンタル学習(CIL)では、学習システムが過去の知識を忘れることなく新しいクラスを継続的に学習することが求められる。 CILにおける事前学習モデル(PTM)の高い性能にもかかわらず、重大な問題が残っている。新しいクラスを学習すると、古いクラスの知識が上書きされてしまうことが多い。ネットワークを過度に変更すると忘却が起こり、調整が最小限だと新しいクラスへの適合が不十分になる。したがって、以前の知識を損なうことなく効率的にモデルを更新する方法を見出すことが望まれる。本稿では、PTMベースのCILのためのExpAndable Subspace Ensemble (EASE)を提案する。競合のないモデル更新を可能にするため、新しいタスクごとに別個の軽量アダプタモジュールを訓練し、タスク固有のサブスペースを作成する。これらのアダプタは高次元の特徴空間を張り、複数のサブスペースにまたがる共同の意思決定を可能にする。データが変化していくにつれ、サブスペースの拡張によって、古いクラスの分類器は新しいステージの空間と互換性がなくなる。これに対応して、古いクラスのインスタンスを一切使わずに、古いクラスの新しい特徴を合成する、意味情報に導かれたプロトタイプ補完戦略を設計する。 7つのベンチマークデータセットでの広範な実験により、EASEの最先端の性能を検証した。コードは https://github.com/sun-hailong/CVPR24-Ease で入手できる。

Class-Incremental Learning (CIL) requires a learning system to continually learn new classes without forgetting. Despite the strong performance of Pre-Trained Models (PTMs) in CIL, a critical issue persists: learning new classes often results in the overwriting of old ones. Excessive modification of the network causes forgetting, while minimal adjustments lead to an inadequate fit for new classes. As a result, it is desired to figure out a way of efficient model updating without harming former knowledge. In this paper, we propose ExpAndable Subspace Ensemble (EASE) for PTM-based CIL. To enable model updating without conflict, we train a distinct lightweight adapter module for each new task, aiming to create task-specific subspaces. These adapters span a high-dimensional feature space, enabling joint decision-making across multiple subspaces. As data evolves, the expanding subspaces render the old class classifiers incompatible with new-stage spaces. Correspondingly, we design a semantic-guided prototype complement strategy that synthesizes old classes' new features without using any old class instance. Extensive experiments on seven benchmark datasets verify EASE's state-of-the-art performance. Code is available at: https://github.com/sun-hailong/CVPR24-Ease

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18

# RouterBENCH:マルチLLMルーティングシステムのベンチマーク

ROUTERBENCH: A Benchmark for Multi-LLM Routing System ( http://arxiv.org/abs/2403.12031v1 )

ライセンス: Link先を確認

Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, Shriyash Kaustubh Upadhyay,

(参考訳) 大規模言語モデル(LLM)の応用範囲が拡大し続けるにつれ、効果的なサービング(推論提供)ソリューションへの需要がますます重要になっている。 LLMの汎用性にもかかわらず、特に性能とコストのバランスをとる場合、すべてのタスクやアプリケーションに最適に対処できる単一のモデルは存在しない。この制限により、個々のLLMの制約を克服するために、様々なモデルの強みを組み合わせるLLMルーティングシステムが開発されてきた。しかし、LLMルータの性能を評価するための標準化されたベンチマークがないことが、この分野の進歩を妨げている。このギャップを埋めるために、LLMルーティングシステムの有効性を体系的に評価するよう設計された新しい評価フレームワークであるROUTERBENCHと、ルーティング戦略の開発を支援するための、代表的なLLMによる40万5千件(405k)以上の推論結果からなる包括的なデータセットを提示する。さらに、LLMルーティングの理論的フレームワークを提案し、ROUTERBENCHを通して様々なルーティングアプローチの比較分析を行い、我々の評価フレームワークにおけるそれらの可能性と限界を明らかにする。この研究は、LLMルーティングシステムの開発を形式化し前進させるだけでなく、その評価の標準を定め、よりアクセスしやすく経済的に実現可能なLLMデプロイメントへの道を開く。コードとデータは https://github.com/withmartian/routerbench で公開されている。

As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. Yet, the absence of a standardized benchmark for evaluating the performance of LLM routers hinders progress in this area. To bridge this gap, we present ROUTERBENCH, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies. We further propose a theoretical framework for LLM routing, and deliver a comparative analysis of various routing approaches through ROUTERBENCH, highlighting their potentials and limitations within our evaluation framework. This work not only formalizes and advances the development of LLM routing systems but also sets a standard for their assessment, paving the way for more accessible and economically viable LLM deployments. The code and data are available at https://github.com/withmartian/routerbench.

翻訳日:2024-03-20 19:01:22 公開日:2024-03-18
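
The performance-versus-cost trade-off behind LLM routing, as described in the ROUTERBENCH abstract above, can be illustrated with a very small router. This is a generic sketch and not the benchmark's API or the paper's formulation: the model names, per-query costs, quality scores, and the trade-off weight `lam` are all hypothetical.

```python
# Hypothetical per-query cost of each candidate model (arbitrary units).
COST = {"small-model": 0.2, "large-model": 3.0}

def route(predicted_quality, lam):
    """Pick the model that maximizes predicted quality minus lam * cost.

    predicted_quality: dict mapping model name -> expected answer quality
    lam: how much one unit of cost is worth in quality terms (assumed knob)
    """
    return max(predicted_quality,
               key=lambda m: predicted_quality[m] - lam * COST[m])

def evaluate(prompts, lam):
    """Average quality and cost of the routed choices over a prompt set."""
    chosen = [route(q, lam) for q in prompts]
    avg_quality = sum(q[m] for q, m in zip(prompts, chosen)) / len(prompts)
    avg_cost = sum(COST[m] for m in chosen) / len(prompts)
    return avg_quality, avg_cost

if __name__ == "__main__":
    prompts = [
        {"small-model": 0.90, "large-model": 0.95},  # easy prompt
        {"small-model": 0.30, "large-model": 0.90},  # hard prompt
    ]
    for lam in (0.0, 0.1, 1.0):
        print(lam, evaluate(prompts, lam))
```

Sweeping `lam` traces out a quality-versus-cost curve: at `lam = 0` every prompt goes to the expensive model, at a moderate value only the hard prompt does, and at a large value everything goes to the cheap model. Comparing such curves across routers is one natural way to benchmark them.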

# HiKER-SGG:階層的知識によるロバストなシーングラフ生成

HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation ( http://arxiv.org/abs/2403.12033v1 )

ライセンス: Link先を確認

Ce Zhang, Simon Stepputtis, Joseph Campbell, Katia Sycara, Yaqi Xie,

(参考訳) 視覚的なシーンを理解できることは、自動運転、ロボティクス、その他の視覚に基づくアプローチなど、多くの下流タスクの前提となる。視覚データに対する推論を可能にする一般的なアプローチはシーングラフ生成(SGG)であるが、既存の多くのアプローチは、視覚が乱されていないこと、すなわち、霧、雪、煙のような実世界の画像劣化(corruption)や、太陽のグレアや水滴のような不均一な摂動が存在しないことを仮定している。そこで本研究では、Visual Genomeデータセットに対して手続き的に生成した気象による劣化やその他の変換を含む、新しいSGGベンチマークを提案する。さらに、対応するアプローチとして階層的知識強化型ロバストシーングラフ生成(HiKER-SGG)を導入し、このような困難な設定下でのシーングラフ生成の強力なベースラインを提供する。 HiKER-SGGの中核では、階層的な知識グラフを用いて、粗い初期推定から詳細な予測へと予測を洗練する。広範な実験により、HiKER-SGGは劣化した画像に対してゼロショットで優れた性能を示すだけでなく、劣化のない画像でのSGGタスクにおいても現在の最先端手法を上回ることを示す。コードは https://github.com/zhangce01/HiKER-SGG で入手できる。

Being able to understand visual scenes is a precursor for many downstream tasks, including autonomous driving, robotics, and other vision-based approaches. A common approach enabling the ability to reason over visual data is Scene Graph Generation (SGG); however, many existing approaches assume undisturbed vision, i.e., the absence of real-world corruptions such as fog, snow, smoke, as well as non-uniform perturbations like sun glare or water drops. In this work, we propose a novel SGG benchmark containing procedurally generated weather corruptions and other transformations over the Visual Genome dataset. Further, we introduce a corresponding approach, Hierarchical Knowledge Enhanced Robust Scene Graph Generation (HiKER-SGG), providing a strong baseline for scene graph generation under such challenging setting. At its core, HiKER-SGG utilizes a hierarchical knowledge graph in order to refine its predictions from coarse initial estimates to detailed predictions. In our extensive experiments, we show that HiKER-SGG does not only demonstrate superior performance on corrupted images in a zero-shot manner, but also outperforms current state-of-the-art methods on uncorrupted SGG tasks. Code is available at https://github.com/zhangce01/HiKER-SGG.

翻訳日:2024-03-20 18:51:34 公開日:2024-03-18

# VFusion3D:ビデオ拡散モデルからスケーラブルな3D生成モデルを学ぶ

VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models ( http://arxiv.org/abs/2403.12034v1 )

ライセンス: Link先を確認

Junlin Han, Filippos Kokkinos, Philip Torr,

(参考訳) 本稿では,事前学習ビデオ拡散モデルを用いたスケーラブルな3次元生成モデル構築のための新しいパラダイムを提案する。基礎3D生成モデルの開発における主要な障害は、3Dデータの可用性の制限である。画像、テキスト、ビデオとは異なり、3Dデータは容易にアクセスできず、入手が困難である。この結果、他の種類のデータと比較すると、大きな差が生じる。そこで本研究では,3次元データの知識源として,大量のテキスト,画像,ビデオで訓練されたビデオ拡散モデルを提案する。微調整により多視点生成能力を解放することにより、大規模な合成多視点データセットを生成し、フィードフォワード3D生成モデルを訓練する。提案するモデルであるVFusion3Dは,約3Mの合成マルチビューデータに基づいてトレーニングされ,単一の画像から1秒で3Dアセットを生成し,現在のSOTAフィードフォワード3D生成モデルと比較して優れた性能が得られる。

This paper presents a novel paradigm for building scalable 3D generative models utilizing pre-trained video diffusion models. The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data. To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 70% of the time.

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# CoCoCo: 一貫性,可制御性,コンパチビリティ向上のためのテキストガイド型ビデオインペインティングの改善

CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility ( http://arxiv.org/abs/2403.12035v1 )

ライセンス: Link先を確認

Bojia Zi, Shihao Zhao, Xianbiao Qi, Jianan Wang, Yukai Shi, Qianyu Chen, Bin Liang, Kam-Fai Wong, Lei Zhang,

(参考訳) 近年のビデオ生成の進歩は目覚ましいが、既存の多くの手法は一貫性の問題やテキストとビデオの整合性の低さに悩まされている。さらに、よく研究されているテキスト誘導画像インペインティングとは対照的に、テキスト誘導ビデオインペインティングの効果的な技術が欠如している。そこで本稿では, 一貫性, 制御性, 互換性を向上する新しいテキスト誘導型ビデオインペインティングモデルを提案する。具体的には、動作の一貫性を維持するためのシンプルだが効率的なモーションキャプチャモジュールを導入し、ランダムな領域選択の代わりにインスタンス対応の領域選択を設計してテキストによる制御性を向上し、さらに新しい戦略を用いてパーソナライズされたモデルをCoCoCoモデルに注入することでモデル互換性を向上させる。大規模な実験により,我々のモデルは高品質なビデオクリップを生成できることが示された。また,本モデルは動作の一貫性,テキスト制御性,モデル互換性にも優れている。詳細は[cococozibojia.github.io](cococozibojia.github.io)に示されている。

Recent advancements in video generation have been remarkable, yet many existing methods struggle with issues of consistency and poor text-video alignment. Moreover, the field lacks effective techniques for text-guided video inpainting, a stark contrast to the well-explored domain of text-guided image inpainting. To this end, this paper proposes a novel text-guided video inpainting model that achieves better consistency, controllability and compatibility. Specifically, we introduce a simple but efficient motion capture module to preserve motion consistency, and design an instance-aware region selection instead of a random region selection to obtain better textual controllability, and utilize a novel strategy to inject some personalized models into our CoCoCo model and thus obtain better model compatibility. Extensive experiments show that our model can generate high-quality video clips. Meanwhile, our model shows better motion consistency, textual controllability and model compatibility. More details are shown in [cococozibojia.github.io](cococozibojia.github.io).

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# テキスト・ツー・イメージモデルを用いたワンステップ画像翻訳

One-Step Image Translation with Text-to-Image Models ( http://arxiv.org/abs/2403.12036v1 )

ライセンス: Link先を確認

Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu,

(参考訳) 本研究では,既存の条件付き拡散モデルの2つの制限に対処する: 反復的なデノイジング過程による推論速度の遅さと,モデル微調整のためのペアデータへの依存である。これらの課題に対処するために,敵対的な学習目的を通じて,新しいタスクやドメインに単一ステップ拡散モデルを適応させるための一般的な手法を提案する。具体的には,バニラ潜在拡散モデルの様々なモジュールを,少数の訓練可能な重みを持つ単一のエンドツーエンドジェネレータネットワークに統合し,オーバーフィッティングを低減しつつ,入力画像構造を保存する能力を向上する。ペアなしの設定では,我々のモデルであるCycleGAN-Turboが, 昼夜変換や霧, 雪, 雨などの気象効果の付加・除去など, 様々なシーン変換タスクにおいて, 既存のGANベースおよび拡散ベースの手法よりも優れていることを示す。さらにこの手法をペア設定に拡張し、我々のモデルpix2pix-TurboはSketch2PhotoやEdge2ImageにおいてControl-Netのような最近の研究と同等の性能を、単一ステップの推論で達成する。本研究は, 単一ステップ拡散モデルが, さまざまなGAN学習目的の強力なバックボーンとして機能することを示唆している。私たちのコードとモデルはhttps://github.com/GaParmar/img2img-turboで公開されています。

In this work, we address two limitations of existing conditional diffusion models: their slow inference speed due to the iterative denoising process and their reliance on paired data for model fine-tuning. To tackle these issues, we introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. Specifically, we consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights, enhancing its ability to preserve the input image structure while reducing overfitting. We demonstrate that, for unpaired settings, our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks, such as day-to-night conversion and adding/removing weather effects like fog, snow, and rain. We extend our method to paired settings, where our model pix2pix-Turbo is on par with recent works like Control-Net for Sketch2Photo and Edge2Image, but with a single-step inference. This work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. Our code and models are available at https://github.com/GaParmar/img2img-turbo.

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# 深層関数写像を用いたゼロショット画像特徴コンセンサス

Zero-Shot Image Feature Consensus with Deep Functional Maps ( http://arxiv.org/abs/2403.12038v1 )

ライセンス: Link先を確認

Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas,

(参考訳) 対応は、生成的および識別的なタスクのために訓練された大規模な視覚モデルから生じる。これは、特徴格子上の最も近い隣人を用いて、一対のイメージ間の対応マップの計算によって明らかにされ、ベンチマークされている。既存の作業は、異なるレイヤやネットワークの特徴を組み合わせるなど、異なるソースからの機能を慎重に混合することで、これらの対応マップの品質向上を図っている。より優れた対応戦略が可能であることを指摘し、対応フィールドに直接構造を課す関数写像について述べる。この単純な数学的ツールを用いて、画素空間から関数空間への対応問題を解き、グローバルに一貫性のある写像を直接最適化する。本手法は,学習対象の大規模視覚モデルに埋め込まれた知識をよりよく反映し,よりスムーズなだけでなく,より正確に対応できることを示す。我々の手法は、様々な密接な対応タスクに新たな最先端を設定できる。また,キーポイント対応やアベイランスマップの転送にも有効であることを示す。

Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. Existing work has attempted to improve the quality of these correspondence maps by carefully mixing features from different sources, such as by combining the features of different layers or networks. We point out that a better correspondence strategy is available, which directly imposes structure on the correspondence field: the functional map. Wielding this simple mathematical tool, we lift the correspondence problem from the pixel space to the function space and directly optimize for mappings that are globally coherent. We demonstrate that our technique yields correspondences that are not only smoother but also more accurate, with the possibility of better reflecting the knowledge embedded in the large-scale vision models that we are studying. Our approach sets a new state-of-the-art on various dense correspondence tasks. We also demonstrate our effectiveness in keypoint correspondence and affordance map transfer.

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# 1枚以下の画像にデータセットを蒸留する

Distilling Datasets Into Less Than One Image ( http://arxiv.org/abs/2403.12040v1 )

ライセンス: Link先を確認

Asaf Shul, Eliahu Horwitz, Yedid Hoshen,

(参考訳) データセット蒸留は、データセットをはるかに小さなデータセットに圧縮し、蒸留データセットでトレーニングされたモデルが高い精度を達成することを目的としている。現在の方法では、K を正の整数として、クラス当たりK枚の蒸留画像という予算に対する分類精度の最大化として定式化している。本稿では,データセット蒸留の限界を押し広げ,データセットをクラス当たり1枚未満の画像に圧縮する。意味のある量は、クラス当たりの蒸留画像数ではなく、データセット当たりの蒸留画素数であることに気付くことが重要である。そこで,元のデータセット全体を1枚のポスターに蒸留する新しい手法であるPoster Dataset Distillation (PoDD)を提案する。ポスターアプローチは、トレーニング画像と学習可能なラベルを作成するための新しい技術的解決策を動機付けている。本手法は,クラス当たり1枚の画像を用いる既存手法と比較して,クラス当たり1枚未満の画像で同等あるいは優れた性能を実現できる。具体的には, CIFAR-10, CIFAR-100, CUB200において, クラス当たりわずか0.3枚の画像で新しい最先端性能を達成する。

Dataset distillation aims to compress a dataset into a much smaller one so that a model trained on the distilled dataset achieves high accuracy. Current methods frame this as maximizing the distilled classification accuracy for a budget of K distilled images-per-class, where K is a positive integer. In this paper, we push the boundaries of dataset distillation, compressing the dataset into less than an image-per-class. It is important to realize that the meaningful quantity is not the number of distilled images-per-class but the number of distilled pixels-per-dataset. We therefore propose Poster Dataset Distillation (PoDD), a new approach that distills the entire original dataset into a single poster. The poster approach motivates new technical solutions for creating training images and learnable labels. Our method can achieve comparable or better performance with less than an image-per-class compared to existing methods that use one image-per-class. Specifically, our method establishes a new state-of-the-art performance on CIFAR-10, CIFAR-100, and CUB200 using as little as 0.3 images-per-class.
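The pixels-per-dataset accounting can be illustrated with a small sketch. It assumes, as a hypothetical reading of the poster idea, that training images are cut from the poster as overlapping windows; the poster size, the stride, and both function names are illustrative and are not taken from the paper.

```python
def patch_origins(poster_h, poster_w, patch, stride):
    """Top-left corners of every patch x patch window taken with the given stride."""
    rows = range(0, poster_h - patch + 1, stride)
    cols = range(0, poster_w - patch + 1, stride)
    return [(r, c) for r in rows for c in cols]


def images_per_class(poster_h, poster_w, patch, num_classes):
    """Express the poster's pixel budget in units of images-per-class."""
    return (poster_h * poster_w) / (patch * patch * num_classes)


# Hypothetical poster of 32 x 96 pixels: the pixel count of three 32 x 32 images.
origins = patch_origins(32, 96, patch=32, stride=16)
ipc = images_per_class(32, 96, patch=32, num_classes=10)
```

With these assumed numbers, a 10-class dataset is stored at a budget of 0.3 images-per-class, yet five overlapping 32 x 32 windows can be read from the poster because neighboring windows share pixels.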

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# ビデオオブジェクトセグメンテーション参照のための事前学習型テキスト・ビデオ拡散モデルの検討

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation ( http://arxiv.org/abs/2403.12042v1 )

ライセンス: Link先を確認

Zixin Zhu, Xuelu Feng, Dongdong Chen, Junsong Yuan, Chunming Qiao, Gang Hua,

(参考訳) 本稿では,ビデオ理解タスクのための事前学習されたテキスト・ツー・ビデオ(T2V)拡散モデルから生成された視覚表現について検討する。我々は、事前訓練された生成的T2Vモデルから学習した潜在表現が、豊かな意味論と一貫性のある時間的対応をカプセル化し、ビデオ理解を自然に促進する、という仮説を立てる。我々の仮説は古典的な参照ビデオオブジェクトセグメンテーション(R-VOS)タスクによって検証される。固定された事前訓練済みT2Vモデル上に構築され、専用に設計されたコンポーネントを備えた新しいフレームワーク「VD-IT」を導入する。具体的には、VD-ITはテキスト情報を条件入力として使用し、正確な時間的インスタンスマッチングのための時間方向のセマンティック一貫性を保証する。さらに、画像トークンを補足的なテキスト入力として組み込んで特徴セットを充実させ、詳細かつニュアンスのあるマスクを生成する。また、標準のガウスノイズの代わりに、追加のノイズ予測モジュールを用いて映像特有のノイズを予測することを提案し、これにより特徴の忠実さを保ち、セグメンテーション品質を高める。広範な実験により,識別的な画像・ビデオの事前タスクで事前訓練された一般的なビデオバックボーン(例えばVideo Swin Transformer)とは異なり,固定された生成的T2V拡散モデルが,意味的アライメントと時間的整合性を維持する上でより優れた可能性を示すことが確認された。既存の標準ベンチマークでは、我々のVD-ITは、多くの最先端の手法を上回る、非常に競争力のある結果を得る。コードは https://github.com/buxiangzhiren/VD-IT で公開予定である。

In this paper, we explore the visual representations produced by a pre-trained text-to-video (T2V) diffusion model for video understanding tasks. We hypothesize that the latent representation learned by a pre-trained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding. Our hypothesis is validated through the classic referring video object segmentation (R-VOS) task. We introduce a novel framework, termed "VD-IT", with dedicated components built upon a fixed pre-trained T2V model. Specifically, VD-IT uses textual information as a conditional input, ensuring semantic consistency across time for precise temporal instance matching. It further incorporates image tokens as supplementary textual inputs, enriching the feature set to generate detailed and nuanced masks. Besides, instead of using standard Gaussian noise, we propose to predict video-specific noise with an extra noise prediction module, which helps preserve feature fidelity and elevates segmentation quality. Through extensive experiments, we surprisingly observe that fixed generative T2V diffusion models, unlike commonly used video backbones (e.g., Video Swin Transformer) pre-trained with discriminative image/video pre-tasks, exhibit better potential to maintain semantic alignment and temporal consistency. On existing standard benchmarks, our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods. The code will be available at https://github.com/buxiangzhiren/VD-IT.

翻訳日:2024-03-20 18:51:33 公開日:2024-03-18

# AIは人間がより良い判断を下すのに役立つか? 実験的な評価のための方法論的枠組み

Does AI help humans make better decisions? A methodological framework for experimental evaluation ( http://arxiv.org/abs/2403.12108v1 )

ライセンス: Link先を確認

Eli Ben-Michael, D. James Greiner, Melody Huang, Kosuke Imai, Zhichao Jiang, Sooahn Shin,

(参考訳) データ駆動型アルゴリズムに基づく人工知能(AI)の利用は今日の社会で広く普及している。しかし、多くの場合、特に利害が大きい場合には、依然として人間が最終的な決定を下す。したがって、重要な疑問は、人間単独やAI単独と比較して、AIが人間のより良い意思決定を助けるかどうかである。本稿では,追加の仮定なしにこの疑問に実験的に答えるための新たな方法論的枠組みを導入する。我々は、ベースラインの潜在的結果に基づく標準的な分類指標を用いて、正しい意思決定を行う意思決定者の能力を測定する。人間が最終決定を下す状況で、AI生成レコメンデーションの提供がケースごとにランダム化される、単盲検の実験デザインを考える。この実験デザインの下で、人間単独、AI支援付きの人間、AI単独という3つの代替意思決定システムの性能を比較する方法を示す。提案手法を,我々自身が実施した公判前リスク評価ツールのランダム化比較試験から得られたデータに適用する。AIレコメンデーションは、現金保釈を課すという裁判官の決定の分類精度を向上させないことがわかった。我々の分析はまた、AI単独の決定は、AI支援の有無にかかわらず、人間の決定よりも一般的に成績が悪いことを示している。最後に、AIレコメンデーションは、白人の逮捕者と比較して、非白人の逮捕者に対して必要以上に頻繁に現金保釈を課す傾向がある。

The use of Artificial Intelligence (AI) based on data-driven algorithms has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions as compared to a human alone or AI alone. We introduce a new methodological framework that can be used to answer this question experimentally with no additional assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded experimental design, in which the provision of AI-generated recommendations is randomized across cases with a human making final decisions. Under this experimental design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. We apply the proposed methodology to the data from our own randomized controlled trial of a pretrial risk assessment instrument. We find that AI recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Our analysis also shows that AI-alone decisions generally perform worse than human decisions with or without AI assistance. Finally, AI recommendations tend to impose cash bail on non-white arrestees more often than necessary when compared to white arrestees.

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# GCAM : 食品の微粒化認識におけるガウス的・因果的アテンションモデル

GCAM: Gaussian and causal-attention model of food fine-grained recognition ( http://arxiv.org/abs/2403.12109v1 )

ライセンス: Link先を確認

Guohang Zhuang, Yue Hu, Tianxing Yan, JiaZhan Gao,

(参考訳) 現在、ほとんどの食品認識は、分類の深層学習に依存している。しかしながら、これらのアプローチは視覚的に類似した食品サンプルを効果的に区別することに苦慮し、食品認識におけるきめ細かい問題に対処する必要性を強調している。これらの課題を緩和するため,細粒度物体認識のためのガウス的・因果的アテンションモデルの導入を提案し,特に対象領域におけるガウス的特徴の獲得を訓練し,続いて対象領域から細粒度特徴の抽出を行い,対象領域の特徴マッピング機能の向上を図る。不均一なデータ分布から生じるデータドリフトに対処するために、我々は反実的推論アプローチを採用する。対物的介入を用いて、学習した画像注意機構がネットワーク予測に与える影響を分析し、より詳細な画像認識のためのより有用な注意重みをネットワークが取得できるようにする。最後に,各種モジュール間のトレーニング安定性のバランスをとるための学習可能な損失戦略を設計し,最終的な目標認識の精度を向上する。我々は,この4つのデータセットに対して,GCAMがETH-FOOD101, UECFOOD256, Vireo-FOOD172データセットの最先端手法を超えることを実験的に示した。さらに,本手法は,CUB-200データセットの最先端性能も達成する。

Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition. In particular, we train to obtain Gaussian features over target regions, followed by the extraction of fine-grained features from the objects, thereby enhancing the feature mapping capabilities of the target regions. To counteract data drift resulting from uneven data distributions, we employ a counterfactual reasoning approach. By using counterfactual interventions, we analyze the impact of the learned image attention mechanism on network predictions, enabling the network to acquire more useful attention weights for fine-grained image recognition. Finally, we design a learnable loss strategy to balance training stability across various modules, ultimately improving the accuracy of the final target recognition. We validate our approach on four relevant datasets, where it performs strongly throughout. We experimentally show that GCAM surpasses state-of-the-art methods on the ETH-FOOD101, UECFOOD256, and Vireo-FOOD172 datasets. Furthermore, our approach also achieves state-of-the-art performance on the CUB-200 dataset.

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# 2つの熱浴と相互作用するボソニック系の電流と効率

Current and efficiency of bosonic systems interacting with two thermal reservoirs ( http://arxiv.org/abs/2403.12112v1 )

ライセンス: Link先を確認

Jayarshi Bhattacharya, Sunandan Gangopadhyay, Gautam Gangopadhyay,

(参考訳) 本稿では,異なる温度の2つの熱浴と相互作用する中心系からなるボソニック系の電流と効率のダイナミクスについて検討する。系の密度行列の時間発展を記述するマスター方程式を導出し, 成分間の相互作用とエネルギー移動を考慮した。システム内のボソンの流れを表す電流を定量化し, システムのパラメータと熱浴の温度への依存性を解析する。定常状態においては,エネルギー伝達過程の効率を表す式を導出する。解析の結果,温度依存性や量子補正係数などの量子効果がエネルギー伝達効率に大きな影響を及ぼすことが示された。特に、高温では、量子システムの効率がカルノー効率よりも大きいことが観察される。この分析から得られた知見は、エネルギー利用の最適化が不可欠である量子コンピューティングやエネルギーハーベスティングなど、様々な分野に影響を及ぼす可能性がある。

This paper investigates the dynamics of current and efficiency in a bosonic system consisting of a central system interacting with two reservoirs at different temperatures. We derive a master equation describing the time evolution of the density matrix of the system, accounting for the interactions and energy transfer between the components. We quantify the current, representing the flow of bosons through the system, and analyse its dependence on the system's parameters and the temperatures of the thermal reservoirs. In the steady-state regime, we derive an expression for the efficiency of the energy transfer process. Our analysis shows that quantum effects, such as the dependence on temperature and the quantum correction factor, can significantly impact energy transfer efficiency. In particular, we observe that at high temperatures, the efficiency of the quantum system is greater than the Carnot efficiency. The insights gained from this analysis may have implications in various fields, including quantum computing and energy harvesting, where optimising energy utilisation is crucial.

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# 自律鉄道システムの安全分析--SACRED手法の紹介

Safety Analysis of Autonomous Railway Systems: An Introduction to the SACRED Methodology ( http://arxiv.org/abs/2403.12114v1 )

ライセンス: Link先を確認

Josh Hunter, John McDermid, Simon Burton,

(参考訳) 鉄道産業は、自律性と機械学習(ML)の導入をますます求めているため、いくつかの疑問が浮かび上がっている。このようなシステムや技術に対して、どうやって安全性を確保することができるのか? この新たな技術分野における現在の安全基準の適用性はどのようなものか? システムを安全に分類するための重要な指標は何ですか? 現在、鉄道における安全分析は、既存の技術の故障モードを反映しており、対照的に、自動化の分析の主な関心事は、通常平均的な性能である。このような純粋に統計的にMLのパフォーマンスを測定するアプローチは制限されている。これらの課題に対処するため、我々は、初期安全ケースを作成し、自律システムにとって重要な安全基準を決定する安全方法論であるSACREDを紹介した。 SACREDの開発は、ベルリンで提案されたGoA-4ライトレールシステムによって動機付けられている。

As the railway industry increasingly seeks to introduce autonomy and machine learning (ML), several questions arise. How can safety be assured for such systems and technologies? What is the applicability of current safety standards within this new technological landscape? What are the key metrics to classify a system as safe? Currently, safety analysis for the railway reflects the failure modes of existing technology; in contrast, the primary concern of analysis of automation is typically average performance. Such purely statistical approaches to measuring ML performance are limited, as they may overlook classes of situations that may occur rarely but in which the function performs consistently poorly. To combat these difficulties we introduce SACRED, a safety methodology for producing an initial safety case and determining important safety metrics for autonomous systems. The development of SACRED is motivated by the proposed GoA-4 light-rail system in Berlin.

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# 深層学習は多専門観測者と比較してコブ角測定を自動化する

Deep learning automates Cobb angle measurement compared with multi-expert observers ( http://arxiv.org/abs/2403.12115v1 )

ライセンス: Link先を確認

Keyu Li, Hanxue Gu, Roy Colglazier, Robert Lark, Elizabeth Hubbard, Robert French, Denise Smith, Jikai Zhang, Erin McCrum, Anthony Catanzano, Joseph Cao, Leah Waldman, Maciej A. Mazurowski, Benjamin Alman,

(参考訳) 変形につながる異常な脊椎弯曲を特徴とする有病率の高い疾患である側弯症は、効果的な診断と管理のために正確な評価方法を必要とする。コブ角 (Cobb angle) は、傾斜した椎骨間の弯曲の度合いを測定する、広く使われている側弯症の定量化法である。しかし、コブ角の手動測定は時間と労力がかかり、観察者間および観察者内の大きなばらつきを伴う。これらの課題と、一部の既存の自動化手法に見られる解釈可能性の欠如に対処するため、私たちは、コブ角を正確に測定するだけでなく、これらの測定の明確な可視化を提供する、完全に自動化されたソフトウェアを作成した。このソフトウェアは、ディープニューラルネットワークに基づく脊椎領域の検出とセグメンテーション、脊椎中心線の同定、最も大きく傾いた椎骨の特定、元の画像上でのコブ角の直接可視化を統合している。専門家7人の評価と比較すると、我々のアルゴリズムはコブ角測定において平均4.17度の偏差を示し、手動測定における読影者内の平均差異5.16度を明確に上回った。また,0.96を超える級内相関係数 (ICC) と0.944を超えるピアソン相関係数も達成しており、専門家の評価との強い一致と優れた測定信頼性を反映している。総合的な読影者調査と統計分析を通じて、このアルゴリズムは専門家の読影者とのより高い一致を確保するだけでなく、評価における解釈可能性と再現性を高めると考えている。臨床応用には非常に有望であり、より正確な側弯症の評価と診断において医師を支援し、患者のケアを改善する可能性がある。

Scoliosis, a prevalent condition characterized by abnormal spinal curvature leading to deformity, requires precise assessment methods for effective diagnosis and management. The Cobb angle is a widely used scoliosis quantification method that measures the degree of curvature between the tilted vertebrae. Yet, manual measuring of Cobb angles is time-consuming and labor-intensive, fraught with significant interobserver and intraobserver variability. To address these challenges and the lack of interpretability found in certain existing automated methods, we have created fully automated software that not only precisely measures the Cobb angle but also provides clear visualizations of these measurements. This software integrates deep neural network-based spine region detection and segmentation, spine centerline identification, pinpointing the most significantly tilted vertebrae, and direct visualization of Cobb angles on the original images. Upon comparison with the assessments of 7 expert readers, our algorithm exhibited a mean deviation in Cobb angle measurements of 4.17 degrees, notably surpassing the manual approach's average intra-reader discrepancy of 5.16 degrees. The algorithm also achieved intra-class correlation coefficients (ICC) exceeding 0.96 and Pearson correlation coefficients above 0.944, reflecting robust agreement with expert assessments and superior measurement reliability. Through the comprehensive reader study and statistical analysis, we believe this algorithm not only ensures a higher consensus with expert readers but also enhances interpretability and reproducibility during assessments. It holds significant promise for clinical application, potentially aiding physicians in more accurate scoliosis assessment and diagnosis, thereby improving patient care.
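The final geometric step, turning two tilted vertebral endplates into a Cobb angle, can be sketched as the angle between two lines. This is a minimal illustration under assumptions, not the authors' software: each endplate is assumed to be given as two (x, y) image points, and the coordinates and function names are hypothetical.

```python
import math


def line_angle_deg(p, q):
    """Orientation of the line through points p and q, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def cobb_angle_deg(upper_endplate, lower_endplate):
    """Angle between two endplate lines, folded into the range [0, 90] degrees.

    Folding makes the result independent of the order in which the two
    points of each endplate are listed.
    """
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate)) % 180.0
    return min(diff, 180.0 - diff)


# Hypothetical endplates: one tilted one way, the other tilted the opposite way.
upper = ((0.0, 0.0), (10.0, 2.0))
lower = ((0.0, 50.0), (10.0, 47.0))
angle = cobb_angle_deg(upper, lower)  # roughly 28 degrees
```

In a full pipeline the two endplates would come from the detected most-tilted vertebrae; here they are simply typed in by hand.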

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# 自己決定型バイオインスパイアされた標的を用いた教師なしエンドツーエンドトレーニング

Unsupervised End-to-End Training with a Self-Defined Bio-Inspired Target ( http://arxiv.org/abs/2403.12116v1 )

ライセンス: Link先を確認

Dongshu Liu, Jérémie Laydevant, Adrien Pontlevy, Damien Querlioz, Julie Grollier,

(参考訳) 現在の教師なし学習法は、自己教師付き学習のような深層学習技術によるエンドツーエンドの訓練に依存して高い計算要求を伴うか、あるいはヘビアン学習のようなバイオインスパイアされたアプローチによる層ごとの訓練を用い、教師付き学習とは相容れない局所学習規則を使用する。どちらのアプローチも、乏しい計算リソースに依存し、教師なしと教師ありの学習フェーズを交互に行うことで大きな恩恵を受けるエッジAIハードウェアには問題がある。こうしたハードウェアは、環境から広く利用可能なラベルなしデータとラベル付きトレーニングデータセットの両方を活用できる。この課題を解決するために,ネットワークの最終層でWinner-Take-All (WTA) の選択性を利用する「自己定義ターゲット」を導入し,生物学的にインスパイアされたホメオスタシス機構による正則化で補完する。このアプローチはフレームワークに依存せず、グローバル(バックプロパゲーション)とローカル(平衡伝播)の学習規則の両方と互換性があり、MNISTデータセット上で97.6%のテスト精度を達成する。さらに,隠れ層を組み込むことで,すべての訓練方法において分類精度と学習された特徴の品質が向上し,エンドツーエンドの教師なし学習の利点が示されることを示した。半教師付き学習に拡張すると、本手法はデータの可用性に応じてターゲットを動的に調整し、わずか600個のラベル付きMNISTサンプルで96.6%の精度に達する。この結果は、ラベル付きデータが豊富な場合から全くない場合までのシナリオにおける、「教師なしターゲット」戦略の有効性と柔軟性を強調している。

Current unsupervised learning methods depend on end-to-end training via deep learning techniques such as self-supervised learning, with high computational requirements, or employ layer-by-layer training using bio-inspired approaches like Hebbian learning, using local learning rules incompatible with supervised learning. Both approaches are problematic for edge AI hardware that relies on sparse computational resources and would strongly benefit from alternating between unsupervised and supervised learning phases - thus leveraging widely available unlabeled data from the environment as well as labeled training datasets. To solve this challenge, in this work, we introduce a 'self-defined target' that uses Winner-Take-All (WTA) selectivity at the network's final layer, complemented by regularization through a biologically inspired homeostasis mechanism. This approach, framework-agnostic and compatible with both global (Backpropagation) and local (Equilibrium propagation) learning rules, achieves a 97.6% test accuracy on the MNIST dataset. Furthermore, we demonstrate that incorporating a hidden layer enhances classification accuracy and the quality of learned features across all training methods, showcasing the advantages of end-to-end unsupervised training. Extending to semi-supervised learning, our method dynamically adjusts the target according to data availability, reaching a 96.6% accuracy with just 600 labeled MNIST samples. This result highlights our 'unsupervised target' strategy's efficacy and flexibility in scenarios ranging from abundant to no labeled data availability.
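A toy sketch may help show how a Winner-Take-All target and a homeostasis term can interact. The abstract does not give the exact rule, so everything below is an assumption: the threshold-based homeostasis, the `rate` parameter, and both function names are hypothetical and only illustrate the general idea.

```python
def wta_target(outputs, thresholds):
    """Build a one-hot target at the unit that wins after subtracting its threshold."""
    adjusted = [o - t for o, t in zip(outputs, thresholds)]
    winner = max(range(len(adjusted)), key=lambda i: adjusted[i])
    target = [1.0 if i == winner else 0.0 for i in range(len(outputs))]
    return target, winner


def update_thresholds(thresholds, winner, rate=0.1):
    """Assumed homeostasis: raise the winner's threshold and lower the others.

    The changes cancel out, so the sum of all thresholds stays constant.
    """
    n = len(thresholds)
    return [t + rate * ((1.0 if i == winner else 0.0) - 1.0 / n)
            for i, t in enumerate(thresholds)]


# The same output pattern presented twice: without homeostasis unit 0 always wins.
outputs = [0.9, 0.85, 0.1]
thresholds = [0.0, 0.0, 0.0]
winners = []
for _ in range(2):
    target, winner = wta_target(outputs, thresholds)
    winners.append(winner)
    thresholds = update_thresholds(thresholds, winner)
```

In this run unit 0 wins first, its threshold rises, and unit 1 wins the second presentation, which illustrates how a homeostatic term can stop a single output unit from claiming every input.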

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# T細胞応答予測のための伝達学習

Transfer Learning for T-Cell Response Prediction ( http://arxiv.org/abs/2403.12117v1 )

ライセンス: Link先を確認

Josua Stadelmaier, Brandon Malone, Ralf Eggeling,

(参考訳) 特定の特定のペプチドに対するT細胞応答の予測について検討し、特に、パーソナライズされたがんワクチンの開発に向けた重要なステップとなる可能性がある。モデルは、T細胞応答に関連する特定のペプチド特性よりも、ソース生物のようなペプチド源の一般的な特性を学習する。本稿では,T細胞応答予測のためのトランスフォーマーモデルを用いて,膨らませた予測性能の危険性は理論上ではなく,実際に発生することを示す。そこで本研究では,ドメイン認識評価手法を提案する。次に、多領域構造とショートカット学習を扱うために、異なるトランスファー学習手法について研究する。さらに,本研究の最終モデルは,ヒトペプチドに対するT細胞応答を予測するために,既存の最先端のアプローチよりも優れていることを示す。

We study the prediction of T-cell response for specific given peptides, which could, among other applications, be a crucial step towards the development of personalized cancer vaccines. It is a challenging task due to limited, heterogeneous training data featuring a multi-domain structure; such data entail the danger of shortcut learning, where models learn general characteristics of peptide sources, such as the source organism, rather than specific peptide characteristics associated with T-cell response. Using a transformer model for T-cell response prediction, we show that the danger of inflated predictive performance is not merely theoretical but occurs in practice. Consequently, we propose a domain-aware evaluation scheme. We then study different transfer learning techniques to deal with the multi-domain structure and shortcut learning. We demonstrate a per-source fine-tuning approach to be effective across a wide range of peptide sources and further show that our final model outperforms existing state-of-the-art approaches for predicting T-cell responses for human peptides.
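The following sketch illustrates why pooled evaluation can look inflated under shortcut learning. It is an assumed, simplified form of domain-aware evaluation rather than the scheme proposed in the paper: the two sources, their labels, and the constant per-source scores are invented toy data.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every positive-negative pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A "shortcut" model that only knows the source: one constant score per source.
source_a = {"scores": [0.75, 0.75, 0.75, 0.75], "labels": [1, 1, 1, 0]}
source_b = {"scores": [0.25, 0.25, 0.25, 0.25], "labels": [0, 0, 0, 1]}

# Pooled evaluation mixes both sources into one ranking.
pooled_auc = auc(source_a["scores"] + source_b["scores"],
                 source_a["labels"] + source_b["labels"])

# Domain-aware evaluation scores each source separately.
per_source_auc = [auc(d["scores"], d["labels"]) for d in (source_a, source_b)]
```

On this toy data the pooled AUC is 0.75 although the model cannot rank peptides within either source, where the AUC is 0.5; the pooled number only reflects the differing base rates of the two sources.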

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# すべての非キラルアーベル位相に対する低オーバヘッド非クリフォードトポロジカルフォールトトレラント回路

Low-overhead non-Clifford topological fault-tolerant circuits for all non-chiral abelian topological phases ( http://arxiv.org/abs/2403.12119v1 )

ライセンス: Link先を確認

Andreas Bauer,

(参考訳) 本稿では,任意のアーベル非キラルトポロジカル相を、能動的に誤り訂正されるフォールトトレラントメモリとして実現する、明示的で幾何学的に局所な回路群を提案する。これらの回路は、離散的な固定点経路積分における1-形式対称性の測定から構成され、この対称性はセルコホモロジーと高次カップ積を通して表現される。私たちが使用する具体的な経路積分は、ねじれた量子二重モデルの時空表現である、三次元セルレーション上のアーベル的ダイクグラーフ・ウィッテン状態和である。結果として得られる回路は、(キューディット)安定化トーリック符号のシンドローム抽出回路に基づいており、そこに「ツイスト」を実装する非クリフォード位相ゲートを挿入する。トーリック符号に対するオーバーヘッドは、ねじれたアーベル相の既知の構成とは対照的に、適度である。また,測定に基づくトポロジカル量子計算やフロッケ符号のような(キューディット)トーリック符号相の他のアーキテクチャも,位相ゲートを加えることで、ねじれのない量子二重モデルの代わりにねじれた量子二重モデルを実装できることを示す。さらなる結果として、1-形式対称固定点回路と呼ぶ非常に一般的なクラスのトポロジカル回路に対して、任意の局所雑音(非パウリノイズを含む)の下での耐故障性を証明する。この概念は、本論文の回路に加えて、安定化トーリック符号、サブシステムトーリック符号、測定に基づくトポロジカル量子計算、(CSS)ハニカムフロッケ符号を統一する。また,本手法が特定の非アーベル相に対するフォールトトレラント回路の構築にどのように適用できるかを示す。付録では、任意のセルレーション上の高次カップ積の式を定義するための明示的な組合せ手順を提示する。これはTQFTやトポロジカル相のコミュニティにとって、それ自体興味深いものかもしれない。

We propose a family of explicit geometrically local circuits realizing any abelian non-chiral topological phase as an actively error-corrected fault-tolerant memory. These circuits are constructed from measuring 1-form symmetries in discrete fixed-point path integrals, which we express through cellular cohomology and higher-order cup products. The specific path integral we use is the abelian Dijkgraaf-Witten state sum on a 3-dimensional cellulation, which is a spacetime representation of the twisted quantum double model. The resulting circuits are based on a syndrome extraction circuit of the (qudit) stabilizer toric code, into which we insert non-Clifford phase gates that implement the "twist". The overhead compared to the toric code is moderate, in contrast to known constructions for twisted abelian phases. We also show that other architectures for the (qudit) toric code phase, like measurement-based topological quantum computation or Floquet codes, can be enriched with phase gates to implement twisted quantum doubles instead of their untwisted versions. As a further result, we prove fault tolerance under arbitrary local (including non-Pauli) noise for a very general class of topological circuits that we call 1-form symmetric fixed-point circuits. This notion unifies the circuits in this paper as well as the stabilizer toric code, subsystem toric code, measurement-based topological quantum computation, or the (CSS) honeycomb Floquet code. We also demonstrate how our method can be adapted to construct fault-tolerant circuits for specific non-Abelian phases. In the appendix we present an explicit combinatorial procedure to define formulas for higher cup products on arbitrary cellulations, which might be interesting in its own right to the TQFT and topological-phases community.

翻訳日:2024-03-20 18:41:45 公開日:2024-03-18

# DistClassiPyを用いた光度曲線分類:新しい距離ベース分類器

Light Curve Classification with DistClassiPy: a new distance-based classifier ( http://arxiv.org/abs/2403.12120v1 )

ライセンス: Link先を確認

Siddharth Chaini, Ashish Mahabal, Ajit Kembhavi, Federica B. Bianco,

(参考訳) シノプティック・スカイサーベイの台頭は、時間領域天文学におけるビッグデータの時代をもたらし、データサイエンスと機械学習が天体の研究に欠かせないツールとなった。ツリーベース(例えばランダムフォレスト)とディープラーニングのモデルは、この分野の現在の標準を表している。我々は、天体の分類を支援するために異なる距離指標を用いる方法について検討する。そこで我々はDistClassiPyという距離指標に基づく新しい分類器を開発した。距離指標の直接利用は、時間領域天文学では研究されていないアプローチであるが、距離に基づく手法は、分類結果の解釈可能性を高め、計算コストを減少させるのに役立つ。特に、異なるクラスの天体間の距離を比較することで、変光星の光度曲線を分類する。10クラス、6,000個の変光星のカタログに18の距離指標を適用し,分類と次元削減を実証した。この分類器は最先端の性能に匹敵するが, 計算要求が低く, 解釈可能性も向上していることを示す。DistClassiPyをオープンソースとしてhttps://pypi.org/project/distclassipy/で公開しており、天文学の内外の他の分類シナリオへの応用を広げることを目指している。

The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. Tree-based (e.g. Random Forests) and deep learning models represent the current standard in the field. We explore the use of different distance metrics to aid in the classification of objects. For this, we developed a new distance metric based classifier called DistClassiPy. The direct use of distance metrics is an approach that has not been explored in time-domain astronomy, but distance-based methods can aid in increasing the interpretability of the classification result and decrease the computational costs. In particular, we classify light curves of variable stars by comparing the distances between objects of different classes. Using 18 distance metrics applied to a catalog of 6,000 variable stars in 10 classes, we demonstrate classification and dimensionality reduction. We show that this classifier meets state-of-the-art performance but has lower computational requirements and improved interpretability. We have made DistClassiPy open-source and accessible at https://pypi.org/project/distclassipy/ with the goal of broadening its applications to other classification scenarios within and beyond astronomy.
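As a rough illustration of distance-based classification with a swappable metric, the sketch below assigns an object to the class whose centroid is nearest. It does not use or reproduce the DistClassiPy API: the nearest-centroid rule, the function names, the two features, and the class labels are all assumptions made for this example.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    """Sum of absolute coordinate differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fit_centroids(features, labels):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x, metric):
    """Return the class whose centroid is closest to x under the chosen metric."""
    return min(centroids, key=lambda y: metric(x, centroids[y]))


# Hypothetical light-curve features: (period in days, amplitude in magnitudes).
features = [(0.5, 0.8), (0.6, 1.0), (5.0, 0.5), (7.0, 0.7)]
labels = ["short-period", "short-period", "long-period", "long-period"]
centroids = fit_centroids(features, labels)

prediction_a = predict(centroids, (0.7, 0.9), euclidean)
prediction_b = predict(centroids, (4.0, 0.6), cityblock)
```

Because the metric is passed in as a function, trying another distance only means passing a different callable to `predict`.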

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# Anyonic partial Transpose を用いたAnyonic Systemsの絡み合い特性

Characterizing the Entanglement of Anyonic Systems using the Anyonic Partial Transpose ( http://arxiv.org/abs/2403.12121v1 )

ライセンス: Link先を確認

Nico Kirchner, Wonjune Choi, Frank Pollmann,

(参考訳) 混合量子状態の絡み合いは、部分転位とその対応する絡み合い測度、対数ネガティリティを用いて定量化することができる。近年、部分転位の概念は、交換統計がボゾンやフェルミオンのケースを超えたエキゾチック準粒子であるエキゾチック準粒子の系にまで拡張されている。この正準部分転位の基本的な性質を調べたところ、フェルミオン系の特別な場合に適用すると、境界マヨラナフェルミオンが存在するか否かに応じてフェルミオン部分転位またはそのねじれた変種に還元できることが明らかとなった。基底状態の性質に着目して、共形場理論によって予測されるような、空隙のない系の正しい絡み合いスケーリングと、位相的に自明な位相と非自明な位相の相転移の両方を、正準部分転置が捉えていることが分かる。非アーベル素数や二分割幾何に対して、部分転置の固有値、いわゆる負性スペクトルのリッチな多重構造を見つけ、電荷-と不均衡分解された負性の両方を定義する可能性を明らかにする。

Entanglement of mixed quantum states can be quantified using the partial transpose and its corresponding entanglement measure, the logarithmic negativity. Recently, the notion of partial transpose has been extended to systems of anyons, which are exotic quasiparticles whose exchange statistics go beyond the bosonic and fermionic case. Studying the fundamental properties of this anyonic partial transpose, we first reveal that when applied to the special case of fermionic systems, it can be reduced to the fermionic partial transpose or its twisted variant depending on whether or not a boundary Majorana fermion is present. Focusing on ground state properties, we find that the anyonic partial transpose captures both the correct entanglement scaling for gapless systems, as predicted by conformal field theory, and the phase transition between a topologically trivial and a nontrivial phase. For non-abelian anyons and the bipartition geometry, we find a rich multiplet structure in the eigenvalues of the partial transpose, the so-called negativity spectrum, and reveal the possibility of defining both a charge- and an imbalance-resolved negativity.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
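The abstract above builds on the ordinary partial transpose and the logarithmic negativity. As a point of reference, the sketch below computes the standard (non-anyonic) logarithmic negativity, E_N = log2 of the trace norm of the partially transposed density matrix, for a two-qubit state. It does not implement the anyonic partial transpose; the function names are illustrative assumptions.

```python
import numpy as np


def partial_transpose_b(rho, dims):
    """Transpose the second subsystem of a bipartite density matrix."""
    d_a, d_b = dims
    # Indices after reshape: (a, b, a', b'); swapping b and b' transposes subsystem B.
    r = rho.reshape(d_a, d_b, d_a, d_b).transpose(0, 3, 2, 1)
    return r.reshape(d_a * d_b, d_a * d_b)


def logarithmic_negativity(rho, dims):
    """E_N = log2 of the sum of absolute eigenvalues of the partial transpose."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose_b(rho, dims))
    return float(np.log2(np.sum(np.abs(eigenvalues))))


# Bell state (|00> + |11>) / sqrt(2): maximally entangled two-qubit state.
bell = np.zeros(4)
bell[0] = bell[3] = 1.0 / np.sqrt(2.0)
rho_bell = np.outer(bell, bell)
e_n_bell = logarithmic_negativity(rho_bell, (2, 2))
```

For the Bell state the partial transpose has eigenvalues 1/2, 1/2, 1/2 and -1/2, so the trace norm is 2 and the logarithmic negativity is one bit; for a product state it vanishes.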

# ニューラルネットワークの同変表現学習のためのグラフニューラルネットワーク

Graph Neural Networks for Learning Equivariant Representations of Neural Networks ( http://arxiv.org/abs/2403.12143v1 )

ライセンス: Link先を確認

Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang,

(参考訳) 他のニューラルネットワークのパラメータを処理するニューラルネットワークは、暗黙のニューラルネットワーク表現の分類、ニューラルネットワークの重みの生成、一般化エラーの予測など、さまざまな分野のアプリケーションを見つける。しかし、既存のアプローチは、ニューラルネットワークの固有の置換対称性を見落としているか、あるいは、ネットワークアーキテクチャ自体の影響を無視しながら、均等性を達成するために複雑な重み付けパターンに依存している。本研究では,ニューラルネットワークをパラメータの計算グラフとして表現することを提案する。そこで本研究では,ニューラルネットワークグラフを多種多様なアーキテクチャでエンコードする単一モデルを提案する。本稿では,暗黙のニューラル表現の分類と編集,一般化性能の予測,最適化の学習など,幅広いタスクにおける本手法の有効性について述べる。ソースコードはhttps://github.com/mkofinas/neural-graphsで公開されている。

Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance, while ignoring the impact of the network architecture itself. In this work, we propose to represent neural networks as computational graphs of parameters, which allows us to harness powerful graph neural networks and transformers that preserve permutation symmetry. Consequently, our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations, predicting generalization performance, and learning to optimize, while consistently outperforming state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neural-graphs.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
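The core idea in the entry above, representing a network as a graph of its parameters, can be sketched for a plain MLP as follows. This is a simplified illustration under stated assumptions (biases as node features, weights as edge features, input nodes given a 0.0 feature); it is not the code from the linked repository, and `mlp_to_graph` is a hypothetical helper.

```python
def mlp_to_graph(weights, biases):
    """Turn an MLP into (node_features, edges).

    weights[l] is a list of rows with shape (n_out, n_in);
    biases[l] is a list of length n_out.
    Each neuron becomes a node, each weight becomes a directed edge.
    """
    layer_sizes = [len(weights[0][0])] + [len(w) for w in weights]
    offsets = [0]
    for size in layer_sizes:
        offsets.append(offsets[-1] + size)

    # Input neurons have no bias, so their node feature is set to 0.0.
    node_features = [0.0] * offsets[-1]
    for layer, b in enumerate(biases):
        for j, value in enumerate(b):
            node_features[offsets[layer + 1] + j] = value

    edges = []
    for layer, w in enumerate(weights):
        for j, row in enumerate(w):
            for i, value in enumerate(row):
                # Edge from neuron i in this layer to neuron j in the next one.
                edges.append((offsets[layer] + i, offsets[layer + 1] + j, value))
    return node_features, edges


# A 2-3-1 MLP with made-up parameters.
W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
b1 = [1.0, 2.0, 3.0]
W2 = [[0.7, 0.8, 0.9]]
b2 = [4.0]
nodes, edges = mlp_to_graph([W1, W2], [b1, b2])
```

Permuting the hidden neurons (rows of `W1`, entries of `b1`, columns of `W2`) only relabels nodes in this graph, which is why a permutation-equivariant graph model is a natural fit for it.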

# Syn-QA2:Synthetic QAデータセットを用いたロングテール質問における誤った仮定の評価

Syn-QA2: Evaluating False Assumptions in Long-tail Questions with Synthetic QA Datasets ( http://arxiv.org/abs/2403.12145v1 )

ライセンス: Link先を確認

Ashwin Daswani, Rohan Sawant, Najoung Kim,

(参考訳) 情報探索問題における虚偽の仮定(または偽の前提)に対する感度は、堅牢な質問回答システム(QA)にとって重要である。近年の研究では、自然発生問題における誤った仮定が、生成的QAと単純な検出タスクの両方で低い性能で、現在のモデルに課題をもたらすことが示されている(Kim et al 2023)。しかし, 自然発生型質問に対する既存の研究の焦点は, 可能な質問の分布の長い部分におけるモデル行動の分析のギャップに繋がる。この目的のために、Syn-(QA)$^2$という合成生成された2つのQAデータセットをWikidataから摂動関係を用いて生成し、HotpotQAを摂動することで生成する(Yang et al 2018)。大規模言語モデルの評価から得られた知見は,(1)QAにおける誤った仮定は,先行研究の成果を反映して困難である,(2)生成的QA自体の難易度よりも二項検出タスクが困難である,(3)自然発生の質問よりも長い質問の方が困難であること,(3)合成データセットや生成手法の有用性を強調している,の3つである。

Sensitivity to false assumptions (or false premises) in information-seeking questions is critical for robust question-answering (QA) systems. Recent work has shown that false assumptions in naturally occurring questions pose challenges to current models, with low performance on both generative QA and simple detection tasks (Kim et al. 2023). However, the focus of existing work on naturally occurring questions leads to a gap in the analysis of model behavior on the long tail of the distribution of possible questions. To this end, we introduce Syn-(QA)$^2$, a set of two synthetically generated QA datasets: one generated using perturbed relations from Wikidata, and the other by perturbing HotpotQA (Yang et al. 2018). Our findings from evaluating a range of large language models are threefold: (1) false assumptions in QA are challenging, echoing the findings of prior work, (2) the binary detection task is challenging even compared to the difficulty of generative QA itself, possibly due to the linguistic structure of the problem, and (3) the detection task is more challenging with long-tail questions compared to naturally occurring questions, highlighting the utility of our synthetic datasets and generation method.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# 有限位相格子モデルに対する厳密な反断熱駆動

Exact counterdiabatic driving for finite topological lattice models ( http://arxiv.org/abs/2403.12150v1 )

ライセンス: Link先を確認

Callum W. Duncan,

(参考訳) 断熱プロトコルは、しばしば状態の準備スキームで使用されるが、瞬時固有状態間の遷移が指数関数的に抑制されるように、ゆっくりと変化するハミルトニアンによってシステムを動かす必要がある。ダイアバティック・ドライビング(英: Counterdiabatic driving)は、ダイアバティック・エキサイティング(Diabatic excitations)に対抗する瞬間固有状態から計算された追加用語を含めることで、ダイアバティック・プロトコルを高速化する技術である。しかし、このアプローチは完全な固有スペクトルの知識を必要とするため、反断熱駆動の正確な解析形式は、例えば高調波振動子と横場イジングモデルのような問題のサブセットでのみ知られている。この問題のサブセットを、開境界条件と任意のオンサイトポテンシャル、トンネル項、格子サイズを持つ1次元非相互作用格子モデルの一般族を含むように拡張する。格子モデルのすべての状態に対してこのアプローチを定式化し、トポロジカル絶縁体に現れるような境界状態やギャップ内状態を含む。また、特定の状態に留まるために動的状態を強制するために調整された、標的の反断熱駆動用語を導出する。一例として、Su-Schrieffer-Heegerモデルの位相的エッジ状態を用いた状態遷移を考える。導出された解析的反断熱駆動ハミルトニアンは、多体格子モデルにおける制御プロトコルを知らせたり、格子モデルの非平衡特性を探索するために利用することができる。

Adiabatic protocols are often employed in state preparation schemes but require the system to be driven by a slowly varying Hamiltonian so that transitions between instantaneous eigenstates are exponentially suppressed. Counterdiabatic driving is a technique to speed up adiabatic protocols by including additional terms calculated from the instantaneous eigenstates that counter diabatic excitations. However, this approach requires knowledge of the full eigenspectrum meaning that the exact analytical form of counterdiabatic driving is only known for a subset of problems, e.g., the harmonic oscillator and transverse field Ising model. We extend this subset of problems to include the general family of one-dimensional non-interacting lattice models with open boundary conditions and arbitrary on-site potential, tunnelling terms, and lattice size. We formulate this approach for all states of lattice models, including bound and in-gap states which appear, e.g., in topological insulators. We also derive targeted counterdiabatic driving terms which are tailored to enforce the dynamical state to remain in a specific state. As an example, we consider state transfer using the topological edge states of the Su-Schrieffer-Heeger model. The derived analytical counterdiabatic driving Hamiltonian can be utilised to inform control protocols in many-body lattice models or to probe the non-equilibrium properties of lattice models.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
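The abstract above notes that counterdiabatic driving generally requires the full eigenspectrum. The sketch below shows that generic numerical route for a small open Su-Schrieffer-Heeger chain, using the standard spectral expression H_CD = i * lam_dot * sum over m != n of |m><m| dH/dlam |n><n| / (E_n - E_m) with hbar = 1. It is not the closed-form analytical result derived in the paper; the function names, the chain length and the hopping values are assumptions chosen for illustration.

```python
import numpy as np


def ssh_hamiltonian(n_sites, v, w):
    """Open SSH chain with alternating hoppings v (even bonds) and w (odd bonds)."""
    h = np.zeros((n_sites, n_sites))
    for j in range(n_sites - 1):
        t = v if j % 2 == 0 else w
        h[j, j + 1] = h[j + 1, j] = -t
    return h


def counterdiabatic_term(h, dh, lam_dot=1.0, tol=1e-12):
    """Generic spectral counterdiabatic term (hbar = 1) from a full diagonalisation."""
    energies, states = np.linalg.eigh(h)
    m = states.conj().T @ dh @ states              # matrix elements <m|dH|n>
    gaps = energies[None, :] - energies[:, None]   # E_n - E_m stored at [m, n]
    coeff = np.zeros_like(m, dtype=complex)
    mask = np.abs(gaps) > tol                      # skip m == n and exact degeneracies
    coeff[mask] = 1j * m[mask] / gaps[mask]
    return lam_dot * (states @ coeff @ states.conj().T)


# Drive the intracell hopping v at fixed w; H is linear in v, so dH/dv is
# the same chain with v = 1 and w = 0.
H = ssh_hamiltonian(6, v=1.0, w=0.5)
dH_dv = ssh_hamiltonian(6, v=1.0, w=0.0)
H_cd = counterdiabatic_term(H, dH_dv)
```

For a real symmetric Hamiltonian the resulting term is Hermitian and purely imaginary, and its cost is dominated by the diagonalisation, which is the step an analytical expression avoids.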

# ゼロショットオブジェクト状態分類のための知識グラフへの大言語モデルからのドメイン特化コンテンツの利用

Fusing Domain-Specific Content from Large Language Models into Knowledge Graphs for Enhanced Zero Shot Object State Classification ( http://arxiv.org/abs/2403.12151v1 )

ライセンス: Link先を確認

Filippos Gouidis, Katerina Papantoniou, Konstantinos Papoutsakis, Theodore Patkos, Antonis Argyros, Dimitris Plexousakis,

(参考訳) ドメイン固有の知識は、幅広いビジョンタスクへの対処に大きく貢献する。しかし、そのような知識の創出には相当な人的労働力と時間的コストが伴う。本研究では,Large Language Models (LLMs) のセマンティック埋め込みによるドメイン固有情報の生成と提供の可能性について検討する。これを実現するために、LLMは知識グラフと事前訓練されたセマンティックベクターを、ビジョンベースのゼロショットオブジェクト状態分類タスクのコンテキストで使用するパイプラインに統合される。広範囲なアブレーション研究を通じて, LLMの挙動を徹底的に検討した。その結果,LLMをベースとした組込みと汎用的な事前学習型組込みを組み合わせることで,大幅な性能向上が期待できることがわかった。このアブレーション研究から得られた知見を引用し、競合するモデルとの比較分析を行い、提案手法により達成された最先端の性能を明らかにする。

Domain-specific knowledge can significantly contribute to addressing a wide variety of vision tasks. However, the generation of such knowledge entails considerable human labor and time costs. This study investigates the potential of Large Language Models (LLMs) in generating and providing domain-specific information through semantic embeddings. To achieve this, an LLM is integrated into a pipeline that utilizes Knowledge Graphs and pre-trained semantic vectors in the context of the Vision-based Zero-shot Object State Classification task. We thoroughly examine the behavior of the LLM through an extensive ablation study. Our findings reveal that the integration of LLM-based embeddings, in combination with general-purpose pre-trained embeddings, leads to substantial performance improvements. Drawing insights from this ablation study, we conduct a comparative analysis against competing models, thereby highlighting the state-of-the-art performance achieved by the proposed approach.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# 心エコー図による左室駆出率の自動ニューラルネットワーク予測法の開発

Development of Automated Neural Network Prediction for Echocardiographic Left ventricular Ejection Fraction ( http://arxiv.org/abs/2403.12152v1 )

ライセンス: Link先を確認

Yuting Zhang, Boyang Liu, Karina V. Bunting, David Brind, Alexander Thorley, Andreas Karwath, Wenqi Lu, Diwei Zhou, Xiaoxia Wang, Alastair R. Mobley, Otilia Tica, Georgios Gkoutos, Dipak Kotecha, Jinming Duan,

(参考訳) 左室駆出率(LVEF)の心エコー計測は、心不全(HF)患者の診断と分類の基礎となる。本稿では、LVEFを自動的かつ正確に定量化するために、深層ニューラルネットワークとアンサンブル学習に基づく新しいパイプライン手法を提案する。パイプラインでは、まずAtrous Convolutional Neural Network (ACNN) を訓練して左心室(LV)をセグメンテーションし、次に楕円体単一平面モデルに基づく面積・長さ法を用いてLVEF値を計算した。この計算には、改良されたJeffrey法によるセグメンテーションから得られるLV面積と、新しいアンサンブル学習モデルから得られるLV長が入力として必要である。パイプラインの精度をさらに向上させるために、自動ピーク検出アルゴリズムを用いて拡張末期フレームと収縮末期フレームを同定し、人為的ミスの問題を回避した。その後、単一心拍のLVEF値をすべての心周期にわたって平均し、最終的なLVEFを得た。この手法は、10,030件の心エコー図を含むオープンソースデータセットを用いて開発され、内部検証された。専門家による解析と比較したLVEF予測のピアソン相関係数は0.83(p<0.001)であり、駆出率が低下した心不全(HFrEF; LVEF<40%)の分類における受信者動作特性曲線下面積(AUROC)は0.98(95%信頼区間 0.97から0.99)であった。200件の心エコー図からなる外部データセットでは、HFrEF評価のAUCは0.90(95%信頼区間 0.88から0.91)に達した。本研究は、ニューラルネットワークに基づくLVEFの自動計算が、時間のかかるフレーム単位の手動による心収縮機能評価を行う専門臨床医に匹敵することを示している。

The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF). In order to quantify LVEF automatically and accurately, this paper proposes a new pipeline method based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first trained to segment the left ventricle (LV), before employing the area-length formulation based on the ellipsoid single-plane model to calculate LVEF values. This formulation required inputs of LV area, derived from segmentation using an improved Jeffrey's method, as well as LV length, derived from a novel ensemble learning model. To further improve the pipeline's accuracy, an automated peak detection algorithm was used to identify end-diastolic and end-systolic frames, avoiding issues with human error. Subsequently, single-beat LVEF values were averaged across all cardiac cycles to obtain the final LVEF. This method was developed and internally validated in an open-source dataset containing 10,030 echocardiograms. The Pearson's correlation coefficient was 0.83 for LVEF prediction compared to expert human analysis (p<0.001), with a subsequent area under the receiver operator curve (AUROC) of 0.98 (95% confidence interval 0.97 to 0.99) for categorisation of HF with reduced ejection (HFrEF; LVEF<40%). In an external dataset with 200 echocardiograms, this method achieved an AUC of 0.90 (95% confidence interval 0.88 to 0.91) for HFrEF assessment. This study demonstrates that an automated neural network-based calculation of LVEF is comparable to expert clinicians performing time-consuming, frame-by-frame manual evaluation of cardiac systolic function.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
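The final arithmetic step of the LVEF pipeline described above can be sketched as follows, assuming the commonly used single-plane area-length volume V = 8A^2 / (3*pi*L) and EF = (EDV - ESV) / EDV. The exact formulation, units and inputs used by the authors are not given in the abstract, so the function names and the example measurements are illustrative assumptions only.

```python
import math


def area_length_volume(area_cm2, length_cm):
    """Single-plane area-length LV volume in mL: V = 8 A^2 / (3 pi L)."""
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_lvef(beats):
    """Average single-beat LVEF over cardiac cycles.

    beats: iterable of (ed_area, ed_length, es_area, es_length) per beat,
    e.g. areas from a segmentation mask and lengths from a separate model.
    """
    values = []
    for ed_area, ed_length, es_area, es_length in beats:
        edv = area_length_volume(ed_area, ed_length)
        esv = area_length_volume(es_area, es_length)
        values.append(ejection_fraction(edv, esv))
    return sum(values) / len(values)


# Made-up measurements for two beats (cm^2 and cm).
lvef = mean_lvef([(30.0, 8.0, 20.0, 7.0), (30.0, 8.0, 20.0, 7.0)])
```

Averaging over beats, as the abstract describes, reduces the influence of a single poorly segmented frame on the reported value.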

# 多元経路探索に応用した解集合プログラミングにおけるルーティングとスケジューリング:予備報告

Routing and Scheduling in Answer Set Programming applied to Multi-Agent Path Finding: Preliminary Report ( http://arxiv.org/abs/2403.12153v1 )

ライセンス: Link先を確認

Roland Kaminski, Torsten Schaub, Tran Cao Son, Jiří Švancara, Philipp Wanko,

(参考訳) 本稿では、ASP(Answer Set Programming)におけるルーティングとスケジューリングの代替手法を提案し、マルチエージェントパス探索の文脈でそれらを探索する。その考え方は、アクションや流動性に付随する時間ステップではなく、部分的な順序で時間の流れを捉えることである。これはまた、計画の長さの固定された上界の必要性を廃止する。この回避のトレードオフは、(一部)時間軌道は、同じ作用や流線型の複数の発生がもはや区別できないため、非循環でなければならないことである。このアプローチはルーティングをモデリングする興味深い代替手段を提供するが、きめ細かいタイミングをASP.NETで表現できないため、スケジューリングの代替にはならない。これは、非巡回性や差分制約といった外部手段で効率的に処理できる部分順序に対して異なる。我々はこのアイデアを正式に詳述し、いくつかのASPエンコーディングを提示する。最後に,実験解析による有効性を示す。

We present alternative approaches to routing and scheduling in Answer Set Programming (ASP), and explore them in the context of Multi-agent Path Finding. The idea is to capture the flow of time in terms of partial orders rather than time steps attached to actions and fluents. This also abolishes the need for fixed upper bounds on the length of plans. The trade-off for this avoidance is that (parts of) temporal trajectories must be acyclic, since multiple occurrences of the same action or fluent cannot be distinguished anymore. While this approach provides an interesting alternative for modeling routing, it is without alternative for scheduling since fine-grained timings cannot be represented in ASP in a feasible way. This is different for partial orders that can be efficiently handled by external means such as acyclicity and difference constraints. We formally elaborate upon this idea and present several resulting ASP encodings. Finally, we demonstrate their effectiveness via an empirical analysis.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# ThermoNeRF: 熱画像の新規視点合成のためのマルチモーダルNeural Radiance Fields

ThermoNeRF: Multimodal Neural Radiance Fields for Thermal Novel View Synthesis ( http://arxiv.org/abs/2403.12154v1 )

ライセンス: Link先を確認

Mariam Hassan, Florent Forest, Olga Fink, Malcolm Mielle,

(参考訳) 熱シーンの再構築は、建築物のエネルギー消費分析や非破壊検査など、幅広い分野への応用の可能性を秘めている。しかし、既存の手法は通常、密なシーン計測を必要とし、多くの場合RGB画像に頼って3次元形状を再構成し、熱情報は再構成後に投影される。この2段階の戦略は、熱画像にテクスチャが乏しいために採用されているが、再構成された物体の形状や温度と実際のシーンのそれらとの間に不一致をもたらす可能性がある。この課題に対処するため、シーンの新しいRGBビューと熱ビューを同時にレンダリングできる、Neural Radiance Fieldsに基づく新しいマルチモーダル手法であるThermoNeRFを提案する。熱画像のテクスチャの欠如を克服するために、対になったRGB画像と熱画像を用いてシーン密度を学習し、色と温度の情報はそれぞれ別のネットワークで推定する。さらに、シーン再構築に利用可能なRGB+熱データセットの不足を補う新しいデータセットであるThermoScenesを紹介する。実験の結果、ThermoNeRFは正確な熱画像合成を達成し、平均絶対誤差は1.5°Cであり、最先端のNeRF手法であるNerfactoに連結したRGB+熱データを用いた場合と比べて50%以上の改善となることが確認された。

Thermal scene reconstruction exhibits great potential for applications across a broad spectrum of fields, including building energy consumption analysis and non-destructive testing. However, existing methods typically require dense scene measurements and often rely on RGB images for 3D geometry reconstruction, with thermal information being projected post-reconstruction. This two-step strategy, adopted due to the lack of texture in thermal images, can lead to disparities between the geometry and temperatures of the reconstructed objects and those of the actual scene. To address this challenge, we propose ThermoNeRF, a novel multimodal approach based on Neural Radiance Fields, capable of rendering new RGB and thermal views of a scene jointly. To overcome the lack of texture in thermal images, we use paired RGB and thermal images to learn scene density, while distinct networks estimate color and temperature information. Furthermore, we introduce ThermoScenes, a new dataset to palliate the lack of available RGB+thermal datasets for scene reconstruction. Experimental results validate that ThermoNeRF achieves accurate thermal image synthesis, with an average mean absolute error of 1.5°C, an improvement of over 50% compared to using concatenated RGB+thermal data with Nerfacto, a state-of-the-art NeRF method.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# ディリクレ混合モデルにおける効率的なKL偏差推定のための変分法

Variational Approach for Efficient KL Divergence Estimation in Dirichlet Mixture Models ( http://arxiv.org/abs/2403.12158v1 )

ライセンス: Link先を確認

Samyajoy Pal, Christian Heumann,

(参考訳) 本研究は,構成データのクラスタリングに不可欠なディリクレ混合モデル (DMM) におけるKulback-Leibler (KL) の分散を効率的に推定することに取り組む。 DMMが重要であるにも拘わらず、KL分枝に対する解析的に抽出可能な解が得られることが証明されている。過去のアプローチはモンテカルロ法を計算的に要求することに依存しており、新しい変分法の導入を動機付けていた。本手法は,高速モデル比較とロバスト評価のための計算効率を大幅に向上する閉形式解を提供する。実データとシミュレーションデータを用いた検証は、モンテカルロの従来の手法よりも優れた効率と精度を示し、多様なDMMモデルの迅速な探索と、構成データの統計的解析の進歩に新たな道を開く。

This study tackles the efficient estimation of Kullback-Leibler (KL) Divergence in Dirichlet Mixture Models (DMM), crucial for clustering compositional data. Despite the significance of DMMs, obtaining an analytically tractable solution for KL Divergence has proven elusive. Past approaches relied on computationally demanding Monte Carlo methods, motivating our introduction of a novel variational approach. Our method offers a closed-form solution, significantly enhancing computational efficiency for swift model comparisons and robust estimation evaluations. Validation using real and simulated data showcases its superior efficiency and accuracy over traditional Monte Carlo-based methods, opening new avenues for rapid exploration of diverse DMM models and advancing statistical analyses of compositional data.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
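To illustrate why a closed-form route is attractive for the Dirichlet mixture entry above, the sketch below combines the known closed-form KL divergence between two Dirichlet distributions with a Hershey-Olsen style variational approximation for mixtures. Whether this matches the paper's own derivation is an assumption; the function names and the stdlib-only `digamma` helper are likewise illustrative.

```python
import math


def digamma(x):
    """Digamma via upward recurrence and an asymptotic series (for x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series


def kl_dirichlet(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = sum(alpha), sum(beta)
    kl = math.lgamma(a0) - math.lgamma(b0)
    for a, b in zip(alpha, beta):
        kl += math.lgamma(b) - math.lgamma(a)
        kl += (a - b) * (digamma(a) - digamma(a0))
    return kl


def kl_mixture_variational(weights_f, comps_f, weights_g, comps_g):
    """Variational approximation of KL(f || g) for two Dirichlet mixtures.

    Only pairwise component KLs are needed, so no sampling is involved.
    """
    total = 0.0
    for pi_a, f_a in zip(weights_f, comps_f):
        num = sum(p * math.exp(-kl_dirichlet(f_a, f_b))
                  for p, f_b in zip(weights_f, comps_f))
        den = sum(w * math.exp(-kl_dirichlet(f_a, g_b))
                  for w, g_b in zip(weights_g, comps_g))
        total += pi_a * math.log(num / den)
    return total


# Two made-up two-component mixtures over 3-part compositions.
f = ([0.6, 0.4], [[2.0, 3.0, 4.0], [5.0, 1.0, 1.0]])
g = ([0.5, 0.5], [[2.5, 3.0, 3.5], [4.0, 1.5, 1.0]])
approx_kl = kl_mixture_variational(f[0], f[1], g[0], g[1])
```

With single-component mixtures the approximation reduces exactly to `kl_dirichlet`, which gives a quick sanity check against the closed form.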

# 金融市場におけるリーダーの声の影響:NASDAQ, NSE, その他に関する実証的深層学習調査

Effect of Leaders Voice on Financial Market: An Empirical Deep Learning Expedition on NASDAQ, NSE, and Beyond ( http://arxiv.org/abs/2403.12161v1 )

ライセンス: Link先を確認

Arijit Das, Tanmoy Nandi, Prasanta Saha, Suman Das, Saronyo Mukherjee, Sudip Kumar Naskar, Diganta Saha,

(参考訳) 株価、株式、金、石油、相互資金といった金融市場は、ニュースやソーシャルメディアへの投稿の影響を受けている。本研究では、さまざまな分野のリーダーのTwitterハンドルのNLP分析に基づいて、金融市場の動向を予測するために、ディープラーニングに基づくモデルを提案する。財務要素の歴史的データだけでなく、歴史的データとTwitterのようなソーシャルメディアのニュースや投稿を組み合わせることで、金融市場を予測できるモデルが、この研究の主目的である。その結果、実質的な改善が示される。現在の作品の主な特徴は- a) Twitterハンドルと金融コンポーネントのモデルを生成することができる完全に一般化されたアルゴリズムを提案すること。ロ株価に対するつぶやき効果の時間窓の予測 c) トレンドを予測するために複数のTwitterハンドルの効果を分析すること。近年の同様の分野における最新の研究の発見、研究ギャップの発見、分析と予測に必要なデータ収集のための詳細な調査が行われている。 State-of-the-artアルゴリズムが提案され,環境との完全な実装が提案されている。金融市場における Twitter データの NLP 分析を考慮した結果改善の洞察に富んだ傾向を示す。インドとアメリカの金融市場は、将来他の市場が取られるように、現在の作業で探索されている。本研究の社会的・経済的影響をまとめる。

Financial markets, such as those for stocks, shares, gold, oil, and mutual funds, are affected by news and posts on social media. In this work, deep-learning-based models are proposed to predict financial market trends based on NLP analysis of the Twitter handles of leaders in different fields. Many models predict the financial market from the historical data of the financial component alone; combining historical data with news and social media posts, such as those on Twitter, is the main objective of the present work. The results show substantial improvement. The main features of the present work are: a) proposing a completely generalized algorithm that can generate models for any Twitter handle and any financial component, b) predicting the time window of a tweet's effect on a stock price, and c) analyzing the effect of multiple Twitter handles on trend prediction. A detailed survey is conducted to identify the latest work in similar fields in recent years, find the research gap, and collect the data required for analysis and prediction. A state-of-the-art algorithm is proposed, and a complete implementation with its environment is given. An insightful trend of improvement in the results is shown when the NLP analysis of Twitter data is taken into account for financial market components. The Indian and USA financial markets are explored in the present work, whereas other markets can be considered in the future. The socio-economic impact of the present work is discussed in the conclusion.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# 計画分析による知的実行

Intelligent Execution through Plan Analysis ( http://arxiv.org/abs/2403.12162v1 )

ライセンス: Link先を確認

Daniel Borrajo, Manuela Veloso,

(参考訳) インテリジェントなロボットは計画を作成し実行する必要がある。現実の環境の複雑さに対処するために、計画は世界についていくつかの仮定をする。計画を実行する場合、通常、仮定は満たされない。ほとんどの研究は、この事実のネガティブな影響と実行失敗後の再計画の使用に焦点を当てている。代わりに私たちは、ポジティブな影響や、より良い計画を見つける機会に重点を置いています。計画する際、提案手法はこれらの機会を見つけ、保存する。その後、実行中に、監視システムは、スクラッチから計画を立て直すのではなく、知覚に集中し、計画を修正するために使用することができる。いくつかのパラダイム的なロボットタスクの実験は、アプローチが標準的な計画戦略よりも優れていることを示す。

Intelligent robots need to generate and execute plans. In order to deal with the complexity of real environments, planning makes some assumptions about the world. When executing plans, the assumptions are usually not met. Most works have focused on the negative impact of this fact and the use of replanning after execution failures. Instead, we focus on the positive impact, or opportunities to find better plans. When planning, the proposed technique finds and stores those opportunities. Later, during execution, the monitoring system can use them to focus perception and repair the plan, instead of replanning from scratch. Experiments in several paradigmatic robotic tasks show how the approach outperforms standard replanning strategies.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# 超伝導回路の損失に及ぼすニオブ薄膜構造の影響

The effect of niobium thin film structure on losses in superconducting circuits ( http://arxiv.org/abs/2403.12164v1 )

ライセンス: Link先を確認

Maxwell Drimmer, Sjoerd Telkamp, Felix L. Fischer, Ines C. Rodrigues, Clemens Todt, Filip Krizek, Dominik Kriegner, Christoph Müller, Werner Wegscheider, Yiwen Chu,

(参考訳) 超伝導マイクロ波回路の性能は超伝導膜と基板の材料特性に強く影響される。表面処理の重要性や表面酸化物の影響を理解するために研究が進んでいるが、マイクロ波損失に対する超伝導膜構造の影響は、まだ完全には理解されていない。本研究では, 結晶特性の異なるニオブ共振器のマイクロ波特性とその表面トポグラフィーについて検討した。我々は、Nb結晶配向と表面トポグラフィーが室温と975Kの間で基板温度を変化させることで変化する一連のマグネトロンスパッタ薄膜を解析した。成長系列(550K)の中間温度条件下で成長したフィルムにおいて,結晶ドメインの優先順序と低表面粗さの両方を示すフィルムにおいて,最も高い品質因子を観察した。さらに, 共振器の温度依存性を解析し, Nb膜中の準粒子密度がニオブ結晶構造と粒界の存在によってどのように影響を受けるかを知る。その結果, 超伝導薄膜の結晶構造と共振器による損失機構の関連性が強調され, 薄膜成膜時の温度の適度な変化が結果として生じる品質要因に大きく影響することが示唆された。

The performance of superconducting microwave circuits is strongly influenced by the material properties of the superconducting film and substrate. While progress has been made in understanding the importance of surface preparation and the effect of surface oxides, the complex effect of superconductor film structure on microwave losses is not yet fully understood. In this study, we investigate the microwave properties of niobium resonators with different crystalline properties and related surface topographies. We analyze a series of magnetron sputtered films in which the Nb crystal orientation and surface topography are changed by varying the substrate temperatures between room temperature and 975 K. The lowest-loss resonators that we measure have quality factors of over one million at single-photon powers, among the best ever recorded using the Nb on sapphire platform. We observe the highest quality factors in films grown at an intermediate temperature regime of the growth series (550 K) where the films display both preferential ordering of the crystal domains and low surface roughness. Furthermore, we analyze the temperature-dependent behavior of our resonators to learn about how the quasiparticle density in the Nb film is affected by the niobium crystal structure and the presence of grain boundaries. Our results stress the connection between the crystal structure of superconducting films and the loss mechanisms suffered by the resonators and demonstrate that even a moderate change in temperature during thin film deposition can significantly affect the resulting quality factors.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# 少数の力:Coreset Selectionによるデータ再重み付けの高速化と強化

The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection ( http://arxiv.org/abs/2403.12166v1 )

ライセンス: Link先を確認

Mohammad Jafari, Yimeng Zhang, Yihua Zhang, Sijia Liu,

(参考訳) 機械学習のタスクが進化し続けるにつれて、傾向はより大きなデータセットを集め、ますます大きなモデルを訓練する。これは精度の向上につながったが、計算コストを持続不可能なレベルへとエスカレートした。そこで本研究は,計算効率とモデル精度の微妙なバランスをとることを目的としている。計算時間とモデル性能の両方を効果的に最適化し、コアサブセットの選択を重み付けに利用する新しい手法を提案する。戦略的に選択されたコアセットに焦点をあてることで、アウトリーチの影響を効率よく最小化するため、我々のアプローチは堅牢な表現を提供する。再校正された重みは、データセット全体に対してマッピングされ、伝播される。実験により,本手法の有効性を実証し,モデルトレーニングのスケーラブルで高精度な解法としての可能性を明らかにした。

As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We introduce a novel method that employs core subset selection for reweighting, effectively optimizing both computational time and model performance. By focusing on a strategically selected coreset, our approach offers a robust representation, as it efficiently minimizes the influence of outliers. The re-calibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results substantiate the effectiveness of this approach, underscoring its potential as a scalable and precise solution for model training.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18
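The abstract above describes three steps: select a coreset, compute weights on it, and propagate those weights to the full dataset. The sketch below is one possible reading of that outline, not the authors' method: it assumes k-center greedy selection and nearest-coreset-member propagation, and the coreset weights are simply supplied by hand where a real reweighting procedure would compute them.

```python
import math


def k_center_greedy(points, k):
    """Pick k indices so that every point is close to some selected point."""
    selected = [0]  # start from the first point (an arbitrary choice)
    dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < k:
        # Add the point that is currently farthest from the selected set.
        idx = max(range(len(points)), key=dist.__getitem__)
        selected.append(idx)
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return selected


def propagate_weights(points, coreset_idx, coreset_weights):
    """Give every point the weight of its nearest coreset member."""
    weights = []
    for p in points:
        nearest = min(
            range(len(coreset_idx)),
            key=lambda j: math.dist(p, points[coreset_idx[j]]),
        )
        weights.append(coreset_weights[nearest])
    return weights


# Two well-separated clusters of made-up 2-D points.
data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
core = k_center_greedy(data, k=2)
# Placeholder weights: a real reweighting step would learn these on the coreset.
full_weights = propagate_weights(data, core, coreset_weights=[1.0, 0.2])
```

The expensive reweighting step then runs on `len(core)` points instead of the whole dataset, which is where the computational saving would come from under these assumptions.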

# 医用画像分類のためのディープラーニングモデルの一般化

Generalizing deep learning models for medical image classification ( http://arxiv.org/abs/2403.12167v1 )

ライセンス: Link先を確認

Matta Sarah, Lamard Mathieu, Zhang Philippe, Alexandre Le Guilcher, Laurent Borderie, Béatrice Cochener, Gwenolé Quellec,

(参考訳) 多くのDeep Learning(DL)モデルが、医療実践のさまざまな側面を再形成することを約束する医療画像分析アプリケーションのために開発されている。医療機関がそれを採用することを奨励するDLモデル検証と実装の進歩にもかかわらず、いくつかの根本的な疑問が残る:DLモデルは一般化可能であるか? DLモデルのパフォーマンスが低下する原因は何でしょう? DLモデルのパフォーマンス低下を克服するには? 医療機器のアップデート、新しい画像ワークフロー、患者人口や人口の変化など、複数の要因により、時間とともにこのドリフトが引き起こされるため、医療データは動的でドメインシフトの傾向にある。本稿では,DLに基づく分類モデルの一般化手法の最近の展開を概観する。また、評価プロトコルやベンチマークの改善の必要性など今後の課題についても論じ、医用画像分類のための堅牢で一般化されたモデルを実現するための今後の発展を構想する。

Numerous Deep Learning (DL) models have been developed for a large spectrum of medical image analysis applications, which promises to reshape various facets of medical practice. Despite early advances in DL model validation and implementation, which encourage healthcare institutions to adopt them, some fundamental questions remain: are the DL models capable of generalizing? What causes a drop in DL model performance? How can this performance drop be overcome? Medical data are dynamic and prone to domain shift: multiple factors, such as updates to medical equipment, new imaging workflows, and shifts in patient demographics or populations, can induce this drift over time. In this paper, we review recent developments in generalization methods for DL-based classification models. We also discuss future challenges, including the need for improved evaluation protocols and benchmarks, and envisioned future developments to achieve robust, generalized models for medical image classification.

翻訳日:2024-03-20 18:31:46 公開日:2024-03-18

# EasyJailbreak: 大規模言語モデルをジェイルブレイクするための統一フレームワーク

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models ( http://arxiv.org/abs/2403.12171v1 )

ライセンス: Link先を確認

Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang,

(参考訳) 大規模言語モデル(LLM)のセキュリティ脆弱性の特定と緩和には、脱獄攻撃が不可欠である。セーフガードをバイパスし、禁止されたアウトプットを引き出すように設計されている。しかし、さまざまなjailbreakメソッドに大きな違いがあるため、コミュニティで利用可能な標準実装フレームワークは存在せず、包括的なセキュリティ評価が制限されている。本稿では,LLMに対するジェイルブレイク攻撃の構築と評価を容易にする統合フレームワークであるEasyJailbreakを紹介する。 Selector、Mutator、Constraint、Evaluatorの4つのコンポーネントを使ってJailbreak攻撃を構築する。このモジュラーフレームワークは、研究者が新しいコンポーネントと既存のコンポーネントの組み合わせから簡単に攻撃を構築できる。今のところ、EasyJailbreakは11の異なるjailbreakメソッドをサポートし、幅広いLLMのセキュリティ検証を容易にする。 10の異なるLSMで検証した結果、さまざまなジェイルブレイク攻撃で平均60%の攻撃確率で重大な脆弱性が判明した。特に、GPT-3.5-TurboやGPT-4のような先進モデルでさえ、それぞれ平均攻撃成功率(ASR)が57%、33%である。我々は、Webプラットフォーム、PyPIパブリッシュパッケージ、スクリーンキャストビデオ、実験的なアウトプットなど、研究者のための豊富なリソースをリリースした。

Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs. It builds jailbreak attacks using four components: Selector, Mutator, Constraint, and Evaluator. This modular framework enables researchers to easily construct attacks from combinations of novel and existing components. So far, EasyJailbreak supports 11 distinct jailbreak methods and facilitates the security validation of a broad spectrum of LLMs. Our validation across 10 distinct LLMs reveals a significant vulnerability, with an average breach probability of 60% under various jailbreaking attacks. Notably, even advanced models like GPT-3.5-Turbo and GPT-4 exhibit average Attack Success Rates (ASR) of 57% and 33%, respectively. We have released a wealth of resources for researchers, including a web platform, PyPI published package, screencast video, and experimental outputs.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# 骨格型ビデオ異常検出のためのグラフ-Jigsaw条件拡散モデル

Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection ( http://arxiv.org/abs/2403.12172v1 )

ライセンス: Link先を確認

Ali Karami, Thi Kieu Khanh Ho, Narges Armanfard,

(参考訳) スケルトンに基づくビデオ異常検出(SVAD)はコンピュータビジョンにおいて重要な課題である。異常パターンや事象を正確に識別することで、オペレーターは不審な行為を迅速に検出し、安全性を高めることができる。これを達成するためには、身体レベルと地域レベルの両方において、人間の動きを包括的に理解することが必要である。しかし、既存の研究はこれらの重要な性質を同時に解決することができない。本稿では,SVADに関連する課題を克服するため,Skeleton-based Video Anomaly Detection (GiCiSAD) のためのグラフ-Jigsaw条件付き拡散モデル(Graph-Jigsaw Conditioned Diffusion Model)を提案する。 GiCiSADは3つの新しいモジュールで構成されている。グラフアテンションベースの予測モジュールはデータ固有の時空間的依存関係をキャプチャし、グラフレベルのJigsaw Puzzle Makerモジュールは正常な動きと異常な動きの間の微妙な領域レベルの不一致を区別し、グラフベースの条件拡散モデルは人間の動きの幅広いスペクトルを生成する。広く使われている4つの骨格ベースのビデオデータセットの大規模な実験により、GiCiSADはトレーニングパラメータが大幅に少ない既存のメソッドよりも優れており、新しい最先端技術として確立されている。

Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# TnT-LLM:大規模言語モデルを用いた大規模テキストマイニング

TnT-LLM: Text Mining at Scale with Large Language Models ( http://arxiv.org/abs/2403.12173v1 )

ライセンス: Link先を確認

Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan,

(参考訳) 非構造化テキストを構造化され有意義な形式に変換し、有用なカテゴリラベルで整理することは、下流の分析と応用のためのテキストマイニングの基本的なステップである。しかしながら、ラベル分類法やテキストベースのラベル分類器を構築するための既存の方法の多くは、ドメインの専門知識と手作業によるキュレーションに大きく依存しているため、そのプロセスは高価で時間を要する。ラベル空間が不特定であり、大規模なデータアノテーションが利用できない場合、これは特に困難である。本稿では,これらの課題を大規模言語モデル (LLM) を用いて解決する。 TnT-LLM は LLM を利用した2段階のフレームワークで,任意のユースケースに対して最小限の人的労力でラベル生成と割り当てのプロセスを自動化する。第1フェーズでは,ラベル分類を反復的に生成・洗練するゼロショット多段階推論手法を導入する。第2フェーズでは、LLMをトレーニングサンプルを生成するデータラベルとして使用し、軽量な教師付き分類器を確実に構築、デプロイ、大規模に提供できるようにします。我々は、オープンドメインチャットベースの検索エンジンであるBing Copilot(旧Bing Chat)のユーザ意図と会話ドメインの分析にTnT-LLMを適用した。 TnT-LLMは、最先端のベースラインと比較すると、より正確で関連性の高いラベル分類を生成でき、大規模分類における精度と効率のバランスが良好であることを示す。また、現実のアプリケーションにおける大規模テキストマイニングにLLMを使うことの課題と機会に関する実践的経験と洞察を共有します。

Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# エンド・ツー・エンド自動運転における説明可能な人工知能の安全性

Safety Implications of Explainable Artificial Intelligence in End-to-End Autonomous Driving ( http://arxiv.org/abs/2403.12176v1 )

ライセンス: Link先を確認

Shahin Atakishiyev, Mohammad Salameh, Randy Goebel,

(参考訳) エンド・ツー・エンドの学習パイプラインは、ディープラーニングの進歩、大規模トレーニングデータセットの可用性、統合センサーデバイスの改善など、高度自動運転車の継続的な開発におけるパラダイムシフトを徐々に生み出している。しかし、現代の学習手法によるリアルタイム意思決定における解釈可能性の欠如は、ユーザの信頼を阻害し、そのような車両の普及と商業化を阻害する。さらに、これらの車両が交通事故に巻き込まれたり、事故を起こしたりする場合には、この問題が悪化する。このような欠点は、社会的および法的観点から深刻な安全上の懸念を提起する。したがって、車両自動化の安全性を実現するためには、エンドツーエンドの自動運転における説明責任が不可欠である。しかしながら、自律運転の安全性と説明可能性の側面は、今日の最先端の研究者によって概して不一致に研究されている。本稿では,これらのトピック間のギャップを埋めて,次のような研究課題に答えようとする。いつ,どのように説明が自動運転の安全性を向上させるのか? そこで本稿では,自律運転における安全性と最先端の説明可能性技術について再考する。さらに,3つの重要なケーススタディを提示し,自動運転車の安全性向上における説明の要点を示す。最後に、我々の経験的調査について説明し、自動車の自律性に対する安全性と透明性を確保することの役割について、実用的な説明可能なAI手法による潜在的な価値、限界、注意点を明らかにする。

The end-to-end learning pipeline is gradually creating a paradigm shift in the ongoing development of highly autonomous vehicles, largely due to advances in deep learning, the availability of large-scale training datasets, and improvements in integrated sensor devices. However, a lack of interpretability in real-time decisions with contemporary learning methods impedes user trust and attenuates the widespread deployment and commercialization of such vehicles. Moreover, the issue is exacerbated when these cars are involved in or cause traffic accidents. Such a drawback raises serious safety concerns from societal and legal perspectives. Consequently, explainability in end-to-end autonomous driving is essential to enable the safety of vehicular automation. However, the safety and explainability aspects of autonomous driving have generally been investigated disjointly by researchers in today's state of the art. In this paper, we aim to bridge the gaps between these topics and seek to answer the following research question: When and how can explanations improve the safety of autonomous driving? In this regard, we first revisit established safety and state-of-the-art explainability techniques in autonomous driving. Furthermore, we present three critical case studies and show the pivotal role of explanations in enhancing self-driving safety. Finally, we describe our empirical investigation and reveal the potential value, limitations, and caveats of practical explainable AI methods with respect to their role in assuring safety and transparency for vehicle autonomy.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# 仮想2モードスクイーズ表現としての真空ラビ分割:周波数シフトからスクイーズパラメータを抽出する

Vacuum Rabi splitting as a manifestation of virtual two-mode squeezing: Extracting the squeezing parameters from frequency shifts ( http://arxiv.org/abs/2403.12177v1 )

ライセンス: Link先を確認

Karol Gietka,

(参考訳) 真空ラビ分裂は、原子の共振周波数と原子が存在する空洞の対称的な分裂に依存する。この研究において、真空ラビ分裂は仮想的な2モードのスクイーズ現象の顕在化であると主張している。仮想励起のスクイーズパラメータと物理モードの周波数シフトの関連性を確立する。この目的のために、Dickeモデルと相互作用する2つの調和振動子のマッピングを用い、素モードと物理的モードの枠組みで解析する。最後に、そのような量子場の仮想的スクイーズもまた、場の量子論において役割を果たすかもしれないことを示唆する。

Vacuum Rabi splitting relies on symmetrical splitting of the common resonance frequency of atoms and the cavity in which the atoms reside. In this work, we argue that vacuum Rabi splitting is a manifestation of virtual light-matter two-mode squeezing. We establish a connection between squeezing parameters of virtual excitations and frequency shifts of the physical modes. To this end, we use the mapping between the Dicke model and two interacting harmonic oscillators, which we analyze in the framework of bare and physical modes. Finally, we suggest that such virtual squeezing of quantum fields might also play a role in quantum field theories.
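As a rough illustration of the two-oscillator picture mentioned in the abstract, consider a generic textbook model of two resonant bosonic modes. The Hamiltonian below, the symbols $\omega$ (common bare frequency) and $g$ (coupling), and the convention $\hbar = 1$ are assumptions made here for illustration and are not necessarily the paper's parametrization. Diagonalizing the quadratic form in normal coordinates gives the physical mode frequencies:

```latex
H = \omega\, a^\dagger a + \omega\, b^\dagger b + g\,(a + a^\dagger)(b + b^\dagger),
\qquad
\omega_\pm = \sqrt{\omega^2 \pm 2 g \omega}
           = \omega \pm g - \frac{g^2}{2\omega} + O\!\left(\frac{g^3}{\omega^2}\right).
```

If the counter-rotating terms $ab + a^\dagger b^\dagger$ are dropped, the result is exactly $\omega_\pm = \omega \pm g$, the familiar symmetric splitting of $2g$. The common downward shift $-g^2/(2\omega)$ comes from those counter-rotating terms, which have the same form as the generator of two-mode squeezing between the bare modes. In this convention the lower mode stays real only for $g < \omega/2$.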

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# MACを用いた施設配置機構設計

MAC Advice for Facility Location Mechanism Design ( http://arxiv.org/abs/2403.12181v1 )

ライセンス: Link先を確認

Zohar Barak, Anupam Gupta, Inbal Talgam-Cohen,

(参考訳) 予測付きアルゴリズムは、従来の最悪ケース解析を超える方法として、施設配置問題の変種を含むさまざまな領域で近年注目を集めている。我々は、$n$人のエージェントが戦略的で、自身の位置を偽って報告する可能性がある$k$-施設配置のメカニズム設計問題を研究する。$k$個の最適な施設位置に対する予測を用いる従来のモデルとは異なり、我々は各エージェントの位置に対する$n$個の予測を受け取る。しかし、これらの予測は「ほとんど」かつ「ほぼ」正しい(mostly and approximately correct、略してMAC)にすぎない。すなわち、予測された位置のうち$\delta$の割合は任意に誤っていてもよく、残りの予測は$\varepsilon$の誤差まで正しければよい。誤差の独立性は仮定しない。このような予測によって、耐戦略的な施設配置に対する現在の最良の限界を上回ることができるだろうか？ 点集合の$1$-メディアン(幾何中央値)は汚染に対して自然に頑健であることを示し、これによりMAC予測を用いた単一施設配置のアルゴリズムが得られる。この頑健性の結果を、$k$施設の場合の「バランスのとれた」変種に拡張する。バランス性がなければ、直線上の$k=2$施設の設定であっても頑健性は完全に崩壊することを示す。この「バランスのとれていない」設定に対して、予測を使用しない Lu et al. [2010] の既知の最良の結果を上回る、真実申告的な確率的メカニズムを考案する。その過程で、(第1の施設の位置がすでに固定されている場合の)「第2の」施設配置問題を導入する。$1$-メディアン、より一般には$k$-メディアンの頑健性に関する我々の知見は、ロバスト統計における古典的なブレークダウンポイントの結果の定量版として、独立した関心を持たれる可能性がある。

Algorithms with predictions have attracted much attention in the last years across various domains, including variants of facility location, as a way to surpass traditional worst-case analyses. We study the $k$-facility location mechanism design problem, where the $n$ agents are strategic and might misreport their location. Unlike previous models, where predictions are for the $k$ optimal facility locations, we receive $n$ predictions for the locations of each of the agents. However, these predictions are only "mostly" and "approximately" correct (or MAC for short) -- i.e., some $\delta$-fraction of the predicted locations are allowed to be arbitrarily incorrect, and the remainder of the predictions are allowed to be correct up to an $\varepsilon$-error. We make no assumption on the independence of the errors. Can such predictions allow us to beat the current best bounds for strategyproof facility location? We show that the $1$-median (geometric median) of a set of points is naturally robust under corruptions, which leads to an algorithm for single-facility location with MAC predictions. We extend the robustness result to a "balanced" variant of the $k$ facilities case. Without balancedness, we show that robustness completely breaks down, even for the setting of $k=2$ facilities on a line. For this "unbalanced" setting, we devise a truthful random mechanism that outperforms the best known result of Lu et al. [2010], which does not use predictions. En route, we introduce the problem of "second" facility location (when the first facility's location is already fixed). Our findings on the robustness of the $1$-median and more generally $k$-medians may be of independent interest, as quantitative versions of classic breakdown-point results in robust statistics.
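The robustness of the $1$-median under MAC-style corruption can be illustrated in one dimension, where the $1$-median is the ordinary median. The sketch below is not the paper's mechanism or its bounds: the helper names, the parameter values, and the outlier value are all hypothetical. It checks a simple order-statistic fact: if at most $k = \lfloor \delta n \rfloor$ predictions are arbitrary and the rest are within $\varepsilon$ of the truth, the median of the predictions lies between the true order statistics $k$ ranks below and $k$ ranks above the true median, up to $\varepsilon$. The mean has no such guarantee.

```python
import random


def median_low(values):
    """Lower median: the element of rank ceil(n/2) in sorted order."""
    s = sorted(values)
    return s[(len(s) - 1) // 2]


def make_mac_predictions(true_locs, delta, eps, rng, outlier=1e9):
    """Build 'mostly and approximately correct' predictions (illustrative).

    A delta-fraction of the predictions is replaced by an arbitrary far-away
    value; every other prediction is within eps of the true location.
    """
    n = len(true_locs)
    k = int(delta * n)                      # number of arbitrarily wrong predictions
    bad = set(rng.sample(range(n), k))
    preds = [
        outlier if i in bad else x + rng.uniform(-eps, eps)
        for i, x in enumerate(true_locs)
    ]
    return preds, k


rng = random.Random(0)
true_locs = [rng.uniform(0.0, 100.0) for _ in range(101)]
delta, eps = 0.1, 0.5
preds, k = make_mac_predictions(true_locs, delta, eps, rng)

xs = sorted(true_locs)
r = (len(xs) - 1) // 2                      # 0-based rank of the lower median
m_pred = median_low(preds)

# Order-statistic sandwich for the median of the corrupted predictions.
lower = xs[r - k] - eps
upper = xs[r + k] + eps

# The mean, by contrast, is dragged away by the corrupted entries.
mean_pred = sum(preds) / len(preds)

print(f"true median      : {xs[r]:.3f}")
print(f"predicted median : {m_pred:.3f} (guaranteed in [{lower:.3f}, {upper:.3f}])")
print(f"predicted mean   : {mean_pred:.3e}")
```

The sandwich only makes sense when $\delta < 1/2$, so that both ranks `r - k` and `r + k` exist. Once half of the predictions can be corrupted, the median can be moved arbitrarily far as well.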

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# ニューラルネットワークによるRKHS関数の近似

Approximation of RKHS Functionals by Neural Networks ( http://arxiv.org/abs/2403.12187v1 )

ライセンス: Link先を確認

Tian-Yi Zhou, Namjoon Suh, Guang Cheng, Xiaoming Huo,

(参考訳) 時系列や画像などの関数データの豊富さによって、そのようなデータをニューラルネットワークに統合し、関数空間からR(関数)へのマップを学習することへの関心が高まっている。本稿では,ニューラルネットワークを用いたカーネルヒルベルト空間(RKHS)における関数の近似について検討する。我々は、RKHS上の関数の近似の普遍性を確立する。具体的には、逆多重四元数、ガウス、ソボレフのカーネルによって誘導されるものに対して明示的な誤差境界を導出する。さらに、ニューラルネットワークが一般化された汎関数線形モデルにおける回帰マップを正確に近似できることを証明し、機能回帰に本研究の成果を適用した。関数学習に関する既存の研究は、事前定義された基底関数のセットを含む統合型基底関数の拡張を必要とする。 RKHSの直交射影を補間することにより,基本関数展開の代替として点評価を用いることで,提案するネットワークはよりシンプルになる。

Motivated by the abundance of functional data such as time series and images, there has been a growing interest in integrating such data into neural networks and learning maps from function spaces to R (i.e., functionals). In this paper, we study the approximation of functionals on reproducing kernel Hilbert spaces (RKHS's) using neural networks. We establish the universality of the approximation of functionals on the RKHS's. Specifically, we derive explicit error bounds for those induced by inverse multiquadric, Gaussian, and Sobolev kernels. Moreover, we apply our findings to functional regression, proving that neural networks can accurately approximate the regression maps in generalized functional linear models. Existing works on functional learning require integration-type basis function expansions with a set of pre-specified basis functions. By leveraging the interpolating orthogonal projections in RKHS's, our proposed network is much simpler in that we use point evaluations to replace basis function expansions.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# PETScML:Scientific Machine Learningにおける回帰学習問題の2次解法

PETScML: Second-order solvers for training regression problems in Scientific Machine Learning ( http://arxiv.org/abs/2403.12188v1 )

ライセンス: Link先を確認

Stefano Zampini, Umberto Zerbinati, George Turkiyyah, David Keyes,

(参考訳) 近年,計算機科学や工学の応用による深層学習技術を用いた分析ツールとして,科学機械学習が出現するのを目撃している。これらの手法のコアとなるのは、ニューラルネットワークの実現を学ぶための教師付きトレーニングアルゴリズムである。しかし、ディープラーニングの実践とは違って、科学的な機械学習のトレーニング問題では、スムーズなデータの量が多くなり、経験的リスク関数のキャラクタリゼーションが向上し、制約のない最適化のための従来の解法に適している。我々は,Portable and Extensible Toolkit for Scientific計算上に構築された軽量なソフトウェアフレームワークを導入し,ディープラーニングソフトウェアと非制約最小化のための従来の解法とのギャップを埋める。我々は,幅広い科学的機械学習手法とテストケースのサロゲートモデルを学習する際に,回帰タスクから生じる一般化誤差を改善するために,ヘッセンのガウス・ニュートン近似に基づく信頼領域法の有効性を実証的に実証した。 L-BFGSや不正確なニュートンを含む従来の二階解法は、コストや精度の観点からも、サロゲートモデルの検証に使用される適応的な一階解法と比較して好意的に比較した。

In recent years, we have witnessed the emergence of scientific machine learning as a data-driven tool for the analysis, by means of deep-learning techniques, of data produced by computational science and engineering applications. At the core of these methods is the supervised training algorithm to learn the neural network realization, a highly non-convex optimization problem that is usually solved using stochastic gradient methods. However, distinct from deep-learning practice, scientific machine-learning training problems feature a much larger volume of smooth data and better characterizations of the empirical risk functions, which make them suited for conventional solvers for unconstrained optimization. We introduce a lightweight software framework built on top of the Portable and Extensible Toolkit for Scientific computation to bridge the gap between deep-learning software and conventional solvers for unconstrained minimization. We empirically demonstrate the superior efficacy of a trust region method based on the Gauss-Newton approximation of the Hessian in improving the generalization errors arising from regression tasks when learning surrogate models for a wide range of scientific machine-learning techniques and test cases. All the conventional second-order solvers tested, including L-BFGS and inexact Newton with line-search, compare favorably, either in terms of cost or accuracy, with the adaptive first-order methods used to validate the surrogate models.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# POLARトラバースデータセット:極地における極地移動をシミュレーションしたステレオカメラ画像のデータセット

The POLAR Traverse Dataset: A Dataset of Stereo Camera Images Simulating Traverses across Lunar Polar Terrain under Extreme Lighting Conditions ( http://arxiv.org/abs/2403.12194v1 )

ライセンス: Link先を確認

Margaret Hansen, Uland Wong, Terrence Fong,

(参考訳) POLARtraverse Dataset: 直線トラバースをシミュレートするために設計された極点照明条件下での月状地形の高忠実なステレオペア画像のデータセットを提案する。カメラの高さやピッチの異なる個々のトラバースの画像は、静止したステレオバーをリゴリスシミュレーションで満たされたテストベッドの上を移動させ、1m間隔で記録され、月の南極地形を模した形状になっている。地上の真実の幾何学やカメラの位置情報も記録された。このデータセットは、月極環境における使用のために、ステレオカメラやモノクラーカメラのイメージ、例えば視覚計測のようなソフトウェアアルゴリズムを開発し、テストすることを目的としており、また、月の極域で期待される照明条件についての洞察を提供する。

We present the POLAR Traverse Dataset: a dataset of high-fidelity stereo pair images of lunar-like terrain under polar lighting conditions designed to simulate a straight-line traverse. Images from individual traverses with different camera heights and pitches were recorded at 1 m intervals by moving a suspended stereo bar across a test bed filled with regolith simulant and shaped to mimic lunar south polar terrain. Ground truth geometry and camera position information was also recorded. This dataset is intended for developing and testing software algorithms that rely on stereo or monocular camera images, such as visual odometry, for use in the lunar polar environment, as well as to provide insight into the expected lighting conditions in lunar polar regions.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# レンズのシフト:大規模言語モデルを用いたnpmエコシステム内のマルウェアの検出

Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models ( http://arxiv.org/abs/2403.12196v1 )

ライセンス: Link先を確認

Nusrat Zahan, Philipp Burckhardt, Mikola Lysenko, Feross Aboukhadijeh, Laurie Williams,

(参考訳) Gartner 2022のレポートは、世界中の組織の45%が2025年までにソフトウェアサプライチェーンの攻撃に遭遇すると予想しており、コミュニティと国家の利益のためにソフトウェアサプライチェーンのセキュリティを改善する緊急性を強調している。現在のマルウェア検出技術は、良性パッケージとマルウェアパッケージをフィルタリングすることで手動でレビューするのに役立つが、偽陽性率が高く、自動化サポートが限られている。したがって、マルウェア検出技術は、正確かつ最小限の偽陽性結果に対する高度な、より自動化されたアプローチの恩恵を受けることができる。本研究の目的は,大規模言語モデル(LLM)の実証研究を通じて,セキュリティアナリストによる悪意のあるパッケージの特定を支援し,npmエコシステムにおける潜在的なマルウェアを検出することである。本稿では,ChatGPTの反復的自己修正とゼロショットロールプレイチェーンを用いた多段階意思決定マルウェア検出ワークフローであるSocketAI Scannerを提案する。我々は,5,115 npmパッケージ(そのうち2,180は悪意がある)を調査し,静的解析ツールを用いてGPT-3およびGPT-4モデルのベースライン比較を行った。誤分類警告率の低いGPTモデルでは有望な結果が得られた。ベースライン比較では, 25%以上の精度, 15%以上のF1スコアにおいて, 静的解析よりも顕著な改善が見られた。 GPT-3モデルの精度は91%, F1スコアは94%であった。 GPT-4は精度(99%)とF1(97%)が優れており、GPT-3は費用対効果のバランスを示す。

The Gartner 2022 report predicts that 45% of organizations worldwide will encounter software supply chain attacks by 2025, highlighting the urgency to improve software supply chain security for community and national interests. Current malware detection techniques aid in the manual review process by filtering benign and malware packages, yet such techniques have high false-positive rates and limited automation support. Therefore, malware detection techniques could benefit from advanced, more automated approaches for accurate and minimally false-positive results. The goal of this study is to assist security analysts in identifying malicious packages through the empirical study of large language models (LLMs) to detect potential malware in the npm ecosystem. We present SocketAI Scanner, a multi-stage decision-maker malware detection workflow using iterative self-refinement and zero-shot-role-play-Chain of Thought (CoT) prompting techniques for ChatGPT. We studied 5,115 npm packages (of which 2,180 are malicious) and performed a baseline comparison of the GPT-3 and GPT-4 models with a static analysis tool. Our findings showed promising results for GPT models with low misclassification alert rates. Our baseline comparison demonstrates a notable improvement over static analysis in precision scores above 25% and F1 scores above 15%. We attained precision and F1 scores of 91% and 94%, respectively, for the GPT-3 model. Overall, GPT-4 demonstrates superior performance in precision (99%) and F1 (97%) scores, while GPT-3 presents a cost-effective balance between performance and expenditure.
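The abstract reports precision and F1 but not recall. Assuming F1 is the usual harmonic mean of precision and recall, $F_1 = 2PR/(P+R)$, recall can be recovered as $R = F_1 P / (2P - F_1)$. The sketch below applies that identity to the rounded figures quoted above. The function name is hypothetical, and the resulting recall values are only implied approximations, not numbers reported by the authors.

```python
def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall: F1 must be less than 2 * precision")
    return f1 * precision / denom


def f1_score(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the abstract.
gpt3_recall = implied_recall(0.91, 0.94)
gpt4_recall = implied_recall(0.99, 0.97)

print(f"GPT-3 implied recall: {gpt3_recall:.3f}")
print(f"GPT-4 implied recall: {gpt4_recall:.3f}")
```

With these inputs the implied recall is roughly 0.97 for GPT-3 and 0.95 for GPT-4. Under these assumptions, GPT-4's higher F1 would come from a large gain in precision (fewer false alarms) rather than from catching more malicious packages. Because the inputs are rounded to two digits, the second decimal of the result should not be over-interpreted.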

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# E2F-Net:StyleGANラテント・スペースによるアイ・ツー・フェイス・インペインティング

E2F-Net: Eyes-to-Face Inpainting via StyleGAN Latent Space ( http://arxiv.org/abs/2403.12197v1 )

ライセンス: Link先を確認

Ahmad Hassanpour, Fatemeh Jamalbafrani, Bian Yang, Kiran Raja, Raymond Veldhuis, Julian Fierrez,

(参考訳) 顔画像の欠落または損傷領域を復元する技術である顔の塗り絵は、隠蔽されたシナリオにおける顔認識や、品質の悪いキャプチャによる画像解析といった応用において重要なものである。このプロセスは、現実的なヴィジュアライゼーションを生成するだけでなく、個々のアイデンティティ特性も保持する。本研究の目的は、新しいGANベースの「Eyes-to-Face Network (E2F-Net)」モデルにより、眼球周囲領域(眼球面)に塗布することである。提案手法は,2つの専用エンコーダを用いて眼周囲領域から同一性および非同一性の特徴を抽出する。抽出された特徴は、事前訓練されたStyleGANジェネレータの潜伏空間にマッピングされ、最先端の性能とリッチで多様な表現力のある潜伏空間の恩恵を受けることができる。 GANインバージョン手法の最適化により,遅延空間における最適コードを見つけるために,StyleGAN出力をさらに改良する。私たちのE2F-Netは、二次的な利点として計算の複雑さを減らす最小限のトレーニングプロセスを必要とします。広範囲な実験を通して,本手法は,訓練と監督の努力が著しく少ないにも関わらず,顔全体を高品質に再構築し,現在の技術を超えていることを示す。提案手法をトレーニングし,検証するために,よく知られた公開顔データセットに基づいて,視線対面データセットを7つ生成した。コードとデータセットは公開されている。

Face inpainting, the technique of restoring missing or damaged regions in facial images, is pivotal for applications like face recognition in occluded scenarios and image analysis with poor-quality captures. This process not only needs to produce realistic visuals but also preserve individual identity characteristics. The aim of this paper is to inpaint a face given the periocular region (eyes-to-face) using a new Generative Adversarial Network (GAN)-based model called Eyes-to-Face Network (E2F-Net). The proposed approach extracts identity and non-identity features from the periocular region using two dedicated encoders. The extracted features are then mapped to the latent space of a pre-trained StyleGAN generator to benefit from its state-of-the-art performance and its rich, diverse and expressive latent space without any additional training. We further improve the StyleGAN output by finding the optimal code in the latent space with a new optimization technique for GAN inversion. Our E2F-Net requires minimal training, which reduces computational complexity as a secondary benefit. Through extensive experiments, we show that our method successfully reconstructs the whole face with high quality, surpassing current techniques, despite significantly less training and supervision effort. We have generated seven eyes-to-face datasets based on well-known public face datasets for training and verifying our proposed method. The code and datasets are publicly available.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# FLex:ステレオ内視鏡映像のダイナミック・ラジアンス・フィールド最適化

FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos ( http://arxiv.org/abs/2403.12198v1 )

ライセンス: Link先を確認

Florian Philipp Stilz, Mert Asim Karaoglu, Felix Tristram, Nassir Navab, Benjamin Busam, Alexander Ladikos,

(参考訳) 内視鏡的シーンの再構築は、外科手術後の分析から教育訓練まで、様々な医療応用にとって重要な要素である。最近, 変形組織を用いた内視鏡的再建術で有望な成績を示した。しかし、セットアップは、静的内視鏡、変形の制限、または内視鏡カメラのカメラポーズ情報を取得するための外部追跡装置に限られている。 FLexでは、変形組織の非常にダイナミックな環境において、動く内視鏡の挑戦的なセットアップを飾ります。複数重重なり合う4次元ニューラルラジアンスフィールド(NeRF)への暗黙的なシーン分離と、再構成とカメラのスクラッチからのポーズを協調的に最適化するプログレッシブ最適化手法を提案する。これにより、使いやすさが向上し、5000フレーム以上の手術ビデオの処理に間に合うように再構築能力を拡張できる。 StereoMISデータセットの大規模な評価により、FLexは競争力のあるポーズ精度を維持しながら、新規ビュー合成の品質を著しく向上することが示された。

Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. Neural rendering has recently shown promising results in endoscopic reconstruction with deforming tissue. However, the setup has been restricted to a static endoscope, limited deformation, or required an external tracking device to retrieve camera pose information of the endoscopic camera. With FLex we address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves the ease of use and allows reconstruction capabilities to scale in time to process surgical videos of 5,000 frames and more; an improvement of more than ten times compared to the state of the art while being agnostic to external tracking information. Extensive evaluations on the StereoMIS dataset show that FLex significantly improves the quality of novel view synthesis while maintaining competitive pose accuracy.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# 機械学習プロジェクトにおけるCI/CDパイプラインの進化に関する実証分析

Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects ( http://arxiv.org/abs/2403.12199v1 )

ライセンス: Link先を確認

Alaa Houerbi, Chadha Siala, Alexis Tucker, Dhia Elhaq Rzig, Foyzul Hassan,

(参考訳) 機械学習(ML)の人気が高まり、他のソフトウェアアーティファクトとのMLコンポーネントの統合が増加し、Travis CIやGitHub Actionsなどの継続的インテグレーションとデリバリ(CI/CD)ツールが利用されるようになった。このようなCI/CD構成とサービスは、プロジェクトのライフサイクル中に同期を必要とする。従来のソフトウェアシステムにおけるCI/CD構成とサービスの使い方について、いくつかの研究が議論された。しかしながら、MLプロジェクトでのCI/CD構成とサービスの変更に関する知識は限られている。この知識ギャップを埋めるために、この研究は、MLソフトウェアシステムにおけるCI/CD構成の進化に関する最初の経験的分析を示す。我々は508のオープンソースMLプロジェクトから収集された343のコミットを手動で分析し、MLプロジェクトにおいて一般的なCI/CD構成変更カテゴリを特定し、CI/CDとMLコンポーネントの14の共変更の分類法を考案した。さらに, 頻繁なCI/CD構成変更パターンを15,634コミットで識別するCI/CD構成変更クラスタリングツールを開発した。さらに、CI/CD構成を変更するML開発者の専門知識を測定しました。この分析から、コミットの61.8%がビルドポリシーの変更と、一般的なオープンソースプロジェクトと比較してパフォーマンスと保守性に関する最小限の変更を含んでいることがわかった。さらに、共進化分析では、CI/CD構成が、依存関係の直接包摂や標準化されたテストフレームワークの使用の欠如といった悪いプラクティスのために、不要に変更されたことが判明した。推奨外の設定とジェネリックビルド言語への依存による変更パターンの分析を通じて、さらに多くのプラクティスが見つかった。最後に、私たちの開発者の専門知識分析は、経験豊富な開発者がCI/CD構成を変更する傾向にあることを示唆しています。

The growing popularity of machine learning (ML) and the integration of ML components with other software artifacts has led to the use of continuous integration and delivery (CI/CD) tools, such as Travis CI and GitHub Actions, which enable faster integration and testing for ML projects. Such CI/CD configurations and services require synchronization during the life cycle of the projects. Several works discussed how CI/CD configuration and services change during their usage in traditional software systems. However, there is very limited knowledge of how CI/CD configuration and services change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves for ML software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories in ML projects and devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy and minimal changes related to performance and maintainability compared to general open-source projects. Additionally, the co-evolution analysis identified that CI/CD configurations, in many cases, changed unnecessarily due to bad practices such as the direct inclusion of dependencies and a lack of usage of standardized testing frameworks. More practices were found through the change patterns analysis, including the use of deprecated settings and reliance on a generic build language. Finally, our developer expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# 人間と機械の機能の合成学習

Compositional learning of functions in humans and machines ( http://arxiv.org/abs/2403.12201v1 )

ライセンス: Link先を確認

Yanli Zhou, Brenden M. Lake, Adina Williams,

(参考訳) 機能を学び、構成する能力は、人間の効率的な学習と推論の基礎となり、既知の調理プロセスから新しい料理を作るといった柔軟な一般化を可能にします。関数の逐次連鎖以外にも、既存の言語学文献では、人間が相互作用する関数によってより複雑な構成を把握できることが示されている。視覚領域の調査を拡大し、様々な相互作用条件下での合成機能を用いた学習・推論において、人間とニューラルネットワークモデルの能力を探究する機能学習パラダイムを開発した。個々の機能に関する短いトレーニングの後、人間の参加者は、2つの学習された機能を構成する上で評価され、第1の関数の応用が第2の関数を適用するコンテキストを作成したり削除したりするインスタンスを含む4つの主要な相互作用タイプをカバーする方法が検討された。以上の結果から,人間は相互作用条件にまたがる新しい視覚機能合成をゼロショットで一般化することができ,文脈変化に対する感受性を示すことが示唆された。同じタスクにおけるニューラルネットワークモデルとの比較により、合成性(MLC)アプローチのメタラーニングを通じて、標準的なシーケンス対シーケンス変換器は、構成関数における人間の一般化パターンを模倣することができることが明らかになった。

The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions with interacting functions, where output production depends on context changes induced by different function orderings. Extending the investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models in learning and reasoning with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, in ways covering four main interaction types, including instances in which the application of the first function creates or removes the context for applying the second function. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# DeCoTR:2Dおよび3Dアテンションによる深度補完

DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions ( http://arxiv.org/abs/2403.12202v1 )

ライセンス: Link先を確認

Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli,

(参考訳) 本稿では,2次元と3次元の両方の注意を生かして,反復的な空間伝搬を必要とせず,高精度な深度補完を実現する手法を提案する。具体的には、まず、ボトルネックにおける2次元特徴に注意を向け、接続をスキップすることで、ベースライン畳み込み深度補完モデルを強化する。これにより、この単純なネットワークの性能が向上し、最新の複雑なトランスフォーマーベースモデルと同等に設定できる。このネットワークからの初期深度と特徴を活用して、2D機能を引き上げて3Dポイントクラウドを形成し、3Dポイントトランスフォーマーを構築し、モデルが3D幾何学的特徴を明示的に学習し、活用できるようにする。さらに,本論文では,学習を改善する点群処理の正規化手法を提案し,棚から点変圧器を直接使用するよりも精度が向上した。さらに、ダウンサンプリングされたポイントクラウド機能に対するグローバルな注意を取り入れ、計算可能でありながら、長距離コンテキストを可能にする。提案手法であるDeCoTRを,NYU Depth V2 や KITTI を含む確立された深度補完ベンチマークで評価した結果,新しい最先端性能が得られた。さらに、ScanNetおよびDDADベンチマークでゼロショット評価を行い、DeCoTRが既存のアプローチよりも優れた一般化性を有することを示した。

In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network and sets it on par with the latest, complex transformer-based models. Leveraging the initial depths and features from this network, we uplift the 2D features to form a 3D point cloud and construct a 3D point transformer to process it, allowing the model to explicitly learn and exploit 3D geometric features. In addition, we propose normalization techniques to process the point cloud, which improves learning and leads to better accuracy than directly using point transformers off the shelf. Furthermore, we incorporate global attention on downsampled point cloud features, which enables long-range context while still being computationally feasible. We evaluate our method, DeCoTR, on established depth completion benchmarks, including NYU Depth V2 and KITTI, showcasing that it sets new state-of-the-art performance. We further conduct zero-shot evaluations on ScanNet and DDAD benchmarks and demonstrate that DeCoTR has superior generalizability compared to existing approaches.
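The abstract says only that 2D features are uplifted to form a 3D point cloud. One common way to do this from a depth map is pinhole back-projection, sketched below in plain Python. The pinhole camera model, the intrinsics names (`fx`, `fy`, `cx`, `cy`), and the convention that non-positive depth means "missing" are assumptions for illustration, not details confirmed by the paper.

```python
def unproject(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth is a list of rows; depth[v][u] is the depth at pixel (u, v).
    Pixels with non-positive depth are treated as missing and skipped.
    Returns a list of ((u, v), (x, y, z)) pairs so that per-pixel features
    could later be attached to their 3D points.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((u, v), (x, y, z)))
    return points


def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Tiny 2x2 depth map with one missing pixel (hypothetical values).
depth = [[2.0, 0.0],
         [1.0, 4.0]]
fx = fy = 100.0
cx = cy = 0.5

cloud = unproject(depth, fx, fy, cx, cy)
for (u, v), p in cloud:
    print((u, v), "->", p, "->", project(p, fx, fy, cx, cy))
```

Keeping the source pixel next to each 3D point makes it straightforward to carry 2D feature vectors into the point cloud, which is the kind of input a point-based transformer would consume.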

翻訳日:2024-03-20 18:21:58 公開日:2024-03-18

# ビジョンベースのアジャイルフライトのための模倣によるブートストラップ強化学習

Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight ( http://arxiv.org/abs/2403.12203v1 )

ライセンス: Link先を確認

Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza,

(参考訳) 視覚に基づく自律型ドローンレースにおける強化学習(RL)の有効性とImitation Learning(IL)の有効性を組み合わせる。我々は、明示的な状態推定なしで視覚入力を直接処理することに集中する。 RLは、試行錯誤を通じて複雑なコントローラを学習するための一般的なフレームワークを提供するが、視覚入力の高次元性のため、サンプル効率と計算要求に関する課題に直面している。逆に、ILは視覚的なデモンストレーションから学ぶことの効率を示すが、これらのデモの品質によって制限され、共変量シフトのような問題に直面している。これらの制約を克服するために、RLとILの利点を組み合わせた新しいトレーニングフレームワークを提案する。本フレームワークは,特権状態情報を用いた教師政策の初期訓練,ILを用いた学生政策への蒸留,適応的RL微調整の3段階からなる。実環境と実環境の両方でのシミュレーション実験により,我々の手法は,明示的な状態推定を伴わない視覚情報のみを用いて,レースコースを走行する際に,ILやRL単独よりも優れた性能とロバスト性を達成できることが示されている。

We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges regarding sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL demonstrates efficiency in learning from visual demonstrations but is limited by the quality of those demonstrations and faces issues like covariate shift. To overcome these limitations, we propose a novel training framework combining RL and IL's advantages. Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning. Our experiments in both simulated and real-world environments demonstrate that our approach achieves superior performance and robustness compared to IL or RL alone in navigating a quadrotor through a racing course using only visual information without explicit state estimation.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# BACQ - 量子コンピューティングのためのアプリケーション指向ベンチマーク

BACQ - Application-oriented Benchmarks for Quantum Computing ( http://arxiv.org/abs/2403.12205v1 )

ライセンス: Link先を確認

Frédéric Barbaresco, Laurent Rioux, Christophe Labreuche, Michel Nowak, Noé Olivier, Damien Nicolazic, Olivier Hess, Anne-Lise Guilmin, Robert Wang, Tanguy Sassolas, Stéphane Louise, Kyrylo Snizhko, Grégoire Misguich, Alexia Auffèves, Robert Whitney, Emmanuelle Vergnaud, Félicien Schopfer,

(参考訳) フランスの国家量子戦略の一部である、量子技術の測定・標準・評価に関する国家プログラム MetriQs-France の支援を受けて、BACQプロジェクトは量子コンピューティングのアプリケーション指向ベンチマークに取り組んでいる。 THALES、EVIDEN(Atos傘下)、CEA、CNRS、TERATEC、LNEからなるコンソーシアムは、産業界のユーザにとって有意義な、参照となる性能評価基準の確立を目指している。

With the support of the national program on measurements, standards, and evaluation of quantum technologies MetriQs-France, a part of the French national quantum strategy, the BACQ project is dedicated to application-oriented benchmarks for quantum computing. The consortium gathering THALES, EVIDEN, an Atos business, CEA, CNRS, TERATEC, and LNE aims at establishing performance evaluation criteria of reference, meaningful for industry users.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# データフィッティングに有用なコンパクト表現

Useful Compact Representations for Data-Fitting ( http://arxiv.org/abs/2403.12206v1 )

ライセンス: Link先を確認

Johannes J. Brust,

(参考訳) 第2の微分情報を持たない最小化問題に対して、ヘッセンマトリスを推定する手法は非常に効果的である。しかし、従来の手法では、大きな問題に対して禁忌な高密度な行列が生成される。限定メモリのコンパクト表現は、低ランクの表現の観点から密度の強い配列を表現し、大規模な決定論的問題に対するソフトウェア実装の最先端技術となっている。我々はベクトルの選択によってパラメータ化される新しいコンパクト表現を開発し、特別な選択のために既存のよく知られた公式に還元する。本研究では, 大規模固有値計算, テンソル因子分解, 非線形回帰に対するコンパクト表現の有効性を示す。

For minimization problems without second-derivative information, methods that estimate Hessian matrices can be very effective. However, conventional techniques generate dense matrices that are prohibitive for large problems. Limited-memory compact representations express the dense arrays in terms of a low-rank representation and have become the state of the art for software implementations on large deterministic problems. We develop new compact representations that are parameterized by a choice of vectors and that reduce to existing well-known formulas for special choices. We demonstrate the effectiveness of the compact representations for large eigenvalue computations, tensor factorizations, and nonlinear regressions.
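
For orientation, one of the existing well-known formulas of this kind is the compact representation of the BFGS matrix due to Byrd, Nocedal, and Schnabel. It is shown here only as background and is not the new representation developed in the paper. The notation is an assumption of this note: $s_i = x_{i+1} - x_i$, $y_i = \nabla f(x_{i+1}) - \nabla f(x_i)$, and $B_0$ is the initial Hessian approximation.

```latex
% Stored pairs after k updates
S_k = [\, s_0, \dots, s_{k-1} \,], \qquad Y_k = [\, y_0, \dots, y_{k-1} \,]

% Split S_k^T Y_k into strictly lower triangular, diagonal, and strictly upper parts
S_k^{\top} Y_k = L_k + D_k + R_k

% Compact representation of the BFGS matrix
B_k = B_0 -
\begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}
\begin{bmatrix} S_k^{\top} B_0 S_k & L_k \\ L_k^{\top} & -D_k \end{bmatrix}^{-1}
\begin{bmatrix} S_k^{\top} B_0 \\ Y_k^{\top} \end{bmatrix}
```

With a limited memory of $m$ pairs, the middle matrix is only $2m \times 2m$, so products with $B_k$ can be formed without ever storing a dense $n \times n$ array.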

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# サイバー影響操作における合成画像生成 : 創発的脅威?

Synthetic Image Generation in Cyber Influence Operations: An Emergent Threat? ( http://arxiv.org/abs/2403.12207v1 )

ライセンス: Link先を確認

Melanie Mathys, Marco Willi, Michael Graber, Raphael Meier,

(参考訳) 人工知能(AI)の進化は、デジタルコンテンツ生成の変革を触媒し、サイバー・インフルエンス・オペレーションに深く影響している。本報告では, 合成画像の作成において, 拡散モデルなどの生成的深層学習モデルの可能性と限界について述べる。我々は、これらのツールのアクセシビリティ、実用性、出力品質と、それらが詐欺、影響、転倒の脅威シナリオに与える影響を批判的に評価する。このレポートは、いくつかの仮説的サイバー影響操作に関するコンテンツを生成し、脅威アクターに対するこれらのAI駆動手法の現在の能力と限界を実証している。生成モデルはイラストや非現実的画像の制作に優れるが、人間の指導による洗練の必要性と計算資源によって制限された、説得力のある写真リアルコンテンツを作成することは依然として大きな課題である。我々の調査は、技術進歩と誤用の可能性の微妙なバランスを浮き彫りにして、進行中の研究、防衛機構、多分野連携、政策開発への推奨を促している。これらの勧告は、特にサイバー影響の文脈において、情報の完全性に対するリスクを保護しながら、ポジティブな影響に対するAIの可能性を活用することを目的としている。

The evolution of artificial intelligence (AI) has catalyzed a transformation in digital content generation, with profound implications for cyber influence operations. This report delves into the potential and limitations of generative deep learning models, such as diffusion models, in fabricating convincing synthetic images. We critically assess the accessibility, practicality, and output quality of these tools and their implications in threat scenarios of deception, influence, and subversion. Notably, the report generates content for several hypothetical cyber influence operations to demonstrate the current capabilities and limitations of these AI-driven methods for threat actors. While generative models excel at producing illustrations and non-realistic imagery, creating convincing photo-realistic content remains a significant challenge, limited by computational resources and the necessity for human-guided refinement. Our exploration underscores the delicate balance between technological advancement and its potential for misuse, prompting recommendations for ongoing research, defense mechanisms, multi-disciplinary collaboration, and policy development. These recommendations aim to leverage AI's potential for positive impact while safeguarding against its risks to the integrity of information, especially in the context of cyber influence.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 効率的な強化学習のための制御リアプノフ関数の分解

Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning ( http://arxiv.org/abs/2403.12210v1 )

ライセンス: Link先を確認

Antonio Lopez, David Fridovich-Keil,

(参考訳) 強化学習(RL)を用いた最近の手法は、未知の環境で知的エージェントの訓練に成功している。しかし、RLは現実世界のロボティクスのシナリオでは広く適用されていない。これは、現在の最先端のRLメソッドでは、特定のタスクを学習するために大量のデータを必要とするため、エージェントをデプロイして実際のアプリケーションにデータを収集する場合、不合理なコストが発生するためである。本稿では,RLの報酬関数を再現する既存の作業から,サンプルの複雑性を低減するための制御リャプノフ関数(CLF)を導入する。それでも、この定式化にはシステムのCLFを知る必要があるが、一般的な手法が欠如しているため、適切なCLFを特定することはしばしば困難である。既存の作業はハミルトン・ヤコビ到達可能性手順を通じて低次元のCLFを計算することができる。しかし、この手法は高次元システムでは難解となり、システム分解技術を用いて分解制御リアプノフ関数 (DCLF) と呼ばれるものを計算する。計算されたDCLFを報酬形成に使用し、RL性能の向上を示す。複数の例を通して、我々の手法は、最先端のソフトアクター批判アルゴリズムが必要とする実世界のデータの半分以下にクワッドコプターを着陸させる政策を立証する。

Recent methods using Reinforcement Learning (RL) have proven to be successful for training intelligent agents in unknown environments. However, RL has not been applied widely in real-world robotics scenarios. This is because current state-of-the-art RL methods require large amounts of data to learn a specific task, leading to unreasonable costs when deploying the agent to collect data in real-world applications. In this paper, we build on existing work that reshapes the reward function in RL by introducing a Control Lyapunov Function (CLF), which has been demonstrated to reduce the sample complexity. This formulation requires knowing a CLF of the system, but because no general method exists, identifying a suitable CLF is often a challenge. Existing work can compute low-dimensional CLFs via a Hamilton-Jacobi reachability procedure. However, this class of methods becomes intractable on high-dimensional systems, a problem that we address by using a system decomposition technique to compute what we call Decomposed Control Lyapunov Functions (DCLFs). We use the computed DCLF for reward shaping, which we show improves RL performance. Through multiple examples, we demonstrate the effectiveness of this approach, where our method finds a policy to successfully land a quadcopter with less than half the amount of real-world data required by the state-of-the-art Soft Actor-Critic algorithm.
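
The reward-shaping idea in this abstract can be illustrated with a minimal sketch. It assumes the standard potential-based shaping form with potential `-V(s)`, which may differ from the exact formulation in the paper. The names `quadratic_clf` and `shaped_reward`, and the quadratic function itself, are hypothetical stand-ins for a computed CLF or DCLF.

```python
from typing import Callable, Sequence

State = Sequence[float]


def quadratic_clf(state: State) -> float:
    """Hypothetical Lyapunov-like function: squared distance to the origin."""
    return sum(x * x for x in state)


def shaped_reward(
    reward: float,
    state: State,
    next_state: State,
    clf: Callable[[State], float] = quadratic_clf,
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping with potential Phi(s) = -V(s).

    Transitions that decrease V (move toward the goal) receive a bonus,
    and transitions that increase V receive a penalty.
    """
    return reward + gamma * (-clf(next_state)) - (-clf(state))


toward_goal = shaped_reward(0.0, (1.0, 1.0), (0.5, 0.5))
away_from_goal = shaped_reward(0.0, (1.0, 1.0), (2.0, 2.0))
print(toward_goal, away_from_goal)
```

The design choice here is that the shaping term depends only on the change in `V`, so the environment reward itself is left untouched.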

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 欠損を伴う縦型マルチモーダル・マルチビュー予測のための統一モデル

A Unified Model for Longitudinal Multi-Modal Multi-View Prediction with Missingness ( http://arxiv.org/abs/2403.12211v1 )

ライセンス: Link先を確認

Boqi Chen, Junier Oliva, Marc Niethammer,

(参考訳) 医療記録は、画像、テキスト、表情報など、様々なモダリティから構成されることが多い。すべてのモダリティを統合することで患者の状態の全体像が得られ、それらを縦断的に分析することで疾患の進行をよりよく理解できる。しかし、現実世界の縦断的な医療記録には課題がある。 1)患者は特定の時点のデータの一部または全部を欠くことがあり、 2) ある期間にすべての患者で特定のモダリティやビューが欠如している可能性がある。本研究では,欠損を伴う縦断的マルチモーダル・マルチビュー(MMMV)予測のための統一モデルを提案する。提案手法は,任意の数の時点を入力として受け付け,利用可能なデータをすべて活用することを目的としている。 Osteoarthritis Initiative(OAI)の変形性膝関節症データセットを用いて,将来の時点における痛みとKellgren-Lawrence grade(KLG)の予測について広範な実験を行った。我々は,本手法の有効性を,トレーニングと評価において同一のモダリティとビューの組み合わせを使用する特定のモデルと比較することによって示す。また、時間的データの拡張による利点を示し、異なるタスクにおける各モダリティ/ビューの重要性をより深く理解するためのポストホック分析を提供する。

Medical records often consist of different modalities, such as images, text, and tabular information. Integrating all modalities offers a holistic view of a patient's condition, while analyzing them longitudinally provides a better understanding of disease progression. However, real-world longitudinal medical records present challenges: 1) patients may lack some or all of the data for a specific timepoint, and 2) certain modalities or views might be absent for all patients during a particular period. In this work, we introduce a unified model for longitudinal multi-modal multi-view (MMMV) prediction with missingness. Our method accepts as many timepoints as desired as input and aims to leverage all available data, whichever modalities or views are missing. We conduct extensive experiments on the knee osteoarthritis dataset from the Osteoarthritis Initiative (OAI) for pain and Kellgren-Lawrence grade (KLG) prediction at a future timepoint. We demonstrate the effectiveness of our method by comparing results from our unified model to specific models that use the same modality and view combinations during training and evaluation. We also show the benefit of having extended temporal data and provide post-hoc analysis for a deeper understanding of each modality/view's importance for different tasks.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 名前付きエンティティ認識の評価:ブラジル企業の決算説明会書き起こしにおける単言語モデルと多言語トランスフォーマーモデルの比較分析

Evaluating Named Entity Recognition: Comparative Analysis of Mono- and Multilingual Transformer Models on Brazilian Corporate Earnings Call Transcriptions ( http://arxiv.org/abs/2403.12212v1 )

ライセンス: Link先を確認

Ramon Abilio, Guilherme Palermo Coelho, Ana Estela Antunes da Silva,

(参考訳) 名前付きエンティティ認識(NER)は、テキスト文書から情報を抽出する自然言語処理技術である。しかし、NERに関する既存の研究の多くは英語の文書を中心にしており、ポルトガルの金融ドメインに合わせたデータセットの入手率の差を残している。本研究は、ブラジルの銀行の決算報告から抽出したポルトガル語テキストに着目し、金融分野におけるNERの必要性に対処するものである。 384文字からなる包括的データセットの収集とアノテーションの弱監督手法の活用により,ポルトガル語で訓練された単言語モデル(BERTimbau, PTT5)と多言語モデル(mBERT, mT5)の性能評価を行った。特に,トークン分類タスクをテキスト生成問題として再編成し,T5モデルの微調整と評価を可能にする手法を提案する。モデルの微調整に続いて、テストデータセットの評価を行い、パフォーマンスとエラーのメトリクスを利用する。以上の結果から,BERTベースモデルはT5ベースモデルより一貫して優れていた。さらに,マルチ言語モデルはマクロF1スコアに匹敵する性能を示したが,BERTimbauはPTT5よりも優れた性能を示した。 PTT5 と mT5 が生成した文のマニュアル解析では、元の文と生成された文の間に 0.89 から 1.0 までの類似度が示される。しかし、両モデルとも通貨やパーセンテージの値の変更など、金融分野における正確性や整合性の重要性を裏付ける不一致を示すため、重大なエラーが発生する。これらの課題にもかかわらず、PTT5とmT5はそれぞれ98.52%と98.85%という印象的なマクロF1スコアを達成した。さらに,本研究では,モデル間の推論において,メモリと時間消費の顕著な相違点に光を当てた。

Named Entity Recognition (NER) is a Natural Language Processing technique for extracting information from textual documents. However, much of the existing research on NER has been centered around English-language documents, leaving a gap in the availability of datasets tailored to the financial domain in Portuguese. This study addresses the need for NER within the financial domain, focusing on Portuguese-language texts extracted from earnings call transcriptions of Brazilian banks. By curating a comprehensive dataset comprising 384 transcriptions and leveraging weak supervision techniques for annotation, we evaluate the performance of monolingual models trained on Portuguese (BERTimbau and PTT5) and multilingual models (mBERT and mT5). Notably, we introduce a novel approach that reframes the token classification task as a text generation problem, enabling fine-tuning and evaluation of T5 models. Following the fine-tuning of the models, we conduct an evaluation on the test dataset, employing performance and error metrics. Our findings reveal that BERT-based models consistently outperform T5-based models. Furthermore, while the multilingual models exhibit comparable macro F1-scores, BERTimbau demonstrates superior performance over PTT5. A manual analysis of sentences generated by PTT5 and mT5 unveils a degree of similarity ranging from 0.89 to 1.0, between the original and generated sentences. However, critical errors emerge as both models exhibit discrepancies, such as alterations to monetary and percentage values, underscoring the importance of accuracy and consistency in the financial domain. Despite these challenges, PTT5 and mT5 achieve impressive macro F1-scores of 98.52% and 98.85%, respectively, with our proposed approach. Furthermore, our study sheds light on notable disparities in memory and time consumption for inference across the models.
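
The idea of reframing token classification as text generation can be sketched as follows. The bracketed target format `[LABEL text]`, the label name `MONEY`, and the helper names are assumptions made for illustration; the abstract does not specify the output format the authors use. The second helper strips the markup again, which gives a simple check that a generated sentence did not alter the source tokens, such as monetary values.

```python
import re
from typing import List


def to_target_text(tokens: List[str], tags: List[str]) -> str:
    """Turn BIO-tagged tokens into a generation target.

    Each entity span becomes "[LABEL token token ...]"; other tokens are copied.
    """
    out = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            label = tag[2:]
            j = i + 1
            # Extend the span over the following I- tags with the same label.
            while j < len(tokens) and tags[j] == f"I-{label}":
                j += 1
            out.append(f"[{label} {' '.join(tokens[i:j])}]")
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def strip_markup(text: str) -> str:
    """Remove the "[LABEL ...]" wrappers and keep only the entity text."""
    return re.sub(r"\[[A-Z_]+ ([^\]]*)\]", r"\1", text)


tokens = "O lucro foi de R$ 5 bilhões".split()
tags = ["O", "O", "O", "O", "B-MONEY", "I-MONEY", "I-MONEY"]
target = to_target_text(tokens, tags)
print(target)
print(strip_markup(target) == " ".join(tokens))
```

A seq2seq model such as T5 would be fine-tuned to map the plain sentence to a target of this kind; comparing `strip_markup(generated)` with the input flags outputs where the model changed the underlying text.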

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 2乗和によるプライベートグラフオン推定

Private graphon estimation via sum-of-squares ( http://arxiv.org/abs/2403.12213v1 )

ライセンス: Link先を確認

Hongjie Chen, Jingqiu Ding, Tommaso d'Orsi, Yiding Hua, Chih-Hung Liu, David Steurer,

(参考訳) 確率ブロックモデルを学習し,任意のブロック数の多項式ランニング時間を用いたグラフトン推定のための,最初の純ノード微分プライベートアルゴリズムを開発した。統計的効用は、これらの問題に対する以前の最良の情報理論(指数時間)ノードプライド機構のそれと一致することを保証している。このアルゴリズムは、ブロック数に依存する2乗緩和の和で定義されるスコア関数の指数的なメカニズムに基づいている。結果の主な要素は,(1)2つの確率行列のポリトープ上の2次最適化によるブロックグラモン間距離の特徴づけ,(2)任意のポリトープ上の多項式最適化のための2次収束結果の一般化,(3)総和2乗アルゴリズムパラダイムの一部としてスコア関数のリプシッツ拡張を実行するための一般アプローチである。

We develop the first pure node-differentially-private algorithms for learning stochastic block models and for graphon estimation with polynomial running time for any constant number of blocks. The statistical utility guarantees match those of the previous best information-theoretic (exponential-time) node-private mechanisms for these problems. The algorithm is based on an exponential mechanism for a score function defined in terms of a sum-of-squares relaxation whose level depends on the number of blocks. The key ingredients of our results are (1) a characterization of the distance between the block graphons in terms of a quadratic optimization over the polytope of doubly stochastic matrices, (2) a general sum-of-squares convergence result for polynomial optimization over arbitrary polytopes, and (3) a general approach to perform Lipschitz extensions of score functions as part of the sum-of-squares algorithmic paradigm.
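
For readers unfamiliar with the exponential mechanism named in this abstract, the following is a minimal sketch over a finite candidate set. It is the textbook mechanism, sampling a candidate with probability proportional to `exp(epsilon * score / (2 * sensitivity))`. It does not reproduce the paper's sum-of-squares score function or its Lipschitz extension; the scores below are arbitrary illustrative numbers.

```python
import math
import random
from typing import List, Sequence


def exponential_mechanism_probs(
    scores: Sequence[float], epsilon: float, sensitivity: float
) -> List[float]:
    """Selection probabilities proportional to exp(epsilon * score / (2 * sensitivity))."""
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)
    # Subtract the maximum score before exponentiating for numerical stability.
    weights = [math.exp(scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_candidate(candidates, scores, epsilon, sensitivity, rng=random):
    """Draw one candidate according to the exponential mechanism."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


probs = exponential_mechanism_probs([1.0, 2.0, 3.0], epsilon=1.0, sensitivity=1.0)
print(probs)
print(sample_candidate(["a", "b", "c"], [1.0, 2.0, 3.0], 1.0, 1.0, random.Random(0)))
```

Higher-scoring candidates are more likely to be selected, while a smaller `epsilon` flattens the distribution and strengthens privacy at the cost of utility.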

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 導波路における多重量子状態移動

Multiplexed quantum state transfer in waveguides ( http://arxiv.org/abs/2403.12222v1 )

ライセンス: Link先を確認

Guillermo F. Peñas, Ricardo Puebla, Juan José García-Ripoll,

(参考訳) 本稿では、QEDセットアップにおける量子情報の記憶と操作の最大化を示すテストベッドとして機能する量子ネットワークの現実的な導波路実装について考察する。ウェーブパケット工学と量子状態伝達プロトコルを用いて2つの手法を解析する。まず、時間領域における直交光子の族を提案し、設計する。これらの光子は異なる標的量子ビットとの選択的相互作用を可能にする。しかし、共振ノードを用いたモード多重化はクロストーク効果によって大きく損なわれている。これは第2のアプローチ、すなわち周波数多重化を動機付けている。ここでは、導波路を通る周波数多重化の限界について検討し、所定の帯域内で異なる周波数の光子をホストし、忠実に送信する能力を解析する。我々は1光と2光の詳細なシミュレーションを行い、現実的な条件下でのコヒーレント量子状態伝達プロトコルの忠実性に関する理論的境界を提供する。この結果から, 耐故障性量子コンピューティングの要求を満たすため, 数十個の多重光子を大域的忠実度で利用することが可能であることが示唆された。これは、単一光子の忠実性の条件が満たされることに注意が必要である。

In this article, we consider a realistic waveguide implementation of a quantum network that serves as a testbed to show how to maximize the storage and manipulation of quantum information in QED setups. We analyze two approaches using wavepacket engineering and quantum state transfer protocols. First, we propose and design a family of orthogonal photons in the time domain. These photons allow for a selective interaction with distinct targeted qubits. Yet, mode multiplexing employing resonant nodes is largely spoiled by cross-talk effects. This motivates the second approach, namely, frequency multiplexing. Here we explore the limits of frequency multiplexing through the waveguide, analyzing its capabilities to host and faithfully transmit photons of different frequencies within a given bandwidth. We perform detailed one- and two-photon simulations and provide theoretical bounds for the fidelity of coherent quantum state transfer protocols under realistic conditions. Our results show that state-of-the-art experiments can employ dozens of multiplexed photons with global fidelities fulfilling the requirements imposed by fault-tolerant quantum computing. This is with the caveat that the conditions for single-photon fidelity are met.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# FloodCastを用いた大規模洪水モデリングと予測

Large-scale flood modeling and forecasting with FloodCast ( http://arxiv.org/abs/2403.12226v1 )

ライセンス: Link先を確認

Qingsong Xu, Yilei Shi, Jonathan Bamber, Chaojun Ouyang, Xiao Xiang Zhu,

(参考訳) 大規模流体力学モデルは通常、高い計算コストをもたらすだけでなく、固定解像度の空間格子とモデルパラメータに依存する。これにより、洪水の隆起を正確に予測し、時限危険警報を発する能力が制限される。本研究では,FloodCastという大規模に動作可能な高速で安定,高精度,解像度不変,幾何適応的な洪水モデリングおよび予測フレームワークを構築した。このフレームワークは、マルチ衛星観測と流体力学モデリングの2つの主要なモジュールから構成されている。マルチ衛星観測モジュールでは,大規模洪水予測におけるマルチ衛星観測の可能性をフル活用するために,リアルタイムな教師なし変化検出法と降雨処理・解析ツールが提案されている。流体力学モデリングモジュールでは、物理インフォームドニューラルネットワークにおけるデータトレーニングの要件がなく、フーリエニューラル演算子による高速で正確で解像度不変のアーキテクチャを特徴とする幾何適応型物理インフォームドニューラルソルバ(GeoPINS)が導入された。 GeoPINSは、一般的なPDEにおいて、正規および不規則なドメインにまたがる印象的なパフォーマンスを示す。大規模洪水モデルにおいて,GeoPINS を用いた長期時間系列と広域空間領域を扱うためのシーケンス・ツー・シーケンスのGeoPINS モデルを提案する。次に,2022年パキスタン洪水における様々な洪水予測手法を評価するために,ベンチマークデータセットを構築した。最後に, 時空間下降時の浸水範囲, 深さ, 移動可能性の3次元的検証を行った。従来の流体力学とシークエンス・ツー・シークエンス(Sequence-to-Sequence)のGeoPINSは、SARに基づく洪水深度データと比較すると、シークエンス・ツー・シークエンス・ジオPINSは予測誤差が小さく、従来の流体力学よりも優れていた。

Large-scale hydrodynamic models generally rely on fixed-resolution spatial grids and model parameters, and they incur a high computational cost. This limits their ability to accurately forecast flood crests and issue time-critical hazard warnings. In this work, we build FloodCast, a fast, stable, accurate, resolution-invariant, and geometry-adaptive flood modeling and forecasting framework that can operate at large scales. The framework comprises two main modules: multi-satellite observation and hydrodynamic modeling. In the multi-satellite observation module, a real-time unsupervised change detection method and a rainfall processing and analysis tool are proposed to harness the full potential of multi-satellite observations in large-scale flood prediction. In the hydrodynamic modeling module, a geometry-adaptive physics-informed neural solver (GeoPINS) is introduced. Like other physics-informed neural networks, it does not require training data, and it features a fast, accurate, and resolution-invariant architecture with Fourier neural operators. GeoPINS demonstrates impressive performance on popular PDEs across regular and irregular domains. Building upon GeoPINS, we propose a sequence-to-sequence GeoPINS model to handle long-term temporal series and extensive spatial domains in large-scale flood modeling. Next, we establish a benchmark dataset for the 2022 Pakistan flood to assess various flood prediction methods. Finally, we validate the model along three dimensions: flood inundation range, depth, and transferability of spatiotemporal downscaling. Traditional hydrodynamics and sequence-to-sequence GeoPINS exhibit exceptional agreement during high water levels, while comparative assessments with SAR-based flood depth data show that sequence-to-sequence GeoPINS outperforms traditional hydrodynamics, with smaller prediction errors.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 分析・評価・創造:ビジュアルプログラミング領域における計算論的思考と問題解決の評価

Analyzing-Evaluating-Creating: Assessing Computational Thinking and Problem Solving in Visual Programming Domains ( http://arxiv.org/abs/2403.12227v1 )

ライセンス: Link先を確認

Ahana Ghosh, Liina Malva, Adish Singla,

(参考訳) コンピュータ思考(CT)と問題解決のスキルは、世界中のK-8スクールカリキュラムに統合されつつある。その結果、これらのスキルの学生の熟練度を評価するための信頼性評価を開発する必要性が高まっている。近年の研究では、様々なCT概念や実践、特に大規模研究における精神測定的検証と使用を可能にする多項目に基づいて、これらのスキルを評価するための試験が提案されている。実際の関連性にもかかわらず、これらのテストは学生の計算的創造性を測定する方法に限られており、実際の環境でCTと問題解決を適用する上で重要な能力である。本研究は,ブルームの分類学における3つの高い認知レベル,すなわちアナライズ,評価,創造に焦点を当てた新しいテストであるACEを開発した。 ACEは、これらの3つのレベルにまたがる7x3の多目的アイテムの多種多様なセットで構成されており、基本的なブロックベースのビジュアルプログラミングに基づいている。学年3～7年生371名を対象に,ACEの心理測定特性について検討した。いくつかの心理測定分析フレームワークに基づいて,ACEの信頼性と妥当性を確認した。 Code.org による Hour of Code: Maze Challenge の成績と ACE における学生の成績との間にも正の相関関係が認められた。

Computational thinking (CT) and problem-solving skills are increasingly integrated into K-8 school curricula worldwide. Consequently, there is a growing need to develop reliable assessments for measuring students' proficiency in these skills. Recent works have proposed tests for assessing these skills across various CT concepts and practices, in particular, based on multi-choice items enabling psychometric validation and usage in large-scale studies. Despite their practical relevance, these tests are limited in how they measure students' computational creativity, a crucial ability when applying CT and problem solving in real-world settings. In our work, we have developed ACE, a novel test focusing on the three higher cognitive levels in Bloom's Taxonomy, i.e., Analyze, Evaluate, and Create. ACE comprises a diverse set of 7x3 multi-choice items spanning these three levels, grounded in elementary block-based visual programming. We evaluate the psychometric properties of ACE through a study conducted with 371 students in grades 3-7 from 10 schools. Based on several psychometric analysis frameworks, our results confirm the reliability and validity of ACE. Our study also shows a positive correlation between students' performance on ACE and performance on Hour of Code: Maze Challenge by Code.org.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 画像偽造解析のための物体マスク誘導型フュージョントランス

Fusion Transformer with Object Mask Guidance for Image Forgery Analysis ( http://arxiv.org/abs/2403.12229v1 )

ライセンス: Link先を確認

Dimitrios Karageorgiou, Giorgos Kordopatis-Zilos, Symeon Papadopoulos,

(参考訳) 本研究では,様々な法医学的信号から情報を抽出し,ロバストな画像フォージェリ検出とローカライゼーションを実現するための融合トランスフォーマーネットワークであるOMG-Fuserを紹介する。我々のアプローチは、任意の数の法定信号で動作することができ、その分析にオブジェクト情報を利用することができます。そこで我々は,物体の注意機構によって誘導される変圧器からなる法医学信号ストリームを設計し,同一の物体を表すパッチを関連付ける。このようにして、画像からオブジェクトレベルの情報を取り込む。各法医学信号は、その特異性に適応する異なるストリームによって処理される。その後、トークン融合変換器は任意の数のネットワークストリームの出力を効率よく集約し、各画像パッチに対して融合表現を生成する。これらの表現は最終的に、画像パッチ間の固有の関係をキャプチャする長距離依存変換器によって処理される。提案手法上の2つの融合変種を評価する。 (i)複数の画像鑑定アルゴリズムの出力を融合するスコアレベル融合と (ii)低レベルの法医学的痕跡を直接融合する特徴レベルの融合。どちらの変種も画像偽造検出とローカライゼーションのための7つのデータセットの最先端性能を超えており、F1の相対的な平均改善は12.1%と20.4%である。我々のネットワークは、伝統的で新しい偽造攻撃に対する堅牢性を示し、スクラッチからトレーニングを受けることなく、新しい信号で拡張することができる。

In this work, we introduce OMG-Fuser, a fusion transformer-based network designed to extract information from various forensic signals to enable robust image forgery detection and localization. Our approach can operate with an arbitrary number of forensic signals and leverages object information for their analysis -- unlike previous methods that rely on fusion schemes with few signals and often disregard image semantics. To this end, we design a forensic signal stream composed of a transformer guided by an object attention mechanism, associating patches that depict the same objects. In that way, we incorporate object-level information from the image. Each forensic signal is processed by a different stream that adapts to its peculiarities. Subsequently, a token fusion transformer efficiently aggregates the outputs of an arbitrary number of network streams and generates a fused representation for each image patch. These representations are finally processed by a long-range dependencies transformer that captures the intrinsic relations between the image patches. We assess two fusion variants on top of the proposed approach: (i) score-level fusion that fuses the outputs of multiple image forensics algorithms and (ii) feature-level fusion that fuses low-level forensic traces directly. Both variants exceed state-of-the-art performance on seven datasets for image forgery detection and localization, with a relative average improvement of 12.1% and 20.4% in terms of F1. Our network demonstrates robustness against traditional and novel forgery attacks and can be expanded with new signals without training from scratch.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# ハードサンプルのメタラーニングによる一般化の改善

Improving Generalization via Meta-Learning on Hard Samples ( http://arxiv.org/abs/2403.12236v1 )

ライセンス: Link先を確認

Nishant Jain, Arun S. Suggala, Pradeep Shenoy,

(参考訳) 教師付き学習に対する学習再重み付け(LRW)アプローチでは、代表検証データセットのパフォーマンスを最大化するために、最適化基準を使用してトレーニングインスタンスの重み付けを割り当てる。 LRWトレーニングで使用される検証セットを最適化し、分類器の一般化を改善する。特に、検証集合における分類の難しいインスタンスの使用は、理論上の関係と、一般化の強い経験的証拠の両方を持つことを示す。このメタ最適化モデルを学習するための効率的なアルゴリズムと、注意深い比較研究のための単純なトレインツースヒューリスティックを提供する。簡単な検証データを持つLRWは、ハードな検証データを持つLRWよりも一貫して悪い性能を示し、メタ最適化問題の妥当性を確立した。提案アルゴリズムは,データセットやドメインシフトの課題(Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDSなど)に対して,VIT-BをImagenet上で使用する場合の約1%のゲインで,幅広いベースラインを達成している。また、LRWトレーニングにおける検証のための自然なハード例(Imagenet-R / Imagenet-A)を使用することで、クリーンかつ自然なテストインスタンスの性能が1-2%向上することを示す。 2次解析により、LRWフレームワークにおけるハード検証データを使用することで、テストデータのマージンが向上し、経験的ゲインの基礎となるメカニズムが示唆された。本研究は,メタ学習を教師付き学習コンテキストでメタ学習に最適化するための新たな研究の方向性を開くと信じている。

Learned reweighting (LRW) approaches to supervised learning use an optimization criterion to assign weights for training instances, in order to maximize performance on a representative validation dataset. We pose and formalize the problem of optimized selection of the validation set used in LRW training, to improve classifier generalization. In particular, we show that using hard-to-classify instances in the validation set has both a theoretical connection to, and strong empirical evidence of generalization. We provide an efficient algorithm for training this meta-optimized model, as well as a simple train-twice heuristic for careful comparative study. We demonstrate that LRW with easy validation data performs consistently worse than LRW with hard validation data, establishing the validity of our meta-optimization problem. Our proposed algorithm outperforms a wide range of baselines on a range of datasets and domain shift challenges (Imagenet-1K, CIFAR-100, Clothing-1M, CAMELYON, WILDS, etc.), with ~1% gains using VIT-B on Imagenet. We also show that using naturally hard examples for validation (Imagenet-R / Imagenet-A) in LRW training for Imagenet improves performance on both clean and naturally hard test instances by 1-2%. Secondary analyses show that using hard validation data in an LRW framework improves margins on test data, hinting at the mechanism underlying our empirical gains. We believe this work opens up new research directions for the meta-optimization of meta-learning in a supervised learning context.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 資源制約型IoT環境における効率的なトランスフォーマーベースハイパーパラメータ最適化

Efficient Transformer-based Hyper-parameter Optimization for Resource-constrained IoT Environments ( http://arxiv.org/abs/2403.12237v1 )

ライセンス: Link先を確認

Ibrahim Shaer, Soodeh Nikan, Abdallah Shami,

(参考訳) ハイパーパラメータ最適化(HPO)プロセスは、最も優れた畳み込みニューラルネットワーク(CNN)を見つけるために必須である。 HPOの自動化プロセスは、その巨大な計算フットプリントと透明性の欠如を特徴としている。本稿では,トランスフォーマアーキテクチャとアクタ・クリティック・強化学習(RL)モデルを組み合わせた新しい手法であるTRL-HPOを提案する。これらの仮定は、MNISTデータセット上でTRL-HPOを評価し、CNNモデルをスクラッチから構築する最先端のアプローチと比較することによって、実証的に構築される。 TRL-HPOは,HPOプロセスにおけるTRL-HPOの効率を実証し,これらの手法の分類結果を同時に6.8%向上させることを示した。この結果から, 完全に連結した層を積み重ねることによる性能劣化の主要因を同定した。本稿では,資源制約環境下でのRLベースのHPOプロセスを改善するための新しい方法について述べる。

The hyper-parameter optimization (HPO) process is imperative for finding the best-performing Convolutional Neural Networks (CNNs). The automation of HPO is characterized by its sizable computational footprint and its lack of transparency, both of which are important factors in a resource-constrained Internet of Things (IoT) environment. In this paper, we address these problems by proposing TRL-HPO, a novel approach that combines the transformer architecture with an actor-critic Reinforcement Learning (RL) model and is equipped with multi-headed attention that enables parallelization and progressive generation of layers. These claims are validated empirically by evaluating TRL-HPO on the MNIST dataset and comparing it with state-of-the-art approaches that build CNN models from scratch. The results show that TRL-HPO outperforms the classification results of these approaches by 6.8% within the same time frame, demonstrating the efficiency of TRL-HPO for the HPO process. The analysis of the results identifies the stacking of fully connected layers as the main cause of performance degradation. This paper identifies new avenues for improving RL-based HPO processes in resource-constrained environments.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 6Gセキュリティにおける大規模言語モデル - 課題と機会

Large language models in 6G security: challenges and opportunities ( http://arxiv.org/abs/2403.12239v1 )

ライセンス: Link先を確認

Tri Nguyen, Huong Nguyen, Ahmad Ijaz, Saeid Sheikhi, Athanasios V. Vasilakos, Panos Kostakos,

(参考訳) 教育や医療などの分野におけるジェネレーティブAI(GenAI)とLarge Language Models(LLMs)の急速な統合は、テクノロジーの大幅な進歩を象徴している。しかし、この成長は、ほとんど未調査の側面、すなわちセキュリティ上の脆弱性につながっている。オフラインおよびオンラインモデル、さまざまなツール、ブラウザプラグイン、サードパーティアプリケーションを含むエコシステムが拡大を続けるにつれ、攻撃面が大幅に拡大し、セキュリティ侵害の可能性も拡大する。 6Gやランドスケープを超えて拡張されたこれらの拡張は、敵が悪意ある目的のためにLSMを操作するための新たな道を提供する。我々は,LLMのセキュリティ面に,潜在的な敵の立場から焦点をあてる。我々は,その目的と方法論を解明し,既知のセキュリティの弱点を詳細に分析することを目的としている。これには包括的脅威分類の開発が含まれ、様々な敵の行動を分類する。また、我々の研究は、防衛チーム(ブルーチームとしても知られる)によるサイバーセキュリティ活動にLLMがどのように統合されるかに焦点を当てます。 LLMとブロックチェーン技術間のシナジーの可能性を探り、この組み合わせが次世代の完全自律型セキュリティソリューションの開発にどのように寄与するかを検討します。このアプローチは、コンピュータ連続体全体にわたって統一されたサイバーセキュリティ戦略を確立することを目的としており、デジタルセキュリティインフラストラクチャ全体の強化を目的としている。

The rapid integration of Generative AI (GenAI) and Large Language Models (LLMs) in sectors such as education and healthcare has marked a significant advancement in technology. However, this growth has also led to a largely unexplored aspect: their security vulnerabilities. As the ecosystem that includes both offline and online models, various tools, browser plugins, and third-party applications continues to expand, it significantly widens the attack surface, thereby escalating the potential for security breaches. These expansions in the 6G-and-beyond landscape provide new avenues for adversaries to manipulate LLMs for malicious purposes. We focus on the security aspects of LLMs from the viewpoint of potential adversaries. We aim to dissect their objectives and methodologies, providing an in-depth analysis of known security weaknesses. This will include the development of a comprehensive threat taxonomy that categorizes various adversary behaviors. Our research will also concentrate on how LLMs can be integrated into cybersecurity efforts by defense teams, also known as blue teams. We will explore the potential synergy between LLMs and blockchain technology, and how this combination could lead to the development of next-generation, fully autonomous security solutions. This approach aims to establish a unified cybersecurity strategy across the entire computing continuum, enhancing the overall digital security infrastructure.

翻訳日:2024-03-20 18:12:11 公開日:2024-03-18

# 参照ベースのメトリクスは質問生成において自らを反証する

Reference-based Metrics Disprove Themselves in Question Generation ( http://arxiv.org/abs/2403.12242v1 )

ライセンス: Link先を確認

Bang Nguyen, Mengxia Yu, Yun Huang, Meng Jiang,

(参考訳) BLEUやBERTScoreのような基準ベースのメトリクスは、質問生成(QG)を評価するために広く使われている。本研究では、SQuADやHotpotQAなどのQGベンチマークにおいて、人手による参照を用いることで基準ベースのメトリクスの有効性を保証できないことを示す。ほとんどのQGベンチマークには1つの参照しかありません。優れた測定基準は、生成した質問に比較して、人間公認の質問を格付けすることが期待された。しかし, 新たに収集した基準値に対する基準基準値の結果は, 基準値自体を反証した。本研究では,大規模言語モデルを用いて,自然性,応答可能性,複雑性などの多次元基準からなる基準自由度尺度を提案する。これらの基準は単一の参照質問の構文や意味に制約されず、メトリクスは多様な参照セットを必要としない。実験の結果、我々の測定基準は高品質な質問と欠陥のある質問を正確に区別し、人間の判断と最先端の一致を実現していることがわかった。

Reference-based metrics such as BLEU and BERTScore are widely used to evaluate question generation (QG). In this study, on QG benchmarks such as SQuAD and HotpotQA, we find that using human-written references cannot guarantee the effectiveness of the reference-based metrics. Most QG benchmarks have only one reference; we replicated the annotation process and collected another reference. A good metric is expected to grade a human-validated question no worse than generated questions. However, the results of reference-based metrics on our newly collected reference disproved the metrics themselves. We propose a reference-free metric consisting of multi-dimensional criteria such as naturalness, answerability, and complexity, utilizing large language models. These criteria are not constrained to the syntax or semantics of a single reference question, and the metric does not require a diverse set of references. Experiments reveal that our metric accurately distinguishes between high-quality questions and flawed ones, and achieves state-of-the-art alignment with human judgment.
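
The failure mode described in this abstract can be seen with a toy overlap metric. The sketch below uses clipped unigram precision as a simplified stand-in for BLEU; it is not BLEU or BERTScore themselves, and the three questions are invented for illustration. A valid paraphrase of the reference scores lower than an underspecified question that merely copies reference words.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision of the candidate against a single reference."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)


reference = "who wrote the novel pride and prejudice"
paraphrase = "which author penned pride and prejudice"  # valid, differently worded
flawed = "who wrote the novel"                           # underspecified

print(unigram_precision(paraphrase, reference))
print(unigram_precision(flawed, reference))
```

With a single reference, surface overlap rewards copying over correctness, which is the motivation for criteria that do not depend on one reference question.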

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# ゼロショットマルチタスク幻覚検出

Zero-Shot Multi-task Hallucination Detection ( http://arxiv.org/abs/2403.12244v1 )

ライセンス: Link先を確認

Patanjali Bhamidipati, Advaith Malladi, Manish Shrivastava, Radhika Mamidi,

(参考訳) 近年,大規模言語モデルの広範囲な活用は、テキスト生成の品質評価やタスク関連性評価において、ロバストな評価手法の重要性を浮き彫りにしている。これは、生成したテキストがソースへの忠実さに欠け、評価基準から逸脱する、モデルにおける創発的条件である幻覚として知られる一般的な問題を明らかにしている。本研究では,幻覚を正式に定義し,ゼロショット設定における定量的検出のための枠組みを提案する。幻覚検出では, モデル認識(model-aware)設定では0.78, モデル非依存(model-agnostic)設定では0.61の精度が得られた。特に、我々のソリューションは計算効率を保ち、他のSOTAアプローチよりも計算資源をはるかに少なくし、軽量で圧縮されたモデルへの傾向に合わせている。

In recent studies, the extensive utilization of large language models has underscored the importance of robust evaluation methodologies for assessing text generation quality and relevance to specific tasks. This has revealed a prevalent issue known as hallucination, an emergent condition in the model where generated text lacks faithfulness to the source and deviates from the evaluation criteria. In this study, we formally define hallucination and propose a framework for its quantitative detection in a zero-shot setting, leveraging our definition and the assumption that model outputs entail task- and sample-specific inputs. In detecting hallucinations, our solution achieves an accuracy of 0.78 in a model-aware setting and 0.61 in a model-agnostic setting. Notably, our solution maintains computational efficiency, requiring far fewer computational resources than other SOTA approaches, in line with the trend towards lightweight and compressed models.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# Parasitic Circus: ゴールデンフリーPCB検証の実現可能性について

Parasitic Circus:On the Feasibility of Golden Free PCB Verification ( http://arxiv.org/abs/2403.12252v1 )

ライセンス: Link先を確認

Maryam Saadat Safa, Patrick Schaumont, Shahin Tajik,

(参考訳) プリント回路基板(PCB)は電子システムの不可欠な部分である。したがって、サプライチェーン攻撃(例えば、改ざん、偽造)の存在下での物理的な整合性を検証することは、非常に重要である。近年,PCBの電源供給ネットワーク(PDN)のインピーダンス特性を基盤としたタンパー検出技術が,そのグローバルな検出範囲,非侵襲性,低コスト性により注目されている。他の物理的検証方法と同様に、これらの手法はシグネチャの比較のために物理的なゴールデンサンプルの存在に依存している。しかし、ゴールデンシグネチャ抽出のための物理的なゴールデンサンプルにアクセスすることは、多くの現実世界のシナリオでは実現不可能である。そこで本研究では,物理的なゴールデンサンプルを不要とし,PCB設計ファイルから得られるシミュレーションによるゴールデンシグネチャに置き換える可能性について検討する。社内設計PCB上で広範囲なシミュレーションと測定を行うことにより,PCBコンポーネントの寄生インピーダンスが,検証に成功するための重要な役割を担っていることを示す。得られた結果と統計的指標を用いて,シミュレーションと測定から得られたシグネチャの差を緩和できることを示す。

Printed circuit boards (PCBs) are an integral part of electronic systems. Hence, verifying their physical integrity in the presence of supply chain attacks (e.g., tampering and counterfeiting) is of utmost importance. Recently, tamper detection techniques grounded in impedance characterization of a PCB's power delivery network (PDN) have gained prominence due to their global detection coverage, non-invasiveness, and low cost. Similar to other physical verification methods, these techniques rely on the existence of a physical golden sample for signature comparisons. However, having access to a physical golden sample for golden signature extraction is not feasible in many real-world scenarios. In this work, we assess the feasibility of eliminating a physical golden sample and replacing it with a simulated golden signature obtained from the PCB design files. By performing extensive simulation and measurements on an in-house designed PCB, we demonstrate how the parasitic impedance of the PCB components plays a major role in reaching a successful verification. Based on the obtained results and using statistical metrics, we show that we can mitigate the discrepancy between collected signatures from simulation and measurements.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 生成ディープラーニングを用いた適応LPDレーダ波形設計

Adaptive LPD Radar Waveform Design with Generative Deep Learning ( http://arxiv.org/abs/2403.12254v1 )

ライセンス: Link先を確認

Matthew R. Ziemann, Christopher A. Metzler,

(参考訳) 本研究では,動作環境に溶け込む低被探知確率(LPD)レーダ波形を適応的に生成する新しい学習ベースの手法を提案する。提案する波形は、測距とセンシングに有効でありながら、周囲の無線周波数(RF)背景と区別できない分布に従うように設計されている。そのために、教師なしの敵対的学習フレームワークを用いる。生成ネットワークは、生成した波形を背景から区別するように最適化された批評家ネットワークを混乱させるように設計された波形を生成する。生成した波形がセンシングに引き続き有効であることを保証するために、生成した波形に対してあいまいさ関数に基づく損失を導入し、最小化する。生成した波形の単一パルス被探知性を、独立に学習した検出ニューラルネットワークを用いて従来のLPD波形と比較することで、提案手法の性能を評価する。提案手法では,被探知性を最大90%低減するLPD波形を生成できると同時に,あいまいさ関数(センシング)特性を向上できることがわかった。このフレームワークは、被探知性とセンシング性能をトレードオフするメカニズムも提供する。

We propose a novel, learning-based method for adaptively generating low probability of detection (LPD) radar waveforms that blend into their operating environment. Our waveforms are designed to follow a distribution that is indistinguishable from the ambient radio frequency (RF) background -- while still being effective at ranging and sensing. To do so, we use an unsupervised, adversarial learning framework; our generator network produces waveforms designed to confuse a critic network, which is optimized to differentiate generated waveforms from the background. To ensure our generated waveforms are still effective for sensing, we introduce and minimize an ambiguity function-based loss on the generated waveforms. We evaluate the performance of our method by comparing the single-pulse detectability of our generated waveforms with traditional LPD waveforms using a separately trained detection neural network. We find that our method can generate LPD waveforms that reduce detectability by up to 90% while simultaneously offering improved ambiguity function (sensing) characteristics. Our framework also provides a mechanism to trade off detectability and sensing performance.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 遷移金属ジカルコゲナイドモアレ超格子におけるウィグナー分子超結晶-ボトムアップアプローチからの教訓-

Wigner-molecule supercrystal in transition-metal dichalcogenide moiré superlattices: Lessons from the bottom-up approach ( http://arxiv.org/abs/2403.12262v1 )

ライセンス: Link先を確認

Constantine Yannouleas, Uzi Landman,

(参考訳) 整数充填 $\nu > 1$ の遷移金属ジカルコゲナイド(TMD)モアレ超格子における分子超結晶の形成を調べるボトムアップ戦略の第一歩として、二重井戸型モアレ量子ドット(MQD)における $N=4$ 個のフェルミオン型電荷担体の少数体問題を、フル構成相互作用(FCI)計算による大規模厳密対角化を用いて厳密に解く。しばしば用いられるスピンおよび空間非制限ハートリー・フォック(sS-UHF)の平均場解との比較解析は、ドット間クーロン相互作用の影響を適切に記述するうえでのUHF法(単独で)の限界を示す。特に、$\nu=2$ に対して、各MQD内の厳密な電荷密度(CD)は、スライディング・ウィグナー分子(WM)で見られたように、完全に孤立したMQDに特徴的なリング状の形状を(幅広い関連パラメータに対して)保持することが明示的に示される。この深く量子力学的な振る舞いは、向きが固定され、よく局在したダンベル二量体のみを描くUHFのCDとは対照的である。さらに、sS-UHFで破れたパリティ対称性の回復から導かれ、FCI計算と一致する改良CDを導入し、sS-UHFの結果を修正するための平均場を超える方法論的ロードマップを示唆する。 $\nu=2$ のモアレTMD超格子の場合の結論は、孤立MQDにおけるスライディングWMに関連する整数充填を持つすべての場合に拡張されると推測される。 $\nu=3$ の場合は、孤立MQDにおけるピン止めされたWMと関連しており、例外である。

The few-body problem for $N=4$ fermionic charge carriers in a double-well moiré quantum dot (MQD), representing the first step in a bottom-up strategy to investigate the formation of molecular supercrystals in transition-metal dichalcogenide (TMD) moiré superlattices with integral fillings, $\nu > 1$, is solved exactly by employing large-scale exact diagonalization via full configuration interaction (FCI) computations. A comparative analysis with the mean-field solutions of the often-used spin-and-space unrestricted Hartree-Fock (sS-UHF) method demonstrates the limitations of the UHF method (by itself) in providing a proper description of the influence of the interdot Coulomb interaction. In particular, it is explicitly shown for $\nu=2$ that the exact charge densities (CDs) within each MQD retain the ring-like shape characteristic (for a wide range of relevant parameters) of a fully isolated MQD, as was found for sliding Wigner molecules (WMs). This deeply quantum-mechanical behavior contrasts sharply with the UHF CDs that portray solely orientationally pinned and well localized dumbbell dimers. An improved CD, which agrees with the FCI-calculated one, derived from the restoration of the sS-UHF broken parity symmetries is further introduced, suggesting a beyond-mean-field methodological roadmap for correcting the sS-UHF results. It is conjectured that the conclusions for the $\nu=2$ moiré TMD superlattice case extend to all cases with integral fillings that are associated with sliding WMs in isolated MQDs. The case of $\nu=3$, associated with a pinned WM in isolated MQDs, is an exception.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# インクルージョンの促進: 支援技術の共同設計に向けてコミュニティを結ぶ地域イニシアティブ

Fostering Inclusion: A Regional Initiative Uniting Communities to Co-Design Assistive Technologies ( http://arxiv.org/abs/2403.12263v1 )

ライセンス: Link先を確認

Katharina Schmermbeck, Oliver Ott, Lennart Ralfs, Robert Weidner,

(参考訳) 障害者は社会のあらゆる領域で差別やアクセスの欠如に直面することが多い。支援技術の入手しやすさ(価格面)と適切性の向上は、参加や自立を容易にする道を開くことができるが、社会の一部としての障害への認識と受容も不可欠である。本稿で紹介する地域イニシアティブは、障害のある人々、学生、研究者、協会を結びつけることによって、これらの課題に取り組もうとするものである。大学における様々な講義形式において、学生は障害のある人々と支援技術を共同設計する。 1年間の実践の後、我々はこのイニシアティブと、支援技術の開発およびエイブリズム(能力主義)の緩和への影響を振り返る。参加者や他の関係者を対象に,13件の半構造化インタビューを実施し,分析した。すべての共同設計プロジェクトが講義期間内に完了したわけではない。それでも、プロジェクトは次学期以降も継続されるため、参加者は共同設計のアプローチと正しい方向への歩みを評価した。インタビュー対象者は、参加者の障害や内面化された能力主義的な思い込みに関する意識を高め、知識を広げるうえでのこのイニシアティブの重要性を強調した。我々は、具体的な支援技術の実現、アクセシビリティのギャップの解消、より包括的な社会の育成に向けて、コラボレーション、継続性、および公的なアウトリーチが最も重要であると結論付けている。

People with disabilities often face discrimination and lack of access in all areas of society. While improving the affordability and appropriateness of assistive technologies can pave the way for easier participation and independence, awareness and acceptance of disability as part of society are indispensable. The presented regional initiative strives to tackle these problems by bringing together people with disabilities, students, researchers, and associations. During different lecture formats at the university, students co-design assistive technologies with people with disabilities. After one year in practice, we reflect on the initiative and its impact on assistive technology development and mitigation of ableism. We conducted and analyzed thirteen semi-structured interviews with participants and other involved stakeholders. Not all co-design projects were finished within the time of a lecture. Participants nevertheless appreciated the co-design approach and steps in the right direction as projects are continued in upcoming semesters. Interviewees highlighted the initiative's importance in raising awareness and broadening knowledge regarding disability and internalized ableist assumptions for those participating. We conclude that collaboration, continuity, and public outreach are most important to work towards tangible assistive technologies, bridging accessibility gaps, and fostering a more inclusive society.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# データ効率の良いコントラスト言語画像事前学習:量よりもデータ品質を優先する

Data-Efficient Contrastive Language-Image Pretraining: Prioritizing Data Quality over Quantity ( http://arxiv.org/abs/2403.12267v1 )

ライセンス: Link先を確認

Siddharth Joshi, Arnav Jain, Ali Payani, Baharan Mirzasoleiman,

(参考訳) 大規模画像キャプションデータセット上でのCLIP(Contrastive Language-Image Pre-training)は、目覚ましいゼロショットの一般化を実現する表現を学習する。しかし、そのようなモデルは大量の事前学習データを必要とする。事前学習データの品質向上は、データ量の増加よりもCLIPのパフォーマンス向上にはるかに有効であることが示されている。それでも、最もよく汎化することが証明可能な訓練データの小さなサブセットを見つけることは、未解決の問題のままである。本稿では,CLIPのための理論的に厳密な最初のデータ選択法を提案する。全データの画像とキャプションの相互共分散を密接に保存するサブセットは、より優れた汎化性能を証明可能な形で達成することを示す。 ConceptualCaptions3MとConceptualCaptions12Mの広範な実験により、提案手法が発見したサブセットは、ImageNetとそのシフトしたバージョンにおいて、次点のベースラインの2.7倍および1.4倍を超える精度を達成することが示された。さらに,我々のサブセットでは,11の下流データセットの平均精度が次点のベースラインの1.5倍になることを示す。コードはhttps://github.com/BigML-CS-UCLA/clipcov-data-efficient-clipで入手できる。

Contrastive Language-Image Pre-training (CLIP) on large-scale image-caption datasets learns representations that can achieve remarkable zero-shot generalization. However, such models require a massive amount of pre-training data. Improving the quality of the pre-training data has been shown to be much more effective in improving CLIP's performance than increasing its volume. Nevertheless, finding small subsets of training data that provably generalize the best has remained an open question. In this work, we propose the first theoretically rigorous data selection method for CLIP. We show that subsets that closely preserve the cross-covariance of the images and captions of the full data provably achieve a superior generalization performance. Our extensive experiments on ConceptualCaptions3M and ConceptualCaptions12M demonstrate that subsets found by our method achieve over 2.7x and 1.4x the accuracy of the next best baseline on ImageNet and its shifted versions. Moreover, we show that our subsets obtain 1.5x the average accuracy across 11 downstream datasets, of the next best baseline. The code is available at: https://github.com/BigML-CS-UCLA/clipcov-data-efficient-clip.
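
The selection criterion in this abstract (keep subsets whose image-caption cross-covariance stays close to that of the full data) can be illustrated with a toy sketch. This is not the authors' algorithm: the centering convention, the Frobenius-norm gap, and the function names `cross_covariance` and `frobenius_gap` are assumptions made only for illustration.

```python
def cross_covariance(images, captions):
    """Centered cross-covariance matrix between two lists of embedding vectors.

    Assumption: plain sample cross-covariance with 1/n normalization.
    """
    n = len(images)
    mean_img = [sum(col) / n for col in zip(*images)]
    mean_cap = [sum(col) / n for col in zip(*captions)]
    return [
        [
            sum((x[i] - mean_img[i]) * (y[j] - mean_cap[j])
                for x, y in zip(images, captions)) / n
            for j in range(len(mean_cap))
        ]
        for i in range(len(mean_img))
    ]


def frobenius_gap(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) ** 0.5


# Toy 1-D "embeddings": four image/caption pairs (hypothetical data).
images = [[1.0], [3.0], [1.0], [3.0]]
captions = [[2.0], [6.0], [2.0], [6.0]]

full = cross_covariance(images, captions)
# A subset that keeps one pair of each kind preserves the cross-covariance.
diverse = cross_covariance(images[:2], captions[:2])
# A subset of two identical pairs loses it entirely.
redundant = cross_covariance([images[0], images[2]], [captions[0], captions[2]])
```

In this toy case the diverse subset has zero gap to the full-data matrix, while the redundant subset does not, which is the intuition behind preferring the former.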

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# ウィグナー関数の可聴化--強い光-物質相互作用のケーススタディ

Sonification of Wigner functions: case study of intense light-matter interactions ( http://arxiv.org/abs/2403.12269v1 )

ライセンス: Link先を確認

Reiko Yamada, Antoine Reserbat-Plantey, Eloy Piñol, Maciej Lewenstein,

(参考訳) 量子力学において、ウィグナー関数 $\rho_W(\textbf{r},\textbf{p})$ は位相空間表現として機能し、量子系の位置 $\textbf{r}$ と運動量 $\textbf{p}$ の両方に関する情報を捉える。ウィグナー関数は観測量の期待値の計算、量子系のダイナミクスの検討、コヒーレンスと相関の解析を容易にする。したがって、例えば可聴化(ソニフィケーション)技術を用いて、量子系を直感的に表現するためのツールとして機能しうる。本稿では,前回のプロジェクトで用いた実験戦略を要約し,その成果に基づく新しいアプローチについて述べる。提案手法は,特定のウィグナー関数をその基礎となる量子状態,ダイナミクス,発生源に対応づけることを重視し,量子現象の直観的理解と解釈を高めることを目的として,可聴化とスコアリングのプロセスを洗練する。

In quantum mechanics, the Wigner function $\rho_W(\textbf{r},\textbf{p})$ serves as a phase-space representation, capturing information about both the position $\textbf{r}$ and momentum $\textbf{p}$ of a quantum system. The Wigner function facilitates the calculation of expectation values of observables, examination of quantum system dynamics, and analysis of coherence and correlations. Therefore, it might serve as a tool to express quantum systems intuitively, for example, by using sonification techniques. This paper summarizes the experimental strategies employed in a previous project and delineates a new approach based on its outcomes. Emphasizing the attribution of specific Wigner functions to their underlying quantum states, dynamics, and sources, our proposed methodology seeks to refine the sonification and scoring process, aiming to enhance intuitive understanding and interpretation of quantum phenomena.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 相関電子波動関数の変動補間によって実現される高速かつ正確な非断熱的分子動力学

Fast and accurate nonadiabatic molecular dynamics enabled through variational interpolation of correlated electron wavefunctions ( http://arxiv.org/abs/2403.12275v1 )

ライセンス: Link先を確認

Kemal Atalar, Yannic Rath, Rachel Crespo-Otero, George H. Booth,

(参考訳) 本研究では, 固有ベクトル継続の概念に基づいて, 平均フィールドコストで化学空間を通した多体波動関数の訓練セットを, 厳密かつ滑らかに補間する効率的な多状態法を開発した。推定された状態は、異なる核ジオメトリの多体基底間で伝達される訓練状態の変分最適線形結合として表される。モデルから解析的多状態力と非断熱的結合が非断熱的分子動力学に適用可能であることを示す。このことは、光励起された28原子水素鎖の非断熱的分子動力学に応用し、結果として生じる核運動が驚くほど複雑になる。異なるジオメトリーにおける低エネルギー相関電子構造からのトレーニング状態の22個のDMRG計算で、12,000ジオメトリーにおける多状態エネルギー, 力および非断熱結合ベクトルを、ブルート力アプローチでは実現できない分子軌道のアンサンブルに沿った高精度な収束性で推定する。これにより、正確な単一点相関電子構造法と光誘起分子動力学の関連性の時間スケールの間に時間スケールを橋渡しするルートが開かれる。

We build on the concept of eigenvector continuation to develop an efficient multi-state method for the rigorous and smooth interpolation of a small training set of many-body wavefunctions through chemical space at mean-field cost. The inferred states are represented as variationally optimal linear combinations of the training states transferred between the many-body basis of different nuclear geometries. We show that analytic multi-state forces and nonadiabatic couplings from the model enable application to nonadiabatic molecular dynamics, developing an active learning scheme to ensure a compact and systematically improvable training set. This culminates in application to the nonadiabatic molecular dynamics of a photoexcited 28-atom hydrogen chain, with surprising complexity in the resulting nuclear motion. With just 22 DMRG calculations of training states from the low-energy correlated electronic structure at different geometries, we infer the multi-state energies, forces and nonadiabatic coupling vectors at 12,000 geometries with provable convergence to high accuracy along an ensemble of molecular trajectories, which would not be feasible with a brute force approach. This opens up a route to bridge the timescales between accurate single-point correlated electronic structure methods and timescales of relevance for photo-induced molecular dynamics.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 確率的丸めは縦長行列を暗黙的に正則化する

Stochastic Rounding Implicitly Regularizes Tall-and-Thin Matrices ( http://arxiv.org/abs/2403.12278v1 )

ライセンス: Link先を確認

Gregory Dexter, Christos Boutsikas, Linkai Ma, Ilse C. F. Ipsen, Petros Drineas,

(参考訳) 機械学習や大規模ディープニューラルネットワークモデルの訓練における確率的丸めの普及を動機として、列よりもはるかに多くの行を持つ実行列 $\mathbf{A}$ の確率的丸め(stochastic nearness rounding)を考える。我々は、広範な実験的評価に裏付けられた新しい理論的証拠として、$\mathbf{A}$ がどれほどランク落ちに近くても、また $\mathbf{A}$ がランク落ちであっても、確率的に丸められた行列の最小特異値が高い確率でゼロから十分に離れていることを示す。言い換えれば、確率的丸めは縦長の行列 $\mathbf{A}$ を暗黙的に正則化し、丸められた行列はフル列ランクを持つ。我々の証明はランダム行列理論の強力な結果と、確率的丸め誤差は低次元の列空間に集中しないという考え方を利用している。

Motivated by the popularity of stochastic rounding in the context of machine learning and the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation, that, with high probability, the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient and even if $\mathbf{A}$ is rank-deficient. In other words, stochastic rounding implicitly regularizes tall and skinny matrices $\mathbf{A}$ so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, and the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces.
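
The effect described in this abstract can be seen in a small sketch. It assumes the textbook form of stochastic rounding to the integer grid (round up with probability equal to the fractional part), which may differ in detail from the rounding scheme analyzed in the paper. The helper names and the 50 x 2 example matrix are hypothetical. For a two-column matrix, det(A^T A) is the product of the squared singular values, so it is zero exactly when the smallest singular value is zero.

```python
import math
import random


def stochastic_round(x, rng):
    """Round x to floor(x) or floor(x) + 1, choosing the upper value with
    probability equal to the fractional part, so the result is unbiased."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


def gram_determinant(matrix):
    """det(A^T A) for an n x 2 matrix; zero iff the two columns are dependent."""
    a = sum(row[0] * row[0] for row in matrix)
    b = sum(row[0] * row[1] for row in matrix)
    d = sum(row[1] * row[1] for row in matrix)
    return a * d - b * b


rng = random.Random(0)
# A tall 50 x 2 matrix with identical columns, hence rank 1.
A = [[0.5, 0.5] for _ in range(50)]
# Each entry is rounded independently, so the columns almost surely differ.
A_rounded = [[stochastic_round(v, rng) for v in row] for row in A]
```

Here `gram_determinant(A)` is zero, while the rounded matrix has a strictly positive Gram determinant except with negligible probability, that is, it has full column rank.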

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# FinLlama: アルゴリズムトレーディングアプリケーションのための金融センチメント分類

FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications ( http://arxiv.org/abs/2403.12285v1 )

ライセンス: Link先を確認

Thanos Konstantinidis, Giorgos Iacovides, Mingxue Xu, Tony G. Constantinides, Danilo Mandic,

(参考訳) 市場の動きやトレーダーの判断に影響を及ぼす金融ニュースは、オンラインでもいくつか出ている。これは正確な感情分析の必要性を強調し、適切なアルゴリズムトレーディング技術を持つことに加えて、より詳細なトレーディング決定を下す必要がある。標準的なレキシコンベースの感情アプローチは、財政的な決定を補助する力を示している。しかし、文脈の感度や単語の順序に関する問題に悩まされていることが知られている。 LLM(Large Language Models)もこの文脈で使用することができるが、財務に特化せず、重要な計算資源を必要とする傾向がある。 Llama 2 7Bの基礎モデルに基づく新たなアプローチを導入し,その生成特性と包括的言語操作のメリットを享受する。これは、Llama2 7Bモデルを教師付き財務感情分析データのごく一部に微調整することで、金融レキシコンとコンテキストの複雑さを共同で処理し、さらにニューラルネットワークに基づく決定機構を組み込むことによって達成される。このようなジェネレータ分類スキームはFinLlamaと呼ばれ、感情の原子価を分類するだけでなく、その強さを定量化するために訓練されている。補足すれば、LoRAによるパラメータ効率の良い微調整の実装は、トレーニング可能なパラメータを最適化し、精度を犠牲にすることなく、計算とメモリの要求を最小限に抑えることができる。シミュレーションの結果は、FinLlamaがポートフォリオ管理の強化と市場リターンの向上のためのフレームワークを提供する能力を示している。これらの結果は、不安定な期間や予測不可能な市場イベントであっても、高いレジリエンスを示すハイリターンポートフォリオを構築するためのFinLlamaの能力の基盤となっている。

There are multiple sources of financial news online which influence market movements and traders' decisions. This highlights the need for accurate sentiment analysis, in addition to having appropriate algorithmic trading techniques, to arrive at better informed trading decisions. Standard lexicon-based sentiment approaches have demonstrated their power in aiding financial decisions. However, they are known to suffer from issues related to context sensitivity and word ordering. Large Language Models (LLMs) can also be used in this context, but they are not finance-specific and tend to require significant computational resources. To facilitate a finance-specific LLM framework, we introduce a novel approach based on the Llama 2 7B foundational model, in order to benefit from its generative nature and comprehensive language manipulation. This is achieved by fine-tuning the Llama 2 7B model on a small portion of supervised financial sentiment analysis data, so as to jointly handle the complexities of financial lexicon and context, and further equipping it with a neural network based decision mechanism. Such a generator-classifier scheme, referred to as FinLlama, is trained not only to classify the sentiment valence but also to quantify its strength, thus offering traders a nuanced insight into financial news articles. Complementing this, the implementation of parameter-efficient fine-tuning through LoRA optimises trainable parameters, thus minimising computational and memory requirements, without sacrificing accuracy. Simulation results demonstrate the ability of the proposed FinLlama to provide a framework for enhanced portfolio management decisions and increased market returns. These results underpin the ability of FinLlama to construct high-return portfolios which exhibit enhanced resilience, even during volatile periods and unpredictable market events.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 3次元解剖学的セグメンテーションにおけるスライス伝播不確かさの推定と解析

Estimation and Analysis of Slice Propagation Uncertainty in 3D Anatomy Segmentation ( http://arxiv.org/abs/2403.12290v1 )

ライセンス: Link先を確認

Rachaell Nihalaani, Tushar Kataria, Jadie Adams, Shireen Y. Elhabian,

(参考訳) 3次元解剖学的セグメンテーションの教師あり手法は優れた性能を示すが、アノテーション付きデータの可用性によって制限されることが多い。この制限により、利用可能な未アノテーションデータの豊富さと相まって、自己教師ありアプローチへの関心が高まっている。スライス伝播は、スライス間レジストレーションを自己教師ありタスクとして活用し、最小限の教師情報で完全な解剖学的セグメンテーションを実現する自己教師ありアプローチとして登場した。このアプローチによって、セグメンテーションネットワークのトレーニングに必要な完全なアノテーション付きデータセットの構築に伴う専門知識、時間、およびコストが大幅に削減される。しかし、決定論的ネットワークによる教師情報の削減へのシフトは、特により正確な教師ありアプローチと比較して、予測の信頼性に関する懸念を提起する。この問題に対処するため,キャリブレーションされた不確実性定量化(UQ)をスライス伝播法に統合し,モデルの予測信頼性と信頼度について考察する。不確実性の指標を取り入れることで、自己教師ありアプローチに対するユーザの信頼感を高め、実用的な適用性を向上させる。 5つのUQ法を用いて3次元腹部セグメンテーションのための3つのデータセットについて実験を行った。その結果,UQの導入はモデルの信頼性だけでなくセグメンテーションの精度も向上することがわかった。さらに, エンドユーザーにはすぐには明らかでないかもしれないスライス伝播手法の様々な障害モードを明らかにした。本研究は,スライス伝播法の精度と信頼性を向上させるため,新しい研究の方向性を開拓する。

Supervised methods for 3D anatomy segmentation demonstrate superior performance but are often limited by the availability of annotated data. This limitation has led to a growing interest in self-supervised approaches in tandem with the abundance of available un-annotated data. Slice propagation has emerged as a self-supervised approach that leverages slice registration as a self-supervised task to achieve full anatomy segmentation with minimal supervision. This approach significantly reduces the need for domain expertise, time, and the cost associated with building fully annotated datasets required for training segmentation networks. However, this shift toward reduced supervision via deterministic networks raises concerns about the trustworthiness and reliability of predictions, especially when compared with more accurate supervised approaches. To address this concern, we propose the integration of calibrated uncertainty quantification (UQ) into slice propagation methods, providing insights into the model's predictive reliability and confidence levels. Incorporating uncertainty measures enhances user confidence in self-supervised approaches, thereby improving their practical applicability. We conducted experiments on three datasets for 3D abdominal segmentation using five UQ methods. The results illustrate that incorporating UQ improves not only model trustworthiness, but also segmentation accuracy. Furthermore, our analysis reveals various failure modes of slice propagation methods that might not be immediately apparent to end-users. This study opens up new research avenues to improve the accuracy and trustworthiness of slice propagation methods.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# パルス励起および連続波励起によるh-BN量子エミッタの光子統計解析

Photon statistics analysis of h-BN quantum emitters with pulsed and continuous-wave excitation ( http://arxiv.org/abs/2403.12291v1 )

ライセンス: Link先を確認

Hamidreza Akbari, Pankaj K. Jha, Kristina Malinowski, Benjamin E. C. Koltenbah, Harry A. Atwater,

(参考訳) ヘキサゴナル窒化ホウ素(h-BN)量子エミッタの量子光子統計について,マンデルQパラメータを解析して報告する。我々は,h-BN量子エミッタのマンデルQパラメータを,様々な温度およびポンプ出力条件下で測定した。パルス励起では-0.002のマンデルQが得られ、連続波(CW)励起ではこのパラメータは-0.0025に達する。低温がマンデルQに与える影響を調べた結果,光子統計は温度とともに弱く変化することがわかった。励起された2準位エミッタモデルからの自然放出の計算により, 実験の光子収集効率を考慮した場合, マンデルQパラメータの測定値と計算値が良好に一致することを示す。最後に、乱数生成を例に量子応用におけるマンデルQの有用性を説明し、この方法によるランダムビットの生成速度に対するマンデルQの効果を分析する。

We report on the quantum photon statistics of hexagonal boron nitride (h-BN) quantum emitters by analyzing the Mandel Q parameter. We have measured the Mandel Q parameter for h-BN quantum emitters under various temperatures and pump power excitation conditions. Under pulsed excitation we can achieve a Mandel Q of -0.002 and under continuous-wave (CW) excitation this parameter can reach -0.0025. We investigate the effect of cryogenic temperatures on Mandel Q and conclude that the photon statistics vary weakly with temperature. Through calculation of spontaneous emission from an excited two-level emitter model, we demonstrate good agreement between measured and calculated Mandel Q parameter when accounting for the experimental photon collection efficiency. Finally, we illustrate the usefulness of Mandel Q in quantum applications by the example of random number generation and analyze the effect of Mandel Q on the speed of generating random bits via this method.
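
For readers unfamiliar with the quantity, the Mandel Q parameter is defined from the photon-number statistics as Q = (Var(n) - <n>) / <n>, with Q < 0 indicating sub-Poissonian light and Q = 0 Poissonian light. The sketch below computes it from a list of per-bin photon counts. The function name `mandel_q` and the sample counts are hypothetical and are not taken from the paper's data or analysis code.

```python
from statistics import mean, pvariance


def mandel_q(counts):
    """Mandel Q = (Var(n) - <n>) / <n> for photon counts per time bin.

    Uses the population variance of the observed counts.
    """
    avg = mean(counts)
    if avg == 0:
        raise ValueError("mean photon number must be positive")
    return (pvariance(counts) - avg) / avg


# Hypothetical detection records, one entry per excitation pulse or time bin.
ideal_single_photons = [1, 1, 1, 1]   # exactly one photon every time
lossy_single_photons = [0, 1, 0, 1]   # same source seen through 50% loss

q_ideal = mandel_q(ideal_single_photons)
q_lossy = mandel_q(lossy_single_photons)
```

The second example shows why measured values sit close to zero when the collection efficiency is low: loss pulls Q toward the Poissonian value even for a perfect single-photon source.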

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# DALL-E 2における合成構文と意味論の比較検討

A Comparative Investigation of Compositional Syntax and Semantics in DALL-E 2 ( http://arxiv.org/abs/2403.12294v1 )

ライセンス: Link先を確認

Elliot Murphy, Jill de Villiers, Sofia Lucero Morales,

(参考訳) 本研究は,DALL-E 2が,幼児の理解テストにおける言語指導の意味を視覚的にどう表現するかを比較検討した。 2～7歳の英語を話す数百人の子どもを対象にした評価試験から,文法知識の基本的構成要素を表す文を抽出した。 DALL-E 2は、大人9人の審査員が得点するために、これらのプロンプトを5回与え、アイテムごとに20の漫画を制作した。その結果,若年者(2歳)においても,DALL-E2生成画像が子供の意味的精度に合致する状況はみられなかった。 DALL-E 2 は、可逆的な形で適切な役割を割り当てることに失敗した; 子どもが受け取っていたよりコントラストの強いプロンプトにもかかわらず否定することに失敗した; 間違った名詞に形容詞を割り当てることがしばしばあり、受身者の暗黙のエージェントを無視した。この研究は、DALL-E 2の合成文表現が明らかに存在しないことを示唆している。

In this study we compared how well DALL-E 2 visually represented the meaning of linguistic prompts also given to young children in comprehension tests. Sentences representing fundamental components of grammatical knowledge were selected from assessment tests used with several hundred English-speaking children aged 2-7 years for whom we had collected original item-level data. DALL-E 2 was given these prompts five times to generate 20 cartoons per item, for 9 adult judges to score. Results revealed no conditions in which DALL-E 2-generated images matched the semantic accuracy of children, even at the youngest age (2 years). DALL-E 2 failed to assign the appropriate roles in reversible forms; it failed on negation despite an easier contrastive prompt than the children received; it often assigned the adjective to the wrong noun; it ignored implicit agents in passives. This work points to a clear absence of compositional sentence representations for DALL-E 2.

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 偽カバレッジ率制御による情報共形予測セットの選択

Selecting informative conformal prediction sets with false coverage rate control ( http://arxiv.org/abs/2403.12295v1 )

ライセンス: Link先を確認

Ulysse Gazin, Ruth Heller, Ariane Marandon, Etienne Roquain,

(参考訳) 回帰と分類を含む教師付き学習において、共形法は、任意の機械学習予測器に対して有限サンプルカバレッジを持つ結果/ラベルの予測セットを提供する。本稿では、このような予測セットが選択プロセスの後に得られる場合を考える。選択プロセスは、選択された予測セットが、明確に定義された意味で「情報的」であることを要求する。分類と回帰の両方の設定を考え、分析者は、予測ラベルセットや予測区間が十分に小さい、null値を除外する、あるいは他の適切な「単調」制約に従うサンプルのみを情報的とみなすことができる。これは様々なアプリケーションで関心を持たれうる多くの設定をカバーしており、我々は、選択されたサンプルに対して偽カバレッジ率(FCR)を制御しながら、このような情報的共形予測セットを構築するための統一的なフレームワークを開発する。選択後の共形予測セットは、この分野における最近の文献の焦点となっているが、InfoSPとInfoSCOPと呼ばれる新しい手順は、我々の知る限り、情報的予測セットにFCR制御を提供する最初の方法である。提案手法の有用性を実データおよびシミュレーションデータで示す。

In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be `informative' in a well-defined sense. We consider both the classification and regression settings where the analyst may consider as informative only the sample with prediction label sets or prediction intervals small enough, excluding null values, or obeying other appropriate `monotone' constraints. This covers many settings of possible interest in various applications, and we develop a unified framework for building such informative conformal prediction sets while controlling the false coverage rate (FCR) on the selected sample. While conformal prediction sets after selection have been the focus of much recent literature in the field, the newly introduced procedures, called InfoSP and InfoSCOP, are to our knowledge the first ones providing FCR control for informative prediction sets. We show the usefulness of our resulting procedures on real and simulated data.
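
As a reminder of the quantity being controlled, the false coverage rate is the expected value of the false coverage proportion: among the selected prediction sets, the fraction that fail to contain the true outcome (taken as 0 when nothing is selected). The sketch below computes that proportion for intervals in a regression setting. It is not an implementation of InfoSP or InfoSCOP; the "excludes zero" selection rule, the function names, and the numbers are hypothetical illustrations.

```python
def select_informative(intervals):
    """Hypothetical 'informative' rule: keep intervals that exclude the null value 0."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo > 0 or hi < 0]


def false_coverage_proportion(intervals, truths, selected):
    """Fraction of selected intervals that miss their true value
    (defined as 0 when nothing is selected)."""
    misses = sum(
        1 for i in selected
        if not (intervals[i][0] <= truths[i] <= intervals[i][1])
    )
    return misses / max(len(selected), 1)


# Made-up prediction intervals and true outcomes for four test points.
intervals = [(0.5, 1.5), (-0.2, 0.4), (1.0, 2.0), (-3.0, -1.0)]
truths = [1.0, 0.1, 2.5, -2.0]

selected = select_informative(intervals)          # indices 0, 2, 3
fcp = false_coverage_proportion(intervals, truths, selected)
```

Only the third interval misses its target, so the false coverage proportion on the selected sample is 1/3. Averaging this proportion over repeated draws of the data gives the FCR.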

翻訳日:2024-03-20 18:02:18 公開日:2024-03-18

# 大規模言語モデルを用いて臨床ノートから物質使用障害の重症度を抽出する:ゼロショット学習アプローチ

Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach ( http://arxiv.org/abs/2403.12297v1 )

ライセンス: Link先を確認

Maria Mahbub, Gregory M. Dams, Sudarshan Srinivasan, Caitlin Rizy, Ioana Danciu, Jodie Trafton, Kathryn Knight,

(参考訳) 物質使用障害 (SUD) は、健康や社会に有害な影響を及ぼすため大きな懸念となっている。 SUDの識別と治療は、重症度、共同決定要因(例えば、離脱症状)、健康の社会的決定要因など、様々な要因に依存している。国際疾病分類(ICD-10)のようなアメリカの保険会社が使用している既存の診断コーディングシステムは、特定の診断に対する粒度が不足しているが、臨床医はこの粒度(精神障害の診断・統計マニュアル(DSM-5)に見られるようなもの)を、補足的な非構造化テキストとして臨床ノートに追加する。従来の自然言語処理(NLP)手法は、このような多様な臨床言語を正確に解析する際の限界に直面している。大規模言語モデル(LLM)は、多様な言語パターンに適応することで、これらの課題を克服することが期待される。本研究は,臨床ノートから様々なSUD診断の重症度関連情報を抽出するためのLLMの応用について検討した。慎重に作成したプロンプトと後処理技術を用いたLLMのゼロショット学習によるワークフローを提案する。オープンソース LLM である Flan-T5 を用いた実験により,ルールベースアプローチよりも優れたリコールを実証する。11種類のSUD診断カテゴリに焦点をあて,重症度情報抽出におけるLLMの有効性を示し,SUD患者のリスク評価と治療計画の改善に寄与する。

Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but clinicians will add this granularity (as that found within the Diagnostic and Statistical Manual of Mental Disorders classification or DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large Language Models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared to the rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.